Beginner Example

This demo walks through the full pipeline of feature selection, causal discovery, and causal reasoning. We use the Automated Feature Selection (AFS) module to select features, pass the reduced dataset to the Causal Learner (CL) for causal discovery, and then use the Causal Reasoning Validator (CRV) for causal reasoning and validation on the learned causal model.

Note: Ensure that Cytoscape is open before running the visualization steps in Step 6.

Step 1: Import Required Modules

import pandas as pd
from ETIA.AFS import AFS
from ETIA.CausalLearning import CausalLearner

# Additional imports for visualization and path finding
from ETIA.CRV.visualization import Visualization  # Visualization class provided
from ETIA.CRV.queries import one_potentially_directed_path  # Function provided
from ETIA.CRV import find_adjset  # Function provided

Step 2: Load Example Dataset

We start by loading the example dataset, example_dataset.csv, which contains several features and two target variables.

data = pd.read_csv('example_dataset.csv')

# Display the first few rows of the dataset
print("Original Dataset:")
print(data.head())

Step 3: Define Target Features

We define two target variables ('t1' and 't2') for which we want to perform feature selection and causal discovery.

target_features = {'t1': 'categorical', 't2': 'categorical'}
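
Before running AFS, it can help to confirm that every declared target is actually a column of the dataset and to see how many distinct values it takes, so that a 'categorical' declaration can be sanity-checked. The following is a minimal sketch using only pandas. The helper name summarize_targets and the toy data are hypothetical and not part of ETIA.

```python
import pandas as pd


def summarize_targets(data, target_features):
    """Return {target: (declared_type, n_distinct_values)} for each target.

    Hypothetical helper (not part of ETIA): raises KeyError if a declared
    target is missing from the dataset columns.
    """
    missing = [name for name in target_features if name not in data.columns]
    if missing:
        raise KeyError(f"Target columns not found in dataset: {missing}")
    return {
        name: (declared, int(data[name].nunique()))
        for name, declared in target_features.items()
    }


# Toy data standing in for example_dataset.csv
toy = pd.DataFrame({
    "X1": [0.1, 0.4, 0.3, 0.9],
    "t1": [0, 1, 0, 1],
    "t2": ["a", "b", "a", "c"],
})
print(summarize_targets(toy, {"t1": "categorical", "t2": "categorical"}))
```

A target declared as 'categorical' that has nearly as many distinct values as rows is worth a second look before continuing.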

Step 4: Run Automated Feature Selection (AFS)

Now, we initialize the AFS module and run it on the dataset to select the most relevant features.

# Initialize the AFS module with depth 1
afs_instance = AFS(depth=1)

# Run AFS to select features for the target variables
afs_result = afs_instance.run_AFS(data=data, target_features=target_features)

# Display the selected features and the best configuration found
print("Selected Features by AFS:")
print(afs_result['selected_features'])

print("Best AFS Configuration:")
print(afs_result['best_config'])

# Extract the reduced dataset containing only the selected features and the target variables
reduced_data = afs_result['reduced_data']

Step 5: Run Causal Learner (CL)

Next, we use the CausalLearner (CL) to discover causal relationships between the selected features and the target variables. The reduced dataset from AFS is passed as input to CL.

# Initialize the CausalLearner with the reduced dataset
learner = CausalLearner(dataset_input=reduced_data)

# Run the causal discovery process
opt_conf, matrix_mec_graph, run_time, library_results = learner.learn_model()

# Display the results of causal discovery
print("Optimal Causal Discovery Configuration from CL:")
print(opt_conf)

print("MEC Matrix Graph (Markov Equivalence Class):")
print(matrix_mec_graph)
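
A printed adjacency matrix is hard to read once the graph has more than a few nodes. As a rough sketch, the helper below lists every pair of variables with a nonzero entry in either direction, together with the two raw matrix values. It assumes only that the matrix is a square pandas DataFrame and that 0 means "no edge"; it does not interpret the nonzero codes, whose meaning should be checked against the ETIA documentation. The name list_edges and the toy matrix are hypothetical.

```python
import pandas as pd


def list_edges(matrix_pd):
    """List (node_a, node_b, value_ab, value_ba) for every nonzero pair.

    Hypothetical helper: assumes a square DataFrame where 0 means no edge.
    """
    names = list(matrix_pd.columns)
    values = matrix_pd.to_numpy()
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if values[i, j] != 0 or values[j, i] != 0:
                edges.append(
                    (names[i], names[j], int(values[i, j]), int(values[j, i]))
                )
    return edges


# Toy matrix with made-up codes, only to show the output shape
toy_matrix = pd.DataFrame(
    [[0, 2, 0], [3, 0, 2], [0, 3, 0]],
    index=["X1", "X2", "t1"],
    columns=["X1", "X2", "t1"],
)
for edge in list_edges(toy_matrix):
    print(edge)
```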

Step 6: Run Causal Reasoning Validator (CRV)

Finally, we use the Causal Reasoning Validator (CRV) to perform causal reasoning and validation on the learned causal model from CL.

Note: Ensure that Cytoscape is open before running this step, as the visualization requires Cytoscape to be running.
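
A quick pre-flight check can catch the "Cytoscape is not running" case before the visualization call fails. The sketch below uses only the Python standard library and tests whether something is listening on Cytoscape's REST port. It assumes the default CyREST address of localhost:1234; if your installation uses a different port, adjust it. The function name cyrest_port_open is hypothetical, and an open port only shows that some process is listening there, not that it is Cytoscape.

```python
import socket


def cyrest_port_open(host="127.0.0.1", port=1234, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper: port 1234 is assumed to be the CyREST default.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if cyrest_port_open():
    print("A service is listening on port 1234; Cytoscape appears to be running.")
else:
    print("Nothing is listening on port 1234; start Cytoscape before plotting.")
```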

### Visualize the Causal Graph using Cytoscape

We use the Visualization class to send the graph to Cytoscape for visualization.

# Initialize the Visualization class with the adjacency matrix
visualization = Visualization(matrix_pd=matrix_mec_graph, net_name='CausalGraph', collection_name='CausalAnalysis')

# Plot the graph in Cytoscape
visualization.plot_cytoscape()

# Optionally, set a specific layout and export the visualization
visualization.set_layout(layout_name='force-directed')
visualization.export_to_png(file_path='causal_graph.png')

### Find a Path from a Variable to a Target Variable

We can find a potentially directed path from a variable to a target using the one_potentially_directed_path function.

# Define the variable names (ensure they exist in your dataset and graph)
source_variable = 'X1'  # Replace with an actual variable name from your dataset
target_variable = 't1'  # Target variable

# Get the adjacency matrix as a NumPy array
adjacency_matrix = matrix_mec_graph.values
node_names = list(matrix_mec_graph.columns)
node_indices = {name: idx for idx, name in enumerate(node_names)}

# Find one potentially directed path from source to target
path = one_potentially_directed_path(
    matrix=adjacency_matrix,
    start=node_indices[source_variable],
    end=node_indices[target_variable]
)

if path:
    path_variables = [node_names[idx] for idx in path]
    print(f"\nA potentially directed path from {source_variable} to {target_variable}:")
    print(" -> ".join(path_variables))
else:
    print(f"\nNo potentially directed path found from {source_variable} to {target_variable}.")
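
To make the idea of a potentially directed path concrete, here is a small self-contained sketch of one way such a search could work. It is not ETIA's implementation. It assumes a hypothetical encoding in which marks[i, j] is the edge mark at node j on the edge between i and j (0 = no edge, 1 = circle, 2 = arrowhead, 3 = tail); the encoding used by your ETIA version may differ. Under that assumption, a step from i to j is allowed when the edge exists and has no arrowhead pointing back into i, and a breadth-first search returns one such path if it exists.

```python
from collections import deque

import numpy as np

# Assumed mark codes (hypothetical; check the encoding your graph uses)
NO_EDGE, CIRCLE, ARROWHEAD, TAIL = 0, 1, 2, 3


def potentially_directed_path_sketch(marks, start, end):
    """Breadth-first search for a path with no arrowhead into the earlier node."""
    n = marks.shape[0]
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            # Walk the parent links back to the start and reverse them
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in range(n):
            if nxt in parent:
                continue
            has_edge = marks[node, nxt] != NO_EDGE
            into_node = marks[nxt, node] == ARROWHEAD
            if has_edge and not into_node:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Toy graph: 0 o-> 1 --> 2
toy_marks = np.zeros((3, 3), dtype=int)
toy_marks[0, 1], toy_marks[1, 0] = ARROWHEAD, CIRCLE
toy_marks[1, 2], toy_marks[2, 1] = ARROWHEAD, TAIL

print(potentially_directed_path_sketch(toy_marks, 0, 2))
print(potentially_directed_path_sketch(toy_marks, 2, 0))
```

In the toy graph, the search from node 0 to node 2 succeeds, while the reverse search fails because the edge between 1 and 2 has an arrowhead at 2.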

### Compute the Adjustment Set

We compute the canonical and minimal adjustment sets for estimating the causal effect of the source variable on the target variable.

# Define the graph type (e.g., 'pag' for Partial Ancestral Graph)
graph_type = 'pag'  # Adjust based on your graph's type

# Find the adjustment set using the provided function
adj_set_can, adj_set_min = find_adjset(
    graph_pd=matrix_mec_graph,
    graph_type=graph_type,
    target_name=[target_variable],
    exposure_names=[source_variable],
    r_path='/path/to/Rscript'  # Replace with the correct path
)

print(f"\nCanonical Adjustment Set for {source_variable} and {target_variable}:")
print(adj_set_can if adj_set_can else "No canonical adjustment set found.")

print(f"\nMinimal Adjustment Set for {source_variable} and {target_variable}:")
print(adj_set_min if adj_set_min else "No minimal adjustment set found.")
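
Because find_adjset needs a working Rscript executable, it can be useful to resolve the path programmatically rather than hard-coding it. The following sketch uses only the standard library: it accepts an explicit path if that file exists and is executable, and otherwise falls back to searching the system PATH. The helper name resolve_rscript is hypothetical and not part of ETIA.

```python
import os
import shutil


def resolve_rscript(explicit_path=None):
    """Return a usable Rscript path, or None if none can be found.

    Hypothetical helper: prefers explicit_path when it points to an
    executable file, otherwise looks up 'Rscript' on the system PATH.
    """
    if (
        explicit_path
        and os.path.isfile(explicit_path)
        and os.access(explicit_path, os.X_OK)
    ):
        return explicit_path
    return shutil.which("Rscript")


r_path = resolve_rscript()
if r_path is None:
    print("Rscript not found; install R or pass its full path explicitly.")
else:
    print(f"Using Rscript at: {r_path}")
```

The resulting value could then be passed as the r_path argument shown above. This check does not verify that the dagitty package is installed in that R environment.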

### Calculate Edge Confidence (Optional)

We can estimate the confidence of the edges in the causal graph by performing bootstrapping.

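# NOTE: calculate_confidence is not among the Step 1 imports. Import it from the
# ETIA.CRV module that provides it in your installed version before running this.
# Calculate edge consistency and similarity confidence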
edge_consistency, edge_similarity = calculate_confidence(
    dataset=learner.dataset,
    opt_conf=opt_conf,
    n_bootstraps=50  # Adjust the number of bootstraps as needed
)

print("\nEdge Consistency:")
print(edge_consistency)

print("\nEdge Similarity:")
print(edge_similarity)
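
The general idea behind bootstrapped edge confidence can be illustrated without ETIA. The sketch below resamples rows with replacement, reruns a structure-finding step on each resample, and reports how often each edge appears. The structure-finding step here is a deliberately simple stand-in (absolute correlation above a threshold), not a causal discovery algorithm and not what calculate_confidence does internally. The function names, the threshold, and the toy data are all hypothetical.

```python
import numpy as np
import pandas as pd


def toy_skeleton(df, threshold=0.3):
    """Stand-in for structure learning: link pairs with |correlation| > threshold."""
    corr = df.corr().abs().to_numpy()
    adj = corr > threshold
    np.fill_diagonal(adj, False)  # no self-edges
    return adj


def bootstrap_edge_frequency(df, n_bootstraps=50, seed=0):
    """Fraction of bootstrap resamples in which each edge is found."""
    rng = np.random.default_rng(seed)
    n_vars = df.shape[1]
    counts = np.zeros((n_vars, n_vars))
    for _ in range(n_bootstraps):
        # Sample row positions with replacement, same size as the data
        idx = rng.integers(0, len(df), size=len(df))
        counts += toy_skeleton(df.iloc[idx])
    return pd.DataFrame(counts / n_bootstraps, index=df.columns, columns=df.columns)


# Toy data: y depends strongly on x, z is independent noise
data_rng = np.random.default_rng(1)
x = data_rng.normal(size=200)
toy_data = pd.DataFrame({
    "x": x,
    "y": x + 0.1 * data_rng.normal(size=200),
    "z": data_rng.normal(size=200),
})
print(bootstrap_edge_frequency(toy_data, n_bootstraps=20))
```

Edges that appear in nearly every resample are stable under resampling; edges that appear only occasionally deserve less trust.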

Step 7: (Optional) Save Progress

You can save the progress of the experiment if needed.

learner.save_progress(path="causal_pipeline_progress.pkl")

# To load the saved progress later:
# learner = learner.load_progress(path="causal_pipeline_progress.pkl")

Explanation

### Overview

This example demonstrates the complete pipeline of using the AFS, CL, and CRV modules for causal analysis:

  1. Feature Selection (AFS): Identifies the most relevant features for the target variables.

  2. Causal Discovery (CL): Discovers causal relationships among the selected features.

  3. Causal Reasoning and Validation (CRV): Validates the causal model, visualizes it, finds causal paths, and computes adjustment sets.

### Visualization with Cytoscape

  • Visualization Class: We use the Visualization class to handle graph visualization in Cytoscape.

  • Plotting: The plot_cytoscape method sends the graph to Cytoscape for visualization.

  • Layout and Export: Use set_layout and export_to_png to adjust the layout and save the visualization.

### Finding Paths

  • ``one_potentially_directed_path`` Function: Searches for a potentially directed path from a start node to an end node in the causal graph.

  • Node Mapping: Maps node names to indices for processing and back to interpret the results.

### Computing Adjustment Sets

  • ``find_adjset`` Function: Uses the dagitty R package to compute adjustment sets for causal effect estimation.

  • Parameters:

      - graph_pd: The adjacency matrix as a pandas DataFrame.
      - graph_type: Type of the graph (e.g., 'dag', 'cpdag', 'mag', 'pag').
      - target_name: The target variable.
      - exposure_names: The exposure variable(s).
      - r_path: Path to the Rscript executable.

### Calculating Edge Confidence

  • Bootstrap Methods: Functions like bootstrapping_causal_graph and edge_metrics_on_bootstraps estimate the confidence of edges via bootstrapping.

  • Edge Consistency and Similarity: Metrics to assess the stability of the discovered causal relationships.

### Dependencies and Setup

  • Cytoscape: Ensure Cytoscape is installed and running.

  • R and dagitty: The find_adjset function requires R and the dagitty package.

  • Python Packages: Install required Python packages (e.g., py4cytoscape, numpy, pandas).

### Variable Names

  • Source and Target Variables: Replace 'X1' and 't1' with actual variable names from your dataset.

  • Node Names: Ensure node names in the adjacency matrix match those used in your dataset.

### Error Handling

  • Module Imports: Confirm all modules and functions are correctly imported.

  • Path Corrections: Update paths like /path/to/Rscript to correct locations on your system.

  • Function Compatibility: Verify method compatibility with your module versions.

By following these steps, you can utilize the full pipeline provided by the AFS, CL, and CRV modules to perform comprehensive causal analysis on your dataset. This includes selecting relevant features, discovering causal structures, visualizing the causal graph, finding causal paths, computing adjustment sets, and assessing the confidence of causal relationships.