Beginner Example
This demo walks through the full pipeline of feature selection, causal discovery, and causal reasoning. We will use the AFS module to perform feature selection, then pass the reduced dataset to CL for causal discovery, and finally use CRV for causal reasoning and validation on the learned causal model.
Note: Ensure that Cytoscape is open before running the visualization steps in Step 6.
Step 1: Import Required Modules
import pandas as pd
from ETIA.AFS import AFS
from ETIA.CausalLearning import CausalLearner
# Additional imports for visualization and path finding
from ETIA.CRV.visualization import Visualization # Visualization class provided
from ETIA.CRV.queries import one_potentially_directed_path # Function provided
from ETIA.CRV import find_adjset # Function provided
Step 2: Load Example Dataset
We start by loading the example dataset example_dataset.csv which contains several features and two target variables.
data = pd.read_csv('example_dataset.csv')
# Display the first few rows of the dataset
print("Original Dataset:")
print(data.head())
Step 3: Define Target Features
We define two target variables ('t1' and 't2') for which we want to perform feature selection and causal discovery.
target_features = {'t1': 'categorical', 't2': 'categorical'}
Step 4: Run Automated Feature Selection (AFS)
Now, we initialize the AFS module and run it on the dataset to select the most relevant features.
# Initialize the AFS module with depth 1
afs_instance = AFS(depth=1)
# Run AFS to select features for the target variables
afs_result = afs_instance.run_AFS(data=data, target_features=target_features)
# Display the selected features and the best configuration found
print("Selected Features by AFS:")
print(afs_result['selected_features'])
print("Best AFS Configuration:")
print(afs_result['best_config'])
# Extract the reduced dataset containing only the selected features and the target variables
reduced_data = afs_result['reduced_data']
Step 5: Run Causal Learner (CL)
Next, we use the CausalLearner (CL) to discover causal relationships between the selected features and the target variables. The reduced dataset from AFS is passed as input to CL.
# Initialize the CausalLearner with the reduced dataset
learner = CausalLearner(dataset_input=reduced_data)
# Run the causal discovery process
opt_conf, matrix_mec_graph, run_time, library_results = learner.learn_model()
# Display the results of causal discovery
print("Optimal Causal Discovery Configuration from CL:")
print(opt_conf)
print("MEC Matrix Graph (Markov Equivalence Class):")
print(matrix_mec_graph)
Step 6: Run Causal Reasoning Validator (CRV)
Finally, we use the Causal Reasoning Validator (CRV) to perform causal reasoning and validation on the learned causal model from CL.
Note: Ensure that Cytoscape is open before running this step, as the visualization requires Cytoscape to be running.
### Visualize the Causal Graph using Cytoscape
We use the Visualization class to send the graph to Cytoscape for visualization.
# Initialize the Visualization class with the adjacency matrix
visualization = Visualization(matrix_pd=matrix_mec_graph, net_name='CausalGraph', collection_name='CausalAnalysis')
# Plot the graph in Cytoscape
visualization.plot_cytoscape()
# Optionally, set a specific layout and export the visualization
visualization.set_layout(layout_name='force-directed')
visualization.export_to_png(file_path='causal_graph.png')
### Find a Path from a Variable to a Target Variable
We can find a potentially directed path from a variable to a target using the one_potentially_directed_path function.
# Define the variable names (ensure they exist in your dataset and graph)
source_variable = 'X1' # Replace with an actual variable name from your dataset
target_variable = 't1' # Target variable
# Get the adjacency matrix as a NumPy array
adjacency_matrix = matrix_mec_graph.values
node_names = list(matrix_mec_graph.columns)
node_indices = {name: idx for idx, name in enumerate(node_names)}
# Find one potentially directed path from source to target
path = one_potentially_directed_path(
matrix=adjacency_matrix,
start=node_indices[source_variable],
end=node_indices[target_variable]
)
if path:
path_variables = [node_names[idx] for idx in path]
print(f"\nA potentially directed path from {source_variable} to {target_variable}:")
print(" -> ".join(path_variables))
else:
print(f"\nNo potentially directed path found from {source_variable} to {target_variable}.")
### Compute the Adjustment Set
We compute the adjustment set for estimating the causal effect of the source variable on the target variable.
# Define the graph type (e.g., 'pag' for Partial Ancestral Graph)
graph_type = 'pag' # Adjust based on your graph's type
# Find the adjustment set using the provided function
adj_set_can, adj_set_min = find_adjset(
graph_pd=matrix_mec_graph,
graph_type=graph_type,
target_name=[target_variable],
exposure_names=[source_variable],
r_path='/path/to/Rscript' # Replace with the correct path
)
print(f"\nCanonical Adjustment Set for {source_variable} and {target_variable}:")
print(adj_set_can if adj_set_can else "No canonical adjustment set found.")
print(f"\nMinimal Adjustment Set for {source_variable} and {target_variable}:")
print(adj_set_min if adj_set_min else "No minimal adjustment set found.")
### Calculate Edge Confidence (Optional)
We can estimate the confidence of the edges in the causal graph by performing bootstrapping.
# Calculate edge consistency and similarity confidence
edge_consistency, edge_similarity = calculate_confidence(
dataset=learner.dataset,
opt_conf=opt_conf,
n_bootstraps=50 # Adjust the number of bootstraps as needed
)
print("\nEdge Consistency:")
print(edge_consistency)
print("\nEdge Similarity:")
print(edge_similarity)
Step 7: (Optional) Save Progress
You can save the progress of the experiment if needed.
learner.save_progress(path="causal_pipeline_progress.pkl")
# To load the saved progress later:
# learner = learner.load_progress(path="causal_pipeline_progress.pkl")
—
Explanation
### Overview
This example demonstrates the complete pipeline of using the AFS, CL, and CRV modules for causal analysis:
Feature Selection (AFS): Identifies the most relevant features for the target variables.
Causal Discovery (CL): Discovers causal relationships among the selected features.
Causal Reasoning and Validation (CRV): Validates the causal model, visualizes it, finds causal paths, and computes adjustment sets.
### Visualization with Cytoscape
Visualization Class: We use the
Visualizationclass to handle graph visualization in Cytoscape.Plotting: The
plot_cytoscapemethod sends the graph to Cytoscape for visualization.Layout and Export: Use
set_layoutandexport_to_pngto adjust the layout and save the visualization.
### Finding Paths
``one_potentially_directed_path`` Function: Searches for a potentially directed path from a start node to an end node in the causal graph.
Node Mapping: Maps node names to indices for processing and back to interpret the results.
### Computing Adjustment Sets
``find_adjset`` Function: Uses the
dagittyR package to compute adjustment sets for causal effect estimation.Parameters: -
graph_pd: The adjacency matrix as a pandas DataFrame. -graph_type: Type of the graph (e.g.,'dag','cpdag','mag','pag'). -target_name: The target variable. -exposure_names: The exposure variable(s). -r_path: Path to the Rscript executable.
### Calculating Edge Confidence
Bootstrap Methods: Functions like
bootstrapping_causal_graphandedge_metrics_on_bootstrapsestimate the confidence of edges via bootstrapping.Edge Consistency and Similarity: Metrics to assess the stability of the discovered causal relationships.
### Dependencies and Setup
Cytoscape: Ensure Cytoscape is installed and running.
R and dagitty: The
find_adjsetfunction requires R and thedagittypackage.Python Packages: Install required Python packages (e.g.,
py4cytoscape,numpy,pandas).
### Variable Names
Source and Target Variables: Replace
'X1'and't1'with actual variable names from your dataset.Node Names: Ensure node names in the adjacency matrix match those used in your dataset.
### Error Handling
Module Imports: Confirm all modules and functions are correctly imported.
Path Corrections: Update paths like
/path/to/Rscriptto correct locations on your system.Function Compatibility: Verify method compatibility with your module versions.
—
By following these steps, you can utilize the full pipeline provided by the AFS, CL, and CRV modules to perform comprehensive causal analysis on your dataset. This includes selecting relevant features, discovering causal structures, visualizing the causal graph, finding causal paths, computing adjustment sets, and assessing the confidence of causal relationships.