Beginner Example ================== This demo walks through the full pipeline of feature selection, causal discovery, and causal reasoning. We will use the AFS module to perform feature selection, then pass the reduced dataset to CL for causal discovery, and finally use CRV for causal reasoning and validation on the learned causal model. **Note:** Ensure that Cytoscape is open before running the visualization steps in Step 6. Step 1: Import Required Modules ------------------------------- .. code-block:: python import pandas as pd from ETIA.AFS import AFS from ETIA.CausalLearning import CausalLearner # Additional imports for visualization and path finding from ETIA.CRV.visualization import Visualization # Visualization class provided from ETIA.CRV.queries import one_potentially_directed_path # Function provided from ETIA.CRV import find_adjset # Function provided Step 2: Load Example Dataset ---------------------------- We start by loading the example dataset ``example_dataset.csv`` which contains several features and two target variables. .. code-block:: python data = pd.read_csv('example_dataset.csv') # Display the first few rows of the dataset print("Original Dataset:") print(data.head()) Step 3: Define Target Features ------------------------------ We define two target variables (``'t1'`` and ``'t2'``) for which we want to perform feature selection and causal discovery. .. code-block:: python target_features = {'t1': 'categorical', 't2': 'categorical'} Step 4: Run Automated Feature Selection (AFS) --------------------------------------------- Now, we initialize the AFS module and run it on the dataset to select the most relevant features. .. code-block:: python # Initialize the AFS module with depth 1 afs_instance = AFS(depth=1) # Run AFS to select features for the target variables afs_result = afs_instance.run_AFS(data=data, target_features=target_features) # Display the selected features and the best configuration found print("Selected Features by AFS:") print(afs_result['selected_features']) print("Best AFS Configuration:") print(afs_result['best_config']) # Extract the reduced dataset containing only the selected features and the target variables reduced_data = afs_result['reduced_data'] Step 5: Run Causal Learner (CL) ------------------------------- Next, we use the CausalLearner (CL) to discover causal relationships between the selected features and the target variables. The reduced dataset from AFS is passed as input to CL. .. code-block:: python # Initialize the CausalLearner with the reduced dataset learner = CausalLearner(dataset_input=reduced_data) # Run the causal discovery process opt_conf, matrix_mec_graph, run_time, library_results = learner.learn_model() # Display the results of causal discovery print("Optimal Causal Discovery Configuration from CL:") print(opt_conf) print("MEC Matrix Graph (Markov Equivalence Class):") print(matrix_mec_graph) Step 6: Run Causal Reasoning Validator (CRV) -------------------------------------------- Finally, we use the Causal Reasoning Validator (CRV) to perform causal reasoning and validation on the learned causal model from CL. **Note:** Ensure that Cytoscape is open before running this step, as the visualization requires Cytoscape to be running. ### Visualize the Causal Graph using Cytoscape We use the ``Visualization`` class to send the graph to Cytoscape for visualization. .. code-block:: python # Initialize the Visualization class with the adjacency matrix visualization = Visualization(matrix_pd=matrix_mec_graph, net_name='CausalGraph', collection_name='CausalAnalysis') # Plot the graph in Cytoscape visualization.plot_cytoscape() # Optionally, set a specific layout and export the visualization visualization.set_layout(layout_name='force-directed') visualization.export_to_png(file_path='causal_graph.png') ### Find a Path from a Variable to a Target Variable We can find a potentially directed path from a variable to a target using the ``one_potentially_directed_path`` function. .. code-block:: python # Define the variable names (ensure they exist in your dataset and graph) source_variable = 'X1' # Replace with an actual variable name from your dataset target_variable = 't1' # Target variable # Get the adjacency matrix as a NumPy array adjacency_matrix = matrix_mec_graph.values node_names = list(matrix_mec_graph.columns) node_indices = {name: idx for idx, name in enumerate(node_names)} # Find one potentially directed path from source to target path = one_potentially_directed_path( matrix=adjacency_matrix, start=node_indices[source_variable], end=node_indices[target_variable] ) if path: path_variables = [node_names[idx] for idx in path] print(f"\nA potentially directed path from {source_variable} to {target_variable}:") print(" -> ".join(path_variables)) else: print(f"\nNo potentially directed path found from {source_variable} to {target_variable}.") ### Compute the Adjustment Set We compute the adjustment set for estimating the causal effect of the source variable on the target variable. .. code-block:: python # Define the graph type (e.g., 'pag' for Partial Ancestral Graph) graph_type = 'pag' # Adjust based on your graph's type # Find the adjustment set using the provided function adj_set_can, adj_set_min = find_adjset( graph_pd=matrix_mec_graph, graph_type=graph_type, target_name=[target_variable], exposure_names=[source_variable], r_path='/path/to/Rscript' # Replace with the correct path ) print(f"\nCanonical Adjustment Set for {source_variable} and {target_variable}:") print(adj_set_can if adj_set_can else "No canonical adjustment set found.") print(f"\nMinimal Adjustment Set for {source_variable} and {target_variable}:") print(adj_set_min if adj_set_min else "No minimal adjustment set found.") ### Calculate Edge Confidence (Optional) We can estimate the confidence of the edges in the causal graph by performing bootstrapping. .. code-block:: python # Calculate edge consistency and similarity confidence edge_consistency, edge_similarity = calculate_confidence( dataset=learner.dataset, opt_conf=opt_conf, n_bootstraps=50 # Adjust the number of bootstraps as needed ) print("\nEdge Consistency:") print(edge_consistency) print("\nEdge Similarity:") print(edge_similarity) Step 7: (Optional) Save Progress -------------------------------- You can save the progress of the experiment if needed. .. code-block:: python learner.save_progress(path="causal_pipeline_progress.pkl") # To load the saved progress later: # learner = learner.load_progress(path="causal_pipeline_progress.pkl") --- Explanation ----------- ### Overview This example demonstrates the complete pipeline of using the AFS, CL, and CRV modules for causal analysis: 1. **Feature Selection (AFS)**: Identifies the most relevant features for the target variables. 2. **Causal Discovery (CL)**: Discovers causal relationships among the selected features. 3. **Causal Reasoning and Validation (CRV)**: Validates the causal model, visualizes it, finds causal paths, and computes adjustment sets. ### Visualization with Cytoscape - **Visualization Class**: We use the ``Visualization`` class to handle graph visualization in Cytoscape. - **Plotting**: The ``plot_cytoscape`` method sends the graph to Cytoscape for visualization. - **Layout and Export**: Use ``set_layout`` and ``export_to_png`` to adjust the layout and save the visualization. ### Finding Paths - **``one_potentially_directed_path`` Function**: Searches for a potentially directed path from a start node to an end node in the causal graph. - **Node Mapping**: Maps node names to indices for processing and back to interpret the results. ### Computing Adjustment Sets - **``find_adjset`` Function**: Uses the ``dagitty`` R package to compute adjustment sets for causal effect estimation. - **Parameters**: - ``graph_pd``: The adjacency matrix as a pandas DataFrame. - ``graph_type``: Type of the graph (e.g., ``'dag'``, ``'cpdag'``, ``'mag'``, ``'pag'``). - ``target_name``: The target variable. - ``exposure_names``: The exposure variable(s). - ``r_path``: Path to the Rscript executable. ### Calculating Edge Confidence - **Bootstrap Methods**: Functions like ``bootstrapping_causal_graph`` and ``edge_metrics_on_bootstraps`` estimate the confidence of edges via bootstrapping. - **Edge Consistency and Similarity**: Metrics to assess the stability of the discovered causal relationships. ### Dependencies and Setup - **Cytoscape**: Ensure Cytoscape is installed and running. - **R and dagitty**: The ``find_adjset`` function requires R and the ``dagitty`` package. - **Python Packages**: Install required Python packages (e.g., ``py4cytoscape``, ``numpy``, ``pandas``). ### Variable Names - **Source and Target Variables**: Replace ``'X1'`` and ``'t1'`` with actual variable names from your dataset. - **Node Names**: Ensure node names in the adjacency matrix match those used in your dataset. ### Error Handling - **Module Imports**: Confirm all modules and functions are correctly imported. - **Path Corrections**: Update paths like ``/path/to/Rscript`` to correct locations on your system. - **Function Compatibility**: Verify method compatibility with your module versions. --- By following these steps, you can utilize the full pipeline provided by the AFS, CL, and CRV modules to perform comprehensive causal analysis on your dataset. This includes selecting relevant features, discovering causal structures, visualizing the causal graph, finding causal paths, computing adjustment sets, and assessing the confidence of causal relationships.