ETIA.CausalLearning package
- class CausalLearner(dataset_input: str | Dataset | None = None, configurations: Configurations | None = None, verbose: bool = False, n_jobs: int | None = None, random_seed: int | None = None)[source]
Bases: object
CausalLearner class for automated causal discovery.
- Parameters:
dataset_input (str or Dataset) – Either a file path to the dataset or a Dataset instance containing the data.
configurations (Configurations, optional) – A Configurations object containing experiment configurations. If None, default configurations are used.
verbose (bool, optional) – If True, prints detailed logs. Default is False.
n_jobs (int, optional) – Number of jobs for parallel processing. Default is the number of CPU cores.
random_seed (int, optional) – Seed for random number generator to ensure reproducibility. Default is None.
- learn_model()[source]
Runs the causal discovery process using the OCT algorithm.
- Returns:
A tuple containing, in order:
opt_conf: The optimal configuration found.
matrix_mec_graph: The MEC graph matrix.
matrix_graph: The graph matrix.
run_time: The runtime of the CDHPO process.
library_results: Results from the causal discovery libraries.
- Return type:
tuple
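The five return values arrive as one positional tuple, so they can be unpacked in the order listed above. The following is a minimal sketch, assuming `learner` is an already constructed CausalLearner. The helper name `summarize_run` and the dictionary keys are illustrative and not part of ETIA.

```python
def summarize_run(learner):
    """Unpack the 5-tuple returned by learn_model() into a labelled dict.

    `learner` is expected to be a CausalLearner, or any object whose
    learn_model() returns the same five values in the same order.
    """
    opt_conf, matrix_mec_graph, matrix_graph, run_time, library_results = (
        learner.learn_model()
    )
    return {
        "opt_conf": opt_conf,
        "matrix_mec_graph": matrix_mec_graph,
        "matrix_graph": matrix_graph,
        "run_time": run_time,
        "library_results": library_results,
    }
```

Keeping the results in a dictionary avoids relying on tuple positions in later code.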
- print_results(opt_conf=None)[source]
Prints the results of the causal discovery process.
- Parameters:
opt_conf (dict, optional) – The optimal configuration to print. If None, uses self.opt_conf.
- set_dataset(dataset)[source]
Sets the dataset for the causal learner.
- Parameters:
dataset (Dataset) – The Dataset object to set.
- Raises:
TypeError – If dataset is not of type Dataset.
- set_configurations(configurations)[source]
Sets the configurations for the causal learner.
- Parameters:
configurations (Configurations) – The Configurations object to set.
- Raises:
TypeError – If configurations is not of type Configurations.
- save_progress(path=None)[source]
Saves the progress of the experiment to a file.
- Parameters:
path (str, optional) – The file path to save the progress to. If None, saves to ‘Experiment.pkl’ in results_folder.
- static load_progress(path)[source]
Loads the progress of the experiment from a file.
- Parameters:
path (str) – The file path to load the progress from.
- Returns:
The loaded CausalLearner object.
- Return type:
CausalLearner
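The save and load pair can be pictured as a pickle round trip. The sketch below is an assumption based only on the 'Experiment.pkl' default file name. The class `Experiment` is a hypothetical stand-in for CausalLearner, and it pickles a plain state dictionary, which may differ from what ETIA actually writes.

```python
import os
import pickle
import tempfile


class Experiment:
    """Hypothetical stand-in for CausalLearner, holding only picklable state."""

    def __init__(self, results_folder, opt_conf=None):
        self.results_folder = results_folder
        self.opt_conf = opt_conf

    def save_progress(self, path=None):
        # Fall back to 'Experiment.pkl' inside results_folder when no path is given.
        if path is None:
            path = os.path.join(self.results_folder, "Experiment.pkl")
        state = {"results_folder": self.results_folder, "opt_conf": self.opt_conf}
        with open(path, "wb") as f:
            pickle.dump(state, f)
        return path

    @staticmethod
    def load_progress(path):
        # Static, so it can be called without an existing instance.
        with open(path, "rb") as f:
            state = pickle.load(f)
        return Experiment(**state)


with tempfile.TemporaryDirectory() as folder:
    saved_path = Experiment(folder, opt_conf={"name": "demo"}).save_progress()
    restored = Experiment.load_progress(saved_path)
```

Because load_progress is static, a finished experiment can be restored in a fresh session without first building a new learner.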
- add_configurations_from_file(filename)[source]
Adds additional configurations to the experiment from a JSON file.
- Parameters:
filename (str) – The filename of the JSON file containing configurations.
- get_best_model_between_algorithms(algorithms)[source]
Gets the best model between specified algorithms.
- Parameters:
algorithms (list) – A list of algorithm names to consider.
- Returns:
The best configuration among the specified algorithms.
- Return type:
dict
- get_best_model_between_family(causal_sufficiency=None, assume_faithfulness=None, is_output_mec=None, accepts_missing_values=None)[source]
Gets the best model within a family of algorithms based on specified criteria.
- Parameters:
causal_sufficiency (bool, optional) – Filter algorithms by whether they assume causal sufficiency, that is, whether they rule out or admit latent variables.
assume_faithfulness (bool, optional) – Filter algorithms based on faithfulness assumption.
is_output_mec (bool, optional) – Filter algorithms that output MEC graphs.
accepts_missing_values (bool, optional) – Filter algorithms that accept missing values.
- Returns:
The best configuration among the filtered algorithms.
- Return type:
dict
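The filtering logic can be sketched as follows: criteria left as None are ignored, the remaining ones must all match, and the best-scoring survivor is returned. The registry `RESULTS`, the algorithm names, the flags, the scores, and the rule that a higher score is better are all made up for illustration and are not ETIA's internal data structures.

```python
# Hypothetical per-algorithm flags plus a score from an earlier run.
RESULTS = [
    {"name": "algo_a", "causal_sufficiency": True, "assume_faithfulness": True,
     "is_output_mec": True, "accepts_missing_values": False, "score": 0.71},
    {"name": "algo_b", "causal_sufficiency": True, "assume_faithfulness": False,
     "is_output_mec": False, "accepts_missing_values": False, "score": 0.78},
    {"name": "algo_c", "causal_sufficiency": False, "assume_faithfulness": True,
     "is_output_mec": True, "accepts_missing_values": True, "score": 0.64},
]


def best_in_family(results, causal_sufficiency=None, assume_faithfulness=None,
                   is_output_mec=None, accepts_missing_values=None):
    """Return the highest-scoring entry that matches every non-None criterion."""
    criteria = {
        "causal_sufficiency": causal_sufficiency,
        "assume_faithfulness": assume_faithfulness,
        "is_output_mec": is_output_mec,
        "accepts_missing_values": accepts_missing_values,
    }
    # Criteria left as None do not restrict the family.
    active = {key: value for key, value in criteria.items() if value is not None}
    matching = [r for r in results
                if all(r[key] == value for key, value in active.items())]
    if not matching:
        return None
    return max(matching, key=lambda r: r["score"])
```

With these illustrative entries, asking only for `is_output_mec=True` narrows the family to algo_a and algo_c before the scores are compared.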
- class Configurations(dataset: Dataset | None = None, n_lags: int = 0, time_lagged: bool = False, time_series: bool = False, conf_file: str | None = None, n_jobs: int | None = -1, verbose=False)[source]
Bases: object
Configurations class for setting up the causal discovery experiment.
- Parameters:
dataset (Dataset) – The dataset object.
n_lags (int, optional) – Number of lags (for time series).
time_lagged (bool, optional) – Indicates if the data is time-lagged.
time_series (bool, optional) – Indicates if the dataset is time series data.
conf_file (str, optional) – JSON configuration file containing parameters for the causal discovery experiment.
n_jobs (int, optional) – Number of jobs to use for parallel processing.
verbose (bool, optional) – Whether to print debug information.
- cdhpo_params
Parameters for the CDHPO algorithm.
- Type:
CDHPOParameters
- results_folder
Folder path for storing results.
- Type:
str
- set_default_configuration()[source]
Set default configurations based on the dataset when no JSON configuration file is provided.
- class MVP_ProtocolBase[source]
Bases: object
A base class for running protocols for causal discovery algorithms.
This class provides the foundation for implementing various protocols to evaluate causal discovery algorithms. Derived classes should implement specific protocols (e.g., KFoldCV, Holdout). This class should not be instantiated directly.
- set_params(parameters)[source]
Sets the parameters of the protocol.
- Parameters:
parameters (dict) – A dictionary containing the protocol-specific parameters to set. Each key corresponds to a parameter name and its value defines the parameter’s value.
- Return type:
None
- run_protocol(data, algorithm, parameters, n_jobs=1)[source]
Runs the protocol using the specified causal discovery algorithm and dataset.
- Parameters:
data (Any) – The dataset on which to run the causal discovery algorithm. Can be in various formats (e.g., pandas DataFrame).
algorithm (Any) – The causal discovery algorithm to evaluate within the protocol.
parameters (dict) – A dictionary of parameters for both the protocol and the algorithm.
n_jobs (int, optional) – The number of parallel jobs to run during the evaluation. Default is 1.
- Returns:
The results of the protocol in array format, which may vary based on the specific implementation.
- Return type:
Any
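The base-class contract described above can be sketched with a small subclass. Everything here is hypothetical: `ProtocolBase` stands in for MVP_ProtocolBase, the `Holdout` body and its `train_fraction` parameter are invented for illustration, and the algorithm is assumed to be a plain callable taking the training rows and a parameter dictionary.

```python
class ProtocolBase:
    """Hypothetical stand-in for MVP_ProtocolBase; not meant to be instantiated."""

    def set_params(self, parameters):
        raise NotImplementedError

    def run_protocol(self, data, algorithm, parameters, n_jobs=1):
        raise NotImplementedError


class Holdout(ProtocolBase):
    """Illustrative protocol: run on the first rows, hold out the rest."""

    def __init__(self):
        self.train_fraction = 0.8

    def set_params(self, parameters):
        # Only overwrite the default when the key is present.
        self.train_fraction = parameters.get("train_fraction", self.train_fraction)

    def run_protocol(self, data, algorithm, parameters, n_jobs=1):
        split = int(len(data) * self.train_fraction)
        train, test = data[:split], data[split:]
        # Return the algorithm output together with the held-out rows.
        return [algorithm(train, parameters), test]
```

A derived class only has to override the two methods. Code that drives the experiment can then call any protocol through the same interface.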
- class KFoldCV[source]
Bases: MVP_ProtocolBase
Class implementing a K-Fold Cross-Validation protocol for running a causal discovery algorithm.
- folds
Number of folds to be used in the cross-validation. Default is 10.
- Type:
int
- folds_to_run
Number of folds to run the cross-validation for. Default is 1.
- Type:
int
- train_indexes
A list of indexes for the training samples.
- Type:
list of int
- test_indexes
A list of indexes for the test samples.
- Type:
list of int
- data_train
A list of training data samples for each fold.
- Type:
list of pd.DataFrame
- data_test
A list of test data samples for each fold.
- Type:
list of pd.DataFrame
- set_params(parameters, verbose=False)[source]
Set the number of folds and the number of folds to run the protocol for.
- Parameters:
parameters (dict) – A dictionary of parameters, including the number of folds and the number of folds to run.
verbose (bool, optional) – If True, enables detailed logging. Default is False.
- run_cd_algorithm(data, algorithm, parameters, fold)[source]
Run the causal discovery algorithm on the specified fold.
- Parameters:
data (pd.DataFrame) – The dataset on which to run the causal discovery algorithm.
algorithm (object) – The causal discovery algorithm to be used.
parameters (dict) – A dictionary of parameters to pass to the algorithm.
fold (int) – The current fold number for which to run the algorithm.
- Returns:
A list containing the MEC graph and library results produced by the causal discovery algorithm.
- Return type:
list of np.ndarray
- init_protocol(data)[source]
Initialize the K-Fold protocol by splitting the data into training and test sets for each fold.
- Parameters:
data (pd.DataFrame) – The dataset to be used for the cross-validation.
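The split that init_protocol performs can be illustrated with plain row positions. This is a sketch under assumptions: it uses contiguous, unshuffled folds, and the function name `kfold_indexes` is hypothetical. The library may shuffle the rows or delegate the split to another package.

```python
def kfold_indexes(n_samples, folds=10, folds_to_run=1):
    """Return (train_indexes, test_indexes), one list of row positions per fold run."""
    if not 1 <= folds_to_run <= folds <= n_samples:
        raise ValueError("need 1 <= folds_to_run <= folds <= n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, folds)
    train_indexes, test_indexes = [], []
    start = 0
    for fold in range(folds_to_run):
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        # Every row outside the test block belongs to the training set.
        train = list(range(0, start)) + list(range(stop, n_samples))
        train_indexes.append(train)
        test_indexes.append(test)
        start = stop
    return train_indexes, test_indexes
```

With the documented defaults (folds=10, folds_to_run=1) only the first of the ten folds is evaluated, which keeps the cost of one configuration low.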
- run_protocol(data, algorithm, parameters, n_jobs=1)[source]
Run the K-Fold cross-validation protocol with the specified causal discovery algorithm.
- Parameters:
data (pd.DataFrame) – The dataset on which to run the algorithm.
algorithm (object) – The causal discovery algorithm to use.
parameters (dict) – A dictionary of parameters to be passed to the algorithm.
n_jobs (int, optional) – The number of CPU cores to use for parallel computation. Default is 1.
- Returns:
A list containing the results of the protocol, with the MEC graphs and other results.
- Return type:
list of np.ndarray
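How n_jobs might distribute the folds can be sketched with the standard library. This is not ETIA's implementation: the helper `run_folds` and its `run_one_fold` callback are hypothetical, and a thread pool is used here only because it needs no extra dependency.

```python
from concurrent.futures import ThreadPoolExecutor


def run_folds(run_one_fold, folds_to_run, n_jobs=1):
    """Call run_one_fold(fold) for each fold and return the results in fold order."""
    fold_numbers = range(folds_to_run)
    if n_jobs == 1:
        # Sequential path: no pool overhead.
        return [run_one_fold(fold) for fold in fold_numbers]
    # Treat None, 0, or a negative n_jobs as "let the executor choose".
    workers = n_jobs if n_jobs and n_jobs > 0 else None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, regardless of completion order.
        return list(pool.map(run_one_fold, fold_numbers))
```

In this sketch `run_one_fold` would wrap a call such as run_cd_algorithm(data, algorithm, parameters, fold), so each list entry holds the output for one fold.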
- class RandomForestRegressor_[source]
Bases: object
Wrapper class for setting up a RandomForestRegressor model with custom parameters.
- set_regressor_params(parameters)[source]
Configures and returns a RandomForestRegressor object with the specified parameters.
- Parameters:
parameters (dict) – A dictionary containing the following keys:
'n_trees': int, the number of trees in the forest.
'min_samples_leaf': int or float, the minimum number of samples required at a leaf node.
'max_depth': int, the maximum depth of each tree.
- Returns:
A RandomForestRegressor object configured with the specified parameters.
- Return type:
RandomForestRegressor
Examples
>>> params = {'n_trees': 100, 'min_samples_leaf': 0.1, 'max_depth': 10}
>>> regressor = RandomForestRegressor_().set_regressor_params(params)
>>> print(regressor)
RandomForestRegressor(max_depth=10, min_samples_leaf=0.1)
- class LinearRegression_[source]
Bases: object
Wrapper class for setting up a LinearRegression model with custom parameters.
- set_regressor_params(parameters)[source]
Configures and returns a LinearRegression object.
- Parameters:
parameters (dict) – A dictionary containing the model parameters (though LinearRegression does not currently use parameters in this implementation).
- Returns:
A LinearRegression object configured with default parameters.
- Return type:
LinearRegression
Examples
>>> params = {}
>>> regressor = LinearRegression_().set_regressor_params(params)
>>> print(regressor)
LinearRegression()
Subpackages
Submodules
ETIA.CausalLearning.CausalLearner module