ETIA.CausalLearning package

Bases: object

CausalLearner class for automated causal discovery.

Parameters:

dataset_input (str or Dataset) – Either a file path to the dataset or a Dataset instance containing the data.
configurations (Configurations, optional) – A Configurations object containing experiment configurations. If None, default configurations are used.
verbose (bool, optional) – If True, prints detailed logs. Default is False.
n_jobs (int, optional) – Number of jobs for parallel processing. Default is the number of CPU cores.
random_seed (int, optional) – Seed for random number generator to ensure reproducibility. Default is None.
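
As a rough illustration of how the n_jobs default could resolve to a concrete worker count, the sketch below uses a hypothetical helper, resolve_n_jobs, which is not part of ETIA. Treating -1 as "all cores" is an assumption borrowed from the common joblib convention and suggested by the n_jobs=-1 default on Configurations.

```python
import os


def resolve_n_jobs(n_jobs=None):
    """Return a concrete worker count (hypothetical helper, not part of ETIA)."""
    cores = os.cpu_count() or 1  # os.cpu_count() may return None
    # Assumption: both None and -1 mean "use every available core".
    if n_jobs is None or n_jobs == -1:
        return cores
    if n_jobs < 1:
        raise ValueError("n_jobs must be a positive integer, -1, or None")
    return n_jobs


default_workers = resolve_n_jobs()
explicit_workers = resolve_n_jobs(2)
```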

learn_model()[source]: Runs the causal discovery process.

print_results(opt_conf=None)[source]: Prints the results of the causal discovery process.

set_dataset(dataset)[source]: Sets the dataset for the causal learner.

set_configurations(configurations)[source]: Sets the configurations for the causal learner.

save_progress(path=None)[source]: Saves the progress of the experiment to a file.

load_progress(path)[source]: Loads the progress of the experiment from a file.

add_configurations_from_file(filename)[source]: Adds additional configurations to the experiment from a JSON file.

update_learnt_model()[source]: Updates the learnt model with new configurations.

get_best_model_between_algorithms(algorithms)[source]: Gets the best model between specified algorithms.

get_best_model_between_family(**kwargs)[source]: Gets the best model within a family of algorithms based on specified criteria.

learn_model()[source]

Runs the causal discovery process using the OCT algorithm.

Returns:

A tuple containing:

opt_conf: The optimal configuration found.
matrix_mec_graph: The MEC graph matrix.
matrix_graph: The graph matrix.
run_time: The runtime of the CDHPO process.
library_results: Results from the causal discovery libraries.

Return type:

tuple
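
A minimal sketch of consuming the five return values, assuming the tuple follows the order listed above. fake_learn_model and LearnResult are hypothetical stand-ins written for this example, and the values are placeholders rather than real output.

```python
from typing import Any, NamedTuple


class LearnResult(NamedTuple):
    """Hypothetical named wrapper around the documented return values."""
    opt_conf: dict
    matrix_mec_graph: Any
    matrix_graph: Any
    run_time: float
    library_results: Any


def fake_learn_model():
    """Stand-in for learn_model(); returns placeholder values only."""
    return ({"name": "pc"}, [[0, 1], [0, 0]], [[0, 1], [0, 0]], 1.5, None)


# Positional unpacking, assuming the order in which the values are listed.
opt_conf, matrix_mec_graph, matrix_graph, run_time, library_results = fake_learn_model()

# A NamedTuple gives the same values named access.
result = LearnResult(*fake_learn_model())
```

Named access (result.run_time) tends to be less error-prone than positional indexing when only one or two of the values are needed.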

print_results(opt_conf=None)[source]

Prints the results of the causal discovery process.

Parameters: opt_conf (dict, optional) – The optimal configuration to print. If None, uses self.opt_conf.

set_dataset(dataset)[source]

Sets the dataset for the causal learner.

Parameters: dataset (Dataset) – The Dataset object to set.
Raises: TypeError – If dataset is not of type Dataset.

set_configurations(configurations)[source]

Sets the configurations for the causal learner.

Parameters: configurations (Configurations) – The Configurations object to set.
Raises: TypeError – If configurations is not of type Configurations.
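
The two setters above reject arguments of the wrong type. A minimal sketch of that pattern follows; Dataset and Learner here are stand-in classes defined only for this example, not the ETIA implementations.

```python
class Dataset:
    """Stand-in for the ETIA Dataset class (illustration only)."""


class Learner:
    """Hypothetical minimal learner showing a type-checked setter."""

    def __init__(self):
        self.dataset = None

    def set_dataset(self, dataset):
        # Reject anything that is not a Dataset instance, including file paths.
        if not isinstance(dataset, Dataset):
            raise TypeError(
                f"dataset must be a Dataset, got {type(dataset).__name__}"
            )
        self.dataset = dataset


learner = Learner()
learner.set_dataset(Dataset())

try:
    learner.set_dataset("data.csv")  # a str path is not a Dataset
    rejected = False
except TypeError:
    rejected = True
```

Note that, as documented, the constructor accepts a file path for dataset_input, whereas set_dataset expects a Dataset object.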

save_progress(path=None)[source]

Saves the progress of the experiment to a file.

Parameters: path (str, optional) – The file path to save the progress to. If None, saves to ‘Experiment.pkl’ in results_folder.

static load_progress(path)[source]

Loads the progress of the experiment from a file.

Parameters: path (str) – The file path to load the progress from.
Returns: The loaded CausalLearner object.
Return type: CausalLearner
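
The save/load pair can be pictured as a pickle round trip. The sketch below is an assumption based on the ‘Experiment.pkl’ default file name; Experiment is a stand-in class, and pickling only the instance state is a choice made for this example, not a statement about how ETIA stores its progress.

```python
import os
import pickle
import tempfile


class Experiment:
    """Stand-in for a CausalLearner-like object (illustration only)."""

    def __init__(self, results_folder):
        self.results_folder = results_folder
        self.opt_conf = None

    def save_progress(self, path=None):
        # Fall back to 'Experiment.pkl' inside results_folder when no path is given.
        if path is None:
            path = os.path.join(self.results_folder, "Experiment.pkl")
        with open(path, "wb") as handle:
            # Assumption: only the instance state is pickled in this sketch.
            pickle.dump(self.__dict__, handle)
        return path

    @staticmethod
    def load_progress(path):
        with open(path, "rb") as handle:
            state = pickle.load(handle)
        # Rebuild the object without calling __init__, then restore its state.
        restored = Experiment.__new__(Experiment)
        restored.__dict__.update(state)
        return restored


with tempfile.TemporaryDirectory() as folder:
    experiment = Experiment(results_folder=folder)
    experiment.opt_conf = {"name": "pc", "alpha": 0.05}
    saved_path = experiment.save_progress()
    restored = Experiment.load_progress(saved_path)
```

Because load_progress is static, it can be called on the class without an existing instance. Pickle files can execute arbitrary code when loaded, so only files from a trusted source should be opened this way.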

add_configurations_from_file(filename)[source]

Adds additional configurations to the experiment from a JSON file.

Parameters: filename (str) – The filename of the JSON file containing configurations.

update_learnt_model()[source]: Updates the learnt model with the new configurations.

get_best_model_between_algorithms(algorithms)[source]

Gets the best model between specified algorithms.

Parameters: algorithms (list) – A list of algorithm names to consider.
Returns: The best configuration among the specified algorithms.
Return type: dict
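
Conceptually, this restricts the evaluated configurations to the named algorithms and keeps the best one. The sketch below assumes each configuration is a dict with hypothetical "name" and "score" keys and that a higher score is better; the scores are made-up placeholders, and best_among is not an ETIA function.

```python
def best_among(scored_configs, algorithms):
    """Return the highest-scoring configuration whose algorithm is in `algorithms`."""
    allowed = set(algorithms)
    candidates = [conf for conf in scored_configs if conf["name"] in allowed]
    if not candidates:
        raise ValueError("no evaluated configuration uses the requested algorithms")
    # Assumption: larger score means better configuration.
    return max(candidates, key=lambda conf: conf["score"])


# Placeholder data, invented for this example.
scored = [
    {"name": "pc", "alpha": 0.01, "score": 0.71},
    {"name": "pc", "alpha": 0.05, "score": 0.74},
    {"name": "fges", "penalty": 2, "score": 0.78},
    {"name": "fci", "alpha": 0.05, "score": 0.69},
]

best = best_among(scored, ["pc", "fci"])
```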

get_best_model_between_family(causal_sufficiency=None, assume_faithfulness=None, is_output_mec=None, accepts_missing_values=None)[source]

Gets the best model within a family of algorithms based on specified criteria.

Parameters:

causal_sufficiency (bool, optional) – Filter algorithms by whether they assume causal sufficiency, that is, the absence of latent confounders.
assume_faithfulness (bool, optional) – Filter algorithms based on faithfulness assumption.
is_output_mec (bool, optional) – Filter algorithms that output MEC graphs.
accepts_missing_values (bool, optional) – Filter algorithms that accept missing values.

Returns:

The best configuration among the filtered algorithms.

Return type:

dict
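
One way to read the four optional flags is that None means "do not filter on this property". The sketch below illustrates that reading only; the algorithm names, their property values, and filter_family are all hypothetical.

```python
# Hypothetical property table; names and values are invented for illustration.
TRAITS = {
    "algo_a": {"causal_sufficiency": True, "assume_faithfulness": True,
               "is_output_mec": True, "accepts_missing_values": False},
    "algo_b": {"causal_sufficiency": False, "assume_faithfulness": True,
               "is_output_mec": True, "accepts_missing_values": False},
    "algo_c": {"causal_sufficiency": True, "assume_faithfulness": False,
               "is_output_mec": False, "accepts_missing_values": True},
}


def filter_family(traits, causal_sufficiency=None, assume_faithfulness=None,
                  is_output_mec=None, accepts_missing_values=None):
    """Return the algorithm names matching every criterion that is not None."""
    criteria = {
        "causal_sufficiency": causal_sufficiency,
        "assume_faithfulness": assume_faithfulness,
        "is_output_mec": is_output_mec,
        "accepts_missing_values": accepts_missing_values,
    }
    # Assumption: None leaves the corresponding property unconstrained.
    active = {key: value for key, value in criteria.items() if value is not None}
    return [
        name for name, flags in traits.items()
        if all(flags[key] == value for key, value in active.items())
    ]


everything = filter_family(TRAITS)
sufficient_mec = filter_family(TRAITS, causal_sufficiency=True, is_output_mec=True)
```

The best configuration would then be chosen among the algorithms that survive the filter.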

class Configurations(dataset: Dataset | None = None, n_lags: int = 0, time_lagged: bool = False, time_series: bool = False, conf_file: str | None = None, n_jobs: int | None = -1, verbose=False)[source]

Bases: object

Configurations class for setting up the causal discovery experiment.

Parameters:

dataset (Dataset, optional) – The dataset object. Default is None.
n_lags (int, optional) – Number of lags (for time series).
time_lagged (bool, optional) – Indicates if the data is time-lagged.
time_series (bool, optional) – Indicates if the dataset is time series data.
conf_file (str, optional) – JSON configuration file containing parameters for the causal discovery experiment.
n_jobs (int, optional) – Number of jobs to use for parallel processing.
verbose (bool, optional) – Whether to print debug information.

cdhpo_params

Parameters for the CDHPO algorithm.

Type: CDHPOParameters

results_folder

Folder path for storing results.

Type: str

set_default_configuration()[source]: Set default configurations based on the dataset when no JSON configuration file is provided.

process_conf_file()[source]: Process the JSON file containing all vital information, such as algorithms, algorithm parameters, run mode, etc.

add_configurations_from_file(filename: str) → None[source]

Add additional configurations to the experiment from a JSON file.

Parameters: filename (str) – The filename of the JSON file containing configurations.
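
A rough sketch of what adding configurations from a JSON file could look like. The JSON layout used here (a top-level "algorithms" object mapping algorithm names to parameter grids) is purely an assumption for illustration and is not the documented ETIA schema; merge_configurations is likewise a hypothetical helper.

```python
import json
import os
import tempfile


def merge_configurations(existing, filename):
    """Return a copy of `existing` extended with the entries found in `filename`."""
    with open(filename, "r", encoding="utf-8") as handle:
        extra = json.load(handle)
    # Copy so the caller's dictionary is left untouched.
    merged = {name: dict(params) for name, params in existing.items()}
    # Assumed layout: {"algorithms": {"<name>": {"<param>": [values]}}}
    for name, params in extra.get("algorithms", {}).items():
        merged.setdefault(name, {}).update(params)
    return merged


current = {"pc": {"alpha": [0.01, 0.05]}}

with tempfile.TemporaryDirectory() as folder:
    conf_path = os.path.join(folder, "extra.json")
    with open(conf_path, "w", encoding="utf-8") as handle:
        json.dump(
            {"algorithms": {"pc": {"alpha": [0.1]}, "fci": {"alpha": [0.05]}}},
            handle,
        )
    merged = merge_configurations(current, conf_path)
```

In this sketch a parameter that appears in both places is overwritten by the file's value; whether ETIA overwrites or extends existing entries is not stated in this reference.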

class MVP_ProtocolBase[source]

Bases: object

A base class for running protocols for causal discovery algorithms.

This class provides the foundation for implementing various protocols to evaluate causal discovery algorithms. Derived classes should implement specific protocols (e.g., KFoldCV, Holdout). This class should not be instantiated directly.

set_params(parameters)[source]: Sets the parameters for the protocol.

run_protocol(data, algorithm, parameters, n_jobs=1)[source]: Runs the protocol and returns the results in array format.

set_params(parameters)[source]

Sets the parameters of the protocol.

Parameters: parameters (dict) – A dictionary containing the protocol-specific parameters to set. Each key corresponds to a parameter name and its value defines the parameter’s value.
Return type: None

run_protocol(data, algorithm, parameters, n_jobs=1)[source]

Runs the protocol using the specified causal discovery algorithm and dataset.

Parameters:

data (Any) – The dataset on which to run the causal discovery algorithm. Can be in various formats (e.g., pandas DataFrame).
algorithm (Any) – The causal discovery algorithm to evaluate within the protocol.
parameters (dict) – A dictionary of parameters for both the protocol and the algorithm.
n_jobs (int, optional) – The number of parallel jobs to run during the evaluation. Default is 1.

Returns:

The results of the protocol in array format, which may vary based on the specific implementation.

Return type:

Any
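
The base-class contract (do not instantiate directly, override run_protocol in a derived class) can be sketched with Python's abc module. ProtocolBase and Holdout below are hypothetical stand-ins; whether MVP_ProtocolBase itself uses abc is not stated in this reference, and the algorithm is modelled as a plain callable for simplicity.

```python
from abc import ABC, abstractmethod


class ProtocolBase(ABC):
    """Hypothetical stand-in for MVP_ProtocolBase."""

    def set_params(self, parameters):
        # Store every protocol parameter as an attribute of the instance.
        for name, value in parameters.items():
            setattr(self, name, value)

    @abstractmethod
    def run_protocol(self, data, algorithm, parameters, n_jobs=1):
        """Run the protocol and return its results."""


class Holdout(ProtocolBase):
    """Toy holdout protocol: run the algorithm on the training part only."""

    train_fraction = 0.8

    def run_protocol(self, data, algorithm, parameters, n_jobs=1):
        cut = int(len(data) * self.train_fraction)
        return [algorithm(data[:cut], parameters)]


protocol = Holdout()
protocol.set_params({"train_fraction": 0.5})
# The "algorithm" here just counts the rows it receives.
results = protocol.run_protocol(list(range(10)), lambda rows, params: len(rows), {})
```

With abc, attempting ProtocolBase() raises TypeError, which enforces the "should not be instantiated directly" rule at runtime.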

class KFoldCV[source]

Bases: MVP_ProtocolBase

Class implementing a K-Fold Cross-Validation protocol for running a causal discovery algorithm.

folds

Number of folds to be used in the cross-validation. Default is 10.

Type: int

folds_to_run

Number of folds to run the cross-validation for. Default is 1.

Type: int

train_indexes

A list of indexes for the training samples.

Type: list of int

test_indexes

A list of indexes for the test samples.

Type: list of int

data_train

A list of training data samples for each fold.

Type: list of pd.DataFrame

data_test

A list of test data samples for each fold.

Type: list of pd.DataFrame

set_params(parameters, verbose=False)[source]: Set the number of folds and the number of folds to run the protocol for.

run_cd_algorithm(data, algorithm, parameters, fold)[source]: Run the causal discovery algorithm on the specified fold.

init_protocol(data)[source]: Initialize the K-Fold protocol.

run_protocol(data, algorithm, parameters, n_jobs=1)[source]: Run the K-Fold cross-validation protocol.

set_params(parameters, verbose=False)[source]

Set the number of folds and the number of folds to run the protocol for.

Parameters:

parameters (dict) – A dictionary of parameters, including the number of folds and the number of folds to run.
verbose (bool, optional) – If True, enables detailed logging. Default is False.

run_cd_algorithm(data, algorithm, parameters, fold)[source]

Run the causal discovery algorithm on the specified fold.

Parameters:

data (pd.DataFrame) – The dataset on which to run the causal discovery algorithm.
algorithm (object) – The causal discovery algorithm to be used.
parameters (dict) – A dictionary of parameters to pass to the algorithm.
fold (int) – The current fold number for which to run the algorithm.

Returns:

A list containing the MEC graph and library results produced by the causal discovery algorithm.

Return type:

list of np.ndarray

init_protocol(data)[source]

Initialize the K-Fold protocol by splitting the data into training and test sets for each fold.

Parameters: data (pd.DataFrame) – The dataset to be used for the cross-validation.
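
The split into per-fold training and test indexes can be sketched in plain Python. kfold_indexes is a hypothetical helper; it uses contiguous, unshuffled folds, which is an assumption, since this reference does not say whether KFoldCV shuffles the rows.

```python
def kfold_indexes(n_samples, folds):
    """Return (train_indexes, test_indexes), each a list with one entry per fold."""
    if not 2 <= folds <= n_samples:
        raise ValueError("folds must be between 2 and n_samples")
    indexes = list(range(n_samples))
    base, extra = divmod(n_samples, folds)
    train_indexes, test_indexes = [], []
    start = 0
    for fold in range(folds):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        test_indexes.append(indexes[start:start + size])
        train_indexes.append(indexes[:start] + indexes[start + size:])
        start += size
    return train_indexes, test_indexes


train_indexes, test_indexes = kfold_indexes(n_samples=10, folds=3)
```

Every sample appears in exactly one test fold, and the training indexes of a fold are all the remaining samples.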

run_protocol(data, algorithm, parameters, n_jobs=1)[source]

Run the K-Fold cross-validation protocol with the specified causal discovery algorithm.

Parameters:

data (pd.DataFrame) – The dataset on which to run the algorithm.
algorithm (object) – The causal discovery algorithm to use.
parameters (dict) – A dictionary of parameters to be passed to the algorithm.
n_jobs (int, optional) – The number of CPU cores to use for parallel computation. Default is 1.

Returns:

A list containing the results of the protocol, with the MEC graphs and other results.

Return type:

list of np.ndarray
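
The interplay between folds_to_run and n_jobs can be pictured as running the algorithm on the first few training folds in parallel. The sketch below is an assumption about the general shape, not the ETIA implementation: run_folds is hypothetical, the algorithm is a plain callable, and threads stand in for whatever parallel backend the library actually uses.

```python
from concurrent.futures import ThreadPoolExecutor


def run_folds(train_sets, algorithm, parameters, folds_to_run=1, n_jobs=1):
    """Run `algorithm` on the first `folds_to_run` training sets, `n_jobs` at a time."""
    selected = train_sets[:folds_to_run]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # pool.map preserves the order of the input folds.
        return list(pool.map(lambda train: algorithm(train, parameters), selected))


# Toy "algorithm": scale the sum of the training values.
def toy_algorithm(train, parameters):
    return sum(train) * parameters["scale"]


train_sets = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
fold_results = run_folds(train_sets, toy_algorithm, {"scale": 2}, folds_to_run=2, n_jobs=2)
```

With the documented defaults (folds set to 10, folds_to_run set to 1), only a single fold would be evaluated, which keeps each configuration cheap to score.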

class RandomForestRegressor_[source]

Bases: object

Wrapper class for setting up a RandomForestRegressor model with custom parameters.

set_regressor_params(parameters)[source]: Configures and returns a RandomForestRegressor object with the specified parameters.

set_regressor_params(parameters)[source]

Configures and returns a RandomForestRegressor object with the specified parameters.

Parameters:

parameters (dict) – A dictionary containing the following keys:

'n_trees': int, the number of trees in the forest.
'min_samples_leaf': int or float, the minimum number of samples required to be at a leaf node.
'max_depth': int, the maximum depth of each tree.

Returns:

A RandomForestRegressor object configured with the specified parameters.

Return type:

RandomForestRegressor

Examples

>>> params = {'n_trees': 100, 'min_samples_leaf': 0.1, 'max_depth': 10}
>>> regressor = RandomForestRegressor_().set_regressor_params(params)
>>> print(regressor)
RandomForestRegressor(max_depth=10, min_samples_leaf=0.1)
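
The wrapper's keys differ slightly from scikit-learn's keyword names. The sketch below shows one plausible mapping, assuming 'n_trees' corresponds to scikit-learn's n_estimators; to_sklearn_kwargs is a hypothetical helper that builds only the keyword dictionary, so it runs without scikit-learn installed.

```python
def to_sklearn_kwargs(parameters):
    """Translate the wrapper's parameter names into RandomForestRegressor keywords."""
    return {
        # Assumption: 'n_trees' maps onto scikit-learn's n_estimators.
        "n_estimators": parameters["n_trees"],
        "min_samples_leaf": parameters["min_samples_leaf"],
        "max_depth": parameters["max_depth"],
    }


params = {"n_trees": 100, "min_samples_leaf": 0.1, "max_depth": 10}
kwargs = to_sklearn_kwargs(params)
# With scikit-learn available, these could be passed as RandomForestRegressor(**kwargs).
```

Under that mapping, the printed output in the example above omits the number of trees because recent scikit-learn versions print only non-default parameters, and 100 is the default for n_estimators.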

class LinearRegression_[source]

Bases: object

Wrapper class for setting up a LinearRegression model with custom parameters.

set_regressor_params(parameters)[source]: Configures and returns a LinearRegression object. Currently, LinearRegression does not require parameters in this method.

set_regressor_params(parameters)[source]

Configures and returns a LinearRegression object.

Parameters: parameters (dict) – A dictionary of model parameters. It is currently ignored, because this implementation creates the LinearRegression object with default settings.
Returns: A LinearRegression object configured with default parameters.
Return type: LinearRegression

Examples

>>> params = {}
>>> regressor = LinearRegression_().set_regressor_params(params)
>>> print(regressor)
LinearRegression()

Subpackages

Submodules

ETIA.CausalLearning.CausalLearner module
