ETIA.AFS package

class AFS(depth: int = 1, verbose: bool = False, num_processors: int | None = None, oos_protocol: Dict[str, Any] | None = None, random_seed: int | None = None)

Bases: object

Automated Feature Selection (AFS) class.

Parameters:

depth (int, optional) – The maximum recursion depth of the feature selection process (see recursive_fs_for_target). Default is 1.
verbose (bool, optional) – If True, prints detailed logs. Default is False.
num_processors (int, optional) – Number of processors to use for parallel processing. Default is the number of CPU cores.
oos_protocol (dict, optional) – A dictionary specifying the out-of-sample protocol. Default is 5-fold cross-validation.
random_seed (int, optional) – Seed for random number generator to ensure reproducibility. Default is None.

run_AFS(data, target_features, pred_configs=None, dataset_name='dataset')

Runs the AFS process on the provided data and target features.

Parameters:

data (str or pd.DataFrame or np.ndarray) – The dataset to use. Can be a filename (str), a pandas DataFrame, or a NumPy array.
target_features (Union[Dict[str, str], List[str]]) – A dictionary mapping feature names to their types, or a list of feature names (in which case the types are inferred).
pred_configs (Union[List[Dict[str, Any]], float], optional) –
- If a list, the predictive configurations provided by the user.
- If a float (between 0 and 1), the fraction of default configurations to sample and run.
- If None, all default configurations are used.
dataset_name (str, optional) – The name of the dataset (used for saving intermediate files). Default is ‘dataset’.

Returns:

A dictionary containing:
- ‘original_data’: The original dataset.
- ‘reduced_data’: The dataset with only the selected features and target features.
- ‘best_config’: The configuration that led to the best feature selection.
- ‘selected_features’: The selected features for each target.

Return type:

dict

Examples

To run feature selection on a dataset:

>>> afs = AFS()
>>> result = afs.run_AFS(data="data.csv", target_features=["feature1", "feature2"])
>>> print(result["selected_features"])
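The three accepted forms of pred_configs can be illustrated with a small stand-alone sketch. The helper name resolve_pred_configs, the default_configs argument, and the rounding rule are assumptions made for illustration; this reference does not describe how ETIA resolves the argument internally.

```python
import random
from typing import Any, Dict, List, Optional, Union

Config = Dict[str, Any]


def resolve_pred_configs(
    pred_configs: Union[List[Config], float, None],
    default_configs: List[Config],
    random_seed: Optional[int] = None,
) -> List[Config]:
    """Hypothetical helper mirroring the three documented forms of pred_configs."""
    if pred_configs is None:
        # None: run every default configuration.
        return list(default_configs)
    if isinstance(pred_configs, list):
        # List: the user supplied the configurations directly.
        return pred_configs
    if isinstance(pred_configs, float):
        if not 0 < pred_configs <= 1:
            raise ValueError("a float pred_configs must lie between 0 and 1")
        # Assumption: round to the nearest count and keep at least one configuration.
        k = min(len(default_configs), max(1, round(pred_configs * len(default_configs))))
        return random.Random(random_seed).sample(default_configs, k)
    raise TypeError("pred_configs must be a list, a float, or None")


# Placeholder defaults; real configurations would name algorithms and hyperparameters.
defaults = [{"id": i} for i in range(10)]
subset = resolve_pred_configs(0.3, defaults, random_seed=0)
print(len(subset))
```

Passing a seed makes the sampled subset repeatable, which matches the purpose of the random_seed constructor parameter.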

recursive_fs_for_target(data: DataFrame, target_feature: str, target_type: str, pred_configs: List[Dict[str, Any]], dataset_name: str, depth: int, visited_features: set | None = None) → Dict[str, Any]

Recursively runs feature selection for a specific target feature up to the specified depth.

run_fs_for_config(data: DataFrame, target_feature: str, target_type: str, config: Dict[str, Any], dataset_name: str, train_inds: List[ndarray], test_inds: List[ndarray], feature_columns: List[str]) → Tuple[List[float], List[Tuple[ndarray, ndarray, Dict[str, Any], Any, Preprocessor | None]], DataFrame]

Runs the feature selection process for a specific configuration.

bootstrap_bias_correction(fold_predictions: List[Tuple[ndarray, ndarray]], target_type: str, B: int = 1000, conf_interval: float = 0.95) → float

Applies bootstrap bias correction to the fold predictions.
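This reference does not spell out the correction itself, so the following is only a sketch of the resampling step that bootstrap methods of this kind build on: pooling out-of-sample predictions from all folds and averaging a score over B resamples. The function name bootstrap_score, the (predictions, true values) ordering of each tuple, and the use of accuracy as the metric are assumptions, not ETIA's implementation.

```python
import random
from statistics import mean
from typing import List, Optional, Sequence, Tuple


def bootstrap_score(
    fold_predictions: List[Tuple[Sequence, Sequence]],
    B: int = 1000,
    seed: Optional[int] = None,
) -> float:
    """Average accuracy over B bootstrap resamples of the pooled predictions."""
    # Pool (prediction, truth) pairs across folds. Assumption: predictions come first.
    pairs = [(p, t) for preds, truths in fold_predictions for p, t in zip(preds, truths)]
    rng = random.Random(seed)
    n = len(pairs)
    scores = []
    for _ in range(B):
        # Draw n pairs with replacement and score the resample.
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        scores.append(mean(1.0 if p == t else 0.0 for p, t in sample))
    return mean(scores)


# Two folds of categorical predictions; 4 of the 5 pooled predictions are correct.
folds = [([1, 0, 1], [1, 0, 0]), ([0, 1], [0, 1])]
estimate = bootstrap_score(folds, B=1000, seed=0)
print(round(estimate, 2))
```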

class FeatureSelector(r_path: str)

Bases: object

Feature selection with the MXM R package.

run_r_script(script_path: str, data_file_path: str, target_name: str, config: Dict[str, Any], output_file: str, train_idx_name: str | None = None, verbose: bool = False) → DataFrame

Runs the specified R script for feature selection.

fbed(target_name: str, config: Dict[str, Any], data_file_path: str, output_file: str, train_idx_name: str | None = None, verbose: bool = False) → DataFrame

Runs the FBED feature selection algorithm.

ses(target_name: str, config: Dict[str, Any], data_file_path: str, output_file: str, train_idx_name: str | None = None, verbose: bool = False) → DataFrame

Runs the SES feature selection algorithm.

feature_selection(config: Dict[str, Any], target_name: str, data_pd: DataFrame, dataset_name: str, train_idx_name: str | None = None, verbose: bool = False) → DataFrame

Runs the feature selection process based on the provided configuration.

class OOS

Bases: object

Out-of-sample protocols for data splitting.

data_split(oos_protocol: Dict[str, Any], X: Any, y: Any | None = None, target_type: str = 'continuous') → Tuple[List[ndarray], List[ndarray]]

Splits the data according to the specified out-of-sample protocol.

Parameters:

oos_protocol (dict) – A dictionary that specifies the out-of-sample protocol. The ‘name’ key specifies the type of protocol (e.g., ‘KFoldCV’, ‘Holdout’). The ‘folds’ key gives the number of folds for cross-validation, and the ‘test_size’ key gives the test size for holdout.
X (array-like) – The feature data (input variables).
y (array-like, optional) – The target vector (output variables). Required for stratified splits.
target_type (str, optional) – Indicates whether the target is ‘continuous’ or ‘categorical’. Default is ‘continuous’.

Returns:

train_inds (list of np.ndarray) – A list containing the training indices for each fold or holdout split.
test_inds (list of np.ndarray) – A list containing the testing indices for each fold or holdout split.

Raises:

ValueError – If an unsupported protocol name is provided.
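The shape of the protocol dictionary and of the returned index lists can be sketched in plain Python. The function split_indices below is a hypothetical stand-in, not the library's code: it returns lists of integers rather than NumPy arrays, treats ‘test_size’ as a fraction, and ignores stratification.

```python
import random
from typing import Any, Dict, List, Optional, Tuple


def split_indices(
    oos_protocol: Dict[str, Any],
    n_samples: int,
    random_seed: Optional[int] = None,
) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (train_inds, test_inds), one entry per fold or holdout split."""
    indices = list(range(n_samples))
    random.Random(random_seed).shuffle(indices)
    name = oos_protocol.get("name")
    if name == "KFoldCV":
        k = oos_protocol["folds"]
        # Deal the shuffled indices into k disjoint test folds.
        test_inds = [indices[i::k] for i in range(k)]
        # Each training set is the union of the other folds.
        train_inds = [
            [i for j, fold in enumerate(test_inds) if j != f for i in fold]
            for f in range(k)
        ]
    elif name == "Holdout":
        # Assumption: test_size is the fraction of samples held out.
        n_test = max(1, round(oos_protocol["test_size"] * n_samples))
        test_inds = [indices[:n_test]]
        train_inds = [indices[n_test:]]
    else:
        raise ValueError(f"Unsupported out-of-sample protocol: {name!r}")
    return train_inds, test_inds


train_inds, test_inds = split_indices({"name": "KFoldCV", "folds": 5}, 10, random_seed=0)
print(len(train_inds), len(test_inds[0]))
```

Both protocols return lists, so downstream code can loop over splits the same way whether there are five folds or a single holdout split.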

class PredictiveConfigurator

Bases: object

Reads the available predictive learning, feature selection, and preprocessing algorithms from JSON files and creates the predictive configurations.

Attributes:

path (str) – The path to the directory containing the JSON configuration files.
pred_algs (dict) – Dictionary containing the available predictive algorithms and their configurations.
fs_algs (dict) – Dictionary containing the available feature selection algorithms and their configurations.
preprocess_algs (dict) – Dictionary containing the available preprocessing algorithms and their configurations.

create_predictive_configs()[source]: Creates a list of all possible predictive configurations by combining available algorithms.

create_predictive_configs() → List[Dict[str, Any]][source]

Creates a list of predictive configurations by combining available algorithms and their options.

It reads configurations from the loaded JSON files for predictive models, feature selection methods, and preprocessing algorithms, and combines them to create all possible configurations.

Returns:

A list of dictionaries, where each dictionary is a unique combination of a predictive model, feature selection algorithm, and preprocessing method.

Return type:

List[Dict[str, Any]]
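Combining algorithms and their options into every possible configuration amounts to a Cartesian product, sketched below. The dictionary layout, the ‘alpha’ option, and the keys ‘model’, ‘fs’, and ‘preprocess’ are assumptions for illustration; the JSON schema ETIA actually reads is not shown in this reference.

```python
from itertools import product
from typing import Any, Dict, List

# Hypothetical option grids keyed by algorithm name.
pred_algs = {"random_forest": {"n_estimators": [100, 500]}, "linear_regression": {}}
fs_algs = {"fbed": {"alpha": [0.05, 0.01]}, "ses": {"alpha": [0.05]}}
preprocess_algs = {"standard": {}}


def expand(algs: Dict[str, Dict[str, List[Any]]]) -> List[Dict[str, Any]]:
    """Expand each algorithm's option grid into concrete settings."""
    settings = []
    for name, grid in algs.items():
        keys = list(grid)
        # product() over zero option lists yields one empty tuple,
        # so an algorithm without options still contributes one setting.
        for values in product(*(grid[k] for k in keys)):
            settings.append({"name": name, **dict(zip(keys, values))})
    return settings


def create_configs(pred, fs, prep) -> List[Dict[str, Any]]:
    """Combine every model setting with every selector and preprocessing setting."""
    return [
        {"model": m, "fs": f, "preprocess": p}
        for m, f, p in product(expand(pred), expand(fs), expand(prep))
    ]


configs = create_configs(pred_algs, fs_algs, preprocess_algs)
print(len(configs))
```

The number of configurations grows multiplicatively with each added option, which is one reason run_AFS accepts a float to sample only part of the defaults.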

class PredictiveModel

Bases: object

A class for creating and training predictive models.

random_forest(config: Dict[str, Any], target_type: str)

Creates a Random Forest model based on the configuration and target type.

Parameters:

config (dict) – Configuration settings for the Random Forest model, including hyperparameters like n_estimators, min_samples_leaf, and max_features.
target_type (str) – The type of the target variable (‘categorical’ for classification, ‘continuous’ for regression).

Returns:

model – The initialized Random Forest model.

Return type:

RandomForestClassifier or RandomForestRegressor

linear_regression()

Creates a Linear Regression model.

Returns:

model – The initialized Linear Regression model.

Return type:

LinearRegression

fit(config: Dict[str, Any], train_X: Any, train_y: Any, selected_features: Any, preprocessor: Any | None, target_type: str)

Fits the model to the training data using the specified configuration.

Parameters:

config (dict) – Configuration settings for the model, including the type of model (‘random_forest’ or ‘linear_regression’).
train_X (array-like) – Training data for the input variables.
train_y (array-like) – Training data for the target variable.
selected_features (any) – The features selected for model training.
preprocessor (object, optional) – A preprocessor object that can be used to transform the input data. Default is None.
target_type (str) – The type of the target variable (‘categorical’ or ‘continuous’).

Raises:

ValueError – If an unsupported model type is specified in the configuration.

predict(X: Any) → Any

Makes predictions using the trained model.

Parameters:

X (array-like) – The input data for which predictions are to be made.

Returns:

predictions – The predicted values based on the input data.

Return type:

array-like

class Preprocessor(method: str = 'standard')

Bases: object

Preprocessor class for data preprocessing.

fit_transform(data: Any) → Any

Fits the preprocessor to the data and transforms it.

transform(data: Any) → Any

Transforms the data using the fitted preprocessor.
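The difference between the two methods matters for out-of-sample evaluation: statistics are learned once from the training data and then reused on held-out data. The sketch below assumes that the ‘standard’ method means zero-mean, unit-variance scaling; the class StandardScalerSketch is a hypothetical stand-in, not ETIA's Preprocessor.

```python
from statistics import mean, pstdev
from typing import List, Sequence


class StandardScalerSketch:
    """Column-wise standardization with a fit_transform/transform split."""

    def fit_transform(self, data: Sequence[Sequence[float]]) -> List[List[float]]:
        columns = list(zip(*data))
        # Learn per-column statistics from the data passed here only.
        self.means_ = [mean(col) for col in columns]
        # Constant columns have zero spread; divide by 1.0 to leave them centered.
        self.scales_ = [pstdev(col) or 1.0 for col in columns]
        return self.transform(data)

    def transform(self, data: Sequence[Sequence[float]]) -> List[List[float]]:
        # Reuse the stored statistics; nothing is re-estimated.
        return [
            [(x - m) / s for x, m, s in zip(row, self.means_, self.scales_)]
            for row in data
        ]


train = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
test = [[2.0, 12.0]]

scaler = StandardScalerSketch()
train_scaled = scaler.fit_transform(train)  # fit on training rows only
test_scaled = scaler.transform(test)        # apply training statistics to test rows
print(test_scaled)
```

Calling fit_transform on the test rows instead would leak test-set statistics into the evaluation.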

Submodules

ETIA.AFS.AFS module

Contains the AFS class, documented above under the ETIA.AFS package.

ETIA.AFS.feature_selector module

Contains the FeatureSelector class, documented above under the ETIA.AFS package.

ETIA.AFS.oos module

Contains the OOS class, documented above under the ETIA.AFS package.

ETIA.AFS.predictive_configurator module

Contains the PredictiveConfigurator class, documented above under the ETIA.AFS package.

ETIA.AFS.predictive_model module

Contains the PredictiveModel class, documented above under the ETIA.AFS package.

ETIA.AFS.preprocessor module

Contains the Preprocessor class, documented above under the ETIA.AFS package.