ETIA.AFS package
- class AFS(depth: int = 1, verbose: bool = False, num_processors: int | None = None, oos_protocol: Dict[str, Any] | None = None, random_seed: int | None = None)
Bases: object
Automated Feature Selection (AFS) class.
- Parameters:
depth (int, optional) – The depth of the recursive feature selection process (see recursive_fs_for_target). Default is 1.
verbose (bool, optional) – If True, prints detailed logs. Default is False.
num_processors (int, optional) – Number of processors to use for parallel processing. Default is None, which uses all available CPU cores.
oos_protocol (dict, optional) – A dictionary specifying the out-of-sample protocol (see OOS.data_split). Default is None, which uses 5-fold cross-validation.
random_seed (int, optional) – Seed for random number generator to ensure reproducibility. Default is None.
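The documented defaults for num_processors and oos_protocol can be made concrete with a small sketch. It is not ETIA's code: the helper name resolve_afs_defaults is hypothetical, and the dictionary {'name': 'KFoldCV', 'folds': 5} is an assumed encoding of 5-fold cross-validation, based on the keys documented for OOS.data_split.

```python
import os
from typing import Any, Dict, Optional


def resolve_afs_defaults(
    num_processors: Optional[int] = None,
    oos_protocol: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper that fills in the documented AFS defaults."""
    if num_processors is None:
        # Use every CPU core; os.cpu_count() can return None, so fall back to 1.
        num_processors = os.cpu_count() or 1
    if oos_protocol is None:
        # Assumed encoding of the default 5-fold cross-validation.
        oos_protocol = {"name": "KFoldCV", "folds": 5}
    return {"num_processors": num_processors, "oos_protocol": oos_protocol}


print(resolve_afs_defaults()["oos_protocol"])  # {'name': 'KFoldCV', 'folds': 5}
```

Under the same assumption, a holdout protocol would be passed as {'name': 'Holdout', 'test_size': 0.2}.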
- run_AFS(data: str | DataFrame | ndarray, target_features: List[str] | Dict[str, str], pred_configs: List[Dict[str, Any]] | float | None = None, dataset_name: str = 'dataset') → Dict[str, Any]
Runs the AFS process on the provided data and target features.
- Parameters:
data (str or pd.DataFrame or np.ndarray) – The dataset to use. Can be a filename (str), a pandas DataFrame, or a NumPy array.
target_features (Union[Dict[str, str], List[str]]) – A dictionary mapping feature names to their types, or a list of feature names (in which case the types are inferred).
pred_configs (Union[List[Dict[str, Any]], float], optional) –
If a list, it contains the predictive configurations provided by the user.
If a float between 0 and 1, it gives the fraction of the default configurations to sample and run.
If None, all default configurations are used.
dataset_name (str, optional) – The name of the dataset (used for saving intermediate files). Default is ‘dataset’.
- Returns:
A dictionary containing:
- ‘original_data’: the original dataset.
- ‘reduced_data’: the dataset restricted to the selected features and the target features.
- ‘best_config’: the configuration that led to the best feature selection.
- ‘selected_features’: the selected features for each target.
- Return type:
dict
Examples
To run feature selection on a dataset:
>>> afs = AFS()
>>> result = afs.run_AFS(data="data.csv", target_features=["feature1", "feature2"])
>>> print(result["selected_features"])
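The relationship between ‘selected_features’ and ‘reduced_data’ can be illustrated with pandas. This sketch assumes that ‘selected_features’ maps each target name to a list of column names and that the reduced dataset keeps the union of those columns plus the targets, in their original column order; the helper build_reduced_data and the toy data are hypothetical.

```python
from typing import Dict, List

import pandas as pd


def build_reduced_data(
    data: pd.DataFrame, selected_features: Dict[str, List[str]]
) -> pd.DataFrame:
    """Keep the target columns plus every feature selected for any target."""
    keep = set(selected_features)  # the targets themselves (dict keys)
    for features in selected_features.values():
        keep.update(features)
    # Preserve the original column order of the dataset.
    return data[[col for col in data.columns if col in keep]]


data = pd.DataFrame({
    "x1": [0.1, 0.2, 0.3],
    "x2": [1.0, 0.0, 1.0],
    "x3": [5.0, 6.0, 7.0],
    "feature1": [0.0, 1.0, 0.0],
})
reduced = build_reduced_data(data, {"feature1": ["x1", "x3"]})
print(list(reduced.columns))  # ['x1', 'x3', 'feature1']
```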
- recursive_fs_for_target(data: DataFrame, target_feature: str, target_type: str, pred_configs: List[Dict[str, Any]], dataset_name: str, depth: int, visited_features: set | None = None) → Dict[str, Any]
Recursively runs feature selection for a specific target feature up to the specified depth.
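One plausible reading of this recursion, sketched in plain Python: select features for the target, then treat each newly selected feature as a target one level deeper, stopping at depth and skipping features already visited. The callable select_for stands in for a real feature-selection run and, like the toy dependency map, is hypothetical; ETIA's own recursion may differ in detail.

```python
from typing import Callable, Dict, List, Optional, Set


def recursive_fs(
    target: str,
    select_for: Callable[[str], List[str]],
    depth: int,
    visited: Optional[Set[str]] = None,
) -> Dict[str, List[str]]:
    """Return {target: selected features} for every target explored up to depth."""
    if visited is None:
        visited = set()
    visited.add(target)
    if depth <= 0:
        return {}
    selected = select_for(target)
    result = {target: selected}
    for feature in selected:
        if feature not in visited:
            # Each newly selected feature becomes a target one level deeper.
            result.update(recursive_fs(feature, select_for, depth - 1, visited))
    return result


# Hypothetical "selection results" for each variable.
toy = {"y": ["a", "b"], "a": ["c"], "b": ["a", "d"], "c": [], "d": []}
print(recursive_fs("y", lambda t: toy[t], depth=1))  # {'y': ['a', 'b']}
print(recursive_fs("y", lambda t: toy[t], depth=2))  # {'y': ['a', 'b'], 'a': ['c'], 'b': ['a', 'd']}
```

The visited set is what keeps the search from re-running selection for a feature reached along two paths (here "a", selected for both "y" and "b").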
- run_fs_for_config(data: DataFrame, target_feature: str, target_type: str, config: Dict[str, Any], dataset_name: str, train_inds: List[ndarray], test_inds: List[ndarray], feature_columns: List[str]) → Tuple[List[float], List[Tuple[ndarray, ndarray, Dict[str, Any], Any, Preprocessor | None]], DataFrame]
Runs the feature selection process for a specific configuration.
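run_fs_for_config receives precomputed training and test indices, and its first return value is a list of floats, presumably one score per split. The per-split pattern can be sketched with scikit-learn under stated substitutions: a hypothetical top-k correlation selector stands in for the configured feature-selection algorithm, LinearRegression stands in for the configured model, and R² is an assumed score. What the sketch shows is that feature selection runs on the training rows of each split only, so the test rows never influence which features are chosen.

```python
from typing import Callable, List, Sequence

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score


def evaluate_config(
    X: np.ndarray,
    y: np.ndarray,
    train_inds: Sequence[np.ndarray],
    test_inds: Sequence[np.ndarray],
    select: Callable[[np.ndarray, np.ndarray], List[int]],
) -> List[float]:
    """Select and fit on each training split, then score on its test split."""
    scores = []
    for train, test in zip(train_inds, test_inds):
        cols = select(X[train], y[train])  # selection sees training rows only
        model = LinearRegression().fit(X[train][:, cols], y[train])
        scores.append(r2_score(y[test], model.predict(X[test][:, cols])))
    return scores


def top_k_by_correlation(X: np.ndarray, y: np.ndarray, k: int = 2) -> List[int]:
    """Hypothetical stand-in selector: the k columns most correlated with y."""
    corr = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    return sorted(np.argsort(corr)[::-1][:k].tolist())


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.1, size=100)
folds = np.array_split(rng.permutation(100), 5)
train_inds = [np.concatenate(folds[:i] + folds[i + 1:]) for i in range(5)]
test_inds = folds
scores = evaluate_config(X, y, train_inds, test_inds, top_k_by_correlation)
print(scores)
```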
- class FeatureSelector(r_path: str)
Bases: object
Feature selection with the MXM R package.
- feature_selection(config, target_name, data_pd, dataset_name, train_idx_name=None, verbose=False)
Runs the feature selection process based on the provided configuration.
- run_r_script(script_path: str, data_file_path: str, target_name: str, config: Dict[str, Any], output_file: str, train_idx_name: str | None = None, verbose: bool = False) → DataFrame
Runs the specified R script for feature selection.
- fbed(target_name: str, config: Dict[str, Any], data_file_path: str, output_file: str, train_idx_name: str | None = None, verbose: bool = False) → DataFrame
Runs the FBED feature selection algorithm.
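FBED (Forward-Backward selection with Early Dropping) is run here through an R script, presumably backed by the MXM package named above. To illustrate what the algorithm does, here is a minimal NumPy sketch for a continuous target that uses a Fisher z test on partial correlations as the conditional-independence test. It does not call R and is not the MXM implementation; the function names and the alpha and k parameters are assumptions of this sketch. The forward phase adds the most significant candidate at each step and drops candidates that are not significant given the current selection (early dropping); k extra runs revisit the dropped candidates; the backward phase then removes features that are no longer significant given the rest.

```python
import math
from typing import List

import numpy as np


def fisher_z_pvalue(X: np.ndarray, y: np.ndarray, j: int, cond: List[int]) -> float:
    """Two-sided p-value for the partial correlation of X[:, j] and y given X[:, cond]."""
    n = len(y)
    Z = np.column_stack([np.ones(n)] + [X[:, c] for c in cond])
    # Residualize both variables on the conditioning set plus an intercept.
    rx = X[:, j] - Z @ np.linalg.lstsq(Z, X[:, j], rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    r = float(np.corrcoef(rx, ry)[0, 1])
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep the log finite
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - len(cond) - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def fbed(X: np.ndarray, y: np.ndarray, alpha: float = 0.05, k: int = 0) -> List[int]:
    """Forward-Backward selection with Early Dropping (continuous target only)."""
    selected: List[int] = []
    for _ in range(k + 1):  # one initial forward run plus k extra runs
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        added = False
        while candidates:
            pvals = {j: fisher_z_pvalue(X, y, j, selected) for j in candidates}
            # Early dropping: discard candidates independent of y given `selected`.
            candidates = [j for j in candidates if pvals[j] <= alpha]
            if not candidates:
                break
            best = min(candidates, key=lambda j: pvals[j])
            selected.append(best)
            candidates.remove(best)
            added = True
        if not added:
            break
    # Backward phase: repeatedly drop the least significant feature.
    while selected:
        pvals = {
            s: fisher_z_pvalue(X, y, s, [t for t in selected if t != s])
            for s in selected
        }
        worst = max(selected, key=lambda s: pvals[s])
        if pvals[worst] <= alpha:
            break
        selected.remove(worst)
    return sorted(selected)


rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
y = 2 * X[:, 0] - 3 * X[:, 2] + rng.normal(size=500)
print(fbed(X, y))  # should include columns 0 and 2
```

With strong effects on columns 0 and 2, those columns should be selected; whether a noise column slips in depends on alpha and the random draw.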
- class OOS
Bases: object
Out-of-sample protocols for data splitting.
- data_split(oos_protocol: Dict[str, Any], X: Any, y: Any | None = None, target_type: str = 'continuous') → Tuple[List[ndarray], List[ndarray]]
Splits the data according to the specified out-of-sample protocol.
- Parameters:
oos_protocol (dict) – A dictionary that specifies the out-of-sample protocol. The ‘name’ key selects the protocol (e.g., ‘KFoldCV’ or ‘Holdout’). For ‘KFoldCV’, the ‘folds’ key gives the number of folds; for ‘Holdout’, the ‘test_size’ key gives the size of the test set.
X (array-like) – The feature data (input variables).
y (array-like, optional) – The target vector (output variables). Required for stratified splits.
target_type (str, optional) – Indicates whether the target is ‘continuous’ or ‘categorical’. Default is ‘continuous’.
- Returns:
train_inds (list of np.ndarray) – A list containing the training indices for each fold or holdout split.
test_inds (list of np.ndarray) – A list containing the testing indices for each fold or holdout split.
- Raises:
ValueError – If an unsupported protocol name is provided.
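A minimal sketch of how the documented protocol dictionary could map onto scikit-learn splitters. The protocol names ‘KFoldCV’ and ‘Holdout’ and the keys ‘folds’ and ‘test_size’ come from the parameter description above; the default values (5 folds, test size 0.2), the random_state parameter, and stratifying only when y is given and target_type is ‘categorical’ are assumptions of this sketch, not documented ETIA behavior.

```python
from typing import Any, Dict, List, Optional, Tuple

import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, train_test_split


def data_split(
    oos_protocol: Dict[str, Any],
    X: Any,
    y: Optional[Any] = None,
    target_type: str = "continuous",
    random_state: int = 0,
) -> Tuple[List[np.ndarray], List[np.ndarray]]:
    """Return (train_inds, test_inds), one index array per split."""
    n = len(X)
    stratify = y is not None and target_type == "categorical"
    name = oos_protocol.get("name")
    if name == "KFoldCV":
        splitter_cls = StratifiedKFold if stratify else KFold
        splitter = splitter_cls(
            n_splits=oos_protocol.get("folds", 5), shuffle=True, random_state=random_state
        )
        # Only the number of rows matters for the split, so pass a dummy X.
        splits = list(splitter.split(np.zeros(n), y if stratify else None))
        return [tr for tr, _ in splits], [te for _, te in splits]
    if name == "Holdout":
        train, test = train_test_split(
            np.arange(n),
            test_size=oos_protocol.get("test_size", 0.2),
            random_state=random_state,
            stratify=y if stratify else None,
        )
        return [train], [test]
    raise ValueError(f"Unsupported out-of-sample protocol: {name!r}")


X = np.random.default_rng(0).normal(size=(10, 3))
train_inds, test_inds = data_split({"name": "KFoldCV", "folds": 5}, X)
print([len(t) for t in test_inds])  # [2, 2, 2, 2, 2]
```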
- class PredictiveConfigurator
Bases: object
Reads the available predictive learning, feature selection, and preprocessing algorithms from JSON files and creates the predictive configurations.
- path
The path to the directory containing the JSON configuration files.
- Type:
str
- pred_algs
Dictionary containing the available predictive algorithms and their configurations.
- Type:
dict
- fs_algs
Dictionary containing the available feature selection algorithms and their configurations.
- Type:
dict
- preprocess_algs
Dictionary containing the available preprocessing algorithms and their configurations.
- Type:
dict
- create_predictive_configs() → List[Dict[str, Any]]
Creates a list of predictive configurations by combining available algorithms and their options.
It reads configurations from the loaded JSON files for predictive models, feature selection methods, and preprocessing algorithms, and combines them to create all possible configurations.
- Returns:
A list of dictionaries, where each dictionary is a unique combination of a predictive model, feature selection algorithm, and preprocessing method.
- Return type:
List[Dict[str, Any]]
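The combination step can be sketched with itertools.product, together with the fractional sampling that a float pred_configs triggers in run_AFS. The dictionary layout below (algorithm name mapped to a hyperparameter grid) is an assumption about the JSON files, not their documented schema, and expand_grid, create_configs, sample_configs, and the rounding rule are hypothetical.

```python
import itertools
import random
from typing import Any, Dict, List, Optional

# Hypothetical contents of the three JSON files: algorithm -> hyperparameter grid.
PRED_ALGS = {
    "random_forest": {"n_estimators": [100, 500], "min_samples_leaf": [1, 5]},
    "linear_regression": {},
}
FS_ALGS = {"fbed": {"alpha": [0.01, 0.05]}}
PREPROCESS_ALGS = {"standard": {}}


def expand_grid(algs: Dict[str, Dict[str, List[Any]]]) -> List[Dict[str, Any]]:
    """One dict per algorithm and hyperparameter combination."""
    out = []
    for name, grid in algs.items():
        keys = list(grid)
        # An empty grid yields a single combination with no hyperparameters.
        for values in itertools.product(*(grid[k] for k in keys)):
            out.append({"name": name, **dict(zip(keys, values))})
    return out


def create_configs() -> List[Dict[str, Any]]:
    """Cartesian product of model, feature-selection, and preprocessing options."""
    return [
        {"model": m, "fs": f, "preprocess": p}
        for m, f, p in itertools.product(
            expand_grid(PRED_ALGS), expand_grid(FS_ALGS), expand_grid(PREPROCESS_ALGS)
        )
    ]


def sample_configs(
    configs: List[Dict[str, Any]], fraction: float, seed: Optional[int] = None
) -> List[Dict[str, Any]]:
    """Keep a random `fraction` of the configurations (at least one)."""
    k = max(1, round(fraction * len(configs)))
    return random.Random(seed).sample(configs, k)


configs = create_configs()
print(len(configs))  # 10: (4 random-forest grids + 1 linear) x 2 FBED grids x 1 scaler
```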
- class PredictiveModel
Bases: object
A class for creating and training predictive models.
- random_forest(config: Dict[str, Any], target_type: str)
Creates a Random Forest model based on the configuration and target type.
- Parameters:
config (dict) – Configuration settings for the Random Forest model, including hyperparameters like n_estimators, min_samples_leaf, and max_features.
target_type (str) – The type of the target variable (‘categorical’ for classification, ‘continuous’ for regression).
- Returns:
model – The initialized Random Forest model.
- Return type:
RandomForestClassifier or RandomForestRegressor
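A sketch of the documented dispatch on target_type with scikit-learn, forwarding only the three hyperparameters named above when they appear in config. The random_state argument and the ValueError for an unrecognized target_type are additions of this sketch.

```python
from typing import Any, Dict, Union

from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor


def random_forest(
    config: Dict[str, Any], target_type: str, random_state: int = 0
) -> Union[RandomForestClassifier, RandomForestRegressor]:
    """Classifier for categorical targets, regressor for continuous ones."""
    params = {
        key: config[key]
        for key in ("n_estimators", "min_samples_leaf", "max_features")
        if key in config
    }
    if target_type == "categorical":
        return RandomForestClassifier(random_state=random_state, **params)
    if target_type == "continuous":
        return RandomForestRegressor(random_state=random_state, **params)
    raise ValueError(f"Unknown target_type: {target_type!r}")


model = random_forest(
    {"n_estimators": 50, "min_samples_leaf": 2, "max_features": "sqrt"}, "categorical"
)
print(type(model).__name__)  # RandomForestClassifier
```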
- linear_regression()
Creates a Linear Regression model.
- Returns:
model – The initialized Linear Regression model.
- Return type:
LinearRegression
- fit(config: Dict[str, Any], train_X: Any, train_y: Any, selected_features: Any, preprocessor: Any | None, target_type: str)
Fits the model to the training data.
- Parameters:
config (dict) – Configuration settings for the model, including the type of model (‘random_forest’ or ‘linear_regression’).
train_X (array-like) – Training data for the input variables.
train_y (array-like) – Training data for the target variable.
selected_features (any) – The features selected for model training.
preprocessor (object or None) – A preprocessor object used to transform the input data, or None to use the data as-is.
target_type (str) – The type of the target variable (‘categorical’ or ‘continuous’).
- Raises:
ValueError – If an unsupported model type is specified in the configuration.
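The fit steps can be sketched as follows, assuming that selected_features is a list of column indices, that the model type is stored under a hypothetical ‘model’ key in config, that the preprocessor follows scikit-learn's fit_transform convention, and that preprocessing is applied after column selection. None of these details is confirmed by the reference above.

```python
from typing import Any, Dict, Optional, Sequence

import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


def fit(
    config: Dict[str, Any],
    train_X: Any,
    train_y: Any,
    selected_features: Sequence[int],
    preprocessor: Optional[Any],
    target_type: str,
):
    """Restrict to the selected columns, optionally preprocess, then fit."""
    X = np.asarray(train_X)[:, list(selected_features)]
    if preprocessor is not None:
        X = preprocessor.fit_transform(X)  # fitted in place on training data
    model_type = config.get("model")  # hypothetical key name
    if model_type == "random_forest":
        cls = RandomForestClassifier if target_type == "categorical" else RandomForestRegressor
        model = cls(n_estimators=config.get("n_estimators", 100), random_state=0)
    elif model_type == "linear_regression":
        model = LinearRegression()
    else:
        raise ValueError(f"Unsupported model type: {model_type!r}")
    return model.fit(X, train_y)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = 2.0 * X[:, 1]
scaler = StandardScaler()
model = fit({"model": "linear_regression"}, X, y, [1, 3], scaler, "continuous")
```

Because fit_transform fits the preprocessor in place, the caller's preprocessor object holds the training statistics afterwards and can transform test data consistently.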
- class Preprocessor(method: str = 'standard')
Bases: object
Preprocessor class for data preprocessing.
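The reference does not say what method='standard' does. Assuming it means z-score standardization, a preprocessor of this shape could wrap scikit-learn scalers as below; the class name ScalerPreprocessor, the extra ‘minmax’ option, and the fit_transform/transform methods are assumptions of this sketch.

```python
from typing import Any

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler


class ScalerPreprocessor:
    """Hypothetical sketch: wrap a scikit-learn scaler selected by name."""

    _SCALERS = {"standard": StandardScaler, "minmax": MinMaxScaler}

    def __init__(self, method: str = "standard") -> None:
        if method not in self._SCALERS:
            raise ValueError(f"Unsupported preprocessing method: {method!r}")
        self.method = method
        self.scaler = self._SCALERS[method]()

    def fit_transform(self, X: Any) -> np.ndarray:
        # Learn the scaling statistics from training data only.
        return self.scaler.fit_transform(X)

    def transform(self, X: Any) -> np.ndarray:
        # Reuse the training statistics on new (e.g. test) data.
        return self.scaler.transform(X)


rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
test = rng.normal(loc=5.0, scale=2.0, size=(20, 3))
prep = ScalerPreprocessor("standard")
train_scaled = prep.fit_transform(train)
test_scaled = prep.transform(test)
```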
Submodules
ETIA.AFS.AFS module
ETIA.AFS.feature_selector module
ETIA.AFS.oos module
ETIA.AFS.predictive_configurator module
ETIA.AFS.predictive_model module