ETIA.data package
Submodules
ETIA.data.Dataset module
- class Dataset(filename=None, data_time_info=None, time_series=False, data=None, dataset_name=None)
Bases: object
A class for representing datasets and providing functionality for loading, manipulating, and processing them.
- Parameters:
filename (str, optional) – The name of the CSV file containing the dataset. Default is None.
data_time_info (dict, optional) – Dictionary containing time-related information (lags, etc.). Default is None.
time_series (bool, optional) – Boolean indicating if the dataset is time series data. Default is False.
data (pd.DataFrame, optional) – A pandas DataFrame containing preloaded data (e.g., from AFS). Default is None.
dataset_name (str, optional) – The name of the dataset. If not provided, it defaults to ‘Preloaded Dataset’ or the filename.
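The default for `dataset_name` depends on how the data was supplied. The sketch below is a minimal reading of that rule. It assumes an explicit name always wins, ‘Preloaded Dataset’ is used when a DataFrame is passed through `data`, and the filename is used otherwise. `resolve_dataset_name` is a hypothetical helper and is not part of ETIA.

```python
from typing import Optional

import pandas as pd


def resolve_dataset_name(
    filename: Optional[str] = None,
    data: Optional[pd.DataFrame] = None,
    dataset_name: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical helper mirroring the documented default for dataset_name.

    Assumption: an explicit name wins; otherwise preloaded data is named
    'Preloaded Dataset' and file-based data is named after the file.
    """
    if dataset_name is not None:
        return dataset_name
    if data is not None:
        return "Preloaded Dataset"
    return filename


print(resolve_dataset_name(filename="measurements.csv"))  # measurements.csv
```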
- dataset_name
The name of the dataset.
- Type:
str
- data_time_info
Information related to time and lags in the dataset.
- Type:
dict
- time_series
Boolean flag indicating if the data is a time series dataset.
- Type:
bool
- n_lags
The number of time lags in the dataset.
- Type:
int
- data
The loaded dataset.
- Type:
pd.DataFrame
- data_type_info
Information on the types of variables in the dataset.
- Type:
dict
- data_type
General type of data (e.g., continuous, categorical).
- Type:
str
- data_general_info
General information about the dataset.
- Type:
dict
- processed_data
Data after processing (currently empty).
- Type:
dict
- annotations
Annotations on the dataset (optional).
- Type:
dict
- get_info()
Returns general information about the dataset, including type and time-related information.
- load_file(filename)
Loads a new dataset from a CSV file.
- Parameters:
filename (str) – Name of the CSV file to load the dataset from.
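At its core, loading a CSV file into a DataFrame is a `pandas.read_csv` call. The sketch below assumes the file has a header row that supplies the column names. `load_csv` is a hypothetical stand-in for `load_file`. An in-memory buffer replaces a real file path so that the example runs on its own.

```python
import io

import pandas as pd


def load_csv(source) -> pd.DataFrame:
    """Hypothetical stand-in for load_file: read a CSV that has a header row."""
    # pandas accepts either a file path or any file-like object here.
    return pd.read_csv(source)


csv_text = "x,y\n1.0,a\n2.5,b\n"
df = load_csv(io.StringIO(csv_text))
print(df.shape)  # (2, 2)
```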
- load_np_dataset(dataset, column_names)
Loads a new dataset from a NumPy array.
- Parameters:
dataset (np.ndarray) – The dataset as a NumPy array.
column_names (list) – List of column names for the dataset.
- Raises:
TypeError – If the input is not a NumPy array.
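The conversion behind `load_np_dataset` can be sketched as a type check followed by wrapping the array in a DataFrame with the given column names. `array_to_frame` is hypothetical. The sketch assumes one name per column, so a length mismatch surfaces as the `ValueError` that pandas raises.

```python
import numpy as np
import pandas as pd


def array_to_frame(dataset, column_names) -> pd.DataFrame:
    """Hypothetical sketch of load_np_dataset's conversion step."""
    # Mirror the documented contract: reject anything that is not an ndarray.
    if not isinstance(dataset, np.ndarray):
        raise TypeError("dataset must be a NumPy array")
    # Assumption: one name per column; pandas raises ValueError on a mismatch.
    return pd.DataFrame(dataset, columns=list(column_names))


arr = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
frame = array_to_frame(arr, ["a", "b"])
print(frame.shape)  # (3, 2)
```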
- load_pd_dataset(dataset)
Loads a new dataset from a pandas DataFrame.
- Parameters:
dataset (pd.DataFrame) – The dataset as a pandas DataFrame.
- Raises:
TypeError – If the input is not a pandas DataFrame.
- convert_to_time_lag(n_lags)
Converts the dataset into time-lagged data.
- Parameters:
n_lags (int) – Number of time lags to add to the dataset.
- Returns:
The dataset with added time lags (if applicable).
- Return type:
pd.DataFrame
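A common way to build time-lagged data is to append shifted copies of every column. The sketch below assumes that layout. Lagged columns are named `<col>_lag<k>`, and the first `n_lags` rows, which lack a full history, are dropped. ETIA's own column naming and its handling of those first rows may differ. `add_time_lags` is hypothetical.

```python
import pandas as pd


def add_time_lags(df: pd.DataFrame, n_lags: int) -> pd.DataFrame:
    """Append lagged copies of every column (hypothetical layout)."""
    if n_lags < 0:
        raise ValueError("n_lags must be non-negative")
    frames = [df]
    for k in range(1, n_lags + 1):
        # shift(k) moves values down k rows, so row t holds the value from t - k.
        frames.append(df.shift(k).add_suffix(f"_lag{k}"))
    lagged = pd.concat(frames, axis=1)
    # The first n_lags rows have no complete history, so drop them by position
    # (not with dropna, which would also drop rows with genuinely missing data).
    return lagged.iloc[n_lags:].reset_index(drop=True)


series = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10, 20, 30, 40]})
print(add_time_lags(series, 2))
```

Each remaining row then pairs the current values with those from one and two steps earlier. This is the form that lag-based causal discovery on time series typically consumes.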
- get_dataset()
Returns the dataset stored in the Dataset instance.
- Returns:
The loaded dataset.
- Return type:
pd.DataFrame
- get_data_type_info()
Returns the data type information of the dataset.
- Returns:
A dictionary containing information about the variable types in the dataset.
- Return type:
dict
- get_data_time_info()
Returns the time-related information of the dataset.
- Returns:
A dictionary containing time-related information such as lags and whether the dataset is time-lagged.
- Return type:
dict
- get_info()
Returns general information about the dataset, including data types and time-related information.
- Returns:
A dictionary containing:
  - data_type_info: Information about variable types in the dataset.
  - data_time_info: Time-related information about the dataset.
  - data_type: General type of data (e.g., continuous, categorical).
  - data_general_info: General information about the dataset.
  - dataset_name: The name of the dataset.
- Return type:
dict
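The shape of the dictionary returned by `get_info()` can be pictured with a small stand-in. `DatasetInfoSketch` is hypothetical, and its inner dictionaries are left empty because their contents are not specified here. Only the five top-level keys come from the documented return value.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetInfoSketch:
    """Hypothetical container holding the attributes that get_info() reports."""

    dataset_name: str
    data_type: str
    data_type_info: dict = field(default_factory=dict)
    data_time_info: dict = field(default_factory=dict)
    data_general_info: dict = field(default_factory=dict)

    def get_info(self) -> dict:
        # Keys follow the documented return value of Dataset.get_info().
        return {
            "data_type_info": self.data_type_info,
            "data_time_info": self.data_time_info,
            "data_type": self.data_type,
            "data_general_info": self.data_general_info,
            "dataset_name": self.dataset_name,
        }


info = DatasetInfoSketch(dataset_name="Preloaded Dataset", data_type="continuous").get_info()
print(sorted(info))
```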
ETIA.data.utils module
- class DataTypes(value)
Bases: Enum
An enumeration of data types.
- CONTINUOUS = 1
- DISCRETE = 2
- MIXED = 3
- GRAPH = 4
- COVARIANCE = 5
- ALL = 6
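`DataTypes(value)` is a standard Python `Enum`, so members can be looked up by value or by name. In real code you would import the class from `ETIA.data.utils`. It is re-declared here with the documented members only so that the snippet runs on its own.

```python
from enum import Enum


class DataTypes(Enum):
    """Re-declaration of the documented members, for illustration only."""

    CONTINUOUS = 1
    DISCRETE = 2
    MIXED = 3
    GRAPH = 4
    COVARIANCE = 5
    ALL = 6


print(DataTypes(3))              # DataTypes.MIXED  (lookup by value)
print(DataTypes["GRAPH"].value)  # 4  (lookup by name)
print([member.name for member in DataTypes])
# ['CONTINUOUS', 'DISCRETE', 'MIXED', 'GRAPH', 'COVARIANCE', 'ALL']
```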
- var_types_and_categorical_encoding(data, unique_val_thr=5)
Returns information about the data type (continuous or categorical) of each column in data.
- Parameters:
data (pd.DataFrame) – Data that may contain NaN values and columns of str, int, float, and object types.
unique_val_thr (int, optional) – Integer threshold on the number of unique values. Default is 5.
- Returns:
data_type_info – A NumPy array with two columns: the first holds the variable names and the second holds either ‘continuous’ or ‘categorical’ for each variable.
- Return type:
np.ndarray
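A threshold on unique values is a common heuristic for separating continuous from categorical columns. The sketch below is a hypothetical version of that idea, not ETIA's implementation. It assumes that non-numeric columns are categorical, and that numeric columns with at most `unique_val_thr` distinct non-NaN values are categorical as well. It returns the documented two-column array. `infer_var_types` is a made-up name.

```python
import numpy as np
import pandas as pd


def infer_var_types(data: pd.DataFrame, unique_val_thr: int = 5) -> np.ndarray:
    """Hypothetical heuristic: label each column 'continuous' or 'categorical'.

    Assumption: non-numeric columns, and numeric columns with at most
    unique_val_thr distinct non-NaN values, count as categorical.
    """
    rows = []
    for name in data.columns:
        col = data[name]
        n_unique = col.nunique(dropna=True)  # NaN is not counted as a value
        is_numeric = pd.api.types.is_numeric_dtype(col)
        kind = "continuous" if is_numeric and n_unique > unique_val_thr else "categorical"
        rows.append([name, kind])
    # Two columns: the variable name, then its inferred type.
    return np.array(rows, dtype=object)


sample = pd.DataFrame({
    "age": [23.5, 31.0, 45.2, np.nan, 52.8, 60.1, 38.4],
    "smoker": ["yes", "no", "no", "yes", np.nan, "no", "yes"],
    "grade": [1, 2, 3, 1, 2, 3, 1],
})
print(infer_var_types(sample))
```

In this sample, `grade` is numeric but takes only three distinct values, so the threshold classifies it as categorical. Raising `unique_val_thr` would reclassify `age` the same way.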