ETIA.data package

Submodules

ETIA.data.Dataset module

class Dataset(filename=None, data_time_info=None, time_series=False, data=None, dataset_name=None)

Bases: object

A class for representing datasets and providing functionalities for loading, manipulating, and processing datasets.

Parameters:
  • filename (str, optional) – The name of the CSV file containing the dataset. Default is None.

  • data_time_info (dict, optional) – Dictionary containing time-related information (lags, etc.). Default is None.

  • time_series (bool, optional) – Boolean indicating if the dataset is time series data. Default is False.

  • data (pd.DataFrame, optional) – A pandas DataFrame containing preloaded data (e.g., from AFS). Default is None.

  • dataset_name (str, optional) – The name of the dataset. If not provided, it defaults to the filename or, for preloaded data, to ‘Preloaded Dataset’.

dataset_name

The name of the dataset.

Type:

str

data_time_info

Information related to time and lags in the dataset.

Type:

dict

time_series

Boolean flag indicating if the data is a time series dataset.

Type:

bool

n_lags

The number of time lags in the dataset.

Type:

int

data

The loaded dataset.

Type:

pd.DataFrame

data_type_info

Information on the types of variables in the dataset.

Type:

dict

data_type

General type of data (e.g., continuous, categorical).

Type:

str

data_general_info

General information about the dataset.

Type:

dict

processed_data

Data after processing (currently empty).

Type:

dict

annotations

Annotations on the dataset (optional).

Type:

dict

load_file(filename)

Loads a new dataset from a CSV file.

Parameters:

filename (str) – Name of the CSV file to load the dataset from.

load_np_dataset(dataset, column_names)

Loads a new dataset from a NumPy array.

Parameters:
  • dataset (np.ndarray) – The dataset as a NumPy array.

  • column_names (list) – List of column names for the dataset.

Raises:

TypeError – If the input is not a NumPy array.

load_pd_dataset(dataset)

Loads a new dataset from a pandas DataFrame.

Parameters:

dataset (pd.DataFrame) – The dataset as a pandas DataFrame.

Raises:

TypeError – If the input is not a pandas DataFrame.

convert_to_time_lag(n_lags)

Converts the dataset into time-lagged data.

Parameters:

n_lags (int) – Number of time lags to add to the dataset.

Returns:

The dataset with added time lags (if applicable).

Return type:

pd.DataFrame
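
To make "time-lagged data" concrete, the following is a minimal pure-Python sketch of the general idea, not the ETIA implementation. It assumes that each output row carries the current values plus the values from the previous n_lags time steps. The helper name add_time_lags, the ':k' column-naming scheme, and the choice to drop the first n_lags rows are all hypothetical.

```python
def add_time_lags(rows, n_lags):
    """Attach the values of the previous n_lags time steps to each row.

    rows: list of dicts, one per time step, in chronological order.
    Lagged columns are named '<name>:<k>' (a hypothetical naming scheme).
    The first n_lags time steps are dropped because they lack a full history.
    """
    if n_lags < 0:
        raise ValueError("n_lags must be non-negative")
    lagged = []
    for t in range(n_lags, len(rows)):
        new_row = {}
        for k in range(n_lags + 1):
            # k == 0 is the current time step; k >= 1 looks k steps back.
            for name, value in rows[t - k].items():
                key = name if k == 0 else f"{name}:{k}"
                new_row[key] = value
        lagged.append(new_row)
    return lagged


series = [{"x": 1, "y": 10}, {"x": 2, "y": 20}, {"x": 3, "y": 30}]
lagged = add_time_lags(series, n_lags=1)
```

With one lag, the three-step series yields two rows, each holding x, y and their one-step-back copies x:1 and y:1.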

get_dataset()

Returns the dataset stored in the Dataset instance.

Returns:

The loaded dataset.

Return type:

pd.DataFrame

get_data_type_info()

Returns the data type information of the dataset.

Returns:

A dictionary containing information about the variable types in the dataset.

Return type:

dict

get_data_time_info()

Returns the time-related information of the dataset.

Returns:

A dictionary containing time-related information such as lags and whether the dataset is time-lagged.

Return type:

dict

get_info()

Returns general information about the dataset, including data types and time-related information.

Returns:

A dictionary containing:
  • data_type_info – Information about variable types in the dataset.

  • data_time_info – Time-related information about the dataset.

  • data_type – General type of data (e.g., continuous, categorical).

  • data_general_info – General information about the dataset.

  • dataset_name – The name of the dataset.

Return type:

dict

annotate_dataset(annotations)

Stores annotations for the dataset.

Parameters:

annotations (dict) – Dictionary of annotations related to the dataset.

ETIA.data.utils module

class DataTypes(value)

Bases: Enum

An enumeration.

CONTINUOUS = 1
DISCRETE = 2
MIXED = 3
GRAPH = 4
COVARIANCE = 5
ALL = 6
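
Because DataTypes is a standard Python Enum, members can be looked up by value or by name. The sketch below redeclares the enum locally, mirroring the members listed above, so that it runs without ETIA installed. In real code the class would presumably be imported from ETIA.data.utils instead.

```python
from enum import Enum


class DataTypes(Enum):
    # Local mirror of the documented members, for illustration only.
    CONTINUOUS = 1
    DISCRETE = 2
    MIXED = 3
    GRAPH = 4
    COVARIANCE = 5
    ALL = 6


by_value = DataTypes(3)        # lookup by value, matching the DataTypes(value) signature
by_name = DataTypes["GRAPH"]   # lookup by member name
names = [member.name for member in DataTypes]  # iteration follows definition order
```

Passing a value that matches no member, such as DataTypes(7), raises ValueError.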

var_types_and_categorical_encoding(data, unique_val_thr=5)

Returns information about the data type (continuous or categorical) of each column in data.

Parameters:
  • data – pandas array that may contain NaN values, str, int, float, and object entries.

  • unique_val_thr (int, optional) – Integer threshold on the number of unique values in a column. Default is 5.

Returns:

data_type_info, a NumPy array with two columns: the first holds the variable names and the second holds ‘continuous’ or ‘categorical’ for each variable.

Return type:

np.ndarray
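
The exact rule this function applies is not spelled out here, so the following is only a sketch of one plausible unique-value heuristic, written in pure Python. It assumes that a column counts as categorical when it holds any non-numeric value or has at most unique_val_thr distinct non-missing values, and as continuous otherwise. The helper name classify_columns and the "at most" comparison are assumptions, not ETIA's behaviour.

```python
import math


def classify_columns(columns, unique_val_thr=5):
    """Label each column 'continuous' or 'categorical'.

    columns: dict mapping a column name to a list of values.
    Returns a list of (name, label) pairs, one per column.
    """
    result = []
    for name, values in columns.items():
        # Ignore missing entries (None or NaN) when inspecting the column.
        present = [
            v for v in values
            if not (v is None or (isinstance(v, float) and math.isnan(v)))
        ]
        # bool is a subclass of int, so exclude it from the numeric check.
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in present
        )
        few_levels = len(set(present)) <= unique_val_thr  # assumed rule
        label = "continuous" if numeric and not few_levels else "categorical"
        result.append((name, label))
    return result


columns = {
    "height": [1.5, 1.7, 1.6, 1.8, 1.9, 2.0, float("nan")],
    "colour": ["r", "g", "b", "r"],
    "flag": [0, 1, 0, 1],
}
info = classify_columns(columns)
```

Under this assumed rule, height is labelled continuous (six distinct numeric values), while colour (non-numeric) and flag (only two distinct values) are labelled categorical.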

get_data_info(data)