carte_ai.src package

Submodules

carte_ai.src.carte_table_to_graph module

class carte_ai.src.carte_table_to_graph.Table2GraphTransformer(*, include_edge_attr: bool = True, lm_model: str = 'fasttext', n_components: int = 300, n_jobs: int = 1, fasttext_model_path: str | None = None)

Bases: TransformerMixin, BaseEstimator

Transformer from tables to a list of graphs.

Parameters

include_edge_attr : bool, optional

Whether to include edge attributes, by default True.

lm_model : str, optional

Language model to use, by default “fasttext”.

n_components : int, optional

Number of components for the encoder, by default 300.

n_jobs : int, optional

Number of jobs for parallel processing, by default 1.

fasttext_model_path : str, optional

Path to the FastText model file, required if lm_model is “fasttext”.
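
The description above does not spell out what the resulting graphs look like. The following is a minimal sketch, assuming each table row becomes one star-shaped graph with a center node, one leaf per non-missing cell, and column names as edge attributes. The function row_to_star_graph and its dict layout are hypothetical illustrations, not the library's data structures, and the language-model embedding step is left out.

```python
from typing import Any


def row_to_star_graph(row: dict[str, Any], include_edge_attr: bool = True) -> dict[str, Any]:
    """Turn one table row into a star graph (hypothetical layout, for illustration)."""
    # Skip missing cells so they do not become nodes.
    columns = [name for name, value in row.items() if value is not None]
    graph: dict[str, Any] = {
        # Node 0 is the center node standing for the whole row.
        "nodes": ["<row>"] + [row[name] for name in columns],
        # One edge from the center to each cell node.
        "edges": [(0, i + 1) for i in range(len(columns))],
    }
    if include_edge_attr:
        # Column names label the edges.
        graph["edge_attr"] = columns
    return graph


# Toy table: a list of rows, each row a mapping from column name to value.
table = [
    {"name": "espresso", "price": 2.5, "size": None},
    {"name": "latte", "price": 4.0, "size": "large"},
]
graphs = [row_to_star_graph(row) for row in table]
```

One graph is produced per row, so the output list has the same length as the input table.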

fit(X, y=None)

Fit function used for the Table2GraphTransformer.

Parameters

X : pandas.DataFrame

Input data to fit.

y : array-like, optional

Target values, by default None.

Returns

self : Table2GraphTransformer

Fitted transformer.

Example Usage:


import fasttext
from huggingface_hub import hf_hub_download

from carte_ai.src.carte_table_to_graph import Table2GraphTransformer

# Download the FastText model from HuggingFace Hub
model_path = hf_hub_download("hi-paris/fastText", "cc.en.300.bin")

# Initialize the Table2GraphTransformer
preprocessor = Table2GraphTransformer(fasttext_model_path=model_path)

# View the transformer details
help(Table2GraphTransformer)

# Fit and transform the training data
# (X_train, y_train, and X_test are assumed to be defined beforehand;
# X_train and X_test are pandas DataFrames)
X_train = preprocessor.fit_transform(X_train, y=y_train)

# Transform the test data
X_test = preprocessor.transform(X_test)

carte_ai.src.carte_estimator module

CARTE estimators for regression and classification.

class carte_ai.src.carte_estimator.BaseCARTEEstimator(*, num_layers, load_pretrain, freeze_pretrain, learning_rate, batch_size, max_epoch, dropout, val_size, cross_validate, early_stopping_patience, num_model, random_state, n_jobs, device, disable_pbar, pretrained_model_path='/home/infres/gbrison/fcclip_v2/miniconda3/envs/carte/lib/python3.10/site-packages/carte_ai/data/etc/kg_pretrained.pt')

Bases: BaseEstimator

Base class for CARTE Estimator.

fit(X, y)

Fit the CARTE model.

Parameters

X : list of graph objects with size (n_samples)

The input samples.

y : array-like of shape (n_samples,)

Target values.

Returns

self : object

Fitted estimator.

Example Usage:


from sklearn.metrics import r2_score

from carte_ai.src.carte_estimator import CARTERegressor

# Define some parameters
fixed_params = dict()
fixed_params["num_model"] = 10 # 10 models for the bagging strategy
fixed_params["disable_pbar"] = False # Set to True to hide the progress bars
fixed_params["random_state"] = 0
fixed_params["device"] = "cpu"
fixed_params["n_jobs"] = 10
# config_directory is assumed to be a dict defined elsewhere that maps
# "pretrained_model" to the path of the pretrained weights
fixed_params["pretrained_model_path"] = config_directory["pretrained_model"]

# Define the estimator and run fit/predict
# (X_train and X_test are lists of graphs, y_train and y_test the targets)
estimator = CARTERegressor(**fixed_params) # CARTERegressor for Regression
estimator.fit(X=X_train, y=y_train)
y_pred = estimator.predict(X_test)

# Obtain the r2 score on predictions
score = r2_score(y_test, y_pred)
print(f"\nThe R2 score for CARTE: {score:.4f}")

class carte_ai.src.carte_estimator.BaseCARTEMultitableEstimator(*, source_data, num_layers, load_pretrain, freeze_pretrain, learning_rate, batch_size, max_epoch, dropout, val_size, target_fraction, early_stopping_patience, num_model, random_state, n_jobs, device, disable_pbar, pretrained_model_path)

Bases: BaseCARTEEstimator

Base class for CARTE Multitable Estimator.

fit(X, y)

Fit the CARTE Multitable model.

Parameters

X : list of graph objects with size (n_samples)

The input samples of the target data.

y : array-like of shape (n_samples,)

Target values.

Returns

self : object

Fitted estimator.

class carte_ai.src.carte_estimator.CARTEClassifier(*, loss: str = 'binary_crossentropy', scoring: str = 'auroc', num_layers: int = 1, load_pretrain: bool = True, freeze_pretrain: bool = True, learning_rate: float = 0.001, batch_size: int = 16, max_epoch: int = 500, dropout: float = 0, val_size: float = 0.2, cross_validate: bool = False, early_stopping_patience: None | int = 40, num_model: int = 1, random_state: int = 0, n_jobs: int = 1, device: str = 'cpu', disable_pbar: bool = True, pretrained_model_path='/home/infres/gbrison/fcclip_v2/miniconda3/envs/carte/lib/python3.10/site-packages/carte_ai/data/etc/kg_pretrained.pt')

Bases: ClassifierMixin, BaseCARTEEstimator

CARTE Classifier for Classification tasks.

This estimator is a GNN-based model compatible with the CARTE pretrained model.

Parameters

loss : {‘binary_crossentropy’, ‘categorical_crossentropy’}, default=‘binary_crossentropy’

The loss function used for backpropagation.

scoring : {‘auroc’, ‘auprc’, ‘binary_entropy’}, default=‘auroc’

The scoring function used for validation.

num_layers : int, default=1

The number of layers for the NN model.

load_pretrain : bool, default=True

Indicates whether to load the pretrained weights.

freeze_pretrain : bool, default=True

Indicates whether to freeze the pretrained weights during training.

learning_rate : float, default=1e-3

The learning rate of the model. The model uses AdamW as the optimizer.

batch_size : int, default=16

The batch size used for training.

max_epoch : int or None, default=500

The maximum number of epochs for training.

dropout : float, default=0

The dropout rate for training.

val_size : float, default=0.2

The size of the validation set used for early stopping.

cross_validate : bool, default=False

Indicates whether to use a cross-validation strategy for the train/validation split.

early_stopping_patience : int or None, default=40

The early stopping patience when early stopping is used. If set to None, no early stopping is employed.

num_model : int, default=1

The total number of models used for the bagging strategy.

random_state : int or None, default=0

Seed of the pseudo-random number generator that controls the train/validation data split (if early stopping is enabled), the weight initialization, and the dropout. Pass an int for reproducible output across multiple function calls.

n_jobs : int, default=1

Number of jobs to run in parallel. Training the estimator and computing the score are parallelized over the number of models.

device : {“cpu”, “gpu”}, default=“cpu”

The device used for the estimator.

disable_pbar : bool, default=True

Indicates whether to disable the progress bars for the training process.
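
The interaction between max_epoch and early_stopping_patience can be sketched as follows. This is a minimal illustration, assuming training stops once the validation loss has not improved for early_stopping_patience consecutive epochs. The function train_with_early_stopping is hypothetical and replays a precomputed list of validation losses instead of training a model; it is not the library's training loop.

```python
def train_with_early_stopping(val_losses, max_epoch=500, early_stopping_patience=40):
    """Return (best_epoch, best_loss, epochs_run) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    epochs_run = 0
    for epoch, loss in enumerate(val_losses):
        if epoch >= max_epoch:
            break  # hard cap on the number of epochs
        epochs_run += 1
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        # None disables early stopping, as stated in the parameter description.
        if (
            early_stopping_patience is not None
            and epochs_without_improvement >= early_stopping_patience
        ):
            break
    return best_epoch, best_loss, epochs_run


losses = [0.9, 0.7, 0.6, 0.65, 0.64, 0.63, 0.62, 0.5]
stopped = train_with_early_stopping(losses, early_stopping_patience=3)
full = train_with_early_stopping(losses, early_stopping_patience=None)
```

With a patience of 3 the loop stops after three epochs without improvement and never reaches the later, lower loss, which is the usual trade-off when choosing a small patience.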

decision_function(X)

Compute the decision function of X.

Parameters

X : list of graph objects with size (n_samples)

The input samples.

Returns

decision : ndarray, shape (n_samples,)

predict(X)

Predict classes for X.

Parameters

X : list of graph objects with size (n_samples)

The input samples.

Returns

y : ndarray, shape (n_samples,)

The predicted classes.

predict_proba(X)

Predict class probabilities for X.

Parameters

X : list of graph objects with size (n_samples)

The input samples.

Returns

p : ndarray, shape (n_samples,) for binary classification or (n_samples, n_classes)

The class probabilities of the input samples.
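
The three prediction methods are related in the usual way for binary classifiers. The sketch below assumes a logistic link between decision values and probabilities and a 0.5 threshold for class labels; the documentation above does not state how CARTE maps between them, so the helper names and both choices are assumptions made for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def proba_from_decision(decision):
    """Convert raw decision values into positive-class probabilities."""
    return [sigmoid(z) for z in decision]


def classes_from_proba(proba, threshold=0.5):
    """Label a sample 1 when its probability reaches the threshold, else 0."""
    return [int(p >= threshold) for p in proba]


decision = [-2.0, 0.0, 3.0]            # shape (n_samples,)
proba = proba_from_decision(decision)  # shape (n_samples,) in the binary case
labels = classes_from_proba(proba)
```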

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') CARTEClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

self : object

The updated object.

class carte_ai.src.carte_estimator.CARTEMultitableClassifer(*, loss: str = 'binary_crossentropy', scoring: str = 'auroc', source_data: dict = {}, num_layers: int = 1, load_pretrain: bool = True, freeze_pretrain: bool = True, learning_rate: float = 0.001, batch_size: int = 16, max_epoch: int = 500, dropout: float = 0, val_size: float = 0.2, target_fraction: float = 0.125, early_stopping_patience: None | int = 40, num_model: int = 1, random_state: int = 0, n_jobs: int = 1, device: str = 'cpu', disable_pbar: bool = True, pretrained_model_path='/home/infres/gbrison/fcclip_v2/miniconda3/envs/carte/lib/python3.10/site-packages/carte_ai/data/etc/kg_pretrained.pt')

Bases: ClassifierMixin, BaseCARTEMultitableEstimator

CARTE Multitable Classifier for Classification tasks.

This estimator is a GNN-based model compatible with the CARTE pretrained model.

Parameters

loss : {‘binary_crossentropy’, ‘categorical_crossentropy’}, default=‘binary_crossentropy’

The loss function used for backpropagation.

scoring : {‘auroc’, ‘auprc’, ‘binary_entropy’}, default=‘auroc’

The scoring function used for validation.

source_data : dict, default={}

The source data used in the multitable estimator.

num_layers : int, default=1

The number of layers for the NN model.

load_pretrain : bool, default=True

Indicates whether to load the pretrained weights.

freeze_pretrain : bool, default=True

Indicates whether to freeze the pretrained weights during training.

learning_rate : float, default=1e-3

The learning rate of the model. The model uses AdamW as the optimizer.

batch_size : int, default=16

The batch size used for training.

max_epoch : int or None, default=500

The maximum number of epochs for training.

dropout : float, default=0

The dropout rate for training.

val_size : float, default=0.2

The size of the validation set used for early stopping.

target_fraction : float, default=0.125

The fraction of target data inside a batch during training.

early_stopping_patience : int or None, default=40

The early stopping patience when early stopping is used. If set to None, no early stopping is employed.

num_model : int, default=1

The total number of models used for the bagging strategy.

random_state : int or None, default=0

Seed of the pseudo-random number generator that controls the train/validation data split (if early stopping is enabled), the weight initialization, and the dropout. Pass an int for reproducible output across multiple function calls.

n_jobs : int, default=1

Number of jobs to run in parallel. Training the estimator and computing the score are parallelized over the number of models.

device : {“cpu”, “gpu”}, default=“cpu”

The device used for the estimator.

disable_pbar : bool, default=True

Indicates whether to disable the progress bars for the training process.

decision_function(X)

Compute the decision function of X.

Parameters

X : list of graph objects with size (n_samples)

The input samples.

Returns

decision : ndarray, shape (n_samples,)

predict(X)

Predict classes for X.

Parameters

X : list of graph objects with size (n_samples)

The input samples.

Returns

y : ndarray, shape (n_samples,)

The predicted classes.

predict_proba(X)

Predict class probabilities for X.

Parameters

X : list of graph objects with size (n_samples)

The input samples.

Returns

p : ndarray, shape (n_samples,) for binary classification or (n_samples, n_classes)

The class probabilities of the input samples.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') CARTEMultitableClassifer

Request metadata passed to the score method.

class carte_ai.src.carte_estimator.CARTEMultitableRegressor(*, loss: str = 'squared_error', scoring: str = 'r2_score', source_data: dict = {}, num_layers: int = 1, load_pretrain: bool = True, freeze_pretrain: bool = True, learning_rate: float = 0.001, batch_size: int = 16, max_epoch: int = 500, dropout: float = 0, val_size: float = 0.2, target_fraction: float = 0.125, early_stopping_patience: None | int = 40, num_model: int = 1, random_state: int = 0, n_jobs: int = 1, device: str = 'cpu', disable_pbar: bool = True, pretrained_model_path='/home/infres/gbrison/fcclip_v2/miniconda3/envs/carte/lib/python3.10/site-packages/carte_ai/data/etc/kg_pretrained.pt')

Bases: RegressorMixin, BaseCARTEMultitableEstimator

CARTE Multitable Regressor for Regression tasks.

This estimator is a GNN-based model compatible with the CARTE pretrained model.

Parameters

loss : {‘squared_error’, ‘absolute_error’}, default=‘squared_error’

The loss function used for backpropagation.

scoring : {‘r2_score’, ‘squared_error’}, default=‘r2_score’

The scoring function used for validation.

source_data : dict, default={}

The source data used in the multitable estimator.

num_layers : int, default=1

The number of layers for the NN model.

load_pretrain : bool, default=True

Indicates whether to load the pretrained weights.

freeze_pretrain : bool, default=True

Indicates whether to freeze the pretrained weights during training.

learning_rate : float, default=1e-3

The learning rate of the model. The model uses AdamW as the optimizer.

batch_size : int, default=16

The batch size used for training.

max_epoch : int or None, default=500

The maximum number of epochs for training.

dropout : float, default=0

The dropout rate for training.

val_size : float, default=0.2

The size of the validation set used for early stopping.

target_fraction : float, default=0.125

The fraction of target data inside a batch during training.

early_stopping_patience : int or None, default=40

The early stopping patience when early stopping is used. If set to None, no early stopping is employed.

num_model : int, default=1

The total number of models used for the bagging strategy.

random_state : int or None, default=0

Seed of the pseudo-random number generator that controls the train/validation data split (if early stopping is enabled), the weight initialization, and the dropout. Pass an int for reproducible output across multiple function calls.

n_jobs : int, default=1

Number of jobs to run in parallel. Training the estimator and computing the score are parallelized over the number of models.

device : {“cpu”, “gpu”}, default=“cpu”

The device used for the estimator.

disable_pbar : bool, default=True

Indicates whether to disable the progress bars for the training process.

predict(X)

Predict values for X.

Returns the weighted average of the predictions from the single-table model and from every pairwise model trained with one source table.

Parameters

X : list of graph objects with size (n_samples)

The input samples.

Returns

y : ndarray, shape (n_samples,)

The predicted values.
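
The weighted averaging described for predict can be sketched as below. How the weights are obtained is not stated here, so the weights in this example are made-up values; weighted_average_predictions is a hypothetical helper, not part of the package.

```python
def weighted_average_predictions(predictions, weights):
    """Combine per-model predictions into one prediction per sample.

    predictions: list of lists, one inner list of n_samples values per model.
    weights: one non-negative weight per model (hypothetical values here).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_samples = len(predictions[0])
    return [
        sum(w * model_pred[i] for w, model_pred in zip(weights, predictions)) / total
        for i in range(n_samples)
    ]


single_table = [10.0, 20.0]        # model trained on the target table only
pairwise_source_a = [12.0, 18.0]   # model trained on target + source A
pairwise_source_b = [14.0, 22.0]   # model trained on target + source B
y_pred = weighted_average_predictions(
    [single_table, pairwise_source_a, pairwise_source_b],
    weights=[2.0, 1.0, 1.0],
)
```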

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') CARTEMultitableRegressor

Request metadata passed to the score method.

class carte_ai.src.carte_estimator.CARTERegressor(*, loss: str = 'squared_error', scoring: str = 'r2_score', num_layers: int = 1, load_pretrain: bool = True, freeze_pretrain: bool = True, learning_rate: float = 0.001, batch_size: int = 16, max_epoch: int = 500, dropout: float = 0, val_size: float = 0.2, cross_validate: bool = False, early_stopping_patience: None | int = 40, num_model: int = 1, random_state: int = 0, n_jobs: int = 1, device: str = 'cpu', disable_pbar: bool = True, pretrained_model_path='/home/infres/gbrison/fcclip_v2/miniconda3/envs/carte/lib/python3.10/site-packages/carte_ai/data/etc/kg_pretrained.pt')

Bases: RegressorMixin, BaseCARTEEstimator

CARTE Regressor for Regression tasks.

This estimator is a GNN-based model compatible with the CARTE pretrained model.

Parameters

loss : {‘squared_error’, ‘absolute_error’}, default=‘squared_error’

The loss function used for backpropagation.

scoring : {‘r2_score’, ‘squared_error’}, default=‘r2_score’

The scoring function used for validation.

num_layers : int, default=1

The number of layers for the NN model.

load_pretrain : bool, default=True

Indicates whether to load the pretrained weights.

freeze_pretrain : bool, default=True

Indicates whether to freeze the pretrained weights during training.

learning_rate : float, default=1e-3

The learning rate of the model. The model uses AdamW as the optimizer.

batch_size : int, default=16

The batch size used for training.

max_epoch : int or None, default=500

The maximum number of epochs for training.

dropout : float, default=0

The dropout rate for training.

val_size : float, default=0.2

The size of the validation set used for early stopping.

cross_validate : bool, default=False

Indicates whether to use a cross-validation strategy for the train/validation split.

early_stopping_patience : int or None, default=40

The early stopping patience when early stopping is used. If set to None, no early stopping is employed.

num_model : int, default=1

The total number of models used for the bagging strategy.

random_state : int or None, default=0

Seed of the pseudo-random number generator that controls the train/validation data split (if early stopping is enabled), the weight initialization, and the dropout. Pass an int for reproducible output across multiple function calls.

n_jobs : int, default=1

Number of jobs to run in parallel. Training the estimator and computing the score are parallelized over the number of models.

device : {“cpu”, “gpu”}, default=“cpu”

The device used for the estimator.

disable_pbar : bool, default=True

Indicates whether to disable the progress bars for the training process.
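
The roles of val_size and random_state in the train/validation split can be illustrated with the sketch below. It assumes val_size is a fraction of the training samples and uses a plain shuffled split; train_val_split is a hypothetical helper, and the 0.2 default mirrors the value in the class signature. The actual splitting code of the estimator may differ.

```python
import random


def train_val_split(n_samples, val_size=0.2, random_state=0):
    """Return (train_indices, val_indices) from a seeded shuffle of range(n_samples)."""
    rng = random.Random(random_state)  # fixed seed gives a reproducible split
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_val = max(1, round(n_samples * val_size))  # keep at least one validation sample
    return sorted(indices[n_val:]), sorted(indices[:n_val])


train_idx, val_idx = train_val_split(10, val_size=0.2, random_state=0)
```

Calling the function twice with the same random_state returns the same split, which is what makes runs comparable across calls.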

predict(X)

Predict values for X. Returns the average of predicted values over all the models.

Parameters

X : list of graph objects with size (n_samples)

The input samples.

Returns

y : ndarray, shape (n_samples,)

The predicted values.
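
Averaging over the models of the bagging ensemble amounts to a per-sample mean. A minimal sketch, in which each trained model is replaced by a plain Python function (an assumption made to keep the example self-contained) and bagged_predict is a hypothetical helper:

```python
from statistics import fmean


def bagged_predict(models, X):
    """Average the predictions of all models for every sample in X."""
    # per_model[m][i] is the prediction of model m for sample i.
    per_model = [[model(x) for x in X] for model in models]
    # zip(*per_model) regroups the values by sample before averaging.
    return [fmean(sample_preds) for sample_preds in zip(*per_model)]


# Three stand-in "models" that disagree by a constant offset.
models = [
    lambda x: 2.0 * x,
    lambda x: 2.0 * x + 1.0,
    lambda x: 2.0 * x - 1.0,
]
y_pred = bagged_predict(models, [0.0, 1.0, 2.5])
```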

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') CARTERegressor

Request metadata passed to the score method.

class carte_ai.src.carte_estimator.CARTE_AblationClassifier(*, ablation_method: str = 'exclude-edge', loss: str = 'binary_crossentropy', scoring: str = 'auroc', num_layers: int = 1, load_pretrain: bool = False, freeze_pretrain: bool = False, learning_rate: float = 0.001, batch_size: int = 16, max_epoch: int = 500, dropout: float = 0, val_size: float = 0.2, cross_validate: bool = False, early_stopping_patience: None | int = 40, num_model: int = 1, random_state: int = 0, n_jobs: int = 1, device: str = 'cpu', disable_pbar: bool = True, pretrained_model_path='/home/infres/gbrison/fcclip_v2/miniconda3/envs/carte/lib/python3.10/site-packages/carte_ai/data/etc/kg_pretrained.pt')

Bases: CARTEClassifier

CARTE Ablation Classifier for Classification tasks.

This estimator is a GNN-based model compatible with the CARTE pretrained model. Note that this is an implementation for the ablation study of CARTE.

Parameters

ablation_method : {‘exclude-edge’, ‘exclude-attention’, ‘exclude-attention-edge’}, default=‘exclude-edge’

The ablation method for CARTE Estimators.

loss : {‘binary_crossentropy’, ‘categorical_crossentropy’}, default=‘binary_crossentropy’

The loss function used for backpropagation.

scoring : {‘auroc’, ‘auprc’, ‘binary_entropy’}, default=‘auroc’

The scoring function used for validation.

num_layers : int, default=1

The number of layers for the NN model.

load_pretrain : bool, default=False

Indicates whether to load the pretrained weights.

freeze_pretrain : bool, default=False

Indicates whether to freeze the pretrained weights during training.

learning_rate : float, default=1e-3

The learning rate of the model. The model uses AdamW as the optimizer.

batch_size : int, default=16

The batch size used for training.

max_epoch : int or None, default=500

The maximum number of epochs for training.

dropout : float, default=0

The dropout rate for training.

val_size : float, default=0.2

The size of the validation set used for early stopping.

cross_validate : bool, default=False

Indicates whether to use a cross-validation strategy for the train/validation split.

early_stopping_patience : int or None, default=40

The early stopping patience when early stopping is used. If set to None, no early stopping is employed.

num_model : int, default=1

The total number of models used for the bagging strategy.

random_state : int or None, default=0

Seed of the pseudo-random number generator that controls the train/validation data split (if early stopping is enabled), the weight initialization, and the dropout. Pass an int for reproducible output across multiple function calls.

n_jobs : int, default=1

Number of jobs to run in parallel. Training the estimator and computing the score are parallelized over the number of models.

device : {“cpu”, “gpu”}, default=“cpu”

The device used for the estimator.

disable_pbar : bool, default=True

Indicates whether to disable the progress bars for the training process.
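
Parallelizing over the number of models, as described for n_jobs, can be pictured with the sketch below. It uses a standard-library thread pool as a stand-in; the parallel backend actually used by the estimator is not stated here. The per-model seeds derived from random_state and the functions train_one_model and fit_ensemble are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def train_one_model(seed: int) -> dict:
    """Stand-in for training one model of the ensemble; returns a fake summary."""
    return {"seed": seed, "val_score": 1.0 / (1 + seed)}


def fit_ensemble(num_model=1, random_state=0, n_jobs=1):
    """Train num_model models, running at most n_jobs of them at the same time."""
    seeds = [random_state + i for i in range(num_model)]  # assumed seeding scheme
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # map() returns results in the order of the seeds, whatever finishes first.
        return list(pool.map(train_one_model, seeds))


ensemble = fit_ensemble(num_model=4, random_state=0, n_jobs=2)
```

Setting n_jobs larger than num_model brings no further speed-up in this scheme, since each job trains exactly one model.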

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') CARTE_AblationClassifier

Request metadata passed to the score method.

class carte_ai.src.carte_estimator.CARTE_AblationRegressor(*, ablation_method: str = 'exclude-edge', loss: str = 'squared_error', scoring: str = 'r2_score', num_layers: int = 1, load_pretrain: bool = True, freeze_pretrain: bool = True, learning_rate: float = 0.001, batch_size: int = 16, max_epoch: int = 500, dropout: float = 0, val_size: float = 0.2, cross_validate: bool = False, early_stopping_patience: None | int = 40, num_model: int = 1, random_state: int = 0, n_jobs: int = 1, device: str = 'cpu', disable_pbar: bool = True, pretrained_model_path='/home/infres/gbrison/fcclip_v2/miniconda3/envs/carte/lib/python3.10/site-packages/carte_ai/data/etc/kg_pretrained.pt')

Bases: CARTERegressor

CARTE Ablation Regressor for Regression tasks.

This estimator is a GNN-based model compatible with the CARTE pretrained model. Note that this is an implementation for the ablation study of CARTE.

Parameters

ablation_method : {‘exclude-edge’, ‘exclude-attention’, ‘exclude-attention-edge’}, default=‘exclude-edge’

The ablation method for CARTE Estimators.

loss : {‘squared_error’, ‘absolute_error’}, default=‘squared_error’

The loss function used for backpropagation.

scoring : {‘r2_score’, ‘squared_error’}, default=‘r2_score’

The scoring function used for validation.

num_layers : int, default=1

The number of layers for the NN model.

load_pretrain : bool, default=True

Indicates whether to load the pretrained weights.

freeze_pretrain : bool, default=True

Indicates whether to freeze the pretrained weights during training.

learning_rate : float, default=1e-3

The learning rate of the model. The model uses AdamW as the optimizer.

batch_size : int, default=16

The batch size used for training.

max_epoch : int or None, default=500

The maximum number of epochs for training.

dropout : float, default=0

The dropout rate for training.

val_size : float, default=0.2

The size of the validation set used for early stopping.

cross_validate : bool, default=False

Indicates whether to use a cross-validation strategy for the train/validation split.

early_stopping_patience : int or None, default=40

The early stopping patience when early stopping is used. If set to None, no early stopping is employed.

num_model : int, default=1

The total number of models used for the bagging strategy.

random_state : int or None, default=0

Seed of the pseudo-random number generator that controls the train/validation data split (if early stopping is enabled), the weight initialization, and the dropout. Pass an int for reproducible output across multiple function calls.

n_jobs : int, default=1

Number of jobs to run in parallel. Training the estimator and computing the score are parallelized over the number of models.

device : {“cpu”, “gpu”}, default=“cpu”

The device used for the estimator.

disable_pbar : bool, default=True

Indicates whether to disable the progress bars for the training process.
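
The names accepted by loss and scoring for regression correspond to standard quantities. The sketch below writes them out in plain Python, assuming mean reduction over samples for the two error functions; the reduction used inside the estimator is not stated here, and these helper functions are illustrations rather than the package's own.

```python
def squared_error(y_true, y_pred):
    """Mean of squared differences (assumed mean reduction)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def absolute_error(y_true, y_pred):
    """Mean of absolute differences (assumed mean reduction)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]  # one prediction is off by 2
mse = squared_error(y_true, y_pred)
mae = absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
```

The single large error is penalized more heavily by the squared error than by the absolute error, which is the practical difference between the two loss options.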

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') CARTE_AblationRegressor

Request metadata passed to the score method.

class carte_ai.src.carte_estimator.IdxIterator(n_batch: int, domain_indicator: Tensor, target_fraction: float)

Bases: object

Class for iterating over indices to set up the batches for CARTE multitable estimators.

sample()

set_num_samples()
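
The methods above are undocumented. As a rough sketch of what batching with a target_fraction can look like, the class below draws a fixed share of every batch from the target rows and fills the rest with source rows. BatchIndexSampler is a hypothetical stand-in that uses plain lists instead of tensors, and it assumes that a domain indicator of 0 marks target rows; neither detail is confirmed by the documentation.

```python
import random


class BatchIndexSampler:
    """Hypothetical stand-in for an index iterator mixing target and source rows."""

    def __init__(self, n_batch, domain_indicator, target_fraction, seed=0):
        # Assumption: 0 marks rows of the target table, anything else a source table.
        self.target_idx = [i for i, d in enumerate(domain_indicator) if d == 0]
        self.source_idx = [i for i, d in enumerate(domain_indicator) if d != 0]
        self.n_target = max(1, round(n_batch * target_fraction))
        self.n_source = n_batch - self.n_target
        self.rng = random.Random(seed)

    def sample(self):
        """Return the row indices of one batch, target rows first."""
        target = self.rng.sample(self.target_idx, self.n_target)
        source = self.rng.sample(self.source_idx, self.n_source)
        return target + source


domain_indicator = [0] * 8 + [1] * 40  # 8 target rows followed by 40 source rows
sampler = BatchIndexSampler(n_batch=16, domain_indicator=domain_indicator, target_fraction=0.125)
batch = sampler.sample()
```

With a batch of 16 and a target fraction of 0.125, each batch holds 2 target rows and 14 source rows.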

carte_ai.src.baseline_multitable module

Baselines for multitable problem.

class carte_ai.src.baseline_multitable.CatBoostMultitableClassifier(*, source_data: dict = {}, max_depth: int = 6, learning_rate: float = 0.03, bagging_temperature: float = 1, l2_leaf_reg: float = 3.0, one_hot_max_size: int = 2, iterations: int = 1000, thread_count: int = 1, source_fraction: float = 0.5, num_model: int = 1, val_size: float = 0.1, random_state: int = 0, n_jobs: int = 1)

Bases: GradientBoostingClassifierBase

CatBoost Multitable Classifier.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') CatBoostMultitableClassifier

Request metadata passed to the score method.

class carte_ai.src.baseline_multitable.CatBoostMultitableRegressor(*, source_data: dict = {}, max_depth: int = 6, learning_rate: float = 0.03, bagging_temperature: float = 1, l2_leaf_reg: float = 3.0, one_hot_max_size: int = 2, iterations: int = 1000, thread_count: int = 1, source_fraction: float = 0.5, num_model: int = 1, val_size: float = 0.1, random_state: int = 0, n_jobs: int = 1)

Bases: GradientBoostingRegressorBase

CatBoost Multitable Regressor.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') CatBoostMultitableRegressor

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

selfobject

The updated object.

class carte_ai.src.baseline_multitable.GradientBoostingClassifierBase(*, source_data, source_fraction, num_model, val_size, random_state, n_jobs)

Bases: ClassifierMixin, GradientBoostingMultitableBase

Base class for Gradient Boosting Multitable Classifier.

decision_function(X)

Compute the decision function of X.

predict(X)

Predict classes for X.

Parameters

Xlist of graph objects with size (n_samples)

The input samples.

Returns

yndarray, shape (n_samples,)

The predicted classes.

predict_proba(X)

Predict class probabilities for X.

Parameters

Xlist of graph objects with size (n_samples)

The input samples.

Returns

pndarray, shape (n_samples,) for binary classification or (n_samples, n_classes)

The class probabilities of the input samples.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') GradientBoostingClassifierBase

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

selfobject

The updated object.

class carte_ai.src.baseline_multitable.GradientBoostingMultitableBase(*, source_data, source_fraction, num_model, val_size, random_state, n_jobs)

Bases: BaseEstimator

Base class for Gradient Boosting Multitable Estimator.

fit(X, y)

Fit the model.

Parameters

XPandas dataframe of the target dataset (n_samples)

The input samples.

yarray-like of shape (n_samples,)

Target values.

Returns

selfobject

Fitted estimator.

class carte_ai.src.baseline_multitable.GradientBoostingRegressorBase(*, source_data, source_fraction, num_model, val_size, random_state, n_jobs)

Bases: RegressorMixin, GradientBoostingMultitableBase

Base class for Gradient Boosting Multitable Regressor.

predict(X)

Predict values for X. Returns the average of predicted values over all the models.

Parameters

Xlist of graph objects with size (n_samples)

The input samples.

Returns

yndarray, shape (n_samples,)

The predicted values.
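The averaging step can be sketched as follows. This is an illustration under assumptions, not the package's code: `average_predictions` is a hypothetical helper, and the per-model predictions are made-up numbers standing in for the outputs of `num_model` fitted models.

```python
import numpy as np


def average_predictions(per_model_predictions):
    """Average predictions element-wise across models.

    per_model_predictions: sequence of 1-D arrays, one per fitted model,
    each of shape (n_samples,).
    """
    stacked = np.stack([np.asarray(p, dtype=float) for p in per_model_predictions])
    # Axis 0 indexes the models, so the mean keeps shape (n_samples,).
    return stacked.mean(axis=0)


# Three hypothetical models predicting for two samples.
preds = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
averaged = average_predictions(preds)
print(averaged)
```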

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') GradientBoostingRegressorBase

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

selfobject

The updated object.

class carte_ai.src.baseline_multitable.HistGBMultitableClassifier(*, source_data: dict = {}, learning_rate: float = 0.1, max_depth: None | int = None, max_leaf_nodes: int = 31, min_samples_leaf: int = 20, l2_regularization: float = 0, source_fraction: float = 0.5, num_model: int = 1, val_size: float = 0.1, random_state: int = 0, n_jobs: int = 1)

Bases: GradientBoostingClassifierBase

Base class for Histogram Gradient Boosting Multitable Classifier.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') HistGBMultitableClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

selfobject

The updated object.

class carte_ai.src.baseline_multitable.HistGBMultitableRegressor(*, source_data: dict = {}, learning_rate: float = 0.1, max_depth: None | int = None, max_leaf_nodes: int = 31, min_samples_leaf: int = 20, l2_regularization: float = 0, source_fraction: float = 0.5, num_model: int = 1, val_size: float = 0.1, random_state: int = 0, n_jobs: int = 1)

Bases: GradientBoostingRegressorBase

Base class for Histogram Gradient Boosting Multitable Regressor.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') HistGBMultitableRegressor

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

selfobject

The updated object.

class carte_ai.src.baseline_multitable.XGBoostMultitableClassifier(*, source_data: dict = {}, n_estimators: int = 100, max_depth: int = 6, min_child_weight: float = 1, subsample: float = 1, learning_rate: float = 0.3, colsample_bylevel: float = 1, colsample_bytree: float = 1, reg_gamma: float = 0, reg_lambda: float = 1, reg_alpha: float = 0, source_fraction: float = 0.5, num_model: int = 1, val_size: float = 0.1, random_state: int = 0, n_jobs: int = 1)

Bases: GradientBoostingClassifierBase

Base class for XGBoost Multitable Classifier.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') XGBoostMultitableClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

selfobject

The updated object.

class carte_ai.src.baseline_multitable.XGBoostMultitableRegressor(*, source_data: dict = {}, n_estimators: int = 100, max_depth: int = 6, min_child_weight: float = 1, subsample: float = 1, learning_rate: float = 0.3, colsample_bylevel: float = 1, colsample_bytree: float = 1, reg_gamma: float = 0, reg_lambda: float = 1, reg_alpha: float = 0, source_fraction: float = 0.5, num_model: int = 1, val_size: float = 0.1, random_state: int = 0, n_jobs: int = 1)

Bases: GradientBoostingRegressorBase

Base class for XGBoost Multitable Regressor.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') XGBoostMultitableRegressor

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

selfobject

The updated object.

carte_ai.src.baseline_singletable_nn module

Neural network baseline for comparison.

class carte_ai.src.baseline_singletable_nn.BaseMLPEstimator(*, hidden_dim: int = 256, num_layers: int = 2, dropout_prob: float = 0.2, learning_rate: float = 0.001, weight_decay: float = 0.01, batch_size: int = 128, val_size: float = 0.1, num_model: int = 1, max_epoch: int = 200, early_stopping_patience: None | int = 10, n_jobs: int = 1, device: str = 'cpu', random_state: int = 0, disable_pbar: bool = True)

Bases: MLPBase

Base class for MLP Estimator.

class carte_ai.src.baseline_singletable_nn.BaseRESNETEstimator(*, normalization: str | None = 'layernorm', num_layers: int = 4, hidden_dim: int = 256, hidden_factor: int = 2, hidden_dropout_prob: float = 0.2, residual_dropout_prob: float = 0.2, learning_rate: float = 0.001, weight_decay: float = 0.01, batch_size: int = 128, val_size: float = 0.1, num_model: int = 1, max_epoch: int = 200, early_stopping_patience: None | int = 10, n_jobs: int = 1, device: str = 'cpu', random_state: int = 0, disable_pbar: bool = True)

Bases: MLPBase

Base class for RESNET Estimator.

class carte_ai.src.baseline_singletable_nn.MLPBase(*, hidden_dim, learning_rate, weight_decay, batch_size, val_size, num_model, max_epoch, early_stopping_patience, n_jobs, device, random_state, disable_pbar)

Bases: BaseEstimator

Base class for MLP.

fit(X, y)

class carte_ai.src.baseline_singletable_nn.MLPClassifier(*, loss: str = 'binary_crossentropy', hidden_dim: int = 256, num_layers: int = 2, dropout_prob: float = 0.2, learning_rate: float = 0.001, weight_decay: float = 0.01, batch_size: int = 128, val_size: float = 0.1, num_model: int = 1, max_epoch: int = 200, early_stopping_patience: None | int = 10, n_jobs: int = 1, device: str = 'cpu', random_state: int = 0, disable_pbar: bool = True)

Bases: ClassifierMixin, BaseMLPEstimator

decision_function(X)

predict(X)

predict_proba(X)

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') MLPClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

selfobject

The updated object.

class carte_ai.src.baseline_singletable_nn.MLPRegressor(*, loss: str = 'squared_error', hidden_dim: int = 256, num_layers: int = 2, dropout_prob: float = 0.2, learning_rate: float = 0.001, weight_decay: float = 0.01, batch_size: int = 128, val_size: float = 0.1, num_model: int = 1, max_epoch: int = 200, early_stopping_patience: None | int = 10, n_jobs: int = 1, device: str = 'cpu', random_state: int = 0, disable_pbar: bool = True)

Bases: RegressorMixin, BaseMLPEstimator

predict(X)

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') MLPRegressor

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

selfobject

The updated object.

class carte_ai.src.baseline_singletable_nn.MLP_Model(input_dim: int, hidden_dim: int, output_dim: int, dropout_prob: float, num_layers: int)

Bases: Module

forward(X)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
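The note above can be checked with a small sketch, assuming PyTorch is installed. `nn.Linear` stands in for the model classes documented here (an assumption made to keep the snippet self-contained); the behaviour shown applies to any `Module` subclass.

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)
calls = []

# A forward hook receives (module, inputs, output) after each hooked forward pass.
handle = layer.register_forward_hook(
    lambda module, inputs, output: calls.append(tuple(output.shape))
)

x = torch.zeros(3, 4)
layer(x)          # calling the instance runs the registered hook
layer.forward(x)  # calling forward directly bypasses the hook
handle.remove()

print(calls)
```

Only the first call is recorded, which is why the instance should be called rather than `forward`.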

class carte_ai.src.baseline_singletable_nn.RESNETClassifier(*, loss: str = 'binary_crossentropy', normalization: str | None = 'layernorm', num_layers: int = 4, hidden_dim: int = 256, hidden_factor: int = 2, hidden_dropout_prob: float = 0.2, residual_dropout_prob: float = 0.2, learning_rate: float = 0.001, weight_decay: float = 0.01, batch_size: int = 128, val_size: float = 0.1, num_model: int = 1, max_epoch: int = 200, early_stopping_patience: None | int = 10, n_jobs: int = 1, device: str = 'cpu', random_state: int = 0, disable_pbar: bool = True)

Bases: ClassifierMixin, BaseRESNETEstimator

decision_function(X)

predict(X)

predict_proba(X)

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') RESNETClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

selfobject

The updated object.

class carte_ai.src.baseline_singletable_nn.RESNETRegressor(*, loss: str = 'squared_error', normalization: str | None = 'layernorm', num_layers: int = 4, hidden_dim: int = 256, hidden_factor: int = 2, hidden_dropout_prob: float = 0.2, residual_dropout_prob: float = 0.2, learning_rate: float = 0.001, weight_decay: float = 0.01, batch_size: int = 128, val_size: float = 0.1, num_model: int = 1, max_epoch: int = 200, early_stopping_patience: None | int = 10, n_jobs: int = 1, device: str = 'cpu', random_state: int = 0, disable_pbar: bool = True)

Bases: RegressorMixin, BaseRESNETEstimator

predict(X)

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') RESNETRegressor

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

selfobject

The updated object.

class carte_ai.src.baseline_singletable_nn.RESNET_Model(input_dim: int, hidden_dim: int, output_dim: int, num_layers: int, **block_args)

Bases: Module

forward(X)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class carte_ai.src.baseline_singletable_nn.Residual_Block(input_dim: int, output_dim: int, hidden_factor: int, normalization: str | None = 'layernorm', hidden_dropout_prob: float = 0.2, residual_dropout_prob: float = 0.2)

Bases: Module

forward(x: Tensor)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

reset_parameters() None

class carte_ai.src.baseline_singletable_nn.TabularDataset(X, y)

Bases: Dataset

carte_ai.src.carte_gridsearch module

Custom grid search used for the CARTE-GNN model.

carte_ai.src.carte_gridsearch.carte_gridsearch(estimator, X_train: list, y_train: array, param_distributions: dict, refit: bool = True, n_jobs: int = 1)

CARTE grid search.

This function runs grid search for CARTE GNN models.

Parameters

estimatorCARTE estimator

The CARTE estimator used for grid search

X_trainlist

The list of graph objects for the train data transformed using Table2GraphTransformer

y_trainnumpy array of shape (n_samples,)

The target variable of the train data.

param_distributions: dict

The dictionary of parameter grids to search for the optimal parameters.

refit: bool, default=True

Indicates whether to return a refitted estimator with the best parameter.

n_jobs: int, default=1

Number of jobs to run in parallel. Training the estimator in the grid search is parallelized over the parameter grid.

Returns

ResultPandas DataFrame

The result of each parameter grid.

best_paramsdict

The dictionary of best parameters obtained through grid search.

best_estimatorCARTEGNN estimator

The CARTE estimator trained using the best_params if refit is set to True.
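The search over a parameter grid can be sketched in plain Python. This is a simplified illustration under assumptions, not the implementation of `carte_gridsearch`: `iter_param_grid`, `grid_search`, and `toy_score` are hypothetical names, the scoring function stands in for fitting and validating an estimator, and the parallelism, DataFrame result, and refitting described above are omitted.

```python
from itertools import product


def iter_param_grid(param_distributions):
    """Yield one dict per combination of the listed parameter values."""
    names = sorted(param_distributions)
    for values in product(*(param_distributions[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(score_fn, param_distributions):
    """Return (results, best_params), treating higher scores as better."""
    results = [
        (params, score_fn(params)) for params in iter_param_grid(param_distributions)
    ]
    best_params, _ = max(results, key=lambda item: item[1])
    return results, best_params


# Hypothetical scoring function standing in for "fit and validate an estimator".
def toy_score(params):
    return -abs(params["learning_rate"] - 0.01) - 0.1 * params["dropout"]


grid = {"learning_rate": [0.001, 0.01, 0.1], "dropout": [0.0, 0.2]}
results, best_params = grid_search(toy_score, grid)
print(len(results), best_params)
```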

carte_ai.src.carte_model module

class carte_ai.src.carte_model.CARTE_Attention(input_dim: int, output_dim: int, num_heads: int = 1, concat: bool = True, read_out: bool = False)

Bases: Module

forward(x: Tensor, edge_index: Tensor, edge_attr: Tensor, return_attention: bool = False)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

reset_parameters()

class carte_ai.src.carte_model.CARTE_Base(input_dim_x: int, input_dim_e: int, hidden_dim: int, num_layers: int, **block_args)

Bases: Module

forward(x, edge_index, edge_attr, return_attention=False)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class carte_ai.src.carte_model.CARTE_Block(input_dim: int, ff_dim: int, num_heads: int = 1, concat: bool = True, dropout: float = 0.1, read_out: bool = False)

Bases: Module

forward(x: Tensor, edge_index: Tensor, edge_attr: Tensor)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class carte_ai.src.carte_model.CARTE_Contrast

Bases: Module

forward(x: Tensor)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class carte_ai.src.carte_model.CARTE_NN_Model(input_dim_x: int, input_dim_e: int, hidden_dim: int, output_dim: int, num_layers: int, **block_args)

Bases: Module

forward(input)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class carte_ai.src.carte_model.CARTE_NN_Model_Ablation(ablation_method: str, input_dim_x: int, input_dim_e: int, hidden_dim: int, output_dim: int, num_layers: int, **block_args)

Bases: Module

forward(input)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class carte_ai.src.carte_model.CARTE_Pretrain(input_dim_x: int, input_dim_e: int, hidden_dim: int, num_layers: int, **block_args)

Bases: Module

forward(input)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Table2GraphTransformer.transform(X, y=None)

Apply Table2GraphTransformer to each row of the data.

Parameters

Xpandas.DataFrame

Input data to transform.

yarray-like, optional

Target values, by default None.

Returns

data_graphlist

List of transformed graph objects.

carte_ai.src.evaluate_utils module

carte_ai.src.evaluate_utils.check_pred_output(y_train, y_pred)

Replace NaN values in the predicted output with the mean of the train data.
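A minimal sketch of this kind of NaN handling, assuming NumPy arrays as inputs. `fill_nan_with_train_mean` is a hypothetical helper written for illustration; it is not the body of `check_pred_output`.

```python
import numpy as np


def fill_nan_with_train_mean(y_train, y_pred):
    """Return a copy of y_pred with NaN entries set to the mean of y_train."""
    y_pred = np.asarray(y_pred, dtype=float).copy()
    nan_mask = np.isnan(y_pred)
    if nan_mask.any():
        y_pred[nan_mask] = np.mean(y_train)
    return y_pred


y_train = np.array([1.0, 2.0, 3.0])
y_pred = np.array([0.5, np.nan, 2.5])
filled = fill_nan_with_train_mean(y_train, y_pred)
print(filled)
```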

carte_ai.src.evaluate_utils.col_names_per_type(data, target_name)

Extract column names per type.

carte_ai.src.evaluate_utils.extract_best_params(data_name, method, num_train, random_state)

Extract the best parameters in the CARTE paper.

carte_ai.src.evaluate_utils.reshape_pred_output(y_pred)

Reshape the predictive output accordingly.

carte_ai.src.evaluate_utils.return_score(y_target, y_pred, task)

Return score results for given task.

carte_ai.src.evaluate_utils.set_score_criterion(task)

Set scoring method for CV and score criterion in final result.

carte_ai.src.evaluate_utils.set_split(data, data_config, num_train, random_state)

Set train/test split given the random state.

carte_ai.src.evaluate_utils.shorten_param(param_name)

Shorten the param_names for column names in search results.

carte_ai.src.preprocess_utils module

Functions used for preprocessing the data.

carte_ai.src.preprocess_utils.extract_fasttext_features(data: DataFrame, extract_col_name: str)

carte_ai.src.preprocess_utils.extract_ken_features(data: DataFrame, extract_col_name: str)

carte_ai.src.preprocess_utils.extract_llm_features(data: DataFrame, extract_col_name: str, device: str = 'cuda:0')

carte_ai.src.preprocess_utils.table2llmfeatures(data: DataFrame, embed_numeric: bool, device: str = 'cuda:0')

carte_ai.src.visualization_utils module

Functions for visualization. The critical difference diagram code is adapted from scikit-posthocs.

carte_ai.src.visualization_utils.critical_difference_diagram(ranks: dict | Series, sig_matrix: DataFrame, *, ax: SubplotBase | None = None, label_fmt_left: str = '{label} ({rank:.2g})', label_fmt_right: str = '({rank:.2g}) {label}', label_props: dict | None = None, marker_props: dict | None = None, elbow_props: dict | None = None, crossbar_props: dict | None = None, color_palette: Dict[str, str] | List = {}, line_style: Dict[str, str] | List = {}, text_h_margin: float = 0.01) Dict[str, list]

carte_ai.src.visualization_utils.generate_df_cdd(df_normalized, train_size='all')

carte_ai.src.visualization_utils.prepare_result(task, models='all', rank_at=2048)

carte_ai.src.visualization_utils.sign_array(p_values: List | ndarray, alpha: float = 0.05) ndarray

carte_ai.src.visualization_utils.sign_plot(x: List | ndarray | DataFrame, g: List | ndarray | None = None, flat: bool = False, labels: bool = True, cmap: List | None = None, cbar_ax_bbox: List | None = None, ax: SubplotBase | None = None, **kwargs) SubplotBase | Tuple[SubplotBase, Colorbar]

carte_ai.src.visualization_utils.sign_table(p_values: List | ndarray | DataFrame, lower: bool = True, upper: bool = True) DataFrame | ndarray

Module contents