carte_ai.src package

Submodules

carte_ai.src.carte_table_to_graph module

class carte_ai.src.carte_table_to_graph.Table2GraphTransformer(*, include_edge_attr: bool = True, lm_model: str = 'fasttext', n_components: int = 300, n_jobs: int = 1, fasttext_model_path: str | None = None)

Bases: TransformerMixin, BaseEstimator

Transformer from tables to a list of graphs.

Parameters

include_edge_attr (bool, optional): Whether to include edge attributes, by default True.
lm_model (str, optional): Language model to use, by default 'fasttext'.
n_components (int, optional): Number of components for the encoder, by default 300.
n_jobs (int, optional): Number of jobs for parallel processing, by default 1.
fasttext_model_path (str, optional): Path to the FastText model file; required if lm_model is 'fasttext'.
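
To make "tables to a list of graphs" concrete, the sketch below turns each row of a small table into one star-shaped graph: a center node for the row, one leaf node per non-missing cell, and the column name attached to the connecting edge. This is a conceptual toy in plain Python, not the library's implementation. The helper `row_to_star_graph` and the dictionary layout are hypothetical, and the language-model embedding of names and values is left out entirely.

```python
from typing import Any, Dict, List, Optional


def row_to_star_graph(row: Dict[str, Optional[Any]], include_edge_attr: bool = True) -> Dict[str, list]:
    """Build a toy star graph from one table row (hypothetical helper)."""
    nodes: List[Any] = ["<row>"]  # node 0 is the center node standing for the row
    edges: List[tuple] = []
    edge_attr: List[str] = []
    for column, value in row.items():
        if value is None:  # missing cells produce no leaf node
            continue
        nodes.append(value)
        edges.append((0, len(nodes) - 1))  # connect the center to the new leaf
        if include_edge_attr:
            edge_attr.append(column)  # the column name labels the edge
    return {"nodes": nodes, "edges": edges, "edge_attr": edge_attr}


table = [
    {"name": "Alice", "city": "Paris", "age": 31},
    {"name": "Bob", "city": None, "age": 45},
]
graphs = [row_to_star_graph(row) for row in table]
print(len(graphs), graphs[1]["edges"])
```

One graph per row is what makes the output a list with n_samples entries, which is the input shape the estimators below expect.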

fit(X, y=None)

Fit function used for the Table2GraphTransformer.

Parameters

X (pandas.DataFrame): Input data to fit.
y (array-like, optional): Target values, by default None.

Returns

self (Table2GraphTransformer): Fitted transformer.

Example Usage:

    import fasttext
    from huggingface_hub import hf_hub_download

    from carte_ai.src.carte_table_to_graph import Table2GraphTransformer

    # Download the FastText model from the HuggingFace Hub
    model_path = hf_hub_download("hi-paris/fastText", "cc.en.300.bin")

    # Initialize the Table2GraphTransformer
    preprocessor = Table2GraphTransformer(fasttext_model_path=model_path)

    # View the transformer details
    help(Table2GraphTransformer)

    # Fit and transform the training data
    # (X_train is a pandas.DataFrame and y_train the matching targets)
    X_train = preprocessor.fit_transform(X_train, y=y_train)

    # Transform the test data
    X_test = preprocessor.transform(X_test)

carte_ai.src.carte_estimator module

CARTE estimators for regression and classification.

class carte_ai.src.carte_estimator.BaseCARTEEstimator(*, num_layers, load_pretrain, freeze_pretrain, learning_rate, batch_size, max_epoch, dropout, val_size, cross_validate, early_stopping_patience, num_model, random_state, n_jobs, device, disable_pbar, pretrained_model_path='/home/infres/gbrison/fcclip_v2/miniconda3/envs/carte/lib/python3.10/site-packages/carte_ai/data/etc/kg_pretrained.pt')

Bases: BaseEstimator

Base class for CARTE Estimator.

fit(X, y)

Fit the CARTE model.

Parameters

X (list of graph objects with size (n_samples)): The input samples.
y (array-like of shape (n_samples,)): Target values.

Returns

self (object): Fitted estimator.

Example Usage:

    from sklearn.metrics import r2_score

    from carte_ai.src.carte_estimator import CARTERegressor

    # Define some parameters
    fixed_params = dict()
    fixed_params["num_model"] = 10  # 10 models for the bagging strategy
    fixed_params["disable_pbar"] = False  # set to True to hide the progress bars
    fixed_params["random_state"] = 0
    fixed_params["device"] = "cpu"
    fixed_params["n_jobs"] = 10
    # config_directory is not defined in this snippet; it must map
    # "pretrained_model" to the path of the pretrained weights
    fixed_params["pretrained_model_path"] = config_directory["pretrained_model"]

    # Define the estimator and run fit/predict
    estimator = CARTERegressor(**fixed_params)  # CARTERegressor for regression
    estimator.fit(X=X_train, y=y_train)
    y_pred = estimator.predict(X_test)

    # Obtain the R2 score on predictions
    score = r2_score(y_test, y_pred)
    print("\nThe R2 score for CARTE:", "{:.4f}".format(score))
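
For readers unfamiliar with the metric reported at the end of this example, the R2 score is the coefficient of determination: one minus the ratio of the residual sum of squares to the total sum of squares. The plain-Python sketch below computes it for a single target with non-constant true values; the function name `r2` is illustrative and is not part of carte_ai.

```python
from typing import Sequence


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination for one target (assumes y_true is not constant)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return 1.0 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0]
perfect = r2(y_true, [3.0, 5.0, 7.0])   # predictions match exactly
baseline = r2(y_true, [5.0, 5.0, 5.0])  # always predicting the mean
print(perfect, baseline)
```

A perfect fit scores 1, always predicting the mean scores 0, and worse-than-mean predictions give negative values.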

class carte_ai.src.carte_estimator.BaseCARTEMultitableEstimator(*, source_data, num_layers, load_pretrain, freeze_pretrain, learning_rate, batch_size, max_epoch, dropout, val_size, target_fraction, early_stopping_patience, num_model, random_state, n_jobs, device, disable_pbar, pretrained_model_path)

Bases: BaseCARTEEstimator

Base class for CARTE Multitable Estimator.

fit(X, y)

Fit the CARTE Multitable model.

Parameters

X (list of graph objects with size (n_samples)): The input samples of the target data.
y (array-like of shape (n_samples,)): Target values.

Returns

self (object): Fitted estimator.

class carte_ai.src.carte_estimator.CARTEClassifier(*, loss: str = 'binary_crossentropy', scoring: str = 'auroc', num_layers: int = 1, load_pretrain: bool = True, freeze_pretrain: bool = True, learning_rate: float = 0.001, batch_size: int = 16, max_epoch: int = 500, dropout: float = 0, val_size: float = 0.2, cross_validate: bool = False, early_stopping_patience: None | int = 40, num_model: int = 1, random_state: int = 0, n_jobs: int = 1, device: str = 'cpu', disable_pbar: bool = True, pretrained_model_path='/home/infres/gbrison/fcclip_v2/miniconda3/envs/carte/lib/python3.10/site-packages/carte_ai/data/etc/kg_pretrained.pt')

Bases: ClassifierMixin, BaseCARTEEstimator

CARTE Classifier for Classification tasks.

This estimator is a GNN-based model compatible with the CARTE pretrained model.

Parameters

loss ({'binary_crossentropy', 'categorical_crossentropy'}, default='binary_crossentropy'): The loss function used for backpropagation.
scoring ({'auroc', 'auprc', 'binary_entropy'}, default='auroc'): The scoring function used for validation.
num_layers (int, default=1): The number of layers for the NN model.
load_pretrain (bool, default=True): Whether to load the pretrained weights.
freeze_pretrain (bool, default=True): Whether to freeze the pretrained weights during training.
learning_rate (float, default=1e-3): The learning rate of the model. The model uses AdamW as the optimizer.
batch_size (int, default=16): The batch size used for training.
max_epoch (int or None, default=500): The maximum number of epochs for training.
dropout (float, default=0): The dropout rate for training.
val_size (float, default=0.2): The size of the validation set used for early stopping.
cross_validate (bool, default=False): Whether to use a cross-validation strategy for the train/validation split.
early_stopping_patience (int or None, default=40): The early stopping patience when early stopping is used. If set to None, no early stopping is employed.
num_model (int, default=1): The total number of models used for the bagging strategy.
random_state (int or None, default=0): Pseudo-random number generator seed that controls the train/validation data split if early stopping is enabled, the weight initialization, and the dropout. Pass an int for reproducible output across multiple function calls.
n_jobs (int, default=1): Number of jobs to run in parallel. Training the estimator and computing the score are parallelized over the number of models.
device ({"cpu", "gpu"}, default="cpu"): The device used for the estimator.
disable_pbar (bool, default=True): Whether to disable the progress bars for the training process.
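
The interaction between max_epoch and early_stopping_patience can be pictured with the generic early-stopping loop below. It is a minimal sketch of the usual technique, assuming that "patience" counts consecutive epochs without an improvement in validation loss; it is not taken from the CARTE source, and the function name and the list of precomputed losses are illustrative.

```python
from typing import Optional, Sequence, Tuple


def run_early_stopping(
    val_losses: Sequence[float], patience: Optional[int], max_epoch: Optional[int]
) -> Tuple[int, int]:
    """Return (best_epoch, epochs_run) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    stale = 0  # consecutive epochs without improvement
    epochs_run = 0
    for epoch, loss in enumerate(val_losses[:max_epoch]):
        epochs_run = epoch + 1
        if loss < best_loss:
            best_loss, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1
        # patience=None disables early stopping, as in the parameter description
        if patience is not None and stale >= patience:
            break
    return best_epoch, epochs_run


losses = [1.0, 0.8, 0.9, 0.85, 0.95, 0.7]
stopped = run_early_stopping(losses, patience=3, max_epoch=500)
full = run_early_stopping(losses, patience=None, max_epoch=500)
print(stopped, full)
```

With a patience of 3 the loop stops after three stale epochs and never reaches the later, better epoch, which is the trade-off a small patience value makes.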

decision_function(X)

Compute the decision function of X.

Parameters

X (list of graph objects with size (n_samples)): The input samples.

Returns

decision : ndarray, shape (n_samples,)

predict(X)

Predict classes for X.

Parameters

X (list of graph objects with size (n_samples)): The input samples.

Returns

y (ndarray, shape (n_samples,)): The predicted classes.

predict_proba(X)

Predict class probabilities for X.

Parameters

X (list of graph objects with size (n_samples)): The input samples.

Returns

p (ndarray, shape (n_samples,) for binary classification or (n_samples, n_classes)): The class probabilities of the input samples.
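
Because the returned probabilities are one-dimensional for binary problems and two-dimensional otherwise, downstream code has to handle both shapes. The sketch below shows one way to turn either shape into class labels. The 0.5 threshold for the binary case and the helper name `classes_from_proba` are assumptions made for illustration, and plain lists stand in for ndarrays.

```python
from typing import List, Sequence, Union


def classes_from_proba(proba: Sequence[Union[float, Sequence[float]]]) -> List[int]:
    """Map probabilities to class indices for both documented output shapes."""
    if len(proba) > 0 and isinstance(proba[0], (list, tuple)):
        # (n_samples, n_classes): pick the most probable class per row
        return [max(range(len(row)), key=row.__getitem__) for row in proba]
    # (n_samples,): probability of the positive class, thresholded at 0.5
    return [int(p >= 0.5) for p in proba]


binary = classes_from_proba([0.2, 0.7])
multiclass = classes_from_proba([[0.1, 0.6, 0.3], [0.8, 0.1, 0.1]])
print(binary, multiclass)
```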

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → CARTEClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
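
The four options listed above can be summarized by a small decision function. The sketch below is a toy model of the described behaviour, not scikit-learn's routing code: `route_sample_weight` is a hypothetical name, `provided` stands for the metadata a caller hands to a meta-estimator, and a plain `ValueError` stands in for the error scikit-learn raises.

```python
from typing import Any, Dict, Union


def route_sample_weight(request: Union[bool, None, str], provided: Dict[str, Any]) -> Dict[str, Any]:
    """Toy model of the four request options; returns the kwargs forwarded to score."""
    if request is True:
        # Requested: forwarded when the caller supplied it, skipped otherwise.
        if "sample_weight" in provided:
            return {"sample_weight": provided["sample_weight"]}
        return {}
    if request is False:
        # Not requested: never forwarded.
        return {}
    if request is None:
        # Not requested, and supplying it anyway is an error.
        if "sample_weight" in provided:
            raise ValueError("sample_weight was passed but is not requested")
        return {}
    # A string is an alias: the caller supplies the value under that name.
    if request in provided:
        return {"sample_weight": provided[request]}
    return {}


print(route_sample_weight(True, {"sample_weight": [1.0, 2.0]}))
print(route_sample_weight("weights", {"weights": [0.5, 0.5]}))
```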

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for sample_weight parameter in score.

Returns

self (object): The updated object.

class carte_ai.src.carte_estimator.CARTEMultitableClassifer(*, loss: str = 'binary_crossentropy', scoring: str = 'auroc', source_data: dict = {}, num_layers: int = 1, load_pretrain: bool = True, freeze_pretrain: bool = True, learning_rate: float = 0.001, batch_size: int = 16, max_epoch: int = 500, dropout: float = 0, val_size: float = 0.2, target_fraction: float = 0.125, early_stopping_patience: None | int = 40, num_model: int = 1, random_state: int = 0, n_jobs: int = 1, device: str = 'cpu', disable_pbar: bool = True, pretrained_model_path='/home/infres/gbrison/fcclip_v2/miniconda3/envs/carte/lib/python3.10/site-packages/carte_ai/data/etc/kg_pretrained.pt')

Bases: ClassifierMixin, BaseCARTEMultitableEstimator

CARTE Multitable Classifier for Classification tasks.

This estimator is a GNN-based model compatible with the CARTE pretrained model.

Parameters

loss ({'binary_crossentropy', 'categorical_crossentropy'}, default='binary_crossentropy'): The loss function used for backpropagation.
scoring ({'auroc', 'auprc', 'binary_entropy'}, default='auroc'): The scoring function used for validation.
source_data (dict, default={}): The source data used in the multitable estimator.
num_layers (int, default=1): The number of layers for the NN model.
load_pretrain (bool, default=True): Whether to load the pretrained weights.
freeze_pretrain (bool, default=True): Whether to freeze the pretrained weights during training.
learning_rate (float, default=1e-3): The learning rate of the model. The model uses AdamW as the optimizer.
batch_size (int, default=16): The batch size used for training.
max_epoch (int or None, default=500): The maximum number of epochs for training.
dropout (float, default=0): The dropout rate for training.
val_size (float, default=0.2): The size of the validation set used for early stopping.
target_fraction (float, default=0.125): The fraction of target data inside a batch during training.
early_stopping_patience (int or None, default=40): The early stopping patience when early stopping is used. If set to None, no early stopping is employed.
num_model (int, default=1): The total number of models used for the bagging strategy.
random_state (int or None, default=0): Pseudo-random number generator seed that controls the train/validation data split if early stopping is enabled, the weight initialization, and the dropout. Pass an int for reproducible output across multiple function calls.
n_jobs (int, default=1): Number of jobs to run in parallel. Training the estimator and computing the score are parallelized over the number of models.
device ({"cpu", "gpu"}, default="cpu"): The device used for the estimator.
disable_pbar (bool, default=True): Whether to disable the progress bars for the training process.
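
As a quick sanity check on how batch_size and target_fraction combine, the arithmetic below splits one batch into target and source samples. Truncating to an integer is an assumption of this sketch, since the rounding rule is not stated here, and `split_batch` is an illustrative name.

```python
from typing import Tuple


def split_batch(batch_size: int, target_fraction: float) -> Tuple[int, int]:
    """Return (n_target, n_source) for one batch, truncating the target share."""
    n_target = int(batch_size * target_fraction)  # assumption: truncate, do not round
    n_source = batch_size - n_target
    return n_target, n_source


# With the documented defaults batch_size=16 and target_fraction=0.125,
# each batch holds 2 target samples and 14 source samples.
n_target, n_source = split_batch(16, 0.125)
print(n_target, n_source)
```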

decision_function(X)

Compute the decision function of X.

Parameters

X (list of graph objects with size (n_samples)): The input samples.

Returns

decision : ndarray, shape (n_samples,)

predict(X)

Predict classes for X.

Parameters

X (list of graph objects with size (n_samples)): The input samples.

Returns

y (ndarray, shape (n_samples,)): The predicted classes.

predict_proba(X)

Predict class probabilities for X.

Parameters

X (list of graph objects with size (n_samples)): The input samples.

Returns

p (ndarray, shape (n_samples,) for binary classification or (n_samples, n_classes)): The class probabilities of the input samples.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → CARTEMultitableClassifer

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for sample_weight parameter in score.

Returns

self (object): The updated object.

class carte_ai.src.carte_estimator.CARTEMultitableRegressor(*, loss: str = 'squared_error', scoring: str = 'r2_score', source_data: dict = {}, num_layers: int = 1, load_pretrain: bool = True, freeze_pretrain: bool = True, learning_rate: float = 0.001, batch_size: int = 16, max_epoch: int = 500, dropout: float = 0, val_size: float = 0.2, target_fraction: float = 0.125, early_stopping_patience: None | int = 40, num_model: int = 1, random_state: int = 0, n_jobs: int = 1, device: str = 'cpu', disable_pbar: bool = True, pretrained_model_path='/home/infres/gbrison/fcclip_v2/miniconda3/envs/carte/lib/python3.10/site-packages/carte_ai/data/etc/kg_pretrained.pt')

Bases: RegressorMixin, BaseCARTEMultitableEstimator

CARTE Multitable Regressor for Regression tasks.

This estimator is a GNN-based model compatible with the CARTE pretrained model.

Parameters

loss ({'squared_error', 'absolute_error'}, default='squared_error'): The loss function used for backpropagation.
scoring ({'r2_score', 'squared_error'}, default='r2_score'): The scoring function used for validation.
source_data (dict, default={}): The source data used in the multitable estimator.
num_layers (int, default=1): The number of layers for the NN model.
load_pretrain (bool, default=True): Whether to load the pretrained weights.
freeze_pretrain (bool, default=True): Whether to freeze the pretrained weights during training.
learning_rate (float, default=1e-3): The learning rate of the model. The model uses AdamW as the optimizer.
batch_size (int, default=16): The batch size used for training.
max_epoch (int or None, default=500): The maximum number of epochs for training.
dropout (float, default=0): The dropout rate for training.
val_size (float, default=0.2): The size of the validation set used for early stopping.
target_fraction (float, default=0.125): The fraction of target data inside a batch during training.
early_stopping_patience (int or None, default=40): The early stopping patience when early stopping is used. If set to None, no early stopping is employed.
num_model (int, default=1): The total number of models used for the bagging strategy.
random_state (int or None, default=0): Pseudo-random number generator seed that controls the train/validation data split if early stopping is enabled, the weight initialization, and the dropout. Pass an int for reproducible output across multiple function calls.
n_jobs (int, default=1): Number of jobs to run in parallel. Training the estimator and computing the score are parallelized over the number of models.
device ({"cpu", "gpu"}, default="cpu"): The device used for the estimator.
disable_pbar (bool, default=True): Whether to disable the progress bars for the training process.

predict(X)

Predict values for X.

Returns the weighted average of the single-table model and all pairwise models, each trained with one source.

Parameters

X (list of graph objects with size (n_samples)): The input samples.

Returns

y (ndarray, shape (n_samples,)): The predicted values.
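
The weighted average mentioned for this predict method can be sketched as follows. How the weights are obtained is not documented here, so the weights in this example are arbitrary placeholders; the function name is illustrative and plain lists stand in for ndarrays.

```python
from typing import List, Sequence


def weighted_average(predictions: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Combine per-model predictions (one row per model) into one prediction per sample."""
    total = sum(weights)
    n_samples = len(predictions[0])
    return [
        sum(w * model_pred[i] for w, model_pred in zip(weights, predictions)) / total
        for i in range(n_samples)
    ]


single_table = [1.0, 2.0]     # predictions of the single-table model
with_one_source = [3.0, 4.0]  # predictions of one pairwise (target + one source) model
combined = weighted_average([single_table, with_one_source], weights=[1.0, 3.0])
print(combined)
```

Dividing by the sum of the weights keeps the result on the scale of the individual predictions even when the weights do not sum to one.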

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → CARTEMultitableRegressor

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for sample_weight parameter in score.

Returns

self (object): The updated object.

class carte_ai.src.carte_estimator.CARTERegressor(*, loss: str = 'squared_error', scoring: str = 'r2_score', num_layers: int = 1, load_pretrain: bool = True, freeze_pretrain: bool = True, learning_rate: float = 0.001, batch_size: int = 16, max_epoch: int = 500, dropout: float = 0, val_size: float = 0.2, cross_validate: bool = False, early_stopping_patience: None | int = 40, num_model: int = 1, random_state: int = 0, n_jobs: int = 1, device: str = 'cpu', disable_pbar: bool = True, pretrained_model_path='/home/infres/gbrison/fcclip_v2/miniconda3/envs/carte/lib/python3.10/site-packages/carte_ai/data/etc/kg_pretrained.pt')

Bases: RegressorMixin, BaseCARTEEstimator

CARTE Regressor for Regression tasks.

This estimator is a GNN-based model compatible with the CARTE pretrained model.

Parameters

loss ({'squared_error', 'absolute_error'}, default='squared_error'): The loss function used for backpropagation.
scoring ({'r2_score', 'squared_error'}, default='r2_score'): The scoring function used for validation.
num_layers (int, default=1): The number of layers for the NN model.
load_pretrain (bool, default=True): Whether to load the pretrained weights.
freeze_pretrain (bool, default=True): Whether to freeze the pretrained weights during training.
learning_rate (float, default=1e-3): The learning rate of the model. The model uses AdamW as the optimizer.
batch_size (int, default=16): The batch size used for training.
max_epoch (int or None, default=500): The maximum number of epochs for training.
dropout (float, default=0): The dropout rate for training.
val_size (float, default=0.2): The size of the validation set used for early stopping.
cross_validate (bool, default=False): Whether to use a cross-validation strategy for the train/validation split.
early_stopping_patience (int or None, default=40): The early stopping patience when early stopping is used. If set to None, no early stopping is employed.
num_model (int, default=1): The total number of models used for the bagging strategy.
random_state (int or None, default=0): Pseudo-random number generator seed that controls the train/validation data split if early stopping is enabled, the weight initialization, and the dropout. Pass an int for reproducible output across multiple function calls.
n_jobs (int, default=1): Number of jobs to run in parallel. Training the estimator and computing the score are parallelized over the number of models.
device ({"cpu", "gpu"}, default="cpu"): The device used for the estimator.
disable_pbar (bool, default=True): Whether to disable the progress bars for the training process.

predict(X)

Predict values for X. Returns the average of predicted values over all the models.

Parameters

X (list of graph objects with size (n_samples)): The input samples.

Returns

y (ndarray, shape (n_samples,)): The predicted values.
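
The way num_model, random_state, n_jobs and the averaging in predict fit together can be pictured with the toy bagging loop below. It is a sketch under stated assumptions, not the CARTE training code: each "model" is just the mean of its training targets, seeds are derived as random_state + i, and a thread pool stands in for whatever parallel backend the library uses.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def fit_one(seed: int, y: Sequence[float], val_size: float) -> float:
    """Fit one toy model on a seeded train/validation split; returns its constant prediction."""
    rng = random.Random(seed)
    indices = list(range(len(y)))
    rng.shuffle(indices)
    n_val = int(len(y) * val_size)  # held-out part, unused by this toy model
    train_idx = indices[n_val:]
    return sum(y[i] for i in train_idx) / len(train_idx)


def bagged_predict(y: Sequence[float], num_model: int, random_state: int, n_jobs: int, val_size: float = 0.2) -> float:
    """Train num_model toy models in parallel and average their predictions."""
    seeds: List[int] = [random_state + i for i in range(num_model)]  # assumption: one seed per model
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        per_model = list(pool.map(lambda s: fit_one(s, y, val_size), seeds))
    return sum(per_model) / len(per_model)


targets = [float(v) for v in range(10)]
print(bagged_predict(targets, num_model=3, random_state=0, n_jobs=2))
```

Because every model gets its own seed, the result depends only on random_state and not on how many workers run the models.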

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → CARTERegressor

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for sample_weight parameter in score.

Returns

self (object): The updated object.

class carte_ai.src.carte_estimator.CARTE_AblationClassifier(*, ablation_method: str = 'exclude-edge', loss: str = 'binary_crossentropy', scoring: str = 'auroc', num_layers: int = 1, load_pretrain: bool = False, freeze_pretrain: bool = False, learning_rate: float = 0.001, batch_size: int = 16, max_epoch: int = 500, dropout: float = 0, val_size: float = 0.2, cross_validate: bool = False, early_stopping_patience: None | int = 40, num_model: int = 1, random_state: int = 0, n_jobs: int = 1, device: str = 'cpu', disable_pbar: bool = True, pretrained_model_path='/home/infres/gbrison/fcclip_v2/miniconda3/envs/carte/lib/python3.10/site-packages/carte_ai/data/etc/kg_pretrained.pt')

Bases: CARTEClassifier

CARTE Ablation Classifier for Classification tasks.

This estimator is a GNN-based model compatible with the CARTE pretrained model. Note that this is an implementation for the ablation study of CARTE.

Parameters

ablation_method ({'exclude-edge', 'exclude-attention', 'exclude-attention-edge'}, default='exclude-edge'): The ablation method for CARTE Estimators.
loss ({'binary_crossentropy', 'categorical_crossentropy'}, default='binary_crossentropy'): The loss function used for backpropagation.
scoring ({'auroc', 'auprc', 'binary_entropy'}, default='auroc'): The scoring function used for validation.
num_layers (int, default=1): The number of layers for the NN model.
load_pretrain (bool, default=False): Whether to load the pretrained weights.
freeze_pretrain (bool, default=False): Whether to freeze the pretrained weights during training.
learning_rate (float, default=1e-3): The learning rate of the model. The model uses AdamW as the optimizer.
batch_size (int, default=16): The batch size used for training.
max_epoch (int or None, default=500): The maximum number of epochs for training.
dropout (float, default=0): The dropout rate for training.
val_size (float, default=0.2): The size of the validation set used for early stopping.
cross_validate (bool, default=False): Whether to use a cross-validation strategy for the train/validation split.
early_stopping_patience (int or None, default=40): The early stopping patience when early stopping is used. If set to None, no early stopping is employed.
num_model (int, default=1): The total number of models used for the bagging strategy.
random_state (int or None, default=0): Pseudo-random number generator seed that controls the train/validation data split if early stopping is enabled, the weight initialization, and the dropout. Pass an int for reproducible output across multiple function calls.
n_jobs (int, default=1): Number of jobs to run in parallel. Training the estimator and computing the score are parallelized over the number of models.
device ({"cpu", "gpu"}, default="cpu"): The device used for the estimator.
disable_pbar (bool, default=True): Whether to disable the progress bars for the training process.
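
Reading the three ablation_method values literally, each one switches off edge information, the attention mechanism, or both. The lookup below spells out that reading. It is an interpretation of the option names only: the flag names `use_edge` and `use_attention` and the helper `resolve_ablation` are hypothetical and do not come from the CARTE source.

```python
from typing import Dict

# Assumed meaning of each option, inferred from its name.
ABLATION_FLAGS: Dict[str, Dict[str, bool]] = {
    "exclude-edge": {"use_edge": False, "use_attention": True},
    "exclude-attention": {"use_edge": True, "use_attention": False},
    "exclude-attention-edge": {"use_edge": False, "use_attention": False},
}


def resolve_ablation(ablation_method: str = "exclude-edge") -> Dict[str, bool]:
    """Return the component switches for an ablation method, rejecting unknown names."""
    if ablation_method not in ABLATION_FLAGS:
        valid = ", ".join(sorted(ABLATION_FLAGS))
        raise ValueError(f"Unknown ablation_method {ablation_method!r}; expected one of: {valid}")
    return dict(ABLATION_FLAGS[ablation_method])


print(resolve_ablation())
print(resolve_ablation("exclude-attention-edge"))
```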

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → CARTE_AblationClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for sample_weight parameter in score.

Returns

self (object): The updated object.

class carte_ai.src.carte_estimator.CARTE_AblationRegressor(*, ablation_method: str = 'exclude-edge', loss: str = 'squared_error', scoring: str = 'r2_score', num_layers: int = 1, load_pretrain: bool = True, freeze_pretrain: bool = True, learning_rate: float = 0.001, batch_size: int = 16, max_epoch: int = 500, dropout: float = 0, val_size: float = 0.2, cross_validate: bool = False, early_stopping_patience: None | int = 40, num_model: int = 1, random_state: int = 0, n_jobs: int = 1, device: str = 'cpu', disable_pbar: bool = True, pretrained_model_path='/home/infres/gbrison/fcclip_v2/miniconda3/envs/carte/lib/python3.10/site-packages/carte_ai/data/etc/kg_pretrained.pt')

Bases: CARTERegressor

CARTE Ablation Regressor for Regression tasks.

This estimator is a GNN-based model compatible with the CARTE pretrained model. Note that this is an implementation for the ablation study of CARTE.

Parameters

ablation_method ({'exclude-edge', 'exclude-attention', 'exclude-attention-edge'}, default='exclude-edge'): The ablation method for CARTE Estimators.
loss ({'squared_error', 'absolute_error'}, default='squared_error'): The loss function used for backpropagation.
scoring ({'r2_score', 'squared_error'}, default='r2_score'): The scoring function used for validation.
num_layers (int, default=1): The number of layers for the NN model.
load_pretrain (bool, default=True): Whether to load the pretrained weights.
freeze_pretrain (bool, default=True): Whether to freeze the pretrained weights during training.
learning_rate (float, default=1e-3): The learning rate of the model. The model uses AdamW as the optimizer.
batch_size (int, default=16): The batch size used for training.
max_epoch (int or None, default=500): The maximum number of epochs for training.
dropout (float, default=0): The dropout rate for training.
val_size (float, default=0.2): The size of the validation set used for early stopping.
cross_validate (bool, default=False): Whether to use a cross-validation strategy for the train/validation split.
early_stopping_patience (int or None, default=40): The early stopping patience when early stopping is used. If set to None, no early stopping is employed.
num_model (int, default=1): The total number of models used for the bagging strategy.
random_state (int or None, default=0): Pseudo-random number generator seed that controls the train/validation data split if early stopping is enabled, the weight initialization, and the dropout. Pass an int for reproducible output across multiple function calls.
n_jobs (int, default=1): Number of jobs to run in parallel. Training the estimator and computing the score are parallelized over the number of models.
device ({"cpu", "gpu"}, default="cpu"): The device used for the estimator.
disable_pbar (bool, default=True): Whether to disable the progress bars for the training process.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → CARTE_AblationRegressor

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for sample_weight parameter in score.

Returns

self (object): The updated object.

class carte_ai.src.carte_estimator.IdxIterator(n_batch: int, domain_indicator: Tensor, target_fraction: float)

Bases: object

Class for iterating over indices to set up the batches for CARTE multitable training.

sample()

set_num_samples()
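
Since sample() and set_num_samples() carry no description, the sketch below shows one plausible way an index iterator of this kind could assemble a mixed batch. Everything in it is an assumption made for illustration: `n_batch` is taken to be the batch size, a value of 0 in `domain_indicator` is taken to mark target rows, a plain list replaces the Tensor, and `ToyIdxIterator` is not the library's class.

```python
import random
from typing import List, Sequence


class ToyIdxIterator:
    """Draws batches containing a fixed share of target-domain indices (illustrative only)."""

    def __init__(self, n_batch: int, domain_indicator: Sequence[int], target_fraction: float, seed: int = 0):
        # Assumption: 0 marks rows from the target table, anything else a source table.
        self.target_idx = [i for i, d in enumerate(domain_indicator) if d == 0]
        self.source_idx = [i for i, d in enumerate(domain_indicator) if d != 0]
        self.n_target = int(n_batch * target_fraction)
        self.n_source = n_batch - self.n_target
        self.rng = random.Random(seed)

    def sample(self) -> List[int]:
        """Return one batch of row indices: target indices first, then source indices."""
        target = self.rng.sample(self.target_idx, self.n_target)
        source = self.rng.sample(self.source_idx, self.n_source)
        return target + source


domain_indicator = [0] * 4 + [1] * 20  # 4 target rows followed by 20 source rows
iterator = ToyIdxIterator(n_batch=8, domain_indicator=domain_indicator, target_fraction=0.25)
batch = iterator.sample()
print(batch)
```

Sampling the two domains separately guarantees the requested share of target rows in every batch, even when source rows far outnumber them.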

carte_ai.src.baseline_multitable module

Baselines for multitable problem.

class carte_ai.src.baseline_multitable.CatBoostMultitableClassifier(*, source_data: dict = {}, max_depth: int = 6, learning_rate: float = 0.03, bagging_temperature: float = 1, l2_leaf_reg: float = 3.0, one_hot_max_size: int = 2, iterations: int = 1000, thread_count: int = 1, source_fraction: float = 0.5, num_model: int = 1, val_size: float = 0.1, random_state: int = 0, n_jobs: int = 1)

Bases: GradientBoostingClassifierBase

Base class for CatBoost Multitable Classifier.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → CatBoostMultitableClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for sample_weight parameter in score.

Returns

self (object): The updated object.

class carte_ai.src.baseline_multitable.CatBoostMultitableRegressor(*, source_data: dict = {}, max_depth: int = 6, learning_rate: float = 0.03, bagging_temperature: float = 1, l2_leaf_reg: float = 3.0, one_hot_max_size: int = 2, iterations: int = 1000, thread_count: int = 1, source_fraction: float = 0.5, num_model: int = 1, val_size: float = 0.1, random_state: int = 0, n_jobs: int = 1)

Bases: GradientBoostingRegressorBase

Base class for CatBoost Multitable Regressor.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → CatBoostMultitableRegressor

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for sample_weight parameter in score.

Returns

selfobject: The updated object.

class carte_ai.src.baseline_multitable.GradientBoostingClassifierBase(*, source_data, source_fraction, num_model, val_size, random_state, n_jobs)

Bases: ClassifierMixin, GradientBoostingMultitableBase

Base class for Gradient Boosting Multitable Classifier.

decision_function(X): Compute the decision function of X.

predict(X)

Predict classes for X.

Parameters

Xlist of graph objects with size (n_samples): The input samples.

Returns

yndarray, shape (n_samples,): The predicted classes.

predict_proba(X)

Predict class probabilities for X.

Parameters

Xlist of graph objects with size (n_samples): The input samples.

Returns

pndarray, shape (n_samples,) for binary classification or (n_samples, n_classes): The class probabilities of the input samples.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → GradientBoostingClassifierBase

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for sample_weight parameter in score.

Returns

selfobject: The updated object.

class carte_ai.src.baseline_multitable.GradientBoostingMultitableBase(*, source_data, source_fraction, num_model, val_size, random_state, n_jobs)

Bases: BaseEstimator

Base class for Gradient Boosting Multitable Estimator.

fit(X, y)

Fit the model.

Parameters

XPandas dataframe of the target dataset (n_samples): The input samples.
yarray-like of shape (n_samples,): Target values.

Returns

selfobject: Fitted estimator.

class carte_ai.src.baseline_multitable.GradientBoostingRegressorBase(*, source_data, source_fraction, num_model, val_size, random_state, n_jobs)

Bases: RegressorMixin, GradientBoostingMultitableBase

Base class for Gradient Boosting Multitable Regressor.

predict(X)

Predict values for X. Returns the average of predicted values over all the models.

Parameters

Xlist of graph objects with size (n_samples): The input samples.

Returns

yndarray, shape (n_samples,): The predicted values.
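The averaging step described for `predict` can be sketched in a few lines. This is an illustrative pure-Python version under the assumption that each fitted model yields one prediction per sample; `average_predictions` is a hypothetical helper, not part of the package.

```python
from statistics import fmean


def average_predictions(per_model_predictions):
    """Average predictions element-wise across models.

    per_model_predictions -- list of equal-length lists, one per fitted model
    """
    if not per_model_predictions:
        raise ValueError("need predictions from at least one model")
    # zip(*...) groups the i-th prediction of every model together.
    return [fmean(sample_preds) for sample_preds in zip(*per_model_predictions)]


preds = average_predictions([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
print(preds)
```

With `num_model=1` the average reduces to the single model's predictions.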

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → GradientBoostingRegressorBase

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for sample_weight parameter in score.

Returns

selfobject: The updated object.

class carte_ai.src.baseline_multitable.HistGBMultitableClassifier(*, source_data: dict = {}, learning_rate: float = 0.1, max_depth: None | int = None, max_leaf_nodes: int = 31, min_samples_leaf: int = 20, l2_regularization: float = 0, source_fraction: float = 0.5, num_model: int = 1, val_size: float = 0.1, random_state: int = 0, n_jobs: int = 1)

Bases: GradientBoostingClassifierBase

Base class for Histogram Gradient Boosting Multitable Classifier.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → HistGBMultitableClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for sample_weight parameter in score.

Returns

selfobject: The updated object.

class carte_ai.src.baseline_multitable.HistGBMultitableRegressor(*, source_data: dict = {}, learning_rate: float = 0.1, max_depth: None | int = None, max_leaf_nodes: int = 31, min_samples_leaf: int = 20, l2_regularization: float = 0, source_fraction: float = 0.5, num_model: int = 1, val_size: float = 0.1, random_state: int = 0, n_jobs: int = 1)

Bases: GradientBoostingRegressorBase

Base class for Histogram Gradient Boosting Multitable Regressor.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → HistGBMultitableRegressor

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for sample_weight parameter in score.

Returns

selfobject: The updated object.

class carte_ai.src.baseline_multitable.XGBoostMultitableClassifier(*, source_data: dict = {}, n_estimators: int = 100, max_depth: int = 6, min_child_weight: float = 1, subsample: float = 1, learning_rate: float = 0.3, colsample_bylevel: float = 1, colsample_bytree: float = 1, reg_gamma: float = 0, reg_lambda: float = 1, reg_alpha: float = 0, source_fraction: float = 0.5, num_model: int = 1, val_size: float = 0.1, random_state: int = 0, n_jobs: int = 1)

Bases: GradientBoostingClassifierBase

Base class for XGBoost Multitable Classifier.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → XGBoostMultitableClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for sample_weight parameter in score.

Returns

selfobject: The updated object.

class carte_ai.src.baseline_multitable.XGBoostMultitableRegressor(*, source_data: dict = {}, n_estimators: int = 100, max_depth: int = 6, min_child_weight: float = 1, subsample: float = 1, learning_rate: float = 0.3, colsample_bylevel: float = 1, colsample_bytree: float = 1, reg_gamma: float = 0, reg_lambda: float = 1, reg_alpha: float = 0, source_fraction: float = 0.5, num_model: int = 1, val_size: float = 0.1, random_state: int = 0, n_jobs: int = 1)

Bases: GradientBoostingRegressorBase

Base class for XGBoost Multitable Regressor.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → XGBoostMultitableRegressor

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for sample_weight parameter in score.

Returns

selfobject: The updated object.

carte_ai.src.baseline_singletable_nn module

Neural network baseline for comparison.

class carte_ai.src.baseline_singletable_nn.BaseMLPEstimator(*, hidden_dim: int = 256, num_layers: int = 2, dropout_prob: float = 0.2, learning_rate: float = 0.001, weight_decay: float = 0.01, batch_size: int = 128, val_size: float = 0.1, num_model: int = 1, max_epoch: int = 200, early_stopping_patience: None | int = 10, n_jobs: int = 1, device: str = 'cpu', random_state: int = 0, disable_pbar: bool = True)

Bases: MLPBase

Base class for MLP Estimator.

class carte_ai.src.baseline_singletable_nn.BaseRESNETEstimator(*, normalization: str | None = 'layernorm', num_layers: int = 4, hidden_dim: int = 256, hidden_factor: int = 2, hidden_dropout_prob: float = 0.2, residual_dropout_prob: float = 0.2, learning_rate: float = 0.001, weight_decay: float = 0.01, batch_size: int = 128, val_size: float = 0.1, num_model: int = 1, max_epoch: int = 200, early_stopping_patience: None | int = 10, n_jobs: int = 1, device: str = 'cpu', random_state: int = 0, disable_pbar: bool = True)

Bases: MLPBase

Base class for RESNET Estimator.

class carte_ai.src.baseline_singletable_nn.MLPBase(*, hidden_dim, learning_rate, weight_decay, batch_size, val_size, num_model, max_epoch, early_stopping_patience, n_jobs, device, random_state, disable_pbar)

Bases: BaseEstimator

Base class for MLP.

fit(X, y)

class carte_ai.src.baseline_singletable_nn.MLPClassifier(*, loss: str = 'binary_crossentropy', hidden_dim: int = 256, num_layers: int = 2, dropout_prob: float = 0.2, learning_rate: float = 0.001, weight_decay: float = 0.01, batch_size: int = 128, val_size: float = 0.1, num_model: int = 1, max_epoch: int = 200, early_stopping_patience: None | int = 10, n_jobs: int = 1, device: str = 'cpu', random_state: int = 0, disable_pbar: bool = True)

Bases: ClassifierMixin, BaseMLPEstimator

decision_function(X)

predict(X)

predict_proba(X)

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → MLPClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for sample_weight parameter in score.

Returns

selfobject: The updated object.

class carte_ai.src.baseline_singletable_nn.MLPRegressor(*, loss: str = 'squared_error', hidden_dim: int = 256, num_layers: int = 2, dropout_prob: float = 0.2, learning_rate: float = 0.001, weight_decay: float = 0.01, batch_size: int = 128, val_size: float = 0.1, num_model: int = 1, max_epoch: int = 200, early_stopping_patience: None | int = 10, n_jobs: int = 1, device: str = 'cpu', random_state: int = 0, disable_pbar: bool = True)

Bases: RegressorMixin, BaseMLPEstimator

predict(X)

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → MLPRegressor

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for sample_weight parameter in score.

Returns

selfobject: The updated object.

class carte_ai.src.baseline_singletable_nn.MLP_Model(input_dim: int, hidden_dim: int, output_dim: int, dropout_prob: float, num_layers: int)

Bases: Module

forward(X)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
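The note about calling the instance rather than `forward` applies to every `Module` subclass in this package. The toy class below mimics the idea in plain Python; it is a stand-in written for illustration, not `torch.nn.Module`, and its hook signature is simplified.

```python
class ToyModule:
    """Toy stand-in for a hook-aware module (not torch.nn.Module)."""

    def __init__(self):
        self._forward_hooks = []

    def register_forward_hook(self, hook):
        self._forward_hooks.append(hook)

    def forward(self, x):
        # The computation itself: here, simply double the input.
        return 2 * x

    def __call__(self, x):
        # Calling the instance runs forward() and then every registered hook.
        out = self.forward(x)
        for hook in self._forward_hooks:
            hook(self, x, out)
        return out


seen = []
module = ToyModule()
module.register_forward_hook(lambda mod, inp, out: seen.append(out))

module(3)          # hooks run
module.forward(3)  # hooks are skipped
print(seen)
```

Only the call through the instance records an output in `seen`; the direct `forward` call bypasses the hook.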

class carte_ai.src.baseline_singletable_nn.RESNETClassifier(*, loss: str = 'binary_crossentropy', normalization: str | None = 'layernorm', num_layers: int = 4, hidden_dim: int = 256, hidden_factor: int = 2, hidden_dropout_prob: float = 0.2, residual_dropout_prob: float = 0.2, learning_rate: float = 0.001, weight_decay: float = 0.01, batch_size: int = 128, val_size: float = 0.1, num_model: int = 1, max_epoch: int = 200, early_stopping_patience: None | int = 10, n_jobs: int = 1, device: str = 'cpu', random_state: int = 0, disable_pbar: bool = True)

Bases: ClassifierMixin, BaseRESNETEstimator

decision_function(X)

predict(X)

predict_proba(X)

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → RESNETClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for sample_weight parameter in score.

Returns

selfobject: The updated object.

class carte_ai.src.baseline_singletable_nn.RESNETRegressor(*, loss: str = 'squared_error', normalization: str | None = 'layernorm', num_layers: int = 4, hidden_dim: int = 256, hidden_factor: int = 2, hidden_dropout_prob: float = 0.2, residual_dropout_prob: float = 0.2, learning_rate: float = 0.001, weight_decay: float = 0.01, batch_size: int = 128, val_size: float = 0.1, num_model: int = 1, max_epoch: int = 200, early_stopping_patience: None | int = 10, n_jobs: int = 1, device: str = 'cpu', random_state: int = 0, disable_pbar: bool = True)

Bases: RegressorMixin, BaseRESNETEstimator

predict(X)

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → RESNETRegressor

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for sample_weight parameter in score.

Returns

selfobject: The updated object.

class carte_ai.src.baseline_singletable_nn.RESNET_Model(input_dim: int, hidden_dim: int, output_dim: int, num_layers: int, **block_args)

Bases: Module

forward(X)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class carte_ai.src.baseline_singletable_nn.Residual_Block(input_dim: int, output_dim: int, hidden_factor: int, normalization: str | None = 'layernorm', hidden_dropout_prob: float = 0.2, residual_dropout_prob: float = 0.2)

Bases: Module

forward(x: Tensor)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

reset_parameters() → None

class carte_ai.src.baseline_singletable_nn.TabularDataset(X, y)

Bases: Dataset

carte_ai.src.carte_gridsearch module

Custom grid search used for the CARTE-GNN model.

carte_ai.src.carte_gridsearch.carte_gridsearch(estimator, X_train: list, y_train: array, param_distributions: dict, refit: bool = True, n_jobs: int = 1)

CARTE grid search.

This function runs grid search for CARTE GNN models.

Parameters

estimatorCARTE estimator: The CARTE estimator used for grid search
X_trainlist: The list of graph objects for the train data transformed using Table2GraphTransformer
y_trainnumpy array of shape (n_samples,): The target variable of the train data.
param_distributions: dict: The dictionary of parameter grids to search for the optimal parameters.
refit: bool, default=True: Indicates whether to return a refitted estimator with the best parameter.
n_jobs: int, default=1: Number of jobs to run in parallel. Training the estimator in the grid search is parallelized over the parameter grid.

Returns

ResultPandas DataFrame: The result of each parameter grid.
best_paramsdict: The dictionary of best parameters obtained through grid search.
best_estimatorCARTEGNN estimator: The CARTE estimator trained using the best_params if refit is set to True.
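The general shape of a grid search over `param_distributions` can be sketched as follows. This is a simplified illustration, not the implementation of `carte_gridsearch`: it assumes each dictionary value is a list of candidates, and `expand_grid`, `toy_gridsearch`, and the scoring rule are hypothetical.

```python
from itertools import product


def expand_grid(param_distributions):
    """Yield one dict per combination of the listed parameter values."""
    names = sorted(param_distributions)
    for values in product(*(param_distributions[name] for name in names)):
        yield dict(zip(names, values))


def toy_gridsearch(score_fn, param_distributions):
    """Score every combination and return (results, best_params).

    score_fn is a hypothetical callable mapping a parameter dict to a
    validation score where higher is better.
    """
    results = [
        {**params, "score": score_fn(params)}
        for params in expand_grid(param_distributions)
    ]
    best = max(results, key=lambda row: row["score"])
    best_params = {k: v for k, v in best.items() if k != "score"}
    return results, best_params


grid = {"learning_rate": [1e-3, 1e-2], "dropout": [0.0, 0.2]}
# Made-up scoring rule, used only to make the sketch runnable.
results, best_params = toy_gridsearch(
    lambda p: -abs(p["learning_rate"] - 1e-2) - p["dropout"], grid
)
print(len(results), best_params)
```

Each combination is scored independently of the others, which is what allows the search to be parallelized over the parameter grid via `n_jobs`.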

carte_ai.src.carte_model module

class carte_ai.src.carte_model.CARTE_Attention(input_dim: int, output_dim: int, num_heads: int = 1, concat: bool = True, read_out: bool = False)

Bases: Module

forward(x: Tensor, edge_index: Tensor, edge_attr: Tensor, return_attention: bool = False)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

reset_parameters()

class carte_ai.src.carte_model.CARTE_Base(input_dim_x: int, input_dim_e: int, hidden_dim: int, num_layers: int, **block_args)

Bases: Module

forward(x, edge_index, edge_attr, return_attention=False)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class carte_ai.src.carte_model.CARTE_Block(input_dim: int, ff_dim: int, num_heads: int = 1, concat: bool = True, dropout: float = 0.1, read_out: bool = False)

Bases: Module

forward(x: Tensor, edge_index: Tensor, edge_attr: Tensor)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class carte_ai.src.carte_model.CARTE_Contrast

Bases: Module

forward(x: Tensor)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class carte_ai.src.carte_model.CARTE_NN_Model(input_dim_x: int, input_dim_e: int, hidden_dim: int, output_dim: int, num_layers: int, **block_args)

Bases: Module

forward(input)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class carte_ai.src.carte_model.CARTE_NN_Model_Ablation(ablation_method: str, input_dim_x: int, input_dim_e: int, hidden_dim: int, output_dim: int, num_layers: int, **block_args)

Bases: Module

forward(input)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class carte_ai.src.carte_model.CARTE_Pretrain(input_dim_x: int, input_dim_e: int, hidden_dim: int, num_layers: int, **block_args)

Bases: Module

forward(input)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

carte_ai.src.carte_table_to_graph.Table2GraphTransformer.transform(X, y=None)

Apply Table2GraphTransformer to each row of the data.

Parameters

Xpandas.DataFrame: Input data to transform.
yarray-like, optional: Target values, by default None.

Returns

data_graphlist: List of transformed graph objects.

carte_ai.src.evaluate_utils module

carte_ai.src.evaluate_utils.check_pred_output(y_train, y_pred): Set the output to the mean of the training data if it is NaN.

carte_ai.src.evaluate_utils.col_names_per_type(data, target_name): Extract column names per type.

carte_ai.src.evaluate_utils.extract_best_params(data_name, method, num_train, random_state): Extract the best parameters in the CARTE paper.

carte_ai.src.evaluate_utils.reshape_pred_output(y_pred): Reshape the predictive output accordingly.

carte_ai.src.evaluate_utils.return_score(y_target, y_pred, task): Return score results for given task.

carte_ai.src.evaluate_utils.set_score_criterion(task): Set scoring method for CV and score criterion in final result.

carte_ai.src.evaluate_utils.set_split(data, data_config, num_train, random_state): Set train/test split given the random state.

carte_ai.src.evaluate_utils.shorten_param(param_name): Shorten the param_names for column names in search results.
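The idea behind `check_pred_output` can be sketched in pure Python. This follows one reading of the description, in which each NaN prediction is replaced individually; `fill_nan_with_train_mean` is a hypothetical name and the package's own function may differ in detail.

```python
import math


def fill_nan_with_train_mean(y_train, y_pred):
    """Return a copy of y_pred where NaN entries become mean(y_train)."""
    train_mean = sum(y_train) / len(y_train)
    return [train_mean if math.isnan(p) else p for p in y_pred]


print(fill_nan_with_train_mean([1.0, 2.0, 3.0], [0.5, float("nan"), 2.5]))
```

Falling back to the training mean keeps downstream scoring functions from failing on NaN inputs.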

carte_ai.src.preprocess_utils module

Functions used for preprocessing the data.

carte_ai.src.preprocess_utils.extract_fasttext_features(data: DataFrame, extract_col_name: str)

carte_ai.src.preprocess_utils.extract_ken_features(data: DataFrame, extract_col_name: str)

carte_ai.src.preprocess_utils.extract_llm_features(data: DataFrame, extract_col_name: str, device: str = 'cuda:0')

carte_ai.src.preprocess_utils.table2llmfeatures(data: DataFrame, embed_numeric: bool, device: str = 'cuda:0')

carte_ai.src.visualization_utils module

Functions that can be used for visualization. The critical difference diagram adapts code from scikit-posthocs.

carte_ai.src.visualization_utils.critical_difference_diagram(ranks: dict | Series, sig_matrix: DataFrame, *, ax: SubplotBase | None = None, label_fmt_left: str = '{label} ({rank:.2g})', label_fmt_right: str = '({rank:.2g}) {label}', label_props: dict | None = None, marker_props: dict | None = None, elbow_props: dict | None = None, crossbar_props: dict | None = None, color_palette: Dict[str, str] | List = {}, line_style: Dict[str, str] | List = {}, text_h_margin: float = 0.01) → Dict[str, list]

carte_ai.src.visualization_utils.generate_df_cdd(df_normalized, train_size='all')

carte_ai.src.visualization_utils.prepare_result(task, models='all', rank_at=2048)

carte_ai.src.visualization_utils.sign_array(p_values: List | ndarray, alpha: float = 0.05) → ndarray

carte_ai.src.visualization_utils.sign_plot(x: List | ndarray | DataFrame, g: List | ndarray | None = None, flat: bool = False, labels: bool = True, cmap: List | None = None, cbar_ax_bbox: List | None = None, ax: SubplotBase | None = None, **kwargs) → SubplotBase | Tuple[SubplotBase, Colorbar]

carte_ai.src.visualization_utils.sign_table(p_values: List | ndarray | DataFrame, lower: bool = True, upper: bool = True) → DataFrame | ndarray
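For orientation, the kind of conversion a function such as `sign_array` performs can be sketched as below. This assumes the scikit-posthocs convention (1 for significant, 0 for not significant, -1 on the diagonal); the source does not spell out the behavior, so `toy_sign_array` is an illustrative stand-in rather than the package's function.

```python
def toy_sign_array(p_values, alpha=0.05):
    """Convert a square matrix of p-values into significance codes.

    1 = significant (p < alpha), 0 = not significant, -1 = diagonal.
    """
    n = len(p_values)
    return [
        [-1 if i == j else int(p_values[i][j] < alpha) for j in range(n)]
        for i in range(n)
    ]


p = [
    [1.00, 0.01, 0.20],
    [0.01, 1.00, 0.04],
    [0.20, 0.04, 1.00],
]
for row in toy_sign_array(p):
    print(row)
```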