carte_ai.src package
Submodules
carte_ai.src.carte_table_to_graph module
- class carte_ai.src.carte_table_to_graph.Table2GraphTransformer(*, include_edge_attr: bool = True, lm_model: str = 'fasttext', n_components: int = 300, n_jobs: int = 1, fasttext_model_path: str | None = None)
Bases: TransformerMixin, BaseEstimator
Transformer from tables to a list of graphs.
Parameters
- include_edge_attr : bool, optional
Whether to include edge attributes, by default True.
- lm_model : str, optional
Language model to use, by default 'fasttext'.
- n_components : int, optional
Number of components for the encoder, by default 300.
- n_jobs : int, optional
Number of jobs for parallel processing, by default 1.
- fasttext_model_path : str, optional
Path to the FastText model file, required if lm_model is 'fasttext'.
- fit(X, y=None)
Fit function used for the Table2GraphTransformer.
Parameters
- X : pandas.DataFrame
Input data to fit.
- y : array-like, optional
Target values, by default None.
Returns
- self : Table2GraphTransformer
Fitted transformer.
Example Usage:
import fasttext
from huggingface_hub import hf_hub_download
from carte_ai.src.carte_table_to_graph import Table2GraphTransformer
# Download the FastText model from the Hugging Face Hub
model_path = hf_hub_download("hi-paris/fastText", "cc.en.300.bin")
# Initialize the Table2GraphTransformer
preprocessor = Table2GraphTransformer(fasttext_model_path=model_path)
# View the transformer details
help(Table2GraphTransformer)
# Fit and transform the training data (X_train: pandas.DataFrame, y_train: targets)
X_train = preprocessor.fit_transform(X_train, y=y_train)
# Transform the test data with the fitted transformer
X_test = preprocessor.transform(X_test)
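To make "a list of graphs" more concrete, here is a minimal sketch and not the library's implementation. It assumes one graph per table row, shaped as a star: a central row node linked to one node per non-missing cell, with the column name kept as the edge attribute when include_edge_attr is true. The helper row_to_star_graph and the dictionary layout are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


def row_to_star_graph(
    row: Dict[str, Any], include_edge_attr: bool = True
) -> Dict[str, List[Any]]:
    """Turn one table row into a small star graph (hypothetical layout)."""
    nodes: List[Any] = ["<row>"]  # node 0 is the central node for the row
    edges: List[Tuple[int, int]] = []
    edge_attr: List[Optional[str]] = []
    for column, value in row.items():
        if value is None:  # assumption: missing cells produce no node
            continue
        nodes.append(value)
        edges.append((0, len(nodes) - 1))  # connect the center to the new cell node
        edge_attr.append(column if include_edge_attr else None)
    return {"nodes": nodes, "edges": edges, "edge_attr": edge_attr}


table = [
    {"name": "Chablis", "country": "France", "price": 25.0},
    {"name": "Rioja", "country": None, "price": 18.5},
]
graphs = [row_to_star_graph(row) for row in table]
print(len(graphs), graphs[1]["edges"])
```

Each row yields its own graph, so a table with n rows gives a list of n graphs, which matches the list-of-graphs input that the estimators below expect.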
carte_ai.src.carte_estimator module
CARTE estimators for regression and classification.
- class carte_ai.src.carte_estimator.BaseCARTEEstimator(*, num_layers, load_pretrain, freeze_pretrain, learning_rate, batch_size, max_epoch, dropout, val_size, cross_validate, early_stopping_patience, num_model, random_state, n_jobs, device, disable_pbar, pretrained_model_path='/home/infres/gbrison/fcclip_v2/miniconda3/envs/carte/lib/python3.10/site-packages/carte_ai/data/etc/kg_pretrained.pt')
Bases: BaseEstimator
Base class for CARTE Estimator.
- fit(X, y)
Fit the CARTE model.
Parameters
- X : list of graph objects with size (n_samples)
The input samples.
- y : array-like of shape (n_samples,)
Target values.
Returns
- self : object
Fitted estimator.
Example Usage:
from sklearn.metrics import r2_score
from carte_ai.src.carte_estimator import CARTERegressor
# Define some parameters
fixed_params = dict()
fixed_params["num_model"] = 10 # 10 models for the bagging strategy
fixed_params["disable_pbar"] = False # set to True to hide progress bars
fixed_params["random_state"] = 0
fixed_params["device"] = "cpu"
fixed_params["n_jobs"] = 10
# config_directory is assumed to be a dict that holds the path to the pretrained weights
fixed_params["pretrained_model_path"] = config_directory["pretrained_model"]
# Define the estimator and run fit/predict (X_train, X_test: lists of graphs)
estimator = CARTERegressor(**fixed_params) # CARTERegressor for regression
estimator.fit(X=X_train, y=y_train)
y_pred = estimator.predict(X_test)
# Obtain the R2 score on the predictions
score = r2_score(y_test, y_pred)
print(f"\nThe R2 score for CARTE: {score:.4f}")
- class carte_ai.src.carte_estimator.BaseCARTEMultitableEstimator(*, source_data, num_layers, load_pretrain, freeze_pretrain, learning_rate, batch_size, max_epoch, dropout, val_size, target_fraction, early_stopping_patience, num_model, random_state, n_jobs, device, disable_pbar, pretrained_model_path)
Bases: BaseCARTEEstimator
Base class for CARTE Multitable Estimator.
- class carte_ai.src.carte_estimator.CARTEClassifier(*, loss: str = 'binary_crossentropy', scoring: str = 'auroc', num_layers: int = 1, load_pretrain: bool = True, freeze_pretrain: bool = True, learning_rate: float = 0.001, batch_size: int = 16, max_epoch: int = 500, dropout: float = 0, val_size: float = 0.2, cross_validate: bool = False, early_stopping_patience: None | int = 40, num_model: int = 1, random_state: int = 0, n_jobs: int = 1, device: str = 'cpu', disable_pbar: bool = True, pretrained_model_path='/home/infres/gbrison/fcclip_v2/miniconda3/envs/carte/lib/python3.10/site-packages/carte_ai/data/etc/kg_pretrained.pt')
Bases: ClassifierMixin, BaseCARTEEstimator
CARTE Classifier for Classification tasks.
This estimator is a GNN-based model compatible with the CARTE pretrained model.
Parameters
- loss : {'binary_crossentropy', 'categorical_crossentropy'}, default='binary_crossentropy'
The loss function used for backpropagation.
- scoring : {'auroc', 'auprc', 'binary_entropy'}, default='auroc'
The scoring function used for validation.
- num_layers : int, default=1
The number of layers for the NN model.
- load_pretrain : bool, default=True
Indicates whether to load the pretrained weights.
- freeze_pretrain : bool, default=True
Indicates whether to freeze the pretrained weights during training.
- learning_rate : float, default=1e-3
The learning rate of the model. The model uses AdamW as the optimizer.
- batch_size : int, default=16
The batch size used for training.
- max_epoch : int or None, default=500
The maximum number of epochs for training.
- dropout : float, default=0
The dropout rate for training.
- val_size : float, default=0.2
The size of the validation set used for early stopping.
- cross_validate : bool, default=False
Indicates whether to use a cross-validation strategy for the train/validation split.
- early_stopping_patience : int or None, default=40
The early stopping patience when early stopping is used. If set to None, no early stopping is employed.
- num_model : int, default=1
The total number of models used for the bagging strategy.
- random_state : int or None, default=0
Pseudo-random number generator to control the train/validation data split if early stopping is enabled, the weight initialization, and the dropout. Pass an int for reproducible output across multiple function calls.
- n_jobs : int, default=1
Number of jobs to run in parallel. Training the estimator and computing the score are parallelized over the number of models.
- device : {'cpu', 'gpu'}, default='cpu'
The device used for the estimator.
- disable_pbar : bool, default=True
Indicates whether to disable (hide) the progress bars for the training process.
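The interaction between max_epoch and early_stopping_patience can be pictured with the sketch below. It is an illustration under stated assumptions, not CARTE's training loop: the function train_with_patience is hypothetical, the validation scores are supplied as a precomputed list, and higher scores are taken to be better (as with 'auroc').

```python
from typing import List, Optional, Tuple


def train_with_patience(
    val_scores: List[float],
    max_epoch: int = 500,
    early_stopping_patience: Optional[int] = 40,
) -> Tuple[int, float]:
    """Return (best_epoch, best_score) for a stream of validation scores.

    Hypothetical helper: each entry of val_scores stands in for the
    validation score measured after one training epoch.
    """
    best_epoch, best_score = -1, float("-inf")
    epochs_without_gain = 0
    for epoch, score in enumerate(val_scores[:max_epoch]):
        if score > best_score:
            best_epoch, best_score = epoch, score
            epochs_without_gain = 0
        else:
            epochs_without_gain += 1
        # With patience=None the loop always runs until max_epoch.
        if (
            early_stopping_patience is not None
            and epochs_without_gain >= early_stopping_patience
        ):
            break
    return best_epoch, best_score


scores = [0.70, 0.74, 0.73, 0.72, 0.71, 0.90]
print(train_with_patience(scores, early_stopping_patience=3))
print(train_with_patience(scores, early_stopping_patience=None))
```

With a patience of 3 the loop stops after three epochs without improvement and never sees the late gain at the last epoch, which is the usual trade-off between training time and the risk of stopping too early.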
- decision_function(X)
Compute the decision function of X.
Parameters
- X : list of graph objects with size (n_samples)
The input samples.
Returns
- decision : ndarray, shape (n_samples,)
- predict(X)
Predict classes for X.
Parameters
- X : list of graph objects with size (n_samples)
The input samples.
Returns
- y : ndarray, shape (n_samples,)
The predicted classes.
- predict_proba(X)
Predict class probabilities for X.
Parameters
- X : list of graph objects with size (n_samples)
The input samples.
Returns
- p : ndarray, shape (n_samples,) for binary classification or (n_samples, n_classes)
The class probabilities of the input samples.
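The two output shapes of predict_proba call for slightly different handling downstream. The sketch below assumes the binary case yields one positive-class probability per sample and the multiclass case yields one row of probabilities per sample. The helper proba_to_labels and the 0.5 threshold are illustrative choices, not part of the CARTE API.

```python
from typing import List, Sequence, Union

Proba = Union[Sequence[float], Sequence[Sequence[float]]]


def proba_to_labels(proba: Proba, threshold: float = 0.5) -> List[int]:
    """Convert class probabilities to integer labels (hypothetical helper)."""
    labels: List[int] = []
    for p in proba:
        if isinstance(p, (int, float)):
            # Binary case: shape (n_samples,), p is taken as P(class == 1).
            labels.append(int(p >= threshold))
        else:
            # Multiclass case: shape (n_samples, n_classes), take the argmax.
            labels.append(max(range(len(p)), key=lambda k: p[k]))
    return labels


print(proba_to_labels([0.1, 0.8, 0.5]))
print(proba_to_labels([[0.2, 0.5, 0.3], [0.6, 0.3, 0.1]]))
```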
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → CARTEClassifier
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in score.
Returns
- self : object
The updated object.
- class carte_ai.src.carte_estimator.CARTEMultitableClassifer(*, loss: str = 'binary_crossentropy', scoring: str = 'auroc', source_data: dict = {}, num_layers: int = 1, load_pretrain: bool = True, freeze_pretrain: bool = True, learning_rate: float = 0.001, batch_size: int = 16, max_epoch: int = 500, dropout: float = 0, val_size: float = 0.2, target_fraction: float = 0.125, early_stopping_patience: None | int = 40, num_model: int = 1, random_state: int = 0, n_jobs: int = 1, device: str = 'cpu', disable_pbar: bool = True, pretrained_model_path='/home/infres/gbrison/fcclip_v2/miniconda3/envs/carte/lib/python3.10/site-packages/carte_ai/data/etc/kg_pretrained.pt')
Bases: ClassifierMixin, BaseCARTEMultitableEstimator
CARTE Multitable Classifier for Classification tasks.
This estimator is a GNN-based model compatible with the CARTE pretrained model.
Parameters
- loss : {'binary_crossentropy', 'categorical_crossentropy'}, default='binary_crossentropy'
The loss function used for backpropagation.
- scoring : {'auroc', 'auprc', 'binary_entropy'}, default='auroc'
The scoring function used for validation.
- source_data : dict, default={}
The source data used in the multitable estimator.
- num_layers : int, default=1
The number of layers for the NN model.
- load_pretrain : bool, default=True
Indicates whether to load the pretrained weights.
- freeze_pretrain : bool, default=True
Indicates whether to freeze the pretrained weights during training.
- learning_rate : float, default=1e-3
The learning rate of the model. The model uses AdamW as the optimizer.
- batch_size : int, default=16
The batch size used for training.
- max_epoch : int or None, default=500
The maximum number of epochs for training.
- dropout : float, default=0
The dropout rate for training.
- val_size : float, default=0.2
The size of the validation set used for early stopping.
- target_fraction : float, default=0.125
The fraction of target data inside a batch during training.
- early_stopping_patience : int or None, default=40
The early stopping patience when early stopping is used. If set to None, no early stopping is employed.
- num_model : int, default=1
The total number of models used for the bagging strategy.
- random_state : int or None, default=0
Pseudo-random number generator to control the train/validation data split if early stopping is enabled, the weight initialization, and the dropout. Pass an int for reproducible output across multiple function calls.
- n_jobs : int, default=1
Number of jobs to run in parallel. Training the estimator and computing the score are parallelized over the number of models.
- device : {'cpu', 'gpu'}, default='cpu'
The device used for the estimator.
- disable_pbar : bool, default=True
Indicates whether to disable (hide) the progress bars for the training process.
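As a rough picture of how val_size and random_state work together, the sketch below holds out a fraction of the sample indices with a seeded shuffle. It is an assumption-laden illustration rather than the estimator's own split: the function split_train_val is hypothetical and uses only the standard library.

```python
import random
from typing import List, Tuple


def split_train_val(
    n_samples: int, val_size: float = 0.2, random_state: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices and hold out a val_size fraction (sketch)."""
    indices = list(range(n_samples))
    random.Random(random_state).shuffle(indices)  # seeded for reproducibility
    n_val = max(1, round(n_samples * val_size))  # keep at least one validation sample
    return sorted(indices[n_val:]), sorted(indices[:n_val])


train_idx, val_idx = split_train_val(10, val_size=0.2, random_state=0)
print(len(train_idx), len(val_idx))
```

Calling the function twice with the same random_state returns the same split, which is the sense in which an integer seed gives reproducible output across calls.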
- decision_function(X)
Compute the decision function of X.
Parameters
- X : list of graph objects with size (n_samples)
The input samples.
Returns
- decision : ndarray, shape (n_samples,)
- predict(X)
Predict classes for X.
Parameters
- X : list of graph objects with size (n_samples)
The input samples.
Returns
- y : ndarray, shape (n_samples,)
The predicted classes.
- predict_proba(X)
Predict class probabilities for X.
Parameters
- X : list of graph objects with size (n_samples)
The input samples.
Returns
- p : ndarray, shape (n_samples,) for binary classification or (n_samples, n_classes)
The class probabilities of the input samples.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → CARTEMultitableClassifer
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in score.
Returns
- self : object
The updated object.
- class carte_ai.src.carte_estimator.CARTEMultitableRegressor(*, loss: str = 'squared_error', scoring: str = 'r2_score', source_data: dict = {}, num_layers: int = 1, load_pretrain: bool = True, freeze_pretrain: bool = True, learning_rate: float = 0.001, batch_size: int = 16, max_epoch: int = 500, dropout: float = 0, val_size: float = 0.2, target_fraction: float = 0.125, early_stopping_patience: None | int = 40, num_model: int = 1, random_state: int = 0, n_jobs: int = 1, device: str = 'cpu', disable_pbar: bool = True, pretrained_model_path='/home/infres/gbrison/fcclip_v2/miniconda3/envs/carte/lib/python3.10/site-packages/carte_ai/data/etc/kg_pretrained.pt')
Bases: RegressorMixin, BaseCARTEMultitableEstimator
CARTE Multitable Regressor for Regression tasks.
This estimator is a GNN-based model compatible with the CARTE pretrained model.
Parameters
- loss : {'squared_error', 'absolute_error'}, default='squared_error'
The loss function used for backpropagation.
- scoring : {'r2_score', 'squared_error'}, default='r2_score'
The scoring function used for validation.
- source_data : dict, default={}
The source data used in the multitable estimator.
- num_layers : int, default=1
The number of layers for the NN model.
- load_pretrain : bool, default=True
Indicates whether to load the pretrained weights.
- freeze_pretrain : bool, default=True
Indicates whether to freeze the pretrained weights during training.
- learning_rate : float, default=1e-3
The learning rate of the model. The model uses AdamW as the optimizer.
- batch_size : int, default=16
The batch size used for training.
- max_epoch : int or None, default=500
The maximum number of epochs for training.
- dropout : float, default=0
The dropout rate for training.
- val_size : float, default=0.2
The size of the validation set used for early stopping.
- target_fraction : float, default=0.125
The fraction of target data inside a batch during training.
- early_stopping_patience : int or None, default=40
The early stopping patience when early stopping is used. If set to None, no early stopping is employed.
- num_model : int, default=1
The total number of models used for the bagging strategy.
- random_state : int or None, default=0
Pseudo-random number generator to control the train/validation data split if early stopping is enabled, the weight initialization, and the dropout. Pass an int for reproducible output across multiple function calls.
- n_jobs : int, default=1
Number of jobs to run in parallel. Training the estimator and computing the score are parallelized over the number of models.
- device : {'cpu', 'gpu'}, default='cpu'
The device used for the estimator.
- disable_pbar : bool, default=True
Indicates whether to disable (hide) the progress bars for the training process.
- predict(X)
Predict values for X.
Returns the weighted average of the single-table model and all pairwise models, each of which uses one source table.
Parameters
- X : list of graph objects with size (n_samples)
The input samples.
Returns
- y : ndarray, shape (n_samples,)
The predicted values.
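How the weights are chosen is not stated here, so the sketch below only shows the arithmetic of a weighted average over per-model predictions. The function weighted_prediction, the example predictions, and the example weights are all hypothetical.

```python
from typing import List, Sequence


def weighted_prediction(
    predictions: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Combine per-model predictions into one weighted average per sample."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    n_samples = len(predictions[0])
    return [
        sum(w * model[i] for w, model in zip(weights, predictions)) / total
        for i in range(n_samples)
    ]


single_table = [10.0, 20.0]   # model trained on the target table only
with_source_a = [12.0, 18.0]  # target table paired with a source table "a"
with_source_b = [14.0, 22.0]  # target table paired with a source table "b"
y_pred = weighted_prediction(
    [single_table, with_source_a, with_source_b], weights=[2.0, 1.0, 1.0]
)
print(y_pred)
```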
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → CARTEMultitableRegressor
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in score.
Returns
- self : object
The updated object.
- class carte_ai.src.carte_estimator.CARTERegressor(*, loss: str = 'squared_error', scoring: str = 'r2_score', num_layers: int = 1, load_pretrain: bool = True, freeze_pretrain: bool = True, learning_rate: float = 0.001, batch_size: int = 16, max_epoch: int = 500, dropout: float = 0, val_size: float = 0.2, cross_validate: bool = False, early_stopping_patience: None | int = 40, num_model: int = 1, random_state: int = 0, n_jobs: int = 1, device: str = 'cpu', disable_pbar: bool = True, pretrained_model_path='/home/infres/gbrison/fcclip_v2/miniconda3/envs/carte/lib/python3.10/site-packages/carte_ai/data/etc/kg_pretrained.pt')
Bases: RegressorMixin, BaseCARTEEstimator
CARTE Regressor for Regression tasks.
This estimator is a GNN-based model compatible with the CARTE pretrained model.
Parameters
- loss : {'squared_error', 'absolute_error'}, default='squared_error'
The loss function used for backpropagation.
- scoring : {'r2_score', 'squared_error'}, default='r2_score'
The scoring function used for validation.
- num_layers : int, default=1
The number of layers for the NN model.
- load_pretrain : bool, default=True
Indicates whether to load the pretrained weights.
- freeze_pretrain : bool, default=True
Indicates whether to freeze the pretrained weights during training.
- learning_rate : float, default=1e-3
The learning rate of the model. The model uses AdamW as the optimizer.
- batch_size : int, default=16
The batch size used for training.
- max_epoch : int or None, default=500
The maximum number of epochs for training.
- dropout : float, default=0
The dropout rate for training.
- val_size : float, default=0.2
The size of the validation set used for early stopping.
- cross_validate : bool, default=False
Indicates whether to use a cross-validation strategy for the train/validation split.
- early_stopping_patience : int or None, default=40
The early stopping patience when early stopping is used. If set to None, no early stopping is employed.
- num_model : int, default=1
The total number of models used for the bagging strategy.
- random_state : int or None, default=0
Pseudo-random number generator to control the train/validation data split if early stopping is enabled, the weight initialization, and the dropout. Pass an int for reproducible output across multiple function calls.
- n_jobs : int, default=1
Number of jobs to run in parallel. Training the estimator and computing the score are parallelized over the number of models.
- device : {'cpu', 'gpu'}, default='cpu'
The device used for the estimator.
- disable_pbar : bool, default=True
Indicates whether to disable (hide) the progress bars for the training process.
- predict(X)
Predict values for X. Returns the average of predicted values over all the models.
Parameters
- X : list of graph objects with size (n_samples)
The input samples.
Returns
- y : ndarray, shape (n_samples,)
The predicted values.
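The averaging over the num_model bagged models can be sketched as follows. This is a standard-library illustration of a per-sample mean, not the estimator's code: the helper bagged_predict and the hard-coded per-model predictions are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence


def bagged_predict(per_model: Sequence[Sequence[float]]) -> List[float]:
    """Average predictions sample-by-sample across all bagged models."""
    # zip(*per_model) regroups the values so each tuple holds one sample's
    # predictions from every model.
    return [fmean(sample_preds) for sample_preds in zip(*per_model)]


per_model = [
    [1.0, 4.0, 7.0],  # predictions of model 1 for three samples
    [2.0, 5.0, 8.0],  # predictions of model 2
    [3.0, 6.0, 9.0],  # predictions of model 3
]
print(bagged_predict(per_model))
```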
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → CARTERegressor
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in score.
Returns
- self : object
The updated object.
- class carte_ai.src.carte_estimator.CARTE_AblationClassifier(*, ablation_method: str = 'exclude-edge', loss: str = 'binary_crossentropy', scoring: str = 'auroc', num_layers: int = 1, load_pretrain: bool = False, freeze_pretrain: bool = False, learning_rate: float = 0.001, batch_size: int = 16, max_epoch: int = 500, dropout: float = 0, val_size: float = 0.2, cross_validate: bool = False, early_stopping_patience: None | int = 40, num_model: int = 1, random_state: int = 0, n_jobs: int = 1, device: str = 'cpu', disable_pbar: bool = True, pretrained_model_path='/home/infres/gbrison/fcclip_v2/miniconda3/envs/carte/lib/python3.10/site-packages/carte_ai/data/etc/kg_pretrained.pt')
Bases: CARTEClassifier
CARTE Ablation Classifier for Classification tasks.
This estimator is a GNN-based model compatible with the CARTE pretrained model. Note that this is an implementation for the ablation study of CARTE.
Parameters
- ablation_method : {'exclude-edge', 'exclude-attention', 'exclude-attention-edge'}, default='exclude-edge'
The ablation method for CARTE Estimators.
- loss : {'binary_crossentropy', 'categorical_crossentropy'}, default='binary_crossentropy'
The loss function used for backpropagation.
- scoring : {'auroc', 'auprc', 'binary_entropy'}, default='auroc'
The scoring function used for validation.
- num_layers : int, default=1
The number of layers for the NN model.
- load_pretrain : bool, default=False
Indicates whether to load the pretrained weights.
- freeze_pretrain : bool, default=False
Indicates whether to freeze the pretrained weights during training.
- learning_rate : float, default=1e-3
The learning rate of the model. The model uses AdamW as the optimizer.
- batch_size : int, default=16
The batch size used for training.
- max_epoch : int or None, default=500
The maximum number of epochs for training.
- dropout : float, default=0
The dropout rate for training.
- val_size : float, default=0.2
The size of the validation set used for early stopping.
- cross_validate : bool, default=False
Indicates whether to use a cross-validation strategy for the train/validation split.
- early_stopping_patience : int or None, default=40
The early stopping patience when early stopping is used. If set to None, no early stopping is employed.
- num_model : int, default=1
The total number of models used for the bagging strategy.
- random_state : int or None, default=0
Pseudo-random number generator to control the train/validation data split if early stopping is enabled, the weight initialization, and the dropout. Pass an int for reproducible output across multiple function calls.
- n_jobs : int, default=1
Number of jobs to run in parallel. Training the estimator and computing the score are parallelized over the number of models.
- device : {'cpu', 'gpu'}, default='cpu'
The device used for the estimator.
- disable_pbar : bool, default=True
Indicates whether to disable (hide) the progress bars for the training process.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → CARTE_AblationClassifier
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in score.
Returns
- self : object
The updated object.
- class carte_ai.src.carte_estimator.CARTE_AblationRegressor(*, ablation_method: str = 'exclude-edge', loss: str = 'squared_error', scoring: str = 'r2_score', num_layers: int = 1, load_pretrain: bool = True, freeze_pretrain: bool = True, learning_rate: float = 0.001, batch_size: int = 16, max_epoch: int = 500, dropout: float = 0, val_size: float = 0.2, cross_validate: bool = False, early_stopping_patience: None | int = 40, num_model: int = 1, random_state: int = 0, n_jobs: int = 1, device: str = 'cpu', disable_pbar: bool = True, pretrained_model_path='/home/infres/gbrison/fcclip_v2/miniconda3/envs/carte/lib/python3.10/site-packages/carte_ai/data/etc/kg_pretrained.pt')
Bases: CARTERegressor
CARTE Ablation Regressor for Regression tasks.
This estimator is a GNN-based model compatible with the CARTE pretrained model. Note that this is an implementation for the ablation study of CARTE.
Parameters
- ablation_method : {'exclude-edge', 'exclude-attention', 'exclude-attention-edge'}, default='exclude-edge'
The ablation method for CARTE Estimators.
- loss : {'squared_error', 'absolute_error'}, default='squared_error'
The loss function used for backpropagation.
- scoring : {'r2_score', 'squared_error'}, default='r2_score'
The scoring function used for validation.
- num_layers : int, default=1
The number of layers for the NN model.
- load_pretrain : bool, default=True
Indicates whether to load the pretrained weights.
- freeze_pretrain : bool, default=True
Indicates whether to freeze the pretrained weights during training.
- learning_rate : float, default=1e-3
The learning rate of the model. The model uses AdamW as the optimizer.
- batch_size : int, default=16
The batch size used for training.
- max_epoch : int or None, default=500
The maximum number of epochs for training.
- dropout : float, default=0
The dropout rate for training.
- val_size : float, default=0.2
The size of the validation set used for early stopping.
- cross_validate : bool, default=False
Indicates whether to use a cross-validation strategy for the train/validation split.
- early_stopping_patience : int or None, default=40
The early stopping patience when early stopping is used. If set to None, no early stopping is employed.
- num_model : int, default=1
The total number of models used for the bagging strategy.
- random_state : int or None, default=0
Pseudo-random number generator to control the train/validation data split if early stopping is enabled, the weight initialization, and the dropout. Pass an int for reproducible output across multiple function calls.
- n_jobs : int, default=1
Number of jobs to run in parallel. Training the estimator and computing the score are parallelized over the number of models.
- device : {'cpu', 'gpu'}, default='cpu'
The device used for the estimator.
- disable_pbar : bool, default=True
Indicates whether to disable (hide) the progress bars for the training process.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → CARTE_AblationRegressor
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in score.
Returns
- self : object
The updated object.
- class carte_ai.src.carte_estimator.IdxIterator(n_batch: int, domain_indicator: Tensor, target_fraction: float)
Bases: object
Class for iterating indices to set up the batch for CARTE Multitables
- sample()
- set_num_samples()
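The methods above are undocumented, so the following is only a guess at the kind of batch composition that n_batch, domain_indicator, and target_fraction suggest. It is a sketch under assumptions, not the behavior of IdxIterator: the function sample_batch is hypothetical, and it assumes that a domain indicator of 0 marks a target row while any other value marks a source row.

```python
import random
from typing import List, Sequence


def sample_batch(
    domain_indicator: Sequence[int],
    n_batch: int,
    target_fraction: float,
    seed: int = 0,
) -> List[int]:
    """Draw one batch of row indices mixing target and source rows (sketch).

    Assumption: domain_indicator[i] == 0 marks a target row and any other
    value marks a source row.
    """
    rng = random.Random(seed)
    target = [i for i, d in enumerate(domain_indicator) if d == 0]
    source = [i for i, d in enumerate(domain_indicator) if d != 0]
    n_target = max(1, int(n_batch * target_fraction))
    n_source = n_batch - n_target
    # Sample with replacement so that a small table can still fill its share.
    return rng.choices(target, k=n_target) + rng.choices(source, k=n_source)


domain = [0] * 10 + [1] * 50
batch = sample_batch(domain, n_batch=16, target_fraction=0.125)
print(sum(domain[i] == 0 for i in batch), "target rows out of", len(batch))
```

With the documented defaults of batch_size=16 and target_fraction=0.125, this arithmetic gives 2 target rows and 14 source rows per batch.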
carte_ai.src.baseline_multitable module
Baselines for multitable problem.
- class carte_ai.src.baseline_multitable.CatBoostMultitableClassifier(*, source_data: dict = {}, max_depth: int = 6, learning_rate: float = 0.03, bagging_temperature: float = 1, l2_leaf_reg: float = 3.0, one_hot_max_size: int = 2, iterations: int = 1000, thread_count: int = 1, source_fraction: float = 0.5, num_model: int = 1, val_size: float = 0.1, random_state: int = 0, n_jobs: int = 1)
Bases: GradientBoostingClassifierBase
Base class for CatBoost Multitable Classifier.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → CatBoostMultitableClassifier
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in score.
Returns
- self : object
The updated object.
- class carte_ai.src.baseline_multitable.CatBoostMultitableRegressor(*, source_data: dict = {}, max_depth: int = 6, learning_rate: float = 0.03, bagging_temperature: float = 1, l2_leaf_reg: float = 3.0, one_hot_max_size: int = 2, iterations: int = 1000, thread_count: int = 1, source_fraction: float = 0.5, num_model: int = 1, val_size: float = 0.1, random_state: int = 0, n_jobs: int = 1)
Bases: GradientBoostingRegressorBase
Base class for CatBoost Multitable Regressor.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → CatBoostMultitableRegressor
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in score.
Returns
- self : object
The updated object.
- class carte_ai.src.baseline_multitable.GradientBoostingClassifierBase(*, source_data, source_fraction, num_model, val_size, random_state, n_jobs)
Bases:
ClassifierMixin, GradientBoostingMultitableBase
Base class for Gradient Boosting Multitable Classifier.
- decision_function(X)
Compute the decision function of X.
- predict(X)
Predict classes for X.
Parameters
- X: list of graph objects with size (n_samples)
The input samples.
Returns
- y: ndarray, shape (n_samples,)
The predicted classes.
- predict_proba(X)
Predict class probabilities for X.
Parameters
- X: list of graph objects with size (n_samples)
The input samples.
Returns
- p: ndarray, shape (n_samples,) for binary classification or (n_samples, n_classes)
The class probabilities of the input samples.
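Because the returned array can be 1-D or 2-D, downstream code may need to branch on its shape. The sketch below shows one way to turn either shape into class indices. The helper name labels_from_proba, the 0.5 threshold, and the reading of the 1-D output as the positive-class probability are all assumptions, not taken from the library.

```python
import numpy as np

def labels_from_proba(proba, threshold=0.5):
    """Turn predict_proba-style output into class indices (illustrative)."""
    proba = np.asarray(proba, dtype=float)
    if proba.ndim == 1:
        # Binary case: assumed to be one positive-class probability per sample.
        return (proba >= threshold).astype(int)
    # Multiclass case: pick the most probable column for each sample.
    return proba.argmax(axis=1)

binary = labels_from_proba([0.1, 0.8, 0.5])
multi = labels_from_proba([[0.2, 0.5, 0.3], [0.7, 0.2, 0.1]])
```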
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') GradientBoostingClassifierBase
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
Returns
- self: object
The updated object.
- class carte_ai.src.baseline_multitable.GradientBoostingMultitableBase(*, source_data, source_fraction, num_model, val_size, random_state, n_jobs)
Bases:
BaseEstimator
Base class for Gradient Boosting Multitable Estimator.
- class carte_ai.src.baseline_multitable.GradientBoostingRegressorBase(*, source_data, source_fraction, num_model, val_size, random_state, n_jobs)
Bases:
RegressorMixin, GradientBoostingMultitableBase
Base class for Gradient Boosting Multitable Regressor.
- predict(X)
Predict values for X. Returns the average of predicted values over all the models.
Parameters
- X: list of graph objects with size (n_samples)
The input samples.
Returns
- y: ndarray, shape (n_samples,)
The predicted values.
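The averaging over models described for predict can be sketched as follows. ConstantModel and average_predictions are hypothetical names used only for illustration; the actual ensemble members and how the library stores them are not shown in this reference.

```python
import numpy as np

class ConstantModel:
    """Hypothetical stand-in for one fitted model in the ensemble."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        # Return the same value for every input sample.
        return np.full(len(X), self.value, dtype=float)

def average_predictions(models, X):
    # Stack per-model predictions into shape (n_models, n_samples),
    # then average across the model axis.
    stacked = np.stack([model.predict(X) for model in models])
    return stacked.mean(axis=0)

X = [None, None, None]  # placeholders for three graph objects
y_pred = average_predictions([ConstantModel(1.0), ConstantModel(3.0)], X)
```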
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') GradientBoostingRegressorBase
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
Returns
- self: object
The updated object.
- class carte_ai.src.baseline_multitable.HistGBMultitableClassifier(*, source_data: dict = {}, learning_rate: float = 0.1, max_depth: None | int = None, max_leaf_nodes: int = 31, min_samples_leaf: int = 20, l2_regularization: float = 0, source_fraction: float = 0.5, num_model: int = 1, val_size: float = 0.1, random_state: int = 0, n_jobs: int = 1)
Bases:
GradientBoostingClassifierBase
Base class for Histogram Gradient Boosting Multitable Classifier.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') HistGBMultitableClassifier
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
Returns
- self: object
The updated object.
- class carte_ai.src.baseline_multitable.HistGBMultitableRegressor(*, source_data: dict = {}, learning_rate: float = 0.1, max_depth: None | int = None, max_leaf_nodes: int = 31, min_samples_leaf: int = 20, l2_regularization: float = 0, source_fraction: float = 0.5, num_model: int = 1, val_size: float = 0.1, random_state: int = 0, n_jobs: int = 1)
Bases:
GradientBoostingRegressorBase
Base class for Histogram Gradient Boosting Multitable Regressor.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') HistGBMultitableRegressor
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
Returns
- self: object
The updated object.
- class carte_ai.src.baseline_multitable.XGBoostMultitableClassifier(*, source_data: dict = {}, n_estimators: int = 100, max_depth: int = 6, min_child_weight: float = 1, subsample: float = 1, learning_rate: float = 0.3, colsample_bylevel: float = 1, colsample_bytree: float = 1, reg_gamma: float = 0, reg_lambda: float = 1, reg_alpha: float = 0, source_fraction: float = 0.5, num_model: int = 1, val_size: float = 0.1, random_state: int = 0, n_jobs: int = 1)
Bases:
GradientBoostingClassifierBase
Base class for XGBoost Multitable Classifier.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') XGBoostMultitableClassifier
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
Returns
- self: object
The updated object.
- class carte_ai.src.baseline_multitable.XGBoostMultitableRegressor(*, source_data: dict = {}, n_estimators: int = 100, max_depth: int = 6, min_child_weight: float = 1, subsample: float = 1, learning_rate: float = 0.3, colsample_bylevel: float = 1, colsample_bytree: float = 1, reg_gamma: float = 0, reg_lambda: float = 1, reg_alpha: float = 0, source_fraction: float = 0.5, num_model: int = 1, val_size: float = 0.1, random_state: int = 0, n_jobs: int = 1)
Bases:
GradientBoostingRegressorBase
Base class for XGBoost Multitable Regressor.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') XGBoostMultitableRegressor
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
Returns
- self: object
The updated object.
carte_ai.src.baseline_singletable_nn module
Neural network baseline for comparison.
- class carte_ai.src.baseline_singletable_nn.BaseMLPEstimator(*, hidden_dim: int = 256, num_layers: int = 2, dropout_prob: float = 0.2, learning_rate: float = 0.001, weight_decay: float = 0.01, batch_size: int = 128, val_size: float = 0.1, num_model: int = 1, max_epoch: int = 200, early_stopping_patience: None | int = 10, n_jobs: int = 1, device: str = 'cpu', random_state: int = 0, disable_pbar: bool = True)
Bases:
MLPBase
Base class for MLP Estimator.
- class carte_ai.src.baseline_singletable_nn.BaseRESNETEstimator(*, normalization: str | None = 'layernorm', num_layers: int = 4, hidden_dim: int = 256, hidden_factor: int = 2, hidden_dropout_prob: float = 0.2, residual_dropout_prob: float = 0.2, learning_rate: float = 0.001, weight_decay: float = 0.01, batch_size: int = 128, val_size: float = 0.1, num_model: int = 1, max_epoch: int = 200, early_stopping_patience: None | int = 10, n_jobs: int = 1, device: str = 'cpu', random_state: int = 0, disable_pbar: bool = True)
Bases:
MLPBase
Base class for RESNET Estimator.
- class carte_ai.src.baseline_singletable_nn.MLPBase(*, hidden_dim, learning_rate, weight_decay, batch_size, val_size, num_model, max_epoch, early_stopping_patience, n_jobs, device, random_state, disable_pbar)
Bases:
BaseEstimator
Base class for MLP.
- fit(X, y)
- class carte_ai.src.baseline_singletable_nn.MLPClassifier(*, loss: str = 'binary_crossentropy', hidden_dim: int = 256, num_layers: int = 2, dropout_prob: float = 0.2, learning_rate: float = 0.001, weight_decay: float = 0.01, batch_size: int = 128, val_size: float = 0.1, num_model: int = 1, max_epoch: int = 200, early_stopping_patience: None | int = 10, n_jobs: int = 1, device: str = 'cpu', random_state: int = 0, disable_pbar: bool = True)
Bases:
ClassifierMixin, BaseMLPEstimator
- decision_function(X)
- predict(X)
- predict_proba(X)
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') MLPClassifier
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
Returns
- self: object
The updated object.
- class carte_ai.src.baseline_singletable_nn.MLPRegressor(*, loss: str = 'squared_error', hidden_dim: int = 256, num_layers: int = 2, dropout_prob: float = 0.2, learning_rate: float = 0.001, weight_decay: float = 0.01, batch_size: int = 128, val_size: float = 0.1, num_model: int = 1, max_epoch: int = 200, early_stopping_patience: None | int = 10, n_jobs: int = 1, device: str = 'cpu', random_state: int = 0, disable_pbar: bool = True)
Bases:
RegressorMixin, BaseMLPEstimator
- predict(X)
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') MLPRegressor
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
Returns
- self: object
The updated object.
- class carte_ai.src.baseline_singletable_nn.MLP_Model(input_dim: int, hidden_dim: int, output_dim: int, dropout_prob: float, num_layers: int)
Bases:
Module
- forward(X)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
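The difference between calling the instance and calling forward directly can be illustrated with a toy class. ToyModule below is a hypothetical pure-Python stand-in that only mimics the idea of forward hooks. It is not torch.nn.Module and does not reproduce its internals.

```python
class ToyModule:
    """Toy stand-in for a module with forward hooks (not torch.nn.Module)."""

    def __init__(self):
        self._forward_hooks = []

    def register_forward_hook(self, hook):
        self._forward_hooks.append(hook)

    def forward(self, x):
        # The actual computation: here, simply double the input.
        return 2 * x

    def __call__(self, x):
        # Calling the instance runs forward, then every registered hook.
        out = self.forward(x)
        for hook in self._forward_hooks:
            hook(self, x, out)
        return out

calls = []
module = ToyModule()
module.register_forward_hook(lambda mod, inp, out: calls.append(out))

module(3)          # goes through __call__, so the hook records the output
module.forward(3)  # bypasses __call__, so the hook is skipped
```

After both calls, only the first one has been recorded by the hook, which is why the note recommends calling the instance.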
- class carte_ai.src.baseline_singletable_nn.RESNETClassifier(*, loss: str = 'binary_crossentropy', normalization: str | None = 'layernorm', num_layers: int = 4, hidden_dim: int = 256, hidden_factor: int = 2, hidden_dropout_prob: float = 0.2, residual_dropout_prob: float = 0.2, learning_rate: float = 0.001, weight_decay: float = 0.01, batch_size: int = 128, val_size: float = 0.1, num_model: int = 1, max_epoch: int = 200, early_stopping_patience: None | int = 10, n_jobs: int = 1, device: str = 'cpu', random_state: int = 0, disable_pbar: bool = True)
Bases:
ClassifierMixin, BaseRESNETEstimator
- decision_function(X)
- predict(X)
- predict_proba(X)
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') RESNETClassifier
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
Returns
- self: object
The updated object.
- class carte_ai.src.baseline_singletable_nn.RESNETRegressor(*, loss: str = 'squared_error', normalization: str | None = 'layernorm', num_layers: int = 4, hidden_dim: int = 256, hidden_factor: int = 2, hidden_dropout_prob: float = 0.2, residual_dropout_prob: float = 0.2, learning_rate: float = 0.001, weight_decay: float = 0.01, batch_size: int = 128, val_size: float = 0.1, num_model: int = 1, max_epoch: int = 200, early_stopping_patience: None | int = 10, n_jobs: int = 1, device: str = 'cpu', random_state: int = 0, disable_pbar: bool = True)
Bases:
RegressorMixin, BaseRESNETEstimator
- predict(X)
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') RESNETRegressor
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
Returns
- self: object
The updated object.
- class carte_ai.src.baseline_singletable_nn.RESNET_Model(input_dim: int, hidden_dim: int, output_dim: int, num_layers: int, **block_args)
Bases:
Module
- forward(X)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class carte_ai.src.baseline_singletable_nn.Residual_Block(input_dim: int, output_dim: int, hidden_factor: int, normalization: str | None = 'layernorm', hidden_dropout_prob: float = 0.2, residual_dropout_prob: float = 0.2)
Bases:
Module
- forward(x: Tensor)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- reset_parameters() None
- class carte_ai.src.baseline_singletable_nn.TabularDataset(X, y)
Bases:
Dataset
carte_ai.src.carte_gridsearch module
Custom grid search used for CARTE-GNN model
- carte_ai.src.carte_gridsearch.carte_gridsearch(estimator, X_train: list, y_train: array, param_distributions: dict, refit: bool = True, n_jobs: int = 1)
CARTE grid search.
This function runs grid search for CARTE GNN models.
Parameters
- estimator: CARTE estimator
The CARTE estimator used for grid search.
- X_train: list
The list of graph objects for the train data, transformed using Table2GraphTransformer.
- y_train: numpy array of shape (n_samples,)
The target variable of the train data.
- param_distributions: dict
The dictionary of parameter grids to search for the optimal parameters.
- refit: bool, default=True
Indicates whether to return a refitted estimator with the best parameters.
- n_jobs: int, default=1
Number of jobs to run in parallel. Training the estimator in the grid search is parallelized over the parameter grid.
Returns
- result: pandas DataFrame
The result of each parameter grid.
- best_params: dict
The dictionary of best parameters obtained through grid search.
- best_estimator: CARTEGNN estimator
The CARTE estimator trained using best_params if refit is set to True.
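To make the idea of searching over a parameter grid concrete, the sketch below expands a dictionary of candidate values into every combination and keeps the best-scoring one. It is a generic illustration, not the implementation of carte_gridsearch: expand_grid, toy_grid_search, the scoring lambda, and the convention that a higher score is better are all assumptions.

```python
from itertools import product

def expand_grid(param_distributions):
    """Return every parameter combination in the grid as a dict."""
    names = list(param_distributions)
    value_lists = (param_distributions[name] for name in names)
    return [dict(zip(names, values)) for values in product(*value_lists)]

def toy_grid_search(param_distributions, score_fn):
    """Score each combination and return (results, best_params)."""
    results = [
        {**params, "score": score_fn(params)}
        for params in expand_grid(param_distributions)
    ]
    # Assumption: a higher score is better.
    best = max(results, key=lambda row: row["score"])
    best_params = {k: v for k, v in best.items() if k != "score"}
    return results, best_params

grid = {"learning_rate": [1e-3, 1e-2], "dropout": [0.0, 0.2]}
# Hypothetical score that prefers the larger learning rate and no dropout.
results, best_params = toy_grid_search(
    grid, lambda p: p["learning_rate"] - p["dropout"]
)
```

Each entry of results plays the role of one row in the returned result table, with one row per parameter combination.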
carte_ai.src.carte_model module
- class carte_ai.src.carte_model.CARTE_Attention(input_dim: int, output_dim: int, num_heads: int = 1, concat: bool = True, read_out: bool = False)
Bases:
Module
- forward(x: Tensor, edge_index: Tensor, edge_attr: Tensor, return_attention: bool = False)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- reset_parameters()
- class carte_ai.src.carte_model.CARTE_Base(input_dim_x: int, input_dim_e: int, hidden_dim: int, num_layers: int, **block_args)
Bases:
Module
- forward(x, edge_index, edge_attr, return_attention=False)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class carte_ai.src.carte_model.CARTE_Block(input_dim: int, ff_dim: int, num_heads: int = 1, concat: bool = True, dropout: float = 0.1, read_out: bool = False)
Bases:
Module
- forward(x: Tensor, edge_index: Tensor, edge_attr: Tensor)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class carte_ai.src.carte_model.CARTE_Contrast
Bases:
Module
- forward(x: Tensor)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class carte_ai.src.carte_model.CARTE_NN_Model(input_dim_x: int, input_dim_e: int, hidden_dim: int, output_dim: int, num_layers: int, **block_args)
Bases:
Module
- forward(input)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class carte_ai.src.carte_model.CARTE_NN_Model_Ablation(ablation_method: str, input_dim_x: int, input_dim_e: int, hidden_dim: int, output_dim: int, num_layers: int, **block_args)
Bases:
Module
- forward(input)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class carte_ai.src.carte_model.CARTE_Pretrain(input_dim_x: int, input_dim_e: int, hidden_dim: int, num_layers: int, **block_args)
Bases:
Module
- forward(input)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
carte_ai.src.evaluate_utils module
- carte_ai.src.evaluate_utils.check_pred_output(y_train, y_pred)
Set the predicted output to the mean of the train data where it is NaN.
- carte_ai.src.evaluate_utils.col_names_per_type(data, target_name)
Extract column names per type.
- carte_ai.src.evaluate_utils.extract_best_params(data_name, method, num_train, random_state)
Extract the best parameters in the CARTE paper.
- carte_ai.src.evaluate_utils.reshape_pred_output(y_pred)
Reshape the predictive output accordingly.
- carte_ai.src.evaluate_utils.return_score(y_target, y_pred, task)
Return score results for given task.
- carte_ai.src.evaluate_utils.set_score_criterion(task)
Set scoring method for CV and score criterion in final result.
- carte_ai.src.evaluate_utils.set_split(data, data_config, num_train, random_state)
Set train/test split given the random state.
- carte_ai.src.evaluate_utils.shorten_param(param_name)
Shorten parameter names for use as column names in search results.
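As an illustration of the NaN handling described for check_pred_output, the sketch below replaces NaN predictions element-wise with the mean of the training targets. This is one reading of the one-line description, not the library's code: the helper name fill_nan_with_train_mean and the element-wise behaviour are assumptions.

```python
import math

def fill_nan_with_train_mean(y_train, y_pred):
    """Replace NaN predictions with the mean of the training targets."""
    train_mean = sum(y_train) / len(y_train)
    # Keep valid predictions as they are; substitute the mean for NaN.
    return [train_mean if math.isnan(p) else p for p in y_pred]

filled = fill_nan_with_train_mean([1.0, 2.0, 3.0], [0.5, float("nan"), 4.0])
```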
carte_ai.src.preprocess_utils module
Functions used for preprocessing the data.
- carte_ai.src.preprocess_utils.extract_fasttext_features(data: DataFrame, extract_col_name: str)
- carte_ai.src.preprocess_utils.extract_ken_features(data: DataFrame, extract_col_name: str)
- carte_ai.src.preprocess_utils.extract_llm_features(data: DataFrame, extract_col_name: str, device: str = 'cuda:0')
- carte_ai.src.preprocess_utils.table2llmfeatures(data: DataFrame, embed_numeric: bool, device: str = 'cuda:0')
carte_ai.src.visualization_utils module
Functions that can be used for visualization. The critical difference diagram code is adapted from scikit-posthocs.
- carte_ai.src.visualization_utils.critical_difference_diagram(ranks: dict | Series, sig_matrix: DataFrame, *, ax: SubplotBase | None = None, label_fmt_left: str = '{label} ({rank:.2g})', label_fmt_right: str = '({rank:.2g}) {label}', label_props: dict | None = None, marker_props: dict | None = None, elbow_props: dict | None = None, crossbar_props: dict | None = None, color_palette: Dict[str, str] | List = {}, line_style: Dict[str, str] | List = {}, text_h_margin: float = 0.01) Dict[str, list]
- carte_ai.src.visualization_utils.generate_df_cdd(df_normalized, train_size='all')
- carte_ai.src.visualization_utils.prepare_result(task, models='all', rank_at=2048)
- carte_ai.src.visualization_utils.sign_array(p_values: List | ndarray, alpha: float = 0.05) ndarray
- carte_ai.src.visualization_utils.sign_plot(x: List | ndarray | DataFrame, g: List | ndarray | None = None, flat: bool = False, labels: bool = True, cmap: List | None = None, cbar_ax_bbox: List | None = None, ax: SubplotBase | None = None, **kwargs) SubplotBase | Tuple[SubplotBase, Colorbar]
- carte_ai.src.visualization_utils.sign_table(p_values: List | ndarray | DataFrame, lower: bool = True, upper: bool = True) DataFrame | ndarray