Usage#

Pre-requisites#

Data and compute environment#

This code is focused on the computing and data environment from Paris Greater Hospitals. The data format should be OMOP.

Install the package#

Install poetry and python 3.10. Inside the project folder, run the following command:

poetry install

Create the study population#

Three populations are available, corresponding to three predictive tasks: length of stay interpolation (LOS), prognosis of the next diagnosis billing codes (grouped into 21 ICD10 chapters), prognosis of major adverse cardio-vascular events (MACE).

  • The scripts for building each population are in medem.populations.To build the length-of-stay population, run:

poetry run python medem.populations.t1_los_population.py
  • The configuration for each populations are in medem.exeperiences.configurations.py. Most importantly, the user specifies the database name. At loading time, the code will look for a database inside the hive database of the APHP at: hdfs://bbsedsi/apps/hive/warehouse/bigdata/omop_exports_prod/hive/{database_name}.db/. All parameters that can be specified are:

 "database_name": "cse210038_20220921_160214312112",
    "cohort_name": "complete_hospitalization_los", # name 
    "study_start": parse("2017-01-01"), # start of the study period
    "study_end": parse("2022-06-01"), # end of the study period. Outside of this range, all data is thrown away 
    "min_age_at_admission": 18, # minimum age at admission
    "sup_quantile_visits": 0.95, # exclude patients having a number of visits above the resulting threshold number of visits per patient
    "task_name": TASK_LOS_CATEGORICAL, # define the prognosis task 
    "los_categories": np.array([0, 7, np.inf]), # define the categories for the task,
    "with_incomplete_hospitalization": True, # include also outpatient visits ? 
    "visits_w_billing_codes_only": True, # keep only visits with billing codes ?
    "horizon_in_days": 30,  # used for avoiding right-censoring
    "event_tables": DEFAULT_EVENT_CONFIG, # what event tables to use as features
    "test_size": 0.2,
    "n_min_events": 10, # minimum number of events for a given medical code be included. Too rare events are thrown away.
  • The underlying functions used for all population are medem.preprocessing.selection.py:select_population and medem.preprocessing.selection.py:create_outcome. Refer to these functions to see every details on the population flowcharts and task definitions.

Run an experiment#

  • The scripts to benchmark different machine learning pipelines are in medem.experiences.setups. For example to launch the benchmark on the :

poetry run python medem.experiments.setups/los_prediction.py
  • The configurations for the experiments are in medem.experiments.configurations.py. The available parameters are:

CONFIG_LOS_ESTIMATION = {
    "validation_size": 0.1,  # validation size
    "subtrain_size": [0.1, 0.5, 1] # size of the succesives effective train sets 
    "splitting_rs": list(range(5)),  # random seeds 
    "estimator_config": ESTIMATORS_TASK_LOS, # list of estimators to benchmark
    "featurizer_config": FEATURIZERS_TASK_LOS, # list of featurizers to benchmark
    "randomsearch_scoring": "roc_auc", # scoring function for the random search
    "randomsearch_n_iter": 10, # number of random search iterations
    "randomsearch_rs": 0, # random seed for the random search
    "n_min_events": 10, # minimum number of events for a given medical code be included. Too rare events are thrown away.
    "colname_demographics": [
        STATIC_FEATURES_DICT,
    ], # list of static features to include
    "local_embeddings_params": {
        "colname_concept": COLNAME_SOURCE_CODE,
        "window_radius_in_days": 30,
        "window_orientation": "center",
        "backend": "pandas",
        "d": N_COMPONENTS,
    }, # parameters passed to the local embeddings pipeline if present in featurizer config
}
  • To use the slurm cluster on the AP-HP data warehouse, use the dedicated sbatch scripts. You might need to change some path in these scripts that are specific to a given AP-HP user and project:

cd scripts/experiences/
mkdir logs
sbatch los_sbatch.sh