hazardous.metrics.accuracy_in_time

hazardous.metrics.accuracy_in_time(y_test, y_pred, time_grid, quantiles=None)

Accuracy in time for prognostic models using competing risks.

\[\mathrm{acc}(\zeta) = \frac{1}{n_{nc}} \sum_{i=1}^n ~ I\{\hat{y}_i=y_{i,\zeta}\} \overline{I\{\delta_i = 0 \cap t_i \leq \zeta \}}\]

where:

  • \(I\) is the indicator function.

  • \(\zeta\) is a fixed time horizon.

  • \(n_{nc}\) is the number of individuals who were not censored at or before \(\zeta\).

  • \(\delta_i\) is the event experienced by individual \(i\) at \(t_i\), with \(\delta_i = 0\) denoting censoring.

  • \(\hat{y}_i = \text{arg}\max\limits_{k \in [0, K]} \hat{F}_k(\zeta|X=x_i)\) where \(\hat{F}_0(\zeta|X=x_i) \triangleq \hat{S}(\zeta|X=x_i)\).

    \(\hat{y}_i\) is the most probable predicted event for individual \(i\) at \(\zeta\).

  • \(y_{i,\zeta} = \delta_i ~ I\{t_i \leq \zeta \}\) is the observed event for individual \(i\) at \(\zeta\).

The accuracy in time is a metric introduced in [Alberge2024] which evaluates whether observed events are predicted as the most likely at given times. It measures whether the event with the highest predicted probability (one of the events of interest, or survival) matches the event observed at \(\zeta\) for each patient.

We remove individuals that were censored at times \(t \leq \zeta\), so the accuracy in time essentially represents the accuracy of the estimator on observed events up to \(\zeta\).
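As a minimal sketch of the definition above, the accuracy at a single horizon can be written in a few lines of NumPy. The helper name `accuracy_at_horizon` and the toy data are hypothetical. The sketch assumes that column 0 of the probability array holds the survival probability and column k the cumulative incidence of event k. It illustrates the formula and is not the library's implementation.

```python
import numpy as np


def accuracy_at_horizon(event, duration, proba_at_zeta, zeta):
    """Accuracy in time at one horizon zeta (illustrative sketch).

    event: (n_samples,) ints, 0 = censored, k >= 1 = event k.
    duration: (n_samples,) observed times t_i.
    proba_at_zeta: (n_samples, K + 1) probabilities at zeta; column 0 is
        assumed to be survival, column k the incidence of event k.
    """
    event = np.asarray(event)
    duration = np.asarray(duration)
    proba_at_zeta = np.asarray(proba_at_zeta)

    # Drop individuals censored at or before zeta: their status is unknown.
    keep = ~((event == 0) & (duration <= zeta))

    # Observed class at zeta: the event if it already happened, else 0.
    y_true = event * (duration <= zeta)
    # Predicted class: the most probable outcome at zeta.
    y_hat = proba_at_zeta.argmax(axis=1)

    return np.mean(y_hat[keep] == y_true[keep])


# Toy data: five individuals, two competing events.
event = np.array([1, 0, 2, 0, 1])
duration = np.array([2.0, 3.0, 4.0, 9.0, 8.0])
proba = np.array([
    [0.2, 0.7, 0.1],  # predicts event 1
    [0.6, 0.3, 0.1],  # predicts survival
    [0.5, 0.2, 0.3],  # predicts survival
    [0.8, 0.1, 0.1],  # predicts survival
    [0.3, 0.6, 0.1],  # predicts event 1
])

acc = accuracy_at_horizon(event, duration, proba, zeta=5.0)
print(acc)  # 0.5
```

Here the second individual is censored at time 3 and is excluded at \(\zeta = 5\). Of the four remaining individuals, two are classified correctly.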

At early time horizons, every model’s accuracy will be high because it will predict that patients have survived, which will be true in most cases. However, the true measure of the model’s discriminative power emerges at later time points, when it must determine which specific event will occur for a given patient.
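This effect can be illustrated with a hypothetical baseline that always predicts survival. The data and the helper `survival_only_accuracy` below are made up for the example. The sketch applies the same exclusion rule for individuals censored at or before \(\zeta\).

```python
import numpy as np

# Toy cohort: 0 = censored, 1 and 2 = competing events.
event = np.array([1, 2, 0, 1, 2, 0])
duration = np.array([6.0, 7.0, 8.0, 9.0, 10.0, 12.0])


def survival_only_accuracy(zeta):
    """Accuracy at zeta of a baseline that always predicts class 0 (survival)."""
    # Exclude individuals censored at or before zeta.
    keep = ~((event == 0) & (duration <= zeta))
    # Observed class at zeta: the event if it already happened, else 0.
    y_true = event * (duration <= zeta)
    # The baseline never predicts an event.
    y_hat = np.zeros_like(event)
    return np.mean(y_hat[keep] == y_true[keep])


early = survival_only_accuracy(1.0)   # nobody has had an event yet
late = survival_only_accuracy(11.0)   # most events have occurred

print(early, late)  # 1.0 0.2
```

The baseline is perfect at the early horizon and poor at the late one, which is why accuracy at later horizons is more informative about a model.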

The C-index depends on the other individuals in the cohort, while the accuracy in time for an individual does not. Conceptually, the C-index can help clinicians prioritize treatment allocation by ranking individuals by risk of a given event of interest. The accuracy in time, however, answers a different question: “what is the most likely event that this individual will experience at some fixed time horizon?”. Therefore, the accuracy in time helps clinicians choose the right treatment by prioritizing the risks for a given individual.

Parameters:
y_test : array, dict or dataframe of shape (n_samples, 2)

The test target, consisting of the ‘event’ and ‘duration’ columns.

y_pred : array of shape (n_samples, n_events, n_time_grid)

Cumulative incidence for all competing events, at the time points from the input time_grid.

time_grid : array of shape (n_time_grid,)

Time points used to predict the cumulative incidence.

quantiles : array_like of shape (n_taus,), default=None

The fixed time horizons (taus) at which to compute the accuracy in time, defined as quantiles of time_grid. Duplicate tau values are removed. When no quantiles are passed, taus is the time_grid.
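The mapping from quantiles to horizons might look like the following sketch. It assumes that `np.quantile` with its default linear interpolation is used and that deduplication is done with `np.unique`. The lookup of a grid column with `np.searchsorted` is also an assumption about how a tau could be matched to `time_grid`. None of this is confirmed as the library's internal behavior.

```python
import numpy as np

time_grid = np.linspace(0.0, 10.0, 11)  # 0, 1, ..., 10
quantiles = [0.25, 0.5, 0.5, 1.0]       # note the duplicated 0.5

# Quantiles of the time grid, with duplicates removed (np.unique also sorts).
taus = np.unique(np.quantile(time_grid, quantiles))

# Assumed lookup: first grid index whose time is >= tau, which could be
# used to select the matching slice y_pred[:, :, idx].
tau_idx = np.searchsorted(time_grid, taus)

print(taus)     # [ 2.5  5.  10. ]
print(tau_idx)  # [ 3  5 10]
```

Four quantiles yield three horizons here, so the returned arrays can be shorter than the `quantiles` argument.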

Returns:
acc_in_time : array of shape (n_taus,)

The accuracy in time computed at the fixed horizons taus.

taus : array of shape (n_taus,)

The fixed time horizons effectively used to compute the accuracy in time.

References