`hazardous.metrics`.brier_score_survival

hazardous.metrics.brier_score_survival(y_train, y_test, y_pred, times)

Time-dependent Brier score of a survival function estimate.

\[\mathrm{BS}(t) = \frac{1}{n} \sum_{i=1}^n \left[ \mathbb{I}(y_i \leq t \land \delta_i = 1) \frac{(0 - \hat{S}(t | \mathbf{x}_i))^2}{\hat{G}(y_i)} + \mathbb{I}(y_i > t) \frac{(1 - \hat{S}(t | \mathbf{x}_i))^2}{\hat{G}(t)} \right],\]

where \(y_i\) is the observed duration and \(\delta_i\) the binary any-event indicator of sample \(i\), \(\hat{S}(t | \mathbf{x})\) is the predicted probability of surviving up to time point \(t\) for a feature vector \(\mathbf{x}\), and \(\hat{G}(t)\) is the probability of remaining uncensored at time \(t\), estimated on the training set by applying the Kaplan-Meier estimator to the negation of the binary any-event indicator.

Note that this estimate of \(\hat{G}\) assumes that censoring is independent of the covariates. When this assumption is violated, the IPCW weights are biased and the Brier score is no longer a proper scoring rule. See [Gerds2006] for a study of this bias.
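To make the formula concrete, here is a minimal pure-Python sketch of the IPCW-weighted score. It is an illustration under stated assumptions, not the implementation used by hazardous: the helper names `censoring_survival` and `brier_score_sketch` are hypothetical, targets are plain dictionaries with 'event' and 'duration' lists, ties and left or right limits of \(\hat{G}\) may be handled differently in the library, and the case \(\hat{G} = 0\) is not handled.

```python
def censoring_survival(durations, events):
    """Kaplan-Meier estimate of G(t), the probability of remaining uncensored.

    Censoring plays the role of the "event": the indicator is the negation
    of the any-event indicator. Returns a right-continuous step function.
    """
    censored = [e == 0 for e in events]
    steps = []  # (time, value of G from that time onward)
    g = 1.0
    for t in sorted(set(durations)):
        at_risk = sum(1 for d in durations if d >= t)
        n_censored = sum(
            1 for d, c in zip(durations, censored) if d == t and c
        )
        g *= 1.0 - n_censored / at_risk
        steps.append((t, g))

    def G(t):
        value = 1.0
        for step_time, step_value in steps:
            if step_time > t:
                break
            value = step_value
        return value

    return G


def brier_score_sketch(y_train, y_test, y_pred, times):
    """Hypothetical re-implementation of BS(t) for each t in times.

    y_pred[i][j] is the predicted survival probability of test sample i
    at times[j]. Assumes G is strictly positive wherever it is evaluated.
    """
    G = censoring_survival(y_train["duration"], y_train["event"])
    n = len(y_test["duration"])
    scores = []
    for j, t in enumerate(times):
        total = 0.0
        for i in range(n):
            y_i = y_test["duration"][i]
            delta_i = y_test["event"][i] > 0  # collapse to "any event"
            s_hat = y_pred[i][j]
            if y_i <= t and delta_i:
                # Event observed by time t: the true survival status is 0.
                total += (0.0 - s_hat) ** 2 / G(y_i)
            elif y_i > t:
                # Still event-free at time t: the true survival status is 1.
                total += (1.0 - s_hat) ** 2 / G(t)
            # Samples censored at or before t contribute nothing.
        scores.append(total / n)
    return scores


# Toy data: the second sample is censored at duration 2.
y = {"event": [1, 0, 1, 1], "duration": [1.0, 2.0, 3.0, 4.0]}
uninformative = [[0.5] for _ in range(4)]
scores = brier_score_sketch(y, y, uninformative, times=[2.5])
```

In this toy example the censored sample is dropped from the sum at t = 2.5, and the two samples still at risk are up-weighted by 1 / G(2.5) = 1.5 to compensate, which is the role of the IPCW terms in the formula.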

Parameters:

y_train : record-array, dictionary or dataframe of shape (n_samples, 2)
    The target, consisting of the ‘event’ and ‘duration’ columns. If the ‘event’ column holds more than one event type, all event types are automatically collapsed into a single event type to compute the Brier score of the “any-event” survival function estimate. y_train is only used to estimate the IPCW values that adjust for censoring in the evaluation data.
y_test : record-array, dictionary or dataframe of shape (n_samples, 2)
    The ground truth, consisting of the ‘event’ and ‘duration’ columns. The same remark as for y_train applies to the ‘event’ column.
y_pred : array-like of shape (n_samples, n_times)
    Survival probability estimates predicted at each time point in times.
times : array-like of shape (n_times,)
    Times at which the survival probabilities in y_pred have been estimated and for which the Brier score is computed.

Returns:

brier_score : np.ndarray of shape (n_times,)
