Vignette MSMpred

This document provides an example of how to work with the Shiny app MSMpred. The app was built to fit, validate and make predictions with a multistate model in an interactive and easy way. It can be especially useful for users with limited or no experience in R.

Example Data set

The example data set contains 2279 patients who received transplants at the European Society for Blood and Marrow Transplantation (EBMT) between 1985 and 1998. Together with the status and the time for each patient in each possible state, it includes the covariates years since transplantation, patient age at transplant, donor-recipient gender match and information about the administration of prophylaxis. The states include recovery (rec), adverse effect (ae), both recovery and adverse effect (recae), relapse (rel) and death (srv).

Data

Data table

When the app opens, the user must choose whether to start a new analysis or to resume one that was previously started and saved.

If the New Analysis option is chosen the following steps must be taken:

Before uploading the file, the column separator needs to be selected. Note that the decimal separator must be a point (“.”).

Then, the time unit should be specified.

After this process is completed, a set of check boxes is displayed so that the user can specify which variables in the data represent time, status and covariates.

Since the app needs to know the state from which the individuals start, a variable “inistat” is created. To do that, the user must indicate whether the starting state is present in the data set and, if it is not, enter it.

Once all the time and status variables are selected, the user needs to match the time and status variables for each state and then name the state to which each pair of variables belongs. If the save changes button is pressed without writing the names, the name of the status variable for that state is used as the default.

A table showing the data set will be displayed after these changes, so it is possible to check if everything is correct.
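To make the expected file layout more concrete, the following is a minimal Python sketch (not the app's own code) of a semicolon-separated file with one time and one status column per state, and of how a constant “inistat” column could be added when the starting state is not present in the data. The column names, the values and the starting-state label `tx` are hypothetical and only serve as an illustration.

```python
import csv
import io

# Hypothetical sample: ";" as column separator, "." as decimal separator.
# "rec"/"srv" hold times, "rec.s"/"srv.s" hold the matching status (1 = event).
SAMPLE = """id;rec;rec.s;srv;srv.s;agecl
1;22.0;1;995.0;0;20-40
2;29.5;1;422.0;1;>40
"""


def load_wide(text, sep=";", initial_state=None, initial_state_column=None):
    """Read the table and add an 'inistat' column to every row.

    If the starting state is already stored in a column, copy it from there;
    otherwise use the constant label supplied by the user.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=sep)
    rows = []
    for row in reader:
        if initial_state_column is not None:
            row["inistat"] = row[initial_state_column]
        else:
            row["inistat"] = initial_state
        rows.append(row)
    return rows


rows = load_wide(SAMPLE, sep=";", initial_state="tx")
for row in rows:
    print(row["id"], float(row["rec"]), row["rec.s"], row["inistat"])
```

Each state contributes a (time, status) pair of columns, which is the pairing the user is asked to confirm in the app.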

Covariates filter

Filtering the data allows the user to restrict the analysis to the selected categories of the covariates.

Descriptive plots and table

Model specification

Model

After that, it is necessary to specify the possible transitions between the different states. The app does not allow bidirectional transitions or loops in the multistate model. If a loop is detected, a popup appears indicating that a loop has been introduced, and the last transition is not added.

Once they are specified:

In the multistate model diagram box the updated diagram of the defined model is returned. This diagram is automatically updated when a transition is added or deleted. In order to make the diagram more understandable the states are plotted in different colors: orange (initial states), blue (transient) and magenta (absorbing).
In the Number of events for each transition box, the number of events for each transition appears. As loops are not allowed, the number of events for a transition can be interpreted as the number of individuals who make that transition, because each individual makes each transition of their path only once.
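The checks described in this section can be sketched in a few lines of Python. This is only an illustration of the idea, not the app's implementation: the function names, the rule used to classify states (initial = no incoming transition, absorbing = no outgoing transition) and the starting-state label `tx` are assumptions made for the example.

```python
from collections import Counter, defaultdict


def creates_loop(transitions, new_transition):
    """Return True if adding new_transition = (src, dst) would create a loop.

    A loop appears exactly when src can already be reached from dst, which
    also covers self-loops and bidirectional transitions.
    """
    src, dst = new_transition
    graph = defaultdict(set)
    for a, b in transitions:
        graph[a].add(b)
    stack, seen = [dst], set()
    while stack:
        node = stack.pop()
        if node == src:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node])
    return False


def classify_states(states, transitions):
    """Label each state as initial, transient or absorbing (assumed rule)."""
    sources = {a for a, _ in transitions}
    targets = {b for _, b in transitions}
    kinds = {}
    for s in states:
        if s not in sources:
            kinds[s] = "absorbing"   # nothing leaves this state
        elif s not in targets:
            kinds[s] = "initial"     # nothing enters this state
        else:
            kinds[s] = "transient"
    return kinds


def count_events(paths):
    """Count how many observed paths contain each transition."""
    counts = Counter()
    for path in paths:
        counts.update(zip(path, path[1:]))
    return counts


states = ["tx", "rec", "rel", "srv"]
transitions = [("tx", "rec"), ("tx", "rel"), ("tx", "srv"),
               ("rec", "rel"), ("rec", "srv")]
print(creates_loop(transitions, ("rec", "tx")))
print(classify_states(states, transitions))
print(count_events([["tx", "rec", "srv"], ["tx", "srv"], ["tx", "rec"]]))
```

Because the model has no loops, every path visits a transition at most once, so the counts returned by `count_events` are also counts of individuals.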

Covariates/time

In this tab, the user can specify which covariates from the data set to include in the analysis and the follow-up time to consider. This time is used as the upper limit when plotting both cumulative incidences and instantaneous hazards.

After specifying all this information it is important to always press the button “Save model specification”, since the information is not automatically saved.

Exploring the data

Length of stay

For the states with no censored observations, boxplots show the distribution of the time that patients stay in each state.

Time until absorbing states

The cumulative incidence plot shows, for each absorbing state, the proportion of individuals who have reached that state by any given time.
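As a rough illustration of how such a curve can be estimated when some observations are censored, the sketch below implements the Aalen-Johansen estimator for competing absorbing states in plain Python. Whether MSMpred uses exactly this estimator is an assumption; the function name and the toy data are hypothetical.

```python
def cumulative_incidence(times, causes, cause, t):
    """Estimated proportion of individuals absorbed in `cause` by time t.

    times  : time to the first absorbing state or to censoring
    causes : label of the absorbing state reached, or None if censored
    """
    event_times = sorted({ti for ti, c in zip(times, causes) if c is not None})
    surv = 1.0   # probability of not being absorbed just before the current time
    cif = 0.0
    for s in event_times:
        if s > t:
            break
        at_risk = sum(1 for ti in times if ti >= s)
        d_all = sum(1 for ti, c in zip(times, causes) if ti == s and c is not None)
        d_k = sum(1 for ti, c in zip(times, causes) if ti == s and c == cause)
        cif += surv * d_k / at_risk
        surv *= 1.0 - d_all / at_risk
    return cif


# Toy data without censoring: the estimate equals the observed proportion.
times = [1, 2, 3, 4]
causes = ["rel", "srv", "rel", "srv"]
print(cumulative_incidence(times, causes, "rel", 3))
```

Without censoring the estimate reduces to the simple proportion of individuals absorbed in the chosen state by time `t`; with censoring, each event is weighted by the probability of still being at risk.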

Instantaneous hazards

In order to compute the plots it is necessary to:

Select the starting state of the transitions of interest (for the first graph).
If covariates have previously been selected, select the covariate of interest (which will be used as a stratifying covariate).
Select the ending state of the transitions of interest (for the second graph).

Two groups of instantaneous hazard plots are returned: one for the transitions that start in the selected state and another for the transitions that end in the selected state. If a covariate has been selected, it is used to stratify the data. If a numeric covariate is chosen, two plots appear in each group: one for individuals with a value above the median of the covariate and another for individuals with a value below the median. If a categorical covariate is chosen, one graph per category of the covariate appears in each group. If there are not enough individuals to estimate the instantaneous hazard of a specific transition, that transition does not appear in the graph.

The instantaneous hazard indicates the risk of going through a specific transition just at moment t. Consequently, when no covariates are selected, the graph can be interpreted as the risk of transition for the general population. When a covariate is selected, the app uses it as a stratifying covariate, so on one side we have the risk of transition for one group of individuals (e.g., women, age above the median age) and on the other side the risk of transition for another group (e.g., men, age below the median age).
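For reference, the plotted quantity can be written down explicitly. The following is the usual textbook definition of a transition hazard under a Markov assumption, not a formula taken from the app; the symbols \(\alpha_{kl}(t)\) (hazard of the transition \(k→l\)) and \(X(t)\) (state occupied at time \(t\)) are notation introduced here for illustration.

\[
\alpha_{kl}(t) = \lim_{\Delta t \to 0} \frac{P\big(X(t + \Delta t) = l \mid X(t^-) = k\big)}{\Delta t}
\]

In words, it is the rate at which individuals who are in state \(k\) just before time \(t\) move to state \(l\) at time \(t\).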

Save model

If at any point the user wants to stop the analysis and resume it later, the information already specified can be downloaded.

Model output

Fitted model

Either the Cox (Markov) or the Cox (Semi-Markov) option should be selected; the model is then fitted taking into account the previously selected transition-specific covariates.

If a null model is fitted, that is, a model without any covariates, only one table is returned, containing the log-likelihood of the null model.
If the model includes covariates, three tables are returned with the most important information about the model.

In the first table the following information is collected:

Row names: specify the analyzed covariate and transition.
coef: the estimated coefficients.
HR (95%CI): the estimated hazard ratios and confidence intervals.
p-value: the p-value of each estimation.

In the second table the log-likelihood values of the fitted and the null model are shown, as well as the Akaike information criterion (AIC) of the fitted model.

In the third table the following information is collected:

Row names: the name of the applied test.
test: the value of each test statistic.
df: the degrees of freedom of each test, that is, the difference in the number of estimated coefficients between the fitted and the null model.
p-value: the p-value of each test.

Note: for help interpreting the values in the tables, check the Help document of the app.
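To show how the quantities in the second and third tables relate to each other, here is a small Python sketch. It assumes the usual definition AIC = −2 × log-likelihood + 2 × number of coefficients, and that one of the reported tests is a likelihood ratio test compared against a chi-square distribution; neither is confirmed by the app's documentation here. The log-likelihood values are made up.

```python
import math


def aic(loglik, n_coef):
    """Akaike information criterion from a (partial) log-likelihood."""
    return -2.0 * loglik + 2.0 * n_coef


def likelihood_ratio(loglik_null, loglik_fitted, n_coef):
    """Likelihood ratio statistic and its degrees of freedom."""
    return 2.0 * (loglik_fitted - loglik_null), n_coef


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square with integer df (closed form)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term = math.exp(-half)
        total = term
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return total
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


# Hypothetical values: null model vs. a model with 3 coefficients.
stat, df = likelihood_ratio(loglik_null=-105.0, loglik_fitted=-100.0, n_coef=3)
print(aic(-100.0, 3), stat, df, chi2_sf(stat, df))
```

A small p-value indicates that the fitted model explains the data better than the null model; a lower AIC is preferred when comparing fitted models.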

Forest plot

To create the forest plot, the user needs to select the transition to analyse and, in the case of numerical covariates, the units to take into account. A first plot is returned representing the estimated hazard ratios and confidence intervals for the covariates taken into account in the selected transition. If no covariate is selected there won’t be any graph in this section.

The forest plot provides the hazard ratios and their confidence intervals for the covariates taken into account in the selected transition. If the covariate is categorical, these hazard ratios compare each category with the reference category of that covariate, while if the covariate is numeric, the hazard ratios are computed for an increment of one unit in that covariate. As a one-unit increment might not be easily interpretable, the app allows the user to enter the number of units to be used in the graph.

Based on that graph, we say that the covariate has a significant effect on the transition if the confidence interval of its hazard ratio does not contain 1, and we say that the covariate does not have a significant effect otherwise.
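The rescaling of a numeric covariate to a chosen number of units can be sketched as follows. This is a minimal Python illustration that assumes a Wald-type 95% confidence interval built from the coefficient and its standard error; the function name and the numbers are hypothetical, and the app may compute its intervals differently.

```python
import math


def hazard_ratio(coef, se, units=1.0, z=1.959964):
    """Hazard ratio and Wald confidence interval for an increment of `units`."""
    hr = math.exp(units * coef)
    lower = math.exp(units * (coef - z * se))
    upper = math.exp(units * (coef + z * se))
    return hr, lower, upper


# Hypothetical coefficient for age in years, reported per 10 years.
coef = math.log(2) / 10
print(hazard_ratio(coef, se=0.01, units=1))
print(hazard_ratio(coef, se=0.01, units=10))
```

Changing the units rescales both the point estimate and the interval on the log scale, so the conclusion about the covariate does not depend on the units chosen.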

Validation

After selecting the transition to analyse, it is possible to evaluate whether the assumptions made when fitting the Cox model hold regarding linearity, influential observations and proportional hazards.

Linearity: A graph of the martingale residuals for the selected covariate and transition is returned. This plot is only displayed if a continuous numeric covariate is present. The martingale residuals are used to determine the best transformation for a covariate, so that it optimally explains the time until an individual passes through a certain transition. To find the best transformation for the covariate \(Z_q\) in the transition \(k→l\), the martingale residuals from a Cox model adjusted for the other \(p−1\) covariates need to be computed. Then, the martingale residuals are plotted against the covariate values \(Z_{i,q}\), together with a smoothed curve that follows the trajectory of the points along the x-axis. If the smoothed curve is reasonably linear, the covariate \(Z_q\) does not require any further transformation in the transition \(k→l\). Since the example data only has categorical variables, this plot cannot be produced.
Influential observations: The dfbetas residuals for the selected transition are returned. They are plotted against \(Z_{i,q}\) to determine the influence of individual \(i\) on the estimation of the coefficients of the transition \(k→l\). These residuals represent the difference between the estimator obtained when fitting the Cox model for the transition \(k→l\) with all the individuals, \(\hat{β}\), and the estimator from the model fitted without individual \(i\), \(\hat{β}_{(i)}\). Individuals whose points lie far away from the others therefore have a higher influence on the model estimates. Ideally, all the points appear in roughly the same area. Note that these dfbetas residuals are standardized, so their values typically lie in \([−1,1]\).

Proportional hazards assumption: Plots of the Schoenfeld residuals for the selected transition are returned. The Schoenfeld residuals measure the difference between the observed and the expected value of the covariate \(Z_q\) at each transition time between states \(k\) and \(l\). The Schoenfeld residuals of each individual are plotted together with a smoothed curve of the points, and a line at 0 is added. If the confidence interval of the smoothed curve covers the 0 line, the proportionality of the hazards can be assumed.
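The idea of “observed minus expected value of the covariate” can be made concrete with a short Python sketch for a single covariate and no tied event times. It is an illustration under those simplifying assumptions, not the code used by the app; the function name and the toy data are hypothetical.

```python
import math


def schoenfeld_residuals(times, events, z, beta):
    """Schoenfeld residuals for one covariate, assuming no tied event times.

    For every individual with an event, the residual is the individual's
    covariate value minus the risk-weighted mean of the covariate among
    everyone still at risk at that time.
    """
    residuals = []
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored individuals contribute no residual
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(beta * z[j]) for j in risk]
        expected = sum(w * z[j] for w, j in zip(weights, risk)) / sum(weights)
        residuals.append((t_i, z[i] - expected))
    return residuals


# Toy data: with beta = 0 the expected value is the plain mean of the risk set.
print(schoenfeld_residuals(times=[1, 2, 3], events=[1, 1, 0],
                           z=[1.0, 0.0, 1.0], beta=0.0))
```

If the proportional hazards assumption holds, these residuals should show no systematic trend over time, which is what the smoothed curve in the plot is meant to reveal.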

Model comparison

The user can save and load session information using the save and load buttons. A table is returned showing the information of all the fitted models. To reuse that information in another session, the user can click on the save button and must keep the session id that is shown. To load the information from another session, it is necessary to enter its session id and click on the load button.

Predictions

The user can make predictions for a new individual after entering some of the individual’s characteristics:

Characteristics of the patient(s): aspects of the new patient such as the initial state or the values of the selected covariates.
Prediction time: the time at which the prediction is made.
Graphical representation: the user can choose between a transition probability plot and a cumulative hazard plot. If the transition probability plot is selected, the user can also choose between non-stacked and stacked plots.

Given that information, the app displays boxes with the predicted probability of being in each state after the selected time for the new patient, together with a transition probability plot for the new patient.