This document provides an example of how to work with the Shiny app MSMpred. The app was built to fit, validate and make predictions with a multistate model in an interactive and easy way. It can be especially relevant for users with limited or no experience in R.
The example data set contains 2279 patients from the European Society for Blood and Marrow Transplantation (EBMT) registry who received a transplant between 1985 and 1998. Together with the status and the time for each patient corresponding to each possible state, it includes the covariates year of transplantation, patient age at transplant, donor-recipient gender match and information about administration of prophylaxis. The states include recovery (rec), adverse effect (ae), both recovery and adverse effect (recae), relapse (rel) and death (srv).
When the app is opened, the user must choose whether to start a new analysis or resume one that was previously started and saved.
If the New Analysis option is chosen the following steps must be taken:
Then, the time unit should be specified.
A table showing the data set will be displayed after these changes, so it is possible to check if everything is correct.
Filtering the data allows the user to perform the analysis taking into account only the selected categories of the covariates.
After that, it is necessary to specify the possible transitions between the different states. The app does not allow bidirectional transitions or loops in the multistate model. If a loop is detected, a popup appears indicating that a loop has been introduced, and the last transition is not added.
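The loop check described above can be illustrated with a small sketch. This is not the app's code: the function names and the initial state name `tx` (transplant) are hypothetical, and the check is a plain reachability search, which is one common way to keep a transition graph free of loops.

```python
from collections import defaultdict

def creates_loop(transitions, new):
    """Return True if adding the edge new = (src, dst) would close a loop."""
    src, dst = new
    if src == dst:
        return True
    graph = defaultdict(list)
    for a, b in transitions:
        graph[a].append(b)
    # A loop appears exactly when src is already reachable from dst.
    stack, seen = [dst], set()
    while stack:
        node = stack.pop()
        if node == src:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return False

def add_transition(transitions, new):
    """Append the transition unless it would introduce a loop."""
    if creates_loop(transitions, new):
        return False
    transitions.append(new)
    return True

# Hypothetical model built on the state names of the example data set.
transitions = []
for edge in [("tx", "rec"), ("tx", "ae"), ("rec", "recae"),
             ("ae", "recae"), ("recae", "rel")]:
    add_transition(transitions, edge)

print(add_transition(transitions, ("rec", "tx")))   # False: bidirectional
print(add_transition(transitions, ("rel", "tx")))   # False: closes a longer loop
print(add_transition(transitions, ("rel", "srv")))  # True: no loop
```

A bidirectional transition is simply the shortest possible loop, so a single reachability test covers both restrictions.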
Once they are specified:
In the multistate model diagram box, the updated diagram of the defined model is returned. This diagram is automatically updated when a transition is added or deleted. To make the diagram more understandable, the states are plotted in different colors: orange (initial states), blue (transient) and magenta (absorbing).
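How a state ends up in each color group can be sketched from the transition list alone. The rule used here is an assumption about how the groups are defined (no incoming transition means initial, no outgoing transition means absorbing, anything else is transient), and the state name `tx` is hypothetical.

```python
COLORS = {"initial": "orange", "transient": "blue", "absorbing": "magenta"}

def classify_states(states, transitions):
    """Label each state as initial, transient or absorbing."""
    sources = {a for a, _ in transitions}
    targets = {b for _, b in transitions}
    kinds = {}
    for s in states:
        if s not in targets:
            kinds[s] = "initial"      # nothing leads into it
        elif s not in sources:
            kinds[s] = "absorbing"    # nothing leads out of it
        else:
            kinds[s] = "transient"
    return kinds

states = ["tx", "rec", "ae", "recae", "rel", "srv"]
transitions = [("tx", "rec"), ("tx", "ae"), ("rec", "recae"), ("ae", "recae"),
               ("recae", "rel"), ("recae", "srv"), ("tx", "srv")]

for state, kind in classify_states(states, transitions).items():
    print(state, kind, COLORS[kind])
```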
In the Number of events for each transition box, the number of events for each transition appears. As loops are not allowed, the number of events for each transition can be interpreted as the number of individuals who make that transition, because an individual can make each transition on their path only once.
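The equivalence between events and individuals can be checked on a toy example. The paths below are invented for illustration (they are not taken from the EBMT data), and `tx` is a hypothetical name for the initial state.

```python
from collections import Counter

# Hypothetical observed paths: the ordered states visited by each individual.
paths = {
    1: ["tx", "rec", "recae", "rel"],
    2: ["tx", "ae", "srv"],
    3: ["tx", "rec"],
    4: ["tx"],  # no transition observed (censored in the initial state)
}

def count_events(paths):
    """Count how many times each transition (from, to) is observed."""
    events = Counter()
    for path in paths.values():
        events.update(zip(path, path[1:]))
    return events

events = count_events(paths)
for (src, dst), n in sorted(events.items()):
    print(f"{src} -> {dst}: {n}")
```

Because a path without loops never revisits a state, no individual contributes more than one event to the same transition, so each count is also a count of individuals.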
In this tab, it is possible to specify the covariates from the data set that the user wants to include in the analysis, as well as the follow-up time to consider. This time will be used as the upper limit when plotting both cumulative incidences and instantaneous hazards.
After specifying all this information, it is important to always press the button “Save model specification”, since the information is not saved automatically.
For the states with no censored observations, the boxplots show the distribution of the time that patients stay in each state.
The cumulative incidence plot shows, for each absorbing state, the proportion of individuals who have reached that state by any particular time.
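To make the idea concrete, the sketch below computes a cumulative incidence with the standard nonparametric (Aalen-Johansen) estimator for competing absorbing states. Which estimator the app uses internally is not stated here, so this is only an illustration; the function name and the toy data are hypothetical, and `None` marks a censored observation.

```python
def cumulative_incidence(times, causes, cause, t):
    """Estimate the probability of having reached `cause` by time t.

    times  : time of absorption or censoring for each individual
    causes : absorbing state reached, or None if censored
    """
    data = sorted(zip(times, causes), key=lambda row: row[0])
    at_risk = len(data)
    event_free = 1.0  # probability of no absorption before the current time
    cif = 0.0
    i = 0
    while i < len(data) and data[i][0] <= t:
        current = data[i][0]
        tied = [c for tm, c in data if tm == current]
        d_all = sum(1 for c in tied if c is not None)
        d_cause = sum(1 for c in tied if c == cause)
        cif += event_free * d_cause / at_risk
        event_free *= 1 - d_all / at_risk
        at_risk -= len(tied)
        i += len(tied)
    return cif

# Toy data: five individuals, two absorbing states, two censored.
times = [2, 3, 5, 7, 8]
causes = ["rel", None, "srv", "rel", None]
print(round(cumulative_incidence(times, causes, "rel", 8), 4))
print(round(cumulative_incidence(times, causes, "srv", 8), 4))
```

Censored individuals leave the risk set without adding to any curve, which is why a simple proportion of observed events would underestimate the incidence.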
In order to compute the plots, it is necessary to:
Two groups of instantaneous hazard plots are returned: one for the transitions that start in the selected state and another for the transitions that end in the selected state. If a covariate has been selected, it is used to stratify the data. If a numeric covariate is chosen, two plots appear in each group: one for individuals with a value above the median of the covariate, and another for individuals with a value below the median. If a categorical covariate is chosen, one plot appears in each group for each category of the covariate. If there are not enough individuals to estimate the instantaneous hazard for a specific transition, that transition does not appear in the plot.
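The stratification rule can be sketched in a few lines. The function names and data are hypothetical, and the handling of values exactly equal to the median (placed in the lower group here) is an assumption, since the text does not say which group they join.

```python
from statistics import median

def split_by_median(values):
    """Split individuals by a numeric covariate: above vs. at or below the median."""
    m = median(values.values())
    above = sorted(i for i, v in values.items() if v > m)
    at_or_below = sorted(i for i, v in values.items() if v <= m)
    return m, above, at_or_below

def split_by_category(values):
    """Group individuals by the category of a categorical covariate."""
    groups = {}
    for i, v in values.items():
        groups.setdefault(v, []).append(i)
    return groups

ages = {1: 23, 2: 35, 3: 41, 4: 52, 5: 60}           # hypothetical ages
proph = {1: "yes", 2: "no", 3: "no", 4: "yes", 5: "no"}  # hypothetical categories

print(split_by_median(ages))
print(split_by_category(proph))
```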
The instantaneous hazard indicates the risk of going through a specific transition exactly at time t. Consequently, when no covariates are selected, the plot can be interpreted as the risk of transition for the general population. When a specific covariate is selected, the app uses it as a stratifying covariate, so on one side we have the risk of transition for one group of individuals (e.g., women, age above the median age) and on the other side the risk of transition for another group of individuals (e.g., men, age below the median age).
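For reference, a common formal definition of the instantaneous hazard of a transition \(k→l\) under the Markov assumption is given below. The notation is an assumption made here for illustration and is not taken from the app: \(X(t)\) is the state occupied at time \(t\) and \(\alpha_{kl}(t)\) is the hazard.

\[
\alpha_{kl}(t) = \lim_{\Delta t \to 0} \frac{P\bigl(X(t + \Delta t) = l \mid X(t^-) = k\bigr)}{\Delta t}
\]

So \(\alpha_{kl}(t)\,\Delta t\) approximates the probability of moving from \(k\) to \(l\) in a short interval after \(t\), for an individual who is in state \(k\) just before \(t\).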
If at any point the user wants to stop the analysis and resume it later, it is possible to download the information already specified.
Either the option Cox (Markov) or Cox (Semi-Markov) should be selected. The model is then fitted taking into account the previously selected transition-specific covariates.
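The difference between the two options can be illustrated under the assumption that they correspond to the usual time scales: time since study entry ("clock forward") for the Markov model and time since entering the current state ("clock reset") for the semi-Markov model. Whether the app implements them exactly this way is not confirmed here, and the function name is hypothetical.

```python
def time_scales(entry_time, exit_time):
    """Return the (start, stop) interval of one stay in a state under both clocks."""
    return {
        # Markov: time keeps running from the start of follow-up.
        "markov": (entry_time, exit_time),
        # Semi-Markov: the clock restarts when the state is entered.
        "semi_markov": (0.0, exit_time - entry_time),
    }

# An individual enters a state on day 22 and leaves it on day 995.
print(time_scales(22.0, 995.0))
```

Under the second time scale the hazard of leaving a state depends on how long the individual has been in it, not on the time elapsed since the start of follow-up.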
If a null model is fitted, that is, a model without any covariate, only one table is returned. This table contains the value of the log likelihood of the null model.
If the model includes covariates, three tables are returned with the most important information about the model.
In the first table the following information is collected:
In the second table, the log likelihood values of the fitted and the null model are shown, as well as the Akaike information criterion (AIC) of the fitted model.
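The quantities in this table are related by simple formulas, sketched below with invented numbers. The AIC formula is the standard one (minus twice the log likelihood plus twice the number of estimated coefficients); the likelihood ratio statistic is added here only to show how the two log likelihoods are usually compared, and it is not claimed to be part of the app's output.

```python
def aic(loglik, n_params):
    """Akaike information criterion: smaller values indicate a better trade-off."""
    return -2.0 * loglik + 2.0 * n_params

def likelihood_ratio(loglik_fitted, loglik_null):
    """Likelihood ratio statistic comparing the fitted model with the null model."""
    return 2.0 * (loglik_fitted - loglik_null)

# Hypothetical values, not taken from the EBMT example.
loglik_null = -1250.0
loglik_fitted = -1238.5
n_coefficients = 3

print(aic(loglik_fitted, n_coefficients))            # 2483.0
print(likelihood_ratio(loglik_fitted, loglik_null))  # 23.0
```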
In the third table the following information is collected:
Note: for help with the interpretation of the values in the tables, check the Help document of the app.
To create the forest plot, the user needs to select the transition to analyse and, in the case of numerical covariates, the units to take into account. A plot is returned representing the estimated hazard ratios and confidence intervals for the covariates taken into account in the selected transition. If no covariate is selected, no plot appears in this section.
The forest plot provides the hazard ratios and their confidence intervals for the covariates taken into account in the selected transition. If the covariate is categorical, the hazard ratios compare each category with the reference category of that covariate. If the covariate is numeric, the hazard ratios are computed for an increment of one unit in that covariate. As a one-unit increment might not be easily interpretable, the app allows the user to set the number of units to be used in the plot.
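Changing the units of a numeric covariate rescales the hazard ratio in a simple way, sketched below. The sketch assumes a Wald-type 95% confidence interval built from the coefficient and its standard error; the function name and the numbers are hypothetical, and the app may compute its intervals differently.

```python
import math

def hazard_ratio(beta, se, units=1.0, z=1.96):
    """Hazard ratio and Wald confidence interval for an increment of `units`.

    beta : estimated Cox coefficient for a one-unit increment
    se   : standard error of beta
    units: positive increment of interest (for example 10 years of age)
    """
    hr = math.exp(units * beta)
    lower = math.exp(units * (beta - z * se))
    upper = math.exp(units * (beta + z * se))
    return hr, lower, upper

# Hypothetical coefficient for age in years.
print(hazard_ratio(0.02, 0.005))            # per 1 year
print(hazard_ratio(0.02, 0.005, units=10))  # per 10 years
```

The hazard ratio for 10 units is the one-unit hazard ratio raised to the power 10, so rescaling changes the size of the reported effect but not whether the interval excludes the null value.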
Based on that plot, we say that a covariate has a significant effect on the transition if the confidence interval of its hazard ratio does not include 1 (equivalently, if the confidence interval of its coefficient, the log hazard ratio, does not include 0), and that it does not have a significant effect otherwise.
After selecting the transition to analyse, it is possible to evaluate whether the assumptions made when fitting the Cox model hold regarding linearity, influential observations and proportional hazards.
Linearity: A plot of the martingale residuals for the selected covariate and transition is returned. This plot is only displayed if a continuous numeric covariate is present. The martingale residuals are used to determine the best transformation for a covariate, so that it optimally explains the time until an individual passes through a certain transition. To find the best transformation for the covariate \(Z_q\) in the transition \(k→l\), the martingale residuals from a Cox model adjusted for the other \(p−1\) covariates need to be computed. Then, the martingale residuals are plotted against the covariate values \(Z_{i,q}\), together with a smoothed curve that follows the trajectory of the points along the x-axis. If the smoothed curve is reasonably linear, the covariate \(Z_q\) does not require any further transformation in the transition \(k→l\). Since the example data set only has categorical variables, it is not possible to produce this plot.
Influential observations: dfbetas residuals for the selected transition are returned. We plot the dfbetas residuals versus \(Z_{i,q}\) to determine the influence of individual \(i\) on the estimation of the coefficients of the transition \(k→l\). That is, these residuals represent the difference between the estimator obtained when fitting the Cox model for the transition \(k→l\) with all the individuals, \(\hat{β}\), and the estimator from the model fitted without individual \(i\), \(\hat{β}_{(i)}\). So, individuals far away from the others have a higher influence on the model estimates. Ideally, all the points appear in roughly the same area. Note that these dfbetas residuals are standardized, so they typically take values in \([−1,1]\).
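The leave-one-out idea behind these residuals can be shown with a much simpler estimator. The sketch below uses an ordinary least-squares slope as a stand-in for the Cox coefficient, so it is not what the app computes, and the differences are left unstandardized; all names and data are hypothetical. It recomputes the estimate without each individual in turn and records how much it changes.

```python
def slope(xs, ys):
    """Least-squares slope of y on x (stand-in for a model coefficient)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def leave_one_out_differences(xs, ys):
    """Estimate with everyone minus the estimate without individual i."""
    full = slope(xs, ys)
    return [full - slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
            for i in range(len(xs))]

xs = [1, 2, 3, 4, 5, 6]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]  # the last point is an outlier

diffs = leave_one_out_differences(xs, ys)
most_influential = max(range(len(diffs)), key=lambda i: abs(diffs[i]))
print(most_influential)  # 5
```

The outlying observation produces by far the largest change, which is the pattern to look for in the plot: points that sit far away from the rest.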
The user can save and upload the information of a session using the save and load buttons. A table is returned showing the information of all the fitted models. If desired, the user can save that information for use in another session by clicking on the save button. The user needs to keep the session id that is shown in order to upload that information in another session. To upload the information from another session, it is necessary to enter the session id and click on the load button.
The user can make predictions for a new individual after entering some of the individual’s characteristics:
Given that information, the app displays boxes with the predicted probability of the new patient being in each state after the selected time, together with a transition probability plot for the new patient.
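How transition hazards turn into probabilities of being in each state can be sketched with a simplified example. The app derives its predictions from the fitted models; the sketch below instead assumes constant, made-up hazards and approximates the state probabilities by multiplying many small one-step transition matrices. The function name, the state name `tx` and all the numbers are hypothetical.

```python
import math

def state_probabilities(hazards, states, start, horizon, steps=1000):
    """Approximate the probability of being in each state at time `horizon`.

    hazards: dict mapping (from_state, to_state) to a constant hazard
    """
    n = len(states)
    idx = {s: i for i, s in enumerate(states)}
    dt = horizon / steps
    # One-step matrix: identity plus hazard increments over a short interval.
    step = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (a, b), h in hazards.items():
        step[idx[a]][idx[b]] += h * dt
        step[idx[a]][idx[a]] -= h * dt
    # Only the row of the starting state is needed.
    row = [0.0] * n
    row[idx[start]] = 1.0
    for _ in range(steps):
        row = [sum(row[k] * step[k][j] for k in range(n)) for j in range(n)]
    return dict(zip(states, row))

states = ["tx", "rel", "srv"]
hazards = {("tx", "rel"): 0.10, ("tx", "srv"): 0.05}  # hypothetical, per year
probs = state_probabilities(hazards, states, start="tx", horizon=5.0)
print({s: round(p, 3) for s, p in probs.items()})
```

The probabilities over all states add up to one at every time point, which is a useful sanity check when reading the prediction boxes.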