Data
The user can upload their own data or, alternatively, use the example data from the DIVINE project.
To start a new analysis, a csv file with the new data must be uploaded. Before uploading the data, select the New Analysis option and the column separator. Once the data is uploaded, it is necessary to specify:
- The time unit.
- The status variables.
- The time variables.
- The covariates.
- Which time variable corresponds to each status variable.
- Whether the state from which the individuals come is present in the data set. If it is not, it must be entered.
All time and status variables are then internally rewritten in the form x_time and x_status, where x_time contains the time until state x is reached for the first time and x_status indicates whether this state is reached (1) or not (0). A new variable called inistat is also created, containing the name of the initial state of each individual.
To continue an analysis previously started in the app, select the Resume old analysis option.
To better understand the variables x_time and x_status, the following figure illustrates the paths of two specific patients from the DIVINE dataset, \(\text{id}=8\) and \(\text{id}=1\), respectively.
The first patient is admitted to hospital without severe pneumonia (nopneum_time=0, nopneum_status=1), is diagnosed with severe pneumonia after 1 day in hospital (pneum_time=1, pneum_status=1), needs non-invasive mechanical ventilation at day 2 (NIMV_time=2, NIMV_status=1) and invasive mechanical ventilation at day 3 (IMV_time=3, IMV_status=1), and finally dies at day 10 (death_time=10, death_status=1). Consequently, this patient never reaches the states reco and dcharg, and both states are censored at time 10 (reco_time=10, reco_status=0 and dcharg_time=10, dcharg_status=0).
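A minimal sketch of how one patient's path could be turned into the x_time / x_status layout. It assumes the path is given as a dictionary from each reached state to its entry time, that unreached states are censored at the last follow-up time, and that the model contains exactly the seven states named in the example; the function name `to_wide` and the `STATES` list are hypothetical, not part of the app.

```python
def to_wide(path, states, end_time):
    # path: {state: entry time} for the states the individual actually reached
    # states: every state of the model (assumed list)
    # end_time: last follow-up time (absorption or end of follow-up)
    row = {}
    for s in states:
        if s in path:
            row[f"{s}_time"], row[f"{s}_status"] = path[s], 1
        else:
            # state never reached: censored at the end of follow-up
            row[f"{s}_time"], row[f"{s}_status"] = end_time, 0
    # inistat: the state occupied first (smallest entry time)
    row["inistat"] = min(path, key=path.get)
    return row

STATES = ["nopneum", "pneum", "NIMV", "IMV", "death", "reco", "dcharg"]
patient_8 = {"nopneum": 0, "pneum": 1, "NIMV": 2, "IMV": 3, "death": 10}
row = to_wide(patient_8, STATES, end_time=10)
print(row["reco_time"], row["reco_status"], row["inistat"])  # 10 0 nopneum
```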
The second patient has a better evolution: even though he/she needs non-invasive mechanical ventilation at day 7.5, he/she recovers from severe pneumonia at day 13 and is discharged at day 23.
Although the time until each state is computed from the difference between the dates of entry into the previous and the current state, for the second patient the time until non-invasive mechanical ventilation is 7.5 days. This is because some patients in the dataset enter two states on the same day, which is not allowed in multistate models (MSMs). To solve this problem, a half-day imputation was applied to patients who enter two states on the same day. Since the second patient was diagnosed with severe pneumonia and needed non-invasive mechanical ventilation on the seventh day, the half-day imputation gives NIMV_time=7.5.
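The half-day imputation can be sketched as follows, assuming the path is an ordered list of (state, day) pairs. How the app handles three or more states entered on the same day is not stated; this hypothetical `impute_half_day` simply keeps pushing each tied state half a day after the previous one.

```python
def impute_half_day(path):
    # path: list of (state, day) pairs in the order the states were entered.
    # Any state entered no later than the previous (possibly shifted) entry
    # is moved half a day after it, so no two states share an entry time.
    out = []
    for state, day in path:
        if out and day <= out[-1][1]:
            day = out[-1][1] + 0.5
        out.append((state, day))
    return out

# Part of the second patient's path described above (from pneumonia onwards)
print(impute_half_day([("pneum", 7), ("NIMV", 7), ("reco", 13), ("dcharg", 23)]))
# [('pneum', 7), ('NIMV', 7.5), ('reco', 13), ('dcharg', 23)]
```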
A table showing the data (uploaded data or example data) is returned.
Model specification
The user specifies the transitions to include in the model, the covariates to take into account, and the follow-up time to consider, as well as the time unit.
In the Define the transitions box, the origin and destination states of each transition are selected from the pull-down menus, and each transition is created or deleted with the add/delete buttons.
This app does not allow bidirectional transitions or loops in the multistate model. If a loop is detected, a popup appears indicating that a loop has been introduced, and the last transition is not added.
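One way to detect such loops is to check, before adding a transition origin → dest, whether origin can already be reached from dest; if so, the new transition would close a cycle (a bidirectional transition is the two-state case). The sketch below uses a depth-first search and is an assumption about how the check could work, not the app's code; `creates_loop` is a hypothetical name.

```python
from collections import defaultdict

def creates_loop(transitions, origin, dest):
    # transitions: existing (from_state, to_state) pairs
    # True if adding origin -> dest would close a cycle, i.e. if origin
    # is reachable from dest through the existing transitions.
    graph = defaultdict(list)
    for a, b in transitions:
        graph[a].append(b)
    stack, seen = [dest], set()
    while stack:
        node = stack.pop()
        if node == origin:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node])
    return False

trans = [("nopneum", "pneum"), ("pneum", "NIMV")]
print(creates_loop(trans, "NIMV", "nopneum"))  # True: nopneum -> pneum -> NIMV -> nopneum
print(creates_loop(trans, "NIMV", "death"))    # False
```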
In the Covariate selection box, the covariates of interest are selected. These covariates are used to fit the model and as the potential characteristics on which predictions are made.
In the Time specification box, the follow-up time and the time units for the plots are selected.
In the Multistate model diagram box the updated diagram of the defined model is returned. This diagram is automatically updated when a transition is added or deleted. In order to make the diagram more understandable the states are plotted in different colors: orange (initial states), blue (transient) and magenta (absorbing).
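The colour of each state follows from the defined transitions: a state with no incoming transitions is initial, one with no outgoing transitions is absorbing, and any other is transient. A minimal sketch, assuming the roles are derived only from the transition list (states not involved in any transition are ignored); `classify_states` is a hypothetical name.

```python
def classify_states(transitions):
    # transitions: (from_state, to_state) pairs of the model
    origins = {a for a, _ in transitions}
    dests = {b for _, b in transitions}
    roles = {}
    for s in origins | dests:
        if s not in dests:
            roles[s] = "initial"     # drawn in orange
        elif s not in origins:
            roles[s] = "absorbing"   # drawn in magenta
        else:
            roles[s] = "transient"   # drawn in blue
    return roles

print(classify_states([("nopneum", "pneum"), ("pneum", "death"),
                       ("pneum", "reco"), ("reco", "dcharg")]))
```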
In the Number of events for each transition box, the number of events for each transition is shown. As loops are not allowed, an individual makes each transition of their path at most once, so the number of events for a transition can be interpreted as the number of individuals who make that transition.
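Because each individual makes a given transition at most once, the counts can be obtained by tallying consecutive pairs of states along each path. A minimal sketch, assuming each path is available as the ordered list of states visited; the paths below are toy examples and `count_events` is a hypothetical name.

```python
from collections import Counter

def count_events(paths):
    # paths: one ordered list of visited states per individual
    counts = Counter()
    for path in paths:
        counts.update(zip(path, path[1:]))  # each consecutive pair is one transition
    return counts

toy_paths = [["nopneum", "pneum", "NIMV", "IMV", "death"],
             ["pneum", "NIMV", "reco", "dcharg"]]
print(count_events(toy_paths)[("pneum", "NIMV")])  # 2
```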
de Wreede LC, Fiocco M, and Putter H (2011). mstate: An R Package for the Analysis of Competing Risks and Multi-State Models. Journal of Statistical Software, Volume 38, Issue 7.
Exploring the data: Descriptive
The user receives descriptive information about the data.
No input is needed in this section; the covariates selected in the Covariate selection box of the Model specification section are used.
Two groups of outputs are returned: descriptive plots and a descriptive table.
In the descriptive plots, for the previously selected covariates the following plots are returned:
The descriptive table gives some information about the selected covariates:
With those descriptive plots and the descriptive table we can observe how individuals are distributed through the different categories or values of each covariate.
Exploring the data: Length of stay
The user receives some descriptive information about the data.
No input is needed in this section; the states of the model are used.
For each initial or transient state of the model, a boxplot representing the length of stay in that state is returned.
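The length of stay in a state is the time between entering it and entering the next state. A minimal sketch, assuming ordered (state, entry time) paths and counting only completed stays (how the app treats the last, possibly censored, stay is not stated); `lengths_of_stay` is a hypothetical name. The example uses the first patient's path and the part of the second patient's path given in the text.

```python
from collections import defaultdict
from statistics import median

def lengths_of_stay(paths):
    # paths: one ordered list of (state, entry_time) pairs per individual
    # Returns {state: [completed stay durations]}, one entry per visit.
    stays = defaultdict(list)
    for path in paths:
        for (state, t_in), (_, t_out) in zip(path, path[1:]):
            stays[state].append(t_out - t_in)
    return stays

patient_8 = [("nopneum", 0), ("pneum", 1), ("NIMV", 2), ("IMV", 3), ("death", 10)]
patient_1 = [("pneum", 7), ("NIMV", 7.5), ("reco", 13), ("dcharg", 23)]
stays = lengths_of_stay([patient_8, patient_1])
print(stays["NIMV"], median(stays["NIMV"]))  # [1, 5.5] 3.25
```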
Those boxplots give the user the following information:
Mody A, Lyons PG, Vazquez Guillamet C, et al (2020). The Clinical Course of Coronavirus Disease 2019 in a US Hospital System: A Multistate Analysis. American Journal of Epidemiology, Volume 190, No. 4.
Exploring the data: Time until absorbing states
The user receives descriptive information about the data.
No input is needed in this section; the states of the model are used.
Two plots are returned:
How to interpret those plots?
The cumulative incidence plot shows the proportion of individuals who have reached each absorbing state by any given time.
The survival plot for the absorbing states shows, for each time t, the probability that an individual has not yet reached an absorbing state by time t.
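A minimal sketch of how both curves can be estimated, assuming each individual is summarised by the time until absorption (or censoring) and the absorbing state reached (None if censored), and assuming survival refers to reaching any absorbing state. It uses the Kaplan-Meier estimator for survival and the Aalen-Johansen form of the cumulative incidence for each competing absorbing state; this is the standard textbook estimator, not necessarily the app's exact code, and `km_and_cif` is a hypothetical name.

```python
def km_and_cif(times, causes):
    # times: time to absorption or censoring for each individual
    # causes: absorbing state reached, or None if censored
    # Returns [(t, S(t), {state: CIF(t)})] at each distinct absorption time.
    event_times = sorted({t for t, c in zip(times, causes) if c is not None})
    states = sorted({c for c in causes if c is not None})
    surv = 1.0
    cif = {c: 0.0 for c in states}
    out = []
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        d = {c: sum(1 for u, k in zip(times, causes) if u == t and k == c) for c in states}
        for c in states:
            cif[c] += surv * d[c] / at_risk      # uses S(t-) before the update
        surv *= 1 - sum(d.values()) / at_risk    # Kaplan-Meier step
        out.append((t, surv, dict(cif)))
    return out

res = km_and_cif([2, 3, 3, 5, 6], ["death", "dcharg", None, "dcharg", None])
print(res[-1])
```

At every time, the cumulative incidences of the absorbing states add up to one minus the survival, which is why the two plots are complementary.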
Exploring the data: Instantaneous hazards
The user receives descriptive information about the data.
There are three different inputs:
Two groups of instantaneous hazard plots are returned: one for the transitions that start in the selected state and another for the transitions that end in the selected state.
If a covariate has been selected, it is used to stratify the data. If a numeric covariate is chosen, two plots appear in each group: one for individuals with a value above the median of the covariate and another for individuals with a value below the median. If a categorical covariate is chosen, one graph appears in each group for each category of the covariate.
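The median split for a numeric covariate can be sketched as follows. The text does not say which group receives individuals whose value equals the median; this hypothetical `split_by_median` assumes they go to the "below" group.

```python
from statistics import median

def split_by_median(values):
    # Returns the median and the indices of individuals above / at-or-below it
    m = median(values)
    above = [i for i, v in enumerate(values) if v > m]
    below = [i for i, v in enumerate(values) if v <= m]  # ties assumed "below"
    return m, above, below

print(split_by_median([40, 55, 60, 72, 80]))  # (60, [3, 4], [0, 1, 2])
```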
If for a specific transition there are not enough individuals to estimate the instantaneous hazard, the instantaneous hazards of that transition won’t appear in the graph.
The instantaneous hazard indicates the risk of making a specific transition just at time t. Consequently, when no covariate is selected, the graph can be interpreted as the risk of transition for the general population. When a specific covariate is selected, the app uses it as a stratifying covariate, so on one side we have the risk of transition for a particular group of individuals (e.g., women, age above the median age) and on the other side the risk of transition for another group (e.g., men, age below the median age).
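For reference, the instantaneous hazard of the transition \(k \rightarrow l\) is usually defined as the transition intensity below, where \(X(t)\) denotes the state occupied at time \(t\) (notation introduced here). The estimation method used by the app (for example, any smoothing) is not described in this section, so this is only the standard definition:

\[
\alpha_{kl}(t) = \lim_{\Delta t \to 0} \frac{P\big(X(t+\Delta t) = l \mid X(t) = k\big)}{\Delta t}
\]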
de Wreede LC, Fiocco M, and Putter H (2011). mstate: An R Package for the Analysis of Competing Risks and Multi-State Models. Journal of Statistical Software, Volume 38, Issue 7.
Mody A, Lyons PG, Vazquez Guillamet C, et al (2020). The Clinical Course of Coronavirus Disease 2019 in a US Hospital System: A Multistate Analysis. American Journal of Epidemiology, Volume 190, No. 4.
Markov test
One of the assumptions made when fitting a Markov model is that the future state of an individual depends only on the current state, not on how that state was reached. However, this assumption does not necessarily hold and needs to be checked with the Markov test. The test is performed using the MarkovTest function from the mstate package.
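Written out, the Markov assumption states that, for times \(s \le t\), conditioning on the whole history before \(s\) (denoted \(\mathcal{H}_{s^-}\), notation introduced here) adds nothing once the current state is known:

\[
P\big(X(t) = l \mid X(s) = k, \mathcal{H}_{s^-}\big) = P\big(X(t) = l \mid X(s) = k\big), \qquad s \le t.
\]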
Selection of the transitions to be tested.
Returns a table with the value of the test and the p-value for each transition.
de Wreede LC, Fiocco M, and Putter H (2010). The mstate package for estimation and prediction in non- and semi-parametric multi-state and competing risks models. Computer Methods and Programs in Biomedicine, Volume 99, 261-274.
Fitted model
The user decides which type of model to fit, and the model is fitted including the previously selected covariates in all the transitions.
Select the type of model. At the moment, only Cox models are available. If the Markov assumption is rejected for any transition (according to the Markov test), Semi-Markov (Cox) should be selected. Then select the transitions for which the Markov assumption does not hold, so that time is included in the model for those transitions.
Note that all the previously selected covariates are included in all the transitions of the model.
If the model includes covariates, three tables are returned with the most important information about the model.
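As a sketch of what is being fitted, a transition-specific Cox model in the style of mstate can be written as below, with \(\alpha_{kl,0}\) the baseline hazard of \(k \rightarrow l\), \(\boldsymbol{\beta}_{kl}\) its coefficients and \(\mathbf{Z}\) the selected covariates. The second line is the common clock-reset semi-Markov formulation, where \(t_k\) is the entry time into state \(k\); whether the app includes time in exactly this way is an assumption.

\[
\alpha_{kl}(t \mid \mathbf{Z}) = \alpha_{kl,0}(t)\exp\big(\boldsymbol{\beta}_{kl}^\top \mathbf{Z}\big) \qquad \text{(Markov, } t \text{ = time since the start)}
\]
\[
\alpha_{kl}(t \mid \mathbf{Z}) = \alpha_{kl,0}(t - t_k)\exp\big(\boldsymbol{\beta}_{kl}^\top \mathbf{Z}\big) \qquad \text{(semi-Markov, clock reset at entry into } k\text{)}
\]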
In the first table the following information is collected:
In the second table the value of the log likelihood of the fitted and the null model are shown.
In the third table the following information is collected:
If a null model is fitted, that is, a model without any covariate, only one table is returned. This table contains the value of the log likelihood of the null model.
Table 1:
Positive coefficient -> the instantaneous risk increases with each unit of the covariate. For example, if we compare two patients with the same characteristics except that one is 50 years old and the other 65, and the estimated coefficient is \(\text{coef} = 0.02\), the 65-year-old has \(\exp((65-50) \times \text{coef}) = \exp(15 \times 0.02) = 1.35\) times the risk of transition of the 50-year-old.
Negative coefficient -> the instantaneous risk decreases with each unit of the covariate. For example, if we compare two patients with the same characteristics except that one is 55 years old and the other 60, and the estimated coefficient is \(\text{coef} = -0.04\), the 60-year-old has \(\exp((60-55) \times \text{coef}) = \exp(5 \times (-0.04)) = 0.82\) times the risk of transition of the 55-year-old or, equivalently, the 55-year-old has \(1/\exp((60-55) \times \text{coef}) = 1/0.82 = 1.22\) times the risk of transition of the 60-year-old.
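The arithmetic in both examples is a single formula, the hazard ratio \(\exp((\text{value} - \text{reference}) \times \text{coef})\) between two individuals who differ only in one numeric covariate. A small sketch reproducing the numbers above; `hazard_ratio` is a hypothetical name.

```python
import math

def hazard_ratio(coef, value, reference):
    # HR of an individual with covariate `value` versus one with `reference`,
    # all other covariates being equal
    return math.exp((value - reference) * coef)

print(round(hazard_ratio(0.02, 65, 50), 2))       # 1.35
print(round(hazard_ratio(-0.04, 60, 55), 2))      # 0.82
print(round(1 / hazard_ratio(-0.04, 60, 55), 2))  # 1.22
```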
Table 2:
The log likelihood of a model is used to compare the fit of different models: the model with the higher log likelihood is considered to fit the data better.
Table 3:
The three tests in this table assess whether the coefficients of the model can be assumed to differ from 0. The test column shows the value of the test statistic, the df column the number of estimated coefficients, and the p-value column the corresponding p-value. Note that if \(\text{p-value} < 10^{-6}\), the app reports \(\text{p-value} = 0\).
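Assuming one of the three tests is the likelihood ratio test (as in the summary of a Cox model fitted with R's survival package), it can be computed directly from the two log likelihoods of Table 2: the statistic is twice their difference, compared with a chi-square distribution whose degrees of freedom equal the number of estimated coefficients. The sketch below computes the chi-square tail with only the standard library, via the recurrence \(Q(x; k+2) = Q(x; k) + (x/2)^{k/2} e^{-x/2} / \Gamma(k/2+1)\), and applies the app's \(10^{-6}\) display rule; the function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    # Upper tail P(X > x) of a chi-square with integer df >= 1
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, k = math.exp(-x / 2), 2
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q

def likelihood_ratio_test(loglik_null, loglik_fitted, df):
    stat = 2 * (loglik_fitted - loglik_null)
    p = chi2_sf(stat, df)
    return stat, (0.0 if p < 1e-6 else p)  # mimic the app's display rule

print(likelihood_ratio_test(-100.0, -95.0, 2))  # statistic 10.0, p = exp(-5)
```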
de Wreede LC, Fiocco M, and Putter H (2011). mstate: An R Package for the Analysis of Competing Risks and Multi-State Models. Journal of Statistical Software, Volume 38, Issue 7.
Meira-Machado L, de Uña-Álvarez J, Cadarso-Suárez C, Andersen PK (2009). Multi-state models for the analysis of time-to-event data. Statistical Methods in Medical Research, Volume 18, Issue 2.
Graphics
The user receives a graphical representation of the fitted model.
The user needs to choose two things related to the graph:
A forest plot is returned representing the estimated hazard ratios and their confidence intervals for the covariates taken into account in the selected transition.
As these plots represent the estimated coefficients, no graph is shown in this section if no covariate is selected.
The forest plot shows the hazard ratios and their confidence intervals for the covariates included in the selected transition. For a categorical covariate, the hazard ratios compare each category with the reference category of that covariate; for a numeric covariate, they correspond to an increase of one unit in that covariate. As a one-unit increase might not be easily interpretable, the app lets the user choose the number of units to be used in the graph.
Based on this graph, a covariate is considered to have a significant effect on the transition if the confidence interval of its hazard ratio does not cover 1 (equivalently, if the confidence interval of its coefficient does not cover 0); otherwise, it is considered not to have a significant effect.
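Rescaling to a different number of units multiplies the coefficient and its confidence limits by that number before exponentiating. A minimal sketch, assuming a Wald-type 95% interval built from the coefficient and its standard error (the app's exact interval is not stated); `scaled_hr` and the numbers in the example are hypothetical.

```python
import math

def scaled_hr(coef, se, units=1.0, z=1.959963984540054):
    # HR and Wald 95% CI for an increase of `units` in a numeric covariate
    return (math.exp(units * coef),
            math.exp(units * (coef - z * se)),
            math.exp(units * (coef + z * se)))

hr, lower, upper = scaled_hr(0.02, 0.005, units=10)  # e.g. a 10-year age increase
print(round(hr, 3), lower > 1)  # the CI excluding 1 indicates a significant effect
```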
Mody A, Lyons PG, Vazquez Guillamet C, et al (2020). The Clinical Course of Coronavirus Disease 2019 in a US Hospital System: A Multistate Analysis. American Journal of Epidemiology, Volume 190, No. 4.
Model validation: Linear assumption
The user can analyze whether the assumptions made when fitting the Cox model hold.
The user needs to select:
A graph of the martingale residuals for the selected covariate and transition is returned.
As these plots represent the residuals for a specific covariate, no graph is shown in this section if no covariate is selected.
The martingale residuals help determine the best functional form of a covariate, so that it optimally explains the time until an individual makes a certain transition. To find the best transformation of the covariate \(Z_q\) in the transition \(k \rightarrow l\), the martingale residuals from a Cox model fitted with the other \(p-1\) covariates are computed. These residuals are then plotted against the covariate values \(Z_{i,q}\), together with a smoothed curve of the points along the x-axis. If the smoothed curve is reasonably linear, the covariate \(Z_q\) does not require any further transformation in the transition \(k \rightarrow l\).
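The martingale residual of individual \(i\) is \(\delta_i - \hat{\Lambda}_0(T_i)\exp(\hat{\boldsymbol{\beta}}^\top \mathbf{Z}_i)\): the observed transition indicator minus the expected number of transitions. A minimal sketch, assuming the linear predictor from the Cox model with the other \(p-1\) covariates is already available, using the Breslow estimator for \(\hat{\Lambda}_0\), and ignoring delayed entry into state \(k\) (every individual is treated as at risk from time 0); `martingale_residuals` is a hypothetical name.

```python
import math

def martingale_residuals(times, events, lin_pred):
    # times: follow-up time in the transition; events: 1 if k -> l observed, else 0
    # lin_pred: beta_hat' Z_i from an already fitted Cox model (other covariates)
    risk = [math.exp(lp) for lp in lin_pred]
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    cumhaz, h = {}, 0.0
    for t in event_times:  # Breslow baseline cumulative hazard
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        h += d / sum(r for u, r in zip(times, risk) if u >= t)
        cumhaz[t] = h

    def H0(t):
        past = [cumhaz[s] for s in event_times if s <= t]
        return past[-1] if past else 0.0

    return [e - H0(t) * r for t, e, r in zip(times, events, risk)]

print(martingale_residuals([1, 2, 3], [1, 1, 0], [0.0, 0.0, 0.0]))
```

Plotting these residuals against \(Z_{i,q}\) with a smoother then gives the graph described above.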
Rizopoulos, D. (2018). Biostatistical Methods II: Classical Regression Models (EP03) Survival Analysis. Course material.
Model validation: Influential observations
The user can analyze whether the assumptions made when fitting the Cox model hold.
The user needs to choose the transition to analyze.
The plots of the dfbetas residuals for the selected transition are returned.
As these plots represent the residuals for the different covariates, no graph is shown in this section if no covariate is selected.
We plot the dfbetas residuals versus \(Z_{i,q}\) to determine the influence of individual \(i\) on the estimated coefficients of the transition \(k \rightarrow l\). These residuals measure the difference between the estimator obtained when fitting the Cox model for the transition \(k \rightarrow l\) with all the individuals, \(\hat{\boldsymbol{\beta}}\), and the estimator obtained without individual \(i\), \(\hat{\boldsymbol{\beta}}_{(i)}\). Individuals whose points lie far away from the others therefore have a larger influence on the model estimates. Ideally, all the points should lie in roughly the same area.
Note that these dfbetas residuals are standardized (each change in a coefficient is divided by the standard error of that coefficient), so they are on a comparable scale across covariates.
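For covariate \(q\) and individual \(i\), the quantities described above can be written as follows. Note that R's survival package computes dfbeta and dfbetas residuals for Cox models as an approximation to the exact leave-one-out change rather than by refitting the model, so plotted values may differ slightly from this exact definition:

\[
\text{dfbeta}_{i,q} = \hat{\beta}_q - \hat{\beta}_{(i),q}, \qquad
\text{dfbetas}_{i,q} = \frac{\hat{\beta}_q - \hat{\beta}_{(i),q}}{\widehat{\text{SE}}(\hat{\beta}_q)}
\]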
Rizopoulos, D. (2018). Biostatistical Methods II: Classical Regression Models (EP03) Survival Analysis. Course material.
Model validation: Proportional hazards assumption
The user can analyze whether the assumptions made when fitting the Cox model hold.
The user needs to choose the transition to analyze.
The plots of the Schoenfeld residuals for the selected transition are returned.
As these plots represent the residuals for the different covariates, no graph is shown in this section if no covariate is selected.
At each time a transition from state \(k\) to state \(l\) occurs, the Schoenfeld residual is the difference between the observed value of the covariate \(Z_q\) for the individual making the transition and its expected value, a weighted average of \(Z_q\) over the individuals at risk at that time.
The Schoenfeld residuals are plotted together with a smoothed curve of the points, its confidence interval, and a horizontal line at 0. If the confidence interval of the smoothed curve covers the 0 line, the hazards can be assumed to be proportional.
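For a single covariate, the residual at the transition time \(t_i\) is \(z_i - \sum_{j \in R(t_i)} z_j w_j / \sum_{j \in R(t_i)} w_j\), with \(w_j = \exp(\hat{\beta} z_j)\) and \(R(t_i)\) the individuals still at risk. A minimal sketch, assuming \(\hat{\beta}\) is already estimated and ignoring delayed entry into state \(k\); `schoenfeld_residuals` is a hypothetical name.

```python
import math

def schoenfeld_residuals(times, events, z, beta):
    # One covariate; returns (transition time, residual) for each observed transition
    res = []
    for t_i, e_i, z_i in zip(times, events, z):
        if e_i != 1:
            continue  # censored individuals have no Schoenfeld residual
        at_risk = [(zj, math.exp(beta * zj)) for tj, zj in zip(times, z) if tj >= t_i]
        expected = sum(zj * w for zj, w in at_risk) / sum(w for _, w in at_risk)
        res.append((t_i, z_i - expected))
    return res

print(schoenfeld_residuals([1, 2, 3], [1, 1, 1], [1, 0, 1], beta=0.0))
```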
Rizopoulos, D. (2018). Biostatistical Methods II: Classical Regression Models (EP03) Survival Analysis. Course material.
Predictions
The user can make predictions for one or two new individuals.
The inputs can be divided into three blocks:
Two groups of outputs are returned:
As the predictions are made based on the characteristics of a new patient, if no covariate is selected there won’t be any output in this section.
In the first group of outputs, the probabilities of being in each state are returned. These values give an idea of the state the new individual is likely to be in after the selected time period.
In the second group of outputs, the transition probability plot is returned. It can be shown in a stacked or non-stacked form, although both give the same information: the probability of the new patient being in each state over time. In the non-stacked plot, each curve directly gives one of these probabilities, whereas in the stacked plot each probability is given by the height of the corresponding coloured band.
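State probabilities of this kind are typically obtained with the Aalen-Johansen product-integral (in mstate, this is the role of probtrans): at each event time, probability mass flows from state \(k\) to state \(l\) in proportion to the cumulative-hazard increment \(d\hat{A}_{kl}\) predicted for the new individual. A minimal sketch, assuming those increments are already available; `state_probabilities` and the toy illness-death increments are hypothetical.

```python
def state_probabilities(n_states, start, increments):
    # increments: time-ordered list of {(k, l): dA_kl} at each event time
    # Implements p(t) = p(t-) (I + dA(t)) and returns p after the last step.
    p = [0.0] * n_states
    p[start] = 1.0
    for dA in increments:
        new = p[:]
        for (k, l), a in dA.items():
            new[k] -= p[k] * a  # leave state k ...
            new[l] += p[k] * a  # ... and enter state l
        p = new
    return p

# Toy illness-death model: 0 = healthy, 1 = ill, 2 = dead
print(state_probabilities(3, 0, [{(0, 1): 0.5}, {(0, 2): 0.5, (1, 2): 0.5}]))
# [0.25, 0.25, 0.5]
```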
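In the stacked plot, each band spans from the cumulative sum of the probabilities of the states drawn below it to that sum plus its own probability, so a band's height is exactly the state probability. A small sketch for one time point; `stacked_bands` is a hypothetical name.

```python
from itertools import accumulate

def stacked_bands(probs):
    # probs: state probabilities at one time point (summing to 1)
    # Returns the (lower, upper) edges of each coloured band
    uppers = list(accumulate(probs))
    lowers = [0.0] + uppers[:-1]
    return list(zip(lowers, uppers))

bands = stacked_bands([0.25, 0.25, 0.5])
print(bands)                             # [(0.0, 0.25), (0.25, 0.5), (0.5, 1.0)]
print([hi - lo for lo, hi in bands])     # heights recover the probabilities
```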
de Wreede LC, Fiocco M, and Putter H (2011). mstate: An R Package for the Analysis of Competing Risks and Multi-State Models. Journal of Statistical Software, Volume 38, Issue 7.