Data

The user can upload his/her own data or, alternatively, use the example data from the DIVINE project.

Inputs

To start a new analysis, a CSV file with the new data must be uploaded. Before uploading the data, select the New Analysis option and the column separator. Once the data are uploaded, it is necessary to specify:

- The time unit.
- The status variables.
- The time variables.
- The covariates.
- Which time variable corresponds to each status variable.
- Whether the state from which the individuals come (their initial state) is present in the dataset. If not, it must be entered.

All time and status variables are then internally renamed to the form x_time and x_status. Here x_time contains the time until state x is reached for the first time, and x_status indicates whether that state is reached (1) or not (0). A new variable called inistat is also created, containing the name of the initial state of each individual.

To continue an analysis previously started in the app, select the Resume old analysis option.

To better understand the variables x_time and x_status, the following figure illustrates the paths of two specific patients from the DIVINE dataset, \(\text{id}=8\) and \(\text{id}=1\), respectively.

The first patient is admitted to hospital without severe pneumonia (nopneum_time=0, nopneum_status=1). He/she is diagnosed with severe pneumonia after 1 day in hospital (pneum_time=1, pneum_status=1), needs non-invasive mechanical ventilation at day 2 (NIMV_time=2, NIMV_status=1) and invasive mechanical ventilation at day 3 (IMV_time=3, IMV_status=1), and finally dies at day 10 (death_time=10, death_status=1). Consequently, this patient never reaches the states reco and dcharg, and both are censored at time 10 (reco_time=10, reco_status=0 and dcharg_time=10, dcharg_status=0).

The second patient has a more favorable course. Although he/she needs non-invasive mechanical ventilation at day 7.5, he/she recovers from severe pneumonia at day 13 and is discharged at day 23.

The time until each state is computed from the recorded dates of entry into the states, so it is normally a whole number of days. However, for the second patient the time until non-invasive mechanical ventilation is 7.5 days. This is because some patients in the dataset enter two states on the same day, which is not allowed in multistate models (MSMs). To solve this problem, a half-day imputation was applied to patients who enter two states on the same day. The second patient was diagnosed with severe pneumonia and needed non-invasive mechanical ventilation on the seventh day, so the half-day imputation gives NIMV_time=7.5.
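The conversion into x_time and x_status described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the app's own code (the app is built on the R package mstate). The function name `path_to_time_status` is hypothetical. Each individual's path is assumed to be an ordered list of `(state, day)` entries whose first entry is the initial state. States that are never reached are assumed to be censored at the last observed time. The half-day imputation is applied to any state entered on the same day as the previous one.

```python
def path_to_time_status(path, states):
    """Build inistat and the x_time / x_status variables for one individual.

    Hypothetical helper. path is an ordered list of (state, day) entries whose
    first entry is the initial state; states lists all states of the model.
    """
    row = {"inistat": path[0][0]}
    prev_time = None
    for state, day in path:
        time = float(day)
        # Half-day imputation: two states cannot be entered at the same time,
        # so a state entered on the same day as the previous one gets +0.5.
        if prev_time is not None and time <= prev_time:
            time = prev_time + 0.5
        row[f"{state}_time"] = time
        row[f"{state}_status"] = 1
        prev_time = time
    # Assumption: states never reached are censored at the last observed time.
    for state in states:
        if f"{state}_status" not in row:
            row[f"{state}_time"] = prev_time
            row[f"{state}_status"] = 0
    return row


states = ["nopneum", "pneum", "NIMV", "IMV", "reco", "dcharg", "death"]
# Patient id=8 as described above.
patient_8 = path_to_time_status(
    [("nopneum", 0), ("pneum", 1), ("NIMV", 2), ("IMV", 3), ("death", 10)], states
)
print(patient_8["reco_time"], patient_8["reco_status"])  # 10.0 0
```

Consider a hypothetical path in which pneum and NIMV are both entered on day 7. For this path the function returns NIMV_time = 7.5, as in the imputation described above.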

Outputs

A table showing the data (uploaded data or example data) is returned.

Model specification

The user specifies the transitions and covariates to include in the model. The user also specifies the follow-up time to consider and the time unit.

Inputs

In the Define the transitions box, the output and input states of a transition are selected from the pull-down menus. The add/delete buttons then create or delete that transition.

This app does not allow bidirectional transitions or loops in the multistate model. If a loop is detected, a popup indicates that a loop has been introduced, and the last transition is not added.
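One way to detect loops before adding a transition is to check whether the origin state can already be reached from the destination state. Bidirectional transitions are then just loops of length two. The sketch below is a hypothetical Python illustration of that idea. The function `creates_loop` is not part of the app or of mstate, and transitions are represented as `(from_state, to_state)` pairs.

```python
def creates_loop(transitions, new_from, new_to):
    """Return True if adding new_from -> new_to would create a loop (hypothetical helper)."""
    if new_from == new_to:
        return True  # a transition from a state to itself
    # Adjacency lists of the transitions already in the model.
    graph = {}
    for a, b in transitions:
        graph.setdefault(a, []).append(b)
    # The new transition closes a cycle iff new_from is reachable from new_to.
    stack, seen = [new_to], set()
    while stack:
        node = stack.pop()
        if node == new_from:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return False


model = [("nopneum", "pneum"), ("pneum", "NIMV"), ("NIMV", "IMV")]
print(creates_loop(model, "IMV", "pneum"))      # True: IMV -> pneum -> NIMV -> IMV
print(creates_loop(model, "pneum", "nopneum"))  # True: bidirectional transition
print(creates_loop(model, "IMV", "death"))      # False
```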

In the Covariate selection box, the covariates of interest are selected. These covariates are used to fit the model and as the characteristics available for making predictions.

In the Time specification box, the follow-up time and the time units for the plots are selected.

Outputs

The Multistate model diagram box shows the diagram of the defined model. The diagram is updated automatically whenever a transition is added or deleted. To make the diagram easier to read, the states are plotted in different colors: orange (initial), blue (transient) and magenta (absorbing).

The Number of events for each transition box shows the number of events for each transition. Loops are not allowed, so each individual can make a given transition at most once. The number of events for a transition can therefore be interpreted as the number of individuals who make it.
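The color coding and the event counts above can be sketched in Python. This illustrates one way such quantities could be derived; it is not the app's implementation. `classify_states` labels a state as initial if it has no incoming transitions and as absorbing if it has no outgoing ones. `count_events` assumes each individual's path is an ordered list of `(state, time)` entries in which consecutive states are joined by a transition of the model. Both function names are hypothetical.

```python
from collections import Counter


def classify_states(transitions):
    """Label each state as initial, transient or absorbing (hypothetical helper)."""
    sources = {a for a, _ in transitions}
    targets = {b for _, b in transitions}
    labels = {}
    for state in sources | targets:
        if state not in targets:
            labels[state] = "initial"    # only outgoing transitions (orange)
        elif state not in sources:
            labels[state] = "absorbing"  # only incoming transitions (magenta)
        else:
            labels[state] = "transient"  # both (blue)
    return labels


def count_events(paths):
    """Count the events of each transition; without loops, one per individual."""
    counts = Counter()
    for path in paths:
        for (a, _), (b, _) in zip(path, path[1:]):
            counts[(a, b)] += 1
    return counts


print(classify_states([("nopneum", "pneum"), ("pneum", "death")]))
```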

References

de Wreede LC, Fiocco M, and Putter H (2011). mstate: An R Package for the Analysis of Competing Risks and Multi-State Models. Journal of Statistical Software, Volume 38, Issue 7.

Descriptive

Exploring the data: Descriptive

The user receives descriptive information about the data.

Inputs

No input is needed in this section. The covariates selected in the Covariate selection box of the Model specification section are used.

Outputs

Two groups of outputs are returned: descriptive plots and a descriptive table.

For each previously selected covariate, the following plots are returned:

For continuous covariates, histograms are plotted.
For categorical covariates, barplots are returned.

The descriptive table gives some information about the selected covariates:

Variable: the name and type of the covariate.
Stats/Values: for categorical covariates, the categories. For numeric covariates, a short summary with the mean, the minimum and maximum values, the median and the interquartile range.
Freqs (% of Valid): for categorical covariates, the frequency (and percentage) of each category. For numeric covariates, the number of distinct values.
Valid: the number of valid (non-missing) values of the covariate.
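A rough Python analogue of the descriptive table is shown below. It assumes that missing values are stored as `None` and that a covariate is numeric when all its valid values are numbers. The function `describe` is hypothetical and only mimics the columns listed above. Quartiles use `statistics.quantiles(..., method="inclusive")`, which interpolates in the same way as R's default `quantile` (type 7).

```python
import statistics
from collections import Counter


def describe(values):
    """Summarise one covariate in the spirit of the descriptive table (hypothetical)."""
    valid = [v for v in values if v is not None]  # assumption: None marks missing
    if all(isinstance(v, (int, float)) for v in valid):
        q1, _, q3 = statistics.quantiles(valid, n=4, method="inclusive")
        return {
            "type": "numeric",
            "mean": statistics.mean(valid),
            "min": min(valid),
            "max": max(valid),
            "median": statistics.median(valid),
            "IQR": q3 - q1,
            "distinct": len(set(valid)),  # shown in the Freqs column
            "valid": len(valid),
        }
    freqs = Counter(valid)
    return {
        "type": "categorical",
        # category -> (frequency, % of valid values)
        "freqs": {k: (n, 100 * n / len(valid)) for k, n in freqs.items()},
        "valid": len(valid),
    }


print(describe([1, 2, 3, 4, 5, None]))
print(describe(["M", "F", "F", None]))
```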

Interpretation

With those descriptive plots and the descriptive table we can observe how individuals are distributed through the different categories or values of each covariate.

Length of stay

Exploring the data: Length of stay

The user receives descriptive information about the data.

Inputs

No input is needed in this section. The states of the model are used.

Outputs

For each initial or transient state of the model, a boxplot of the length of stay in that state is returned.

Interpretation

Those boxplots give the user the following information:

The horizontal black line marks the median length of stay in each state (half of the individuals stay in that state for less than the median time).
The lower and upper hinges of the box indicate the first (Q1) and third (Q3) quartiles of the length of stay in that state.
The upper whisker extends from the hinge to the largest length of stay no further than \(1.5 \times (Q3-Q1)\) from the hinge. The lower whisker extends from the hinge to the smallest length of stay at most \(1.5 \times (Q3-Q1)\) from the hinge.
Individuals whose length of stay lies beyond the whiskers are shown as outlying points.
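The boxplot summaries above follow the usual Tukey convention, which can be sketched in Python as below. The function `boxplot_stats` is hypothetical, and its input is assumed to be a list of lengths of stay in one state. Quartiles are interpolated as in R's default `quantile` (type 7). The app's plotting library may differ in details.

```python
import statistics


def boxplot_stats(los):
    """Median, hinges, whiskers and outliers of lengths of stay (hypothetical helper)."""
    data = sorted(los)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }


print(boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))
```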

References

Mody A, Lyons PG, Vazquez Guillamet C, et al (2020). The Clinical Course of Coronavirus Disease 2019 in a US Hospital System: A Multistate Analysis. American Journal of Epidemiology, Volume 190, Issue 4.

Time until absorbing states

Exploring the data: Time until absorbing states

The user receives descriptive information about the data.

Inputs

No input is needed in this section. The states of the model are used.

Outputs

Two plots are returned:

The cumulative incidence plot shows the proportion of individuals who achieve an absorbing state at any particular time.
The survival functions for each absorbing state are represented.

Interpretation

How to interpret those plots?

The cumulative incidence plot indicates the proportion of individuals who go to each absorbing state at any particular time.
For each absorbing state, the survival plot shows the probability of not having reached that state by time t.
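When several absorbing states compete, the cumulative incidence of each one is commonly estimated with the Aalen-Johansen estimator. At each event time, the probability of still being event-free is multiplied by the fraction of individuals at risk who enter that absorbing state. The Python sketch below illustrates this estimator, although it is an assumption that the app uses this method. The function name is hypothetical. The input format is also hypothetical: follow-up times plus the absorbing state reached, or `None` if censored.

```python
from collections import Counter


def cumulative_incidence(times, causes):
    """Aalen-Johansen cumulative incidence for competing absorbing states (sketch).

    times  -- follow-up time of each individual
    causes -- absorbing state reached at that time, or None if censored
    """
    cif = {c: 0.0 for c in causes if c is not None}
    event_free = 1.0  # probability of not yet being in any absorbing state
    curve = []
    for t in sorted(set(times)):
        at_risk = sum(1 for x in times if x >= t)
        events = Counter(c for x, c in zip(times, causes) if x == t and c is not None)
        for cause, d in events.items():
            cif[cause] += event_free * d / at_risk
        event_free *= 1 - sum(events.values()) / at_risk
        curve.append((t, dict(cif)))
    return curve


print(cumulative_incidence([1, 2, 3, 4], ["death", "dcharg", "dcharg", "death"])[-1])
```

Without censoring, the final values equal the observed proportions of individuals reaching each absorbing state.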

Instantaneous hazards

Exploring the data: Instantaneous hazards

The user receives descriptive information about the data.

Inputs

There are three different inputs:

Selection of the starting state of the transitions of interest (for the first graph).
If covariates have been selected previously, selection of the covariate of interest (this covariate is used as a stratifying covariate).
Selection of the ending state of the transitions of interest (for the second graph).

Outputs

Two groups of instantaneous hazard plots are returned. One is for the transitions that start in the selected state, and the other is for the transitions that end in the selected state.

If a covariate has been selected, it is used to stratify the data. For a numeric covariate, each group contains two plots: one for individuals with values above the median of the covariate and another for individuals with values below the median. For a categorical covariate, each group contains one plot per category.

If there are not enough individuals to estimate the instantaneous hazard of a specific transition, that transition is not shown in the graph.

Interpretation

The instantaneous hazard is the risk of making a specific transition exactly at time t, given that the individual is still in the starting state just before t. When no covariate is selected, the graph can therefore be interpreted as the risk of transition for the general population. When a covariate is selected, the app uses it as a stratifying covariate. The graph then shows the risk of transition for one group of individuals (e.g., women, age above the median age) and for another group (e.g., men, age below the median age).
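A common way to estimate an instantaneous hazard is to smooth the jumps of the Nelson-Aalen cumulative hazard with a kernel. The Python sketch below is illustrative only. The app's smoothing method is not described here, and the Epanechnikov kernel, the bandwidth argument and the function names are assumptions. The input is assumed to be restricted to individuals at risk of a single transition, with `events` flagging that transition. `median_split` reproduces the stratification by a numeric covariate, assuming that values equal to the median go to the below group.

```python
import statistics


def nelson_aalen_increments(times, events):
    """Jumps dA(t) = d(t) / n(t) of the Nelson-Aalen cumulative hazard."""
    jumps = []
    for t in sorted({x for x, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        jumps.append((t, d / at_risk))
    return jumps


def smoothed_hazard(t, jumps, bandwidth):
    """Epanechnikov-kernel smoothing of the Nelson-Aalen jumps at time t (assumed method)."""
    total = 0.0
    for tj, da in jumps:
        u = (t - tj) / bandwidth
        if abs(u) <= 1:
            total += 0.75 * (1 - u * u) * da / bandwidth
    return total


def median_split(values):
    """Stratify a numeric covariate at its median (ties go to 'below' by assumption)."""
    med = statistics.median(values)
    return ["above" if v > med else "below" for v in values]


jumps = nelson_aalen_increments([1, 2, 3], [1, 1, 1])
print(jumps, smoothed_hazard(2, jumps, bandwidth=0.5))
```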

References

de Wreede LC, Fiocco M, and Putter H (2011). mstate: An R Package for the Analysis of Competing Risks and Multi-State Models. Journal of Statistical Software, Volume 38, Issue 7.
Mody A, Lyons PG, Vazquez Guillamet C, et al (2020). The Clinical Course of Coronavirus Disease 2019 in a US Hospital System: A Multistate Analysis. American Journal of Epidemiology, Volume 190, Issue 4.

Markov test

One of the assumptions when fitting a Markov model is that the future state depends only on the current state and not on the earlier history of the individual. However, this assumption does not necessarily hold and needs to be checked with a Markov test. The test is performed using the MarkovTest function from the mstate package.

Inputs

Selection of the transitions to be tested.

Outputs

Returns a table with the value of the test and the p-value for each transition.

References

de Wreede LC, Fiocco M, and Putter H (2010). The mstate Package for Estimation and Prediction in Non- and Semi-Parametric Multi-State and Competing Risks Models. Computer Methods and Programs in Biomedicine, Volume 99, pages 261-274.

Fitted model

The user decides which type of model to fit, and the model is fitted with the previously selected covariates included in all transitions.

Inputs

Selection of the type of model. At the moment, only Cox models are available. If the Markov assumption is rejected for any transition (result of the Markov test), Semi-Markov (Cox) should be selected. Then select the transitions for which the Markov assumption does not hold, so that time is included in the model for those transitions.

Note that all previously selected covariates are included in all transitions of the model.

Outputs

If the model includes covariates, three tables are returned with the most important information about the model.

In the first table the following information is collected:

Rownames: specify the analyzed covariate and transition.
coef: the estimated coefficients.
HR (95%CI): the estimated hazard ratios and confidence intervals.
p-value: the p-value of each estimation.

The second table shows the log likelihood of the fitted model and of the null model.

In the third table the following information is collected:

Rownames: the name of the applied test.
test: the value of each test.
df: the degrees of freedom of each test, that is, the number of estimated coefficients.
p-value: the p-value of each test.

If a null model is fitted, that is, a model without any covariate, only one table is returned. This table contains the value of the log likelihood of the null model.

Interpretation

Table 1:

Factor: compare each category with the reference category.

Positive coefficient -> risk factor. For example, compare a male and a female (reference category) with the same characteristics. A coefficient \(\text{coef} = 0.56\) means that the male's risk of transition is \(\exp(\text{coef}) = \exp(0.56) = 1.75\) times that of the female.
Negative coefficient -> protective factor. For example, compare a male and a female (reference category) with the same characteristics. A coefficient \(\text{coef} = -0.1\) means that the male's risk of transition is \(\exp(\text{coef}) = \exp(-0.1) = 0.90\) times that of the female (about 10% lower). Equivalently, the female's risk of transition is \(1/\exp(\text{coef}) = 1/0.90 = 1.11\) times that of the male.

Numeric: compare two values of this covariate.

Positive coefficient -> the instantaneous risk increases with each unit of the covariate. For example, compare two patients with the same characteristics, one 50 and the other 65 years old. A coefficient \(\text{coef} = 0.02\) means that the 65-year-old's risk of transition is \(\exp((65-50) \times \text{coef}) = \exp(15 \times 0.02) = 1.35\) times that of the 50-year-old.
Negative coefficient -> the instantaneous risk decreases with each unit of the covariate. For example, compare two patients with the same characteristics, one 55 and the other 60 years old. A coefficient \(\text{coef} = -0.04\) means that the 60-year-old's risk of transition is \(\exp((60-55) \times \text{coef}) = \exp(5 \times (-0.04)) = 0.82\) times that of the 55-year-old (about 18% lower). Equivalently, the 55-year-old's risk of transition is \(1/\exp((60-55) \times \text{coef}) = 1/0.82 = 1.22\) times that of the 60-year-old.
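The hazard ratios in the examples above can be checked with a few lines of Python. The helper `hazard_ratio` is hypothetical. It computes \(\exp(\Delta \times \text{coef})\) for a difference \(\Delta\) in the covariate, with \(\Delta = 1\) for a category compared with its reference.

```python
import math


def hazard_ratio(coef, difference=1.0):
    """Hazard ratio exp(difference * coef) between two otherwise identical individuals."""
    return math.exp(difference * coef)


print(round(hazard_ratio(0.56), 2))            # male vs female: 1.75
print(round(1 / hazard_ratio(-0.1), 2))        # female vs male: 1.11
print(round(hazard_ratio(0.02, 65 - 50), 2))   # 65 vs 50 years old: 1.35
print(round(hazard_ratio(-0.04, 60 - 55), 2))  # 60 vs 55 years old: 0.82
```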

Table 2:

The log likelihood is used to compare the fit of different models fitted to the same data. The model with the higher log likelihood is considered to fit the data better.

Table 3:

The three tests in this table assess whether the coefficients of the model can be assumed to be different from 0. The test column shows the value of each test statistic, the df column the number of estimated coefficients, and the p-value column the corresponding p-value. Note that if \(\text{p-value} < 10^{-6}\), the app reports \(\text{p-value} = 0\).
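In R's `coxph` summary, the three global tests are the likelihood ratio, Wald and score (logrank) tests. Assuming the app reports these, the likelihood ratio test links Table 2 and Table 3. Its statistic is twice the difference between the log likelihoods of the fitted and null models. This statistic is compared with a chi-square distribution whose degrees of freedom equal the number of coefficients. The Python sketch below is a hypothetical illustration, and its function names are not from the app. The chi-square tail is computed from the power series of the regularized incomplete gamma function. The sketch also applies the \(10^{-6}\) cut-off described above.

```python
import math


def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-square distribution via the gamma power series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Regularized lower incomplete gamma P(a, z) = e^{-z} z^a / Gamma(a) * sum_n ...
    term = total = 1.0 / a
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def likelihood_ratio_test(loglik_null, loglik_model, n_coef):
    """LR statistic 2 * (logL_model - logL_null) and its p-value (hypothetical helper)."""
    stat = 2.0 * (loglik_model - loglik_null)
    p = chi2_sf(stat, n_coef)
    return stat, (0.0 if p < 1e-6 else p)  # p-values below 1e-6 are reported as 0


print(likelihood_ratio_test(-100.0, -90.0, 2))
```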

References

de Wreede LC, Fiocco M, and Putter H (2011). mstate: An R Package for the Analysis of Competing Risks and Multi-State Models. Journal of Statistical Software, Volume 38, Issue 7.
Meira-Machado L, de Uña-Álvarez J, Cadarso-Suárez C, Andersen PK (2009). Multi-state models for the analysis of time-to-event data. Statistical Methods in Medical Research, Volume 18, Issue 2.

Graphics

The user receives a graphical representation of the fitted model.

Inputs

The user needs to choose two things related to the graph:

The transition that the user wants to analyze.
The number of units used to compute the hazard ratios of numeric covariates.

Outputs

A forest plot is returned representing the estimated hazard ratios and their confidence intervals for the covariates taken into account in the selected transition.

As these plots represent the estimated hazard ratios, no graph is shown in this section if no covariate is selected.

Interpretation

The forest plot shows the hazard ratios and confidence intervals of the covariates included in the selected transition. For a categorical covariate, the hazard ratios compare each category with the reference category. For a numeric covariate, they correspond to an increase of one unit in that covariate. A one-unit increase may not be easily interpretable, so the app lets the user choose the number of units used in the graph.

Based on this graph, a covariate is considered to have a significant effect on the transition if its hazard ratio confidence interval does not cover 1. Equivalently, the confidence interval of the coefficient does not cover 0. Otherwise, the covariate is not considered to have a significant effect.
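Rescaling a numeric covariate to k units multiplies both the coefficient and its confidence limits by k before exponentiating. The Python sketch below assumes a Wald-type 95% interval, \(\exp(k(\text{coef} \pm 1.96\,\text{se}))\), and a positive number of units. The function names and the coefficient and standard error in the usage line are hypothetical.

```python
import math


def scaled_hr(coef, se, units=1.0, z=1.96):
    """Hazard ratio and Wald 95% CI for an increase of `units` (assumed > 0)."""
    hr = math.exp(units * coef)
    lower = math.exp(units * (coef - z * se))
    upper = math.exp(units * (coef + z * se))
    return hr, lower, upper


def has_effect(lower, upper):
    """A hazard-ratio confidence interval that excludes 1 indicates a significant effect."""
    return not (lower <= 1.0 <= upper)


hr, lower, upper = scaled_hr(0.02, 0.01, units=10)  # hypothetical coef and se
print(hr, lower, upper, has_effect(lower, upper))
```

Changing the number of units changes the size of the hazard ratio shown in the plot. It does not change whether the interval excludes 1.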

Linear assumption

Model validation: Linear assumption

The user can analyze whether the assumptions made when fitting the Cox model hold.

Inputs

The user needs to select:

The transition to analyze.
The covariate to analyze.

Outputs

A graph of the martingale residuals for the selected covariate and transition is returned.

As these plots represent the residuals for a specific covariate, no graph is shown in this section if no covariate is selected.

Interpretation

The martingale residuals help determine the best transformation of a covariate, so that it optimally explains the time until an individual makes a certain transition. To find the best transformation of the covariate \(Z_q\) in the transition \(k \rightarrow l\), the martingale residuals are computed from a Cox model for that transition adjusted for the other \(p-1\) covariates. These residuals are then plotted against the values of the covariate \(Z_{i,q}\), together with a smoothed curve of the points along the x-axis. If the smoothed curve is reasonably linear, the covariate \(Z_q\) does not require any further transformation in the transition \(k \rightarrow l\).
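For illustration, the Python sketch below computes martingale residuals \(\delta_i - \hat{\Lambda}_0(T_i)\) in the simplest case with no other covariates (\(p = 1\)). In that case the baseline cumulative hazard \(\hat{\Lambda}_0\) is the Nelson-Aalen estimator. With other covariates, each individual's cumulative hazard would also be multiplied by \(\exp(\hat{\boldsymbol{\beta}}^\top Z_i)\). A simple running mean stands in for the smoother, although a lowess curve is more usual. The function names are hypothetical, and this is not the app's code.

```python
def martingale_residuals_null(times, events):
    """delta_i - Lambda0(T_i), with Lambda0 the Nelson-Aalen estimator (no covariates)."""
    jumps = []
    for t in sorted({x for x, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        jumps.append((t, d / at_risk))
    return [e - sum(da for t, da in jumps if t <= ti) for ti, e in zip(times, events)]


def running_mean(x, y, window=3):
    """Crude smoother (stand-in for lowess): mean of y over neighbours in x order."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ys = [y[i] for i in order]
    half = window // 2
    smooth = []
    for k in range(len(ys)):
        lo, hi = max(0, k - half), min(len(ys), k + half + 1)
        smooth.append(sum(ys[lo:hi]) / (hi - lo))
    return [x[i] for i in order], smooth


res = martingale_residuals_null([1, 2, 3], [1, 0, 1])
print(res)
```

Plotting the smoothed residuals against a covariate that was left out of the model shows the functional form that covariate would need.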

References

Rizopoulos, D. (2018). Biostatistical Methods II: Classical Regression Models (EP03) Survival Analysis. Course material.

Influential observations

Model validation: Influential observations

The user can analyze whether the assumptions made when fitting the Cox model hold.

Inputs

The user needs to choose the transition to analyze.

Outputs

The plots of the dfbetas residuals for the selected transition are returned.

As these plots represent the residuals for the different covariates, no graph is shown in this section if no covariate is selected.

Interpretation

We plot the dfbetas residuals versus \(Z_{i,q}\) to determine the influence of individual \(i\) on the estimation of the coefficients of the transition \(k \rightarrow l\). These residuals represent the difference between two estimators of the coefficients for the transition \(k \rightarrow l\). The first, \(\hat{\boldsymbol{\beta}}\), is obtained by fitting the Cox model with all the individuals. The second, \(\hat{\boldsymbol{\beta}}_{(i)}\), is obtained from the model fitted without individual \(i\). Individuals far away from the others therefore have a higher influence on the model estimates. Ideally, all the points should lie in roughly the same area.

Note that these dfbetas residuals are standardized (divided by the standard error of the corresponding coefficient), so they can be compared across covariates.
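The definition of dfbetas above can be illustrated with an exact leave-one-out computation for a Cox model with a single covariate. This Python sketch is an illustration under several assumptions, not the app's code. It uses the Breslow approximation for ties (R's `coxph` uses Efron by default). It solves the score equation by bisection, assuming the estimate lies in \([-20, 20]\). It also standardizes by the standard error of the full-data estimate. R's `survival` package approximates dfbetas without refitting the model, so its values can differ slightly. All function names are hypothetical.

```python
import math


def cox_score_info(beta, times, events, z):
    """Score and information of the Cox partial likelihood (one covariate, Breslow ties)."""
    score = info = 0.0
    for i, (ti, ei) in enumerate(zip(times, events)):
        if not ei:
            continue
        risk = [j for j, tj in enumerate(times) if tj >= ti]
        w = [math.exp(beta * z[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * z[j] for wj, j in zip(w, risk))
        s2 = sum(wj * z[j] ** 2 for wj, j in zip(w, risk))
        score += z[i] - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return score, info


def fit_cox(times, events, z, lo=-20.0, hi=20.0):
    """Solve score(beta) = 0 by bisection; the score is decreasing in beta."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if cox_score_info(mid, times, events, z)[0] > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def dfbetas(times, events, z):
    """Exact leave-one-out (beta_hat - beta_hat_(i)) / SE(beta_hat)."""
    beta = fit_cox(times, events, z)
    se = 1 / math.sqrt(cox_score_info(beta, times, events, z)[1])
    out = []
    for i in range(len(times)):
        keep = [j for j in range(len(times)) if j != i]
        beta_i = fit_cox([times[j] for j in keep], [events[j] for j in keep], [z[j] for j in keep])
        out.append((beta - beta_i) / se)
    return out


print(dfbetas([1, 2, 3, 4, 5, 6], [1, 1, 0, 1, 1, 1], [1, 0, 1, 0, 1, 0]))
```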

References

Rizopoulos, D. (2018). Biostatistical Methods II: Classical Regression Models (EP03) Survival Analysis. Course material.

Proportional hazards assumption

Model validation: Proportional hazards assumption

The user can analyze whether the assumptions made when fitting the Cox model hold.

Inputs

The user needs to choose the transition to analyze.

Outputs

The plots of the Schoenfeld residuals for the selected transition are returned.

As these plots represent the residuals for the different covariates, no graph is shown in this section if no covariate is selected.

Interpretation

The Schoenfeld residuals measure the difference between the observed and expected values of the covariate \(Z_q\) at each time a transition from state \(k\) to state \(l\) occurs.

The Schoenfeld residuals are plotted against time together with a smoothed curve of the points and a horizontal line at 0. If the confidence interval of the smoothed curve covers the 0 line, proportional hazards can be assumed.
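At the event time of individual \(i\), the Schoenfeld residual is \(Z_{i,q}\) minus the weighted mean of \(Z_q\) over the individuals still at risk, with weights \(\exp(\hat\beta Z)\). A minimal one-covariate Python sketch is shown below. The function name is hypothetical, and the sketch assumes Breslow handling of ties. The usage line takes \(\beta = 0\) only to keep the numbers easy to check by hand. In practice, the fitted coefficient of the transition is used.

```python
import math


def schoenfeld_residuals(beta, times, events, z):
    """(event time, z_i - E[z | risk set]) for each observed transition (one covariate)."""
    out = []
    for ti, ei, zi in zip(times, events, z):
        if not ei:
            continue
        risk = [j for j, tj in enumerate(times) if tj >= ti]
        w = [math.exp(beta * z[j]) for j in risk]
        expected = sum(wj * z[j] for wj, j in zip(w, risk)) / sum(w)
        out.append((ti, zi - expected))
    return sorted(out)  # ordered by time, ready to plot against time


print(schoenfeld_residuals(0.0, [1, 2, 3], [1, 1, 0], [1, 0, 1]))
```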

References

Rizopoulos, D. (2018). Biostatistical Methods II: Classical Regression Models (EP03) Survival Analysis. Course material.

Predictions

The user can make predictions for one or two new individuals.

Inputs

The inputs can be divided in three blocks:

Characteristics of the patient(s): aspects of the new patient(s), such as the initial state or the values of the selected covariates.
Prediction time: the time at which the prediction is made.
Graphical representation: the user can choose between a transition probability plot and a cumulative hazard plot. If the transition probability plot is selected, the user can also choose between a non-stacked and a stacked plot.

Outputs

Two groups of outputs are returned:

Some boxes with the predicted probability of being in each state after the selected time for the new patient.
A transition probability plot for the new patient.

As the predictions are made based on the characteristics of a new patient, if no covariate is selected there won’t be any output in this section.

Interpretation

The first group of outputs gives the probability of being in each state. These values describe the likely situation of the new individual at the end of the selected time period.

The second group of outputs is the transition probability plot, which can be shown stacked or non-stacked. Both versions give the same information: the probability that the new patient is in each state over time. In the non-stacked plot, each curve directly gives one of these probabilities. In the stacked plot, the probability of each state is given by the height of its colored band.
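Transition probabilities in mstate (function `probtrans`) are obtained from the estimated cumulative transition hazards through a product integral, \(p(t) = p(t^-)\,(I + d\mathbf{A}(t))\). The Python sketch below illustrates that computation for state occupation probabilities and for the band boundaries of a stacked plot. The function names and the hazard increments in the example are hypothetical. The increments out of each state are assumed to sum to at most 1.

```python
def state_occupation(initial, n_states, increments):
    """Product integral p(t) = p(t-) (I + dA(t)) over successive event times.

    increments -- one dict {(from_state, to_state): dA} per event time (hypothetical input)
    """
    p = [0.0] * n_states
    p[initial] = 1.0
    history = []
    for dA in increments:
        new = p[:]
        for (k, l), a in dA.items():
            flow = p[k] * a  # probability mass moving from state k to state l
            new[k] -= flow
            new[l] += flow
        p = new
        history.append(p)
    return history


def stacked_bounds(probs):
    """Upper boundaries of the colored bands in a stacked plot (cumulative sums)."""
    bounds, total = [], 0.0
    for prob in probs:
        total += prob
        bounds.append(total)
    return bounds


# Hypothetical three-state model: 0 -> 1, 0 -> 2 and 1 -> 2.
hist = state_occupation(0, 3, [{(0, 1): 0.2, (0, 2): 0.1}, {(1, 2): 0.5, (0, 2): 0.1}])
print(hist[-1], stacked_bounds(hist[-1]))
```

The last boundary of the stacked plot is always 1, because the probabilities of being in each state sum to 1 at every time point.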

References

de Wreede LC, Fiocco M, and Putter H (2011). mstate: An R Package for the Analysis of Competing Risks and Multi-State Models. Journal of Statistical Software, Volume 38, Issue 7.