Data

Data

The user can upload their own data or use the example data from the DIVINE project.

Data table

Inputs

Optionally, a CSV file with new data can be uploaded. The CSV file must have the following format:

  • Columns separated by commas and decimals represented by points.
  • For each state x, the dataset has two variables, x_time and x_status, which together describe the path of each patient. The variable x_time contains the time until state x is reached for the first time, and x_status indicates whether the state is reached (1) or not (0). If the patient does not reach a state, x_status takes the value 0 and x_time takes the last observed time of the patient. In other words, the time until that state is censored at the last observed time, because the patient has not reached the state.
  • The initial state(s) must be included in the file following the same naming convention, with time equal to 0 when the initial state is not a transient state.
  • A variable named inistat must be included, specifying the name of the initial state of each individual.
  • The names of the covariates should not include a point.
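The format rules above can be checked before uploading a file. The following is a minimal sketch in Python using only the standard library; the function name check_format, its states argument and the bmi.cat column are hypothetical and are not part of the app. For simplicity it applies the "no point" rule to every column name, not only to covariates.

```python
import csv
import io


def check_format(csv_text, states):
    """Return a list of problems found in the header of a comma-separated file.

    `states` lists the state names expected in the file (hypothetical
    argument, used here only for illustration).
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    problems = []
    # Every state needs both an x_time and an x_status column.
    for state in states:
        for suffix in ("_time", "_status"):
            if state + suffix not in header:
                problems.append(f"missing column {state}{suffix}")
    # The initial state of each individual is given in inistat.
    if "inistat" not in header:
        problems.append("missing column inistat")
    # Names must not include a point (checked here on all columns).
    for name in header:
        if "." in name:
            problems.append(f"column name contains a point: {name}")
    return problems


# Header and one row built from the values quoted in this section for id=8.
example = (
    "id,nopneum_time,nopneum_status,death_time,death_status,inistat\n"
    "8,0,1,10,1,nopneum\n"
)
print(check_format(example, ["nopneum", "death"]))  # []
print(check_format("a_time,a_status,inistat,bmi.cat\n0,1,a,2\n", ["a"]))
```

An empty list means that the header satisfies the naming rules. The values themselves (decimal points, status coded as 0 or 1) are not checked in this sketch.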

To understand better the variables x_time and x_status, the following figure illustrates the path of two specific patients from the DIVINE dataset, \(\text{id}=8\) and \(\text{id}=1\) respectively.

The first patient is admitted to the hospital without severe pneumonia (nopneum_time=0, nopneum_status=1), is diagnosed with severe pneumonia after 1 day in hospital (pneum_time=1, pneum_status=1), needs non-invasive mechanical ventilation at day 2 (NIMV_time=2, NIMV_status=1) and invasive mechanical ventilation at day 3 (IMV_time=3, IMV_status=1), and finally dies at day 10 (death_time=10, death_status=1). Consequently, that patient has not reached the recovery and discharge states, and both are censored at time 10 (reco_time=10, reco_status=0 and dcharg_time=10, dcharg_status=0).

The second patient has a better evolution since, even if he/she needs non-invasive mechanical ventilation at day 7.5, he/she recovers from severe pneumonia at day 13 and he/she is discharged at day 23.

Although the time until each state is computed as the difference between the dates of entry into the previous and the current state, the time until non-invasive mechanical ventilation for this patient is 7.5 days. This is because some patients in the dataset enter two states on the same day, which is not allowed in multistate models (MSMs). To solve this problem, we imputed half a day for patients who enter two states on the same day. Consequently, as the second patient was diagnosed with severe pneumonia and needed non-invasive mechanical ventilation on the seventh day, the half-day imputation gives NIMV_time=7.5.
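The half-day imputation can be sketched as follows. This is only an illustration in Python, not the code used to prepare the DIVINE dataset; the function name impute_half_day is hypothetical. How more than two states entered on the same day are handled is an assumption: here each tied state is pushed half a day after the previous one.

```python
def impute_half_day(entry_days):
    """Shift a state entered on the same day as the previous one by half a day.

    `entry_days` is a list of (state, day) pairs in the order in which the
    states were visited. Returns a new list of (state, time) pairs.
    """
    result = []
    previous_time = None
    for state, day in entry_days:
        time = float(day)
        # Assumption: a tie with the previous state is resolved by adding
        # half a day to the time of the previous state.
        if previous_time is not None and time <= previous_time:
            time = previous_time + 0.5
        result.append((state, time))
        previous_time = time
    return result


# States reached by the second patient (id=1), as described above.
path = [("pneum", 7), ("NIMV", 7), ("reco", 13), ("dcharg", 23)]
print(impute_half_day(path))
# [('pneum', 7.0), ('NIMV', 7.5), ('reco', 13.0), ('dcharg', 23.0)]
```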

Outputs

A table showing the data (uploaded data or example data) is returned.

Labels

Inputs

Optionally, the user can change the labels of the states and covariates using the boxes. It is very important to click the save state/covariate label button; otherwise the new label will not be saved.

Covariates: descriptive plots

Outputs

For the covariates of the dataset the following plots are returned:

  • For continuous covariates histograms are plotted.
  • For categorical covariates barplots are returned.

Covariates: descriptive table

Outputs

The descriptive table gives some information about the covariates:

  • Variable: the name of the covariate and the covariate type are returned.
  • Stats/Values: for categorical covariates, the categories; for numerical covariates, a short summary with the mean, minimum and maximum values, median and interquartile range.
  • Freqs (% of Valid): for categorical covariates, the frequency of each category; for numerical covariates, the number of distinct values.
  • Valid: the number of valid values for that covariate.

Model specification

Model specification

The user specifies the transitions to include in the model, the transition-specific covariates, and the follow-up time to consider together with its time unit.

It is very important to click on the save model specification button, otherwise the model would not be saved.

Model

Inputs

In the Define the transitions box, the output and input states are selected using the pull-down menus, and each transition is created or deleted with the add/delete buttons.

This app does not allow bidirectional transitions or loops in the multistate model. If a loop is detected, a popup appears indicating that a loop has been introduced, and the last transition is not added.
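One way to detect such a loop is to check whether the starting state of the new transition can already be reached from its ending state. The sketch below illustrates the idea in Python; it is not the app's implementation, and the function name creates_loop is hypothetical. It assumes that the transitions defined so far contain no loop.

```python
def creates_loop(transitions, new_transition):
    """Return True if adding new_transition would create a loop.

    `transitions` is a list of (origin, destination) pairs already in the
    model and `new_transition` is the (origin, destination) pair to add.
    """
    origin, destination = new_transition
    # The new transition closes a loop when its origin can already be
    # reached from its destination (this also covers bidirectional
    # transitions and a transition from a state to itself).
    stack, seen = [destination], set()
    while stack:
        state = stack.pop()
        if state == origin:
            return True
        if state in seen:
            continue
        seen.add(state)
        stack.extend(b for a, b in transitions if a == state)
    return False


# Transitions followed by the first patient described in the Data section.
model = [("nopneum", "pneum"), ("pneum", "NIMV"), ("NIMV", "IMV"), ("IMV", "death")]
print(creates_loop(model, ("IMV", "pneum")))      # True: pneum -> NIMV -> IMV -> pneum
print(creates_loop(model, ("pneum", "nopneum")))  # True: bidirectional transition
print(creates_loop(model, ("pneum", "death")))    # False
```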

Outputs

In the Multistate model diagram box the updated diagram of the defined model is returned. This diagram is automatically updated when a transition is added or deleted. In order to make the diagram more understandable the states are plotted in different colors: orange (initial states), blue (transient) and magenta (absorbing).

In the Number of events for each transition box, the number of events for each transition appears. As loops are not allowed, the number of events for a transition can be interpreted as the number of individuals who make that transition, because individuals make each transition of their path only once.

Covariates/time

Inputs

In the Time specification box the follow-up time and the time units for the plots are selected.

In the Covariates per transition box the covariates of interest for each transition are selected. Those selected covariates will be taken into account to fit the model and as potential characteristics to make the predictions.

References

de Wreede LC, Fiocco M, and Putter H (2011). mstate: An R Package for the Analysis of Competing Risks and Multi-State Models. Journal of Statistical Software, Volume 38, Issue 7.

Length of stay

Exploring the data

The user receives some descriptive information about the data.

Length of stay

Inputs

No input is needed in this section; the states of the model are used.

Outputs

For each initial or transient state of the model a boxplot is returned representing the length of stay in each of those states.

Interpretation

Those boxplots give the user the following information:

  • The horizontal black line identifies the median length of stay in each state (50% of the individuals stay in that state for less than the median time).
  • The lower and upper hinges of the box indicate the first (Q1) and third (Q3) quartiles of the length of stay in that state.
  • The upper whisker extends from the upper hinge to the largest length of stay that is no further than \(1.5 \times (Q3-Q1)\) from the hinge, and the lower whisker extends from the lower hinge to the smallest length of stay that is no further than \(1.5 \times (Q3-Q1)\) from the hinge.
  • The lengths of stay of the remaining individuals are represented by the outlying points.
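The elements of the boxplot listed above can be computed by hand. The following Python sketch does so for a made-up vector of lengths of stay; the function name boxplot_summary is hypothetical, and the quartile convention (linear interpolation, method="inclusive") is an assumption that may differ slightly from the one used by the app.

```python
import statistics


def boxplot_summary(lengths_of_stay):
    """Return the hinges, whisker ends and outlying points of a boxplot."""
    q1, median, q3 = statistics.quantiles(lengths_of_stay, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations inside the fences.
    inside = [x for x in lengths_of_stay if lower_fence <= x <= upper_fence]
    outliers = sorted(x for x in lengths_of_stay if x < lower_fence or x > upper_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": outliers,
    }


# Toy lengths of stay in days (illustrative values, not DIVINE data).
stays = [1, 2, 2, 3, 4, 5, 6, 7, 30]
print(boxplot_summary(stays))
```

In this toy example Q1 = 2, the median is 4 and Q3 = 6, so the fences are at -4 and 12 days. The whiskers end at 1 and 7 days, and the stay of 30 days is drawn as an outlying point.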

References

Mody A, Lyons PG, Vazquez Guillamet C, et al (2020). The Clinical Course of Coronavirus Disease 2019 in a US Hospital System: A Multistate Analysis. American Journal of Epidemiology, Volume 190, No. 4.

Time until absorbing states

Time until absorbing states

Inputs

No input is needed in this section; the states of the model are used.

Outputs

The cumulative incidence plot shows the proportion of individuals who achieve an absorbing state at any particular time.

Interpretation

The cumulative incidence plot indicates the proportion of individuals who go to each absorbing state at any particular time.
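As an illustration of what such a curve contains, the sketch below computes cumulative incidences with the standard nonparametric estimator for competing absorbing states under right censoring. Whether the app uses exactly this estimator is an assumption; the function name cumulative_incidence and the toy data are hypothetical.

```python
def cumulative_incidence(times, causes):
    """Nonparametric cumulative incidence for competing absorbing states.

    `times` holds the observed time of each individual and `causes` the
    absorbing state reached at that time, or None if the individual is
    censored. Returns {state: [(time, cumulative incidence), ...]}.
    """
    event_times = sorted({t for t, c in zip(times, causes) if c is not None})
    states = sorted({c for c in causes if c is not None})
    survival = 1.0  # probability of not having reached any absorbing state
    current = {s: 0.0 for s in states}
    curves = {s: [] for s in states}
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        for s in states:
            events = sum(1 for u, c in zip(times, causes) if u == t and c == s)
            # Probability of being event-free just before t, times the
            # state-specific hazard at t.
            current[s] += survival * events / at_risk
            curves[s].append((t, current[s]))
        all_events = sum(1 for u, c in zip(times, causes) if u == t and c is not None)
        survival *= 1 - all_events / at_risk
    return curves


# Toy data: five individuals, two of them censored (None).
times = [2, 3, 3, 5, 8]
causes = ["death", "dcharg", None, "dcharg", None]
curves = cumulative_incidence(times, causes)
for state, curve in curves.items():
    print(state, [(t, round(p, 3)) for t, p in curve])
```

At the last event time in this toy example the cumulative incidence is 0.2 for death and 0.5 for discharge. The remaining 0.3 is the estimated proportion that has not yet reached an absorbing state.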

Instantaneous hazards

Instantaneous hazards

Inputs

There are three different inputs:

  1. Selection of the starting state of the transitions of interest (for the first graph).
  2. If previously covariates have been selected, selection of the covariate of interest (this covariate will be used as a stratifying covariate).
  3. Selection of the ending state of the transitions of interest (for the second graph).

Outputs

Two groups of instantaneous hazard plots are returned: one for the transitions that start in the selected state and another for the transitions that end in the selected state.

If a covariate has been selected, this covariate is used to stratify the data. If a numeric covariate is chosen, in each group two plots will appear: one for individuals with a value above the median value of the covariate, and another for the individuals with a value under the median value of the covariate. If a categorical covariate is chosen, for each category of the covariate in each group one graph will appear.
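The stratification rule can be sketched as follows. This is a Python illustration only; the function name stratify is hypothetical, and assigning values equal to the median to the lower group is an assumption, since the text does not say how ties are handled.

```python
import statistics


def stratify(values):
    """Assign each individual to a stratum of the selected covariate."""
    if all(isinstance(v, (int, float)) for v in values):
        median = statistics.median(values)
        # Assumption: values equal to the median go to the lower group.
        return ["above median" if v > median else "at or below median" for v in values]
    # Categorical covariate: one stratum per category.
    return [str(v) for v in values]


print(stratify([40, 55, 70, 62]))   # median is 58.5
print(stratify(["male", "female"]))
```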

If for a specific transition there are not enough individuals to estimate the instantaneous hazard, the instantaneous hazards of that transition won’t appear in the graph.

Interpretation

The instantaneous hazard indicates the risk of going through a specific transition at exactly time t. Consequently, when no covariate is selected, the graph can be interpreted as the risk of transition for the general population. When a covariate is selected, the app uses it as a stratifying covariate, so on one side we have the risk of transition for one group of individuals (e.g., women, or age above the median) and on the other side the risk of transition for the other group (e.g., men, or age below the median).

References

  1. de Wreede LC, Fiocco M, and Putter H (2011). mstate: An R Package for the Analysis of Competing Risks and Multi-State Models. Journal of Statistical Software, Volume 38, Issue 7.

  2. Mody A, Lyons PG, Vazquez Guillamet C, et al (2020). The Clinical Course of Coronavirus Disease 2019 in a US Hospital System: A Multistate Analysis. American Journal of Epidemiology, Volume 190, No. 4.

Fitted model

Model output

The user decides which type of model to fit, and the model is fitted including the previously selected transition-specific covariates. Forest plots are returned and the model can be validated. Finally, the user can compare different fitted models.

Fitted model

Inputs

Selection of the type of model. For the moment only a Cox model is available.

Note that the previously selected transition-specific covariates are considered when fitting the model.

Optionally, the user can compute the logarithmic score by clicking the compute the logarithmic score button. This computation takes some time.

Outputs

If in the model some covariates are taken into account, three tables are returned with the most important information of the model.

In the first table the following information is collected:

  • Rownames: specify the analyzed covariate and transition.
  • coef: the estimated coefficients.
  • HR (95%CI): the estimated hazard ratios and confidence intervals.
  • p-value: the p-value of each estimation.

In the second table the value of the log likelihood of the fitted and the null model are shown, as well as the Akaike information criterion (AIC) of the fitted model.

In the third table the following information is collected:

  • Rownames: the name of the applied test.
  • test: the value of each test.
  • df: the degrees of freedom of each test, that is, the number of estimated coefficients.
  • p-value: the p-value of each test.

If a null model is fitted, that is, a model without any covariate, only one table is returned. This table contains the value of the log likelihood of the null model.

If the user clicks on the compute the logarithmic score button, this score will be returned.

Interpretation

Table 1:

  1. Factor: compare each category with the reference category.
  • Positive coefficient -> risk factor. For example, if we compare a male and a female (reference category) with the same characteristics and the estimated coefficient is \(\text{coef} = 0.56\), the male has \(\exp(\text{coef}) = \exp(0.56) = 1.75\) times the risk of transition of the female.
  • Negative coefficient -> protective factor. For example, if we compare a male and a female (reference category) with the same characteristics and the estimated coefficient is \(\text{coef} = -0.1\), the male has \(\exp(\text{coef}) = \exp(-0.1) = 0.9\) times the risk of transition of the female or, equivalently, the female has \(1/\exp(\text{coef}) = 1/0.9 = 1.11\) times the risk of transition of the male.
  2. Numeric: compare two values of this covariate.
  • Positive coefficient -> instantaneous risk increases with each unit of the covariate. For example, if we compare two patients with the same characteristics, one 50 years old and the other 65 years old, and the estimated coefficient is \(\text{coef} = 0.02\), the 65-year-old has \(\exp((65-50) \times \text{coef}) = \exp(15 \times 0.02) = 1.35\) times the risk of transition of the 50-year-old.
  • Negative coefficient -> instantaneous risk decreases with each unit of the covariate. For example, if we compare two patients with the same characteristics, one 55 years old and the other 60 years old, and the estimated coefficient is \(\text{coef} = -0.04\), the 60-year-old has \(\exp((60-55) \times \text{coef}) = \exp(5 \times (-0.04)) = 0.82\) times the risk of transition of the 55-year-old or, equivalently, the 55-year-old has \(1/\exp((60-55) \times \text{coef}) = 1/0.82 = 1.22\) times the risk of transition of the 60-year-old.
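The arithmetic in these examples can be reproduced with a few lines of Python. The helper name hazard_ratio is hypothetical; it simply exponentiates the coefficient multiplied by the difference in the covariate.

```python
import math


def hazard_ratio(coef, difference=1.0):
    """Hazard ratio for a given difference in the covariate.

    For a factor, `difference` is 1 (category versus reference category);
    for a numeric covariate it is the difference between the two values.
    """
    return math.exp(coef * difference)


print(round(hazard_ratio(0.56), 2))               # 1.75 (male vs. female)
print(round(hazard_ratio(-0.1), 2))               # 0.9  (male vs. female)
print(round(1 / hazard_ratio(-0.1), 2))           # 1.11 (female vs. male)
print(round(hazard_ratio(0.02, 65 - 50), 2))      # 1.35 (65 vs. 50 years old)
print(round(hazard_ratio(-0.04, 60 - 55), 2))     # 0.82 (60 vs. 55 years old)
print(round(1 / hazard_ratio(-0.04, 60 - 55), 2)) # 1.22 (55 vs. 60 years old)
```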

Table 2:

The log likelihood of a model is used to compare the fitting of different models. We assume that the model with the higher log likelihood provides a better fit of the data.

The AIC of a model is used to compare different models. The one with a lower AIC is considered to be better than the other.
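The AIC combines the log likelihood with a penalty for the number of estimated coefficients: \(\text{AIC} = 2k - 2\log L\), where \(k\) is the number of coefficients. The Python sketch below uses made-up log likelihood values (they do not come from the app) to show that a model with a higher log likelihood can still have a worse AIC when it needs more coefficients.

```python
def aic(log_likelihood, n_coefficients):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_coefficients - 2 * log_likelihood


# Hypothetical fitted models (illustrative values only).
model_a = {"loglik": -1250.4, "k": 3}
model_b = {"loglik": -1248.9, "k": 6}

aic_a = aic(model_a["loglik"], model_a["k"])
aic_b = aic(model_b["loglik"], model_b["k"])
print(round(aic_a, 1))  # 2506.8
print(round(aic_b, 1))  # 2509.8
print("preferred by AIC:", "A" if aic_a < aic_b else "B")
```

Here model B fits slightly better (higher log likelihood), but model A is preferred by the AIC because the gain in fit does not compensate for the three additional coefficients.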

Table 3:

The three tests in this table assess whether the coefficients of the model can be assumed to be different from 0. The test column shows the value of the test statistic, the df column the number of estimated coefficients, and the p-value column the corresponding p-value. It is important to note that if \(\text{p-value} < 10^{-6}\), the app displays \(\text{p-value} = 0\).

Logarithmic score:

The logarithmic score has no direct interpretation on its own, but to choose the model with the better predictive performance, we choose the one with the lower logarithmic score.

References

  1. de Wreede LC, Fiocco M, and Putter H (2011). mstate: An R Package for the Analysis of Competing Risks and Multi-State Models. Journal of Statistical Software, Volume 38, Issue 7.

  2. Meira-Machado L, de Uña-Álvarez J, Cadarso-Suárez C, Andersen PK (2009). Multi-state models for the analysis of time-to-event data. Statistical Methods in Medical Research, Volume 18, Issue 2.

Forest plot

Forest plot

Inputs

The user needs to choose two things related to the graph:

  • The transition that the user wants to analyze.
  • The units to take into account in the case of the numerical covariates.

Outputs

A forest plot is returned representing the estimated hazard ratios and their confidence intervals for the covariates taken into account in the selected transition.

As these plots represent the estimated coefficients, no graph appears in this section if no covariate is selected.

Interpretation

The forest plot shows the hazard ratios and their confidence intervals for the covariates included in the selected transition. If the covariate is categorical, the hazard ratios compare each category with the reference category of that covariate; if the covariate is numeric, the hazard ratios are computed for an increment of one unit in that covariate. As a one-unit increment might not be easy to interpret, the app lets the user enter the number of units to be used in the graph.

Based on this graph, we say that a covariate has a significant effect on the transition if the confidence interval of its hazard ratio does not include 1 (equivalently, the confidence interval of its coefficient does not include 0), and that it does not have a significant effect otherwise.
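The effect of the units input can be illustrated with a short Python sketch. It assumes a Wald-type 95% confidence interval built from the coefficient and its standard error, which may not be exactly how the app computes the interval; the function names and the standard error value are hypothetical. On the hazard ratio scale the value that corresponds to no effect is 1, which is 0 on the coefficient scale.

```python
import math


def scaled_hazard_ratio(coef, se, units=1.0, z=1.96):
    """Hazard ratio and 95% CI for an increment of `units` (> 0) in a covariate."""
    hr = math.exp(units * coef)
    lower = math.exp(units * (coef - z * se))
    upper = math.exp(units * (coef + z * se))
    return hr, lower, upper


def has_effect(lower, upper):
    """True if the confidence interval of the hazard ratio excludes 1."""
    return not (lower <= 1.0 <= upper)


# Hypothetical age coefficient and standard error (illustrative values).
coef, se = 0.02, 0.005
hr_1, lo_1, up_1 = scaled_hazard_ratio(coef, se, units=1)
hr_10, lo_10, up_10 = scaled_hazard_ratio(coef, se, units=10)
print(round(hr_1, 2), round(hr_10, 2))  # 1.02 1.22
print(has_effect(lo_10, up_10))         # True
```

Changing the units rescales the hazard ratio and its interval, but it does not change whether the interval includes 1.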

References

Mody A, Lyons PG, Vazquez Guillamet C, et al (2020). The Clinical Course of Coronavirus Disease 2019 in a US Hospital System: A Multistate Analysis. American Journal of Epidemiology, Volume 190, No. 4.

Validation

Validation: Linearity

The user can analyze whether the assumptions made when fitting the Cox model hold.

Inputs

The user needs to select:

  • The transition to analyze.
  • The covariate to analyze.

Outputs

A graph of the martingale residuals for the selected covariate and transition is returned.

As this plot represents the residuals for a specific covariate, no graph appears in this section if no covariate is selected.

Interpretation

The martingale residuals help determine the best transformation of a covariate, so that it optimally explains the time until an individual passes through a certain transition. To find the best transformation for the covariate \(Z_q\) in the transition \(k \rightarrow l\), the martingale residuals from a Cox model adjusted for the other \(p-1\) covariates need to be computed. Then, the martingale residuals are plotted against the values of the covariate \(Z_{i,q}\), together with a smoothed curve that follows the trend of the points along the x-axis. If the smoothed curve is reasonably linear, the covariate \(Z_q\) does not require any further transformation in the transition \(k \rightarrow l\).

Validation: Influential observations

The user can analyze whether the assumptions made when fitting the Cox model hold.

Inputs

The user needs to choose the transition to analyze.

Outputs

The plots of the dfbetas residuals for the selected transition are returned.

As these plots represent the residuals for the different covariates, no graph appears in this section if no covariate is selected.

Interpretation

We plot the dfbetas residuals versus \(Z_{i,q}\) to determine the influence of individual \(i\) on the estimation of the coefficients of the transition \(k \rightarrow l\). That is, these residuals represent the difference between the estimator obtained when fitting the Cox model for the transition \(k \rightarrow l\) with all the individuals, \(\hat{\boldsymbol{\beta}}\), and the estimator from the model fitted without individual \(i\), \(\hat{\boldsymbol{\beta}}_{(i)}\). So, individuals that lie far away from the others have a higher influence on the model estimates. Ideally, all the points appear in roughly the same area.

Note that these dfbetas residuals are standardized, so their values typically lie in \([-1,1]\).

Validation: Proportional hazards assumption

The user can analyze whether the assumptions made when fitting the Cox model hold.

Inputs

The user needs to choose the transition to analyze.

Outputs

The plots of the Schoenfeld residuals for the selected transition are returned.

As these plots represent the residuals for the different covariates, no graph appears in this section if no covariate is selected.

Interpretation

The Schoenfeld residuals measure the difference between the observed and expected values of the covariate \(Z_q\) at each transition time between states \(k\) and \(l\).

The Schoenfeld residuals of each individual are plotted together with a smoothed curve of the points, and a horizontal line at 0 is added. If the confidence interval of the smoothed curve covers the 0 line, the proportional hazards assumption can be accepted.

References

Rizopoulos, D. (2018). Biostatistical Methods II: Classical Regression Models (EP03) Survival Analysis. Course material.

Model comparison

Model comparison

Inputs

The user can save and upload information of the sessions using the save and load buttons.

Outputs

A table is returned showing the information of all the fitted models.

Optionally, the user can save this information for use in another session by clicking the save button. The user needs to keep the session id that is shown in order to load the information in another session.

To load the information from another session, the user needs to enter the session id and click the load button.

Predictions

Predictions

The user can make predictions for one or two new individuals.

Predictions

Inputs

The inputs can be divided in three blocks:

  • Characteristics of the patient(s): different aspects of the new patient are specified, such as the initial state or the values of the selected covariates.
  • Prediction time: the time at which the prediction is to be made.
  • Graphical representation: the user can choose between a transition probability plot and a cumulative hazard plot. Furthermore, if the transition probability plot is selected, the user can choose between non-stacked and stacked plots.

Outputs

Two groups of outputs are returned:

  • Some boxes with the predicted probability of being in each state after the selected time for the new patient.
  • A transition probability plot for the new patient.

As the predictions are made based on the characteristics of a new patient, if no covariate is selected there won’t be any output in this section.

Interpretation

The first group of outputs gives the probabilities of being in each state. These values give a better idea of the expected condition of the new individual after the selected time period.

The second group of outputs is the transition probability plot. This plot can be displayed in a stacked or non-stacked way, but both versions give the same information: the probability of the new patient being in each state over time. In the non-stacked plot, the curves show these probabilities directly, whereas in the stacked plot the probabilities are given by the height of each colored band.
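The relation between the two representations can be sketched in Python. The function name stacked_bands and the probabilities are hypothetical (they are not predictions from the app): in a stacked plot each state is drawn as a band whose height is its probability, placed on top of the bands of the previous states.

```python
from itertools import accumulate


def stacked_bands(probabilities):
    """Return {state: (bottom, top)} for the bands of a stacked plot.

    `probabilities` maps each state to its probability at one time point;
    the probabilities are expected to sum to 1.
    """
    tops = list(accumulate(probabilities.values()))
    bottoms = [0.0] + tops[:-1]
    return {state: (b, t) for state, b, t in zip(probabilities, bottoms, tops)}


# Hypothetical state probabilities for a new patient at one time point.
probs = {"pneum": 0.1, "reco": 0.2, "dcharg": 0.6, "death": 0.1}
bands = stacked_bands(probs)
for state, (bottom, top) in bands.items():
    print(state, round(bottom, 2), round(top, 2), "height:", round(top - bottom, 2))
```

The non-stacked plot draws the values in probs directly, while the stacked plot draws the band limits. The height of each band recovers the same probability.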

References

de Wreede LC, Fiocco M, and Putter H (2011). mstate: An R Package for the Analysis of Competing Risks and Multi-State Models. Journal of Statistical Software, Volume 38, Issue 7.