A comparison of presmoothing methods in the estimation of transition probabilities

Abstract One major goal in clinical applications of multi-state models is the estimation of transition probabilities. Estimators based on subsampling were recently introduced by de Uña-Álvarez and Meira-Machado to estimate these quantities, and their superiority with respect to the competing estimators has been proved in situations in which the Markov condition is violated. The idea behind the proposed estimators is to use a procedure based on (differences between) Kaplan-Meier estimators derived from a subset of the data consisting of all subjects observed to be in a given state at a given time. Subsampling, also referred to as landmarking, leads to small sample sizes and high percentage of censoring for which more efficient estimators are essential. Preliminary smoothing, also known as presmoothing, is a good alternative to these situations. The presmoothed estimators are obtained by replacing the censoring indicator variables in the classical definitions by values of a regression estimator. The behavior of the presmoothed landmark estimators for the transition probabilities is explored through simulation studies when different ways to estimate the conditional probability of uncensoring are used. Real data illustration is included.


Introduction
Multi-state models can be successfully used to model the movement of patients among a set of several states (Andersen et al. 1993;Hougaard 2000;Meira-Machado et al. 2009). These models are illustrated by a direct graph with rectangular boxes to represent possible states and with arrows between the states representing the allowed transitions. States can be transient or absorbing. Multi-state models have been widely used in medicine to investigate the progression of patients through different states of a disease such as cancer and AIDS. They have also been used in other areas of application including epidemiology, clinical trials, reliability studies in engineering, etc. Data in these applications are often subject to right censoring and possibly left truncation.
The simplest multi-state model is called the mortality model for survival data which consists of just two states (usually alive and death) and one transition. The competing risks model (Andersen and Keiding, 2002;Putter, Fiocco, and Geskus 2007) can be seen as an extension of the simple mortality model for survival data in which each individual may 'die' due to any of several causes. A well-known and more complex multi-state model is the illness-death model. This model, also known as the disability model, can be used to study the incidence of the disease and the rate of death. In the irreversible version of this model, individuals may pass from the initial state (e.g., 'alive and disease-free'; State 0), to the intermediate event or disease state (e.g., 'diseased'; State 1) and then to the absorbing state (e.g., 'dead'; State 2). Individuals are at risk of death in each transient state (states 0 and 1). One problem here may be to evaluate whether previously diseased subjects have the same rate of death as those who have been healthy all their lives. Unlike the survival or the competing risks model, the illness-death model is not necessarily Markovian, since the prognosis for an individual in the intermediate state may be influenced by the subject specific arrival time. Figure 1 shows the schematic diagram of transitions involved in the progressive illness-death model.
One important feature of multi-state models is their ability to obtain predictions of the clinical prognosis of a patient at a certain point in their illness process. The transition probabilities are a key quantity of interest since they allow for long-term predictions of the process. In case the process is Markovian, the transition probabilities depend on the current state (not on the history) and can be consistently estimated nonparametrically using the so-called Aalen-Johansen estimator (Aalen and Johansen 1978). However, in many cases, the Markov models do not fit satisfactorily, and fortunately, alternative estimators exist. Estimators which do not rely on the Markov condition were introduced for the first time by Meira-Machado, de Uña-Alvarez, and Cadarso-Su arez (2006). The proposed estimators were defined using multivariate Kaplan-Meier integrals employing time multivariate techniques. These authors showed that the proposed estimators may behave more efficiently than the Aalen-Johansen estimators in case of strong violation of the Markov condition. This work has been recently revisited by de Uña-Alvarez and Meira-Machado (2015) who proposed alternative estimators based on subsampling which are consistent regardless the Markov condition. The idea behind their estimators is to use specific subsamples or portions of the data at hand for which the ordinary Kaplan and Meier (1958) survival function leads to a consistent estimator of the target. The same idea of subsampling, combined with the Aalen-Johansen estimate of the state occupation probabilities derived from the same subsets, was later used by Putter and Spitoni (2018) to introduce new estimators which were termed as landmark Aalen-Johansen. The idea behind the landmark estimators is based on the computation of the socalled occupation probabilities in a subset of individuals that happen to be in a given state at a particular time. Because of this, in some cases the computation of the transition probabilities are based on reduced data and usually in the presence of heavily censored data leading to estimators with higher variability. To avoid this problem, Meira-Machado (2016) propose two approaches that can be used to reduce the variability of the landmark estimator (de Uña-Alvarez and Meira-Machado 2015). One approach is based on spline smoothing while the other is based on a preliminary parametric estimation (presmoothing) of the probability of censoring. The main idea of presmoothing is that each censoring indicator is replaced by a smooth fit of a binary regression of the indicator on observables. The use of presmoothed estimators is a good alternative in these situations, since they give mass to all the event times, including the censored observations. Successful applications of presmoothed estimators include estimation of the survival function (Dikta 1998;Meira-Machado, Sestelo, and Gonçalves 2016), nonparametric curve estimation (Cao and J acome 2004), regression analysis (de Uña-Alvarez and Rodr ıguez-Campos 2004; J acome and Iglesias 2010), estimation of the bivariate distribution of censored gap times (de Uña-Alvarez and Amorim, de Uña-Alvarez, and Meira-Machado 2011), and the estimation of the transition probabilities (Amorim, de Uña-Alvarez, and Meira-Machado 2011). All these references concluded that the presmoothed estimators have improved variance when compared to purely nonparametric estimators.
In this work we review some recent developments in this area, focusing on the presmoothed estimation of the transition probabilities. Specifically, we focus on the choice of the preliminary smoothing function which may be based on a certain parametric family such as the logistic, probit etc., or on a nonparametric regression smoother such as the Nadaraya-Watson kernel estimator (Wand and Jones 1995).
The organization of the paper is as follows. In Sec. 2 we introduce the quantities to estimate as well as the corresponding estimators. The performance of four sets of estimators is investigated through simulations in Sec. 3, while in Sec. 4 the methods are compared through the analysis of medical data from a controlled clinical trial in liver cirrhosis. In Sec. 5 we give a brief overview of the R package developed by the authors. Main conclusions are reported in Sec. 6.

Notation
A multi-state model is a model for a time continuous stochastic process ðYðtÞ, t 2 ½0, 1ÞÞ which at any time occupies one of a few possible states. In this paper we consider the progressive illness-death model depicted in Figure 1, i.e., we consider the state space f0, 1, 2g, right-continuous sample paths and we assume that all subjects are in State 0 at time t ¼ 0, i.e., PðYð0Þ ¼ 0Þ ¼ 1: This model is encountered in many medical studies for describing the progression of patients undergoing a given illness, particularly in cancer studies.
We may define the sojourn time in the initial state, Z ¼ infft : YðtÞ 6 ¼ 0g and the total time of the process T ¼ infft : YðtÞ ¼ 2g: Note that (Z, T) falls on a line with a strictly positive probability, since T ¼ Z for those individuals undergoing a direct transition from State 0 to the absorbing State 2. On the other hand, Z < T indicates that the individual visits the intermediate State 1 at some time. In practice, several issues influence the observation of these two random variables. The most common issue is rightcensoring, which happens when a subject leaves the study before an event occurs, or when the study ends before the event has occurred. Under right-censoring, only the censored versions of Z and T, along with their corresponding censoring indicators, are available. This censoring is modeled by considering a censoring variable C, which we assume to be independent of the process (Z, T). DefineZ ¼ minðZ, CÞ andT ¼ minðT, CÞ for the censored versions of Z and T and introduce D 1 ¼ IðZ CÞ and D ¼ IðT CÞ for the respective censoring indicators of Z and T. The variablesZ andT 12 1 T ÀZ are the observed sojourn times in states 0 and 1, respectively. Finally, the available data are ðZ i ,T i , D 1i , D i Þ, 1 i n, i.i.d. copies of ðZ,T , D 1 , DÞ:

Estimation of the transition probabilities
For two states h, j and two time points s < t, introduce the so-called transition probabilities p hj ðs, tÞ ¼ PðYðtÞ ¼ jjYðsÞ ¼ h, H sÀ Þ, where H sÀ denotes the history of the multistate process up to s. The history of the process will include information such as the states previously visited, transition times, etc. The process is Markovian, if for any s, t with 0 s < t, and h, j, y 2 S, the state space of the process, we have PðYðtÞ ¼ jjYðsÞ ¼ h, YðuÞ ¼ yÞ ¼ PðYðtÞ ¼ jjYðsÞ ¼ hÞ, 0 u < s, and thus, the future of the process after time s depends only on the state occupied at time s, not on the arrival time to that state or on the states previously visited. Similarly, in Markov models, the transition intensities depend on the individual's past history only through the current state.
Traditionally, the Markov condition is verified by modeling particular transition intensities on aspects of the history of the process using a proportional hazard model (Kay 1986). In the progressive illness-death model the Markov condition is only relevant for the transition from the disease state (State 1) to death (State 2). Under this model, we can examine whether the time spent in the initial state (State 0) is important on the transition from the disease state to death. For doing that, let k 12 ðtÞ denote the hazard function of T for those individuals going from State 1 to State 2. Fitting a Cox model k 12 ðtjZÞ ¼ k 12, 0 ðtÞ exp ðbZÞ, where k 12, 0 is the baseline hazard and b a regression parameter, we now need to test the null hypothesis, H 0 : b ¼ 0, against the general alternative, H 1 : b 6 ¼ 0: This would assess if the transition rate from the disease state into death is unaffected by the time spent in the disease state.
In the illness-death model, the five transition probabilities can be expressed as depending on the joint distribution of (Z, T) as follows p 00 ðs, tÞ ¼ P Z > tjZ > s ð Þ , p 01 ðs, tÞ ¼ P Z t, T > tjZ > s ð Þ p 02 ðs, tÞ ¼ P T tjZ > s ð Þ , p 11 ðs, tÞ ¼ P Z t, T > tjZ s, T > s ð Þ p 12 ðs, tÞ ¼ P T tjZ s, T > s ð Þ Recently, de Uña-Alvarez and Meira-Machado (2015) have used the idea of subsampling to introduce (landmark) estimators of the transition probabilities which do not rely on the Markov property. The idea of the new methods is to use a procedure based on (differences between) Kaplan-Meier estimators derived from a subset of the data consisting of all subjects observed to be in a given state at a given time. Following equations above, given the time point s, to estimate p 0j ðs, tÞ for j ¼ 0, 1, 2 the landmark analysis is restricted to the individuals observed in State 0 at time s. For the subpopulation Z > s, the censoring time C is still independent of the pair (Z, T) and, therefore, Kaplan-Meier-based estimation will be consistent. Similarly, to estimate p 1j ðs, tÞ, j ¼ 1, 2, the landmark analysis proceeds from the sample restricted to the individuals observed in State 1 at time s. Then, we may formally introduce the landmark estimators as folloŵ p LM 00 ðs, tÞ ¼Ŝ As can be seen in Andersen et al. (1993), the Kaplan-Meier estimator play an important role for the landmark estimators proposed by de Uña-Alvarez and Meira-Machado (2015). Under our notation, the Kaplan-Meier product-limit estimator of the survival function SðtÞ ¼ PðT > tÞ can be expressed using Kaplan-Meier weights as followŝ is the Kaplan-Meier weight attached toT ðiÞ : Similarly, one could introduce the Kaplan-Meier formula based on the ðZ i , D 1i Þ's for the distribution of Z.
The Kaplan-Meier estimator is a step function with jumps located only at the uncensored observations. In heavily censored data sets the accuracy of their estimation might not be acceptable in particular at the right tail of the distribution. This problem becomes worse when this estimator is applied to subsamples of the complete data. Because of this, the landmark estimators introduced in Andersen et al. (1993) may result in wiggly estimators with fewer jump points and high variability. These problems will be more prominent in some transition probabilities. To handle these difficulties, the use of presmoothed Kaplan-Meier estimators in Andersen et al. (1993) are a good alternative since these estimators will give mass to all the event times, including the censored observations. The premoothed Kaplan-Meier estimators are obtained by replacing the censoring indicator variables in the expression of the Kaplan-Meier weights with some smooth fit before the Kaplan-Meier formula is applied. The presmoothed Kaplan-Meier estimator of the survival function of T is given by i IðT ðiÞ tÞ, and p n ðtÞ stands for an estimator of the binary regression function pðtÞ ¼ PðD ¼ 1jT ¼ tÞ, i.e., the conditional probability that the observation at time t is not censored.
We may now formally introduce the presmoothed landmark estimators as followŝ butions of Z and T, respectively, but computed from the respective subsamples. One useful parametric candidate for the binary regression function p(t) belongs to a parametric family of binary regression curves, such as logit or probit. When the parametric model specified for p(t) is correct the corresponding semiparametric presmoothed estimator is at least as efficient as the original nonparametric Kaplan-Meier estimator. Importantly, the validity of a given model for the presmoothing function can be checked graphically or formally, by applying a goodness-of-fit test such as the test proposed by Hosmer, Lemeshow, and Sturdivant (2013) for the logistic model or the Kolmogorov-Smirnov type version of the model-based bootstrap approach described in Dikta, Kvesic, and Schmidt (2006). This implies that the risk of a misspecified model can be controlled in practice. An alternative and more flexible approach is to model the binary regression function through an additive regression model.
Nonparametric presmoothing (Cao et al. 2005) is useful when there is a clear risk of a miss-specification of the parametric model. In this case, one may use the Nadaraya-Watson kernel estimator for pðÁÞ based on the binary responses D i with covariates T i . The idea is to estimate pðT i Þ by p n ðT i Þ where p n ðtÞ ¼ and K is a known probability density function (the kernel function) and a n is a sequence of bandwidths. As a drawback, nonparametric regression requires the specification of a bandwidth for the computation of the smooth fit p n ðtÞ: As a n decreases the roughness of the resulting estimator will increase. When a n ! 0 then the presmoothed estimator coincide in the limit with the classical estimators. On the other hand, as the bandwidth a n increase oversmoothed estimates will be obtained removing important features of the underlying structure of the survival function to be estimated. Different presmoothed estimators are obtained depending on the choice of the preliminary smoothing function. Simulation studies reported in the next section show that in both cases the presmoothed estimators may be much more efficient than the completely nonparametric estimator, since they often have less variance while providing smoother curves with the expected behavior.

Simulation studies
In this section we report results from two simulation studies, where the aim is to compare the finite sample performance of our presmoothed estimators of the transition probabilities. To be specific, we compare the original landmark estimator of de Uña-Alvarez and Meira-Machado (2015) (labeled as LM) with the semiparametric presmoothed estimator which is obtained through the use of a logistic regression model for the presmoothing function (labeled as PLM) and the presmoothed estimator based on a nonparametric kernel regression model (labeled as NP). For completeness purposes we also included the presmoothed estimator which is obtained through the use of a generalized additive logistic regression model (labeled as GAM). We first simulated data from a scenario used by Amorim, de Uña-Alvarez, and Meira-Machado (2011), which these authors found to be challenging both in terms of bias and variance. To be specific, we separately consider the subjects passing through State 1 at some time and those who directly go to the State 2. For the first subgroup of individuals we generated replicates of ðZ, T À ZÞ according to the bivariate distribution F 12 ðx, yÞ ¼ F 1 ðxÞF 2 ðyÞ 1 þ 1 À F 1 ðxÞ È É 1 À F 2 ðyÞ È É h i with exponential marginal distribution functions with rate parameter 1. For the second subgroup of individuals (Z ¼ T), the value of Z is simulated according to an exponential with rate parameter 1. Random censoring was simulated from uniform distributions U 0, s G ½ for s G equal to 3 and 4. The model with s G ¼ 4 results in 24% of censoring on the first gap time Z, and in 47% of censoring on the second gap time T -Z. The model with s G ¼ 3 increases these censoring levels to 32% and about 57%, respectively. Since we are assuming correlated times for Z and T -Z, the simulated model does not satisfies the Markov property. Details of the simulation procedure can be found in Amorim, de Uña-Alvarez, and Meira-Machado (2011).
To summarize the results we fixed the values of (s, t) for four different points, corresponding to combinations of the percentiles 20%, 40%, 60% and 80% of the exponential marginal distribution functions with rate parameter 1. In each simulation, 1000 samples were generated with sample sizes 50, 100 and 250. As a measure of efficiency, we took the Mean Squared Error (MSE) but we also computed the standard deviations (SD) and the Bias for each point (s, t).
The performance of the empirical transition probabilitiesp  Table 1. The semiparametric presmoothing (labeled as PLM) requires to estimate the binary regression functions, such as p 0n ðtÞ ¼ PðD 1 ¼ 1jZ ¼ tÞ and p 1n ðtÞ ¼ PðD ¼ 1jT ¼ tÞ, to presmooth the Kaplan-Meier estimatorsŜ 0 ðtÞ andŜðtÞ, respectively. After some algebra, it is seen that the function p 0n ðtÞ is written as ðs G À tÞ=ðs G À t þ 1Þ: The binary function p 1n ðtÞ, for those individuals that observe a transition to State 1, is given by s G =ð1 þ gðtÞÞ, where gðtÞ ¼ k G ðtÞ=k T ðtÞ, and where s G ðtÞ ¼ 1=ðs G À tÞ is the hazard rate of the censoring variable and k T ðtÞ ¼ Àe Àt 4 À 2t ð Þþ e À2t 8 þ 4t ð Þ À Á = e Àt 2t À 2 ð Þþ e À2t 3 þ 2t ð Þ À Á is the hazard rate of T under restrictionZ <T: Plots for these functions shown in Figure 2 reveal that they are monotonous and so they can be adequately estimated by logistic regression. In our simulations we have used the logistic regression models p n ðt; bÞ ¼ 1=ð1 þ exp ðb 0 þ b 1 tÞÞ: The b parameter in model p 0n ðÁ; bÞ was estimated by maximizing of the conditional likelihood of the D 1 's givenZ; and via maximization of the conditional likelihood of the D's givenT , in model p 1n ðÁ; bÞ: The presmoothing labeled as GAM was implemented by fitting a generalized additive logistic model through the mgcv R package (Wood 2019). To compute the transition probabilities with a nonparametric presmoothing we have used the plug-in bandwidth selector of Cao et al. (2005) and biweight kernels. As would be expected, results reported in this table reveal that the estimation of the transition probabilities is performed with less accuracy as s and t grow. This behavior was expected since the performance of all methods in lifetime data is usually poorer at the right tail where the censoring effects are stronger. At these points the SD is in most cases larger. The SD decreases with an increase in the sample size and with a decrease of the censoring percentage. All methods obtain in all settings a low bias. It can also be seen that the SD clearly dominates the performance of the proposed estimators in most the cases revealing a clear advantage of the presmoothed estimators when compared with the unsmoothed landmark estimators (labeled as LM). This can be observed by the relative efficiency between the unsmoothed landmark estimator and the three presmoothed estimators that was measured by the ratio between their corresponding MSEs. In almost all cases, the use of presmoothing leads to estimators with less SD and less MSE.
Because of space limitation, Table 1 does not report the results for transition probabilityp 00 ðs, tÞ: However, the behavior of the estimators for this transition can be seen in Figures 3 and 4 through the boxplots of the mean squared errors based on the 1000 Monte Carlo replicates for the four estimators, with different sample sizes and two censoring levels. For completeness purposes we decided to show the plots for three transitions and different fixed values of (s, t). The boxplots shown in these figures reveal some results which are in agreement with our findings reported in Table 1. These plots confirm the less variability of the presmoothed estimators. Results shown in Table 1    and Figures 3 and 4 suggest that the use of presmoothing leads to better results for all transition probabilities while neither one of the three presmoothed estimators (PLM, GAM and NP) seems to be uniformly the best.
While reducing the variance, presmoothing may introduce some bias in estimation. Simulation results reported in Table 1 serve to illustrate this issue too. When the parametric model specified for the presmoothing function is incorrect, the corresponding semiparametric estimator may lose efficiency, providing estimators with a large bias. However, the validity of a given parametric model for presmoothing can be checked by applying a goodness-of-fit test such as the test proposed by Hosmer, Lemeshow, and Sturdivant (2013). We have conducted simulation studies to evaluate the capability of the Hosmer and Lemeshow goodness-of-fit test to detect deviations from the logistic regression model. Results (not reported here) suggest small deviations in this scenario. In most cases the test was not able to reject the logistic model. Though the simulated scenario seem to be favorable to the estimator based on a parametric preliminary smoothing, the estimator based on nonparametric presmoothing is competitive, attaining better results (with less variance and less mean squared errors) in a large number of points.
To assess the effect of a misspecification of the logistic regression model in the estimation of the transition probabilities through the use of parametric presmoothing methods, a second simulation study was performed. In this scenario, data were generated according to the progressive three-state model. This model can be viewed as a particular case of the illness-death model where no transitions are observed on disease-free mortality transition. The vector of gap times ðZ, T À ZÞ is simulated as follows. The first gap time Z is simulated according to a mixture of two lognormal distributions, LogN(0, 3) and LogNð3, 0:2Þ, with equal probability. Given Z ¼ x, the second gap time, T 12 ¼ T À Z, is drawn from a Lognormal distribution with mean logðxÞ=2 and standard deviation equal to 0.2. This scenario does not follow the Markov assumption since the hazard for the second gap time depend on the time to progression to the intermediate state. The censoring time C was independently generated following a uniform distribution U½10, 30: Note that under this scenario, censored observations are expected to be concentrated near the center of the distribution and therefore revealing a misspecification of the fitted logistic regression model.
Results shown in Table 2 reveal the impact of a misspecification of the fitted logistic regression model in the estimation of the transition probabilities, leading to estimators with increased bias, in some points, and higher values for the MSE. In such cases the use of an estimator based on nonparametric presmoothing is preferable.

Example of application
For illustration purposes, we apply the proposed methods of Sec. 2 to data based on a case study presented by the PROVA study group which aimed to evaluate the effect of propranolol and sclerotherapy on risk of first variceal bleeding and survival in patients with liver cirrhosis. The trial included 286 patients in whom cirrhosis has been diagnosed in the past and where endoscopy had shown esophageal varices but who had not yet experienced a transfusion-requiring bleeding from the varices (PROVA Study Group 1991). From the 286 patients enrolled in the trial, 50 had a bleeding episode and among these 29 died. Forty six patients died without experienced variceal bleeding. The rest of the patients (190) remained alive and without variceal bleeding up to the end of the follow-up. The patients were observed for up to 42 months, with an average of 15 months. For each patient, the two event times (time to variceal bleeding and time to death) and the corresponding indicator statuses are recorded. The process can be modeled using a progressive illness-death model with transient states 'alive and without variceal bleeding', 'alive with variceal bleeding', and an absorbing state 'dead'.
The dataset for the PROVA study group is of small size with a high censoring percentage, near to 74%. The time alive and without variceal bleeding, Z, is more often censored in the time interval between 1 and 4 years. We have only 11 individuals that experienced variceal bleeding one year after enrollment in the study. A few (de Uña- Table 2. Bias and standard deviation (SD) for estimators of p ij ðs, tÞ: The relative MSEs are also given. Scenario 2: progressive three-state model.  , uncensored) observations for patients with a survival time greater than one year that died without experiencing variceal bleeding. At 90 days (3 months) we have 260 individuals alive and without variceal bleeding and only 11 individuals alive with variceal bleeding. From those 11 that experiencing variceal bleeding only 3 died leading to 73% of censored observations. Similarly, at 1 year after enrollment, we have 21 individuals alive with variceal bleeding and a few 5 complete observations (76% censoring). This means that it is expected that any estimator for the transition probabilities will behave poorly in these cases. Presmoothing will help to improve the estimation in these situations, in particular in the right tail of the distributions of Z and T (i.e., for larger times for being alive and without variceal bleeding, and also for larger values of the time since variceal bleeding, T -Z).
It is of practical interest to determine whether the Markov condition holds within a particular dataset in order to determine which method is more appropriate to estimate the transition probabilities. The Markov condition may by checked by studying the effect of the time of variceal bleeding on the mortality after bleeding. Results from fitting this covariate in a Cox model reported a significant coefficient (p-value ¼ 0.0003; regression coefficient: 0.0061) for the bleeding time revealing the failure of the Markov condition for the PROVA dataset. These findings are in agreement with those obtained by Andersen, Esbjerg, and Sorensen (2000) and Andersen and Keiding (2002).
In cases like this one, in which there is strong evidence that the process does not verifies the Markov property, the landmark estimators can be preferable due to their greater accuracy. Since the construction of these estimators is based on subsamples of the complete data, this will generally lead to small sample sizes and usually to heavily censored data as shown in Figure 5. This fact is more evident for the individuals with variceal bleeding (observed in State 1) but it is also present for those in the alive and without bleeding state (State 0) for higher landmarking times. To avoid this problem, a valid approach is to consider a modification of the landmark estimator based on presmoothing. Here we present some figures to illustrate the estimators obtained by replacing the censoring indicator variables in the classical definitions by values of several regression estimators.   (2015) whereas the estimator labeled as 'logit' correspond to the semiparametric presmoothed estimator introduced in Sec. 2.2 with a preliminary presmoothing based on a parametric binomial logistic family. For completeness purposes we also included semiparametric presmoothed estimators based on a different binomial family such as 'probit' or 'cauchit'. We also considered estimators with a preliminary presmoothing function based on an additive logistic model (labeled as 'logit.gam'). Finally, the estimator labeled as 'nonparametric' correspond to the presmoothed estimators with a preliminary presmoothing based on a nonparametric regression model using the Nadaraya-Watson kernel estimator. None of these methods require the process to be Markov.
As expected, all estimators report roughly the same estimates for lower values of t whereas some deviations are observed for higher values of t. It can also be seen that the original unsmoothed landmark estimator (de Uña-Alvarez and Meira-Machado 2015) reveals higher variability on the right hand side. Plots for the transition probabilitieŝ p 00 ðs, tÞ (first row) andp 11 ðs, tÞ (last row), s ¼ 90 and 365, report the survival fraction along time, among the individuals 'alive and without variceal bleeding' and among those with a 'bleeding episode'. Results for the estimated curves forp 00 ðs, tÞ, s ¼ 90 and 365, reveal also that the choice of the family function for the regression estimator can lead to major differences in the estimates. Some differences can also be observed in the right tail of the estimated transition probabilitiesp 01 ðs, tÞ (second row) andp 02 ðs, tÞ (third row). Plots forp 02 ðs, tÞ, report one minus the the survival fraction along time, among the individuals alive and without variceal bleeding at time s. Plots forp 01 ðs, tÞ, allow for an inspection along time of the probability of being 'alive with bleeding episode' for the individuals who are without variceal bleeding at time s. Since the 'bleeding' state is transient, these curves are first increasing and then decreasing. Major differences are observed in the right tail when comparing the methods based on a parametric presmoothing with their counterparts ('unsmoothed' and 'nonparametric') for the estimation of the transition probabilities. This can be possibly explained by a misspecification of the parametric model. The logistic model is likely the most common statistical model that is usually taken to apply to a binary dependent variable. The validity of a logistic model for the presmoothing function can be checked by applying a goodness-of-fit test such as the test proposed by Hosmer, Lemeshow, and Sturdivant (2013). Results obtained for all landmarking times (i.e., s) are depicted in Figure 7, revealed that the test was able to reject the logistic model when used to estimate the transition probabilitiesp 0j ð90, tÞ: Note that the choice of this parametric model is a common choice for a parametric presmoothing. However, curves depicted in Figure 6 reveal that the presmoothed landmark estimators with a preliminary presmoothing based on a nonparametric regression provides in all cases curves with the expected behavior, similar to those obtained from the nonparametric original landmark estimators (labeled as 'unsmoothed') but with a better description of the tail behavior. Since there is a clear risk of a misspecification of the parametric model, the use of estimators based on nonparametric presmoothing as those labeled with 'nonparametric' are preferable.

Software development
To provide the biomedical researchers with a comprehensive and easy-to-use tool for obtaining estimates of the transition probabilities we developed an R package (R Core Team 2018) called presmTP (Soutinho, Meira-Machado, and Oliveira 2019). The main function presmTP() can be used to calculate the estimated transition probabilities from the unsmoothed landmark estimators but presmoothed estimates can also be obtained through the use of a parametric family of binary regression curves, such as logit, probit or cauchit. The additive logistic regression model and nonparametric regression are also alternatives implemented in function presmTP(). Examples and details on the usage of the functions that compose the package can be obtained with the corresponding help pages. The package dependencies include the survPresmooth package (L opez-de-Ullibarri and J acome 2013) and the mgcv package (Wood 2019). presmTP is written entirely in the R programing language, and is available at the CRAN repository at https://cran.r-project.org/web/packages/presmTP.

Conclusions
There have been several recent contributions for the estimation of the transition probabilities in the context of multi-state processes that do not satisfy the Markov property. Many of these contributions were made for the illness-death model. Recently, the problem of estimating the transition probabilities in illness-death model has been reviewed, and new (landmark) estimators have been proposed which are built by considering specific subsets of individuals. As a weakness, the proposed estimators may provide large standard errors in estimation.
In this article the relative performance of several presmoothed landmark estimators for the transition probabilities was investigated through simulations. Attained results suggests that the use of presmoothing techniques can lead to competitive estimators that may outperform the original landmark estimators providing estimators with less variability. It is worth mentioning that presmoothing can introduce some bias in estimation while reducing the variance. This bias component is larger when there is some misspecification in the chosen parametric model. The risk of introducing a large bias through a misspecified model can be controlled in practice by testing the validity of the model applying a goodness-of-fit test. Nonparametric presmoothing is useful when there is no parametric candidate for the presmoothing function.
Software in the form of an R package was developed implementing the proposed methods.