Progress of chronic kidney disease and associated predictors among patients under treatment at Gambi and Felege-Hiwote hospitals
Study area, population, and design
The current study was conducted in Bahir Dar, the capital city of the Amhara region. Bahir Dar City is located 585 km from Addis Ababa, the capital city of Ethiopia, on the shore of Lake Tana, the source of the Blue Nile river. A hospital-based retrospective study was conducted among CKD outpatients attending Felege Hiwot Referral Hospital (FHRH) and Gambi Teaching Hospital (GTH), Ethiopia, between September 2017 and January 2021. Both are well-known referral hospitals with a unit dedicated to treating patients in the normal range (> 60 ml/min/1.73 m2) and in the CKD range (15–60 ml/min/1.73 m2).
Sampling procedures and sample size determination
During the study period, about 1723 CKD patients (950 at Felege Hiwot and the remaining 773 at Gambi) attended the two hospitals. In the sampling procedure, the total sample to be drawn from the two hospitals was first determined using Cochran’s formula. Cochran (1977) developed a formula to calculate a representative sample for proportions as19:
$$\:n\:=\frac{{z}^{2}pq}{{e}^{2}}$$
where \(n\) is the sample size, z is the critical value for the desired confidence level, p is the estimated proportion of an attribute present in the population, q = 1 − p, and e is the desired level of precision.
Suppose we want to calculate a sample size whose degree of variability is not known. Assuming the maximum variability, which is equal to 50% ( p = 0.5) and taking 95% confidence level with ± 5% precision, the calculation for required sample size will be as follows20; p = 0.5 and hence q = 1-0.5 = 0.5; e = 0.05; z = 1.96. So that \(\:n\) \(\:\approx\:\) 343.
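The arithmetic of Cochran's formula can be sketched in a few lines of Python. This is a minimal illustration under the inputs stated above (z = 1.96, p = 0.5, e = 0.05); the function name `cochran_sample_size` is hypothetical. With these inputs the unadjusted formula evaluates to roughly 384, so the figure of 343 used in the rest of this section is taken as reported; any adjustment behind it is not stated here.

```python
import math


def cochran_sample_size(z: float, p: float, e: float) -> float:
    """Cochran's formula for a proportion: n = z^2 * p * q / e^2, with q = 1 - p."""
    q = 1.0 - p
    return (z ** 2) * p * q / (e ** 2)


# Inputs as stated in the text: 95% confidence, maximum variability, 5% precision.
n0 = cochran_sample_size(z=1.96, p=0.5, e=0.05)
print(round(n0, 2))    # unadjusted sample size
print(math.ceil(n0))   # rounded up to a whole number of patients
```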
After the total sample size was determined, proportional allocation was used to draw proportional samples from the two hospitals. The sample size for each hospital was computed using stratified random sampling, as indicated below:
\(n_i = \frac{N_i}{N} \times n\), where i = 1, 2, \(N_i\) is the population size in the ith stratum/group, \(N_1 + N_2 = N = 1723\), and \(n_i\) is the sample size in the ith stratum/group21. Hence, the sample size at Felege-Hiwot hospital was \(n_1 = \frac{950}{1723} \times 343 \approx 189\). The remaining 154 samples were selected at Gambi teaching hospital, as computed with the same formula. After the sample size for each hospital was determined, patient charts were arranged by chart number and selected by systematic random sampling, with sampling intervals of \(\frac{950}{189} \approx 5\) for Felege Hiwot and \(\frac{773}{154} \approx 5\) for Gambi. In total, 189 charts from Felege-Hiwot and 154 charts from Gambi teaching hospital were selected using systematic random sampling with an equal interval of 5.
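The allocation and the systematic selection can be reproduced with the following Python sketch. It assumes the hospital totals (950 and 773) and the overall sample size (343) given in the text; the function names, the random seed, and the use of chart positions 1..N in place of real chart numbers are illustrative assumptions.

```python
import random


def proportional_allocation(strata_sizes, total_sample):
    """Allocate the total sample to strata in proportion to stratum size."""
    total_population = sum(strata_sizes.values())
    return {
        name: round(size / total_population * total_sample)
        for name, size in strata_sizes.items()
    }


def systematic_positions(population_size, sample_size, rng):
    """Pick every k-th chart position after a random start within the first interval."""
    interval = population_size // sample_size
    start = rng.randint(1, interval)
    return [start + i * interval for i in range(sample_size)]


hospital_sizes = {"Felege Hiwot": 950, "Gambi": 773}
allocation = proportional_allocation(hospital_sizes, total_sample=343)

rng = random.Random(2021)  # fixed seed only so the sketch is repeatable
selected = {
    name: systematic_positions(hospital_sizes[name], n, rng)
    for name, n in allocation.items()
}
print(allocation)
```

Rounding each stratum independently happens to sum to 343 here; with other totals the rounded allocations may need a small adjustment to match the target.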
Inclusion criteria: CKD patients who had at least two visits at either of the two hospitals mentioned above were considered potential candidates for inclusion in the current study.
Variables under study
Response variable
The response variable in the current study was the status of CKD. CKD is evaluated using two simple tests: a blood test known as the estimated glomerular filtration rate (eGFR) and a urine test known as the urine albumin-creatinine ratio (uACR); both are needed for a clear picture of kidney health. In the current study, the estimated glomerular filtration rate (GFR) was classified into three categories: “normal range (if GFR > 60)”, “CKD range (if GFR 15–60)” and “end-stage (if GFR < 15 ml/min/1.73 m2)”22. Patients in the third category (end-stage) required kidney transplantation or hemodialysis, and data for such patients were not available during data collection for this study. Hence, only the first two categories (normal and CKD ranges) were considered in this study. The response variable was therefore coded as 1 for “normal range” and 0 for “CKD range”23. Among the different formulas available for calculating CKD status, the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation was used in this study24.
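The coding of the response can be sketched as follows. This is an illustration of the thresholds stated above, not the study's own code; the function names are hypothetical, and the handling of values exactly equal to 60 or 15 (assigned here to the CKD range) is an assumption, since the text does not specify it.

```python
def classify_egfr(egfr):
    """Map an eGFR value (ml/min/1.73 m2) to one of the three categories in the text."""
    if egfr > 60:
        return "normal range"
    if egfr >= 15:  # assumption: boundary values 15 and 60 fall in the CKD range
        return "CKD range"
    return "end-stage"


def code_response(egfr):
    """Return 1 for normal range, 0 for CKD range, None for end-stage (not analysed)."""
    category = classify_egfr(egfr)
    if category == "end-stage":
        return None
    return 1 if category == "normal range" else 0


example_values = [75.0, 42.5, 60.0, 9.0]
coded = [code_response(v) for v in example_values]
print(coded)
```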
Predictor variables
The predictor variables in this study were: age (in years), sex (male, female), residential area (urban, rural), use of salt (yes, no), hypertension (HTN) (yes, no), diabetes mellitus type-I (DM-I) (yes, no), diabetes mellitus type-II (DM-II) (yes, no), serum creatinine (SCr) in mg/dl, blood urea nitrogen (BUN) in mg/dl, hematocrit (HCT) in mg/dl, and urinary protein (Up) in mg/dl. The categories of the predictor variables were based on previous studies25.
Statistical analysis
A first-order Markov chain estimation technique models a stochastic process where the probability of transitioning to the next state depends only on the current state. The process is typically represented by a transition matrix, where each cell (Pij) indicates the probability of transitioning from state i to state j.
Markov chains are widely used to model practical problems. They are also effective for modeling repeated measures on the same individual or patient. In this study, a first-order Markov chain model was used to analyze and predict the status of CKD. Results from previous studies show that the Markov chain model performs well in predicting repeated measures26.
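As an illustration of how a first-order transition matrix can be estimated from repeated binary outcomes, the Python sketch below counts transitions between consecutive visits and normalises each row. The visit sequences are made-up toy data (1 = normal range, 0 = CKD range) and the function name is hypothetical; the covariate-adjusted models used in this study are more elaborate than this simple count-based estimate.

```python
from collections import Counter


def estimate_transition_matrix(sequences, states=(0, 1)):
    """Estimate P[i][j] = P(next state = j | current state = i) from observed sequences."""
    counts = Counter()
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[(current, nxt)] += 1

    matrix = {}
    for i in states:
        row_total = sum(counts[(i, j)] for j in states)
        matrix[i] = {
            j: (counts[(i, j)] / row_total if row_total else 0.0) for j in states
        }
    return matrix


# Toy data: one list of visit outcomes per patient.
visits = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1]]
P = estimate_transition_matrix(visits)
for state, row in P.items():
    print(state, row)
```

Each row of the estimated matrix sums to one, which is a quick sanity check on the counts.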
The chain of successive events is called a Markov process, which is discrete when events occur at fixed times. A discrete Markov process applies well when the failure rate λi of a component i is constant and Pi(t) denotes the probability that the component is functioning reliably at time t. For a small time increment Δt, the probability that component i is still reliable changes as:
\(P_i(t+\Delta t) = P_i(t)\,e^{-\lambda_i \Delta t} \approx P_i(t)\,(1 - \lambda_i \Delta t).\)
If all λi are the same, the process is called homogeneous; for a system this is usually not the case, and the process is then called semi-Markov. The equation describing the probabilities of the system states P of a homogeneous process as a function of time t is:
\(P_j(t+\Delta t) = \sum\nolimits_{k \ne j} P_k(t)\,\lambda_{kj}\,\Delta t + P_j(t)\left(1 - \sum\nolimits_{k \ne j} \lambda_{jk}\,\Delta t\right),\)
in which λkj is the rate of transition of the system from state k to state j, and λjk is the rate of transition from state j back to state k.
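One time step of this update can be sketched in Python: the new probability of state j is the inflow from all other states k (their probability times λkj Δt) plus the probability of already being in j and not leaving (one minus the summed outflow rates times Δt). The two-state rates and the step size below are arbitrary illustrative values, not estimates from this study.

```python
def step(probabilities, rates, dt):
    """Advance state probabilities by one small time step dt.

    rates[k][j] is the transition rate from state k to state j (diagonal ignored).
    """
    n = len(probabilities)
    updated = []
    for j in range(n):
        inflow = sum(
            probabilities[k] * rates[k][j] * dt for k in range(n) if k != j
        )
        outflow = sum(rates[j][k] * dt for k in range(n) if k != j)
        updated.append(inflow + probabilities[j] * (1.0 - outflow))
    return updated


# Illustrative two-state system: start in state 0 with certainty.
p0 = [1.0, 0.0]
rates = [[0.0, 0.2], [0.1, 0.0]]
p1 = step(p0, rates, dt=0.5)
print(p1)
```

The approximation is only reasonable when every rate times Δt is small, so that the stay-probability term remains between 0 and 1.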
Random effects
Individual random effects \(b_t \sim N(0, \sigma^2)\) were incorporated as an additional term in the linear predictor of the regression model, leading to a regression model with a random intercept27.
Intercept-only model: In the current study, the average probability of normal status among CKD outpatients was assumed to be constant from one visit to another. The model in this regard is written as:
$$\operatorname{logit}\left(\pi_{it}\right)=\log\left(\frac{\pi_{it}}{1-\pi_{it}}\right)=\log\left(\frac{p\left(y=1\mid x_{it},\beta\right)}{1-p\left(y=1\mid x_{it},\beta\right)}\right)=\beta_{0}+b_{t0},$$
where logit(πit) is the log odds of having normal status, πit = \(p(Y_{it}=1)\) is the probability of having normal status in outpatients, and β0 and bt0 are the fixed and random intercepts, respectively.
Random intercept model: The random intercept model allows for subject-specific shifts in the average chance of being classified as normal. With the variance of the random intercept assumed to be constant, the model is written as follows:
$$\operatorname{logit}\left(\pi_{it}\right)=\log\left(\frac{\pi_{it}}{1-\pi_{it}}\right)=\log\left(\frac{p\left(y=1\mid x_{it},\beta\right)}{1-p\left(y=1\mid x_{it},\beta\right)}\right)=\sum_{i=0}^{k}\beta_{i}x_{it}+b_{t0},$$
where \(\operatorname{logit}\left(\pi_{it}\right)\) is the log odds of normal status, \(x_{it}\) is the design matrix of fixed-effect variables, \(\beta\) is the vector of regression coefficients, and \(b_{t0}\) is the random intercept.
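To make the role of the random intercept concrete, the Python sketch below converts a linear predictor into a probability of normal status with the inverse logit. The coefficient values, the single covariate (age), and the random-intercept values are hypothetical numbers chosen for illustration; they are not estimates from this study.

```python
import math


def inverse_logit(eta):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def normal_status_probability(x, beta, random_intercept=0.0):
    """P(Y = 1) for covariates x (first entry 1 for the intercept),
    fixed effects beta, and a subject-specific random intercept."""
    eta = sum(b * v for b, v in zip(beta, x)) + random_intercept
    return inverse_logit(eta)


beta = [0.5, -0.03]   # hypothetical intercept and age coefficient
x = [1.0, 50.0]       # intercept term and age = 50 years

p_average = normal_status_probability(x, beta)                       # b = 0
p_shifted = normal_status_probability(x, beta, random_intercept=1.0)  # b = 1
print(round(p_average, 4), round(p_shifted, 4))
```

Two patients with identical covariates can therefore have different predicted probabilities, which is the between-subject heterogeneity the random intercept is meant to capture.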
Handling missing observations
There are three missing data mechanisms28. The first is missing completely at random (MCAR), in which missingness is independent of both the unobserved and the observed values of the variable of interest. The second is missing at random (MAR), which occurs when missingness depends only on the observed values of the dependent variable and is independent of the unobserved values of the same variable. The third is missing not at random (MNAR), which is neither MCAR nor MAR28.
Under the MCAR mechanism, the probability of an observation being missing is independent of the responses. Therefore, the probability density of \(k_i\) is29:
\(f(k_i \mid y_i, w_i, b) = f(k_i \mid w_i, b)\)
Under the MAR mechanism, the probability of data being missing is conditionally independent of the unobserved data. That is30,
\(f(k_i \mid y_i, w_i, b) = f(k_i \mid y_i^{o}, w_i, b)\)
Therefore, the distribution of the observed data (\(y_{i}^{o}\) ) can be partitioned as:
\(f(y_i^{o}, k_i \mid \mathbf{X}_i, \mathbf{Z}_i, w_i, \theta, \mathbf{b}) = f(y_i^{o} \mid \mathbf{X}_i, \mathbf{Z}_i, \theta)\, f(k_i \mid y_i^{o}, w_i, \mathbf{b})\)
where \(\mathbf{X}_i\), \(\mathbf{Z}_i\) and \(\mathbf{W}_i\) are design matrices for the fixed effects, the random effects and the missing data process, respectively; \(\theta\) and \(\mathbf{b}\) are vectors that parameterize the joint distribution. The vector \(\mathbf{b}\) describes the measurement and missingness processes.
Because individuals with missing values (\(y_i^{m}\)) are excluded from the analysis, the complete-case analysis provides unbiased estimates in this setting, and likelihood-based methods yield valid estimates31.
In the MNAR case, neither MCAR nor MAR holds true and the probability of a measurement being missing depends on unobserved outcomes32. The joint distribution of measurements and the missingness process is written as33:
\(f(y_i^{o}, k_i \mid \mathbf{X}_i, \mathbf{Z}_i, \mathbf{w}_i, \mathbf{b}) = \int f(\mathbf{y}_i \mid \mathbf{X}_i, \mathbf{Z}_i, \mathbf{w}_i)\, f(k_i \mid \mathbf{y}_i, \mathbf{b})\, d\mathbf{y}_i^{m}\), and this joint distribution cannot be simplified further.
Check for missing completely at random (MCAR) in longitudinal data analysis
If the missing data are MCAR, the means of the two data sets obtained by splitting the data on the variable \(k_{ij}\) will not differ34.
The logistic regression model can also be used to check the MCAR assumption35. Let,
\(p_{ij} = P(k_{ij} = 0 \mid y_{i(j-1)})\), then the logistic regression model that can be fitted to check the MCAR assumption is:
\(\begin{gathered} \ln \left( \frac{p_{ij}}{1 - p_{ij}} \right) = \beta_0 + \beta_1 y_{i(j-1)} + \mathbf{X}_i^{T}\beta_2 + \left( y_{i(j-1)} \times \mathbf{X}_i^{T} \right)\beta_3 + \varepsilon_{ij} \hfill \\ \text{with}\; k_{ij} = \left\{ \begin{array}{*{20}{l}} 1 & \text{if}\; y_{ij}\; \text{is observed} \\ 0 & \text{otherwise} \end{array} \right. \hfill \\ \end{gathered}\)
where \(\beta_2\) and \(\beta_3\) are vectors of regression coefficients associated with the covariates \(\mathbf{X}_i^{T}\) and with the interaction of \(\mathbf{X}_i^{T}\) with \(y_{i(j-1)}\), respectively. Under MCAR, \(\beta_1 = 0\) and \(\beta_3 = 0\). This indicates that dropouts are random and independent of the response.
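A simplified stand-in for this check, without covariates or interactions, is to cross-tabulate the previous outcome against whether the next visit was observed and compute a Pearson chi-square statistic with one degree of freedom. The Python sketch below does this on made-up data; it is not the logistic regression fitted in this study, and the function name and data are hypothetical. The p-value uses the identity that a chi-square variable with one degree of freedom is the square of a standard normal.

```python
import math


def missingness_chi_square(previous_outcome, observed_next):
    """Pearson chi-square (1 df) for association between the previous binary
    outcome and the indicator that the next visit was observed (1) or missing (0)."""
    table = [[0, 0], [0, 0]]
    for y_prev, k in zip(previous_outcome, observed_next):
        table[y_prev][k] += 1

    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (table[i][j] - expected) ** 2 / expected

    p_value = math.erfc(math.sqrt(statistic / 2.0))  # chi-square survival, 1 df
    return statistic, p_value


# Toy data: 10 patients with previous outcome 0 and 10 with previous outcome 1.
previous = [0] * 10 + [1] * 10
observed = [1] * 5 + [0] * 5 + [1] * 9 + [0] * 1
stat, p_value = missingness_chi_square(previous, observed)
print(round(stat, 4), round(p_value, 4))
```

A small statistic with a large p-value would be consistent with missingness that does not depend on the previous outcome; the full model in the text additionally adjusts for covariates and their interactions.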
To assess the trend of missingness in the longitudinal trajectory of our data, two-way interactions of the previous result with the other covariates were included in the model one at a time, and their significance was tested36.
Hence, a logistic regression was conducted to assess whether missingness was affected by previous results; it indicated that dropout was independent of previous outcomes (\(\chi^2_1\) = 0.2018, p = 0.864). Dropout was therefore not related to patients' previous visits, and the dropout pattern was taken to be missing completely at random (MCAR)37. Missing observations were handled using multiple imputation, with 20 imputations of the variables involved in the model38.
Model selection
Alternative models for the same dataset were obtained by varying the dependency between two successive visits of each individual. The models were: a null model (Null), a full model with independence of visits (ind), an independence model with random intercept (indR), a first-order Markov chain (MC1), and a first-order Markov chain with random intercept (MC1R). Receiver operating characteristic (ROC) curves were used to identify the best-fitting model among these dependence structures: the model with the highest AUC (area under the ROC curve) was considered the best fit. The data were coded and cleaned using SPSS version 26 and analyzed in R version 4.1.1 with the “bild” package27. Statistical significance was assessed at the 5% level.
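The AUC-based comparison can be illustrated with a short Python sketch. The AUC is computed here as the proportion of (normal, CKD-range) pairs in which the normal-range patient receives the higher predicted probability, with ties counted as one half. The labels and predicted probabilities are made-up values, and attaching them to the model names "ind" and "MC1R" is purely illustrative.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [1, 1, 0, 0, 1, 0]  # 1 = normal range, 0 = CKD range (toy data)
predicted = {
    "ind": [0.9, 0.4, 0.3, 0.6, 0.7, 0.5],   # hypothetical predictions
    "MC1R": [0.9, 0.8, 0.3, 0.4, 0.6, 0.5],  # hypothetical predictions
}

auc_by_model = {name: auc(labels, scores) for name, scores in predicted.items()}
best_model = max(auc_by_model, key=auc_by_model.get)
print(auc_by_model, best_model)
```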
Model adequacy
Model adequacy in binary classification refers to evaluating how well a model fits the data and makes accurate predictions. Assessing model adequacy is crucial to ensure that inferences drawn from the model are reliable and to avoid misleading conclusions39. In the current study, goodness-of-fit and model adequacy were assessed over the constellation of fitted values determined by the covariate patterns in the model, not the total collection of covariates40. Overall fit was assessed using a combination of the likelihood ratio test and ROC curve analysis41. A previous study also showed that a binary logistic regression model fitted under standard assumptions retains its predictive ability across different aspects42.
Ethics procedures for the current study
For the current study, ethical clearance was obtained from the Office of the Vice President for Research and Community Engagement, Bahir Dar University, Ethiopia, with reference number RCS/1412/2017. All methods were performed in accordance with the relevant guidelines and regulations. The secondary data were obtained under this ethical clearance.
