Article

Comparison of the Average Kappa Coefficients of Two Binary Diagnostic Tests with Missing Data

1 Department of Statistics, School of Medicine, University of Granada, 18016 Granada, Spain
2 Epidemiology and Public Health Research Unit and URMCD, School of Medicine, University of Nouakchott Alaasriya, Nouakchott BP 880, Mauritania
* Author to whom correspondence should be addressed.
Submission received: 8 October 2021 / Revised: 2 November 2021 / Accepted: 5 November 2021 / Published: 8 November 2021

Abstract

The average kappa coefficient of a binary diagnostic test is a parameter that measures the average beyond-chance agreement between the diagnostic test and the gold standard. This parameter depends on the accuracy of the diagnostic test and also on the disease prevalence. This article studies the comparison of the average kappa coefficients of two binary diagnostic tests when the gold standard is not applied to all individuals in a random sample. In this situation, known as partial disease verification, the disease status of some individuals is missing. Assuming that the missing data mechanism is missing at random, the comparison of the average kappa coefficients is solved by applying two computational methods: the EM algorithm and the SEM algorithm. With the EM algorithm the parameters are estimated, and with the SEM algorithm their variances-covariances are estimated. Simulation experiments have been carried out to study the sizes and powers of the hypothesis tests, showing that the proposed method has good asymptotic behaviour. A function has been written in R to solve the proposed problem, and the results obtained have been applied to the diagnosis of Alzheimer's disease.

1. Introduction

Diagnostic tests are fundamental in the current practice of medicine. A diagnostic test is a medical test that is applied to an individual to determine the presence or absence of a disease [1]. Diagnostic tests can be binary, ordinal or continuous. Binary tests give two possible results: positive or negative. An antigen test for the diagnosis of COVID-19 is an example of a binary diagnostic test. Ordinal tests classify the presence of the disease in different ordinal categories. For example, in the diagnosis of breast cancer, lesions can be classified as "malignant, suspicious, probably benign, benign or normal". With respect to continuous tests, these give rise to continuous values, for example procalcitonin for the diagnosis of infective endocarditis. The efficacy of a diagnostic test is evaluated against a gold standard. A gold standard (GS) is a medical test that objectively determines whether or not an individual has the disease, for example a biopsy for the diagnosis of cancer. This article focuses on binary diagnostic tests.
The fundamental measures to evaluate the effectiveness of a binary diagnostic test (BDT) are sensitivity and specificity. Sensitivity is the probability that the test result is positive when the individual has the disease, and specificity is the probability that the test result is negative when the individual does not have the disease. The sensitivity and specificity of a BDT depend on the physical, chemical or biological bases with which the test has been developed. When evaluating the effectiveness of a BDT considering the losses associated with misclassification with the BDT, the parameter used is the weighted kappa coefficient [1,2]. The weighted kappa coefficient is a parameter that measures the beyond-chance agreement between the BDT and the GS [1,2], and depends on the sensitivity and specificity of the BDT, on the disease prevalence and on the weighting index. The weighting index is a measure of the relative importance between false positives and false negatives. In practice, the weighting index c is set by the clinician depending on the clinical use of the BDT (for example, confirmatory test or screening test) and on the clinician's knowledge of the importance of a false positive and a false negative. If the BDT is to be used as a confirmatory test, then the weighting index takes a value between 0 and 0.5. If the BDT is to be used as a screening test, then the weighting index takes a value between 0.5 and 1. The problem with the weighted kappa coefficient is the assignment of values to the weighting index c, since the clinician does not always have the knowledge needed to decide how important a false positive is compared to a false negative. Even in the same problem, two clinicians can assign different values to the weighting index. Roldán-Nofuentes and Olvera-Porcel [3] have defined and studied a new measure to evaluate the effectiveness of a BDT: the average kappa coefficient. The average kappa coefficient depends only on the intrinsic accuracy (sensitivity and specificity) of the BDT and on the disease prevalence, and is a parameter that does not depend on the weighting index. Therefore, the average kappa coefficient solves the problem of assigning values to the weighting index. The average kappa coefficient is a measure of the average beyond-chance agreement between the BDT and the GS [3].
Comparison of the effectiveness of two BDTs is a topic of special interest in the study of statistical methods for the diagnosis of diseases. The most frequent type of sampling used to compare two BDTs is the paired design, which consists of applying the two BDTs to all individuals in a random sample whose disease status is known by applying a GS. Bloch [4] has studied the comparison of the weighted kappa coefficients of two BDTs under a paired design, and Roldán-Nofuentes and Luna [5] have extended Bloch's study to the situation in which the weighted kappa coefficients of more than two BDTs are compared. Roldán-Nofuentes and Olvera-Porcel [6] have studied the comparison of the average kappa coefficients of two BDTs under a paired design. However, in clinical practice the GS is not always applied to all individuals in the sample. Consequently, the disease status is unknown for a subset of individuals in the sample. This problem is known as partial verification of the disease [7,8]. Zhou [9] has studied a hypothesis test to compare the sensitivities (specificities) of two BDTs in the presence of partial verification, applying the maximum likelihood method. If in this situation the two sensitivities (specificities) are compared after eliminating the individuals whose disease status is unknown, the estimates obtained are biased (the estimators are affected by the so-called verification bias [7]) and the results may be incorrect [9]. Harel and Zhou [10] have compared the sensitivities (specificities) of two BDTs using confidence intervals applying multiple imputation, and Roldán-Nofuentes and Luna [11] have compared the sensitivities (specificities) by applying the EM and the SEM algorithms. Roldán-Nofuentes and Luna [12] have studied a hypothesis test to compare the weighted kappa coefficients of two BDTs in the presence of partial verification of the disease, applying the maximum likelihood method. Regarding the average kappa coefficient, Roldán-Nofuentes and Regad [13] have studied the estimation of this parameter when a single BDT is evaluated in the presence of partial verification, applying the maximum likelihood method and multiple imputation. The comparison of the average kappa coefficients of two BDTs has never been studied in the presence of partial verification. In this situation, if the weighted kappa coefficients are compared after eliminating the individuals not verified with the GS, then the estimators of the weighted kappa coefficients are biased [12], and therefore so are the estimators of the average kappa coefficients, and the conclusions may also be incorrect. Consequently, the method of Roldán-Nofuentes and Olvera-Porcel [6] cannot be applied in the presence of partial verification.
In this article, the comparison of the average kappa coefficients of two BDTs in the presence of partial verification of the disease is studied. Therefore, the objective of our manuscript is to study a hypothesis test to compare the average kappa coefficients of two BDTs in the presence of partial verification, a topic that has never been studied. This article is an extension of the article by Roldán-Nofuentes and Olvera-Porcel [6] to the situation in which the GS is not applied to all the individuals in the sample, and is also an extension of the article by Roldán-Nofuentes and Regad [13] to the situation where two BDTs are compared in the presence of partial verification. The article is structured as follows. In Section 2 the average kappa coefficient and its properties are presented. In Section 3 we study the comparison of the average kappa coefficients of two BDTs in the presence of partial verification of the disease, applying two computational methods: the EM algorithm and the SEM algorithm. In Section 4, a function written in R is presented to solve the problem, and simulation experiments are carried out to study the size and power of the method to solve the hypothesis test for the comparison of the two average kappa coefficients. In Section 5 the results are applied to the diagnosis of Alzheimer's disease, and in Section 6 the results obtained are discussed.

2. Average Kappa Coefficient

Let us consider two BDTs, Test 1 and Test 2, whose performances are compared with respect to the same GS. Let $L$ ($L'$) be the loss that occurs when a BDT gives a negative (positive) result for a diseased (non-diseased) patient. Loss $L$ is associated with a false negative and loss $L'$ is associated with a false positive [1,2]. Losses are assumed to be zero when a BDT correctly classifies a diseased patient or a non-diseased patient [1,2]. For example, let us consider the diagnosis of renal cell carcinoma using the MOC 31. If the MOC 31 is positive for an individual without renal carcinoma (false positive), the individual will undergo a renal biopsy which will be negative. Loss $L'$ is determined by the economic costs of the diagnosis and also by the risk, stress, etc., caused to the individual. If the MOC 31 is negative for an individual with renal carcinoma (false negative), the individual will be diagnosed later, but the cancer will progress and get worse, decreasing the chance that treatment will be successful. Loss $L$ is determined from this situation. Therefore, losses $L$ and $L'$ are measured in terms of economic costs and in terms of risks, stress, etc. [1,2], so in clinical practice it is not possible to know $L$ and $L'$. Let $T$ be the binary random variable that models the result of the BDT, in such a way that $T = 1$ when the result is positive and $T = 0$ when the result is negative. Let $D$ be the binary random variable that models the result of the GS, in such a way that $D = 1$ when the individual has the disease and $D = 0$ when the individual does not have the disease. In Table 1, we show the losses and probabilities associated with the assessment of a BDT in relation to a GS, where $Se$ is the sensitivity, $Sp$ the specificity and $p$ the disease prevalence.
In terms of the losses and probabilities in Table 1, the expected loss [4] is $p(1-Se)L + q(1-Sp)L'$ and the random loss [4] is $p\left[p(1-Se)+qSp\right]L + q\left[pSe+q(1-Sp)\right]L'$, with $q = 1-p$. The expected loss is the loss that occurs when erroneously classifying a diseased or non-diseased individual with the BDT. The expected loss varies between zero and infinity. The random loss is the loss that occurs when the BDT and the GS are independent, i.e., when $P(T = i \mid D = j) = P(T = i)$. In terms of these losses, the weighted kappa coefficient is defined as [1,2,4]
$$\kappa = \frac{\text{Random loss} - \text{Expected loss}}{\text{Random loss} - \min(\text{Expected loss})} = \frac{\text{Random loss} - \text{Expected loss}}{\text{Random loss}},$$
since $\min(\text{Expected loss}) = 0$. Performing algebraic operations, the weighted kappa coefficient is written as [1,2,4]
$$\kappa_h(c) = \frac{pqY_h}{p(1-Q_h)c + qQ_h(1-c)}, \quad 0 \le c \le 1, \quad h = 1,2, \tag{1}$$
where $Y_h = Se_h + Sp_h - 1$ is the Youden index [14] of the $h$th Test, $Q_h = pSe_h + q(1-Sp_h)$ is the probability that the $h$th Test is positive, and $c = L/(L+L')$ is the weighting index. The weighted kappa coefficient of the $h$th Test can also be written as
$$\kappa_h(c) = \frac{\kappa_h(0)\,\kappa_h(1)}{c\,\kappa_h(0) + (1-c)\,\kappa_h(1)}, \quad 0 \le c \le 1,$$
where
$$\kappa_h(0) = \frac{Sp_h - (1-Q_h)}{Q_h} \quad \text{and} \quad \kappa_h(1) = \frac{Se_h - Q_h}{1-Q_h}.$$
As $L$ and $L'$ are unknown, the clinician sets the value of the weighting index based on the relative importance between false positives and false negatives [1,2]. If the clinician considers that false positives are more important than false negatives, as in the situation in which the BDT is used as a confirmatory test prior to the application of a risky treatment (for example a surgical operation), then $L' > L$ and $0 \le c < 0.5$. For example, if a false positive is four times more important than a false negative, then $L' = 4L$ and $c = 1/(1+4) = 1/5$. If the clinician considers that false negatives are more important than false positives, as in the situation in which the BDT is used as a screening test, then $L > L'$ and $0.5 < c \le 1$. For example, if a false negative is three times more important than a false positive, then $L = 3L'$ and $c = 3/(3+1) = 3/4$. The value $c = 0.5$ is used when false positives and false negatives have the same importance, $\kappa(0.5)$ being the Cohen kappa coefficient. The weighted kappa coefficient has the following properties [1,2,4]:
1. If $Se_h = Sp_h = 1$ then $\kappa_h(c) = 1$, and the agreement between the Test and the GS is perfect.
2. If $Se_h = 1 - Sp_h$ then $\kappa_h(c) = 0$, and the Test and the GS are independent.
3. The weighted kappa coefficient is a function of the index $c$ which is increasing if $Q_h > p$, decreasing if $Q_h < p$, and equal to the Youden index $Y_h$ if $Q_h = p$.
The weighted kappa coefficient can be classified on the following scale of values [15]: 0–0.20, slight; 0.21–0.40, fair; 0.41–0.60, moderate; 0.61–0.80, substantial; and 0.81–1, almost perfect. Another scale, based on levels of clinical significance, is [16]: <0.40, poor; 0.40–0.59, fair; 0.60–0.74, good; and 0.75–1, excellent.
Roldán-Nofuentes and Olvera-Porcel [3] have proposed a new measure to evaluate and to compare BDTs: the average kappa coefficient. If $L' > L$, and therefore $0 \le c < 0.5$, the average kappa coefficient of the $h$th Test is [3]
$$\bar{\kappa}_{h1} = \frac{1}{0.5}\int_0^{0.5}\kappa_h(c)\,dc = \begin{cases} \dfrac{2\kappa_h(0)\kappa_h(1)}{\kappa_h(0)-\kappa_h(1)}\ln\!\left(\dfrac{\kappa_h(0)+\kappa_h(1)}{2\kappa_h(1)}\right), & p \ne Q_h \\[2mm] Y_h, & p = Q_h, \end{cases} \tag{2}$$
i.e., the average kappa coefficient is the average value of $\kappa_h(c)$ when $0 \le c < 0.5$. If $L > L'$, and therefore $0.5 < c \le 1$, the average kappa coefficient of the $h$th Test is [3]
$$\bar{\kappa}_{h2} = \frac{1}{0.5}\int_{0.5}^{1}\kappa_h(c)\,dc = \begin{cases} \dfrac{2\kappa_h(0)\kappa_h(1)}{\kappa_h(0)-\kappa_h(1)}\ln\!\left(\dfrac{2\kappa_h(0)}{\kappa_h(0)+\kappa_h(1)}\right), & p \ne Q_h \\[2mm] Y_h, & p = Q_h, \end{cases} \tag{3}$$
i.e., the average kappa coefficient is the average value of $\kappa_h(c)$ when $0.5 < c \le 1$. As the weighted kappa coefficient is a measure of the beyond-chance agreement between a BDT and the GS, the average kappa coefficient is a measure of the average beyond-chance agreement between a BDT and the GS [3], and does not depend on the weighting index $c$. As $\kappa_h(0)$ and $\kappa_h(1)$ depend on $Se_h$, $Sp_h$ and $p$, $\bar{\kappa}_{h1}$ and $\bar{\kappa}_{h2}$ also depend on these same parameters. The values of the average kappa coefficient can be classified on the same scales [15,16] as the values of the weighted kappa coefficient [3]. The average kappa coefficients $\bar{\kappa}_{h1}$ and $\bar{\kappa}_{h2}$ have the following properties [3]:
1. If $Se_h = Sp_h = 1$ then $\bar{\kappa}_{h1} = \bar{\kappa}_{h2} = 1$, and if $Se_h = 1 - Sp_h$ then $\bar{\kappa}_{h1} = \bar{\kappa}_{h2} = 0$. Therefore $0 \le \bar{\kappa}_{hi} \le 1$, $i = 1,2$.
2. $\bar{\kappa}_{h1} > \bar{\kappa}_{h2}$ if $p > Q_h$ and $\bar{\kappa}_{h1} < \bar{\kappa}_{h2}$ if $Q_h > p$.
3. $\bar{\kappa}_{h1}$ minimizes $2\int_0^{0.5}\left(\kappa_h(c)-x\right)^2 dc$ and $\bar{\kappa}_{h2}$ minimizes $2\int_{0.5}^{1}\left(\kappa_h(c)-x\right)^2 dc$. Therefore, when $x = \bar{\kappa}_{h1}$ ($x = \bar{\kappa}_{h2}$) the first (second) expression is the variance of $\kappa_h(c)$ around $\bar{\kappa}_{h1}$ ($\bar{\kappa}_{h2}$).
4. For fixed values of $\kappa_h(0)$ and $\kappa_h(1)$, the weighted kappa coefficient $\kappa_h(c)$ is a continuous function of $c$ in the interval $[0,1]$. Therefore, the average kappa coefficient $\bar{\kappa}_{hi}$ is equal to the value of $\kappa_h(c)$ at some $c$ in $[0,1]$. So, as $\bar{\kappa}_{hi} = \kappa_h(c)$ for some value of $c$, from Equation (1) and for a specific sample it is possible to calculate the value of $c$ associated with the estimate of $\bar{\kappa}_{hi}$. Therefore, the estimation of $\bar{\kappa}_{hi}$ allows estimating how much greater (or smaller) the loss $L$ is than the loss $L'$.
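To make Equations (1)-(3) concrete, the following minimal R sketch (R is the language used for the software in this article; the function and parameter names are illustrative, not part of the article's code) computes the weighted kappa coefficient and the two average kappa coefficients from assumed values of $Se$, $Sp$ and $p$, and checks $\bar{\kappa}_1$ against a numerical integration of Equation (1):

```r
# Weighted kappa (Equation (1)) and average kappas (Equations (2)-(3))
# from assumed Se, Sp and prevalence p; all names are illustrative.
kappa_c <- function(c, Se, Sp, p) {
  q <- 1 - p
  Y <- Se + Sp - 1               # Youden index
  Q <- p * Se + q * (1 - Sp)     # probability of a positive result
  p * q * Y / (p * (1 - Q) * c + q * Q * (1 - c))
}
avg_kappa <- function(Se, Sp, p) {
  q  <- 1 - p
  Q  <- p * Se + q * (1 - Sp)
  k0 <- (Sp - (1 - Q)) / Q       # kappa(0)
  k1 <- (Se - Q) / (1 - Q)       # kappa(1)
  if (isTRUE(all.equal(p, Q))) return(c(Se + Sp - 1, Se + Sp - 1))
  c(2*k0*k1 / (k0 - k1) * log((k0 + k1) / (2*k1)),  # average over [0, 0.5)
    2*k0*k1 / (k0 - k1) * log(2*k0 / (k0 + k1)))    # average over (0.5, 1]
}
avg_kappa(Se = 0.90, Sp = 0.80, p = 0.30)  # approx. 0.569 and 0.723
# Numerical check of the first value via Equation (2):
2 * integrate(kappa_c, 0, 0.5, Se = 0.90, Sp = 0.80, p = 0.30)$value
```

For these assumed values $Q_h > p$, so $\bar{\kappa}_{h1} < \bar{\kappa}_{h2}$, in agreement with property 2 above.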
Next, the comparison of the average kappa coefficients of two BDTs in the presence of partial verification of the disease is studied.

3. Comparison of Average Kappa Coefficients

The objective of this manuscript is to study the hypothesis tests
$$H_0: \bar{\kappa}_{11} = \bar{\kappa}_{21} \quad \text{vs.} \quad H_1: \bar{\kappa}_{11} \ne \bar{\kappa}_{21} \tag{4}$$
and
$$H_0: \bar{\kappa}_{12} = \bar{\kappa}_{22} \quad \text{vs.} \quad H_1: \bar{\kappa}_{12} \ne \bar{\kappa}_{22} \tag{5}$$
when not all patients in a random sample are verified with the GS. The first hypothesis test is used when the clinician considers that $L' > L$ ($0 \le c < 0.5$) and the second hypothesis test is used when the clinician considers that $L > L'$ ($0.5 < c \le 1$). Both hypothesis tests will be solved by applying two computational methods: the EM algorithm and the SEM algorithm. The EM algorithm [17] is a classic method to estimate parameters with missing data, and the SEM (Supplemented EM) algorithm [18] is a method that allows estimating the variances-covariances of a vector of parameters from the results obtained by applying the EM algorithm.
In the problem posed here, the sample design is as follows: two BDTs are applied to all individuals of a random sample sized $n$ and the GS is applied only to a subset of the $n$ individuals. This situation gives rise to Table 2, where $T_h$ is the binary random variable that models the result of the $h$th Test ($T_h = 1$ when the Test is positive and $T_h = 0$ when it is negative), $V$ is the binary random variable that models the verification process ($V = 1$ when the disease status of an individual is verified with the GS and $V = 0$ when it is not), and $D$ is the binary random variable that models the result of the GS ($D = 1$ when the individual verified with the GS has the disease and $D = 0$ when the individual verified with the GS does not have the disease). In this table, each frequency $s_{ij}$ ($r_{ij}$) is the number of diseased (non-diseased) individuals in which $T_1 = i$ and $T_2 = j$ ($i,j = 0,1$), each frequency $u_{ij}$ is the number of individuals not verified with the GS in which $T_1 = i$ and $T_2 = j$, and $s = \sum_{i,j=0}^{1} s_{ij}$, $r = \sum_{i,j=0}^{1} r_{ij}$, $u = \sum_{i,j=0}^{1} u_{ij}$, $n_{ij} = s_{ij} + r_{ij} + u_{ij}$ and $n = s + r + u = \sum_{i,j=0}^{1} n_{ij}$.
Let $Se_h = P(T_h = 1 \mid D = 1)$ and $Sp_h = P(T_h = 0 \mid D = 0)$ be the sensitivity and the specificity of the $h$th Test, let $p = P(D = 1)$ be the disease prevalence, and let $\lambda_{ijk} = P(V = 1 \mid T_1 = i, T_2 = j, D = k)$ be the probability of verifying with the GS an individual with results $T_1 = i$, $T_2 = j$ and $D = k$, with $h = 1,2$ and $i,j,k = 0,1$. Assuming that the verification process is missing at random (MAR) [19], i.e., that the probability of verifying the disease status of an individual with the GS depends only on the results of both BDTs, then $\lambda_{ijk} = \lambda_{ij} = P(V = 1 \mid T_1 = i, T_2 = j)$. If the disease status of an individual is not verified with the GS, this individual can be considered as a missing value of the disease status, and then missing data analysis methods can be used to compare two BDTs in the presence of partial verification of the disease. The MAR assumption has been widely used in this context to compare parameters of two BDTs [9,10,11,12]. Under the MAR assumption, the frequencies in Table 2 follow a multinomial distribution of size $n$, whose probabilities are:
$$\begin{aligned} \xi_{ij} &= P(V=1, D=1, T_1=i, T_2=j) = p\lambda_{ij}\left[Se_1^i(1-Se_1)^{1-i}Se_2^j(1-Se_2)^{1-j} + \delta_{ij}Se_1Se_2(\alpha_1-1)\right], \\ \psi_{ij} &= P(V=1, D=0, T_1=i, T_2=j) = q\lambda_{ij}\left[Sp_1^{1-i}(1-Sp_1)^{i}Sp_2^{1-j}(1-Sp_2)^{j} + \delta_{ij}(1-Sp_1)(1-Sp_2)(\alpha_0-1)\right], \\ \zeta_{ij} &= P(V=0, T_1=i, T_2=j) = \frac{1-\lambda_{ij}}{\lambda_{ij}}\left(\xi_{ij}+\psi_{ij}\right), \end{aligned} \tag{6}$$
where $q = 1-p$, $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = -1$ if $i \ne j$, and $\alpha_1$ ($\alpha_0$) is the covariance [20] between the two BDTs when $D = 1$ ($D = 0$), verifying that
$$1 \le \alpha_1 \le \frac{1}{\max\{Se_1, Se_2\}} \quad \text{and} \quad 1 \le \alpha_0 \le \frac{1}{\max\{1-Sp_1, 1-Sp_2\}}, \tag{7}$$
and $\sum_{i,j=0}^{1}\xi_{ij} + \sum_{i,j=0}^{1}\psi_{ij} + \sum_{i,j=0}^{1}\zeta_{ij} = 1$. If $\alpha_1 = \alpha_0 = 1$ then the two BDTs are conditionally independent given the disease status, a situation which is not realistic in practice, so that $\alpha_1 > 1$ and/or $\alpha_0 > 1$. Solving the system of equations $\kappa_h(0) = \left[Sp_h - (1-Q_h)\right]/Q_h$ and $\kappa_h(1) = \left(Se_h - Q_h\right)/\left(1-Q_h\right)$, with $h = 1,2$, it is obtained that
$$Se_h = \frac{p\kappa_h(1) + q\kappa_h(0)\kappa_h(1)}{q\kappa_h(0) + p\kappa_h(1)} \quad \text{and} \quad Sp_h = \frac{q\kappa_h(0) + p\kappa_h(0)\kappa_h(1)}{q\kappa_h(0) + p\kappa_h(1)}, \tag{8}$$
and substituting these expressions in Equation (6), the probabilities of the multinomial distribution are obtained in terms of the weighted kappa coefficients. Next we apply the EM algorithm to obtain the estimates of the parameters.
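As a quick check of this reparametrization, the following R sketch (illustrative names and assumed values, not part of the article's code) recovers $Se_h$ and $Sp_h$ from $\kappa_h(0)$, $\kappa_h(1)$ and $p$ via Equation (8), and verifies the round trip with the expressions for $\kappa_h(0)$ and $\kappa_h(1)$:

```r
# Equation (8): sensitivity and specificity from kappa(0), kappa(1) and p.
se_sp_from_kappas <- function(k0, k1, p) {
  q  <- 1 - p
  Se <- (p * k1 + q * k0 * k1) / (q * k0 + p * k1)
  Sp <- (q * k0 + p * k0 * k1) / (q * k0 + p * k1)
  c(Se = Se, Sp = Sp)
}
# Round trip from assumed Se = 0.90, Sp = 0.80, p = 0.30:
p <- 0.30; q <- 1 - p; Se <- 0.90; Sp <- 0.80
Q  <- p * Se + q * (1 - Sp)
k0 <- (Sp - (1 - Q)) / Q
k1 <- (Se - Q) / (1 - Q)
se_sp_from_kappas(k0, k1, p)   # should return 0.90 and 0.80
```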
The maximum likelihood (ML) estimates of the parameters are obtained by applying the EM algorithm [17]. The EM algorithm is a computational method that allows estimating parameters in the presence of missing data, and it is widely used in statistics to solve estimation problems in different areas, for example in industrial engineering [21] and in epidemiology [22]. Next, we carry out a reparametrization of the EM algorithm that allows us to estimate the weighted kappa coefficients of the two BDTs (and therefore the average kappa coefficients), the covariances and the disease prevalence. In Table 2 the missing data are the true disease statuses of the individuals who are not verified with the GS; this information is reconstructed in the E step of the EM algorithm. In the M step the ML estimates are computed. Let us assume that among the $u_{ij}$ individuals not verified with the GS, $y_{ij}$ have the disease and $u_{ij} - y_{ij}$ do not have the disease. Then the data can be expressed in the form of a $2 \times 4$ table with frequencies $s_{ij} + y_{ij}$ for $D = 1$ and $r_{ij} + u_{ij} - y_{ij}$ for $D = 0$, with $i,j = 0,1$. Let $\theta = \left(\kappa_1(0), \kappa_1(1), \kappa_2(0), \kappa_2(1), p, \alpha_1, \alpha_0\right)^T$ be the vector of parameters. From the complete data, the log-likelihood function based on the $n$ individuals is
$$l(\theta) = \sum_{i,j=0}^{1}\left(s_{ij}+y_{ij}\right)\ln\phi_{ij} + \sum_{i,j=0}^{1}\left(r_{ij}+u_{ij}-y_{ij}\right)\ln\varphi_{ij}, \tag{9}$$
where
$$\begin{aligned} \phi_{ij} &= P(T_1=i, T_2=j, D=1) = p\left[Se_1^i(1-Se_1)^{1-i}Se_2^j(1-Se_2)^{1-j} + \delta_{ij}Se_1Se_2(\alpha_1-1)\right], \\ \varphi_{ij} &= P(T_1=i, T_2=j, D=0) = q\left[Sp_1^{1-i}(1-Sp_1)^{i}Sp_2^{1-j}(1-Sp_2)^{j} + \delta_{ij}(1-Sp_1)(1-Sp_2)(\alpha_0-1)\right]. \end{aligned}$$
In these probabilities, the covariances $\alpha_1$ and $\alpha_0$ verify Equation (7), $Se_h$ and $Sp_h$ are given by Equation (8), and it is verified that $\sum_{i,j=0}^{1}\phi_{ij} + \sum_{i,j=0}^{1}\varphi_{ij} = 1$. The vector $\theta$ is estimated by applying the EM algorithm. Let $y_{ij}^{(m)}$ be the value of $y_{ij}$ in the $m$th iteration of the EM algorithm and $y^{(m)} = \sum_{i,j=0}^{1}y_{ij}^{(m)}$. The ML estimate of $\theta$ in the $m$th iteration, $\hat{\theta}^{(m)}$, is:
$$\begin{aligned} \hat{\kappa}_1^{(m)}(0) &= \frac{\sum_{j=0}^{1}\left(s_{1j}+y_{1j}^{(m)}\right)\sum_{j=0}^{1}\left(r_{0j}+u_{0j}-y_{0j}^{(m)}\right) - \sum_{j=0}^{1}\left(s_{0j}+y_{0j}^{(m)}\right)\sum_{j=0}^{1}\left(r_{1j}+u_{1j}-y_{1j}^{(m)}\right)}{\left(r+u-y^{(m)}\right)\left(n_{10}+n_{11}\right)}, \\ \hat{\kappa}_1^{(m)}(1) &= \frac{\sum_{j=0}^{1}\left(s_{1j}+y_{1j}^{(m)}\right)\sum_{j=0}^{1}\left(r_{0j}+u_{0j}-y_{0j}^{(m)}\right) - \sum_{j=0}^{1}\left(s_{0j}+y_{0j}^{(m)}\right)\sum_{j=0}^{1}\left(r_{1j}+u_{1j}-y_{1j}^{(m)}\right)}{\left(s+y^{(m)}\right)\left(n_{00}+n_{01}\right)}, \\ \hat{\kappa}_2^{(m)}(0) &= \frac{\sum_{i=0}^{1}\left(s_{i1}+y_{i1}^{(m)}\right)\sum_{i=0}^{1}\left(r_{i0}+u_{i0}-y_{i0}^{(m)}\right) - \sum_{i=0}^{1}\left(s_{i0}+y_{i0}^{(m)}\right)\sum_{i=0}^{1}\left(r_{i1}+u_{i1}-y_{i1}^{(m)}\right)}{\left(r+u-y^{(m)}\right)\left(n_{01}+n_{11}\right)}, \\ \hat{\kappa}_2^{(m)}(1) &= \frac{\sum_{i=0}^{1}\left(s_{i1}+y_{i1}^{(m)}\right)\sum_{i=0}^{1}\left(r_{i0}+u_{i0}-y_{i0}^{(m)}\right) - \sum_{i=0}^{1}\left(s_{i0}+y_{i0}^{(m)}\right)\sum_{i=0}^{1}\left(r_{i1}+u_{i1}-y_{i1}^{(m)}\right)}{\left(s+y^{(m)}\right)\left(n_{00}+n_{10}\right)}, \\ \hat{p}^{(m)} &= \frac{s+y^{(m)}}{n}, \qquad \hat{\alpha}_1^{(m)} = \frac{\left(s+y^{(m)}\right)\left(s_{11}+y_{11}^{(m)}\right)}{\sum_{i=0}^{1}\left(s_{i1}+y_{i1}^{(m)}\right)\sum_{j=0}^{1}\left(s_{1j}+y_{1j}^{(m)}\right)}, \\ \hat{\alpha}_0^{(m)} &= \frac{\left(r+u-y^{(m)}\right)\left(r_{11}+u_{11}-y_{11}^{(m)}\right)}{\sum_{i=0}^{1}\left(r_{i1}+u_{i1}-y_{i1}^{(m)}\right)\sum_{j=0}^{1}\left(r_{1j}+u_{1j}-y_{1j}^{(m)}\right)}. \end{aligned}$$
The ML estimate of $\theta$ in the $(m+1)$th iteration, $\hat{\theta}^{(m+1)}$, is calculated by applying the previous equations substituting $m$ with $m+1$, where
$$y_{ij}^{(m+1)} = u_{ij}\,\frac{\hat{\phi}_{ij}^{(m)}}{\hat{\phi}_{ij}^{(m)}+\hat{\varphi}_{ij}^{(m)}}, \quad i,j = 0,1,$$
and where $\hat{\phi}_{ij}^{(m)}$ ($\hat{\varphi}_{ij}^{(m)}$) is the estimate of $\phi_{ij}$ ($\varphi_{ij}$) in the $m$th iteration, obtained by substituting in $\phi_{ij}$ ($\varphi_{ij}$) the parameters with their respective estimates from the $m$th iteration of the algorithm. As initial value $y_{ij}^{(0)}$ one can take any value $0 \le y_{ij}^{(0)} \le u_{ij}$, $i,j = 0,1$. The EM algorithm stops when the difference between the values of the log-likelihood functions of two consecutive iterations is equal to or less than a value $\delta$, for example $\delta = 10^{-12}$. If the EM algorithm converges in $M$ iterations, $\hat{\theta} = \left(\hat{\kappa}_1(0), \hat{\kappa}_1(1), \hat{\kappa}_2(0), \hat{\kappa}_2(1), \hat{p}, \hat{\alpha}_1, \hat{\alpha}_0\right)^T$ is the final estimate obtained. The estimates of the weighted kappa coefficients obtained by applying the EM algorithm converge to the ML estimates (the proof can be seen in Appendix A). Figure 1 shows the flowchart of the EM algorithm to estimate $\theta$.
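The following R sketch is one possible coding of this EM scheme (a minimal illustration, not the "cakcmd" function; the data layout as 2 × 2 matrices is an assumption). The M step is written in terms of $(Se_h, Sp_h, p, \alpha_1, \alpha_0)$, which is algebraically equivalent to the $\kappa_h(0)$, $\kappa_h(1)$ estimates above via Equation (8); cells with zero probability would need guarding in a production implementation:

```r
# Minimal EM sketch for Section 3 (illustrative, not the "cakcmd" function).
# s, r, u are 2x2 matrices indexed [i+1, j+1] for T1 = i, T2 = j.
em_sketch <- function(s, r, u, delta = 1e-12, max_iter = 10000) {
  n  <- sum(s) + sum(r) + sum(u)
  y  <- u / 2                         # initial imputation y_ij^(0) = u_ij/2
  ll_old <- -Inf
  for (m in seq_len(max_iter)) {
    S <- s + y                        # completed diseased frequencies
    R <- r + u - y                    # completed non-diseased frequencies
    # M step: closed-form ML estimates from the completed 2x4 table
    p   <- sum(S) / n
    Se1 <- sum(S[2, ]) / sum(S); Sp1 <- sum(R[1, ]) / sum(R)
    Se2 <- sum(S[, 2]) / sum(S); Sp2 <- sum(R[, 1]) / sum(R)
    a1  <- sum(S) * S[2, 2] / (sum(S[2, ]) * sum(S[, 2]))
    a0  <- sum(R) * R[2, 2] / (sum(R[2, ]) * sum(R[, 2]))
    # cell probabilities phi_ij (D = 1) and varphi_ij (D = 0)
    phi <- vphi <- matrix(0, 2, 2)
    for (i in 0:1) for (j in 0:1) {
      d <- if (i == j) 1 else -1
      phi[i+1, j+1]  <- p * (Se1^i * (1-Se1)^(1-i) * Se2^j * (1-Se2)^(1-j) +
                               d * Se1 * Se2 * (a1 - 1))
      vphi[i+1, j+1] <- (1-p) * (Sp1^(1-i) * (1-Sp1)^i * Sp2^(1-j) * (1-Sp2)^j +
                               d * (1-Sp1) * (1-Sp2) * (a0 - 1))
    }
    # E step: expected number of diseased among the u_ij unverified
    y <- u * phi / (phi + vphi)
    # observed-data log-likelihood (lambda terms omitted: constant under MAR)
    ll <- sum(s * log(phi)) + sum(r * log(vphi)) + sum(u * log(phi + vphi))
    if (abs(ll - ll_old) <= delta) break
    ll_old <- ll
  }
  Q <- c(p*Se1 + (1-p)*(1-Sp1), p*Se2 + (1-p)*(1-Sp2))
  list(p = p, Se = c(Se1, Se2), Sp = c(Sp1, Sp2), alpha = c(a1, a0),
       kappa0 = (c(Sp1, Sp2) - (1 - Q)) / Q,
       kappa1 = (c(Se1, Se2) - Q) / (1 - Q))
}
```

Applied to the frequencies in Table 7, a sketch of this kind should reproduce, up to numerical tolerance, the estimates reported in Section 5.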
Once the estimates $\hat{\kappa}_h(0)$ and $\hat{\kappa}_h(1)$ have been obtained, the estimates of the average kappa coefficients are easily calculated by applying Equations (2) and (3), i.e.,
$$\hat{\bar{\kappa}}_{h1} = \begin{cases} \dfrac{2\hat{\kappa}_h(0)\hat{\kappa}_h(1)}{\hat{\kappa}_h(0)-\hat{\kappa}_h(1)}\ln\!\left(\dfrac{\hat{\kappa}_h(0)+\hat{\kappa}_h(1)}{2\hat{\kappa}_h(1)}\right), & \hat{p} \ne \hat{Q}_h \\[2mm] \hat{Y}_h, & \hat{p} = \hat{Q}_h, \end{cases}$$
and
$$\hat{\bar{\kappa}}_{h2} = \begin{cases} \dfrac{2\hat{\kappa}_h(0)\hat{\kappa}_h(1)}{\hat{\kappa}_h(0)-\hat{\kappa}_h(1)}\ln\!\left(\dfrac{2\hat{\kappa}_h(0)}{\hat{\kappa}_h(0)+\hat{\kappa}_h(1)}\right), & \hat{p} \ne \hat{Q}_h \\[2mm] \hat{Y}_h, & \hat{p} = \hat{Q}_h. \end{cases}$$
The estimates of $Se_h$ and $Sp_h$ are calculated as:
$$\hat{Se}_h = \frac{\hat{p}\hat{\kappa}_h(1) + \hat{q}\hat{\kappa}_h(0)\hat{\kappa}_h(1)}{\hat{q}\hat{\kappa}_h(0) + \hat{p}\hat{\kappa}_h(1)} \quad \text{and} \quad \hat{Sp}_h = \frac{\hat{q}\hat{\kappa}_h(0) + \hat{p}\hat{\kappa}_h(0)\hat{\kappa}_h(1)}{\hat{q}\hat{\kappa}_h(0) + \hat{p}\hat{\kappa}_h(1)}, \quad h = 1,2,$$
where q ^ = 1 p ^ . Once the ML estimates have been obtained, it is necessary to estimate their variances-covariances. For this we apply the Supplemented EM algorithm.
The variance-covariance matrix of $\hat{\theta}$ is estimated by applying the supplemented EM (SEM) algorithm [18]. The SEM algorithm is a computational method which estimates the variance-covariance matrix from the calculations made when applying the EM algorithm. Dempster et al. [17] have shown that the variance-covariance matrix of $\hat{\theta}$ can be expressed as
$$\hat{\Sigma}(\hat{\theta}) = I_{oc}^{-1}\left(I - DM\right)^{-1}, \tag{10}$$
where $I$ is the identity matrix, $DM = I_{mis}I_{oc}^{-1}$, $I_{oc}$ is the Fisher information matrix of the complete data and $I_{mis}$ is the Fisher information matrix of the missing data. The application of the SEM algorithm consists of three steps [18]: (1) calculate the matrix $I_{oc}^{-1}$; (2) calculate the $DM$ matrix; and (3) calculate $\hat{\Sigma}(\hat{\theta})$. The main step is the calculation of the $DM$ matrix.
The first step consists of calculating $I_{oc}^{-1}$. This matrix is the inverse of the Fisher information matrix of the complete data, i.e., $I_{oc} = \left\{-\partial^2 l(\theta)/\partial\theta_i\partial\theta_j\right\}$, where $l(\theta)$ is the function (9) and each $\theta_i$ is one of the parameters of $\theta$. This matrix is calculated from the last $2 \times 4$ table obtained by applying the EM algorithm. Therefore, if the EM algorithm has converged in $M$ iterations, then the frequencies of this table are $s_{ij} + y_{ij}^{(M)}$ for the diseased individuals and $r_{ij} + u_{ij} - y_{ij}^{(M)}$ for the non-diseased individuals.
The second step of the SEM algorithm consists of calculating the $DM$ matrix. The elements $\beta_{ij}$, $i,j = 1,\dots,7$, of this matrix are calculated by applying the following algorithm:
Input: $\hat{\theta}$ and $\theta^{(t)} = \left(\kappa_1^{(t)}(0), \kappa_1^{(t)}(1), \kappa_2^{(t)}(0), \kappa_2^{(t)}(1), p^{(t)}, \alpha_1^{(t)}, \alpha_0^{(t)}\right)^T$.
  • Calculate $\theta^{(t+1)} = \left(\kappa_1^{(t+1)}(0), \kappa_1^{(t+1)}(1), \kappa_2^{(t+1)}(0), \kappa_2^{(t+1)}(1), p^{(t+1)}, \alpha_1^{(t+1)}, \alpha_0^{(t+1)}\right)^T$ applying the EM algorithm.
  • Obtain the vectors
    $$\begin{aligned} \theta_1^{(t)} &= \left(\kappa_1^{(t)}(0), \hat{\kappa}_1(1), \hat{\kappa}_2(0), \hat{\kappa}_2(1), \hat{p}, \hat{\alpha}_1, \hat{\alpha}_0\right)^T \\ \theta_2^{(t)} &= \left(\hat{\kappa}_1(0), \kappa_1^{(t)}(1), \hat{\kappa}_2(0), \hat{\kappa}_2(1), \hat{p}, \hat{\alpha}_1, \hat{\alpha}_0\right)^T \\ \theta_3^{(t)} &= \left(\hat{\kappa}_1(0), \hat{\kappa}_1(1), \kappa_2^{(t)}(0), \hat{\kappa}_2(1), \hat{p}, \hat{\alpha}_1, \hat{\alpha}_0\right)^T \\ \theta_4^{(t)} &= \left(\hat{\kappa}_1(0), \hat{\kappa}_1(1), \hat{\kappa}_2(0), \kappa_2^{(t)}(1), \hat{p}, \hat{\alpha}_1, \hat{\alpha}_0\right)^T \\ \theta_5^{(t)} &= \left(\hat{\kappa}_1(0), \hat{\kappa}_1(1), \hat{\kappa}_2(0), \hat{\kappa}_2(1), p^{(t)}, \hat{\alpha}_1, \hat{\alpha}_0\right)^T \\ \theta_6^{(t)} &= \left(\hat{\kappa}_1(0), \hat{\kappa}_1(1), \hat{\kappa}_2(0), \hat{\kappa}_2(1), \hat{p}, \alpha_1^{(t)}, \hat{\alpha}_0\right)^T \\ \theta_7^{(t)} &= \left(\hat{\kappa}_1(0), \hat{\kappa}_1(1), \hat{\kappa}_2(0), \hat{\kappa}_2(1), \hat{p}, \hat{\alpha}_1, \alpha_0^{(t)}\right)^T \end{aligned}$$
    and for each one of these vectors run the first iteration of the EM algorithm taking $\theta_i^{(t)}$ as the initial value of $\theta$, obtaining the vectors $\tilde{\theta}_1^{(t+1)}, \dots, \tilde{\theta}_7^{(t+1)}$.
  • Calculate
    $$\beta_{ij}^{(t)} = \frac{\tilde{\theta}_{ij}^{(t+1)} - \hat{\theta}_j}{\theta_i^{(t)} - \hat{\theta}_i}, \quad i,j = 1,\dots,7,$$
    where $\tilde{\theta}_{ij}^{(t+1)}$ is the $j$th component of $\tilde{\theta}_i^{(t+1)}$, $\theta_i^{(t)}$ is the $i$th component of $\theta^{(t)}$ and $\hat{\theta}_i$ is the $i$th component of $\hat{\theta}$.
Output: $\theta^{(t+1)}$ and $\beta_{ij}^{(t)}$, $i,j = 1,\dots,7$.
This algorithm is repeated until $\left|\beta_{ij}^{(t+1)} - \beta_{ij}^{(t)}\right| \le \sqrt{\delta}$ [18], where $\delta$ is the stop criterion of the EM algorithm. Figure 2 shows the flowchart of the SEM algorithm to calculate the $DM$ matrix.
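A compact R sketch of this computation is given below; the interface is an assumption (em_step is taken to be a function performing one EM iteration and returning the updated 7-component parameter vector, e.g., a wrapper around a routine like the one sketched after the EM description, and theta_hat is the EM limit):

```r
# Sketch of the SEM computation of the DM matrix (illustrative interface).
dm_matrix <- function(em_step, theta_hat, theta_t, data, tol = 1e-6) {
  d <- length(theta_hat)
  beta_old <- matrix(Inf, d, d)
  repeat {
    beta <- matrix(0, d, d)
    for (i in 1:d) {
      th <- theta_hat
      th[i] <- theta_t[i]              # theta_i^(t): perturb only component i
      th1 <- em_step(th, data)         # one EM iteration from theta_i^(t)
      beta[i, ] <- (th1 - theta_hat) / (theta_t[i] - theta_hat[i])
    }
    if (max(abs(beta - beta_old)) <= tol) return(beta)
    beta_old <- beta
    theta_t <- em_step(theta_t, data)  # advance theta^(t) -> theta^(t+1)
  }
}
# Third step, following Equation (10):
# Sigma_hat <- solve(I_oc) %*% solve(diag(7) - DM)
```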
The smaller $\delta$ is, the smaller the errors made when calculating the $DM$ matrix, and therefore the smaller the errors made when calculating the variance-covariance matrix $\hat{\Sigma}(\hat{\theta})$.
The third and final step of the SEM algorithm consists of estimating the variance-covariance matrix $\hat{\Sigma}(\hat{\theta})$ applying Equation (10). This matrix is not normally symmetrical, due to the numerical errors made in the calculation of the $DM$ matrix [18]. The assessment of $\hat{\Sigma}(\hat{\theta})$ is performed by calculating the matrix $\Delta\hat{\Sigma}(\hat{\theta}) = I_{oc}^{-1}DM\left(I-DM\right)^{-1}$ [18], a matrix which represents the increase in the estimated variances-covariances owing to the missing information. The matrix $\Delta\hat{\Sigma}(\hat{\theta})$ is the more symmetric the smaller the value of $\delta$; therefore, the asymmetry of $\hat{\Sigma}(\hat{\theta})$ is solved by taking a very small value of $\delta$ [18].
Once the matrix $\hat{\Sigma}(\hat{\theta})$ has been calculated, the asymptotic variance-covariance matrix of the average kappa coefficients is obtained by applying the delta method. Let $\bar{\kappa}_1 = \left(\bar{\kappa}_{11}, \bar{\kappa}_{21}\right)^T$ and $\bar{\kappa}_2 = \left(\bar{\kappa}_{12}, \bar{\kappa}_{22}\right)^T$ be the vectors whose components are the average kappa coefficients. Let $\kappa = \left(\kappa_1(0), \kappa_1(1), \kappa_2(0), \kappa_2(1), p\right)^T$ be the vector whose components are the weighted kappa coefficients and the prevalence, and let $\hat{\Sigma}(\hat{\kappa})$ be the estimated asymptotic variance-covariance matrix of $\hat{\kappa}$ (obtained by eliminating from $\hat{\Sigma}(\hat{\theta})$ the variances and covariances corresponding to $\hat{\alpha}_1$ and $\hat{\alpha}_0$), since the average kappa coefficients do not depend on the covariances $\alpha_1$ and $\alpha_0$. Then, applying the delta method, the asymptotic variance-covariance matrices are
$$\hat{\Sigma}\left(\hat{\bar{\kappa}}_i\right) = \left.\frac{\partial\bar{\kappa}_i}{\partial\kappa}\right|_{\kappa=\hat{\kappa}} \hat{\Sigma}(\hat{\kappa}) \left.\left(\frac{\partial\bar{\kappa}_i}{\partial\kappa}\right)^T\right|_{\kappa=\hat{\kappa}}, \quad i = 1,2. \tag{11}$$
Once the estimates of the average kappa coefficients and their variances-covariances have been calculated, the test statistics for the hypothesis tests
$$H_0: \bar{\kappa}_{1i} = \bar{\kappa}_{2i} \quad \text{vs.} \quad H_1: \bar{\kappa}_{1i} \ne \bar{\kappa}_{2i}, \quad i = 1,2,$$
are
$$z_i = \frac{\hat{\bar{\kappa}}_{1i} - \hat{\bar{\kappa}}_{2i}}{\sqrt{\widehat{Var}\left(\hat{\bar{\kappa}}_{1i}\right) + \widehat{Var}\left(\hat{\bar{\kappa}}_{2i}\right) - 2\widehat{Cov}\left(\hat{\bar{\kappa}}_{1i}, \hat{\bar{\kappa}}_{2i}\right)}}, \quad i = 1,2,$$
whose distribution is a standard normal distribution when the sample size $n$ is large. Inverting each test statistic, the $100(1-\alpha)\%$ Wald-type confidence interval for the difference between the two average kappa coefficients is
$$\bar{\kappa}_{1i} - \bar{\kappa}_{2i} \in \hat{\bar{\kappa}}_{1i} - \hat{\bar{\kappa}}_{2i} \pm z_{1-\alpha/2}\sqrt{\widehat{Var}\left(\hat{\bar{\kappa}}_{1i}\right) + \widehat{Var}\left(\hat{\bar{\kappa}}_{2i}\right) - 2\widehat{Cov}\left(\hat{\bar{\kappa}}_{1i}, \hat{\bar{\kappa}}_{2i}\right)}, \quad i = 1,2,$$
where $z_{1-\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the standard normal distribution.
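For instance, with purely hypothetical estimates (the values below are illustrative and are not taken from any table in this article), the statistic and interval are computed in R as:

```r
# Hypothetical estimates of two average kappa coefficients and of their
# variances and covariance (illustrative values only).
k1 <- 0.55; k2 <- 0.40
v1 <- 0.0040; v2 <- 0.0030; cv <- 0.0010
se_diff <- sqrt(v1 + v2 - 2 * cv)
z       <- (k1 - k2) / se_diff               # test statistic
p_value <- 2 * pnorm(-abs(z))
ci      <- (k1 - k2) + c(-1, 1) * qnorm(0.975) * se_diff  # 95% Wald CI
round(c(z = z, p = p_value, lower = ci[1], upper = ci[2]), 4)
```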

4. Simulation Study

Monte Carlo simulation experiments have been carried out to study the sizes and the powers of the hypothesis tests (4) and (5) solved with the EM-SEM algorithms. These experiments have consisted of generating $N = 10{,}000$ random samples from multinomial distributions. As sample sizes we have considered $n = 50, 100, 200, 500, 1000, 2000$. The probabilities of the multinomial distributions have been calculated from Equation (6) written in terms of the weighted kappa coefficients. These simulation experiments have been designed from the equations of the average kappa coefficients (Equations (2) and (3)). For the prevalence, the values 5%, 10%, 30% and 50% have been considered, a range of values that is sufficient to study the effect of the prevalence on the behaviour of the hypothesis tests. Regarding the average kappa coefficients, the values 0.2, 0.4, 0.6 and 0.8 have been considered, values that correspond to different levels of clinical significance [16]. Once the values for the disease prevalence and the average kappa coefficients have been set, the values of $\kappa_h(0)$ and $\kappa_h(1)$ are calculated by solving (using the Newton-Raphson method) the system formed by Equations (2) and (3), considering only the solutions that are between 0 and 1. Next, the values of $Se_h$ and $Sp_h$ are calculated by applying Equation (8). Once the values of $Se_h$ and $Sp_h$ have been calculated, the maximum values of the covariances $\alpha_1$ and $\alpha_0$ have been calculated by applying Equation (7), considering intermediate values (50% of the maximum value) and high values (90% of the maximum value), i.e.,:
$$\alpha_1 = \frac{f}{\max\{Se_1, Se_2\}} + (1-f) \quad \text{and} \quad \alpha_0 = \frac{f}{\max\{1-Sp_1, 1-Sp_2\}} + (1-f),$$
with $f = 0.50, 0.90$. As verification probabilities, three scenarios have been considered: $\lambda_{11} = 0.50$, $\lambda_{10} = \lambda_{01} = 0.30$, $\lambda_{00} = 0.05$; $\lambda_{11} = 0.95$, $\lambda_{10} = \lambda_{01} = 0.60$, $\lambda_{00} = 0.25$; and $\lambda_{11} = \lambda_{10} = \lambda_{01} = \lambda_{00} = 1$. The first scenario corresponds to a situation in which the verification is low, the second corresponds to a situation in which the verification is high, and the third corresponds to the situation in which all individuals are verified with the GS (a situation that can be called complete verification). In the last scenario there is no verification bias, the sample design corresponds to a paired design, and the average kappa coefficients are compared using the method of Roldán-Nofuentes and Olvera-Porcel [6]. Finally, the probabilities of the multinomial distributions have been calculated by applying Equation (6) (in terms of the weighted kappa coefficients). Therefore, the probabilities of the multinomial distributions have been calculated from the values of the average kappa coefficients and not by fixing the sensitivities and specificities of the BDTs.
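As an illustration of this design step, the following R sketch (assumed starting values and tolerances; not the simulation code itself) solves Equations (2) and (3) for $(\kappa_h(0), \kappa_h(1))$ given target average kappa coefficients, using a Newton-Raphson iteration with a finite-difference Jacobian. The starting point must lie in $(0,1)$ and avoid $\kappa_h(0) = \kappa_h(1)$, where the system is singular, and convergence depends on the starting point:

```r
# Average kappas (Equations (2)-(3)) as a function of k = (kappa(0), kappa(1)).
avg_from_k <- function(k) {
  c(2*k[1]*k[2] / (k[1] - k[2]) * log((k[1] + k[2]) / (2*k[2])),
    2*k[1]*k[2] / (k[1] - k[2]) * log(2*k[1] / (k[1] + k[2])))
}
# Newton-Raphson with a numerical Jacobian.
solve_k <- function(target, start = c(0.30, 0.60), tol = 1e-10) {
  k <- start
  for (it in 1:200) {
    f <- avg_from_k(k) - target
    if (max(abs(f)) < tol) return(k)
    J <- matrix(0, 2, 2); h <- 1e-7
    for (j in 1:2) {
      kh <- k; kh[j] <- kh[j] + h
      J[, j] <- (avg_from_k(kh) - avg_from_k(k)) / h
    }
    k <- k - solve(J, f)
  }
  k
}
# Hypothetical targets; one solution is near kappa(0) = 0.512, kappa(1) = 0.831:
solve_k(target = c(0.5686, 0.7232))
```

The resulting $(\kappa_h(0), \kappa_h(1))$ then yield $Se_h$ and $Sp_h$ through Equation (8).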
The Monte Carlo simulation experiments have been designed in such a way that in all of the random samples it is possible to apply the EM-SEM algorithms. For the application of the EM-SEM algorithms, the values $\delta = 10^{-12}$ and $\sqrt{\delta} = 10^{-6}$ have been considered as stop criteria, and $y_{ij}^{(0)} = u_{ij}/2$ as initial values of the EM algorithm. As nominal error, $\alpha = 5\%$ has been considered.
The simulation experiments have been carried out with R [23], on computers with an i7-3770 CPU at 3.4 GHz. For this, a function called "cakcmd" (Comparison of Average Kappa Coefficients with Missing Data) has been programmed to solve the hypothesis tests (4) and (5) applying the EM and SEM algorithms. The function runs with the command
cakcmd(s11, s10, s01, s00, r11, r10, r01, r00, u11, u10, u01, u00).
By default, the stop criterion of the EM algorithm is $10^{-12}$, the confidence level for the CIs is 95% and $y_{ij}^{(0)} = u_{ij}/2$. The function does not use any R library; the EM and SEM algorithms have been specifically programmed. The function always checks that the problem can be solved by applying the methods described, for example that there are no negative frequencies, that $u > 0$, etc. The function provides all the estimates and their standard errors, all the matrices described in Section 3, the test statistics, the p-values and the CIs for the difference between the two average kappa coefficients. The "cakcmd" function is available as Supplementary Material to this manuscript.
Table 3 shows the type I error (in %) of the hypothesis test to compare the two average kappa coefficients when $L' > L$ ($0 \le c < 0.5$) for different scenarios. The verification probabilities and the covariances $\alpha_1$ and $\alpha_0$ have an important effect on the type I error of the hypothesis test. For fixed values of the covariances, an increase in the verification probabilities produces an increase in the type I error. For fixed values of the verification probabilities, an increase in the covariances produces a decrease in the type I error. In general terms, and depending on the verification probabilities and on the covariances, the type I error is very small (much lower than the nominal error) when the sample size is not very large ($n \le 500$), and fluctuates around the nominal error (without exceeding it excessively) when the sample size is very large ($n \ge 1000$). Therefore, this hypothesis test is a conservative test (which is preferable to a liberal test) when the sample size is not very large, and it behaves as an asymptotic test when the sample size is very large. The hypothesis test does not give too many false significances even when the sample size is very large.
In the complete verification situation ( λ i j = 1 ), the type I error behaves in a very similar way to the type I error obtained in partial verification. Comparing the partial verification scenarios with the complete verification scenario, the partial verification implies a decrease in type I error. Consequently, the presence of missing data implies that the type I error decreases with respect to the situation in which all individuals are verified with the GS.
Table 4 shows the type I error (in %) of the hypothesis test to compare the two average kappa coefficients when $L > L'$ ($0.5 < c \le 1$) for different scenarios. The verification probabilities and the covariances also have an important effect on the type I error of this hypothesis test, their effects being the same as in the previous case. The type I error of this test has the same behaviour as that of the previous hypothesis test, and it is therefore a conservative test when the sample size is not very large and fluctuates around the nominal error when the sample size is very large. Comparing the partial verification scenarios with the complete verification scenario, the same conclusions as before are obtained.
Table 5 shows the power (in %) of the hypothesis test when $L' > L$ ($0 \le c < 0.5$) for different values of the average kappa coefficients.
The verification probabilities and the covariances also have an important effect on the power of the hypothesis test. For fixed values of the covariances, increasing the verification probabilities produces an increase in power. With respect to the covariances, for fixed values of the verification probabilities, their increase generally produces an increase in power (although when the sample is small or moderate, the power may decrease slightly, depending on the difference between the values of the average kappa coefficients). Comparing the partial verification scenarios with the complete verification scenario, partial verification implies a lower power: a decrease in the verification probabilities implies a decrease in power with respect to the complete verification situation. In very general terms, the following conclusions are obtained:
When the difference between the two average kappa coefficients is small (0.2), a large ($n = 500$) or very large ($n \ge 1000$) sample is needed for the power to be greater than 80–90%, depending on the verification probabilities and on the covariances.
When the difference between the two average kappa coefficients is moderate or large ($\ge 0.4$), a sample of moderate size ($n = 100$–$200$) is needed for the power to be greater than 80–90%, depending on the verification probabilities and on the covariances.
Table 6 shows the power (in %) of the hypothesis test when $L > L'$ ($0.5 < c \le 1$) for different values of the average kappa coefficients. In general terms, the conclusions are the same as those obtained for the previous hypothesis test.

5. Example

The model has been applied to the study of Hall et al. [24] on the diagnosis of Alzheimer's disease. Hall et al. have used two BDTs for the diagnosis of Alzheimer's disease: a new BDT based on a cognitive test applied to the patient and to another person who knows the patient (NBDT), and a standard diagnostic test based on a cognitive test (CT). As GS, a clinical assessment (a neurological exploration, computerized tomography, neuro-psychological and laboratory tests, etc.) has been used. This study corresponds to a two-phase study: in the first phase, the two BDTs have been applied to all of the patients, and in the second phase only a subset of the patients has been verified with the GS, depending on the results of both BDTs [9]. Therefore, it is assumed that the verification process is MAR. Table 7 shows the data obtained by Hall et al. when applying the medical tests to a sample of 588 patients, where $T_1$ models the result of the NBDT, $T_2$ models the result of the CT, and $D$ models the result of the clinical assessment.
Executing the “cakcmd” function with the command
cakcmd(31, 5, 3, 1, 25, 10, 19, 55, 22, 6, 65, 346),
the results given in Table 8 are obtained.
The EM algorithm has converged in 217 iterations using $\delta = 10^{-12}$ as the stop criterion. The execution time of the function has been 0.2 s on a computer with an i7-3770 CPU at 3.4 GHz. The estimates of the weighted kappa coefficients, prevalence and covariances are
$$\hat{\theta} = \left(\hat{\kappa}_1(0), \hat{\kappa}_1(1), \hat{\kappa}_2(0), \hat{\kappa}_2(1), \hat{p}, \hat{\alpha}_1, \hat{\alpha}_0\right)^T \approx \left(0.44, 0.67, 0.24, 0.72, 0.12, 1.08, 3.37\right)^T.$$
Applying the SEM algorithm, the variance-covariance matrix of $\hat{\theta}$ is obtained (see Table 8). The variance-covariance matrices of the estimates of the average kappa coefficients are obtained from the previous matrix by applying the delta method (Equation (11)). These matrices are not exactly symmetric, due to the numerical errors made in the application of the SEM algorithm.
If the clinician considers that false positives are more important than false negatives ($L' > L$ and $0 \le c < 0.5$), then the estimates of the average kappa coefficients are $\hat{\bar{\kappa}}_{11} \approx 0.48$ and $\hat{\bar{\kappa}}_{21} \approx 0.30$, and the estimates of the variances and covariance are $\widehat{Var}(\hat{\bar{\kappa}}_{11}) \approx 0.0040$, $\widehat{Var}(\hat{\bar{\kappa}}_{21}) \approx 0.0030$ and $\widehat{Cov}(\hat{\bar{\kappa}}_{11}, \hat{\bar{\kappa}}_{21}) \approx 0.0012$. The value of the test statistic for $H_0: \bar{\kappa}_{11} = \bar{\kappa}_{21}$ is $z_1 \approx 2.75$ (two-sided p-value $\approx 0.0060$). Therefore, with $\alpha = 5\%$, the equality of both average kappa coefficients is rejected. The average kappa coefficient of the NBDT is significantly higher than the average kappa coefficient of the CT (95% CI for the difference: 0.0535 to 0.3202). If the clinician considers that false positives are more important than false negatives, the average kappa coefficient of the NBDT is greater than the average kappa coefficient of the CT. Therefore, the average beyond-chance agreement between the NBDT and the clinical assessment is greater than the average beyond-chance agreement between the CT and the clinical assessment.
If the clinician considers that false negatives are more important than false positives ($L > L'$ and $0.5 < c \le 1$), then the estimates of the average kappa coefficients are $\hat{\bar{\kappa}}_{12} \approx 0.60$ and $\hat{\bar{\kappa}}_{22} \approx 0.50$, and the estimates of the variances and covariance are $\widehat{Var}(\hat{\bar{\kappa}}_{12}) \approx 0.0080$, $\widehat{Var}(\hat{\bar{\kappa}}_{22}) \approx 0.0064$ and $\widehat{Cov}(\hat{\bar{\kappa}}_{12}, \hat{\bar{\kappa}}_{22}) \approx 0.0022$. The value of the test statistic for $H_0: \bar{\kappa}_{12} = \bar{\kappa}_{22}$ is $z_2 \approx 0.9413$ (two-sided p-value $\approx 0.3465$). Therefore, with $\alpha = 5\%$, the equality of both average kappa coefficients is not rejected: we cannot reject that the average kappa coefficients of the NBDT and the CT are equal, and therefore we cannot reject that the average beyond-chance agreement between the NBDT and the clinical assessment is equal to the average beyond-chance agreement between the CT and the clinical assessment (95% CI for the difference: -0.1018 to 0.2898).

6. Discussion and Conclusions

The average kappa coefficient of a BDT is a measure of the average beyond-chance agreement between the BDT and the GS, and solves the problem of assigning values to the weighting index of the weighted kappa coefficient. The average kappa coefficient depends solely on the sensitivity and specificity of the BDT and on the disease prevalence, and is therefore a parameter that can be used to evaluate the efficacy of a BDT and to compare the efficacy of two (or more) BDTs. In this manuscript, the comparison of the average kappa coefficients of two BDTs has been studied when the GS is not applied to all individuals in a sample. In this situation, the disease status is unknown for a subset of individuals, and therefore the missing information is the true disease status of these individuals. The applied methods require the assumption that the missing data are MAR. This assumption is widely used in these types of studies, and establishes that the probability of verifying an individual with the GS depends solely on the results of the two BDTs. This situation also corresponds to two-phase studies: in the first phase the two BDTs are applied to all individuals, and in the second phase the GS is applied only to a subset of them, depending on the results of the two BDTs in the previous phase.
Two hypothesis tests have been studied to compare the two average kappa coefficients: a first hypothesis test when false positives are more important than false negatives, and another when false negatives are more important than false positives. For example, the first hypothesis test is applied when the two BDTs are used as confirmatory tests before a risky treatment, and the second hypothesis test is applied when the two BDTs are used as screening tests. Both hypothesis tests have been solved by applying computational methods for the estimation of parameters with missing data: the EM algorithm and the SEM algorithm. The EM algorithm allows us to estimate the parameters. The SEM algorithm, which is based on the calculations of the EM algorithm, allows us to estimate the variance-covariance matrix of the parameter vector. The EM algorithm requires the MAR assumption. If the MAR assumption cannot be made, then the method proposed in this manuscript cannot be applied. For example, if the probability of verification with the GS also depends on the disease status, then the MAR assumption is not verified. Future research will focus on studying, through a sensitivity analysis, the behaviour of the hypothesis tests applying the EM-SEM algorithms when the MAR assumption is not verified.
Simulation experiments have been carried out to study the size and power of each hypothesis test. The results have shown that both hypothesis tests are conservative when the sample size is small or moderate, and that the type I error fluctuates around the nominal error when the sample size is large or very large. Regarding the power of each hypothesis test, in general terms, a moderate or large sample is necessary (depending on the verification probabilities, covariances, and difference between the values of the two average kappa coefficients) for the power of each hypothesis test to be large. Consequently, the two hypothesis tests have an asymptotic behavior that allows them to be applied in practice.
A function has been written in R to solve the hypothesis tests of comparison of the two average kappa coefficients applying the EM and SEM algorithms. This function allows the researcher to solve the problem in a simple and fast way, providing all the necessary results to carry out a study. This function is available as Supplemental Material to this manuscript.
Hypothesis tests can also be solved by applying the maximum likelihood method to obtain the estimates of the average kappa coefficients and the delta method to estimate the variances-covariances. For this, the methodology applied in the manuscript of Roldán-Nofuentes and Luna [12] is used. However, the maximum likelihood method cannot be applied when some frequency s i j (or r i j ) is equal to zero (since the variances-covariances cannot be estimated). In this situation, the EM and SEM algorithms can be applied. Therefore, this is the advantage of EM-SEM algorithms over the maximum likelihood method.
Another computational method alternative to the EM-SEM algorithms is multiple imputation [25,26,27]. Multiple imputation is a computational method used to solve problems with missing data. Appendix B describes in detail the multiple imputation by chained equations [28] used to solve the hypothesis test for the comparison of the two average kappa coefficients. We have carried out simulation experiments to study the asymptotic behaviour of the hypothesis tests (4) and (5) by applying multiple imputation. The experiments have been designed similarly to those performed in Section 4. The experiments have also been carried out with R, and the "mice" library [29] has been used. For the multiple imputation, 10 complete data sets have been generated and 100 cycles have been performed. Table 9 shows the results obtained for some of the scenarios given in Table 3, Table 4, Table 5 and Table 6. The type I error of the hypothesis test solved by applying multiple imputation is slightly less than that of the hypothesis test solved by applying the EM-SEM algorithms, both having very similar asymptotic behaviour. Regarding the power of the test, this is also a little lower than the power of the test solved by applying the EM-SEM algorithms, also having a very similar asymptotic behaviour. In very general terms, although the differences between multiple imputation and the EM-SEM algorithms are not very important, the hypothesis tests solved with multiple imputation are slightly more conservative (and also slightly less powerful) than the hypothesis tests solved with the EM-SEM algorithms. Multiple imputation has the disadvantage that it cannot be applied when some frequency $s_{ij}$ (or $r_{ij}$) is equal to zero, since logistic regression models cannot be applied to impute the missing data.
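As an indication of how the imputation step can be coded (a sketch under the assumption of a data frame df with one row per individual and factor columns T1, T2 and D, where D is NA for the unverified individuals; this is not the code used for the simulations):

```r
# Sketch of multiple imputation by chained equations with the mice package.
library(mice)
# "logreg" imputes the binary factor D by logistic regression on T1 and T2;
# m = 10 imputed data sets and maxit = 100 cycles, as in the experiments.
imp <- mice(df, m = 10, maxit = 100, method = "logreg", printFlag = FALSE)
# The K = 10 completed data sets, from which the 2x4 tables a_ij, b_ij
# of Appendix B are built:
completed <- lapply(1:10, function(k) complete(imp, action = k))
```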
Future research should also focus on comparing the two average kappa coefficients through confidence intervals and on extending the hypothesis tests to the situation in which the average kappa coefficients of more than two BDTs are compared. In the first case, multiple imputation can be applied together with confidence intervals for the difference or ratio of two average kappa coefficients, adapting the intervals studied by Roldán-Nofuentes and Regad [30,31]. For the second case, an adaptation of the method used by Regad and Roldán-Nofuentes [32] and Roldán-Nofuentes and Regad [33] can be a solution to the problem.

Supplementary Materials

The following are available online at https://0-www-mdpi-com.brum.beds.ac.uk/article/10.3390/math9212834/s1. The “cakcmd” function is a function written in R to compare the average kappa coefficients of two binary diagnostic tests in the presence of missing data.

Author Contributions

J.A.R.-N. and S.B.R. have collaborated equally in the realization of this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We thank the anonymous referees for their helpful comments that improved the quality of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

For simplicity, only $\kappa_1(0)$ is considered. The ML estimator of $\kappa_1(0)$ in the presence of missing data is [12]
$$\hat{\kappa}_1(0) = \frac{\displaystyle\sum_{j=0}^{1}\frac{n_{1j}s_{1j}}{s_{1j}+r_{1j}} - \frac{n_{10}+n_{11}}{n}\sum_{i,j=0}^{1}\frac{n_{ij}s_{ij}}{s_{ij}+r_{ij}}}{\displaystyle\frac{n_{10}+n_{11}}{n}\sum_{i,j=0}^{1}\frac{n_{ij}r_{ij}}{s_{ij}+r_{ij}}}.$$
From Equation (9) it is obtained that
$$\hat{\phi}_{ij} = \frac{s_{ij}+y_{ij}}{n} \quad \text{and} \quad \hat{\varphi}_{ij} = \frac{r_{ij}+u_{ij}-y_{ij}}{n}.$$
In order to demonstrate that the EM algorithm converges to the ML estimates, we follow the same steps as Little and Rubin [27]. With the EM algorithm, the estimator of $\kappa_1(0)$ is
$$\hat{\kappa}_1^{(m+1)}(0) = \frac{\sum_{j=0}^{1}\left(s_{1j}+y_{1j}^{(m+1)}\right)\sum_{j=0}^{1}\left(r_{0j}+u_{0j}-y_{0j}^{(m+1)}\right) - \sum_{j=0}^{1}\left(s_{0j}+y_{0j}^{(m+1)}\right)\sum_{j=0}^{1}\left(r_{1j}+u_{1j}-y_{1j}^{(m+1)}\right)}{\left(r+u-y^{(m+1)}\right)\left(n_{10}+n_{11}\right)}.$$
Then, taking $\hat{\phi}_{ij}^{(m)} = \hat{\phi}_{ij}^{(m+1)} = \hat{\phi}_{ij} = \left(s_{ij}+y_{ij}\right)/n$, $\hat{\varphi}_{ij}^{(m)} = \hat{\varphi}_{ij}^{(m+1)} = \hat{\varphi}_{ij} = \left(r_{ij}+u_{ij}-y_{ij}\right)/n$ and $y_{ij}^{(m)} = y_{ij}^{(m+1)} = y_{ij} = u_{ij}\hat{\phi}_{ij}/\left(\hat{\phi}_{ij}+\hat{\varphi}_{ij}\right)$, it is obtained that $y_{ij}^{(m)} = y_{ij}^{(m+1)} = y_{ij} = s_{ij}u_{ij}/\left(s_{ij}+r_{ij}\right)$, with $i,j = 0,1$. Substituting in the expression for $\hat{\kappa}_1^{(m+1)}(0)$ and performing algebraic operations, it is obtained that
$$\hat{\kappa}_1^{(m+1)}(0) = \frac{\displaystyle\sum_{j=0}^{1}\frac{n_{1j}s_{1j}}{s_{1j}+r_{1j}} - \frac{n_{10}+n_{11}}{n}\sum_{i,j=0}^{1}\frac{n_{ij}s_{ij}}{s_{ij}+r_{ij}}}{\displaystyle\frac{n_{10}+n_{11}}{n}\sum_{i,j=0}^{1}\frac{n_{ij}r_{ij}}{s_{ij}+r_{ij}}} = \hat{\kappa}_1(0).$$
Therefore, κ ^ 1 m + 1 0 converges to κ ^ 1 0 . The convergence of the other estimates obtained by applying the EM algorithm is demonstrated in a similar way.

Appendix B

Multiple imputation [25,26,27] is another computational method used to solve problems with missing data. Multiple imputation consists of constructing $K \ge 2$ sets of complete data, obtained by replacing the missing data with values imputed independently. Parameters are estimated from each of the $K$ complete datasets, obtaining $K$ estimates of each parameter, and then the $K$ estimates of each parameter are combined in an appropriate way, obtaining an overall estimate of each parameter and its variance. From these overall estimates it is possible to obtain confidence intervals for each parameter and also to solve hypothesis tests.
In this manuscript, multiple imputation by chained equations has been used for the imputation of the missing data. Multiple imputation by chained equations (MICE), also known as fully conditional specification or sequential regression multivariate imputation, requires us to assume that the missing data are MAR. The MICE method is described in detail in the work of White et al. [28]. In the problem studied here there are three binary random variables: $T_1$, $T_2$ and $D$. Variables $T_1$ and $T_2$ have no missing data, because the two BDTs have been applied to all the individuals in the sample. Nevertheless, variable $D$ is missing for a subset of individuals, since the GS has not been applied to all the individuals in the sample. First, all missing values are filled in at random, and then variable $D$ is regressed on the variables $T_1$ and $T_2$ through a logistic regression [28]. Next, the missing values in variable $D$ (the disease status for the individuals not verified with the GS) are replaced by simulated draws from the posterior predictive distribution of variable $D$ [28]. This process, called a cycle, is repeated a determined number of times to stabilize the results [28]. Finally, a set of imputed data is obtained. Therefore, from the $3 \times 4$ table (Table 2), $K \ge 2$ $2 \times 4$ tables are imputed, and from each one of these $2 \times 4$ tables the estimates of the average kappa coefficients and their variances-covariances are obtained. The frequencies of the $k$th $2 \times 4$ table are $a_{ij}^{(k)}$ for individuals with the disease and $b_{ij}^{(k)}$ for individuals without the disease, with $i,j = 0,1$ and $k = 1,\dots,K$. Therefore, in each imputed $2 \times 4$ table, the average kappa coefficients are estimated by applying the equations deduced by Roldán-Nofuentes and Olvera-Porcel [6], i.e.,:
$$\begin{aligned} \hat{\bar{\kappa}}_{11}^{(k)} &= \frac{2\left[\left(a_{10}^{(k)}+a_{11}^{(k)}\right)\left(b_{00}^{(k)}+b_{01}^{(k)}\right) - \left(a_{00}^{(k)}+a_{01}^{(k)}\right)\left(b_{10}^{(k)}+b_{11}^{(k)}\right)\right]}{n\left(\sum_{j=0}^{1}a_{0j}^{(k)} - \sum_{j=0}^{1}b_{1j}^{(k)}\right)} \times \ln\!\left[\frac{1}{2}\left(\frac{a^{(k)}\sum_{j=0}^{1}n_{0j}^{(k)}}{b^{(k)}\sum_{j=0}^{1}n_{1j}^{(k)}}+1\right)\right], && \text{if } \hat{p}^{(k)} \ne \hat{Q}_1^{(k)}, \\ \hat{\bar{\kappa}}_{11}^{(k)} &= \hat{Se}_1^{(k)} + \hat{Sp}_1^{(k)} - 1, && \text{if } \hat{p}^{(k)} = \hat{Q}_1^{(k)}, \\ \hat{\bar{\kappa}}_{21}^{(k)} &= \frac{2\left[\left(a_{01}^{(k)}+a_{11}^{(k)}\right)\left(b_{00}^{(k)}+b_{10}^{(k)}\right) - \left(a_{00}^{(k)}+a_{10}^{(k)}\right)\left(b_{01}^{(k)}+b_{11}^{(k)}\right)\right]}{n\left(\sum_{i=0}^{1}a_{i0}^{(k)} - \sum_{i=0}^{1}b_{i1}^{(k)}\right)} \times \ln\!\left[\frac{1}{2}\left(\frac{a^{(k)}\sum_{i=0}^{1}n_{i0}^{(k)}}{b^{(k)}\sum_{i=0}^{1}n_{i1}^{(k)}}+1\right)\right], && \text{if } \hat{p}^{(k)} \ne \hat{Q}_2^{(k)}, \\ \hat{\bar{\kappa}}_{21}^{(k)} &= \hat{Se}_2^{(k)} + \hat{Sp}_2^{(k)} - 1, && \text{if } \hat{p}^{(k)} = \hat{Q}_2^{(k)}, \end{aligned}$$
if $L' > L$ ($0 \le c < 0.5$), and
$$\begin{aligned} \hat{\bar{\kappa}}_{12}^{(k)} &= \frac{2\left[\left(a_{10}^{(k)}+a_{11}^{(k)}\right)\left(b_{00}^{(k)}+b_{01}^{(k)}\right) - \left(a_{00}^{(k)}+a_{01}^{(k)}\right)\left(b_{10}^{(k)}+b_{11}^{(k)}\right)\right]}{n\left(\sum_{j=0}^{1}a_{0j}^{(k)} - \sum_{j=0}^{1}b_{1j}^{(k)}\right)} \times \ln\!\left[\frac{2a^{(k)}\sum_{j=0}^{1}n_{0j}^{(k)}}{a^{(k)}\sum_{j=0}^{1}n_{0j}^{(k)} + b^{(k)}\sum_{j=0}^{1}n_{1j}^{(k)}}\right], && \text{if } \hat{p}^{(k)} \ne \hat{Q}_1^{(k)}, \\ \hat{\bar{\kappa}}_{12}^{(k)} &= \hat{Se}_1^{(k)} + \hat{Sp}_1^{(k)} - 1, && \text{if } \hat{p}^{(k)} = \hat{Q}_1^{(k)}, \\ \hat{\bar{\kappa}}_{22}^{(k)} &= \frac{2\left[\left(a_{01}^{(k)}+a_{11}^{(k)}\right)\left(b_{00}^{(k)}+b_{10}^{(k)}\right) - \left(a_{00}^{(k)}+a_{10}^{(k)}\right)\left(b_{01}^{(k)}+b_{11}^{(k)}\right)\right]}{n\left(\sum_{i=0}^{1}a_{i0}^{(k)} - \sum_{i=0}^{1}b_{i1}^{(k)}\right)} \times \ln\!\left[\frac{2a^{(k)}\sum_{i=0}^{1}n_{i0}^{(k)}}{a^{(k)}\sum_{i=0}^{1}n_{i0}^{(k)} + b^{(k)}\sum_{i=0}^{1}n_{i1}^{(k)}}\right], && \text{if } \hat{p}^{(k)} \ne \hat{Q}_2^{(k)}, \\ \hat{\bar{\kappa}}_{22}^{(k)} &= \hat{Se}_2^{(k)} + \hat{Sp}_2^{(k)} - 1, && \text{if } \hat{p}^{(k)} = \hat{Q}_2^{(k)}, \end{aligned}$$
if $L' < L$ ($0.5 < c \le 1$), and where
$$a^{(k)} = \sum_{i,j=0}^{1}a_{ij}^{(k)}, \quad b^{(k)} = \sum_{i,j=0}^{1}b_{ij}^{(k)}, \quad n_{ij}^{(k)} = a_{ij}^{(k)}+b_{ij}^{(k)}, \quad \hat{p}^{(k)} = \frac{a^{(k)}}{n},$$
$$\hat{Se}_1^{(k)} = \frac{a_{10}^{(k)}+a_{11}^{(k)}}{a^{(k)}}, \quad \hat{Sp}_1^{(k)} = \frac{b_{01}^{(k)}+b_{00}^{(k)}}{b^{(k)}}, \quad \hat{Se}_2^{(k)} = \frac{a_{01}^{(k)}+a_{11}^{(k)}}{a^{(k)}}, \quad \hat{Sp}_2^{(k)} = \frac{b_{10}^{(k)}+b_{00}^{(k)}}{b^{(k)}},$$
and
$$\hat{Q}_h^{(k)} = \hat{p}^{(k)}\hat{Se}_h^{(k)} + \left(1-\hat{p}^{(k)}\right)\left(1-\hat{Sp}_h^{(k)}\right).$$
The overall estimates of the average kappa coefficients and their variances-covariances are then calculated using Rubin’s rules [25]. Overall estimates of the average kappa coefficients are
$$\bar{\kappa}_{11} = \frac{1}{K}\sum_{k=1}^{K}\hat{\bar{\kappa}}_{11}^{(k)} \quad \text{and} \quad \bar{\kappa}_{21} = \frac{1}{K}\sum_{k=1}^{K}\hat{\bar{\kappa}}_{21}^{(k)},$$
and the overall estimates of the differences are
$$\bar{\kappa}_1 = \bar{\kappa}_{11} - \bar{\kappa}_{21} = \frac{1}{K}\sum_{k=1}^{K}\hat{\kappa}_1^{(k)} \quad \text{and} \quad \bar{\kappa}_2 = \bar{\kappa}_{12} - \bar{\kappa}_{22} = \frac{1}{K}\sum_{k=1}^{K}\hat{\kappa}_2^{(k)},$$
where $\hat{\kappa}_1^{(k)} = \hat{\bar{\kappa}}_{11}^{(k)} - \hat{\bar{\kappa}}_{21}^{(k)}$ and $\hat{\kappa}_2^{(k)} = \hat{\bar{\kappa}}_{12}^{(k)} - \hat{\bar{\kappa}}_{22}^{(k)}$. The variance of $\bar{\kappa}_1$ is $\widehat{Var}(\bar{\kappa}_1) = \overline{Var}(\hat{\kappa}_1) + \left(1+\frac{1}{K}\right)B_1$, where $\overline{Var}(\hat{\kappa}_1) = \frac{1}{K}\sum_{k=1}^{K}\widehat{Var}\left(\hat{\bar{\kappa}}_{11}^{(k)} - \hat{\bar{\kappa}}_{21}^{(k)}\right)$ is the within-imputation variance (the complete expression of this variance can be seen in the article by Roldán-Nofuentes and Olvera-Porcel [6]) and $B_1 = \frac{1}{K-1}\sum_{k=1}^{K}\left(\hat{\kappa}_1^{(k)} - \bar{\kappa}_1\right)^2$ is the between-imputation variance (the variance of the complete-data point estimates) [25]. Similarly, $\widehat{Var}(\bar{\kappa}_2) = \overline{Var}(\hat{\kappa}_2) + \left(1+\frac{1}{K}\right)B_2$, where $\overline{Var}(\hat{\kappa}_2) = \frac{1}{K}\sum_{k=1}^{K}\widehat{Var}\left(\hat{\bar{\kappa}}_{12}^{(k)} - \hat{\bar{\kappa}}_{22}^{(k)}\right)$ and $B_2 = \frac{1}{K-1}\sum_{k=1}^{K}\left(\hat{\kappa}_2^{(k)} - \bar{\kappa}_2\right)^2$. Finally, the test statistic for the hypothesis test
$$
H_{0}:\kappa_{1i}=\kappa_{2i}\quad\text{vs}\quad H_{1}:\kappa_{1i}\neq\kappa_{2i},\quad i=1,2,
$$
is
$$
t_{i}=\frac{\bar{\kappa}_{i}}{\sqrt{\hat{Var}\left(\bar{\kappa}_{i}\right)}},
$$
whose distribution is a Student's $t$-distribution with $v_{i}=\left(K-1\right)\left(1+\frac{K\,\overline{Var}\left(\hat{\kappa}_{i}\right)}{\left(K+1\right)B_{i}}\right)^{2}$ degrees of freedom [25]. With respect to the confidence intervals for the difference of the two average kappa coefficients, their expressions are
$$
\kappa_{1i}-\kappa_{2i}\in\bar{\kappa}_{i}\pm t_{v_{i},1-\alpha/2}\sqrt{\hat{Var}\left(\bar{\kappa}_{i}\right)},
$$
where $t_{v_{i},1-\alpha/2}$ is the $100\times\left(1-\alpha/2\right)$th percentile of the Student's $t$-distribution with $v_{i}$ degrees of freedom.
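As an illustration of the complete procedure, the following R sketch chains the three steps: imputation of $D$ with the mice package [29], estimation of the average kappa coefficients in each imputed table with the avg_kappa_hat function sketched above, and pooling with Rubin's rules [25]. It is a sketch under simplifying assumptions, not the authors' program: the column names t1, t2 and d are illustrative, and the within-imputation variances must be supplied through a user-written function var_diff implementing the variance expressions of Roldán-Nofuentes and Olvera-Porcel [6], which are only referenced here.

```r
library(mice)

# df: individual-level data frame with factor columns t1, t2, d (in this
# order); d is NA for the individuals not verified with the GS.
# var_diff(a, b, i): within-imputation variance of the difference of the two
# estimated average kappa coefficients (i = 1 for 0 <= c < 0.5, i = 2 for
# 0.5 < c <= 1), computed with the expressions in [6].
compare_avg_kappas <- function(df, var_diff, K = 50, alpha = 0.05) {
  imp <- mice(df, m = K, method = c("", "", "logreg"),
              maxit = 10, printFlag = FALSE, seed = 1)
  d1 <- d2 <- w1 <- w2 <- numeric(K)
  for (k in seq_len(K)) {
    comp <- complete(imp, action = k)                       # kth imputed set
    a <- table(comp$t1[comp$d == "1"], comp$t2[comp$d == "1"])  # diseased
    b <- table(comp$t1[comp$d == "0"], comp$t2[comp$d == "0"])  # non-diseased
    kap <- avg_kappa_hat(a, b)
    d1[k] <- kap["k11"] - kap["k21"];  w1[k] <- var_diff(a, b, 1)
    d2[k] <- kap["k12"] - kap["k22"];  w2[k] <- var_diff(a, b, 2)
  }
  pool_one <- function(d, w) {              # Rubin's rules [25]
    est <- mean(d)
    B   <- var(d)                           # between-imputation variance
    W   <- mean(w)                          # within-imputation variance
    Tv  <- W + (1 + 1 / K) * B              # total variance
    tstat <- est / sqrt(Tv)
    v   <- (K - 1) * (1 + K * W / ((K + 1) * B))^2  # degrees of freedom
    c(estimate = est, t = tstat, df = v,
      p.value = 2 * pt(-abs(tstat), v),
      lower = est - qt(1 - alpha / 2, v) * sqrt(Tv),
      upper = est + qt(1 - alpha / 2, v) * sqrt(Tv))
  }
  rbind(kappa1 = pool_one(d1, w1),          # 0 <= c < 0.5
        kappa2 = pool_one(d2, w2))          # 0.5 < c <= 1
}
```

Since only the variable $D$ is incomplete, the chained equations reduce here to a single logistic regression of $D$ on $T_1$ and $T_2$, which is the imputation model described at the beginning of this section.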

References

  1. Kraemer, H.C. Evaluating Medical Tests. Objective and Quantitative Guidelines; Sage Publications: Newbury Park, CA, USA, 1992.
  2. Kraemer, H.C.; Periyakoil, V.S.; Noda, A. Kappa coefficients in medical research. Stat. Med. 2002, 21, 2109–2129.
  3. Roldán-Nofuentes, J.A.; Olvera-Porcel, C. Average kappa coefficient: A new measure to assess a binary test considering the losses associated with an erroneous classification. J. Stat. Comput. Simul. 2015, 85, 1601–1620.
  4. Bloch, D.A. Comparing two diagnostic tests against the same “gold standard” in the same sample. Biometrics 1997, 53, 73–85.
  5. Roldán-Nofuentes, J.A.; Luna del Castillo, J.D. Comparison of weighted kappa coefficients of multiple binary diagnostic tests done on the same subjects. Stat. Med. 2010, 29, 2149–2165.
  6. Roldán-Nofuentes, J.A.; Olvera-Porcel, C. Comparison of the average kappa coefficients of binary diagnostic tests done on the same subjects. Revstat Stat. J. 2018, 16, 405–428.
  7. Begg, C.B.; Greenes, R.A. Assessment of diagnostic tests when disease verification is subject to selection bias. Biometrics 1983, 39, 207–215.
  8. Zhou, X.H. Maximum likelihood estimators of sensitivity and specificity corrected for verification bias. Comm. Statist. Theory Methods 1993, 22, 3177–3198.
  9. Zhou, X.H. Comparing accuracies of two screening tests in a two-phase study for dementia. J. R. Stat. Soc. Ser. C Appl. Stat. 1998, 47, 135–147.
  10. Harel, O.; Zhou, X.H. Multiple imputation for the comparison of two screening tests in two-phase Alzheimer studies. Stat. Med. 2007, 26, 2370–2388.
  11. Roldán-Nofuentes, J.A.; Luna del Castillo, J.D. EM algorithm for comparing two binary diagnostic tests when not all the patients are verified. J. Stat. Comput. Simul. 2008, 78, 19–35.
  12. Roldán-Nofuentes, J.A.; Luna del Castillo, J.D. Comparing two binary diagnostic tests in the presence of verification bias. Comput. Stat. Data Anal. 2006, 50, 1551–1564.
  13. Roldán-Nofuentes, J.A.; Regad, S.B. Estimation of the Average Kappa Coefficient of a Binary Diagnostic Test in the Presence of Partial Verification. Mathematics 2021, 9, 1694.
  14. Youden, W.J. Index for rating diagnostic tests. Cancer 1950, 3, 32–35.
  15. Landis, J.R.; Koch, G.G. The measurement of observer agreement for categorical data. Biometrics 1977, 33, 159–174.
  16. Cicchetti, D.V. The precision of reliability and validity estimates re-visited: Distinguishing between clinical and statistical significance of sample size requirements. J. Clin. Exp. Neuropsychol. 2001, 23, 695–700.
  17. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM Algorithm. J. R. Stat. Soc. Series B Stat. Methodol. 1977, 39, 1–38.
  18. Meng, X.; Rubin, D.B. Using EM to obtain asymptotic variance-covariance matrices: The SEM algorithm. J. Am. Stat. Assoc. 1991, 86, 899–909.
  19. Rubin, D.B. Inference and missing data. Biometrika 1976, 63, 581–592.
  20. Berry, G.; Smith, C.; Macaskill, P.; Irwig, L. Analytic methods for comparing two dichotomous screening or diagnostic tests applied to two populations of differing disease prevalence when individuals negative on both tests are unverified. Stat. Med. 2002, 21, 853–862.
  21. Tsai, T.-R.; Lio, Y.; Ting, W.C. EM algorithm for mixture distributions model with type-I hybrid censoring scheme. Mathematics 2021, 9, 2483.
  22. Gallardo, D.I.; de Castro, M.; Gómez, H.W. An alternative promotion time cure model with overdispersed number of competing causes: An application to melanoma data. Mathematics 2021, 9, 1815.
  23. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2016. Available online: https://www.R-project.org/ (accessed on 1 October 2021).
  24. Hall, K.S.; Ogunniyi, A.O.; Hendrie, H.C.; Osuntokun, B.O.; Hui, S.L.; Musick, B.; Rodenberg, C.S.; Unverzagt, F.W.; Guerje, O.; Baiyewu, O. A cross-cultural community based study of dementias: Methods and performance of survey instrument. Int. J. Methods Psychiatr. Res. 1996, 6, 129–142.
  25. Rubin, D.B. Multiple Imputation for Nonresponse in Surveys; Wiley: New York, NY, USA, 1987.
  26. Schafer, J.L. Analysis of Incomplete Multivariate Data; Chapman and Hall: New York, NY, USA, 1997.
  27. Little, R.J.A.; Rubin, D.B. Statistical Analysis with Missing Data, 2nd ed.; Wiley: Hoboken, NJ, USA, 2002.
  28. White, I.R.; Royston, P.; Wood, A.M. Multiple imputation using chained equations: Issues and guidance for practice. Stat. Med. 2011, 30, 377–399.
  29. van Buuren, S.; Groothuis-Oudshoorn, K. Mice: Multivariate imputation by chained equations in R. J. Stat. Softw. 2011, 45, 3.
  30. Roldán-Nofuentes, J.A.; Regad, S.B. Confidence intervals and sample size to compare the predictive values of two diagnostic tests. Mathematics 2021, 9, 1462.
  31. Roldán-Nofuentes, J.A.; Regad, S.B. Asymptotic confidence intervals for the difference and the ratio of the weighted kappa coefficients of two diagnostic tests subject to a paired design. Revstat Stat. J. 2021, in press.
  32. Regad, S.B.; Roldán-Nofuentes, J.A. Global hypothesis test to compare the predictive values of diagnostic tests subject to a case-control design. Mathematics 2021, 9, 658.
  33. Roldán-Nofuentes, J.A.; Regad, S.B. Recommended methods to compare the accuracy of two binary diagnostic tests subject to a paired design. J. Stat. Comput. Simul. 2019, 89, 2621–2644.
Figure 1. Flowchart of the EM algorithm.
Figure 2. Flowchart of the second step of the SEM algorithm.
Table 1. Losses and probabilities associated with the assessment of a BDT in relation to a GS.

Losses
           T = 1    T = 0    Total
D = 1      0        L′       L′
D = 0      L        0        L
Total      L        L′       L + L′

Probabilities
           T = 1                           T = 0                            Total
D = 1      pSe                             p(1 − Se)                        p
D = 0      (1 − p)(1 − Sp)                 (1 − p)Sp                        1 − p
Total      Q = pSe + (1 − p)(1 − Sp)       1 − Q = p(1 − Se) + (1 − p)Sp    1
Table 2. Observed frequencies in the presence of partial verification.

Observed Frequencies
                T1 = 1              T1 = 0
                T2 = 1    T2 = 0    T2 = 1    T2 = 0    Total
V = 1
  D = 1         s11       s10       s01       s00       s
  D = 0         r11       r10       r01       r00       r
V = 0           u11       u10       u01       u00       u
Total           n11       n10       n01       n00       n
Table 3. Type I error (in %) of the hypothesis test when L′ > L (0 ≤ c < 0.5). In all panels, the verification schemes are (I) λ11 = 0.50, λ10 = λ01 = 0.30, λ00 = 0.05; (II) λ11 = 0.95, λ10 = λ01 = 0.60, λ00 = 0.25; and (III) λ11 = λ10 = λ01 = λ00 = 1; (A) and (B) denote the two covariance settings given in each panel.

κ11 = κ21 = 0.2; κ1(0) = 0.16, κ1(1) = 0.67, κ2(0) = 0.16, κ2(1) = 0.67; p = 10%
(A) α1 = 1.14, α0 = 2.37; (B) α1 = 1.24, α0 = 3.47

n       I-A     I-B     II-A    II-B    III-A   III-B
50      0       0       0.05    0       0.50    0
100     0.05    0       0.50    0       1.20    0
200     0.15    0       0.85    0       3.10    0.10
500     1.10    0.10    2.90    0.10    4.40    1.05
1000    1.70    0.20    3.40    0.95    4.75    2.05
2000    3.25    0.55    4.55    2.25    5.50    4.35

κ11 = κ21 = 0.4; κ1(0) = 0.34, κ1(1) = 0.78, κ2(0) = 0.34, κ2(1) = 0.78; p = 30%
(A) α1 = 1.06, α0 = 2.03; (B) α1 = 1.11, α0 = 2.85

n       I-A     I-B     II-A    II-B    III-A   III-B
50      0       0       0.45    0       2.05    1.10
100     0.30    0       1.50    0       4.50    3.90
200     1.40    0       2.30    0.25    4.90    4.35
500     2.90    0.45    4.15    1.25    4.25    3.55
1000    3.85    1.90    5.15    2.35    5.25    4.70
2000    4.55    2.65    4.75    4.15    4.80    4.40

κ11 = κ21 = 0.6; κ1(0) = 0.77, κ1(1) = 0.34, κ2(0) = 0.77, κ2(1) = 0.34; p = 5%
(A) α1 = 1.91, α0 = 96.02; (B) α1 = 2.64, α0 = 172.39

n       I-A     I-B     II-A    II-B    III-A   III-B
50      0       0       0       0       0.60    0.10
100     0.05    0       0.05    0       1.25    0.15
200     0.45    0       0.35    0       3.30    1.05
500     0.60    0.05    2.05    0.15    5.35    3.75
1000    1.60    0.25    4.15    0.45    4.95    4.90
2000    3.45    0.65    4.50    1.50    4.55    4.40

κ11 = κ21 = 0.8; κ1(0) = 0.86, κ1(1) = 0.66, κ2(0) = 0.86, κ2(1) = 0.66; p = 50%
(A) α1 = 1.12, α0 = 8.73; (B) α1 = 1.21, α0 = 14.91

n       I-A     I-B     II-A    II-B    III-A   III-B
50      0       0       0       0       0.30    0.10
100     0.05    0       0.40    0       2.25    0.25
200     0.45    0       1.65    0       4.25    1.05
500     2.40    0.05    2.90    0.55    5.55    3.40
1000    3.65    0.90    4.65    1.15    5.35    4.10
2000    3.75    2.40    5.35    3.35    5.60    5.10
Table 4. Type I error (in %) of the hypothesis test when L > L′ (0.5 < c ≤ 1). In all panels, the verification schemes are (I) λ11 = 0.50, λ10 = λ01 = 0.30, λ00 = 0.05; (II) λ11 = 0.95, λ10 = λ01 = 0.60, λ00 = 0.25; and (III) λ11 = λ10 = λ01 = λ00 = 1; (A) and (B) denote the two covariance settings given in each panel.

κ12 = κ22 = 0.2; κ1(0) = 0.93, κ1(1) = 0.16, κ2(0) = 0.93, κ2(1) = 0.16; p = 50%
(A) α1 = 2.26, α0 = 49.16; (B) α1 = 3.28, α0 = 87.69

n       I-A     I-B     II-A    II-B    III-A   III-B
50      0       0       0.40    0       2.55    1.05
100     0.50    0       1.70    0       4.45    2.15
200     1.70    0       2.80    0       4.90    3.30
500     4.30    0.60    3.70    2.20    4.20    4.40
1000    4.10    2.90    4.50    4.35    5.45    4.05
2000    4.30    3.60    5.20    5.05    5.75    5.35

κ12 = κ22 = 0.4; κ1(0) = 0.16, κ1(1) = 0.67, κ2(0) = 0.16, κ2(1) = 0.67; p = 10%
(A) α1 = 1.14, α0 = 2.37; (B) α1 = 1.26, α0 = 3.47

n       I-A     I-B     II-A    II-B    III-A   III-B
50      0       0       0       0       0.05    0
100     0       0       0.20    0.05    0.60    0.15
200     0.20    0.10    0.90    0.55    2.25    0.95
500     0.80    0.45    3.00    2.15    4.55    3.35
1000    1.80    1.30    3.60    2.25    5.25    3.55
2000    3.75    2.80    4.25    3.35    5.10    3.80

κ12 = κ22 = 0.6; κ1(0) = 0.34, κ1(1) = 0.78, κ2(0) = 0.34, κ2(1) = 0.78; p = 30%
(A) α1 = 1.06, α0 = 2.03; (B) α1 = 1.11, α0 = 2.85

n       I-A     I-B     II-A    II-B    III-A   III-B
50      0       0       0.10    0       0.20    0.05
100     0.30    0.05    0.70    0.10    2.35    0.15
200     1.05    0.10    1.50    0.65    4.35    0.80
500     2.50    0.70    4.50    1.40    5.35    3.15
1000    4.10    1.40    5.10    1.80    4.95    4.80
2000    4.80    2.60    5.15    3.15    5.95    5.50

κ12 = κ22 = 0.8; κ1(0) = 0.88, κ1(1) = 0.78, κ2(0) = 0.88, κ2(1) = 0.78; p = 5%
(A) α1 = 1.13, α0 = 93.98; (B) α1 = 1.24, α0 = 168.37

n       I-A     I-B     II-A    II-B    III-A   III-B
50      0       0       0       0       0.05    0.10
100     0.05    0       0.20    0.10    0.45    0.25
200     0.10    0       0.45    0.15    0.40    0.55
500     0.55    0.30    0.90    0.50    2.05    1.45
1000    1.25    0.95    3.05    1.95    3.55    3.25
2000    2.10    1.30    3.85    2.65    4.25    3.05
Table 5. Power (in %) of the hypothesis test when L′ > L (0 ≤ c < 0.5). In all panels, the verification schemes are (I) λ11 = 0.50, λ10 = λ01 = 0.30, λ00 = 0.05; (II) λ11 = 0.95, λ10 = λ01 = 0.60, λ00 = 0.25; and (III) λ11 = λ10 = λ01 = λ00 = 1; (A) and (B) denote the two covariance settings given in each panel.

κ11 = 0.4, κ21 = 0.2; κ1(0) = 0.34, κ1(1) = 0.78, κ2(0) = 0.16, κ2(1) = 0.67; p = 10%
(A) α1 = 1.11, α0 = 2.37; (B) α1 = 1.19, α0 = 3.47

n       I-A     I-B     II-A    II-B    III-A   III-B
50      0.15    0.05    1.30    0.85    7.85    22.35
100     3.80    3.00    17.95   21.05   61.15   77.15
200     26.45   36.00   64.15   86.45   93.10   99.65
500     81.90   97.95   99.05   100     100     100
1000    99.15   100     100     100     100     100
2000    100     100     100     100     100     100

κ11 = 0.6, κ21 = 0.4; κ1(0) = 0.56, κ1(1) = 0.76, κ2(0) = 0.34, κ2(1) = 0.78; p = 30%
(A) α1 = 1.06, α0 = 2.03; (B) α1 = 1.11, α0 = 2.85

n       I-A     I-B     II-A    II-B    III-A   III-B
50      3.35    3.40    24.45   31.75   29.05   38.55
100     37.80   54.60   83.15   94.30   85.10   82.05
200     87.90   98.10   99.75   100     100     100
500     99.95   100     100     100     100     100
1000    100     100     100     100     100     100
2000    100     100     100     100     100     100

κ11 = 0.6, κ21 = 0.2; κ1(0) = 0.56, κ1(1) = 0.76, κ2(0) = 0.17, κ2(1) = 0.39; p = 5%
(A) α1 = 1.14, α0 = 6.09; (B) α1 = 1.26, α0 = 10.16

n       I-A     I-B     II-A    II-B    III-A   III-B
50      0.60    0.25    6.05    3.20    18.75   31.05
100     11.75   13.20   29.50   44.10   97.45   99.05
200     43.40   63.05   69.60   90.60   100     100
500     87.30   98.25   98.60   99.95   100     100
1000    99.55   99.95   100     100     100     100
2000    100     100     100     100     100     100

κ11 = 0.8, κ21 = 0.6; κ1(0) = 0.90, κ1(1) = 0.60, κ2(0) = 0.80, κ2(1) = 0.33; p = 50%
(A) α1 = 1.16, α0 = 9.06; (B) α1 = 1.28, α0 = 15.51

n       I-A     I-B     II-A    II-B    III-A   III-B
50      0.10    0.05    0.15    0.10    9.95    23.05
100     0.15    0.10    0.20    0.15    70.05   77.95
200     0.30    0.15    2.75    1.95    96.10   100
500     7.65    5.65    30.10   39.30   100     100
1000    34.45   41.15   69.05   89.15   100     100
2000    70.35   89.55   95.10   99.85   100     100
Table 6. Power (in %) of the hypothesis test when L > L′ (0.5 < c ≤ 1). In all panels, the verification schemes are (I) λ11 = 0.50, λ10 = λ01 = 0.30, λ00 = 0.05; (II) λ11 = 0.95, λ10 = λ01 = 0.60, λ00 = 0.25; and (III) λ11 = λ10 = λ01 = λ00 = 1; (A) and (B) denote the two covariance settings given in each panel.

κ12 = 0.4, κ22 = 0.2; κ1(0) = 0.27, κ1(1) = 0.47, κ2(0) = 0.39, κ2(1) = 0.17; p = 30%
(A) α1 = 1.22, α0 = 2.10; (B) α1 = 1.39, α0 = 2.99

n       I-A     I-B     II-A    II-B    III-A   III-B
50      0.10    0.05    0.05    1.10    9.85    12.40
100     0.90    0.15    9.00    8.20    42.05   45.15
200     4.90    5.60    24.70   28.30   73.80   78.35
500     22.7    26.80   70.60   76.90   97.15   99.40
1000    54.05   59.20   94.10   98.40   100     100
2000    88.30   91.05   100     100     100     100

κ12 = 0.6, κ22 = 0.4; κ1(0) = 0.46, κ1(1) = 0.66, κ2(0) = 0.27, κ2(1) = 0.47; p = 50%
(A) α1 = 1.22, α0 = 2.10; (B) α1 = 1.39, α0 = 2.99

n       I-A     I-B     II-A    II-B    III-A   III-B
50      0.05    0.01    3.30    1.60    13.25   16.75
100     5.60    4.10    19.05   29.90   56.85   74.05
200     25.50   29.20   49.30   78.90   84.15   99.10
500     65.50   90.30   89.40   99.90   100     100
1000    89.55   99.60   99.30   100     100     100
2000    99.80   100     100     100     100     100

κ12 = 0.6, κ22 = 0.2; κ1(0) = 0.34, κ1(1) = 0.78, κ2(0) = 0.39, κ2(1) = 0.17; p = 10%
(A) α1 = 1.11, α0 = 4.23; (B) α1 = 1.19, α0 = 6.81

n       I-A     I-B     II-A    II-B    III-A   III-B
50      0.10    0.05    0.30    1.10    17.10   22.65
100     0.50    0.06    7.10    9.10    35.95   42.15
200     5.30    5.20    39.90   44.10   82.05   84.05
500     44.20   56.20   93.10   94.40   100     100
1000    91.70   94.30   99.80   100     100     100
2000    99.80   100     100     100     100     100

κ12 = 0.8, κ22 = 0.6; κ1(0) = 0.88, κ1(1) = 0.78, κ2(0) = 0.95, κ2(1) = 0.53; p = 5%
(A) α1 = 1.13, α0 = 93.98; (B) α1 = 1.24, α0 = 168.37

n       I-A     I-B     II-A    II-B    III-A   III-B
50      0.05    0.01    0.10    0.05    14.20   17.85
100     0.08    0.03    0.08    0.02    44.85   52.10
200     0.10    0.08    2.60    2.70    89.05   96.95
500     5.30    7.05    20.30   23.10   100     100
1000    21.80   30.01   48.50   53.80   100     100
2000    49.70   63.70   84.05   86.80   100     100
Table 7. Diagnosis of Alzheimer's disease.

Observed Frequencies
                T1 = 1              T1 = 0
                T2 = 1    T2 = 0    T2 = 1    T2 = 0
V = 1
  D = 1         31        5         3         1
  D = 0         25        10        19        55
V = 0           22        6         65        346
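As a usage illustration (not part of the original analysis code), the frequencies of Table 7 can be expanded into the individual-level data frame assumed by the sketches given earlier; the variable names t1, t2 and d are the same illustrative ones used there.

```r
# Rebuild individual-level data from the observed frequencies of Table 7;
# d is NA for the individuals not verified with the GS.
counts <- data.frame(
  t1 = c(1, 1, 0, 0,  1, 1, 0, 0,  1, 1, 0, 0),
  t2 = c(1, 0, 1, 0,  1, 0, 1, 0,  1, 0, 1, 0),
  d  = c(1, 1, 1, 1,  0, 0, 0, 0,  NA, NA, NA, NA),
  n  = c(31, 5, 3, 1, 25, 10, 19, 55, 22, 6, 65, 346)
)
df <- counts[rep(seq_len(nrow(counts)), counts$n), c("t1", "t2", "d")]
df$t1 <- factor(df$t1); df$t2 <- factor(df$t2); df$d <- factor(df$d)
nrow(df)  # 588 individuals in total
```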
Table 8. Results for the example of the diagnosis of Alzheimer's disease.
COMPARISON OF AVERAGE KAPPA COEFFICIENTS OF TWO BDTS WITH MISSING DATA:
Iterations of the EM algorithm: 217
Inverse matrix of the Fisher information matrix for complete data:
           Kappa10    Kappa11    Kappa20    Kappa21    p          a1         a0
Kappa10    2.70e-03   1.38e-03   9.91e-04   3.69e-04   3.70e-04   3.64e-04   4.35e-03
Kappa11    1.38e-03   3.88e-03   2.19e-04   1.08e-03   4.43e-05   1.08e-03   5.07e-04
Kappa20    9.91e-04   2.19e-04   1.13e-03   1.07e-03   2.77e-04   3.24e-04   3.87e-03
Kappa21    3.69e-04   1.08e-03   1.07e-03   4.30e-03   4.08e-05   1.37e-03   1.14e-03
p          3.70e-04   4.43e-05   2.77e-04   4.08e-05   1.77e-04   2.17e-19   6.58e-19
a1         3.64e-04   1.08e-03   3.24e-04   1.37e-03   1.84e-19   2.21e-03   2.74e-18
a0         4.35e-03   5.07e-04   3.87e-03   1.14e-03   8.00e-19   6.87e-20   1.15e-01
DM matrix:
           Kappa10      Kappa11      Kappa20      Kappa21      p            a1           a0
Kappa10    0.25747856   0.22670197   0.04820999  -0.03430295   0.02544226  -0.00081874   0.00737780
Kappa11    0.04018192   0.46969774  -0.15323480  -0.01112308  -0.06342952   0.01088978  -0.68269317
Kappa20    0.07169712  -0.31072616   0.30117681   0.43206262   0.06510425  -0.09497630  -0.06344007
Kappa21   -0.04157550   0.08880363   0.02017579   0.22133723  -0.01974722  -0.11024679  -0.25032953
p         -0.11756844  -1.20091231  -0.02896885  -1.17760749   0.15870433   0.64074585   1.86688820
a1        -0.03532342  -0.11283512  -0.15022054  -0.50246330   0.00919001   0.67340594  -1.33229242
a0        -0.00750592  -0.01428990  -0.00424797  -0.01550081   0.00044486  -0.01171524   0.09379489
Variance-covariance matrix of weighted kappa coefficients, prevalence and covariances:
           Kappa10      Kappa11      Kappa20      Kappa21      p            a1           a0
Kappa10    0.00380263   0.00282528   0.00127952   0.00097428   0.00040832  -0.00110158   0.00480053
Kappa11    0.00283219   0.01558280  -0.00086300   0.00807824  -0.00148826  -0.00801231  -0.00461472
Kappa20    0.00127269  -0.00094112   0.00233169   0.00289762   0.00053473  -0.00191751   0.00794194
Kappa21    0.00096024   0.00784534   0.00296983   0.01611484  -0.00088940  -0.01223615   0.00684820
p          0.00040347  -0.00152204   0.00051321  -0.00098494   0.00041010   0.00090368   0.00090234
a1        -0.00109354  -0.00785617  -0.00198050  -0.01227948   0.00083346   0.01314268  -0.00816490
a0         0.00477417  -0.00487427   0.00791000   0.00651661   0.00096002  -0.00789270   0.14222082
Estimated weighted kappa coefficient K(0) of Test 1 is 0.4410538 and its standard error is 0.06166551
Estimated weighted kappa coefficient K(1) of Test 1 is 0.6692124 and its standard error is 0.1248311
Estimated weighted kappa coefficient K(0) of Test 2 is 0.2446698 and its standard error is 0.04828762
Estimated weighted kappa coefficient K(1) of Test 2 is 0.7152702 and its standard error is 0.1269442
Estimated disease prevalence is 0.1177224 and its standard error is 0.0202509
Estimated covariance a1 is 1.082158
Estimated covariance a0 is 3.365059
COMPARISON OF AVERAGE KAPPA COEFFICIENTS FOR L′ > L (0 ≤ c < 0.5)
Variance-covariance matrix:
                   Average kappa11   Average kappa21
Average kappa11    0.003978628       0.001180153
Average kappa21    0.001159865       0.003010255
Estimated average kappa coefficient of Test 1 is 0.4835519 and its standard error is 0.06307636
Estimated average kappa coefficient of Test 2 is 0.2967101 and its standard error is 0.05486579
Test Statistic for the hypothesis test is 2.746314 and the p-value is 0.006026899
95% confidence interval for the difference between the two average kappa coefficients is: 0.05349828; 0.3201853
COMPARISON OF AVERAGE KAPPA COEFFICIENTS FOR L > L′ (0.5 < c ≤ 1)
Variance-covariance matrix:
                   Average kappa12   Average kappa22
Average kappa12    0.007956845       0.002206378
Average kappa22    0.002102579       0.006436081
Estimated average kappa coefficient of Test 1 is 0.5951878 and its standard error is 0.08920115
Estimated average kappa coefficient of Test 2 is 0.5011507 and its standard error is 0.08022519
Test Statistic for the hypothesis test is 0.9413048 and the p-value is 0.3465487
95% confidence interval for the difference between the two average kappa coefficients is: −0.1017649; 0.2898391
Table 9. Type I errors (in %) and powers (in %) applying multiple imputation. In all panels, the verification schemes are (I) λ11 = 0.50, λ10 = λ01 = 0.30, λ00 = 0.05 and (II) λ11 = 0.95, λ10 = λ01 = 0.60, λ00 = 0.25; (A) and (B) denote the two covariance settings given in each panel.

Type I error when L′ > L (0 ≤ c < 0.5)
κ11 = κ21 = 0.2; κ1(0) = 0.16, κ1(1) = 0.67, κ2(0) = 0.16, κ2(1) = 0.67; p = 10%
(A) α1 = 1.14, α0 = 2.37; (B) α1 = 1.24, α0 = 3.47

n       I-A     I-B     II-A    II-B
50      0       0       0       0
100     0       0       0.10    0
200     0.05    0       0.55    0
500     0.95    0.05    2.15    0.05
1000    1.20    0.15    3.05    0.90
2000    2.95    0.40    3.80    1.85

Power when L′ > L (0 ≤ c < 0.5)
κ11 = 0.4, κ21 = 0.2; κ1(0) = 0.34, κ1(1) = 0.78, κ2(0) = 0.16, κ2(1) = 0.67; p = 10%
(A) α1 = 1.11, α0 = 2.37; (B) α1 = 1.19, α0 = 3.47

n       I-A     I-B     II-A    II-B
50      0.10    0.01    0.55    0.85
100     2.70    3.05    14.90   17.35
200     24.85   34.25   61.80   84.05
500     80.05   95.95   97.75   100
1000    98.20   99.15   100     100
2000    100     100     100     100

Type I error when L > L′ (0.5 < c ≤ 1)
κ12 = κ22 = 0.4; κ1(0) = 0.16, κ1(1) = 0.67, κ2(0) = 0.16, κ2(1) = 0.67; p = 10%
(A) α1 = 1.14, α0 = 2.37; (B) α1 = 1.26, α0 = 3.47

n       I-A     I-B     II-A    II-B
50      0       0       0       0
100     0       0       0.10    0
200     0.15    0.05    0.55    0.25
500     0.65    0.45    2.85    1.95
1000    1.45    1.05    3.25    2.05
2000    3.45    2.40    3.90    3.10

Power when L > L′ (0.5 < c ≤ 1)
κ12 = 0.6, κ22 = 0.4; κ1(0) = 0.46, κ1(1) = 0.66, κ2(0) = 0.27, κ2(1) = 0.47; p = 50%
(A) α1 = 1.22, α0 = 2.10; (B) α1 = 1.39, α0 = 2.99

n       I-A     I-B     II-A    II-B
50      0.05    0.01    1.25    2.05
100     4.40    3.35    14.35   26.85
200     23.80   26.80   47.25   75.80
500     62.75   84.95   88.15   99.10
1000    86.85   94.45   98.35   100
2000    99.70   100     100     100