Research Article Volume 9 Issue 2
1Department of Statistics, University of Manitoba, Winnipeg, Canada
2Department of Statistics and Computer Science, University of Peradeniya, Sri Lanka
3Division of Orthodontics, Faculty of Dental Sciences, University of Peradeniya, Sri Lanka
Correspondence: Lakshika S. Nawarathna, Department of Statistics and Computer Science, University of Peradeniya, Peradeniya, Sri Lanka
Received: February 10, 2020 | Published: April 30, 2020
Citation: Dharmasena RAIH, Nawarathna LS, Nawarathna RD, et al. Predicting cessation of orthodontic treatments using a classification-based approach, Biom Biostat Int J. 2020;9(2):67-73 DOI: 10.15406/bbij.2020.09.00302
In recent years, dental care has received increasing attention from people across the globe. With improving living conditions, people are more aware of preventable conditions. Malocclusion is one of the most studied problems in orthodontics. Statistical predictive model building plays a vital role in dentistry, particularly for clinical decision making. Developing a model that identifies the factors affecting discontinuation of treatment is a vital step in assessing the therapeutic effect of treatment, managing resources and reducing costs in the healthcare industry. Logistic regression and Probit regression models are widely and successfully used to analyze classification problems with factor predictor variables. In this study, Naïve Bayes and random forest classification models are introduced to predict discontinuation of orthodontic treatment among dental patients. The duration of active treatment was the most significant factor affecting discontinuation of the treatment. Among the four approaches, the random forest classifier showed the highest accuracy and specificity, while the Naïve Bayes model showed the highest sensitivity in predicting discontinuation of the treatment. Overall, the classification-based approach with modern predictive algorithms gives robust results for orthodontic data.
Keywords: dental malocclusion, classification, logistic regression, Probit models, Naïve Bayes, random forests
Malocclusion of the teeth is a misalignment condition in which the teeth deviate from ideal occlusion, and it can cause serious aesthetic issues and oral health complications. Misaligned teeth cannot perform important functions properly. Malocclusions result mainly from environmental and genetic factors. The condition can be inherited, that is, passed down from one generation to the next, but some oral habits can cause it too.1 In particular, thumb or finger sucking, prolonged pacifier use and mouth breathing are the most common oral habits that can cause malocclusion. Sports injuries, automobile accidents and falls can also lead to it.2
Malocclusion is neither a sickness nor a life-threatening condition and is usually not serious enough to require treatment. Nevertheless, there has been considerable demand for orthodontic care.3,4 It is usually diagnosed through routine dental examination. In a child’s life, the period during which the permanent teeth erupt must be considered critical.5,6 Depending on the classification of malocclusion, the symptoms of the disorder may be subtle or severe. Moreover, the treatment of malocclusion places a considerable burden on health care resources nationally and globally, particularly when treatments are publicly funded.7 Malocclusion is one of the most studied problems in orthodontics; studies use different classifications in different populations, usually to determine its prevalence and causes and to establish treatment procedures.8 The choice among potential alternative treatments should ideally be based on treatments known to be effective rather than on clinical impression alone.
Depending on the type of malocclusion, orthodontists recommend various treatments. These can include applying braces, wires or plates to correct the position of the teeth, enhancing jaw growth with functional orthopaedic devices and stabilizing the jawbone with surgical procedures. To evaluate the effectiveness of a treatment, it is necessary to use outcome measures that are both valid and reliable.9 Treatment of this condition in children and adults usually corrects the misalignment, and early treatment is cost effective and reduces the duration of the treatment.10
Statistical methodologies and applications play a major role in dentistry and dental research, mainly in evidence-based dentistry. Clinical trials and designed experiments on treatments produce data that must be analyzed properly to get the most use out of them. A statistics-based approach is the most reliable and widely used method for interpreting information gained from clinical data.11 Statistical predictive model building is a common application of statistics in dentistry, mainly for clinical decision making.12 Logistic regression and Probit models are among the most widely used predictive models in bioinformatics for decision making.13,14 With the growth of computational power over the last decades, evolutionary search algorithms and machine learning algorithms have emerged as important heuristic optimization techniques for decision making.15 Such studies are of vital importance when addressing the therapeutic goals in the completion of orthodontic treatment. In recent studies, applications of supervised learning methods such as Naïve Bayes and random forest models in bioinformatics are not rare.
The objective of this study is to predict the continuation or discontinuation of orthodontic treatment for dental malocclusion by identifying the factors that affect the decision to discontinue the treatment. Moreover, we identify the most suitable predictive model for this scenario by comparing the accuracies of the classical approaches with those of the Naïve Bayes and random forest models.
This article is organized as follows. In Section 2, under materials, we discuss the statistical theory behind the two data mining algorithms and the conventional models used in this research, together with the model reduction techniques. In Section 3, we illustrate the methodology by analyzing the clinical records obtained from the Division of Orthodontics, University Dental Hospital, Peradeniya, Sri Lanka. Finally, Section 4 concludes the article with a discussion. The statistical software R and the Waikato Environment for Knowledge Analysis (Weka) were used for all the statistical computations in this article.
To build a predictive model for discontinuation of orthodontic treatment of dental patients, we used a dataset of clinical records obtained from the Division of Orthodontics, University Dental Hospital, Peradeniya, Sri Lanka. The dataset consisted of 310 records of clinical treatments for dental malocclusion. Discontinuation of orthodontic treatment was taken as the dependent variable. Patients treated for more than 5 years were classified as continuing the treatment. Accordingly, 12.903% of patients (40 of 310) were classified as continuing the treatment, while 87.097% (270 of 310) were classified as discontinuing it. There were no missing data in the dataset, and all the variables were recoded to a common Likert scale as illustrated in Table 1.
| Variable | Likert scale: 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Discontinuation of treatment (Y) | Discontinue | Continue | | | | |
| Age (X1) | 1 – 10 | 11 – 20 | 21 – 30 | 31 – 40 | 41 – 50 | |
| Gender (X2) | Male | Female | | | | |
| Type of malocclusion (X3) | Class I | Class II Division 1 | Class II Division 2 | Class III | | |
| Severity of malocclusion (X4) | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | |
| Treatment indicated (X5) | Non-extraction | Extraction deciduous tooth | Extraction permanent tooth | | | |
| Simple removable appliance (X6) | No | Yes | | | | |
| Fixed appliance (X7) | No | Single arch | Both arches | | | |
| Growth modification appliance (X8) | No | Twin block | Head gear | Face mask | Other | |
| Cost of treatment in LKR (X9) | No | 200-400 | 400-1000 | 1100-3500 | 3600-7500 | Above 7500 |
| Stage of treatment at cessation (X10) | Record taking | Treatment planning | Appliance fitting | Review visits | End of active treatment | Retention phase |
| Duration of active treatment (X11) | < 6 months | 6 – 12 months | 1 – 2 years | 2 – 5 years | > 5 years | |
Table 1 Likert Scale recoding of variables used in the analysis
Actual clinical data were used to build several predictive models with different learning algorithms, namely Naïve Bayes, random forest, logistic regression and the Probit model, and the accuracy and reliability of the models were compared.
Prediction model
In this study, two data mining algorithms, Naïve Bayes and random forest, were introduced alongside the most commonly used statistical methods, logistic regression and the Probit model,16 to develop models for predicting cessation of orthodontic treatments.
Naïve Bayes classifier: The Naïve Bayes classifier is a specialized form of Bayesian network: a simple probabilistic classifier based on Bayes' theorem. All Naïve Bayes classifiers assume that the predictor variables are conditionally independent given the class and that no hidden or latent attributes influence the prediction.17
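Let $X = (X_1, \dots, X_n)$, where $n = 11$, be a vector representing the 11 features (independent variables) of an instance. The classifier assigns to this instance the probabilities
$$p(C_k \mid X_1, \dots, X_n) \qquad (1)$$
for each of the $K$ possible outcomes (‘0’ or ‘1’), or classes, $C_k$. Using Bayes’ theorem, the probability model can be written as
$$p(C_k \mid X) = \frac{p(C_k)\, p(X \mid C_k)}{p(X)} \qquad (2)$$
Under the independence assumption of the Naïve Bayes classifier, the conditional distribution over the class variable $C$ is
$$p(C_k \mid X_1, \dots, X_n) = \frac{1}{Z}\, p(C_k) \prod_{i=1}^{n} p(X_i \mid C_k) \qquad (3)$$
where the evidence $Z = p(X)$ is a scaling factor that depends only on $X_1, \dots, X_n$. Therefore, the Naïve Bayes classifier is the function that assigns a class label $\hat{y} = C_k$ as follows.
$$\hat{y} = \underset{k \in \{1, \dots, K\}}{\arg\max}\; p(C_k) \prod_{i=1}^{n} p(X_i \mid C_k) \qquad (4)$$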
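The decision rule for categorical predictors can be sketched in a few lines of Python. This is a minimal illustration, not the Weka or R implementation used in the study: the function names, the Laplace smoothing constant `alpha` and the two-feature toy records are all assumptions made for the example.

```python
from collections import Counter
import math


def fit_naive_bayes(rows, labels, alpha=1.0):
    """Count class frequencies and per-feature level frequencies.

    rows   : list of tuples of categorical codes (one code per feature)
    labels : list of class labels, same length as rows
    alpha  : Laplace smoothing constant (an assumption, not from the article)
    """
    class_counts = Counter(labels)
    n_features = len(rows[0])
    # counts[c][i][v] = number of class-c rows whose feature i equals v
    counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    levels = [set() for _ in range(n_features)]
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            counts[c][i][v] += 1
            levels[i].add(v)
    return {"class_counts": class_counts, "counts": counts,
            "levels": levels, "alpha": alpha, "n": len(labels)}


def predict_naive_bayes(model, row):
    """Return the class maximizing log p(C_k) + sum_i log p(x_i | C_k)."""
    best_class, best_score = None, -math.inf
    for c, n_c in model["class_counts"].items():
        score = math.log(n_c / model["n"])  # log prior
        for i, v in enumerate(row):
            num = model["counts"][c][i][v] + model["alpha"]
            den = n_c + model["alpha"] * len(model["levels"][i])
            score += math.log(num / den)  # smoothed log likelihood
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical toy records with two Likert-coded features.
toy_rows = [(1, 1), (1, 2), (2, 1), (2, 2), (2, 2), (1, 1)]
toy_labels = ["discontinue", "discontinue", "continue",
              "continue", "continue", "discontinue"]
toy_model = fit_naive_bayes(toy_rows, toy_labels)
prediction = predict_naive_bayes(toy_model, (2, 2))
```

Working in log space avoids numerical underflow when many conditional probabilities are multiplied, and it leaves the argmax unchanged because the logarithm is monotone. The evidence $Z$ is omitted because it is the same for every class.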
Random Forest classifier: Unlike a single classification tree, a random forest grows many classification trees. To classify a new object, its input vector is passed down every tree; each tree casts a vote for a class, and the forest chooses the class with the most votes.18 Increasing the number of trees does not by itself cause a random forest to overfit, and the method builds models quickly on large databases without requiring variables to be altered or deleted. With a random forest there is no need for cross-validation or a separate test set to obtain an unbiased estimate of the prediction error, since the error is estimated internally during the run from the observations left out of each tree's bootstrap sample (the out-of-bag estimate).19 For this dataset, a random forest with a maximum of 2000 trees was built and its classification accuracy was measured.
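The two mechanisms behind a random forest, bootstrap sampling with an out-of-bag set and voting across trees, can be sketched as follows. The sketch deliberately does not grow any trees; the helper names, the fixed seed and the five example votes are assumptions for illustration only.

```python
import random
from collections import Counter


def bootstrap_indices(n, rng):
    """Draw n row indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag


def majority_vote(votes):
    """Return the class predicted by the largest number of trees."""
    return Counter(votes).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
in_bag, oob = bootstrap_indices(310, rng)

# Roughly a third of the rows are expected to be left out of each
# bootstrap sample; those rows act as that tree's internal test set.
oob_fraction = len(oob) / 310

# Hypothetical votes from five trees for one patient record.
votes = ["discontinue", "continue", "discontinue", "discontinue", "continue"]
forest_prediction = majority_vote(votes)
```

Because each tree is tested only on rows it never saw during training, averaging these out-of-bag errors over all trees gives the internal error estimate described above.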
Logistic Regression models: Logistic regression is used when the model has a binary categorical dependent variable, that is, when the output can take only two values, ‘0’ or ‘1’. Here, the dependent variable of the predictive model is discontinuation of the treatment (Y), which is categorical with only two outcomes, ‘Yes’ or ‘No’, so a logistic regression model can be employed.20 The general logistic function $\sigma(t)$, where $t \in \mathbb{R}$, can be defined as,
$$\sigma(t) = \frac{e^{t}}{1 + e^{t}} = \frac{1}{1 + e^{-t}} \qquad (5)$$
Then the proposed logistic regression model is defined as,
$$\ln\!\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n \qquad (6)$$
where $p$ is the probability of the dependent variable equaling a “success”, $\beta_0$ is the intercept and $\beta_1, \dots, \beta_n$ are the regression coefficients.
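As a small worked illustration of how the logistic function turns a linear predictor into a probability, consider the Python sketch below. The coefficient values and the dummy-coded predictor vector are hypothetical and are not taken from the fitted models in this article.

```python
import math


def logistic(t):
    """Logistic function sigma(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def logit(p):
    """Log-odds ln(p / (1 - p)), the inverse of the logistic function."""
    return math.log(p / (1.0 - p))


def predicted_probability(intercept, coefs, x):
    """Probability of 'success' for one record under a logit model."""
    linear_predictor = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return logistic(linear_predictor)


# Hypothetical intercept, coefficients and dummy-coded predictors.
p_example = predicted_probability(-1.0, [0.5, 2.0], [1, 1])
```

Here the linear predictor is $-1.0 + 0.5 + 2.0 = 1.5$, so the predicted probability is $\sigma(1.5)$, and applying the logit to that probability recovers 1.5.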
Probit models: Like logistic regression models, Probit models are used when the dependent variable is dichotomous. They employ a Probit link function and are mostly estimated by the maximum likelihood procedure. Assume the dependent variable (Y) is binary and is influenced by a vector of variables X. Then the model takes the form,
$$\Pr(Y = 1 \mid X) = \Phi(X^{\top}\beta) \qquad (7)$$
where $\Phi$ is the cumulative distribution function (CDF) of the standard normal distribution. The parameters $\beta$ are typically estimated by maximum likelihood.13 The proposed Probit model, with $p = \Pr(Y = 1 \mid X)$, is as follows.
$$\Phi^{-1}(p) = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n \qquad (8)$$
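The Probit link differs from the logit only in the function that maps the linear predictor to a probability. A minimal Python sketch using the standard library's `statistics.NormalDist` is given below; the coefficients and predictors are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def probit_probability(intercept, coefs, x):
    """Pr(Y = 1 | x) = Phi(intercept + sum_i coef_i * x_i)."""
    linear_predictor = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return STD_NORMAL.cdf(linear_predictor)


def probit_link(p):
    """Inverse standard normal CDF, mapping a probability to the linear scale."""
    return STD_NORMAL.inv_cdf(p)


# Hypothetical intercept, coefficients and dummy-coded predictors.
p_probit = probit_probability(-1.0, [0.5, 2.0], [1, 1])
```

With the same linear predictor of 1.5, the Probit model gives a probability of about 0.933, compared with about 0.818 under the logit link, because the normal CDF approaches 0 and 1 faster than the logistic function.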
Model Reduction: To obtain the optimum logistic regression model, model reduction was carried out by backward elimination and bidirectional elimination. Elimination was based on the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), which are estimators of the relative quality of statistical models. The model with the minimum AIC and BIC values is considered the best among the candidate models. Additionally, adjusted R-squared ($R^2$) values were obtained to compare the performance of the reduced models.21
Estimation for model performance
10-Fold Cross Validation method: k-fold cross-validation is a model validation technique that partitions the dataset into k equal parts, keeps one part as the testing set and uses the remaining (k−1) parts as the training set. This is repeated k times (once per fold), and the average of the estimates is taken as the final estimate.22 In this study, 10-fold cross-validation (i.e., k = 10) was used to validate the logistic regression, Probit and Naïve Bayes classifiers.
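The partitioning step can be sketched in Python as follows. This is an illustrative splitter under assumed choices (a shuffled index list and a fixed seed), not the routine used by R or Weka in the study.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Fold j takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[j::k] for j in range(k)]
    for j in range(k):
        test = folds[j]
        train = [i for m, fold in enumerate(folds) if m != j for i in fold]
        yield train, test


# 310 records and k = 10, as in this study.
splits = list(k_fold_indices(310, 10))
```

Each of the 310 records appears in exactly one testing set of 31 records, and every model is trained on the other 279 records.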
Confusion matrix: Confusion matrix or error matrix is often used in statistical modelling to evaluate and visualize the model performance. As shown in Table 2, it is a two by two matrix which can be used to obtain sensitivity, specificity and accuracy of predictive model classifications.16
True Positive (TP) is the number of dental patients who were predicted to continue the treatment and did continue it. True Negative (TN) is the number of patients who were predicted to discontinue the treatment and did discontinue it. False Positive (FP) is the number of patients who were predicted to continue the treatment but actually discontinued it. False Negative (FN) is the number of patients who were predicted to discontinue the treatment but actually continued it. The sensitivity, specificity and accuracy were calculated from the TP, TN, FP and FN values in the confusion matrix. Sensitivity, or recall, is the probability that the model correctly predicts the continuation of treatment of a dental patient, that is, the positive class as defined above.
$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (9)$$
| Discontinuation of orthodontic treatment (Y) | Actual class: Discontinue | Actual class: Continue |
|---|---|---|
| Predicted class: Discontinue | True Negatives (TN) | False Negatives (FN) |
| Predicted class: Continue | False Positives (FP) | True Positives (TP) |
Table 2 Confusion matrix to validate the results
Specificity is the probability that the model correctly predicts discontinuation of orthodontic treatment.
$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (10)$$
Accuracy is the probability that the model correctly predicts the continuation or discontinuation of orthodontic treatment of a dental patient.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (11)$$
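The three measures follow directly from the four cells of the confusion matrix, as the short Python sketch below shows. The counts passed to the function are hypothetical and do not come from the study's results.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of actual positives found
    specificity = tn / (tn + fp)   # share of actual negatives found
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


# Hypothetical counts for 310 records, for illustration only.
sens, spec, acc = classification_metrics(tp=30, tn=250, fp=20, fn=10)
```

With these counts the sensitivity is 30/40 = 0.75, the specificity is 250/270 (about 0.926) and the accuracy is 280/310 (about 0.903). When one class dominates, as here, accuracy tracks specificity far more closely than sensitivity.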
Receiver Operating Characteristic (ROC) Curve: The Receiver Operating Characteristic (ROC) curve is a popular method for evaluating model performance. It is based on sensitivity and specificity: the x-axis is 1 − specificity (False Positive Rate) and the y-axis is sensitivity (True Positive Rate) of a given model. The Area Under the Curve (AUC) is a summary measure that makes ROC plots easier to interpret; it is the area under the ROC curve of a model. The AUC ranges from 0 to 1, where 1 corresponds to a perfect classifier. This measure is convenient for comparing the performance of multiple models.23
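A minimal Python sketch of how an ROC curve and its AUC can be computed from predicted scores is shown below. The scores and labels are hypothetical, and the threshold sweep with trapezoidal integration is one common way of doing this, not necessarily the routine used in the study.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by sweeping the decision threshold.

    labels are 1 for the positive class and 0 for the negative class.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the piecewise-linear curve through the given points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical predicted probabilities and true classes.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_labels = [1, 1, 0, 1, 0, 0]
example_auc = auc_trapezoid(roc_points(example_scores, example_labels))
```

In this toy example 8 of the 9 positive-negative pairs are ranked correctly, so the AUC equals 8/9. This reflects the usual interpretation of AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.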
We present details on the descriptive analysis, model fitting and model validation, together with a detailed discussion of the factors affecting discontinuation of orthodontic treatment. Three hundred and ten patients’ records were analyzed, and the patients’ ages ranged from 7 to 30 years.
Table 3 shows the estimated partial regression coefficients corresponding to each explanatory variable listed in Table 1, together with the standard errors, the z-statistics and the p-values used for testing the significance of each coefficient. The coefficients of the reduced logistic model, obtained by backward elimination and bidirectional elimination, are summarized in Table 4.
| Variable | Estimate | Std. Error | z value | p-value |
|---|---|---|---|---|
| (Intercept) | 40.40 | 8520 | 0.01 | 0.996 |
| Female | 0.83 | 0.65 | 1.27 | 0.205 |
| Age (11 – 20) | 0 | 0.79 | 0 | 0.997 |
| Age (21 – 30) | -3.30 | 1.68 | -1.96 | 0.050 |
| Age (31 – 40) | -0.38 | 17900 | 0 | 1 |
| Age (41 – 50) | 19.20 | 17700 | 0 | 0.999 |
| Type of malocclusion (Class II Division 1) | 1.17 | 0.83 | 1.41 | 0.159 |
| Type of malocclusion (Class II Division 2) | 0.84 | 1.10 | 0.77 | 0.443 |
| Type of malocclusion (Class III) | -0.13 | 0.80 | -0.17 | 0.868 |
| Severity of malocclusion (Grade 2) | -16.20 | 8100 | 0 | 0.998 |
| Severity of malocclusion (Grade 3) | -20.50 | 8100 | 0 | 0.998 |
| Severity of malocclusion (Grade 4) | -20.60 | 8100 | 0 | 0.998 |
| Severity of malocclusion (Grade 5) | -19.00 | 8100 | 0 | 0.998 |
| Simple removable appliance (Yes) | 2.62 | 1.90 | 1.38 | 0.167 |
| Fixed appliance (Single arch) | 12.90 | 7460 | 0 | 0.999 |
| Fixed appliance (Both arches) | -0.29 | 1.87 | -0.15 | 0.878 |
| Growth modification appliance (Twin block) | 1.00 | 1.88 | 0.53 | 0.597 |
| Growth modification appliance (Head gear) | -4.45 | 6.77 | -0.66 | 0.511 |
| Growth modification appliance (Face mask) | 1.53 | 2.05 | 0.75 | 0.454 |
| Growth modification appliance (Other) | -4.50 | 3.52 | -1.28 | 0.201 |
| Stage of treatment at cessation (Treatment planning) | -23.20 | 2660 | -0.01 | 0.993 |
| Stage of treatment at cessation (Appliance fitting) | -4.63 | 3990 | 0 | 0.999 |
| Stage of treatment at cessation (Review visits) | -23.90 | 2660 | -0.01 | 0.993 |
| Stage of treatment at cessation (End of active treatment) | -25.40 | 2660 | -0.01 | 0.992 |
| Stage of treatment at cessation (Retention phase) | -25.90 | 2660 | -0.01 | 0.992 |
| Treatment indicated (Extraction deciduous tooth) | -2.08 | 0.92 | -2.27 | 0.023 |
| Treatment indicated (Extraction permanent tooth) | 0.21 | 0.91 | 0.23 | 0.819 |
| Cost of treatment in LKR (200-400) | 3.72 | 3.33 | 1.12 | 0.264 |
| Cost of treatment in LKR (400-1000) | 5.88 | 3.59 | 1.64 | 0.101 |
| Cost of treatment in LKR (1100-3500) | 6.95 | 3.67 | 1.90 | 0.058 |
| Cost of treatment in LKR (3600-7500) | 4.81 | 3.72 | 1.29 | 0.196 |
| Cost of treatment in LKR (Above 7500) | 6.77 | 3.76 | 1.80 | 0.072 |
| Duration of active treatment (6 – 12 months) | 2.94 | 1.54 | 1.91 | 0.056 |
| Duration of active treatment (1 – 2 years) | 1.44 | 1.20 | 1.20 | 0.232 |
| Duration of active treatment (2 – 5 years) | 0.44 | 1.14 | 0.39 | 0.696 |
| Duration of active treatment (> 5 years) | -4.55 | 1.30 | -3.49 | 0.001 |
Table 3 Model coefficients of logistic regression full model
| Variable | Estimate | Std. Error | z value | p-value |
|---|---|---|---|---|
| (Intercept) | 19.44 | 1818.23 | 0.01 | 0.992 |
| Simple removable appliance (Yes) | 1.29 | 0.61 | 2.11 | 0.035 |
| Growth modification appliance (Twin block) | 1.99 | 0.80 | 2.48 | 0.013 |
| Growth modification appliance (Head gear) | -1.98 | 3.16 | -0.63 | 0.531 |
| Growth modification appliance (Face mask) | -0.25 | 1.46 | -0.17 | 0.863 |
| Growth modification appliance (Other) | -2.74 | 1.62 | -1.70 | 0.090 |
| Stage of treatment at cessation (Treatment planning) | -17.62 | 1818.23 | -0.01 | 0.992 |
| Stage of treatment at cessation (Appliance fitting) | -0.12 | 2688.89 | 0.00 | 1.000 |
| Stage of treatment at cessation (Review visits) | -17.61 | 1818.23 | -0.01 | 0.992 |
| Stage of treatment at cessation (End of active treatment) | -19.30 | 1818.23 | -0.01 | 0.992 |
| Stage of treatment at cessation (Retention phase) | -20.18 | 1818.23 | -0.01 | 0.991 |
| Duration of active treatment (6 – 12 months) | 2.50 | 1.38 | 1.81 | 0.071 |
| Duration of active treatment (1 – 2 years) | 1.71 | 1.21 | 1.42 | 0.157 |
| Duration of active treatment (2 – 5 years) | 0.85 | 1.05 | 0.81 | 0.421 |
| Duration of active treatment (> 5 years) | -2.70 | 1.02 | -2.66 | 0.008 |
Table 4 Model coefficient information of logistic regression reduced model
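The z values and p-values in the coefficient tables are consistent with a Wald test, in which the estimate is divided by its standard error and compared with the standard normal distribution. The Python sketch below applies this to the "Simple removable appliance (Yes)" row of Table 4; the function names are assumptions, and because the tabulated estimates are rounded, the reported figures are reproduced only approximately.

```python
import math
from statistics import NormalDist


def wald_test(estimate, std_error):
    """Return (z, two-sided p-value) for the null hypothesis coefficient = 0."""
    z = estimate / std_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


def odds_ratio(estimate):
    """Exponentiate a logit coefficient to obtain an odds ratio."""
    return math.exp(estimate)


# Estimate and standard error for "Simple removable appliance (Yes)" in Table 4.
z_sra, p_sra = wald_test(1.29, 0.61)
or_sra = odds_ratio(1.29)
```

The result is close to the tabulated z value of 2.11 and p-value of 0.035. The corresponding odds ratio is about 3.6, meaning that the odds of the modelled outcome are roughly 3.6 times higher for patients with a simple removable appliance, holding the other predictors fixed.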
The AIC, BIC and adjusted R-squared values for the full and reduced logistic regression models are shown in Table 5. The reduced model had lower AIC and BIC values than the full model with all predictor variables, although its R-squared value was also lower. Therefore, on the basis of AIC and BIC, the reduced model is selected as the better model when compared with the full model.
| Model | AIC | BIC | R-Squared |
|---|---|---|---|
| Full | 172.7617 | 307.2783 | 0.6682 |
| Reduced | 152.6485 | 208.6971 | 0.5808 |
Table 5 AIC, BIC and R-Squared values for the logistic regression full model and reduced model
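The AIC and BIC values in Table 5 can be cross-checked against each other. Using the standard definitions AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L, the BIC follows from the AIC once the number of parameters k and the sample size n are known. The Python sketch below assumes k equals the number of coefficient rows in Tables 3 and 4 (36 and 15, including the intercept) and n = 310; these counts are inferred from the tables rather than stated in the text.

```python
import math


def bic_from_aic(aic, n_params, n_obs):
    """Convert AIC to BIC for the same fitted model.

    AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L,
    so BIC = AIC - 2k + k ln(n).
    """
    return aic - 2 * n_params + n_params * math.log(n_obs)


N_OBS = 310      # records in the dataset
K_FULL = 36      # coefficient rows in Table 3, including the intercept
K_REDUCED = 15   # coefficient rows in Table 4, including the intercept

bic_full = bic_from_aic(172.7617, K_FULL, N_OBS)
bic_reduced = bic_from_aic(152.6485, K_REDUCED, N_OBS)
```

Under these assumptions the computed values agree with the tabulated BIC of 307.2783 for the full model and 208.6971 for the reduced model. The much larger gap in BIC than in AIC reflects the heavier penalty that BIC places on the 21 additional parameters of the full model.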
In both the full and reduced models, a duration of active treatment of more than 5 years was significant at the two-sided 0.05 level, implying high rates of discontinuation of treatment with long durations of active treatment. Treatment indicated as extraction of a deciduous tooth was also significant in the full model (p = 0.023). In the reduced model, a duration of active treatment between 6 and 12 months approached but did not reach significance at the 5% level (p = 0.071, Table 4), while simple removable appliance (p = 0.035) and twin block (p = 0.013) were significant. Moreover, patients who were treated with simple removable appliances were more prone to discontinue the treatment.
Table 6 shows the 10-fold cross-validation results of the fitted models. From the results of model validation, we conclude that the full models have higher predictive ability than the reduced models. Further, the random forest classifier showed the highest accuracy and specificity. Moreover, the Naïve Bayes classifier had the highest sensitivity even though it had the lowest accuracy. The Probit model showed the highest Area Under Curve (AUC), while the Naïve Bayes classifier showed the lowest.
| Classifier | Sensitivity | Specificity | Accuracy | AUC |
|---|---|---|---|---|
| Logistic Regression | 70.00% | 98.148% | 94.52% | 95.63% |
| Reduced Logistic Regression | 60.00% | 97.407% | 92.58% | 93.30% |
| Probit Model | 70.00% | 97.778% | 94.19% | 95.70% |
| Reduced Probit Model | 57.50% | 97.037% | 91.94% | 93.94% |
| Naïve Bayes | 93.704% | 55.00% | 88.70% | 83.27% |
| Random Forest | 92.50% | 99.63% | 98.71% | 88.38% |
Table 6 Sensitivity, Specificity, Accuracy and AUC value of each classifier
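Accuracy is a class-size-weighted average of sensitivity and specificity, so the rows of Table 6 can be checked against the class sizes. The Python sketch below assumes 40 continuing patients (12.903% of 310) as the positive class and 270 discontinuing patients as the negative class; these counts are derived from the percentages reported for the dataset, and the function name is hypothetical.

```python
def accuracy_from_rates(sensitivity, specificity, n_positive, n_negative):
    """Accuracy implied by sensitivity, specificity and the two class sizes."""
    correct = sensitivity * n_positive + specificity * n_negative
    return correct / (n_positive + n_negative)


N_CONTINUE = 40       # 12.903% of 310 records (derived)
N_DISCONTINUE = 270   # remaining records (derived)

acc_logistic = accuracy_from_rates(0.70, 0.98148, N_CONTINUE, N_DISCONTINUE)
acc_forest = accuracy_from_rates(0.925, 0.9963, N_CONTINUE, N_DISCONTINUE)

# For the Naive Bayes row, the reported accuracy is reproduced only when
# 93.704% is applied to the 270 discontinuing patients and 55.00% to the
# 40 continuing patients.
acc_naive_bayes = accuracy_from_rates(0.55, 0.93704, N_CONTINUE, N_DISCONTINUE)
```

The logistic regression and random forest rows reproduce their reported accuracies of 94.52% and 98.71%. The Naïve Bayes row matches its reported accuracy only with the two rates applied to the opposite classes, which suggests, although the article does not say so, that its sensitivity and specificity may refer to a different positive class than the other rows.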
Based on the results of this study, it can be concluded that the duration of treatment for dental malocclusion is the main significant factor in predicting discontinuation of orthodontic treatment, while the factor treatment indicated had a slight effect on the results. The random forest model showed the highest accuracy and specificity, while the Naïve Bayes model showed the highest sensitivity in predicting discontinuation of the treatment. Thus, the classification-based approach with modern predictive algorithms gives robust results for orthodontic data.
Authors wish to acknowledge the support from the Faculty of Dental Sciences, University of Peradeniya for data collection and granting us access to utilize data for statistical modelling.
The authors declare that they have no conflict of interest.
There is no funding source.
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent was obtained from all individual participants included in the study.
©2020 Dharmasena, et al. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work non-commercially.