Research Article Volume 7 Issue 5
Department of Statistics and Computer Science, University of Peradeniya, Sri Lanka
Correspondence: UNJ Bandara, Department of Statistics and Computer Science, University of Peradeniya, Peradeniya, Sri Lanka
Received: August 08, 2018 | Published: September 19, 2018
Citation: Bandara UNJ, Nawarathna LS. Socioeconomic factors affecting suicides in Sri Lanka. Biom Biostat Int J. 2018;7(5):405-411. DOI: 10.15406/bbij.2018.07.00238
Suicide is a serious public health problem in Sri Lanka, with strong emotional repercussions for survivors and for the families of victims. More than 4000 people in Sri Lanka died by suicide in 2016, and the country was listed as having the third highest suicide rate. This highlights the importance of identifying the factors that predict suicide. The main objectives of this study are therefore to determine the major factors associated with suicide and to predict the risk of suicide separately by age and gender. We analyzed data on social and environmental circumstances from 2005 to 2011, obtained from the Sri Lanka Police. Data from 2014 to 2016 were used for validation. Factor analysis was used to describe the variability among correlated variables. LASSO regression, Poisson regression and negative binomial regression models were used to predict total suicides. AIC values were used to identify the best Poisson and negative binomial regression models, and the best lambda value was used to identify the best LASSO regression model. All predictions assume that the prevailing conditions in the country affecting crime rates remain unchanged during the period. The mean absolute percentage error (MAPE) was calculated to assess the prediction accuracy of the proposed models. Model accuracies for the proposed LASSO, Poisson and negative binomial regression models are approximately 89.43%, 87.67% and 80.65% for males, and 76.22%, 92.81% and 90.58% for females, respectively. Based on MAPE values, the LASSO regression model was selected as the best model for males and the Poisson regression model as the best model for females.
Keywords: factor analysis, least absolute shrinkage and selection operator, Akaike information criterion, mean absolute percentage error
Many people understand suicide to mean that a person chooses to kill themselves; more precisely, suicide is the act of intentionally causing one’s own death (Suicide terminology, Wikipedia). People who die by suicide no longer want to go on living. Most suicidal people have some type of mental condition at that moment. Moreover, suicide is a sign of serious depression, and hence suicide and depression are strongly interrelated.1–4 Approximately 800,000 people die by suicide around the world every year, which is one person every 40 seconds. Suicide is therefore one of the most important global health problems, and it is a complex human behavior.5 Suicide occurs in many countries; however, the suicide rate is relatively low in high-income countries.6 Of all suicides worldwide, 78% occur in low- and middle-income countries. In Sri Lanka, more than 4000 people die by suicide annually. According to one report,7 Sri Lanka ranked 3rd among 172 countries in terms of the most suicide-prone countries in the world. Sri Lanka’s suicide rate has been very high over the last few decades: in 1995/96 it was 47.0 per 100,000 population, more than twice the global suicide rate.8 In 2016 the suicide rate of Sri Lanka was 28.8 per 100,000 population, while the global rate reported by WHO was 16 per 100,000 population, so the Sri Lankan rate was approximately twice the global rate.9,10
At the district level, most suicides were reported in rural areas. The highest numbers of suicides are reported in the Jaffna, Vavuniya, Monaragala and Polonnaruwa districts, whereas the Colombo and Galle districts have recorded the lowest numbers. The leading age range for suicide is 16 to 24 years; the suicide rate decreases after 30 years of age and increases again after 50 years of age.11 Teenagers have higher suicide rates: parental pressure to fit in socially and to perform academically, love affairs, and more time spent on mobile phones, laptops and social media have become major causes of teenage suicide. In addition, the male suicide rate is around 3.8 times the female rate.
This study provides the most current suicide statistics and examines the risk factors associated with suicide. Suicide is a statistically rare event, and it is difficult to predict which individuals will die by suicide.12,13 Poisson regression, negative binomial regression and LASSO regression were used as analysis methods. The main objectives of this study are to determine the major significant socioeconomic factors associated with suicide, to predict the future risk of suicide by gender, and to select the best model for suicide prediction.
This article is organized as follows. In Section 2, we describe the materials and methods used to analyze the suicide data from 2005-2011. Section 4 describes the results obtained from the study and Section 5 concludes with a discussion. All the computations in this article have been performed using the statistical programming language R.
This study is based on data on total suicides in Sri Lanka from 2005 to 2011 and from 2014 to 2016, collected from the Statistics Department of the Sri Lanka Police. The dependent variable is the total number of suicides, and the nine numerical independent variables (risk factors) considered in this study are listed in Table 1. The data set consists of counts and therefore contains no negative integers. Moreover, the explanatory variables are discrete.
Variable | Variable name
Y | Total suicides
X1 | Family disputes
X2 | Physical disabilities
X3 | Mental disorders
X4 | Loss love affairs
X5 | Economic problems
X6 | Addiction drugs
X7 | Problem with elders
X8 | Loss relationships
X9 | Employment problem
Table 1 Description of variables
A line graph was plotted to identify the pattern of variation in total suicides for both genders over the period 2005 to 2011. Factor analysis was then used to examine whether correlated observed variables show similar patterns with respect to total suicide deaths. We predicted total suicides using LASSO (Least Absolute Shrinkage and Selection Operator) regression, Poisson regression and negative binomial regression models.
LASSO regression model
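The LASSO regression model is used for variable selection, shrinkage and improving the accuracy of predictive models. It is a parametric method, which means that it is applied to a specific model that has to be postulated. The L1 penalty on the coefficients has the effect of shrinking the coefficient values (and the complexity of the model), allowing the coefficients of variables with a minor effect on the response to become exactly zero. Increasing the value of the lambda parameter reduces the magnitude of the coefficients and, for sufficiently large values, sets some of them to zero, which excludes the corresponding variables from the model. The LASSO coefficients minimize the quantity

$$\sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}\beta_j x_{ij}\Big)^2+\lambda\sum_{j=1}^{p}|\beta_j|,$$

where $y_i$ is the response for observation $i$, $x_{ij}$ is the value of predictor $j$ (here $p = 9$), and $\lambda \ge 0$ is the tuning parameter.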
To fit the LASSO model, the alpha value is specified as one (alpha equal to one gives the LASSO penalty, whereas alpha equal to zero gives ridge regression). A deviance plot was then produced, which shows the percentage of deviance explained; this quantity is analogous to R-squared in a linear regression model.
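The shrinkage behaviour described above can be illustrated with a minimal coordinate-descent sketch. It is not the authors' R code: it assumes centred predictors and response, uses the common scaling (1/2n) RSS + lambda * sum|beta_j| of the objective (other texts omit the 1/2n factor, which only rescales lambda), and runs on a small hypothetical data set. The names `soft_threshold` and `lasso_coordinate_descent` are illustrative.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: shrink z towards zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n) * RSS + lam * sum(|beta_j|) for centred X and y."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of predictor j with the residual that ignores predictor j.
            rho = 0.0
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
            norm_sq = sum(X[i][j] ** 2 for i in range(n))
            beta[j] = soft_threshold(rho / n, lam) / (norm_sq / n)
    return beta


# Hypothetical centred data: y depends strongly on x1 and weakly on x2.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.5], [1.0, -1.0], [2.0, 0.5]]
y = [3.0 * x1 + 0.1 * x2 for x1, x2 in X]

for lam in (0.0, 0.5, 6.0):
    print(lam, lasso_coordinate_descent(X, y, lam))
```

With lambda equal to zero the least-squares coefficients are recovered; a moderate lambda sets the weak coefficient to zero while shrinking the strong one; a large lambda removes both predictors.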
Poisson regression model
Poisson regression is a generalized linear model form of regression analysis and is sometimes known as a log-linear model. A characteristic of the Poisson distribution is that the mean and the variance are equal. In certain circumstances, however, the observed variance is greater than the mean (overdispersion). The suicide data set consists of counts and contains no negative integers. Moreover, the explanatory variables are discrete, and hence a Poisson regression model can be applied.
Only the significant variables from the training data set were used to predict the number of suicides.
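The equal mean and variance property mentioned above suggests a quick screen for overdispersion: compare the sample variance of the counts with their sample mean. The sketch below uses two hypothetical count series and an illustrative helper named `dispersion_ratio`; it is only a rough check, because the Poisson assumption strictly concerns the variance conditional on the covariates.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; values well above 1 hint at overdispersion."""
    return variance(counts) / mean(counts)


# Hypothetical count series with the same mean (4) but different spread.
steady_counts = [3, 5, 4, 6, 2, 4]
bursty_counts = [1, 12, 0, 9, 2, 0]

print("steady:", dispersion_ratio(steady_counts))
print("bursty:", dispersion_ratio(bursty_counts))
```

A ratio far above one, as in the second series, is the situation in which a negative binomial model is usually considered as an alternative to Poisson regression.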
Negative binomial regression
Negative binomial regression is a type of generalized linear model in which the dependent variable is a count of the number of times an event occurs. The model can be used when the dependent variable is an observed count that follows a negative binomial distribution, so that the possible values of the response variable are nonnegative integers. Negative binomial regression is fitted using maximum likelihood estimation. The model is a generalization of Poisson regression and is based on the Poisson-gamma mixture distribution (reference).
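As a hedged illustration of the Poisson-gamma mixture, the following uses the common NB2 parameterisation with a dispersion parameter $\theta$; the symbols $\mu$, $\theta$ and $\nu$ are assumptions introduced here and do not appear in the original text.

```latex
% Poisson-gamma mixture (NB2 parameterisation, assumed for illustration)
Y \mid \nu \sim \mathrm{Poisson}(\mu \nu), \qquad
\nu \sim \mathrm{Gamma}(\text{shape} = \theta,\ \text{rate} = \theta)
\quad\Longrightarrow\quad
Y \sim \mathrm{NegBin}(\mu, \theta),

\mathrm{E}(Y) = \mu, \qquad
\mathrm{Var}(Y) = \mu + \frac{\mu^{2}}{\theta}.
```

Because the variance exceeds the mean by $\mu^{2}/\theta$, the model accommodates overdispersed counts, and as $\theta \to \infty$ the variance tends to $\mu$ and the Poisson model is recovered.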
Model validation
The data from 2005 to 2011 were used for model fitting and the data from 2014 to 2016 were used for model validation. AIC (Akaike Information Criterion) values were used to identify the best model. AIC measures the relative quality of statistical models, and the best model has the lowest AIC value. A deviance plot was then produced, which shows the percentage of deviance explained. Moreover, the mean absolute percentage error (MAPE), also known as the mean absolute percentage deviation (MAPD), was calculated to assess the prediction accuracy of the proposed models.
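The MAPE is defined as

$$\mathrm{MAPE}=\frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{A_t-P_t}{A_t}\right|,$$

where $A_t$ is the actual value, $P_t$ is the predicted value and $n$ is the number of observations. The MAPE measures the absolute error as a percentage. After the three models were constructed, their accuracies were compared using the MAPE, with model accuracy taken as 100% minus the MAPE, and the model with the maximum accuracy was selected as the best model.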
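The MAPE and the accuracy derived from it can be computed in a few lines. This is a minimal sketch on hypothetical actual and predicted values, not the validation data of the study; the function names `mape` and `model_accuracy` are illustrative. The accuracy is taken as 100 minus the MAPE, which matches how the MAPE and accuracy columns of Table 4 relate to each other.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def model_accuracy(actual, predicted):
    """Accuracy in percent, defined here as 100 minus the MAPE."""
    return 100.0 - mape(actual, predicted)


# Hypothetical yearly totals and predictions (illustration only).
actual = [100, 200, 400]
predicted = [90, 220, 400]

print(round(mape(actual, predicted), 2))
print(round(model_accuracy(actual, predicted), 2))
```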
Data from 2005 to 2011 were considered for the preliminary analysis of male and female suicides, and a line graph was obtained using the training data set.
According to Figure 1, the number of male suicides is higher than the number of female suicides in every year, but male suicides decrease gradually from year to year. There is no significant change in female suicides during this period. The male suicide percentage is approximately 3.5 times the female suicide percentage, which implies that male suicides have a large impact on the Sri Lankan suicide rate. Factor analysis was used to identify variables in the data set that share the same pattern. Nine variables are included in this study. A small number of latent factors was found using factor analysis, and for each factor the variables with the highest positive loadings were selected. The number of factors underlying the observed variables was then identified and the factors were named accordingly (Table 2).
Figure 1 Line graph of the total number of male and female suicides, shown separately, for the period 2005-2011.
Variable | Factor 1 | Factor 2 | Factor 3
X1 | 0.83 | 0 | 0.5
X2 | 0.88 | 0.32 | 0
X3 | 0.86 | 0 | -0.25
X4 | -0.8 | -0.23 | 0.46
X5 | 0 | 0.89 | -0.39
X6 | 0.57 | 0.79 | 0
X7 | 0 | 0.92 | 0
X8 | -0.54 | 0 | 0.84
X9 | 0 | 0 | 0.96
Table 2 Description of three factor model
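The rule used to read Table 2, assigning each variable to the factor on which it has its highest positive loading, can be sketched as follows. The loadings are transcribed from Table 2; the helper name `assign_factor` is illustrative.

```python
# Factor loadings transcribed from Table 2: (Factor 1, Factor 2, Factor 3).
loadings = {
    "X1": (0.83, 0.0, 0.5),
    "X2": (0.88, 0.32, 0.0),
    "X3": (0.86, 0.0, -0.25),
    "X4": (-0.8, -0.23, 0.46),
    "X5": (0.0, 0.89, -0.39),
    "X6": (0.57, 0.79, 0.0),
    "X7": (0.0, 0.92, 0.0),
    "X8": (-0.54, 0.0, 0.84),
    "X9": (0.0, 0.0, 0.96),
}


def assign_factor(row):
    """Return the 1-based index of the factor with the highest loading."""
    return max(range(len(row)), key=lambda i: row[i]) + 1


groups = {}
for variable, row in loadings.items():
    groups.setdefault(assign_factor(row), []).append(variable)

for factor in sorted(groups):
    print(f"Factor {factor}: {', '.join(groups[factor])}")
```

Applied to Table 2, this rule groups X1 to X3 under Factor 1, X5 to X7 under Factor 2, and X4, X8 and X9 under Factor 3.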
The variables “Harassment of family”, “Physical disabilities” and “Mental disorders” are strongly associated with Factor 1 and hence belong to it; Factor 1 can therefore be named “Social problems”. The variables “Economic problems”, “Addiction to drugs” and “Problems with elders” are strongly associated with Factor 2, which can be named “Low personal income”. The variables “Employment problems”, “Loss love affairs” and “Loss relationships” are strongly associated with Factor 3, which can be named “Losing personal life goals”.
Under LASSO regression, total male suicides and total female suicides were considered for the period 2005 to 2011. There are no missing values in the database. The nine variables are the independent variables, and the response variable is the number of completed suicides. Figures 2–4 show the actual and predicted male and female suicides, respectively, obtained using the LASSO regression model.
Table 3 shows the AIC values for the models fitted using Poisson regression and negative binomial regression according to the backward selection method.
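Method | Gender | Model | AIC value
Poisson Regression | Male | Null | 5110.1
Poisson Regression | Male | Total~X1 | 4114.2
Poisson Regression | Male | Total~X1+X2 | 1057.2
Poisson Regression | Male | Total~X1+X2+X3 | 816.11
Poisson Regression | Male | Total~X1+X2+X3+X4 | 590.63
Poisson Regression | Male | Total~X1+X2+X3+X4+X5 | 435.52
Poisson Regression | Male | Total~X1+X2+X3+X4+X5+X6 | 389.37
Poisson Regression | Male | Total~X1+X2+X3+X4+X5+X6+X7 | 296.52
Poisson Regression | Male | Total~X1+X2+X3+X4+X5+X6+X7+X8 | 283.84
Poisson Regression | Male | Total~X1+X2+X3+X4+X5+X6+X7+X8+X9 | 254.44
Poisson Regression | Female | Null | 1793
Poisson Regression | Female | Total~X1 | 984.9
Poisson Regression | Female | Total~X1+X2 | 782
Poisson Regression | Female | Total~X1+X2+X3 | 469.22
Poisson Regression | Female | Total~X1+X2+X3+X4 | 278.18
Poisson Regression | Female | Total~X1+X2+X3+X4+X5 | 267.25
Poisson Regression | Female | Total~X1+X2+X3+X4+X5+X6 | 206.6
Poisson Regression | Female | Total~X1+X2+X3+X4+X5+X6+X7 | 105.94
Poisson Regression | Female | Total~X1+X2+X3+X4+X5+X6+X7+X8 | 107.76
Poisson Regression | Female | Total~X1+X2+X3+X4+X5+X6+X7+X8+X9 | 109.4
Negative Binomial | Male | Null | 186.38
Negative Binomial | Male | Total~X1 | 186.38
Negative Binomial | Male | Total~X1+X2 | 177.9
Negative Binomial | Male | Total~X1+X2+X3 | 176.98
Negative Binomial | Male | Total~X1+X2+X3+X4 | 173.29
Negative Binomial | Male | Total~X1+X2+X3+X4+X5 | 174.69
Negative Binomial | Male | Total~X1+X2+X3+X4+X5+X6 | 171.68
Negative Binomial | Male | Total~X1+X2+X3+X4+X5+X6+X7 | 169.02
Negative Binomial | Male | Total~X1+X2+X3+X4+X5+X6+X7+X8 | 165.41
Negative Binomial | Male | Total~X1+X2+X3+X4+X5+X6+X7+X8+X9 | 164.08
Negative Binomial | Female | Null | 155.49
Negative Binomial | Female | Total~X1 | 155.49
Negative Binomial | Female | Total~X1+X2 | 155.11
Negative Binomial | Female | Total~X1+X2+X3 | 151.31
Negative Binomial | Female | Total~X1+X2+X3+X4 | 145.72
Negative Binomial | Female | Total~X1+X2+X3+X4+X5 | 146.39
Negative Binomial | Female | Total~X1+X2+X3+X4+X5+X6 | 142.84
Negative Binomial | Female | Total~X1+X2+X3+X4+X5+X6+X7 | 107.94
Negative Binomial | Female | Total~X1+X2+X3+X4+X5+X6+X7+X8 | 109.76
Negative Binomial | Female | Total~X1+X2+X3+X4+X5+X6+X7+X8+X9 | 111.4
Table 3 AIC values for fitted models according to the backward selection method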
Table 3 lists the AIC values for the different models. For the male Poisson regression models, the minimum AIC value is 254.44, which corresponds to the full model. Since the best model is the one with the lowest AIC value, the full model is the best Poisson regression model for total male suicides. In each category, the selected model is the one with the minimum AIC value. The plots of male and female suicides for the fitted Poisson regression and negative binomial regression models are shown in Figures 4–7.
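The minimum-AIC rule can be applied mechanically to each method and gender in Table 3. In the sketch below the AIC values are transcribed from Table 3, and position k in each list is the number of predictors in the nested model Total ~ X1 + ... + Xk, with k = 0 denoting the null model. The names `aic_table` and `best_by_aic` are illustrative.

```python
# AIC values transcribed from Table 3; index k = number of predictors X1..Xk.
aic_table = {
    ("Poisson", "Male"): [5110.1, 4114.2, 1057.2, 816.11, 590.63,
                          435.52, 389.37, 296.52, 283.84, 254.44],
    ("Poisson", "Female"): [1793, 984.9, 782, 469.22, 278.18,
                            267.25, 206.6, 105.94, 107.76, 109.4],
    ("Negative binomial", "Male"): [186.38, 186.38, 177.9, 176.98, 173.29,
                                    174.69, 171.68, 169.02, 165.41, 164.08],
    ("Negative binomial", "Female"): [155.49, 155.49, 155.11, 151.31, 145.72,
                                      146.39, 142.84, 107.94, 109.76, 111.4],
}


def best_by_aic(aic_values):
    """Return (k, aic) for the nested model with the lowest AIC."""
    k = min(range(len(aic_values)), key=lambda i: aic_values[i])
    return k, aic_values[k]


for (method, gender), values in aic_table.items():
    k, aic = best_by_aic(values)
    print(f"{method:18s} {gender:6s} predictors = {k}, AIC = {aic}")
```

For the values in Table 3, the full nine-predictor model has the lowest AIC for males under both methods, whereas for females the lowest AIC occurs at the seven-predictor model (X1 to X7) under both methods.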
Table 4 shows the comparison of model accuracies of the fitted models.
Model | Male MAPE | Male model accuracy | Female MAPE | Female model accuracy
LASSO Regression | 10.57% | 89.43% | 23.78% | 76.22%
Poisson Regression | 12.33% | 87.67% | 7.19% | 92.81%
Negative binomial Regression | 19.35% | 80.65% | 9.42% | 90.58%
Table 4 Model comparison for both male and females
The LASSO regression, Poisson regression and negative binomial regression models have model accuracies of 89.43%, 87.67% and 80.65%, respectively, for males and 76.22%, 92.81% and 90.58%, respectively, for females. Based on these accuracies, the LASSO regression model was selected as the best suicide model for males, whereas the Poisson regression model was selected as the best suicide model for females.
In the last few years, the risk factors for completed suicide in Sri Lanka have not changed significantly. First, we found that the male suicide percentage is approximately 3.5 times the female suicide percentage. In this study, the main risk factors for completed suicide were grouped into three factors, namely physical problems, low personal income problems and losing personal life goals. Furthermore, LASSO regression, Poisson regression and negative binomial regression models were used to identify the significant variables associated with suicide and to predict total suicides. Among the three fitted models, the LASSO regression model was the best model for male completed suicides. For female suicides, the lowest MAPE value was obtained with the Poisson regression model, which is therefore the best of the three fitted models for female completed suicides.
None.
The authors declare that there is no conflict of interest.
©2018 Bandara, et al. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work non-commercially.