Historically, the low efficiency of genetic improvement in camels, compared with other farm animals, has been attributed mainly to physiological factors, the traditional rearing system, and the long generation interval. However, one of the most critical natural barriers to rapid genetic progress in camel breeding is the difficulty of handling and restraining animals during phenotype measurement, particularly of live body weight, owing to the wild nature of the species and its large body size, especially at maturity (Kadim et al. 2008). With this in mind, alternative tools such as the weighing tape, visual appraisal, linear body measurements (Mahmud et al. 2014), and digital image processing (Khojastehkey et al. 2016) are currently used in place of direct weighing of large animals. In livestock, significant correlations between body size and body weight can be used to estimate the weight of animals via mathematical equations (Cannas and Boe, 2003; Wangchuk et al. 2017). Many studies have shown that chest circumference, body length, hip width, and shoulder height are the most reliable parameters for estimating live weight in domestic animals (Francis et al. 2002; Atta and El-Khidir, 2004; Afolayan et al. 2006; Durosaro et al. 2013; Iqbal et al. 2014; Bahashwan et al. 2016). Kadim et al. (2008) estimated the correlations of body weight of dromedary camels with abdominal circumference, chest circumference, and body length at 0.9, 0.91, and 0.89, respectively. Kohler et al. (2001) reported significant correlations of camel weight with chest circumference, hump circumference, and shoulder height, and proposed the following tape-measure equation to estimate camel weight from body dimensions:
Camel weight (kg) = shoulder height (m) × chest circumference (m) × hump circumference (m) × 50
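As a hedged illustration, the tape-measure equation above can be written as a small function. The function name and the example measurements are hypothetical, and the multiplier of 50 is the value commonly quoted for this formula; it is treated here as an assumption and exposed as a parameter.

```python
def camel_weight_kg(shoulder_height_m, chest_circ_m, hump_circ_m, factor=50.0):
    """Estimate live weight (kg) as the product of three body
    measurements (in metres) and a constant factor.

    `factor` defaults to 50, which is an assumption of this sketch.
    """
    for value in (shoulder_height_m, chest_circ_m, hump_circ_m):
        if value <= 0:
            raise ValueError("measurements must be positive lengths in metres")
    return shoulder_height_m * chest_circ_m * hump_circ_m * factor


# Hypothetical measurements for one animal (not data from this study).
estimate = camel_weight_kg(1.80, 2.00, 2.50)
print(f"Estimated weight: {estimate:.0f} kg")
```

Because the estimate is a plain product, an error of a few centimetres in any one measurement changes the result proportionally, which is one reason multi-trait statistical models are of interest.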
Several similar studies have shown that a regression model can accurately estimate the weight of livestock from their body dimensions (Mehta et al. 2010; Tsegaye et al. 2013). Machine learning (ML) provides tools by which human decision-making is simulated computationally, helping us to make decisions about objects, animals, or subjects more accurately and quickly (Du and Sun, 2006). Besides regression methods, artificial neural networks, support vector machines, random forests, and decision trees are among the newer machine learning tools that can be used to estimate the weight of animals. Each of these methods has its own advantages and disadvantages. For example, artificial neural networks were reported to be more accurate than a multiple regression model in estimating the weight of rabbits (0.71 compared to 0.66) by Salau et al. (2014), and likewise in estimating the tail weight of sheep (0.93 compared to 0.81) by Norouzian and Vakili (2016). Saberioon et al. (2018) reported that SVM, with 82% accuracy, was the most likely to classify fish (3 weeks of age) to the correct diet, while both LR and RF achieved good classification accuracy (75% and 70%, respectively), and k-NN displayed the lowest overall accuracy (40%). As reported by Van der Heide et al. (2019), naive Bayes showed the highest accuracy at birth and at 18 months of age, whereas regression showed the highest accuracy after first calving in predicting the individual survival rate of dairy heifers. Because many machine learning techniques are suitable for predicting a given trait, it is difficult to determine in advance which model will give the highest accuracy. Selection of the best model may be affected by many factors, such as the nature of the variable and the quantity and quality of the data. Researchers therefore rely on trial and error to determine the best model for data analysis (White et al. 2018).
In previous studies on estimating camel weight, comparisons of different machine learning methods have received little attention, and most studies compared different regression models. The main objective of this study was therefore to evaluate the feasibility of seven machine learning models for estimating the weight of dromedary camels from several live body measurements.
MATERIALS AND METHODS
Dataset and variables
In this study, 458 records from dromedary camels aged from birth to 8 months were used. The data were collected from private herds (n=50 camels), each animal being measured 2 or 3 times, and from the national research and development center of dromedary camels (Bafgh) (n=80 camels), each animal being measured 3 or 4 times. The dependent variable was body weight (BW), and the independent variables were head length (HL), muzzle girth (ML), neck length (NEL), chest girth (CHG), height at withers (WH), height at hump (HH), withers to pin length (WPL), body length (BL), tail length (TL), pin width (PW), abdomen width (AW), and abdomen to hump height (ABH). The measurement of biometric traits on dromedary camels is illustrated in Figure 1. The descriptive statistics of the variables are given in Table 1. The frequency distribution of BW is shown in Figure 2. From the original data set, 448 records were retained after quality control, in which the k-Nearest Neighbors method was used to eliminate outliers lying at a considerable distance from their nearest neighbors.
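The outlier screen described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually run in the study: the text does not give the number of neighbors or the distance cutoff, so `k`, `threshold`, the median-based rule, and the example rows are all hypothetical.

```python
import math
import statistics


def knn_outlier_mask(rows, k=2, threshold=3.0):
    """Flag rows whose mean distance to their k nearest neighbours
    exceeds `threshold` times the median of those mean distances.

    The cutoff rule is an assumption of this sketch.
    """
    if len(rows) <= k:
        raise ValueError("need more rows than neighbours")
    scores = []
    for i, a in enumerate(rows):
        # Euclidean distances from row i to every other row, ascending.
        dists = sorted(math.dist(a, b) for j, b in enumerate(rows) if j != i)
        scores.append(sum(dists[:k]) / k)
    cutoff = threshold * statistics.median(scores)
    return [score > cutoff for score in scores]


# Hypothetical (chest girth in cm, body weight in kg) pairs; the last
# row is deliberately inconsistent with the rest.
rows = [(80, 40), (82, 42), (85, 45), (88, 48), (90, 50), (92, 52), (150, 20)]
mask = knn_outlier_mask(rows)
kept = [row for row, is_outlier in zip(rows, mask) if not is_outlier]
print(kept)
```

In practice the variables would normally be standardized before distances are computed, so that traits measured on larger scales do not dominate the result.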
Figure 1 Measurement of biometric traits on dromedary camels
Table 1 Descriptive statistics, mean, standard deviation (SD), and coefficient of variation (CV) of each variable
BW: body weight; HL: head length; ML: muzzle girth; NEL: neck length; CHG: chest girth; WH: height at withers; HH: height at hump; WPL: withers to pin length; BL: body length; TL: tail length; PW: pin width; AW: abdomen width and ABH: abdomen to hump height.
Figure 2 The frequency distribution of body weight
Machine learning models
In this study, the following seven machine learning methods were used to estimate the weight of dromedary camels from birth to 240 days of age using body measurements.
1. Multi-variable linear model
The multivariable linear model is one of the simplest and most widely known machine learning methods. Traditional linear regression models the relationship between the dependent variable and one or more independent variables (simple or multivariable linear regression) using the best-fitting straight line. This method requires some strict assumptions, such as normality of the data and the absence of multicollinearity among the independent variables, among others (Aiken and West, 1991).
2. Random forests
Ensembles of regression trees, known as random decision forests or simply random forests (RF), are a flexible and easy-to-use method for classification and regression tasks. The input vector is fed through multiple decision trees, and the output values of all the trees are averaged to give a single output. Unlike linear models, random forests can capture nonlinear interactions between the independent and dependent variables (Breiman, 2001).
3. Support vector machine regression (linear, radial basis, polynomial)
The support vector machine (SVM) is a well-known machine learning algorithm that can be used for both classification and regression tasks. Support vector regression (SVR) is formulated as a convex minimization problem, namely finding the best hyper-tube that encloses most of the training samples. When nonlinear data are mapped into a high-dimensional space, the structure of the resulting dataset can be captured more efficiently. In effect, the kernel method extends simple linear machine learning to nonlinear problems. However, each kernel function has its own characteristics, so support vector regression based on different kernels has different generalization power. In this study, we considered three well-known kernel functions for the SVM algorithm, i.e., the linear, radial basis function (RBF), and polynomial kernels. Each kernel function has specific parameters that must be tuned to obtain the best model (Drucker et al. 1997).
4. Bayesian regularized neural network
Bayesian regularized neural networks (BRNN) are powerful mathematical models that have been used for classification and regression tasks and provide nonlinear models that can achieve better performance than linear models. BRNNs are more robust than standard back-propagation neural networks and can reduce or eliminate the need for lengthy cross-validation (Burden and Winkler, 2008).
5. Extreme learning machine
The extreme learning machine (ELM) is an extended version of the single hidden layer feed-forward network (SLFN) that requires neither iterative learning steps nor tuning of the initial parameters. The weights of the first layer are set randomly and do not change during the training and prediction phases. The weights between the hidden layer and the output layer can be trained very quickly (Huang et al. 2011).
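To make the three kernel functions named under support vector machine regression more concrete, the sketch below writes each one as a plain function of two feature vectors. The parametrization (`degree`, `scale`, `offset`, `sigma`) is one common convention and is an assumption here; the function names and example vectors are hypothetical.

```python
import math


def linear_kernel(x, z):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, scale=1.0, offset=1.0):
    """(scale * <x, z> + offset) ** degree."""
    return (scale * linear_kernel(x, z) + offset) ** degree


def rbf_kernel(x, z, sigma=1.0):
    """exp(-sigma * ||x - z||^2); equals 1 when x == z."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sigma * sq_dist)


# Hypothetical standardized body measurements for two animals.
x = (1.0, 2.0)
z = (3.0, 4.0)
print(linear_kernel(x, z))
print(polynomial_kernel(x, z, degree=2))
print(rbf_kernel(x, z, sigma=0.5))
```

The parameters in the signatures are the kind of kernel-specific settings that have to be tuned, for example by cross-validation, before the models are compared.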
Several evaluation measures were used to assess the performance of the developed models and to find the best model for predicting the body weight of camels. We considered five traditional regression evaluation measures: the coefficient of determination (R2), mean absolute error (MAE), root mean squared error (RMSE), mean squared error (MSE), and mean absolute percentage error (MAPE) (Celik and Yilmaz, 2017). We used k-fold cross-validation to optimize the hyper-parameters of each learning model. K-fold cross-validation is a re-sampling method used to evaluate a model on a limited data sample. It divides the whole data set into k partitions of equal size. In each step, a single fold is held out as validation data and the remaining k-1 folds are used to train the model. This process is repeated k times, so that each of the k folds is used exactly once as validation data. The k results are then combined to produce a single estimate. A value of k=10 is typical in applied machine learning, and the same value was used in this study (Refaeilzadeh et al. 2009). Figure 3 illustrates the procedure of the proposed method. We used R software version 3.5.2 and the caret package version 6 for machine learning, regression modeling, and statistical analysis (Kuhn, 2012).
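The five evaluation measures and the fold construction described above can be sketched in a few lines. This is an illustration only, not the caret code used in the study. It assumes R2 is computed as one minus the ratio of residual to total sum of squares, expresses MAPE as a fraction rather than a percentage, and omits the shuffling of records that would normally precede the split. All names and example values are hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, MAE, MSE, RMSE and MAPE for paired observations."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    mse = ss_res / n
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        # Fractional error; observed values must be non-zero.
        "MAPE": sum(abs(e / o) for e, o in zip(errors, observed)) / n,
    }


def k_fold_indices(n_samples, k=10):
    """Yield (train, validation) index lists so that every index is
    used for validation exactly once."""
    indices = list(range(n_samples))
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# Hypothetical observed and predicted weights in kg.
metrics = regression_metrics([100.0, 120.0, 150.0, 200.0],
                             [110.0, 115.0, 150.0, 190.0])
folds = list(k_fold_indices(448, k=10))
print(metrics, [len(validation) for _, validation in folds])
```

With 448 records and k=10 the folds cannot be exactly equal, so this sketch lets fold sizes differ by at most one record.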
RESULTS AND DISCUSSION
The boxplots of all independent variables are shown in Figure 4. The results indicated that none of these variables was normally distributed, which suggests that linear models may not be appropriate for estimating the weight of camels. The correlations among the studied variables are presented in Figure 5.
Figure 3 The machine learning flowchart for prediction of body weight
Table 2 Summary of the tested models
BRNN: Bayesian regularized neural networks; ELM: extreme learning machine; RF: random forest; LSVM: support vector machine with linear kernel; PNLSVM: support vector machine with polynomial kernel; RNLSVM: support vector machine with radial basis kernel and LM: linear model.
Figure 4 The box plots of predictors
Figure 5 The estimated correlations among variables
Body weight was correlated with HL (r=0.82), ML (r=0.70), NEL (r=0.84), CHG (r=0.91), WH (r=0.83), HH (r=0.87), WPL (r=0.91), BL (r=0.93), TL (r=0.43), PW (r=0.79), AW (r=0.23), and ABH (r=0.18). Except for TL, AW, and ABH, the predictors were well correlated with BW (r≥0.70; P<0.05). Among the predictive variables, the highest correlation was 0.96, observed between WH and HH as well as between AW and ABH. The high and significant correlations between the predictive variables and camel weight confirmed that mathematical models can be developed to estimate camel weight from biometric measurements. The evaluation criteria of the seven machine learning methods are presented in Table 3. The mean coefficient of determination was higher than 0.90 in all models, and all studied models had acceptable accuracy for predicting the weight of dromedary camels. The MAE ranged from 6.83 (PNLSVM) to 8.55 (linear) among models, and the mean MAPE varied from 0.07 (PNLSVM) to 0.11 (linear). The lowest MSE was 81.12 for PNLSVM, and the highest was 122.92 for the linear model. The RMSE of the PNLSVM model (8.93) was the lowest among all compared models. The 10-fold cross-validation results for the evaluation criteria of the PNLSVM model, including R2, RMSE, MAE, MAPE, and MSE, are presented in Figure 6. Although the R2 values of the PNLSVM, RNLSVM, RF, and BRNN methods were in the same range, the results showed that the PNLSVM model was the best according to all evaluation criteria, with the highest R2 and the lowest RMSE, MAE, MAPE, and MSE. The variance of the estimates is inversely related to the stability of the model, so that a lower variance of the estimates indicates greater stability. In this respect, the linear, ELM, and LSVM models had the highest variance and the lowest stability among all studied models.
In the present study, the linear, ELM, and LSVM models also had the lowest R2 among the studied models, which confirmed that linear models were less efficient than nonlinear models in predicting camel weight from biometric traits. The variance of the camel weights predicted by the seven machine learning models was not the same, and the variance of the predictions of the PNLSVM model was lower than that of the others (Figure 6).
Table 3 The evaluation criteria of the seven machine learning models
RF: random forest; ELM: extreme learning machine; RNLSVM: support vector machine with radial basis kernel; LSVM: support vector machine with linear kernel; PNLSVM: support vector machine with polynomial kernel; BRNN: Bayesian regularized neural networks.
R2: coefficient of determination; MAE: mean absolute error; MAPE: mean absolute percentage error; MSE: mean squared error and RMSE: root mean squared error.
Figure 6 10-fold cross-validation for coefficient of determination (R2); root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) by the polynomial kernel (PNLSVM)
This result suggested that the PNLSVM model had the highest performance among the studied models for predicting the body weight of dromedary camels. In the present study, among all body measurements, BL, WPL, and CHG were the best predictors of the body weight of dromedary camels. Many studies have confirmed that CHG, BL, hip width, and shoulder height are the most reliable parameters for predicting the live weight of domestic animals (Atta and El-Khidir, 2004; Afolayan et al. 2006). Kohler et al. (2001) reported significant correlations of camel weight with CHG, hump girth, and shoulder height. Kadim et al. (2008) reported significant correlations of the body weight of dromedary camels with CHG (0.91) and BL (0.89). These reports are consistent with the results of the present study. The feature importance of the predictors under the PNLSVM method is depicted in Figure 7. BL, WPL, and CHG had the greatest effect on the prediction of BW, while AW and ABH had the lowest importance. Similar high correlations between biometric variables and weight have been reported in other domestic animals. The correlation of CHG with the live weight of calves (Bahashwan et al. 2016) and dairy cattle (Francis et al. 2002) was reported as 0.91 and 0.96, respectively. Also, the correlations of the live weight of Kajli goats with shoulder height, CHG, and BL were reported as 0.81, 0.91, and 0.85, respectively, by Iqbal et al. (2014).
Figure 7 Variable importance by polynomial kernel (PNLSVM) method
In the present study, AW, TL, and ABH were not included in the development of the final prediction model because of their low correlations with BW. Contrary to the current observations, Kadim et al. (2008) reported a high and significant correlation between body weight and abdomen girth (0.9) in dromedary camels. A correlation of 0.86 between BW and abdomen girth was also reported by Khojastehkey et al. (2019) in Kalkoohi camels. The contradictory reports of the correlation between these two traits may be related to how the traits are defined, how they are measured, and the condition of the studied animals. Although the accuracy of the seven studied models for estimating the weight of dromedary camels differed, the coefficient of determination (R2) of all models was high, and all models could predict the BW of dromedary camels accurately. The PNLSVM model, followed by the BRNN, RNLSVM, and RF models, provided more accurate predictions of BW than the linear model. After the PNLSVM model, the BRNN model had higher accuracy and lower error than the other machine learning models studied here. Although the accuracy of the BRNN model was equal to that of PNLSVM, the MSE and RMSE of the BRNN model were higher than those of PNLSVM. Therefore, in general, PNLSVM is superior to BRNN in estimating the weight of dromedary camels. The relation between the observed body weight and that predicted by the PNLSVM model is visualized in Figure 8. The difference between the observed weights and the weights predicted by the PNLSVM model is negligible, which shows that this model performs well in predicting the weight of dromedary camels.
Figure 8 q-q plot between the observed and predicted body weight by PNLSVM method
Several studies have compared linear and machine learning methods, such as artificial neural networks, Bayesian models, support vector machines, and random forests, for predicting the performance of domestic animals for different productive and reproductive traits (Saberioon et al. 2018; Van der Heide et al. 2019). The coefficient of determination of the linear model in the present study (0.93) was close to that of Tsegaye et al. (2013) and to that of Huma and Iqbal (2019) (0.92). In contrast, Mehta et al. (2010) reported lower accuracy (0.66) than this study when using linear equations to estimate the weight of Indian camels. In this study, the coefficients of determination (R2) of the models were close to those reported by Huma and Iqbal (2019) for weight prediction of Baluchi sheep using SVM and RF methods (R2=0.98). However, the R2 values of the nonlinear models in the present study were higher than those in the study of Ali et al. (2015), who used artificial neural network and decision tree algorithms to predict the live weight of Harnai sheep with an accuracy of 0.81 to 0.84.
This study employed seven machine learning models, namely the linear model, ELM, LSVM, RF, RNLSVM, PNLSVM, and BRNN, to predict the BW of dromedary camels using various body measurements. The results showed that the machine learning methods performed better than classical regression in estimating the weight of camels, and that the PNLSVM model was superior to the others in terms of higher accuracy, lower error, and greater stability. Although the final model proposed for estimating the weight of camels was very efficient, it is not interpretable, and herdsmen cannot easily use it to estimate the weight of camels.
We sincerely thank the staff of the National Research and Development Center of dromedary camels (Bafgh) and the camel breeders.