Prediction of Girolando cattle weight by means of body measurements extracted from images

The objective of this study was to analyze the body measurements of Girolando cattle, as well as measurements extracted from their images, to generate a model and understand which measures best explain cattle body weight. To this end, 34 Girolando cattle (two males and 32 females) were physically measured for the following traits: heart girth (HGP), circumference of the abdomen, body length, occipito-ischial length, wither height, and hip height. In addition, images of the dorsum and the body lateral area of these animals allowed measurements of hip width (HWI), body length, tail distance to the neck, dorsum area (DAI), dorsum perimeter, wither height, hip height, body lateral area, perimeter of the lateral area, and rib height. The measurements extracted from the images were subjected to the stepwise regression method and regression-based machine learning algorithms. The HGP was the physical measure with the strongest positive correlation with body weight. In the stepwise method, the final model generated R2 of 0.70 and RMSE of 42.52 kg and the equation: WEIGHT (kg) = 6.15421 * HWI (cm) + 0.01929 * DAI (cm2) + 70.8388. The linear regression and SVM algorithms obtained the best results, followed by discretization regression with random forests. The set of rules presented in this study can be recommended for estimating body weight in Girolando cattle, at a correlation coefficient of 0.71, by measurements of hip width and dorsum area, both extracted from cattle images.


Introduction
Information on the body weight of dairy cattle is important for producers to make decisions regarding nutrition, management, genetics, health, and the environment. There are studies on the influence of bovine weight on diet, reproduction, milk production, health, and welfare (Roche et al., 2009; Tasdemir et al., 2011). However, small producers may not be able to afford commercial scales, which can be expensive. Researchers have worked with equations to relate biometric measurements to body weight in cattle (Reis et al., 2008; Bretschneider et al., 2014; Franco et al., 2017). Heart girth is a measure highly correlated with weight and, therefore, to facilitate the weighing process, researchers have developed a measuring tape that carries a reference scale for reading the body weight of the animal (Sales et al., 2009; Abreu et al., 2015). However, as this tape requires direct contact of a person with the animal, it may cause stress and slow down the body weight measuring process.
Computing, mainly in the area of image analysis and interpretation, has been used as a tool to extract, from photos or videos, characteristics that predict the weight of dairy and beef cattle, fish, sheep, and pigs (Tasdemir et al., 2011; Ozkaya, 2013; Song et al., 2018; Mortensen et al., 2016; Saberioon et al., 2017; Menesatti et al., 2014; Jun et al., 2018). There are difficulties in the automatic processing of these images, mainly in relation to the extraction of the object of interest by means of segmentation (Noviyanto and Arymurthy, 2012). In some cases, such as in cattle operations, the environment in which data collection is carried out cannot always be directly controlled.
The objective of this study was to analyze the body measurements of Girolando cattle based on real measurements and those extracted from their images, to assemble a prediction model and understand which measures better explain body weight and how they influence weight prediction in dairy cattle. The purpose was to understand which bovine images allow for the best regression model for body weight prediction.

Material and Methods
The image dataset of the experiment contained 68 images of 34 Girolando cattle: one image of the dorsum and another of the body lateral area of each animal. In addition to the images, the main measures of length, height, and perimeter were taken physically: heart girth (HG P), circumference of the abdomen (CA P), body length (BL P), occipito-ischial length (OIL P), wither height (WH P), and hip height (HH P). The "P" subscript indicates that the origin of the measures is the animal's physical structure. These measurements were taken with a plastic measuring tape (Bovitec, Brazil) used for weighing dairy cattle by producers who do not have access to scales (Figure 1). The weight of each animal was recorded with a scale and annotated in the image dataset. For the construction of the image dataset, images and measurements were collected on May 24, 2017 in Campo Grande, MS (20° 38' 77.450" S latitude and −54° 60' 70.040" W longitude). Twelve Girolando dairy cattle (two males and 10 females) were measured and filmed. In addition, 22 Girolando cows were weighed and measured on August 2, 2017 in Terenos, MS (−20° 36' 73.24" South latitude and −54° 96' 68.17" longitude West). The physical measurements of the cattle body were collected (Figure 1); videos were recorded to assemble an image dataset for measurement extraction. Although the collections were carried out in two different places, the images were included in the same image dataset.
The first data acquisition was performed in the afternoon, under natural light, on a sunny day with a few clouds. The second data acquisition started at 8.00 h and stopped at 17.00 h, throughout a sunny day, so that no artificial light source was required. In both data acquisitions, the animals were individually identified with a wax marker stick, with the numbering corresponding to the weighing order, in addition to two colored lines drawn on the dorsum and two other colored lines drawn on the hind leg. The lines measured 30 cm internally and were used to calibrate the image so that measurements could be converted proportionally to real units (Figure 2). For the image acquisition, the equipment used was a DVR-MD-1004NS digital video recorder (MIDI JAPAN, China), with AHD 720p camera resolution and 1 TB recording capacity. The equipment was fixed to the trunk structure to acquire the images of the dorsum and to the walls of the fence to acquire the body lateral area images (Figure 3). The videos, recorded at 30 frames per second, were stored in the DVR and then transferred to computers for preprocessing to extract the best images. The images selected were the ones with the best visualization of the colored drawn lines and of the contours of the cattle for measurement extraction (Figures 4 and 5). In total, 68 images were selected, 34 of the dorsum and 34 of the body lateral area.
The images were subjected to ImageJ software (version 1.52, National Institutes of Health, USA) for measurement extraction. The software measured the hip width (HW I ), body length (BL I ), tail distance to the neck (TN I ), dorsum area (DA I ), dorsum perimeter (DP I ), wither height (WH I ), hip height (HH I ), area of the body lateral area (BLA I ), perimeter of the lateral area (PLA I ), and rib height (RH I ). The "I" subscript indicates that these measures were extracted from images.
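The calibration idea behind the 30 cm reference lines can be sketched as follows. This is only an illustration, not the procedure run inside ImageJ: the function names and all pixel values are hypothetical, and the sketch assumes the reference line lies in the same plane as the measured structure.

```python
import math

REFERENCE_LENGTH_CM = 30.0  # length of the colored lines drawn on each animal


def cm_per_pixel(p1, p2, reference_cm=REFERENCE_LENGTH_CM):
    """Scale factor derived from the two endpoints (in pixels) of a reference line."""
    pixel_length = math.dist(p1, p2)
    if pixel_length == 0:
        raise ValueError("reference line endpoints coincide")
    return reference_cm / pixel_length


def length_cm(pixel_length, scale):
    """Convert a length measured in pixels to centimeters."""
    return pixel_length * scale


def area_cm2(pixel_area, scale):
    """Convert an area measured in pixels to cm2 (the scale applies to both axes)."""
    return pixel_area * scale ** 2


# Hypothetical pixel measurements for one dorsum image.
scale = cm_per_pixel((100, 200), (220, 200))  # reference line spans 120 px
hip_width = length_cm(182, scale)
dorsum_area = area_cm2(87500, scale)
```

Lengths and perimeters scale linearly with the factor, whereas areas such as DA I and BLA I scale with its square.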
After extracting the measurements described above, the data were analyzed using the SAS ® software (Statistical Analysis System, University Edition) to identify erroneously recorded data or outliers, and to obtain the mean, median, standard deviation, standard error of the mean, and maximum and minimum values for each variable.
Obtaining too many variables, which in the current case correspond to measurements from bovine images, may make image acquisition inefficient or even unfeasible.
Prediction of Girolando cattle weight by means of body measurements extracted from images. Weber et al.

We thus sought to maximize the variance and analyze the (co)variances between the variables to identify the most significant ones. To this end, we estimated the correlation matrix, which allows the associations between the quantitative variables to be inferred, and thus the relationships between the measurements taken from the animals and the measurements extracted from the images of the same animals to be understood. Another way to evaluate the relationship between variables is to analyze the scatter diagram. Thus, the variable with the highest correlation with body weight is presented in a scatter diagram.
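A minimal sketch of how such a correlation matrix can be computed is given below, assuming Pearson correlation is the measure used. The column names and values are made up for illustration and are not the data of the experiment.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Nested dict with the pairwise correlation of every pair of columns."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical measurements for five animals (not the study data).
data = {
    "heart_girth_cm": [170, 175, 180, 185, 190],
    "hip_width_cm": [44, 47, 45, 50, 52],
    "weight_kg": [400, 430, 450, 480, 520],
}
matrix = correlation_matrix(data)
```

Reading the `weight_kg` row of the result ranks the candidate predictors by the strength of their linear association with body weight.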
The experiment led to the construction of a model with the best predictors from measurements taken from bovine images to predict body weight. In this sense, regression may offer either equations or lists of rules for this purpose. For each observation i, the value to be predicted is conditioned by the information contained in the variables X according to equation 1:
$$y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_p X_{ip} + \varepsilon_i \quad (1)$$
in which y is the predicted value, β is the coefficient associated with variable X (β_0 being the intercept), and ε is the random error.
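As an illustration of how the coefficients of such a linear model can be estimated, the sketch below fits ordinary least squares by solving the normal equations. It is an assumption that least squares is the estimator of interest here; the function name `fit_ols` and the toy data are hypothetical and are not taken from the study.

```python
def fit_ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bp] for y = b0 + sum(bj * xj)."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    k = len(X[0])
    n = len(X)
    # Augmented normal equations: (X'X | X'y)
    A = [
        [sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
        + [sum(X[i][a] * y[i] for i in range(n))]
        for a in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * w for v, w in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Toy data generated exactly from y = 2 + 3*x1 + 0.5*x2 (no noise).
rows = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y = [6.0, 8.5, 13.5, 15.5, 21.0]
coefficients = fit_ols(rows, y)
```

With noise-free toy data the fit recovers the generating coefficients; with real measurements the residuals play the role of the random error term.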
The stepwise method was used to find out the order of the predictors and their significance in the model generation. For the removal of non-significant predictors, we defined that any effect in the model that was not significant at the level of 0.35 was removed, and the algorithm would then proceed to the next step. However, the most significant addition was maintained, as long as it was significant at the entry level of 0.05. The stepwise method was performed with all predictors of measurements taken from the bovine images (HW I, BL I, TN I, DA I, DP I, WH I, HH I, BLA I, PLA I, and RH I) as independent variables, and the body weight of each animal as the dependent variable. From then on, regression algorithms were used to predict body weight from one or more independent variables, composed of the measurements extracted from images.
Among the tools in the area of artificial intelligence that may aid in machine learning tasks, we selected the software Weka 3.8, which began to be developed in 1993 by the Machine Learning Group of the University of Waikato in New Zealand (Witten and Frank, 2005). It aggregates machine learning algorithms used for data mining tasks, supporting data preprocessing, classification, regression, clustering, and association rules (Khan and Quadri, 2012). These methods allow a computer program to automatically analyze a large body of data and decide which information is most relevant (Quinlan, 1993).
The algorithms were: Linear Regression with model selection using the Akaike Information Criterion (AIC), SVM for regression (Smola and Scholkopf, 2004;Shevade et al., 2000), and Regression by discretization (Frank and Bouckaert, 2009) with Random Forests (Breiman, 2001). From the whole set of animal measurements, 66% were separated for training and the remainder for testing.
To verify which regression technique presented the best performance, each model was evaluated by means of the correlation coefficient, which measures the degree of correlation between predicted and actual values, and its residuals were evaluated by the mean absolute error (MAE) and the root mean square error (RMSE).
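The three evaluation measures can be written out as below. This is a sketch using their standard definitions, assumed here to match the values reported by the software; the weights are invented for illustration.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    """Square root of the average squared difference; penalizes large errors more."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def correlation_coefficient(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a, mean_p = sum(actual) / n, sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical test-set weights in kg (not the study data).
actual_kg = [400.0, 450.0, 500.0]
predicted_kg = [410.0, 440.0, 520.0]

mae = mean_absolute_error(actual_kg, predicted_kg)
rmse = root_mean_square_error(actual_kg, predicted_kg)
r = correlation_coefficient(actual_kg, predicted_kg)
```

MAE and RMSE are expressed in kg, which is what makes it possible to report them as a fraction of the average body weight.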

Results
Both the body measurements taken from the animals and those extracted from the images showed high correlations with body weight, and some among themselves (Table 1). This matrix provided a way to examine the interdependence of the data. The examination of the correlation matrix indicated that the independent variable HG P is the one with the highest bivariate correlation (0.88) with the dependent variable body weight, followed by CA P (0.79), HW I (0.65), BL I (0.58), and HH P (0.55). Among these five variables, HW I and BL I were measures extracted from images.
In the scatter diagram of HG P against body weight, the points lie close to the regression line (Figure 6). By means of analysis based on descriptive statistics (Table 2), we observed that the experiment included animals between 360 and 596 kg, with average weight of 473 kg and standard deviation of 62.8 kg.
In the stepwise method, the variables BL I, TN I, DP I, WH I, HH I, BLA I, PLA I, and RH I were evaluated and eliminated according to their contributions as they were added to or removed from the model. The final model (Table 3) retained HW I and DA I as predictors. From the equation established by the stepwise method, body weights were predicted for the animals of the experiment so that we could compare (Figure 7) the actual body weight and the estimated body weight at a 95% confidence interval.
The performance of the algorithms from the Weka package on the test set demonstrated that the linear regression and SVM algorithms obtained the best results. The difference in correlation coefficient between the two algorithms was 0.03. The regression by discretization with Random Forests presented a correlation coefficient of 0.62 and a mean absolute error of 41.56 kg, which corresponds to 8.79% of the average body weight, whereas RMSE showed a value of 50.94 kg, which corresponds to 10.77% of the average weight of cattle. The lowest mean absolute error was found for the linear regression, 38.46 kg, which corresponds to 8.13% of the average weight of cattle (Table 4).
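As a check on the percentages above, and assuming they are taken relative to the average body weight of 473 kg reported in the descriptive statistics, the arithmetic works out as:

```latex
\frac{41.56}{473} \approx 0.0879 \;(8.79\%), \qquad
\frac{50.94}{473} \approx 0.1077 \;(10.77\%), \qquad
\frac{38.46}{473} \approx 0.0813 \;(8.13\%)
```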
Equation 2 presents the model trained in this experiment by the linear regression algorithm, in which WEIGHT is the estimated body weight of Girolando cattle, HW I is the hip width, and DA I is the dorsum area, taken from the animal images.
The rules generated by the regression by discretization algorithm with the Random Tree algorithm were organized in tree form (Figure 8); the first attribute to be considered is the hip width on the main node, followed by the dorsum area. In this way, rules were nested down to the leaf node, which is where the weight limits and the instances that fit this set of rules are defined. Given the measurements of a new animal, for example, if the hip width is less than or equal to 45.51 cm and the dorsum area is less than or equal to 5469.64 cm2, the weight of that animal should be between 383.6 and 407.2 kg. Eight animals in the model were supported by these rules; six of them presented the first value of the weight limit, and two presented the second value.
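The single rule quoted above can be written as a nested condition. The sketch below encodes only that one branch; the remaining branches of the tree in Figure 8 are not given in the text, so the function (whose name is hypothetical) returns None for them. The dorsum area threshold is taken as cm2, since it is an area.

```python
def weight_interval_kg(hip_width_cm, dorsum_area_cm2):
    """Return the (low, high) weight interval for the one rule quoted in the text.

    Returns None when the animal falls in a branch that the text does not describe.
    """
    if hip_width_cm <= 45.51:  # root node: hip width
        if dorsum_area_cm2 <= 5469.64:  # second level: dorsum area
            return (383.6, 407.2)
    return None


# Hypothetical animals (not from the study).
small_animal = weight_interval_kg(44.0, 5000.0)
large_animal = weight_interval_kg(50.0, 5000.0)
```

Unlike a regression equation, the rule returns a weight interval rather than a point estimate, which follows from discretizing the target variable before building the tree.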

Discussion
The variables WH P and HH P have a high correlation (0.86), which was expected, since they correspond to the animal height. A high correlation between OIL P and BL P (0.81) was also expected, since the distance from the tail to the neck is included in the body length. Also regarding the measurements taken from the animals, the significant correlation among CA P, WH P, and HH P is similar to that reported by Heinrichs et al. (1992) and Pena et al. (2015).
The measurements DA I and DP I, as well as BLA I and PLA I, extracted from the images, present a high correlation. Measures HH I and WH I have a high correlation, similar to the same measurements taken from the animals. Regarding RH I, there is a high correlation with WH I and HH I, because it is contained in the height measures, that is, part of the height of the rump and of the withers corresponds to the height of the ribs: the sum of the height of the ribs and the height of the hind limbs results in the height of the rump, and likewise the sum of the height of the ribs and the height of the front limbs results in the height of the withers.
The HG P is clearly in strong positive correlation with body weight (Figure 6): as the heart girth increases, the weight increases, as other authors have observed (Heinrichs et al., 1992). In addition, the high standard deviation for HG P (Table 2) is largely due to the heterogeneity of the group analyzed, especially differences related to age, sex, and number of parturitions (for cows).
There is a correlation between the measurements extracted from the animal images and the live weight (Table 3). In the variables studied, the HW I extracted from the images was the one that presented the highest correlation with body weight (r = 0.65), which agrees with Bretschneider et al. (2014), who studied this variable in Holstein cattle. There are high correlations of body length (r = 0.58), length to the neck (r = 0.47), and dorsum area (r = 0.42) with body weight. Body weight was also correlated with area and the perimeter of the dorsum.
Although the coefficients of body weight regression in relation to BL I and TN I are significant in the correlation matrix, the improvement in model fit from the inclusion of these two predictors does not justify the cost of acquiring these measures. Considering the correlation coefficient, we observed that the prediction based on the equation generated by the stepwise method (Table 4) was 70.83, a value very similar to the one found by Reis et al. (2008) in relation to measurements of Gir and Holstein cows.
With the 95% confidence level established in the method, we observed that the estimated value of body weight varies close to the actual weight, as is the case of the animals 1, 5, 6, 10, 14, 16, 17, 22, 25, 31, and 32.
Hip width occupies the root node of the tree and the dorsum area is the second trait that most influences body weight (Figure 8), which agrees with the stepwise regression, since both techniques presented relevance of the same traits.
An analysis of the results of the stepwise regression method allowed us to evaluate the behavior of all variables extracted from the images at a correlation coefficient level of 0.70 and identified HW I and DA I as the variables that best explained the Girolando cattle weight, which agrees with several authors who reached the same value with measures taken from the animals. As both are measures related only to the bovine dorsum, further experiments with more data will be required to confirm whether an image of the upper part of the animals is enough to estimate their weight.
The experiments with the linear regression, SVM for regression, and regression by discretization with random forests algorithms show that they are useful for building models to predict body weight. However, it is necessary to validate the generated models, as well as to test other algorithms, to verify their appropriateness for the present purpose.
As future work, ranking algorithms should be tested as a technique for sorting cattle according to their body weight, which may be useful for both animal breeding and classification for slaughter.

Conclusions
In addition to the equation WEIGHT (kg) = 6.15421 * HW I (cm) + 0.01929 * DA I (cm 2 ) + 70.8388, the set of rules presented in this study may be recommended for estimating Girolando cattle body weight at a confidence level of 95% and a correlation coefficient of 0.71, by using measurements of the hip width and dorsum area, extracted from cattle images.
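The equation above can be applied directly once the two image measurements are available in the stated units. The function name and the input values below are hypothetical and serve only to show the arithmetic.

```python
def predict_weight_kg(hip_width_cm, dorsum_area_cm2):
    """Body weight estimate from the equation reported in the text.

    hip_width_cm: hip width extracted from the dorsum image, in cm.
    dorsum_area_cm2: dorsum area extracted from the dorsum image, in cm2.
    """
    return 6.15421 * hip_width_cm + 0.01929 * dorsum_area_cm2 + 70.8388


# Hypothetical animal: hip width of 50 cm and dorsum area of 6000 cm2.
estimated_weight = predict_weight_kg(50.0, 6000.0)
```

For these inputs the terms are 307.7105 kg from the hip width, 115.74 kg from the dorsum area, and the 70.8388 kg intercept, which sum to about 494.29 kg.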