This study was conducted on 2049 eggs collected from commercial white layer hybrids, with the aim of predicting egg weight (EW) from egg quality characteristics: shell weight (SW), albumen weight (AW), and yolk weight (YW). Ridge regression (RR), multiple linear regression (MLR), and the regression tree method (RTM) were used to predict EW. The predictive performance of RR and MLR was evaluated using the coefficient of determination (R^{2}) and the variance inflation factor (VIF). The R^{2} values for RR and MLR were 93.15% and 93.4%, respectively, with no multicollinearity problem, as the VIF values were very low (between 1 and 2). As a visual, non-parametric technique, RTM based on the CHAID algorithm achieved a very high predictive accuracy of 99.988% in the prediction of EW. The highest EW (71.963 g) was obtained from eggs with AW > 41 g and YW > 17 g. Given its R^{2} of 99.988% in the prediction of EW, RTM could be recommended in practice over ridge regression and multiple linear regression, and might be a very valuable tool for the quality classification of eggs in poultry science.

In poultry production, animal products such as poultry meat and eggs are of cardinal economic significance: they are essential for the economic development of a country and for meeting human nutrient requirements worldwide. In egg production, the egg is both the basic product of poultry breeders and an inexpensive food for consumers, and its quality depends on internal (albumen weight and yolk weight) and external (shell weight) quality traits.

In recent years, RTM has been adopted extensively in the animal science field. The RTM was applied in the prediction of body weight and milk yield in different sheep and cattle breeds (

RTM is easy to apply to ordinal, nominal, and continuous variables. Among the data mining algorithms available for RTM (CART, CHAID, and Exhaustive CHAID), the CHAID algorithm is used to construct a decision tree for regression problems because it reveals non-linear and interaction effects among independent variables, and it is a favorable alternative to RR and MLR for continuous dependent and independent variables, as in this investigation. The CHAID algorithm uses merging, splitting, and stopping stages to construct a decision tree, and converts continuous variables into ordinal variables. It forms homogeneous groups (nodes) by recursively splitting nodes so as to maximize the variability among nodes (

Applications of RTM in poultry science are limited (

Data of this study have been taken from

The variables recorded on the 2049 eggs used in the experiment were all continuous (quantitative): shell weight (SW, g), yolk weight (YW, g), albumen weight (AW, g), and egg weight (EW, g).

In the current study, EW was taken as the dependent (target) variable, and YW, AW, and SW were taken as the independent (explanatory) variables, in order to predict EW via MLR, RR, and RTM (CHAID algorithm).

In general, MLR can be written in matrix notation as:
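For reference, the standard textbook matrix form of the model and its least squares estimator can be sketched as follows. The symbols are assumed here rather than taken from the source: Y is the n × 1 vector of egg weights, X is the n × (p + 1) design matrix (a column of ones followed by SW, AW, and YW), β is the vector of regression coefficients, and ε is the vector of random errors.

```latex
\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},
\qquad
\hat{\boldsymbol{\beta}}_{OLS} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}
```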

Ridge regression is used as a more effective method than ordinary least squares in the presence of multicollinearity. In RR analysis, the cross-product matrix of the independent variables (SW, AW, and YW) is centered and scaled so that its diagonal elements equal one.
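As a sketch of the usual ridge estimator (the Hoerl and Kennard form; the notation is assumed here, not taken from the source), let X and Y denote the centered and scaled independent and dependent variables, I the identity matrix, and k ≥ 0 the ridge parameter added to the diagonal of the cross-product matrix:

```latex
\hat{\boldsymbol{\beta}}_{RR} = (\mathbf{X}'\mathbf{X} + k\mathbf{I})^{-1}\mathbf{X}'\mathbf{Y},
\qquad k \ge 0
```

With k = 0 the expression reduces to the ordinary least squares estimator; larger values of k shrink the coefficients and trade a small bias for lower variance.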

Ridge regression is an alternative estimator with a lower mean square error. Its estimator is indicated by a parameter 0 ≤

in which

By the means of

As a tree-based model, RTM recognizes the best independent variables influencing the target variable (

The CHAID algorithm, which allows multiple splits of any node in a regression problem, has three steps: merging, splitting, and stopping (

The CHAID algorithm handles only nominal or ordinal categorical independent variables. For this reason, continuous independent variables are converted into ordinal independent variables before the following algorithm is applied. For a given set of break points _{1}, _{2},...,_{K −1}

When K is the preferred number of bins, for the estimation of the break points _{i}

For k = 0 to (K−1), set _{k}
_{k}
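The conversion of a continuous predictor into K ordinal categories can be illustrated with a minimal sketch. It assumes simple equal-frequency (rank-based) break points and is not the exact procedure implemented in SPSS; the function names `equal_frequency_breaks` and `assign_bin` and the sample albumen weights are hypothetical.

```python
import bisect


def equal_frequency_breaks(values, n_bins):
    """Return up to n_bins - 1 ascending break points at equal-frequency ranks."""
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for k in range(1, n_bins):
        # Rank (1-based) of the last observation in bin k, rounded up.
        rank = -(-k * n // n_bins)
        breaks.append(ordered[rank - 1])
    # Tied values can yield duplicate break points; keep each only once.
    return sorted(set(breaks))


def assign_bin(x, breaks):
    """Map x to category 1 if x <= first break, ..., last category if x > last break."""
    return bisect.bisect_left(breaks, x) + 1


# Hypothetical albumen weights (g), for illustration only.
albumen_weights = [26.5, 29.0, 31.2, 32.8, 33.5, 34.1, 35.0, 36.7, 39.4, 42.3]
breaks = equal_frequency_breaks(albumen_weights, n_bins=5)
categories = [assign_bin(w, breaks) for w in albumen_weights]
print(breaks)
print(categories)
```

Each value is assigned to the first category whose upper break point is at least as large as the value, so the resulting categories keep the order of the original measurements.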

A Bonferroni adjustment was applied in the CHAID-based RTM to obtain adjusted P values for the F statistics. The tree-based algorithm, which automatically prunes unnecessary nodes from the decision tree, uses the F significance test when the dependent variable is continuous. Ten-fold cross-validation was applied in the statistical evaluation.

Initially, Pearson correlations were estimated between pairs of egg traits. The predictive power of RR, MLR, and RTM was measured using the coefficient of determination (R^{2}, %), the proportion of the variability in EW explained by the model. IBM SPSS 22 was used for the statistical analyses.

In the presence of multicollinearity, RTM algorithms allow a considerably easier, visual interpretation of the data through decision trees than traditional methods such as MLR and RR. In terms of statistical performance, RR is traditionally preferred over multiple linear regression.

Significant positive correlations were found among the egg quality traits (P<0.01). The Pearson correlations estimated in the present egg data were as follows: SW and YW (r = 0.470), SW and AW (r = 0.539), YW and AW (r = 0.654), SW and EW (r = 0.642), YW and EW (r = 0.777), and AW and EW (r = 0.932).

To explain the total variability in EW, the collected egg data were analyzed using MLR. The MLR results are summarized in

SE - standard error; t - t test value; VIF - variance inflation factor.

SW - shell weight; YW - yolk weight; AW - albumen weight.

S = 2.01925; R-Sq = 93.4%; R-Sq(adj) = 93.4%.
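As a rough cross-check, the VIF values and the R-Sq of the MLR can be approximated from the reported Pearson correlations alone, because in standardized form the VIFs are the diagonal elements of the inverse predictor correlation matrix and R^{2} equals the sum of the standardized coefficients multiplied by the predictor-EW correlations. The sketch below assumes the correlations rounded to three decimals as printed, so it is not expected to match the SPSS output exactly; the helper `invert` is written here only for illustration.

```python
# Pearson correlations reported in the text (order: SW, YW, AW).
R_XX = [
    [1.000, 0.470, 0.539],
    [0.470, 1.000, 0.654],
    [0.539, 0.654, 1.000],
]
# Correlations of SW, YW, and AW with EW.
R_XY = [0.642, 0.777, 0.932]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]


inverse = invert(R_XX)
# VIF of each predictor: diagonal of the inverse correlation matrix.
vif = [inverse[i][i] for i in range(3)]
# Standardized regression coefficients: inverse(R_XX) times R_XY.
std_betas = [sum(inverse[i][j] * R_XY[j] for j in range(3)) for i in range(3)]
# Coefficient of determination in correlation form.
r_squared = sum(b * r for b, r in zip(std_betas, R_XY))

for name, v, b in zip(("SW", "YW", "AW"), vif, std_betas):
    print(f"{name}: VIF = {v:.3f}, standardized beta = {b:.3f}")
print(f"R^2 = {100 * r_squared:.1f}%")
```

Under these rounded inputs, all three VIFs fall between 1 and 2 and R^{2} comes out at about 93.4%, in line with the reported MLR results.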

Ridge regression, an alternative method to MLR, was also fitted to the collected data to explain the total variability in EW. The ridge model explained 93.15% of the variability in EW, which was very similar to the MLR results above (

VIF - variance inflation factor.

Ridge regression coefficient section for k = 0.005000.

A decision tree diagram was constructed via CHAID algorithm for obtaining detailed information on the independent variables significantly affecting EW (

Node 0, the root node at the top of the RTM diagram, represented all the studied eggs. The average EW for node 0 was 58.764 (S = 7.872) g from 2049 eggs. Node 0 was divided into eight child nodes (nodes 1-8) on the basis of AW. Among these eight nodes, nodes 1, 2, 4, 6, and 7 were terminal nodes in the RTM diagram drawn via the CHAID algorithm.

Node 1 (eggs with AW ≤ 27 g) had an average EW of 40.195 (S = 6.169) g (n = 174 eggs). Node 2 (eggs with 27 < AW ≤ 31 g) had an average EW of 50.065 (S = 2.903) g (n = 169 eggs). Node 3 (eggs with 31 < AW ≤ 33 g) had an average EW of 56.983 (S = 2.172) g (n = 235 eggs) and was split into nodes 9 and 10 on the basis of YW, which had a highly significant influence on the EW of eggs in node 3 (F = 52.243, df1 = 1, df2 = 233; Adjusted P<0.01). Node 9 (eggs with YW ≤ 14 g among eggs with 31 < AW ≤ 33 g) had an average EW of 56.083 (S = 2.189) g (n = 121 eggs), and node 10 (eggs with YW > 14 g among eggs with 31 < AW ≤ 33 g) had an average EW of 57.939 (S = 1.700) g (n = 114 eggs). Node 4, a terminal node of eggs with 33 < AW ≤ 34 g, had an average EW of 58.961 (S = 2.241) g (n = 415 eggs). Node 5 (eggs with 34 < AW ≤ 36 g; n = 389 eggs) had an average EW of 60.784 (S = 2.257) g and was split into nodes 11 and 12 on the basis of YW. Node 11 (eggs with YW ≤ 17 g among eggs with 34 < AW ≤ 36 g) had an average EW of 59.946 (S = 1.654) g (n = 299 eggs), whereas node 12 (eggs with YW > 17 g among eggs with 34 < AW ≤ 36 g) was heavier, with an average EW of 63.567 (S = 1.690) g (n = 90 eggs).

This indicates a profound effect of YW on the EW of eggs in node 5 (F = 328.139, df1 = 1, df2 = 387; Adjusted P<0.01). Node 11, significantly influenced by SW (F = 43.027, df1 = 1, df2 = 297), was divided into two child nodes, 15 and 16. The predicted EW averages of nodes 15 and 16 were similar, at 59.244 (S = 1.333) g and 60.438 (S = 1.682) g, respectively.
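The F statistic reported for a split can be approximately recovered from the summaries of its child nodes with a one-way analysis of variance. The sketch below assumes that S is the sample standard deviation (n − 1 denominator) and uses the rounded values for nodes 11 and 12 as printed; the function name `f_from_summary` is hypothetical.

```python
def f_from_summary(groups):
    """One-way ANOVA F statistic from (n, mean, sd) summaries of each group."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-group and within-group sums of squares.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df1 = len(groups) - 1
    df2 = total_n - len(groups)
    return (ss_between / df1) / (ss_within / df2), df1, df2


# Node 5 split on YW: node 11 (YW <= 17 g) and node 12 (YW > 17 g).
f_value, df1, df2 = f_from_summary([(299, 59.946, 1.654), (90, 63.567, 1.690)])
print(f"F = {f_value:.3f}, df1 = {df1}, df2 = {df2}")
```

The result is close to the reported F = 328.139 with df1 = 1 and df2 = 387; the small difference is attributable to rounding of the node means and standard deviations.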

The average EW of 61.629 (S = 2.213) g was obtained from node 6, a cluster of eggs with 36 < AW ≤ 38 g (n = 251 eggs). Node 7, consisting of a cluster of eggs with 38 < AW ≤ 41 g, provided the average EW of 64.983 (S = 2.045) g from 232 eggs.

Node 8, a cluster of eggs with AW > 41 g (n = 184 eggs), had an average EW of 70.125 (S = 3.001) g and was again split into two child nodes, 13 and 14, on the basis of YW. This shows that the statistically significant effect of YW on EW reappeared among the eggs in node 8 (F = 76.738, df1 = 1, df2 = 182; Adjusted P<0.01). Node 13, comprising eggs with AW > 41 g and YW ≤ 17 g, had an average EW of 68.680 (S = 2.991) g (n = 103 eggs). Node 14, containing eggs with AW > 41 g and YW > 17 g, had the heaviest average EW of 71.963 (S = 1.757) g (n = 81 eggs), whereas the lightest EW was produced by node 1.
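The node summaries are internally consistent: the size-weighted mean of the child nodes reproduces the mean of their parent node. A minimal sketch of this check, using the rounded node sizes and means reported in the text (the function name `pooled_mean` is hypothetical):

```python
def pooled_mean(nodes):
    """Return (total n, size-weighted mean) for a list of (n, mean) node summaries."""
    total_n = sum(n for n, _ in nodes)
    return total_n, sum(n * mean for n, mean in nodes) / total_n


# (n, mean EW in g) for child nodes 1-8 of the root node (node 0).
children_of_root = [
    (174, 40.195), (169, 50.065), (235, 56.983), (415, 58.961),
    (389, 60.784), (251, 61.629), (232, 64.983), (184, 70.125),
]
# (n, mean EW in g) for child nodes 13 and 14 of node 8.
children_of_node8 = [(103, 68.680), (81, 71.963)]

root_n, root_mean = pooled_mean(children_of_root)
node8_n, node8_mean = pooled_mean(children_of_node8)
print(f"root: n = {root_n}, mean EW = {root_mean:.3f} g")
print(f"node 8: n = {node8_n}, mean EW = {node8_mean:.3f} g")
```

The pooled values agree with the reported 2049 eggs and 58.764 g for node 0, and with 184 eggs and 70.125 g for node 8, up to rounding.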

There was very close agreement between the observed and predicted values obtained by RTM based on the CHAID algorithm, with an R^{2} of almost 100%.
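In a regression tree, the predicted value for an egg is the mean EW of the terminal node into which the egg falls. The rules described above can be written as a simple lookup, sketched below under two assumptions: the function name `predict_ew` is hypothetical, and the further split of node 11 on SW is omitted because its threshold is not given in the text, so the node 11 mean is returned instead.

```python
def predict_ew(aw, yw):
    """Predicted egg weight (g) from albumen weight and yolk weight (both in g)."""
    if aw <= 27:
        return 40.195                              # node 1
    if aw <= 31:
        return 50.065                              # node 2
    if aw <= 33:
        return 56.083 if yw <= 14 else 57.939      # nodes 9 and 10
    if aw <= 34:
        return 58.961                              # node 4
    if aw <= 36:
        # Node 11 (not split further here) and node 12.
        return 59.946 if yw <= 17 else 63.567
    if aw <= 38:
        return 61.629                              # node 6
    if aw <= 41:
        return 64.983                              # node 7
    return 68.680 if yw <= 17 else 71.963          # nodes 13 and 14


print(predict_ew(aw=42.0, yw=18.0))
print(predict_ew(aw=32.0, yw=14.0))
```

Because every egg in a terminal node receives the same prediction, the tree yields a small set of distinct predicted weights rather than a continuous range.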

In the current study, there were positive correlations among all independent variables (P<0.001), which confirmed the result of Ratherd et al. (2011), who reported a high correlation between internal and external egg quality traits in Japanese quails and acknowledged that multicollinearity occurs among independent variables in regression models. They applied principal component regression to overcome the multicollinearity problem. The benefits of ridge regression in the presence of multicollinearity have also been noted.

In many studies, the correlations between internal and external egg quality variables were reported to be strong, positive, and highly significant (P<0.01) (

Regression, correlation, and MLR analyses have been employed to identify the relationship between egg weight and these quality traits in different layer strains. Under multicollinearity, ridge regression could give more accurate estimates than MLR. Choosing the most effective statistical method is therefore the most important issue in estimating EW from egg quality traits. In comparison with the statistical methods mentioned above, RTM, which can be understood and interpreted more easily in visual form, has been reported not to be influenced by multicollinearity, outliers, or missing observations (

Egg quality is very important for poultry farming and the poultry sector. Regression tree analysis based on the CHAID algorithm, with a much higher predictive accuracy (R^{2} = 99.988%), is a powerful approach for detecting the relationship between egg weight and the internal (albumen and yolk weights) and external (shell weight) quality traits that are indicative of egg quality. The decision tree from the regression tree analysis shows that the highest egg weight (71.963 g) is obtained from eggs with albumen weight >41 g and yolk weight >17 g. Consequently, regression tree analysis is expected to find use in the poultry sector because, as a non-parametric technique, it makes no assumptions about the independent variables.

This paper was published as an abstract in VI. Balkan Animal Conference (BALNIMALCON 2013), October 3-5, 2013, in Tekirdag, Turkey.