Prediction of pregnancy state from milk mid-infrared (MIR) spectroscopy in dairy cows

Lisa Rienesl, Philipp Pfeiffer, Negar Khayatzadeh, Astrid Köck, Laura Dale, Andreas Werner, Clément Grelet, Nicolas Gengler, Franz-Josef Auer, Christa Egger-Danner, Julie Leblois, Johann Sölkner
1 University of Natural Resources and Life Sciences, Vienna (BOKU), Division of Livestock Sciences, Department of Sustainable Agricultural Systems, Vienna, Austria
2 ZuchtData EDV-Dienstleistungen GmbH, Vienna, Austria
3 Regional association for performance testing in livestock breeding of Baden-Wuerttemberg (LKV Baden-Wuerttemberg), Stuttgart, Germany
4 Walloon Agricultural Research Center (CRA-W), Gembloux, Belgium
5 Université de Liège (ULg), Gembloux Agro-Bio Tech, Gembloux, Belgium
6 LKV Austria Gemeinnützige GmbH, Wien, Austria
7 Elevéo (awé groupe), Ciney, Belgium


Introduction
Pregnancy assessment is an essential tool for reproductive management on cattle farms (e.g. Balhara et al., 2013, Pohler et al., 2016, Hirpa et al., 2018). Ideally, a cow should calve every year, so the identification of pregnant and non-pregnant animals at an early stage is crucial (Hirpa et al., 2018). Early detection of pregnancy status also enables early detection and treatment of problems (Bekele et al., 2016). Fertility is the most frequent reason for the culling of cows in dairy farms, accounting for 24.2% of cullings in Austrian dairy farms in 2018 (Egger-Danner et al., 2018). Tools for pregnancy detection should be inexpensive and simple to apply under field conditions (Pohler et al., 2016). There are two types of methods for diagnosing pregnancy state: direct methods such as estrus detection, transrectal palpation and transrectal ultrasonography, and indirect methods such as the analysis of progesterone and pregnancy-associated glycoproteins in milk or blood (Balhara et al., 2013, Pohler et al., 2016). Pregnancy losses are another important factor to consider. According to a review by Santos et al. (2004), the average embryonic mortality rate in dairy cows was 12.8%, based on 14 studies. Humblot (2001) reported average early and late embryonic losses of 31.6% and 14.7%, respectively, in Holstein cows in 44 French herds after first insemination. For late embryonic and fetal losses, Santos et al. (2004) reported an average of 10.7%, based on 17 dairy farms. Pregnancy losses reduce the benefit of an early pregnancy diagnosis. Hence, repeated information about the pregnancy status of a dairy cow (e.g. at every test day) would be a truly useful tool for dairy farmers.
Mid-infrared (MIR) spectroscopy is the method of choice in standard milk recording systems for quality control and for determining milk contents including fat, protein, lactose and urea (Grelet et al., 2015, 2016). MIR spectral data can also be used to predict fine components of milk such as minerals (Toffanin et al., 2015) or fatty acids (Soyeurt et al., 2011). Moreover, studies have predicted various other traits and variables such as blood metabolites (Benedet et al., 2019) and methane emissions (Vanlierde et al., 2018). As it is well known that milk yield and milk composition change during pregnancy in dairy cows (Olori et al., 1997, Lainé et al., 2017), MIR spectral data could potentially be useful to predict the pregnancy state of dairy cows. There are few relevant studies on this subject, and they explored quite different approaches. Lainé et al. (2014) used residual spectra to detect pregnancy status and observed only the first 50 days after insemination. The reported prediction accuracies were very promising (sensitivity >0.99; specificity >0.84) but could not be reproduced on an independent data set (N. Gengler, 2020, University of Liege, Gembloux, Belgium, personal communication). Delhez et al. (2020) explored different modelling approaches for diagnosing pregnancy status from MIR spectra. In one strategy they used only single spectral records after insemination, where records after a successful insemination were considered 'pregnant' and records after an unsuccessful insemination were considered 'open'. For this strategy, sensitivity was 0.65 and specificity 0.56. In another strategy, seven different models based on stages after insemination were developed; sensitivities ranging from 0.57 to 0.75 and specificities from 0.52 to 0.74 were obtained.
The aim of this study was to develop a discriminant model to predict the pregnancy status from routinely recorded MIR spectral data, and to further provide probabilities of pregnancy for each test day. Pregnancy probabilities obtained in this way could provide extra information for farmers in the framework of routine milk recording. Two different approaches were evaluated. The novelty of the second approach was the exploration of separate prediction models for different lactation stages.

Data and data preparation
The data for this study came from the Austrian milk recording system for the period July 2014 to February 2019 and were kindly provided by Zuchtdata GmbH. Test day milk data contained information on breed, herd, parity, days in milk, milk components (fat, protein, urea, lactose), somatic cell count (SCC) and standardized MIR spectral data for the respective test days. Additionally, information on the exact insemination and calving dates was available. Test day records of Fleckvieh, Brown Swiss and Holstein Friesian cows between 3 and 305 days of lactation were included. On average, cows in the data set became pregnant at lactation day 93. Merging of the data sets and primary data preparation were done with the software SAS (SAS Institute Inc., 2017). Table 1 shows the number of records of the complete data set.
To define test day records as 'pregnant' or 'open', the pregnancy state of each cow was connected to the associated test day by the following procedure: gestation length was calculated as the date of re-calving minus the date of the latest insemination, which was defined as the successful insemination. Only records of cows with gestation lengths within the ranges implemented in the joint genetic evaluation of Austria, Germany and Czech Republic were included: Fleckvieh 275 to 305 days, Holstein 268 to 298 days, Brown Swiss 276 to 306 days (C. Fuerst, 2019, Zuchtdata GmbH, Vienna, personal communication). Test days without a confirmed date of next calving were excluded. Test day records before a successful insemination date were coded as 'open' and all test day records between the date of successful insemination and the date of re-calving were coded as 'pregnant'. This procedure is visualized in Figure 1. The distribution of all test day records by class of pregnancy status (open or pregnant) along the stage of lactation is displayed in Figure 2.
MIR spectra were collected in several Austrian milk labs with spectrometers from Foss instruments. These MIR spectra consist of 1,060 data points, which are the absorbance values of infrared light at different wavenumbers, ranging from 926 to 5,010 cm⁻¹. Spectral data from different machines and different periods were previously standardized to a common basis (Grelet et al., 2015). For the prediction models only selected areas of the spectra were used: 968.1 to 1,577.5 cm⁻¹, 1,731.8 to 1,762.6 cm⁻¹, 1,781.9 to 1,808.9 cm⁻¹ and 2,831 to 2,966 cm⁻¹ (Grelet et al., 2016). The 212 selected data points contain most of the usable information after removal of areas known to be non-reproducible between instruments or non-informative due to strong water absorption.
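The labelling rule for test day records described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' SAS code: the function name, argument names and the handling of boundary dates (a test day on the insemination date counts as 'pregnant') are assumptions made for the example.

```python
from datetime import date

# Accepted gestation lengths in days per breed, as stated in the text.
GESTATION_RANGE = {
    "Fleckvieh": (275, 305),
    "Holstein": (268, 298),
    "Brown Swiss": (276, 306),
}


def label_test_day(breed, test_day, last_insemination, next_calving):
    """Return 'open', 'pregnant', or None if the record is excluded.

    Hypothetical helper: names and boundary handling are assumptions.
    """
    if next_calving is None:
        return None  # no confirmed date of next calving
    gestation_length = (next_calving - last_insemination).days
    low, high = GESTATION_RANGE[breed]
    if not low <= gestation_length <= high:
        return None  # gestation length outside the accepted range
    if test_day < last_insemination:
        return "open"
    if test_day <= next_calving:
        return "pregnant"
    return None  # test day falls after re-calving


insemination = date(2017, 3, 1)
calving = date(2017, 12, 11)  # 285 days after the insemination
print(label_test_day("Fleckvieh", date(2017, 2, 1), insemination, calving))  # open
print(label_test_day("Fleckvieh", date(2017, 5, 1), insemination, calving))  # pregnant
```

Returning None for excluded records keeps the exclusion rules (missing calving date, implausible gestation length) in the same place as the labelling itself.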
In line with other relevant studies (Soyeurt et al., 2011, 2012, Grelet et al., 2016, Lainé et al., 2017, Mineur et al., 2017, Ho et al., 2019, Rienesl et al., 2019), first derivatives of the selected spectral values were taken (Savitzky-Golay filter). All further data preparation was done in RStudio (R Development Core Team, 2008). The first derivatives of the 212 selected spectral variables were additionally corrected for days in milk (DIM), according to Vanlierde et al. (2015): each first derivative value of the selected spectra was multiplied by a constant (i.e., 1), a linear (√3 × x) and a quadratic [√(5/4) × (3x² − 1)] modified Legendre polynomial (Gengler et al., 1999). This modification resulted in 636 (212 constant, 212 linear, 212 quadratic) spectral variables, which were finally used as predictor variables. The complete data set (403,863 test day records) was randomly split in half, by farm and pregnancy state, into a calibration (training) set and a validation (test) set. The calibration set was then balanced (1:1) in terms of pregnancy state by random down-sampling. The validation data set was kept unbalanced to reflect realistic conditions. For every test day record two additional variables were introduced: 'days pregnant' (test day date minus date of successful insemination) and 'days after insemination' (test day date minus date of latest insemination). The variable 'days after insemination' was needed to define the expected pregnancy stage of a cow at a certain test day, because in validation we assumed that the true pregnancy stage is unknown.
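The DIM correction can be illustrated with a short Python sketch. Two points are assumptions rather than statements from the text: DIM is mapped linearly from the range 3 to 305 onto [-1, 1] to obtain x, and the quadratic coefficient is read as √(5/4), the usual form for normalized Legendre polynomials. The function names are hypothetical.

```python
import math


def standardize_dim(dim, dim_min=3, dim_max=305):
    """Map days in milk linearly onto [-1, 1] (assumed standardization)."""
    return 2.0 * (dim - dim_min) / (dim_max - dim_min) - 1.0


def legendre_terms(x):
    """Constant, linear and quadratic modified Legendre polynomial values.

    The quadratic coefficient sqrt(5/4) is an assumed reading of the text.
    """
    return (1.0, math.sqrt(3.0) * x, math.sqrt(5.0 / 4.0) * (3.0 * x * x - 1.0))


def expand_with_dim(derivatives, dim):
    """Multiply each first-derivative value by the three polynomial terms.

    The result lists all constant terms first, then all linear terms,
    then all quadratic terms, so 212 inputs give 636 predictors.
    """
    terms = legendre_terms(standardize_dim(dim))
    return [term * value for term in terms for value in derivatives]


record = [0.01] * 212  # placeholder first-derivative values, not real data
predictors = expand_with_dim(record, dim=150)
print(len(predictors))  # 636
```

The constant block reproduces the original first derivatives, while the linear and quadratic blocks let a linear model fit coefficients that vary smoothly with stage of lactation.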

Methodology
Two different approaches of predicting pregnancy status were explored, considering the potential effects of stage of lactation and stage of pregnancy on milk composition and resulting MIR spectra patterns.

Approach 1: Single prediction model across the whole lactation and gestation
The first approach was to develop a single prediction model for all test day records, regardless of the lactation and gestation stage of the cow at the respective test day, similar to the study of Lainé et al. (2014).

Approach 2: Separate prediction models for each different (expected) pregnancy and lactation stage
The second approach was to produce separate models for different pregnancy and lactation stages (DIM). Test day records were clustered into 24 classes according to true or expected pregnancy stage and DIM [...] (Kuhn, 2008). A 10-fold cross-validation was used to fine-tune the model; the number of components was set automatically for every run (up to a maximum of 60 to avoid overfitting), and discrimination was done by class probabilities. Spectral values were centered and scaled. Indicators of model fit were sensitivity (proportion of pregnant cases correctly assigned as pregnant), specificity (proportion of open cases correctly assigned as open), balanced accuracy (mean of sensitivity and specificity) and Area Under the Receiver Operating Characteristic Curve (AUC). Model performance was evaluated with an external validation. The validation data set consisted only of data from farms that were not included in model building. The results presented below are means of 5 independent replicates per setting. The standard deviations of the indicators of model fit were typically very low in approach 1 (0.001 to 0.003) and ranged from 0.001 to 0.050 in approach 2, depending on the sample size in the respective class. Data sets, data processing and methodology were very similar to those of a study on mastitis detection from MIR spectroscopy by Rienesl et al. (2019), carried out within the framework of the same project.
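The indicators of model fit defined above can be computed from true labels and predicted pregnancy probabilities as in the following sketch. It is an illustration only: the function names are hypothetical, the 0.5 probability threshold is an assumption (the text states only that discrimination was done by class probabilities), and the AUC is computed with the rank-based (Mann-Whitney) formulation, which is not necessarily how the authors' software computes it.

```python
def fit_indicators(y_true, y_prob, threshold=0.5):
    """Sensitivity, specificity and balanced accuracy.

    y_true: 1 for 'pregnant', 0 for 'open'. The threshold is an assumption.
    """
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted_pregnant = prob >= threshold
        if truth and predicted_pregnant:
            tp += 1
        elif truth:
            fn += 1
        elif predicted_pregnant:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)  # pregnant cases assigned as pregnant
    specificity = tn / (tn + fp)  # open cases assigned as open
    return sensitivity, specificity, (sensitivity + specificity) / 2.0


def auc(y_true, y_prob):
    """Probability that a random pregnant record scores above a random open one.

    Ties count as one half. Quadratic in sample size, for illustration only.
    """
    pregnant = [p for t, p in zip(y_true, y_prob) if t]
    open_ = [p for t, p in zip(y_true, y_prob) if not t]
    wins = sum(
        1.0 if p > o else 0.5 if p == o else 0.0
        for p in pregnant
        for o in open_
    )
    return wins / (len(pregnant) * len(open_))


# Toy example, not data from the study.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
sens, spec, bal_acc = fit_indicators(y_true, y_prob)
print(round(sens, 3), round(spec, 3), round(bal_acc, 3), round(auc(y_true, y_prob), 3))
```

Unlike sensitivity and specificity, the AUC does not depend on the chosen threshold, which is why a model can show a high AUC overall and still perform poorly within a single lactation stage.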

Results and discussion
In this study, we examined 2 approaches to predict the pregnancy status of dairy cows from routinely recorded MIR spectra. The results of the first approach, a single prediction model for all cows and test day records, regardless of lactation and gestation stage, are displayed in Table 3. Both sensitivity (0.86) and specificity (0.84) were almost identical in calibration and validation. The AUC was 0.928, which indicates outstanding performance of the prediction model according to Lantz (2015). In the study of Lainé et al. (2014) sensitivity was higher (>0.99) and specificity (>0.84) almost the same as in the present study. However, the results are difficult to compare because of differences in methodology. In the reported approach of Delhez et al. (2020), who used single spectra after insemination, prediction accuracies were lower than in our results. Even if differences are expected given the studied populations and countries (Australia vs. Austria), the definition of 'open' was also different, which may have a considerable effect. To gain closer insight into how the prediction worked, the results of approach 1 were split up by lactation stage (Table 4). This breakdown revealed a large imbalance between sensitivity and specificity across the 5 lactation stages. In lactation stage 1, most of the test day records were open (only 2,515 out of 37,655 were pregnant). This was expected, as cows in our data became pregnant on average after 93 DIM. Sensitivity in the first lactation stage was 0.000 and specificity was 1.000. Hence, the model classified all open cases correctly as open, but was not able to classify a pregnant cow correctly as pregnant. More precisely, the model predicted only a single cow out of the 2,515 actually pregnant cows in lactation stage 1 to be pregnant. In lactation stage 2 the number of test day records was much more balanced in terms of open (17,756) and pregnant cases (23,111).
However, the majority of test day records (32,428) were predicted to be open and only 8,439 were predicted to be pregnant, which resulted in a sensitivity of 0.285 and a specificity of 0.895. In lactation stage 3 the proportion of open cases decreased strongly (<15%), and sensitivity (0.975) and specificity (0.086) changed dramatically compared to the previous lactation stages. Similar results were found in lactation stage 4. In lactation stage 5, where the proportion of open cases was only 1.7%, sensitivity was 0.999 and specificity 0.000. From these results it can be concluded that the model was not able to predict pregnant cases before the third month of lactation and, conversely, not able to predict non-pregnancy after the third month of lactation. Moreover, we assume that the model was to a large degree predicting the lactation stage, which is strongly linked to pregnancy state. The sensitivity of approach 1 was additionally split up by month of pregnancy (Table 5). We observed a very low sensitivity (0.380) in the first month of pregnancy and a moderate sensitivity (0.695) in the second month. In the third month of pregnancy sensitivity increased strongly to 0.95, and it was above 0.99 from month 4 onward. Consequently, a single prediction model for the full lactation and gestation lengths was not sufficient. This led us to approach 2, where separate prediction models for different lactation stages and (expected) pregnancy stages were developed. The results are displayed in Table 6. The sensitivities ranged from 0.494 (lactation stage 3, pregnancy day 1 to 30) to 0.995 (lactation stage 4, pregnancy day 211 to 240). Thus, differences were very large. The range of specificities was also very wide, from 0.512 (lactation stage 2, pregnancy day 1 to 30) to 0.884 (lactation stage 3, pregnancy day 121 to 180). Within lactation stage, the indicators of model fit increased in later (expected) gestation stages.
For example, in the first lactation stage, sensitivity increased from 0.890 for the first (expected) pregnancy month to 0.946 for the second (expected) pregnancy month and specificity from 0.690 to 0.750. In lactation stage 4, sensitivity increased from 0.553 to 0.995 and specificity from 0.527 to 0.820. This finding partially coincides with the results of Delhez et al. (2020), who reported sensitivity and specificity greater than 0.74 from the 180th day of pregnancy and lower values (sensitivities 0.57 to 0.71; specificities 0.52 to 0.66) in earlier pregnancy stages.
Within the classes of (expected) pregnancy days, indicators of model fit mostly decreased with later lactation stage. Sensitivities for the first (expected) pregnancy month were 0.890 in the first lactation stage, 0.620 in the second lactation stage, and 0.504 in the last lactation stage. These results indicate that prediction accuracy decreases for cows which become pregnant (much) later than the average cow in the data set.
In general, the results of approach 2 are very hard to compare with the other studies mentioned. Delhez et al. (2020) explored a strategy with 7 different modelling groups based on stages (days) after insemination, but regardless of lactation stage; accounting for lactation stage was the novel aspect of our study.

Conclusions
This work explored the use of routinely recorded MIR spectral data to predict the pregnancy status of dairy cows by developing and evaluating a discriminant model. Results indicate that prediction of pregnancy state is difficult because of the strong effect of lactation stage on the MIR spectrum and the fact that cows are typically open in early lactation and pregnant in late lactation. If a cow did not match this "common pattern", prediction was not satisfactory. Hence, a single prediction model for the full lactation and all periods of gestation was not sufficient. Developing separate prediction equations for stages of lactation and periods of expected pregnancy improved predictability of pregnancy status to some degree. Whether the prediction accuracies found in this study will be sufficient to provide farmers with an additional tool for fertility management needs to be explored in discussion with farmers and breeding organizations.