Datasets used in the paper "A new deep learning calibration method enhances genome-based prediction of continuous crop traits"
In this repository you can find the four datasets used in this paper in two formats, CSV and RData.
Dataset 1. Maize grain yield prediction. As previously reported by Montesinos-Lopez et al. (2016), this dataset consists of a sample of 309 maize lines evaluated for three traits: anthesis-silking interval, plant height, and grain yield (GY). Each trait was evaluated in three optimal environments (denoted Env1, Env2 and Env3).
Dataset 2. Groundnut seed yield per plant (SYPP) prediction. The phenotypic dataset reported by Pandey et al. (2020) contains the phenotypic performance of 318 lines for various traits in 4 environments, denoted as Environment1 (ENV1): Aliyarnagar_Rainy 2015; Environment2 (ENV2): Jalgoan_Rainy 2015; Environment3 (ENV3): ICRISAT_Rainy 2015; Environment4 (ENV4): ICRISAT Post-Rainy 2015. The dataset is balanced, with each line included once in each environment, giving a total of 1,272 assessments.
Dataset 3. Chickpea biomass prediction. The phenotypic dataset reported by Roorkiwal et al. (2018) contains information for 315 lines evaluated in 6 environments (denoted as 1, 2, 4, 5, 6, 7) for biomass. The dataset is balanced, with all lines assessed in all environments, giving a total of 1,890 observations.
Dataset 4. Spring wheat data. The spring wheat data were available from the Global Wheat Program (GWP) at the International Maize and Wheat Improvement Center (CIMMYT) from elite yield trials (EYT) evaluated in four selection environments (denoted Bed5IR, EHT, Flat5IR, FlatDrip). The dataset includes the performance data from the 2016-2017 cycle for a total of 980 lines assessed in the four environments, giving 3,920 observations.
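The groundnut, chickpea, and spring wheat datasets are described as balanced, so the number of observations should equal the number of lines times the number of environments. The following is a minimal sketch of how that could be checked after loading one of the CSV files. The column names `Line` and `Env` are assumptions made for illustration and may not match the headers in the actual files; the toy data is invented and only demonstrates the check.

```python
import csv
import io
from collections import Counter

# Sizes stated in the dataset descriptions: (lines, environments, observations).
EXPECTED = {
    "groundnut": (318, 4, 1272),
    "chickpea": (315, 6, 1890),
    "spring_wheat": (980, 4, 3920),
}


def check_balanced(rows, line_col="Line", env_col="Env"):
    """Return (n_lines, n_envs, n_obs, is_balanced) for an iterable of dict rows.

    line_col and env_col are hypothetical column names; adjust them to
    whatever headers the CSV file actually uses.
    """
    rows = list(rows)
    counts = Counter((r[line_col], r[env_col]) for r in rows)
    lines = {line for line, _ in counts}
    envs = {env for _, env in counts}
    # Balanced: every line appears exactly once in every environment.
    is_balanced = (
        len(counts) == len(lines) * len(envs)
        and all(c == 1 for c in counts.values())
    )
    return len(lines), len(envs), len(rows), is_balanced


# Toy data (invented): 2 lines x 2 environments.
toy_csv = io.StringIO(
    "Line,Env,y\n"
    "G1,Env1,5.1\n"
    "G1,Env2,4.8\n"
    "G2,Env1,6.0\n"
    "G2,Env2,5.5\n"
)
toy_result = check_balanced(csv.DictReader(toy_csv))
print(toy_result)  # (2, 2, 4, True)

# The stated totals are consistent with a balanced design.
for name, (n_lines, n_envs, n_obs) in EXPECTED.items():
    assert n_lines * n_envs == n_obs, name
```

To apply the check to a real file, open it with `open(path, newline="")` and pass `csv.DictReader(f)` to `check_balanced`, setting `line_col` and `env_col` to the file's actual column names.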
...