Khaoula El Hassouni, Muhammad Afzal, Kim A. Steige, Malte Sielaff, Valentina Curella, Manjusha Neerukonda, Stefan Tenzer, Detlef Schuppan
Abstract
Wheat is an important staple crop: its proteins contribute to human and animal nutrition and are important for its end-use quality. However, wheat proteins can also cause adverse reactions in a large number of people. We performed a genome-wide association study (GWAS) on 114 proteins that were quantified by LC-MS-based proteomics in 148 wheat cultivars and expressed in an environmentally stable manner (heritability > 0.6). For 54 proteins, we detected quantitative trait loci (QTL) that exceeded the Bonferroni-corrected significance threshold and explained 17.3–84.5% of the genotypic variance. Proteins of the same family often clustered at very close chromosomal positions or at the potential homeolog. Major QTLs were found for four well-known glutenin and gliadin subunits, and the QTL segregation pattern for the protein annotated as high molecular weight glutenin subunit Dx5 could be confirmed by SDS gel electrophoresis. For nine potentially allergenic proteins, large QTLs could be identified, and their allele frequencies open the possibility of marker-based selection for low protein abundance once the relevance of these proteins for human health has been conclusively demonstrated. One potential allergen was introduced at the beginning of the 1980s and may be linked to the cluster of resistance genes introgressed on chromosome 2AS from Triticum ventricosum. The reported sequence information for the 54 major QTLs can be used to design efficient markers for future wheat breeding.
Introduction
Wheat (Triticum aestivum ssp. aestivum) is the most widely grown crop and a major component of the human diet worldwide. This staple crop is one of the most important sources of energy and on average provides 20% of the total protein and calories in human nutrition. Wheat is consumed in many different forms, and each type of end-product requires a particular quality based on the viscoelastic properties of the dough, which are mainly influenced by the amount and composition of gluten. Gluten accounts for approximately 80% of the total protein in the grain and can be divided into gliadins and glutenins. Glutenins are classified into high and low molecular weight subunits (HMW-GS and LMW-GS), which are encoded by the loci Glu-1 and Glu-3, respectively. Differences between allele pairs in glutenin subunits have a strong influence on the end-use quality. For example, several studies have shown that the alleles Dx5 + Dy10 (Glu-D1d) are associated with high quality, whereas Dx2 + Dy12 (Glu-D1a) lead to poor quality. Consequently, wheat breeders have intensively selected for specific combinations of HMW-GS since the pioneering work of Payne et al. In addition to their effect on end-product quality, gluten and non-gluten proteins, such as wheat amylase trypsin inhibitors (ATIs), are also associated with various human health disorders, such as celiac disease, allergic reactions, and non-celiac wheat sensitivity. However, apart from improving gluten quality because of its impact on end-product quality, targeting wheat allergens has never been a goal of breeding programs. For instance, the abundance of many allergenic proteins or ATIs in wheat cultivars released in the last century has not changed. While only a few proteins have been investigated in detail in recent decades, recent developments in mass spectrometry-based proteomics have made it possible to measure hundreds of proteins in a single sample. Afzal et al. analyzed the flour proteome of 15 spelt and wheat cultivars grown in three different locations and identified 3050 proteins, including 300 proteins with moderate-to-high heritability (>0.4). However, to our knowledge, no study has so far investigated the genetic architecture of the large number of proteins that have been discovered with modern proteomic tools.

Therefore, we performed a GWAS to investigate the genetic architecture of 114 proteins quantified using liquid chromatography-mass spectrometry (LC-MS)-based label-free quantitative (LFQ) proteomics in 148 bread wheat cultivars grown in three environments. In addition, we investigated the genetic and temporal trends of the alleles of some major QTLs associated with relevant proteins, and we provide the sequence information for relevant QTLs so that molecular markers can be designed.
Results
In a previous study, we measured 756 proteins in aqueous extracts from the whole-grain flour of 148 wheat cultivars grown in three environments. Only 114 of these 756 proteins had a stable expression across environments in most wheat cultivars with a heritability larger than 0.6. For these 114 proteins, we performed a GWAS and detected QTLs for 54 proteins that exceeded the Bonferroni-corrected significance threshold (Figure 1 and Figure 2). For all these 54 proteins, a single major QTL explaining 17.3 to 84.5% of the genotypic variance was identified (Figure 2b, Table 1). For 24 proteins, the identified QTLs explained >50% of the genotypic variance (Figure 2b). In contrast, for 60 proteins, no marker-trait association was detected that exceeded the Bonferroni-corrected significance threshold.
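The Bonferroni correction used above divides the family-wise significance level by the number of tests performed. The following is a minimal sketch of that rule only, not the analysis pipeline of this study; the function names, marker names, and p-values are hypothetical, and the number of markers tested in the study is not assumed here.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-marker p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_markers(p_values, alpha=0.05):
    """Return the marker names whose p-value falls below the Bonferroni cutoff.

    p_values: dict mapping marker name -> p-value of the marker-trait test.
    """
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return [marker for marker, p in p_values.items() if p < cutoff]


# Hypothetical p-values for illustration only (not data from this study).
example = {"snp_a": 1e-9, "snp_b": 0.004, "snp_c": 0.2, "snp_d": 0.03}

hits = significant_markers(example)

# Height of the threshold line on the -log10(p) axis of a Manhattan plot.
cutoff_log10 = -math.log10(bonferroni_threshold(0.05, len(example)))
```

With four tests, the cutoff is 0.05 / 4 = 0.0125, so `snp_d` (p = 0.03) would pass an uncorrected 0.05 threshold but is not reported after correction.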

Figure 1. Manhattan plot showing significant marker-trait associations for the 54 proteins for which QTLs were identified at the Bonferroni-corrected significance threshold of p < 0.05 (red line).
The major QTLs were distributed across many chromosomes and partly clustered in similar chromosomal regions (Figure 2 and Figure 3). QTLs were identified on chromosomes 1A, 2A, 4A, 5A, 7A, 1B, 3B, 4B, 5B, 6B, 7B, 1D, 4D, 6D, and 7D with a relatively similar distribution on the A and B genomes, whereas only 12% of the identified QTLs were located on the D genome. A higher number of proteins seemed to be affected by the major QTLs on chromosomes 1A, 2A, 1B, and 3B. Interestingly, some QTLs for different proteins were identified at almost the same genomic position (Figure 3). For instance, on chromosome 5A, the QTLs for prot085 and prot141, both β-amylases according to the UniProt database, had the same chromosomal and physical position. Similarly, the QTLs for prot171 and prot179 had the same physical position on chromosome 3B, but according to the various protein annotation databases available, it is not yet clear whether these proteins belong to the same family. To facilitate the design of easy-to-use markers for breeding, we have summarized the SNP, genomic position, and sequence information of all 54 identified QTLs.

Figure 2. (a) Pie chart showing the distribution of the 54 major QTLs on different chromosomes; (b) frequency of the proportion of explained genotypic variance by the 54 QTLs.
We further investigated the allele frequencies of the QTLs associated with the eleven gluten and potentially allergenic proteins (Table 1, Figure 4). For prot008, prot017, prot066, and prot235, the allele frequencies of their QTLs were around 0.5. In contrast, for the QTLs of the other seven proteins, one allele was considerably more frequent than the other. Interestingly, for five of these seven proteins, the QTL allele increasing protein abundance was the more frequent one. As the wheat cultivars used in this study were released in different decades of the last century, we grouped them accordingly to visualize potential selection trends by wheat breeders. For four proteins, we observed shifts in QTL allele frequencies across the decades of breeding.
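The grouping of cultivars by decade of release can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name and the input layout are hypothetical, cultivars are assumed to be homozygous inbred lines so that one genotype call per cultivar suffices, and the example records are invented rather than taken from this study.

```python
from collections import defaultdict


def allele_frequency_by_decade(cultivars, allele):
    """Share of cultivars carrying `allele`, grouped by decade of release.

    cultivars: iterable of (release_year, genotype) tuples, e.g. (1985, "GG").
    Each cultivar is assumed to be homozygous at the marker.
    """
    counts = defaultdict(lambda: [0, 0])  # decade -> [carriers, total]
    for year, genotype in cultivars:
        decade = (year // 10) * 10
        counts[decade][1] += 1
        if genotype == allele:
            counts[decade][0] += 1
    # Sort by decade so the result reads as a time series.
    return {d: carriers / total for d, (carriers, total) in sorted(counts.items())}


# Hypothetical cultivars for illustration only (not data from this study).
demo = [
    (1975, "AA"), (1978, "AA"),
    (1983, "GG"), (1988, "AA"),
    (1994, "GG"), (1996, "GG"), (1999, "AA"),
    (2004, "GG"),
]

freqs = allele_frequency_by_decade(demo, "GG")
```

A frequency that rises or falls steadily across decades, as in this invented example, is the kind of pattern interpreted in the text as a potential selection trend; group sizes should be reported alongside, since small groups give unstable frequencies.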
Finally, we investigated the chromosomal regions harboring the QTLs of the 11 proteins in detail. For these regions, we extracted high confidence (HC) genes from the bread wheat reference genome (IWGSC RefSeq v2.1) and evaluated them as potential candidate genes if their functional annotations in the Pfam and InterPro databases were similar to the domains of gluten and allergenic proteins. Eight, five, and twenty-two potential candidate genes were identified in the QTL target regions associated with gluten proteins (prot051 and prot104), allergens and gluten (prot017, prot028, prot139, and prot203), and non-gluten allergens (prot189 and prot235), respectively. No potential candidate genes could be identified for the QTLs detected for prot066, prot008, and prot288.
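Extracting annotated genes from a region around a QTL amounts to an interval-overlap filter on the genome annotation. The sketch below illustrates only that step; the `Gene` record, the function name, the gene identifiers, and the window size are hypothetical, since the width of the target regions is not stated here, and the subsequent comparison of Pfam and InterPro annotations is not covered.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int  # first base of the gene (bp)
    end: int    # last base of the gene (bp)


def genes_near_qtl(genes, chrom, qtl_pos, window_bp):
    """Return genes on `chrom` that overlap [qtl_pos - window_bp, qtl_pos + window_bp]."""
    lo = max(0, qtl_pos - window_bp)
    hi = qtl_pos + window_bp
    return [g for g in genes if g.chrom == chrom and g.start <= hi and g.end >= lo]


# Hypothetical annotation records and window size, for illustration only.
annotation = [
    Gene("geneA", "1D", 400_000, 405_000),
    Gene("geneB", "1D", 2_600_000, 2_610_000),
    Gene("geneC", "1A", 450_000, 455_000),
]

candidates = genes_near_qtl(annotation, chrom="1D", qtl_pos=500_000, window_bp=1_000_000)
```

In this invented example only `geneA` is retained: `geneB` lies outside the window and `geneC` is on a different chromosome.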
Major QTLs Identified for 54 Proteins
Using a GWAS with a statistically conservative Bonferroni-corrected significance threshold, we identified major QTLs for 54 out of 114 proteins (Figure 1 and Figure 2). Our findings suggest that the remaining proteins, more than half of those investigated, are quantitatively inherited and controlled by many genes, each with a rather small effect. Such quantitative inheritance is well described in the literature for the most investigated traits, e.g., yield, but also for classically determined protein content. In contrast, many of the QTLs identified for the 54 proteins had very high peaks in the Manhattan plot (Figure 1) and explained a large proportion of the genotypic variance of the individual proteins (Figure 2b, Table 1). In wheat, major QTLs are known, such as for plant height (Rht genes), heading time (Ppd genes), and disease resistance (e.g., Lr genes), but in most cases, the proportion of the explained genotypic variance was much lower than for many proteins in our study. Therefore, the identified QTLs could be very interesting for future wheat breeding, provided that the relevance of the respective proteins for future wheat supply chains is demonstrated.

Table 1. QTLs identified in this study that control two gluten proteins, four non-gluten allergenic proteins, and five gluten proteins that are also listed as allergens in the Allergome database (HMW = high molecular weight; LMW = low molecular weight; LTP = lipid transfer protein).
The 54 identified QTLs were similarly distributed across the A and B genomes, but only a small number of them were detected on the D genome (Figure 2a and Figure 3). This is in line with the literature on genomics in wheat and can be explained by the limited genetic diversity of the D genome compared to the A and B genomes. Interesting breeding approaches have begun to utilize the genetic potential of the D genome of wheat, such as synthetic wheat. As these breeding lines are quite new and, to our knowledge, not yet present in European wheat cultivars, they were also not present in our wheat cultivar list.
For proteins belonging to the same family, we found that they are controlled by loci located close to one another on the same chromosome or by loci on potentially homeologous chromosomes (Figure 3). For instance, we identified QTLs for six Cupin 1 proteins, all located on chromosomes 4A and 4B (Figure 3, Table 1). The QTLs of prot045 and prot092 were located at the identical physical map position on 4A, whereas the QTLs for prot040, prot054, and prot120 were very close to each other on 4B. We found further QTL clusters for other protein families: three late embryogenesis abundant proteins on 2A, two proteins of the aldo/keto reductase family on 1B, two β-amylases on 5A, and two proteins of chitinase class 1 on 7B. Potentially homeologous chromosomal positions were identified for QTLs of two lipid transfer proteins on 5A and 5B, two plant antimicrobial proteins on 6B and 6D, three proteins of chitinase class 1 on 7A and 7B, and two LMW-GS on 1A and 1B (Figure 3, Table 1). These findings are comparable to other traits where important gene families are located on the same group of homeologous chromosomes, e.g., for plant height on chromosomes of group 4 (Rht1 and Rht2 genes) or heading time on chromosomes of group 2 (Ppd-1 genes). In summary, this GWAS, to our knowledge the largest on the wheat proteome, revealed a genetic architecture of proteins similar to that reported for other traits, with major QTLs for 24 out of 114 proteins.

Figure 3. Chromosome map showing the distribution of 54 QTLs with their respective protein name and physical position (Mbp).
For a detailed investigation of wheat flour proteins, we focused on the eleven proteins associated with gluten and potential allergenicity based on UniProt and InterPro databases (Table 1). Five of the seven gluten proteins were also present in the Allergome database. We assigned these eleven proteins to the following groups: gluten, gluten and allergen, and non-gluten allergen. Using the UniProt database, we were able to name eight of these eleven proteins, including proteins important for baking quality such as HMW-GS Dx5 and LMW Glu-A3. QTLs for two lipid transfer proteins could be identified, but not for other known, potentially allergenic wheat proteins such as ATIs or serpins.
QTLs for Important Gluten Proteins
High and low molecular weight glutenins are of great importance for wheat end-use quality. They have been under intensive research and use in wheat breeding since the pioneering work of Payne and colleagues. We identified three major QTLs underpinning three proteins specifically related to glutenins: one HMW-GS and two LMW-GS (Table 1). On chromosome 1D, we found a major QTL explaining 42.7% of the genotypic variance for prot017, which is annotated as HMW-Dx5 according to the UniProt database (Table 1). Electrophoretic analysis of our wheat cultivars by SDS-PAGE revealed that the QTL allele GG of prot017 corresponded to HMW banding unit 5 and the QTL allele AA to HMW banding unit 2 in 143 out of 148 wheat cultivars, confirming the UniProt annotation. Interestingly, the allelic difference seen in the SDS banding pattern could be deduced from the quantitative measurements of a single protein (prot017), where cultivars with the Dx5 (GG) unit had a lower abundance than cultivars with the Dx2 (AA) unit (Figure 4). While it was not possible to qualitatively distinguish the highly homologous protein isoforms by the tryptic peptides quantified in our mass-spectrometry-based proteomics workflow, the apparently allele-dependent expression level of the gene product was perfectly captured by the complementary SDS-PAGE approach. This case study highlights that quantitative protein measurements can add relevant information to purely genetic analyses in breeding studies.
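The agreement between QTL allele and SDS-PAGE band reported above (143 of 148 cultivars) is a simple concordance rate. A minimal sketch of that calculation follows; the function name and the per-cultivar example calls are hypothetical, while the allele-to-band mapping (GG to unit 5, AA to unit 2) and the overall counts are taken from the text.

```python
def concordance(pairs, mapping):
    """Fraction of cultivars whose observed band matches the band expected from the allele.

    pairs: iterable of (qtl_allele, observed_band) tuples, one per cultivar.
    mapping: dict from QTL allele to the expected SDS-PAGE band.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no cultivars given")
    matches = sum(1 for allele, band in pairs if mapping.get(allele) == band)
    return matches / len(pairs)


# Allele-to-band mapping as reported in the text.
ALLELE_TO_BAND = {"GG": "5", "AA": "2"}

# Overall rate from the counts reported in the text: 143 of 148 cultivars.
reported_rate = 143 / 148

# Hypothetical per-cultivar calls for illustration only (not data from this study).
demo_calls = [("GG", "5"), ("AA", "2"), ("AA", "5"), ("GG", "5")]
demo_rate = concordance(demo_calls, ALLELE_TO_BAND)
```

The reported counts correspond to a concordance of roughly 96.6%; the five discordant cultivars are not characterized further in the text.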
Possibility to Breed for Low Allergen Content
Although wheat is an important and mostly healthy staple crop, a sizeable number of people suffer from wheat sensitivities, with most potential triggers being proteins. We followed the approach of Zimmermann et al. and Afzal et al. and compiled a list of allergens based on data on seed-borne wheat allergens and the Allergome database. For nine proteins from this list, we detected major QTLs in our study (Table 1), which explained between 32.3% and 84.5% of the genotypic variance of the respective protein. Five of these were gluten proteins, two were probable lipid transfer proteins, one was a peroxiredoxin, and one was a potential protease inhibitor (Table S1). For three of these nine proteins, we observed a selection trend at the major QTLs in the wheat cultivars from the past decades (Figure 4). The frequency of the marker allele producing high protein abundance increased for prot288, a protease inhibitor, whereas the frequency of the marker allele responsible for low protein abundance increased for prot017 (HMW-Dx5) and prot203 (LMW-GS Glu-A3). The latter two are important wheat-quality proteins that plant breeders have intensively selected for, as discussed earlier, and that at the same time appear to be potential allergens for a small number of people. Interestingly, the better baking quality at these two loci appears to be correlated with lower protein abundance, i.e., lower allergen levels.

Figure 4. The effect of the major QTLs for gluten proteins prot051 and prot104 (a); gluten and allergenic proteins prot017, prot028, prot066, prot139, and prot203 (b); and non-gluten allergenic proteins prot008, prot189, prot235, and prot288 (c), and their allele frequencies according to the cultivar's year of release (numbers below boxes represent the number of cultivars in the respective group). The leftmost boxplot in gray shows the protein values for all cultivars. The protein-increasing allele is colored in orange; the protein-decreasing allele is colored in green. *** indicates a significant difference at p < 0.001 between the two groups of cultivars containing contrasting alleles of a given marker.
The selection trend for prot288 is interesting in that the QTL allele that increases protein abundance was introgressed in the early 1980s, and its frequency was then steadily increased by wheat breeders. Selection for or against potentially allergenic proteins has never been a goal in wheat breeding. Therefore, this selection trend may be due to linkage with another target trait in wheat breeding that has been used since the 1980s and is largely influenced by the genomic region on the short arm of chromosome 2A. This chromosomal region contains an introgression from Triticum ventricosum that has a roughly comparable history. This introgression carries several important disease-resistance genes (e.g., Lr37 and Sr38-Yr17-Lr34) against various important rust diseases. In addition, the introgression also appears to improve yield stability and resistance to rice blast, all traits that are of great importance for many wheat breeding programs worldwide. For a large proportion of our wheat cultivars, molecular marker information for the disease resistance cluster Sr38-Yr17-Lr34 is available, and it matches the different QTL alleles of prot288 almost perfectly. Consequently, the increase in the QTL allele that increases the abundance of the potential allergen prot288 could be due to indirect selection for nearby disease-resistance genes. According to the physical positions, our identified QTL is 6 Mbp away from the locus reported for the disease resistance cluster. Future studies will have to show whether this potential linkage can be broken by targeted selection using markers for both loci.
For the QTLs of six potentially allergenic proteins, we did not detect clear selection trends over the decades of wheat breeding, but either near fixation of the QTL allele causing high protein abundance (prot139, prot189) or similar frequencies of both alleles. This is supported by our companion study, in which we quantified the absolute amounts of eight ATIs using isotopically labeled standard peptides. Therein, major QTLs were identified for monomeric and dimeric ATIs, with similar allele frequencies for the monomeric ATI 0.28 but near fixation of the QTL allele responsible for high protein abundance of the dimeric ATI 0.19-like. Consequently, the reported sequence information for the major QTLs identified in both studies could greatly facilitate breeding for low abundance of eleven potentially allergenic proteins. Further studies are therefore urgently needed to establish the relevance of reducing the abundance of these proteins for human and animal health, so that the laborious breeding work for these additional traits can finally be addressed.