Most quantitative traits in livestock are complex and are regulated by a combination of genetic and environmental factors, which makes it difficult to locate the genes controlling them. Until recently, conventional breeding programs used phenotypic records of animals and the relationships among them to estimate breeding values by statistical evaluation. For some traits, however, breeders cannot achieve efficient improvement with this approach because heritability is low or because phenotypic data are difficult or costly to collect (Dekkers, 2004). Two strategies have been used to search for genomic regions that affect traits: top-down (association mapping) and bottom-up (Ross-Ibarra et al. 2007). In the former, researchers start with a phenotype of interest and work down to its underlying genetic basis. An association between a marker and the phenotype suggests either that the variation at the locus is the causative mutation underlying the quantitative trait locus (QTL) or that the variation is in linkage disequilibrium (LD) with the QTL. Although this approach is useful for dissecting phenotypic variation in farm animals, it has drawbacks. For example, positional cloning is expensive and has succeeded only a few times in livestock (Grisart et al. 2002; Van Laere et al. 2003; Cohen et al. 2005). In bottom-up approaches, population genomic data are generated and the targets of past selection are identified by statistical evaluation. The main principle of bottom-up QTL mapping is that neutral loci across the genome are all affected by genetic drift, demography and the evolutionary history of populations. Loci under selection often behave differently, so important genes can be identified from their pattern of genetic variation even in the absence of information about which trait they regulate. This approach can also identify genes that were under strong selection pressure and eventually became fixed within breeds, as well as genes involved in adaptation to extreme environments and in disease resistance (Akey et al. 2002; Hayes et al. 2009).
Moreover, many traits that are important in animal breeding lack a well-defined phenotype and therefore cannot be investigated by association studies or classical QTL mapping (Dekkers, 2004), so gene mapping strategies should be defined by studying the genetic structure of the population. LD is the key factor influencing the results of statistical gene mapping strategies in animal species. Determining the extent of LD is the first step in estimating how many markers are required for whole-genome association studies, and the pattern of LD can also be used to investigate the evolutionary forces that may have generated LD in specific regions of the genome (Ardlie et al. 2002). In this review we therefore focus on the concept of LD, current approaches to estimating it, and its extent in livestock populations.
Linkage disequilibrium is the association between alleles at closely located loci that tend to be co-inherited in a population. It is the basis of genomic selection, genomic marker imputation, marker-assisted selection (MAS), QTL mapping, parentage testing and whole-genome association studies (Karimi et al. 2014; Taylor, 2014). LD is a non-random correlation among alleles at nearby loci and indicates haplotypes descended from single ancestral chromosomes (Reich et al. 2001). A haplotype is a combination of alleles at different markers within a short chromosomal distance that are inherited together (Valle et al. 2003). Pairwise LD is estimated by measures such as D, D′ and r². To illustrate, consider two bi-allelic loci, A and B, with alleles A1, A2 and B1, B2, respectively. Four haplotypes are possible across the two loci: A1B1, A2B1, A1B2 and A2B2. If the frequency of each allele is 0.5 and the alleles at the two loci segregate independently, the expected frequency of each haplotype is 0.25. The deviation from this expectation is defined as linkage disequilibrium and is measured by the following formula (Lewontin and Kojima 1960):
$$D = \mathrm{freq}(A_1B_1)\,\mathrm{freq}(A_2B_2) - \mathrm{freq}(A_1B_2)\,\mathrm{freq}(A_2B_1) \tag{1}$$
As the formula shows, D depends on allele frequencies. In addition, LD decays with time (t) through recombination:
$$D_t = (1 - r)^t D_0 \tag{2}$$
$D_0$ and $D_t$: extent of LD in the starting generation and t generations later, respectively.
r: recombination rate.
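The two formulas above can be sketched in a few lines of Python. This is only an illustration: it assumes the standard form of D (product of the A1B1 and A2B2 haplotype frequencies minus the product of the A1B2 and A2B1 frequencies), the function names are hypothetical, and the haplotype frequencies are invented for the example rather than taken from any cited study.

```python
def ld_coefficient(p_a1b1, p_a1b2, p_a2b1, p_a2b2):
    """Equation (1): D from the four haplotype frequencies (must sum to 1)."""
    total = p_a1b1 + p_a1b2 + p_a2b1 + p_a2b2
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    return p_a1b1 * p_a2b2 - p_a1b2 * p_a2b1


def ld_after_generations(d0, r, t):
    """Equation (2): D remaining after t generations at recombination rate r."""
    return (1.0 - r) ** t * d0


# Hypothetical haplotype frequencies; every allele frequency is 0.5.
d0 = ld_coefficient(0.4, 0.1, 0.1, 0.4)
print("D0:", d0)

# Tightly linked markers (r = 0.01) keep most of their LD after 10 generations,
# whereas unlinked markers (r = 0.5) lose half of it every generation.
print("linked, t=10:  ", ld_after_generations(d0, 0.01, 10))
print("unlinked, t=10:", ld_after_generations(d0, 0.5, 10))
```

With all four haplotypes at 0.25 the function returns 0, which matches the equilibrium expectation described in the text.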
LD is eroded over time by recombination, which occurs more frequently between markers that are far apart than between closely linked markers. D therefore depends on the distance between the two markers, and its use for comparing levels of LD is not recommended (Ardlie et al. 2002). For this reason, D′ was defined as D divided by its maximum possible value at the given allele frequencies of the two loci (Lewontin, 1964):
$$D' = \frac{|D_{AB}|}{D_{Max}}, \qquad D_{Max} = \begin{cases} \min(P_{A1}P_{B2},\; P_{A2}P_{B1}) & \text{if } D_{AB} > 0 \\ \min(P_{A1}P_{B1},\; P_{A2}P_{B2}) & \text{if } D_{AB} < 0 \end{cases} \tag{3}$$
$D_{AB}$: the LD parameter D for loci A and B (Lewontin and Kojima, 1960).
$P_{A1}$, $P_{A2}$, $P_{B1}$ and $P_{B2}$: frequencies of alleles A1 and A2 at locus A and of alleles B1 and B2 at locus B, respectively.
$D_{Max}$: maximum level of LD at the given allele frequencies.
Lewontin's D′ ranges from 0 to 1. A value of 0 indicates no LD, a value of 1 indicates that the two markers are in complete LD, and intermediate values indicate that recombination has occurred between the two markers. D′ thus reflects recombination history. However, D′ is still influenced by allele frequency and is inflated in small samples (McRae et al. 2002). Another measure of LD is the squared correlation coefficient (r²) between marker alleles (Hill and Robertson, 1966).
$$r^2 = \frac{D^2}{P_{A1}\,P_{A2}\,P_{B1}\,P_{B2}} \tag{4}$$
r² is useful for assessing pairwise LD, but it cannot be applied to more than two loci. The amount of LD between a marker and a trait locus, measured by r², is directly related to the power of detecting an association (Kruglyak, 1997; Pritchard and Przeworski, 2001; Teare et al. 2002). In addition, the decline of r² with distance determines how many markers are required for QTL mapping (Hayes et al. 2009).
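As a rough sketch of how the pairwise measures relate to one another, the Python function below computes D, D′ and r² from four haplotype frequencies. It assumes the usual definitions: D is the A1B1 haplotype frequency minus the product of the A1 and B1 allele frequencies, D′ is |D| scaled by its maximum at the given allele frequencies, and r² is D² divided by the product of the four allele frequencies. The second function illustrates the commonly cited 1/r² scaling of the sample size needed when a marker is tested instead of the causal variant. All names and numbers are hypothetical.

```python
def ld_measures(p_a1b1, p_a1b2, p_a2b1, p_a2b2):
    """Return (D, D_prime, r2) for two bi-allelic loci."""
    p_a1 = p_a1b1 + p_a1b2          # allele frequencies from haplotypes
    p_a2 = p_a2b1 + p_a2b2
    p_b1 = p_a1b1 + p_a2b1
    p_b2 = p_a1b2 + p_a2b2
    if min(p_a1, p_a2, p_b1, p_b2) <= 0:
        raise ValueError("both loci must be polymorphic")
    d = p_a1b1 - p_a1 * p_b1
    if d >= 0:
        d_max = min(p_a1 * p_b2, p_a2 * p_b1)
    else:
        d_max = min(p_a1 * p_b1, p_a2 * p_b2)
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a1 * p_a2 * p_b1 * p_b2)
    return d, d_prime, r2


def sample_size_via_marker(n_at_causal_variant, r2):
    """Sample size needed at a marker, assuming the 1/r2 scaling."""
    return n_at_causal_variant / r2


# Hypothetical case in which one haplotype (A1B2) is absent:
# D' reaches its maximum of 1, yet r2 is only about 0.33.
d, d_prime, r2 = ld_measures(0.5, 0.0, 0.25, 0.25)
print(d, d_prime, r2)
print(sample_size_via_marker(1000, 0.25))
```

The example shows why the two measures are not interchangeable: D′ equals 1 whenever at least one of the four haplotypes is missing, whereas r² reaches 1 only when the two loci also have matching allele frequencies.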
Mechanisms of generation and erosion of linkage disequilibrium
The extent and distribution of LD in a population are influenced by many factors, including selection, migration, genetic drift, mutation, small finite population size and recombination (Ardlie et al. 2002; Lander and Schork 1994; Karimi et al. 2014). LD can be generated by population admixture (migration), because haplotype frequencies differ between the populations being mixed. The extent of LD in an admixed population depends on the time since migration occurred and on the difference in allele frequencies between the two populations (Greenwood et al. 2004). In small populations, genetic drift generates LD through the loss of some haplotypes (Terwilliger et al. 1998): the random sampling of gametes to produce a finite number of offspring changes haplotype frequencies (Ardlie et al. 2002). Finite population size appears to be an important cause of genome-wide LD in livestock populations (Hayes et al. 2003; Kiselyova et al. 2014). Selection generates LD by increasing the frequency of marker alleles located close to a gene under positive selection (Ardlie et al. 2002). The amount of LD generated by selection depends on the generation interval of the species and on selection intensity, and it is localized around the gene under selection (Farnir et al. 2000). Recombination and mutation are the two main factors that erode LD. The extent of LD is negatively related to the local recombination rate (Greenwood et al. 2004), so the strength of LD is reduced at recombination hotspots (Jeffreys et al. 2001). Weaker LD has been observed between closely located SNPs in CpG islands, where the mutation rate is high. For economic traits affected by a large number of alleles, the amount of LD eroded by mutation is small (Ardlie et al. 2002).
LD varies both within and among populations. Besides the factors above, it is affected by the age of the mutation that created the SNP, population history, gene conversion, admixture and hitchhiking, and it can therefore differ even between closely located loci (Gabriel et al. 2002; Pritchard and Przeworski, 2001; Ardlie et al. 2002).
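The role of admixture described above can be illustrated with a small numerical sketch. It assumes two source populations that are each in linkage equilibrium, mixed in proportions m and 1 - m; under that assumption the LD created at the moment of mixing works out to m(1 - m) times the product of the allele frequency differences at the two loci. The function name and all frequencies are hypothetical.

```python
def admixture_ld(m, p_a_pop1, p_b_pop1, p_a_pop2, p_b_pop2):
    """D for alleles A1 and B1 right after mixing two populations.

    m is the proportion contributed by population 1. Each source
    population is assumed to be in linkage equilibrium, so its A1B1
    haplotype frequency is the product of its allele frequencies.
    """
    hap_a1b1 = m * p_a_pop1 * p_b_pop1 + (1 - m) * p_a_pop2 * p_b_pop2
    p_a = m * p_a_pop1 + (1 - m) * p_a_pop2
    p_b = m * p_b_pop1 + (1 - m) * p_b_pop2
    return hap_a1b1 - p_a * p_b


# Large frequency differences between the populations create strong LD,
# even though neither population had any LD of its own.
d_mixed = admixture_ld(0.5, 0.9, 0.9, 0.1, 0.1)
print("D after mixing:", d_mixed)

# No frequency difference at one locus means no LD from admixture.
print("D, equal frequencies:", admixture_ld(0.5, 0.9, 0.5, 0.1, 0.5))

# With random mating afterwards, D shrinks by (1 - r) per generation,
# so LD between loosely linked loci fades as time since admixture grows.
r = 0.1
for t in (0, 5, 20):
    print(t, (1 - r) ** t * d_mixed)
```

The closed form m(1 - m)(pA,1 - pA,2)(pB,1 - pB,2) gives the same value as the function, which makes the dependence on allele frequency differences explicit.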
Linkage disequilibrium studies in livestock
The extent of LD in the genome is the basis of genomic selection and is also useful for assessing variability between breeds, detecting regions under positive selection (Gouveia et al. 2014) and characterizing patterns of crossing over (Meuwissen et al. 2001; Bohmanova et al. 2010). The first LD study in cattle was carried out in Dutch black-and-white dairy cattle by Farnir et al. (2000) using 284 microsatellite markers. They found a high level of LD (Lewontin's D′) that extended over several tens of centimorgans. Similar results were observed in subsequent studies (Vallejo et al. 2003; Tenesa et al. 2007; Khatkar et al. 2006a), all of which used microsatellites and reported high D′ with extensive LD across the genome. Until 2006, most LD studies were based on microsatellite markers or on small numbers of SNPs covering one or only a few chromosomes. An initial SNP-based study of LD was carried out by Khatkar et al. (2006b), who genotyped 220 SNPs in 433 Australian dairy bulls and found the same level of LD (r²) but with a restricted extent. McKay et al. (2007) constructed whole-genome LD maps for eight cattle breeds using 2670 SNPs. They reported that LD (r²) extended no more than 0.5 Mb in all breeds and suggested that 50,000 SNPs are required for whole-genome association studies in cattle. Similar results on the extent of genome-wide LD using high-density SNPs were subsequently reported in other studies (Khatkar et al. 2008; Bohmanova et al. 2010; Sargolzaei et al. 2008; Laodim et al. 2015).
In beef cattle, an initial study was carried out by Lu et al. (2012) in Angus, Charolais and C beef cattle. They reported that LD decreased rapidly, from 0.29 to 0.23 to 0.19 in Angus, from 0.22 to 0.16 to 0.12 in Charolais and from 0.21 to 0.15 to 0.11 in the C breed, as the distance between markers increased from 0-30 kb to 30-70 kb and then to 70-100 kb. In their study, LD decayed rapidly with increasing SNP pair distance within 200 kb, whereas LD over longer distances remained consistently low. Similar results were found in studies of other beef cattle (Espigolan et al. 2013; Mokry et al. 2014; Porto-Neto et al. 2014; Zhu et al. 2013). This contrasts with dairy cattle, in which LD decays to a baseline level as distance increases to 500 kb (Khatkar et al. 2008; Bohmanova et al. 2010; Sargolzaei et al. 2008; Karimi et al. 2014).
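Summaries of LD decay such as the one above are typically obtained by averaging pairwise r² within bins of inter-marker distance. The sketch below shows one way this could be done. It is not the procedure of any cited study, the function name is hypothetical, and the (distance, r²) pairs are toy values chosen only to make the arithmetic easy to follow.

```python
from bisect import bisect_right


def mean_r2_by_distance_bin(pairs, bin_edges_kb):
    """Average r2 within half-open distance bins [low, high).

    pairs: iterable of (distance_kb, r2) for SNP pairs.
    bin_edges_kb: ascending bin edges, e.g. [0, 30, 70, 100].
    Pairs outside the outermost edges are ignored.
    """
    n_bins = len(bin_edges_kb) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for distance_kb, r2 in pairs:
        i = bisect_right(bin_edges_kb, distance_kb) - 1
        if 0 <= i < n_bins:
            sums[i] += r2
            counts[i] += 1
    return {
        (bin_edges_kb[i], bin_edges_kb[i + 1]): sums[i] / counts[i]
        for i in range(n_bins)
        if counts[i] > 0
    }


# Toy data (not real measurements): r2 falls as distance grows.
toy_pairs = [(10, 0.5), (20, 0.3), (50, 0.2), (90, 0.1), (250, 0.05)]
for (low, high), mean_r2 in mean_r2_by_distance_bin(toy_pairs, [0, 30, 70, 100]).items():
    print(f"{low}-{high} kb: mean r2 = {mean_r2:.2f}")
```

Plotting the binned means against distance gives the LD decay curve that is compared across breeds and species in the studies reviewed here.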
The first study of LD in sheep, based on microsatellites, was reported by McRae et al. (2002). They found high levels of LD that extended for tens of cM and declined as a function of marker distance. Similar results were found by Meadows et al. (2008) in five Australian sheep breeds. García-Gámez et al. (2012) reported an average r² of 0.329 for SNPs up to 10 kb apart using the 50 k Ovine BeadChip and estimated an effective population size of 128 animals. In the study of Mastrangelo et al. (2014), the average r² between adjacent SNPs across all chromosomes was 0.155 ± 0.204 for Valle del Belice, 0.156 ± 0.208 for Comisana and 0.128 ± 0.188 for Pinzirita. They reported that LD declined as a function of distance and that the average r² was lower than the values observed in other sheep breeds. Similar results were obtained by Zhao et al. (2014). In general, sheep appear to have lower levels of LD than other domestic species, and LD in sheep persists over more limited distances than reported in dairy cattle, probably reflecting aspects of their past population history. More markers will therefore be required in this species to achieve the same power to detect associations (Meadows et al. 2008; Kijas et al. 2014).
In horses, Tozaki et al. (2005) estimated that useful LD in the Thoroughbred extends up to 7 cM, but their study covered only one small region of the genome. Similar results were obtained by Wade et al. (2009) for small regions of the genome. Corbin et al. (2010) evaluated the extent and decay of LD in 817 Thoroughbreds using the Equine SNP50 BeadChip. They found high LD (r² = 0.6) at 5 kb, and mean r² remained above non-syntenic levels for markers up to 20 kb apart. The effective population size (Ne) of their population was estimated at 100 animals. Their results were similar to those of Lee et al. (2014).
In chickens, the first LD analysis was carried out with microsatellite markers in layer hens (Heifetz et al. 2005). The authors reported that LD among markers up to 5 cM apart was strongly conserved across generations but decreased rapidly with increasing marker distance. Similar results were reported in other studies (Aerts et al. 2007; Andreescu et al. 2007; Rao et al. 2008). Fu et al. (2015) characterized LD and haplotype structure using a 60 k SNP panel in crossbred broiler chickens and their component pure lines. They reported that the average r² between adjacent SNPs across the chicken autosomes ranged from 0.34 to 0.40 in the pure lines but was only 0.24 in the crossbred populations. Compared with the pure lines, the crossbred populations showed smaller haplo-block sizes and lower haplotype homozygosity on macro-, intermediate and micro-chromosomes. Furthermore, correlations of LD between markers at short distances (0 to 10 kb) were high between crossbred and pure lines (0.83 to 0.94). In another study, Khanyile et al. (2015) estimated LD in chickens from South African village and conservation flocks and from Malawi and Zimbabwe, genotyped with the Illumina iSelect chicken SNP60K BeadChip. Higher LD, ranging from 0.29 to 0.36, was observed between SNP markers less than 10 kb apart in the conservation flocks. LD in the conservation flocks steadily decreased to 0.15 (PK) and 0.24 (VD) at a SNP marker interval of 500 kb. Pengelly et al. (2016) investigated LD in chickens (Gallus gallus) at the highest resolution to date for broiler, white egg and brown egg layer commercial lines. They reported that regions of LD breakdown, which may align with recombination hot spots, are enriched near CpG islands and transcription start sites, but that concordance in hot spot locations between commercial breeds is only marginally greater than random.
The first study of LD in pigs, by Du et al. (2007), reported r² = 0.11 for markers 3 cM apart. In a study of two pig populations genotyped with the Porcine SNP60 BeadChip, the average r² was 0.48 for SNPs 30 kb apart, and r² > 0.2 extended to 1.0 and 1.5 Mb (Uimari and Tapio, 2011). Amaral et al. (2008) estimated the extent of LD, haplo-block partitioning and haplotype diversity within haplo-blocks across several pig breeds from China and Europe and in European wild boar. They reported that the extent of LD differed significantly between breeds, extending up to 2 cM in European breeds and up to 0.05 cM in Chinese breeds, and that the European ancestral stock had a higher level of LD. Modern breeding programs increased the extent of LD in Europe and caused differences in LD between genomic regions. Badke et al. (2012) confirmed that LD in pigs is higher than in American Holstein cattle, especially at larger marker distances (>1 Mb). They found high average LD (r² > 0.4) between adjacent SNPs, which is an important precursor for implementing MAS within a livestock species. These findings are similar to those of Du et al. (2007).
Useful markers in linkage disequilibrium studies
The two markers commonly used to identify LD are SNPs and microsatellites. Earlier studies used microsatellite markers spaced evenly across the genome at intervals of 10 cM. Microsatellites are highly polymorphic and are therefore more useful for detecting both rare and common haplotypes (Kiselyova et al. 2014); they provided more power to detect LD than SNPs, even when information from three to five SNPs was combined (Gonzalez-Neira et al. 2007; Schaid et al. 2004). However, the development of new high-throughput SNP genotyping technologies has made it possible to study genome polymorphisms quickly and economically. SNPs are neutral bi-allelic markers, the most abundant in the genome, with low heterozygosity and a low mutation rate (Vignal et al. 2009). Although these bi-allelic markers have lower heterozygosity, they occur at a higher density in the genome and are associated with lower genotyping error rates (Kennedy et al. 2003; Abecasis et al. 2001). Simulation studies have indicated that SNPs can offer equal or superior power to detect linkage compared with low-density microsatellite maps (Kruglyak, 1997).
Linkage disequilibrium is the basis of whole-genome association studies and genomic selection. It is generally measured by two parameters, D′ and r², using microsatellites and SNPs. D′ reflects historical recombination and is more strongly influenced by variation in allele frequencies than r². r² is used to predict the power of association mapping: the sample size required to obtain the same power of detecting a quantitative trait nucleotide (QTN) is inversely proportional to r². In addition, the pattern of decline in r² has been used to determine the average extent of useful LD for single-point association mapping in a population. Earlier studies of LD in livestock used microsatellites and reported high levels of D′ over long distances. Until recently, microsatellites were the primary type of marker used for linkage analyses because they are abundant, evenly dispersed throughout the genome, highly polymorphic and highly informative. Since 2005, with progress in large-scale genome sequencing, SNPs have been used increasingly in such research, and high levels of D′ over short distances have been reported. Although these bi-allelic markers have lower heterozygosity, they occur at a higher density in the genome and are associated with lower genotyping error rates. Using SNPs, the extent and pattern of LD over short distances have been reported in many livestock species. LD has been found to be highly variable between species and even within and among populations of the same species. This suggests that LD is affected not only by mutation, recombination, selection and effective population size but also by other factors, such as the age of the mutation that created the SNP, population history, gene conversion, admixture and hitchhiking, as well as by the LD measure used and the properties of the markers.
LD is therefore a characteristic specific to each population, and its pattern and extent must be determined separately for each population in association studies.
I thank everyone who helped with this research, and in particular Prof. Luca Fontanesi (Italy) for his advice on my research.