Inbreeding results from the mating of parents who share one or more ancestors. Despite its importance in evolutionary genetics and in animal and plant breeding, the term is often used improperly because there is no agreed criterion for distinguishing related from unrelated individuals. Inbreeding is therefore defined as the outcome of mating between two individuals who are more closely related to each other than the average relatedness in the population (Templeton and Read, 1994). Inbreeding changes genotype frequencies by increasing homozygosity at the expense of heterozygosity, without affecting allele frequencies (Charlesworth and Willis, 2009). This leads to a redistribution of genetic variation within and between populations (Fernandez et al. 1995), reduced performance in fitness-related traits (Charlesworth and Willis, 2009) and the expression of homozygous recessive defects (Alvarez et al. 2009). Moreover, it contributes to genetic drift, disrupts Hardy–Weinberg equilibrium and changes the effective population size. In human populations where marriage between relatives is customary, inbreeding increases the risk of monogenic disorders (Bittles, 2003) and of complex diseases involving recessive variants (Bittles and Black, 2010), and it affects low-density lipoprotein (LDL) cholesterol and blood pressure (Campbell et al. 2007). Because of the importance of inbreeding for human health, and to avoid inbreeding depression in plant and animal performance, several methods for quantifying inbreeding have been developed. The first method proposed estimates the inbreeding coefficient from the pedigree and is called the pedigree inbreeding coefficient (FPED) (Wright, 1922). FPED is calculated with the path coefficient technique as the probability that an individual inherits two identical-by-descent (IBD) alleles, given that the pedigree is known and that alleles are transmitted from parent to offspring with a probability of 0.5.
Although this coefficient is not complicated to calculate, its biological meaning was difficult to clarify, especially for individuals with arbitrary pedigrees. The inbreeding coefficient was later defined as the probability that two haplotypes at a locus sampled at random from all loci in the genome are IBD (Malécot, 1948). Besides requiring that pedigrees be known and accurate, this method has several disadvantages. First, FPED equals the expected proportion of the genome that is IBD and does not take recombination into account (Carothers et al. 2006). Second, FPED does not take into account the relatedness among founders in the base population (Suwanlee et al. 2007). Third, FPED assumes equal levels of autozygosity over the whole genome and does not account for potential bias resulting from selection. Finally, pedigrees in most species contain errors due to misinterpretation, misidentification and incorrect recording (Curik et al. 2002). The importance of inbreeding has led researchers to develop several genomic methods to estimate it. The development of high-density genome-wide single nucleotide polymorphism (SNP) arrays has made it possible to calculate individual inbreeding coefficients from molecular data. A simple approach is to identify continuous regions of autozygosity in individual genomes (Broman and Weber, 1999). Autozygosity is defined as autosomal segments of the genome that are identical by descent on both the paternal and maternal sides (Wright, 1922). In the absence of mutation or recombination, two alleles are IBD if they have been inherited from the same ancestral allele, either paternal or maternal (Crow, 1954). Individual autozygosity can be measured using runs of homozygosity (ROH). This review examines ROH as a means of estimating inbreeding from genetic information. Runs of homozygosity are defined and discussed in detail for humans and livestock, and the effect of chip density on ROH detection is explored.
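The pedigree-based coefficient described above can be illustrated with a minimal sketch. It assumes a small hypothetical pedigree and computes FPED through the recursive kinship relationship rather than by explicit path counting; both give the same value when founders are taken as unrelated and non-inbred, which is exactly the assumption criticized above. The names `PEDIGREE`, `kinship` and `f_ped` are illustrative only.

```python
from functools import lru_cache

# Hypothetical pedigree: individual -> (sire, dam); None marks an unknown parent.
# Individuals are listed so that parents always appear before their offspring.
PEDIGREE = {
    "A": (None, None),
    "B": (None, None),
    "C": ("A", "B"),
    "D": ("A", "B"),
    "E": ("C", "D"),  # offspring of a full-sib mating
}
ORDER = {name: i for i, name in enumerate(PEDIGREE)}


@lru_cache(maxsize=None)
def kinship(x, y):
    """Probability that one allele drawn at random from x and one from y are IBD."""
    if x is None or y is None:
        return 0.0  # assumption: founders are unrelated and non-inbred
    if x == y:
        sire, dam = PEDIGREE[x]
        return 0.5 * (1.0 + kinship(sire, dam))
    # Recurse through the parents of the younger individual.
    if ORDER[x] < ORDER[y]:
        x, y = y, x
    sire, dam = PEDIGREE[x]
    return 0.5 * (kinship(sire, y) + kinship(dam, y))


def f_ped(individual):
    """Pedigree inbreeding coefficient: the kinship between the two parents."""
    sire, dam = PEDIGREE[individual]
    return kinship(sire, dam)


print(f_ped("E"))  # full-sib mating
```

Each factor of 0.5 in the recursion corresponds to the transmission probability from parent to offspring mentioned above.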
Runs of homozygosity (ROH)
Runs of homozygosity are continuous homozygous segments of the genome in which the two haplotypes inherited from the parents are identical by descent (Curik et al. 2014). The concept of ROH was first defined by Broman and Weber (1999). They noted that recombination events interrupt ROH, so that segments are broken down as the number of generations from the common ancestor increases. ROH are not uniformly distributed across the genome (Stella et al. 2010). They are more common in some regions, termed ROH islands or ROH hotspots, which have been suggested to be signs of selective sweeps and of genomic regions under positive selection (Nothnagel et al. 2010). Such sweeps lead to the fixation of favorable alleles in the population through a “hitchhiking” process (McQuillan et al. 2008). Fst and iHS analyses revealed a significant correlation between small ROH segments and genomic regions under selection (Zhang et al. 2015). In the human genome, population history, genomic properties and cultural habits can affect the observed ROH islands (Curik et al. 2014), but ROH islands are commonly observed in gene-rich regions of the genome that have been affected by selection (Carothers et al. 2006). ROH are rarely found in some parts of the genome, called ROH cold spots, which are likely to be regions enriched for loci with a critical function (Pemberton et al. 2012). The length and frequency of ROH have been shown to help describe the history of a population. Long ROH segments are assumed to be autozygous and to originate from recent common ancestors, whereas short ROH segments originate from distant ancestors, because chromosomal segments are broken up by repeated meioses. Short segments therefore have older origins or may include some non-IBD stretches (Howrigan et al. 2011; Kirin et al. 2010; Curik et al. 2014). A small, isolated population is therefore expected to display longer ROH than a crossbred population (Gibson et al. 2006).
The availability of modern genome-scan technologies such as high-density SNP arrays has provided an opportunity to investigate ROH in various species, making it possible to compare the extent and patterns of homozygosity between populations.
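The link between ROH length and the age of the common ancestor can be made concrete with a small sketch. It relies on a standard approximation that is not stated in the text above and is assumed here: IBD segments traced to a common ancestor g generations back have an expected length of 100/(2g) cM. The conversion of 1 cM to roughly 1 Mb is likewise only a crude genome-wide assumption, and the function name is illustrative.

```python
def expected_roh_length_cm(generations):
    """Expected length (cM) of an IBD segment from an ancestor `generations` back.

    Assumes the standard approximation of a mean length of 1/(2g) Morgans.
    """
    if generations <= 0:
        raise ValueError("generations must be a positive number")
    return 100.0 / (2.0 * generations)


# Assuming roughly 1 cM per Mb, segments become shorter as the ancestor gets older.
for g in (3, 5, 10, 25, 50):
    print(g, "generations:", round(expected_roh_length_cm(g), 2), "cM")
```

Under these assumptions, a segment of about 10 cM points to an ancestor around 5 generations back, while a segment of about 1 cM points to one around 50 generations back.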
The first analysis of the lengths, numbers and distribution of ROH was reported in 2006 for the HapMap populations (Gibson et al. 2006). Subsequently, FROH was introduced as a measure of the level of inbreeding (McQuillan et al. 2008). It is defined as the proportion of the autosomal genome lying in runs of homozygosity relative to the total autosomal genome length:
$F_{ROH} = \frac{L_{ROH}}{L_{AUTO}}$
where $L_{ROH}$ is the total length of ROH in an individual's genome, counting only runs that contain at least the specified minimum number of successive homozygous SNPs, and $L_{AUTO}$ is the length of the autosomal genome covered by SNPs.
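The ratio described above can be computed directly once ROH segments have been called. The following is a minimal sketch under stated assumptions: the segment coordinates, the autosomal length of 2.5 Gb and the 1 Mb minimum length are hypothetical values chosen for illustration, and `f_roh` is not a function from any of the software packages discussed here.

```python
def f_roh(roh_segments, autosome_length_bp, min_length_bp=1_000_000):
    """Proportion of the SNP-covered autosomal genome lying in ROH.

    roh_segments: iterable of (start_bp, end_bp) pairs for one individual.
    Only segments at least min_length_bp long are counted.
    """
    total_roh = sum(
        end - start
        for start, end in roh_segments
        if end - start >= min_length_bp
    )
    return total_roh / autosome_length_bp


# Hypothetical individual with three called segments (5 Mb, 0.4 Mb and 25 Mb).
segments = [(0, 5_000_000), (10_000_000, 10_400_000), (20_000_000, 45_000_000)]
print(f_roh(segments, autosome_length_bp=2_500_000_000))
```

Changing `min_length_bp` gives the length-specific coefficients (for example FROH > 1 Mb or FROH > 8 Mb) that are compared with FPED in the studies reviewed below.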
ROH on the sex chromosomes of females and in centromeric regions were excluded from most analyses, because the sex chromosomes of females have a different IBD distribution pattern and the regions around centromeres contain extended stretches without SNPs, which may bias the estimates (Szmatoła et al. 2016). Estimating inbreeding by FROH has several advantages over FPED: it allows inbreeding to be estimated in genotyped individuals without a pedigree, predicts autozygosity more precisely, captures autozygosity arising from distant common ancestors, and identifies specific genomic regions with higher levels of autozygosity from the distribution of autozygosity (Sölkner et al. 2010; Keller et al. 2011; Ferenčaković et al. 2011; Ferenčaković et al. 2013a; Ferenčaković et al. 2013b; Purfield et al. 2012). Three tools are generally used to detect ROH segments in SNP array data: PLINK v1.07 (Purcell et al. 2007) (http://pngu.mgh.harvard.edu/purcell/plink/), the Golden Helix SNP and Variation Suite (SVS; www.goldenhelix.com) and cgaTOH (Zhang et al. 2013) (Table 1). PLINK defines an ROH as a minimum specified number of homozygous SNPs within a specified distance in kb, using a sliding-window approach. The SVS algorithm works along each chromosome to find runs of homozygous SNPs starting at every possible marker, and determines the position of the runs shared by a user-defined number of samples. Although the two programs differ in some respects, they produce FROH values with correlation coefficients > 0.99, and neither can recognize heterozygous SNPs lying close together within an ROH (Ferenčaković et al. 2013b). cgaTOH uses a SNP-wise approach to calculate ROH, which it designates tracts of homozygosity (TOH). It also offers additional options for sorting segments, such as allele matching. The literature indicates that ROH density patterns differ among the three programs, but that there is consensus on the location of ROH hotspots (Ferenčaković et al. 2013a).
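To make the idea of ROH detection concrete, the following sketch scans one chromosome for maximal runs of consecutive homozygous SNPs and keeps those that meet a minimum SNP count and a minimum physical length. It is a deliberately simplified consecutive-runs scan, not PLINK's sliding-window algorithm or the SVS or cgaTOH procedures. The genotype coding (0 and 2 homozygous, 1 heterozygous), the thresholds and the function name are assumptions made for illustration, and missing genotypes are not handled.

```python
def find_roh(positions, genotypes, min_snps=3, min_length_bp=2000):
    """Return (start_bp, end_bp, n_snps) for each run of homozygous SNPs.

    positions: SNP coordinates in bp, sorted along one chromosome.
    genotypes: 0 or 2 for homozygous calls, 1 for heterozygous calls.
    """
    runs = []
    start = None
    for i, genotype in enumerate(genotypes):
        if genotype != 1:          # homozygous call extends or opens a run
            if start is None:
                start = i
        elif start is not None:    # heterozygous call closes the current run
            runs.append((start, i - 1))
            start = None
    if start is not None:          # close a run that reaches the last SNP
        runs.append((start, len(genotypes) - 1))

    return [
        (positions[s], positions[e], e - s + 1)
        for s, e in runs
        if e - s + 1 >= min_snps and positions[e] - positions[s] >= min_length_bp
    ]


# Toy example: ten SNPs spaced 1 kb apart.
positions = list(range(0, 10_000, 1000))
genotypes = [0, 0, 2, 2, 1, 0, 1, 2, 2, 2]
print(find_roh(positions, genotypes))
```

Real detection software additionally allows a limited number of heterozygous or missing calls per run and constrains the gap between adjacent SNPs, which is one reason why results depend on the chosen criteria.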
Long homozygous segments of the human genome, later referred to as runs of homozygosity, were first identified using microsatellite markers (Broman and Weber, 1999). The authors suggested that autozygosity may affect gene mapping and health. Since 2005, with progress in whole-genome sequencing, homozygous segments of the genome have been identified using high-density SNP arrays, and further work analyzed the lengths, numbers and distribution of ROH in the outbred HapMap populations (Gibson et al. 2006). Another study proposed that ROH can be used to map genes linked to diseases such as schizophrenia (Lencz et al. 2007). McQuillan et al. (2008) performed an extensive analysis of European populations, including island isolates in Croatia and Scotland. They defined a new genomic inbreeding coefficient (FROH) and showed that its correlation with FPED, FPLINK and multilocus heterozygosity (MLH) varied between 0.74 and 0.82. Since then, many researchers have applied the concept of ROH to population genomics and demography (Kirin et al. 2010; Nothnagel et al. 2010; Palamara et al. 2012), inbreeding depression (Keller et al. 2011; McQuillan et al. 2012), disease-linked genes (Nalls et al. 2009; Keller et al. 2012; Wang et al. 2013) and recombination (Bosse et al. 2012). Moreover, many studies have examined the relationship between ROH and different types of cancer, such as lung cancer (Wang et al. 2013), vulvar cancer (McWhirter et al. 2014), breast cancer (Thomsen et al. 2015) and thyroid cancer (Thomsen et al. 2016). Yang and Li (2014) coined the term homozygosity disequilibrium (HD) for a non-random, sizable ROH in the genome that is related to population evolution and disease susceptibility.
A genome-wide association study reported that diastolic blood pressure and hypertension are associated with ROH in the human genome, and that the genes located in these regions are associated with renin catalysis (REN), blood groups (ABO), calcium channels (CACNA1S) and apolipoprotein (APOA5). Associations have also been reported between ROH and diseases such as Alzheimer's disease (Ghani et al. 2013; Ghani et al. 2015), Parkinson's disease (Simón-Sánchez et al. 2012) and psychosis (Melhem et al. 2014), as well as physical and psychological human traits (Verweij et al. 2014).
The first studies of runs of homozygosity in cattle were carried out in 2010 (Sölkner et al. 2010; Ferenčaković et al. 2011). In this work, pedigree and genotype data from 500 Austrian dual-purpose Simmental bulls were used to estimate the correlation between FROH and FPED. The correlation between FROH and FPED was highest (0.68) for ROH longer than 4 Mb, whereas FROH based on segments > 1 Mb reflected old inbreeding in the population that cannot be traced through the pedigree. Overall, the authors concluded that FROH is an accurate and useful measure of the level of inbreeding in cattle. Similar correlations between FROH and FPED were obtained for ROH longer than 8 Mb in other studies (Zhang et al. 2015; Gurgul et al. 2016).
Table 1 Comparison of ROH studies in different farm animal species
Purfield et al. (2012) extended this work to the Holstein, Limousin and Simmental breeds using the HD panel (777,962 SNPs). They found a strong correlation (r=0.75, P<0.0001) between FPED and FROH for ROH with length > 0.5 kb. They also reported that, in the absence of pedigree data, ROH can be used to infer recent population history even in a small population. Subsequently, the effects of inbreeding measured by FROH on dairy cattle performance were estimated. Each 1% increase in FROH was associated with a decrease of 20 kg in total milk yield to 205 d postpartum and an increase of 1.72 d in days open, and an increase in maternal calving difficulty was also noted (Bjelland et al. 2013). Similar results were obtained for daughter pregnancy rate and somatic cell score with increasing inbreeding in Jersey cattle (Kim et al. 2015). Ferenčaković et al. (2013b) showed that inbreeding estimated from the genomic coefficients FROH > 1 Mb and FROH > 2 Mb was considerably higher than pedigree-derived estimates, whereas FROH > 8 Mb and FROH > 16 Mb were similar to FPED. Another study showed that the lower-density panel overestimated the number of ROH < 4 Mb, because it could not detect the heterozygous SNPs that are present on the denser chip (Ferenčaković et al. 2013a). ROH longer than 4 Mb may be related to strong artificial selection and the use of artificial insemination, which increase relatedness among animals (Zhang et al. 2015; Szmatoła et al. 2016; Kim et al. 2015). In populations with high linkage disequilibrium (LD) and recent inbreeding, the 50 k BeadChip can provide a good estimate of inbreeding, whereas in populations with low LD and ancient inbreeding a denser panel is required to identify short ROH precisely (Marras et al. 2015). Howard et al. (2015) characterized differences and similarities in the location and frequency of homozygosity in Jersey dairy cows and bulls from the United States, Australia and New Zealand.
They reported that ROH45, a statistic that counts how frequently a SNP lies within an ROH of at least 45 SNPs, differed across populations, indicating regions of the genome that are undergoing differential directional selection. Purfield et al. (2012) were the first to identify ROH in beef cattle, analyzing European Holstein, Limousin and Simmental animals. They found that the mean sum of ROH longer than 5 Mb was highest for the Holstein breed (145 Mb) and comparable for Limousin (45 Mb) and Simmental (55 Mb). Similar results were reported by Szmatoła et al. (2016). Most studies showed that beef cattle have fewer ROH than dairy and dual-purpose breeds (Marras et al. 2015; Ferenčaković et al. 2011). Angus and Hereford also showed considerably higher sums of ROH than Charolais, Limousin, Simmental and other breeds in the 1-5 Mb and 5-10 Mb categories (Iacolina, 2016b).
The first study of ROH in pigs was carried out using the Porcine SNP60 BeadChip on 52 samples from commercial breeds and wild populations of Eurasia (Bosse et al. 2012) (Table 1). ROH were reported to be unequally distributed across the genome, and some ROH hotspots overlapped with positively selected genes. Moreover, a strong correlation of the size and abundance of ROH with recombination rate and GC content was reported (Herrero-Medrano et al. 2013; Bosse et al. 2012). Inbreeding coefficients calculated from the pedigree were also strongly correlated with those derived from SNP-based runs of homozygosity (r=0.814-0.919). However, these correlations depend on the number of SNPs and on the heterozygosity measured across different loci (Silio et al. 2013). Gomez‑Raya et al. (2015) reported a correlation of 0.84 (SE=0.14) between chromosomal length and chromosomal inbreeding coefficients, which supports the hypothesis that FROH incorporates information on ROH length as an indication of recent inbreeding. Similar results were obtained by Iacolina et al. (2016a).
ROH in sheep were first studied using the 50 k BeadChip in the Swakara breed, in which a total of 436 unique ROH regions spanning 1 to 6 Mb on the autosomes were reported (Muchadeyi et al. 2015). In another study, ROH were calculated in three pure breeds (Merino, Border Leicester, Poll Dorset) and two crossbred (F1 crosses of Merino and Border Leicester (MxB), and MxB crossed to Poll Dorset) Australian sheep populations. The number of ROH differed significantly between populations; 80% of animals had at least one ROH longer than 1 Mb and 59% had at least one ROH longer than 5 Mb. In addition, all animals in the pure breeds had at least one ROH longer than 1 Mb, and 88% had at least one ROH longer than 5 Mb (Al‑Mamun et al. 2015). As in dairy cattle, the pure breeds had more ROH across the whole genome than the crosses.
ROH detection in chickens was carried out on three African populations comprising 72 Ugandan, 100 Rwandan and 24 Kuroiler chickens (Fleming et al. 2016). The number and extent of ROH differed among populations. The Ugandan ecotypes had ROH on every chromosome except chromosome 16 and the longest median ROH length in the genome, whereas the Kuroilers had the fewest chromosomes containing ROH and the shortest median ROH length. Overall, the proportion of the genome covered by ROH ranged from ~2% to 40%. The analysis found that ROH islands and deserts occur frequently in the chicken genome. Islands appear clearly in both macro- and micro-chromosomes and in all regions of the chromosomes. Islands are found less frequently in micro-chromosomes, which is expected given the high rate of recombination in micro-chromosomes (Orazietti, 2015).
Khanshour et al. (2013) first performed ROH analysis to reveal signatures of positive selection in the Arabian horse, and detected longer ROH (>400 kb) and high inbreeding coefficients in Sorraia and Thoroughbred horses. Metzger et al. (2015) determined the distribution and number of ROH in 10 horse populations using next-generation sequencing data. In total, 3784 ROH were detected in that study. Small ROH (40-49 kb) were abundant and equally distributed in all animals, whereas ROH longer than 60 kb were distributed differently among populations. Moreover, in non-breed horses, 198 ROH in 50-SNP windows and seven ROH in 500-SNP windows overlapped with genes affecting reproduction, embryonic development, energy metabolism, and muscle and cardiac development. In seven breeds, only three common ROH in 50-SNP windows were found, which partially covered the gene YES1 (related to fertility). In the Hanoverian, 18 ROH were detected in the region of genes related to glycogen balance, reproduction, neurologic control and signaling processes.
ROH overlap with gene locations
In analyses of ROH patterns in human populations, hotspots were reported on chromosomes 4 and 10 that harbor genes undergoing selection, some of which have even become fixed (Pemberton et al. 2012). Simón-Sánchez et al. (2012) reported that early-onset Parkinson's disease (EOPD) is particularly associated with autosomal recessive mutations in three genes, PARK2, PARK7 and PINK1, which may lie in extended runs of homozygosity. In Limousin cattle, a region of high autozygosity was detected on chromosome 2 that overlaps the MSTN gene locus, known as a strong QTL for muscling traits (Szmatoła et al. 2016; Esmailizadeh et al. 2008). A homozygous region on BTA14 may be related to DGAT1 variants that affect milk fat percentage and are engaged in free fatty acid binding, transport and the regulation of lipid metabolism (Siegenthaler et al. 1994). A new homozygous region was observed on BTA16 in the Polish Red, Limousin and Simmental breeds, a chromosome known to carry several QTLs for meat and carcass traits (Gutiérrez-Gi et al. 2008). In Holstein, 183 genes were confirmed within ROH islands, which are possibly associated with other minor QTLs for milk production, the thyrotropin-releasing (TR) hormone receptor signaling pathway or other traits. Secretion of TR hormone by the hypothalamus is critical for the release of prolactin and growth hormones from the pituitary gland (Kaiser et al. 1994). In horses, functional analyses of ROH identified genes involved in embryonic development, energy metabolism, muscle and cardiac development, fertility (YES1), neurologic control, signaling, glycogen balance, melanogenesis, haematopoiesis and gametogenesis (Metzger et al. 2015). In chickens, genes within ROH consensus regions are linked to gene ontology (GO) terms related to lipid metabolism, immune functions and stress-mediated responses (Fleming et al. 2016). Carothers et al. (2006) showed that ROH hotspots are located in gene-rich regions of the genome that have been affected by selection; such regions have been suggested to be signs of selective sweeps and of regions under positive selection (Nothnagel et al. 2010). These sweeps lead to the fixation of favorable alleles in the population through a “hitchhiking” process (McQuillan et al. 2008).
Does chip density influence the efficiency of ROH detection?
Most research on this topic has shown that the efficiency of ROH detection is influenced by SNP chip density. Genome-wide scans with a denser panel make it easier to identify and count shorter ROH (McQuillan et al. 2008). Compared with a denser panel, the 50 k chip identified only 27.7% of all runs of homozygosity < 5 Mb, which means that the 50 k chip has a lower capability to detect short ROH. On the other hand, most ROH > 5 Mb were detected with similar sensitivity by the 50 k and the denser panel. Because total ROH length is used to calculate the level of inbreeding and long ROH have a large influence on this parameter, the 50 k chip is a suitable compromise between price and reliability in ROH detection (Purfield et al. 2012). Using the 50 k chip, Hamzić (2011) reported a greater mean number of ROH shorter than 5 Mb than Purfield et al. (2012), whereas the results for ROH longer than 5 Mb were very similar for both SNP panels. The 50 k chip overestimates the number of segments shorter than 4 Mb because it cannot identify heterozygous SNP genotypes within the observed ROH. Conversely, the denser chip underestimates the number of ROH longer than 8 Mb (Ferenčaković et al. 2013a). The minimum ROH length that can be detected therefore depends on the density of the SNP chip.
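The way a sparse panel can report a run that a denser panel rejects can be shown with a toy sketch. All values here are hypothetical: a dense panel of 40 SNPs with two heterozygous calls, a sparse panel that keeps every fourth SNP, and SNP-count thresholds chosen so that both panels require roughly the same physical span. The function `count_runs` is an illustrative helper, not part of any ROH software.

```python
def count_runs(genotypes, min_snps):
    """Count maximal runs of homozygous calls (0 or 2) with at least min_snps SNPs."""
    runs = 0
    length = 0
    for genotype in list(genotypes) + [1]:  # sentinel heterozygote closes the last run
        if genotype != 1:
            length += 1
        else:
            if length >= min_snps:
                runs += 1
            length = 0
    return runs


# Dense panel: 40 SNPs, heterozygous calls at indices 13 and 27.
dense = [0] * 40
dense[13] = 1
dense[27] = 1

# Sparse panel: every 4th SNP, so neither heterozygous SNP is genotyped.
sparse = dense[::4]

# 20 dense SNPs and 5 sparse SNPs cover about the same physical distance.
print("dense panel runs :", count_runs(dense, min_snps=20))
print("sparse panel runs:", count_runs(sparse, min_snps=5))
```

In this example the sparse panel sees an unbroken stretch of homozygous calls and reports one run, while the dense panel detects the two heterozygous SNPs and reports none of the required length, which mirrors the overestimation of short ROH by the 50 k chip described above.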
Until 2005, all studies estimating the level of inbreeding were carried out using pedigrees. With progress in whole-genome sequencing technology, SNP arrays rapidly became available, and molecular data came to be routinely used to estimate inbreeding through runs of homozygosity. ROH have been calculated in many species and populations, including humans, cattle, sheep, horses, pigs and chickens. However, these studies are difficult to compare because there is no consensus on the criteria for defining ROH. Nevertheless, most studies showed a high correlation between FROH > 5 Mb and FPED. For ROH longer than 1 or 2 Mb, FROH was higher than FPED. The number and patterns of ROH differ across breeds, subspecies and populations as a consequence of signatures of selection, which means that population history can be inferred from ROH. A denser panel such as the bovine HD underestimated the number of ROH longer than 8 Mb because of incidental heterozygotes arising from genotyping errors, whereas a sparse panel such as the 50 k tended to overestimate the number of ROH shorter than 4 Mb. A denser panel is expected to give a more accurate analysis, but the 50 k chip is an appropriate compromise between price and reliability in ROH detection. In general, runs of homozygosity make it possible to estimate inbreeding in a population precisely, but problems related to genotyping errors may arise. Therefore, inbreeding estimation based on next-generation sequencing data needs to be improved to reduce the effects of sequencing errors.
M. Nosrati acknowledges financial support from Payam Noor University.