In recent years, advances in genetic engineering have made rapid DNA sequencing possible, and the amount of nucleotide sequence information from different parts of the genome in different species is growing rapidly. Deoxyribonucleic acid (DNA) sequence data contain a wealth of biologically useful information. Sequence analysis can be used to compare sequences with those of genes of known function in order to characterize different regions of the DNA, such as active sites and functional regions; to interpret the evolutionary relationships between taxonomic groups (phylogenetics); and to compare an unidentified sequence with the millions of sequences available in DNA databases. The growing number of reported DNA sequences has also created both the conditions and the motivation for evolutionary investigations based on comparative studies. It is therefore desirable to use statistical methods to estimate the evolutionary distance between similar sequences and the number of nucleotide substitutions. Estimating evolutionary distances between protein and DNA sequences is important for constructing phylogenetic trees, for determining the time and origin of the divergence of species, and for understanding the mechanisms of evolution of genes, proteins, and populations. To date, three categories of genes affecting follicular growth and ovulation rate have been identified: activin receptor-like kinase 6 (ALK6), growth and differentiation factor 9 (GDF9), and the bone morphogenetic protein (BMP) category, of which bone morphogenetic protein 15 (BMP15) is the best known. All of these genes belong to the transforming growth factor-β (TGF-β) superfamily, and they regulate the expression and secretion of hormones that affect follicular growth and ovulation rate.
The growth factors BMP15, BMP6 and growth differentiation factor 9 (GDF9) are produced by oocytes (McNatty et al. 2005), while the BMP receptors type A1, A2 and type 2 are located on the oocytes and somatic cells of ovarian follicles (Souza et al. 2002) and also on cells of the pituitary gland of sheep. The BMP15 gene is located on the X chromosome and consists of two exons separated by an intron of 5.4 kb. Its full transcript is a sequence of 1179 nucleotides that encodes a pre-peptide of 393 amino acids; the mature peptide is 125 amino acids long (Galloway et al. 2002). Studies in rodents have shown that the mRNA transcribed from the BMP15 locus and the resulting protein are found in the oocyte in the early stages of ovulation (Dube et al. 1998; Otsuka et al. 2001). BMP15 mutations reduce the amount of mature protein or alter its binding to cell-surface receptors. Accordingly, investigation of the phenotypes resulting from mutations at the BMP15 locus shows that the presence of certain haplotypes is necessary for ovarian follicular development, normal ovulation, and formation of the corpus luteum in sheep. Reports have also shown that BMP15 mRNA and protein are present at all stages of ovarian follicles in goats (Silva et al. 2004). To date, eight mutations have been identified in the gene encoding this factor in sheep: Inverdale, Hanna, Belclare, Galway, Lacaune, Rasa Aragonesa, Grivette and Olkuska, each named after the breed in which it was first identified. The FecXI allele in sheep corresponds to a thymine-to-adenine change at position 896 of the cDNA encoding BMP15.
The FecXH allele corresponds to a cytosine-to-thymine change at position 871, which creates a stop codon at amino acid 23 of the mature protein and thereby abolishes the biological activity of BMP15 (Fabre et al. 2006). The FecXG allele results from a thymine-to-adenine change at position 718, which creates a stop codon at amino acid 239 of the protein, so the processed protein is not produced. The FecXB allele results from a guanine-to-thymine change at nucleotide 1100, which replaces serine with isoleucine at position 99 of the protein sequence (Table 1) (Fabre et al. 2006). In the Lacaune breed, the mutant allele (FecXL) is associated with high prolificacy and was identified as a Cys321Tyr substitution that alters BMP15 protein function (Drouilhet et al. 2013). The FecXGr and FecXO mutations are both located in two highly conserved domains of the sheep, cow, pig, human and mouse BMP15 proteins. FecXGr, which substitutes an isoleucine for a threonine, clearly affects the hydrophobicity of the protein, while FecXO alters the polarity and the molecular weight of the protein by replacing an asparagine with a histidine. These two mutations clearly affect the intrinsic properties of the BMP15 protein, since they replace polar amino acids with non-polar and basic amino acids, respectively, and are therefore suspected of modifying its three-dimensional structure (Demars et al. 2013). Recombinant BMP15 increases the proliferation of granulosa cells in mice and humans. Moreover, BMP15 stimulates the expression in granulosa cells of the mRNA encoding kit ligand (a factor necessary for oocyte growth in primary follicles). Therefore, both BMP15 and kit ligand play an important role in the early growth of follicles. BMP15 is also able to regulate steroid production.
In fact, BMP15 in mice selectively modulates the biological effects of follicle-stimulating hormone (FSH) on granulosa cells, inhibiting FSH-induced progesterone production without affecting FSH-induced estradiol synthesis. The underlying mechanism appears to be down-regulation of the FSH receptor, which prevents the accumulation of mRNA from multiple FSH-dependent genes, such as those encoding steroidogenic acute regulatory protein (StAR), P450 side chain cleavage enzyme (P450scc), 3β-hydroxysteroid dehydrogenase (3β-HSD), the luteinizing hormone (LH) receptor and the inhibin/activin subunits. In sheep, BMP15 increases the proliferation rate of granulosa cells and inhibits basal and FSH-stimulated progesterone secretion by granulosa cells of small antral follicles (Fabre et al. 2006). Overall, the effect of the sheep variants seems closely related to the type of mutation involved.
Table 1 The mutations reported in the locus of the BMP15 gene
Indeed, 3 of the 8 mutations identified so far are either an amino acid deletion (FecXR) or a premature stop codon (FecXG and FecXH) in the BMP15 sequence, and consequently impair production of the active form of BMP15 (Monteagudo et al. 2009). The objective of the present study was therefore a bioinformatics analysis of BMP15 sequences, including searching existing sequence databases, aligning sequences, estimating evolutionary distances and constructing a phylogenetic tree for species such as sheep, mice, cows, goats, guinea pigs, humans, pigs and other species, using the partial and complete BMP15 gene sequences available in the NCBI database.
MATERIALS AND METHODS
Analysis method and DNA sequence databases
One of the most informative methods of sequence data analysis is similarity searching. For DNA, similarity at the sequence level implies some structural or functional similarity between the protein products or the regulatory elements of gene expression. Searching a database with an uncharacterized gene sequence can identify homologues in other species or sequence elements that encode structural domains within the protein. Searches can be conducted with either nucleotide or peptide sequences. However, similarity is difficult to detect at the nucleotide level unless the sequences are closely related, so for coding DNA, similarity searching with the translated protein sequence is more informative. A commonly used tool for similarity searching is BLAST (Basic Local Alignment Search Tool), because of its practical balance of speed, sensitivity and selectivity. In the present study, 23 BMP15 gene sequences, including mRNA and DNA, were retrieved from the NCBI database, and the BLAST tool on the website http://ncbi.nlm.nih.gov was used to investigate similar sequences, their degree of similarity (both nucleotide and protein) and possible mutations (Table 2). Genetic parameters, such as the number of mutations, nucleotide diversity, and the number and diversity of sites with synonymous substitutions, were determined using DnaSP v5 (Librado and Rozas, 2009), and the BMP15 gene sequence was aligned with the sequences of other organisms using MEGA6 (Tamura et al. 2013).
Phylogeny and determination of evolutionary direction
To draw the phylogenetic tree, the protein sequence of the BMP15 gene was predicted for each species studied using MEGA6 (Tamura et al. 2013). After editing the sequences and deleting the non-coding regions, the phylogenetic tree was drawn using the neighbor-joining (NJ) method and equation 1. In this method, a matrix Q is computed over all pairs of branches, and the pair with the lowest value, which represents the closest relationship between two branches, is selected and joined at a new node of the tree. Bootstrap values were obtained from 100 resamplings.
$$Q(i,j) = (r-2)\,d(i,j) - \sum_{k=1}^{r} d(i,k) - \sum_{k=1}^{r} d(j,k) \qquad (1)$$
d(i, j): distance between branches i and j.
k: index of the k-th branch of the tree.
r: total number of branches.
Q(i, j): value of the Q matrix for branches i and j.
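The selection step of equation 1 can be illustrated with a short Python sketch. It only computes the Q matrix and picks the pair of branches with the lowest value; it does not build the full tree and is not the MEGA6 implementation. The function names and the five-taxon distance matrix are hypothetical and are not taken from the BMP15 data.

```python
def q_matrix(d):
    """Return the neighbor-joining Q matrix for a symmetric distance matrix d.

    d is a list of lists with d[i][j] = distance between branches i and j.
    """
    r = len(d)                           # total number of branches
    row_sums = [sum(row) for row in d]   # sum over k of d(i, k)
    q = [[0.0] * r for _ in range(r)]
    for i in range(r):
        for j in range(r):
            if i != j:
                q[i][j] = (r - 2) * d[i][j] - row_sums[i] - row_sums[j]
    return q


def closest_pair(d):
    """Return the indices (i, j), i < j, of the pair with the lowest Q value."""
    q = q_matrix(d)
    r = len(d)
    candidates = [(q[i][j], i, j) for i in range(r) for j in range(i + 1, r)]
    _, i, j = min(candidates)
    return i, j


if __name__ == "__main__":
    # Hypothetical distances between five taxa (illustration only).
    example = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(closest_pair(example))
```

In a complete NJ run, the selected pair would be merged into a new node, the distance matrix would be reduced by one row and column, and the step would be repeated until the tree is resolved.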
In addition, the maximum composite likelihood method was used to estimate the transition and transversion substitution rates among purine and pyrimidine bases. Comparing the rate of nucleotide changes that alter the encoded amino acid (nonsynonymous, dN) with the rate of nucleotide changes that do not (synonymous, dS) is an efficient and useful way of detecting the direction of natural selection acting on a gene during evolution.
Table 2 Characteristics of DNA sequences used for the bioinformatics analysis of the BMP15 gene
Thus, using the numerical value of this ratio (dN/dS), the trend of natural selection was identified for the BMP15 gene. dN and dS values can be calculated using equations 2 and 3, respectively.
$$d_N = -\frac{3}{4}\ln\left(1 - \frac{4}{3}P_N\right) \qquad (2)$$
$$d_S = -\frac{3}{4}\ln\left(1 - \frac{4}{3}P_S\right) \qquad (3)$$
PN: proportion of sites with nonsynonymous substitutions.
PS: proportion of sites with synonymous substitutions.
The significance of dN/dS was tested using Fisher's exact test at the 5% significance level.
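Equations 2 and 3 apply the same correction to two different proportions, so they can be sketched with a single Python function. This is a minimal illustration, not the routine used by MEGA6; the function names and the input proportions in the example are hypothetical.

```python
import math


def jukes_cantor(p):
    """Correct a proportion of differing sites p for multiple hits.

    Implements d = -(3/4) * ln(1 - (4/3) * p), the form of equations 2 and 3.
    The correction is undefined for p >= 0.75.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - (4.0 / 3.0) * p)


def dn_ds(p_n, p_s):
    """Return (dN, dS, dN/dS) from the nonsynonymous and synonymous proportions."""
    d_n = jukes_cantor(p_n)
    d_s = jukes_cantor(p_s)
    if d_s == 0:
        raise ValueError("dS is zero, so the ratio dN/dS is undefined")
    return d_n, d_s, d_n / d_s


if __name__ == "__main__":
    # Hypothetical proportions, for illustration only.
    d_n, d_s, ratio = dn_ds(0.05, 0.10)
    print(round(d_n, 4), round(d_s, 4), round(ratio, 3))
```

A ratio below 1 is conventionally read as purifying selection, a ratio near 1 as neutral evolution, and a ratio above 1 as positive selection.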
RESULTS AND DISCUSSION
The mean inter-population genetic diversity was estimated at 2.15 bp using the maximum composite likelihood method. The coefficient of evolutionary differentiation, another indicator of inter-population diversity, was estimated at 1.25 bp on the basis of the number of base pairs, using the methods of Nei and Kumar (2000) and Tamura (2004). The mean divergence over all sequence pairs was 1.7, expressed as the number of base substitutions per site averaged over all sequence pairs. Within-group divergence, defined as the average number of base substitutions per site between all sequence pairs within a group, was estimated at 0.4 bp in cows, 2.1 bp in goats, zero in mice, 1 in pigs and 4.1 in sheep. The mean distance between species was also estimated from the sequence data taken from the NCBI database using the maximum composite likelihood method, based on the number of base pairs (Table 3). As shown in Table 3, the greatest distances were observed between sheep and human and between sheep and mouse, while the smallest were between sheep and cow and between sheep and goat. Base substitution rates and patterns were estimated using the Tamura-Nei model (Table 4). Each rate expresses the probability of one base being replaced by another, estimated from the analyzed sequences and the base changes observed among them. When each base is evaluated, the probability of substitution (r value) associated with it should be considered. For simplicity, the sum of the r values is set to 100 (Tamura et al. 2004; Yang and Kumar, 1996). The analysis involved 23 nucleotide sequences and 221 positions after all positions containing gaps and missing data had been excluded; the nucleotide frequencies were 22.27%, 25.85%, 24.61% and 27.27% for adenine, thymine/uracil, cytosine and guanine, respectively.
The divergence between species was estimated by pairwise comparison, based on the number of base differences.
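The three operations described above (excluding positions with gaps or missing data, computing nucleotide frequencies and counting base differences between a pair of sequences) can be sketched in Python as follows. This is an illustrative sketch under simple assumptions, not the procedure implemented in MEGA6 or DnaSP; the function names and the toy alignment are hypothetical.

```python
from collections import Counter

VALID_BASES = set("ACGTU")


def complete_deletion(aligned):
    """Drop every alignment column that contains a gap or an ambiguous base."""
    if len({len(s) for s in aligned}) > 1:
        raise ValueError("aligned sequences must have equal length")
    kept = [col for col in zip(*aligned) if all(b in VALID_BASES for b in col)]
    return ["".join(col[i] for col in kept) for i in range(len(aligned))]


def base_frequencies(seqs):
    """Return the percentage of each base over all sequences."""
    counts = Counter("".join(seqs))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sites left to count")
    return {base: 100.0 * n / total for base, n in sorted(counts.items())}


def pairwise_differences(a, b):
    """Count the sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


if __name__ == "__main__":
    # Toy alignment (not BMP15 data): '-' is a gap, 'N' an unknown base.
    alignment = ["ACGT-A", "ACGTTA", "ANGTTC"]
    clean = complete_deletion(alignment)
    print(clean)
    print(base_frequencies(clean))
    print(pairwise_differences(clean[0], clean[2]))
```

Dividing the count of differences by the number of retained sites would give the uncorrected proportion of differing sites for that pair.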
Table 3 The distance between species using the number of base pairs in the locus of the BMP15 gene
Table 4 The probability of substitution (r) from one base (row) to another base (column)1,2
1 Each number in the table is the probability (r) of replacing one base (row) with another base (column).
2 The rates of transitional substitutions (replacement of a purine with a purine, or of a pyrimidine with a pyrimidine) are shown in bold, and the rates of transversional substitutions (between bases from different families) are shown in italics.
The values presented in Table 5 are the numbers of base substitutions per site between sequences. The maximum divergence, 5 bp, was found in the comparison between sheep and cows. MEGA6 (Tamura et al. 2013) is a commonly used program for multiple sequence alignment. It uses a progressive algorithm that aligns sequences in successively larger groups, beginning with the most closely related sequences. Using MEGA6, the 23 sequences studied were compared and a tentative measure of similarity was derived, represented by a distance matrix. This matrix was used to produce a phylogenetic guide tree (Figure 1) using the neighbour-joining (NJ) method (Saitou and Nei, 1987). The branching pattern of the tree is used to determine the most closely related pair of sequences, and the final alignment is obtained by repeating this procedure until the root of the tree is reached. The resulting molecular phylogenetic tree has two main branches representing the phylogenetic relationships between the sequences. The terminal nodes (leaves) of the tree represent the existing sequences and correspond to operational taxonomic units, while the internal nodes represent hypothetical ancestral sequences. The tree is bifurcating; that is, each node gives rise to two branches, each of which represents the occurrence of a specific event or a difference between the BMP15 sequences studied. Comparative sequence studies have been used at a wide range of taxonomic levels to evaluate phylogenetic relationships. The results showed that different regions and intragenic distances of the DNA varied among species within the BMP15 sequences. Despite some similarity between sequences, the phylogenetic tree and the rate of genetic differentiation revealed distances among species in BMP15. The phylogeny reported in a recent study (Bwaseh et al. 2016), based on nucleotide and amino acid sequences of BMP15, showed a clustering of sequences among species similar to that obtained in this study, although there was some intermingling between species.
Analysis of the BMP15 gene sequences of the species studied using MEGA6 (Tamura et al. 2013), BioEdit and the basic local alignment search tool (BLAST) showed a high degree of similarity among the species at the BMP15 locus. One sheep BMP15 sequence was set as the query, and the other 22 sequences were compared with it in the BLAST output. The most similar sequences are shown in Figures 2 to 5. In many cases the similarity was 100%, and in all cases it was greater than 98%. Hits were ranked in descending order of Max Score/Total Score and in ascending order of E-value. The score of a pairwise comparison between the query DNA sequence and a DNA sequence in the NCBI database was calculated as follows: +2 for each match, -1 for each mismatch and -2 for each gap. Higher scores indicate better alignments. In Figure 3, the sequences with the highest similarity and their E-values are shown in order from top to bottom. The results of some pairwise comparisons at selected sites using BLAST are shown in Figures 6 to 8. Despite many similarities, the sequences showed base variation and substitution at some sites. In Figure 4, deletions can be seen in some regions of the gene in some sequences. Most of the sequences containing a deletion are mRNA sequences, whereas the sequences without deletions are DNA sequences. These deletions correspond mainly to introns, which are removed by splicing during RNA processing.
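The match/mismatch/gap scoring described above can be illustrated with a short Python sketch. It assumes the two sequences have already been aligned (gaps written as "-") and applies a flat penalty to each gap position, as stated in the text; the BLAST program itself also has to find the alignment and may be run with other gap costs. The function names and the example sequences are hypothetical.

```python
def alignment_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Score two pre-aligned sequences: +2 per match, -1 per mismatch, -2 per gap."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for a, b in zip(query, subject):
        if a == "-" or b == "-":
            score += gap        # gap position in either sequence
        elif a == b:
            score += match      # identical bases
        else:
            score += mismatch   # different bases
    return score


def percent_identity(query, subject):
    """Percentage of alignment columns in which both sequences carry the same base."""
    if len(query) != len(subject) or not query:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    same = sum(1 for a, b in zip(query, subject) if a == b and a != "-")
    return 100.0 * same / len(query)


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    q = "ATGC-A"
    s = "ATGAGA"
    print(alignment_score(q, s), round(percent_identity(q, s), 2))
```

Two identical fragments of length n score 2n under this scheme, which is why longer perfect matches rank above shorter ones.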
Table 5 The estimate of the divergence between the sequences of the BMP15 gene in domesticated species of animals
1 Column numbers are the same as row numbers (species/genus).
Splicing during RNA processing explains why differences exist between species, and even between different organs and tissues of the same species, despite an identical gene sequence. When DNA sequences are aligned, identifying the corresponding nucleotides is difficult because there are only four types of nucleotide. Alignment of amino acid sequences is easier and can be more meaningful when the aim is to compare a group of related sequences for potential functional characteristics. For this purpose, the BMP15 amino acid sequences of four commonly studied species (mouse, pig, sheep and human) were retrieved from NCBI and compared using BLAST. The BMP15 protein sequences from Ovis aries (GenBank AAF81688.1), Sus scrofa (GenBank NP_001005155.1), Mus musculus (GenBank NP_033887.1) and Homo sapiens (GenBank NP_005439.2) were aligned and compared. The results showed that differences in the nucleotide sequences lead to changes in the protein sequence (Figure 9).
Figure 1 Phylogenetic tree drawn based on 23 nucleotide sequences of the BMP15 gene
Figure 2 The scoring of the similarity and matching rate of sequences using the tool "BLAST"
Figure 3 Sequence matching from base 900 up to base 1100 of the BMP15 gene in some species being studied
Figure 4 Sequence matching from base 5900 up to base 6100 of the BMP15 gene in some species being studied
Figure 5 Display of part of the sequence matching and deletions (introns)
Figure 6 A comparison between the two sequences of the BMP gene of Ovis species
Figure 7 A comparison between the sequence of the BMP15 gene in Ovis and Capra species
The differences between the BMP15 protein sequences of the species compared are marked with black lines, and the important mutations that have occurred in this gene in different species are marked with red lines. The BMP15 protein plays an important role in women's fertility. A number of mutations have occurred in the BMP15 gene in humans, but none of them is shared with sheep. Interestingly, all the mutations reported in humans so far have been associated with diminished ovarian syndrome in heterozygotes. Heterozygous carriers in humans therefore have an ovarian phenotype similar to that of infertile homozygous FecX ewes. In contrast, heterozygous ewes carrying FecXGr or FecXO had normal ovaries, a higher ovulation rate and more lambs per lambing (Demars et al. 2013).
Figure 8 A comparison between the sequence of the BMP15 gene in Ovis aries and Bos taurus
Figure 9 BMP15 multi-species sequences alignment and position of sheep mutations
According to the NCBI database, 59, 7, 17 and 6 single nucleotide polymorphisms (SNPs) have so far been identified in the BMP15 gene of humans, cows, mice and sheep, respectively; some of them are listed in Table 1 and Figure 9. Recent innovations in DNA sequencing technology have dramatically enhanced our ability to determine the sequence of large quantities of DNA. In addition, extensive analysis of complementary DNA (cDNA), which reflects the nucleotide sequence of messenger RNA (mRNA), has revealed a large number of non-coding RNAs in eukaryotic cells and tissues that are involved in the regulation of gene expression. Comprehensive studies of DNA aimed at identifying the functional parts of the human genome have reported overlapping genes and genes with shared exons and different transcription start sites (TSS). These and other results have challenged the traditional definition of the gene as a sequence of DNA that encodes a chain of amino acids (Gojobori et al. 2009). It was previously shown that BMP15 evolved more quickly than the other members of the TGF-β family, with evidence for positive selection in BMP15, especially in Hominidae (Auclair et al. 2013). The BMP15 protein has been described in some species, mainly human and sheep, as playing critical roles in female fertility or fertility disorders. A large number of mutations in the BMP15 gene have been identified in women with premature ovarian failure (Di Pasquale et al. 2004; Di Pasquale et al. 2006; Dixit et al. 2006) and ovarian hyperstimulation syndrome (Hanevik et al. 2011; Moron et al. 2006), but none is shared between women and sheep. According to McNatty et al. (2005), in mammals with low ovulation rates, follicular growth and ovulation rate are affected by the BMP15 released from the oocyte to the somatic cells of the follicle.
However, in rodents, which have high ovulation rates, the follicular cells are relatively insensitive to changes in BMP15 (Yan et al. 2001). This indicates that the mechanisms through which the oocyte controls this process differ between species with low and high ovulation rates. Although the role of BMP15 has not been fully investigated in pigs, an active BMP system has been shown to exist in their ovaries (Brankin et al. 2003; Brankin et al. 2004). In a study of Chinese Hu sheep, which have high prolificacy, no polymorphism was reported at the BMP15 locus (Guan et al. 2006). Likewise, studies of polymorphism at the BMP15 locus in six breeds of Chinese goats found no mutation in the exons that was associated with prolificacy (He et al. 2006). Investigation of mutations in 5 regions of exon 2 of the BMP15 gene has shown a correlation with fertility in several breeds of sheep (Montgomery et al. 2001). Mutations in the BMP15 fertility gene play an important economic role in sheep and probably in the reproduction of ruminants in general (Galloway et al. 2000; Hanrahan et al. 2004; McNatty et al. 2005). A single major gene is responsible for the high ovulation rate in Booroola Merino, Inverdale, Belclare and Cambridge sheep, but there is no evidence of a major gene responsible for prolificacy in other highly fertile breeds, such as Finnish Landrace and Romanov (Gordon et al. 2004; Gordon et al. 2005). These findings suggest that at least two mechanisms of genetic control play a role in the high fertility of sheep. The biological effects of mutations vary among mammalian species (Yan et al. 2001). In addition, Hashimoto et al. (2005) suggested that species-specific differences in the processing of BMP15 may be correlated with differences in fertility rates among species.
There are several mechanisms of genetic control of fertility in mammalian species in which BMP15 plays an important role. The biological consequences of mutations in BMP15 vary among mammalian species and are species-specific. Differences in the mRNA processing of BMP15 among mammalian species may produce a wide range of reproductive performance despite identical DNA sequences. In addition to the existing variation in DNA and amino acid sequences of BMP15 between species, this study highlighted specific segments that are significantly more conserved in BMP15 from mono-ovulating species than from super-ovulating species. Further analyses, in addition to DNA sequencing, are therefore required to understand the complexities of the genome.
We thank all the PhD students for sharing information about phylogenetic analysis of DNA sequences, and the computer center of the University of Zabol for access to bioinformatics packages.