Camels are important livestock animals for transport and as sources of milk and meat. Interest in camel breeding is currently growing, both for human nutrition and for the production of modern therapeutics (Nguyen et al. 2000; Kastelic et al. 2009; Jirimutu et al. 2012; Wu et al. 2014; Altaher and Kandeel, 2015). Several DNA-based marker systems have been developed, including restriction fragment length polymorphism (RFLP) (Saiki et al. 1985), random amplified polymorphic DNA (RAPD) (Mbwana et al. 2006), inter-simple sequence repeats (ISSRs) (Reddy et al. 2002), simple sequence repeats (SSRs) (Peakall et al. 1998), amplified fragment length polymorphism (AFLP) (Vekemans et al. 2002) and their variants. These systems are used to monitor genetic variability and to support genome studies, molecular breeding and marker-assisted selection (MAS) in various species (Joshi et al. 2010; Bakhtiarizadeh et al. 2012; Asadi and Rashidi Monfared, 2014). Compared with other types of genetic markers, SSRs (microsatellites) are distinguished by co-dominant inheritance, multi-allelic nature, high reliability, abundance in the genome, high polymorphism and a high rate of cross-species transferability (Yan et al. 2008; Kumar et al. 2015; Sadder et al. 2015; Nirapathpongporn et al. 2016; Du et al. 2017). SSR markers are therefore used for assessing genetic variability, species conservation, genetic mapping and marker-assisted selection, and they provide a valuable tool for linking morphological and genetic changes (Kim et al. 2008; Asadi and Rashidi Monfared, 2014; Wang et al. 2017; Cai et al. 2019). Significant progress has been made in developing more efficient approaches for obtaining new SSRs (Zane et al. 2002; Kim et al. 2008; Vieira et al. 2016; Wang et al. 2017; Taheri et al. 2018), but isolating these markers remains costly, labour-intensive and time-consuming (Yan et al. 2008; Wang et al. 2015).
Expressed sequence tags (ESTs) are short sequences obtained from the transcribed (coding) regions of the genome expressed under specific biological conditions (Ellis and Burke, 2007). ESTs can be generated from cDNA libraries and provide an economical source of gene-based molecular markers (Mirkin, 2006; Kim et al. 2017). The accumulation of large numbers of ESTs in public databases has led to a new category of functional genomic markers, microsatellite markers derived from ESTs (EST-SSRs), which can be developed quickly and at low cost through data mining (Yan et al. 2008; Bakhtiarizadeh et al. 2011). EST-SSR markers have several advantages over other DNA-based genetic markers, including the ability to detect variation in untranslated regions (5′-UTR and 3′-UTR), introns and coding sequences, better cross-species transferability, and greater conservation than genomic SSR markers (Li et al. 2004a; Guzinski et al. 2016). Because EST-SSRs are associated with coding sequences, they may also allow genes to be tagged directly in quantitative trait locus (QTL) mapping of important traits (Asadi and Rashidi Monfared, 2014; Zhou et al. 2016). With the growth of genomic data, especially ESTs, bioinformatics tools have greatly increased the identification of EST-SSR markers in many species and datasets, such as shrimp (Pérez et al. 2005), zebrafish (Ju et al. 2005), cattle (Yan et al. 2008), sheep (Zhang et al. 2010), human cancer (Bakhtiarizadeh et al. 2011), chicken (Bakhtiarizadeh et al. 2012), fish (Zheng et al. 2014), quail (Bai et al. 2016), passerine birds (Khimoun et al. 2017), pond loach (Feng et al. 2018) and Ephedra sinica (Jiao et al. 2019). The aims of the current study were to identify EST-SSR markers in camel and perform term enrichment analysis on them, to measure and compare the frequency and distribution of different types of EST-SSRs, and to develop EST-SSR markers as genetic and genomic tools for camel.
MATERIALS AND METHODS
Collecting EST sequences
All dromedary camel ESTs (17155) considered in the current investigation were obtained from the website http://camel.kacst.edu.sa/ (Al-Swailem et al. 2010) and saved in FASTA format. Those authors used nine inbred camels from three distinct breeds (black, white and brown coat color) and three age categories: young (0-6 months), adult (2-3 years) and aged (4-6 years). RNA was isolated from the nine camels, and samples from eleven body tissues (liver, heart, stomach, pancreas, muscle, brain, kidney, lung, spleen, colon and genitals) were collected and pooled.
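As a minimal sketch of the first processing step, the downloaded EST file can be read into (header, sequence) pairs with a small FASTA parser. This is not part of the published workflow; the function name `read_fasta` is hypothetical, and the parser assumes standard FASTA with `>` header lines and sequences that may be wrapped over several lines.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)
```

With the EST file opened as `fh` (file name and location are up to the user), `len(list(read_fasta(fh)))` offers a quick sanity check that the record count matches the 17155 ESTs reported above.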
EST-SSRs were identified with the SSR Locator software (Maia et al. 2008). Only SSRs with motifs of 2 to 6 nucleotides were studied, and the minimum number of repeats was set to seven for dinucleotides, six for trinucleotides and five for the other motifs (tetra-, penta- and hexanucleotides). All subsequent analyses were performed in the R environment and Microsoft Excel, which were also used to draw the graphs.
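The repeat thresholds above can be illustrated with a regular-expression scan for perfect repeats. This is a minimal sketch, not SSR Locator's algorithm: it assumes perfect (uninterrupted) repeats, ignores compound SSRs, and discards motifs that are themselves repeats of a shorter unit (for example ATAT scored as a tetramer). The names `MIN_REPEATS`, `is_composite` and `find_ssrs` are hypothetical.

```python
import re

# Minimum number of tandem copies per motif length (bp), as set in this study
MIN_REPEATS = {2: 7, 3: 6, 4: 5, 5: 5, 6: 5}


def is_composite(motif):
    """True if motif is a repeat of a shorter unit (e.g. 'ATAT' = 'AT' x 2)."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq):
    """Return sorted (start, motif, repeats) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # a k-mer followed by at least (min_rep - 1) further copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_composite(motif):
                continue  # already counted under its shorter unit
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Example: an (AC)8 dimer and a (GCC)6 trimer in a toy sequence
print(find_ssrs("GG" + "AC" * 8 + "TT" + "GCC" * 6 + "A"))
# [(2, 'AC', 8), (20, 'GCC', 6)]
```

Because each motif length is scanned independently, a real pipeline would also need a rule for overlapping or adjacent hits, which this sketch leaves out.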
Primer design and functional annotation
Primers were designed for each EST-SSR with Primer3 in batch mode through the SSR Locator interface module. Only sequences with sufficient flanking sequence were considered for primer design. The design criteria were: primer length 18-25 bp (optimum 20 bp), primer annealing temperature 58-63 ˚C (optimum 60 ˚C), primer GC content of 30% (optimum 50%) and product length 100-300 bp. To survey the function of the genes containing SSRs, they were compared with the non-redundant protein database using BLASTX (E-value ≤ 10⁻⁶). Over-represented Gene Ontology (GO) categories and functional clusters among the EST-SSRs successfully annotated to known proteins were identified with the Database for Annotation, Visualization and Integrated Discovery (DAVID) bioinformatics tool (Huang et al. 2009). The whole-genome background with the default DAVID settings was used for gene annotation.
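As a hedged illustration of how these criteria map onto Primer3's own input format (Boulder-IO), the sketch below writes one record per SSR-containing EST. The tag names are standard Primer3 input tags. Two mappings are assumptions: the stated "annealing temperature" is applied to Primer3's melting-temperature tags, and the stated 30% is treated as the minimum GC content (`PRIMER_MIN_GC`). The function name `boulder_record` is hypothetical, and the study itself ran Primer3 through SSR Locator rather than through a script like this.

```python
# Primer3 Boulder-IO tags mirroring the design criteria in the text.
# Assumption: the stated 30% GC is a lower bound (PRIMER_MIN_GC).
PRIMER3_SETTINGS = {
    "PRIMER_MIN_SIZE": 18,
    "PRIMER_OPT_SIZE": 20,
    "PRIMER_MAX_SIZE": 25,
    "PRIMER_MIN_TM": 58.0,
    "PRIMER_OPT_TM": 60.0,
    "PRIMER_MAX_TM": 63.0,
    "PRIMER_MIN_GC": 30.0,
    "PRIMER_OPT_GC_PERCENT": 50.0,
    "PRIMER_PRODUCT_SIZE_RANGE": "100-300",
}


def boulder_record(seq_id, template, ssr_start, ssr_length):
    """Build one Boulder-IO record asking Primer3 for primers flanking an SSR.

    ssr_start is 0-based (Primer3's default PRIMER_FIRST_BASE_INDEX).
    """
    lines = [
        f"SEQUENCE_ID={seq_id}",
        f"SEQUENCE_TEMPLATE={template}",
        # SEQUENCE_TARGET=start,length: primers must flank this region
        f"SEQUENCE_TARGET={ssr_start},{ssr_length}",
    ]
    lines += [f"{tag}={value}" for tag, value in PRIMER3_SETTINGS.items()]
    lines.append("=")  # a lone "=" terminates a Boulder-IO record
    return "\n".join(lines) + "\n"
```

Records for all SSR-containing ESTs can be concatenated into one input file and passed to `primer3_core` on standard input. Marking the SSR as `SEQUENCE_TARGET` is what enforces the "enough flanking sequence" requirement: Primer3 returns no pair when the repeat sits too close to either end of the EST.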
RESULTS AND DISCUSSION
Screening of ESTs for SSRs
The distribution of ESTs and EST-SSRs in camel is presented in Table 1. A total of 862 SSRs were detected in the 17155 EST sequences. Only 827 of the 17155 ESTs contained SSRs; of these, 794 (96%), 31 (3.8%) and 2 (0.2%) contained one, two and three SSRs, respectively. Dimeric motifs were the most abundant SSRs in camel (38.86%), followed by tri- (27.15%), hexa- (21.46%), tetra- (6.96%) and pentameric (5.57%) motifs (Figure 1).
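The counts in this paragraph are internally consistent, which a few lines of arithmetic confirm: ESTs carrying one, two and three SSRs contribute 794 + 2×31 + 3×2 = 862 SSRs. The motif-type counts below are back-calculated from the Figure 1 percentages for illustration only; they are not reported in the text.

```python
# Numbers of ESTs carrying one, two or three SSRs, as reported in the text
ests_by_ssr_count = {1: 794, 2: 31, 3: 2}

ssr_ests = sum(ests_by_ssr_count.values())                     # ESTs with >= 1 SSR
total_ssrs = sum(k * n for k, n in ests_by_ssr_count.items())  # each EST contributes k SSRs
shares = {k: round(100 * n / ssr_ests, 2) for k, n in ests_by_ssr_count.items()}

# Motif-type percentages from Figure 1; counts are back-calculated, not reported
motif_pct = {"di": 38.86, "tri": 27.15, "hexa": 21.46, "tetra": 6.96, "penta": 5.57}
implied_counts = {m: round(total_ssrs * p / 100) for m, p in motif_pct.items()}

print(ssr_ests, total_ssrs, shares)
# 827 862 {1: 96.01, 2: 3.75, 3: 0.24}
print(implied_counts, sum(implied_counts.values()))
# {'di': 335, 'tri': 234, 'hexa': 185, 'tetra': 60, 'penta': 48} 862
```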
Table 1 Summary of mining expressed sequence tags (ESTs) and EST-SSR distribution in camel
Figure 1 Frequency distribution of the different microsatellite markers derived from ESTs (EST-SSRs) with motif units of 2-6 nucleotides in camel
The numbers on the columns indicate the percentage of each EST-SSR type
Figure 2 shows the frequency of each EST-SSR motif type by repeat number. The number of repeats ranged from 4 to 48. Hexamers with four repeats were the most prevalent (18.91%), followed by trimers with six repeats (13.23%) and dimers with seven repeats (12.99%). SSR length, calculated as repeat number × motif length, ranged from 14 to 108 bp.
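A tabulation like Figure 2 can be sketched by grouping SSRs by (motif length, repeat number) and expressing each class as a percentage of all SSRs, while SSR length follows the repeat number × motif length rule above. The input format (a list of `(motif, repeats)` pairs), the function names and the toy data are hypothetical and serve only to show the calculation.

```python
from collections import Counter


def ssr_length(motif, repeats):
    """SSR length in bp = repeat number x motif length."""
    return repeats * len(motif)


def repeat_class_table(ssrs):
    """Percentage of SSRs in each (motif length, repeat number) class.

    ssrs: iterable of (motif, repeat_number) pairs (hypothetical input format).
    """
    ssrs = list(ssrs)
    counts = Counter((len(motif), repeats) for motif, repeats in ssrs)
    return {cls: 100 * n / len(ssrs) for cls, n in counts.items()}


toy = [("AC", 7), ("GCC", 6), ("AACCAC", 5), ("AC", 7)]  # made-up example
print(repeat_class_table(toy))
# {(2, 7): 50.0, (3, 6): 25.0, (6, 5): 25.0}
print([ssr_length(m, r) for m, r in toy])
# [14, 18, 30, 14]
```

The shortest possible SSR under this study's thresholds is a dimer with seven repeats (2 × 7 = 14 bp), which matches the lower bound of the reported length range.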
Distributions of camel SSRs with various repeat motifs
The frequencies of the different repeat motifs among the SSRs are presented in Figures 3-6. The identified SSRs comprised 4 types of dimer motifs, 24 types of trimer, 38 types of tetramer, 31 types of pentamer and 47 types of hexamer motifs. The most frequent dimer motif was AC/TG (54%), followed by AT/TA (32.8%), while GC/CG (1.2%) was the least frequent (Figure 3). GCC/GGC (19.2%) was the most frequent trimer motif, followed by AGC/GCT (10.3%), CAG/CTG (9.8%) and CTC/GAG (9.8%) (Figure 4). The most common tetramer motifs were TTTA (13.3%), TTTG (6.7%) and AAAC (6.7%) (Figure 5). AAAAG (10.4%) and TTGTT (10.4%) were the most common pentamer motifs (Figure 6). The most abundant hexamer motif was AACCAC (67.6%), while the other hexamer motifs had almost identical frequencies.
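The text does not state how motifs were grouped. Dimers are paired with their complement (AC/TG) and trimers with their reverse complement (GCC/GGC). One widely used convention, assumed in the sketch below, places a motif, all of its rotations and all rotations of its reverse complement in one class, named after the lexicographically smallest member. This convention yields exactly 4 dimer classes (AC, AG, AT, CG), matching the four dimer types above, but only 10 trimer classes. The 24 trimer types reported were therefore evidently counted under a finer grouping. TTTG and AAAC, listed separately above, would also fall into one class. All function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    return motif.translate(COMPLEMENT)[::-1]


def rotations(motif):
    """All cyclic shifts, e.g. 'AC' -> {'AC', 'CA'}."""
    return {motif[i:] + motif[:i] for i in range(len(motif))}


def motif_class(motif):
    """Canonical class: smallest rotation of the motif or of its reverse complement."""
    motif = motif.upper()
    return min(rotations(motif) | rotations(reverse_complement(motif)))


def class_frequencies(motifs):
    """Percentage of each motif class among the given motifs."""
    counts = Counter(motif_class(m) for m in motifs)
    total = sum(counts.values())
    return {cls: 100 * n / total for cls, n in counts.most_common()}


print(motif_class("TG"), motif_class("GGC"))  # AC CCG
```

Choosing the smallest member as the class name is arbitrary but deterministic, so counts from different datasets can be compared directly.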
Development of EST-SSR markers
All 827 SSR-containing sequences were used for primer pair design. Of these, 732 (88.51%) were suitable for designing primer pairs, whereas 95 (11.49%) lacked adequate flanking sequences for primers. A virtual polymerase chain reaction (PCR) run showed that 597 of the 732 primer pairs produced appropriate fragments.
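The text does not name the tool used for the virtual PCR. The sketch below shows the basic idea under simplifying assumptions: exact primer matches only, a single template strand, no mismatches and no off-target search. A primer pair "works" if the forward primer and the reverse complement of the reverse primer both occur in the right order and the product falls within the 100-300 bp design range. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def virtual_pcr(template, fwd, rev, size_range=(100, 300)):
    """Exact-match in silico PCR on one strand.

    Returns the product length in bp if both primer sites are found in the
    expected orientation and the product falls within size_range, else None.
    """
    template = template.upper()
    start = template.find(fwd.upper())
    if start == -1:
        return None
    # the reverse primer binds the opposite strand, so search for its reverse complement
    site = reverse_complement(rev)
    end = template.find(site, start + len(fwd))
    if end == -1:
        return None
    length = end + len(site) - start
    low, high = size_range
    return length if low <= length <= high else None
```

Dedicated e-PCR tools additionally tolerate mismatches and scan whole databases for off-target products, so an exact-match check like this one would tend to overestimate how many primer pairs amplify a single specific fragment.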
Gene ontology analysis and annotation of EST-SSRs sequences
BLASTX was applied to the 827 sequences identified as containing SSRs. Because the camel genome annotation is incomplete, only 382 of the 827 sequences were annotated. The GO enrichment analysis of sequences containing EST-SSRs for all three GO categories is shown in Table 2. Most EST-SSRs were found to be involved in macromolecule catabolic process, RNA processing and splicing, and cellular homeostasis. For cellular component, most of the enriched EST-SSRs were associated with organelle lumen, membrane-enclosed lumen and nuclear lumen. For molecular function, the GO assignments showed that most of the camel EST sequences containing SSRs were involved in transcription regulator activity and RNA binding. Functional annotation clustering identified three annotation clusters associated with the detected genes (P ≤ 0.05).
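DAVID tests each GO term for over-representation with a modified one-sided Fisher exact test (the EASE score). The sketch below shows the underlying hypergeometric calculation. The `ease` adjustment, which subtracts one gene from the list count before testing, is a simplified reading of DAVID's documentation rather than DAVID's code, and all names are hypothetical. In this study's setting, the gene list would be the annotated EST-SSR genes and the background the default DAVID whole-genome set.

```python
from math import comb


def hypergeom_upper_tail(k, n_list, k_background, n_background):
    """P(X >= k) for X ~ Hypergeometric(population=n_background,
    successes=k_background, draws=n_list)."""
    total = comb(n_background, n_list)
    upper = min(n_list, k_background)
    return sum(
        comb(k_background, x) * comb(n_background - k_background, n_list - x)
        for x in range(max(k, 0), upper + 1)
    ) / total


def term_p_value(list_hits, n_list, bg_hits, n_background, ease=True):
    """One-sided Fisher exact p-value for over-representation of a GO term.

    With ease=True, one gene is removed from list_hits first, which
    approximates DAVID's more conservative EASE score (an assumption here).
    """
    k = list_hits - 1 if ease else list_hits
    return hypergeom_upper_tail(k, n_list, bg_hits, n_background)
```

For a toy case with 3 of 3 list genes in a term that covers 4 of 10 background genes, the plain test gives p = 4/120 ≈ 0.033, while the EASE-style adjustment gives p = 40/120 ≈ 0.33. This illustrates why the adjustment mainly penalizes terms supported by very few genes.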
Figure 2 Frequency distribution of the microsatellite markers derived from EST (EST-SSRs) based on the number of repeats of the different SSR motif types
Figure 3 Frequency distribution for the 4 dimer motifs recognized in the camel sequence
The numbers on the columns demonstrate the percentage of these dimer motifs across all dimer types
Figure 4 Frequency distribution for all 23 trimer motifs recognized in the camel sequence
The numbers on the columns demonstrate the percentage of these trimer motifs across all trimer types
Figure 5 Frequency distribution for all tetramer motifs recognized in the camel sequence
The numbers on the columns demonstrate the percentage of these tetramer motifs across all tetramer types
Figure 6 Frequency distribution for all pentamer motifs recognized in the camel sequence
The numbers on the columns demonstrate the percentage of these pentamer motifs across all pentamer types
Two-dimensional heat maps of the clusters were used to detect similarities and differences in annotation among the members of each gene group. Cluster 1 had the highest enrichment score (5.23) and included 41 genes (Figure 7). Cluster 2 included 28 genes, with an enrichment score of 2.84 (Figure 8). To develop camel SSR markers, the database of 17155 ESTs was systematically searched for microsatellite motifs. The results clearly demonstrate that camel ESTs are a useful source for mining SSRs. In this study, the frequency of EST-SSRs was 4.0%, similar to that in cattle (4%) (Yan et al. 2008). The proportion of microsatellite-containing ESTs varies among vertebrates, ranging from 2% to 15% (Slate et al. 2007; Zhang et al. 2010; Bakhtiarizadeh et al. 2012; Nirapathpongporn et al. 2016; Zhou et al. 2016; Feng et al. 2018). These differences in EST-SSR frequency may be influenced by redundancy, SSR identification criteria, database size and mining tools (Yan et al. 2008; Zhou et al. 2016). In the present study, dimeric motifs were the most frequent SSRs (38.86%) in camel. This agrees with several other animal species, including chicken (Yan et al. 2008; Bakhtiarizadeh et al. 2012; Abe and Gemmell, 2014; Sadder et al. 2015), but contrasts with some crop species (Varshney et al. 2002; Durand et al. 2010; Joshi et al. 2010; Zhou et al. 2016; Wang et al. 2017) and Misgurnus anguillicaudatus (Jiao et al. 2019), in which trimeric motifs were the most abundant.
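DAVID ranks annotation clusters by a group enrichment score, described by Huang et al. (2009) as the geometric mean, on a -log scale, of the p-values of the cluster's member terms. Equivalently, it is the mean of -log10(p) over those terms. The minimal sketch below, with hypothetical function names, computes the score and back-transforms it. Under this definition, the scores of 5.23 and 2.84 reported for clusters 1 and 2 correspond to geometric-mean p-values of about 5.9 × 10⁻⁶ and 1.4 × 10⁻³.

```python
import math


def group_enrichment_score(p_values):
    """Mean of -log10(p) over the terms in one annotation cluster.

    Equivalent to -log10 of the geometric mean of the p-values.
    """
    if not p_values:
        raise ValueError("an annotation cluster needs at least one term")
    return sum(-math.log10(p) for p in p_values) / len(p_values)


def geometric_mean_p(score):
    """Back-transform an enrichment score to the geometric-mean p-value."""
    return 10 ** (-score)


print(round(group_enrichment_score([1e-4, 1e-5, 1e-6]), 6))  # 5.0
```

On this scale, a score of about 1.3 corresponds to a geometric-mean p-value of 0.05, which is a common informal cut-off for treating a cluster as enriched.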
Table 2 The GO enrichment analysis of sequences containing EST-SSRs at three levels of GO category
Figure 7 Two-dimensional gene annotation heat map for cluster 1
This cluster contains 41 genes with an average enrichment score of 5.23
The blue area of the heat map demonstrates common annotations and the red areas demonstrate differences in annotations
Trimeric motifs were the second most frequent repeats (27.15%), followed by hexa- (21.46%), tetra- (6.96%) and pentameric (5.57%) motifs. The frequency of hexamer repeats was in agreement with that in cattle (13%) (Yan et al. 2008) but differed from that in chicken (less than 1%) (Bakhtiarizadeh et al. 2012). The distribution of SSR motifs by repeat number indicated that shorter repeat motifs predominate among the identified SSRs. Notably, the occurrence of repeat units decreased as their length increased. This may be explained by the fact that longer repeat motifs have higher mutation rates and are therefore less stable (Toth et al. 2000). AC/TG was the most frequent dimeric motif (54%) in the current investigation, followed by AT/TA (32.8%), whereas GC/CG had the lowest frequency (1.2%). This pattern of dimeric SSRs was similar to that found in alpaca (Reed and Chaves, 2008), cattle (Yan et al. 2008), sheep (Zhang et al. 2010) and zebrafish (Ju et al. 2005). It differed from that in grass (AG/TC) (Wang et al. 2017), rubber tree (AG/TC) (Nirapathpongporn et al. 2016), mint (AG/TC) (Kumar et al. 2015), chicken (AT/TA) (Bakhtiarizadeh et al. 2012), turmeric (AG/TC) (Joshi et al. 2010) and hops (AT/TA) (Singh et al. 2012). This pattern may also depend on higher frequencies of certain amino acids in some species and on differing frequencies of dimeric motifs in different genomic regions (Toth et al. 2000). The most abundant trimer motif was GCC/GGC (19.2%), followed by AGC/GCT (10.3%). These results agreed with cattle (Yan et al. 2008) and another study of animal species (Li et al. 2004b), in which AGC and GCC were the most frequent. They disagreed with catfish (Serapion et al. 2004), zebrafish (Ju et al. 2005) and Misgurnus anguillicaudatus (Jiao et al. 2019), in which the AAT/TAA repeat was the most frequent and trimer motifs composed only of G and/or C nucleotides were rare. The present result is also similar to that in some plant species, in which GCC/GGC was the most frequent motif (Qin et al. 2015; Zhou et al. 2016; Wang et al. 2017). In chicken, CAG was the most abundant trimeric repeat motif (Bakhtiarizadeh et al. 2012); this trimer was also among the abundant trimers in this study, but it did not rank first.
Figure 8 Two-dimensional gene annotation heat map for cluster 2
This cluster contains 28 genes with an average enrichment score of 2.84
The blue area of the heat map demonstrates common annotations and the red areas demonstrate differences in annotations
Unlike the trimer motifs, AT-rich tetramer and pentamer motifs were the most abundant types among camel EST-SSRs (Figures 5-6). Moreover, the overall composition of SSRs in camel coding regions is comparable to that in other vertebrates, with G/C repeats less frequent than A/T repeats in these regions. Finally, the current results clearly show that the predominant microsatellite types are taxon-dependent. Toth et al. (2000) reported that strand-slippage theories alone cannot explain the genome-wide distribution of SSRs. They suggested that enzymes and other proteins involved in different aspects of DNA processing (such as replication and repair) and chromatin remodeling could be responsible for the taxon specificity of SSR frequency. Annotating the sequences containing SSRs makes it possible to examine the functional diversity of the corresponding proteins (Bakhtiarizadeh et al. 2012). In general, GO is a helpful tool for unifying the representation of gene and gene product attributes across all species (Consortium, 2008). Table 2 lists the top (P ≤ 0.05) GO terms in the three GO categories, along with the gene groups associated with each term. For biological process, 19 of the 42 assignments were significant; for cellular component, 19 of the 28 hits; and for molecular function, 7 of the 15 assignments. The GO enrichment analysis showed that categories associated with gene expression were significantly enriched, in accordance with previous studies (Bakhtiarizadeh et al. 2012; Zhou et al. 2016). This suggests that EST-SSRs may play functional roles in gene regulation.
Functional classification of genes based on GO terms showed that one of the three clusters contained more than 40 genes (Figure 7). This indicates that these genes belong to the same functional group and to the cellular component clusters organelle lumen, membrane-enclosed lumen, nuclear lumen and nucleoplasm. Additionally, the gene lists include significant genes associated with dryland adaptation, including fat and water metabolism and responses to aridity and heat stress (Al-Swailem et al. 2010; Jirimutu et al. 2012; Wu et al. 2014). The new evidence from this study indicates that the genomic distribution of SSRs is non-random, likely because of their roles in regulating gene activity.
The development of functional molecular markers such as EST-SSRs is now a key goal in animal breeding, especially in marker-assisted selection programs. In this study, to develop useful camel EST-SSR markers, the database of 17155 ESTs was systematically searched for microsatellite motifs. Our results clearly show that camel ESTs are a valuable resource for mining SSR markers. The EST-SSRs identified in this research are a useful resource of camel genomic markers that can be validated and applied in population genetic studies of the dromedary camel. Admittedly, the number of EST-SSRs identified is not high, but this is expected to change considerably as next-generation sequencing data are used.
We especially thank the University of Jiroft for providing facilities to support this study.