WANG Dan, YAO Hong, WANG Wei-Min, ZOU Gui-Wei and WANG Han-Ping
(1. Yangtze River Fisheries Research Institute, Chinese Academy of Fishery Sciences, Wuhan 430223, China; 2. Aquaculture Genetics and Breeding Laboratory, Ohio State University South Centers, 1864 Shyville Road, Piketon, Ohio 45661, USA; 3. College of Fishery, Huazhong Agricultural University, Wuhan 430070, China)
Abstract: The yellow perch, Perca flavescens, is an economically significant freshwater fish species in the Midwest of the United States. Type I markers are useful for comparative mapping and other genetic analyses, but few are available for yellow perch. In the present study, ESTs containing microsatellite sequences were identified and characterized for yellow perch by data mining from updated public EST databases. Of the 21968 yellow perch EST sequences, about 14.4% contained repeat motifs of various types and lengths. CA repeats were the most abundant dinucleotide motif. Of the 62 EST-SSRs for which PCR primers were designed, 15 loci were polymorphic in a yellow perch population, with 4 to 19 alleles per locus (average 9). The observed (Ho) and expected (He) heterozygosities of these EST-SSRs were 0.103 to 0.929 and 0.116 to 0.934, respectively. Four EST-SSR loci deviated significantly from Hardy-Weinberg equilibrium (HWE) expectations, and the remaining 11 loci were in HWE. These new EST-SSR markers should provide sufficient polymorphism for population genetic studies and genomic mapping of yellow perch.
Key words: Yellow perch; Expressed sequence tag (EST); Microsatellites; EST-SSRs; Gene ontology (GO)
The yellow perch, Perca flavescens, is an economically significant freshwater species in the Midwest of the United States[1]. However, dramatic reductions in population sizes of this species have been underway since the 1950s in many areas of the United States, especially in the Great Lakes and Mississippi River[2,3]. To meet continued high market demand, yellow perch has been intensively cultured during the last decade. Recently, the O'GIFT (Ohio Genetic Improvement of Farmed-fish Traits) breeding program was launched by Ohio State University (OSU) to improve the growth rate and production efficiency of yellow perch. One of the essential steps for marker-assisted selection (MAS) of farmed yellow perch is the construction of linkage maps.
Development of polymorphic markers is necessary for constructing high-density linkage maps of yellow perch. Simple sequence repeats (SSRs), also called microsatellites or short tandem repeats (STRs), are probably the best choice for medium- to high-density maps because of their reproducibility, multiallelic nature, codominant inheritance, relative abundance and good genome coverage[4]. Traditionally, SSR markers are developed by creating small-insert genomic DNA libraries, selecting SSR-containing clones by hybridization, either with radioactively labeled probes or by capture with biotinylated SSR motifs, and then sequencing the clones[4]. These steps are necessary for many organisms but are normally time-consuming and labor-intensive.
One source of sequences for microsatellite development is expressed sequence tags (ESTs). ESTs are particularly attractive for marker development because they represent coding regions of the genome and their putative function can often be deduced by homology searches. While ESTs provide a means of identifying genes, microsatellites provide a high level of polymorphism[5]. Microsatellites identified in ESTs are typically referred to as EST-SSRs or genic SSRs, in contrast to Type II SSRs, which come from random sequences of the genome. Mining microsatellites from SSR-containing ESTs is inexpensive and time-saving, and has proved to be an effective approach for developing microsatellites for genetic mapping and population genetic studies in animals[6-9] and plants[10,11]. In aquaculture animals, Serapion et al.[12] reported a pioneering study on the development of EST-SSRs by bioinformatic mining of channel catfish (Ictalurus punctatus) databases. In yellow perch, some EST-SSRs generated by data mining from partial EST databases have been reported previously[13]. Since then, publicly accessible EST sequences for yellow perch have continued to accumulate.
In this study, we report a new batch of EST-SSRs in yellow perch, obtained by data mining from updated public EST resources and by laboratory testing of selected EST-SSRs for polymorphism. These EST-SSR markers should be useful for constructing linkage maps and for population genetic studies in yellow perch.
EST sequences of yellow perch were downloaded from the GenBank, DDBJ and EMBL databases between December 1, 2008 and March 15, 2010. All matched sequences were displayed in FASTA format and saved as a text file. The EST sequences were clustered using the Contig Express module of the VectorNTI package (available at http://download.invitrogen.com) with the linear assembly algorithm. The clustering criterion was set to a minimum overlap of 30 bases (the default is 20 bases). Each cluster was visually inspected to verify the fidelity of the alignment and to avoid pseudoclusters caused by repetitive elements or long strings of microsatellite repeats. ESTs belonging to contigs and singletons were recorded.
All the ESTs were screened for potential microsatellites using the software Tandem Repeats Finder[14] with the following parameters: match: 2; mismatch: 7; indel: 7; PM: mini-score, 30; and max period size: 500. Strings of oligo sequences were used to search for microsatellites, with minimum lengths of 6 repeats for di-nucleotides, 5 repeats for tri-nucleotides, 4 repeats for tetra-nucleotides, and 3 repeats for penta- and hexa-nucleotides.
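The repeat-number thresholds above can be illustrated with a short regular-expression scan. This is a minimal sketch only: it finds perfect (uninterrupted) repeats and is not the Tandem Repeats Finder algorithm, which also scores imperfect repeats. The function name `find_perfect_ssrs` and the example sequence are hypothetical.

```python
import re

# Minimum repeat counts per motif length, taken from the text:
# di = 6, tri = 5, tetra = 4, penta = 3, hexa = 3.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 3, 6: 3}


def find_perfect_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq.

    Illustrative helper only; it ignores imperfect repeats.
    """
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # ([ACGT]{n}) captures one repeat unit; \1{k,} requires at least
        # k further copies, so k = min_rep - 1 gives min_rep units in total.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip units that are themselves built from a shorter unit
            # (e.g. "CACA" is really "CA", and "AA" is a mononucleotide run).
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical sequence with a (CA)8 and an (AAG)5 repeat.
example = "TTG" + "CA" * 8 + "GGT" + "AAG" * 5 + "C"
ssrs = find_perfect_ssrs(example)
```

Here `ssrs` holds one dinucleotide and one trinucleotide hit, while a run of only five CA units would fall below the dinucleotide threshold and be ignored.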
The Blast2GO program[15] was used to run BLASTX searches for homologs of the unique EST-SSR sequences and to extract gene ontology (GO) terms for each hit from existing annotations. Assembled unigenes with significant BLASTX matches to the nr database (E-value cutoff of 1e-5) were mapped to candidate GO terms. An E-value hit filter of 1e-6, an annotation cutoff of 55, a GO weight of 5, and an HSP-hit coverage cutoff of 0 were used as the Blast2GO annotation parameters. The resulting GO terms were assigned to each query sequence to assess the biological processes, molecular functions and cellular components represented.
A total of 30 adult yellow perch were collected alive from a wild population in Lake Wallenpaupack, Pennsylvania, USA. Individual fin clips were stored immediately in 95% ethanol. Total DNA was extracted from fin tissue following a modified version of the Puregene protocol for extraction from fish tissue[16]. Each locus was amplified with a three-primer system in which only the M13 and CAG primers were fluorescently labeled with FAM, HEX, or NED. Each 6 μL polymerase chain reaction contained 3 μL of JumpStart RedMix (Sigma), 1.5 pmol each of the non-tailed and labeled primers, 0.1 pmol of the tailed primer, and 25 ng of DNA, in the presence of 100 μM spermidine. Amplification was conducted in PTC-100 thermal cyclers (MJ Research) using an initial denaturation at 94℃ for 2 min, followed by 35 cycles of 30 s denaturation at 94℃, 30 s annealing at a locus-specific temperature and 30 s extension at 72℃, and a final 5-min extension at 72℃. Amplification products were separated on an ABI 3130 Prism DNA genetic analyzer.
The number of alleles (Na), counted as alleles with nonzero frequency, and the allele size range were calculated for each locus using the software PopGene[17]. Expected (He) and observed (Ho) heterozygosity, and the fit of genotypic frequencies to Hardy-Weinberg equilibrium (HWE), were analyzed using the program ARLEQUIN[18]. Unbiased estimates of the exact P-values for tests of conformity to HWE were calculated using the Markov chain randomization method[19]. All tests were adjusted for multiple simultaneous comparisons using a sequential Bonferroni correction.
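For readers unfamiliar with these summary statistics, the sketch below shows how Na, Ho and He can be computed for a single locus from diploid genotypes. It assumes the commonly used unbiased estimator He = 2n/(2n-1) × (1 − Σp²); whether PopGene and ARLEQUIN apply exactly this form is an assumption here, and the function name and genotype data are hypothetical.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (Na, Ho, He) for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    He uses the unbiased estimator 2n/(2n-1) * (1 - sum of squared allele
    frequencies), which is an assumption about the software's method.
    """
    n = len(genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    gene_copies = 2 * n
    na = len(allele_counts)  # alleles with nonzero frequency
    ho = sum(1 for a, b in genotypes if a != b) / n
    sum_p2 = sum((c / gene_copies) ** 2 for c in allele_counts.values())
    he = (gene_copies / (gene_copies - 1)) * (1.0 - sum_p2)
    return na, ho, he


# Hypothetical genotypes (allele sizes in bp) for four individuals.
na, ho, he = heterozygosity([(100, 100), (100, 104), (104, 104), (100, 104)])
```

With these four made-up genotypes, two alleles occur at equal frequency and half of the individuals are heterozygous, so Ho is 0.5 and He is slightly above 0.5 because of the small-sample correction.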
A total of 21968 ESTs of yellow perch with an average length of 768 bp were downloaded from public databases and subjected to bioinformatic analyses. In total, 3174 (about 14.4%) of these ESTs contained SSRs. After clustering and assembly, 1867 unique ESTs were identified, including 1310 singletons and 557 contigs (Tab. 1).
The majority of these EST-SSRs were composed of dinucleotide and trinucleotide repeats. Specifically, the abundance of di-, tri-, tetra-, penta- and hexa-nucleotide motifs among the ESTs was 31.5%, 28.3%, 13.3%, 12.1% and 14.9%, respectively (Fig. 1). Among the dinucleotides, CA/GT was the most abundant (338), followed by AT/TA (135) and CT/GA (113), with GC the least abundant (2). The trinucleotide repeats were distributed relatively evenly across motif types, with the two most frequent types, GAG and AAG, accounting for 7.6% and 5.9% of the total motifs, respectively. In addition, GA-rich types were predominant, in contrast to the low occurrence of GC-rich types among the trinucleotide EST-SSRs.
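Motif classes such as CA/GT pool a repeat unit with its rotations and its reverse complement, since all of these describe the same double-stranded repeat. The sketch below shows one way to do that grouping and to turn the dinucleotide counts reported above into shares. The helper names are hypothetical, and the choice of the alphabetically smallest string as the class label (so CA/GT is labeled "AC") is an illustrative convention rather than the authors' method.

```python
def reverse_complement(motif):
    """Reverse complement of a DNA motif."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(motif.upper()))


def canonical_motif(motif):
    """Smallest string among all rotations of the motif and of its reverse complement."""
    motif = motif.upper()
    candidates = []
    for m in (motif, reverse_complement(motif)):
        candidates.extend(m[i:] + m[:i] for i in range(len(m)))
    return min(candidates)


# Dinucleotide counts as reported in the text.
DINUCLEOTIDE_COUNTS = {"CA/GT": 338, "AT/TA": 135, "CT/GA": 113, "GC": 2}


def shares(counts):
    """Percentage share of each class, rounded to one decimal place."""
    total = sum(counts.values())
    return {k: round(100.0 * v / total, 1) for k, v in counts.items()}


dinucleotide_total = sum(DINUCLEOTIDE_COUNTS.values())
dinucleotide_shares = shares(DINUCLEOTIDE_COUNTS)
```

The four reported counts sum to 588, which is 31.5% of the 1867 unique ESTs and so agrees with the dinucleotide proportion given above. This is only an arithmetic observation, not a statement about how the authors derived their percentages.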

Tab. 1 A summary of yellow perch ESTs containing microsatellites

Fig. 1 Distribution of motif types in SSR-containing ESTs of yellow perch
The BLASTX results revealed that 977 of these ESTs showed similarity to genes or proteins of known function, whereas the remaining 890 had no significant matches and may represent novel genes.
Gene ontology (GO) categories were assigned to 977 unique sequences. Fig. 2 shows the distribution of gene ontology terms (second-level GO terms). These annotated gene sequences were involved in 368 biological process terms, 395 molecular function terms and 739 cellular component terms.
Among the ESTs with significant or highly significant homology, the P. flavescens ESTs had the highest number of BLASTX hits to the T. nigroviridis database (36.4%), followed by the D. rerio database (20.3%) (Fig. 3).
Fifteen of the 62 EST-SSRs were found to be polymorphic in a test population. The observed heterozygosity of these polymorphic loci ranged from 0.103 to 0.929, and the expected heterozygosity ranged from 0.116 to 0.934. The number of alleles of the polymorphic EST-SSRs in yellow perch ranged from 4 to 19 (mean 9) (Tab. 2). When the frequencies and distributions of the alleles and genotypes were compared with Hardy-Weinberg equilibrium expectations, 4 of the 15 loci showed significant departure from HWE after Bonferroni correction (P<0.003), and the remaining 11 EST-SSRs were in HWE (Tab. 2). These new EST-SSR markers should provide sufficient polymorphism for population genetic studies and genome mapping of yellow perch.
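The threshold P<0.003 matches the first step of a sequential Bonferroni procedure with 15 tests at a nominal level of 0.05, since 0.05/15 ≈ 0.0033. The sketch below illustrates the step-down logic under that assumption: the nominal level of 0.05 is not stated in the text, the function name is hypothetical, and the P-values are invented for illustration and are not the study's data.

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Step-down (sequential) Bonferroni test.

    Returns a list of booleans, True where the null hypothesis is rejected.
    alpha = 0.05 is an assumed nominal level.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The smallest P-value is compared with alpha/m, the next with
        # alpha/(m-1), and so on.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # every larger P-value is retained as well
    return reject


# Invented P-values for 15 loci, for illustration only.
example_p = [0.30, 0.0001, 0.45, 0.002, 0.08, 0.0005, 0.62, 0.02,
             0.91, 0.001, 0.15, 0.27, 0.74, 0.05, 0.38]
rejected_loci = [i for i, r in enumerate(sequential_bonferroni(example_p)) if r]
```

In this invented example four loci are rejected; the fifth-smallest P-value (0.02) exceeds its threshold of 0.05/11, so the procedure stops there.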
Since Serapion et al.[12] first reported the development of Type I markers (EST-SSRs) in an aquaculture species using bioinformatic analysis, mining of EST-SSRs from EST databases has been reported in many aquaculture animals such as fish[6,20], five shrimp species[7,8] and scallop[21]. Online EST sequence resources for fish have continued to accumulate. In this study, we analyzed updated ESTs of yellow perch from different public databases and found that 14.4% of all screened ESTs contained SSRs, which is higher than the values reported so far for other aquaculture animals, e.g. black tiger shrimp (Penaeus monodon) (13.7%)[8], channel catfish (11.2%)[12], Misgurnus anguillicaudatus (10.7%)[22], Osteoglossum bicirrhosum (8.29%)[23], Miichthys miiuy (6.74%)[24], common carp (Cyprinus carpio) (5.55%)[9], bay scallop (Argopecten irradians) (3.9%)[21], and red sea bream (Chrysophrys major) (4%)[20]. The abundance of EST-derived microsatellites therefore seems to be highly species-specific in aquaculture animals.

Fig. 2 Distribution of second-level GO annotation terms in yellow perch: (a) biological process, (b) molecular function and (c) cellular component
Dinucleotide repeats are the predominant microsatellite type in most aquaculture species characterized so far, whereas trinucleotide repeats are most abundant in plants[25]. In the present study, although more than one third of the EST-derived SSRs are dinucleotides (37.18%), trinucleotide repeats also make up a considerable proportion (30.81%) of the total loci.
Of the dinucleotides, the CA repeat is the most abundant in yellow perch (Fig. 1), which is consistent with previous findings for both Type I and Type II microsatellites in fish[12,26], various plant species[27], and vertebrates as a whole[28]. Interestingly, no single motif clearly predominated among the trinucleotide repeats. This may be one of the characteristics of EST-SSRs in yellow perch.
SSR-containing EST sequences have attracted great interest as a means of studying genetic variation, linkage mapping, gene tagging and evolution[29]. Expansion of trinucleotide repeats within genes is well known to cause pathological conditions in humans[30]. Rohrer et al.[31] added EST-SSR markers to the porcine genetic map, providing valuable links between the porcine and human genomes. Vasemagi et al.[32] considered EST-SSRs to be the best candidate loci affected by divergent selection, and hence promising markers for genes associated with adaptive divergence in Atlantic salmon (Salmo salar L.).
Among the 1867 unique microsatellite-containing ESTs or genes, 62 were randomly chosen for pilot tests of primer design, locus amplification and polymorphism. Information on each locus is listed in Tab. 2.
Although SSRs at different positions in a gene help determine the regulation of expression and the function of the protein produced, little attention has been paid to the identification and characterization of these sequences in yellow perch.
This analysis of SSR-containing EST sequences provides preliminary data that should improve understanding of biological function in yellow perch. Of the 1867 singletons and contigs, 977 showed homology to published sequences in BLASTX searches. A large number of genes were identified from these sequences, including interferon, MHC II components, defense proteins, cytokines, glutathione-S-transferase, heat-shock proteins and tumor-necrosis factor (TNF) receptor.

Fig. 3 Percentage of top BLASTX hits based on organism. BLASTX searches were conducted against the GenBank non-redundant protein database and the top hits were categorized by organism. For instance, 36.4% of the unique sequences with BLASTX hits had top hits from Tetraodon nigroviridis, suggesting that the largest proportion of the yellow perch gene sequences were most similar to those from T. nigroviridis, followed by those from D. rerio, etc.
Further characterization of these genes should provide information to enhance the understanding of the biological function of yellow perch and provide molecular tools for further study of genetic breeding.

Tab. 2 Characterization of 15 microsatellite loci in yellow perch