
Chromosome-level genome assembly of the Chinese longsnout catfish Leiocassis longirostris

2021-08-16 08:09:44 Wen-Ping He, Jian Zhou, Zhe Li
Zoological Research, 2021, Issue 4

The Chinese longsnout catfish (Leiocassis longirostris Günther) is one of the most economically important freshwater fish in China. As wild populations have declined sharply in recent years, it is also a valuable model for research on sexual dimorphism, comparative biology, and conservation. However, the current lack of high-quality chromosome-level genome information for the species hinders the advancement of comparative genomic analysis and evolutionary studies. Therefore, we constructed the first high-quality chromosome-level reference genome for L. longirostris. The total genome was 703.19 Mb, with 389 contigs and a contig N50 length of 4.29 Mb. Using high-throughput chromosome conformation capture (Hi-C) data, the genome sequences (685.53 Mb) were scaffolded into 26 chromosomes ranging from 17.36 to 43.97 Mb, resulting in a chromosomal anchoring rate for the genome of 97.44%. In total, 23 708 protein-coding genes were identified in the genome. Phylogenetic analysis indicated that L. longirostris and its closest related species, Pelteobagrus fulvidraco, diverged approximately 26.6 million years ago. This high-quality reference genome of L. longirostris should pave the way for future genomic comparisons and evolutionary research.

Leiocassis longirostris (also named Jiangtuan) belongs to the family Bagridae, which contains more than 220 species (Ferraris, 2007), and the order Siluriformes. It is a semi-migratory and commercially important freshwater species endemic to China, especially the Huaihe, Liaohe, Minjiang, Yangtze, and Pearl rivers, and the western regions of the Korean Peninsula (Shen et al., 2014; Wang et al., 2006; Zhu et al., 2005). In recent years, wild populations of L. longirostris have experienced a rapid decline due to over-fishing, water pollution, hydropower construction, and other human activities (Liang et al., 2016; Luo et al., 2000; Wang et al., 2006; Xiao & Yang, 2009). Thus, to facilitate conservation and evolutionary research, we constructed the first high-quality chromosome-level reference genome for L. longirostris using BGISEQ-500, Nanopore, and high-throughput chromosome conformation capture (Hi-C) technologies.

One healthy adult female L. longirostris (Figure 1A) collected from a farm at the Sichuan Academy of Agricultural Sciences in Meishan, Sichuan Province, China, was used for genome sequencing. Muscle tissue was collected for DNA extraction after treatment with the anesthetic tricaine MS-222. Genomic DNA for BGISEQ-500 and Nanopore sequencing was isolated using standard chloroform-isoamyl alcohol extraction procedures (Sambrook et al., 1989). DNA quality and quantity were measured using a NanoDrop One UV-Vis spectrophotometer (Thermo Fisher Scientific, USA) and Qubit 3.0 fluorometer (Invitrogen, USA), respectively.

A DNA library (200–400 bp insert size) was constructed following the manufacturer’s instructions as described in a previous study (Huang et al., 2017). The library was then sequenced following the BGISEQ-500 protocols (Huang et al., 2017). The short-read data obtained from the BGISEQ-500 platform were filtered using SOAPnuke v1.5.2 (Chen et al., 2018). The adapter sequences were removed from the reads, and paired reads with more than 10% ambiguous or low-quality (Phred score<5) bases were discarded, with BLAST v2.2.31 applied for the evaluation of sample contamination (Altschul et al., 1990). As a result, we obtained a total of 64.11 Gb short reads (Supplementary Table S1). Using Jellyfish v2.2.6 (Marçais & Kingsford, 2011), the K-mer frequency distribution was calculated. The Jellyfish results were subsequently passed to GenomeScope (Vurture et al., 2017). Using a K-mer size of 17, the K-mer frequency distribution for L. longirostris was obtained (Supplementary Figure S1). As a result, the genome size of L. longirostris was estimated to be 688.99 Mb, with heterozygosity, repeat content, and GC content of 0.35%, 42.53%, and 38.43%, respectively.
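
The idea behind a K-mer-based genome size estimate can be illustrated with a minimal Python sketch. It is not the Jellyfish or GenomeScope implementation: GenomeScope fits a statistical model to the K-mer profile, whereas this sketch uses the simpler rule of thumb that genome size is roughly the total number of K-mers divided by the depth of the main peak. The function names and the min_depth cutoff for discarding likely sequencing errors are assumptions made for illustration.

```python
from collections import Counter


def count_kmers(reads, k=17):
    """Count every k-mer (substring of length k) across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return counts


def kmer_histogram(counts):
    """Map each depth to the number of distinct k-mers seen at that depth."""
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size as (total k-mers) / (depth of the main peak).

    Depths below min_depth are treated as sequencing errors and ignored
    (the cutoff is an illustrative assumption, not a value from the text).
    """
    kept = {d: n for d, n in histogram.items() if d >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(kept, key=kept.get)  # depth with the most distinct k-mers
    total_kmers = sum(d * n for d, n in kept.items())
    return total_kmers / peak_depth
```

In practice the histogram would come from a dedicated K-mer counter run on the full read set; counting 17-mers in pure Python is only feasible for toy inputs.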

Figure 1 Genome analysis of L. longirostris

For Nanopore sequencing, we prepared a library using a Ligation Sequencing Kit (Oxford Nanopore Technologies, UK, SQK-LSK109) according to the manufacturer’s instructions. The library was sequenced using the Nanopore GridION X5 sequencer (Oxford Nanopore Technologies, UK) with flow cell R9.4 on five flow cells. Base calling was performed using Guppy v2.0.8 with default parameters, and reads were filtered for mean_qscore_template ≥7. NanoPlot v1.0.0 (De Coster et al., 2018) was then used to filter the Nanopore reads. For the construction of the Hi-C library, 1 g of muscle tissue was used to prepare a library according to previously established protocols (Rao et al., 2014). The library was then sequenced on a BGISEQ-500 sequencer (BGI Genomics, China) using 100 bp paired-end sequencing.

For transcriptome sequencing, the liver tissues of 15 L. longirostris individuals collected from the same farm were used for RNA extraction with TRIzol reagent (Invitrogen, USA), followed by treatment with DNase I (Invitrogen, USA) to remove genomic DNA. RNA concentration and integrity were measured using a Qubit RNA Assay Kit and Qubit 2.0 fluorometer (Life Technologies, USA) and an RNA Nano 6000 Assay Kit with the Agilent Bioanalyzer 2100 system (Agilent Technologies, USA), respectively. Three RNA sequencing libraries (five fish per library) with an insert size of 250–300 bp were prepared using a NEBNext Ultra RNA Library Prep Kit for Illumina (NEB, USA) following the manufacturer’s protocols, and then sequenced on the Illumina HiSeq X Ten platform (Illumina Inc., USA) as 150 bp paired-end reads. The raw RNA-seq reads were cleaned and assembled as described previously (Ye et al., 2018).

Using the Nanopore sequencing platform, we obtained 43.23 Gb long reads, with an expected average sequencing coverage of 61.48× for genome assembly (Supplementary Table S1). We then performed de novo genome assembly using Canu v1.8 (Koren et al., 2017) following the correction, trimming, and contig construction steps. After contig assembly, three rounds of contig sequence polishing were performed with cleaned genomic short reads using Pilon v1.23 (Walker et al., 2014). Purge Haplotigs v1.0.3 (Roach et al., 2018) was used to produce an improved and deduplicated assembly. Finally, we obtained the assembled genome of L. longirostris, which was 703.19 Mb in length, with 389 contigs and an N50 contig size of 4.29 Mb. This is a medium-sized genome among other sequenced catfish genomes (Table 1; Supplementary Table S2). We performed genome assembly quality control using the distribution of GC_depth. The GC_depth scatter plots demonstrated a Poisson distribution, indicating that this genome had no significant contamination. The overall GC-content of 39.67% in the L. longirostris genome was slightly higher than that of the walking catfish (Clarias batrachus) (Li et al., 2018) and common carp (Cyprinus carpio) but much lower than that of most teleost genomes (Xu et al., 2014). The completeness of the assembled L. longirostris genome was estimated using BUSCO v3.0.2 (Simão et al., 2015) with the actinopterygii_odb9 database. As a result, 4 293 (93.6%) of the 4 584 BUSCO genes were completely identified in the genome, including 4 109 (89.6%) single-copy and 184 (4.0%) duplicated genes. These results suggest high genome assembly completeness.
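
Two of the summary statistics above, the average sequencing coverage and the contig N50, follow from simple arithmetic. The Python sketch below is for illustration only: the function names are hypothetical, the contig lengths in the usage note are toy values, and the only figures taken from the text are the 43.23 Gb of Nanopore reads and the 703.19 Mb assembly length.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


def mean_coverage(total_read_bases, genome_size):
    """Average depth: total sequenced bases divided by genome length."""
    return total_read_bases / genome_size


# Figures reported in the text: 43.23 Gb of Nanopore reads, 703.19 Mb assembly.
coverage = mean_coverage(43.23e9, 703.19e6)
print(f"{coverage:.2f}x")  # prints 61.48x
```

For example, n50([2, 3, 4, 5, 6]) returns 5, because the two longest sequences (6 and 5) together reach at least half of the total length of 20.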

For chromosome-level assembly of the L. longirostris genome, Hi-C reads were first filtered using HiC-Pro v2.8.0 (Servant et al., 2015). Juicer v1.5 (Durand et al., 2016a) was then used to analyze the Hi-C datasets, and 3D-DNA v170123 was used to anchor the genome assembly to the chromosomes (Dudchenko et al., 2017) with parameters “-m haploid -s 0 -c 26”. The contact matrix of the L. longirostris contigs was mapped using Juicebox v1.11.08 (Durand et al., 2016b) (Figure 1B). A total of 126.35 Gb clean Hi-C reads were obtained, and 685.53 Mb (97.44% of total genome) genome sequences were successfully scaffolded into 26 pseudochromosomes. The number of chromosome scaffolds is consistent with previous research on karyotypes of L. longirostris (2n=52; Hong & Zhou, 1984). The lengths of chromosomes ranged from 17.36 Mb to 43.97 Mb (Supplementary Table S3). The scaffold N50 of the chromosome-level assembly was 28.03 Mb (Table 1).

For the annotation of repetitive sequences, we used RepeatModeler v1.0.10 (Bao & Eddy, 2002), which employs two complementary computational methods, i.e., RECON v1.08 and RepeatScout v1.0.5 (RepeatScout, RRID:SCR_014653) (Price et al., 2005), to identify repeat element boundaries and family relationships from sequence data. Subsequently, the outputs from the RepeatModeler and RepBase v21.01 library were combined and used for further characterization of transposable elements (TEs), many of which are not repetitive, and other repeats by homology-based methods, including identification with RepeatMasker v4.0.7, rmblast-2.2.28 (RRID:SCR_012954). Using RepBase-based homology and de novo methods, 239.11 Mb (33.99% of total genome) repetitive elements were identified, with DNA transposons (146.40 Mb, 20.81%) being the most abundant type in the genome (Supplementary Table S4-1). The proportion of repetitive elements in L. longirostris is similar to that in the Glyptosternon maculatum genome (33.96%) (Liu et al., 2018) and higher than that of most teleost genomes (Supplementary Table S4-2).

Combined homology-, de novo-, and transcriptome-based methods were used for gene prediction in the genome. The protein sequences of nine fish species, including Danio rerio, Gasterosteus aculeatus, Ictalurus punctatus, Larimichthys crocea, Oreochromis niloticus, Oryzias latipes, Pangasianodon hypophthalmus, Tachysurus fulvidraco, and Takifugu rubripes, were downloaded from the Ensembl database and mapped onto the assembled L. longirostris genome using BLASTN. Subsequently, GeneWise v2.2.0 (Birney et al., 2004) with default options was used for homologous annotation. For de novo prediction, Augustus v3.1.0 (Stanke & Waack, 2003) was used to predict gene models. In addition, RNA-seq data were aligned to the assembled L. longirostris genome to predict gene coding regions. The gene models were then predicted by combining the above homology-, de novo-, and transcriptome-based information using PASA v2.3.3 (Haas et al., 2003). Various databases, including SwissProt (Boeckmann et al., 2003), Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa & Goto, 2000), TrEMBL (Boeckmann et al., 2003), InterPro (Zdobnov & Apweiler, 2001), and Gene Ontology (GO) (Ashburner et al., 2000), were used to functionally annotate the predicted protein-coding genes, and GLEAN (Elsik et al., 2007) was used to create a consensus gene set. Finally, a total of 23 708 protein-coding genes were identified in the L. longirostris genome (Supplementary Table S5), of which 21 692, 20 072, 23 114, 21 169, and 16 638 protein-coding genes were annotated in the SwissProt, KEGG, TrEMBL, InterPro, and GO databases, respectively (Supplementary Table S6 and Figure S2). BUSCO was also used to test the completeness of the genome annotation with the actinopterygii_odb9 database, which showed that 92.4% complete and 4.0% fragmented conserved single-copy orthologs were predicted for L. longirostris.

Table 1 Summary of sequenced catfish genomes

For non-coding RNAs, microRNA (miRNA) and small nuclear RNA (snRNA) were predicted using INFERNAL v1.1 (Nawrocki & Eddy, 2013) and the Rfam database (Kalvari et al., 2018). Transfer RNA (tRNA) and ribosomal RNA (rRNA) were identified using tRNAscan-SE v1.3.1 (Lowe & Eddy, 1997) and RNAmmer v1.2 (Lagesen et al., 2007), respectively. After analysis, 422 miRNAs, 2 118 tRNAs, 1 838 rRNAs, and 1 925 snRNAs were annotated in the L. longirostris genome (Supplementary Table S7).

To identify gene families, protein sequences from the longest transcripts of each gene from L. longirostris and 10 other fish species, including D. rerio, Astyanax mexicanus, G. aculeatus, G. maculatum, I. punctatus, Lepisosteus oculatus, Oreochromis niloticus, Oryzias latipes, Pelteobagrus fulvidraco, and T. rubripes, were aligned using BLASTP with an e-value threshold of 1e-5. OrthoMCL v1.4 (Li et al., 2003) was then used to construct gene families. A total of 19 438 gene families and 3 585 single-copy ortholog families were identified among the 11 species, with 68 gene families specific to L. longirostris (Supplementary Table S8). In addition, 11 729 (89.1%) gene families were shared by the four catfish species, with 301 gene families specific to L. longirostris (Supplementary Figure S3).
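
The e-value cutoff applied to the all-against-all BLASTP results can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: the text does not say which BLAST output format was used, so the sketch assumes the default 12-column tabular layout of BLAST+ (-outfmt 6), treats the 1e-5 threshold as inclusive, and uses hypothetical gene names (geneA, geneB, geneC) as example data. The clustering step performed by OrthoMCL is not reproduced here.

```python
import csv
import io

# Column order of BLAST+ tabular output (-outfmt 6) with default fields.
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(handle, max_evalue=1e-5):
    """Yield (query, subject, evalue) for hits at or below max_evalue."""
    reader = csv.DictReader(handle, fieldnames=BLAST_COLUMNS, delimiter="\t")
    for row in reader:
        evalue = float(row["evalue"])
        if evalue <= max_evalue:
            yield row["qseqid"], row["sseqid"], evalue


# Two made-up hits: one strong match and one that fails the cutoff.
example = io.StringIO(
    "geneA\tgeneB\t92.1\t300\t20\t1\t1\t300\t5\t304\t1e-120\t410\n"
    "geneA\tgeneC\t31.0\t80\t50\t3\t10\t90\t40\t120\t0.02\t35.1\n"
)
hits = list(filter_hits(example))
```

Here only the geneA/geneB pair is retained; the surviving pairs are the kind of input that a clustering tool then groups into gene families.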

To investigate the phylogenetic relationships of L. longirostris with the above 10 fish species, the shared single-copy genes were aligned by MUSCLE v3.8.31 (Edgar, 2004). RAxML v8.2.1163 (Stamatakis, 2014) was then employed to construct a phylogenetic tree with the -m PROTGAMMAAUTO model and 100 bootstrap replicates. MCMCTREE v3.8.31 (Yang, 2007) was used to estimate divergence time based on the “correlated molecular clock” and “HKY85” models. Phylogenetic analysis indicated that L. longirostris and P. fulvidraco, which are both from the family Bagridae, were clustered onto one branch, and L. longirostris was close to the P. fulvidraco, G. maculatum, and I. punctatus clades, which belong to the Siluriformes order. These results are similar to previous phylogenetic analyses based on the mitochondrial genome of L. longirostris (Liu et al., 2019). Our results also showed that L. longirostris diverged ~26.2 million years ago from its closest related species P. fulvidraco (Figure 1C). Furthermore, phylogenetic analysis estimated that I. punctatus diverged from P. fulvidraco around 82.2 million years ago, consistent with the 81.9 million years reported in a previous study (Gong et al., 2018). Collinearity analysis of chromosomes between L. longirostris and I. punctatus was performed using LASTZ v1.02.00 (Harris, 2007) with parameters “T=2 C=2 H=2000 Y=3400 L=6000 K=2200”. As a result, all 26 pseudochromosomes of L. longirostris displayed high homology with the corresponding chromosomes of I. punctatus (Figure 1D), suggesting a high-quality L. longirostris genome assembly.

In the present study, the first chromosome sequences for L. longirostris were constructed using a combination of BGISEQ-500, Nanopore, and Hi-C technologies. The reference genome exhibited high quality in terms of continuity and completeness. This study should improve our understanding of the L. longirostris genome and provide valuable chromosomal information for genomic comparisons and evolutionary research among important aquaculture species.

DATA AVAILABILITY

The raw genome and RNA sequencing data were deposited in the National Center for Biotechnology Information (NCBI) database under accession No. PRJNA692071.

SUPPLEMENTARY DATA

Supplementary data to this article can be found online.

COMPETING INTERESTS

The authors declare that they have no competing interests.

AUTHORS’ CONTRIBUTIONS

W.P.H., H.L., J.Z., and H.Y. designed the experiments; W.P.H., H.L., J.Z., Z.L., T.S.J., C.H.L., Y.J.Y., M.B.X., and C.W.Z. performed the experiments and analyzed data; W.P.H., G.J.L., H.Y.X., and H.Y. wrote the paper. All authors read and approved the final version of the manuscript.
