Leading Edge Genomic Services & Solutions

De Novo Sequencing

With de novo sequencing, the first genome map for a species is generated, providing a valuable reference sequence for phylogenetic studies, analysis of species diversity, mapping of specific traits and genetic markers, and other genomics research. Novogene is at the forefront of de novo sequencing as it becomes more rapid and affordable. Novogene’s founder, Dr. Ruiqiang Li, is a leading genomics expert and a primary developer of the SOAPdenovo software package for genome assembly. Dr. Li and the Novogene team have contributed to many important publications on novel genome sequences, and we can provide the high level of expertise required for your specific project.

With the PacBio Sequel and Oxford PromethION systems, which provide long-range sequencing and higher throughput, Novogene is highly experienced in high-quality de novo assembly of plant and animal genomes. Novogene offers de novo sequencing services on various platforms, including PacBio Sequel, Oxford PromethION, Illumina NovaSeq, and 10X Genomics Chromium. For each project, our scientists design a sequencing strategy that combines short reads and long-range sequence information to achieve the most comprehensive de novo assembly results for your genome of interest.

Project Types

Simple Genome

Simple genome refers to a haploid genome with a low repeat content (less than 50%), or a diploid genome with a low rate of heterozygosity (less than 0.5%), such as most mammals, birds, and cultivated crops.

Complex Genome

Complex genome refers to a diploid or polyploid genome with a high repeat content (higher than 50%) or a high rate of heterozygosity (higher than 0.5%), such as many plants, aquatic species, and insects.
  • Moderately heterozygous genome (diploid)
  • Highly heterozygous genome (diploid)
  • Highly repetitive genome (diploid)
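
The thresholds above can be expressed as a small check. This is only an illustrative sketch, not Novogene's own classification procedure: the function name `classify_genome` is hypothetical, ploidy is left out, and treating values exactly at 50% or 0.5% as "simple" is an assumption, since the text does not say how boundary cases are handled.

```python
def classify_genome(repeat_pct, heterozygosity_pct):
    """Classify a genome as 'simple' or 'complex' from the stated thresholds.

    repeat_pct: estimated repeat content, in percent of the genome.
    heterozygosity_pct: estimated heterozygosity rate, in percent.
    Boundary values (exactly 50% or 0.5%) are treated as simple; this is
    an assumption, not something the service description specifies.
    """
    if repeat_pct > 50.0 or heterozygosity_pct > 0.5:
        return "complex"
    return "simple"


print(classify_genome(35.0, 0.2))  # low repeats, low heterozygosity
print(classify_genome(62.0, 0.2))  # high repeat content
print(classify_genome(35.0, 1.1))  # high heterozygosity
```

In practice the two inputs would come from a genome survey, which is why a survey step precedes the choice of assembly strategy.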

The Novogene Advantage

  • Highly experienced: We have completed major de novo genome sequencing projects, and our data has been published in top-tier journals.
  • Largest sequencing capacity: We have the largest Illumina and PacBio sequencing capacities in the world, allowing us to provide high quality data, fast turnaround, and affordable prices.
  • Bioinformatics expertise: We use best-in-class and self-developed software, such as SOAPdenovo and NovoHeter, for complex genome assembly.
  • Diverse strategies: By incorporating sequencing results from various platforms, including Illumina NovaSeq, PacBio Sequel, Oxford PromethION, and 10X Genomics Chromium, we offer an assembly solution tailored to each unique genome.

Project Workflow

Figure. De novo sequencing service project workflow

Sequencing Strategy & Data Quality Guarantee

Platform: Illumina HiSeq
Sequencing Libraries: 350 bp insert
Sequencing Strategy: PE150
Assembly Strategy I
50X PacBio Sequel long-read data / Oxford PromethION read data
Sequencing Libraries: 250 bp / 450 bp / 2 Kb / 5 Kb / 10 Kb inserts; 250 bp / 350 bp / 450 bp / 2 Kb / 5 Kb / 10 Kb / 20 Kb inserts
Software: SOAPdenovoII; NOVOheter
Data Quality Guarantee: Contig N50 ≥ 1 Mb; Contig N50 ≥ 300 Kb
Assembly Strategy II
High quality de novo assembly (70X PacBio Sequel reads)
Data Quality Guarantee: Contig N50 ≥ 2 Mb; Contig N50 ≥ 40 Kb, Scaffold N50 ≥ 500 Kb
Assembly Strategy III
Hybrid assembly integrating Illumina short-read data, PacBio Sequel long-read data, and 10X Genomics linked-read data
Data Quality Guarantee: Contig N50 ≥ 500 Kb, Scaffold N50 ≥ 1 Mb; Contig N50 ≥ 200 Kb, Scaffold N50 ≥ 1 Mb
Assembly Strategy IV
High quality de novo assembly (70X PacBio Sequel reads)
Data Quality Guarantee: Contig N50 ≥ 1 Mb; Contig N50 ≥ 500 Kb
Assembly Strategy V (Recommended)
Super-scaffold and chromosomal-scale de novo assembly integrating PacBio Sequel reads / Oxford PromethION reads and 10X Genomics / BioNano / Hi-C
Data Quality Guarantee: Contig N50 ≥ 2 Mb, Scaffold N50 ≥ 4 Mb; Contig N50 ≥ 500 Kb, Scaffold N50 ≥ 1 Mb
Figure. Visual illustration of assembly strategy V: Super-scaffold and chromosomal-scale de novo assembly integrating PacBio Sequel reads and 10X Genomics / BioNano / Chicago / Hi-C.

Customized services are also available upon request. Please contact us for more information.
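
The data quality guarantees above are stated as contig and scaffold N50 values. For readers unfamiliar with the metric, the following is a minimal sketch of the commonly used N50 definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The function names `n50` and `meets_guarantee` are illustrative only and do not refer to any Novogene tool.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths (in bp)."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest until the running
    # total covers at least half of the assembly.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def meets_guarantee(lengths, min_n50_bp):
    """Check an assembly's N50 against a minimum threshold given in bp."""
    return n50(lengths) >= min_n50_bp


contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(contigs))  # → 8 (10 + 9 + 8 = 27, half of the 54 bp total)
```

A guarantee such as "Contig N50 ≥ 1 Mb" would correspond to `meets_guarantee(contig_lengths, 1_000_000)` under this definition.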

Sample Requirements

  • DNA amount for survey: ≥ 2 µg
  • DNA amount for genome de novo sequencing per library: ≥ 2 µg (for Illumina sequencing) and > 20 µg (for PacBio sequencing)
  • DNA concentration: ≥ 50 ng/µL (for Illumina sequencing) and > 100 ng/µL (for PacBio sequencing)
  • Purity: No degradation, no DNA contamination
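
As a convenience, the per-library amount and concentration thresholds listed above could be checked before shipping a sample. The sketch below is hypothetical: the `REQUIREMENTS` table and the `sample_ok` function are illustrative names, the numbers are copied from the list above, and purity and degradation are not covered because they cannot be judged from two numbers.

```python
# Thresholds copied from the sample requirements list. "strict" marks the
# PacBio limits, which are written as "greater than" rather than "at least".
REQUIREMENTS = {
    "illumina": {"amount_ug": 2.0, "conc_ng_per_ul": 50.0, "strict": False},
    "pacbio": {"amount_ug": 20.0, "conc_ng_per_ul": 100.0, "strict": True},
}


def sample_ok(platform, amount_ug, conc_ng_per_ul):
    """Return True if a DNA sample meets the stated per-library thresholds."""
    req = REQUIREMENTS[platform]
    if req["strict"]:
        return (amount_ug > req["amount_ug"]
                and conc_ng_per_ul > req["conc_ng_per_ul"])
    return (amount_ug >= req["amount_ug"]
            and conc_ng_per_ul >= req["conc_ng_per_ul"])


print(sample_ok("illumina", 2.0, 50.0))  # exactly at the Illumina minimums
print(sample_ok("pacbio", 20.0, 150.0))  # amount not above 20 µg
print(sample_ok("pacbio", 25.0, 120.0))  # both PacBio limits exceeded
```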

Analysis Pipeline

Figure. De novo sequencing service analysis pipeline
The following studies utilized Novogene’s expertise in de novo sequencing and de novo assembly.

The Apostasia genome and the evolution of orchids
Nature 549, 379–383 (2017)

In this study, the researchers adopted one of Novogene’s de novo assembly strategies, incorporating Illumina short-read-based assembly and gap filling using PacBio long-read data and 10X Genomics linked-read data. This strategy allowed the researchers to obtain the first draft genome of Apostasia shenzhenica and to improve the two previously published genome sequences of the orchids Phalaenopsis equestris and Dendrobium catenatum. A genome size of 349 Mb, with a scaffold N50 of 3 Mb and a contig N50 of 80 Kb, was assembled for Apostasia shenzhenica. In P. equestris, the contig N50 increased from 20.56 Kb to 45.79 Kb and the scaffold N50 increased from 359 Kb to 1.2 Mb. In D. catenatum, the scaffold N50 increased from 391.46 Kb to 1.05 Mb and the contig N50 increased from 33.1 Kb to 51.76 Kb. Transcriptome data were also obtained from the three samples for comparison.

The new genome and transcriptome data obtained from these three species were combined with published data for 12 other plant species to construct a phylogenetic tree and estimate divergence times. One whole-genome duplication (WGD) event, shared by all orchids and occurring shortly before their divergence, was identified, and 36 putative functional MADS-box genes were also found. The Apostasia shenzhenica genome data provide a reference for future exploration of evolutionary history, functional genomics, and population genetics, and expand our understanding of specific gene family functions in orchid evolution. The study is an example of Novogene’s platforms being used to obtain high-quality genome data with a high level of completeness for species without an available reference genome.
Figure. Phylogenetic tree showing divergence times and the evolution of gene family sizes
Scallop genome reveals molecular adaptations to semi-sessile life and neurotoxins
Nature Communications, 8:1721 (2017)

The bivalve mollusks, with their extraordinary adaptability to their environment, are excellent models for studying the adaptive evolution of animals. The scallop, a semi-sessile bivalve mollusk, plays a critical role in the study of bilaterian evolution and adaptation to benthic ecology, yet sequencing its genome has been a challenge because of its extremely high heterozygosity. In this Novogene co-authored paper, researchers investigated the scallop C. farreri at the genome, transcriptome, and proteome levels. Illumina and PacBio SMRT sequencing, together with a modified SOAPdenovo strategy, were used to create a 779.9 Mb genome assembly with a contig N50 of 21.5 Kb and a scaffold N50 of 602 Kb, which is comparable to or better than those of previously published bivalve genomes.

The study further revealed the molecular changes underlying multiple adaptive traits, such as neurotoxin resistance, sophisticated eyes, and a large adductor muscle. These key adaptive traits result from the expansion and mutation of a few specific genes, suggesting that minor changes to an organism’s genome may have a large effect on its phenotype and adaptation. This study illustrates how diverse sequencing strategies and a multi-omics approach can expand our understanding of bilaterian evolution and adaptation.
Figure. Genomic landscape and polymorphism analysis of the scallop C. farreri
Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement
Nature Biotechnology 33:531 (2015)

In this study, Novogene’s de novo whole genome sequencing and RNA sequencing technologies were employed to obtain allotetraploid cotton sequences. We generated sequence scaffolds 2- to 4-fold longer (N50 = 1.6 Mb) than those obtained for other allopolyploid species in previous research. The study also uncovered an important asymmetric evolution in allotetraploid cotton: positively selected genes (PSGs) for fiber improvement and stress tolerance were concentrated in the A subgenome and the D subgenome, respectively. In summary, this study provides a comprehensive genome analysis and explores the selection and evolution processes associated with the A and D subgenomes in allotetraploid cotton, providing valuable information for fiber improvement.
Figure. Syntenic analysis and asymmetrical evolution of the allopolyploid cotton genome
Whole-genome sequencing of the snub-nosed monkey provides insights into folivory and evolutionary history
Nature Genetics, 46:1303–1310 (2014)

Researchers from Novogene and the Chinese Academy of Sciences collaborated on the whole-genome sequencing of the golden snub-nosed monkey, a species of Old World monkey. The de novo genome sequencing data were compared with genome resequencing data from three other related species, and provided insights into the evolutionary adaptations associated with the unique diets of these primates.
Figure. Phylogenetic tree and estimated divergence times for the golden snub-nosed monkey (GSM) and other mammals.

Novogene Publications

Journal Title
Nature Communications, 9: 1615 (2018) The Gastrodia elata genome provides insights into plant adaptation to heterotrophy.
Nature Plants, 4: 82–89 (2018) A genome for gnetophytes and early evolution of seed plants.
Nature Communications, 9: 2683 (2018) Large-scale gene losses underlie the genome evolution of parasitic plant Cuscuta australis
Nature, 549: 379–383 (2017) The Apostasia genome and the evolution of orchids.
Nature Ecology & Evolution, 1(5):120. Scallop genome provides insights into evolution of bilaterian karyotype and development
Nature Communications, 8: 1721 (2017) Scallop genome reveals molecular adaptations to semi-sessile life and neurotoxins.
Nature Plants, 3: 946-955 (2017) The Aegilops tauschii genome reveals multiple impacts of transposons.
Genome Research, 27: 865-874 (2016) Comprehensive variation discovery and recovery of missing sequence in the pig genome using multiple de novo assemblies.
Nature Biotechnology 33:531 (2015) Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement.
Nature Genetics, 46:1303 (2014) Whole-genome sequencing of the snub-nosed monkey provides insights into folivory and evolutionary history.
Nature Biotechnology, 32:1045 (2014) De novo assembly of soybean wild relatives for pan-genome analysis of diversity and agronomic traits.
Nature Communications, 4:2071 (2013) Ground tit genome reveals avian adaptation to living at high altitudes in the Tibetan plateau.
Nature Genetics, 45:1431 (2013) Genomic analyses identify distinct patterns of selection in domesticated pigs and Tibetan wild boars.