Leading Edge Genomic Services & Solutions

De Novo Sequencing


De novo sequencing generates the first genome map for a species, providing a valuable reference sequence for phylogenetic studies, analysis of species diversity, mapping of specific traits and genetic markers, and other genomics research.

Novogene is at the forefront of de novo sequencing as it becomes more rapid and affordable. Novogene’s founder, Dr. Ruiqiang Li, is a leading genomics expert and a primary developer of the SOAPdenovo software package for genome assembly. Dr. Li and the Novogene team have contributed to many important publications on novel genome sequences, and we can provide you with the high level of expertise required for your specific project.

With the addition of 20 of the latest PacBio Sequel Systems, which provide long-range sequencing and 7 times higher throughput, Novogene has become the world’s largest PacBio SMRT sequencing facility. Novogene offers de novo sequencing services on several platforms, including PacBio Sequel, Illumina HiSeq, and 10X Genomics Chromium. For each project, our scientists design a sequencing strategy that combines short reads with long-range sequence information to achieve the most comprehensive de novo assembly for your genome of interest.

Project Types

Simple Genome

A simple genome is a haploid genome with low repeat content (less than 50%) or a diploid genome with a low rate of heterozygosity (less than 0.5%). Examples include most mammals, birds, and cultivated crops.

Complex Genome

A complex genome is a diploid or polyploid genome with high repeat content (higher than 50%) or a high rate of heterozygosity (higher than 0.5%). Examples include many species of plants, aquatic organisms, and insects.
  • Moderately heterozygous genome (diploid)
  • Highly heterozygous genome (diploid)
  • Highly repetitive genome (diploid)
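
The two project types above can be read as a simple threshold rule. The following is a minimal sketch under stated assumptions, not Novogene's actual classification procedure: the function name `classify_genome` is hypothetical, ploidy is ignored for simplicity, and values exactly at a threshold are treated as simple because the text does not say how boundary cases are handled.

```python
# Thresholds taken from the definitions above.
REPEAT_THRESHOLD = 0.50           # 50% repeat content
HETEROZYGOSITY_THRESHOLD = 0.005  # 0.5% heterozygosity


def classify_genome(repeat_fraction: float, heterozygosity: float) -> str:
    """Return 'simple' or 'complex' for a genome (hypothetical helper).

    Both arguments are fractions between 0 and 1, for example 0.35 for
    35% repeat content and 0.002 for 0.2% heterozygosity.
    """
    for name, value in (("repeat_fraction", repeat_fraction),
                        ("heterozygosity", heterozygosity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    # Assumption: exceeding either threshold is enough to call the genome
    # complex; boundary values are treated as simple.
    if (repeat_fraction > REPEAT_THRESHOLD
            or heterozygosity > HETEROZYGOSITY_THRESHOLD):
        return "complex"
    return "simple"


if __name__ == "__main__":
    print(classify_genome(0.35, 0.002))  # low repeats, low heterozygosity
    print(classify_genome(0.30, 0.012))  # heterozygosity above 0.5%
```

In practice the repeat content and heterozygosity would come from the genome survey analysis, and the sub-categories listed above (moderately heterozygous, highly heterozygous, highly repetitive) would need further cut-offs that this page does not specify.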

The Novogene Advantage

  • Highly experienced: We have completed major de novo genome sequencing projects, and our data has been published in top-tier journals.
  • Largest sequencing capacity: We have the largest Illumina and PacBio sequencing capacities in the world, allowing us to provide high quality data, fast turnaround, and affordable prices.
  • Bioinformatics expertise: We use best-in-class and self-developed software such as SOAPdenovo and NovoHeter for complex genome assembly.
  • Diverse strategies: Incorporating sequencing results from various platforms including Illumina HiSeq, PacBio Sequel and 10X Genomics Chromium, we offer the best assembly solution tailored towards each unique genome.

Project Workflow

Figure. De novo sequencing service project workflow.

Sequencing Strategy & Data Quality Guarantee


Genome Survey Analysis

  • Platform: Illumina HiSeq
  • Sequencing libraries: 350 bp insertion
  • Sequencing strategy: PE150

Simple Genome and Complex Genome De Novo Sequencing

Assembly Strategy I: Illumina short-read-based assembly

  • Sequencing libraries, simple genome: 250 bp / 450 bp / 2 Kb / 5 Kb / 10 Kb insertions
  • Sequencing libraries, complex genome: 250 bp / 350 bp / 450 bp / 2 Kb / 5 Kb / 10 Kb / 20 Kb insertions
  • Software: SOAPdenovoII (simple genome), NOVOheter (complex genome)
  • Data quality guarantee, simple genome (mammal, except Chiroptera, or bird): Contig N50 ≥ 40 Kb, Scaffold N50 ≥ 4 Mb
  • Data quality guarantee, simple genome (other genomes): Contig N50 ≥ 30 Kb, Scaffold N50 ≥ 1 Mb
  • Data quality guarantee, complex genome: Contig N50 ≥ 20 Kb, Scaffold N50 ≥ 500 Kb

Assembly Strategy II (for genome with draft reference): Illumina short-read-based assembly and gap-filling using PacBio long read data

  • Data quality guarantee, simple genome (mammal, except Chiroptera, or bird): Contig N50 ≥ 80 Kb, Scaffold N50 ≥ 4 Mb
  • Data quality guarantee, simple genome (other genomes): Contig N50 ≥ 60 Kb, Scaffold N50 ≥ 1 Mb
  • Data quality guarantee, complex genome: Contig N50 ≥ 40 Kb, Scaffold N50 ≥ 500 Kb

Assembly Strategy III: Hybrid assembly integrating Illumina short read data, PacBio Sequel long read data and 10X Genomics linked read data

  • Data quality guarantee, simple genome: Contig N50 ≥ 500 Kb, Scaffold N50 ≥ 1 Mb
  • Data quality guarantee, complex genome: Contig N50 ≥ 200 Kb, Scaffold N50 ≥ 1 Mb

Assembly Strategy IV: High quality de novo assembly (70X PacBio Sequel reads)

  • Data quality guarantee, simple genome: Contig N50 ≥ 1 Mb
  • Data quality guarantee, complex genome: Contig N50 ≥ 200 Kb

Assembly Strategy V (Recommended): Super-scaffold and chromosomal scale de novo assembly integrating PacBio Sequel reads and 10X Genomics / BioNano / Chicago / Hi-C

  • Data quality guarantee, simple genome: Contig N50 ≥ 1 Mb, Scaffold N50 ≥ 3 Mb
  • Data quality guarantee, complex genome: Contig N50 ≥ 500 Kb, Scaffold N50 ≥ 1 Mb
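
The guarantees above are all stated as N50 values. As an illustration, the sketch below uses the standard definition of N50: sort the sequences from longest to shortest and report the length at which the running total first reaches half of the assembly size. The helper names `n50` and `meets_guarantee` are hypothetical, 1 Kb is assumed to be 1,000 bp, and the toy lengths are made up for demonstration. How Novogene computes N50 in practice (for example, any minimum length filter) is not stated on this page.

```python
from typing import Iterable

KB = 1_000      # assumption: 1 Kb = 1,000 bp
MB = 1_000_000  # assumption: 1 Mb = 1,000,000 bp


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths (bp)."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence where the cumulative length
        # covers at least half of the total assembly length.
        if 2 * running >= total:
            return length
    return ordered[-1]  # not reached for non-negative lengths


def meets_guarantee(contig_lengths: Iterable[int],
                    scaffold_lengths: Iterable[int],
                    min_contig_n50: int,
                    min_scaffold_n50: int) -> bool:
    """Check both N50 values against the given minimum thresholds."""
    return (n50(contig_lengths) >= min_contig_n50
            and n50(scaffold_lengths) >= min_scaffold_n50)


if __name__ == "__main__":
    # Toy lengths, not real assembly data.
    contigs = [1_500_000, 1_200_000, 900_000, 400_000]
    scaffolds = [4_000_000]
    # Thresholds for Assembly Strategy V, simple genome.
    print(n50(contigs), meets_guarantee(contigs, scaffolds, 1 * MB, 3 * MB))
```

For a real assembly, the lengths would be read from the contig and scaffold FASTA files rather than typed in by hand.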

Figure. Visual illustration of Assembly Strategy V: super-scaffold and chromosomal scale de novo assembly integrating PacBio Sequel reads and 10X Genomics / BioNano / Chicago / Hi-C.


Customized services are also available upon request. Please contact us for more information.

Sample Requirements

  • DNA amount for survey: ≥ 10 µg
  • DNA amount for genome de novo sequencing, per library: ≥ 2 µg (for Illumina sequencing) or > 20 µg (for PacBio sequencing)
  • DNA concentration: ≥ 50 ng/µL (for Illumina sequencing) or > 80 ng/µL (for PacBio sequencing)
  • Purity: OD260/280 = 1.8 - 2.0, with no degradation or RNA contamination
*DNA amount is quantified by Qubit 2.0. Please consult our team for the total DNA amount required for your genome of interest.
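
The per-library requirements above can be checked before a sample is shipped. This is a minimal sketch, assuming the thresholds exactly as listed; the `Sample` class and `check_sample` function are hypothetical names, and degradation and RNA contamination are not modeled because they are not expressed as numbers here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Sample:
    platform: str                   # "illumina" or "pacbio"
    amount_ug: float                # DNA amount per library, in µg
    concentration_ng_per_ul: float  # DNA concentration, in ng/µL
    od260_280: float                # purity ratio


def check_sample(sample: Sample) -> List[str]:
    """Return a list of unmet requirements (empty if the sample passes)."""
    problems: List[str] = []
    if sample.platform == "illumina":
        if sample.amount_ug < 2:
            problems.append("DNA amount must be >= 2 µg")
        if sample.concentration_ng_per_ul < 50:
            problems.append("DNA concentration must be >= 50 ng/µL")
    elif sample.platform == "pacbio":
        # The PacBio limits are strict inequalities in the list above.
        if sample.amount_ug <= 20:
            problems.append("DNA amount must be > 20 µg")
        if sample.concentration_ng_per_ul <= 80:
            problems.append("DNA concentration must be > 80 ng/µL")
    else:
        raise ValueError(f"unknown platform: {sample.platform!r}")

    if not 1.8 <= sample.od260_280 <= 2.0:
        problems.append("OD260/280 must be between 1.8 and 2.0")
    return problems


if __name__ == "__main__":
    print(check_sample(Sample("illumina", 2.0, 50.0, 1.9)))
    print(check_sample(Sample("pacbio", 20.0, 90.0, 1.7)))
```

The total DNA amount needed for a whole project depends on the number of libraries and the genome of interest, so it is outside the scope of this per-library check.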

Analysis Pipeline

Figure. De novo sequencing service analysis pipeline.

The following studies utilized Novogene's expertise in de novo sequencing and de novo assembly.

Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement
Nature Biotechnology 33:531 (2015)

In this study, Novogene’s de novo whole genome sequencing and RNA sequencing technologies were employed to obtain allotetraploid cotton sequences. We generated sequence scaffolds 2-4 fold longer (N50 = 1.6 Mb) than those obtained for other allopolyploid species in previous research. The study also uncovered an important asymmetric evolution in allotetraploid cotton: positively selected genes (PSGs) for fiber improvement and stress tolerance were concentrated in the A subgenome and the D subgenome, respectively. In summary, this study provides a comprehensive genome analysis and explores the selection and evolution processes associated with the A and D subgenomes in allotetraploid cotton, providing valuable information for fiber improvement.

Figure. Syntenic analysis and asymmetrical evolution of the allopolyploid cotton genome


Whole-genome sequencing of the snub-nosed monkey provides insights into folivory and evolutionary history.
Nature Genetics, 46:1303–1310 (2014)

Researchers from Novogene and the Chinese Academy of Sciences collaborated on the whole-genome sequencing of the golden snub-nosed monkey, a species of Old World monkey. The de novo genome sequencing data were compared with genome resequencing data from three other related species, providing insights into the evolutionary adaptations associated with the unique diets of these primates.

Figure. Phylogenetic tree and estimated divergence times for the golden snub-nosed monkey (GSM) and other mammals.

Novogene Publications

  • Nature Communications, 4:2071 (2013). Ground tit genome reveals avian adaptation to living at high altitudes in the Tibetan plateau.
  • Nature Genetics, 45:1431 (2013). Genomic analyses identify distinct patterns of selection in domesticated pigs and Tibetan wild boars.
  • Nature Biotechnology, 32:1045 (2014). De novo assembly of soybean wild relatives for pan-genome analysis of diversity and agronomic traits.
  • Nature Genetics, 46:1303 (2014). Whole-genome sequencing of the snub-nosed monkey provides insights into folivory and evolutionary history.
  • Nature Biotechnology, 33:531 (2015). Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement.
  • Genome Research, 27:865-874 (2016). Comprehensive variation discovery and recovery of missing sequence in the pig genome using multiple de novo assemblies.