
Human Whole Genome Sequencing

Overview

Human whole genome sequencing (hWGS) enables researchers to catalog the genetic constitution of individuals and to capture all variants present, including single-nucleotide variations (SNVs), insertions and deletions (InDels), copy number variations (CNVs), and large structural variants (SVs), in a single assay. Equipped with the Illumina NovaSeq 6000 system, Novogene is capable of sequencing up to 280,000 human genomes per year at the lowest cost per genome. With the addition of Oxford Nanopore PromethION and PacBio Sequel systems, Novogene also provides long-read hWGS services that characterize the human genome more completely and accurately and recover sequence that short-read sequencing misses, especially in highly polymorphic and highly repetitive regions. With extensive experience in whole genome sequencing and advanced bioinformatics capabilities, Novogene delivers large projects with quick turnaround times and the highest quality results.

Applications

  • Genetic disease study
  • Cancer research
  • Human population evolution
  • DNA biomarkers
  • Pharmacogenomics

Advantages

  • State-of-the-art NGS technologies: Novogene is a world leader in sequencing capacity using state-of-the-art technology, including Illumina HiSeq and NovaSeq 6000 Systems.
  • Highest data quality: We guarantee a Q30 score ≥ 80%, exceeding Illumina’s official guarantee of ≥ 75%. See our data example.
  • Extraordinary informatics expertise: Novogene uses its cutting-edge bioinformatics pipeline and internationally recognized, best-in-class software to provide customers with highly reliable, publication-ready data.
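The Q30 figure quoted above is the percentage of bases whose Phred quality score is at least 30, which corresponds to an estimated error probability of at most 1 in 1,000. A minimal sketch of that calculation, assuming Phred+33 encoded quality strings as found in standard FASTQ files; the function name and the example strings are hypothetical and are not part of Novogene's pipeline:

```python
def q30_fraction(quality_strings, offset=33, threshold=30):
    """Return the fraction of bases with Phred quality >= threshold."""
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            # Phred+33: quality = ASCII code minus 33 (assumed encoding)
            if ord(ch) - offset >= threshold:
                passing += 1
    return passing / total if total else 0.0


# Hypothetical quality strings: 'I' encodes Q40, '5' encodes Q20.
example = ["IIII5", "III55"]
print(f"Q30 = {100 * q30_fraction(example):.0f}%")
```

In this toy input 7 of 10 bases reach Q30, so the result would fall below an 80% guarantee.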

Sample Requirements

Platform: Illumina NovaSeq 6000
  • Genomic DNA: ≥ 200 ng (Qubit®)
  • Genomic DNA (PCR-free): ≥ 1.5 μg (Qubit®)
  • Genomic DNA from FFPE: ≥ 0.8 μg (Qubit®)
  • Purity: OD260/280 = 1.8-2.0

Platform: PacBio Sequel I/II
  • HMW genomic DNA: ≥ 10 μg for Sequel I; ≥ 30 μg for Sequel II (Qubit®)
  • Purity: OD260/280 = 1.8-2.0; OD260/230 = 2.0-2.2
  • Fragment size: ≥ 30 Kb for Sequel I; ≥ 60 Kb for Sequel II

Platform: Nanopore PromethION
  • HMW genomic DNA: ≥ 10 μg (Qubit®)
  • Purity: OD260/280 = 1.8-2.0; OD260/230 = 2.0-2.2
  • Fragment size: ≥ 30 Kb

Note: For detailed information, please contact us.

Sequencing Parameters and Analysis Contents

Platform: Illumina NovaSeq 6000
  • Read length: paired-end 150 bp
  • Recommended sequencing depth: 30-50× for rare diseases; 50× for tumor tissues, 30× for adjacent normal tissues and blood
  • Standard data analysis: data quality control; alignment with reference genome; SNP/InDel/SV/CNV detection; somatic SNP/InDel/SV/CNV detection (tumor-normal paired samples)

Platform: PacBio Sequel I/II
  • Read length: average > 10 Kb for Sequel I; average > 15 Kb for Sequel II
  • Recommended sequencing depth: 10-20× for genetic diseases; ≥ 20× for tumor tissues
  • Standard data analysis: data quality control; sequence alignment; structural variant (SV) detection; variation annotation

Platform: Nanopore PromethION
  • Read length: average > 17 Kb
  • Recommended sequencing depth: 10-20× for genetic diseases; ≥ 20× for tumor tissues
  • Standard data analysis: data quality control; sequence alignment; structural variant (SV) detection; variation annotation

Note: Sequencing depths and bioinformatic analysis (or advanced analysis for Cancer or Disease) requests can be customized based on the project needs. Please contact us for more information.
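As a rough guide to how a recommended depth translates into data volume, the expected raw yield is the target depth multiplied by the genome size. A minimal sketch, assuming a haploid human genome of about 3.1 Gb and ignoring duplicates, unmapped reads, and uneven coverage; the constant and function name are illustrative assumptions:

```python
import math

HUMAN_GENOME_BP = 3.1e9  # assumed approximate haploid human genome size


def required_yield(depth, read_length=150, paired=True, genome_bp=HUMAN_GENOME_BP):
    """Return (total bases, number of reads or read pairs) for a target mean depth."""
    total_bases = depth * genome_bp
    bases_per_fragment = read_length * (2 if paired else 1)
    n_fragments = math.ceil(total_bases / bases_per_fragment)
    return total_bases, n_fragments


bases, pairs = required_yield(30)  # 30x with paired-end 150 bp reads
print(f"{bases / 1e9:.0f} Gb, about {pairs / 1e6:.0f} million read pairs")
```

Under these assumptions, 30× coverage with PE150 reads works out to roughly 93 Gb of raw data, or about 310 million read pairs.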

Project Workflow

Genomic sequencing identifies WNK2 as a driver in hepatocellular carcinoma and a risk factor for early recurrence (Zhou et al., 2019)

Background:

Hepatocellular carcinoma (HCC) is a relatively common type of cancer with rising incidence and mortality rates. Although advances in the treatment and management of patients with HCC have improved survival rates, HCC still has a high rate of early recurrence. This study aimed to systematically define genomic alterations in Chinese patients with HCC and to identify mutations associated with early tumor recurrence in those patients.

Sampling & Sequencing Strategy:

Sampling:
• 182 Chinese primary HCC samples

Sequencing Strategy:
• Human whole genome sequencing (49 cases), whole exome sequencing (18 cases), and targeted region sequencing (115 cases) on Illumina platforms (PE150)

Results & Conclusion

Using WGS, this study described the genomic landscape, including somatic SNVs/InDels, CNVs, and SVs, and identified five prominent mutational signatures in 49 Chinese patients with HCC (Figure 1). Through WGS, WES, and targeted sequencing of 182 primary HCC samples, the results suggest that WNK2, RUNX1T1, CTNNB1, TSC1, and TP53 may play roles in HCC invasion and metastasis, and that WNK2 had the most significant difference in mutation frequency (Figure 2). Biofunctional investigations revealed a tumor-suppressor role for WNK2; its inactivation led to ERK1/2 signaling activation in HCC cells, tumor-associated macrophage infiltration, and tumor growth and metastasis. This study describes the genomic events that characterize Chinese HCCs and identifies WNK2 as a driver of HCC that is associated with early tumor recurrence after curative resection.

Figure 1. Genomic alterations and mutational signatures in 49 Chinese primary HCCs that had tumor early.

Figure 2. The mutational spectrum in HCCs with or without early recurrence.

Reference: Zhou SL, Zhou ZJ, Hu ZQ, et al. Genomic Sequencing Identifies WNK2 as a Driver in Hepatocellular Carcinoma and a Risk Factor for Early Recurrence. Journal of Hepatology, 2019, doi: 10.1016/j.jhep.2019.07.014.

Characteristics of genomic alterations of lung adenocarcinoma in young never-smokers (Luo et al., 2018)

Background:

Non-small-cell lung cancer (NSCLC) has been recognized as a highly heterogeneous disease with phenotypic and genotypic diversity in each subgroup. While never-smoker patients with NSCLC have been well studied through next-generation sequencing, the potentially unique molecular features of young never-smoker patients with NSCLC remain largely unknown.

Sampling & Sequencing Strategy:

Sampling:
• 36 never-smoker patients with lung adenocarcinoma (LUAD)

Sequencing Strategy:
• Human whole genome sequencing on Illumina platform (PE150)

Results & Conclusion

Besides the well-known gene mutations, the study identified several potential lung cancer-associated gene mutations that have rarely been reported (e.g., HOXA4 and MST1). Lung cancer-related copy number variations (e.g., EGFR and CDKN2A) were enriched, and lung cancer-related structural variations (e.g., EML4-ALK and KIF5B-RET) were commonly observed. Notably, new fusion partners of ALK (SMG6-ALK) and RET (JMJD1C-RET) were found. Furthermore, a high prevalence of potentially targetable genomic alterations was observed in the cohort. Finally, germline mutations in BPIFB1, CHD4, PARP1, NUDT1, RAD52, and MFI2 were significantly enriched in the young never-smoker patients with LUAD compared with the in-house noncancer database (p<0.05). This study provides a detailed mutational portrait of LUAD occurring in young never-smokers and gives insights into the molecular pathogenesis of this distinct subgroup of NSCLC.

Figure 3. Mutation landscape of lung adenocarcinoma in young never-smoker patients.

Reference: Luo WX, Tian PW, Wang Y, et al. Characteristics of genomic alterations of lung adenocarcinoma in young never-smokers. International Journal of Cancer, 2018, 143: 1696‒1705.

Genetic alterations in esophageal tissues from squamous dysplasia to carcinoma (Liu et al., 2017)

Background:

Esophageal squamous cell carcinoma (ESCC) is the most common subtype of esophageal cancer. Little is known about the genetic changes that occur in esophageal cells during the development of ESCC. This study performed next-generation sequence analyses of esophageal nontumor, intraepithelial neoplasia (IEN), and ESCC tissues from the same patients to track genetic changes during tumor development.

Sampling & Sequencing Strategy:

Sampling:
• 227 esophageal tissue samples from 70 patients with ESCC undergoing resection

Sequencing Strategy:
• Human whole genome sequencing (7 cases), whole exome sequencing (18 cases), and targeted region sequencing (45 cases) on Illumina platforms (PE150)

Results & Conclusion

The study revealed significant similarities in the types and frequency of mutations between IEN and ESCC (Figure 4), including similarity in the DNA damage mutation signature. Phylogenetic and clonal analysis also identified mutations in the CCND1, CDKN2A, and FGFR1 genes as early driver events. However, the number of non-overlapping SNVs in tissues taken from the same individuals indicated that various lesions formed independently and that there was independent clonal expansion of mutations. As shown in this study, combining multiple NGS applications provides novel approaches for exploring early diagnostics and treatments for cancer.

Figure 4. The mutation variation landscape of ESCC, IEN, and simple hyperplasia (ESSH) from whole genome sequencing and whole exome sequencing.

Reference: Liu X, Zhang M, Ying SM, et al. Genetic alterations in esophageal tissues from squamous dysplasia to carcinoma. Gastroenterology, 2017, 153: 166‒177.


Sequencing error rate distribution

Note: The x-axis represents position in reads, and the y-axis represents the average error rate of bases of all reads at a position.
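The per-position error rate in this kind of plot is typically derived from base quality scores, using the Phred relation that a quality Q corresponds to an error probability of 10^(-Q/10). A minimal sketch, assuming Phred+33 encoded quality strings; the function name and inputs are hypothetical, and this is not necessarily how the plot above was produced:

```python
def mean_error_by_position(quality_strings, offset=33):
    """Average estimated per-base error probability at each read position."""
    sums, counts = [], []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(sums):  # first read that reaches this position
                sums.append(0.0)
                counts.append(0)
            q = ord(ch) - offset          # Phred quality score
            sums[i] += 10 ** (-q / 10)    # convert quality to error probability
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


# Hypothetical reads of unequal length: 'I' is Q40, '5' is Q20.
print(mean_error_by_position(["II5", "I5"]))
```

Reads of different lengths are handled by averaging only over the reads that extend to a given position.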

GC content distribution

Note: The x-axis is position in reads, and the y-axis is the percentage of each base type (A, T, G, C); bases are distinguished by color.
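The quantity plotted here can be computed by counting each base at each read position. A minimal sketch, assuming reads are supplied as plain sequence strings; the function name is hypothetical, and ambiguous bases are tallied under N as an assumption of this example:

```python
from collections import Counter


def base_percent_by_position(reads):
    """Percentage of A, T, G, C (and N) at each read position."""
    counters = []
    for seq in reads:
        for i, base in enumerate(seq.upper()):
            if i == len(counters):  # first read that reaches this position
                counters.append(Counter())
            counters[i][base if base in "ATGC" else "N"] += 1
    result = []
    for c in counters:
        total = sum(c.values())
        result.append({b: 100.0 * c[b] / total for b in "ATGCN"})
    return result


# Hypothetical reads
for pos, pct in enumerate(base_percent_by_position(["ACGT", "AGGT"]), start=1):
    print(pos, pct)
```

In a well-balanced library the A and T curves, and the G and C curves, are expected to track each other closely along the read.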

Sequencing depth & coverage distribution

Note: Average sequencing depth (bar plot) and coverage (dot-line plot) in each chromosome. The x-axis represents chromosome; the left y-axis is the average depth; the right y-axis is the coverage (proportion of covered bases).
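Both quantities in this plot can be derived from per-base depth records. A minimal sketch, assuming tab-separated input of chromosome, position, and depth with one record per reference position, including zero-depth positions (the layout that `samtools depth -a` emits); the function name and the choice of 1× as the coverage threshold are assumptions:

```python
from collections import defaultdict


def depth_and_coverage(lines, min_depth=1):
    """Return {chrom: (mean depth, fraction of positions with depth >= min_depth)}."""
    total = defaultdict(int)
    covered = defaultdict(int)
    n_pos = defaultdict(int)
    for line in lines:
        chrom, _pos, depth = line.rstrip("\n").split("\t")
        d = int(depth)
        n_pos[chrom] += 1
        total[chrom] += d
        if d >= min_depth:
            covered[chrom] += 1
    return {c: (total[c] / n_pos[c], covered[c] / n_pos[c]) for c in n_pos}


# Hypothetical records
records = ["chr1\t1\t30", "chr1\t2\t0", "chr2\t1\t10"]
print(depth_and_coverage(records))
```

Raising `min_depth` (for example to 10) gives the stricter "covered at ≥ 10×" fraction that is often reported alongside mean depth.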

SNP detection

Sample Sample_1 Sample_2 Sample_3 Sample_4 Sample_5 Sample_6
CDS 22318 22343 22271 22702 22654 22418
Synonymous SNP 11342 11375 11329 11439 11387 11376
missense SNP 10335 10340 10334 10643 10649 10400
stopgain 77 81 72 87 87 8.30
stoploss 14 13 11 12 12 10
unknown 558 541 536 531 528 501
intronic 1263778 1261992 1262435 1259099 1262095 1271575
UTR3 25167 25134 25496 25396 25462 25510
UTR5 5568 5562 5644 5767 5829 5702
splicing 84 85 84 86 90 96
ncRNA exonic 11867 11818 11734 11628 11697 11760
ncRNA intronic 205360 205028 200363 199813 200397 205018
ncRNA splicing 66 66 58 61 64 60
upstream 22383 22339 22230 22648 22744 22708
downstream 23565 23544 23515 23221 23235 23557
intergenic 2119447 2115048 2110391 2091107 2098406 2138433
Total 3700477 3693838 3685038 3662384 3673519 3727684

Circos

Note:
Novogene shows the Circos plot only when CNV analysis was carried out. The figure consists of seven rings, from outer to inner.
(1) The outer circle (the first ring) shows chromosome information.
(2) The second ring represents the read coverage in histogram style. Each bar is the average coverage of a 0.5 Mbp region.
(3) The third ring represents InDel density in scatter style. Each black dot is the number of InDels within a 1 Mbp window.
(4) The fourth ring represents SNP density in scatter style. Each green dot is the number of SNPs within a 1 Mbp window.
(5) The fifth ring represents the proportion of homozygous SNPs (orange) and heterozygous SNPs (grey) in histogram style. Each bar is calculated from a 1 Mbp region.
(6) The sixth ring represents the CNV inference. Red means gain, and green means loss.
(7) The innermost ring represents the SV inference in exonic and splicing regions: TRA (orange), INS (green), DEL (grey), DUP (pink), and INV (blue).
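The density rings amount to counting variants in fixed-size windows along each chromosome. A minimal sketch of that binning, assuming variants are given as (chromosome, 1-based position) pairs; the function name and input format are hypothetical and do not describe the actual plotting pipeline:

```python
from collections import Counter


def density_per_window(variants, window=1_000_000):
    """Count variants per fixed-size window; keys are (chrom, 0-based window start)."""
    counts = Counter()
    for chrom, pos in variants:
        # Positions are assumed 1-based, so position 1,000,000 falls in the first window.
        start = (pos - 1) // window * window
        counts[(chrom, start)] += 1
    return counts


# Hypothetical SNP positions
snps = [("chr1", 1), ("chr1", 1_000_000), ("chr1", 1_000_001), ("chr2", 5)]
for (chrom, start), n in sorted(density_per_window(snps).items()):
    print(chrom, start, n)
```

The same function with `window=500_000` would give the 0.5 Mbp bins used for the coverage ring, applied to whatever per-window quantity is being summarized.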

Heatmap of significantly mutated genes


Linkage analysis

Note: The upper x-axis is the chromosome number; the lower x-axis is the genetic position in centimorgans (cM); the y-axis is the LOD score.
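A LOD score is the base-10 logarithm of the ratio between the likelihood of the data at a given recombination fraction θ and the likelihood under no linkage (θ = 0.5). A minimal sketch for the simplest textbook case of phase-known, fully informative meioses; the function name is hypothetical, and the multipoint scores in a plot like the one above are normally computed by dedicated linkage software rather than by this formula:

```python
import math


def two_point_lod(recombinants, meioses, theta):
    """LOD = log10[ L(theta) / L(0.5) ] for phase-known, fully informative meioses."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    k, n = recombinants, meioses
    log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1 - theta)
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null


# Hypothetical family data: 1 recombinant in 10 meioses, evaluated on a grid of theta
for theta in (0.05, 0.1, 0.2, 0.3, 0.5):
    print(theta, round(two_point_lod(1, 10, theta), 3))
```

By convention, a LOD score of 3 or more is usually taken as evidence for linkage, and the score is zero at θ = 0.5 by construction.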