
Isoform Sequencing (Full-length Transcript Sequencing)

What is Isoform Sequencing (Full-length Transcript Sequencing)?

Isoform Sequencing (Iso-Seq) using PacBio SMRT (Single Molecule, Real-Time) technology enables sequencing of full-length transcript isoforms (from the 5′ UTR to the 3′ poly-A tail) within the targeted genes. Iso-Seq is a high-throughput method for characterizing fusion genes, identifying alternative splicing, annotating genomes, and discovering novel transcripts.
Iso-Seq can be applied in medical and agricultural research, including investigating disease mechanisms, exploring drug resistance mechanisms, discovering new genes, and studying plant development and responses to biotic and abiotic stresses.

Service Specifications

Applications of Isoform Sequencing

In Medical Research
Iso-Seq can be used in medical research:

  • For investigating disease mechanisms
  • For discovering alternatively spliced transcripts as potential biomarkers
  • For exploring drug resistance mechanisms
  • For identifying new genes and transcripts
  • For improving genome annotations to recognize the coding regions, regulatory elements, and structural elements of the genes

In Agricultural Research

  • For understanding plant development under environmental stress
  • For discovering new isoforms resulting from alternative splicing

Benefits of Isoform Sequencing

  • Leading sequencing capacity, high-quality data, fast turnaround, and affordable prices.
  • Well-developed pipeline for novel transcript discovery, differential expression analysis, and functional annotation.
  • Ability to optimize the sequencing process to surpass PacBio’s standards in read length and output.

Iso-Seq Specifications: RNA Sample Requirements

 

Library Type: PacBio Sequel II/IIe RNA Library
Sample Type: Total RNA
Amount: ≥ 800 ng
Concentration: ≥ 30 ng/μl
RIN (Agilent 2100): ≥ 6.5
Purity (NanoDrop™/Agarose Gel): A260/280 = 1.8-2.2; A260/230 = 1.3-2.5; NC/QC ≤ 2.5*

*NC/QC: NanoDrop concentration/Qubit concentration
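
The sample requirements above can be checked programmatically before shipping. The following is a minimal sketch, not an official acceptance tool: the `RnaSample` class, its field names, and `failed_checks` are hypothetical names chosen for illustration, and only the thresholds come from the requirements listed above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RnaSample:
    """Measurements for one total RNA sample (field names are illustrative)."""
    amount_ng: float
    conc_ng_per_ul: float
    rin: float
    a260_280: float
    a260_230: float
    nanodrop_conc: float  # NC, same unit as qubit_conc
    qubit_conc: float     # QC


def failed_checks(s: RnaSample) -> List[str]:
    """Return the names of the listed requirements that the sample does not meet."""
    checks = {
        "amount >= 800 ng": s.amount_ng >= 800,
        "concentration >= 30 ng/ul": s.conc_ng_per_ul >= 30,
        "RIN >= 6.5": s.rin >= 6.5,
        "A260/280 in 1.8-2.2": 1.8 <= s.a260_280 <= 2.2,
        "A260/230 in 1.3-2.5": 1.3 <= s.a260_230 <= 2.5,
        # NC/QC is the NanoDrop concentration divided by the Qubit concentration.
        "NC/QC <= 2.5": s.qubit_conc > 0 and s.nanodrop_conc / s.qubit_conc <= 2.5,
    }
    return [name for name, ok in checks.items() if not ok]
```

An empty list means the sample meets every listed threshold; otherwise the returned names indicate which measurements to revisit.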

Iso-Seq Specifications: Sequencing & Analysis

Platform: PacBio Sequel system
Recommended data amount: ≥ 15 Gb per sample
Content of Analysis:
  • Data Quality Control
  • Transcriptome Analysis
  • Isoform Characterization

    Structural Category

    Length Distribution

    Transcriptome Diversity

  • Functional annotation
  • Structure analysis
  • Transcription Factor analysis

    lncRNA prediction*

    Fusion Transcript analysis*

    Alternative Splicing analysis*

    Alternative PolyAdenylation analysis*

    *Only available when reference genome is available

Note: For detailed information, please refer to the Service Specifications and contact us for customized requests.

Novogene Workflow of Iso-Seq Service

From sample and library preparation, SMRT sequencing, and data quality control to bioinformatics analysis, Novogene provides high-quality products and professional services. Each step follows high scientific standards and a carefully designed protocol to ensure high-quality research results.

Sampling:

Fresh-frozen primary and metastatic tumors with paired normal tissue

Sequencing Strategy:

1. Illumina Technology: sequenced on an Illumina HiSeq X Ten platform to generate 125 bp paired-end reads.
2. PacBio System: sequenced on a PacBio RS II single-molecule real-time (SMRT) sequencing platform using two SMRT cells.

Figure 1. a. Correlation analysis of the numbers of transcriptomic events (ATSS, AS and APA) and the numbers of detected isoforms for normal ovarian tissue, primary tumor, and distal metastasis. b. Expression of genes with multiple isoforms compared with that of genes with a single isoform.

Figure 2. Hierarchical clustering of isoform expression in normal tissue and ovarian tumors.

Figure 3. Identified somatic genetic and transcriptomic aberrations in genes involved in proteostatic stress regulation. P Primary tumor, M Metastatic tumor.
Conclusion:

This study integrated second- and third-generation sequencing platforms to generate a multidimensional dataset on a patient affected by metastatic epithelial ovarian cancer. It also shows that clinical application of the emerging long-read full-length analysis to improve molecular diagnostics is feasible and informative. The in-depth understanding of tumor transcriptome complexity enabled by the hybrid sequencing approach lays the basis for revealing novel and valid therapeutic vulnerabilities in advanced ovarian malignancies.

Self-Recognition of an Inducible Host lncRNA by RIG-I Feedback Restricts Innate Immune Response

Background:

The innate immune system senses invading pathogens via pattern recognition receptors (PRRs) and initiates an innate response to eliminate them. Retinoic acid-inducible gene-I (RIG-I), the most extensively studied PRR for RNA viruses, recognizes viral RNAs in the cytoplasm and triggers the innate immune response through the production of type I interferons (IFNs) and proinflammatory cytokines. However, the biological significance and underlying mechanisms of interactions between long non-coding RNAs (lncRNAs) and RNA-binding proteins (RBPs) in immunity and inflammation remain to be investigated. Growing evidence that RBP-lncRNA interactions affect protein function prompted the authors to ask whether RIG-I can bind "self" cellular lncRNAs and, if so, what role such self-recognition plays in maintaining immune homeostasis, either by feedback restriction or by timely termination of the innate inflammatory response induced by RIG-I recognition of "non-self" RNA.

Sequencing Strategy:

1. RIP-seq
2. Pacific Biosciences RS II platform

Figure 4. Location and read depth from cluster analysis of RIP-seq data mapped to the Lsm3 locus using the PacBio and Illumina platforms.
Conclusion:

In this study, full-length transcriptome sequencing was used to determine the full-length sequence of cytoplasmic lnc-Lsm3b, and a self-recognition model was identified in which the lncRNA binds RIG-I and inhibits its activation. This mechanism prevents overexpression of type I IFN and maintains the body's immune homeostasis. The study identified lncRNA as an important regulatory element in innate immune recognition of nucleic acids and in the regulation of inflammation, and it revealed key functions of lncRNA in antiviral responses, providing new ideas for the prevention and treatment of inflammatory diseases.

A survey of the sorghum transcriptome using single-molecule long reads

Background:

Sorghum, a C4 crop plant used for food, feed, fibre and fuel, is one of the cereals best adapted to drought and temperature; hence, it is used as a model system to investigate the molecular basis of adaptation to abiotic stresses. Although the genome sequences of several sorghum lines have been completed recently [29,30], the transcriptome is not well annotated; the extent of alternative splicing (AS), the number of splice isoforms, and the transcriptome diversity due to alternative polyadenylation (APA) are largely unknown.

Sampling:

Sorghum (Sorghum bicolor L. Moench) seedlings under drought stress and control.

Sequencing Strategy:

1. Illumina Platform: HiSeq
2. PacBio System: performed on a PacBio RS II instrument using a total of 28 SMRT cells.

Figure 5. An example of a gene that produces 13 novel splice isoforms.

Figure 6. PCR validation of alternative splicing events identified by Iso-Seq.
Conclusion:

In this study, full-length splice isoforms and APA sites of the sorghum transcriptome were sequenced and identified using Pacific Biosciences single-molecule real-time long-read isoform sequencing, and a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) was developed to identify them. The results reveal transcriptome-wide full-length isoforms at an unprecedented scale and uncover novel genes. These results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop.


CCS

CCS (Circular Consensus Sequence) reads, also known as reads of insert, are generated by aligning the subreads taken from a single ZMW (zero-mode waveguide) to each other and correcting them into a consensus. The number of CCS reads is therefore derived from the subreads.

Length distribution of CCS (Circular Consensus Sequence) reads

Note:
The x-axis represents the read length; the y-axis indicates frequency count corresponding to the read length
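
A length distribution like the one described can be tabulated in a few lines. This is a minimal sketch that assumes the CCS reads have been exported as FASTA text; the function names and the 500 bp bin size are illustrative choices, not part of the delivered pipeline.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def read_lengths(fasta_lines: Iterable[str]) -> Iterator[int]:
    """Yield the sequence length of each record in FASTA-formatted lines."""
    length = None  # None until the first header line is seen
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def length_histogram(lengths: Iterable[int], bin_size: int = 500) -> Dict[int, int]:
    """Count reads per length bin; each key is the lower bound of a bin."""
    counts = Counter((n // bin_size) * bin_size for n in lengths)
    return dict(sorted(counts.items()))
```

With a file on disk, `length_histogram(read_lengths(open(path)))` gives the x-axis bins and y-axis counts, where `path` is a placeholder for the FASTA file.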


Structural Category

Isoforms are counted by structural category. A significant number of isoforms are novel, classified as either NIC (novel in catalog) or NNC (novel not in catalog) (left). In most cases a gene has a single isoform, especially among novel genes (right).

Isoform numbers by structural category (left) and by gene type (right)

Note:
The x-axis shows isoform classification; the y-axis shows the percentage of isoforms in each classification (left);
The x-axis shows gene type; the y-axis shows the percentage of genes in each "isoforms per gene" class (right)
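
The two panels reduce to simple counts over a classification table. The sketch below assumes each isoform is available as an `(isoform_id, gene_id, structural_category)` tuple; that record layout and the function names are hypothetical, and the category labels are whatever the classification step emits.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (isoform_id, gene_id, structural_category); layout assumed for illustration
Record = Tuple[str, str, str]


def category_percentages(records: List[Record]) -> Dict[str, float]:
    """Percentage of isoforms falling in each structural category (left panel)."""
    counts = Counter(category for _, _, category in records)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


def isoforms_per_gene(records: List[Record]) -> Dict[str, int]:
    """Number of isoforms observed for each gene (basis of the right panel)."""
    return dict(Counter(gene for _, gene, _ in records))
```

Grouping the values of `isoforms_per_gene` into classes such as 1, 2-3, and more, separately for known and novel genes, would give the per-gene-type percentages.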


Length Distribution

The transcript length distribution and exon number distribution of the isoforms by the structural classification are both presented in a boxplot.

Transcript length distribution (left) and exon number distribution (right) by structural classification

Note:
The x-axis shows transcript classification; the left y-axis shows the length of transcript in each classification; the right y-axis shows the number of exons of transcript in each classification
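
The values a boxplot draws per classification can be computed with the standard library. This is a sketch under assumptions: the input is a list of `(classification, value)` pairs, where the value is either a transcript length or an exon count, and `boxplot_summary` is a hypothetical helper name.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple


def boxplot_summary(records: List[Tuple[str, int]]) -> Dict[str, Dict[str, float]]:
    """Min, quartiles, and max of the values in each classification."""
    by_class = defaultdict(list)
    for classification, value in records:
        by_class[classification].append(value)

    summary = {}
    for classification, values in by_class.items():
        if len(values) < 2:
            continue  # quartiles are not defined for fewer than two values
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary[classification] = {
            "min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values),
        }
    return summary
```

Plotting libraries may use a different quartile convention or draw whiskers at 1.5 times the interquartile range, so the numbers can differ slightly from a rendered figure.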


Gene Ontology

The Gene Ontology (GO) project aims to provide consistent descriptions of gene products across databases. GO vocabularies (ontologies) describe gene products in terms of their associated biological processes, molecular functions, and cellular components in a species-independent manner. GO annotation is only available for identified novel genes and isoforms.

Gene Ontology Annotation Classification

Note:
The x-axis shows the three GO categories, and the y-axis shows the number of differential genes annotated to each term (including its sub-terms). From left to right, the three categories are biological processes, cellular components, and molecular functions.

Structure analysis


    CNCI (Coding-Non-Coding Index)

    CNCI (Coding-Non-Coding Index) classifies transcripts assembled from whole-transcriptome sequencing data as coding or non-coding based on their intrinsic sequence composition. PLEK predicts long non-coding RNAs and mRNAs in the absence of genomic sequences or annotations, using an improved k-mer scheme and a support vector machine (SVM) algorithm. The overlap between the PLEK and CNCI results is shown in a Venn diagram.
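
To make the idea of k-mer features concrete, the sketch below computes plain k-mer frequencies for one transcript. It is an illustration only and does not reproduce PLEK's improved k-mer scheme or its SVM; the function name and the default `k = 3` are assumptions.

```python
from collections import Counter
from itertools import product
from typing import Dict


def kmer_frequencies(seq: str, k: int = 3) -> Dict[str, float]:
    """Relative frequency of every possible k-mer over A, C, G, T in a sequence."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Skip windows containing ambiguous bases such as N.
    valid = [w for w in windows if set(w) <= set("ACGT")]
    counts = Counter(valid)
    total = len(valid)

    frequencies = {}
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        frequencies[kmer] = counts[kmer] / total if total else 0.0
    return frequencies
```

A fixed-length vector of this kind (64 values for k = 3) is the sort of input a classifier such as an SVM can be trained on.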

    Venn diagrams of results from PLEK and CNCI
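
The three regions of a two-set Venn diagram follow directly from set operations. This sketch assumes each tool's output has already been reduced to a collection of transcript IDs predicted as lncRNA; the function name and region labels are illustrative.

```python
from typing import Dict, Iterable


def venn_counts(cnci_ids: Iterable[str], plek_ids: Iterable[str]) -> Dict[str, int]:
    """Sizes of the three regions of a two-set Venn diagram."""
    cnci, plek = set(cnci_ids), set(plek_ids)
    return {
        "CNCI only": len(cnci - plek),
        "both": len(cnci & plek),
        "PLEK only": len(plek - cnci),
    }
```

Transcripts in the "both" region are supported by both predictors and are the usual candidates for downstream lncRNA analysis.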


    Alternative Splicing

    Summary of alternative splicing events

    Note:
    Alt.3’: Alternative 3′ splice site; Alt.5’: Alternative 5′ splice site
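
The difference between the two event types depends on strand, which the following sketch makes explicit. It assumes two overlapping internal exons from different isoforms of the same gene, given as `(start, end)` genomic coordinates; `classify_exon_pair` is a hypothetical helper, not the method used in the report.

```python
from typing import Tuple

Exon = Tuple[int, int]  # (genomic start, genomic end), start <= end


def classify_exon_pair(exon_a: Exon, exon_b: Exon, strand: str) -> str:
    """Classify two internal exons that share exactly one boundary."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    (start_a, end_a), (start_b, end_b) = exon_a, exon_b
    same_start, same_end = start_a == start_b, end_a == end_b
    if same_start and same_end:
        return "identical"
    if not (same_start or same_end):
        return "other"
    # On the + strand the genomic end of an exon borders the 5' splice site
    # (donor) and the genomic start borders the 3' splice site (acceptor).
    # On the - strand the roles are reversed.
    differs_at_genomic_end = same_start
    if (strand == "+") == differs_at_genomic_end:
        return "Alt.5'"
    return "Alt.3'"
```

First and last exons are excluded by assumption, because their outer boundaries reflect transcription start and polyadenylation sites rather than splice sites.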

    *Please contact us to get the full demo report.