What is Isoform Sequencing (Full-length Transcript Sequencing)?
Isoform Sequencing (Iso-Seq) using PacBio SMRT (Single Molecule, Real-Time) technology enables sequencing of full-length transcript isoforms (from 5’UTR to 3’poly-A tail) within the targeted genes. Iso-Seq is a high-throughput method for characterizing fusion genes, identifying alternative splicing, annotating genomes, and discovering novel transcripts.
Iso-Seq can be fully leveraged for medical and agricultural research purposes, including disease mechanism investigation, exploring drug resistance mechanisms, discovering new genes, as well as studying plant development and biotic and abiotic stresses.
Applications of Isoform Sequencing
In Medical Research
Iso-Seq can be used effectively in medical research:
- For investigating disease mechanisms
- For discovering alternatively spliced transcripts as potential biomarkers
- For exploring drug resistance mechanisms
- For identifying new genes and transcripts
- For improving genome annotations to recognize the coding regions, regulatory elements, and structural elements of the genes
In Agricultural Research
- For understanding plant development under environmental stress
- For discovering new isoforms resulting from alternative splicing
Benefits of Isoform Sequencing
- Leading sequencing capacity, high-quality data, fast turnaround, and affordable prices.
- Well-developed pipeline to discover novel transcripts, differential expressions, and function annotations.
- Ability to optimize the sequencing process to surpass PacBio’s standards in read length and output.
Iso-Seq Specifications: RNA Sample Requirements
|Library Type||Sample Type||Amount||Concentration||RIN (Agilent 2100)||Purity|
|PacBio Sequel II/IIe||Total RNA||≥ 800 ng||≥ 30 ng/μl||≥ 6.5||A260/280=1.8-2.2;|
*Nc/Qc: NanoDrop concentration/Qubit concentration
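The thresholds in the table above can be turned into a quick pre-submission check. The following is a minimal sketch and not part of the Novogene service: the function name `check_rna_sample` and its parameters are hypothetical, and only the limits stated in the table (amount, concentration, RIN, and the A260/280 range) are encoded.

```python
def check_rna_sample(amount_ng, conc_ng_per_ul, rin, a260_280):
    """Return a list of failed requirements (an empty list means the sample passes).

    Thresholds are copied from the table above; the function itself is a
    hypothetical helper for illustration only.
    """
    failures = []
    if amount_ng < 800:
        failures.append("amount < 800 ng")
    if conc_ng_per_ul < 30:
        failures.append("concentration < 30 ng/ul")
    if rin < 6.5:
        failures.append("RIN < 6.5")
    if not (1.8 <= a260_280 <= 2.2):
        failures.append("A260/280 outside 1.8-2.2")
    return failures


print(check_rna_sample(1000, 45, 7.2, 2.0))  # sample that meets every limit
print(check_rna_sample(500, 45, 6.0, 2.0))   # sample that misses amount and RIN
```

Returning the list of failed criteria, rather than a single pass/fail flag, makes it clear which measurement needs attention before a sample is shipped.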
Iso-Seq Specifications: Sequencing & Analysis
|Platform||PacBio Sequel system|
|Recommended data amount||≥ 15 Gb (gigabases) per sample|
Content of Analysis
Transcription Factor analysis
Fusion Transcript analysis*
Alternative Splicing analysis*
Alternative PolyAdenylation analysis*
*Only available when reference genome is available
Novogene Workflow of Iso-Seq Service
From sample and library preparation, SMRT sequencing, and data quality control, to bioinformatics analysis, Novogene provides high-quality products and professional services. Each step is performed in agreement with a high scientific standard and meticulous design to ensure high-quality research results.
Fresh-frozen primary and metastatic tumors with paired normal tissue
1. Illumina Technology: sequenced on an Illumina HiSeq X Ten platform to generate 125 bp paired-end reads.
2. PacBio System: sequenced on a PacBio RS II single-molecule real-time (SMRT) sequencing platform using two SMRT cells.
This study integrated second- and third-generation sequencing platforms to generate a multidimensional dataset on a patient affected by metastatic epithelial ovarian cancer. It also shows that clinical application of emerging long-read, full-length transcript analysis to improve molecular diagnostics is feasible and informative. The in-depth understanding of tumor transcriptome complexity enabled by the hybrid sequencing approach lays the basis for revealing novel and valid therapeutic vulnerabilities in advanced ovarian malignancies.
Self-Recognition of an Inducible Host lncRNA by RIG-I Feedback Restricts Innate Immune Response
The innate immune system senses invading pathogens via pattern recognition receptors (PRRs) to initiate an efficient innate response that eliminates them. As the most extensively studied PRR for recognition of RNA viruses, retinoic acid-inducible gene-I (RIG-I) has been shown to recognize viral RNAs in the cytoplasm and trigger the innate immune response through the production of type I interferons (IFNs) and proinflammatory cytokines. However, the biological significance and underlying mechanisms of the interaction of lncRNAs with RNA-binding proteins (RBPs) in immunity and inflammation remain to be further investigated. Increasing evidence for RBP-lncRNA interactions associated with protein function prompted the question of whether RIG-I can bind to “self” cellular lncRNAs and, if so, what the biological function and importance of such self-recognition is in maintaining immune homeostasis, by feedback-restricting or promptly terminating the innate inflammatory response induced by RIG-I recognition of “non-self” RNA.
2. Pacific Biosciences RS II platform
In this study, full-length transcriptome sequencing was used to identify the full-length sequence of cytoplasmic lnc-Lsm3b, and a model was found in which self-recognition of the lncRNA by RIG-I inhibits RIG-I activation. This mechanism prevents overexpression of IFN-I and so maintains the body’s immune homeostasis. The lncRNA was identified as an important regulatory element for innate immune recognition of nucleic acids and for the regulation of inflammation. The study also revealed key functions of lncRNA in antiviral responses, providing new ideas for research on the prevention and treatment of inflammatory diseases.
A survey of the sorghum transcriptome using single-molecule long reads
Sorghum, a C4 crop plant used for food, feed, fibre and fuel, is one of the cereals best adapted to drought and high temperature; hence, it is used as a model system to investigate the molecular basis of adaptation to abiotic stresses. Although the genome sequences of several sorghum lines have been completed recently [29,30], the transcriptome is not well annotated; the extent of alternative splicing (AS), the number of splice isoforms and the transcriptome diversity due to alternative polyadenylation (APA) are largely unknown.
Sorghum (Sorghum bicolor L. Moench) seedlings under drought stress and control.
1. Illumina Platform: HiSeq
2. PacBio System: performed on a PacBio RS II instrument for a total of 28 SMRT cells.
In this study, full-length splice isoforms and APA sites of the sorghum transcriptome were sequenced using Pacific Biosciences single-molecule real-time long-read isoform sequencing, and a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) was developed to identify them. The analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale and uncovers novel genes. These results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop.
CCS (Circular Consensus Sequence) reads, also known as reads of insert, are created by aligning the subreads taken from a single ZMW (zero-mode waveguide) to each other and correcting them into a consensus. The number of CCS reads is derived from the subreads.
Length distribution of CCS (Circular Consensus Sequence) reads
The x-axis represents the read length; the y-axis indicates frequency count corresponding to the read length
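A length distribution like the one described above can be tabulated directly from a file of CCS reads. The following is a minimal sketch using only the Python standard library, assuming the reads are available in FASTA format; the function names, the 500 bp bin size, and the toy sequences are illustrative assumptions, not the pipeline used to produce the report.

```python
from collections import Counter


def read_lengths(fasta_lines):
    """Yield the length of each sequence in an iterable of FASTA lines."""
    length = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length  # finish the previous record
            length = 0
        elif line and length is not None:
            length += len(line)  # sequences may span several lines
    if length is not None:
        yield length


def length_histogram(lengths, bin_size=500):
    """Count reads per length bin; keys are the lower edge of each bin."""
    return Counter((n // bin_size) * bin_size for n in lengths)


# Toy input standing in for a CCS FASTA file (the sequences are made up).
BIN_SIZE = 500
demo = [">read1", "ACGT" * 300, ">read2", "ACGT" * 100, "ACGT" * 50,
        ">read3", "A" * 1300]
hist = length_histogram(read_lengths(demo), BIN_SIZE)
for lower in sorted(hist):
    print(f"{lower}-{lower + BIN_SIZE - 1} bp: {hist[lower]} reads")
```

For a real file, the list `demo` would be replaced by an open file handle, for example `open("ccs.fasta")`, where the file name is a placeholder.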
Distribution of isoform numbers by structural classification. A significant number of isoforms fall into the NIC or NNC (novel isoform) categories (left). In most cases a one-gene-one-isoform distribution is observed, especially for novel genes (right).
Isoform numbers by structural category (left) and by gene type (right)
The x-axis shows isoform classification; the y-axis shows the percentage of isoforms in each classification (left).
The x-axis shows gene type; the y-axis shows the percentage of genes in each “isoforms per gene” class (right).
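The two summaries plotted in this figure reduce to simple counting. The sketch below is a hedged illustration: the record layout `(isoform_id, gene_id, structural_category)`, the function names, and the toy data are hypothetical. NIC and NNC come from the text above, and FSM (full splice match) is included on the assumption that SQANTI-style category labels are in use.

```python
from collections import Counter

# Hypothetical records: (isoform_id, gene_id, structural_category).
ISOFORMS = [
    ("iso1", "geneA", "FSM"),
    ("iso2", "geneA", "NIC"),
    ("iso3", "geneB", "NNC"),
    ("iso4", "geneC", "NNC"),
]


def category_percentages(records):
    """Percentage of isoforms in each structural category (left panel)."""
    counts = Counter(category for _, _, category in records)
    total = sum(counts.values())
    return {cat: 100.0 * n / total for cat, n in counts.items()}


def isoforms_per_gene_distribution(records):
    """Map 'isoforms per gene' to the number of genes with that count (right panel)."""
    per_gene = Counter(gene for _, gene, _ in records)
    return Counter(per_gene.values())


print(category_percentages(ISOFORMS))
print(dict(isoforms_per_gene_distribution(ISOFORMS)))
```

Splitting the records by gene type (for example annotated versus novel genes) before calling `isoforms_per_gene_distribution` would give one distribution per gene type, as in the right-hand panel.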
The transcript length distribution and exon number distribution of the isoforms by the structural classification are both presented in a boxplot.
Transcript length distribution (left) and exon number distribution (right) by structural classification
The x-axis shows transcript classification; the left y-axis shows the length of transcript in each classification; the right y-axis shows the number of exons of transcript in each classification
The Gene Ontology (GO) project aims to provide reliable descriptions of gene products within several databases. GO vocabularies (ontologies) explain gene products concerning their associated biological processes, molecular functions, and cellular components in a species-independent approach. GO annotation is only available for identified novel genes and isoforms.
Gene Ontology Annotation Classification
The x-axis shows GO terms grouped into the three basic GO categories (from left to right: biological process, cellular component, and molecular function), and the y-axis shows the number of differential genes annotated to each term (including its sub-terms).
CNCI (Coding-Non-Coding Index)
CNCI (Coding-Non-Coding Index) is a tool that distinguishes protein-coding from non-coding sequences based on their intrinsic sequence composition, and offers accurate classification of transcripts assembled from whole-transcriptome sequencing data. PLEK is a tool for predicting long non-coding RNAs and mRNAs in the absence of genomic sequences or annotations, using a computational pipeline based on an improved k-mer scheme and a support vector machine (SVM) algorithm. The results from PLEK and CNCI are shown in the Venn diagrams.
Venn diagrams of results from PLEK and CNCI
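The numbers in such a Venn diagram are plain set operations on the transcript IDs each tool reports. The sketch below assumes the per-tool predictions have already been parsed into collections of IDs; it does not call PLEK or CNCI, and the function name and the example IDs are hypothetical.

```python
def venn_counts(plek_ids, cnci_ids):
    """Return the three region sizes of a two-set Venn diagram."""
    plek, cnci = set(plek_ids), set(cnci_ids)
    return {
        "PLEK only": len(plek - cnci),
        "both": len(plek & cnci),
        "CNCI only": len(cnci - plek),
    }


# Made-up IDs of transcripts that each tool predicted to be non-coding.
plek_noncoding = ["tx1", "tx2", "tx3", "tx5"]
cnci_noncoding = ["tx2", "tx3", "tx4"]

print(venn_counts(plek_noncoding, cnci_noncoding))
```

Transcripts in the "both" region are supported by the two independent methods and are therefore the more conservative set to carry forward.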
Summary of alternative splicing events
Alt.3’: Alternative 3′ splice site; Alt.5’: Alternative 5′ splice site
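To make the two splice-site categories concrete: an alternative 5′ splice site changes the donor end of an exon, and an alternative 3′ splice site changes the acceptor end. Which genomic boundary that corresponds to depends on the strand. The sketch below is a simplified illustration and not the method used in the report. It assumes two overlapping exons given as `(start, end)` genomic coordinates with `start < end`, and the function name is hypothetical.

```python
def classify_splice_site_change(exon_a, exon_b, strand):
    """Classify two exon variants that differ at exactly one boundary.

    Returns "Alt.5'" (donor site differs), "Alt.3'" (acceptor site differs),
    or None when the exons are identical or differ at both boundaries.
    """
    (a_start, a_end), (b_start, b_end) = exon_a, exon_b
    same_start = a_start == b_start
    same_end = a_end == b_end
    if same_start == same_end:
        return None
    differs_at_genomic_end = same_start
    if strand == "+":
        # On the plus strand the exon's genomic end borders the donor (5') site.
        return "Alt.5'" if differs_at_genomic_end else "Alt.3'"
    # On the minus strand the exon's genomic end borders the acceptor (3') site.
    return "Alt.3'" if differs_at_genomic_end else "Alt.5'"


print(classify_splice_site_change((100, 200), (100, 220), "+"))
print(classify_splice_site_change((100, 200), (90, 200), "+"))
print(classify_splice_site_change((100, 200), (100, 220), "-"))
```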
*Please contact us to get the full demo report.