subsetNewTranscripts.Rd
`subsetNewTranscripts()` retains transcripts in `query` that are distinct from those in `ref`.
subsetNewTranscripts(query, ref, refine.by = c("none", "intron", "cds"))
`query`: GRanges object containing query GTF data.
`ref`: GRanges object containing reference GTF data.
`refine.by`: Whether to refine the selection by also removing query transcripts whose intron or CDS structure matches that of a reference transcript. Defaults to "none"; can be set to "intron" or "cds".
Filtered GRanges GTF object
`subsetNewTranscripts()` compares query and reference GTF GRanges and returns the query transcripts whose exon structures differ from those of reference transcripts. Transcriptome assemblers may sometimes extend the 5' and 3' ends of known transcripts based on experimental data. Such transcripts differ from their annotated counterparts only at the terminal exon boundaries and can be removed by setting `refine.by = "intron"`, which additionally removes query transcripts whose intron structures are identical to those of a reference transcript. Alternatively, setting `refine.by = "cds"` selects transcripts with unique CDS coordinates.
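The difference between comparing exon structures and comparing intron structures can be illustrated with a small base R sketch. This is not the function's actual implementation: the toy coordinates, the transcript names (`refA`, `q1`, `q2`, `q3`) and the helpers `exon_keys()` and `intron_keys()` are all hypothetical and exist only to show the idea.

```r
# Hypothetical toy exon tables (not the package datasets).
ref <- data.frame(
  transcript_id = c("refA", "refA", "refA"),
  start = c(100, 300, 500),
  end   = c(200, 400, 600)
)
query <- data.frame(
  transcript_id = c("q1", "q1", "q1",   # identical to refA
                    "q2", "q2", "q2",   # refA with extended 5' and 3' ends
                    "q3", "q3"),        # skips the middle exon of refA
  start = c(100, 300, 500,  80, 300, 500, 100, 500),
  end   = c(200, 400, 600, 200, 400, 650, 200, 600)
)

# Hypothetical helper: collapse each transcript's exons into one string key.
exon_keys <- function(df) {
  df <- df[order(df$transcript_id, df$start), ]
  vapply(split(df, df$transcript_id),
         function(x) paste(x$start, x$end, sep = "-", collapse = ";"),
         character(1))
}

# Hypothetical helper: introns are the gaps between consecutive exons.
intron_keys <- function(df) {
  df <- df[order(df$transcript_id, df$start), ]
  vapply(split(df, df$transcript_id), function(x) {
    n <- nrow(x)
    if (n < 2) return("")  # single-exon transcripts have no introns
    paste(x$end[-n] + 1, x$start[-1] - 1, sep = "-", collapse = ";")
  }, character(1))
}

# Query transcripts whose exon structure is absent from the reference.
qe <- exon_keys(query)
new_by_exon <- names(qe)[!qe %in% exon_keys(ref)]

# Query transcripts whose intron structure is absent from the reference.
qi <- intron_keys(query)
new_by_intron <- names(qi)[!qi %in% intron_keys(ref)]
```

In this toy case, `q2` counts as new when only exon coordinates are compared, because its outer boundaries were extended, but it is dropped once intron structures are compared. This mirrors the situation that `refine.by = "intron"` is described as handling.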
# Load dataset
data(matched_query_gtf, ref_gtf)
# Shortlist new transcripts
subsetNewTranscripts(matched_query_gtf, ref_gtf)
#> Removing transcripts with exact exon coordinates
#> GRanges object with 27 ranges and 6 metadata columns:
#> seqnames ranges strand | type transcript_id
#> <Rle> <IRanges> <Rle> | <factor> <character>
#> [1] chr10 79854652-79863424 + | transcript transcript3
#> [2] chr10 79854652-79854721 + | exon transcript3
#> [3] chr10 79856504-79856534 + | exon transcript3
#> [4] chr10 79858752-79858824 + | exon transcript3
#> [5] chr10 79858952-79859271 + | exon transcript3
#> ... ... ... ... . ... ...
#> [23] chr10 79862014-79862047 + | exon transcript4
#> [24] chr10 79862449-79862541 + | exon transcript4
#> [25] chr10 79862653-79862869 + | exon transcript4
#> [26] chr10 79862978-79863055 + | exon transcript4
#> [27] chr10 79863145-79864359 + | exon transcript4
#> gene_id old_gene_id match_level gene_name
#> <character> <character> <numeric> <character>
#> [1] ENSMUSG00000006498.14 GeneA 4 Ptbp1
#> [2] ENSMUSG00000006498.14 GeneA 4 Ptbp1
#> [3] ENSMUSG00000006498.14 GeneA 4 Ptbp1
#> [4] ENSMUSG00000006498.14 GeneA 4 Ptbp1
#> [5] ENSMUSG00000006498.14 GeneA 4 Ptbp1
#> ... ... ... ... ...
#> [23] ENSMUSG00000006498.14 GeneA 4 Ptbp1
#> [24] ENSMUSG00000006498.14 GeneA 4 Ptbp1
#> [25] ENSMUSG00000006498.14 GeneA 4 Ptbp1
#> [26] ENSMUSG00000006498.14 GeneA 4 Ptbp1
#> [27] ENSMUSG00000006498.14 GeneA 4 Ptbp1
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths