`subsetNewTranscripts()` retains transcripts in `query` that are distinct from those in `ref`.

subsetNewTranscripts(query, ref, refine.by = c("none", "intron", "cds"))

Arguments

query

GRanges object containing query GTF data.

ref

GRanges object containing reference GTF data.

refine.by

Whether to refine the selection by also removing query transcripts that share the same intron or CDS structure as a reference transcript. Defaults to "none"; set to "intron" or "cds" to refine by intron or CDS structure, respectively.

Value

Filtered GRanges GTF object

Details

`subsetNewTranscripts()` compares query and reference GTF GRanges and returns the query transcripts whose exon structures differ from those of the reference transcripts. Transcriptome assemblers may sometimes extend the 5' and 3' ends of known transcripts based on experimental data, so an already annotated transcript can appear new purely because its terminal exon coordinates differ. Such transcripts can be removed by setting `refine.by` to "intron", which additionally compares intron structures and removes query transcripts whose intron structure is identical to that of a reference transcript. Alternatively, setting `refine.by` to "cds" selects transcripts with unique CDS coordinates.
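
The difference between comparing exon structures and comparing intron structures can be illustrated with a minimal base-R sketch on toy data. This is not the package's implementation: the data frames, the helper functions `exon_key()` and `intron_key()`, and all coordinates are hypothetical and exist only for illustration.

```r
# Toy exon tables (hypothetical coordinates, not the package datasets).
ref <- data.frame(
  transcript_id = c("refA", "refA", "refA"),
  start = c(100, 300, 500),
  end   = c(200, 400, 600)
)
query <- data.frame(
  transcript_id = c("q1", "q1", "q1",  # same exons as refA
                    "q2", "q2", "q2",  # refA with extended 5' and 3' ends
                    "q3", "q3"),       # skips the middle exon of refA
  start = c(100, 300, 500,  50, 300, 500, 100, 500),
  end   = c(200, 400, 600, 200, 400, 650, 200, 600)
)

# Hypothetical helper: one string per transcript listing its exon coordinates.
exon_key <- function(df) {
  df <- df[order(df$start), ]
  vapply(split(df, df$transcript_id),
         function(tx) paste(tx$start, tx$end, sep = "-", collapse = ","),
         character(1))
}

# Hypothetical helper: one string per transcript listing its intron boundaries
# (end of each exon paired with the start of the next exon).
intron_key <- function(df) {
  df <- df[order(df$start), ]
  vapply(split(df, df$transcript_id),
         function(tx) paste(head(tx$end, -1), tail(tx$start, -1),
                            sep = "-", collapse = ","),
         character(1))
}

same_exons   <- exon_key(query) %in% exon_key(ref)
same_introns <- intron_key(query) %in% intron_key(ref)

# Analogous to refine.by = "none": drop only exact exon matches.
new_by_exon <- names(exon_key(query))[!same_exons]

# Analogous to refine.by = "intron": also drop identical intron structures.
new_by_intron <- names(exon_key(query))[!same_exons & !same_introns]
```

In this toy case, `q2` survives the exon comparison because its terminal exons are longer, but it is dropped once intron structures are compared; only `q3` is retained under both criteria. With the package function itself, the corresponding refinement is requested with `subsetNewTranscripts(query, ref, refine.by = "intron")`.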

Author

Fursham Hamid

Examples

# Load dataset
data(matched_query_gtf, ref_gtf)

# Shortlist new transcripts
subsetNewTranscripts(matched_query_gtf, ref_gtf)
#> Removing transcripts with exact exon coordinates
#> GRanges object with 27 ranges and 6 metadata columns:
#>        seqnames            ranges strand |       type transcript_id
#>           <Rle>         <IRanges>  <Rle> |   <factor>   <character>
#>    [1]    chr10 79854652-79863424      + | transcript   transcript3
#>    [2]    chr10 79854652-79854721      + |       exon   transcript3
#>    [3]    chr10 79856504-79856534      + |       exon   transcript3
#>    [4]    chr10 79858752-79858824      + |       exon   transcript3
#>    [5]    chr10 79858952-79859271      + |       exon   transcript3
#>    ...      ...               ...    ... .        ...           ...
#>   [23]    chr10 79862014-79862047      + |       exon   transcript4
#>   [24]    chr10 79862449-79862541      + |       exon   transcript4
#>   [25]    chr10 79862653-79862869      + |       exon   transcript4
#>   [26]    chr10 79862978-79863055      + |       exon   transcript4
#>   [27]    chr10 79863145-79864359      + |       exon   transcript4
#>                      gene_id old_gene_id match_level   gene_name
#>                  <character> <character>   <numeric> <character>
#>    [1] ENSMUSG00000006498.14       GeneA           4       Ptbp1
#>    [2] ENSMUSG00000006498.14       GeneA           4       Ptbp1
#>    [3] ENSMUSG00000006498.14       GeneA           4       Ptbp1
#>    [4] ENSMUSG00000006498.14       GeneA           4       Ptbp1
#>    [5] ENSMUSG00000006498.14       GeneA           4       Ptbp1
#>    ...                   ...         ...         ...         ...
#>   [23] ENSMUSG00000006498.14       GeneA           4       Ptbp1
#>   [24] ENSMUSG00000006498.14       GeneA           4       Ptbp1
#>   [25] ENSMUSG00000006498.14       GeneA           4       Ptbp1
#>   [26] ENSMUSG00000006498.14       GeneA           4       Ptbp1
#>   [27] ENSMUSG00000006498.14       GeneA           4       Ptbp1
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths