Predict NMD sensitivity on mRNA transcripts

predictNMD(x, ..., cds = NULL, NMD_threshold = 50, progress_bar = TRUE)

Arguments

x

Can be a GRanges object containing exon and CDS transcript features in GTF format.

Can be a GRangesList object containing exon features for a list of transcripts.If so, `cds` argument have to be provided.

Can be a GRanges object containing exon features for a transcript. If so, `cds` argument have to be provided.

...

Logical conditions to pass to dplyr::filter to subset transcripts for analysis. Variables are metadata information found in `x` and multiple conditions can be provided delimited by comma. Example: transcript_id == "transcript1"

cds

If `x` is a GRangesList object, `cds` has to be a GRangesList containing CDS features for the list of transcripts in `x`. List names in `x` and `cds` have to match.

If `x` is a GRanges object, `cds` has to be a GRanges containing CDS features for the transcript in `x`.

NMD_threshold

Minimum distance of stop_codon to last exon junction (EJ) which triggers NMD. Default = 50bp

progress_bar

Whether to display progress Default = TRUE

Value

Dataframe with prediction of NMD sensitivity and NMD features:

is_NMD: logical value in prediciting transcript sensitivity to NMD

stop_to_lastEJ: Integer value of the number of bases between the first base of the stop_codon to the last base of EJ. A positive value indicates that the last EJ is downstream of the stop_codon.

num_of_down_EJs: Number of EJs downstream of the stop_codon.

`3_UTR_length`: Length of 3' UTR

Author

Fursham Hamid

Examples


## ---------------------------------------------------------------------
## EXAMPLE USING SAMPLE DATASET
## ---------------------------------------------------------------------

# Load datasets
data(new_query_gtf, query_exons, query_cds)

## Using GTF GRanges as input
predictNMD(new_query_gtf)
#>     Predicting NMD sensitivities for 4 mRNAs
#> # A tibble: 4 × 6
#>   transcript  stop_to_lastEJ num_of_downEJs `3'UTR_length` is_NMD PTC_coord     
#>   <chr>                <dbl>          <int>          <dbl> <lgl>  <chr>         
#> 1 transcript1           -130              0           1158 FALSE  NA            
#> 2 transcript2           -130              0           1497 FALSE  NA            
#> 3 transcript3            364              3            644 TRUE   chr10:79862473
#> 4 transcript4           -130              0           1085 FALSE  NA            

### Transcripts for analysis can be subsetted using logical conditions
predictNMD(new_query_gtf, transcript_id == "transcript1")
#>     Predicting NMD sensitivities for 1 mRNAs
#> # A tibble: 1 × 6
#>   transcript stop_to_lastEJ num_of_downEJs `3'UTR_length` is_NMD PTC_coord
#>   <chr>               <dbl>          <int>          <dbl> <lgl>  <chr>    
#> 1 transcript           -130              0           1158 FALSE  NA       
predictNMD(new_query_gtf, 
transcript_id %in% c("transcript1", "transcript3"))
#>     Predicting NMD sensitivities for 2 mRNAs
#> # A tibble: 2 × 6
#>   transcript  stop_to_lastEJ num_of_downEJs `3'UTR_length` is_NMD PTC_coord     
#>   <chr>                <dbl>          <int>          <dbl> <lgl>  <chr>         
#> 1 transcript1           -130              0           1158 FALSE  NA            
#> 2 transcript3            364              3            644 TRUE   chr10:79862473


## Using exon and CDS GRangesLists as input
predictNMD(query_exons, cds = query_cds)
#>     Predicting NMD sensitivities for 4 mRNAs
#> # A tibble: 4 × 6
#>   transcript  stop_to_lastEJ num_of_downEJs `3'UTR_length` is_NMD PTC_coord     
#>   <chr>                <dbl>          <int>          <dbl> <lgl>  <chr>         
#> 1 transcript1           -130              0           1158 FALSE  NA            
#> 2 transcript2           -130              0           1497 FALSE  NA            
#> 3 transcript3            364              3            644 TRUE   chr10:79862473
#> 4 transcript4           -130              0           1085 FALSE  NA            
predictNMD(query_exons, cds = query_cds, transcript_id == "transcript3")
#>     Predicting NMD sensitivities for 1 mRNAs
#> # A tibble: 1 × 6
#>   transcript stop_to_lastEJ num_of_downEJs `3'UTR_length` is_NMD PTC_coord     
#>   <chr>               <dbl>          <int>          <dbl> <lgl>  <chr>         
#> 1 transcript            364              3            644 TRUE   chr10:79862473


## Using exon and CDS GRanges as input
predictNMD(query_exons[[3]], cds = query_cds[[3]])
#>     Predicting NMD sensitivities for 1 mRNAs
#> # A tibble: 1 × 6
#>   transcript stop_to_lastEJ num_of_downEJs `3'UTR_length` is_NMD PTC_coord     
#>   <chr>               <dbl>          <int>          <dbl> <lgl>  <chr>         
#> 1 transcript            364              3            644 TRUE   chr10:79862473



## ---------------------------------------------------------------------
## EXAMPLE USING TRANSCRIPT ANNOTATION
## ---------------------------------------------------------------------
# \donttest{
library(AnnotationHub)
#> Loading required package: BiocFileCache
#> Loading required package: dbplyr
#> 
#> Attaching package: ‘AnnotationHub’
#> The following object is masked from ‘package:rtracklayer’:
#> 
#>     hubUrl

## Retrieve GRCm38 trancript annotation
ah <- AnnotationHub()
#> snapshotDate(): 2022-10-31
GRCm38_gtf <- ah[["AH60127"]]
#> downloading 1 resources
#> retrieving 1 resource
#> loading from cache
#> Importing File into R ..
#> 'getOption("repos")' replaces Bioconductor standard repositories, see
#> '?repositories' for details
#> 
#> replacement repositories:
#>     CRAN: https://cloud.r-project.org
#> 'getOption("repos")' replaces Bioconductor standard repositories, see
#> '?repositories' for details
#> 
#> replacement repositories:
#>     CRAN: https://cloud.r-project.org

## Run tool on specific gene family
predictNMD(GRCm38_gtf, gene_name == "Ptbp1")
#>     Predicting NMD sensitivities for 9 mRNAs
#> # A tibble: 9 × 6
#>   transcript         stop_to_lastEJ num_of_downEJs `3'UTR_length` is_NMD PTC_c…¹
#>   <chr>                       <dbl>          <int>          <dbl> <lgl>  <chr>  
#> 1 ENSMUST00000057343            364              3            644 TRUE   10:798…
#> 2 ENSMUST00000095457           -130              0           1085 FALSE  NA     
#> 3 ENSMUST00000165704           -130              0           1497 FALSE  NA     
#> 4 ENSMUST00000165724            286              2            362 TRUE   10:798…
#> 5 ENSMUST00000168683           -145              0              0 FALSE  NA     
#> 6 ENSMUST00000169091            -36              0              0 FALSE  NA     
#> 7 ENSMUST00000169483            139              1            152 TRUE   10:798…
#> 8 ENSMUST00000171599            -63              0              0 FALSE  NA     
#> 9 ENSMUST00000172282           -130              0           1158 FALSE  NA     
#> # … with abbreviated variable name ¹​PTC_coord
# }