predictNMD.Rd
Predict NMD sensitivity on mRNA transcripts
predictNMD(x, ..., cds = NULL, NMD_threshold = 50, progress_bar = TRUE)
Can be a GRanges object containing exon and CDS transcript features in GTF format.
Can be a GRangesList object containing exon features for a list of transcripts.If so, `cds` argument have to be provided.
Can be a GRanges object containing exon features for a transcript. If so, `cds` argument have to be provided.
Logical conditions to pass to dplyr::filter to subset transcripts for analysis. Variables are metadata information found in `x` and multiple conditions can be provided delimited by comma. Example: transcript_id == "transcript1"
If `x` is a GRangesList object, `cds` has to be a GRangesList containing CDS features for the list of transcripts in `x`. List names in `x` and `cds` have to match.
If `x` is a GRanges object, `cds` has to be a GRanges containing CDS features for the transcript in `x`.
Minimum distance of stop_codon to last exon junction (EJ) which triggers NMD. Default = 50bp
Whether to display progress Default = TRUE
Dataframe with prediction of NMD sensitivity and NMD features:
is_NMD: logical value in prediciting transcript sensitivity to NMD
stop_to_lastEJ: Integer value of the number of bases between the first base of the stop_codon to the last base of EJ. A positive value indicates that the last EJ is downstream of the stop_codon.
num_of_down_EJs: Number of EJs downstream of the stop_codon.
`3_UTR_length`: Length of 3' UTR
## ---------------------------------------------------------------------
## EXAMPLE USING SAMPLE DATASET
## ---------------------------------------------------------------------
# Load datasets
data(new_query_gtf, query_exons, query_cds)
## Using GTF GRanges as input
predictNMD(new_query_gtf)
#> Predicting NMD sensitivities for 4 mRNAs
#> # A tibble: 4 × 6
#> transcript stop_to_lastEJ num_of_downEJs `3'UTR_length` is_NMD PTC_coord
#> <chr> <dbl> <int> <dbl> <lgl> <chr>
#> 1 transcript1 -130 0 1158 FALSE NA
#> 2 transcript2 -130 0 1497 FALSE NA
#> 3 transcript3 364 3 644 TRUE chr10:79862473
#> 4 transcript4 -130 0 1085 FALSE NA
### Transcripts for analysis can be subsetted using logical conditions
predictNMD(new_query_gtf, transcript_id == "transcript1")
#> Predicting NMD sensitivities for 1 mRNAs
#> # A tibble: 1 × 6
#> transcript stop_to_lastEJ num_of_downEJs `3'UTR_length` is_NMD PTC_coord
#> <chr> <dbl> <int> <dbl> <lgl> <chr>
#> 1 transcript -130 0 1158 FALSE NA
predictNMD(new_query_gtf,
transcript_id %in% c("transcript1", "transcript3"))
#> Predicting NMD sensitivities for 2 mRNAs
#> # A tibble: 2 × 6
#> transcript stop_to_lastEJ num_of_downEJs `3'UTR_length` is_NMD PTC_coord
#> <chr> <dbl> <int> <dbl> <lgl> <chr>
#> 1 transcript1 -130 0 1158 FALSE NA
#> 2 transcript3 364 3 644 TRUE chr10:79862473
## Using exon and CDS GRangesLists as input
predictNMD(query_exons, cds = query_cds)
#> Predicting NMD sensitivities for 4 mRNAs
#> # A tibble: 4 × 6
#> transcript stop_to_lastEJ num_of_downEJs `3'UTR_length` is_NMD PTC_coord
#> <chr> <dbl> <int> <dbl> <lgl> <chr>
#> 1 transcript1 -130 0 1158 FALSE NA
#> 2 transcript2 -130 0 1497 FALSE NA
#> 3 transcript3 364 3 644 TRUE chr10:79862473
#> 4 transcript4 -130 0 1085 FALSE NA
predictNMD(query_exons, cds = query_cds, transcript_id == "transcript3")
#> Predicting NMD sensitivities for 1 mRNAs
#> # A tibble: 1 × 6
#> transcript stop_to_lastEJ num_of_downEJs `3'UTR_length` is_NMD PTC_coord
#> <chr> <dbl> <int> <dbl> <lgl> <chr>
#> 1 transcript 364 3 644 TRUE chr10:79862473
## Using exon and CDS GRanges as input
predictNMD(query_exons[[3]], cds = query_cds[[3]])
#> Predicting NMD sensitivities for 1 mRNAs
#> # A tibble: 1 × 6
#> transcript stop_to_lastEJ num_of_downEJs `3'UTR_length` is_NMD PTC_coord
#> <chr> <dbl> <int> <dbl> <lgl> <chr>
#> 1 transcript 364 3 644 TRUE chr10:79862473
## ---------------------------------------------------------------------
## EXAMPLE USING TRANSCRIPT ANNOTATION
## ---------------------------------------------------------------------
# \donttest{
library(AnnotationHub)
#> Loading required package: BiocFileCache
#> Loading required package: dbplyr
#>
#> Attaching package: ‘AnnotationHub’
#> The following object is masked from ‘package:rtracklayer’:
#>
#> hubUrl
## Retrieve GRCm38 trancript annotation
ah <- AnnotationHub()
#> snapshotDate(): 2022-10-31
GRCm38_gtf <- ah[["AH60127"]]
#> downloading 1 resources
#> retrieving 1 resource
#> loading from cache
#> Importing File into R ..
#> 'getOption("repos")' replaces Bioconductor standard repositories, see
#> '?repositories' for details
#>
#> replacement repositories:
#> CRAN: https://cloud.r-project.org
#> 'getOption("repos")' replaces Bioconductor standard repositories, see
#> '?repositories' for details
#>
#> replacement repositories:
#> CRAN: https://cloud.r-project.org
## Run tool on specific gene family
predictNMD(GRCm38_gtf, gene_name == "Ptbp1")
#> Predicting NMD sensitivities for 9 mRNAs
#> # A tibble: 9 × 6
#> transcript stop_to_lastEJ num_of_downEJs `3'UTR_length` is_NMD PTC_c…¹
#> <chr> <dbl> <int> <dbl> <lgl> <chr>
#> 1 ENSMUST00000057343 364 3 644 TRUE 10:798…
#> 2 ENSMUST00000095457 -130 0 1085 FALSE NA
#> 3 ENSMUST00000165704 -130 0 1497 FALSE NA
#> 4 ENSMUST00000165724 286 2 362 TRUE 10:798…
#> 5 ENSMUST00000168683 -145 0 0 FALSE NA
#> 6 ENSMUST00000169091 -36 0 0 FALSE NA
#> 7 ENSMUST00000169483 139 1 152 TRUE 10:798…
#> 8 ENSMUST00000171599 -63 0 0 FALSE NA
#> 9 ENSMUST00000172282 -130 0 1158 FALSE NA
#> # … with abbreviated variable name ¹PTC_coord
# }