R/genome_join.R
genome_join.Rd
This is an extension of interval_join specific to genomic intervals.
Genomic intervals include both a chromosome ID and an interval: items are only
considered matching if the chromosome ID matches and the interval overlaps.
Note that there must be three arguments to by, and that they must be in the order
c("chromosome", "start", "end").
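The matching rule described above can be sketched in base R. This is only an illustration, not the package's implementation: it assumes closed intervals (both endpoints included), and the names `pairs`, `overlaps`, and `matched` are hypothetical.

```r
# Example data mirroring the intervals used in this page's examples.
x1 <- data.frame(id1 = 1:4,
                 chromosome = c("chr1", "chr1", "chr2", "chr2"),
                 start = c(100, 200, 300, 400),
                 end = c(150, 250, 350, 450))
x2 <- data.frame(id2 = 1:4,
                 chromosome = c("chr1", "chr2", "chr2", "chr1"),
                 start = c(140, 210, 400, 300),
                 end = c(160, 240, 415, 320))

# Step 1: pair up rows that share a chromosome ID.
pairs <- merge(x1, x2, by = "chromosome", suffixes = c(".x", ".y"))

# Step 2: keep only pairs whose intervals overlap.
# Assumption: intervals are closed, so touching endpoints count as overlap.
overlaps <- pairs$start.x <= pairs$end.y & pairs$start.y <= pairs$end.x
matched <- pairs[overlaps, c("id1", "id2", "chromosome")]
matched <- matched[order(matched$id1), ]
matched
```

Under these assumptions only two pairs survive: id1 = 1 with id2 = 1 (on chr1) and id1 = 4 with id2 = 3 (on chr2). The 300-350 and 300-320 intervals are never compared because they sit on different chromosomes.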
genome_join(x, y, by = NULL, mode = "inner", ...)

genome_inner_join(x, y, by = NULL, ...)

genome_left_join(x, y, by = NULL, ...)

genome_right_join(x, y, by = NULL, ...)

genome_full_join(x, y, by = NULL, ...)

genome_semi_join(x, y, by = NULL, ...)

genome_anti_join(x, y, by = NULL, ...)
Argument | Description |
---|---|
x | A tbl |
y | A tbl |
by | Names of columns to join on, in order c("chromosome", "start", "end"). A match will be counted only if the chromosomes are equal and the start/end pairs overlap. |
mode | One of "inner", "left", "right", "full", "semi", or "anti" |
... | Extra arguments passed on to |
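As a hedged illustration of the mode argument: the usage signatures suggest that each genome_*_join function is a shorthand for genome_join with the corresponding mode. The sketch below assumes that relationship and assumes the functions come from the fuzzyjoin package; the data frames `a` and `b` are made up for the example.

```r
# Made-up example data with the three required columns.
a <- data.frame(id1 = 1:2,
                chromosome = c("chr1", "chr2"),
                start = c(100, 300),
                end = c(150, 350))
b <- data.frame(id2 = 1:2,
                chromosome = c("chr1", "chr1"),
                start = c(140, 300),
                end = c(160, 320))

cols <- c("chromosome", "start", "end")

# Only run when the packages are installed (assumed: fuzzyjoin and IRanges).
if (requireNamespace("fuzzyjoin", quietly = TRUE) &&
    requireNamespace("IRanges", quietly = TRUE)) {
  # Explicit mode ...
  via_mode <- fuzzyjoin::genome_join(a, b, by = cols, mode = "left")
  # ... and the corresponding convenience function.
  via_wrapper <- fuzzyjoin::genome_left_join(a, b, by = cols)
  # Assumption being illustrated: both calls return the same result.
  print(isTRUE(all.equal(via_mode, via_wrapper)))
}
```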
All the extra arguments to interval_join, which are passed on to findOverlaps, work for genome_join as well. These include maxgap and minoverlap.
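A minimal sketch of passing these extra arguments through, assuming the functions are provided by the fuzzyjoin package and that IRanges is installed. The values 50 and 5 are arbitrary, and the two arguments are shown in separate calls because some IRanges versions may not accept both at once.

```r
x1 <- data.frame(id1 = 1:4,
                 chromosome = c("chr1", "chr1", "chr2", "chr2"),
                 start = c(100, 200, 300, 400),
                 end = c(150, 250, 350, 450))
x2 <- data.frame(id2 = 1:4,
                 chromosome = c("chr1", "chr2", "chr2", "chr1"),
                 start = c(140, 210, 400, 300),
                 end = c(160, 240, 415, 320))

cols <- c("chromosome", "start", "end")

if (requireNamespace("fuzzyjoin", quietly = TRUE) &&
    requireNamespace("IRanges", quietly = TRUE)) {
  # Default behaviour: intervals must actually overlap.
  strict <- fuzzyjoin::genome_inner_join(x1, x2, by = cols)

  # maxgap is forwarded to findOverlaps: intervals separated by a small
  # gap on the same chromosome are also treated as matches.
  wider <- fuzzyjoin::genome_inner_join(x1, x2, by = cols, maxgap = 50)

  # minoverlap is forwarded as well: require a minimum overlap length.
  narrower <- fuzzyjoin::genome_inner_join(x1, x2, by = cols, minoverlap = 5)
}
```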
library(dplyr)
library(fuzzyjoin)

x1 <- tibble(id1 = 1:4,
             chromosome = c("chr1", "chr1", "chr2", "chr2"),
             start = c(100, 200, 300, 400),
             end = c(150, 250, 350, 450))

x2 <- tibble(id2 = 1:4,
             chromosome = c("chr1", "chr2", "chr2", "chr1"),
             start = c(140, 210, 400, 300),
             end = c(160, 240, 415, 320))

if (requireNamespace("IRanges", quietly = TRUE)) {
  # note that the third and fourth items don't join (even though
  # 300-350 and 300-320 overlap) since the chromosomes are different:
  genome_inner_join(x1, x2, by = c("chromosome", "start", "end"))

  # other functions:
  genome_full_join(x1, x2, by = c("chromosome", "start", "end"))
  genome_left_join(x1, x2, by = c("chromosome", "start", "end"))
  genome_right_join(x1, x2, by = c("chromosome", "start", "end"))
  genome_semi_join(x1, x2, by = c("chromosome", "start", "end"))
  genome_anti_join(x1, x2, by = c("chromosome", "start", "end"))
}
#> # A tibble: 2 x 4
#>     id1 chromosome start   end
#>   <int> <chr>      <dbl> <dbl>
#> 1     2 chr1         200   250
#> 2     3 chr2         300   350