Source: R/interval_join.R (documented in interval_join.Rd)
Joins tables based on overlapping intervals: for example, the interval (1, 4) is joined with (3, 6) but not with (5, 10). The join is accelerated using the interval trees implemented in the IRanges package. You can specify particular relationships between intervals (such as a maximum gap or a minimum overlap) through arguments passed on to findOverlaps. See that function's documentation for descriptions of these arguments.
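The overlap rule in the example above can be stated as a one-line predicate. The sketch below uses two hypothetical helpers (`intervals_overlap()` and `overlap_width()`, not part of fuzzyjoin or IRanges) and assumes closed intervals whose overlap is counted in integer positions the way an IRanges width is (end - start + 1); the exact gap and overlap rules applied by findOverlaps are described in its own documentation.

```r
# Hypothetical helper: closed intervals [s1, e1] and [s2, e2] overlap when
# each one starts no later than the other one ends.
intervals_overlap <- function(s1, e1, s2, e2) {
  s1 <= e2 & s2 <= e1
}

# Hypothetical helper: number of shared integer positions, counted like an
# IRanges width (end - start + 1); 0 when the intervals do not overlap.
overlap_width <- function(s1, e1, s2, e2) {
  pmax(pmin(e1, e2) - pmax(s1, s2) + 1, 0)
}

intervals_overlap(1, 4, 3, 6)   # TRUE:  (1, 4) and (3, 6) share 3 and 4
intervals_overlap(1, 4, 5, 10)  # FALSE: (5, 10) starts after (1, 4) ends
overlap_width(1, 4, 3, 6)       # 2
```

Both helpers are vectorized, so they can compare whole columns at once; an interval tree avoids testing every pair, which is where IRanges earns its speed on large tables.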
interval_join(x, y, by, mode = "inner", ...)

interval_inner_join(x, y, by = NULL, ...)

interval_left_join(x, y, by = NULL, ...)

interval_right_join(x, y, by = NULL, ...)

interval_full_join(x, y, by = NULL, ...)

interval_semi_join(x, y, by = NULL, ...)

interval_anti_join(x, y, by = NULL, ...)
Argument | Description |
---|---|
x | A tbl |
y | A tbl |
by | Columns by which to join the two tables. If provided, this must be two columns: start of interval, then end of interval |
mode | One of "inner", "left", "right", "full", "semi", or "anti" |
... | Extra arguments passed on to findOverlaps |
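To make the `mode` values concrete, the sketch below is a hypothetical `naive_interval_join()` written in base R. It is not fuzzyjoin's implementation (which relies on IRanges' findOverlaps), it tests every pair of rows, and it ignores extra arguments such as `maxgap`; it assumes closed numeric intervals that match when they share at least one point, and it only illustrates which rows each mode keeps.

```r
# Hypothetical naive_interval_join(): O(n * m) illustration of join modes.
# Not fuzzyjoin's code; assumes closed intervals that match when they overlap.
naive_interval_join <- function(x, y, by = c("start", "end"), mode = "inner") {
  mode <- match.arg(mode, c("inner", "left", "right", "full", "semi", "anti"))

  # All (row of x, row of y) pairs whose intervals overlap
  pairs <- expand.grid(xi = seq_len(nrow(x)), yi = seq_len(nrow(y)))
  hit <- x[[by[1]]][pairs$xi] <= y[[by[2]]][pairs$yi] &
    y[[by[1]]][pairs$yi] <= x[[by[2]]][pairs$xi]
  pairs <- pairs[hit, , drop = FALSE]

  # Filtering joins return rows of x only, with no columns from y
  if (mode == "semi") return(x[sort(unique(pairs$xi)), , drop = FALSE])
  if (mode == "anti") return(x[setdiff(seq_len(nrow(x)), pairs$xi), , drop = FALSE])

  # Mutating joins: keep unmatched rows, padded with NA, where the mode asks
  if (mode %in% c("left", "full")) {
    lone_x <- setdiff(seq_len(nrow(x)), pairs$xi)
    pairs <- rbind(pairs, data.frame(xi = lone_x, yi = rep(NA_integer_, length(lone_x))))
  }
  if (mode %in% c("right", "full")) {
    lone_y <- setdiff(seq_len(nrow(y)), pairs$yi)
    pairs <- rbind(pairs, data.frame(xi = rep(NA_integer_, length(lone_y)), yi = lone_y))
  }

  # Indexing with NA yields an all-NA row, which pads unmatched sides
  left <- x[pairs$xi, , drop = FALSE]
  right <- y[pairs$yi, , drop = FALSE]
  names(left) <- paste0(names(x), ".x")
  names(right) <- paste0(names(y), ".y")
  out <- cbind(left, right)
  rownames(out) <- NULL
  out
}

x1 <- data.frame(id1 = 1:3, start = c(1, 5, 10), end = c(3, 7, 15))
x2 <- data.frame(id2 = 1:3, start = c(2, 4, 16), end = c(4, 8, 20))

# Rows returned per mode: inner 2, left 3, right 3, full 4, semi 2, anti 1
counts <- sapply(c("inner", "left", "right", "full", "semi", "anti"),
                 function(m) nrow(naive_interval_join(x1, x2, mode = m)))
counts
```

The `.x`/`.y` column suffixes here are a choice made for this sketch only; check the real functions' output for their column naming.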
Intervals can also be dates or datetimes. An error is thrown if the date/datetime type of the interval columns disagrees between the two tables.
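A short illustration of why mismatched types are rejected: a Date stores days since 1970-01-01, while a POSIXct stores seconds, so their raw numbers cannot be compared as intervals. The helper `as_comparable_numbers()` below is hypothetical and is not fuzzyjoin's check; it only sketches the kind of class comparison that would catch the mismatch before the intervals are reduced to numbers.

```r
# Hypothetical helper (not fuzzyjoin's code): refuse to compare interval
# columns whose classes differ, otherwise reduce both to plain numbers.
as_comparable_numbers <- function(x_col, y_col) {
  if (!identical(class(x_col), class(y_col))) {
    stop("interval types differ: ", paste(class(x_col), collapse = "/"),
         " vs ", paste(class(y_col), collapse = "/"))
  }
  list(x = as.numeric(x_col), y = as.numeric(y_col))
}

days <- as.Date(c("2020-01-01", "2020-01-05"))
times <- as.POSIXct(c("2020-01-01", "2020-01-05"), tz = "UTC")

# Same class: both sides are counted in days, so differences are meaningful
same <- as_comparable_numbers(days, days + 1)
same$y - same$x  # 1 1

# Different classes: days vs seconds, so the helper stops with an error
mixed <- tryCatch(as_comparable_numbers(days, times),
                  error = function(e) conditionMessage(e))
mixed  # "interval types differ: Date vs POSIXct/POSIXt"
```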
This function requires the IRanges package from Bioconductor. Installation instructions are at https://bioconductor.org/packages/release/bioc/html/IRanges.html.
if (requireNamespace("IRanges", quietly = TRUE)) {
  x1 <- data.frame(id1 = 1:3, start = c(1, 5, 10), end = c(3, 7, 15))
  x2 <- data.frame(id2 = 1:3, start = c(2, 4, 16), end = c(4, 8, 20))

  interval_inner_join(x1, x2)

  # Allow intervals to be separated by up to a maximum gap:
  interval_inner_join(x1, x2, maxgap = 1) # let 1 join with 2
  interval_inner_join(x1, x2, maxgap = 20) # everything joins each other

  # Require that intervals overlap by at least a particular amount
  interval_inner_join(x1, x2, minoverlap = 3)

  # other types of joins:
  interval_full_join(x1, x2)
  interval_left_join(x1, x2)
  interval_right_join(x1, x2)
  interval_semi_join(x1, x2)
  interval_anti_join(x1, x2)
}
#>   id1 start end
#> 3   3    10  15