The match_fun argument is called once on a vector with all pairs of unique comparisons: thus, it should be efficient and vectorized.
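
As a minimal sketch of what "vectorized" means here: the helper below takes two equal-length vectors of paired values and returns one logical per pair in a single call. The name within_tolerance and its tol argument are illustrative and not part of the package.

```r
# Hypothetical match function: v1[i] is compared with v2[i] for every i at once.
within_tolerance <- function(v1, v2, tol = 0.5) {
  abs(v1 - v2) <= tol
}

# Three candidate pairs evaluated in one call, with no explicit loop.
within_tolerance(c(1, 2, 3), c(1.2, 2.9, 3.4))
```

A function that loops over pairs one at a time would give the same answer but would scale poorly when the number of unique comparisons is large.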
fuzzy_join(
  x,
  y,
  by = NULL,
  match_fun = NULL,
  multi_by = NULL,
  multi_match_fun = NULL,
  index_match_fun = NULL,
  mode = "inner",
  ...
)

fuzzy_inner_join(x, y, by = NULL, match_fun, ...)

fuzzy_left_join(x, y, by = NULL, match_fun, ...)

fuzzy_right_join(x, y, by = NULL, match_fun, ...)

fuzzy_full_join(x, y, by = NULL, match_fun, ...)

fuzzy_semi_join(x, y, by = NULL, match_fun, ...)

fuzzy_anti_join(x, y, by = NULL, match_fun, ...)

Argument | Description |
---|---|
x | A tbl |
y | A tbl |
by | Columns of each to join |
match_fun | Vectorized function given two columns, returning TRUE or FALSE as to whether they are a match. Can be a list of functions, one for each pair of columns specified in by |
multi_by | Columns to join, where all columns will be used to test matches together |
multi_match_fun | Function to use for testing matches, performed on all columns in each data frame simultaneously |
index_match_fun | Function to use for matching tables. Unlike |
mode | One of "inner", "left", "right", "full", "semi", or "anti" |
... | Extra arguments passed to match_fun |
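
As an illustration of passing a list of functions to match_fun, the sketch below joins each event to the period whose interval contains its day. The data frames, column names, and the interval_funs variable are made up for the example. The join call runs only if the fuzzyjoin package is installed.

```r
# Illustrative data: each event has a day; each period has a start and an end.
events  <- data.frame(id = 1:3, day = c(2, 7, 12))
periods <- data.frame(label = c("early", "late"),
                      start = c(1, 10),
                      end   = c(5, 15))

# One comparison per column pair listed in `by`, in the same order:
# day >= start and day <= end.
interval_funs <- list(`>=`, `<=`)

if (requireNamespace("fuzzyjoin", quietly = TRUE)) {
  joined <- fuzzyjoin::fuzzy_left_join(
    events, periods,
    by = c("day" = "start", "day" = "end"),
    match_fun = interval_funs
  )
  print(joined)
}
```

Because mode is "left" here, an event that falls in no period is still expected to appear in the output, with missing values in the columns from periods.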
match_fun should return either a logical vector, or a data frame where the first column is logical. If the latter, the additional columns will be appended to the output. For example, these additional columns could contain the distance metrics that one is filtering on.
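
A minimal sketch of the data frame form, assuming an absolute-difference metric: the first column is the logical match indicator and the second carries the distance. The names match_with_distance, is_match, distance, and max_dist are illustrative. The join call runs only if the fuzzyjoin package is installed.

```r
# Hypothetical match function returning a data frame:
# first column is logical, remaining columns are extra output.
match_with_distance <- function(v1, v2, max_dist = 1) {
  d <- abs(v1 - v2)
  data.frame(is_match = d <= max_dist, distance = d)
}

# Called directly on two paired vectors.
res <- match_with_distance(c(1, 5), c(1.5, 9))
print(res)

if (requireNamespace("fuzzyjoin", quietly = TRUE)) {
  x <- data.frame(value = c(1, 5))
  y <- data.frame(target = c(1.5, 9))
  print(fuzzyjoin::fuzzy_inner_join(
    x, y,
    by = c("value" = "target"),
    match_fun = match_with_distance
  ))
}
```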
Note that as of now, you cannot give both match_fun and multi_match_fun: you can either compare each column individually or compare all of them.
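
To make the "compare all of them" case concrete, here is a sketch of a function that could serve as multi_match_fun. It assumes the function receives two matrices with one row per candidate pair and one column per entry in multi_by. That calling convention is an assumption, not something stated above. The name euclidean_within and its max_dist argument are illustrative.

```r
# Hypothetical multi-column match function: a pair matches when the
# Euclidean distance across all columns is within max_dist.
euclidean_within <- function(m1, m2, max_dist = 1) {
  m1 <- as.matrix(m1)
  m2 <- as.matrix(m2)
  sqrt(rowSums((m1 - m2)^2)) <= max_dist
}

# Two candidate pairs, each described by two coordinates.
a <- rbind(c(0, 0), c(3, 4))
b <- rbind(c(0.5, 0.5), c(0, 0))
euclidean_within(a, b)
```

A per-column match_fun cannot express this kind of test, because the decision depends on all columns jointly rather than on each column pair separately.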
Like in dplyr's join operations, fuzzy_join ignores groups, but preserves the grouping of x in the output.