Join two tables based not on exact matches, but on a function describing whether two vectors are matched or not.

The match_fun argument is called once on vectors containing all pairs of unique values to compare; it should therefore be efficient and vectorized.
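As a minimal sketch of what a vectorized match_fun can look like, the example below joins on a numeric tolerance. The tables `readings` and `targets`, their columns, the 0.25 tolerance, and the helper `within_tol` are made up for illustration; only `fuzzy_inner_join` and its `by` and `match_fun` arguments come from this page.

```r
library(fuzzyjoin)

# Hypothetical example data (not part of the package)
readings <- tibble::tibble(id = 1:3, value = c(1.0, 2.5, 4.0))
targets  <- tibble::tibble(label = c("a", "b"), target = c(1.2, 4.1))

# Vectorized: takes two equal-length vectors and returns one logical per pair
within_tol <- function(v1, v2) abs(v1 - v2) <= 0.25

matched <- fuzzy_inner_join(
  readings, targets,
  by = c("value" = "target"),
  match_fun = within_tol
)
```

Because `within_tol` works on whole vectors at once, it does not need a loop over candidate pairs.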

fuzzy_join(
  x,
  y,
  by = NULL,
  match_fun = NULL,
  multi_by = NULL,
  multi_match_fun = NULL,
  index_match_fun = NULL,
  mode = "inner",
  ...
)

fuzzy_inner_join(x, y, by = NULL, match_fun, ...)

fuzzy_left_join(x, y, by = NULL, match_fun, ...)

fuzzy_right_join(x, y, by = NULL, match_fun, ...)

fuzzy_full_join(x, y, by = NULL, match_fun, ...)

fuzzy_semi_join(x, y, by = NULL, match_fun, ...)

fuzzy_anti_join(x, y, by = NULL, match_fun, ...)

Arguments

x	A tbl
y	A tbl
by	Columns of each table to join by
match_fun	Vectorized function given two columns, returning TRUE or FALSE as to whether they are a match. Can be a list of functions, one for each pair of columns specified in `by` (if a named list, it uses the names in x). If only one function is given, it is used on all column pairs.
multi_by	Columns to join, where all columns will be used to test matches together
multi_match_fun	Function to use for testing matches, performed on all columns in each data frame simultaneously
index_match_fun	Function to use for matching tables. Unlike `match_fun` and `multi_match_fun`, this is performed on the original columns and returns pairs of indices.
mode	One of "inner", "left", "right", "full", "semi", or "anti"
...	Extra arguments passed to match_fun
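
The list form of match_fun can be sketched with an interval-style join, where one column of x is compared against two columns of y using a different function for each pair. The tables `events` and `windows` and their columns are hypothetical; the comparison operators are ordinary base R functions.

```r
library(fuzzyjoin)

# Hypothetical example data (not part of the package)
events  <- tibble::tibble(event = c("e1", "e2", "e3"), time = c(1, 5, 12))
windows <- tibble::tibble(window = c("early", "late"),
                          from = c(0, 10), to = c(3, 15))

# One function per column pair, in the same order as `by`:
# time >= from AND time <= to
in_window <- fuzzy_inner_join(
  events, windows,
  by = c("time" = "from", "time" = "to"),
  match_fun = list(`>=`, `<=`)
)
```

A row pair is kept only when every column pair in `by` matches, so event e2 (time 5) falls in neither window and is dropped from the inner join.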

Details

match_fun should return either a logical vector, or a data frame where the first column is logical. If the latter, the additional columns will be appended to the output. For example, these additional columns could contain the distance metrics that one is filtering on.
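
A minimal sketch of the data frame form: the match function below returns the logical match indicator as its first column and the absolute difference as an extra column. The data, the 0.25 tolerance, and the names `within_tol_with_dist` and `abs_diff` are assumptions chosen for illustration.

```r
library(fuzzyjoin)

# Hypothetical example data (not part of the package)
readings <- tibble::tibble(id = 1:3, value = c(1.0, 2.5, 4.0))
targets  <- tibble::tibble(label = c("a", "b"), target = c(1.2, 4.1))

# First column must be logical; remaining columns are carried to the output
within_tol_with_dist <- function(v1, v2) {
  d <- abs(v1 - v2)
  tibble::tibble(is_match = d <= 0.25, abs_diff = d)
}

matched <- fuzzy_inner_join(
  readings, targets,
  by = c("value" = "target"),
  match_fun = within_tol_with_dist
)
```

Returning the distance this way avoids recomputing it after the join when it is needed for ranking or further filtering.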

Note that as of now, you cannot give both match_fun and multi_match_fun: you can either compare each column individually or compare all of them together.
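
To illustrate comparing all columns together, the sketch below matches 2-D points by Euclidean distance through multi_by and multi_match_fun. It assumes that multi_match_fun receives the candidate rows from each table as two matrix-like objects with one row per pair, and coerces them with `as.matrix` to be safe; the tables `pts` and `centers`, their columns, and the helper `within_radius` are hypothetical.

```r
library(fuzzyjoin)

# Hypothetical example data (not part of the package)
pts     <- tibble::tibble(px = c(0, 5), py = c(0, 5))
centers <- tibble::tibble(cx = c(0.5, 9), cy = c(0.5, 9),
                          name = c("origin", "far"))

# Assumption: m1 and m2 hold the multi_by columns of x and y, one row per pair
within_radius <- function(m1, m2) {
  m1 <- as.matrix(m1)
  m2 <- as.matrix(m2)
  sqrt(rowSums((m1 - m2)^2)) <= 1
}

near <- fuzzy_join(
  pts, centers,
  multi_by = c("px" = "cx", "py" = "cy"),
  multi_match_fun = within_radius,
  mode = "inner"
)
```

This kind of test cannot be expressed with per-column match functions, because the distance depends on both coordinates at once.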

As in dplyr's join operations, fuzzy_join ignores groups when matching, but preserves the grouping of x in the output.