## Stochastic record linkage

Given the features of two records, stochastic record linkage produces a weight that measures how close the two entities are. The ultimate goal is to decide whether the two records refer to the same entity, which can be done by building a threshold-based classifier on these weights.
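To make the idea of a threshold-based classifier concrete, here is a minimal base R sketch. The weights, the pair names, and the cut-off value are all hypothetical and chosen only for illustration; they are not output of the `RecordLinkage` package.

```r
# Hypothetical match weights for five record pairs (illustrative values only).
weights <- c(pair1 = 12.4, pair2 = -3.1, pair3 = 7.8, pair4 = 0.2, pair5 = -9.6)

# Assumed cut-off; in practice it would be chosen by inspecting the weights
# or by tuning on pairs whose true match status is known.
threshold <- 5

# Pairs whose weight reaches the threshold are declared links.
is_link <- weights >= threshold
print(is_link)
```

A single cut-off is the simplest choice; a classifier could also use two thresholds and leave the pairs in between for manual review.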

We will show how to leverage two methods, `emWeights` and `epiWeights`, implemented in the `RecordLinkage` package.

### Expectation maximization method

The `emWeights` method uses the expectation maximization algorithm to derive weights that measure the closeness of two entities. This method requires two conditional probabilities to be derived, one for a match and another for a non-match.

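The probabilities P(features | match = 0) and P(features | match = 1) are estimated using the expectation maximization algorithm. The weight of a record pair is calculated as the ratio of these two probabilities, so a pair whose features are far more likely under a match than under a non-match receives a high weight. This approach is called the **Fellegi-Sunter model**.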
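The ratio of the two conditional probabilities can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's internal code: the per-feature probabilities `m` and `u` are hypothetical values rather than EM estimates, the features are assumed to be conditionally independent given the match status, and the ratio is put on a base-2 logarithmic scale, a common convention for such weights. The helper `pattern_weight` is a name introduced here for illustration.

```r
# Hypothetical per-feature agreement probabilities (not EM estimates).
# m: P(feature agrees | match = 1), u: P(feature agrees | match = 0)
m <- c(first_name = 0.95, last_name = 0.90, birth_year = 0.98)
u <- c(first_name = 0.05, last_name = 0.02, birth_year = 0.10)

# Weight of one comparison pattern (1 = agree, 0 = disagree), assuming the
# features are conditionally independent given the match status.
pattern_weight <- function(pattern, m, u) {
  p_match    <- prod(ifelse(pattern == 1, m, 1 - m))  # P(features | match = 1)
  p_nonmatch <- prod(ifelse(pattern == 1, u, 1 - u))  # P(features | match = 0)
  log2(p_match / p_nonmatch)                          # log-scaled ratio
}

w_agree    <- pattern_weight(c(1, 1, 1), m, u)  # all features agree
w_disagree <- pattern_weight(c(0, 0, 0), m, u)  # all features disagree
```

With these values, a pair that agrees on every feature gets a positive weight and a pair that disagrees on every feature gets a negative one, which is what makes a threshold on the weight a sensible classifier.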

    > library(RecordLinkage)
    > data("RLdata500...