Evaluation

This class allows you to evaluate the accuracy of the parameters of the Fellegi-Sunter model.

class faster.evaluation.Evaluation(Lambda: float, Ksi: array, Counts: array)

A class for evaluating the accuracy and uncertainty inherent in the estimates of the Fellegi-Sunter model.

Parameters:

Lambda (float) – Unconditional match probability.
Ksi (numpy.ndarray) – Array containing the conditional match probabilities for each pattern of discrete similarity levels across variables.
Counts (numpy.ndarray) – Array containing the observed counts for each pattern of discrete similarity levels across the compared variables.

Parameters: S (float) – Threshold value used to calculate the False Discovery Rate (FDR).
Returns: The False Discovery Rate (FDR), defined as the proportion of false matches among all pairs with a conditional match probability greater than or equal to the threshold S.
Return type: float
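
The FDR definition above can be illustrated with a short NumPy sketch. This is not the package's implementation: `fdr_sketch` is a hypothetical helper, and it assumes that the expected number of false matches in a pattern is `(1 - Ksi) * Counts` and that the FDR is taken to be 0 when no pair reaches the threshold.

```python
import numpy as np

def fdr_sketch(ksi, counts, s):
    """Expected share of false matches among pairs with ksi >= s.

    Hypothetical helper for illustration only, not part of the package.
    """
    ksi = np.asarray(ksi, dtype=float)
    counts = np.asarray(counts, dtype=float)
    selected = ksi >= s                      # patterns declared matches
    n_selected = counts[selected].sum()      # number of declared matches
    if n_selected == 0:
        return 0.0                           # convention assumed here
    # Each pair in a pattern is a false match with probability 1 - ksi.
    expected_false = ((1.0 - ksi[selected]) * counts[selected]).sum()
    return expected_false / n_selected

# Made-up toy data: four similarity patterns.
ksi = np.array([0.99, 0.80, 0.30, 0.01])
counts = np.array([50, 10, 20, 920])
print(fdr_sketch(ksi, counts, 0.5))
```

With these toy inputs, only the first two patterns clear the threshold of 0.5, so the result is (0.01 × 50 + 0.20 × 10) / 60.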

Parameters: S (float) – Threshold value used to calculate the False Negative Rate (FNR).
Returns: The False Negative Rate (FNR), defined as the proportion of true matches among all pairs with a conditional match probability less than the threshold S.
Return type: float
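
A minimal sketch of the FNR computation follows, with `fnr_sketch` as a hypothetical helper rather than the package's code. By default it follows the wording above literally, dividing the expected number of true matches below the threshold by the number of pairs below the threshold. A common alternative in the record-linkage literature divides by the expected total number of matches, `Lambda` times the total number of pairs; passing `lam` selects that reading. Which of the two the class uses is not stated here, so both are assumptions.

```python
import numpy as np

def fnr_sketch(ksi, counts, s, lam=None):
    """Expected true matches among pairs with ksi < s, as a rate.

    Hypothetical helper for illustration only, not part of the package.
    lam=None : divide by the number of pairs below the threshold.
    lam given: divide by lam * total pairs (expected total matches).
    """
    ksi = np.asarray(ksi, dtype=float)
    counts = np.asarray(counts, dtype=float)
    rejected = ksi < s                       # patterns declared non-matches
    # Each pair in a pattern is a true match with probability ksi.
    missed = (ksi[rejected] * counts[rejected]).sum()
    if lam is None:
        denom = counts[rejected].sum()
    else:
        denom = lam * counts.sum()
    return missed / denom if denom > 0 else 0.0   # 0 by assumed convention

# Made-up toy data: four similarity patterns.
ksi = np.array([0.99, 0.80, 0.30, 0.01])
counts = np.array([50, 10, 20, 920])
lam = (ksi * counts).sum() / counts.sum()    # match share implied by the toy data

print(fnr_sketch(ksi, counts, 0.5))          # literal reading
print(fnr_sketch(ksi, counts, 0.5, lam=lam)) # normalized by expected matches
```

The two readings can differ substantially when matches are rare, so it is worth checking which one applies before comparing results across tools.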

Frontier()

Calculates the False Discovery Rate (FDR) and False Negative Rate (FNR) for all thresholds between 0 and 1 with increments of 1e-3, and displays the resulting frontier curve.
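
The frontier can be pictured as two arrays, one FDR value and one FNR value per threshold on the grid. The sketch below computes them without plotting. `frontier_sketch` is a hypothetical helper, not the package's method, and it assumes the literal FDR and FNR definitions given above, with both rates set to 0 when their denominator is empty.

```python
import numpy as np

def frontier_sketch(ksi, counts, n_points=1001):
    """Return (thresholds, fdr, fnr) on an evenly spaced grid over [0, 1].

    Hypothetical helper for illustration only, not part of the package.
    n_points=1001 gives a spacing of 1e-3.
    """
    ksi = np.asarray(ksi, dtype=float)
    counts = np.asarray(counts, dtype=float)
    thresholds = np.linspace(0.0, 1.0, n_points)
    fdr = np.zeros(n_points)
    fnr = np.zeros(n_points)
    for i, s in enumerate(thresholds):
        above = ksi >= s
        n_above = counts[above].sum()
        n_below = counts[~above].sum()
        if n_above > 0:   # expected false matches among declared matches
            fdr[i] = ((1.0 - ksi[above]) * counts[above]).sum() / n_above
        if n_below > 0:   # expected true matches among declared non-matches
            fnr[i] = (ksi[~above] * counts[~above]).sum() / n_below
    return thresholds, fdr, fnr

# Made-up toy data: four similarity patterns.
ksi = np.array([0.99, 0.80, 0.30, 0.01])
counts = np.array([50, 10, 20, 920])
thresholds, fdr, fnr = frontier_sketch(ksi, counts)
```

Plotting `fdr` against `fnr` (for example with Matplotlib) would then give a curve of the kind the method displays.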

Optimal_Threshold(Alpha: float)

Computes the threshold value that minimizes a linear combination of the False Discovery Rate (FDR) and the False Negative Rate (FNR).

Parameters: Alpha (float) – Weight assigned to the False Negative Rate (FNR) in the linear combination.
Returns: Threshold value that minimizes the weighted sum of the FDR and FNR.
Return type: float
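
One way to picture the minimization is a grid search over candidate thresholds. The sketch below is an illustration under stated assumptions, not the package's algorithm: `optimal_threshold_sketch` is a hypothetical helper, the FDR is assumed to receive weight `1 - alpha`, the search grid is assumed to be 0 to 1 in steps of 1e-3, and the rates follow the literal definitions given above.

```python
import numpy as np

def optimal_threshold_sketch(ksi, counts, alpha, n_points=1001):
    """Grid-search the threshold minimizing alpha*FNR + (1 - alpha)*FDR.

    Hypothetical helper for illustration only, not part of the package.
    """
    ksi = np.asarray(ksi, dtype=float)
    counts = np.asarray(counts, dtype=float)
    grid = np.linspace(0.0, 1.0, n_points)
    objective = np.empty(n_points)
    for i, s in enumerate(grid):
        above = ksi >= s
        n_above = counts[above].sum()
        n_below = counts[~above].sum()
        fdr = ((1.0 - ksi[above]) * counts[above]).sum() / n_above if n_above > 0 else 0.0
        fnr = (ksi[~above] * counts[~above]).sum() / n_below if n_below > 0 else 0.0
        objective[i] = alpha * fnr + (1.0 - alpha) * fdr   # assumed weighting
    # argmin returns the first (lowest) threshold among ties.
    return float(grid[np.argmin(objective)])

# Made-up toy data: four similarity patterns.
ksi = np.array([0.99, 0.80, 0.30, 0.01])
counts = np.array([50, 10, 20, 920])
best = optimal_threshold_sketch(ksi, counts, alpha=0.5)
print(best)
```

Because the rates only change when the threshold crosses one of the values in `Ksi`, the objective is piecewise constant, and any threshold within the best interval is equally good; the sketch reports the lowest one on the grid.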