Evaluation

This class allows you to evaluate the accuracy of the parameters of the Fellegi-Sunter model.

class faster.evaluation.Evaluation(Lambda: float, Ksi: array, Counts: array)

A class for evaluating the accuracy and uncertainty inherent in the estimates of the Fellegi-Sunter model.

Parameters:
  • Lambda (float) – Unconditional match probability.

  • Ksi (numpy.ndarray) – Array containing the conditional match probabilities (the probability that a pair is a true match given its pattern) for each pattern of discrete similarity levels across the compared variables.

  • Counts (numpy.ndarray) – Array containing the observed counts for each pattern of discrete similarity levels across the compared variables.
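As background for the Ksi parameter: in the Fellegi-Sunter model, the conditional match probability of a pattern follows from Bayes' rule. It combines Lambda with the probability of that pattern among true matches (m) and among non-matches (u). The sketch below is not part of the faster API. The helper name conditional_match_prob and the m/u values for three patterns are hypothetical. Ksi and Counts are assumed to be aligned, with one entry per pattern.

```python
import numpy as np


def conditional_match_prob(lam, m_prob, u_prob):
    """Bayes' rule: P(match | pattern) for each agreement pattern (hypothetical helper)."""
    m_prob = np.asarray(m_prob, dtype=float)  # P(pattern | match), hypothetical input
    u_prob = np.asarray(u_prob, dtype=float)  # P(pattern | non-match), hypothetical input
    weighted_match = lam * m_prob
    return weighted_match / (weighted_match + (1.0 - lam) * u_prob)


# Hypothetical patterns: full agreement, partial agreement, disagreement.
lam = 0.01
m_prob = np.array([0.90, 0.08, 0.02])
u_prob = np.array([0.001, 0.049, 0.950])
ksi = conditional_match_prob(lam, m_prob, u_prob)
print(np.round(ksi, 4))  # approximately [0.9009 0.0162 0.0002]
```

A pattern that is rare among non-matches (small u) receives a high conditional match probability even when Lambda is small.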

FDR(S: float)
Parameters:

S (float) – Threshold value used to calculate the False Discovery Rate (FDR).

Returns:

The False Discovery Rate (FDR), defined as the proportion of false matches among all pairs with a conditional match probability greater than or equal to the threshold S.

Return type:

float
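A minimal NumPy sketch of the FDR as defined above, assuming Ksi and Counts are aligned arrays with one entry per pattern. It illustrates the definition and is not the library's implementation. The function name fdr and the convention of returning 0 when no pair reaches the threshold are assumptions.

```python
import numpy as np


def fdr(S, ksi, counts):
    """Expected share of false matches among pairs with ksi >= S (illustrative)."""
    ksi = np.asarray(ksi, dtype=float)
    counts = np.asarray(counts, dtype=float)
    declared = ksi >= S                      # patterns declared as matches
    n_declared = counts[declared].sum()      # number of pairs declared as matches
    if n_declared == 0:
        return 0.0                           # assumed convention: nothing declared
    # Each declared pair is a false match with probability 1 - ksi.
    expected_false = ((1.0 - ksi[declared]) * counts[declared]).sum()
    return float(expected_false / n_declared)


# Toy inputs, one entry per pattern (hypothetical values).
ksi = np.array([0.95, 0.60, 0.10])
counts = np.array([100, 50, 1000])
print(round(fdr(0.5, ksi, counts), 4))  # 0.1667, i.e. (5 + 20) / 150
```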

FNR(S: float)
Parameters:

S (float) – Threshold value used to calculate the False Negative Rate (FNR).

Returns:

The False Negative Rate (FNR), defined as the proportion of true matches whose conditional match probability is less than the threshold S, i.e., the share of true matches that are missed at that threshold.

Return type:

float
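A comparable sketch for the FNR. It assumes that the expected number of true matches is Lambda multiplied by the total number of pairs. At a fixed point of the EM algorithm, this equals the sum of Ksi weighted by Counts, which is how the toy Lambda below is set. The function name fnr and this choice of denominator are assumptions, not confirmed details of faster.

```python
import numpy as np


def fnr(S, lam, ksi, counts):
    """Expected share of true matches with ksi < S (illustrative)."""
    ksi = np.asarray(ksi, dtype=float)
    counts = np.asarray(counts, dtype=float)
    # Assumed denominator: expected number of true matches = Lambda * total pairs.
    expected_true = lam * counts.sum()
    missed = ksi < S                         # patterns not declared as matches
    # Each missed pair is a true match with probability ksi.
    expected_missed = (ksi[missed] * counts[missed]).sum()
    return float(expected_missed / expected_true)


ksi = np.array([0.95, 0.60, 0.10])
counts = np.array([100, 50, 1000])
lam = (ksi * counts).sum() / counts.sum()    # 225 / 1150, as at an EM fixed point
print(round(fnr(0.5, lam, ksi, counts), 4))  # 0.4444, i.e. 100 / 225
```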

Frontier()

Calculates the False Discovery Rate (FDR) and False Negative Rate (FNR) at every threshold between 0 and 1 in increments of 0.001, and displays the resulting frontier curve.
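The threshold sweep can be sketched with NumPy broadcasting. A 0/1 matrix records which patterns are declared matches at each threshold, and two matrix products then give the FDR and FNR curves. Several details are assumptions: the function name frontier, an inclusive grid of 1,001 thresholds, an FDR of 0 where nothing is declared, and an FNR denominator of Lambda times the number of pairs. The sketch returns the curves rather than plotting them.

```python
import numpy as np


def frontier(lam, ksi, counts, step=1e-3):
    """FDR and FNR at every threshold on a grid from 0 to 1 (illustrative)."""
    ksi = np.asarray(ksi, dtype=float)
    counts = np.asarray(counts, dtype=float)
    thresholds = np.linspace(0.0, 1.0, int(round(1.0 / step)) + 1)
    # declared[i, j] = 1.0 when pattern j is declared a match at thresholds[i].
    declared = (ksi[None, :] >= thresholds[:, None]).astype(float)
    n_declared = declared @ counts
    expected_false = declared @ ((1.0 - ksi) * counts)
    # FDR is set to 0 where nothing is declared (assumed convention).
    fdr = np.divide(expected_false, n_declared,
                    out=np.zeros_like(expected_false), where=n_declared > 0)
    # FNR denominator assumed to be Lambda * total number of pairs.
    fnr = (1.0 - declared) @ (ksi * counts) / (lam * counts.sum())
    return thresholds, fdr, fnr


ksi = np.array([0.95, 0.60, 0.10])
counts = np.array([100, 50, 1000])
lam = (ksi * counts).sum() / counts.sum()
thresholds, fdr_curve, fnr_curve = frontier(lam, ksi, counts)
print(len(thresholds))  # 1001
```

To display the curve, the two arrays can be passed to a plotting library, for example matplotlib.pyplot.plot(fnr_curve, fdr_curve). The dense matrix has one row per threshold and one column per pattern, so this approach stays cheap as long as the number of patterns is modest.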

Optimal_Threshold(Alpha: float)

Computes the threshold value that minimizes a linear combination of the False Discovery Rate (FDR) and the False Negative Rate (FNR).

Parameters:

Alpha (float) – Weight assigned to the False Negative Rate (FNR) in the linear combination.

Returns:

Threshold value that minimizes the weighted sum of the FDR and FNR.

Return type:

float
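A grid-search sketch of the threshold selection. It assumes the objective is (1 - Alpha) * FDR + Alpha * FNR, evaluated on the same 0.001 grid. The description above only states that Alpha weights the FNR, so the exact form of the combination, the grid search, and the function name optimal_threshold are all assumptions.

```python
import numpy as np


def optimal_threshold(alpha, lam, ksi, counts, step=1e-3):
    """Grid threshold minimizing (1 - alpha) * FDR + alpha * FNR (assumed objective)."""
    ksi = np.asarray(ksi, dtype=float)
    counts = np.asarray(counts, dtype=float)
    thresholds = np.linspace(0.0, 1.0, int(round(1.0 / step)) + 1)
    # 1.0 where pattern j is declared a match at thresholds[i].
    declared = (ksi[None, :] >= thresholds[:, None]).astype(float)
    n_declared = declared @ counts
    expected_false = declared @ ((1.0 - ksi) * counts)
    fdr = np.divide(expected_false, n_declared,
                    out=np.zeros_like(expected_false), where=n_declared > 0)
    fnr = (1.0 - declared) @ (ksi * counts) / (lam * counts.sum())
    loss = (1.0 - alpha) * fdr + alpha * fnr
    # np.argmin returns the first (lowest) threshold among ties.
    return float(thresholds[np.argmin(loss)])


ksi = np.array([0.95, 0.60, 0.10])
counts = np.array([100, 50, 1000])
lam = (ksi * counts).sum() / counts.sum()
print(optimal_threshold(0.5, lam, ksi, counts))  # a value just above 0.1
```

With equal weights, the toy inputs select the lowest grid threshold that excludes the pattern with Ksi = 0.1. This trades an FDR of about 0.17 for an FNR of about 0.44. Raising Alpha tends to push the threshold down, so fewer true matches are missed.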