Evaluation
This class evaluates the accuracy of the parameter estimates of the Fellegi-Sunter model.

class faster.evaluation.Evaluation(Lambda: float, Ksi: array, Counts: array)
A class for evaluating the accuracy and uncertainty inherent in the estimates of the Fellegi-Sunter model.
- Parameters:
Lambda (float) – Unconditional match probability.
Ksi (numpy.ndarray) – Array containing the conditional match probabilities for each pattern of discrete similarity levels across variables.
Counts (numpy.ndarray) – Array containing the observed counts for each pattern of discrete similarity levels across the compared variables.
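As a rough illustration of how the three inputs relate, the sketch below builds toy values for Ksi and Counts and computes the match share they imply. All numbers are made up, and plain Python lists stand in for the numpy arrays the class expects. If Lambda, Ksi and Counts come from the same fitted model, the implied share would normally be close to Lambda, but that is an assumption about the fitting procedure, not something this page states.

```python
# Toy inputs; the values are invented for illustration only.
# Plain lists stand in for the numpy arrays described above.
ksi = [0.001, 0.05, 0.60, 0.98]   # conditional match probability per pattern
counts = [9000, 600, 150, 250]    # observed number of pairs per pattern

# Both arrays are indexed by pattern, so they must have the same length.
assert len(ksi) == len(counts)

# Expected number of true matches across all compared pairs.
expected_matches = sum(c * k for c, k in zip(counts, ksi))

# Share of matches implied by Ksi and Counts; compare it with Lambda.
implied_lambda = expected_matches / sum(counts)
print(round(implied_lambda, 4))
```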

FDR(S: float)
- Parameters:
S (float) – Threshold value used to calculate the False Discovery Rate (FDR).
- Returns:
The False Discovery Rate (FDR), defined as the proportion of false matches among all pairs with a conditional match probability greater than or equal to the threshold S.
- Return type:
float
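The definition above can be sketched in a few lines. This is a minimal reading of the documented definition, not the library's own code: the function name `fdr`, the use of plain lists, and the value returned when no pair reaches the threshold are all assumptions.

```python
def fdr(ksi, counts, s):
    """Expected share of false matches among pairs with ksi >= s."""
    selected = [(k, c) for k, c in zip(ksi, counts) if k >= s]
    n_selected = sum(c for _, c in selected)
    if n_selected == 0:
        # Assumption: with no declared matches there are no false discoveries.
        return 0.0
    # Each pair in a pattern is a false match with probability 1 - ksi.
    expected_false = sum(c * (1.0 - k) for k, c in selected)
    return expected_false / n_selected

# Toy values, invented for illustration.
ksi = [0.001, 0.05, 0.60, 0.98]
counts = [9000, 600, 150, 250]
rate = fdr(ksi, counts, 0.5)
print(round(rate, 4))
```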

FNR(S: float)
- Parameters:
S (float) – Threshold value used to calculate the False Negative Rate (FNR).
- Returns:
The False Negative Rate (FNR), defined as the proportion of true matches among all pairs with a conditional match probability less than the threshold S.
- Return type:
float
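A corresponding sketch for the FNR follows the wording above literally, dividing by the number of pairs below the threshold. The function name `fnr`, the plain-list inputs, and the handling of an empty set of rejected pairs are assumptions. The library may normalise differently, so treat this as an illustration of the stated definition only.

```python
def fnr(ksi, counts, s):
    """Expected share of true matches among pairs with ksi < s."""
    rejected = [(k, c) for k, c in zip(ksi, counts) if k < s]
    n_rejected = sum(c for _, c in rejected)
    if n_rejected == 0:
        # Assumption: with no rejected pairs there are no missed matches.
        return 0.0
    # Each pair in a pattern is a true match with probability ksi.
    expected_missed = sum(c * k for k, c in rejected)
    return expected_missed / n_rejected

# Toy values, invented for illustration.
ksi = [0.001, 0.05, 0.60, 0.98]
counts = [9000, 600, 150, 250]
missed_rate = fnr(ksi, counts, 0.5)
print(round(missed_rate, 6))
```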

Frontier()
Calculates the False Discovery Rate (FDR) and False Negative Rate (FNR) for all thresholds between 0 and 1 in increments of 1e-3, and displays the resulting frontier curve.

Optimal_Threshold(Alpha: float)
Computes the threshold value that minimizes a linear combination of the False Discovery Rate (FDR) and the False Negative Rate (FNR).
- Parameters:
Alpha (float) – Weight assigned to the False Negative Rate (FNR) in the linear combination.
- Returns:
Threshold value that minimizes the weighted sum of the FDR and FNR.
- Return type:
float
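One way to picture the search is a grid sweep over candidate thresholds. The sketch below is hypothetical and makes three assumptions that this page does not confirm: the FDR receives weight 1 - Alpha, the candidates are the 1e-3 grid used by Frontier, and ties go to the smallest threshold. The helper names `rates` and `optimal_threshold` are illustrative, not part of the library.

```python
def rates(ksi, counts, s):
    """Return (FDR, FNR) at threshold s, read literally from the definitions above."""
    above = [(k, c) for k, c in zip(ksi, counts) if k >= s]
    below = [(k, c) for k, c in zip(ksi, counts) if k < s]
    n_above = sum(c for _, c in above)
    n_below = sum(c for _, c in below)
    # Assumption: a rate over an empty set of pairs is taken to be 0.
    fdr = sum(c * (1.0 - k) for k, c in above) / n_above if n_above else 0.0
    fnr = sum(c * k for k, c in below) / n_below if n_below else 0.0
    return fdr, fnr

def optimal_threshold(ksi, counts, alpha, steps=1000):
    """Grid search for the threshold minimising alpha*FNR + (1 - alpha)*FDR."""
    grid = [i / steps for i in range(steps + 1)]  # 0, 0.001, ..., 1

    def loss(s):
        fdr, fnr = rates(ksi, counts, s)
        # Assumption: the FDR weight is 1 - alpha.
        return alpha * fnr + (1.0 - alpha) * fdr

    # min() returns the first grid point that attains the smallest loss.
    return min(grid, key=loss)

# Toy values, invented for illustration.
ksi = [0.001, 0.05, 0.60, 0.98]
counts = [9000, 600, 150, 250]
best = optimal_threshold(ksi, counts, alpha=0.5)
print(best)
```

Because Ksi takes only one value per pattern, the loss is piecewise constant in the threshold, so a whole interval of thresholds can be equally good. The sketch reports the smallest of them.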