Penalized Calibration
Sometimes, we want to calibrate on a set of margins while controlling the magnitude of the reweighting factors. Unfortunately, in some cases the margin constraints cannot be met when the reweighting factors are restricted to the desired range. One solution is to drop a few margins from the problem. A potential drawback is that, with the calibrated weights, the estimates for the excluded variables can end up very far from their margins.
Alternatively, we can use penalized calibration. With this method, calibration is not necessarily exact on every margin. Instead, the deviation from each margin is controlled by assigning a cost to that margin: a higher cost penalizes the deviation from that margin more heavily. Costs may be infinite, and calibration is exact on every margin that has an infinite cost.
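To make the role of the costs concrete, here is one common formulation of penalized (ridge) calibration with the linear distance. It is given as an illustration and is not necessarily the exact criterion Icarus minimizes. Notation: $d_k$ are the initial weights of the sample units $k \in s$, $x_{kj}$ the auxiliary values, $t_j$ the margins, $c_j > 0$ the costs, $D = \mathrm{diag}(d_k)$, $C = \mathrm{diag}(c_j)$ and $X$ the $n \times p$ matrix of auxiliary values. Setting the gradient of the criterion to zero gives the calibrated weights and the deviation left on each margin:

```latex
\begin{aligned}
&\min_{w}\; \sum_{k \in s} \frac{(w_k - d_k)^2}{d_k}
  + \sum_{j=1}^{p} c_j \Big( \sum_{k \in s} w_k x_{kj} - t_j \Big)^2 \\
&w = d + D X \beta, \qquad
  \beta = \big( C^{-1} + X^\top D X \big)^{-1} \big( t - X^\top d \big) \\
&\sum_{k \in s} w_k x_{kj} - t_j = -\,\frac{\beta_j}{c_j}
\end{aligned}
```

With $c_j = \infty$, the corresponding diagonal entry of $C^{-1}$ is 0 and the deviation on margin $j$ is exactly 0. With every cost infinite, the formula reduces to ordinary linear calibration.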
For more details, see Beaumont, Jean-François, and Cynthia Bocci. "Another look at ridge calibration." Metron 66.1 (2008): 5-20.
Example survey
Say we want to calibrate on the same margins as on page xxx, but with a gap between the minimum and the maximum reweighting factors not exceeding 1.4. We can try a calibration with the logit method and bounds (0.6, 2.0), whose gap is 2.0 - 0.6 = 1.4:
testTightLogit <- calibration(data=data_employees, marginMatrix=margins, colWeights="weight"
, method="logit", bounds=c(0.6,2.0), description=TRUE, popTotal = 230)

But the problem doesn’t have a solution:
Warning messages:
In calibAlgorithm(Xs, d, total, q, inverseDistance, updateParameters, :
  No convergence

Let’s try to achieve this goal by using penalized calibration. Say that in our problem, we absolutely need exact calibration on the variable "salary", but we don’t need to be exact on the other three variables. We assign a cost to each margin, with an infinite cost for "salary", and request a maximum gap of 1.4 with the gap parameter:
costs <- c(1,1,1,Inf)
testPenalizedLogit <- calibration(data=data_employees, marginMatrix=margins, colWeights="weight"
, costs=costs, gap=1.4, description=TRUE, popTotal=230)

Test with lambda = 0.00674285392857323
[1] 0.1949513
[1] 1.595005
Found lambda = 0.00674285392857323 ; count = 30
################### Summary of before/after weight ratios ###################
Calibration method : linear
Mean : 0.9661
    0%     1%    10%    25%    50%    75%    90%    99%   100%
0.1950 0.2434 0.5774 0.7712 0.9579 1.1268 1.4663 1.5819 1.5950
################### Comparison Margins Before/After calibration ###################
Careful, calibration may not be exact
$Total
  Before calibration After Calibration Margin
                 230               230    230

$category
  Before calibration After Calibration Margin
1                110             92.77     80
2                 70             84.08     90
3                 50             53.15     60

$sex
  Before calibration After Calibration Margin
1                110             128.9    140
2                120             101.1     90

$department
  Before calibration After Calibration Margin
1                130            110.14    100
2                100            119.86    130

$salary
  Before calibration After Calibration Margin
              434000            470000 470000
In this example, calibration is indeed exact on the variable "salary" and approximate on the variables "category", "sex" and "department". For these last three variables, the estimates using the weights from the penalized procedure are closer to the margins than the initial Horvitz-Thompson estimates. The reweighting factors range from 0.1950 to 1.5950, so the requested gap of 1.4 is respected.
Finally, bear in mind that, as with calibration on margins, the statistical results on penalized calibration only hold for sufficiently large samples.
Please note that it is much safer to provide a value for the parameter popTotal when doing penalized calibration. Icarus will issue a warning if you try to use the penalized calibration method without a value for popTotal. The reason is that the popTotal parameter is sometimes used when margins of categorical variables are entered. If popTotal is not provided, calibration is not done on the population total, which can lead to erratic behavior for the categorical variables of the problem. When popTotal is provided, calibration on the population total is exact.