
Penalized Calibration

Antoine R edited this page Aug 19, 2019 · 3 revisions


What is penalized calibration?

Sometimes we want to calibrate on a set of margins while controlling the magnitude of the reweighting factors. Unfortunately, in some cases the margin constraints cannot be met with the reweighting factors restricted to the desired range. One solution is to drop a few margins from the problem, but the drawback is that estimates computed with the calibrated weights can then end up very far from the margins that were dropped.

Alternatively, we can use penalized calibration. With this method, calibration is no longer exact on every margin, but the distance to each margin is controlled by assigning it a cost. A cost may be infinite: calibration is then exact on that margin.

For more details, see Beaumont, Jean-François, and Cynthia Bocci. "Another look at ridge calibration." Metron 66.1 (2008): 5-20.
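To make the role of the costs concrete, here is a minimal base R sketch of ridge (linear penalized) calibration. It assumes the linear distance, for which the penalized weights have the closed form w = d + D X (X'DX + lambda C^-1)^-1 (t - X'd), with C the diagonal matrix of costs. The function ridge_calibrate, its arguments and the toy data are all hypothetical, and this is not necessarily how Icarus computes its weights internally.

```r
# Sketch of ridge (linear penalized) calibration in base R.
# All names and data below are hypothetical; this is not Icarus code.
ridge_calibrate <- function(X, d, totals, costs, lambda) {
  xtdx <- crossprod(X, d * X)                            # X' D X
  penalty <- diag(lambda / costs, nrow = length(costs))  # lambda * C^-1 (1 / Inf is 0)
  shortfall <- totals - colSums(d * X)                   # t - X' d
  as.vector(d + d * (X %*% solve(xtdx + penalty, shortfall)))
}

# Toy sample: 40 units with equal initial weights of 5.
n <- 40
X <- cbind(total = 1,
           group = rep(c(1, 0, 0, 1, 0), times = 8),
           income = seq(1.5, 2.5, length.out = n))
d <- rep(5, n)
totals <- c(200, 90, 410)

# Infinite cost on "total" and "income": those two margins are met exactly.
w <- ridge_calibrate(X, d, totals, costs = c(Inf, 1, Inf), lambda = 10)
round(colSums(w * X), 2)   # calibrated totals, to compare with `totals`
range(w / d)               # smallest and largest reweighting factor
```

In this sketch, a larger lambda gives more slack on the finite-cost margins and tends to pull the ratios w / d closer to 1, while margins with an infinite cost stay exact for any lambda.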

  • Example survey

Say we want to calibrate on the same margins as on page xxx, but with a gap between the minimum and the maximum reweighting factor not exceeding 1.4. We can try a calibration with the logit method and bounds (0.6 ; 2.0):

testTightLogit <- calibration(data=data_employees, marginMatrix=margins, colWeights="weight",
                              method="logit", bounds=c(0.6,2.0), description=TRUE, popTotal=230)

But the problem has no solution within these bounds, and the algorithm does not converge:

Warning messages:
In calibAlgorithm(Xs, d, total, q, inverseDistance, updateParameters,  :
  No convergence

Let’s try to achieve this goal by using penalized calibration. Say that in our problem, we absolutely need an exact calibration on the variable "salary", but we don’t need to be exact on the other three variables. We write:

costs <- c(1,1,1,Inf)

testPenalizedLogit <- calibration(data=data_employees, marginMatrix=margins, colWeights="weight",
                                  costs=costs, gap=1.4, description=TRUE, popTotal=230)

After a few iterations, we get the result in the log:

Test with lambda = 0.00674285392857323
[1] 0.1949513
[1] 1.595005
Found lambda = 0.00674285392857323 ; count = 30

################### Summary of before/after weight ratios ###################
Calibration method : linear
Mean : 0.9661
    0%     1%    10%    25%    50%    75%    90%    99%   100% 
0.1950 0.2434 0.5774 0.7712 0.9579 1.1268 1.4663 1.5819 1.5950 

################### Comparison Margins Before/After calibration ###################
Careful, calibration may not be exact
$Total
Before calibration  After Calibration             Margin 
               230                230                230 

$category
  Before calibration After Calibration Margin
1                110             92.77     80
2                 70             84.08     90
3                 50             53.15     60

$sex
  Before calibration After Calibration Margin
1                110             128.9    140
2                120             101.1     90

$department
  Before calibration After Calibration Margin
1                130            110.14    100
2                100            119.86    130

$salary
Before calibration  After Calibration             Margin 
            434000             470000             470000 

In this example, calibration is indeed exact on the variable "salary" and approximate on the variables "category", "sex", and "department". For each modality of these last three variables, the estimate computed with the penalized weights is closer to the margin than the initial Horvitz-Thompson estimate.
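The figures in the log can be checked by hand. The short R snippet below only re-uses numbers printed above: it computes the gap between the largest and smallest reweighting factor, then the absolute distance to each margin before and after calibration. The variable names are chosen here for illustration.

```r
# Smallest and largest reweighting factor, as printed in the log
ratio_min <- 0.1949513
ratio_max <- 1.595005
ratio_max - ratio_min   # close to the requested gap of 1.4

# Margins and estimates copied from the comparison tables above
margin <- c(category1 = 80, category2 = 90, category3 = 60,
            sex1 = 140, sex2 = 90,
            department1 = 100, department2 = 130)
before <- c(110, 70, 50, 110, 120, 130, 100)
after  <- c(92.77, 84.08, 53.15, 128.9, 101.1, 110.14, 119.86)

# Absolute distance to the margin, before and after calibration
rbind(before = abs(before - margin), after = abs(after - margin))
```

Every entry in the "after" row is smaller than the matching entry in the "before" row, which is the improvement described above.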

Finally, bear in mind that, just as for calibration on margins, statistical results on penalized calibration only hold for sufficiently large sample sizes.

Parameter popTotal

Note that it is much safer to supply a value for the popTotal parameter when doing penalized calibration; Icarus issues a warning if you use penalized calibration without one. The reason is that popTotal is sometimes used when margins are entered as categorical variables. If popTotal is not supplied, calibration is not done on the population total, which can lead to erratic behavior for the categorical variables of the problem. When popTotal is supplied, calibration on the population total is exact.
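As an illustration of how popTotal relates to categorical margins, the counts for each categorical variable in the example above add up to the population total of 230. The R sketch below checks this by hand. It is only a sanity check one might run before calling calibration, not something Icarus does or requires, and the object names are hypothetical.

```r
# Population total passed as popTotal in the example above
pop_total <- 230

# Categorical margins copied from the comparison tables above
margins_by_variable <- list(category   = c(80, 90, 60),
                            sex        = c(140, 90),
                            department = c(100, 130))

# Each categorical variable should sum to the population total
sapply(margins_by_variable, sum)
all(sapply(margins_by_variable, sum) == pop_total)
```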
