Doubly Robust (DR) estimators of ATE are all the rage. One popular DR estimator is Robins’ Augmented IPW (AIPW). The reason why Robins’ AIPW estimator is called doubly robust is that if either your IPW model or your y ~ x model is correctly specified, you get ATE. Great!

Calling something “doubly robust” (DR) makes you think that the estimator is robust to (common) violations of commonly made assumptions. But DR replaces one strong assumption with one marginally less strong assumption. It is common to assume that IPW or Y ~ X are right. But DR replaces either of these with the OR clause. So how common is it to get either of the models right? Basically never. If neither model is right, you multiply the bias terms. And that ought to blow up the bias.

(There is one more reason to worry about the use of word ‘robust.’ In statistics, it is used to convey robustness of to violations of distributional assumptions.)

Given the small advance in assumptions, it turns out that the results aren’t better either (and can be substantially worse):

- “None of the DR methods we tried … improved upon the performance of simple regression-based prediction of the missing values. (see here.)
- “The methods with by far the worst performance with regard to RSMSE are the Doubly Robust (DR) approaches, whose RSMSE is two or three times as large as the RSMSE for the other estimators.” (see here and the relevant table is included below.)

Some people prefer DR for efficiency. But the claim for efficiency is based on strong assumptions being met: “The local semiparametric efficiency property, which guarantees that the solution to (9) is the best estimator within its class, **was derived under the assumption that both models are correct.** This estimate is indeed highly efficient when the π-model is true and the y-model is highly predictive.”

p.s. When I went through some of the lecture notes posted online, I was surprised that the lecture notes explain DR as “if A or B hold, we get ATE” but do not discuss the modal case.