In many matching papers, the key claim runs as follows: our matching method is better than others because, on this contrived dataset, its treatment effect estimates come closest to those from the ‘gold standard’ (experimental evidence).
Let’s set aside, for now, one important concern: evidence that a method works better than other methods on some data is hard to interpret because we do not know whether the result generalizes. Ideally, we want to understand the circumstances under which the method works better than the alternatives. And if the claim is that the method always works better, then prove it.
There is a more fundamental concern here. Matching changes the estimand: it prunes the data by taking out regions with low support. But the regions that are taken out vary by matching method. So, technically, estimates that rely on different matching methods target different estimands: treatment effects over different sets of rows. If the estimate from method X comes closer to the gold standard than the estimate from method Y, it may simply be because the set of rows method X selects produces a treatment effect that is closer to the gold standard. It does not, however, mean that method X’s inference on the set of rows it selects is the best. (And we do not know how the estimate technically relates to the ATE.)
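The point about shifting estimands can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the single covariate `x`, the effect function `tau = 1 + 2x`, the two pruning rules standing in for "method X" and "method Y", and the 0.2 bias assigned to method Y's estimator. It is not any particular matching algorithm, only a toy in which the row-level effects are known.

```python
import numpy as np

# Hypothetical population: one covariate on an even grid and a known
# row-level treatment effect that varies with it (illustration only).
x = np.arange(1001) / 1000.0
tau = 1.0 + 2.0 * x


def subset_estimand(tau, keep):
    """True average treatment effect over the rows a method retains."""
    return float(tau[keep].mean())


# Two hypothetical pruning rules standing in for two matching methods.
keep_x = x <= 0.5                     # "method X" drops the upper half
keep_y = (x >= 0.25) & (x <= 0.75)    # "method Y" drops both tails

ate = float(tau.mean())               # what the experimental benchmark targets
target_x = subset_estimand(tau, keep_x)
target_y = subset_estimand(tau, keep_y)

# Assume method X estimates its own target perfectly, while method Y is
# biased by 0.2 on its own target (both numbers are made up).
est_x = target_x + 0.0
est_y = target_y + 0.2

# Distance to the benchmark versus error on the method's own estimand.
gap_x, gap_y = abs(est_x - ate), abs(est_y - ate)
own_err_x, own_err_y = abs(est_x - target_x), abs(est_y - target_y)

print("ATE:", ate, "target X:", target_x, "target Y:", target_y)
print("gap to benchmark  X:", gap_x, " Y:", gap_y)
print("error on own rows X:", own_err_x, " Y:", own_err_y)
```

In this toy setup, method Y lands closer to the benchmark even though its estimator is worse on the rows it keeps. Closeness to the benchmark therefore mixes two things: which rows a method retains and how well it estimates the effect on those rows.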