No Stopping: Impact of the Stopping Rule on the Sex Ratio

20 Jun

For social scientists brought up to worry about bias stemming from stopping data collection when results look significant, the fact that a gender based stopping rule has no impact on the sex ratio seems suspect. So let’s dig deeper.

Let there be n families and let the stopping rule be that after the birth of a male child, the family stops procreating. Let p be the probability a male child is born and q=1−p. The proportion of male children among all children born so far is then:

After 1 round: 

\[\frac{pn}{n} = p\]

After 2 rounds: 

\[ \frac{(pn + qpn)}{(n + qn)} = \frac{(p + pq)}{(1 + q)} = \frac{p(1 + q)}{(1 + q)} = p \]

After 3 rounds: 

\[\frac{(pn + qpn + q^2pn)}{(n + qn + q^2n)}\\ = \frac{(p + pq + q^2p)}{(1 + q + q^2)}\\ = \frac{p(1 + q + q^2)}{(1 + q + q^2)} = p\]

After k rounds: 

\[\frac{(pn + qpn + q^2pn + \ldots + q^{k-1}pn)}{(n + qn + q^2n + \ldots + q^{k-1}n)} \]
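The finite case can be closed out without going to infinity. A short worked version, assuming round j adds q^(j-1)pn boys out of q^(j-1)n births (the pattern in rounds 1 to 3 above) and using the standard geometric-sum identity:

\[\frac{pn(1 + q + \ldots + q^{k-1})}{n(1 + q + \ldots + q^{k-1})} = \frac{pn \cdot \frac{1 - q^k}{1 - q}}{n \cdot \frac{1 - q^k}{1 - q}} = p\]

So the proportion is p after any number of rounds, not just in the limit.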

After infinite rounds:

Total male children: 

\[= pn + qpn + q^2pn + \ldots\\ = pn (1 + q + q^2 + \ldots)\\ = \frac{np}{(1 - q)}\]

Total children:

\[= n + qn + q^2n + \ldots\\ = n (1 + q + q^2 + \ldots)\\ = \frac{n}{(1 - q)}\]

Prop. Male:

\[= \frac{np}{(1 - q)} \cdot \frac{(1 - q)}{n}\\ = p\]

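If the result still seems counterintuitive, here’s one way to think about it: in round k + 1, only the nq^k families without a boy are still having children, so the number of boys goes up by npq^k and the total number of kids goes up by nq^k. Every round adds boys and children in the ratio p. Yet another way to think about it is that for any child that is born, the data generating process is unchanged.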
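A quick simulation is one way to check the algebra. The sketch below is illustrative only: the function names, the 200,000 families, p = 0.5, the seed, and the cap of 50 children per family are all assumptions made for the example, not part of the argument above.

```python
import random


def simulate_families(n_families, p_male, max_children=50, seed=0):
    """Return (boys, girls) per family under the stop-after-first-boy rule.

    max_children is a hypothetical safety cap so the loop always terminates.
    """
    rng = random.Random(seed)
    families = []
    for _ in range(n_families):
        boys = girls = 0
        # Keep having children until the first boy arrives (or the cap is hit).
        while boys == 0 and boys + girls < max_children:
            if rng.random() < p_male:
                boys += 1
            else:
                girls += 1
        families.append((boys, girls))
    return families


def aggregate_share_male(families):
    """Share of boys among all children, pooled across families."""
    total_boys = sum(b for b, _ in families)
    total_children = sum(b + g for b, g in families)
    return total_boys / total_children


families = simulate_families(n_families=200_000, p_male=0.5)
share = aggregate_share_male(families)
print(f"aggregate share male: {share:.4f}")
```

With these settings the pooled share should land very close to p, even though no family ever has a second boy.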

The male-child stopping rule may not affect the aggregate sex ratio, but it does change the composition of families. For instance, it causes a negative correlation between family size and the proportion of male children: if your first child is male, you stop, so the smallest families are all male. (For more results in this vein, see here.) A consequence is that women, on average, grow up in larger families, and that may explain some of the poor outcomes of women.
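The family-level effects are easy to see in a simulation. The following is a rough sketch under assumed settings (100,000 families, p = 0.5, a fixed seed; all names are made up for the illustration). It computes the correlation between family size and share male, and the average family size experienced by a boy versus a girl.

```python
import math
import random


def family_sizes(n_families, p_male, seed=1):
    """Family sizes under the stop-after-first-boy rule (one boy per family)."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_families):
        size = 1
        # Each draw that is not a boy is a girl, so the family has another child.
        while rng.random() >= p_male:
            size += 1
        sizes.append(size)
    return sizes


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


sizes = family_sizes(n_families=100_000, p_male=0.5)

# Every family has exactly one boy, so its share male is 1 / size.
corr = pearson(sizes, [1 / s for s in sizes])

# Average family size as seen by a child: weight each family by how many
# boys (always 1) or girls (size - 1) it contains.
avg_size_for_boys = sum(sizes) / len(sizes)
avg_size_for_girls = sum(s * (s - 1) for s in sizes) / sum(s - 1 for s in sizes)

print(f"corr(size, share male): {corr:.3f}")
print(f"avg family size for a boy:  {avg_size_for_boys:.2f}")
print(f"avg family size for a girl: {avg_size_for_girls:.2f}")
```

For a geometric family size the expected values work out to 1/p for boys and 2/p for girls, so with p = 0.5 the typical girl grows up in a family about twice as large as the typical boy's.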

But why does this differ from our intuition that comes from early stopping in experiments? Easy. We define early stopping as stopping data collection as soon as the results are significant. This inflates the number of false-positive results (w.r.t. the canonical sample-fixed-in-advance rule). But early stopping leads to both kinds of false positives: mistakenly thinking that the proportion of females is greater than .5 and mistakenly thinking that the proportion of males is greater than .5. The rule is unbiased w.r.t. the expected value of the proportion.
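To make the contrast concrete, here is a small sketch of early stopping with a fair coin. Everything specific in it is an assumption chosen for illustration: a two-sided z-test at the 1.96 cutoff, a look after every flip from the 20th onward, at most 500 flips, and 2,000 simulated experiments.

```python
import math
import random


def run_experiment(rng, max_n=500, min_n=20, z_crit=1.96):
    """Flip a fair coin max_n times, peeking after every flip from min_n on.

    Returns ((direction, share_of_heads) at the first significant look,
    whether the single test at max_n is significant). direction is 0 and the
    share is taken at max_n if no look was ever significant.
    """
    heads = 0
    peek_result = None
    for n in range(1, max_n + 1):
        if rng.random() < 0.5:
            heads += 1
        # z-statistic for H0: P(heads) = 0.5
        z = (heads - n / 2) / math.sqrt(n / 4)
        if peek_result is None and n >= min_n and abs(z) > z_crit:
            peek_result = (1 if z > 0 else -1, heads / n)
    fixed_significant = abs(z) > z_crit  # test only once, at n = max_n
    if peek_result is None:
        peek_result = (0, heads / max_n)
    return peek_result, fixed_significant


rng = random.Random(2)
results = [run_experiment(rng) for _ in range(2000)]

directions = [peek[0] for peek, _ in results]
peek_rate = sum(d != 0 for d in directions) / len(results)
fixed_rate = sum(fixed for _, fixed in results) / len(results)
too_high = directions.count(1)
too_low = directions.count(-1)
mean_share = sum(peek[1] for peek, _ in results) / len(results)

print(f"false-positive rate, stop when significant: {peek_rate:.3f}")
print(f"false-positive rate, fixed sample size:     {fixed_rate:.3f}")
print(f"'too many heads' vs 'too few heads':        {too_high} vs {too_low}")
print(f"mean share of heads at stopping:            {mean_share:.3f}")
```

Under these assumptions the stop-when-significant rate should come out well above the fixed-sample rate, while the false positives split roughly evenly between the two directions and the average share at stopping stays near .5.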