Diabetes as a Dummy Variable

30 Dec

“Adults with diabetes are nearly twice as likely to have heart disease or stroke as adults without diabetes.”

NIH

These alarming numbers are used to justify more aggressive preventive care for diabetics, e.g., pre-emptive testing such as a coronary calcium score and more aggressive use of statin therapy.

The analyses that underlie these risk estimates compare the outcomes of people diagnosed with diabetes to the outcomes of similar non-diabetics; in other words, they treat diabetes as a dummy variable.

Let’s for a moment separate the diagnosis of diabetes (which may be based on a single HbA1c reading above 6.5%) from how well diabetics control their blood sugar (“glycemic control”; HbA1c < 7). For argument’s sake, let’s also assume that glycemic control is the true causal variable behind the elevated risk of heart disease and stroke: risk rises with how poorly blood sugar is controlled (intensity) and how long it stays uncontrolled (duration), while the risk profile of diabetics who achieve glycemic control looks no different from that of the ‘similar non-diabetic’ others. These predicates imply that the excess heart attacks and strokes happen among patients who cannot control their blood sugar.

Now say that only about 25% of diabetics achieve glycemic control (see here for a study of patients on insulin therapy). It follows that a policy learned on the diabetic/non-diabetic dummy overtreats diabetics who achieve glycemic control and undertreats those who don’t. Improving the precision of the targeting variable may improve outcomes.
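A minimal simulation makes the point. All numbers below are assumed for illustration: 10% prevalence, 25% of diabetics in control, and an excess event risk that attaches only to uncontrolled blood sugar.

```python
# Sketch: what a dummy-variable comparison hides (all parameters assumed).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

diabetic = rng.random(n) < 0.10          # assumed 10% prevalence
controlled = rng.random(n) < 0.25        # assumed 25% achieve glycemic control
baseline_risk, excess_risk = 0.05, 0.05  # illustrative event risks

# True risk depends on glycemic control, not on the diagnosis itself.
risk = baseline_risk + excess_risk * (diabetic & ~controlled)
event = rng.random(n) < risk

print("All diabetics:         ", round(event[diabetic].mean(), 3))
print("Uncontrolled diabetics:", round(event[diabetic & ~controlled].mean(), 3))
print("Controlled diabetics:  ", round(event[diabetic & controlled].mean(), 3))
print("Non-diabetics:         ", round(event[~diabetic].mean(), 3))
```

The pooled comparison shows diabetics at nearly twice the risk of non-diabetics, yet controlled diabetics look just like non-diabetics; the dummy variable blends two very different groups.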

p.s. It may be that for some therapies the optimal policy is ‘overtreatment,’ as the famous metformin study, which found that diabetics on metformin lived longer than non-diabetics, suggests.

What do we learn from bestseller regressions?

26 Dec

In The Bestseller Code, Archer and Jockers learn about the attributes of bestsellers by regressing whether or not a book is a bestseller on features of the book’s text. They find that topics like “human closeness” are prognostic of a book’s success. So is the author’s ability to focus on a few topics.

“It turns out that successful authors consistently give that sweet spot of 30 percent to just one or two topics, whereas non-bestselling writers try to squeeze in more ideas. To reach a third of the book, a lesser-selling author uses at least three and often more topics. To get to 40 percent of the average novel, a bestseller uses only four topics. A non-bestseller, on average, uses six.”

The Bestseller Code

The authors also conclude that “[t]wo notable sets of underperforming topics are all things fantastical and otherworldly.” This got me thinking about whether the insights have stood the test of time (the regression does not account for the evolution of readers’ tastes) and whether they were right to begin with.

What is clear is that it is hard to interpret “underperforming.” One understanding of underperforming is that people don’t like books on topic X. Another is that people love books on topic X but, because of that, there is a greater supply of books on topic X, making any single book on topic X less likely to succeed (see here for a related simulation). As the authors write (in a slightly different context):

“Copycat publishing works just that way. After The Girl with the Dragon Tattoo, there was a vogue for publishing Swedish crime writers across the world.”

The authors seem to side with the former interpretation, though the latter explanation seems likelier. Why does it matter? It suggests that we may learn more about readers’ tastes by modeling supply. And if the aim is to figure out the optimal strategy as an author, it makes sense to consider both readers’ preferences and what is undersupplied in the market (assuming the cost of supplying each topic is the same).
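To make the supply-side story concrete, here is a minimal sketch with assumed numbers: readers support a fixed number of hits per topic, but popular topics attract proportionally more books, so any single book on a beloved topic fares worse.

```python
# Sketch of the supply-side explanation (all numbers assumed).
demand = {"topic_A": 100, "topic_B": 500}       # hits readers will support
supply = {"topic_A": 1_000, "topic_B": 20_000}  # books published per topic

for topic in demand:
    print(f"{topic}: P(any one book succeeds) = {demand[topic] / supply[topic]:.1%}")

# topic_A: 10.0% vs. topic_B: 2.5% -- a regression on topic would label B
# "underperforming" even though readers prefer it.
```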

Reading Readers’ Needs

14 Dec

With Gaurav Gandhi

The books people pick to learn from often fail them. All too often, the book has intimidating jargon, or the examples are irrelevant to the reader, or there is too much duplication. Or the content is just wrong. 

Readers make poor choices about books because finding a ‘good’ book that matches their level and needs is hard. In fact, given how diverse readers’ needs are, the right book likely doesn’t exist for many, if not most, readers.

Technology can help improve this equilibrium. It can improve the supply by helping writers write more, and more clearly. It can also reduce search costs by helping readers pick books better suited to them. Technology can also make books more comprehensible by customizing the content to the reader’s ability and interest. For starters, we could translate the text into the reader’s language of choice. Or let readers listen to it in someone else’s voice (convert it into an audiobook). To better cater to readers’ abilities, we could dynamically adapt the text’s difficulty and add definitions and illustrations based on the reader’s preferences.

No text is perfect. But the reader may be unaware of the gaps in data and arguments. A service that lists alternative views and links to other arguments, data, and authoritative sources may be useful.

It is conventional wisdom that discussion can improve comprehension. We could follow that logic and provide tools and access to a community to discuss the book. If building a community is hard, a bot could engage the user in a discussion. 

Readers don’t just read to comprehend; they also read to retain, share, and synthesize. But most reading applications today provide scant support for these needs. Note-taking and tools that aid retention, e.g., auto-generated flashcards and quizzes, are largely missing.

Many of these gaps don’t just exist for reading. They also exist for other modes of information consumption. The missing features are a symptom. The underlying problem is that we do not recognize the need for an app for unstructured (or semi-structured) learning. The good news is that LLMs can now help fill many of the gaps. The only thing we need to do is get building!
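As one illustration of that last point, here is a minimal sketch of auto-generating flashcards from a passage. It assumes the OpenAI Python client with an API key in the environment; the model name and prompt are placeholders, and any capable LLM would do.

```python
# Sketch: auto-generating flashcards from a passage with an LLM.
# Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

def make_flashcards(passage: str, n_cards: int = 5) -> str:
    """Ask the model for Q&A flashcards grounded strictly in the passage."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; swap in any capable model
        messages=[
            {"role": "system",
             "content": "You write concise Q&A flashcards strictly from the given text."},
            {"role": "user",
             "content": f"Write {n_cards} flashcards (Q: ... / A: ...) from:\n\n{passage}"},
        ],
    )
    return response.choices[0].message.content

print(make_flashcards("Your book passage goes here."))
```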

Inter-Group Prejudice

16 Dec

Prejudice is a bane of humanity. Unjustified aversive beliefs and affect are the primary proximal causes of aversive behavior toward groups. Such beliefs and sentiments cause aversive speech and physical violence. They also serve as justification for denying people rights and opportunities. Prejudice also creates a deadweight loss. For instance, many people refuse to trade with groups they dislike. Prejudice is the reason why so many people lead diminished lives.

So why do so many people hold incorrect aversive beliefs about other groups (and, commensurately, unjustifiably positive beliefs about their own group)?

If you have suggestions about how to improve the essay, please email me.