Gojiberries (Page 2)



WAR For Cricket

In baseball, WAR (Wins Above Replacement) is a comprehensive metric to quantify a player’s contribution to team success, relative to a replacement-level player. The idea has slowly migrated to other sports. In cricket, however, WAR remains underdeveloped. Most public evaluations still rely on averages, strike rates, or wickets — useful,

Pareto ML Deployments

In machine learning, a common deployment strategy is to replace an existing model with one that performs better overall. Another common strategy refines this approach by limiting deployment to user segments or regions where the improvements are clear. Both approaches allow regressions: new errors on cases that the old model
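One way to make "regression" concrete is to count the cases the old model got right that the new model now gets wrong. The sketch below is a minimal illustration, assuming binary labels and per-example predictions from both models. The helper names (`regressions`, `accuracy`, `is_pareto_improvement`) are hypothetical and are not taken from the post.

```python
def regressions(y_true, old_pred, new_pred):
    """Return indices the old model got right but the new model gets wrong."""
    return [
        i
        for i, (y, old, new) in enumerate(zip(y_true, old_pred, new_pred))
        if old == y and new != y
    ]


def accuracy(y_true, pred):
    """Fraction of examples predicted correctly."""
    return sum(y == p for y, p in zip(y_true, pred)) / len(y_true)


def is_pareto_improvement(y_true, old_pred, new_pred):
    # Hypothetical criterion: the new model introduces no new errors.
    # With 0/1 correctness this also guarantees accuracy cannot drop.
    return not regressions(y_true, old_pred, new_pred)


# Toy data (made up for illustration).
y_true   = [1, 0, 1, 1, 0, 1]
old_pred = [1, 0, 0, 1, 1, 0]  # 3/6 correct
new_pred = [1, 1, 1, 1, 0, 1]  # 5/6 correct, but index 1 is a new error
```

Here the new model is better overall (5/6 vs. 3/6) yet still regresses on index 1, so a "deploy if better overall" rule would ship it while a Pareto rule would not.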

(Don't) Forget About It: Toward Pareto Improving GD

Machine learning models don't improve like traditional software. When we “update” a model, it sometimes begins to mishandle cases it previously solved, an outcome known as regression or “forgetting.” This issue is well-studied in continual learning, where models learn multiple tasks sequentially (French, 1999). Standard solutions

Greedy is Good. Less Greedy May be Better.

Forward stepwise regression, agglomerative hierarchical clustering, and CART rely on a simple principle: make the best local choice at each step. Greedy choices can also be optimal when problems possess the greedy choice property—where globally optimal solutions can be reached through locally optimal decisions, as in minimum spanning trees
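Kruskal's algorithm for minimum spanning trees is a standard example of a problem with the greedy choice property: repeatedly taking the cheapest edge that does not form a cycle yields a globally optimal tree. The sketch below is a textbook version with a simple union-find structure, included only to illustrate the principle; it is not code from the post.

```python
def kruskal(n, edges):
    """Minimum spanning tree of an undirected graph with nodes 0..n-1.

    edges: list of (weight, u, v) tuples.
    Returns (total_weight, chosen) where chosen holds (u, v, weight) tuples.
    """
    parent = list(range(n))

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, chosen = 0, []
    for w, u, v in sorted(edges):  # consider edges from cheapest to most expensive
        ru, rv = find(u), find(v)
        if ru != rv:  # greedy choice: keep the edge only if it joins two components
            parent[ru] = rv
            total += w
            chosen.append((u, v, w))
    return total, chosen


# Toy graph (made up for illustration).
edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
total, chosen = kruskal(4, edges)  # picks edges of weight 1, 2, 3 -> total 6
```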

Hungary For More? Optimal 1-to-Many Matching for Causal Inference

The Hungarian algorithm (Kuhn–Munkres) efficiently finds optimal one-to-one matches between treated and control units by minimizing total matching cost (typically Euclidean distance in covariate space). It has been used for estimating treatment effects via matching. But it has a limitation: it is strictly one-to-one. In many causal inference settings,
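To make the objective concrete, the sketch below finds the one-to-one matching of treated to control units that minimizes total Euclidean distance. It is a deliberately naive stand-in: it solves the same assignment problem as the Hungarian algorithm, but by exhaustive search over permutations, so it only works for tiny examples. In practice one would use a polynomial-time solver such as SciPy's `scipy.optimize.linear_sum_assignment`. The function name and toy data are hypothetical.

```python
import math
from itertools import permutations


def optimal_one_to_one(treated, control):
    """Min-total-distance 1:1 matching by brute force (factorial time).

    Each treated unit gets exactly one control, and each control is used
    at most once, the one-to-one constraint discussed above.
    Returns (pairs, cost) with pairs as (treated_index, control_index).
    """
    if len(treated) > len(control):
        raise ValueError("need at least as many controls as treated units")
    best_cost, best_pairs = math.inf, None
    for perm in permutations(range(len(control)), len(treated)):
        cost = sum(math.dist(treated[i], control[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_pairs = cost, list(enumerate(perm))
    return best_pairs, best_cost


# Toy covariates (made up for illustration).
treated = [(0.0, 0.0), (5.0, 5.0)]
control = [(5.0, 6.0), (0.0, 1.0), (10.0, 10.0)]
pairs, cost = optimal_one_to_one(treated, control)  # [(0, 1), (1, 0)], cost 2.0
```

Note that the third control unit goes unused even if it were close to a treated unit; allowing a treated unit to take several controls is exactly what a 1-to-many formulation relaxes.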

Optimizing Early Trajectories in K-Means Clustering: Lookahead Initialization for K-Means

K-Means performance depends heavily on how clusters are initialized. While k-means++ improves over random starts by spreading centroids apart, it’s still greedy and can lock into suboptimal configurations—especially in noisy or high-dimensional data. This post explores a simple tweak: lookahead initialization. For each candidate seed, we simulate a
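For reference, the baseline that lookahead initialization builds on is k-means++ seeding: pick the first center uniformly at random, then pick each subsequent center with probability proportional to its squared distance from the nearest center already chosen. The sketch below implements only that standard seeding step in pure Python, not the lookahead variant; the function names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng=None):
    """Standard k-means++ seeding (D^2 sampling). Returns k initial centers."""
    rng = rng or random.Random()
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a center; fall back to uniform.
            centers.append(rng.choice(points))
            continue
        # Greedy-ish step: far-away points are much more likely to be picked.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


# Two well-separated toy clusters (made up for illustration).
points = [(0, 0), (0, 1), (1, 0), (1000, 1000), (1000, 1001), (1001, 1000)]
centers = kmeans_pp_init(points, 2, rng=random.Random(0))
```

Each new center depends only on the centers chosen so far, which is the one-step greediness the post points to: the seeding never reconsiders an earlier pick in light of how clustering would actually proceed.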

The Invisible Hand Needs a Hand: Two Short Vignettes on Market Design

Top Choice

On LinkedIn (Premium?), users can mark up to three jobs per month as a “Top Choice,” making them 43% more likely to hear back. (Note: Given that the relevant base rate is likely close to 1%, a 43% relative increase means going from 1 to 1.43 per

Credit Where It Is Due: Two Facts About Credit Cards

The Credit Card Market Has Become More Concentrated Over Time

In 1987, the top 10 credit card issuers accounted for only about two-fifths of the market, measured by outstanding balances. In 2024, "A relatively small group of card issuers holds most of the outstanding credit card balances, with