Maximizing Vocabulary Coverage with Books: A Submodular, NP-Hard Selection Problem
Sentiment — Article: Neutral · Community: Mixed
Choosing one book with maximal weighted vocabulary coverage is straightforward and computable in near-linear time. Selecting the best k books, however, is the NP-hard maximum coverage problem, making exact solutions impractical for larger k. Because the objective is monotone submodular, greedy maximization and related heuristics offer efficient approximations with provable guarantees.
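The single-book score described here can be sketched in a few lines: sum the language-wide frequency weights over the unique words a book contains. The frequency table below is a toy placeholder, not real corpus data.

```python
def coverage_score(book_words, freq):
    """Weighted coverage: sum of language-wide frequency weights
    over the *unique* words appearing in the book."""
    return sum(freq.get(w, 0.0) for w in set(book_words))

# Hypothetical frequency weights for illustration only.
freq = {"the": 0.07, "cat": 0.001, "sat": 0.0008, "quantum": 0.00001}
book = ["the", "cat", "sat", "the", "cat"]  # duplicates count once
score = coverage_score(book, freq)  # 0.07 + 0.001 + 0.0008
```

Ranking every book by this score and taking the maximum is the near-linear single-book case; the difficulty only appears when books overlap and k > 1.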
Key Points
- Single-book selection is easy: compute a weighted coverage score using language-wide word frequencies; this can be done in near-linear time after optional stop-word filtering.
- Exact selection of the best two books is already quadratic; the best k-book selection is the NP-hard maximum (weighted) coverage problem.
- Because the objective is submodular and monotone, greedy algorithms provide strong approximation guarantees and are practical.
- Libraries like submodlib implement fast submodular maximization methods suitable for this task.
- Heuristics such as block additions, look-ahead, and pruning overlapping books can improve practical results without changing worst-case approximation bounds.
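The greedy approach from the key points above can be sketched as follows: at each step, pick the book whose not-yet-covered words add the most frequency weight. Since the objective is monotone submodular, this plain greedy loop achieves the classic (1 − 1/e) approximation. The books and weights below are toy assumptions for illustration.

```python
def greedy_max_coverage(books, freq, k):
    """Greedy weighted maximum coverage.

    books: list of sets of unique words per book.
    freq:  word -> language-wide frequency weight.
    Repeatedly selects the book with the largest marginal gain,
    i.e. the total weight of its still-uncovered words.
    """
    covered, chosen = set(), []
    for _ in range(k):
        best, best_gain = None, 0.0
        for i, words in enumerate(books):
            if i in chosen:
                continue
            gain = sum(freq.get(w, 0.0) for w in words - covered)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no remaining book adds anything
            break
        chosen.append(best)
        covered |= books[best]
    return chosen, covered

# Toy instance: three overlapping "books".
freq = {"a": 3.0, "b": 2.0, "c": 2.0, "d": 1.0, "e": 1.0}
books = [{"a", "b"}, {"a", "c", "d"}, {"c", "d", "e"}]
picked, covered = greedy_max_coverage(books, freq, 2)
# picks book 1 first (gain 6.0), then book 0 (marginal gain 2.0)
```

A lazy-greedy variant (caching marginal gains in a priority queue, as libraries like submodlib do) gives the same selections much faster on large collections.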
Sentiment
The community appreciates the algorithmic content but is skeptical about the language learning premise, with most commenters arguing that comprehensible input and graded readers are more effective than vocabulary coverage optimization.
In Agreement
- The submodular optimization problem is genuinely interesting from an algorithmic perspective and the greedy approximation approach is elegant
- Someone built a working prototype of vocabulary coverage optimization using n-grams trained on subtitles, validating the core concept
- The idea connects to historical precedents like the Chinese Thousand Character Text, a 6th-century poem using exactly 1000 unique characters as a language primer
Opposed
- The article's premise is impractical procrastination: worrying about NP-hard optimization instead of actually learning the language
- Comprehensible input theory suggests you want mostly familiar context with few new words, not maximum vocabulary exposure, which is the opposite of what the article optimizes for
- Graded readers and sentence-level SRS flashcards are more effective than book-level selection for language acquisition
- Language learning is fundamentally emotional and motivational, not an algorithmic optimization problem
- The approach ignores what the learner already knows, which is critical for effective vocabulary building