# Thoughts on scoring words

**Status:** Draft | January 2025

What makes a word a good word for a crossword? What makes it interesting? Some words might entertain a solver, such as `TANTALIZE`. Others might disgust or frustrate them: consider `MOIST` or `SSW` (the direction). Some words are overused and can cause eyes to roll (`EGOT` and `OLEO`), while others can excite and provoke wonder.

Good words are the backbone of any word puzzle. When combined into a grid, they almost become a form of poetry: a combination of words that engage and delight the solver. A grid can be predetermined in places, as the letters and words require, but it can also have moments of whimsy and surprise, or clever moments that provoke thought.

This document is an attempt to enumerate measurable dimensions for words that could be interesting for word puzzles, and to propose a few ways of using them to score word lists.

There are no absolutes when it comes to language: everyone's lived experience is different, and the language they know, speak, and are familiar with varies from person to person. The score a word gets here may underrepresent its value. Nevertheless, this attempts to provide some structure.

> Crossword setters generally try to write for a common audience and
> make their puzzles accessible. However, there are no absolutes when
> it comes to culture. Words that are familiar to some are obscure to
> others. There's a reason crosswords thrive in local newspapers. A
> common location provides at least some common grounding for a setter
> to target.
>
> For a wonderful musing and history of this from a gendered
> perspective, read [Anna Shechtman's _The Riddles of the Sphinx_](https://www.harpercollins.com/products/the-riddles-of-the-sphinx-anna-shechtman?variant=42834880167970).

## Puzzle Kinds

It's worth noting that the way each puzzle kind uses words affects how it approaches word choice. Standard crosswords use a lot of filler words, and may have less flexibility as to what to choose.
On the other hand, cryptics have relatively few words in their grids and much more freedom to choose them carefully.

# Overall approach

We propose assigning scores to words to get better results when creating grids. These scores would be surfaced in both the _Word List_ and _Autofill_ functionalities. We start with the following assumptions:

* _**Variety is key**_ First and foremost, having a good set of different types of words keeps the solver entertained and engaged.
* _**Don't clump traits**_ It's bad form to have too much similarity in a section of a puzzle.
* _**Where possible use familiar words...**_ It's fine to send your solver to the dictionary for some words, but if they need a dictionary to make any progress you might be making it too hard.
* _**...but not too many**_ Stretching your solver's vocabulary is a plus. In addition, occasionally you need to reach for an obscure word to make an otherwise strong section fill.
* _**Human editing is best**_ Perhaps in the future it will be possible to have AI create high-quality grids, but the best ones will still have a high degree of human intervention. This is meant as an assistive tool and shouldn't be used to override editorial control.

## Traits

We propose a few measurable **traits** for a word, each of which can have a numerical rating. These dimensions can be used to drive variety in a grid, and give the autosolver something to work with beyond word shape. These ratings are considered independently of the grid being filled and can be precomputed.

The traits proposed are:

1. Lexical interest
1. Frequency
1. Familiarity
1. Definition count
1. Sentiment

Each word can have a score for each trait. That would give the setter the ability to assess the overall grid and make decisions. It could also be used by the autosolver to pick better words.

# Details

We propose a way of measuring each of the traits below. For each trait, we discuss how to measure it and touch a little on how to calculate it.
It will take quite some experimentation to build a practical score.

## Lexical Interest: Bigrams and Trigrams

Unusual-looking words that catch the eye are often a plus in crosswords, and a good way to differentiate. One way to make a word unusual is to have an unexpected run of characters. For example, I would argue that `KNAVE` is more interesting than `THINK`. They both have an `N` and a `K` in them, but the `KN` bigram is rarer than `NK`. Likewise, there are some trigrams that are fairly rare, for example `OXC` in `OXCART`.

This is an easier score to calculate as we don't need additional datasets. Go through the word and see if any pair or triple of letters is unexpected. If any of them passes a rarity threshold, we add it to the score.

## Frequency

How often a word is used, and in particular whether it is used a lot, may be an interesting characteristic.

Fortunately, we can use the Google Books Ngram dataset to calculate the frequency of each word. There are also great lists of words used in existing puzzles that we could use to constrain it to crosswords.

**Links:**

* https://books.google.com/ngrams/
* https://cryptics.georgeho.org/

## Familiarity

Familiarity is akin to frequency but is a different rating. Words can be familiar to solvers without being in common parlance.

Familiarity is harder to determine, though there are efforts out there to build a table. We'll have to research this.

**Links:**

* https://arxiv.org/pdf/1806.03431

## Number of definitions and parts of speech

This trait is particularly crossword-centric. Some words (think `SET` and `RUN`) have a lot of different meanings, and are useful for cryptics. We could compose a score valuing words that have a higher number of definitions or multiple parts of speech. We have the data to determine this already.

## Sentiment and beauty / Swearing and profanity

It's possible to determine the sentiment around a word. People have done surveys to determine if it's positive or negative.
Along the same lines, there are profane words that people probably don't want to see while solving a crossword over breakfast.

> **NOTE:** The Peter Broda list has its own scoring
> system. Empirically, it strongly values profanity and crassness in
> its list. We may want to split profanity out as its own trait,
> separate from sentiment.

**Links:**

* https://github.com/stdlib-js/datasets-liu-positive-opinion-words-en
* https://en.wikipedia.org/wiki/Phonaesthetics
* https://www.sciencedirect.com/science/article/abs/pii/0749596X86900215
* https://github.com/surge-ai/profanity

# Other possibilities

It's worth talking about a few things that are too situational to be a good dimension, or are hard to measure or calculate.

## Word Shape

The _shape_ of a word is the sequence of graphemes that are combined to create it. This is highly situational and can't be precomputed.

For example, consider a Standard Crossword with the word `WHEY` in it. That word will work really well in the last row, as every letter in it is a valid and relatively high-frequency last letter for the down clues. If it shifts up a row, you start running into problems. Words ending in `Y?` and `H?` are rarer, and it's not nearly as good a word in that position. The same word would have very different scores based on where it is.

The autofill algorithm tries to account for that by checking the crossing words for their frequency. As a result, we can skip this factor when precomputing scores.

## Clue words

Crosswords are primarily about the clues, of course. Some words, especially for cryptics, just lead to good clues: words with other words inside them, or words with clever common homophones or anagrams. The fact that `SILENT` and `LISTEN` are anagrams is a good (though overused) example of this. The aforementioned `WHEY` is a homophone of `WAY` and `WEIGH`, which also makes it valuable in cryptics.

It's also worth considering word fragments.
For example, words with other words embedded within them make for good cryptic answers too (and are great for rebus puzzles).

I don't yet have a good concept of how to measure this trait.
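
# Appendix: a sketch of the lexical interest score

The bigram and trigram check described under _Lexical Interest_ could be sketched roughly as follows. This is only an illustration under stated assumptions, not a settled design: the function names (`ngrams`, `build_counts`, `surprisal`, `lexical_interest`), the use of surprisal in bits as the rarity measure, the add-one smoothing, the toy word list, and the threshold values are all hypothetical choices made for the example. A real run would count n-grams over the full word list and tune the thresholds by experiment.

```python
from collections import Counter
from math import log2


def ngrams(word, n):
    """Return every run of n consecutive letters in word."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def build_counts(words, n):
    """Count how often each n-gram occurs across a word list."""
    counts = Counter()
    for word in words:
        counts.update(ngrams(word.upper(), n))
    return counts


def surprisal(gram, counts, total):
    """Rarity of one n-gram in bits; add-one smoothing covers unseen n-grams."""
    return -log2((counts[gram] + 1) / (total + 1))


def lexical_interest(word, tables, threshold_bits):
    """Sum the surprisal of each n-gram in word that exceeds its threshold.

    tables maps n (2 for bigrams, 3 for trigrams) to a Counter of n-grams.
    threshold_bits maps n to the minimum surprisal that counts as unexpected.
    """
    score = 0.0
    for n, counts in tables.items():
        total = sum(counts.values())
        for gram in ngrams(word.upper(), n):
            bits = surprisal(gram, counts, total)
            if bits > threshold_bits[n]:
                score += bits
    return score


# Toy data for illustration only (an assumption, not a real word list).
toy_words = ["THINK", "THANK", "BANK", "SINK", "PINK", "INK", "TANK", "KNAVE"]
tables = {n: build_counts(toy_words, n) for n in (2, 3)}
# Hypothetical thresholds; real values would need experimentation.
threshold_bits = {2: 3.5, 3: 3.0}

for word in ("KNAVE", "THINK", "INK"):
    print(word, round(lexical_interest(word, tables, threshold_bits), 2))
```

With this toy list, `KN` occurs once and `NK` seven times, so `KNAVE` ends up scoring higher than `THINK`, which matches the intuition in the text. Summing surprisal rather than counting rare n-grams is one design choice among several; counting, or taking only the rarest n-gram, would be equally reasonable starting points.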