We recently published the Berkeley Crossword Solver (BCS), a state-of-the-art system for solving American-style crosswords. BCS combines neural question answering and probabilistic inference to achieve near-perfect performance on most American-style crosswords, such as the one shown below:
Figure 1: An example of an American-style crossword puzzle
An earlier version of BCS, combined with Dr.Fill, was the first computer program to beat all human competitors at the world's top crossword tournament. The latest version achieves state-of-the-art performance on The New York Times crossword, with 99.7% letter accuracy (see our technical paper, web demo, and code release).
Crossword puzzles are challenging for both humans and computers. Many clues are vague or underspecified and cannot be answered until the crossing constraints are taken into account. While some clues resemble factoid question answering, others require relational reasoning or an understanding of complex wordplay.
Here are some examples from our database (answers at the bottom of this post):
- They're issued by Berkeley's Haas School (4)
- Winter hours at Berkeley (3)
- Domain that UC Berkeley was among the first schools to adopt (3)
- Angeleno at Berkeley, say (8)
BCS uses a two-step process to solve crossword puzzles. First, it generates a probability distribution over possible answers for each clue using a question-answering (QA) model. Second, it uses probabilistic inference, combined with local search and a generative language model, to resolve conflicts between intersecting answers.
Figure 2: Architecture diagram of the Berkeley Crossword Solver
The BCS question-answering model is based on DPR (Karpukhin et al., 2020), a bi-encoder model commonly used to retrieve passages relevant to a given question. However, instead of retrieving passages, our approach maps both questions and answers into a shared embedding space and retrieves answers directly. Compared to the previous state-of-the-art method for answering crossword clues, this approach achieved a 13.4% absolute improvement in top-1000 QA accuracy. We performed a manual error analysis and found that our QA model generally performs well on clues involving knowledge, common sense, and definitions, but often struggles with wordplay and theme clues.
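To make the bi-encoder idea concrete, here is a minimal sketch of closed-book answer retrieval: clues and candidate answers are mapped into a shared vector space, and answers are scored by dot product and softmaxed into a distribution. The `embed` function below is a deterministic stand-in for a trained DPR-style encoder, not the actual BCS model.

```python
import hashlib
import math

def embed(text, dim=8):
    # Stand-in for a trained DPR-style encoder: a deterministic
    # pseudo-random unit vector keyed by the text. A real system would
    # use a neural bi-encoder trained on clue-answer pairs.
    h = hashlib.sha256(text.encode()).digest()
    vec = [(h[i] - 127.5) / 127.5 for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def answer_distribution(clue, candidates, temperature=1.0):
    """Score each candidate answer against the clue by dot product in the
    shared embedding space, then softmax into a probability distribution."""
    cv = embed(clue)
    scores = [sum(a * b for a, b in zip(cv, embed(ans))) / temperature
              for ans in candidates]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return {ans: e / z for ans, e in zip(candidates, exps)}

probs = answer_distribution("Winter hours at Berkeley (3)", ["PST", "EDU", "MBAS"])
```

In the real system, the candidate set is the full answer vocabulary, and the resulting distribution for each clue feeds directly into the inference stage described next.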
After running the QA model on each clue, BCS runs loopy belief propagation to iteratively update the answer probabilities across the grid. This allows information from high-confidence predictions to propagate to more difficult clues. Once the belief distributions converge, BCS obtains an initial solution by greedily taking the highest-probability answer at each position.
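The propagation step can be illustrated on a toy grid. In this simplified sketch (not the actual BCS implementation), each slot's answer distribution is repeatedly reweighted by the letter marginals of the slots it crosses, so a confident slot pulls its neighbors toward consistent answers. The two-slot puzzle and its QA priors below are made up for illustration.

```python
from collections import defaultdict

# Toy puzzle: two slots crossing at one grid cell.
# slot -> answer distribution from the QA model (invented priors).
slots = {
    "1-Across": {"PST": 0.6, "PDT": 0.4},
    "1-Down":   {"TEA": 0.5, "SEA": 0.5},
}
# Crossing constraint: letter 2 of 1-Across shares a cell with letter 0 of 1-Down.
crossings = [("1-Across", 2, "1-Down", 0)]

def letter_marginal(beliefs, pos):
    """Distribution over the letter at `pos` implied by a slot's beliefs."""
    m = defaultdict(float)
    for ans, p in beliefs.items():
        m[ans[pos]] += p
    return m

def propagate(slots, crossings, iters=10):
    beliefs = {s: dict(d) for s, d in slots.items()}
    for _ in range(iters):
        new = {}
        for s, prior in slots.items():
            scores = {}
            for ans, p in prior.items():
                score = p
                # Multiply in the crossing slot's letter marginal at the shared cell.
                for sa, pa, sb, pb in crossings:
                    if s == sa:
                        score *= letter_marginal(beliefs[sb], pb)[ans[pa]]
                    elif s == sb:
                        score *= letter_marginal(beliefs[sa], pa)[ans[pb]]
                scores[ans] = score
            z = sum(scores.values()) or 1.0
            new[s] = {a: v / z for a, v in scores.items()}
        beliefs = new
    return beliefs

beliefs = propagate(slots, crossings)
best = {s: max(d, key=d.get) for s, d in beliefs.items()}
# Both PST and PDT place a T at the crossing, so 1-Down resolves to TEA.
```

This mirrors the behavior described above: the certainty that the crossing cell holds a T flows from the across slot and eliminates SEA, even though the QA model was undecided.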
BCS then refines this solution with a local search that tries to replace low-confidence characters in the grid. The search uses a guided proposal distribution, in which characters that received low marginal probabilities during belief propagation are iteratively replaced until a local optimum is found. Alternative characters are evaluated with a character-level language model (ByT5; Xue et al., 2022), which handles novel answers better than our closed-book QA model.
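The search procedure can be sketched for a single answer slot. In this simplified version, the `fluency` function is a crude stand-in for a character-level language model such as ByT5, and only positions whose belief-propagation marginals fall below a confidence threshold are revisited.

```python
import string

def fluency(word, lexicon):
    # Stand-in for a character-level LM score: candidates in a small
    # lexicon score high, with a floor so novel strings stay possible.
    return 1.0 if word in lexicon else 0.01

def local_search(word, confidences, lexicon, threshold=0.5):
    """Greedily try alternative letters at low-confidence positions,
    keeping a change whenever it improves the fluency score."""
    letters = list(word)
    best_score = fluency(word, lexicon)
    for i, conf in enumerate(confidences):
        if conf >= threshold:
            continue  # only revisit characters inference was unsure about
        best_letter = letters[i]
        for c in string.ascii_uppercase:
            letters[i] = c
            score = fluency("".join(letters), lexicon)
            if score > best_score:
                best_score = score
                best_letter = c
        letters[i] = best_letter
    return "".join(letters), best_score

lexicon = {"MBAS", "PST", "EDU", "INSTATER"}
# Final position has low confidence; the search repairs Q -> R.
fixed, score = local_search("INSTATEQ", [0.9] * 7 + [0.2], lexicon)
```

The actual BCS search operates over the whole grid at once, so each character flip must stay consistent with both the across and down answers it touches; this single-slot version only conveys the propose-and-rescore loop.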
Figure 3: An example of the changes made by our local search procedure
We evaluated BCS on puzzles from five major crossword publishers, including The New York Times. Our system averages 99.7% letter accuracy, which rises to 99.9% when puzzles with rare themes are excluded. It solves 81.7% of puzzles without a single error, 24.8% (absolute) better than the previous state-of-the-art system.
Figure 4: Results compared to previous state-of-the-art Dr.Fill
The American Crossword Puzzle Tournament (ACPT) is the largest and longest-running crossword tournament and is organized by Will Shortz, the New York Times crossword editor. Two previous computer crossword systems received significant attention and competed in the ACPT: Proverb and Dr.Fill. Proverb, a system from 1998, finished 213th out of 252 competitors in that year's tournament. Dr.Fill first competed at ACPT 2012 and placed 141st out of 650 competitors. We teamed up with Dr.Fill's creator, Matt Ginsberg, and combined an early version of our QA system with Dr.Fill's search procedure to outperform all 1,033 human competitors at the 2021 ACPT. Our combined system solved all seven puzzles in under a minute, missing only three letters across two puzzles.
Figure 5: Results of the 2021 American Crossword Puzzle Tournament (ACPT)
We’re really excited about the challenges that remain in crossword solving, including handling complex themes and harder wordplay. To encourage future work, we are releasing a dataset of 6.4M clue–answer pairs, a demo of the Berkeley Crossword Solver, and our code at http://berkeleycrosswordsolver.com.
Answers to the clues: MBAS, PST, EDU, INSTATER