Friday 6 September 2019

Making Crossword Entries from Jeopardy! Before and After Clues


Before and After clues in Jeopardy! are clues pointing to two answers in which the last part of one answer is also the first part of the second answer. For example, "Supernatural kids' cartoon meets Star Wars prequel" could be a clue to both "Danny Phantom" and "The Phantom Menace", which would be shortened to "Danny Phantom Menace". 


These are a weak spot for me in Jeopardy!, so I tried to make some more crosswords using only the 'before and after' clues from Jeopardy! as found in the J-archive. It worked, sort of.


More specifically, it worked fine, but it took a lot of supervision. There were only about 550 crossword-viable clues and answers in 15 years of shows, and the results are a bunch of trivia-based crossword clues that are either impossible or, well, trivial, depending on the solver's knowledge.


My original hope was that, among the 200,000+ clues that I had from an old J-archive scrape, there would be a couple of thousand clues that could be cleaned and slotted into crosswords automatically to help me practice for the next online test in spring. Instead, there were about 900 clues that had both "before" and "after" in their category name.


library(jsonlite)
library(stringr)

raw = fromJSON("Jarchive2014.json")

> dim(raw)
[1] 216930      7

> before = str_detect(raw$category,"BEFORE")
> after = str_detect(raw$category,"AFTER")
> table(before,after)
       after
before   FALSE   TRUE
  FALSE 215646    195
  TRUE     215    874
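As a rough sketch of the next step, the two logical vectors can be combined to keep only the categories containing both words. The toy data frame below is made up for illustration; only the `category` column name comes from the code above.

```r
library(stringr)

## Toy stand-in for the scraped data (hypothetical rows);
## only the `category` column name is taken from the post
toy = data.frame(
  category = c("BEFORE & AFTER", "BEFORE AN AFTER", "AFTER SCHOOL", "WORLD CAPITALS"),
  stringsAsFactors = FALSE
)

before = str_detect(toy$category, "BEFORE")
after = str_detect(toy$category, "AFTER")

## Keep only rows whose category contains both words
candidates = toy[before & after, , drop = FALSE]
nrow(candidates)   # 2
```

Note that the wordplay category "BEFORE AN AFTER" passes this filter too, which is why the candidates still need a manual look.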



Thankfully, before and after clues are always in text format instead of being audio and visual clues, so they get through most of my usual filters.

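## `all` is the working data frame of clues (columns Clue and Word);
## its construction is not shown in this post
all$Viable = TRUE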

## hyper-references signal audio or video clues
all$Viable[str_detect(all$Clue,"A HREF")] = FALSE
all$Viable[str_detect(all$Clue,"a href")] = FALSE

## These are lists, where a crossword clue won't make sense
all$Viable[str_detect(all$Clue,"<br ")] = FALSE

## Another indication of a video clue
all$Viable[str_detect(all$Clue,"Clue Crew")] = FALSE

## CrossWORD not crossNUMBER
all$Viable[str_detect(all$Word,"[0-9]")] = FALSE


From my experience watching, every 'before and after' clue comes from a category that has both the words 'before' and 'after' in the title. However, it's not a bijection; sometimes these words are used as wordplay to introduce some other category like "BEFORE AN AFTER" for answers like 'rafter' and 'crafter'. In one case, the category was "Before and after", but the clues were about how a single thing had changed before and after some event. Cases like these made up 15-20% of the clues and were removed manually.


From the remainder, I extracted the common word between two parts of the answer, which was usually a single word. This common word was to be the crossword answer, and the remaining part of the Jeopardy! answer would be the crossword clue. So "Danny Phantom Menace" would be clued as "Danny _______ Menace" with the answer in the grid being PHANTOM. For three-word Jeopardy! answers, this worked nearly all the time by taking the middle of the three words. 
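The three-word case could be sketched roughly as follows. The helper name `make_clue` and the seven-underscore blank are assumptions made for illustration, not the code actually used for the database.

```r
library(stringr)

## Hypothetical helper: turn a three-word Jeopardy! answer into a
## crossword clue (outer words plus a blank) and a grid entry (middle word)
make_clue = function(answer, blank = strrep("_", 7)) {
  words = str_split(answer, " ")[[1]]
  stopifnot(length(words) == 3)
  list(
    clue = paste(words[1], blank, words[3]),
    word = toupper(words[2])
  )
}

res = make_clue("Danny Phantom Menace")
res$clue   # "Danny _______ Menace"
res$word   # "PHANTOM"
```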


For four-word answers, this usually (but not always) worked by taking whichever middle word was capitalized and had at least three letters. Under this four-word scheme, "Licence to ______ Bill" and "full ______ of Lords" became clues to "Kill" and "House" respectively. Each of these was manually checked for edge cases like "Gone with the (WIND)shield" and capitalization quirks like "Catherine The (GREAT) Expectations" and "Round (TABLE) of contents". Seeing as before and after is a weak spot of mine, I may have missed a few.
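A minimal sketch of the four-word rule, assuming a hypothetical helper `pick_middle` that returns NULL whenever the rule does not single out exactly one word, so that those answers fall through to manual checking:

```r
library(stringr)

## Hypothetical helper for four-word answers: choose the middle word
## (2nd or 3rd) that is capitalized and has at least three letters
pick_middle = function(answer, blank = strrep("_", 6)) {
  words = str_split(answer, " ")[[1]]
  stopifnot(length(words) == 4)
  middle = 2:3
  ok = str_detect(words[middle], "^[A-Z]") & str_length(words[middle]) >= 3
  ## No match or two matches: leave for manual review
  if (sum(ok) != 1) return(NULL)
  idx = middle[ok]
  clue_words = words
  clue_words[idx] = blank
  list(clue = paste(clue_words, collapse = " "), word = words[idx])
}

pick_middle("Licence to Kill Bill")$word              # "Kill"
pick_middle("full House of Lords")$word               # "House"
pick_middle("Catherine The Great Expectations")       # NULL
```

The last call shows the capitalization quirk: both "The" and "Great" pass the test, so the sketch declines to guess.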

There are even a few triple before and afters, like "Paradise (LOST) (GENERATION) X", "Josephine (BAKER) (STREET) smarts", and "Orlando (MAGIC) (MOUNTAIN) Dew", which I left in. What to do with these is left as an exercise.

About 100 of the answers were five or more words long, at which point it became very hard to find the common term to make crossword clues. These include "the simple (LIFE) goes on", "In the Heat of the (NIGHT) Gallery", "Never Never (LAND) of the midnight sun", and "Pol (POT) calling the kettle black". At this point, I decided not to chase diminishing returns, especially because many of these would be easy crossword clues if either side of the common word was given.
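One way to set the long answers aside is to count words and flag anything with five or more. This is only a sketch; the `answers` vector is a made-up sample built from examples in this post.

```r
library(stringr)

## Made-up sample of Jeopardy! answers taken from examples in this post
answers = c(
  "Danny Phantom Menace",
  "Licence to Kill Bill",
  "In the Heat of the Night Gallery"
)

## Count whitespace-separated words and flag answers with five or more
n_words = str_count(answers, "\\S+")
too_long = n_words >= 5

n_words    # 3 4 7
too_long   # FALSE FALSE TRUE
```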

What remained is a database of clues that could be used to help fill a crossword, but not nearly enough to build a decent one on their own. As such, I gave up trying to make crosswords from these, and instead decided to just share the database. I hope that it helps some future constructor looking for fresh fill for puzzles they are already working on, especially one working for Queer Qrosswords ( https://queerqrosswords.com/ ), or The Inkubator ( https://inkubatorcrosswords.com/ ).


Examples (each clue is followed by its answer on the next line):

Smokey _______ Crusoe
Robinson
Pixie and _______ Chicks
Dixie
My Fair _______ Marmalade
Lady
Paul _______ Texas Ranger
Walker
Miss _______ the Beautiful
America
Electoral _______ of Cardinals
College
Victorias _______ Weapon
Secret
Westminster _______ Road
Abbey
Little _______ of Gibraltar
Rock
birth of _______ Williams
Venus
William Henry __________ Ford
Harrison
Jacques Louis _______ Hasselhoff
David
American _______ Yourself
Express
The Pelican _______ Encounter
Brief
Victor _______ Audiences
Mature
Roe v _______ Boggs
Wade
Veruca _______ Lake City
Salt
Proud ______ Baker Eddy
Mary
Don _______ Chi Minh
Ho
William _______ and Teller
Penn
Huey _______ John Silver
Long
Sharon _______ of Scone
Stone
Marvin v _______ Gaye
Marvin
Andre the _______ Squid
Giant
Spider _______ Friday
Man
Ponce de _______ Spinks
Leon
Yabba Dabba ______ wop
Doo


Crosswords are a great way to pass the time in transit
