HaQ: a proof-of-concept automated trivia answer app

Important disclaimer: use of this app is not only illegal (it violates HQ’s terms and conditions) but also unethical (given the real cash prizes at stake). I haven’t used HaQ to win a live HQ game, nor will I provide a working implementation here. This is a proof-of-concept only, inspired by similar projects.

an animated image showing HaQ answering a trivia question

The HaQing experience

HQ is a live, virtual trivia game show where hundreds of thousands of people puzzle their way through a gauntlet of 12 trivia questions for a chance to split $2,000 or more. In all honesty it’s very Black Mirror-esque.

Almost everyone I know who’s played HQ has admitted to Googling a question in the ten seconds you’re allowed to choose an answer. I’m also not the first person to think about automating this: Stephen Cognetta wrote a piece on using the Tesseract OCR library to parse screenshots taken from a mirrored device, and so did Toby Mellor. All of the writeups I’ve seen, though, rely on mirroring the device using QuickTime, a process an app can potentially detect and block.

Using a webcam as a substitute would get around this, but the logical next step (and a serious upgrade in cool factor) would be to build an app. Something where you hold your smartphone up to the screen, tap a button, and automagically get the right answer.

So I built just that. Meet HaQ.

OCR and parsing

This is what a typical HQ question screen looks like:

a standard HQ Trivia question screen with three answers

From the official Google mobile vision API reference:

a page of text with blocks of text, lines of text, and individual words demarcated

The API will recognize the question and answers as separate blocks of text, making HQ very friendly to OCR. I built my app using the sample codelab for the mobile vision API as a base, which let me visualize the detected text and verify OCR accuracy.

an HQ Trivia question screen with the question and answers demarcated

In practice, the mobile vision API is fast and reasonably accurate. It updates at about 2 frames per second, and its only errors are minor mix-ups between similar-looking characters (ex. “g” and “q” in some fonts).

Once it recognizes the text, it identifies the question and answer strings by their position on the screen, and then passes them on to the next step.
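
Roughly, that sorting step boils down to bucketing text blocks by their vertical position. Here is a minimal Python sketch of the idea; the coordinates, the one-third threshold, and the helper name are illustrative assumptions rather than the app’s actual code (the real app reads bounding boxes from the mobile vision API’s TextBlock objects):

```python
# Sketch: bucket OCR'd text blocks into question vs. answers by vertical
# position. Coordinates and the 0.35 threshold are assumptions; the real
# app gets bounding boxes from the mobile vision API.

def parse_question_screen(blocks, screen_height):
    """blocks: list of (text, top_y) tuples from OCR."""
    ordered = sorted(blocks, key=lambda b: b[1])  # top-to-bottom reading order

    # Assume the question sits in roughly the top third of the screen
    # and the three answer choices are stacked below it.
    question = " ".join(t for t, y in ordered if y < screen_height * 0.35)
    answers = [t for t, y in ordered if y >= screen_height * 0.35][:3]
    return question, answers

blocks = [
    ("Which of these is the capital of New York?", 220),
    ("New York City", 620),
    ("Albany", 760),
    ("Buffalo", 900),
]
print(parse_question_screen(blocks, screen_height=1920))
# ('Which of these is the capital of New York?', ['New York City', 'Albany', 'Buffalo'])
```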

Finding the right answer

We have several advantages here over, say, Watson in a Jeopardy game. We have an internet connection and can Google all we want. We also have three prospective answers, so we know it has to be one of them.

In his writeup, Cognetta details three methods and their relative effectiveness; they’re similar to the approaches others have used:

  • Straight up Googling the question and eyeballing the answers. This doesn’t work for questions that rely on picking one of the given choices (ex. “which of these…” questions), but it works about half the time and is especially useful for “misconception” questions.
  • Googling the question and counting instances of each answer on the results page. This can be automated and works more often than the previous method, but it still gets fooled by misconception questions (a rough sketch of this approach follows the list).
  • Googling the question together with each answer separately and comparing the number of results returned. This works better for “which of these…”-type questions but can still fail depending on the relative popularity of the answer terms.
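
The second of those methods is the easiest to automate. Here’s a rough Python sketch of the idea; the URL, header, and raw-HTML string counting are simplifications for illustration (and, as noted below, scraping Google this way is against their Terms and Conditions), not the app’s actual implementation:

```python
# Sketch: Google the question, then count how often each answer choice
# appears in the results page. Simplified for illustration; real scraping
# needs proper markup handling (and violates Google's ToS).
import requests
from urllib.parse import quote_plus

def count_answer_mentions(question, answers):
    url = "https://www.google.com/search?q=" + quote_plus(question)
    html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}).text.lower()
    return {answer: html.count(answer.lower()) for answer in answers}

counts = count_answer_mentions(
    "In what year was the Statue of Liberty created?",
    ["1886", "1902", "1875"],  # hypothetical answer choices
)
print(counts, "->", max(counts, key=counts.get))
```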

As tempting as it is to write some sort of deep-learning, natural-language-processing algorithm to semantically analyze each question, I knew I would get nowhere and the prototype would never get finished. So I used an ensemble method similar to the others:

  • Three separate queries of the question string to Google, Bing, and DuckDuckGo, counting the number of occurrences of each answer in the search results.
  • One Google query combining the question with each answer separately, comparing the number of results returned.
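
Each of those four queries yields a score for every answer choice (occurrence counts for the first three, a result count for the fourth), and each query independently picks its own winner. Here’s a sketch of that per-query decision with invented numbers; the helper name and the tie rule are my assumptions, not necessarily what the app does:

```python
# Sketch: turn one query's scores (answer -> count) into a pick,
# or "?" if the query was inconclusive. Numbers below are invented.
def pick_or_unknown(scores):
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] == 0:
        return "?"
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "?"  # a tie tells us nothing
    return ranked[0][0]

queries = {
    "Google occurrences":     {"Albany": 7, "New York City": 3, "Buffalo": 1},
    "Bing occurrences":       {"Albany": 5, "New York City": 4, "Buffalo": 0},
    "DuckDuckGo occurrences": {"Albany": 2, "New York City": 2, "Buffalo": 1},
    "Google result counts":   {"Albany": 9_100_000, "New York City": 41_000_000, "Buffalo": 6_300_000},
}
for name, scores in queries.items():
    print(f"{name}: {pick_or_unknown(scores)}")
# Two votes for Albany, one "?", and one vote skewed toward New York City.
```

In the app, each per-query pick ends up on its own button, as described later.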

Incidentally, this is against Google’s Terms and Conditions, so sorry Google. (Also, hire me.) Below is some console output showing the raw numbers for each method.

a console log showing search result numbers for each potential answer

Legality aside, these methods in combination work with near-perfect accuracy on simpler, classic trivia questions. Even with some types of harder questions the ensemble method will still give a good idea of what the correct answer choice could be. However, this is far from a game-breaking app; it performs abysmally on many types of questions and occasionally struggles to even return a definite answer.

Some quirks:

  • Search result counting is often very skewed in favor of popular search terms. The answer choice “New York City” would have many more search results than “Albany” regardless of what the question actually asks.
  • These methods are far more effective for “classic” trivia questions (ex. in what year was the Statue of Liberty created?) than for pop culture questions (ex. which of the following male fashion trends made a comeback this year?).
  • The app is significantly worse at parsing questions that contain a negation such as “not”, “isn’t”, etc. Sometimes picking an answer choice that HaQ doesn’t display yields the right answer, but that strategy fails at least as often as it works.
  • Looking for short answers in a list of search results has to be handled with caution. For example, one question about video game companies had “IGN” as an answer, which a naive substring search would also count inside unrelated words such as “signing” and “ignition” (see the word-boundary sketch after this list).
  • All four queries combined usually return in under two seconds, so the app is fast enough to use in practice.
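
The short-answer problem in particular has a cheap fix: match answers on word boundaries rather than raw substrings. A quick sketch (the snippet text is made up):

```python
import re

def count_whole_word(term, text):
    """Count `term` only as a standalone word, so "IGN" isn't counted
    inside "signing" or "ignition"."""
    return len(re.findall(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE))

snippet = "IGN reviewed the game after the signing event; ignition problems aside."
print(snippet.lower().count("ign"))      # 3 -- naive substring count
print(count_whole_word("IGN", snippet))  # 1 -- word-boundary count
```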

Displaying the answers

Since this is a proof-of-concept prototype, I didn’t make it too fancy. There are four buttons, each displaying the logo of a search engine; when its query returns, each button shows the most likely answer (or “?” if the query was inconclusive).

I thought about displaying the raw data behind each analysis, such as the number of search results found or the number of occurrences of each potential answer in the search results, but it became too complicated and the ensemble method was accurate enough anyway.

Potential improvements

  • Adding more methods to the ensemble would naturally give greater accuracy. I limited my search engine scraping so as not to get my IP blocked (and, again, it’s against the Terms and Conditions), but legitimate options like the Faroo API or the Bing Search API would let me pull more data.
  • Learning how to parse trivia questions is a rabbit hole that leads to several fields at the forefront of computer science and data science – ex. natural language processing, deep learning, artificial intelligence – which are technically challenging to develop and implement but would undoubtedly lead to better accuracy.
  • If I were developing this for public distribution, which I’m not, I would make the user interface nicer.
  • While the OCR is accurate enough to be reliable under simulated real-world conditions, it can err on letters that look like others. Search engines autocorrect queries, so a misspelled question isn’t a problem, but searching for a misspelled answer would likely return no results (ex. searching for “Albanq” instead of “Albany”). Some way to autocorrect misspelled search terms, or to extract the most likely reading from OCR across multiple frames, would help alleviate this problem; a rough sketch of the multi-frame idea follows.
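
One cheap version of the multi-frame idea: keep the OCR readings for each answer slot across a few frames, fold near-duplicates together with a fuzzy match, and take the most common reading. A sketch using Python’s difflib (the frame data here is invented):

```python
from collections import Counter
from difflib import get_close_matches

def stabilize(readings, cutoff=0.8):
    """Fold near-duplicate OCR readings (e.g. "Albanq" vs "Albany") into
    one bucket, then return the most common reading across frames."""
    buckets = Counter()
    for reading in readings:
        match = get_close_matches(reading, list(buckets), n=1, cutoff=cutoff)
        buckets[match[0] if match else reading] += 1
    return buckets.most_common(1)[0][0]

# Hypothetical OCR output for one answer choice over five frames:
print(stabilize(["Albany", "Albanq", "Albany", "Aibany", "Albany"]))  # Albany
```

This still labels each bucket with whichever reading arrived first, so a dictionary-based correction pass would be a natural next step.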