The Great Knowledge Box Showdown: Google Now vs. Siri vs. Cortana

Google’s Knowledge Graph has been the center of much attention lately. We have been hearing a lot about another concept called the Knowledge Vault as well. But just how extensive is Google’s capability? And what about Siri and Bing/Cortana? How do they stack up? To find out, we loaded the Google App onto an iPhone (Google Now is part of the Google App), tested Siri, got our hands on a Windows Phone to test Cortana, and took all three for an extended test drive.
UPDATE! (20 September 2015) With the introduction of Siri for iOS 9, Re/code Magazine asked Perficient Digital to rerun our question set to see how much Siri had improved (if at all). Read the results here.

These are the things we set out to measure in this study. To do that, we ran 3,086 different queries on all three platforms and compared the results. These were not random queries. In fact, they were picked because we felt they were likely to trigger a knowledge panel.
Also, this was a straight-up knowledge box comparison, not a personal assistant comparison. Please note, too, that Cortana is in beta and promotes itself as a personal assistant. For purposes of this study, a “knowledge box” or “knowledge panel” is defined as content in the search results that attempts to directly answer a question asked in a search query. Others in the industry sometimes refer to these as “Answer Boxes”. Here is a simple example of one:

Knowledge boxes can show up in many forms, including:

On the right rail of the search results
As step-by-step instructions above the regular web search results
As a structured snippet incorporated into the regular web search results
In the form of a carousel above the search results

All queries in this test were spoken, using voice commands in each platform’s app, even for Google and Bing. We did this because many queries behave differently in Google and Bing when they are typed rather than spoken, and we wanted a straight apples-to-apples comparison. The devices used were:

Cortana running on a Nokia Lumia 635 Windows Phone
Siri running on the iPhone 4s and iPhone 5
The Google App (of which Google Now is a part) running on the iPhone 4s and iPhone 5

You can see Perficient Digital staff members Caitlin O’Connell and Justin Markuson demonstrate some basic queries in this short 3-minute video.


Types of Results

Google uses many sources of data for the Knowledge Graph. Here is what Google’s Amit Singhal told us about that back in May 2012:

Google’s Knowledge Graph isn’t just rooted in public sources such as Freebase, Wikipedia and the CIA World Factbook. It’s also augmented at a much larger scale because we’re focused on comprehensive breadth and depth. It currently contains more than 500 million objects, as well as more than 3.5 billion facts about and relationships between these different objects. And it’s tuned based on what people search for, and what we find out on the web.


Note: When this study was first published, the screenshots shown here were taken from desktop search; we have since replaced them. However, ALL searches in the test were performed by voice on a phone, as detailed above. This was simply an author error in taking the screenshots.
Not only does Google pull from many sources, but they also have many different types of ways of presenting results. Let’s look at a few interesting examples:

Not only do we get our answer, but Google offers us info on three other tall buildings. Google noticed that people who search on the height of the Eiffel Tower often want to know the height of other tall buildings. If you click on the “Burj Khalifa” link, it becomes even more interesting. Here is what you get:

The fascinating part about this result is that it has dramatically expanded the number of options with regard to other famous buildings, by presenting us a carousel (the common industry name for the strip of results up at the top) of results. I don’t get that result if I simply search on “burj khalifa height.” Instead, I get a much simpler variation as follows:

As you see here, the results are quite different. The version with the carousel reflects the fact that I clicked on the link for Burj Khalifa when viewing the Eiffel Tower result. As soon as Google saw I was interested in more than one building, they gave me an even larger set. They could potentially figure out which buildings to show by seeing which queries typically follow your current query.
For example, of all the people who search “how tall is the Eiffel Tower,” how many then search for some other building? Chances are, the most popular follow-on queries relate to Burj Khalifa, the Empire State Building, and the Statue of Liberty, which is why these are shown in the Eiffel Tower result above. However, it seems that not many people run follow-on queries for other buildings after searching on “burj khalifa height.”
It’s important to note that this is speculation, and it could also be that Google is simply testing different variants to see what works best. But in the long run, you can anticipate that a combination of statistics, testing, and UI design will significantly increase the variety of search results you might see, depending on the order in which you perform your searches.
You can also see other types of results. Some of these are extracted from third party web sites, such as this one:

This one is drawn from Wikipedia, but Google may also draw information from other web sites. One example of this is shown here:

This result is an example of what we call step-by-step instructions, where you can actually receive a full procedure in the search results. These also divide into three different types:

All the required steps are presented in the results, so visiting the website is not needed to get the requested info.
Only some of the steps are provided, and therefore getting the complete process requires going to the source site.
All of the steps are provided, but some of the steps are not completely detailed, so this still requires going to the specified site to get all the info needed by the searcher.

Another type of result is what we refer to as a “structured snippet.” These are results that look like the following:

Notice how we see a regular search result, but some information has been extracted directly from that result and shown inline. For now, these are relatively rare. They may just be something that Google is testing for the moment, and which could expand significantly in the future if Google likes the results.
One last consideration that we examined is accuracy, or whether or not Google provides the answer to the question asked. Here is one fun example, from Google’s desktop search when we first tried it, of a query where Google did not answer the question:

Note that when we originally did the study, we saw a similar result in the phone results, but it appears that Google has fixed it, as you can see here:

At least they left the snarky part in.

What About Siri?

We tested all the same queries on Siri as well. It is well known that Siri sources data from Wolfram Alpha (a knowledge-based search engine that was at one time touted as a Google killer), but our testing showed that it also pulls in results from Wikipedia, Yahoo, and Bing. Here is a sample result that pulls in data from Wolfram Alpha:

Here is a sample result using Wikipedia as a source:

In case you were wondering, I happen to like Black Russians. Next up is a sample result from Yahoo:

Note that the question on this one was “when is the sunrise,” and the answer I get is when the sun rose this morning here in Southborough, Massachusetts. It also appears that Siri draws its image search results from Bing, as shown here:

Like Google, Siri makes mistakes too. For example, when I ask “what does a cardiologist do,” I get this answer:

As you can see, Siri provides me information on two cardiologists located near where I am, which does not relate to the question asked at all. Last, but not least, Siri provides some very entertaining results as well. Here is one of the more fun ones:

So now you know.

And Bing/Cortana?

As with Google, there are queries that respond differently when spoken (using Cortana) than when simply entered into the search box using your keyboard. For that reason, all queries tested were spoken. Here is an example of a simple direct answer query:

For a large number of the tested queries, Cortana returned YouTube videos that purport to answer the questions. We did not count these as knowledge panel results. Cortana also drew upon the Oxford dictionaries for definition-type queries, as you can see in this result:

Cortana also appears to draw data from Wikipedia, Freebase, the New York Times, and other website sources. Here is an example of a query that appears to be drawn from the Facebook.com web site:

You can also find some fun stuff using Cortana. Here is the answer to the classic question “what is love?”:

Here is hoping that Cortana can speed up its investigation on this matter. ;->

Some Notes on Bing vs. Cortana

It was interesting to note that in many cases Cortana would not return knowledge panels when a text-based search in Bing would. Google actually tended to do the opposite (voice search would bring up results that a regular text search would not). We spot-checked cases where Cortana returned some type of knowledge result that did not fully answer the question, to see how often a text search in Bing returned a more complete result.

We checked a total of 234 of these, and 78 of them (33 percent) had fully complete answers in Bing. So Bing’s knowledge results are further along than what has been integrated into Cortana at this point.
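As a quick check on the spot-check arithmetic, here is a minimal Python sketch. The variable names are ours, for illustration only; the two counts are the ones reported above.

```python
# Spot check: Cortana knowledge results that did not fully answer the question,
# re-run as typed searches in Bing. Both counts are taken from the study text.
checked = 234          # Cortana results that fell short of a full answer
complete_in_bing = 78  # of those, how many Bing answered fully

share = complete_in_bing / checked
print(f"{share:.0%}")  # prints "33%"
```

Since 234 is exactly 3 × 78, the share is one third.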

Detailed Study Results

The study data shown below is as of October 4, 2014. Note that the engines all make changes in the results on an ongoing basis, and we do intend to monitor these results over time. With that said, let’s get to it:

Percent of Queries Showing Some Type of Enhanced Result

This includes knowledge boxes on the right, knowledge panels in the main column, and/or structured snippets. Here is what we found:

Google Now (this was the Google App running on the iPhone) returns enhanced results for twice as many queries as Siri and nearly three times as many as Cortana. This is clear evidence that Google is much further down the path with this type of work than either Apple or Microsoft. As noted above, Bing, using text-based search queries, returns knowledge boxes for more types of results than Cortana does at this time.

Do Enhanced Results Fully Answer the Question?

This section focuses on whether the returned result fully answered the question. The scoring here was harsh. If you asked “how old is the great wall of China” and the knowledge panel result showed that the Great Wall was completed in 206 BC, you got no credit. In addition, even if the first regular web search result showed the answer in its description or title, you also got no credit. Keep in mind, this was a knowledge panel test.
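To make the scoring rule concrete, here is a hypothetical Python sketch. The `Observation` record, its field names, and the `rates` helper are our own illustration, not the tooling used in the study, and we assume both rates are computed over all queries tested (the post does not say which denominator the chart uses). The three sample observations are invented to exercise the rule; they are not study data. The key point is that an answer appearing only in a regular web result earns no credit.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One spoken query on one platform (hypothetical record, for illustration)."""
    query: str
    has_enhanced_result: bool          # knowledge box, knowledge panel, or structured snippet shown
    enhanced_fully_answers: bool       # the enhanced result itself states the requested answer
    answer_in_web_result: bool = False  # answer visible only in a regular result's title/description


def fully_answered(obs: Observation) -> bool:
    # Harsh scoring: only the enhanced result counts. answer_in_web_result is
    # deliberately ignored, because the test measured knowledge panels only.
    return obs.has_enhanced_result and obs.enhanced_fully_answers


def rates(observations: list[Observation]) -> tuple[float, float]:
    """Return (share with any enhanced result, share fully answered), assuming all queries as the denominator."""
    n = len(observations)
    enhanced = sum(o.has_enhanced_result for o in observations)
    full = sum(fully_answered(o) for o in observations)
    return enhanced / n, full / n


sample = [
    # Panel shows a completion date rather than an age: no credit.
    Observation("how old is the great wall of china", True, False),
    Observation("what is the capital of maine", True, True),
    # Answer only in an ordinary web result: still no credit.
    Observation("what does a cardiologist do", False, False, answer_in_web_result=True),
]
print(rates(sample))
```

Under this rule, two of the three invented queries show an enhanced result, but only one is scored as fully answered.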

Looking at these scores, one might conclude that Cortana and Siri are genuinely bad. However, please bear in mind that this was a knowledge box test. The enhanced results returned by both systems had a far higher rate of being at least somewhat helpful, and in Cortana’s case, had a high rate of improving the standard search results. But you still need to click to see what you were really looking for.
Here is an example of a query for which Cortana returns a result, but which does not directly answer the question:

Here is one for Siri for the phrase “who has the most patents”:

Last, but not least, here is one for Google Now that did not really get the job done when we first tried it:

Note that we saw a similar result in our phone query, but I mistakenly took a desktop screenshot for this post. It appears that Google has now fixed this problem, as shown by this phone screenshot:

Each of these illustrates the struggle every vendor has in truly nailing down a definitive answer to the question. The information is potentially helpful, but the answer we requested was not included.

More Specifics on Google Now

Google presents many results without providing attribution. These are generally in the form of well-established facts, such as “what is the capital of Maine?”. The split works out roughly to 75/25, as shown here:

Also of interest is a closer look at the step-by-step instructions. We actually found 276 examples of step-by-step instructions. One concern that many have expressed is that this might steal traffic from the publisher’s website from which the information was taken. However, we only found 59 different scenarios where the complete instruction set was provided.
I am betting that for those other 217 websites, being the identified authority on answering this type of query is absolutely awesome.
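The step-by-step counts reconcile as follows, in a small Python sketch that uses only the two numbers reported above (the variable names are ours):

```python
total = 276     # step-by-step instruction results found in the study
complete = 59   # results that showed the complete instruction set
partial = total - complete  # results where the full procedure still lives on the source site

print(partial)                    # prints 217
print(f"{complete / total:.1%}")  # prints 21.4%
```

In other words, only about one in five step-by-step results gave away the whole procedure.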

Final Thoughts

So there you have it. As of October 4, Google Now has a clear lead in the sheer volume of queries addressed, and it answers its queries more completely and accurately than either Siri or Cortana. All three parties will keep investing in this type of technology, but the cold hard facts are that Google is progressing the fastest on all fronts.

Study Credits: Thanks to Caitlin O’Connell and Justin Markuson for their hard work on this study, and to Mark Traphagen for creating the opening image.
Here is the full set of queries used in the study

Thoughts on “The Great Knowledge Box Showdown: Google Now vs. Siri vs. Cortana”

penguintamer October 8, 2014 at 1:03 am

Someone needs to note that the Siri result about patents was for the question “Who has the most patents?”. It’s not written in the article anywhere, so as far as the reader knows, you may have asked “What is a patent?” and gotten the correct answer!
Greg October 8, 2014 at 8:28 pm

Interesting study. I use both heavily. On Monday, I asked both Siri and Google Now this question “who is playing on Monday night football tonight?” Siri gave me the right answer straight away. Google Now gave me Sunday’s game results. I tried asking Google Now a few other ways and got nothing even close to the right answer.
That said, I personally find both Google Now and Siri to be relatively similar on these kinds of questions. I do like the immediacy of cards in Google Now as it runs on Android and wish Apple would support something similar at the home screen level (like Dashboard on OS X Mavericks).
Eric Enge October 8, 2014 at 9:04 pm

Penguintamer – thanks for noting that, fixed it.
Tomas October 8, 2014 at 9:31 pm

It should also be mentioned that Cortana is still in beta.
Chris October 8, 2014 at 10:34 pm

As a former Android user, I can say anecdotally that Google Now is far superior to Siri. Not only is Siri truly “dumber,” interacting with her is extremely frustrating. It’s well beneath Apple. That’s the problem with a company that says it always sweats the details and acts purely — when they have a hunk of junk, it’s harder for them to realize it.
kosh October 8, 2014 at 11:17 pm

You forgot to use Wolfram Alpha.
Ask W.A. for the height of the Eiffel Tower and it also tells you the distance to the horizon, the *ratio* of height to the Burj Khalifa, and roughly how many stories high it is.
As for your “quarter cup of butter” question, I think Wolfram’s answer kicks the Google answer off the field. I got a thorough nutritional profile, and links to alternative answers based on international variations on the “cup” measurement as well as to five specific types of butter.
Try “Dive calculator”. Or “oscars won by Meryl Streep”. Or “How many m&ms fit in the Grand Canyon?”
Wolfram Alpha makes the others look simple.
rob October 8, 2014 at 11:37 pm

Google got the height of the Taj Mahal way off.
Wilhem von Hapsburg October 8, 2014 at 11:59 pm

Were there any other digital personal assistants worth looking at? How about S Voice by Samsung, HTC’s Hidi, or Voice Mate by LG?
Brad October 9, 2014 at 1:47 am

Why didn’t you link to the queries used? Or did I miss it?
Eric Enge October 9, 2014 at 2:44 am

Hi Kosh – Siri uses Wolfram Alpha as its data source, which is why we did not test it separately. But you are right, we could have done that as well.
Eric Enge October 9, 2014 at 2:45 am

Hi Wilhem – we had limited resources, so we chose these 3. Given that our focus was on knowledge box solutions (instant answers that are knowledge-focused), we chose the search-engine-based ones.
Göran October 9, 2014 at 3:28 am

Google friendly article, no?
Lech Rzedzicki October 9, 2014 at 3:52 am

If you can afford the time, it would be interesting to include IBM Watson in the comparison – it now has an API to do that.
Mikka Hanson October 9, 2014 at 4:53 am

Great article Eric.
Finally a comprehensive benchmark that doesn’t seem to be biased towards a specific system.
What about the speech recognition, though? Any thoughts on that? I know that is very hard to measure objectively, but do you favor a specific assistant over the other? How sensitive are they to surrounding noise? How well do they recognize sentences? That’s also something I find very important about a personal assistant.
Nicolas October 9, 2014 at 5:46 am

This is weird. For “How long is the Gettysburg address”, Google Now gives me the answer “President Lincoln delivered the 272 words Gettysburg Address on November 19, 1863” and then proceeds to give the first words of it.
Sina Samangooei October 9, 2014 at 8:06 am

This is a really interesting comparison. Will you share the list of questions you were using for your analysis?
Lee October 9, 2014 at 8:54 am

Back in Dec 2011 I had a new Samsung Galaxy Nexus and I tried comparing it with a friend’s iPhone on a few questions. It was very informal, and I couldn’t conclude either one was better at the time, but one question highlighted an interesting behavior.
I asked both phones “Where have all the flowers gone?”
Google gave me a link to a video of Peter, Paul & Mary singing the song, plus a link to a lyrics web page; while Apple gave me a list of nearby florists.
It occurred to me that Apple was probably trying to monetize the query – choosing results that might allow them to collect advertising revenue or brag about targeted ad results or something like that.
The cardiologist query might indicate the same thing. I asked Google Now what a cardiologist does and got a complete answer plus citation, while Apple didn’t seem to understand the question and gave results that might be monetized.
Itamar October 9, 2014 at 10:32 am

Could you share the query set with us?
Ren October 9, 2014 at 11:40 am

Your results are already outdated. It appears that you aren’t using iOS 8’s update to Siri, which provides much more functionality (and provides the correct answer to the last query in the embedded youtube video).
Mir October 9, 2014 at 12:09 pm

I agree with Goran. This article seems pretty biased towards Google and lacking basic information like number of queries used, what queries were used, were the same queries issued to all platforms, etc.
Mir October 9, 2014 at 12:11 pm

Thanks, just found the number of queries above. Would still love to hear what queries were actually used.
Fernando October 9, 2014 at 12:43 pm

How’s this for a curveball? I just asked the same question, and just got a list of sources to click on. Not actual voice response.
Rahul October 9, 2014 at 1:23 pm

It wasn’t mentioned here, but Google will give you nutrition information if you ask for it. It can also do side by side comparisons and change the quantity to see how it will affect the nutrition info. This even works with foods that don’t typically come with nutrition labels (like produce).
Zeph October 9, 2014 at 1:25 pm

What is the “quarter cup of butter” query? I can’t find reference to it in the article and am interested to compare the results myself.
I, too, am often impressed by the results I get from Wolfram Alpha. I often begin my Siri inquiries with “Wolfram” to ensure that I get results from that data point.
Zeph October 9, 2014 at 1:28 pm

Never mind. I see it, now. I was searching for the text and only after posting realized that it might have only been in a screenshot that I actually had to parse manually. 😀
Eric Enge October 9, 2014 at 2:44 pm

Itamar – we will indeed share the query set shortly. Getting it formatted for publication, and flat out at a conference. As soon as I get a couple of hours we will add a link to it into the post.
Eric Enge October 9, 2014 at 2:44 pm

Hi Mir – We will indeed share the query set shortly. Getting it formatted for publication, and flat out at a conference. As soon as I get a couple of hours we will add a link to it into the post.
John October 9, 2014 at 4:31 pm

Can you link the data set? I’m curious what queries you used and how you used them.
Eric Enge October 9, 2014 at 5:02 pm

John – I am currently at a conference, but plan to publish the queries used and link to it from this post before end of day tomorrow.
Studybuddy October 9, 2014 at 5:04 pm

Interesting. My own results closely correspond to yours in that Google Now handles queries much more efficiently. Also, I’ve found that the way you interact with the assistant varies from person to person. For example, I will say to Google Now “Navigate to the mall” and it works, but my girlfriend will say “Hi Google, could you please take me to the food court, love you,” and it complies. She almost sees it as another person, whereas I interact with it like a tool. Even though this has nothing to do with the study you guys did, I found this interesting and thought I should share.
Art Carnage October 9, 2014 at 5:37 pm

I’m assuming you’re referring to American football. You can just say “Show me the NFL Schedule” and Google Now will display an interactive card showing the upcoming games, positioned to the current week. This is the same display you probably got, but you didn’t notice that you could swipe left and right to display other weeks.
Spam October 9, 2014 at 5:40 pm

I just asked Google Now the final question from the video (“How long is the Lincoln Tunnel?”), and it told me (with voice) that it is 1.5 miles long. Seemingly another interesting example that it is either being improved as time and/or usage progresses, or that it simply isn’t entirely consistent in the results it gives.
Art Carnage October 9, 2014 at 5:51 pm

It’s reporting the value from Wikipedia. If you have a better answer, feel free to change it there (but I have a feeling you’ll be challenged on it, so be prepared to back up your answer). I’ve found a lot of other sources that support Wikipedia’s number.
PJay October 9, 2014 at 5:53 pm

Methodology? How were the questions asked? I ask because just in the video referencing this report, 3 questions were asked and the wording was different between devices in one of the questions. It seems, to be truly fair, a recording of each question should be made and played back to ensure that each device would get the exact same question, wording, intonation, and volume.
Spam October 9, 2014 at 5:53 pm

I tried the same thing for tonight’s game “Who is playing on Thursday Night Football tonight?” and was given last Thursday night’s score vocally, and was shown all the scores from last weekend’s games.
However, when I asked “Who is playing on Monday Night Football next week?” I received no vocal response but was shown the schedule of games for this weekend (including tonight’s game).
Art Carnage October 9, 2014 at 5:59 pm

There are actually 6 different versions of the Address, each with slight variations, so there is no single “right” answer to your question. The version you probably memorized was the “Bliss” version.
PJay October 9, 2014 at 6:02 pm

And this is why phrasing is key. Ask anyone who knows how to search Google and you can take the exact same set of words, ordered differently, and either get exactly what you need, or get nothing close to what you were looking for.
Art Carnage October 9, 2014 at 6:07 pm

Why would your phone’s OS upgrade have anything to do with Siri’s answers? Your phone is not deriving the answers. It’s just sending them out to a processor which does all the work, and then receives the answer. All three of them work the same way.
Art Carnage October 9, 2014 at 6:12 pm

You think it’s biased, and then complain that you don’t have any of the information needed to determine if it’s actually biased. So basically, you’re basing your belief that it’s biased, on the fact that Apple didn’t win.
Chase Casey October 9, 2014 at 6:23 pm

Eric, when you publish the details of the study, can you be sure to include what devices you used, and what version of the operating systems you used during the study?
Why are all of the screenshots for Cortana and Siri taken directly from the cellphone, but everything for Google Now is from a computer?
This was a good study and hopefully Microsoft, Apple, and Google are notified with the findings so that they may further improve their products.
Justin Ross October 9, 2014 at 6:32 pm

I’d be interested in seeing an updated version with iOS8’s Siri.
Also, I’d be curious to know whether you used Google Now on a mobile device for the testing, or just used the voice recognition in-browser on a PC (as the information displayed on a full browser is obviously going to be a bit more robust than that displayed on a mobile device, which could easily lead to the desired information being shown on a full browser, but truncated on a mobile browser).
The included video shows queries done on a mobile device, but the screenshots of Google are all in-browser.
Adrian Wolf October 9, 2014 at 6:34 pm

If you ask Google Now how many words are in the Gettysburg Address it correctly answers – so I guess there is still some work in understanding the different ways people can ask the same question…
Richard Call October 9, 2014 at 6:47 pm

The facts seem to have a pro-Google bias, and vice-versa.
William McCarty October 9, 2014 at 8:48 pm

I find it troubling that the answer to “who is on the five dollar bill” is touted as a good “combined” answer. I am sure that the picture shown is not Abraham Lincoln.
Andrew October 9, 2014 at 8:55 pm

Comparing a neural network with deep learning (Google Now) to Apple’s programmed Siri is like putting a heavyweight against a featherweight boxer. They’re not even in the same league.
OMAR October 10, 2014 at 12:02 am

Cortana is still in beta, yet when actually comparing the three in everyday use, Cortana is by far more advanced than Siri and Google Now. Siri and Google Now don’t have as much access to the phone and personal data, as Cortana actually learns the user. Google Now is the fastest but can only handle searches and simple tasks. Siri and Cortana are truly personal assistants, but Cortana is a lot faster than Siri and actually learns the user to make it a little more personal. Not biased at all, as I have a Nokia 1020, LG G3 and iPhone 6. Cortana is definitely a step above the rest.
Leon Kennedy October 10, 2014 at 12:39 am

Who paid for this study? You’re an online marketing company… what are you marketing here?
Also, I have an iPhone 6 Plus with iOS 8.0.2 and an Xperia Z1S and I’m seeing different results with my queries. Additionally, why have you left out Wolfram Alpha results? That’s a pretty huge chunk of Siri’s capability.
I have to say that prima facie, this ‘study’ appears to be bunk. Sorry.
Eric Enge October 10, 2014 at 1:51 am

Leon – the Wolfram Alpha results were part of the bulk of what Siri returned in the study, so they were included.
We paid for the study with our own funds. No 3rd party money was involved. In terms of the outcome of this study, we are selling nothing.
Sorry you did not like the study.
Eric Enge October 10, 2014 at 1:52 am

Omar – as noted in the beginning of this study, it was focused on knowledge boxes, not the value of these apps as personal assistants.
Eric Enge October 10, 2014 at 1:57 am

Good point that the bills shown are earlier versions of the 5 dollar bill. However, the text clearly calls out that it’s Abraham Lincoln.
Chief Bill October 10, 2014 at 2:32 am

It may be worth noting W.A. is a computational search engine that is used by just about all of the other search engines when some type of mathematics or logic is involved. W.A. is an offshoot/follow-up? of Mathematica, IMO the greatest math/sci information tool available. (W/O a high level security clearance, that is. =o) Great for what it is designed for: Computational Search Engine. Not real good in the abstract. Just an opinion, have a great day. Chief
rob October 10, 2014 at 3:10 am

holy lol
warren October 10, 2014 at 5:19 am

From what I have seen in advertising, these tools are not being touted as search aids, but as a means to voice-control your device to add appointments, call someone, dictate, etc. Was this type of use taken into consideration for each vendor’s functionality?
Eric Enge October 10, 2014 at 10:04 am

Hi Warren – as noted in the study, this was really meant to test the knowledge box / answer box capabilities of each tool, not their personal assistant capabilities. So the appointment, call and dictation capabilities were not included in the test. Our intention is to rerun the test again sometime soon.
Justin Markuson October 10, 2014 at 10:31 am

I’m assuming you’re confusing the results of our study for a clear and present bias towards Google. To be honest, I’m not sure where this notion originates. Whenever there is a clear winner in any study (or even sports event), there are always rumors of bias/conspiracy. In this study, there was none. We had no reason to provide preference. All in all, it would have been nice to see one of the underdog command systems match Google’s results, but the data wasn’t even close. That’s not a fact of bias, that’s a testament to how well Google Now operates as a command system in comparison to Siri and Cortana.
Justin Markuson October 10, 2014 at 10:35 am

The question was not “How long is the Lincoln Tunnel”. If you watch the video, we asked Google Now, “How old is the Lincoln Tunnel”, and the result gave us its length. We chose the query because the answer was seemingly incorrect on all 3 devices.
Justin Markuson October 10, 2014 at 10:37 am

Google Now was tested using a mobile device. It appears the screenshots came from a laptop/desktop. We apologize for the confusion.
Ronald Drescher October 10, 2014 at 2:04 pm

This study was of moderate interest, but my main use of Siri is to actually do things: set timers, wake me up in the morning, create appointments, add to my groceries list, etc. I use these features several times each day with very satisfactory results. I would be much more interested in a comparison between Siri and Google Now on performing these kinds of tasks.
kafeaulait October 10, 2014 at 3:33 pm

I’m half convinced that Richard there was being sarcastic. How else exactly can *facts* be *biased*?
Bob October 10, 2014 at 5:49 pm

Google Now does far more than just provide search results. Either you haven’t really used it, or haven’t used it in a very long time.
Bob October 10, 2014 at 5:52 pm

It’s predictable, but still amusing that those who don’t like the results, because they don’t favor the device they use, assume the study is flawed or biased. We seem to have turned into a people who believe that facts can be changed if only you have strong enough faith in an alternate set of beliefs.
Mike October 10, 2014 at 10:41 pm

Cortana is, what, 3 months old? And it is more of a personal assistant.
Not quite apples to oranges, but Cortana is much better as a personal assistant.
I can click on a search link and read just fine once it’s provided. I don’t need a “HAL9000” answer where I have to filter through the speech that Google is giving me.
Shamu October 11, 2014 at 7:34 am

How can this test be “Google” biased? If you noticed, the tests were done on an iOS device, with the exception of Cortana. Correct me if I am wrong, but Siri should have the distinct advantage here, as it is integrated into the OS.
Eric Enge October 11, 2014 at 8:42 am

Hi Mike – as we acknowledged at the start of the write-up, we were purposefully focused on mapping out the knowledge base of the 3 participants, not their personal assistant capabilities. Cortana could well be a much better personal assistant; we don’t know, as we were not testing for that.
Kevin Stirtz October 11, 2014 at 9:59 am

Google knows how tall I am but Bing doesn’t have a clue. Of course, I do have a much closer relationship with Google than with Bing.
Danny Sullivan October 12, 2014 at 1:48 pm

Eric, what you list as a Cortana failure is a desktop image, where the knowledge box isn’t a direct answer but two query refinement boxes. I’d not have counted that as Cortana. While I’m pretty sure Google Now still would have done great, did you have a lot of these cases? That wouldn’t seem to count Cortana correctly.
Eric Enge October 12, 2014 at 3:04 pm

Hi Danny – edited this comment after your email – I removed the example. We actually did count it right in the study itself, and I made a mistake of including this as an example the way I did. Thanks for pointing that out.
FYI – we plan to release the entire query set early this week. As part of that, we are re-running all the queries for which we determined that Siri and Cortana failed, to make sure we did not miss anything. The differences we are seeing so far from our published results are not statistically significant (< 1% change in our reported results).
Eric October 12, 2014 at 7:50 pm

Microsoft is running Cortana ads panning Siri. I don’t think it gets a pass for being “in beta”.
Dilbert October 12, 2014 at 8:36 pm

Can you describe how these ~3100 queries were selected? How did you know they might fire a knowledge box? Did you test the queries against Bing/Google to pick them?
Eric Enge October 12, 2014 at 9:16 pm

Hi – we did not pre-test the queries before picking them. We focused on picking queries about places, people, and processes, where it was likely that the engine could return the result from a database of some sort.
Nathanael October 14, 2014 at 8:48 am

It may be interesting to note that if one alters the question to “Who HOLDS the most patents?” Siri answers the question correctly (Thomas Edison; or IBM for “Which company holds the most patents?”).
Eric Enge October 14, 2014 at 10:40 am

Nathanael – that IS interesting. There are lots of these little language twists in the questions. Speaks to what types of phrases each engine gets. Note that the phrases we used were not picked based on prior knowledge that they worked with Google, they were just picked based on our belief that they could trigger a response.
Nathanael October 15, 2014 at 6:04 am

I agree. It’s been my experience that all three personal assistants are more than occasionally plagued with such phrasing issues, to the point where it’s become habitual, when I don’t get the answer I’m looking for, to simply rephrase the question until I do.
Having said that, I don’t personally often use Siri for her knowledge content. I find myself more dependent on her for voice dictation (I dictated this post, for example), and limited facilities such as schedule management, alarms, quick emails, voice-initiated calls, weather reports, and so forth.
John October 16, 2014 at 3:53 pm

Eric, I’m interested to see the full query set, but it doesn’t appear to be posted yet. Is that still coming soon?
Eric Enge October 16, 2014 at 5:31 pm

Hi John – this is now live, as a link right at the end of the article.
Boomer October 17, 2014 at 9:43 am

Too bad Google phones are still Java phones.
Jeff Jockisch October 17, 2014 at 11:25 am

I have seen the same behavior, Nathanael, across services and even with ChaCha over the last several years. Matching algorithms using page rank actually suck for surfacing answers. The NLP involved needs to be much more complex and exact phrasing is crucial at times.
Oftentimes users of ChaCha would rephrase their questions until they got the answer they were seeking, just like we see happening now with these personal assistants.
We never solved the problem perfectly, but we found that the ‘shape’ of the question was an important component of the match. I think Google is doing this now, along with named entity matching, to really create a powerful system.
Fred November 2, 2014 at 1:49 pm

He has an LG G3. How old do you think it is, that you’d question how long ago he used Google Now?
Obvious Google fanboi is obvious.
Bob November 3, 2014 at 12:18 am

You published the questions, thank you, but it’s kind of hard to reproduce the test without the full results, e.g. the result of individual questions on each platform.
Eric Bray December 25, 2014 at 10:26 am

Is all of this leading up to a version of HAL? 😉
Eric Enge December 25, 2014 at 10:35 am

Exactly right. Googlers, such as Amit Singhal, openly speak about creating the “Star Trek Computer”: one where you can address the computer conversationally, and it can return any piece of information you ask of it.
kunit January 12, 2015 at 6:31 pm

One little thing that needs to be clarified is that this has nothing to do with Google Now. In fact, Google Now isn’t even available on iOS, or on any OS other than Android for that matter. What you referred to as Google Now throughout the post is actually just Google search. Google Now is the beast of a personal assistant that attempts to give you information before you ask, based on your personal usage habits.
Jakob Boman February 22, 2017 at 2:33 pm

Great article.
It is very interesting to read it in retrospect.
Thanks for sharing!
Jeppe M June 8, 2017 at 6:21 am

Interesting article to read looking back. Thanks
Martina July 3, 2017 at 9:01 am

It comes as no surprise that Google turns out to be the best. A new survey here in 2017 would be really cool!
Eric Enge July 3, 2017 at 9:06 am

We already did that, and you can see the results here: https://www.PerficientDigital.com/digital-personal-assistants-test. Hope you enjoy it!
Mads July 7, 2017 at 4:59 am

It comes as no surprise that Google dominates this survey. I assume it would do so even more now?
Eric Enge July 7, 2017 at 4:23 pm

You can see the 2017 version of this survey here: https://www.PerficientDigital.com/digital-personal-assistants-test
van toan November 18, 2017 at 1:35 am

The facts seem to have a pro-Google bias, and vice-versa.
tinonline December 1, 2017 at 3:29 am

It comes as no surprise that Google turns out to be the best. A new survey here in 2017 would be really cool!
Eric Enge December 1, 2017 at 10:15 am

We’re beginning to work on a new one now.