Can Google Search Console keywords replace GA keywords?
When I noticed back in 2011 that my keywords were disappearing from Google Analytics, I have to confess I panicked a little.
This information was central to SEO.
Without knowing the keywords people used to find and interact with my webpages, how could I analyze website visitors?
I started searching deep in SEO communities for a solution.
I saw dozens of workarounds: using Google Ads, custom filters, on-site search, and so on.
They all seemed like good alternatives for finding ‘not provided’ keywords, but there was no way of knowing how reliable the data was, especially considering that organic keywords are very different to paid keywords.
It’s hard to justify spending thousands from your budget on guesswork to try and track conversion keywords.
Then I came across Google Webmaster Tools, which is now called Google Search Console.
It seemed to show the organic keywords people used to find different webpages.
Are Google Search Console keywords accurate?
A common complaint about Google Search Console (GSC) is that the data is not the same as what you see on Google Analytics.
While it is a question of precision, it’s not a question of accuracy so to speak.
That difference is by design.
GSC & GA measure different things
GSC is centered on query and click (or selection) logs, so the data is close to what you will see in access log files.
By contrast, GA collects data from the clickstream. That introduces variables for how and what is measured.
To probe deeper into what causes the differences in data between GSC and web analytics tools, we have to understand how each tool collects and understands user behavior.
Query selection and click logs
Google is driven to improve search quality, which leads it to track data points for every search, and every searcher, to understand what’s happening in the Serps.
They say they don’t allow clicks and click-through rates to influence rankings (not true). They also say that they use click data for performance evaluation.
There are several evaluation measures for information retrieval.
Click-through rate
The ratio of users who click on a link to the number of users who view a page.
Session abandonment rate
The ratio of search sessions that do not result in a click.
Session success rate
The ratio of user sessions that lead to a success; measured using dwell time as a primary factor and secondary factors like copying text from a featured snippet.
Zero result rate
The ratio of Serps that returned with zero results.
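To make the four ratios concrete, here is a minimal sketch that computes them from a handful of made-up session records. The `SearchSession` fields and the `evaluation_measures` function are my own illustrative names, not anything Google publishes, and the sketch treats one session as one user, so click-through rate and abandonment rate simply add up to one.

```python
from dataclasses import dataclass


@dataclass
class SearchSession:
    """One hypothetical search session (illustrative fields only)."""
    results_shown: int  # number of results the Serp returned
    clicks: int         # number of clicks the user made on results
    successful: bool    # whether the session was judged a success, e.g. by dwell time


def evaluation_measures(sessions):
    """Compute the four ratios described above over a list of sessions."""
    total = len(sessions)
    if total == 0:
        raise ValueError("need at least one session")
    clicked = sum(1 for s in sessions if s.clicks > 0)
    return {
        "click_through_rate": clicked / total,
        "session_abandonment_rate": (total - clicked) / total,
        "session_success_rate": sum(1 for s in sessions if s.successful) / total,
        "zero_result_rate": sum(1 for s in sessions if s.results_shown == 0) / total,
    }


if __name__ == "__main__":
    sample = [
        SearchSession(results_shown=10, clicks=1, successful=True),
        SearchSession(results_shown=10, clicks=0, successful=False),
        SearchSession(results_shown=0, clicks=0, successful=False),
        SearchSession(results_shown=10, clicks=2, successful=True),
    ]
    print(evaluation_measures(sample))
```

How success is actually judged is not public, so the `successful` flag is just a stand-in for whatever signal the search engine uses.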
Google uses a model to evaluate Serp user behavior by assessing click behavior, user attention, and satisfaction.
What about log files?
Query and click logs are text files that record data about users and their interactions with different Serps.
Web search engines receive millions of queries per day from users. For each query, the search engine generates a record in its query log.
The query record may include one or more query terms, a timestamp, an IP address, and an identifier associated with a user who submits the query terms, for example, a user identifier in a web browser cookie.
Hence, the search engine query logs are a more robust version of the GSA search logs.
Google defines a query session as a record that includes queries closely spaced in time and / or queries that are related to the same topic of interest.
The query session extraction process is based on heuristics.
For example, consecutive queries belong to the same session if they share some keywords or if they are submitted within the same time period, even though there is no common keyword among the search terms.
The heuristics are why Search Console keywords and your analytics package will never match up.
Essentially, Google decides in its query log if searches in a session are unique enough to be recorded or not.
Thus, what you believe to be two distinct visits to your site, because they came from two different searches and landed on two different landing pages, could potentially be counted as one keyword depending on how it is recorded in Google’s query logs.
What about click logs?
Click logs, in contrast, feature more information on the behavior of the user once they have been presented with a series of results.
While Google Search Console only provides a fraction of the total keyword data, it’s clear how the Search Analytics tool is a limited user interface built on top of this dataset.
Every click is tracked along with the features behind what generated the position of a result in a Serp.
What determines a click?
If I search for a keyword and click a result, go back to the results page, and click the same result again, does Google consider that two distinct clicks or just one?
The first thing to note is that they sample the data.
However, they may not sample the data when Google considers the two similar queries to represent one search.
This fundamental difference is why thousands of your keywords don’t show up on GSC.
Say for example a person searches for an ‘Italian restaurant in New York’. Immediately after, the same person searches for ‘Indian restaurant in New York’. These two queries are related due to the use of the same keywords ‘restaurant’ and ‘New York’ and the time proximity of the searches.
What is a query session?
A query session consists of one or more queries from a single user, including either all queries submitted over a short period of time or a sequence of queries with overlapping keywords that may extend over a longer timeframe.
Queries that concern different topics or interests are assigned to different sessions unless these queries are submitted in very close succession (less than 5 minutes) and are not otherwise assigned to a session that includes other similar queries.
The same user looking for restaurants in New York may later submit a query ‘Apple car’ looking for information about the concept electric car. This new query is related to a topic unrelated to New York restaurants, and is therefore not grouped into the same session as the restaurant queries.
So the queries from a single user may be associated with multiple sessions. Two sessions associated with the same user will share the same cookie but will have different session identifiers.
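The grouping rules above can be sketched in a few lines. This is only my reading of the heuristics, not Google's actual implementation: the `QueryRecord` fields, the stopword list, and the `assign_sessions` function are assumptions made for illustration. A query joins an existing session from the same cookie if it shares a keyword with it, or, failing that, if it arrives within five minutes of that cookie's most recent query. Otherwise it opens a new session.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

CLOSE_SUCCESSION = timedelta(minutes=5)  # "very close succession" threshold from the text
STOPWORDS = {"in", "the", "a", "of", "for"}  # assumed list, for illustration only


@dataclass
class QueryRecord:
    cookie_id: str       # user identifier, e.g. from a browser cookie
    timestamp: datetime  # when the query was submitted
    terms: str           # the raw query text


def keywords(terms):
    """Lowercase the query and drop stopwords."""
    return {w for w in terms.lower().split() if w not in STOPWORDS}


def assign_sessions(records):
    """Return (session_id, record) pairs, processing records in time order."""
    sessions = []  # each: {"id", "cookie_id", "keywords", "last_seen"}
    assigned = []
    for rec in sorted(records, key=lambda r: r.timestamp):
        kws = keywords(rec.terms)
        own = [s for s in sessions if s["cookie_id"] == rec.cookie_id]
        # Rule 1: overlapping keywords, even over a longer timeframe.
        match = next((s for s in reversed(own) if kws & s["keywords"]), None)
        # Rule 2: unrelated topic, but submitted in very close succession.
        if match is None and own:
            latest = max(own, key=lambda s: s["last_seen"])
            if rec.timestamp - latest["last_seen"] < CLOSE_SUCCESSION:
                match = latest
        if match is None:
            match = {"id": len(sessions) + 1, "cookie_id": rec.cookie_id,
                     "keywords": set(), "last_seen": rec.timestamp}
            sessions.append(match)
        match["keywords"] |= kws
        match["last_seen"] = rec.timestamp
        assigned.append((match["id"], rec))
    return assigned


if __name__ == "__main__":
    log = [
        QueryRecord("cookie-1", datetime(2020, 1, 1, 12, 0), "Italian restaurant in New York"),
        QueryRecord("cookie-1", datetime(2020, 1, 1, 12, 1), "Indian restaurant in New York"),
        QueryRecord("cookie-1", datetime(2020, 1, 1, 14, 0), "Apple car"),
    ]
    for session_id, rec in assign_sessions(log):
        print(session_id, rec.terms)
```

Run on the restaurant example, the two restaurant queries land in one session and ‘Apple car’ opens a second one, even though all three share the same cookie.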
Indeed, the logging behind Google’s search engine uses a specific series of methodologies to determine what counts as a distinct search and a distinct click.
This may or may not align with how your analytics platform is configured to define a session.
How GA determines a session
Web analytics services follow a different measurement methodology.
A session can be user-defined.
By default, a session lasts until there are 30 minutes of inactivity (you can also adjust this limit to seconds or hours).
So, while we don’t know the exact timing of what Google Search considers a session, it is certainly less than 30 minutes.
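Here is a small sketch of how an inactivity timeout turns a stream of hits into sessions. The `count_sessions` function is a hypothetical helper, not part of any analytics library, and it ignores any other rules an analytics package may apply. It does show how the same visitor activity yields different session counts under different timeouts.

```python
from datetime import datetime, timedelta


def count_sessions(hit_times, timeout=timedelta(minutes=30)):
    """Count sessions: a new one starts after `timeout` or more of inactivity."""
    sessions = 0
    previous = None
    for t in sorted(hit_times):
        if previous is None or t - previous >= timeout:
            sessions += 1
        previous = t
    return sessions


if __name__ == "__main__":
    hits = [
        datetime(2020, 1, 1, 10, 0),
        datetime(2020, 1, 1, 10, 10),
        datetime(2020, 1, 1, 10, 50),
    ]
    print(count_sessions(hits))                                # 30-minute default
    print(count_sessions(hits, timeout=timedelta(minutes=5)))  # a much shorter window
```

With the 30-minute default, the first two hits share a session and the third starts a new one. With a five-minute window, every hit becomes its own session.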
How a user is tracked through a session ID
A session ID is allocated to a visitor on their first visit to a site.
It is different from a user ID because sessions are short-lived and may become invalid after a certain goal has been met. For example, once a buyer creates a new account or buys a product, they cannot use the same session ID again.
As a result, a user can potentially be measured multiple times for the same visit.
Analytics packages allow for varying levels of specificity in their configuration.
There are many reasons why you won’t see consistency between two analytics packages, let alone between GSC and GA.
Why don’t GSC and GA match?
In brief, a Google Search Console click is not a Google Analytics session and a Google Analytics session is not a Google Search Console click.
If a user has clicked twice, that could be counted as two clicks and one session.
Conversely, if a user were to perform two different searches and make two different clicks, their activity may be counted as one impression and one click, but they could also invalidate their session ID or otherwise time out at some point and be counted as two distinct visits in analytics.
So you can see why Google Search Console keywords are incomplete.
In addition, GSC uses canonical URLs whereas analytics can use any URL for reporting a session.
Why is this a problem?
Spend some time reading online about retrieving ‘not provided’ keywords and you will find dozens of GSC advocates.
Trying to find commonalities between GSC and GA is pointless, as you are looking at two sides of the same data coin, just measured differently.
Performance data from Google Search Console is a measure of what’s happening on the Serps, not what is happening on your site.
And, GSC’s position data measures something different than your rankings data.
How to get better data?
The accuracy of data reported in GSC increases the more specificity you introduce.
In other words, if you create profiles that reflect deeper levels of the directory structure, the tool yields more data.
When adding lots of profiles, the key limiting factor is that GSC limits you to 1,000 queries per search filter.
However, an API can pull your data at a rate of 5,000 per search filter.
And to get even more data you could loop through a series of tries as search filters.
This ensures that you’re using as many subsets of keywords as possible as filters to pull out as many results as possible.
Doing this by subdirectory and following your site’s taxonomy will allow you to get the most precise data.
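As a rough sketch of that loop, the code below builds one Search Analytics request body per filter fragment for a given subdirectory, then merges the returned rows without double counting. The function names, the choice of single letters as fragments, and the default `row_limit` of 5,000 (the figure quoted above, which may not be the API's current maximum) are my assumptions. The body fields follow the Search Console API's `searchanalytics.query` request format. Each body would be sent with something like `service.searchanalytics().query(siteUrl=..., body=body).execute()` from the google-api-python-client library, which is left out here so the example runs without credentials.

```python
import string


def build_query_requests(start_date, end_date, page_prefix,
                         fragments=string.ascii_lowercase, row_limit=5000):
    """Build one request body per query fragment for a single subdirectory."""
    requests = []
    for fragment in fragments:
        requests.append({
            "startDate": start_date,
            "endDate": end_date,
            "dimensions": ["query", "page"],
            "dimensionFilterGroups": [{
                "groupType": "and",
                "filters": [
                    # Restrict to one subdirectory of the site's taxonomy.
                    {"dimension": "page", "operator": "contains",
                     "expression": page_prefix},
                    # Restrict to queries containing this fragment.
                    {"dimension": "query", "operator": "contains",
                     "expression": fragment},
                ],
            }],
            "rowLimit": row_limit,
        })
    return requests


def merge_rows(batches):
    """Merge rows from several responses, keeping each (query, page) pair once."""
    seen = {}
    for rows in batches:
        for row in rows:
            seen.setdefault(tuple(row["keys"]), row)
    return list(seen.values())


if __name__ == "__main__":
    bodies = build_query_requests("2020-01-01", "2020-01-31", "/blog/")
    print(len(bodies), "requests for /blog/")
```

Because a query such as ‘pizza’ matches several fragments, the same row comes back more than once, which is why the merge step keys on the query and page pair.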
The end of (not provided)
As I spoke about in the beginner’s guide to ‘not provided’, GA was never the same after Secure Search.
That’s what I thought before I began to see this as a data science problem.
By working through seven steps to find Google Analytics keywords, together with a team of data scientists from the Fraunhofer Institute Berlin, we were able to get all keywords back with an 83% degree of certainty.
This means you can reliably map user visits to sessions, track organic keyword conversions, and see all organic keywords in GA again, as it was before Secure Search was introduced.
Here’s how it looks to have all of your keywords back in GA.
You can see here that with the Keyword Hero view, (not provided) is down to just 1.94% of total keywords.