Since Google introduced Secure Search in 2011, web analytics tools like Google Analytics no longer display the keywords a person googled to land on a website and instead conceal them as ‘not provided.’
So, if someone reaches your website by searching for ‘red running shoes’, the keyword referral data would be ‘red running shoes.’ Google wants to protect users’ privacy, so this data is blocked from Google Analytics. Hence, when you see (not provided) in Google Analytics, it stands for a visit whose keyword referral data has been withheld.
In this detailed article, I will show you why and when this problem appeared along with several effective solutions, one of which can guarantee a certainty threshold of 83%.
When did ‘not provided’ become a problem?
To understand how all web analytics data transformed into ‘not provided’ we need to take a trip back to 2011.
By then, Google Analytics (launched in 2005) had given webmasters incredibly detailed insight into website visitor behavior.
In the organic search report, businesses could pinpoint the exact words people searched for and how they interacted with the site.
This made it easy to see which keywords were the most valuable.
Let’s take an example. If an ecommerce retailer observed that people who arrived on its site after searching for ‘Adidas high top trainers’ had a high conversion rate, this meant that optimizing that page to closer match user intent would likely increase conversions.
However, in 2011, Google started encrypting search data.
According to Google, this move was made to make search more secure.
As Google disclosed in 2011, the increased privacy measures were taken partly due to the introduction of personalized search results.
Nowadays, most of us don’t even consider the fact that our search results take personal information into account.
For example, when you are in New York and search for restaurants, you expect Google to show you places close to your current location, even if you didn’t search for, say, ‘Italian restaurants in New York.’
But this wasn’t always the case.
As Google started incorporating personal information into search results, they had to protect users’ private information.
They achieved this by encrypting searches from logged-in Google users.
So, how does that process work?
Before the change, a normal Google search would redirect users to an HTTP version of the domain they clicked on. After the change, logged-in users were redirected to a secure version instead.
This might seem like a small update, but it had a huge impact on our ability to access keywords.
That’s because redirecting users to a secure version of their desired domain encrypts their search queries.
For privacy concerns, this change was fantastic. And I’m glad for it.
The new encryption process added an extra layer of security to Google search.
However, it became a huge obstacle for SEO and the online marketing world.
Indeed, the quest to reclaim ‘not provided’ has been a dragon in the cave for SEOs ever since.
7 steps to reclaim ‘not provided’
Many of you will already know the history of ‘not provided’ and will have skipped straight to this section. For the past decade, SEO professionals have grappled with reclaiming these hidden keywords. We’ve come across a few workarounds, which we discuss in a later section, and a few myths, which we debunk here.
One such myth is that users of Analytics 360 still see all keywords.
Google offers this tool with a hefty price tag of around $150,000 / year for large websites. However, the myth that these users can still view all organic keywords is false. Even with a premium account, users do not get to access more keywords than the ordinary free user.
There is, however, a 7 step approach that works with 83% certainty.
Step 1: Pull together nine different data sources
First, you need to pull together nine different data sources including Google Analytics and Search Console data via Google certified API access.
Next use massive parallel sequencing, cloud-based artificial intelligence / machine learning algorithms to statistically match search phrases to sessions and cluster them.
This data then needs to be uploaded back into a new Google Analytics property, allowing you to analyze your new data set in a familiar setting without interfering with your original data.
If this is something you don’t have the time or resources to do please either skip to the alternative methods below or let our free ‘not provided’ solution save you the time and hassle.
Step 2: Perform Google Analytics data analysis
Next, you need to analyze Google Analytics account history, looking for:
- Patterns e.g. site structure, URLs that pull traffic; clusters among organic traffic; device categories; time / date; locations.
- Type and content e.g. number of direct sessions attributed to organic; news or ecomm; one-pager or many product sites; semantical topics.
Now you can generate a large set of possible keywords per URL using
- 3rd party data sources e.g. rank monitoring services; browser extensions data; Bing search API
- 1st party user data e.g. Search Console; remaining keywords in GA
The result is one big bucket with all possible keywords per URL.
Is that it?
None of this data is useful unless you cluster and classify the meta keywords.
By using external APIs you can look for abnormal behavior in a keyword’s history.
Were there changes in traffic in the last 12 months (per country / device)?
Did spikes occur? Is this a new keyword? Is there a reference article on Wikipedia, if so how does it behave?
After creating a keyword set and checking for potential spikes and abnormalities, you will need to understand these keywords.
Is this a brand keyword or a non-brand keyword?
Is it a person / brand name? (important for n-gram analysis).
Is the keyword transactional / navigational or informational?
The result is one big data frame where certain parameters are attributed to the keywords.
Step 3: Analyze traffic for fluctuations
Let’s suppose we have a site that ranks for positions 13 – 15 in calendar week X for the term ‘shoes’. At this point we have the following metrics for this landing page:
In calendar week X + 1 the site now ranks fifth for shoes and the metrics changed to:
In this simple example we assume ceteris paribus, that nothing else has changed, so almost all new sessions can be attributed to ‘shoes’.
Why is this keyword analysis important?
With just a few computations we gain some valuable insights.
- ‘Shoes’ in position 5 brings in 1000 more daily sessions compared to positions 13 – 15.
- The bounce rate of the keyword is identical to other keywords that rank for this site.
- The conversion rate of this keyword seems to be significantly lower than the conversion rate of the others. It has brought only 5 more conversions with 1000 sessions, compared to the 10 conversions with the 500 sessions we had before.
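The comparison in the last bullet can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the figures quoted above (500 sessions with 10 conversions before, and 1000 additional sessions with 5 additional conversions attributed to ‘shoes’); the function and variable names are illustrative.

```python
def conversion_rate(conversions: int, sessions: int) -> float:
    """Return conversions per session as a fraction (0.0 if there are no sessions)."""
    return conversions / sessions if sessions else 0.0


# Figures taken from the example in the text.
baseline_sessions, baseline_conversions = 500, 10  # calendar week X
added_sessions, added_conversions = 1000, 5        # increase attributed to 'shoes'

baseline_rate = conversion_rate(baseline_conversions, baseline_sessions)
keyword_rate = conversion_rate(added_conversions, added_sessions)

print(f"Other keywords: {baseline_rate:.1%}")  # Other keywords: 2.0%
print(f"'shoes':        {keyword_rate:.1%}")   # 'shoes':        0.5%
```

Under the ceteris paribus assumption, the new keyword converts at roughly a quarter of the rate of the existing traffic.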
When analyzing this, we take a couple of factors into consideration, most importantly:
- Seasonal fluctuations
- Overall site performance during this time frame
- Keyword performance → is the spike due to a trend?
Luckily the search engine results page changes quite a lot, otherwise, this would not be possible.
Step 4: Train the keyword classifications
This is where it gets a bit complicated.
At this point, we need to start matching keywords and sessions.
Using the data from before, we calculate the probability of a certain keyword matching a session cluster.
Sessions that were captured through extensions and those where the keyword is still visible in Google Analytics (= “hard data”) can be matched with 100% certainty, so these can be left out.
After the first probability is calculated, the data will again be compared against the “hard data”. This happens on the cluster level to adjust the classifications.
After the first iteration, we need to construct strings from the GA data. These strings contain between 5 and 25 dimensions.
It’s interesting how few sessions are left if you query a big string even on huge sites.
You can check this by using the Core Reporting API where you get up to 7 dimensions in one string.
Now: the keywords have been clustered and you get only a handful of sessions back from one string, leaving very few potential keywords per session cluster (the cluster has gotten smaller, too).
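To see why long dimension strings leave so few sessions per cluster, here is a small sketch. The session rows, the dimension names, and the `|` separator are all assumptions made for illustration; this is not the actual Keyword Hero pipeline, only the grouping idea.

```python
from collections import defaultdict

# Hypothetical session rows; real data would come from an analytics export.
SESSIONS = [
    {"landing_page": "/shoes", "device": "mobile", "country": "DE", "hour": "09"},
    {"landing_page": "/shoes", "device": "mobile", "country": "DE", "hour": "09"},
    {"landing_page": "/shoes", "device": "desktop", "country": "DE", "hour": "09"},
    {"landing_page": "/shoes", "device": "mobile", "country": "US", "hour": "14"},
    {"landing_page": "/boots", "device": "mobile", "country": "DE", "hour": "09"},
]


def dimension_string(session, dimensions):
    """Join the chosen dimension values into one string key."""
    return "|".join(session[d] for d in dimensions)


def cluster_sessions(sessions, dimensions):
    """Group sessions that share the same dimension string."""
    clusters = defaultdict(list)
    for session in sessions:
        clusters[dimension_string(session, dimensions)].append(session)
    return dict(clusters)


coarse = cluster_sessions(SESSIONS, ["landing_page"])
fine = cluster_sessions(SESSIONS, ["landing_page", "device", "country", "hour"])

print({key: len(rows) for key, rows in coarse.items()})
print({key: len(rows) for key, rows in fine.items()})
```

With one dimension, four of the five sessions fall into the same cluster; with four dimensions, the largest cluster holds only two sessions.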
Still with me?
An unfair visual representation
Keyword A might now be matched with one or more sessions.
This is a session cluster, containing four sessions.
It comes from requesting all sessions with a certain string.
Step 5: Match sessions with keywords using an algorithm
At this stage, we need to use an adjusted, unique algorithm that tries to match sessions with keywords.
This algorithm is applied to the entire history of the GA account and then it matches the results with the “hard data”, which is the remaining visible keywords and the keywords captured by browser extensions.
After this last step, the algorithm is ready to go for each page. In most cases, the historic data is not enough as some data sources are not available retrospectively, which is why it usually works far better after a month.
Step 6: Start computation to unlock ‘not provided’
The computational process is essentially the same as the training process:
- Compute possible keywords
- Check results against ‘hard data’
- Adjust classifications based on results
Now we have the final output to get back ‘not provided’.
The matching does not happen through CIDs.
As such, looking at that mirrored account, you’ll see that some dimensions are missing.
This is because the matching happens based on dimension strings and not sessions.
Are you still with me?
Good, because this is where it heats up.
Step 7: Session vs. string-based matching
If the data was session-based, you’d get everything back like this:
But what it looks like instead is this:
I mentioned something about an 83% certainty earlier on if you remember.
This is just the average threshold but it’s easier to communicate.
In reality, the certainty threshold for a keyword to match with a session varies between 80% and 85%.
The certainty level is based on the assumption that 95% of all possible ‘not provided’ keywords have been found in the first bucket (see the big red keyword bucket above).
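The role of a certainty threshold can be illustrated with a deliberately simplified sketch. It assumes that each candidate keyword for a session cluster already has a numeric weight (a hypothetical score, for example clicks reported for the landing page) and turns those weights into probabilities by normalizing them. The real matching algorithm is not described in enough detail here to reproduce, so treat this purely as a stand-in.

```python
def match_keyword(candidate_weights, threshold=0.83):
    """Return (keyword, probability) if the top candidate clears the threshold, else None.

    candidate_weights maps each candidate keyword to a non-negative weight.
    Both the weights and the normalization are illustrative assumptions.
    """
    total = sum(candidate_weights.values())
    if total <= 0:
        return None
    keyword, weight = max(candidate_weights.items(), key=lambda item: item[1])
    probability = weight / total
    return (keyword, probability) if probability >= threshold else None


print(match_keyword({"red shoes": 90, "shoes": 10}))  # clears the threshold
print(match_keyword({"red shoes": 50, "shoes": 50}))  # None: too ambiguous
```

A cluster whose candidates are too evenly weighted stays unmatched rather than being assigned a keyword with low certainty.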
Alternative approaches to ‘not provided’
There are various alternative approaches to decode ‘not provided’. SEOs have approached the problem from a few different angles using Google Search Console, Google Analytics filters, on-site search, and surveys to name a few.
Google Search Console
This is by far the most popular approach to finding not provided.
GSC will give you insight into search terms people use to reach landing pages on your website.
Many webmasters rely on GSC to make data-driven marketing decisions.
However, we should remain wary of using this data in isolation.
Like any single data source, you need to test it for both internal and external validity.
And always verify using different tools and services.
To get started go to Google Search Console and sign in with your account.
Then click on Performance from the menu on your left.
Here you can see a visual representation of your total clicks, impressions, average CTR, and average position.
Now, scroll down and you’ll see the Queries tab.
It shows the keywords used most frequently to find your website.
In a later section (Google Search Console vs Keyword Hero) I discuss why this only reveals part of the picture when it comes to ‘not provided’.
Google Analytics filter
By creating a custom filter in Google Analytics, you can get the requested URL displayed instead of ‘not provided’.
Such filters work only for new traffic.
In the case of a URL for a homepage, you can assume a brand keyword was searched.
It can be estimated which terms have been entered to get to the page.
This method was described in detail in Dan Barker’s econsultancy blog:
First, in your GA account, go to Admin, then Profiles.
Click the name of the profile you want to work with, then select the Filters tab.
Create a new filter in your Analytics account:
This filter will attempt to extract the ‘not provided’ terms.
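The effect of such a filter can be sketched outside of Google Analytics. The snippet below only mimics the idea of swapping the hidden keyword for the requested URL; the `np - ` prefix and the function name are assumptions for illustration, not the exact filter settings.

```python
NOT_PROVIDED = "(not provided)"


def rewrite_keyword(keyword: str, request_uri: str, prefix: str = "np - ") -> str:
    """Replace a hidden keyword with a label built from the landing page URI.

    The prefix is a hypothetical marker that keeps rewritten rows
    distinguishable from real keywords.
    """
    if keyword.strip().lower() == NOT_PROVIDED:
        return f"{prefix}{request_uri}"
    return keyword


print(rewrite_keyword("(not provided)", "/shoes/red"))  # np - /shoes/red
print(rewrite_keyword("red shoes", "/shoes/red"))       # red shoes
```

Visible keywords pass through unchanged, so only the hidden rows are relabeled with their landing page.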
Surveys
Surveys are an easy way to find out how a visitor ended up on your site.
On-site surveys can be set up to appear when a visitor came from a search engine.
Most survey providers allow you to select Secure Search users.
The survey may consist of a single question, like, “What word(s) did you use in Google to find us?”
Single-question surveys tend to get an impressive response rate among website visitors.
The results of such a survey offer a short-term fix for ‘not provided’.
This way, you can ask visitors directly for the information that Google withholds.
On-site search
In the Site Search Settings of your Google Analytics view, the Query Parameter field takes the word that designates your internal query parameter.
To find out what your query parameter is, visit your site and conduct a search with your site search box. Once you’ve done this, you’ll see the URL change to something like
Whatever comes between the “?” and the “=” is your query parameter. In this example, the query parameter is “s.” Add your query parameter to the Site Search Settings.
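If you are unsure which part of the URL is the parameter name, you can let Python's standard library pull it out. The URLs below are hypothetical examples on a placeholder domain; the only thing taken from the text is that the parameter is called “s”.

```python
from typing import List
from urllib.parse import parse_qsl, urlsplit


def query_parameter_names(url: str) -> List[str]:
    """Return the parameter names found in the query string of a URL, in order."""
    query = urlsplit(url).query
    return [name for name, _ in parse_qsl(query, keep_blank_values=True)]


# Hypothetical URL after searching a site for "running shoes".
example_url = "https://www.example.com/?s=running+shoes"
print(query_parameter_names(example_url))  # ['s']
```

If the list contains more than one name, the one that carries the search phrase is the one to enter as the query parameter.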
Next, you’ll see the option to Strip query parameters out of the URL.
This will prevent your search from showing up in your content reports.
Now you can head over to Behavior | Site Search | Search Terms, and you can see what visitors are searching for on your site.
You can check what kind of information users are hoping to find on your site.
This is a good alternative approach to figuring out what keywords people are using.
You can check your site search report and find that users have searched for say ‘not provided’.
Cross-referencing this data against Google Search Console info can give you a good first impression of what users want when they visit your site.
Bonus approaches to ‘not provided’
- It is possible to buy some keyword data. The paid search results on Google are not affected by Secure Search. Keywords used to gain clicks on Google Ads are still visible. This data is available directly on Google Ads and on Google Analytics. Bear in mind that when you pay people to click on an ad the outcome is very different to people searching for terms themselves.
- The last alternative approach to ‘not provided’ is the Traffic Sources queries report in Analytics. This provides another workaround to access some of the most popular keywords on websites. It’s a basic report but smaller sites can still use it for more information on relevant organic traffic.
Larger sites, with higher volume traffic, will need a more robust tool.
There are two ways to access the report from Google Analytics:
- Navigate to Realtime then Traffic Sources.
- Go to Acquisition, then Campaigns, and use the tabs to view paid, organic, or all keyword traffic sources.
The Traffic Sources Query report is the easiest of the alternative methods for ‘not provided’ but its value is limited for larger websites.
Data science problem
Processing ‘not provided’ data as described above would have worked back in 2011.
But working with extremely large datasets was impossible at the time.
Just uploading the information back into customers’ accounts would take a day on X1 Instances, Amazon Web Services. And it would have cost a small fortune.
However, it is possible now, with far greater data processing powers compared with six years ago.
At our HQ in Berlin, a team of data scientists from the Fraunhofer Institute of Technology makes sense of large amounts of data.
We have access to vast amounts of data points from browser extensions that allow us to think about this as more of a data science problem.
Because that’s what this is.
Up until now, people have considered this an SEO problem or a Google Analytics problem.
But when you see it from the perspective that data processing is the limiting factor you can begin to see a way around based on data science.
8 Actions after getting back ‘not provided’
Once you have gained access to the data and engineered around it, you can tap into the well of invaluable information. With ‘not provided’ once again appearing under Organic Search you can begin to start refining your approach to search optimization.
Understand conversion keywords
The goal of most SEO initiatives is to acquire users that behave in ways that align with your company’s business goals.
That could be a purchase conversion for e-commerce companies or a page visit with a low bounce rate for publishers.
Optimizing for traffic alone is not going to deliver great results; you need to track organic keyword conversions.
Understanding which exact ‘not provided’ keywords are driving conversions is especially important in competitive markets.
This involves identifying the organic keywords that generate the most revenue, profit, and conversions.
Finding the keywords that result in the most conversions is probably the single most valuable piece of information for webmasters.
First, navigate in your Google Analytics interface to your keywords (Acquisition | Overview | Organic Search).
(Some of the following actions assume Keyword Hero is already running.)
You can sort the table by clicking on a title at the top of the column. In our example, we chose the number of sales.
But conversion rates and revenue are equally interesting.
Transactional search phrases like “buy” seem to result in significantly increased conversion rates compared to search phrases that are only informational (or at least not as obviously transactional): of the nine keywords that resulted in sales in our analysis, three contained the word “buy”.
The second important thing we notice is that the phrase “bouquet” is very prevalent among our queries, and comes with a slightly better conversion rate, and a surprisingly higher order value.
Users that already know that they are looking for a flower bouquet, spend more than twice as much as those who don’t use that phrase.
Equipped with this information, we would optimize our site for more revenue by catering more to those users that are explicitly looking for “bouquets”. This will result in more of this valuable traffic and even higher conversion rates.
If there is too little conversion data available, this type of analysis is not possible or doesn’t make that much sense. In this case, you need to use different metrics:
If you’re the webmaster of an e-commerce shop but you have too few sales to analyze and make predictions about individual keywords, Goals in Google Analytics are a great way to still gather meaningful data.
Instead of only looking at the eventual sale, we define Goals on the way there, such as added to cart. In most cases, this will result in >100% more data and makes it that much easier to gather significant data points.
Google Analytics behavioral metrics
If there weren’t yet enough users on your site to create a meaningful number of Goals or even sales, you can use behavioral metrics such as bounce rate or time on site. They will often correlate with Goals and Sales and can deliver valuable insights about whether your users seem to find what they search for.
If you’re using behavioral metrics, even a little data is enough to recognize trends in individual keywords and optimize your site accordingly.
Find out keywords for quick wins
Keyword Hero creates the custom dimension Position, which conveys the position of a keyword in Google’s SERPs (search engine results pages).
The position is recorded for each session, so a keyword can have many positions.
If it is in position 1, the session was triggered when your link was at the very top of Google’s first results page.
If a keyword is in position 10, it’s at the very bottom of Google’s first page. Position 11 would be at the top of Google’s second result page.
The difference in traffic between positions 10 and 11 can be massive: we often see increases of 10x!
To find the keywords that are close to being on page 1 of Google, click on the secondary dimension box and select “Position”.
Now you can order the table by clicking on the Position column.
Look out for keywords that have excellent transactional (e.g. sales) or behavioral metrics (e.g. time on site) and rank in positions 11 – 16.
Focusing just a little more on those keywords, e.g. by writing more relevant content, can have a massive impact on your sessions and sales.
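If you export the keyword table, the same quick-win filter can be applied in a few lines. This is a sketch under assumptions: the rows, the field names, and the 5% conversion-rate cut-off are made up for illustration, and only the position range 11 – 16 comes from the text.

```python
# Hypothetical export: one row per keyword with its position and metrics.
KEYWORDS = [
    {"keyword": "buy red bouquet", "position": 12, "sessions": 40, "sales": 4},
    {"keyword": "flower delivery", "position": 3, "sessions": 900, "sales": 30},
    {"keyword": "rose bouquet", "position": 15, "sessions": 25, "sales": 0},
    {"keyword": "cheap tulips", "position": 27, "sessions": 10, "sales": 2},
]


def quick_wins(rows, min_position=11, max_position=16, min_conversion_rate=0.05):
    """Return keywords just off page one that already convert well, best position first."""
    hits = [
        row
        for row in rows
        if min_position <= row["position"] <= max_position
        and row["sessions"] > 0
        and row["sales"] / row["sessions"] >= min_conversion_rate
    ]
    return sorted(hits, key=lambda row: row["position"])


for row in quick_wins(KEYWORDS):
    print(row["keyword"], row["position"])
```

Swapping the sales-based rate for a behavioral metric such as time on site works the same way when there is too little conversion data.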
Access keyword performance across devices
It is important to understand how your keywords perform depending on the device category that is used.
To find out how many sessions you get from certain keywords, click on a keyword in the keyword view and add the secondary dimension ‘Device Category’. You can now see all behavioral and transactional metrics split by device:
The easiest way is to install our ‘mobile vs. desktop’ dashboard.
If you want to do it yourself and see your rankings split by device, click on a single keyword. Now add a segment by clicking on ‘Add Segment’ above the keyword table. Then search for Mobile Traffic and select it. Now add another segment and look for Tablet and Desktop Traffic.
Now you have your rankings split by devices and you can see how many sessions you’re pulling in from each device category.
Avoid ranking cannibalism
Sometimes you’re ranking with several landing pages for a specific keyword or keyword set.
Likely, there will be significant differences in how those pages appeal to users coming with the same search intent.
The idea is to funnel traffic from a specific keyword to those pages that have the best metrics.
To find out which URLs rank for a particular keyword, click on any keyword in the keyword view, preferably the one with the most organic traffic.
Now, add ‘landing page’ as the secondary dimension.
What you see now is a single keyword and the 10 (!) URLs that Google considers relevant for this query.
About half (52%) of the sessions triggered by the keyword land on the homepage. The other 47% are spread across the nine remaining URLs.
Comparing behavioral metrics such as time on site and bounce rate across the URLs, it’s obvious that only the homepage has acceptable metrics. The other nine pages don’t seem to fulfill the user intent and have high bounce rates.
In this case, we should consider differentiating the other landing pages more, so that the homepage will pull in most of the traffic for this keyword.
Analyze long-tail keywords
Long-tail keywords find the searcher in the right stage of the buying cycle or search funnel.
One of the benefits of revealing ‘not provided’ is the ability to see long-tail keywords.
70% of all website traffic comes from long-tail keywords. You can target long-tail keywords instead of competing for a difficult keyword with more established websites.
With these niche keywords, it’s also easier to match the intent of the searcher.
This makes it clearer what they are looking for when they use a search term.
For example, a searcher who uses the keyword “runners” might be looking to buy running shoes, research famous athletes, or find answers about running shoes.
But if a searcher uses the phrase “running shoes for rainy weather,” it’s a safer bet to assume they are looking to make a purchase.
Create better content
You might be getting a few hundred views on articles you post.
But are the hits due to the keywords you are targeting?
When you can see which keywords do well for your site, you can tweak your content strategy for better SEO results.
Instead of writing articles about famous athletes, you could hone in on the search intent of somebody looking for runners that have good grip or are waterproof, based on the keywords they are using to find your article.
There is little point in writing articles just for the sake of it.
You can tailor your writing to the search intent of users once you have unlocked keyword search data.
Adjust to Google’s algorithm updates
Google runs countless search algorithm tests in a year.
This will affect your keywords.
Going without keyword data doesn’t mean you won’t still get thousands of visitors per day.
However, they might be coming due to a different keyword.
With keyword data restored, you no longer have to worry about missing the effects of Google Panda, Hummingbird, Penguin, or any other algorithm change.
Detect and monitor brand keywords
Brand keywords are keywords with your brand name in them.
These are different to non-brand keywords which are keywords that do not relate to your brand.
Let’s look at a couple of examples:
Nike Flyknit shoes
This is a brand keyword for Nike. Notice that brand keywords can contain more than one keyword.
Let’s say a store called FastShoes.com sells these.
“FastShoes” is a brand keyword. “Nike Flyknit shoes” is not because this keyword relates to another brand.
“FastShoes Flyknit” would count as a brand keyword. Notice that “Flyknit” on its own is only ever a brand keyword for Nike.
Monitoring brand keywords is important in order to:
- Maintain brand authority
- Maximize conversions: brand keywords are typically high traffic, high converting. Make sure you occupy the number one spot in the SERPs
- Optimize brand keyword performance: as your keyword set grows, it becomes harder to keep track of how they are performing
- Identify gaps: make sure all your brand keywords are satisfying user intent by analyzing user behavior at a keyword level
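The brand versus non-brand distinction from the FastShoes examples above can be expressed as a simple rule. This is a minimal sketch that treats a keyword as a brand keyword when it contains one of the site's own brand terms; the list of brand terms is something you would maintain yourself, and the substring rule is an assumption that will miss misspellings and spaced variants such as “fast shoes”.

```python
def is_brand_keyword(keyword: str, brand_terms) -> bool:
    """Return True if any of the site's own brand terms appears in the keyword."""
    text = keyword.lower()
    return any(term.lower() in text for term in brand_terms)


# Brand terms for the hypothetical store from the example.
brand_terms = ["fastshoes"]

for keyword in ["FastShoes", "Nike Flyknit shoes", "FastShoes Flyknit"]:
    label = "brand" if is_brand_keyword(keyword, brand_terms) else "non-brand"
    print(f"{keyword}: {label}")
```

From FastShoes' point of view, “Nike Flyknit shoes” comes out as non-brand, which matches the classification in the text.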
Google Search Console vs Keyword Hero
Using maths to retrieve hidden keywords takes a multipronged approach.
But first, let’s do some myth-busting.
First, using Google Search Console (alone) is not a replacement for GA as it was before Secure Search.
Google Search Console and Google Analytics are often used interchangeably.
Yet they serve different purposes.
Performance data from GSC is a measure of what’s happening on Google itself, and not necessarily what is happening on your site.
However, as you introduce more specificity into how you review a website, the precision of the data reported in Google Search Console increases.
Hence, if you add hundreds of subdirectories to GSC, the increase in data precision can be valuable for drawing back the curtain on ‘not provided’ in analytics.
Google Search Console and Google Analytics
Each of these valuable Google tools serves a different purpose.
Google Analytics is user-oriented.
It shows you who visits and interacts with the main content on your website.
Google Search Console is search-engine focused.
It shows site owners how to improve visibility and presence in the SERPs.
As such, both tools provide different metrics, with Google Analytics favoring clicks and Google Search Console prioritizing impressions.
This is why we use both along with 7 other data sources.
Is it ok to use Google Search Console?
Google Search Console is useful for some light-touch insight into how your website is performing on search, but nothing like the on-site behavioral data which Google Analytics used to provide.
Google only takes a measurement when the results page is used.
If a results page is not used during the period, no data is collected.
GSC does not enable a valid comparison between mobile and desktop data.
That is because of the significantly different user behavior.
For example, a smartphone user might click through to the second or third page of the SERPs less frequently than a desktop user.
Since the GSC data is based on this user behavior it doesn’t give the complete picture.
Ranking distribution is a key feature of GSC.
Yet only pages that appear on page one of the SERPs are worth further consideration.
This is because GSC data is only collected when a searcher accesses the results page.
Hence, the ranking distribution cannot be accurately determined.
There are too few users accessing the second or third page of results to provide a reliable evaluation.
Google has been consolidating Search Console data since 2019 based on the canonical URL given in an HTML page.
Thus, Google is layering over the data for which URL was displayed in the search results.
For larger websites, this can create problems.
It means that configuration errors, redirects, AMP conversions, etc can no longer be uniquely tracked.
The GSC API counts and delivers keywords as a combination of keyword / term, URL, device and country.
A term like ‘keywords not provided’ can be delivered as 130 different combinations.
You should only count each keyword once.
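Counting each keyword once means collapsing the keyword, URL, device, and country combinations back into one row per keyword. The sketch below assumes rows shaped like the combinations described above; the field names and values are hypothetical sample data, not an actual API response.

```python
from collections import defaultdict

# Hypothetical rows: one per keyword / URL / device / country combination.
ROWS = [
    {"query": "red running shoes", "page": "/shoes", "device": "MOBILE", "country": "deu", "clicks": 5, "impressions": 100},
    {"query": "red running shoes", "page": "/shoes", "device": "DESKTOP", "country": "deu", "clicks": 3, "impressions": 60},
    {"query": "red running shoes", "page": "/sale", "device": "MOBILE", "country": "usa", "clicks": 2, "impressions": 40},
    {"query": "trail shoes", "page": "/trail", "device": "MOBILE", "country": "deu", "clicks": 1, "impressions": 20},
]


def aggregate_by_keyword(rows):
    """Sum clicks and impressions so that each keyword appears exactly once."""
    totals = defaultdict(lambda: {"clicks": 0, "impressions": 0, "combinations": 0})
    for row in rows:
        entry = totals[row["query"]]
        entry["clicks"] += row["clicks"]
        entry["impressions"] += row["impressions"]
        entry["combinations"] += 1
    return dict(totals)


totals = aggregate_by_keyword(ROWS)
print(len(ROWS), "rows,", len(totals), "distinct keywords")  # 4 rows, 2 distinct keywords
```

Summed impressions should be read with care, since the same search can produce an impression for more than one URL.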
In addition, average positions on GSC can be confusing.
The position information often has decimal places.
You can see an average position of 1.7 for a keyword.
This decimal place is based on Google’s method of counting the results.
Google counts the results that contain a link to the target page from top to bottom and then continues with other search result elements.
This counting method makes sense from the perspective of a search engine.
But for a webmaster, it can lead to false assumptions.
If your page ranks number one and the only additional element in the results is a knowledge panel in which the website also appears, Google will report an average position of 6 ((pos 1 + pos 11) / 2).
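The arithmetic behind that figure is a plain mean of the counted positions. A tiny sketch, using the two positions from the example (1 for the organic result and 11 for the knowledge panel); the function name is illustrative.

```python
def average_position(positions):
    """Return the arithmetic mean of the positions at which a page was counted."""
    if not positions:
        raise ValueError("at least one position is required")
    return sum(positions) / len(positions)


print(average_position([1, 11]))  # 6.0
```

A page that is actually the top organic result can therefore be reported with an average position that looks like the middle of page one.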
Google Search Console data is not complete
GSC was promoted as a replacement for the keyword data that used to be derived from the referrer.
However, GSC data is itself filtered by Google for privacy reasons.
Unfortunately, there is no information available from Google on the scope, extent, and background to the level of keyword filtering.
It is also unknown whether the filtering changes over time.
Data in GSC is compared to previously available keyword data, which used the referrer-string.
Yet it’s not the same. Data is missing and the rules are unclear.
Nonetheless, GSC will give you a good starting point to see some of your ‘not provided’ keywords, but you will need other data sources for a well-rounded interpretation.
For each keyword on GSC, you can access data for clicks, impressions, click-through rates (CTRs), and average position.
This gives you a good idea of what the most important organic keywords are for your website.
The problem with this report is that you can only get data for your entire website, not individual pages and you can’t map it to individual sessions on Google Analytics.
It doesn’t discern whether your site shows up on page one or page 1000 of Google.
Hence impressions in this report don’t necessarily mean people are clicking through to the page where you appear on Google.
Nonetheless, GSC plays a role in getting ‘not provided’ back.
Not set vs ‘not provided’
If you’ve looked at your keyword list in the last year or so, you may have noticed keywords: (not set) and (not provided).
Together they will make up 99%+ of your organic traffic.
The keyword (not set) identifies traffic that doesn’t arrive via a keyword, so it might not come through search at all. This includes traffic from email, referral sites, Google Images, etc. Visitors from Google Images and Google Maps are classified as referrals with the source google.com, not organic search.
Because keywords are set for search traffic, the (not set) keyword will never appear in your organic search reports, so you don’t need to consider it when evaluating keyword performance.
What does seeing the keyword ‘not provided’ mean? Any keywords searched organically by users who are logged into their Google Accounts (Gmail, etc.) will show up in your Analytics reports as (not provided) because, as mentioned previously, the query is withheld under Secure Search.
The keyword will still be reported as ‘organic’ search, but the keyword itself is not visible to you.
Benefits of being able to see keywords in Google Analytics
Once you can view ‘not provided’ again you will be able to see what search terms people used to find different landing pages.
1. Monitor keyword performance
Secure Search is a challenge for webmasters. It makes monitoring keywords difficult. Organic keyword data provides insights into which keywords drive the most traffic to your site and to different pages. Without it, you are playing darts with a blindfold. Some tools promise to help and offer rank monitoring and estimated traffic for individual keywords.
However, the problem with these tools is their inaccurate sample set. They only look at a small number of keywords and vary in their sample size among different verticals. So they offer the users a fraction of the keywords that a website actually ranks for.
2. Evaluate behavior per keyword
Importantly, without keyword data the transactional and behavioral metrics of individual keywords can no longer be measured. Webmasters don’t know whether a user who searched for ‘shoes’ has a higher conversion rate or longer time on site than a user who searched for ‘Red Nikes’. Without that knowledge, search optimization is virtually impossible.
3. Brand keyword visibility
Organic keyword data allows you to see what people are associating with your brand name as you can see what they are searching for along with your brand term. Also, organic keyword data provides insights that can help you optimize certain pages on your sites that may have high bounce rates for specific keywords.