Author: Luca Bares, Senior SEO Analyst, Wayfair.
Click-through rate (CTR) is an important metric that can be used for many purposes, from assessing revenue opportunities and prioritizing keywords to evaluating the impact of SERP changes on the market.
Many SEO specialists build their own CTR curves for their sites to make these predictions even more accurate. However, if these curves are based on data from Google Search Console (GSC), things are not that simple.
GSC is known to be an imperfect tool that can provide inaccurate data. This muddies the information we get from GSC and can hinder accurate interpretation of the CTR curves we build with it. Fortunately, there are ways to help eliminate these errors, so you can better understand what your data is telling you.
If you clean your data carefully and think through your analysis methodology, you can calculate the CTR for your site much more accurately. Doing so takes four steps:
- Extract your site's keyword data from GSC. The more data you get, the better.
- Remove keywords that can distort the overall picture. Branded queries can skew CTR curves, so they should be removed.
- Find the optimal impression level for your data set. Google samples data at low impression levels, so it is important to remove keywords for which Google may report inaccurate data at those lower levels.
- Choose a methodology for determining ranking position. No data set is perfect, so you may want to change your position classification methodology depending on the size of your keyword sample.
A small digression
Before getting into the specifics of building CTR curves, it is worth mentioning the simple way to calculate CTR, which is also used in this article.
To calculate CTR, download the keywords your website ranks for, together with their click, impression, and position data.
Then divide the sum of clicks by the sum of impressions at each ranking position in the GSC data, and you have your own CTR curve. For more detail on crunching the numbers for CTR curves, see the article from SEER.
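The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the author's actual tooling: the row layout (`query`, `clicks`, `impressions`, `position`), the function name `ctr_curve`, and the sample rows are all assumptions made for the example, and positions are bucketed by rounding the average position to the nearest whole number.

```python
import math
from collections import defaultdict


def ctr_curve(rows):
    """Return {position: CTR}, where CTR = sum(clicks) / sum(impressions)."""
    clicks = defaultdict(int)
    impressions = defaultdict(int)
    for row in rows:
        # Bucket by average position rounded half up (an assumption; see Step 4).
        rank = max(1, math.floor(row["position"] + 0.5))
        clicks[rank] += row["clicks"]
        impressions[rank] += row["impressions"]
    return {
        rank: clicks[rank] / impressions[rank]
        for rank in sorted(impressions)
        if impressions[rank] > 0
    }


# Hypothetical sample rows, not real GSC data.
sample = [
    {"query": "blue sofa", "clicks": 30, "impressions": 100, "position": 1.0},
    {"query": "grey sofa", "clicks": 20, "impressions": 100, "position": 1.2},
    {"query": "sofa bed", "clicks": 10, "impressions": 100, "position": 2.4},
]
curve = ctr_curve(sample)
print(curve)
```

Summing clicks and impressions before dividing weights each keyword by its impressions, which is different from averaging per-keyword CTR values.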
This calculation becomes quite a challenge once you start trying to control for the bias inherent in CTR data. Although we know this approach yields flawed information, we have few other options, so the only way forward is to remove as much bias from our data set as possible and to be aware of some of the problems that come with using it.
Without controlling and adjusting the data taken from GSC, you can get results that are illogical. For example, you may find that, according to your curves, the average CTR in positions 2 and 3 is much higher than in position 1.
If you do not know that the Search Console data you are using may be inaccurate, you may take it as truth and a) try to come up with hypotheses about why the CTR curves look the way they do based on incorrect data, and b) produce inaccurate estimates and predictions from those CTR curves.
Step 1: Extract the data
The first part of any analysis is getting the data.
- Google Search Console
Search Console is the easiest platform for getting the data that Google collects. You can log in to the service and export all your keyword data for the past 3 months. Google will automatically download it as a CSV file.
The downside of this method is that GSC exports only 1,000 keywords at a time, which makes your sample too small for analysis. You can work around this limit by applying keyword filters for your main queries and downloading multiple files, but that is quite time-consuming. The methods listed below are better and easier.
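If you do go the multiple-export route, the files have to be merged, and filtered exports can overlap. The sketch below shows one way to combine them while keeping a single row per query. The column names (`Top queries`, `Clicks`, `Impressions`, `Position`) are an assumption about the export layout and should be checked against your own files; `merge_exports` is a hypothetical helper name.

```python
import csv
import io


def merge_exports(files):
    """Merge several GSC CSV exports, keeping the first row seen per query."""
    merged = {}
    for handle in files:
        for row in csv.DictReader(handle):
            query = row["Top queries"]  # assumed column name
            merged.setdefault(query, {
                "query": query,
                "clicks": int(row["Clicks"]),
                "impressions": int(row["Impressions"]),
                "position": float(row["Position"]),
            })
    return list(merged.values())


# Hypothetical file contents standing in for two filtered exports.
header = "Top queries,Clicks,Impressions,CTR,Position\n"
export_a = io.StringIO(header + "blue sofa,30,100,30%,1.2\nsofa bed,10,100,10%,2.4\n")
export_b = io.StringIO(header + "sofa bed,10,100,10%,2.4\noak table,5,50,10%,3.1\n")

rows = merge_exports([export_a, export_b])
print(len(rows))
```

With real files, each entry in the list would be an open file object, for example one opened with `open(path, newline="", encoding="utf-8")`.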
- Google Data Studio
For anyone who is not a programmer, this is definitely the best option. Data Studio (called "Data Center" in the Russian version) connects directly to your GSC account, with no limit on the amount of data you can download. For the same three-month period, this tool can give you 200,000 keywords (!) instead of the 1,000 you get from GSC.
- Google Search Console API
One of the best ways to get the data you need is to connect directly to the source through its API.
With this method you have much more control over the data you extract, and you get a fairly large data set.
The main disadvantage is that it requires programming knowledge and resources.
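To give a sense of the programming involved, here is a rough sketch of paging through Search Analytics results. The request body fields (`startDate`, `endDate`, `dimensions`, `rowLimit`, `startRow`) follow the Search Console API's query method. Everything else is an assumption for illustration: `run_query` is a hypothetical callable that sends one request and returns the parsed response, `fake_run_query` is an offline stand-in with made-up rows, and the dates are placeholders.

```python
def build_query_body(start_date, end_date, start_row=0, row_limit=25000):
    """Build a Search Analytics request body for one page of query data."""
    return {
        "startDate": start_date,  # "YYYY-MM-DD"
        "endDate": end_date,
        "dimensions": ["query"],
        "rowLimit": row_limit,
        "startRow": start_row,
    }


def fetch_all_rows(run_query, start_date, end_date, row_limit=25000):
    """Request pages until a short or empty page comes back.

    run_query is a hypothetical callable: it takes a request body and returns
    the parsed JSON response. With google-api-python-client it could wrap
    service.searchanalytics().query(siteUrl=site_url, body=body).execute().
    """
    rows, start_row = [], 0
    while True:
        body = build_query_body(start_date, end_date, start_row, row_limit)
        page = run_query(body).get("rows", [])
        for item in page:
            rows.append({
                "query": item["keys"][0],
                "clicks": item["clicks"],
                "impressions": item["impressions"],
                "position": item["position"],
            })
        if len(page) < row_limit:
            return rows
        start_row += row_limit


def fake_run_query(body):
    """Offline stand-in for the API call, returning made-up rows."""
    data = [
        {"keys": [f"keyword {i}"], "clicks": i, "impressions": 100, "position": 1.0 + i}
        for i in range(5)
    ]
    page = data[body["startRow"]:body["startRow"] + body["rowLimit"]]
    return {"rows": page} if page else {}


rows = fetch_all_rows(fake_run_query, "2024-01-01", "2024-03-31", row_limit=2)
print(len(rows))
```

Passing the request function in as a parameter keeps the paging logic testable without credentials or network access.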
Note: the tools in this section are listed in ascending order of the volume of data they can provide.
Step 2: Remove keywords that can distort the overall picture
After you extract the data, it needs to be cleaned.
- Remove branded keywords
When you build general CTR curves, it is important to remove all branded keywords. These queries usually have a higher CTR, which skews the sample averages, so they must be removed.
- Remove keywords associated with search features
If you know that your site regularly ranks in certain search features, such as the Knowledge Panel, for particular queries, remove those queries as well. The reason is that we are calculating CTR for positions 1-10, and search features can skew the averages.
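Both clean-up rules in this step can be expressed as a simple filter. The sketch below assumes you maintain your own list of brand terms (including common misspellings) and your own list of queries known to trigger search features; the function name, the row layout, and the sample queries are hypothetical.

```python
def remove_distorting_keywords(rows, brand_terms, feature_queries=()):
    """Drop branded queries and queries known to rank in search features."""
    brand_terms = [term.lower() for term in brand_terms]
    excluded = {query.lower() for query in feature_queries}
    kept = []
    for row in rows:
        query = row["query"].lower()
        if any(term in query for term in brand_terms):
            continue  # branded query: CTR is usually inflated
        if query in excluded:
            continue  # ranks in a search feature, not a plain 1-10 result
        kept.append(row)
    return kept


# Hypothetical sample rows, not real GSC data.
rows = [
    {"query": "wayfair sofa", "clicks": 80, "impressions": 100, "position": 1.0},
    {"query": "Way Fair coupons", "clicks": 60, "impressions": 100, "position": 1.0},
    {"query": "blue sofa", "clicks": 30, "impressions": 100, "position": 1.2},
    {"query": "sofa dimensions", "clicks": 2, "impressions": 100, "position": 1.0},
]
cleaned = remove_distorting_keywords(
    rows,
    brand_terms=["wayfair", "way fair"],
    feature_queries=["sofa dimensions"],
)
print([row["query"] for row in cleaned])
```

Substring matching on brand terms is deliberately broad; if a brand term is also a common word, an exact-match list may be a safer choice.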
Step 3: Determine the optimal impression level in GSC for your data
The largest bias in GSC data comes from the fact that the service includes sampled data at low impression levels, and that data needs to be removed.
For some reason, Google significantly overestimates CTR for queries with low impressions. As an example, we built a distribution graph from GSC data for keywords that have only one impression, showing CTR at each position.
According to this graph, most keywords that received only 1 impression have a CTR of 100%. It is extremely unlikely that keywords with a single impression would have a 100% CTR, and this is especially true for keywords ranking below position #1. This gives us fairly strong evidence that data at low impression levels cannot be trusted, and that we should limit the keywords in our sample accordingly.
3.1. Use the normal distribution curve to calculate the CTR
To further confirm that Google is providing distorted information, let's look at the distribution of CTR across all keywords in our data set.
Since we are calculating CTR averages, the data should follow a normal distribution (bell curve). In most cases, however, CTR curves built from GSC data are heavily skewed to the left with long tails, which again indicates that Google reports very high CTR at low impression volumes.
If we raise the minimum number of impressions for the keyword sets we analyze, the distribution moves closer and closer to the center of the graph. Below is an example of a site's CTR distribution in increments of 0.001.
The graph above shows the distribution at a very low impression level, approximately 25. The data sits mainly on the right side of the graph, with a small, dense cluster on the left, which would imply that this site has a very high CTR. However, when the impression filter is raised to 5,000 per keyword, the distribution of keywords moves much closer to the center.
This graph will most likely never be centered around a CTR of 50%, because that would be a very high average. The graph should therefore be skewed to the left. The main problem is that we do not know by how much, because Google gives us sampled data. The best we can do is guess. But this raises the question: what is the right impression level for filtering keywords to get rid of the erroneous data?
One way to find the right impression level for building CTR curves is to use the method described above to see where the CTR distribution comes close to normal. A normally distributed CTR data set has fewer outliers and is less likely to contain a large number of distorted data points from Google.
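One rough way to try this is to bin per-keyword CTR in steps of 0.001 and watch how the shape changes as the impression filter rises. The sketch below uses sample skewness as a crude stand-in for "how far from a bell curve"; that choice is an assumption of this example, not something the analysis above prescribes, and a value near zero only indicates symmetry, not normality. The function names and sample rows are hypothetical.

```python
from statistics import mean, pstdev


def keyword_ctrs(rows, min_impressions=1):
    """Per-keyword CTR for keywords at or above the impression threshold."""
    threshold = max(1, min_impressions)
    return [r["clicks"] / r["impressions"] for r in rows if r["impressions"] >= threshold]


def ctr_histogram(rows, min_impressions=1, bins=1000):
    """Count keywords per CTR bin; with bins=1000 each bin is 0.001 wide."""
    threshold = max(1, min_impressions)
    counts = {}
    for r in rows:
        if r["impressions"] >= threshold:
            # Integer arithmetic avoids floating-point errors at bin edges.
            edge = (r["clicks"] * bins // r["impressions"]) / bins
            counts[edge] = counts.get(edge, 0) + 1
    return dict(sorted(counts.items()))


def skewness(values):
    """Population skewness; 0.0 when it cannot be computed."""
    if len(values) < 2:
        return 0.0
    centre, spread = mean(values), pstdev(values)
    if spread == 0:
        return 0.0
    return mean([((v - centre) / spread) ** 3 for v in values])


# Hypothetical sample rows, not real GSC data.
sample = [
    {"clicks": 1, "impressions": 1},
    {"clicks": 1, "impressions": 1},
    {"clicks": 0, "impressions": 2},
    {"clicks": 30, "impressions": 100},
    {"clicks": 1200, "impressions": 6000},
    {"clicks": 900, "impressions": 5000},
]
for threshold in (1, 25, 5000):
    ctrs = keyword_ctrs(sample, threshold)
    print(threshold, len(ctrs), round(skewness(ctrs), 3))
```

On a real data set you would plot the histogram for several thresholds and also keep an eye on how many keywords survive each filter.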
3.2. Determine the best impression level for calculating your site's CTR
Instead of using the normal curve, you can also create impression tiers to see where there is less variability in the data being analyzed. The less variability in your estimates, the closer you are to an accurate CTR curve.
- Tiered CTR
Tiered CTR should be calculated for each site individually, because the GSC sampling for each site differs depending on the keywords it ranks for.
For example, we have seen CTR curves diverge by as much as 30% when this calculation was not properly controlled. This step is important because using all data points in your CTR calculation can heavily skew your results, while using too few data points leaves you with a sample too small to give an accurate idea of your CTR. The key is to find a middle ground between the two.
In the tiered table above, there is marked variability between the all-impressions tier and the >250 impressions tier. After that point, however, the differences between tiers are small.
For the site analyzed, the right impression level is >750, since from that tier onward the differences between tiers are fairly small. The >750 impressions tier still gives us a sufficient number of keywords at each ranking position in our data set.
When you build tiered CTR curves, it is also important to count how much data goes into each data point at every tier.
For small sites, you may find that you do not have enough data to calculate CTR curves reliably, but this will not be obvious from the tiered curves alone. That is why it is important to know how much data you have at each tier, so you can tell which impression level is the most accurate for your site.
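A tiered table of this kind, with the keyword count behind every cell, could be built along the following lines. The tier boundaries (all impressions, >250, >750) mirror the ones mentioned above; the function name, the row layout, the rounding of positions, and the sample rows are assumptions for illustration.

```python
import math
from collections import defaultdict


def tiered_ctr(rows, tiers=(0, 250, 750), max_position=10):
    """For each tier (impressions > tier), return CTR and keyword count per position."""
    table = {}
    for tier in tiers:
        clicks = defaultdict(int)
        impressions = defaultdict(int)
        keywords = defaultdict(int)
        for row in rows:
            if row["impressions"] <= tier:
                continue
            rank = math.floor(row["position"] + 0.5)  # rounded position (assumption)
            if not 1 <= rank <= max_position:
                continue
            clicks[rank] += row["clicks"]
            impressions[rank] += row["impressions"]
            keywords[rank] += 1
        table[tier] = {
            rank: {"ctr": clicks[rank] / impressions[rank], "keywords": keywords[rank]}
            for rank in sorted(impressions)
        }
    return table


# Hypothetical sample rows, not real GSC data.
sample = [
    {"clicks": 1, "impressions": 1, "position": 1.0},
    {"clicks": 300, "impressions": 1000, "position": 1.0},
    {"clicks": 50, "impressions": 500, "position": 2.2},
]
table = tiered_ctr(sample)
for tier, by_rank in table.items():
    print(f">{tier} impressions:", by_rank)
```

Reading the `keywords` count next to each CTR value makes it easier to spot tiers where a position is backed by only a handful of keywords.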
Step 4: Determine which position methodology to use
Once you have determined the right impression level, filter your data accordingly so that you can move on to calculating CTR curves from impression, click, and position data.
The problem with position data is that it is often inaccurate. So if you have a reliable keyword tracking tool, it is much better to use your own position data instead of Google's. Most people cannot track that many keywords, so they have to use Google's data. That is certainly acceptable, but it is important to remain cautious.
How to use GSC position data
When calculating CTR curves from GSC average position data, the following question arises: should you use rounded or exact figures?
Using exact figures gives us a better idea of what CTR looks like in the first position. Keywords with an exact position are more likely to have ranked in that position throughout the period for which the data was collected. The problem is that average position is an average, so we have no way of knowing whether a keyword actually ranked in that position the whole time or only part of it.
Fortunately, when we compare CTR at exact and rounded positions, the estimated CTR values are similar as long as there is enough data. If you do not have enough data, however, average position may fluctuate. Rounded positions give us much more data, so it makes sense to use them if you have too little data at the exact position.
There is one caveat, which concerns estimating CTR for the first position. Here it is better to use the exact position value rather than the rounded one, to avoid underestimating CTR.
Adjusted exact position
So if you have enough data for position 1, use only exact position figures. For smaller sites, you can use an adjusted exact position.
Since Google reports average position to two decimal places, one way to get more "exact position" #1 data is to include all keywords with an average position below 1.1. This gives you a couple hundred additional keywords, which makes your data more reliable.
This should also not pull our average down much, since GSC is somewhat inaccurate in how it reports average position anyway. At Wayfair we use STAT as our keyword rank tracking tool, and when we compare average position data from GSC and STAT, the rankings near the first position are close, but not 100% accurate. As you move further down the rankings, the difference between STAT and GSC grows, so keep an eye on how far from the top of the results you go to bring more keywords into your data set.
We ran this analysis for all keywords tracked at Wayfair and found that the lower the position, the weaker the match between the two tools. So Google does not provide very high-quality ranking data, but the data is accurate enough close to position 1. We are therefore comfortable using an adjusted exact position to enlarge our data set without worrying about a loss of quality (within reason).
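The three ways of assigning a keyword to a position discussed in this step (rounded, exact, and adjusted exact) can be compared with a small helper like the one below. The 1.1 cutoff comes from the text above; the function name, the `method` labels, and the decision to leave keywords between 1.1 and 1.5 out of every bucket in the adjusted mode are assumptions of this sketch.

```python
import math


def position_bucket(position, method="rounded", cutoff=1.1):
    """Map a GSC average position to a ranking bucket, or None to exclude it."""
    if method == "rounded":
        return math.floor(position + 0.5)
    if method == "exact":
        # Keep only whole-number averages such as 1.0 or 3.0.
        return int(position) if position == int(position) else None
    if method == "adjusted":
        if position < cutoff:
            return 1  # treat averages below the cutoff as position 1
        rank = math.floor(position + 0.5)
        # Assumption: 1.1 <= position < 1.5 is left out rather than counted as #1.
        return rank if rank > 1 else None
    raise ValueError(f"unknown method: {method}")


for value in (1.0, 1.05, 1.3, 2.4):
    print(value, {m: position_bucket(value, m) for m in ("rounded", "exact", "adjusted")})
```

Running a CTR calculation once per method and comparing the position 1 values is a quick way to see how much the choice matters for a given site.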
Conclusion
GSC is an imperfect tool, but it gives SEO specialists the best available information for understanding a site's CTR performance in the SERPs.
Given the tool's limitations, it is important to control as much as possible in the data it provides. To do so, choose the ideal source for extracting your data, remove keywords with low impressions, and use the right rounding methods. If you do all of this, you are much more likely to end up with more accurate and consistent CTR curves for your site.