John Mueller Explains Why GoogleBot Does Not Crawl All Site Pages
- 22 July, 2021
- Jason Ferry
- Search Engine Optimisation
Online businesses and webmasters conduct website SEO optimisation for Google in order to boost their rankings and site traffic. In a Google SEO Office Hours hangout, Google's John Mueller answered a question about why the search engine did not crawl enough web pages - an issue that has confused many search engine optimisation agencies and businesses.
The person who asked the question said that Google's crawling could not keep pace with enormously large websites. Mueller then gave several reasons why Google might not be crawling enough web pages.
Search Engine Crawling
Search engine crawling refers to the process in which search engine web crawlers (spiders or bots) visit and download a site's pages. The crawlers then extract each page's links to discover additional pages.
Spiders and bots regularly crawl websites, so search engines can determine if the site owner has made any changes to their pages. If the search engine detects any changes, it will update the page's information in its index.
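As a rough illustration of that fetch-and-extract loop, here is a minimal Python sketch of what a crawler does on a single page. The URL is a placeholder, and real crawlers such as GoogleBot are far more sophisticated (they respect robots.txt, schedule revisits, deduplicate URLs and so on).

```python
# Minimal sketch of what a crawler does: fetch one page, then collect the
# links it contains so further pages can be discovered.
# The URL used below is a placeholder, not a site from the article.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags on a single page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_page(url):
    # Download the page (the "visit"), then extract its links so the
    # crawler can queue additional pages for discovery.
    html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
    parser = LinkCollector()
    parser.feed(html)
    return [urljoin(url, link) for link in parser.links]


if __name__ == "__main__":
    for link in crawl_page("https://example.com/"):
        print(link)
```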
Google Crawl Budget
Google's crawler, GoogleBot, crawls from page to page, indexing and ranking them. However, the Internet is far too large for GoogleBot to crawl in full. Therefore, the search engine has to strategise and compromise, choosing to index only higher-quality web pages.
According to Google's developer page, the term "crawl budget" refers to the amount of time and resources that Google devotes to crawling a website. The page also makes clear that not everything GoogleBot crawls will necessarily be indexed.
Instead, after crawling, the search engine evaluates and consolidates each page to determine whether it is worth indexing.
There are two main elements that determine Google's crawl budget: crawl capacity limit and crawl demand.
What Decides GoogleBot Crawl Budget?
The person who asked the question explained that they own a website containing thousands of pages, yet Google only crawls about 2,000 web pages per day, which seemed too slow for such a large site. They also said they had seen a backlog of 60,000 discovered pages that had been neither crawled nor indexed by Google.
This frustrated the SEO, as they had tried to make improvements but had yet to see any significant increase in the number of pages crawled. They asked Mueller for insights to better understand Google's crawl budget.
Mueller responded that there are two main reasons for this situation. The first is server speed: if the server's response time is slow, it will show in the crawl stats report, and he advised the SEO to aim for an average below 300-400 milliseconds.
With a faster server response time, Google can crawl and index web pages much more quickly. However, one should note that server response time is not the same thing as page speed.
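As a quick, hedged way to spot-check that figure, the sketch below times a handful of requests to a single URL and averages them. The URL and sample count are placeholders, and this only approximates server response time; it is not a substitute for the crawl stats report or proper monitoring.

```python
# Rough sketch for spot-checking average server response time against the
# roughly 300-400 ms target Mueller mentions. It times a few requests to one
# URL; the URL and sample count are placeholders. This approximates server
# response, not full page speed (which, as noted above, is a different thing).
import time
from urllib.request import urlopen


def average_response_ms(url, samples=5):
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        with urlopen(url, timeout=10) as response:
            response.read(1)  # stop once the first byte of the body arrives
        timings.append((time.perf_counter() - start) * 1000)
    return sum(timings) / len(timings)


if __name__ == "__main__":
    print(f"Average response: {average_response_ms('https://example.com/'):.0f} ms")
```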
The next reason may have something to do with the website's quality. According to Mueller, if GoogleBot perceives a website as low-quality, the search engine will avoid crawling the web pages. He admitted that they mostly experience this with newer websites.
Moreover, Mueller said that some people are confident they can create a website with millions of web pages simply because they have a database and can put everything online. Many of those pages end up waiting for GoogleBot, but the search engine is hesitant to crawl them because it is not sure of their quality.
Google Search Central agrees with this idea. According to the blog page, there are several factors that could affect Google's crawl budget:
- Soft error pages
- Low-quality and spam content
- Hacked pages
- On-site duplicate content
- Infinite spaces and proxies
- Faceted navigation and session identifiers
Crawl Rate Limit
The term "crawl rate" refers to the number of requests per second that GoogleBot makes to a website when crawling it. SEOs cannot change how often the search engine crawls their sites, but they can request a recrawl if they updated a piece of content.
According to Google Search Central, GoogleBot's main priority is to ensure that crawling does not spoil the browsing experience of users visiting a site. The "crawl rate limit" caps the maximum fetching rate for a specific website.
In practice, it represents the number of simultaneous parallel connections that Google's crawler uses to crawl the site and the time it waits between requests (see the sketch after the list below). The crawl rate can fluctuate based on several factors:
- Limit set in Search Console: Site owners can reduce how much GoogleBot crawls their website. However, one should note that setting a higher limit does not necessarily mean Google will increase its crawling.
- Crawl health: If the site responds quickly, the limit goes up, meaning Google can use more connections to crawl. With a slower site response or server errors, the limit goes down and GoogleBot crawls less.
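To make the idea concrete, here is a purely illustrative Python sketch of a crawler enforcing its own rate limit: a cap on parallel connections plus a delay between requests. It is not Google's implementation, and the URL, delay, and connection cap are placeholder values.

```python
# Purely illustrative sketch of a crawl rate limit: the crawler caps its
# simultaneous connections and waits between requests so it does not
# overwhelm the server. All values and URLs below are placeholders.
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

MAX_PARALLEL_CONNECTIONS = 2   # cap on simultaneous fetches
DELAY_BETWEEN_REQUESTS = 1.0   # seconds to wait before each fetch


def polite_fetch(url):
    # Spread requests out over time before opening the connection.
    time.sleep(DELAY_BETWEEN_REQUESTS)
    with urlopen(url, timeout=10) as response:
        return url, response.status


if __name__ == "__main__":
    urls = ["https://example.com/"] * 3  # placeholder pages to fetch
    # The thread pool size enforces the parallel-connection cap.
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL_CONNECTIONS) as pool:
        for url, status in pool.map(polite_fetch, urls):
            print(status, url)
```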
Factors That Affect The Number Of Web Pages Google Crawls
Many factors can affect the number of pages that Google crawls. For instance, websites hosted on shared servers may have trouble with search engine crawling, often because other websites on the same server are using too many resources and slowing the whole server down. Rogue bots hitting the server can slow the site down in the same way.
Server speed plays a crucial role in search engine crawling. It is wise for search engine optimisation agencies and webmasters to check their server speed late at night, as this is typically when GoogleBot crawls web pages - a less disruptive time when fewer users are online.
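One hedged way to see when GoogleBot actually visits is to count its requests per hour in the server's access log. The sketch below assumes a standard Apache/Nginx combined log format and a placeholder file name, and it matches on the user-agent string only, which is a rough check since user agents can be spoofed.

```python
# Hedged sketch for checking when GoogleBot hits the server, assuming a
# standard Apache/Nginx combined-format access log. The log path is a
# placeholder. Counting requests per hour shows whether crawling clusters
# in off-peak hours, as the article suggests it usually does.
import re
from collections import Counter

# Matches the hour inside a timestamp like [22/Jul/2021:03:14:07 +0000]
TIMESTAMP_HOUR = re.compile(r'\[[^:]+:(?P<hour>\d{2}):')


def googlebot_hits_by_hour(log_path):
    hits = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            # "Googlebot" is the token used in Google's crawler user agents;
            # a user-agent match alone is only a rough, spoofable check.
            if "Googlebot" not in line:
                continue
            match = TIMESTAMP_HOUR.search(line)
            if match:
                hits[match.group("hour")] += 1
    return hits


if __name__ == "__main__":
    for hour, count in sorted(googlebot_hits_by_hour("access.log").items()):
        print(f"{hour}:00  {count} requests")
```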
Work With The Best SEO Agency In The UK Today!
Position1SEO can promise you a spot on Google's Page 1 with zero risks! Thanks to our no-stone-unturned approach and white hat SEO tactics, you can have peace of mind knowing that you can achieve results without incurring Google penalties. We are experts in website SEO optimisation for Google, and we guarantee that your site will be in its best state for GoogleBot to crawl.
Our team has garnered years of experience working with a long list of clients in different niches, so you can rest assured that we are more than capable of getting great rankings for your website, no matter what industry you work in.
If you want to know more about what we can do for you, call us on 0141 846 0114 or email us at office@position1seo.co.uk.