The Effect of Big Robots.txt Files on SEO
Google’s John Mueller was recently asked whether keeping robots.txt files to a reasonable size is a Google SEO best practice. Mueller gave his insights to Google SEO experts in the Google Search Central SEO office-hours hangout on 14 January.
One SEO told Mueller that they were concerned about their big, complex robots.txt file. They said it has over 1,500 lines, with a list of disallows that keeps growing as the years go by. The disallows stop Google from indexing URLs and HTML fragments that are requested through AJAX calls.
Setting a noindex is one way to exclude URLs and HTML fragments from Google’s index, but the SEO said they couldn’t do this, so they filled the website’s robots.txt with disallows instead. So, they asked Mueller if a huge robots.txt file could negatively impact their SEO.
SEOs Should Think Twice Before Keeping Large Robots.txt Files
Mueller explained that a huge robots.txt file would not directly harm a website’s SEO, but it is harder to maintain than a smaller one. A large file is not a problem in itself, but it can easily lead to unforeseen problems down the road if SEOs don’t keep an eye on it.
The SEO then asked if they would encounter issues if they did not include a sitemap in the robots.txt file. Mueller said no, noting that the different methods of submitting sitemaps are equivalent for Google.
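For sites that do list a sitemap in robots.txt, the declaration is a single Sitemap line. The sketch below reads such a line with Python's standard-library urllib.robotparser (site_maps() needs Python 3.8 or later). The domain, paths, and file content are made-up placeholders, not taken from the site discussed in the hangout.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the domain and paths are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /fragments/

Sitemap: https://www.example.com/sitemap.xml
"""


def listed_sitemaps(robots_txt: str) -> list:
    """Return the sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


print(listed_sitemaps(ROBOTS_TXT))
```

A file without a Sitemap line simply yields an empty list here, which matches Mueller's point that leaving it out is not an error.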
Afterwards, the SEO asked several follow-up questions regarding the topic.
Google Recognises HTML Fragments
The SEO asked Mueller about the SEO impact of drastically shortening the robots.txt file, like removing all disallows.
They also asked Mueller if Google would recognise HTML fragments even if they are irrelevant to site visitors and if HTML fragments would appear in Google’s search index if they do not disallow them in robots.txt.
Another question they asked was how Google deals with pages that use AJAX calls to load elements such as footers and headers.
The person then summarised all their questions by stating that most of the things disallowed in their robots.txt file are footer and header elements, which are irrelevant to the user.
Mueller said that it’s impossible to predict what would happen if those fragments suddenly became indexable. Trial and error may be the best way to answer this question. He added that the SEO should find out how those fragment URLs are used. If they are unsure, they can take a fragment URL and crawl it. Then, they can look at the fragment URL’s content and check it in search.
By doing so, they should see whether the fragment URL affects anything with regard to the site’s indexed content, or whether that content is immediately findable within the website. If so, one should ask if it causes them any problems.
Mueller advised Google SEO experts to work from those findings, because it is easy to use robots.txt to block things that are never used for indexing anyway. Maintaining a huge robots.txt file for that purpose takes up a lot of time, yet it changes very little for the website.
Considerations for Building Robots.txt Files
Lastly, the SEO asked Mueller if there are specific guidelines to follow when building a robots.txt file. Mueller said that there is no specific format to follow, and it’s entirely up to the site owner. Some websites have huge files, whereas others have tiny ones; they should all operate fine.
Mueller also said that Google uses an open-source robots.txt parser. SEOs can have their developers set up and run that parser to test a robots.txt file. They can then use it to check the URLs on the website and see which ones would be blocked and what effect the rules would have. Doing this in advance helps ensure that nothing bad happens once the changes go live.
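The kind of pre-launch check described above can be sketched in a few lines. Google's open-source parser is a C++ library, so this example swaps in Python's standard-library urllib.robotparser as a stand-in; its matching may differ from Google's for wildcards and rule precedence, so treat the result as a rough guide only. The draft rules and URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft rules and URLs, for illustration only.
DRAFT_ROBOTS_TXT = """\
User-agent: *
Disallow: /ajax/header
Disallow: /ajax/footer
Disallow: /cart/
"""

URLS_TO_CHECK = [
    "https://www.example.com/",
    "https://www.example.com/products/blue-widget",
    "https://www.example.com/ajax/header",
    "https://www.example.com/cart/checkout",
]


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given rules would block for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


for url in blocked_urls(DRAFT_ROBOTS_TXT, URLS_TO_CHECK):
    print("blocked:", url)
```

Running the same URL list against the current file and the draft file, then comparing the two results, shows what a change would affect before it goes live.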
What Is Robots.txt?
A robots.txt file tells crawlers, such as Googlebot, which parts of a website they may and may not crawl. It’s part of the robots exclusion protocol (REP). Google uses Googlebot to crawl websites and gather the data it needs to rank sites in Google search results. One can find a site’s robots.txt file by adding /robots.txt after the root of the web address.
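As a small illustration of where the file lives, the sketch below derives the robots.txt address from any page URL on the same host. The function name and the example URL are assumptions made for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.example.com/blog/post?id=7"))
# prints https://www.example.com/robots.txt
```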
The Importance of Robots.txt
Many SEOs complain that their websites do not rank after launching or migrating to a new domain despite months of hard work. A common reason for this is an outdated robots.txt file. If an old file that still disallows the whole site is left in place, it will prevent search engines from crawling the website.
Robots.txt also matters for Google’s crawl budget. If a big website wants to keep Googlebot away from its low-quality pages, it can disallow them in the robots.txt file. Doing so would free up the site’s crawl budget, allowing Google to prioritise crawling for higher-quality pages.
There are no hard and fast rules for robots.txt files, but Mueller explained that a huge robots.txt file is harder to maintain and can easily cause problems in the future.
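One way an old file can block crawling is a blanket "Disallow: /" rule, for instance one left over from a development or staging site. That scenario is an assumed illustration rather than something Mueller described. The sketch below uses Python's standard-library urllib.robotparser to flag a file that blocks the homepage for a given crawler.

```python
from urllib.robotparser import RobotFileParser

# Assumed example of a leftover file that disallows the whole site.
LEFTOVER_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def homepage_blocked(robots_txt, user_agent="Googlebot"):
    """Return True if the rules stop user_agent from fetching the homepage."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


print(homepage_blocked(LEFTOVER_ROBOTS_TXT))  # prints True
```

A check like this could run as part of a launch or migration checklist, so a blanket disallow is caught before the new site goes live.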
What to Hide with Robots.txt
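With a robots.txt file, one can stop crawlers from fetching specific categories, directories, or pages. One can use the “disallow” directive to exclude certain URLs. Keep in mind that robots.txt controls crawling rather than indexing: a disallowed URL can still show up in the search results if other pages link to it, which is why noindex is the usual way to keep a page out of Google’s index. Below are some types of pages that you may want to keep crawlers away from using a robots.txt file: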
- Pages with duplicate content
- Admin pages
- Dynamic product and service pages
- Pagination pages
- Shopping cart
- Account pages
- Thank you pages
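As a rough sketch of what such rules could look like, the snippet below builds a small robots.txt from a few of the page types above. The path prefixes are hypothetical; every site uses its own URL structure, so they would need to be replaced with real ones.

```python
# Hypothetical path prefixes for some of the page types above; real sites differ.
PATHS_TO_DISALLOW = {
    "Admin pages": "/admin/",
    "Shopping cart": "/cart/",
    "Account pages": "/account/",
    "Thank you pages": "/thank-you/",
}


def build_robots_txt(paths, user_agent="*"):
    """Build robots.txt text with one commented Disallow rule per entry."""
    lines = [f"User-agent: {user_agent}"]
    for label, prefix in paths.items():
        lines.append(f"# {label}")
        lines.append(f"Disallow: {prefix}")
    return "\n".join(lines) + "\n"


print(build_robots_txt(PATHS_TO_DISALLOW))
```

Keeping a short comment above each rule, as here, is one way to make a growing file easier to maintain.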
Optimise Your Web Pages with Position1SEO
Position1SEO is a UK-based SEO agency that aims to provide all the solutions to your SEO-related problems. We have an experienced team that provides high-quality SEO marketing services, website SEO optimisation, and more.