What is Robots.txt?
The robots.txt is a plain-text file located at the root of a website (e.g., www.example.com/robots.txt) that instructs search engine robots which pages to crawl and which to skip. It is used to keep crawlers away from sensitive or low-value content and to improve SEO efficiency. If you're wondering, 'But doesn't my sitemap.xml already tell web crawlers about my site?', you're on the right track.
What is the relationship between robots.txt and sitemap.xml?
The robots.txt and sitemap.xml are complementary files that perform different functions. The robots.txt is the general guide: it tells crawlers which areas of the site to crawl and which to skip, and it can point to the sitemap. The sitemap.xml is the specific guide: it lists the pages of the site and shows their structure and how they relate to each other.
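As a minimal sketch, a robots.txt that welcomes every crawler and hands it the sitemap could look like this (the domain and sitemap URL are placeholders):
# Allow all bots and point them to the sitemap
User-agent: *
Allow: /
Sitemap: https://www.example.com/sitemap.xml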
Why is it important for SEO?
The robots.txt file is important for SEO because it lets website administrators control which pages or folders web crawlers can access. This helps optimize the crawl budget and keeps crawlers away from duplicate content.
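For example, a site with printer-friendly duplicates of its articles (the /print/ path here is a hypothetical example) could keep crawlers focused on the canonical versions like this:
# Keep crawlers out of duplicate printer-friendly pages
User-agent: *
Disallow: /print/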
How do I create and submit a robots.txt?
To create a robots.txt file, you will need the directives User-Agent, Allow, Disallow, and Sitemap. If you use WordPress, the Yoast SEO plugin can generate the file automatically. Some other CMSs, such as Shopify, also create it automatically. Otherwise, you can create it manually in the following way:
Open a text editor like Notepad.
Write the directives, as shown in the example below.
Save the file as robots.txt and upload it to the root folder of your site.
You do not need to submit the file anywhere: Google's crawlers look for it at the root of your site and read it automatically.
Example of robots.txt
# Example: allow all bots to crawl the site, except the /members/ folder
User-agent: *
Disallow: /members/
Sitemap: https://www.example.com/sitemap.xml
Best Practices and Tips
Below are five tips and best practices for creating a robots.txt file.
Use the Disallow directive very carefully: block only what adds no SEO value or contains sensitive information (e.g., /cgi-bin/, /wp-admin/, /cart/, /scripts/, /plugins/).
Do not use the command Disallow: / on its own, as it blocks your entire site (see the example after this list).
Use Disallow on directories with duplicate content.
Always include the Sitemap with the full URL in the robots.txt file.
If you simply don't want a page in search results, use a noindex meta tag instead of excluding it here; a page blocked in robots.txt can still end up indexed if other sites link to it.
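As a sketch of the first two tips above (the folder names are hypothetical):
# Dangerous: this pair of lines blocks the entire site from all bots
User-agent: *
Disallow: /

# Safer: block only specific low-value directories
User-agent: *
Disallow: /wp-admin/
Disallow: /cart/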
Frequently Asked Questions (FAQ)
How do I use the User-Agent in a robots.txt?
The User-Agent in a robots.txt file is a directive that identifies which web crawler or bot the rules that follow it apply to.
You can target all bots at once with the asterisk (*). Alternatively, you can address each bot individually by name, as shown below.
Googlebot
Bingbot
Slurp Bot
DuckDuckBot
YandexBot
Facebot
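For instance, a sketch that gives Googlebot its own rule while every other bot follows the general rule (the folder names are placeholders):
# Rules for Googlebot only
User-agent: Googlebot
Disallow: /testing/

# Rules for all other bots
User-agent: *
Disallow: /members/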
How do I use Allow / Disallow in a robots.txt?
The Allow and Disallow directives are used in a robots.txt file to define which specific pages or folders of a website crawlers may or may not crawl.
# Example of blocking Bingbot and Ahrefsbot
User-agent: Bingbot
Disallow: /
User-agent: Ahrefsbot
Disallow: /
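Allow is useful for opening an exception inside a blocked folder. A common sketch on WordPress sites (assuming the default /wp-admin/ layout) looks like this:
# Block the admin area but keep the AJAX endpoint crawlable
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php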
How do I verify if the robots.txt is correct?
There are several ways to check that a robots.txt file is valid and free of errors. The most familiar is the robots.txt report in Google Search Console, which shows the version of the file Google fetched and flags any errors it found. You can also open www.example.com/robots.txt (with your own domain) in a browser to confirm the file is live and readable.
What is crawl budget?
The crawl budget is the number of pages a bot can crawl on a website within a specific timeframe.
How do I use robots.txt to optimize the Crawl Budget?
The crawl budget is one of the main technical SEO factors in 2024, and through the robots.txt you can block specific folders and URLs (only if you know what you are doing) so that the budget is spent on the pages that matter. On sites with a very large number of pages, block sections that bring very low or no traffic.
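As a sketch for a large site (the paths and query parameters are hypothetical), internal search results and filtered URL variations are typical crawl-budget drains; Google's crawler supports the * wildcard in paths:
# Save crawl budget: block internal search results and filter parameters
User-agent: *
Disallow: /search/
Disallow: /*?sort=
Disallow: /*?sessionid=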