A robots.txt file instructs search engine robots which pages or files a crawler can and cannot request from your site. It is primarily used to specify which parts of your website should, and should not, be crawled by spiders or crawlers.
In other words, a robots.txt file tells search engines where they can and can't go on your site.
Mainly, it lists the content you want to keep out of search engines like Google. You can also tell certain search engines (though not Google) how they should crawl the content they are allowed to access.
Here is what a robots.txt file looks like. You can check the robots.txt file of any site by appending /robots.txt to its domain (e.g., https://www.xyz.com/robots.txt).
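As a minimal illustration (the domain and paths here are hypothetical), a robots.txt file is just a plain-text list of directives:

```
# Rules for all crawlers
User-agent: *
# Block everything under /private/
Disallow: /private/

# Optionally point crawlers at your sitemap
Sitemap: https://www.xyz.com/sitemap.xml
```

Each `User-agent` line starts a group of rules, and the `Disallow` lines beneath it list the paths that group of bots should not crawl.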
One of the best uses of the robots.txt file is to optimize your crawl budget by telling search engines not to crawl the parts of your site that aren't shown to the public. For example, if you visit the robots.txt file of almost any WordPress site, you'll see that it disallows the login area (/wp-admin/).
Since that page is only used for logging in to the backend, it wouldn't make sense for search engine bots to waste their time crawling it.
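For reference, a typical WordPress robots.txt looks something like this (recent WordPress versions also explicitly allow admin-ajax.php so front-end features that rely on it keep working):

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
```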
You can use a similar directive to stop bots from crawling specific pages. After `Disallow:`, enter the part of the URL that comes after your domain, enclosed between two forward slashes.
So if you want to tell a bot not to crawl the page http://xyz.com/page/, you can add this:
Disallow: /page/
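Note that a `Disallow` rule only takes effect inside a `User-agent` group, so a complete (hypothetical) file blocking that page for all bots would be:

```
User-agent: *
Disallow: /page/
```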
You may be wondering exactly what types of pages to exclude from crawling. Here are some common scenarios:
Duplicate content. While duplicate content is generally a bad thing, there are a few cases where it's necessary and acceptable. For example, if you have a printable version of a page, you technically have duplicate content. In that case, you can tell bots not to crawl one of the versions (usually the printable one). This is also handy if you're split-testing pages that have the same content but different designs.
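For instance, if your printable versions all lived under a hypothetical /print/ path, you could block just those copies while leaving the originals crawlable:

```
User-agent: *
Disallow: /print/
```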
Thank-you pages. The thank-you page is one of a marketer's favorite pages because it signifies a new lead.
It turns out, though, that some thank-you pages are accessible through Google. That means people can reach these pages without going through the lead capture process, and that's bad news.
By blocking your thank-you pages, you can ensure that only qualified leads see them. So let's say your thank-you page is at https://xyz.com/thank-you/. In your robots.txt file, blocking this page would look like this:
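Based on the example URL above, the rule (applied here to all bots) would be:

```
User-agent: *
Disallow: /thank-you/
```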
Setting up your robots.txt file properly isn't just about improving your own SEO; you're also helping your visitors. If search engine bots can spend their crawl budgets wisely, they'll organize and display your content in the SERPs in the best way possible, which means you'll be more visible. Setting up your robots.txt file doesn't take much effort, either. It's mostly a one-time setup, and you can make small changes as needed.