Author Topic: Robots.txt help for my site

Offline Yukito

  • Undergrad
  • ******
  • Posts: 596
  • Merits 130
    • Yukito's Corner
Robots.txt help for my site
« on: July 10, 2022, 12:21:07 PM »
Hey. I've submitted my website for indexing (it's indexable now) and want to know if my robots.txt file is correct. Here is what it currently says:

Code: [Select]
User-agent: *
Allow: /*.js
Allow: /*.css
Allow: /*.jpg
Allow: /*.png
Allow: /*.gif
Allow: /*?page
Allow: /*?ref=
Allow: /*?
Disallow: /stat/
Disallow: /index/1
Disallow: /index/3
Disallow: /register
Disallow: /index/5
Disallow: /index/7
Disallow: /index/8
Disallow: /index/9
Disallow: /index/sub/
Disallow: /panel/
Disallow: /admin/
Disallow: /informer/
Disallow: /secure/
Disallow: /poll/
Disallow: /search/
Disallow: /abnl/
Disallow: /*_escaped_fragment_=
Disallow: /*-*-*-*-987$
Disallow: /shop/order/
Disallow: /shop/printorder/
Disallow: /shop/checkout/
Disallow: /shop/user/
Disallow: /shop/search
Disallow: /*0-*-0-17$
Disallow: /*-0-0-

Sitemap: /sitemap.xml
Sitemap: /sitemap-forum.xml
Thanks.

Offline Vela Nanashi

  • Undergrad
  • ******
  • Posts: 688
  • Merits 434
  • Fantasy > Reality
Re: Robots.txt help for my site
« Reply #1 on: July 10, 2022, 03:54:40 PM »
I am no expert on robots.txt, but I do not think there is any value in allowing CSS and JS files to be indexed, as they do not usually contain anything relevant to the content of the site. Images, and the HTML or PHP files that render the site's content, would be relevant, as would any txt or other documents served on the site.

I don't know if /index/ is a PHP thing that shows a list of topics on some sort of forum. If it is, then it should probably be allowed, if you want the search engines to be able to index and find things there. I have not looked at your site, though, so I do not know how it works.

If you want your entire site to be indexed by the search engine, make sure it has paths to all the content that it is allowed to follow. If you have a shop, maybe you should allow that to be indexed too, so that the products you sell can be found by searching for them on the search engine :) Though you will still want to prevent the bots from putting things in carts or ordering things by bumbling around.

Offline Yukito

  • Undergrad
  • ******
  • Posts: 596
  • Merits 130
    • Yukito's Corner
Re: Robots.txt help for my site
« Reply #2 on: July 10, 2022, 10:23:40 PM »
First, thank you for replying.
Second, the site has lots of JS scripts, especially for logging in (it has paywalled content).
I just examined some of the index links, and some of them link back to the main page. Should I probably leave the register page indexable?
The shop is not installed on the website, so those links wouldn't work.
And, what does "Allow: /*?" mean?
Thanks again.

Offline Army of One

  • Professor
  • Masters Degree
  • *****
  • Posts: 2,747
  • Merits 119
Re: Robots.txt help for my site
« Reply #3 on: July 10, 2022, 11:25:05 PM »
Okay, you need to cut that down a little:

  • Only add in files that provide some kind of important information. Stylesheets and script files don't contain any, so strip them. This also extends to contents pages, or any files that get transcluded into core pages, since they are never intended to stand alone.
  • Be selective with what you want to add. Do you only want a few pages/files in a directory available? Specificity decides between Allow and Disallow: when both match a URL, the more specific (longer) rule decides whether that resource gets crawled.

As an example, with regard to page parameterisation (which is what the "?" thing is about), be specific about which pages use it, and what those parameters are (well, at least the first one). If none of them use parameterisation, dump those Allows.
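
To make the specificity point concrete, here is a rough Python sketch of how a crawler that follows the longest-match convention (the one described in RFC 9309 and in Google's robots.txt documentation) would pick between rules. The function names and the test paths are made up for illustration, and other crawlers may resolve conflicts differently, so treat it as a way to reason about the file rather than as any crawler's actual code.

```python
import re

def pattern_to_regex(pattern):
    """Turn a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the URL.
    Without '$' a pattern behaves as a plain prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))

def is_allowed(rules, path):
    """Longest matching pattern wins; Allow wins a tie; no match means allowed."""
    best_len = -1
    allowed = True
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed

# A few rules taken from the file in the first post.
RULES = [
    ("Allow", "/*?"),
    ("Disallow", "/search/"),
    ("Disallow", "/index/1"),
]

# Hypothetical paths, only for illustration.
for path in ["/stories?page=2", "/search/?q=cats", "/index/15", "/about"]:
    print(path, "->", "allowed" if is_allowed(RULES, path) else "blocked")
```

Under this convention "Allow: /*?" matches any URL that contains a "?", but the longer "Disallow: /search/" still beats it for search URLs. Also note that "Disallow: /index/1" is a prefix, so it would block /index/15 as well.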
« Last Edit: July 10, 2022, 11:31:24 PM by Army of One »
Extinguishing the Flame is available on Amazon and supports Australian Bush fire relief.

Offline Yukito

  • Undergrad
  • ******
  • Posts: 596
  • Merits 130
    • Yukito's Corner
Re: Robots.txt help for my site
« Reply #4 on: July 11, 2022, 12:12:10 AM »
I only want to expose the home page, the site info page and the stories page. There is no need to crawl the control panel or other functions. You can already google "yukito's corner" (with the quotes) to reach it.
It scores 85 on mobile and 95 on PC according to Google PageSpeed.

Offline Army of One

  • Professor
  • Masters Degree
  • *****
  • Posts: 2,747
  • Merits 119
Re: Robots.txt help for my site
« Reply #5 on: July 11, 2022, 05:07:01 AM »
Well then, that's an easy one:
Code: [Select]
User-Agent: *
Allow: <address of your home page>
Allow: <address of your site info page>
Allow: <address of your main stories page>
Disallow: /
Sitemap: <full URL of your sitemap.xml>
Six simple lines, and that'll cover you. That said, check sitemap.xml and make sure those three pages are the only ones listed in it; the two files need to expose the same three pages.
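
If you'd rather not eyeball the sitemap, something like the Python sketch below could do the comparison. It assumes a standard sitemaps.org-style file; the example.com addresses and the function names are placeholders of mine, not your real URLs, so swap in your own.

```python
import xml.etree.ElementTree as ET

# Standard namespace used by sitemaps.org-style sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text):
    """Return the set of <loc> URLs listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }

def compare_with_expected(xml_text, expected):
    """Report pages missing from the sitemap and pages that should not be there."""
    listed = sitemap_urls(xml_text)
    return {"missing": expected - listed, "extra": listed - expected}

# Hypothetical sitemap content and page list, for illustration only.
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/info</loc></url>
  <url><loc>https://example.com/stories</loc></url>
  <url><loc>https://example.com/panel/</loc></url>
</urlset>
""".strip()

EXPECTED = {
    "https://example.com/",
    "https://example.com/info",
    "https://example.com/stories",
}

print(compare_with_expected(SAMPLE_SITEMAP, EXPECTED))
```

Anything that shows up under "extra" is a page the sitemap advertises but robots.txt is meant to hide, and anything under "missing" is one of your three pages that the sitemap forgot.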