
How to keep robots away from your site

THE ROBOTS.TXT FILE

You probably know that search engines are designed to help people find information quickly on the net, and that they gather much of that information through programs (also called spiders or robots) that crawl web pages for them.

These spiders or robots roam the net looking for and collecting all kinds of information. They usually start from URLs submitted by users, from links they find on other sites, from sitemap files, or from the top level of a site.

Once a robot accesses the home page, it recursively accesses every page linked from that page. A robot may also look at all the pages it can find on a particular server.

After the program finds a web page, it indexes the title, the keywords, the text, and so on. But sometimes you may wish to keep search engines from indexing some of your pages, such as news postings or specially marked pages (for example, affiliate pages). Whether individual programs conform to these conventions, however, is entirely voluntary.

THE ROBOTS EXCLUSION PROTOCOL

So if you want robots to stay away from some of your pages, you can ask them to ignore the pages you don't want indexed. To do that, you place a robots.txt file at the root of your web server.

For example, if you have a directory named e-books and you want to ask robots to keep out of it, your robots.txt file should read:

User-agent: *
Disallow: /e-books/
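A well-behaved crawler checks these rules before fetching each URL. You can simulate that behavior with Python's standard `urllib.robotparser` module; a minimal sketch (the example.com URLs are hypothetical):

```python
from urllib.robotparser import RobotFileParser

# Load the rules above directly, instead of fetching a live robots.txt
parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /e-books/",
])

# A compliant robot asks before fetching each URL
print(parser.can_fetch("*", "http://example.com/e-books/list.html"))  # False
print(parser.can_fetch("*", "http://example.com/index.html"))         # True
```

In real use you would call `parser.set_url("http://example.com/robots.txt")` followed by `parser.read()` to fetch the live file; remember, though, that this check is purely voluntary on the robot's part.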

If you don't have enough control over your server to create a robots.txt file, you can instead add a META tag to the head section of any HTML file.

For example, a tag like the following tells robots not to index a particular page and not to follow its links:

<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">

Support for this META tag among spiders is not as consistent as support for the Robots Exclusion Protocol, but it is now supported by most major web indexes.
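An indexer that honors the tag scans the page's head for it and reads out the directives. The sketch below shows roughly how that works, using Python's standard `html.parser` (the sample page is hypothetical):

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag on a page."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # The name attribute is matched case-insensitively
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            content = attrs.get("content", "")
            self.directives += [d.strip().upper() for d in content.split(",")]

page = '<html><head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW"></head></html>'
parser = RobotsMetaParser()
parser.feed(page)
print(parser.directives)  # ['NOINDEX', 'NOFOLLOW']
```

A robot would then skip indexing the page if `"NOINDEX"` appears in the list, and skip its links if `"NOFOLLOW"` does; again, nothing forces a robot to perform this check.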

NEWS POSTINGS

If you want to keep the search engines out of your news postings, you can add an 'X-no-archive' line to the postings' headers:

X-no-archive: yes

But even though some common news clients let you add an X-no-archive line to the headers of your news postings, many of them don't.

The thing is that most search engines assume all the data they find is public unless it is marked otherwise.

So be careful: although the robot and archive exclusion conventions can help keep your material out of the major search engines, there are others that respect no such rules.

If you're especially concerned about the privacy of your email and Usenet postings, you can use anonymous remailers and PGP. You can find out more about them here:

http://www.well.com/user/abacard/remail.html

http://www.io.com/~combs/htmls/crypto.html

http://world.std.com/~franl/pgp/

Even if you're not especially concerned about privacy, remember that anything you write is likely to be indexed and archived somewhere for eternity, so write only what you're willing to have on the record.

Written by Dr. Roberto A. Bonomi.
