[iBook] Re: OFF TOPIC: web site question

Joel Esler eslerj at gmail.com
Mon Aug 7 04:48:02 PDT 2006


Okay, I've allowed the spider indexing discussion to go on long enough.

When a website is indexed by a search engine, two things are done.

First: A "GET / HTTP/1.0" command is run to make sure the website is up.
Second: A check for the file "/robots.txt" is done to check permissions.
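
To make those two requests a bit more concrete, here is a minimal
Python sketch that only builds the raw request text a spider might
send; it does not open a connection.  The build_request helper and the
www.example.com host are made up for illustration, and real spiders
send more headers (User-Agent and so on).

```python
def build_request(path, host="www.example.com"):
    """Return the raw bytes of a simple HTTP/1.0 GET request.

    Hypothetical helper: the name and the default host are assumptions.
    """
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",  # optional in HTTP/1.0, but commonly sent anyway
        "",               # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


# First request: is the site up?
up_check = build_request("/")
# Second request: what is the spider allowed to read?
robots_check = build_request("/robots.txt")

print(up_check.decode("ascii"))
print(robots_check.decode("ascii"))
```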

The robots.txt file tells each search engine what it is allowed to
index on your site.  Following the rules in robots.txt, the spider
then reads your webpage.  Any link on the webpage (except for ads,
banners, and stuff like that) is followed.  The same goes for a link
on someone ELSE's site that points to your site.
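
As a rough sketch of that permission check, Python's standard
urllib.robotparser module can read a robots.txt and answer "may I
fetch this URL?".  The robots.txt contents, the "ExampleBot" name, and
the example.com URLs below are all made up for illustration; this is
not how any particular search engine does it.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt: every spider may read everything except /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# parse() takes the file as a list of lines, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="ExampleBot"):
    """Return True if the (hypothetical) spider may fetch this URL."""
    return parser.can_fetch(agent, url)


print(allowed("http://www.example.com/index.html"))
print(allowed("http://www.example.com/private/notes.html"))
```

A spider that plays by the rules would run a check like this before
reading each page and skip anything that comes back False.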

Once a link is followed, the same process happens over and over
again.  Say a webpage has 10 links, and each of the pages behind those
10 links has 10 or 50 links of its own.  (Think about how many links
your site has, both internally to other pages on your own website and
even the smallest little link to someone else's webpage.)  Then those
webpages have 10 more, and so on, and so on.  Now you see why search
engines are such a baffling technology.  It's hard to index all those
BILLIONS and BILLIONS of webpages.  Now...
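
The follow-every-link loop can be sketched in a few lines of Python.
This is only a toy under stated assumptions: the PAGES dictionary is a
made-up four-page "web" standing in for real HTTP fetches, and the
crawl and pages_at_depth names are mine.  It does show the two ideas
above, though: each page's links go into a queue to be read later, and
the number of pages grows very fast with each hop.

```python
from collections import deque
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Made-up pages: URL -> HTML.  A real spider would fetch these over HTTP.
PAGES = {
    "/": '<a href="/a">A</a> <a href="/b">B</a>',
    "/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "/b": '<a href="/">home</a>',
    "/c": "no links here",
}


def crawl(start):
    """Visit pages breadth-first, following each link only once."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        collector = LinkCollector()
        collector.feed(PAGES.get(url, ""))
        for link in collector.links:
            if link not in seen:  # skip pages already queued or visited
                seen.add(link)
                queue.append(link)
    return order


def pages_at_depth(links_per_page, depth):
    """Upper bound on pages reached after `depth` hops, ignoring repeats."""
    return links_per_page ** depth


print(crawl("/"))
print([pages_at_depth(10, d) for d in range(1, 5)])
```

With 10 links per page, four hops already means up to 10,000 pages,
which is why the "seen" set matters: without it the spider would keep
going around in circles.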

If you want your website to be indexed by Google, you can submit it to
Google, and Google will throw it into the spider crawl queue and index
it when it gets around to it (usually in a day or so).  Once you are
indexed, you are searchable (assuming that your robots.txt is set
correctly).

Google Cache is a whole OTHER story that I won't get into right now,
but you can imagine.

Searching is hard work.

J

On 8/7/06, Malcolm Cornelius <malcolm at fireflyuk.net> wrote:
> > With my particular hosting provider, I do have to pay extra and I can
> > guarantee Google will not find it anyway.
>
> Why ?
>
> > I think it depends on your hosting provider and how they have chosen to
> > install the Google component on their servers.
>
> What "google component" is this ?
>
> --
> Best wishes
>
> Malcolm Cornelius - The Powerbook Fanatic
> http://www.pbfanatic.co.uk


-- 
--Joel

