
When Will Google Spider My Site?

By Garnet R. Chaney
Clients and friends who have websites often ask me, "So how long will it take Google to index my site?" After hearing how my sites get hundreds of thousands of search engine hits and, later, lots of traffic from the search engines, they wonder how they can benefit from this free traffic too.
I saw a question on a forum where someone wondered why Google had indexed less than 1/3 of 1% of his site. In another thread, he described Google's merry and unpredictable path around the net as "one of the many ways Google can be gay" (which was quickly flamed by the diversity police). I gave Blayne some clues from my experience:

Hi!
It might be too early to worry. Google is often the last one to completely spider my sites. Teoma, AltaVista, Inktomi, and others all get to my sites much faster, and gobble up much more of them in their first visits.

Google seems to spider the home page first; a couple of weeks later it gets the pages linked from the home page. A couple of weeks after that it gets the pages linked from that second set, and so on. Eventually, once Google has travelled through a lot of your pages, it will start returning to recheck the old pages, and you'll have a regular, steady stream of Google visits.

- Garnet

http://www.webfind.us


Recently, I helped one of my clients reorganize his site into a 150-page information portal. Within a week he saw an improvement in his ranking and findability in the Yahoo search engine. It has taken a little longer for Google to visit his site, but over the last two months his web traffic has doubled each month. I taught him a number of techniques for improving the quality of the content he adds to his site, and I am sure he'll continue to enjoy more and more web traffic.




==============================================================================


Frequently Asked Questions (FAQ)

What are spiders and how do they work?
How do you actually set up and maintain your spider list?
How do you reference search engine spiders — by UserAgent or by IP?
Why are some spiders commented out, e.g. Googlebots?
Instead of commenting spiders out, why not simply leave them out altogether?
Why are some spiders labelled SE, while some have no label?
How can I prevent Google from caching my pages?
If I leave the comments in your spider list, will the robots still read only what is not commented?
Are you using unresolved or resolved IPs to determine search engine spiders? And if both, which version should I use?
Does your fantomas spiderSpy™ software program work on NT as well?
What are “environmental variables” or “footprints”, and how do they relate to search engine spiders?
What is IP Delivery and where does your service come in?
I am interested in buying one year's access to your spider IP list. However, I see there are several engines you cover, many I've never heard of.
Can you direct me on how I can update the botBase for the fantomas shadowSniper™ software in my fantomas Webmaster Suite™?
How do I update my IP-file without having a lot of duplicate IP numbers when I add my own?
What are spiders and how do they work?
Warning: this is a long one! :-)
A: A “spider” or “crawler” is a program (usually automated) employed by — commonly — a search engine to grab data on websites for later indexing. There are different types of spiders and, hence, various forms of spider behavior.

E.g. there are submission checkup spiders activated when you submit a site to a search engine: they will do a simple check if the submitted URL is valid, if the server is available/accessible, if a redirect command is effected, etc.

If the URL passes this test, it will typically be stored in a task queue for later crawling.

When the time comes (and this may literally take weeks) for your site to be fully spidered or crawled (the terms are used synonymously here), the job is done by another spider program. It may either fetch only a single page's content for later indexing (“flat crawl”) or follow internal and even external links (i.e. hyperlinks leading to other sites) up to a predefined level (“deep crawl”), typically 3-5 levels deep.

The data thus collected will then be fed into the search engine's database, where it will be indexed (again, this process may take several weeks to actually happen). After indexing, your site will be available for searches.
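
The difference between a flat crawl and a deep crawl can be sketched in a few lines of Python. This is only an illustration, not any search engine's actual crawler: the `LINKS` table is a made-up, in-memory stand-in for real pages, and the function name `crawl` and its `max_depth` parameter are assumptions introduced here.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
LINKS = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/a1"],
    "http://example.com/b": [],
    "http://example.com/a1": ["http://example.com/a2"],
    "http://example.com/a2": [],
}


def crawl(start_url, max_depth, links=LINKS):
    """Return URLs in the order a breadth-first crawler would fetch them.

    max_depth=0 fetches only the start page (a "flat crawl");
    larger values follow links that many levels down (a "deep crawl").
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    fetched = []
    while queue:
        url, depth = queue.popleft()
        fetched.append(url)
        if depth == max_depth:
            continue  # depth limit reached: do not follow this page's links
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return fetched
```

Calling `crawl("http://example.com/", 0)` fetches the start page only, while a `max_depth` of 3 to 5 would correspond to the deep crawl described above.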


How do you actually set up and maintain your spider list?
A: Establishing such a list is no mean task: while many, many spider lists are freely available on the net, our (commercial) fantomas spiderSpy™ service goes a step further in systematizing any spider whose data we can get hold of.

To give you an impression of what we are up against: we have verified and stored more than 900 unique Inktomi spiders alone; AltaVista is represented with more than 1,100 spiders, etc. We also cover international search engine spiders, e.g. from Germany, Japan, etc.

Nor does this ever stop: major search engines tend to implement new spiders almost on a weekly basis; spider names may be changed, IPs reassigned, etc.

Then there are new search engines entering the market all the time. This is what our fantomas spiderScouts Department is all about: it does just what it says, scouting for search engine spiders across the whole net. To do this, we have set up a string of “spider traps”: these may range from proprietary software to dedicated domains whose only function is to invite spidering by regular (daily) page submissions. Then there's our own log file evaluation, third party sources, etc.

You can read more about it at: http://fantomaster.com/fasvsspy01.html

There, you will also find a comprehensive list of search engines currently covered.


How do you reference search engine spiders — by UserAgent or by IP?
A: Strictly by IP, as this is the only really safe approach. The UserAgent variable can easily be forged; some browsers (such as Opera) actually offer you a choice of UserAgent variables to submit to any web site you visit. Thus, a snooping competitor might spoof a search engine spider's UserAgent variable to detect whether you are using cloaked pages, a scenario you should avoid at all costs!
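
As a rough sketch of what "strictly by IP" means in practice, the Python function below classifies a visitor from the remote address alone and deliberately ignores whatever UserAgent was sent. The networks in `SPIDER_NETWORKS` are reserved documentation addresses used as placeholders, and the name `is_known_spider` is an assumption made for this example, not part of any fantomas product.

```python
import ipaddress

# Hypothetical entries; a real list would come from your spider database.
SPIDER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_known_spider(remote_ip, user_agent=""):
    """Classify a visitor by IP only.

    user_agent is accepted but never consulted, because it can be forged.
    """
    try:
        addr = ipaddress.ip_address(remote_ip)
    except ValueError:
        return False  # malformed address: treat as an ordinary visitor
    return any(addr in net for net in SPIDER_NETWORKS)
```

With this approach, a visitor who merely claims to be a spider in the UserAgent string but arrives from an unlisted address is still treated as a human visitor.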


Why are some spiders commented out, e.g. Googlebots?
A: These are typically classified as "Decloaking Hazards". E.g. the Googlebot spiders will cache your pages unless excluded from doing so by implementing a proprietary meta tag.

This is hazardous to cloaked setups because the spider, unless it is commented out and hence treated as a human visitor, will store the cloaked content, which will then be displayed to searchers from the cache. This means that any competitor could catch you out cloaking, which is not what you would typically want.


Instead of commenting spiders out, why not simply leave them out altogether?
A: Some clients prefer to edit the spider list manually. Indeed, our fantomas shadowSniper™ program specifically offers this option. However, the spider list keeps expanding all the time and it's well nigh impossible for a single user to keep up with the plethora of search engine spiders haunting the Web.

We include “Decloaking Hazards” in the list in commented-out form to show these users which spiders they should not include by mistake or for sheer lack of information.

In a later version, we may split the list in two, placing these commented-out spiders in a separate exclusion list file.


Why are some spiders labelled SE, while some have no label?
A: This is a trimming measure to reduce server load when in action as a cloaking engine list. In the ASCII text version of the botBase, the search engine is generally only listed once (akin to a header or category title), followed by its spiders.

Use the CSV version if you want full reference for each spider.


How can I prevent Google from caching my pages?
A: There are two ways:

Ask them to stop caching your pages: they will comply if only because they would run the risk of a copyright violation suit. It will probably take a few weeks till the cached pages disappear, but disappear they will. (Been there, done that.)
Alternatively, use their do-it-yourself solution: simply include the following in your meta tags section: <META NAME="GOOGLEBOT" CONTENT="NOARCHIVE"> [ Source: http://www.google.com/faq.html ]
In either case, once you have requested Google not to cache your pages, you can safely uncomment their spiders in our botBase's engine list.


If I leave the comments in your spider list, will the robots still read only what is not commented?
A: The engine list file in its ASCII (text) version is primarily targeted toward the fantomas shadowSniper™ keyword switch script only — it determines the manner in which each individual spider will be treated when accessing your site.

Other than that, the commented text serves no further technical purpose. So you can leave the comments in the file or simply delete them, it makes no difference, EXCEPT (we really can't stress this point too much!): if you uncomment a spider IP, this spider will, of course, be fed the cloaked page when accessing your site. This isn't always desirable, as for instance in the case of AltaVista's Babelfish (automatic translation) spider, but of course it is entirely up to you to modify the list in accord with your own specific requirements.
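
To illustrate how commented and uncommented entries differ, here is a minimal Python sketch of a list reader. The real engine list format is not documented here, so everything about the layout is an assumption: the `#` comment marker, one entry per line, and the `SAMPLE` contents are all hypothetical, as is the function name `active_entries`.

```python
def active_entries(text, comment_marker="#"):
    """Return the non-empty lines that are not commented out.

    Assumes one entry per line and a leading comment marker; both are
    guesses for illustration, not the documented engine list format.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(comment_marker):
            continue  # blank or commented: never treated as a spider
        entries.append(line)
    return entries


# Hypothetical sample: one header comment, one commented-out IP, two live IPs.
SAMPLE = """\
# ExampleEngine
192.0.2.10
#192.0.2.11
198.51.100.5
"""
```

Under these assumptions, deleting the comment lines from `SAMPLE` changes nothing, whereas removing the `#` in front of `192.0.2.11` would add that address to the set of visitors served the cloaked page.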


Are you using unresolved or resolved IPs to determine search engine spiders? And if both, which version should I use?
A: Depending on your server's specific configuration, spiders (like all site visitors, for that matter) will be identified

by unresolved IP (e.g. "111.222.333.255"),
by resolved IP, i.e. host name (e.g. "spiderdomain.com"),
or by either of the two.
Obviously, the first and the last option are preferable because some spiders' IPs aren't resolvable.

Depending on this configuration, you might even do without the resolved entries altogether, though we advise against it because the overall drain on CPU resources is really minimal.

Plus, the more you modify your spider list file, the more administrative overhead you will incur when updating it from our site, which, as you know, is updated every six hours.


Does your fantomas spiderSpy™ software program work on NT as well?
A: It doesn't have to: you won't be buying the software but, rather, access to the database, which is updated four times a day.


What are “environmental variables” or “footprints”, and how do they relate to search engine spiders?
A: Every program — be it spider or web browser — accessing a web site will leave a “footprint”, i.e. it comes with a set of various environmental variables (i.e. data) which can be read and recorded by the visited system.

These variables include (but are not limited to):

The originating unique IP (“Internet Protocol”) address, e.g. “216.35.116.41”. Many servers (but by no means all of them) will attempt to “resolve” this IP, i.e. translate it into a host name, in this case one in the inktomi.com domain. Sometimes, however, IPs will not resolve gracefully for reasons beyond the scope of this short summary.
The UserAgent, which is basically a more or less freely assignable name tag such as “Slurp/si” or, in our example, “Slurp/si (slurp@inktomi.com; http://www.inktomi.com/slurp.html)”. Your web browser will usually have its own UserAgent, e.g. “Mozilla/4.72 [en] (Win98; I)” for a Netscape browser or “Mozilla/4.0 (compatible; MSIE 5.01; Windows 98; AtHome0107)” for MS Internet Explorer.
The referrer (variable “http_referer”; yes, no typo: it really lacks an “r”), which shows the last page visited (more typical for web browsers than for search engine spiders), e.g. “http://www.google.com/search?q=ip+blocker”. (In this example, the visitor obviously searched for the keyword phrase “ip blocker” at Google.com.) The referrer is important for several reasons, one of them being evaluation of your site's search engine positioning. (By visiting the URL http://www.google.com/search?q=ip+blocker you can establish if and where your site is ranked under this search phrase.)
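
Recovering the search phrase from a referrer like the one above takes only the Python standard library. The sketch below assumes the search engine passes the phrase in a query parameter named `q`, as in the Google example; the function name `search_phrase` is made up for this illustration.

```python
from urllib.parse import parse_qs, urlsplit


def search_phrase(referrer, param="q"):
    """Extract the search phrase from a referrer URL, or None if absent."""
    query = urlsplit(referrer).query
    values = parse_qs(query).get(param)  # parse_qs decodes "+" as a space
    return values[0] if values else None
```

For the example referrer, `search_phrase("http://www.google.com/search?q=ip+blocker")` yields the phrase `ip blocker`. Other search engines may use a different parameter name, which is why `param` is adjustable.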

What is IP Delivery and where does your service come in?
A: For various reasons many webmasters opt for a technique variously termed “IP Delivery”, “Cloaking”, “Stealthing”, “Food”, “Ghosting” or “Phantomizing”.

This will typically work by feeding a search engine spider with a different page or page content than a human visitor will receive.

This may be done to protect a page's code from thieving competitors, to offer spiders (which are normally text oriented) indexable content for pages that consist of mere graphics (“splash pages”), or generally to improve a site's search engine ranking by feeding the spiders content optimized for search engine indices (but not very readable to the human eye), etc.

Obviously, if a webmaster is cloaking pages, his or her system must be able to recognize a spider for what it is and distinguish it reliably from human visitors to serve pertinent content.

Enter our spider list as quoted: this gives relevant data required for cloaking setups: the UserAgent (#UA), the IP, the resolved domain name, and the search engine these belong to.


I am interested in buying one year's access to your spider IP list. However, I see there are several engines you cover, many I've never heard of.
A: You normally wouldn't: we cater to an international clientele, hence we're also covering German, French, Spanish search engines, and more.


Can you direct me on how I can update the botBase for the fantomas shadowSniper™ software in my fantomas Webmaster Suite™?
A: We have recently implemented our proprietary fantomas spyFetcher™ script to automate the process. This one is available as a Perl/CGI program for the Unix platform and as an ASP script for Windows systems. You can download it from our subscribers section.

Tip: While the fantomas shadowSniper™ script references the engine list as “register.txt” by default, you may change this variable to “spiderspy.txt” — this will save you having to rename it.


How do I update my IP-file without having a lot of duplicate IP numbers when I add my own?
A: Sorry, we don't have software to support that. You could write your own script to delete duplicates, or export the file to a database first, sort it and trash the dupes, whatever.
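
A do-it-yourself script of the kind suggested above could look like the following Python sketch. It assumes the IP file holds one plain IP address per line, which may not match your actual file layout, and the function name `merge_ip_lists` is purely illustrative.

```python
def merge_ip_lists(current, additions):
    """Merge two lists of IP strings, dropping duplicates and blank lines.

    The first occurrence of each IP wins, so the original order is kept.
    """
    seen = set()
    merged = []
    for entry in list(current) + list(additions):
        ip = entry.strip()
        if ip and ip not in seen:
            seen.add(ip)
            merged.append(ip)
    return merged
```

Reading both files with `open(...).read().splitlines()` and passing the two lists to `merge_ip_lists` would give a combined list without duplicates. Keeping your own additions in a separate file makes it easier to re-merge after each update of the downloaded list.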


Article Source: http://fantomaster.com/fafaqsspy01.html