Bot-trap

There is no algorithm to detect all spider traps. Some classes of traps can be detected automatically, but new, unrecognized traps arise quickly.
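As an illustration of the kind of trap that can be detected automatically, here is a minimal sketch of a URL-shape heuristic. It flags URLs that are very long, very deeply nested, or that repeat the same path segment many times. The function name `looks_like_trap` and all three thresholds are assumptions chosen for illustration, not values from any particular crawler, and a heuristic like this will miss traps that do not show up in the URL.

```python
from collections import Counter
from urllib.parse import urlsplit

# Illustrative thresholds (assumptions, tune for a real crawl).
MAX_URL_LENGTH = 2000
MAX_PATH_DEPTH = 12
MAX_SEGMENT_REPEATS = 3


def looks_like_trap(url: str) -> bool:
    """Return True if the URL's shape suggests an endlessly growing path."""
    if len(url) > MAX_URL_LENGTH:
        return True

    # Split the path into its non-empty segments, e.g. /a/b/c -> [a, b, c].
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) > MAX_PATH_DEPTH:
        return True

    # The same directory name recurring many times is a common loop signature.
    counts = Counter(segments)
    return any(n > MAX_SEGMENT_REPEATS for n in counts.values())
```

A crawler could call this before queueing each discovered link and skip the ones it flags.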

A spider trap causes a web crawler to enter something like an infinite loop, which wastes the spider's resources, lowers its productivity, and, in the case of a poorly written crawler, can crash the program. Polite spiders alternate requests between different hosts and do not request documents from the same server more than once every several seconds, so a "polite" web crawler is affected far less than an "impolite" one.
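The per-server delay described above can be sketched as a small bookkeeping class. This is only an outline under stated assumptions: the class name `PolitenessGate`, its methods, and the 5-second default are hypothetical, and the current time is passed in explicitly so the logic is easy to follow and check.

```python
from urllib.parse import urlsplit


class PolitenessGate:
    """Tracks the last request time per host (hypothetical helper)."""

    def __init__(self, min_delay_seconds: float = 5.0):
        # Assumed delay; the text only says "several seconds".
        self.min_delay = min_delay_seconds
        self._last_request = {}  # host -> timestamp of the last request

    def wait_time(self, url: str, now: float) -> float:
        """Seconds the crawler should still wait before requesting this URL."""
        host = urlsplit(url).netloc
        last = self._last_request.get(host)
        if last is None:
            return 0.0  # host not contacted yet
        return max(0.0, self.min_delay - (now - last))

    def record(self, url: str, now: float) -> None:
        """Remember that a request to this URL's host was made at `now`."""
        self._last_request[urlsplit(url).netloc] = now
```

While one host is cooling down, the crawler can work on URLs from other hosts, which is why a trap on a single server consumes only a small share of a polite crawler's time.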

In addition, sites with spider traps usually have a robots.txt telling bots not to go to the trap, so a legitimate "polite" bot would not fall into the trap, whereas an "impolite" bot which disregards the robots.txt settings would be affected by the trap.
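The robots.txt check can be sketched with Python's standard `urllib.robotparser` module. The robots.txt content, the `/trap/` path, the `ExampleBot` user agent, and the helper name `build_parser` are all made up for illustration; a real crawler would fetch the file from the site instead of using an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all bots out of the trap directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A polite bot asks before fetching and skips disallowed URLs.
trap_allowed = parser.can_fetch("ExampleBot", "http://example.com/trap/page1")
index_allowed = parser.can_fetch("ExampleBot", "http://example.com/index.html")
```

Here `trap_allowed` is `False` and `index_allowed` is `True`, so a bot that honors the result never enters the trap, while a bot that skips the check does.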

