# Turn off SlySearch... used to find plagiarism by students... nothing here!
# Turn off LinkWalker... used to check links...
# PsBot... Image indexer... who cares
# Yahoo-MMCrawler... Image indexer...
# SBIder... categorizes sites...
# Intelliseek: Enterprise / Market Intelligence... go away
# Grub: a real nuisance
# Twiceler... experimental, it says...
# Surveybot... by Domain Tools... nuisance
# panscient.com
# Pingdom GIGRIB... looks suspicious
# NaverBot... Korean robot
# nicebot... no info... suspicious
# Exabot... controversial... leave for now (shows thumbnail of page)
# YodaoBot... Japanese or Chinese
# boitho.com-dc... another idea of the century, Norwegian (shows thumbnail of page)
# ia_archiver... Alexa and Alexa toolbar (Web archiver... wayback machine)
# lwp-trivial... looks strange
# whiteiexpres... search page full of invalid links
#
User-agent: SlySearch
User-agent: LinkWalker
User-agent: JetBot
User-agent: psbot
User-agent: Intelliseek
User-agent: grub-client
User-agent: grub
User-agent: T-H-U-N-D-E-R-S-T-O-N-E
User-agent: Fasterfox
User-agent: SBIder
User-agent: Yahoo-MMCrawler
User-agent: Twiceler
User-agent: SurveyBot
User-agent: panscient.com
User-agent: Pingdom GIGRIB
User-agent: NaverBot
User-agent: nicebot
User-agent: Sogou Web Spider
User-agent: Sogou Spider
User-agent: Exabot
User-agent: boitho.com-dc
User-agent: YodaoBot
User-agent: lwp-trivial
User-agent: whiteiexpres
Disallow: /

# no useful info in cgi-bin and images directories
# try to detect SpamBots or other badly behaved robots
# Crawl-Delay is in seconds
User-agent: *
Crawl-Delay: 30
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /php/