isn't quite ashamed enough to present

jr conlin's ink stained banana

:: Bots on Parade

i’m a curious soul.

i was curious what bots were scraping my sites, so i figured i’d do a quick survey.

i’m also super lazy, so i used a simple bash script to get the list of whoever has been pulling my robots.txt file, because no one else would.

so a quick

zgrep robots.txt access.log* | cut -d\" -f 6 | sort | uniq > agents.lst

on soc.jrconlin.com got me:

 
-
AwarioSmartBot/1.0 (+https://awario.com/bots.html; bots@awario.com)                                                       
FediCrawl/1.0                                                                                                             
Googlebot/2.1 (+http://www.google.com/bot.html)                                                                           
IonCrawl (https://www.ionos.de/terms-gtc/faq-crawler-en/)                                                                 
Mastodon server indexer                                                                                                   
Minoru's Fediverse Crawler (+https://nodes.fediverse.party)                                                               
Mozilla/4.0 (compatible; fluid/0.0; +http://www.leak.info/bot.html)                                                       
Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; https://zhanzhang.toutiao.com/)                                                                                               
Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; spider-feedback@bytedance.com)                                                                                                
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:49.0) Gecko/20100101 Firefox/49.0 (FlipboardProxy/1.2; +http://flipboard.com/browserproxy)                                                                                                         
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:98.0) Gecko/20100101 Firefox/98.0                                        
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Safari/605.1.15 (Applebot/0.1; +http://www.apple.com/go/applebot)                                                                           
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36     
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36     
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11    
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Safari/537.36      
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36           
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/116.0                                          
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.0                                            
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36 QIHU 360SE
Mozilla/5.0 (Windows NT 9_1; Win64; x64) AppleWebKit/547.47 (KHTML, like Gecko) Chrome/61.0.1793 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/543.47 (KHTML, like Gecko) Chrome/54.0.2644 Safari/537.36
Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)
Mozilla/5.0 (compatible; AwarioBot/1.0; +https://awario.com/bots.html)
Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)
Mozilla/5.0 (compatible; Barkrowler/0.9; +https://babbar.tech/crawler)
Mozilla/5.0 (compatible; DataForSeoBot/1.0; +https://dataforseo.com/dataforseo-bot)
Mozilla/5.0 (compatible; DotBot/1.2; +https://opensiteexplorer.org/dotbot; help@moz.com)
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Mozilla/5.0 (compatible; Linespider/1.1; +https://lin.ee/4dwXkTH)
Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
Mozilla/5.0 (compatible; Nmap Scripting Engine; http://nmap.org/book/nse.html)
Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)
Mozilla/5.0 (compatible; SeznamBot/4.0; +http://napoveda.seznam.cz/seznambot-intro/)
Mozilla/5.0 (compatible; WellKnownBot/0.1; +https://well-known.dev/about/#bot)
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
Mozilla/5.0 (compatible; Yeti/1.1; +https://naver.me/spd)
Mozilla/5.0 (compatible;PetalBot;+https://webmaster.petalsearch.com/site/petalbot)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36
Poduptime/Development/Testing
Poduptime/Production from https://fediverse.observer
Scrapy/2.7.1 (+https://scrapy.org)
SerendeputyBot/0.8.6 (http://serendeputy.com/about/serendeputy-bot)
caveman-hunter/0.0.0 (+https://fedi.buzz/)
curl/7.54.0
ws-bot-v1

Which is a lot.
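The list above only says who showed up, not how often. A rough sketch of a per-agent tally, assuming the logs are in the usual combined format (so the user agent is the sixth quote-delimited field, same as the one-liner above). The function name `agent_counts` is made up for illustration.

```shell
# Tally robots.txt requests per user agent, busiest first.
# Reads access-log lines on stdin.
# Assumption: combined log format, so the user agent is the sixth
# field when each line is split on double quotes.
agent_counts() {
    grep 'robots\.txt' | cut -d'"' -f 6 | sort | uniq -c | sort -rn
}

# Hypothetical usage (zcat -f passes uncompressed logs through as-is):
#   zcat -f access.log* | agent_counts | head
```

Each output line is a count followed by the agent string, so the noisiest crawlers float to the top.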

i’m also kinda curious about how many bots pretend really hard not to be a bot (looking at you, Applebot). i know Google has (at least) two different flavors of crawler (one fast, the other slow), so no huge surprise there.
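One crude way to poke at the "pretending" question is to filter the agent list for strings that look like a plain browser and carry no bot-ish marker at all. This is only a heuristic sketch: the keyword list is a guess, `undeclared_agents` is a made-up name, and a real person fetching robots.txt in a browser would match too.

```shell
# List agents that look like ordinary browsers: they start with
# "Mozilla/" but contain no bot/crawl/spider keyword, no contact URL,
# and no email address. Reads one agent string per line on stdin.
# Assumption: the keyword list below is a guess, not a bot registry.
undeclared_agents() {
    grep '^Mozilla/' | grep -viE 'bot|crawl|spider|https?://|@'
}

# Hypothetical usage, reading the list built earlier:
#   undeclared_agents < agents.lst
```

Note that this would not flag Applebot, which does name itself; it only catches agents that never admit to being a crawler.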

Now, compare this with my close-to-20-year-old blog:

Buck/2.3.2; (+https://app.hypefactors.com/media-monitoring/about.html)
Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; spider-feedback@bytedance.com)
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Safari/605.1.15 (Applebot/0.1; +http://www.apple.com/go/applebot)
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko; compatible; Yeti/1.1; +https://naver.me/spd) Chrome/113.0.0.0 Safari/537.36
Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)
Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
Mozilla/5.0 (compatible; MojeekBot/0.11; +https://www.mojeek.com/bot.html)
Mozilla/5.0 (compatible; ScooperBot/3.0; +http://www.carma.com)
Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)
Mozilla/5.0 (compatible; SeznamBot/4.0; +http://napoveda.seznam.cz/seznambot-intro/)
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
Mozilla/5.0 (compatible;PetalBot;+https://webmaster.petalsearch.com/site/petalbot)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36
Scrapy/2.6.3 (+https://scrapy.org)
Twitterbot/1.0

That’s a whole lot less. As in 58% less.
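The percentage is just line counts from the two agent lists. A small sketch of that arithmetic, assuming each list has one agent per line; `percent_fewer` and the second file name in the usage comment are hypothetical.

```shell
# Print how much shorter the second list is than the first, as a
# whole-number percentage of the first list's line count.
percent_fewer() {
    a=$(wc -l < "$1")
    b=$(wc -l < "$2")
    awk -v a="$a" -v b="$b" 'BEGIN {
        if (a == 0) exit 1            # avoid dividing by zero
        printf "%.0f%%\n", (a - b) / a * 100
    }'
}

# Hypothetical usage:
#   percent_fewer agents.lst blog-agents.lst
```

For example, a 48-line list against a 20-line list works out to (48 - 20) / 48, which rounds to 58%.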

Also kinda interesting to see the different bots that look for things. Clearly, the Federation attracts more bots.
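To see exactly which bots visit one site and skip the other, `comm` can diff two agent lists. A minimal sketch, assuming one agent per line in each file; `only_in_first` is a made-up helper name, and the lists get re-sorted under `LC_ALL=C` because `comm` expects both inputs in the same sort order.

```shell
# Print the agents that appear in the first list but not the second.
only_in_first() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 1
    # Re-sort both lists bytewise so comm sees a consistent order.
    LC_ALL=C sort -u "$1" > "$tmp_a"
    LC_ALL=C sort -u "$2" > "$tmp_b"
    # -23 drops "only in second" and "in both", leaving "only in first".
    LC_ALL=C comm -23 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}

# Hypothetical usage (second file name is an assumption):
#   only_in_first agents.lst blog-agents.lst
```

Swapping the two arguments gives the bots that only bother with the other site.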

It’s also kinda hilarious to me that some of my domains (like EvilOnAStick.com) get FAR fewer crawlers. Apparently, these sites are part of the Dark Web.
