How do I protect my site from a bot?

S
Member since 24.10.2014
Offline
63
10055

The server went down again this morning. The logs are full of these pests:

5.9.63.149 - - [16/Aug/2016:08:17:24 +0300] "GET / HTTP/1.0" 301 177 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)" 0.051
69.30.221.242 - - [16/Aug/2016:08:11:15 +0300] "GET / HTTP/1.0" 301 177 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)" 0.163
59.37.199.147 - - [16/Aug/2016:09:54:28 +0300] "GET /====== HTTP/1.1" 200 88720 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us; rv:1.9.2.3) Gecko/20100401 YFF35 Firefox/3.6.3" 0.895

I looked in .htaccess and found this:

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} MJ12bot [OR]

How do I beat MJ12bot?

P.S. Can anyone share a fresh list of junk bots?

DC
Member since 14.02.2009
Offline
66
#1

Through robots.txt, same as for the others: set a Crawl-delay for it, or disallow it altogether.
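To see how a robots.txt parser reads these two options, Python's standard `urllib.robotparser` can be fed the rules directly. This is only a sketch: the Crawl-delay value of 10 is an assumed example, `parse_rules` is a made-up helper name, and the result shows what a compliant crawler would be told, not whether MJ12bot actually obeys it.

```python
from urllib.robotparser import RobotFileParser

def parse_rules(text):
    """Parse robots.txt content given as a string (helper name is hypothetical)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

# Option 1: slow the bot down. The value 10 is an assumed example.
throttle = parse_rules("User-agent: MJ12bot\nCrawl-delay: 10\n")

# Option 2: forbid the bot completely.
block = parse_rules("User-agent: MJ12bot\nDisallow: /\n")

print(throttle.crawl_delay("MJ12bot"))       # delay a compliant MJ12bot is asked to keep
print(throttle.can_fetch("MJ12bot", "/"))    # throttled, but still allowed to crawl
print(block.can_fetch("MJ12bot", "/"))       # disallowed everywhere
print(block.can_fetch("SomeBrowser", "/"))   # other agents are unaffected
```

The parser is queried with the short product token `MJ12bot`, which is how crawlers normally identify themselves to robots.txt, rather than with the full User-Agent header from the log.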

one
Member since 15.04.2007
Offline
336
#2

Doesn't .htaccess block it?

U1
Member since 13.05.2016
Offline
26
#3

Put this in robots.txt:

User-agent: MJ12bot
Disallow: /

---------- Added 16.08.2016 at 11:56 ----------

Shmalex:
P.S. Can anyone share a fresh list of junk bots?

Only this one. Someone posted it a long time ago and I saved it:

User-agent: gigabot
Disallow: /

User-agent: Gigabot/2.0
Disallow: /

User-agent: msnbot
Disallow: /

User-agent: msnbot/1.0
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: libwww-perl
Disallow: /

User-agent: NetStat.Ru Agent
Disallow: /

User-agent: WebAlta Crawler/1.3.25
Disallow: /

User-agent: Yahoo!-MMCrawler/3.x
Disallow: /

User-agent: MMCrawler/3.x
Disallow: /

User-agent: NG/2.0
Disallow: /

User-agent: slurp
Disallow: /

User-agent: aipbot
Disallow: /

User-agent: Alexibot
Disallow: /

User-agent: GameSpyHTTP/1.0
Disallow: /

User-agent: Aqua_Products
Disallow: /

User-agent: asterias
Disallow: /

User-agent: b2w/0.1
Disallow: /

User-agent: BackDoorBot/1.0
Disallow: /

User-agent: becomebot
Disallow: /

User-agent: BlowFish/1.0
Disallow: /

User-agent: Bookmark search tool
Disallow: /

User-agent: BotALot
Disallow: /

User-agent: BotRightHere
Disallow: /

User-agent: BuiltBotTough
Disallow: /

User-agent: Bullseye/1.0
Disallow: /

User-agent: BunnySlippers
Disallow: /

User-agent: CheeseBot
Disallow: /

User-agent: CherryPicker
Disallow: /

User-agent: CherryPickerElite/1.0
Disallow: /

User-agent: CherryPickerSE/1.0
Disallow: /

User-agent: Copernic
Disallow: /

User-agent: CopyRightCheck
Disallow: /

User-agent: cosmos
Disallow: /

User-agent: Crescent
Disallow: /

User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow: /

User-agent: DittoSpyder
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: EroCrawler
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: FairAd Client
Disallow: /

User-agent: Fasterfox
Disallow: /

User-agent: Flaming AttackBot
Disallow: /

User-agent: Foobot
Disallow: /

User-agent: Gaisbot
Disallow: /

User-agent: GetRight/4.2
Disallow: /

User-agent: Harvest/1.5
Disallow: /

User-agent: hloader
Disallow: /

User-agent: httplib
Disallow: /

User-agent: HTTrack 3.0
Disallow: /

User-agent: humanlinks
Disallow: /

User-agent: IconSurf
Disallow: /

User-agent: InfoNaviRobot
Disallow: /

User-agent: Iron33/1.0.2
Disallow: /

User-agent: JennyBot
Disallow: /

User-agent: Kenjin Spider
Disallow: /

User-agent: Keyword Density/0.9
Disallow: /

User-agent: larbin
Disallow: /

User-agent: LexiBot
Disallow: /

User-agent: libWeb/clsHTTP
Disallow: /

User-agent: LinkextractorPro
Disallow: /

User-agent: LinkScan/8.1a Unix
Disallow: /

User-agent: LinkWalker
Disallow: /

User-agent: LNSpiderguy
Disallow: /

User-agent: lwp-trivial
Disallow: /

User-agent: lwp-trivial/1.34
Disallow: /

User-agent: Mata Hari
Disallow: /

User-agent: Microsoft URL Control
Disallow: /

User-agent: Microsoft URL Control - 5.01.4511
Disallow: /

User-agent: Microsoft URL Control - 6.00.8169
Disallow: /

User-agent: MIIxpc
Disallow: /

User-agent: MIIxpc/4.2
Disallow: /

User-agent: Mister PiX
Disallow: /

User-agent: moget
Disallow: /

User-agent: moget/2.1
Disallow: /

User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
Disallow: /

User-agent: MSIECrawler
Disallow: /

User-agent: NetAnts
Disallow: /

User-agent: NICErsPRO
Disallow: /

User-agent: Offline Explorer
Disallow: /

User-agent: Openbot
Disallow: /

User-agent: Openfind
Disallow: /

User-agent: Openfind data gatherer
Disallow: /

User-agent: Oracle Ultra Search
Disallow: /

User-agent: PerMan
Disallow: /

User-agent: ProPowerBot/2.14
Disallow: /

User-agent: ProWebWalker
Disallow: /

User-agent: psbot
Disallow: /

User-agent: Python-urllib
Disallow: /

User-agent: QueryN Metasearch
Disallow: /

User-agent: Radiation Retriever 1.1
Disallow: /

User-agent: RepoMonkey
Disallow: /

User-agent: RepoMonkey Bait & Tackle/v1.01
Disallow: /

User-agent: RMA
Disallow: /

User-agent: searchpreview
Disallow: /

User-agent: SiteSnagger
Disallow: /

User-agent: SpankBot
Disallow: /

User-agent: spanner
Disallow: /

User-agent: SurveyBot
Disallow: /

User-agent: suzuran
Disallow: /

User-agent: Szukacz/1.4
Disallow: /

User-agent: Teleport
Disallow: /

User-agent: TeleportPro
Disallow: /

User-agent: Telesoft
Disallow: /

User-agent: The Intraformant
Disallow: /

User-agent: TheNomad
Disallow: /

User-agent: TightTwatBot
Disallow: /

User-agent: toCrawl/UrlDispatcher
Disallow: /

User-agent: True_Robot
Disallow: /

User-agent: True_Robot/1.0
Disallow: /

User-agent: turingos
Disallow: /

User-agent: TurnitinBot
Disallow: /

User-agent: TurnitinBot/1.5
Disallow: /

User-agent: URL Control
Disallow: /

User-agent: URL_Spider_Pro
Disallow: /

User-agent: URLy Warning
Disallow: /

User-agent: VCI
Disallow: /

User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /

User-agent: Web Image Collector
Disallow: /

User-agent: WebAuto
Disallow: /

User-agent: WebBandit
Disallow: /

User-agent: WebBandit/3.50
Disallow: /

User-agent: WebCapture 2.0
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: WebCopier v.2.2
Disallow: /

User-agent: WebCopier v3.2a
Disallow: /

User-agent: WebEnhancer
Disallow: /

User-agent: WebSauger
Disallow: /

User-agent: Website Quester
Disallow: /

User-agent: Webster Pro
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: WebZip
Disallow: /

User-agent: WebZip/4.0
Disallow: /

User-agent: WebZIP/4.21
Disallow: /

User-agent: WebZIP/5.0
Disallow: /

User-agent: Wget
Disallow: /

User-agent: wget
Disallow: /

User-agent: Wget/1.5.3
Disallow: /

User-agent: Wget/1.6
Disallow: /

User-agent: WWW-Collector-E
Disallow: /

User-agent: Xenu's
Disallow: /

User-agent: Xenu's Link Sleuth 1.1c
Disallow: /

User-agent: Zeus
Disallow: /

User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow: /

User-agent: Zeus Link Scout
Disallow: /

T
Member since 15.11.2011
Offline
120
#4

robots.txt? What are you smoking?

Bots don't give a damn about your robots.txt.

CJ
Member since 20.11.2006
Offline
129
#5

Via nginx.conf or iptables.

In nginx.conf, inside the server block:

if ($http_user_agent ~* "curl|ahrefs|crawler|majestic|R6_CommentReader|python|urllib|MJ12bot|Baiduspider|DomainTools|360Spider|linkdex|genieo.com|ltx71.com|WordPress|similartech.com" ) { 
return 403;
}
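Before deploying a pattern like this, it may help to check what it would and would not catch. The sketch below reuses the pattern from the nginx `if` above and tries it against the two User-Agent strings from the log in the first post. It assumes Python's `re` with `IGNORECASE` behaves like nginx's `~*` for a plain alternation such as this one; `would_block` is a made-up helper name.

```python
import re

# Pattern copied from the nginx `if` above; `~*` in nginx is a case-insensitive match.
BLOCKED_UA = re.compile(
    r"curl|ahrefs|crawler|majestic|R6_CommentReader|python|urllib|MJ12bot|"
    r"Baiduspider|DomainTools|360Spider|linkdex|genieo.com|ltx71.com|"
    r"WordPress|similartech.com",
    re.IGNORECASE,
)

def would_block(user_agent):
    """True if the pattern matches anywhere in the User-Agent string."""
    return BLOCKED_UA.search(user_agent) is not None

# User-Agent strings taken from the log lines in the first post.
MJ12 = "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)"
FIREFOX = ("Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us; rv:1.9.2.3) "
           "Gecko/20100401 YFF35 Firefox/3.6.3")

for ua in (MJ12, FIREFOX):
    print(would_block(ua), ua)
```

Note that the same pattern also matches any client whose User-Agent contains `curl`, `python` or `WordPress`, which may include your own scripts and monitoring.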

Via iptables, from the console:


iptables -I INPUT -d 00.111.222.33 -p tcp --dport 80 -m string --string 'MJ12bot' --algo bm -j DROP

Then:

service iptables save
D
Member since 07.11.2000
Offline
219
#6

Maybe the problem isn't the bots? How many requests per second are they making? Show a continuous chunk of the log.
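For anyone wanting to answer the requests-per-second question from their own access log, here is a rough sketch. It assumes log lines shaped like the ones in the first post (client IP, two dashes, then a bracketed timestamp); the function names and the sample lines are made up for illustration, with the User-Agent shortened.

```python
import re
from collections import Counter

# Captures the client IP and the timestamp (down to the second).
# Assumes the "IP - - [timestamp ...]" layout seen in the thread's log lines.
LINE_RE = re.compile(r'(\d{1,3}(?:\.\d{1,3}){3}) - - \[([^\]\s]+)')

def hits_per_second(lines):
    """Count requests per (client IP, second) pair; unparsable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts

def peak_rate(lines):
    """Return ((ip, second), hits) for the busiest second, or None if nothing parsed."""
    counts = hits_per_second(lines)
    return counts.most_common(1)[0] if counts else None

# Hypothetical sample lines in the same format, User-Agent shortened.
SAMPLE = [
    '5.9.151.22 - - [16/Aug/2016:08:35:00 +0300] "GET / HTTP/1.0" 301 177 "-" "MJ12bot" 0.051',
    '5.9.151.22 - - [16/Aug/2016:08:35:00 +0300] "GET / HTTP/1.0" 301 177 "-" "MJ12bot" 0.050',
    '69.30.221.242 - - [16/Aug/2016:08:35:01 +0300] "GET / HTTP/1.0" 301 177 "-" "MJ12bot" 0.163',
]
print(peak_rate(SAMPLE))

# On a real log (the path is a placeholder):
#     with open("access.log") as log:
#         print(peak_rate(log))
```

A peak of one or two hits per second from a single IP would suggest the bot alone is unlikely to be what takes the server down.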

S
Member since 24.10.2014
Offline
63
#7
DrCrash:
Through robots.txt, same as for the others: set a Crawl-delay for it, or disallow it altogether.

I suspect these pests don't care about robots.txt.

one:
Doesn't .htaccess block it?

Maybe it's written incorrectly. Could someone who understands this check it?

RewriteEngine on

RewriteCond %{HTTP_USER_AGENT} MJ12bot [OR]

RewriteCond %{HTTP_USER_AGENT} Java [OR]

RewriteCond %{HTTP_USER_AGENT} NjuiceBot [OR]

RewriteCond %{HTTP_USER_AGENT} Gigabot [OR]

RewriteCond %{HTTP_USER_AGENT} Baiduspider [OR]

RewriteCond %{HTTP_USER_AGENT} JS-Kit [OR]

RewriteCond %{HTTP_USER_AGENT} Voyager [OR]

RewriteCond %{HTTP_USER_AGENT} PostRank [OR]

RewriteCond %{HTTP_USER_AGENT} PycURL [OR]

RewriteCond %{HTTP_USER_AGENT} Aport [OR]

RewriteCond %{HTTP_USER_AGENT} ia_archiver [OR]

RewriteCond %{HTTP_USER_AGENT} DotBot [OR]

RewriteCond %{HTTP_USER_AGENT} SurveyBot [OR]

RewriteCond %{HTTP_USER_AGENT} larbin [OR]

RewriteCond %{HTTP_USER_AGENT} Butterfly [OR]

RewriteCond %{HTTP_USER_AGENT} libwww [OR]

RewriteCond %{HTTP_USER_AGENT} Wget [OR]

RewriteCond %{HTTP_USER_AGENT} SWeb [OR]

RewriteCond %{HTTP_USER_AGENT} LinkExchanger [OR]

RewriteCond %{HTTP_USER_AGENT} Soup [OR]

RewriteCond %{HTTP_USER_AGENT} WordPress [OR]

RewriteCond %{HTTP_USER_AGENT} PHP/ [OR]

RewriteCond %{HTTP_USER_AGENT} spbot [OR]

RewriteCond %{HTTP_USER_AGENT} MLBot [OR]

RewriteCond %{HTTP_USER_AGENT} InternetSeer [OR]

RewriteCond %{HTTP_USER_AGENT} FairShare [OR]

RewriteCond %{HTTP_USER_AGENT} Yeti [OR]

RewriteCond %{HTTP_USER_AGENT} Birubot [OR]

RewriteCond %{HTTP_USER_AGENT} YottosBot [OR]

RewriteCond %{HTTP_USER_AGENT} gold\ crawler [OR]

RewriteCond %{HTTP_USER_AGENT} Linguee [OR]

RewriteCond %{HTTP_USER_AGENT} Ezooms [OR]

RewriteCond %{HTTP_USER_AGENT} lwp-trivial [OR]

RewriteCond %{HTTP_USER_AGENT} Purebot [OR]

RewriteCond %{HTTP_USER_AGENT} User-Agent [OR]

RewriteCond %{HTTP_USER_AGENT} kmSearchBot [OR]

RewriteCond %{HTTP_USER_AGENT} SiteBot [OR]

RewriteCond %{HTTP_USER_AGENT} CamontSpider [OR]

RewriteCond %{HTTP_USER_AGENT} ptd-crawler [OR]

RewriteCond %{HTTP_USER_AGENT} HTTrack [OR]

RewriteCond %{HTTP_USER_AGENT} suggybot [OR]

RewriteCond %{HTTP_USER_AGENT} ttCrawler [OR]

RewriteCond %{HTTP_USER_AGENT} Nutch [OR]

RewriteCond %{HTTP_USER_AGENT} Zeus
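One way to check whether rules like these actually reject the bot is to send a request with its User-Agent and look at the status code that comes back. A minimal sketch using only the Python standard library; `status_for` is a made-up helper name and the URL in the usage comment is a placeholder.

```python
import urllib.error
import urllib.request

# User-Agent copied from the log lines in the first post.
MJ12_UA = "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)"

def status_for(url, user_agent, timeout=10):
    """Return the HTTP status the server answers with for this User-Agent.

    Redirects are followed, so this is the status of the final response.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises on 4xx/5xx; the status code is what we want here.
        return error.code

# Usage (placeholder URL, replace with your own site):
#     print(status_for("http://example.com/", MJ12_UA))
```

If the blocking works, the bot's User-Agent should get a 403 while an ordinary browser User-Agent still gets a 200.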

plattoo
Member since 12.05.2010
Offline
195
#8
Shmalex:
Maybe it's written incorrectly. Could someone who understands this check it?

Shmalex, and what comes after that? What's in the RewriteRule?

S
Member since 24.10.2014
Offline
63
#9
Dimka:
Maybe the problem isn't the bots? How many requests per second are they making? Show a continuous chunk of the log.

сайт.ru 5.9.151.22 - - [16/Aug/2016:08:34:58 +0300] "GET / HTTP/1.0" 301 177 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)" 0.053

сайт.ru 5.9.151.22 - - [16/Aug/2016:08:35:00 +0300] "GET / HTTP/1.0" 301 177 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)" 0.051

сайт.ru 5.9.151.22 - - [16/Aug/2016:08:35:01 +0300] "GET / HTTP/1.0" 301 177 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)" 0.052

сайт.ru 5.9.151.22 - - [16/Aug/2016:08:35:02 +0300] "GET / HTTP/1.0" 301 177 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)" 0.050

сайт.ru 5.9.151.22 - - [16/Aug/2016:08:35:03 +0300] "GET / HTTP/1.0" 301 177 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)" 0.049

сайт.ru 5.9.151.22 - - [16/Aug/2016:08:35:05 +0300] "GET / HTTP/1.0" 301 177 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)" 0.052

сайт.ru 5.9.151.22 - - [16/Aug/2016:08:35:06 +0300] "GET / HTTP/1.0" 301 177 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)" 0.050

сайт.ru 5.9.151.22 - - [16/Aug/2016:08:35:08 +0300] "GET / HTTP/1.0" 301 177 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)" 0.051

сайт.ru 5.9.151.22 - - [16/Aug/2016:08:35:09 +0300] "GET /stil/stilnye-muzhskie-kurtki HTTP/1.0" 301 177 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)" 0.050

сайт.ru 5.9.151.22 - - [16/Aug/2016:08:35:11 +0300] "GET / HTTP/1.0" 301 177 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)" 0.050

сайт.ru 5.9.151.22 - - [16/Aug/2016:08:35:12 +0300] "GET / HTTP/1.0" 301 177 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)" 0.050

сайт.ru 5.9.151.22 - - [16/Aug/2016:08:35:13 +0300] "GET / HTTP/1.0" 301 177 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)" 0.050

сайт.ru 5.9.151.22 - - [16/Aug/2016:08:35:15 +0300] "GET / HTTP/1.0" 301 177 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)" 0.051

сайт.ru 5.9.151.22 - - [16/Aug/2016:08:35:16 +0300] "GET / HTTP/1.0" 301 177 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)" 0.052

сайт.ru 5.9.151.22 - - [16/Aug/2016:08:35:17 +0300] "GET / HTTP/1.0" 301 177 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)" 0.050

сайт.ru 5.9.151.22 - - [16/Aug/2016:08:35:19 +0300] "GET / HTTP/1.0" 301 177 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)" 0.053

сайт.ru 5.9.151.22 - - [16/Aug/2016:08:35:20 +0300] "GET / HTTP/1.0" 301 177 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)" 0.050

сайт.ru 5.9.151.22 - - [16/Aug/2016:08:35:21 +0300] "GET / HTTP/1.0" 301 177 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)" 0.052

---------- Added 16.08.2016 at 14:22 ----------

plattoo:
Shmalex, and what comes after that? What's in the RewriteRule?

RewriteRule ^sitemap\.xml$ index.php?sitemap=main

RewriteRule ^sitemap_(\d+?)\.xml$ index.php?sitemap=$1

RewriteRule ^sitemap\.html$ index.php?htmlmap=main

RewriteRule ^sitemap_(\d+?)\.html$ index.php?htmlmap=$1

RewriteRule ^product/([^/]*)\/([^/]*)\/?$ index.php?iframe_url=$1&iframe_title=$2

RewriteRule ^([^/]+)\/page/(\d+)$ index.php?q=$1&page=$2

RewriteRule ^([^/]+)\/research$ index.php?q=$1&research=1

RewriteRule ^([^/]+)\/$ index.php?q=$1

plattoo
Member since 12.05.2010
Offline
195
#10
Shmalex:
RewriteRule ^sitemap\.xml$ index.php?sitemap=main
RewriteRule ^sitemap_(\d+?)\.xml$ index.php?sitemap=$1

RewriteRule ^sitemap\.html$ index.php?htmlmap=main
RewriteRule ^sitemap_(\d+?)\.html$ index.php?htmlmap=$1

RewriteRule ^product/([^/]*)\/([^/]*)\/?$ index.php?iframe_url=$1&iframe_title=$2

RewriteRule ^([^/]+)\/page/(\d+)$ index.php?q=$1&page=$2
RewriteRule ^([^/]+)\/research$ index.php?q=$1&research=1
RewriteRule ^([^/]+)\/$ index.php?q=$1

That part has nothing to do with the problem.

What does this block end with?

Shmalex:
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} MJ12bot [OR]
RewriteCond %{HTTP_USER_AGENT} Java [OR]
RewriteCond %{HTTP_USER_AGENT} NjuiceBot [OR]
RewriteCond %{HTTP_USER_AGENT} Gigabot [OR]
RewriteCond %{HTTP_USER_AGENT} Baiduspider [OR]
RewriteCond %{HTTP_USER_AGENT} JS-Kit [OR]
RewriteCond %{HTTP_USER_AGENT} Voyager [OR]
RewriteCond %{HTTP_USER_AGENT} PostRank [OR]
RewriteCond %{HTTP_USER_AGENT} PycURL [OR]
RewriteCond %{HTTP_USER_AGENT} Aport [OR]
RewriteCond %{HTTP_USER_AGENT} ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} DotBot [OR]
RewriteCond %{HTTP_USER_AGENT} SurveyBot [OR]
RewriteCond %{HTTP_USER_AGENT} larbin [OR]
RewriteCond %{HTTP_USER_AGENT} Butterfly [OR]
RewriteCond %{HTTP_USER_AGENT} libwww [OR]
RewriteCond %{HTTP_USER_AGENT} Wget [OR]
RewriteCond %{HTTP_USER_AGENT} SWeb [OR]
RewriteCond %{HTTP_USER_AGENT} LinkExchanger [OR]
RewriteCond %{HTTP_USER_AGENT} Soup [OR]
RewriteCond %{HTTP_USER_AGENT} WordPress [OR]
RewriteCond %{HTTP_USER_AGENT} PHP/ [OR]
RewriteCond %{HTTP_USER_AGENT} spbot [OR]
RewriteCond %{HTTP_USER_AGENT} MLBot [OR]
RewriteCond %{HTTP_USER_AGENT} InternetSeer [OR]
RewriteCond %{HTTP_USER_AGENT} FairShare [OR]
RewriteCond %{HTTP_USER_AGENT} Yeti [OR]
RewriteCond %{HTTP_USER_AGENT} Birubot [OR]
RewriteCond %{HTTP_USER_AGENT} YottosBot [OR]
RewriteCond %{HTTP_USER_AGENT} gold\ crawler [OR]
RewriteCond %{HTTP_USER_AGENT} Linguee [OR]
RewriteCond %{HTTP_USER_AGENT} Ezooms [OR]
RewriteCond %{HTTP_USER_AGENT} lwp-trivial [OR]
RewriteCond %{HTTP_USER_AGENT} Purebot [OR]
RewriteCond %{HTTP_USER_AGENT} User-Agent [OR]
RewriteCond %{HTTP_USER_AGENT} kmSearchBot [OR]
RewriteCond %{HTTP_USER_AGENT} SiteBot [OR]
RewriteCond %{HTTP_USER_AGENT} CamontSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ptd-crawler [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} suggybot [OR]
RewriteCond %{HTTP_USER_AGENT} ttCrawler [OR]
RewriteCond %{HTTP_USER_AGENT} Nutch [OR]
RewriteCond %{HTTP_USER_AGENT} Zeus

I.e., you've listed the bots, but what does Apache then do with them?
