Author Topic: Excessive Bandwidth Usage Issues  (Read 4794 times)

Excessive Bandwidth Usage Issues

« on: 18 March, 2021, 01:11:23 AM »
and04

I suppose this is software related in a sense, but sorry anyway if this is in the wrong section of the forum.  Incidentally, I think there needs to be a 'general computing' section or similar...

Basically, a fair amount of my site's bandwidth has been used up for the current month, over 86%, so I'm looking for ways to stop further excessive usage.  What should I be doing?  I've discovered that spiders and crawlers can use up a lot of bandwidth and that a robots.txt file in the root directory can help with this, but I'm not that familiar with the syntax.

To stop more bandwidth being used up, I put my site offline for maintenance about a day ago, but the usage has gone up to 89.45%. How can more bandwidth be used up when it's offline?

Re: Excessive Bandwidth Usage Issues

« Reply #1 on: 18 March, 2021, 07:25:59 AM »
A.Jenery

Sounds like it's the spiders that are causing a lot of the problem.* Banning the bad ones via .htaccess is the way to go. That will nail the ones that ignore robots.txt. They'll still be able to hit your domain, but they'll get next to nothing back (just a short 403 error message).

The .htaccess for the root directory of my domain looks like this:

#BlockBotsByUserAgent
SetEnvIfNoCase User-Agent (Ahrefs|Baidu|BLEXBot|Brandwatch|DotBot|Garlik|Knowledge|libwww-perl|Linkdex|MJ12bot|omgili|PetalBot|Proximic|Semrush|Seznam|Sogou|Tweetmeme|Trendiction|Wordpress) bad_bot

<RequireAll>
Require all Granted
Require not env bad_bot
</RequireAll>

<Files .htaccess>
order allow,deny
deny from all
</Files>

<Files 403.shtml>
order allow,deny
allow from all
</Files>


Adding more bots is easy. I only threw in the ones that were giving me trouble.
*Spiders will often index everything on your site, multiple times in succession, when they are on the rampage. This can chew masses of bandwidth, particularly if you have a lot of images and/or downloadable zips.
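
The snippet above mixes the newer Require directives with the older Order/Allow/Deny ones, a combination the Apache documentation discourages. Below is a minimal sketch of the same idea using only the newer style. It assumes the host runs Apache 2.4 with mod_setenvif and mod_authz_core available, which is not confirmed anywhere in this thread, and the bot list is shortened for readability.

```apache
# Sketch only: assumes Apache 2.4+ (an assumption, not confirmed in the thread).
# Flag any request whose User-Agent contains one of these substrings,
# matched case-insensitively. Extend the list as needed.
SetEnvIfNoCase User-Agent "(Ahrefs|Baidu|BLEXBot|MJ12bot|PetalBot|Semrush)" bad_bot

# Allow everyone except requests flagged as bad_bot.
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>

# Keep the .htaccess file itself unreadable over HTTP.
<Files ".htaccess">
    Require all denied
</Files>

# Let refused clients still fetch the small 403 error page.
<Files "403.shtml">
    Require all granted
</Files>
```

If the host is still on Apache 2.2, the Require lines will not work and the older Order/Allow/Deny form would be needed instead.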
« Last Edit: 18 March, 2021, 07:29:17 AM by A.Jenery »

Re: Excessive Bandwidth Usage Issues

« Reply #2 on: 18 March, 2021, 02:26:26 PM »
and04

Thanks. :) Crawlers and spiders were the main thing my host mentioned, so this rings a bell.  Where exactly in my .htaccess file do I paste this code?  Should I also use a robots.txt file in the root directory?  Would I need both, or is it one or the other?

Re: Excessive Bandwidth Usage Issues

« Reply #3 on: 19 March, 2021, 03:41:46 AM »
McGaskil

Enable OPcache if you're not already using it.  My processor usage dropped the most, but the bandwidth followed.  Keeping compiled PHP scripts cached in memory saves the server from recompiling them on every request.

Also, set up your expires headers and caching.  If you don't change much about the presentation of your forum, you can cache the images, CSS, and other static items for as much as a year, so the user's browser doesn't have to make return trips for more data.  In the same vein, if you're using HTTP/2 you can push everything at once if a visitor is new; if they aren't, you can check in one call whether they already have the file and NOT send it if they do.

If you aren't using those and you implement them, you're likely looking at around 80% less processor usage and 50% less bandwidth consumed.
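
For the expires and caching suggestion, a rough .htaccess sketch might look like the following. It assumes mod_expires is enabled on the host, which nobody in this thread has confirmed, and the MIME types and one-year lifetimes are illustrative choices rather than settings taken from any poster's server.

```apache
# Sketch only: takes effect only if mod_expires is loaded (an assumption).
<IfModule mod_expires.c>
    ExpiresActive On

    # Tell browsers they may reuse static files for up to a year
    # without asking the server again.
    ExpiresByType image/jpeg "access plus 1 year"
    ExpiresByType image/png "access plus 1 year"
    ExpiresByType image/gif "access plus 1 year"
    ExpiresByType text/css "access plus 1 year"
    ExpiresByType application/javascript "access plus 1 year"
</IfModule>
```

A year-long lifetime is only safe for files whose names change when their contents change. If you edit a stylesheet in place, returning visitors may keep the old copy until it expires, so a shorter lifetime may suit files you update often.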

Re: Excessive Bandwidth Usage Issues

« Reply #4 on: 20 March, 2021, 03:55:03 AM »
and04

OK, thanks for helping, but this code that A.Jenery mentions:

#BlockBotsByUserAgent
SetEnvIfNoCase User-Agent (Ahrefs|Baidu|BLEXBot|Brandwatch|DotBot|Garlik|Knowledge|libwww-perl|Linkdex|MJ12bot|omgili|PetalBot|Proximic|Semrush|Seznam|Sogou|Tweetmeme|Trendiction|Wordpress) bad_bot
<RequireAll>
  Require all Granted
  Require not env bad_bot
</RequireAll>

<Files .htaccess>
order allow,deny
deny from all
</Files>

<Files 403.shtml>
order allow,deny
allow from all
</Files>

Where in the .htaccess file do I put it?  And if I use this, do I also need a robots.txt file, and if so, does it go in the root directory or under public_html?

My htaccess file currently has this in it:

RewriteEngine On
RewriteCond %{HTTPS} !=on
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301,NE]
# php -- BEGIN cPanel-generated handler, do not edit
# This domain inherits the “PHP” package.
# php -- END cPanel-generated handler, do not edit

Also, how do I know which crawlers and spiders are using up the bandwidth?

Re: Excessive Bandwidth Usage Issues

« Reply #5 on: 20 March, 2021, 04:13:46 AM »
ATH019

Quote from and04: "I suppose this is software related ... Incidentally, I think there needs to be a 'general computing' section or similar..."
I'll mention this to the admins, who I think intend to make certain additions to the forum in the near future.



Global Moderator

Re: Excessive Bandwidth Usage Issues

« Reply #6 on: 20 March, 2021, 08:23:43 AM »
McGaskil

That's good to know :)  @and04: drop it all in at the end of your .htaccess file.

Using robots.txt can be a good thing, but the bad bots couldn't care less whether it's there or not.  They're going to do what they do regardless.  Only legitimate bots honor it, because that is basically all your robots.txt is: a request to robots.

Take a look at the visitor logs on your server.  If you're using cPanel, there are generally several to choose from that show you statistics on your bandwidth usage, including by IP address.
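
As a rough illustration of "drop it all in at the end", the combined file might look something like this. The existing rules are the ones and04 posted; the bot list is shortened for readability, and the snippet assumes an Apache version that supports the Require directives. Rewrite rules and access rules are handled by different modules, so their relative order in the file should not matter much.

```apache
# Existing rules, unchanged (copied from the file posted earlier in the thread).
RewriteEngine On
RewriteCond %{HTTPS} !=on
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301,NE]
# php -- BEGIN cPanel-generated handler, do not edit
# This domain inherits the “PHP” package.
# php -- END cPanel-generated handler, do not edit

# Bot-blocking rules appended at the end.
# The User-Agent list is shortened here for readability.
SetEnvIfNoCase User-Agent (Ahrefs|MJ12bot|PetalBot|Semrush) bad_bot
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
# The two <Files> blocks from the original snippet would follow here.
```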
« Last Edit: 20 March, 2021, 08:27:11 AM by McGaskil »

Re: Excessive Bandwidth Usage Issues

« Reply #7 on: 20 March, 2021, 11:15:56 AM »
and04

Thanks! :) This suggests that the .htaccess method is better than the robots.txt one, which doesn't stop bad bots.  But what about 'good bots' that also take up a lot of bandwidth, such as indexing bots?  I suppose I shouldn't exclude those, but then how do I minimize the amount of bandwidth they use up?  What would I put in the .htaccess file for this?  Lastly, do I need a robots.txt file at all?  As I understand it, people can see what is in it just by entering mysitename.com/robots.txt, so it's not exactly secure...

Re: Excessive Bandwidth Usage Issues

« Reply #8 on: 20 March, 2021, 10:03:06 PM »
and04

Right, just an update here...  I've put my forum back online today after it was in maintenance mode, and I've edited my .htaccess file with the anti-bot code suggested by A.Jenery, so that's all good.
I've noticed, though, that the forum stats at the bottom say 'Most Online Today: 81', which is a bit odd seeing as the forum had only been online for about an hour at the time of posting.  Confused... ???

Re: Excessive Bandwidth Usage Issues

« Reply #9 on: 21 March, 2021, 01:45:55 AM »
MsLeon

So is your forum in the main public directory, or is it in a subdirectory?

Even if you are bouncing bad bots off the site with htaccess, it may still register that they were online for half a second or so.

If your forum is in a subdirectory, try putting the code in the .htaccess of the main public directory.
« Last Edit: 21 March, 2021, 01:50:49 AM by MsLeon »