It might surprise you that almost 40% of all internet traffic comes from bots. While much of this traffic comes from Googlebot (Google’s web crawler) and Microsoft’s Bingbot, many are bots with malicious intent that can degrade your website’s performance or, worse, launch attacks against your database server.
When talking about bot traffic, there are good bots that are genuinely beneficial to your site. You wouldn’t want to accidentally block Googlebot, or your website won’t rank on Google. Then there are bad bots, which can cause significant, even permanent, damage to your site’s reputation.
The question is, how can we differentiate between the two? How can we effectively manage bad bots while allowing good bots to operate optimally on our site?
Here, we will focus on answering these two questions, and we’ll discuss how to tell the difference between good and bad bots, and how to manage them properly.
Without further ado, let us begin.
What Is a Bot?
A bot, or more precisely an internet bot, is a piece of software programmed to perform specific automated tasks. A key characteristic of bots is automation: they carry out what they are programmed to do without intervention from human users or programmers.
Typically, bots are programmed to do relatively simple, repetitive tasks, but today’s bots can perform more complex work and are getting better at mimicking human behavior. Bots usually operate over a network. As discussed, there are good, useful bots like the search engine crawlers that index websites’ content for search engines, but there are also bad, malicious bots programmed with ill intent.
Knowing The Good Bots
It’s no secret that the term ‘internet bots’ has acquired quite a negative reputation due to the malicious bots behind various cybersecurity threats. However, as discussed, there are many good bots on the internet that provide real benefits for your site and your business.
Here are some of the major types of good bots you should know:
- Search engine crawlers
The most common and best-known good bots are the search engine crawlers, especially Googlebot, but also crawlers from other search engines like Microsoft’s Bingbot, Yahoo’s crawler, and Amazon’s Alexa bot.
Their main, if not only, task is to crawl as many web pages as they can and index them, so these pages can be listed (and ranked) in search results.
- Vendor bots/partner bots
Bots that come from vendors you use. For example, if you use the PayPal gateway on your site, PayPal’s IPN bot will operate on your site to confirm transactions and perform other essential operations.
- Social network bots
Like Facebook Bot, they give visibility to your website and drive engagement on the respective social media platform.
- Monitoring bots
Bots like Pingdom or Uptime.com monitor your site’s uptime and overall health. These bots periodically check your site’s load time, status, and downtime duration (if any).
- Aggregator bots
Bots that collect information from your website and recommend certain pages on news aggregator sites and RSS feeds. Examples include Google Feedfetcher and WikioFeedBot.
- Backlink checker bots
For example, bots from SEO tools like Ahrefs and SEMrush. These bots check the links pointing to your website so marketers can use this data to improve their SEO strategy.
- Copyright/DRM bots
These bots monitor images, videos, and other content on web pages to check whether a page is using copyrighted or DRM-protected (Digital Rights Management) content without permission.
As we can see, depending on the objective of your site, these bots can be very beneficial and you shouldn’t block their activities.
Nevertheless, even good bots, when not managed well, can use up your site’s resources and slow it down (or even cause server failure). We should therefore carefully monitor their activities and manage them when necessary.
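One practical problem when monitoring good bots is that bad bots often impersonate them by faking the user-agent. Both Google and Bing document a reverse-then-forward DNS check to confirm a visitor really is their crawler. Below is a minimal Python sketch of that check; the injectable `reverse_lookup`/`forward_lookup` parameters are an illustrative convenience (not part of any standard API) so the logic can be exercised without live DNS.

```python
import socket

# Domains that legitimate Google/Bing crawler hostnames resolve under,
# per the crawler-verification docs of each vendor.
CRAWLER_DOMAINS = (".googlebot.com", ".google.com", ".search.msn.com")

def verify_crawler_ip(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` passes the reverse/forward DNS crawler check:
    1) reverse-resolve the IP to a hostname,
    2) confirm the hostname belongs to a known crawler domain,
    3) forward-resolve the hostname and confirm it maps back to the IP."""
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname(host))
    try:
        host = reverse_lookup(ip)
        if not host.endswith(CRAWLER_DOMAINS):
            return False
        return forward_lookup(host) == ip
    except OSError:  # DNS lookup failed: treat as unverified
        return False
```

In production you would call `verify_crawler_ip("66.249.66.1")` directly and let the real DNS resolvers run; a fake resolver is only useful for testing the logic offline.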
Knowing The Enemy: Bad Bots
Unlike the good bots discussed above, bad bots are intentionally designed to perform malicious activities. They are mostly owned and operated by cybercriminals, hackers, scammers/fraudsters, and other parties that engage in various cybercrimes.
One of the most common uses of bad bots is content/web scraping, where your business’s competitors or third-party scrapers use these bots to scrape and steal sensitive information from your site.
Also, even when these bots are unsuccessful in launching their attacks, they can strain your web server and slow down your site for legitimate users.
With that in mind, here are some of the major types of bad bots you should know:
- Web scraper bots
As the name suggests, these bots are designed to scrape content from your site, rapidly saving your site’s elements and content. While web scraper bots aren’t always 100% bad (there are companies that scrape data only for analysis purposes), some scraper bots are used to illegally steal content and repurpose it elsewhere, or even steal sensitive information contained on the site.
- Spambots
Designed to spread spam, especially links that drive traffic to a scam/fraud website. The most common application is bots that post automated messages in the comment sections of blogs, forums, and social media platforms.
Spambot activity has declined in recent years, but not because we are more effective at dealing with them. Rather, spambot tactics have become highly unprofitable due to changes in PPC advertising, especially Google Ads, which no longer rewards spammy, low-quality sites.
- Imposter bots
These bots are designed to mimic human behaviors so that they can pass various bot detection solutions. There are various applications of these imposter bots, but they are mainly used to launch another, more severe attack like DDoS attacks, data breaches, and others.
- Malware Bots
Designed to find and exploit security vulnerabilities so they can spread malware to the web server, often infecting the entire network. A common application is to use the malware to take control of infected devices and then use them to perform a DDoS attack on another network. In this case, each infected device becomes part of what is known as a DDoS botnet.
How Bad Bots Can Affect Your Website
Now that we’ve discussed various types of bad bots that are available on the internet, let’s take a look at how they can actually impact your site, both short-term and long-term.
- Affecting your SEO ranking
While there are many ranking factors that can affect your website’s position on the SERP, user-experience (UX) signals like bounce rate and dwell time are now also major ranking factors. Bad bots can strain your web server and hurt your site’s reliability, which can hurt your SEO ranking in the process.
There are also other ways bad bots can do this: for example, by stealing your content and republishing it elsewhere, creating a duplicate-content issue that might get your site penalized.
- DDoS attack
Bad bots can be utilized to launch DDoS (Distributed Denial of Service) attacks, which can severely slow down your site or even render it completely inaccessible. As discussed above, malware bots can affect hundreds or even thousands of computers and use these computers (botnets) to spam requests on your website.
The financial and reputational damage done by a DDoS attack can be massive and long-term, and can even cause permanent damage to your reputation. Also, it’s very difficult to fully eliminate an ongoing DDoS attack, so preventing one is much more desirable.
- Slowing down your site
Even when bots aren’t launching a DDoS attack, they can put a strain on your server and severely affect your site’s load time. Slow load times cause higher bounce rates, and according to Google, more than half of visitors will leave a site that takes more than 3 seconds to load. As discussed above, a high bounce rate might also translate into a drop in SEO performance.
- Infecting your audience with malware
Once your site is infected with malware (via malware bots or other means), it might also affect all of your site’s visitors. This can damage your reputation, among other negative impacts. Malware can also be difficult to detect, since it is usually designed to blend in with your site’s native code. In fact, it’s common for site owners not to notice an infection until Google detects it and penalizes the site or warns visitors away, hurting your site’s traffic (sometimes permanently).
How To Manage Bad Bots (and Good Bots)
First, although our focus here is on managing the bad bots, it’s important to note that we still need to manage the good bots to make sure they work optimally and don’t eat too much of your site’s resources. We also have to make sure we don’t block them accidentally (except in specific cases where blocking them is intended).
Obviously, managing good bots is much easier, since they are designed to follow rules and their identities and sources are fairly clear. There are two main things we should do:
Setting up your robots.txt file
A key practice in managing bot traffic is to set up rules/policies in your site’s robots.txt file.
The robots.txt file is a text file that specifies the rules for any bots accessing the website where it resides. Essentially, these rules define which pages bots can or can’t crawl, along with other requirements. Good bots will always follow the rules defined in the robots.txt file, but bad bots typically won’t. In fact, bad bots will often read the rules in the robots.txt file and use that information to sidestep your site’s security measures.
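As a concrete illustration, here is a minimal, hypothetical robots.txt (the `/admin/` path and sitemap URL are made up) checked with Python’s standard-library `urllib.robotparser`, which applies the same rules a well-behaved crawler would:

```python
from urllib import robotparser

# A minimal, hypothetical robots.txt: allow everything by default,
# keep crawlers out of /admin/, and point them at the sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 10

Sitemap: https://www.example.com/sitemap.xml
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant bot would check each URL before fetching it:
print(parser.can_fetch("Googlebot", "https://www.example.com/blog/post"))    # True
print(parser.can_fetch("Googlebot", "https://www.example.com/admin/login"))  # False
```

Note that this is purely advisory: the parser enforces nothing, which is exactly why bad bots can simply ignore the file.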
Whitelisting and blacklisting bots
A whitelist (or allowlist) is a list of good bots that are allowed to enter and make requests to your site. A blacklist, on the other hand, is a list of bots blocked from your site. Typically, whitelisting or blacklisting works by identifying and defining rules on the traffic’s user-agent (unique information that can identify the traffic’s source) and/or its IP address.
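A simple sketch of how these two signals combine in practice is shown below. The keyword list, the sample IP range, and the three-way verdict are illustrative assumptions, not a production ruleset; real solutions use far richer signals.

```python
import ipaddress

# Illustrative blacklist of user-agent substrings commonly seen in
# basic automated traffic (assumption: tune this to your own logs).
BLOCKED_AGENT_KEYWORDS = ("python-requests", "scrapy", "curl")

# Illustrative allowlist of source networks, e.g. a published
# Googlebot range (66.249.64.0/19 is used here as an example).
ALLOWED_NETWORKS = [ipaddress.ip_network("66.249.64.0/19")]

def classify_request(user_agent, ip):
    """Return 'allow', 'block', or 'inspect' for an incoming request."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in ALLOWED_NETWORKS):
        return "allow"    # whitelisted source IP
    ua = user_agent.lower()
    if any(keyword in ua for keyword in BLOCKED_AGENT_KEYWORDS):
        return "block"    # blacklisted user-agent
    return "inspect"      # unknown traffic: apply further checks
```

The third verdict matters: anything not explicitly allowed or blocked should fall through to deeper checks (rate limiting, behavioral analysis), since sophisticated bots fake both signals.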
Managing Bad Bots
Managing bad bots, however, is an entirely different story because these bots are designed to avoid identification and won’t follow your rules/policies. So, there is no one-size-fits-all approach to blocking bad bots.
Instead, here are some recommendations you can use to help stop bot attacks:
- Invest in a bot management solution
The best, most effective approach to defending against bad bot attacks is to invest in a proper bot management solution. Today’s bad bots are very sophisticated at mimicking human behavior, and can mask their user-agent while rotating between thousands of IP addresses. So an advanced bot mitigation solution, preferably one utilizing AI and machine learning technologies, such as DataDome, is recommended.
With today’s gen-4 bots utilizing AI in their attack vectors, AI technology in the mitigation solution is now also necessary.
- Use CAPTCHA
CAPTCHA is a simple test designed to differentiate human users from bots. CAPTCHA won’t stop advanced attackers, and there are now various CAPTCHA farms/solving services rented out to cybercriminals and hackers. However, CAPTCHA can still catch and discourage basic bots and cybersecurity threats.
- Monitor and manage your traffic
Monitor your traffic sources carefully and look for unusual spikes and sudden changes, such as a higher bounce rate than usual or lower conversion rates from specific traffic sources. In particular, monitor failed logins and failed validations (such as coupon or gift-card redemption failures).
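The failed-validation signal mentioned above can be reduced to a very small sketch: count failures per source IP over a window and flag outliers. The threshold and the `(ip, succeeded)` event shape are assumptions for illustration; a real pipeline would read from your access logs and tune the cutoff to your baseline traffic.

```python
from collections import Counter

# Arbitrary illustrative cutoff: how many failures from one IP in a
# monitoring window before we consider it suspicious.
FAILURE_THRESHOLD = 5

def suspicious_ips(events):
    """events: iterable of (ip, succeeded) tuples, e.g. login or
    coupon-redemption attempts parsed from access logs.
    Returns the sorted list of IPs at or above the failure threshold."""
    failures = Counter(ip for ip, ok in events if not ok)
    return sorted(ip for ip, count in failures.items() if count >= FAILURE_THRESHOLD)

# Example: one IP hammering logins, one normal user with a single typo.
events = [("203.0.113.9", False)] * 7 + [("198.51.100.2", True)] * 3 + [("198.51.100.2", False)]
print(suspicious_ips(events))  # ['203.0.113.9']
```

Flagged IPs shouldn’t necessarily be blocked outright; feeding them into the stricter checks above (CAPTCHA, rate limiting) avoids punishing legitimate users who simply mistyped a password.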
Above, we discussed the key differences between good bots and bad bots, along with major examples of each. We also covered how to manage both via several different approaches.
Given the nature of bad bots and how rapidly they have evolved in recent years, there is no single answer to stopping them. However, investing in a good, AI-based bot management solution is your best bet for identifying today’s sophisticated bots and mitigating their activities.