How to identify malicious source IPs when Hong Kong server traffic is being consumed in large quantities every day?-Jtti

Time : 2025-11-26 15:20:38

Edit : Jtti

　　Hong Kong servers in high-bandwidth, high-traffic deployments are particularly prone to excessive bandwidth consumption. Many website owners purchase Hong Kong CN2, BGP, or international bandwidth, only to find that traffic meant for normal users is consumed by other sources every day, sometimes saturating the link within a short period and causing slow access, lag, or even outages. The root cause is often malicious IP scanning, CC attacks, mass web scraping, abnormal POST requests, or malicious downloading. If malicious source IPs are not identified and restricted promptly, traffic costs can rise sharply and server load will stay high.

　　Before tracing the cause, website owners usually notice only the symptoms: a sudden rise in server bandwidth usage, a much larger traffic bill, or an abnormal surge in CDN origin-pull traffic. CPU load is not necessarily high at this point, because much malicious traffic consists of static downloads, scraping, and repeated page refreshes, which consume bandwidth but little computing power. The most direct way to check whether a source is malicious is to inspect the access logs and the system's network connection counts. For Nginx, the access log shows which IPs send the most requests. For example, the following command lists the ten most frequent client IPs:

awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head
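
　　The one-liner above counts requests over the whole log file, so a short burst can be hidden among hours of normal traffic. As a rough sketch, the function below groups requests by minute and IP instead. It assumes the default Nginx "combined" log format; the function name `top_ip_per_minute` is made up for this illustration.

```shell
# Count requests per (minute, IP) pair and list the busiest pairs.
# Assumption: default "combined" log format, where $1 is the client IP
# and $4 looks like "[26/Nov/2025:15:20:38".
top_ip_per_minute() {
    local log_file="$1" limit="${2:-10}"
    awk '{
        minute = substr($4, 2, 17)   # e.g. "26/Nov/2025:15:20"
        print minute, $1
    }' "$log_file" | sort | uniq -c | sort -nr | head -n "$limit"
}

# Example call (path taken from the commands above):
#   top_ip_per_minute /var/log/nginx/access.log 10
```

　　Each output line shows a request count, the minute, and the IP, so an address sending thousands of requests within a single minute stands out immediately.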

　　If certain IPs, or IPs from certain countries/regions, show unusually high request counts, especially tens of thousands of requests within a short period, that traffic is unlikely to come from normal users. To see which requests are responsible, for example whether large files are being downloaded repeatedly, count requests per IP and URL:

awk '{print $1, $7}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head
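
　　The command above ranks IP and URL pairs by request count, which is not the same as bandwidth: one request for a large archive can outweigh thousands of small ones. A minimal sketch for ranking IPs by bytes sent follows. It assumes the default "combined" log format, in which the tenth whitespace-separated field is the response body size; `top_ip_by_bytes` is a hypothetical helper name.

```shell
# Sum response bytes per client IP and list the heaviest consumers.
# Assumption: default "combined" log format, where $10 is $body_bytes_sent.
top_ip_by_bytes() {
    local log_file="$1" limit="${2:-10}"
    awk '$10 ~ /^[0-9]+$/ { bytes[$1] += $10 }
         END { for (ip in bytes) printf "%.0f %s\n", bytes[ip], ip }' "$log_file" |
        sort -nr | head -n "$limit"
}

# Example call:
#   top_ip_by_bytes /var/log/nginx/access.log 10
```

　　If the log uses a custom `log_format`, the field number has to be adjusted to match.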

　　If the same IP repeatedly requests the same URL for no apparent reason, particularly images, static files, videos, or compressed archives, a malicious crawler or attack script is the likely cause. Abnormal form submissions can be found the same way by counting POST requests per IP.

awk '$6 == "\"POST" {print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head

　　Attackers sometimes masquerade as ordinary browsers, so the User-Agent (UA) is not always trustworthy. Even so, it is worth checking which User-Agents appear most often and whether they look suspicious. Some malicious scanning tools send an empty UA or one containing keywords such as curl, python, or Go-http-client. The following command lists the most common User-Agents:

awk -F\" '{print $6}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head
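
　　Once the common User-Agents are known, the IPs behind the suspicious ones can be listed as well. The sketch below assumes the default "combined" log format and uses the keywords mentioned above as its pattern; both the pattern and the function name `suspicious_ua_ips` are illustrative and should be adapted to what the logs actually show.

```shell
# List the IPs that most often send an empty or tool-like User-Agent.
# Assumption: default "combined" log format, so splitting on double quotes
# puts the User-Agent in field 6 and the client IP at the start of field 1.
suspicious_ua_ips() {
    local log_file="$1" limit="${2:-10}"
    awk -F'"' '{
        ua = tolower($6)
        if (ua == "" || ua == "-" || ua ~ /curl|python|go-http-client/) {
            split($1, parts, " ")
            print parts[1]
        }
    }' "$log_file" | sort | uniq -c | sort -nr | head -n "$limit"
}

# Example call:
#   suspicious_ua_ips /var/log/nginx/access.log 10
```

　　A match is only a hint, since legitimate monitoring scripts and API clients use the same tools.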

　　After confirming the suspicious source at the log level, you can also view the real-time connection status through system commands, for example:

netstat -anp | grep ESTABLISHED

　　If a single IP holds dozens or even hundreds of ESTABLISHED connections at the same time, its behavior is very likely abnormal. CC-type malicious traffic may also leave a backlog of connections in the TIME_WAIT or SYN_RECV state, which netstat shows as well.
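
　　To turn the raw netstat listing into a per-IP count, the output can be piped through a small filter. This is a sketch that assumes the Linux `netstat -ant` column layout (remote address in column 5, state in column 6); `conns_per_ip` is a made-up helper name.

```shell
# Read `netstat -ant` output on stdin and count connections per remote IP
# for one TCP state (ESTABLISHED by default).
# Assumption: Linux net-tools layout, remote address in $5 and state in $6.
conns_per_ip() {
    local state="${1:-ESTABLISHED}"
    awk -v state="$state" '
        $1 ~ /^tcp/ && $6 == state {
            addr = $5
            sub(/:[0-9]+$/, "", addr)   # strip the trailing port
            count[addr]++
        }
        END { for (ip in count) print count[ip], ip }
    ' | sort -nr | head
}

# Example calls:
#   netstat -ant | conns_per_ip ESTABLISHED
#   netstat -ant | conns_per_ip SYN_RECV
```

　　The filter reads from standard input so that the same function can be reused for any state without running netstat itself.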

　　After identifying the malicious source IPs, the next step is to limit the traffic they consume. The most common method is to block malicious IPs or IP ranges with a firewall such as UFW or iptables.

iptables -A INPUT -s 1.2.3.4 -j DROP
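
　　When many IPs need blocking, typing rules one by one is error-prone. One possible approach, sketched below, is to generate the commands from the access log and review them before running anything. The threshold of 10000 requests is an illustrative assumption rather than a recommended value, and `print_block_rules` is a hypothetical helper that only prints commands and never executes them.

```shell
# Print (do not execute) one iptables command per IPv4 address whose
# request count in the log exceeds a threshold.
# Assumptions: $1 of each log line is the client IP; the default
# threshold of 10000 is only an example and must be tuned per site.
print_block_rules() {
    local log_file="$1" threshold="${2:-10000}"
    awk -v t="$threshold" '
        $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { hits[$1]++ }
        END { for (ip in hits) if (hits[ip] > t) print ip }
    ' "$log_file" | sort | while read -r ip; do
        printf 'iptables -A INPUT -s %s -j DROP\n' "$ip"
    done
}

# Example call, writing the candidate rules to a file for review:
#   print_block_rules /var/log/nginx/access.log 10000 > block_rules.sh
```

　　Reviewing the list first helps avoid blocking search engine crawlers, monitoring probes, or your own CDN's origin-pull addresses.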

　　If abnormal traffic comes overwhelmingly from one country/region that you do not serve, a regional block is also an option. In Nginx, access can be restricted in the configuration, for example with `limit_req` to cap request frequency.

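# limit_req_zone must be placed in the http context
limit_req_zone $binary_remote_addr zone=one:10m rate=10r/s;

server {
    location / {
        # allow bursts of up to 20 requests without delay; reject the excess
        limit_req zone=one burst=20 nodelay;
    }
}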
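
　　After the limit is active, Nginx writes a line containing "limiting requests" and the client address to its error log each time a request is rejected. The sketch below counts those rejections per client. It assumes that message layout and an error log at a path such as /var/log/nginx/error.log, which may differ on your system; `limited_clients` is a made-up name.

```shell
# Count limit_req rejections per client IP in an Nginx error log.
# Assumption: rejection lines contain "limiting requests" followed by
# "client: <ip>," as in the standard limit_req log message.
limited_clients() {
    local error_log="$1" limit="${2:-10}"
    grep 'limiting requests' "$error_log" |
        sed -n 's/.*client: \([^,]*\),.*/\1/p' |
        sort | uniq -c | sort -nr | head -n "$limit"
}

# Example call (the log path is an assumption):
#   limited_clients /var/log/nginx/error.log 10
```

　　IPs that keep hitting the limit are natural candidates for a firewall block.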

　　For interfaces that should not be crawled, set stricter rate limits. If you use a CDN, you can also enable attack protection, WAF, and rate limiting rules in its console to reduce pressure on the origin server. If malicious traffic comes mainly from overseas regions, region blocking or a JS Challenge, which filters out automated clients through browser verification, can also be enabled.

　　In some cases, attackers use a large number of distributed IPs to simulate real visitors, so blocking individual IPs is ineffective. Here you can combine log analysis with security tooling for more advanced filtering, such as Fail2ban, which detects malicious requests from log patterns and bans their source automatically. The following command lists the jails that are currently active:

fail2ban-client status

　　With custom rules, the server can block abnormal access proactively and reduce wasted bandwidth. For high-risk businesses, consider DDoS-protected IPs or a professional defense service that filters attack traffic at scrubbing nodes before it can exhaust the origin server's bandwidth.

　　Frequently Asked Questions:

　　Q: If bandwidth is saturated but CPU load is normal, is it necessarily malicious access?

　　A: Not necessarily. Legitimate users may simply be downloading large files such as videos or compressed archives. However, if the downloads are clearly concentrated on a few suspicious IPs, malicious crawlers or attack scripts are the likely cause.

　　Q: Is it necessary to purchase DDoS-protected IPs to solve this problem?

　　A: Not necessarily. In many cases, log analysis combined with firewall blocking and a CDN or WAF reduces malicious traffic effectively. DDoS-protected IPs are generally needed only for sustained, high-intensity attacks.

　　Q: How can I determine whether a crawler is malicious?

　　A: Check whether its User-Agent (UA) looks normal, whether its request frequency is too high, whether it ignores robots.txt, and whether it fetches large amounts of static resources in a short time using many concurrent connections.

　　Q: Will malicious traffic still reach the origin server after a CDN is deployed?

　　A: Yes, but it will be greatly reduced. Configuring reasonable caching strategies and access restriction rules can further reduce the pressure on the origin server.

　　Q: Will blocking IPs mistakenly affect legitimate users?

　　A: For an IP that is clearly sending abusive request volumes, the risk of a false positive is low. If you are concerned, use rate limiting instead of an outright block.
