An application on a CentOS server suddenly crashes with "Cannot allocate memory" in its logs, or the system becomes abnormally slow and stops responding to terminal commands. This usually means memory has been exhausted and the system can no longer satisfy new allocation requests. Don't panic: memory allocation failures are troublesome, but a systematic approach will let you find the root cause and fix the problem.
First, it helps to understand the core of Linux (and therefore CentOS) memory management. When an application requests memory, the system allocates from free physical memory first. If physical memory runs short, the kernel falls back on swap space, moving inactive pages out of RAM to a swap partition or swap file on disk to free up room. When both physical memory and swap are nearly exhausted, new memory requests fail. Even when memory appears to be available, certain requests can still fail because of memory fragmentation (especially requests for contiguous memory) or because of limits imposed by kernel parameters (such as per-process or per-user memory limits). Another key player is the Out of Memory (OOM) killer: when memory is severely depleted and the system is on the verge of collapse, the kernel selects and terminates one or more processes (typically those consuming the most memory and deemed most expendable) to reclaim memory and keep the system running. If a process vanishes without warning, this is very likely the reason.
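The kernel's choice of victim is driven by a per-process OOM score, which you can inspect and, within limits, adjust. A minimal sketch, assuming `1234` stands in for the PID of a real process on your system:

# View a process's current OOM score (higher = more likely to be killed)
cat /proc/1234/oom_score
# Make a critical process less attractive to the OOM killer
# (oom_score_adj ranges from -1000 to 1000; -1000 exempts the process entirely)
echo -500 | sudo tee /proc/1234/oom_score_adj

Lowering a service's `oom_score_adj` does not add memory; it only shifts the pressure onto other processes, so use it sparingly for truly critical daemons.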
When a problem occurs, the first step is not to reboot the server but to calmly gather information. If the terminal still responds, a handful of commands will quickly paint a picture of memory usage. Start with `free -h` to view overall memory and swap usage. Pay special attention to the `available` column (CentOS 7 and later): it reflects how much memory the system can immediately hand to new applications better than the `free` column does. If both `available` and free swap are extremely low (say, only tens of MB left), the system is exhausted. Next, use `top` or the more intuitive `htop` (requires installation) and sort by memory usage (press `Shift+M` in `top`) to see at a glance which process(es) are the memory hogs. In the `top` output, `%MEM` is the percentage of physical memory a process uses, `VIRT` is its virtual memory size (including allocated but unused address space), and `RES` is its resident set, the physical memory it actually occupies. Focus on the processes with the highest `%MEM` and `RES` values.
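For a quick non-interactive snapshot (handy for pasting into a ticket or saving to a log), `ps` can produce the same ranking without opening `top`:

# Overall memory and swap picture
free -h
# Top 10 processes by memory usage, non-interactively (first line is the header)
ps aux --sort=-%mem | head -n 11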
Memory leaks, however, are often more insidious: a process may start with modest memory usage and grow slowly until it eventually brings the system down. In that case you need to track the trend over time. Run `vmstat 2 5`, which samples every 2 seconds, five times in total, and watch the `si` (swap in, disk to memory) and `so` (swap out, memory to disk) columns. Consistently high non-zero values mean memory is tight and the system is swapping heavily, which causes a sharp performance drop (disk is several orders of magnitude slower than RAM). Another powerful tool is `sar` (part of the `sysstat` package), which keeps historical memory-pressure data; view it with `sar -r`.
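To catch a slow leak in a specific process, it may help to log its resident memory once a minute and compare samples later. A minimal sketch, in which the process name `myapp` and the log path are placeholders:

# Log the total RSS (in KB) of all "myapp" processes once a minute; stop with Ctrl+C
while true; do
    echo "$(date '+%F %T') $(ps -o rss= -C myapp | awk '{s+=$1} END {print s}') KB" >> /tmp/myapp_rss.log
    sleep 60
done

If the logged value keeps climbing while the workload stays flat, a leak is the likely explanation.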
If these routine checks fail to pinpoint the problem, or the failure only occurs for a specific operation or user, dig deeper. Check resource limits: `ulimit -a` shows the current user's limits, including `max memory size` (`-m`, in kbytes) and `virtual memory` (`-v`, in kbytes); on modern kernels it is mainly the virtual memory limit that causes allocations to fail, since the resident-size limit is largely unenforced. These limits may be configured in `/etc/security/limits.conf` or, for systemd services, in `/etc/systemd/system.conf` or the unit file itself. Analyze kernel logs: the OOM killer leaves a "last message" when it acts. Run `dmesg | tail -50` or check `/var/log/messages` directly, searching for keywords such as "Out of memory" and "Killed process". A record like "Out of memory: Kill process 12345 (java) score XXX" clearly identifies the "culprit" and the "victim".
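A quick way to pull every OOM event out of the logs at once (the paths are the CentOS defaults; `-T` asks `dmesg` for human-readable timestamps where supported):

# Search the kernel ring buffer for OOM killer activity
dmesg -T | grep -iE 'out of memory|killed process'
# Search the persistent logs, including rotated copies
sudo grep -iE 'out of memory|killed process' /var/log/messages*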
Memory fragmentation is another potential cause, especially for applications that need large blocks of physically contiguous memory (such as some databases or scientific-computing software). It can be diagnosed from `/proc/buddyinfo`, which lists, for each zone, how many free blocks are available at each order (order n being a block of 2^n contiguous pages). If the counts at higher orders (say, order 3 and above) are close to zero while low-order blocks remain plentiful, external fragmentation is severe.
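For example (the counts below are illustrative, not taken from a real system), each row runs from order 0 on the left to order 10 on the right:

cat /proc/buddyinfo
Node 0, zone   Normal   4096   2048   1024     12      3      0      0      0      0      0      0

Here small blocks are plentiful, but the counts collapse from order 3 onward, so a request for a large physically contiguous block could fail even though `free` still reports memory available.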
Once the root cause is identified, the fix becomes more targeted. If a specific process is consuming memory abnormally, the most immediate short-term remedy is to terminate or restart it (`kill PID` or `systemctl restart service_name`), but first confirm from the logs and the business context whether it is a core service. If the application is leaking memory, have the developers fix the code or configure a restart policy for the application. For problems caused by configured limits, an overly tight `ulimit` can be lifted temporarily with `ulimit -m unlimited` or `ulimit -v unlimited` (effective only for the current session). For a permanent change, edit `/etc/security/limits.conf` and set the appropriate item for the user or group: `as` corresponds to `ulimit -v` (address space), `rss` to `ulimit -m`, and `memlock` to `ulimit -l` (locked memory). For example, to lift the locked-memory limit for a group:
@developers soft memlock unlimited
@developers hard memlock unlimited
Changes to `limits.conf` take effect only for newly logged-in sessions. Note that systemd services do not read `limits.conf`; for them, set the corresponding `Limit*=` directive in the unit file (or a drop-in override) and restart the service.
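A sketch of the systemd equivalent, assuming a unit named `myapp.service` (the name and values are placeholders):

# Create a drop-in override for the unit; this opens an editor
sudo systemctl edit myapp.service
# Add, for example:
#   [Service]
#   LimitMEMLOCK=infinity    (analogous to memlock in limits.conf / ulimit -l)
#   LimitAS=infinity         (analogous to as in limits.conf / ulimit -v)
# Then restart so the new limits apply
sudo systemctl restart myapp.service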
A more common scenario is insufficient overall server memory. In this case, the long-term solution is to increase physical memory. However, before upgrading hardware, existing resources can be optimized: 1) Increase swap space appropriately. If the swap space is already full, a new swap file can be created:
sudo fallocate -l 2G /swapfile_ext
sudo chmod 600 /swapfile_ext
sudo mkswap /swapfile_ext
sudo swapon /swapfile_ext
For the change to persist across reboots, add `/swapfile_ext swap swap defaults 0 0` to `/etc/fstab`. 2) Adjust the kernel's swap bias. The value of `vm.swappiness` (default 60, range 0-100) controls how aggressively the kernel uses swap. For applications such as databases that should touch swap as little as possible, set a lower value temporarily with `sysctl vm.swappiness=10`; to make it permanent, add `vm.swappiness=10` to `/etc/sysctl.conf` and apply it with `sysctl -p`. 3) Clean up the page cache. In testing or emergency situations, if the cached data is known to be unimportant, the page cache and dentry/inode caches can be released with `sync && echo 3 > /proc/sys/vm/drop_caches` (as root). Note that this is only a temporary emergency measure; the kernel will start caching again immediately. It is mainly useful for testing whether the apparent memory shortage is simply cache pressure.
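Putting the persistence steps together in one place (the swap-file path matches the example above; back up `/etc/fstab` before editing):

# Make the new swap file permanent
echo '/swapfile_ext swap swap defaults 0 0' | sudo tee -a /etc/fstab
# Lower swappiness now and across reboots
sudo sysctl vm.swappiness=10
echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p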
Finally, build monitoring and prevention into your routine. Use `cron` jobs to run monitoring scripts periodically, recording `free` and `top` output, or configure a monitoring stack such as Zabbix or Prometheus to alert when memory usage exceeds, say, 90%. For critical applications, directives such as `MemoryMax` and `MemoryHigh` (or `MemoryLimit` on older systemd versions like the one shipped with CentOS 7) can be set in their systemd service unit files (`.service`) to cap memory usage and keep a single service from dragging down the entire system.
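A minimal sketch of such a cron-driven recorder; the script path and log location are assumptions, not part of any standard tooling:

#!/bin/bash
# /usr/local/bin/mem_snapshot.sh - append a timestamped memory snapshot to a log (hypothetical path)
{
    date '+%F %T'
    free -h
    ps aux --sort=-%mem | head -n 6
    echo '---'
} >> /var/log/mem_snapshot.log

Schedule it with an entry such as `*/5 * * * * /usr/local/bin/mem_snapshot.sh` in root's crontab; the resulting log makes it much easier to reconstruct what was eating memory before a crash.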
Handling memory allocation failures on CentOS calls for a layered approach: emergency process termination in the short term, parameter tuning and swap expansion in the medium term, and capacity planning plus application optimization in the long term. Understand the different layers of causes behind memory exhaustion and use the system's diagnostic tools skillfully, and you will handle the next "memory crisis" with ease and keep your services running stably.