While using a cloud server, you might suddenly find yourself logging in and encountering system errors: the website can't write logs, the database is malfunctioning, and even SSH login becomes sluggish. A disk check reveals the root directory `/` is 100% full. This situation is very common in actual operations and maintenance, and it often happens suddenly. Many people's first reaction is to "quickly delete some things," but without understanding how the space became full, it's easy to delete critical files, or even cause the service to crash completely.
The core idea for handling this type of problem is quite simple: first identify what is occupying the space, then decide whether to clean up or to expand, rather than acting blindly. The most basic step is to confirm the disk usage:
df -h
This command quickly shows the usage of each partition. If the / partition is close to or at 100%, the problem really is in the root directory. The next step is not to delete anything, but to locate the biggest space hog.
A very useful command is:
du -h --max-depth=1 /
It summarizes space usage one directory level deep, helping you quickly see which directories consume the most. The most common culprits are /var, /home, /usr, and /root, and /var is almost always the hardest hit, since logs, caches, and database data mostly live there.
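If the output is hard to scan, a sorted variant (assuming GNU coreutils, which provides `sort -h`) makes the biggest consumers stand out immediately:
du -sh /* 2>/dev/null | sort -rh | head -15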
If /var is consuming a large amount of space, you can investigate further:
du -h --max-depth=1 /var
Often you'll find that `/var/log` or `/var/lib` is unusually large. Log files are the easiest to overlook: without log rotation, Nginx, Apache, MySQL, and system logs can grow without bound, and a single log file can reach tens of gigabytes.
You can use the following command to find large files:
find / -type f -size +100M -exec ls -lh {} \;
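If the plain listing is too noisy, a sorted variant (again assuming GNU find and coreutils) shows the largest files first and stays on the root file system thanks to -xdev:
find / -xdev -type f -size +100M -exec du -h {} + 2>/dev/null | sort -rh | head -20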
This lists files larger than 100MB and usually pinpoints the problem quickly. If you see a particularly large .log file, such as access.log or error.log, it is also worth checking whether space is being held by files that have already been deleted:
lsof | grep deleted
This exposes a more subtle situation: a file may already have been deleted, but a process is still holding it open, so the disk space is never released. The space is only reclaimed once the process or service is restarted, or once the still-open file is truncated through /proc as sketched below.
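If restarting the service right away is not an option, the held space can usually be reclaimed by truncating the deleted file through /proc. This is only a sketch: the PID (1234) and file-descriptor number (8) below are placeholders you read off the lsof output first:
# lsof shows, for example: nginx 1234 ... 8w ... /var/log/nginx/access.log (deleted)
# 1234 is the PID and 8 is the fd number — substitute your own values
> /proc/1234/fd/8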
For cleaning up log files, it's not recommended to directly use `rm`. A safer approach is to first clear the contents:
> /var/log/nginx/access.log
This will not affect running services. If the logs really are disposable, set up logrotate to manage them automatically so the problem doesn't come back; a minimal rule is sketched below.
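Assuming Nginx logs live under /var/log/nginx/ and that a drop-in file such as /etc/logrotate.d/nginx is the right place on your distribution (many distributions already ship one), the rule might look like this:
/var/log/nginx/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
Here copytruncate avoids having to signal Nginx to reopen its log files, at the cost of possibly losing a few lines at the moment of rotation.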
Besides logs, another common problem is cached files. For example, system package cache:
apt-get clean
Or in CentOS systems:
yum clean all
Docker is another easily overlooked "space black hole." If you run containers on the server, images, container logs, and intermediate layers accumulate continuously. You can inspect and clean up unused resources with the following commands:
docker system df
docker system prune -a
Before execution, ensure that no critical containers depend on these images; otherwise, it may impact business operations.
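To keep container logs from filling the disk again, the json-file logging driver can be capped. A minimal sketch of /etc/docker/daemon.json (merge it with any existing settings, then restart the Docker daemon; the values are only examples):
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}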
Another issue is the accumulation of backup files. Many people habitually place database backups and website compressed files directly in /root or /home, which can consume a large amount of space over time. You can use:
ls -lh /root
Check for old backup files and delete or migrate them to object storage as needed.
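A quick way to surface stale archives, assuming the usual .tar.gz / .sql.gz / .zip naming and that anything older than 30 days is a candidate (adjust the patterns and age to your setup):
find /root /home -maxdepth 2 -type f \( -name "*.tar.gz" -o -name "*.sql.gz" -o -name "*.zip" \) -mtime +30 -exec ls -lh {} \;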
If the investigation reveals that the space is indeed filled with normal data, such as database growth or increased user file uploads, then it's not a matter of "cleaning up" but rather requires expansion.
Cloud server expansion typically involves two steps: first enlarge the cloud disk in the console, then grow the partition and file system inside the operating system. After the cloud disk is enlarged, the extra space is not picked up automatically; the partition and file system have to be grown by hand.
You can first check the disk:
lsblk
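If lsblk shows the disk is now larger than its partition (say vda is 100G but vda1 is still 50G), the partition usually has to be grown before the file system can be resized. A common tool for this is growpart from the cloud-utils / cloud-guest-utils package; the device name and partition number here are examples, so substitute your own:
growpart /dev/vda 1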
If you are using the common ext4 file system, you can execute:
resize2fs /dev/vda1
If it is an xfs file system, then use:
xfs_growfs /
If LVM (Logical Volume Manager) is in use, expansion is slightly more involved: the physical volume is grown first, then the logical volume, and finally the file system. The upside is greater flexibility, which makes LVM well suited to long-term use.
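As a rough sketch of that sequence, assuming the physical volume is /dev/vdb, the volume group is vg0, the logical volume is lv_root, and the file system is ext4 (check your actual names with pvs, vgs, and lvs):
pvresize /dev/vdb
lvextend -l +100%FREE /dev/vg0/lv_root
resize2fs /dev/vg0/lv_root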
Another point to note is inode exhaustion. Sometimes the disk appears to have free space, yet the system still reports that it is full; the inodes may simply be used up. You can check with:
df -i
Check the inode usage. If inodes are exhausted, the cause is usually a huge number of small files (caches, session files, and the like), which have to be cleaned up before writes succeed again.
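To find which directory is hoarding the inodes, counting files per directory is usually enough. A rough sketch, scoped to /var as the typical culprit (it can take a while on a large tree; point it elsewhere as needed):
for d in /var/*; do echo "$(find "$d" -xdev -type f 2>/dev/null | wc -l) $d"; done | sort -rn | head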
Several pitfalls should be avoided during this process. First, do not arbitrarily delete files in system directories such as /bin, /usr, and /lib, as this may prevent the system from booting. Second, do not delete database files or program data directories without confirming their purpose, otherwise the data may be unrecoverable. Third, it's best to make backups before cleaning, especially in a production environment.
From an operations and maintenance perspective, resolving "disk full" is only the first step; preventing it from happening again matters more. It is recommended to enable log rotation, clear caches regularly, monitor disk usage, and set alert thresholds; a minimal alert is sketched below. Once usage passes 80% it should be taken seriously, rather than waiting for it to hit 100%.
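A minimal sketch of such an alert, meant to run from cron, assuming an 80% threshold and that a syslog entry via logger is enough (swap that line for mail or a webhook in practice):
#!/bin/bash
# Warn when root file system usage crosses the threshold
THRESHOLD=80
USAGE=$(df -P / | awk 'NR==2 {gsub("%",""); print $5}')
if [ "$USAGE" -ge "$THRESHOLD" ]; then
    logger -t disk-alert "Root filesystem usage is at ${USAGE}%"
fi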
Additionally, mount log, uploaded file, and database directories to separate data disks instead of piling them all in the root directory. This way, even if one directory is full, it won't affect the entire system.
Many people realize the importance of capacity planning after experiencing "disk full" once. Cloud server resources are elastically scalable, but this requires proactive problem detection, not just waiting for service crashes before taking action.
In summary, a full root directory isn't necessarily a cause for alarm; the key is the approach: first, use `df` and `du` to identify the problem, then use `find` to locate large files, and properly clean up logs, caches, and unnecessary data; if the growth is normal, then decisively expand the disk. With a clear strategy, these types of problems can be resolved quickly.
Frequently Asked Questions:
Q: What's the safest thing to delete first when the disk is full?
A: Prioritize cleaning up log files (e.g., `/var/log`), caches (apt/yum/docker), and old backup files. These generally won't affect system operation.
Q: Why doesn't deleting a large file with `rm` release the space?
A: The file may still be held open by a process; the space is released only after the relevant service or process is restarted.
Q: Why isn't `df` showing any change after expanding the disk?
A: Because the file system hasn't expanded; you need to manually execute `resize2fs` or `xfs_growfs`.
Q: What should I do if inodes are exhausted?
A: Delete a large number of small files, such as cache or session files, to free up inodes.
Q: How to prevent the disk from becoming full again?
A: Enable log rotation, set up monitoring alerts, partition sensibly, and clean up regularly; these are the most effective measures.