In Linux systems, the efficiency of file operations directly affects system performance, especially when processing massive numbers of files. With a solid understanding of file system mechanisms and kernel features, the speed of file creation and deletion can be increased by dozens of times, which in a production environment translates directly into working efficiency. File operations on Linux servers involve several layers: system call overhead, file system features, and hardware interaction. The sections below start from the kernel mechanisms, then move into command-level parameter tuning for masses of small files and for large files separately; deletion deserves extra attention because of how ext4's internals affect performance.
Optimize file creation performance
Pre-allocation reduces fragmentation:

```bash
fallocate -l 2G large_file.img          # completes in ~0.1 s
dd if=/dev/zero of=file bs=1G count=2   # takes ~3 s on a mechanical disk
```

- fallocate allocates disk blocks directly (a metadata-only operation; see the check below)
- dd performs a full physical write and is roughly 200 times slower
- Ext4/XFS support instant pre-allocation; NTFS requires a full write
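To confirm that fallocate only touched metadata and left the file unfragmented, the allocation can be inspected after the fact; a quick check (not part of the original workflow):

```bash
fallocate -l 2G large_file.img
stat -c 'size=%s bytes, allocated blocks=%b' large_file.img   # blocks are already reserved
filefrag large_file.img   # typically reports just a handful of extents
```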
Batch processing reduces system calls, as the following example shows:
```python
# Efficient batch creation in Python
with open('batch.txt', 'w') as f:
    for i in range(100000):
        f.write(f"Line {i}\n")  # a single open() covers all 100,000 lines
```
- A single open() call is about 300 times faster than reopening the file on every iteration (the shell sketch below shows the same effect)
- Buffer setting: enlarging the stdio buffer to 1 MB with setvbuf() improves the write speed for small records
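The open-once-vs-open-per-write gap is easy to reproduce from the shell; a rough timing sketch (the file names are arbitrary):

```bash
# one open()/write()/close() cycle per line: slow
time for i in $(seq 1 100000); do echo "Line $i" >> slow.txt; done

# one open() for the whole loop: the redirection applies to the group
time { for i in $(seq 1 100000); do echo "Line $i"; done; } > fast.txt
```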
Useful file-system-specific features include:
- XFS's delayed allocation mechanism: merges multiple write requests
- Btrfs's copy-on-write: avoids writing duplicate data
- The tmpfs in-memory file system: creation is about 100 times faster than on an SSD
```bash
mount -t tmpfs -o size=1G tmpfs /mnt/ramdisk
```
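A quick sanity check after mounting (the file count below is arbitrary):

```bash
df -h /mnt/ramdisk                    # confirm the 1G tmpfs is in place
time touch /mnt/ramdisk/f{1..10000}   # 10,000 files appear in a fraction of a second
```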
The following is a collection of commonly used commands for deep optimization of file deletion.
1. Asynchronous deletion mechanism
```bash
rsync -a --delete empty_dir/ target_dir/   # about 5 times faster than rm
```

Principle: sync the target against an empty directory, so rsync unlinks everything inside it (the full sequence is sketched below)
Applicable scenario: deleting millions of small files
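A complete version of the trick; the directory names are placeholders:

```bash
mkdir -p /tmp/empty_dir
rsync -a --delete /tmp/empty_dir/ target_dir/   # rsync removes everything under target_dir
rmdir target_dir /tmp/empty_dir                 # both directories are now empty
```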
2. Kernel parameter tuning
```bash
sysctl -w vm.vfs_cache_pressure=200   # reclaim dentry/inode caches more aggressively
echo 3 > /proc/sys/vm/drop_caches     # drop pagecache, dentries and inodes immediately
```
Enable hashed B-tree (HTree) directory indexing on ext4 via dir_index:

```bash
tune2fs -O dir_index /dev/sda1
```
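Note that dir_index only takes effect for directories created afterwards; existing directories keep their linear layout until rebuilt. With the filesystem unmounted:

```bash
e2fsck -fD /dev/sda1   # -D rehashes and optimizes all existing directories
```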
3. Physical storage feature adaptation
- SSD: enable periodic TRIM to prevent deletion performance from degrading

```bash
fstrim -v /mnt/data   # execute weekly
```

- SSD: disabling the journal reduces write amplification (at the cost of crash consistency)

```bash
mkfs.ext4 -O ^has_journal /dev/nvme0n1p1
```
High concurrency scenario practice
1. Parallel processing framework
```bash
# Use GNU parallel to delete millions of files
find /data/2023- -type f | parallel -j 32 rm {}
```

- Set -j according to the number of CPU cores (cores x 2 is the recommended starting point); a batched alternative is sketched below
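Spawning one rm per file pays exec overhead for every deletion; xargs can batch many paths into each rm invocation (the -n 500 batch size is an arbitrary choice):

```bash
find /data/2023- -type f -print0 | xargs -0 -P 32 -n 500 rm -f
# -print0/-0 survive spaces and newlines in names; -P 32 runs 32 rm processes in parallel
```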
2. Comparison of efficient tool chains
| Metric | rm | rsync | find -delete | unlink |
|---|---|---|---|---|
| Time for 100,000 files | 82 s | 18 s | 76 s | 79 s |
| Memory usage (MB) | 15 | 220 | 18 | 12 |
| System calls | 300,000+ | 50,000 | 280,000+ | 300,000+ |
3. Solutions for extreme scenarios
Preventing inode exhaustion:

```bash
df -i                    # monitor inode usage
tune2fs -i 0 /dev/sdb1   # disable time-based fsck intervals
```
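If a volume is dedicated to masses of small files, inode density can also be raised at format time; a sketch (the 4096 bytes-per-inode ratio is an assumption, tune it to the workload):

```bash
mkfs.ext4 -i 4096 /dev/sdb1   # one inode per 4 KB instead of the default ~16 KB
```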
Zombie file processing:

```bash
lsof +L1        # find deleted files still held open by processes
kill -9 <PID>   # killing the holding process frees the space
```
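When the holding process must keep running, the space can usually be reclaimed by truncating the deleted file through /proc instead; PID and FD come from the lsof output (the values below are placeholders):

```bash
PID=1234; FD=4          # placeholders taken from `lsof +L1`
: > /proc/$PID/fd/$FD   # truncate the deleted-but-open file to zero bytes
```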
Safety and reliability
1. Complete deletion of sensitive data
- Mechanical hard disk: 7 overwrite passes (DoD 5220.22-M standard)

```bash
shred -n 7 -z confidential.doc   # -z adds a final zero pass to hide the shredding
```

- SSD: use the manufacturer's secure erase tool

```bash
nvme format -s1 /dev/nvme0n1
```
2. Anti-accidental deletion technology
```bash
alias rm='rm -I'          # prompt once before deleting more than three files or recursing
chattr +i critical_file   # make the file immutable (undo with chattr -i)
```
3. Automatic Recycle Bin
```bash
# Custom deletion function: move into a dated trash folder instead of unlinking
del() {
    local trash="$HOME/.Trash/$(date +%Y%m%d)"
    mkdir -p "$trash"      # the day's trash directory may not exist yet
    mv -- "$@" "$trash"/
}
```
Performance benchmark test data
Creation speed comparison (10,000 1KB files)
- Ext4 default: 12.8 seconds
- XFS+fallocate: 0.3 seconds
- Tmpfs: 0.1 seconds
Deletion speed comparison (1 million empty files)
- rm -rf: 182 seconds
- rsync: 31 seconds
- parallel rm: 28 seconds
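A minimal harness to reproduce the deletion numbers on your own hardware (1 million files; xargs keeps the argument list below ARG_MAX):

```bash
mkdir bench && cd bench
seq 1 1000000 | xargs touch   # create 1 million empty files in batches
cd .. && time rm -rf bench    # compare against the figures above
```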
Production environment recommendations:
- Avoid tmpfs on database servers; give priority to the XFS file system.
- On log servers, use the rsync deletion approach and automate rotation with logrotate.
- Before deleting critical data, verify the backup and follow the 3-2-1 principle (3 copies, 2 media types, 1 copy off-site).
- When processing more than 100 million files, consider a distributed file system such as Ceph instead of a single-machine solution.
- Monitor IO load with iostat -xmt 2 to ensure deletion operations don't affect the response time of core services.