Linux Shell text processing core technology and practical application
Time : 2025-09-11 14:55:25
Edit : Jtti

The Linux Shell is a fundamental tool for system administrators and developers, offering powerful text processing capabilities. The following discusses core text processing techniques within the Shell environment, including common commands, advanced techniques, and practical applications.

The foundation of text processing lies in mastering the combined use of core commands. The grep command is used for pattern matching and text searching. Its basic syntax is `grep [option] pattern [file]`. Common options include -i (ignore case), -v (invert match), -n (display line numbers), and -r (recursive search). For example, to search the current directory and its subdirectories for files containing "error" and display the matching line numbers:

grep -rn "error" .
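
These options can also be combined: the following searches a log file (the name app.log is only a placeholder) for "timeout" regardless of case while filtering out lines that also mention "debug":

# Case-insensitive search for "timeout", excluding lines that also contain "debug"
grep -i "timeout" app.log | grep -iv "debug"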

awk is a powerful text analysis tool that excels at processing structured text data. It works by scanning input line by line and splitting each line into fields. An awk program typically consists of pattern-action pairs:

# Print the first and third columns
awk '{print $1, $3}' filename
# Process /etc/passwd using colons as delimiters
awk -F: '{print $1, $6}' /etc/passwd
# Count the number of lines in a file
awk 'END {print NR}' filename
# Sum the third column
awk '{sum += $3} END {print sum}' data.txt
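
A pattern can also gate the action so that only matching lines are processed; the threshold and file name below are purely illustrative:

# Print the first and third columns only for lines whose third column exceeds 100
awk '$3 > 100 {print $1, $3}' data.txt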

sed is a stream editor used for filtering and transforming text. It supports regular expressions and can perform search and replace, line selection, conditional processing, and other operations:

# Replace all occurrences of "old" with "new" in a file
sed 's/old/new/g' filename
# Replace only the first occurrence of "old" on each line
sed 's/old/new/' filename
# Delete blank lines
sed '/^$/d' filename
# Replace only matching lines
sed '/pattern/s/old/new/g' filename
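
Line selection works the same way; for example, -n together with the p command prints only a chosen range of lines (the range here is arbitrary):

# Print only lines 10 through 20
sed -n '10,20p' filename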

The sort command is used to sort text lines, supporting various sorting methods and duplicate removal:

# Sort by number
sort -n file.txt
# Sort by the second column
sort -k2 file.txt
# Sort with duplicate removal
sort -u file.txt
# Sort in reverse order
sort -r file.txt
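
These options can be combined; as a sketch, the following sorts /etc/passwd numerically by its third colon-separated field, the UID:

# Sort /etc/passwd numerically by its third colon-separated field (the UID)
sort -t: -k3,3 -n /etc/passwd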

The uniq command is often used in conjunction with sort to remove or count duplicate lines; because uniq only detects adjacent duplicates, the input is usually sorted first:

# Count duplicate line occurrences
sort file.txt | uniq -c
# Display lines that appear only once
sort file.txt | uniq -u
# Display one copy of each duplicated line
sort file.txt | uniq -d

The cut command is used to extract specific fields from a text line:

# Extract the first 10 characters of each line
cut -c1-10 filename
# Extract the first field with a colon as the delimiter
cut -d: -f1 /etc/passwd
# Extract multiple fields
cut -d, -f1,3,5 data.csv

The paste command is used to merge file lines:

# Merge two files in parallel
paste file1.txt file2.txt
# Use a specified delimiter
paste -d, file1.txt file2.txt
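
paste can also join all lines of a single file into one line with -s; the comma-delimited output below is just an example:

# Join all lines of one file into a single comma-separated line
paste -sd, file1.txt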

The tr command is used to convert and delete characters:

# Convert lowercase to uppercase
tr 'a-z' 'A-Z' < filename
# Delete numeric characters
tr -d '0-9' < filename
# Squeeze repeated spaces into a single space
tr -s ' ' < filename
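
Combined with other commands, tr is handy for quick word-frequency counts; the following sketch splits text into one word per line (the file name is assumed):

# Split text into one word per line, then list the most frequent words
tr -s ' ' '\n' < filename | sort | uniq -c | sort -nr | head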

The wc command is used to count text information:

# Count lines, words, and characters
wc filename
# Count only the number of lines
wc -l filename
# Count the number of entries in the current directory
ls | wc -l

The find command, combined with text processing, allows for powerful file content searches:

# Find files with specific content in the current directory
find . -type f -exec grep -l "pattern" {} \;
# Find and replace text in multiple files
find . -name "*.txt" -exec sed -i 's/old/new/g' {} \;

The xargs command converts standard input into command-line arguments:

# Find and delete files
find . -name "*.tmp" | xargs rm
# Process multiple files in parallel
find . -name "*.log" | xargs -P 4 -I {} gzip {}

Regular expressions are a core technology for text processing. Tools such as grep, sed, and awk support both basic regular expressions (BRE) and extended regular expressions (ERE):

# Use extended regular expressions to match IP addresses
grep -E '([0-9]{1,3}\.){3}[0-9]{1,3}' file
# Use regular expression replacement
sed -E 's/([0-9]+)/\1digit/g' file
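
The practical difference between the two flavors mainly shows up in how metacharacters such as braces are written; both commands below match two or more consecutive digits:

# BRE: the interval braces must be escaped
grep '[0-9]\{2,\}' file
# ERE: the same pattern without escapes
grep -E '[0-9]{2,}' file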

In practical applications, these commands often need to be used in combination. For example, to analyze a web server log and extract the most frequently accessed IP addresses:

awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -10

Process a CSV file and calculate the sum of a specific column:

awk -F, '{sum += $3} END {print "Sum:", sum}' data.csv

Batch rename files:

find . -name "*.jpg" | awk '{printf "mv %s %s\n", $0, $0}' | sed 's/\.jpg/\.jpeg/' | sh

Monitor a log file and display error messages in real time:

tail -f application.log | grep --line-buffered "ERROR"

Remove duplicate text and preserve original order:

awk '!seen[$0]++' filename
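
This works because seen[$0]++ evaluates to 0 (false) the first time a line is encountered and to a non-zero value afterwards, so each line is printed only on its first occurrence.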

Extract the intersection of two files (assuming neither file contains duplicate lines of its own):

sort file1.txt file2.txt | uniq -d

Extract the symmetric difference of two files, that is, lines that appear in only one of them (again assuming no duplicates within each file):

sort file1.txt file2.txt | uniq -u
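
When the inputs are already sorted, the comm command offers the same comparisons more directly; its three output columns are lines unique to the first file, lines unique to the second, and lines common to both (the sorted file names below are assumptions):

# Intersection: suppress the lines unique to either file
comm -12 sorted1.txt sorted2.txt
# Lines unique to each file: suppress the common column
comm -3 sorted1.txt sorted2.txt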

Awk demonstrates its power when processing multi-line text patterns:

# Process multi-line records
awk 'BEGIN {RS=""; FS="\n"} {print "Record:", NR, "Yes", NF, "Field"}' data.txt

Performance optimization is an important consideration when processing large files. Using LC_ALL=C can significantly increase command processing speed:

LC_ALL=C grep "pattern" large_file.txt
LC_ALL=C sort large_file.txt > sorted_file.txt

For extremely large files, use the split command:

# Split the large file into smaller files of 1000 lines each
split -l 1000 large_file.txt chunk_
# Process the split files in parallel
find . -name "chunk_*" | xargs -P 4 -I {} process_file.sh {}

Mastering these text processing techniques takes considerable practice, but once they become second nature they will greatly improve your productivity in a Linux environment. Practicing on real-world tasks is the best way to gradually internalize the full depth of these powerful tools.
