Linux Log Parsing, Text Manipulation and Data Analysis Commands

Comprehensive guide to Linux log parsing and text manipulation commands including sed, awk, grep, cut, sort, jq, and more. Master data analysis, log monitoring, and text processing for system administration.

#linux #log-parsing #text-processing #sed #awk #grep #data-analysis #sysadmin

About Log Parsing Commands

Log parsing and text manipulation are essential skills for system administrators and DevOps professionals. These commands help you analyze log files, extract patterns, manipulate text streams, and process structured data efficiently.

From simple pattern matching with grep to complex text transformations with sed and awk, mastering these tools will significantly boost your productivity in troubleshooting and data analysis tasks.

Stream Editors & Pattern Processing

sed - Stream Editor

sed (stream editor) transforms text as it flows through a pipeline. It's used for text manipulation, including search and replace, insertion, and deletion, based on regular expressions.

# Replace first occurrence
$ sed 's/old/new/' file.txt

# Replace all occurrences (global)
$ sed 's/old/new/g' file.txt

# Edit file in-place
$ sed -i 's/old/new/g' file.txt

# Delete lines matching pattern
$ sed '/pattern/d' file.txt

# Print only lines matching pattern
$ sed -n '/pattern/p' file.txt
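
The description above also mentions insertion; a few more patterns cover that plus address ranges. This is a minimal sketch in GNU sed syntax (BSD/macOS sed handles -i and i/a slightly differently):

# Insert a line before each matching line
$ sed '/pattern/i\New line above' file.txt

# Append a line after each matching line
$ sed '/pattern/a\New line below' file.txt

# Restrict a substitution to lines 10-20
$ sed '10,20s/old/new/g' file.txt

# In-place edit, keeping a .bak backup
$ sed -i.bak 's/old/new/g' file.txt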

awk - Pattern Scanning and Text Processing Tool

AWK is a versatile text processing tool, primarily used for data manipulation, text reporting, and actions based on field-separated data. It operates line by line and is particularly useful for working with structured data.

# Print specific columns
$ awk '{print $1, $3}' file.txt

# Print with custom delimiter
$ awk -F: '{print $1}' /etc/passwd

# Sum a column
$ awk '{sum+=$1} END {print sum}' file.txt

# Filter rows with conditions
$ awk '$3 > 100 {print $0}' file.txt

# Pattern matching
$ awk '/ERROR/ {print $0}' logfile.txt
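
Because awk supports associative arrays and END blocks, it can build small one-pass reports. A minimal sketch, assuming an Apache-style access log where field 9 is the HTTP status code:

# Count requests per status code
$ awk '{count[$9]++} END {for (code in count) print code, count[code]}' access.log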

echo - Display Text or Output

The echo command prints text or variables to standard output (usually the terminal). It's commonly used for displaying messages and feeding text into pipelines or files from shell scripts.

# Basic output
$ echo "Hello, World!"

# Without trailing newline
$ echo -n "No newline"

# Enable escape sequences
$ echo -e "Line1\nLine2\tTabbed"

# Output variables
$ echo $PATH

# Redirect to file
$ echo "Log entry" >> logfile.txt

grep - Global Regular Expression Print

grep is used for searching text using regular expressions. It scans text and outputs lines that match the specified pattern.

# Basic search
$ grep "pattern" file.txt

# Case insensitive
$ grep -i "error" logfile.txt

# Recursive search
$ grep -r "TODO" /project/

# Show line numbers
$ grep -n "error" file.txt

# Invert match (exclude pattern)
$ grep -v "DEBUG" logfile.txt

# Count matches
$ grep -c "error" logfile.txt

# Show context (lines before/after)
$ grep -A 3 -B 3 "error" logfile.txt
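
grep can also print only the matched text (-o), combined here with extended regular expressions (-E), which is handy for extracting values from logs. For example, pulling IPv4-looking addresses out of an access log (a loose pattern, not a strict validator):

# Extract only the matching text
$ grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' access.log

# Match whole words only
$ grep -w "error" logfile.txt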

🌐 Network logs: Use grep with our Linux Networking Commands for troubleshooting | Advanced patterns in Linux Admin Tips
🔍 Analyze DNS logs: DNS Lookup tool for DNS troubleshooting

Advanced Search Tools

ngrep - Network Packet Analyzer

ngrep is a network packet analyzer tool that allows you to search for patterns in network traffic. It's useful for monitoring network activity and filtering packets based on regular expressions.

# Monitor HTTP traffic
$ sudo ngrep -q -W byline "^(GET|POST)" tcp port 80

# Search for specific pattern
$ sudo ngrep -d any "password" port 80

# Monitor DNS queries
$ sudo ngrep -q -d any port 53

ripgrep (rg) - Line-Oriented Search Tool

ripgrep is a line-oriented search tool that recursively searches your current directory for a regex pattern. It's designed to be fast and efficient, making it a popular choice for code searching and text processing.

# Basic search
$ rg "pattern"

# Search specific file types
$ rg "function" -t py

# Case insensitive
$ rg -i "error"

# Show context
$ rg -C 3 "pattern"

# Search hidden files
$ rg --hidden "pattern"

agrep - Approximate Grep

agrep stands for "approximate grep". It allows you to perform approximate string matching, useful for finding similar or misspelled words in text.

# Search with 1 error allowed
$ agrep -1 "patern" file.txt

# Case insensitive approximate search
$ agrep -i -2 "linux" file.txt

ugrep - Ultra-Fast Search Tool

ugrep is a command-line search tool that supports recursive search, regex patterns, and Unicode. It aims to be a feature-rich and efficient alternative to traditional grep.

# Basic recursive search
$ ugrep "pattern" -r

# Search with fuzzy matching
$ ugrep -Z "patern"

# Interactive search
$ ugrep -Q

ack - Developer-Friendly Code Search

ack is a tool for searching text and code. It's designed to be developer-friendly, automatically skipping version control directories and binary files by default.

# Search in code
$ ack "function"

# Search specific file types
$ ack --python "class"

# Ignore case
$ ack -i "error"

# List matching files only
$ ack -l "TODO"

ag (The Silver Searcher) - Code Search

The Silver Searcher, commonly known as "ag", is another code-searching tool optimized for speed and popular among developers. Its usage is similar to ack and ripgrep.

# Basic search
$ ag "pattern"

# Search specific file types
$ ag --python "class"

# Case sensitive search
$ ag -s "Pattern"

# Show context
$ ag -C 2 "pattern"

pt (The Platinum Searcher) - Code Search

The Platinum Searcher is yet another code searching tool that focuses on speed and efficiency. It's similar to ag and ripgrep.

# Basic search
$ pt "pattern"

# Ignore case
$ pt -i "error"

# Search with file type filter
$ pt --go "func"

πŸ› οΈ More tools: Explore our DevOps diagnostic tools | Basic commands: Linux Commands reference

Text Manipulation

cut - Extract Sections from Lines

The cut command is used for extracting sections from lines of files or data streams. It's often used to isolate specific fields or columns from text.

# Cut by delimiter
$ cut -d: -f1 /etc/passwd

# Cut by character position
$ cut -c1-10 file.txt

# Multiple fields
$ cut -d, -f1,3 data.csv

sort - Sort Lines of Text

sort is used for sorting lines of text files or data streams in ascending or descending order. It's helpful for organizing data.

# Basic sort
$ sort file.txt

# Reverse sort
$ sort -r file.txt

# Numeric sort
$ sort -n numbers.txt

# Sort by column
$ sort -k2 file.txt

# Remove duplicates
$ sort -u file.txt
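
When fields are separated by something other than whitespace, combine -t (delimiter) with -k. For example, sorting /etc/passwd numerically by UID, its third colon-separated field:

# Sort /etc/passwd numerically by UID (colon-delimited field 3)
$ sort -t: -k3 -n /etc/passwd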

uniq - Remove Duplicate Lines

uniq removes duplicate adjacent lines from a text file or data stream, which is why input is usually sorted first. It's often used in conjunction with sort.

# Remove duplicates
$ sort file.txt | uniq

# Count occurrences
$ sort file.txt | uniq -c

# Show only duplicates
$ sort file.txt | uniq -d

# Show only unique lines
$ sort file.txt | uniq -u

diff - Compare Files Line by Line

diff is a tool for comparing the contents of two files and finding the differences between them. It's often used for code and document comparisons.

# Basic comparison
$ diff file1.txt file2.txt

# Unified format
$ diff -u file1.txt file2.txt

# Side by side
$ diff -y file1.txt file2.txt

# Ignore whitespace
$ diff -w file1.txt file2.txt
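
A common follow-on use of the unified format is generating a patch file that can be applied elsewhere with patch; a minimal sketch:

# Create a patch in unified format
$ diff -u original.txt modified.txt > changes.patch

# Apply the patch to another copy of the file
$ patch original.txt < changes.patch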

tac - Reverse Cat

tac is the reverse of cat. It outputs the lines of a file in reverse order, displaying the last line first and so on.

# Display file in reverse
$ tac file.txt

# Reverse multiple files
$ tac file1.txt file2.txt

cat - Concatenate and Display Files

cat is short for "concatenate". It's used to display the contents of one or more files, or to combine multiple files into a single output.

# Display file
$ cat file.txt

# Concatenate files
$ cat file1.txt file2.txt > combined.txt

# Number lines
$ cat -n file.txt

printf - Format and Print Text

printf is used to format and print text in a specific way. It allows you to control the output format, including the width, precision, and alignment of data.

# Basic formatting
$ printf "Hello, %s\n" "World"

# Format numbers
$ printf "%.2f\n" 3.14159

# Align text
$ printf "%-10s %5d\n" "Item" 100

comm - Compare Two Sorted Files

comm is used to compare two sorted files line by line and display lines that are unique to each file or common to both.

# Show all columns
$ comm file1.txt file2.txt

# Show only lines in file1
$ comm -23 file1.txt file2.txt

# Show only common lines
$ comm -12 file1.txt file2.txt
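
comm requires both inputs to be sorted. If they aren't, you can sort them on the fly with process substitution (bash/zsh):

# Sort unsorted inputs on the fly
$ comm -12 <(sort file1.txt) <(sort file2.txt)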

tr - Translate or Delete Characters

tr is used for translating, deleting, or squeezing characters in a text stream. It's often used for character-level transformations.

# Convert to uppercase
$ echo "hello" | tr 'a-z' 'A-Z'

# Delete characters
$ echo "hello123" | tr -d '0-9'

# Squeeze repeating characters
$ echo "hello" | tr -s 'l'

rev - Reverse Lines Character-wise

rev reverses the characters in each line of a text file or data stream.

# Reverse characters in each line
$ rev file.txt

# Reverse from stdin
$ echo "hello" | rev

wc - Word Count

wc (word count) counts the number of lines, words, and bytes or characters in a text file or data stream.

# Count lines, words, characters
$ wc file.txt

# Count lines only
$ wc -l file.txt

# Count words only
$ wc -w file.txt

# Count bytes only (use -m for characters)
$ wc -c file.txt

nl - Number Lines

nl is used to add line numbers to the lines of a text file or data stream.

# Add line numbers
$ nl file.txt

# Number non-empty lines only
$ nl -b t file.txt

# Custom format
$ nl -n rz -w 3 file.txt

paste - Merge Lines of Files Side by Side

paste is used to merge lines from multiple files side by side. It's commonly used for combining data from different sources.

# Merge files side by side
$ paste file1.txt file2.txt

# Use custom delimiter
$ paste -d, file1.txt file2.txt

# Merge serially
$ paste -s file1.txt

Data Format & Specialized Tools

jq - Command-line JSON Processor

jq is a command-line JSON processor. It's used for querying, manipulating, and formatting JSON data. It's especially handy for parsing JSON in shell scripts.

# Pretty print JSON
$ cat data.json | jq '.'

# Extract specific field
$ cat data.json | jq '.name'

# Filter arrays
$ cat data.json | jq '.items[] | select(.price > 10)'

# Get array length
$ cat data.json | jq '.items | length'

# Transform structure
$ cat data.json | jq '{name: .username, id: .user_id}'
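
The -r flag makes jq emit raw strings instead of JSON-quoted values, which is useful when feeding results to other text tools. A minimal sketch, assuming a line-delimited JSON log with level and message fields (the field names are illustrative):

# Print raw (unquoted) messages of error-level entries
$ jq -r 'select(.level == "error") | .message' app.log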

🐳 Container logs: Parse JSON logs from Docker containers | ☸️ Orchestration: Kubernetes logs and events

csvcut - CSV Column Extraction Utility

csvcut is a utility for working with CSV (Comma-Separated Values) data. It allows you to select specific columns from CSV data.

# Extract specific columns
$ csvcut -c 1,3 data.csv

# Extract by column names
$ csvcut -c "name,price" data.csv

# Show column names
$ csvcut -n data.csv
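
csvcut is part of the csvkit suite, so it composes with its sibling tools; for instance, piping into csvlook renders the selection as an aligned table (assumes csvkit is installed):

# Pretty-print selected columns as an aligned table
$ csvcut -c "name,price" data.csv | csvlook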

ccze - Colorize Log Files

ccze is a tool that colorizes log files or text, making it easier to read and understand logs by highlighting different log levels and patterns.

# Colorize log file
$ cat /var/log/syslog | ccze

# Colorize with specific mode
$ tail -f /var/log/apache2/access.log | ccze -A

# Use specific plugin
$ cat logfile.txt | ccze -p syslog

File Viewing & Monitoring

less & more - Pager Programs

These are both pager programs that allow you to view text files one screen at a time. They are useful for browsing large files without overwhelming your terminal.

# View file with less (recommended)
$ less file.txt

# View file with more (simple)
$ more file.txt

# less navigation:
# Space: next page
# b: previous page
# /pattern: search forward
# ?pattern: search backward
# q: quit

tail - Display Last Lines of File

tail displays the last lines of a file. By default it shows the last 10 lines; use -n to change the count.

# Last 10 lines
$ tail file.txt

# Last 20 lines
$ tail -n 20 file.txt

# Follow file updates (real-time)
$ tail -f /var/log/syslog

# Follow multiple files
$ tail -f file1.log file2.log
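
One caveat with tail -f: after log rotation it keeps following the old, renamed file. The -F flag re-opens the file by name, which is usually what you want for long-running watches:

# Follow by name, surviving log rotation
$ tail -F /var/log/syslog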

📈 Real-time monitoring: Dashboard for infrastructure metrics | Performance benchmarks for server analysis

head - Display First Lines of File

head displays the first few lines of a file. By default, it shows the first 10 lines but can be configured to display a different number of lines.

# First 10 lines
$ head file.txt

# First 20 lines
$ head -n 20 file.txt

# First 100 bytes
$ head -c 100 file.txt
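
head and tail compose well for extracting an arbitrary line range, for example lines 11-20 of a file:

# Take the first 20 lines, then the last 10 of those (lines 11-20)
$ head -n 20 file.txt | tail -n 10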

watch - Execute Command Repeatedly

The watch command repeatedly runs the specified command at regular intervals (by default, every 2 seconds) and displays its output. It's often used for monitoring changing system state, such as disk usage with df or the tail of a log file.

# Monitor command every 2 seconds
$ watch df -h

# Custom interval (1 second)
$ watch -n 1 'ps aux | grep nginx'

# Highlight differences
$ watch -d free -h

# Monitor log file
$ watch 'tail -20 /var/log/syslog'

Comparison & Diff Tools

vimdiff - Visual Diff in Vim

vimdiff starts the Vim text editor in diff mode. It's used for visually comparing and editing files within the Vim environment.

# Compare two files
$ vimdiff file1.txt file2.txt

# Compare three files
$ vimdiff file1.txt file2.txt file3.txt

# Navigation:
# ]c: next difference
# [c: previous difference
# do: diff obtain (get changes from other file)
# dp: diff put (put changes to other file)
# :diffupdate: refresh diff
# :qa: quit all windows

See Also: For basic Linux command reference, check out our Linux Commands Cheatsheet.

Practical Log Parsing Examples

Find all ERROR lines in log file

$ grep "ERROR" /var/log/application.log | tail -20

Count 404 errors in Apache access log

$ awk '$9 == 404 {print $0}' /var/log/apache2/access.log | wc -l

Extract unique IP addresses from log

$ awk '{print $1}' /var/log/nginx/access.log | sort -u
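
Count requests per IP address

# Builds on the previous example; assumes field 1 is the client IP
$ awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10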

Find top 10 most frequent log entries

$ sort /var/log/syslog | uniq -c | sort -rn | head -10

Monitor real-time logs with color highlighting

$ tail -f /var/log/syslog | ccze -A

Extract timestamp and message from structured log

$ jq -r '.timestamp + " " + .message' application.json

Replace IP addresses in log file

$ sed 's/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/[REDACTED]/g' access.log

Find logs between specific timestamps

$ awk '/2025-01-16 10:00:00/,/2025-01-16 11:00:00/' application.log

πŸ” Diagnostic tools for log investigation: DNS Lookup for DNS log analysis | WHOIS to investigate IP addresses | What Is My IP to verify server IP
