Viewing File Contents
Bioinformatics files are often enormous: a single FASTQ file can contain hundreds of millions of lines. Most text editors try to load the whole file at once and will stall or crash on files this size. Instead, you use command-line tools to peek at contents, search through them, and extract specific sections.
These tools read files without loading the entire file into memory. This lets you examine 100 GB files on a laptop with 8 GB of RAM.
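A quick way to see this streaming behavior is sketched below. The standard yes utility stands in for an arbitrarily large file, and the ACGT text is made up for illustration.

```shell
# 'yes ACGT' prints the line "ACGT" forever, like a file with no end.
# head reads only the three lines it needs and then exits, which also
# stops 'yes', so the pipeline finishes immediately.
first_lines=$(yes ACGT | head -n 3)
echo "$first_lines"
```

If head had to load its whole input first, this command would never return.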
head - View the Beginning
The head command shows the first few lines of a file:
head sequences.fasta
>AT1G01010.1 | NAC001 | NAC domain protein | chr1:3631-5899 REVERSE
ATGGAGGATCAAGTTGGGTTTGGGTTCCGTCCGAACGACGAGGAGCTCGTTGGTCACTAT
CTCCGTAACAAAATCGAAGGAAACACTAGCCGCGACGTTGAAGTAGCCATCAGCGAGGTA
GCTCACGGCTTTGTCGGGCAGATCATTGAGCTAGTAGGAGGTTTCACGGGCATCAACCAA
>AT1G01020.1 | ARV1 | ARV1 family protein | chr1:6788-9130 FORWARD
ATGAACACGAAGGACCACCAGATCACCCAAGTACCACCGCCCCACCTCTCTTCCCACCAA
AATCCAATCAAAACGATAGTTTCTCCAACCAACCCATCTCCAACAACTTTAACTTCTTCT
>AT1G01030.1 | NGA3 | AP2 domain protein | chr1:11649-13714 FORWARD
ATGGCGGATGCTTCACCTTCTTCTCCCCTCGCCGCCTTCCTCTCCAACTCCTTCAGAGAG
GTCGGCGGTGGCGGTGGTGGCTTTCTCTCCGAAGGTGCCGGTGCCGGAGCTCCTCCTCCA
By default, head shows the first 10 lines. Perfect for checking file format before processing.
Specify Number of Lines
head -n 4 sequences.fasta
>AT1G01010.1 | NAC001 | NAC domain protein | chr1:3631-5899 REVERSE
ATGGAGGATCAAGTTGGGTTTGGGTTCCGTCCGAACGACGAGGAGCTCGTTGGTCACTAT
CTCCGTAACAAAATCGAAGGAAACACTAGCCGCGACGTTGAAGTAGCCATCAGCGAGGTA
GCTCACGGCTTTGTCGGGCAGATCATTGAGCTAGTAGGAGGTTTCACGGGCATCAACCAA
The -n flag specifies exactly how many lines to show. Here it shows just the first sequence from the FASTA file.
Use head -n 4 on FASTQ files to see exactly one complete read (4 lines per read).
Practical Example: Check FASTQ Quality
head -n 4 sample.fastq
@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
One complete FASTQ record. The quality line is mostly 'I', which is Phred 40 in the standard Phred+33 encoding and indicates high-quality base calls.
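The Phred score behind a quality character can be checked with shell arithmetic. This is a minimal sketch that assumes Phred+33 encoding (Sanger and Illumina 1.8+), where the score is the character's ASCII code minus 33. The variable names are illustrative.

```shell
# Assumes Phred+33 encoding: score = ASCII code - 33
qual_char='I'
# printf '%d' with a leading quote prints the character's numeric code
ascii_code=$(printf '%d' "'$qual_char")
phred=$((ascii_code - 33))
echo "$qual_char = Phred $phred"
```

Setting qual_char='9' gives Phred 24 by the same arithmetic, which is why the few non-'I' characters near the end of the read mark slightly lower-confidence bases.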
Check Multiple Files
head -n 1 *.fasta
==> genome_chr1.fasta <==
>Chr1 CHROMOSOME dumped from ADB: Jun/20/09 14:53:46
==> genome_chr2.fasta <==
>Chr2 CHROMOSOME dumped from ADB: Jun/20/09 14:53:52
==> genome_chr3.fasta <==
>Chr3 CHROMOSOME dumped from ADB: Jun/20/09 14:53:57
Check the headers of multiple FASTA files at once. head automatically shows which file each output comes from.
tail - View the End
The tail command shows the last few lines:
tail analysis.log
[2025-11-20 14:23:15] Alignment phase complete
[2025-11-20 14:23:18] Starting post-alignment QC
[2025-11-20 14:28:42] QC checks passed
[2025-11-20 14:28:45] Writing output files
[2025-11-20 14:32:10] Analysis complete
[2025-11-20 14:32:10] Total runtime: 2 hours 15 minutes
[2025-11-20 14:32:10] Output: aligned_sorted.bam
[2025-11-20 14:32:10] Exit status: SUCCESS
Check the end of a log file to see whether an analysis completed successfully. By default, tail shows the last 10 lines.
Follow Growing Files
tail -f alignment.log
[2025-11-20 15:30:12] Aligning sample_01_R1.fastq.gz
[2025-11-20 15:30:45] 10% complete (4.2M reads aligned)
[2025-11-20 15:31:18] 20% complete (8.5M reads aligned)
[2025-11-20 15:31:52] 30% complete (12.7M reads aligned)
...
The -f flag follows the file, showing new lines as they are written. Perfect for monitoring long-running jobs in real time. Press Ctrl+C to stop.
Use tail -f to watch log files from running analyses. You can see progress, catch errors early, and know when jobs complete without repeatedly checking.
Show Specific Number of Lines
tail -n 3 gene_counts.txt
AT5G67590 2847
AT5G67600 4521
AT5G67610 1203
Show just the last 3 lines. Useful for seeing the end of a data file without scrolling through the entire output.
Practical Example: Check Pipeline Progress
Monitor Multi-Sample Pipeline
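One way to check on several samples at once is to print the last line of each log. The sketch below assumes one log file per sample. The sample_*.log names and the log messages are made up for the demo.

```shell
# Demo logs in a temporary directory (file names and messages are made up)
logdir=$(mktemp -d)
printf 'Starting\n50%% complete\n' > "$logdir/sample_01.log"
printf 'Starting\nAnalysis complete\n' > "$logdir/sample_02.log"

# Print the most recent line of every log, prefixed with the file name
status=$(for log in "$logdir"/sample_*.log; do
  printf '%s: %s\n' "$(basename "$log")" "$(tail -n 1 "$log")"
done)
echo "$status"
```

Each output line pairs a log name with its latest message, so a stalled or failed sample stands out without opening every file.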
cat - Concatenate and Display
The cat command displays entire file contents:
cat small_file.txt
Sample Condition Replicate
Sample_01 Control 1
Sample_02 Control 2
Sample_03 Treatment 1
Sample_04 Treatment 2
cat prints the entire file to your terminal. Good for small files like metadata tables.
Never use cat on large files. A 100 GB BAM file will flood your terminal with binary garbage. Use head, tail, or less for large files.
Concatenate Multiple Files
cat file1.txt file2.txt file3.txt > combined.txt
cat's original purpose: concatenate files. This combines three files into one. The > redirects output to a new file.
Display with Line Numbers
cat -n gene_list.txt
1 AT1G01010
2 AT1G01020
3 AT1G01030
4 AT1G01040
5 AT1G01050
6 AT1G01060
The -n flag adds line numbers. Useful for referencing specific lines in data files.
Practical Example: Combine Sample Files
cat sample_01_counts.txt sample_02_counts.txt sample_03_counts.txt > all_samples_counts.txt
Combine count files from multiple samples into one master file for downstream analysis in R or Python.
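After concatenating, it is worth confirming that nothing was lost. A minimal sketch, using two tiny made-up count files in a temporary directory:

```shell
workdir=$(mktemp -d)
# Made-up gene counts for two samples
printf 'AT1G01010 12\nAT1G01020 7\n' > "$workdir/sample_01_counts.txt"
printf 'AT1G01010 15\nAT1G01020 3\n' > "$workdir/sample_02_counts.txt"

cat "$workdir/sample_01_counts.txt" "$workdir/sample_02_counts.txt" \
  > "$workdir/all_samples_counts.txt"

# The combined file should have exactly as many lines as its parts
n1=$(wc -l < "$workdir/sample_01_counts.txt")
n2=$(wc -l < "$workdir/sample_02_counts.txt")
combined=$(wc -l < "$workdir/all_samples_counts.txt")
if [ $((combined)) -eq $((n1 + n2)) ]; then
  echo "OK: $((combined)) lines"
fi
```

Note that cat copies every line as-is. If each input file starts with a header row, the combined file will contain one header per input.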
less - Interactive File Viewer
The less command is the best way to explore large files. It loads only what you're viewing, not the entire file.
less large_alignment.sam
@HD VN:1.6 SO:coordinate
@SQ SN:Chr1 LN:30427671
@SQ SN:Chr2 LN:19698289
@SQ SN:Chr3 LN:23459830
@PG ID:STAR PN:STAR VN:2.7.10a
SRR001666.1 0 Chr1 3631 255 36M * 0 0 GGGTGATGGCCG...
SRR001666.2 16 Chr1 3845 255 36M * 0 0 ATCGATCGATCG...
: <-- less shows : prompt at bottom
less opens the file in an interactive pager. Use arrow keys to scroll, / to search, q to quit. The file is not loaded entirely into memory.
Essential less Navigation
less Keyboard Commands
Practical Example: Explore BAM File
Investigate Alignment File
Use less -S for files with very long lines (SAM files, VCF files, wide tables). This prevents line wrapping and makes columnar data much easier to read.
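For a quick non-interactive look at the same kind of wide file, the header can be skipped and only the leading columns kept. This sketch builds a tiny made-up SAM file and assumes the standard layout: header lines start with @, and alignment lines are tab-separated.

```shell
samdir=$(mktemp -d)
# Made-up SAM file: two header lines, then two alignment records
printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:Chr1\tLN:30427671\n' > "$samdir/demo.sam"
printf 'read1\t0\tChr1\t3631\t255\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$samdir/demo.sam"
printf 'read2\t16\tChr1\t3845\t255\t4M\t*\t0\t0\tTTGA\tIIII\n' >> "$samdir/demo.sam"

# Count header lines (those starting with @)
header_lines=$(grep -c '^@' "$samdir/demo.sam")
# Drop the header, keep columns 1-6 (QNAME through CIGAR), show a few records
preview=$(grep -v '^@' "$samdir/demo.sam" | cut -f 1-6 | head -n 5)
echo "header lines: $header_lines"
echo "$preview"
```

On a real file, replacing the final head with less -S gives the same trimmed view in a scrollable pager.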
wc - Word Count
The wc command counts lines, words, and bytes:
wc genome.fasta
123456 123456 3456789012 genome.fasta
Output format: lines, words, bytes, filename. This genome file has 123,456 lines and is about 3.5 GB (3,456,789,012 bytes).
Count Lines Only
wc -l gene_list.txt
27655 gene_list.txt
The -l flag counts only lines. With one gene ID per line, this shows that the Arabidopsis annotation lists 27,655 genes.
Count Sequences in FASTA
grep -c '^>' sequences.fasta
5432
Count FASTA sequences by counting header lines (starting with >). This file contains 5,432 sequences.
Count Reads in FASTQ
echo $(( $(wc -l < reads.fastq) / 4 ))
52345678
FASTQ files have 4 lines per read. Divide the line count by 4 to get the read count. This file has about 52 million reads.
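The division only gives a meaningful answer when the line count is a multiple of 4. A minimal sketch that adds this check, using a made-up two-read file and assuming the usual unwrapped 4-line FASTQ layout:

```shell
fqdir=$(mktemp -d)
# Two made-up reads, four lines each
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n' > "$fqdir/reads.fastq"

lines=$(wc -l < "$fqdir/reads.fastq")
# A remainder means an incomplete record, often a truncated file
if [ $((lines % 4)) -ne 0 ]; then
  echo "warning: $((lines)) lines is not a multiple of 4" >&2
fi
reads=$((lines / 4))
echo "reads: $reads"
```

Integer division silently rounds down, so without the check a file cut off mid-record would still report a plausible read count.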
Count Multiple Files
wc -l *.txt
1245 sample_01_counts.txt
1245 sample_02_counts.txt
1245 sample_03_counts.txt
3735 total
Count lines in multiple files. wc automatically shows individual counts and a total.
Practical Workflows
Workflow 1: Validate Download
Verify Downloaded Sequencing Data
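One way such a check might look for a gzip-compressed FASTQ download is sketched below. The file name sample.fastq.gz and its contents are made up, and the three checks shown are only one reasonable selection.

```shell
dldir=$(mktemp -d)
# Stand-in for a downloaded file: two made-up reads, gzip-compressed
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n' | gzip -c > "$dldir/sample.fastq.gz"

# 1. gzip -t tests the compressed stream; a truncated download fails here
if gzip -t "$dldir/sample.fastq.gz"; then gz_ok=yes; else gz_ok=no; fi
# 2. The first line of a FASTQ file should be a header starting with @
first_char=$(gzip -cd "$dldir/sample.fastq.gz" | head -n 1 | cut -c 1)
# 3. Look at the final record to confirm the file ends on a complete read
last_record=$(gzip -cd "$dldir/sample.fastq.gz" | tail -n 4)
echo "gzip ok: $gz_ok, first char: $first_char"
echo "$last_record"
```

Here gzip -cd decompresses to standard output, so the file is inspected without writing an uncompressed copy to disk.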
Workflow 2: Quick QC Check
Fast Quality Assessment
Workflow 3: Compare Sample Depths
Check Sequencing Depth Across Samples
Combining Commands with Pipes
The real power comes from chaining these tools together:
head -n 1000 large_file.txt | tail -n 10
Line 991
Line 992
Line 993
Line 994
Line 995
Line 996
Line 997
Line 998
Line 999
Line 1000
Get lines 991-1000. First take the first 1000 lines, then take the last 10 of those.
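The same range can also be selected with sed, which is covered in the next section. The sketch below uses seq as a stand-in for a large file (line N simply contains the number N) and confirms that both approaches return the same lines.

```shell
# seq stands in for a large file whose line N contains the number N
via_head_tail=$(seq 1 2000 | head -n 1000 | tail -n 10)
# sed -n suppresses default output; '991,1000p' prints only that range
via_sed=$(seq 1 2000 | sed -n '991,1000p')

if [ "$via_head_tail" = "$via_sed" ]; then
  echo "same 10 lines"
fi
echo "$via_sed" | head -n 1
```

One difference matters on very large files: head stops reading after line 1000, while this sed command keeps scanning to the end of the input.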
cat *.txt | wc -l
125678
Combine all text files and count total lines across all of them.
grep '^>' sequences.fasta | head -n 5
>AT1G01010.1 | NAC001 | NAC domain protein
>AT1G01020.1 | ARV1 | ARV1 family protein
>AT1G01030.1 | NGA3 | AP2 domain protein
>AT1G01040.1 | DCL1 | Dicer-like protein
>AT1G01050.1 | PPA1 | Protein phosphatase 2A
Extract FASTA headers (lines starting with >) and show the first 5.
Pipes (|) send output from one command as input to the next. This is fundamental to UNIX philosophy: combine simple tools to solve complex problems.
Binary Files
Some bioinformatics formats are binary (BAM, BCF, compressed files). You cannot view them directly.
head alignments.bam
BAM☻↑☺☺☺À@HD VN:1.6 SO:coordinate@SQ SN:Chr1 LN:30427671@SQ...
Binary files show garbage characters. Never use regular viewing commands on binary formats.
Solution: Use format-specific tools:
Viewing Binary Bioinformatics Files
Always pipe the output of binary-to-text converters through less or head. Sending it straight to the terminal floods the screen with output.
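The pattern can be sketched with gzip, which is installed almost everywhere. The demo file is made up. A BAM file would be handled the same way with samtools view in place of gzip -cd.

```shell
bindir=$(mktemp -d)
printf 'ACGT\n' | gzip -c > "$bindir/demo.gz"

# Peek at the first two bytes as hex instead of dumping raw binary
magic=$(head -c 2 "$bindir/demo.gz" | od -An -tx1 | tr -d ' \n')
echo "first bytes: $magic"   # gzip data starts with the bytes 1f 8b

# Convert to text, and cap the output before it reaches the terminal
gzip -cd "$bindir/demo.gz" | head -n 5
```

Viewing a couple of bytes through od is a safe way to identify an unknown binary file, because od prints only printable hex digits.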
Quick Reference
File Viewing Commands Cheat Sheet
Best Practices
- Use less for large files: Never cat a multi-GB file
- Check compressed files without extracting: Use gunzip -c or zcat with pipes
- Monitor long jobs: Use tail -f on log files
- Verify downloads: Use head and tail to check file format and completeness
- Count before processing: Use wc -l to know dataset size
- Use format-specific tools: samtools, bcftools for binary formats
- Preview before full run: Use head -n 1000 to test pipelines on small data
Practice Exercises
Practice file viewing commands interactively
Try these exercises on evomics-learn:
- Explore FASTA files with head and tail
- Count sequences in genomics files
- Monitor a simulated analysis log with tail -f
- Use less to search through annotation files
- Combine commands with pipes
Next Steps
You now have the fundamental skills for terminal navigation and file manipulation. The next major section covers text processing - the real power of the command line for bioinformatics.
You'll learn:
- grep - Search for patterns in files
- sed - Transform and edit text streams
- awk - Process structured data
- cut, sort, uniq - Extract and organize data
These tools let you process genomics files without writing programs, directly from the command line.
Further Reading
- Text Processing with grep - Next topic
- Less Manual
- GNU Coreutils - wc