Viewing File Contents

Bioinformatics files are often enormous: a single FASTQ file can contain hundreds of millions of lines. Most text editors try to load the whole file at once and will stall or crash on input this large. Instead, you use command-line tools to peek at contents, search through them, and extract specific sections.

These tools read files without loading the entire file into memory. This lets you examine 100 GB files on a laptop with 8 GB of RAM.

head - View the Beginning

The head command shows the first few lines of a file:

Input

head sequences.fasta

Output

>AT1G01010.1 | NAC001 | NAC domain protein | chr1:3631-5899 REVERSE
ATGGAGGATCAAGTTGGGTTTGGGTTCCGTCCGAACGACGAGGAGCTCGTTGGTCACTAT
CTCCGTAACAAAATCGAAGGAAACACTAGCCGCGACGTTGAAGTAGCCATCAGCGAGGTA
GCTCACGGCTTTGTCGGGCAGATCATTGAGCTAGTAGGAGGTTTCACGGGCATCAACCAA
>AT1G01020.1 | ARV1 | ARV1 family protein | chr1:6788-9130 FORWARD
ATGAACACGAAGGACCACCAGATCACCCAAGTACCACCGCCCCACCTCTCTTCCCACCAA
AATCCAATCAAAACGATAGTTTCTCCAACCAACCCATCTCCAACAACTTTAACTTCTTCT
>AT1G01030.1 | NGA3 | AP2 domain protein | chr1:11649-13714 FORWARD
ATGGCGGATGCTTCACCTTCTTCTCCCCTCGCCGCCTTCCTCTCCAACTCCTTCAGAGAG
GTCGGCGGTGGCGGTGGTGGCTTTCTCTCCGAAGGTGCCGGTGCCGGAGCTCCTCCTCCA

By default, head shows the first 10 lines. Perfect for checking file format before processing.

Specify Number of Lines

Input

head -n 4 sequences.fasta

Output

>AT1G01010.1 | NAC001 | NAC domain protein | chr1:3631-5899 REVERSE
ATGGAGGATCAAGTTGGGTTTGGGTTCCGTCCGAACGACGAGGAGCTCGTTGGTCACTAT
CTCCGTAACAAAATCGAAGGAAACACTAGCCGCGACGTTGAAGTAGCCATCAGCGAGGTA
GCTCACGGCTTTGTCGGGCAGATCATTGAGCTAGTAGGAGGTTTCACGGGCATCAACCAA

The -n flag specifies exactly how many lines to show. Here, four lines cover just the first record in the FASTA file: one header line and three lines of sequence.

Use head -n 4 on FASTQ files to see exactly one complete read (4 lines per read).
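Because each read occupies four lines, the first N reads are the first 4 × N lines. The sketch below assumes an uncompressed FASTQ whose sequences are not wrapped across lines; the file demo.fastq and its three reads are made up for illustration.

```shell
# Build a tiny three-read FASTQ in a temporary directory (hypothetical data).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n@read3\nCCAG\n+\nIIII\n' > "$tmpdir/demo.fastq"

# The first N reads are the first 4*N lines.
n_reads=2
first_reads=$(head -n "$(( n_reads * 4 ))" "$tmpdir/demo.fastq")
printf '%s\n' "$first_reads"

# Count the records we kept: number of lines divided by 4.
kept=$(( $(printf '%s\n' "$first_reads" | wc -l | tr -d ' ') / 4 ))
echo "kept $kept reads"
```

Changing n_reads is all it takes to preview a different number of records.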

Practical Example: Check FASTQ Quality

Input

head -n 4 sample.fastq

Output

@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC

One complete FASTQ record. The quality line is mostly 'I', which is Phred 40 in the standard Phred+33 encoding, indicating high-quality sequencing.
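To see where a score like Phred 40 comes from, you can decode a quality string by hand. This is a minimal sketch that assumes Phred+33 encoding (each score is the character's ASCII code minus 33); older Illumina data may use a different offset. The variable names are illustrative.

```shell
# Quality string copied from the record above.
qual='IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC'

# od prints each byte as a decimal ASCII code; subtracting 33 gives the
# Phred score under the Phred+33 assumption. One score per output line.
scores=$(printf '%s' "$qual" | od -An -v -t u1 | tr -s ' ' '\n' | awk 'NF { print $1 - 33 }')

n_bases=$(printf '%s\n' "$scores" | wc -l | tr -d ' ')
min_q=$(printf '%s\n' "$scores" | sort -n | head -n 1)
max_q=$(printf '%s\n' "$scores" | sort -n | tail -n 1)
echo "bases=$n_bases min=$min_q max=$max_q"
```

The lowest score comes from the '9' characters near the end of the read, which is a common place for quality to drop.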

Check Multiple Files

Input

head -n 1 *.fasta

Output

==> genome_chr1.fasta <==
>Chr1 CHROMOSOME dumped from ADB: Jun/20/09 14:53:46

==> genome_chr2.fasta <==
>Chr2 CHROMOSOME dumped from ADB: Jun/20/09 14:53:52

==> genome_chr3.fasta <==
>Chr3 CHROMOSOME dumped from ADB: Jun/20/09 14:53:57

Check the headers of multiple FASTA files at once. head automatically shows which file each output comes from.

tail - View the End

The tail command shows the last few lines:

Input

tail analysis.log

Output

[2025-11-20 14:23:15] Alignment phase complete
[2025-11-20 14:23:18] Starting post-alignment QC
[2025-11-20 14:28:42] QC checks passed
[2025-11-20 14:28:45] Writing output files
[2025-11-20 14:32:10] Analysis complete
[2025-11-20 14:32:10] Total runtime: 2 hours 15 minutes
[2025-11-20 14:32:10] Output: aligned_sorted.bam
[2025-11-20 14:32:10] Exit status: SUCCESS

Check the end of a log file to see if analysis completed successfully. Default shows last 10 lines.

Follow Growing Files

Input

tail -f alignment.log

Output

[2025-11-20 15:30:12] Aligning sample_01_R1.fastq.gz
[2025-11-20 15:30:45] 10% complete (4.2M reads aligned)
[2025-11-20 15:31:18] 20% complete (8.5M reads aligned)
[2025-11-20 15:31:52] 30% complete (12.7M reads aligned)
...

The -f flag follows the file, showing new lines as they are written. Perfect for monitoring long-running jobs in real time. Press Ctrl+C to stop.

Monitor Running Jobs

Use tail -f to watch log files from running analyses. You can see progress, catch errors early, and know when jobs complete without repeatedly checking.

Show Specific Number of Lines

Input

tail -n 3 gene_counts.txt

Output

AT5G67590	2847
AT5G67600	4521
AT5G67610	1203

Show just the last 3 lines. Useful to see the end of data files without scrolling through entire output.

Practical Example: Check Pipeline Progress

Monitor Multi-Sample Pipeline

tail -n 1 logs/sample_*.log

Output

==> logs/sample_01.log <==
[2025-11-20 15:45:23] Alignment complete (45.2M reads, 92.3% mapped)

==> logs/sample_02.log <==
[2025-11-20 15:50:12] Alignment complete (48.7M reads, 91.8% mapped)

==> logs/sample_03.log <==
[2025-11-20 15:32:18] ERROR: Out of memory during alignment
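With many samples, reading every last line by eye gets tedious. One possible approach is to loop over the logs and report only those whose final line mentions ERROR. The log files below are hypothetical stand-ins for the ones shown above, and the word ERROR is assumed to be how the pipeline marks failure.

```shell
# Create three small example logs (hypothetical content).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
mkdir "$tmpdir/logs"
printf 'start\nAlignment complete (45.2M reads, 92.3%% mapped)\n' > "$tmpdir/logs/sample_01.log"
printf 'start\nAlignment complete (48.7M reads, 91.8%% mapped)\n' > "$tmpdir/logs/sample_02.log"
printf 'start\nERROR: Out of memory during alignment\n' > "$tmpdir/logs/sample_03.log"

# Collect the names of logs whose last line contains ERROR.
failed=""
for log in "$tmpdir"/logs/sample_*.log; do
  last_line=$(tail -n 1 "$log")
  case $last_line in
    *ERROR*) failed="$failed $(basename "$log")" ;;
  esac
done
failed=${failed# }   # drop the leading space
echo "failed logs: ${failed:-none}"
```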

cat - Concatenate and Display

The cat command displays entire file contents:

Input

cat small_file.txt

Output

Sample	Condition	Replicate
Sample_01	Control	1
Sample_02	Control	2
Sample_03	Treatment	1
Sample_04	Treatment	2

cat prints the entire file to your terminal. Good for small files like metadata tables.

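Never use cat on large files. A multi-gigabyte text file will flood your terminal, and a binary file such as BAM will fill it with garbage characters. Use head, tail, or less for large text files, and format-specific tools for binary ones.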

Concatenate Multiple Files

Input

cat file1.txt file2.txt file3.txt > combined.txt

cat's original purpose: concatenate files. This combines three files into one. The > redirects output to a new file.

Display with Line Numbers

Input

cat -n gene_list.txt

Output

     1	AT1G01010
     2	AT1G01020
     3	AT1G01030
     4	AT1G01040
     5	AT1G01050
     6	AT1G01060

The -n flag adds line numbers. Useful for referencing specific lines in data files.

Practical Example: Combine Sample Files

Input

cat sample_01_counts.txt sample_02_counts.txt sample_03_counts.txt > all_samples_counts.txt

Combine count files from multiple samples into one master file for downstream analysis in R or Python.
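One thing to watch for: cat stacks files end to end, so if every count file begins with a header line, the header is repeated once per file. The sketch below assumes each file starts with the same one-line header (the gene/count columns and values are made up) and uses awk to keep only the first header.

```shell
# Two small count tables, each with a header line (hypothetical data).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'gene\tcount\nAT1G01010\t12\nAT1G01020\t7\n' > "$tmpdir/sample_01_counts.txt"
printf 'gene\tcount\nAT1G01010\t30\nAT1G01020\t0\n' > "$tmpdir/sample_02_counts.txt"

# Plain cat repeats the header once per input file.
cat "$tmpdir"/sample_*_counts.txt > "$tmpdir/with_repeats.txt"
header_repeats=$(grep -c '^gene' "$tmpdir/with_repeats.txt")

# In awk, FNR is the line number within the current file and NR counts
# across all files, so this skips line 1 of every file except the first.
awk 'FNR == 1 && NR != 1 { next } { print }' "$tmpdir"/sample_*_counts.txt > "$tmpdir/all_samples_counts.txt"
header_once=$(grep -c '^gene' "$tmpdir/all_samples_counts.txt")
total_lines=$(wc -l < "$tmpdir/all_samples_counts.txt" | tr -d ' ')
echo "headers with cat: $header_repeats, with awk: $header_once, lines: $total_lines"
```

Note that rows from different samples are indistinguishable once stacked, so a per-sample column may be worth adding before combining.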

less - Interactive File Viewer

The less command is the best way to explore large files. It loads only what you're viewing, not the entire file.

Input

less large_alignment.sam

Output

@HD	VN:1.6	SO:coordinate
@SQ	SN:Chr1	LN:30427671
@SQ	SN:Chr2	LN:19698289
@SQ	SN:Chr3	LN:23459830
@PG	ID:STAR	PN:STAR	VN:2.7.10a
SRR001666.1	0	Chr1	3631	255	36M	*	0	0	GGGTGATGGCCG...
SRR001666.2	16	Chr1	3845	255	36M	*	0	0	ATCGATCGATCG...
:                                    <-- less shows : prompt at bottom

less opens the file in an interactive pager. Use arrow keys to scroll, / to search, q to quit. The file is not loaded entirely into memory.

less Keyboard Commands

# Navigation
Space # Next page
b # Previous page
↓ or j # Down one line
↑ or k # Up one line
G # Jump to end of file
g # Jump to beginning of file
50G # Jump to line 50

# Search
/pattern # Search forward for pattern
?pattern # Search backward for pattern
n # Next match
N # Previous match

# Display
-N # Show line numbers
-S # Disable line wrapping (useful for wide data)

# Quit
q # Exit less

Format Details

Navigation: Move through the file

Search: Find specific content within the file

Display Options: Change how content is displayed

Exit: Quit the viewer

Practical Example: Explore BAM File

Investigate Alignment File

samtools view -h alignments.bam | less -S

Output

Opens interactive viewer showing SAM format with each alignment on one line

Use less -S for files with very long lines (SAM files, VCF files, wide tables). This prevents line wrapping and makes columnar data much easier to read.

wc - Word Count

The wc command counts lines, words, and bytes:

Input

wc genome.fasta

Output

  123456   123456  3456789012 genome.fasta

Output format: lines, words, bytes, filename. This file has 123,456 lines and is about 3.5 GB (3,456,789,012 bytes).

Count Lines Only

Input

wc -l gene_list.txt

Output

  27655 gene_list.txt

The -l flag counts only lines. With one gene ID per line, this Arabidopsis annotation lists 27,655 genes.

Count Sequences in FASTA

Input

grep -c '^>' sequences.fasta

Output

5432

Count FASTA sequences by counting header lines (starting with >). This file contains 5,432 sequences.
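The quotes around '^>' matter: without them the shell treats > as output redirection and can overwrite the file you meant to read. Here is a small sketch on a made-up three-record FASTA (demo.fasta is a hypothetical name) showing that the header count differs from the line count.

```shell
# A tiny FASTA with three records; the first sequence spans two lines.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf '>seq1\nACGT\nACGT\n>seq2\nTTTT\n>seq3\nGGCC\n' > "$tmpdir/demo.fasta"

# Count header lines (records), keeping the pattern safely quoted.
n_seqs=$(grep -c '^>' "$tmpdir/demo.fasta")

# Total lines, for comparison: sequences can wrap over several lines.
n_lines=$(wc -l < "$tmpdir/demo.fasta" | tr -d ' ')
echo "sequences: $n_seqs, total lines: $n_lines"
```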

Count Reads in FASTQ

Input

echo $(( $(wc -l < reads.fastq) / 4 ))

Output

52345678

FASTQ files have 4 lines per read. Divide line count by 4 to get read count. This file has 52 million reads.
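The same arithmetic doubles as a quick integrity check: if the line count is not a multiple of 4, the file is probably truncated. The following is only a sketch; check_fastq is a hypothetical helper name, the data is made up, and it assumes unwrapped four-line records.

```shell
# One complete two-read FASTQ, and a copy missing its final line.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' > "$tmpdir/complete.fastq"
head -n 7 "$tmpdir/complete.fastq" > "$tmpdir/truncated.fastq"

# Report "ok <reads>" when the line count divides by 4,
# otherwise "incomplete <leftover lines>".
check_fastq() {
  lines=$(wc -l < "$1" | tr -d ' ')
  if [ $(( lines % 4 )) -eq 0 ]; then
    echo "ok $(( lines / 4 ))"
  else
    echo "incomplete $(( lines % 4 ))"
  fi
}

complete_status=$(check_fastq "$tmpdir/complete.fastq")
truncated_status=$(check_fastq "$tmpdir/truncated.fastq")
echo "complete: $complete_status"
echo "truncated: $truncated_status"
```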

Count Multiple Files

Input

wc -l *.txt

Output

   1245 sample_01_counts.txt
   1245 sample_02_counts.txt
   1245 sample_03_counts.txt
   3735 total

Count lines in multiple files. wc automatically shows individual counts and a total.

Practical Workflows

Workflow 1: Validate Download

Verify Downloaded Sequencing Data

ls -lh sample.fastq.gz

Output

-rw-r--r-- 1 user group 2.3G Nov 20 14:23 sample.fastq.gz

Workflow 2: Quick QC Check

Fast Quality Assessment

head -n 4 sample.fastq

Output

@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=72
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACCAAGTTATCCAGCCTGGAAGATGGCGACGCAGACCGACGCG
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC

Workflow 3: Compare Sample Depths

Check Sequencing Depth Across Samples

for file in *.fastq.gz; do printf '%s: ' "$file"; gunzip -c "$file" | wc -l | awk '{print $1/4}'; done

Output

sample_01.fastq.gz: 52345678
sample_02.fastq.gz: 48234567
sample_03.fastq.gz: 51123456
sample_04.fastq.gz: 49876543

Combining Commands with Pipes

The real power comes from chaining these tools together:

Input

head -n 1000 large_file.txt | tail -n 10

Output

Line 991
Line 992
Line 993
Line 994
Line 995
Line 996
Line 997
Line 998
Line 999
Line 1000

Get lines 991-1000. First take the first 1000 lines, then take the last 10 of those.
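There are other ways to pull out the same range. As a sketch, the example below builds a 2,000-line test file (a hypothetical stand-in for large_file.txt) and compares the head/tail pipeline with sed and with tail -n +K, which starts output at line K.

```shell
# Generate a 2000-line file: "Line 1" ... "Line 2000".
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
awk 'BEGIN { for (i = 1; i <= 2000; i++) print "Line " i }' > "$tmpdir/large_file.txt"

# 1. First 1000 lines, then the last 10 of those.
via_head_tail=$(head -n 1000 "$tmpdir/large_file.txt" | tail -n 10)

# 2. sed prints lines 991-1000 and quits at line 1000 instead of
#    reading the rest of the file.
via_sed=$(sed -n '991,1000p; 1000q' "$tmpdir/large_file.txt")

# 3. Start at line 991, then keep 10 lines.
via_tail_head=$(tail -n +991 "$tmpdir/large_file.txt" | head -n 10)

printf '%s\n' "$via_sed"
```

All three produce the same ten lines; which one reads best is largely a matter of taste.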

Input

cat *.txt | wc -l

Combine all text files and count total lines across all of them.

Input

grep '^>' sequences.fasta | head -n 5

Output

>AT1G01010.1 | NAC001 | NAC domain protein
>AT1G01020.1 | ARV1 | ARV1 family protein
>AT1G01030.1 | NGA3 | AP2 domain protein
>AT1G01040.1 | DCL1 | Dicer-like protein
>AT1G01050.1 | PPA1 | Protein phosphatase 2A

Extract FASTA headers (lines starting with >) and show the first 5.

Pipes (|) send output from one command as input to the next. This is fundamental to UNIX philosophy: combine simple tools to solve complex problems.

Binary Files

Some bioinformatics formats are binary (BAM, BCF, compressed files). You cannot view them directly.

Input

head alignments.bam

Output

BAM☻↑☺☺☺À@HD VN:1.6 SO:coordinate@SQ SN:Chr1 LN:30427671@SQ...

Binary files show garbage characters. Never use regular viewing commands on binary formats.

Solution: Use format-specific tools:

Viewing Binary Bioinformatics Files

# BAM files (binary alignment)
samtools view -h file.bam | less

# BCF files (binary VCF)
bcftools view file.bcf | less

# Compressed files (.gz)
gunzip -c file.fastq.gz | head
zcat file.fastq.gz | head # alternative
zless file.fastq.gz # interactive viewing

# HDF5 files
h5dump file.h5 | less

Format Details

BAM: Use samtools to convert to readable SAM format

BCF: Use bcftools to convert to readable VCF format

Compressed: Use gunzip -c or zcat to decompress on-the-fly

HDF5: Use h5dump to convert to text representation

Always pipe binary-to-text converters through less or head. Otherwise the entire decoded file will scroll through your terminal.
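File extensions can be wrong, so it sometimes helps to check the first bytes before choosing a viewer. Gzip data starts with the two bytes 1f 8b. The sketch below uses two hypothetical helpers, magic and peek, on made-up data. BAM files are also gzip-compatible at the byte level, so this check does not replace samtools for those.

```shell
# A small FASTQ and a gzip-compressed copy (hypothetical data).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf '@r1\nACGT\n+\nIIII\n' > "$tmpdir/reads.fastq"
gzip -c "$tmpdir/reads.fastq" > "$tmpdir/reads.fastq.gz"

# Print the first two bytes of a file as hex digits, e.g. "1f8b" for gzip.
magic() {
  head -c 2 "$1" | od -An -t x1 | tr -d ' \n'
}

# Show the first 4 lines, decompressing on the fly when the file is gzip.
peek() {
  if [ "$(magic "$1")" = "1f8b" ]; then
    gzip -cd "$1" | head -n 4
  else
    head -n 4 "$1"
  fi
}

plain_magic=$(magic "$tmpdir/reads.fastq")
gz_magic=$(magic "$tmpdir/reads.fastq.gz")
peek_plain=$(peek "$tmpdir/reads.fastq")
peek_gz=$(peek "$tmpdir/reads.fastq.gz")
printf '%s\n' "$peek_gz"
```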

Quick Reference

File Viewing Commands Cheat Sheet

# View beginning of file
head file.txt # First 10 lines
head -n 20 file.txt # First 20 lines
head -n 4 file.fastq # First FASTQ record

# View end of file
tail file.txt # Last 10 lines
tail -n 20 file.txt # Last 20 lines
tail -f running.log # Follow growing file (Ctrl+C to stop)

# Display entire file
cat file.txt # Print entire file (small files only!)
cat -n file.txt # Print with line numbers
cat file1 file2 > merged # Concatenate files

# Interactive viewing
less file.txt # Interactive pager (recommended for large files)
less -S file.txt # No line wrapping
less -N file.txt # Show line numbers

# Count lines/words/bytes
wc file.txt # Lines, words, bytes
wc -l file.txt # Lines only
wc -w file.txt # Words only
wc -c file.txt # Bytes only

# Compressed files
gunzip -c file.gz | head # View compressed file without writing a decompressed copy
zcat file.gz | less # Interactive viewing of compressed file
zless file.gz # Alternative compressed viewer

# Binary bioinformatics files
samtools view file.bam | less # View BAM file
bcftools view file.bcf | less # View BCF file

Format Details

head: Quick peek at start of file

tail: Check end of file or monitor logs

cat: Display or concatenate small files

less: Interactive viewing of large files

wc: Count lines, words, or bytes

Compressed: View compressed files without extracting

Binary: Format-specific tools for binary files

Best Practices

File Viewing Best Practices

Use less for large files: Never cat a multi-GB file
Check compressed files without extracting: Use gunzip -c or zcat with pipes
Monitor long jobs: Use tail -f on log files
Verify downloads: head and tail to check file format and completeness
Count before processing: wc -l to know dataset size
Use format-specific tools: samtools, bcftools for binary formats
Preview before full run: head -n 1000 to test pipelines on small data
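As an illustration of the last point, the first few reads of a compressed FASTQ can be written to a small test file without decompressing the whole thing to disk. This is a sketch on made-up data; the file names are hypothetical, and the first reads in a file are not a random sample.

```shell
# Ten hypothetical reads, gzip-compressed.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
awk 'BEGIN { for (i = 1; i <= 10; i++) printf "@read%d\nACGT\n+\nIIII\n", i }' | gzip -c > "$tmpdir/sample.fastq.gz"

# Keep the first 3 reads (3 * 4 = 12 lines) as a small test input.
gzip -cd "$tmpdir/sample.fastq.gz" | head -n 12 > "$tmpdir/test_subset.fastq"

subset_reads=$(( $(wc -l < "$tmpdir/test_subset.fastq" | tr -d ' ') / 4 ))
last_header=$(grep '^@read' "$tmpdir/test_subset.fastq" | tail -n 1)
echo "subset has $subset_reads reads, ending at $last_header"
```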

Practice Exercises

Practice in evomics-learn

Practice file viewing commands interactively

Try these exercises on evomics-learn:

Explore FASTA files with head and tail
Count sequences in genomics files
Monitor a simulated analysis log with tail -f
Use less to search through annotation files
Combine commands with pipes

Next Steps

You now have the fundamental skills for terminal navigation and file manipulation. The next major section covers text processing - the real power of the command line for bioinformatics.

You'll learn:

grep - Search for patterns in files
sed - Transform and edit text streams
awk - Process structured data
cut, sort, uniq - Extract and organize data

These tools let you process genomics files without writing programs, directly from the command line.
