Evomics Docs

Writing Shell Scripts

A shell script is a file containing commands that run sequentially. Instead of typing the same commands repeatedly, write them once in a script and execute it whenever needed.

Shell scripts are the bridge between individual commands and complex bioinformatics pipelines. A well-written script is self-documenting, reusable, and handles errors gracefully.

Your First Script

Create a file called hello.sh:

hello.sh

#!/bin/bash
# My first shell script

echo "Hello from shell script!"
echo "Current directory: $(pwd)"
echo "Today is: $(date)"

Format Details
  • Line 1 - Shebang: Tells the system to use bash
  • Line 2 - Comment: Documentation for humans
  • Line 4 - Commands: Regular commands, one per line

Make it Executable

Input
chmod +x hello.sh

Make the script executable. +x adds execute permission.

Run the Script

Input
./hello.sh
Output
Hello from shell script!
Current directory: /home/user/analysis
Today is: Wed Nov 20 10:30:00 EST 2024

Execute the script. ./ means 'current directory'.

Why ./hello.sh and not just hello.sh?

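For security, the current directory (.) is usually not in your PATH, so the shell does not find hello.sh by name alone. Prefixing ./ gives the shell an explicit path to the script. This prevents you from accidentally running a malicious file in the current directory that has the same name as a common command.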

The Shebang Line

The first line tells the system which interpreter to use.

Input
head -n 1 analyze.sh
Output
#!/bin/bash

The shebang must be the very first line of the file, with no spaces or other characters before #!.

Common shebangs:

  • #!/bin/bash - Use bash (most common for bioinformatics)
  • #!/bin/sh - Use basic sh (more portable, fewer features)
  • #!/usr/bin/env bash - Find bash in PATH (works across systems)

Use #!/usr/bin/env bash for maximum portability. This finds bash wherever it's installed, rather than assuming it's in /bin/bash.

Script Arguments

Access command-line arguments with special variables:

greet.sh

#!/bin/bash
# Demonstrate script arguments

echo "Script name: $0"
echo "First argument: $1"
echo "Second argument: $2"
echo "All arguments: $@"
echo "Number of arguments: $#"

Format Details
  • Line 4 - $0: Script name
  • Line 5 - $1, $2: Individual arguments
  • Line 7 - $@: All arguments as separate words
  • Line 8 - $#: Count of arguments

Input
./greet.sh Alice Bob
Output
Script name: ./greet.sh
First argument: Alice
Second argument: Bob
All arguments: Alice Bob
Number of arguments: 2

Arguments are separated by spaces. $1 is Alice, $2 is Bob.
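
Because the shell splits arguments on spaces, a value that itself contains a space has to be quoted on the command line, and the script should expand "$@" inside double quotes to keep it intact. The sketch below illustrates the difference; count_args and show_args are hypothetical helper functions defined only for this example.

```shell
#!/bin/bash
# count_args is a hypothetical helper: it reports how many arguments it received.
count_args() {
  echo "$#"
}

# show_args forwards its own arguments in two ways.
show_args() {
  echo "quoted: $(count_args "$@")"     # each original argument stays intact
  echo "unquoted: $(count_args $@)"     # arguments are re-split on whitespace
}

show_args "Sample 01" Sample_02
# quoted: 2
# unquoted: 3
```

The quoted form sees two arguments, while the unquoted form splits "Sample 01" into two words and sees three.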

Checking Arguments

Always validate that required arguments are provided:

process_sample.sh

#!/bin/bash
# Process a sample with error checking

if [ $# -ne 1 ]; then
  echo "Usage: $0 <sample_id>"
  echo "Example: $0 Sample_01"
  exit 1
fi

sample_id=$1
echo "Processing sample: $sample_id"
# ... rest of analysis

Format Details
  • Line 4 - Check count: Verify exactly 1 argument
  • Line 5 - Usage message: Help the user understand the syntax
  • Line 7 - Exit error: Exit with a non-zero code to signal an error
  • Line 10 - Use argument: Store in a descriptive variable

Input
./process_sample.sh
Output
Usage: ./process_sample.sh <sample_id>
Example: ./process_sample.sh Sample_01

Script rejects missing arguments with helpful message.
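
A common refinement, not used in the script above, is to send usage and error messages to standard error so they do not mix with real output when the script is piped or redirected. The following is a minimal sketch of that idea; require_sample_id is a hypothetical function name, and it uses return so it can be tried in a terminal (in a script body you would use exit 1 instead).

```shell
#!/bin/bash
# require_sample_id is a hypothetical name used for illustration.
require_sample_id() {
  if [ "$#" -ne 1 ]; then
    echo "Usage: $0 <sample_id>" >&2   # >&2 sends the message to stderr
    return 1                           # in a script body, use: exit 1
  fi
  echo "Processing sample: $1"
}

require_sample_id Sample_01
```

With this pattern, redirecting the output to a file captures only the results, while the usage message still appears in the terminal.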

Exit Codes

Scripts return exit codes: 0 for success, non-zero for errors.

Input
./successful_script.sh
echo $?
Output
0

The special variable $? contains the exit code of the last command. 0 means success.

check_file.sh

#!/bin/bash
# Check if required file exists

if [ ! -f "$1" ]; then
  echo "ERROR: File not found: $1"
  exit 1
fi

echo "File exists: $1"
exit 0

Format Details
  • Line 4 - Test existence: ! negates the test
  • Line 6 - Exit with error: 1 indicates failure
  • Line 10 - Exit success: 0 indicates success

Input
./check_file.sh missing.txt
echo "Exit code: $?"
Output
ERROR: File not found: missing.txt
Exit code: 1

Non-zero exit code signals error to calling program.
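
Exit codes are not specific to scripts: every command sets one. As a small sketch, the hypothetical helper report_exit_code below runs any command, hides its output, and prints the exit code it returned.

```shell
#!/bin/bash
# report_exit_code is a hypothetical helper for illustration.
report_exit_code() {
  local status=0
  "$@" > /dev/null 2>&1 || status=$?   # capture the code if the command fails
  echo "$* -> $status"
}

report_exit_code true                     # true -> 0
report_exit_code false                    # false -> 1
report_exit_code grep -q '^>' /dev/null   # grep exits 1 when nothing matches
```

Commands such as grep use the exit code to report a result (match or no match), which is why they work directly in if statements.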

Using Exit Codes in Workflows

Input
if ./check_file.sh sequences.fasta; then
  echo "File OK, proceeding with analysis"
else
  echo "Cannot proceed - fix errors first"
fi
Output
File exists: sequences.fasta
File OK, proceeding with analysis

Use a script's exit code directly in a conditional. This is what makes it possible to chain scripts into a workflow.
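
The same exit codes drive the && and || operators, which are a compact alternative to if/else for simple chains. In this sketch, check_input is a hypothetical stand-in for a script such as check_file.sh, and the temporary file exists only to make the example runnable.

```shell
#!/bin/bash
# check_input is a hypothetical stand-in for a script like check_file.sh.
check_input() {
  [ -f "$1" ]
}

tmp_file=$(mktemp)   # an existing file to test against

# cmd1 && cmd2: run cmd2 only if cmd1 succeeded (exit code 0)
check_input "$tmp_file" && echo "found: proceeding"

# cmd1 || cmd2: run cmd2 only if cmd1 failed (non-zero exit code)
check_input "$tmp_file.missing" || echo "missing: stopping"

rm -f "$tmp_file"
```

For anything longer than a single follow-up command, the if/else form shown above is usually easier to read.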

Practical Bioinformatics Scripts

Count Reads in FASTQ

count_reads.sh

#!/bin/bash
# Count reads in a FASTQ file
# Usage: ./count_reads.sh <fastq_file>

if [ $# -ne 1 ]; then
  echo "Usage: $0 <fastq_file>"
  exit 1
fi

fastq_file=$1

# Check file exists
if [ ! -f "$fastq_file" ]; then
  echo "ERROR: File not found: $fastq_file"
  exit 1
fi

# Count lines and divide by 4
lines=$(wc -l < "$fastq_file")
reads=$((lines / 4))

echo "$fastq_file: $reads reads"

Format Details
  • Line 2 - Description: Explain what the script does
  • Line 3 - Usage doc: Show how to run it
  • Line 5 - Validate args: Check argument count
  • Line 13 - Validate input: Check file exists
  • Line 19 - Calculate: FASTQ has 4 lines per read

Input
chmod +x count_reads.sh
./count_reads.sh sample.fastq
Output
sample.fastq: 1234567 reads

Reusable script for any FASTQ file.
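
Dividing by 4 silently rounds down when a file is truncated. One possible extension, sketched here under the assumption of an uncompressed FASTQ with exactly 4 lines per record, is to warn when the line count is not a multiple of 4. The function name count_fastq_reads and the tiny demo file are hypothetical.

```shell
#!/bin/bash
# count_fastq_reads is a hypothetical name; assumes uncompressed, 4-line records.
count_fastq_reads() {
  local fastq_file=$1
  local lines
  lines=$(wc -l < "$fastq_file")
  if [ $((lines % 4)) -ne 0 ]; then
    echo "WARNING: $fastq_file has $lines lines, not a multiple of 4" >&2
    return 1
  fi
  echo $((lines / 4))
}

# Demo with a two-read FASTQ written to a temporary file
demo=$(mktemp)
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' > "$demo"
count_fastq_reads "$demo"    # prints 2
rm -f "$demo"
```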

Extract Gene Sequences

extract_genes.sh

#!/bin/bash
# Extract specific genes from FASTA by ID list
# Usage: ./extract_genes.sh <fasta> <gene_list>

if [ $# -ne 2 ]; then
  echo "Usage: $0 <fasta_file> <gene_list>"
  exit 1
fi

fasta=$1
gene_list=$2

# Validate inputs
if [ ! -f "$fasta" ]; then
  echo "ERROR: FASTA file not found: $fasta"
  exit 1
fi

if [ ! -f "$gene_list" ]; then
  echo "ERROR: Gene list not found: $gene_list"
  exit 1
fi

# Extract sequences
output="extracted_genes.fasta"
> "$output" # Create empty file

while read -r gene_id; do
  # Extract header and the single sequence line that follows it
  awk -v id="$gene_id" '
    $0 ~ "^>"id {print; getline; print}
  ' "$fasta" >> "$output"
done < "$gene_list"

echo "Extracted $(grep -c "^>" "$output") sequences to $output"

Format Details
  • Line 14 - Input validation: Check both files exist
  • Line 26 - Create output: > creates empty file
  • Line 28 - Read list: Process each gene ID
  • Line 30 - Extract seq: Use awk to find and extract
  • Line 35 - Report results: Count extracted sequences

Input
cat genes_of_interest.txt
Output
AT1G01010
AT1G01020
AT1G01030

Gene list file - one ID per line.

Input
./extract_genes.sh genome.fasta genes_of_interest.txt
Output
Extracted 3 sequences to extracted_genes.fasta

Extract specific genes from genome FASTA.
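
The awk command in extract_genes.sh prints one line after each matching header and matches the ID as a pattern at the start of the header, so it assumes single-line sequences and IDs that are not prefixes of one another. The sketch below is one possible alternative that handles sequences wrapped over several lines and matches IDs exactly, reading the FASTA only once. The function name extract_by_id and the demo data are hypothetical.

```shell
#!/bin/bash
# extract_by_id is a hypothetical name. The ID is taken to be the first
# word after ">" in each header line.
extract_by_id() {
  local fasta=$1 gene_list=$2
  awk '
    NR == FNR { if ($1 != "") wanted[$1] = 1; next }  # first file: store IDs
    /^>/ {                                            # FASTA header line
      id = substr($1, 2)                              # drop the leading ">"
      keep = (id in wanted)
    }
    keep { print }                                    # header + sequence lines
  ' "$gene_list" "$fasta"
}

# Demo with a small FASTA written to a temporary directory
demo_dir=$(mktemp -d)
printf '>AT1G01010 gene A\nACGT\nACGT\n>AT1G010101\nTTTT\n>AT1G01020\nGGGG\n' > "$demo_dir/genome.fasta"
printf 'AT1G01010\nAT1G01020\n' > "$demo_dir/ids.txt"
extract_by_id "$demo_dir/genome.fasta" "$demo_dir/ids.txt"
rm -rf "$demo_dir"
```

In the demo, the record AT1G010101 is skipped even though its ID starts with AT1G01010, and both sequence lines of the first record are kept.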

Script Structure Best Practices

Well-Structured Script Template

#!/bin/bash
# Script: analyze_sample.sh
# Description: Quality check and basic stats for FASTQ
# Author: Your Name
# Date: 2024-11-20
# Usage: ./analyze_sample.sh <sample.fastq>

# Exit on any error
set -e
set -u # Exit on undefined variable
set -o pipefail # Catch errors in pipes

# Validate arguments
if [ $# -ne 1 ]; then
  echo "Usage: $0 <fastq_file>"
  exit 1
fi

# Store arguments in named variables
fastq_file=$1

# Check inputs exist
if [ ! -f "$fastq_file" ]; then
  echo "ERROR: File not found: $fastq_file"
  exit 1
fi

# Main analysis
echo "Analyzing $fastq_file..."

total_lines=$(wc -l < "$fastq_file")
reads=$((total_lines / 4))

echo "Total reads: $reads"
echo "Analysis complete"

exit 0

Format Details
  • Line 2 - Header: Document the script
  • Line 9 - Safety flags: Exit on errors
  • Line 14 - Validation: Check all inputs
  • Line 19 - Named vars: Use descriptive names
  • Line 28 - Main logic: Core functionality

Safety Flags

  • set -e - Exit immediately if any command fails
  • set -u - Exit if an undefined variable is used
  • set -o pipefail - Make a pipeline fail if any command in it fails, not only the last one

These prevent silent errors and make scripts more robust. Add them after the shebang line.
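
The effect of these flags is easiest to see side by side. In this sketch each case runs in its own bash process so the options do not affect your current shell; undefined_sample is a deliberately unset, hypothetical variable name.

```shell
#!/bin/bash
# Without pipefail, a pipeline reports only the exit code of its last command.
bash -c 'false | true; echo "without pipefail: $?"'          # without pipefail: 0

# With pipefail, a failure anywhere in the pipeline is reported.
bash -c 'set -o pipefail; false | true; echo "with pipefail: $?"'   # with pipefail: 1

# With set -u, using an unset variable stops the script with an error.
bash -c 'set -u; echo "value: $undefined_sample"' 2> /dev/null \
  || echo "set -u: stopped on undefined variable"
```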

Looping Over Multiple Files

batch_process.sh

#!/bin/bash
# Process all FASTQ files in directory

set -e

for fastq in *.fastq; do
  # Skip if no files match
  [ -f "$fastq" ] || continue

  echo "Processing $fastq..."

  # Count reads
  lines=$(wc -l < "$fastq")
  reads=$((lines / 4))

  # Write to summary (tab-separated)
  printf '%s\t%s\n' "$fastq" "$reads" >> read_counts.txt
done

echo "Summary written to read_counts.txt"

Format Details
  • Line 6 - Loop files: Process all FASTQ files in the directory
  • Line 8 - Skip non-match: Handle the case of no *.fastq files
  • Line 13 - Calculate: Count reads per file
  • Line 17 - Append results: >> appends to file

Input
./batch_process.sh
Output
Processing Sample_01.fastq...
Processing Sample_02.fastq...
Processing Sample_03.fastq...
Summary written to read_counts.txt

Process all FASTQ files automatically.

Input
cat read_counts.txt
Output
Sample_01.fastq	1234567
Sample_02.fastq	2345678
Sample_03.fastq	987654

Results compiled in tab-separated file.
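
The [ -f "$fastq" ] || continue guard is needed because an unmatched pattern such as *.fastq is passed to the loop as literal text. An alternative, sketched below, is bash's nullglob option, which makes an unmatched pattern expand to nothing so the loop body never runs. The helper name count_fastq_files is hypothetical, and the sketch assumes nullglob was off beforehand.

```shell
#!/bin/bash
# count_fastq_files is a hypothetical helper: it counts *.fastq files in a
# directory using nullglob instead of a per-file existence test.
count_fastq_files() {
  local dir=$1 n=0 fastq
  shopt -s nullglob            # unmatched patterns expand to nothing
  for fastq in "$dir"/*.fastq; do
    n=$((n + 1))
  done
  shopt -u nullglob            # restore the default behaviour
  echo "$n"
}

demo_dir=$(mktemp -d)
count_fastq_files "$demo_dir"                      # prints 0
touch "$demo_dir/Sample_01.fastq" "$demo_dir/Sample_02.fastq"
count_fastq_files "$demo_dir"                      # prints 2
rm -rf "$demo_dir"
```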

Processing Paired-End Reads

process_pairs.sh

#!/bin/bash
# Process paired-end FASTQ files

set -e

for r1 in *_R1.fastq; do
  # Skip if no R1 files
  [ -f "$r1" ] || continue

  # Derive R2 and sample names
  r2=${r1/_R1.fastq/_R2.fastq}
  sample=${r1/_R1.fastq/}

  # Check R2 exists
  if [ ! -f "$r2" ]; then
    echo "WARNING: Missing R2 for $sample, skipping"
    continue
  fi

  echo "Processing $sample..."
  echo "  R1: $r1"
  echo "  R2: $r2"

  # Your analysis commands here
  # e.g., alignment, quality control, etc.

done

echo "Paired-end processing complete"

Format Details
  • Line 11 - Find R2: Replace _R1 with _R2 in filename
  • Line 12 - Sample ID: Remove _R1.fastq suffix
  • Line 15 - Verify pair: Check both files exist
  • Line 24 - Analysis: Add your pipeline here

Input
./process_pairs.sh
Output
Processing Sample_01...
  R1: Sample_01_R1.fastq
  R2: Sample_01_R2.fastq
Processing Sample_02...
  R1: Sample_02_R1.fastq
  R2: Sample_02_R2.fastq
Paired-end processing complete

Automatically match R1/R2 pairs.
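
The ${r1/_R1.fastq/...} form replaces the first match anywhere in the name. When the text to remove is always at the end, the ${var%suffix} form is a slightly stricter option because it only matches a trailing suffix. A minimal sketch, with hypothetical helper names sample_for and r2_for:

```shell
#!/bin/bash
# sample_for and r2_for are hypothetical helpers for illustration.
sample_for() {
  echo "${1%_R1.fastq}"              # strip the trailing _R1.fastq
}

r2_for() {
  echo "${1%_R1.fastq}_R2.fastq"     # strip the suffix, then append _R2.fastq
}

sample_for Sample_01_R1.fastq    # Sample_01
r2_for Sample_01_R1.fastq        # Sample_01_R2.fastq
```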

Script Arguments vs Hard-Coded Values

Flexible vs Rigid Scripts

#!/bin/bash
# Bad: hard-coded sample name
fastqc Sample_01.fastq
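
A flexible counterpart takes the file names from the command line instead. The following is only a sketch of that idea: run_qc is a hypothetical function name, and QC_CMD is a hypothetical variable added here so the example can be tried without FastQC installed (for example with QC_CMD=echo).

```shell
#!/bin/bash
# run_qc is a hypothetical wrapper; QC_CMD is a hypothetical override that
# defaults to fastqc.
run_qc() {
  if [ "$#" -lt 1 ]; then
    echo "Usage: run_qc <fastq_file>..." >&2
    return 1
  fi
  local fastq
  for fastq in "$@"; do
    "${QC_CMD:-fastqc}" "$fastq"     # one QC run per file given
  done
}

# Dry run: echo stands in for fastqc and just prints each file name
QC_CMD=echo run_qc Sample_01.fastq Sample_02.fastq
```

The same code now works for any sample, and for any number of samples, without editing the script.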

Debugging Scripts

Input
bash -x script.sh
Output
+ echo 'Starting analysis'
Starting analysis
+ sample=Sample_01
+ echo 'Processing Sample_01'

-x shows each command before executing. Invaluable for debugging.

Input
bash -n script.sh

-n checks syntax without running. Catches typos before execution.
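
To see what bash -n catches, the sketch below writes a deliberately broken script (an if without its closing fi) to a temporary file and checks it. The helper name syntax_ok is hypothetical.

```shell
#!/bin/bash
# syntax_ok is a hypothetical helper: succeeds if bash -n finds no syntax errors.
syntax_ok() {
  bash -n "$1" 2> /dev/null
}

broken=$(mktemp)
printf '%s\n' '#!/bin/bash' 'if [ -f input.fastq ]; then' '  echo "found"' > "$broken"

if syntax_ok "$broken"; then
  echo "syntax OK"
else
  echo "syntax error detected"     # this branch runs: fi is missing
fi
rm -f "$broken"
```

Note that bash -n checks syntax only. A misspelled command name, such as ehco instead of echo, is still valid syntax and is not reported until the script actually runs.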

Add debug output to your scripts:

Script with Debug Mode

#!/bin/bash
# Enable debug output with: DEBUG=1 ./script.sh

if [ "${DEBUG:-0}" = "1" ]; then
  set -x # Enable command tracing
fi

sample=$1
echo "Processing $sample"

Format Details
  • Line 4 - Debug check: Only enable if DEBUG=1. ${DEBUG:-0} falls back to 0 when DEBUG is unset, so the test also works under set -u
  • Line 5 - Trace mode: Show commands when DEBUG=1

Input
DEBUG=1 ./script.sh Sample_01
Output
+ sample=Sample_01
+ echo 'Processing Sample_01'
Processing Sample_01

Set DEBUG=1 to see detailed execution.

Quick Reference

Script Basics

#!/bin/bash          # Shebang line
chmod +x script.sh   # Make executable
./script.sh          # Run script
bash script.sh       # Run with bash explicitly

Arguments

$0           # Script name
$1, $2, $3   # Individual arguments
$@           # All arguments
$#           # Number of arguments
"$@"         # All arguments (proper quoting)

Exit and Error Handling

exit 0            # Exit success
exit 1            # Exit with error
$?                # Last command's exit code
set -e            # Exit on error
set -u            # Exit on undefined variable
set -o pipefail   # Catch pipe errors

Script Structure

#!/bin/bash
# Description and usage

set -e -u -o pipefail

# Validate arguments
if [ $# -ne 1 ]; then
  echo "Usage: $0 <arg>"
  exit 1
fi

# Check inputs
if [ ! -f "$1" ]; then
  echo "ERROR: File not found"
  exit 1
fi

# Main logic here

exit 0

Best Practices Summary

Script Writing Checklist
  1. Always include shebang - #!/bin/bash on line 1
  2. Add usage documentation - Comments explaining what script does
  3. Validate all arguments - Check count and types
  4. Check file existence - Before reading/writing
  5. Use descriptive variable names - sample_id not s
  6. Quote variables - "$var" prevents word splitting
  7. Exit with proper codes - 0 for success, 1+ for errors
  8. Add safety flags - set -e, set -u, set -o pipefail
  9. Make scripts executable - chmod +x script.sh
  10. Test with edge cases - Missing files, empty inputs, etc.

Next Steps

You can now write scripts that:

  • Accept arguments for flexibility
  • Validate inputs before processing
  • Handle errors gracefully
  • Process multiple files automatically

The next page covers functions (reusable code blocks within scripts) and advanced debugging techniques to make your scripts even more robust and maintainable.

Further Reading