Writing Shell Scripts
A shell script is a file containing commands that run sequentially. Instead of typing the same commands repeatedly, write them once in a script and execute it whenever needed.
Shell scripts are the bridge between individual commands and complex bioinformatics pipelines. A well-written script is self-documenting, reusable, and handles errors gracefully.
Your First Script
Create a file called hello.sh:
hello.sh
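The contents of hello.sh are not reproduced on this page. Below is a minimal sketch consistent with the output shown under "Run the Script". It uses a quoted here-document (`<<'EOF'`), so you can paste the whole block into a terminal. The quotes stop the shell from expanding `$(pwd)` and `$(date)` while the file is being written, so those commands run each time the script does instead. Typing the lines between the `EOF` markers into a text editor works just as well.

```shell
cat > hello.sh <<'EOF'
#!/bin/bash
# hello.sh - a first script: greet, then show where and when it ran
echo "Hello from shell script!"
echo "Current directory: $(pwd)"
echo "Today is: $(date)"
EOF
```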
Make it Executable
chmod +x hello.sh
Make the script executable. +x adds execute permission.
Run the Script
./hello.sh
Hello from shell script!
Current directory: /home/user/analysis
Today is: Wed Nov 20 10:30:00 EST 2024
Execute the script. ./ means 'current directory'.
Why ./hello.sh and not just hello.sh?
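For security, the current directory (.) is not in your PATH by default, so the shell never searches it for commands. You must give an explicit path such as ./ to run a script in the current directory. This prevents you from accidentally running a malicious script that shares its name with a common command, for example a file called ls left in a shared directory.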
The Shebang Line
The first line tells the system which interpreter to use.
head -n 1 analyze.sh
#!/bin/bash
The shebang line. It must be the very first line, with no spaces before #!.
Common shebangs:
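- #!/bin/bash - Use bash (most common for bioinformatics)
- #!/bin/sh - Use the basic POSIX sh (more portable, but bash-only features such as arrays may be missing; on Debian and Ubuntu /bin/sh is dash, not bash)
- #!/usr/bin/env bash - Use the first bash found in your PATH (works across systems)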
Use #!/usr/bin/env bash for maximum portability. This finds bash wherever it's installed, rather than assuming it's in /bin/bash.
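To see the env form in action, here is a small script. Its name, which_shell.sh, is made up for this sketch. When you run it as ./which_shell.sh, the kernel reads the #! line and env starts the first bash on your PATH. `command -v bash` reports that same location. `BASH_VERSION` is only set when bash is the interpreter.

```shell
cat > which_shell.sh <<'EOF'
#!/usr/bin/env bash
# which_shell.sh (hypothetical) - report which bash the #! line selected
echo "Interpreter: $(command -v bash)"
echo "Bash version: ${BASH_VERSION}"
EOF
chmod +x which_shell.sh
./which_shell.sh
```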
Script Arguments
Access command-line arguments with special variables:
greet.sh
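The body of greet.sh is not shown. One plausible version that prints the output below is sketched here, written with a quoted here-document so it can be pasted into a terminal. It uses `"$*"` to print all arguments as a single string. When you pass arguments on to another command, `"$@"` is the safer choice because it keeps each argument separate.

```shell
cat > greet.sh <<'EOF'
#!/usr/bin/env bash
# greet.sh - show how positional parameters are filled in
echo "Script name: $0"
echo "First argument: $1"
echo "Second argument: $2"
echo "All arguments: $*"
echo "Number of arguments: $#"
EOF
chmod +x greet.sh
```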
./greet.sh Alice Bob
Script name: ./greet.sh
First argument: Alice
Second argument: Bob
All arguments: Alice Bob
Number of arguments: 2
Arguments are separated by whitespace, so $1 is Alice and $2 is Bob. To pass a value containing spaces as one argument, quote it: ./greet.sh "Alice Smith".
Checking Arguments
Always validate that required arguments are provided:
process_sample.sh
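Here is a sketch of process_sample.sh that matches the usage message below, again created with a quoted here-document. The messages go to stderr (`>&2`), so they still appear on screen but never end up in a redirected output file. The "Processing ..." line and the placeholder comment are illustrative and not taken from the original script.

```shell
cat > process_sample.sh <<'EOF'
#!/usr/bin/env bash
# process_sample.sh - refuse to run without a sample ID
if [ $# -lt 1 ]; then
    echo "Usage: $0 <sample_id>" >&2
    echo "Example: $0 Sample_01" >&2
    exit 1
fi

sample_id="$1"
echo "Processing ${sample_id}..."
# ... real processing steps for the sample would go here (hypothetical) ...
EOF
chmod +x process_sample.sh
```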
./process_sample.sh
Usage: ./process_sample.sh <sample_id>
Example: ./process_sample.sh Sample_01
The script rejects a missing argument with a helpful message.
Exit Codes
Scripts return exit codes: 0 for success, non-zero for errors.
./successful_script.sh
echo $?
0
The special variable $? contains the exit code of the last command. 0 means success.
check_file.sh
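check_file.sh itself is not listed. Here is a minimal sketch consistent with both runs shown on this page: the error for a missing file, and "File exists: ..." for a file that is present. The explicit `exit 1` and `exit 0` lines are what make it usable in the if test under "Using Exit Codes in Workflows".

```shell
cat > check_file.sh <<'EOF'
#!/usr/bin/env bash
# check_file.sh - exit 0 if the file exists, 1 (with a message) otherwise
if [ $# -ne 1 ]; then
    echo "Usage: $0 <file>" >&2
    exit 1
fi

if [ ! -f "$1" ]; then
    echo "ERROR: File not found: $1" >&2
    exit 1
fi

echo "File exists: $1"
exit 0
EOF
chmod +x check_file.sh
```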
./check_file.sh missing.txt
echo "Exit code: $?"
ERROR: File not found: missing.txt
Exit code: 1
A non-zero exit code signals an error to the calling program.
Using Exit Codes in Workflows
if ./check_file.sh sequences.fasta; then
    echo "File OK, proceeding with analysis"
else
    echo "Cannot proceed - fix errors first"
fi
File exists: sequences.fasta
File OK, proceeding with analysis
Use script exit codes in conditionals. This lets one script's success or failure decide whether the next step runs, which is how scripts are chained together.
Practical Bioinformatics Scripts
Count Reads in FASTQ
count_reads.sh
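Here is a possible count_reads.sh. The sketch assumes the input is an uncompressed FASTQ file with the standard four lines per read (header, sequence, +, qualities) and no wrapped sequence lines. Gzipped files would need `gzip -dc` in front of `wc -l`, which this sketch does not handle.

```shell
cat > count_reads.sh <<'EOF'
#!/usr/bin/env bash
# count_reads.sh - count reads in a FASTQ file
set -euo pipefail   # same as set -e -u -o pipefail

if [ $# -ne 1 ]; then
    echo "Usage: $0 <file.fastq>" >&2
    exit 1
fi

fastq="$1"
if [ ! -f "$fastq" ]; then
    echo "ERROR: File not found: $fastq" >&2
    exit 1
fi

# Each FASTQ record is exactly four lines, so reads = lines / 4.
lines=$(wc -l < "$fastq")
echo "${fastq}: $(( lines / 4 )) reads"
EOF
```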
chmod +x count_reads.sh
./count_reads.sh sample.fastq
sample.fastq: 1234567 reads
A reusable script for any FASTQ file.
Extract Gene Sequences
extract_genes.sh
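Here is one way extract_genes.sh could work. The sketch assumes the gene ID is the first word of each FASTA header (for example >AT1G01010, optionally followed by a description) and writes to extracted_genes.fasta, the name shown in the run below. awk reads the ID list first, since `NR == FNR` is only true while reading the first file. It then prints every header and sequence line of the records whose ID is in the list, so multi-line sequences stay whole.

```shell
cat > extract_genes.sh <<'EOF'
#!/usr/bin/env bash
# extract_genes.sh - pull selected records out of a FASTA file by ID
set -euo pipefail

if [ $# -ne 2 ]; then
    echo "Usage: $0 <genome.fasta> <id_list.txt>" >&2
    exit 1
fi

fasta="$1"
id_list="$2"
output="extracted_genes.fasta"

for f in "$fasta" "$id_list"; do
    if [ ! -f "$f" ]; then
        echo "ERROR: File not found: $f" >&2
        exit 1
    fi
done

# Pass 1 (first file): remember each non-empty ID.
# Pass 2 (FASTA): on a header, decide whether to keep the record;
# print the header and its sequence lines while keep is true.
awk 'NR == FNR { if (NF) want[$1] = 1; next }
     /^>/      { id = substr($1, 2); keep = (id in want) }
     keep' "$id_list" "$fasta" > "$output"

# grep -c exits 1 when it counts zero matches; "|| true" keeps set -e happy
count=$(grep -c '^>' "$output" || true)
echo "Extracted ${count} sequences to ${output}"
EOF
chmod +x extract_genes.sh
```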
cat genes_of_interest.txt
AT1G01010
AT1G01020
AT1G01030
The gene list file has one ID per line.
./extract_genes.sh genome.fasta genes_of_interest.txt
Extracted 3 sequences to extracted_genes.fasta
Extract specific genes from a genome FASTA file.
Script Structure Best Practices
Well-Structured Script Template
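- set -e - Exit immediately if a command fails (with some exceptions, such as commands tested by if or while)
- set -u - Treat any use of an unset variable as an error and exit
- set -o pipefail - Make a pipeline fail if any command in it fails; by default only the last command's status counts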
These prevent silent errors and make scripts more robust. Add them after the shebang line.
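The snippet below sketches what each flag changes. It uses throwaway `bash -c` commands rather than a full script. The first two commands print "without pipefail: 0" and "with pipefail: 1". The -u case also prints bash's own "unbound variable" error on stderr. In a real script you would usually combine all three right after the shebang as `set -euo pipefail`.

```shell
# Without pipefail, a pipeline's exit status is that of its LAST command,
# so the failure of `false` is hidden by the successful `true`.
bash -c 'false | true; echo "without pipefail: $?"'

# With pipefail, the pipeline reports the failure instead.
bash -c 'set -o pipefail; false | true; echo "with pipefail: $?"'

# With -e, the shell stops at the first failing command.
bash -c 'set -e; false; echo "never printed"' || echo "set -e stopped the script"

# With -u, expanding an unset variable is an error, so the echo never runs.
bash -c 'set -u; echo "Sample: $sample_id"' || echo "set -u stopped the script"
```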
Looping Over Multiple Files
batch_process.sh
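Here is a sketch of batch_process.sh that reproduces the run below. It assumes the FASTQ files sit in the current directory and are uncompressed, with four lines per read. `shopt -s nullglob` makes `*.fastq` expand to nothing when no files match, instead of to the literal string `*.fastq`. `printf` writes the tab between each file name and its count.

```shell
cat > batch_process.sh <<'EOF'
#!/usr/bin/env bash
# batch_process.sh - count reads in every FASTQ file in the current directory
set -euo pipefail
shopt -s nullglob              # unmatched *.fastq expands to nothing

summary="read_counts.txt"
: > "$summary"                 # start with an empty summary file

for fastq in *.fastq; do
    echo "Processing ${fastq}..."
    lines=$(wc -l < "$fastq")
    # Tab-separated: file name, then read count (4 lines per record)
    printf '%s\t%d\n' "$fastq" "$(( lines / 4 ))" >> "$summary"
done

echo "Summary written to ${summary}"
EOF
chmod +x batch_process.sh
```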
./batch_process.sh
Processing Sample_01.fastq...
Processing Sample_02.fastq...
Processing Sample_03.fastq...
Summary written to read_counts.txt
Process all FASTQ files automatically.
cat read_counts.txt
Sample_01.fastq 1234567
Sample_02.fastq 2345678
Sample_03.fastq 987654
The results are compiled in a tab-separated file.
Processing Paired-End Reads
process_pairs.sh
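Here is one plausible process_pairs.sh. It assumes the common naming scheme `<sample>_R1.fastq` / `<sample>_R2.fastq` seen in the output. The `${r1%_R1.fastq}` expansion strips the suffix to recover the sample name, which is then used to build the R2 file name. Samples without a partner are reported on stderr and skipped. The comment marks where a real paired-end tool would be called.

```shell
cat > process_pairs.sh <<'EOF'
#!/usr/bin/env bash
# process_pairs.sh - match each *_R1.fastq with its *_R2.fastq partner
set -euo pipefail
shopt -s nullglob

for r1 in *_R1.fastq; do
    sample="${r1%_R1.fastq}"       # Sample_01_R1.fastq -> Sample_01
    r2="${sample}_R2.fastq"

    if [ ! -f "$r2" ]; then
        echo "WARNING: no R2 file for ${sample}, skipping" >&2
        continue
    fi

    echo "Processing ${sample}..."
    echo "R1: ${r1}"
    echo "R2: ${r2}"
    # ... a paired-end tool (aligner, trimmer) would take "$r1" "$r2" here ...
done

echo "Paired-end processing complete"
EOF
chmod +x process_pairs.sh
```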
./process_pairs.sh
Processing Sample_01...
R1: Sample_01_R1.fastq
R2: Sample_01_R2.fastq
Processing Sample_02...
R1: Sample_02_R1.fastq
R2: Sample_02_R2.fastq
Paired-end processing complete
The script automatically matches R1/R2 pairs.
Script Arguments vs Hard-Coded Values
Flexible vs Rigid Scripts
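The comparison itself is not reproduced here. The sketch below contrasts the two styles using FASTQ read counting (lines divided by four) as the task. The script names count_rigid.sh and count_flexible.sh are hypothetical. `${1:?message}` makes the flexible version stop with that message when no file is given.

```shell
# Rigid: the input file is baked into the script, so processing
# another sample means editing the code.
cat > count_rigid.sh <<'EOF'
#!/usr/bin/env bash
fastq="Sample_01.fastq"
echo "${fastq}: $(( $(wc -l < "$fastq") / 4 )) reads"
EOF

# Flexible: the file comes from the command line, so the same
# script works for any sample without modification.
cat > count_flexible.sh <<'EOF'
#!/usr/bin/env bash
fastq="${1:?Usage: $0 <file.fastq>}"
echo "${fastq}: $(( $(wc -l < "$fastq") / 4 )) reads"
EOF
```

With these files in place, `bash count_flexible.sh Sample_02.fastq` runs the same code on a different sample, while the rigid script would need editing first.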
Debugging Scripts
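bash -x script.sh
+ echo 'Starting analysis'
Starting analysis
+ sample=Sample_01
+ echo 'Processing Sample_01'
-x prints each command, prefixed with +, before executing it (the trace goes to stderr). Invaluable for debugging.
bash -n script.sh
-n checks syntax without running the script. It catches syntax errors such as a missing fi or an unclosed quote, but not a misspelled command name, which only fails when that line runs.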
Add debug output to your scripts:
Script with Debug Mode
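Here is a minimal sketch of script.sh with an opt-in debug mode, consistent with the trace shown below. `${DEBUG:-0}` supplies a default, so the check still works under `set -u` when DEBUG is unset. `set -x` writes its + lines to stderr, so they never mix into redirected output.

```shell
cat > script.sh <<'EOF'
#!/usr/bin/env bash
# script.sh - trace execution only when DEBUG=1 is set in the environment
set -euo pipefail

if [ "${DEBUG:-0}" = "1" ]; then
    set -x              # from here on, print each command before running it
fi

sample="${1:?Usage: $0 <sample_id>}"
echo "Processing ${sample}"
EOF
chmod +x script.sh
```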
DEBUG=1 ./script.sh Sample_01
+ sample=Sample_01
+ echo 'Processing Sample_01'
Processing Sample_01
Set DEBUG=1 to see detailed execution.
Quick Reference
Script Basics
#!/bin/bash # Shebang line
chmod +x script.sh # Make executable
./script.sh # Run script
bash script.sh # Run with bash explicitly
Arguments
$0 # Script name
$1, $2, $3 # Individual arguments
$@ # All arguments
$# # Number of arguments
"$@" # All arguments (proper quoting)
Exit and Error Handling
exit 0 # Exit success
exit 1 # Exit with error
$? # Last command's exit code
set -e # Exit on error
set -u # Exit on undefined variable
set -o pipefail # Catch pipe errors
Script Structure
#!/bin/bash
# Description and usage
set -e -u -o pipefail
# Validate arguments
if [ $# -ne 1 ]; then
    echo "Usage: $0 <arg>" >&2
    exit 1
fi
# Check inputs
if [ ! -f "$1" ]; then
    echo "ERROR: File not found: $1" >&2
    exit 1
fi
# Main logic here
exit 0
Best Practices Summary
- Always include a shebang - #!/bin/bash on line 1
- Add usage documentation - Comments explaining what the script does
- Validate all arguments - Check count and types
- Check file existence - Before reading/writing
- Use descriptive variable names - sample_id, not s
- Quote variables - "$var" prevents word splitting
- Exit with proper codes - 0 for success, 1+ for errors
- Add safety flags - set -e, set -u, set -o pipefail
- Make scripts executable - chmod +x script.sh
- Test with edge cases - Missing files, empty inputs, etc.
Next Steps
You can now write scripts that:
- Accept arguments for flexibility
- Validate inputs before processing
- Handle errors gracefully
- Process multiple files automatically
The next page covers functions (reusable code blocks within scripts) and advanced debugging techniques to make your scripts even more robust and maintainable.