Variables and Control Flow
Shell scripting transforms individual commands into automated workflows. Variables store data, conditionals make decisions, and loops process multiple items - essential skills for analyzing genomics datasets.
Think of shell scripting as writing recipes for your data analysis. Instead of manually processing 100 FASTQ files, write a script once and let it run automatically.
Variables
Variables store values you want to reuse. In genomics, this might be file paths, sample names, or parameters.
Creating Variables
sample_id=Sample_01
Create a variable. No spaces around the = sign!
Common mistake: sample_id = Sample_01 will fail. Shell interprets this as running a command called sample_id with arguments = and Sample_01. Never use spaces around = when assigning variables.
Using Variables
sample_id=Sample_01
echo $sample_id
Output:
Sample_01
Access the variable's value with $. The $ expands the variable to its value.
sample_id=Sample_01
echo "Processing ${sample_id}_R1.fastq"
Output:
Processing Sample_01_R1.fastq
Use ${variable} syntax when the variable is adjacent to other text. The braces mark where the name ends and prevent ambiguity.
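To see why the braces matter, the sketch below compares both spellings. It assumes no variable named sample_id_R1 exists in your environment; the unset line makes that explicit.

```shell
sample_id=Sample_01
unset sample_id_R1   # make sure the accidental name really is unset

# Without braces the shell reads the name as "sample_id_R1" (underscores and
# digits are valid in names), finds nothing, and substitutes an empty string.
without_braces="$sample_id_R1.fastq"

# With braces the name stops at the closing brace.
with_braces="${sample_id}_R1.fastq"

echo "Without braces: $without_braces"
echo "With braces:    $with_braces"
```

The first line prints only the .fastq extension, which is an easy way to end up with wrong file names without any error message.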
Quotes Matter
message="Hello World"
echo "$message"
Output:
Hello World
Double quotes preserve the value. The two words are stored, and printed, as a single string.
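files="*.fastq"
echo "$files"
Output:
*.fastq
Double quotes prevent glob expansion, so the * remains literal. Unquoted, echo $files would expand to any matching file names in the current directory.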
name=Sample_01
echo "Processing $name"
Output:
Processing Sample_01
Variables expand inside double quotes. This allows building strings.
name=Sample_01
echo 'Processing $name'
Output:
Processing $name
Single quotes prevent all expansion. The $name stays literal.
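Quoting also matters when a variable is used as a command argument. The following sketch counts how many arguments a command receives with and without quotes; count_args and the file name are made up for illustration.

```shell
filename="my sample.fastq"   # hypothetical file name containing a space

# count_args prints how many separate arguments it received.
count_args() {
    echo "$#"
}

unquoted=$(count_args $filename)     # split into "my" and "sample.fastq"
quoted=$(count_args "$filename")     # passed as one argument

echo "Unquoted: $unquoted argument(s)"
echo "Quoted:   $quoted argument(s)"
```

A tool given the unquoted form would look for two files that do not exist, so quoting variables by default is the safer habit.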
Command Substitution
Capture command output into variables:
count=$(wc -l < genes.txt)
echo "Found $count genes"
Output:
Found 15234 genes
$() runs the command and captures its output. Store results for later use.
date=$(date +%Y-%m-%d)
echo "Analysis started on $date"
Output:
Analysis started on 2024-11-20
Capture the current date. Useful for timestamping results.
total=$(awk '{sum += $2} END {print sum}' counts.txt)
echo "Total reads: $total"
Output:
Total reads: 52345678
Store calculation results. Process data and save the answer.
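The commands above depend on files that may not exist on your machine. As a self-contained sketch, the example below creates a small throwaway table first (the gene names and counts are invented) and then captures two results from it.

```shell
# Build a small throwaway input file (gene name, read count).
counts_file=$(mktemp)
printf 'geneA 120\ngeneB 30\ngeneC 50\n' > "$counts_file"

# Number of lines; tr strips the padding some wc implementations add.
n_genes=$(wc -l < "$counts_file" | tr -d ' ')

# Sum of the second column.
total=$(awk '{sum += $2} END {print sum}' "$counts_file")

echo "Found $n_genes genes with $total reads in total"

rm -f "$counts_file"
```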
Practical Variable Examples
Process Sample with Variables
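A minimal sketch of how variables might be combined when processing one sample. The directory names and the report naming scheme are assumptions chosen for illustration, not a required layout.

```shell
# All names below are illustrative placeholders.
sample_id=Sample_01
input_dir=raw_data
output_dir=results
run_date=$(date +%Y-%m-%d)

# Build file paths from the pieces instead of typing them repeatedly.
r1="${input_dir}/${sample_id}_R1.fastq"
r2="${input_dir}/${sample_id}_R2.fastq"
report="${output_dir}/${sample_id}_${run_date}_report.txt"

echo "Sample: $sample_id"
echo "Inputs: $r1 $r2"
echo "Report: $report"
```

Changing sample_id in one place updates every path, which is the main payoff of using variables.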
If Statements
Make decisions based on conditions. Essential for error checking and conditional processing.
Basic If Syntax
if [ -f sequences.fasta ]; then
    echo "File exists"
fi
Output:
File exists
Check if a file exists before processing. This prevents errors from missing files.
if [ condition ]; then - Start conditional block. Spaces around brackets required!
then - Marks the beginning of commands to run when true.
fi - End of if statement (if backwards).
Important: Spaces are mandatory: [ -f file ] works, [-f file] fails.
File Tests
if [ -f data.txt ]; then
    echo "Regular file exists"
fi
Output:
Regular file exists
-f tests for regular file existence. This is the most common file check.
if [ -d results ]; then
    echo "Directory exists"
fi
Output:
Directory exists
-d tests for directory existence. Check before creating output directories.
if [ -s sequences.fastq ]; then
    echo "File exists and is not empty"
fi
Output:
File exists and is not empty
-s tests that a file exists and has content. Catch empty output files early.
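The difference between -f and -s is easiest to see on an empty file. This sketch builds a scratch directory with one empty and one non-empty file (both names are made up) and records which tests pass.

```shell
workdir=$(mktemp -d)                      # scratch directory for the demo
: > "$workdir/empty.fastq"                # zero-byte file
echo "@read1" > "$workdir/reads.fastq"    # file with one line

result=""
if [ -d "$workdir" ]; then
    result="$result dir-ok"
fi
if [ -f "$workdir/empty.fastq" ]; then
    result="$result empty-exists"
fi
if [ -s "$workdir/empty.fastq" ]; then
    result="$result empty-has-content"    # not reached: the file is empty
fi
if [ -s "$workdir/reads.fastq" ]; then
    result="$result reads-has-content"
fi
echo "Tests that passed:$result"

rm -rf "$workdir"
```

The empty file passes -f but not -s, which is why -s is the better check for output from a step that may have failed silently.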
If-Else
if [ -f results.txt ]; then
    echo "Results found"
else
    echo "No results - running analysis"
fi
Output:
No results - running analysis
else handles the false case. Take different actions based on the condition.
If-Elif-Else
quality=35
if [ $quality -ge 40 ]; then
    echo "Excellent quality"
elif [ $quality -ge 30 ]; then
    echo "Good quality"
else
    echo "Poor quality"
fi
Output:
Good quality
Multiple conditions with elif. Categorize data into bins.
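Branches are tested from top to bottom and the first match wins, so thresholds should be listed from highest to lowest. The sketch below wraps the same chain in a small function (classify_quality is a hypothetical name) to try several values, including one exactly on a boundary.

```shell
# classify_quality is a hypothetical helper wrapping the if/elif/else chain.
classify_quality() {
    if [ "$1" -ge 40 ]; then
        echo "Excellent"
    elif [ "$1" -ge 30 ]; then
        echo "Good"
    else
        echo "Poor"
    fi
}

classify_quality 40   # boundary value lands in the first branch
classify_quality 35
classify_quality 29
```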
Numeric Comparisons
count=100
if [ $count -gt 50 ]; then
    echo "More than 50"
fi
Output:
More than 50
-gt means greater than. Other operators: -lt (less than), -eq (equal), -ne (not equal), -ge (greater or equal), -le (less or equal).
String Comparisons
filetype=fastq
if [ "$filetype" = "fastq" ]; then
    echo "Processing FASTQ file"
fi
Output:
Processing FASTQ file
= tests string equality. Use quotes around variables to handle empty values safely.
if [ -z "$empty_var" ]; then
    echo "Variable is empty"
fi
Output:
Variable is empty
-z tests if a string is empty. Use it to check that required variables are set.
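As a sketch of how the -z check is typically used, the example below detects an empty setting and then falls back to a default with ${var:-default}. The variable name and the fallback file name are hypothetical.

```shell
reference=""    # imagine this was supposed to be set earlier

# Quoted: the test receives an empty string and evaluates cleanly.
if [ -z "$reference" ]; then
    status="missing"
else
    status="set"
fi
echo "reference is $status"

# ${var:-fallback} substitutes a fallback when the variable is unset or empty.
reference=${reference:-default_reference.fasta}   # hypothetical file name
echo "Using $reference"
```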
For Loops
Process multiple items automatically. The foundation of batch processing in bioinformatics.
Loop Over List
for sample in Sample_01 Sample_02 Sample_03; do
    echo "Processing $sample"
done
Output:
Processing Sample_01
Processing Sample_02
Processing Sample_03
Loop through an explicit list. The variable sample takes each value in turn.
Loop Over Files
for file in *.fastq; do
    echo "Found: $file"
done
Output:
Found: Sample_01_R1.fastq
Found: Sample_01_R2.fastq
Found: Sample_02_R1.fastq
Loop through files matching a pattern. Process all FASTQs automatically.
If no files match the pattern, the loop will run once with the literal string *.fastq. Always check if files exist first, or use shopt -s nullglob to skip loops when no matches.
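One portable way to check that files exist is a guard at the top of the loop body. The sketch below runs the loop in an empty scratch directory, so the pattern matches nothing; it uses the existence check rather than nullglob so that it also works in shells other than Bash.

```shell
empty_dir=$(mktemp -d)   # scratch directory that contains no FASTQ files
processed=0

for file in "$empty_dir"/*.fastq; do
    # With no matches, $file is the literal pattern, which is not a real file.
    [ -e "$file" ] || continue
    processed=$((processed + 1))
done

echo "Processed $processed files"
rmdir "$empty_dir"
```

Without the guard, the counter would end at 1 and any tool called in the loop would be handed a file name that does not exist.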
Process Multiple Samples
Quality Check All Samples
Arithmetic in Loops
for i in {1..5}; do
    echo "Processing batch $i"
done
Output:
Processing batch 1
Processing batch 2
Processing batch 3
Processing batch 4
Processing batch 5
{1..5} generates the sequence 1 through 5. Useful for numbered samples.
for i in {01..03}; do
    echo "Sample_$i"
done
Output:
Sample_01
Sample_02
Sample_03
Use leading zeros for consistent naming. {01..03} preserves the padding (this requires Bash 4 or later).
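Brace expansion happens before variables are expanded, so {1..$n} does not produce a sequence. When the upper bound lives in a variable, one common workaround is seq combined with printf for the padding. This sketch assumes the seq utility is installed, as it is on typical Linux and macOS systems.

```shell
n=3   # number of samples, e.g. read from a config or counted from files

names=""
for i in $(seq 1 "$n"); do
    # printf pads the number to two digits: 1 -> 01
    name=$(printf 'Sample_%02d' "$i")
    echo "$name"
    names="$names $name"
done
```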
While Loops
Repeat while a condition is true. Useful for reading files line by line.
Basic While Loop
counter=1
while [ $counter -le 3 ]; do
    echo "Iteration $counter"
    counter=$((counter + 1))
done
Output:
Iteration 1
Iteration 2
Iteration 3
Loop while the condition is true. Increment the counter each time.
Read File Line by Line
while read -r sample; do
    echo "Processing: $sample"
done < samples.txt
Output:
Processing: Sample_01
Processing: Sample_02
Processing: Sample_03
Read each line into the variable sample. Process a sample list from a file.
Always use read -r - The -r flag prevents backslash interpretation, preserving the exact line content. Without it, paths like C:\data\files would be corrupted.
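The effect of -r can be checked directly. This sketch writes one line containing backslashes to a temporary file and reads it twice, once with and once without the flag.

```shell
list_file=$(mktemp)
# One line containing backslashes; printf '%s\n' writes it unchanged.
printf '%s\n' 'C:\data\files' > "$list_file"

read -r with_r < "$list_file"      # backslashes kept
read without_r < "$list_file"      # backslashes treated as escapes and dropped

printf 'With -r:    %s\n' "$with_r"
printf 'Without -r: %s\n' "$without_r"

rm -f "$list_file"
```

Without -r the path comes back as C:datafiles, with every backslash silently removed.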
Process Samples from List
Analyze Samples from File
Combining Conditionals and Loops
Real workflows combine these tools:
Quality Filter Multiple Samples
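A minimal sketch of a quality filter, assuming a hypothetical two-column table of sample names and mean quality scores and an assumed cutoff of 30. A while loop reads each row and an if statement decides whether the sample passes.

```shell
# Hypothetical summary table: sample name and mean quality score.
qc_file=$(mktemp)
printf 'Sample_01 36\nSample_02 22\nSample_03 31\n' > "$qc_file"

min_quality=30   # assumed cutoff for this example
passed=""

while read -r sample quality; do
    if [ "$quality" -ge "$min_quality" ]; then
        echo "PASS: $sample (mean quality $quality)"
        passed="$passed $sample"
    else
        echo "FAIL: $sample (mean quality $quality)"
    fi
done < "$qc_file"

echo "Samples kept:$passed"
rm -f "$qc_file"
```

Giving read two variable names splits each line on whitespace, so the first column lands in sample and the rest in quality.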
Practical Bioinformatics Example
Batch Sequence Length Analysis
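One possible sketch of a batch length summary, using two tiny FASTA files with invented sequences. For each file it counts header lines with grep and sums the lengths of all other lines with awk.

```shell
workdir=$(mktemp -d)
# Two tiny FASTA files with made-up sequences.
printf '>seq1\nACGTACGT\n>seq2\nACGT\n' > "$workdir/a.fasta"
printf '>seq3\nACGTAC\n' > "$workdir/b.fasta"

summary=""
for fasta in "$workdir"/*.fasta; do
    [ -e "$fasta" ] || continue
    # Count sequences (header lines) and total bases (all other lines).
    n_seqs=$(grep -c '^>' "$fasta")
    n_bases=$(awk '!/^>/ {total += length($0)} END {print total + 0}' "$fasta")
    line="$(basename "$fasta") sequences=$n_seqs bases=$n_bases"
    echo "$line"
    summary="$summary$line;"
done

rm -rf "$workdir"
```

The total + 0 in the awk END block ensures a file with no sequence lines reports 0 rather than an empty string.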
Common Patterns
Check Required Files Exist
required_files="sample.fastq reference.fasta annotation.gff"
for file in $required_files; do
    if [ ! -f "$file" ]; then
        echo "ERROR: Missing required file: $file"
        exit 1
    fi
done
echo "All required files present"
Output:
All required files present
Validate inputs before analysis. Exit early if anything is missing.
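The space-separated string above relies on word splitting, so it breaks if a file name contains a space. A possible variant passes the names as separate arguments to a small function instead; require_files is a hypothetical helper name and the file names are examples.

```shell
# require_files is a hypothetical helper: it returns non-zero at the first
# missing file. Passing names as separate arguments keeps spaces intact.
require_files() {
    for file in "$@"; do
        if [ ! -f "$file" ]; then
            echo "ERROR: Missing required file: $file" >&2
            return 1
        fi
    done
}

workdir=$(mktemp -d)
: > "$workdir/my sample.fastq"      # present, and its name contains a space

if require_files "$workdir/my sample.fastq"; then
    present_check="ok"
else
    present_check="failed"
fi

# reference.fasta was never created, so this check should fail.
if require_files "$workdir/my sample.fastq" "$workdir/reference.fasta" 2>/dev/null; then
    missing_check="ok"
else
    missing_check="failed"
fi

echo "Existing file only: $present_check"
echo "With a missing file: $missing_check"
rm -rf "$workdir"
```

In a real script the call would usually be followed by || exit 1 so that the analysis stops at the first missing input.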
Create Output Directories
for dir in results logs temp; do
    if [ ! -d "$dir" ]; then
        mkdir "$dir"
        echo "Created $dir/"
    fi
done
Output:
Created results/
Created logs/
Created temp/
Ensure output directories exist. Create them only if needed.
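When the "Created" messages are not needed, mkdir -p achieves the same result without an explicit test. This sketch runs inside a scratch directory so it leaves nothing behind.

```shell
workdir=$(mktemp -d)   # scratch location so the demo leaves nothing behind

# -p creates missing directories (including parents) and succeeds silently
# if they already exist, so no explicit -d test is needed.
mkdir -p "$workdir/results" "$workdir/logs" "$workdir/temp"
mkdir -p "$workdir/results"    # second call is harmless
second_call_status=$?

created=$(ls "$workdir" | tr '\n' ' ')
echo "Created: $created"
rm -rf "$workdir"
```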
Process Paired-End Reads
for r1 in *_R1.fastq; do
    r2=${r1/_R1.fastq/_R2.fastq}
    sample=${r1/_R1.fastq/}
    if [ -f "$r2" ]; then
        echo "Processing pair: $sample"
        echo " R1: $r1"
        echo " R2: $r2"
    else
        echo "WARNING: Missing R2 for $sample"
    fi
done
Output:
Processing pair: Sample_01
 R1: Sample_01_R1.fastq
 R2: Sample_01_R2.fastq
Processing pair: Sample_02
 R1: Sample_02_R1.fastq
 R2: Sample_02_R2.fastq
Match R1 and R2 files. Use string substitution to find pairs.
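The ${var/old/new} form replaces the first match wherever it occurs in the value. An alternative sketch uses suffix removal with ${var%pattern}, which only touches the end of the name and is also available in plain POSIX sh.

```shell
r1="Sample_01_R1.fastq"

# ${var%pattern} removes the shortest match of pattern from the END of the value.
sample=${r1%_R1.fastq}
r2="${sample}_R2.fastq"

echo "Sample: $sample"
echo "R2:     $r2"
```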
Variable Naming Best Practices
- Use descriptive names - sample_id not s
- Lowercase for custom variables - output_dir not OUTPUT_DIR
- UPPERCASE for environment variables - PATH, HOME
- Underscores for readability - gene_count not genecount
- Avoid reserved words - Don't use test, if, for, while as variable names
Quick Reference
Variable Operations
var=value # Assign variable (no spaces!)
$var or ${var} # Use variable
"$var" # Expand in double quotes
'$var' # Literal in single quotes
$(command) # Command substitution
${var:-default} # Use default if var is unset or empty
Conditionals
if [ condition ]; then # If statement
    commands
elif [ condition ]; then # Else if
    commands
else # Else
    commands
fi # End if
File Tests
[ -f file ] # Regular file exists
[ -d dir ] # Directory exists
[ -s file ] # File not empty
[ -r file ] # File readable
[ -w file ] # File writable
[ -x file ] # File executable
Numeric Comparisons
[ $a -eq $b ] # Equal
[ $a -ne $b ] # Not equal
[ $a -lt $b ] # Less than
[ $a -le $b ] # Less or equal
[ $a -gt $b ] # Greater than
[ $a -ge $b ] # Greater or equal
Loops
for var in list; do # For loop
    commands
done
while [ condition ]; do # While loop
    commands
done
while read -r line; do # Read file
    commands
done < file
Next Steps
You now have the building blocks for automation:
- Variables to store data
- Conditionals to make decisions
- Loops to process multiple items
The next page covers turning these commands into reusable shell scripts with proper structure, arguments, and error handling.