Evomics Docs
UNIX for Biologists/Functions and Debugging

Functions and Debugging

Functions organize code into reusable blocks. Debugging techniques help find and fix errors quickly. Together, they transform scripts from fragile one-offs into robust, maintainable tools.

Functions are like mini-scripts within your script. Write once, use many times. If you're copying and pasting code, you probably need a function.
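For instance, instead of pasting the same commands three times for three samples, a sketch like this wraps them once (the sample names and the "trimming" step are placeholders):

```shell
#!/bin/bash

# One reusable block instead of three pasted copies.
# Sample names and the trimming step are placeholders.
trim_and_report() {
    local sample=$1
    echo "Trimming $sample -> ${sample}_trimmed.fastq"
}

trim_and_report "Sample_01"
trim_and_report "Sample_02"
trim_and_report "Sample_03"
```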

Creating Functions

Functions encapsulate related commands under a single name:

Simple Function

 1  #!/bin/bash
 2
 3  # Define function
 4  greet() {
 5      echo "Hello from function!"
 6  }
 7
 8  # Call function
 9  greet

Format Details
Line 4 — Function definition: name() { commands }
Line 9 — Function call: just use the name

Input:
./function_demo.sh
Output:
Hello from function!

Functions run when called. Define before calling.
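A minimal sketch of why order matters: bash reads a script top to bottom, so a call placed above the definition fails.

```shell
#!/bin/bash

# greet          # calling here would fail: "greet: command not found"

greet() {
    echo "Hello from function!"
}

greet            # works: bash has already seen the definition
```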

Functions with Arguments

Functions access arguments like scripts do ($1, $2, etc.):

Function with Arguments

 1  #!/bin/bash
 2
 3  count_reads() {
 4      local fastq=$1
 5
 6      if [ ! -f "$fastq" ]; then
 7          echo "ERROR: File not found: $fastq"
 8          return 1
 9      fi
10
11      local lines=$(wc -l < "$fastq")
12      local reads=$((lines / 4))
13      echo "$fastq: $reads reads"
14  }
15
16  # Call function with argument
17  count_reads "Sample_01.fastq"
18  count_reads "Sample_02.fastq"

Format Details
Line 4 — Get argument: $1 is the first argument to the function
Line 4 — local: variable only exists inside the function
Line 8 — return: exit the function with a status code
Line 17 — Call: pass arguments like any command

Input:
./count_reads_func.sh
Output:
Sample_01.fastq: 1234567 reads
Sample_02.fastq: 2345678 reads

Reuse function for multiple files.
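The two explicit calls generalize naturally to a glob loop; a sketch using the same count_reads function (file names are whatever the glob matches):

```shell
#!/bin/bash

count_reads() {
    local fastq=$1
    if [ ! -f "$fastq" ]; then
        echo "ERROR: File not found: $fastq"
        return 1
    fi
    local lines
    lines=$(wc -l < "$fastq")
    echo "$fastq: $((lines / 4)) reads"
}

# One call per file, driven by the glob instead of copy-paste
for fastq in *.fastq; do
    [ -f "$fastq" ] || continue   # skip the literal pattern if nothing matches
    count_reads "$fastq"
done
```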

Local vs Global Variables

Variable Scope

 1  #!/bin/bash
 2
 3  # Global variable
 4  output_dir="results"
 5
 6  process_sample() {
 7      # Local variables (only in function)
 8      local sample=$1
 9      local temp_file="temp.txt"
10
11      echo "Processing $sample"
12      echo "Output to: $output_dir"   # Can access global
13  }
14
15  process_sample "Sample_01"
16
17  # sample and temp_file don't exist here
18  echo "Global output_dir: $output_dir"

Format Details
Line 4 — Global: available everywhere
Line 8 — Local: only exists in this function
Line 12 — Access global: functions can read globals
Line 17 — Locals gone: local variables vanish after the function returns
Always Use 'local'

Always declare function variables with local. This prevents accidentally overwriting global variables of the same name, and it makes functions safer and more predictable.
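A short demonstration of what goes wrong without local:

```shell
#!/bin/bash

sample="Sample_01"     # a global we care about

bad_func() {
    sample="scratch"   # no 'local': silently overwrites the global
}

good_func() {
    local sample="scratch"   # 'local': the global is untouched
}

bad_func
echo "$sample"         # prints: scratch  (global clobbered!)

sample="Sample_01"
good_func
echo "$sample"         # prints: Sample_01
```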

Return Values

Functions return exit codes (0-255), not arbitrary values. To return data, use echo:

Returning Values

 1  #!/bin/bash
 2
 3  # Return data via echo
 4  get_read_count() {
 5      local fastq=$1
 6      local lines=$(wc -l < "$fastq")
 7      local reads=$((lines / 4))
 8      echo "$reads"   # Output the value
 9  }
10
11  # Capture returned value
12  count=$(get_read_count "sample.fastq")
13  echo "Total reads: $count"
14
15  # Return status code
16  validate_file() {
17      local file=$1
18      if [ -f "$file" ]; then
19          return 0   # Success
20      else
21          return 1   # Failure
22      fi
23  }
24
25  # Use return code in conditional
26  if validate_file "data.txt"; then
27      echo "File valid"
28  else
29      echo "File missing"
30  fi

Format Details
Line 8 — echo value: write the data to stdout
Line 12 — Capture output: save the echoed value with $(...)
Line 19 — return code: 0 = success, non-zero = failure
Line 26 — Test code: use the function directly in an if statement

Practical Bioinformatics Functions

Check File Format

Validate FASTQ Format

 1  #!/bin/bash
 2
 3  is_fastq() {
 4      local file=$1
 5
 6      # Check file exists and is not empty
 7      if [ ! -s "$file" ]; then
 8          echo "ERROR: File missing or empty: $file"
 9          return 1
10      fi
11
12      # Check first character is @
13      local first_char=$(head -c 1 "$file")
14      if [ "$first_char" != "@" ]; then
15          echo "ERROR: Not FASTQ format (must start with @)"
16          return 1
17      fi
18
19      # Check line count is a multiple of 4
20      local lines=$(wc -l < "$file")
21      if [ $((lines % 4)) -ne 0 ]; then
22          echo "ERROR: Line count not a multiple of 4"
23          return 1
24      fi
25
26      return 0
27  }
28
29  # Use validation
30  for file in *.fastq; do
31      if is_fastq "$file"; then
32          echo "✓ $file is valid FASTQ"
33      else
34          echo "✗ $file failed validation"
35      fi
36  done

Format Details
Line 7 — Check exists: -s tests that the file exists and has content
Line 13 — Format check: FASTQ records start with @
Line 20 — Line check: total must be a multiple of 4
Line 31 — Use validation: test each file in turn

Calculate Sequence Statistics

FASTA Statistics Function

 1  #!/bin/bash
 2
 3  fasta_stats() {
 4      local fasta=$1
 5
 6      if [ ! -f "$fasta" ]; then
 7          echo "ERROR: File not found: $fasta" >&2
 8          return 1
 9      fi
10
11      # Count sequences
12      local seq_count=$(grep -c "^>" "$fasta")
13
14      # Calculate total length
15      local total_length=$(grep -v "^>" "$fasta" | \
16          tr -d '\n' | wc -c)
17
18      # Calculate average
19      local avg_length=0
20      if [ "$seq_count" -gt 0 ]; then
21          avg_length=$((total_length / seq_count))
22      fi
23
24      # Output results
25      echo "File: $fasta"
26      echo "Sequences: $seq_count"
27      echo "Total length: $total_length bp"
28      echo "Average length: $avg_length bp"
29      echo ""
30  }
31
32  # Analyze all FASTA files
33  for fasta in *.fasta; do
34      [ -f "$fasta" ] || continue
35      fasta_stats "$fasta"
36  done

Format Details
Line 7 — Error to stderr: >&2 sends the message to stderr, not stdout
Line 12 — Count seqs: count the > header lines
Line 15 — Total length: remove headers and newlines, then count characters
Line 20 — Check divide: avoid division by zero
Line 34 — Skip non-files: handles the case where the glob matches nothing

Error Handling Patterns

Die Function

A standard pattern for fatal errors:

Die Function for Errors

 1  #!/bin/bash
 2
 3  # Print error and exit
 4  die() {
 5      echo "ERROR: $*" >&2
 6      exit 1
 7  }
 8
 9  # Usage
10  [ -f "required.txt" ] || die "Missing required file"
11  [ $# -eq 2 ] || die "Usage: $0 <input> <output>"
12
13  echo "All checks passed, continuing..."

Format Details
Line 5 — Error message: $* is all the arguments joined together
Line 5 — To stderr: >&2 routes error messages to stderr
Line 6 — Exit fatal: stop the script immediately
Line 10 — Use die: || runs the right-hand side if the first command fails

Warn Function

For non-fatal warnings:

Warn Function

 1  #!/bin/bash
 2
 3  warn() {
 4      echo "WARNING: $*" >&2
 5  }
 6
 7  process_file() {
 8      local file=$1
 9
10      if [ ! -f "$file" ]; then
11          warn "File not found: $file, skipping"
12          return 1
13      fi
14
15      # Process file...
16  }
17
18  # Continue even if some files missing
19  for file in *.txt; do
20      process_file "$file" || continue
21  done

Format Details
Line 4 — Warning: like die, but does not exit
Line 11 — Warn and skip: report the problem but keep going
Line 20 — Continue on error: || continue skips to the next iteration

Debugging Techniques

Add Debug Prints

Debug Mode

 1  #!/bin/bash
 2
 3  DEBUG=${DEBUG:-0}   # Set DEBUG=1 in the environment to enable debug output
 4
 5  debug() {
 6      if [ "$DEBUG" = "1" ]; then
 7          echo "DEBUG: $*" >&2
 8      fi
 9  }
10
11  process_sample() {
12      local sample=$1
13      debug "Entering process_sample with: $sample"
14
15      # Processing...
16      debug "Finished processing $sample"
17  }
18
19  # Run with: DEBUG=1 ./script.sh
20  process_sample "Sample_01"

Format Details
Line 3 — Debug flag: defaults to 0; ${DEBUG:-0} lets the environment override it
Line 6 — Conditional print: only prints if DEBUG=1
Line 13 — Debug points: track the flow of execution
Line 19 — Enable debug: set DEBUG=1 when running

Trace Execution

Input:
bash -x script.sh
Output:
+ sample=Sample_01
+ echo 'Processing Sample_01'
Processing Sample_01
+ count_reads Sample_01.fastq

-x prints each command before running it, so you see exactly what executes.

Add tracing to specific sections:

Selective Tracing

 1  #!/bin/bash
 2
 3  # Normal execution
 4  echo "Starting analysis"
 5
 6  # Enable tracing for this section
 7  set -x
 8  complicated_command
 9  another_command
10  set +x
11
12  # Back to normal
13  echo "Analysis complete"

Format Details
Line 7 — Enable trace: set -x turns tracing on
Line 10 — Disable trace: set +x turns tracing off

Check Syntax Without Running

Input:
bash -n script.sh

-n checks syntax without executing anything, catching syntax errors before you run the script.

Input:
bash -n broken_script.sh
Output:
broken_script.sh: line 15: syntax error near unexpected token `fi'
broken_script.sh: line 15: `fi'

Syntax errors reported immediately.

Error Handling with set Flags

Robust Script Template

 1  #!/bin/bash
 2  # Robust script with comprehensive error handling
 3
 4  # Exit on error
 5  set -e
 6
 7  # Exit on undefined variable
 8  set -u
 9
10  # Catch errors in pipes
11  set -o pipefail
12
13  # Optional: trace execution
14  # set -x
15
16  # Cleanup on exit
17  cleanup() {
18      echo "Cleaning up temporary files..."
19      rm -f temp_*
20  }
21  trap cleanup EXIT
22
23  # Your script here
24  echo "Starting robust pipeline..."

Format Details
Line 5 — set -e: exit if any command fails
Line 8 — set -u: exit on use of an undefined variable
Line 11 — pipefail: catch failures anywhere in a pipe chain
Line 17 — cleanup: function to run on exit
Line 21 — trap: run cleanup when the script exits

set -e has limitations: it is suspended in some contexts, such as inside a function called as an if condition or on the left-hand side of && or ||. Combine it with explicit error checking for critical operations.
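A small sketch of the limitation: when a function is called as an if condition, set -e is suspended inside it, so an early failure does not stop the function.

```shell
#!/bin/bash
set -e

step() {
    false               # this fails...
    echo "still ran"    # ...but execution continues: set -e is
}                       # suspended while 'step' is an if condition

if step; then
    echo "condition looked successful"
fi

echo "script reached the end despite the failure"
```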

Trap for Cleanup

Always clean up temporary files, even if script fails:

Cleanup with Trap

 1  #!/bin/bash
 2
 3  # Create temp directory
 4  temp_dir=$(mktemp -d)
 5
 6  # Ensure cleanup happens
 7  cleanup() {
 8      echo "Cleaning up $temp_dir"
 9      rm -rf "$temp_dir"
10  }
11  trap cleanup EXIT
12
13  # Use temp directory safely
14  echo "Working in $temp_dir"
15  # ... do work ...
16
17  # cleanup runs automatically when script exits

Format Details
Line 4 — Make temp: mktemp -d creates a secure temporary directory
Line 7 — Cleanup func: removes the temporary directory
Line 11 — trap EXIT: run cleanup no matter how the script ends
Line 17 — Automatic: cleanup is called on normal or error exit
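To see the guarantee in action, a sketch that runs a failing script in a child bash: the child aborts partway through, yet its EXIT trap still fires.

```shell
#!/bin/bash

# The child script dies at 'false' because of set -e,
# but its EXIT trap still runs before it exits.
output=$(bash -c '
    set -e
    trap "echo cleaned up" EXIT
    echo "working..."
    false
    echo "never reached"
')

echo "$output"   # prints: working...  then: cleaned up
```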

Complete Example: Robust Pipeline

Production-Ready Analysis Script

 1  #!/bin/bash
 2  # Complete analysis pipeline with error handling
 3
 4  set -e -u -o pipefail
 5
 6  # Functions
 7  die() {
 8      echo "ERROR: $*" >&2
 9      exit 1
10  }
11
12  log() {
13      echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"
14  }
15
16  validate_fastq() {
17      local file=$1
18      [ -f "$file" ] || die "FASTQ not found: $file"
19      [ -s "$file" ] || die "FASTQ is empty: $file"
20
21      local lines=$(wc -l < "$file")
22      [ $((lines % 4)) -eq 0 ] || die "Invalid FASTQ: $file"
23  }
24
25  cleanup() {
26      log "Cleaning up temporary files"
27      rm -rf "$temp_dir"
28  }
29
30  # Setup
31  [ $# -eq 1 ] || die "Usage: $0 <sample.fastq>"
32
33  sample_fastq=$1
34  temp_dir=$(mktemp -d)
35  trap cleanup EXIT
36
37  log "Starting analysis of $sample_fastq"
38
39  # Validate input
40  validate_fastq "$sample_fastq"
41  log "Input validation passed"
42
43  # Analysis
44  log "Counting reads"
45  lines=$(wc -l < "$sample_fastq")
46  reads=$((lines / 4))
47  log "Found $reads reads"
48
49  # Generate report
50  report="analysis_report.txt"
51  {
52      echo "Analysis Report"
53      echo "==============="
54      echo "Sample: $sample_fastq"
55      echo "Reads: $reads"
56      echo "Date: $(date)"
57  } > "$report"
58
59  log "Report written to $report"
60  log "Analysis complete"

Format Details
Line 4 — Safety: exit on errors, undefined variables, and pipe failures
Line 7 — die(): fatal error handling
Line 12 — log(): timestamped logging
Line 16 — validate_fastq(): input validation
Line 25 — cleanup(): cleanup function
Line 35 — trap: ensure cleanup runs
Line 49 — Report: generate the output file

Debugging Common Errors

Undefined Variable

Input:
#!/bin/bash
set -u
echo $undefined_var
Output:
script.sh: line 3: undefined_var: unbound variable

set -u catches undefined variables immediately.

Fix: Always initialize variables, or supply a default with ${var:-default}:

Input:
value=${optional_var:-default_value}
echo $value
Output:
default_value

The default is used if the variable is unset or empty.
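Two related expansions (standard shell parameter expansion) are worth knowing: := assigns the default to the variable, and :? aborts with a message.

```shell
#!/bin/bash

unset threads

echo "${threads:-4}"    # use 4 here, but threads stays unset
echo "${threads:=4}"    # use 4 AND assign it to threads
echo "$threads"         # now prints: 4

# ${var:?message} stops the script if the variable is unset or empty:
# input=${input:?"provide an input file"}
```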

Command Not Found in Function

Input:
my_function() {
    ech 'typo'
}
my_function
Output:
script.sh: line 2: ech: command not found

Typos inside a function body are only caught when the function actually runs.

Fix: bash -n won't help here, because a misspelled command name is still valid syntax. Test each function, or run a linter such as shellcheck, to catch these before a long pipeline hits them.
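A sketch of what bash -n does and does not catch (the scratch file names are hypothetical): it reports syntax errors, but a misspelled command name parses fine and passes.

```shell
#!/bin/bash
# bash -n parses without executing.
# Scratch file names below are hypothetical.

printf 'ech "typo"\n' > typo.sh
if bash -n typo.sh; then
    echo "typo.sh: syntax OK (the typo is NOT caught)"
fi

printf 'if true\nfi\n' > broken.sh          # missing 'then'
if ! bash -n broken.sh 2>/dev/null; then
    echo "broken.sh: syntax error caught"
fi

rm -f typo.sh broken.sh
```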

Silent Pipe Failures

Input:
false | echo 'Still runs'
Output:
Still runs

Without pipefail, a pipeline's exit status is that of its last command, so failures earlier in the pipe are silently ignored!

Input:
set -o pipefail
false | echo 'Still runs'
echo "Exit code: $?"
Output:
Still runs
Exit code: 1

pipefail makes the pipeline report the exit code of the failed command.
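Bash also records every stage's status in the PIPESTATUS array, which tells you exactly which command in the pipe failed:

```shell
#!/bin/bash

false | true | true

# PIPESTATUS holds one exit status per pipeline stage;
# copy it immediately, because the next command overwrites it.
statuses=("${PIPESTATUS[@]}")

echo "Stage statuses: ${statuses[*]}"   # prints: Stage statuses: 1 0 0
```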

Quick Reference

Function Syntax

function_name() {            # Define function
    local var=$1             # Get argument
    echo "result"            # Return data via echo
    return 0                 # Return status code
}

result=$(function_name arg)  # Capture output
function_name arg            # Just run function

Error Handling

set -e            # Exit on error
set -u            # Exit on undefined variable
set -o pipefail   # Catch pipe errors
set -x            # Trace execution

die() {           # Fatal error function
    echo "ERROR: $*" >&2
    exit 1
}

cleanup() {       # Cleanup function
    rm -rf temp_*
}
trap cleanup EXIT # Run on exit

Debugging

bash -x script.sh    # Trace execution
bash -n script.sh    # Check syntax
set -x               # Enable tracing
set +x               # Disable tracing
DEBUG=1 ./script.sh  # Run with debug flag

Best Practices

Function Best Practices
  1. One purpose per function - Each function does one thing well
  2. Use local variables - Prevent variable name conflicts
  3. Validate inputs - Check arguments at function start
  4. Return meaningful codes - 0 for success, 1+ for errors
  5. Document functions - Comments explaining purpose and usage
  6. Keep functions short - If > 50 lines, consider splitting
  7. Test functions independently - Easier to debug
  8. Use consistent naming - verb_noun pattern (e.g., validate_file)
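As a sketch of several of these practices combined (the function, the awk filter, and the file names are illustrative; it assumes single-line FASTA sequences):

```shell
#!/bin/bash

# filter_fasta: print only sequences at least <min_len> bp long.
# Usage: filter_fasta <input.fasta> <min_len>
# Assumes single-line sequences; returns 0 on success, 1 on bad input.
filter_fasta() {
    # Validate inputs at the start
    [ $# -eq 2 ] || { echo "Usage: filter_fasta <fasta> <min_len>" >&2; return 1; }
    local fasta=$1
    local min_len=$2
    [ -f "$fasta" ] || { echo "ERROR: File not found: $fasta" >&2; return 1; }

    # One purpose: length-filter, nothing else
    awk -v min="$min_len" \
        '/^>/ { header = $0; next } length($0) >= min { print header; print $0 }' \
        "$fasta"
}
```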

Next Steps

You now have the tools for writing robust, maintainable scripts:

  • Functions for code reuse
  • Error handling for reliability
  • Debugging techniques for troubleshooting

The final page brings everything together with real-world bioinformatics pipelines, showing how to combine all these techniques into production-ready workflows.

Further Reading