Basic Navigation

Before you can process genomics data, you need to find it. The UNIX file system is organized as a tree of directories (folders). Learning to navigate this tree is your first essential skill.

The File System Tree

Think of the file system as an inverted tree. The root is at the top, branches are directories, and leaves are files.

File System Structure

/ (root) - The top of the file system tree. Everything starts here.

~ (home) - Your personal directory. Usually /home/username on Linux or /Users/username on macOS.

Absolute path - Full path from root. Example: /home/user/projects/rnaseq/data

Relative path - Path from your current location. Example: ../data or ./samples

Typical bioinformatics project structure:

/home/user/projects/
└── rnaseq_analysis/
    ├── raw_data/
    │   ├── sample_01.fastq.gz
    │   ├── sample_02.fastq.gz
    │   └── sample_03.fastq.gz
    ├── reference_genome/
    │   ├── genome.fasta
    │   └── annotations.gtf
    ├── scripts/
    │   ├── quality_control.sh
    │   └── alignment.sh
    └── results/
        ├── qc_reports/
        ├── alignments/
        └── counts/

pwd - Print Working Directory

The most basic navigation command: where am I?

$ pwd
/home/user/projects/rnaseq_analysis

Shows your current location in the file system. Think of this as 'you are here' on a map.

Lost in the file system? Type pwd to see where you are. This is especially helpful when working in deeply nested directories or on remote servers.

Every command you run starts from your current directory. Your current directory determines:

  • Which files you can access without specifying full paths
  • Where new files will be created
  • Which directory is used when you specify relative paths
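As a minimal sketch of the points above, the snippet below runs the same command from two different locations. It works in a throwaway directory made with mktemp, and the file name notes.txt is a made-up example, not part of the project described here. The touch command, which creates an empty file, is only used to make the effect visible.

```shell
# Work in a throwaway sandbox so nothing real is touched (all names are illustrative).
sandbox=$(mktemp -d)
mkdir -p "$sandbox/raw_data" "$sandbox/results"

cd "$sandbox/results"
touch notes.txt            # created in the current directory: results/

cd "$sandbox/raw_data"
touch notes.txt            # same command, different location: raw_data/

# Each directory now holds its own notes.txt.
found=$(find "$sandbox" -name notes.txt | wc -l | tr -d ' ')
echo "notes.txt files created: $found"
```

The command never mentions a directory, yet two separate files exist afterwards, one per location.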

ls - List Directory Contents

See what files and directories are in your current location:

$ ls
raw_data  reference_genome  results  scripts

Lists files and directories in the current location. By default, shows just names in a simple format.

Useful ls Options

The ls command becomes more powerful with options (also called flags):

$ ls -l
drwxr-xr-x  5 user group  4096 Nov 18 14:23 raw_data
drwxr-xr-x  3 user group  4096 Nov 18 14:23 reference_genome
drwxr-xr-x  4 user group  4096 Nov 20 09:15 results
drwxr-xr-x  2 user group  4096 Nov 18 14:23 scripts

Long format (-l) shows permissions, link count, owner, group, size, date modified, and name. The 'd' at the start means directory; a '-' means a regular file.

$ ls -lh
drwxr-xr-x  5 user group  4.0K Nov 18 14:23 raw_data
drwxr-xr-x  3 user group  4.0K Nov 18 14:23 reference_genome
drwxr-xr-x  4 user group  4.0K Nov 20 09:15 results
drwxr-xr-x  2 user group  4.0K Nov 18 14:23 scripts

Human-readable sizes (-h). Shows 4.0K instead of 4096 bytes. Much easier to read for large genomics files.

$ ls -lh raw_data/
-rw-r--r--  1 user group  2.3G Nov 18 10:45 sample_01.fastq.gz
-rw-r--r--  1 user group  2.1G Nov 18 11:23 sample_02.fastq.gz
-rw-r--r--  1 user group  2.4G Nov 18 12:08 sample_03.fastq.gz

List contents of a specific directory without changing location. Each file holds 2.1 to 2.4 GB of compressed sequencing data.

$ ls -lha
drwxr-xr-x  7 user group  4.0K Nov 20 14:30 .
drwxr-xr-x 15 user group  4.0K Nov 18 14:20 ..
-rw-r--r--  1 user group   245 Nov 18 14:22 .gitignore
drwxr-xr-x  5 user group  4.0K Nov 18 14:23 raw_data
drwxr-xr-x  3 user group  4.0K Nov 18 14:23 reference_genome
drwxr-xr-x  4 user group  4.0K Nov 20 09:15 results
drwxr-xr-x  2 user group  4.0K Nov 18 14:23 scripts

Show all files (-a), including hidden files starting with a dot. The . entry is the current directory, .. is the parent directory.

Hidden files (starting with .) are often configuration files. Examples include .bashrc (shell configuration), .gitignore (the list of files Git should not track), and tool-specific config files.
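To see the difference between ls and ls -a for yourself, here is a small sketch that creates one hidden and one visible file in a temporary directory. The file names are borrowed from the examples in this section; the sandbox itself is an assumption made for illustration.

```shell
# Throwaway directory with one hidden and one visible (empty) file.
sandbox=$(mktemp -d)
cd "$sandbox"
touch .gitignore genome.fasta

ls        # shows only genome.fasta
ls -a     # also shows ., .., and .gitignore

# Count the entries each form reports.
n_visible=$(ls | wc -l | tr -d ' ')
n_all=$(ls -a | wc -l | tr -d ' ')
echo "ls: $n_visible entry, ls -a: $n_all entries"
```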

Practical ls Examples for Bioinformatics

$ ls -lh *.fastq.gz
-rw-r--r--  1 user group  2.3G Nov 18 10:45 sample_01_R1.fastq.gz
-rw-r--r--  1 user group  2.3G Nov 18 10:45 sample_01_R2.fastq.gz
-rw-r--r--  1 user group  2.1G Nov 18 11:23 sample_02_R1.fastq.gz
-rw-r--r--  1 user group  2.1G Nov 18 11:23 sample_02_R2.fastq.gz

List only FASTQ files using a wildcard pattern (*). The asterisk matches any sequence of characters, so *.fastq.gz finds every file whose name ends in .fastq.gz.
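Wildcards can be narrowed to pick out subsets of files. The sketch below is one way to try this safely: it creates empty stand-in files in a temporary directory (the names follow the paired-end naming used above, and notes.txt is a made-up extra). It also uses the ? wildcard, which this section does not cover; ? matches exactly one character.

```shell
# Empty stand-in files in a throwaway directory (names are illustrative).
sandbox=$(mktemp -d)
cd "$sandbox"
touch sample_01_R1.fastq.gz sample_01_R2.fastq.gz sample_02_R1.fastq.gz notes.txt

ls *.fastq.gz               # every compressed FASTQ file; notes.txt is skipped
ls *_R1.fastq.gz            # forward reads only
ls sample_01_R?.fastq.gz    # both read files for sample_01

# Count the matches for each pattern.
n_fastq=$(ls *.fastq.gz | wc -l | tr -d ' ')
n_r1=$(ls *_R1.fastq.gz | wc -l | tr -d ' ')
n_sample1=$(ls sample_01_R?.fastq.gz | wc -l | tr -d ' ')
echo "fastq: $n_fastq, R1: $n_r1, sample_01: $n_sample1"
```

The shell expands the pattern before ls runs, so the same patterns work with any other command that takes file names.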

$ ls -lhS
-rw-r--r--  1 user group   45G Nov 19 16:34 aligned_sorted.bam
-rw-r--r--  1 user group   12G Nov 19 14:23 genome_index.idx
-rw-r--r--  1 user group  3.2G Nov 18 10:45 genome.fasta
-rw-r--r--  1 user group  2.4G Nov 18 12:08 sample_03.fastq.gz

Sort by size (-S), largest first. Useful to find which files are consuming the most disk space. The BAM alignment file is 45 GB.
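A quick way to check how -S orders files is to create a few files of known size. This is only a sketch: the file names and sizes are invented, and printf is used here simply to write a fixed number of bytes.

```shell
# Three files of known size in a throwaway directory (names and sizes are illustrative).
sandbox=$(mktemp -d)
cd "$sandbox"
printf 'ACGT' > small.fasta            # 4 bytes
printf '%01000d' 0 > medium.fasta      # 1000 bytes (zero-padded number)
printf '%05000d' 0 > large.fasta       # 5000 bytes

ls -lhS                                # largest first

largest=$(ls -S | head -1)
smallest=$(ls -Sr | head -1)           # -r reverses the sort order
echo "largest: $largest, smallest: $smallest"
```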

$ ls -lht | head -5
-rw-r--r--  1 user group  1.2M Nov 20 14:25 gene_counts.txt
drwxr-xr-x  4 user group  4.0K Nov 20 09:15 results
-rw-r--r--  1 user group   45G Nov 19 16:34 aligned_sorted.bam
-rw-r--r--  1 user group   12G Nov 19 14:23 genome_index.idx
-rw-r--r--  1 user group  2.4G Nov 18 12:08 sample_03.fastq.gz

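Sort by modification time (-t), newest first, and keep only the first 5 lines with head. Find your most recently modified files quickly. On most systems ls -l prints a 'total' summary line first (omitted from the outputs shown in this section), so use head -6 if you want five entries.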
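The following sketch shows time sorting on files with known timestamps. It assumes a throwaway directory and uses touch -t to set modification times by hand; the file names and dates mirror the listing above but the files themselves are empty stand-ins.

```shell
# Empty files with hand-set modification times (format: YYYYMMDDhhmm).
sandbox=$(mktemp -d)
cd "$sandbox"
touch -t 202311181045 sample_01.fastq.gz
touch -t 202311191634 aligned_sorted.bam
touch -t 202311201425 gene_counts.txt

newest=$(ls -t | head -1)              # newest first
oldest=$(ls -tr | head -1)             # -r reverses: oldest first
echo "newest: $newest, oldest: $oldest"

# In long format the first line is a summary, not a file.
first_line=$(LC_ALL=C ls -lt | head -1)
echo "first line of ls -lt: $first_line"
```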

cd - Change Directory

Move between directories:

$ cd raw_data

Move into the raw_data directory. This uses a relative path - raw_data must exist in your current directory.

After running cd raw_data, your location changes:

$ pwd
/home/user/projects/rnaseq_analysis/raw_data

Confirm you moved. You're now inside the raw_data directory.

Essential cd Shortcuts

$ cd ~
$ pwd
/home/user
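Besides cd ~, the cheat sheet at the end of this section lists cd .. and cd -. The sketch below strings the three together. To keep it safe to run anywhere, it points HOME at a temporary directory first; that override, and the directory names, are assumptions made purely for the demonstration.

```shell
# Sandbox with a pretend home directory so the demo does not touch your real one.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/home_user" "$sandbox/rnaseq_analysis/raw_data"
HOME="$sandbox/home_user"

cd "$sandbox/rnaseq_analysis/raw_data"

cd ..                          # up one level
up_one=$(basename "$PWD")
echo "after cd ..: $up_one"

cd ~                           # straight to the home directory
at_home=$PWD
echo "after cd ~:  $at_home"

cd - > /dev/null               # back to the previous directory (cd - prints its name)
back=$(basename "$PWD")
echo "after cd -:  $back"
```

cd - is handy for hopping between two directories, for example a scripts folder and a results folder.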

Absolute vs Relative Paths

Path Types

Absolute path - Starts with /. Goes from the root of the file system. Example: /home/user/projects/rnaseq_analysis/raw_data

Relative path - Starts from your current location. Example: raw_data or ../reference_genome or ./scripts/alignment.sh

$ cd /home/user/projects/rnaseq_analysis/reference_genome

Absolute path - works from anywhere. You could be in any directory, and this command takes you to reference_genome.

$ cd ../reference_genome

Relative path - go up one level (..), then into reference_genome. Only works if reference_genome is a sibling of your current directory.

Use absolute paths in scripts that might run from different locations. Use relative paths for quick navigation when working interactively.
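The difference is easiest to see by trying the same destination both ways. This sketch assumes a stand-in project tree inside a temporary directory; the variable names are hypothetical.

```shell
# Stand-in project tree in a throwaway directory.
sandbox=$(mktemp -d)
project="$sandbox/rnaseq_analysis"
mkdir -p "$project/raw_data" "$project/reference_genome"

# Relative path: works because we start in a sibling directory.
cd "$project/raw_data"
cd ../reference_genome
via_relative=$PWD

# Absolute path: works from any starting point.
cd "$sandbox"
cd "$project/reference_genome"
via_absolute=$PWD

# The same relative path from a different starting point does not resolve.
cd "$sandbox"
if cd ../reference_genome 2>/dev/null; then
    relative_from_elsewhere="ok"
else
    relative_from_elsewhere="failed"
fi
echo "relative path from the sandbox root: $relative_from_elsewhere"
```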

Practical Navigation Workflow

Here's a realistic example of navigating a bioinformatics project:

Navigate RNA-seq Project Structure

$ pwd
/home/user
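One possible walk through such a project is sketched below. It is not the original step sequence: it rebuilds a small part of the example tree with empty files in a temporary directory, and the sandbox root stands in for the home directory.

```shell
# Build a small stand-in for the example project (empty files, illustrative names).
sandbox=$(mktemp -d)
project="$sandbox/projects/rnaseq_analysis"
mkdir -p "$project/raw_data" "$project/reference_genome" \
         "$project/scripts" "$project/results/qc_reports"
touch "$project/raw_data/sample_01.fastq.gz" "$project/raw_data/sample_02.fastq.gz"

cd "$sandbox"                    # stands in for your home directory
pwd                              # 1. where am I?
cd projects/rnaseq_analysis      # 2. enter the project (relative path)
ls                               # 3. what is here?
cd raw_data                      # 4. go to the raw reads
ls -lh                           # 5. inspect the files
n_reads=$(ls *.fastq.gz | wc -l | tr -d ' ')
cd ../results/qc_reports         # 6. up one level, then down into results
where=$(basename "$PWD")
cd - > /dev/null                 # 7. jump back to raw_data
back=$(basename "$PWD")
echo "reads: $n_reads, visited: $where, now in: $back"
```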

Tab Completion

Tab completion is one of the shell's most useful features for navigation. Press the Tab key to autocomplete file and directory names.

Start typing a file or directory name and press Tab. The terminal will autocomplete it if it's unique, or show you options if there are multiple matches.

Example workflow:

$ cd proj[TAB]              # Becomes: cd projects/
$ cd projects/rna[TAB]      # Becomes: cd projects/rnaseq_analysis/
$ ls sam[TAB]               # Shows: sample_01.fastq.gz sample_02.fastq.gz sample_03.fastq.gz
$ ls sample_01[TAB]         # Becomes: ls sample_01.fastq.gz

Tab completion:

  • Saves typing
  • Prevents typos
  • Shows you what files exist
  • Works for commands, files, and directories

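UNIX file names are case-sensitive: sample.fastq and Sample.fastq are different files. (The default macOS file system is a notable exception; it preserves case but does not distinguish it.) Tab completion helps catch case mismatches.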

Common Navigation Mistakes

Mistake 1: Spaces in File Names

$ cd my project
bash: cd: my: No such file or directory

Spaces confuse the shell. It treats 'my' and 'project' as two separate arguments. Newer versions of bash report "cd: too many arguments" instead, but the cause is the same.

Solution: Use quotes or escape the space:

$ cd "my project"

Quotes treat the entire string as one argument. The space is preserved inside the quotes.

$ cd my\ project

Backslash escapes the space. The space is treated as part of the directory name, not a separator.

Best practice: Avoid spaces in file and directory names. Use underscores or hyphens instead: my_project or my-project.
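If you do inherit a directory with a space in its name, the sketch below shows both fixes side by side, plus a third case that often trips people up in scripts: a path stored in a variable. The directory and variable names are illustrative.

```shell
# A directory with a space in its name, inside a throwaway sandbox.
sandbox=$(mktemp -d)
mkdir "$sandbox/my project"
cd "$sandbox"

cd "my project" && quoted=$(basename "$PWD")      # quotes
cd ..
cd my\ project && escaped=$(basename "$PWD")      # backslash escape
cd ..

# A path stored in a variable needs quotes when it is used.
dir="my project"
cd "$dir" && from_variable=$(basename "$PWD")
echo "quoted: $quoted | escaped: $escaped | variable: $from_variable"
```

Without the quotes around "$dir", the shell would split the value at the space and cd would receive two arguments again.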

Mistake 2: Forgetting Where You Are

$ cd results
bash: cd: results: No such file or directory

The 'results' directory doesn't exist in your current location. Check with pwd and ls first.

Solution: Check your location and available directories:

Diagnose Navigation Errors

$ pwd
/home/user/documents
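The same diagnosis can be written as a small check before changing directory. This is a sketch under assumptions: the sandbox layout imitates the situation above, and the [ -d ... ] test, which asks whether a directory exists, is not introduced elsewhere in this section.

```shell
# Recreate the situation: we are in documents/, but results/ lives in the project.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/documents" "$sandbox/projects/rnaseq_analysis/results"
cd "$sandbox/documents"

target=results
if [ -d "$target" ]; then
    cd "$target"
    status="moved"
else
    echo "No '$target' in $(pwd). Directories here:"
    ls -d */ 2>/dev/null || echo "  (none)"
    status="not found"
fi

# Fix: go to the directory that actually contains results, then try again.
cd "$sandbox/projects/rnaseq_analysis" && [ -d "$target" ] && cd "$target"
final=$(basename "$PWD")
echo "first attempt: $status, now in: $final"
```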

Organizing Bioinformatics Projects

A well-organized project structure makes navigation easier:

Recommended Project Structure

project_name/
├── 00_raw_data/                  # Original sequencing files (read-only)
│   ├── sample_01_R1.fastq.gz
│   └── sample_01_R2.fastq.gz
├── 01_quality_control/           # QC reports
│   └── fastqc_results/
├── 02_trimmed/                   # Cleaned reads
│   └── trimmed_fastq/
├── 03_alignment/                 # Aligned reads
│   └── bam_files/
├── 04_counts/                    # Gene expression counts
│   └── count_matrices/
├── 05_analysis/                  # R scripts and results
│   └── differential_expression/
├── reference/                    # Reference genomes and annotations
│   ├── genome.fasta
│   └── annotations.gtf
├── scripts/                      # All analysis scripts
│   ├── 01_quality_control.sh
│   ├── 02_alignment.sh
│   └── 03_counting.sh
└── README.md                     # Project documentation

  • Root directory: top-level project folder with a descriptive name
  • Numbered folders: prefixed with numbers to show workflow order
  • Scripts folder: keeps all scripts together for easy version control
  • Documentation: the README explains the project structure and how to reproduce the analysis

Benefits of this structure:

  • Number prefixes show workflow order
  • Descriptive names make purpose clear
  • Separate scripts from data for version control
  • Keeping raw data read-only guards against accidental modification or deletion
  • Easy navigation - you always know where to find files

Many researchers prefix folders with numbers (00_, 01_, 02_) so ls shows them in workflow order. This makes navigation intuitive.
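A layout like this can be created in one go. The sketch below uses mkdir -p, which is covered properly in the next section, and builds the skeleton inside a temporary directory; project_name is the placeholder from the tree above, not a real project.

```shell
# Create the recommended skeleton in a throwaway location.
sandbox=$(mktemp -d)
cd "$sandbox"

for d in 00_raw_data 01_quality_control 02_trimmed 03_alignment \
         04_counts 05_analysis reference scripts; do
    mkdir -p "project_name/$d"     # -p also creates project_name/ on first use
done
touch project_name/README.md

ls project_name                    # numbered folders list in workflow order

first=$(ls project_name | head -1)
n_entries=$(ls project_name | wc -l | tr -d ' ')
echo "first entry: $first, total entries: $n_entries"
```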

When working on high-performance computing clusters, navigation has additional considerations:

$ cd $SCRATCH/rnaseq_analysis

Use environment variables to navigate to scratch space. On clusters that define it, $SCRATCH points to fast temporary storage.

$ cd /project/lab_name/shared_data/reference_genomes

Shared lab data often lives in project directories with absolute paths. Use absolute paths in submission scripts.

Common HPC environment variables (names other than $HOME vary by site, so check your cluster's documentation):

  • $HOME - Your home directory (small quota, for scripts and configs)
  • $SCRATCH - Fast temporary storage (large quota, purged periodically)
  • $WORK - Long-term storage (medium quota, not purged)

On HPC systems, $SCRATCH is often purged after 30-90 days. Never keep your only copy of data there. Use it for intermediate files during analysis.
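Because $SCRATCH is not defined on every system, a script that relies on it can fall back to another location. The sketch below is one way to do that, assuming a temporary directory is an acceptable substitute; the variable names scratch_dir and workdir are hypothetical. Note that if $SCRATCH is set, this creates rnaseq_analysis inside it.

```shell
# Use $SCRATCH when the cluster defines it; otherwise fall back to a temp directory.
scratch_dir="${SCRATCH:-$(mktemp -d)}"
workdir="$scratch_dir/rnaseq_analysis"

mkdir -p "$workdir"
cd "$workdir"
echo "Working in: $(pwd)"

here=$(basename "$PWD")
```

Quoting "$workdir" keeps the command working even if the scratch path contains spaces.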

Quick Reference

Navigation Commands Cheat Sheet

# Where am I?
pwd                  # Print working directory

# What's here?
ls                   # List files
ls -l                # Long format with details
ls -lh               # Human-readable sizes
ls -lha              # Include hidden files
ls -lht              # Sort by time, newest first
ls -lhS              # Sort by size, largest first

# Move around
cd directory_name    # Go to directory
cd ..                # Go up one level
cd ~                 # Go to home directory
cd -                 # Go to previous directory
cd /absolute/path    # Go to absolute path

# Shortcuts
~                    # Home directory
.                    # Current directory
..                   # Parent directory
-                    # Previous directory (only as an argument to cd)

  • Location: commands to determine where you are
  • Listing: commands to see what files exist
  • Navigation: commands to move between directories
  • Special symbols: ~, . and .. work in any command that takes a path; - is understood by cd

Practice Exercises

Try these exercises on evomics-learn:

  1. Navigate through a project directory structure
  2. Find files using ls with different options
  3. Use relative paths to move between related directories
  4. Practice tab completion
  5. Organize a bioinformatics project

Next Steps

Now that you can navigate the file system, the next section covers working with files: creating, copying, moving, and deleting them.

You'll learn:

  • Create directories for your projects
  • Copy files between locations
  • Rename and move files
  • Delete files safely (and how to avoid disasters)

Further Reading