Basic Navigation

Before you can process genomics data, you need to find it. The UNIX file system is organized as a tree of directories (folders). Learning to navigate this tree is your first essential skill.

The File System Tree

Think of the file system as an inverted tree. The root is at the top, branches are directories, and leaves are files.

File System Structure

/ (root) - The top of the file system tree. Everything starts here.

~ (home) - Your personal directory. Usually /home/username on Linux or /Users/username on macOS.

Absolute path - Full path from root. Example: /home/user/projects/rnaseq/data

Relative path - Path from your current location. Example: ../data or ./samples

Typical bioinformatics project structure:

/home/user/projects/
└── rnaseq_analysis/
    ├── raw_data/
    │   ├── sample_01.fastq.gz
    │   ├── sample_02.fastq.gz
    │   └── sample_03.fastq.gz
    ├── reference_genome/
    │   ├── genome.fasta
    │   └── annotations.gtf
    ├── scripts/
    │   ├── quality_control.sh
    │   └── alignment.sh
    └── results/
        ├── qc_reports/
        ├── alignments/
        └── counts/

pwd - Print Working Directory

The most basic navigation command: where am I?

Input

pwd

Output

/home/user/projects/rnaseq_analysis

Shows your current location in the file system. Think of this as 'you are here' on a map.

Lost in the file system? Type pwd to see where you are. This is especially helpful when working in deeply nested directories or on remote servers.

Every command you run executes in your current directory. Where you are determines:

Which files you can access without specifying full paths
Where new files will be created
Which directory is used when you specify relative paths
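
To see this in practice, here is a minimal sketch that builds a throwaway directory tree and looks for the same relative file name from two locations. The temporary directory, the placeholder file counts.txt, and the variable names are assumptions made for illustration; mkdir and mktemp appear only to set up the demo.

```bash
# Build a small demo tree in a temporary directory (hypothetical names).
demo=$(cd "$(mktemp -d)" && pwd -P)
mkdir -p "$demo/rnaseq_analysis/raw_data" "$demo/rnaseq_analysis/results"

# A new file lands in whatever directory you are in when you create it.
cd "$demo/rnaseq_analysis/results"
echo "gene counts placeholder" > counts.txt

# From results/, the bare name counts.txt resolves to the file we just made.
if [ -e counts.txt ]; then in_results="found"; else in_results="missing"; fi

# From raw_data/, the same bare name points somewhere else and finds nothing.
cd "$demo/rnaseq_analysis/raw_data"
if [ -e counts.txt ]; then in_raw_data="found"; else in_raw_data="missing"; fi

echo "from results/:  $in_results"
echo "from raw_data/: $in_raw_data"
```

The file never moved; only the starting point for the relative name changed.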

ls - List Directory Contents

See what files and directories are in your current location:

Input

ls

Output

raw_data  reference_genome  results  scripts

Lists files and directories in the current location. By default, shows just names in a simple format.

Useful ls Options

The ls command becomes more powerful with options (also called flags):

Input

ls -l

Output

drwxr-xr-x  5 user group  4096 Nov 18 14:23 raw_data
drwxr-xr-x  3 user group  4096 Nov 18 14:23 reference_genome
drwxr-xr-x  4 user group  4096 Nov 20 09:15 results
drwxr-xr-x  2 user group  4096 Nov 18 14:23 scripts

Long format (-l) shows permissions, link count, owner, group, size, modification date, and name. The 'd' at the start means directory.

Input

ls -lh

Output

drwxr-xr-x  5 user group  4.0K Nov 18 14:23 raw_data
drwxr-xr-x  3 user group  4.0K Nov 18 14:23 reference_genome
drwxr-xr-x  4 user group  4.0K Nov 20 09:15 results
drwxr-xr-x  2 user group  4.0K Nov 18 14:23 scripts

Human-readable sizes (-h). Shows 4.0K instead of 4096 bytes. Much easier to read for large genomics files.

Input

ls -lh raw_data/

Output

-rw-r--r--  1 user group  2.3G Nov 18 10:45 sample_01.fastq.gz
-rw-r--r--  1 user group  2.1G Nov 18 11:23 sample_02.fastq.gz
-rw-r--r--  1 user group  2.4G Nov 18 12:08 sample_03.fastq.gz

List contents of a specific directory without changing location. Each file holds 2.1 to 2.4 GB of compressed sequencing data.

Input

ls -lha

Output

drwxr-xr-x  7 user group  4.0K Nov 20 14:30 .
drwxr-xr-x 15 user group  4.0K Nov 18 14:20 ..
-rw-r--r--  1 user group   245 Nov 18 14:22 .gitignore
drwxr-xr-x  5 user group  4.0K Nov 18 14:23 raw_data
drwxr-xr-x  3 user group  4.0K Nov 18 14:23 reference_genome
drwxr-xr-x  4 user group  4.0K Nov 20 09:15 results
drwxr-xr-x  2 user group  4.0K Nov 18 14:23 scripts

Show all files (-a), including hidden files starting with a dot. The . entry is the current directory, .. is the parent directory.

Hidden files (starting with .) are often configuration files. Examples include .bashrc (shell configuration), .gitignore (the list of files Git should not track), and tool-specific config files.
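
A short sketch of the difference between ls and ls -a, run in a temporary directory so it does not touch your real files. The directory contents and variable names here are made up for the demo.

```bash
# Work in a throwaway directory with two visible folders and one hidden file.
demo=$(cd "$(mktemp -d)" && pwd -P)
cd "$demo"
mkdir raw_data scripts
echo "*.bam" > .gitignore

# Plain ls skips names that start with a dot.
visible=$(ls | tr '\n' ' ')

# ls -a includes them, along with the . and .. entries.
with_hidden=$(ls -a | tr '\n' ' ')

echo "ls:    $visible"
echo "ls -a: $with_hidden"
```

The tr call only joins the one-name-per-line output onto a single line for printing.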

Practical ls Examples for Bioinformatics

Input

ls -lh *.fastq.gz

Output

-rw-r--r--  1 user group  2.3G Nov 18 10:45 sample_01_R1.fastq.gz
-rw-r--r--  1 user group  2.3G Nov 18 10:45 sample_01_R2.fastq.gz
-rw-r--r--  1 user group  2.1G Nov 18 11:23 sample_02_R1.fastq.gz
-rw-r--r--  1 user group  2.1G Nov 18 11:23 sample_02_R2.fastq.gz

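List only FASTQ files using a wildcard pattern (*). The asterisk matches any sequence of characters, including none, so *.fastq.gz matches every file name ending in .fastq.gz. The shell expands the pattern into the matching names before ls runs.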
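
Because the shell does the matching, you can preview what a pattern will expand to with echo before handing it to another command. This is a minimal sketch with empty placeholder files; the file names and variable names are assumptions for illustration.

```bash
# Create empty placeholder files in a temporary directory.
demo=$(cd "$(mktemp -d)" && pwd -P)
cd "$demo"
touch sample_01_R1.fastq.gz sample_01_R2.fastq.gz genome.fasta notes.txt

# echo prints the expanded list, which is exactly what ls would receive.
matched=$(echo *.fastq.gz)
echo "$matched"

# A narrower pattern: only the files whose names end in _R1.fastq.gz.
r1_only=$(echo *_R1.fastq.gz)
echo "$r1_only"
```

genome.fasta and notes.txt do not end in .fastq.gz, so neither pattern picks them up.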

Input

ls -lhS

Output

-rw-r--r--  1 user group   45G Nov 19 16:34 aligned_sorted.bam
-rw-r--r--  1 user group   12G Nov 19 14:23 genome_index.idx
-rw-r--r--  1 user group  3.2G Nov 18 10:45 genome.fasta
-rw-r--r--  1 user group  2.4G Nov 18 12:08 sample_03.fastq.gz

Sort by size (-S), largest first. Useful to find which files are consuming the most disk space. The BAM alignment file is 45 GB.
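
You can try size sorting safely on small stand-in files. The sketch below fills three files with a few bytes each (the names and sizes are invented for the demo) and picks out the largest.

```bash
# Make three files of different sizes in a temporary directory.
demo=$(cd "$(mktemp -d)" && pwd -P)
cd "$demo"
head -c 5000 /dev/zero > aligned_sorted.bam   # 5000 bytes
head -c 300  /dev/zero > genome.fasta         # 300 bytes
head -c 20   /dev/zero > notes.txt            # 20 bytes

# -S sorts by size, largest first, so the first name is the biggest file.
largest=$(ls -S | head -n 1)
echo "largest file: $largest"

# The long, human-readable view of the same ordering.
ls -lhS
```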

Input

ls -lht | head -5

Output

-rw-r--r--  1 user group  1.2M Nov 20 14:25 gene_counts.txt
drwxr-xr-x  4 user group  4.0K Nov 20 09:15 results
-rw-r--r--  1 user group   45G Nov 19 16:34 aligned_sorted.bam
-rw-r--r--  1 user group   12G Nov 19 14:23 genome_index.idx
-rw-r--r--  1 user group  2.4G Nov 18 12:08 sample_03.fastq.gz

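Sort by modification time (-t), newest first, and keep only the first 5 lines of output. On most systems ls -l prints a "total" summary line first, and that line counts toward the 5. A quick way to find your most recently modified files.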
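
Time sorting can be tried out the same way. In this sketch, touch -t assigns explicit modification times (format YYYYMMDDhhmm) to empty placeholder files; the names and dates are borrowed from the listing above purely for illustration.

```bash
# Create empty files with known modification times.
demo=$(cd "$(mktemp -d)" && pwd -P)
cd "$demo"
touch -t 202311181208 sample_03.fastq.gz
touch -t 202311191634 aligned_sorted.bam
touch -t 202311201425 gene_counts.txt

# -t sorts by modification time, newest first.
newest=$(ls -t | head -n 1)
echo "newest file: $newest"

# With -l, GNU and BSD ls print a summary line before the entries,
# so it is worth checking what the first line of output actually is.
LC_ALL=C ls -lt | head -n 1
```

Without -l there is no summary line, so ls -t | head -n 5 gives exactly five names.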

cd - Change Directory

Move between directories:

Input

cd raw_data

Move into the raw_data directory. This uses a relative path - raw_data must exist in your current directory.

After running cd raw_data, your location changes:

Input

pwd

Output

/home/user/projects/rnaseq_analysis/raw_data

Confirm you moved. You're now inside the raw_data directory.

Navigation Shortcuts

Essential cd Shortcuts

cd ~

Output

$ pwd
/home/user
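
The other common shortcuts are cd .. (up one level), cd - (back to the previous directory), and plain cd or cd ~ (home). The sketch below walks through them in a temporary tree; the directory names and variable names are assumptions for the demo, not part of any real project.

```bash
# Build a nested demo tree and start at the bottom of it.
demo=$(cd "$(mktemp -d)" && pwd -P)
mkdir -p "$demo/projects/rnaseq_analysis/raw_data"
cd "$demo/projects/rnaseq_analysis/raw_data"

cd ..                 # up one level, into rnaseq_analysis
after_up=$(pwd)

cd ../..              # up two levels in one step, to the top of the demo tree
after_up_two=$(pwd)

cd - > /dev/null      # back to the previous directory (cd - prints it, so silence that)
after_dash=$(pwd)

echo "$after_up"
echo "$after_up_two"
echo "$after_dash"

cd ~                  # home directory; plain cd does the same
```

cd - toggles between your two most recent locations, which is handy when you are switching between a data directory and a results directory.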

Absolute vs Relative Paths

Path Types

Absolute path - Starts with /. Goes from the root of the file system. Example: /home/user/projects/rnaseq_analysis/raw_data

Relative path - Starts from your current location. Example: raw_data or ../reference_genome or ./scripts/alignment.sh

Input

cd /home/user/projects/rnaseq_analysis/reference_genome

Absolute path - works from anywhere. You could be in any directory, and this command takes you to reference_genome.

Input

cd ../reference_genome

Relative path - go up one level (..), then into reference_genome. Only works if reference_genome is a sibling directory.

Use absolute paths in scripts that might run from different locations. Use relative paths for quick navigation when working interactively.
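
The difference is easy to demonstrate. In this sketch, the same relative path succeeds from one starting point and fails from another, while the absolute path works from both. The tree lives in a temporary directory and all names are illustrative.

```bash
# Demo tree: raw_data and reference_genome are siblings.
demo=$(cd "$(mktemp -d)" && pwd -P)
mkdir -p "$demo/rnaseq_analysis/raw_data" "$demo/rnaseq_analysis/reference_genome"

# Relative path from raw_data: reference_genome is a sibling, so this works.
cd "$demo/rnaseq_analysis/raw_data"
if cd ../reference_genome 2>/dev/null; then rel_from_raw="ok"; else rel_from_raw="failed"; fi

# Same relative path from rnaseq_analysis: there is no sibling by that name.
cd "$demo/rnaseq_analysis"
if cd ../reference_genome 2>/dev/null; then rel_from_project="ok"; else rel_from_project="failed"; fi

# Absolute path: the starting directory does not matter.
cd "$demo"
if cd "$demo/rnaseq_analysis/reference_genome"; then abs_result="ok"; else abs_result="failed"; fi

echo "relative from raw_data:        $rel_from_raw"
echo "relative from rnaseq_analysis: $rel_from_project"
echo "absolute:                      $abs_result"
```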

Practical Navigation Workflow

Here's a realistic example of navigating a bioinformatics project:

Navigate RNA-seq Project Structure

pwd

Output

/home/user
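
A session along these lines might continue as follows. This is a sketch, not the original walkthrough: it recreates part of the project layout in a temporary directory that stands in for your home directory, and the placeholder files and variable names are assumptions.

```bash
# Recreate part of the project layout under a temporary stand-in for home.
demo=$(cd "$(mktemp -d)" && pwd -P)
proj="$demo/projects/rnaseq_analysis"
mkdir -p "$proj/raw_data" "$proj/reference_genome" "$proj/scripts" "$proj/results/qc_reports"
touch "$proj/raw_data/sample_01.fastq.gz" "$proj/reference_genome/genome.fasta"

cd "$demo"                     # start at the stand-in home directory
pwd                            # confirm where you are
cd projects/rnaseq_analysis    # relative path into the project
ls                             # lists raw_data, reference_genome, results, scripts
ls raw_data                    # peek inside a directory without moving
cd raw_data                    # now move in
cd ../reference_genome         # hop across to a sibling directory
ls -lh                         # check the reference files and their sizes
cd ../results/qc_reports       # sibling again, then one level deeper
final_dir=$(pwd)
echo "$final_dir"
cd "$proj"                     # absolute path straight back to the project root
```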

Tab Completion

The terminal's most powerful feature for navigation: Tab completion. Press the Tab key to autocomplete file and directory names.

Start typing a file or directory name and press Tab. The terminal will autocomplete it if it's unique, or show you options if there are multiple matches.

Example workflow:

$ cd proj[TAB]
# Becomes: cd projects/
 
$ cd projects/rna[TAB]
# Becomes: cd projects/rnaseq_analysis/
 
$ ls sam[TAB]
# Shows: sample_01.fastq.gz  sample_02.fastq.gz  sample_03.fastq.gz
 
$ ls sample_01[TAB]
# Becomes: ls sample_01.fastq.gz

Tab completion:

Saves typing
Prevents typos
Shows you what files exist
Works for commands, files, and directories

UNIX is case-sensitive. sample.fastq and Sample.fastq are different files. Tab completion helps catch case mismatches.

Common Navigation Mistakes

Mistake 1: Spaces in File Names

Input

cd my project

Output

bash: cd: my: No such file or directory

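The shell splits the command line on spaces, so cd receives 'my' and 'project' as two separate arguments. Newer versions of bash report "cd: too many arguments" instead of the message above.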

Solution: Use quotes or escape the space:

Input

cd "my project"

Quotes treat the entire string as one argument. The space is preserved inside the quotes.

Input

cd my\ project

Backslash escapes the space. The space is treated as part of the directory name, not a separator.

Best practice: Avoid spaces in file and directory names. Use underscores or hyphens instead: my_project or my-project.
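
When you do inherit a directory with a space in its name, both fixes behave identically, and the same quoting rule applies to variables that hold paths. A minimal sketch in a temporary directory, with hypothetical names:

```bash
# One directory with a space in its name, created in a temporary location.
demo=$(cd "$(mktemp -d)" && pwd -P)
cd "$demo"
mkdir "my project"

# Quoting keeps the name together as a single argument.
cd "my project"
quoted_dir=$(pwd)
cd ..

# Escaping the space with a backslash has the same effect.
cd my\ project
escaped_dir=$(pwd)
cd ..

# A path stored in a variable needs quotes when it is used,
# otherwise the shell splits its value at the space.
target="$demo/my project"
cd "$target"
var_dir=$(pwd)

echo "$quoted_dir"
echo "$escaped_dir"
echo "$var_dir"
```

Quoting variables ("$target") is a good habit in scripts even when you expect the path to contain no spaces.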

Mistake 2: Forgetting Where You Are

Input

cd results

Output

bash: cd: results: No such file or directory

The 'results' directory doesn't exist in your current location. Check with pwd and ls first.

Solution: Check your location and available directories:

Diagnose Navigation Errors

pwd

Output

/home/user/documents
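
One way to work through the diagnosis is sketched below. It assumes, for illustration, that you are in a documents directory while results actually lives inside the project; the tree is built in a temporary directory and the variable names are made up.

```bash
# Demo layout: you are in documents/, but results/ lives inside the project.
demo=$(cd "$(mktemp -d)" && pwd -P)
mkdir -p "$demo/documents" "$demo/projects/rnaseq_analysis/results"
cd "$demo/documents"

pwd    # 1. Where am I?
ls     # 2. What is here? (nothing, so no results directory)

# 3. Test for the directory before moving, rather than reading the error afterwards.
if [ -d results ]; then
    cd results
    status="moved into results"
else
    status="results not found in $(pwd)"
fi
echo "$status"

# 4. Go to where the directory actually lives.
cd "$demo/projects/rnaseq_analysis/results"
fixed_dir=$(pwd)
echo "$fixed_dir"
```

The [ -d name ] test is true only when name exists and is a directory, which makes it a useful guard at the top of scripts.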

Organizing Bioinformatics Projects

A well-organized project structure makes navigation easier:

Recommended Project Structure

project_name/
├── 00_raw_data/              # Original sequencing files (read-only)
│   ├── sample_01_R1.fastq.gz
│   └── sample_01_R2.fastq.gz
├── 01_quality_control/       # QC reports
│   └── fastqc_results/
├── 02_trimmed/               # Cleaned reads
│   └── trimmed_fastq/
├── 03_alignment/             # Aligned reads
│   └── bam_files/
├── 04_counts/                # Gene expression counts
│   └── count_matrices/
├── 05_analysis/              # R scripts and results
│   └── differential_expression/
├── reference/                # Reference genomes and annotations
│   ├── genome.fasta
│   └── annotations.gtf
├── scripts/                  # All analysis scripts
│   ├── 01_quality_control.sh
│   ├── 02_alignment.sh
│   └── 03_counting.sh
└── README.md                 # Project documentation

Format Details

Root Directory: Top-level project folder with a descriptive name

Numbered Folders: Prefix with numbers to show workflow order

Scripts Folder: Keep all scripts together for easy version control

Documentation: README explains project structure and how to reproduce analysis

Benefits of this structure:

Number prefixes show workflow order
Descriptive names make purpose clear
Separate scripts from data for version control
Read-only raw data guards against accidental overwriting
Easy navigation - you always know where to find files

Many researchers prefix folders with numbers (00_, 01_, 02_) so ls shows them in workflow order. This makes navigation intuitive.
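
A quick sketch of why the prefixes help: the directories below are created out of order, yet ls lists the numbered ones first and in sequence. The names are illustrative, and LC_ALL=C is set only to make the sort order the same on every system.

```bash
# Create the directories in a deliberately scrambled order.
demo=$(cd "$(mktemp -d)" && pwd -P)
cd "$demo"
mkdir 03_alignment 00_raw_data scripts 02_trimmed 01_quality_control reference

# ls sorts by name, so the numeric prefixes put the workflow steps in order.
listing=$(LC_ALL=C ls | tr '\n' ' ')
echo "$listing"
# 00_raw_data 01_quality_control 02_trimmed 03_alignment reference scripts
```

Two-digit prefixes keep the order intact past step 9, since 10_ sorts after 09_ but would sort before 2_.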

Navigation on HPC Clusters

When working on high-performance computing clusters, navigation has additional considerations:

Input

cd $SCRATCH/rnaseq_analysis

Use environment variables to navigate to scratch space. On clusters that define it, $SCRATCH points to fast temporary storage.

Input

cd /project/lab_name/shared_data/reference_genomes

Shared lab data often lives in project directories with absolute paths. Use absolute paths in submission scripts.

Common HPC environment variables (names and quotas vary by site, so check your cluster's documentation):

$HOME - Your home directory (small quota, for scripts and configs)
$SCRATCH - Fast temporary storage (large quota, purged periodically)
$WORK - Long-term storage (medium quota, not purged)

On HPC systems, $SCRATCH is often purged after 30-90 days. Never keep your only copy of data there. Use it for intermediate files during analysis.
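
Since not every system defines $SCRATCH, a script can check for it before trying to cd. The helper below is a hypothetical sketch, not a standard tool: scratch_project_dir is a made-up function name, and the demo points SCRATCH at a temporary directory to stand in for a real cluster.

```bash
# Hypothetical helper: print the path of a project under $SCRATCH,
# or fail with a message if SCRATCH is not defined.
scratch_project_dir() {
    if [ -z "${SCRATCH:-}" ]; then
        echo "SCRATCH is not set on this system" >&2
        return 1
    fi
    printf '%s\n' "$SCRATCH/$1"
}

# Stand-in for a cluster: on a real system SCRATCH would already be set for you.
SCRATCH=$(cd "$(mktemp -d)" && pwd -P)
mkdir -p "$SCRATCH/rnaseq_analysis"

# Only change directory if the helper succeeded.
if target=$(scratch_project_dir rnaseq_analysis); then
    cd "$target"
fi
scratch_pwd=$(pwd)
echo "$scratch_pwd"
```

On a laptop or a cluster with different variable names, the helper reports the problem instead of letting cd fail with a confusing path such as /rnaseq_analysis.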

Quick Reference

Navigation Commands Cheat Sheet

# Where am I?
pwd                  # Print working directory

# What's here?
ls                   # List files
ls -l                # Long format with details
ls -lh               # Human-readable sizes
ls -lha              # Include hidden files
ls -lht              # Sort by time, newest first
ls -lhS              # Sort by size, largest first

# Move around
cd directory_name    # Go to directory
cd ..                # Go up one level
cd ~                 # Go to home directory
cd -                 # Go to previous directory
cd /absolute/path    # Go to absolute path

# Shortcuts
~                    # Home directory
.                    # Current directory
..                   # Parent directory
-                    # Previous directory

Format Details

Location: Commands to determine where you are

Listing: Commands to see what files exist

Navigation: Commands to move between directories

Special Symbols: Path shortcuts. ~, ., and .. work with any command that takes a path; - is understood by cd.

Practice Exercises

Try these exercises on evomics-learn:

Navigate through a project directory structure
Find files using ls with different options
Use relative paths to move between related directories
Practice tab completion
Organize a bioinformatics project

Next Steps

Now that you can navigate the file system, the next section covers working with files: creating, copying, moving, and deleting them.

You'll learn how to:

Create directories for your projects
Copy files between locations
Rename and move files
Delete files safely (and how to avoid disasters)
