Basic Navigation
Before you can process genomics data, you need to find it. The UNIX file system is organized as a tree of directories (folders). Learning to navigate this tree is your first essential skill.
The File System Tree
Think of the file system as an inverted tree. The root is at the top, branches are directories, and leaves are files.
/ (root) - The top of the file system tree. Everything starts here.
~ (home) - Your personal directory. Usually /home/username on Linux or /Users/username on macOS.
Absolute path - Full path from root. Example: /home/user/projects/rnaseq/data
Relative path - Path from your current location. Example: ../data or ./samples
Typical bioinformatics project structure:
/home/user/projects/
└── rnaseq_analysis/
    ├── raw_data/
    │   ├── sample_01.fastq.gz
    │   ├── sample_02.fastq.gz
    │   └── sample_03.fastq.gz
    ├── reference_genome/
    │   ├── genome.fasta
    │   └── annotations.gtf
    ├── scripts/
    │   ├── quality_control.sh
    │   └── alignment.sh
    └── results/
        ├── qc_reports/
        ├── alignments/
        └── counts/
pwd - Print Working Directory
The most basic navigation command: where am I?
$ pwd
/home/user/projects/rnaseq_analysis
Shows your current location in the file system. Think of this as 'you are here' on a map.
Lost in the file system? Type pwd to see where you are. This is especially helpful when working in deeply nested directories or on remote servers.
Every command you run happens in your current directory. Your current directory determines:
- Which files you can access without specifying full paths
- Where new files will be created
- Which directory is used when you specify relative paths
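A minimal sketch of how the working directory decides where a relative name lands. It works inside a throwaway directory made with mktemp; the names demo_root and notes.txt are illustrative assumptions, not part of any real project.

```shell
# Build a throwaway sandbox so nothing real is touched (names are hypothetical).
demo_root=$(mktemp -d)
mkdir -p "$demo_root/rnaseq_analysis/results"

cd "$demo_root/rnaseq_analysis"
touch notes.txt              # relative name: created in the current directory

cd results
touch notes.txt              # same command, different current directory

# The identical command produced two files in two different places.
created=$(cd "$demo_root" && find . -name notes.txt | sort)
echo "$created"
```

Running pwd before each touch would have shown exactly where each file was about to be created.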
ls - List Directory Contents
See what files and directories are in your current location:
$ ls
raw_data  reference_genome  results  scripts
Lists files and directories in the current location. By default, shows just names in a simple format.
Useful ls Options
The ls command becomes more powerful with options (also called flags):
$ ls -l
drwxr-xr-x 5 user group 4096 Nov 18 14:23 raw_data
drwxr-xr-x 3 user group 4096 Nov 18 14:23 reference_genome
drwxr-xr-x 4 user group 4096 Nov 20 09:15 results
drwxr-xr-x 2 user group 4096 Nov 18 14:23 scripts
Long format (-l) shows permissions, link count, owner, group, size, modification date, and name. The 'd' at the start means directory.
$ ls -lh
drwxr-xr-x 5 user group 4.0K Nov 18 14:23 raw_data
drwxr-xr-x 3 user group 4.0K Nov 18 14:23 reference_genome
drwxr-xr-x 4 user group 4.0K Nov 20 09:15 results
drwxr-xr-x 2 user group 4.0K Nov 18 14:23 scripts
Human-readable sizes (-h). Shows 4.0K instead of 4096 bytes. Much easier to read for large genomics files.
$ ls -lh raw_data/
-rw-r--r-- 1 user group 2.3G Nov 18 10:45 sample_01.fastq.gz
-rw-r--r-- 1 user group 2.1G Nov 18 11:23 sample_02.fastq.gz
-rw-r--r-- 1 user group 2.4G Nov 18 12:08 sample_03.fastq.gz
List contents of a specific directory without changing location. Each file holds between 2.1 and 2.4 GB of compressed sequencing data.
$ ls -lha
drwxr-xr-x 7 user group 4.0K Nov 20 14:30 .
drwxr-xr-x 15 user group 4.0K Nov 18 14:20 ..
-rw-r--r-- 1 user group 245 Nov 18 14:22 .gitignore
drwxr-xr-x 5 user group 4.0K Nov 18 14:23 raw_data
drwxr-xr-x 3 user group 4.0K Nov 18 14:23 reference_genome
drwxr-xr-x 4 user group 4.0K Nov 20 09:15 results
drwxr-xr-x 2 user group 4.0K Nov 18 14:23 scripts
Show all files (-a), including hidden files starting with a dot. The . entry is the current directory, .. is the parent directory.
Hidden files (starting with .) are often configuration files. Examples include .bashrc (shell configuration), .gitignore (the list of files Git should not track), and tool-specific config files.
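As a small illustration of how dot files drop out of a plain listing, the sketch below creates one hidden and one visible file in a temporary directory. The file names are placeholders; -A is a standard ls option that behaves like -a but leaves out the . and .. entries.

```shell
# Sandbox with one hidden and one visible file (both empty placeholders).
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch .gitignore genome.fasta

visible=$(ls)                      # hidden names are skipped
all=$(ls -A | tr '\n' ' ')         # -A: include dot files, omit . and ..

echo "ls    -> $visible"
echo "ls -A -> $all"
```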
Practical ls Examples for Bioinformatics
$ ls -lh *.fastq.gz
-rw-r--r-- 1 user group 2.3G Nov 18 10:45 sample_01_R1.fastq.gz
-rw-r--r-- 1 user group 2.3G Nov 18 10:45 sample_01_R2.fastq.gz
-rw-r--r-- 1 user group 2.1G Nov 18 11:23 sample_02_R1.fastq.gz
-rw-r--r-- 1 user group 2.1G Nov 18 11:23 sample_02_R2.fastq.gz
List only FASTQ files using a wildcard pattern (*). The asterisk matches any sequence of characters, so the shell expands *.fastq.gz to every file name ending in .fastq.gz before ls runs.
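To see which names a wildcard pattern matches, echo the pattern. This is a sketch using empty placeholder files in a temporary directory; the sample names are assumptions chosen to mirror paired-end naming.

```shell
# Empty placeholder files standing in for real sequencing data.
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch sample_01_R1.fastq.gz sample_01_R2.fastq.gz genome.fasta

# echo shows exactly what the shell hands to a command after expansion.
matches=$(echo *.fastq.gz)
r1_only=$(echo *_R1.fastq.gz)

echo "$matches"    # sample_01_R1.fastq.gz sample_01_R2.fastq.gz
echo "$r1_only"    # sample_01_R1.fastq.gz
```

Previewing a pattern with echo before passing it to a slower or destructive command is a cheap way to confirm it matches what you expect.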
$ ls -lhS
-rw-r--r-- 1 user group 45G Nov 19 16:34 aligned_sorted.bam
-rw-r--r-- 1 user group 12G Nov 19 14:23 genome_index.idx
-rw-r--r-- 1 user group 3.2G Nov 18 10:45 genome.fasta
-rw-r--r-- 1 user group 2.4G Nov 18 12:08 sample_03.fastq.gz
Sort by size (-S), largest first. Useful to find which files are consuming the most disk space. The BAM alignment file is 45 GB.
$ ls -lht | head -5
-rw-r--r-- 1 user group 1.2M Nov 20 14:25 gene_counts.txt
drwxr-xr-x 4 user group 4.0K Nov 20 09:15 results
-rw-r--r-- 1 user group 45G Nov 19 16:34 aligned_sorted.bam
-rw-r--r-- 1 user group 12G Nov 19 14:23 genome_index.idx
-rw-r--r-- 1 user group 2.4G Nov 18 12:08 sample_03.fastq.gz
Sort by modification time (-t), newest first, and keep only the first 5 lines of output. Find your most recently modified files quickly. On a real system ls -l prints a "total" summary line first (omitted in these listings), and head counts that line too.
cd - Change Directory
Move between directories:
$ cd raw_data
Move into the raw_data directory. This uses a relative path: raw_data must exist in your current directory.
After running cd raw_data, your location changes:
$ pwd
/home/user/projects/rnaseq_analysis/raw_data
Confirm you moved. You're now inside the raw_data directory.
Navigation Shortcuts
Essential cd Shortcuts
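A minimal sketch of the cd shortcuts used most often: .. for the parent directory, - for the previous directory, and ~ for home. The sandbox directory names are assumptions made for illustration.

```shell
# Sandbox that mimics part of the example project.
start_dir=$(mktemp -d)
mkdir -p "$start_dir/rnaseq_analysis/raw_data"

cd "$start_dir/rnaseq_analysis/raw_data"
cd ..                      # up one level, into rnaseq_analysis
up_one=$(pwd)

cd ../..                   # up two levels in one step
cd - > /dev/null           # back to the previous directory (cd - prints it)
back=$(pwd)

echo "$up_one"
echo "$back"               # same location as the line above

cd ~ && pwd                # home directory; plain cd with no argument does the same
```

cd - is handy when you are switching back and forth between two distant directories, such as a scripts folder and a results folder.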
Absolute vs Relative Paths
Absolute path - Starts with /. Goes from the root of the file system. Example: /home/user/projects/rnaseq_analysis/raw_data
Relative path - Starts from your current location. Example: raw_data or ../reference_genome or ./scripts/alignment.sh
$ cd /home/user/projects/rnaseq_analysis/reference_genome
Absolute path: works from anywhere. You could be in any directory, and this command takes you to reference_genome.
$ cd ../reference_genome
Relative path: go up one level (..), then into reference_genome. Only works if reference_genome is a sibling of your current directory.
Use absolute paths in scripts that might run from different locations. Use relative paths for quick navigation when working interactively.
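The sketch below reaches the same directory twice, once by relative path and once by absolute path, to show that only the relative form depends on the starting point. The project variable and the sandbox layout are assumptions for the demonstration.

```shell
# Sandbox project with two sibling directories.
project=$(mktemp -d)/rnaseq_analysis
mkdir -p "$project/raw_data" "$project/reference_genome"

cd "$project/raw_data"
cd ../reference_genome          # relative: only valid because we started in raw_data
via_relative=$(pwd)

cd /                            # move somewhere unrelated
cd "$project/reference_genome"  # absolute: valid from any starting point
via_absolute=$(pwd)

[ "$via_relative" = "$via_absolute" ] && echo "same directory"
```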
Practical Navigation Workflow
Here's a realistic example of navigating a bioinformatics project:
Navigate RNA-seq Project Structure
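One possible walk through the example project, written as a sketch. It recreates a cut-down copy of the layout in a temporary directory with empty placeholder files, so the step order and the names are illustrative rather than prescribed.

```shell
# Recreate part of the example layout in a sandbox (empty placeholder files).
base=$(mktemp -d)
mkdir -p "$base/rnaseq_analysis/raw_data" "$base/rnaseq_analysis/scripts" \
         "$base/rnaseq_analysis/results/counts"
touch "$base/rnaseq_analysis/raw_data/sample_01.fastq.gz"

cd "$base/rnaseq_analysis"      # 1. enter the project
ls                              # 2. see what is here
cd raw_data                     # 3. step into the raw data
ls                              # 4. check the input files
cd ../results/counts            # 5. hop across to another branch of the tree
where=$(pwd)                    # 6. confirm the location
cd ../..                        # 7. climb back to the project root
root=$(pwd)

echo "$where"
echo "$root"
```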
Tab Completion
The terminal's most powerful feature for navigation: Tab completion. Press the Tab key to autocomplete file and directory names.
Start typing a file or directory name and press Tab. The terminal will autocomplete it if it's unique, or show you options if there are multiple matches.
Example workflow:
$ cd proj[TAB]
# Becomes: cd projects/
$ cd projects/rna[TAB]
# Becomes: cd projects/rnaseq_analysis/
$ ls sam[TAB]
# Shows: sample_01.fastq.gz sample_02.fastq.gz sample_03.fastq.gz
$ ls sample_01[TAB]
# Becomes: ls sample_01.fastq.gz
Tab completion:
- Saves typing
- Prevents typos
- Shows you what files exist
- Works for commands, files, and directories
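File names on Linux are case-sensitive: sample.fastq and Sample.fastq are different files. (The default macOS file system is case-insensitive, so the same pair would collide there.) Tab completion helps catch case mismatches.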
Common Navigation Mistakes
Mistake 1: Spaces in File Names
$ cd my project
bash: cd: my: No such file or directory
Spaces confuse the shell. It thinks 'my' and 'project' are two separate arguments. (Recent bash versions report "too many arguments" instead.)
Solution: Use quotes or escape the space:
$ cd "my project"
Quotes treat the entire string as one argument. The space is preserved inside the quotes.
$ cd my\ project
Backslash escapes the space. The space is treated as part of the directory name, not a separator.
Best practice: Avoid spaces in file and directory names. Use underscores or hyphens instead: my_project or my-project.
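A short sketch showing that quoting and escaping reach the same directory, and that a path stored in a variable needs quotes as well. The directory name "my project" and the variable names are placeholders.

```shell
# Sandbox containing one directory whose name has a space in it.
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir "my project"

cd "my project" && quoted=$(pwd)     # quotes keep the space inside one argument
cd "$sandbox"
cd my\ project && escaped=$(pwd)     # a backslash does the same job
cd "$sandbox"

target="my project"
cd "$target" && from_var=$(pwd)      # quote variables that hold paths, too

echo "$quoted"
```

Without the quotes around $target, the shell would split the value at the space and cd would receive two arguments again.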
Mistake 2: Forgetting Where You Are
$ cd results
bash: cd: results: No such file or directory
The 'results' directory doesn't exist in your current location. Check with pwd and ls first.
Solution: Check your location and available directories:
Diagnose Navigation Errors
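The check-before-you-move habit can be sketched as a small shell function. go_to is a hypothetical helper written for this example, not a standard command: it changes directory if the target exists and otherwise reports where you are and what is available.

```shell
# go_to: hypothetical helper that explains itself when the target is missing.
go_to() {
    if [ -d "$1" ]; then
        cd "$1"
    else
        echo "No directory '$1' in $(pwd). Contents here:" >&2
        ls >&2
        return 1
    fi
}

# Sandbox: start inside results, where no nested 'results' directory exists.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/rnaseq_analysis/results/counts"
cd "$sandbox/rnaseq_analysis/results"

go_to results || echo "still in $(pwd)"   # fails and lists what is here
go_to counts && landed=$(pwd)             # succeeds
echo "$landed"
```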
Organizing Bioinformatics Projects
A well-organized project structure makes navigation easier:
Recommended Project Structure
Benefits of this structure:
- Number prefixes show workflow order
- Descriptive names make purpose clear
- Separate scripts from data for version control
- Read-only raw data guards against accidental overwriting or deletion
- Easy navigation - you always know where to find files
Many researchers prefix folders with numbers (00_, 01_, 02_) so ls shows them in workflow order. This makes navigation intuitive.
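A numbered layout of this kind might be set up as follows. This is only a sketch: the directory names (00_raw_data, 01_qc, and so on) are hypothetical choices, and the project is created in a temporary location rather than a real project path.

```shell
# Sandbox location; substitute your own project path.
project=$(mktemp -d)/rnaseq_analysis
mkdir -p "$project/00_raw_data" "$project/01_qc" "$project/02_alignment" \
         "$project/03_counts" "$project/scripts"
touch "$project/00_raw_data/sample_01.fastq.gz"

# Remove write permission from the raw file so accidental overwrites fail.
chmod a-w "$project/00_raw_data/sample_01.fastq.gz"

# ls sorts by name, so the numeric prefixes keep the workflow order.
listing=$(ls "$project" | tr '\n' ' ')
echo "$listing"
```

Because digits sort before letters, the numbered stages appear first and scripts comes last in the listing.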
Navigation on HPC Clusters
When working on high-performance computing clusters, navigation has additional considerations:
$ cd $SCRATCH/rnaseq_analysis
Use environment variables to navigate to scratch space. $SCRATCH is set by the cluster and points to fast temporary storage.
$ cd /project/lab_name/shared_data/reference_genomes
Shared lab data often lives in project directories with absolute paths. Use absolute paths in submission scripts.
Common HPC environment variables:
- $HOME - Your home directory (small quota, for scripts and configs)
- $SCRATCH - Fast temporary storage (large quota, purged periodically)
- $WORK - Long-term storage (medium quota, not purged)
On HPC systems, $SCRATCH is often purged after 30-90 days. Never keep your only copy of data there. Use it for intermediate files during analysis.
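Variable names such as $SCRATCH differ between clusters and may not be set at all, so a script can check before relying on them. The sketch below assumes a cluster that uses the name SCRATCH and falls back to $HOME otherwise; the rnaseq_analysis directory name is a placeholder.

```shell
# Use SCRATCH when the cluster defines it; otherwise fall back to HOME.
workdir="${SCRATCH:-$HOME}/rnaseq_analysis"
echo "Working directory would be: $workdir"

# Report which case applies before any job depends on it.
if [ -n "${SCRATCH:-}" ] && [ -d "$SCRATCH" ]; then
    echo "Scratch space available at $SCRATCH"
else
    echo "SCRATCH is not set here; check your cluster's documentation"
fi
```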
Quick Reference
Navigation Commands Cheat Sheet
Practice Exercises
Practice navigation commands with interactive exercises
Try these exercises on evomics-learn:
- Navigate through a project directory structure
- Find files using ls with different options
- Use relative paths to move between related directories
- Practice tab completion
- Organize a bioinformatics project
Next Steps
Now that you can navigate the file system, the next section covers working with files: creating, copying, moving, and deleting them.
You'll learn:
- Create directories for your projects
- Copy files between locations
- Rename and move files
- Delete files safely (and how to avoid disasters)