Working with Files

Now that you can navigate the file system, it's time to manipulate it. This section covers the essential commands for organizing bioinformatics projects: creating directories, copying files, renaming them, and (carefully) deleting them.

Unlike graphical interfaces, the terminal has no undo button and no recycle bin. Deleted files are gone forever. We'll learn safe practices to prevent disasters.

mkdir - Make Directory

Create new directories to organize your project:

Input
mkdir rnaseq_analysis

Creates a new directory called rnaseq_analysis in your current location. Nothing is printed if successful.

Verify it was created:

Input
ls -lh
Output
drwxr-xr-x  2 user group  4.0K Nov 20 15:30 rnaseq_analysis

The new directory appears in the listing. The 'd' at the start indicates it's a directory.

Create Multiple Directories at Once

Input
mkdir raw_data processed_data results

Creates three directories with a single command. Separate multiple names with spaces.

Input
ls
Output
processed_data  raw_data  results

All three directories were created.

Create Nested Directories

Input
mkdir results/qc_reports/fastqc
Output
mkdir: cannot create directory 'results/qc_reports/fastqc': No such file or directory

This fails because the parent directory 'qc_reports' doesn't exist yet. mkdir doesn't create intermediate directories by default.

Solution: Use the -p flag to create parent directories:

Input
mkdir -p results/qc_reports/fastqc

The -p flag creates any missing parent directories. Here results already exists, so the command creates qc_reports and then fastqc inside it in one step.

Always use mkdir -p when creating nested directories. It's safe even if some directories already exist - it won't give an error.
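The difference can be seen in a short sketch. The directory name demo_mkdir is hypothetical and chosen only for this demonstration: repeating mkdir -p succeeds silently, while plain mkdir fails because the directory is already there.

```shell
# Hypothetical directory names, used only for this demonstration.
mkdir -p demo_mkdir/qc/fastqc      # creates demo_mkdir, qc, and fastqc

# Running the same command again is harmless: -p ignores existing directories.
mkdir -p demo_mkdir/qc/fastqc
echo "mkdir -p exit status: $?"

# Plain mkdir refuses to create a directory that already exists.
if mkdir demo_mkdir/qc/fastqc 2>/dev/null; then
    echo "plain mkdir: created"
else
    echo "plain mkdir: failed, directory already exists"
fi
```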

Practical Example: Set Up Project Structure

Input
mkdir -p rnaseq_project/{raw_data,quality_control,trimmed,alignments,counts,scripts}

Creates a project with six subdirectories in one command. The shell expands the curly braces {} into six separate paths before mkdir runs, so the list must not contain spaces after the commas.

Verify the structure:

Input
ls -l rnaseq_project/
Output
drwxr-xr-x  2 user group  4096 Nov 20 15:32 alignments
drwxr-xr-x  2 user group  4096 Nov 20 15:32 counts
drwxr-xr-x  2 user group  4096 Nov 20 15:32 quality_control
drwxr-xr-x  2 user group  4096 Nov 20 15:32 raw_data
drwxr-xr-x  2 user group  4096 Nov 20 15:32 scripts
drwxr-xr-x  2 user group  4096 Nov 20 15:32 trimmed

All six directories were created from a single command.

cp - Copy Files

Copy files from one location to another:

Input
cp sample.fastq.gz backup_sample.fastq.gz

Creates a copy of sample.fastq.gz named backup_sample.fastq.gz in the same directory. The original file remains unchanged.

Copy to a Different Directory

Input
cp sample.fastq.gz raw_data/

Copies sample.fastq.gz into the raw_data directory. The file keeps the same name in the destination.

Input
cp sample.fastq.gz raw_data/renamed_sample.fastq.gz

Copies and renames in one command. The file is copied to raw_data with a new name.

Copy Multiple Files

Input
cp sample_01.fastq.gz sample_02.fastq.gz sample_03.fastq.gz raw_data/

Copies multiple files to the same destination. The last argument must be a directory.

Using wildcards:

Input
cp *.fastq.gz raw_data/

Copy all FASTQ files at once using a wildcard. The * matches any characters, so this copies every file ending in .fastq.gz.

Copy Directories

Input
cp scripts backup_scripts
Output
cp: scripts is a directory (not copied).

Copying directories requires the -r (recursive) flag. Without it, cp refuses to copy directories.

Solution: Use -r to copy recursively:

Input
cp -r scripts backup_scripts

The -r flag copies the directory and all its contents recursively. Now backup_scripts contains everything from scripts.
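One detail worth knowing is that the result depends on whether the destination already exists. The sketch below uses a hypothetical demo_cp directory: the first copy creates backup_scripts as a duplicate of scripts, while the second copy, run when backup_scripts is already there, places scripts inside it as a subdirectory.

```shell
# Hypothetical demo directory containing one script.
mkdir -p demo_cp/scripts
echo 'echo hello' > demo_cp/scripts/run.sh

# Destination does not exist yet: backup_scripts becomes a copy of scripts.
cp -r demo_cp/scripts demo_cp/backup_scripts
ls demo_cp/backup_scripts

# Destination now exists: scripts is copied INTO it as a subdirectory.
cp -r demo_cp/scripts demo_cp/backup_scripts
ls demo_cp/backup_scripts
```

Listing the destination after each copy is a quick way to catch an unintended nested directory.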

Preserve File Attributes

Input
cp -p reference_genome.fasta backup_reference.fasta

The -p flag preserves timestamps and permissions, and ownership where your account is allowed to set it. Useful when you need to maintain file metadata.

For bioinformatics, consider using cp -rp when backing up entire analysis directories. This preserves timestamps that show when analyses were run.
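To see what preserving timestamps means in practice, here is a small sketch. The demo_attrs directory and file names are hypothetical, and touch -t is used only to give the source file an obviously old modification time.

```shell
# Hypothetical file with an old modification time (1 Jan 2020, 00:00).
mkdir -p demo_attrs
echo '>chr1' > demo_attrs/reference.fasta
touch -t 202001010000 demo_attrs/reference.fasta

# A plain copy gets the current time; a -p copy keeps the original time.
cp    demo_attrs/reference.fasta demo_attrs/plain_copy.fasta
cp -p demo_attrs/reference.fasta demo_attrs/preserved_copy.fasta

# Compare the modification times in the listing.
ls -l demo_attrs
```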

Practical Example: Backup Before Analysis

Create Analysis Backup

3 steps
mkdir -p backups/$(date +%Y-%m-%d)
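Creates a directory named after today's date, such as backups/2025-11-20. The shell replaces $(date +%Y-%m-%d) with the output of the date command before mkdir runs. As with any successful mkdir, nothing is printed.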
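A complete backup might look like the following sketch. It is one possible workflow under stated assumptions, not a prescribed procedure: the demo_backup directory and its contents are hypothetical, and using diff -r as the verification step is an assumption.

```shell
# Hypothetical analysis directory to back up.
mkdir -p demo_backup/results
echo 'gene_a 42' > demo_backup/results/counts.txt

# Step 1: create a backup directory named after today's date.
backup_dir="demo_backup/backups/$(date +%Y-%m-%d)"
mkdir -p "$backup_dir"

# Step 2: copy the directory, preserving timestamps and permissions.
cp -rp demo_backup/results "$backup_dir/"

# Step 3: verify the copy; diff -r prints nothing when the trees are identical.
diff -r demo_backup/results "$backup_dir/results" && echo "Backup verified: $backup_dir"
```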

mv - Move (Rename) Files

The mv command does two things: rename files and move files between directories.

Rename a File

Input
mv old_name.fastq.gz new_name.fastq.gz

Renames old_name.fastq.gz to new_name.fastq.gz in the same directory. The original name no longer exists.

Unlike cp, the mv command does not create a copy. The original file is gone. If new_name.fastq.gz already exists, it will be overwritten without warning (unless you use -i).

Move to a Different Directory

Input
mv sample.fastq.gz raw_data/

Moves sample.fastq.gz into the raw_data directory. The file no longer exists in the original location.

Move and Rename

Input
mv results/old_analysis.txt archive/analysis_2024-11-20.txt

Moves the file from results to archive and renames it in one command. The archive directory must already exist; create it first with mkdir -p archive if needed.

Move Multiple Files

Input
mv *.bam alignments/

Move all BAM files to the alignments directory. The * wildcard matches all files ending in .bam.

Safe Move with Confirmation

Input
mv -i important_file.txt results/
Output
mv: overwrite 'results/important_file.txt'? 

The -i flag prompts before overwriting. If important_file.txt already exists in results/, you'll be asked to confirm.

Rename Multiple Files

The mv command renames one file at a time, so renaming many files at once takes a loop (covered in depth later):

Input
for file in *.txt; do mv "$file" "${file%.txt}.processed.txt"; done

Renames all .txt files to .processed.txt. For example, data.txt becomes data.processed.txt. The loop processes each file individually.
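Because a rename loop changes many files at once, it helps to preview it first. The sketch below, using a hypothetical demo_rename directory, runs the loop once with echo in front of mv so that the commands are only printed, then runs it for real. The [ -e "$file" ] check is an added safeguard: if no files match, the shell leaves the pattern as literal text, and the check skips it.

```shell
# Hypothetical demo files.
mkdir -p demo_rename
touch demo_rename/data.txt demo_rename/notes.txt

# Dry run: echo prints each mv command instead of running it.
for file in demo_rename/*.txt; do
    [ -e "$file" ] || continue    # skip the literal pattern if nothing matched
    echo mv "$file" "${file%.txt}.processed.txt"
done

# Real run: the same loop without echo.
for file in demo_rename/*.txt; do
    [ -e "$file" ] || continue
    mv "$file" "${file%.txt}.processed.txt"
done

ls demo_rename
```

Note that running the real loop a second time would rename data.processed.txt again, since it still ends in .txt.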

Practical Example: Organize Downloaded Data

Organize Sequencing Data

3 steps
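mkdir -p data/{2024,2025}/{raw,processed}

Creates four directories in one command: data/2024/raw, data/2024/processed, data/2025/raw, and data/2025/processed. The shell expands both sets of braces into every combination; mkdir itself prints nothing.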

rm - Remove (Delete) Files

DANGER: No Undo

The rm command permanently deletes files. There is no recycle bin. There is no undo. Deleted files are gone forever. Be absolutely certain before using rm.

Delete a Single File

Input
rm temporary_file.txt

Deletes temporary_file.txt permanently. No confirmation, no recovery. The file is gone.

Safe Deletion with Confirmation

Input
rm -i important_data.txt
Output
rm: remove regular file 'important_data.txt'? 

The -i flag prompts for confirmation before deleting. Type 'y' for yes or 'n' for no.

Consider aliasing rm to 'rm -i' in your .bashrc file. This adds a safety check to every deletion.
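The alias could be set up as in this sketch. Only the rm line comes from the suggestion above; the cp and mv lines are an optional extension in the same spirit.

```shell
# Lines you might add to ~/.bashrc
# (then open a new terminal, or run: source ~/.bashrc)
alias rm='rm -i'   # ask before every deletion
alias cp='cp -i'   # ask before overwriting when copying
alias mv='mv -i'   # ask before overwriting when moving
```

Two caveats apply: aliases are normally active only in interactive shells, not inside scripts, and an explicit rm -f still overrides the prompt. Treat the alias as a seat belt rather than a guarantee.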

Delete Multiple Files

Input
rm file1.txt file2.txt file3.txt

Deletes multiple files. All three are removed permanently.

Using wildcards (dangerous):

Input
rm *.tmp

Deletes all files ending in .tmp. Be very careful with wildcards - verify files first with ls *.tmp before deleting.

Delete Directories

Input
rm results
Output
rm: cannot remove 'results': Is a directory

Regular rm cannot delete directories. This is a safety feature to prevent accidentally removing entire directory trees.

Use -r to remove recursively:

Input
rm -r old_analysis

The -r flag recursively deletes the directory and everything inside it. This cannot be undone.

Use -rf to force removal without confirmation:

Input
rm -rf temporary_results

The -f flag suppresses confirmation prompts and error messages about missing files. Combined with -r, it removes an entire directory tree without asking, which makes rm -rf one of the most destructive commands in UNIX.

NEVER DO THIS

  • rm -rf / - Deletes your entire system
  • rm -rf ~ - Deletes your entire home directory
  • rm -rf * - Deletes everything in the current directory

Always triple-check your command before pressing Enter, especially with -rf.

Safe Deletion Practices

Safe File Deletion Workflow

4 steps
ls *.tmp
Output
analysis_1.tmp
analysis_2.tmp
temp_alignments.tmp
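One way such a workflow could continue is sketched below. It is an illustration under stated assumptions, not a fixed recipe: the demo_cleanup directory is hypothetical, and the four steps shown (list, count, delete, confirm) are one reasonable ordering.

```shell
# Hypothetical demo files: three temporary files and one file to keep.
mkdir -p demo_cleanup
touch demo_cleanup/analysis_1.tmp demo_cleanup/analysis_2.tmp \
      demo_cleanup/temp_alignments.tmp demo_cleanup/results.txt

# Step 1: list exactly what the pattern matches.
ls demo_cleanup/*.tmp

# Step 2: count the matches and compare with what you expect.
ls demo_cleanup/*.tmp | wc -l

# Step 3: delete using the same pattern you just checked
# (use rm -i here when working interactively).
rm demo_cleanup/*.tmp

# Step 4: confirm that only the intended files are gone.
ls demo_cleanup
```

Reusing the exact pattern from the ls command, for example by recalling it from shell history and changing only ls to rm, avoids retyping mistakes.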

Alternative to Deletion: Move to Trash

Instead of deleting, consider moving files to a trash directory:

Input
mkdir -p ~/.trash

Create a trash directory in your home folder (one-time setup).

Input
mv old_file.txt ~/.trash/

Move files to trash instead of deleting. You can recover them if needed.

Input
rm -rf ~/.trash/*

Empty the trash when you're certain you don't need those files. Review contents first with ls ~/.trash/

Wildcards and Patterns

Wildcards let you operate on multiple files at once. Powerful but dangerous.

Common Wildcard Patterns

# Asterisk (*) - matches any characters (including none)
*.fastq.gz # All files ending in .fastq.gz
sample_* # All files starting with sample_
*_R1_* # All files containing _R1_

# Question mark (?) - matches exactly one character
sample_?.fastq # sample_1.fastq, sample_2.fastq (not sample_10.fastq)
file?.txt # file1.txt, fileA.txt (not file10.txt)

# Square brackets [] - matches one character from a set
sample_[123].fastq # sample_1.fastq, sample_2.fastq, sample_3.fastq
file_[A-Z].txt # file_A.txt through file_Z.txt
data_[0-9].csv # data_0.csv through data_9.csv

# Curly braces {} - expands to multiple patterns
*.{fastq,fasta} # All files ending in .fastq OR .fasta
sample_{01,02,03} # sample_01, sample_02, sample_03
Format Details
  • Asterisk: Most common wildcard - matches zero or more characters
  • Question Mark: Matches exactly one character, useful for single-digit numbers
  • Brackets: Match one character from a defined set or range
  • Braces: Comma-separated alternatives - not a wildcard but very useful
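A harmless way to explore these patterns is to hand them to echo, which simply prints whatever the shell expands them to. The sketch below assumes a hypothetical demo_glob directory with four sample files.

```shell
# Hypothetical demo files.
mkdir -p demo_glob
touch demo_glob/sample_1.fastq demo_glob/sample_2.fastq \
      demo_glob/sample_10.fastq demo_glob/sample_A.fastq

# echo prints whatever the shell expands each pattern to.
echo demo_glob/sample_*.fastq      # all four files
echo demo_glob/sample_?.fastq      # sample_1, sample_2, sample_A (one character only)
echo demo_glob/sample_[0-9].fastq  # sample_1, sample_2 (one digit only)
```

Neither the ? nor the [0-9] pattern matches sample_10.fastq, because each stands for exactly one character.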

Testing Wildcards Safely

Always test wildcards with ls before using them with rm:

Test Before Delete

3 steps
ls sample_*_R1.fastq.gz
Output
sample_01_R1.fastq.gz
sample_02_R1.fastq.gz
sample_03_R1.fastq.gz

Practical Scenarios

Scenario 1: Organize Downloaded Sequences

Organize SRA Downloads

4 steps
mkdir -p project/raw_data

Scenario 2: Clean Up Failed Analysis

Remove Failed Analysis Outputs

4 steps
ls -lh alignments/
Output
-rw-r--r-- 1 user group   45G Nov 19 16:34 sample_01.bam
-rw-r--r-- 1 user group  234K Nov 19 16:15 sample_02.bam (incomplete)
-rw-r--r-- 1 user group  456K Nov 19 16:16 sample_03.bam (incomplete)

Scenario 3: Archive Completed Project

Archive and Compress Project

4 steps
mkdir -p archives/2024

Common Mistakes and How to Avoid Them

Mistake 1: Overwriting Files

Input
cp new_analysis.txt results.txt

If results.txt already exists, it will be overwritten without warning. The original results.txt is lost forever.

Solution: Use -i for interactive mode or check first:

Input
ls results.txt && echo 'File exists! Choose different name.'
Output
results.txt
File exists! Choose different name.

Check if the file exists before copying. The && means 'run second command only if first succeeds.'
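The same check can be made part of the copy itself. This sketch uses hypothetical files in a demo_overwrite directory and the shell's [ -e ... ] test, which succeeds when a file exists; the copy runs only when the destination is free.

```shell
# Hypothetical demo files: an existing result and a new analysis.
mkdir -p demo_overwrite
echo 'old results' > demo_overwrite/results.txt
echo 'new analysis' > demo_overwrite/new_analysis.txt

# Copy only if the destination does not exist yet.
if [ -e demo_overwrite/results.txt ]; then
    echo "results.txt exists; choose a different name."
else
    cp demo_overwrite/new_analysis.txt demo_overwrite/results.txt
fi

# The original contents are still intact.
cat demo_overwrite/results.txt
```

When working interactively, cp -i achieves a similar effect by asking before it overwrites anything.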

Mistake 2: Deleting Wrong Files

Input
rm *

DANGER: Deletes every file in the current directory (only hidden files and subdirectories are spared). Always be explicit about what you're deleting.

What you probably meant:

Input
rm *.tmp

Delete only temporary files with .tmp extension. Be specific with your patterns.

Mistake 3: Space in Wildcard

Input
rm * .tmp
Output
rm: cannot remove '.tmp': No such file or directory

The space between * and .tmp makes these two separate arguments. The shell expands the lone * to every non-hidden file in the current directory, so rm deletes all of them, then tries to delete a file named '.tmp' and reports that it doesn't exist. The error message is the only sign that something went wrong, and by then the files are gone.

Correct:

Input
rm *.tmp

No space - this is one pattern that matches files ending in .tmp.
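One way to check how the shell will expand a command before running it is to put echo in front. The sketch below uses a hypothetical demo_space directory; because echo only prints its arguments, nothing is deleted.

```shell
# Hypothetical demo files.
mkdir -p demo_space
touch demo_space/a.tmp demo_space/b.tmp demo_space/keep.txt

# echo shows the arguments rm would receive; nothing is deleted.
( cd demo_space && echo rm *.tmp )    # one pattern: only the .tmp files
( cd demo_space && echo rm * .tmp )   # two arguments: every file, plus '.tmp'
```

The second line prints keep.txt among the arguments, which is the warning sign that the pattern is broader than intended.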

Quick Reference

File Operations Cheat Sheet

# Create directories
mkdir directory_name # Create one directory
mkdir -p path/to/nested/dir # Create nested directories
mkdir dir1 dir2 dir3 # Create multiple directories

# Copy files
cp file1 file2 # Copy file1 to file2
cp file destination_dir/ # Copy to directory
cp *.txt backup/ # Copy multiple files
cp -r dir1 dir2 # Copy directory recursively
cp -rp dir1 dir2 # Copy preserving attributes

# Move/rename files
mv old_name new_name # Rename file
mv file destination_dir/ # Move file
mv *.fastq raw_data/ # Move multiple files
mv -i file destination/ # Interactive mode (confirm overwrites)

# Delete files (DANGEROUS)
rm file # Delete file (permanent!)
rm -i file # Delete with confirmation
rm *.tmp # Delete multiple files
rm -r directory # Delete directory and contents
rm -rf directory # Force delete (very dangerous)

# Safe practices
ls pattern* | wc -l # Count files before deleting
ls pattern* # List files before deleting
mv file ~/.trash/ # Move to trash instead of deleting
Format Details
  • mkdir: Create directories - use -p for nested directories
  • cp: Copy files - use -r for directories, -p to preserve attributes
  • mv: Move or rename - original file is removed
  • rm: Delete permanently - no undo, use with extreme caution
  • Safety: Always verify before deleting

Best Practices

File Management Best Practices
  1. Test patterns first: Use ls to verify what files match before using rm
  2. Use descriptive names: sample_01_R1.fastq.gz is better than s1.fq
  3. Organize by date: Prefix directories with YYYY-MM-DD for chronological sorting
  4. Version control for scripts: Keep scripts in git, not just copied to backups
  5. Document your structure: Include a README explaining directory organization
  6. Backup before major changes: cp -rp important directories before risky operations
  7. Use -i flag: Add confirmation prompts to destructive commands
  8. Verify after operations: Always check that copy/move/delete did what you expected

Practice Exercises

Practice file operations in a safe environment

Try these exercises on evomics-learn:

  1. Create a project directory structure
  2. Organize files using cp and mv
  3. Practice safe deletion with -i flag
  4. Use wildcards to manipulate multiple files
  5. Recover from mistakes using backups

Next Steps

Now that you can create and manipulate files, the next section covers viewing file contents without opening them in editors. This is essential for bioinformatics where files are often too large for text editors.

You'll learn:

  • Preview files with head and tail
  • Search through files with less
  • Display entire files with cat
  • Count lines, words, and characters with wc

Further Reading