Working with Files
Now that you can navigate the file system, it's time to manipulate it. This section covers the essential commands for organizing bioinformatics projects: creating directories, copying files, renaming them, and (carefully) deleting them.
Unlike graphical interfaces, the terminal has no undo button and no recycle bin. Deleted files are gone forever. We'll learn safe practices to prevent disasters.
mkdir - Make Directory
Create new directories to organize your project:
    mkdir rnaseq_analysis
Creates a new directory called rnaseq_analysis in your current location. Nothing is printed if successful.
Verify it was created:
    ls -lh
    drwxr-xr-x 2 user group 4.0K Nov 20 15:30 rnaseq_analysis
The new directory appears in the listing. The 'd' at the start indicates it's a directory.
Create Multiple Directories at Once
    mkdir raw_data processed_data results
Creates three directories with a single command. Separate multiple names with spaces.
    ls
    processed_data raw_data results
All three directories were created.
Create Nested Directories
    mkdir results/qc_reports/fastqc
    mkdir: cannot create directory 'results/qc_reports/fastqc': No such file or directory
This fails because the parent directory 'qc_reports' doesn't exist yet. mkdir doesn't create intermediate directories by default.
Solution: Use the -p flag to create parent directories:
    mkdir -p results/qc_reports/fastqc
The -p flag creates all necessary parent directories. Here results already exists, so qc_reports and fastqc are both created in one command.
Always use mkdir -p when creating nested directories. It's safe even if some directories already exist - it won't give an error.
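A quick way to see the difference is to try both forms in a throwaway location. The sketch below assumes a scratch directory from mktemp and illustrative path names; it is not part of any real project layout.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Plain mkdir fails when the parent is missing (error message hidden here).
mkdir results/qc_reports 2>/dev/null || echo "plain mkdir failed"

# -p creates the missing parents, and is harmless to repeat.
mkdir -p results/qc_reports/fastqc
mkdir -p results/qc_reports/fastqc && echo "second mkdir -p succeeded"
```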
Practical Example: Set Up Project Structure
    mkdir -p rnaseq_project/{raw_data,quality_control,trimmed,alignments,counts,scripts}
Creates a project with six subdirectories in one command. The curly braces {} are expanded by the shell to create multiple directories.
Verify the structure:
    ls -l rnaseq_project/
    drwxr-xr-x 2 user group 4096 Nov 20 15:32 alignments
    drwxr-xr-x 2 user group 4096 Nov 20 15:32 counts
    drwxr-xr-x 2 user group 4096 Nov 20 15:32 quality_control
    drwxr-xr-x 2 user group 4096 Nov 20 15:32 raw_data
    drwxr-xr-x 2 user group 4096 Nov 20 15:32 scripts
    drwxr-xr-x 2 user group 4096 Nov 20 15:32 trimmed
All six directories were created from a single command.
cp - Copy Files
Copy files from one location to another:
    cp sample.fastq.gz backup_sample.fastq.gz
Creates a copy of sample.fastq.gz named backup_sample.fastq.gz in the same directory. The original file remains unchanged.
Copy to a Different Directory
    cp sample.fastq.gz raw_data/
Copies sample.fastq.gz into the raw_data directory. The file keeps the same name in the destination.
    cp sample.fastq.gz raw_data/renamed_sample.fastq.gz
Copies and renames in one command. The file is copied to raw_data with a new name.
Copy Multiple Files
    cp sample_01.fastq.gz sample_02.fastq.gz sample_03.fastq.gz raw_data/
Copies multiple files to the same destination. The last argument must be a directory.
Using wildcards:
    cp *.fastq.gz raw_data/
Copies all FASTQ files at once using a wildcard. The * matches any sequence of characters, so this copies every file ending in .fastq.gz.
Copy Directories
    cp scripts backup_scripts
    cp: scripts is a directory (not copied).
Copying directories requires the -r (recursive) flag. Without it, cp refuses to copy directories. The exact wording of the message depends on your system; GNU cp on Linux prints "cp: -r not specified; omitting directory 'scripts'".
Solution: Use -r to copy recursively:
    cp -r scripts backup_scripts
The -r flag copies the directory and all its contents recursively. Now backup_scripts contains everything from scripts.
Preserve File Attributes
    cp -p reference_genome.fasta backup_reference.fasta
The -p flag preserves timestamps and permissions, and ownership where your account is allowed to set it. Useful when you need to maintain file metadata.
For bioinformatics, consider using cp -rp when backing up entire analysis directories. This preserves timestamps that show when analyses were run.
Practical Example: Backup Before Analysis
Create Analysis Backup
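The individual steps of this example are not reproduced here. One possible three-step sequence is to build a dated backup name, copy with cp -rp, and confirm the copy with diff -r. The project name rnaseq_project, the backup naming scheme and the stand-in script below are assumptions for illustration.

```shell
# Scratch setup: a stand-in project with one script (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p rnaseq_project/scripts
echo 'echo "align reads"' > rnaseq_project/scripts/align.sh

# Step 1: build a date-stamped backup name, e.g. rnaseq_project_backup_2024-11-20.
backup="rnaseq_project_backup_$(date +%Y-%m-%d)"

# Step 2: copy recursively, preserving timestamps and permissions.
cp -rp rnaseq_project "$backup"

# Step 3: verify. diff -r prints nothing and exits 0 when the trees match.
diff -r rnaseq_project "$backup" && echo "backup verified: $backup"
```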
mv - Move (Rename) Files
The mv command does two things: rename files and move files between directories.
Rename a File
    mv old_name.fastq.gz new_name.fastq.gz
Renames old_name.fastq.gz to new_name.fastq.gz in the same directory. The original name no longer exists.
Unlike cp, the mv command does not create a copy. The original file is gone. If new_name.fastq.gz already exists, it will be overwritten without warning (unless you use -i).
Move to a Different Directory
    mv sample.fastq.gz raw_data/
Moves sample.fastq.gz into the raw_data directory. The file no longer exists in the original location.
Move and Rename
    mv results/old_analysis.txt archive/analysis_2024-11-20.txt
Moves the file from results to archive while also renaming it. Combines moving and renaming in one command.
Move Multiple Files
    mv *.bam alignments/
Moves all BAM files to the alignments directory. The * wildcard matches all files ending in .bam.
Safe Move with Confirmation
    mv -i important_file.txt results/
    mv: overwrite 'results/important_file.txt'?
The -i flag prompts before overwriting. If important_file.txt already exists in results/, you'll be asked to confirm.
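Because -i waits for keyboard input, it is awkward inside scripts. A possible non-interactive alternative is the -n (no-clobber) flag, available in both GNU and BSD mv, which skips the move instead of overwriting. A small sketch with made-up file contents:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir results
echo "new version" > important_file.txt
echo "old version" > results/important_file.txt

# -n refuses to overwrite an existing destination; the source stays put.
mv -n important_file.txt results/

cat results/important_file.txt   # still the old version
ls important_file.txt            # the source was not moved
```

Note that -n skips silently, so a script that relies on it should check afterwards whether the source file is still present.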
Rename Multiple Files
Renaming multiple files requires a loop (covered in depth later):
    for file in *.txt; do mv "$file" "${file%.txt}.processed.txt"; done
Inserts .processed before the .txt extension of every .txt file. For example, data.txt becomes data.processed.txt. The loop processes each file individually.
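The expansion ${file%.txt} removes a trailing .txt from the value of file. A cautious way to use the loop is to print the planned renames with echo first and only then run the real mv. The sketch below uses made-up file names in a scratch directory.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch data.txt counts.txt

# Dry run: print each command instead of executing it.
for file in *.txt; do
    echo mv "$file" "${file%.txt}.processed.txt"
done

# Real run: the same loop without echo.
for file in *.txt; do
    mv "$file" "${file%.txt}.processed.txt"
done

ls
```

Running the loop a second time would rename the files again (data.processed.txt would become data.processed.processed.txt), because the new names still end in .txt.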
Practical Example: Organize Downloaded Data
Organize Sequencing Data
rm - Remove (Delete) Files
The rm command permanently deletes files. There is no recycle bin. There is no undo. Deleted files are gone forever. Be absolutely certain before using rm.
Delete a Single File
    rm temporary_file.txt
Deletes temporary_file.txt permanently. No confirmation, no recovery. The file is gone.
Safe Deletion with Confirmation
    rm -i important_data.txt
    rm: remove regular file 'important_data.txt'?
The -i flag prompts for confirmation before deleting. Type 'y' for yes or 'n' for no.
Consider aliasing rm to 'rm -i' in your .bashrc file. This adds a safety check to every deletion.
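As a sketch, the lines below could go in ~/.bashrc. Two caveats apply: aliases normally take effect only in interactive shells, not in scripts, and an alias can be bypassed by typing a leading backslash (\rm). Extending the same flag to cp and mv is an optional addition, not something the advice above requires.

```shell
# ~/.bashrc fragment: ask before deleting or overwriting (interactive shells only).
alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'
```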
Delete Multiple Files
    rm file1.txt file2.txt file3.txt
Deletes multiple files. All three are removed permanently.
Using wildcards (dangerous):
    rm *.tmp
Deletes all files ending in .tmp. Be very careful with wildcards - verify files first with ls *.tmp before deleting.
Delete Directories
    rm results
    rm: cannot remove 'results': Is a directory
Regular rm cannot delete directories. This is a safety feature to prevent accidentally removing entire directory trees.
Use -r to remove recursively:
    rm -r old_analysis
The -r flag recursively deletes the directory and everything inside it. This cannot be undone.
Use -rf to force removal without confirmation:
    rm -rf temporary_results
The -f flag forces deletion without prompting and suppresses errors about missing files. Extremely dangerous. The -rf combination is one of the most destructive commands in UNIX.
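rm -rf / - Deletes your entire system (modern GNU rm refuses this unless given --no-preserve-root, but don't rely on that)
rm -rf ~ - Deletes your entire home directory
rm -rf * - Deletes everything in the current directory (except hidden files)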
Always triple-check your command before pressing Enter, especially with -rf.
Safe Deletion Practices
Safe File Deletion Workflow
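The individual steps of this workflow are not shown here. One reasonable four-step version, assuming the goal is to clear .tmp files from a directory, is to list the matches, count them, delete, and then confirm. The file names are made up.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch run1.tmp run2.tmp sample_01.fastq.gz

# Step 1: see exactly which files the pattern matches.
ls *.tmp

# Step 2: count them, as a sanity check on the size of the deletion.
ls *.tmp | wc -l

# Step 3: delete using the same pattern you just inspected.
# (At an interactive prompt, rm -i *.tmp adds a per-file confirmation.)
rm *.tmp

# Step 4: confirm that only the intended files are gone.
ls
```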
Alternative to Deletion: Move to Trash
Instead of deleting, consider moving files to a trash directory:
    mkdir -p ~/.trash
Create a trash directory in your home folder (one-time setup).
    mv old_file.txt ~/.trash/
Move files to trash instead of deleting. You can recover them if needed.
    rm -rf ~/.trash/*
Empty the trash when you're certain you don't need those files. Review contents first with ls ~/.trash/
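The same idea can be wrapped in a small shell function. The sketch below is hypothetical: the function name trash and the TRASH_DIR variable are inventions for illustration, and the demo points TRASH_DIR at a scratch directory rather than ~/.trash.

```shell
# Hypothetical helper: move files to a trash directory instead of deleting them.
trash() {
    mkdir -p "$TRASH_DIR" || return 1
    mv -- "$@" "$TRASH_DIR"/
}

# Demo in a scratch area (a real setup might use TRASH_DIR="$HOME/.trash").
workdir=$(mktemp -d)
cd "$workdir" || exit 1
TRASH_DIR="$workdir/trash"
touch old_file.txt

trash old_file.txt
ls "$TRASH_DIR"                  # old_file.txt is recoverable from here
mv "$TRASH_DIR/old_file.txt" .   # recover it
```

One limitation of this simple version: trashing a second file with the same name overwrites the first copy in the trash directory.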
Wildcards and Patterns
Wildcards let you operate on multiple files at once. Powerful but dangerous.
Common Wildcard Patterns
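The pattern table for this section is not reproduced here. As a stand-in, the sketch below demonstrates the three standard shell glob forms (*, ? and [...]) on made-up file names, using echo so that nothing is modified.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch sample_1.fastq sample_2.fastq sample_10.fastq sample_A.fastq notes.txt

echo *.fastq              # * matches any run of characters (including none)
echo sample_?.fastq       # ? matches exactly one character
echo sample_[12].fastq    # [12] matches one character from the set
echo sample_[0-9]*.fastq  # one digit, then anything
```

Here sample_?.fastq skips sample_10.fastq because ? stands for exactly one character, while sample_[0-9]*.fastq includes it and leaves out sample_A.fastq.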
Testing Wildcards Safely
Always test wildcards with ls before using them with rm:
Test Before Delete
Practical Scenarios
Scenario 1: Organize Downloaded Sequences
Organize SRA Downloads
Scenario 2: Clean Up Failed Analysis
Remove Failed Analysis Outputs
Scenario 3: Archive Completed Project
Archive and Compress Project
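The steps for this scenario are not shown here. One common approach is to bundle the project with tar, list the archive to check it, and only then consider removing the original. The sketch assumes the project lives in a directory called rnaseq_project, which is a made-up name.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p rnaseq_project/counts
echo "gene_a 42" > rnaseq_project/counts/counts.txt

# Step 1: create a gzip-compressed archive (c = create, z = gzip, f = file name).
tar -czf rnaseq_project.tar.gz rnaseq_project

# Step 2: list the archive contents without extracting (t = list).
tar -tzf rnaseq_project.tar.gz

# Step 3: check the compressed size.
ls -lh rnaseq_project.tar.gz

# Step 4 (optional, irreversible): remove the original once the archive is verified.
# rm -r rnaseq_project
```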
Common Mistakes and How to Avoid Them
Mistake 1: Overwriting Files
    cp new_analysis.txt results.txt
If results.txt already exists, it will be overwritten without warning. The original results.txt is lost forever.
Solution: Use -i for interactive mode or check first:
    ls results.txt && echo 'File exists! Choose different name.'
    results.txt
    File exists! Choose different name.
Check if the file exists before copying. The && means 'run second command only if first succeeds.'
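The same check can be written as a small conditional that copies only when the destination name is free. This is a sketch using the file names from the example and made-up contents; [ -e path ] tests whether a path exists.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "original results" > results.txt
echo "new analysis" > new_analysis.txt

# Copy only if the destination does not exist yet.
if [ -e results.txt ]; then
    echo "results.txt exists; choose a different name."
else
    cp new_analysis.txt results.txt
fi

cat results.txt   # still the original contents
```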
Mistake 2: Deleting Wrong Files
    rm *
DANGER: Deletes everything in the current directory. Always be explicit about what you're deleting.
What you probably meant:
    rm *.tmp
Delete only temporary files with .tmp extension. Be specific with your patterns.
Mistake 3: Space in Wildcard
    rm * .tmp
    rm: cannot remove '.tmp': No such file or directory
The space between * and .tmp makes these two separate arguments. The shell expands the lone * to every non-hidden name in the current directory, so rm deletes all of those files first. It then tries to delete a file literally named '.tmp' and reports an error because none exists. By the time you see the error, your other files are already gone.
Correct:
    rm *.tmp
No space - this is one pattern that matches files ending in .tmp.
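To see what rm would receive in each case without deleting anything, put echo in front of the command. The shell expands the wildcards before echo runs, so the output is the exact argument list. The file names below are made up.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch reads.fastq scratch.tmp

echo rm * .tmp   # prints: rm reads.fastq scratch.tmp .tmp
echo rm *.tmp    # prints: rm scratch.tmp
```

The first line shows that the stray space would hand every file in the directory to rm, followed by a separate argument named .tmp.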
Quick Reference
File Operations Cheat Sheet
Best Practices
- Test patterns first: Use ls to verify what files match before using rm
- Use descriptive names: sample_01_R1.fastq.gz is better than s1.fq
- Organize by date: Prefix directories with YYYY-MM-DD for chronological sorting
- Version control for scripts: Keep scripts in git, not just copied to backups
- Document your structure: Include a README explaining directory organization
- Backup before major changes: cp -rp important directories before risky operations
- Use -i flag: Add confirmation prompts to destructive commands
- Verify after operations: Always check that copy/move/delete did what you expected
Practice Exercises
Practice file operations in a safe environment
Try these exercises on evomics-learn:
- Create a project directory structure
- Organize files using cp and mv
- Practice safe deletion with -i flag
- Use wildcards to manipulate multiple files
- Recover from mistakes using backups
Next Steps
Now that you can create and manipulate files, the next section covers viewing file contents without opening them in editors. This is essential for bioinformatics where files are often too large for text editors.
You'll learn:
- Preview files with head and tail
- Search through files with less
- Display entire files with cat
- Count lines, words, and characters with wc
Further Reading
- Viewing File Contents - Next topic
- GNU Coreutils Manual
- Safe rm practices