In this workshop, we will learn how to analyze 16S rRNA sequencing data using QIIME2, a bioinformatics pipeline designed for microbiome studies. We will work with amplicon data generated from an Illumina MiSeq sequencing platform. QIIME2 allows us to process raw sequencing reads, detect amplicon sequence variants (ASVs), and generate feature tables for further analysis.
QIIME2 can be installed using a Conda environment, which allows us to manage dependencies efficiently. We will use a pre-configured Virtual Machine (VM) for this workshop, which includes all required software and tools.
qiime2-activate
wget 'https://disc-genomics.uibk.ac.at/data/CAME_SSU.tar.gz' -O - | tar -zx
Before running any analysis, ensure all necessary files are available.
cd VM_CAME
ls
ls -lh *.fastq.gz # Lists all FASTQ files with detailed information
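A quick sanity check at this point is to count the reads in each FASTQ file, since the forward and reverse files of a sample should contain the same number of reads. The sketch below assumes standard four-line FASTQ records; `count_reads` is a hypothetical helper name and not part of QIIME2.

```shell
# Hypothetical helper: count the reads in a gzipped FASTQ file.
# ASSUMPTION: standard 4-line FASTQ records (no wrapped sequence lines).
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Usage (run inside the directory that holds the FASTQ files):
# for f in *.fastq.gz; do printf '%s\t%s\n' "$f" "$(count_reads "$f")"; done
```

A forward/reverse pair with differing counts usually points to a truncated download or a mismatched file pair.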
QIIME2 uses artifacts (.qza) to store intermediate results and metadata. To import our sequencing data, we must reference the original FASTQ files using a manifest file (CSV format), which lists the paths of our raw sequence files.
cat manifest.csv
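The workshop data already ships with a manifest, but for your own data you would need to write one. The following is a minimal sketch of how a manifest in the CSV layout expected by PairedEndFastqManifestPhred33 could be generated. The file naming pattern (`<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz`) and the helper name `make_manifest` are assumptions for illustration; adjust them to your own files.

```shell
# Hypothetical helper: print a paired-end manifest for all *_R1.fastq.gz files
# found in the given directory.
# ASSUMPTION: files are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
make_manifest() {
    dir=$(cd "$1" && pwd)    # the manifest needs absolute file paths
    echo "sample-id,absolute-filepath,direction"
    for fwd in "$dir"/*_R1.fastq.gz; do
        [ -e "$fwd" ] || continue                 # no matching files at all
        sample=$(basename "$fwd" _R1.fastq.gz)
        rev="$dir/${sample}_R2.fastq.gz"
        echo "$sample,$fwd,forward"
        echo "$sample,$rev,reverse"
    done
}

# Usage: make_manifest . > my_manifest.csv
```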
qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path manifest.csv \
--input-format PairedEndFastqManifestPhred33 \
--output-path demux.qza
qiime demux summarize --i-data demux.qza --o-visualization demux.qzv
To view the visualization in your browser:
qiime tools view demux.qzv
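The DADA2 algorithm is used for quality filtering, denoising (correcting sequencing errors), merging the paired-end reads, and removing chimeras: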
qiime dada2 denoise-paired --i-demultiplexed-seqs demux.qza \
--p-n-threads 2 \
--p-trunc-len-f 280 --p-trunc-len-r 220 \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats denoising-stats.qza \
--p-trim-left-f 19 --p-trim-left-r 22
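The truncation lengths also decide whether the forward and reverse reads still overlap enough to be merged. As a rough worked example, the expected overlap is the sum of both truncation lengths minus the amplicon length (primers included); trimming at the 5' end should not change it. DADA2 requires an overlap of at least 12 bases by default. The amplicon length of 460 bp used here is a hypothetical value, not taken from the workshop data, and `expected_overlap` is an illustrative helper.

```shell
# Expected overlap (in bases) between the truncated forward and reverse reads.
# Arguments: forward truncation length, reverse truncation length,
#            amplicon length including primers.
expected_overlap() {
    echo $(( $1 + $2 - $3 ))
}

amplicon_len=460   # ASSUMPTION: hypothetical amplicon length, replace with your own
overlap=$(expected_overlap 280 220 "$amplicon_len")
echo "Expected overlap: $overlap bases"

if [ "$overlap" -lt 12 ]; then
    echo "Warning: below DADA2's default minimum overlap of 12 bases" >&2
fi
```

If the overlap comes out too small, most reads will fail to merge, which shows up as a sharp drop in the merged column of the denoising statistics.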
--p-trunc-len-f 280 and --p-trunc-len-r 220: Truncate forward (R1) and reverse (R2) reads at these positions to remove low-quality bases at the 3' end.
--p-trim-left-f 19 and --p-trim-left-r 22: Remove primers (19 bp from forward and 22 bp from reverse reads).
The command produces three output artifacts: denoising-stats.qza, rep-seqs.qza, and table.qza.
To inspect the denoising statistics, first convert them into a visualization:
qiime metadata tabulate --m-input-file denoising-stats.qza --o-visualization denoising-stats.qzv
qiime tools view denoising-stats.qzv
mkdir denoising-stats
qiime tools export --output-path denoising-stats --input-path denoising-stats.qza
less denoising-stats/stats.tsv
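Rather than reading the statistics by eye, the share of reads that survives all DADA2 steps can be computed per sample. This is a minimal sketch that assumes the exported stats.tsv is tab-separated with a header row containing the columns `input` and `non-chimeric`, and that extra lines starting with `#` (such as a `#q2:types` row) can be skipped. `retained_pct` is a hypothetical helper name.

```shell
# Hypothetical helper: percentage of input reads retained after denoising,
# merging, and chimera removal, one line per sample.
# ASSUMPTION: header columns are named "input" and "non-chimeric".
retained_pct() {
    awk -F'\t' '
        NR == 1 {
            for (c = 1; c <= NF; c++) col[$c] = c      # map column name -> index
            if (!("input" in col) || !("non-chimeric" in col)) {
                print "expected columns not found" > "/dev/stderr"
                exit 1
            }
            next
        }
        /^#/ { next }                                   # skip comment/type rows
        {
            in_reads = $(col["input"])
            kept     = $(col["non-chimeric"])
            pct = (in_reads > 0) ? 100 * kept / in_reads : 0
            printf "%s\t%.1f\n", $1, pct
        }
    ' "$1"
}

# Usage: retained_pct path/to/stats.tsv   (adjust the path to your export directory)
```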
mkdir rep_seqs
qiime tools export --output-path rep_seqs --input-path rep-seqs.qza
less rep_seqs/dna-sequences.fasta
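To get an overview of the representative sequences without scrolling through the file, you can count them and check their length range. The sketch below is one possible way to do this with awk; `fasta_summary` is a hypothetical helper name, and it also handles sequences wrapped over several lines.

```shell
# Hypothetical helper: number of sequences plus shortest and longest length
# in a FASTA file.
fasta_summary() {
    awk '
        function record(l) {
            if (min == "" || l < min) min = l
            if (l > max) max = l
        }
        /^>/ { if (len) record(len); n++; len = 0; next }   # new record starts
        { len += length($0) }                                # accumulate sequence length
        END {
            if (len) record(len)
            printf "sequences=%d min=%d max=%d\n", n, min, max
        }
    ' "$1"
}

# Usage: fasta_summary path/to/dna-sequences.fasta
```

Sequences that are much shorter or longer than the expected amplicon are worth a closer look before downstream analysis.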
Export the table.qza file and convert it into a tab-separated text table for further analysis.
mkdir table_otus
qiime tools export --output-path table_otus --input-path table.qza
biom convert -i table_otus/feature-table.biom -o table_otus.csv --to-tsv
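Now the ASV abundance table is available as a text file (table_otus.csv). Despite the .csv extension, its columns are separated by tabs, so choose tab as the delimiter when importing it into Excel or other programs for further analysis.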
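A useful first check on the exported table is the total read count per sample, which should agree with the non-chimeric counts in the denoising statistics. The following sketch assumes the layout written by `biom convert --to-tsv`: a first comment line, a second line with the column headers, then one row per ASV. `sample_totals` is a hypothetical helper name.

```shell
# Hypothetical helper: total read count per sample from the exported table.
# ASSUMPTION: line 1 is a comment, line 2 holds the sample names,
#             all following lines are ASV rows with tab-separated counts.
sample_totals() {
    awk -F'\t' '
        NR == 1 { next }                                     # comment line
        NR == 2 { for (c = 2; c <= NF; c++) name[c] = $c; ncol = NF; next }
        { for (c = 2; c <= ncol; c++) total[c] += $c }       # sum each sample column
        END { for (c = 2; c <= ncol; c++) printf "%s\t%d\n", name[c], total[c] }
    ' "$1"
}

# Usage: sample_totals table_otus.csv
```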
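To simplify the sequence headers in the FASTA file, rename the ASVs with sequential numbers. Run this from the VM_CAME working directory (use cd .. first if you are still inside an export directory):
awk '/^>/ {print ">ASV_" sprintf("%05d", ++i); next} {print}' rep_seqs/dna-sequences.fasta > rep_seqs/dna-sequences-rename.fasta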
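To rename the ASVs in the feature table file in the same way, use the command below. It keeps the two header lines, replaces the first column with sequential ASV names, and works for any number of samples. Note that the names only match those in the FASTA file if both files list the ASVs in the same order, so verify this before relying on the renamed table.
awk 'BEGIN {FS = OFS = "\t"} NR<=2 {print; next} {$1 = sprintf("ASV_%05d", ++i); print}' table_otus.csv > table_otus_rename.csv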
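Because both renaming commands number the ASVs by their order of appearance, the FASTA and table names can drift apart if the two files are ordered differently. A more defensive variant, sketched below, derives the new names from the FASTA file and applies them to the table by looking up each original feature ID. It assumes that the FASTA headers of the original (not yet renamed) file carry the same IDs as the first column of the table. `rename_table_by_fasta` is a hypothetical helper name.

```shell
# Hypothetical helper: rename table rows using names derived from the FASTA order.
# Arguments: original FASTA file, tab-separated feature table.
# ASSUMPTION: FASTA header IDs equal the feature IDs in column 1 of the table.
rename_table_by_fasta() {
    awk '
        BEGIN { FS = OFS = "\t" }
        NR == FNR {                        # first file: the FASTA
            if (/^>/) {
                id = substr($0, 2)
                sub(/[ \t].*$/, "", id)    # drop any description after the ID
                map[id] = sprintf("ASV_%05d", ++n)
            }
            next
        }
        /^#/ { print; next }               # keep the table header lines as they are
        ($1 in map) { $1 = map[$1]; print; next }
        { print "no FASTA entry for feature: " $1 > "/dev/stderr"; print }
    ' "$1" "$2"
}

# Usage:
# rename_table_by_fasta rep_seqs/dna-sequences.fasta table_otus.csv > table_otus_rename.csv
```

Rows whose ID is missing from the FASTA file are passed through unchanged and reported on standard error, so mismatches do not go unnoticed.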
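In this workshop, we have imported paired-end reads into QIIME2, denoised them with DADA2, exported the denoising statistics, representative sequences, and feature table, and renamed the ASVs.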
These steps provide the foundation for downstream microbial community analysis, such as taxonomic classification, diversity analysis, and functional annotation.
For further details, see the official QIIME2 documentation.
You have obtained the representative sequence of each ASV, along with a table showing the number of reads assigned to each ASV in each of your samples.
You are now ready to analyze the representative sequences using the OPU (Operational Phylogenetic Unit) approach!