In this hands-on workshop, we will learn how to use the ARB program to cluster Amplicon Sequence Variants (ASVs) obtained from QIIME2 analysis into Operational Phylogenetic Units (OPUs). We will also classify these OPUs taxonomically using the SILVA reference database.
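By the end of this session, you will be able to cluster ASVs into OPUs in ARB, assign a SILVA-based taxonomy to each OPU, merge that taxonomy with the ASV table, and count the sequences per sample that belong to each OPU.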
The [SILVA database](https://www.arb-silva.de/download/arb-files/) is commonly used for phylogenetic classification of ribosomal RNA sequences. We will use the SILVA Release 138.2 Ref NR 99 as our reference.
Open the working folder:
cd /home/vmuser/Desktop/QIIME2
The file SILVA_138.1_SSURef_NR99_CAME_3.arb is the reduced SILVA file that we will use during the workshop.
The ASV representative sequences generated from QIIME2 are within the working directory:
ls -lh dna-sequences-rename.fasta
The file dna-sequences-rename.fasta contains the representative ASV sequences that we will cluster into OPUs.
Read the file:
less dna-sequences-rename.fasta
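Before loading the sequences into ARB, it can help to confirm how many ASVs the file holds. The sketch below counts FASTA header lines (those starting with ">"). The count_fasta_records helper and the toy records are illustrative assumptions, not part of the workshop materials.

```shell
# Count FASTA records: every record starts with a header line beginning with ">".
count_fasta_records() {
    grep -c '^>' "$1"
}

# Toy file for illustration only (hypothetical ASV names and sequences).
demo_fasta=$(mktemp)
printf '>ASV_1\nACGTACGT\n>ASV_2\nGGCCTTAA\n>ASV_3\nTTGACCGA\n' > "$demo_fasta"

n_records=$(count_fasta_records "$demo_fasta")
echo "Records in toy file: $n_records"
rm -f "$demo_fasta"

# In the workshop folder, the same check would be:
#   count_fasta_records dna-sequences-rename.fasta
```

The number reported for the real file should match the number of ASV rows expected in the ASV table.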
To start the ARB software, run:
arb
This will open the ARB graphical interface where we will perform sequence alignment, clustering, and taxonomic classification.
After reconstructing OPUs, ARB will generate an output table named OPUs_OTUs_relation.csv. This file contains the ASVs assigned to OPUs along with their taxonomy based on the SILVA reference database.
To integrate the taxonomic information with the ASV table, we will merge table_otus_rename.csv with OPUs_OTUs_relation.csv.
First, move to the working folder and confirm that both tables are present:
cd /home/vmuser/Desktop/QIIME2
ls table_otus_rename.csv
ls OPUs_OTUs_relation.csv
Next, prepend a two-line header to OPUs_OTUs_relation.csv and join the two tables side by side:
echo -e "OPUs\nASVs\tOPUs\tSILVA_taxonomy" | cat - OPUs_OTUs_relation.csv > OPUs_OTUs_relation_2.csv
paste table_otus_rename.csv OPUs_OTUs_relation_2.csv > Table_OPUs_Tax.csv
The final file Table_OPUs_Tax.csv now contains ASVs, their corresponding OPUs, and their taxonomic assignments. Open the file to explore the results!
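Because paste joins line N of one file to line N of the other, the merged table is only correct when both inputs have the same number of lines and list the ASVs in the same order. The following sketch checks both conditions. It assumes tab-separated files, two header lines in each file, and the ASV identifier in the first column; the check_row_ids helper and the toy tables are hypothetical.

```shell
# Report rows where the first column (assumed: ASV ID) differs between two
# tab-separated tables, skipping the first two header lines of each file.
# Prints one line per problem; prints nothing when the tables line up.
check_row_ids() {
    awk -F'\t' '
        NR == FNR { id[FNR] = $1; rows_a = FNR; next }   # first file: remember IDs
        { rows_b = FNR }                                  # second file: track length
        FNR > 2 && id[FNR] != $1 {
            print "line " FNR ": " id[FNR] " vs " $1
        }
        END {
            if (rows_a != rows_b)
                print "line counts differ: " rows_a " and " rows_b
        }
    ' "$1" "$2"
}

# Toy tables for illustration only (hypothetical IDs, counts, and taxonomy).
demo_table=$(mktemp); demo_ok=$(mktemp); demo_bad=$(mktemp)
printf '#comment line\n#ID\tS1\tS2\nASV_1\t10\t0\nASV_2\t5\t2\n' > "$demo_table"
printf 'OPUs\nASVs\tOPUs\tSILVA_taxonomy\nASV_1\tOPU_1\tBacteria\nASV_2\tOPU_1\tBacteria\n' > "$demo_ok"
printf 'OPUs\nASVs\tOPUs\tSILVA_taxonomy\nASV_2\tOPU_1\tBacteria\nASV_1\tOPU_1\tBacteria\n' > "$demo_bad"

problems_ok=$(check_row_ids "$demo_table" "$demo_ok")     # same order: no output
problems_bad=$(check_row_ids "$demo_table" "$demo_bad")   # swapped rows: two reports
printf '%s\n' "$problems_bad"
rm -f "$demo_table" "$demo_ok" "$demo_bad"
```

In the workshop folder the equivalent call would be check_row_ids table_otus_rename.csv OPUs_OTUs_relation_2.csv. If it prints anything, the rows of the two tables do not correspond and the pasted result should not be trusted.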
A custom script is available to calculate the total number of sequences per sample that belong to each OPU.
Execute the script with:
./script_OPUs_count.sh Table_OPUs_Tax.csv
Once the script finishes running, open the output table:
column -t output_OPUs.csv | less
This file provides a summary of the distribution of OPUs across different samples.
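The contents of script_OPUs_count.sh are not reproduced here. As a rough illustration of the kind of aggregation described above, the sketch below sums the per-sample counts of all ASVs assigned to the same OPU. It is not the workshop script: the column layout (ASV ID, then n sample columns, then ASVs, OPUs, and SILVA_taxonomy), the two header lines, the sum_counts_per_opu helper, and the toy data are all assumptions.

```shell
# Sum per-sample counts for each OPU in a tab-separated merged table.
#   $1 = merged table, $2 = number of sample columns (assumed: columns 2..n+1)
# Assumes two header lines and the OPU name in column n+3.
sum_counts_per_opu() {
    awk -F'\t' -v n="$2" -v skip=2 '
        FNR <= skip { next }                     # ignore the header lines
        {
            opu = $(n + 3)
            seen[opu] = 1
            for (i = 1; i <= n; i++)
                total[opu, i] += $(i + 1)        # sample i sits in column i+1
        }
        END {
            for (opu in seen) {
                line = opu
                for (i = 1; i <= n; i++)
                    line = line "\t" total[opu, i]
                print line
            }
        }
    ' "$1" | sort
}

# Toy merged table for illustration only (hypothetical IDs, counts, taxonomy).
demo_merged=$(mktemp)
printf '#comment line\tOPUs\n' > "$demo_merged"
printf '#ID\tS1\tS2\tASVs\tOPUs\tSILVA_taxonomy\n' >> "$demo_merged"
printf 'ASV_1\t10\t0\tASV_1\tOPU_1\tBacteria;Proteobacteria\n' >> "$demo_merged"
printf 'ASV_2\t5\t2\tASV_2\tOPU_1\tBacteria;Proteobacteria\n' >> "$demo_merged"
printf 'ASV_3\t1\t7\tASV_3\tOPU_2\tArchaea;Halobacterota\n' >> "$demo_merged"

opu_totals=$(sum_counts_per_opu "$demo_merged" 2)
printf '%s\n' "$opu_totals"
# Prints (tab-separated):
#   OPU_1  15  2
#   OPU_2  1   7
rm -f "$demo_merged"
```

Grouping on the OPU name rather than the ASV ID is what collapses several ASVs into a single row per OPU.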
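By following this workflow, we have successfully clustered the QIIME2 ASVs into OPUs in ARB, classified the OPUs against the SILVA reference database, merged the taxonomy with the ASV table, and summarized the number of sequences per OPU in each sample.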
This approach allows for a refined taxonomic classification beyond standard OTUs, improving our understanding of microbial diversity in environmental samples. 🎯
Let’s discuss any questions and explore the results together!