We now have all the ingredients to continue binning: the scaffolds, and the BAM files containing the reads mapped to those scaffolds.
In metagenomics, binning is the process of grouping reads or contigs and assigning them to operational taxonomic units.
Binning methods can be based on either compositional features, alignment (similarity), or both.
metabat2 uses both the contig depth and tetra-nucleotide frequencies to bin the contigs.
Ideally, every bin represents the genome of one particular microbe that was present in the original DNA extraction.
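To get a feeling for what a tetra-nucleotide frequency is, here is a minimal sketch that counts every 4-base window in a toy sequence. The sequence is made up for illustration, and this is not how metabat2 computes its frequencies internally; it only shows the idea of a compositional "fingerprint" for a contig.

```shell
# Toy contig (hypothetical); real contigs are thousands of bases long.
contig="ACGTACGTAC"

# Slide a 4-base window along the sequence and tally each tetranucleotide.
tetra_counts=$(printf '%s\n' "$contig" | awk '{
    for (i = 1; i <= length($0) - 3; i++) count[substr($0, i, 4)]++
} END {
    for (k in count) print k, count[k]
}' | sort)

echo "$tetra_counts"
```

Contigs from the same genome tend to have similar tetranucleotide profiles, which is why this signal is useful alongside depth.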
The first step in the binning process is to calculate the contig depths from all the BAM files that were created before.
All these depths are stored in one big table, which is then passed to metabat2.
We achieve this with a script that comes with metabat2: jgi_summarize_bam_contig_depths.
[DO:] See how the jgi_summarize_bam_contig_depths script works:
jgi_summarize_bam_contig_depths -h
Remember to find the usage line first.
Then make sure you find the --outputDepth option.
Notice that the help page tells you to supply an arg (argument): the path where the depth output file will be stored.
Specify a path to a file that this script will create. Something like this:
./script --outputDepth /path/to/depth_matrix.tab
Remember that you can use bash to point to multiple files at once with a "glob", written as an asterisk (*). A glob looks like this: directory_name/* and it matches all (non-hidden) files in that directory.
[DO:] Try using the * with ls first. List all sorted bam files you created.
ls ./data/sorted/*.sorted.bam
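If you want to see how a glob expands without touching your real data, the sketch below builds a throwaway directory and matches only the files ending in .sorted.bam. All file names here are hypothetical.

```shell
# Build a throwaway directory with hypothetical file names.
glob_demo=$(mktemp -d)
touch "$glob_demo/sampleA.sorted.bam" "$glob_demo/sampleB.sorted.bam" \
      "$glob_demo/sampleA.bam" "$glob_demo/notes.txt"

# The shell expands the pattern into a list of file names before ls runs.
ls "$glob_demo"/*.sorted.bam

# Count the matches by loading them into the positional parameters.
set -- "$glob_demo"/*.sorted.bam
n_sorted=$#
echo "matched $n_sorted files"

# Clean up the throwaway directory.
rm -r "$glob_demo"
```

Only the two .sorted.bam files match; sampleA.bam and notes.txt are left out because they do not fit the pattern.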
[DO:] Run the script jgi_summarize_bam_contig_depths to calculate the average depth per contig over all six samples.
jgi_summarize_bam_contig_depths --outputDepth ./data/depth_matrix.tab ./data/sorted/*.sorted.bam
The output of this process is the input for MetaBAT in the next step. After the jgi script finishes, make sure you check that the file contains a table. If so, please remove all BAM files. We don't need these anymore.
[DO:] Check if your depth matrix contains data:
head data/depth_matrix.tab
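Besides looking at the first lines, you can check that every row has the same number of columns. The sketch below does this on a made-up miniature matrix. The column layout (contig name, contig length, total average depth, then a depth and a variance column per BAM file) is an assumption about the script's output, and the values are invented.

```shell
# Hypothetical miniature depth matrix (tab-separated) for two samples.
demo_matrix=$(mktemp)
printf 'contigName\tcontigLen\ttotalAvgDepth\ts1.sorted.bam\ts1.sorted.bam-var\ts2.sorted.bam\ts2.sorted.bam-var\n' > "$demo_matrix"
printf 'scaffold_1\t5000\t12.5\t10.0\t3.2\t2.5\t1.1\n' >> "$demo_matrix"
printf 'scaffold_2\t3200\t4.0\t0.0\t0.0\t4.0\t2.0\n' >> "$demo_matrix"

# Number of distinct column counts across rows; a well-formed table has exactly one.
n_layouts=$(awk -F'\t' '{ print NF }' "$demo_matrix" | sort -u | wc -l | tr -d ' ')

# Columns in the header, and contigs (all lines minus the header).
n_columns=$(head -n 1 "$demo_matrix" | awk -F'\t' '{ print NF }')
n_contigs=$(( $(wc -l < "$demo_matrix") - 1 ))

echo "layouts: $n_layouts, columns: $n_columns, contigs: $n_contigs"
rm "$demo_matrix"
```

To run the same check on your own data, replace "$demo_matrix" with ./data/depth_matrix.tab and skip the lines that create and remove the demo file.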
[DO:] When you're sure, remove the sorted bams:
rm -rf ./data/sorted # double- and triple-check that your depth matrix is OK before removing the sorted bam files.
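A slightly safer pattern is to let the shell refuse to delete anything unless the depth matrix exists and is not empty. This is a minimal sketch on a throwaway directory with hypothetical file names, standing in for ./data/depth_matrix.tab and ./data/sorted.

```shell
# Hypothetical stand-ins for ./data/depth_matrix.tab and ./data/sorted
rm_demo=$(mktemp -d)
mkdir "$rm_demo/sorted"
touch "$rm_demo/sorted/sampleA.sorted.bam"
printf 'contigName\tcontigLen\ttotalAvgDepth\n' > "$rm_demo/depth_matrix.tab"

# [ -s FILE ] is true only if FILE exists and has a size greater than zero.
if [ -s "$rm_demo/depth_matrix.tab" ]; then
    rm -r "$rm_demo/sorted"
    removed=yes
else
    echo "depth matrix missing or empty; keeping the BAM files" >&2
    removed=no
fi
echo "removed sorted directory: $removed"
```

Note that this guard only tests that the file is non-empty; it does not replace looking at the contents with head.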
Now that you have your depth_matrix, let's take a moment to reflect on what this matrix represents and how it helps in binning the microbial contigs. For this part, we will visualise the depth_matrix file in Excel (or a similar spreadsheet editor) on your computer.
[DO:] Follow these steps:
[DO:]
Now move on to binning part2!