Binning part 1: calculate depth

Now we have all the ingredients to continue binning: the scaffolds, and the bam files containing the reads mapped onto those scaffolds. In metagenomics, binning is the process of grouping reads or contigs and assigning them to operational taxonomic units. Binning methods can be based on compositional features, on alignment (similarity), or on both. metabat2 uses both the contig depth and the tetranucleotide frequencies to bin the contigs. Ideally, every bin represents the genome of one particular microbe that was present in the original DNA extraction.

The first step in the binning process is to calculate the contig depths from all the bam files that were created before. All these depths are stored in one big table, which is then passed to metabat2. We achieve this with a script that comes with metabat2: jgi_summarize_bam_contig_depths.

[DO:] See how the jgi_summarize_bam_contig_depths script works:

In [ ]:
jgi_summarize_bam_contig_depths -h

Remember to find the usage line first. Then make sure you find the --outputDepth option. Notice that the help page tells you to supply an arg (argument) that says where to store the depth output file. Specify a path to a file that this script will create. Something like this:

./script --outputDepth /path/to/depth_matrix.tab

Remember that you can use bash to point to multiple files at once with a "glob", written as an asterisk (*). A glob looks like this: directory_name/* and it matches all (non-hidden) files in that directory.
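To see what a glob does before you use it on real data, here is a minimal sketch in a throw-away directory. The directory and file names (demo_dir, sampleA.sorted.bam, and so on) are made up for illustration; they are not the files from this course.

```shell
# Hypothetical demo in a temporary directory; all file names are invented.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sorted"
touch "$demo_dir/sorted/sampleA.sorted.bam" \
      "$demo_dir/sorted/sampleB.sorted.bam" \
      "$demo_dir/sorted/notes.txt"

# The shell expands the glob before ls runs, so ls receives the matching names.
ls "$demo_dir"/sorted/*

# Narrow the pattern so that only the bam files match.
ls "$demo_dir"/sorted/*.bam

# Count the matches, for example to confirm there is one file per sample.
bam_count=$(ls "$demo_dir"/sorted/*.bam | wc -l | tr -d ' ')
echo "bam files found: $bam_count"
```

Any command that accepts a list of files can take a glob in the same way, because the expansion is done by bash and not by the command itself.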

[DO:] Try using the * with ls first. List all sorted bam files you created.

In [ ]:
ls ./

[DO:] Run the script jgi_summarize_bam_contig_depths to calculate the average depth per contig over all six samples.

In [ ]:
jgi_summarize_bam_contig_depths

The output of this process is the input for MetaBAT in the next step. After the jgi script finishes, make sure you check that the file contains a table. If so, please remove all BAM files. We don't need these anymore.

[DO:] Check if your depth matrix contains data:

In [ ]:
head ..path/to/your/depth_matrix... # <-- replace this path with the path to the file you created with the jgi script.
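Besides looking at the first lines with head, you could run a few quick sanity checks. The sketch below works on a tiny mock table, not on your real file: the contig names, the sample name and all numbers are invented, and the header layout is an assumption based on what the jgi script typically writes.

```shell
# Build a tiny mock depth matrix; every name and number here is made up.
depth_matrix=$(mktemp)
printf 'contigName\tcontigLen\ttotalAvgDepth\tsampleA.bam\tsampleA.bam-var\n' > "$depth_matrix"
printf 'contig_1\t5000\t12.5\t12.5\t3.1\n' >> "$depth_matrix"
printf 'contig_2\t2500\t4.0\t4.0\t1.2\n' >> "$depth_matrix"

# -s is true only if the file exists and is not empty.
[ -s "$depth_matrix" ] && echo "file has content"

# Count how many different field counts occur; a well-formed table gives exactly 1.
distinct_widths=$(awk -F'\t' '{ print NF }' "$depth_matrix" | sort -u | wc -l | tr -d ' ')
echo "distinct column counts: $distinct_widths"

# Number of data rows (contigs), not counting the header line.
contig_rows=$(( $(wc -l < "$depth_matrix") - 1 ))
echo "contigs in table: $contig_rows"
```

To use this on your own data, point depth_matrix at the file you created and skip the printf lines.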

[DO:] When you're sure, remove the sorted bams:

In [ ]:
rm ./data/sorted # double- & triple-check that your depth_matrix is OK before you remove your sorted bam files
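If you want the shell to do the double-checking for you, one option is to remove the bam files only when the depth matrix exists and is not empty. This is a sketch on stand-in files in a temporary directory; work_dir, depth_matrix.tab and the sample names are hypothetical, so adapt the paths to your own before using the idea.

```shell
# Throw-away stand-ins for the real files; paths and names are hypothetical.
work_dir=$(mktemp -d)
mkdir "$work_dir/sorted"
touch "$work_dir/sorted/sampleA.sorted.bam" "$work_dir/sorted/sampleB.sorted.bam"
printf 'contigName\tcontigLen\ttotalAvgDepth\n' > "$work_dir/depth_matrix.tab"

# Remove the bam files only if the depth matrix exists and is non-empty.
if [ -s "$work_dir/depth_matrix.tab" ]; then
    rm "$work_dir"/sorted/*.bam
    echo "sorted bams removed"
else
    echo "depth matrix missing or empty, keeping bams"
fi

# Count the bam files that are left.
remaining=$(find "$work_dir/sorted" -name '*.bam' | wc -l | tr -d ' ')
echo "bam files remaining: $remaining"
```

Note that a non-empty file is not proof of a correct table, so this guard complements looking at the file yourself; it does not replace it.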

Depth matrix visualisation

Now that you have your depth_matrix, let's take a moment to reflect on what this matrix contains and how it helps in binning the microbial contigs. For this part, we will visualise the depth_matrix file in Excel (or a similar spreadsheet editor) on your computer.

[DO:] Follow these steps:

  1. Download the depth_matrix to your personal computer.
  2. The depth matrix is a big table in which columns are delimited by TABs. Open your data in Excel and make sure all data is displayed as columns.
  3. [Q:] Interpret the table:
    1. What do the columns represent?
    2. What do the rows represent?
  4. [A:]

[DO:]

  1. For clarity, remove all columns except those that display the depth data.
  2. Check if you have one column per sample.
  3. Find the option for conditional formatting, filling the cells with colour depending on their content.
  4. Colour all cells in the Excel sheet according to a three-colour gradient.
  5. [Q:] Interpret the table:
    1. Can you identify two rows with a similar colour pattern?
    2. What does it mean if two rows have a similar colour pattern?
  6. [A:]
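As an alternative to deleting columns by hand in the spreadsheet, the depth columns could also be selected on the command line before downloading. The sketch below assumes the layout the jgi script typically writes: three leading columns (contigName, contigLen, totalAvgDepth), followed by one depth column and one "-var" column per sample. The mock table, sample names and values are invented for illustration.

```shell
# Mock depth matrix with two hypothetical samples; all values are made up.
depth_matrix=$(mktemp)
printf 'contigName\tcontigLen\ttotalAvgDepth\tsampleA.bam\tsampleA.bam-var\tsampleB.bam\tsampleB.bam-var\n' > "$depth_matrix"
printf 'contig_1\t5000\t20.0\t15.0\t2.0\t5.0\t1.0\n' >> "$depth_matrix"
printf 'contig_2\t2500\t8.0\t6.0\t1.5\t2.0\t0.5\n' >> "$depth_matrix"

# Keep column 1 (contig name) plus every column from 4 onwards
# whose header does not end in "-var".
depth_only=$(awk -F'\t' -v OFS='\t' '
    NR == 1 {
        n = 0
        keep[++n] = 1
        for (i = 4; i <= NF; i++) if ($i !~ /-var$/) keep[++n] = i
    }
    {
        line = $(keep[1])
        for (j = 2; j <= n; j++) line = line OFS $(keep[j])
        print line
    }
' "$depth_matrix")
printf '%s\n' "$depth_only"
```

Selecting columns by header name rather than by fixed position keeps the sketch independent of the number of samples, but do check the header of your own file first, since the layout is an assumption here.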

Now move on to binning part 2!