Now that we have bins made with MetaBAT2, we can check them for completeness and contamination (quality). For this, we will use CheckM.
CheckM is a suite of tools for assessing the quality of bacterial genome assemblies and bins.
It estimates genome completeness and contamination by using Single Copy Marker Genes (SCMGs) of a specific phylogenetic lineage.
As you can see in the checkm help pages, checkm has a workflow (lineage_wf) that runs all the steps necessary to assess bin quality.
Lineage_wf (lineage-specific workflow) steps:
Unfortunately, the 'tree' part of this workflow is too memory intensive (it needs about 32 GB of RAM!).
So we will cheat a bit.
Instead of the lineage_wf, we will use the taxonomy_wf.
The taxonomy_wf does not determine the lineage of a bin, but checks SCMGs for a lineage that you provide on the command line.
Hence, we don't load the full tree to find the most appropriate marker set. Instead, we assume all bins are bacteria (a reasonable assumption in this case) and don't look any deeper than that.
checkm taxonomy_wf -h
The checkm manual may seem somewhat intimidating.
However, remember that options in square brackets, like [optional argument], are optional.
Those without brackets are mandatory.
[DO:] Run the checkm taxonomy_wf on the bins you created:
checkm taxonomy_wf
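If you get stuck completing the command above, the following is a minimal sketch of one possible invocation. The positional arguments follow the order shown in the help page: rank, taxon, bin directory, output directory. The directory names, the `fa` extension, and the thread count are assumptions made for illustration, so adjust them to your own setup. When `checkm` or the bin directory is not found, the script only prints the command instead of running it.

```shell
# Hypothetical paths: change these to match your own notebook.
BIN_DIR="bins"                    # directory holding the MetaBAT2 bins (assumed name)
OUT_DIR="checkm_taxonomy_wf"      # new output directory for this run (assumed name)
TABLE="checkm_taxonomy_wf.tsv"    # summary table, written outside OUT_DIR

# rank = domain, taxon = Bacteria: we assume all bins are bacterial.
# -x sets the bin file extension (assumed to be .fa), -t the number of threads.
CMD="checkm taxonomy_wf -t 4 -x fa --tab_table -f $TABLE domain Bacteria $BIN_DIR $OUT_DIR"

if command -v checkm >/dev/null 2>&1 && [ -d "$BIN_DIR" ]; then
    $CMD
else
    # Dry run: show what would be executed.
    echo "$CMD"
fi
```

Writing the table with `-f` and `--tab_table` keeps a tab-separated copy of the results that is easy to inspect later.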
If CheckM doesn't work properly, you can see an example output here.
[Q:] What can you say about the binning quality?
[A:]
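To support your answer with numbers, you could count how many bins pass a completeness and contamination cut-off. The sketch below assumes you saved the CheckM summary as a tab-separated file (the file name is hypothetical) and that it contains columns named `Completeness` and `Contamination`. The thresholds of at least 90% completeness and at most 5% contamination are a commonly used convention and an assumption here, not something CheckM prescribes.

```shell
TABLE="checkm_taxonomy_wf.tsv"   # assumed name of a tab-separated CheckM summary

# Count bins with completeness >= 90 and contamination <= 5.
# Columns are located by header name, so their position does not matter.
summarise_bins() {
    awk -F '\t' '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "Completeness")  comp = i
                if ($i == "Contamination") cont = i
            }
            next
        }
        comp && cont {
            total++
            if ($comp >= 90 && $cont <= 5) good++
        }
        END { printf "%d of %d bins pass\n", good, total }
    ' "$1"
}

# Only summarise when the table actually exists.
if [ -f "$TABLE" ]; then
    summarise_bins "$TABLE"
fi
```

Changing the two numbers in the `if` line lets you explore how strict or lenient cut-offs change the picture.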
A CheckM output of the full lineage_wf is available online here.
[Q:] Is the taxonomy of the bins a surprise given the nature of the sample?
[A:]
You can create an extended checkm table with more information.
checkm qa --help
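As a rough sketch, an extended table could be requested as shown below. `checkm qa` takes a marker file and the directory produced by the analysis step. The output directory name and the marker file name `Bacteria.ms` are assumptions, so list your output directory to find the actual `.ms` file. When `checkm` or the marker file is missing, the command is only printed.

```shell
OUT_DIR="checkm_taxonomy_wf"          # assumed output directory of the taxonomy_wf run
MARKER_FILE="$OUT_DIR/Bacteria.ms"    # assumed marker file name; check your directory
EXT_TABLE="checkm_extended.tsv"       # hypothetical name for the extended table

# -o 2 asks for the extended summary; --tab_table and -f write it as a tab-separated file.
CMD="checkm qa -o 2 --tab_table -f $EXT_TABLE $MARKER_FILE $OUT_DIR"

if command -v checkm >/dev/null 2>&1 && [ -f "$MARKER_FILE" ]; then
    $CMD
else
    # Dry run: show what would be executed.
    echo "$CMD"
fi
```

The help page lists the other values accepted by `-o` if you want a different level of detail.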
Did you try varying the binning parameters in the previous notebook? If so, run those bins through CheckM as well. Remember to create clear and separate output directories.
Are the bins of similar quality?
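One way to keep the runs apart is to loop over the bin directories and derive each output name from the input name. This is a sketch under assumptions: the directory names `bins_default` and `bins_sensitive` are hypothetical stand-ins for whatever parameter sets you tried, and the bins are assumed to end in `.fa`. Directories that do not exist are skipped with a printed dry-run command.

```shell
# Hypothetical bin directories, one per MetaBAT2 parameter set.
for BIN_DIR in bins_default bins_sensitive; do
    OUT_DIR="checkm_${BIN_DIR}"       # separate output directory per run
    TABLE="checkm_${BIN_DIR}.tsv"     # separate summary table per run
    CMD="checkm taxonomy_wf -x fa --tab_table -f $TABLE domain Bacteria $BIN_DIR $OUT_DIR"

    if command -v checkm >/dev/null 2>&1 && [ -d "$BIN_DIR" ]; then
        $CMD
    else
        # Dry run: show what would be executed for this directory.
        echo "$CMD"
    fi
done
```

With one table per parameter set, the completeness and contamination columns can be compared side by side.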