Bin taxonomy

For those of you who finish early, we will determine bin taxonomy. This is a very computationally intensive task: if you can run it overnight, do so; otherwise, you can skip it. Also check whether the server has enough memory for this task! See the code below.

Taxonomy can be determined either from specific marker genes present in a bin or from the overall gene content of a bin. The former is what CheckM normally does, but we don't have the resources to do that today. The latter is what a tool called CAT does (not to be confused with the Unix command cat). CAT, the Contig Annotation Tool, finds all open reading frames (ORFs) in a contig or bin, searches the predicted proteins against a protein database (here NCBI nr, using the fast BLAST-like aligner DIAMOND), and combines all hits to predict the taxonomy of the contig or bin.
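The last step, combining many per-ORF results into one classification, can be sketched as a bitscore-weighted vote. This is a simplified illustration, not CAT's actual code: it assumes each ORF has already been assigned a lineage and a bitscore (CAT derives these from the DIAMOND hits, a step skipped here), and the input format, the function name `vote_taxonomy`, the taxon names and the 0.5 cut-off are all hypothetical choices for this example.

```shell
#!/usr/bin/env bash
# Simplified illustration of bitscore-weighted taxonomy voting (NOT CAT's source code).
# Hypothetical input, one ORF per line, tab-separated:
#   orf_id <TAB> lineage from root to leaf, ';'-separated <TAB> bitscore
# Prints every taxon whose summed bitscore is at least FRACTION of the total,
# ordered from root to leaf, with its fraction of support.
vote_taxonomy() {
  local fraction="$1"
  awk -F'\t' -v f="$fraction" '
    {
      total += $3                       # total bitscore of all ORFs in the bin
      n = split($2, lineage, ";")
      for (i = 1; i <= n; i++) {
        support[lineage[i]] += $3       # every taxon on the ORF lineage gets its bitscore
        depth[lineage[i]] = i           # depth in the lineage, only used to sort output
      }
    }
    END {
      for (t in support)
        if (support[t] >= f * total)
          printf "%d\t%s\t%.2f\n", depth[t], t, support[t] / total
    }' | sort -n | cut -f2,3
}

# Hypothetical bin with three already-classified ORFs (total bitscore 600).
# Prints two tab-separated lines: "Bacteria 1.00" and "Proteobacteria 0.75".
printf 'orf1\tBacteria;Proteobacteria;Gammaproteobacteria\t250\norf2\tBacteria;Proteobacteria;Alphaproteobacteria\t200\norf3\tBacteria;Firmicutes\t150\n' \
  | vote_taxonomy 0.5
```

Every taxon on an ORF's lineage receives that ORF's bitscore, so higher ranks always collect at least as much support as lower ones. The deepest taxon that still passes the cut-off (here Proteobacteria, with 450 of 600) is the most specific call the vote supports; Gammaproteobacteria (250 of 600) falls below it.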

Install CAT via conda, as you did with CheckM. Create a new environment named 'cat' using the channel 'bioconda', installing 'CAT' and 'diamond' version 0.9.21.

conda create -n cat -c bioconda CAT diamond=0.9.21

conda activate cat
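Before starting a long run, it can save time to confirm that the activated environment actually provides the executables CAT needs. A minimal sketch using the shell builtin `command -v`; the helper name `check_tools` is hypothetical. Checking `prodigal` assumes it was installed as a CAT dependency (CAT uses Prodigal to predict the ORFs).

```shell
#!/usr/bin/env bash
# Report which of the given commands cannot be found on PATH.
# Returns 0 if all of them are found, 1 if any is missing.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

With the `cat` environment active, `check_tools CAT diamond prodigal` should print nothing and return 0, and `diamond version` should report 0.9.21.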

Then run CAT as shown below. You need to fill in two things:

  1. The path to your bins with the -b option
  2. An output prefix with the -o option.

CAT bins -b <</path/to/your/bins>> -d /projects/03b52fdc-4de1-4e3f-924b-2de9148c4f74/metagenomics_reference_databases/CAT_prepare_20190719/2019-07-19_CAT_database/ -t /projects/03b52fdc-4de1-4e3f-924b-2de9148c4f74/metagenomics_reference_databases/CAT_prepare_20190719/2019-07-19_taxonomy/ -s fa -n 1 -o <</path/to/your/output/prefix>>
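Because the run can take many hours, it helps to start it so that it keeps going after you close the terminal or log out, which is what makes running it overnight practical. A minimal sketch using the standard `nohup` utility; the helper name `run_in_background` and the log file name are hypothetical, and `screen` or `tmux` would work just as well.

```shell
#!/usr/bin/env bash
# Start a command in the background, immune to hangups (closing the terminal,
# logging out), with stdout and stderr written to LOGFILE.
# Prints the PID of the background process so you can check on it later.
# Usage: run_in_background LOGFILE command [args...]
run_in_background() {
  local log="$1"
  shift
  nohup "$@" > "$log" 2>&1 < /dev/null &
  echo "$!"
}

# Example: prefix the full CAT bins command above, e.g.
# run_in_background cat_bins.log CAT bins -b ... (rest of the CAT command above)
```

You can then follow progress with `tail -f cat_bins.log` and check whether the job is still running with `ps -p <PID>`.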

Enough memory?

Only run the command above if more than 20 GB of RAM is available. Use the command below to check (look at the "available" column, in GB):

free -g
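The 20 GB rule can also be checked automatically, for example as a guard in front of the CAT command. A minimal sketch, assuming a Linux server (it reads `/proc/meminfo`); the helper name `enough_memory` is hypothetical. It uses `MemAvailable`, the same figure `free` shows in its "available" column, rather than "free" memory, because Linux counts reclaimable cache as available.

```shell
#!/usr/bin/env bash
# Succeed only if more than MIN_GB GiB of memory is currently available.
# MemAvailable in /proc/meminfo is reported in kB (KiB), so this is Linux only.
# Usage: enough_memory [MIN_GB]   (default: 20)
enough_memory() {
  local min_gb="${1:-20}"
  local avail_kb
  avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
  echo "Available: $(( avail_kb / 1024 / 1024 )) GiB, required: more than ${min_gb} GiB"
  [ "$avail_kb" -gt $(( min_gb * 1024 * 1024 )) ]
}

# Example: only start CAT if the check passes
# enough_memory 20 && CAT bins ...
```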