metabat2
Now, we continue the binning procedure with metabat2.
In this notebook, we will create the actual bins!
We will need:
- the scaffolds file
- the depth matrix
- the metabat2 program
First, remember where the first two items listed above are. Use ls to confirm in the cells below.
[DO:] Locate the scaffolds file:
ls
[DO:] Locate the depth matrix:
ls
[DO:] Make a new directory to store your bins.
Make sure this directory is data/bins
mkdir
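One possible answer for this cell (a sketch; it assumes you run the notebook from the project root) is:

```shell
# Create the bins directory; -p avoids an error if it already exists
# (and creates data/ itself if needed).
mkdir -p data/bins
```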
[DO:] Read the help page of metabat2
Find out which options you minimally need, then make sure you tell MetaBAT to use one thread only!
Supply your depth matrix as the --abdFile (short for "abundance file").
metabat2 -h
[DO:] Bin your scaffolds with metabat2:
metabat2
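One possible invocation looks like the sketch below. Note that the scaffolds path here is an assumption; it depends on where your earlier notebook saved the assembly, so adjust it to your own files.

```shell
# Sketch of a metabat2 call (the scaffolds path is an assumption):
#   -i  input assembly (scaffolds, fasta)
#   -a  (--abdFile) the depth matrix from the previous notebook
#   -o  prefix for the output bin files
#   -t  number of threads (one, as asked)
metabat2 -i data/scaffolds.fasta -a data/depth_matrix -o data/bins/bin -t 1
```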
[DO:] How many bins were created? Check the directory where you stored your bins.
[Q:] Did you get more or fewer bins than expected from the length/depth plot you made earlier?
[A:]
Now we have our metagenome binned! Congratulations. Let's try to visualise this similarly to what we did in Binning part 1. We will use some command line tricks to get all data into a similar sheet. In this case, these are given to you already. If you feel up to the challenge, try to reverse engineer the code and understand what is happening.
[DO:] Run the code below to make data/binlist.txt
# First, we make a file containing only a header row; this will become our table.
echo -e 'bin\tcontigName' > data/binlist.txt
# Then we move to the folder in which we made the bins.
cd ./data/bins/
# Now, we start a loop over each file that ends with `.fa`
for f in *.fa
do  # For each `.fa` file, we extract the bin number into a variable that we call `name`
    name=$(echo $f | cut -d '.' -f 2)
    # Continuing in this iteration of the loop, we filter all fasta headers and,
    # directly after filtering, replace the fasta header sign '>' with the
    # `name` variable defined earlier.
    grep '>' $f | sed "s/^>/$name\t/g"
# We end the loop, sort all resulting lines at once on the second column,
# and append the sorted table to the 'binlist.txt' file we made earlier.
done | sort -k2 -V >> ../binlist.txt
cd ../../
[DO:] Check what the file looks like:
head data/binlist.txt
[DO:] Compare these with the names we have in our depth matrix. They must be exactly the same to be able to join these two different tables into one.
cut -f 1 <<your original depth matrix>> | head
Now, we use the join command to join the two tables. There must be a shared field in both tables. In the first table, this is the second column (-1 2), and in the second file, this is the first column (-2 1). We then take both files and give them to the join command. However, since join is very picky about how files are sorted, we re-sort them on the fly like so: <(sort -k2d ./somefile.txt) (second column, sorted as a dictionary). Lastly, since both files have headers, we supply the --header option and save the result as a new file. Notice that I use the \ character to spread this very long join command line over several lines.
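Since join can be confusing at first, here is a tiny self-contained toy example (two made-up files in /tmp, not your real data) showing how -1/-2 pick the shared field and what --header does:

```shell
# Two tiny tab-separated tables that share the 'contig' field in column 1.
printf 'contig\tbin\nc1\t1\nc2\t2\n'    > /tmp/toy_bins.tsv
printf 'contig\tdepth\nc1\t10\nc2\t5\n' > /tmp/toy_depth.tsv
# Join on column 1 of both files; --header treats the first lines as headers.
join --header -t $'\t' -1 1 -2 1 /tmp/toy_bins.tsv /tmp/toy_depth.tsv
# Expected output:
# contig  bin  depth
# c1      1    10
# c2      2    5
```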
[DO:] fill in the path to your depth matrix below and run the code
join -1 2 -2 1 \
<(sort -k2d ./data/binlist.txt) \
<(sort -k1d <<...your original depth matrix...>>) \
--header \
| tr ' ' "\t" \
> ./data/binned_depth_matrix.tab
[DO:] Visualise the binned_depth_matrix.tab
Take binned_depth_matrix.tab and open it in Excel.
[Q:] Does this colour pattern make more sense than it did before?
[A:]
[Q:] Are there any outliers or mistakes you can spot?
[A:]
[Q:] Can you determine the depth of the six bins from this table? Why can you not simply take the mean of all depths? Think about the difference between depth and coverage. You will need to make a pivot table in Excel/LibreOffice/Google Drive.
[A:]
[Q:] Can you determine the depth of each bin per sample(type)? In other words, which bin is abundant in the L samples, and which bin is abundant in the P samples?
[A:]
[Q:] The research this practical is based on focuses on microbes inside the leaves (L samples). Which bins would you advise me to study further?
[A:]
If you'd like, you can try to vary input signals. For each variation, make sure you save the bins in a separate directory with a clear name.
To modify your depth matrix, have a look at the columns present:
head -n 3 <<your depth matrix>>
You can select certain columns with the cut command. This example shows you how to select only one replicate of one sample type and save this as a separate depth matrix.
cut -f 1-3,10,11 data/depth_matrix > data/depth_matrix_P1
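If the column numbering is unclear, this tiny self-contained example (a toy table in /tmp, not your real matrix) shows how cut -f selects columns:

```shell
# A toy 5-column table: contig name, length, and three sample depths.
printf 'contigName\tcontigLen\tL1\tP1\tP2\nc1\t100\t3\t9\t8\n' > /tmp/toy_matrix.tsv
# Keep columns 1-2 plus column 4 (here: the P1 depth).
cut -f 1-2,4 /tmp/toy_matrix.tsv
# Expected output:
# contigName  contigLen  P1
# c1          100        9
```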