BAM files are binary SAM files. A SAM file is essentially a large table that records which read from the FASTQ file mapped to which position on the scaffolds of the metagenome assembly. The rows of this table are not ordered in any meaningful way: right after mapping, their order follows the (effectively random) order of the reads in the original FASTQ file.
For many computational purposes, we want to sort these rows according to the position where each read mapped on the scaffolds.
We will achieve this with the samtools program. As the name suggests, samtools comprises many tools for dealing with SAM (or BAM) files, one of which is the samtools sort tool. We will use this tool to sort our BAM files.
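To illustrate what sorting alignment rows means, here is a minimal sketch that orders a tiny, made-up table by scaffold and then by mapping position, using only the standard sort utility. The read names, scaffold names and the three-column layout are assumptions for illustration; a real SAM file has more columns, and samtools sort orders scaffolds as they are listed in the file header rather than alphabetically.

```shell
# Four made-up alignment rows: read name, scaffold, mapping position (tab-separated).
unsorted_rows() {
  printf 'read3\tscaffold_2\t150\n'
  printf 'read1\tscaffold_1\t900\n'
  printf 'read4\tscaffold_1\t12\n'
  printf 'read2\tscaffold_2\t7\n'
}

# Order by scaffold name (column 2), then numerically by position (column 3).
sort_by_coordinate() {
  sort -t "$(printf '\t')" -k2,2 -k3,3n
}

unsorted_rows | sort_by_coordinate
# prints:
# read4	scaffold_1	12
# read1	scaffold_1	900
# read2	scaffold_2	7
# read3	scaffold_2	150
```

After sorting, all reads from the same scaffold sit next to each other in order of position, which is what tools that walk along a scaffold rely on.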
samtools also contains the samtools view tool we used earlier. It converts SAM to BAM and BAM back to SAM.
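As a reminder of how that conversion looks, here is a small sketch that writes a made-up one-read SAM file to a temporary folder and converts it to BAM and back. The file names and the read itself are invented for illustration; the conversion is skipped if samtools is not on your PATH.

```shell
# Work in a temporary folder so nothing in your data directory is touched.
tmp=$(mktemp -d)

# A tiny, made-up SAM file: one header line and one alignment.
printf '@SQ\tSN:scaffold_1\tLN:1000\n' > "$tmp/example.sam"
printf 'read1\t0\tscaffold_1\t12\t60\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$tmp/example.sam"

if command -v samtools >/dev/null 2>&1; then
  # SAM -> BAM: -b asks for BAM output, -o names the output file.
  samtools view -b -o "$tmp/example.bam" "$tmp/example.sam"
  # BAM -> SAM: SAM is the default output; -h keeps the header lines.
  samtools view -h -o "$tmp/roundtrip.sam" "$tmp/example.bam"
  ls "$tmp"
else
  echo "samtools not found; skipping the conversion"
fi
```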
Before we proceed, let's quickly check that the BAM files were created as expected, and peek inside one with samtools view.
ls ./ # <-- insert the path where you stored your bam files.
samtools view ./path/to/your/bamfile.bam | head
You should see many lines of tab-delimited names, numbers and sequences. Look up what a SAM/BAM file should look like and check that it corresponds to what you get.
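To make those columns easier to read, the sketch below labels the eleven mandatory fields of each alignment line with their names from the SAM specification. The function name label_sam_fields and the example read are made up for illustration.

```shell
# Print each of the 11 mandatory SAM fields of an alignment line next to its name.
label_sam_fields() {
  awk -F '\t' '
    BEGIN { n = split("QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL", name, " ") }
    /^@/  { next }   # skip header lines, which start with @
    { for (i = 1; i <= n; i++) printf "%-6s %s\n", name[i], $i }
  '
}

# A made-up alignment line, fed in by hand.
printf 'read1\t0\tscaffold_1\t12\t60\t4M\t*\t0\t0\tACGT\tIIII\n' | label_sam_fields
```

On a real file you could pipe one line into it, for example: samtools view ./path/to/your/bamfile.bam | head -n 1 | label_sam_fields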
We have seen how to make a loop in the backmapping part of this practical. Now let's do the same and sort the BAM files we created earlier.
Before you start, create a new folder in which to store your sorted BAM files.
I suggest something like data/sorted.
The cell below contains a copy of the loop from the previous notebook. Edit it so that the loop sorts your BAM files with samtools sort.
Make sure you use only one CPU/thread; we all have to share this computer.
[DO:] make a new directory to store your sorted bam files
mkdir
[DO:] use samtools sort to sort the BAM files.
Make sure to store them in your new directory.
samples=( E1 E2 E3 P1 P2 P3 )
for i in "${samples[@]}"
do
    echo "$i"   # replace this line with your samtools sort command
done
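If you are unsure how to build the command, one possible approach is a dry run: let the loop print each command first, check that the paths look right, and only then run them for real. The sketch below assumes a hypothetical layout with one unsorted file per sample named data/mapped/E1.bam and so on; adjust the paths to your own file names. The sample names are written inline here; the samples array from the cell above works the same way. According to the samtools manual, samtools sort is single-threaded unless you pass -@, so leaving that option out keeps it to one thread.

```shell
# Dry run: print the sort command for every sample without executing it.
# The input and output paths are assumptions; change them to match your files.
print_sort_commands() {
  for i in E1 E2 E3 P1 P2 P3
  do
    echo "samtools sort -o data/sorted/$i.sorted.bam data/mapped/$i.bam"
  done
}

print_sort_commands
```

Once the printed commands look right, remove the echo and the surrounding quotes so that each command is actually executed.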
[DO:] After sorting, check whether your BAM files are sorted correctly. Use ls --size to see if the files exist and have a proportional size.
ls
samtools
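Besides file size, one way to check the result is to look at the file header: samtools sort records the sort order in the @HD line as SO:coordinate. The sketch below wraps that check in a small helper; the name is_coordinate_sorted and the example header lines are made up for illustration.

```shell
# Succeeds if the SAM header read from standard input is marked as coordinate-sorted.
is_coordinate_sorted() {
  grep -q '^@HD.*SO:coordinate'
}

# Made-up header of a sorted file.
printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:scaffold_1\tLN:1000\n' | is_coordinate_sorted \
  && echo "marked as coordinate-sorted"

# Made-up header of an unsorted file.
printf '@HD\tVN:1.6\tSO:unsorted\n' | is_coordinate_sorted \
  || echo "not marked as sorted"
```

On a real file, samtools view -H prints only the header, so the check becomes: samtools view -H ./path/to/your/bamfile.bam | is_coordinate_sorted && echo sorted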
Did your BAM files sort correctly? Then remove the unsorted BAM files. We no longer need them, and removing them saves disk space.
[DO:] If you are sure, then remove the mapped directory by adding the appropriate option to the command below
rm ./data/mapped