Sorting BAM files

BAM files are binary SAM files. A SAM file is essentially a big table that records which read from the FASTQ file mapped where on the scaffolds of the metagenome assembly. The rows of this table are not ordered in any meaningful way: directly after mapping, their order follows the (arbitrary) order of the reads in the original FASTQ file.

For many computational purposes, we want to sort these rows by

  1. the scaffold they mapped to
  2. the position on that scaffold (in bp)
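
To make the two sort keys concrete, here is a minimal sketch on a toy table, using the plain Unix sort command instead of samtools. The four rows and the scaffold names are invented for illustration. Note one difference: this sketch orders scaffolds alphabetically, whereas samtools orders them as they are listed in the BAM header.

```shell
# Toy table with two columns: scaffold name and mapping position in bp.
# These rows are made up; they stand in for the rows of a SAM file.
toy_rows=$(printf 'scaffold_2\t150\nscaffold_1\t900\nscaffold_2\t20\nscaffold_1\t35\n')

# Sort on column 1 (scaffold) as text, then on column 2 (position) as a number.
sorted=$(printf '%s\n' "$toy_rows" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n)
printf '%s\n' "$sorted"
```

After sorting, all rows of scaffold_1 come first, ordered by position, followed by the rows of scaffold_2.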

We will do this with samtools. As the name suggests, samtools comprises many tools for working with SAM (and BAM) files. One of them is samtools sort, which we will use to sort our BAM files.

samtools also contains the samtools view tool we used earlier. samtools view converts SAM to BAM, and BAM back to SAM.
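
As a reminder of both directions, the sketch below writes a tiny hand-made SAM file and converts it to BAM and back. The file names (toy.sam, toy.bam, toy_roundtrip.sam) and the single read are invented for illustration, and the conversion is skipped if samtools is not on your PATH.

```shell
# Write a tiny SAM file: two header lines plus one aligned read.
# All names and values are made up for this example.
{
  printf '@HD\tVN:1.6\tSO:unsorted\n'
  printf '@SQ\tSN:scaffold_1\tLN:1000\n'
  printf 'read_1\t0\tscaffold_1\t35\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n'
} > toy.sam

if command -v samtools >/dev/null 2>&1; then
    # SAM -> BAM: -b requests BAM output, -o names the output file.
    samtools view -b -o toy.bam toy.sam
    # BAM -> SAM: -h keeps the header lines in the text output.
    samtools view -h -o toy_roundtrip.sam toy.bam
else
    echo "samtools not found on PATH; skipping the conversion" >&2
fi
```

Lines starting with @ are header lines; every other line describes one mapped read.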

Before we proceed, let's check whether the BAM files were created as expected, and peek inside with samtools view.

In [ ]:
ls ./  # <-- insert the path where you stored your bam files.
In [ ]:
samtools view ./path/to/your/bamfile.bam | head

You should see many tab-delimited names, numbers and sequences. Look up what a SAM/BAM file should look like and check whether it matches what you get.

Another loop

We have seen how to make a loop in the backmapping part of this practical. Now let's do the same and sort the BAM files we created earlier.

Before you start, create a new folder to store your sorted BAM files in. I suggest something like data/sorted.

The cell below contains a copy of the loop from the previous notebook. Edit it so that the loop sorts your BAM files.

  1. Do this step by step. Test every little thing you change in the loop.
  2. Read the help page of samtools sort carefully.
    • You want to sort the reads by coordinate (which is the default), not by name.

Make sure you only use one CPU/thread; we have to share this computer with all of us.

[DO:] Make a new directory to store your sorted BAM files.

In [ ]:
mkdir

[DO:] Use samtools sort to sort the BAM files. Make sure to store them in your new directory.

In [ ]:
samples=( E1 E2 E3 P1 P2 P3 )
# Quote the expansions so each sample name is passed on as exactly one word.
for i in "${samples[@]}"
    do echo "$i"
done

Check

[DO:] After sorting, check whether your BAM files are sorted correctly.

  1. Use ls --size to see whether the files exist and have a plausible size, roughly in line with the unsorted files.
  2. Run samtools view to view your BAM files.
In [ ]:
ls
In [ ]:
samtools
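
Besides looking at the reads, you can check the header: samtools sort records the sort order in the SO tag of the @HD line, which samtools view -H prints. Below is a minimal sketch of how to pull that tag out. The sort_order function is a hypothetical helper written for this example, and the header fed to it here is a toy one, not taken from your data.

```shell
# Hypothetical helper: reads SAM header text on stdin and prints the value
# of the SO (sort order) tag found on the @HD line.
sort_order() {
    awk -F '\t' '/^@HD/ { for (i = 2; i <= NF; i++) if ($i ~ /^SO:/) print substr($i, 4) }'
}

# Toy header for illustration. With a real file you would run something like:
#   samtools view -H ./path/to/your/sorted.bam | sort_order
toy_order=$(printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:scaffold_1\tLN:1000\n' | sort_order)
echo "$toy_order"
```

A coordinate-sorted BAM file should report coordinate here; a freshly mapped one typically reports unsorted or has no SO tag at all.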

Clean

Did your BAM files sort correctly? Then remove the unsorted BAM files. We don't need them anymore, and removing them saves disk space.

[DO:] If you are sure, remove the mapped directory by adding the appropriate option to the command below.

In [ ]:
rm ./data/mapped