BAM files are Binary sAM files. A SAM file is essentially a big table that records which read from the FASTQ file mapped where on the scaffolds of the metagenome assembly. The rows of this table are not ordered in any meaningful way: directly after mapping, their order follows the (arbitrary) order of the reads in the original FASTQ file.
For many computational purposes, we want to sort these rows according to the scaffold and position each read maps to.
We will achieve this with the samtools program. As the name suggests, samtools comprises many tools to deal with SAM (or BAM) files, one of which is the samtools sort tool. We will use this tool to sort our BAM files. samtools also contains the samtools view tool we used earlier. samtools view is used to convert SAM to BAM and also BAM to SAM.
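As a reminder of how the two conversion directions look, here is a minimal sketch. The function names sam_to_bam and bam_to_sam are hypothetical helpers introduced only for illustration; the -b (write BAM), -h (include the header) and -o (output file) options are standard samtools view options.

```shell
# Hypothetical helper names; both just wrap samtools view.

# SAM -> BAM: -b asks for BAM output, -o names the output file.
sam_to_bam() {
    samtools view -b -o "$2" "$1"
}

# BAM -> SAM: -h keeps the header lines so the SAM file is complete.
bam_to_sam() {
    samtools view -h -o "$2" "$1"
}
```

For example, bam_to_sam ./data/mapped/L1.bam L1.sam would write a human-readable copy of the first sample, assuming that file exists.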
Before we proceed, let's quickly check whether the BAM files were created as we expected.
[DO:] Check the mapped directory with ls and peek inside the BAM files with samtools view.
ls ./data/mapped
samtools view ./data/mapped/L1.bam | head
You should see a lot of tab-delimited names, numbers and sequences.
[DO:] Google what a SAM/BAM file should look like and check whether it corresponds to what you get.
The official specification: https://samtools.github.io/hts-specs/SAMv1.pdf
Wikipedia: https://en.wikipedia.org/wiki/SAM_(file_format)
[Q:] What are the column headers of a SAM/BAM file?
[A:]
Col | Field | Type | Brief description |
---|---|---|---|
1 | QNAME | String | Query template NAME |
2 | FLAG | Int | bitwise FLAG |
3 | RNAME | String | Reference sequence NAME |
4 | POS | Int | 1-based leftmost mapping POSition |
5 | MAPQ | Int | MAPping Quality |
6 | CIGAR | String | CIGAR string |
7 | RNEXT | String | Ref. name of the mate/next read |
8 | PNEXT | Int | Position of the mate/next read |
9 | TLEN | Int | observed Template LENgth |
10 | SEQ | String | segment SEQuence |
11 | QUAL | String | ASCII of Phred-scaled base QUALity+33 |
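The FLAG column is the least self-explanatory of these fields: it packs several yes/no properties of a read into one integer, one bit per property. The sketch below decodes such a number with shell arithmetic. The function name describe_flag and the short labels are hypothetical and chosen for this example only; the bit values themselves are the ones listed in the SAM specification.

```shell
# Hypothetical helper: print the properties encoded in a SAM FLAG value.
describe_flag() {
    flag=$1
    out=""
    # Each item is "bit value:label"; bit values follow the SAM specification.
    for pair in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
                16:reverse_strand 32:mate_reverse_strand 64:first_in_pair \
                128:second_in_pair 256:secondary 512:qc_fail \
                1024:duplicate 2048:supplementary
    do
        bit=${pair%%:*}     # text before the colon
        name=${pair#*:}     # text after the colon
        # A bitwise AND is non-zero only if this bit is set in the flag.
        if [ $(( flag & bit )) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
    done
    printf '%s\n' "${out:-none}"
}
```

To try it on real data, take the second column of a read, for instance with samtools view ./data/mapped/L1.bam | head -n 1 | cut -f 2, and pass that number to describe_flag.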
We have seen how to make a loop in the backmapping part of this practical. Now let's do the same and sort the BAM files we created earlier.
Before you start, create a new folder in which to store your sorted BAM files. I suggest something like data/sorted.
The cell below contains a copy of the loop from the previous notebook. Edit it so that the loop sorts your BAM files.
samtools sort
Make sure you use only one CPU/thread; we all have to share this computer.
[DO:] make a new directory to store your sorted bam files
mkdir data/sorted
samtools sort
[DO:] Use samtools sort to sort the BAM files. Make sure to store them in your new directory.
samples=( L1 L2 L3 P1 P2 P3 )
# Sort each BAM file by coordinate; samtools sort is single-threaded by default
for i in "${samples[@]}"
do
    samtools sort -o "data/sorted/$i.sorted.bam" "data/mapped/$i.bam"
done
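If you would rather not type the sample names by hand, the same loop can derive them from the files themselves. This is only a sketch of one alternative, not the way the practical requires it: the function name sort_all_bams is hypothetical, and it assumes every file ending in .bam in the input directory should be sorted.

```shell
# Hypothetical helper: sort every *.bam file in one directory into another.
sort_all_bams() {
    in_dir=$1
    out_dir=$2
    mkdir -p "$out_dir"                  # create the output directory if needed
    for bam in "$in_dir"/*.bam
    do
        [ -e "$bam" ] || continue        # skip if the pattern matched no files
        sample=$(basename "$bam" .bam)   # e.g. data/mapped/L1.bam -> L1
        samtools sort -o "$out_dir/$sample.sorted.bam" "$bam"
    done
}
```

Calling sort_all_bams data/mapped data/sorted would then do the same work as the loop with the explicit sample list.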
[DO:] After sorting, check whether your BAM files are sorted correctly. Use ls --size to see if the files exist and have a proportional size.
ls -s1 ./data/mapped
ls -s1 ./data/sorted
samtools view data/sorted/L1.sorted.bam | head
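Looking at the first ten rows only gives a rough impression. As a sketch of a more thorough check, the function below reads SAM rows from standard input and verifies that, within each scaffold (column 3, RNAME), the positions (column 4, POS) never decrease and that no scaffold shows up again after another one has started. The function name is_coordinate_sorted is hypothetical, and the check assumes the default coordinate sorting of samtools sort.

```shell
# Hypothetical helper: exit status 0 if the SAM rows on stdin are coordinate-sorted.
is_coordinate_sorted() {
    awk -F'\t' '
        BEGIN { bad = 0 }
        /^@/ { next }                            # ignore header lines, if any
        $3 != ref {                              # a new scaffold block starts
            if ($3 in seen) { bad = 1; exit }    # scaffold seen before: not sorted
            seen[$3] = 1; ref = $3; last = 0
        }
        $4 + 0 < last { bad = 1; exit }          # position went backwards
        { last = $4 + 0 }
        END { exit bad }
    '
}
```

It could be used as samtools view data/sorted/L1.sorted.bam | is_coordinate_sorted && echo sorted. Another quick indication is the first header line, shown by samtools view -H data/sorted/L1.sorted.bam | head -n 1, which should normally contain SO:coordinate after sorting.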
Did your BAM files sort correctly? Then remove the unsorted BAM files. We don't need them anymore, and removing them saves some disk space.
[DO:] If you are sure, then remove the mapped directory by adding the appropriate option to the command below
rm -rf ./data/mapped