The webpage you are looking at is called a Jupyter notebook. It is a webpage on which you can write text (like this text) and also code. You can execute the code in the webpage, and the output is returned to you within the same notebook! This may sound trivial, but, to put it in non-scientific terms, it's "really cool". Code and text are entered in individual cells: each cell is either a code cell or a text cell. The cell you are reading now is a text cell. Next, let's look at a code cell and execute it. There are two ways of executing a cell. First, you select a cell, either with your mouse or with the up and down keys on your keyboard. Then, you execute it either by clicking the 'run' button in the toolbar above or by pressing CTRL+RETURN.
First we'll learn how to work with Jupyter notebooks, then we'll move on to learn the basics of Bash.
Working with Cells in Jupyter is quite straightforward. You learn best by doing, so do all of the things listed below:
Try creating a new cell below this cell. Make at least a text cell and a code cell.
In the metagenomics practical, we will be working mostly in Linux's mother tongue: Bash. However, juPYter notebooks work natively in PYthon. Still, we can work in Bash. Often, this works by itself. Sometimes, you will need to either type %%bash at the beginning of a code cell or precede every bash command with an exclamation mark. Now, let's get to work and learn some Bash!
Execute the code cell below:
echo "hello world"
In the bash language, the first word you type is always the command. So, in this case, that was:
echo
This command 'echoes' whatever you give it in the terminal. After you 'call' the command, you give it an argument; in this case, that is:
"hello world"
This basic structure of 'command' followed by 'arguments' comes back throughout the metagenomics practical.
Do: now try to change "hello world" to something else in the cell below:
echo "hello world"
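To make the idea of separate arguments a little more concrete, here is a minimal sketch. It uses printf, a command that is not part of this practical, only because it prints every argument it receives on its own line, which makes it easy to see where one argument ends and the next begins.

```bash
# Without quotes, the space separates two arguments: hello and world.
# printf repeats its format '%s\n' for each argument, so you get two lines.
printf '%s\n' hello world

# With quotes, "hello world" is ONE argument that happens to contain a space.
# This prints a single line.
printf '%s\n' "hello world"
```

This is why the echo cells above put quotes around the text: the quotes hand the whole sentence to the command as one piece.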
Often, an argument is a path to a file.
We have the ls command to see what files we have.
ls
We are learning fast.
So we now know what a command is, and we know what an argument is.
Finally, we also need to know what options are.
Options are optional extra information that we pass to the command.
Options are often provided in between the command and the argument.
They look either like this
ls --size --human-readable
or in shortened versions like this
ls -sh
Note that the above two commands are synonymous. Try out in the cell below:
ls --size --human-readable data/
ls -sh ./data/
Commands, options and arguments are separated by spaces.
Also note that options can have their own arguments.
If this is the case, the manual or help page will specify this.
We will get to the manual and help pages later.
Auto-complete is one of the best features of the bash language and your greatest friend during this practical. Let's say we want to list (ls) the contents of the data/ folder but are too lazy to type the whole word 'data/'. Then we can type
ls da
and then hit the TAB key on the keyboard. Bash should either automatically complete the path to
ls data/
or, if there are multiple options to auto-complete, bash will give you a little menu with these options.
Using auto-complete not only makes your life a lot easier, but also prevents you from making typos! If bash auto-complete doesn't work, odds are something in your command or argument is wrong. Best to check before you proceed!
Try out auto-complete below
ls data
Bash can hand the output of one program to another. This is called piping. If you pipe the output of multiple programs to each other, you have made a 'pipeline'. Pipelines look somewhat like this
command1 | command2 | command3
One trick with pipes that we will often use is the | head pipe.
This pipe shows you only the first ten lines of the output of some command.
Adding the option -n with the argument 1, as in | head -n 1, changes this number to 1.
See for yourself below.
ls -1 data/reads/
ls -1 data/reads/ | head -n 1
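A pipeline can have more than two steps. The sketch below does not depend on any files in data/: it uses printf to produce three lines of example text and sort to put them in alphabetical order. Both commands are additions for illustration and are not needed elsewhere in this notebook.

```bash
# Step 1: printf writes three lines (tea, coffee, cookies).
# Step 2: sort puts those lines in alphabetical order.
# Step 3: head -n 2 keeps only the first two lines of the sorted list.
printf 'tea\ncoffee\ncookies\n' | sort | head -n 2
```

Each command only sees the output of the command directly to its left, so head never sees the unsorted list.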
Loops are a useful feature you'll find in most (if not all) programming languages. They are also quite intuitive to use. A loop is simply a series of commands that is run multiple times, each time with a small change. Let's make a loop together. First, we need to have two concepts clear: variables and arrays.
A variable is a specific word that means something else; this something may vary. Hence the name. We can specify a variable like this:
variable1=coffee
To refer to the content of a variable, we use a $ sign. This looks like so
echo $variable1
Now enter the cell below and try for yourself. You can name a variable anything you want.
variable1=coffee
echo $variable1
If you are working in a Bash kernel, your variables will be remembered in the entire notebook. If you work with a Python kernel, then your variables are only remembered within one cell. Check which kernel this notebook is running in the top right of this page.
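Variables become more useful once you place them inside other text. The following is a small sketch with a made-up variable called drink; it also shows one detail that often trips people up, namely that double and single quotes treat the $ sign differently.

```bash
drink=coffee                       # no spaces around the = sign
echo "I would like some $drink"    # double quotes: $drink is replaced by its content
echo 'I would like some $drink'    # single quotes: the text is printed exactly as typed

drink=tea                          # the content can change, hence the name 'variable'
echo "Now I would like some $drink"
```

Everything is kept in one cell here, so the sketch should behave the same whether the notebook runs a Bash kernel or a Python kernel with %%bash.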
An array is a list of values stored under one name; it's that simple. To make an array, we type something like this
samples=(L1 L2 L3)
To refer back to an array, we type this
echo ${samples[@]}
This looks a bit more complicated.
The [@] part means: 'all contents in the array'.
Hence, if you type echo ${samples[0]} you will only get the first element in the array, because bash starts counting at 0.
Again, try for yourself below in a new cell.
samples=(L1 L2 L3)
echo ${samples[@]}
echo ${samples[0]}
echo ${samples[2]}
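Two more things you can do with an array are counting its elements and adding a new one. The sketch below reuses the samples array from above; the extra sample L4 is made up for illustration.

```bash
samples=(L1 L2 L3)

echo "${#samples[@]}"    # the # asks for the number of elements: 3
echo "${samples[1]}"     # counting starts at 0, so index 1 is the second element: L2

samples+=(L4)            # append a (made-up) fourth sample to the array
echo "${samples[@]}"     # all four elements
```

The double quotes around the array are optional for simple names like these, but they keep elements containing spaces in one piece.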
Now we get to loops. Let's keep it simple: I will define a loop for you, and you see how it works.
break=(coffee tea cookies)
for i in ${break[@]}
do echo $i
done
Do you get the loop? Make sure you do. You will make your own loops in the following parts of the practical.
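Loops become practical when the looping variable is used to build something, such as a file name. The following sketch only prints text and does not touch any files. The extension .fastq.gz is an assumption made for illustration; check the output of ls data/reads for the real file names.

```bash
samples=(L1 L2 L3)

for sample in "${samples[@]}"
do
    # .fastq.gz is a made-up extension; only the text is printed, no file is opened
    echo "data/reads/${sample}.fastq.gz"
done
```

The curly braces in ${sample} mark where the variable name ends, so bash does not mistake the text that follows for part of the name.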
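The asterisk * is a wildcard: bash replaces it with every file name that fits the pattern. For example, list whatever matches in ./data
ls ./data/*
or list every file in ./data/reads that ends in .gz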
ls ./data/reads/*.gz
Now, list every file in ./data/reads that starts with L and ends in .gz
ls data/reads/L*.gz
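If you want to experiment with wildcards without relying on the contents of data/, the sketch below builds a throwaway folder with four empty files. The file names, and the commands mktemp, touch and rm, are additions for this illustration only.

```bash
# Make a temporary folder and put four empty, made-up files in it.
demo_dir=$(mktemp -d)
touch "$demo_dir/L1.gz" "$demo_dir/L2.gz" "$demo_dir/S1.gz" "$demo_dir/L1.txt"

# Bash expands each pattern into the matching names; we store them in arrays.
all_gz=("$demo_dir"/*.gz)     # every name ending in .gz
l_gz=("$demo_dir"/L*.gz)      # only names starting with L and ending in .gz

echo "${#all_gz[@]} names end in .gz"
echo "${#l_gz[@]} of those start with L"

# Clean up the temporary folder again.
rm -r "$demo_dir"
```

Bash does the matching before the command runs, so ls, echo, or any other command simply receives the list of matching names as its arguments.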
The base of a file is the part before the extension or extensions; you will need this later in the practical.
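One possible way to obtain the base of a file in bash is sketched here. The path is hypothetical, and the ${...} patterns are just one approach; the practical may use a different method later on.

```bash
file=data/reads/L1.fastq.gz    # hypothetical path, used only as text

name=${file##*/}               # remove everything up to the last / : L1.fastq.gz
base=${name%%.*}               # remove everything from the first . onward: L1

echo "$name"
echo "$base"
```

Because %% removes from the first dot onward, double extensions such as .fastq.gz are stripped in one go.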
As we have seen now, you can specify folders with a / and you can move from folder to folder. If you ever wonder what folder you are in now, you can 'print working directory' with the pwd command.
pwd
The current folder you are in is denoted as a dot: .
Hence, if you type a path like ./data/reads you tell the computer explicitly to start in the current folder, then move to the data folder, and then move to the reads folder.
This dot is not required; data/reads means exactly the same.
Try in the two cells below:
ls ./data/reads
ls data/reads
If you type ls / then you ask the computer to list the root of the filesystem, the highest level on the hard drive. This is somewhat like C:\ on Windows computers.
Whenever you see a manual page or a prewritten command with code like this
somecommand /path/to/file
then it is implied that you substitute the /path/to/file part with a path to a file you want to use or create.
If you get errors like 'file not found', perhaps check where you are with pwd and ls to see if you accidentally moved somewhere you did not mean to.
Although you won't need to in this practical, you can change working directories.
I designed this practical not to bother you with this, but in the long run it is important to be aware of where you are in a directory structure.
You can Change Directory with the command cd.
For example, typing cd ./data moves you into a data folder.
Typing cd .. moves you back up again, because .. means one folder up. This also works with ls: typing ls .. lists the folder above the current one.
If you are completely lost, just type cd without any argument; this will take you back to your home directory.
If you inadvertently move directories and you need to get back, you now know how to do so.
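To see cd and pwd working together without disturbing the folders of this practical, you could try something like the sketch below. It creates a throwaway folder with mktemp and mkdir (both additions for this illustration), moves in and out of it, and then returns to where it started.

```bash
start_dir=$(pwd)           # remember where we started
demo_dir=$(mktemp -d)      # throwaway folder for this sketch
mkdir "$demo_dir/data"     # give it a data folder

cd "$demo_dir/data"        # move into the data folder
pwd                        # the path now ends in /data

cd ..                      # move one folder up
pwd                        # the /data part is gone again

cd "$start_dir"            # go back to where we started
rm -r "$demo_dir"          # clean up the throwaway folder
```

Storing the starting folder in a variable first is a simple habit that makes it easy to get back, wherever you end up.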
Whenever you are asked to use some command or programme and you don't know exactly how it works, you can ask the computer for help. There are three common ways to do so:
some-command --help
some-command -h
man some-command
Not all of these always work for every command, but one or two always do; trial and error.
On these webpages, the man command doesn't work too well. Better to stick to the --help pages.
head --help
Quite often, you'll find a 'usage' line at the top of the help page.
This tells you how to use the command.
In the example of head, it tells you first to type head, then any options, and then any file.
Entries in [square brackets] are optional.
Entries without any brackets, or in <angle brackets>, are required.
You are now ready to work with Bash in Jupyter notebooks! Congratulations. Whenever you get stuck in the subsequent notebooks that use Bash code, maybe come back here for advice.