How to work in a Jupyter notebook with the Bash language

The webpage you are looking at is called a Jupyter notebook. It is a webpage on which you can write text (like this text) and also code. You can execute the code in the webpage, and the output returns to you within the same notebook! This may sound trivial, but it's "really cool" to put it in non-scientific terms. Code and text are entered in individual cells, a code cell or a text cell. This cell you are reading now is a text cell. Next, let's look at a code cell and execute it. There are two ways of executing a cell. First you select a cell, either with your mouse or with the up and down keys on your keyboard. Then, you execute it by hitting the 'run' button in the toolbar above, or you hit CTRL+RETURN.

Jupyter

First we'll learn how to work with Jupyter notebooks, then we'll move on to learn the basics of Bash.

Working with cells in Jupyter is quite straightforward. You learn best by doing, so try all of the things listed below:

  • You can select a cell with your mouse or the arrow keys on your keyboard.
  • You can edit a cell by hitting RETURN or by double-clicking it.
  • A new cell is a code cell by default; turn it into a markdown (text) cell by hitting 'm'.
  • A code cell can be executed by hitting CTRL+RETURN.
  • A markdown cell can be rendered by hitting CTRL+RETURN.
  • Add an additional cell by hitting the '+' button in the toolbar.
  • Add an additional cell by clicking between two cells
  • Add an additional cell by using the keyboard
    • add a cell below by hitting the 'b' key
    • and above by hitting the 'a' key
  • Whenever your notebook becomes unresponsive, you can interrupt the kernel (the underlying programme that runs your code) by
    • hitting the square stop button in the toolbar
    • clicking 'restart' or 'interrupt' in the 'Kernel' menu in your menu bar
    • clicking 'close and halt' in the File menu.

Try creating a new cell below this cell. Make at least a text cell and a code cell.

Jupyter kernels

In the metagenomics practical, we will be working mostly in Linux's mother tongue: Bash. However, juPYter notebooks work natively in PYthon. Still, we can work in Bash. Often, this works by itself. Sometimes, you will need to either type %%bash at the beginning of a code cell or precede every Bash command with an exclamation mark. Now, let's get to work and learn some Bash!

bash basics

Execute the code cell below:

In [ ]:
echo "hello world"

In the bash language, the first word you type is always the command. So, in this case, that was:

echo

This command 'echoes' whatever you give it back to the terminal. After you 'call' the command, you give it an argument. In this case, that is:

"hello world"

This basic structure of 'command' followed by 'arguments' comes back throughout the metagenomics practical.

Do: now try to change "hello world" to something else in the cell below:

In [ ]:
echo "hello world"

Often, an argument is a path to a file. To see what files we have, we use the ls command (short for 'list').

In [ ]:
ls

We are learning fast. So we now know what a command is, and we know what an argument is. Finally, we also need to know what options are. Options are optional extra information that we pass to the command. Options are often provided in between the command and the argument. They look either like this

ls --size --human-readable

or in shortened versions like this

ls -sh

Note that the above two commands are equivalent. Try them out in the cells below:

In [ ]:
ls --size --human-readable data/
In [ ]:
ls -sh ./data/

Commands, options and arguments are separated by spaces. Also note that options can have their own arguments. If this is the case, the manual or help page will specify this. We will get to the manual and help pages later.
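
As a small sketch of an option that takes its own argument: the -n option of the head command expects a number. The file fruits.txt and its contents are made up for this illustration, and the file lives in a temporary folder so that nothing in your own folders is touched. The word workdir is a variable that holds the name of that temporary folder; variables are explained in the loops section.

```shell
# Create a throwaway folder with a small example file (made-up content).
workdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$workdir/fruits.txt"

# command: head | option: -n | argument of the option: 2 | argument: the file
first_two=$(head -n 2 "$workdir/fruits.txt")
echo "$first_two"

# Remove the throwaway folder again.
rm -r "$workdir"
```

Here the 2 belongs to the -n option, while the path to the file is the argument of the head command itself.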

auto-complete

Auto-complete is one of the best features of the bash language and your greatest friend during this practical. Let's say we want to list (ls) the contents of the data/ folder but are too lazy to type the whole word 'data/'. Then we can type

ls da

and then hit the TAB button on your keyboard. Bash should either automatically complete the path to

ls data/

or if there are multiple options to auto-complete, bash will give you a little menu with these options.

Using auto-complete not only makes your life a lot easier, it also prevents you from making typos! If auto-complete doesn't work, odds are that something in your command or argument is wrong. Best to check before you proceed!

Try out auto-complete below

In [ ]:
ls da

pipes

Bash can hand the output of one program to another. This is called piping. If you pipe the output of multiple programs to each other, you have made a 'pipeline'. Pipelines look somewhat like this

command1 | command2 | command3

One trick with pipes that we will often use is the | head pipe. This pipe shows you only the first ten lines of the output of some command. | head -n 1 changes this number to 1. See for yourself below.

In [ ]:
ls -1 data/reads/
In [ ]:
ls -1 data/reads/ | head -n 1
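
A pipeline with three steps could look like the sketch below. The three sample names are made up and printed with printf, so the example does not depend on any files being present.

```shell
# Print three made-up sample names, deliberately out of order,
# then sort them alphabetically, then keep only the first two lines.
first_two=$(printf 'L3\nL1\nL2\n' | sort | head -n 2)
echo "$first_two"
```

Each | hands the output of the command on its left to the command on its right, so head only ever sees the sorted list.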

loops

Loops are a useful feature you'll find in most (if not all) programming languages. They are also quite intuitive to use. A loop simply is a series of commands that does the same thing multiple times, but with some small adaptation. Let's make a loop together. First, we need to have two concepts clear

  • variable
  • array

A variable is a name that stands for a value; this value may vary, hence the name. We can specify a variable like this:

variable1=coffee

To refer to the content of a variable, we use a $ sign. It looks like this:

echo $variable1

Now enter the cell below and try for yourself. You can name a variable anything you want.

In [ ]:
variable1=coffee
echo $variable1

If you are working in a Bash kernel, your variables will be remembered in the entire notebook. If you work with a Python kernel, then your variables are only remembered within one cell. Check which kernel this notebook is running in the top right of this page.
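
The sketch below imitates this behaviour from within a single cell. It assumes that a %%bash cell under a Python kernel is run by a fresh Bash process each time, and it uses bash -c to start two such fresh processes. The variable name drink is made up for this illustration.

```shell
# Each 'bash -c' starts a fresh shell, much like a separate %%bash cell.
bash -c 'drink=coffee; echo "first shell: $drink"'

# The second shell never saw the assignment, so the variable is empty there.
# ${drink:-<empty>} prints <empty> whenever drink has no value.
second=$(bash -c 'echo "second shell: ${drink:-<empty>}"')
echo "$second"
```

If you need a variable in several %%bash cells, the simplest fix is to define it again at the top of each cell.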

An array is a list of values stored under one name; it's that simple. To make an array, we type something like this

samples=(L1 L2 L3)

To refer back to an array, we type this

echo ${samples[@]}

This looks a bit more complicated. The [@] part means: 'all contents of the array'. A number instead of the @ selects a single element, and Bash starts counting at 0. Hence, if you type echo ${samples[0]}, you will only get the first element of the array. Again, try for yourself below in a new cell.

In [ ]:
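
If you want to compare your own attempt, here is a minimal sketch using the samples array from above. The double quotes are a common safety habit in Bash; they are optional for simple values like these.

```shell
# The example array from the text above.
samples=(L1 L2 L3)

echo "${samples[@]}"     # all elements: L1 L2 L3
echo "${samples[0]}"     # first element (counting starts at 0): L1
echo "${samples[2]}"     # third element: L3
echo "${#samples[@]}"    # number of elements in the array: 3
```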

Now we get to loops. Let's keep it simple: I will define a loop for you, and you see how it works.

In [ ]:
break=(coffee tea cookies)
In [ ]:
for i in ${break[@]}
do
    echo $i
done

Do you get the loop? Make sure you do. You will write your own loops in the following parts of the practical.
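
A loop becomes useful once the variable is combined with other text. The sketch below builds a file name for every sample in an array. The sample names and the .fastq.gz extension are assumptions made for this illustration; the loop only prints the names and does not touch any files.

```shell
# A made-up array of sample names.
samples=(L1 L2 L3)

# For every sample, print the file name that would belong to it.
# The curly braces mark where the variable name ends.
for sample in "${samples[@]}"
do
    echo "${sample}.fastq.gz"
done
```

The loop body runs three times, once with sample set to L1, once to L2 and once to L3.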

Have you completed all exercises above? Then move on to this:

Bash basics extra

  • wildcards (*)
  • base filenames
  • paths
  • manual/help pages

wildcards

Wildcards can be used on the command line. The * wildcard matches any number of characters. For example, list every folder/file inside the ./data/ folder

In [ ]:
ls ./data/*

or list every file in ./data/reads that ends in .gz

In [ ]:
ls ./data/reads/*.gz
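
To see exactly what a wildcard matches, it can help to try it on a folder whose contents you know. The sketch below creates a temporary folder with four made-up, empty files and lists them with two different patterns.

```shell
# Build a throwaway folder with four made-up, empty files.
workdir=$(mktemp -d)
touch "$workdir/L1.gz" "$workdir/S1.gz" "$workdir/S2.gz" "$workdir/notes.txt"

# * matches any number of characters: this lists the three .gz files.
ls "$workdir"/*.gz

# The wildcard can sit anywhere in the name: this lists S1.gz and S2.gz only.
ls "$workdir"/S*

# Count the .gz files by piping the list to wc -l, which counts lines.
gz_count=$(ls "$workdir"/*.gz | wc -l)
echo "$gz_count"

# Remove the throwaway folder again.
rm -r "$workdir"
```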

Now, list every file in ./data/reads that starts with L and ends in .gz

In [ ]:

base filenames

The base of a file is the part before the extension or extensions; you will need this later in the practical.
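
One way to get the base of a file name is the basename command. The sketch below is an illustration, not necessarily the method used later in the practical; the path data/reads/L1_R1.fastq.gz is made up and the file does not need to exist.

```shell
# A made-up path to a compressed file.
file=data/reads/L1_R1.fastq.gz

# basename strips the folders from a path.
name=$(basename "$file")
echo "$name"    # L1_R1.fastq.gz

# With an extension as second argument, basename strips that as well.
base=$(basename "$file" .fastq.gz)
echo "$base"    # L1_R1
```

The base is handy for naming output files after their input files, for example in a loop.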

paths

As we have seen now, you can specify folders with a /. You can move from folder to folder. If you ever wonder what folder you are in now, you can 'print working directory' with pwd.

In [ ]:
pwd

The current folder you are in is denoted as a dot: .
Hence, if you type a path like ./data/reads you tell the computer explicitly to start in the current folder, then move to the data folder, and then move to the reads folder. This dot is not required; data/reads means exactly the same. Try in the two cells below:

In [ ]:

In [ ]:

If you type ls /, then you ask the computer to list the root of the filesystem, the highest level of the folder structure. Somewhat like C:\ on Windows computers.

Whenever you see a manual page or a prewritten command with code like this

somecommand /path/to/file

Then it is implied that you substitute the /path/to/file with a path to a file you want to use or create.

If you get errors like file not found, perhaps check where you are with pwd and ls to see if you accidentally moved somewhere you did not mean to.

change your working directory.

Although you won't need to in this practical, you can change working directories. I designed this practical not to bother you with this, but in the long run it is important to be aware of where you are in a directory structure. You can Change Directory with the command cd. For example, if you want to move into a data folder, you may type cd ./data. If you want to move back up again, type cd followed by two dots (cd ..), where the two dots mean one folder up. You can also use the two dots with ls, as in ls followed by two dots (ls ..). If you are completely lost, just type cd without any argument; this will take you back to your home directory. If you inadvertently move directories and you need to get back, you now know how to do so.
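
The sketch below walks through these moves in a temporary folder, so that your real working directory is left as it was. The data/reads folders inside it are created just for this illustration, and basename is used only to print the last part of the current path.

```shell
# Remember where we started, then build a throwaway folder structure.
start=$PWD
workdir=$(mktemp -d)
mkdir -p "$workdir/data/reads"

# Move into the reads folder and print where we are.
cd "$workdir/data/reads"
pwd

# Move one folder up; basename shows only the last part of the path.
cd ..
here=$(basename "$PWD")
echo "$here"    # data

# Go back to where we started and remove the throwaway folder.
cd "$start"
rm -r "$workdir"
```

Storing the starting folder in a variable first is a simple way to make sure you can always find your way back.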

help and manual pages

Whenever you are asked to use some command or programme and you don't know exactly how it works, you can ask the computer for help:

  • type the command without any argument or options
  • type the command with option --help
  • type the command with option -h
  • get the manual page with man some-command

Not all of these work for every command, but usually one or two do; it is a matter of trial and error.

On these webpages, the man command doesn't work too well. Better to stick to the --help pages.

In [ ]:
head --help

Quite often, you'll find a 'usage' line at the top of the help page. This tells you how to use the command. In the example of head, it tells you first to type head, then any options, and then any file. Entries in [square brackets] are optional. Entries without any brackets, or in <angle brackets>, are required.
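
Help pages can be long. Since the usage line sits at the top, a pipe to head is a quick way to see just that part. This sketch assumes the GNU version of head that is found on most Linux systems; other versions may not know the --help option.

```shell
# Keep only the first three lines of the help page of head,
# which include the usage line (assumes GNU head).
help_start=$(head --help | head -n 3)
echo "$help_start"
```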

That's it!

You are now ready to work with Bash in Jupyter notebooks! Congratulations. Whenever you get stuck in the subsequent notebooks that use Bash code, maybe come back here for advice.