March 10, 2020
Bioinformatics has a wide scope: it spans the whole data value chain, from collecting and connecting to managing and analyzing different types of biological and health data. Getting value from this data is complex and collaborative work, requiring the use and development of advanced software tools. However, even for experts, many of these software tools are difficult and very time-consuming to set up, use, and interpret. A simple way for researchers to increase the impact of their work is to make the tools they develop easier to install, share, and use.
This tutorial shows how to create good-looking visualization tools that are intuitive to use. Specifically, the tutorial guides you through the steps in building a Python 3 based tool that creates heat maps of gene expression data. The finished tool runs in any browser and requires no installation, setup, or coding from the end-user.
This tutorial has two parts: Part I explains the code, and Part II describes how to configure the app on BioLib. If you want to jump straight to Part II, the complete code used in Part II can be found at the very end of Part I.
The first step is to load the dependencies required for the script to work. In BioLib, we already support a bunch of them. In this tutorial, we make use of seaborn and matplotlib to make plots, pandas to handle the input files, and argparse to parse parameter inputs specified by users.
import pandas as pd
import seaborn as sns
sns.set(color_codes=True)
import io
import matplotlib.pyplot as plt
import argparse
import sys
import base64
In general, tools on BioLib do at least two things: a) load data and b) analyze/process this data. We will split these two jobs into two separate utility functions that can be exported and used in other tools as well.
Handling a Binary Input
On BioLib, inputs are passed to tools as binary arrays (buffered I/O, in Python terms). So, to get something meaningful from a .csv or .txt input, we’ll make a function that takes a binary representation of a file and reads it into a data frame. We do this with pd.read_table(). Since a heat map requires both data and a set of labels, this function also renames the column that holds the labels (selected by the index argument) to "labels" and moves it into a new variable. It then returns data and labels so we can pass them on:
def format_data_from_csv_buffer(buffer, index=0, separator=','):
    # Read the delimited file from the binary buffer into a data frame
    data = pd.read_table(buffer, sep=separator)
    # Rename the column at `index` so it can be pulled out by name
    data.rename(columns={data.columns[index]: "labels"}, inplace=True)
    labels = data.pop("labels")
    return data, labels
We pass index and separator as arguments so that end-users can set them to match the files they want to analyze.
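To see what the function returns, here is a minimal sketch that feeds it a small in-memory CSV. The function is repeated so the snippet runs on its own (named as in the full listing), and the gene names and values are made up purely for illustration:

```python
import io

import pandas as pd


def format_data_from_csv_buffer(buffer, index=0, separator=','):
    # Same logic as the utility function in this tutorial
    data = pd.read_table(buffer, sep=separator)
    data.rename(columns={data.columns[index]: "labels"}, inplace=True)
    labels = data.pop("labels")
    return data, labels


# Hypothetical input: the first column holds the labels, the rest hold values
csv_bytes = b"gene,sample_1,sample_2\nBRCA1,1.5,2.0\nTP53,0.5,3.25\n"

data, labels = format_data_from_csv_buffer(io.BytesIO(csv_bytes), index=0)

print(list(labels))        # the values of the labels column
print(list(data.columns))  # the remaining numeric columns
```

After the call, the labels column is no longer part of the data frame, which is what the heat map function expects.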
Generating a Heat Map
Now, it’s time to create the heat map. This function takes the data and the labels and returns a heat map object created with the Seaborn library. For the plot to look great on BioLib, we call tight_layout() so that the labels and the color bar fit inside the figure:
def generate_heatmap(data, labels):
    heatmap_object = sns.heatmap(data, yticklabels=labels,
                                 cbar_kws={'label': 'Expression level'})
    heatmap_object.figure.tight_layout()
    return heatmap_object
Now that the utilities are created, it is time to put the code together. First, we load the parameter inputs that the user passes to our code:
# Load input parameter
parser = argparse.ArgumentParser()
parser.add_argument('--index', dest='index', default=0)
parser.add_argument('--separator', dest='separator', default=',')
args = parser.parse_args()
In this case, we have two parameters: --separator, which lets users change the value separator to match their input files, and --index, which is the index of the labels column, with 0 being the first column and -1 the last one (as in a Python list).
To read parameters in BioLib, we use the argparse library. Parameters can later be accessed as args.index and args.separator.
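As a small illustration of how these two parameters behave, the sketch below parses an explicit argument list instead of the real command line. Note one deliberate difference from the tutorial code: it passes type=int to argparse so the index is converted during parsing, whereas the tutorial converts it later with int(). The column names are hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # type=int is a variation on the tutorial code, which calls int() later
    parser.add_argument('--index', dest='index', type=int, default=0)
    parser.add_argument('--separator', dest='separator', default=',')
    return parser


parser = build_parser()

# No arguments given: the defaults apply
defaults = parser.parse_args([])

# Label column is the last one, and the file is tab-separated
custom = parser.parse_args(['--index', '-1', '--separator', '\t'])

# Hypothetical header row, to show that the index works like list indexing
columns = ['gene', 'sample_1', 'sample_2']
print(columns[defaults.index])  # first column
print(columns[custom.index])    # last column
```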
Once the input parameters are set, the next step is to load the input data and process it with the first utility function to obtain the data frame and the labels variable. On BioLib, the input file arrives on standard input, so we read it and wrap it in a binary buffer: csv_binary = io.BytesIO(sys.stdin.read().encode()). Once the binary data is loaded into a variable, we transform it with the previously defined function.
# Load input data
csv_binary = io.BytesIO(sys.stdin.read().encode())
data, labels = format_data_from_csv_buffer(csv_binary, index=int(args.index), separator=args.separator)
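The read-and-wrap step can be tried outside BioLib by substituting an in-memory text stream for standard input. This is only a sketch: the helper name read_input_as_buffer and the sample content are hypothetical, not part of the tutorial code.

```python
import io


def read_input_as_buffer(text_stream):
    """Read all text from a stream and wrap it in an in-memory binary buffer."""
    return io.BytesIO(text_stream.read().encode())


# Simulate piped input; in the app itself you would pass sys.stdin instead
fake_stdin = io.StringIO("gene,sample_1\nBRCA1,1.5\n")

buffer = read_input_as_buffer(fake_stdin)
first_line = buffer.readline()
print(first_line)
```

In standard Python, sys.stdin.buffer.read() returns the raw bytes directly and skips the decode and encode round trip. Whether that matters on BioLib is not covered here.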
Finally, we generate a heat map with our generate_heatmap() function. To produce graphical output, we use Matplotlib (plt) to save a PNG file and print it as a Markdown image.
# Generate output
generate_heatmap(data, labels)
plt.savefig('example.png')
file_data = open('example.png', 'rb').read()
data = base64.b64encode(file_data).decode('ascii')
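# Print a Markdown image with the PNG embedded as a base64 data URI (the alt text is a placeholder)
print('![Heatmap](data:{};base64,{})'.format('image/png', data))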
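The encoding step can be looked at in isolation. The sketch below assumes the Markdown output is an image whose source is a base64 data URI, which is what the 'image/png' and base64 arguments suggest. The helper name to_markdown_image and the alt text are hypothetical. To stay self-contained, it encodes only the 8-byte PNG file signature rather than a real plot.

```python
import base64


def to_markdown_image(image_bytes, mime_type='image/png', alt_text='Heatmap'):
    """Embed raw image bytes in a Markdown image tag as a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode('ascii')
    return '![{}](data:{};base64,{})'.format(alt_text, mime_type, encoded)


# Stand-in for the contents of example.png: just the PNG file signature
png_signature = b'\x89PNG\r\n\x1a\n'

markdown = to_markdown_image(png_signature)
print(markdown)
```

In the tool itself, the bytes would come from the file written by plt.savefig().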
This is what the full code looks like:
import pandas as pd
import seaborn as sns
sns.set(color_codes=True)
import io
import matplotlib.pyplot as plt
import argparse
import sys
import base64
# utility function to parse csv from binary on BioLib
def format_data_from_csv_buffer(buffer, index=0, separator=','):
    data = pd.read_table(buffer, sep=separator)
    data.rename(columns={data.columns[index]: "labels"}, inplace=True) # Change the labels column name
    labels = data.pop("labels")
    return data, labels
# utility function to generate a Seaborn object containing a heatmap
def generate_heatmap(data, labels):
    heatmap_object = sns.heatmap(data, yticklabels=labels,
                                 cbar_kws={'label': 'Expression level'})
    heatmap_object.figure.tight_layout()
    return heatmap_object
# Load input parameter
parser = argparse.ArgumentParser()
parser.add_argument('--index', dest='index', default=0)
parser.add_argument('--separator', dest='separator', default=',')
args = parser.parse_args()
# Load input data
csv_binary = io.BytesIO(sys.stdin.read().encode())
data,labels = format_data_from_csv_buffer(csv_binary, index=int(args.index), separator=args.separator)
# Generate output
generate_heatmap(data, labels)
plt.savefig('example.png')
file_data = open('example.png', 'rb').read()
data = base64.b64encode(file_data).decode('ascii')
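# Print a Markdown image with the PNG embedded as a base64 data URI (the alt text is a placeholder)
print('![Heatmap](data:{};base64,{})'.format('image/png', data))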
This is all the code you need to create a heat map application that will look something like this https://biolib.com/example-apps/expression-heatmap/ . Next up: configuring your application on BioLib.
With the code ready to go, we only need to click our way through a few configuration steps before we have created a tool that any scientist anywhere in the world can use to create heat maps with just a few clicks and their web browser, yay!
In this example, we set up a tool that reads one input file (gene expression data) and has two different parameters: one for selecting a separator (e.g., comma or tab-separated) for the input file and one for defining which column to use as labels.
Step 1 - Create App
To start, open https://biolib.com, log in, and click “Create” in the upper right-hand corner (NB: to create an application, you need to have an account and to be signed in). First, you are asked to select a template application. Select the Python template.
Notice that by default, applications are created in “draft” mode - this means that you are the only one who can view and run your application. The tool will not be visible or accessible to others until you change it from “draft” to “public” mode (which you can do at a later point).
Step 2 - Adding Your Code
To add your code, edit __main__.py and copy-paste the code from Part I into the code editor.
Step 3 - Configuring Standard Inputs
Under Advanced Settings, the stdin toggle determines whether the user will be required to provide a standard input. In this example, the user should be asked to provide such input, so leave the toggle on yes.
Step 4 - Creating Input Parameters
You create and set up the parameters for your application by clicking the “Add” button under the Input Parameter section - this opens a new input section.
I. Let’s start by adding the parameter called --index (remember, this was the parameter that tells the app what column in the input file holds the labels).
First, give the parameter a description (this is the text that the user will see): “Index of the labels (0 for the first column, -1 for the last column)”.
Where it says “How should the input be rendered”, choose “Number Input”.
The field called “Input Key” is where you tell the BioLib app creation engine what the parameter variable is called in your code; in our case, it is simply called --index.
II. To set up the --separator parameter, we go through the same process. First, click “add new parameter” again. Describe the input (e.g. “what separator is used in your data?”) and set the parameter to be visible to the user. This time, choose text input (instead of number) and set the Input Key to --separator. Finally, set the default value to a comma (,).
Step 5 - Adding a Description
It is important to add a good description of the tool so that users can easily use it. We highly recommend a brief, illustrative description with pictures. The description field can be written in Markdown, so adding titles and pictures should be easy. Add your description to the README.md file.
What is a good description? A good description:
For an example of this, please see this description: https://biolib.com/example-apps/expression-heatmap/
Step 6 - Save the Tool
Now that the configuration is complete, press “Save” in the top right corner.
That’s it! You have now successfully created and configured a tool on BioLib that can read gene expression files and create heat map visualizations from them.
Remember, by default, your app is created in “draft” mode, which means you are the only one who can view it. To share your app with your colleagues or the world at large, go to your profile page (click the profile icon in the upper right-hand corner), find the app you want to share, press the edit app icon, and change the setting from “draft” to “private” (link-sharing) or “public” (anyone can view) under “App Settings”.