March 10, 2020

Tutorial: Creating Plots on BioLib


Introduction

Bioinformatics as a discipline has a wide scope. It spans the whole data value chain: collecting, connecting, managing, and analyzing different types of biological and health data. Getting value from this data is complex and collaborative work that requires the use and development of advanced software tools. However, even for experts, many of these tools are difficult and time-consuming to set up, use, and interpret. A simple way for researchers to increase the impact of their work is to make the tools they develop easier to install, share, and use.

This tutorial shows how to create good-looking visualization tools that are intuitive to use. Specifically, it guides you through the steps of building a Python 3 based tool that creates heat maps of gene expression data. The finished tool runs in any browser and requires no installation, setup, or coding from the end user.

This tutorial has two parts: Part I explains the code, and Part II describes how to configure the app on BioLib. If you want to jump straight to Part II, the complete code used in Part II can be found at the very end of Part I.

Part I - The Code

Loading Dependencies

The first step is to load the dependencies required for the script to work. BioLib already supports many of them. In this tutorial, we use seaborn and matplotlib to make plots, pandas to handle the input files, and argparse to parse the parameters specified by users.

import pandas as pd
import seaborn as sns
sns.set(color_codes=True)
import io
import matplotlib.pyplot as plt
import argparse
import sys
import base64

Utility Functions

In general, tools on BioLib do at least two things: a) load data and b) analyze/process this data. We will split these two jobs into two separate utility functions that can be exported and used in other tools as well.

Handling a Binary Input

On BioLib, inputs are passed to tools as binary data (a buffered I/O object, in Python terms). So, to get something meaningful from a .csv or .txt input, we'll make a function that takes a binary representation of a file and reads it into a data frame. We do this with pd.read_table(). Since a heat map requires both data and a set of labels, this function also renames the column that holds the labels (selected by the index argument) and stores it in a new variable. It then returns the data and the labels so we can pass them on:

def format_data_from_csv_buffer(buffer, index=0, separator=','):
    data = pd.read_table(buffer, sep=separator)
    data.rename(columns={data.columns[index]: "labels"}, inplace=True)  # Change the labels column name
    labels = data.pop("labels")  # Remove the labels column from the data and keep it separately
    return data, labels

We pass index and separator as arguments so that the end user can set them to match the files they want to analyze.
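As a quick check of this utility, the sketch below feeds it a small CSV held in memory. The gene names and values are invented for illustration, and the function body (under the name format_data_from_csv_buffer, as in the finished code) is repeated so the snippet runs on its own.

```python
import io

import pandas as pd


def format_data_from_csv_buffer(buffer, index=0, separator=','):
    # Same logic as the tutorial function, repeated so this snippet is self-contained
    data = pd.read_table(buffer, sep=separator)
    data.rename(columns={data.columns[index]: "labels"}, inplace=True)
    labels = data.pop("labels")
    return data, labels


# Made-up expression table: the first column holds the gene names
example_csv = b"gene,sample_1,sample_2\nBRCA1,1.5,2.0\nTP53,0.3,4.1\n"

data, labels = format_data_from_csv_buffer(io.BytesIO(example_csv), index=0)

print(list(labels))        # the popped label column
print(list(data.columns))  # the remaining numeric columns
```

Calling the function with index=-1 instead would treat the last column as the label column.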

Generating a Heat Map

Now, it’s time to create the heat map. This function takes the data and the labels and returns a heat map object created with the Seaborn library. For the plot to look great on BioLib, we call tight_layout() so that the labels and axes fit within the figure:

def generate_heatmap(data, labels):
    heatmap_object = sns.heatmap(data, yticklabels=labels,
                                    cbar_kws={'label': 'Expression level'})
    heatmap_object.figure.tight_layout()

    return heatmap_object

Main Function

Now that the utilities are created, it is time to put the code together. First, we load the parameters that the user passes to our code:

# Load input parameter
parser = argparse.ArgumentParser()
parser.add_argument('--index', dest='index', default=0)
parser.add_argument('--separator', dest='separator', default=',')
args = parser.parse_args()

In this case, we have two parameters: --separator, which lets users set the value separator to match their input files, and --index, which is the index of the labels column, where 0 is the first column and -1 is the last one (as in a Python list). To read parameters in BioLib we use the argparse library. The parameters can later be accessed as args.index and args.separator.
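To see how these two parameters behave, the sketch below parses an explicit argument list instead of the real command line. Passing type=int for --index is an optional variation added here for illustration; the tutorial code converts the value with int() at a later step instead.

```python
import argparse

parser = argparse.ArgumentParser()
# type=int is an assumption added here; the tutorial converts with int() later
parser.add_argument('--index', dest='index', type=int, default=0)
parser.add_argument('--separator', dest='separator', default=',')

# No arguments given: the defaults apply
defaults = parser.parse_args([])
print(defaults.index, defaults.separator)

# Labels in the last column of a semicolon-separated file
custom = parser.parse_args(['--index', '-1', '--separator', ';'])
print(custom.index, custom.separator)
```

Without type=int, argparse returns command-line values as strings, which is why the tutorial wraps args.index in int() before using it.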

Once the input parameters are set, the next step is to load the input data and process it with the first utility function to obtain the data frame and the labels variable. To read an input file in BioLib, we read standard input and wrap it in a binary buffer: csv_binary = io.BytesIO(sys.stdin.read().encode()). Once the binary data is stored in a variable, we transform its content with the previously defined function.

# Load input data
csv_binary = io.BytesIO(sys.stdin.read().encode())
data, labels = format_data_from_csv_buffer(csv_binary, index=int(args.index), separator=args.separator)
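The stdin handling can be tried without BioLib by substituting an in-memory text stream for sys.stdin. The helper name read_text_stream_as_buffer is hypothetical; it only wraps the same read-and-encode expression used in the tutorial.

```python
import io
import sys


def read_text_stream_as_buffer(stream=None):
    # Hypothetical helper: read all text, encode it to bytes, wrap it in BytesIO
    if stream is None:
        stream = sys.stdin
    return io.BytesIO(stream.read().encode())


# Simulate a CSV file arriving on standard input
fake_stdin = io.StringIO("gene,sample_1\nBRCA1,1.5\n")
csv_binary = read_text_stream_as_buffer(fake_stdin)

first_line = csv_binary.readline()
print(first_line)
csv_binary.seek(0)  # rewind so a later reader starts from the beginning
```

In a standalone script, sys.stdin.buffer.read() returns the raw bytes directly and avoids the decode and encode round trip.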

Finally, we generate a heat map with our function generate_heatmap(). To produce graphical output, we use Matplotlib (plt) to save the plot as a PNG file, encode that file as Base64, and print it as a Markdown image so that it is rendered as Markdown.

# Generate output
generate_heatmap(data, labels)
plt.savefig('example.png')
file_data = open('example.png', 'rb').read()
data = base64.b64encode(file_data).decode('ascii')
print('![picture](data:{};base64,{})'.format('image/png', data))
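The encoding step can be isolated in a small helper. The sketch below uses a hypothetical function name and a placeholder byte string in place of a real PNG; it shows only how the Markdown image tag is assembled.

```python
import base64


def image_bytes_to_markdown(image_bytes, mime_type='image/png', alt_text='picture'):
    # Hypothetical helper: embed raw image bytes in a Markdown image tag as a data URI
    encoded = base64.b64encode(image_bytes).decode('ascii')
    return '![{}](data:{};base64,{})'.format(alt_text, mime_type, encoded)


# Placeholder bytes (the 8-byte PNG signature), not a complete image
png_signature = b'\x89PNG\r\n\x1a\n'
markdown = image_bytes_to_markdown(png_signature)
print(markdown)
```

plt.savefig() also accepts a file-like object, so the figure could be written to an io.BytesIO buffer with format='png' and passed to a helper like this one without creating example.png on disk.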

The Finished Code - (to be Copied and Pasted in Part II)

This is what the full code looks like:

import pandas as pd
import seaborn as sns
sns.set(color_codes=True)
import io
import matplotlib.pyplot as plt
import argparse
import sys
import base64

# utility function to parse csv from binary on BioLib
def format_data_from_csv_buffer(buffer, index=0, separator=','):
    data = pd.read_table(buffer, sep=separator)
    data.rename(columns={data.columns[index]: "labels"}, inplace=True)  # Change the labels column name    
    labels = data.pop("labels")
    return data, labels

# utility function to generate a Seaborn object containing a heatmap
def generate_heatmap(data, labels):
    heatmap_object = sns.heatmap(data, yticklabels=labels,
                                    cbar_kws={'label': 'Expression level'})
    heatmap_object.figure.tight_layout()

    return heatmap_object


# Load input parameter
parser = argparse.ArgumentParser()
parser.add_argument('--index', dest='index', default=0)
parser.add_argument('--separator', dest='separator', default=',')
args = parser.parse_args()


# Load input data
csv_binary = io.BytesIO(sys.stdin.read().encode())
data,labels = format_data_from_csv_buffer(csv_binary, index=int(args.index), separator=args.separator)

# Generate output
generate_heatmap(data, labels)
plt.savefig('example.png')
file_data = open('example.png', 'rb').read()
data = base64.b64encode(file_data).decode('ascii')
print('![picture](data:{};base64,{})'.format('image/png', data))

What Happens Next?

This is all the code you need to create a heat map application that will look something like this: https://biolib.com/example-apps/expression-heatmap/. Next up: configuring your application on BioLib.

Part II - Publish Tool on BioLib

With the code ready to go, we only need to click our way through a few configuration steps before we have created a tool that any scientist anywhere in the world can use to create heat maps with just a few clicks and their web browser, yay!

In this example, we set up a tool that reads one input file (gene expression data) and has two parameters: one for selecting the separator used in the input file (e.g., comma or tab) and one for defining which column to use as labels.

Step 1 - Create App

To start, open https://biolib.com, log in, and click “Create” in the upper right-hand corner (note that you need an account and must be signed in to create an application). First, you are asked to select a template application. Select the Python template.

Notice that by default, applications are created in “draft” mode - this means that you are the only one who can view and run your application. The tool will not be visible or accessible to others until you change it from “draft” to “public” mode (which you can do at a later point).

Step 2 - Adding Your Code

To add your code, edit __main__.py and copy-paste the code from Part I into the code editor.

Step 3 - Configuring Standard Inputs

Under Advanced Settings, the stdin toggle determines whether the user is required to provide a standard input. In this example, the user should be asked to provide such input, so leave the toggle set to yes.

Step 4 - Creating Input Parameters

You create and set up the parameters for your application by clicking the “Add” button under the Input Parameter section. This opens a new input section.

I. Let’s start by adding the parameter called --index (remember, this was the parameter that tells the app what column in the input file holds the labels). First, give the parameter a description (this is the text that the user will see): “Index of the labels (0 for the first column, -1 for the last column)”. Where it says “How should the input be rendered”, choose “Number Input”. The field called “Input Key” is where you tell the BioLib app creation engine what the parameter variable is called in your code; in our case, it is simply called --index.

II. To set up the --separator parameter we go through the same process. First, click “add new parameter” again. Describe the input (e.g. “what separator is used in your data?”) and set the parameter to be visible to the user. This time choose text input (instead of number) and set the Input Key to --separator. Finally, set the default value to ,.

Step 5 - Adding a Description

It is important to add a good description of the tool so that users can easily use it. We highly recommend a brief, illustrative description with pictures. The description field can be written in Markdown, so adding titles and pictures should be easy. Add your description to the README.md file.

What is a good description? A good description:

  • Explains clearly what the tool does, the version used, who the developer is, whether it is a part of a toolkit or an independent tool
  • Declares unambiguously what input type/formats the tool takes: e.g., FASTA, FASTQ, alignments, PDB, etc.
  • Declares the expected output and the meaning of this output. It is a good idea to show an example of what an output looks like and which information the user can extract from it.
  • Includes an illustrative example that shows the input and output.

For an example of this, please see this description: https://biolib.com/example-apps/expression-heatmap/

Step 6 - Save the Tool

Now that the configuration is complete, press “Save” in the top right corner.

That’s it! You have now successfully created and configured a tool on BioLib that can read gene expression files and create heat map visualizations from them.

Remember, by default your app is created in “draft” mode, which means you are the only one who can view it. To share your app with your colleagues or the world at large, go to your profile page (click the profile icon in the upper right-hand corner), find the app you want to share, press the edit app icon, and change the setting from “draft” to “private” (link-sharing) or “public” (anyone can view) under “App Settings”.