Run from Python
To run applications from Python, install the latest pybiolib Python package:
pip3 install --upgrade pybiolib
Login
To log in with your BioLib account in a Python notebook, run the code below and follow the instructions shown:
import biolib
biolib.login()
Alternatively, you can use an API token and set it as the BIOLIB_TOKEN environment variable. You can create an API token under your account settings on BioLib.
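As a small sketch, assuming you have already created a token (the value below is a placeholder), you can set the variable from Python before using biolib:
import os
os.environ['BIOLIB_TOKEN'] = 'your-api-token'  # placeholder; use a token you created on BioLib
import biolib  # pybiolib reads BIOLIB_TOKEN from the environment when it needs to authenticate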
Run using .cli()
To load an application into your Python script, add the following:
import biolib
app = biolib.load('author/application')
To run an application, call the function .cli() on the application you loaded above. For instance, to run samtools with the --help argument:
import biolib
samtools = biolib.load('samtools/samtools')
job = samtools.cli(args='--help')
print(job.get_stdout().decode())
Running an application returns a job object, which allows you to monitor progress and save results. Read more about jobs in the Jobs section below.
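As a brief sketch using the job functions described in the Results and Jobs sections below, you can check the job's status and save its output files:
print(job.get_status())               # e.g. 'completed' once the run has finished
job.save_files(output_dir='results')  # save any output files to a local directory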
Non-blocking
By default, calling the function .cli() blocks until the application is finished. You can pass the keyword argument blocking=False to return immediately. For example, the code below will print "in_progress".
import biolib
samtools = biolib.load('samtools/samtools')
job = samtools.cli(args='--help', blocking=False)
print(job.get_status())
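When you later need the finished result, you can block on the same job object using the functions described in the Results section below, for example:
job.wait()                        # block until the job has finished
print(job.get_stdout().decode())  # then read its output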
Result prefix
You can annotate the result with a custom name when calling .cli() by using the keyword argument result_prefix:
import biolib
samtools = biolib.load('samtools/samtools')
job = samtools.cli(args='--help', result_prefix='my_help_test')
Setting the result prefix makes it easy to distinguish results from one another on your results page on BioLib.
Run using .run()
The .run() function is a more Pythonic way to run applications, where all keyword arguments are passed to the application as command line arguments. This function blocks and waits until the application is finished.
samtools = biolib.load('samtools/samtools')
job = samtools.run()
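The call above runs the application without arguments. As a hedged sketch of a call with arguments, using a placeholder application name and a hypothetical threads parameter purely for illustration:
app = biolib.load('author/application')  # placeholder application name
job = app.run(threads=4)                 # hypothetical keyword argument, forwarded to the application as a command line argument
print(job.get_stdout().decode())         # .run() blocks, so output is available once it returns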
Run using .start()
The .start() function is a more Pythonic way to run applications, where all keyword arguments are passed to the application as command line arguments. This function returns immediately when the job is created.
samtools = biolib.load('samtools/samtools')
job = samtools.start()
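Because .start() returns as soon as the job is created, you can check on it or attach to its output with the job functions described further down, for example:
print(job.get_status())  # typically 'in_progress' right after the job is created
job.stream_logs()        # attach to stdout and stderr and stream until the job finishes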
Search
To search for applications on BioLib, use the function biolib.search(), which takes a search query as the first argument:
app_list = biolib.search('samtools')
print(app_list)
This should print something like the following:
['samtools/samtools',
'samtools/samtools-fixmate',
'samtools/samtools-stats',
'samtools/samtools-collate',
'samtools/samtools-fastq',
...
To run a specific application, you can pass a value from the list above to biolib.load() and then call app.cli():
app = biolib.load(app_list[0])
job = app.cli('--help')
Results
When a job has completed, its outputs can be accessed by the following functions:
job.wait() # Wait until done
job.get_stdout() # Returns stdout as bytes
job.get_stderr() # Returns stderr as bytes
job.get_exit_code() # Returns exit code of the application as an integer
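Putting these together, a small sketch of collecting a job's output once it has finished could look like:
job.wait()  # make sure the job is done
if job.get_exit_code() == 0:
    print(job.get_stdout().decode())  # application output
else:
    print(job.get_stderr().decode())  # error output from the application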
Save files to disk
To save the output files to a local directory like "result_files" run:
job.save_files(output_dir='result_files')
The .save_files() function also takes an optional path_filter argument as a glob pattern. For example, to save all .pdb files from a result you can run:
job.save_files(output_dir='result_files', path_filter='*.pdb')
In-memory files
You can work with result files without saving them to disk. To list the output files from a job, run:
job.list_output_files()
To load a single file into memory, without saving it to disk, run:
my_csv_file = job.get_output_file('/my_file.csv')
To pass an output file to a library like Pandas or BioPython, run .get_file_handle() on the object:
import pandas as pd
my_dataframe = pd.read_csv(my_csv_file.get_file_handle())
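The same pattern works for BioPython; the sketch below assumes a FASTA output file with a hypothetical name and that .get_file_handle() returns a binary file-like object, which is therefore wrapped in a text stream for SeqIO:
import io
from Bio import SeqIO
my_fasta_file = job.get_output_file('/my_sequences.fasta')  # hypothetical output file name
text_handle = io.TextIOWrapper(my_fasta_file.get_file_handle())
records = list(SeqIO.parse(text_handle, 'fasta'))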
Jobs
A job object refers to a specific run of an application. It holds progress information of the application run and the result when the job has completed.
List jobs
When signed in, you can print a table of your jobs by running:
biolib.show_jobs(count=25)
where count refers to the number of jobs you want to show.
Retrieve a job
To retrieve a Job in Python, call biolib.get_job() with the Job's ID. You can find the ID on the job overview page on BioLib.
job = biolib.get_job(job_id)
The biolib.get_job() function returns a job object on which you can call get_status() like below to print the status:
print(job.get_status())
You can use this to determine if a job has completed or is still in progress.
Open in browser
You can open the job in your web browser to view the graphical and interactive output files.
job.open_browser()
Stream output
If your Job is still running, you can attach to its stdout and stderr by running:
job.stream_logs()
This will print current output and keep streaming stdout and stderr until the job has finished.
Download output files
You can download job output files using the job ID. The job ID can be found under "Details" on the Results page, or in the share link.
job_id = '1a234567-b89...'
job = biolib.get_job(job_id)
job.save_files('job_output/')
Download input files
To download the input files of a job:
job_id = '1a234567-b89...'
job = biolib.get_job(job_id)
job.save_input_files(output_dir='input_files')
Start jobs in parallel
Use the blocking=False argument to cli() on an application to get the job immediately, without having to wait for the application to finish. This feature allows for parallelized workflows like the one below:
samtools = biolib.load('samtools/samtools')
my_fasta_files = ['seq1.fasta', 'seq2.fasta']
my_jobs = []
for file in my_fasta_files:
    job = samtools.cli(file, blocking=False)
    my_jobs.append(job)
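Once all jobs have been started, you can wait for them and collect their results, for example:
for job in my_jobs:
    job.wait()                        # block until this job is done
    print(job.get_stdout().decode())  # or save files with job.save_files(...)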
Experiments
An Experiment is a collection of jobs that you can retrieve together. To group the jobs in an Experiment, use the following syntax:
with biolib.Experiment('my-experiment-name'):
    my_application.cli(input_1)  # these two jobs will be
    my_application.cli(input_2)  # grouped in the same Experiment
All jobs started under the with statement will be grouped under the Experiment's ID (in this case my-experiment-name).
List experiments
When logged in, you can print a table of your experiments by running:
biolib.show_experiments(count=10)
where count refers to the number of experiments you want to show.
Retrieve an experiment
To load an Experiment in Python, run the following:
my_experiment = biolib.get_experiment('my-experiment-name')
print(my_experiment)
Wait for all jobs
To block and wait until all jobs of an experiment have finished, use the .wait() function:
my_experiment.wait()
Retrieve jobs
To get a list of the Job objects contained in an Experiment, run:
my_jobs = my_experiment.get_jobs()
You can interact with the list of Job objects in the following way:
for job in my_jobs:
    # Print output
    if job.get_status() == 'completed':
        print(job.get_stdout())
    else:
        job.stream_logs()
    # Save output files
    job.save_files('my_results')
List jobs
To show an overview of the jobs in your experiment, run:
my_experiment.show_jobs()
This prints a table of the jobs contained in your experiment.
Mount files
Using .mount_files() you can mount all jobs of the experiment and their output files to a local directory. This allows you to explore all the files in the experiment using the file browser of your system.
my_experiment.mount_files(mount_path='my_local_directory')
Still have a question?
If you have any questions that you can't find an answer to above, please reach out to the BioLib community.