Running Apps from Python
To run applications from Python, install the latest pybiolib Python package:
pip3 install -U pybiolib
Log in to BioLib
To log in with your BioLib account in a Python notebook, run the code below and follow the instructions shown:
import biolib
biolib.login()
Alternatively, you can use an API token and set it as the BIOLIB_TOKEN environment variable. API tokens can be created in your account settings on BioLib.
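For example, the token can be set in your shell before starting Python (the token value below is a placeholder):
export BIOLIB_TOKEN=<your-api-token>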
Running applications
To import an application into your Python script, add the following:
import biolib
application = biolib.load('author/application')
To run an application, pass a number of arguments to the .cli() function.
Running an application returns a Job
object, which allows you to monitor progress and save results.
For instance, if we want to run samtools with the --help flag, we can do:
import biolib
samtools = biolib.load('samtools/samtools')
job = samtools.cli(args='--help')
print(job.get_stdout().decode())
This will yield the following:
>>> job = samtools.cli(args='--help')
2021-08-18 11:41:39,085 | INFO : Loading package...
2021-08-18 11:41:39,384 | INFO : Loaded package: samtools/samtools
2021-08-18 11:41:40,252 | INFO : Computing...
>>> print(job.get_stdout().decode())
stdout:
Program: samtools (Tools for alignments in the SAM format)
Version: 1.10-31-g6e6d5f9 (using htslib 1.10.2-31-g4f60833)
Usage:   samtools <command> [options]
Commands:
  -- Indexing
     dict        create a sequence dictionary file
     faidx       index/extract FASTA
     fqidx       index/extract FASTQ
     index       index alignment
Save result files
To save the output files to a directory (in this case result_files/) run:
job.save_files('result_files/')
Working with Jobs
A Job is an object referring to the execution of an application. It contains progress information while the application is running, as well as the results once the job completes.
Listing Jobs
When signed in, you can print a table of your jobs by running:
biolib.show_jobs(count=25)
where count refers to the number of jobs you want to show.
Retrieving a Job
To retrieve a Job in Python, call biolib.get_job() with the Job's ID. You can find the ID on the job overview page on BioLib.
job = biolib.get_job(job_id)
Job Status
To retrieve the status of a job in Python, call .get_status() on the job:
status = biolib.get_job(job_id).get_status()
print(status)
You can use this to determine if a job has completed or is still in progress.
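As a minimal sketch, you could poll the status until the job finishes. This assumes 'completed' is the final success status (the string used elsewhere in this guide) and that .get_status() reflects the live status on each call:
import time
import biolib
job = biolib.get_job(job_id)
# Poll .get_status() until the job reports 'completed'; a robust version
# should also stop on failure statuses
while job.get_status() != 'completed':
    time.sleep(5)  # polling interval is an arbitrary choice
print('Job finished')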
Streaming Output
If your Job is still running, you can attach to its stdout and stderr by running:
job.stream_logs()
This will print the current output and keep streaming stdout and stderr until the job has finished.
Results
Saving Results
Assuming a Job has completed, its outputs can be accessed by the following methods:
job.get_stdout() # Returns stdout as bytes
job.get_stderr() # Returns stderr as bytes
job.get_exit_code() # Returns exit code of the application as an integer
job.save_files(output_dir) # Saves result files to 'output_dir'
.save_files() also supports a glob filter via the path_filter argument. For example, to save all .pdb files from a result, run:
job.save_files(output_dir, path_filter='*.pdb')
Using Results Without Saving to Disk
Some applications may output large files. To save disk space on your computer, you can interact with result files without saving them to disk.
To list the output files from a job:
job.list_output_files()
To load a single file into memory, without saving it to disk, run:
my_csv_file = job.get_output_file('/my_file.csv')
To pass an output file to a library like Pandas or BioPython, call .get_file_handle() on the file object:
import pandas as pd
my_dataframe = pd.read_csv(my_csv_file.get_file_handle())
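A similar approach works for BioPython. The sketch below assumes the job produced a FASTA file at a hypothetical path (check job.list_output_files() for real paths) and that .get_file_handle() returns a binary file object:
import io
from Bio import SeqIO
# '/my_sequences.fasta' is a hypothetical output path
my_fasta_file = job.get_output_file('/my_sequences.fasta')
# SeqIO.parse expects a text handle, so wrap the (assumed binary) handle
text_handle = io.TextIOWrapper(my_fasta_file.get_file_handle())
records = list(SeqIO.parse(text_handle, 'fasta'))
print(f'Parsed {len(records)} sequences')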
Starting Multiple Jobs in Parallel
Use the blocking=False argument to .cli() to get the Job object back immediately, without waiting for the application to finish.
This feature allows for parallelized workflows like the one below:
samtools = biolib.load('samtools/samtools')
my_fasta_files = ['seq1.fasta', 'seq2.fasta']
my_jobs = []
for file in my_fasta_files:
    job = samtools.cli(file, blocking=False)
    my_jobs.append(job)
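To collect the results once all jobs are done, you can reuse the status and output methods shown above; a minimal sketch, assuming 'completed' as the final success status:
import time
# Wait for each job in turn, then print its output; a robust version
# should also handle failure statuses
for job in my_jobs:
    while job.get_status() != 'completed':
        time.sleep(5)  # polling interval is an arbitrary choice
    print(job.get_stdout().decode())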
Grouping Jobs in an Experiment
An Experiment is a collection of jobs that you can retrieve together. To group jobs in an Experiment, use the following syntax:
with biolib.Experiment('my-experiment-name'):
    my_application.cli(input_1)  # these two jobs will be
    my_application.cli(input_2)  # grouped in the same Experiment
All jobs started under the with statement will be grouped under the Experiment's ID (in this case my-experiment-name).
Retrieving Experiments
To load an Experiment in Python, run the following:
my_experiment = biolib.get_experiment('my-experiment-name')
print(my_experiment)
Waiting Until All Jobs Are Completed
To block and wait until all jobs of an experiment have finished, use the .wait() function:
my_experiment.wait()
Retrieving Job Objects from an Experiment
To get a list of the Job objects contained in an Experiment, run:
my_jobs = my_experiment.get_jobs()
You can interact with the list of Job objects in the following way:
for job in my_jobs:
    # Print output
    if job.get_status() == 'completed':
        print(job.get_stdout().decode())
    else:
        job.stream_logs()
    # Save output files
    job.save_files('my_results')
Listing Jobs in an Experiment
To show an overview of the jobs in your experiment, run:
my_experiment.show_jobs()
This prints a table of the jobs contained in your experiment.
Listing Experiments
When signed in, you can print a table of your experiments by running:
biolib.show_experiments(count=10)
where count refers to the number of experiments you want to show.
Still have a question?
If you have any questions that you can't find an answer to above, please reach out to the BioLib community.