PacBio tutorials

Smrtportal AMI

Smrtportal/smrtanalysis is the software developed by Pacific Biosciences

  • start a new amazon instance:
  • select m3.2xlarge to get 8 cpus (instead of 4 with m1.xlarge) but NOT with the beacon AMI; instead, follow the instructions for setting up the smrtportal AMI from this pdf
  • NOTE skip the filezilla step

Downloading data

We’ll use data from a single smrtcell based on a size selected 20kb E. coli library

Do:

cd /opt/smrtanalysis/common/inputs_dropbox
wget http://files.pacb.com/datasets/secondary-analysis/ecoli-k12-P4C2-20KSS/ecoliK12.tar.gz
tar -vxzf ecoliK12.tar.gz

To import into smrtportal:

  • in smrtportal, go to Home –> Import and Manage –> Import SMRT Cells
  • click on common/inputs_dropbox
  • click ‘Scan’ and OK

HGAP for assembly

  • in smrtportal, go to Home –> Create New
  • add your imported smrtcell by selecting it and clicking the triangle pointing to the right
  • give your job a name
  • for protocol, select RS_HGAP_Assembly.2 (not ‘.1’)
  • click ‘Save’
  • click ‘Start’

Monitor the run, it will probably go overnight

Where is the data

Check /opt/smrtanalysis/common/jobs/016/016XXX, where 016XXX is the job ID from smrtportal. Results appear in the data folder

Base modification detection

Use:

  • the same smrtcell data
  • the RS_Modification_and_Motif_Analysis.1 protocol
  • the ‘e coli K12 MG1655’ reference from the dropdown menu

Running smrtanalysis from the commandline

See this part of the Pacific Biosciences wiki: https://github.com/PacificBiosciences/SMRT-Analysis/wiki/SMRT-Pipe-Reference-Guide-v2.0#-using-the-command-line

  • You’ll need a settings xml file, which can only be obtained by setting up a smrtportal job with the correct protocol, and grabbing the settings.xml from /opt/smrtanalysis/common/jobs/016/016XXX, where 016XXX is the job ID from smrtportal
  • You’ll also need an input.fofn file (‘file-of-filenames’, which contains the full path to the bax.h5 (or bas.h5) file(s)
  • use screen!

Do:

. /opt/smrtanalysis/etc/setup.sh
fofnToSmrtpipeInput.py input.fofn > input.xml
smrtpipe.py --params=settings.xml xml:input.xml

Or customise smrtpipe to use more cpus (if available):

smrtpipe.py -D NPROC=24 --params=settings.xml xml:input.xml

Results appear in the data folder

Running blasr to map reads

  • upload the reference (e.g., the velvet assembly we used for QC/Validation)
  • we will run blasr on a subset of the reads (using only one of the three bax.h5 files)
  • samtools is included in the smrtportal distribution
  • use screen!

Do:

. /opt/smrtanalysis/etc/setup.sh

blasr /opt/smrtanalysis/common/inputs_dropbox/ecoliK12/Analysis_Results/m130404_014004_sidney_c100506902550000001823076808221337_s1_p0.2.bax.h5 \
/path/to/velvet_pe+mp.fa \
-minSubreadLength 1000 -bestn 1 -nproc 8 -sam -out pacbio_2_velvet_pe+mp_71.sam

samtools view -buS pacbio_2_velvet_pe+mp_71.sam | samtools sort - pacbio_2_velvet_pe+mp_71.sorted
samtools index pacbio_2_velvet_pe+mp_71.sorted

The bam and bai files can be added to the IGV browser

Running bridgemapper

Check out https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/Bridgemapper

First, fix a ‘bug’ which makes one of the steps single-cpu instead of parallel.

Edit /opt/smrtanalysis/analysis/lib/python2.7/pbbridgemapper/bridgemapper.py: change lines 426 + 427 from:

stdout, stderr = runBlasr(affixesFastq, args.referenceFasta,
                            affixBlasrOutputPath)

To:

stdout, stderr = runBlasr(affixesFastq, args.referenceFasta,
                            affixBlasrOutputPath, nproc=args.nproc)

You can now run Bridgemapper through the smrtportal, which gives the optimal output for viewing in the PacBio genome browser SMRTview. However, this will take a long time. To do that, add the velvet assembly as reference through ‘Home –> Import and Manage –> Manage Reference Sequences –> New’

Alternatively, run bridgemapper on a subset of the reads through the command line. Use screen!

Do:

. /opt/smrtanalysis/etc/setup.sh

pbbridgemapper --debug --nproc 8 /opt/smrtanalysis/common/inputs_dropbox/ecoliK12/Analysis_Results/m130404_014004_sidney_c100506902550000001823076808221337_s1_p0.2.bax.h5 \
/path/to/velvet_pe+mp.fa bridgemapper_out
  • –debug allows for seeing what bridgemapper is doing

To use SMRTview with the results, you’ll need to make a smrtportal reference repo:

referenceUploader -c -p smrtpipe_references -n velvet_pe+mp -f /path/to/velvet_pe+mp.fa --saw='sawriter -welter'
  • -c: create
  • -p: folder for references
  • -n: name of the reference
  • -f: fasta file

Viewing with SMRTView

  • download the split_reads.bridgemapper.gz and complete smrtpipe_references folder to your harddrive
  • install SMRTView from https://github.com/PacificBiosciences/DevNet/wiki/SMRT-View
  • choose ‘File –> Open data from server’ and select ‘Files of type –> Reference Metadata’
  • find the relevant reference.info.xml in the smrtpipe_references folder
  • choose ‘File –> add Tracks from server” and select ‘Files of type –> As above and also gzipped files’
  • add the split_reads.bridgemapper.gz file

Start browsing!


LICENSE: This documentation and all textual/graphic site content is licensed under the Creative Commons - 0 License (CC0) -- fork @ github. Presentations (PPT/PDF) and PDFs are the property of their respective owners and are under the terms indicated within the presentation.

Development and posting of this material, and the associated workshop, were supported by Grant Number R25HG006243 from the National Human Genome Research Institute and an NSF OCI supplement to NSF DBI-0939454.


Edit this document!

This file can be edited directly through the Web. Anyone can update and fix errors in this document with few clicks -- no downloads needed.

  1. Go to PacBio tutorials on GitHub.
  2. Edit files using GitHub's text editor in your web browser (see the 'Edit' tab on the top right of the file)
  3. Fill in the Commit message text box at the bottom of the page describing why you made the changes. Press the Propose file change button next to it when done.
  4. Then click Send a pull request.
  5. Your changes are now queued for review under the project's Pull requests tab on GitHub!

For an introduction to the documentation format please see the reST primer.