================ PacBio tutorials ================ Smrtportal AMI ------------------- Smrtportal/smrtanalysis is the software developed by Pacific Biosciences * start a new amazon instance: * select m3.2xlarge to get 8 cpus (instead of 4 with m1.xlarge) but NOT with the beacon AMI; instead, follow the instructions for setting up the smrtportal AMI from `this pdf `__ * **NOTE** skip the filezilla step Downloading data ------------------- We'll use data from a single smrtcell based on a size selected 20kb *E. coli* library Do:: cd /opt/smrtanalysis/common/inputs_dropbox wget http://files.pacb.com/datasets/secondary-analysis/ecoli-k12-P4C2-20KSS/ecoliK12.tar.gz tar -vxzf ecoliK12.tar.gz To import into smrtportal: * in smrtportal, go to Home --> Import and Manage --> Import SMRT Cells * click on common/inputs_dropbox * click 'Scan' and OK HGAP for assembly ------------------- * in smrtportal, go to Home --> Create New * add your imported smrtcell by selecting it and clicking the triangle pointing to the right * give your job a name * for protocol, select RS_HGAP_Assembly.2 (*not* '.1') * click 'Save' * click 'Start' Monitor the run, it will probably go overnight **Where is the data** Check /opt/smrtanalysis/common/jobs/016/016XXX, where 016XXX is the job ID from smrtportal. Results appear in the *data* folder Base modification detection --------------------------- Use: * the same smrtcell data * the RS_Modification_and_Motif_Analysis.1 protocol * the 'e coli K12 MG1655' reference from the dropdown menu Running smrtanalysis from the commandline ----------------------------------------- See this part of the Pacific Biosciences wiki: https://github.com/PacificBiosciences/SMRT-Analysis/wiki/SMRT-Pipe-Reference-Guide-v2.0#-using-the-command-line * You'll need a settings xml file, which can only be obtained by setting up a smrtportal job with the correct protocol, and grabbing the settings.xml from /opt/smrtanalysis/common/jobs/016/016XXX, where 016XXX is the job ID from smrtportal * You'll also need an input.fofn file ('file-of-filenames', which contains the full path to the bax.h5 (or bas.h5) file(s) * use screen! Do:: . /opt/smrtanalysis/etc/setup.sh fofnToSmrtpipeInput.py input.fofn > input.xml smrtpipe.py --params=settings.xml xml:input.xml Or customise smrtpipe to use more cpus (if available):: smrtpipe.py -D NPROC=24 --params=settings.xml xml:input.xml Results appear in the *data* folder Running blasr to map reads -------------------------- * upload the reference (e.g., the velvet assembly we used for QC/Validation) * we will run blasr on a subset of the reads (using only one of the three bax.h5 files) * samtools is included in the smrtportal distribution * use screen! Do:: . /opt/smrtanalysis/etc/setup.sh blasr /opt/smrtanalysis/common/inputs_dropbox/ecoliK12/Analysis_Results/m130404_014004_sidney_c100506902550000001823076808221337_s1_p0.2.bax.h5 \ /path/to/velvet_pe+mp.fa \ -minSubreadLength 1000 -bestn 1 -nproc 8 -sam -out pacbio_2_velvet_pe+mp_71.sam samtools view -buS pacbio_2_velvet_pe+mp_71.sam | samtools sort - pacbio_2_velvet_pe+mp_71.sorted samtools index pacbio_2_velvet_pe+mp_71.sorted The bam and bai files can be added to the IGV browser Running bridgemapper -------------------- Check out https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/Bridgemapper First, fix a 'bug' which makes one of the steps single-cpu instead of parallel. Edit /opt/smrtanalysis/analysis/lib/python2.7/pbbridgemapper/bridgemapper.py: change lines **426 + 427** from:: stdout, stderr = runBlasr(affixesFastq, args.referenceFasta, affixBlasrOutputPath) To:: stdout, stderr = runBlasr(affixesFastq, args.referenceFasta, affixBlasrOutputPath, nproc=args.nproc) You can now run Bridgemapper through the smrtportal, which gives the optimal output for viewing in the PacBio genome browser SMRTview. However, this will take a long time. To do that, add the velvet assembly as reference through 'Home --> Import and Manage --> Manage Reference Sequences --> New' **Alternatively,** run bridgemapper on a subset of the reads through the command line. Use screen! Do:: . /opt/smrtanalysis/etc/setup.sh pbbridgemapper --debug --nproc 8 /opt/smrtanalysis/common/inputs_dropbox/ecoliK12/Analysis_Results/m130404_014004_sidney_c100506902550000001823076808221337_s1_p0.2.bax.h5 \ /path/to/velvet_pe+mp.fa bridgemapper_out * --debug allows for seeing what bridgemapper is doing To use SMRTview with the results, you'll need to make a smrtportal reference repo:: referenceUploader -c -p smrtpipe_references -n velvet_pe+mp -f /path/to/velvet_pe+mp.fa --saw='sawriter -welter' * -c: create * -p: folder for references * -n: name of the reference * -f: fasta file **Viewing with SMRTView** * download the split_reads.bridgemapper.gz and complete smrtpipe_references folder to your harddrive * install SMRTView from https://github.com/PacificBiosciences/DevNet/wiki/SMRT-View * choose 'File --> Open data from server' and select 'Files of type --> Reference Metadata' * find the relevant reference.info.xml in the smrtpipe_references folder * choose 'File --> add Tracks from server" and select 'Files of type --> As above and also gzipped files' * add the split_reads.bridgemapper.gz file Start browsing!