Metagenomics Practical

Metagenomics assemblies with Ray

Ray is a particularly interesting genome assembler due to several unusual features:

  • It can scale to arbitrary numbers of processors and machines by distributing its assembly graph
  • It includes features specific to metagenome assembly, known as ‘Ray Meta’
  • Ray’s author, @sebhtml, is incredibly responsive on Twitter :)
  • Ray will happily mix input from several different sequencing techniques, e.g. Illumina and 454
  • If run with the -write-kmers option enabled, the resulting assembly graph can be viewed using the separate Ray Cloud Browser software

Installing Ray

Dependencies

sudo apt-get install build-essential
sudo apt-get install git
sudo apt-get install openmpi1.6-bin openmpi1.6-common libopenmpi1.6-dev

Installing Ray from source code

git clone https://github.com/sebhtml/ray
git clone https://github.com/sebhtml/RayPlatform

cd ray
HAVE_LIBZ=y MAXKMERLENGTH=64 make

You can add the ray directory, which now contains the Ray executable, to your PATH (run this from inside that directory):

export PATH=$PATH:`pwd`
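Before launching a long assembly it can help to confirm that the executables are actually reachable. The following is a minimal sketch; check_tool is a hypothetical helper name, not part of Ray or Open MPI.

```shell
# Hypothetical helper: report whether a command is reachable via PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found"
    else
        echo "missing"
    fi
}

# Both tools are needed for the mpirun commands used in this practical.
echo "Ray: $(check_tool Ray)"
echo "mpirun: $(check_tool mpirun)"
```

If Ray is reported as missing, the export line above was probably run from a directory other than ray, or in a different shell session.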

Simple command lines for multi-processor execution (-np sets the number of MPI processes, -k the k-mer length):

For paired-end reads:

mpirun -np 8 Ray -k 31 -p pair1.fastq.gz pair2.fastq.gz -o output_directory

For interleaved paired-end reads:

mpirun -np 8 Ray -k 31 -i pairs.fastq.gz -o output_directory

For single-end reads:

mpirun -np 8 Ray -k 31 -s reads.fastq.gz -o output_directory
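Assemblies are often repeated at several k-mer lengths and then compared. The sketch below only prints one paired-end command per k value, so it can be inspected before anything is run. The helper name ray_sweep_commands and the ray_k<k> output directory naming are assumptions for illustration; only the -k, -p and -o options shown above are used.

```shell
# Hypothetical helper: print (not run) one Ray command per k-mer length.
# Usage: ray_sweep_commands READS_1 READS_2 K [K ...]
ray_sweep_commands() {
    reads1=$1
    reads2=$2
    shift 2
    for k in "$@"; do
        # Each k value gets its own output directory so runs do not collide.
        echo "mpirun -np 8 Ray -k $k -p $reads1 $reads2 -o ray_k$k"
    done
}

ray_sweep_commands pair1.fastq.gz pair2.fastq.gz 21 31 41
```

The k values here are arbitrary. MANUAL_PAGE.txt documents the constraints on -k, and the value must also fit within the MAXKMERLENGTH chosen at build time.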

If you want to run Ray Cloud Browser, you will want the -write-kmers option:

mpirun -np 8 Ray -write-kmers -k 31 -p pair1.fastq.gz pair2.fastq.gz -o output_directory

If you run on a cluster, e.g. one started with StarCluster, mpirun can be set to execute on multiple machines:

mpirun -np 8 -H host1,host2,host3,host4 Ray -k 31 -p pair1.fastq.gz pair2.fastq.gz -o x
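With more than a handful of machines, typing the host list by hand becomes error-prone. A minimal sketch, assuming you keep one hostname per line in a plain text file (the file and the helper name hosts_csv are hypothetical), joins them into the comma-separated form that -H expects:

```shell
# Hypothetical helper: turn a file with one hostname per line into
# a comma-separated list suitable for mpirun -H.
hosts_csv() {
    # Drop blank lines, then join the remaining lines with commas.
    grep -v '^[[:space:]]*$' "$1" | paste -sd, -
}

# Demonstration with a throwaway file; the command is printed, not run.
example=$(mktemp)
printf 'host1\nhost2\n\nhost3\n' > "$example"
echo "mpirun -np 8 -H $(hosts_csv "$example") Ray -k 31 -p pair1.fastq.gz pair2.fastq.gz -o x"
rm -f "$example"
```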

For more command-line options, see:

https://github.com/sebhtml/ray/blob/master/MANUAL_PAGE.txt

Ray Cloud Browser

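Here is a useful script to set up Ray Cloud Browser from a kmers.txt and a Contigs.fasta file. It takes five positional arguments: a tag used to name the output files, the k-mer file, the contig file, a map id and a section id.

#!/bin/bash
# Positional arguments
tag=$1          # label used to name the generated .dat files
kmerfile=$2     # kmers.txt written by Ray with -write-kmers
contigfile=$3   # Contigs.fasta from the Ray output directory
mapid=$4        # id of the map that the section is added to
sectionid=$5    # id of the section used for the annotations

# Create the map from the k-mer file and add it to config.json
RayCloudBrowser-client create-map "$kmerfile" "$tag.dat"
RayCloudBrowser-client add-map config.json "$tag" "$tag.dat"

# Create a section from the contigs, annotate the map with it, and add it to config.json
RayCloudBrowser-client create-section "$contigfile" "$tag-contigs.dat"
RayCloudBrowser-client create-map-annotations-with-section "$tag.dat" "$tag-contigs.dat" "$sectionid"
RayCloudBrowser-client add-section config.json "$mapid" "$tag Contigs" "$tag-contigs.dat"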
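Because the script relies on five positional arguments, a missing one silently produces oddly named files. The guard below is a sketch of a check that could be placed near the top of the script; check_setup_args is a hypothetical name, and it only verifies the argument count and that the two input files are readable.

```shell
# Hypothetical guard for the setup script above: expects the same five
# positional arguments (tag, kmerfile, contigfile, mapid, sectionid).
check_setup_args() {
    if [ "$#" -ne 5 ]; then
        echo "usage: tag kmerfile contigfile mapid sectionid" >&2
        return 1
    fi
    # $2 is the k-mer file and $3 the contig file; both must be readable.
    for f in "$2" "$3"; do
        if [ ! -r "$f" ]; then
            echo "cannot read input file: $f" >&2
            return 1
        fi
    done
    return 0
}
```

Inside the script it could be called as check_setup_args "$@" || exit 1 before the first RayCloudBrowser-client command.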

LICENSE: This documentation and all textual/graphic site content is licensed under the Creative Commons - 0 License (CC0) -- fork @ github. Presentations (PPT/PDF) and PDFs are the property of their respective owners and are under the terms indicated within the presentation.

Development and posting of this material, and the associated workshop, were supported by Grant Number R25HG006243 from the National Human Genome Research Institute and an NSF OCI supplement to NSF DBI-0939454.
