PPSEQ :: Parallel Processing for NGS Analysis


PPSEQ: Parallel Processing for Next-Generation Sequencing (NGS) Analysis

PPSEQ is a software suite that combines a scalable, hierarchical, multitasking parallel infrastructure with classical sequencing algorithms. The suite currently includes:

+ ppbwt is a parallel indexer that builds an index from a set of DNA sequences. It reads FASTA files with the .fa extension and generates six index files. The index is based on the FM-index, which in turn is based on the Burrows-Wheeler transform (BWT).

+ ppalgn is a parallel aligner that reads both the genome index files prebuilt by ppbwt and short-read files in FASTQ format. The final output can be SAM/BAM files or other user-defined formats.

+ more to come...
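To make the relationship between the BWT and the FM-index concrete, here is a minimal single-threaded sketch in Python. It is not ppbwt's implementation: the function names are hypothetical, the suffix array is built by naive sorting, and the occurrence table is stored uncompressed, all of which would be impractical for a real genome. It only illustrates how backward search over the BWT finds every occurrence of a pattern.

```python
def bwt_and_suffix_array(text):
    """Return (BWT string, suffix array) for text + '$' using naive sorting."""
    t = text + "$"  # '$' is the sentinel; it sorts before A, C, G, T
    sa = sorted(range(len(t)), key=lambda i: t[i:])
    bwt = "".join(t[i - 1] for i in sa)  # character preceding each suffix
    return bwt, sa


def build_fm_tables(bwt):
    """Build the two FM-index tables from a BWT string.

    C[c]      = number of characters in the text strictly smaller than c
    occ[c][i] = number of occurrences of c in bwt[:i]
    """
    counts = {}
    for ch in bwt:
        counts[ch] = counts.get(ch, 0) + 1

    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]

    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return C, occ


def backward_search(pattern, C, occ, n):
    """Return the half-open suffix-array interval [lo, hi) matching pattern."""
    lo, hi = 0, n
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in C:
            return 0, 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def find_all(pattern, text):
    """Return the sorted 0-based start positions of pattern in text."""
    bwt, sa = bwt_and_suffix_array(text)
    C, occ = build_fm_tables(bwt)
    lo, hi = backward_search(pattern, C, occ, len(bwt))
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    print(find_all("TTA", "GATTACATTA"))  # [2, 7]
```

Each step of the backward search costs two table lookups, so the search time depends on the pattern length rather than on the size of the indexed text. This is the property that makes FM-index-based aligners attractive for short reads.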

The PPSEQ project is hosted on SourceForge.net, and releases are available from the download page. Tutorials are linked in the right navigation bar.

News

The explosive increase in NGS (Next-Generation Sequencing) capacity, together with the huge supply of processing power from MPP (Massively Parallel Processing) systems, presents a substantial opportunity to develop highly efficient analysis strategies for NGS data. Despite significant research advances in both fields over the past decades, research on applying supercomputing to NGS data analysis is still at an early stage. Very few algorithms or software packages have been designed for parallel processing of NGS datasets, and even fewer can run on powerful MPP systems. Developing massively parallel algorithms and software for faster and more accurate NGS data analysis, as we propose, will therefore bridge the algorithmic and computational gaps between NGS data analysis and parallel computing. A highly multidisciplinary team with experience and expertise in bioinformatics, parallel computing, statistics, and biological sciences has been assembled for this project.

At the time of writing, the fastest computers (the Cray XK7 at ORNL, the BlueGene/Q at LLNL, and the K Computer at RIKEN in Japan) contain 600,000 to 1,600,000 cores and reach speeds of 10.5 to 17.6 petaflops. We now deliver an integrated, high-performance software suite for fast NGS data analysis on these, as well as smaller, supercomputer architectures. Our software system has also been tested on a variety of other platforms, including personal computers, workstations, Linux clusters, and Intel clusters. The bioinformatics and parallel computing technologies developed in our software packages could lead to broad synergistic efforts in other research areas, such as public health, clinical research, and environmental microbiology, because of their increasing dependence on NGS data. Currently, our software system includes parallel algorithms for a) genome indexing, b) sequence alignment, c) k-mer index building, d) de novo assembly, and e) detection of differential sequences between samples.

Releases

>> PPSeq-1.3 (beta) release - 10/1/2013

The first public release of PPSeq is now available for download. The release includes the two core tools: the indexer, ppbwt, and the aligner, ppalgn. A tutorial can be found here. Features of this release include:

  • Supports FASTA/FASTQ input and SAM output.
  • Supports the Bowtie2 alignment policy (the multiseed and X-mismatch policy).
  • Tested on Linux x86_64 cluster systems.
  • Requires POSIX threads (Pthreads).
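For readers unfamiliar with the input and output formats listed above, the following Python sketch reads FASTQ records and formats each one as a SAM alignment line. It is an illustration only, not part of ppalgn: the function names `read_fastq` and `sam_record` are hypothetical, the reader assumes strict four-line FASTQ records, and the mapped case covers only an ungapped forward-strand alignment with the mapping quality set to 255 (the SAM value for "not available").

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record near: " + header)
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch in: " + header)
        fields = header[1:].split()
        name = fields[0] if fields else ""  # drop the optional description
        yield name, seq, qual


def sam_record(name, seq, qual, ref_name=None, pos=None, mapq=255):
    """Format one read as a SAM line with the 11 mandatory fields.

    With no ref_name the read is written as unmapped (FLAG 4).
    pos is the 1-based leftmost mapping position, as SAM requires.
    """
    if ref_name is None:
        flag, rname, pos_out, mapq_out, cigar = 4, "*", 0, 0, "*"
    else:
        flag, rname, pos_out, mapq_out = 0, ref_name, pos, mapq
        cigar = str(len(seq)) + "M"  # ungapped match over the whole read
    fields = [name, flag, rname, pos_out, mapq_out, cigar, "*", 0, 0, seq, qual]
    return "\t".join(str(f) for f in fields)


if __name__ == "__main__":
    fastq = io.StringIO("@read1 sample\nACGT\n+\nIIII\n")
    for name, seq, qual in read_fastq(fastq):
        print(sam_record(name, seq, qual, ref_name="chr1", pos=100))
```

A complete SAM file also needs header lines (such as `@HD` and `@SQ`) before the alignment records; they are omitted here for brevity.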

Publications

  1. Zhang, P., Zhao, L., Xu, J., Deng, Y., Zhu, W. and Wu, S., "PPSeq: A Scalable Parallel Processing Algorithm for Sequence Alignment". Journal of Bioinformatics (submitted).
  2. Xu, X., Y. Zhang, J. Williams, E. Antoniou, W.R. McCombie, S. Wu, W. Zhu, N.O. Davidson, P. Denoya, E. Li. "Parallel comparison of Illumina RNA-Seq and Affymetrix microarray platforms on transcriptomic profiles generated from 5-aza-deoxy-cytidine treated HT-29 colon cancer cells and simulated datasets". BMC Bioinformatics, 2013, 14(Suppl 9): S1. doi:10.1186/1471-2105-14-S9-S1
  3. Wu, S., J.M. Wang, W. Zhao, S. Pounds and C. Cheng. "ChIP-PaM: An Algorithm to Identify Protein-DNA Interaction Using ChIP-Seq Data". Theoretical Biology & Medical Modelling 2010, 7: 18. doi:10.1186/1742-4682-7-18

©2012-2013 Stony Brook University