Parallel R

It’s tough to argue with R as a high-quality, cross-platform, open source statistical software product—unless you’re in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets. You’ll learn the basics of Snow, Multicore, Parallel, and some Hadoop-related tools, including how to find them, how to use them, when they work well, and when they don’t.

With these packages, you can overcome R’s single-threaded nature by spreading work across multiple CPUs, or offloading work to multiple machines to address R’s memory barrier.

  • Snow: works well in a traditional cluster environment
  • Multicore: popular for multiprocessor and multicore computers
  • Parallel: included with R as of the 2.14.0 release (upcoming when the book was written)
  • R+Hadoop: provides low-level access to a popular form of cluster computing
  • RHIPE: uses Hadoop’s power with R’s language and interactive shell
  • Segue: lets you use Elastic MapReduce as a backend for lapply-style operations

Table of Contents
Chapter 1 Getting Started
Chapter 2 snow
Chapter 3 multicore
Chapter 4 parallel
Chapter 5 A Primer on MapReduce and Hadoop
Chapter 6 R+Hadoop
Chapter 7 RHIPE
Chapter 8 Segue
Chapter 9 New and Upcoming

Book Details

  • Paperback: 122 pages
  • Publisher: O’Reilly Media (October 2011)
  • Language: English
  • ISBN-10: 1449309925
  • ISBN-13: 978-1449309923