Clojure in Action

Book Description

Clojure in Action is a hands-on tutorial for the working programmer who has written code in a language like Java or Ruby, but has no prior experience with Lisp. It teaches Clojure from the basics to advanced topics using practical, real-world application examples. Blow through the theory and dive into practical matters like unit testing and environment set-up, all the way through building a scalable web application using domain-specific languages, Hadoop, HBase, and RabbitMQ.

Clojure is a modern Lisp for the JVM, and it has the strengths you’d expect: first-class functions, macros, support for functional programming, and a clean, Lisp-like style.

Clojure in Action is a practical guide focused on applying Clojure to practical programming challenges. You’ll start with a language tutorial written for readers who already program in another language. Then, you’ll dive into the use cases where Clojure really shines: state management, safe concurrency and multicore programming, first-class code generation, and Java interop. In each chapter, you’ll first explore the unique characteristics of a problem area and then discover how to tackle them using Clojure. Along the way, you’ll explore practical matters like architecture, unit testing, and set-up as you build a scalable web application that includes custom domain-specific languages, Hadoop, HBase, and RabbitMQ.

What’s Inside

  • A fast-paced Clojure tutorial
  • Creating web services with Clojure

Parallel R

Book Description

It’s tough to argue with R as a high-quality, cross-platform, statistical software product, unless you’re in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets. You’ll learn the basics of Snow, Multicore, Parallel, and some Hadoop-related tools, including how to find them, how to use them, when they work well, and when they don’t.

With these packages, you can overcome R’s single-threaded nature by spreading work across multiple CPUs, or offloading work to multiple machines to address R’s memory barrier.

  • Snow: works well in a traditional cluster environment
  • Multicore: popular for multiprocessor and multicore computers
  • Parallel: part of the upcoming R 2.14.0 release
  • R+Hadoop: provides low-level access to a popular form of cluster computing
  • RHIPE: uses Hadoop’s power with R’s language and interactive shell
  • Segue: lets you use Elastic MapReduce as a backend for lapply-style operations

Table of Contents
Chapter 1 Getting Started
Chapter 2 snow
Chapter 3 multicore
Chapter 4 parallel
Chapter 5 A Primer on MapReduce and Hadoop

Programming Pig

Book Description

This guide is an ideal learning tool and reference for Pig, the language that helps you describe and run large data projects on Hadoop. With Pig, you can analyze data without having to create a full-fledged application, which makes it easy to experiment with new data sets.

Programming Pig shows newcomers how to get started, and teaches intermediate users the benefits of using Pig Latin, the data flow language for building and maintaining pipelines for processing data. Advanced users learn how to build complex data processing pipelines with Pig’s macros and modularity features, and discover how to build systems for complex data processing needs by embedding Pig Latin into scripting languages.

  • Learn the advantages and disadvantages of using Pig instead of MapReduce
  • Understand how Pig fits in with other Hadoop components, such as HDFS and Hive
  • Follow examples that explain built-in Pig Latin functions, and data operators such as join and group
  • Use grunt, the shell that Pig provides for exploring and working with HDFS
  • Get performance tuning tips for running Pig Latin scripts on Hadoop clusters in less time
  • Extend Pig with powerful user-defined functions written in Java or Python

About the Author
Alan Gates is an original member of the engineering team that took Pig from a Yahoo! Labs research project to a successful Apache project.

HBase: The Definitive Guide

Book Description

If your organization is looking for a storage solution to accommodate a virtually endless amount of data, this book will show you how HBase can fulfill your needs. As an implementation of Google’s BigTable architecture, HBase scales to billions of rows and millions of columns, while ensuring that write and read performance remain constant. HBase: The Definitive Guide provides the details you require, whether you simply want to evaluate this high-performance, non-relational database, or put it into practice right away.

HBase’s adoption rate is beginning to climb, and several IT executives are asking pointed questions about this high-capacity database. This is the only book available to give you meaningful answers.

  • Learn how to distribute large datasets across an inexpensive cluster of commodity servers
  • Develop HBase clients in many languages, including Java, Python, and Ruby
  • Get details on HBase’s primary storage system, HDFS, Hadoop’s distributed and replicated filesystem
  • Learn how HBase’s native interface to Hadoop’s MapReduce framework enables easy development and execution of batch jobs that can scan entire tables
  • Discover the integration between HBase and other facets of the Hadoop project

About the Author
Lars George has been involved with HBase since 2007, and became a full HBase committer in 2009. He has spoken at various Hadoop User Group meetings, as well as large conferences such as FOSDEM in Brussels. He also started the Munich OpenHUG meetings.

Hadoop in Action

Book Description

Hadoop in Action teaches readers how to use Hadoop and write MapReduce programs. The intended readers are programmers, architects, and project managers who have to process large amounts of data offline. Hadoop in Action will lead the reader from obtaining a copy of Hadoop to setting it up in a cluster and writing data analytic programs.

The book begins by making the basic idea of Hadoop and MapReduce easier to grasp by applying the default Hadoop installation to a few easy-to-follow tasks, such as analyzing changes in word frequency across a body of documents. The book continues through the basic concepts of MapReduce applications developed using Hadoop, including a close look at framework components, use of Hadoop for a variety of data analysis tasks, and numerous examples of Hadoop in action.
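To make the word-frequency task concrete, here is a minimal in-process sketch of the map, shuffle, and reduce steps in Python. It does not use Hadoop or any Hadoop API, and it is not taken from the book; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_frequencies`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(text):
    """Emit a (word, 1) pair for every whitespace-separated word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Collapse all counts emitted for one word into a single total."""
    return word, sum(counts)


def word_frequencies(documents):
    """Run map, shuffle, and reduce over a dict of {doc_id: text} (assumed input shape)."""
    pairs = [pair for text in documents.values() for pair in map_phase(text)]
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    docs = {"doc1": "the cat sat", "doc2": "The cat ran"}
    print(word_frequencies(docs))
```

In a real cluster the map and reduce functions would run on different machines and the framework would handle the grouping; the sketch only shows how the three steps fit together.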

Hadoop in Action will explain how to use Hadoop and present design patterns and practices of MapReduce. MapReduce is a complex idea both conceptually and in its implementation, and Hadoop users are challenged to learn all the knobs and levers for running Hadoop. This book takes you beyond the mechanics of running Hadoop, teaching you to write meaningful programs in a MapReduce framework.

This book assumes the reader will have a basic familiarity with Java, as most code examples will be written in Java. Familiarity with basic statistical concepts (e.g. histogram, correlation) will help the reader appreciate the more advanced data processing examples.

Hadoop: The Definitive Guide, 2nd Edition

Book Description

Discover how Hadoop can unleash the power of your data. This comprehensive resource shows you how to build and maintain reliable, scalable, distributed systems with the Hadoop framework, an implementation of MapReduce, the algorithm on which Google built its empire. Programmers will find details for analyzing datasets of any size, and administrators will learn how to set up and run Hadoop clusters.

This revised edition covers recent changes to Hadoop, including new features such as Hive, Sqoop, and Avro. It also provides illuminating case studies that illustrate how Hadoop is used to solve specific problems. Looking to get the most out of your data? This is your book.

  • Use the Hadoop Distributed File System (HDFS) for storing large datasets, then run distributed computations over those datasets with MapReduce
  • Become familiar with Hadoop’s data and I/O building blocks for compression, data integrity, serialization, and persistence
  • Discover common pitfalls and advanced features for writing real-world MapReduce programs
  • Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
  • Use Pig, a high-level query language for large-scale data processing
  • Analyze datasets with Hive, Hadoop’s data warehousing system
  • Take advantage of HBase, Hadoop’s database for structured and semi-structured data
  • Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems