We have released the first two screencasts in a series of short hands-on video training courses we will be publishing to help new users get up and running with Spark in minutes.
The first Spark screencast is called First Steps With Spark and walks you through downloading and building Spark, as well as using the Spark shell, all in less than 10 minutes!
The second screencast is a two-minute overview of the Spark documentation.
We hope you find these screencasts useful.
At this year’s Strata conference, the AMP Lab hosted a full day of tutorials on Spark, Shark, and Spark Streaming, including online exercises on Amazon EC2. Those exercises are now available online, letting you learn Spark and Shark at your own pace on an EC2 cluster with real data. They are a great resource for learning the systems. You can also find slides from the Strata tutorials online, as well as videos from the AMP Camp workshop we held at Berkeley in August.
We’re proud to announce the release of Spark 0.7.0, a new major version of Spark that adds several key features, including a Python API for Spark and an alpha of Spark Streaming. This release is the result of the largest group of contributors yet behind a Spark release — 31 contributors from inside and outside Berkeley. Head over to the release notes to read more about the new features, or download the release today.
This weekend, Amazon posted an article and code that make it easy to launch Spark and Shark on Elastic MapReduce. The article includes examples of how to run both interactive Scala commands and SQL queries from Shark on data in S3. Head over to the Amazon article for details. We’re very excited because, to our knowledge, this makes Spark the first non-Hadoop engine that you can launch with EMR.
We recently released Spark 0.6.2, a maintenance release that includes several bug fixes and usability improvements (see the release notes). We recommend that all users upgrade to this release.
Quantfind, one of the Bay Area companies that has been using Spark for predictive analytics, recently posted two useful entries on working with Spark in their tech blog:
Thanks for sharing these, and we look forward to seeing others!
Recently, we’ve seen quite a bit of coverage of both Spark and Shark in the news. I wanted to list some of the more recent articles, for readers interested in learning more.
In other news, there will be a full day of tutorials on Spark and Shark at the O’Reilly Strata conference in February. They include a three-hour introduction to Spark, Shark, and BDAS on Tuesday morning, and a three-hour hands-on exercise session.
On December 18th, we held the first in a series of Spark development meetups for people interested in learning the Spark codebase and contributing to the project. There was quite a bit more demand than we anticipated, with over 80 people signing up and 64 attending. The first meetup was an introduction to Spark internals. Thanks to one of the attendees, there’s now a video of the meetup on YouTube. We’ve also posted the slides. Look for more development meetups on Spark and Shark in the future.
Today we’ve made available two maintenance releases for Spark: 0.6.1 and 0.5.2. They both contain important bug fixes as well as some new features, such as the ability to build against Hadoop 2 distributions. We recommend that users update to the latest version for their branch; for new users, we recommend 0.6.1.
Spark version 0.6.0 was released today. It is a major release that brings a wide range of performance improvements and new features, including a simpler standalone deploy mode and a Java API. Read more about it in the release notes.