Analytics With Apache Spark Is Coming

By Neil Chaudhuri | April 22, 2015

At Vidya we currently offer two courses, Software Engineering in Java and Agile Software Project Management with Scrum. In response to popular demand…OK, like eight or nine people…we are working on a third course, tentatively called Analytics with Apache Spark, to be ready by Summer 2015.

As “Big Data” becomes more and more of a thing, there just aren’t enough software engineers who know the tools and techniques for doing meaningful, performant, cloud-scale analytics. Meanwhile, Apache Spark is surging in popularity for two reasons. First, Spark provides a much easier programming model than old-school MapReduce in Hadoop, which is great for developers. Second, Spark works a lot faster because it optimizes operations to run in cluster memory rather than relying on lethargic disk I/O as MapReduce does, which is great for everybody.

Like Software Engineering in Java, Analytics with Apache Spark will be heavy on code, with both examples and hands-on exercises. We will use Spark’s Scala API since the Java and Python APIs aren’t as complete or as performant. If you don’t know Scala, don’t worry. We will spend time learning just enough Scala to be dangerous with Spark.

Analytics with Apache Spark will feature two three-hour sessions.

The first session focuses on understanding the advantages of Spark for analytics, learning a little Scala, and mastering the Spark API, particularly the SparkContext and the RDD, Spark’s fundamental abstraction. We start simple with hardcoded datasets and the Spark shell and progress to real-world datasets and full-fledged Spark programs. The second session focuses on using Spark in the real world: running and monitoring on a cluster, integrating with Hadoop (for example, how to use Spark with HBase), performance considerations with RDDs, and tuning, testing, and debugging.
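To give a rough flavor of where the first session starts, here is a minimal sketch of a word count over a hardcoded dataset using a SparkContext and an RDD. It is not taken from the course materials. The object name `WordCountSketch`, the helper `wordCounts`, and the sample sentences are made up for illustration, and the sketch assumes Spark 1.3 or later is on the classpath and that running in local mode is acceptable.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; the name is not from the course.
object WordCountSketch {

  // Counts how often each word appears in a small in-memory dataset.
  def wordCounts(sc: SparkContext, lines: Seq[String]): Map[String, Int] = {
    val rdd = sc.parallelize(lines)            // turn the hardcoded data into an RDD[String]
    rdd.flatMap(_.split("\\s+"))               // split each line into words
      .filter(_.nonEmpty)                      // drop empty tokens
      .map(word => (word.toLowerCase, 1))      // pair each word with a count of 1
      .reduceByKey(_ + _)                      // sum the counts per word
      .collect()                               // bring the results back to the driver
      .toMap
  }

  def main(args: Array[String]): Unit = {
    // "local[*]" runs Spark inside this JVM; a real cluster would use a different master.
    val conf = new SparkConf().setAppName("WordCountSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val lines = Seq("spark is fast", "spark is fun") // made-up sample data
      wordCounts(sc, lines).toSeq.sortBy(_._1).foreach {
        case (word, count) => println(s"$word: $count")
      }
    } finally {
      sc.stop() // always release the context
    }
  }
}
```

In the Spark shell a SparkContext is already available as `sc`, so only the body of `wordCounts` would need to be typed there.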

The entire course will be taught in a classroom, but eventually there will be an online version as well.

Some of the details may change, but this is what we currently envision for Analytics with Apache Spark. It is hard to teach such a complex topic in such a short time, but we hope to create a course that gives you the confidence to market yourself as a Spark expert and to produce great analytics to prove it–and maybe to make some cash money doing what you love.

Let us know what you want to see in Analytics with Apache Spark. We want to build something you will learn from and enjoy.