Review: Strata Conference New York + Hadoop World 2012: Complete Video Compilation

Disclosure: I received a review copy of this title from O’Reilly.

This set is without doubt my favorite set of O’Reilly videos to date. If you’ve read my earlier reviews you’ll know that I’m a big fan of O’Reilly’s videos and conferences. What differentiates this set is that it combines the usual diversity of topics from a good tech conference with numerous deep dives - groups of 4 or 5 videos that dig deeper into a specific subject. As with previous O’Reilly content that I have reviewed, the production quality of this set is amazing. Video is high definition, sound is crystal clear and, while not cheap, at a whopping 107 hours the set is very reasonably priced at $400.

So far some of my favorite talks have been:

An Introduction to Hadoop – Mark Fei

This talk demonstrates everything that is great about this conference and video set. In this 4-part introduction to Hadoop, Mark Fei does a great job of introducing MapReduce, core Hadoop and HDFS, and of providing a high-level overview of the many acronyms of the Hadoop ecosystem. This was my first real exposure to Hadoop, and Mr. Fei provided enough information to get me excited about the technology as well as the requisite knowledge to get me started. He covered the content clearly and thoroughly, and it was quite an enjoyable watch. The downside of this series of videos is that the speaker is constantly interrupted with questions. To his credit he did his best to defer these questions until the end, but at times the content became a little disjointed as the in-person audience fired their questions at him. Questions are to be expected at a large tech event, and in this case the quantity of questions only serves to highlight the current popularity of Hadoop. This really isn’t a big complaint, but it was the one negative in a really good video series.

Moneyball for New York City – Michael Flowers

I loved this talk. It’s completely different from most talks I’ve seen. In summary, it’s a very quick, non-technical talk about a skunkworks project in the New York City mayor’s office. Mr. Flowers provides a fascinating look at a real-world scenario where the Office of Policy and Strategic Planning’s Analytics Unit successfully leveraged disparate real-world data to deal more efficiently with illegal building conversions (typically a building owner illegally modifying a building to house more tenants despite zoning restrictions). Michael Flowers is a former prosecutor and is very passionate about the problems his office is trying to solve. He breaks down the problem, how it was previously approached and how it is now being approached by his office. This is a very enjoyable talk, perhaps more so because the problem takes a front seat to the technology used to solve it, and I’ll be looking for more talks by Mr. Flowers!

Designing Data Visualizations Workshop – Noah Iliinsky

Big data is as much about disseminating information as it is about capturing it. In this 4-part talk, Noah Iliinsky of IBM provides an extremely accessible overview of what is a complicated topic. He provides many real-world examples of common data visualization pitfalls and goes to great lengths to explain the issues and provide appropriate alternatives. I found 3 of the 4 videos to be very useful. The third session was lab based and required the attendee to walk through their own real-world visualization problem. Since I’m not working on any specific data visualization tasks at the moment I didn’t get too much out of this session, but I will no doubt walk through this content the next time I undertake such a task. I really feel the content in these videos is useful to everyone working in the IT industry: we all disseminate information through application UIs, reports and PowerPoint presentations, targeting end users, internal teams, project stakeholders and so on. The ability to share this information in a correct and meaningful way seems foundational. Even if it only explicitly articulates and reinforces design concepts already in your subconscious, I highly recommend this video series.

Summary (tl;dr)

The combination of diverse topics and in-depth deep dives sets this title apart from others I have viewed. At 107 hours the cost is reasonable, and it is much lower than the combined costs of attending a conference in person. If you’re interested in Hadoop or Big Data