preloader

Accessing data in the lake via Excel-like tools

blog-image

Previous post

Key takeaways

  • It’s relatively simple to hook AWS Athena onto ODBC-compliant software (think Microsoft Excel)

  • Doing so gives you powerful processing on the back of well-understood software

  • Data are living in a data lake on Amazon Simple Storage Service (S3)

  • Unfamiliar tools are not going to hamper your teams as they dig in.

Introduction

We want our people to have access to the data, but this is often difficult if those individuals don’t have SQL skills or are not able to drive SQL query tools like DBeaver or similar SQL clients.

One thing most teams are familiar with however is Microsoft Excel. Now I’m not advocating that we take all our big data and try to manipulate it all with Excel or OpenOffice. For sure there are much more powerful tools to really crank through your data. However, if there’s one thing that democratizing data is about, it’s getting that data to people who can make use of it. If they can use tools they’re familiar with, then this makes them productive from the get-go.

In this article we examine how to connect a powerful tool like AWS Athena to Excel, PowerBI or OpenOffice in order to manipulate the data. You data engineers and data scientists might roll your eyes at this preposterous idea. When it comes to engaging all members of the team, then perhaps it’s not so preposterous.

The final aspect that makes this so attractive is that not only does it mean your team can be productive right off the bat, but behind Excel you have the power of AWS Athena that can drive massive queries of large datasets, and your teams don’t need to re-skill just yet.

A Quick Example

For the purpose of this blog post, I’ve got some “big data” from my home automation system (it’s really not that big, but it’ll illustrate a point). That data resides in Amazon S3 in Parquet format - not something that Excel can readily access. I would like to draw some graphs in Excel of the usage of energy in my house by accessing that data. As such, AWS Athena is providing that bridge to the parquet files, but the ODBC driver for Athena is allowing my Excel to connect and read that data.

It’s a win on multiple levels because:

  • I can keep the data in a place that’s cheap (AWS S3) and in a format that’s highly optimized (Parquet) and compressed using SNAPPY.

  • This means I’m not mucking with variable formats that my data engineers are going to have to keep sync’d.

  • I can give my team access to the data even though they know nothing about AWS Athena, Parquet, SNAPPY, S3 or any of the other tech behind the scenes.

  • My team know Excel and they can quickly draw the graphs of energy consumed to present at the monthly management meeting.

  • Finally, even tools like PowerBI can make use of the ODBC drivers to handle more complex reporting needs.

Setup

The meat and potatoes part of this article is really technical in nature, so we’ll continue looking at the technical aspects of making this work in our blog; but it’s good for the leaders to know that accessing your data lake through well-understood tools makes using that data a reality for many more in your workforce.

Harnessing data to provide insights.