[O’R] BigDataFR recommends: The O’Reilly Data Show Podcast – Michael Stack #hadoop #cloudera #datacientist

BigDataFR recommends: The O’Reilly Data Show Podcast – Michael Stack, engineer at Cloudera
Michael Stack on HBase past, present, and future.
Coming full circle with Bigtable and HBase

« At least once a year, I sit down with Michael Stack, engineer at Cloudera, to get an update on Apache HBase and the annual user conference, HBaseCon. Stack has a great perspective, as he has been part of HBase since its inception. As former project leader, he remains a key contributor and evangelist, and one of the organizers of HBaseCon.
In the beginning: Search and Bigtable

During the latest episode of the O’Reilly Data Show Podcast, I decided to broaden our conversation to include the beginnings of the very popular Apache HBase project. Stack reminded me that much of the big data community in the SF Bay Area is centered around search technologies. In particular, HBase was inspired by work out of Google (Bigtable), and the early engineers had ties to projects out of the Internet Archive:

At the time, I was working at the Internet Archive, and I was working on crawlers and search. The Bigtable paper looked really interesting to us because the archive, as you know, we used to host, or still do, the Wayback Machine. The Wayback Machine is a picture of the Web that goes back to 1998, and you could look at the Web at any particular time. What pages looked like at a particular time. Bigtable was very interesting at the Internet Archive because it had this time dimension.

A group had started up to talk about the possibility of implementing a Bigtable clone. » […]

Read more
Ben Lorica, Chief Data Scientist & Director of Content Strategy for Data at O’Reilly Media, Inc
Source: radar.oreilly.com
