UMBC CSEE Colloquium

Integration of HBase and Lucene for real-time big data analysis

Yin Huang
CSEE Department, UMBC

1:00 pm Friday, 22 February 2013, ITE 227, UMBC

The increasing size of data sets has posed several challenges for real-time big data analysis, such as Business Intelligence, in terms of system scalability and data availability. Business Intelligence focuses on mining big data and providing multidimensional visualization, thus supporting business decision making, ideally in real time. Traditional relational database management systems fail to provide a flexible and stable solution. Several NoSQL database systems, such as Cassandra and HBase, have been proposed to tackle these challenges. HBase, however, does not support full-text search; the current implementation of HBase offers only row-key-based indexing. In this talk, we introduce building a Lucene index on top of HBase to support multidimensional queries for data marts under the MapReduce framework, serving as the cornerstone for future data analysis and business reporting.

Yin Huang obtained his B.S. in Computer Science from Nanchang University in 2009, and studied at Chongqing University for two years for his M.S. He started his Ph.D. program in Computer Science at the University of Maryland, Baltimore County in 2011. In 2012, he interned at the IBM Ottawa lab for four months, focusing on using a Multicore-Enhanced Hadoop System for Business Intelligence. His current research areas are databases, data mining, and parallel computing.