WHY DID THEY CALL US?
As part of its sales data analysis business (using data collected from retailers and consumer panelists), Nielsen decided to migrate its data storage, aggregation, and reporting platform from Oracle to Hadoop.
The aggregation engine is developed in C++.
The primary objective of this migration is to gather all existing platforms into a single global platform.
The performance objectives are as follows:
• improvement of the performance of the data extraction part
• implementation of an efficient data reporting tool
• acceleration of the aggregation engine
The work carried out was as follows:
• Porting of the Oracle SQL data extraction queries (via OCCI) to Hadoop/Impala (via ODBC). Queries that Impala does not support are reimplemented directly in the C++ code.
• Design and development of a high-performance tool for loading aggregated data into Impala tables via HDFS, using the C++ library libhdfs3.
• Hybrid MPI + Boost.Thread parallelization of the aggregation engine on Hadoop clusters.
The results obtained were as follows:
• The data loading tool inserts a 70 GB file into an Impala table in less than 10 minutes over a gigabit network. Using libhdfs3 instead of libhdfs reduces the application's memory footprint by a factor of 8.
• The aggregation engine, distributed with MPI + Boost.Thread, achieves 60% parallel efficiency on 8 nodes (20 cores per node) of a Hadoop cluster.
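
To put the two figures above in perspective, the arithmetic behind them can be sketched in a few lines of C++. This is only an illustrative sketch: the function names are hypothetical, 1 GB is assumed to mean 1000 MB, and parallel efficiency is assumed to follow the usual definition E = S / N, with the speedup S measured against a single node (the text does not state the baseline).

```cpp
// Sustained throughput in MB/s when `gigabytes` are moved in `minutes`.
// Assumption: 1 GB = 1000 MB (decimal units).
constexpr double throughput_mb_per_s(double gigabytes, double minutes) {
    return gigabytes * 1000.0 / (minutes * 60.0);
}

// Theoretical ceiling of a link in MB/s, given its rate in Gbit/s
// (8 bits per byte, protocol overhead ignored).
constexpr double link_ceiling_mb_per_s(double gigabits_per_s) {
    return gigabits_per_s * 1000.0 / 8.0;
}

// Speedup implied by a parallel efficiency on `nodes` nodes.
// Assumption: efficiency E = S / N, hence S = E * N.
constexpr double implied_speedup(double efficiency, int nodes) {
    return efficiency * nodes;
}
```

With the figures quoted above, `throughput_mb_per_s(70.0, 10.0)` gives about 116.7 MB/s as a lower bound on the sustained load rate, close to the 125 MB/s returned by `link_ceiling_mb_per_s(1.0)` for a gigabit link. Likewise, `implied_speedup(0.60, 8)` gives 4.8, meaning the 8-node run would be roughly 4.8 times faster than the single-node baseline under the stated assumption.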