February 11, 2016
With the incredible popularity of big data and Hadoop, every Business Intelligence (BI) vendor also wants to be known as a "BI on Hadoop" vendor. But what they can really do is limited to a) querying HDFS data organized in Hive tables using HiveQL or b) ingesting a flat file into memory and analyzing the data there. Basically, to most BI vendors Hadoop is just another data source. Let's now see what qualifies a BI vendor as a "Native Hadoop BI Platform". If we assume that every BI platform has to have data extraction/integration, persistence, analytics and visualization layers, then "Native Hadoop/Spark BI Platforms" should be able to do the following (ok, yes, I just had to add Spark):
- Use Hadoop/Spark as the primary processing platform for MOST of the aforementioned functionality. The only exception is the visualization layer, which is not something Hadoop/Spark do.
- Use distributed processing frameworks natively, for example:
  - Generating MapReduce and/or Spark jobs
  - Managing distributed processing framework jobs through YARN, etc.
  - Note: generating Hive or SparkSQL queries does not qualify
- Let users do declarative work in the product’s main user interface and have that work interpreted and executed directly on Hadoop/Spark, not via a "pass through" mode.
- Natively support Apache Sentry and Apache Ranger security
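
To make the checklist above concrete, here is a minimal Python sketch that encodes the criteria as a self-assessment. Everything in it is hypothetical and for illustration only: the `PlatformCapabilities` class, its field names, and the two helper functions are my own shorthand, not part of any vendor's product or API. It also reads the security criterion literally, requiring both Sentry and Ranger.

```python
from dataclasses import dataclass


@dataclass
class PlatformCapabilities:
    """Hypothetical self-assessment of a BI product (field names are illustrative)."""
    # Most non-visualization work (integration, persistence, analytics)
    # runs on Hadoop/Spark.
    most_processing_on_cluster: bool = False
    generates_mapreduce_or_spark_jobs: bool = False
    jobs_managed_by_yarn_or_similar: bool = False
    # Recorded for completeness; by the list above it never counts as native.
    generates_hive_or_sparksql_queries: bool = False
    # Declarative work in the main UI runs on the cluster, not in pass-through mode.
    ui_work_runs_on_cluster_directly: bool = False
    supports_sentry: bool = False
    supports_ranger: bool = False


def unmet_criteria(caps):
    """Return the descriptions of the criteria the platform does not satisfy."""
    checks = [
        ("Hadoop/Spark is the primary processing platform",
         caps.most_processing_on_cluster),
        ("generates MapReduce and/or Spark jobs",
         caps.generates_mapreduce_or_spark_jobs),
        ("jobs are managed by YARN or similar",
         caps.jobs_managed_by_yarn_or_similar),
        ("main UI work executes directly on Hadoop/Spark",
         caps.ui_work_runs_on_cluster_directly),
        ("native Apache Sentry and Apache Ranger support",
         caps.supports_sentry and caps.supports_ranger),
    ]
    return [description for description, ok in checks if not ok]


def is_native_hadoop_bi(caps):
    """A platform qualifies only when no criterion is left unmet."""
    return not unmet_criteria(caps)


if __name__ == "__main__":
    # A product that only pushes HiveQL/SparkSQL down to the cluster.
    sql_only = PlatformCapabilities(generates_hive_or_sparksql_queries=True)
    print(is_native_hadoop_bi(sql_only))
    for gap in unmet_criteria(sql_only):
        print("missing:", gap)
```

Note that `generates_hive_or_sparksql_queries` is deliberately absent from the checks: a product that does nothing more than generate SQL stays unqualified no matter how that flag is set.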