Friday, June 8, 2012

Develop Jasper report with Hive


In the last two blog posts we learned how to:
  • Set up Hadoop and write simple map-reduce jobs
  • Set up Hive and run SQL queries over it

In this post we will use Jasper Report to generate a report that uses Hive as its data store.
The report will list the customers who have a mobile phone.
It is assumed that you already have Jaspersoft iReport Designer installed.

  • Start Hive in server mode so that we can connect to it using a JDBC client
      • hive --service hiveserver

  • Create the table and load the data into the Hive table from the Hive shell, so that we can query it. Hadoop map-reduce programs will be called internally to fetch data from this table. The data will be distributed over HDFS and will be collected and returned according to the query. Note that "load data inpath" reads export.csv from HDFS; for a file on the local filesystem use "load data local inpath" instead.
      • hive -p 10000 -h localhost
      • CREATE TABLE person (PERSON_ID INT, NAME STRING, FIRST_NAME STRING, LAST_NAME STRING, MIDDLE_NAMES STRING, TITLE STRING, STREET_ADDRESS STRING, CITY STRING, COUNTRY STRING, POST_CODE STRING, HOME_PHONE STRING, WORK_PHONE STRING, MOBILE_PHONE STRING, NI_NUMBER STRING, CREDITLIMIT STRING, CREDIT_CARD STRING, CREDITCARD_START_DATE STRING, CREDITCARD_END_DATE STRING, CREDITCARD_CVC STRING, DOB STRING) row format delimited fields terminated by ',';
      • load data inpath 'export.csv' overwrite into table person;
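
Because the table is declared with fields terminated by ',', Hive splits each line of export.csv on every comma and does not interpret quotes. A quick pre-load check of the field count can catch rows that would shift columns. The following is only a minimal sketch: the class name PersonCsvCheck and the sample row are made up for illustration, and the column list is copied from the CREATE TABLE statement above.

```java
import java.util.Arrays;
import java.util.List;

public class PersonCsvCheck {
    // Column order copied from the CREATE TABLE statement in this post.
    static final List<String> COLUMNS = Arrays.asList(
        "PERSON_ID", "NAME", "FIRST_NAME", "LAST_NAME", "MIDDLE_NAMES", "TITLE",
        "STREET_ADDRESS", "CITY", "COUNTRY", "POST_CODE", "HOME_PHONE", "WORK_PHONE",
        "MOBILE_PHONE", "NI_NUMBER", "CREDITLIMIT", "CREDIT_CARD",
        "CREDITCARD_START_DATE", "CREDITCARD_END_DATE", "CREDITCARD_CVC", "DOB");

    // Split on every comma with no quote handling, which mirrors how a plain
    // comma-delimited Hive table reads a line.
    static String[] splitLikeHive(String line) {
        return line.split(",", -1); // -1 keeps trailing empty fields
    }

    // True when the line has exactly one field per table column.
    static boolean hasExpectedFieldCount(String line) {
        return splitLikeHive(line).length == COLUMNS.size();
    }

    public static void main(String[] args) {
        // Hypothetical sample row, not taken from the real export.csv.
        String row = "1,John Smith,John,Smith,,Mr,1 High St,London,UK,N1 1AA,,,"
                   + "07700900000,,1000,,,,,1980-01-01";
        System.out.println("fields ok: " + hasExpectedFieldCount(row));
    }
}
```

A row containing a comma inside a value (for example in STREET_ADDRESS) would fail this check, which is a hint that the file needs cleaning or a different delimiter before loading.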




  • Start the iReport Designer
    • Create a new datasource to connect to the Hive database. This is the first step; it registers the Hive database with iReport.


  • Create a new report. Refer to the screenshots for more details. A query is given to fetch the appropriate data from Hive.
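
The datasource and report query only appear in the screenshots, so here is a rough Java sketch of the same connection made by hand, which can help verify the setup outside iReport. It assumes the original HiveServer started above (driver class org.apache.hadoop.hive.jdbc.HiveDriver and a jdbc:hive:// URL), the "default" database, and the Hive JDBC jars on the classpath. The query itself is my guess at "customers who have a mobile phone" (a non-empty MOBILE_PHONE column), not necessarily the one used in the report, and the class name HiveReportQuery is hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveReportQuery {
    // JDBC driver for the original HiveServer ("hive --service hiveserver").
    static final String DRIVER = "org.apache.hadoop.hive.jdbc.HiveDriver";

    // Builds a URL such as jdbc:hive://localhost:10000/default
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive://" + host + ":" + port + "/" + database;
    }

    // Assumed report query: persons whose MOBILE_PHONE is present and non-empty.
    static String mobileCustomersQuery() {
        return "SELECT person_id, name, mobile_phone FROM person "
             + "WHERE mobile_phone IS NOT NULL AND mobile_phone <> ''";
    }

    public static void main(String[] args) throws Exception {
        Class.forName(DRIVER); // requires the Hive JDBC jars on the classpath
        String url = jdbcUrl("localhost", 10000, "default");
        try (Connection con = DriverManager.getConnection(url, "", "");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(mobileCustomersQuery())) {
            while (rs.next()) {
                // Columns 2 and 3 are name and mobile_phone in the SELECT list.
                System.out.println(rs.getString(2) + "\t" + rs.getString(3));
            }
        }
    }
}
```

The same URL and query text can be entered in the iReport datasource and report query dialogs. If this small program returns rows, the report should see the same data.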




    This way we now have a distributed file system (HDFS), a map-reduce engine above it (Hadoop), and a data warehousing tool over this framework (Hive). On top of that we used a reporting tool to extract meaningful data from it and display it. Jasper Report has built-in capabilities to communicate with Hive (via JDBC).


    Peace.
    Sanket Raut
