Data Science With RapidMiner: Using RapidMiner to read data from HBase

Sunday, 30 August 2015

Using RapidMiner to read data from HBase

HBase is a distributed, column-oriented database in the Hadoop ecosystem. Here's a very simple example of a RapidMiner process that connects to an HBase server and reads data from a table.

The process uses RapidMiner's Execute Python operator and 'happybase', a Python package that connects to HBase through HBase's Thrift server.

As always when integrating systems, there is a lot of leg-work to do before things work. It starts with a running Hadoop cluster with HBase and some data in it. For this toy example, I created the world's simplest table, called 'test', containing two rows. Running the 'scan' command on it from the HBase shell yields the following.

hbase(main):002:0> scan 'test'
ROW          COLUMN+CELL
 row1        column=cf:a, timestamp=1440837877452, value=value1
 row2        column=cf:b, timestamp=1440837887539, value=value2
2 row(s) in 0.0290 seconds
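For anyone who would rather build the same toy table from Python than from the HBase shell, here is a minimal sketch using happybase's `create_table` and `put`. It assumes happybase is installed and the Thrift server described below is running; `HOST`, `TEST_ROWS` and `create_test_table` are hypothetical names, not part of the original process, and the timestamps will of course differ from the scan above.

```python
# A minimal sketch: HOST is a hypothetical placeholder for the machine running HBase.
HOST = '192.168.1.76'

# The two cells shown in the scan output: row key -> {column: value}.
TEST_ROWS = {
    b'row1': {b'cf:a': b'value1'},
    b'row2': {b'cf:b': b'value2'},
}


def create_test_table(connection, name='test'):
    """Create table `name` with column family 'cf' if it is missing, then load TEST_ROWS.

    `connection` is expected to be a happybase.Connection (hypothetical helper).
    """
    # tables() may return bytes (Python 3) or str, so normalise before comparing
    existing = {t.decode() if isinstance(t, bytes) else t for t in connection.tables()}
    if name not in existing:
        # create_table takes a dict of column family name -> options dict
        connection.create_table(name, {'cf': dict()})
    table = connection.table(name)
    for row_key, cells in TEST_ROWS.items():
        # put writes all the given {column: value} cells into one row
        table.put(row_key, cells)
    return table
```

With a live server, `create_test_table(happybase.Connection(HOST))` creates and fills the table; calling it a second time skips the `create_table` step and simply rewrites the same two cells.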

happybase talks to HBase through the HBase Thrift server, so Thrift must be running before remote clients can connect. This is typically done by running the following command from the HBase installation directory on the machine running HBase.

./bin/hbase thrift start

The final step is to make sure the firewall on the HBase machine does not block remote requests to the Thrift port (9090 by default).
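A quick way to confirm the port is reachable is to attempt a plain TCP connection from the machine running RapidMiner. This is a minimal sketch using only Python's standard library; `thrift_port_open`, `HBASE_HOST` and `THRIFT_PORT` are hypothetical names, with the host set to the same example address the process uses.

```python
import socket

# Hypothetical placeholders: the HBase machine and the default Thrift port.
HBASE_HOST = '192.168.1.76'
THRIFT_PORT = 9090


def thrift_port_open(host, port=THRIFT_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and connects; the with block closes the socket
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # connection refused, timed out, or host could not be resolved
        return False
```

Calling `thrift_port_open(HBASE_HOST)` from the RapidMiner machine should return True once Thrift is running and the firewall allows the port. A True result only shows that something is listening there, not that it is the HBase Thrift server.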

The RapidMiner process can now be run. The Python code inside the process is shown below; change the host name or IP address and the table name to match your environment.

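import pandas as pd
import happybase


def dict_to_dataframe(d):
    # one row per HBase column: the column names (e.g. cf:a) become the index,
    # and the cell values stay in column 1
    df = pd.DataFrame(list(d.items()))
    df.set_index(0, inplace=True)
    return df


def rm_main():
    # use the name or IP address where HBase is running
    connection = happybase.Connection('192.168.1.76')

    # use a table name in the database
    table = connection.table('test')

    # this scans the whole table and prints each row to the log
    for key, data in table.scan():
        print(key, data)

    # this selects the row whose key is row1, as a dict of {column: value}
    row1 = table.row('row1')

    connection.close()

    # RapidMiner turns the returned DataFrame into an example set
    return dict_to_dataframe(row1)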

I'm by no means a Python expert, so I don't expect this is the world's best example. Nonetheless, it shows what's possible.

When run in my environment, the returned example set is as follows.

I've only scratched the surface of what could be done using the 'happybase' package, but I hope this gives you some ideas about what you might be able to do.
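As one example of going further, the sketch below turns a full table scan into an example set with one row per HBase row, instead of one row per column as in the process above. It assumes the cells hold UTF-8 text; `scan_to_dataframe`, `HBASE_HOST`, `THRIFT_PORT` and `TABLE_NAME` are hypothetical names, while `port`, `timeout` (in milliseconds), `row_prefix` and `limit` are standard happybase parameters.

```python
import pandas as pd

# Hypothetical placeholders: replace with your own HBase host and table.
HBASE_HOST = '192.168.1.76'
THRIFT_PORT = 9090
TABLE_NAME = 'test'


def scan_to_dataframe(table, row_prefix=None, limit=None):
    """Scan a happybase table into a DataFrame with one row per HBase row.

    Each HBase column (e.g. 'cf:a') becomes a DataFrame column; cells a row
    does not have are left as NaN.
    """
    records = []
    for key, data in table.scan(row_prefix=row_prefix, limit=limit):
        # happybase returns bytes under Python 3, so decode keys and values
        record = {'row_key': key.decode()}
        record.update({col.decode(): val.decode() for col, val in data.items()})
        records.append(record)
    return pd.DataFrame(records)


def rm_main():
    import happybase  # imported here so scan_to_dataframe can be used without it

    # happybase's timeout is given in milliseconds
    connection = happybase.Connection(HBASE_HOST, port=THRIFT_PORT, timeout=5000)
    try:
        return scan_to_dataframe(connection.table(TABLE_NAME))
    finally:
        # close the Thrift connection even if the scan fails
        connection.close()
```

On larger tables, passing `row_prefix=b'row'` or `limit=100` to `scan_to_dataframe` keeps the scan, and the resulting example set, to a manageable size.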
