BigData - Exercise

ZiBaT => Peter Levinsky => Big Data=> exercise

EXTRA Investigation of Hadoop

Updated : 2017-02-23

EXTRA Investigation of Hadoop/Hive

Idea: To have Hadoop (the light version) up and running
Background: You have a Hadoop instance running (see installingHadoop)
Book (on Fronter): Big Data Forensics – Learning Hadoop Investigations, Joe Sremack, Packt Publishing

Create your own table

In Hive (from Ambari):

Create a table to hold, e.g., temperature and light readings from a sensor together with a timestamp.
Look at the description of the table (DESCRIBE).
Insert a few rows into the table.
Select information from the table.
Export the information to a comma separated file (*.csv); see the book p.127.
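The steps above could look roughly like the HiveQL below. This is a sketch, assuming a hypothetical table `sensor` with columns `temp`, `light` and `ts` (named `ts` rather than `time` to avoid a possible clash with the `time` keyword in newer Hive versions), made-up example values, and a hypothetical output directory `/tmp/sensor_csv`. The statements are kept as strings in a small Java class so they can be printed and pasted into the Hive view in Ambari. The export uses `INSERT OVERWRITE LOCAL DIRECTORY`, which may differ from the approach described in the book.

```java
import java.util.Arrays;
import java.util.List;

class SensorTableSteps {

    // Hypothetical table and column names; adapt them to your own table.
    public static final String CREATE =
        "CREATE TABLE sensor (temp INT, light INT, ts STRING) "
        + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','";

    public static final String DESCRIBE = "DESCRIBE sensor";

    // A few made-up example rows (INSERT ... VALUES needs Hive 0.14 or later).
    public static final String INSERT =
        "INSERT INTO TABLE sensor VALUES "
        + "(21, 300, '2017-02-23 10:00:00'), (22, 310, '2017-02-23 10:05:00')";

    public static final String SELECT = "SELECT temp, light, ts FROM sensor";

    // Writes the query result as comma separated text file(s) into a
    // hypothetical directory on the machine running the Hive server.
    public static final String EXPORT =
        "INSERT OVERWRITE LOCAL DIRECTORY '/tmp/sensor_csv' "
        + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        + SELECT;

    // The statements in the order of the exercise steps.
    public static List<String> steps() {
        return Arrays.asList(CREATE, DESCRIBE, INSERT, SELECT, EXPORT);
    }

    public static void main(String[] args) {
        // Print each statement terminated by ';' so it can be pasted into the Hive view.
        for (String stmt : steps()) {
            System.out.println(stmt + ";");
        }
    }
}
```

Running the statements one at a time in the Hive view makes it easy to check the output of DESCRIBE and SELECT before exporting.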

Create a programming interface to this sensor table

You can use C#

You are going to use an ODBC driver and Visual Studio

Download an ODBC driver for Hortonworks Hive:
https://hortonworks.com/downloads/ -- choose 32-bit (or 64-bit) and install the .msi file
Open 'ODBC Data Sources' and add a data source that uses this new driver.
The tutorial below is written for Windows 7; from the middle of the page it shows how to set up the ODBC data source (skip its download step, since you already installed the driver above):
https://github.com/hortonworks/hadoop-tutorials/blob/master/Sandbox/T07_Installing_the_Hortonworks_ODBC_Driver_on_Windows_7.md
User name and password are both 'maria_dev'.
Create a C# project in Visual Studio

code example ------------------------------:

using System;
using System.Data.Odbc;

class Program
{
    static void Main()
    {
        // My ODBC data source name (DSN) is 'MyHive'
        using (var conn = new OdbcConnection("DSN=MyHive"))
        {
            conn.Open();
            // 'sensor' is a hypothetical table name: replace with your own select statement
            var sql = new OdbcCommand("SELECT * FROM sensor", conn);

            Console.WriteLine("Result");

            using (var reader = sql.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Example if the table row is (temp int, light int, time string)
                    int temp = reader.GetInt32(0);
                    int light = reader.GetInt32(1);
                    string time = reader.GetString(2);

                    Console.WriteLine($"t={temp} l={light} timestamp={time}");
                }
            }
        }

        Console.WriteLine("End");
        Console.ReadLine();
    }
}

------------------------------------------------------

You can use Java

Write the program, preferably in Java (C# may not work directly); you can, e.g., use NetBeans to develop it.
Then move the *.jar file to Linux and run it as follows: java -jar <your-program>.jar

For how to access Hive, see this example: https://cwiki.apache.org/confluence/display/Hive/HiveClient (it also has examples for Python and PHP)
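A Java version of the C# example could use JDBC instead of ODBC. This is a minimal sketch, assuming HiveServer2 is reachable on `localhost:10000` (the usual HiveServer2 port; on the sandbox you may need the sandbox host name instead), the Hive JDBC driver jar is on the classpath, and the hypothetical table `sensor` with columns `temp`, `light` and `ts` exists. HiveServer2 uses the `jdbc:hive2://` URL scheme and the driver class `org.apache.hive.jdbc.HiveDriver`; older examples for the original HiveServer use `jdbc:hive://` and a different driver class.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class SensorClient {

    // Assumption: HiveServer2 listens on localhost:10000; adjust for your setup.
    static final String HOST = "localhost";
    static final int PORT = 10000;

    // Builds a HiveServer2 JDBC URL, e.g. jdbc:hive2://localhost:10000/default
    public static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Formats one row the same way as the C# example prints it.
    public static String formatRow(int temp, int light, String ts) {
        return "t=" + temp + " l=" + light + " timestamp=" + ts;
    }

    public static void main(String[] args) throws ClassNotFoundException, SQLException {
        // Load the HiveServer2 JDBC driver (from the hive-jdbc jar).
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        String url = jdbcUrl(HOST, PORT, "default");
        // try-with-resources closes the result set, statement and connection.
        try (Connection conn = DriverManager.getConnection(url, "maria_dev", "maria_dev");
             Statement stmt = conn.createStatement();
             // 'sensor' and its columns are hypothetical; use your own table.
             ResultSet rs = stmt.executeQuery("SELECT temp, light, ts FROM sensor")) {
            System.out.println("Result");
            while (rs.next()) {
                // JDBC column indexes start at 1.
                System.out.println(formatRow(rs.getInt(1), rs.getInt(2), rs.getString(3)));
            }
        }
        System.out.println("End");
    }
}
```

When the program is started with `java -jar`, the Hive JDBC driver and its dependencies must be bundled into the jar or referenced from the jar's manifest Class-Path, since `java -jar` ignores the `-cp` option.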

Set up and use these sample files (instead of the NYSE data from the tutorial):

http://stackoverflow.com/questions/10843892/download-large-data-for-hadoop