Big Data: Hadoop: Some Important Facts and Terms

Big data is characterised by the 3Vs, i.e. Volume, Velocity and Variety: how much data there is, how fast it arrives, and how many different forms it takes.

Big data implementations are used to store write-once, read-many (append-only) data that is high in all 3Vs. They are not a replacement for relational databases, which remain the right choice for data that is frequently updated.

The best use cases for big data / Hadoop implementations are:
1. As a store for data generated by IoT (Internet of Things) devices.
2. As an archive for parts of the data held in relational databases, such as audit trail data, field history and user analytics data. Applications usually write these records to an RDBMS but rarely edit or update them afterwards.
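A minimal sketch of this archival pattern, assuming Python's built-in sqlite3 module as a stand-in for the source RDBMS and a hypothetical audit_trail table: rows older than a cutoff date are read (never updated) and written out as comma-separated text, the kind of file a Hive table can later load. In practice a tool such as Sqoop would usually perform the transfer.

```python
import csv
import io
import sqlite3

def export_audit_rows(conn, cutoff, out):
    """Write audit rows older than `cutoff` to `out` as comma-separated text.

    The `audit_trail` table and its columns are hypothetical. Rows are only
    read, matching the write-once, read-many data this pattern is meant for.
    """
    writer = csv.writer(out, lineterminator="\n")
    cur = conn.execute(
        "SELECT id, user_name, action, changed_at FROM audit_trail "
        "WHERE changed_at < ? ORDER BY id",
        (cutoff,),
    )
    count = 0
    for row in cur:
        writer.writerow(row)
        count += 1
    return count

# An in-memory database stands in for the production RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_trail (id INTEGER PRIMARY KEY, user_name TEXT, "
    "action TEXT, changed_at TEXT)"
)
conn.executemany(
    "INSERT INTO audit_trail VALUES (?, ?, ?, ?)",
    [
        (1, "alice", "UPDATE price", "2023-01-15"),
        (2, "bob", "DELETE order", "2023-06-02"),
        (3, "alice", "INSERT order", "2024-02-20"),
    ],
)
buf = io.StringIO()
exported = export_audit_rows(conn, "2024-01-01", buf)
print(exported)  # 2
print(buf.getvalue())
```

ISO-formatted date strings compare correctly as text, which is why the cutoff filter works on the TEXT column here.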

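Hive: a data warehouse tool used to store and query table-structured data in Hadoop. Table metadata (schemas, column types, file locations) is stored and managed in the Hive metastore, a relational database that is commonly MySQL (Derby is the embedded default). The actual data is kept as files in the Hadoop Distributed File System (HDFS), whose blocks are stored on the data nodes. Hive's query language, HiveQL, is very close to SQL. Example Hive queries: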
Select * from TableName;
Select Count(Value) from TableName;
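Hive applies a table's schema when the data is read (schema-on-read), not when it is loaded. The sketch below is a simplified Python model of that idea, not Hive's implementation: each text line is split on Hive's default field delimiter (Ctrl-A, '\x01'), fields are cast to the declared column types, and values that are missing or cannot be cast become None, the way Hive returns NULL instead of rejecting the row. It also shows why Count(Value) counts only non-NULL values, whereas count(*) counts every row. The column names and sample lines are hypothetical.

```python
DEFAULT_DELIMITER = "\x01"  # Hive's default field separator for text tables

def read_rows(lines, schema, delimiter=DEFAULT_DELIMITER):
    """Parse delimited text lines into tuples, applying `schema` at read time.

    `schema` is a list of (column_name, cast_function) pairs. Missing trailing
    fields and values that fail to cast become None (NULL).
    """
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        row = []
        for i, (_, cast) in enumerate(schema):
            if i < len(fields):
                try:
                    row.append(cast(fields[i]))
                except ValueError:
                    row.append(None)  # e.g. "oops" in an int column
            else:
                row.append(None)  # field absent from this line
        rows.append(tuple(row))
    return rows

def count_star(rows):
    # count(*): every row is counted
    return len(rows)

def count_column(rows, schema, column):
    # count(column): only rows where the column is not NULL
    idx = [name for name, _ in schema].index(column)
    return sum(1 for row in rows if row[idx] is not None)

schema = [("id", int), ("value", int)]
lines = ["1\x0110\n", "2\x01oops\n", "3\n"]
rows = read_rows(lines, schema)
print(rows)                                 # [(1, 10), (2, None), (3, None)]
print(count_star(rows))                     # 3
print(count_column(rows, schema, "value"))  # 1

# A comma-separated line read with the default delimiter yields only NULLs.
country_schema = [("id", int), ("name", str)]
print(read_rows(["1,India\n"], country_schema))                 # [(None, None)]
print(read_rows(["1,India\n"], country_schema, delimiter=","))  # [(1, 'India')]
```

The last two lines show why a table's declared field delimiter has to match the files placed in it.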

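Creating a table in HDFS using a Hive script (the row format clause assumes country.txt is comma-separated; without it, Hive expects fields separated by Ctrl-A, \001):
create table CountryTable(id int, name string) row format delimited fields terminated by ',';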
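Hive script to load a file from the local file system into the table (local inpath reads from the client machine's file system; overwrite replaces any data already in the table):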
load data local inpath '/home/hduser/country.txt' overwrite into table CountryTable;
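LOAD DATA is a plain file operation: Hive copies (with LOCAL) or moves (without LOCAL, for a file already in HDFS) the file into the table's directory and does not parse or validate it at that point. With OVERWRITE, the table's existing files are deleted first; otherwise the new file is added alongside them. The Python sketch below models only that behaviour, using temporary local directories as hypothetical stand-ins for the client's disk and the table's warehouse directory; `load_data_local` is an illustrative name, not a Hive API.

```python
import os
import shutil
import tempfile

def load_data_local(local_path, table_dir, overwrite=False):
    """Simplified model of LOAD DATA LOCAL INPATH ... [OVERWRITE] INTO TABLE.

    The file is copied byte for byte into the table's directory; nothing is
    parsed or validated. With overwrite=True the table's existing files are
    deleted first. File name clashes are not handled in this sketch.
    """
    os.makedirs(table_dir, exist_ok=True)
    if overwrite:
        for name in os.listdir(table_dir):
            os.remove(os.path.join(table_dir, name))
    target = os.path.join(table_dir, os.path.basename(local_path))
    shutil.copy(local_path, target)  # LOCAL: the source file stays in place
    return target

with tempfile.TemporaryDirectory() as root:
    # Hypothetical stand-ins for the client's local disk and the table directory.
    local_file = os.path.join(root, "country.txt")
    with open(local_file, "w") as f:
        f.write("1,India\n2,Japan\n")
    table_dir = os.path.join(root, "warehouse", "countrytable")
    os.makedirs(table_dir)
    with open(os.path.join(table_dir, "old_data.txt"), "w") as f:
        f.write("9,Old\n")

    load_data_local(local_file, table_dir, overwrite=True)
    after_overwrite = sorted(os.listdir(table_dir))

    extra_file = os.path.join(root, "more.txt")
    with open(extra_file, "w") as f:
        f.write("3,Kenya\n")
    load_data_local(extra_file, table_dir)  # no OVERWRITE: the file is added
    after_append = sorted(os.listdir(table_dir))

    with open(os.path.join(table_dir, "country.txt")) as f:
        loaded = f.read()
    local_still_there = os.path.exists(local_file)

print(after_overwrite)  # ['country.txt']
print(after_append)     # ['country.txt', 'more.txt']
```

Because nothing is checked at load time, a file whose delimiter does not match the table definition loads without error and only shows up as NULLs when queried.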

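Sqoop: the name is a contraction of "SQL-to-Hadoop". It is a tool for bulk data transfer between relational databases such as SQL Server, MySQL and Oracle and Hadoop: it imports tables into HDFS (optionally straight into Hive tables) and can also export data from HDFS back to a relational database.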
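Sqoop imports a table in parallel. Its user guide describes retrieving the minimum and maximum of a splitting column (the primary key by default, or the column named with --split-by) and giving each map task an evenly sized slice of that range (4 tasks by default, set with --num-mappers). The Python sketch below models only that range arithmetic; the function names and the table name are hypothetical, and this is not Sqoop's own splitter code.

```python
def split_ranges(low, high, num_mappers=4):
    """Divide the integer range [low, high] into half-open (lo, hi) slices.

    The last slice ends at high + 1 so the maximum value is included.
    Empty slices (possible when the range is smaller than num_mappers)
    are dropped.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    span = high - low
    bounds = [low + (span * i) // num_mappers for i in range(num_mappers)]
    bounds.append(high + 1)
    return [(lo, hi) for lo, hi in zip(bounds[:-1], bounds[1:]) if lo < hi]

def split_queries(table, column, low, high, num_mappers=4):
    """Build one bounded SELECT per map task (illustrative only)."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {lo} AND {column} < {hi}"
        for lo, hi in split_ranges(low, high, num_mappers)
    ]

for query in split_queries("sometable", "id", 0, 1000):
    print(query)
```

For an id column running from 0 to 1000 with 4 tasks, this yields the slices (0, 250), (250, 500), (500, 750) and (750, 1001), matching the example in the Sqoop user guide. Evenly sized ranges only balance the work if the key values are spread evenly, which is why choosing the split column matters.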

This entry was posted in BIG DATA, HADOOP, Hive.