
Hive on Windows

Johnny Foulds edited this page Sep 19, 2019 · 7 revisions

This page contains the steps for deploying Hive to the Windows Hadoop cluster.

Install Cygwin

Install Cygwin from https://cygwin.com/setup-x86_64.exe with the default options. Cygwin is required because Hive must be executed from bash.

Create a symbolic link

This is required because Java does not interpret Cygwin paths (/cygdrive/d/...) correctly; the junction makes the equivalent Windows path D:\cygdrive\d\ resolve to D:\.

D:\>mkdir cygdrive
D:\>mklink /J  D:\cygdrive\d\ D:\
Junction created for D:\cygdrive\d\ <<===>> D:\

Install Hive

Set up environment variables

Variable     Value
HIVE_HOME    d:\data-analytics\hive

Add D:\data-analytics\hive\bin to Path environment variable.
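If you prefer to script this step, the variable can also be persisted from a Command Prompt with setx (a sketch; setx writes to the current user's environment and the new value is only visible in freshly opened shells):

```shell
:: Persist HIVE_HOME for the current user (cmd.exe). Open a new
:: PowerShell/cmd window afterwards for the value to take effect.
setx HIVE_HOME "D:\data-analytics\hive"
```

Appending D:\data-analytics\hive\bin to Path is safer to do through System Properties > Environment Variables, since setx rewrites the whole variable.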

Download the Binaries

PS D:\> cd d:\data-analytics\
PS D:\> wget http://apache.is.co.za/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz -OutFile apache-hive-3.1.2-bin.tar.gz

Install the Binaries

$ cd /d/data-analytics/
$ tar -xvzf apache-hive-3.1.2-bin.tar.gz

$ echo "hive-3.1.2" > apache-hive-3.1.2-bin/_version.txt
$ mv apache-hive-3.1.2-bin hive

Edit hive-config.sh

Append the following to hive/bin/hive-config.sh

export HADOOP_HOME='/cygdrive/d/data-analytics/hadoop'
export PATH=$PATH:$HADOOP_HOME/bin
export HIVE_HOME='/cygdrive/d/data-analytics/hive'
export PATH=$PATH:$HIVE_HOME/bin
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*.jar

Create hive/conf/hive-site.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<property>
		<name>hive.metastore.event.db.notification.api.auth</name>
		<value>false</value>
		<description>
			Should metastore do authorization against database notification related APIs such as get_next_notification.
			If set to true, then only the superusers in proxy settings have the permission
		</description>
	</property>
	<property>
		<name>hive.server2.enable.doAs</name>
		<value>false</value>
		<description>
			Setting this property to true will have HiveServer2 execute
			Hive operations as the user making the calls to it.
		</description>
	</property>	
</configuration>

Set up Hive HDFS folders

PS C:\> hdfs dfs -mkdir /tmp
PS C:\> hdfs dfs -mkdir -p /user/hive/warehouse

PS C:\> hdfs dfs -chmod g+w   /tmp
PS C:\> hdfs dfs -chmod g+w   /user/hive/warehouse
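As an optional sanity check (output format varies slightly between Hadoop versions), the permissions can be confirmed with a directory listing:

```shell
# -ls -d lists the directories themselves rather than their contents,
# so the group-write bit added by chmod g+w is visible in the output.
hdfs dfs -ls -d /tmp /user/hive/warehouse
```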

Initialize metastore

Derby is used because this environment is intended as a temporary, completely self-contained solution:

$ cd /d/data-analytics
$ schematool -dbType derby -initSchema
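If initialization succeeded, the same tool can report the recorded schema version (schematool's -info option; run it from the same directory so it finds the Derby metastore_db):

```shell
# Prints the Hive version and the metastore schema version stored in
# the embedded Derby database created by -initSchema.
schematool -dbType derby -info
```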

Start HiveServer2 service

$ cd /d/data-analytics
$ $HIVE_HOME/bin/hive --service hiveserver2 start

Start the CLI

$ cd /d/data-analytics
$ $HIVE_HOME/bin/beeline -u jdbc:hive2://pshp111zatcwi:10000

This does work, but HiveServer2 takes some time to initialize after the service is started; if you connect too soon, it will appear that the server is not running.
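One way to avoid connecting too early is to poll the HiveServer2 port before launching beeline. A minimal bash sketch, assuming the default port 10000 and bash's /dev/tcp support (the helper name is made up here):

```shell
#!/usr/bin/env bash
# Poll a TCP port until it accepts connections or a timeout expires:
# returns 0 once the port is open, 1 on timeout.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-120} waited=0
  until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    sleep 2
    waited=$((waited + 2))
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
  done
}
```

With this defined, `wait_for_port pshp111zatcwi 10000 && $HIVE_HOME/bin/beeline -u jdbc:hive2://pshp111zatcwi:10000` only starts beeline once the port accepts connections.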

Test script

create database if not exists test_db;

use test_db;
create table test_table (id bigint not null, value varchar(100));
show tables;

You can browse HDFS to confirm the table was created: http://pshp111zatcwi:9870/explorer.html#/user/hive/warehouse/test_db.db/test_table

insert into test_table (id,value) values (1,'ABC'),(2,'DEF');

This will start a MapReduce job that can be viewed at: http://pshp111zatcwi:8088/cluster
