Hive on Windows
This page describes the steps for deploying Hive to the Windows Hadoop cluster.
Install Cygwin from https://cygwin.com/setup-x86_64.exe with the default options. Cygwin is required because Hive must be run from bash.
Create a directory junction so that Cygwin-style paths under /cygdrive/d resolve on drive D:. This is required because Java does not understand Cygwin paths properly.
D:\>mkdir cygdrive
D:\>mklink /J D:\cygdrive\d\ D:\
Junction created for D:\cygdrive\d\ <<===>> D:\
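To see why the junction helps, here is a minimal sketch of how a Windows JVM is likely to interpret a Cygwin-style path. A path with a leading slash is drive-relative on Windows, so it is resolved against the root of the current drive. The function name `java_view_of` and its second argument (the current drive letter) are hypothetical and only serve the illustration.

```shell
#!/usr/bin/env bash
# Sketch: show the Windows path a JVM would derive from a Cygwin-style path.
# Assumption: the current drive of the process is passed in as $2 (e.g. "D").
java_view_of() {
  local cyg_path="$1" current_drive="$2"
  # Prefix the current drive and turn forward slashes into backslashes.
  printf '%s\n' "${current_drive}:${cyg_path//\//\\}"
}

# Prints D:\cygdrive\d\data-analytics\hive, which the junction maps
# back to D:\data-analytics\hive.
java_view_of /cygdrive/d/data-analytics/hive D
```

This also suggests why the later commands are run after changing to a directory on drive D: the trick only works when D: is the current drive.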
Set the following environment variable:

Variable | Value |
---|---|
HIVE_HOME | d:\data-analytics\hive |

Add D:\data-analytics\hive\bin to the Path environment variable.
Download the Hive release from PowerShell:
PS D:\> cd d:\data-analytics\
PS D:\data-analytics> wget http://apache.is.co.za/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz -OutFile apache-hive-3.1.2-bin.tar.gz
Then extract it from the Cygwin bash shell:
$ cd /cygdrive/d/data-analytics/
$ tar -xvzf apache-hive-3.1.2-bin.tar.gz
$ echo "hive-3.1.2" > apache-hive-3.1.2-bin/_version.txt
$ mv apache-hive-3.1.2-bin hive
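Before extracting, the downloaded tarball could be checked against a published checksum. The following is a minimal sketch, assuming a SHA-256 digest is available from the Apache download page; the function name `verify_sha256` is hypothetical, and no real digest is hard-coded here.

```shell
#!/usr/bin/env bash
# Sketch: compare a file's SHA-256 digest with an expected value.
# The expected digest must be copied from the checksum published
# next to the release; it is deliberately not included here.
verify_sha256() {
  local file="$1" expected="$2"
  local actual
  # sha256sum prints "<digest>  <file>"; keep only the digest.
  actual="$(sha256sum "$file" | awk '{print $1}')"
  # Compare case-insensitively, since published digests vary in case.
  [ "${actual,,}" = "${expected,,}" ]
}

# Example (digest placeholder must be replaced by the published value):
# verify_sha256 apache-hive-3.1.2-bin.tar.gz "<published digest>" && echo "checksum OK"
```

The function returns a non-zero status on any mismatch, including a missing file, so it can guard the `tar` step with `&&`.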
Append the following to hive/bin/hive-config.sh:
export HADOOP_HOME='/cygdrive/d/data-analytics/hadoop'
export PATH=$PATH:$HADOOP_HOME/bin
export HIVE_HOME='/cygdrive/d/data-analytics/hive'
export PATH=$PATH:$HIVE_HOME/bin
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*.jar
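If the setup is repeated, appending these lines by hand can leave duplicates in hive-config.sh. As a rough sketch, a small helper can append a line only when it is not already present; `append_once` is a hypothetical name, not part of Hive.

```shell
#!/usr/bin/env bash
# Sketch: append a line to a file only when that exact line is absent,
# so re-running the setup does not duplicate the exports.
append_once() {
  local file="$1" line="$2"
  # -q: quiet, -x: match the whole line, -F: treat the text literally.
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (single quotes keep $PATH and $HIVE_HOME unexpanded):
# append_once hive/bin/hive-config.sh "export HIVE_HOME='/cygdrive/d/data-analytics/hive'"
# append_once hive/bin/hive-config.sh 'export PATH=$PATH:$HIVE_HOME/bin'
```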
Create hive/conf/hive-site.xml with the following content:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
<description>
Should metastore do authorization against database notification related APIs such as get_next_notification.
If set to true, then only the superusers in proxy settings have the permission
</description>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
<description>
Setting this property to true will have HiveServer2 execute
Hive operations as the user making the calls to it.
</description>
</property>
</configuration>
Create the HDFS directories that Hive needs and make them group-writable:
PS C:\> hdfs dfs -mkdir /tmp
PS C:\> hdfs dfs -mkdir -p /user/hive/warehouse
PS C:\> hdfs dfs -chmod g+w /tmp
PS C:\> hdfs dfs -chmod g+w /user/hive/warehouse
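The same four steps can be sketched as a re-runnable bash function. It uses `-mkdir -p` for both directories so that it does not fail when /tmp already exists. `ensure_hive_dirs` and the `HDFS_CMD` override are hypothetical names introduced here for illustration (the override allows a dry run with `HDFS_CMD=echo`); they are not part of Hadoop.

```shell
#!/usr/bin/env bash
# Sketch: create the HDFS directories Hive expects and make them group-writable.
# HDFS_CMD is a hypothetical override hook for dry runs; it defaults to "hdfs".
ensure_hive_dirs() {
  local hdfs_cmd="${HDFS_CMD:-hdfs}"
  local dir
  for dir in /tmp /user/hive/warehouse; do
    # -p creates missing parents and succeeds if the directory already exists.
    "$hdfs_cmd" dfs -mkdir -p "$dir" || return 1
    "$hdfs_cmd" dfs -chmod g+w "$dir" || return 1
  done
}

# Dry run that only prints the commands:
#   HDFS_CMD=echo ensure_hive_dirs
# Real run (requires hdfs on the PATH):
#   ensure_hive_dirs
```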
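Initialize the metastore schema. Derby is used because this environment is meant to be a temporary, completely self-contained solution. Embedded Derby creates its metastore_db directory in the current working directory, so run this and the following commands from the same directory:
$ cd /cygdrive/d/data-analytics
$ schematool -dbType derby -initSchema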
Start HiveServer2:
$ cd /cygdrive/d/data-analytics
$ $HIVE_HOME/bin/hive --service hiveserver2 start
Connect with Beeline:
$ cd /cygdrive/d/data-analytics
$ $HIVE_HOME/bin/beeline -u jdbc:hive2://pshp111:10000
This does work, but HiveServer2 needs some time after it is started before it accepts connections. If you connect too soon, it will appear that the server is not running.
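Rather than guessing how long to wait, the port can be polled before Beeline is launched. This is a minimal sketch that relies on bash's built-in /dev/tcp redirection, assuming the bash in use was built with network redirections enabled; `wait_for_port` and the 120-second timeout in the example are hypothetical choices.

```shell
#!/usr/bin/env bash
# Sketch: poll a TCP port until it accepts connections or a timeout expires.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-60}"
  local deadline=$(( SECONDS + timeout ))
  while (( SECONDS < deadline )); do
    # The subshell opens a TCP connection and closes it again on exit;
    # connection errors are silenced.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Example: wait up to 120 seconds for HiveServer2, then connect.
# wait_for_port pshp111 10000 120 && $HIVE_HOME/bin/beeline -u jdbc:hive2://pshp111:10000
```

An open port only shows that the listener is up, so the first Beeline connection may still be slow.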
Once connected, create a test database and table:
create database if not exists test_db;
use test_db;
create table test_table (id bigint not null, value varchar(100));
show tables;
You can browse HDFS to confirm the table was created: http://pshp111:9870/explorer.html#/user/hive/warehouse/test_db.db/test_table
insert into test_table (id,value) values (1,'ABC'),(2,'DEF');
This will start a MapReduce job that can be viewed at: http://pshp111:8088/cluster
- Apache Hive 3.0.0 Installation on Windows 10 Step by Step Guide - Installing and Running Hadoop and Spark on Windows - https://kontext.tech/docs/DataAndBusinessIntelligence/p/apache-hive-300-installation-on-windows-10-step-by-step-guide
- Setup Hive on Hadoop YARN Cluster - https://sysadmins.co.za/setup-hive-on-hadoop-yarn-cluster/
- Failed to initialize schema for HiveServer2 in Apache Hive 3.0.0 on Cygwin (Windows 10) - https://stackoverflow.com/questions/52719538/failed-to-initialize-schema-for-hiveserver2-in-apache-hive-3-0-0-on-cygwin-wind
- Cannot connect to hive using beeline, user root cannot impersonate anonymous - https://stackoverflow.com/questions/43180305/cannot-connect-to-hive-using-beeline-user-root-cannot-impersonate-anonymous
- Configuring the JDBC Interpreter for Apache Drill and Apache Hive - https://mapr.com/docs/61/Zeppelin/ConfigureJDBCInterpreter.html#concept_b5l_xdk_qbb__section_a5z_d2k_qbb
- Running Hive Queries in Zeppelin - https://mapr.com/docs/61/Zeppelin/ZeppelinHive.html