Some Hive gotchas for non-production setups…

Gluster

2013-11-15

Setting up your IDE to test Hive: export the Hadoop environment variables… Maybe some day it will be as easy to run Hive fully locally as it is to run Pig.

FYI, this was just an initial post. For a deeper understanding of the real details of how hive works check here: http://jayunit100.blogspot.com/2013/12/the-anatomy-of-jdbc-connection-in-hive.html.

One of the main differences between Pig and Hive is that Hive is built firmly on top of MapReduce and Hadoop, whereas Pig is designed more as a language that could potentially run on many platforms. One consequence is that testing Hive code in your IDE, or purely from a Maven/Java build, is non-trivial… You really need the Hadoop tarball and the correct Hive configuration parameters in order to do local testing.

1) A side note for those testing against a real running Hive server: when running the Hive server locally or remotely, I always change “localhost” to 0.0.0.0. If you bind to localhost, the server listens only on the loopback interface, so outside machines can't get in. Binding your Hive server to 0.0.0.0 makes it listen on all interfaces, which is why it matters. You do that in your hive-site.xml file.

2) Maven dependencies for the source seemed a little unstable… I had to manually wget a couple of them, including:

 http://repo.springsource.org/plugins-release/sqlline/sqlline/1_0_2/sqlline-1_0_2.jar
 http://www.datanucleus.org/downloads/maven2/javax/jdo/jdo2-api/2.3-ec/jdo2-api-2.3-ec.jar

I then used the autogenerated mvn install directives to get them properly onto my classpath.

3) FYI, I had to remove the db.lck and dbex.lck files created by Derby when toggling between the Hive local client and the server. Not sure if that's the right way to do it… but otherwise you can't start a functional Hive server if there is an existing Derby db in your working directory. When you start a new Hive instance, the lock files, if they exist, prevent initialization of a metastore in the same directory.
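The lock-file cleanup in point 3 can be sketched in plain Java. This is only an illustration of the manual step described above: the class name DerbyLockCleaner is made up, and the default directory metastore_db is an assumption about where the embedded Derby database lives in your working directory. Only run something like this when no Hive process is still using that database.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: removes stale Derby lock files before starting Hive. */
final class DerbyLockCleaner {

    // The two lock file names mentioned in the post.
    private static final String[] LOCK_FILES = {"db.lck", "dbex.lck"};

    /**
     * Deletes db.lck and dbex.lck from the given Derby database directory
     * and returns the paths that were actually removed.
     */
    static List<Path> removeLocks(Path derbyDir) {
        List<Path> removed = new ArrayList<>();
        for (String name : LOCK_FILES) {
            Path lock = derbyDir.resolve(name);
            try {
                if (Files.deleteIfExists(lock)) {
                    removed.add(lock);
                }
            } catch (IOException e) {
                throw new UncheckedIOException("could not delete " + lock, e);
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        // Assumption: the Derby db is ./metastore_db unless a path is passed in.
        Path dir = Paths.get(args.length > 0 ? args[0] : "metastore_db");
        System.out.println("removed: " + removeLocks(dir));
    }
}
```

Calling removeLocks on a directory with no lock files (or on a directory that does not exist) simply returns an empty list, so it is safe to run before every test.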

5) I finally found a cool way to get Hive running in local mode in Eclipse without a full installation. It initially seemed tricky, but luckily mike@hortonworks.com had a solution.

By following the basic Hive tutorials, pulling in the dependencies from the Hive source pom.xml, and using the connection URL “jdbc:hive://”, Hive will launch in local mode from inside Java. 🙂

BUT WAIT: it will still try to launch Hadoop from $HADOOP_HOME, so you will want to have the Hadoop tarball downloaded and unpacked to run Hive jobs from your IDE. Just export HADOOP_HOME in your environment (see the figure at the top of this page).
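Here is a rough sketch of what point 5 looks like in code, with a pre-flight check for HADOOP_HOME so the failure is obvious. The class name EmbeddedHiveSmokeTest and the helper hadoopHomeProblem are made up for illustration, and the check for a bin/hadoop launcher is an assumption about the tarball layout. The driver class is the HiveServer1-era org.apache.hadoop.hive.jdbc.HiveDriver, so the Hive jars need to be on the classpath for main to get past Class.forName.

```java
import java.io.File;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Map;

/** Hypothetical smoke test: open Hive in embedded mode from inside the IDE. */
final class EmbeddedHiveSmokeTest {

    /** Returns null if HADOOP_HOME looks usable, otherwise a description of the problem. */
    static String hadoopHomeProblem(Map<String, String> env) {
        String home = env.get("HADOOP_HOME");
        if (home == null || home.isEmpty()) {
            return "HADOOP_HOME is not set";
        }
        // Assumption: an unpacked Hadoop tarball has a bin/hadoop launcher script.
        File launcher = new File(home, "bin/hadoop");
        if (!launcher.isFile()) {
            return "no bin/hadoop under " + home;
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String problem = hadoopHomeProblem(System.getenv());
        if (problem != null) {
            System.err.println(problem);
            return;
        }
        // Needs the Hive jars on the classpath; throws ClassNotFoundException otherwise.
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        // No host or port in the URL: this is the local-mode URL from the post.
        try (Connection con = DriverManager.getConnection("jdbc:hive://", "", "");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```

In Eclipse, HADOOP_HOME goes into the run configuration's environment tab; from a shell, export it before launching the JVM.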

6) The Hive LOAD DATA commands can move your data files, and a LOAD DATA followed by a DROP TABLE on a managed table effectively deletes them. In local mode, this means that your local files will be moved to another location on the same FS. In distributed mode, it means that the files will be moved from one place on the DFS to another. Either way the semantics are a little wacky. So just be careful!

LOAD DATA LOCAL INPATH results in a COPY of local data into the dfs.

... but ...

LOAD DATA INPATH results in a MOVE of data already on the DFS into the table's location.
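To make the copy-versus-move difference concrete, here is a small local-filesystem analogy in plain Java. It is not Hive code and says nothing about how Hive implements LOAD DATA; the class LoadDataAnalogy and both method names are made up, and java.nio's Files.copy and Files.move simply stand in for the two behaviours.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical analogy for the two LOAD DATA variants, using only local files. */
final class LoadDataAnalogy {

    /** Like LOAD DATA LOCAL INPATH: the source is copied and stays where it was. */
    static Path loadLocal(Path source, Path tableDir) throws IOException {
        return Files.copy(source, tableDir.resolve(source.getFileName()));
    }

    /** Like LOAD DATA INPATH: the source is moved, so the original path disappears. */
    static Path load(Path source, Path tableDir) throws IOException {
        return Files.move(source, tableDir.resolve(source.getFileName()));
    }

    public static void main(String[] args) throws IOException {
        Path staging = Files.createTempDirectory("staging");
        Path tableDir = Files.createTempDirectory("warehouse_table");

        Path a = Files.write(staging.resolve("a.csv"), "1,foo\n".getBytes(StandardCharsets.UTF_8));
        Path b = Files.write(staging.resolve("b.csv"), "2,bar\n".getBytes(StandardCharsets.UTF_8));

        loadLocal(a, tableDir); // copy: a.csv remains in staging
        load(b, tableDir);      // move: b.csv is gone from staging

        System.out.println("a.csv still in staging: " + Files.exists(a));
        System.out.println("b.csv still in staging: " + Files.exists(b));
    }
}
```

After the move variant, the only copy of b.csv lives in the table directory, which is why dropping the table afterwards can take your data with it.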
