The Gluster Blog


Yarn is an APP : Recognize !

Gluster
2014-01-24

This post is very raw… I’ll refine it later. But for those who run into ClassNotFoundException or NoSuchMethodError issues, it might be useful food for thought.

Poking around in Ambari recently, I found that there is a parameter that defines the default YARN classpath:

yarn.application.classpath

SO WHAT you say… OF COURSE YARN HAS A CLASSPATH.  Who cares???

Well … I care !

The reason is that I always seem to run into NoSuchMethodError issues, especially when hadooping. I wrote up my last such adventure in Hadoop here. Why did it happen? Because Cassandra seemed to be bundling its own obsolete version of Avro in my classpath. So I’m very dubious of the global Hadoop lib/ classpath.

I’m happy that YARN and MapReduce are decoupled, but I’m afraid that YARN has so much brought into its classpath by default. See below.

(Note: my fears here might be unjustified, but lots of jars on the classpath usually spell nightmares for me.)

Property: yarn.application.classpath

Default value: $HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*

Description: CLASSPATH for YARN applications. A comma-separated list of CLASSPATH entries.
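That value is a lot to read as one long string, so here is a rough Python sketch that splits it on commas and fills in the $VARIABLES. The function name and the example paths in the usage note are my own assumptions, not anything Hadoop ships; substitute whatever your install really uses.

```python
from string import Template

# The default value quoted above, written out as one comma-separated string.
DEFAULT_YARN_CLASSPATH = (
    "$HADOOP_CONF_DIR,"
    "$HADOOP_COMMON_HOME/share/hadoop/common/*,"
    "$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,"
    "$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,"
    "$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,"
    "$HADOOP_YARN_HOME/share/hadoop/yarn/*,"
    "$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*"
)


def expand_classpath(value, env):
    """Split a comma-separated classpath value and substitute $VARS from env.

    Variables missing from env are left untouched, so it is easy to spot
    which ones are not set.
    """
    entries = []
    for raw in value.split(","):
        raw = raw.strip()
        if raw:
            entries.append(Template(raw).safe_substitute(env))
    return entries
```

Calling expand_classpath(DEFAULT_YARN_CLASSPATH, {"HADOOP_COMMON_HOME": "/usr/lib/hadoop"}) with a made-up home directory gives seven entries, with the two common ones resolved and the rest still showing their $VARIABLE names. Passing os.environ instead would show what the current shell resolves.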

Anyways… The old “/usr/lib/hadoop/lib” directory is not as universal as it used to be.

What does this mean for people migrating to YARN?

You need to be smart about where you put your libs. Hadoop is “sort of” OSGi now (i.e. YARN has its own libraries), and that’s a good thing: it means less likelihood of Hadoop apps tripping over differences in the Hadoop library jars they inherit their classpath from.

But the downside? 

Now, if you were preloading your hadoop/lib directory with some libraries for runtime deps, your MapReduce apps need you to either (1) add that lib/ directory to yarn.application.classpath, or (2) change where you are putting those jars (e.g. put them in /usr/lib/hadoop-yarn/lib). I’m not sure what the exact right deployment idiom is just yet, so don’t quote me.
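For option (1), a minimal Python sketch of the string surgery involved might look like this. The helper name add_lib_dir is hypothetical, and it only builds the new comma-separated value; getting that value into your yarn-site.xml (or Ambari) is still up to you.

```python
def add_lib_dir(classpath_value, lib_dir):
    """Return the classpath value with lib_dir/* appended if it is missing.

    The "/*" wildcard is the Java classpath form that picks up the jars
    in a directory; a bare directory entry would not include them.
    """
    entries = [e.strip() for e in classpath_value.split(",") if e.strip()]
    wildcard = lib_dir.rstrip("/") + "/*"
    if wildcard not in entries:
        entries.append(wildcard)
    return ",".join(entries)
```

So add_lib_dir(current_value, "/usr/lib/hadoop/lib") tacks /usr/lib/hadoop/lib/* onto the end, and calling it a second time changes nothing.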

Just recognize that YARN classpath might not, by default, inherit from /usr/lib/hadoop/lib.

Anyways…

Will update this post once I learn more. FYI, for those of you using Ambari: I think the Ambari folks have been gracious enough to include /usr/lib/hadoop/lib in the default YARN classpath. This is good (if you are lazy), but might be bad (if you hate the runtime NoSuchMethodErrors that tend to occur when you have different versions of the same jar floating around in your classpath, plus cutting-edge API usage in your apps).
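If you want to hunt for that kind of clash before it bites, here is a rough Python sketch that groups jar file names by artifact and reports any artifact showing up in more than one version. It assumes jars are named artifact-version.jar with the version starting with a digit, which is a common convention but not a guarantee, and the function name is my own.

```python
import re
from collections import defaultdict

# Assumed naming convention: "<artifact>-<version>.jar", where the version
# starts with a digit (e.g. "commons-lang-2.6.jar").
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def find_version_conflicts(jar_paths):
    """Map each artifact seen in more than one version to its sorted versions."""
    versions = defaultdict(set)
    for path in jar_paths:
        name = path.rsplit("/", 1)[-1]  # keep only the file name
        match = JAR_NAME.match(name)
        if match:
            versions[match.group("artifact")].add(match.group("version"))
    return {a: sorted(v) for a, v in versions.items() if len(v) > 1}
```

Feed it the jar paths from every directory on your classpath (for example, collected with glob.glob). Jars that do not follow the naming convention are silently skipped, so an empty result is a hint, not proof, that you are safe.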

Moral of the story: 

At the very least, you should know that YARN apps have a default classpath, and that hadoop/lib/ isn’t necessarily on it.


