This post is very raw, and I'll refine it later. But for those who run into ClassNotFoundException/NoSuchMethodError issues, it might be useful food for thought.
Poking around in Ambari recently, I found that there is a parameter that defines the default YARN classpath:
yarn.application.classpath
So what, you say. Of course YARN has a classpath. Who cares?
Well, I care!
The reason is that I always seem to run into NoSuchMethodError issues, especially when hadooping. I wrote up my last NoSuchMethodError adventure in Hadoop here. Why did it happen? Because Cassandra seemed to be bundling its own obsolete version of Avro in my classpath. So I'm very dubious of the global Hadoop lib/ classpath.
I'm happy that YARN and MapReduce are decoupled, but I'm afraid that YARN brings so much into its classpath by default. See below.
(Note: my fears here might be unjustified, but lots of jars on the classpath usually spell nightmares for me.)
yarn.application.classpath | $HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/* | CLASSPATH for YARN applications. A comma-separated list of CLASSPATH entries |
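To see what that default actually pulls in on a given node, it helps to expand the comma-separated value into concrete paths. Here is a minimal sketch, assuming the `$HADOOP_*` variables are set in the environment you run it from; `expand_classpath` is a hypothetical helper of mine, not part of Hadoop, and it only approximates the `dir/*` wildcard by matching `*.jar` files.

```python
import glob
import os
import string

# Default value copied from the table row above.
YARN_APPLICATION_CLASSPATH = ",".join([
    "$HADOOP_CONF_DIR",
    "$HADOOP_COMMON_HOME/share/hadoop/common/*",
    "$HADOOP_COMMON_HOME/share/hadoop/common/lib/*",
    "$HADOOP_HDFS_HOME/share/hadoop/hdfs/*",
    "$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*",
    "$HADOOP_YARN_HOME/share/hadoop/yarn/*",
    "$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*",
])


def expand_classpath(value, env=None):
    """Return a list of (raw_entry, [concrete paths]) pairs.

    Hypothetical helper: splits the comma-separated value, substitutes
    $VARS from `env`, and lists the jars behind each trailing "/*".
    Variables missing from `env` are left as-is (e.g. "$HADOOP_YARN_HOME").
    """
    env = os.environ if env is None else env
    result = []
    for raw in value.split(","):
        raw = raw.strip()
        if not raw:
            continue
        expanded = string.Template(raw).safe_substitute(env)
        if expanded.endswith("/*"):
            # Approximation of the classpath wildcard: only *.jar files.
            jars = sorted(glob.glob(os.path.join(expanded[:-2], "*.jar")))
            result.append((raw, jars))
        else:
            result.append((raw, [expanded]))
    return result
```

Calling `expand_classpath(YARN_APPLICATION_CLASSPATH)` on a cluster node and printing each entry with its jar count gives a quick feel for how much a YARN app inherits before it adds a single jar of its own.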
Anyway, the old “/usr/lib/hadoop/lib” directory is no longer as universal as it used to be.
What does this mean for people migrating to YARN?
You need to be smart about where you put your libs. Hadoop is “sort of” OSGi (i.e. YARN now has its own libraries) and that's a good thing: it means less likelihood of Hadoop apps tripping over differences in the Hadoop library jars from which they inherit their classpath.
But the downside?
If you were preloading your hadoop/lib directory with libraries for runtime deps, your MapReduce apps now need you to either (1) add that lib/ directory to yarn.application.classpath, or (2) change where you put those jars (e.g. put them in /usr/lib/hadoop-yarn/lib). I'm not sure what the exact right deployment idiom is just yet, so don't quote me.
Just recognize that the YARN classpath might not, by default, inherit from /usr/lib/hadoop/lib.
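For option (1), the change amounts to appending one more entry to the comma-separated value and putting the result in yarn-site.xml. A rough sketch of that edit follows; `add_classpath_entry` and `render_property` are hypothetical helpers for illustration, and I'm assuming the usual `<property><name>…</name><value>…</value></property>` layout of Hadoop config files.

```python
from xml.sax.saxutils import escape


def add_classpath_entry(value, entry):
    """Append `entry` to a comma-separated classpath value, once.

    Hypothetical helper. The entry goes last, so jars already on the
    classpath are searched before the ones being added.
    """
    entries = [e.strip() for e in value.split(",") if e.strip()]
    if entry not in entries:
        entries.append(entry)
    return ",".join(entries)


def render_property(name, value):
    """Render one Hadoop-style <property> element as text."""
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>"
    )


# Example: add the old shared lib directory to a shortened classpath value.
current = "$HADOOP_CONF_DIR,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*"
updated = add_classpath_entry(current, "/usr/lib/hadoop/lib/*")
print(render_property("yarn.application.classpath", updated))
```

Appending rather than prepending is a deliberate choice here: if the same class shows up in two places, the earlier classpath entry normally wins, so the extra directory cannot shadow the jars YARN ships with. Whether that is the order you want depends on which copy your app was built against.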
Anyway, I will update this post once I learn more. FYI: for those of you using Ambari, I think the Ambari folks have been gracious enough to include /usr/lib/hadoop/lib in the default YARN classpath. This is good (if you are lazy), but might be bad (if you hate the runtime NoSuchMethodErrors that tend to occur when you have different versions of the same jar floating around in your classpath plus cutting-edge API usage in your apps).
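One cheap way to spot trouble before it turns into a NoSuchMethodError is to look for an artifact that shows up with more than one version across the classpath directories. The sketch below is a filename heuristic only, assuming jars follow the common `name-version.jar` convention; `find_version_conflicts` is a hypothetical helper, and it will not catch classes repackaged inside another jar.

```python
import os
import re
from collections import defaultdict

# "avro-1.7.4.jar" -> artifact "avro", version "1.7.4".
# Assumes the version starts with a digit, which is a convention, not a rule.
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def find_version_conflicts(jar_paths):
    """Map artifact name -> sorted versions, for artifacts seen with 2+ versions."""
    versions = defaultdict(set)
    for path in jar_paths:
        match = JAR_NAME.match(os.path.basename(path))
        if match:
            versions[match.group("artifact")].add(match.group("version"))
    return {
        artifact: sorted(found)
        for artifact, found in versions.items()
        if len(found) > 1
    }


# Example with made-up paths: two Avro versions on one classpath.
conflicts = find_version_conflicts([
    "/usr/lib/hadoop/lib/avro-1.4.0.jar",
    "/usr/lib/hadoop-yarn/lib/avro-1.7.4.jar",
    "/usr/lib/hadoop-yarn/lib/guava-11.0.2.jar",
])
print(conflicts)
```

Feed it the jar paths from every directory on your YARN classpath; anything it reports is worth a second look, since which version gets loaded depends on classpath order.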
Moral of the story:
At the very least, you should know that YARN apps have a default classpath, and that hadoop/lib/ isn't necessarily on it.