As awesome as HDFS is, the beauty of a FUSE-mounted file system is that you can monitor everything using standard *nix utils (well, as long as the operations are happening locally at least; thanks Martin for pointing that out, see comments below)…
Anyways… We normally FUSE mount gluster, so when we run hadoop on top of gluster, it's easy to watch what's going on behind the scenes using POSIX utilities. Case in point: inotifywait.
Today, while debugging some YARN operations, I got a chance to try out inotify-tools. Its inotifywait command, built on the Linux inotify API, is a simple real-time file monitoring tool that can watch a directory tree recursively.
FYI, this technique is not specific to hadoop and gluster; I'm just using those as examples. You could just as well use inotify-tools to monitor any other file operations.
Other possible awesome uses of the inotify utilities:
– monitoring static files served by a web server, for example, to confirm that the same files/directories weren't being read too often or poorly cached from a shared storage pool.
– monitoring the number of file ops occurring in the data/ directories of an RDBMS to confirm that too much disk I/O wasn't occurring.
– etc etc etc…
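To make the web server idea a bit more concrete, here is a minimal sketch. The function name watch_reads and the /var/www/html path are assumptions for illustration only; the flags are standard inotifywait options (-e selects event types, --format shapes each output line).

```shell
# Hypothetical helper: print one line for every open/read under a directory.
# Usage: watch_reads /var/www/html   (the path is only an example)
watch_reads() {
  # -m: keep running instead of exiting after the first event
  # -r: recurse into subdirectories
  # %w%f: watched directory + file name, %e: comma-separated event names
  inotifywait -m -r -e open -e access \
    --format '%w%f %e' \
    "${1:?usage: watch_reads <directory>}"
}
```

If the same file shows up over and over, that is a hint it is being re-read from storage rather than served from a cache.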
How YARN, the FileSystem, and FUSE intersect in a gluster deployment.
(You can skip this if all you care about is installing and running inotify-tools)…
For those wondering what I mean by “file system”… Hadoop was born as two projects: a file system and a mapreduce framework. The file system side exposes a Java API (HDFS is its best-known implementation), which anyone can implement on top of any particular file system, so that different file systems can be used underneath mapreduce.
Fast forward a few years, and along comes YARN, which further decouples mapreduce into a resource allocator and the mapreduce application.
In any case, YARN needs a FileSystem implementation for some of its distributed data (i.e. the distributed staging/ directory).
In this particular scenario, I was attempting to trace some operations occurring on the file system. Since the gluster implementation of the hadoop file system is mounted over FUSE (see https://forge.gluster.org/hadoop/glusterfs-hadoop for details), there is no need for Java-specific hacks or log inspection: we can simply run standard *nix file monitoring utilities to see what Java FileSystem operations YARN is doing under the hood on startup.
Caveat: file ops are only seen on the node where they happen, so you may need to run this on multiple nodes. TL;DR ~ run it on your YARN master node, so that you can see everything YARN is doing on startup. OR ELSE, run it on every node.
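If you do end up running it on every node, it helps to tag each line with the node name and a timestamp so the per-node logs can be merged afterwards. A minimal sketch, assuming inotify-tools is already installed on each node (installation is covered below); the function name watch_tagged is hypothetical, while --timefmt and --format are standard inotifywait options.

```shell
# Hypothetical helper: prefix every event with a sortable timestamp and this
# node's hostname. Usage (on each node): watch_tagged /mnt/glusterfs > events.log
watch_tagged() {
  # %T: time rendered with --timefmt, %w%f: watched dir + file name, %e: events
  inotifywait -m -r \
    --timefmt '%Y-%m-%d %H:%M:%S' \
    --format "%T $(hostname) %w%f %e" \
    "${1:?usage: watch_tagged <directory>}"
}
```

Because each line starts with a year-first timestamp, copying the logs to one machine and running a plain sort over them gives a rough cluster-wide timeline (only as accurate as the clock sync between nodes).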
So, anyways, here’s how to install inotify-tools and run it recursively against a folder:
1) First install inotify-tools from EPEL.
#> rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm
#> yum install inotify-tools
2) Watch YARN do its 'thang
#> inotifywait -r -m /mnt/glusterfs/
/mnt/glusterfs/ CREATE,ISDIR tmp
/mnt/glusterfs/ OPEN,ISDIR tmp
/mnt/glusterfs/ CLOSE_NOWRITE,CLOSE,ISDIR tmp
/mnt/glusterfs/tmp/ CREATE,ISDIR hadoop-yarn
/mnt/glusterfs/tmp/ OPEN,ISDIR hadoop-yarn
/mnt/glusterfs/tmp/ CLOSE_NOWRITE,CLOSE,ISDIR hadoop-yarn
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ ATTRIB,ISDIR done
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done/ ATTRIB,ISDIR
/mnt/glusterfs/lib/ ACCESS glusterfs-hadoop.jar
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CREATE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ ATTRIB,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ ATTRIB,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ ATTRIB,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ ATTRIB,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done/ OPEN,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done/ CLOSE_NOWRITE,CLOSE,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ OPEN,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ CLOSE_NOWRITE,CLOSE,ISDIR
/mnt/glusterfs/lib/ OPEN glusterfs-hadoop.jar
/mnt/glusterfs/lib/ ACCESS glusterfs-hadoop.jar
/mnt/glusterfs/lib/ ACCESS glusterfs-hadoop.jar
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ OPEN,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ CLOSE_NOWRITE,CLOSE,ISDIR
/mnt/glusterfs/lib/ ACCESS glusterfs-hadoop.jar
/mnt/glusterfs/lib/ ACCESS glusterfs-hadoop.jar
/mnt/glusterfs/lib/ ACCESS glusterfs-hadoop.jar
/mnt/glusterfs/lib/ ACCESS glusterfs-hadoop.jar
/mnt/glusterfs/lib/ ACCESS glusterfs-hadoop.jar
/mnt/glusterfs/lib/ ACCESS glusterfs-hadoop.jar
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ OPEN,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ CLOSE_NOWRITE,CLOSE,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ OPEN,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ CLOSE_NOWRITE,CLOSE,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ OPEN,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ CLOSE_NOWRITE,CLOSE,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ OPEN,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ CLOSE_NOWRITE,CLOSE,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ OPEN,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ CLOSE_NOWRITE,CLOSE,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ OPEN,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ CLOSE_NOWRITE,CLOSE,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done/ OPEN,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done/ CLOSE_NOWRITE,CLOSE,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ OPEN,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ CLOSE_NOWRITE,CLOSE,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ OPEN,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ CLOSE_NOWRITE,CLOSE,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ OPEN,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ CLOSE_NOWRITE,CLOSE,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ OPEN,ISDIR
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ CLOSE_NOWRITE,CLOSE,ISDIR
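That is a lot of near-identical lines, so it can be easier to read as counts. A minimal sketch, assuming the default inotifywait output format shown above (watched directory, events, optional file name) and paths without spaces; the function name summarize_events is hypothetical.

```shell
# Hypothetical helper: count how often each (path, events) pair occurs in
# default-format inotifywait output read from stdin.
summarize_events() {
  awk '
    # $1 = watched dir, $2 = EVENT[,EVENT...], $3 = file name (may be absent)
    NF >= 2 { count[$1 $3 " " $2]++ }
    END     { for (k in count) print count[k], k }
  ' | LC_ALL=C sort -k1,1nr -k2   # busiest first, then by path
}
```

One way to use it: redirect the monitor to a file (inotifywait -r -m /mnt/glusterfs/ > events.log), stop it with Ctrl-C once YARN has started, then run summarize_events < events.log.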