<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<tt>Some more analysis wrt storage space,</tt><tt><br>
</tt><tt><br>
</tt><tt>"Since support was added to the Linux kernel, there is a
hard limit of</tt><tt><br>
</tt><tt>64KiB for the size of each extended attribute value,
however different</tt><tt><br>
</tt><tt>file systems impose additional constraints. For ext2/3/4
and btrfs,</tt><tt><br>
</tt><tt>each extended attribute is limited to a file system block
(e.g. 4 KiB),</tt><tt><br>
</tt><tt>and all (including names and values) must fit together in a
single</tt><tt><br>
</tt><tt>block. In XFS the names can be up to 256 bytes in length,
terminated</tt><tt><br>
</tt><tt>by the first 0-byte, and the values can be up to 64KB of
arbitrary</tt><tt><br>
</tt><tt>binary data. ReiserFS allows attributes of arbitrary size."</tt><tt><br>
</tt><tt><a class="moz-txt-link-freetext" href="https://en.wikipedia.org/wiki/Extended_file_attributes">https://en.wikipedia.org/wiki/Extended_file_attributes</a></tt><tt><br>
</tt><tt><br>
</tt><tt>I created a shell script that sets 100 xattrs on a file, each
with a</tt><tt><br>
</tt><tt>basename value roughly 255 characters long.</tt><tt><br>
</tt><tt><br>
# -------------------<br>
</tt><tt>#!/bin/bash</tt><tt><br>
</tt><tt>file="$1" # file that receives the test xattrs</tt><tt><br>
</tt><tt>for i in {1..100}</tt><tt><br>
</tt><tt>do</tt><tt><br>
</tt><tt> f="very very very very loooooooooooooooooong file
nameeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee$i";</tt><tt><br>
</tt><tt> # hash the basename so that it can become part of the xattr name</tt><tt><br>
</tt><tt> h=$(echo "$f" | md5sum | awk '{print $1}');</tt><tt><br>
</tt><tt> setfattr -n
"trusted.pgfid.3c3b44ab-f21f-4801-a0bc-5a337bd5047c.$h" -v "$f"
"$file";</tt><tt><br>
</tt><tt>done<br>
</tt><tt># -------------------</tt><tt><br>
</tt><tt><br>
Let me know if anybody thinks space could be an issue for storing
this</tt><tt><br>
</tt><tt>information in xattrs.</tt><tt><br>
</tt><tt><br>
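</tt><tt>A rough estimate of the raw bytes involved, as a sketch only. It assumes a 36-character GFID, a 32-character MD5 hex digest and a 255-byte basename, and it ignores any per-entry overhead the file system adds:</tt>
<pre>
```python
PREFIX = "trusted.pgfid."   # xattr name prefix used by the shell script
GFID_LEN = 36               # canonical text form of a GFID (UUID)
MD5_HEX_LEN = 32            # md5sum prints 32 hex characters
VALUE_LEN = 255             # assumed basename length
COUNT = 100                 # number of xattrs the script sets

# name = prefix + parent GFID + "." + hash of the basename
name_len = len(PREFIX) + GFID_LEN + 1 + MD5_HEX_LEN
total = COUNT * (name_len + VALUE_LEN)

print("bytes per xattr name:", name_len)
print("raw bytes for all names and values:", total)
```
</pre>
<tt>The total can then be compared with the per-file-system limits quoted above.</tt>
<tt>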
</tt><tt>Other experiments:</tt><tt><br>
------------------<br>
</tt><tt>As a POC, I created two Python scripts: one to create the index
and another</tt><tt><br>
</tt><tt>to retrieve the value (GFID to path). I used MD5 as the hash for
the POC.</tt><tt><br>
</tt><tt><br>
</tt><tt><a class="moz-txt-link-freetext" href="https://gist.github.com/aravindavk/5307489f68cbcfb37d3d">https://gist.github.com/aravindavk/5307489f68cbcfb37d3d</a></tt><tt><br>
</tt><tt><a class="moz-txt-link-freetext" href="https://gist.github.com/aravindavk/d1d0ca9c874b7d3d8d86">https://gist.github.com/aravindavk/d1d0ca9c874b7d3d8d86</a></tt><tt><br>
</tt><tt><br>
</tt><tt>python pgfid_index.py &lt;BRICK_PATH&gt; # Updates required
xattrs </tt><tt>for all files<br>
</tt><tt><br>
</tt><tt>and</tt><tt><br>
</tt><tt><br>
</tt><tt>python gfid_to_path.py &lt;BRICK_PATH&gt; &lt;GFID&gt;</tt><tt>
# Returns Path for given GFID<br>
</tt><tt><br>
</tt><tt>Note: This script uses `user.pgfid` prefix for xattr
instead of</tt><tt><br>
</tt><tt>`trusted.pgfid` for POC.</tt><tt><br>
</tt><tt><br>
</tt><tt>Once the design is finalized, I will update storage/posix
code.</tt><tt><br>
</tt><tt><br>
</tt><tt>Backward compatibility:</tt><tt><br>
</tt><tt>-----------------------</tt><tt><br>
</tt><tt>The same interface will be used to retrieve the information, that is:</tt><tt><br>
</tt><tt><br>
</tt><tt>gluster volume set test build-pgfid on</tt><tt><br>
</tt><tt>getfattr -n glusterfs.ancestry.path -e text
/mnt/testvol/.gfid/&lt;GFID&gt;</tt><tt><br>
</tt><tt><br>
</tt><tt>Ref:</tt><tt><br>
</tt><tt><a class="moz-txt-link-freetext" href="https://gluster.readthedocs.org/en/latest/Troubleshooting/gfid-to-path/">https://gluster.readthedocs.org/en/latest/Troubleshooting/gfid-to-path/</a></tt><tt><br>
</tt><tt><br>
</tt><tt>If any other component accesses the xattrs directly instead of
using</tt><tt><br>
</tt><tt>the getfattr interface, then that component needs to be
changed (for</tt><tt><br>
</tt><tt>example, glusterfind).</tt><tt><br>
</tt><tt><br>
</tt><tt>One more step will be introduced after `volume set` to
build the</tt><tt><br>
</tt><tt>index. The current implementation heals pgfid xattrs on
named lookup;</tt><tt><br>
</tt><tt>if we disable this feature, we will have to provide a separate
interface</tt><tt><br>
</tt><tt>to heal (for example, getfattr -n pgfid.heal &lt;PATH&gt;).</tt><br>
<pre class="moz-signature" cols="72">regards
Aravinda</pre>
<div class="moz-cite-prefix">On 12/09/2015 11:17 AM, Aravinda wrote:<br>
</div>
<blockquote cite="mid:5667C08E.2020600@redhat.com" type="cite">Hi,
<br>
<br>
Sharing a draft design for GFID to Path conversion. (Directory GFID
to Path is
<br>
very easy in DHT v.1; this design may not work in case of DHT 2.0.)
<br>
<br>
Performance and storage space impact are yet to be analyzed.
<br>
<br>
Storing the required information
<br>
-------------------------------
<br>
Metadata information related to Parent GFID and Basename will
reside
<br>
with the file. PGFID and hash of Basename will become part of
Xattr
<br>
Key name and Basename will be saved as Value.
<br>
<br>
Xattr Key = meta.&lt;PGFID&gt;.&lt;HASH(BASENAME)&gt;
<br>
Xattr Value = &lt;BASENAME&gt;
<br>
<br>
Non-crypto hash is suitable for this purpose.
<br>
Number of Xattrs on a file = Number of Links
<br>
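<br>
A minimal sketch of how such a key/value pair could be built. MD5 is used
<br>
only because it is in the standard library; the function name and the
<br>
example GFID and basename are illustrative, not part of the design.
<br>
<pre>
```python
import hashlib

def pgfid_xattr(pgfid, basename, prefix="meta"):
    """Return the (key, value) pair describing one link of a file."""
    # MD5 stands in for HASH(); any stable hash of the basename would do.
    digest = hashlib.md5(basename.encode("utf-8")).hexdigest()
    key = "%s.%s.%s" % (prefix, pgfid, digest)
    return key, basename

key, value = pgfid_xattr("3c789e71-24b0-4723-92a2-7eb3c14b4114", "f1")
print(key)
print(value)
```
</pre>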
<br>
Converting GFID to Path
<br>
-----------------------
<br>
Example GFID: 78e8bce0-a8c9-4e67-9ffb-c4c4c7eff038
<br>
<br>
1. List all xattrs of GFID file in the brick backend.
<br>
($BRICK_ROOT/.glusterfs/78/e8/78e8bce0-a8c9-4e67-9ffb-c4c4c7eff038)
<br>
2. If Xattr Key starts with “meta”, Split to get parent GFID and
collect xattr value
<br>
3. Convert the parent GFID to a path by following readlink recursively until the full path is resolved.
<br>
4. Join Converted parent dir path and xattr value(basename)
<br>
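<br>
The steps above can be sketched roughly as follows. The xattrs are passed
<br>
in as a plain dict and the parent lookup (step 3) is a caller-supplied
<br>
function, so that the sketch runs without a brick; both helper names and
<br>
the placeholder hash strings are assumptions made for illustration.
<br>
<pre>
```python
import os

def gfid_backend_path(brick_root, gfid):
    # Step 1: .glusterfs/first two hex chars/next two hex chars/gfid
    return os.path.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)

def paths_from_xattrs(xattrs, resolve_parent):
    """xattrs: dict of xattr key to basename.
    resolve_parent: hypothetical callable, parent GFID to directory path."""
    paths = []
    for key, basename in sorted(xattrs.items()):
        if not key.startswith("meta."):
            continue  # step 2: only the index xattrs are of interest
        pgfid = key.split(".")[1]  # key layout: meta.PGFID.HASH
        # steps 3 and 4: resolve the parent, then append the basename
        paths.append(os.path.join(resolve_parent(pgfid), basename))
    return paths

# Stand-in for the readlink based parent resolution.
parents = {
    "3c789e71-24b0-4723-92a2-7eb3c14b4114": "dir1",
    "6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed": "dir2",
}
xattrs = {
    "meta.3c789e71-24b0-4723-92a2-7eb3c14b4114.aaaa": "f1",  # placeholder hash
    "meta.6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed.bbbb": "h1",  # placeholder hash
    "user.other": "ignored",
}
print(gfid_backend_path("/bricks/b1", "78e8bce0-a8c9-4e67-9ffb-c4c4c7eff038"))
print(paths_from_xattrs(xattrs, parents.get))
```
</pre>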
<br>
Recording
<br>
---------
<br>
MKNOD/CREATE/LINK/SYMLINK: Add new Xattr(PGFID, BN)
<br>
RENAME: Remove old xattr(PGFID1, BN1), Add new xattr(PGFID2, BN2)
<br>
UNLINK: If Link count > 1 then Remove xattr(PGFID, BN)
<br>
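<br>
A small in-memory model of these recording rules, with a dict standing in
<br>
for the xattrs of a file. The function names are made up for this sketch,
<br>
and it assumes that link_count is the count before the unlink, so the
<br>
last unlink leaves the xattr alone because the file itself goes away.
<br>
<pre>
```python
import hashlib

def _key(pgfid, bn):
    digest = hashlib.md5(bn.encode("utf-8")).hexdigest()
    return "meta.%s.%s" % (pgfid, digest)

def on_link(xattrs, pgfid, bn):
    # MKNOD/CREATE/LINK/SYMLINK: add a new xattr (PGFID, BN)
    xattrs[_key(pgfid, bn)] = bn

def on_rename(xattrs, pgfid1, bn1, pgfid2, bn2):
    # RENAME: remove the old xattr, add the new one
    xattrs.pop(_key(pgfid1, bn1), None)
    xattrs[_key(pgfid2, bn2)] = bn2

def on_unlink(xattrs, pgfid, bn, link_count):
    # UNLINK: only touch the xattr while other links remain
    if link_count == 1:
        return
    xattrs.pop(_key(pgfid, bn), None)

D1 = "3c789e71-24b0-4723-92a2-7eb3c14b4114"
D2 = "6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed"
xattrs = {}
on_link(xattrs, D1, "f1")
on_link(xattrs, D2, "h1")
on_rename(xattrs, D1, "f1", D1, "f1_new")
on_unlink(xattrs, D2, "h1", 2)
print(sorted(xattrs.values()))
```
</pre>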
<br>
Heal on Lookup
<br>
--------------
<br>
Healing on lookup can be enabled if required, by default we can
<br>
disable this option since this may have performance implications
<br>
during read.
<br>
<br>
Enabling the logging
<br>
---------------------
<br>
This can be enabled using Volume set option. Option name TBD.
<br>
<br>
Rebuild Index
<br>
-------------
<br>
Offline activity, crawls the backend filesystem and builds all the
required xattrs.
<br>
<br>
Comments and Suggestions Welcome.
<br>
<br>
regards
<br>
Aravinda
<br>
<br>
On 11/25/2015 10:08 AM, Aravinda wrote:
<br>
<blockquote type="cite">
<br>
regards
<br>
Aravinda
<br>
<br>
On 11/24/2015 11:25 PM, Shyam wrote:
<br>
<blockquote type="cite">There seem to be other interested
consumers in gluster for the same information, and I guess we
need a good base design to address this on-disk change, so that
it can be leveraged in the various use cases appropriately.
<br>
<br>
Request a few folks to list out how they would use this
feature and also what performance characteristics they expect
around the same.
<br>
<br>
- gluster find class of utilities
<br>
- change log processors
<br>
- swift on file
<br>
- inotify support on gluster
<br>
- Others?
<br>
</blockquote>
Debugging utilities for users/admins(Show path for GFIDs
displayed in log files)
<br>
Retrigger Sync in Geo-replication(Geo-rep reports failed GFIDs
in logs, we can retrigger sync if path is known instead of GFID)
<br>
<blockquote type="cite">
<br>
[3] is an attempt in XFS to do the same; possibly there is a
later thread on the same topic that discusses newer
approaches.
<br>
<br>
[4] slide 13 onwards talks about how cephfs does this. (see
cephfs inode backtraces)
<br>
<br>
Aravinda, could you put up a design for the same, covering how and
where this information is added, etc.? It would help to review it
from the perspective of other xlators (like the existing DHT).
<br>
<br>
Shyam
<br>
[3] <a class="moz-txt-link-freetext" href="http://oss.sgi.com/archives/xfs/2014-01/msg00224.html">http://oss.sgi.com/archives/xfs/2014-01/msg00224.html</a>
<br>
[4]
<a class="moz-txt-link-freetext" href="http://events.linuxfoundation.org/sites/events/files/slides/CephFS-Vault.pdf">http://events.linuxfoundation.org/sites/events/files/slides/CephFS-Vault.pdf</a><br>
<br>
On 10/27/2015 10:02 AM, Shyam wrote:
<br>
<blockquote type="cite">Aravinda, List,
<br>
<br>
The topic is interesting and also relevant in the case of
DHT2 where we
<br>
lose the hierarchy on a single brick (unlike the older DHT)
and so some
<br>
of the thoughts here are along the same lines as what we are
debating
<br>
w.r.t DHT2 as well.
<br>
<br>
Here is another option that extends the current thought,
that I would
<br>
like to put forward, that is pretty much inspired from the
Linux kernel
<br>
NFS implementation (based on my current understanding of the
same) [1] [2].
<br>
<br>
If gluster server/brick processes handed out handles, (which
are
<br>
currently just GFID (or inode #) of the file), that encode
pGFID/GFID,
<br>
then on any handle based operation, we get the pGFID/GFID
for the object
<br>
being operated on. This solves the first part of the problem
where we
<br>
are encoding the pGFID in the xattr, and here we not only do
that but
<br>
further hand out the handle with that relationship.
<br>
<br>
It also helps when an object is renamed and we still allow
the older
<br>
handle to be used for operations. Not a bad thing in some
cases, and
<br>
possibly not the best thing to do in some other cases (say
access).
<br>
<br>
To further this knowledge back to a name, what you propose
can be stored
<br>
on the object itself. Thus giving us a short dentry tree
creation
<br>
ability of pGFID->name(GFID).
<br>
<br>
This of course changes the gluster RPC wire protocol, as we
need to
<br>
encode/send pGFID as well in some cases (or this could be done by
adding it to
<br>
the xdata payload).
<br>
<br>
Shyam
<br>
<br>
[1] <a class="moz-txt-link-freetext" href="http://nfs.sourceforge.net/#faq_c7">http://nfs.sourceforge.net/#faq_c7</a>
<br>
[2]
<a class="moz-txt-link-freetext" href="https://www.kernel.org/doc/Documentation/filesystems/nfs/Exporting">https://www.kernel.org/doc/Documentation/filesystems/nfs/Exporting</a>
<br>
<br>
On 10/27/2015 03:07 AM, Aravinda wrote:
<br>
<blockquote type="cite">Hi,
<br>
<br>
We have a volume option called "build-pgfid:on" to enable
recording
<br>
parent gfid in file xattr. This simplifies the GFID to
Path conversion.
<br>
Is it possible to save base name also in xattr along with
PGFID? It
<br>
helps in converting GFID to Path easily without doing
crawl.
<br>
<br>
Example structure,
<br>
<br>
dir1 (3c789e71-24b0-4723-92a2-7eb3c14b4114)
<br>
- f1 (0aa94a0a-62aa-4afc-9d59-eb68ad39f78c)
<br>
- f2 (f1e7ad00-6500-4284-b21c-d02766ecc336)
<br>
dir2 (6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed)
<br>
- h1 (0aa94a0a-62aa-4afc-9d59-eb68ad39f78c)
<br>
<br>
Where file f1 and h1 are hardlinks. Note the same GFID.
<br>
<br>
Backend,
<br>
<br>
.glusterfs
<br>
- 3c/78/3c789e71-24b0-4723-92a2-7eb3c14b4114
<br>
- 0a/a9/0aa94a0a-62aa-4afc-9d59-eb68ad39f78c
<br>
- f1/e7/f1e7ad00-6500-4284-b21c-d02766ecc336
<br>
- 6c/3b/6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed
<br>
<br>
Since f1 and h1 are hardlinks across directories, file
xattr will have
<br>
two parent GFIDs. Xattr dump will be,
<br>
<br>
trusted.pgfid.3c789e71-24b0-4723-92a2-7eb3c14b4114=1
<br>
trusted.pgfid.6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed=1
<br>
<br>
Number shows number of hardlinks per parent GFID.
<br>
<br>
If we know GFID of a file, to get path,
<br>
1. Identify which brick has that file using pathinfo
xattr.
<br>
2. Get all parent GFIDs(using listxattr on backend gfid
path
<br>
.glusterfs/0a/a9/0aa94a0a-62aa-4afc-9d59-eb68ad39f78c)
<br>
3. Crawl those directories to find files with same inode
as
<br>
.glusterfs/0a/a9/0aa94a0a-62aa-4afc-9d59-eb68ad39f78c
<br>
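<br>
A sketch of the crawl in step 3, assuming a Linux backend where hardlinks
<br>
share an inode number. The function name is hypothetical; link_counts maps
<br>
each parent directory path (already resolved from its GFID) to the link
<br>
count stored in the matching trusted.pgfid xattr.
<br>
<pre>
```python
import os

def find_links(gfid_file, link_counts):
    """Return the paths in the given parent directories that are
    hardlinks of gfid_file."""
    target = os.stat(gfid_file)
    found = []
    for parent, count in link_counts.items():
        remaining = count
        with os.scandir(parent) as entries:
            for entry in entries:
                if remaining == 0:
                    break  # all links recorded for this parent were found
                st = entry.stat(follow_symlinks=False)
                if (st.st_ino, st.st_dev) == (target.st_ino, target.st_dev):
                    found.append(entry.path)
                    remaining -= 1
    return found
```
</pre>
For the example above it would be called with the backend gfid path of
<br>
0aa94a0a-62aa-4afc-9d59-eb68ad39f78c and the paths of dir1 and dir2, each
<br>
with a count of 1.
<br>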
<br>
Updating PGFID to be done when,
<br>
1. CREATE/MKNOD - Add xattr
<br>
2. RENAME - If moved to different directory, Update PGFID
<br>
3. UNLINK - If number of links is more than 1. Reduce
number of link,
<br>
Remove respective parent PGFID
<br>
4. LINK - Add PGFID if link to different directory,
Increment count
<br>
<br>
Advantages:
<br>
1. Crawling is limited to a few directories instead of
full file system
<br>
crawl.
<br>
2. Break early during crawl when search reaches the
hardlinks number as
<br>
of Xattr value.
<br>
<br>
Disadvantages:
<br>
1. Crawling is expensive if a directory has lot of files.
<br>
2. Updating PGFID when CREATE/MKNOD/RENAME/UNLINK/LINK
<br>
3. This method of conversion will not work if file is
deleted.
<br>
<br>
We can improve performance of GFID to Path conversion if
we record
<br>
Basename also in file xattr.
<br>
<br>
trusted.pgfid.3c789e71-24b0-4723-92a2-7eb3c14b4114=f1
<br>
trusted.pgfid.6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed=h1
<br>
<br>
Note: Multiple base names delimited by zerobyte.
<br>
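<br>
A small sketch of what a zero-byte-delimited value could look like and how
<br>
it would be parsed and updated. The helper names and the second basename
<br>
are made up for illustration.
<br>
<pre>
```python
def pack_basenames(names):
    """Join basenames with a zero byte, which cannot occur in a file name."""
    return b"\0".join(name.encode("utf-8") for name in names)

def unpack_basenames(value):
    """Split an xattr value back into its basenames."""
    return [part.decode("utf-8") for part in value.split(b"\0") if part]

def drop_basename(value, name):
    """Rewrite the value after one hardlink in this directory is removed."""
    names = unpack_basenames(value)
    names.remove(name)
    return pack_basenames(names)

value = pack_basenames(["f1", "f1_copy"])
print(unpack_basenames(value))
print(unpack_basenames(drop_basename(value, "f1")))
```
</pre>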
<br>
What is the additional overhead compared to storing only PGFID?
<br>
1. Space
<br>
2. Number of xattrs will grow as number of hardlinks
<br>
3. Max size issue for xattr value?
<br>
4. Xattr update is needed even when renamed within the same directory.
<br>
5. Updating value of xattr involves parsing in case of
multiple
<br>
hardlinks.
<br>
<br>
Are there any performance issues except during initial
indexing? (Assume
<br>
pgfid and basenames are populated by a separate script.)
<br>
<br>
Comments and Suggestions Welcome.
<br>
<br>
</blockquote>
_______________________________________________
<br>
Gluster-devel mailing list
<br>
<a class="moz-txt-link-abbreviated" href="mailto:Gluster-devel@gluster.org">Gluster-devel@gluster.org</a>
<br>
<a class="moz-txt-link-freetext" href="http://www.gluster.org/mailman/listinfo/gluster-devel">http://www.gluster.org/mailman/listinfo/gluster-devel</a>
<br>
</blockquote>
</blockquote>
<br>
</blockquote>
</blockquote>
<br>
</body>
</html>