<html><body><div style="font-family: garamond,new york,times,serif; font-size: 12pt; color: #000000"><div><br></div><div><br></div><hr id="zwchr"><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;" data-mce-style="border-left: 2px solid #1010FF; margin-left: 5px; padding-left: 5px; color: #000; font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><b>From: </b>"Sahina Bose" &lt;sabose@redhat.com&gt;<br><b>To: </b>"Krutika Dhananjay" &lt;kdhananj@redhat.com&gt;, "Shyam" &lt;srangana@redhat.com&gt;<br><b>Cc: </b>"Gluster Devel" &lt;gluster-devel@gluster.org&gt;<br><b>Sent: </b>Thursday, September 3, 2015 3:56:10 PM<br><b>Subject: </b>Re: [Gluster-devel] Gluster Sharding and Geo-replication<br><div><br></div><br><br><div class="moz-cite-prefix">On 09/03/2015 12:13 PM, Krutika Dhananjay wrote:<br></div><blockquote cite="mid:550072460.12279311.1441262638234.JavaMail.zimbra@redhat.com"><div style="font-family: garamond,new york,times,serif; font-size:
12pt; color: #000000" data-mce-style="font-family: garamond,new york,times,serif; font-size: 12pt; color: #000000;"><div><br></div><div><br></div><hr id="zwchr"><blockquote style="border-left:2px solid
#1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;" data-mce-style="border-left: 2px solid
#1010FF; margin-left: 5px; padding-left: 5px; color: #000; font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><b>From: </b>"Shyam" <a class="moz-txt-link-rfc2396E" href="mailto:srangana@redhat.com" target="_blank" data-mce-href="mailto:srangana@redhat.com"><srangana@redhat.com></a><br><b>To: </b>"Krutika Dhananjay" <a class="moz-txt-link-rfc2396E" href="mailto:kdhananj@redhat.com" target="_blank" data-mce-href="mailto:kdhananj@redhat.com"><kdhananj@redhat.com></a><br><b>Cc: </b>"Aravinda" <a class="moz-txt-link-rfc2396E" href="mailto:avishwan@redhat.com" target="_blank" data-mce-href="mailto:avishwan@redhat.com"><avishwan@redhat.com></a>, "Gluster Devel" <a class="moz-txt-link-rfc2396E" href="mailto:gluster-devel@gluster.org" target="_blank" data-mce-href="mailto:gluster-devel@gluster.org"><gluster-devel@gluster.org></a><br><b>Sent: </b>Wednesday, September 2, 2015 11:13:55 PM<br><b>Subject: </b>Re: [Gluster-devel] Gluster Sharding and Geo-replication<br><div><br></div>On 09/02/2015 10:47 AM, Krutika Dhananjay wrote:<br> ><br> ><br> > ------------------------------------------------------------------------<br> ><br> > *From: *"Shyam" <a class="moz-txt-link-rfc2396E" href="mailto:srangana@redhat.com" target="_blank" data-mce-href="mailto:srangana@redhat.com"><srangana@redhat.com></a><br> > *To: *"Aravinda" <a class="moz-txt-link-rfc2396E" href="mailto:avishwan@redhat.com" target="_blank" data-mce-href="mailto:avishwan@redhat.com"><avishwan@redhat.com></a>, "Gluster Devel"<br> > <a class="moz-txt-link-rfc2396E" href="mailto:gluster-devel@gluster.org" target="_blank" data-mce-href="mailto:gluster-devel@gluster.org"><gluster-devel@gluster.org></a><br> > *Sent: *Wednesday, September 2, 2015 8:09:55 PM<br> > *Subject: *Re: [Gluster-devel] Gluster Sharding and Geo-replication<br> ><br> > On 09/02/2015 03:12 AM, Aravinda wrote:<br> > > Geo-replication and Sharding Team today discussed about the 
approach<br> > > to make Geo-replication Sharding-aware. Details are below.<br> > ><br> > > Participants: Aravinda, Kotresh, Krutika, Rahul Hinduja, Vijay Bellur<br> > ><br> > > - Both Master and Slave Volumes should be Sharded Volumes with the same<br> > > configuration.<br> ><br> > If I am not mistaken, geo-rep supports replicating to a non-gluster<br> > local FS at the slave end. Is this correct? If so, would this<br> > limitation<br> > not make that problematic?<br> ><br> > When you state *same configuration*, I assume you mean the sharding<br> > configuration, not the volume graph, right?<br> ><br> > That is correct. The only requirement is for the slave to have the shard<br> > translator (since someone needs to present an aggregated view of the file to<br> > the READers on the slave).<br> > Also, the shard-block-size needs to be kept the same between master and<br> > slave. The rest of the configuration (like the number of subvols of DHT/AFR)<br> > can vary across master and slave.<br><div><br></div>Do we need the shard block size to be the same? I assume the file <br> carries an xattr that records the size it is sharded with <br> (trusted.glusterfs.shard.block-size), so if that is synced across, it <br> should suffice. If this is true, it would mean that "a sharded volume <br> needs a shard-supported slave to geo-rep to".<br></blockquote><div>Yep. 
I too feel it should probably not be necessary to enforce the same shard size everywhere, as long as the shard translator on the slave takes care not to further "shard" the individual shards that gsyncd writes to the slave volume.<br></div><div>This is especially true if different files/images/vdisks on the master volume are associated with different block sizes.<br></div><div>This logic has to be built into the shard translator based on parameters (client-pid, parent directory of the file being written to).<br></div><div>What this means is that the shard-block-size attribute on the slave would essentially be a don't-care parameter. I need to give all this some more thought though.<br></div></div></blockquote><br><br> I think this may help in coping with changes to the shard block size configuration on the master. Otherwise, once the user changes the shard block size on the master, the slave will be affected.<br> Are there any other shard volume options that, if changed on the master, would affect the slave? How do we ensure master and slave are in sync w.r.t. the shard configuration?</blockquote><div>The shard-block-size itself is tied to a file from its creation, in the form of an xattr. Subsequent changes to the block size of a volume should have no impact on already existing sharded files.<br></div><div>So it is probably not necessary for the shard size option even to be synced to the slave volume, as long as the block-size xattr is correctly synced to the slave's copy of the file.</div><div><br></div><div>Also, shard-block-size is the only tunable option in the shard translator as of today. 
In the future, a few more options could be introduced for<br></div><div>performance optimizations, but those should probably not mean anything to the slave volume.<br></div><div><br></div><div>-Krutika<br></div><div><br></div><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;" data-mce-style="border-left: 2px solid #1010FF; margin-left: 5px; padding-left: 5px; color: #000; font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><br><br><blockquote cite="mid:550072460.12279311.1441262638234.JavaMail.zimbra@redhat.com"><div style="font-family: garamond,new york,times,serif; font-size:
12pt; color: #000000" data-mce-style="font-family: garamond,new york,times,serif; font-size: 12pt; color: #000000;"><div><br></div><div>-Krutika<br></div><div><br></div><blockquote style="border-left:2px solid
#1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;" data-mce-style="border-left: 2px solid
#1010FF; margin-left: 5px; padding-left: 5px; color: #000; font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;">><br> > -Krutika<br> ><br> ><br> ><br> > > - In the Changelog, record changes related to Sharded files too, just<br> > > like any regular files.<br> > > - Sharding should allow Geo-rep to list/read/write Sharding-internal<br> > > Xattrs if the Client PID is gsyncd(-1)<br> > > - Sharding should allow read/write of Sharded files (that is, files in<br> > > the .shards directory) if the Client PID is GSYNCD<br> > > - Sharding should return the actual file instead of returning the<br> > > aggregated content when the Main file is requested (Client PID is<br> > > GSYNCD)<br> > ><br> > > For example, a file f1 is created with GFID G1.<br> > ><br> > > When the file grows, it gets sharded into chunks (say 5 chunks).<br> > ><br> > > f1 G1<br> > > .shards/G1.1 G2<br> > > .shards/G1.2 G3<br> > > .shards/G1.3 G4<br> > > .shards/G1.4 G5<br> > ><br> > > In the Changelog, this is recorded as 5 different files, as below:<br> > ><br> > > CREATE G1 f1<br> > > DATA G1<br> > > META G1<br> > > CREATE G2 PGS/G1.1<br> > > DATA G2<br> > > META G1<br> > > CREATE G3 PGS/G1.2<br> > > DATA G3<br> > > META G1<br> > > CREATE G4 PGS/G1.3<br> > > DATA G4<br> > > META G1<br> > > CREATE G5 PGS/G1.4<br> > > DATA G5<br> > > META G1<br> > ><br> > > where PGS is the GFID of the .shards directory.<br> > ><br> > > Geo-rep will create these files independently in the Slave Volume and<br> > > sync the Xattrs of G1. Data can be read only when all the chunks are<br> > > synced to the Slave Volume. Data can be read partially if the main/first file<br> > > and some of the chunks are synced to the Slave.<br> > ><br> > > Please add if I missed anything. 
C & S Welcome.<br> > ><br> > > regards<br> > > Aravinda<br> > ><br> > > On 08/11/2015 04:36 PM, Aravinda wrote:<br> > >> Hi,<br> > >><br> > >> We are considering different approaches to add support in<br> > Geo-replication<br> > >> for Sharded Gluster Volumes [1]<br> > >><br> > >> *Approach 1: Geo-rep: Sync Full file*<br> > >> - In the Changelog, only record the main file details in the same brick<br> > >> where it is created<br> > >> - Record DATA in the Changelog on any addition/change<br> > to the<br> > >> sharded file<br> > >> - Geo-rep rsync will checksum the full file from the mount and<br> > >> sync it as a new file<br> > >> - Slave-side sharding is managed by the Slave Volume<br> > >> *Approach 2: Geo-rep: Sync sharded files separately*<br> > >> - Geo-rep rsync will checksum the sharded files only<br> > >> - Geo-rep syncs each sharded file independently as a new file<br> > >> - [UNKNOWN] Sync internal xattrs (file size and block count)<br> > in the<br> > >> main sharded file to the Slave Volume to maintain the same state as<br> > in the Master.<br> > >> - Sharding translator to allow file creation under the .shards<br> > dir for<br> > >> gsyncd, that is, when the Parent GFID is the .shards directory<br> > >> - If sharded files are modified during a Geo-rep run, we may end up with<br> > stale<br> > >> data in the Slave.<br> > >> - Files on the Slave Volume may not be readable unless all sharded<br> > >> files are synced to the Slave (each brick in the Master independently syncs<br> > files to<br> > >> the slave)<br> > >><br> > >> The first approach looks cleaner, but we have to analyze the Rsync<br> > >> checksum performance on big files (sharded in the backend, accessed<br> > as one<br> > >> big file by rsync)<br> > >><br> > >> Let us know your thoughts. 
Thanks<br> > >><br> > >> Ref:<br> > >> [1]<br> > >><br> > <a class="moz-txt-link-freetext" href="http://www.gluster.org/community/documentation/index.php/Features/sharding-xlator" target="_blank" data-mce-href="http://www.gluster.org/community/documentation/index.php/Features/sharding-xlator">http://www.gluster.org/community/documentation/index.php/Features/sharding-xlator</a><br> > >> --<br> > >> regards<br> > >> Aravinda<br> > >><br> > >><br> > >> _______________________________________________<br> > >> Gluster-devel mailing list<br> > >> <a class="moz-txt-link-abbreviated" href="mailto:Gluster-devel@gluster.org" target="_blank" data-mce-href="mailto:Gluster-devel@gluster.org">Gluster-devel@gluster.org</a><br> > >> <a class="moz-txt-link-freetext" href="http://www.gluster.org/mailman/listinfo/gluster-devel" target="_blank" data-mce-href="http://www.gluster.org/mailman/listinfo/gluster-devel">http://www.gluster.org/mailman/listinfo/gluster-devel</a><br> > ><br> > ><br> > ><br> > ><br> ><br> ><br></blockquote><div><br></div></div><br><br></blockquote><br></blockquote><div><br></div></div></body></html>