<html><body><div style="font-family: garamond,new york,times,serif; font-size: 12pt; color: #000000"><div><br></div><div><br></div><hr id="zwchr"><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><b>From: </b>"Venky Shankar" <vshankar@redhat.com><br><b>To: </b>"Aravinda" <avishwan@redhat.com><br><b>Cc: </b>"Shyam" <srangana@redhat.com>, "Krutika Dhananjay" <kdhananj@redhat.com>, "Gluster Devel" <gluster-devel@gluster.org><br><b>Sent: </b>Thursday, September 3, 2015 8:29:37 AM<br><b>Subject: </b>Re: [Gluster-devel] Gluster Sharding and Geo-replication<br><div><br></div>On Wed, Sep 2, 2015 at 11:39 PM, Aravinda <avishwan@redhat.com> wrote:<br>><br>> On 09/02/2015 11:13 PM, Shyam wrote:<br>>><br>>> On 09/02/2015 10:47 AM, Krutika Dhananjay wrote:<br>>>><br>>>><br>>>><br>>>> ------------------------------------------------------------------------<br>>>><br>>>> *From: *"Shyam" <srangana@redhat.com><br>>>> *To: *"Aravinda" <avishwan@redhat.com>, "Gluster Devel"<br>>>> <gluster-devel@gluster.org><br>>>> *Sent: *Wednesday, September 2, 2015 8:09:55 PM<br>>>> *Subject: *Re: [Gluster-devel] Gluster Sharding and Geo-replication<br>>>><br>>>> On 09/02/2015 03:12 AM, Aravinda wrote:<br>>>> > The Geo-replication and Sharding teams today discussed the<br>>>> approach<br>>>> > to making Geo-replication Sharding-aware. Details are below.<br>>>> ><br>>>> > Participants: Aravinda, Kotresh, Krutika, Rahul Hinduja, Vijay<br>>>> Bellur<br>>>> ><br>>>> > - Both Master and Slave Volumes should be Sharded Volumes with<br>>>> the same<br>>>> > configuration.<br>>>><br>>>> If I am not mistaken, geo-rep supports replicating to a non-gluster<br>>>> local FS at the slave end. Is this correct?
If so, would this<br>>>> limitation<br>>>> not make that problematic?<br>>>><br>>>> When you state *same configuration*, I assume you mean the sharding<br>>>> configuration, not the volume graph, right?<br>>>><br>>>> That is correct. The only requirement is for the slave to have the shard<br>>>> translator (for someone needs to present an aggregated view of the file to<br>>>> the READers on the slave).<br>>>> Also the shard-block-size needs to be kept the same between master and<br>>>> slave. The rest of the configuration (like the number of subvols of DHT/AFR)<br>>>> can vary across master and slave.<br>>><br>>><br>>> Do we need to have the sharded block size the same? I assume the file<br>>> carries an xattr that contains the size it is sharded with<br>>> (trusted.glusterfs.shard.block-size), so if this is synced across, it would<br>>> do. If this is true, what it would mean is that "a sharded volume needs a<br>>> shard-supporting slave to geo-rep to".<br>><br>> Yes. Number of bricks and replica count can be different, but the sharded block<br>> size should be the same. Only the first file will have the<br>> xattr (trusted.glusterfs.shard.block-size); Geo-rep should sync this xattr<br>> to Slave as well. Only Gsyncd can read/write the sharded chunks. A Sharded Slave<br>> Volume is required to understand these chunks when they are read (by non-Gsyncd clients).<br><div><br></div>Even if this works, I am very much in disagreement with this mechanism<br>of synchronization (not that I have a working solution in my head as<br>of now).<br></blockquote><div><br></div><div>Hi Venky,<br></div><div><br></div><div>It is not apparent to me what issues you see with approach 2.
If you could lay them out here, it would be helpful in taking the discussion further.<br></div><div><br></div><div>-Krutika<br></div><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;">><br>>><br>>>><br>>>> -Krutika<br>>>><br>>>><br>>>><br>>>> > - In Changelog, record changes related to Sharded files too, just<br>>>> like<br>>>> > any regular file.<br>>>> > - Sharding should allow Geo-rep to list/read/write Sharding<br>>>> internal<br>>>> > Xattrs if Client PID is gsyncd(-1)<br>>>> > - Sharding should allow read/write of Sharded files (that is, files in the<br>>>> .shards<br>>>> > directory) if Client PID is GSYNCD<br>>>> > - Sharding should return the actual file instead of returning the<br>>>> > aggregated content when the Main file is requested (Client PID<br>>>> > GSYNCD)<br>>>> ><br>>>> > For example, a file f1 is created with GFID G1.<br>>>> ><br>>>> > When the file grows, it gets sharded into chunks (say 5 chunks).<br>>>> ><br>>>> > f1 G1<br>>>> > .shards/G1.1 G2<br>>>> > .shards/G1.2 G3<br>>>> > .shards/G1.3 G4<br>>>> > .shards/G1.4 G5<br>>>> ><br>>>> > In Changelog, this is recorded as 5 different files as below<br>>>> ><br>>>> > CREATE G1 f1<br>>>> > DATA G1<br>>>> > META G1<br>>>> > CREATE G2 PGS/G1.1<br>>>> > DATA G2<br>>>> > META G1<br>>>> > CREATE G3 PGS/G1.2<br>>>> > DATA G3<br>>>> > META G1<br>>>> > CREATE G4 PGS/G1.3<br>>>> > DATA G4<br>>>> > META G1<br>>>> > CREATE G5 PGS/G1.4<br>>>> > DATA G5<br>>>> > META G1<br>>>> ><br>>>> > Where PGS is the GFID of the .shards directory.<br>>>> ><br>>>> > Geo-rep will create these files independently in Slave Volume and<br>>>> > sync Xattrs of G1. Data can be read only when all the chunks are<br>>>> > synced to Slave Volume. Data can be read partially if the main/first<br>>>> file<br>>>> > and some of the chunks are synced to Slave.<br>>>> ><br>>>> > Please add if I missed anything.
C & S Welcome.<br>>>> ><br>>>> > regards<br>>>> > Aravinda<br>>>> ><br>>>> > On 08/11/2015 04:36 PM, Aravinda wrote:<br>>>> >> Hi,<br>>>> >><br>>>> >> We are considering different approaches to add support in<br>>>> Geo-replication<br>>>> >> for Sharded Gluster Volumes[1]<br>>>> >><br>>>> >> *Approach 1: Geo-rep: Sync Full file*<br>>>> >> - In Changelog, only record main file details in the same brick<br>>>> >> where it is created<br>>>> >> - Record as DATA in Changelog whenever there is any addition/change<br>>>> to the<br>>>> >> sharded file<br>>>> >> - Geo-rep rsync will do checksum as a full file from mount and<br>>>> >> sync as a new file<br>>>> >> - Slave side sharding is managed by Slave Volume<br>>>> >> *Approach 2: Geo-rep: Sync sharded file separately*<br>>>> >> - Geo-rep rsync will do checksum for sharded files only<br>>>> >> - Geo-rep syncs each sharded file independently as a new file<br>>>> >> - [UNKNOWN] Sync internal xattrs (file size and block count)<br>>>> in the<br>>>> >> main sharded file to Slave Volume to maintain the same state as<br>>>> in Master.<br>>>> >> - Sharding translator to allow file creation under the .shards<br>>>> dir for<br>>>> >> gsyncd, that is, Parent GFID is the .shards directory<br>>>> >> - If sharded files are modified during a Geo-rep run, we may end up with<br>>>> stale<br>>>> >> data in Slave.<br>>>> >> - Files on Slave Volume may not be readable unless all sharded<br>>>> >> files sync to Slave (each brick in Master independently syncs<br>>>> files to<br>>>> >> slave)<br>>>> >><br>>>> >> The first approach looks cleaner, but we have to analyze the Rsync<br>>>> >> checksum performance on big files (sharded in the backend, accessed<br>>>> as one<br>>>> >> big file from rsync)<br>>>> >><br>>>> >> Let us know your thoughts.
Thanks<br>>>> >><br>>>> >> Ref:<br>>>> >> [1]<br>>>> >><br>>>><br>>>> http://www.gluster.org/community/documentation/index.php/Features/sharding-xlator<br>>>> >> --<br>>>> >> regards<br>>>> >> Aravinda<br>>>> >><br>>>> >><br>>>> >> _______________________________________________<br>>>> >> Gluster-devel mailing list<br>>>> >> Gluster-devel@gluster.org<br>>>> >> http://www.gluster.org/mailman/listinfo/gluster-devel<br>>>> ><br>>>><br>><br>> regards<br>> Aravinda<br>></blockquote><div><br></div></div></body></html>