<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 03/03/2016 15:17, Saravanakumar
Arumugam wrote:<br>
</div>
<blockquote cite="mid:56D8476E.4010900@redhat.com" type="cite">
<br>
<br>
<div class="moz-cite-prefix">On 03/03/2016 05:38 PM, Yannick
Perret wrote:<br>
</div>
<blockquote cite="mid:56D82941.2050707@liris.cnrs.fr" type="cite">Hello,
<br>
<br>
I can't find out whether it is possible to set a preferred server on
a per-client basis for replica volumes, so I am asking the question
here. <br>
<br>
The context: we have 2 storage servers, one in each building. We
also have several virtual machines in each building, and they
can migrate from one building to the other (depending on load,
maintenance…). <br>
<br>
So (for testing at this time) I set up a x2 replica volume, with one
replica on each storage server of course. As most of our volumes
are "many reads - few writes", it would be better for bandwidth
if each client used the "nearest" storage server (local
building switch) - for reading, of course. The 2 buildings have
a good network link, but we prefer to minimize - when not needed -
data transfers between them (this link is shared). <br>
<br>
Can you see a solution for this kind of tuning? As far as I
understand, geo-replication is not really what I need, is it? <br>
</blockquote>
<br>
Yes, geo-replication cannot be used here, since you want to carry out
write operations on the slave side.<br>
<br>
</blockquote>
OK, thanks. I was fairly sure that was the case, but I preferred to ask.<br>
<blockquote cite="mid:56D8476E.4010900@redhat.com" type="cite">
<blockquote cite="mid:56D82941.2050707@liris.cnrs.fr" type="cite">
<br>
The "cluster.read-subvolume" option exists, of course, but we can
have clients in both buildings, so a per-volume option is not what
we need. A per-client equivalent of this option would be nice.
<br>
<br>
I tested a small patch of my own that does this - and it seems
to work fine as far as I can see - but 1. before continuing
this way I would first like to check whether another way exists, and 2.
I'm not familiar with the whole code, so I'm not sure that my
tests are "state-of-the-art" for glusterfs. <br>
<br>
</blockquote>
Maybe you should share that interesting patch :) and get better
feedback about your test case.<br>
</blockquote>
<br>
My "patch" is quite simple: in
afr_read_subvol_select_by_policy()
(xlators/cluster/afr/src/afr-common.c) I added a target selection similar to
the one driven by the "read_child" setting (see patches at the
end).<br>
<br>
Of course I also added the definition of this "forced_child" field in
afr.h, in the same way favorite_child and read_child are defined.<br>
<br>
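To make the selection order described above easier to follow, here is a
minimal standalone sketch. It is not the glusterfs code: struct
sketch_priv and sketch_select_read_subvol() are hypothetical names that
keep only the fields discussed in this thread, and the final fallback
(first readable replica) is a simplifying assumption standing in for
whatever further policies the real function applies.<br>
<br>

```c
/* Hypothetical, simplified stand-in for afr_private_t: only the
 * fields discussed in this thread are kept. */
struct sketch_priv {
        int child_count;   /* number of replicas (subvolumes) */
        int forced_child;  /* per-client preferred replica, -1 if unset */
        int read_child;    /* per-volume read subvolume, -1 if unset */
};

/* Return the index of the replica to read from, or -1 if none is
 * readable. readable[i] is non-zero when replica i holds a good copy. */
static int
sketch_select_read_subvol (const struct sketch_priv *priv,
                           const unsigned char *readable)
{
        int i = 0;

        /* 1. per-client forced child: must be set, in range and readable */
        if (priv->forced_child >= 0 &&
            priv->forced_child < priv->child_count &&
            readable[priv->forced_child])
                return priv->forced_child;

        /* 2. per-volume read_child, checked the same way */
        if (priv->read_child >= 0 &&
            priv->read_child < priv->child_count &&
            readable[priv->read_child])
                return priv->read_child;

        /* 3. assumption for this sketch: fall back to the first
         *    readable replica */
        for (i = 0; i < priv->child_count; i++) {
                if (readable[i])
                        return i;
        }

        return -1;
}
```

The point of the ordering is that a forced child which is out of range
or not readable is silently ignored, so a stale local setting can never
make reads fail; it only loses its priority.<br>
<br>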
<br>
My real problem here is how to tell the client to change its
"forced-child" value.<br>
<br>
I did this by reading it from a local file
(/var/lib/glusterd/forced-child) in init() and reconfigure() (from
xlators/cluster/afr/src/afr.c). This is fine at startup and when the
volume configuration changes, but I found that sending a SIGHUP is not
enough, because the client detects that nothing has changed and does not call
reconfigure(). So for my tests I modified
glusterfsd/src/glusterfsd-mgmt.c so that if
/var/lib/glusterd/forced-child exists, it behaves as if a
configuration change had occurred (and so calls reconfigure(), which
reloads the forced-child value).<br>
<br>
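The file-reading step described above is duplicated in init() and
reconfigure() in the patches at the end. As a sketch only, it could be
factored into one helper. sketch_read_forced_child() is a hypothetical
name, not an existing glusterfs function; the path is passed in by the
caller rather than hard-coded.<br>
<br>

```c
#include <stdio.h>

/* Read a replica index from the file at 'path'.
 * Returns the index if the file exists, starts with an integer and
 * that integer is in [0, child_count); returns -1 otherwise, which
 * the caller treats as "no forced child". */
static int
sketch_read_forced_child (const char *path, int child_count)
{
        FILE *fp     = NULL;
        int   idx    = -1;
        int   result = -1;

        fp = fopen (path, "r");
        if (!fp)
                return -1;      /* no file: nothing is forced */

        /* accept the value only if it parses and is in range */
        if (fscanf (fp, "%d", &idx) == 1 &&
            idx >= 0 && idx < child_count)
                result = idx;

        fclose (fp);
        return result;
}
```

With such a helper, both call sites would reduce to a single assignment
of the form priv->forced_child = sketch_read_forced_child (path,
priv->child_count), and a missing or malformed file resets the value to
-1 as the patch does today.<br>
<br>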
<br>
At this point it works as I expected, but I think it should be possible to
handle a new forced-child value without going through the whole reconfigure().<br>
Moreover, this file contains a raw number; it would be better to use
a server name, which would then be converted into an index.<br>
<br>
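As an illustration of the name-to-index conversion mentioned above,
here is a small sketch. It assumes the caller can already build an
array with one host name per replica, in child order; how that array
would be obtained inside AFR is left open here.
sketch_child_index_by_host() is a hypothetical name.<br>
<br>

```c
#include <string.h>

/* Return the index of the replica whose host name equals 'wanted',
 * or -1 if there is no match. hosts[i] is the server name of child i;
 * NULL entries are skipped. The comparison is exact (case-sensitive). */
static int
sketch_child_index_by_host (const char *const *hosts, int child_count,
                            const char *wanted)
{
        int i = 0;

        if (!hosts || !wanted)
                return -1;

        for (i = 0; i < child_count; i++) {
                if (hosts[i] && strcmp (hosts[i], wanted) == 0)
                        return i;
        }

        return -1;
}
```

Returning -1 for an unknown name matches the "unset" value used for
forced_child, so a typo in the server name would simply disable the
preference instead of selecting a wrong replica.<br>
<br>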
If possible, it might be better to send the new value using the 'gluster'
command on the client, i.e. something like 'gluster volume set
client.prefered-server SERVER-NAME'?<br>
<br>
<br>
Any advice is welcome.<br>
<br>
<br>
Regards,<br>
--<br>
Y.<br>
<br>
<br>
<<<<br>
--- glusterfs-3.6.7/xlators/cluster/afr/src/afr-common.c
2015-11-25 12:55:58.000000000 +0100<br>
+++ glusterfs-3.6.7b/xlators/cluster/afr/src/afr-common.c
2015-12-10 14:59:18.898580772 +0100<br>
@@ -764,10 +764,18 @@<br>
int i = 0;<br>
int read_subvol = -1;<br>
afr_private_t *priv = NULL;<br>
- afr_read_subvol_args_t local_args = {0,};<br>
+ afr_read_subvol_args_t local_args = {0,};<br>
<br>
priv = this->private;<br>
<br>
+<br>
+ /* if forced-child use it */<br>
+ if ((priv->forced_child >= 0)<br>
+ && (priv->forced_child <
priv->child_count)<br>
+ && (readable[priv->forced_child])) {<br>
+ return priv->forced_child;<br>
+ }<br>
+<br>
/* first preference - explicitly specified or local subvolume
*/<br>
if (priv->read_child >= 0 &&
readable[priv->read_child])<br>
return priv->read_child;<br>
>>><br>
<br>
<<<<br>
@@ -83,6 +83,7 @@<br>
unsigned int hash_mode; /* for when read_child is not
set */<br>
int favorite_child; /* subvolume to be preferred in
resolving<br>
split-brain cases */<br>
+ int forced_child; /* child to use (if possible) */<br>
<br>
gf_boolean_t inodelk_trace;<br>
gf_boolean_t entrylk_trace;<br>
>>><br>
<br>
<<<<br>
--- glusterfs-3.6.7/xlators/cluster/afr/src/afr.c 2015-11-25
12:55:58.000000000 +0100<br>
+++ glusterfs-3.6.7b/xlators/cluster/afr/src/afr.c 2015-12-10
16:34:55.530790442 +0100<br>
@@ -23,6 +23,7 @@<br>
<br>
struct volume_options options[];<br>
<br>
+<br>
int32_t<br>
notify (xlator_t *this, int32_t event,<br>
void *data, ...)<br>
@@ -106,9 +107,26 @@<br>
int ret = -1;<br>
int index = -1;<br>
char *qtype = NULL;<br>
+ FILE *prefer = NULL;<br>
+ int i = -1;<br>
<br>
priv = this->private;<br>
<br>
+ /* if /var/lib/glusterd/forced-child exists read the
content<br>
+ and use it as prefered target for read */<br>
+ priv->forced_child = -1;<br>
+ prefer = fopen("/var/lib/glusterd/forced-child", "r");<br>
+ if (prefer) {<br>
+ if (fscanf(prefer, "%d", &i) == 1) {<br>
+ if ((i >= 0) && (i <
priv->child_count)) {<br>
+ priv->forced_child = i;<br>
+ gf_log (this->name, GF_LOG_INFO,<br>
+ "using %d as forced-child", i);<br>
+ }<br>
+ }<br>
+ fclose(prefer);<br>
+ }<br>
+<br>
GF_OPTION_RECONF ("afr-dirty-xattr",<br>
priv->afr_dirty, options, str,<br>
out);<br>
@@ -234,6 +252,7 @@<br>
int read_subvol_index = -1;<br>
xlator_t *fav_child = NULL;<br>
char *qtype = NULL;<br>
+ FILE *prefer = NULL;<br>
<br>
if (!this->children) {<br>
gf_log (this->name, GF_LOG_ERROR,<br>
@@ -261,6 +280,21 @@<br>
<br>
priv->read_child = -1;<br>
<br>
+ /* if /var/lib/glusterd/forced-child exists read the content<br>
+ and use it as prefered target for read */<br>
+ priv->forced_child = -1;<br>
+ prefer = fopen("/var/lib/glusterd/forced-child", "r");<br>
+ if (prefer) {<br>
+ if (fscanf(prefer, "%d", &i) == 1) {<br>
+ if ((i >= 0) && (i <
priv->child_count)) {<br>
+ priv->forced_child = i;<br>
+ gf_log (this->name, GF_LOG_INFO,<br>
+ "using %d as forced-child",
i);<br>
+ }<br>
+ }<br>
+ fclose(prefer);<br>
+ }<br>
+<br>
GF_OPTION_INIT ("afr-dirty-xattr", priv->afr_dirty, str,
out);<br>
<br>
GF_OPTION_INIT ("metadata-splitbrain-forced-heal",<br>
>>><br>
<br>
<<<<br>
--- glusterfs-3.6.7/glusterfsd/src/glusterfsd-mgmt.c 2015-11-25
12:55:58.000000000 +0100<br>
+++ glusterfs-3.6.7b/glusterfsd/src/glusterfsd-mgmt.c 2015-12-10
16:34:20.530789162 +0100<br>
@@ -1502,7 +1502,9 @@<br>
if (size == oldvollen && (memcmp (oldvolfile,
rsp.spec, size) == 0)) {<br>
gf_log (frame->this->name, GF_LOG_INFO,<br>
"No change in volfile, continuing");<br>
- goto out;<br>
+ if (access("/var/lib/glusterd/forced-child", R_OK) != 0) {<br>
+ goto out; /* don't skip if exists to re-read
forced-child */<br>
+ }<br>
}<br>
<br>
tmpfp = tmpfile ();<br>
>>><br>
<br>
</body>
</html>