[Gluster-devel] Huge VSZ (VIRT) usage by glustershd on dummy node

Oleksandr Natalenko oleksandr at natalenko.name
Wed Jun 8 10:06:01 UTC 2016


OK, here are the results.

I've taken 5 statedumps, 30 minutes apart. Before taking each statedump, 
I recorded memory usage.
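
For reference, roughly how the samples were collected (a sketch: glusterfs 
processes write a statedump into /var/run/gluster when they receive SIGUSR1; 
1010 is the shd PID here, and the log file name is just an example):

===
for i in 1 2 3 4 5; do
    # record VIRT/RSS first, then trigger a statedump via SIGUSR1
    ps -o pid,vsz,rss,cmd -p 1010 >> shd-memory.log
    kill -USR1 1010
    sleep 1800    # 30 minutes between samples
done
===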

Memory consumption:

1. root      1010  0.0  9.6 7538188 374864 ?      Ssl  чер07   0:16 
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p 
/var/lib/glusterd/glustershd/run/glustershd.pid -l 
/var/log/glusterfs/glustershd.log -S 
/var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option 
*replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
2. root      1010  0.0  9.6 7825048 375312 ?      Ssl  чер07   0:16 
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p 
/var/lib/glusterd/glustershd/run/glustershd.pid -l 
/var/log/glusterfs/glustershd.log -S 
/var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option 
*replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
3. root      1010  0.0  9.6 7825048 375312 ?      Ssl  чер07   0:17 
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p 
/var/lib/glusterd/glustershd/run/glustershd.pid -l 
/var/log/glusterfs/glustershd.log -S 
/var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option 
*replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
4. root      1010  0.0  9.6 8202064 375892 ?      Ssl  чер07   0:17 
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p 
/var/lib/glusterd/glustershd/run/glustershd.pid -l 
/var/log/glusterfs/glustershd.log -S 
/var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option 
*replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
5. root      1010  0.0  9.6 8316808 376084 ?      Ssl  чер07   0:17 
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p 
/var/lib/glusterd/glustershd/run/glustershd.pid -l 
/var/log/glusterfs/glustershd.log -S 
/var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option 
*replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7

As you can see, VIRT grows constantly (except for one measurement), and 
RSS grows as well, although its increase is considerably smaller.
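
To make the trend explicit, here are just the VIRT values (KiB) from the 
five samples above, with the increase over the previous sample:

===
7538188
7825048    (+286860)
7825048    (+0)
8202064    (+377016)
8316808    (+114744)
===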

Now let's take a look at the statedumps:

1. https://gist.github.com/3fa121c7531d05b210b84d9db763f359
2. https://gist.github.com/87f48b8ac8378262b84d448765730fd9
3. https://gist.github.com/f8780014d8430d67687c70cfd1df9c5c
4. https://gist.github.com/916ac788f806328bad9de5311ce319d7
5. https://gist.github.com/8ba5dbf27d2cc61c04ca954d7fb0a7fd

I'd go with comparing the first statedump with the last one; here is the 
diff output: https://gist.github.com/e94e7f17fe8b3688c6a92f49cbc15193
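
One way to produce such a comparison, in case it is useful (the statedump 
file names are placeholders; keeping only the section headers and the 
size=/num_allocs= counters makes the diff easier to read):

===
grep -E '^(\[|size=|num_allocs=)' glusterdump.1010.dump.first > first.txt
grep -E '^(\[|size=|num_allocs=)' glusterdump.1010.dump.last  > last.txt
diff -u first.txt last.txt
===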

I see the numbers changing, but cannot yet conclude what is meaningful and 
what is not.
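
Besides the statedumps, it might also help to see where the VIRT actually 
sits in the address space. Something along these lines (1010 is the shd PID 
here) should list the largest mappings: huge anonymous mappings would point 
at heap/arena growth, while many ~8 MiB ones would point at thread stacks:

===
pmap -x 1010 | sort -n -k2 | tail -20
===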

Pranith?

08.06.2016 10:06, Pranith Kumar Karampuri wrote:
> On Wed, Jun 8, 2016 at 12:33 PM, Oleksandr Natalenko
> <oleksandr at natalenko.name> wrote:
> 
>> Yup, I can do that, but please note that RSS does not change. Will
>> statedump show VIRT values?
>> 
>> Also, I'm looking at the numbers now and see that on each reconnect
>> VIRT grows by ~24M (once every ~10–15 mins). Probably, that could
>> give you some idea of what is going wrong.
> 
> That's interesting. I've never seen something like this happen. I would
> still like to see if there are any clues in the statedump when all this
> happens. Maybe it will confirm what you said, that nothing new is
> allocated, but I would just like to verify.
> 
>> 08.06.2016 09:50, Pranith Kumar Karampuri wrote:
>> 
>> Oleksandr,
>> Could you take a statedump of the shd process once every 5-10 minutes
>> and send maybe 5 samples of them from when it starts to increase? This
>> will help us find which data types are being allocated a lot and can
>> lead to possible theories for the increase.
>> 
>> On Wed, Jun 8, 2016 at 12:03 PM, Oleksandr Natalenko
>> <oleksandr at natalenko.name> wrote:
>> 
>> Also, I've checked the shd log files and found that for some reason
>> shd constantly reconnects to the bricks: [1]
>> 
>> Please note that the fix [2] suggested by Pranith does not help; the
>> VIRT value still grows:
>> 
>> ===
>> root      1010  0.0  9.6 7415248 374688 ?      Ssl  чер07   0:14
>> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
>> /var/lib/glusterd/glustershd/run/glustershd.pid -l
>> /var/log/glusterfs/glustershd.log -S
>> /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket
>> --xlator-option
>> *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
>> ===
>> 
>> I do not know why it keeps reconnecting, but I suspect the leak
>> happens on that reconnect.
>> 
>> CCing Pranith.
>> 
>> [1] http://termbin.com/brob
>> [2] http://review.gluster.org/#/c/14053/
>> 
>> 06.06.2016 12:21, Kaushal M wrote:
>> Has multi-threaded SHD been merged into 3.7.* by any chance? If not,
>> what I'm saying below doesn't apply.
>> 
>> We saw problems when encrypted transports were used, because the RPC
>> layer was not reaping threads (doing pthread_join) when a connection
>> ended. This led to similar observations of huge VIRT and relatively
>> small RSS.
>> 
>> I'm not sure how multi-threaded shd works, but it could be leaking
>> threads in a similar way.
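
(Side note: a quick way to check whether shd is indeed accumulating threads 
like this; each unjoined thread keeps its stack mapping, 8 MiB by default 
with glibc, in VIRT while adding almost nothing to RSS. 15109 is the shd 
PID on the dummy node below.)

===
ps -o nlwp=,vsz= -p 15109      # thread count and VIRT (KiB)
ls /proc/15109/task | wc -l    # same count, via /proc
===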
>> 
>> On Mon, Jun 6, 2016 at 1:54 PM, Oleksandr Natalenko
>> <oleksandr at natalenko.name> wrote:
>> Hello.
>> 
>> We use v3.7.11, a replica 2 setup between 2 nodes + 1 dummy node for
>> keeping the volume metadata.
>> 
>> Now we observe huge VSZ (VIRT) usage by glustershd on the dummy node:
>> 
>> ===
>> root     15109  0.0 13.7 76552820 535272 ?     Ssl  тра26   2:11
>> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
>> /var/lib/glusterd/glustershd/run/glustershd.pid -l
>> /var/log/glusterfs/glustershd.log -S
>> /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket
>> --xlator-option
>> *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
>> ===
>> 
>> That is ~73G. RSS seems to be OK (~522M). Here is the statedump of the
>> glustershd process: [1]
>> 
>> Also, here is the sum of the sizes reported in the statedump:
>> 
>> ===
>> # cat /var/run/gluster/glusterdump.15109.dump.1465200139 | \
>>     awk -F '=' 'BEGIN {sum=0} /^size=/ {sum+=$2} END {print sum}'
>> 353276406
>> ===
>> 
>> That is ~337 MiB.
>> 
>> Also, here are the VIRT values from the 2 replica nodes:
>> 
>> ===
>> root     24659  0.0  0.3 5645836 451796 ?      Ssl  тра24   3:28
>> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
>> /var/lib/glusterd/glustershd/run/glustershd.pid -l
>> /var/log/glusterfs/glustershd.log -S
>> /var/run/gluster/44ec3f29003eccedf894865107d5db90.socket
>> --xlator-option
>> *replicate*.node-uuid=a19afcc2-e26c-43ce-bca6-d27dc1713e87
>> root     18312  0.0  0.3 6137500 477472 ?      Ssl  тра19   6:37
>> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
>> /var/lib/glusterd/glustershd/run/glustershd.pid -l
>> /var/log/glusterfs/glustershd.log -S
>> /var/run/gluster/1670a3abbd1eea968126eb6f5be20322.socket
>> --xlator-option
>> *replicate*.node-uuid=52dca21b-c81c-48b5-9de2-1ed37987fbc2
>> ===
>> 
>> Those are 5 to 6G, which is much less than the dummy node has, but that
>> still looks too big to us.
>> 
>> Should we care about the huge VIRT value on the dummy node? Also, how
>> would one debug that?
>> 
>> Regards,
>> Oleksandr.
>> 
>> [1] https://gist.github.com/d2cfa25251136512580220fcdb8a6ce6
>> 
>> --
>> 
>> Pranith
> 
> --
> 
> Pranith

