<div dir="ltr">I don't have the volfiles; they are not on our machines. As I said previously, we have no control over the gluster servers.<div><br></div><div>I did find a graph in the logs that looks similar to a volume file. I will paste it here, but we have no control over it. We only use the client to connect to gluster servers that we do not control.</div><div><br></div><div><div><b>1: volume drslk-prod-client-0</b></div><div><b> 2: type protocol/client</b></div><div><b> 3: option ping-timeout 20</b></div><div><b> 4: option remote-host brick13.gluster.iadm</b></div><div><b> 5: option remote-subvolume /GLUSTERFS/drslk-prod</b></div><div><b> 6: option transport-type socket</b></div><div><b> 7: option frame-timeout 60</b></div><div><b> 8: option send-gids true</b></div><div><b> 9: end-volume</b></div><div><b> 10: </b></div><div><b> 11: volume drslk-prod-client-1</b></div><div><b> 12: type protocol/client</b></div><div><b> 13: option ping-timeout 20</b></div><div><b> 14: option remote-host brick14.gluster.iadm</b></div><div><b> 15: option remote-subvolume /GLUSTERFS/drslk-prod</b></div><div><b> 16: option transport-type socket</b></div><div><b> 17: option frame-timeout 60</b></div><div><b> 18: option send-gids true</b></div><div><b> 19: end-volume</b></div><div><b> 20: </b></div><div><b> 21: volume drslk-prod-client-2</b></div><div><b> 22: type protocol/client</b></div><div><b> 23: option ping-timeout 20</b></div><div><b> 24: option remote-host brick15.gluster.iadm</b></div><div><b> 25: option remote-subvolume /GLUSTERFS/drslk-prod</b></div><div><b> 26: option transport-type socket</b></div><div><b> 27: option frame-timeout 60</b></div><div><b> 28: option send-gids true</b></div><div><b> 29: end-volume</b></div><div><b> 30: </b></div><div><b> 31: volume drslk-prod-replicate-0</b></div><div><b> 32: type cluster/replicate</b></div><div><b> 33: option read-hash-mode 2</b></div><div><b> 34: option data-self-heal-window-size 128</b></div><div><b> 35: option quorum-type 
auto</b></div><div><b> 36: subvolumes drslk-prod-client-0 drslk-prod-client-1 drslk-prod-client-2</b></div><div><b> 37: end-volume</b></div><div><b> 38: </b></div><div><b> 39: volume drslk-prod-client-3</b></div><div><b> 40: type protocol/client</b></div><div><b> 41: option ping-timeout 20</b></div><div><b> 42: option remote-host brick16.gluster.iadm</b></div><div><b> 43: option remote-subvolume /GLUSTERFS/drslk-prod</b></div><div><b> 44: option transport-type socket</b></div><div><b> 45: option frame-timeout 60</b></div><div><b> 46: option send-gids true</b></div><div><b> 47: end-volume</b></div><div><b> 48: </b></div><div><b> 49: volume drslk-prod-client-4</b></div><div><b> 50: type protocol/client</b></div><div><b> 51: option ping-timeout 20</b></div><div><b> 52: option remote-host brick17.gluster.iadm</b></div><div><b> 53: option remote-subvolume /GLUSTERFS/drslk-prod</b></div><div><b> 54: option transport-type socket</b></div><div><b> 55: option frame-timeout 60</b></div><div><b> 56: option send-gids true</b></div><div><b> 57: end-volume</b></div><div><b> 58: </b></div><div><b> 59: volume drslk-prod-client-5</b></div><div><b> 60: type protocol/client</b></div><div><b> 61: option ping-timeout 20</b></div><div><b> 62: option remote-host brick18.gluster.iadm</b></div><div><b> 63: option remote-subvolume /GLUSTERFS/drslk-prod</b></div><div><b> 64: option transport-type socket</b></div><div><b> 65: option frame-timeout 60</b></div><div><b> 66: option send-gids true</b></div><div><b> 67: end-volume</b></div><div><b> 68: </b></div><div><b> 69: volume drslk-prod-replicate-1</b></div><div><b> 70: type cluster/replicate</b></div><div><b> 71: option read-hash-mode 2</b></div><div><b> 72: option data-self-heal-window-size 128</b></div><div><b> 73: option quorum-type auto</b></div><div><b> 74: subvolumes drslk-prod-client-3 drslk-prod-client-4 drslk-prod-client-5</b></div><div><b> 75: end-volume</b></div><div><b> 76: </b></div><div><b> 77: volume 
drslk-prod-client-6</b></div><div><b> 78: type protocol/client</b></div><div><b> 79: option ping-timeout 20</b></div><div><b> 80: option remote-host brick19.gluster.iadm</b></div><div><b> 81: option remote-subvolume /GLUSTERFS/drslk-prod</b></div><div><b> 82: option transport-type socket</b></div><div><b> 83: option frame-timeout 60</b></div><div><b> 84: option send-gids true</b></div><div><b> 85: end-volume</b></div><div><b> 86: </b></div><div><b> 87: volume drslk-prod-client-7</b></div><div><b> 88: type protocol/client</b></div><div><b> 89: option ping-timeout 20</b></div><div><b> 90: option remote-host brick20.gluster.iadm</b></div><div><b> 91: option remote-subvolume /GLUSTERFS/drslk-prod</b></div><div><b> 92: option transport-type socket</b></div><div><b> 93: option frame-timeout 60</b></div><div><b> 94: option send-gids true</b></div><div><b> 95: end-volume</b></div><div><b> 96: </b></div><div><b> 97: volume drslk-prod-client-8</b></div><div><b> 98: type protocol/client</b></div><div><b> 99: option ping-timeout 20</b></div><div><b>100: option remote-host brick21.gluster.iadm</b></div><div><b>101: option remote-subvolume /GLUSTERFS/drslk-prod</b></div><div><b>102: option transport-type socket</b></div><div><b>103: option frame-timeout 60</b></div><div><b>104: option send-gids true</b></div><div><b>105: end-volume</b></div><div><b>106: </b></div><div><b>107: volume drslk-prod-replicate-2</b></div><div><b>108: type cluster/replicate</b></div><div><b>109: option read-hash-mode 2</b></div><div><b>110: option data-self-heal-window-size 128</b></div><div><b>111: option quorum-type auto</b></div><div><b>112: subvolumes drslk-prod-client-6 drslk-prod-client-7 drslk-prod-client-8</b></div><div><b>113: end-volume</b></div><div><b>114: </b></div><div><b>115: volume drslk-prod-client-9</b></div><div><b>116: type protocol/client</b></div><div><b>117: option ping-timeout 20</b></div><div><b>118: option remote-host brick22.gluster.iadm</b></div><div><b>119: option 
remote-subvolume /GLUSTERFS/drslk-prod</b></div><div><b>120: option transport-type socket</b></div><div><b>121: option frame-timeout 60</b></div><div><b>122: option send-gids true</b></div><div><b>123: end-volume</b></div><div><b>124: </b></div><div><b>125: volume drslk-prod-client-10</b></div><div><b>126: type protocol/client</b></div><div><b>127: option ping-timeout 20</b></div><div><b>128: option remote-host brick23.gluster.iadm</b></div><div><b>129: option remote-subvolume /GLUSTERFS/drslk-prod</b></div><div><b>130: option transport-type socket</b></div><div><b>131: option frame-timeout 60</b></div><div><b>132: option send-gids true</b></div><div><b>133: end-volume</b></div><div><b>134: </b></div><div><b>135: volume drslk-prod-client-11</b></div><div><b>136: type protocol/client</b></div><div><b>137: option ping-timeout 20</b></div><div><b>138: option remote-host brick24.gluster.iadm</b></div><div><b>139: option remote-subvolume /GLUSTERFS/drslk-prod</b></div><div><b>140: option transport-type socket</b></div><div><b>141: option frame-timeout 60</b></div><div><b>142: option send-gids true</b></div><div><b>143: end-volume</b></div><div><b>144: </b></div><div><b>145: volume drslk-prod-replicate-3</b></div><div><b>146: type cluster/replicate</b></div><div><b>147: option read-hash-mode 2</b></div><div><b>148: option data-self-heal-window-size 128</b></div><div><b>149: option quorum-type auto</b></div><div><b>150: subvolumes drslk-prod-client-9 drslk-prod-client-10 drslk-prod-client-11</b></div><div><b>151: end-volume</b></div><div><b>152: </b></div><div><b>153: volume drslk-prod-dht</b></div><div><b>154: type cluster/distribute</b></div><div><b>155: option min-free-disk 10%</b></div><div><b>156: option readdir-optimize on</b></div><div><b>157: subvolumes drslk-prod-replicate-0 drslk-prod-replicate-1 drslk-prod-replicate-2 drslk-prod-replicate-3</b></div><div><b>158: end-volume</b></div><div><b>159: </b></div><div><b>160: volume 
drslk-prod-write-behind</b></div><div><b>161: type performance/write-behind</b></div><div><b>162: option cache-size 1MB</b></div><div><b>163: subvolumes drslk-prod-dht</b></div><div><b>164: end-volume</b></div><div><b>165: </b></div><div><b>166: volume drslk-prod-read-ahead</b></div><div><b>167: type performance/read-ahead</b></div><div><b>168: subvolumes drslk-prod-write-behind</b></div><div><b>169: end-volume</b></div><div><b>170: </b></div><div><b>171: volume drslk-prod-readdir-ahead</b></div><div><b>172: type performance/readdir-ahead</b></div><div><b>173: subvolumes drslk-prod-read-ahead</b></div><div><b>174: end-volume</b></div><div><b>175: </b></div><div><b>176: volume drslk-prod-io-cache</b></div><div><b>177: type performance/io-cache</b></div><div><b>178: option cache-timeout 60</b></div><div><b>179: option cache-size 512MB</b></div><div><b>180: subvolumes drslk-prod-readdir-ahead</b></div><div><b>181: end-volume</b></div><div><b>182: </b></div><div><b>183: volume drslk-prod-quick-read</b></div><div><b>184: type performance/quick-read</b></div><div><b>185: option cache-size 512MB</b></div><div><b>186: subvolumes drslk-prod-io-cache</b></div><div><b>187: end-volume</b></div><div><b>188: </b></div><div><b>189: volume drslk-prod-md-cache</b></div><div><b>190: type performance/md-cache</b></div><div><b>191: subvolumes drslk-prod-quick-read</b></div><div><b>192: end-volume</b></div><div><b>193: </b></div><div><b>194: volume drslk-prod</b></div><div><b>195: type debug/io-stats</b></div><div><b>196: option latency-measurement off</b></div><div><b>197: option count-fop-hits off</b></div><div><b>198: subvolumes drslk-prod-md-cache</b></div><div><b>199: end-volume</b></div><div><b>200: </b></div><div><b>201: volume meta-autoload</b></div><div><b>202: type meta</b></div><div><b>203: subvolumes drslk-prod</b></div><div><b>204: end-volume</b></div><div><b>205: </b></div></div><div><br></div><div>Btw, do you think that different versions of gluster client and gluster 
server could be an issue here?</div></div><div class="gmail_extra"><br><div class="gmail_quote">2015-03-08 1:29 GMT+01:00 Vijay Bellur <span dir="ltr"><<a href="mailto:vbellur@redhat.com" target="_blank">vbellur@redhat.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On 03/07/2015 06:20 PM, Przemysław Mroczek wrote:<br>
</div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5">
Hi guys,<br>
<br>
We have a Rails app that uses gluster as our distributed file<br>
system. The gluster servers are hosted independently as part of a deal<br>
with another company; we have no control over them, and we connect to them<br>
using the gluster native client.<br>
<br>
We tried to resolve this issue with help from the admins of the company<br>
that hosts our gluster servers, but they say it is a client-side<br>
issue, and we have run out of ideas about how that is possible, since we are not doing<br>
anything special here.<br>
<br>
Information about the independent gluster servers:<br>
- Version: 3.6.0.42.1<br>
- They run Red Hat<br>
- They are an enterprise, so they always use older versions<br>
<br>
Our servers:<br>
System version: Ubuntu 14.04<br>
Our gluster client version: 3.6.2<br>
<br>
The exact problem is that, a couple of times a week, errors in gluster<br>
cause processes to become zombies. It happens with our<br>
application server (unicorn), nginx, and our crawling script, which runs<br>
as a daemon.<br>
<br>
Our fstab file:<br>
<br>
10.10.11.17:/drslk-prod /mnt/storage glusterfs defaults,_netdev,nobootwait,fetch-attempts=10 0 0<br>
10.10.11.17:/drslk-backup /mnt/backup glusterfs defaults,_netdev,nobootwait,fetch-attempts=10 0 0<br>
<br>
Logs from gluster:<br>
<br>
[2015-02-18 12:36:12.375695] E [rpc-clnt.c:362:saved_frames_unwind] (--><br>
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6]<br>
(--><br>
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e]<br>
(--><br>
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e]<br>
(--><br>
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602]<br>
(--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98]<br>
))))) 0-drslk-prod-client-10: forced<br>
unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-02-18<br>
12:36:12.361489 (xid=0x5d475da)<br>
[2015-02-18 12:36:12.375765] W<br>
[client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:<br>
remote operation failed: Transport endpoint is not connected. Path:<br>
/system/posts/00/00/71/77/59.jpg (2ad81c2b-a141-478d-9dd4-253345edbceb)<br>
[2015-02-18 12:36:12.376288] E [rpc-clnt.c:362:saved_frames_unwind] (--><br>
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6]<br>
(--><br>
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e]<br>
(--><br>
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e]<br>
(--><br>
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602]<br>
(--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98]<br>
))))) 0-drslk-prod-client-10: forced<br>
unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-02-18<br>
12:36:12.361858 (xid=0x5d475db)<br>
[2015-02-18 12:36:12.376355] W<br>
[client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:<br>
remote operation failed: Transport endpoint is not connected. Path:<br>
/system/posts/00/00/08 (f5c33a99-719e-4ea2-ad1f-33b893af103d)<br>
[2015-02-18 12:36:12.376711] I [socket.c:3292:socket_submit_request]<br>
0-drslk-prod-client-10: not connected (priv->connected = 0)<br>
[2015-02-18 12:36:12.376749] W [rpc-clnt.c:1562:rpc_clnt_submit]<br>
0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dc<br>
Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport<br>
(drslk-prod-client-10)<br>
[2015-02-18 12:36:12.376814] W<br>
[client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:<br>
remote operation failed: Transport endpoint is not connected. Path:<br>
(null) (00000000-0000-0000-0000-000000000000)<br>
[2015-02-18 12:36:12.376829] I [client.c:2215:client_rpc_notify]<br>
0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client<br>
process will keep trying to connect to glusterd until brick's port is<br>
available<br>
[2015-02-18 12:36:12.376834] W [rpc-clnt.c:1562:rpc_clnt_submit]<br>
0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dd<br>
Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport<br>
(drslk-prod-client-10)<br>
[2015-02-18 12:36:12.376906] W<br>
[client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:<br>
remote operation failed: Transport endpoint is not connected. Path:<br>
(null) (00000000-0000-0000-0000-000000000000)<br>
[2015-02-18 12:36:12.376931] E [socket.c:2267:socket_connect_finish]<br>
0-drslk-prod-client-10: connection to 10.10.11.23:24007<br></div></div>
failed (Connection refused)<div><div class="h5"><br>
[2015-02-18 12:36:12.379296] W<br>
[client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:<br>
remote operation failed: Transport endpoint is not connected. Path:<br>
(null) (00000000-0000-0000-0000-000000000000)<br>
[2015-02-18 12:36:12.379700] W<br>
[client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:<br>
remote operation failed: Transport endpoint is not connected. Path:<br>
(null) (00000000-0000-0000-0000-000000000000)<br>
[2015-02-18 13:10:52.759736] E<br>
[client-handshake.c:1496:client_query_portmap_cbk]<br>
0-drslk-prod-client-10: failed to get the port number for remote<br>
subvolume. Please run 'gluster volume status' on server to see if brick<br>
process is running.<br>
[2015-02-18 13:10:52.759796] I [client.c:2215:client_rpc_notify]<br>
0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client<br>
process will keep trying to connect to glusterd until brick's port is<br>
available<br>
[2015-02-18 13:11:02.897307] I [rpc-clnt.c:1761:rpc_clnt_reconfig]<br>
0-drslk-prod-client-10: changing port to 49349 (from 0)<br>
[2015-02-18 13:11:02.898097] I<br>
[client-handshake.c:1413:select_server_supported_programs]<br>
0-drslk-prod-client-10: Using Program GlusterFS 3.3, Num (1298437),<br>
Version (330)<br>
[2015-02-18 13:11:02.898446] I<br>
[client-handshake.c:1200:client_setvolume_cbk] 0-drslk-prod-client-10:<br>
Connected to drslk-prod-client-10, attached to remote volume<br>
'/GLUSTERFS/drslk-prod'.<br>
[2015-02-18 13:11:02.898460] I<br>
[client-handshake.c:1210:client_setvolume_cbk] 0-drslk-prod-client-10:<br>
Server and Client lk-version numbers are not same, reopening the fds<br>
<br>
</div></div></blockquote>
<br>
Can you provide the gluster volume configuration details?<br>
<br>
It does look like frame-timeout for the volume has been set to 60. Is there any specific reason? Normally altering the frame-timeout is not recommended.<span class="HOEnZb"><font color="#888888"><br>
<br>
-Vijay<br>
<br>
</font></span></blockquote></div><br></div>