<div dir="ltr"><div><div>Hello,<br><br></div>We&#39;ve recently upgraded from gluster 3.6.6 to 3.7.6 and have started encountering dmesg page allocation errors (stack trace is appended). <br><br>It appears that glusterfsd now sometimes fills up the cache completely and crashes with a page allocation failure. I *believe* it mainly happens when copying lots of new data to the system, running a &#39;find&#39;, or similar. Hosts are all Scientific Linux 6.6 and these errors occur consistently on two separate gluster pools.<br></div><div><br>Has anyone else seen this issue and are there any known fixes for it via sysctl kernel parameters or other means?<br><br></div><div>Please let me know of any other diagnostic information that would help.<br></div><div><br></div><div>Thanks,<br></div><div>Patrick<br></div><div><br><br><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote">[1458118.134697] glusterfsd: page allocation failure. order:5, mode:0x20<br>[1458118.134701] Pid: 6010, comm: glusterfsd Not tainted 2.6.32-573.3.1.el6.x86_64 #1<br>[1458118.134702] Call Trace:<br>[1458118.134714]  [&lt;ffffffff8113770c&gt;] ? __alloc_pages_nodemask+0x7dc/0x950<br>[1458118.134728]  [&lt;ffffffffa0321800&gt;] ? mlx4_ib_post_send+0x680/0x1f90 [mlx4_ib]<br>[1458118.134733]  [&lt;ffffffff81176e92&gt;] ? kmem_getpages+0x62/0x170<br>[1458118.134735]  [&lt;ffffffff81177aaa&gt;] ? fallback_alloc+0x1ba/0x270<br>[1458118.134736]  [&lt;ffffffff811774ff&gt;] ? cache_grow+0x2cf/0x320<br>[1458118.134738]  [&lt;ffffffff81177829&gt;] ? ____cache_alloc_node+0x99/0x160<br>[1458118.134743]  [&lt;ffffffff8145f732&gt;] ? pskb_expand_head+0x62/0x280<br>[1458118.134744]  [&lt;ffffffff81178479&gt;] ? __kmalloc+0x199/0x230<br>[1458118.134746]  [&lt;ffffffff8145f732&gt;] ? pskb_expand_head+0x62/0x280<br>[1458118.134748]  [&lt;ffffffff8146001a&gt;] ? __pskb_pull_tail+0x2aa/0x360<br>[1458118.134751]  [&lt;ffffffff8146f389&gt;] ? harmonize_features+0x29/0x70<br>[1458118.134753]  [&lt;ffffffff8146f9f4&gt;] ? dev_hard_start_xmit+0x1c4/0x490<br>[1458118.134758]  [&lt;ffffffff8148cf8a&gt;] ? sch_direct_xmit+0x15a/0x1c0<br>[1458118.134759]  [&lt;ffffffff8146ff68&gt;] ? dev_queue_xmit+0x228/0x320<br>[1458118.134762]  [&lt;ffffffff8147665d&gt;] ? neigh_connected_output+0xbd/0x100<br>[1458118.134766]  [&lt;ffffffff814abc67&gt;] ? ip_finish_output+0x287/0x360<br>[1458118.134767]  [&lt;ffffffff814abdf8&gt;] ? ip_output+0xb8/0xc0<br>[1458118.134769]  [&lt;ffffffff814ab04f&gt;] ? __ip_local_out+0x9f/0xb0<br>[1458118.134770]  [&lt;ffffffff814ab085&gt;] ? ip_local_out+0x25/0x30<br>[1458118.134772]  [&lt;ffffffff814ab580&gt;] ? ip_queue_xmit+0x190/0x420<br>[1458118.134773]  [&lt;ffffffff81137059&gt;] ? __alloc_pages_nodemask+0x129/0x950<br>[1458118.134776]  [&lt;ffffffff814c0c54&gt;] ? tcp_transmit_skb+0x4b4/0x8b0<br>[1458118.134778]  [&lt;ffffffff814c319a&gt;] ? tcp_write_xmit+0x1da/0xa90<br>[1458118.134779]  [&lt;ffffffff81178cbd&gt;] ? __kmalloc_node+0x4d/0x60<br>[1458118.134780]  [&lt;ffffffff814c3a80&gt;] ? tcp_push_one+0x30/0x40<br>[1458118.134782]  [&lt;ffffffff814b410c&gt;] ? tcp_sendmsg+0x9cc/0xa20<br>[1458118.134786]  [&lt;ffffffff8145836b&gt;] ? sock_aio_write+0x19b/0x1c0<br>[1458118.134788]  [&lt;ffffffff814581d0&gt;] ? sock_aio_write+0x0/0x1c0<br>[1458118.134791]  [&lt;ffffffff8119169b&gt;] ? do_sync_readv_writev+0xfb/0x140<br>[1458118.134797]  [&lt;ffffffff810a14b0&gt;] ? autoremove_wake_function+0x0/0x40<br>[1458118.134801]  [&lt;ffffffff8123e92f&gt;] ? 
selinux_file_permission+0xbf/0x150<br>[1458118.134804]  [&lt;ffffffff812316d6&gt;] ? security_file_permission+0x16/0x20<br>[1458118.134806]  [&lt;ffffffff81192746&gt;] ? do_readv_writev+0xd6/0x1f0<br>[1458118.134807]  [&lt;ffffffff811928a6&gt;] ? vfs_writev+0x46/0x60<br>[1458118.134809]  [&lt;ffffffff811929d1&gt;] ? sys_writev+0x51/0xd0<br>[1458118.134812]  [&lt;ffffffff810e88ae&gt;] ? __audit_syscall_exit+0x25e/0x290<br>[1458118.134816]  [&lt;ffffffff8100b0d2&gt;] ? system_call_fastpath+0x16/0x1b<br></blockquote><br></div></div>
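
In case it helps frame the question: the sort of sysctl tuning I had in mind is along the lines of the snippet below. These are generic kernel vm knobs rather than anything gluster-specific, and the values are illustrative guesses, not settings we currently run.

    # Illustrative only -- not currently applied on our hosts.
    # Keep a larger reserve of free pages so high-order (order:5) allocations
    # in the network transmit path are less likely to fail:
    sysctl -w vm.min_free_kbytes=262144

    # Reclaim dentry/inode caches more aggressively so the page cache and slab
    # don't grow until memory is badly fragmented:
    sysctl -w vm.vfs_cache_pressure=200

    # One-off test: drop clean page cache and slab objects, then watch whether
    # the failures stop for a while afterwards:
    sync && echo 3 > /proc/sys/vm/drop_caches

If anyone has found better values, other tunables, or a gluster-side option that avoids these allocations entirely, I'd be glad to hear it.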