<div dir="ltr">I see:<div><br></div><div><div>#define GF_DECIDE_DEFRAG_THROTTLE_COUNT(throttle_count, conf) { \</div><div> \</div><div> throttle_count = MAX ((get_nprocs() - 4), 4); \</div><div> \</div><div> if (!strcmp (conf->dthrottle, "lazy")) \</div><div> conf->defrag->rthcount = 1; \</div><div> \</div><div> if (!strcmp (conf->dthrottle, "normal")) \</div><div> conf->defrag->rthcount = (throttle_count / 2); \</div><div> \</div><div> if (!strcmp (conf->dthrottle, "aggressive")) \</div><div> conf->defrag->rthcount = throttle_count; \</div><div>}</div><div><br></div><div>So aggressive will give us the default of (20 + 16), normal is that divided by 2, and lazy is 1, is that correct? If so, that is what I was looking to see. The only other thing I can think of here is making the tunable a number, like event threads, but I like this. I don't know if I saw it documented, but if it's not we should note this in the help text.</div><div><br></div><div>Also to note, the old time was <span style="font-size:12.8000001907349px">98500.00 and the new one is 55088.00, which is a 44% improvement!</span></div><div><span style="font-size:12.8000001907349px"><br></span></div><div><span style="font-size:12.8000001907349px">-b</span></div><div><br></div><div><br></div></div><div class="gmail_extra"><div class="gmail_quote">On Mon, May 4, 2015 at 9:06 AM, Susant Palai <span dir="ltr"><<a href="mailto:spalai@redhat.com" target="_blank">spalai@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Ben,<br>
On no. of threads:<br>
Sent a throttle patch here: <a href="http://review.gluster.org/#/c/10526/" target="_blank">http://review.gluster.org/#/c/10526/</a> to limit thread numbers [not merged yet]. The rebalance process in the current model spawns 20 threads, and in addition to that there will be at most 16 syncop threads.<br>
<br>
Crash:<br>
The crash should be fixed by this: <a href="http://review.gluster.org/#/c/10459/" target="_blank">http://review.gluster.org/#/c/10459/</a>.<br>
<br>
Rebalance time taken is a factor of the number of files and their size. The higher the frequency at which files get added to the global queue [on which the migrator threads act], the faster the rebalance goes. I guess here we are mostly seeing the effect of the local crawl, as only 81GB was migrated out of 500GB.<br>
<span><br>
Thanks,<br>
Susant<br>
<br>
----- Original Message -----<br>
> From: "Benjamin Turner" <<a href="mailto:bennyturns@gmail.com" target="_blank">bennyturns@gmail.com</a>><br>
> To: "Vijay Bellur" <<a href="mailto:vbellur@redhat.com" target="_blank">vbellur@redhat.com</a>><br>
</span><span>> Cc: "Gluster Devel" <<a href="mailto:gluster-devel@gluster.org" target="_blank">gluster-devel@gluster.org</a>><br>
> Sent: Monday, May 4, 2015 5:18:13 PM<br>
> Subject: Re: [Gluster-devel] Rebalance improvement design<br>
><br>
</span><div><div>> Thanks Vijay! I forgot to upgrade the kernel (thinp 6.6 perf bug, gah) before I<br>
> created this data set, so it's a bit smaller:<br>
><br>
> total threads = 16<br>
> total files = 7,060,700 (64 kb files, 100 files per dir)<br>
> total data = 430.951 GB<br>
> 88.26% of requested files processed, minimum is 70.00<br>
> 10101.355737 sec elapsed time<br>
> 698.985382 files/sec<br>
> 698.985382 IOPS<br>
> 43.686586 MB/sec<br>
><br>
> I updated everything and ran the rebalance on<br>
> glusterfs-3.8dev-0.107.git275f724.el6.x86_64.:<br>
><br>
> [root@gqas001 ~]# gluster v rebalance testvol status<br>
> Node Rebalanced-files size scanned failures skipped status run time in secs<br>
> localhost 1327346 81.0GB 3999140 0 0 completed 55088.00<br>
> <a href="http://gqas013.sbu.lab.eng.bos.redhat.com" target="_blank">gqas013.sbu.lab.eng.bos.redhat.com</a> 0 0Bytes 1 0 0 completed 26070.00<br>
> <a href="http://gqas011.sbu.lab.eng.bos.redhat.com" target="_blank">gqas011.sbu.lab.eng.bos.redhat.com</a> 0 0Bytes 0 0 0 failed 0.00<br>
> <a href="http://gqas014.sbu.lab.eng.bos.redhat.com" target="_blank">gqas014.sbu.lab.eng.bos.redhat.com</a> 0 0Bytes 0 0 0 failed 0.00<br>
> <a href="http://gqas016.sbu.lab.eng.bos.redhat.com" target="_blank">gqas016.sbu.lab.eng.bos.redhat.com</a> 1325857 80.9GB 4000865 0 0 completed 55088.00<br>
> <a href="http://gqas015.sbu.lab.eng.bos.redhat.com" target="_blank">gqas015.sbu.lab.eng.bos.redhat.com</a> 0 0Bytes 0 0 0 failed 0.00<br>
> volume rebalance: testvol: success:<br>
><br>
><br>
> A couple observations:<br>
><br>
> I am seeing lots of threads / processes running:<br>
><br>
> [root@gqas001 ~]# ps -eLf | grep glu | wc -l<br>
> 96 <- 96 gluster threads<br>
> [root@gqas001 ~]# ps -eLf | grep rebal | wc -l<br>
> 36 <- 36 rebal threads.<br>
><br>
> Is this tunable? Is there a use case where we would need to limit this? Just<br>
> curious, how did we arrive at 36 rebal threads?<br>
><br>
> # cat /var/log/glusterfs/testvol-rebalance.log | wc -l<br>
> 4,577,583<br>
> [root@gqas001 ~]# ll /var/log/glusterfs/testvol-rebalance.log -h<br>
> -rw------- 1 root root 1.6G May 3 12:29<br>
> /var/log/glusterfs/testvol-rebalance.log<br>
><br>
> :) How big is this log going to get when I do the 10-20 TB run? I'll keep tabs on<br>
> this, my default test setup only has:<br>
><br>
> [root@gqas001 ~]# df -h<br>
> Filesystem Size Used Avail Use% Mounted on<br>
> /dev/mapper/vg_gqas001-lv_root 50G 4.8G 42G 11% /<br>
> tmpfs 24G 0 24G 0% /dev/shm<br>
> /dev/sda1 477M 65M 387M 15% /boot<br>
> /dev/mapper/vg_gqas001-lv_home 385G 71M 366G 1% /home<br>
> /dev/mapper/gluster_vg-lv_bricks 9.5T 219G 9.3T 3% /bricks<br>
><br>
> Next run I want to fill up a 10TB cluster and double the # of bricks to<br>
> simulate running out of space doubling capacity. Any other fixes or changes<br>
> that need to go in before I try a larger data set? Before that I may run my<br>
> performance regression suite against a system while a rebal is in progress<br>
> and check how it affects performance. I'll turn both these cases into perf<br>
> regression tests that I run with iozone smallfile and such, any other use<br>
> cases I should add? Should I add hard / soft links / whatever else to the<br>
> data set?<br>
><br>
> -b<br>
><br>
><br>
> On Sun, May 3, 2015 at 11:48 AM, Vijay Bellur < <a href="mailto:vbellur@redhat.com" target="_blank">vbellur@redhat.com</a> > wrote:<br>
><br>
><br>
> On 05/01/2015 10:23 AM, Benjamin Turner wrote:<br>
><br>
><br>
> Ok I have all my data created and I just started the rebalance. One<br>
> thing to note: in the client log I see the following spamming:<br>
><br>
> [root@gqac006 ~]# cat /var/log/glusterfs/gluster-mount-.log | wc -l<br>
> 394042<br>
><br>
> [2015-05-01 00:47:55.591150] I [MSGID: 109036]<br>
> [dht-common.c:6478:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht:<br>
> Setting layout of<br>
> /file_dstdir/<br>
> <a href="http://gqac006.sbu.lab.eng.bos.redhat.com/thrd_05/d_001/d_000/d_004/d_006" target="_blank">gqac006.sbu.lab.eng.bos.redhat.com/thrd_05/d_001/d_000/d_004/d_006</a><br>
> < <a href="http://gqac006.sbu.lab.eng.bos.redhat.com/thrd_05/d_001/d_000/d_004/d_006" target="_blank">http://gqac006.sbu.lab.eng.bos.redhat.com/thrd_05/d_001/d_000/d_004/d_006</a> ><br>
> with [Subvol_name: testvol-replicate-0, Err: -1 , Start: 0 , Stop:<br>
> 2141429669 ], [Subvol_name: testvol-replicate-1, Err: -1 , Start:<br>
> 2141429670 , Stop: 4294967295 ],<br>
> [2015-05-01 00:47:55.596147] I<br>
> [dht-selfheal.c:1587:dht_selfheal_layout_new_directory] 0-testvol-dht:<br>
> chunk size = 0xffffffff / 19920276 = 0xd7<br>
> [2015-05-01 00:47:55.596177] I<br>
> [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-testvol-dht:<br>
> assigning range size 0x7fa39fa6 to testvol-replicate-1<br>
><br>
><br>
> I also noticed the same set of excessive logs in my tests. Have sent across a<br>
> patch [1] to address this problem.<br>
><br>
> -Vijay<br>
><br>
> [1] <a href="http://review.gluster.org/10281" target="_blank">http://review.gluster.org/10281</a><br>
><br>
><br>
><br>
><br>
><br>
</div></div><div><div>> _______________________________________________<br>
> Gluster-devel mailing list<br>
> <a href="mailto:Gluster-devel@gluster.org" target="_blank">Gluster-devel@gluster.org</a><br>
> <a href="http://www.gluster.org/mailman/listinfo/gluster-devel" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-devel</a><br>
><br>
</div></div></blockquote></div><br></div></div>