<div dir="ltr">Thanks Vijay!  I forgot to upgrade the kernel (thinp 6.6 perf bug, gah) before I created this data set, so it&#39;s a bit smaller:<div><br></div><div><div>total threads = 16</div><div>total files = 7,060,700 (64 kb files, 100 files per dir)</div><div>total data =   430.951 GB</div><div> 88.26% of requested files processed, minimum is  70.00</div><div>10101.355737 sec elapsed time</div><div>698.985382 files/sec</div><div>698.985382 IOPS</div><div>43.686586 MB/sec</div></div><div><br></div><div>I updated everything and ran the rebalance on glusterfs-3.8dev-0.107.git275f724.el6.x86_64:</div><div><br></div><div><div>[root@gqas001 ~]# gluster v rebalance testvol status</div><div>                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs</div><div>                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------</div><div>                               localhost          1327346        81.0GB       3999140             0             0            completed           55088.00</div><div>      <a href="http://gqas013.sbu.lab.eng.bos.redhat.com">gqas013.sbu.lab.eng.bos.redhat.com</a>                0        0Bytes             1             0             0            completed           26070.00</div><div>      <a href="http://gqas011.sbu.lab.eng.bos.redhat.com">gqas011.sbu.lab.eng.bos.redhat.com</a>                0        0Bytes             0             0             0               failed               0.00</div><div>      <a href="http://gqas014.sbu.lab.eng.bos.redhat.com">gqas014.sbu.lab.eng.bos.redhat.com</a>                0        0Bytes             0             0             0               failed               0.00</div><div>      <a href="http://gqas016.sbu.lab.eng.bos.redhat.com">gqas016.sbu.lab.eng.bos.redhat.com</a>          1325857        80.9GB       4000865             
0             0            completed           55088.00</div><div>      <a href="http://gqas015.sbu.lab.eng.bos.redhat.com">gqas015.sbu.lab.eng.bos.redhat.com</a>                0        0Bytes             0             0             0               failed               0.00</div><div>volume rebalance: testvol: success: </div></div><div><br></div><div><br></div><div>A couple of observations:</div><div><br></div><div>I am seeing lots of threads / processes running:</div><div><br></div><div><div>[root@gqas001 ~]# ps -eLf | grep glu | wc -l</div><div>96 &lt;- 96 gluster threads</div><div>[root@gqas001 ~]# ps -eLf | grep rebal | wc -l</div><div>36 &lt;- 36 rebal threads.  </div><div><br></div><div>Is this tunable?  Is there a use case where we would need to limit this?  Just curious, how did we arrive at 36 rebal threads?</div></div><div><br></div><div><div># cat /var/log/glusterfs/testvol-rebalance.log | wc -l</div><div>4,577,583</div></div><div><div>[root@gqas001 ~]# ll /var/log/glusterfs/testvol-rebalance.log -h</div><div>-rw------- 1 root root 1.6G May  3 12:29 /var/log/glusterfs/testvol-rebalance.log</div></div><div><br></div><div>:) How big is this going to get when I do the 10-20 TB run?  I&#39;ll keep tabs on this; my default test setup only has:</div><div><br></div><div><div>[root@gqas001 ~]# df -h</div><div>Filesystem            Size  Used Avail Use% Mounted on</div><div>/dev/mapper/vg_gqas001-lv_root   50G  4.8G   42G  11% /</div><div>tmpfs                  24G     0   24G   0% /dev/shm</div><div>/dev/sda1             477M   65M  387M  15% /boot</div><div>/dev/mapper/vg_gqas001-lv_home  385G   71M  366G   1% /home</div><div>/dev/mapper/gluster_vg-lv_bricks  9.5T  219G  9.3T   3% /bricks</div></div><div><br></div><div>Next run I want to fill up a 10TB cluster and double the # of bricks to simulate running out of space and doubling capacity.  Any other fixes or changes that need to go in before I try a larger data set?  
Before that I may run my performance regression suite against a system while a rebal is in progress and check how it affects performance.  I&#39;ll turn both of these cases into perf regression tests that I run with iozone, smallfile, and such. Are there any other use cases I should add?  Should I add hard / soft links / whatever else to the data set? </div><div><br></div><div>-b</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, May 3, 2015 at 11:48 AM, Vijay Bellur <span dir="ltr">&lt;<a href="mailto:vbellur@redhat.com" target="_blank">vbellur@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 05/01/2015 10:23 AM, Benjamin Turner wrote:<br>

</span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">

Ok I have all my data created and I just started the rebalance.  One<br>

thing to note: in the client log I see the following spamming:<br>

<br>

[root@gqac006 ~]# cat /var/log/glusterfs/gluster-mount-.log | wc -l<br>

394042<br>

<br>

[2015-05-01 00:47:55.591150] I [MSGID: 109036]<br>

[dht-common.c:6478:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht:<br>

Setting layout of<br>

/file_dstdir/<a href="http://gqac006.sbu.lab.eng.bos.redhat.com/thrd_05/d_001/d_000/d_004/d_006" target="_blank">gqac006.sbu.lab.eng.bos.redhat.com/thrd_05/d_001/d_000/d_004/d_006</a><br></span>

&lt;<a href="http://gqac006.sbu.lab.eng.bos.redhat.com/thrd_05/d_001/d_000/d_004/d_006" target="_blank">http://gqac006.sbu.lab.eng.bos.redhat.com/thrd_05/d_001/d_000/d_004/d_006</a>&gt;<span class=""><br>

with [Subvol_name: testvol-replicate-0, Err: -1 , Start: 0 , Stop:<br>

2141429669 ], [Subvol_name: testvol-replicate-1, Err: -1 , Start:<br>

2141429670 , Stop: 4294967295 ],<br>

[2015-05-01 00:47:55.596147] I<br>

[dht-selfheal.c:1587:dht_selfheal_layout_new_directory] 0-testvol-dht:<br>

chunk size = 0xffffffff / 19920276 = 0xd7<br>

[2015-05-01 00:47:55.596177] I<br>

[dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-testvol-dht:<br>

assigning range size 0x7fa39fa6 to testvol-replicate-1<br>

</span></blockquote>

<br>

<br>

I also noticed the same set of excessive logs in my tests. I have sent a patch [1] to address this problem.<br>

<br>

-Vijay<br>

<br>

[1] <a href="http://review.gluster.org/10281" target="_blank">http://review.gluster.org/10281</a><br>

<br>

<br>

<br>

</blockquote></div><br></div>