<div dir="ltr">I&#39;ve tried using much of your script, but I still can&#39;t get past the &quot;<span style="font-size:12.8px">Launching heal operation to perform full self heal on volume &lt;volname&gt; has been unsuccessful on bricks that are down. Please check if all brick processes are running.&quot; error message.  Everything else seems to be working, but &quot;gluster volume heal appian full&quot; never succeeds.</span><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">Is there any way to figure out what exactly causes this error message?  The logs don&#39;t seem very useful for determining what happened.  They seem to state only that it can&#39;t &quot;Commit&quot; with the other bricks.</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">Restarting the volume sometimes fixes it, but I&#39;m not sure I want to run a script that keeps restarting the volume until &quot;gluster volume heal appian full&quot; works.</span></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jun 23, 2016 at 2:21 AM, Heiko L. <span dir="ltr">&lt;<a href="mailto:heikol@fh-lausitz.de" target="_blank">heikol@fh-lausitz.de</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

hostname not needed<br>

<br>

# nodea=10.1.1.100;bricka=/mnt/sda6/brick4<br>

should be working<br>

<br>

but I prefer to work with hostnames.<br>

<br>

<br>

regards heiko<br>

<br>

PS: I forgot some notes:<br>

- xfs, zfs (ext3 works, but with partially bad performance (V3.4))<br>

- brickdir should not be topdir of fs<br>

  /dev/sda6 /mnt/brick4, brick=/mnt/brick4 -&gt;  not recommended<br>

  /dev/sda6 /mnt/sda6,   brick=/mnt/sda6/brick4     better<br>

<div class="HOEnZb"><div class="h5"><br>

&gt; Thank you for responding, Heiko.  I am in the process of comparing<br>

&gt; our two scripts.  The first thing I noticed was that the notes state &quot;need<br>

&gt; to be defined in the /etc/hosts&quot;. Would using the IP address directly be a<br>

&gt; problem?<br>

&gt;<br>

&gt; On Tue, Jun 21, 2016 at 2:10 PM, Heiko L. &lt;<a href="mailto:heikol@fh-lausitz.de">heikol@fh-lausitz.de</a>&gt; wrote:<br>

&gt;<br>

&gt;&gt; On Tue, 21.06.2016, 19:22, Danny Lee wrote:<br>

&gt;&gt; &gt; Hello,<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt; We are currently figuring out how to add GlusterFS to our system to make<br>

&gt;&gt; &gt; our systems highly available using scripts.  We are using Gluster 3.7.11.<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt; Problem:<br>

&gt;&gt; &gt; Trying to migrate to GlusterFS from a non-clustered system to a 3-node<br>

&gt;&gt; &gt; glusterfs replicated cluster using scripts.  Tried various things to<br>

&gt;&gt; &gt; make this work, but it sometimes leaves us in an<br>

&gt;&gt; &gt; undesirable state where if you call &quot;gluster volume heal &lt;volname&gt;<br>

&gt;&gt; &gt; full&quot;, we get the error message, &quot;Launching heal<br>

&gt;&gt; &gt; operation to perform full self heal on volume &lt;volname&gt; has been<br>

&gt;&gt; &gt; unsuccessful on bricks that are down. Please check if<br>

&gt;&gt; &gt; all brick processes are running.&quot;  All the brick processes are running<br>

&gt;&gt; &gt; according to the command &quot;gluster volume status<br>

&gt;&gt; &gt; volname&quot;.<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt; Things we have tried:<br>

&gt;&gt; &gt; Order of preference<br>

&gt;&gt; &gt; 1. Create Volume with 3 filesystems with the same data<br>

&gt;&gt; &gt; 2. Create Volume with 2 empty filesystems and one with the data<br>

&gt;&gt; &gt; 3. Create Volume with only one filesystem with data and then using<br>

&gt;&gt; &gt; &quot;add-brick&quot; command to add the other two empty filesystems<br>

&gt;&gt; &gt; 4. Create Volume with one empty filesystem, mounting it, and then copying<br>

&gt;&gt; &gt; the data over to that one.  And then finally, using &quot;add-brick&quot; command<br>

&gt;&gt; &gt; to add the other two empty filesystems<br>

&gt;&gt; - should be working<br>

&gt;&gt; - read each file on /mnt/gvol, to trigger replication [2]<br>

&gt;&gt;<br>

&gt;&gt; &gt; 5. Create Volume with 3 empty filesystems, mounting it, and then copying the data over<br>

&gt;&gt; - my favorite<br>

&gt;&gt;<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt; Other things to note:<br>

&gt;&gt; &gt; A few minutes after the volume is created and started successfully, our<br>

&gt;&gt; &gt; application server starts up against it, so reads and writes may happen<br>

&gt;&gt; &gt; pretty quickly after the volume has started.  But there<br>

&gt;&gt; &gt; is only about 50MB of data.<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt; Steps to reproduce (all in a script):<br>

&gt;&gt; &gt; # This is run by the primary node with the IP address, &lt;server-ip-1&gt;, that has data<br>

&gt;&gt; &gt; systemctl restart glusterd<br>

&gt;&gt; &gt; gluster peer probe &lt;server-ip-2&gt;<br>

&gt;&gt; &gt; gluster peer probe &lt;server-ip-3&gt;<br>

&gt;&gt; &gt; Wait for &quot;gluster peer status&quot; to all be in &quot;Peer in Cluster&quot; state<br>

&gt;&gt; &gt; gluster volume create &lt;volname&gt; replica 3 transport tcp ${BRICKS[0]} ${BRICKS[1]} ${BRICKS[2]} force<br>

&gt;&gt; &gt; gluster volume set &lt;volname&gt; nfs.disable true<br>

&gt;&gt; &gt; gluster volume start &lt;volname&gt;<br>

&gt;&gt; &gt; mkdir -p $MOUNT_POINT<br>

&gt;&gt; &gt; mount -t glusterfs &lt;server-ip-1&gt;:/volname $MOUNT_POINT<br>

&gt;&gt; &gt; find $MOUNT_POINT | xargs stat<br>

&gt;&gt;<br>

&gt;&gt; I have written a script for 2 nodes [1],<br>

&gt;&gt; but there should be at least 3 nodes.<br>

&gt;&gt;<br>

&gt;&gt;<br>

&gt;&gt; I hope it helps you<br>

&gt;&gt; regards Heiko<br>

&gt;&gt;<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt; Note that, when we added sleeps around the gluster commands, there was a<br>

&gt;&gt; &gt; higher probability of success, but not 100%.<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt; # Once volume is started, all the clients/servers will mount the gluster filesystem by polling &quot;mountpoint -q $MOUNT_POINT&quot;:<br>

&gt;&gt; &gt; mkdir -p $MOUNT_POINT<br>

&gt;&gt; &gt; mount -t glusterfs &lt;server-ip-1&gt;:/volname $MOUNT_POINT<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt; Logs:<br>

&gt;&gt; &gt; *etc-glusterfs-glusterd.vol.log* in *server-ip-1*<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt; [2016-06-21 14:10:38.285234] I [MSGID: 106533] [glusterd-volume-ops.c:857:__glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume volname<br>

&gt;&gt; &gt; [2016-06-21 14:10:38.296801] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on &lt;server-ip-2&gt;. Please check log file for details.<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt; *usr-local-volname-data-mirrored-data.log* in *server-ip-1*<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt; [2016-06-21 14:14:39.233366] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-volname-client-0: failed to get the port number for remote subvolume. Please run &#39;gluster volume status&#39; on server to see if brick process is running.<br>

&gt;&gt; &gt; *I think this is caused by the self heal daemon*<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt; *cmd_history.log* in *server-ip-1*<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt;<br>

&gt;&gt; &gt; [2016-06-21 14:10:38.298800]  : volume heal volname full : FAILED : Commit failed on &lt;server-ip-2&gt;. Please check log file for details.<br>

&gt;&gt; _______________________________________________<br>

&gt;&gt; &gt; Gluster-users mailing list<br>

&gt;&gt; &gt; <a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>

&gt;&gt; &gt; <a href="http://www.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-users</a><br>

&gt;&gt;<br>

&gt;&gt; [1]<br>

&gt;&gt; <a href="http://www2.fh-lausitz.de/launic/comp/net/glusterfs/130620.glusterfs.create_brick_vol.howto.txt" rel="noreferrer" target="_blank">http://www2.fh-lausitz.de/launic/comp/net/glusterfs/130620.glusterfs.create_brick_vol.howto.txt</a><br>

&gt;&gt;   - old, limit 2 nodes<br>

&gt;&gt;<br>

&gt;&gt;<br>

&gt;&gt; --<br>

&gt;&gt;<br>

&gt;&gt;<br>

&gt;&gt;<br>

&gt;<br>

<br>

<br>

</div></div></blockquote></div><br></div>
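<div dir="ltr">The quoted steps wait for &quot;gluster peer status&quot; to show every peer in the &quot;Peer in Cluster&quot; state, and adding sleeps reportedly raised the success rate. A minimal sketch of that wait as a bounded poll instead of fixed sleeps might look like the following. The helper names peers_ready and wait_for_peers, the default retry count, and the default delay are hypothetical, and the match string assumes that &quot;gluster peer status&quot; prints &quot;State: Peer in Cluster (Connected)&quot; once per healthy peer. Nothing in this thread confirms that this resolves the heal error.</div>

```shell
#!/bin/sh
# Sketch only: poll until the expected number of peers report
# "Peer in Cluster (Connected)". Helper names and defaults are hypothetical.

# peers_ready EXPECTED
# Reads `gluster peer status` output on stdin and succeeds only when
# exactly EXPECTED peers are in the connected cluster state.
peers_ready() {
    expected=$1
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps that from aborting callers that use `set -e`.
    connected=$(grep -c 'State: Peer in Cluster (Connected)' || true)
    [ "$connected" -eq "$expected" ]
}

# wait_for_peers EXPECTED [TRIES] [DELAY_SECONDS]
# Re-runs `gluster peer status` until peers_ready succeeds or the
# attempts run out. Returns 0 on success, 1 on timeout.
wait_for_peers() {
    expected=$1
    tries=${2:-30}
    delay=${3:-2}
    while [ "$tries" -gt 0 ]; do
        if gluster peer status | peers_ready "$expected"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}
```

<div dir="ltr">For the 3-node setup described above, the primary node would call &quot;wait_for_peers 2&quot; after the two peer probe commands, assuming the status output lists only the other two nodes, and would stop the script if the call returns non-zero.</div>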