<p dir="ltr">This could very well be related to op-version. Could you look at the faulty node&#39;s glusterd log and see the error log entries, that would give us the exact reason of failure.</p>
<p dir="ltr">-Atin<br>
Sent from one plus one</p>
<div class="gmail_quote">On Oct 30, 2015 5:35 PM, &quot;Thomas Bätzler&quot; &lt;<a href="mailto:t.baetzler@bringe.com">t.baetzler@bringe.com</a>&gt; wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
<br>
Can somebody please help me fix our 8-node gluster?<br>
<br>
Setup is as follows:<br>
<br>
root@glucfshead2:~# gluster volume info<br>
<br>
Volume Name: archive<br>
Type: Distributed-Replicate<br>
Volume ID: d888b302-2a35-4559-9bb0-4e182f49f9c6<br>
Status: Started<br>
Number of Bricks: 4 x 2 = 8<br>
Transport-type: tcp<br>
Bricks:<br>
Brick1: glucfshead1:/data/glusterfs/archive/brick1<br>
Brick2: glucfshead5:/data/glusterfs/archive/brick1<br>
Brick3: glucfshead2:/data/glusterfs/archive/brick1<br>
Brick4: glucfshead6:/data/glusterfs/archive/brick1<br>
Brick5: glucfshead3:/data/glusterfs/archive/brick1<br>
Brick6: glucfshead7:/data/glusterfs/archive/brick1<br>
Brick7: glucfshead4:/data/glusterfs/archive/brick1<br>
Brick8: glucfshead8:/data/glusterfs/archive/brick1<br>
Options Reconfigured:<br>
cluster.data-self-heal: off<br>
cluster.entry-self-heal: off<br>
cluster.metadata-self-heal: off<br>
features.lock-heal: on<br>
cluster.readdir-optimize: on<br>
performance.flush-behind: off<br>
performance.io-thread-count: 16<br>
features.quota: off<br>
performance.quick-read: on<br>
performance.stat-prefetch: off<br>
performance.io-cache: on<br>
performance.cache-refresh-timeout: 1<br>
nfs.disable: on<br>
performance.cache-max-file-size: 200kb<br>
performance.cache-size: 2GB<br>
performance.write-behind-window-size: 4MB<br>
performance.read-ahead: off<br>
storage.linux-aio: off<br>
diagnostics.brick-sys-log-level: WARNING<br>
cluster.self-heal-daemon: off<br>
<br>
Volume Name: archive2<br>
Type: Distributed-Replicate<br>
Volume ID: 0fe86e42-e67f-46d8-8ed0-d0e34f539d69<br>
Status: Started<br>
Number of Bricks: 4 x 2 = 8<br>
Transport-type: tcp<br>
Bricks:<br>
Brick1: glucfshead1:/data/glusterfs/archive2/brick1<br>
Brick2: glucfshead5:/data/glusterfs/archive2/brick1<br>
Brick3: glucfshead2:/data/glusterfs/archive2/brick1<br>
Brick4: glucfshead6:/data/glusterfs/archive2/brick1<br>
Brick5: glucfshead3:/data/glusterfs/archive2/brick1<br>
Brick6: glucfshead7:/data/glusterfs/archive2/brick1<br>
Brick7: glucfshead4:/data/glusterfs/archive2/brick1<br>
Brick8: glucfshead8:/data/glusterfs/archive2/brick1<br>
Options Reconfigured:<br>
cluster.metadata-self-heal: off<br>
cluster.entry-self-heal: off<br>
cluster.data-self-heal: off<br>
diagnostics.count-fop-hits: on<br>
diagnostics.latency-measurement: on<br>
features.lock-heal: on<br>
diagnostics.brick-sys-log-level: WARNING<br>
storage.linux-aio: off<br>
performance.read-ahead: off<br>
performance.write-behind-window-size: 4MB<br>
performance.cache-size: 2GB<br>
performance.cache-max-file-size: 200kb<br>
nfs.disable: on<br>
performance.cache-refresh-timeout: 1<br>
performance.io-cache: on<br>
performance.stat-prefetch: off<br>
performance.quick-read: on<br>
features.quota: off<br>
performance.io-thread-count: 16<br>
performance.flush-behind: off<br>
auth.allow: 172.16.15.*<br>
cluster.readdir-optimize: on<br>
cluster.self-heal-daemon: off<br>
<br>
Some time ago, node glucfshead1 broke down. After some fiddling, it was<br>
decided not to deal with that immediately, because the gluster was in<br>
production and a rebuild on 3.4 would basically have rendered it unusable.<br>
<br>
Recently we decided that the situation had to be addressed, and we<br>
hired some experts to help. We reinstalled the broken node, gave it a<br>
new name/IP, and upgraded all systems to 3.6.4.<br>
<br>
The plan was to probe the &quot;new&quot; node into the gluster and then run a<br>
replace-brick onto it (commands sketched below). However, that did not go as expected.<br>
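<br>
For reference, the commands we intended to run were roughly the following<br>
(per volume; brick paths as in the volume info above, and we assumed the<br>
replacement node glucfshead9 would mirror the old brick layout):<br>
<br>
gluster peer probe glucfshead9<br>
gluster volume replace-brick archive glucfshead1:/data/glusterfs/archive/brick1 glucfshead9:/data/glusterfs/archive/brick1 commit force<br>
<br>
... and the same again for archive2, followed by a heal to repopulate the<br>
new bricks.<br>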
<br>
The node that we removed is now listed as &quot;Peer Rejected&quot;:<br>
<br>
root@glucfshead2:~# gluster peer status<br>
Number of Peers: 7<br>
<br>
Hostname: glucfshead1<br>
Uuid: 09ed9a29-c923-4dc5-957a-e0d3e8032daf<br>
State: Peer Rejected (Disconnected)<br>
<br>
Hostname: glucfshead3<br>
Uuid: a17ae95d-4598-4cd7-9ae7-808af10fedb5<br>
State: Peer in Cluster (Connected)<br>
<br>
Hostname: glucfshead4<br>
Uuid: 8547dadd-96bf-45fe-b49d-bab8f995c928<br>
State: Peer in Cluster (Connected)<br>
<br>
Hostname: glucfshead5<br>
Uuid: 249da8ea-fda6-47ff-98e0-dbff99dcb3f2<br>
State: Peer in Cluster (Connected)<br>
<br>
Hostname: glucfshead6<br>
Uuid: a0229511-978c-4904-87ae-7e1b32ac2c72<br>
State: Peer in Cluster (Connected)<br>
<br>
Hostname: glucfshead7<br>
Uuid: 548ec75a-0131-4c92-aaa9-7c6ee7b47a63<br>
State: Peer in Cluster (Connected)<br>
<br>
Hostname: glucfshead8<br>
Uuid: 5e54cbc1-482c-460b-ac38-00c4b71c50b9<br>
State: Peer in Cluster (Connected)<br>
<br>
If I probe the replacement node (glucfshead9), it only ever shows up in the<br>
peer list of one of my running nodes, and there it is in state &quot;Peer<br>
Rejected (Connected)&quot;.<br>
<br>
How can we fix this - preferably without losing data?<br>
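<br>
For context, the generic recovery we&#39;ve seen suggested for a rejected peer<br>
is roughly the following; we haven&#39;t run it yet, since it wipes the node&#39;s<br>
local volume definitions (default /var/lib/glusterd layout assumed):<br>
<br>
# on the rejected node<br>
service glusterd stop<br>
cd /var/lib/glusterd<br>
# keep glusterd.info (the node&#39;s UUID), remove everything else<br>
find . -mindepth 1 ! -name glusterd.info -delete<br>
service glusterd start<br>
# then probe a healthy peer from this node and restart glusterd once<br>
# more so it can sync the volume definitions<br>
gluster peer probe glucfshead2<br>
<br>
Is that the right approach here, and is it safe with regard to our data?<br>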
<br>
TIA,<br>
Thomas<br>
<br>
</blockquote></div>