<p dir="ltr">This could very well be related to op-version. Could you look at the faulty node&#39;s glusterd log and see the error log entries, that would give us the exact reason of failure.</p>
<p dir="ltr">-Atin<br>
Sent from one plus one</p>
<div class="gmail_quote">On Oct 30, 2015 5:35 PM, &quot;Thomas Bätzler&quot; &lt;<a href="mailto:t.baetzler@bringe.com">t.baetzler@bringe.com</a>&gt; wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
<br>
Can somebody please help me fix our 8-node gluster?<br>
<br>
Setup is as follows:<br>
<br>
root@glucfshead2:~# gluster volume info<br>
<br>
Volume Name: archive<br>
Type: Distributed-Replicate<br>
Volume ID: d888b302-2a35-4559-9bb0-4e182f49f9c6<br>
Status: Started<br>
Number of Bricks: 4 x 2 = 8<br>
Transport-type: tcp<br>
Bricks:<br>
Brick1: glucfshead1:/data/glusterfs/archive/brick1<br>
Brick2: glucfshead5:/data/glusterfs/archive/brick1<br>
Brick3: glucfshead2:/data/glusterfs/archive/brick1<br>
Brick4: glucfshead6:/data/glusterfs/archive/brick1<br>
Brick5: glucfshead3:/data/glusterfs/archive/brick1<br>
Brick6: glucfshead7:/data/glusterfs/archive/brick1<br>
Brick7: glucfshead4:/data/glusterfs/archive/brick1<br>
Brick8: glucfshead8:/data/glusterfs/archive/brick1<br>
Options Reconfigured:<br>
cluster.data-self-heal: off<br>
cluster.entry-self-heal: off<br>
cluster.metadata-self-heal: off<br>
features.lock-heal: on<br>
cluster.readdir-optimize: on<br>
performance.flush-behind: off<br>
performance.io-thread-count: 16<br>
features.quota: off<br>
performance.quick-read: on<br>
performance.stat-prefetch: off<br>
performance.io-cache: on<br>
performance.cache-refresh-timeout: 1<br>
nfs.disable: on<br>
performance.cache-max-file-size: 200kb<br>
performance.cache-size: 2GB<br>
performance.write-behind-window-size: 4MB<br>
performance.read-ahead: off<br>
storage.linux-aio: off<br>
diagnostics.brick-sys-log-level: WARNING<br>
cluster.self-heal-daemon: off<br>
<br>
Volume Name: archive2<br>
Type: Distributed-Replicate<br>
Volume ID: 0fe86e42-e67f-46d8-8ed0-d0e34f539d69<br>
Status: Started<br>
Number of Bricks: 4 x 2 = 8<br>
Transport-type: tcp<br>
Bricks:<br>
Brick1: glucfshead1:/data/glusterfs/archive2/brick1<br>
Brick2: glucfshead5:/data/glusterfs/archive2/brick1<br>
Brick3: glucfshead2:/data/glusterfs/archive2/brick1<br>
Brick4: glucfshead6:/data/glusterfs/archive2/brick1<br>
Brick5: glucfshead3:/data/glusterfs/archive2/brick1<br>
Brick6: glucfshead7:/data/glusterfs/archive2/brick1<br>
Brick7: glucfshead4:/data/glusterfs/archive2/brick1<br>
Brick8: glucfshead8:/data/glusterfs/archive2/brick1<br>
Options Reconfigured:<br>
cluster.metadata-self-heal: off<br>
cluster.entry-self-heal: off<br>
cluster.data-self-heal: off<br>
diagnostics.count-fop-hits: on<br>
diagnostics.latency-measurement: on<br>
features.lock-heal: on<br>
diagnostics.brick-sys-log-level: WARNING<br>
storage.linux-aio: off<br>
performance.read-ahead: off<br>
performance.write-behind-window-size: 4MB<br>
performance.cache-size: 2GB<br>
performance.cache-max-file-size: 200kb<br>
nfs.disable: on<br>
performance.cache-refresh-timeout: 1<br>
performance.io-cache: on<br>
performance.stat-prefetch: off<br>
performance.quick-read: on<br>
features.quota: off<br>
performance.io-thread-count: 16<br>
performance.flush-behind: off<br>
auth.allow: 172.16.15.*<br>
cluster.readdir-optimize: on<br>
cluster.self-heal-daemon: off<br>
<br>
Some time ago, node glucfshead1 broke down. After some fiddling, it was<br>
decided not to deal with that immediately, because the gluster was in<br>
production and a rebuild on 3.4 would basically have rendered it unusable.<br>
<br>
Recently we decided that the situation had to be addressed, and we<br>
hired some experts to help. We reinstalled the broken node, gave it a<br>
new name/IP, and upgraded all systems to 3.6.4.<br>
<br>
The plan was to probe the &quot;new&quot; node into the gluster and then run a<br>
replace-brick onto it (commands sketched below). However, that did not go as expected.<br>
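<br>
For reference, the commands we intended to run were roughly the following<br>
(per volume; brick paths as in the volume info above, and we assumed the<br>
replacement node glucfshead9 would mirror the old brick layout):<br>
<br>
gluster peer probe glucfshead9<br>
gluster volume replace-brick archive glucfshead1:/data/glusterfs/archive/brick1 glucfshead9:/data/glusterfs/archive/brick1 commit force<br>
<br>
... and the same again for archive2, followed by a heal to repopulate the<br>
new bricks.<br>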
<br>
The node that we removed is now listed as &quot;Peer Rejected&quot;:<br>
<br>
root@glucfshead2:~# gluster peer status<br>
Number of Peers: 7<br>
<br>
Hostname: glucfshead1<br>
Uuid: 09ed9a29-c923-4dc5-957a-e0d3e8032daf<br>
State: Peer Rejected (Disconnected)<br>
<br>
Hostname: glucfshead3<br>
Uuid: a17ae95d-4598-4cd7-9ae7-808af10fedb5<br>
State: Peer in Cluster (Connected)<br>
<br>
Hostname: glucfshead4<br>
Uuid: 8547dadd-96bf-45fe-b49d-bab8f995c928<br>
State: Peer in Cluster (Connected)<br>
<br>
Hostname: glucfshead5<br>
Uuid: 249da8ea-fda6-47ff-98e0-dbff99dcb3f2<br>
State: Peer in Cluster (Connected)<br>
<br>
Hostname: glucfshead6<br>
Uuid: a0229511-978c-4904-87ae-7e1b32ac2c72<br>
State: Peer in Cluster (Connected)<br>
<br>
Hostname: glucfshead7<br>
Uuid: 548ec75a-0131-4c92-aaa9-7c6ee7b47a63<br>
State: Peer in Cluster (Connected)<br>
<br>
Hostname: glucfshead8<br>
Uuid: 5e54cbc1-482c-460b-ac38-00c4b71c50b9<br>
State: Peer in Cluster (Connected)<br>
<br>
If I probe the replacement node (glucfshead9), it only ever shows up in the<br>
peer list of one of my running nodes, and there it is in state &quot;Peer<br>
Rejected (Connected)&quot;.<br>
<br>
How can we fix this - preferably without losing data?<br>
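<br>
For context, the generic recovery we&#39;ve seen suggested for a rejected peer<br>
is roughly the following; we haven&#39;t run it yet, since it wipes the node&#39;s<br>
local volume definitions (default /var/lib/glusterd layout assumed):<br>
<br>
# on the rejected node<br>
service glusterd stop<br>
cd /var/lib/glusterd<br>
# keep glusterd.info (the node&#39;s UUID), remove everything else<br>
find . -mindepth 1 ! -name glusterd.info -delete<br>
service glusterd start<br>
# then probe a healthy peer from this node and restart glusterd once<br>
# more so it can sync the volume definitions<br>
gluster peer probe glucfshead2<br>
<br>
Is that the right approach here, and is it safe with regard to our data?<br>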
<br>
TIA,<br>
Thomas<br>
<br>
</blockquote></div>