I had missed passwordless SSH auth for the root user. Setting it up, however, made no difference:

After verifying the prerequisites, I issued gluster nfs-ganesha enable on node cobalt:

Sep 22 10:19:56 cobalt systemd: Starting Preprocess NFS configuration...
Sep 22 10:19:56 cobalt systemd: Starting RPC Port Mapper.
Sep 22 10:19:56 cobalt systemd: Reached target RPC Port Mapper.
Sep 22 10:19:56 cobalt systemd: Starting Host and Network Name Lookups.
Sep 22 10:19:56 cobalt systemd: Reached target Host and Network Name Lookups.
Sep 22 10:19:56 cobalt systemd: Starting RPC bind service...
Sep 22 10:19:56 cobalt systemd: Started Preprocess NFS configuration.
Sep 22 10:19:56 cobalt systemd: Started RPC bind service.
Sep 22 10:19:56 cobalt systemd: Starting NFS status monitor for NFSv2/3 locking....
Sep 22 10:19:56 cobalt rpc.statd[2666]: Version 1.3.0 starting
Sep 22 10:19:56 cobalt rpc.statd[2666]: Flags: TI-RPC
Sep 22 10:19:56 cobalt systemd: Started NFS status monitor for NFSv2/3 locking..
Sep 22 10:19:56 cobalt systemd: Starting NFS-Ganesha file server...
Sep 22 10:19:56 cobalt systemd: Started NFS-Ganesha file server.
Sep 22 10:19:56 cobalt kernel: warning: `ganesha.nfsd' uses 32-bit capabilities (legacy support in use)
Sep 22 10:19:56 cobalt rpc.statd[2666]: Received SM_UNMON_ALL request from cobalt.int.rdmedia.com while not monitoring any hosts
Sep 22 10:19:56 cobalt logger: setting up rd-ganesha-ha
Sep 22 10:19:56 cobalt logger: setting up cluster rd-ganesha-ha with the following cobalt iron
Sep 22 10:19:57 cobalt systemd: Stopped Pacemaker High Availability Cluster Manager.
Sep 22 10:19:57 cobalt systemd: Stopped Corosync Cluster Engine.
Sep 22 10:19:57 cobalt systemd: Reloading.
Sep 22 10:19:57 cobalt systemd: [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue 'RemoveOnStop' in section 'Socket'
Sep 22 10:19:57 cobalt systemd: [/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue 'RemoveOnStop' in section 'Socket'
Sep 22 10:19:57 cobalt systemd: Reloading.
Sep 22 10:19:57 cobalt systemd: [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue 'RemoveOnStop' in section 'Socket'
Sep 22 10:19:57 cobalt systemd: [/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue 'RemoveOnStop' in section 'Socket'
Sep 22 10:19:57 cobalt systemd: Starting Corosync Cluster Engine...
Sep 22 10:19:57 cobalt corosync[2815]: [MAIN ] Corosync Cluster Engine ('2.3.4'): started and ready to provide service.
Sep 22 10:19:57 cobalt corosync[2815]: [MAIN ] Corosync built-in features: dbus systemd xmlconf snmp pie relro bindnow
Sep 22 10:19:57 cobalt corosync[2816]: [TOTEM ] Initializing transport (UDP/IP Unicast).
Sep 22 10:19:57 cobalt corosync[2816]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] The network interface [10.100.30.37] is now up.
Sep 22 10:19:58 cobalt corosync[2816]: [SERV ] Service engine loaded: corosync configuration map access [0]
Sep 22 10:19:58 cobalt corosync[2816]: [QB ] server name: cmap
Sep 22 10:19:58 cobalt corosync[2816]: [SERV ] Service engine loaded: corosync configuration service [1]
Sep 22 10:19:58 cobalt corosync[2816]: [QB ] server name: cfg
Sep 22 10:19:58 cobalt corosync[2816]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Sep 22 10:19:58 cobalt corosync[2816]: [QB ] server name: cpg
Sep 22 10:19:58 cobalt corosync[2816]: [SERV ] Service engine loaded: corosync profile loading service [4]
Sep 22 10:19:58 cobalt corosync[2816]: [QUORUM] Using quorum provider corosync_votequorum
Sep 22 10:19:58 cobalt corosync[2816]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 22 10:19:58 cobalt corosync[2816]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Sep 22 10:19:58 cobalt corosync[2816]: [QB ] server name: votequorum
Sep 22 10:19:58 cobalt corosync[2816]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Sep 22 10:19:58 cobalt corosync[2816]: [QB ] server name: quorum
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] adding new UDPU member {10.100.30.37}
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] adding new UDPU member {10.100.30.38}
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] A new membership (10.100.30.37:140) was formed. Members joined: 1
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] A new membership (10.100.30.37:148) was formed. Members joined: 1
Sep 22 10:19:58 cobalt corosync[2816]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 22 10:19:58 cobalt corosync[2816]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 22 10:19:58 cobalt corosync[2816]: [QUORUM] Members[0]:
Sep 22 10:19:58 cobalt corosync[2816]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 22 10:21:27 cobalt systemd: corosync.service operation timed out. Terminating.
Sep 22 10:21:27 cobalt corosync: Starting Corosync Cluster Engine (corosync):
Sep 22 10:21:27 cobalt systemd: Failed to start Corosync Cluster Engine.
Sep 22 10:21:27 cobalt systemd: Unit corosync.service entered failed state.
Sep 22 10:21:32 cobalt logger: warning: pcs property set no-quorum-policy=ignore failed
Sep 22 10:21:32 cobalt logger: warning: pcs property set stonith-enabled=false failed
Sep 22 10:21:32 cobalt logger: warning: pcs resource create nfs_start ganesha_nfsd ha_vol_mnt=/var/run/gluster/shared_storage --clone failed
Sep 22 10:21:33 cobalt logger: warning: pcs resource delete nfs_start-clone failed
Sep 22 10:21:33 cobalt logger: warning: pcs resource create nfs-mon ganesha_mon --clone failed
Sep 22 10:21:33 cobalt logger: warning: pcs resource create nfs-grace ganesha_grace --clone failed
Sep 22 10:21:34 cobalt logger: warning pcs resource create cobalt-cluster_ip-1 ocf:heartbeat:IPaddr ip=10.100.30.101 cidr_netmask=32 op monitor interval=15s failed
Sep 22 10:21:34 cobalt logger: warning: pcs resource create cobalt-trigger_ip-1 ocf:heartbeat:Dummy failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint colocation add cobalt-cluster_ip-1 with cobalt-trigger_ip-1 failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint order cobalt-trigger_ip-1 then nfs-grace-clone failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint order nfs-grace-clone then cobalt-cluster_ip-1 failed
Sep 22 10:21:34 cobalt logger: warning pcs resource create iron-cluster_ip-1 ocf:heartbeat:IPaddr ip=10.100.30.102 cidr_netmask=32 op monitor interval=15s failed
Sep 22 10:21:34 cobalt logger: warning: pcs resource create iron-trigger_ip-1 ocf:heartbeat:Dummy failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint colocation add iron-cluster_ip-1 with iron-trigger_ip-1 failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint order iron-trigger_ip-1 then nfs-grace-clone failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint order nfs-grace-clone then iron-cluster_ip-1 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location cobalt-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location cobalt-cluster_ip-1 prefers iron=1000 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location cobalt-cluster_ip-1 prefers cobalt=2000 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location iron-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location iron-cluster_ip-1 prefers cobalt=1000 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location iron-cluster_ip-1 prefers iron=2000 failed
Sep 22 10:21:35 cobalt logger: warning pcs cluster cib-push /tmp/tmp.yqLT4m75WG failed

Note the failed corosync service (the lines from "operation timed out" through "entered failed state"), after which every pcs command fails. I can't find any logs pointing to a reason. Starting it manually is not a problem:

Sep 22 10:35:06 cobalt corosync: Starting Corosync Cluster Engine (corosync): [ OK ]

Then I noticed pacemaker was not running on both nodes. I started it manually and saw the following in /var/log/messages on the other node:

Sep 22 10:36:43 iron cibadmin[4654]: notice: Invoked: /usr/sbin/cibadmin --replace -o configuration -V --xml-pipe
Sep 22 10:36:43 iron crmd[4617]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Sep 22 10:36:44 iron pengine[4616]: notice: On loss of CCM Quorum: Ignore
Sep 22 10:36:44 iron pengine[4616]: error: Resource start-up disabled since no STONITH resources have been defined
Sep 22 10:36:44 iron pengine[4616]: error: Either configure some or disable STONITH with the stonith-enabled option
Sep 22 10:36:44 iron pengine[4616]: error: NOTE: Clusters with shared data need STONITH to ensure data integrity
Sep 22 10:36:44 iron pengine[4616]: notice: Delaying fencing operations until there are resources to manage
Sep 22 10:36:44 iron pengine[4616]: warning: Node iron is unclean!
Sep 22 10:36:44 iron pengine[4616]: notice: Cannot fence unclean nodes until quorum is attained (or no-quorum-policy is set to ignore)
Sep 22 10:36:44 iron pengine[4616]: warning: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-warn-20.bz2
Sep 22 10:36:44 iron pengine[4616]: notice: Configuration ERRORs found during PE processing. Please run "crm_verify -L" to identify issues.
Sep 22 10:36:44 iron crmd[4617]: notice: Transition 2 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-20.bz2): Complete
Sep 22 10:36:44 iron crmd[4617]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]

I'm starting to think there is some leftover config somewhere from all these attempts. Is there a way to completely reset all config related to NFS-Ganesha and start over?
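To get back to a clean slate I'm considering something along these lines on both cobalt and iron (a sketch only — the exact state paths are my assumptions from the stock CentOS 7 packaging, so please correct me if this misses something):

gluster nfs-ganesha disable      # stop ganesha/HA via glusterd first
pcs cluster stop                 # stop pacemaker/corosync if still half-running
pcs cluster destroy              # wipe the pcs cluster config, incl. /etc/corosync/corosync.conf
rm -rf /var/lib/pacemaker/cib/*  # assumed: drop any cached CIB left behind
rm -rf /var/lib/corosync/*       # assumed: drop corosync runtime state
rm -rf /var/run/gluster/shared_storage/nfs-ganesha   # HA state on the shared volume
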
On 22 September 2015 at 09:04, Soumya Koduri <skoduri@redhat.com> wrote:

Hi Tiemen,
I have added the steps to configure HA NFS in the doc below. Please verify that you have all the prerequisites in place and have performed the steps correctly.

https://github.com/soumyakoduri/glusterdocs/blob/ha_guide/Administrator%20Guide/Configuring%20HA%20NFS%20Server.md

Thanks,
Soumya

On 09/21/2015 09:21 PM, Tiemen Ruiten wrote:
Whoops, replied off-list.

Additionally I noticed that the generated corosync config is not valid, as there is no interface section:

/etc/corosync/corosync.conf

totem {
  version: 2
  secauth: off
  cluster_name: rd-ganesha-ha
  transport: udpu
}

nodelist {
  node {
    ring0_addr: cobalt
    nodeid: 1
  }
  node {
    ring0_addr: iron
    nodeid: 2
  }
}

quorum {
  provider: corosync_votequorum
  two_node: 1
}

logging {
  to_syslog: yes
}
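
For comparison, a totem section with an explicit interface subsection would look roughly like this (the bindnetaddr is my assumption from the 10.100.30.x addresses in the logs; with transport udpu plus a nodelist, corosync 2.x can normally derive the binding itself, so this may well be optional):

totem {
  version: 2
  secauth: off
  cluster_name: rd-ganesha-ha
  transport: udpu
  interface {
    ringnumber: 0
    bindnetaddr: 10.100.30.0   # network address, not a host address (assumed /24)
    mcastport: 5405
  }
}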

---------- Forwarded message ----------
From: Tiemen Ruiten <t.ruiten@rdmedia.com>
Date: 21 September 2015 at 17:16
Subject: Re: [Gluster-users] nfs-ganesha HA with arbiter volume
To: Jiffin Tony Thottan <jthottan@redhat.com>

Could you point me to the latest documentation? I've been struggling to find something up-to-date. I believe I have all the prerequisites:

- shared storage volume exists and is mounted
- all nodes in the hosts files
- Gluster-NFS disabled
- corosync, pacemaker and nfs-ganesha RPMs installed

Anything I missed? (Checks for each item are sketched below.)
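
For the record, these are the checks I ran (package names as on CentOS 7, as far as I know; <volname> is a placeholder):

gluster volume info gluster_shared_storage   # shared volume exists and is started
gluster volume get <volname> nfs.disable     # 'on' means Gluster-NFS is off (needs glusterfs >= 3.7)
rpm -q corosync pacemaker pcs nfs-ganesha nfs-ganesha-gluster glusterfs-ganesha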

Everything has been installed by RPM, so everything is in the default locations:
/usr/libexec/ganesha/ganesha-ha.sh
/etc/ganesha/ganesha.conf (empty)
/etc/ganesha/ganesha-ha.conf

After I started the pcsd service manually (what I ran is sketched below), nfs-ganesha could be enabled successfully, but there was no virtual IP present on the interfaces, and looking at the system log I noticed corosync failed to start.

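By starting pcsd manually I mean roughly this on each node (the hacluster password and node-auth steps are the usual pcs workflow on CentOS 7 — I'm assuming, not certain, that ganesha-ha.sh relies on them; the password is a placeholder):

systemctl start pcsd
systemctl enable pcsd
echo 'SomePassword' | passwd --stdin hacluster   # placeholder password
pcs cluster auth cobalt iron -u hacluster
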
- on the host where I issued the gluster nfs-ganesha enable command:

Sep 21 17:07:18 iron systemd: Starting NFS-Ganesha file server...
Sep 21 17:07:19 iron systemd: Started NFS-Ganesha file server.
Sep 21 17:07:19 iron rpc.statd[2409]: Received SM_UNMON_ALL request from iron.int.rdmedia.com while not monitoring any hosts
Sep 21 17:07:20 iron systemd: Starting Corosync Cluster Engine...
Sep 21 17:07:20 iron corosync[3426]: [MAIN ] Corosync Cluster Engine ('2.3.4'): started and ready to provide service.
Sep 21 17:07:20 iron corosync[3426]: [MAIN ] Corosync built-in features: dbus systemd xmlconf snmp pie relro bindnow
Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] Initializing transport (UDP/IP Unicast).
Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] The network interface [10.100.30.38] is now up.
Sep 21 17:07:20 iron corosync[3427]: [SERV ] Service engine loaded: corosync configuration map access [0]
Sep 21 17:07:20 iron corosync[3427]: [QB ] server name: cmap
Sep 21 17:07:20 iron corosync[3427]: [SERV ] Service engine loaded: corosync configuration service [1]
Sep 21 17:07:20 iron corosync[3427]: [QB ] server name: cfg
Sep 21 17:07:20 iron corosync[3427]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Sep 21 17:07:20 iron corosync[3427]: [QB ] server name: cpg
Sep 21 17:07:20 iron corosync[3427]: [SERV ] Service engine loaded: corosync profile loading service [4]
Sep 21 17:07:20 iron corosync[3427]: [QUORUM] Using quorum provider corosync_votequorum
Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 21 17:07:20 iron corosync[3427]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Sep 21 17:07:20 iron corosync[3427]: [QB ] server name: votequorum
Sep 21 17:07:20 iron corosync[3427]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Sep 21 17:07:20 iron corosync[3427]: [QB ] server name: quorum
Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] adding new UDPU member {10.100.30.38}
Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] adding new UDPU member {10.100.30.37}
Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] A new membership (10.100.30.38:104) was formed. Members joined: 1
Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 21 17:07:20 iron corosync[3427]: [QUORUM] Members[1]: 1
Sep 21 17:07:20 iron corosync[3427]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] A new membership (10.100.30.37:108) was formed. Members joined: 1
Sep 21 17:08:21 iron corosync: Starting Corosync Cluster Engine (corosync): [FAILED]
Sep 21 17:08:21 iron systemd: corosync.service: control process exited, code=exited status=1
Sep 21 17:08:21 iron systemd: Failed to start Corosync Cluster Engine.
Sep 21 17:08:21 iron systemd: Unit corosync.service entered failed state.

- on the other host:

Sep 21 17:07:19 cobalt systemd: Starting Preprocess NFS configuration...
Sep 21 17:07:19 cobalt systemd: Starting RPC Port Mapper.
Sep 21 17:07:19 cobalt systemd: Reached target RPC Port Mapper.
Sep 21 17:07:19 cobalt systemd: Starting Host and Network Name Lookups.
Sep 21 17:07:19 cobalt systemd: Reached target Host and Network Name Lookups.
Sep 21 17:07:19 cobalt systemd: Starting RPC bind service...
Sep 21 17:07:19 cobalt systemd: Started Preprocess NFS configuration.
Sep 21 17:07:19 cobalt systemd: Started RPC bind service.
Sep 21 17:07:19 cobalt systemd: Starting NFS status monitor for NFSv2/3 locking....
Sep 21 17:07:19 cobalt rpc.statd[2662]: Version 1.3.0 starting
Sep 21 17:07:19 cobalt rpc.statd[2662]: Flags: TI-RPC
Sep 21 17:07:19 cobalt systemd: Started NFS status monitor for NFSv2/3 locking..
Sep 21 17:07:19 cobalt systemd: Starting NFS-Ganesha file server...
Sep 21 17:07:19 cobalt systemd: Started NFS-Ganesha file server.
Sep 21 17:07:19 cobalt kernel: warning: `ganesha.nfsd' uses 32-bit capabilities (legacy support in use)
Sep 21 17:07:19 cobalt logger: setting up rd-ganesha-ha
Sep 21 17:07:19 cobalt rpc.statd[2662]: Received SM_UNMON_ALL request from cobalt.int.rdmedia.com while not monitoring any hosts
Sep 21 17:07:19 cobalt logger: setting up cluster rd-ganesha-ha with the following cobalt iron
Sep 21 17:07:20 cobalt systemd: Stopped Pacemaker High Availability Cluster Manager.
Sep 21 17:07:20 cobalt systemd: Stopped Corosync Cluster Engine.
Sep 21 17:07:20 cobalt systemd: Reloading.
Sep 21 17:07:20 cobalt systemd: [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue 'RemoveOnStop' in section 'Socket'
Sep 21 17:07:20 cobalt systemd: [/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue 'RemoveOnStop' in section 'Socket'
Sep 21 17:07:20 cobalt systemd: Reloading.
Sep 21 17:07:20 cobalt systemd: [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue 'RemoveOnStop' in section 'Socket'
Sep 21 17:07:20 cobalt systemd: [/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue 'RemoveOnStop' in section 'Socket'
Sep 21 17:07:20 cobalt systemd: Starting Corosync Cluster Engine...
Sep 21 17:07:20 cobalt corosync[2816]: [MAIN ] Corosync Cluster Engine ('2.3.4'): started and ready to provide service.
Sep 21 17:07:20 cobalt corosync[2816]: [MAIN ] Corosync built-in features: dbus systemd xmlconf snmp pie relro bindnow
Sep 21 17:07:20 cobalt corosync[2817]: [TOTEM ] Initializing transport (UDP/IP Unicast).
Sep 21 17:07:20 cobalt corosync[2817]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] The network interface [10.100.30.37] is now up.
Sep 21 17:07:21 cobalt corosync[2817]: [SERV ] Service engine loaded: corosync configuration map access [0]
Sep 21 17:07:21 cobalt corosync[2817]: [QB ] server name: cmap
Sep 21 17:07:21 cobalt corosync[2817]: [SERV ] Service engine loaded: corosync configuration service [1]
Sep 21 17:07:21 cobalt corosync[2817]: [QB ] server name: cfg
Sep 21 17:07:21 cobalt corosync[2817]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Sep 21 17:07:21 cobalt corosync[2817]: [QB ] server name: cpg
Sep 21 17:07:21 cobalt corosync[2817]: [SERV ] Service engine loaded: corosync profile loading service [4]
Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Using quorum provider corosync_votequorum
Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 21 17:07:21 cobalt corosync[2817]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Sep 21 17:07:21 cobalt corosync[2817]: [QB ] server name: votequorum
Sep 21 17:07:21 cobalt corosync[2817]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Sep 21 17:07:21 cobalt corosync[2817]: [QB ] server name: quorum
Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] adding new UDPU member {10.100.30.37}
Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] adding new UDPU member {10.100.30.38}
Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] A new membership (10.100.30.37:100) was formed. Members joined: 1
Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Members[1]: 1
Sep 21 17:07:21 cobalt corosync[2817]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] A new membership (10.100.30.37:108) was formed. Members joined: 1
Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Members[1]: 1
Sep 21 17:07:21 cobalt corosync[2817]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 21 17:08:50 cobalt systemd: corosync.service operation timed out. Terminating.
Sep 21 17:08:50 cobalt corosync: Starting Corosync Cluster Engine (corosync):
Sep 21 17:08:50 cobalt systemd: Failed to start Corosync Cluster Engine.
Sep 21 17:08:50 cobalt systemd: Unit corosync.service entered failed state.
Sep 21 17:08:55 cobalt logger: warning: pcs property set no-quorum-policy=ignore failed
Sep 21 17:08:55 cobalt logger: warning: pcs property set stonith-enabled=false failed
Sep 21 17:08:55 cobalt logger: warning: pcs resource create nfs_start ganesha_nfsd ha_vol_mnt=/var/run/gluster/shared_storage --clone failed
Sep 21 17:08:56 cobalt logger: warning: pcs resource delete nfs_start-clone failed
Sep 21 17:08:56 cobalt logger: warning: pcs resource create nfs-mon ganesha_mon --clone failed
Sep 21 17:08:56 cobalt logger: warning: pcs resource create nfs-grace ganesha_grace --clone failed
Sep 21 17:08:57 cobalt logger: warning pcs resource create cobalt-cluster_ip-1 ocf:heartbeat:IPaddr ip= cidr_netmask=32 op monitor interval=15s failed
Sep 21 17:08:57 cobalt logger: warning: pcs resource create cobalt-trigger_ip-1 ocf:heartbeat:Dummy failed
Sep 21 17:08:57 cobalt logger: warning: pcs constraint colocation add cobalt-cluster_ip-1 with cobalt-trigger_ip-1 failed
Sep 21 17:08:57 cobalt logger: warning: pcs constraint order cobalt-trigger_ip-1 then nfs-grace-clone failed
Sep 21 17:08:57 cobalt logger: warning: pcs constraint order nfs-grace-clone then cobalt-cluster_ip-1 failed
Sep 21 17:08:57 cobalt logger: warning pcs resource create iron-cluster_ip-1 ocf:heartbeat:IPaddr ip= cidr_netmask=32 op monitor interval=15s failed
Sep 21 17:08:57 cobalt logger: warning: pcs resource create iron-trigger_ip-1 ocf:heartbeat:Dummy failed
Sep 21 17:08:57 cobalt logger: warning: pcs constraint colocation add iron-cluster_ip-1 with iron-trigger_ip-1 failed
Sep 21 17:08:57 cobalt logger: warning: pcs constraint order iron-trigger_ip-1 then nfs-grace-clone failed
Sep 21 17:08:58 cobalt logger: warning: pcs constraint order nfs-grace-clone then iron-cluster_ip-1 failed
Sep 21 17:08:58 cobalt logger: warning: pcs constraint location cobalt-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
Sep 21 17:08:58 cobalt logger: warning: pcs constraint location cobalt-cluster_ip-1 prefers iron=1000 failed
Sep 21 17:08:58 cobalt logger: warning: pcs constraint location cobalt-cluster_ip-1 prefers cobalt=2000 failed
Sep 21 17:08:58 cobalt logger: warning: pcs constraint location iron-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
Sep 21 17:08:58 cobalt logger: warning: pcs constraint location iron-cluster_ip-1 prefers cobalt=1000 failed
Sep 21 17:08:58 cobalt logger: warning: pcs constraint location iron-cluster_ip-1 prefers iron=2000 failed
Sep 21 17:08:58 cobalt logger: warning pcs cluster cib-push /tmp/tmp.nXTfyA1GMR failed
Sep 21 17:08:58 cobalt logger: warning: scp ganesha-ha.conf to cobalt failed

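To see how far the script actually got, these should dump whatever cluster state is left (both are standard pcs subcommands, though the output format varies by version):

pcs status
pcs config
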
BTW, I'm using CentOS 7. There are multiple network interfaces on the servers; could that be a problem?

On 21 September 2015 at 11:48, Jiffin Tony Thottan <jthottan@redhat.com> wrote:

On 21/09/15 13:56, Tiemen Ruiten wrote:

Hello Soumya, Kaleb, list,

This Friday I created the gluster_shared_storage volume manually; now I tried it with the command you supplied, but both attempts have the same result:

from etc-glusterfs-glusterd.vol.log on the node where I issued the command:

[2015-09-21 07:59:47.756845] I [MSGID: 106474] [glusterd-ganesha.c:403:check_host_list] 0-management: ganesha host found Hostname is cobalt
[2015-09-21 07:59:48.071755] I [MSGID: 106474] [glusterd-ganesha.c:349:is_ganesha_host] 0-management: ganesha host found Hostname is cobalt
[2015-09-21 07:59:48.653879] E [MSGID: 106470] [glusterd-ganesha.c:264:glusterd_op_set_ganesha] 0-management: Initial NFS-Ganesha set up failed
<br>
As far as what I understand from the logs, it called<br>
setup_cluser()[calls `ganesha-ha.sh` script ] but script failed.<br>
Can u please provide following details :<br>
-Location of ganesha.sh file??<br>
-Location of ganesha-ha.conf, ganesha.conf files ?<br>
<br>
<br>
And also can u cross check whether all the prerequisites before HA<br>
setup satisfied ?<br>
<br>
--<br>
With Regards,<br>
Jiffin<br>
<br>
<br>
</div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5">
[2015-09-21 07:59:48.653912] E [MSGID: 106123] [glusterd-syncop.c:1404:gd_commit_op_phase] 0-management: Commit of operation 'Volume (null)' failed on localhost : Failed to set up HA config for NFS-Ganesha. Please check the log file for details
[2015-09-21 07:59:45.402458] I [MSGID: 106006] [glusterd-svc-mgmt.c:323:glusterd_svc_common_rpc_notify] 0-management: nfs has disconnected from glusterd.
[2015-09-21 07:59:48.071578] I [MSGID: 106474] [glusterd-ganesha.c:403:check_host_list] 0-management: ganesha host found Hostname is cobalt

from etc-glusterfs-glusterd.vol.log on the other node:

[2015-09-21 08:12:50.111877] E [MSGID: 106062] [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management: Unable to acquire volname
[2015-09-21 08:14:50.548087] E [MSGID: 106062] [glusterd-op-sm.c:3635:glusterd_op_ac_lock] 0-management: Unable to acquire volname
[2015-09-21 08:14:50.654746] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
[2015-09-21 08:14:50.655095] I [MSGID: 106474] [glusterd-ganesha.c:403:check_host_list] 0-management: ganesha host found Hostname is cobalt
[2015-09-21 08:14:51.287156] E [MSGID: 106062] [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management: Unable to acquire volname

from etc-glusterfs-glusterd.vol.log on the arbiter node:

[2015-09-21 08:18:50.934713] E [MSGID: 101075] [common-utils.c:3127:gf_is_local_addr] 0-management: error in getaddrinfo: Name or service not known
[2015-09-21 08:18:51.504694] E [MSGID: 106062] [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management: Unable to acquire volname

I have put the hostnames of all servers in my /etc/hosts file, including the arbiter node.

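The getaddrinfo error on the arbiter suggests a name that doesn't resolve there, so for reference this is the shape of my entries on every node (the cobalt/iron addresses match the corosync logs above; the neon address here is a placeholder):

10.100.30.37   cobalt.int.rdmedia.com   cobalt
10.100.30.38   iron.int.rdmedia.com     iron
10.100.30.40   neon.int.rdmedia.com     neon    # placeholder address
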
On 18 September 2015 at 16:52, Soumya Koduri <skoduri@redhat.com> wrote:

Hi Tiemen,

One of the prerequisites before setting up nfs-ganesha HA is to create and mount the shared_storage volume. Use the CLI below for that:

"gluster volume set all cluster.enable-shared-storage enable"

It shall create the volume and mount it on all the nodes (including the arbiter node). Note this volume shall be mounted on all the nodes of the gluster storage pool (though in this case some of them may not be part of the nfs-ganesha cluster).

So instead of manually creating those directory paths, please use the above CLI and try re-configuring the setup.

Thanks,
Soumya

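A quick way to confirm the option took effect (the volume name and mount path are, as far as I know, the defaults this option creates):

gluster volume info gluster_shared_storage
mount | grep shared_storage   # expect a mount at /var/run/gluster/shared_storage on every node
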
On 09/18/2015 07:29 PM, Tiemen Ruiten wrote:

Hello Kaleb,

I don't:

# Name of the HA cluster created.
# must be unique within the subnet
HA_NAME="rd-ganesha-ha"
#
# The gluster server from which to mount the shared data volume.
HA_VOL_SERVER="iron"
#
# N.B. you may use short names or long names; you may not use IP addrs.
# Once you select one, stay with it as it will be mildly unpleasant to
# clean up if you switch later on. Ensure that all names - short and/or
# long - are in DNS or /etc/hosts on all machines in the cluster.
#
# The subset of nodes of the Gluster Trusted Pool that form the ganesha
# HA cluster. Hostname is specified.
HA_CLUSTER_NODES="cobalt,iron"
#HA_CLUSTER_NODES="server1.lab.redhat.com,server2.lab.redhat.com,..."
#
# Virtual IPs for each of the nodes specified above.
VIP_server1="10.100.30.101"
VIP_server2="10.100.30.102"
#VIP_server1_lab_redhat_com="10.0.2.1"
#VIP_server2_lab_redhat_com="10.0.2.2"

hosts cobalt & iron are the data nodes; the arbiter ip/hostname (neon) isn't mentioned anywhere in this config file.

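One thing I'm not sure about (going by the commented VIP_server1_lab_redhat_com examples, where the variable names are clearly derived from the node names): with HA_CLUSTER_NODES="cobalt,iron", the VIP entries would presumably have to be named after the nodes, i.e.

VIP_cobalt="10.100.30.101"
VIP_iron="10.100.30.102"

rather than VIP_server1/VIP_server2 — otherwise the script would find no VIP for either node.
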
On 18 September 2015 at 15:56, Kaleb S. KEITHLEY <kkeithle@redhat.com> wrote:

On 09/18/2015 09:46 AM, Tiemen Ruiten wrote:
> Hello,
>
> I have a Gluster cluster with a single replica 3, arbiter 1 volume (so
> two nodes with actual data, one arbiter node). I would like to set up
> NFS-Ganesha HA for this volume but I'm having some difficulties.
>
> - I needed to create a directory /var/run/gluster/shared_storage
> manually on all nodes, or the command 'gluster nfs-ganesha enable' would
> fail with the following error:
> [2015-09-18 13:13:34.690416] E [MSGID: 106032] [glusterd-ganesha.c:708:pre_setup] 0-THIS->name: mkdir() failed on path /var/run/gluster/shared_storage/nfs-ganesha, [No such file or directory]
>
> - Then I found out that the command connects to the arbiter node as
> well, but obviously I don't want to set up NFS-Ganesha there. Is it
> actually possible to set up NFS-Ganesha HA with an arbiter node? If it's
> possible, is there any documentation on how to do that?

Please send the /etc/ganesha/ganesha-ha.conf file you're using.

Probably you have included the arbiter in your HA config; that would be a mistake.

--
Kaleb

--
Tiemen Ruiten
Systems Engineer
R&D Media

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users