[Gluster-users] NFS not start on localhost

Niels de Vos ndevos at redhat.com
Thu Oct 23 14:07:17 UTC 2014


The only way I can manage to hit this issue too is by mounting an
NFS-export on the Gluster server before the Gluster/NFS process starts.
There is no crash happening on my side; Gluster/NFS just fails to
start.

Steps to reproduce:
1. mount -t nfs nas.example.net:/export /mnt
2. systemctl start glusterd

After this, the error about being unable to register NLM4 is in
/var/log/glusterfs/nfs.log.
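To confirm you are hitting the same failure, the registration errors can
be checked directly; the message text below matches the glusterfs 3.5.x
log lines quoted later in this thread:

```shell
# Look for the NLM4/portmap registration errors in the Gluster/NFS log.
grep -E 'Could not register with portmap|NLM4 registration failed' \
    /var/log/glusterfs/nfs.log
```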

This is expected, because the Linux kernel NFS-client and NFS-server
register an NLM service (nlockmgr) with portmap/rpcbind. You can verify
which process occupies the service slot in rpcbind like this:

1. list the rpc-programs and their port numbers

    # rpcinfo -p

2. check the process that listens on the TCP-port for nlockmgr (port
   32770 was returned by the command from point 1)

    # netstat -nlpt | grep -w 32770
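The two steps can also be combined into one small sketch. Note that the
port number varies (32770 is only an example), and `ss -nlpt` can stand
in for `netstat` on newer systems:

```shell
# Columns of `rpcinfo -p` are: program, version, proto, port, service.
# Pick the TCP port registered for nlockmgr, then show who listens on it.
PORT=$(rpcinfo -p | awk '$3 == "tcp" && $5 == "nlockmgr" {print $4; exit}')
netstat -nlpt | grep -w "$PORT"
```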

If the right column in the output lists 'glusterfs', then the
Gluster/NFS process could register successfully and is handling the NLM4
calls. However, if the right column contains a single '-', the Linux
kernel module 'lockd' is handling the NLM4 calls. Gluster/NFS cannot
work together with the Linux kernel NFS-client (mountpoint) or the Linux
kernel NFS-server.
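To hand the NLM4 slot back to Gluster/NFS, the sequence below sketches
the recovery steps discussed later in this thread. The systemd unit
names are as on CentOS-7; 'modprobe -r lockd' may fail while the module
is still in use, in which case a reboot after disabling 'nfs.target' is
the simpler route:

```shell
# Run as root. Free the NLM4 registration so Gluster/NFS can claim it.
umount -a -t nfs,nfs4                        # unmount all kernel NFS-client mounts
systemctl stop nfs-lock.service nfs.target   # stop the kernel NFS lock/server units
systemctl disable nfs.target                 # keep them from coming back on boot
modprobe -r lockd                            # unload lockd; may fail if still busy
systemctl restart glusterd.service           # glusterd respawns Gluster/NFS
```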

Does this help? If something is unclear, post the output of the above
commands and tell us what further details you would like clarified.

Cheers,
Niels


On Mon, Oct 20, 2014 at 12:53:46PM +0200, Demeter Tibor wrote:
> 
> Hi,
> 
> Thank you for your reply.
> 
> I followed your recommendations, but there is no change.
> 
> There is nothing new in the nfs.log.
> 
> 
> [root at node0 glusterfs]# reboot
> Connection to 172.16.0.10 closed by remote host.
> Connection to 172.16.0.10 closed.
> [tdemeter at sirius-31 ~]$ ssh root at 172.16.0.10
> root at 172.16.0.10's password: 
> Last login: Mon Oct 20 11:02:13 2014 from 192.168.133.106
> [root at node0 ~]# systemctl status nfs.target 
> nfs.target - Network File System Server
>    Loaded: loaded (/usr/lib/systemd/system/nfs.target; disabled)
>    Active: inactive (dead)
> 
> [root at node0 ~]# gluster volume status engine
> Status of volume: engine
> Gluster process						Port	Online	Pid
> ------------------------------------------------------------------------------
> Brick gs00.itsmart.cloud:/gluster/engine0		50160	Y	3271
> Brick gs01.itsmart.cloud:/gluster/engine1		50160	Y	595
> NFS Server on localhost					N/A	N	N/A
> Self-heal Daemon on localhost				N/A	Y	3286
> NFS Server on gs01.itsmart.cloud			2049	Y	6951
> Self-heal Daemon on gs01.itsmart.cloud			N/A	Y	6958
>  
> Task Status of Volume engine
> ------------------------------------------------------------------------------
> There are no active volume tasks
>  
> [root at node0 ~]# systemctl status 
> Display all 262 possibilities? (y or n)
> [root at node0 ~]# systemctl status nfs-lock
> nfs-lock.service - NFS file locking service.
>    Loaded: loaded (/usr/lib/systemd/system/nfs-lock.service; enabled)
>    Active: inactive (dead)
> 
> [root at node0 ~]# systemctl stop nfs-lock
> [root at node0 ~]# systemctl restart gluster
> glusterd.service    glusterfsd.service  gluster.mount       
> [root at node0 ~]# systemctl restart gluster
> glusterd.service    glusterfsd.service  gluster.mount       
> [root at node0 ~]# systemctl restart glusterfsd.service 
> [root at node0 ~]# systemctl restart glusterd.service 
> [root at node0 ~]# gluster volume status engine
> Status of volume: engine
> Gluster process						Port	Online	Pid
> ------------------------------------------------------------------------------
> Brick gs00.itsmart.cloud:/gluster/engine0		50160	Y	5140
> Brick gs01.itsmart.cloud:/gluster/engine1		50160	Y	2037
> NFS Server on localhost					N/A	N	N/A
> Self-heal Daemon on localhost				N/A	N	N/A
> NFS Server on gs01.itsmart.cloud			2049	Y	6951
> Self-heal Daemon on gs01.itsmart.cloud			N/A	Y	6958
>  
> 
> Any other idea?
> 
> Tibor
> 
> 
> 
> 
> 
> 
> 
> 
> ----- Original message -----
> > On Mon, Oct 20, 2014 at 09:04:28AM +0200, Demeter Tibor wrote:
> > > Hi,
> > > 
> > > This is the full nfs.log after delete & reboot.
> > > It refers to a portmap registration problem.
> > > 
> > > [root at node0 glusterfs]# cat nfs.log
> > > [2014-10-20 06:48:43.221136] I [glusterfsd.c:1959:main]
> > > 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.2
> > > (/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p
> > > /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S
> > > /var/run/567e0bba7ad7102eae3049e2ad6c3ed7.socket)
> > > [2014-10-20 06:48:43.224444] I [socket.c:3561:socket_init]
> > > 0-socket.glusterfsd: SSL support is NOT enabled
> > > [2014-10-20 06:48:43.224475] I [socket.c:3576:socket_init]
> > > 0-socket.glusterfsd: using system polling thread
> > > [2014-10-20 06:48:43.224654] I [socket.c:3561:socket_init] 0-glusterfs: SSL
> > > support is NOT enabled
> > > [2014-10-20 06:48:43.224667] I [socket.c:3576:socket_init] 0-glusterfs:
> > > using system polling thread
> > > [2014-10-20 06:48:43.235876] I
> > > [rpcsvc.c:2127:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured
> > > rpc.outstanding-rpc-limit with value 16
> > > [2014-10-20 06:48:43.254087] I [socket.c:3561:socket_init]
> > > 0-socket.nfs-server: SSL support is NOT enabled
> > > [2014-10-20 06:48:43.254116] I [socket.c:3576:socket_init]
> > > 0-socket.nfs-server: using system polling thread
> > > [2014-10-20 06:48:43.255241] I [socket.c:3561:socket_init]
> > > 0-socket.nfs-server: SSL support is NOT enabled
> > > [2014-10-20 06:48:43.255264] I [socket.c:3576:socket_init]
> > > 0-socket.nfs-server: using system polling thread
> > > [2014-10-20 06:48:43.257279] I [socket.c:3561:socket_init]
> > > 0-socket.nfs-server: SSL support is NOT enabled
> > > [2014-10-20 06:48:43.257315] I [socket.c:3576:socket_init]
> > > 0-socket.nfs-server: using system polling thread
> > > [2014-10-20 06:48:43.258135] I [socket.c:3561:socket_init] 0-socket.NLM:
> > > SSL support is NOT enabled
> > > [2014-10-20 06:48:43.258157] I [socket.c:3576:socket_init] 0-socket.NLM:
> > > using system polling thread
> > > [2014-10-20 06:48:43.293724] E
> > > [rpcsvc.c:1314:rpcsvc_program_register_portmap] 0-rpc-service: Could not
> > > register with portmap
> > > [2014-10-20 06:48:43.293760] E [nfs.c:332:nfs_init_versions] 0-nfs: Program
> > > NLM4 registration failed
> > 
> > The above line suggests that there already is a service registered at
> > portmapper for the NLM4 program/service. This happens when the kernel
> > module 'lockd' is loaded. The kernel NFS-client and NFS-server depend on
> > this, but unfortunately it conflicts with the Gluster/nfs server.
> > 
> > Could you verify that the module is loaded?
> >  - use 'lsmod | grep lockd' to check the modules
> >  - use 'rpcinfo | grep nlockmgr' to check the rpcbind registrations
> > 
> > Make sure that you do not mount any NFS exports on the Gluster server.
> > Unmount all NFS mounts.
> > 
> > You mentioned you are running CentOS-7, which is systemd based. You
> > should be able to stop any conflicting NFS services like this:
> > 
> >  # systemctl stop nfs-lock.service
> >  # systemctl stop nfs.target
> >  # systemctl disable nfs.target
> > 
> > If all these services cleanup themselves, you should be able to start
> > the Gluster/nfs service:
> > 
> >   # systemctl restart glusterd.service
> > 
> > In case some bits are still lingering around, it might be easier to
> > reboot after disabling the 'nfs.target'.
> > 
> > > [2014-10-20 06:48:43.293771] E [nfs.c:1312:init] 0-nfs: Failed to
> > > initialize protocols
> > > [2014-10-20 06:48:43.293777] E [xlator.c:403:xlator_init] 0-nfs-server:
> > > Initialization of volume 'nfs-server' failed, review your volfile again
> > > [2014-10-20 06:48:43.293783] E [graph.c:307:glusterfs_graph_init]
> > > 0-nfs-server: initializing translator failed
> > > [2014-10-20 06:48:43.293789] E [graph.c:502:glusterfs_graph_activate]
> > > 0-graph: init failed
> > > pending frames:
> > > frame : type(0) op(0)
> > > 
> > > patchset: git://git.gluster.com/glusterfs.git
> > > signal received: 11
> > > time of crash: 2014-10-20 06:48:43configuration details:
> > > argp 1
> > > backtrace 1
> > > dlfcn 1
> > > fdatasync 1
> > > libpthread 1
> > > llistxattr 1
> > > setfsid 1
> > > spinlock 1
> > > epoll.h 1
> > > xattr.h 1
> > > st_atim.tv_nsec 1
> > > package-string: glusterfs 3.5.2
> > > [root at node0 glusterfs]# systemctl status portma
> > > portma.service
> > >    Loaded: not-found (Reason: No such file or directory)
> > >    Active: inactive (dead)
> > > 
> > > 
> > > 
> > > Also I have checked the rpcbind service.
> > > 
> > > [root at node0 glusterfs]# systemctl status rpcbind.service
> > > rpcbind.service - RPC bind service
> > >    Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; enabled)
> > >    Active: active (running) since h 2014-10-20 08:48:39 CEST; 2min 52s ago
> > >   Process: 1940 ExecStart=/sbin/rpcbind -w ${RPCBIND_ARGS} (code=exited,
> > >   status=0/SUCCESS)
> > >  Main PID: 1946 (rpcbind)
> > >    CGroup: /system.slice/rpcbind.service
> > >            └─1946 /sbin/rpcbind -w
> > > 
> > > okt 20 08:48:39 node0.itsmart.cloud systemd[1]: Starting RPC bind
> > > service...
> > > okt 20 08:48:39 node0.itsmart.cloud systemd[1]: Started RPC bind service.
> > > 
> > > The restart does not solve this problem.
> > > 
> > > 
> > > I think this is the problem. Why is the portmap status "exited"?
> > 
> > The 'portmap' service has been replaced with 'rpcbind' since RHEL-6.
> > They have the same functionality, 'rpcbind' just happens to be the newer
> > version.
> > 
> > Did you file a bug for this already? As Vijay mentions, this crash seems
> > to happen because the Gluster/nfs service fails to initialize correctly
> > and then fails to clean up correctly. The cleanup should get fixed, and
> > we should also provide an easier-to-understand error message.
> > 
> > Thanks,
> > Niels
> > 
> > > 
> > > 
> > > On node1 is ok:
> > > 
> > > [root at node1 ~]# systemctl status rpcbind.service
> > > rpcbind.service - RPC bind service
> > >    Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; enabled)
> > >    Active: active (running) since p 2014-10-17 19:15:21 CEST; 2 days ago
> > >  Main PID: 1963 (rpcbind)
> > >    CGroup: /system.slice/rpcbind.service
> > >            └─1963 /sbin/rpcbind -w
> > > 
> > > okt 17 19:15:21 node1.itsmart.cloud systemd[1]: Starting RPC bind
> > > service...
> > > okt 17 19:15:21 node1.itsmart.cloud systemd[1]: Started RPC bind service.
> > > 
> > > 
> > > 
> > > Thanks in advance
> > > 
> > > Tibor
> > > 
> > > 
> > > 
> > > ----- Original message -----
> > > > On 10/19/2014 06:56 PM, Niels de Vos wrote:
> > > > > On Sat, Oct 18, 2014 at 01:24:12PM +0200, Demeter Tibor wrote:
> > > > >> Hi,
> > > > >>
> > > > >> [root at node0 ~]# tail -n 20 /var/log/glusterfs/nfs.log
> > > > >> [2014-10-18 07:41:06.136035] E [graph.c:307:glusterfs_graph_init]
> > > > >> 0-nfs-server: initializing translator failed
> > > > >> [2014-10-18 07:41:06.136040] E [graph.c:502:glusterfs_graph_activate]
> > > > >> 0-graph: init failed
> > > > >> pending frames:
> > > > >> frame : type(0) op(0)
> > > > >>
> > > > >> patchset: git://git.gluster.com/glusterfs.git
> > > > >> signal received: 11
> > > > >> time of crash: 2014-10-18 07:41:06configuration details:
> > > > >> argp 1
> > > > >> backtrace 1
> > > > >> dlfcn 1
> > > > >> fdatasync 1
> > > > >> libpthread 1
> > > > >> llistxattr 1
> > > > >> setfsid 1
> > > > >> spinlock 1
> > > > >> epoll.h 1
> > > > >> xattr.h 1
> > > > >> st_atim.tv_nsec 1
> > > > >> package-string: glusterfs 3.5.2
> > > > >
> > > > > This definitely is a gluster/nfs issue. For whatever reason, the
> > > > > gluster/nfs server crashes :-/ The log does not show enough details,
> > > > > some more lines before this are needed.
> > > > >
> > > > 
> > > > I wonder if the crash is due to a cleanup after the translator
> > > > initialization failure. The complete logs might help in understanding
> > > > why the initialization failed.
> > > > 
> > > > -Vijay
> > > > 
> > > > 
> >

