sanlock messages ==================================================== sanlock messages/logging can be configured in various ways. The following is based on sanlock 3.5.0. 2017-06-02 default message prefixes ------------------------ "2017-04-28 10:34:52 1491929 [23442]:" . 2017-04-28 10:34:52 date from gettimeofday(2), printed with localtime_r(3), or gmtime_r(3). (gmtime with config setting logfile_use_utc) . 1491929 monotonic time in seconds from clock_gettime(2) with CLOCK_MONOTONIC. . [23442] thread id, see gettid(2). In /var/log/sanlock.log, the prefix above is all that appears. In /var/log/messages, syslog typically adds its own prefix ahead of the prefix above (making for very lengthy prefixes in /var/log/messages.) default message destinations ---------------------------- 1. /var/log/sanlock.log By default, LOG_ERR and LOG_WARNING (log_error/log_warn) messages are saved here. To record all messages in sanlock.log, (including debug), set sanlock.conf logfile_priority = 7. 2. /var/log/messages By default, LOG_ERR (log_error) messages are set to syslog. To record both error and warning messages in /var/log/messages, set sanlock.conf syslog_priority = 4. To record all messages (including debug), set syslog_priority = 7. 3. sanlock client log_dump The daemon's 1MB internal circular message buffer includes all messages. To print all messages since the daemon was started (or since the circular buffer rolled over), run: "sanlock client log_dump" (and usually redirect output to a file.) As soon as possible after a problem occurs, it's recommended to save a copy of this debug data to a file for later inspection. If too much time passes, the daemon will be restarted or the important data will be pushed out of the buffer. If it's not possible to get debug data from the daemon after a failure (e.g. the host is always reset or sanlock is restarted), then saving all the debug data to sanlock.log (logfile_priority = 7) is an alternative. Standard informational messages ============================================================================= The following messages for normal events are recorded by default in sanlock.log. daemon started -------------- "sanlock daemon started 3.5.0 host example" or "sanlock daemon started 3.5.0 host 638bd439-538c-47d5-826c-3b094075c598.example" Every time the sanlock daemon is started, this is saved in sanlock.log. The monotonic time in the prefix can be a useful reference, and the name of the host recorded here is a key piece of information when debugging. If the name of the host is not statically defined in sanlock.conf (our_host_name), then sanlock generates a new host name every time the daemon is started. The new host name must be unique among all hosts, so sanlock uses a random UUID (for uniqueness) and appends several characters from the uname nodename (for help debugging). lockspace add ------------- "s1 lockspace lockspacename:2:/path/to/file_or_device:0" When an application joins a lockspace, the lockspace definition is saved in sanlock.log. ::: name of lockspace local host identifier in lockspace path to storage reserved for leases offset on path (bytes) The "s1" prefix is an abbreviation that sanlock uses to refer to this lockspace in subsequent messages (because a UUID as the lockspacename may be hard to read.) The numbers in s are simply incremented while the daemon continues running, and restart at 1 when the daemon is restarted. resource lease acquire ---------------------- "s1:r1 resource lockspacename:resourcename:/path/to/file_or_device:68157440 for 2,9,6073" When an application acquires a resource lease, the resource definition is saved in sanlock.log. :::[:] name of lockspace name of resource path to storage reserved for leases offset on path (bytes) optional leader version or SH for shared lease The "r1" prefix is an abbreviation for the resource name, like the "s1" prefix above for lockspace names. "for NUM1,NUM2,NUM3" Identifies the source of the lease request: NUM1 is the internal client id, NUM2 is the file descriptor used for the request, NUM3 is the client pid. hosts discovered ---------------- "s1 host 1 11 172064 host-01" "s1 host 2 23 172038 host-02" When new hosts in a lockspace are first detected, their information is recorded in sanlock.log. Upon joining a lockspace, all hosts are new to the joining host, and the joining host will record the details of all current members (including itself). The "s1" prefix above is the lockspace name abbreviation (see above). "host NUM1 NUM2 NUM3 NAME" Uniquely identifies a host at the current point in time. NUM1 is the host_id NUM2 is the host_id generation (incremented each time the host_id is used to join a lockspace) NUM3 is the current renewal timestamp recorded in the host_id's delta lease NAME is the unique host name (see daemon started message above) Unusual events ============================================================================= Messages for these unusual events generally appear in both sanlock.log and /var/log/messages. open error ---------- "open error -1 /path/to/file_or_device" open(2) on the file or LV failed. The error number may be some help in finding the problem. It could be things like permissions, selinux, broken storage setup. Try reading directly to see if the storage works, e.g. "dd if=/path/to/file_or_device of=/dev/null bs=512 count=2048 iflag=direct" io timeout ---------- "15aadd15 aio timeout RD 0x7f70140008c0:0x7f70140008d0:0x7f702e8a1000 ioto 10 to_count 1" An async i/o operation timed out. The read may complete later, in which case a corresponding "aio collect" message will appear with the result. "15aadd15" Thread id. For the most common case of a lockspace thread, these are the first characters of the lockspace name that the thread is dedicated to. For a worker thread, this will be something like "worker0". "aio timeout WR|RD" An aio read or write operation did not complete in the specified timeout period (default 10 seconds). (In previous versions, 1|0 were used in place of WR|RD.) "0x:0x:0x" Internal buffer pointers used for the aio op (aicb, data buffer, iocb). "ioto 10" The io timeout in seconds. "to_count 1" A counter of the number of io timeouts recorded by this thread. io results ---------- "443aa0b3 aio collect WR 0x7f7db80008c0:0x7f7db80008d0:0x7f7daf6fe000 result -28:0 match res" or "742162cd aio collect RD 0x7f266c0008c0:0x7f266c0008d0:0x7f266c101000 result 0:0 match len 512" An async i/o operation completed (in a non-standard way). "443aa0b3" Thread id. (See above) "aio collect WR|RD" This is the result of an aio operation (write|read). (In previous versions, 1|0 were used in place of WR|RD.) "0x:0x:0x" Internal buffer pointers used for the aio op (aicb, data buffer, iocb). "result -28:0 match res ..." The aio result from the kernel io_event.res and io_event.res2. If io_event.res is a negative error number, the aio op failed. sanlock does not try to interpret specific negative numbers (several different error numbers can appear here.) Other possible results following aio collect: "result : match len ..." The aio result from the kernel io_event.res and io_event.res2. If io_event.res is not negative, but does not match the requested length , the aio op failed. "result : old free" The aio result from the kernel is for an old i/o that sanlock has given up on, and this result is just discarded. "result : other free ..." The aio result from the kernel is for another i/o that sanlock is not waiting for, and this result is just discarded. "result : other free ..." The aio result from the kernel is for another i/o that sanlock is not waiting for, and this result is just discarded. "result : match reap" The aio result is late (arrived after the standard timeout), but was successful, and can be used. close_task_aio -------------- "60c7bc7a close_task_aio 0 0x7fa7780008c0 busy" "60c7bc7a close_task_aio destroyed incomplete ops" A thread is exiting (typically a lockspace thread), and freeing memory. Before freeing memory associated with incomplete aio operations, it attempts to collect the results. If that fails, it logs the "busy" message. Eventually, it will give up waiting and print the "destroyed" message. renewal problems ---------------- Every 20 seconds (assuming the default 10 sec io timeout) a host renews the delta lease for it's host_id in each lockspace it has joined. This renewal consists of reading the single sector for the delta lease, verifying the contents, updating the timestamp in that sector to the current time on the host, and writing that sector back to storage. Error/warning messages are printed when the i/o involved in the renewal fails, and when the host has not been able to complete a successful renewal for an extended period. These errors are not necessarily critical on their own. They can happen periodically without any ill effects, but they are more important when they persist long enough to trigger more serious failure. ---- "s2 delta_renew read timeout 10 sec offset 0 /path/to/file_or_device" "s2 renewal error -202 delta_length 10 last_success 1483" This shows that a renewal read timed out after not completing in 10 sec. "offset 0 /path/to/file_or_device" The location of the delta lease on disk. "error -202" The i/o timeout error number. "delta_length 10" Seconds the delta lease renewal operation took before returning. "last_success 1483" The timestamp that was used in the last successful delta lease renewal. ---- "s15 delta_renew read rv -2 offset 0 /path/to/file_or_device" "s15 renewal error -2 delta_length 0 last_success 24559" This shows that a renewal read failed because an error was returned. "offset 0 /path/to/file_or_device" The location of the delta lease on disk. "rv -2" The error number returned from the read. Different kinds of i/o errors from different kinds of storage will result in different error numbers here. "delta_length 0" The time the renewal operation took before returning. In the case of 0, the read failed immediately. "last_success 24559" The timestamp that was used in the last successful delta lease renewal. ---- "s1 delta_renew write time error -" This is printed when the renewal write operation fails. It is less common than a read error above. "time " The number of seconds the write operation took before failing. "error -" The specific error returned from the kernel. ---- "s1 delta_renew long write time sec" This is printed when a successful renewal write operation takes longer than the default i/o timeout (i/o timeouts are not applied to renewal writes, but the value is used as the threshold for emitting a warning.) "time sec" The time taken by the write. ---- "s17 renewed 1232493 delta_length 22 too long" This is printed when a successful lockspace renewal operation (one read and one write) takes longer than 20 seconds (twice the 10 second default i/o timeout). An i/o timeout is not applied to the renewal write, so 20 seconds is used as the threshold for emitting the warning. "delta_length 22" The number of seconds taken by the renewal (both i/o's). "renewed 1232493" The local timestamp that was used in the renewal. renewal failure --------------- When a host does not renew its delta lease for a long enough period of time, warnings start to be emitted in anticipation of a pending expiration. If the host manages to renew the lease at some point in the following sequence, then the escalation and warnings stop. As soon as any of these warnings appear, it would be prudent to collect the output of "sanlock client renewal -s LOCKSPACE" (and probably "sanlock client log_debug"), so that a final expiration can be more thoroughly understood later. s20 check_our_lease warning 60 last_success 1319137 ... s20 check_our_lease warning 79 last_success 1319137 s20 check_our_lease failed 80 The "check_our_lease warning" messages begin when sanlock has not been able to renew a delta lease for 60 seconds (it is supposed to be renewed once every 20 seconds). The message is printed once a second, indicating how many seconds old the last successful renewal is, and the timestamp used in the last successful renewal. This is a critical warning. Unless the renewal i/o succeeds within 20 seconds of the first warning, the "check_our_lease failed 80" message is printed, at which point sanlock enters recovery mode for the lockspace. s20 kill 42444 sig 15 count 1 ... s20 kill 42444 sig 15 count 10 After "check_our_lease failed", kill messages appear, printed once a second, indicating that sanlock has entered recovery mode for the lockspace. In this phase, it attempts to shut down any processes using the lockspace. "kill " is the process being told to quit. "sig " is the signal being sent to the process. sig 100 means that the process has a graceful shutdown mechanism that sanlock is using. sig 15 is SIGTERM, which is used when the process has no graceful shutdown mechanism specified. sig 9 is SIGKILL, which is used if the others do not. count shows the number of attempts that have been made to send the signal. dead 42444 ci 3 count 10 The specified pid that was killed (per messages above) has exited. This is a successful step in lockspace recovery phase. s20 all pids clear All processes that were using the failing lockspace have exited, and the lockspace recovery phase as completed successfully. At this point, the watchdog is disarmed for the lockspace, and the host should avoid being forcibly reset. If this message does not appear, the wdmd warnings below will continue counting down until the host is reset. wdmd warnings ------------- While the check_our_lease warnings appear from sanlock, wdmd also begins printing warnings indicating that the watchdog timeout for the expiring lease is getting close. These messages appear only in /var/log/messages. wdmd[3627]: test warning now 5909660 ping 5909650 close 5210877 renewal 5909586 expire 5909666 client 8476 sanlock_lockspacename wdmd[3627]: /dev/watchdog closed unclean wdmd[3627]: test warning now 5909661 ping 5909650 close 5909660 renewal 5909586 expire 5909666 client 8476 sanlock_lockspacename ... wdmd[3627]: test failed rem 54 now 5909666 ping 5909650 close 5909660 renewal 5909586 expire 5909666 client 8476 sanlock_lockspacename wdmd[3627]: test failed rem 53 now 5909667 ping 5909650 close 5909660 renewal 5909586 expire 5909666 client 8476 sanlock_lockspacename wdmd[3627]: /dev/watchdog reopen "client 8476 sanlock_lockspacename" This is the pid, of sanlock in this case, and a name provided by the process for the wdmd connection. "/dev/watchdog closed unclean" The watchdog device has been closed to generate a potentially final ping in case it needs to fire for a pending expiration. "/dev/watchdog reopen" The watchdog device has been reopened because an expiring connection has been refreshed before the watchdog fired. "test warning" A connection is close to expiring. "test failed" A connection has expired, and the watchdog will no longer be pinged, forcing it to reset the host. If the connection is refreshed within 60 seconds, the reset can still be averted. "rem N" N seconds remain until the watchdog fires. "now N1 ping N2 close N3 renewal N4 expire N5" All monotonic host times in seconds. now is the current time. ping is the time /dev/watchdog was last pinged. close is the time /dev/watchdog was last closed (generates ping). renewal is the last refresh time from the connection owner. expire is the expiration time set by the connection owner. resource lease contention ------------------------- When hosts are simultaneously attempting to acquire a single resource lease, the disk paxos algorithm in sanlock resolves the owner. In cases like these, the hosts log the paxos algorithm details in sanlock.log and syslog. These events are expected when hosts contend for leases. They are logged because they are infrequent and if debugging around one of these events is needed, it's important to have all the details. There are various ways that the algorithm can resolve the owner, and the details may be logged differently, e.g. r1128 ballot 437 choose bk_max[2] lver 437 mbal 3 bal 3 inp 3 1 8692 r1128 ballot 437 commit other owner 3 1 8692 Another host (host_id 3) acquired the lease. ... r1014 ballot 372 abort2 mbal 1 mbal[2] 3 r1014 paxos_acquire 372 retry delay 631839 us r1014 paxos_acquire 372 owner is our inp 1 1 6757188 commited by 3 The local host (host_id 1) acquired the lease. ... r691 paxos_acquire owner 3 1 8359 writer 1 owner dblock free r691 ballot 167 abort2 mbal 1 mbal[2] 3 r691 paxos_acquire 167 retry delay 572312 us r691 paxos_acquire 167 owner is our inp 1 1 6756929 commited by 3 The local host (host_id 1) acquired the lease. ... r654 ballot 145 abort1 mbal 1 mbal[2] 3 r654 paxos_acquire 145 retry delay 13757 us r654 paxos_acquire 145 owner is 3 1 0 Another host (host_id 3) acquired the lease.