This guide should help administrators and consultants tackle issues in Pacemaker cluster installations. We concentrate on troubleshooting and analysis methods with the crmsh history feature.
A cluster leaves numerous traces behind, more than any other system. The logs and other artifacts are spread among all cluster nodes and over multiple directories. The obvious difficulty is presenting all that information in a consolidated manner, and that is where crmsh history helps.
Hopefully, the guide will help you investigate your specific issue with more efficiency and less effort.
In [Listing 1] we show a modestly complex sample cluster, which we can experiment with and break in some hopefully instructive ways.
|We won’t be going into how to set up nodes or configure the cluster. For that, please refer to the Getting Started document.|
node 167906357: sle12-c
node 167906355: sle12-a
primitive s-libvirt stonith:external/libvirt \
    params hostlist="sle12-a sle12-c" hypervisor_uri="qemu+ssh://hex-10.suse.de/system?keyfile=/root/.ssh/xen" reset_method=reboot \
    op monitor interval=5m timeout=60s
primitive p_drbd_nfs ocf:linbit:drbd \
    params drbd_resource=nfs \
    op monitor interval=15 role=Master \
    op monitor interval=30 role=Slave \
    op start interval=0 timeout=300 \
    op stop interval=0 timeout=120
primitive nfs-vg LVM \
    params volgrpname=nfs-vg
primitive fs1 Filesystem \
    params device="/dev/nfs-vg/fs1" directory="/srv/nfs" fstype=ext3 \
    op monitor interval=30s
primitive virtual-ip IPaddr2 \
    params ip=10.2.12.100
primitive nfs-server nfsserver \
    params nfs_shared_infodir="/srv/nfs/state" nfs_ip=10.2.12.100 \
    op monitor interval=30s
primitive websrv apache \
    params configfile="/etc/apache2/httpd.conf" \
    op monitor interval=30
primitive websrv-ip IPaddr2 \
    params ip=10.2.12.101
primitive small-apache apache \
    params configfile="/etc/apache2/small.conf"
group nfs-disk nfs-vg fs1
group nfs-srv virtual-ip nfs-server
ms ms_drbd_nfs p_drbd_nfs \
    meta notify=true clone-max=2
location nfs-pref virtual-ip 100: sle12-a
location websrv-pref websrv 100: sle12-c
colocation vg-with-drbd inf: nfs-vg ms_drbd_nfs:Master
colocation c-nfs inf: nfs-srv nfs-disk
colocation c-websrv inf: websrv websrv-ip
colocation small-apache-with-virtual-ip inf: small-apache virtual-ip
# need fs1 for the NFS server
order o-nfs inf: nfs-disk nfs-srv
# websrv serves requests at IP websrv-ip
order o-websrv inf: websrv-ip websrv
# small apache serves requests at IP virtual-ip
order virtual-ip-before-small-apache inf: virtual-ip small-apache
# drbd device is the nfs-vg PV
order drbd-before-nfs-vg inf: ms_drbd_nfs:promote nfs-vg:start
property cib-bootstrap-options: \
    dc-version=1.1.12-ad083a8 \
    cluster-infrastructure=corosync \
    cluster-name=sle12-test3l-public \
    no-quorum-policy=ignore \
    last-lrm-refresh=1429192263
op_defaults op-options: \
    timeout=120s
If you’re new to clusters, that configuration may look overwhelming. A graphical presentation in [Image 1] of the essential elements and relations between them is easier on the eye (and the mind).
As homework, try to match the two cluster representations.
Quick (& dirty) start
For the impatient, we give here a few examples of history use.
Most of the time you will be dealing with phenomena induced by various resources (a.k.a. applications). For instance, while preparing this document we noticed that a probe failed repeatedly on a node which wasn’t even running the resource [Listing 2].
# crm status
Last updated: Tue Dec 16 11:57:04 2014
Last change: Tue Dec 16 11:53:22 2014
Stack: corosync
Current DC: sle12-c (167906357) - partition with quorum
Version: 1.1.12-ad083a8
2 Nodes configured
10 Resources configured

Online: [ sle12-a sle12-c ]
[...]
 nfs-server     (ocf::heartbeat:nfsserver):     Started sle12-a
[...]
Failed actions:
    nfs-server_monitor_0 on sle12-c 'unknown error' (1): call=298, status=complete,
    last-rc-change='Tue Dec 16 11:53:23 2014', queued=0ms, exec=135ms
The history resource command shows log messages relevant to the supplied resource [Listing 3].
# crm history resource nfs-server
INFO: fetching new logs, please wait ...
Dec 16 11:53:23 sle12-c nfsserver(nfs-server): ERROR: NFS server is up, but the locking daemons are down
Dec 16 11:53:23 sle12-c crmd: notice: te_rsc_command: Initiating action 54: stop nfs-server_stop_0 on sle12-a
Dec 16 11:53:23 sle12-c crmd: notice: te_rsc_command: Initiating action 3: stop nfs-server_stop_0 on sle12-c (local)
Dec 16 11:53:23 sle12-c nfsserver(nfs-server): INFO: Stopping NFS server ...
Dec 16 11:53:23 sle12-c nfsserver(nfs-server): INFO: Stopping sm-notify
Dec 16 11:53:23 sle12-c nfsserver(nfs-server): INFO: Stopping rpc.statd
Dec 16 11:53:23 sle12-c nfsserver(nfs-server): INFO: NFS server stopped
Dec 16 11:53:23 sle12-c crmd: notice: te_rsc_command: Initiating action 55: start nfs-server_start_0 on sle12-a
Dec 16 11:53:23 sle12-a nfsserver(nfs-server): INFO: Stopping NFS server ...
Dec 16 11:53:23 sle12-a nfsserver(nfs-server): INFO: Stopping sm-notify
Dec 16 11:53:23 sle12-a nfsserver(nfs-server): INFO: Stopping rpc.statd
Dec 16 11:53:23 sle12-a nfsserver(nfs-server): INFO: NFS server stopped
Dec 16 11:53:23 sle12-a nfsserver(nfs-server): INFO: Starting NFS server ...
Dec 16 11:53:23 sle12-a nfsserver(nfs-server): INFO: Starting rpc.statd.
Dec 16 11:53:24 sle12-a nfsserver(nfs-server): INFO: executing sm-notify
Dec 16 11:53:24 sle12-a nfsserver(nfs-server): INFO: NFS server started
Dec 16 11:53:24 sle12-a lrmd: notice: operation_finished: nfs-server_start_0:23320:stderr [ id: rpcuser: no such user ]
Dec 16 11:53:24 sle12-a lrmd: message repeated 3 times: [ notice: operation_finished: nfs-server_start_0:23320:stderr [ id: rpcuser: no such user ]]
|NFS server error message.|
|Warning about a non-existing user id.|
|Messages logged by resource agents are always tagged with type(ID) (in [Listing 3]: nfsserver(nfs-server)).|
|Everything dumped to stderr/stdout (in [Listing 3]: id: rpcuser: no such user) is captured and subsequently logged by lrmd. The stdout output is logged at info severity, which pacemaker by default does not log since version 1.1.12.|
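As an aside, this capture-and-relog behavior is easy to picture with a few lines of shell. The following is a toy sketch, not lrmd’s actual code; the run_and_log helper is invented here, and its output format merely mimics the lrmd messages in [Listing 3]:

```shell
# Toy sketch (NOT lrmd itself): capture a child's stderr and re-log each
# line with an identifying operation prefix, as lrmd does for agents.
run_and_log() {
    op="$1"; shift
    # 2>&1 >/dev/null sends only stderr into the pipe; stdout is discarded
    "$@" 2>&1 >/dev/null | while IFS= read -r line; do
        echo "lrmd: notice: operation_finished: ${op}:stderr [ $line ]"
    done
}

# simulate a resource agent operation that complains on stderr
run_and_log nfs-server_start_0 sh -c 'echo "id: rpcuser: no such user" >&2'
```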
At the very top we see an error message reporting that the NFS server is up, but that some other daemons, apparently unexpectedly, are not. However, we know that it cannot be running on the c node, as it is already running on the a node. Unable to figure out what was going on, we had to turn on tracing of the resource agent. [Listing 4] shows how to do that.
# crm resource trace nfs-server monitor 0
INFO: Trace for nfs-server:monitor is written to /var/lib/heartbeat/trace_ra/
INFO: Trace set, restart nfs-server to trace non-monitor operations
# crm resource cleanup nfs-server
Cleaning up nfs-server on sle12-a
Cleaning up nfs-server on sle12-c
Waiting for 2 replies from the CRMd.. OK
The trace of the nfsserver RA revealed that the nfs-server init script (used internally by the resource agent) always exits with success for the status action. That was due to the recent port to systemd and an erroneous interpretation of systemctl status semantics: it always exits with success (due to some paradigm shift, we guess). systemctl is-active should be used instead, as it reports the service status as expected.
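The difference in exit-code semantics is the whole story here, and it can be illustrated without systemd at all. This toy shell sketch uses fake functions (fake_status and fake_is_active, both invented for this illustration) to mimic the two behaviors:

```shell
# Toy illustration (fake functions, NOT real systemd): a status probe is
# only as reliable as the exit code it checks. fake_status mimics the
# observed "systemctl status" behaviour (always exits 0); fake_is_active
# mimics "systemctl is-active" (exits 0 only when the unit is active).
SERVICE_STATE=inactive

fake_status()    { echo "$SERVICE_STATE"; return 0; }   # always "success"
fake_is_active() { [ "$SERVICE_STATE" = active ]; }     # honest exit code

probe_with_status()    { fake_status >/dev/null && echo running || echo stopped; }
probe_with_is_active() { fake_is_active && echo running || echo stopped; }

probe_with_status      # wrongly claims the service is running
probe_with_is_active   # reports the real state
```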
As a bonus, a minor issue about a non-existing user id rpcuser is also revealed.
|Messages in the crm history log output are colored depending on the originating host.|
The rest of this document gives more details about crmsh history. If you’re more of a just-try-it-out person, enter crm history and experiment. With history commands you cannot really break anything (fingers crossed).
Introduction to crmsh history
The history crmsh feature, as the name suggests, deals with the past. It was conceived as a facility to bring to the fore all the trails a pacemaker cluster leaves behind that are relevant to a particular resource, node, or event. It is first and foremost a troubleshooting tool, but it can also be helpful in studying pacemaker clusters.
To begin, we run the info command which gives an overview, as shown in [Listing 5].
# crm history
crm(live)history# timeframe "Apr 15 20:25" "Apr 15 20:35"
crm(live)history# info
Source: live
Created on: Thu Apr 16 11:32:36 CEST 2015
By: report -Z -Q -f Wed Apr 15 20:25:00 2015 -t 2015-04-15 20:35:00 /var/cache/crm/history/live
Period: 2015-04-15 20:25:00 - 2015-04-15 20:35:00
Nodes: sle12-a sle12-c
Groups: nfs-srv nfs-disk
Resources: s-libvirt p_drbd_nfs nfs-vg fs1 virtual-ip nfs-server websrv websrv-ip small-apache
Transitions: 1906 1907
crm(live)history# peinputs v
Date        Start     End       Filename       Client     User     Origin
====        =====     ===       ========       ======     ====     ======
2015-04-15  20:29:59  20:30:01  pe-input-1906  no-client  no-user  no-origin
2015-04-15  20:30:14  20:30:15  pe-input-1907  no-client  no-user  no-origin
The timeframe command limits the observed period and helps focus on the events of interest. Here we wanted to look at a ten-minute period. Two transitions were executed during this time.
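Conceptually, timeframe simply drops everything outside the window. As a toy illustration (plain awk on fabricated log lines, not what crmsh actually runs), same-day syslog-style timestamps can be filtered with a lexical comparison of the time field:

```shell
# Toy version of a time window: keep only messages whose HH:MM:SS field
# falls between "from" and "to". Lexical comparison works because the
# stamps share the same day and fixed-width format. Log lines are made up.
printf '%s\n' \
  'Apr 15 20:24:59 sle12-c crmd: before the window' \
  'Apr 15 20:29:59 sle12-c crmd: inside the window' \
  'Apr 15 20:36:00 sle12-c crmd: after the window' |
awk -v from='20:25:00' -v to='20:35:00' '$3 >= from && $3 <= to'
```

Only the middle line survives the filter.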
Transitions are basic units capturing cluster movements (resource operations and node events). A transition consists of a set of actions to reach a desired cluster status as specified in the cluster configuration by the user.
Every configuration or status change results in a transition.
Every transition is also a CIB, which is how cluster configuration and status are stored. Transitions are saved to files, the so-called PE (Policy Engine) inputs.
In [Listing 6] we show how to display transitions. The listing is annotated to explain the output in more detail.
crm(live)history# transition
INFO: running ptest with /var/cache/crm/history/live/sle12-c/pengine/pe-input-1907.bz2
INFO: starting dotty to show transition graph
Current cluster status:
Online: [ sle12-a sle12-c ]
 s-libvirt      (stonith:external/libvirt):     Started sle12-c
 ...
 small-apache   (ocf::heartbeat:apache):        Stopped
Transition Summary:
 * Start   small-apache   (sle12-a)
Executing cluster transition:
 * Resource action: small-apache   start on sle12-a
Revised cluster status:
Online: [ sle12-a sle12-c ]
 s-libvirt      (stonith:external/libvirt):     Started sle12-c
 ...
 small-apache   (ocf::heartbeat:apache):        Started sle12-a
Transition sle12-c:pe-input-1907 (20:30:14 - 20:30:15):
    total 1 actions: 1 Complete
Apr 15 20:30:14 sle12-c crmd: notice: te_rsc_command: Initiating action 60: start small-apache_start_0 on sle12-a
Apr 15 20:30:14 sle12-a apache(small-apache): INFO: AH00558: httpd2: Could not reliably determine the server's fully qualified domain name, using 10.2.12.51. Set the 'ServerName' directive globally to suppress this message
|The transition command without arguments displays the latest transition.|
|Graph of transition actions is provided by graphviz. See [Image 2].|
|Output of crm_simulate with irrelevant stuff edited out. crm_simulate was formerly known as ptest.|
|Transition summary followed by selection of log messages. History weeds out messages which are of lesser importance. See [Listing 8] if you want to see what history has been hiding from you here.|
Incidentally, if you wonder why all transitions in these examples are green, that is not because they were green in any particular sense, but simply because green happens to be the color of node c: as chance would have it, c was calling the shots at the time (being the Designated Coordinator, or DC). That is also why all crmd and pengine messages come from c.
|Transitions are the basis of pacemaker operation, make sure that you understand them.|
What you cannot see in the listing is a graph generated and shown in a separate window in your X11 display. [Image 2] may not be very involved, but we reckon it’s as good a start as starts go.
It may sometimes be useful to see what changed between two transitions. The history diff command is shown in action in [Listing 7].
crm(live)history# diff -1 0
--- -1
+++ 0
@@ -11 +11 @@
-primitive small-apache apache params configfile="/etc/apache2/small.conf" meta target-role=Stopped
+primitive small-apache apache params configfile="/etc/apache2/small.conf" meta target-role=Started
crm(live)history# diff -1 0 status
--- -1
+++ 0
@@ -15 +14,0 @@
- small-apache   (ocf::heartbeat:apache):        Started sle12-a
|Configuration diff between two last transitions. Transitions may be referenced with indexes starting at 0 and going backwards.|
|Status diff between two last transitions.|
Whereas configuration diff is (hopefully) obvious, status diff needs some explanation: the status section of the PE inputs (transitions) always lags behind the configuration. This is because at the time the transition is saved to a file, the actions of that transition are yet to be executed. So, the status section of transition N corresponds to the configuration N-1.
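A toy model may help picture this one-step lag. The file names and contents below are entirely made up; they only mimic how the configuration and status sections of two consecutive PE inputs sit one step apart:

```shell
# Toy model (fabricated files, NOT real PE inputs): when PE input N is
# written, its actions have not run yet, so its status section still
# reflects the world produced by transition N-1.
mkdir -p /tmp/pe-demo
printf 'config: small-apache target-role=Stopped\nstatus: small-apache Started\n' \
    > /tmp/pe-demo/pe-input-1906
printf 'config: small-apache target-role=Started\nstatus: small-apache Stopped\n' \
    > /tmp/pe-demo/pe-input-1907
# 1907's config already says Started, yet its status still shows the
# resource stopped (the stop requested in 1906 has executed, the start
# has not). diff exits 1 when files differ, hence "|| true".
diff /tmp/pe-demo/pe-input-1906 /tmp/pe-demo/pe-input-1907 || true
```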
crm(live)history# transition log
INFO: retrieving information from cluster nodes, please wait ...
Apr 15 20:30:14 sle12-c crmd: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Apr 15 20:30:14 sle12-c stonithd: notice: unpack_config: On loss of CCM Quorum: Ignore
Apr 15 20:30:14 sle12-c pengine: notice: unpack_config: On loss of CCM Quorum: Ignore
Apr 15 20:30:14 sle12-c pengine: notice: LogActions: Start small-apache#011(sle12-a)
Apr 15 20:30:14 sle12-c crmd: notice: do_te_invoke: Processing graph 123 (ref=pe_calc-dc-1429122614-234) derived from /var/lib/pacemaker/pengine/pe-input-1907.bz2
Apr 15 20:30:14 sle12-c crmd: notice: te_rsc_command: Initiating action 60: start small-apache_start_0 on sle12-a
Apr 15 20:30:14 sle12-c pengine: notice: process_pe_message: Calculated Transition 123: /var/lib/pacemaker/pengine/pe-input-1907.bz2
Apr 15 20:30:14 sle12-a stonithd: notice: unpack_config: On loss of CCM Quorum: Ignore
Apr 15 20:30:14 sle12-a apache(small-apache): INFO: AH00558: httpd2: Could not reliably determine the server's fully qualified domain name, using 10.2.12.51. Set the 'ServerName' directive globally to suppress this message
Apr 15 20:30:14 sle12-a crmd: notice: process_lrm_event: Operation small-apache_start_0: ok (node=sle12-a, call=69, rc=0, cib-update=48, confirmed=true)
Apr 15 20:30:15 sle12-c crmd: notice: run_graph: Transition 123 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1907.bz2): Complete
Resource and node events
Apart from transitions, events such as resource start or stop are what we usually want to examine. In our extremely exciting example of apache resource restart, the history resource command picks the most interesting resource related messages as shown in [Listing 9]. Again, history shows only the most important log parts.
|If you want to see more detail (which may not always be recommendable), then use the history detail command to increase the level of detail displayed.|
crm(live)history# resource small-apache
Apr 15 20:29:59 sle12-c crmd: notice: te_rsc_command: Initiating action 60: stop small-apache_stop_0 on sle12-a
Apr 15 20:29:59 sle12-a apache(small-apache): INFO: Attempting graceful stop of apache PID 9155
Apr 15 20:30:01 sle12-a apache(small-apache): INFO: apache stopped.
Apr 15 20:30:14 sle12-a apache(small-apache): INFO: AH00558: httpd2: Could not reliably determine the server's fully qualified domain name, using 10.2.12.51. Set the 'ServerName' directive globally to suppress this message
Apr 15 20:30:14 sle12-c crmd: notice: te_rsc_command: Initiating action 60: start small-apache_start_0 on sle12-a
Node-related events are node start and stop (cluster-wise), node membership changes, and stonith events (a.k.a. node fencing). We’ll refrain from showing examples of the history node command; it is analogous to the resource command.
The history log command, unsurprisingly, displays logs. Messages from various nodes are interleaved and shown in different colors for easier viewing. Unlike other history commands, log shows all messages captured in the report. If you find some of them irrelevant, they can be filtered out: the exclude command takes extended regular expressions and is additive. We usually set the exclude expressions to at least ssh|systemd|kernel. Use exclude clear to remove all expressions. And don’t forget the timeframe command, which imposes a time window on the report.
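The effect of exclude is easy to emulate with grep, since the accumulated expressions amount to a single extended-regexp alternation. A toy sketch with fabricated log lines:

```shell
# Toy equivalent of the additive exclude filter: drop every log line
# matching any of the accumulated extended regexps. Log lines are made up.
log='Apr 15 20:30:14 sle12-c crmd: notice: run_graph: Transition 123: Complete
Apr 15 20:30:14 sle12-a sshd[1234]: Accepted publickey for root
Apr 15 20:30:15 sle12-c systemd[1]: Started Session 42 of user root.
Apr 15 20:30:15 sle12-a kernel: drbd nfs: peer( Secondary -> Primary )'

# "ssh", "systemd" and "kernel" combine into one alternation
printf '%s\n' "$log" | grep -Ev 'ssh|systemd|kernel'   # only the crmd line survives
```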
External reports, configurations, and graphs
The information source history works with is an hb_report-generated report. Even when examining a live cluster, hb_report is run behind the scenes to collect the data before presenting it to the user. Well, at least to generate the first report: there is a special procedure for refreshing logs and collecting new PE inputs, which runs much faster than creating a report from scratch. However, juggling multiple sources, appending logs, and moving time windows may not always be foolproof, and if the source gets borked you can always ask for a brand new report with refresh force.
Analyzing reports from an external source is no different from what we’ve seen so far. In fact, there’s a source command which tells history where to look for the data.