
PMDF channel counters are meant to
indicate the trend and health of your mail system.
Accurate accounting is not what the channel counters are designed to provide.
The lack of accuracy in PMDF's channel counters is an inherent aspect of their
design; it is not a bug.
Specifically, PMDF's channel counters adhere to what Marshall Rose calls the
fundamental axiom of management, which is that management must itself
not interfere with proper system and network operation by consuming anything but
the tiniest amount of resource.
PMDF's channel counters are implemented using the lightest-weight mechanisms
we have available: a shared memory section on each system that is periodically
synchronized to a disk database. Channel counters do not "try harder":
- if an attempt to map the section fails, no information is recorded;
- if one of the locks in the section cannot be obtained almost immediately, no
information is recorded;
- when a system is shut down, the information contained in the in-memory
section is lost forever.
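The "do not try harder" behavior can be illustrated with a small model. This is only a sketch in Python, not PMDF's implementation: the class name `BestEffortCounters`, the `lock_timeout` value, and the `dropped` tally are all hypothetical and exist purely to show an update being skipped when a lock cannot be obtained quickly.

```python
import threading


class BestEffortCounters:
    """Toy model of a counter section that drops updates instead of waiting."""

    def __init__(self, lock_timeout=0.01):
        self._lock = threading.Lock()
        self._lock_timeout = lock_timeout  # hypothetical "almost immediately" bound
        self.values = {}
        self.dropped = 0  # kept only so the demo can show what was lost

    def record(self, channel, field, amount=1):
        # Give up quickly rather than hold up mail processing.
        if not self._lock.acquire(timeout=self._lock_timeout):
            self.dropped += 1  # the update is simply not recorded
            return False
        try:
            key = (channel, field)
            self.values[key] = self.values.get(key, 0) + amount
            return True
        finally:
            self._lock.release()


counters = BestEffortCounters()
counters.record("directory", "received")  # lock is free: recorded

counters._lock.acquire()                  # simulate another holder of the lock
counters.record("directory", "received")  # times out: not recorded
counters._lock.release()

print(counters.values, counters.dropped)
```

In this model the second update never reaches the counters, which is the trade-off described above: management overhead stays tiny, at the cost of accuracy.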
What PMDF channel counters are not for
PMDF channel counters are not intended to provide
an accurate accounting of message traffic.
- PMDF channel counter process (VMS only). The resident worker
process which is started by the PMDF_STARTUP.COM command. Its process name is
usually "PMDF counters", changing to "PMDF count exit" when it is exiting or
restarting.
- In-memory database. The in-memory channel counters cache is
stored in a system permanent page file section named PMDF_COUNTERS. This page
file section is node-specific. It is created and initialized by the PMDF
channel counter process if it does not already exist. If it already exists,
then the channel counters contained within will be your "current" in-memory
channel counters.
- On-disk database (VMS only). This is a cluster-wide database on
disk. This database is created automatically by the PMDF channel counter
process. When created, it will have all counters zeroed except for the count of
message files stored in the queue directory for the channel when that
directory was scanned. Note that on an active system, the number of files in a
directory at any instant does not reflect the number of real, stored messages.
For example, some files may be in the process of being deleted, other files
may be added after the scan is done, etc.
- Synchronization (VMS only) of the in-memory database with the
on-disk database. This happens when you use the PMDF COUNTERS/SYNCHRONIZE
command. During synchronization:
  - the data recorded on disk is locked and read,
  - the value is added to the in-memory value,
  - the new value is written back out to disk, and
  - the in-memory value is zeroed.
- COUNTERS commands. There are two sets of commands. See the manual
for more complete descriptions.
| from DCL | PMDF QM/MAINT subcommands | on UNIX | function |
| PMDF COUNTERS/SYNCHRONIZE | COUNTERS SYNCHRONIZE | Not applicable | Tells the PMDF channel counters process to synchronize |
| PMDF COUNTERS/SHOW | COUNTERS SHOW | pmdf counters -show | Shows you the on-disk channel counter values. For the QM subcommand, the counters are implicitly synchronized first. |
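The synchronization steps listed above (read the on-disk value, add the in-memory value, write the sum back, zero the in-memory value) can be modeled in a few lines. This is a simplified sketch, assuming plain dictionaries stand in for the on-disk database and the in-memory section; the function name `synchronize` is hypothetical, and the locking of the on-disk data is omitted.

```python
def synchronize(on_disk, in_memory):
    """Fold one node's in-memory counters into the shared on-disk totals."""
    for key, pending in in_memory.items():
        stored = on_disk.get(key, 0)  # 1. read the value recorded on disk
        total = stored + pending      # 2. add it to the in-memory value
        on_disk[key] = total          # 3. write the new value back to disk
        in_memory[key] = 0            # 4. zero the in-memory value
    return on_disk


on_disk = {("directory", "received"): 6500}
node_a = {("directory", "received"): 19}

synchronize(on_disk, node_a)
print(on_disk)  # {('directory', 'received'): 6519}
print(node_a)   # {('directory', 'received'): 0}
```

Because the in-memory value is zeroed after each successful synchronization, anything accumulated after the last synchronization exists only in memory.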
- If a channel's stored counter keeps decreasing over a long period
of time.
- If a channel's stored counter keeps increasing over a long period
of time.
- If a channel name does not make sense.
- Negative numbers.
- The stored number does not match the real number of messages in
the channel.
- The number seems to be off by a few (relative to the total number of
messages processed by your system).
An example of the output from the QM/MAINT COUNTERS SHOW command is shown
below:

    Channel                       Messages Recipients    Blocks
    ----------------------------- -------- ---------- ---------
    directory
      Received                        6519       9038     69545
      Stored                            -4         -4      -149
      Delivered                       6523       9042     69694
      Submitted                       6811       9019     71123

| name | description | example corresponding mail.log entry |
| Received | Messages coming from any channel (e.g., "xyz") to the channel named "directory". That is, messages enqueued to the "directory" channel by any other channel. | xyz directory E |
| Stored | Messages stored in the channel queue to be delivered. A negative number just means that the zero used for the counters does not correspond to zero messages stored on disk. | (Not applicable) |
| Delivered | Messages which have been processed by the channel "directory" and either delivered or returned. | directory D |
| Submitted | Messages sent from the channel "directory" to any other channel; e.g., enqueued to the channel "xyz". | directory xyz E |
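In the sample output above, the Stored row happens to equal Received minus Delivered in every column. The short check below only verifies that arithmetic for this one sample; it is an observation about these numbers, not a documented formula, and on a real system the starting value of Stored may differ from zero.

```python
# Values copied from the sample output: (messages, recipients, blocks).
sample = {
    "Received": (6519, 9038, 69545),
    "Stored": (-4, -4, -149),
    "Delivered": (6523, 9042, 69694),
}

# Column-by-column difference between what arrived and what was dequeued.
difference = tuple(
    received - delivered
    for received, delivered in zip(sample["Received"], sample["Delivered"])
)

print(difference)                      # (-4, -4, -149)
print(difference == sample["Stored"])  # True
```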
Note that the directory channel shows more submissions than deliveries. This
is usually the case (SUBMITTED >= DELIVERED), since each message the channel
dequeues (DELIVERED) will result in at least one new message enqueued
(SUBMITTED), but possibly more than one. For example, if a message has two
recipients reached via different channels, then two enqueues will be required.
Or, if the message bounces, a copy will go back to the sender and another copy
may be sent to the postmaster. Usually that will be two submissions (unless both
are reached through the same channel).
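The reason SUBMITTED tends to be at least as large as DELIVERED can be seen in a toy tally. This sketch assumes one new enqueue per distinct destination channel for each dequeued message; the helper `submissions_for` and the destination channel names are illustrative only.

```python
def submissions_for(destination_channels):
    """One new enqueue per distinct destination channel."""
    return len(set(destination_channels))


# Each entry lists the destination channel of every recipient of one message.
messages = [
    ["l"],               # one recipient: one enqueue
    ["l", "tcp_local"],  # two recipients on different channels: two enqueues
    ["l", "l"],          # two recipients on the same channel: one enqueue
]

delivered = 0
submitted = 0
for destinations in messages:
    delivered += 1                             # the channel dequeued one message
    submitted += submissions_for(destinations)  # and enqueued one or more

print(delivered, submitted)  # 3 4
```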
Moreover, when you shut down a node, the data which has accumulated in the
in-memory cache since the last PMDF COUNTERS/SYNCH command is lost. If PMDF
processing is spread across a cluster, an enqueue may have been processed and
hence accumulated in the in-memory cache on node A, and the associated dequeue
processed and accumulated on node B. If node A goes down before its in-memory
cache is synchronized to the cluster-wide on-disk database, the record of that
enqueue will be lost. If the node B data then does get synchronized to the disk,
you will have an imbalance in the recorded enqueues/dequeues for that
channel.
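The node A / node B scenario can be walked through with a small model. This is a sketch under stated assumptions: dictionaries stand in for the per-node caches and the on-disk database, the helper `flush` is hypothetical, and the Stored counter is assumed to go up on an enqueue and down on a dequeue.

```python
def flush(on_disk, in_memory):
    """Add a node's in-memory counts to the on-disk totals, then zero them."""
    for name, value in in_memory.items():
        on_disk[name] = on_disk.get(name, 0) + value
        in_memory[name] = 0


on_disk = {"received": 0, "stored": 0, "delivered": 0}  # cluster-wide
node_a = {"received": 0, "stored": 0, "delivered": 0}   # per-node cache
node_b = {"received": 0, "stored": 0, "delivered": 0}   # per-node cache

# The enqueue is processed on node A.
node_a["received"] += 1
node_a["stored"] += 1

# The matching dequeue is processed on node B.
node_b["delivered"] += 1
node_b["stored"] -= 1

# Node A is shut down before synchronizing: its cache is gone.
node_a = None

# Node B synchronizes normally.
flush(on_disk, node_b)

print(on_disk)  # {'received': 0, 'stored': -1, 'delivered': 1}
```

The on-disk database now records a delivery with no matching receipt and a negative stored count, even though no message was lost.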
Now, you can force the detached channel counter processes to periodically
flush their data. Also, having your shutdown procedures do a PMDF COUNTERS/SYNCH
is marginally helpful, but still leaves a window during which a message may come
in (say over DECnet or TCP/IP) and get its enqueue recorded to the in-memory
cache after the final synchronization. It is also only marginally helpful
because synchronizations do not always succeed. For instance, if another process
has the on-disk database locked, a detached channel counter process may not be
able to make its updates. In such a case, the process gives up and assumes that
the synchronization will succeed later. Of course, the synchronization will
never occur if the system is about to be brought down.
To have periodic synchronizations done, define the following system logical
cluster-wide:

    $ DEFINE/SYSTEM PMDF_COUNTER_INTERVAL "dd hh:mm:ss"

For instance, to update the on-disk database once every 10 minutes, you would
use:

    $ DEFINE/SYSTEM PMDF_COUNTER_INTERVAL "00 00:10:00.00"

That logical must be defined before PMDF_STARTUP.COM runs so that it is seen
by the detached channel counter processes.
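Before defining the logical, it can be useful to sanity-check the interval string. The helper below is a hypothetical Python sketch that accepts only the "dd hh:mm:ss" shape used in the examples above (with optional fractional seconds) and reports the interval in seconds; it says nothing about how PMDF itself parses the value.

```python
import re
from datetime import timedelta

# Matches "dd hh:mm:ss" with optional fractional seconds, e.g. "00 00:10:00.00".
_INTERVAL = re.compile(r"^(\d+) (\d{1,2}):(\d{2}):(\d{2}(?:\.\d+)?)$")


def parse_interval(text):
    """Convert a 'dd hh:mm:ss' string into a timedelta (illustrative helper)."""
    match = _INTERVAL.match(text.strip())
    if match is None:
        raise ValueError(f"not in 'dd hh:mm:ss' form: {text!r}")
    days, hours, minutes = (int(match.group(i)) for i in (1, 2, 3))
    seconds = float(match.group(4))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


print(parse_interval("00 00:10:00.00").total_seconds())  # 600.0
```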