I’m doing a great deal of thinking, and doing, on systems monitoring.  The need has always been there but I thought I could never justify taking the time to do it.  My brother showed me Nagios last summer and although I thought the concept was great I figured it would be too complicated and time-consuming to set up.  I was wrong.  In an afternoon this spring I had it up and running, and it detected a problem the very first day, before the users were even aware.

So that’s wonderful.  I now have positive monitoring and alerting for all of my servers, 911 workstations and consoles.  I’m even monitoring portions of the county network that are beyond my actual control…for reasons I won’t go into here.

Now I think about how to monitor public safety radio systems.  Users of the systems have a history of being aware of problems but flatly refusing to report them.  It’s as if they desire constant failure so they have something to complain to elected officials about.  Some elected officials seem to thrive on the drama.  To be fair, there are a number of situations that come up where the users will either not understand that a problem exists or they may not be aware of a problem until they need the system, only to discover that it does not work.  Monitoring and alerting for these systems would help me to proactively address problems and take the drama away from the TWM’s (technical term for drama loving users).

Nagios does a great job monitoring computer systems that have IP connections.  What it does is ideal, but it has no way that I am aware of for monitoring non-IP connected systems.  I also need ways, in some cases, to provide simple status indication to 911 operators/dispatchers.  This may require me to build some hardware, which gives me an opportunity to prototype using Arduino or Raspberry Pi.

Here are some of the ideas I have floating in my head at the moment:

A system to monitor each radio network.  I have two primary radio networks:  LEDN which is the Law Enforcement Dispatch Network and EFDN which is the EMS / Fire Dispatch Network.  Each of these has transmitters on about 10 sites.  Each site is connected to a channel on a JPS SNV-12 Voter.  The voter is also connected to the console.  When the dispatcher transmits it causes the voter to key all sites for transmit at once.  If a mobile user transmits, the voter receives that signal and also keys all sites for transmit.

Therefore any time a network transmits, whether initiated by the console or a mobile, we should see that result in a valid COR + CTCSS from a receiver tuned to the transmitter for a particular site.  Ideally we would have a receiver set up for every transmitter on the system.  So we end up needing to create a hardware device that:

  • Monitors the following digital signals (1 or 0):
    • The PTT output of the console channel.
    • The PTT output of the voter.
    • The COR/CTCSS detect output of the receivers on each channel of the radio network being monitored.
  • Outputs a message indicating any abnormal condition (e.g. no transmit from site x, no transmit from voter upon console transmit)
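
The comparison that device has to make is simple enough to sketch in Python.  This is only an illustration of the logic, not a finished design: the function name and the site labels are made up, and it assumes the three kinds of signals have already been sampled at the same moment.  Real hardware would also need to allow a short delay for transmitters to key up before calling anything a fault.

```python
from typing import Dict, List


def find_transmit_faults(console_ptt: bool, voter_ptt: bool,
                         site_cor: Dict[str, bool]) -> List[str]:
    """Return fault messages for one sampled moment (hypothetical helper).

    console_ptt -- True while the console is keying the channel
    voter_ptt   -- True while the voter is keying the sites
    site_cor    -- site label -> True if the monitor receiver for that
                   site reports valid COR + CTCSS
    """
    faults = []
    # The console keyed up but the voter never passed it on.
    if console_ptt and not voter_ptt:
        faults.append("no transmit from voter upon console transmit")
    # The voter keyed the sites, so every monitor receiver should hear one.
    if voter_ptt:
        for site, active in sorted(site_cor.items()):
            if not active:
                faults.append(f"no transmit from site {site}")
    return faults
```

An empty list means the transmit path looked normal for that sample; anything else is the text that would be passed along as the alert.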

Then I need to find a way to get that message into Nagios and have it alert on a fault with the specific information.
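
One route that looks promising is a Nagios passive check, where an outside program writes a `PROCESS_SERVICE_CHECK_RESULT` line into the Nagios external command file.  The sketch below only builds that line.  The host name, service description and message wording are placeholders, and the location of the command file depends on how Nagios was installed, so it is left out here.

```python
import time

# Standard Nagios return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def passive_result_line(host, service, faults, now=None):
    """Format one passive service check result for Nagios.

    host, service -- must match a host and service defined in Nagios
                     (the values used in the example are hypothetical)
    faults        -- list of fault messages; an empty list means healthy
    now           -- Unix time; defaults to the current time
    """
    timestamp = int(time.time() if now is None else now)
    if faults:
        code, text = CRITICAL, "CRITICAL - " + ", ".join(faults)
    else:
        code, text = OK, "OK - transmit path normal"
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{code};{text}\n")
```

For example, `passive_result_line("ledn", "TX path", ["no transmit from site 3"])` produces a line that, once appended to the command file, should show up in Nagios as a CRITICAL result with the site named in the status text.  The service has to be configured to accept passive checks for this to work.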

That gives us a monitor for the transmit or outbound path.  We also need to monitor the receive or inbound path.  A start for that is already provided with positive monitoring of each receive line from each site.  Similar to the transmit portion, a receiver at each site on the network is connected to each channel of the voter.  On this receive line there is a 2175 Hz tone at all times.  If the voter fails to detect that tone the channel is taken out of service and a fault is indicated for the channel.  When the tone has been received again for two seconds the fault is removed and the channel is placed back into service.  With that guard tone in operation we know right away if we lose any receiver on the system.
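
The voter handles all of this on its own, but the fault and restore behavior is worth writing down as a small state machine, since any status display would need to mirror it.  This is a sketch of the behavior as described above, not of the voter's actual firmware.  It assumes the channel drops out as soon as the tone is missed, which may be quicker than the real unit reacts, and the class name is invented.

```python
class GuardToneChannel:
    """Model of one voter channel's guard-tone supervision (illustrative)."""

    RESTORE_SECONDS = 2.0  # tone must be back this long before restoring

    def __init__(self):
        self.in_service = True
        self._tone_since = None  # when the tone came back, while faulted

    def update(self, tone_present: bool, now: float) -> bool:
        """Feed one tone-detect sample; return True if channel is in service."""
        if not tone_present:
            # Any loss of tone faults the channel and restarts the timer.
            self.in_service = False
            self._tone_since = None
        elif not self.in_service:
            if self._tone_since is None:
                self._tone_since = now
            # Restore only after the tone has been continuous long enough.
            if now - self._tone_since >= self.RESTORE_SECONDS:
                self.in_service = True
                self._tone_since = None
        return self.in_service
```

Feeding it a stream of samples with timestamps in seconds shows the channel faulting on the first missed tone and coming back only after two uninterrupted seconds of tone.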

There are a number of ways for me to get this fault information.  Each channel card provides a digital signal (1 or 0) to indicate a fault, so I could grab that with hardware and do what I like with it.  That might be quite handy for a status lamp board in the dispatch room.  But the voter also makes this information available on an RS-232 serial port or via telnet.  Issuing a command via either of those will return a status.  That information could somehow be used to indicate a status to Nagios, but I’m not sure how I would do that.
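
Nagios plugins follow a simple convention that might fit here: the plugin prints one line of status text and exits with 0 for OK, 1 for WARNING, 2 for CRITICAL or 3 for UNKNOWN.  The sketch below covers only that last step.  It assumes some other piece of code has already asked the voter for its status and turned the reply into a dictionary of channel label to fault flag; the actual voter command and reply format are not shown because they would have to come from the SNV-12 documentation.

```python
def check_voter_channels(channel_faults):
    """Turn per-channel fault flags into a Nagios-style (code, text) pair.

    channel_faults -- dict of channel label -> True if the voter reports
                      a fault on that channel (labels are hypothetical)
    """
    if not channel_faults:
        # No data at all is different from "everything is fine".
        return 3, "UNKNOWN - no channel status received from voter"
    faulted = sorted(name for name, bad in channel_faults.items() if bad)
    if faulted:
        return 2, "CRITICAL - receive fault on: " + ", ".join(faulted)
    return 0, f"OK - all {len(channel_faults)} receive channels in service"

# In a real plugin the pair would be used like this:
#     code, text = check_voter_channels(faults_from_voter)
#     print(text)
#     sys.exit(code)
```

Returning UNKNOWN when nothing comes back keeps a dead serial cable or a failed telnet login from being reported as a healthy radio system.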

A common complaint is that fire or EMS agencies did not receive their page and they always blame either the system or the dispatcher.  They refuse to ever accept the fact that it could be their equipment or their own operator error unless we first prove that there is no fault at 911.  Indeed there have been operator errors at 911 and, very rarely, problems with the radio system.  Another complaint has been that there was no voice announcement from the dispatcher after the page was sent.  This has frequently been due to the dispatcher selecting the wrong channel for the voice announcement.  They actually give the information, but it goes out on the wrong system.

To go back after the fact and manually dig up the information takes a great deal of time.  Instead it would be nice to log all pages sent, verify that they successfully made it through the system and even provide the dispatcher with an indication of a positive page.  Icing on the cake would be to somehow verify that the dispatcher sends an audio message on the same channel the page went out.

I should explain that all pages are sent on the EFDN network except for pages to the Priest Lake area.  Those folks are special (in ways not to be described here) so they get their pages sent on the Priest Lake channel.  Because they are the exception and dispatchers have to do things completely differently for Priest Lake than they do for the other 92% of the agencies we support, dispatcher error has been unfortunately common.

One idea I have for this is to monitor the transmit audio lines from each console position.  I could bring these into a Linux-based computer via USB sound cards and decode tones.  I would also monitor console outputs on the two channels where pages are possible, again decoding tones.  Now I know what tones were sent and who sent them.  If I hook up a receiver to a channel on EFDN and another on Priest Lake and monitor that audio I can again decode tones.  This should allow me, if I can learn to write some program, to know who sent a page, look up in a table which channel that page should have been on and verify that it did indeed go out on the proper channel successfully.  A continued monitor of audio should show activity within 20 seconds of tones, which would indicate that a voice message was sent and received on the proper channel.
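
For the tone decoding itself, one common technique is the Goertzel algorithm, which measures how much energy a block of audio samples has at one chosen frequency.  The sketch below is a starting point under several assumptions: the audio has already been captured as a list of numbers, the candidate frequencies and the power threshold are placeholders rather than real paging tones, and all function names are invented.  Decoding a full two-tone page would mean running this over successive blocks and looking at the sequence of results.

```python
import math


def goertzel_power(samples, sample_rate, freq):
    """Relative signal power at freq (Hz) in one block of samples."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2 = s_prev
        s_prev = s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2


def detect_tone(samples, sample_rate, candidates, min_power):
    """Return the strongest candidate frequency, or None if too weak.

    candidates -- frequencies to look for (placeholder values in the demo)
    min_power  -- threshold that would need tuning against real audio
    """
    best = max(candidates,
               key=lambda f: goertzel_power(samples, sample_rate, f))
    if goertzel_power(samples, sample_rate, best) < min_power:
        return None
    return best


def make_sine(freq, sample_rate, count):
    """Synthetic test tone, for trying the detector without a radio."""
    return [math.sin(2.0 * math.pi * freq * i / sample_rate)
            for i in range(count)]
```

Running `detect_tone(make_sine(1000.0, 8000, 800), 8000, [600.0, 1000.0, 1500.0], 1000.0)` on a synthetic 1000 Hz tone picks out 1000.0, while a block of silence returns `None`.  Real audio off a sound card will be noisier, so the block length and threshold are the parts most likely to need experimenting.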

I could then take this to a status light board in the dispatch room.  I would have a lamp for each agency to indicate successful page.  Another lamp for each agency below that one would indicate successful voice.  A big red “ERROR” lamp should also be included that would illuminate if tones didn’t complete the chain, if there was a lack of voice or if voice did not go out on the appropriate channel.
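
The decision behind those lamps can be sketched as a small function.  Everything named here is hypothetical: the agency names in the table are invented examples, the record layout is just one way to hold what the tone decoders reported, and the check assumes it runs after the 20 second voice window has passed.

```python
from dataclasses import dataclass
from typing import Optional

VOICE_WINDOW_SECONDS = 20.0

# Hypothetical lookup table: agency -> channel its pages belong on.
EXPECTED_CHANNEL = {
    "Example Fire": "EFDN",
    "Priest Lake Example": "Priest Lake",
}


@dataclass
class PageEvent:
    agency: str
    page_time: float               # when the page tones left the console
    page_heard_on: Optional[str]   # channel where tones were heard off air
    voice_time: Optional[float]    # when voice audio followed, if it did
    voice_heard_on: Optional[str]  # channel where that voice was heard


@dataclass
class Lamps:
    page_ok: bool
    voice_ok: bool
    error: bool


def evaluate_page(event: PageEvent) -> Lamps:
    """Decide the lamp states for one page once the voice window is over."""
    expected = EXPECTED_CHANNEL.get(event.agency)
    # Page lamp: tones were heard off air on the channel the table calls for.
    page_ok = expected is not None and event.page_heard_on == expected
    # Voice lamp: voice followed on that same channel within the window.
    voice_ok = (
        expected is not None
        and event.voice_time is not None
        and event.voice_heard_on == expected
        and 0.0 <= event.voice_time - event.page_time <= VOICE_WINDOW_SECONDS
    )
    return Lamps(page_ok=page_ok, voice_ok=voice_ok,
                 error=not (page_ok and voice_ok))
```

A page to the Priest Lake example agency with voice that went out on EFDN would light the page lamp, leave the voice lamp dark and light ERROR, which is exactly the dispatcher mistake described earlier.  Logging each `PageEvent` as it is evaluated would also give the after-the-fact record without any manual digging.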