Process control engineers are familiar with redundancy: the practice of keeping replacement or substitute components for key segments of a system or process. These components may include a spare server already configured within the firewall, server hardware, additional software licenses, backup network switches, or immediately replaceable I/O hardware at the sensor level. Redundancy provides the core ability to recover from outages and shutdowns.
Resilience focuses more on how quickly system problems can be addressed and corrected to preserve the continuity of operations within acceptable parameters. A control system designed with some sense of resilience in mind is expected to maintain an adequate level of operational normalcy in response to disturbances and potential changes that impact performance.
The LogMate software for Alarm Management is used in process control applications to provide working insight into process performance and to deliver information in a meaningful context for improved troubleshooting and faster decision-making. Many organizations that must analyze performance and collect alarms and events without interruption have also determined that Alarm Management is part of their redundancy and resilience considerations. This may even be required by regulatory oversight that enforces reporting and auditing.
Cold Redundancy – Low Resilience
These systems are often considered a lower priority, and it may be acceptable for them to be offline for extended periods of time. There may be components that need service, repair, or replacement. Backup components are typically not in a ready state for use but are readily available. Oftentimes there are manual procedures in place to bring the process back online.
Budgetary considerations for low resilience systems are relatively small since some components may be completely ignored while others may be designated as “deal with it if and when something happens.”
Warm Redundancy – Medium Resilience
These systems are typically designated medium priority and will have tighter constraints on tolerable downtime. Redundant components are expected to be more immediately available, on hand, and possibly simple to boot up to an operational state. There may be some manual as well as automated procedures to bring the process back to normal.
Additional budgetary concerns may exist due to increased costs associated with procuring, storing, and maintaining redundant components.
Hot Redundancy – High Resilience
These systems have very high priority, with little to no downtime projected. Redundant components may actively run in parallel, are maintained similarly to the primary system, and are ready to be put into service with minimal failover procedures (manual or automated).
Budget requires serious deliberation when complete secondary (tertiary, etc.) systems are involved, as costs for implementation and maintenance can build up quickly. Since those costs are characteristically associated with alleviating the risk of downtime, they may be deemed acceptable to maintain operations.
Resilience and Alarm Management
The LogMate High Availability (LMHA) module addresses processes that mandate high resilience and must retain insight into alarms and events, whether because of regulated environments or audit requirements. LMHA allows two Capture servers (alarm and event data collection and processing servers) to back each other up to ensure failover continuity between primary and standby systems.
This backup approach ensures that data is collected continually by the primary and secondary servers, seamlessly and in sequence (record ID, sequence number). This reduces, and potentially eliminates, disruption to normal data collection, thus maintaining continuous insight into system performance (detecting upsets, maintaining logs, and analyzing performance).
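To illustrate what "in sequence" can mean in practice, the following is a minimal Python sketch, not LogMate's actual implementation or data format. It assumes each server stores records tagged with a hypothetical, monotonically increasing sequence number, merges the two sets, and reports any numbers missing from the combined log. The `AlarmRecord` type and the `merge_records` and `find_gaps` helpers are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmRecord:
    sequence: int   # hypothetical monotonically increasing sequence number
    source: str     # which capture server stored this copy
    message: str


def merge_records(primary, secondary):
    """Merge two record lists by sequence number, preferring the primary copy."""
    merged = {}
    # Insert secondary first so that a primary copy overwrites a duplicate.
    for record in list(secondary) + list(primary):
        merged[record.sequence] = record
    return [merged[seq] for seq in sorted(merged)]


def find_gaps(records):
    """Return sequence numbers missing between the first and last record."""
    present = {r.sequence for r in records}
    if not present:
        return []
    return [s for s in range(min(present), max(present) + 1) if s not in present]


# Example: the primary goes offline after record 3; the secondary carries on.
primary = [AlarmRecord(n, "primary", f"alarm {n}") for n in (1, 2, 3)]
secondary = [AlarmRecord(n, "secondary", f"alarm {n}") for n in (3, 4, 6)]

combined = merge_records(primary, secondary)
gaps = find_gaps(combined)
```

In this example the combined log holds records 1, 2, 3, 4, and 6, with the primary's copy of record 3 retained, and the gap check flags sequence number 5 as missing. A check of this kind is one way an audit could confirm that a failover did not drop events.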
LMHA can also be configured to run as a Windows service on the primary server that performs a soft handshake with the secondary server during normal operation or before a failover, and a hard failover if the primary server goes offline.
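The difference between a soft handshake and a hard failover can be sketched as a small state machine on the standby side. This is a hedged illustration in Python under stated assumptions, not a description of how LMHA is built: it assumes the primary sends periodic heartbeats, that a heartbeat may carry a handover request (the soft path), and that silence longer than a hypothetical timeout triggers promotion (the hard path). The `StandbyMonitor` class and its parameters are invented for the example.

```python
from enum import Enum


class Role(Enum):
    STANDBY = "standby"
    ACTIVE = "active"


class StandbyMonitor:
    """Tracks primary heartbeats and decides when the standby takes over."""

    def __init__(self, timeout_s, now=0.0):
        self.timeout_s = timeout_s      # assumed maximum silence tolerated
        self.last_heartbeat = now
        self.role = Role.STANDBY
        self.reason = None              # "soft" or "hard" once promoted

    def on_heartbeat(self, now, handover_requested=False):
        """Record a heartbeat; a handover request triggers a soft failover."""
        self.last_heartbeat = now
        if handover_requested and self.role is Role.STANDBY:
            self._promote("soft")

    def check(self, now):
        """Promote via hard failover if the primary has been silent too long."""
        silent_for = now - self.last_heartbeat
        if self.role is Role.STANDBY and silent_for > self.timeout_s:
            self._promote("hard")
        return self.role

    def _promote(self, reason):
        self.role = Role.ACTIVE
        self.reason = reason


# Hard failover: the primary stops sending heartbeats after t = 2 s.
hard = StandbyMonitor(timeout_s=5.0)
hard.on_heartbeat(now=2.0)
role_before_timeout = hard.check(now=6.0)   # 4 s of silence, within the timeout
role_after_timeout = hard.check(now=8.0)    # 6 s of silence, exceeds the timeout

# Soft failover: the primary requests a planned handover.
soft = StandbyMonitor(timeout_s=5.0)
soft.on_heartbeat(now=1.0, handover_requested=True)
```

Timestamps are passed in explicitly rather than read from the system clock, which keeps the sketch deterministic and easy to test. A real service would also need to guard against both servers believing they are active at once, which this example does not address.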
To discuss how TiPS safeguards the ability to retain Alarm Management functionality during server downtime, contact us today.