512-863-3653 sales@tipsweb.com

LogMate® High Availability

A More Seamless Disaster Recovery Setup and Experience

LogMate High Availability (LMHA) enables two Capture installations to communicate with each other so that, if a failover is required, the standby Capture system picks up where the primary system left off, and vice versa once the primary system is back online. Your Capture data is parsed sequentially, with as few gaps as possible, for the end users who rely on it for reports and alarm management analysis. From the end users’ perspective, it’s as if a disaster scenario never occurred!

✔ Manage your failover from any LMHA Capture server

✔ Keep your LMHA Capture servers synced

✔ Check the status of your LMHA Capture servers

✔ Perform a soft handshake with the partner server prior to failover

✔ Perform a hard failover if the partner server is offline

✔ Run LMHA as a service

✔ Redundancy – always have a backup Capture system ready to go during Disaster Recovery (DR)

✔ Avoid the hassle of reconfiguring Capture each time a DR scenario occurs

✔ Switch over from one Capture system to another with a click of a button

✔ Minimize gaps in Capture data collection during DR

✔ Ensure that end users can continue reviewing and analyzing alarms and events during DR

LMHA Objectives:

LMHA synchronizes two independent alarm and event sources to the same highly available LogMate database.

  • The failover process can be done manually, but using LMHA eliminates stress and inaccuracy.
  • Performing the failover with LMHA saves time.
  • The LMHA process does not require in-depth knowledge of LogMate and SQL Server during critical failover events.

LMHA Technical Benefits:

  • A user interface that shows the status of the failover process and which Capture application currently holds the primary connection to the LogMate database.
  • Logging feature to diagnose and troubleshoot any failover issues.
  • Easy recovery from unsuccessful failover attempts.

In addition, LMHA prevents:

  • Incorrect ID field values that would generate duplicate records or skip records, causing inaccurate alarm management reporting.
  • The offline Capture from coming online and causing duplicate records and inaccurate alarm management reports.
  • Incorrect failovers, which can take 1-2 days of work with Customer Support to correct in the LogMate database. This is not a rare exception and has happened many times in the past: users simply forget that a manual adjustment is required and start the backup Capture without it. This causes data to be duplicated, sometimes going back months to when the last failover occurred.
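
To make the duplicate and skipped record problem concrete, here is a minimal sketch in Python. It is only an illustration: the record layout, the ID values, and the assumption that collection resumes with records after the given ID are all hypothetical and are not taken from LogMate itself.

```python
# Hypothetical illustration: record layout and ID values are invented for this sketch.

def resume_from(backup_records, start_id):
    """Return the backup records that would be collected after start_id.

    Assumption: collection resumes with records whose ID is greater than start_id.
    """
    return [r for r in backup_records if r["id"] > start_id]


# The same events as stored in the backup source. Its IDs are independent
# of the IDs used by the primary source.
backup = [
    {"id": 101, "event": "A"},
    {"id": 102, "event": "B"},
    {"id": 103, "event": "C"},
    {"id": 104, "event": "D"},
]

# Events the primary Capture had already stored before it went offline.
already_stored = ["A", "B"]


def collected_events(start_id):
    """Everything in the database after the backup Capture takes over."""
    return already_stored + [r["event"] for r in resume_from(backup, start_id)]


correct = collected_events(102)    # resumes right after event B
too_early = collected_events(100)  # stale value: A and B are stored twice
too_late = collected_events(103)   # value too high: event C is never stored
```

With the correct value every event is stored once. A stale value duplicates events that were already collected, and a value that is too high silently skips events.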

Manual failover without using LMHA:

Manual tasks that must be performed for switching to another source upon failure of the primary source:

  • Shut down the primary Capture application (if it is available; in the worst case the primary data center is not available) and flag the application so that it does not start again without permission.
  • Determine the last record collected from the primary Capture (date, time, point name, key field).
    • Log into netView and manually record the last record collected, or
    • Log into SSMS and manually record the last record.
  • On the backup data source, manually review the records to find the one with the same date, time, point name, and key field.
    • Log into SSMS and query the records to find the same record in the backup source.
  • Determine the ID field value of the record found in the backup data source.
    • Manually record the ID.
  • Manually adjust the backup Capture application to start at the ID field in the backup data source.
    • Open the tips.ini for the backup Capture and manually enter the Next Where Value.
  • Launch the backup Capture application.

Keep in mind this same manual failover process will need to be repeated when returning to the primary Capture application.
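
The record-matching part of the manual steps above can be sketched roughly as follows. This is a hedged illustration in Python, not how LogMate or LMHA is implemented: the field names, the sample records, and the `find_backup_id` helper are all hypothetical, and the real lookup would be done against the databases in SSMS or netView.

```python
from typing import Optional

# Hypothetical field names; the real LogMate schema is not described here.
MATCH_FIELDS = ("date", "time", "point_name", "key_field")


def find_backup_id(last_primary: dict, backup_records: list) -> Optional[int]:
    """Return the ID of the backup record that matches the last primary record.

    Records are matched on date, time, point name, and key field, because
    the ID values of the two sources are independent of each other.
    Returns None when no matching record exists in the backup source.
    """
    target = tuple(last_primary[f] for f in MATCH_FIELDS)
    for record in backup_records:
        if tuple(record[f] for f in MATCH_FIELDS) == target:
            return record["id"]
    return None


# Made-up sample data for illustration only.
last_primary = {
    "id": 5002, "date": "2024-01-15", "time": "08:30:02",
    "point_name": "FIC-101", "key_field": "HI",
}
backup_records = [
    {"id": 911, "date": "2024-01-15", "time": "08:30:01",
     "point_name": "TI-204", "key_field": "LO"},
    {"id": 912, "date": "2024-01-15", "time": "08:30:02",
     "point_name": "FIC-101", "key_field": "HI"},
    {"id": 913, "date": "2024-01-15", "time": "08:30:05",
     "point_name": "PI-330", "key_field": "HIHI"},
]

# The ID that would be entered by hand as the Next Where Value.
next_where_value = find_backup_id(last_primary, backup_records)
```

In this sample the matching backup record has ID 912, even though the primary source stored the same event under ID 5002. A `None` result means the backup source has no matching record, in which case the value should not be guessed.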

Automated failover using LMHA:

  • Start LMHA UI
  • Press the failover button

Actual time to perform failover manually:

  • With expertise (TiPS SME): 30 minutes
  • Without expertise, with TiPS Support: 60 minutes on a call with TiPS Support, involving 2+ people
  • Without expertise, following a step-by-step document without TiPS: 90 minutes
  • Failover tests typically happen only once a quarter to once a year. Because the manual process is difficult, it is rarely exercised until it is really needed.
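
As a rough worked example, the figures above can be turned into total effort for a full round trip, since the manual process is repeated when returning to the primary Capture application. The head counts are assumptions: "2+ people" is treated as exactly 2, and the other two scenarios are assumed to involve one person.

```python
# Minutes come from the list above; the head counts are assumptions
# ("2+ people" is treated as exactly 2, the other scenarios as 1 person).
SCENARIOS = {
    "TiPS SME": (30, 1),
    "TiPS Support on a call": (60, 2),
    "step-by-step document, no TiPS": (90, 1),
}


def round_trip_person_minutes(minutes: int, people: int) -> int:
    """Person-minutes for failing over and later failing back (two manual runs)."""
    return 2 * minutes * people


totals = {
    name: round_trip_person_minutes(minutes, people)
    for name, (minutes, people) in SCENARIOS.items()
}
```

Under these assumptions a round trip costs 60 person-minutes for a TiPS SME, 240 with TiPS Support on a call, and 180 when following a step-by-step document.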

