Software supportability needs to be anticipated and designed into an enterprise software platform. It is tempting to wait until the software is “stable” and then add supportability to it. This is a mistake, because support tools are needed most when the software is immature. The investment pays back most when it is made early in the software life cycle.
Software Supportability Requirements
The supportability strategy for an enterprise software platform must anticipate the following requirements:
- Installation verification
- Database access, security, and integrity
- Activity logging for support and audit requirements
- Error handling and logging for problem determination
- Availability and performance monitoring, with appropriate alerting
- Integration with trouble ticket processing
Let’s take a closer look at these requirements.
Install Verification Tests (IVT)
There are several approaches to installation verification.
One approach is to validate that the required files and executables were deployed in the proper locations. This tests the deployment rather than the application. It can have value, but the validation scripts will have to change every time the deployment is modified.
The other approach is to run a set of tests that confirm execution and availability of the enterprise app. These tests can range from a simple login that is run manually at first, to fully automated IVT using a test automation framework such as Selenium, JBehave, or FitNesse. One of the challenges is building automated tests that do not rely on specific data, since the data can vary by customer.
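As a rough illustration of the second approach, here is a minimal sketch of an IVT runner in Python. The function `run_ivt` and the two sample checks are hypothetical names invented for this example; real checks would call the deployed application, for instance through a Selenium script. Each check asserts on behavior rather than on specific records, so it does not depend on customer data.

```python
from typing import Callable, Dict, List, Tuple


def run_ivt(checks: Dict[str, Callable[[], bool]]) -> List[Tuple[str, bool, str]]:
    """Run each named check and return (name, passed, detail) tuples."""
    results = []
    for name, check in checks.items():
        try:
            passed = bool(check())
            detail = "ok" if passed else "check returned a falsy result"
        except Exception as exc:  # a check that crashes counts as a failure
            passed, detail = False, f"{type(exc).__name__}: {exc}"
        results.append((name, passed, detail))
    return results


# Hypothetical checks standing in for calls to the deployed application.
def login_check() -> bool:
    return True  # stand-in for "a test user can log in"


def report_service_check() -> bool:
    raise ConnectionError("report service unreachable")


results = run_ivt({"login": login_check, "report service": report_service_check})
for name, passed, detail in results:
    print(f"{'PASS' if passed else 'FAIL'}  {name}: {detail}")
```

Starting with one or two checks like these and growing the set over time keeps the IVT useful even before a full test automation framework is in place.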
Database access, security, and integrity
The installation verification process and the monitoring approaches described below should validate that the application can access the database. With today’s focus on security, I recommend running a script after each deployment to ensure that the appropriate security and access control are in place. By setting up users with specific access rights, a set of retrieval and update scripts can be run, with a pass defined as the proper security being enforced for each user.
Data integrity must be designed into the application. Your software development methodology must guarantee that the appropriate referential integrity is created. Involving DBA specialists in the application design will help with this.
Here’s one final tip on database support. As quickly as possible, set up a repository for everyone’s “one-off” queries that they use to troubleshoot. How often does an email get sent asking “Does anyone have a script that does X”? Having a central location that can be easily indexed and invoked will help. Include the date created, the version it applies to, and any usage notes.
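The access control check described above might look something like the following sketch. Everything here is an assumption made for illustration: the `Expectation` table, the `verify_access` function, and the in-memory `GRANTS` set that stands in for the real database. In practice `attempt` would run an actual retrieval or update script as the given user.

```python
from typing import Callable, List, NamedTuple


class Expectation(NamedTuple):
    user: str
    operation: str
    allowed: bool  # what the security design says should happen


def verify_access(expectations: List[Expectation],
                  attempt: Callable[[str, str], None]) -> List[str]:
    """Return a list of failures; an empty list means security was enforced."""
    failures = []
    for exp in expectations:
        try:
            attempt(exp.user, exp.operation)
            actual = True
        except PermissionError:
            actual = False
        if actual != exp.allowed:
            failures.append(
                f"{exp.user}/{exp.operation}: expected allowed={exp.allowed}, got {actual}"
            )
    return failures


# Hypothetical stand-in for the deployed system's access control.
GRANTS = {("clerk", "read_orders"), ("admin", "read_orders"), ("admin", "update_orders")}


def fake_attempt(user: str, operation: str) -> None:
    if (user, operation) not in GRANTS:
        raise PermissionError(f"{user} may not {operation}")


expectations = [
    Expectation("clerk", "read_orders", True),
    Expectation("clerk", "update_orders", False),  # a denial is the passing result
    Expectation("admin", "update_orders", True),
]
failures = verify_access(expectations, fake_attempt)
print("PASS" if not failures else failures)
```

Note that a denied update for the clerk counts as a pass: the test succeeds when the outcome matches the intended policy, not when the operation succeeds.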
Activity logging
Auditing user activity is a required element for most applications. In the case of healthcare systems and HIPAA, it is a legal requirement. Logging critical processing steps in the application is a requirement of most systems, healthcare or not. This calls for a framework that logs the application component, the business function as known by the users, the user name, the time and date, and any data that is critical to identifying the operation, such as a patient ID. The audit log also needs to be secured so that it is viewed only on a need-to-know basis with appropriate authorization.
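To make the audit fields concrete, here is a small sketch built on Python’s standard `logging` module. The function name `audit_record`, the logger name `audit`, and the sample values are assumptions for this example. Securing the resulting log (file permissions, a restricted log store) is deliberately left out of the sketch.

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO, format="%(message)s")
audit_logger = logging.getLogger("audit")  # hypothetical logger name


def audit_record(component: str, business_function: str, user: str, **key_data) -> dict:
    """Write one audit entry as a JSON line and return it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "component": component,
        "business_function": business_function,  # in the users' own terms
        "user": user,
        "key_data": key_data,  # identifiers such as a patient ID
    }
    audit_logger.info(json.dumps(record, sort_keys=True))
    return record


# Sample values are invented for illustration.
entry = audit_record(
    "scheduling", "Reschedule appointment", "jsmith", patient_id="P-1001"
)
```

Writing one structured record per line makes the log easy to search later by user, business function, or patient ID.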
Error handling and logging
This can be one of the most difficult aspects of a system architecture. It can take significant time away from more functional requirements. Error handling and logging should be integrated, since the type of information logged when an error occurs is critical to problem determination.
There are at least two types of errors: those detected by the application as incorrect user input or incorrect data, and unanticipated errors that are caught in the “catch” clause of a try/catch sequence.
I recommend two frameworks: one that reports errors to end users, and another that logs unanticipated background errors that must be reported to internal support staff for action.
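The split between the two kinds of errors can be sketched as follows. This is one possible shape, not a prescribed design: `UserInputError`, `handle_request`, and `parse_quantity` are hypothetical names, and the incident reference is simply a random ID that lets support staff match a user report to a log entry.

```python
import logging
import uuid

support_log = logging.getLogger("support")  # hypothetical logger for support staff


class UserInputError(Exception):
    """An anticipated error that the user can correct."""


def handle_request(action, *args) -> dict:
    """Run an action, reporting user errors directly and logging the rest."""
    try:
        return {"ok": True, "result": action(*args)}
    except UserInputError as exc:
        # Anticipated: tell the user exactly what to fix.
        return {"ok": False, "message": str(exc)}
    except Exception:
        # Unanticipated: log the full traceback for support, show the user a reference.
        incident_id = uuid.uuid4().hex[:8]
        support_log.exception("Unanticipated error, incident %s", incident_id)
        return {"ok": False,
                "message": f"An unexpected error occurred. Reference: {incident_id}"}


# Hypothetical business function used to exercise both paths.
def parse_quantity(text: str) -> int:
    if not text.isdigit():
        raise UserInputError("Quantity must be a whole number.")
    return 100 // int(text)  # "0" triggers an unanticipated ZeroDivisionError


print(handle_request(parse_quantity, "abc"))
print(handle_request(parse_quantity, "4"))
print(handle_request(parse_quantity, "0"))
```

The user never sees a stack trace, and the support log always has one for the unanticipated case.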
Availability and performance monitoring
The role of monitoring is to validate that critical components and services are available to end users. Products from CA, Microsoft, IBM, AppDynamics, Wiley Systems, and others provide real-time performance and availability monitoring of enterprise applications.
Invoking the UI mirrors the user experience more closely, but services that respond to availability calls are a more elegant solution. I prefer creating a single “ping”-type service that invokes any critical services or objects that need to be monitored. Create a historical record of response times to detect performance degradation over time.
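A “ping”-type service with a response-time history could be sketched like this. The class `PingService`, its method names, and the degradation rule (recent average more than twice the earlier average) are all assumptions chosen for illustration; a real service would persist the history and expose `ping` over HTTP or a similar interface.

```python
import time
from statistics import mean
from typing import Callable, Dict, List


class PingService:
    """Invoke each critical check and keep a history of response times."""

    def __init__(self, checks: Dict[str, Callable[[], None]]):
        self.checks = checks
        self.history: Dict[str, List[float]] = {name: [] for name in checks}

    def ping(self) -> Dict[str, dict]:
        status = {}
        for name, check in self.checks.items():
            start = time.perf_counter()
            try:
                check()
                available = True
            except Exception:
                available = False
            elapsed = time.perf_counter() - start
            self.history[name].append(elapsed)
            status[name] = {"available": available, "seconds": elapsed}
        return status

    def is_degrading(self, name: str, recent: int = 5, factor: float = 2.0) -> bool:
        """True when the recent average exceeds `factor` times the earlier average."""
        samples = self.history[name]
        if len(samples) < 2 * recent:
            return False  # not enough history to judge
        baseline = mean(samples[:-recent])
        return mean(samples[-recent:]) > factor * baseline


# Hypothetical checks standing in for real service calls.
def database_check() -> None:
    pass  # stand-in for a trivial query


def report_check() -> None:
    raise TimeoutError("no response")


service = PingService({"database": database_check, "report service": report_check})
status = service.ping()
print(status)
```

A monitoring product can then call this one endpoint on a schedule instead of knowing about every critical component individually.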
Alerting
Alerting should be invoked whenever a critical application error is detected, or when the monitoring platform detects a critical performance or availability problem. TicketPro is an example of rule-driven software that monitors server logs for specific errors and opens trouble tickets automatically.
An alternative solution is to integrate with an existing alerting system within your enterprise. The error handling framework and monitoring software must be aware of this system and use it accordingly.
Care must be taken not to “cry wolf” with alerts that are not valid, and not to flood the alert system with multiple items for the same issue.
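One simple way to avoid flooding is to suppress repeat alerts for the same issue within a time window. The sketch below assumes a hypothetical `AlertThrottle` class and a caller-supplied `send` function; it addresses duplicates only, not the separate problem of deciding whether an alert is valid in the first place.

```python
import time
from typing import Callable, Dict, Optional


class AlertThrottle:
    """Forward the first alert for a key, then suppress repeats for a while."""

    def __init__(self, send: Callable[[str, str], None], window_seconds: float = 300.0):
        self.send = send
        self.window = window_seconds
        self._last_sent: Dict[str, float] = {}
        self.suppressed: Dict[str, int] = {}  # repeat counts, useful in the ticket

    def raise_alert(self, key: str, message: str, now: Optional[float] = None) -> bool:
        """Return True if the alert was sent, False if it was suppressed."""
        now = time.monotonic() if now is None else now
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window:
            self.suppressed[key] = self.suppressed.get(key, 0) + 1
            return False
        self._last_sent[key] = now
        self.send(key, message)
        return True


# Hypothetical sender that just collects alerts; a real one would page or open a ticket.
sent = []
throttle = AlertThrottle(lambda key, msg: sent.append((key, msg)), window_seconds=300.0)

throttle.raise_alert("db-down", "Database unreachable", now=0.0)    # sent
throttle.raise_alert("db-down", "Database unreachable", now=10.0)   # suppressed
throttle.raise_alert("db-down", "Database unreachable", now=400.0)  # sent again
print(len(sent), throttle.suppressed)
```

The key should identify the underlying issue (for example, the failing component) rather than the exact message text, so that variations of the same error collapse into one alert.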
Integration with trouble ticket processing
The application architecture can assist in resolving trouble tickets by objectively capturing the user’s perspective on the problem (screens, recent activity, data entry, etc.). Relying on another person’s observations can be frustrating, inaccurate, and inefficient. By providing a facility to capture user activity and software/hardware levels, the resolution time for tickets can be greatly reduced. BMC AppSight is an example of a product that logs user activity and all relevant software levels for problem determination.
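For teams building such a facility themselves, a bare-bones version might keep a rolling buffer of recent user actions and attach it, along with software levels, to the ticket. This sketch is not how AppSight or any other product works; `ActivityRecorder`, its methods, and the sample screens are hypothetical.

```python
import platform
from collections import deque
from datetime import datetime, timezone


class ActivityRecorder:
    """Keep the most recent user actions for attachment to a trouble ticket."""

    def __init__(self, max_events: int = 20):
        self.events = deque(maxlen=max_events)  # oldest entries drop off automatically

    def record(self, screen: str, action: str) -> None:
        self.events.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "screen": screen,
            "action": action,
        })

    def snapshot(self, app_version: str) -> dict:
        """Bundle recent activity with software levels."""
        return {
            "app_version": app_version,
            "python_version": platform.python_version(),
            "os": platform.platform(),
            "recent_activity": list(self.events),
        }


# Sample screens, actions, and version number are invented for illustration.
recorder = ActivityRecorder(max_events=2)
recorder.record("Login", "submitted credentials")
recorder.record("Orders", "opened order list")
recorder.record("Orders", "clicked Save")
ticket_attachment = recorder.snapshot(app_version="1.4.2")
print(ticket_attachment["recent_activity"])
```

Sensitive data entered by the user should be masked before it is recorded, since the snapshot may travel with the ticket to people who are not authorized to see it.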