Focusing your service-oriented architecture (SOA) performance management efforts only on the application testing stage, or attempting to manage production SOA applications solely with traditional point solutions and network management tools, is a recipe for disaster that puts your customers' satisfaction and your company's reputation on the line. Production SOA performance and service-level agreement (SLA) management must be built into the overall SOA strategy.
Service-oriented architecture and virtualization are hot topics in almost every publication, blog, website, and IT strategy meeting today. In reality both technologies have been around for some time, but they are finally achieving broad market adoption and playing a key role in improving efficiency, lowering development and operational costs, and contributing to higher service levels, increased customer satisfaction, and bottom-line results.
Since the beginning of the Information Technology (IT) era, newer and more innovative solutions have emerged at a phenomenal rate, but most organizations take a phased or evolutionary approach and test the water before widespread adoption. A recent survey of over 600 large enterprise organizations in North America, Europe, and the Pacific that have either already invested in SOA or are seriously considering it confirms that SOA is in fact becoming more mainstream and that adoption is accelerating (see Figure 1).
This article for the most part skips the "Why SOA?" discussion. The most frequently cited reasons for considering SOA include faster and more cost-effective application development, based on the assumption that once a particular service is defined, it can be reused by a number of business processes rather than being reinvented for every business process with a similar need. Reuse also helps enforce standards within an enterprise, because a particular service is always delivered or performed the same way. Faster application development and deployment can make organizations more agile in introducing new business process functionality in response to user needs and market or product dynamics.
Chances are, if you are reading this article, you already recognize the advantages of SOA, and you have either already deployed a SOA application or are seriously considering doing so and need to look more closely at SOA Application Performance Management (APM).
Govern, Manage, and Secure
One characteristic SOA shares with many new technologies is that while it simplifies the visible, user-facing process or service, it typically adds significant complexity to the underlying infrastructure. APM can be viewed as one of the three legs of the SOA infrastructure management stool. The other two are security management and governance. All three are needed to manage the SOA infrastructure and provide a high-performance, secure, and structured SOA environment.
Testing Versus Production
SOA performance management can be broken down into two major phases based on the application life cycle. The first phase, performance testing, occurs during the application development process. In this phase, load testing intended to simulate the production environment is used at various stages to ensure the application behaves as intended functionally and at acceptable performance levels. The final QA phase of the development cycle, sometimes using synthetic transactions, typically includes testing that places maximum stress on the application code in the test environment before the application is declared ready for day-to-day production. This level of testing is a given in most development organizations; few would consider releasing a mission-critical application without a final QA process that includes some level of performance testing. When the application exits the QA process and enters production, the focus on performance shifts from code path validation and stress testing to ongoing production performance management.
The second phase is the ongoing monitoring, reporting, and triage associated with the application once it is released to production. This ongoing activity is typically characterized as Application Performance Management. It is as important as the QA test activities, and perhaps more so, since poor performance in live production affects the users of application services and can be highly visible both internally and externally. Unfortunately, this is often the phase that is overlooked or given little attention during the initial SOA strategy planning process. Organizations may assume that the existing performance tools in place can handle the task. In addition, they may overlook the need for a common language and an agreed process to measure and report on performance in terms of Service Level Agreements that represent the performance commitment to the business unit.
The remainder of the discussion of SOA performance that follows focuses on considerations surrounding the complexity and dynamics of the production environment along with key considerations and challenges organizations must address to ensure success.
Managing performance is far more complex than producing historical reports on average response times and outages. Successful performance management must be based on real-time, end-to-end customer transaction monitoring and reporting across the entire SOA infrastructure. Historical data can, however, be a very valuable input to correlation and probable cause analysis. Successful performance monitoring should help IT operations better execute its new charter of being the business relationship manager between the end user and the business unit. It must provide a comprehensive view of the entire customer experience, with the ability to detect slow transaction times as they develop and to provide alerts and probable cause analysis. Armed with this information, IT operations can proactively address the issue before the end user is affected. Successful performance management also provides a common language and reference data for use across the infrastructure for problem triage. Ideally all of this is accomplished with minimum overhead and is based on actual customer transactions rather than simulated ones.
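As a rough illustration of detecting slow transaction times as they develop, the Python sketch below keeps a rolling window of recent response times and raises a warning before the average reaches the minimum acceptable level. The class name, thresholds, and window size are hypothetical and not taken from any particular APM product; a production tool would work from real end-to-end transaction data.

```python
from collections import deque
from statistics import mean


class ResponseTimeMonitor:
    """Rolling window over recent transaction response times, in seconds."""

    def __init__(self, warn_at, breach_at, window=5):
        self.warn_at = warn_at      # hypothetical early-warning threshold
        self.breach_at = breach_at  # hypothetical minimum acceptable level
        self.samples = deque(maxlen=window)  # oldest sample drops off automatically

    def record(self, seconds):
        """Add one observed transaction time and return the current status."""
        self.samples.append(seconds)
        average = mean(self.samples)
        if average >= self.breach_at:
            return "breach"
        if average >= self.warn_at:
            return "warning"
        return "ok"


# Illustrative data: response times that degrade gradually.
monitor = ResponseTimeMonitor(warn_at=1.0, breach_at=2.0, window=3)
statuses = [monitor.record(t) for t in [0.4, 0.5, 0.6, 1.2, 1.6, 2.4, 2.9]]
print(statuses)
```

In this sample run the status moves to "warning" two transactions before it reaches "breach", which is the window in which IT operations could act before the end user is affected.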
Service-Level Agreement Management
Many organizations establish a contract in the form of an SLA to define both target and minimum acceptable performance levels for an application. Frequently people think of SLAs when discussing external user-facing applications in a business-to-consumer (B2C) environment, such as retail sales, online banking, or other self-service portals. In fact SLAs should be part of the DNA of all applications. External customer or supplier SLAs may be more stringent, but an enterprise's own employees need predictable and reliable internal application performance as well. While external-facing application failures or slowdowns can be costly in terms of lost revenue or damaged reputation, poor internal-facing application performance can also be costly in terms of lost productivity and employee morale.
Much like the distinction between different SLA targets for external versus internal users, SLAs can be established at various points in the business process based on factors such as the nature of the transaction or the specific user. A brokerage firm, for example, would likely place the highest service-level priority on a specific business process such as executing a trade. A delay in any step of a trade transaction could make a significant difference in the amount of money saved or lost based on the overall transaction time. The SLA for that particular process therefore would be set at a very demanding level. On the other hand, the service level required for a less-critical function like printing a portfolio summary would likely be set at a less stringent level. Similarly, that same brokerage firm may choose to set a higher service level for a very large customer executing frequent or high-volume, high-value trades than for an occasional user executing smaller or infrequent trades. While both users have high expectations for fast, reliable service, one may have a higher business value to the enterprise.
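The brokerage example can be sketched in a few lines of Python. Each SLA carries a target and a minimum acceptable response time, and the SLA that applies is looked up by business process and customer tier. The process names, tier names, and timing values here are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sla:
    target_s: float   # target response time, in seconds
    minimum_s: float  # minimum acceptable (slowest allowed) response time


# Hypothetical SLA table keyed by (business process, customer tier).
SLAS = {
    ("execute_trade", "premium"): Sla(target_s=0.5, minimum_s=1.0),
    ("execute_trade", "standard"): Sla(target_s=1.0, minimum_s=2.0),
    ("print_portfolio_summary", "premium"): Sla(target_s=5.0, minimum_s=10.0),
    ("print_portfolio_summary", "standard"): Sla(target_s=5.0, minimum_s=15.0),
}


def evaluate(process, tier, observed_s):
    """Classify one observed response time against the applicable SLA."""
    sla = SLAS[(process, tier)]
    if observed_s <= sla.target_s:
        return "met target"
    if observed_s <= sla.minimum_s:
        return "within minimum"
    return "violated"


# The same 1.5-second response means different things in different contexts.
for key in [("execute_trade", "premium"),
            ("execute_trade", "standard"),
            ("print_portfolio_summary", "standard")]:
    print(key, evaluate(*key, observed_s=1.5))
```

The same observed time is a violation for a premium customer's trade, acceptable for a standard customer's trade, and comfortably on target for a portfolio printout, which is why a single enterprise-wide threshold is rarely enough.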
Another important aspect of APM in general, and of SLA management specifically, is a common language and measurement for the SLA that reflects the actual customer experience and can be understood by both IT and the business process owner. While IT needs much more detailed technical information for problem identification and avoidance, this level of detail is of little value or interest at the business unit level. The tools used for SLA management should therefore be capable of producing useful and meaningful information for both the technologist and the business process owner.
SOA Environment Complexity
Heterogeneity is a way of life in today's IT environment. Solutions span multiple hardware and software platforms provided by multiple vendors. These solutions may be geographically distributed and loosely coupled but, at the same time, highly interdependent. The management of these environments therefore must have visibility into the entire infrastructure that drives the application.
Figure 2 represents a somewhat simplified SOA infrastructure; a typical SOA environment would likely have multiples of many of the elements shown. It's included here to highlight the number and variety of infrastructure elements that a typical transaction would span. Any one of these elements could fail completely or cause a slowdown for a given application or transaction. When this happens, rapid triage is required to identify the failing component and direct corrective resources to that area before a slowdown results in a failure. This reinforces the need for APM solutions that provide visibility into the entire transaction path, and it highlights why silo-specific monitoring alone is not sufficient for rapid problem identification and resolution. Talk with anyone who has worked in an environment where every support organization gets a call when a problem is detected or reported, and you will find that valuable time is lost to finger-pointing and passing the buck. The more effective solutions provide a view of the total environment with a probable cause indicator that helps pinpoint the failing component.
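One simple way to picture a probable cause indicator is to compare the time a transaction spends in each element of its path against that element's normal baseline and flag the one that has slowed down the most. The Python sketch below does this with a plain ratio; the component names and timings are hypothetical, and real APM products use considerably richer correlation than this.

```python
def probable_cause(hop_times, baselines):
    """Return (component, slowdown factor) for the hop furthest above its baseline."""
    ratios = {name: hop_times[name] / baselines[name] for name in hop_times}
    worst = max(ratios, key=ratios.get)
    return worst, ratios[worst]


# Hypothetical per-component times (seconds) for one end-to-end transaction.
baselines = {"web_server": 0.05, "j2ee_service": 0.10,
             "dotnet_service": 1.00, "database": 0.15}
observed = {"web_server": 0.06, "j2ee_service": 0.11,
            "dotnet_service": 1.10, "database": 0.90}

component, factor = probable_cause(observed, baselines)
print(f"Probable cause: {component} ({factor:.1f}x its normal time)")
```

Comparing against a baseline rather than looking at raw times matters here: the .NET service accounts for the most time in absolute terms, but it is behaving normally, while the database is running at roughly six times its usual time. A view limited to one silo would not show that contrast.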
It's Not Just Java
Another variable that comes into play as SOA becomes more mainstream is that .NET has emerged as an enterprise platform for these complex SOA environments. This now means that it's highly likely that a given transaction in its path through the SOA universe will span both J2EE and .NET platforms.
Keep in mind that the discussion is about managing a SOA, not managing parts of a SOA environment depending on which path the transaction takes. Ideally in this cross-platform environment, the enterprise will use the same toolset to monitor and manage the entire infrastructure. Using two different tools depending on the platform, even if they are from the same supplier, is not likely to deliver consistent results. You wouldn't start a road trip in unfamiliar territory by trying to paste together multiple maps using different scales, in different languages, and expect to get to your destination in the fastest time with the fewest roadblocks. Your SOA performance management solution should facilitate using the same language and the same metrics regardless of the platform the transaction path takes.
Evolution and Growth
While SOA is clearly a main street phenomenon today, many organizations are still in various stages of evolution with a broad range of legacy and distributed applications, Web services, and emerging SOAs. As you look at various methodologies and solutions available for APM, a little extra effort looking at the portability and scalability of your APM solution can provide significant payback. You should take into consideration a solution that adds minimum overhead, works in your current environment, and will grow, scale, and adapt as your environment changes. Talented help desk and support staff are difficult to find, train, and retain. APM solutions that provide views that can be used and understood by the novice while supplying valuable detailed information to the technical experts can help maximize your investment in personnel. Likewise solutions that grow with your evolving infrastructure provide consistency in operational practices regardless of where you are on the road to SOA.
Where to From Here?
SOA infrastructures have clearly matured and are at various stages of delivering on the promise of cost savings, efficiency, and business results. The importance of managing the user experience is paramount if your SOA-based business process is to be successful. Managing the user experience includes making sure you understand the user, the business process, and how well you serve that user through the business process. Application Performance Management is one of the three key infrastructure management tools you will need to successfully manage that user experience.