Improving J2EE Application Performance

This article describes how to achieve a high level of performance in a J2EE application, independent of which Application Server you use.

Introduction

You delivered the application to the users, and they are now using it every day. Early feedback indicates that its performance does not meet the requirements. The service level agreement called for a maximum response time of five seconds per request. "Inconceivable!" After all, you tried the application yourself and found that the response time always met this metric. Where did you go wrong, and what can be done about it?

This article describes a structured approach to improving performance that ranges from broad strokes (monitoring J2EE Application Server resource usage) to fine strokes (finding bottlenecks in the application). After reading this article you will find that performance isn't just an occasional activity; performance is a state of mind.


Choose your performance goals

It is always best to choose the project goals before development starts. Choosing project goals up front makes trade-off decisions easier during the development cycle. The goals should include a general statement of the type of performance required from the application. The possible performance goals vary with the type of application that is being built.

If you are building an n-tier application that must support clients over slow connections (dial-up modems), you may employ techniques such as packing as much information as possible into each TCP/IP packet. Another optimization is to minimize the number of round trips made between tiers; this has been one of the performance goals for Oracle database servers. What does this have to do with J2EE performance? Performance goals usually involve trade-offs between time and space. Oracle chose to minimize the time it takes to perform database operations, even in wide area network environments. This came at the expense of additional memory (space), because operations and data were bundled on the client machine.

In most applications, performance depends on several factors. First, the performance goals need to be agreed on. Do you want the application to scale to handle a large number of simultaneous users? Do you want the application to offer a fast response time to a small number of users? Keep your general performance goal in mind as you design and develop. Knowing these goals in the design phase lets you incorporate high-performance techniques in your implementation (choice of algorithms) and coding (avoiding synchronization). Later, you will choose what to measure based on your performance goal choice(s). Once the functionality of the application is working, you can begin the process of improving performance, either at the unit level or at the whole-program level.

If you want the application to scale to handle a large number of simultaneous users, it is important to minimize the use of shared memory that may be updated (Java object instances that are both read and written) so that client requests can run concurrently without waiting for a synchronization lock. If your goal is the highest per-user performance, you will want to cache data to minimize lookup time, although this can reduce scalability, because users have to wait for the synchronization lock that guards the shared cache. A common alternative is a reader/writer lock that allows many requests to read the cache at once (all readers share one synchronization lock), at the expense of starving writers, who may rarely gain access to the lock during peak usage. The reader/writer lock technique works well when there are more readers than writers. Although the reader/writer lock can be a middle ground between the opposing requirements, other tricks can achieve both goals. If memory is abundant, you can cache data in memory that is not shared between users. Each user gets a private cache that reduces the expense of data lookup, at the cost of additional per-user memory consumption. This technique is known as "zero shared memory" optimization. The set of coding techniques that can be employed to achieve high performance continues to grow (see the J2EE Patterns Repository).
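The "zero shared memory" idea can be pictured as a cache that each request-handling thread owns privately, so lookups never contend for a lock. The following is a minimal sketch under stated assumptions, not a definitive implementation: it assumes Java 8 or later, uses `ThreadLocal` as a stand-in for "per user", and the class name `PerThreadCache` and its `loader` function are hypothetical names introduced only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: one private cache per thread, so reads and writes need no lock.
class PerThreadCache<K, V> {
    // Each thread lazily gets its own unsynchronized HashMap.
    private final ThreadLocal<Map<K, V>> local = ThreadLocal.withInitial(HashMap::new);
    // The expensive lookup being cached (for example, a database query).
    private final Function<K, V> loader;

    PerThreadCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value; the loader runs only on this thread's first request for the key.
    V get(K key) {
        return local.get().computeIfAbsent(key, loader);
    }

    // Number of entries held by the calling thread's private cache.
    int sizeForCurrentThread() {
        return local.get().size();
    }
}
```

The trade-off matches the one described above: every thread pays the lookup cost once and holds its own copy of the data, so memory use grows with the number of threads, and the sketch does nothing to invalidate stale entries.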


Follow a structured approach to improving performance

Dangers and common pitfalls
  • Don't start with the wrong steps (fix memory leaks before improving execution speed).
  • Design for performance, but don't rewrite every line of code that looks inefficient (measure, don't guess).
  • Don't cache data unless you know how and when to invalidate the cached entries.
  • When things go wrong, avoid pointing fingers at others; take responsibility for the problem jointly.
Choose your goal: high per-user performance, scalability, or both
  • High per-user performance can be obtained by using data caching techniques.
  • High scalability can be obtained by minimizing shared memory.
  • The middle ground is data caching with priority given to readers over writers (reader/writer lock).
  • "Zero shared memory" optimization caches data per user in memory-rich environments.
Follow a step-by-step approach for improving performance
  • Choose performance goal(s) and design for them.
  • Fix bugs that prevent application from operating correctly.
  • Eliminate memory leaks.
  • Build test environment.
  • Build client load test for simulating production clients.
  • Measure performance with single client load.
  • Revise code to improve performance for the single-user execution path.
  • Eliminate bottlenecks that the application hits under heavy load conditions.
  • Measure performance under heavy load conditions.
  • Revise code to improve performance under heavy load conditions.
  • Repeat measure and revise steps until performance goal(s) are met.

It is exciting to make changes to the application that significantly improve performance. However, it is not exciting to make the wrong performance changes. It is very easy to make changes that theoretically should improve performance but never seem to make a difference. If you take only one thing away from this article, it should be that a structured methodical approach should be taken to improving performance. It is important to follow the steps outlined below in order.

Step one: You must make the application function correctly and solve any memory leaks. If you measure the performance of an application that has memory leaks, the results will be misleading, because the time to allocate additional memory will be included in the results. Also, the application will run slower as more memory is leaked over time. Read "Eliminating memory leaks" for helpful hints on resolving memory leaks.
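Before reaching for a dedicated tool, a rough first check for a leak is to repeat the same workload and watch whether the heap in use keeps climbing. The sketch below is only an illustration under stated assumptions: `HeapSampler` is a hypothetical helper, the readings are approximate, and `System.gc()` is a hint that the JVM is free to ignore.

```java
// Hypothetical helper for spotting steadily growing heap usage across repeated runs of a workload.
class HeapSampler {
    // Approximate bytes in use after asking the JVM to collect garbage (System.gc() is only a hint).
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Runs the workload `rounds` times and records the used heap after each round.
    static long[] sample(Runnable workload, int rounds) {
        long[] used = new long[rounds];
        for (int i = 0; i < rounds; i++) {
            workload.run();
            used[i] = usedBytes();
        }
        return used;
    }
}
```

A series that keeps rising over many rounds suggests that something is being retained; confirming the cause still requires a profiler or a heap analysis tool.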

Step two: You should build a test environment that resembles your deployment environment. If the production application will access the database server over a dial-up modem connection or a high-speed network connection, you should do the same in your test environment. A good test environment helps ensure that you see the same performance characteristics that your customer will see.

Step three: You will need a way to simulate the clients accessing the application so that you will be able to measure for your chosen performance goal(s). Some of the possible metrics to choose from are:

  1. Max response time under heavy load.

  2. CPU utilization under heavy load.

  3. How the application scales as additional users are added.

You may purchase a tool for putting a load on the application or you can create such a tool yourself. The client load test program is critical to ensuring that the application can handle the required performance levels. You will use the client load test repeatedly throughout the performance improvement phase. You will want the ability to vary the number of users simulated by the load test.
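If you build the load test yourself, its core can be quite small. The following sketch, assuming Java 8 or later, runs a configurable number of simulated users in parallel and records the slowest response. `SimpleLoadTest`, `Result`, and the `request` parameter are hypothetical names; in a real test the `Runnable` would issue an actual request to the application.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of a client load test with a configurable number of simulated users.
class SimpleLoadTest {
    // Outcome of one run: total requests issued and the slowest response observed.
    static final class Result {
        final int requests;
        final long maxResponseNanos;

        Result(int requests, long maxResponseNanos) {
            this.requests = requests;
            this.maxResponseNanos = maxResponseNanos;
        }
    }

    // Runs `request` requestsPerUser times on each of `users` threads and records the worst latency.
    static Result run(int users, int requestsPerUser, Runnable request) {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int u = 0; u < users; u++) {
                futures.add(pool.submit(() -> {
                    long worst = 0;
                    for (int i = 0; i < requestsPerUser; i++) {
                        long start = System.nanoTime();
                        request.run();
                        worst = Math.max(worst, System.nanoTime() - start);
                    }
                    return worst;
                }));
            }
            long max = 0;
            for (Future<Long> f : futures) {
                max = Math.max(max, f.get()); // waits for that simulated user to finish
            }
            return new Result(users * requestsPerUser, max);
        } catch (Exception e) {
            throw new IllegalStateException("load test failed", e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Calling `run` with one user gives the single-client baseline; raising `users` shows how the maximum response time changes as load is added, which can then be compared with the service level agreement.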

Step four: Discover the specific execution code paths that need to be changed to improve performance for a single user. For this step you could use a profiler tool that measures the performance of the code in the application. While running the load test with a single user, measure the performance of the application. Then identify the execution paths in the application that consume the most time. The idea here is to find the code that consumes the most time in the application and improve it to be faster. This is where you will want to apply the performance improvement techniques described very well in the patterns section and in the "Tips on Performance Testing and Optimization" article on TheServerSide.com. Make one code change at a time and repeat the tests to determine if the change improved the performance. You will want to repeat step four until performance meets the end-user requirements.
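When a profiler is not at hand, a small timer can at least confirm whether one change made one execution path faster. This is a rough sketch rather than a substitute for a profiler; `PathTimer` is a hypothetical helper, and the warm-up loop is an assumption made so that JIT compilation does not dominate the measurement.

```java
// Hypothetical micro-timer for comparing one code path before and after a single change.
class PathTimer {
    // Runs the path `warmup` times untimed, then returns the mean nanoseconds per call over `runs` calls.
    static long averageNanos(Runnable path, int warmup, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmup; i++) {
            path.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            path.run();
        }
        return (System.nanoTime() - start) / runs;
    }
}
```

Measure the path, make a single change, and measure again with the same arguments; keep the change only if the average improves.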

Step five: The next step is to identify the major bottlenecks that need to be eliminated to improve performance. Many Java virtual machines have a "show me the state of the Java process" feature built in. This mechanism shows you a snapshot of each Java thread's stack trace. The stack trace shows you what line of Java code is executing in each thread and which method called it (read "An Introduction to Java Stack Traces" to learn more about reading a Java stack trace).

The common technique to obtain the application stack trace is to send a "kill -3" signal to the Application Server Java process (the ctrl+break key sequence handles this under Microsoft Windows). What you want to do is start the client load test and check to see if any client requests are blocked by a bottleneck in the application. You will capture snapshots of the application stack traces while the client load test is submitting requests. What you will want to see in the stack traces is that each client request is performing work on behalf of the client. What you are likely to see the first time is that many of the client requests are waiting to get a synchronization lock.
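The same kind of snapshot can be approximated from inside the JVM, which is convenient when sending a signal to the server process is not practical. The sketch below assumes Java 5 or later, where `Thread.getAllStackTraces()` is available; `ThreadSnapshot` is a hypothetical helper, and its output format is not the same as a real JVM thread dump.

```java
import java.util.Map;

// Hypothetical helper: an in-process approximation of the "kill -3" thread dump.
class ThreadSnapshot {
    // Formats every live thread's name, state, and stack frames.
    static String dump() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            out.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    // Counts threads currently waiting to enter a synchronized block or method.
    static int countBlocked() {
        int blocked = 0;
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getState() == Thread.State.BLOCKED) {
                blocked++;
            }
        }
        return blocked;
    }
}
```

Taking several snapshots during the load test and finding many threads in the `BLOCKED` state at the same stack frame is the pattern that points to a synchronization bottleneck.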

For example, the bottleneck might be a synchronized "Hashtable" that contains currency exchange rate information for the current day. Assume that we have written a stock portfolio management application that shows the real-time valuation. The application reads the exchange rate from the "Hashtable" as it converts each holding to the configured currency for display. You can design the application with the fastest algorithms, but a bottleneck like this synchronized "Hashtable" of exchange rates will prevent your application from scaling well. One solution is a reader/writer lock that lets all threads share the read lock on an unsynchronized "HashMap" of exchange rates. When the exchange rates change and need to be updated, an exclusive write lock must be obtained. Read "Implementing a Data Cache using Readers And Writers" to learn about implementing the reader/writer lock solution. You may also find that the Application Server code is the bottleneck; in that case, report it to the vendor along with the thread stack trace information so that the problem can be fixed. Fix any bottlenecks that you discover and keep repeating step five until the bottlenecks are eliminated.
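The exchange rate example can be sketched with the reader/writer lock from `java.util.concurrent.locks`. This assumes Java 7 or later for the syntax used here, and it is not necessarily how the referenced article implements the lock; `ExchangeRateCache` and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: an unsynchronized HashMap of exchange rates guarded by a reader/writer lock.
class ExchangeRateCache {
    private final Map<String, Double> rates = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Many threads may hold the read lock at the same time, so lookups do not block each other.
    Double getRate(String currency) {
        lock.readLock().lock();
        try {
            return rates.get(currency);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Replaces all rates under the exclusive write lock; readers wait only while this runs.
    void updateRates(Map<String, Double> newRates) {
        lock.writeLock().lock();
        try {
            rates.clear();
            rates.putAll(newRates);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Because rate updates are rare compared with reads in this scenario, the time spent under the write lock is small, and request threads converting holdings can proceed in parallel the rest of the time.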

Step six: Run the application under the client load test created in step three. Now measure your application performance to see if the desired goals have been met. Many of the commercially available client load testing tools can measure the performance of the application under specific loads. If the application does not scale to a large number of users, repeat step five to see if you missed a bottleneck. If the application response time is not fast enough, repeat step four to improve the execution path performance further. You can also try to profile the application while it is running under the heavy client load, but be aware that the profiler may not always be able to handle it, as it may run out of memory.


Summary

In this article, we discussed what can be done to improve application performance. Many development projects concentrate on correct functionality first and good performance last. You should design for good performance at the start of a new project and make sure you do your performance tuning before the application is delivered to the customer. Define your performance goals, build a structured plan, and follow the steps outlined here. Performance tuning is mostly a science of careful measurement of the execution paths through the application. Improve the performance of the heavily traveled paths through the application and you will meet the service level agreement requirements.


About the author

Scott Marlow is a manager in the SilverStream eXtend Application Server development group. Scott has worked in the Application Server group at SilverStream for the past three years, contributing to the development of the Application Server. Scott helped improve the performance of the SilverStream Application Server and helped customers with application performance problems. Previously he developed Systems Management Solutions, Application Servers, Client/Server development tools and Database Servers. Contact info: [email protected]
