Weblogic JDBC Connection Pool misconfiguration and how it can cause serious performance problems

Part 1 of 2

Connection Pool (mis)configuration has long been a pet peeve of mine, and we have only recently changed our standards after a series of serious performance problems on our main production databases. In my opinion this is so important that I will post two blogs on this. This post will be about configuration parameters to minimize connections and disconnections. The second post will be about how high concurrency can lead to such severe degradation that the only way to recover is to shut down the database.

This post is going to focus on why a high frequency of connections and disconnections is bad for your Oracle database. There was a 9i Oracle documentation article called Top Ten Mistakes Found in Oracle Systems and the first and foremost point was “Bad Connection Management” where they stated:
“The application connects and disconnects for each database interaction. This problem is common with stateless middleware in application servers. It has over two orders of magnitude impact on performance, and it is totally unscalable.”

So over ten years ago, Oracle had the vision to alert its DBA’s about this, and people chose to ignore this prophetic advice at their own peril. The middleware in this case is Weblogic, and setting up JDBC connection pools in a dynamic manner leads to a high frequency of connections and disconnections. Two orders of magnitude means that performance could be affected by over a hundred-fold, and this is obviously unscalable. What this means is that if you expect your applications to scale with data growth over time, without performance degradation, they will not. Usually it does not matter how much hardware you throw at a legacy scalability problem – it will never be powerful enough because you will just reach your bottleneck more quickly. It’s not only the data that changes, it’s the number of users, the complexity and interaction of systems, and competition for resources on a server with finite resources.

Why should repetitive connections and connections be such a bad thing for the database? This is a question that has puzzled architects and developers for years, and some still don’t believe that it is bad. Let’s look at the amount of work the Oracle RDBMS and the server have to do for each connection. A new server process has to be created, which takes CPU and at least 20M of memory for 11g. The PGA has to be grown to possibly accommodate for future work areas. Oracle then has to check its dictionary and dependancy tree semantically and syntactically for user privileges, which has a large CPU cost. Just this check can comprise of scores of dictionary reads with its associated parses, executes and fetches. After all of this work, then there is the matter of executing logon triggers, if any, and then all the overhead of finding space for that new session’s SQL in the SGA, and all future DML of the session.

If you have the space for keeping a slightly higher number of session s connected, it is really far better to do so.  It is good practice to acquire a connection, do the maximum work needed for that connection, then return that connection to the pool as soon as possible. Having a connection pool create a new session, do some work, then disconnect (shrink the pool) is the worst practice.

So, for many years I had this argument with architects and developers, and it was eventually agreed to keep connection pools static. This link states that an easy way to boost performance is to keep INIT and MAX the same. In this way they could never grow or shrink right? Wrong. If you were in a situation where the application server thought a connection was down, but Oracle though it was up, the setting of MIN and MAX to the same value is not going to prevent a flood of connection attempts when oracle becomes available. Weblogic tests connections periodically and if the result does not return timeously, Weblogic will create a new connection regardless and remove the non-functioning one from the pool. So we decided it was better to set the Shrink Frequency (Harvest Interval) to a high value like 30 minutes, so that the frequency of disconnections will not be a problem.

So these are the values we proposed in our new standards, after many discussions and meetings with architects:

  • Select the MAX (Maximum Capacity)
  • Set the MIN (Minimum Capacity) to be at least half (50%) of MAX so that the connection pool cannot shrink below this value
  • Set the INIT (Initial Capacity) to be at least quarter (25%) of MAX. This is the initial number of connections created when the application server is started
  • Set the Shrink Frequency (Harvest Interval) to 1800 (30 minutes). This setting is critical in managing the frequency of disconnections.
  • Set Connection Creation Retry Frequency to 120 (2 minutes)
  • Set Test Frequency to 120 (2 minutes). This is to poll the database to test the connection. This defaults to “SELECT 1 FROM DUAL” which bypasses the buffer cache so it’s not bad.
  • Set Seconds to trust an Idle Pool Connection to 10.
  • Cache for prepared statements must be set to 20 or higher. Setting this higher should be discussed with your DBA. The total number of cached cursors would be the cache setting * number of connections in pool * number of servers in cluster. This should be set very carefully. That calulation equates to the number of open cursors allowed per session, so if it is set too low then cursors are repeatedly closed and opened. This leads to excessive hard parsing of SQL statements which can also lead to performance degradation. In fact “Bad Use of Cursors” is the number 2 point on the “Top Ten Mistakes” list, and it says that this has an order of magnitude impact in performance, and is totally unscalable.

I am not saying that this is necessarily the most optimum way of doing things. In your situtation, setting your pools to static might be the easiest and best implementation method. What I am saying is be very careful of repetitive Oracle session connections and disconnections and make sure you that use your existing CP sessions as efficiently as possible.

In Part 2 I will discuss the more complicated topic of high concurrency where I will dive into hairy internals like latches and mutexes. Happy computing and thanks for reading!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s