Friday, May 8, 2009

SOA/BPM performance best practices (System and subsystem configuration)

This section covers some scalability-related tuning knobs and relevant tuning parameters for the subsystems involved, such as the JVMs and the databases.

Clustering topologies:
To accommodate growth and distribute workload, modern business process engines can run in a clustered setup that spreads the workload across several physical nodes (horizontal scaling) or makes better use of spare resources within existing nodes (vertical scaling).
For WPS, three different cluster topology patterns have been identified and described, e.g. in http://www-01.ibm.com/support/docview.wss?uid=swg27010320&aid=1 or http://www.ibm.com/developerworks/websphere/library/techarticles/0703_redlin/0703_redlin.html .
The first pattern (shown on the left) is also known as the “bronze topology”. It consists of a single application server cluster in which the WPS business applications, the support applications like CEI and BPC Explorer, and the messaging infrastructure hosting the messaging engines (MEs) that form the system integration buses (SIBus) all reside within each of the application servers that form the cluster.
This bronze topology is suitable for a solution that comprises only synchronous web services and synchronous SCA invocations, preferably with short running flows only.

The second pattern (shown in the middle) is also known as the “silver topology”. It has two clusters, the first one containing the WPS business applications and the support applications as before, but the messaging infrastructure is located in the second cluster.
This silver topology is suitable for a solution that uses long running processes but needs neither CEI, nor message sequencing, nor asynchronous deferred response, nor JMS or MQ bindings.

The third pattern (shown on the right) is called the “golden topology”. Compared to the previous patterns, the support applications are separated into a third cluster.
This golden topology is suited for all the remaining cases, where asynchronous processing plays a nontrivial role in the solution. It also provides the most “JVM space” for the business process applications that should run in this environment. If the available hardware resources allow for setting up this golden topology, then it is advisable to start with this topology pattern from the very beginning as it is the most versatile one.
What is not shown in the above figure is the management infrastructure that controls the cluster(s). It consists of node agents and a deployment manager node, which is the central point of administration for the entire cell that these clusters belong to. A tuning tip for this management infrastructure is to turn off automatic synchronization of the node configurations. Depending on the complexity of the setup, this synchronization is better kicked off manually during defined maintenance windows at off-peak times.

JVM Garbage Collection
Verbose garbage collection is not as verbose as the name suggests. The few lines of information that are produced when verboseGC is turned on don't really hurt the system's performance. On the other hand, they can be a very helpful source of information when troubleshooting performance problems.
The JVM used by WPS V6.1 supports several garbage collection strategies: the Throughput Garbage Collector (TGC), the Low Pause Garbage Collector (LPGC), and the Generational Garbage Collector (GGC).
The TGC provides the best out-of-box throughput for applications running on a JVM by minimizing costs of the garbage collector. However, it has “stop-the-world” phases that can take between 100ms and multiple seconds during garbage collection.
The LPGC provides garbage collection in parallel to the JVM’s work. Due to increased synchronization costs, throughput decreases. If response time is more important than highest possible throughput, this garbage collector could be a good choice.
The GGC is new in the IBM 1.5 JVM. It is well suited for applications that produce a lot of short-lived small Java objects. As it reduces pause times it should be tried in such cases instead of the TGC or LPGC. When properly tuned, it provides the best garbage collection performance for SOA/BPM workloads. [http://www.redbooks.ibm.com/abstracts/redp4431.html]
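As a minimal sketch, the three collectors can be mapped to the -Xgcpolicy options of the IBM JVM and combined with verbose GC. The function name and the short keys (TGC, LPGC, GGC) are illustrative choices taken from the abbreviations above; the resulting string would typically be entered as generic JVM arguments of the application server, and the option names should be checked against the documentation of the JVM level in use.

```python
# Mapping of the collectors described above to IBM JVM -Xgcpolicy options.
# The short keys are this post's abbreviations, not official option names.
GC_POLICY_FLAGS = {
    "TGC": "-Xgcpolicy:optthruput",    # throughput collector
    "LPGC": "-Xgcpolicy:optavgpause",  # low pause collector
    "GGC": "-Xgcpolicy:gencon",        # generational collector
}


def jvm_gc_arguments(collector, verbose_gc=True):
    """Build the JVM argument string for the chosen collector."""
    args = [GC_POLICY_FLAGS[collector]]
    if verbose_gc:
        # Verbose GC output is cheap and useful for troubleshooting.
        args.append("-verbose:gc")
    return " ".join(args)


print(jvm_gc_arguments("GGC"))  # prints "-Xgcpolicy:gencon -verbose:gc"
```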

JVM memory considerations
Increasing the heap size of the application server's JVM can improve the throughput of business processes. However, make sure that enough real memory is available so that the operating system does not start swapping. Detailed information on JVM parameter tuning can be found in [http://www.ibm.com/developerworks/java/jdk/diagnosis/].
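The memory check can be sketched as a simple budget calculation. Everything in this example is an assumption for illustration: the function name, the 25% allowance for native (non-heap) JVM memory, and the sample numbers. Real values have to come from measurements on the actual system.

```python
def heaps_fit_in_memory(physical_mb, max_heaps_mb, reserved_mb,
                        native_overhead_factor=1.25):
    """Return True if all JVMs plus other consumers fit into real memory.

    physical_mb            -- real memory of the node
    max_heaps_mb           -- maximum heap size (-Xmx) of each JVM on the node
    reserved_mb            -- memory set aside for the OS, database, etc.
    native_overhead_factor -- assumed multiplier for non-heap JVM memory
    """
    jvm_total = sum(heap * native_overhead_factor for heap in max_heaps_mb)
    return jvm_total + reserved_mb <= physical_mb


# Three cluster members on a node with 8 GB of real memory.
print(heaps_fit_in_memory(8192, [1536, 1536, 1024], reserved_mb=2048))
print(heaps_fit_in_memory(8192, [2048, 2048, 2048], reserved_mb=2048))
```

The first call fits the budget and the second does not, which signals that larger heaps on this node would risk swapping.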

Database subsystem tuning
To a large degree, the performance of long running flows and/or human tasks in a SOA/BPM solution depends on a properly tuned, enterprise-class database management system, in addition to the aforementioned application server tuning. This paper provides some tuning guidelines for IBM's DB2 database system as an example. Most of the rules should also be applicable to other production database management systems.
It is not advisable to use simple file-based databases like Cloudscape or Derby as a database management system for WPS for anything other than unit testing.

Configuration advisor
DB2 comes with a built-in configuration advisor. After creating the database, the advisor can be used to configure the database for the usage scenario expected. The input for the Configuration Advisor depends on the actual system environment, load assumptions, etc. Details on how to use this advisor can be found in [http://www-01.ibm.com/support/docview.wss?uid=swg27012639&aid=1]. Some parameter settings in the output of the advisor should be checked and adjusted afterwards.

MINCOMMIT: A value of ‘1’ is strongly recommended. The advisor sometimes suggests other values.
NUM_IOSERVERS: The value should equal the number of physical disks the database resides on, plus 2.
NUM_IOCLEANERS: Especially on multi-processor machines, enough IO cleaners should be available to make sure that dirty pages in the bufferpool are written to disk. Provide at least one IO cleaner per processor.
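The three rules above can be turned into a DB2 command line with a small helper. This is only a sketch: the function name and the database name BPEDB are placeholders, and NUM_IOCLEANERS is set to the minimum of one cleaner per processor.

```python
def db2_cfg_command(db_name, physical_disks, processors):
    """Build an 'update db cfg' command from the rules of thumb above."""
    settings = {
        "MINCOMMIT": 1,                       # strongly recommended value
        "NUM_IOSERVERS": physical_disks + 2,  # physical disks plus 2
        "NUM_IOCLEANERS": processors,         # at least one per processor
    }
    pairs = " ".join(f"{name} {value}" for name, value in settings.items())
    return f"db2 update db cfg for {db_name} using {pairs}"


# BPEDB is a placeholder database name.
print(db2_cfg_command("BPEDB", physical_disks=4, processors=8))
# prints "db2 update db cfg for BPEDB using MINCOMMIT 1 NUM_IOSERVERS 6 NUM_IOCLEANERS 8"
```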

Database statistics
Optimal database performance requires the database optimizer to do its job well. The optimizer bases its decisions on statistical data about the number of rows in a table, the space used by a table or index, and other information. When the system is set up, these statistics are empty. As a consequence, the optimizer usually makes sub-optimal decisions, leading to poor performance.
Therefore after initially putting load on your system, or whenever the data volume in the database changes significantly, you should update the statistics by running the RUNSTATS utility (DB2). Make sure there is sufficient data (> 2000 process instances) in the database before you run RUNSTATS. Avoid running RUNSTATS on an empty database as this will lead to bad performance.
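A small guard can make sure that RUNSTATS is only generated once enough data is present. The sketch below assumes that the number of process instances has already been counted; the schema and table names in the usage example are placeholders, and the RUNSTATS options shown are one common choice, not a prescribed setting.

```python
MIN_PROCESS_INSTANCES = 2000  # threshold taken from the guideline above


def runstats_commands(instance_count, schema, tables):
    """Return RUNSTATS statements, or nothing if the database is too empty."""
    if instance_count <= MIN_PROCESS_INSTANCES:
        # Statistics gathered on (nearly) empty tables mislead the optimizer.
        return []
    return [
        f"RUNSTATS ON TABLE {schema}.{table} "
        "WITH DISTRIBUTION AND DETAILED INDEXES ALL"
        for table in tables
    ]


# Schema and table names are placeholders for illustration only.
for statement in runstats_commands(
        5000, "BPEUSER", ["PROCESS_INSTANCE_B_T", "ACTIVITY_INSTANCE_B_T"]):
    print(statement)
```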

Enable Re-Optimization
If BPC API queries (as used, for example, by the BPC Explorer) are run regularly on your system, it is recommended to allow the database to re-optimize SQL queries once, as described at [http://www-01.ibm.com/support/docview.wss?rs=2307&uid=swg21299450]. This tuning step greatly improves the response times of BPC API queries. In lab tests, the response time for one and the same query was reduced from over 20 seconds to 300 milliseconds. With improvements of this order of magnitude, the additional overhead for re-optimizing SQL queries should be affordable.

Database indexes
In most cases, the BPM product's datastore does not come with every database index that might potentially be used. To avoid unnecessary processing out of the box, it is much more likely that only those indexes have been defined that are necessary to run the most basic queries with an acceptable response time.
As a tuning step, you can analyze the SQL statements resulting from end user queries to see how the query filters used by the end user (or in the related API call) relate to the WHERE clauses in the resulting SQL statements. Based on this analysis, you can define additional indexes on the related tables to improve the performance of these queries. After defining new indexes, the RUNSTATS action mentioned above needs to be run to enable the use of the newly created indexes.
Sometimes customers are uncertain whether defining additional indexes turns their environment into an unsupported state. This is definitely not the case. Customers are even encouraged to apply such tuning steps and check whether they help. If not, they can be undone easily, e.g. by removing the index.
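The create, refresh statistics, and undo cycle described above can be sketched as a helper that produces the three statements for one index. All names in the usage example (schema, table, columns, index name) are hypothetical; the columns would come from the WHERE clauses found in the SQL analysis.

```python
def index_statements(schema, table, filter_columns, index_name):
    """Return the statements to add, activate, and remove a tuning index."""
    columns = ", ".join(filter_columns)
    qualified_index = f"{schema}.{index_name}"
    qualified_table = f"{schema}.{table}"
    return {
        # Index on the columns that appear in the WHERE clause.
        "create": f"CREATE INDEX {qualified_index} ON {qualified_table} ({columns})",
        # Statistics must be refreshed before the optimizer uses the index.
        "runstats": f"RUNSTATS ON TABLE {qualified_table} AND INDEXES ALL",
        # Undo step in case the index does not help.
        "undo": f"DROP INDEX {qualified_index}",
    }


# Hypothetical names, for illustration only.
statements = index_statements("BPEUSER", "TASK_INSTANCE_T",
                              ["STATE", "OWNER"], "TUNE_TI_STATE_OWNER")
for step in ("create", "runstats", "undo"):
    print(statements[step])
```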

Further database tuning
Any decent database management system can keep its data in memory buffers called bufferpools to avoid physical I/O. Data that is in these bufferpools need not be read from disk when it is referenced; it can be taken directly from these memory buffers. Hence it makes a lot of sense to make these buffers large enough to hold as much data as possible.
The key tuning metric to look at is the bufferpool hit ratio, which describes the share of logical data and index reads that could be served from the bufferpool without a physical read from disk. As a rule of thumb, you can increase the size of the bufferpools as long as you get a corresponding increase in the bufferpool hit ratio. A well tuned system can easily have a hit ratio well above 90%.
WPS accesses its databases in multiple concurrent threads and uses row level locking to ensure data consistency during its transactions. As a result, a lot of row locks can be active at times of heavy processing. The database parameters for the space in which the database maintains the lock information might have to be adjusted.
For DB2 the affected database configuration parameters are LOCKLIST and MAXLOCKS. Shortages in this lock maintenance space can lead to so-called lock escalations, where row locks are escalated to undesirable table locks, which can even lead to deadlock situations. Data integrity is still maintained in such situations, but the associated wait times can severely impact throughput and response times.
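A minimal sketch of the hit ratio calculation, using the commonly cited definition: the percentage of logical reads that did not require a physical read. The function name is illustrative; on DB2 the four counters would typically be taken from snapshot monitor output (logical and physical reads for data and index pages).

```python
def bufferpool_hit_ratio(data_logical, index_logical,
                         data_physical, index_physical):
    """Return the bufferpool hit ratio in percent, or None without reads."""
    logical = data_logical + index_logical
    if logical == 0:
        return None  # no activity yet, the ratio is undefined
    physical = data_physical + index_physical
    # Share of logical reads that were satisfied from memory.
    return 100.0 * (logical - physical) / logical


ratio = bufferpool_hit_ratio(data_logical=90000, index_logical=10000,
                             data_physical=4000, index_physical=1000)
print(ratio)  # prints 95.0
```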
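To get a feeling for how LOCKLIST and MAXLOCKS interact, the following rough estimate computes how many locks a single application can hold before escalation sets in. It assumes that LOCKLIST is given in 4 KB pages and MAXLOCKS as a percentage of the lock list. The memory needed per lock depends on the DB2 version and platform, so it is passed in as a parameter; the value 128 in the example is an assumption, not a documented constant.

```python
PAGE_SIZE_BYTES = 4096  # LOCKLIST is specified in 4 KB pages


def locks_before_escalation(locklist_pages, maxlocks_percent, bytes_per_lock):
    """Estimate how many locks one application can hold before escalation."""
    lock_memory = locklist_pages * PAGE_SIZE_BYTES
    # MAXLOCKS limits the share of the lock list a single application may use.
    per_application = lock_memory * maxlocks_percent // 100
    return per_application // bytes_per_lock


# bytes_per_lock=128 is an assumed value; check the DB2 documentation.
print(locks_before_escalation(locklist_pages=1000, maxlocks_percent=10,
                              bytes_per_lock=128))  # prints 3200
```

If the number of row locks expected per transaction comes close to this estimate, increasing LOCKLIST or MAXLOCKS is worth considering.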
