Implementing a Data Cache using Readers And Writers
You may have come across a situation where you need to cache some data in your server. You then use this cache to service queries from incoming requests. You will also probably have a thread that listens for cache update events. This thread updates the cache in response to these events.
This leads to a problem: it's unlikely that the threads servicing requests and the update thread can safely use your cache simultaneously. You may be tempted to use an exclusive lock, such as a Java synchronized block, to handle this. That will work, but from a performance perspective it is probably the worst way you can do it. This article shows how you can use reader writer locks to accomplish the same thing with a much more efficient technique.
Reader writer locks.
The problem with the solution above is that a synchronized block only allows a single thread to access the protected resource. This is not optimal in our case, because queries against the cache do not modify it. If we only had threads reading the cache, we would need no synchronization at all; all of the threads could safely access the data structure at the same time, since nothing is being modified. The problem arises when a thread wants to change the cache: if readers and writers access it concurrently, we may corrupt it. A reader writer lock solves this problem. It is basically a software device that allows any number of 'reader' threads to access the protected resource. These readers must acquire a reader lock and release it when they have finished with the resource. The writers, likewise, must acquire a writer lock to access the protected resource and release it when they are finished.
The behaviour is as follows. We allow readers to freely access the resource without restriction. However, if a writer needs access, then we do the following:
- Queue all future readers. Any new threads needing read access are temporarily blocked.
- Wait until all active readers release their locks. There may be reader threads still accessing the resource after being previously granted read access; we must wait for these threads to finish and release the lock.
- Grant the write thread access. The existing readers have now released their locks, and all read requests arriving after the write request are blocked, so the writer is guaranteed exclusive access to the resource.
- The write thread finishes and releases the lock. We now unblock all the reader threads that were waiting for the writer to finish, and the system carries on as before.
So, we basically allow multiple readers to access the resource concurrently. When a writer needs access, it gets priority: it gains access to the resource once all the readers active at the time of its request have finished, and when it is done, the readers once again get unrestricted access until the next writer needs access.
Where can I get a ReaderWriter lock?
Doug Lea's site has an implementation of this, and I recommend you use it. It's not hard to write one, but why bother? I had my own in the past, along with my own implementations of semaphores, thread pools, and so on, but Doug's library works and means you don't have to maintain it. Doug provides two kinds of read/write locks. We will use the one that gives writers priority. The following code shows how you use it.
Creating the lock.
We need to create the read write lock. We need one per resource that we want to protect. I'm using a singleton class to do this.
import EDU.oswego.cs.dl.util.concurrent.*;

public class MyLock {
    // volatile is needed for the double-checked pattern below to be safe:
    // without it, another thread may observe a partially constructed lock.
    static volatile ReadWriteLock myLock;

    public static ReadWriteLock getLock() {
        if (myLock == null) {
            synchronized (MyLock.class) {
                if (myLock == null) {
                    myLock = new WriterPreferenceReadWriteLock();
                }
            }
        }
        return myLock;
    }
}
Whenever our readers or writers need the lock, they call MyLock.getLock() to get access to it. The singleton creates the lock the first time it is accessed. Note the way we do this to avoid the need for a synchronized method. If the myLock variable is not null (i.e. the lock is already created), then we simply return it. Otherwise, we synchronize on the class (we're static) and then check again whether the lock exists, because another thread may have created it between the if and the synchronized. If myLock is still null, then we create the lock. This gives us one and only one lock for this resource in the VM without paying a synchronization price each time the lock is requested. One caveat: for this double-checked pattern to be safe under the Java memory model, myLock should be declared volatile; otherwise another thread may see a partially constructed lock.
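An alternative that avoids the double-checked locking pitfalls entirely is the initialization-on-demand holder idiom. The sketch below uses java.util.concurrent (the standardized descendant of Doug Lea's library); note that the standard ReentrantReadWriteLock does not guarantee strict writer preference the way WriterPreferenceReadWriteLock does, so treat this as an illustration of the singleton technique rather than a drop-in replacement.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class MyLockHolder {
    // Initialization-on-demand holder: the JVM's class-loading rules
    // guarantee LOCK is created exactly once, lazily, with no explicit
    // synchronization and no double-checked locking.
    private static class Holder {
        static final ReadWriteLock LOCK = new ReentrantReadWriteLock();
    }

    public static ReadWriteLock getLock() {
        return Holder.LOCK;
    }
}
```

The inner Holder class is not loaded until getLock() is first called, which gives the same lazy, once-only creation as the double-checked version with far less room for error.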
Reader threads look like this.
public void serviceReader() throws InterruptedException {
    ReadWriteLock rwLock = MyLock.getLock();
    Sync s = rwLock.readLock();
    s.acquire();
    try {
        // Access the resource in a read-only fashion.
    } finally {
        s.release();
    }
}
Writer threads look like this.
public void serviceWriter() throws InterruptedException {
    ReadWriteLock rwLock = MyLock.getLock();
    Sync s = rwLock.writeLock();
    s.acquire();
    try {
        // Access the resource in a writeable fashion.
    } finally {
        s.release();
    }
}
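Doug Lea's util.concurrent library was later standardized as java.util.concurrent in Java 5, so on a modern JVM the same reader and writer pattern can be sketched as below. The cached field and its values are invented purely for illustration.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ModernLockSketch {
    // Hypothetical shared value standing in for the protected cache.
    private static String cachedValue = "initial";
    private static final ReadWriteLock rwLock = new ReentrantReadWriteLock();

    static String read() {
        rwLock.readLock().lock();     // many readers may hold this at once
        try {
            return cachedValue;       // read-only access
        } finally {
            rwLock.readLock().unlock();
        }
    }

    static void write(String newValue) {
        rwLock.writeLock().lock();    // exclusive: blocks until readers drain
        try {
            cachedValue = newValue;   // safe to modify here
        } finally {
            rwLock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        write("updated");
        System.out.println(read());   // prints "updated"
    }
}
```

Note that lock()/unlock() replace acquire()/release(), and lock() does not throw InterruptedException (there is a separate lockInterruptibly() if you need that behaviour).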
Rules of engagement.
Make sure that neither readers nor writers nest calls that acquire their locks. If a thread acquires the lock and then acquires it again before releasing it, the behaviour of the locking code is not guaranteed. So, take care not to nest calls to the locking code. One way you can be caught out is by wrapping the remote methods of all your read-only session bean methods with the reader lock snippet above. This seems logical: all requests for read data will try to get a reader lock. But remember that if a session bean method invokes another remote method on itself or on another bean using the same lock, then we have the nested scenario, because the calls occur on the same thread.
In my Corba days, I would have put the calls to acquire and release in a request interceptor so that they would be made only when the request comes in from the network. The ORBs in the EJB servers today don't support these interceptor calls (with the exception of Iona's product). They are useful for stuff like this, though.
If you must have nested calls like this, then use a ReentrantWriterPreferenceReadWriteLock instead of WriterPreferenceReadWriteLock in the singleton class. Look up the documentation for these lock types.
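To make the nesting point concrete, here is a small sketch of a safe nested acquire. It uses the standard library's ReentrantReadWriteLock in the role of the reentrant writer-preference lock (an assumption on my part; the method names are invented for the demo): the outer method holds the read lock while calling a helper that acquires it again on the same thread.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class NestedReadSketch {
    private static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Outer method takes the read lock, then calls a helper that takes it again.
    static int outer() {
        lock.readLock().lock();
        try {
            return inner(); // nested acquire on the same thread: safe with a reentrant lock
        } finally {
            lock.readLock().unlock();
        }
    }

    static int inner() {
        lock.readLock().lock();
        try {
            // The current thread now holds the read lock twice.
            return lock.getReadHoldCount();
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println(outer()); // prints 2
    }
}
```

With a non-reentrant lock, the same call pattern would deadlock or misbehave, which is exactly the session bean trap described above.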
Make sure that if you acquire the lock, you release it. An unpaired acquire will lock out either all the readers or all the writers. This is why, in our example, I put the code accessing the resource in a try block with the release in a finally clause; this guarantees that the lock will be released.
If you're just executing a single instruction while the lock is held then a synchronized block is faster. But, if you're doing reasonable chunks of work while holding the lock then you'll notice the benefits of allowing multiple readers to access the shared resource concurrently.
A real world scenario please...
Imagine you're using an O/R mapper product such as TopLink, and you are taking advantage of caching for performance reasons. Unfortunately, there are legacy applications modifying the database directly, which means your TopLink cache can be stale. We want to update the TopLink cache whenever these updates occur, to minimise the time we have stale data in the cache.
So, easy, you think. We put a trigger on the tables. The stored procedure called by the trigger sends a message on a pub/sub bus. Our application server VMs subscribe to this event: we start a thread in each application server that subscribes and blocks listening for these events. When an event arrives, it simply calls the TopLink APIs to invalidate the stale copies of whatever has changed in the database.
There is a little problem, though. TopLink requires that these APIs be called only when no other threads are accessing the cache. This is the kind of problem that reader writer locks solve. I made all my incoming requests (whether they were read or write) acquire reader locks. The thread that calls the stale-data API gets a writer lock when the events arrive. This means the application server accesses the cache at full speed normally. When a stale data event arrives, the incoming requests (which need reader locks) are blocked until the stale-data code has executed. The incoming requests then continue.
Don't get confused here: reader writer locks actually have nothing to do with reading and writing. The lock really splits the parties that need access to the resource into two groups. Anyone from the first group can access the resource concurrently. However, when someone from the second group needs access, we block everyone in the first group until that access is granted and released. So, the second group has priority over the first. In our example, the threads processing incoming requests are in the first group, whether they read or write; the stale-event thread is the only party in the second group. Stop and think about this until you understand it.
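The grouping just described can be sketched as follows. This is a hypothetical illustration, not TopLink code: the key names and values are invented, a ConcurrentHashMap stands in for the O/R mapper's cache (which manages its own internal thread-safety), and cache.remove stands in for the stale-data invalidation API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class StaleCacheSketch {
    // The map handles its own internal thread-safety; the reader/writer lock
    // exists only to make the invalidation step exclusive.
    private static final Map<String, String> cache = new ConcurrentHashMap<>();
    private static final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Group one: every incoming request, whether it reads or updates business
    // data, takes the *read* lock, so these threads all run concurrently.
    static String serviceRequest(String key, String valueFromDb) {
        lock.readLock().lock();
        try {
            return cache.computeIfAbsent(key, k -> valueFromDb);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Group two: only the stale-data event thread takes the *write* lock,
    // so it runs when no request thread is inside the cache.
    static void invalidate(String key) {
        lock.writeLock().lock();
        try {
            cache.remove(key); // stands in for the invalidation API call
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println(serviceRequest("order:42", "row v1")); // caches and returns "row v1"
        invalidate("order:42");                                    // stale-data event arrives
        System.out.println(serviceRequest("order:42", "row v2")); // re-reads: "row v2"
    }
}
```

The point to notice is that serviceRequest may well write business data, yet it still takes the read lock: the two lock groups reflect who must exclude whom, not who reads and who writes.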
How to detect changes to the database?
We can place triggers on a table. These triggers can call a stored procedure whenever a record is updated, inserted, or deleted. If we are caching the data in our application server, then obviously when this happens we will want to know about it. We can split databases into two categories when we look at implementing this: Oracle, and everybody else.
Oracle 8i
Oracle is the most elegant. I'm assuming you're using Oracle 8i, which has a built-in JMS mechanism (Oracle AQ). We define a queue in Oracle and attach our PL/SQL (or Java) stored procedure to the trigger. The stored procedure enqueues a message on the Oracle queue describing the event that happened. Each of our application servers runs a thread inside its VM that subscribes to and listens on this queue. When a legacy system or an application server modifies the database, the trigger calls this stored procedure, which enqueues the message. If the transaction is committed, then the message is committed and delivered to the listeners; if the transaction is rolled back, then it is never sent. On commit, all our application servers receive the message and can use its contents to see what happened and update their caches accordingly.
Oracle 8i includes a full JVM. As an alternative, you could use a Java stored procedure to multicast the event, or send it any way you choose. This isn't possible with Sybase or DB2, as they don't include the java.net package in the JVM inside the database.
If you're using an older version of Oracle, there is a way to send messages over an Oracle connection from the server to a client; check your OCI manual for details. This message will be sent regardless of the transaction outcome, though, unlike the Oracle AQ solution.
Sybase
We can do this several ways with Sybase. Sybase can send a UDP datagram from a stored procedure; the stored procedure to do this is called syb_sendmsg. Our application servers can listen for this UDP datagram and see the changes. However, the datagram will be transmitted whether the transaction commits or not, so we may get datagrams when no change actually happened.
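On the application server side, listening for such a datagram is plain java.net code. The sketch below is hypothetical: the payload "orders:updated" is an invented message format, and for a self-contained demo the main method plays the database's part by sending the datagram to itself over loopback instead of via syb_sendmsg.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class UdpInvalidationSketch {
    // Blocks until one datagram arrives and returns its payload as a string.
    // In the real listener thread this would be called in a loop, with the
    // payload driving the writer-lock cache invalidation.
    static String receiveOne(DatagramSocket socket) throws Exception {
        byte[] buf = new byte[512];
        DatagramPacket packet = new DatagramPacket(buf, buf.length);
        socket.receive(packet);
        return new String(packet.getData(), 0, packet.getLength(), "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        try (DatagramSocket listener = new DatagramSocket(0)) { // any free port
            // Stand-in for the database's syb_sendmsg call: send to ourselves.
            byte[] msg = "orders:updated".getBytes("UTF-8");
            try (DatagramSocket sender = new DatagramSocket()) {
                sender.send(new DatagramPacket(msg, msg.length,
                        InetAddress.getLoopbackAddress(), listener.getLocalPort()));
            }
            System.out.println(receiveOne(listener)); // prints "orders:updated"
        }
    }
}
```

Remember that UDP delivery is best-effort and, as noted above, non-transactional, so the listener should treat each datagram as a hint to invalidate and re-read rather than as authoritative data.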
Another approach is to use Sybase Event Broker. We can set up database replication, with the Event Broker application acting as a replication subscriber to our database. Whenever the database is modified, a replication event is sent to Event Broker; we can parse this event and in turn send a message over a JMS transport to our listening application servers. You could also do this with a daemon built on the Sybase replication agent API, but that may be more difficult.
You could also write a Sybase Open Server that sends a message to the listening application servers. The Sybase trigger can then call a stored procedure in the Open Server when a modification occurs.
Sybase V12's Java inside the database is not so useful here, as it doesn't include the java.net package, which means you can't communicate with anything outside the database from Java except via a stored procedure call. Sybase has also made Java an extra-cost item, while the other vendors include Java as standard.
DB2
Here your options for Java stored procedures are limited, as DB2 doesn't include the java.net package either. You could, however, use a C fenced stored procedure to send a message to the listening application servers, but again this would not be transactional: if the transaction subsequently rolled back, the message would have been sent anyway.
Caches in Servlet/JDBC applications.
Nothing really changes. The same techniques apply. If you use a caching layer then just use a reader/writer lock in this caching layer. Make sure that all read-cache APIs acquire a reader lock while all modification APIs use the writer locks. Normally, the only modification source would be the JMS/datagram listener. When the JMS/datagram listener sees a database modification message then it gets a writer lock and updates the cache.
Conclusion.
So, this shows how we can use a reader writer lock to allow a group of threads concurrent access to a shared resource, while a second class of threads gains prioritised, exclusive access to it. This type of lock is very useful when you need to maintain any kind of shared cache that must be updated safely. There are, of course, other scenarios as well, but this one serves the purposes of this article.