3.6. OLAT Clustering and Scalability Concepts

This chapter describes clustering and scalability concepts introduced to OLAT 6.1. These concepts help to distribute OLAT over several tomcats with a load balancer in front. It is not designed for high availability, sessions on a node which goes down are kicked out. But it allows to go beyond the 2-3GB JVM RAM boundary by running multiple tomcats each with 2.5GB JVM RAM assigned.

Note that OLAT can either be run in singleVM or in cluster mode. Nevertheless the developer needs to know about clustering mechanisms even if he or she would not run an OLAT in cluster mode.

Hence when you use Caches, EventBus or any type of locking or synchronization read this section carefully to ensure that your code works in both singleVM and cluster mode deployments

When resources are used from within a single Java VM and are to be protected by serializing access to them, the synchronized keyword is used or the classes from the java.util.concurrent package. Since all calls go through the same JVM, this works fine.

When running OLAT in cluster mode (see chapter about the OLAT cluster installation), then several JVM share access to the same database, the same filesystem, and the same Instant messaging server.

It is of course necessary to protect the access to those resources that are shared between JVMs in order to guarantee a consistent and stable system.

Since a developer should not have to care whether or not its OLAT is running in singleVM or cluster mode, several „infrastructure tools“ (java interfaces) have been introduced to alleviate the life of a developer.

All those tools can be accessed through the Coordinator, which can be obtained via

CoordinatorManager.getInstance()

The four tools provided are:

The next section explains what those tools are for

Syncer

The Syncer allows cluster-wide synchronization on a given OlatResourceable. That means that for each given OlatResourceable, at most one thread of all JVMs executes the „execute“ method of the provided SyncerCallback or SyncerExecutor concurrently.

public <T> T doInSync(OLATResourceable ores, SyncerCallback<T> action);

if you don't need a return object, then use the SyncerExecutor instead

public void doInSync(OLATResourceable ores, SyncerExecutor action);

An OlatResourceable is considered to be equal to another when both its type (a string) and its Id (a Long) are equal.

PS: In the current Syncer implementation the DB transaction is committed before doInSync returns. That means that you will have multiple commits during the handling of a request

PPS: In the current Syncer implementation doing so-called nested-doInSyncs is not allowed. Nested doInSync is when you are within a doInSync and call doInSync again. This will throw an Exception. The reason for not allowing this is to avoid potential, distributed deadlocks.

Implementation detail

In SingleVM mode, the OlatResourceable is converted to a String and then kept in a Map to serve as a monitor for the synchronized command.

public void doInSync(OLATResourceable ores, SyncerExecutor e) {
	synchronized (DerivedStringSyncer.getInstance().getSynchLockFor(ores)) {
		e.execute();
	}
}

In Cluster mode, first the same as in the SingleVM is done to limit concurrent select-for-update tries to one per JVM . Then the pessimistcLockManager is used, which obtains a write lock on the row in table Plock representing the olatresourceable. if the olatresourceable does not yet exist in the table, it is created first using another select for update on a row that is guaranteed to exist on system startup. (a global lock)

public void doInSync(OLATResourceable ores, SyncerExecutor e) {
	String asset = OresHelper.createStringRepresenting(ores, null);
	synchronized(DerivedStringSyncer.getInstance().getSynchLockFor(ores)){
		DBFactory.getInstance().beginTransaction();
		// acquire a db lock with select for update which blocks other
 		// db select for updates on the same record 
		// until the transaction is committed or rollbacked
		PessimisticLockManager.getInstance().findOrPersistPLock(asset);
		e.execute();
		// now release the lock - only if previously no other
 		// transaction was spawned.
		// commits only if the beginTransaction above was the first
 		// nested - otherwise the commit reaching level floor will 
		// actually commit it.
		// the transaction bracket is opened and closed here so that
 		// also non-db synchronized actions such as file system 
		// operation can be safely synced.
		DBFactory.getInstance().commit();
	}
}

Locker

This is the replacement for the old LockManager class .

It allows to obtain a lock on a given OlatResourceable.

Both non-persistent and persistent Locks can be aquired.

A non-persistent Lock is cleared when either the JVM crashes or (more often hopefully ;)) when the user that is the owner of a Lock logs off or when the controller that technically holds the lock releases its lock, e.g. in the dispose() method. Non-persistent locks are used for GUI-Locking.

A persistent Lock survives JVM startup/shutdowns and is used to protect access to resources where an explicit release of the lock is needed. (e.g. when editing a QTI document)

Try to avoid persistent locks in favor of non-persistent locks.

Implementation detail

non-persistent locks:

  • In SingleVM mode, a lock is essentially a LockObject held in a Map of the SingleVMCacher singleton.

  • In Cluster mode, the lock is written to the database to be cluster-capable.

The persisting locks are backup by a database table for both the singleVM and the cluster mode.

Cacher

The Cacher is a hierachical cache structure to cache data

If, for example, you would like to obtain a cache per identity per course (to e.g. be able to cache the assessment data for the user), then you must build the hierarchy as depicted above. The CourseEnvironment creates an AssessmentManager instance per Course which then obtains a CacheWrapper from the Cacher, and e.g. will create child CacheWrappers for its Course and request CacheWrappers for each identity of this course (normally lazily, that is only when data needs to be read). For each level the ehcache configuration is read from the olatcoreconfig.xml (e.g. a „Course“ entry is then applied to all child CacheWrappers which are created with an OLATResourceable which is of type „Course“)

Configuration parameters are:

  • max time to live

  • max time to idle

  • max elements

See also the Hints and cookbook section.

The hierarchy is only used for namespacing reasons. To e.g. obtain a cache for a certain user in a certain course, it is recommended that the responsible manager class holds a root-CacheWrapper and then for each course it creates a child CacheWrapper which in turn is used to create a CacheWrapper per user by using the appropriate method on the course CacheWrapper. The actual cached data for an identity resides in the CacheWrapper obtained for the Identity.

PS: Make sure to not store objects retrieved from the Cacher in instance variables. Otherwise an object might have been replaced with a newer version in the Cacher yet you don't notice as you're still holding the old object. Also it would violate the idea that the Cacher can control how many objects are kept in memory (memory management)

PPS: Note that there is currently an issue (OLAT-4424) where you can potentially run into a nested doInSync exception due to a cached object having to be reloaded

Implementation detail

The CacheWrapper class in OLAT is mainly a wrapper around the EHCache from the ehcache library.

The Cluster mode version is the same as the SingleVM version, except that it invalidates elements that were „put“ into or „removed“ from a cache by sending an ClusterCacheWrapperEvent to all other cluster nodes.

EventBus

The EventBus is used to broadcast messages which normally affect more than one user.

It offers channels (identified by a OLATResourceable) to separate between different topics of interests. Code can subscribe/register and unsubscribe/deregister.

If your code subscribes to the channel, it is also responsible to unsubscribe at the end of your object's lifecycle (e.g. in the dispose() method of your controller).

The event bus has a topic or publish/subscribe architecture – that is the sender of a message does not know

  • if there is anybody listening at all on a particular channel

  • when its message will arrive

The recepient on the other side should:

  • only take a short time to do its tasks

  • only invalidate data, not reload it (we can postpone that before it is rendered, if it even gets rendered at all)

  • do not synchronize on data (possibility of deadlocking)

  • know which channels it is interested in and only register to those

Implementation detail

In SingleVM mode, the event bus is mainly a Map per OLATResourceable which implements a listener/sender (observer) pattern.

In cluster mode, the JMS provider is used to serialize those messages and send them asynchronously to all cluster nodes. The messages to its own JVM are sent „inhouse“, that is, using a direct method call (this is more efficient and allows for one node to still run if all other nodes and the JMS bus is down).

Example usages of the tools

Serializing access to the database for special operations

Description of the problem:

Enrollment can be configured to only accept a certain number of people in a group.

When two users subscribe to the same group at the same moment with only only place being left, only one user can be accepted, and the other one must be rejected.

Solution: the following operations need to be performed and need to be serialized.

  • read how many users are already in a group

  • if the maximum is not reached yet, insert the user into the group

The 2nd point happens by adding an entry in the relationship table between identities and groups.

A common read_committed isolation is not enough here, since the database does not offer a constraint such as „the number of entries in the n-n table must not exceed the number which is described in the group“

How it was solved in SingleVM OLAT: with synchronized(olatresourceablestring) {...}

How it is solved in cluster mode olat: the Syncer uses a select for update on a special olat database table.

What effectively happens on the database in cluster mode is the following: (SQL is in pseudocode)

Table 3.3. 

Thread 1Thread 2
select for update from plock where asset=“businessgroup:12345“ select for update where asset=“businessgroup:12345“
select count(*) from group g, gr2ident g2i identity i where g2i.group = g and g2i.identity = i and g.id = xx - assuming thread 1 acquired the pessimistic lock before thread 2, then thread 2 has zu wait till thread 1 either rollbacks or commits.
if (count < allowed max) -> insert new tupel into gr2ident table (=adding a user to this group), else send message to the user's screen.  
commit the transaction thread 2 now gets the lock on the row with asset=“businessgroup:12345“ and continues with the select count(*)... as shown in thread 1

Things to avoid

Next is a list of things to avoid in coding, please add more to it.

  • Using ehcache's classes directly: always use Cacher interface, (or otherwise improve it)

  • Calling doInSync from within doInSync - aka nested doInSync - this causes an Exception to be thrown in the current Syncer implementation

  • Using too many doInSyncs or using doInSyncs frequently with the same OlatResourcable. Note that doInSync can become a bottleneck if many users/requests want to synchronize on the same OlatResourcable. Therefore try to use an OlatResourcable which is as specific as possible (hence can be potentially used in parallel by as few as possible users/requests).

  • Doing expensive operations within a doInSync. Remember that within a doInSync you are holding a cluster-wide synchronization lock and that you prevent any other user/request from calling into the doInSync with the same OlatResourcable. Therefore time spent within doInSync is very precious and should not be wasted. Do as few things in doInSync as possible.

  • Using Cacher.put when you actually wanted to have other cluster nodes be informed - use update/updateMulti instead

  • The above is true the other way round as well: Try to use Cacher.put where possible - as it doesn't require other Caches to be informed.

  • Keeping a reference to an object retrieved from the Cacher in an instance variable. Fetch the object from the cache instead of storing it somewhere in a field

  • Synchronous event calls: by design the EventBus is asynchronous. This is a good thing as it eliminates a potential performance problem whereby a thread needs to wait for a reply to its request. Always try to stay on the asynchronous side when using EventBus and don't implement anything synchronous.

Hints and cookbooks for the four „tools“

We describe a few tips, hints, and a kind-of-a-cookbook for the four tools explained in the previous chapter.

Cacher

Think about caching in general before using the cacher

  • when to use it („prove“ a cache is needed (you can use a cache in code, but set maxelements to 0 if not needed)

  • calculate the size per element to estimate memory usage

  • the single element to cache should have a certain size (tara/netto, payload)

put versus update versus updateMulti:

be vary about methods offered: in cacher, there are three ways of storing something in the cache

  • put: This puts a key/value pair into the cache - without notifying any other cluster cache (if in cluster mode).

  • update: This puts a key/value pair into the cache - notifying any other cluster cache of the change (if in cluster mode).

  • updateMulti: Same as update but with multiple key/value pairs. This is speed optimized compared to calling update n times!

Cache configuration:

in the file olatcoreconfig.xml there is a bean named org.olat.core.util.cache.n.impl.cluster.ClusterCacher and in the single JVM mode the bean is in the same file, but named org.olat.core.util.cache.n.impl.svm.SingleVMCacher The property „rootConfig“ with its children config describes how the caches are configured. The sample below means that the NewCachePersistingAssessmentManager has a cache to create subcaches per course (entry with key „CourseModule“) and from this cache a cache per identity (entry with key „Identity“ in this course is created. How does this now look from the Java side? In the class NewCachePersistingAssessmentManager, there is one static field

private static CacheWrapper assessmentMainCache = CoordinatorManager.getCoordinator().getCacher().getOrCreateCache(NewCachePersistingAssessmentManager.class, null);

when an instance is created per course, then also a cache is created.

private NewCachePersistingAssessmentManager(ICourse course) {
  this.course = course;
  courseCache = assessmentMainCache.getOrCreateChildCacheWrapper(course);
}

this cache per course („coursecache“) is now used not for caching, but only to create subcaches for each Identity in the course.

private CacheWrapper getCacheWrapperFor(Identity identity) {
  OLATResourceable ores = OresHelper.createOLATResourceableInstanceWithoutCheck("Identity", identity.getKey());
  CacheWrapper cw = courseCache.getOrCreateChildCacheWrapper(ores);
  return cw;
}

The sample configuration:

<bean class="org.olat.core.util.cache.n.impl.cluster.ClusterCacher" ...>
...
<property name="rootConfig">
 <bean class="org.olat.core.util.cache.n.CacheConfig">
   <property name="childrenConfig"><map>
    <entry key="org.olat.course.assessment.NewCachePersistingAssessmentManager">
     <bean class="org.olat.core.util.cache.n.CacheConfig">
	<property name="timeToLive" value="1" />
	<property name="timeToIdle" value="1" />
	<property name="maxElementsInMemory" value="1" />	
	<property name="childrenConfig"><map>
	 <entry key="CourseModule">
	  <bean class="org.olat.core.util.cache.n.CacheConfig">				   
	  <property name="childrenConfig"><map>
	    <entry key="Identity">
	     <bean class="org.olat.core.util.cache.n.CacheConfig">
	      <property name="timeToLive" value="0" />
		<property name="timeToIdle" value="60" />
		<property name="maxElementsInMemory" value="1000" />
	     </bean>
    	    </entry>
	   </map></property>
	</bean>
...

If no configuration for a certain class is given, then, upon request of such a cache, the configuration of its parent is taken (which again may be inherited from its parents)

In previous OLATs, configurations from the cache were either coming from the file ehcache.xml (now deprecated) or were hardcoded in code.

Configuration parameters:

Table 3.4. 

parameterEquivalent in EhcacheDescription
timeToLive timeToLiveSeconds From the ehcache docu: Sets the time to live for an element before it expires. i.e. The maximum time between creation time and when an element expires. Is only used if the element is not eternal. Optional attribute. A value of 0 means that and Element can live for infinity. The default value is 0.
timeToIdle timeToIdleSeconds From the ehcache docu: Sets the time to idle for an element before it expires. i.e. The maximum amount of time between accesses before an element expires. Is only used if the element is not eternal. Optional attribute. A value of 0 means that an Element can idle for infinity. The default value is 0.
maxElementsInMemory maxElementsInMemory From the ehcache docu: Sets the maximum number of objects that will be created in memory The Ehcache within the OLAT CacheWrapper is only used as a RAM-Cache and never written to disk (overflowToDisk=false, diskPersistent=false)

The Ehcache within the OLAT CacheWrapper is only used as a RAM-Cache and never written to disk (overflowToDisk=false, diskPersistent=false)

Syncer

  • Don't simply replace all „synchronized“ with CoordinatorManager.getCoordinator().getSyncer().doInSync(...); everywhere!

    • Only those which have synchronized resources which -would- affect the state of several JVM when in cluster mode.

    • Not those which synchronize on any GUI operation or similar, e.g. within usersession or such.

  • The Syncer will -always- be used when synchronizing within manager classes.

  • Try to make the synchronized code block as small and as fast as possible

    • make sure that a sync'ed block never takes longer than at most 10 seconds or such (special solutions are thus needed for gzipping large data directories), otherwise the database will throw a LockWaitTimeoutExceeded Exception.

  • For each synchronized keyword you introduce, explain that and why it does not have to be a doInSync. This can be done via

    //o_clusterOK by:<person>

  • Make sure to use GUILocks where that would be a better fit than doInSync!

EventBus

  • if possible, take an already existing and generic MultiUserEvent

  • a Multiuserevent must be serializable and should be small (network traffic) -> preferable only use base java types and list or sets of base types.

  • only send one multiuserevent within one call (at least from within one manager)

  • calls to the event bus should only happen from Manager classes (with register helper method incoming calls are then sent back to the controller)

Locker

Try to rather avoid than to solve conflicts by using GUI-Locks with the Locker facility. There are many examples already in OLAT.