Live Forms v6.1 is no longer supported. Click here for information about upgrading to our latest GA Release.

Cluster Configuration

 

 

Live Forms supports clustered form servers for high availability, load balancing, and fault tolerance. The software can be installed and configured either as a stand-alone single form server or as a cluster of form servers behind a load balancer.

Note that only certain versions include clustering. Refer to the Release Notes for details.

A form server cluster is essential in heavy-load scenarios where a single form server cannot handle the load alone. Clustering also provides fault tolerance for both design-time and use-mode forms should one node fail: if one form server is shut down, its form instances are automatically transferred to another running form server node.

Live Forms supports clusters running under either Apache web servers or Oracle WebLogic servers. This chapter details how to enable a Live Forms cluster running under an Apache web server.

Apache/Tomcat Cluster Configuration

This documentation assumes that you are already familiar with the software installation and configuration of an Apache Web Server.

  1. Download and install the Apache Web Server software on a single machine that will serve as the load balancer for the Live Forms Tomcat servers.
  2. Download and unzip Live Forms, as described in the chapter Downloading and Installing, on each machine participating in the cluster.
  3. Configure Apache to perform the required load balancing function.
  4. Configure Tomcat. This configuration must be repeated for every Live Forms Tomcat server participating as a clustered form server.

The details for steps 3 and 4 follow.

Configure Apache Load Balancer

These steps should be performed by a system administrator who is familiar with the installation and configuration of Apache Web Servers. They refer to the location where you installed the Apache software as <apache-installdir>.

  1. Edit <apache-installdir>/conf/httpd.conf
  2. Uncomment these lines to enable mod_proxy:
    1. LoadModule proxy_module modules/mod_proxy.so
    2. LoadModule proxy_ajp_module modules/mod_proxy_ajp.so
    3. LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
  3. Uncomment this line to enable the Apache server performance monitor in balancer-manager:
    1. LoadModule status_module modules/mod_status.so
  4. Add this line (in the area of httpd.conf that has "Supplemental.."):
    1. Include conf/extra/tomcatcluster.conf
  5. Create a file named tomcatcluster.conf in <apache-installdir>/conf/extra with the following lines:
<Proxy balancer://mycluster>
        BalancerMember ajp://localhost:8009 route=jvm1 ping=60
        BalancerMember ajp://localhost:9009 route=jvm2 ping=60
</Proxy>
ProxyPass /frevvo balancer://mycluster/frevvo stickysession=JSESSIONID
<Location /balancer-manager>
        SetHandler balancer-manager
        Order Deny,Allow
        Deny from all
        Allow from 172.20.25.52
</Location>

Important Notes regarding the tomcatcluster.conf file content:

  1. The ports (8009, 9009) and route names (jvm1, jvm2) shown in the example above can be any ports and route names you want. The critical point is that they must match exactly what you set up in the Tomcat server.xml files, as described below in Configure Tomcat Form Servers.
  2. ajp://localhost can only be used in same-machine clusters. See Configure Tomcat Form Servers below for details on same-machine versus distributed Tomcat form server clusters.
  3. The BalancerMember parameter ping=60 makes Apache send an AJP ping to the server before each request is delivered and wait at most 60 seconds for the reply. This ensures that a request is only sent when a healthy Tomcat instance is ready to serve it. Without the ping parameter the Tomcat form servers may return a response of 'service temporarily unavailable'. If you still see this error, increase the ping timeout.
  4. The <Location /balancer-manager> element exists solely to enable access to the Apache balancer-manager: http://localhost/balancer-manager
    • Add an Allow from line for the IP address of each Tomcat form server node in your cluster. The address 172.20.25.52 is just an example.

Here is another sample tomcatcluster.conf file, configured for a distributed Tomcat form server cluster. Note the use of specific IP addresses rather than localhost.

<Proxy balancer://mycluster>
        BalancerMember ajp://172.20.25.52:8009 route=jvm1 ping=60
        BalancerMember ajp://172.20.25.115:9009 route=jvm2 ping=60
</Proxy>
ProxyPass /frevvo balancer://mycluster/frevvo stickysession=JSESSIONID
<Location /balancer-manager>
        SetHandler balancer-manager
        Order Deny,Allow
        Deny from all
        Allow from 172.20.25.52
        Allow from 172.20.25.115
</Location>
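
The consistency rules described in the notes above (the balancer named in ProxyPass is the one declared in <Proxy>, and each member has its own route and endpoint) can be checked mechanically before Apache is restarted. The following is a minimal Python sketch, not part of the product. The function names parse_cluster_conf and check_cluster_conf are hypothetical, the regular expressions only cover the layout used in the samples on this page, and balancer names are compared case-sensitively, which may be stricter than Apache itself.

```python
import re

def parse_cluster_conf(text):
    """Extract the declared balancer, the balancer ProxyPass uses, and the members."""
    declared = re.search(r"<Proxy\s+balancer://([^>\s]+)>", text)
    used = re.search(r"ProxyPass\s+\S+\s+balancer://([^/\s]+)", text)
    members = re.findall(
        r"BalancerMember[ \t]+ajp://([^:\s]+):(\d+)[ \t]+.*?route=(\S+)", text)
    return {
        "declared": declared.group(1) if declared else None,
        "used": used.group(1) if used else None,
        "members": [(host, int(port), route) for host, port, route in members],
    }

def check_cluster_conf(text):
    """Return a list of problems found; an empty list means none were detected."""
    conf = parse_cluster_conf(text)
    problems = []
    if conf["declared"] != conf["used"]:
        problems.append("ProxyPass refers to balancer %r but <Proxy> declares %r"
                        % (conf["used"], conf["declared"]))
    routes = [route for _, _, route in conf["members"]]
    if len(set(routes)) != len(routes):
        problems.append("route names are not unique: %s" % routes)
    endpoints = [(host, port) for host, port, _ in conf["members"]]
    if len(set(endpoints)) != len(endpoints):
        problems.append("two members share the same host and port")
    return problems

# Illustrative input modelled on the distributed sample.
SAMPLE = """\
<Proxy balancer://mycluster>
        BalancerMember ajp://172.20.25.52:8009 route=jvm1 ping=60
        BalancerMember ajp://172.20.25.115:9009 route=jvm2 ping=60
</Proxy>
ProxyPass /frevvo balancer://mycluster/frevvo stickysession=JSESSIONID
"""

if __name__ == "__main__":
    for problem in check_cluster_conf(SAMPLE):
        print(problem)
```

An empty result only means that none of these three checks failed; it does not validate the rest of the Apache configuration.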

Configure Tomcat Form Servers

The first step is to install the Live Forms software on each machine that will become a node in your cluster. To provide scalability and fault tolerance, each form server node will most likely run on a separate physical machine - a distributed cluster. However, it is possible to install two nodes on the same physical machine. This documentation covers both scenarios.

First we cover the distributed cluster configuration, where each form server node runs on its own physical machine. Each step outlined here is described in detail in the sections that follow.

  1. Follow the steps documented under Downloading and Installing to install a copy of Live Forms on each machine.
  2. Edit <frevvo-home>/frevvo/tomcat/conf/server.xml in each installation to:
    1. Set up the AJP jvmRoute.
    2. Uncomment the <Cluster> element.
    3. If channelSendOptions in the <Cluster> element is set to 8, change it to 6. This ensures that session replication is synchronous and that messages are not received out of order.
  3. Edit <frevvo-home>/frevvo/tomcat/bin/setenv.bat on Windows or setenv.sh on Linux to add the cluster setenv parameters.
  4. Edit <frevvo-home>/frevvo/tomcat/conf/Catalina/localhost/frevvo.xml to:
    1. Uncomment the <Manager> element for DeltaManager.
    2. Ensure that the form server's database configured in frevvo.xml is the same database used for ALL cluster nodes.

A same-machine cluster configuration is one where two or more form server nodes are installed on the same physical machine. This configuration requires several additional steps, mostly because each form server needs its own unique port numbers.

AJP jvmRoute 

To set up the AJP jvmRoute, search <frevvo-home>/frevvo/tomcat/conf/server.xml for the following lines:

<!-- You should set jvmRoute to support load-balancing via AJP ie :
<Engine name="Catalina" defaultHost="localhost" jvmRoute="jvm1">
-->
<Engine name="Catalina" defaultHost="localhost">

Uncomment the <Engine> line that sets jvmRoute (line 2) and add comment characters around the original <Engine> line (line 4). This is how these lines in server.xml should look after your updates:

<!-- You should set jvmRoute to support load-balancing via AJP ie : -->
<Engine name="Catalina" defaultHost="localhost" jvmRoute="jvm1">
<!-- <Engine name="Catalina" defaultHost="localhost"> -->

Next, change jvm1 so that it is unique on each node. For example, keep "jvm1" on your first cluster node and change it to "jvm2" on your second. The exact string is not important. What matters is that each cluster node has its own unique string and that the jvmRoute strings you select are the same ones you add to your Apache tomcatcluster.conf file.
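
Whether the jvmRoute edit took effect, and whether the chosen routes are unique and known to Apache, can be verified with a short script. This is a hedged Python sketch: read_jvm_route and find_route_problems are hypothetical helper names, and SERVER_XML is a cut-down stand-in for a real server.xml. It relies on the fact that Python's default ElementTree parser skips XML comments, so an <Engine> element that is still commented out is not seen.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for server.xml after the edit described above.
SERVER_XML = """\
<Server>
  <Service name="Catalina">
    <!-- <Engine name="Catalina" defaultHost="localhost"> -->
    <Engine name="Catalina" defaultHost="localhost" jvmRoute="jvm1">
    </Engine>
  </Service>
</Server>
"""

def read_jvm_route(server_xml_text):
    """Return the jvmRoute of the first active <Engine> element, or None."""
    root = ET.fromstring(server_xml_text)
    engine = root.find(".//Engine")
    return engine.get("jvmRoute") if engine is not None else None

def find_route_problems(node_routes, apache_routes):
    """Compare per-node jvmRoute values with the routes listed in Apache.

    node_routes maps a node label to its jvmRoute (or None if unset);
    apache_routes is the set of route= values from tomcatcluster.conf.
    """
    problems = []
    seen = {}
    for node, route in sorted(node_routes.items()):
        if route is None:
            problems.append("%s has no jvmRoute" % node)
            continue
        if route in seen:
            problems.append("%s and %s share jvmRoute %r" % (seen[route], node, route))
        else:
            seen[route] = node
        if route not in apache_routes:
            problems.append("%s uses jvmRoute %r, which Apache does not list"
                            % (node, route))
    return problems

if __name__ == "__main__":
    routes = {"nodeA": read_jvm_route(SERVER_XML), "nodeB": "jvm2"}
    print(find_route_problems(routes, {"jvm1", "jvm2"}))
```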

<Cluster> Element

Search <frevvo-home>/frevvo/tomcat/conf/server.xml to locate the <Cluster> element and uncomment it. Before your edits, server.xml contains the <!-- characters before and the --> characters after the many lines that make up the <Cluster> element:

<!-- 
   <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
                 channelSendOptions="6">
          <Channel className="org.apache.catalina.tribes.group.GroupChannel">
            <Membership className="org.apache.catalina.tribes.membership.McastService"
                        address="228.0.0.4"
                        port="55555"
                        frequency="500"
                        dropTime="3000"/>
            <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
                      address="auto"
                      port="4000"
                      autoBind="100"
                      selectorTimeout="5000"
                      maxThreads="6"/>
            <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
              <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
            </Sender>
            <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
            <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
          </Channel>
          <Valve className="org.apache.catalina.ha.tcp.ReplicationValve"
                 filter=""/>
          <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve"/>
          <ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener"/>
          <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
      </Cluster>
 -->

After your edits, the comment characters on the first and final lines should be gone. Note that it is important that the McastService port is the same on all cluster nodes. This will be the case if you simply uncomment the lines already in server.xml. If you need to change this port number, make sure you change it on all cluster nodes.
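
The channelSendOptions attribute of the <Cluster> element is a bit mask, which is why 6 means synchronous replication while 8 means asynchronous. The Python sketch below decodes a value into flag names. The flag values are those documented for Tomcat's org.apache.catalina.tribes.Channel interface; the helper name decode_send_options is hypothetical and only the four lowest flags are listed.

```python
# Flag values as documented for org.apache.catalina.tribes.Channel
# (SEND_OPTIONS_* constants); only the four lowest bits are listed here.
SEND_OPTIONS = {
    0x0001: "BYTE_MESSAGE",
    0x0002: "USE_ACK",
    0x0004: "SYNCHRONIZED_ACK",
    0x0008: "ASYNCHRONOUS",
}

def decode_send_options(value):
    """Return the names of the flags that are set in a channelSendOptions value."""
    return [name for bit, name in sorted(SEND_OPTIONS.items()) if value & bit]

if __name__ == "__main__":
    print(decode_send_options(6))  # the value this page asks for
    print(decode_send_options(8))  # asynchronous replication
```

A value of 6 therefore combines USE_ACK (2) and SYNCHRONIZED_ACK (4).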

<Manager> Element

All clustered nodes require the <Manager> element in frevvo.xml to be switched from PersistentManager to DeltaManager. Edit <frevvo-home>/frevvo/tomcat/conf/Catalina/localhost/frevvo.xml, comment out the PersistentManager, and uncomment the DeltaManager. After the changes, the <Manager> section of your frevvo.xml file should look like this:

<Manager className="org.apache.catalina.ha.session.DeltaManager"
expireSessionsOnShutdown="false"
notifyListenersOnReplication="true"/>
<!--
<Manager className="org.apache.catalina.session.PersistentManager" saveOnRestart="false">
<Store className="org.apache.catalina.session.FileStore"/>
</Manager>
-->
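
To confirm on each node that only the DeltaManager is active, the file can be parsed and its uncommented <Manager> elements listed. This is a small Python sketch under stated assumptions: active_managers is a hypothetical helper, and CONTEXT_XML is a shortened stand-in for frevvo.xml (a real file contains further elements). Commented-out elements are ignored because Python's default ElementTree parser drops comments.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for frevvo.xml after the change described above.
CONTEXT_XML = """\
<Context unpackWAR="false" path="/frevvo" swallowOutput="true" unloadDelay="40000">
  <Manager className="org.apache.catalina.ha.session.DeltaManager"
           expireSessionsOnShutdown="false"
           notifyListenersOnReplication="true"/>
  <!--
  <Manager className="org.apache.catalina.session.PersistentManager" saveOnRestart="false">
    <Store className="org.apache.catalina.session.FileStore"/>
  </Manager>
  -->
</Context>
"""

def active_managers(context_xml):
    """Return the className of every <Manager> element that is not commented out."""
    root = ET.fromstring(context_xml)
    return [manager.get("className") for manager in root.iter("Manager")]

if __name__ == "__main__":
    print(active_managers(CONTEXT_XML))
```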

Setenv Parameters

All clustered nodes require the following parameters to be added to <frevvo-home>/frevvo/tomcat/bin/setenv.bat on Windows or setenv.sh on Linux:

-Dfrevvo.metadata.cache-config=/WEB-INF/cache-clustered.xml 
-Dfrevvo.cache.config.file=cache-tcp.xml 
-Djgroups.tcpping.num_members=2 
-Djgroups.tcpping.initial_hosts=172.20.25.69[7801],172.20.25.115[7801] 
-Dfrevvo.cluster=clusterB -Djgroups.tcpping.bind_port=7801 
-Djgroups.bind_addr=172.20.25.69 
-Djava.net.preferIPv4Stack=true

Note:

  • num_members=2 - must be set to the actual number of clustered form servers.
  • initial_hosts=172.20.25.69[7801],172.20.25.115[7801] - must list the actual IP address of each machine in your cluster, and port 7801 must be available on each.
  • bind_addr=172.20.25.69 - must be set to the actual IP address of the machine on which this setenv file is installed. Do not use localhost.
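
Because num_members, initial_hosts and bind_addr must agree with each other and differ only in bind_addr from node to node, it can help to generate the parameter list rather than type it by hand. The following Python sketch shows one way to do that. The function cluster_java_opts is hypothetical, and how the resulting flags are placed into setenv.bat or setenv.sh depends on your existing file.

```python
def cluster_java_opts(node_ips, this_ip, cluster_name, port=7801):
    """Build the cluster JVM parameters for the node whose address is this_ip.

    node_ips is the list of IP addresses of all cluster nodes.
    """
    if this_ip not in node_ips:
        raise ValueError("this_ip must be one of node_ips (do not use localhost)")
    hosts = ",".join("%s[%d]" % (ip, port) for ip in node_ips)
    return [
        "-Dfrevvo.metadata.cache-config=/WEB-INF/cache-clustered.xml",
        "-Dfrevvo.cache.config.file=cache-tcp.xml",
        "-Djgroups.tcpping.num_members=%d" % len(node_ips),
        "-Djgroups.tcpping.initial_hosts=%s" % hosts,
        "-Dfrevvo.cluster=%s" % cluster_name,
        "-Djgroups.tcpping.bind_port=%d" % port,
        "-Djgroups.bind_addr=%s" % this_ip,
        # Standard JVM property name (note the "v" in IPv4).
        "-Djava.net.preferIPv4Stack=true",
    ]

if __name__ == "__main__":
    nodes = ["172.20.25.69", "172.20.25.115"]
    for ip in nodes:
        print(ip, " ".join(cluster_java_opts(nodes, ip, "clusterB")))
```

Running it once per node yields parameter sets that are identical except for bind_addr.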

Cluster Configuration Properties

Single Sign-on

To enable single sign-on in a cluster, uncomment the following element in the <frevvo-home>/frevvo/tomcat/conf/server.xml configuration file of every cluster node.

<Valve className="org.apache.catalina.ha.authenticator.ClusterSingleSignOn" />

MaxIdle

frevvo.instances.maxIdle controls how long (in milliseconds) form and flow instances that are being used or edited are kept. Its default is set in cache-clustered.xml inside frevvo.war and is intended to match the 8-hour timeout configured in web.xml. Note, however, that the value 1728000000 ms corresponds to 480 hours (20 days), whereas 8 hours is 28800000 ms, so verify the value against your web.xml setting.

<expiration maxIdle="${frevvo.instances.maxIdle:1728000000}" wakeUpInterval="5000"/>
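
Since maxIdle is expressed in milliseconds, large values are easy to misread. A short Python sketch for converting in both directions follows; the helper names are hypothetical and the conversion is plain arithmetic.

```python
MILLIS_PER_HOUR = 60 * 60 * 1000  # 3,600,000 ms in one hour

def hours_to_millis(hours):
    """Convert hours to milliseconds, for use as a maxIdle value."""
    return int(hours * MILLIS_PER_HOUR)

def millis_to_hours(millis):
    """Convert a maxIdle value in milliseconds back to hours."""
    return millis / float(MILLIS_PER_HOUR)

if __name__ == "__main__":
    print(hours_to_millis(8))           # 28800000
    print(millis_to_hours(1728000000))  # 480.0
```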


Special Considerations

Upload Control

The upload control is not currently fault tolerant. Uploaded file attachments are stored only on the node that processed the upload. If the node that is processing your form instance shuts down, all attachments are lost. The user will notice this only if they refresh the form or try to add or delete another attachment. In that case the upload control is refreshed and shows that all prior attachments have been removed. If the user does neither, the form continues to show the lost attachments, but they will not be part of the submission when the form is submitted. This out-of-sync behavior will be addressed in a future software update.

A second scenario is a form that is ready to submit: all required fields are filled and a required upload control (max=1) already has an attachment uploaded. If the node processing this form instance shuts down and the user's next action is to click the submit button, the user will see the form refresh, because the attachment was lost and the form requires one. The form refreshes to its original UI state, meaning that Section controls and Tabs go back to their initial selected/expanded states. This may seem odd to the form user. At this point all the user has to do is re-upload the attachment and submit the form. The reset of the UI state will be improved in a future release.

Trying Templates

Trying a form template from the Templates menu is not fault tolerant. If you are trying a form template and the node the form instance is running on fails, the form will not continue operating properly. This is a temporary limitation and may be lifted in a future release.

Service Temporarily Unavailable

This message comes from the Apache load balancer (mod_proxy_balancer). If you see it, refer to your Apache mod_proxy documentation for two configuration properties:

  • retry (default 60secs) - "Connection pool worker retry timeout in seconds. If the connection pool worker to the backend server is in the error state, Apache will not forward any requests to that server until the timeout expires. This enables to shut down the backend server for maintenance, and bring it back online later. A value of 0 means always retry workers in an error state with no timeout."
  • ping (not set by default)- "Ping property tells the web server to send a CPING request on ajp13 connection before forwarding a request. The parameter is the delay in seconds to wait for the CPONG reply. This feature has been added to avoid problems with hung and busy Tomcat and require ajp13 ping/pong support which has been implemented on Tomcat 3.3.2+, 4.1.28+ and 5.0.13+. This will increase the network traffic during the normal operation which could be an issue, but it will lower the traffic in case some of the cluster nodes are down or busy. Currently this has an effect only for AJP. By adding a postfix of ms the delay can be also set in milliseconds."

Synchronize Clocks

It is very important that all machines in the cluster have their clocks synchronized.

Severe Startup Log Messages

Some error messages that may be logged in the Tomcat log files can be safely ignored. These errors may occur if you update the frevvo.war file while Tomcat is running rather than stopping it first. It is best to stop the Tomcat server, update frevvo.war, and then restart the Tomcat server. If you always follow that process, you should avoid these log messages.

Sep 4, 2012 11:15:45 AM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/frevvo] appears to have started a thread named [MultiThreadedHttpConnectionManager cleanup] but has failed to stop it. This is very likely to create a memory leak.
Sep 4, 2012 11:15:45 AM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/frevvo] created a ThreadLocal with key of type [qh] (value [qh@5bf7253e]) and a value of type [java.lang.Boolean] (value [true]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
Sep 4, 2012 11:15:45 AM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/frevvo] created a ThreadLocal with key of type [org.eclipse.emf.ecore.xml.type.util.XMLTypeUtil.CharArrayThreadLocal] (value [org.eclipse.emf.ecore.xml.type.util.XMLTypeUtil$CharArrayThreadLocal@2006eb91]) and a value of type [char[]] (value [[C@26538d04]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
Sep 4, 2012 11:15:45 AM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/frevvo] created a ThreadLocal with key of type [np] (value [np@374f1544]) and a value of type [java.lang.Boolean] (value [false]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
Sep 4, 2012 11:15:45 AM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/frevvo] created a ThreadLocal with key of type [gt] (value [gt@28fd3fba]) and a value of type [gr] (value [gr@a62e15c]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
Sep 4, 2012 11:15:45 AM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/frevvo] created a ThreadLocal with key of type [np] (value [np@374f1544]) and a value of type [java.lang.Boolean] (value [false]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
Sep 4, 2012 11:15:45 AM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/frevvo] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@61b96457]) and a value of type [org.infinispan.context.SingleKeyNonTxInvocationContext] (value [SingleKeyNonTxInvocationContext{flags=null}]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
Sep 4, 2012 11:15:45 AM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/frevvo] created a ThreadLocal with key of type [org.jgroups.protocols.FlowControl$1] (value [org.jgroups.protocols.FlowControl$1@5dd7e765]) and a value of type [java.lang.Boolean] (value [false]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
Sep 4, 2012 11:15:45 AM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/frevvo] created a ThreadLocal with key of type [org.infinispan.marshall.jboss.AbstractJBossMarshaller$1] (value [org.infinispan.marshall.jboss.AbstractJBossMarshaller$1@998c805]) and a value of type [org.infinispan.marshall.jboss.AbstractJBossMarshaller.PerThreadInstanceHolder] (value [org.infinispan.marshall.jboss.AbstractJBossMarshaller$PerThreadInstanceHolder@d73c52f]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
Sep 4, 2012 11:15:45 AM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/frevvo] created a ThreadLocal with key of type [org.jboss.marshalling.UTFUtils.BytesHolder] (value [org.jboss.marshalling.UTFUtils$BytesHolder@2bb843a4]) and a value of type [byte[]] (value [[B@4e60da68]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
Sep 4, 2012 11:15:45 AM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/frevvo] created a ThreadLocal with key of type [org.jgroups.protocols.FlowControl$1] (value [org.jgroups.protocols.FlowControl$1@38a30a0b]) and a value of type [java.lang.Boolean] (value [false]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
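
When reviewing a log after a restart, it can be useful to set aside the messages listed above so that any other SEVERE entry stands out. The Python sketch below is one hedged way to do this. It assumes the two-line layout shown above (a header line ending in the reporting method name, followed by the SEVERE line); the function name unexpected_severe is hypothetical, and the third entry in the sample log is an illustrative example of a message that is not on the list.

```python
# Method names that end the header lines of the ignorable messages listed above.
IGNORABLE_SOURCES = ("clearReferencesThreads", "checkThreadLocalMapForLeaks")

def unexpected_severe(lines):
    """Return SEVERE lines whose preceding header is not a known leak check."""
    unexpected = []
    header = ""
    for line in lines:
        if line.startswith("SEVERE:"):
            if not header.endswith(IGNORABLE_SOURCES):
                unexpected.append(line)
        else:
            header = line.strip()
    return unexpected

# Illustrative log excerpt; the last entry is made up for the example.
SAMPLE_LOG = [
    "Sep 4, 2012 11:15:45 AM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads",
    "SEVERE: The web application [/frevvo] appears to have started a thread ...",
    "Sep 4, 2012 11:15:45 AM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks",
    "SEVERE: The web application [/frevvo] created a ThreadLocal ...",
    "Sep 4, 2012 11:15:46 AM org.apache.catalina.core.StandardContext startInternal",
    "SEVERE: Error listenerStart",
]

if __name__ == "__main__":
    for line in unexpected_severe(SAMPLE_LOG):
        print(line)
```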

Designer Edits Production Form

If a form is in use when a designer edits it to add, remove, or modify fields, this creates a new form version as usual, but the form currently in use remains at the original version. The form user is not disrupted and continues to use the current form instance until the form is submitted. However, if the cluster node running this form instance fails, the form instance is updated to the new form version when it is reinstantiated on another cluster node, and the user sees a session expiration error message.

This issue should happen infrequently, if ever: designers do not often edit live production forms, and a cluster node must also be shut down or fail at the same time for the end user to see it.

If this scenario ever occurs the following error will be logged to the frevvo log files:

16:39:52.515 |-WARN [ ajp-bio-8009-exec-3] [g.f.s.c.i.ValueHolderImpl] - Could not unmarshall value:
java.lang.IllegalStateException: The formtype is version incompatible. Formtype version = 2, expecting 1

Server Shutdown

Form server nodes can be hot swapped, that is, added to and removed from a cluster dynamically. Once you have signaled one of the servers in your cluster to stop, that server immediately starts rejecting new form requests, waits for existing activities such as rule execution to complete, and only then shuts down. Your servlet container (for example, Tomcat or WebLogic) should handle this correctly.

By default Tomcat waits only 2 seconds for an orderly shutdown. Live Forms servers may need more time, specifically if they are in the process of executing a form business rule. The frevvo-tomcat bundle overrides this default via Tomcat's unloadDelay property and increases the orderly shutdown wait to 40 seconds.

Depending on your specific usage, you can increase this value by editing <frevvo-home>/frevvo/tomcat/conf/Catalina/localhost/frevvo.xml. Edit the first line of frevvo.xml, shown here, to change 40000 ms (40 seconds) to whatever you need.

<Context unpackWAR="false" path="/frevvo" swallowOutput="true" unloadDelay="40000">
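
If the same change has to be applied on several nodes, the edit can be scripted. The Python sketch below rewrites only the unloadDelay value and leaves the rest of the file, including commented-out sections, untouched. It uses a text substitution rather than an XML parser for that reason. The function name set_unload_delay is hypothetical.

```python
import re

def set_unload_delay(context_xml, millis):
    """Return context_xml with the <Context> unloadDelay attribute set to millis."""
    pattern = r'(<Context\b[^>]*\bunloadDelay=")\d+(")'
    updated, count = re.subn(pattern, r"\g<1>%d\g<2>" % millis, context_xml, count=1)
    if count == 0:
        raise ValueError("no <Context> element with an unloadDelay attribute found")
    return updated

if __name__ == "__main__":
    first_line = ('<Context unpackWAR="false" path="/frevvo" '
                  'swallowOutput="true" unloadDelay="40000">')
    print(set_unload_delay(first_line, 60000))
```

Stop the Tomcat server before changing frevvo.xml on a node, and keep a copy of the original file.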