Comment: Migrated to Confluence 5.3

Cluster Configuration 

 


frevvo supports clustered form servers for high availability, load balancing and fault tolerance. The software can be installed and configured either as a stand-alone single form server or as a cluster of form servers behind a load balancer.

Info

Note that only certain versions include clustering. Please refer to the Release Notes for details.

A Form Server Cluster is instrumental in heavy load scenarios where a single form server cannot handle the load alone. Clustering also provides fault tolerance should one node fail, for both design time and use mode forms. If one form server is shut down, its form instances are automatically transferred to another running form server node.

frevvo supports clusters running under either Apache web servers or Oracle WebLogic servers. This chapter details how to enable a frevvo cluster running under an Apache web server.


...

Configure Tomcat Form Servers

The first step is to install the frevvo software on each machine that will become a node in your cluster. To provide scalability and fault tolerance, each form server node will most likely run on a separate physical machine (a distributed cluster). However, it is possible to install two nodes on the same physical machine. This documentation covers both scenarios.

...

  1. Follow the steps documented under Downloading & Installing to install a copy of frevvo on each machine.
  2. Edit <frevvo-home>/frevvo/tomcat/config/server.xml in each installation to:
    1. Set up the AJP jvmRoute.
    2. Uncomment the <Cluster> element.
    3. Make sure channelSendOptions in the <Cluster> element is set to 6 (change it if it is still 8). This makes session replication synchronous so that messages are not received out of order.
  3. Edit <frevvo-home>/frevvo/tomcat/bin/setenv.bat on Windows or setenv.sh on Linux to add the cluster setenv parameters.
  4. Edit <frevvo-home>/frevvo/tomcat/conf/Catalina/localhost/frevvo.xml to:
    1. Uncomment the <Manager> element for DeltaManager.
    2. Ensure that the form server's database configured in frevvo.xml is the same database used for ALL cluster nodes.
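
As a rough illustration of steps 2.1 and 4.1, the fragments below show what a jvmRoute setting and an uncommented DeltaManager element typically look like in a standard Tomcat configuration. This is a sketch only: the node name "node1" is hypothetical, and the Manager attributes shown are common Tomcat settings that may differ from the ones shipped in your frevvo.xml.

```xml
<!-- server.xml: jvmRoute gives this node a unique name for the load balancer.
     "node1" is a hypothetical name; use a different value on each node. -->
<Engine name="Catalina" defaultHost="localhost" jvmRoute="node1">
  <!-- existing Realm, Cluster and Host elements stay here unchanged -->
</Engine>

<!-- frevvo.xml, inside the existing Context element: the session manager
     after its comment markers are removed. Attribute values are assumptions. -->
<Manager className="org.apache.catalina.ha.session.DeltaManager"
         expireSessionsOnShutdown="false"
         notifyListenersOnReplication="true"/>
```

The jvmRoute value normally has to match the route name that the load balancer uses for that node, so that sticky sessions reach the right server.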

...

Search <frevvo-home>/frevvo/tomcat/config/server.xml to locate the <Cluster> element and uncomment it. Before your edits, server.xml contains the <!-- characters before and the --> characters after the many lines that make up the <Cluster> element:

Code Block
<!-- 
   <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
                 channelSendOptions="6">
          <Channel className="org.apache.catalina.tribes.group.GroupChannel">
            <Membership className="org.apache.catalina.tribes.membership.McastService"
                        address="228.0.0.4"
                        port="55555"
                        frequency="500"
                        dropTime="3000"/>
            <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
                      address="auto"
                      port="4000"
                      autoBind="100"
                      selectorTimeout="5000"
                      maxThreads="6"/>
            <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
              <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
            </Sender>
            <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
            <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
          </Channel>
          <Valve className="org.apache.catalina.ha.tcp.ReplicationValve"
                 filter=""/>
          <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve"/>
          <ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener"/>
          <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
      </Cluster>
 -->

After your edits, the comment characters on the first and final lines should be deleted. Note that the McastService port must be the same on all cluster nodes. This will be the case if you simply uncomment the lines already in server.xml. If you need to change this port number, make sure you change it on all cluster nodes.
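
For comparison, here is an abbreviated sketch of the same element after the edits, with the comment markers removed and channelSendOptions set to 6. The values are copied from the block above; the inner elements are shortened to comments here for readability and should stay in your actual file exactly as shipped.

```xml
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
         channelSendOptions="6">
  <Channel className="org.apache.catalina.tribes.group.GroupChannel">
    <!-- address and port must be identical on every node in the cluster -->
    <Membership className="org.apache.catalina.tribes.membership.McastService"
                address="228.0.0.4"
                port="55555"
                frequency="500"
                dropTime="3000"/>
    <!-- Receiver, Sender and Interceptor elements unchanged -->
  </Channel>
  <!-- Valve and ClusterListener elements unchanged -->
</Cluster>
```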

...

To enable single sign on in a cluster, uncomment the following parameter in the <frevvo-home>/frevvo/tomcat/config/server.xml configuration file on every cluster node.

...

frevvo.instances.maxIdle controls how long (in milliseconds) form/flow instances that are being used or edited are kept. It is set in cache-clustered.xml inside frevvo.war. The value has been changed to 1728000000 to match web.xml, which is configured to 8 hours.
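
As a quick unit check on the figures above (plain arithmetic, not a statement about what your shipped configuration contains), converting between hours and milliseconds gives:

```latex
\begin{align*}
8\ \text{h} \times 60\ \tfrac{\text{min}}{\text{h}} \times 60\ \tfrac{\text{s}}{\text{min}} \times 1000\ \tfrac{\text{ms}}{\text{s}} &= 28\,800\,000\ \text{ms} \\
\frac{1\,728\,000\,000\ \text{ms}}{3\,600\,000\ \text{ms/h}} &= 480\ \text{h} = 20\ \text{days}
\end{align*}
```

So a value of 1728000000 ms corresponds to 480 hours rather than 8 hours. If an 8 hour idle limit is intended, it is worth confirming the value against your own cache-clustered.xml and web.xml.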

...


Special Considerations

Upload Control

The upload control is not currently fault tolerant. Uploaded file attachments are stored only on the node that processed the upload. If the node that is processing your form instance shuts down, all attachments are lost. The user will notice this only if they refresh the form or try to add/delete another attachment. In this case the upload control is refreshed and it becomes visible that all prior attachments have been removed. If the user does not refresh the form or add/delete another attachment, the form will continue to show the lost attachments. However, when the form is submitted they will not be part of the submission. This out-of-sync behavior will be addressed in a future software update.

...

A second scenario is a form that is ready to submit because all required fields are filled and a required upload control (max=1) already has an attachment uploaded. If the node processing this form instance shuts down and the user's next action is clicking the submit button, the user will see the form refresh because the upload control attachment was deleted and the form requires an attachment. The form will refresh to its original UI state, meaning the Section controls and Tabs will go back to their initial selected/expanded states. This may seem odd to the form user. At this point all the user has to do is re-upload the attachment and submit the form. The UI state going back to the default will be improved in a future release.

Trying Templates

Trying form templates from the Templates menu is not fault tolerant. If you are trying a form template and the node the form instance is running on fails, the form will not continue operating properly. This is a temporary limitation and may be lifted in a future release.

Service Temporarily Unavailable

This message comes from the Apache load balancer (mod_proxy_balancer). If you see this message, refer to your Apache mod_proxy documentation for two configuration properties:

  • retry (default 60secs) - "Connection pool worker retry timeout in seconds. If the connection pool worker to the backend server is in the error state, Apache will not forward any requests to that server until the timeout expires. This enables to shut down the backend server for maintenance, and bring it back online later. A value of 0 means always retry workers in an error state with no timeout."
  • ping (not set by default)- "Ping property tells the web server to send a CPING request on ajp13 connection before forwarding a request. The parameter is the delay in seconds to wait for the CPONG reply. This feature has been added to avoid problems with hung and busy Tomcat and require ajp13 ping/pong support which has been implemented on Tomcat 3.3.2+, 4.1.28+ and 5.0.13+. This will increase the network traffic during the normal operation which could be an issue, but it will lower the traffic in case some of the cluster nodes are down or busy. Currently this has an effect only for AJP. By adding a postfix of ms the delay can be also set in milliseconds."

Synchronize Clocks

All clustered server machines must have their clocks synchronized.

Severe Startup Log Messages

Some error messages that may get logged in the Tomcat log files can be safely ignored. These errors may occur if you update the frevvo.war file while Tomcat is running. It is best to stop the Tomcat server, then update frevvo.war, then restart the Tomcat server. If you always follow that process you should avoid these log messages.

Code Block
Sep 4, 2012 11:15:45 AM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/frevvo] appears to have started a thread named [MultiThreadedHttpConnectionManager cleanup] but has failed to stop it. This is very likely to create a memory leak.
Sep 4, 2012 11:15:45 AM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/frevvo] created a ThreadLocal with key of type [qh] (value [qh@5bf7253e]) and a value of type [java.lang.Boolean] (value [true]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
Sep 4, 2012 11:15:45 AM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/frevvo] created a ThreadLocal with key of type [org.eclipse.emf.ecore.xml.type.util.XMLTypeUtil.CharArrayThreadLocal] (value [org.eclipse.emf.ecore.xml.type.util.XMLTypeUtil$CharArrayThreadLocal@2006eb91]) and a value of type [char[]] (value [[C@26538d04]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
Sep 4, 2012 11:15:45 AM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/frevvo] created a ThreadLocal with key of type [np] (value [np@374f1544]) and a value of type [java.lang.Boolean] (value [false]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
Sep 4, 2012 11:15:45 AM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/frevvo] created a ThreadLocal with key of type [gt] (value [gt@28fd3fba]) and a value of type [gr] (value [gr@a62e15c]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
Sep 4, 2012 11:15:45 AM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/frevvo] created a ThreadLocal with key of type [np] (value [np@374f1544]) and a value of type [java.lang.Boolean] (value [false]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
Sep 4, 2012 11:15:45 AM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/frevvo] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@61b96457]) and a value of type [org.infinispan.context.SingleKeyNonTxInvocationContext] (value [SingleKeyNonTxInvocationContext{flags=null}]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
Sep 4, 2012 11:15:45 AM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/frevvo] created a ThreadLocal with key of type [org.jgroups.protocols.FlowControl$1] (value [org.jgroups.protocols.FlowControl$1@5dd7e765]) and a value of type [java.lang.Boolean] (value [false]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
Sep 4, 2012 11:15:45 AM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/frevvo] created a ThreadLocal with key of type [org.infinispan.marshall.jboss.AbstractJBossMarshaller$1] (value [org.infinispan.marshall.jboss.AbstractJBossMarshaller$1@998c805]) and a value of type [org.infinispan.marshall.jboss.AbstractJBossMarshaller.PerThreadInstanceHolder] (value [org.infinispan.marshall.jboss.AbstractJBossMarshaller$PerThreadInstanceHolder@d73c52f]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
Sep 4, 2012 11:15:45 AM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/frevvo] created a ThreadLocal with key of type [org.jboss.marshalling.UTFUtils.BytesHolder] (value [org.jboss.marshalling.UTFUtils$BytesHolder@2bb843a4]) and a value of type [byte[]] (value [[B@4e60da68]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
Sep 4, 2012 11:15:45 AM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/frevvo] created a ThreadLocal with key of type [org.jgroups.protocols.FlowControl$1] (value [org.jgroups.protocols.FlowControl$1@38a30a0b]) and a value of type [java.lang.Boolean] (value [false]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.

Designer Edits Production Form

If a form is in use when a designer edits that form to add/remove/modify fields, as usual this creates a new form version but the form currently in use remains at the original version. The form user is not disrupted and continues to use the current form instance until the form is submitted. However if the cluster node running this form instance fails, when the form instance is reinstantiated on another cluster node it gets updated to the new form version. The user will see a session expiration error message.

This issue should happen infrequently, if ever, because designers do not often edit live production forms, and a cluster node must also shut down or fail at the same time for the end user to see it.

If this scenario ever occurs the following error will be logged to the frevvo log files:

Code Block
16:39:52.515 |-WARN [ ajp-bio-8009-exec-3] [g.f.s.c.i.ValueHolderImpl] - Could not unmarshall value:
java.lang.IllegalStateException: The formtype is version incompatible. Formtype version = 2, expecting 1

Server Shutdown

Form server nodes can be dynamically hot swapped (added to and removed from a cluster). Once you have signaled one of the servers in your cluster to stop, that server immediately starts rejecting new form requests, waits for existing activities such as rule execution to complete, and only then shuts down. Your servlet container (for example, Tomcat or WLS) should handle this function correctly.

...

Depending on your specific usage, you can increase this value by editing <frevvo-home>\tomcat\conf\Catalina\localhost\frevvo.xml. Edit the first line of frevvo.xml as shown here to change 40000 ms (40 seconds) to the value you need.

Code Block
<Context unpackWAR="false" path="/frevvo" swallowOutput="true" unloadDelay="40000">