JVM & Virtual Machines - Performance Considerations
When running Java on virtual machines, consider the following:
- As the number of JVM instances per VM increases, so does the overhead, because each additional JVM must be initialised at start-up.
- Memory in Java is managed inside the JVM, so efforts by the host virtual machine to 'optimize memory usage' by removing pages will degrade Java application performance across all co-located JVMs.
If JVMs are co-located, the required heap size for each JVM should be specified explicitly, and the underlying virtual machine should be allocated enough physical memory to cover the sum of all JVM heaps plus the operating system's memory requirements.
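The sizing rule above is simple arithmetic, sketched below. The per-JVM non-heap allowance is an assumption that goes beyond the text (a JVM also uses memory outside the heap, for example thread stacks and metaspace); pass 0 to reproduce the heap-plus-OS sum exactly. VmMemoryBudget is a hypothetical name, and the figures in main are illustrative rather than Paremus recommendations.

```java
public class VmMemoryBudget {
    // Minimum physical memory (MB) for a VM hosting the given co-located JVMs:
    // sum of heaps + optional per-JVM non-heap allowance (assumption) + OS requirement.
    static long requiredVmMemoryMb(long[] heapSizesMb, long nonHeapPerJvmMb, long osMb) {
        long total = osMb;
        for (long heapMb : heapSizesMb) {
            total += heapMb + nonHeapPerJvmMb;
        }
        return total;
    }

    public static void main(String[] args) {
        // Illustrative figures: two Fibres started with -Xmx2048m,
        // a 256 MB non-heap allowance each, and 1 GB for the operating system.
        long[] heapsMb = {2048, 2048};
        System.out.println("Required VM memory: " + requiredVmMemoryMb(heapsMb, 256, 1024) + " MB");

        // Heap ceiling of the current JVM (roughly the -Xmx value, or the JVM default if unset).
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("This JVM's maximum heap: " + maxHeapMb + " MB");
    }
}
```

With these figures the first line printed is "Required VM memory: 5632 MB" (2 x (2048 + 256) + 1024).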
To sidestep these considerations, virtual machine vendors usually suggest that, instead of stacking multiple JVMs within a virtual machine, one should host more application instances per JVM by enlarging the JVM: adding more threads and increasing the heap. However, this approach simply shifts the problem from the VM to the JVM. Large heaps (beyond 4GB) may be required, and these can be problematic because the cost of garbage collection grows with heap size, degrading JVM performance. Custom JVMs with more efficient garbage collection may be purchased to address this degradation, but they do not address the fact that multiple applications are now packed within a single failure domain.
Paremus recommend that the number of Service Fabric Fibre instances (JVM instances) per Atlas Agent (physical or virtual resource) should ideally be limited to 1, and should be no more than 4. If this upper limit needs to be exceeded, your requirements should be discussed with Paremus.
Linux
'GConf ...' D-BUS Library (RedHat 6)
The Linux package dbus-1.2.24-7.el6_3.x86_64, used by desktop applications such as GNOME and EMACS, is known to cause the JVM to crash on RedHat 6 versions of Linux. This package should be uninstalled on servers being used as Service Fabric Fibres.
Note: this no longer appears to be a problem when using Java 8.
Use of 127.0.1.1 Loopback (Debian/Ubuntu)
As reported in the Debian Reference (http://www.debian.org/doc/manuals/debian-reference/ch05.en.html#_the_hostname_resolution):
The IP address 127.0.1.1 in the second line of this example may not be found on some other Unix-like systems. The Debian Installer creates this entry for a system without a permanent IP address as a workaround for some software (e.g., GNOME) as documented in the bug #719621. The <host_name> matches the hostname defined in the "/etc/hostname".
Binding the host name defined in /etc/hostname to the 127.0.1.1 loopback address will cause Service Fabric Fibres to fail to communicate with each other. The host name in /etc/hostname must instead be bound to the public network interface over which you expect the Service Fabric Fibre to communicate.
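A quick way to see which address the JVM itself associates with the machine's host name is to ask java.net.InetAddress, since that is the resolution a Java process on the node will observe. This is a minimal diagnostic sketch, not a Service Fabric tool; HostnameCheck is a hypothetical name. It only reports the address Java picks, so /etc/hosts or DNS still has to be corrected by hand.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostnameCheck {
    // Resolves a host name (or IP literal) and reports whether the address Java
    // picks for it is a loopback address (any 127.x.x.x address, or ::1).
    static boolean isLoopback(String host) {
        try {
            return InetAddress.getByName(host).isLoopbackAddress();
        } catch (UnknownHostException e) {
            throw new UncheckedIOException("Cannot resolve " + host, e);
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        // getLocalHost() resolves this machine's host name, as a Fibre's JVM would.
        InetAddress local = InetAddress.getLocalHost();
        System.out.println(local.getHostName() + " -> " + local.getHostAddress());
        if (local.isLoopbackAddress()) {
            System.out.println("WARNING: the host name resolves to a loopback address (e.g. 127.0.1.1); "
                    + "Fibres on other hosts will not be able to reach this one.");
        }
    }
}
```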
Atlas
Service Fabric start-up failure - opt/container/bin/posh: line 65: exec: java: not found
Ensure that Java is installed on the nodes and that JAVA_HOME is set.
% atlas -f RAN -update=simple
ACTION HOSTID CONTAINER REASON
start test1[1] simple-infra [1..1] (!(group=*))
[test1:install simple-infra.1] Running
[test1:start simple-infra.1] Error: start failed. check -host-status.
%
%
% atlas --host-status=test1
filterAttrs {os.name=Linux, os.arch=x86, host=test1, os.version=2.6.25-14.fc9.i686}
definition null
fabric null
group null
lease null
max 1
uri atlas://test1:4433?192.168.0.221
simple-infra.1 EXIT 127
simple-infra.1.log
-----------------------------------------------------
opt/container/bin/posh: line 65: exec: java: not found
The Service Fabric expects Java to be either on the PATH or specified by the JAVA_HOME environment variable. Typically these are set either system-wide in /etc/environment or /etc/bashrc, or by an individual user in ~/.bashrc.
Atlas is invoked by /etc/init.d/atlas using 'su -c paremus ....', so it should inherit the environment of the 'paremus' user. Alternatively, you can set JAVA_HOME in /opt/atlas-1.1.5/etc/atlas_env.sh.
If a change is made to bashrc or atlas_env.sh, Atlas must be restarted for the new environment to take effect.
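To check what the Fibre's start script is likely to find, the JAVA_HOME/PATH lookup described above can be sketched in Java. This is a minimal sketch under assumptions: it checks $JAVA_HOME/bin/java before the PATH entries (the actual order used by opt/container/bin/posh is not documented here), it looks only for a Unix-style java executable, and JavaLocator is a hypothetical name.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class JavaLocator {
    // A candidate counts only if it is a regular file with the execute permission set.
    private static boolean isExecutableFile(Path p) {
        return Files.isRegularFile(p) && Files.isExecutable(p);
    }

    // Assumed lookup order: $JAVA_HOME/bin/java first, then each PATH entry in turn.
    static Optional<Path> findJava(String javaHome, String path) {
        if (javaHome != null && !javaHome.isEmpty()) {
            Path candidate = Paths.get(javaHome, "bin", "java");
            if (isExecutableFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        if (path != null) {
            for (String dir : path.split(File.pathSeparator)) {
                if (dir.isEmpty()) {
                    continue; // ignore empty PATH entries such as "::"
                }
                Path candidate = Paths.get(dir, "java");
                if (isExecutableFile(candidate)) {
                    return Optional.of(candidate);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Run as the 'paremus' user so the environment matches the one Atlas inherits.
        Optional<Path> found = findJava(System.getenv("JAVA_HOME"), System.getenv("PATH"));
        System.out.println(found.map(p -> "java found at " + p)
                .orElse("java not found: set JAVA_HOME or add java to the PATH"));
    }
}
```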
'libgcc_s.so.1 => not found', or equivalent errors relating to missing 32-bit libraries
Check for missing Atlas dependencies on the Unix platform:
# ldd linux/atlas-agent
    linux-gate.so.1 => (0x00838000)
    libresolv.so.2 => /lib/libresolv.so.2 (0x00b31000)
    libpthread.so.0 => /lib/libpthread.so.0 (0x005b6000)
    libdl.so.2 => /lib/libdl.so.2 (0x0053e000)
    libgcc_s.so.1 => not found
    libc.so.6 => /lib/libc.so.6 (0x00653000)
    /lib/ld-linux.so.2 (0x00fc3000)
Install the missing library:
# yum provides /lib/libgcc_s.so.1
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.linux.duke.edu
 * extras: mirrors.gigenet.com
 * updates: mirror.steadfast.net
libgcc-4.4.7-3.el6.i686 : GCC version 4.4 shared support library
Repo : base
Matched from:
Filename : /lib/libgcc_s.so.1

# yum install libgcc.i686
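When several binaries need checking, the 'not found' entries can be picked out of ldd's output programmatically. The following is a minimal sketch; MissingLibraries is a hypothetical name, and it assumes ldd is on the PATH and reports missing dependencies in the 'name => not found' form shown above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class MissingLibraries {
    // Extracts library names from ldd lines of the form "libgcc_s.so.1 => not found".
    static List<String> findMissing(String lddOutput) {
        List<String> missing = new ArrayList<>();
        for (String line : lddOutput.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.endsWith("=> not found")) {
                missing.add(trimmed.substring(0, trimmed.indexOf("=>")).trim());
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Binary to check; defaults to the agent used in the example above.
        String binary = args.length > 0 ? args[0] : "linux/atlas-agent";
        Process ldd = new ProcessBuilder("ldd", binary).redirectErrorStream(true).start();
        String output;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(ldd.getInputStream(), StandardCharsets.UTF_8))) {
            output = r.lines().collect(Collectors.joining("\n"));
        }
        ldd.waitFor();

        List<String> missing = findMissing(output);
        if (missing.isEmpty()) {
            System.out.println("No missing libraries reported by ldd for " + binary);
        } else {
            missing.forEach(lib -> System.out.println("missing: " + lib));
        }
    }
}
```

For each reported name, yum provides can then identify the package to install, as in the example above.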
Installation
I do not have remote display access to the node on which I intend to install the Service Fabric. Given that I cannot run IzPack on the node, how should I install the Service Fabric software?
The IzPack installer has an option to create an install script, which may be used to install the Service Fabric on a compute node from the command line; see headless installation.
Troubleshooting Multicast
If no initial peers are set, the Paremus Service Fabric will attempt to use IP multicast to discover Fibres.
By default the discovery process uses the following IP address and ports:
224.0.1.84 jini-announcement - port 4160
224.0.1.85 jini-request - port 4160
If you intend to use multicast and find that a subset of your Fibre population is unable to join a Service Fabric, check that multicast is configured appropriately.
Also note that, on a single machine, some configurations do not propagate multicast packets between processes.
Testing Multicast with the nicnack tool.
nicnack.zip is a lightweight command-line tool that can be used to provide initial diagnostics of network and host support for IP multicast. nicnack is network-interface aware, i.e. it provides separate information for each network interface.
nicnack may run in one of three modes:
List mode
This just lists all network interfaces on the machine, along with ip and host name information. For example:
./nicnack.sh list
NICs:
display name: vmnet8 name: vmnet8 ipv6: fe80:0:0:0:250:56ff:fec0:8 ip: 172.16.206.1
display name: vmnet1 name: vmnet1 ipv6: fe80:0:0:0:250:56ff:fec0:1 ip: 192.168.32.1
display name: wlan0 name: wlan0 ipv6: fe80:0:0:0:211:50ff:fe1b:d845 ip: 192.168.2.2
display name: lov name: lo ipv6: 0:0:0:0:0:0:0:1 ip: 127.0.0.1 -> localhost
Send mode
In this mode nicnack multicasts a custom message every two seconds to a specified multicast group and port. This message can be read by other nicnack instances running in receive mode.
> ./nicnack.sh send 225.123.123.123 12345 hello
vmnet8: preparing
vmnet1: preparing
lo: preparing
wlan0: preparing
vmnet8: configured socket
vmnet1: configured socket
wlan0: configured socket
vmnet8: sending
vmnet1: sending
vmnet1: sent hello
wlan0: sending
wlan0: sent hello
vmnet8: sent hello
lo: configured socket
lo: sending
lo: sent hello
vmnet1: sent hello
vmnet8: sent hello
lo: sent hello
wlan0: sent hello
vmnet1: sent hello
wlan0: sent hello
vmnet8: sent hello
lo: sent hello
Receive mode
In this mode nicnack listens for, and displays, messages sent to a specified multicast group and port by nicnack instances in send mode. Used in conjunction with send mode, this establishes whether or not multicast traffic is successfully propagated between two hosts. Note that multicast visibility is not symmetric: host A's ability to send multicast packets to host B does not imply host B's ability to send packets to host A. A sample from a receive-mode nicnack session follows.
> ./nicnack.sh receive 225.123.123.123 12345
vmnet8: preparing
lo: preparing
wlan0: preparing
vmnet1: preparing
wlan0: configued socket
wlan0: receiving
vmnet1: configued socket
vmnet1: receiving
vmnet8: configued socket
vmnet8: receiving
lo: configued socket
lo: receiving
wlan0: received 'hello'
vmnet1: received 'hello'
vmnet8: received 'hello'
lo: received 'hello'
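Where nicnack is not to hand, the same send/receive test can be approximated with the JDK's java.net.MulticastSocket. The following is a minimal sketch, not nicnack's source: it tests one named network interface at a time (nicnack covers every interface), and the class name MulticastProbe and its argument order are hypothetical.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.NetworkInterface;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;

public class MulticastProbe {
    // True if ip is an address in the multicast range (224.0.0.0/4 for IPv4).
    static boolean isMulticastGroup(String ip) {
        try {
            return InetAddress.getByName(ip).isMulticastAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }

    // Like nicnack's send mode: multicast msg to group:port every two seconds.
    static void send(InetAddress group, int port, NetworkInterface nif, String msg)
            throws IOException, InterruptedException {
        try (MulticastSocket socket = new MulticastSocket()) {
            socket.setNetworkInterface(nif);
            socket.setTimeToLive(1); // local subnet only (also Java's default); raise to cross multicast routers
            byte[] data = msg.getBytes(StandardCharsets.UTF_8);
            while (true) {
                socket.send(new DatagramPacket(data, data.length, group, port));
                System.out.println(nif.getName() + ": sent " + msg);
                Thread.sleep(2000);
            }
        }
    }

    // Like nicnack's receive mode: join the group on the given interface and print what arrives.
    static void receive(InetAddress group, int port, NetworkInterface nif) throws IOException {
        try (MulticastSocket socket = new MulticastSocket(port)) {
            socket.joinGroup(new InetSocketAddress(group, port), nif);
            byte[] buf = new byte[1500];
            while (true) {
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                socket.receive(packet);
                String text = new String(packet.getData(), packet.getOffset(), packet.getLength(),
                        StandardCharsets.UTF_8);
                System.out.println(nif.getName() + ": received '" + text + "' from "
                        + packet.getAddress().getHostAddress());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 4) {
            System.err.println("usage: java MulticastProbe send|receive <interface> <group> <port> [message]");
            System.exit(1);
        }
        NetworkInterface nif = NetworkInterface.getByName(args[1]);
        if (nif == null) throw new IllegalArgumentException("No such interface: " + args[1]);
        if (!isMulticastGroup(args[2])) throw new IllegalArgumentException("Not a multicast group: " + args[2]);
        InetAddress group = InetAddress.getByName(args[2]);
        int port = Integer.parseInt(args[3]);
        if ("send".equals(args[0])) {
            send(group, port, nif, args.length > 4 ? args[4] : "hello");
        } else {
            receive(group, port, nif);
        }
    }
}
```

For example, run java MulticastProbe receive wlan0 225.123.123.123 12345 on one host and java MulticastProbe send wlan0 225.123.123.123 12345 hello on another (substituting each host's interface name), then swap the roles, since multicast visibility is not symmetric.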
If, having run the nicnack utility, you suspect that IP multicast may be the issue, the following two areas should be examined in more detail.
Firewalls
The security firewall on one, a subset, or all of your machines that are running the Paremus Service Fabric environment may be configured by default to block IP multicast traffic.
- Linux - To enable multicast send/receive capability on Linux systems, insert the following rule into the operating system's iptables configuration (for example, using iptables -I):
INPUT -d 224.0.0.0/4 -j ACCEPT
- Windows - On Windows XP, the Group Policy settings for the Windows Firewall are "Not Configured" for all objects by default. This allows the Windows Firewall to use its default settings, which are quite restrictive. With respect to multicast, the default settings prohibit unicast responses to multicast or broadcast requests.
On the relevant machines, edit Network > Network Connections > Firewall and set the policy below to Disabled. The available options are:
Prohibit unicast response to multicast or broadcast requests:
- Not Configured - The incoming unicast response is accepted if received within 3 seconds. The setting can be overridden by a local administrator.
- Enabled - The incoming unicast response is dropped. This cannot be overridden by a local administrator.
- Disabled - The incoming unicast response is accepted if received within 3 seconds. This cannot be overridden by a local administrator.
For other firewall products, or other Microsoft operating system versions, check the relevant documentation.
Network Configuration
Simple layer II network switches treat multicast traffic in the same manner as broadcast traffic; that is, they forward multicast packets to all active switch ports. If your Paremus Service Fabric test machines connect to a layer II network comprising one or more simple layer II Ethernet switches (interconnected without intervening layer III routers), then the network is unlikely to be the cause of IP multicast connectivity issues.
In more sophisticated environments, the network infrastructure supports a multicast protocol known as IGMP. Within an IGMP-enabled network environment, traffic associated with a multicast group is only forwarded to ports that have members participating in that group. A layer II switch supporting IGMP snooping can passively snoop on IGMP Query, Report and Leave (IGMP version 2) packets transferred between IP multicast routers/switches and IP multicast hosts on each switch port, to learn the multicast group membership required by each port. The advantage of IGMP snooping is that it generates no additional network traffic while significantly reducing the multicast traffic passing through your switch, because multicast traffic is forwarded only to hosts that have registered interest in the multicast group.
If the Paremus Service Fabric functions correctly when run on a single machine, and also when run in a distributed environment with machines connected via a simple layer II network, but fails in a more complex network environment, then the network's multicast configuration is the most likely cause.
In such circumstances politely explain the problem to your network administrators. The network administrators will be able to help you diagnose the issue in greater detail, and may be willing/able to disable IGMP snooping on the relevant switches to verify whether or not IGMP is a contributing factor.
Service Fabric License Management.
How do I install / update a Service Fabric license?
Usually the Service Fabric license is installed during the IzPack installation process. If for some reason the appropriate license.ini file is not available at installation time, it can subsequently be copied into the $install_root/etc directory. The Fibre image should then be rebuilt as explained here. The same process is used to update an expired license.
System Documents
How do I request multiple instances of a Managed Service Factory part?
Managed Service Factory (MSF) parts can be created as follows:
<system.part category="msf" name="com.example.hello">
    <property name="language" value="en"/>
</system.part>
In this example, the name attribute specifies both the name of the element within the document and the Persistent Identity (PID) of the configuration record.
To create multiple configuration records for the same PID, create two parts with different names and override the name/PID mapping. This is done by setting the part attribute as follows:
<system.part category="msf" name="hello-english" part="com.example.hello">
    <property name="language" value="en"/>
</system.part>
<system.part category="msf" name="hello-german" part="com.example.hello">
    <property name="language" value="de"/>
</system.part>
Note that the name attributes differ, in order to create two distinct records, but both records have the same part value and therefore both map to the configuration PID com.example.hello.
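The name/part rule above can be modelled in a few lines to make the mapping explicit: each part yields one record, and the PID comes from part when present, otherwise from name. This is a hypothetical model for illustration only, not the Service Fabric's own code; the classes MsfPartMapping and SystemPart are invented here, and rejecting duplicate names is an assumption based on the need for distinct names described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MsfPartMapping {
    // Hypothetical model of a <system.part category="msf"> element.
    static final class SystemPart {
        final String name;                    // element name within the document
        final String part;                    // optional "part" attribute; null if absent
        final Map<String, String> properties; // the nested <property> elements

        SystemPart(String name, String part, Map<String, String> properties) {
            this.name = name;
            this.part = part;
            this.properties = properties;
        }

        // The PID comes from "part" when present, otherwise from "name".
        String pid() {
            return part != null ? part : name;
        }
    }

    // Groups part names by the PID they map to; each part yields one configuration record.
    static Map<String, List<String>> recordsByPid(List<SystemPart> parts) {
        Set<String> names = new HashSet<>();
        Map<String, List<String>> byPid = new LinkedHashMap<>();
        for (SystemPart p : parts) {
            if (!names.add(p.name)) {
                throw new IllegalArgumentException("Duplicate part name: " + p.name);
            }
            byPid.computeIfAbsent(p.pid(), k -> new ArrayList<>()).add(p.name);
        }
        return byPid;
    }

    public static void main(String[] args) {
        List<SystemPart> parts = Arrays.asList(
                new SystemPart("hello-english", "com.example.hello", Collections.singletonMap("language", "en")),
                new SystemPart("hello-german", "com.example.hello", Collections.singletonMap("language", "de")));
        // prints {com.example.hello=[hello-english, hello-german]}
        System.out.println(recordsByPid(parts));
    }
}
```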
Cannot connect to Fabric Management
If the redirector fails to locate Entire, it prints the error message "Unable to redirect". This can happen if:
- There is no Infra Fibre currently running in the Fabric.
- A network issue prevents the selected Fibre from seeing the active Infra Fibre(s).
Repositories blocked by Proxy
When using Atlas, place the equivalent of the following in the $FABRICHOME/var/atlas start scripts:
System.setProperty("http.proxyHost", "localhost");
System.setProperty("http.proxyPort", "8080");
Alternatively, if you are using an Environment configuration file, use the following JVM start flags:
-Dhttp.proxyHost=XX.XX.XX.XX -Dhttp.proxyPort=8080
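To confirm that the proxy properties have taken effect, the JVM's default java.net.ProxySelector can be asked which proxy it would use for a repository URL. A minimal sketch, assuming the same localhost:8080 proxy as the example above; it also sets the standard https.proxyHost/https.proxyPort properties, which only matter if a repository is reached over HTTPS (an assumption about your repositories). ProxySettings and the repository URL are hypothetical.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

public class ProxySettings {
    // Sets the standard JDK proxy properties for plain HTTP and for HTTPS.
    static void configureProxy(String host, int port) {
        String p = Integer.toString(port);
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", p);
        System.setProperty("https.proxyHost", host);
        System.setProperty("https.proxyPort", p);
    }

    // Asks the default ProxySelector which proxy it would use for the given URL.
    static Proxy proxyFor(String url) {
        List<Proxy> proxies = ProxySelector.getDefault().select(URI.create(url));
        return proxies.isEmpty() ? Proxy.NO_PROXY : proxies.get(0);
    }

    public static void main(String[] args) {
        configureProxy("localhost", 8080);
        // Hypothetical repository URL, used only to exercise the selector.
        Proxy proxy = proxyFor("http://repo.example.com/index.xml");
        System.out.println(proxy.type() + " proxy at " + proxy.address());
        if (proxy.address() instanceof InetSocketAddress) {
            InetSocketAddress a = (InetSocketAddress) proxy.address();
            System.out.println("host=" + a.getHostString() + " port=" + a.getPort());
        }
    }
}
```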
Viewing Port Usage
View the ports currently in use by a Fibre using the Unix lsof command:
$ lsof -P -p <PID> | grep LISTEN
java 14523 derek 12u IPv6 0x12fd65a8 0t0 TCP *:55087 (LISTEN)
java 14523 derek 15u IPv6 0x13266720 0t0 TCP *:55088 (LISTEN)
... ... ...
java 14523 derek 41u IPv6 0x13329664 0t0 TCP *:55094 (LISTEN)
java 14523 derek 53u IPv6 0x1331eff4 0t0 TCP [::10.0.1.9]:9012 (LISTEN)
java 14523 derek 54u IPv6 0x131bce4c 0t0 TCP *:9013 (LISTEN)
java 14523 derek 56u IPv6 0x12fd7ff4 0t0 TCP *:55114 (LISTEN)
java 14523 derek 59u IPv6 0x1304c720 0t0 TCP *:55119 (LISTEN)
... ... ...
java 14523 derek 151u IPv6 0x12fd6a70 0t0 TCP *:9021 (LISTEN)
java 14523 derek 154u IPv6 0x1326480c 0t0 TCP *:55203 (LISTEN)
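To compare the listening ports against a Fibre's configured range, the lsof output can be reduced to a sorted list of port numbers. A minimal sketch, assuming the lsof -P column layout shown above (the NAME column ends in address:port (LISTEN)); ListeningPorts is a hypothetical name, and it reads lsof's output from standard input.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ListeningPorts {
    // Matches "TCP <address>:<port> (LISTEN)". The greedy \S* makes the port the digits
    // after the last colon, so addresses such as [::10.0.1.9]:9012 are handled too.
    private static final Pattern LISTEN = Pattern.compile("TCP \\S*:(\\d+) \\(LISTEN\\)");

    static List<Integer> listeningPorts(String lsofOutput) {
        List<Integer> ports = new ArrayList<>();
        Matcher m = LISTEN.matcher(lsofOutput);
        while (m.find()) {
            ports.add(Integer.parseInt(m.group(1)));
        }
        return ports;
    }

    public static void main(String[] args) {
        // Usage: lsof -P -p <PID> | java ListeningPorts.java
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        List<Integer> ports = listeningPorts(in.lines().collect(Collectors.joining("\n")));
        Collections.sort(ports);
        System.out.println(ports);
    }
}
```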
Setting Port Ranges
The base port (posh.basePort) is defined in /etc/posh/environment. Each Fibre uses a default port range starting at its base port; the base port is incremented by 100 for each additional Fibre started on the same host (9000 + (100 * instance-id)):
system:setProperty posh.basePort (expr 9000 + 100 * $INSTANCE)
This value is then used in etc/init.d/00-config as follows:
####################################################################
# PORT & NETWORK SETTINGS
####################################################################
setcfg basePort (system:getproperty posh.basePort)
basePort = (getcfg basePort)

# http server port
setsys org.osgi.service.http.port (expr $basePort + 0)

# jmx server port
setcfg jmxPort (expr $basePort + 1)

# port used by clients outside the fabric to discover service locations
# locationPort must also be changed in FabricAdmin/config.ini
setcfg locationPort 49150

# limit the port ranges used to export remote objects
setcfg minPort (expr $basePort + 10)
setcfg maxPort (expr $basePort + 99)
#setcfg maxPort 65535
- The locationPort parameter is used by the manage:detect command. locationPort is not offset by instance-id because, when multiple Nimble nodes are running on the same host, only one of them needs to use the locationPort.
- minPort and maxPort control the range of dynamically allocated ports for each Fibre.
- The httpPort is the default value used by an HTTP service if not overridden by the OSGi Configuration Admin (usually in etc/nimble/cm.policy).
By default, the first Fibre on a host is assigned ports in the range 9000 -> 9099, with dynamically allocated ports drawn from minPort (9010) to maxPort (9099). This range can, however, be narrowed.
If the port range is too restrictive and too many services are installed into the same Fibre, the Fibre may run out of ports (port starvation). Changes to these values should be discussed with your Paremus Support Engineer before they are made.
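The port arithmetic in 00-config can be sketched in Java to see which ports each Fibre on a host would use, and how many dynamic ports it has before starvation becomes a risk. A minimal sketch, assuming instance-ids start at 0 and maxPort is inclusive; FibrePorts is a hypothetical name, and the constants mirror the defaults shown above rather than being read from the configuration.

```java
public class FibrePorts {
    static final int BASE = 9000;           // base port of the first Fibre on a host
    static final int STRIDE = 100;          // ports reserved per Fibre instance
    static final int LOCATION_PORT = 49150; // shared; not offset by instance-id

    final int basePort;

    FibrePorts(int instanceId) {
        this.basePort = BASE + STRIDE * instanceId; // posh.basePort = 9000 + 100 * $INSTANCE
    }

    int httpPort() { return basePort; }       // org.osgi.service.http.port = basePort + 0
    int jmxPort() { return basePort + 1; }
    int minPort() { return basePort + 10; }   // lowest port for dynamically exported objects
    int maxPort() { return basePort + 99; }   // highest such port (assumed inclusive)
    int dynamicPortCount() { return maxPort() - minPort() + 1; }

    public static void main(String[] args) {
        for (int id = 0; id < 3; id++) {
            FibrePorts p = new FibrePorts(id);
            System.out.printf("instance %d: http=%d jmx=%d dynamic=%d-%d (%d ports), location=%d%n",
                    id, p.httpPort(), p.jmxPort(), p.minPort(), p.maxPort(), p.dynamicPortCount(), LOCATION_PORT);
        }
    }
}
```

With the defaults, each Fibre has 90 dynamically allocated ports; narrowing minPort/maxPort reduces that number directly.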