Thursday, December 9, 2010

High Performance Java Environments

For the past several weeks I have been monitoring a set of Java business tier and portal servers for performance issues related to load and possibly configuration.  Both servers run in Windows 64 bit with 64 bit JVMs.  The portal server is using low-pause concurrent mark sweep garbage collection while the other is using a more traditional 32 bit garbage collection algorithm.  I am considering adding additional CPU cores to each box.

While I was trying to work inside the tight boundaries of Windows processing, I longed for the Linux world of Java Application Servers.  Where I work, Linux is not an option, so I started to explore additional options.  That is when I re-stumbled upon Azul Systems. Azul has two products, Zing and Vega 3.  I have asked for a demo/trial of the Zing product.  Zing seems to be a specialized JVM and then some.  It requires 16-24 GB minimum RAM and 4-6 CPU cores, also minimum.  Zing boasts the ability to execute JVM heap sizes of up to 1 TB.

My plan is to test the Zing product to increase performance and possibly reduce server counts.

Java Parallelization Options

Last night I taught a class on Monte Carlo Simulation (MSC) using Excel and Crystal Ball (Oracle).  This was part of an ongoing course on System Modeling Theory (a.k.a. Management Science) that I am teaching at Strayer University.  In modeling we use MCS to simulate the probability distribution of uncertain model parameters.  This helps us understand the uncertainty and potential risk of varying input parameters of the problems we are attempting to solve with our models.

As I was executing the simulation in Excel/Crystal Ball with a Normal Distribution Curve and 1000 trials, my mind started to wonder about how I would do this in Java.  Given my experience with Java numerical computation I theorized that I would need more resources than just my laptop if I were to pursue more complex model simulations that included many more uncertain input parameters and model permutations in a JRE.

With the advocacy of Cloud Computing everywhere these days, I have been tracking the progress of Java-based parallel and grid computing efforts.  I have noticed a few solutions that would seem to fit the bill for the more complex numerical data computation that I think I would need to tackle complex problems and financial models with Java.

According to its developers, Hadoop is open-source software for “reliable, scalable, distributed computing.”  Hadoop consists of several sub-projects, some of which that have been promoted to top-level Apache projects.  Some of the contributors of the Hadoop project are from Cloudera.  Cloudera offers a commercialized version of Hadoop with enterprise support, similar to the model that Redhat has with its RHEL/Fedora and JBoss platforms.

In a nutshell, the idea behind Hadoop’s MapReduce project, and its associated projects (HDFS, HBase, etc.), is to perform complex analyses on extremely large (multi-terabyte) data sets of structured and/or unstructured data.  The storage and processing of these huge data structures are distributed across multiple, relatively inexpensive, computers and/or servers (called nodes), instead of very large systems.  The multiple nodes form clusters.  The premise behind Hadoop as I understand it is to encapsulate and abstract the distributed storage and processing so that the developers do not have to manage that distributed aspect of the program.

Hadoop’s MapReduce project, written in Java, is based on Google’s MapReduce, written in C++.  It is used to split the huge data sets into more manageable and independent chunks of data that get processed in parallel with each other. Hadoop  MapReduce works in tandem with HDFS to store and process these data chunks on distributed computing nodes within the distributed cluster.  The use of MapReduce requires Java developers to learn the Hadoop MapReduce API and commands.

Grid Gain
Grid Gain is another solution for distributed processing, including MapReduce computation across potentially inexpensive distributed computer nodes.  According to Grid Gain, their product is a “Java-based grid computing middleware.  There are many features to this product, including what they call “Zero Deployment."

While Hadoop comes with HDFS that can be used to process unstructured data, Grid Gain does not use its own file system, but instead connects to existing relational databases such as Oracle and MySql.  Hadoop can use its own high performance HBase database as well and I have heard of a connector to MySql.  Hadoop seems to provide more isolation for task execution by spinning up multiple JVMs per task execution.  Grid Gain seems to come with more tools for cloud computing and management.  Finally, though Hadoop is written in Java, its MapReduce functionality can be used by non-Java programs.

Aparapi is another API that provides parallel Java processing.  Unlike Hadoop and Grid Gain, Aparapi translates Java executable code to OpenCL.  OpenCL is Apple’s patented framework for parallel programming.  The fascinating aspect of Aparapi and OpenCL is what it is designed to execute on.  OpenCL uses Graphical Process Units (GPU) for parallel processing.

In my past life I was more connected to hardware than I am today and I worked with Digital Signal Processors (DSP) and Field Programmable Gate Arrays (FPGA) with Analog to Digital Converters (ADC) and Digital to Analog Converters (DAC).  We used DSPs to process waveform data in near real-time.  We would capture the waveform data on our I/O cards and then offload the processing and transforms to a DSP.
I guess this is why OpenCL interests me so much.  With OpenCL, developers can write code that gets compiled at run time so that it is optimized to run on existing GPUs in a computer or against multiple GPUs in multiple computers.  Based on the “C” language OpenCL allows developers to use graphics chips like those from NVidia.  Imagine that for a moment…while most of the parallel processing world is harnessing grid and cloud computing power, OpenCL is focusing on a much cheaper hardware footprint.  In fact, Apple developers can use OpenCL on their Macs to harness the computer power of the installed GPU to perform high performance computing tasks.

With Aparapi, Java developers can now translate their code to be executed in the OpenCL framework.  The use of GPUs for parallel non-video processing is called General Purpose Computing on Graphics Processing Units (GPGPU).  Unlike CPUs that execute single threads very fast, thereby giving the illusion of multi-threading, GPUs have a parallel architecture that allows true simultaneous execution of threads.

Beyond Aparapi there is JCUDA, JOpenCL, and JOCL.  While JCUDA, JOpenCL and JOCL are based on JNI wrappers of OpenCL and NVidia’s CUDA, Aparapi takes a different approach and uses byte code analysis to translate Java to OpenCL executable code.

It remains to be seen which platforms and techniques will emerge as the standard.  More to come as I explore some of the Java parallel programming.