Sunday, May 30, 2010

Architectural improvements for JVM to be enterprise ready

Time-proven architecture for long-running services


I've observed that once in a while our long-running Tomcat instance gets slower and slower until it (the JVM) is restarted. The Tomcat developers are surely experts in enterprise Java, so why does this still happen, and how can it be fixed? One could look for specific problems in the code, but a much better approach is to adopt a time-proven architecture that is well known from traditional Unix daemons:

  1. The master daemon process is started.

  2. The master daemon process spawns some child processes to handle client requests.

  3. After handling a limited number of requests, a child process terminates itself (or is terminated by the master). The key point here is that the OS frees any resources (e.g., memory, file handles, sockets) allocated to that child process; nothing can be left behind after its death.


This architecture keeps a long-running service from degrading over time, even if the code is poorly written and leaks resources.

How can Java support this architecture?


So, does Java currently support this? The answer is no. To support it:

  1. The JVM needs to support the concept of an isolated process.

  2. Objects created by the JVM itself and by different processes must NOT be allowed to refer to one another; otherwise there would be no way to cleanly kill a process.

  3. The JVM must support efficient forking so that a child process can be created quickly. This way, the master process can perform all kinds of lengthy initialization once (e.g., Hibernate and Spring initialization), and spawning a child in the already-initialized state is still very quick.
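
Lacking these features, the closest approximation today is to launch a whole new JVM for every worker. The following is only a minimal sketch of that idea, assuming Java 9 or later and a compiled class on the class path; the class name RecyclingMaster, the request limit and the placeholder request loop are all hypothetical. Note that each worker pays the full JVM startup and initialization cost, which is exactly what requirement 3 would avoid.

```java
import java.io.IOException;
import java.nio.file.Paths;
import java.util.List;

public class RecyclingMaster {
    // Hypothetical limit: each worker JVM serves this many requests, then exits.
    public static final int MAX_REQUESTS_PER_WORKER = 3;

    /** Builds the command line that starts a fresh worker JVM running this same class. */
    public static List<String> workerCommand(int maxRequests) {
        String javaBin = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        String classPath = System.getProperty("java.class.path");
        return List.of(javaBin, "-cp", classPath, "RecyclingMaster",
                "worker", String.valueOf(maxRequests));
    }

    /** Worker body: handle a bounded number of requests, then return so the JVM can exit. */
    public static int runWorker(int maxRequests) {
        int handled = 0;
        while (handled < maxRequests) {
            // Placeholder: a real worker would accept and serve a client request here.
            handled++;
        }
        return handled;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length == 2 && args[0].equals("worker")) {
            int handled = runWorker(Integer.parseInt(args[1]));
            System.out.println("worker " + ProcessHandle.current().pid()
                    + " handled " + handled + " requests, exiting");
            return; // when the JVM exits, the OS reclaims everything this worker held
        }
        // Master: start a worker, wait for it to retire, then start a replacement.
        // Bounded to three generations here so the sketch terminates.
        for (int generation = 0; generation < 3; generation++) {
            Process worker = new ProcessBuilder(workerCommand(MAX_REQUESTS_PER_WORKER))
                    .inheritIO()
                    .start();
            int exitCode = worker.waitFor();
            System.out.println("master: worker exited with code " + exitCode);
        }
    }
}
```

Running `java RecyclingMaster` after compiling starts three short-lived worker JVMs in sequence. Anything a worker leaks disappears with its process, but nothing initialized in the master is inherited by the workers.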

Friday, May 21, 2010

Open letter to certification providers

Dear certification providers,
In recent months I've interviewed three candidates with CCNP certifications, and I was very disappointed to find that two of them didn't know how a switch differs from a hub or how it learns MAC addresses. My first reaction was disappointment at the lack of integrity of these people and at the general availability of brain dumps or even real test questions. However, looking at it more positively, this has prompted me to think about what we can do to help you improve the situation.
If we consider certification as a service from the perspective of quality management, then we can clearly see a huge problem in it: the exam candidate is both a client and a user (and also a piece of client-provided material :-) ), but there are many other users of this service out there: prospective employers, job interviewers, peers, etc. A key requirement in quality management is to measure whether the users are satisfied with the service output. Obviously, as a prospective employer, I am very unhappy with the service output because the CCNP certificate does not reflect the real expertise of the certificate holders. However, none of you provides a way for unhappy users like me to give feedback on your service output. Without such a feedback mechanism, I really don't see how you can ensure the quality of your service.
Therefore, I'd request that you establish a mechanism to accept feedback. For example, provide a website that lets me report certificate holders who obviously know little about the subject matter. Then follow up with an investigation, re-certify as required and revoke their certificates. Just like digital certificates whose private keys have been compromised, such revoked certificates should be published on a website, much like a CRL.
That covers the handling of an individual incident. If there is a significant number of such incidents, you should escalate them into a problem (in ITIL terms). That means you must then identify the root cause and plug the hole to prevent similar incidents, for example by introducing performance-based exams, reference-based certifications or whatever else it takes to fix the problem and save your reputation.