
Posts

Showing posts from November, 2012

Forcing Tomcat to log through SLF4J/Logback

So you have your executable web application in a JAR with bundled Tomcat (make sure to read that one first). However, there are these annoying Tomcat logs at the beginning, independent of our application logs and not customizable:

Nov 24, 2012 11:44:02 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["http-bio-8080"]
Nov 24, 2012 11:44:02 PM org.apache.catalina.core.StandardService startInternal
INFO: Starting service Tomcat
Nov 24, 2012 11:44:02 PM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.30
Nov 24, 2012 11:44:05 PM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler ["http-bio-8080"]

I would really like to quiet them down, or even better save them somewhere, since they sometimes reveal important failures. But I definitely don't want to have a separate java.util.logging configuration. Did you wonder after reading the previous article how I knew th…
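The excerpt is cut off before it shows how the redirection is done, so the following is only a sketch of the general idea, not the article's solution. Tomcat logs through java.util.logging, and records can be intercepted by replacing the root logger's handlers with a single forwarding handler. A common production route is the SLF4JBridgeHandler from the jul-to-slf4j library. To stay dependency-free, this stand-in (the hypothetical `JulRedirect` and `CollectingHandler`) collects formatted lines in a list where a real bridge would call an SLF4J logger.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical helper: takes over java.util.logging output in one place.
final class JulRedirect {

    // Stand-in for a bridge handler; a real one would forward to SLF4J instead of a list.
    static final class CollectingHandler extends Handler {
        final List<String> lines = Collections.synchronizedList(new ArrayList<>());

        @Override
        public void publish(LogRecord record) {
            if (isLoggable(record)) {
                lines.add(record.getLevel() + " " + record.getLoggerName() + " - " + record.getMessage());
            }
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    // Removes the default handlers (including the console one) from the root logger
    // and installs the collecting handler in their place.
    static CollectingHandler install() {
        Logger root = Logger.getLogger("");
        for (Handler existing : root.getHandlers()) {
            root.removeHandler(existing);
        }
        CollectingHandler handler = new CollectingHandler();
        root.addHandler(handler);
        return handler;
    }
}
```

Because child loggers delegate to their parents' handlers by default, anything logged under names such as org.apache.catalina ends up in the installed handler, with no separate java.util.logging configuration file.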

Standalone web application with executable Tomcat

When it comes to deploying your application, simplicity is the biggest advantage. You'll understand that especially when the project evolves and needs some changes in the environment. Packaging up your whole application in one standalone, self-sufficient JAR seems like a good idea, especially compared to installing and upgrading Tomcat in the target environment. In the past I would typically include Tomcat JARs in my web application and write a thin command-line runner using the Tomcat API. Luckily there is a tomcat7:exec-war Maven goal that does just that. It takes your WAR artifact and packages it together with all Tomcat dependencies. At the end it also adds Tomcat7RunnerCli as the Main-Class in the manifest.

Curious to try it? Take your existing WAR project and add the following to your pom.xml:

<plugin>
    <groupId>org.apache.tomcat.maven</groupId>
    <artifactId>tomcat7-maven-plugin</artifactId>
    <version>2.0</version>
    <executions>
        …

Parallelization of a simple use case explained

Some time ago a friend of mine asked me about the possibilities of speeding up the following process: they are generating some data in two stages, reading from a database and processing the results. Reading takes approximately 70% of the time and processing takes the remaining 30%. Unfortunately they cannot simply load all the data into memory, so they split reading into much smaller chunks (pages) and process each page once it is retrieved, interleaving these two stages in a loop. Here is the pseudo-code of what they have so far:

public Data loadData(int page) {
    //70% of time...
}

public void process(Data data) {
    //30% of time...
}

for (int i = 0; i < MAX; ++i) {
    Data data = loadData(i);
    process(data);
}

His idea for improving the algorithm was to somehow start fetching the next page of data while the current page is still being processed, thus reducing the overall run time of the algorithm. He was correct, but didn't know how to put this into Java code, not bei…
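The prefetching idea can be sketched as follows. This is one possible approach under stated assumptions, not necessarily the solution the full article arrives at. It uses CompletableFuture (Java 8 or later) and a single background thread; the `PrefetchingLoop` class and its `run` method are hypothetical names, and the loader and processor are passed in as functions that stand in for loadData and process.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.IntFunction;

// Hypothetical helper: overlaps loading of page i+1 with processing of page i.
final class PrefetchingLoop {

    static <T> void run(int max, IntFunction<T> loader, Consumer<T> processor) {
        if (max <= 0) {
            return;
        }
        // One background thread is enough: at most one page is in flight at a time.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<T> next = CompletableFuture.supplyAsync(() -> loader.apply(0), pool);
            for (int i = 0; i < max; ++i) {
                // Wait for the page that was requested during the previous iteration.
                T current = next.join();
                if (i + 1 < max) {
                    final int nextPage = i + 1;
                    // Start loading the following page before processing the current one.
                    next = CompletableFuture.supplyAsync(() -> loader.apply(nextPage), pool);
                }
                // Processing happens on the calling thread, in page order.
                processor.accept(current);
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Pages are still processed strictly in order, and only two pages are held in memory at once. If loading and processing overlap perfectly, the total run time approaches the loading time alone, that is roughly 70% of the original under the proportions quoted above.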

Remote actors - discovering Akka

Assume our test application became a huge success and a single server is gradually no longer capable of handling the growing traffic. We are faced with two choices: replacing our server with a better one (scaling up) or buying a second one and building a cluster (scaling out). We've chosen to build a cluster as it's easier to scale in the future. However, we quickly discover that our application no longer fulfils the very first requirement:

The client application should call the URL [...] at most from one thread - it's forbidden to concurrently fetch random numbers using several HTTP connections.

Obviously every node in the cluster is independent, having its own, separate instance of Akka, and thus a separate copy of the RandomOrgClient actor. In order to fix this issue we have a few options:

Non-blocking I/O - discovering Akka

Here comes the time to follow some good practices when implementing actors. One of the most important rules we should follow is avoiding any blocking input/output operations, polling, busy waiting, sleeping, etc. Simply put, an actor handling a message should depend only on the CPU, and if it doesn't need CPU cycles it should immediately return from receive and let other actors run. If we follow this rule strictly, Akka can easily handle hundreds of thousands of messages per second using just a handful of threads. It shouldn't come as a surprise that even though our application can comprise thousands of seemingly independent actors (e.g. one actor per HTTP connection, one per player in an MMO game, etc.), each actor gets only limited CPU time within a pool of threads. With the default 10 threads handling all the actors in the system, one blocking or sleeping actor is enough to reduce the throughput by 10%. Therefore 10 actors sleeping at the same time completely halt the sys…
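The effect of blocking on a small thread pool can be reproduced without Akka. The sketch below uses a plain java.util.concurrent fixed pool as a stand-in for the dispatcher, which is an assumption made for illustration since Akka's real dispatcher is configured differently. The pool size of 10 comes from the text above, while `BlockedPoolDemo` and `quickTaskRuns` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo: parks some tasks on a fixed pool and checks whether other work still runs.
final class BlockedPoolDemo {

    // Returns true if a trivial task completes within timeoutMillis while
    // `blockers` tasks are blocked on a pool of `poolSize` threads.
    static boolean quickTaskRuns(int poolSize, int blockers, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < blockers; i++) {
                // Each blocker occupies one thread until the latch is released.
                pool.submit(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            Future<String> quick = pool.submit(() -> "done");
            try {
                quick.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                // No free thread picked up the task in time.
                return false;
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            release.countDown();
            pool.shutdownNow();
        }
    }
}
```

With 9 blockers on 10 threads the quick task still finds a free thread. With 10 blockers it sits in the queue until one of them returns, which is the complete halt described above.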

become/unbecome - discovering Akka

Sometimes our actor needs to react differently based on its internal state. Typically, receiving some specific message causes a state transition which, in turn, changes the way subsequent messages should be handled. Another message restores the original state, and with it the way messages were handled before. In the previous article we implemented the RandomOrgBuffer actor based on a waitingForResponse flag. It unnecessarily complicated the already complex message handling logic:
var waitingForResponse = false

def receive = {
    case RandomRequest =>
        preFetchIfAlmostEmpty()
        if(buffer.isEmpty) {
            backlog += sender
        } else {
            sender ! buffer.dequeue()
        }
    case RandomOrgServerResponse(randomNumbers) =>
        buffer ++= randomNumbers
        waitingForResponse = false
        while(!backlog.isEmpty && !buffer.isEmpty) {
            backlog.dequeue() ! buffer.dequeue()
        }
        preFetchIfAlmostEmpty…
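The idea of swapping the message handler instead of checking a flag can be illustrated in plain Java. This is an analogy only, not Akka's API and not the article's refactored actor. The `BecomeSketch` class, the message strings "fetch" and "response", and the log texts are all hypothetical. A stack of handlers plays the role of the behaviour stack, so that unbecome restores whatever was active before.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical plain-Java analogy of become/unbecome: the current handler is the top of a stack.
final class BecomeSketch {
    final List<String> log = new ArrayList<>();
    private final Deque<Consumer<String>> behaviours = new ArrayDeque<>();

    BecomeSketch() {
        behaviours.push(this::idle);
    }

    // Dispatches to whichever behaviour is currently active.
    void receive(String message) {
        behaviours.peek().accept(message);
    }

    private void become(Consumer<String> behaviour) {
        behaviours.push(behaviour);
    }

    private void unbecome() {
        // Never pop the initial behaviour.
        if (behaviours.size() > 1) {
            behaviours.pop();
        }
    }

    private void idle(String message) {
        if (message.equals("fetch")) {
            log.add("idle: starting fetch");
            become(this::waiting);
        } else {
            log.add("idle: ignored " + message);
        }
    }

    private void waiting(String message) {
        if (message.equals("response")) {
            log.add("waiting: got response");
            unbecome();
        } else {
            log.add("waiting: fetch already in progress");
        }
    }
}
```

Each handler only contains the logic valid for its own state, so no method needs to consult a waitingForResponse-style flag.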

Two actors - discovering Akka

Hope you are having fun so far, but our application has a serious performance defect. After measuring response times of the RandomOrgRandom class we developed in the previous part, we notice something really disturbing (the chart represents response times of subsequent invocations in milliseconds):



It turns out that at regular intervals the response time (the time to return one random number) is greater by several orders of magnitude (logarithmic scale!). Remembering how the actor was implemented, the reason is quite obvious: our actor eagerly fetches 50 random numbers and fills the buffer, returning one number after another. If the buffer is empty, the actor performs a blocking I/O call to the random.org web service, which takes around half a second. This is clearly visible on the chart: every 50th invocation is much, much slower. In some circumstances such behaviour would be acceptable (just like unpredictable garbage collection can increase latency of a response once in a while). But still let's try to …
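The buffering behaviour described above can be modelled in a few lines to see exactly which calls pay for the remote fetch. This is a simplified stand-in rather than the article's RandomOrgRandom class: `EagerBuffer` is a hypothetical name, and the fetcher is an injected function so that no real call to random.org is made. The batch size of 50 comes from the text.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;
import java.util.function.IntFunction;

// Hypothetical model: refills a whole batch only when the buffer runs dry.
final class EagerBuffer {
    private final int batchSize;
    private final IntFunction<List<Integer>> fetcher;
    private final Queue<Integer> buffer = new ArrayDeque<>();
    private int fetches = 0;

    EagerBuffer(int batchSize, IntFunction<List<Integer>> fetcher) {
        this.batchSize = batchSize;
        this.fetcher = fetcher;
    }

    // Returns the next number; the caller that finds the buffer empty waits for the fetch.
    int next() {
        if (buffer.isEmpty()) {
            buffer.addAll(fetcher.apply(batchSize)); // the slow, blocking step in the real system
            fetches++;
        }
        return buffer.remove();
    }

    // Number of slow fetches performed so far.
    int fetches() {
        return fetches;
    }
}
```

With a batch size of 50, calls 1, 51, 101 and so on trigger a fetch while all the others are served from memory, which matches the periodic spikes on the chart.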