Skip to main content


Showing posts from 2012

Forcing Tomcat to log through SLF4J/Logback

So you have your executable web application in JAR with bundled Tomcat (make sure to read that one first). However there are these annoying Tomcat logs at the beginning, independent from our application logs and not customizable:

Nov 24, 2012 11:44:02 PM org.apache.coyote.AbstractProtocol init INFO: Initializing ProtocolHandler ["http-bio-8080"] Nov 24, 2012 11:44:02 PM org.apache.catalina.core.StandardService startInternal INFO: Starting service Tomcat Nov 24, 2012 11:44:02 PM org.apache.catalina.core.StandardEngine startInternal INFO: Starting Servlet Engine: Apache Tomcat/7.0.30 Nov 24, 2012 11:44:05 PM org.apache.coyote.AbstractProtocol start INFO: Starting ProtocolHandler ["http-bio-8080"] I would really like to quite them down, or even better save them somewhere since they sometimes reveal important failures. But I definitely don't want to have a separate java.util.logging configuration. Did you wonder after reading the previous article how did I knew th…

Standalone web application with executable Tomcat

When it comes to deploying your application, simplicity is the biggest advantage. You'll understand that especially when project evolves and needs some changes in the environment. Packaging up your whole application in one, standalone and self-sufficient JAR seems like a good idea, especially compared to installing and upgrading Tomcat in target environment. In the past I would typically include Tomcat JARs in my web application and write thin command-line runner using Tomcat API. Luckily there is a tomcat7:exec-war maven goal that does just that. It takes your WAR artifact and packages it together with all Tomcat dependencies. At the end it also includes Tomcat7RunnerCli Main-class to manifest.

Curious to try it? Take your existing WAR project and add the following to your pom.xml:

<plugin> <groupId>org.apache.tomcat.maven</groupId> <artifactId>tomcat7-maven-plugin</artifactId> <version>2.0</version> <executions> …

Parallelization of a simple use case explained

Some time ago a friend of mine asked me about the possibilities of speeding up the following process: they are generating some data in two stages, reading from a database and processing the results. Reading takes approximately 70% of time and processing takes the remaining 30%. Unfortunately they cannot simply load the whole data into memory, thus they split reading into much smaller chunks (pages) and process these pages once they are retrieved, interleaving the these two stages in a loop. Here is a pseudo-code of what they have so far:

public Data loadData(int page) { //70% of time... } public void process(Data data) { //30% of time... } for (int i = 0; i < MAX; ++i) { Data data = loadData(i); process(data); } His idea of improving the algorithm was to somehow start fetching next page of data when current page is still being processed, thus reducing the overall run time of the algorithm. He was correct, but didn't know how to put this into Java code, not bei…

Remote actors - discovering Akka

Assume our test application became a huge success and slowly a single server is not capable of handling growing traffic. We are faced with two choices: replacing our server with a better one (scaling up) or buying a second one and building a cluster (scaling out). We've chosen to build a cluster as it's easier to scale in the future. However we quickly discover that our application no longer fulfils the very first requirement:

The client application should call the URL [...] at most from one thread - it's forbidden to concurrently fetch random numbers using several HTTP connections. Obviously every node in the cluster is independent, having its own, separate instance of Akka, thus a separate copy of RandomOrgClient actor. In order to fix this issue we have few options:

Non-blocking I/O - discovering Akka

Here comes the time to follow some good practices when implementing actors. One of the most important rules we should follow is avoiding any blocking input/output operations, polling, busy waiting, sleeping, etc. Simply put, actor while handling a message should only depend on CPU and if it doesn't need CPU cycles it should immediately return from receive and let other actors to process. If we follow this rule strictly, Akka can easily handle hundreds of thousands of messages per second using just a handful of threads. It shouldn't come as a surprise that even though our application can comprise thousands of seemingly independent actors (e.g. one actor per each HTTP connection, one player in MMO game, etc.), each actor gets only a limited CPU time within a pool of threads. With default 10 threads handling all the actors in the system, one blocking or sleeping actor is enough to reduce the throughput by 10%. Therefore 10 actors sleeping at the same time completely halt the sys…

become/unbecome - discovering Akka

Sometimes our actor needs to react differently based on its internal state. Typically receiving some specific message causes the state transition which, in turns, changes the way subsequent messages should be handled. Another message restores the original state and thus - the way messages were handled before. In the previous article we implemented RandomOrgBuffer actor based on waitingForResponse flag. It unnecessarily complicated already complex message handling logic:
var waitingForResponse = false def receive = { case RandomRequest => preFetchIfAlmostEmpty() if(buffer.isEmpty) { backlog += sender } else { sender ! buffer.dequeue() } case RandomOrgServerResponse(randomNumbers) => buffer ++= randomNumbers waitingForResponse = false while(!backlog.isEmpty && !buffer.isEmpty) { backlog.dequeue() ! buffer.dequeue() } preFetchIfAlmostEmpty…

Two actors - discovering Akka

Hope you are having fun so far, but our application has serious performance defect. After measuring response times of the RandomOrgRandom class we developed in the previous part we will notice something really disturbing (the chart represents response times of subsequent invocations in milliseconds):

In turns out that regularly response time (time to return one random number) is greater by several orders of magnitude (logarithmic scale!) Remembering how the actor was implemented the reason is quite obvious: our actor fetches eagerly 50 random numbers and fills the buffer, returning one number after another. If the buffer is empty, actor performs blocking I/O call to web service which takes around half of a second. This is clearly visible on the chart - every 50th invocation is much, much slower. In some circumstances such behaviour would be acceptable (just like unpredictable garbage collection can increase latency of a response once in a while). But still let's try to …

Request and response - discovering Akka

In the previous part we implemented our first actor and sent message to it. Unfortunately actor was incapable of returning any result of processing this message, which rendered him rather useless. In this episode we will learn how to send reply message to the sender and how to integrate synchronous, blocking API with (by definition) asynchronous system based on message passing.

Before we begin I must draw very strong distinction between an actor (extending Actor trait) and an actor reference of ActorRef type. When implementing an actor we extend Actor trait which forces us to implement receive method. However we do not create instances of actors directly, instead we ask ActorSystem:

val randomOrgBuffer: ActorRef = system.actorOf(Props[RandomOrgBuffer], "buffer") To our great surprise returned object is not of RandomOrgBuffer type like our actor, it's not even an Actor. Thanks to ActorRef, which is a wrapper (proxy) around an actor:

internal state, i.e. fields, of an act…

Your first message - discovering Akka

Akka is a platform (framework?) inspired by Erlang, promising easier development of scalable, multi-threaded and safe applications. While in most of the popular languages concurrency is based on memory shared between several threads, guarded by various synchronization mehods, Akka offers concurrency model based on actors. Actor is a lightweight object which you can interact with barely by sending messages to it. Each actor can process at most one message at a time and obviously can send messages to other actors. Within one Java virtual machine millions of actors can exist at the same time, building a hierarchical parent (supervisor) - children structure, where parent monitors the behaviour of children. If that's not enough, we can easily split our actors between several nodes in a cluster - without modifying a single line of code. Each actor can have internal state (set of fields/variables), but communication can only occur through message passing, never through shared data struct…

Java features applicability

Java language and standard library is powerful, but with great power comes great responsibility. After seeing a lot of user code misusing or abusing rare Java features on one hand and completely forgetting about most basic feature on the other, I decided to compose this summary. This is not a list of requirements and areas every Java developer should explore, know and use. It's quite the opposite! I group Java features in three categories: day to day, occasionally and never (frameworks and libraries only). The rule is simple: if you find yourself using given feature more often then suggested, you are probably over-engineering or trying to build something too general and too reusable. If you don't use given feature often enough (according to my subjective list), you're probably missing some really interesting and important opportunities.
Note that I only focus on Java, JVM and JDK. I do not suggest which frameworks and how likely you should use. Also I assume typical, serv…

Testing Quartz Cron expressions

Declaring complex Cron expressions is still giving me some headaches, especially when some more advanced constructs are used. After all, can you tell when the following trigger will fire "0 0 17 L-3W 6-9 ? *"? Since triggers are often meant to run far in the future, it's desired to test them beforehand and make sure they will actually fire when we think they will.

Quartz scheduler (I'm testing version 2.1.6) doesn't provide direct support for that, but it's easy to craft some simple function based on existing APIs, namely CronExpression.getNextValidTimeAfter() method. Our goal is to define a method that will return next N scheduled executions for a given Cron expression. We cannot request all since some triggers (including the one above) do not have end date, repeating infinitely. We can only depend on aforementioned getNextValidTimeAfter() which takes a date as an argument and returns nearest fire time T1 after that date. So if we want to find second schedul…

Java Coding Conventions considered harmful

There is an official Code Conventions for the Java Programming Language guide published on Oracle site. You would expect this 20+ pages document to be the most complete, comprehensive and authoritative source of good practices, hints and tips with regards to the Java language. But once you start to read it, disappointment followed by frustration and anger increases. I would like to point out the most obvious mistakes, bad practices, poor and outdated advices given in this guide. In case you are a Java beginner, just forget about this tutorial and look for better and more up-to-date reference materials. Let the horror begin!
2.2 Common File Names:
GNUmakefile The preferred name for makefiles. We use gnumake to build our software.gnumake to build Java projects? ant is considered old-school, so is maven. Who uses make to build WARs, JARs, generate JavaDocs...?
3.1.1 Beginning Comments:
All source files should begin with a c-style comment that lists the class name, version information, date…

Where do the stack traces come from?

I believe that reading and understanding stack traces is an essential skill every programmer should posses in order to effectively troubleshoot problems with every JVM language (see also: Filtering irrelevant stack trace lines in logs and Logging exceptions root cause first). So may we start with a little quiz? Given the following piece of code, which methods will be present in the stack trace? foo(), bar() or maybe both?
public class Main { public static void main(String[] args) throws IOException { try { foo(); } catch (RuntimeException e) { bar(e); } } private static void foo() { throw new RuntimeException("Foo!"); } private static void bar(RuntimeException e) { throw e; } } In C# both answers would be possible depending on how the original exception is re-thrown in bar() - throw e overwrites the original stack trace (originating in foo()) with the place where it was thrown again (…

RateLimiter - discovering Google Guava

RateLimiter class was recently added to Guava libraries (since 13.0) and it is already among my favourite tools. Have a look what the JavaDoc says:
[...] rate limiter distributes permits at a configurable rate. Each acquire() blocks if necessary until a permit is available [...] Rate limiters are often used to restrict the rate at which some physical or logical resource is accessed Basically this small utility class can be used e.g. to limit the number of requests per second your API wishes to handle or to throttle your own client code, avoiding denial of service of someone else's API if we are hitting it too often.
Let's start from a simple example. Say we have a long running process that needs to broadcast its progress to supplied listener:
def longRunning(listener: Listener) { var processed = 0 for(item <- items) { // work... processed += 1 listener.progressChanged(100.0 * processed / items.size) } } trait Listener { def progre…

Confitura 2012 - podziękowanie i uwagi

As an exception, this article is written in Polish as it is related to my talk at Confitura 2012 conference in Warsaw. It was chosen second best talk according to attendees votes - I would like to thank for that and relate to some of the comments. Kilka miesięcy temu miałem ogromną przyjemność poprowadzić wykład "Uwolnić się od if" podczas Confitury 2012. Frekwencja na mojej prezentacji w szacownych murach Uniwersytetu Warszawskiego oraz jej pozytywne przyjęcie były dla mnie ogromnym zaskoczeniem. Nagranie dostępne jest poniżej (oraz na Parleys ze slajdami):

Jeszcze większym zaskoczeniem były wyniki pokonferencyjnej ankiety. Jak się okazuje, w oczach uczestników, lepsza od mojej prezentacji okazała się jedynie ta prowadzona przez Sławomira Sobótkę - gratulacje dla niego za znakomity wykład i zasłużone zwycięstwo! Tak prezentuje się podium:
4,55: Ścisły przewodnik po aspektach miękkich dla ekspertów IT - Sławomir Sobótka4,22: Uwolnić się od "if" - Tomasz Nurkiewicz4…

Accessing clipboard in Linux terminal

On a day-to-day basis I am using Ubuntu, which means constantly switching between terminal window and ordinary graphical application like IDE, editors, etc. Often I need to copy a bit of text from IDE to terminal or back. At least in Ubuntu using inconsistent Ctrl + Shift + C (or V) key combination and mouse to select area in terminal is a bit painful. Life would be so much simpler if there was a command line utility to print current clipboard contents to standard terminal output and a second one to take standard input and put it into the clipboard. Life can be much simpler, have a look at xclip!
Without diving into details, start by defining the following aliases in your system (put them in ~/.bash_aliases right now, you'll love them):
alias ctrlc='xclip -selection c' alias ctrlv='xclip -selection c -o' ctrlc and ctrlv, sounds familiar? Of course aliases are completely arbitrary. Now run any bash command but pipe its output to ctrlc:
$ ps | ctrlc Nothing happened?…

Which Java thread consumes my CPU?

What do you do when your Java application consumes 100% of the CPU? Turns out you can easily find the problematic thread(s) using built-in UNIX and JDK tools. No profilers or agents required.
For the purpose of testing we'll use this simple program:
public class Main { public static void main(String[] args) { new Thread(new Idle(), "Idle").start(); new Thread(new Busy(), "Busy").start(); } } class Idle implements Runnable { @Override public void run() { try { TimeUnit.HOURS.sleep(1); } catch (InterruptedException e) { } } } class Busy implements Runnable { @Override public void run() { while(true) { "Foo".matches("F.*"); } } } As you can see, it starts two threads. Idle is not consuming any CPU (remember, sleeping threads consume memory, but not CPU) while Busy eats the whole core as regular expression parsing and executing is a…

String memory internals

This article is based on my answer on StackOverflow. I am trying to explain how String class stores the texts, how interning and constant pool works.
The main point to understand here is the distinction between String Java object and its contents - char[] under private value field. String is basically a wrapper around char[] array, encapsulating it and making it impossible to modify so the String can remain immutable. Also the String class remembers which parts of this array is actually used (see below). This all means that you can have two different String objects (quite lightweight) pointing to the same char[].
I will show you few examples, together with hashCode() of each String and hashCode() of internal char[] value field (I will call it text to distinguish it from string). Finally I'll show javap -c -verbose output, together with constant pool for my test class. Please do not confuse class constant pool with string literal pool. They are not quite the same. See also Understa…

Oslo coderetreat summer 2012 - in Scala

Few days ago I had a pleasure to attend Coderetreat summer 2012 in Oslo, carried out by Johannes Brodwall. It was a great experience, I had an opportunity to do some pair programming with six different people and learnt quite a lot about both programming and productivity. You can find some more thoughts about the event on Anders Nordby's article (+ C# implementation of the problem we were solving throughout the day).

So what was the problem? Suspiciously simple: develop a function taking an arbitrary character and printing a diamond-shape like this:

test("should generate A diamond") { diamond('A') should equal (List( "A" )) } test("should generate D diamond") { diamond('D') should equal (List( " A ", " B B ", " C C ", "D D", " C C ", " B B ", " A " )) } The first approach all of us tried was a terrible mixt…

Integrating with reCAPTCHA using... Spring Integration

Sometimes we just need CAPTCHA, that's a sad fact. Today we will learn how to integrate with reCAPTCHA. Because the topic itself isn't particularly interesting and advanced, we will overengineer a bit (?) by using Spring Integration to handle low-level details. The decision to use reCAPTCHA by Google was dictated by two factors: (1) it is a moderately good CAPTCHA implementation with decent images with built-in support for visually impaired people and (2) outsourcing CAPTCHA allows us to remain stateless on the server side. Not to mention we help in digitalizing books.

The second reason is actually quite important. Typically you have to generate CAPTCHA on the server side and store the expected result e.g. in user session. When the response comes back you compare expected and entered CAPTCHA solution. Sometimes we don't want to store any state on the server side, not to mention implementing CAPTCHA isn't particularly rewarding task. So it is nice to have something read…

javax.servlet.AsyncContext.start() limited usefulness

Some time ago I came across What's the purpose of AsyncContext.start(...) in Servlet 3.0? question. Quoting the Javadoc of aforementioned method:

Causes the container to dispatch a thread, possibly from a managed thread pool, to run the specified Runnable.

To remind all of you, AsyncContext is a standard way defined in Servlet 3.0 specification to handle HTTP requests asynchronously. Basically HTTP request is no longer tied to an HTTP thread, allowing us to handle it later, possibly using fewer threads. It turned out that the specification provides an API to handle asynchronous threads in a different thread pool out of the box. First we will see how this feature is completely broken and useless in Tomcat and Jetty - and then we will discuss why the usefulness of it is questionable in general.

Our test servlet will simply sleep for given amount of time. This is a scalability killer in normal circumstances because even though sleeping servlet is not consuming CPU, but sleeping HTTP t…

Secret powers of foldLeft() in Scala

foldLeft() method, available for all collections in Scala, allows to run a given 2-argument function against consecutive elements of that collection, where the result of that function is passed as the first argument in the next invocation. Second argument is always the current item in the collection. Doesn't sound very encouraging but as we will see soon there are great some use-cases waiting to be discovered.

Before we dive into foldLeft, let us have a look at reduce - simplified version of foldLeft. I always believed that a working code is worth a thousand words:
val input = List(3, 5, 7, 11) input.reduce((total, cur) => total + cur) or more readable:
def op(total: Int, cur: Int) = total + cur input reduce op The result is 26 (sum). The code is more-or-less readable: to reduce method we are passing 2-argument function op (operation). Both parameters of that function (and its return value) need to have the same type as the collection. reduce() will invoke that operation o…

Quartz scheduler misfire instructions explained

Sometimes Quartz is not capable of running your job at the time when you desired. There are three reasons for that:
all worker threads were busy running other jobs (probably with higher priority)the scheduler itself was downthe job was scheduled with start time in the past (probably a coding error) You can increase the number of worker threads by simply customizing the org.quartz.threadPool.threadCount in (default is 10). But you cannot really do anything when the whole application/server/scheduler was down. The situation when Quartz was incapable of firing given trigger is called misfire. Do you know what Quartz is doing when it happens? Turns out there are various strategies (called misfire instructions) Quartz can take and also there are some defaults if you haven't thought about it. But in order to make your application robust and predictable (especially under heavy load or maintenance) you should really make sure your triggers and jobs are configured concious…

eta-expansion (internals) in Scala explained

Today we will learn how Scala compiler implements very important aspect of the language known as an eta-expansion. We will see how scalac works around the limitations of JVM, in particular with the lack of higher-order functions. But before we begin, to a great surprise, we shall see how these limitations are bypassed by... the Java language itself!

Here is the easiest piece of code taking full advantage of anonymous inner classes in Java: the inner class calls method from an outer class. As you probably know every inner class has an implicit hidden Parent.this reference to an outer (parent) class. This is illustrated by the following example:
public class TestJ { public Runnable foo() { return new Runnable() { public void run() { inc(5); } }; } private int inc(int x) { return x + 1; } } Anonymous inner class calls private method of its parent completely transparently. Have you ever wondered how does an inner class get access to private meth…

Quartz scheduler plugins - hidden treasure

Although briefly described in the official documentation, I believe Quartz plugins aren't known enough, looking at how useful they are.

Essentially plugins in Quartz are convenient classes wrapping registration of underlying listeners. You are free to write your own plugins but we will focus on existing ones shipped with Quartz.
LoggingTriggerHistoryPlugin First some background. Two main abstractions in Quartz are jobs and triggers. Job is a piece of code that we would like to schedule. Trigger instructs the scheduler when this code should run. CRON (e.g. run every Friday between 9 AM and 5 PM until November) and simple (run 100 times every 2 hours) triggers are most commonly used. You associate any number of triggers to a single job.

Believe it or not, Quartz by default provides no logging or monitoring whatsoever of executed jobs and triggers. There is an API, but no built-in logging is implemented. It won't show you that it now executes this particular job due to this tr…

Configuring Quartz with JDBCJobStore in Spring

I am starting a little series about Quartz scheduler internals, tips and tricks, this is a chapter 0 - how to configure persistent job store. In Quartz you essentially have a choice between storing jobs and triggers in memory and in a relation database ( Terracotta is a recent addition to the mix). I would say in 90% of the cases when you use RAMJobStore with Quartz you don't really need Quartz at all. Obviously this storage backend is transient and all your pending jobs and triggers are lost between restarts. If you are fine with that, much simpler and more lightweight solutions are available, including ScheduledExecutorService built into JDK and @Scheduled(cron="*/5 * * * * MON-FRI") in Spring. Can you justify using extra 0,5MiB JAR in this scenario?

This changes dramatically when you need clustering, fail-over, load-balancing and few other buzz-words. There are several use-cases for that:
single server cannot handle required number of concurrent, long running jobs an…

Filtering irrelevant stack trace lines in logs

I love stack traces. Not because I love errors, but the moment they occur, stack trace is priceless source of information. For instance in web application the stack trace shows you the complete request processing path, from HTTP socket, through filters, servlets, controllers, services, DAOs, etc. - up to the place, where an error occurred. You can read them as a good book, where every event has cause and effect. I even implemented some enhancements in the way Logback prints exceptions, see Logging exceptions root cause first.

But one thing's been bothering me for a while. The infamous “stack trace from hell" symptom – stack traces containing hundreds of irrelevant, cryptic, often auto-generated methods. AOP frameworks and over-engineered libraries tend to produce insanely long execution traces. Let me show a real-life example. In a sample application I am using the following technology stack:
Colours are important. According to framework/layer colour I painted a sample stack …

Client-side server monitoring with Jolokia and JMX

The choice of Java monitoring tools is tremendous (random selection and order powered by Google):

javamelodypsi-probeJVisualVMjconsoleJAMonJava JMX Nagios PluginN/A
Besides, there are various dedicated tools e.g. for ActiveMQ, JBoss, Quartz scheduler, Tomcat/tcServer... So which one should you use as an ultimate monitoring dashboard? Well, none of them provide out-of-the-box features you might require. In some applications you have to constantly monitor the contents and size of a given JMS queue. Others are known to have memory or CPU problems. I have also seen software where system administrators had to constantly run some SQL query and check the results or even parse the logs to make sure some essential background process is running. Possibilities are endless, because it really depends on the software and its use-cases. To make matters worse, your customer doesn't care about GC activity, number of open JDBC connections and whether this nasty batch process is not hanging. It shoul…

Automatically generating WADL in Spring MVC REST application

Last time we have learnt the basics of WADL. The language itself is not as interesting to write a separate article about it, but the title of this article reveals why we needed that knowledge.

Many implementations of JSR 311: JAX-RS: The Java API for RESTful Web Services provide runtime WADL generation out-of-the-box: Apache CXF, Jersey and Restlet. RESTeasy still waiting. Basically these frameworks examine Java code with JSR-311 annotations and generate WADL document available under some URL. Unfortunately Spring MVC not only does not implement the JSR-311 standard (see: Does Spring MVC support JSR 311 annotations?), but it also does not generate WADL for us (see: SPR-8705), even though it is perfectly suited for exposing REST services.

For various reasons I started developing server-side REST services with Spring MVC and after a while (say, thirdy resources later) I started to get a bit lost. I really needed a way to catalogue and document all available resources and operations. WAD…

Gentle introduction to WADL (in Java)

WADL (Web Application Description Language) is to REST what WSDL is to SOAP. The mere existence of this language causes a lot of controversy (see: Do we need WADL? and To WADL or not to WADL). I can think of few legitimate use cases for using WADL, but if you are here already, you are probably not seeking for yet another discussion. So let us move forward to the WADL itself.

In principle WADL is similar to WSDL, but the structure of the language is much different. Whilst WSDL defines a flat list of messages and operations either consuming or producing some of them, WADL emphasizes the hierarchical nature of RESTful web services. In REST, the primary artifact is the resource. Each resource (noun) is represented as an URI. Every resource can define both CRUD operations (verbs, implemented as HTTP methods) and nested resources. The nested resource has a strong relationship with a parent resource, typically representing an ownership.

A simple example would be …