Skip to main content

MongoDB and recording appenders for Logback

Today I am giving you two new appenders for Logback: one for MongoDB and one which I called recording appender. Just as a reminder, appenders (both in Log4J and Logback) are an abstraction of your application logs destination. The most common are file and console appenders, followed by several others built-in. MongoDB appender is pretty straightforward, so I will start by describing the recording appender.

Recording appender


As you already know, one of the biggest benefits of using logging frameworks are logging levels. By carefully choosing levels for each logging statement we can easily filter which logs should be present in our log files and which shouldn't. Also we can apply different logging strategies for different environments. This is in theory. In practice we often face the choice between: log everything just in case and handle gigabytes of meaningless log files or log only warnings and errors but when they actually occur, they are meaningless as well, lacking important debugging context. The idea isn't new (see [1], [2] and [3] for example), but somehow decent implementation is missing in both Log4J and Logback. And the idea is simple – as long as there is nothing wrong happening with the system: do not log anything or log very little – but silently memorize all debug logs in some cyclic buffer. And whenever disaster occurs (any log with ERROR level, probably an exception), dump the buffer first to provide meaningful context.

Writing custom logging appenders is pretty straightforward. Following is the essence of my recording appender:




public class RecordingAppender extends UnsynchronizedAppenderBase<ILoggingEvent> {

  private ThreadLocal<CyclicBuffer<ILoggingEvent>> recordedEvents = new ThreadLocal<CyclicBuffer<ILoggingEvent>>() {
    @Override
    protected CyclicBuffer<ILoggingEvent> initialValue() {
      return new CyclicBuffer<ILoggingEvent>(maxEvents);
    }
  };

  @Override
  protected void append(ILoggingEvent eventObject) {
    if (triggersDump(eventObject)) {
      dumpRecordedEvents();
      dump(eventObject);
    } else
      recordedEvents.get().add(eventObject);
  }

  //...

}

I hope the code is self-explanatory, if not – I failed as a developer, not you as a reader. The only detail worth explaining is the usage of ThreadLocal. Logging history is stored ThreadLocal, so only logs from current thread will be dumped in case of error. This seems reasonable in most cases (and eliminates the need for synchronization). Why the appender is parametrized with generic ILoggingEvent type will be described later. The full source code of this appender is, as always, available on my Logback fork at GitHub (recording-appender branch).

Using this appender is really simple – just declare ch.qos.logback.classic.RecordingAppender and define one or more delegating appenders to be used when dump is required. As a side note: which GoF pattern is it?

With the configuration below every log statement with level WARN or higher will trigger dump. The configuration also states that up to 1000 records will be kept in memory, unless some of them are older than 15 seconds. When warning or error is encountered, it will be printed on the console, preceded by more detailed logs. It really works (in addition of having 100% code coverage).


<?xml version="1.0" encoding="UTF-8" ?>
<configuration>
  <appender name="REC" class="ch.qos.logback.classic.RecordingAppender">
    <appender-ref ref="STDOUT"/>

    <maxEvents>1000</maxEvents>
    <dumpThreshold>WARN</dumpThreshold>
    <expiryTimeMs>15000</expiryTimeMs>
  </appender>

  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%-4relative [%thread] %-5level - %msg%n</pattern>
    </encoder>
  </appender>

  <root level="DEBUG">
    <appender-ref ref="REC"/>
  </root>
</configuration>


MongoDB appender


MongoDB, being document oriented database focused on performance and scalability seems like a great storage for application and server logs – much better than DbAppender using traditional relational database. Why? Quickly count how many tables and rows you need to store normalized logging event containing stack trace with several frames and MDC map? And what if you just want to store some properties, leaving others as optional? Sacrificing absolute durability and ACID constraints, in MongoDB you just store document – with nested properties, skipping optional parameters – and extremely fast. Also, once again, the idea isn't new.

But again, why would you like to store application or web server HTTP access logs (be patient!) in database altogether? Well, with a little help of sharding and MapReduce, searching, aggregating and transforming a lot of data is a pleasure.

The implementation is trivial (excerpt from MongoDBAppenderBase.java, see mongo-appender branch):

public abstract class MongoDBAppenderBase<E> extends UnsynchronizedAppenderBase<E> {

  //...

  @Override
  public void start() {
    try {
      connectToMongoDB();
      super.start();
    } catch (UnknownHostException e) {
      addError("Error connecting to MongoDB server: " + host + ":" + port, e);
    }
  }

  private void connectToMongoDB() throws UnknownHostException {
    mongo = new Mongo(new ServerAddress(host, port), buildOptions());
    DB db = mongo.getDB(dbName);
    if (username != null && password != null)
      db.authenticate(username, password.toCharArray());
    eventsCollection = db.getCollection(collectionName);
  }

  protected abstract BasicDBObject toMongoDocument(E event);

  @Override
  protected void append(E eventObject) {
    eventsCollection.insert(toMongoDocument(eventObject));
  }

  //...

}


Appender doesn't have to be synchronized (it only uses native MongoDB Java driver, which is thread safe and even handles connection pooling) and all it does is connecting to MongoDB server/cluster and insert logging events as documents in the database. Abstract toMongoDocument() method and E generic type? - looks suspicious... Logback has a pretty clever architecture. In logback-core you place general logging logic, like connecting and storing documents in MongoDB in our case. Then one can simply subclass the base appender to define logic specific to a given logging object type.


So what object types does Logback support? The traditional (classic) logging is what we are familiar with. logback-access on the other hand allows us to log web container access logs using Logback infrastructure. And because MongoDB has no schema and creates collections (table-like structures) the first time they are used, we can essentially store anything. Following is the toMongoDocument() implementation excerpt for classic logs (note the generic type):

public class MongoDBAppender extends MongoDBAppenderBase<ILoggingEvent> {

  public MongoDBAppender() {
    super("loggingEvents");
  }

  @Override
  protected BasicDBObject toMongoDocument(ILoggingEvent event) {
    final BasicDBObject doc = new BasicDBObject();
    doc.append("timeStamp", new Date(event.getTimeStamp()));
    doc.append("level", event.getLevel().levelStr);
    doc.append("thread", event.getThreadName());
    if (event.getMdc() != null && !event.getMdc().isEmpty())
      doc.append("mdc", event.getMdc());
    //...
    return doc;
    }
}

...and here is what you can expect to find your MongoDB database:

{
    "_id" : ObjectId("4d9cbcbf7abb3abdaf9679cd"),
    "timeStamp" : ISODate("2011-04-06T19:19:27.006Z"),
    "level" : "ERROR",
    "thread" : "main",
    "logger" : "ch.qos.logback.classic.db.mongo.MongoDBAppenderTest",
    "message" : "D" 
}

Very similar implementation for access logs follows. Once again look carefully – both appenders are extending MongoDBAppenderBase with different generic type, only implementing log-to-document logic, whereas common database connection logic is handled once in base class. Pretty elegant design (it's Logback design, not mine, I am just following it), seems like OOP is not dead after all:

public class MongoDBAppender extends MongoDBAppenderBase<IAccessEvent> {

  public MongoDBAppender() {
    super("accessEvents");
  }

  @Override
  protected BasicDBObject toMongoDocument(IAccessEvent event) {
    final BasicDBObject doc = new BasicDBObject();
    doc.append("timeStamp", new Date(event.getTimeStamp()));
    if(server)
      doc.append("server", event.getServerName());
    //...
    return doc;
  }
}

You can compare classic and access implementations to see how similar they are, although they cope with really different data. Here is what you might find in accessEvents collections in MongoDB:

{
    "_id" : ObjectId("4d98cc4f7abb95e59279e183"),
    "timeStamp" : ISODate("2011-04-03T19:36:47.339Z"),
    "server" : "localhost",
    "remote" : {
        "host" : "0:0:0:0:0:0:0:1",
        "user" : "tomcat",
        "addr" : "0:0:0:0:0:0:0:1" 
    },
    "request" : {
        "uri" : "/manager/images/tomcat.gif",
        "protocol" : "HTTP/1.1",
        "method" : "GET",
        "sessionId" : "1C6357816D9EEFD31F6D9D154D87308A",
        "userAgent" : "Mozilla/5.0 (X11; U; Linux i686; pl-PL; rv:1.9.2.16) Gecko/20110323 Ubuntu/10.10 (maverick) Firefox/3.6.16",
        "referer" : "http://localhost:8080/manager/html" 
    },
    "response" : {
        "contentLength" : NumberLong(1934),
        "statusCode" : 200 
    } 
}

You might ask yourself a question: why would I store access logs in crazy, JSON-like documents stored in database driven by C++? The answer is: scalability. MongoDB speed and sharding capabilities make it a great choice for storing lots of free-form data. Now, using built-in MapReduce framework you might search, aggregate, or maybe even look for suspicious usage patterns across thousands of servers in parallel. Be warned thou that timeStamp, although looks promising, isn't very good candidate for sharding key. Assuming all your web servers have similar system clock, at a given point in time all of them will be writing to the same shard. After a moment or two, they will all switch to a different one. At the same time, all other shards are dying of boredom. But timeStamp+serverName looks nice (order of keys in compound key is really important, just like with indexes in RDBMS).

Another tip is using capped collections in MongoDB. There is no obvious natural key for logging events (both classic and access), so we need to use generated keys, which aren't useful as they don't form any particular order. But if you use capped collections, order of records in the database is guaranteed to be the same as insertion order. Also, capped collections are limited in size and automatically remove the oldest entries, which seems like a great fit for logging use case.

MongoDBAppender has many other features, including:
  • Fully configurable MongoDB connection
  • For access logger you can define which access parameters (like response status code, URI, session id, etc.) should be persisted
  • For classic logger: should caller data be persisted
See for yourself how many optional parameters are provided. If you like the idea of recording appender (your application very own flight data recorder), please take a look and vote for LBCLASSIC-260. I also filed an issue for MongoDB appender, LBCLASSIC-261. Oh, if I'm into advertising myself already, maybe this will catch your attention as well. Have a great time playing with my appenders (you can even combine them) and I'm waiting for your comments.

Comments

  1. Hi/Czesc Tomasz

    Thanks for sharing and blogging this. Very inspiring.
    I tried to use your RecordingAppender code with official logback 0.9.28 (latest at that time) and although it seems to work at first glance you would see some oddities. All recorded and dumped log entries have the same line number/source file on the output.
    I tried with
    \(%class{0}.java:%line\)
    \(%file{0}:%line\)%n

    and both of them behave in this way i.e. all DEBUG entries point to the source file/line where the WARN message has been logged

    Is it something that you fixed/changed in your fork or am I doing something incorrectly.

    Thanks again.
    Daniel

    ReplyDelete
  2. Indeed there was a bug. Because building caller data (line number, etc.) introduces significant overhead, it is created lazily the first time it is requested. That's why it is highly discouraged to use line number, file name, class and method name in logging patterns - and also why I haven't tested it. And because the caller data in your case was first accessed the moment WARN message was dumped, all caller data was pointing to that place.

    Nevertheless it is now fixed. Unfortunately you have to explicitly enable caller data if you want to use it (<callerData>true</callerData> on RecordingAppender). If you don't do this, you will get question marks, which is still better than having incorrect data. I didn't want to build caller data eagerly for the reasons described above.

    Have fun and thank you for reporting a bug!

    ReplyDelete
  3. Thanks a lot for fixing that.

    I'm still trying to find the best way to log in production environment.
    Dumping DEBUG triggered by WARN is a nice to have and methods names + line numbers are very convenient for tracking down the issue. I haven't seen huge performance drop so far but maybe that has changed when using RecordingAppender.

    Can you think of any website where I could find best logging practices. Especially creating loggers. Having static logger per class does not look like an option anymore.
    Dividing loggers into areas of the app that's what I'm after.

    Thanks
    D.

    ReplyDelete
  4. Actually I did some benchmarking of outputting method names, line numbers etc. This is the 6th episode of my series about logging best practices, check them out.

    As for dividing loggers per application areas - isn't your application divided into packages and logger names are just mirroring package structure? Say: "com.example.accounting", "com.example.printing", etc. This means you can easily tune logging levels and appenders for specific application parts because logger names correspond to packages. What's wrong with this approach?

    ReplyDelete
  5. Maybe is out of topic, but you are the first with a complete explanation about mongodb on log.
    I was interested to use mongodb, but when I propose to my collegues to use mongodb on server log, they answer me that it's too risky to use a network element to store a critical information like server log... What do you think about it? It's a warning to really consider?

    ReplyDelete
  6. Logback ships with DbAppender (JDBC based) so the idea isn't new. MongoDB has built-in load-balancing and failover capabilities, on the other hand it might not provide such strong write guarantees.

    If you are concerned about Mongo availability, use redundant and more reliable appenders in parallel (like file appender with rolling policy and ZIPping enabled). To be honest, if you find your logs being critical, I wouldn't trust entirely any network storage (including network shared folders). To keep it simple: how will you log the fact that MongoDB is down? ;-)

    ReplyDelete
  7. Good work, Tomek!

    I had a similar idea last year, but using Redis (purely because I have some Redis experience):

    https://github.com/ceki/logback/pull/1

    Creating an issue and getting folks to vote for it is a good idea :-)

    ReplyDelete
  8. Czesc/Hi Tomasz

    Look at http://graylog2.org/. It's a logging/alarming system built on top of MongoDB. It uses JSON format (GELF is the name) and UDP transport.
    You'll find GELF appender for logback too : https://github.com/Moocar/logback-gelf

    ReplyDelete

Post a Comment