ruby, dtrace, etc

Monday Nov 24, 2008

A JRuby/Rails Message Driven Bean

I'm building a system which receives messages from a larger application platform and records the details in its local database. It's built with Rails and JRuby, with Apache ActiveMQ as the message broker. Using JRuby lets me use ActiveMQ's native JMS-based client, rather than speaking STOMP to the broker (as our larger platform does on the other end of the queue). My deployment platform is Glassfish, and I'm deploying the Rails app to it in the usual way with Warbler.

To begin with, I wrote a standalone script to subscribe to the JMS queue, based on the suggestion to use JMS rather than ActiveMessaging. This works well, and is very simple to deploy for development. As Shane points out, it's just a literal translation from Java to (J)Ruby. One problem is that, even when configured to use a failover transport to a pair of message brokers, the script frequently exits and needs restarting. Another is that starting the script as a separate process means I've got two JVMs running, one for Glassfish and one for the JRuby script. They can't share a common JDBC connection pool, so there's more to manage there.

Finally, I'd have to manage any concurrency required myself - the script is resolutely single-threaded, and if I need multiple threads to saturate the hardware, I'd need to write the thread management code as part of the script.

Since I'm already using an app server for the web part of the system, the answer seems to be to move the script into Glassfish as a message-driven Bean.

There are a few issues to be solved: I only want a single copy of the Rails environment, and I need to arrange for the relevant parts of our application to be packaged so it can be correctly deployed with the bean.

The upside is that with Rails 2.2, I can share that Rails environment among as many threads as I need, without having to synchronize access to the app - given a JDBC connection pool, Glassfish will start many bean instances to handle incoming messages.

Unfortunately there doesn't seem to be anything like Warbler to help out with packaging the app into an MDB, so it's all rather manual. Here's what I needed to do to set this up:

  • Provide an app entry point which would accept a message, and run the required business logic in ruby
  • Create a standard J2EE Message-driven Bean in Java, and hook it to an external ActiveMQ instance
  • Spin up a single Rails instance inside the container
  • Arrange for the message receipt handler to call the app entry point with the contents of the message

App entry point for messaging

This is a class method on the model the app will create as a result of receiving the message. My messages are simple YAML strings, and this method accepts that YAML directly.

That means I'm converting to a Ruby data structure from YAML in Ruby code, and it might be better to make sure it's done in Java - the messages are small enough that this isn't a big deal though.
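
The method itself isn't shown here, so the following is only a minimal sketch of what such an entry point might look like. A plain Ruby class stands in for the ActiveRecord model so that the example runs on its own; the name and details attributes and the message layout are assumptions, not the real schema, and YAML.safe_load is used as on a current Ruby.

```ruby
require 'yaml'

# Stand-in for the ActiveRecord model; in the Rails app this would be
# class Event < ActiveRecord::Base, and create would insert a row.
class Event
  attr_reader :name, :details

  def initialize(attrs)
    @name    = attrs['name']      # hypothetical attribute
    @details = attrs['details']   # hypothetical attribute
  end

  # Hypothetical stand-in for ActiveRecord's create: build and return the object.
  def self.create(attrs)
    new(attrs)
  end

  # Entry point called from the bean: takes the raw YAML message text.
  def self.create_from_yaml(yaml)
    attrs = YAML.safe_load(yaml)
    raise ArgumentError, 'expected a YAML mapping' unless attrs.is_a?(Hash)
    create(attrs)
  end
end

event = Event.create_from_yaml("name: login\ndetails: user 42\n")
puts event.name
```

Keeping the YAML parsing inside the class method means the Java side only ever has to hand over a string.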

Create an MDB

I'm using NetBeans for this, simply because of its integration with Glassfish. Everything is done with Java 5 annotations, bar the selection of ActiveMQ instead of the built-in broker (which doesn't have native STOMP support).

All that's needed beyond the standard Bean that NetBeans sets up is this stanza in sun-ejb-jar.xml:

      <mdb-resource-adapter>
        <resource-adapter-mid>activemq</resource-adapter-mid>
      </mdb-resource-adapter>

This assumes the ActiveMQ RA is deployed in Glassfish as "activemq". This configuration is taken from this forum post.

Rails Instance

I only want to start one Rails instance to be shared among all the bean instances. I'll create a RailsRuntime class, which loads JRuby, initialises Rails and provides a "string eval" method, and then another class, RailsRuntimeSingleton, to maintain that single instance.

Here's the RailsRuntime class:

import org.jruby.Ruby;
import org.jruby.RubyRuntimeAdapter;
import org.jruby.javasupport.JavaEmbedUtils;
import org.jruby.exceptions.RaiseException;
import java.util.ArrayList;

public class RailsRuntime {

        private Ruby runtime;
        private RubyRuntimeAdapter evaler;

        public RailsRuntime() {
            try {
                String jruby_home = "/usr/local/jruby";
                System.out.println("Initializing Rails with JRuby from " + jruby_home);
                
                System.setProperty("jruby.home", jruby_home);
                System.setProperty("jruby.base", jruby_home);
                System.setProperty("jruby.lib",  jruby_home + "/lib");
                System.setProperty("jruby.script", "jruby");

                runtime = JavaEmbedUtils.initialize(new ArrayList());
                evaler = JavaEmbedUtils.newRuntimeAdapter();
                
                evaler.eval(runtime, "ENV['RAILS_ENV'] = 'production'; require 'META-INF/rails/config/environment.rb'");
                System.out.println("Finished initializing Rails");
            }
            catch (RaiseException re) {
                System.out.println("RaiseException: " + re.getMessage());
                throw re;
            }
        }

        public void eval(String r) {
            evaler.eval(runtime, r);
        }
}

There's a hardcoded path to the JRuby home directory here, which isn't ideal - really I should just include a jruby-complete jar but I don't have the infrastructure in place yet to also include all the required gems, so I'm relying on that fixed path for now.

I do bundle up the Rails app into the jar, and so there's a fixed path available to environment.rb. Production mode is also set here.

The singleton class is very simple. The JVM guarantees that a class's static initialiser runs exactly once, under its own lock, so a static field avoids the need for any explicit synchronization around the RailsRuntime setup.

public class RailsRuntimeSingleton {
    public static RailsRuntime singleton = new RailsRuntime();
}

Call the app when we receive a message

Now I've got a Rails instance spun up, and messages coming into the bean, it's time to plug it all together. Here's the bean's onMessage method and the hook into the Rails app:

    public void onMessage(Message message) {
        TextMessage tm = (TextMessage)message;
        try {
            createEvent(tm.getText());
        }
        catch (JMSException e) {
            System.out.println("JMSException on getText()");
        }
    }

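    private void createEvent(String yaml) {
        RailsRuntime runtime = RailsRuntimeSingleton.singleton;
        // no synchronize required!
        // Escape backslashes and single quotes so the YAML survives being
        // embedded in a single-quoted Ruby string literal.
        String escaped = yaml.replace("\\", "\\\\").replace("'", "\\'");
        runtime.eval("Event.create_from_yaml('" + escaped + "')");
    }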

Before Rails 2.2, I needed to synchronize around the eval() method, to serialise access to Rails. Now that's not necessary, and I can allow the container to spawn as many threads as I have connections available in the JDBC pool.

My test environment is a simple, single-threaded Perl STOMP sender running flat out to an ActiveMQ instance, plus the bean in Glassfish, all on a MacBook Pro. When the bean runs serialized, the receiver isn't able to keep up and a backlog of messages builds up. Simply removing the synchronized block means the receiver can keep up with the sender easily.
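
The effect can be reproduced in miniature. This toy Ruby sketch is not the bean itself: the handler just sleeps to stand in for a database write, and the worker count, message count and delay are arbitrary assumptions. It compares draining a queue with every handler call under one shared lock against draining it with no lock at all.

```ruby
MESSAGES = 8      # assumed number of queued messages
WORKERS  = 4      # assumed number of bean instances / threads
DELAY    = 0.02   # seconds; stands in for time spent in the database

# Drain MESSAGES messages with WORKERS threads, optionally serialising
# every handler call behind a single shared lock. Returns elapsed seconds.
def drain(serialized)
  queue = Queue.new
  MESSAGES.times { |i| queue << i }
  lock   = Mutex.new
  handle = lambda { |_msg| sleep DELAY }

  started = Time.now
  threads = Array.new(WORKERS) do
    Thread.new do
      loop do
        msg = begin
          queue.pop(true)        # non-blocking pop raises when the queue is empty
        rescue ThreadError
          break
        end
        if serialized
          lock.synchronize { handle.call(msg) }
        else
          handle.call(msg)
        end
      end
    end
  end
  threads.each(&:join)
  Time.now - started
end

serial_time     = drain(true)
concurrent_time = drain(false)
puts format('serialized: %.3fs, concurrent: %.3fs', serial_time, concurrent_time)
```

With the lock in place the workers take turns, so the total time is roughly the sum of all the handler calls; without it the calls overlap.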


Next steps

There's a lot to get right with this setup, even leaving aside the dependency on the external JRuby:

  • The message queue configuration in Java in the Bean class
  • The external message broker configuration in sun-ejb-jar.xml
  • Packaging up Rails and the application as part of the Bean

More work is required to include jruby-complete.jar and remove that external dependency. Beyond that, making the Java parts generic, allowing configuration in Ruby and building outside of the IDE in the style of Warbler seems to be the way to go.

Friday Jun 20, 2008

Profiling Ruby with DTrace

You can do a basic "profile" of a Ruby program with just the interpreter probes (Joyent's patch, OS X /usr/bin/ruby, etc) and a raw DTrace script. I've got a couple of those:

They're based on Brendan Gregg's js_functime.d script for the Mozilla Spidermonkey provider, and show you the hottest functions and lines respectively - they expect a target, so invoke them like:
  sudo /usr/sbin/dtrace -s rb_linetime.d -c "/usr/bin/ruby your-script-here"
That's fine if you have a simple script you can invoke, and if it's easy to stop and start whatever activity the process is doing. If it's not, it can be difficult to get useful information from this type of script. With Ruby-DTrace, you have a couple of options for getting more precise information:
  • Embed these D programs in Ruby, and do more work in that script
  • Add USDT probes to the program you're analysing, and do more work in D

Embedding the D program

The basic flow of an embedded D program looks like this, for probes in processes outside the script's control:
  progtext = "..."
  t = Dtrace.new 
  prog = t.compile progtext
  prog.execute
  t.go

  c = Dtrace::Consumer.new(t)
  c.consume do |data|
    # process data here
  end
You can also spawn processes under the control of DTrace:
  process = t.createprocess([ ... ])
  prog = t.compile(progtext)
  prog.execute
  t.go
  process.continue
Either way, for every action, consume yields the returned data to the block. You can then process this, filtering or formatting it to get the data you're looking for.

Adding USDT probes

You can add probes to your Ruby program which indicate what's going on at the app level, rather than the interpreter level. You can then use those to provide context for the profile-style output you get from the raw script, and perhaps to gate profiling: if there's a particular phase of processing you're interested in, probes placed around it can trigger collection of information for just that period. With probes added like this:
  Dtrace::Provider.create :ruby_app do |p|
    p.probe :process_start
    p.probe :process_end
  end

  [...]
  
  def process
    Dtrace::Probe::RubyApp.process_start do |p|
      p.fire
    end

    [... processing here ...]

    Dtrace::Probe::RubyApp.process_end do |p|
      p.fire
    end
  end
it's possible to gate data collection with a D program like this:
/* self->processing is thread-local and starts out as zero, so it needs no explicit initialisation */

ruby_app*:::process_start
{
  self->processing = 1;
}

ruby_app*:::process_end
{
  self->processing = 0;
}

ruby*:::function-entry
/self->processing == 1/
{
   trace(...);
}
This can be used to collect profiling-style data only for a useful period in the program's execution - for a web framework that might be processing a request, allowing any housekeeping to be excluded. It's also possible to use these probes to provide context for the data collected, by passing internal application state as probe arguments: again, for a web framework you might pass the URL being handled.

Monday Jun 16, 2008

USDT probes with Ruby-DTrace

I've just released the first version of Ruby-DTrace to support creating DTrace providers in Ruby, without a dependency on the USDT build tools -- dtrace(1) and the compiler and linker. This means you can easily add USDT probes to your Ruby programs as you would for a C program. The probes created are part of their own provider - they're not part of the ruby interpreter's provider. Their relationship to the interpreter probes is similar to that between application-specific USDT probes and the pid provider.

Suppose you're working with Rails: you can use the pid provider to give you access to the database queries being executed, say, but that depends on finding a suitable function to trace in the native database driver - that's going to be database-specific at best. If you can create probes in Ruby, then you can simply create and use a "query-start" probe in ActiveRecord's generic Ruby code.

Here's the quick overview of what's possible with the provider API now. One caveat - creating probes is Intel-platform only right now, though both Solaris and Mac OS X are supported. SPARC and PowerPC are in the works.

Providers are created like this:

    require 'dtrace/provider'

    def probe
      Dtrace::Provider.create :my_ruby_program do |p|
        p.probe :my_probe, :string, :string
      end
    end

which creates a provider class called Dtrace::Probe::MyRubyProgram, and a probe called probe:my_probe. The probe name generated by this code is:

  my_ruby_program2968:ruby:probe:my_probe

These are regular USDT probes, so there's a pid associated with the provider.

Probes can be created at any time while the Ruby program is running: unlike native-code USDT probes, there's no additional build step.

You fire probes like this:

    Dtrace::Probe::MyRubyProgram.my_probe do |p|
      p.fire('some', 'arguments')
    end

The block syntax is how the "is_enabled" feature of USDT probes is exposed to Ruby: the block only executes if the probe is enabled, which lets you do expensive work to gather probe arguments, which will only be done if they're required. This keeps the overall disabled-probe overhead low, which should allow probes to be left installed even in production code.
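
To make the is_enabled behaviour concrete, here is a toy model in plain Ruby. ToyProbe is a hypothetical stand-in, not part of Ruby-DTrace: it only mimics the way the block is skipped unless something is listening, so the expensive argument gathering happens only when needed.

```ruby
# Toy stand-in for a probe; NOT the Ruby-DTrace implementation.
class ToyProbe
  attr_reader :fired

  def initialize
    @enabled = false
    @fired   = []
  end

  # With real DTrace, a consumer enabling the probe is what flips this state.
  def enable
    @enabled = true
  end

  # Mirrors the block form: the block runs only when the probe is enabled.
  def call
    yield self if @enabled
  end

  # Record the arguments the probe was fired with.
  def fire(*args)
    @fired << args
  end
end

expensive_calls = 0
gather = lambda { expensive_calls += 1; 'expensive argument' }

probe = ToyProbe.new
probe.call { |p| p.fire(gather.call) }   # disabled: the block never runs
probe.enable
probe.call { |p| p.fire(gather.call) }   # enabled: arguments are gathered

puts expensive_calls
```

The first call costs only the enabled check, which is the property that makes it reasonable to leave probes in place.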

That's the really quick overview, but there's a set of documentation online at Rubyforge with more details:

http://ruby-dtrace.rubyforge.org

Ruby-DTrace

There are at least three ways for a dynamic language to interact with DTrace:

  • Probes in the interpreter core, for events like entering a function.
  • Probes created from the language, for application-level events.
  • A means of running DTrace programs and consuming the events, for all probes on the system.

Ruby's had an implementation of the first part for a while, with Joyent's patch against the Ruby source distribution, and a number of vendor Ruby distributions now ship with these probes enabled. The latter two are what Ruby-DTrace provides: Ruby-defined USDT probes, and an interface to the DTrace program and consumer APIs. Ruby-DTrace is implemented entirely as an extension module, so it doesn't need a patched interpreter to work, and it doesn't conflict with the core probes in any way. You can have a Ruby program DTrace itself, collecting data from the core interpreter probes, from Ruby-defined probes, and from any probe on the system.

Since version 0.2.5 of Ruby-DTrace both of these APIs are available in the gem. This blog is an attempt to document what's possible, and also to explain the inner workings. There doesn't seem to be a lot of public documentation covering the DTrace interfaces it uses, and I hope that explaining how the Ruby implementation works will make it easier to add these features to other languages.

The code is available from GitHub, which also hosts the gem download:

http://github.com/chrisa/ruby-dtrace/tree/master

Feedback on experiences with this code is very welcome, either as comments here, or directly.
