Monday, November 23, 2009

Error detection with pattern matching

Scala's pattern matching is one of the most powerful features in the language. It not only helps write concise and very readable code, but also helps prevent trivial errors.

Let's say you want to save a line or two of code when comparing for some empty data structure like a List. You decide to use comparison for equality:


val items = Nil
if (items == Nil) println("No items")


Then you decide to refactor and turn the items collection into a Set. The Scala compiler is clever and should give you a warning, right? Well, not quite:


val items = Set()
if (items == Nil) println("No items")


This results in nothing printed, as the expression evaluates to false. Of course, given the types it's perfectly clear at compile-time that this will never be true. Can't the compiler give you a hint? Indeed, it will if you use pattern matching:


items match {
case Nil => println("No items")
case _ =>
}


This will result in the following error message:


error: pattern type is incompatible with expected type;
found : object Nil
required: scala.collection.immutable.Set[Nothing]
case Nil => println("No items")
^


The examples are contrived (unparameterized Nil and Set?), but you get the point. Yes, Nil can never be a value of the Set type. You might think this is fairly obvious, but pattern matching can be a life saver when implicit conversions are involved. Consider this:


"heh".reverse == "heh"


What does this evaluate to? This should be obvious, right? But the value is false! If you used pattern matching, you would easily see why:


"heh".reverse match {
case "heh" => "obvious?"
}


This will make the compiler very nervous, and this is the reason why:


error: type mismatch;
found : java.lang.String("heh")
required: scala.runtime.RichString
case "heh" => "obvious?"
^


So reverse converts the String to the wrapper RichString, which is not the same type as String.

I have had similar problems detecting a bug where I was checking for equality with None a variable which was of type net.liftweb.common.Box (a type very similar conceptually to Scala's built-in Option).

This made me adopt a general rule to prefer pattern matching rather than equality comparison. The bugs it catches are sometimes subtle and hard to see, and that's exactly what Scala's rich static type checker tries to avoid. Use it to your advantage.

Since we're talking about bugs caught by pattern matching, there's one subtle bug, which is often (though not always) caught by the compiler (another contrived example follows):


val items = Set()
Set() match {
case items => println("empty")
case _ => println("full")
}


This will result in an error, which looks a little bit unusual to the newbie:


error: unreachable code
case _ => println("full")
^


This error is usually crying out loud: hey, you're inadvertently using variable binding instead of constant matching! The newly bound variable items shadows the existing variable with the same name and all other cases after it will never match.

One way to fix it is to use backticks to prevent the name to be bound to a new variable:


Set() match {
case `items` => println("empty")
case _ => println("full")
}


As a rule of thumb it is advised to use CapitalLetters for case classes and constants which you intend to pattern match.

This error wouldn't have occurred if you used equality comparison in the first place, but even in mildly complex cases pattern matching trumps plain equality checking in readability and detecting errors. Apparently, there are cases where pattern matching fails (for instance, matching structural types), so there's still no reason to deprecate good old "==". But there are many more errors, which pattern matching catches, like checking if the match is exhaustive. So there's no point in saving a couple of characters but lose the type safety you expect from Scala.

Wednesday, November 4, 2009

Embedded Scala interpreter

The Scala interpreter is proof that a statically typed language can have features most people only expect from a dynamically-typed language. One of the cool uses of an interpreter is embedding it within your application. This allows you to conveniently experiment with Scala and probably even interact with object instances in your running system. I won't explain how the interpreter works here, but I will try to show you a simple way of embedding the interpreter.

As it usually happens, someone beat me to it. Josh Suereth explains in great detail how to embed an interpreter, but he has done so many customizations that his solution would probably fit on several pages.

I wanted a simpler solution which one could understand at a glance. The code for Lord of the REPLs is much shorter although it doesn't customize much of what the standard interpreter offers.

I tried to come up with the shortest working version you could possibly get away with. Provided I create the settings properly, this is what I could muster:


val out = new java.io.StringWriter()
val interpreter = new scala.tools.nsc.Interpreter(settings, new PrintWriter(out))
interpreter.interpret(code)


Not too much code, is it? (Half of it is probably due to the full package names). Now you could collect your output from the "out" stream and probably convert to String if you need using "out.toString".

Not so fast, though. I said this works if I have the appropriate settings:


val settings = new scala.tools.nsc.Settings()


The problem here is that the interpreter doesn't find two of the crucial jars needed for its proper functioning: scala-compiler.jar and scala-library.jar. When it doesn't it spits out the following error:


scala.tools.nsc.FatalError: object scala not found.


Thanks to the following discussion by Eric Torreborre (author of Specs) I managed to find out that one needs to add to the bootclasspath of the settings object:


val origBootclasspath = settings.bootclasspath.value
val pathList = List(jarPathOfClass(compilerPath),
jarPathOfClass(libPath))
settings.bootclasspath.value = (origBootclasspath :: pathList).mkString(java.io.File.separator)


One could hardcode the path to these two jars, but that's not too flexible. If we want to do it right, we might create a function which discovers the path to the jar from the name of a class that's in it:


def jarPathOfClass(className: String) = {
Class.forName(className).getProtectionDomain.getCodeSource.getLocation
}


Now you could find the paths to these jars like this:


val compilerPath = jarPathOfClass("scala.tools.nsc.Interpreter")
val libPath = jarPathOfClass("scala.ScalaObject")


I've read that getProtectionDomain.getCodeSource returns null in some classloaders and might have problems specifically with OSGi. In that case, one might need to resort to the following hack:


def jarPathOfClass(className: String) = {
val resource = className.split('.').mkString("/", "/", ".class")
val path = getClass.getResource(resource).getPath
val indexOfFile = path.indexOf("file:")
val indexOfSeparator = path.lastIndexOf('!')
path.substring(indexOfFile, indexOfSeparator)
}


With the last ugly piece of code creating an interpreter is not so concise anymore, but sometimes you can't be both robust and concise.

If you want to see the above snippets assembled in one piece you can check out Apache ESME's source code for the ScalaInterpreter action.

Warning: interpreting code directly in your application is a huge security risk and might not always be a good idea.