Thursday, October 21, 2010

Why Scala's Option won't save you from lack of experience

Some time ago Cedric Beust caused some excitement in the Scala community by declaring that he doesn't see the advantages of using Scala's Option type, which is similar to Haskell's Maybe type.

There were a lot of insightful comments which outlined the benefits of using Option. James Iry has written a well-reasoned post called Why Scala's "Option" and Haskell's "Maybe" types will save you from null.

I wanted to approach things differently. I wanted to show people some patterns which usually come with experience and let them decide which is better. That's why this turned into a very long post which looks more like a tutorial. Of course, I'm not adding anything new to the discussion, but just summarizing some of the common wisdom accumulated by the community. I'm sure this is not the last typical blog post showing the wonders of Option, but I hope it can clear up some (Some?) misunderstandings. Or Maybe not.

But more importantly, I wanted to explain why having a solution as a language feature is a premature optimization. It's neither as flexible, nor as powerful as having it as just a type in the library.

NullPointerException


Cedric is definitely not alone: programmers who decide to give Scala a try (and even more so Haskell or the ML family) face conceptual differences from popular imperative languages. Just reading that Option is a type wrapper does not mean it's easy to wrap one's head around it.

One advantage of using Scala is that if you are convinced that null and NullPointerExceptions solve your problem better, you are free to use them. Option is just one option. And of course, you can come back any time later if you decide that Option has Some advantages (e.g. composability). Of course, some might view having too many choices as a disadvantage.

But both Scala and Clojure must live with the design decisions on the JVM which were taken before them, and with interoperating with a wealth of existing libraries. So allowing null is first of all a practical decision.
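When a legacy Java API may hand back null, the result can be wrapped in an Option right at the boundary. Here is a minimal sketch, assuming a plain java.util.HashMap stands in for the legacy library; the names legacyEmails and emailOf are made up for illustration.

```scala
import java.util.{HashMap => JHashMap}

// Stand-in for a legacy Java API: java.util.Map#get returns null for a missing key.
val legacyEmails = new JHashMap[String, String]()
legacyEmails.put("bob", "bob@builder.com")

// Option(x) yields None when x is null and Some(x) otherwise.
def emailOf(name: String): Option[String] = Option(legacyEmails.get(name))

val found = emailOf("bob")     // Some(bob@builder.com)
val missing = emailOf("larry") // None
```

From this point on, callers only ever see an Option and never the raw null.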


Getting started


First of all, let's define a simplistic employee type, a list of employees, and a map where the "builder" occupation points to the list of employees.

case class Person(name: String, email: String)
val employees = List(Person("bob", "bob@builder.com"))
val occupations = Map("builder" -> employees)



Compile-time safety and backward compatibility


One of the examples given by Cedric is pattern matching on an Option type. His main point of contention is that this looks very much like testing for null. Except for one minor point: where is the Option counterpart of the case where you don't test for null?

Exactly. No such valid example exists if you want to use the value inside Option, at least not unless you're explicit about it. As Paul Snively mentions, the compiler will stop you. Cedric has noticed, "the worst part about this example is that it forces me to deal with the null case right here". But this is not the worst part, it's maybe the best part.

Do you remember a feature which was added to Java 5, which was intended to save you from another type of exception, ClassCastException? Of course, that would be generics! The problem is, it gives you type-safety, but only as long as you use classes compiled with the Java 5 compiler and you don't use the escape hatch of raw types. As soon as you start using legacy code, you leave the safety of compiler checked code. You can ignore the warnings at your own risk, because then there's no guarantee you won't get a ClassCastException.

Does this remind you of something? Compile-time safety as long as you don't use legacy code or the escape hatch? These restrictions sound exactly like the ones Option has.

And of course, there is an escape hatch. You can use Option.get, or instead of pattern matching you can even use your old friend, the if statement (Scala veterans, please close your eyes now):


val evangelists = occupations.get("evangelist")
// ugly, ugly, ugly
if (evangelists == None)
  println("No such occupation here!")
else
  println("Found occupation " + evangelists.get)

But, as you'll see later, this doesn't mean that you have to deal with the "no value" case right here. Pattern matching is not the only option.


Simplification through a generalization


Instead of solving only the most obvious problem, it always pays off to check whether it is a manifestation of a bigger class of problems. Having a related class of problems solved by a common pattern simplifies things. There are fewer rules to remember. Not only that, but the specific applications of the general solution begin to interact in ways you couldn't have anticipated before. Eventually problems will appear which you didn't know about when you implemented the general solution, and which it solves anyway.

As it turns out, the problem that the safe dereference operator addresses can be solved in Scala without any dedicated syntax. I would say that it's a fairly elegant solution, but this is a subjective opinion.


Handling both value and lack of value and stop processing


This is handled by our well-known pattern match. It seems easy to use and obvious in what it does.

The advantage of pattern matching is that it uses the type system in such a way that forgetting to handle one of the cases explicitly will result in a compile time warning.

occupations.get("builder") match {
  case Some(_) => println("builder occupation exists")
  // oops, forgot to check for None or the catch-all _
}
// warning: match is not exhaustive!
// missing combination None


The disadvantage is that pattern matching doesn't compose very elegantly. If the result of the pattern match is just an intermediate step, you'll need to add another one, and another, and pattern matching does take some screen real estate.

Pattern matching is a bit like exception handling with try/catch blocks: you usually do it when you're interested in both the normal behaviour and the exceptional behaviour, and that's fairly verbose. On a related note, did you know that you can use pattern matching in Scala's exception handlers?
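To illustrate that last remark, here is a small sketch of a catch block written as a list of pattern-matching cases. The parsePort function and its default value are hypothetical and exist only for this example.

```scala
// The catch block is a list of pattern-matching cases, just like `match`.
def parsePort(text: String): Int =
  try {
    text.trim.toInt
  } catch {
    // 8080 is an arbitrary default chosen for this sketch
    case _: NumberFormatException => 8080
  }

val valid = parsePort("80")       // 80
val fallback = parsePort("eighty") // 8080
```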

So let's see what we can do to get more composable data processing.


Transform value


When we're interested in creating a series of steps for processing a value, we can use map. It will transform the value if it's there, but will leave it inside the Option. And map won't change the Option if it's empty (None.map will result in None).


val employee = employees find ( _.name == "bob" )
// Some(Person(bob,bob@builder.com))
employee map ( _.email )
// Some(bob@builder.com)



If you need to "flatten" the result you can use flatMap. This means that instead of an Option nested inside an Option, you get just one Option. It only results in Some (a "full" Option type) if it's called on Some and the function you pass to it also returns Some:


val builders = occupations.get("builder")
// Some(List(Person(bob,bob@builder.com)))
val bobTheBuilder = builders flatMap { _ find ( _.name == "bob" ) }
// Some(Person(bob,bob@builder.com))


If you were using just map, you would get Some(Some(Person(bob,bob@builder.com))), which is probably a bit too nested for your taste.
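The difference can be checked side by side. This sketch repeats the definitions from the start of the post so that it stands on its own, and uses dot notation instead of the infix style used elsewhere; the names nested, flat and nobody are only for illustration.

```scala
case class Person(name: String, email: String)
val employees = List(Person("bob", "bob@builder.com"))
val occupations = Map("builder" -> employees)

val builders = occupations.get("builder")

// map keeps the Option returned by find inside the outer Option
val nested: Option[Option[Person]] = builders.map(_.find(_.name == "bob"))

// flatMap collapses the two levels into one
val flat: Option[Person] = builders.flatMap(_.find(_.name == "bob"))

// A missing key short-circuits: find is never called
val nobody: Option[Person] = occupations.get("evangelist").flatMap(_.find(_.name == "bob"))
```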

Some of you are probably familiar with other languages which have map (like Ruby or Python) and are scratching your heads: "Wait, wasn't map defined only for lists/Enumerables?". Please be patient.


Only get the value if it satisfies a test


If you find only some of the possible values useful, you can weed out what you have by using filter. It will only result in Some for values which satisfy a certain condition (called a predicate).

bobTheBuilder filter { _.email endsWith "builder.com"}
// Some(Person(bob,bob@builder.com))
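For completeness, here is a sketch of what happens when the predicate does not hold or when there is nothing to test. The plumber.com domain is invented for the example.

```scala
case class Person(name: String, email: String)
val bobTheBuilder: Option[Person] = Some(Person("bob", "bob@builder.com"))

// The predicate holds, so the value stays
val kept = bobTheBuilder.filter(_.email.endsWith("builder.com"))

// The predicate fails, so Some turns into None
val dropped = bobTheBuilder.filter(_.email.endsWith("plumber.com"))

// Filtering an empty Option always stays empty
val stillEmpty = (None: Option[Person]).filter(_ => true)
```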


I'm sure at this point the folks who have used Google Collections have also joined the folks with past Ruby or Python experience screaming: "Hey, but filter is only used for Collections!"


Transform lack of value


That's fine, but eventually you want to get the value out. If there's no value, just assume some default value. We have to use pattern matching again, right?

But there is a shorter solution. getOrElse extracts the value or puts a default value of the same type if there's nothing in the Option container:


val larryWho = employees find ( _.name == "larry" )
// None
val emptyEmail = larryWho map ( _.email )
// None
emptyEmail.getOrElse("nobody@nowhere.com")
// nobody@nowhere.com
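For comparison, this sketch puts the pattern match that getOrElse replaces next to the short form. It also shows orElse, which supplies a fallback Option rather than a plain value; the info@builder.com address is made up.

```scala
val emptyEmail: Option[String] = None

// The long way: handle both cases explicitly
val viaMatch = emptyEmail match {
  case Some(email) => email
  case None        => "nobody@nowhere.com"
}

// The short way: same result
val viaGetOrElse = emptyEmail.getOrElse("nobody@nowhere.com")

// orElse tries a second Option before falling back to the default
val viaOrElse = emptyEmail.orElse(Some("info@builder.com")).getOrElse("nobody@nowhere.com")
```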


Groovy has this in the form of the Elvis operator. The trouble is that you can't get rid of the Elvis operator: it adds cruft to the language, even if it's useful cruft. It's also somewhat restrictive that this is all the operator can do.


Chain, chain, chain


The reason map, flatMap, filter and getOrElse are so useful is that they can be chained together, intermixed and the results can be passed around to other methods.

Let's shift to high gear and put it all together:

occupations.get("builder").
  flatMap { _ find ( _.name == "bob" ) }.
  map (_.email).
  filter { _ endsWith "builder.com"}.
  getOrElse("nobody@nowhere.com")
// bob@builder.com


If we're not yet interested in which step of the processing failed, this is a clear way to express the process flow. It's also similar to the Fantom example Cedric described.

There is an even shorter syntax for this using for expressions (or for comprehensions).


{for (builders <- occupations.get("builder");
      bobTheBuilder <- builders find (_.name == "bob");
      email = bobTheBuilder.email if email endsWith "builder.com"
     ) yield email
} getOrElse "nobody@nowhere.com"
// bob@builder.com
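A for expression over Option is syntax sugar for the same method calls. The sketch below shows the two forms side by side. The hand-written chain is only roughly what the compiler generates: recent compilers call withFilter rather than filter, and the value definition goes through an intermediate tuple.

```scala
case class Person(name: String, email: String)
val employees = List(Person("bob", "bob@builder.com"))
val occupations = Map("builder" -> employees)

// The for expression
val viaFor = (for {
  builders <- occupations.get("builder")
  bob <- builders.find(_.name == "bob")
  email = bob.email
  if email.endsWith("builder.com")
} yield email).getOrElse("nobody@nowhere.com")

// Approximately the same thing written with explicit method calls
val viaMethods = occupations.get("builder")
  .flatMap(builders => builders.find(_.name == "bob"))
  .map(bob => bob.email)
  .filter(email => email.endsWith("builder.com"))
  .getOrElse("nobody@nowhere.com")
```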


But wait, weren't for expressions a way to loop over stuff? Well, yes, this too. More generally, for expressions work with collections. And the beauty of it all is that we can use collections together with Option and do nested invocations. Let's modify the example a bit and suppose that there might be more than one person named Bob and we want them all.

for (builders <- occupations.get("builder") toList;
     bobTheBuilder <- builders if bobTheBuilder.name == "bob";
     email = bobTheBuilder.email if email endsWith "builder.com"
    ) yield email
// List(bob@builder.com)


This works because, for all practical purposes, Option behaves like a specialized collection. By viewing it as one, you reuse the experience of all the programmers using Groovy, Ruby, Python, Google Collections and whatnot, and flatten the learning curve.

Now imagine that the only way to work with a collection is to pattern match it. Would you use it? Yeah, me neither.


Safe invoke and composability


Now let's see how filter works in Fantom:

fansh> list := [1, 2, null]
fansh> list.findAll |v| { v.isEven }
sys::NullErr: java.lang.NullPointerException

Oh crap, then I need to use the safe invoke operator:


fansh> list.findAll |v| { v?.isEven }
ERROR(20): Cannot return 'sys::Bool?' as 'sys::Bool'

But it all results in a compile-time error. It's the same story with the reduce higher-order function:

fansh> list.reduce(0) |r, v| { v + r }
sys::NullErr: java.lang.NullPointerException

So in practice I can't use the safe invoke operator with filter/reduce on collections containing nulls, a limitation that the Fantom documentation page conveniently omits. So we're back to checking for null the old way. This means that, unlike flatMap, the safe invoke works fine when you chain, but not when you compose.
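For contrast, here is a sketch of how the same list might look in Scala if the missing element is modelled as None instead of null. This is one possible translation of the Fantom session, not the only one.

```scala
val list: List[Option[Int]] = List(Some(1), Some(2), None)

// flatten drops the None entries and unwraps the rest
val present = list.flatten

// The higher-order functions then compose without any null checks
val evens = present.filter(_ % 2 == 0)
val sum = present.foldLeft(0)(_ + _)

// Or stay inside Option and keep track of the missing element
val evenFlags = list.map(_.map(_ % 2 == 0))
```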


Iterator


Let's now see some other advantages of Option behaving like a collection. For instance, it lets you use Iterable's API in some elegant ways:

val noVal: Option[Int] = None
val someVal = Some(4)
List(1,2,3) ++ someVal ++ noVal
// List(1, 2, 3, 4)


Guess what happens here? Only the numbers contained in Some are added to the list. I think there's no operator for this in Fantom and Groovy, and it would be overkill to include one, too.
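The same property lets a function which returns an Option be used directly with a collection's flatMap or inside a for expression. A small sketch, with a hypothetical parseInt helper:

```scala
// Hypothetical helper: Some(number) if the text is a valid integer, None otherwise
def parseInt(text: String): Option[Int] =
  try Some(text.toInt)
  catch { case _: NumberFormatException => None }

// Only the successfully parsed values end up in the result
val parsed = List("1", "two", "3").flatMap(s => parseInt(s))

// The same thing as a for expression mixing a List and an Option
val parsedViaFor = for {
  s <- List("1", "two", "3")
  n <- parseInt(s)
} yield n
```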


One size doesn't fit all


Wait, if Option is like a collection, does this mean that there are many types of Option? Does this mean I can create my own Option?

Yes, and yes. Just as there isn't just one type of List, or one type of Map, there can also be several types of Option. For instance, Lift defines its own, which it currently calls Box (I think it's a great metaphor). One of the things Box has in addition to Option is a type to collect Failures. It's no longer just a "dunno what happened, something failed along the way". It's a list of error messages which can pinpoint exactly what went wrong. This is invaluable for a web framework, because when a user expects a complex form to be validated, a simple "some of your input is wrong" just won't cut it.


for {
  id <- S.param("id") ?~ "id param missing" ~> 401
  u <- User.find(id) ?~ "User not found"
} yield u.toXml


And guess what, you can also define your own operators, which also work in for comprehensions.

for {
  login <- get("/account/verify_credentials.xml", httpClient, Nil) !@ "Failed to log in"
  message <- login.post("/statuses/update.xml", "status" -> "test_msg1") !@ "Couldn't post message"
  xml <- message.xml
} yield xml


Except that they're not operators. When conventions and patterns evolve, Lift folks can always change the "operator". Or another framework can do it.
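To make this concrete, here is a minimal sketch of a home-grown Option-like type which remembers why it is empty. It is not Lift's Box: the names Outcome, Ok, Failed, fromOption and the !! method are all invented for this example. Defining map and flatMap is enough for the type to work in for expressions, and !! is just an ordinary method with a symbolic name.

```scala
// A hypothetical, stripped-down cousin of Option that carries a failure message
sealed trait Outcome[+A] {
  def map[B](f: A => B): Outcome[B] = this match {
    case Ok(value)   => Ok(f(value))
    case Failed(msg) => Failed(msg)
  }
  def flatMap[B](f: A => Outcome[B]): Outcome[B] = this match {
    case Ok(value)   => f(value)
    case Failed(msg) => Failed(msg)
  }
  // Not an operator, just a method: replace the message of a failure
  def !!(message: String): Outcome[A] = this match {
    case Ok(value) => Ok(value)
    case Failed(_) => Failed(message)
  }
}
case class Ok[+A](value: A) extends Outcome[A]
case class Failed(message: String) extends Outcome[Nothing]

def fromOption[A](option: Option[A]): Outcome[A] = option match {
  case Some(value) => Ok(value)
  case None        => Failed("no value")
}

// Made-up data standing in for a user store
val users = Map("1" -> "bob")

def lookup(params: Map[String, String]): Outcome[String] = for {
  id   <- fromOption(params.get("id")) !! "id param missing"
  name <- fromOption(users.get(id)) !! "User not found"
} yield name
```

A caller can now tell the two failures apart, for example lookup(Map("id" -> "7")) yields Failed("User not found"), while a missing parameter yields Failed("id param missing").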

Another good example of using an enhanced Option is Josh Suereth's (jsuereth) Scala ARM library. It collects a list of errors, and the safe resource blocks proposed for Java 7 look primitive in comparison.


Option is not only Scala's to have



Actually, there's nothing specific about Option that ties it to Scala. The only thing which is Scala specific is the syntax sugar of for comprehensions. You can use Option in Java if you want, although some of the examples above wouldn't be as concise and so it might be a bit of a pain. But this doesn't stop people from trying to recreate Maybe in Java.

Many folks have even tried to cheat and use Java's enhanced for expression as syntax sugar. So if Java can afford some syntax sugar over Iterator, and according to Joshua Bloch the for expression is a clear win, why shouldn't Scala do it? The difference is only that Scala's is applicable to a wider set of problems.


Why language syntax won't save you from the future


One advantage which some people don't realize Scala has is that it's a relatively minimal language with a relatively rich library. Apart from Option, there are other examples where having a library instead of a language feature has brought huge benefits to Scala. One such example is actors.

Let's compare Scala's actors to Erlang's. Undoubtedly Erlang is the daddy of practical actor implementations. Actors in the Erlang virtual machine have some superb characteristics which other runtime implementations will have a hard time catching up with. They're scalable and lightweight. They work across hosts and in the same virtual machine. They can be hot swapped and you can create millions of them in a single virtual machine.

But there's only one type of actor. This means it must deal with all possible cases and, as usually happens, it deals better with some and worse with others. I have no doubt that when actors are part of the language, choosing only one type of actor is a very sensible decision, but it can still be restrictive sometimes.

Scala deals with this differently. Scala's actors are not part of the language, and the actor message send syntax (which is borrowed from Erlang) is just a method invocation in a library. This means that Scala is free to evolve different actor implementations, and you're free to choose the one which suits your case better. Some are more full-featured, some are lightweight and performant; some are remote, some are local; some use a thread pool, some use a single scheduler; some use managed hierarchies, some don't. Regarding actors, Scala the language is smaller than Erlang, but the Scala libraries are richer.
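The following toy sketch shows how a library can provide the Erlang-style send syntax without any language support. Mailbox is a made-up class, not a real actor (there is no scheduling or concurrency here); it only demonstrates that ! is a plain method name.

```scala
import scala.collection.mutable

// A toy mailbox: stores messages in arrival order
class Mailbox[A] {
  private val queue = mutable.Queue.empty[A]

  // `!` is an ordinary method; the compiler treats it like any other name
  def !(message: A): Unit = { queue += message }

  // Return all pending messages and empty the mailbox
  def drain(): List[A] = {
    val messages = queue.toList
    queue.clear()
    messages
  }
}

val inbox = new Mailbox[String]
inbox ! "hello"   // infix form, reads like Erlang
inbox.!("world")  // the same call with explicit method syntax
val received = inbox.drain()
```

Since the syntax belongs to the class, any other library can define its own ! with different semantics.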

Which actor library will win? I don't know. And probably neither do you. There might not be one best answer. That's why hardcoding stuff in the language is not a good way to prepare for the future. Only experience will, either the collective experience of the community or the extensive experience of a genius Benevolent Dictator For Life.

Saturday, September 11, 2010

Top 5 underused GNU screen features

Most people use GNU screen mostly for its abilities to detach from the terminal and create multiple shell sessions. This makes it ideal in combination with remote terminal connections like ssh. There are some features which, although not very popular, are still useful on a number of occasions.



As a bonus, let's start with a short explanation of the popular features. If you're familiar with detaching and multiplexing, you might want to skip to the sections about more rarely used features.



Detaching


The first thing to know about screen is that all shortcuts start with Ctrl + A by default.


The most common screen workflow is:



  1. Connect to a remote system via ssh




  2. Start/attach screen:



    screen -D -RR

    This command reattaches to a running screen session (detaching it elsewhere first, if necessary) or creates a new one if none exists.



  3. Work




  4. Detach screen by pressing Ctrl + A, then D.



  5. Disconnect.



  6. Take a break, return to 1 :)




You can find the currently running screen sessions using the command:




screen -ls

Alternative: nohup is also used when you want to detach a program when you disconnect, but it will never attach to the terminal, so you cannot use it if it's interactive.




Multiplexing


When you connect via ssh, one shell is rarely enough. Connecting a second time is often inconvenient and slow. Enter screen: you can create a new window (with its own shell) using Ctrl + A, then C. You switch to the next one using Ctrl + A, then space (or n). You switch to the previous one using Ctrl + A, then backspace (or p). Using Ctrl + A and then a digit, you can jump directly to the window numbered 0-9.



Alternative: using ssh's option ControlMaster will reuse an existing connection and open another shell immediately, but this is only relevant for ssh.


Now you know the basics, let's see what other goodies screen offers.





1. Sharing sessions


Typing in a terminal is usually a lonely experience. Sometimes you wish that you could show someone else what you're doing or even do it together. It might be because you want to teach someone some UNIX tricks, or you must solve an issue together or you're up to the challenge of remote pair programming. Your wish is granted! In screen, you can make it so that your keyboards and monitors are in control of a single session.


The screen host should do the following:



  1. Ctrl + A then type:



    :multiuser on



  2. Ctrl + A then type:



    :acladd guest

    where guest is the name of the user you want to let in your session




Then the screen guest can join using the following command (which will work if there's a single multi-user session open):




screen -x username/

Alternative: VNC can be used only for graphical remote connection sharing.




2. Copy and paste


It's not always easy to copy and paste text in a terminal. For one, if you're on a getty console (the black screen with the login prompt and no graphics), you can't even use your mouse (usually). If you have a crappy X terminal, you might have the problem that wrapped lines are cut by newlines or it's hard to select the output if it spans more than a single screen... Whatever it is, with screen you can cut like a pro without even using the mouse.


Ctrl + A, then pressing "[" will enter copy mode. When in copy mode, you navigate around using the vi key shortcuts (you know the vi shortcuts, otherwise you wouldn't be reading about working in terminal sessions, right?).



Press space once to mark the beginning and a second time to mark the end of the snippet you want to cut.


Ctrl + A, then "]" will paste the copied text.


Did I mention screen can also copy rectangular blocks of text? Reading about it in the manual is left as an exercise for the curious reader.


Alternative: X-Server's select/middle-click can be used- only if you're running in an X session, though.





3. Log output


We humans are not good at remembering stuff; that's what computers are for. To record exactly what you typed and what the output was, start logging using Ctrl + A, then H, and use the same sequence to stop logging. The log is written to screenlog.0 if you're recording in the first window, screenlog.1 in the second one, etc.


Alternative: The UNIX command script will start a new shell and record all keystrokes and output in the file typescript.




4. Monitor for activity


Let's say you've started a long-running command and you're waiting for it to finish while you're busy typing in another window. Switching the window periodically just to check what's going on quickly becomes annoying, so you type Ctrl + A, then M and you're set- screen is watching the console for you. If anything changes in this window, you will see a notification in the status line at the bottom.



Alternative: Many Linux terminals can monitor for activity, e.g. konsole, but they're graphical and need to run in an X session.




5. Lock screen


It's often useful to lock the screen session without detaching, especially when using a multi-user session. Ctrl + A, then x, and you're done. Type your password and your session is available again.


Alternative: Popular desktop environments use xscreensaver, but again- this only works in X sessions.





Where to go from here?


If you like GNU screen, you might also try tmux. According to the site, "tmux is intended to be a modern, BSD-licensed alternative to programs such as GNU screen". If you're interested, check out this detailed tmux blog post.


If you want to take screen to the next level, you might give tiling window managers a try. They do for the graphical environment what screen does for the terminal. The idea is that you don't resize your windows, but the window manager does that automatically by partitioning the desktop area into adjacent windows. Most of these also try to use keyboard shortcuts extensively and obviate the need for a mouse. I'm currently impressed by xmonad, although I've heard nice things about the awesome window manager as well (yes, that's its name).



Tuesday, August 10, 2010

Testing with Lift's TestFramework

Getting started



HTTP testing is sufficiently tedious that some folks don't do it. Even if we do it, in Java it doesn't look as pretty as it could be.

The Scala Lift web framework, apart from the other advantages it has (which are a topic for many a blog post) offers some syntax-sugar wrappers so that testing of our REST APIs can be concise and to the point.

Combined with Jetty, this leads to some seriously short and readable code.

First, we need to:

  • use the trait "with TestKit" in our test

  • override the baseUrl property

  • start our Jetty server





Then if we want to output our own failure message later, we need to provide an implicit value of type ReportFailure, where we only need to implement the fail method, which predictably takes a String. For example, regardless of whether we use ScalaTest or Specs, we can fall back to the fail method which is inherited by our test class:



This is all we need so far.

Using it



Now we're ready to GET some action.



There's already a lot going on here. First of all, the get and post methods return a TestResponse (you don't see it, because it's inferred). And the nice thing about TestResponse (and Scala) is that since it implements the foreach method, it can be used in for expressions together with the "<-" symbol.

Whatever is extracted by the for expression can be used to issue get and post requests, but in the context of the previous request. This means that cookies are preserved, so you can use the same HTTP session. In this contrived example we use an httpClient as an argument to the get method (which is supposed to use some HTTP credentials and log us in). In the following call, we don't need to use the authenticating http client, because we can have our cookies (and eat them, too).

We can also use http parameters as a sequence of tuples, which can be delimited using the "->" operator.

For any TestResponse, we can use the xml property, which is the XML as a scala.xml.Elem wrapped in the (ubiquitous for Lift) Box.

Finally, there's concise syntax to assert certain properties about the response. For example the "!@" operator checks if the response code is 200, otherwise it fails with the error message specified after the operator.

If we expect a specific return code other than 200, we can also specify it explicitly:



Customizing



If you want to define a HTTP basic authentication client to be used by default by your requests, you can override the method theHttpClient. Lift's TestFramework provides the buildBasicAuthClient method, which can be reused to quickly create an HTTP client with a set user and password.



If you're like me, you might lose hours trying to find out why the request doesn't work when the server doesn't explicitly request authentication. Then setting the preemptive flag on the client would definitely save some time.

I know some of you Scala geeks are already bored because everything so far is just so simple. There wasn't even any mention of implicits! But rest assured, I will find some use for implicits.

Pimping



The usage snippet in the for expression above is not quite right. If the response contains no valid xml, then the specs matcher will not be executed and the test will not fail, although it should. Here we can use the fact that TestResponse also implements the filter method, which allows us to have if expressions (also known as filters). The post snippet could then be rewritten like this:



We are again using the Specs matcher to test if the response contains xml and fail otherwise. Notice that this expression does not return true: it need not be boolean. This is due to the fact that TestResponse's filter method intentionally accepts a function returning Unit, not Boolean.

However, that's too much boilerplate to repeat every time we want to check the contents of an xml response:


  1. Check if the response is valid and has a response code of 200

  2. Check if the xml is not empty

  3. Extract the xml

  4. Only then check the contents of the XML



If only we could make TestResponse do what Specs does with XML... But since this is Scala, we can create a wrapper and then transparently substitute it using our implicit conversion:



An important thing to remember about implicit conversions is that if we want them to feel seamless, the wrapper class' methods should return the original class type (the one before the conversion). Otherwise we could get unexpected object classes similarly to what occurred with the Scala 2.7 RichString, for instance with "aaa" reverse == "aaa"

Since we have chosen to use the same operator as Specs uses, we had to disambiguate with the XmlBaseMatchers class name, which is the specs trait containing all the XML matching goodness.

Now we can use our fancy TestResponse operators. Notice that since we return the same class type that we already had, we can chain:



This looks more compact than the original version so you can focus on the API differences and not be distracted by boilerplate preconditions.
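The wrapper pattern described above can be sketched independently of Lift. Everything in this example is hypothetical: Response stands in for a class we cannot modify, and RichResponse, mustHaveCode and mustContain are invented names. The key point is that each wrapper method returns the original type, so the calls chain and the conversion is applied again at every step.

```scala
import scala.language.implicitConversions

// Hypothetical stand-in for a response type we do not control
case class Response(code: Int, body: String)

// Wrapper adding assertion-style methods; each returns the original Response
class RichResponse(response: Response) {
  def mustHaveCode(expected: Int): Response = {
    require(response.code == expected, "expected " + expected + " but got " + response.code)
    response
  }
  def mustContain(text: String): Response = {
    require(response.body.contains(text), "body does not contain " + text)
    response
  }
}

// The implicit conversion substitutes the wrapper transparently
implicit def enrichResponse(response: Response): RichResponse = new RichResponse(response)

// Both checks pass, and the result is still a plain Response
val checked = Response(200, "<status>ok</status>").mustHaveCode(200).mustContain("ok")
```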

Friday, March 12, 2010

Pomodoro- what's in it for me?

Lately I've been trying to use the Pomodoro technique. I have been using time boxing for a longer time and find Pomodoro to be a simple and efficient framework which revolves around time boxing.

Pomodoro being so simple, I was rather surprised to find a critique of Pomodoro recently. The critic ignores the fact that the Pomodoro author doesn't mandate that Pomodoro be used on every occasion; that a Pomodoro may use a different time unit than 25 minutes; or that you can void any Pomodoro if you see fit. But leave aside the fact that the critic misinterprets how strict Pomodoro should be. By focusing only on productivity, it's easy to overlook an equally important aspect: Pomodoro also helps relieve stress and prevents burnout.

Let me try to explain how Pomodoro achieves this by describing the basic building blocks of the Pomodoro process, which applies to time-boxing in general.

What is time boxing? Simply put, it's deciding what to do in the next fixed short period of time- and sticking to it. This consists of the following key stages and transitions:

1) at the start of the Pomodoro- committing to one thing only for the next time period

2) during the Pomodoro- trying to focus on this one thing without interruptions

3) at the end of the Pomodoro- wrapping up and detaching from the work, followed by a short break

Now let's enumerate some of the reasons for stress, anxiety and burnout. Among these are: multitasking, procrastination, interruptions and obsessing over the completion of a task.

Committing to one thing is crucial both for a productive and stress-free state of mind. Multitasking is proven to lead to anxiety and inefficiency time and time again. Why? First of all, because switching contexts is very inefficient and exhausting at that. Getting into the right frame of mind to do a task takes time and effort and switching to another task destroys all that mental preparation.

Multitasking also leads to the impression that we're not accomplishing anything. If you have followed Joel's article on the subject, the explanation is simple: if you do 2 tasks which take 10 minutes each sequentially, you will have a finished task after 10 minutes. If you switch the tasks every minute, you will only get the first task done after 19 minutes! Multiply this by the number of tasks you're trying to switch every day and you get the picture.
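For the curious, the arithmetic can be checked with a few lines of Scala. This is only a sketch of the idealized scenario above: the finishTimes function is made up for the example, and it assumes that switching tasks costs nothing, which in reality only makes matters worse.

```scala
// Works on the tasks round-robin, `slice` minutes at a time, and returns
// the minute at which each task is finished.
def finishTimes(durations: List[Int], slice: Int): List[Int] = {
  require(slice > 0, "slice must be positive")
  val remaining = durations.toArray
  val finishedAt = Array.fill(durations.length)(0)
  var clock = 0
  while (remaining.exists(_ > 0)) {
    for (task <- remaining.indices if remaining(task) > 0) {
      val worked = math.min(slice, remaining(task))
      clock += worked
      remaining(task) -= worked
      if (remaining(task) == 0) finishedAt(task) = clock
    }
  }
  finishedAt.toList
}

val sequential = finishTimes(List(10, 10), 10) // one task after the other
val switching = finishTimes(List(10, 10), 1)   // switching every minute
```

Done sequentially, the tasks finish after 10 and 20 minutes; with switching every minute they finish after 19 and 20 minutes.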

The conclusion is that multitasking takes more effort and gets things accomplished slower. If this is not frustrating, I don't know what is.

Furthermore, giving a time limit for a simple task helps you get started. It encourages decomposing the problem into subtasks and transforms a huge amorphous blob of work look like something more palatable. More importantly, getting started is the best way to defeat procrastination, and procrastination is also a major factor for anxiety. Now which is more stressful: a heap of work, where you don't know where to start, or neat organized small tasks, each of which is easier to estimate? Try for yourself.

Interruptions are another very frequent source of frustration. An extremely important but often forgotten type of interruptions is internal ones. Keeping track of one's desire to chat with someone, check your email or just browse the web for something interesting which has popped in your mind is hard to resist, and can break your flow. It is, however, much easier to control these urges if you know that a break is coming soon, rather than when you feel that it's all work all day long, and a small diversion will just take a couple of seconds. Except that it doesn't.

Interruptions from other people are harder to avoid, especially in a job, which is related to responding to different events, like answering a support hotline. However, even in the case of phone calls, you cannot be in two unrelated conversations at once. Besides, the criticism of Pomodoro listed two examples which are notably free from interruptions: a surgeon in an operation or a lawyer defending a case in court. You don't need to control task switching in these scenarios because the situation does not allow for any other tasks to be performed. Can you imagine a lawyer in court or an operating surgeon surf the web or check their mail? Thought so.

Finally, it is very easy to forget that the slow but steady runner wins the race. We often lose track of how much time and effort we have been spending on a task. The short length of the Pomodoro (or any time boxing technique) helps you take a step away and ask: where am I going with this? Is it taking longer than anticipated? Am I actually doing anything or going in circles? Taking a break and detaching yourself from the task at hand helps you see the bigger picture and backtrack if you've reached a dead end. Besides, taking a break will often let your subconscious find a solution, which is otherwise not obvious. Obsessing on completing the task is counterproductive and will leave you unable to take on the next task and burned out in the long run.

There are other benefits of time-boxing, but all in all, it is a very useful technique. Even people who claim they are able to concentrate without taking breaks are often taking them subconsciously, by letting their minds meander from time to time or switching for a couple of minutes to less stressful aspects of the task. But why rely on being a born time-boxer? Once you agree that time-boxing is helpful, it pays off to make it a habit. It will make you more focused and calm.

And remember- if it ever feels like you're more stressed by using time-boxing, don't push yourself too hard and stop it. Take a break; enjoy your own flow.

Sunday, January 17, 2010

Is Scala more complicated than what Java tries to become?

Is Scala more complicated than Java? My last post did not tell the whole truth. I listed only Scala features that have a Java analog. There was a glaring omission of advanced Scala features like implicit conversions, operator overloading, call-by-name parameters and pattern matching. These Scala features are more complicated than what Java has. There, I said it. But then Scala is more complicated in the way a calculator is more complicated than an abacus- sure, you can do some of the same stuff with an abacus, but trying to calculate the square root of a number is much more cumbersome.

However, this complexity pays off, because it lets us simplify many day-to-day features. This post will try a different angle by comparing where Java wants to be and where Scala is right now. I hope after reading it you will at least question your assumptions about whether this trade-off is worth it.

Upon its creation, Java was a fairly simple language. A major reason it took over from C++ is that it was specifically designed to steer away from multiple inheritance, manual memory management and pointer arithmetic. But it's not a simple language anymore, and it's getting more and more complicated.

Why? Java wasn't designed to be too extensible. Scala, on the other hand, was designed to be scalable, in the sense of flexible syntax. The creators of Java themselves knew very well that a "main goal in designing a language should be to plan for growth" (Guy Steele's famous words from Growing a Language).

We need to expand our definition of language complexity. The language needs to be able to abstract away accidental complexity, or using it will be difficult. Examples of accidental complexity: jumping to a position in your program with goto, and then remembering to go back (before procedural programming); or allocating memory, and then remembering to deallocate it (before garbage collectors). Another example: using a counter to access collections, and remembering to initialize and increment it correctly, not to mention checking when we're done.

Creating extensions of the language in order to hide these complexities doesn't happen often. When it does, it offers huge rewards. On the other hand, if a language is rigid, even though it looks simple, this forces you to invent your own arcane workarounds. When the language leaves you to deal with complexity on your own, the resulting code will necessarily be complicated.

Let's look at the special new language features Java is trying to add, which Scala can already express because of its flexibility and extensibility.

Pattern matching



Pattern matching is often compared with Java's switch/case statement. I have listed pattern matching as something which doesn't have an analog in Java, because comparing it to "switch" really doesn't do it justice. Pattern matching can be used for arbitrary types, it can be used to assign variables and check preconditions; the compiler will check if the match is exhaustive and if the types make sense. Meanwhile Java has only recently accepted Strings in switch statements, which is only scratching the surface of Scala's pattern matching.

Furthermore, Scala uses pattern matching throughout the language- from variable assignment to exception handling. By comparison, the proposal for handling multiple exceptions in Java has been postponed yet again.
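To make the points above a bit more concrete, here is a minimal sketch. The Shape hierarchy, the describe and parse methods and all other names are made up for illustration; the sketch only shows matching on types, binding variables, a guard as a precondition, destructuring in a variable definition and case clauses in a catch block.

```scala
// Hypothetical shape hierarchy, used only for illustration.
sealed trait Shape
case class Circle(radius: Double) extends Shape
case class Rect(width: Double, height: Double) extends Shape

object PatternMatchingSketch {
  // Matching on arbitrary types, binding variables and checking a
  // precondition (the guard) in one construct. Because Shape is sealed,
  // the compiler can warn when a case is missing.
  def describe(shape: Shape): String = shape match {
    case Circle(r) if r <= 0 => "degenerate circle"
    case Circle(r)           => "circle of radius " + r
    case Rect(w, h)          => "rectangle " + w + " by " + h
  }

  // Pattern matching in a variable definition: destructure a tuple.
  val (name, age) = ("Ann", 30)

  // Pattern matching in exception handling: catch takes case clauses,
  // and one clause can cover several exception types.
  def parse(s: String): Int =
    try s.toInt
    catch {
      case _: NumberFormatException | _: NullPointerException => -1
    }
}
```

Removing the Rect case from describe should make the compiler complain that the match may not be exhaustive, which is the check a Java switch cannot offer.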

Case classes



In order to get rid of Java's verbose getters, setters, hashCode and equals, one solution is to muck with the javac compiler, like the folks from Project Lombok have done. Is going to the guts of javac complicated? I'm sure it is.

In Scala, you get the same result simply by defining your classes as case classes.
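A small sketch of what that looks like, with a made-up User class. Note that the generated fields are immutable, so instead of setters you get a copy method (assuming Scala 2.8 or later).

```scala
// Hypothetical domain class; one line replaces the hand-written
// fields, accessors, equals, hashCode and toString.
case class User(name: String, age: Int)

object CaseClassSketch {
  val ann   = User("Ann", 30)       // no `new` needed
  val twin  = User("Ann", 30)
  val older = ann.copy(age = 31)    // copy with one field changed

  val sameValue = ann == twin                    // structural equality
  val sameHash  = ann.hashCode == twin.hashCode  // consistent with equals
  val shown     = ann.toString                   // "User(Ann,30)"
}
```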

Implicit conversions



In short, implicit conversions help transparently convert one type to another if the original type doesn't support the operations requested.

There are many examples where this is useful.

What in Java is hardcoded in the language as conversions and promotions, in Scala is defined using implicit conversions. This is another example where Java can get quite complicated. In most cases where you need to decide how a method argument will be converted, for instance, you must keep in mind narrowing and widening conversions, promotions, autoboxing, varargs and overriding (whew!). In Scala, the advantage of implicit conversions is that they are ordinary code which you can inspect, and the compiler rejects ambiguous conversions. You can see which conversions are applied by starting the interpreter with the "-Xprint:typer" parameter. You can even disable these implicits, if you don't like them, by shadowing the import.

Another example of what implicits can do is adding methods and functionality to existing classes. Dynamic languages already do that easily using open classes and "missing method" handlers. In Java, one way to do this is bytecode manipulation trickery via libraries like cglib, bcel, asm or javassist.
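Here is a minimal sketch of adding a method to an existing class with an implicit conversion. RichWord, shout and stringToRichWord are invented names; the import of scala.language.implicitConversions is only there to keep newer compilers quiet and can be dropped on older Scala versions where it does not exist.

```scala
import scala.language.implicitConversions

object ImplicitSketch {
  // Hypothetical wrapper that carries the extra method.
  class RichWord(s: String) {
    def shout: String = s.toUpperCase + "!"
  }

  // The implicit conversion: when the compiler sees a method that String
  // does not have, it looks for a conversion in scope that provides it.
  implicit def stringToRichWord(s: String): RichWord = new RichWord(s)

  // The compiler rewrites this to stringToRichWord("hello").shout
  val loud = "hello".shout
}
```

No bytecode is manipulated here: String itself is untouched, and the conversion only applies where stringToRichWord is in scope.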

Bytecode manipulation in Java is required for popular libraries like Hibernate, Spring and AspectJ. Few "enterprise" Java developers can imagine development without Hibernate and Spring. Although there are many more things you can do with AspectJ, it can be used to emulate implicits with inter-type member declarations. However, even though using AspectJ is a more high-level way to solve the problem, it adds even more complexity, as it defines additional keywords and constructs.

If you're new to Scala, you don't lose much if you don't know how implicit conversions work, just like you don't need to know about the magic that happens behind the scenes when Hibernate persists objects or when Spring creates its proxies. Just as with bytecode generation, you're not advised to use this feature often, as it is difficult to use. Still, you'll be glad it exists, because someone will create a library which will make your life and the life of many developers so much easier.

Operator overloading



The line between operators and methods in Scala is blurred- you can use the symbols +, -, /, *, etc. as method names. In fact, that's exactly how arithmetic operators work in Scala- they are method invocations (relax, everything is optimized by the compiler).

Some people object that operator overloading adds unnecessary complexity, because operators can be abused. Still, you can abuse method naming in much the same way. For instance, some hapless folk can define methods with visually similar symbols, like method1, methodl and methodI. They can use inconsistent capitalization, like addJar or addJAR. One could use meaningless identifiers like ahgsl. Why should operator best practices be different from method naming best practices?
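As a sketch of operators being plain methods, consider a made-up Vec2 class. The names are hypothetical; the point is only that + and * are declared like any other method and can be called in infix or dotted form.

```scala
// Hypothetical 2D vector type, for illustration only.
case class Vec2(x: Int, y: Int) {
  def +(other: Vec2): Vec2 = Vec2(x + other.x, y + other.y)
  def *(k: Int): Vec2      = Vec2(x * k, y * k)
}

object OperatorSketch {
  val a = Vec2(1, 2)
  val b = Vec2(3, 4)
  val infix  = a + b * 2      // usual precedence: * binds tighter than +
  val dotted = a.+(b.*(2))    // the same calls written as ordinary method invocations
}
```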

What is complicated is treating numeric types like ints and BigInteger differently. Not only that, but operations with BigInteger are very verbose and barely readable even in simple expressions. To compare, this is how a recursive definition of factorial looks in Scala with BigInt:


def factorial(x: BigInt): BigInt =
  if (x == 0) 1 else x * factorial(x - 1)


This is how it would look if Scala didn't support operator overloading:


import java.math.BigInteger

// Same recursion, spelled out with named methods instead of operators.
def factorial(x: BigInteger): BigInteger =
  if (x == BigInteger.ZERO)
    BigInteger.ONE
  else
    x.multiply(factorial(x.subtract(BigInteger.ONE)))


Call by name



One of the proposals for Java 7 language extension was automatic resource management. This is one more rule added to the language, which you need to remember. Without this feature, code is also unnecessarily complicated, because it forces you to remember to always close resources after using them- if you slip up, subtle bugs with leaking files or connections can result.

In Scala, it's easy to add language constructs like this. Using function blocks, which are evaluated only when they are invoked, one can emulate almost any language construct, including while, if, etc.
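A rough sketch of both ideas follows. The helpers using and repeatWhile are not part of the standard library; they are hypothetical names chosen for this example. The first emulates automatic resource management with a function block, the second emulates a while loop with by-name parameters.

```scala
import java.io.{BufferedReader, Closeable, StringReader}

object ByNameSketch {
  // A library-defined stand-in for automatic resource management
  // (hypothetical helper; the name `using` is just a convention).
  def using[R <: Closeable, A](resource: R)(body: R => A): A =
    try body(resource)
    finally resource.close()

  // A home-made while loop. `=> Boolean` and `=> Unit` are by-name
  // parameters: they are re-evaluated every time they are referenced.
  def repeatWhile(condition: => Boolean)(body: => Unit): Unit =
    if (condition) {
      body
      repeatWhile(condition)(body)
    }

  // The reader is closed even if readLine throws.
  def firstLine(text: String): String =
    using(new BufferedReader(new StringReader(text))) { reader =>
      reader.readLine()
    }

  def countTo(n: Int): Int = {
    var i = 0
    repeatWhile(i < n) { i += 1 }
    i
  }
}
```

At the call site both helpers read like built-in constructs, yet they are ordinary methods that can be changed or replaced without touching the language.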

Existential types



Existential types are roughly an alternative to Java wildcards, only more powerful.

Martin Odersky: If Java had reified types and no raw types or wildcards, I don't think we would have that much use for existential types and I doubt they would be in Scala.


If Martin Odersky says that existential types wouldn't be in the language if it wasn't for Java compatibility, why would you even need to know about them? Mostly if you need to interoperate with Java generics.

Conclusion



Scala tries to define fewer language rules, which are, however, more universal. Many of these advanced features are not often used, but they pay off by allowing you to create constructs which in Java would require specific hardcoded additions to the language. In Scala, they can be defined simply as libraries.

Why does it matter that it's in the libraries, and not hardcoded in the language? You can more easily evolve and adapt these features, you can add your own extensions, and you can even disable some of the library parts or replace them.

The conclusion is that if a language is not designed to be extended, it will eventually develop features which are not well integrated, and it will collapse under the weight of its own complexity.

Finally, learning something so that you avoid a lot of routine error-prone operations reduces effort by increasing the level of abstraction, at the cost of additional complexity. When you were in school, it was complicated to learn multiplication, but once you got over it, it saved you quite a bit of repetition compared to using only addition.

P.S. I realize it's not possible to resolve once and for all which language is more complicated- Java or Scala- in a post or two. First of all, keep in mind that simple is not the same as easy to use. There are also many topics which are open for discussion. I haven't touched on Scala traits; I haven't mentioned functions as first-class constructs compared to the Java 7 closure proposal; and there's a lot that can be said about how Scala obviates many Java design patterns. Extending the Scala syntax via compiler plugins is another interesting advanced topic.

I suppose someone could even write a blog post about these topics some day.

Monday, January 4, 2010

Is Scala more complicated than Java?

One Scala-related thread on Artima drew a lot of attention: "Is Scala really more complicated than Java?". This post really struck a nerve. Whoever claims that Scala is much more complicated than Java has clearly not seen a Java Programmer Certification in a while and is probably not using many new features since Java 5 came out.

I won't try to prove in this post that Scala is not a complicated language. There are certainly many languages which are simpler. The core features which are used reasonably often are indeed a simplification over Java. Scala also has features which are more complicated than what Java has. However, the complicated Scala features are specialized for extending the language, while the complexity of Java is usually imposed on everyone, including the beginner.

This post will also not try to describe the language features in exhaustive detail- that's what the language specification tries to achieve, and the blog post is already long enough. I will assume that you know about the core language rules or can easily look them up.

What is complexity? Many conflate it with unreadability, some say it's the opposite of ease of use. Let's start with the following definition of complexity: it's the many special exceptions (pun not intended) to the rules, making the whole system difficult to understand and remember.

Based on that definition, let's run a comparison of language features of Java and Scala.

Keywords



Java has more reserved words than Scala. Even when we remove the words for primitive types (boolean, int, float, etc.), Scala still has fewer keywords!


  • Scala misses some of Java's control structures. Yes, continue and break (well, at least until Scala 2.8) are not part of the language, as they are not deemed a best-practice way to solve programming problems.
  • if/then/else in Scala returns a value, thus eliminating the need for the only ternary operator in Java, "?:", the use of which has long been discouraged.
  • for loop: Java folks discovered late in the game that the enhanced for loop is much less complicated to use in cases when you don't need the counter index. And they were right- it's one more detail which you (or at least newbies) can get wrong. But why stop there- Scala has a much more universal for loop, and there aren't two different syntaxes as in Java.
  • Scala keywords not in Java: one might argue that the override keyword in Scala complicates things, as you might do the same thing in Java with an @Override annotation. That's not quite the case, as you still might override a method by accident and forget to put the annotation (as it's not mandatory), and then the compiler will not give so much as a warning! So that's one more special case you need to worry about and keep in your head. When you start using traits, you definitely start to appreciate that override is a keyword.
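The items above can be sketched in a few lines. All names here (sign, pairs, Greeter, LoudGreeter) are invented for the example.

```scala
object KeywordSketch {
  // if/else is an expression, so no separate ?: operator is needed.
  def sign(n: Int): String = if (n < 0) "negative" else "non-negative"

  // One for syntax covers plain iteration, filtering and nesting.
  val pairs = for {
    i <- 1 to 3
    j <- 1 to 3
    if i < j
  } yield (i, j)

  // Hypothetical classes showing the mandatory override keyword.
  class Greeter { def greet: String = "hello" }
  class LoudGreeter extends Greeter {
    override def greet: String = "HELLO"   // omitting `override` is a compile error
  }
}
```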


Access modifiers



Java has four access levels: default (package), private, public and protected. Scala has three, but it can combine them with a scope for protection. This flexibility allows you to define access control for which Java has no analog. The modifiers are also readable because they are self-explanatory. For instance, if you have the class org.apache.esme.actor.UserActor, these are the Scala equivalents for Java's access modifiers:


private[actor]
Same as package access in Java

private[UserActor]
Same as private in Java



Scala's default access is public (yay, one less keyword!).

On the other hand, by defining the scope, Scala allows visibility types, which Java doesn't have:


private[this]
the member is accessible only from this instance. Not even other instances from the same class can access it

private[esme]
access for a package and subpackages. How many times did you have to use clumsy workarounds in Java because subpackages couldn't access the parent package?

protected
only subclasses can access the field. This is more consistent than Java's protected keyword. In Java, you have to remember that both subclasses and classes in the same package can access the field. How's that for remembering arbitrary rules?

private
An outer class cannot access the private members of its inner classes and objects. This is also more consistent than Java's private access, which is perhaps indicative of the fact that inner classes in Java were bolted on after the first version of the language was created.
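The scoped modifiers listed above can be put side by side in one compilable file. This is only a sketch: the member names and the view package with its Dashboard object are made up, and only the package and class names of org.apache.esme.actor.UserActor come from the example above.

```scala
package org.apache.esme {
  package actor {
    class UserActor {
      private[this]      val secret  = 1   // this instance only
      private[UserActor] val own     = 2   // like Java's private
      private[actor]     val pkg     = 3   // like Java's package access
      private[esme]      val project = 4   // esme and all of its subpackages
      protected          val family  = 5   // subclasses only

      def sum: Int = secret + own + pkg + project + family
    }
  }

  package view {
    // A sibling package under esme: only the esme-scoped member is visible here.
    object Dashboard {
      def peek(a: actor.UserActor): Int = a.project
      // a.pkg would not compile: view is outside org.apache.esme.actor
    }
  }
}
```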



Namespaces



Java has four namespaces (fields, methods, packages, types), Scala has two (one for types, one for everything else). This has important implications when overriding. In Scala, you can start with a parameterless def (method definition) and then override with a val (immutable variable). This helps enforce the uniform access principle: whether you access a field or a method, this doesn't restrict you later on, because the syntax is the same everywhere you access it.

One thing which you cannot do is define a variable and a method with the same name. The "Programming in Scala" book explicitly mentions that allowing this would be a code smell because of possible code duplication and the ambiguity which could arise.
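Overriding a parameterless def with a val, as described above, might look like this. The Shape, Square and Circle classes are hypothetical and exist only for the example.

```scala
object UniformAccessSketch {
  // Hypothetical classes: a parameterless def later overridden by a val.
  abstract class Shape { def area: Double }

  class Square(side: Double) extends Shape {
    override val area: Double = side * side   // computed once, stored as a field
  }

  class Circle(radius: Double) extends Shape {
    override def area: Double = math.Pi * radius * radius   // computed on every call
  }

  // Callers write shape.area either way and cannot tell the difference.
  def total(shapes: List[Shape]): Double = shapes.map(_.area).sum
}
```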

Types and the type system



Scala's type system has been criticized for being too complex, but let's have a look at Java's type system.

Primitive types


There are many exceptions in Java's type system, which make the language hard to use and evolve. Java has primitive types, while in Scala everything is an object, making this valid code:


(1 to 3) foreach println


Java's primitive types make a lot of things harder- before Java 5, manual wrapping in collections, e.g. int in Integer, boolean in Boolean, etc. was a pain. After auto-boxing and unboxing came out, the syntax is cleaner, but the overhead remains, as well as some very subtle bugs. For example, auto-unboxing a null wrapper type will cause a NullPointerException.

There is a lot of code duplication because of primitive types- you always have to specify special cases and can't be truly generic.

Generics


Scala has generics done right. For instance, you can define covariance at the declaration site, whereas Java requires you to do this at every use site with wildcards. This, combined with Scala's type inference, allows one to use generified libraries without having to know or define the complete type signature.

Java has yet another "special" type: arrays. As a result, the rules for the underlying array type and the generified ArrayList are quite different and inconsistent. The type of arrays is checked at runtime, while the genericity of ArrayList is checked at compile time. As a result, an inappropriate assignment to an array element results in an ArrayStoreException only at runtime.
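A small sketch of declaration-site covariance, using a made-up Box class. It also notes, in a comment, how Scala treats the array assignment that causes trouble in Java.

```scala
object VarianceSketch {
  // Hypothetical immutable container. The + marks Box as covariant in A,
  // once, at the declaration.
  class Box[+A](val value: A)

  val strings: Box[String] = new Box("hello")
  val anything: Box[Any]   = strings   // allowed: Box[String] is a Box[Any]

  // Scala arrays, unlike Java arrays, are invariant, so the assignment that
  // leads to ArrayStoreException in Java is rejected at compile time:
  //   val bad: Array[Any] = Array[String]("a")   // does not compile
}
```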

Constructors



Java initialization order in object construction is a pain to get right. The puzzlers on the certification exam use the most bizarre mix of static and instance initializer blocks and constructors calling or inheriting other constructors.

In Scala, any code which isn't part of a method or member variable declaration is executed as the primary constructor. You can define auxiliary constructors which call either the primary one or another auxiliary one defined in the same class before it. Can you come up with anything simpler?
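As a sketch of these rules, here is a made-up Account class with a primary constructor and two auxiliary ones.

```scala
// Hypothetical class, for illustration only.
class Account(val owner: String, val balance: Int) {
  // Statements in the class body belong to the primary constructor.
  require(balance >= 0, "balance must not be negative")

  // Auxiliary constructor: its first action must be a call to the primary
  // constructor or to an auxiliary constructor defined before it.
  def this(owner: String) = this(owner, 0)

  // Another auxiliary constructor, chaining to the one above.
  def this() = this("anonymous")
}
```

Every construction path ends up in the primary constructor, so the require check runs no matter which constructor the caller picks.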

Uniform syntax



Scala is sometimes accused of using too many symbols. Whatever you've seen, it's mostly not part of the language, but of the libraries. You can override them and even disable them. What does Java have to say about special symbols?

Arrays


Arrays in Java are accessed via square brackets. In Scala, parentheses are used, because accessing an array index is a method call (called apply). Don't worry though, the compiler optimizes this away.

Collection literals


Java has a lot of special syntax for array instantiation and will soon have special syntax for instantiating lists, sets and maps. Scala, again, creates these collections using factory methods in the companion objects: Array(1,2,3), List(1,2,3), Map((1,2)). Lists can also be created using the cons "operator", but here's the trick: it's not actually an operator. It's a method, prepending to the list. You can also create a map using the arrow tuple syntax: Map(1->2). And again, this is not "special" syntax which is part of the language- it's a method, constructing a tuple!
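The following sketch spells out which method calls hide behind the array and collection syntax discussed above. The value names are arbitrary.

```scala
object CollectionSketch {
  val numbers = Array(1, 2, 3)           // Array.apply(1, 2, 3) on the companion object
  val second  = numbers(1)               // numbers.apply(1)
  numbers(1)  = 20                       // numbers.update(1, 20)

  val list    = 1 :: 2 :: 3 :: Nil       // :: is a method on List; it prepends
  val same    = Nil.::(3).::(2).::(1)    // the same calls in dotted form

  val pair    = 1 -> 2                   // -> is a library method that builds a tuple
  val byArrow = Map(1 -> 2)
  val byTuple = Map((1, 2))
}
```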

Now someone might smirk and think: "Ha, gotcha! Do you mean to say that simply because you've pushed the complexity out of the language and into the libraries you don't need to deal with it?". True, but let's have a look at Java and its ever growing standard libraries. It has AWT and Swing. It has old I/O and new I/O (gosh, which one do I use?). It has RMI (do you remember RMI?) and OMG, it even has CORBA. These libraries will never die. Methods in Thread have been deprecated for ages. There's also no sign that the ill-conceived Date/Calendar classes will ever be removed, but you still must know JodaTime if you hope to get any job with dates done.

More importantly, extending the language easily helps abstract away the details and evolve the language without creating tons of special cases. As per our definition, special cases add up to increase complexity. We'll explore the topic of extending the language in the followup post.