Sunday, January 17, 2010

Is Scala more complicated than what Java tries to become?

Is Scala more complicated than Java? My last post did not tell the whole truth. I listed only the Scala features that have a Java analog. There was a glaring omission of advanced Scala features like implicit conversions, operator overloading, call-by-name parameters and pattern matching. These Scala features are more complicated than what Java has. There, I said it. But then Scala is more complicated in the way a calculator is more complicated than an abacus: sure, you can do some of the same stuff with an abacus, but calculating the square root of a number is much more cumbersome.

However, this complexity pays off, because it lets us simplify many day-to-day features. This post takes a different angle by comparing where Java wants to be with where Scala is right now. I hope that after reading it you will at least question your assumptions about whether this trade-off is worth it.

Upon its creation, Java was a fairly simple language. A major reason it took over from C++ is that it was specifically designed to steer away from multiple inheritance, manual memory management and pointer arithmetic. But it's not a simple language anymore, and it's getting more and more complicated.

Why? Java wasn't designed to be very extensible. Scala, on the other hand, was designed to be scalable, in the sense of flexible syntax. The very creators of Java knew very well that a "main goal in designing a language should be to plan for growth" (Guy Steele's famous words from Growing a Language).

We need to expand our definition of language complexity. The language needs to be able to abstract away accidental complexity, or using it will be difficult. Examples of accidental complexity: jumping to a position in your program with goto, and then remembering to go back (before procedural programming); or allocating memory, and then remembering to deallocate it (before garbage collectors). Another example: using a counter to access collections, and remembering to initialize and increment it correctly, not to mention checking when we're done.

Creating extensions of the language in order to hide these complexities doesn't happen often. When it does, it offers huge rewards. On the other hand, if a language is rigid, even though it looks simple, this forces you to invent your own arcane workarounds. When the language leaves you to deal with complexity on your own, the resulting code will necessarily be complicated.

Let's look at the special new language features Java is trying to add, which Scala can already provide thanks to its flexibility and extensibility.

Pattern matching



Pattern matching is often compared with Java's switch/case statement. I have listed pattern matching as something which doesn't have an analog in Java, because comparing it to "switch" really doesn't do it justice. Pattern matching can be used for arbitrary types, it can be used to assign variables and check preconditions; the compiler will check if the match is exhaustive and if the types make sense. Meanwhile Java has only recently accepted Strings in switch statements, which is only scratching the surface of Scala's pattern matching.

Furthermore, Scala uses pattern matching throughout the language, from variable assignment to exception handling. By comparison, the proposal for handling multiple exceptions in Java has been postponed yet again.
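
To make the points above concrete, here is a minimal sketch of pattern matching on a small class hierarchy, in a variable definition and in a catch block. The names Shape, Circle, Rect, describe and parse are made up for illustration and come from no library.

```scala
object PatternMatchingDemo {
  // A sealed hierarchy lets the compiler warn about non-exhaustive matches.
  sealed trait Shape
  case class Circle(radius: Double) extends Shape
  case class Rect(width: Double, height: Double) extends Shape

  def describe(shape: Shape): String = shape match {
    case Circle(r) if r <= 0  => "degenerate circle"         // guard checks a precondition
    case Circle(r)            => "circle of radius " + r     // binds r to the radius
    case Rect(w, h) if w == h => "square of side " + w
    case Rect(w, h)           => "rectangle " + w + " x " + h
  }

  // A pattern on the left-hand side of a definition assigns two variables at once.
  val (quotient, remainder) = (17 / 5, 17 % 5)

  // A catch block is a list of patterns; one case can cover several exception types.
  def parse(text: String): Int =
    try text.trim.toInt
    catch {
      case _: NumberFormatException | _: NullPointerException => -1
    }
}
```

Deleting one of the unguarded cases from describe should make the compiler warn that the match may not be exhaustive, because Shape is sealed.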

Case classes



In order to get rid of Java's verbose getters, setters, hashCode and equals, one solution is to muck with the javac compiler, like the folks from Project Lombok have done. Is going to the guts of javac complicated? I'm sure it is.

In Scala, you get accessors, hashCode, equals and toString generated for you simply by defining your classes as case classes.
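
As a rough illustration, a one-line case class is enough to get the generated members. Person is a hypothetical example class, and the copy method assumes Scala 2.8 or later.

```scala
object CaseClassDemo {
  // One line replaces the fields, accessors, equals, hashCode and toString.
  case class Person(name: String, age: Int)

  val alice = Person("Alice", 30)     // the generated factory method makes `new` unnecessary
  val twin  = Person("Alice", 30)
  val older = alice.copy(age = 31)    // copy with one field changed (Scala 2.8+)

  val sameValue: Boolean = alice == twin          // structural equality
  val sameHash: Boolean  = alice.hashCode == twin.hashCode
  val text: String       = alice.toString         // "Person(Alice,30)"
}
```

Note that the fields are immutable by default, so there are accessors but no setters unless a parameter is declared as a var.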

Implicit conversions



In short, implicit conversions help transparently convert one type to another if the original type doesn't support the operations requested.

There are many examples where this is useful.

What Java hardcodes in the language as conversions and promotions, Scala defines using implicit conversions. This is another area where Java can get quite complicated. In most cases where you need to decide how a method argument gets converted, for instance, you must keep in mind narrowing and widening conversions, promotions, autoboxing, varargs and overloading (whew!). In Scala, the advantage of implicit conversions is that they are defined in code you can inspect, where no ambiguity can result. You can see which conversions take place by passing the "-Xprint:typer" parameter to the compiler or the interpreter. You can even disable these implicits, if you don't like them, by shadowing the import.

Another example of what implicits can do is adding methods and functionality to existing classes. Dynamic languages already do that easily using open classes and "missing method" handlers. In Java, one way to do this is bytecode manipulation trickery via libraries like cglib, bcel, asm or javassist.
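
A minimal sketch of how an implicit conversion can appear to add a method to an existing class follows. RichWord, shout and stringToRichWord are hypothetical names chosen for this example; the import only silences a feature warning and assumes Scala 2.10 or later.

```scala
import scala.language.implicitConversions // assumption: Scala 2.10+, silences a feature warning

object ImplicitDemo {
  // Hypothetical wrapper class that carries the extra method.
  class RichWord(val text: String) {
    def shout: String = text.toUpperCase + "!"
  }

  // String has no `shout` method, so the compiler searches the implicit
  // conversions in scope and inserts a call to this one.
  implicit def stringToRichWord(text: String): RichWord = new RichWord(text)

  // Compiled as stringToRichWord("hello").shout
  val greeting: String = "hello".shout
}
```

The String class itself is untouched: outside the scope where stringToRichWord is visible, "hello".shout does not compile.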

Bytecode manipulation in Java is required for popular libraries like Hibernate, Spring and AspectJ. Few "enterprise" Java developers can imagine development without Hibernate and Spring. Although there are many more things you can do with AspectJ, it can be used to emulate implicits with inter-type member declarations. However, even though using AspectJ is a more high-level way to solve the problem, it adds even more complexity, as it defines additional keywords and constructs.

If you're new to Scala, you don't lose much if you don't know how implicit conversions work, just like you don't need to know about the magic that happens behind the scenes when Hibernate persists objects or when Spring creates its proxies. Just as with bytecode generation, you're not advised to use this feature often, as it is difficult to use. Still, you'll be glad it exists, because someone will create a library which will make your life and the life of many developers so much easier.

Operator overloading



The line between operators and methods in Scala is blurred: you can use the symbols +, -, /, *, etc. as method names. In fact, that's exactly how arithmetic operators work in Scala: they are method invocations (relax, everything is optimized by the compiler).

Some people object that operator overloading adds unnecessary complexity, because it can be abused. Still, you can also abuse method naming in much the same way. For instance, some hapless folk can define methods whose names differ only in visually similar symbols, like method1, methodl and methodI. They can use inconsistent capitalization, like addJar or addJAR. One could use meaningless identifiers like ahgsl. Why should operator best practices be different from method naming best practices?

What is complicated is treating numeric types like int and BigInteger differently. Not only that, but operations with BigInteger are very verbose and barely readable even in simple expressions. To compare, this is what a recursive definition of factorial looks like in Scala with BigInt (Scala's wrapper around BigInteger):


def factorial(x: BigInt): BigInt =
  if (x == 0) 1 else x * factorial(x - 1)


This is how it would look if Scala didn't support operator overloading:


def factorial(x: BigInteger): BigInteger =
  if (x == BigInteger.ZERO)
    BigInteger.ONE
  else
    x.multiply(factorial(x.subtract(BigInteger.ONE)))
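
The same mechanism is available to your own types. As a small sketch, here is a hypothetical Vec class (not from any library) whose + and * are ordinary methods with symbolic names:

```scala
object OperatorDemo {
  // Hypothetical 2D vector used only for illustration.
  case class Vec(x: Int, y: Int) {
    def +(other: Vec): Vec  = Vec(x + other.x, y + other.y)
    def *(factor: Int): Vec = Vec(x * factor, y * factor)
  }

  val sum      = Vec(1, 2) + Vec(3, 4)     // infix form
  val explicit = Vec(1, 2).+(Vec(3, 4))    // the same call written as a plain method invocation
  val scaled   = Vec(1, 2) * 3
}
```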


Call by name



One of the proposals for Java 7 language extension was automatic resource management. This is one more rule added to the language that you need to remember. Without this feature, code is also unnecessarily complicated, because it forces you to remember to always close resources after using them. If you slip up, subtle bugs with leaking files or connections can result.

In Scala, it's easy to add language constructs like this. Using call-by-name parameters, which are blocks of code evaluated only when they are invoked, one can emulate almost any language construct, including while, if, etc.
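
Here is a sketch of what such library-defined constructs might look like. The names unless, repeatWhile and using are hypothetical and not part of the standard library of that time. The first two rely on by-name parameters (the `=> T` syntax), while using takes an ordinary function so that the block can receive the resource.

```scala
object CallByNameDemo {
  // `body` is passed by name: it runs only if and when the method evaluates it.
  def unless(condition: Boolean)(body: => Unit): Unit =
    if (!condition) body

  // Both parameters are by-name, so the condition is re-evaluated on every iteration.
  def repeatWhile(condition: => Boolean)(body: => Unit): Unit =
    if (condition) { body; repeatWhile(condition)(body) }

  // A library-level stand-in for automatic resource management:
  // the resource is closed even if the block throws.
  def using[R <: java.io.Closeable, A](resource: R)(block: R => A): A =
    try block(resource) finally resource.close()

  def countdown(from: Int): List[Int] = {
    var i = from
    var seen = List.empty[Int]
    repeatWhile(i > 0) { seen = i :: seen; i -= 1 }
    seen.reverse
  }

  def firstLine(text: String): String =
    using(new java.io.BufferedReader(new java.io.StringReader(text)))(_.readLine())
}
```

At the call site, repeatWhile(i > 0) { ... } reads much like a built-in loop, even though it is an ordinary method.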

Existential types



Existential types are roughly an alternative to Java wildcards, only more powerful.

Martin Odersky: If Java had reified types and no raw types or wildcards, I don't think we would have that much use for existential types and I doubt they would be in Scala.


If Martin Odersky says that existential types wouldn't be in the language if it weren't for Java compatibility, why would you even need to know about them? Mostly if you need to interoperate with Java generics.
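
For the interoperability case, a minimal sketch: a method that accepts a Java list of any element type. In Scala 2, JList[_] is shorthand for the existential type JList[T] forSome { type T }. ExistentialDemo and countElements are hypothetical names, and later Scala versions treat this syntax as a wildcard type instead.

```scala
import java.util.{ArrayList, List => JList}

object ExistentialDemo {
  // Accepts a java.util.List with any element type, much like List<?> in Java.
  def countElements(xs: JList[_]): Int = xs.size

  def demo(): Int = {
    val names = new ArrayList[String]()
    names.add("a")
    names.add("b")
    countElements(names)
  }
}
```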

Conclusion



Scala tries to define fewer language rules, but more universal ones. Many of these advanced features are not often used, but they pay off by allowing you to create constructs that in Java would require specific hardcoded additions to the language. In Scala, they can be defined simply as libraries.

Why does it matter that it's in the libraries, and not hardcoded in the language? You can more easily evolve and adapt these features, you can add your own extensions, and you can even disable some of the library parts or replace them.

The conclusion is that if a language is not designed to be extended, it will eventually develop features that are not well integrated, and the language will collapse under the weight of its own complexity.

Finally, learning something that lets you avoid a lot of routine, error-prone operations reduces effort by raising the level of abstraction, at the cost of additional complexity. When you were in school, it was complicated to learn multiplication, but once you got over it, it saved you quite a bit of repetition compared with using only addition.

P.S. I realize it's not possible to resolve once and for all, in a post or two, which language is more complicated, Java or Scala. First of all, keep in mind that simple is not the same as easy to use. There are also many topics which are open for discussion. I haven't touched on Scala traits; I haven't mentioned functions as first-class constructs compared to the Java 7 closure proposal; and there's a lot that can be said about how Scala obviates many Java design patterns. Extending the Scala syntax via compiler plugins is another interesting advanced topic.

I suppose someone could even write a blog post about these topics some day.

Monday, January 4, 2010

Is Scala more complicated than Java?

One Scala-related thread on Artima drew a lot of attention: "Is Scala really more complicated than Java?". This post really struck a nerve. Whoever claims that Scala is much more complicated than Java has clearly not seen a Java Programmer Certification in a while and is probably not using many new features since Java 5 came out.

What I'll try to prove in this post is not that Scala is a simple language. There are certainly many languages which are simpler. Rather, the core features which are used reasonably often are indeed a simplification over Java. Scala also has features which are more complicated than what Java has. However, the complicated Scala features are specialized for extending the language, while the complexity of Java is usually imposed on everyone, including the beginner.

This post will also not try to describe the language features in exhaustive detail: that's what the language specification tries to achieve, and the blog post is already long enough. I will assume that you know about the core language rules or can easily look them up.

What is complexity? Many conflate it with unreadability; some say it's the opposite of ease of use. Let's start with the following definition of complexity: it's the many special exceptions (pun not intended) to the rules, making the whole system difficult to understand and remember.

Based on that definition, let's run a comparison of language features of Java and Scala.

Keywords



Java has more reserved words than Scala. Even when we remove the words for primitive types (boolean, int, float, etc.), Scala still has fewer keywords!


  • Scala misses some of Java's control structures: yes, continue and break (well, at least until Scala 2.8) are not part of the language, as they are not deemed a best-practice way to solve programming problems.
  • if/then/else: in Scala it returns a value, thus eliminating the need for the only ternary operator in Java, "?:", the use of which has long been discouraged.
  • for loop: Java folks discovered late in the game that the enhanced for loop is much less complicated to use in cases when you don't need the counter index. And they were right: it's one more detail which you (or at least newbies) can get wrong. But why stop there? Scala has a much more universal for loop, and there aren't two different syntaxes as in Java.
  • Scala keywords not in Java: one might argue that the override keyword in Scala complicates things, as you might do the same thing in Java with an @Override annotation. That's not quite the case, as you still might override a method by accident and forget to put the annotation (as it's not mandatory), and then the compiler will not give as much as a warning! So that's one more special case you need to worry about and keep in your head. When you start using traits, you definitely start to appreciate that override is a keyword.
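
The items above can be sketched in a few lines. All names here (sign, squares, oddPairs, Greeter, LoudGreeter) are invented for the example.

```scala
object KeywordsDemo {
  // if/else is an expression, so no separate ternary operator is needed.
  def sign(n: Int): String = if (n < 0) "negative" else "non-negative"

  // One for syntax covers plain iteration, nested generators, filters and results.
  val squares  = for (i <- 1 to 3) yield i * i
  val oddPairs = for (i <- 1 to 3; j <- 1 to 3 if i < j && (i + j) % 2 == 1) yield (i, j)

  class Greeter { def greet: String = "hello" }

  class LoudGreeter extends Greeter {
    // Leaving out `override` here is a compile error, not a silent accident.
    override def greet: String = "HELLO"
  }
}
```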


Access modifiers



Java has four access levels: default (package), private, public and protected. Scala has three, but it can combine them with a scope for protection. This flexibility allows you to define access control for which Java has no analog. The modifiers are also readable, because they look self-explanatory. For instance, if you have the class org.apache.esme.actor.UserActor, these are the Scala equivalents for Java's access modifiers:


private[actor]
Same as package access in Java

private[UserActor]
Same as private in Java



Scala's default access is public (yay, one less keyword!).

On the other hand, by defining the scope, Scala allows visibility types, which Java doesn't have:


private[this]
the member is accessible only from this instance. Not even other instances from the same class can access it

private[esme]
access for a package and subpackages. How many times did you have to use clumsy workarounds in Java because subpackages couldn't access the parent package?

protected
only subclasses can access the field. This is more consistent than Java's protected keyword. In Java, you have to remember that both subclasses and classes in the same package can access the field. How's that for remembering arbitrary rules?

private
The enclosing class cannot access the private members of its inner classes and objects. This is also more consistent than Java's private access, which is perhaps indicative of the fact that inner classes in Java were bolted on after the first version of the language was created
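
A compilable sketch of the scoped modifiers above. To keep it in a single file, the nested objects esme and actor stand in for the packages org.apache.esme and org.apache.esme.actor; that substitution, and every member name, is an assumption made for this example.

```scala
// Assumption: enclosing objects play the role of the packages esme and esme.actor.
object esme {
  object actor {
    class UserActor(initial: Int) {
      private[this] var secret: Int = initial            // this instance only
      private[actor] var mailboxSize: Int = 0            // like Java package access
      private[esme] def debugName: String = "UserActor"  // visible in esme and everything nested in it
      protected def reset(): Unit = { secret = 0 }       // subclasses only, not the whole package

      def revealedSecret: Int = secret

      // `other.secret` would not compile here: private[this] hides the field
      // even from other instances of the same class.
      def sameSecret(other: UserActor): Boolean = other.revealedSecret == secret
    }

    object Supervisor {
      def inspect(a: UserActor): Int = a.mailboxSize     // allowed: we are inside `actor`
    }
  }

  object Monitor {
    def name(a: actor.UserActor): String = a.debugName   // allowed: we are inside `esme`
    // a.mailboxSize would not compile here: Monitor is outside `actor`.
  }
}
```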



Namespaces



Java has four namespaces (fields, methods, packages, types), Scala has two (one for types, one for everything else). This has important implications when overriding. In Scala, you can start with a parameterless def (method definition) and then override with a val (immutable variable). This helps enforce the uniform access principle: whether you access a field or a method, this doesn't restrict you later on, because the syntax is the same everywhere you access it.
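
As a small sketch of the uniform access principle (Document, Report and heading are hypothetical names), a parameterless def can be overridden by a val without any caller noticing:

```scala
object UniformAccessDemo {
  class Document {
    def title: String = "untitled"            // computed on every access
  }

  class Report extends Document {
    override val title: String = "Q1 report"  // stored once; same syntax for callers
  }

  // The caller writes d.title either way and does not care which one it gets.
  def heading(d: Document): String = "# " + d.title
}
```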

One thing which you cannot do is define a variable and a method with the same name. The "Programming in Scala" book explicitly mentions that allowing this would be a code smell because of possible code duplication and the ambiguity which could arise.

Types and the type system



Scala's type system has been criticized for being too complex, but let's have a look at Java's type system.

Primitive types


There are many exceptions in Java's type system, which make the language hard to use and evolve. Java has primitive types, while in Scala everything is an object, making this valid code:


(1 to 3) foreach println


Java's primitive types make a lot of things harder: before Java 5, manually wrapping values for collections (int in Integer, boolean in Boolean, etc.) was a pain. Since auto-boxing and unboxing came out, the syntax is cleaner, but the overhead remains, as well as some very subtle bugs. For example, auto-unboxing a null wrapper type will cause a NullPointerException.

There is a lot of code duplication because of primitive types: you always have to specify special cases and can't be truly generic.

Generics


Scala has generics done right. For instance, you can define covariance at the declaration site, whereas Java requires you to do this with wildcards at every use site. This, combined with Scala's type inference, allows one to use generified libraries without having to know or define the complete type signature.
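
A minimal sketch of declaration-site covariance, using hypothetical Animal, Cat and Box types:

```scala
object VarianceDemo {
  trait Animal { def name: String }
  case class Cat(name: String) extends Animal

  // The + marks Box as covariant once, at the point where it is declared.
  class Box[+A](val value: A)

  def nameOf(box: Box[Animal]): String = box.value.name

  def demo(): String = {
    val catBox: Box[Cat] = new Box(Cat("Tom"))
    // A Box[Cat] is accepted where a Box[Animal] is expected;
    // no wildcard is needed at the point of use.
    nameOf(catBox)
  }
}
```

In Java, the equivalent method would have to declare its parameter with a wildcard such as Box<? extends Animal>, and so would every other method that wants the same flexibility.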

Java has yet another "special" type: arrays. As a result, the rules for the underlying array type and the generified ArrayList are quite different and inconsistent. The type of arrays is checked at runtime, while the genericity of ArrayList is checked at compile time. As a result, an inappropriate assignment to an array element results in an ArrayStoreException only at runtime.

Constructors



Java initialization order in object construction is a pain to get right. The puzzlers on the certification exam use the most bizarre mix of static and instance initializer blocks and constructors calling or inheriting other constructors.

In Scala, any code which isn't part of a method or member variable declaration is executed as the primary constructor. You can define auxiliary constructors which call either the primary one or another auxiliary one defined in the same class before it. Can you come up with anything simpler?
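
For illustration, a hypothetical Account class with a primary constructor and two auxiliary ones might look like this:

```scala
object ConstructorDemo {
  // The class parameters and the class body together form the primary constructor.
  class Account(val owner: String, val balance: Long) {
    // Statements in the body run, top to bottom, every time an instance is created.
    require(balance >= 0, "balance must not be negative")
    val label: String = owner + ": " + balance

    // An auxiliary constructor must start by calling the primary constructor
    // or an auxiliary constructor defined before it.
    def this(owner: String) = this(owner, 0L)
    def this() = this("anonymous")
  }
}
```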

Uniform syntax



Scala is sometimes accused of using too many symbols. Whatever you've seen, it's mostly not part of the language, but of the libraries. You can override them and even disable them. What does Java have to say about special symbols?

Arrays


Arrays in Java are accessed via square brackets. In Scala, parentheses are used, because accessing an array index is a method call (called apply). Don't worry though, the compiler optimizes this away.
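
To show that the parentheses are ordinary method calls, here is a sketch in which a hypothetical Grid class gets the same syntax as arrays simply by defining apply and update:

```scala
object ArrayAccessDemo {
  val numbers = Array(10, 20, 30)
  val second  = numbers(1)     // sugar for numbers.apply(1)
  numbers(1) = 25              // sugar for numbers.update(1, 25)

  // Hypothetical class: defining apply and update is all it takes.
  class Grid(width: Int, height: Int) {
    private val cells = new Array[Int](width * height)
    def apply(x: Int, y: Int): Int = cells(y * width + x)
    def update(x: Int, y: Int, value: Int): Unit = cells(y * width + x) = value
  }

  val grid = new Grid(2, 2)
  grid(1, 0) = 7               // sugar for grid.update(1, 0, 7)
}
```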

Collection literals


Java has a lot of special syntax for array instantiation and will soon have more for instantiating lists, sets and maps. Scala, again, creates these collections using factory methods in the companion objects: Array(1,2,3), List(1,2,3), Map((1,2)). Lists can also be created using the cons "operator", but here's the trick: it's not actually an operator. It's a method, prepending to the list. You can also create a map using the arrow tuple syntax: Map(1->2). And again, this is not "special" syntax which is part of the language: it's a method, constructing a tuple!
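
The following sketch spells out the method calls hiding behind these "literals"; the value names are arbitrary.

```scala
object CollectionDemo {
  val viaFactory = List(1, 2, 3)          // a call to List.apply(1, 2, 3)
  val viaCons    = 1 :: 2 :: 3 :: Nil     // methods ending in ':' bind to the right operand
  val desugared  = Nil.::(3).::(2).::(1)  // the same thing: each call prepends one element

  val pair       = 1 -> 2                 // a method call that returns the tuple (1, 2)
  val viaArrow   = Map(1 -> 2)
  val viaTuple   = Map((1, 2))            // identical map, built from an explicit tuple
}
```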

Now someone might smirk and think: "Ha, gotcha! Do you mean to say that simply because you've pushed the complexity out of the language and into the libraries you don't need to deal with it?". True, but let's have a look at Java and its ever growing standard libraries. It has AWT and Swing. It has old I/O and new I/O (gosh, which one do I use?). It has RMI (do you remember RMI?) and OMG, it even has CORBA. These libraries will never die. Methods in Thread have been deprecated for ages. There's also no sign that the ill-conceived Date/Calendar classes will ever be removed, but you still must know JodaTime if you hope to get any job with dates done.

More importantly, extending the language easily helps abstract away the details and evolve the language without creating tons of special cases. As per our definition, special cases add up to increase complexity. We'll explore the topic of extending the language in the followup post.