Advancing JVM performance with the LLVM compiler

The following is a transcript of an interview between TheServerSide’s Cameron W. McKenzie and Azul Systems’ CTO Gil Tene.

Cameron McKenzie: I always like talking to Gil Tene, the CTO of Azul Systems.

Before jumping on the phone, PR reps often send me a PowerPoint of what we’re supposed to talk about. But with Tene, I always figure that if I can jump in with a quick question before he gets into the PowerPoint presentation, I can get him to answer some of the interesting questions I actually want answered. He’s a technical guy, and he’s prepared to get technical about Java and the JVM.

Now, the reason for our latest talk was Azul Systems’ 17.3 release of Zing, which includes an LLVM-based just-in-time compiler code-named Falcon. Apparently, it’s incredibly fast, like all of Azul Systems’ JVMs typically are.

But before we got into discussing Azul Systems’ Falcon just-in-time compiler, I thought I’d do a bit of bear-baiting with Gil. I told him I was sorry that, in this new age of serverless computing, cloud, and containers, where nobody actually buys hardware anymore, it must be difficult flogging a high-performance JVM when nobody needs to download one and install it locally. Well, anyway, Gil wasn’t having any of it.

Gil Tene: So, the way I look at it is we don’t really care, because we have a bunch of people running Zing on Amazon. Where the hardware comes from, whether it’s a public cloud, a private cloud, a hybrid cloud, or a data center, whatever you want to call it, as long as people are running Java software, we’ve got places where we can sell our JVM. And that doesn’t seem to be happening less; it seems to be happening more.

Cameron McKenzie: Now, I was really just joking around with that first question, but it brought us into a discussion about using Java and Zing in the cloud, and actually, I’m interested in that. How are people using Java and the JVMs they’ve purchased in the cloud? Is it mostly EC2 instances, or is there some other unique way that people are using the cloud to leverage high-performance JVMs like Zing?

Gil Tene: It is running on EC2 instances. In practical terms, most of what is being run on Amazon today runs as virtual instances on the public cloud. They end up looking like normal servers running Linux on x86 somewhere, but they run on Amazon, and they do it very efficiently and very elastically; they are very operationally dynamic. And whether it’s Amazon or Azure or the Google Cloud, we’re seeing all of those happening.

But in many of those cases, that’s just a starting point, where instead of getting a server or running your own virtualized environment, you just do it on Amazon.

The next step is usually that you operationally adapt to using the model, so people no longer have to plan and know how much hardware they’re going to need in three months’ time, because they can turn it on anytime they want. They can empower teams to turn on a hundred machines over the weekend because they think it’s needed, and if they were wrong, they’ll turn them off. That’s no longer some dramatic thing to do. Doing it in a company’s internal data center is a very different thing from a planning perspective.

But from our point of view, that all looks the same, right? Zing and Zulu run just fine in those environments. And whether people consume them on Amazon or Azure or in their own servers, to us it all looks the same.

Cameron McKenzie: Now, cloud computing and virtualization are all really cool, but we’re here to talk about performance. So what do you see these days in terms of bare-metal deployments? Are people actually deploying to bare metal, and if so, when are they doing it?

Gil Tene: We do see bare-metal deployments. You know, we have a very wide mix of customers, so we have everything from e-commerce and analytics companies that run their own stuff, to banks, obviously, that do a lot of stuff themselves. There is more and more of a move towards virtualization in some sort of cloud, whether it’s internal or external. So I’d say that a lot of what we see today is virtualized, but we do see a bunch of bare metal in latency-sensitive environments or in dedicated server environments. For example, a lot of people will run dedicated machines for databases or for low-latency trading or for messaging, because they don’t want to take the hit for what the virtualized infrastructure might do to them if they don’t.

But having said that, we’re seeing some really good results on consistency and latency and everything else from people running just on the higher-end Amazon instances. For example, Cassandra is one of the workloads that fits very well with Zing, and we see a lot of turnkey deployments. If you want Cassandra, you turn Zing on and you’re happy; you don’t look back. On Amazon, that type of cookie-cutter deployment works very well. For Cassandra on Amazon, with or without us, people tend to move to the latest and greatest instances Amazon offers. I think the i3 class of Amazon instances is the most popular for Cassandra right now.

Cameron McKenzie: Now, I believe the reason we’re talking today is that there is some big news from Azul. So what is the big news?

Gil Tene: The big news for us was the latest release of Zing. We are introducing a brand-new JIT compiler to the JVM, and it is based on LLVM. The reason this is big news, we think, especially in the JVM community, is that the JIT compiler currently in use was first introduced 20 years ago. So it’s aging. We’ve been working with it, and within it, for most of that time, so we know it very well. But a few years ago, we decided to make the long-term investment in building a brand-new JIT compiler in order to be able to go beyond what we could before, and we chose to use LLVM as the basis for that compiler.

Java had a very rapid acceleration of performance in the first few years, from the late ’90s to the early 2000s, but it’s been a very flat growth curve since then. Performance has improved year over year, but not by a lot, not in the way we’d like it to. With LLVM, you have a very mature compiler. C and C++ compilers use it, Swift from Apple is based on it, Objective-C as well, and the Rust language is based on it. And you’ll see a lot of exotic things done with it as well, like database query optimizations and all kinds of interesting analytics. It’s a general compiler and optimization framework that has been built for other people to build things with.

It was built over the last decade, so we were lucky enough that it was mature by the time we were making a choice about how to build a new compiler. It incorporates a tremendous amount of work in terms of optimizations that we probably would never have been able to invest in ourselves.

To give you a concrete example of this, the latest CPUs from Intel, the ones that power most servers today, whether bare metal or on Amazon, have some really cool new vector capabilities. There are new vector registers and new instructions, and you can do some really nice things with them. But that’s only useful if you have an optimizer that’s able to make use of those instructions when it knows they’re there.

With Falcon, our LLVM-based compiler, you take regular Java loops that would run normally on previous hardware, and when our JVM runs on new hardware, it recognizes the capabilities and basically produces much better loops that use the vector instructions to run faster. And here you’re talking about factors that could be 50%, 100%, or sometimes even 2 or 3 times faster, because those instructions are that much faster. The cool thing for us is not that we sat there and thought of how to use the latest Broadwell chip instructions; it’s that LLVM does that for us without us having to work hard.
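
To make the point concrete, here is a minimal, purely illustrative sketch of the kind of plain Java loop an auto-vectorizing JIT can turn into SIMD code on hardware that supports it. The class and method names are hypothetical, and nothing in the source refers to vector instructions; whether the loop is vectorized is entirely up to the compiler and the CPU it finds itself on.

```java
// Illustrative sketch only: an ordinary Java loop that is a good candidate
// for auto-vectorization, because each iteration is independent and operates
// on primitive arrays. The names here are hypothetical.
public final class VectorizableLoop {

    // Element-wise multiply-add: dst[i] = a[i] * scale + b[i]
    static void scaleAndAdd(float[] dst, float[] a, float[] b, float scale) {
        for (int i = 0; i < dst.length; i++) {
            dst[i] = a[i] * scale + b[i];
        }
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        float[] a = new float[n], b = new float[n], dst = new float[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
            b[i] = n - i;
        }

        // Run the loop enough times for the JIT to treat it as hot, compile it,
        // and (where the compiler and CPU allow) emit vector instructions.
        for (int iter = 0; iter < 1_000; iter++) {
            scaleAndAdd(dst, a, b, 0.5f);
        }
        System.out.println(dst[42]);
    }
}
```

The source stays the same across hardware generations; only the machine code the JIT emits for it changes.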

Intel has put work into LLVM over the last two years to make sure that the backend optimizers know how to do this stuff. We just need to bring the code into the right form, and the rest is taken care of by other people’s work. So that’s a concrete example of extreme leverage: as the processor hits the market, we already have the optimizations for it. It’s a great demonstration of how a runtime like a JVM can run the exact same code and, when you put it on new hardware, it’s not just a better clock speed, not just slightly faster; it can actually use the new instructions to literally run the code better, and you don’t have to change anything to do it.

Cameron McKenzie: Now, whenever I talk about high-performance JVM computing, I always feel the need to talk about potential JVM pauses and garbage collection. Is there anything new in terms of JVM garbage collection algorithms with this latest release of Zing?

Gil Tene: Garbage collection is not big news at this point, mostly because we’ve already solved it. To us, garbage collection is simply a solved problem. And I do realize that that often sounds like what marketing people would say, but I’m the CTO, and I stand behind that statement.

With our C4 collector in Zing, we’re basically eliminating all the concerns that people have with garbage collection pauses that are above, say, half a millisecond. That pretty much means everybody except low-latency traders simply doesn’t have to worry about it anymore.

When it comes to low-latency traders, we sometimes have to have some conversations about tuning, but everybody else stops even thinking about the question. Now, that’s been the state of Zing for a while, but the nice thing for us with Falcon and the LLVM compiler is that we get to optimize better. Because we have a lot more freedom to build new optimizations, and to build them more rapidly, the velocity of the resulting optimizations is higher for us with LLVM.

We’re able to optimize around our garbage collection code better and get even faster code for the Java applications running on it. But from a garbage collection perspective, it’s the same as it was in our previous release and the one before that, because those were close to as perfect as we could get them.

Cameron McKenzie: Now, one of the complaints people who use JVMs often have is startup time. So I was wondering if there’s anything new in terms of the technologies you’ve put into your JVM to improve JVM startup. And for that matter, I was wondering what you think about Project Jigsaw and how the new modularity coming in with Java 9 might impact the startup of Java applications.

Gil Tene: So those are two separate questions. You probably saw in our material that we have a feature called ReadyNow! that deals with the startup issue for Java. It’s something we’ve had for a couple of years now. But, again, with the Falcon release, we’re able to do a much better job. Basically, we get a much steeper ramp-up to full speed right when the JVM starts.

The ReadyNow! feature is focused on applications that basically want to reduce the number of operations that go slow before you get to go fast. Whether it’s when you start up a new server in the cluster and you don’t want the first 10,000 database queries to go slow before they go fast, or when you roll out new code in a continuous deployment environment where you update your servers 20 times a day, rolling out code continuously, and, again, you don’t want the first 10,000 or 20,000 web requests on every instance to go slow before they get to go fast. Or the extreme example of trading, where at market open you don’t want to be running your highest-volume and most volatile trades at interpreted Java speed before they get optimized.

In all of those cases, ReadyNow! is basically focused on having the JVM hyper-optimize the code right when it starts, rather than profiling and learning and only optimizing after it runs. And we do it with a technique that’s very simple to explain, though not that simple to implement: we basically save the Java profiles from previous runs, and we start a run assuming, or learning from, the previous run’s behavior rather than having to learn from scratch again over the first thousand operations. That allows us to run fast code from basically the first transaction, or the tenth transaction, but not the ten-thousandth transaction. That’s a feature in Zing we’re very proud of.
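
As a rough illustration of the idea, and not of Zing’s actual implementation, here is a conceptual sketch of profile-driven warm-up: record what was learned about hot code at the end of one run, persist it, and feed it back at the start of the next run so the runtime does not have to relearn from scratch. The file name, format, and class are entirely hypothetical; in a real JVM, the “pre-warm” step would mean compiling those methods eagerly before the first transactions arrive.

```java
// Conceptual sketch only (not Zing's ReadyNow! implementation): persist a
// profile of hot methods at shutdown and replay it at the next startup.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class ProfileReplaySketch {
    private static final Path PROFILE = Paths.get("previous-run.profile"); // hypothetical file

    private final Map<String, Integer> hotMethodCounts = new HashMap<>();

    // During a run, record which methods turn out to be hot.
    void recordInvocation(String methodName) {
        hotMethodCounts.merge(methodName, 1, Integer::sum);
    }

    // At shutdown, save what was learned for the next run.
    void saveProfile() throws IOException {
        List<String> lines = new ArrayList<>();
        hotMethodCounts.forEach((method, count) -> lines.add(method + "=" + count));
        Files.write(PROFILE, lines);
    }

    // At startup, load the previous run's profile and pre-warm those methods
    // instead of waiting for the first few thousand operations to go slow.
    void loadAndPrewarm() throws IOException {
        if (!Files.exists(PROFILE)) {
            return; // first run: nothing to replay, learn from scratch
        }
        for (String line : Files.readAllLines(PROFILE)) {
            prewarm(line.split("=")[0]);
        }
    }

    private void prewarm(String methodName) {
        // Placeholder: a JVM would trigger optimized compilation here.
        System.out.println("pre-compiling " + methodName);
    }
}
```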

To the other part of your question, about startup behavior, I think Java 9 is bringing in some interesting features that could, over time, affect startup behavior. It’s not just the Jigsaw parts; it’s certainly the idea that you could perform some sort of analysis on code enclosed in modules and try to optimize some of it for startup.

Cameron McKenzie: So, anyways, if you want to find out more about high-performance JVM computing products, head over to Azul’s website. And if you want to hear more of Gil’s insights, follow him on Twitter, @giltene.
You can follow Cameron McKenzie on Twitter: @cameronmckenzie
