Contrasting distributed computing with concurrent programming
In many ways, concurrent programming and distributed programming address the same set of programming issues, but there are significant differences between the two that architects and developers need to understand.
The difference between distributed computing and concurrent programming is a common area of confusion as there is a significant amount of overlap between the two when you set out to accomplish performance goals in server, web and software development. So what’s the difference? At a fundamental level, distributed computing and concurrent programming are simply descriptive terms that refer to ways of getting work done at runtime (as is parallel processing, another term that’s often conflated with both distributed computing and concurrent programming). You’ll probably never get a bunch of developers to agree on an exact and all-encompassing definition of these terms. But we can paint with some broad brushstrokes to give you an understanding of what they mean in the enterprise-computing world.
The basics of distributed computing
Any time a workload is distributed between two or more computing devices or machines connected by some type of network, that's distributed computing. There are a wide variety of ways to do this. When a client-side device, such as a PC, smartphone or tablet, can handle part of the work, that's client-server distributed computing. A three-tier architecture, as in many web applications, adds a middle tier that keeps track of each client and each session, so the client can stay stateless and the back-end server doesn't have to remember that information itself. Other examples include peer-to-peer architectures, in which each component or machine is equally capable of, and responsible for, performing any required task, and clustered architectures, in which multiple machines run a process in parallel. Grid computing and cloud computing are two broad subsets of distributed computing.
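To make the client-server flavor concrete, here is a minimal sketch in plain Java sockets. The class names, the port number (9090) and the toy "work" (upper-casing a request) are all illustrative choices rather than any particular framework's API; the point is simply that the request originates on one machine and the processing happens on another.

// EchoServer.java -- runs on the server machine
import java.io.*;
import java.net.*;

public class EchoServer {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(9090)) {
            System.out.println("Server listening on port 9090...");
            while (true) {
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream()));
                     PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                    String request = in.readLine();
                    // The server's share of the work: transform the request and reply.
                    out.println(request == null ? "" : request.toUpperCase());
                }
            }
        }
    }
}

// EchoClient.java -- runs on the client device (same imports as above)
public class EchoClient {
    public static void main(String[] args) throws IOException {
        try (Socket socket = new Socket("localhost", 9090);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {
            out.println("hello from the client");
            System.out.println("Server replied: " + in.readLine());
        }
    }
}

Swap "localhost" for a real hostname and the two halves can sit on different machines entirely; add more client types or a middle tier between them and you are on your way to the three-tier and clustered variations described above.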
The basics of concurrent programming
This term typically refers to software code that facilitates the performance of multiple computing tasks at the same time. It's kind of like having a playground with twenty slides instead of just one: the kids don't have to line up and wait their turn because they can play concurrently. The only reason you can use a computer mouse while listening to online radio, updating a spreadsheet and running a virus scan on your PC is concurrent programming. In that scenario, it's multitasking that allows several programs or processes to access the CPU without waiting their turn. This setup permits intensive I/O processing and effective signal handling, with resources shared among multiple tasks. Concurrency can also be achieved with multiple threads of computation (often with staggered start and completion points) within a single process or program. That's called multithreading, and it's the reason you can print one document while continuing to type in another. Without multithreading, UIs would be too slow, since the system would be unable to respond to more than one user action at a time.
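The print-while-you-type scenario maps directly onto threads. Below is a minimal sketch in plain Java; the class name, loop counts and sleep times are made up purely for illustration, with Thread.sleep standing in for slow printer I/O.

public class PrintWhileTyping {
    public static void main(String[] args) throws InterruptedException {
        // A background thread stands in for the print spooler.
        Thread printJob = new Thread(() -> {
            for (int page = 1; page <= 3; page++) {
                System.out.println("Printing page " + page + "...");
                pause(500); // simulate slow printer I/O
            }
            System.out.println("Print job finished.");
        });
        printJob.start();

        // Meanwhile, the "foreground" work carries on in the main thread.
        for (int i = 1; i <= 5; i++) {
            System.out.println("User keeps typing... keystroke " + i);
            pause(200);
        }
        printJob.join(); // wait for the background work before exiting
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

Run it and the "Printing page..." and "keystroke..." lines interleave, because the two loops execute concurrently rather than one after the other.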
Differences and overlap
All distributed systems must, by their very nature, make use of some form of concurrent programming – otherwise they wouldn’t be able to get anything done. At a very simple level, you could say that distributed computing is mostly about infrastructure, physical or virtualized, while concurrent programming is implemented in the application layer. Both are used to leverage available resources and boost performance.
Distributed computing and concurrent programming in Java
Not surprisingly, Java provides a programming language, class libraries, APIs, architecture and other tools and support for both distributed computing and concurrent programming. A distributed Java virtual machine (DJVM) on the server side allows parallel processing of a multi-threaded Java application for improved computing performance, while making the distributed nature of the environment accessible through a single interface. Since enterprise applications are expected to make use of distributed computing, Java EE is the natural platform choice for Java-reliant organizations. Both the Java language itself and the standard libraries support concurrent programming via APIs such as those in the java.util.concurrent package. If you really want to dig into how to use the JVM for distributed computing, review our previous article on "Distributed Computing Made Easy" by Jonas Boner here at The Server Side. It's from 2006, but the basic principles still hold true.
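As a taste of what java.util.concurrent offers, here is a minimal sketch that farms three independent chunks of work out to a thread pool and then combines the partial results. The task (summing ranges of numbers), the class name and the pool size are arbitrary choices for illustration; the ExecutorService, Callable and Future types are the standard java.util.concurrent APIs.

import java.util.List;
import java.util.concurrent.*;

public class ConcurrentTasks {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Submit several independent tasks; the pool runs them concurrently.
            List<Callable<Long>> tasks = List.of(
                    () -> sumRange(1, 1_000_000),
                    () -> sumRange(1_000_001, 2_000_000),
                    () -> sumRange(2_000_001, 3_000_000));
            long total = 0;
            for (Future<Long> partial : pool.invokeAll(tasks)) {
                total += partial.get(); // blocks until that partial result is ready
            }
            System.out.println("Total = " + total);
        } finally {
            pool.shutdown();
        }
    }

    private static long sumRange(long from, long to) {
        long sum = 0;
        for (long i = from; i <= to; i++) {
            sum += i;
        }
        return sum;
    }
}

invokeAll blocks until every task has finished, so the combining loop can safely read each Future; for more elaborate pipelines the same package also provides building blocks such as CompletableFuture, ConcurrentHashMap and CountDownLatch.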
What’s the future for these technologies?
Demands for computing power and better performance are only going to increase. The cloud, mobile and Big Data are all playing a role in creating the expectation that enterprises can and should manage massive amounts of information moment by moment. We’re seeing a proliferation of frameworks and tools to make this easier. Hadoop with MapReduce combines some of the best features of both distributed computing and concurrent programming with a hefty dose of parallel programming thrown in for good measure.
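To see how MapReduce splits a job into small, independently runnable pieces, here is the canonical word-count example, lightly commented. It assumes the Hadoop MapReduce client libraries are on the classpath and that input and output paths are passed on the command line; everything else follows the standard Mapper/Reducer/Job API.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map step: many mappers run at once across the cluster, each over its own
    // chunk of the input, emitting (word, 1) for every token it sees.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce step: the framework groups all counts for a word together; each
    // reducer sums them, again with many reducers working in parallel.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The mappers tokenize their slices of the input concurrently, the framework shuffles the (word, 1) pairs by key across the cluster, and the reducers sum them in parallel, which is exactly the distributed-plus-concurrent mix described above.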
Chuck Lam, author of Hadoop in Action, says this technology is well established and ready to evolve even further. "I think the infrastructure is pretty mature now. You can even get Hadoop up and running on Amazon really easily since they introduced Elastic MapReduce. What’s interesting for us as a developer community is to start exploring the application layer. Many companies now have some kind of Hadoop cluster and can process any kind of data. Now it’s a matter of figuring out how to leverage that data to get some business value out of it." So, the future won’t just be about improving computing performance, it will be about increasing enterprise performance as well.