An Overview of Distributed Programming and Systems

Alex Alves
8 min read · Jan 26, 2024


A simple way to describe a distributed system is as a single system that runs across many environments, where the execution of a flow can be spread among them. Beyond that, you need to build algorithms that can run in such an environment. Let’s go over some of the core concepts below!

Why is this important?

When a workload demands a lot of processing power, distributing it is an excellent way to make better use of computational resources without exhausting a single machine. You allow your system to process things in different environments/nodes/machines/servers.

With a non-distributed architecture, a single POD/VM (the first one called) executes the entire flow.

With a distributed architecture, on the other hand, we can divide the responsibility for executing the processes across all the PODs/VMs.

Distributed programming is the art of solving the same problem that you can solve on a single computer using multiple computers. [Mikito Takada]

Pros

  • Horizontal Scalability
  • Availability: a failure in one module/service does not have to impact the others
  • Performance: distributing tasks can improve overall throughput
  • Resilience: distributed processes and data can be more failure-resistant

Cons

  • Complexity: requires advanced knowledge of distributed programming, networking, and resource management
  • Consistency: harder to guarantee, mainly under high concurrency
  • Communication: may introduce latency when transmitting some data
  • Infrastructure: you should know about networking to define the best way to reduce latency, cost, and degradation

Topologies

When we talk about distributed systems, we are talking about which strategy to follow, and we need to answer the following question for ourselves:

* Are we confident that each system/service will process its piece of the journey/flow correctly?

If the answer is “yes”, we don’t need an app to control the flow; we trust each service, and we get a decentralized system. If the answer is “no”, we need something to control the whole journey/flow and to tell each service/environment what to execute, and then we get a centralized system. Both are distributed, but in different ways (or Topologies). Let’s talk about them!

Broker (Decentralized)

The message flow is distributed across the components of the journey. This topology is very useful if you do not want central event orchestration. You only need to broadcast a message; some component consumes it and publishes another message, without the other components knowing about the process. It’s all about the chaining of events. And, to see what happens when an event is published, you need “living documentation” that shows it, such as an EventStorming.

As an example, we can imagine two scenarios: one where we don’t require any order/sequence for things to happen, and another where we do require a certain ordering:

Without order/sequence
With order/sequence

If you pay close attention, both cases involve a certain sequence: for the customer to receive the merchandise, the company/store has to send it, and the same goes for the entire e-commerce process. The key point is that each stage subscribes to messages, not to other services, whether the flow is sequential or not. Although there seems to be a direct dependency between adding an item to the cart and going to checkout, the “item added” message could come from N places, because what matters is the message itself. Therefore, the services remain autonomous: the message publisher doesn’t know what the next step is, and the message consumer doesn’t know where the message came from.
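To make the chaining of events more concrete, here is a minimal in-memory sketch. It is only an illustration, not a real message broker: the `Broker` class, the topic names (`cart.item_added`, `order.placed`, `order.shipped`), and the handler functions are all hypothetical names made up for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class Broker:
    """Minimal in-memory broker: routes each message to the topic's subscribers."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)
        self.log: List[str] = []  # topics in the order they were published

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        self.log.append(topic)
        for handler in list(self._subscribers[topic]):
            handler(message)


broker = Broker()


# Each stage subscribes to a topic, not to another service, and only
# publishes the next event; it never calls the next stage directly.
def checkout(message: dict) -> None:
    broker.publish("order.placed", {"order": message["item"]})


def shipping(message: dict) -> None:
    broker.publish("order.shipped", {"order": message["order"]})


broker.subscribe("cart.item_added", checkout)
broker.subscribe("order.placed", shipping)

# The first event could come from any of N producers (web, mobile, ...).
broker.publish("cart.item_added", {"item": "book"})
print(broker.log)
```

Notice that `checkout` has no reference to `shipping`: removing or adding a subscriber changes the flow without touching the publisher, which is also why the flow is harder to monitor and debug.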

Points to Consider
🔴 Monitoring complexity
🔴 Debugging complexity
🔴 Error handling
🟢 Decoupling
🟢 Flexibility
🟢 Scalability
🟢 High throughput

Mediator (Centralized)

This topology is useful if you need some level of orchestration, for example during a rollout process (migrating functionality from a legacy application to a new one). Here, you have to pay attention to the SPOF (Single Point of Failure), because every process depends on your orchestrator. This is also where we may see the Request-Reply pattern, in which the orchestrator has to wait for a service’s reply whenever it calls that service.

Now let’s revisit, once again, the two previous scenarios, including orchestration:

The order remains important and necessary, and both flows occur in the same manner, but now we have a controller that needs to know everything. This adds risk and complexity to the application: the orchestrator requires a lot of logic to know all the services and call them directly or indirectly, and it needs the response from each service before the flow can continue. Furthermore, if the orchestrator fails, the entire flow stops or gets lost.
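As a rough sketch of the idea, the orchestrator below knows every step and waits for each reply before moving on. The services (`reserve_stock`, `charge_payment`, `ship_order`), the `Orchestrator` class, and the rule that payments above 100 are declined are all assumptions invented for the example.

```python
from typing import Callable, List


# Hypothetical services: each takes the order and returns a reply.
def reserve_stock(order: dict) -> dict:
    return {"ok": True, "step": "stock"}


def charge_payment(order: dict) -> dict:
    # Made-up rule for the example: amounts above 100 are declined.
    return {"ok": order["amount"] <= 100, "step": "payment"}


def ship_order(order: dict) -> dict:
    return {"ok": True, "step": "shipping"}


class Orchestrator:
    """Central mediator: knows every step and waits for each reply."""

    def __init__(self, steps: List[Callable[[dict], dict]]) -> None:
        self.steps = steps

    def run(self, order: dict) -> List[str]:
        completed: List[str] = []
        for step in self.steps:
            reply = step(order)  # request-reply: block until answered
            if not reply["ok"]:
                break  # the flow stops at the failing step
            completed.append(reply["step"])
        return completed


orchestrator = Orchestrator([reserve_stock, charge_payment, ship_order])
print(orchestrator.run({"amount": 50}))   # all three steps complete
print(orchestrator.run({"amount": 500}))  # stops after the stock step
```

All the knowledge about the flow lives in one place, which gives clearer visibility, but it also means nothing runs if the orchestrator itself is down.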

Points to Consider
🔴 Single Point of Failure (SPOF)
🔴 Scalability limitations
🔴 Performance overhead
🔴 Over-coupling
🟢 Clearer visibility
🟢 Simplified error handling
🟢 Improved security

Sagas

This is a pattern that helps us handle rollbacks in “distributed transactions”. It comes in two styles, one for each topology: Choreography (for the broker topology) and Orchestration (for the mediator topology).

Choreography
When a process fails, the service publishes a failure event, and all interested services consume it and react (for example, by "reverting" their own work), using the "compensation event" concept.

Orchestration
This is almost the same as Choreography, except that the orchestrator tracks the failing service and directly calls every other participant to "revert".
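A minimal sketch of the orchestrated variant could look like the code below. It assumes each step is a hypothetical `(name, action, compensation)` tuple and that a failed action raises an exception; the step functions are invented for the example.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[List[str]], None], Callable[[List[str]], None]]


def run_saga(steps: List[Step], events: List[str]) -> bool:
    """Run each action; on failure, run compensations in reverse order."""
    done: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action(events)
            done.append(step)
        except Exception:
            events.append(f"{name} failed")
            # Compensate only the steps that had already completed.
            for _, _, compensate in reversed(done):
                compensate(events)
            return False
    return True


def reserve(events): events.append("stock reserved")
def release(events): events.append("stock released")
def charge(events): events.append("payment charged")
def refund(events): events.append("payment refunded")
def ship(events): raise RuntimeError("carrier unavailable")
def cancel_shipment(events): events.append("shipment cancelled")


events: List[str] = []
ok = run_saga(
    [
        ("reserve", reserve, release),
        ("charge", charge, refund),
        ("ship", ship, cancel_shipment),
    ],
    events,
)
print(ok, events)
```

The earlier entries ("stock reserved", "payment charged") stay in the event list: nothing is erased, new compensating events are simply appended after them.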

In other words, we don’t have a real rollback, just a “compensation event”, because the side effects have already happened. This shows how the chosen strategy affects system coordination and consistency. Since the topology models how the components interact, let’s see below how we can use concurrency control to guarantee the consistency and integrity of each service.

Concurrency Controls

Concurrency control is a mechanism for managing multiple concurrent transactions. It deals with conflicts and with how we can maintain data consistency.

Optimistic

This strategy assumes the best-case scenario: conflicts will happen infrequently, so transactions are allowed to proceed without synchronization. However, you need to guarantee that the conflicts that do occur don’t affect your system, for example by raising exceptions or by enforcing the order of the actions.
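One common way to detect such conflicts is a version check at write time. The sketch below is a single-threaded illustration with hypothetical `Store`, `Record`, and `ConflictError` names; in a real datastore the check and the write would have to be one atomic operation (a conditional update, for instance).

```python
from dataclasses import dataclass


class ConflictError(Exception):
    """Raised when the record changed since it was read."""


@dataclass
class Record:
    value: int
    version: int = 0


class Store:
    def __init__(self, value: int) -> None:
        self._record = Record(value)

    def read(self) -> Record:
        # Return a copy so each caller works on its own snapshot.
        return Record(self._record.value, self._record.version)

    def write(self, new_value: int, expected_version: int) -> None:
        # No lock is held between read and write; the version check
        # detects a conflicting update at commit time.
        if self._record.version != expected_version:
            raise ConflictError("stale version")
        self._record = Record(new_value, expected_version + 1)


store = Store(10)
a = store.read()
b = store.read()                      # both clients read version 0
store.write(a.value + 1, a.version)   # the first writer wins
try:
    store.write(b.value + 5, b.version)
    conflict = False
except ConflictError:
    conflict = True                   # the second writer must re-read and retry
print(store.read(), conflict)
```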

Pessimistic

On the other hand, you can use a locking strategy. In this case, we assume that conflicts will happen frequently, so the first client to arrive locks the relevant part of the code and the others wait. We can use a local thread-safe lock for this, or a distributed lock backed by something like Redis.
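For the local case, a minimal sketch with Python’s standard `threading.Lock` could look like this (the `Counter` class is a made-up example). A distributed lock follows the same acquire/release idea, but the lock lives in a shared service instead of in the process memory.

```python
import threading


class Counter:
    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Only one thread at a time may run the critical section.
        with self._lock:
            self.value += 1


def add_many(counter: Counter, times: int) -> None:
    for _ in range(times):
        counter.increment()


counter = Counter()
threads = [
    threading.Thread(target=add_many, args=(counter, 10_000)) for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 40000
```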

The best choice depends on how you will send and receive requests. For this, let’s see some high-level protocols.

Communication Protocols

Let's see some useful protocols that we can use in Distributed Architecture:

HTTP/HTTPS

  • Client-Server communication: separates tasks between client and server
  • Stateless: each request is independent
  • Scalability: easy server replication
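To illustrate the stateless point, here is a small sketch using only Python’s standard library. The `EchoHandler` class, the `fetch` helper, and the `/orders/...` paths are assumptions made up for the example; the server keeps nothing between requests, so any replica could answer either call.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Stateless: the reply depends only on this request's path.
        body = json.dumps({"path": self.path}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep the example quiet


def fetch(port: int, path: str) -> dict:
    conn = HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

first = fetch(port, "/orders/1")
second = fetch(port, "/orders/2")
server.shutdown()
server.server_close()
print(first, second)
```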

RPC

  • Transparent Integration: call remote functions as local functions
  • Network Abstraction: you do not need to care about the underlying network
  • Concurrency Management: some RPC frameworks offer control over transactions and distributed locks
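The “remote function as a local function” idea can be sketched with Python’s built-in XML-RPC modules. This is just one RPC flavor chosen because it ships with the standard library; the `add` function is a made-up example.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(a: int, b: int) -> int:
    return a + b


# Port 0 asks the OS for any free port.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(add, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# The proxy makes the remote call look like a local method call.
with ServerProxy(f"http://127.0.0.1:{port}/") as proxy:
    result = proxy.add(2, 3)

server.shutdown()
server.server_close()
print(result)  # 5
```

The call site hides the network, but the network is still there: latency and failures can show up in what looks like an ordinary function call.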

MQ (Message Queue)

  • Examples: AMQP or MQTT
  • Decoupling and Asynchronism: a component can send a message without waiting for it to be processed
  • Fault Tolerance: ensures message delivery even if systems are temporarily unavailable
  • Load Balance: distributes messages across many server instances
  • Integration of Heterogeneous Systems: Programming language agnostic
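The decoupling and load-balancing points can be sketched with an in-process queue standing in for the broker. This does not speak AMQP or MQTT; `queue.Queue`, the worker names, and the `order-N` messages are just stand-ins for the example.

```python
import queue
import threading
from typing import List, Tuple

tasks: queue.Queue = queue.Queue()
processed: List[Tuple[str, str]] = []
processed_lock = threading.Lock()


def worker(name: str) -> None:
    while True:
        message = tasks.get()
        if message is None:  # sentinel: no more work for this worker
            tasks.task_done()
            break
        with processed_lock:
            processed.append((name, message))
        tasks.task_done()


workers = [
    threading.Thread(target=worker, args=(f"worker-{i}",)) for i in range(3)
]
for w in workers:
    w.start()

# The producer only enqueues; it never waits for a message to be handled.
for i in range(9):
    tasks.put(f"order-{i}")
for _ in workers:
    tasks.put(None)

tasks.join()
for w in workers:
    w.join()
print(sorted(message for _, message in processed))
```

Which worker handles which message is not deterministic, and that is the point: the producer neither knows nor cares which instance picks up the work.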

Now, let’s see how we can use all these things together in an architecture/code design.

Architectures/Patterns

Client-Server

This is when you divide the application into two parts, client and server. The client is responsible for interacting with the end user, and the server handles all the client’s requests. The client can be a mobile or web application, and the server can be a dedicated machine. One point needs attention: every request to the server needs a response, so clients wait while the server processes the request.

Service-Oriented Architecture

A way to organize a system as a set of interconnected services. A service in this context is autonomous, independent, and exposed. In line with distributed systems, we can assign different tasks to different services.

Microservices

This pattern is close to SOA; the difference is in how the Consumer Layer uses the services. Here we do not need the ESB: the services are so independent that the Consumer Layer uses them directly, and each service has its own dedicated infrastructure.

Event-Driven Architecture

We can apply this pattern in different scenarios, from Monolithic to Microservice. The idea is to use an MQ protocol for asynchronous communication, so that we can apply a load-balancing strategy and offload the work to the infrastructure, such as Message Brokers. This is useful for distributing the processing of the same kind of task across different server instances.

Actor Models

This pattern is based on Actors: each actor is an independent unit, and actors interact only through messages. This kind of architecture works with a stateful strategy, which is why it can perform very well.
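A tiny sketch of the idea, using a thread and a queue as the mailbox (real actor frameworks offer much more, such as supervision and distribution). The `CounterActor` class and its `increment`/`get`/`stop` messages are hypothetical names for this example.

```python
import queue
import threading


class CounterActor:
    """An actor: private state, a mailbox, and one thread that drains it."""

    def __init__(self) -> None:
        self._count = 0  # state that only the actor's own thread touches
        self._mailbox: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message: str, reply_to=None) -> None:
        self._mailbox.put((message, reply_to))

    def _run(self) -> None:
        while True:
            message, reply_to = self._mailbox.get()
            if message == "stop":
                break
            if message == "increment":
                self._count += 1
            elif message == "get" and reply_to is not None:
                reply_to.put(self._count)

    def stop(self) -> None:
        self.send("stop")
        self._thread.join()


actor = CounterActor()
for _ in range(100):
    actor.send("increment")

reply: queue.Queue = queue.Queue()
actor.send("get", reply_to=reply)  # queued after the 100 increments
total = reply.get(timeout=5)
actor.stop()
print(total)  # 100
```

Because messages are handled one at a time, the state needs no lock: the mailbox serializes all access to it.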

Conclusions

We need to recognize that this approach brings benefits such as resilience, scalability, and performance. But we also need to see the complexity and challenges that come with it, such as architectural decisions, concurrency control, and protocol management. Here the CAP Theorem can help us reason about the trade-off between consistency, availability, and partition tolerance.

Understanding these trade-offs helps us decide the best path to follow and keeps the tech team and the product team aligned.

PS: This article is an overview of distributed systems; we did not go deep into any concept. Each of them requires deeper study.

Abbreviations / Meanings

  • SPOF: Single Point Of Failure
  • POD: Kubernetes abstraction that represents a group of one or more application containers, and some shared resources for those containers
  • VM: Virtual Machine
  • RPC: Remote Procedure Call
  • AMQP: Advanced Message Queuing Protocol
  • MQTT: Message Queuing Telemetry Transport
  • ESB: Enterprise Service Bus


Written by Alex Alves

Bachelor in Computer Science, MBA in Software Architecture and .NET Developer.
