Load Balancer: A comprehensive guide

What is a load balancer?

A load balancer is a device or piece of software that distributes network or application traffic across multiple servers. This distribution is designed to optimize resource use, maximize throughput, minimize response time, and avoid overloading any single resource. Load balancers enhance the overall performance of applications by ensuring that no single server bears too much demand; by spreading the work evenly, they improve user experience and application reliability.
Load balancers operate at the transport level (OSI Layer 4 – TCP/UDP) and/or the application level (OSI Layer 7 – HTTP/HTTPS).

Commonly Used Load Balancers:

  • NGINX: Widely used as both a load balancer and web server, known for its high performance and flexibility.
  • HAProxy: A robust, high-performance software load balancer known for its efficiency in handling thousands of concurrent connections.
  • AWS Elastic Load Balancing (ELB): Provides auto-scaling and highly available network traffic distribution across Amazon Web Services (AWS) instances.
  • Microsoft Azure Load Balancer: Used for distributing traffic for Microsoft Azure services, supports both inbound and outbound connections.
  • Google Cloud Load Balancing: A fully distributed, software-defined managed service that balances the load of traffic among instances in the cloud.
  • F5 BIG-IP: A line of products, primarily hardware-based, that manage application traffic, ensure availability, and improve performance.
  • Citrix ADC: Known previously as NetScaler, it provides load balancing, as well as additional application delivery features, enhancing security and performance.

Difference between load balancers and API Gateways:

  • Scope: Load balancers are involved in choosing the server; they primarily manage traffic at the server level. API gateways are involved in choosing the service behind a single entry point; they manage traffic at the application or service level.
  • Primary function: A load balancer's primary function is to distribute incoming network traffic across multiple servers so that no single server bears too much load, ensuring reliability and optimal resource utilization. An API gateway's primary function is to manage, process, and route API requests to the appropriate backend services, serving as a single entry point that handles routing, security, and rate limiting.
  • OSI layer: Load balancers can operate at both the transport layer (Layer 4 – TCP/UDP) and the application layer (Layer 7 – HTTP/HTTPS). API gateways operate at the application layer (Layer 7), dealing directly with HTTP/HTTPS requests and responses.
  • Traffic routing: Load balancers use algorithms such as round robin, least connections, and IP hashing to distribute traffic evenly across servers. API gateways route requests based on the API endpoint and method, and can direct traffic based on content, user, and session information.

Overlapping Functions of Load balancers and API Gateways

While the distinction is clear in their primary functions, there is some overlap in capabilities, especially in complex architectures:

  • Load Balancers with Layer 7 Capabilities: Modern load balancers that operate at Layer 7 (the application layer) can perform some tasks traditionally associated with API gateways, such as SSL termination and session persistence, and can make rudimentary decisions based on URL paths.
  • API Gateways with Load Balancing: Some API gateways come equipped with internal load balancing capabilities to distribute incoming API calls across multiple instances of a service, enhancing the gateway’s efficiency in handling requests.

When do we use a load balancer?

Load balancers are used in a variety of scenarios, primarily to ensure high availability, fault tolerance, and optimal resource utilization in networked environments. They play a crucial role in managing traffic across servers, ensuring that applications can handle large volumes of requests without degradation in performance. Here are some practical examples of when to use load balancers, along with considerations for choosing a specific implementation such as NGINX, AWS Elastic Load Balancing (ELB), or others.

Practical Examples of Using Load Balancers

  1. High Traffic Websites: For websites receiving large amounts of traffic, load balancers distribute requests across multiple servers to avoid overloading any single server and to ensure reliability and faster response times. For example, e-commerce platforms like Amazon use load balancers to distribute incoming user traffic during peak times, such as Black Friday or Cyber Monday.
  2. Global Content Delivery: They are used in Content Delivery Networks (CDNs) to direct user requests to the nearest server geographically, thus minimizing latency and speeding up content delivery. Media companies streaming videos or games often use this approach to provide a seamless user experience.
  3. Application Scaling: In cloud environments, they facilitate horizontal scaling—adding or removing servers based on demand. This is crucial for applications with variable load, ensuring they perform optimally during usage spikes without incurring unnecessary costs during quieter periods.
  4. Failover and Disaster Recovery: They improve application reliability and availability by automatically rerouting traffic from failed servers to healthy ones, ensuring users experience no interruption in service. This is critical for mission-critical applications in sectors like finance and healthcare.
  5. Microservices and API Gateways: In microservices architectures, load balancers distribute traffic among different services based on the business logic or API routes. This setup enhances the efficiency and scalability of applications.

How do Load Balancers work?

1. Traffic Distribution:

When a client sends a request to a server, the load balancer intercepts this request before it reaches the server. It then decides to which server to send the request, based on various algorithms and availability checks.

They distribute traffic among a set of backend servers (also known as a server farm or server pool) using various algorithms and techniques. These algorithms determine how the load balancer selects a server for each incoming request in a way that optimizes the performance and reliability of the application or website. Here’s an overview of the primary methods used by load balancers to distribute traffic:

a. Round Robin

The simplest form of load balancing, where requests are distributed sequentially across the server pool. Once the end of the server list is reached, the load balancer starts again at the first server. This method works well when all servers are of equal specification and there are no persistent data requirements.
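
As a rough illustration, here is a minimal round-robin selector in Python; the server addresses and the request loop are hypothetical stand-ins for a real backend pool.

    import itertools

    # Hypothetical pool of equally specified backend servers.
    servers = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]

    # itertools.cycle walks the list in order and wraps back to the first
    # server after the last one -- the round-robin behaviour described above.
    pool = itertools.cycle(servers)

    def pick_server():
        return next(pool)

    for request_id in range(7):
        print(request_id, "->", pick_server())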

b. Least Connections

This method directs traffic to the server with the fewest active connections. It is more adaptive than round robin, as it considers the current load on each server. It’s particularly useful when there are significant differences in session duration or the amount of work each request demands.
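
A minimal sketch of the idea in Python, assuming the load balancer tracks active connection counts per server (the counts below are illustrative):

    # Active connection counts the load balancer would maintain in real time.
    active_connections = {
        "10.0.0.1:8080": 12,
        "10.0.0.2:8080": 4,
        "10.0.0.3:8080": 9,
    }

    def pick_server(conns):
        # Route to the server currently handling the fewest connections.
        return min(conns, key=conns.get)

    server = pick_server(active_connections)
    active_connections[server] += 1  # the chosen server takes on the new request
    print("routing to", server)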

c. Weighted Algorithms

  • Weighted Round Robin: Similar to round robin, but each server is assigned a weight based on criteria like processing capacity, and servers with higher weights receive more connections (a simple version is sketched after this list).
  • Weighted Least Connections: An enhancement of the least connections method where each server is also assigned a weight. Servers with higher weights (greater capacity) handle more connections than their lower-weight counterparts.
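
One simple, if naive, way to sketch weighted round robin in Python is to repeat each server in the rotation according to its weight; the addresses and weights here are hypothetical:

    import itertools

    # Hypothetical weights reflecting each server's processing capacity.
    weights = {"10.0.0.1:8080": 3, "10.0.0.2:8080": 1, "10.0.0.3:8080": 2}

    # Expand each server into the rotation as many times as its weight, so
    # higher-weight servers receive proportionally more requests.
    rotation = [server for server, w in weights.items() for _ in range(w)]
    pool = itertools.cycle(rotation)

    for request_id in range(6):
        print(request_id, "->", next(pool))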

d. IP Hash

A hash key is created based on the IP address of the client, and this key determines which server will receive the request. This method ensures that a particular user (IP address) will consistently connect to the same server, which can be important for maintaining user session states.
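
A minimal sketch in Python; hashing the request URI instead of the client IP gives the URI-hash variant described below. The addresses are placeholders.

    import hashlib

    servers = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]

    def pick_server(client_ip):
        # Hash the client IP and map it onto the server list, so the same
        # client is consistently routed to the same backend server.
        digest = hashlib.md5(client_ip.encode()).hexdigest()
        return servers[int(digest, 16) % len(servers)]

    print(pick_server("203.0.113.7"))  # same IP -> same server every time
    print(pick_server("203.0.113.7"))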

e. Random

Requests are distributed randomly among the servers. This can be useful in environments where sessions do not need to be persistent, and the workload is uniformly intensive across all requests.

f. URI Hash

Similar to the IP hash, but the hash key is generated from the URI of the request. This ensures that requests for the same URI are consistently directed to the same server, potentially improving cache effectiveness on the servers.

g. Least Response Time

This method sends requests to the server that has the least amount of response time along with the fewest active connections. It combines two critical factors: the server’s current connection load and its recent performance metrics.
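
A minimal sketch, assuming the load balancer keeps a recent average response time and an active-connection count per server; the scoring formula below is purely illustrative, and real products use their own heuristics.

    # Illustrative metrics the load balancer would gather from recent traffic.
    metrics = {
        "10.0.0.1:8080": {"avg_response_ms": 120, "active": 8},
        "10.0.0.2:8080": {"avg_response_ms": 45, "active": 11},
        "10.0.0.3:8080": {"avg_response_ms": 60, "active": 3},
    }

    def pick_server(m):
        # Lowest combined score of response time and open connections wins.
        return min(m, key=lambda s: m[s]["avg_response_ms"] * (m[s]["active"] + 1))

    print("routing to", pick_server(metrics))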

2. Health checks of servers

Health checks are critical functions of load balancers, ensuring that traffic is only routed to servers that are currently able to handle requests effectively. These checks allow the load balancer to detect any server that is not functioning correctly, either due to hardware failures, software crashes, or network issues, and to reroute traffic away from those servers until they are restored.

Types of Health Checks

Load balancers can perform several types of health checks, each providing a different level of insight into the server’s status:

a. Ping Check

The most basic form of health check involves sending an ICMP echo request (ping) to the server to verify network connectivity. If the server responds to the ping, it is considered available for traffic.

b. TCP Check

A more advanced check that establishes a TCP connection with the server on a specified port. If the connection is successfully established, the server is considered healthy. This type of check is useful for verifying that a server’s network stack and specific application port are operational.
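
A minimal TCP check sketched in Python, assuming a 5-second timeout and a placeholder address:

    import socket

    def tcp_health_check(host, port, timeout=5.0):
        # Attempt to open a TCP connection; success means the server's network
        # stack and the application port are reachable.
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    print(tcp_health_check("10.0.0.1", 8080))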

c. HTTP Check

Involves sending an HTTP request to the server and checking the HTTP status code returned in the response. Commonly, load balancers look for a 200 OK response to determine that a web server or application is functioning correctly. HTTP checks can also validate the content of the response to ensure that not only is the server up, but it is also returning the correct data.
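
A minimal HTTP check in Python, assuming the backend exposes a hypothetical /healthz endpoint that returns 200 OK when healthy (pointing the same check at an https:// URL gives the HTTPS variant described next):

    from urllib.request import urlopen
    from urllib.error import URLError

    def http_health_check(url, timeout=5.0):
        # Treat a 200 OK response as healthy; the body could also be inspected
        # to confirm the server is returning the expected content.
        try:
            with urlopen(url, timeout=timeout) as response:
                return response.status == 200
        except (URLError, OSError):
            return False

    print(http_health_check("http://10.0.0.1:8080/healthz"))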

d. HTTPS Check

Similar to HTTP checks but over a secure connection. This type of check also involves SSL/TLS negotiation, which verifies that the server’s SSL/TLS stack is operational.

e. Script or Program Check

Some load balancers allow the execution of custom scripts or programs to perform health checks. These scripts can check more complex conditions before deciding if the server is healthy. For example, a script might query a database via the server to ensure the database interaction is functioning correctly.

Frequency and Timeout of Health Checks

  • Frequency: Health checks are performed at regular intervals, which can be configured based on the criticality of the application and the typical response time of the servers. Common intervals are every 5 to 30 seconds.
  • Timeout: If a server does not respond within a specified timeout period (e.g., 5 seconds), it is marked as unhealthy. The load balancer stops sending new requests to that server until it passes a health check.

Handling Failing Health Checks

When a server fails a health check, the load balancer will typically take the following steps (a simple check loop covering this cycle is sketched after the list):

  1. Stop Routing Traffic: The server is marked as down, and no new traffic is sent to it.
  2. Retries: The load balancer may continue to check the server at the defined frequency to see if it comes back online.
  3. Notification: Optionally, the load balancer can notify system administrators or integrated monitoring systems that a server is down.
  4. Recovery: Once the server starts passing health checks again, it can be brought back into the pool of active servers to handle traffic.
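
A minimal sketch of this cycle in Python; the check function, failure threshold, and interval are placeholders, and a real load balancer would also wire in notifications:

    servers = ["10.0.0.1:8080", "10.0.0.2:8080"]
    failures = {s: 0 for s in servers}   # consecutive failed checks per server
    healthy = set(servers)               # servers currently receiving traffic

    def check(server):
        # Placeholder: in practice a TCP or HTTP check like the ones above.
        return True

    def run_checks_once(fail_threshold=3):
        for server in servers:
            if check(server):
                failures[server] = 0
                healthy.add(server)          # recovery: rejoin the active pool
            else:
                failures[server] += 1
                if failures[server] >= fail_threshold:
                    healthy.discard(server)  # stop routing new traffic to it

    # A real load balancer repeats this on a fixed interval, for example
    # every 10 seconds: while True: run_checks_once(); time.sleep(10)
    run_checks_once()
    print("healthy servers:", sorted(healthy))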

Health checks are vital for maintaining the reliability and performance of applications served by multiple servers. By ensuring that all servers in the load-balancing pool are functional, load balancers can provide seamless, uninterrupted service to end users.

3. Session persistence

Session persistence, also known as sticky sessions, is a critical feature of load balancers that ensures a user’s session remains on the same server during multiple requests. This feature is essential for applications where session data is stored locally on the server, such as in applications that maintain a user’s logged-in state or shopping cart information.

Here’s how load balancers implement session persistence:

a. Cookie-Based Persistence

This is one of the most common methods for achieving session persistence. The load balancer inserts a cookie in the client’s browser on the first request to a server. This cookie typically contains an identifier unique to the server that handled the request. On subsequent requests, the client sends this cookie back to the load balancer, which reads the cookie and directs the request to the appropriate server based on the identifier.
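
A minimal sketch of the routing decision in Python; the cookie name, server names, and fallback choice are hypothetical, and a real load balancer would set the cookie on the HTTP response:

    import random

    servers = ["app-1", "app-2", "app-3"]
    COOKIE_NAME = "lb_server"  # hypothetical sticky-session cookie

    def route(request_cookies):
        # If the sticky cookie points at a known server, keep the session there;
        # otherwise pick a server and return the cookie to set on the response.
        server = request_cookies.get(COOKIE_NAME)
        if server in servers:
            return server, {}
        server = random.choice(servers)
        return server, {COOKIE_NAME: server}

    print(route({}))                      # first request: cookie gets assigned
    print(route({COOKIE_NAME: "app-2"}))  # later requests stick to app-2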

b. IP Hashing

In this method, the load balancer uses a hash function on the client’s IP address to determine which server will handle the request. As long as the client’s IP address remains the same, the hash function will consistently direct the requests to the same server. This method does not require any modification on the client side but can be less reliable in environments where IP addresses may change frequently (like mobile networks).

c. Parameter-Based Persistence

With parameter-based persistence, the load balancer uses a parameter in the request to determine the server assignment. This could be a session ID embedded in the URL, a hidden form field, or any other parameter that the load balancer can recognize and use to map the session to a specific server.

d. Custom Headers

For applications using modern web APIs, especially with AJAX or RESTful services, the load balancer can use custom headers inserted by the client or the application to maintain session persistence. The headers can include session-related information that the load balancer uses to route requests to the correct server.

e. SSL Session IDs

For encrypted connections, load balancers can use SSL session IDs to maintain persistence. The SSL session ID is a unique identifier that is part of the SSL handshake between the client and the server. The load balancer can cache the SSL session ID along with the corresponding server identifier to route subsequent requests in the same SSL session to the same server.

Challenges and Considerations

  • Load Distribution: Using session persistence can sometimes lead to uneven load distribution among servers, especially if a few sessions are particularly resource-intensive. Load balancers need to be configured to handle such scenarios, possibly by setting limits on session times or the number of requests per session.
  • Failover: In the event of a server failure, the session data can be lost, leading to a degraded user experience. Some advanced load balancing solutions mitigate this by replicating session data across servers or by failing over to a backup server that also has access to the session data.

Session persistence techniques ensure that the user experience is seamless and consistent, particularly in complex web applications where a user might perform a series of interconnected requests based on previous interactions. This capability is crucial for maintaining the integrity of user sessions and providing a stable and reliable user experience.

4. SSL Termination

SSL termination is a process performed by load balancers to offload the encryption and decryption tasks from backend web servers. This technique is particularly beneficial in environments where secure data transmission is necessary, but where the performance impact on servers needs to be minimized. Here’s how SSL termination works on a load balancer, with a brief code sketch after the step-by-step outline:

Step-by-Step Process of SSL Termination

  1. Client to Load Balancer Connection
  • The client initiates a secure connection with the load balancer by sending an SSL/TLS request.
  • This request includes what’s called an “SSL handshake,” which involves the exchange of cryptographic parameters between the client and the load balancer.
  2. Decrypting the Data
  • The load balancer is configured with the necessary SSL certificates to decrypt the data received from the client. During the SSL handshake, the load balancer presents its SSL certificate to establish its identity to the client.
  • Once the client verifies the certificate, it encrypts data using the load balancer’s public key. The load balancer then uses its private key to decrypt this data.
  3. Load Balancer to Server Connection
  • After decrypting the incoming data, the load balancer sends it to one of the backend servers. This internal transfer between the load balancer and the servers is typically done over an unencrypted connection because it is within a secure, controlled network environment. However, depending on security requirements, this communication can also be encrypted (SSL bridging).
  4. Processing and Response
  • The server processes the decrypted request and sends the response back to the load balancer. If SSL bridging is used, this response might be encrypted and would need to be decrypted again by the load balancer.
  • The load balancer then encrypts the server’s response using the SSL session established with the client and sends it back to the client.
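
Putting the steps above together, here is a minimal, single-connection sketch in Python of a TLS-terminating proxy; the certificate files, ports, and backend address are hypothetical, and a production load balancer would handle many concurrent connections, full HTTP parsing, and error cases:

    import socket
    import ssl

    BACKEND = ("10.0.0.1", 8080)     # plaintext backend server (placeholder)
    CERT, KEY = "lb.crt", "lb.key"   # the load balancer's certificate and key

    # TLS is terminated here: clients negotiate SSL/TLS with the load balancer,
    # which decrypts the traffic and forwards plain HTTP to the backend.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(CERT, KEY)

    with socket.create_server(("0.0.0.0", 443)) as listener:
        with context.wrap_socket(listener, server_side=True) as tls_listener:
            conn, addr = tls_listener.accept()   # SSL handshake with the client
            request = conn.recv(65536)           # decrypted client request
            with socket.create_connection(BACKEND) as backend:
                backend.sendall(request)         # unencrypted internal hop
                response = backend.recv(65536)
            conn.sendall(response)               # re-encrypted back to the client
            conn.close()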

Advantages of SSL Termination

  • Improved Performance: By handling the resource-intensive tasks of encrypting and decrypting SSL/TLS traffic at the load balancer, backend servers are relieved of these duties. This allows them to serve more requests and perform their primary functions more efficiently.
  • Simplified SSL Management: Managing SSL certificates and keys is centralized at the load balancer, making it easier to perform updates and audits without having to configure each backend server separately.
  • Enhanced Flexibility: Load balancers can inspect the unencrypted content to make intelligent load-balancing decisions based on request details such as headers, cookies, or the requested URL path.

Security Considerations

  • Internal Network Security: Although the connection between the load balancer and the backend servers is usually unencrypted, it is critical to secure the internal network to prevent unauthorized access and potential data breaches.
  • Compliance: For environments where data protection is regulated (e.g., PCI DSS for payment data), it is important to ensure that any unencrypted communication meets compliance requirements.

SSL termination is a widely adopted strategy used in load balancing to optimize resource usage and improve the overall efficiency of data handling in secure network environments. However, it must be implemented with a clear understanding of the network security requirements and compliance obligations associated with handling sensitive data.
