Michael Rumsewicz
In this paper we discuss the imperatives for Web server solutions that are efficient, scalable and reliable. We provide an initial, high level set of requirements that such Web servers need to fulfil in order to successfully support commercial Web applications.
We introduce Eddie, an Ericsson sponsored Open Source effort making multi-platform, commercial grade Web server systems a reality. Eddie is a flexible web server infrastructure that enables scalable, load balanced and highly reliable web services.
1 Introduction
The usage of the Internet and the World Wide Web is increasing at a tremendous rate, and the Web is providing a rapidly growing number of commercial services. Such applications range from simple information retrieval through to e-commerce.
However, the reliability of current offerings is typically far below what mission critical applications require. The Web must become as dependable as the telephone before it can serve as a medium for high volume, business critical applications.
Telecommunications networks have well-established goals for periods of congestion and partial system failure: maximise the throughput of useful work and protect the quality of service of users already in the system.
Unfortunately, the Web has not reached a level of maturity capable of achieving these goals. In fact, it is not clear whether comparable, generally accepted goals even exist for the Web in general, and Web servers in particular.
While the use of Web servers is growing tremendously, their reliability has not improved at the same rate. During periods of overload Web servers tend to allow requests from new users even though this degrades the Quality of Service perceived by users already accessing the system. There are numerous examples of overload resulting in severe service disruption and financial loss.
A number of companies are attempting to address the issue of improving Web server capabilities, especially in distributed environments. These companies range from recent start ups, such as Resonate (www.resonate.com), Arrowpoint (www.arrowpoint.com), and Coyote Point (www.coyotepoint.com) to major multinationals such as Hewlett Packard (www.hp.com), Cisco (www.cisco.com) and IBM (www.ibm.com).
The solutions proposed by these companies span the range from pure software to pure hardware. However, each of them falls short of one or more of the requirements described in the following section.
In this paper we outline an initial set of requirements for commercial grade Web servers and introduce Eddie, an Open Source solution designed to meet them.
The mission of the Eddie team is to provide tools for building mission critical Internet sites that deliver a continually high level of service, focused on the customer and attuned to the needs of the service provider.
Eddie provides load balancing within and across sites, straightforward scalability, admission control to protect user quality of service, and automatic handling of server failures.
2 Requirements for a commercial grade web server
To determine the requirements of a commercial grade web server, consider a simple example of user interaction with the web server of a corporation whose servers are spread across internationally distributed sites (see Figure 1).
From the user's point of view, the world consists primarily of their browser and the web sites they wish to access; the infrastructure in between is largely invisible to them.
Users are typically unaware of the actual physical architecture of the web servers they are accessing. In the following we refer to multisite web servers as distributed web servers, to emphasise that each site should be considered an integral component of a single server as perceived by the user.
Actually accessing the web server requires a number of steps to be completed.
In our example, the distributed web server consists of two geographically separated sites (see Figure 2). Site 1
consists of a Domain Name Service (DNS) server and four servers running web server software.
Site 2 consists of a DNS server and three servers running web server software.
A typical session between a user and a web server goes through the following steps:
1. The user's browser asks its Local DNS to resolve the domain name of the web server.
2. The Local DNS, unable to resolve the domain name, forwards the request to the Authoritative DNS of the domain name.
3. The Authoritative DNS returns the IP address of the server that should be accessed.
4. The user begins accessing web pages on the server specified by the DNS.
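To make these steps concrete, the following minimal Python sketch performs the same sequence programmatically. The domain www.example.com is a placeholder, and the standard resolver library hides the Local DNS / Authoritative DNS interaction described in steps 1 to 3.

    import socket
    from urllib.request import urlopen

    # Steps 1-3: the resolver asks the Local DNS, which (via the Authoritative
    # DNS for the domain if necessary) returns an IP address for the server.
    ip_address = socket.gethostbyname("www.example.com")   # placeholder domain
    print("Resolved to", ip_address)

    # Step 4: the client fetches pages from the server returned by the DNS.
    with urlopen("http://www.example.com/") as response:
        print("HTTP status:", response.status)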
As mentioned earlier, users accessing the distributed web server are unaware of the server's particular configuration. The user will expect that, on gaining access to the site, they will receive rapid responses and a consistent level of service until they have finished. From the service provider perspective, the expectation is that server resources are used efficiently, that the system can grow easily as demand increases, and that service continues through partial failures.
3 The Eddie Solution
Eddie is an Ericsson sponsored Open Source effort aimed at delivering a solution that satisfies all of these user and service provider requirements. Eddie consists of two main software packages: the Intelligent HTTP Gateway and the Enhanced DNS Server.
Figure 3 shows how our example corporation might deploy Eddie.
At each site, two new servers are installed with the Eddie Intelligent HTTP Gateway package. We will refer to these as Front End Servers in the following discussion. These servers are responsible for controlling incoming traffic and distributing it to designated web servers, which we refer to as Back End Servers in the following sections.
At each site, the existing DNS server software is replaced by the Eddie Enhanced DNS server
package.
3.1 Intelligent HTTP Gateway
The functionality within the Intelligent HTTP Gateway package provides load balancing, scalability, performance optimisation and service protection, quality of service, and reliability.
3.1.1 Load Balancing
Eddie provides throughput maximisation within each site by using sophisticated load balancing functionality.
Detailed server load information is passed by each Back End Server to each Front End Server at a site. This information allows the Front Ends to effectively balance incoming requests over all of the Back Ends.
The Front End Servers take the load information and continually adjust the fraction of accepted
new client requests sent to each Back End Server (See Figure 4).
This load balancing is performed to keep each Back End Server working at approximately the
same fraction of its overall capacity. By avoiding static load balancing schemes, such as Round
Robin, we can efficiently use the full resources of all Back End Servers within the site.
This ensures that during periods of high load no capacity is wasted by having underutilised
servers. In other words, the throughput of the site is maximised as a result of our load balancing
scheme.
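The precise balancing algorithm used by Eddie is beyond the scope of this paper. The following Python sketch only illustrates the general idea of weighting Back End Servers by their remaining headroom; the server names, capacities and utilisations are invented.

    import random

    # Hypothetical load report from each Back End Server: estimated capacity
    # (requests/sec) and current fractional utilisation (0.0 - 1.0).
    back_ends = {
        "be1": {"capacity": 400.0, "utilisation": 0.70},
        "be2": {"capacity": 250.0, "utilisation": 0.40},
        "be3": {"capacity": 100.0, "utilisation": 0.55},
    }

    def pick_back_end(servers):
        """Weight each server by its remaining headroom so that new requests
        push all servers toward the same fraction of their capacity."""
        names = list(servers)
        headroom = [s["capacity"] * (1.0 - s["utilisation"]) for s in servers.values()]
        if sum(headroom) <= 0:
            return None  # no spare capacity: admission control should queue or reject
        return random.choices(names, weights=headroom, k=1)[0]

    print(pick_back_end(back_ends))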
The Eddie approach to load balancing has been designed to make the distributed web server as easy to manage and as future-proof as possible, to protect the service provider investment. This is why we have ensured that the Front End Servers do not need to be configured with the capacity of each Back End Server, and why Eddie does not require the Back End Servers to be of the same speed or run the same operating system.
The capacity of each Back End Server is automatically learned in real time and adjusts as the mix of user requests changes. Therefore, changes in user traffic patterns are detected, and admission control and load balancing are adjusted without manual intervention.
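As a rough illustration of learning capacity in real time, a Front End could maintain a smoothed estimate of the throughput each Back End sustains at its reported utilisation. This is not Eddie's actual estimator; the figures and the smoothing constant below are invented.

    # Hypothetical sketch: refine a capacity estimate from periodic load reports.
    # capacity ~ observed request rate / observed utilisation, smoothed over time.
    ALPHA = 0.2  # smoothing constant (assumed)

    def update_capacity(old_estimate, observed_rate, observed_utilisation):
        if observed_utilisation <= 0.0:
            return old_estimate  # an idle sample carries no capacity information
        sample = observed_rate / observed_utilisation
        return (1.0 - ALPHA) * old_estimate + ALPHA * sample

    estimate = 300.0  # requests/sec, initial guess
    for rate, util in [(210.0, 0.6), (250.0, 0.7), (180.0, 0.5)]:
        estimate = update_capacity(estimate, rate, util)
    print(round(estimate, 1))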
3.1.2 Scalability
Eddie provides scalability by allowing new servers to be simply added to an existing configuration.
A website that cannot be scaled is a nightmare to maintain, whether through forever replacing equipment, juggling configurations or wasting time managing the network rather than building the business.
Whenever more capacity is required, a new Back End or Front End Server can simply be added to
a site (see Figure 5). There is no requirement that the new server be the same as existing servers at the site in
terms of speed or operating system. After modifying the Eddie configuration files, the capacity of
the new server is immediately available.
This capacity is available not only to that site, but to the entire distributed server, even if it is
distributed internationally, through the use of the Eddie Enhanced DNS Server package.
Eddie therefore provides a natural growth path for service providers while allowing them to
maintain user quality of service at each stage of expansion.
3.1.3 Performance Optimisation and Service Protection
Eddie provides the capability to exploit performance optimisations in Back End Servers and to protect an existing system from potentially degraded performance due to the deployment of new functionality.
Web servers support a range of different functions, including serving static pages, running CGI programs and delivering images. Tuning the performance of specific servers to perform such functions can obviously help to increase the overall capacity of a distributed web server.
Being able to place different functions or information on different sets of servers allows a web
server administrator to physically isolate groups of functions and blocks of information from each
other. When the new functions have been proven in, they can be moved to join other functions on
other servers. This helps to minimise the risk of new functions having a deleterious impact on an
existing system.
The Eddie Intelligent HTTP Gateway package allows just such specialisation of function and
information on Back End Servers.
The Front End Servers parse all incoming user requests. This is done for two reasons: to direct each request to a Back End Server that holds the relevant function or information, and to determine whether the request belongs to a user who has already been admitted to the site.
As illustrated in Figure 6, the web server administrator can, for example, dedicate certain Back End Servers to be CGI processing engines, while other machines may be dedicated to act as image repositories. The various machines may then be tuned to optimise their performance for such
tasks. The Front End Servers take care of ensuring HTTP 1.0 and HTTP 1.1 features are properly
carried out. The allocation of functionality and information to particular Back Ends is specified by
the administrator in the Eddie configuration files.
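A minimal sketch of this kind of content-based dispatch is shown below. The URL prefixes and server pools are invented for illustration and are not Eddie's configuration syntax.

    # Hypothetical mapping from URL prefix to a pool of specialised Back Ends.
    ROUTES = {
        "/cgi-bin/": ["cgi-be1", "cgi-be2"],      # CGI processing engines
        "/images/":  ["img-be1"],                 # image repository
    }
    DEFAULT_POOL = ["web-be1", "web-be2"]         # everything else

    def select_pool(request_path):
        """Parse the request path and return the Back End pool that serves it."""
        for prefix, pool in ROUTES.items():
            if request_path.startswith(prefix):
                return pool
        return DEFAULT_POOL

    print(select_pool("/cgi-bin/search?q=eddie"))  # -> CGI pool
    print(select_pool("/index.html"))              # -> default pool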
3.1.4 Quality of Service
Eddie provides user Quality of Service through advanced admission control functionality. We adopt a simple philosophy to user Quality of Service for Web servers: if a user receives the first page they request, they should be able to receive every page they request from the server with rapid response time, until they have finished. If the server cannot guarantee rapid response time, the user should either be queued until sufficient resources are available and told they will be admitted as soon as possible, or rejected and told to return later.
Eddie contains built-in real-time load monitoring routines to track the usage of each Front End and
Back End Server. Load information that can be passed includes CPU load, memory usage, disk
delays, page faults and run queue statistics. The web server administrator sets thresholds on the
usage of critical resources, which are then used by an Admission Control
function to decide
whether or not a particular server is overloaded. The load information of the Back End Servers is used by the Front Ends to estimate the rate at which new users can be sent to the Back Ends and still receive rapid responses.
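The thresholds below are invented examples of the kind of limits an administrator might set; the sketch simply shows how reported load figures can be turned into an overloaded / not overloaded decision.

    # Assumed thresholds on critical resources (set by the administrator).
    THRESHOLDS = {"cpu_load": 0.85, "memory_usage": 0.90, "run_queue": 10}

    def is_overloaded(report):
        """Return True if any monitored resource exceeds its threshold."""
        return any(report.get(name, 0) > limit for name, limit in THRESHOLDS.items())

    report = {"cpu_load": 0.62, "memory_usage": 0.71, "run_queue": 3}
    print(is_overloaded(report))  # False: this server can accept new users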
When a Front End Server receives a user request, it checks to see if the user has recently been
granted access to the site (see Figure 7). If so, the request is passed directly to the Back End Server that served
the previous request from this user. In this way, any required state information may be reliably
maintained for the user.
If the user has not been seen recently, the Front End Server decides whether sufficient Back End
Server resources are available.
If sufficient resources exist, the request is forwarded to a Back End Server and the Front End
Server creates a table entry noting the time of the request. All data passing between the user and
the Back End Server passes via this Front End Server, which updates the timing information. This is used to create a soft session for the user. If the user ceases to interact with the server for a configurable period, say 10 minutes, the session is closed. Subsequent requests from the user are again subjected to admission control.
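The soft session mechanism can be pictured as a small table on the Front End mapping each recently seen user to a Back End and a last-seen time. The sketch below is only illustrative: the 10 minute timeout follows the example above, and user identification details are omitted.

    import time

    SESSION_TIMEOUT = 10 * 60  # seconds; configurable in practice
    sessions = {}              # user id (e.g. client address) -> (back_end, last_seen)

    def route(user, choose_back_end):
        """Send a returning user to the same Back End; admit new users via
        admission control (represented here by choose_back_end)."""
        now = time.time()
        entry = sessions.get(user)
        if entry and now - entry[1] < SESSION_TIMEOUT:
            back_end = entry[0]            # existing soft session
        else:
            back_end = choose_back_end()   # admission control + load balancing
        sessions[user] = (back_end, now)   # refresh the timer on every request
        return back_end

    print(route("10.0.0.7", lambda: "be2"))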
If there are insufficient resources to immediately serve the customer, the request is queued by the Front End Server and a web page is returned informing them that they will be admitted shortly. This page is automatically updated, providing feedback to the user on the state of their request. As a side benefit, this discourages them from continually clicking on the same URL and wasting server resources. The site therefore has more capacity available for processing admitted, and hopefully satisfied, users.
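One simple way to realise such a self-updating waiting page is an HTTP Refresh header (or an equivalent meta refresh tag). The response below is an invented example, not the exact page Eddie generates.

    # Hypothetical waiting page returned to a queued user. The Refresh header
    # makes the browser re-poll automatically instead of the user re-clicking.
    def waiting_page(position_in_queue, retry_seconds=15):
        body = ("<html><body><p>The site is busy. You are number %d in the queue "
                "and will be admitted as soon as possible.</p></body></html>"
                % position_in_queue)
        headers = [
            ("Content-Type", "text/html"),
            ("Refresh", str(retry_seconds)),   # browser reloads after N seconds
        ]
        return "200 OK", headers, body

    print(waiting_page(12)[1])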
3.1.5 Reliability
Eddie provides reliability through automatic detection of server failure and a combination of traffic rerouting and IP address migration (see Figure 8).
Whether we like it or not, computers occasionally fail, and the applications running on them may die, lock up or otherwise malfunction.
The Intelligent HTTP Gateway package comes with built-in mechanisms for minimising the
impact of failed servers.
The Front End Servers within a site monitor the operational status of every Back End Server. If a Back End Server fails, the failure is detected and the Front End Servers immediately redirect user requests to other Back End Servers running the same application.
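Failure detection can be as simple as noting which Back Ends have reported load recently and excluding the rest from the candidate pool. The sketch below assumes invented report timestamps and an assumed heartbeat interval.

    import time

    HEARTBEAT_TIMEOUT = 5.0  # seconds without a load report => treat as failed (assumed)

    def live_back_ends(last_report_times):
        """Return the Back Ends whose load reports are recent enough to trust."""
        now = time.time()
        return [name for name, seen in last_report_times.items()
                if now - seen < HEARTBEAT_TIMEOUT]

    reports = {"be1": time.time(), "be2": time.time() - 30.0}  # be2 has gone quiet
    print(live_back_ends(reports))  # -> ["be1"]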
Failure of a Front End Server is also automatically detected and the IP address migration capability
of the Intelligent HTTP gateway package ensures that another Front End Server within the site
immediately takes over the IP address of the failed unit. When the failed unit is repaired and
brought back into service, its original IP address is automatically migrated back to it.
Thus, Eddie provides seamless continuity of service in the event of failures.
3.2 The Enhanced DNS Server
The functionality within the Enhanced DNS Server package provides load balancing across geographically distributed sites, together with automatic handling of site failures.
3.2.1 Load Balancing Across Geographically Distributed Sites
Eddie provides throughput maximisation across the entire distributed web server by using sophisticated network load balancing functionality.
For server sites distributed over a city, a country, or even the world, Eddie's Enhanced DNS enables maximum throughput to be achieved by making the entire processing capacity of all servers available. The individual sites may have different numbers of servers, different vintages of computer, even different operating systems.
The load information passed to each Front End Server within a site is processed and then summary
information of the Front End and Back End Servers in each site is passed to each Eddie Enhanced
DNS server, across the Internet if necessary (Figure 9).
The load information also includes the operational status of each Front End Server, allowing Eddie to route traffic away from failed servers.
Each Enhanced DNS then autonomously processes the received information.
There are also times when no information may be received by an Enhanced DNS from a site for a
period. For example, if the link from the Internet to the site fails, or if a failure occurs within the
Internet effectively isolating some sites from some Enhanced DNSs. In such situations, each
Enhanced DNS independently infers the availability of the site. Traffic is immediately routed away from sites considered failed or inaccessible. When information is again received from a site, the site is brought back into use by the Enhanced DNS.
The Enhanced DNS package dynamically balances client domain name resolution requests across
all accessible sites in the distributed web server, and more specifically, across each Front End
within each site. The balancing is performed to keep each site working at approximately the same
fraction of its overall capacity. This ensures maximum efficiency of Front End Server resources
within each site.
To further ensure that requests are effectively distributed across all sites, a short Time To Live is applied to all domain name resolutions returned. This means that clients re-resolve the server's name frequently and therefore always use the most up to date information.
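Putting these pieces together, an Enhanced DNS style decision might look like the sketch below: ignore sites that have not reported recently, weight the remaining sites by spare capacity, and hand out the answer with a short Time To Live. All names, figures and the 60 second TTL are invented for illustration and do not describe Eddie's actual algorithm.

    import random
    import time

    TTL_SECONDS = 60          # short TTL so clients re-resolve frequently (assumed)
    REPORT_TIMEOUT = 30.0     # seconds without a summary => site considered unreachable

    # Hypothetical per-site summaries received from the Front End Servers.
    sites = {
        "site1": {"front_ends": ["192.0.2.10", "192.0.2.11"], "spare": 0.45,
                  "last_report": time.time()},
        "site2": {"front_ends": ["198.51.100.20"], "spare": 0.20,
                  "last_report": time.time() - 120.0},  # has gone quiet
    }

    def resolve():
        """Return (IP address of a Front End, TTL) for a name resolution request."""
        now = time.time()
        usable = {n: s for n, s in sites.items()
                  if now - s["last_report"] < REPORT_TIMEOUT and s["spare"] > 0}
        if not usable:
            return None
        weights = [s["spare"] for s in usable.values()]
        site = random.choices(list(usable.values()), weights=weights, k=1)[0]
        return random.choice(site["front_ends"]), TTL_SECONDS

    print(resolve())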
4 Summary
The mission of the Eddie team is to provide tools for building mission critical Internet sites that deliver a continually high level of service, focused on the customer and attuned to the needs of the service provider.
We have demonstrated that Eddie delivers load balancing within and across sites, straightforward scalability, protection of user quality of service through admission control, and resilience to server and site failures.
And, by the way, it's free.
More information
For more information on the Eddie Open Source effort and to download the Eddie code, visit the
Eddie Web site at www.eddieware.org.