How a web server works
Basically, the sequence of events goes like this:
- The server connects to the local TCP/IP stack.
- The server starts to ‘listen’ for connection requests on a specific port number, for example good old port 80, the default HTTP port (although ports below 1024 are normally privileged, so you often have to use something else, for example 8080).
- When a connection request arrives the server accepts it and receives the data.
- The server analyzes the request header and locates the appropriate resource requested by the client.
- The server sends the resource in a response to the client.
- The server goes back to waiting for the next connection request (step 2).
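The steps above can be sketched as a minimal single-threaded server in Python. This is illustrative only: the port number, loopback address, request limit and ‘hello’ response are all my own choices, not anything a real web server would be limited to.

```python
# A minimal sketch of the accept/receive/respond loop described above.
# Port 8080 and the fixed "hello" body are illustrative assumptions.
import socket

def run_server(port=8080, max_requests=1):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # connect to the local TCP/IP stack
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(5)                        # start listening on the port
    for _ in range(max_requests):        # after each request, go back to waiting (step 2)
        conn, addr = srv.accept()        # accept the connection request
        request = conn.recv(4096)        # receive the request data
        # (a real server would analyze the header and locate the resource here)
        body = b"hello"
        conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\n" + body)
        conn.close()
    srv.close()
```

The `max_requests` parameter exists only so the sketch terminates; a real server loops forever.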
As far as the outside world is concerned, that is what is happening. Internally, however, things need to happen slightly differently in order to cope with that outside world.
Consider the situation where many people are accessing a web site at the same time. With the above processing scheme, anyone who tried to connect whilst the server was busy would be ignored. Obviously this is not a good thing. What you can do (in our simple server), however, is tell TCP/IP to queue connection requests until we can get to them. That way, if a whole bunch of them arrive at the same time we won’t lose them.
This is good because of what typically happens with browsers: once a web page’s HTML has been loaded and they start loading the resources referenced by the page, for example images, they initiate multiple concurrent connections to the server in order to load the resources more quickly. Obviously this would not work if there were no way to queue the requests.
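On most socket APIs, that queue of pending connection requests is the listen ‘backlog’. A small sketch (the port and backlog of 16 are illustrative assumptions):

```python
# Ask the TCP/IP stack to queue up to `backlog` connection requests that
# have arrived but have not yet been accepted by the server.
import socket

def make_listener(port=8080, backlog=16):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(backlog)   # backlog = maximum queued, not-yet-accepted connections
    return srv
```

Because of the backlog, several clients can connect ‘successfully’ before the server has accepted any of them; the server then accepts them one at a time from the queue.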
The second thing that typically happens in a web server is that once the connection request has been accepted, the main server process hands the request (and the socket) off to a sub process. This enables the main server process to go back to listening for connection requests as quickly as possible whilst allowing multiple client requests for resources to be handled concurrently. If your processing environment supports multiprocessing then you can implement this type of architecture. On z/OS you could even do it (albeit subject to some serious limitations) using batch jobs: one job runs some code to act as a ‘listener’ and when it receives a connection request, it accepts it and then submits a batch job that executes asynchronously to the main job, takes the socket and handles the request.
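A sketch of this hand-off architecture in Python, with threads standing in for the sub processes (or z/OS batch jobs) just described. The port, backlog and canned ‘ok’ response are my own assumptions:

```python
# The main loop only accepts connections and hands each socket to a worker,
# so it can get back to listening as quickly as possible.
import socket
import threading

def handle_request(conn):
    """Worker: takes the handed-off socket and services one request."""
    try:
        conn.recv(4096)  # read the request
        conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok")
    finally:
        conn.close()

def serve(port=8080, max_requests=4):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(16)
    workers = []
    for _ in range(max_requests):
        conn, _addr = srv.accept()      # accept, hand off, loop straight back
        w = threading.Thread(target=handle_request, args=(conn,))
        w.start()
        workers.append(w)
    for w in workers:                   # tidy up (a real server would run forever)
        w.join()
    srv.close()
```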
However, as I have found, you do not HAVE to offload the requests to an asynchronous process. There is nothing wrong with handling each request in turn. You just have to make sure that:
- You tell TCP/IP to use a big enough connection queue so that you do not lose requests.
- You don’t take too long processing each request so that when multiple requests arrive simultaneously from a browser, the ones that get processed last do not time out.
What’s a big enough connection queue?
If you are just handling a single client instance (like my ISPF web front end) then usually you can limit the number of concurrent sessions the browser will start through the browser options. If you are trying to create a multi-user web server then you need to decide the maximum number of concurrent clients you want to support.
How do you keep response time low?
On z/OS, REXX is a great development and prototyping language, but by default, every time you call a new exec REXX checks to see whether it has changed on disk, which can cause a lot of additional I/O. REXX also has to create a new REXX environment for the called exec and destroy it afterwards, which leads to additional CPU usage. If you have a processing environment that can optimize REXX execution then traditional REXX programming can work well. However, if you are programming in REXX in a normal TSO/ISPF environment (like I am for my ISPF web front end) then it is sometimes more efficient to embed code directly within an exec, even if this means duplicating code in multiple execs, rather than modularizing everything.