In this project you will create your own Web server. The server will implement a subset of the HTTP protocol only, but it will be functional enough to be used for simple browsing using your favorite Web browser.
The server must be written in C or C++.
When done, please use
~cs352/bin/turnin Project3
to turn in your project. Make sure you include all relevant files.
HTTP stands for Hypertext Transfer Protocol. It is used for transferring most of the files over the Web. This includes text files, PDF documents, images in different formats etc. In principle the objects transferred (called resources) can be anything, including dynamically generated content that is produced on the fly by a program or script. In this project we will transfer existing files only.
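The HTTP protocol is an example of client-server communication. At the highest level it works as follows: the client opens a connection to the server and sends a request message, and the server replies with a response message.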
An HTTP message (either request or response) is a text string consisting of a header and a body. The header has the following format:
INITIAL LINE
HEADER1: VALUE1
HEADER2: VALUE2
...
HEADERn: VALUEn
The initial line consists of 3 words: the method word, a resource, and the protocol being used, which includes the protocol version. It is terminated by a CRLF. A CRLF is a carriage return character (1 byte, ASCII code 13 decimal), followed by a linefeed character (1 byte, ASCII code 10 decimal).
The remainder of the header contains information about the message.
Each additional line consists of a word referring to a property,
followed by a colon, followed by whitespace, followed by the value for
that property. Property values can contain spaces. An example
property of a request is the User-Agent property,
whose value includes the program name and version of the client making
the request. An example of a property of a response is the
Content-type property, which describes the type of the
content of the server response (e.g., text/plain,
image/png, video/mpeg etc). A
property that can occur in either the request or response is the
Content-length property whose value is the length
in bytes of the message body (if any).
An example request is
GET /A/B/C/file.html HTTP/1.1
Host: www.cs.iastate.edu
User-Agent: mywebclient/1.0
[empty line here]
In this example the client request consists of the
GET command, which is used to retrieve the
resource
http://www.cs.iastate.edu/A/B/C/file.html. This
corresponds to a file named file.html which
resides at path
A/B/C within the web server's file space. The
client also specifies that this request follows the version 1.1 HTTP
protocol specification. The remaining header lines contain additional
information about the request. In this project we will ignore all
request header lines sent by the client except the initial one.
GET is the only command that you will implement
for this project.
The request body is optional and is separated from the header by an empty line (just a CRLF with no text). In the previous example the body is empty.
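The checks on the initial request line can be sketched in C as follows. This is only an illustration: the function name `parse_request_line`, the helper `is_http_version`, the buffer sizes, and the choice to return the status code directly are assumptions, not part of the project's starter code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if v has the form HTTP/<digits>.<digits>, 0 otherwise. */
static int is_http_version(const char *v)
{
    if (strncmp(v, "HTTP/", 5) != 0)
        return 0;
    v += 5;
    if (!isdigit((unsigned char)*v))
        return 0;
    while (isdigit((unsigned char)*v))
        v++;
    if (*v != '.')
        return 0;
    v++;
    if (!isdigit((unsigned char)*v))
        return 0;
    while (isdigit((unsigned char)*v))
        v++;
    return *v == '\0';
}

/* Hypothetical helper: parse "METHOD RESOURCE HTTP/x.y" (CRLF optional).
 * Returns 200 for a well-formed GET (resource is copied out),
 * 400 for a malformed line, 501 for any method other than GET. */
int parse_request_line(const char *line, char *resource, size_t resource_size)
{
    char method[16], target[1024], version[16], extra[2];

    assert(line != NULL && resource != NULL && resource_size > 0);

    /* Exactly three whitespace-separated words are expected. */
    if (sscanf(line, "%15s %1023s %15s %1s", method, target, version, extra) != 3)
        return 400;
    if (!is_http_version(version))
        return 400;
    if (strcmp(method, "GET") != 0)
        return 501;
    /* A resource too long for the caller's buffer is treated as malformed here. */
    if (strlen(target) >= resource_size)
        return 400;
    strcpy(resource, target);
    return 200;
}
```

With the example request above, `parse_request_line` would return 200 and copy `/A/B/C/file.html` into `resource`.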
After the request is received, the server sends back a response. The response follows the same template, i.e., it consists of a header and a body, with the header consisting of an initial line, a number of lines in the same format as in the request, and an empty line that acts as a terminator.
An example response is
HTTP/1.1 200 OK
Date: Thu, 05 Apr 2007 05:09:59 GMT
Server: Apache/2.0.40 (Red Hat Linux)
Last-Modified: Thu, 26 Jan 2006 08:44:44 GMT
Content-Length: 5806
Connection: close
Content-Type: text/html; charset=ISO-8859-1
[empty line here]
[body containing a 5806-byte HTML document]
In this example the initial line indicates success. The first word on this line is the HTTP version that this response adheres to. The second word is a numerical response code meant to be easily parsable by the client. The remaining words are a human-readable message explaining the response, and may vary from server to server. Depending on their numeric value, response codes can be classified as follows:
1xx: informational message
2xx: success
3xx: redirection
4xx: error on the client's part
5xx: error on the server's part
The response codes we'll be using in this project for our server are:
200 OK
The request succeeded. If it was a GET,
the requested resource is returned in the response body.
400 Bad Request
The request was malformed in some way. For example, the initial line does not contain 3 words, or the 3rd word is not of the format HTTP/x.x.
403 Forbidden
The requested file was found but cannot be returned due to insufficient access credentials. In your server this can happen if the client requests a file that is unreadable by the server (for example, because of the file permissions).
404 Not Found
The requested file was not found.
500 Internal Server Error
There was an unexpected error with the internal operation of the server. This can happen for example if your server cannot allocate enough memory to satisfy the request or a thread could not be created to service the request (in a multithreaded server).
501 Not Implemented
The method in the request is not implemented. You should
return this for any HTTP command other than
GET.
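One possible way to keep the response codes and their human-readable messages together is a small lookup function. The names `reason_phrase` and `status_class` below are hypothetical, and the "Unknown" fallback is an assumption made for this sketch.

```c
#include <assert.h>

/* Hypothetical helper: the message that follows the numeric code
 * on the initial line of a response. */
const char *reason_phrase(int code)
{
    switch (code) {
    case 200: return "OK";
    case 400: return "Bad Request";
    case 403: return "Forbidden";
    case 404: return "Not Found";
    case 500: return "Internal Server Error";
    case 501: return "Not Implemented";
    default:  return "Unknown";   /* not one of the codes used in this project */
    }
}

/* Hypothetical helper: the class of a code is its first digit,
 * e.g. 4 for client errors and 5 for server errors. */
int status_class(int code)
{
    assert(code >= 100 && code <= 599);
    return code / 100;
}
```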
The remaining lines in the example response above contain information about the message and server.
To connect to a web server, both the client and the server must create a socket. In this project you will have to familiarize yourselves with socket programming. Sockets are software constructs that allow two processes to communicate, either on the same machine or across the Internet. They were described in class; here are the notes we used in PDF format.
You may use the following C++ code as a starting point for Project 3. It shows how to establish a TCP/IP connection between two processes (a client and server). Once the connection is established, data is sent across the connection by writing to and reading from file descriptors.
The system calls that you will use for this project are the following. Note that you might not need to use all of them, depending on your implementation. Also note that the list is not exhaustive, i.e., you may use other system calls as well if you like.
socket()
bind()
listen()
accept()
connect()
read()
write()
close()
fdopen()
getsockname()
gethostname()
gethostbyaddr()
ntohs(), htons()
localtime()
inet_ntoa()
The server that you'll implement for this project has a simple structure that can be summarized as follows: Continuously wait for an incoming connection, accept it (if the client is not listed in the "forbidden" list, see below), service it (if possible), disconnect, and go back to waiting for another connection. In more detail the steps of your server are as follows, together with relevant system calls that you can use to implement them:
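1. Create a socket (socket()).
2. Bind the socket to the port the server listens on (bind()).
3. Set the limit on the queue of pending connections (listen()). For a single-threaded server a queue of size 1 is enough. If you choose to implement the optional multi-threaded server you may use a limit of 5.
4. Wait for and accept an incoming connection (accept() system call). This creates a new socket that can be used for communication with the client.
5. If the client is on the forbidden list, reject the connection (close()).
6. Read the request from the client (read()).
7. Write the response to the client (write()).
8. Disconnect from the client (close() to delete the new socket), and go back to step 4.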
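The socket setup and the accept/service/disconnect cycle described above might look roughly like this in C. It is a minimal sketch, not the starter code: `make_listener` and `serve_one` are hypothetical names, and setting `SO_REUSEADDR` is an extra convenience (it lets you restart the server quickly on the same port) that the assignment does not require.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket, bind it to the given port on
 * all interfaces, and start listening. Returns the descriptor or -1. */
int make_listener(unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int yes = 1;
    int fd;

    assert(backlog > 0);

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Optional: allow quick restarts on the same port while debugging. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: accept one client, let the caller-supplied
 * handler read the request and write the response, then disconnect. */
int serve_one(int listen_fd, void (*handle)(int client_fd))
{
    int client_fd = accept(listen_fd, NULL, NULL);

    if (client_fd < 0)
        return -1;
    handle(client_fd);
    close(client_fd);   /* deletes the per-connection socket */
    return 0;
}
```

A single-threaded server would call `make_listener(port, 1)` once and then call `serve_one` in an endless loop.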
Note: The accept() system call gives you a new
file descriptor to communicate with the client. File descriptor I/O
is generally considered low-level because one uses read()
and write(), which lack the advanced formatting facilities
of fprintf() or the input facilities of
fscanf(). To use fprintf() or
fscanf() one must obtain a FILE pointer (the
first argument of fprintf() is a "FILE *").
This conversion can be easily accomplished through the
fdopen() call. For details on how to use it see the man
page of fdopen().
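As a small illustration of the conversion, the sketch below wraps a descriptor with fdopen() and writes the initial response line with fprintf(). The function name `send_status_line` is hypothetical. Note that fclose() on the resulting FILE pointer also closes the underlying descriptor, so this sketch treats the descriptor as handed over to the function.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: send "HTTP/1.1 <code> <reason>" followed by CRLF
 * over fd using stdio. Takes ownership of fd: it is closed on return. */
int send_status_line(int fd, int code, const char *reason)
{
    FILE *out;

    assert(fd >= 0 && reason != NULL);

    out = fdopen(fd, "w");
    if (out == NULL) {
        close(fd);          /* fdopen failed, so close the descriptor ourselves */
        return -1;
    }
    if (fprintf(out, "HTTP/1.1 %d %s\r\n", code, reason) < 0) {
        fclose(out);
        return -1;
    }
    /* fclose() flushes the buffered output and closes fd. */
    return fclose(out) == 0 ? 0 : -1;
}
```

If you need both a reading and a writing FILE pointer for the same connection, one common approach is to duplicate the descriptor with dup() first, so that closing one stream does not close the other.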
5 points: makefile
Create a functional makefile. Name your Web server executable
webserv and make sure that typing
make will build it.
Create a file named README that describes the
functionality of each of your source files and each function within
them. You may use any number of source files and/or functions but you
must describe them in the
README file in sufficient detail so that a
technical person (i.e., another programmer) can understand your code.
40 points: GET command
Implement the GET command of the HTTP protocol.
This includes accepting an incoming connection, reading the request
line and the entire header (up to the first empty line), returning the
file requested in the request line, and closing the connection. If
the file does not exist, you must return the appropriate error message
as described above. If it exists but cannot be opened for reading you
must return a "forbidden" response as described above. You can
determine the cause of a failed
open() call by examining the system-defined
variable errno; see "man
errno" and "man 2 open".
When responding to a request, your server should return at least the following header lines:
"Server: webserv/1.0"
"Connection: close"
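A response header containing these lines could be assembled as in the following sketch. The function name `format_response_header` is hypothetical, and the Content-Length line is an optional extra that the assignment does not require.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write a complete response header, including the
 * terminating empty line, into buf. Returns the number of bytes written,
 * or -1 if buf is too small. */
int format_response_header(char *buf, size_t size, int code,
                           const char *reason, long content_length)
{
    int n;

    assert(buf != NULL && reason != NULL && content_length >= 0);

    n = snprintf(buf, size,
                 "HTTP/1.1 %d %s\r\n"
                 "Server: webserv/1.0\r\n"
                 "Connection: close\r\n"
                 "Content-Length: %ld\r\n"   /* optional, not required here */
                 "\r\n",
                 code, reason, content_length);
    return (n >= 0 && (size_t)n < size) ? n : -1;
}
```

The returned length can be passed directly to write() before the body of the file is sent.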
The web server executable should take two arguments: (1) A
directory name under which the web server's files and
sub-directories are stored. All file names requested using the
GET command are relative to this directory. (2)
The port that the web server listens on. Note that ports
below 1024 are privileged and cannot be used (you'll get a
"permission denied" message if you try to use one of them).
20 points: Connection logging
Log each incoming connection in a log file named
webserv.log. Each connection entry should occupy
one line, and must be in the format of the following example:
Connection from host 129.186.67.3 (pyrite-m.cs.iastate.edu), port 47282 on Sun, 01 Apr 2007 01:24:18 PM, file "index.html", status 200 OK
You can see an example of how to produce the current date and time in a string in this example C program.
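One way to produce a log line in this format is to combine strftime() with snprintf(), as in the sketch below. The function name `format_log_entry` and its parameter list are hypothetical; the strftime() format string was chosen to match the example entry above.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: format one log entry (with trailing newline) into
 * buf. Returns the number of bytes written, or -1 if it does not fit. */
int format_log_entry(char *buf, size_t size, const char *ip, const char *host,
                     int port, const struct tm *when, const char *file,
                     int code, const char *reason)
{
    char stamp[64];
    int n;

    assert(buf != NULL && ip != NULL && host != NULL);
    assert(when != NULL && file != NULL && reason != NULL);

    /* Produces e.g. "Sun, 01 Apr 2007 01:24:18 PM" in the default locale. */
    if (strftime(stamp, sizeof stamp, "%a, %d %b %Y %I:%M:%S %p", when) == 0)
        return -1;

    n = snprintf(buf, size,
                 "Connection from host %s (%s), port %d on %s, "
                 "file \"%s\", status %d %s\n",
                 ip, host, port, stamp, file, code, reason);
    return (n >= 0 && (size_t)n < size) ? n : -1;
}
```

The `struct tm` argument would typically come from calling time() and then localtime() when the connection is accepted.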
25 points: Reject forbidden addresses
Read a list of addresses from the file
forbidden.txt in the server's directory. This file
contains the human-readable addresses of hosts whose connections must be rejected.
The format of the file is one forbidden address per line. Any request
from these hosts, or any connection from a host that does not resolve
to a human-readable address (and therefore cannot be checked against
the forbidden list) should be rejected. This can be done by comparing
each of the forbidden addresses to the human-readable host name of
each incoming connection request. For an example of how to obtain the
human-readable Internet address of an incoming connection see the tcp-server.cc program.
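The comparison against the forbidden list might be sketched as follows. The function name `is_forbidden` is hypothetical; the host name is assumed to have been obtained already (for instance with gethostbyaddr()), with NULL standing for a host that did not resolve. Comparing names case-insensitively is an assumption of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: returns 1 if the connection must be rejected,
 * 0 if it may be serviced. list is an open stream on forbidden.txt. */
int is_forbidden(FILE *list, const char *hostname)
{
    char line[256];

    assert(list != NULL);

    /* A host that did not resolve cannot be checked, so it is rejected. */
    if (hostname == NULL || hostname[0] == '\0')
        return 1;

    rewind(list);
    while (fgets(line, sizeof line, list) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';     /* strip the line ending */
        if (line[0] != '\0' && strcasecmp(line, hostname) == 0)
            return 1;
    }
    return 0;
}
```

Rereading the file on every connection keeps the sketch short; loading the list into memory once at startup would work equally well.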
30 points: Multiple concurrent connections and multithreaded server
Implement a multithreaded server that can service multiple connections simultaneously. Of course, this requires the use of appropriate mechanisms for concurrent access to shared data e.g., the connection log (remember Project 2?).
The general structure of the main thread will now be:
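1. Wait for an incoming connection using accept().
2. Create a new thread to service the connection.
3. Go back to waiting for the next connection using accept().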
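A rough sketch of the thread-per-connection idea with a mutex-protected log is shown below, assuming POSIX threads. The names `struct job`, `log_entry`, `service_connection`, and `spawn_worker` are hypothetical, and the worker here only writes its log entry; a real worker would first read the request and send the response.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* One lock shared by all threads that write to the connection log. */
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical per-connection data handed to each worker thread. */
struct job {
    FILE *log;
    const char *entry;
};

/* Append one entry to the shared log. The mutex keeps the write and the
 * flush of one entry from interleaving with those of another thread. */
int log_entry(FILE *log, const char *entry)
{
    int rc;

    assert(log != NULL && entry != NULL);

    pthread_mutex_lock(&log_lock);
    rc = (fputs(entry, log) >= 0 && fflush(log) == 0) ? 0 : -1;
    pthread_mutex_unlock(&log_lock);
    return rc;
}

/* Thread body: service the connection, then log it. */
void *service_connection(void *arg)
{
    struct job *job = arg;

    log_entry(job->log, job->entry);
    return NULL;
}

/* Start a worker for one accepted connection. On failure the main
 * thread could answer the client with 500 Internal Server Error. */
int spawn_worker(pthread_t *tid, struct job *job)
{
    assert(tid != NULL && job != NULL);
    return pthread_create(tid, NULL, service_connection, job) == 0 ? 0 : -1;
}
```

Because the main thread returns to accept() right away and never joins its workers, a real server would probably also call pthread_detach() on each new thread so that its resources are released when it finishes.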
If you choose to implement this extra feature, make sure you include a
SEPARATE EMPTY FILE called EXTRACREDIT in the same
directory as your code. The TA MAY NOT go through your source files
to try to understand if you did implement the extra feature, so it is
very important that you indicate this fact by the existence of an
EXTRACREDIT file.
To test your web server you can use a standard browser e.g., Firefox.
You should be able to load a regular page containing text and images.
To debug the server the browser alone is probably insufficient as you
cannot see the actual messages between the client and the server. A
better way for debugging is to connect to the server using the
telnet program. Telnet takes
an address and optionally the port on that host to connect to. Here's
an example exchange with a web server using
telnet.
You can use pyrite to develop and compile your
server. However, pyrite is firewalled and
disallows connections to arbitrary ports from any host other than
itself. Better choices for testing your server are the lab machines
lin141a through lin141t which
are not firewalled among themselves so you can connect to any
listening port between any two of them. Note that these machines are
accessible remotely only by logging in to
pyrite first and using ssh to
connect to them.