Overview of HTTP
When you use your browser to look at a web page, quite a bit happens under the hood to make the page appear. All the files needed to display the page must be transferred from the web server to your computer. These files include the HTML page itself and all the images (and other media) that are displayed. The means by which these files are transferred from the server to your computer is HTTP.
HTTP is the protocol of the web. It is the way in which web servers and web clients (usually browsers) communicate with one another. HTTP stands for Hypertext Transfer Protocol. ‘Hypertext transfer’ means that hypertext files are transferred from web servers to their clients. Nowadays, HTTP allows for the transfer of many other types of files, but it started out as a way to browse the web by transferring HTML documents. The files that are transferred with HTTP are referred to as resources.
But what is a protocol? I like to use the analogy of the US Mail protocol. In order to successfully send a letter to someone, you must follow a protocol. You must put the message in an envelope, and the envelope must include the proper information, in the proper format. First comes the recipient’s name, then the address (which is a house number, street name, city, state abbreviation, and zip code). The return address goes in the upper left corner of the envelope. If you don’t follow this protocol, your message may not reach the recipient. In that case, hopefully the return address is properly formatted, so the letter comes back to you and you can fix the errors and resend it.
In order to request a web page from a web server, you must have the name of the web server AND the path to the resource within the web server. You could potentially replace the server name with an IP address (but that would lead us into a discussion about DNS resolution, so we’ll stick to using a name). The complete address for a resource is called a URL (Uniform Resource Locator). Here’s an example:
There are three main parts to a URL:
- protocol (other protocols could be https:// or ftp://)
- domain name (aka ‘host name’ or ‘server name’)
- path (the path to the resource within the web server)
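If you'd like to see these three parts pulled apart by a program, here is a minimal sketch that uses the urlsplit function from Python's standard urllib.parse module. The URL is hypothetical. The domain name www.acme.com is the one used in this article's request example, and the path /products/anvils.html is made up. Python calls the protocol the 'scheme' and reports it without the trailing ://.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only for illustration.
url = "http://www.acme.com/products/anvils.html"

parts = urlsplit(url)
print(parts.scheme)  # protocol (Python calls it the scheme): http
print(parts.netloc)  # domain name: www.acme.com
print(parts.path)    # path within the web server: /products/anvils.html
```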
Note that the protocol and domain name together point to the document root directory of the web server. This is the folder that contains the web pages (and other files) for a website. If you use a URL that does not include the path portion, the web server will respond by sending you the default document in its root directory, which is the site’s home page. The default document is traditionally named index.html. The important thing to note is that the third part of the URL (the path) is relative to the document root directory of the web server.
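The idea that the path is relative to the document root can be sketched in a few lines of Python. This is a simplified illustration, not how any particular web server is implemented. The document root /var/www/acme is a hypothetical location, and index.html serves as the default document, as described above.

```python
from pathlib import PurePosixPath

# Hypothetical document root; real web servers let you configure this.
DOCUMENT_ROOT = PurePosixPath("/var/www/acme")
DEFAULT_DOCUMENT = "index.html"


def resolve(url_path: str) -> PurePosixPath:
    """Map the path portion of a URL to a file under the document root."""
    relative = url_path.lstrip("/")  # the URL path is relative to the document root
    if relative == "":
        relative = DEFAULT_DOCUMENT  # no path given: serve the default document
    return DOCUMENT_ROOT / relative


print(resolve(""))                       # /var/www/acme/index.html
print(resolve("/"))                      # /var/www/acme/index.html
print(resolve("/products/anvils.html"))  # /var/www/acme/products/anvils.html
```

A real web server does much more than this. For example, it must reject paths such as /../secret that would escape the document root.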
Requests and Responses
When you type a URL into the address bar of your browser, the browser creates and sends an HTTP request. The request reaches the proper web server through the magic of DNS, which translates the domain name into the server’s IP address (the details are beyond the scope of this article). The server then replies with a response that includes the file (or resource) specified in the request.
Like a letter sent under the US Mail protocol, the request must follow a specific format. Here’s what a typical request looks like:
The first line of the request has three pieces of information. The first piece is the type of request (also called the request method), in this case GET (more about request types soon). The second piece is the path portion of the URL, and the third specifies the version of HTTP that the request is using (in this case, version 1.1).
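Because the three pieces of the request line are separated by single spaces, pulling them apart is straightforward. Here's a minimal Python sketch. The path /products/anvils.html is hypothetical, and a production server would validate each piece rather than trust it.

```python
def parse_request_line(line: str) -> tuple[str, str, str]:
    """Split a request line such as 'GET /path HTTP/1.1' into its three pieces."""
    # The pieces are separated by single spaces; unpacking raises ValueError
    # if the line does not split into exactly three pieces.
    method, path, version = line.split(" ")
    return method, path, version


# Hypothetical path, used only for illustration.
method, path, version = parse_request_line("GET /products/anvils.html HTTP/1.1")
print(method)   # GET
print(path)     # /products/anvils.html
print(version)  # HTTP/1.1
```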
The second line of the request specifies the domain portion of the URL (recall that the domain name is also known as the ‘host’ name). Notice the format of this line. It starts with Host, followed by a colon, and then the domain name. Lines that follow this format are known as headers. In this case the header name is ‘Host’ and the header value is ‘www.acme.com’. Headers can be used to send additional information about the request to the web server. For example, the third line is the User-Agent header, which tells the web server about the browser that is making the request. In this case the ‘user agent’ is Mozilla Firefox running on Windows. The fourth line of the request tells the web server that the browser is configured to use the English language. We won’t get into the last two headers in this request, but note that there are many request headers that can be sent to a web server. When we discuss the response that comes back from the web server, we’ll see that it can also contain headers.
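Putting the request line and headers together, here is a minimal Python sketch that assembles the text of a GET request like the one described above. The build_request helper, the path, and the exact User-Agent string are hypothetical. Accept-Language is the standard header a browser uses to state its preferred language. One detail that is easy to miss: HTTP/1.1 ends each line with a carriage return and line feed (\r\n), and an empty line marks the end of the headers.

```python
def build_request(method: str, path: str, headers: dict[str, str]) -> str:
    """Assemble the text of an HTTP/1.1 request that has no body."""
    lines = [f"{method} {path} HTTP/1.1"]                             # request line
    lines += [f"{name}: {value}" for name, value in headers.items()]  # 'Name: value'
    # Each line ends with CRLF; an empty line closes the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_request(
    "GET",
    "/products/anvils.html",  # hypothetical path
    {
        "Host": "www.acme.com",
        # Hypothetical User-Agent string for Firefox on Windows.
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0",
        "Accept-Language": "en-US",
    },
)
print(request)
```

Header names in HTTP are case-insensitive (host and Host mean the same thing), so code that reads headers should not rely on exact capitalization.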
When a web server receives a request from a client, it will look for the resource that matches the path in the request, and then it will send that resource (usually a file) to the client.
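The lookup step can be pictured with a toy Python sketch in which a dictionary stands in for the files on the server. Everything here is hypothetical, including the RESOURCES dictionary, the handle function, and the page contents. The sketch simply shows the two possible outcomes: either the resource is found and sent back, or it is not found.

```python
# Hypothetical in-memory site: each URL path maps to the bytes of a file.
RESOURCES = {
    "/index.html": b"<html><body>Welcome to Acme!</body></html>",
    "/products/anvils.html": b"<html><body>Anvils for sale</body></html>",
}


def handle(path: str) -> tuple[int, bytes]:
    """Return (status code, body) for a requested path."""
    if path == "/":
        path = "/index.html"  # no file named: fall back to the default document
    body = RESOURCES.get(path)
    if body is None:
        return 404, b"<html><body>Not Found</body></html>"  # no matching resource
    return 200, body  # found: send the resource back to the client


print(handle("/")[0])              # 200
print(handle("/missing.html")[0])  # 404
```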
Here’s a simple HTTP response that a web server may send to a browser:
The first line of the response shows the version of HTTP being used, followed by a status code and a status message. In this case, 200 means that the web server was able to successfully respond to the request, and ‘OK’ is the status message that goes with a code of 200. If the request had contained a path that did not exist on the web server, the status code in the response would be 404, and the status message would typically be ‘Not Found’.
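Python's standard library records the standard status codes and their messages (called reason phrases) in the http.HTTPStatus enumeration. That makes it easy to sketch how a server might build the first line of a response. The status_line helper is hypothetical.

```python
from http import HTTPStatus


def status_line(code: int, version: str = "HTTP/1.1") -> str:
    """Build the first line of a response, e.g. 'HTTP/1.1 200 OK'."""
    status = HTTPStatus(code)  # raises ValueError for codes it doesn't know
    return f"{version} {status.value} {status.phrase}"


print(status_line(200))  # HTTP/1.1 200 OK
print(status_line(404))  # HTTP/1.1 404 Not Found
```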
The next six lines in the response are headers that offer information about the web server and the resource that is being returned to the client. We saw earlier that a request can include headers that provide additional information about the request; headers can also be sent in the response. The Content-Length header indicates that the resource being sent back to the client is 88 bytes long (Content-Length is always measured in bytes), and the Content-Type header indicates that the resource is an HTML document.
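Because Content-Length counts bytes rather than characters, it has to be computed from the encoded body. This minimal Python sketch uses a hypothetical build_response helper to build a small response for a page containing the character é. That character takes two bytes in UTF-8, so the body is 11 characters but 12 bytes.

```python
def build_response(body_text: str) -> bytes:
    """Build a 200 OK response whose body is the given HTML text."""
    body = body_text.encode("utf-8")  # the headers describe these encoded bytes
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"  # a count of bytes, not kilobytes
        "\r\n"                              # blank line: end of the headers
    )
    return head.encode("ascii") + body


html = "<p>Café</p>"
response = build_response(html)
print(len(html))                              # 11 characters
print(b"Content-Length: 12\r\n" in response)  # True: é takes two bytes in UTF-8
```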
After the headers there is a blank line, which separates the response headers from the body of the response. The body of the response is the resource that was requested, in this case an HTML file. The browser reads the response body and renders the HTML to display the page. Remember that a ‘resource’ is often a file (such as an .html, .pdf, or .jpg file), but it doesn’t have to be a file; it can be something like the data returned by running a database query.
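On the receiving side, a client finds the end of the headers by looking for that blank line. Here is a minimal Python sketch, using a hypothetical split_response helper, that splits a raw response into its status line, headers, and body. It assumes a simple, complete response. A real client also has to handle details such as chunked transfer encoding and responses that arrive in pieces.

```python
def split_response(raw: bytes) -> tuple[str, dict[str, str], bytes]:
    """Split a raw HTTP/1.1 response into status line, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # the first blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split at the first colon only, since header values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return status_line, headers, body


body = b"<p>Hello from Acme!</p>"
raw = (
    b"HTTP/1.1 200 OK\r\n"
    + b"Content-Type: text/html\r\n"
    + b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"
    + b"\r\n"
    + body
)

status, headers, content = split_response(raw)
print(status)                                          # HTTP/1.1 200 OK
print(headers["Content-Type"])                         # text/html
print(int(headers["Content-Length"]) == len(content))  # True
```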
In summary, clients (such as your browser) communicate with web servers by sending HTTP requests, and web servers send the requested data back to the client in HTTP responses. HTTP has emerged as one of the dominant ways for computers to communicate with one another. As you now know, every time a page is loaded in a browser, it’s done by using HTTP. But HTTP has also become a standard way for businesses to share data. If you’ve ever heard the term API, it most likely refers to a web service that uses HTTP to respond to requests for information.
In the next article, we’ll dig a little deeper into HTTP requests and responses.