HTTP
- HTTP stands for Hypertext Transfer Protocol.
- The network protocol used to deliver virtually all files and other data (collectively called resources) on the World Wide Web, whether they’re HTML files, image files, query results, or anything else.
- A browser is an HTTP client because it sends requests to an HTTP server (Web server), which then sends responses back to the client.
- The standard (and default) port for HTTP servers to listen on is 80.
Resources:
HTTP is used to transmit resources, not just files. A resource is some chunk of information that can be identified by a URL (it’s the R in URL). The most common kind of resource is a file, but a resource may also be a dynamically generated query result, the output of a CGI script, a document that is available in several languages, or something else.
HTTP vs. HTTPS
HTTP, or hypertext transfer protocol, is the way a Web server communicates with browsers like Internet Explorer® and Mozilla Firefox®. HTTP lets visitors view a site and send information back to the Web server.
HTTPS, hypertext transfer protocol secure, is HTTP through a secured connection. Communications with an HTTPS server are encrypted using SSL (Secure Sockets Layer) or its successor, TLS, and the server identifies itself with an SSL certificate. The encryption prevents third parties from eavesdropping on communications to and from the server.
Note: Only servers that have their own SSL certificate can create HTTPS connections. A site’s visitor cannot encrypt the connection on their own.
Structure of HTTP transactions
- Client-server model
- An HTTP client opens a connection and sends a request message to an HTTP server; the server then returns a response message.
- Once the response is delivered, the server closes the connection (the HTTP/1.0 behavior). HTTP is a stateless protocol: the server keeps no information about a client between requests.
- The formats of the request and response messages are similar. Each consists of:
- an initial line,
- zero or more header lines,
- a blank line (i.e. a CRLF by itself), and
- an optional message body (e.g. a file, or query data, or query output).
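The four-part layout above can be illustrated with a minimal Python sketch. The function name `split_http_message` and the sample message are assumptions made for illustration; the sketch ignores details such as repeated or folded header lines.

```python
def split_http_message(raw: bytes):
    """Split a raw HTTP message into (initial_line, headers, body)."""
    # The first blank line (a CRLF by itself) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    initial_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Each header line has the form "Name: value".
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return initial_line, headers, body


# Hypothetical sample message, used only to exercise the function.
message = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<html></html>"
)
initial_line, headers, body = split_http_message(message)
print(initial_line)
print(headers)
print(body)
```

Header names are lowercased here because HTTP header names are case-insensitive; a real parser would also need to cope with malformed input.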
Initial Request Line:
The request line has three parts, separated by spaces: the method name, the local path of the requested resource, and the version of HTTP being used.
Example:
GET /path/to/file/index.html HTTP/1.0
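As a rough illustration, the three parts of the request line can be pulled apart in Python as follows. `RequestLine` and `parse_request_line` are hypothetical names chosen for this sketch, not part of any library.

```python
from typing import NamedTuple


class RequestLine(NamedTuple):
    method: str
    path: str
    version: str


def parse_request_line(line: str) -> RequestLine:
    """Split a request line into method, path, and HTTP version."""
    parts = line.strip().split(" ")
    if len(parts) != 3:
        raise ValueError(f"expected 3 space-separated parts, got {len(parts)}")
    return RequestLine(*parts)


request = parse_request_line("GET /path/to/file/index.html HTTP/1.0")
print(request.method, request.path, request.version)
```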
Initial Response Line (Status Line):
- Has three parts separated by spaces
- The HTTP version, a response status code that gives the result of the request, and an English reason phrase describing the status code
Example:
HTTP/1.0 200 OK
The status code is a three-digit integer, and the first digit identifies the general category of response:
1xx indicates an informational message only
2xx indicates success of some kind
3xx redirects the client to another URL
4xx indicates an error on the client’s part
5xx indicates an error on the server’s part
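The status line and the category rule can be sketched in Python like this. The function name `parse_status_line` and the category labels in `CATEGORIES` are assumptions for illustration; the labels paraphrase the list above.

```python
# Map the first digit of the status code to its general category.
CATEGORIES = {
    "1": "informational",
    "2": "success",
    "3": "redirection",
    "4": "client error",
    "5": "server error",
}


def parse_status_line(line: str):
    """Return (version, code, reason, category) for a status line."""
    # Split at most twice: the reason phrase may itself contain spaces.
    version, code, reason = line.strip().split(" ", 2)
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"status code must be three digits, got {code!r}")
    category = CATEGORIES.get(code[0], "unknown")
    return version, int(code), reason, category


print(parse_status_line("HTTP/1.0 200 OK"))
print(parse_status_line("HTTP/1.0 404 Not Found"))
```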
Header Line:
- Header lines provide information about the request or response, or about the object sent in the message body.
The Message Body:
- An HTTP message may have a body of data sent after the header lines.
Sample HTTP Exchange
To retrieve the file at the URL
http://www.somehost.com/path/file.html
first open a socket to the host www.somehost.com, port 80 (use the default port of 80 because none is specified in the URL). Then, send something like the following through the socket:
GET /path/file.html HTTP/1.0
From: someuser@jmarshall.com
User-Agent: HTTPTool/1.0
[blank line here]
The server should respond with something like the following, sent back through the same socket:
HTTP/1.0 200 OK
Date: Fri, 31 Dec 1999 23:59:59 GMT
Content-Type: text/html
Content-Length: 1354
[blank line here]
<html>
<body>
<h1>Happy New Millennium!</h1>
(more file contents)
  .
  .
  .
</body>
</html>
After sending the response, the server closes the socket.
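The exchange above could be reproduced with a plain socket roughly as follows. This is a minimal sketch: `build_get_request` and `fetch` are hypothetical helper names, www.somehost.com is the placeholder host from the example, and error handling is left out.

```python
import socket
from urllib.parse import urlsplit


def build_get_request(url: str):
    """Return (host, port, request_bytes) for an HTTP/1.0 GET of url."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80  # default port when the URL names none
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request = (
        f"GET {path} HTTP/1.0\r\n"
        "User-Agent: HTTPTool/1.0\r\n"
        "\r\n"  # the blank line that ends the headers
    )
    return host, port, request.encode("ascii")


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Send the request and read until the server closes the socket."""
    host, port, request = build_get_request(url)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # an empty read means the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


host, port, request = build_get_request("http://www.somehost.com/path/file.html")
print(host, port)
print(request.decode("ascii"))
```

Calling `fetch(...)` against a real server returns the raw status line, headers, and body in one byte string. Many present-day servers also expect a `Host:` header, which this HTTP/1.0-style sketch does not send.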
HEAD and POST methods
HEAD method:
- Similar to a GET request, except that it asks the server to return the response headers only, and not the actual resource (i.e. no message body)
- Useful to check characteristics of a resource without actually downloading it, thus saving bandwidth
- The response to a HEAD request must never contain a message body, just the status line and headers
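Seen from the server side, the rule above might look like the following sketch. `build_response` is a hypothetical helper, not a real server API; it always answers 200 OK and exists only to show that a HEAD response carries the same headers as a GET response but no body.

```python
def build_response(method: str, body: bytes, content_type: str = "text/html") -> bytes:
    """Build a 200 OK response; omit the body when the method is HEAD."""
    head = (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        # Content-Length describes the resource even when no body is sent.
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    if method.upper() == "HEAD":
        return head  # status line and headers only
    return head + body


page = b"<html><body>Hello</body></html>"
print(build_response("HEAD", page).decode("ascii"))
print(build_response("GET", page).decode("ascii"))
```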
POST method:
- There’s a block of data sent with the request, in the message body. There are usually extra headers to describe this message body, like Content-Type: and Content-Length:.
- The request URI is not a resource to retrieve; it’s usually a program to handle the data you’re sending.
- The HTTP response is normally program output, not a static file
- The most common use of POST, by far, is to submit HTML form data to CGI scripts.
Example:
POST /path/script.cgi HTTP/1.0
From: frog@jmarshall.com
User-Agent: HTTPTool/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 32
[blank line here]
home=Cosby&favorite+flavor=flies
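A POST request like this one could be assembled in Python as sketched below. `build_post_request` is a hypothetical helper; the form fields are the ones from the example, and `urllib.parse.urlencode` from the standard library produces the `application/x-www-form-urlencoded` body.

```python
from urllib.parse import urlencode


def build_post_request(path: str, fields: dict) -> bytes:
    """Build an HTTP/1.0 POST request carrying URL-encoded form data."""
    # urlencode joins fields with "&" and encodes spaces as "+".
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.0\r\n"
        "User-Agent: HTTPTool/1.0\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        # Content-Length is the byte length of the message body.
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


request = build_post_request(
    "/path/script.cgi", {"home": "Cosby", "favorite flavor": "flies"}
)
print(request.decode("ascii"))
```

Computing `Content-Length` from the encoded body, rather than typing it by hand, keeps the header and the body in agreement.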
HTTP proxies
- A program that acts as an intermediary between a client and a server
- Receives requests from the client and forwards those requests to the intended servers. Responses travel back through the proxy in the same way.
- Proxies are commonly used in firewalls and for LAN-wide caches
Example of a request sent through a proxy (the request line carries the complete URL instead of just the path):
GET http://www.somehost.com/path/file.html HTTP/1.0
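The difference between a direct request and a proxied one comes down to the target in the request line, which this short sketch illustrates. The function name `request_line` and its `via_proxy` flag are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def request_line(url: str, via_proxy: bool) -> str:
    """Return the GET request line for url, direct or through a proxy."""
    if via_proxy:
        # A proxy needs the complete URL to know which server to contact.
        target = url
    else:
        # An origin server only needs the local path (plus any query string).
        parts = urlsplit(url)
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
    return f"GET {target} HTTP/1.0"


url = "http://www.somehost.com/path/file.html"
print(request_line(url, via_proxy=False))
print(request_line(url, via_proxy=True))
```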
HTTPS
- Hypertext Transfer Protocol over Secure Sockets Layer
- URI scheme used to indicate a secure HTTP connection
- Using an https: URL indicates that HTTP is to be used, but with a different default TCP port (443) and an additional encryption/authentication layer between HTTP and TCP
- To accept HTTPS connections, the administrator must create a public key certificate for the Web server, and that certificate must be signed by a certificate authority.
- The system can also be used for client authentication, in order to restrict access to a Web server to only authorized users.
- Client certificates normally contain the name and e-mail address of the authorized user, and are automatically checked by the server on each reconnect to verify the user’s identity, potentially without the user ever entering a password.
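The port and layering points above can be sketched with Python’s standard `ssl` module, which negotiates TLS, the successor to SSL. The names `DEFAULT_PORTS`, `endpoint`, and `open_connection` are hypothetical, and www.somehost.com is a placeholder host.

```python
import socket
import ssl
from urllib.parse import urlsplit

# Default TCP port for each URL scheme.
DEFAULT_PORTS = {"http": 80, "https": 443}


def endpoint(url: str):
    """Return (scheme, host, port), applying the scheme's default port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS[scheme]
    return scheme, parts.hostname, port


def open_connection(url: str, timeout: float = 10.0):
    """Open a TCP connection and, for https, add the encryption layer."""
    scheme, host, port = endpoint(url)
    sock = socket.create_connection((host, port), timeout=timeout)
    if scheme == "https":
        # The default context verifies the server certificate against
        # trusted certificate authorities and checks the host name.
        context = ssl.create_default_context()
        sock = context.wrap_socket(sock, server_hostname=host)
    return sock


print(endpoint("https://www.somehost.com/path/file.html"))
print(endpoint("http://www.somehost.com/path/file.html"))
```

The socket returned by `open_connection` is used the same way in both cases: ordinary HTTP messages are written to and read from it, and the extra layer sits between HTTP and TCP.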
HTTPS limitations
- HTTPS only protects data in transit from eavesdropping and man-in-the-middle attacks. Once data arrives at its destination, it is only as safe as the computer it is on.
- HTTPS gives weaker protection when applied to publicly available static content. An attacker can index the entire site with a web crawler and then infer the URI of an encrypted resource from nothing more than the size of the intercepted request and response. The attacker then holds both the plaintext (the publicly available static content) and the ciphertext (the encrypted version of that content).
- Because SSL operates below HTTP and has no knowledge of higher-level protocols, an SSL server can strictly present only one certificate for a particular IP/port combination. This means that in most cases it is not feasible to use name-based virtual hosting with HTTPS, unless both the client and the server support the TLS Server Name Indication (SNI) extension.