Opening a web page almost seems like magic—typing in a single Uniform Resource Locator (URL) results in images, videos, text, and other graphic elements combined into a page within a web browser. The magic under the hood includes
• The Hypertext Markup Language (HTML) and Cascading Style Sheets (CSS). These elements describe the content and styling of a web page. HTML provides most of the content in modern websites, while CSS provides most of the styling. The web browser renders this information into web pages. A group of web pages is often called a website.
• Images, videos, and other files.
• Apps written in JavaScript and other languages that run within the web browser.
While users tend to think of a web browser as a single application, it acts more like a virtual machine: it can download information, resize and render images, and run applications.
The Hypertext Transfer Protocol (HTTP) carries all the different kinds of information needed to build a web page.
Note
HTTPS adds encryption to HTTP. This book uses HTTP and HTTPS interchangeably; readers should assume all HTTP sessions are encrypted and, therefore, HTTPS.
For instance, if a web browser encounters this snippet of HTML in a file:
it will print out “Hello, World!” in the current position on the web page. If it encounters
<a href=“https://orange.example.com/image-01.png”
the browser will display “Download image 1” and create a link from Download image 1 to the file image-01.png at the location shown in the href attribute. If the browser encounters
<a href=“https://banana.example.com/page2.html”>G
it will display “Go to page 2” and create a link from Go to page 2 to the file indicated in the href attribute. If the user clicks on a link, the browser must resolve the name (using the Domain Name System, or DNS), open a connection with the server, and download the file. If the browser encounters
<img src=“image-02.png” alt=“The second image” wi
the browser will fetch the file image-02.png from the server where it fetched the HTML file and render the image on the screen with a width and height of 500 pixels. If the screen is less than 500 pixels wide, the browser must resample the image, creating a smaller version to display on the user’s screen.
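The way a browser finds the resources a page refers to can be sketched with Python's standard HTML parser. This is only an illustration, not how any real browser is built: the page URL and the markup in `PAGE_HTML` are hypothetical stand-ins modeled loosely on the tags discussed above, and `ResourceCollector` is a name invented for this sketch. Note how the relative name image-02.png is resolved against the location of the HTML file itself.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect the URLs a page refers to, resolved against the page's own URL."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []   # targets of <a href=...>, fetched only if clicked
        self.images = []  # targets of <img src=...>, fetched while rendering

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            self.links.append(urljoin(self.page_url, attributes["href"]))
        elif tag == "img" and "src" in attributes:
            self.images.append(urljoin(self.page_url, attributes["src"]))


# Hypothetical page location and markup, assumed for illustration only.
PAGE_URL = "https://www.example.com/docs/index.html"
PAGE_HTML = """
<p>Hello, World!</p>
<a href="https://orange.example.com/image-01.png">Download image 1</a>
<a href="https://banana.example.com/page2.html">Go to page 2</a>
<img src="image-02.png" alt="The second image" width="500" height="500">
"""

collector = ResourceCollector(PAGE_URL)
collector.feed(PAGE_HTML)
print("links: ", collector.links)
print("images:", collector.images)
```

The two links keep their full URLs because they name their own servers, while the image URL is built from the page's server and directory.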
HTML relies on HTTP to transport all this data; HTTP runs over TCP or QUIC.
TCP and QUIC are network transport services. Why build another protocol on top of them to carry web content specifically? Because HTTP implements a client/server architecture on top of either TCP or QUIC. The web browser acts as a client to the web server, requesting and pushing information based on the user’s actions, so the browser controls the flow of information across the network.
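To make the layering concrete, the sketch below formats the text of a minimal HTTP/1.1 request, which is what a client hands to TCP, and picks apart the first line of a reply. The helper names `build_request` and `parse_status_line` are invented for this sketch, the host and path are borrowed from the example links earlier in the section, and the reply is a canned string rather than real server output. Later HTTP versions, including the one carried over QUIC, express the same requests in a binary framing instead of this text layout.

```python
def build_request(method, host, path, body=b""):
    """Format a minimal HTTP/1.1 request as the bytes given to the transport."""
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
    ]
    if body:
        lines.append(f"Content-Length: {len(body)}")
    head = "\r\n".join(lines) + "\r\n\r\n"  # a blank line ends the headers
    return head.encode("ascii") + body


def parse_status_line(response):
    """Split the first line of a response into (version, status code, reason)."""
    first_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = first_line.split(" ", 2)
    return version, int(code), reason


request = build_request("GET", "banana.example.com", "/page2.html")
print(request.decode("ascii"))

# A canned reply standing in for what a server might send back.
reply = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 0\r\n\r\n"
print(parse_status_line(reply))
```

The transport only sees a stream of bytes; the request line, headers, and status code are the application-level meaning HTTP adds on top.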
The three most important HTTP requests are
• GET, which requests data of some kind from the server.
• HEAD, which requests information about data rather than the data itself. Every piece of data in HTML has a description, such as the length of an audio file, the resolution of a video file, the size of an image, or the language of a piece of text. A HEAD request allows the browser to determine whether this is the correct information (is this in the right language?) and to plan a page layout in advance.
• POST, which pushes information to the server. For instance, when a user selects the Submit button on a website, the browser uses an HTTP POST to send any information in a form to the server for processing.
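The three requests can be tried against a throwaway server using only Python's standard library. This is a minimal sketch under stated assumptions: `DemoHandler`, `send`, and the page contents are invented for illustration, and the server runs on the local machine on a port chosen by the operating system. It shows that HEAD returns the same headers as GET without a body, and that POST carries data from the client to the server.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<p>Hello, World!</p>"


class DemoHandler(BaseHTTPRequestHandler):
    def _send_headers(self, length):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(length))
        self.end_headers()

    def do_GET(self):
        self._send_headers(len(PAGE))
        self.wfile.write(PAGE)

    def do_HEAD(self):
        # Same headers as GET, but no body follows.
        self._send_headers(len(PAGE))

    def do_POST(self):
        size = int(self.headers.get("Content-Length", 0))
        form = self.rfile.read(size)  # the data pushed by the client
        reply = b"received " + form
        self._send_headers(len(reply))
        self.wfile.write(reply)

    def log_message(self, *args):
        pass  # keep the demo quiet


def send(port, method, body=None):
    """Issue one request and return (status, Content-Length header, body)."""
    connection = HTTPConnection("127.0.0.1", port, timeout=5)
    connection.request(method, "/", body=body)
    response = connection.getresponse()
    result = (response.status, response.getheader("Content-Length"), response.read())
    connection.close()
    return result


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: let the OS pick
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

get_result = send(port, "GET")
head_result = send(port, "HEAD")
post_result = send(port, "POST", body=b"name=value")

server.shutdown()
server.server_close()

print("GET: ", get_result)
print("HEAD:", head_result)
print("POST:", post_result)
```

In the HEAD result the Content-Length header still reports the size of the page, but the body is empty, which is what lets a client learn about a resource without downloading it.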
HTTP adds application-specific capabilities to TCP’s or QUIC’s transport service.
File Transfer Protocol
Transferring data organized into files is a big part of what the Internet does. While many protocols, like HTTP, include file transfer to support a specific application, users often want to transfer a file outside any other application’s context.
The File Transfer Protocol (FTP) is designed to transfer large files between hosts. FTP is often implemented as a separate program or daemon, also called FTP.
Figure 15-8 illustrates FTP’s connection process.
Figure 15-8 FTP Connection Operation
FTP is unique because it builds two different connections between the client and server:
• FTP sends and receives commands over the control session.
• FTP transfers data over the data session.
Using a separate control channel allows FTP to terminate a file transfer immediately, rather than waiting for a “terminate” signal embedded in a large file transfer to reach the server.
Figure 15-8 illustrates an active FTP session:
1. The client opens a control session using TCP with the server.
2. In the initial opening messages, the client tells the server which IP address and port number the server should use to open a data session with the client.
3. The server opens a second TCP session using the IP address and port number indicated by the client.
4. The client requests a listing of the files in the server’s current directory using the LIST command. The client can also send commands to change the server’s directory (or folder), among other operations.
5. The server returns a text listing of the current directory.
6. The client asks the server to send a copy of the example.png file using the RETR (retrieve) command.
7. The server sends a copy of the file over the data session.
8. The server notifies the client it has finished transferring the file using the control session.
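Step 2 of the active session can be sketched in code. In classic FTP over IPv4, the client announces its address and port with the PORT command, written as six comma-separated numbers: the four octets of the address followed by the high and low bytes of the port. The function names, the client address 192.0.2.10, and the port 50000 below are assumptions chosen for illustration.

```python
def encode_port_command(ip_address, port):
    """Build the PORT command a client sends in active mode (IPv4 only)."""
    octets = ip_address.split(".")
    if len(octets) != 4 or not 0 < port < 65536:
        raise ValueError("expected a dotted IPv4 address and a 16-bit port")
    high, low = divmod(port, 256)  # the port travels as two separate bytes
    return "PORT " + ",".join(octets + [str(high), str(low)]) + "\r\n"


def decode_port_command(line):
    """Recover (ip_address, port) from a PORT command, as the server would."""
    fields = line.strip().split(" ", 1)[1].split(",")
    ip_address = ".".join(fields[:4])
    port = int(fields[4]) * 256 + int(fields[5])
    return ip_address, port


# Hypothetical client address and listening port.
command = encode_port_command("192.0.2.10", 50000)

# Commands the client would send over the control session, in order.
control_commands = [command, "LIST\r\n", "RETR example.png\r\n"]
for line in control_commands:
    print(repr(line))

print(decode_port_command(command))
```

Only these short command lines cross the control session; the directory listing and the file contents travel over the separate data session the server opens to the decoded address and port.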
FTP can also operate in passive mode, which means the client opens both the control and data sessions. In this case, step 2 in Figure 15-8 is a message from the server to the client, telling the client which IP address and port number to use to open a data session.
The arrow on the line in step 3 of Figure 15-8 is reversed in a passive connection; the client opens the data session instead of the server.
Clients behind a Network Address Translator (NAT) typically use passive FTP. Because the client opens both sessions, the server does not need to know about the address translator modifying the source and destination addresses on packets transmitted from the client to the server.
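In passive mode the roles in step 2 are reversed: the server's reply carries the address and port, and the client connects to them. With classic FTP over IPv4, that reply uses code 227 and the same six-number layout as the PORT command. The sketch below parses such a reply; `parse_pasv_reply` is a name invented here, and the reply text is a made-up example rather than output captured from a real server.

```python
import re


def parse_pasv_reply(reply):
    """Extract (ip_address, port) from a 227 'Entering Passive Mode' reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if not reply.startswith("227") or match is None:
        raise ValueError("not a passive-mode reply: " + repr(reply))
    numbers = [int(group) for group in match.groups()]
    ip_address = ".".join(str(n) for n in numbers[:4])
    port = numbers[4] * 256 + numbers[5]  # high byte, then low byte
    return ip_address, port


# Made-up server reply, for illustration only.
reply = "227 Entering Passive Mode (198,51,100,7,195,149)"
data_address = parse_pasv_reply(reply)
print(data_address)
```

Python's own `ftplib.FTP` client uses passive mode by default and can be switched to active mode with `set_pasv(False)`.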
FTP is no longer widely used because the protocol is not private; neither the control nor data sessions are encrypted to protect transferred information. FTP was extended to create FTPS, which allows clients to request Transport Layer Security (TLS) encryption. If only the data session is encrypted, however, the transferred data is protected, but information transmitted across the control session, such as passwords, is not.
Secure File Transfer Protocol (SFTP) has largely replaced FTP. SFTP runs over SSH and adds FTP-like file transfer capabilities.
Two other file transfer protocols worth knowing are
• Trivial File Transfer Protocol (TFTP), which does not support authentication or encryption, and does not detect or correct errors. TFTP is sometimes used to load a network device’s configuration from a remote server.
• Secure Copy Protocol (SCP), which uses the same methods as SFTP to securely transfer files between two hosts. SCP is often considered more efficient for file transfers between hosts connected to the same network, while SFTP is considered more efficient for transferring files between hosts connected far apart (in network terms).