HTTP: Difference between revisions

Latest revision as of 14:35, 18 November 2021

Basics

HTTP (Hypertext Transfer Protocol; the current version is 1.1) is a protocol with the lightness and speed necessary for a distributed collaborative hypermedia information system. It is a generic, stateless, object-oriented protocol which may be used for many similar tasks, such as name servers and distributed object-oriented systems, by extending the commands, or "methods", used. A feature of HTTP is the negotiation of data representation, allowing systems to be built independently of the development of new advanced representations. The protocol does not attempt to define what types of resources are transferred. The data may be text, sound, full-motion video, or even applications to be executed on the client machine.

In its original form (HTTP/1.0), the protocol allows only a single resource to be transferred during a connection. This means that if a hypertext page has embedded references to other resources (such as images), the client must retrieve each resource individually through separate connections. For example, to construct the Web page of Figure 2.1, the Web browser had to make three connections: one to retrieve the HTML file, another to retrieve the embedded Java applet, and a third to retrieve the Under Construction picture. This shortcoming has often been blamed for the slow response time of the Web. HTTP/1.1 mitigates it with persistent connections, which allow several requests to be sent over one connection.

Most HTTP communication is initiated by a user agent and consists of a request to be applied to a resource on some origin server. In the simplest case, this may be accomplished via a single connection (v) between the user agent (UA) and the origin server (O). A more complicated situation occurs when one or more intermediaries are present in the request/response chain. There are three common forms of intermediary: proxy, gateway, and tunnel. A proxy is a forwarding agent, receiving requests for a URI in its absolute form, rewriting all or part of the message, and forwarding the reformatted request toward the server identified by the URI. A gateway is a receiving agent, acting as a layer above some other server(s) and, if necessary, translating the requests to the underlying server's protocol. A tunnel acts as a relay point between two connections without changing the messages; tunnels are used when the communication needs to pass through an intermediary (such as a firewall) even when the intermediary cannot understand the contents of the messages.

Any party to the communication which is not acting as a tunnel may employ an internal cache for handling requests. The effect of a cache is that the request/response chain is shortened if one of the participants along the chain has a cached response applicable to that request.

HTTP communication usually takes place over TCP/IP connections. The default port is TCP 80, but other ports can be used. This does not preclude HTTP from being implemented on top of any other protocol on the Internet, or on other networks. HTTP only presumes a reliable transport; any protocol that provides such guarantees can be used.

Requests/Responses

The HTTP protocol is a request/response protocol. A client sends a request to the server in the form of a request method, URI, and protocol version, followed by a MIME-like message containing request modifiers, client information, and possible body content over a connection with a server. The server responds with a status line, including the message's protocol version and a success or error code, followed by a MIME-like message containing server information, entity metainformation, and possible entity-body content.

HTTP messages can take either a full request/response or a simple request/response message format. Full-Request and Full-Response messages may contain optional headers and entity bodies. Simple-Request and Simple-Response messages do not permit header information and are limited to retrieving an entity body.

Requests

A Full-Request has the following basic structure:

<Request-Line> <General-Header> <Request-Header> <Entity-Header><CRLF> <Entity-Body>

The following rules apply:

<Request-Line> := <Method><SPACE><Request-URI><SPACE><HTTP-Version><CRLF>
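The request-line rule can be illustrated with a minimal sketch that assembles a request as plain text. The function name `build_full_request`, the header values, and the example host and path are assumptions chosen for illustration; the sketch does not validate methods or header names.

```python
CRLF = "\r\n"


def build_full_request(method, request_uri, headers=None, body="", version="HTTP/1.1"):
    """Assemble a request: request line, header lines, blank line, optional body."""
    # <Method><SPACE><Request-URI><SPACE><HTTP-Version><CRLF>
    request_line = f"{method} {request_uri} {version}{CRLF}"
    # One "Name: value" line per header, each terminated by CRLF.
    header_lines = "".join(
        f"{name}: {value}{CRLF}" for name, value in (headers or {}).items()
    )
    # A bare CRLF separates the headers from the entity body.
    return request_line + header_lines + CRLF + body


request = build_full_request(
    "GET",
    "/user1/default.htm",
    {"Host": "myserver.com", "Accept": "text/html"},
)
print(repr(request))
```

The blank line is emitted even when no body follows, because it marks the end of the header section.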

Responses

A response has the following structure:

<Status-Line> <General-Header> <Response-Header> <Entity-Header><CRLF> <Entity-Body>
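To make the response layout concrete, the following sketch splits a raw response into status line, headers, and entity body. It assumes the whole message is already available as one text string; `parse_response` and the sample message are hypothetical and chosen only for illustration.

```python
def parse_response(raw):
    """Split a raw response into (version, status code, reason, headers, body)."""
    # The first blank line (CRLF CRLF) ends the header section.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Status line: protocol version, three-digit code, reason phrase.
    version, status_code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return version, int(status_code), reason, headers, body


raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 5\r\n\r\nhello"
parsed = parse_response(raw)
print(parsed)
```

A production parser would also have to handle byte input, repeated header fields, and malformed status lines, none of which this sketch attempts.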

A single HTTP operation is called a transaction. When using HTTP, client and server determine the document formats dynamically: when a browser contacts a server, the browser sends the server a list of the formats it understands. In principle, a transaction consists of four steps:

1. A connection is established.
2. The client sends a request consisting of a method followed by an object URL and the HTTP protocol version.
3. The server returns a response consisting of the HTTP protocol version, a three-digit status code, and a reason phrase.
4. The server closes the connection.

Response Codes

The response codes are divided into 5 classes:

1xx	Informational (not used in HTTP/1.0)
2xx	Success
200	ok
201	created
3xx	Further action must be taken
304	not modified
4xx	Client error
400	bad request
401	unauthorized (identity could not be validated)
403	forbidden access (identity validated, but no access rights)
404	not found
407	Proxy Authentication Required
5xx	Server error
500	Internal Server Error
501	not implemented
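Because the class of a response code is determined by its first digit, a client can classify codes it does not know individually. A minimal sketch, in which the mapping `STATUS_CLASSES` and the function `status_class` are hypothetical names:

```python
# Class descriptions keyed by the first digit of the status code.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "further action required",
    4: "client error",
    5: "server error",
}


def status_class(code):
    """Return the class of a three-digit status code based on its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid status code: {code}")
    return STATUS_CLASSES[code // 100]


for code in (200, 304, 404, 501):
    print(code, status_class(code))
```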

URL

A URL is a URI that contains protocol information specifying how the data object should be retrieved from the server.

Every computer (host) on the Internet has a unique 32-bit address, written as four fields with values from 0 to 255. The address consists of two parts: the first part is the number of the network, the second part is the number of the host within that network. In Class A networks the first field is the network number (first field 1-126), in Class B networks the first two fields (first field 128-191), and in Class C networks the first three fields (first field 192-223). Often a subnet mask is used to reserve part of the host number for subnets within the actual network. In most cases a domain name is used instead of the numeric address (e.g. the y.z in x@y.z). A host can have several names. To translate names into addresses, the Internet uses the Domain Name System (DNS), which is essentially a distributed database. The last part of the domain name is either geographic, e.g. '.de', or organizational, e.g. '.edu'. Host names are not case-sensitive.
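The class ranges given above can be expressed as a short sketch that splits a dotted address into its network and host fields. The function name `split_classful` and the sample addresses are assumptions for illustration; subnet masks are not considered.

```python
import ipaddress


def split_classful(dotted):
    """Return (class, network fields, host fields) for a dotted 32-bit address."""
    # IPv4Address validates that there are four fields, each in 0-255.
    fields = list(ipaddress.IPv4Address(dotted).packed)
    first = fields[0]
    if 1 <= first <= 126:
        network_class, network_len = "A", 1
    elif 128 <= first <= 191:
        network_class, network_len = "B", 2
    elif 192 <= first <= 223:
        network_class, network_len = "C", 3
    else:
        raise ValueError(f"{dotted} is not a class A, B, or C address")
    return network_class, fields[:network_len], fields[network_len:]


print(split_classful("10.1.2.3"))
print(split_classful("172.16.5.4"))
print(split_classful("192.168.1.10"))
```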

HTTP defines a URI (Uniform Resource Identifier) as a formatted string that uses names, locations, and other characteristics to identify a network resource. In other words, a URI is a simple text string used to address an object on the Web. Examples are:

URI: //myserver.com/user1/default.htm
URL: http://myserver.com/user1/default.htm

There are absolute and relative URLs.

The basic syntax for URLs is:

<Schema> := (http | ftp | gopher | https | ???)

The full syntax for an HTTP URL is:

http://<host>:<port>/<path>?<search_part>#<anker>

If the port is omitted, 80 is assumed (the standard HTTP port). The search_part is ignored. The // indicates that the object is an Internet object. Special characters must be encoded as %<hex ASCII code>.
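The parts of an HTTP URL can be inspected with Python's standard `urllib.parse` module. The following sketch fills in the default port and percent-encodes a special character; the helper `split_http_url`, the query `name=value`, the anchor `top`, and the file name `my file.htm` are illustrative assumptions.

```python
from urllib.parse import quote, urlsplit


def split_http_url(url):
    """Break an HTTP URL into host, port, path, search part, and anchor."""
    parts = urlsplit(url)
    return {
        "host": parts.hostname,
        # urlsplit reports None when no port is given; HTTP then defaults to 80.
        "port": parts.port or 80,
        "path": parts.path,
        "search_part": parts.query,
        "anchor": parts.fragment,
    }


info = split_http_url("http://myserver.com/user1/default.htm?name=value#top")
print(info)

# Special characters are written as % followed by their hexadecimal code.
encoded = quote("my file.htm")
print(encoded)
```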

Predefined URLs

localhost	127.0.0.1

Header Fields

Accept	List of the MIME types that the client understands
Allow
Authorization
Content-Encoding
Content-Length
Content-Type	The Content-Type entity-header field indicates the media type of the entity-body sent to the recipient. Syntax: Content-Type: <MediaType>
Entity Header
Expires
From
Host	The Host request-header field specifies the Internet host and port number of the resource being requested.
If-Modified-Since
Referer
User-Agent

Methods

HTTP refers to its commands as methods. The method tells the server which action to perform on the resource. There are essentially three important methods: GET, HEAD, and POST. Further methods are CHECKIN, CHECKOUT, DELETE, LINK, PUT, SEARCH, SHOWMETHOD, SPACEJUMP, TEXTSEARCH, and UNLINK.

GET Method

The HTTP GET method asks a web server to retrieve the information identified by the URI. A GET becomes a conditional GET when the request sent by the client contains an If-Modified-Since header field.
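How a server might evaluate a conditional GET can be sketched as follows. This is a simplified illustration, not a complete implementation: the function `conditional_get_status`, the plain dictionary of request headers, and the sample dates are assumptions, and an unparseable date is simply treated as if the header were absent.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_get_status(request_headers, last_modified):
    """Return 304 if the resource is unchanged since the client's date, else 200."""
    value = request_headers.get("If-Modified-Since")
    if value is None:
        return 200  # ordinary GET: always send the entity body
    try:
        since = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return 200  # unparseable date: fall back to an ordinary GET
    if since.tzinfo is None:
        # Treat dates without zone information as UTC (an assumption of this sketch).
        since = since.replace(tzinfo=timezone.utc)
    return 304 if last_modified <= since else 200


last_modified = datetime(2021, 11, 18, 14, 35, tzinfo=timezone.utc)
unchanged = conditional_get_status(
    {"If-Modified-Since": "Thu, 18 Nov 2021 14:35:00 GMT"}, last_modified
)
changed = conditional_get_status(
    {"If-Modified-Since": "Wed, 17 Nov 2021 00:00:00 GMT"}, last_modified
)
print(unchanged, changed)
```

With a 304 response the server omits the entity body, so the client reuses its cached copy.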

HEAD Method

The HTTP HEAD method is almost identical to GET, except that the web server does not return an entity body in its response. There is no conditional HEAD.

POST Method

The HTTP POST method asks the web server to associate the object enclosed in the request with the enclosed URI. POST thus creates or replaces a resource associated with the URL. The server is not required, however, to store the resource persistently or to make it accessible.

Querystring

With method=post, the string <Name>=<Value> is appended after the HTTP request headers and a blank line. This string is also called the query string. The structure is

@@ Line 57: / Line 57: @@
 |2xx
 |Erfolg
+|-
+|200
+|ok
+|-
+|201
+|created
 |-
 |3xx
@@ Line 66: / Line 72: @@
 |4xx
 |Client-Fehler
+|-
+|400
+|bad request
 |-
 |401
-|nicht authorisierte Anforderung
+|unauthorized (identity could not be validated)
 |-
 |403
-|forbidden access
+|forbidden access (identity validated, but no access rights)
 |-
 |404
-|not-found
+|not found
 |-
 |407
 |Proxy Authentication Required
 |-
-|5xx
+|500
-|Server-Fehler
+|Internal Server Error
 |-
 |501