Internet Engineering Task Force                             Eddie Kohler
INTERNET-DRAFT                                                      UCLA
draft-ietf-dccp-spec-06.txt                                 Mark Handley
Expires: August 2004                                                 UCL
                                                             Sally Floyd
                                                                    ICIR
                                                        16 February 2004

             Datagram Congestion Control Protocol (DCCP)

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of [RFC 2026]. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Copyright Notice

Copyright (C) The Internet Society (2004). All Rights Reserved.

Abstract

This document specifies the Datagram Congestion Control Protocol (DCCP), which implements a congestion-controlled, unreliable flow of unicast datagrams suitable for use by applications such as streaming media, Internet telephony, and on-line games.

Kohler/Handley/Floyd                                            [Page 1]
INTERNET-DRAFT          Expires: August 2004               February 2004

TO BE DELETED BY THE RFC EDITOR UPON PUBLICATION:

Changes since draft-ietf-dccp-spec-05.txt:

* Organization overhaul.
* Add pseudocode for event processing.
* Remove # NDP; replace with Ack Count.
* Remove Identification, Challenge, ID Regime, and Connection Nonce.
* Data Checksum (formerly Payload Checksum) uses a 32-bit CRC.
* Switch location of non-negotiable features to clarify presentation; now the feature location controls its value.
* Rename "value type" to "reconciliation rule".
* Rename "Reset Reason" to "Reset Code".
* Mobility ID becomes 128 bits long.
* Add probabilities to Mobility ID discussion.
* Add SyncAck.

Table of Contents

1. Introduction
2. Design Rationale
3. Conventions and Terminology
   3.1. Numbers and Fields
   3.2. Parts of a Connection
   3.3. Features
   3.4. Round-Trip Times
   3.5. Robustness Principle
4. Overview
   4.1. Packet Types
   4.2. Sequence Numbers
   4.3. States
   4.4. Congestion Control
   4.5. Features
   4.6. Other Differences from TCP
   4.7. Example Connection
5. Header Formats
   5.1. Generic Header
   5.2. DCCP-Request Header
   5.3. DCCP-Response Header
   5.4. DCCP-Data, DCCP-Ack, and DCCP-DataAck Headers
   5.5. DCCP-CloseReq and DCCP-Close Headers
   5.6. DCCP-Reset Header
   5.7. DCCP-Move Header
   5.8. DCCP-Sync and DCCP-SyncAck Headers
   5.9. Options
      5.9.1. Padding Option
      5.9.2. Mandatory Option
6. Feature Negotiation
   6.1. Change Options
   6.2. Confirm Options
   6.3. Reconciliation Rules
      6.3.1. Server-Priority
      6.3.2. Non-Negotiable
   6.4. Feature Numbers
   6.5. Examples
   6.6. Option Exchange
      6.6.1. Normal Exchange
      6.6.2. Loss and Retransmission
      6.6.3. Reordering
      6.6.4. Preference Changes
      6.6.5. Simultaneous Negotiation
      6.6.6. Unknown Features
      6.6.7. Invalid Options
      6.6.8. Mandatory Feature Negotiation
      6.6.9. Out-of-Band Agreement
      6.6.10. State Diagram
7. Sequence Numbers
   7.1. Variables
   7.2. Initial Sequence Numbers
   7.3. Quiet Time
   7.4. Acknowledgement Numbers
   7.5. Validity and Synchronization
      7.5.1. Sequence-Validity Rules
      7.5.2. Handling Sequence-Invalid Packets
      7.5.3. Sequence and Acknowledgement Number Windows
      7.5.4. Sequence Window Feature
      7.5.5. Sequence Number Attacks
      7.5.6. Examples
   7.6. Extended Sequence Numbers
      7.6.1. When to Use Extended Sequence Numbers
      7.6.2. Header Processing
      7.6.3. Transitioning to Extended Sequence Numbers
      7.6.4. Sequence Transition Capable Feature
   7.7. NDP Count and Detecting Application Loss
      7.7.1. Usage Notes
      7.7.2. Send NDP Count Feature
8. Event Processing
   8.1. Connection Establishment
      8.1.1. Client Request
      8.1.2. Service Codes
      8.1.3. Server Response
      8.1.4. Init Cookie Option
      8.1.5. Handshake Completion
   8.2. Data Transfer
   8.3. Termination
      8.3.1. Abnormal Termination
   8.4. DCCP State Diagram
   8.5. Pseudocode
9. Checksums
   9.1. Header Checksum Field
   9.2. Header Checksum Coverage Field
   9.3. Data Checksum Option
      9.3.1. Check Data Checksum Feature
      9.3.2. Usage Notes
10. Congestion Control IDs
   10.1. Unspecified Sender-Based Congestion Control
   10.2. TCP-like Congestion Control
   10.3. TFRC Congestion Control
   10.4. CCID-Specific Options, Features, and Reset Codes
11. Acknowledgements
   11.1. Acks of Acks and Unidirectional Connections
   11.2. Ack Piggybacking
   11.3. Ack Ratio Feature
   11.4. Ack Vector Options
      11.4.1. Ack Vector Consistency
      11.4.2. Ack Vector Coverage
   11.5. Send Ack Vector Feature
   11.6. Slow Receiver Option
   11.7. Data Dropped Option
      11.7.1. Data Dropped and Normal Congestion Response
      11.7.2. Particular Drop Codes
12. Explicit Congestion Notification
   12.1. ECN Capable Feature
   12.2. ECN Nonces
   12.3. Other Aggression Penalties
13. Timing Options
   13.1. Timestamp Option
   13.2. Elapsed Time Option
   13.3. Timestamp Echo Option
14. Multihoming and Mobility
   14.1. Mobility Capable Feature
   14.2. Mobility ID Feature
   14.3. Mobile Host Processing
   14.4. Stationary Host Processing
   14.5. Congestion Control State
   14.6. Security
15. Maximum Packet Size
16. Forward Compatibility
17. Middlebox Considerations
18. Relations to Other Specifications
   18.1. DCCP and RTP
   18.2. Multiplexing Issues
19. Security Considerations
   19.1. Security Considerations for Mobility
   19.2. Security Considerations for Partial Checksums
20. IANA Considerations
21. Thanks
A. Appendix: Ack Vector Implementation Notes
   A.1. Packet Arrival
      A.1.1. New Packets
      A.1.2. Old Packets
   A.2. Sending Acknowledgements
   A.3. Clearing State
   A.4. Processing Acknowledgements
B. Appendix: Design Motivation
   B.1. CsCov and Partial Checksumming
Normative References
Informative References
Authors' Addresses
Intellectual Property Notice

1. Introduction

This document describes the Datagram Congestion Control Protocol (DCCP), a transport protocol that implements a congestion-controlled, bidirectional stream of unreliable datagrams. Specifically, DCCP provides:

o An unreliable flow of datagrams, with acknowledgements.
o Reliable handshakes for connection setup and teardown.
o Reliable negotiation of options, including negotiation of a suitable congestion control mechanism.
o Mechanisms allowing a server to avoid holding any state for unacknowledged connection attempts or already-finished connections.
o Congestion control incorporating Explicit Congestion Notification (ECN) and the ECN Nonce, as per [RFC 3168] and [RFC 3540].
o Acknowledgement mechanisms communicating packet loss and ECN mark information. Acks are transmitted as reliably as the relevant congestion control mechanism requires, possibly completely reliably.
o Optional mechanisms that tell the sending application, with high reliability, which data packets reached the receiver, and whether those packets were ECN marked, corrupted, or dropped in the receive buffer.
o Path Maximum Transmission Unit (PMTU) discovery, as per [RFC 1191].

DCCP is intended for applications, such as streaming media and Internet telephony, where reliable in-order delivery, combined with congestion control, can result in some information arriving at the receiver after it is no longer of use. So far, most such applications have either used TCP, with the attendant quality problems caused by late data delivery, or used UDP and implemented their own congestion control (or no congestion control at all). DCCP provides standard congestion control mechanisms for such applications. It enables the use of ECN, along with conformant end-to-end congestion control, for applications that would otherwise be using UDP.
In addition, DCCP implements reliable connection setup, teardown, and feature negotiation.

DCCP's target applications require the flow-based semantics of TCP, but do not want TCP's in-order delivery and reliability, or would like different congestion control dynamics than TCP.

2. Design Rationale

DCCP was intended to be used by applications that currently use UDP without end-to-end congestion control. Most streaming UDP applications should have little reason not to switch to DCCP, once it is deployed. Thus, DCCP was designed to have as little overhead as possible, both in terms of the packet header size and in terms of the state and CPU overhead required at end hosts. Only the minimal necessary functionality was included in DCCP, leaving other functionality, such as forward error correction (FEC), semi-reliability, and multiple streams, to be layered on top of DCCP as desired. This desire for minimal overhead is also one of the reasons to avoid proposing an unreliable variant of the Stream Control Transmission Protocol (SCTP, [RFC 2960]).

Different forms of conformant congestion control are appropriate for different applications. For example, applications such as on-line games might want to make quick use of any available bandwidth. Other applications, such as streaming media, might trade off this responsiveness for a steadier, less bursty rate, since sudden rate changes cause unacceptable UI glitches (such as audible pauses or clicks in the playout stream). Thus, DCCP allows applications to choose between several forms of congestion control. One choice, TCP-like Congestion Control, halves the congestion window in response to a packet drop or mark, as in TCP. Applications using this congestion control mechanism will respond quickly to changes in available bandwidth, but must be able to tolerate the abrupt changes in congestion window typical of TCP.
A second alternative, TCP-Friendly Rate Control (TFRC, [RFC 3448]), a form of equation-based congestion control, minimizes abrupt changes in the sending rate while maintaining longer-term fairness with TCP.

DCCP also lets unreliable traffic safely use ECN. A UDP kernel API might not allow applications to set UDP packets as ECN-capable, since the API could not guarantee the application would properly detect or respond to congestion. DCCP kernel APIs will have no such issues, since DCCP itself implements congestion control.

We chose not to require the use of the Congestion Manager [RFC 3124], which allows multiple concurrent streams between the same sender and receiver to share congestion control. The current Congestion Manager can only be used by applications that have their own end-to-end feedback about packet losses, but this is not the case for many of the applications currently using UDP. In addition, the current Congestion Manager does not easily support multiple congestion control mechanisms, or lend itself to the use of forms of TFRC where the state about past packet drops or marks is maintained at the receiver rather than at the sender. DCCP should be able to make use of CM where desired by the application, but we do not see any benefit in making the deployment of DCCP contingent on the deployment of CM itself.

3. Conventions and Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119].

3.1. Numbers and Fields

All multi-byte numerical quantities in DCCP, such as port numbers, Sequence Numbers, and arguments to options, are transmitted in network byte order (most significant byte first).

We occasionally refer to the "left" and "right" sides of a bit field. "Left" means towards the most significant bit, and "right" means towards the least significant bit.

Reserved bitfields in DCCP packet headers MUST be ignored by receivers, and MUST be set to zero by senders, unless otherwise specified.

Random numbers in DCCP are used for their security properties, and MUST be chosen according to the guidelines in [RFC 1750].

3.2. Parts of a Connection

Each DCCP connection runs between two endpoints, which we often name DCCP A and DCCP B. DCCP connections are actively initiated by one endpoint. The active endpoint is called the client, and the passive endpoint is called the server.

DCCP connections are bidirectional; data may pass from either endpoint to the other. This means that data and acknowledgements may be flowing in both directions simultaneously. Logically, however, a DCCP connection consists of two separate unidirectional connections, called half-connections. Each half-connection consists of the data packets sent by one endpoint and the corresponding acknowledgements sent by the other endpoint. We can illustrate this as follows:

   +--------+  A-to-B half-connection:         +--------+
   |        |    -->  data packets  -->        |        |
   |        |    <--  acknowledgements  <--    |        |
   | DCCP A |                                  | DCCP B |
   |        |  B-to-A half-connection:         |        |
   |        |    <--  data packets  <--        |        |
   +--------+    -->  acknowledgements  -->    +--------+

Although they are logically distinct, in practice the half-connections overlap; a DCCP-DataAck packet, for example, contains application data relevant to one half-connection and acknowledgement information relevant to the other.

In the context of a single half-connection, the HC-Sender is the endpoint sending data, while the HC-Receiver is the endpoint sending acknowledgements. For example, in the A-to-B half-connection, DCCP A is the HC-Sender and DCCP B is the HC-Receiver.

3.3. Features

A feature is a DCCP connection attribute, identified by a feature number and an endpoint, on whose value the two endpoints agree. Many properties of a DCCP connection are controlled by features, including the congestion control mechanisms in use on the two half-connections, whether mobility is allowed, and whether ECN is supported. The endpoints can achieve agreement by out-of-band communication, or through the exchange of feature negotiation options in DCCP headers.

The notation F/A represents the feature with feature number F located at DCCP endpoint A; the feature F/B has the same feature number, but is located at the other endpoint. Both DCCP A and DCCP B know, and agree on, the values of both F/A and F/B, but F/A and F/B may have different values. DCCP A is called the feature location for all features F/A, and the feature remote for all features F/B.

3.4. Round-Trip Times

We sometimes refer to a round-trip time, for example when setting timers. If no useful round-trip time estimate is available, a DCCP implementation SHOULD use 0.2 seconds instead.

3.5. Robustness Principle

DCCP implementations should follow TCP's "general principle of robustness": be conservative in what you do, be liberal in what you accept from others.

4. Overview

DCCP's high-level connection dynamics should seem familiar to anyone who knows TCP. DCCP connections, like TCP connections, progress through three phases: initiation (including a three-way handshake), data transfer, and termination. Data can flow both ways over the connection. An acknowledgement framework lets senders discover how much data has been lost; congestion control uses this information to avoid unfairly congesting the network. Of course, DCCP provides unreliable datagram semantics, not TCP's reliable bytestream semantics.
The application must package its data into explicit frames, and must retransmit its own data as necessary. It may be useful to think of DCCP either as TCP minus bytestream semantics and reliability, or as UDP plus congestion control, handshakes, and acknowledgements.

4.1. Packet Types

DCCP uses eleven packet types to implement various protocol functions. For example, every new connection attempt begins with a DCCP-Request packet sent by the client. A DCCP-Request packet thus resembles a TCP SYN; but DCCP-Request is a packet type, not a flag, so there's no way to send an unexpected combination such as TCP's SYN+FIN+ACK+RST.

Eight packet types occur during the progress of a typical connection: two only during the initiation phase, three during the data transfer phase, and three only during the termination phase:

   Client                                      Server
   ------                                      ------
                   (1) Initiation
   DCCP-Request -->
                                    <-- DCCP-Response
   DCCP-Ack -->
                   (2) Data transfer
   DCCP-Data, DCCP-Ack, DCCP-DataAck -->
                <-- DCCP-Data, DCCP-Ack, DCCP-DataAck
                   (3) Termination
                                    <-- DCCP-CloseReq
   DCCP-Close -->
                                       <-- DCCP-Reset

Note the three-way handshakes during initiation and termination.

The three remaining packet types are used for special purposes: when an endpoint moves, or to resynchronize after bursts of loss.

Every DCCP packet starts with a common, 12-byte generic header, but different packet types may include different amounts of additional data. For example, the DCCP-Ack packet type includes an Acknowledgement Number. Every packet type may also contain options, up to around 1000 bytes' worth. All of the packet types are described below.

DCCP-Request
   Sent by the client to initiate a connection (the first part of the three-way handshake).
DCCP-Response
   Sent by the server in response to a DCCP-Request (the second part of the three-way handshake).
DCCP-Data
   Used to transmit data.
DCCP-Ack
   Used for pure acknowledgements.
DCCP-DataAck
   Used for piggybacked data-plus-acknowledgements.
DCCP-CloseReq
   Sent by the server to request that the client close the connection.
DCCP-Close
   Used to close the connection; elicits a DCCP-Reset in response.
DCCP-Reset
   Used to terminate the connection, either normally or abnormally.
DCCP-Move
   Supports multihoming and mobility.
DCCP-Sync, DCCP-SyncAck
   Used to resynchronize sequence numbers after large bursts of loss.

4.2. Sequence Numbers

Each DCCP packet carries a sequence number, so that losses can be detected and reported. But unlike TCP's byte-based sequence numbers, DCCP sequence numbers are attached to packets. Each packet sent increments the sequence number by one. For example:

   DCCP A                                      DCCP B
   ------                                      ------
   DCCP-Data(seqno 1) -->
   DCCP-Data(seqno 2) -->
                       <-- DCCP-Ack(seqno 10, ackno 2)
   DCCP-DataAck(seqno 3, ackno 10) -->
                               <-- DCCP-Data(seqno 11)

Note that even DCCP-Ack pure acknowledgements increment the sequence number; after the DCCP-Ack with sequence number 10, the following DCCP-Data packet uses the next sequence number, 11. This lets the endpoints tell when acknowledgements are lost in the network. It also means that endpoints can get out of sync after a long burst of loss. The DCCP-Sync and DCCP-SyncAck packet types let DCCP recover from large loss bursts; see Section 7.5.

Also note that, since DCCP is an unreliable protocol, there are no retransmissions, and it doesn't make sense to have a cumulative acknowledgement field. Acknowledgement Number (ackno) fields equal the largest sequence number received, rather than the TCP-style smallest sequence number not received. Separate options indicate any intermediate sequence numbers that weren't received.

4.3. States

DCCP endpoints progress through different states during the course of a connection, corresponding roughly to the three phases of initiation, data transfer, and termination.
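
As an informal aid, the typical state progression that the rest of this section describes can be sketched as a pair of transition tables, one per endpoint. The state names come from this section; the event labels (such as "handshake done") and the helper walk() are hypothetical names chosen for this sketch, and only the typical server-initiated close is modeled. Section 8 remains the authoritative description.

```python
# Typical-case state progressions only. The event labels are illustrative
# names invented for this sketch; they are not protocol identifiers.
CLIENT = {
    ("CLOSED", "send Request"): "REQUEST",
    ("REQUEST", "recv Response"): "PARTOPEN",
    ("PARTOPEN", "handshake done"): "OPEN",
    ("OPEN", "recv CloseReq"): "CLOSING",
    ("CLOSING", "recv Reset"): "TIMEWAIT",
    ("TIMEWAIT", "2MSL elapsed"): "CLOSED",
}

SERVER = {
    ("LISTEN", "recv Request"): "RESPOND",
    ("RESPOND", "handshake done"): "OPEN",
    ("OPEN", "send CloseReq"): "CLOSEREQ",
    ("CLOSEREQ", "recv Close"): "CLOSED",
}


def walk(table, state, events):
    """Apply events in order; raise ValueError on a transition not in table."""
    for event in events:
        key = (state, event)
        if key not in table:
            raise ValueError(f"no transition from {state} on {event!r}")
        state = table[key]
    return state


client_events = ["send Request", "recv Response", "handshake done",
                 "recv CloseReq", "recv Reset", "2MSL elapsed"]
server_events = ["recv Request", "handshake done", "send CloseReq",
                 "recv Close"]

print(walk(CLIENT, "CLOSED", client_events))  # prints CLOSED
print(walk(SERVER, "LISTEN", server_events))  # prints CLOSED
```

Keeping the tables separate per role reflects that the client and server pass through different states (only the client enters PARTOPEN and TIMEWAIT in the typical case).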
The figure below shows the typical progress through these states for a client and server.

   Client                                             Server
   ------                                             ------
                     (0) No connection
   CLOSED                                             LISTEN

                     (1) Initiation
   REQUEST      DCCP-Request -->
                                <-- DCCP-Response     RESPOND
   PARTOPEN     DCCP-Ack or DCCP-DataAck -->

                     (2) Data transfer
   OPEN          <-- DCCP-Data, Ack, DataAck -->      OPEN

                     (3) Termination
                                <-- DCCP-CloseReq     CLOSEREQ
   CLOSING      DCCP-Close -->
                                <-- DCCP-Reset        CLOSED
   TIMEWAIT
   CLOSED

     The client and server's typical progress through states.

The states are as follows; Section 8 describes them in more detail.

CLOSED
   Represents a nonexistent connection.
LISTEN
   Represents a server socket in the passive listening state. LISTEN and CLOSED are not associated with any particular DCCP connection.
REQUEST
   The client socket enters this state, from CLOSED, after sending a DCCP-Request packet to try to initiate a connection.
RESPOND
   A server socket enters this state, from LISTEN, after receiving a DCCP-Request from a client.
PARTOPEN
   The client socket enters this state, from REQUEST, after receiving a DCCP-Response from the server. This state represents the third phase of the three-way handshake. The client may send data in this state, but it MUST include an Acknowledgement Number on all of its packets.
OPEN
   The central, data transfer portion of a DCCP connection. Client and server enter into this state from PARTOPEN and RESPOND, respectively. Sometimes we speak of SERVER-OPEN and CLIENT-OPEN states, corresponding to the server's OPEN state and the client's OPEN state.
CLOSEREQ
   A server socket enters this state, from SERVER-OPEN, to signal that the connection is over, but the client must hold TIMEWAIT state.
CLOSING
   Either server or client can enter this state to close the connection.
TIMEWAIT
   A socket remains in this state for 2MSL after the connection has been torn down, to prevent mistakes due to the delivery of old packets. One MSL, or Maximum Segment Lifetime, is the maximum length of time a packet could survive in the network.

4.4. Congestion Control

DCCP connections are congestion controlled. Unlike TCP, however, DCCP supports multiple congestion control mechanisms for applications to choose from. In fact, the two half-connections can be governed by different mechanisms. Each mechanism corresponds to a one-byte congestion control identifier, or CCID.

A CCID describes how the HC-Sender limits data packet rates; how it maintains necessary parameters, such as congestion windows; how the HC-Receiver sends congestion feedback via acknowledgements; and how it manages the acknowledgement rate. The endpoints negotiate their CCIDs during connection initiation. So far, CCIDs 2 and 3 have been defined for use with DCCP; CCID 0 is reserved, and CCID 1 is used for special purposes (see Section 10.1).

CCID 2 corresponds to TCP-like Congestion Control, which is similar to that of TCP. The sender maintains a congestion window and sends packets until that window is full. Packets are acknowledged by the receiver. Dropped packets and ECN [RFC 3168] indicate congestion; the response to congestion is to halve the congestion window. Acknowledgements in CCID 2 contain the sequence numbers of all received packets within some window, similar to a super selective-acknowledgement (SACK, [RFC 3517]).

CCID 3 provides TFRC Congestion Control, an equation-based form of congestion control which is intended to provide a smoother response to congestion than CCID 2. The sender maintains a "transmit rate". The receiver sends acknowledgement packets containing information about the receiver's estimate of packet loss. The sender uses this information to update its transmit rate.
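
For contrast with the rate-based CCID 3 sender just introduced, the window-based CCID 2 behavior described above (send until the congestion window is full; halve the window on a drop or ECN mark) can be sketched as follows. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, window growth is omitted entirely, and the floor of one packet is an assumption of this sketch. The normative behavior is defined in [CCID 2 PROFILE].

```python
class WindowSender:
    """Toy window-limited sender in the spirit of CCID 2 (not normative)."""

    def __init__(self, cwnd=8):
        self.cwnd = cwnd       # congestion window, in packets
        self.in_flight = 0     # packets sent but not yet acknowledged

    def can_send(self):
        # The sender may transmit only while the window is not full.
        return self.in_flight < self.cwnd

    def on_send(self):
        if not self.can_send():
            raise RuntimeError("congestion window is full")
        self.in_flight += 1

    def on_ack(self, newly_acked, congestion=False):
        """Process an acknowledgement covering newly_acked packets.

        congestion=True stands for "this ack reported a drop or ECN mark".
        """
        self.in_flight = max(0, self.in_flight - newly_acked)
        if congestion:
            # Halve the window; the floor of 1 is an assumption of this sketch.
            self.cwnd = max(1, self.cwnd // 2)


sender = WindowSender(cwnd=8)
for _ in range(8):
    sender.on_send()
sender.on_ack(2, congestion=True)
print(sender.cwnd, sender.in_flight)  # prints 4 6
```

After the congestion signal the sender must wait for further acknowledgements before transmitting again, which is the abrupt rate change that CCID 3 is intended to smooth out.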
Although CCID 3 behaves somewhat differently from TCP in its short term congestion response, it is designed to operate fairly with TCP over the long term.

The behaviors of CCIDs 2 and 3 are fully defined in separate profile documents [CCID 2 PROFILE] [CCID 3 PROFILE].

4.5. Features

Agreement on DCCP feature values is achieved by explicit negotiation, using options in DCCP packet headers. This generally happens at connection startup, but negotiation can begin at any time. The relevant options are Change L, Confirm L, Change R, and Confirm R, with the "L" options sent by the feature location and the "R" options sent by the feature remote.

A Change R message says to the peer, "change this feature value on your side". The peer responds with a Confirm L, meaning "I've changed it". The suggested option setting in Change R can sometimes contain multiple values, which are sorted in preference order. For example:

   Client                                        Server
   ------                                        ------
   Change R(CCID, 2) -->
                                  <-- Confirm L(CCID, 2)
              * agreement that CCID/Server = 2 *

   Change R(CCID, 3 4) -->
                             <-- Confirm L(CCID, 4, 4 2)
              * agreement that CCID/Server = 4 *

In the second exchange, the client requests that the server use either CCID 3 or CCID 4, with 3 preferred. The server chooses 4, giving its preference list of "4 2".

A party that wants to change a feature located at itself issues a "Change L" option, which elicits a "Confirm R" in reply.

   Client                                        Server
   ------                                        ------
                                <-- Change L(CCID, 3 2)
   Confirm R(CCID, 3, 3 2) -->
              * agreement that CCID/Server = 3 *

In this example, the server requests CCID value 3 or 2 for the server's CCID, with 3 preferred, and the client agrees.

Retransmissions make feature negotiation reliable. Section 6 describes these options further.

4.6. Other Differences from TCP

Interesting differences between DCCP and TCP, apart from those discussed so far, include:

o Copious space for options (up to 1008 bytes: a header of at most 1020 bytes, less the 12-byte generic header).
o Different acknowledgement formats. The CCID for a connection determines how much ack information needs to be transmitted. In CCID 2 (TCP-like), this is about one ack per 2 packets, and each ack must declare exactly which packets were received; in CCID 3 (TFRC), it's about one ack per RTT, and acks must declare at minimum just the lengths of recent loss intervals.
o Denial-of-service (DoS) protection. Several DCCP mechanisms attempt to let servers limit the amount of state possibly-misbehaving clients can force them to maintain. An Init Cookie option, analogous to TCP's SYN Cookies [SYNCOOKIES], avoids SYN-flood-like attacks. Only one connection endpoint need hold TIMEWAIT state; the DCCP-CloseReq packet, which may only be sent by the server, passes that state to the client. Various rate limits let servers avoid attacks that might force extensive computation or packet generation.
o Distinguishing different kinds of loss. A Data Dropped option (Section 11.7) lets an endpoint declare that a packet was dropped because of corruption, because of receive buffer overflow, and so on. This facilitates research into more appropriate rate-control responses for these non-network-congestion losses (although currently all losses will cause a congestion response).
o Acknowledgement readiness. In TCP, a packet is acknowledged only when the data is queued for delivery to the application. This does not make sense in DCCP, where an application might request a drop-from-front receive buffer, for example. We acknowledge a packet when its options have been processed. The Data Dropped option may later say that the packet's payload was discarded.
o Integrated support for mobility and multihoming via the DCCP-Move packet type.
o No receive window. DCCP is a congestion control protocol, not a flow control protocol.
o No simultaneous open. Every connection has one client and one server.
o  No half-closed states. DCCP has no states corresponding to TCP's
   FINWAIT and CLOSEWAIT, where one half-connection is explicitly
   closed while the other is still active.

4.7. Example Connection

The progress of a typical DCCP connection is as follows. (This
description is informative, not normative.)

      Client                                  Server
      ------                                  ------
   0. [CLOSED]                              [LISTEN]
   1. DCCP-Request -->
   2.                                   <-- DCCP-Response
   3. DCCP-Ack -->
                                        <-- DCCP-Ack
   4. DCCP-Data, DCCP-Ack, DCCP-DataAck -->
                <-- DCCP-Data, DCCP-Ack, DCCP-DataAck
   5.                                   <-- DCCP-CloseReq
   6. DCCP-Close -->
   7.                                   <-- DCCP-Reset
   8. [TIMEWAIT]

1.  The client sends the server a DCCP-Request packet specifying the
    client and server ports, the service being requested, and any
    features being negotiated, including the CCID that the client
    would like the server to use. The client may optionally piggyback
    some data on the DCCP-Request packet---an application-level
    request, say---which the server may ignore.

2.  The server sends the client a DCCP-Response packet indicating that
    it is willing to communicate with the client. The response
    indicates any features and options that the server agrees to,
    begins or continues other feature negotiations if desired, and
    optionally includes an Init Cookie that wraps up all this
    information and which must be returned by the client for the
    connection to complete.

3.  The client sends the server a DCCP-Ack packet that acknowledges
    the DCCP-Response packet. This acknowledges the server's initial
    sequence number and returns the Init Cookie if there was one in
    the DCCP-Response. It may also continue feature negotiation. There
    might follow zero or more DCCP-Ack exchanges as required to
    finalize feature negotiation. The client may piggyback an
    application-level request on its final ack, producing a
    DCCP-DataAck packet.

4.
    The server and client then exchange DCCP-Data packets, DCCP-Ack
    packets acknowledging that data, and, optionally, DCCP-DataAck
    packets containing piggybacked data and acknowledgements. If the
    client has no data to send, then the server will send DCCP-Data
    and DCCP-DataAck packets, while the client will send DCCP-Acks
    exclusively.

5.  The server sends a DCCP-CloseReq packet requesting a close.

6.  The client sends a DCCP-Close packet acknowledging the close.

7.  The server sends a DCCP-Reset packet with Reset Code 1, "Closed",
    and clears its connection state. In DCCP, unlike TCP, Resets are
    part of normal connection termination; see Section 5.6.

8.  The client receives the DCCP-Reset packet and holds state for a
    reasonable interval of time to allow any remaining packets to
    clear the network.

An alternative connection closedown sequence is initiated by the
client:

5b. The client sends a DCCP-Close packet closing the connection.

6b. The server sends a DCCP-Reset packet with Reset Code 1, "Closed",
    and clears its connection state.

7b. The client receives the DCCP-Reset packet and holds state for a
    reasonable interval of time to allow any remaining packets to
    clear the network.

5. Header Formats

The variable-length DCCP header appears first in every DCCP packet. A
header can be from 12 to 1020 bytes long. The initial 12 bytes of the
header are the same regardless of packet type. Following this come
optional additional fixed-length fields, depending on the packet type,
and then a variable-length list of options. Finally, some packet types
include application data.

   +---------------------------------------+  -.
   |            Generic Header             |   |
   +---------------------------------------+   |
   | Additional Fields (depending on type) |   +- DCCP Header
   +---------------------------------------+   |
   |          Options (optional)           |   |
   +=======================================+  -'
   |      Application Data (optional)      |
   +=======================================+

5.1. Generic Header

The DCCP generic header generally takes 12 bytes.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |           Dest Port           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data Offset  | CCVal | CsCov |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type  |X| Res |                Sequence Number                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Actually, there are two types of generic header, depending on the value
of X, the Extended Sequence Numbers bit. If X is zero, the Sequence
Number field takes 24 bits, as above. If X is one, the Sequence Number
field extends for an additional 24 bits, for a total of 48:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |           Dest Port           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data Offset  | CCVal | CsCov |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type  |1| Res |          Sequence Number (high bits)          .
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   .          Sequence Number (low bits)           |  Reserved   |T|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Source and Destination Ports: 16 bits each
   These fields identify the connection, similar to the corresponding
   fields in TCP and UDP. The Source Port represents the relevant port
   on the endpoint that sent this packet, the
   Destination Port the relevant port on the other endpoint. Source
   Ports SHOULD be chosen randomly, to reduce the likelihood of attack.

Data Offset: 8 bits
   The offset from the start of the DCCP header to the beginning of
   the packet's application data, in 32-bit words.

CCVal: 4 bits
   Used by the HC-Sender CCID. For example, the A-to-B CCID's sender,
   which is active at DCCP A, MAY send 4 bits of information per
   packet to its receiver by encoding that information in CCVal. CCVal
   MUST be set to zero unless the HC-Sender CCID specifies a different
   value.

Checksum Coverage (CsCov): 4 bits
   Checksum Coverage specifies what parts of the packet are covered by
   the Checksum field. This always includes the DCCP header and
   options, but if applications request it, some or all of the
   application data may be excluded. This can improve performance on
   noisy links, assuming the application can tolerate corruption. See
   Section 9.

Checksum: 16 bits
   The Internet checksum of the packet's DCCP header (including
   options), a network-layer pseudoheader, and, depending on Checksum
   Coverage, some or all of the application data. See Section 9.

Type: 4 bits
   The Type field specifies the type of the packet. The following
   values are defined:

      Type   Meaning
      ----   -------
        0    DCCP-Request
        1    DCCP-Response
        2    DCCP-Data
        3    DCCP-Ack
        4    DCCP-DataAck
        5    DCCP-CloseReq
        6    DCCP-Close
        7    DCCP-Reset
        8    DCCP-Move
        9    DCCP-Sync
       10    DCCP-SyncAck
      11-15  Reserved

Extended Sequence Numbers (X): 1 bit
   This bit is set to one to indicate the use of an extended generic
   header with 48-bit Sequence and Acknowledgement Numbers.
   Very-high-rate connections SHOULD set X to one, and use 48-bit
   sequence numbers, to gain increased protection against wrapped
   sequence numbers and attacks. See Section 7.6.
Reserved (Res): 3 bits
   The version of DCCP specified here MUST ignore this field on
   received packets, and MUST set it to all zeroes on generated
   packets.

Sequence Number: 24 or 48 bits
   Identifies the packet uniquely in the sequence of all packets the
   source sent on this connection. Sequence Number increases by one
   with every packet sent, including packets such as DCCP-Ack that
   carry no application data. See Section 7.

Sequence Number Transition (T): 1 bit [X=1 only]
   Set to one to indicate an ongoing transition from 24-bit to 48-bit
   sequence numbers. See Section 7.6.

Many packet types also carry an Acknowledgement Number in the four or
eight bytes immediately following the generic header. When X=0, its
format is:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |            Acknowledgement Number             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

And when X=1:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |      Acknowledgement Number (high bits)       .
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   .       Acknowledgement Number (low bits)       |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Acknowledgement Number: 24 or 48 bits
   The Acknowledgement Number field generally acknowledges the
   greatest valid sequence number received so far on this connection.
   ("Greatest" is, of course, measured in circular sequence space.)
   Acknowledgement numbers make no attempt to provide precise
   information about which packets have arrived; options such as the
   Ack Vector do this.

Reserved: 8 bits
   The version of DCCP specified here MUST ignore these fields on
   received packets, and MUST set them to all zeroes on generated
   packets.

5.2. DCCP-Request Header

A client initiates a DCCP connection by sending a DCCP-Request packet.
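As an aside on the generic header of Section 5.1, which every packet
type described below begins with: the following Python sketch decodes
the 12-byte (X=0) and 16-byte (X=1) forms. It is illustrative only. The
names GenericHeader and parse_generic_header are hypothetical, and the
sketch assumes that fields are packed most-significant-bit first in
network byte order, as the bit diagrams suggest, and that the X=1 form
ends with 7 reserved bits followed by T.

```python
import struct
from dataclasses import dataclass


@dataclass
class GenericHeader:
    source_port: int
    dest_port: int
    data_offset: int  # header length in 32-bit words
    ccval: int
    cscov: int
    checksum: int
    packet_type: int
    x: int            # Extended Sequence Numbers bit
    seqno: int        # 24 bits when x == 0, 48 bits when x == 1
    t: int            # Sequence Number Transition bit (x == 1 only)


def parse_generic_header(buf: bytes) -> GenericHeader:
    """Decode the 12- or 16-byte generic header at the start of buf."""
    if len(buf) < 12:
        raise ValueError("generic header needs at least 12 bytes")
    sport, dport, offset, cc, checksum, word2 = struct.unpack(
        "!HHBBHI", buf[:12])
    packet_type = word2 >> 28      # top 4 bits of the third word
    x = (word2 >> 27) & 0x1        # next bit
    seqno = word2 & 0xFFFFFF       # low 24 bits; Res bits are ignored
    t = 0
    if x:
        if len(buf) < 16:
            raise ValueError("extended generic header needs 16 bytes")
        (word3,) = struct.unpack("!I", buf[12:16])
        seqno = (seqno << 24) | (word3 >> 8)  # append the low 24 bits
        t = word3 & 0x1                       # reserved bits are ignored
    return GenericHeader(sport, dport, offset, cc >> 4, cc & 0x0F,
                         checksum, packet_type, x, seqno, t)
```

The sketch ignores the Res and Reserved bits on input, as a receiver is
required to do; it does not verify the checksum or interpret Data
Offset.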
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /             Generic DCCP Header (12 or 16 bytes)             /
   /                  with Type=0 (DCCP-Request)                   /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Service Code                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Options / Padding                       |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                       Application Data                        |
   |                              ...                              |

Service Code: 32 bits
   Describes the service to which the client application wants to
   connect. Examples might include RTSP and DOOM. Service Codes are
   intended to make application protocols independent of well-known
   ports, and help middleboxes identify the protocol used on a given
   connection. See Section 8.1.2.

5.3. DCCP-Response Header

The server responds to valid DCCP-Request packets with DCCP-Response
packets. This is the second phase of the three-way handshake.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /             Generic DCCP Header (12 or 16 bytes)             /
   /                  with Type=1 (DCCP-Response)                  /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |            Acknowledgement Number             |
  (+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+)when
  (.       Acknowledgement Number (low bits)       |   Reserved    |)X=1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Service Code                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Options / Padding                       |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                       Application Data                        |
   |                              ...                              |

Acknowledgement Number: 24 or 48 bits
   The Acknowledgement Number field will generally equal the Sequence
   Number from the DCCP-Request.

Service Code: 32 bits
   Echoes the Service Code on the DCCP-Request.

5.4.
DCCP-Data, DCCP-Ack, and DCCP-DataAck Headers

The central data transfer portion of every DCCP connection uses
DCCP-Data, DCCP-Ack, and DCCP-DataAck packets. DCCP-Data packets carry
application data.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /             Generic DCCP Header (12 or 16 bytes)             /
   /                    with Type=2 (DCCP-Data)                    /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Options / Padding                       |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                       Application Data                        |
   |                              ...                              |

DCCP-Ack packets dispense with the data, but contain an
Acknowledgement Number. They are used for pure acknowledgements.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /             Generic DCCP Header (12 or 16 bytes)             /
   /                    with Type=3 (DCCP-Ack)                     /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |            Acknowledgement Number             |
  (+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+)when
  (.       Acknowledgement Number (low bits)       |   Reserved    |)X=1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Options / Padding                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

DCCP-DataAck packets carry both application data and an
Acknowledgement Number: acknowledgement information is piggybacked on
a data packet.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /             Generic DCCP Header (12 or 16 bytes)             /
   /                  with Type=4 (DCCP-DataAck)                   /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |            Acknowledgement Number             |
  (+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+)when
  (.
           Acknowledgement Number (low bits)       |   Reserved    |)X=1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Options / Padding                       |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                       Application Data                        |
   |                              ...                              |

DCCP-Data and DCCP-DataAck packets may contain zero application data
bytes if the application sends a zero-length datagram. Also, a DCCP-Ack
packet need not have a zero-length application data area. The receiver
MUST ignore any "application data" in a DCCP-Ack packet. The sender
will not generally send such data, but it may occasionally do so---to
perform PMTU discovery without risking loss of user data, for example.

DCCP-Ack and DCCP-DataAck packets often include additional
acknowledgement options, such as Ack Vector, as required by the
congestion control mechanism in use.

5.5. DCCP-CloseReq and DCCP-Close Headers

DCCP-CloseReq and DCCP-Close packets begin the handshake that normally
terminates a connection. Either client or server may send a DCCP-Close
packet, which will elicit a DCCP-Reset packet (see the next section).
Only the server can send a DCCP-CloseReq packet, which indicates that
the server wants to close the connection, but does not want to hold its
TIMEWAIT state.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /             Generic DCCP Header (12 or 16 bytes)             /
   /         with Type=5 (DCCP-CloseReq) or 6 (DCCP-Close)         /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |            Acknowledgement Number             |
  (+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+)when
  (.
           Acknowledgement Number (low bits)       |   Reserved    |)X=1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Options / Padding                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The receiver MUST ignore any "application data" in a DCCP-CloseReq or
DCCP-Close packet.

5.6. DCCP-Reset Header

DCCP-Reset packets unconditionally shut down a connection. Connections
normally terminate with a DCCP-Reset, but resets may be sent for other
reasons, including bad port numbers, bad option behavior, incorrect ECN
Nonce Echoes, and so forth.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /             Generic DCCP Header (12 or 16 bytes)             /
   /                   with Type=7 (DCCP-Reset)                    /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |            Acknowledgement Number             |
  (+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+)when
  (.       Acknowledgement Number (low bits)       |   Reserved    |)X=1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Reset Code   |    Data 1     |    Data 2     |    Data 3     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Options / Padding                       |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                          Error Text                           |
   |                              ...                              |

Reset Code: 8 bits
   Represents the reason that the sender reset the DCCP connection.

Data 1, Data 2, and Data 3: 8 bits each
   The Data fields provide additional information about why the sender
   reset the DCCP connection. The meanings of these fields depend on
   the value of Reset Code.

Error Text (application data area)
   If present, Error Text is a human-readable text string, preferably
   in English and encoded in Unicode UTF-8, that describes the error
   in more detail.
   For example, a DCCP-Reset with Reset Code 12, "Aggression Penalty",
   might contain Error Text such as "Aggression Penalty: Received 3
   bad ECN Nonce Echoes, assuming misbehavior".

The following Reset Codes are currently defined. The "Data" columns
describe what the Data fields contain for a given Code. N/A means the
Data field MUST be set to 0 by the sender of the DCCP-Reset and ignored
by its receiver.

   Reset                                                    Section
   Code     Name                 Data 1   Data 2   Data 3   Reference
   -----    ----                 ------   ------   ------   ---------
      0     Unspecified          N/A      N/A      N/A
      1     Closed               N/A      N/A      N/A      8.3
      2     Aborted              N/A      N/A      N/A      8.1.1
      3     No Connection        N/A      N/A      N/A      8.3.1
      4     Packet Error         packet   N/A      N/A      8.3.1
                                 type
      5     Option Error         option   option data
                                 number   (if any)
      6     Mandatory Error      option   option data       5.9.2
                                 number   (if any)
      7     Extended Seqnos      N/A      N/A      N/A      7.6
      8     Connection Refused   N/A      N/A      N/A      8.1.3
      9     Bad Service Code     N/A      N/A      N/A      8.1.3
     10     Too Busy             N/A      N/A      N/A      8.1.3
     11     Bad Init Cookie      N/A      N/A      N/A      8.1.4
     12     Aggression Penalty   N/A      N/A      N/A      12.2
     13     Move Refused         N/A      N/A      N/A      14.4
    14-127  Reserved
   128-255  CCID-specific codes  ... variable ...           10.4

5.7. DCCP-Move Header

The DCCP-Move packet type is part of DCCP's support for multihoming and
mobility, which is described further in Section 14. DCCP A sends a
DCCP-Move packet to DCCP B after changing its address and/or port
number. The DCCP-Move packet requests that DCCP B start sending packets
to a new address and port number, which are read off the packet's
network header and generic DCCP header. The old address and port are
defined through a Mobility ID, which provides some protection against
hijacked connections.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /             Generic DCCP Header (12 or 16 bytes)             /
   /                    with Type=8 (DCCP-Move)                    /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |            Acknowledgement Number             |
  (+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+)when
  (.       Acknowledgement Number (low bits)       |   Reserved    |)X=1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Mobility ID (high bits)                    .
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   .                   Mobility ID (bits 64-95)                    .
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   .                   Mobility ID (bits 32-63)                    .
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   .                    Mobility ID (low bits)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Options / Padding                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Mobility ID: 128 bits
   The value of the receiver's Mobility ID feature. This value
   uniquely identifies the current connection among the set of
   connections terminating at the receiver (meaning, the stationary
   endpoint); it MUST have been set in an earlier exchange. See
   Section 14.2.

The receiver MUST ignore any "application data" in a DCCP-Move packet.

5.8. DCCP-Sync and DCCP-SyncAck Headers

DCCP-Sync packets help DCCP endpoints recover synchronization after
bursts of loss, or recover from half-open connections. Each valid
DCCP-Sync received immediately elicits a DCCP-SyncAck.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /             Generic DCCP Header (12 or 16 bytes)             /
   /         with Type=9 (DCCP-Sync) or 10 (DCCP-SyncAck)          /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |            Acknowledgement Number             |
  (+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+)when
  (.       Acknowledgement Number (low bits)       |   Reserved    |)X=1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Options / Padding                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The Acknowledgement Number on DCCP-Sync and DCCP-SyncAck packets need
not equal the generating endpoint's greatest valid sequence number
received (GSR). This differs from Acknowledgement Numbers on all other
packet types. If a DCCP-Sync was generated in response to a packet with
invalid sequence numbers, then the DCCP-Sync's Acknowledgement Number
will equal the invalid packet's sequence number. The Acknowledgement
Number on any DCCP-SyncAck packet MUST correspond to a received, valid
DCCP-Sync's Sequence Number; in the presence of reordering, this might
not equal GSR.

The receiver MUST ignore any "application data" in a DCCP-Sync or
DCCP-SyncAck packet.

5.9. Options

All DCCP packets may contain options, which occupy space at the end of
the DCCP header. Each option is a multiple of 8 bits in length. The
combination of all options MUST add up to a multiple of 32 bits.
Individual options are not padded to multiples of 32 bits, however; any
option may begin on any byte boundary. All options are always included
in the checksum.

The first byte of an option is the option type. Options with types 0
through 31 are single-byte options. Other options are followed by a
byte indicating the option's length.
This length value includes the two bytes of option-type and
option-length as well as any option-data bytes, and must therefore be
greater than or equal to two.

Options are processed sequentially, starting at the first option in the
packet header.

The following options are currently defined:

            Option                             Section
    Type    Length    Meaning                  Reference
    ----    ------    -------                  ---------
      0        1      Padding                  5.9.1
      1        1      Mandatory                5.9.2
      2        1      Slow Receiver            11.6
     3-31      1      Reserved
     32     variable  Change L                 6.1
     33     variable  Confirm L                6.2
     34     variable  Change R                 6.1
     35     variable  Confirm R                6.2
     36     variable  Init Cookie              8.1.4
     37       4-5     NDP Count                7.7
     38     variable  Ack Vector [Nonce 0]     11.4
     39     variable  Ack Vector [Nonce 1]     11.4
     40     variable  Data Dropped             11.7
     41        6      Timestamp                13.1
     42      6-10     Timestamp Echo           13.3
     43       4-6     Elapsed Time             13.2
     44        4      Data Checksum            9.3
    45-127  variable  Reserved
   128-255  variable  CCID-specific options    10.4

This section describes two generic options, Padding and Mandatory.
Other options are described later.

5.9.1. Padding Option

The Padding option, with type 0, is a single byte option used to pad
between or after options. It either ensures the application data begins
on a 32-bit boundary (as required), or ensures alignment of following
options (not mandatory).

   +--------+
   |00000000|
   +--------+
    Type=0

5.9.2. Mandatory Option

The Mandatory option, with type 1, is a single byte option that
indicates that the immediately following option is mandatory. If the
receiving DCCP does not understand that following option, it MUST reset
the connection, generally using Reset Code 6, "Mandatory Error". For
instance, say DCCP A receives a packet with two options: a Mandatory
option, and immediately following, another option O. Then DCCP A would
reset the connection if it did not
understand O's type; if it understood O's type, but not O's data; if
O's data was invalid for O's type; if O was a feature negotiation
option, and DCCP A did not understand the enclosed feature number; if
DCCP A understood O, but chose not to perform the action O implies; and
so forth. Section 6.6.8 describes the behavior of Mandatory feature
negotiation options in more detail.

   +--------+
   |00000001|
   +--------+
    Type=1

6. Feature Negotiation

Four DCCP options, Change L, Confirm L, Change R, and Confirm R,
implement in-band feature negotiation. Change options initiate a
negotiation; Confirm options complete that negotiation. The "L" options
are sent by the feature location, and the "R" options are sent by the
feature remote. Change options are retransmitted to ensure reliability.

All these options have the same format. The first byte of option data
is the feature number, and the second and subsequent data bytes hold
one or more feature values. The feature values are generally arranged
in a linear preference list, where the first value is most preferred.

   +--------+--------+--------+--------+--------
   |  Type  | Length |Feature#| Value(s) ...
   +--------+--------+--------+--------+--------

Together, the feature number and the option type ("L" or "R") uniquely
identify the feature to which an option applies. The exact format of
the Value(s) area depends on the feature number.

6.1. Change Options

Change L and Change R options initiate feature negotiation. Either
endpoint can start a negotiation for any feature; if DCCP A wants to
start a negotiation for feature F/A, it will send a Change L option,
while to start a negotiation for F/B, it will send a Change R option.
Change options are retransmitted until some response is received.
Normal Change options contain at least one Value, and thus have length
at least 4.
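The common option layout (type, length, feature number, values) and the
single-byte rule of Section 5.9 can be sketched in Python as follows.
This is an illustration under stated assumptions, not part of the
specification: the function names encode_feature_option and
walk_options are hypothetical, and each feature value is passed in as
an already-encoded byte string.

```python
CHANGE_L, CONFIRM_L, CHANGE_R, CONFIRM_R = 32, 33, 34, 35


def encode_feature_option(opt_type, feature, values=()):
    """Build one feature negotiation option as bytes.

    values is a sequence of byte strings in preference order; an empty
    sequence yields an empty option of length 3.
    """
    if opt_type not in (CHANGE_L, CONFIRM_L, CHANGE_R, CONFIRM_R):
        raise ValueError("not a feature negotiation option type")
    data = bytes([feature]) + b"".join(values)
    length = 2 + len(data)  # counts the type and length bytes too
    if length > 255:
        raise ValueError("option too long for a one-byte length field")
    return bytes([opt_type, length]) + data


def walk_options(buf):
    """Split an options area into a list of (type, data) pairs."""
    options = []
    i = 0
    while i < len(buf):
        opt_type = buf[i]
        if opt_type <= 31:  # types 0 through 31 are single-byte options
            options.append((opt_type, b""))
            i += 1
            continue
        if i + 1 >= len(buf):
            raise ValueError("option is missing its length byte")
        length = buf[i + 1]
        if length < 2 or i + length > len(buf):
            raise ValueError("bad option length")
        options.append((opt_type, buf[i + 2:i + length]))
        i += length
    return options
```

For example, encode_feature_option(CHANGE_L, 1, [b"\x02", b"\x03"])
produces the bytes 32,5,1,2,3, matching the Change L(CCID, 2 3)
encoding given in Section 6.5.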
             +--------+--------+--------+--------+--------
   Change L: |00100000| Length |Feature#| Value(s) ...
             +--------+--------+--------+--------+--------
              Type=32

             +--------+--------+--------+--------+--------
   Change R: |00100010| Length |Feature#| Value(s) ...
             +--------+--------+--------+--------+--------
              Type=34

The endpoint may check a feature's current value without attempting to
change it by sending an empty Change option, containing just the
feature number. Such options have length 3. The endpoints must agree on
feature values anyway, so these options are useful in practice only in
special situations, such as when a middlebox introduced in the middle
of a connection wants to check a feature value.

6.2. Confirm Options

Confirm L and Confirm R options complete feature negotiation, and are
sent in response to Change R and Change L options, respectively.
Confirm options MUST NOT be generated except in response to Change
options. Confirm options need not be retransmitted, since Change
options are retransmitted as necessary. Normal Confirm options contain
the selected Value, possibly followed by the sender's preference list.

              +--------+--------+--------+--------+--------
   Confirm L: |00100001| Length |Feature#| Value(s) ...
              +--------+--------+--------+--------+--------
               Type=33

              +--------+--------+--------+--------+--------
   Confirm R: |00100011| Length |Feature#| Value(s) ...
              +--------+--------+--------+--------+--------
               Type=35

If an endpoint receives an invalid Change option -- with an unknown
feature number, or an invalid value -- it will respond with an empty
Confirm option containing no value. Such options have length 3.

6.3. Reconciliation Rules

Reconciliation rules determine how the two sets of preferences for a
given feature are resolved into a unique result. The reconciliation
rule depends only on the feature number.
Each reconciliation rule must have the property that the result is
uniquely determined given the contents of Change options sent by the
two endpoints.

All current DCCP features use one of two reconciliation rules,
server-priority ("SP") and non-negotiable ("NN").

6.3.1. Server-Priority

The feature value is a fixed-length byte string (length determined by
the feature number). Each Change option contains a preference list of
values, with the most preferred value coming first. Each Confirm option
contains the confirmed value, followed by the confirmer's preference
list. Thus, the feature's current value will generally appear twice in
Confirm options' data, once as the current value and once in the
confirmer's preference list. Even responses to empty Change options
contain the whole preference list.

To reconcile the preference lists, select the first entry in the
server's list that also occurs in the client's list. If there is no
shared entry, the feature's value MUST NOT change, and the Confirm
option will confirm the feature's previous value (unless the Change
option was Mandatory; see Section 6.6.8).

DCCP endpoints need not calculate their value preference lists before
feature negotiation begins. Thus, a server might adjust its preference
list based on the client's preference list, assuming the client opened
the negotiation. Once a negotiation for a feature has begun, however,
the preference lists MUST remain stable until the negotiation has
closed.

6.3.2. Non-Negotiable

The feature value is a byte string. Each option contains exactly one
feature value. The feature location signals a value change by sending
Change L options. The feature remote MUST accept any valid value,
responding with a Confirm R option containing the new value, and it
MUST send empty Confirm R options in response to invalid values.
Non-negotiable features aren't really negotiated; they use feature
negotiation as a mechanism for achieving reliability. Change R and
Confirm L options MUST NOT be sent for non-negotiable features.

6.4. Feature Numbers

This document defines the following feature numbers.

                                          Rec'n  Initial         Section
   Number   Meaning                       Rule    Value   Req'd  Reference
   ------   -------                       -----   -----   -----  ---------
      0     Reserved
      1     Congestion Control ID (CCID)   SP       2       Y    10
      2     ECN Capable                    SP       1       Y    12.1
      3     Sequence Window                NN     100       Y    7.5.4
      4     Sequence Transition Capable    SP       0       N    7.6.4
      5     Mobility Capable               SP       0       N    14.1
      6     Mobility ID                    NN       0       N    14.2
      7     Ack Ratio                      NN       2       N    11.3
      8     Send Ack Vector                SP       0       N    11.5
      9     Send NDP Count                 SP       0       N    7.7.2
     10     Check Data Checksum            SP       0       N    9.3.1
    11-127  Reserved
   128-255  CCID-specific features          ?       ?       ?    10.4

Rec'n Rule
   The reconciliation rule used for the feature. SP is server-priority
   and NN is non-negotiable.

Initial Value
   The initial value for the feature. Every feature has a known
   initial value.

Req'd
   This column is "Y" iff every DCCP implementation MUST understand
   the feature. If it is "N", then the feature behaves like an
   extension (see Section 16), and it is safe to respond to Change
   options for the feature with empty Confirm options. Of course, a
   CCID might require the feature; a DCCP that implements CCID 2 MUST
   support Ack Ratio and Send Ack Vector, for example.

6.5. Examples

Here are three example feature negotiations for features located at the
server, the first two for the Congestion Control ID feature, the last
for the Ack Ratio:

      Client                                  Server
   1. Change R(CCID, 2 3 1) -->
      ("2 3 1" is client's value preference list)
   2.                    <-- Confirm L(CCID, 3, 3 2 1)
      (3 is the negotiated value;
       "3 2 1" is server's pref list)
             * agreement that CCID/Server = 3 *

   1.                 XXX <-- Change L(CCID, 3 2 1)
   2.                 Retransmission:
                          <-- Change L(CCID, 3 2 1)
   3.
      Confirm R(CCID, 3, 2 3 1) -->
             * agreement that CCID/Server = 3 *

   1.                    <-- Change L(Ack Ratio, 3)
   2. Confirm R(Ack Ratio, 3) -->
             * agreement that Ack Ratio/Server = 3 *

This example shows a simultaneous negotiation.

       Client                                  Server
   1a. Change R(CCID, 2 3 1) -->
    b.                    <-- Change L(CCID, 3 2 1)
       (both endpoints in CHANGING)
   2a.                    <-- Confirm L(CCID, 3, 3 2 1)
    b. Confirm R(CCID, 3, 2 3 1) -->
       (both endpoints in STABLE)
             * agreement that CCID/Server = 3 *

Example Change and Confirm options follow, with their byte encodings.
Each option is sent by DCCP A.

Change L(CCID, 2 3) = 32,5,1,2,3
   I want to change CCID/A's value (feature number 1, a
   server-priority feature); my preferred values are 2 and 3, in that
   preference order.

Change L(Sequence Window, 1024) = 32,6,3,0,4,0
   Change Sequence Window/A's value (feature number 3, a
   non-negotiable feature) to the 3-byte string 0,4,0 (the value
   1024).

Empty Change L(CCID) = 32,3,1
   Tell me CCID/A's value using a Confirm R option.

Confirm L(CCID, 2, 2 3) = 33,6,1,2,2,3
   I've changed CCID/A's value to 2; my preferred values are 2 and 3,
   in that preference order.

Empty Confirm L(126) = 33,3,126
   I don't implement feature number 126, or your proposed value for
   feature 126/A was invalid.

Change R(CCID, 3 2) = 34,5,1,3,2
   Please change CCID/B's value; my preferred values are 3 and 2, in
   that preference order.

Empty Change R(CCID) = 34,3,1
   Tell me CCID/B's value using a Confirm L option.

Confirm R(CCID, 2, 3 2) = 35,6,1,2,3,2
   I've changed CCID/B's value to 2; my preferred values were 3 and 2,
   in that preference order.

Confirm R(Sequence Window, 1024) = 35,6,3,0,4,0
   I've changed Sequence Window/B's value to the 3-byte string 0,4,0
   (the value 1024).

Empty Confirm R(126) = 35,3,126
   I don't implement feature number 126, or your proposed value for
   feature 126/B was invalid.

6.6. Option Exchange

A few basic rules govern feature negotiation option exchange.

1.
Every non-reordered Change option gets a Confirm option in response. 2. Change options are retransmitted until some response is received. 3. Preference lists don't change during a negotiation. 4. Feature negotiation options are processed in strictly increasing order by Sequence Number. The rest of this section describes the consequences of these rules in more detail. 6.6.1. Normal Exchange Change options are generated when a DCCP endpoint wants to change the value of some feature. Generally, this will happen at the beginning of a connection, although it may happen at any time. We say the endpoint "generates" or "sends" a Change L or Change R option; but, of course, the option must be attached to a packet. The endpoint may attach the option to a packet it would have generated anyway (such as a DCCP-Request), or it may create a new packet just to carry the options (often a DCCP-Sync). If it does create a new packet, it MUST NOT create more than one such packet per round-trip time (or 0.2 seconds, if no RTT is available). On receiving a Change L or Change R option, a DCCP endpoint examines the included preference list, reconciles that with its own Kohler/Handley/Floyd Section 6.6.1. [Page 36] INTERNET-DRAFT Expires: August 2004 February 2004 preference list, calculates the new value, and sends back a Confirm R or Confirm L option, respectively, informing its partner of the new value. The rule for reconciling the two preference lists is feature-specific; see Section 6.3. Every non-reordered Change option MUST result in a corresponding Confirm option. Any packet including a Confirm option MUST carry an Acknowledgement Number; thus, Confirm options are not allowed on DCCP-Request and DCCP-Data packets. Again, generated Confirm options may be attached to packets that would have been sent anyway (such as DCCP-Response or DCCP-SyncAck), or to new packets (usually DCCP-Ack). 
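
As an illustration of the exchange described above, the following Python sketch builds feature negotiation options using the byte layout shown in Section 6.5 (option type, option length, feature number, then value bytes) and reconciles two preference lists. All function and constant names are hypothetical. The server-priority rule itself is defined in Section 6.3, outside this section; the sketch assumes it selects the first value in the server's list that also appears in the client's list, which agrees with the examples in Section 6.5.

```python
from typing import List, Optional, Sequence

# Option type numbers, as used in the Section 6.5 byte encodings.
CHANGE_L, CONFIRM_L, CHANGE_R, CONFIRM_R = 32, 33, 34, 35

# Feature numbers from Section 6.4.
CCID = 1
SEQUENCE_WINDOW = 3


def encode_option(option_type: int, feature: int,
                  value: Sequence[int] = ()) -> List[int]:
    """Return the option as byte values: type, length, feature number, value."""
    length = 3 + len(value)  # an empty option is exactly 3 bytes long
    return [option_type, length, feature, *value]


def reconcile_server_priority(server_prefs: Sequence[int],
                              client_prefs: Sequence[int]) -> Optional[int]:
    """Assumed server-priority rule: first server preference the client lists."""
    for candidate in server_prefs:
        if candidate in client_prefs:
            return candidate
    return None  # no shared entry in the two lists


def confirm_for_change_r(feature: int, server_prefs: Sequence[int],
                         client_prefs: Sequence[int]) -> List[int]:
    """Build the server's Confirm L in response to a client's Change R."""
    chosen = reconcile_server_priority(server_prefs, client_prefs)
    if chosen is None:
        # Handling of this case is outside the scope of this sketch.
        raise ValueError("no shared preference")
    # Confirm carries the chosen value followed by the sender's preference list.
    return encode_option(CONFIRM_L, feature, [chosen, *server_prefs])
```

For instance, `confirm_for_change_r(CCID, [3, 2, 1], [2, 3, 1])` corresponds to the "Confirm L(CCID, 3, 3 2 1)" reply in the first example of Section 6.5.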
The Change-sending endpoint MUST wait to receive a corresponding Confirm option before changing its stored feature value. The Confirm-sending endpoint changes its stored feature value as soon as it sends the Confirm.

DCCP endpoints effectively exist in one of two states, STABLE and CHANGING, relative to each feature. STABLE is the normal state, where the endpoint knows the feature's value and thinks the other endpoint agrees. An endpoint enters the CHANGING state when it first sends a Change for the feature, and returns to STABLE once it receives a corresponding Confirm.

6.6.2. Loss and Retransmission

Packets containing Change and Confirm options might be lost or delayed by the network. Therefore, Change options are retransmitted to achieve reliability. A CHANGING endpoint retransmits a Change option once it realizes that it has not heard back from the other endpoint. Each retransmitted Change option MUST contain exactly the same payload as the original.

The endpoint may piggyback its Change options on packets it would have sent anyway. If it generates new packets for feature negotiation, it MUST use an exponential-backoff timer. The timer's initial value is set to approximately one or two round-trip times (or 0.2-0.4 seconds, if no RTT is available), and it is pinned at roughly 32 RTTs.

A CHANGING endpoint MUST continue retransmitting Change options until it gets some response. Its only recourse is to reset the connection, which it SHOULD NOT do until at least 12 transmissions have failed.

Change options SHOULD NOT be transmitted more frequently than once per RTT, or the reordering protection below would prevent any Confirm option from being accepted (since no Confirm would acknowledge the most recently transmitted Change).

Confirm options are never retransmitted, but the Confirm-sending endpoint MUST generate a new Confirm option for every non-reordered Change it receives.

6.6.3. Reordering

Reordering might cause packets containing Change and Confirm options to arrive in an unexpected order. Endpoints MUST be robust to reordering, by ignoring feature negotiation options that do not arrive in strictly-increasing order by Sequence Number. The most straightforward way to implement this requirement is for an endpoint to associate two sequence number variables with every feature F/X, as follows.

F/X.GSR
   The Greatest Sequence Number Received from the other endpoint on a packet containing a Change or Confirm option for feature F/X.

F/X.GSS
   The Greatest Sequence Number Sent by this endpoint on a packet containing a Change option for feature F/X.

Then DCCP A will check options relating to feature F/A as follows:

   1. Ignore any received Change R(F) option whose packet's Sequence Number is not greater than F/A.GSR.

   2. Ignore any received Confirm R(F) option whose packet's Sequence Number is not greater than F/A.GSR, or whose packet could not have acknowledged F/A.GSS. Specifically, if the Acknowledgement Number is less than F/A.GSS, the endpoint MUST ignore the Confirm; and if the packet has an Ack Vector indicating that F/A.GSS was not received, the endpoint MAY ignore the Confirm.

A similar procedure applies to options relating to feature F/B, namely Change L(F) and Confirm L(F), except that F/B.GSR and F/B.GSS are checked.

A less state-intensive way to implement this requirement would be to share the F.GSR and F.GSS variables among all features, rather than keeping one pair per feature. Then the feature negotiation options on any received packet would be treated as a unit (either all accepted or all rejected).
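
The per-feature checks above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the class and function names are hypothetical, 24-bit sequence numbers are assumed, and the circular "greater than" comparison uses a half-space rule that this document does not itself prescribe. The sketch also updates F.GSR whenever an option is accepted, following the definition of F/X.GSR given above.

```python
from dataclasses import dataclass
from typing import Optional

SEQ_MOD = 1 << 24  # assumption: 24-bit (non-extended) sequence numbers


def seq_after(a: int, b: int) -> bool:
    """True if a is later than b in circular space (assumed half-space rule)."""
    return a != b and (a - b) % SEQ_MOD < SEQ_MOD // 2


@dataclass
class FeatureSeqState:
    gsr: Optional[int] = None  # F.GSR: greatest seqno received with an option for F
    gss: Optional[int] = None  # F.GSS: greatest seqno sent carrying a Change for F


def accept_change(state: FeatureSeqState, seqno: int) -> bool:
    """Rule 1: ignore a Change whose seqno is not greater than F.GSR."""
    if state.gsr is not None and not seq_after(seqno, state.gsr):
        return False
    state.gsr = seqno
    return True


def accept_confirm(state: FeatureSeqState, seqno: int, ackno: int,
                   ack_vector_reports_loss: bool = False) -> bool:
    """Rule 2: the Confirm must be new and must be able to acknowledge F.GSS."""
    if state.gsr is not None and not seq_after(seqno, state.gsr):
        return False
    if state.gss is None or seq_after(state.gss, ackno):
        return False  # ackno is less than F.GSS: MUST ignore
    if ack_vector_reports_loss:
        return False  # Ack Vector says F.GSS was not received: MAY ignore
    state.gsr = seqno
    return True
```

Sharing one `FeatureSeqState` among all features gives the less state-intensive variant described above.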
Checking Confirm options is easier if the endpoint only sends Change options on packet types that will be acknowledged immediately, namely DCCP-Request, DCCP-Response, and DCCP-Sync. Then there is never any need to check Ack Vectors, although checking Ack Vectors is NOT MANDATORY anyway.

6.6.4. Preference Changes

Endpoints MUST NOT change their preference lists in the middle of a negotiation. This is because, if a preference list changed in the middle of a negotiation and the right packets were lost, the negotiation could terminate with the endpoints thinking the feature had different values. In particular, an endpoint MUST NOT change its preference list while in the CHANGING state; this ensures that every Change option sent during that negotiation will contain the same data.

6.6.5. Simultaneous Negotiation

The two endpoints might simultaneously open negotiation for the same feature, after which an endpoint in the CHANGING state will receive a Change option for the same feature. Such received Change options can act as responses to the original Change options. The CHANGING endpoint MUST examine the received Change's preference list, reconcile that with its own preference list (as expressed in its generated Change options), and generate the corresponding Confirm option. It can then transition to the STABLE state.

6.6.6. Unknown Features

An endpoint may receive a Change option referring to some feature number it does not understand. This is particularly likely to happen when an extended DCCP converses with a non-extended DCCP. The receiving endpoint MUST respond to such Change options with corresponding empty Confirm options (that is, Confirm options containing no data), which inform the CHANGING endpoint that the feature was not understood. However, if the Change option was preceded by a Mandatory option, the connection MUST be reset; see Section 6.6.8.
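
A receiver's reaction to a Change option for an unknown feature might be sketched as below. The function name and the tuple-based return convention are hypothetical; the option type numbers come from the encodings in Section 6.5, and Reset Code 6 ("Mandatory Failure") from Section 6.6.8.

```python
from typing import List, Tuple, Union

# Option type numbers from the Section 6.5 encodings.
CHANGE_L, CONFIRM_L, CHANGE_R, CONFIRM_R = 32, 33, 34, 35

RESET_MANDATORY_FAILURE = 6  # Reset Code 6, "Mandatory Failure"

# A Change L is answered with a Confirm R, and a Change R with a Confirm L.
CONFIRM_FOR = {CHANGE_L: CONFIRM_R, CHANGE_R: CONFIRM_L}


def respond_to_unknown_change(option_type: int, feature: int,
                              mandatory: bool) -> Tuple[str, Union[int, List[int]]]:
    """Return ("reset", code) or ("confirm", option bytes) for an unknown feature."""
    if mandatory:
        # The Change was preceded by a Mandatory option: reset the connection.
        return ("reset", RESET_MANDATORY_FAILURE)
    # Otherwise reply with an empty Confirm: type, length 3, feature number.
    return ("confirm", [CONFIRM_FOR[option_type], 3, feature])
```

For example, an unknown "Change L(126, ...)" yields the "Empty Confirm R(126) = 35,3,126" encoding listed in Section 6.5.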
On receiving an empty Confirm option for some feature, the CHANGING endpoint MUST transition back to the STABLE state, leaving the feature's value unchanged. Section 16 suggests that the default value for any extension feature should correspond to "extension not available".

An endpoint will also send an empty Confirm option when it understood the Change's feature number, but considered the Change's value invalid or inappropriate for the feature. The next section describes this further.

Some features are required to be understood by all DCCPs (see Section 6.4); the CHANGING endpoint SHOULD reset the connection (with Reset Code 5, "Option Error") if it receives an empty Confirm option for such a feature.

Since Confirm options are generated only in response to Change options, an endpoint should never receive a Confirm option referring to a feature number it does not understand. Endpoints MUST either reset the connection on receiving such options, or just ignore the options.

6.6.7. Invalid Options

A DCCP endpoint might receive a Change or Confirm option that lists one or more values that it does not understand. Some, but not all, such options are invalid, depending on the relevant reconciliation rule (Section 6.3). For instance:

   o  All features have length limitations, and options with invalid lengths are invalid. For example, the Mobility ID feature takes 128-bit values, so valid "Confirm R(Mobility ID)" options have option length 19.

   o  Some non-negotiable features have value limitations. The Ack Ratio feature takes two-byte, non-zero integer values, so a "Change L(Ack Ratio, 0)" option is never valid. Note that server-priority features do not have value limitations, since unknown values are handled as a matter of course.

   o  Any Confirm option that selects the wrong value, based on the two preference lists and the relevant reconciliation rule, is invalid.
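
The length and value limitations in the list above might be checked as in the following Python sketch. The function name is hypothetical, and only the features whose limits are stated in this document are covered: Mobility ID (128-bit values), Ack Ratio (two-byte, non-zero values), and Sequence Window (3- or 6-byte values, per Section 7.5.4). Checking whether a Confirm selects the right value requires the reconciliation rule and is not attempted here.

```python
# Feature numbers from Section 6.4.
SEQUENCE_WINDOW = 3
MOBILITY_ID = 6
ACK_RATIO = 7


def value_is_valid(feature: int, value: bytes) -> bool:
    """Check the value bytes of a non-empty Change or Confirm option."""
    if feature == MOBILITY_ID:
        return len(value) == 16                   # 128-bit values only
    if feature == ACK_RATIO:
        return len(value) == 2 and any(value)     # two bytes, not all zero
    if feature == SEQUENCE_WINDOW:
        return len(value) in (3, 6)               # 3- or 6-byte integers
    # Other features are not judged by this sketch.
    return True


# Option length = 3 header bytes (type, length, feature number) + value bytes,
# so a valid Mobility ID option has length 3 + 16 = 19.
MOBILITY_ID_OPTION_LENGTH = 3 + 16
```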
An endpoint receiving an invalid Change option MUST respond with the corresponding empty Confirm option. An endpoint receiving an invalid Confirm option MUST reset the connection, with Reset Code 5, "Option Error".

6.6.8. Mandatory Feature Negotiation

Change options may be preceded by Mandatory options (Section 5.9.2). Mandatory Change options are processed like normal Change options, except that various failure cases will cause the receiver to reset the connection with Reset Code 6, "Mandatory Failure", rather than send a Confirm option. Specifically, the connection MUST be reset if:

   o  The Change option's feature number was not understood;

   o  The Change option's value was invalid, and the receiver would normally have sent an empty Confirm option in response; or

   o  For server-priority features, there was no shared entry in the two endpoints' preference lists.

There's no reason to mark Confirm options as Mandatory in this version of DCCP, since Confirm options are sent only in response to Change options and therefore can't mention potentially-invalid values or unexpected feature numbers.

6.6.9. Out-of-Band Agreement

An endpoint MUST NOT unilaterally change the value of any DCCP feature. However, endpoints MAY cooperatively change DCCP feature values without using in-band feature negotiation options---by using a separate signalling channel, for example.

6.6.10. State Diagram

This diagram illustrates feature-related state transitions, ignoring sequence number and option validity issues, for the endpoint that is the feature location. For a feature remote state transition diagram, switch the "L"s and "R"s.

    rcv Confirm R       app/protocol evt : snd Change L
    : ignore   +--------------------------------------------+
      +----+   |                                            |
      |    v   |                    rcv Change R            v
    +------------+  rcv Confirm R   : calc new value,  +------------+
    |            |  : accept value  snd Confirm L      |            |
    |   STABLE   |<------------------------------------|  CHANGING  |
    |            |        rcv empty Confirm R          |            |
    +------------+        : revert to old value        +------------+
        |    ^                                             |    ^
        +----+                                             +----+
    rcv Change R                                  timeout/rcv non-ack
    : calc new value, snd Confirm L               : snd Change L

This state diagram corresponds to the following procedure for reacting to received packets with feature negotiation options. The procedure refers to "P.seqno", "P.ackno", "P.optiontype", and "P.optionlen", which are properties of the packet; "F.GSR" and "F.GSS", which are the variables mentioned in Section 6.6.3; "F.state", which is the feature's state (STABLE or CHANGING); and "F.value", which is the feature's value.

   If F.state == STABLE:
      If P.optiontype == Change R && P.seqno > F.GSR:
         Calculate new value
         Send Confirm L on next packet
         F.GSR := P.seqno
      Otherwise:
         Ignore option

   If F.state == CHANGING:
      If P.optiontype == Confirm R && P.ackno >= F.GSS
            && P potentially acknowledges F.GSS:
         If P.optionlen == 3:   /* empty Confirm R option */
            Retain old value
         Otherwise:
            Check new value
            F.value := new value
         F.state := STABLE
      Otherwise, if P.optiontype == Change R && P.seqno > F.GSR:
         Calculate new value
         Send Confirm L on next packet
         F.GSR := P.seqno
      Otherwise:
         Ignore option

7. Sequence Numbers

DCCP uses 24- or 48-bit sequence numbers to arrange packets into sequence, detect losses and network duplicates, and protect against attackers, half-open connections, and the delivery of very old packets. Every packet carries a Sequence Number; most packet types carry an Acknowledgement Number as well.

DCCP sequence numbers are per-packet.
Thus, each endpoint increments the DCCP Sequence Number field by one (modulo 2^24 or 2^48) with every packet sent. Even DCCP-Ack and DCCP-Sync packets, and other packets that don't carry user data, increment the Sequence Number. Since DCCP is an unreliable protocol, there are no true retransmissions; but effective retransmissions, such as retransmissions of DCCP-Request packets, also increment the Sequence Number. This lets DCCP implementations detect network duplication, retransmissions, and acknowledgement loss, and is a significant departure from TCP practice.

7.1. Variables

DCCP endpoints maintain a set of sequence number variables for each connection.

ISS
   The Initial Sequence Number Sent by this endpoint. This equals the Sequence Number of the first DCCP-Request or DCCP-Response sent.

ISR
   The Initial Sequence Number Received from the other endpoint. This equals the Sequence Number of the first DCCP-Request or DCCP-Response received.

GSS
   The Greatest Sequence Number Sent by this endpoint. ("Greatest" is of course measured in circular sequence space.)

GSR
   The Greatest Sequence Number Received from the other endpoint on an acknowledgeable packet. (Section 7.4 defines "acknowledgeable" packets.)

GAR
   The Greatest Acknowledgement Number Received from the other endpoint on an acknowledgeable packet.

Some other variables are derived from these primitives.

SWL and SWH (Sequence Number Window Low and High)
   The extremes of the validity window for received packets' Sequence Numbers.

AWL and AWH (Acknowledgement Number Window Low and High)
   The extremes of the validity window for received packets' Acknowledgement Numbers.

7.2. Initial Sequence Numbers

The endpoints' initial sequence numbers are set by the first DCCP-Request and DCCP-Response packets sent.
Initial sequence numbers MUST be chosen to avoid two problems:

   o  Delivery of old packets, where packets lingering in the network from an old connection are delivered to a new connection with the same addresses and port numbers.

   o  Sequence number attacks, where an attacker can guess the sequence numbers that a future connection would use [M85].

DCCP implementations may use TCP's strategies for avoiding these problems [RFC 793] [RFC 1948].

To address the first problem, an implementation MUST ensure that the initial sequence number for a given 4-tuple doesn't overlap with recent sequence numbers on connections with the same 4-tuple ("recent" meaning sent within 2 maximum segment lifetimes). If the implementation has state for a recent connection with the same 4-tuple, it can simply pick a good initial sequence number; otherwise, it could tie initial sequence number selection to some clock, such as the 4-microsecond clock used by TCP [RFC 793].

To address the second problem, an implementation MUST provide each 4-tuple with an independent initial sequence number space; then an attacker can't learn anything about anyone else's initial sequence numbers. RFC 1948 achieves this by adding a cryptographic hash, of the 4-tuple and a secret, to any initial sequence number. For the secret, RFC 1948 recommends a combination of some truly-random data [RFC 1750], an administratively-installed passphrase, the endpoint's IP address, and the endpoint's boot time, but truly-random data is sufficient. Care should be taken when changing the secret; such a change alters all initial sequence number spaces, which might make an initial sequence number for some 4-tuple equal a recently sent sequence number for the same 4-tuple. To avoid this problem around such a change, the endpoint might remember dead connection state for each 4-tuple or stay quiet for 2 maximum segment lifetimes.

7.3. Quiet Time

DCCP endpoints, like TCP endpoints, must take care before initiating connections when they boot. In particular, they MUST NOT send packets whose sequence numbers are close to the sequence numbers of packets lingering in the network from before the boot. The simplest way to enforce this rule is for DCCP endpoints to avoid sending any packets until one maximum segment lifetime (2 minutes) after boot. Other enforcement mechanisms include remembering recent sequence numbers across boots, or reserving the upper 8 or so bits of initial sequence numbers for a persistent boot counter that decrements by two each boot (this would require the use of extended sequence numbers).

7.4. Acknowledgement Numbers

DCCP has no cumulative acknowledgement field; cumulative acknowledgements would be meaningless in an unreliable protocol. Therefore, the Acknowledgement Number field has a different meaning in DCCP than in TCP.

A packet is classified as "acknowledgeable" if and only if its options were processed by the receiving DCCP. This means, for example, that all acknowledgeable packets have valid header checksums and sequence numbers. The Acknowledgement Number for most packet types MUST equal GSR, the Greatest Sequence Number Received on an acknowledgeable packet.

Note that "acknowledgeable" refers to option processing, not data processing. Even acknowledgeable packets may have their application data dropped, due to receive buffer overflow or corruption, for instance. Data Dropped options report these data losses when necessary, letting congestion control mechanisms distinguish between network losses and endpoint losses. This issue is discussed further in Sections 11.4 and 11.7.

DCCP-Sync and DCCP-SyncAck packets are exceptions to this rule.
The Acknowledgement Number on a DCCP-Sync packet corresponds to a received packet, but not necessarily an acknowledgeable packet; in particular, it might correspond to an out-of-sync packet whose options were not processed. The Acknowledgement Number on a DCCP-SyncAck packet always corresponds to an acknowledgeable DCCP-Sync packet; if there was reordering, that Acknowledgement Number might be less than GSR.

7.5. Validity and Synchronization

Any DCCP endpoint might receive packets that are not actually part of the current connection. For instance, the network might deliver an old packet, an attacker might attempt to hijack a connection, or the other endpoint might crash, causing a half-open connection.

DCCP, like TCP, uses sequence number checks to detect these cases. Packets whose Sequence and/or Acknowledgement Numbers are out of range are called sequence-invalid, and are not processed normally.

Unlike TCP, DCCP requires a synchronization mechanism to recover from large bursts of loss. One endpoint might send so many packets during a burst of loss that when one of its packets finally got through, the other endpoint would label its Sequence Number as invalid. A handshake involving DCCP-Sync and DCCP-SyncAck packets recovers from this case.

7.5.1. Sequence-Validity Rules

Sequence-validity depends on the received packet's type. This table shows the sequence and acknowledgement number checks applied to each packet; a packet is sequence-valid if it passes both tests, and sequence-invalid if it does not. Many of the checks refer to the sequence and acknowledgement number windows, [SWL, SWH] and [AWL, AWH], defined below in Section 7.5.3.

                                                 Acknowledgement Number
   Packet Type      Sequence Number Check        Check
   -----------      ---------------------        ----------------------
   DCCP-Request     SWL <= seqno <= SWH (*)      N/A
   DCCP-Response    SWL <= seqno <= SWH (*)      AWL <= ackno <= AWH
   DCCP-Data        SWL <= seqno <= SWH          N/A
   DCCP-Ack         SWL <= seqno <= SWH          AWL <= ackno <= AWH
   DCCP-DataAck     SWL <= seqno <= SWH          AWL <= ackno <= AWH
   DCCP-CloseReq    SWL <= seqno <= SWH          AWL <= ackno <= AWH
   DCCP-Close       SWL <= seqno <= SWH          AWL <= ackno <= AWH
   DCCP-Reset       seqno == 0 or seqno > GSR    GAR <= ackno <= AWH
   DCCP-Move        seqno >= SWL                 ISS <= ackno <= AWH
   DCCP-Sync        seqno >= SWL                 AWL <= ackno <= AWH
   DCCP-SyncAck     seqno >= SWL                 AWL <= ackno <= AWH

   (*) Check not applied if connection is in LISTEN or REQUEST state.

In general, packets are sequence-valid if their Sequence and Acknowledgement Numbers lie within the corresponding valid windows, [SWL, SWH] and [AWL, AWH]. The exceptions to this rule are as follows:

   o  DCCP-Reset Sequence Numbers may be zero. This is because during the cleanup of a half-open connection, an endpoint might generate a DCCP-Reset in response to a DCCP-Request or DCCP-Data packet with no Acknowledgement Number; the resetting endpoint would then use zero for the Reset's Sequence Number, since it has no valid Sequence Number available.

      DCCP-Reset Acknowledgement Numbers, and non-zero Sequence Numbers, are checked more stringently than those on other packet types, however. This is because DCCP-Reset always ends a connection: no endpoint will send a non-Reset packet on a connection after it has sent a Reset. Thus, a Reset packet whose Sequence Number is less than GSR, or whose Acknowledgement Number is less than GAR, must be sequence-invalid.

   o  DCCP-Move Sequence and Acknowledgement Numbers are not strongly checked because moves are likely to happen after long loss periods, and the mandatory Mobility ID provides good protection against unexpected packets.
   o  DCCP-Sync and DCCP-SyncAck Sequence Numbers are not strongly checked. These packet types exist specifically to get the endpoints back into sync after bursts of loss; checking their Sequence Numbers would eliminate their usefulness.

These lenient checks all allow continued operation after unusual events, such as endpoint crashes and large bursts of loss. There's no need for leniency when the endpoints are actively sending packets to one another. Therefore, a DCCP endpoint SHOULD implement the following, tighter constraints for active connections. An endpoint considers a connection active if it has received valid packets from the other endpoint within the last several round-trip times, or 1 second, if the RTT is not known.

                                                 Acknowledgement Number
   Packet Type      Sequence Number Check        Check
   -----------      ---------------------        ----------------------
   DCCP-Reset       GSR < seqno <= SWH           GAR <= ackno <= AWH
   DCCP-Move        SWL <= seqno <= SWH          AWL <= ackno <= AWH
   DCCP-Sync        SWL <= seqno <= SWH          AWL <= ackno <= AWH
   DCCP-SyncAck     SWL <= seqno <= SWH          AWL <= ackno <= AWH

Note that sequence-validity is only one of the validity checks applied to received packets.

7.5.2. Handling Sequence-Invalid Packets

Sequence-invalid DCCP-Move, DCCP-Reset, DCCP-Sync, and DCCP-SyncAck packets MUST be ignored.

When DCCP A receives any other sequence-invalid packet, it MUST reply with a DCCP-Sync packet. This packet MUST acknowledge the packet's Sequence Number (not GSR!). The DCCP-Sync MUST use a new Sequence Number, and thus will increase GSS; GSR will not change, however, since the received packet was sequence-invalid. DCCP A MUST NOT otherwise process sequence-invalid packets. For instance, it MUST NOT process their options.

When the DCCP B endpoint receives the (sequence-valid) DCCP-Sync, it MUST update its GSR variable and reply with a DCCP-SyncAck packet acknowledging the DCCP-Sync (not necessarily GSR!).
Upon receiving this DCCP-SyncAck, which will be sequence-valid since it acknowledges the DCCP-Sync, DCCP A will update its GSR variable, and the endpoints will be back in sync. Alternatively, if the connection was half-open (DCCP B is in CLOSED or REQUEST state), DCCP B will send a Reset.

A DCCP endpoint MAY temporarily preserve sequence-invalid packets in case they become valid later. This can reduce the impact of bursts of loss by delivering more packets to the application. In particular, an endpoint MAY preserve a sequence-invalid packet for up to 2 round-trip times (or 1 second, if the RTT is unknown); if, within that time, the relevant sequence windows change so that the packet becomes sequence-valid, the endpoint MAY process the packet again.

To protect itself against denial-of-service attacks (where an attacker sends many sequence-invalid packets, trying to force the receiver to send many DCCP-Syncs), a DCCP implementation MAY rate-limit the DCCP-Syncs sent in response to sequence-invalid packets.

7.5.3. Sequence and Acknowledgement Number Windows

Each DCCP endpoint defines sequence validity windows that are subsets of the Sequence and Acknowledgement Number spaces. These windows correspond to packets the endpoint expects to receive in the next few round-trip times. The Sequence and Acknowledgement Number windows always contain GSR and GSS, respectively; the window widths are controlled by Sequence Window features.

The Sequence Number validity window for packets from DCCP B is [SWL, SWH]. This window always contains GSR, the Greatest Sequence Number Received on a sequence-valid packet from DCCP B. It is W packets wide, where W is the value of the Sequence Window/B feature. One-fourth of the sequence window, rounded down, is placed at and before GSR, with three-fourths after GSR.
(This asymmetric placement assumes that bursts of loss are more common in the network than significant reordering.)

      invalid  |       valid Sequence Numbers        |  invalid
    <---------*|*===========*=======================*|*--------->
          GSR -|GSR + 1 -  GSR                  GSR +|GSR + 1 +
     floor(W/4)|floor(W/4)                 ceil(3W/4)|ceil(3W/4)
                = SWL                           = SWH

The Acknowledgement Number validity window for packets from DCCP B is [AWL, AWH]. The high end of the window, AWH, always equals GSS, the Greatest Sequence Number Sent by DCCP A; the window is W' packets wide, where W' is the value of the Sequence Window/A feature.

      invalid  |    valid Acknowledgement Numbers    |  invalid
    <---------*|*===================================*|*--------->
       GSS - W'|GSS + 1 - W'                      GSS|GSS + 1
                = AWL                           = AWH

SWL and AWL are initially adjusted so that they don't go below the initial Sequence Numbers received and sent, respectively:

   SWL := max(GSR + 1 - floor(W/4), ISR),
   AWL := max(GSS - W' + 1, ISS).

Of course, these adjustments MUST NOT be applied after the relevant sequence numbers wrap.

7.5.4. Sequence Window Feature

The Sequence Window/A feature determines the width of the Sequence Number validity window used by DCCP B, and the width of the Acknowledgement Number validity window used by DCCP A. DCCP A sends a "Change L(Sequence Window, W)" option to notify DCCP B that the Sequence Window/A value is W.

Sequence Window has feature number 3, and is non-negotiable. It takes 3- or 6-byte integer values, like DCCP sequence numbers. Change and Confirm options for Sequence Window are therefore either 6 or 9 bytes long. New connections start with Sequence Window 100 for both endpoints.

A proper Sequence Window/A value should reflect how many packets DCCP A expects to be in flight. Only DCCP A can anticipate this number.
Too-small values increase the risk of the endpoints getting out of sync after bursts of loss; too-large values increase the risk of connection hijacking. (The next section quantifies this risk.) One good guideline is for each endpoint to set Sequence Window to a small multiple of the maximum number of packets it expects to send in a round-trip time. This value may not be available at connection initiation, when the round-trip time is unknown, but the endpoint can always send updates as the connection progresses.

7.5.5. Sequence Number Attacks

Sequence and Acknowledgement Numbers form DCCP's main line of defense against attackers. An attacker that cannot guess sequence numbers cannot easily manipulate or hijack a DCCP connection, and requirements like careful initial sequence number choice eliminate the most serious attacks.

An attacker might still send many packets with randomly chosen Sequence and Acknowledgement Numbers, however. If one of those probes ends up sequence-valid, it may shut down the connection or otherwise cause problems. The easiest such attacks to execute are:

   o  Send DCCP-Sync packets with random Sequence and Acknowledgement Numbers. If one of these packets hits the valid acknowledgement number window, the receiver will shift its sequence number window accordingly, getting out of sync with the correct endpoint---perhaps permanently.

   o  Send DCCP-Reset packets with Sequence Number zero and random Acknowledgement Numbers. If one of these packets hits the valid acknowledgement number window, the connection will be shut down.

   o  Send DCCP-Data packets with random Sequence Numbers. If one of these packets hits the valid sequence number window, the attack packet's application data may be inserted into the data stream.

The attacker has to guess both Source and Destination Ports for any of these attacks to succeed.
Additionally, the connection would have to be inactive for the DCCP-Sync and DCCP-Reset packets to succeed, assuming the victim implemented the more stringent checks for active connections recommended in Section 7.5.1.

To quantify the probability of success, let N be the number of attack packets the attacker is willing to send, W be the relevant sequence window width, and L be the length of sequence numbers in bits (24 or 48). The attacker's best strategy is to space the attack packets evenly over sequence space. Then one of these attacks will succeed with probability P = WN/2^L. For N = 1000, W = 100, and L = 24, this probability is about 0.006. (For reference, the easiest TCP attack---sending a SYN with a random sequence number, which will cause a connection reset if it falls within the window---will succeed with probability 0.002 for N = 1000, W = 8760 [a common default], and L = 32.) Connections with sequence windows much larger than 100 SHOULD use extended sequence numbers to reduce the probability of attack success.

7.5.6. Examples

In the following example, DCCP A and DCCP B recover from a large burst of loss that runs DCCP A's sequence numbers out of DCCP B's appropriate sequence number window.

                    Recovery from Burst of Loss

   DCCP A                                               DCCP B
   (GSS=1,GSR=10)                                       (GSS=10,GSR=1)
             -->  DCCP-Data(seq 2)               XXX
                  ...
             -->  DCCP-Data(seq 100)             XXX
             -->  DCCP-Data(seq 101)             -->    ???
                                                        seqno out of range;
                                                        send Sync
      OK     <--  DCCP-Sync(seq 11, ack 101)     <--
                                                        (GSS=11,GSR=1)
             -->  DCCP-SyncAck(seq 102, ack 11)  -->    OK
   (GSS=102,GSR=11)                                     (GSS=11,GSR=102)

In the next example, a DCCP connection recovers from a simple attack. The attacker cannot guess sequence numbers. (DCCP is not robust to attackers who can guess sequence numbers.)

                       Recovery from Attack

   DCCP A                                               DCCP B
   (GSS=1,GSR=10)                                       (GSS=10,GSR=1)
       *ATTACKER* -->  DCCP-Data(seq 10^6)       -->    ???
                                                        seqno out of range;
                                                        send Sync
      ???    <--  DCCP-Sync(seq 11, ack 10^6)    <--
   ackno out of range; ignore
   (GSS=1,GSR=10)                                       (GSS=11,GSR=1)

The final example demonstrates recovery from a half-open connection.

                Recovery from a Half-Open Connection

   DCCP A                                               DCCP B
   (GSS=1,GSR=10)                                       (GSS=10,GSR=1)
   (Crash)
   CLOSED                                               OPEN
   REQUEST   -->  DCCP-Request(seq 400)          -->    ???
      !!     <--  DCCP-Sync(seq 11, ack 400)     <--    OPEN
   REQUEST   -->  DCCP-Reset(seq 401, ack 11)    -->    (Abort)
   REQUEST                                              CLOSED
   REQUEST   -->  DCCP-Request(seq 402)          -->    ...

7.6. Extended Sequence Numbers

Extended 48-bit sequence numbers increase the rate DCCP connections can achieve without wrapping sequence numbers, and provide additional protection against the sequence number attacks described above. Very-high-rate DCCP connections, and connections with large sequence windows, SHOULD therefore use extended sequence numbers rather than the default 24-bit sequence numbers.

7.6.1. When to Use Extended Sequence Numbers

The sequence-validity mechanism protects against the network delivering old data, but it assumes that the network does not deliver extremely old data. In particular, it assumes that the network must have dropped any packet by the time the connection wraps around and uses its sequence number again. We can easily calculate the maximum connection rate that can be safely achieved given this constraint. Let MSL equal the maximum segment lifetime in seconds, P equal the average DCCP packet size in bits, and L equal the length of sequence numbers (24 or 48 bits). Then the maximum safe rate, in bits per second, is R = P*(2^L)/(2*MSL).

For the default MSL of 2 minutes, 1500-byte DCCP packets, and 24-bit sequence numbers, the safe rate is therefore approximately 800 Mb/s. Of course, 2 minutes is a very large MSL for any networks that could sustain that rate with such small packets.
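
As a rough illustration, the safe-rate formula and the attack success probability P = WN/2^L from Section 7.5.5 can be evaluated with a short Python sketch. The function and variable names are illustrative assumptions and not part of this specification; the inputs are the example values already given in the text.

```python
# Sketch only: names are hypothetical and not defined by the specification.

def max_safe_rate_bps(packet_bits: int, seq_bits: int, msl_seconds: float) -> float:
    """R = P*(2^L)/(2*MSL): the highest rate at which a sequence number
    is not reused before 2*MSL has elapsed."""
    return packet_bits * (2 ** seq_bits) / (2 * msl_seconds)

def attack_success_probability(n_packets: int, window: int, seq_bits: int) -> float:
    """P = WN/2^L from Section 7.5.5, capped at 1."""
    return min(1.0, window * n_packets / 2 ** seq_bits)

MSL = 120                # default MSL of 2 minutes, in seconds
PACKET_BITS = 1500 * 8   # 1500-byte packets

rate24 = max_safe_rate_bps(PACKET_BITS, 24, MSL)   # roughly 8.4e8 b/s
rate48 = max_safe_rate_bps(PACKET_BITS, 48, MSL)   # roughly 1.4e16 b/s
p24 = attack_success_probability(1000, 100, 24)    # roughly 0.006
p48 = attack_success_probability(1000, 100, 48)    # far below p24

print(f"24-bit: {rate24:.3g} b/s, attack probability {p24:.3g}")
print(f"48-bit: {rate48:.3g} b/s, attack probability {p48:.3g}")
```

With these inputs the 24-bit results correspond to the "approximately 800 Mb/s" and "about 0.006" figures quoted in the text.
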
Nevertheless, 48-bit sequence numbers allow much higher rates, up to 14 petabits a second for 1500-byte packets and the default MSL.

The probability of sequence number attack success P = WN/2^L, discussed in Section 7.5.5, may also be relevant when deciding whether to use extended sequence numbers. A fast connection will generally have a relatively high W (sequence window size), increasing the attack success probability for fixed N (number of attack packets); if the probability gets uncomfortably high with L = 24, the connection should use 48-bit sequence numbers instead.

7.6.2. Header Processing

Extended sequence numbers are activated when the header's X bit is set to one (see Section 5.1). This extends the Sequence Number and Acknowledgement Number fields by an additional 24 bits, for a total of 48 bits. The 48-bit numbers are stored in network order, with most significant bit first. All packet types except for DCCP-Data and DCCP-Request will follow this generic header with an extended 48-bit Acknowledgement Number.

Once an endpoint has transitioned to 48-bit sequence numbers (X=1), it MUST send all succeeding packets with 48-bit sequence numbers. Furthermore, once an endpoint has received a sequence-valid packet with 48-bit sequence numbers, it MUST either send all succeeding packets with 48-bit sequence numbers, or reset the connection with Reset Code 7, "Extended Sequence Numbers". (But note that an endpoint may send extended DCCP-Sync packets before transitioning to extended sequence numbers.)

Clients SHOULD decide whether to use extended sequence numbers before sending their DCCP-Requests. However, the Transition bit (T) and Sequence Transition Capable feature support transitioning to extended sequence numbers during an active connection, in case this proves necessary; see below.
A client that sends an extended DCCP-Request might receive a DCCP-Reset in response with Reset Code 7, "Extended Sequence Numbers"; the client SHOULD respond by sending another Request using 24-bit sequence numbers.

Extended sequence numbers are treated simply as longer sequence numbers. For instance, the sequence-validity mechanisms work the same way whether or not sequence numbers are extended. Care is required when comparing a 24-bit sequence number with a 48-bit sequence number, however; see the next section.

7.6.3. Transitioning to Extended Sequence Numbers

The Transition bit (T) following the extended Sequence Number field makes it possible to transition to 48-bit sequence numbers in the middle of a connection. T is set to one only during such a transition.

When DCCP A switches to 48-bit sequence numbers, it MUST set the T bit to one on all of its packets for some period. This period SHOULD last on the order of a few round trip times, or until DCCP A receives an acknowledgement from DCCP B proving that one of its 48-bit-sequence-number packets has been received, whichever comes later.

Each DCCP MUST choose its first 48-bit sequence number to have its lower 24 bits equal the 24-bit sequence number it expected to send (GSS+1). The upper 24 bits may be chosen arbitrarily. This applies to Acknowledgement Numbers as well as Sequence Numbers; if DCCP A sends an extended packet containing an Acknowledgement Number before DCCP B sends it a 48-bit Sequence Number, DCCP A can choose any value for the upper 24 bits of the Acknowledgement Number, but the lower 24 bits MUST equal the expected 24-bit Acknowledgement Number (GSR). Furthermore, DCCP A MUST leave GSR as a 24-bit number until receiving an extended packet from DCCP B.

Switching to 48-bit sequence numbers in the middle of a connection complicates sequence number comparison.
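
The rule for choosing the first 48-bit sequence number can be sketched in Python as follows. This is a minimal illustration, not a normative procedure: the function names are hypothetical, and drawing the upper 24 bits at random is only one assumed way of choosing them "arbitrarily".

```python
import secrets

MASK24 = (1 << 24) - 1  # selects the lower 24 bits of a sequence number

def first_extended_seqno(gss: int) -> int:
    """Return a first 48-bit sequence number whose lower 24 bits equal
    the 24-bit sequence number the endpoint expected to send (GSS+1)."""
    low = (gss + 1) & MASK24          # GSS+1, wrapped to 24 bits
    high = secrets.randbits(24)       # arbitrary upper bits (assumed: random)
    return (high << 24) | low

def low24(seqno: int) -> int:
    """Lower 24 bits, i.e. the value a peer sees when comparing modulo 2^24."""
    return seqno & MASK24

# GSS at the top of the 24-bit space: GSS+1 wraps around to zero.
seq = first_extended_seqno(gss=0xFFFFFF)
print(hex(seq), low24(seq))
```

Because the upper 24 bits carry no agreed meaning at this point, a peer that still holds 24-bit state can only meaningfully compare the lower 24 bits, which is why the comparison rules distinguish several cases.
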
Endpoints must compare 48-bit sequence numbers with 24-bit sequence numbers, and compare 48-bit sequence numbers that might have different, arbitrary values in the upper 24 bits, while remaining robust to reordering and to old or malicious packets. The following procedure describes how sequence numbers should be compared during and immediately after a transition.

Let P be the packet sequence number received from DCCP B, and E be the sequence number DCCP A expects. During sequence-validity computations, for example, P might be the packet's Acknowledgement Number and E might be AWL, the left edge of the appropriate acknowledgement number window. Then DCCP A should perform the comparison as follows.

   o  If P and E are both 24 bits, compare them modulo 2^24.

   o  If P and E are both 48 bits, you generally compare them modulo 2^48, except that during a transition, the two values might have arbitrary values in the upper 24 bits.

      -  If the packet's Transition bit is set, and the last packet sent by DCCP A had its Transition bit set, then compare P and E modulo 2^24.

      -  Otherwise, compare them modulo 2^48.

   o  If P is 48 bits but E is 24, the remote DCCP may want to transition to extended sequence numbers.

      -  If the packet's Transition bit is set, compare P with E modulo 2^24. If the packet proves sequence-valid, then it is OK; transition to extended sequence numbers, and set E according to the full 48 bits of P.

      -  Otherwise, the packet is sequence-invalid.

      Either way, if the packet proves to be sequence-invalid, send an extended DCCP-Sync if required (with T set to one), but do not yet transition to extended sequence numbers.

   o  If P is 24 bits but E is 48, there may have been benign packet reordering. The correct action depends on whether the last sequence-valid packet received from DCCP B had the Transition bit set.

      -  If Transition was set, extend P to a 48-bit value P'. First, let EH equal the upper 24 bits of E, and EL equal the lower 24 bits of E. Then:

            If EL > P, set P' = (EH << 24) | P.
            Otherwise, set P' = (((EH - 1) mod 2^24) << 24) | P.

         The "EL > P" test uses arithmetic comparison, NOT circular comparison. Compare P' with E modulo 2^48.

      -  Otherwise, the packet is sequence-invalid.

      Either way, if the packet proves to be sequence-invalid, send an extended DCCP-Sync if required, with T set to one.

DCCP implementations can, of course, avoid most of this complexity by disallowing transitions to extended sequence numbers (and by resetting the connection when the other endpoint attempts such a transition).

Connections that use 48-bit sequence numbers throughout, starting with the DCCP-Request, MUST have T set to zero on all their packets.

7.6.4. Sequence Transition Capable Feature

The Sequence Transition Capable feature expresses whether DCCP endpoints are capable of transitioning to extended sequence numbers in the course of an active connection. DCCP A