Internet Engineering Task Force                             Eddie Kohler
INTERNET-DRAFT                                                      UCLA
draft-ietf-dccp-spec-07.txt                                 Mark Handley
Expires: January 2005                                                UCL
                                                             Sally Floyd
                                                                    ICIR
                                                            18 July 2004

             Datagram Congestion Control Protocol (DCCP)

Status of this Memo

This document is an Internet-Draft. By submitting this Internet-Draft, we certify that any applicable patent or other IPR claims of which we are aware have been disclosed, or will be disclosed, and any of which we become aware will be disclosed, in accordance with RFC 3668 (BCP 79).

By submitting this Internet-Draft, we accept the provisions of Section 3 of RFC 3667 (BCP 78).

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Copyright Notice

Copyright (C) The Internet Society (2004). All Rights Reserved.

Abstract

The Datagram Congestion Control Protocol (DCCP) is a transport protocol that implements bidirectional, unicast connections of congestion-controlled, unreliable datagrams. It should be suitable for use by applications such as streaming media, Internet telephony, and on-line games.

TO BE DELETED BY THE RFC EDITOR UPON PUBLICATION:

Changes since draft-ietf-dccp-spec-06.txt:

   * Change extended sequence numbers.
     Now 48-bit sequence numbers are MANDATORY, and all packet types except Data, Ack, and DataAck always use 48-bit sequence numbers. This change improves DCCP's robustness against blind attacks.
   * Removed empty Change options.
   * Allow preference list changes during feature negotiations (this seems easier to implement than the alternative). This required a new feature negotiation state, UNSTABLE.
   * Add Minimum Checksum Coverage feature.
   * Add Reset Congestion State option.
   * Simplify the implementation of CCID-specific option processing: no need to check whether the CCID feature is being negotiated.
   * Many more minor changes.

Changes since draft-ietf-dccp-spec-05.txt:

   * Organization overhaul.
   * Add pseudocode for event processing.
   * Remove # NDP; replace with Ack Count.
   * Remove Identification, Challenge, ID Regime, and Connection Nonce.
   * Data Checksum (formerly Payload Checksum) uses a 32-bit CRC.
   * Switch location of non-negotiable features to clarify presentation; now the feature location controls its value.
   * Rename "value type" to "reconciliation rule".
   * Rename "Reset Reason" to "Reset Code".
   * Mobility ID becomes 128 bits long.
   * Add probabilities to Mobility ID discussion.
   * Add SyncAck.

Table of Contents

   1. Introduction
   2. Design Rationale
   3. Conventions and Terminology
      3.1. Numbers and Fields
      3.2. Parts of a Connection
      3.3. Features
      3.4. Round-Trip Times
      3.5. Security Limitation
      3.6. Robustness Principle
   4. Overview
      4.1. Packet Types
      4.2. Sequence Numbers
      4.3. States
      4.4. Congestion Control
      4.5. Features
      4.6. Differences From TCP
      4.7. Example Connection
   5. Header Formats
      5.1. Generic Header
      5.2. DCCP-Request Header
      5.3. DCCP-Response Header
      5.4. DCCP-Data, DCCP-Ack, and DCCP-DataAck Headers
      5.5. DCCP-CloseReq and DCCP-Close Headers
      5.6. DCCP-Reset Header
      5.7. DCCP-Sync and DCCP-SyncAck Headers
      5.8. Options
         5.8.1. Padding Option
         5.8.2. Mandatory Option
   6. Feature Negotiation
      6.1. Change Options
      6.2. Confirm Options
      6.3. Reconciliation Rules
         6.3.1. Server-Priority
         6.3.2. Non-Negotiable
      6.4. Feature Numbers
      6.5. Examples
      6.6. Option Exchange
         6.6.1. Normal Exchange
         6.6.2. Processing Received Options
         6.6.3. Loss and Retransmission
         6.6.4. Reordering
         6.6.5. Preference Changes
         6.6.6. Simultaneous Negotiation
         6.6.7. Unknown Features
         6.6.8. Invalid Options
         6.6.9. Mandatory Feature Negotiation
         6.6.10. Out-of-Band Agreement
   7. Sequence Numbers
      7.1. Variables
      7.2. Initial Sequence Numbers
      7.3. Quiet Time
      7.4. Acknowledgement Numbers
      7.5. Validity and Synchronization
         7.5.1. Sequence-Validity Rules
         7.5.2. Handling Sequence-Invalid Packets
         7.5.3. Sequence and Acknowledgement Number Windows
         7.5.4. Sequence Window Feature
         7.5.5. Sequence Number Attacks
         7.5.6. Examples
      7.6. Short Sequence Numbers
         7.6.1. Allow Short Sequence Numbers Feature
         7.6.2. When to Avoid Short Sequence Numbers
      7.7. NDP Count and Detecting Application Loss
         7.7.1. Usage Notes
         7.7.2. Send NDP Count Feature
   8. Event Processing
      8.1. Connection Establishment
         8.1.1. Client Request
         8.1.2. Service Codes
         8.1.3. Server Response
         8.1.4. Init Cookie Option
         8.1.5. Handshake Completion
      8.2. Data Transfer
      8.3. Termination
         8.3.1. Abnormal Termination
      8.4. DCCP State Diagram
      8.5. Pseudocode
   9. Checksums
      9.1. Header Checksum Field
      9.2. Header Checksum Coverage Field
         9.2.1. Minimum Checksum Coverage Feature
      9.3. Data Checksum Option
         9.3.1. Check Data Checksum Feature
         9.3.2. Usage Notes
   10. Congestion Control IDs
      10.1. Unspecified Sender-Based Congestion Control
      10.2. TCP-like Congestion Control
      10.3. TFRC Congestion Control
      10.4. CCID-Specific Options, Features, and Reset Codes
   11. Acknowledgements
      11.1. Acks of Acks and Unidirectional Connections
      11.2. Ack Piggybacking
      11.3. Ack Ratio Feature
      11.4. Ack Vector Options
         11.4.1. Ack Vector Consistency
         11.4.2. Ack Vector Coverage
      11.5. Send Ack Vector Feature
      11.6. Slow Receiver Option
      11.7. Reset Congestion State Option
      11.8. Data Dropped Option
         11.8.1. Data Dropped and Normal Congestion Response
         11.8.2. Particular Drop Codes
   12. Explicit Congestion Notification
      12.1. ECN Capable Feature
      12.2. ECN Nonces
      12.3. Other Aggression Penalties
   13. Timing Options
      13.1. Timestamp Option
      13.2. Elapsed Time Option
      13.3. Timestamp Echo Option
   14. Maximum Packet Size
   15. Forward Compatibility
   16. Middlebox Considerations
   17. Relations to Other Specifications
      17.1. DCCP and RTP
      17.2. Multiplexing Issues
   18. Security Considerations
      18.1. Security Considerations for Partial Checksums
   19. IANA Considerations
   20. Thanks
   A. Appendix: Ack Vector Implementation Notes
      A.1. Packet Arrival
         A.1.1. New Packets
         A.1.2. Old Packets
      A.2. Sending Acknowledgements
      A.3. Clearing State
      A.4. Processing Acknowledgements
   B. Appendix: Design Motivation
      B.1. CsCov and Partial Checksumming
   Normative References
   Informative References
   Authors' Addresses
   Full Copyright Statement
   Intellectual Property

1. Introduction

The Datagram Congestion Control Protocol (DCCP) is a transport protocol that implements bidirectional, unicast connections of congestion-controlled, unreliable datagrams. Specifically, DCCP provides:

   o Unreliable flows of datagrams, with acknowledgements.

   o Reliable handshakes for connection setup and teardown.

   o Reliable negotiation of options, including negotiation of a suitable congestion control mechanism.

   o Mechanisms allowing servers to avoid holding state for unacknowledged connection attempts and already-finished connections.

   o Congestion control incorporating Explicit Congestion Notification (ECN) and the ECN Nonce, as per [RFC 3168] and [RFC 3540].

   o Acknowledgement mechanisms communicating packet loss and ECN information. Acks are transmitted as reliably as the relevant congestion control mechanism requires, possibly completely reliably.

   o Optional mechanisms that tell the sending application, with high reliability, which data packets reached the receiver, and whether those packets were ECN marked, corrupted, or dropped in the receive buffer.

   o Path Maximum Transmission Unit (PMTU) discovery, as per [RFC 1191].

DCCP is intended for applications, such as streaming media and Internet telephony, where the costs of long delays can exceed the costs of loss and out-of-order delivery.
TCP is not well-suited for these applications, since its reliable in-order delivery, combined with congestion control, can cause arbitrarily long delays. UDP avoids long delays, but UDP applications must implement congestion control on their own. DCCP provides built-in congestion control, including ECN support, for unreliable datagram flows. DCCP avoids the arbitrary delays associated with TCP. It also implements reliable connection setup, teardown, and feature negotiation, and provides a choice of congestion control mechanisms.

2. Design Rationale

Most streaming UDP applications should have little reason not to switch to DCCP, once it is deployed. To facilitate this, DCCP was designed to have as little overhead as possible, both in terms of the packet header size and in terms of the state and CPU overhead required at end hosts. Only the minimal necessary functionality was included in DCCP, leaving other functionality, such as forward error correction (FEC), semi-reliability, and multiple streams, to be layered on top of DCCP as desired. This desire for minimal overhead is also one of the reasons to avoid proposing an unreliable variant of the Stream Control Transmission Protocol (SCTP, [RFC 2960]).

Different forms of conformant congestion control are appropriate for different applications. For example, on-line games might want to make quick use of any available bandwidth, while streaming media might trade off this responsiveness for a steadier, less bursty rate. (Sudden rate changes can cause unacceptable UI glitches, such as audible pauses or clicks in the playout stream.) DCCP thus allows applications to choose between several forms of congestion control. One choice, TCP-like Congestion Control, halves the congestion window in response to a packet drop or mark, as in TCP.
Applications using this congestion control mechanism will respond quickly to changes in available bandwidth, but must tolerate the abrupt changes in congestion window typical of TCP. A second alternative, TCP-Friendly Rate Control (TFRC, [RFC 3448]), a form of equation-based congestion control, minimizes abrupt changes in the sending rate while maintaining longer-term fairness with TCP.

DCCP also lets unreliable traffic safely use ECN. A UDP kernel API might not allow applications to set UDP packets as ECN-capable, since the API could not guarantee the application would properly detect or respond to congestion. DCCP kernel APIs will have no such issues, since DCCP implements congestion control itself.

We chose not to require the use of the Congestion Manager [RFC 3124], which allows multiple concurrent streams between the same sender and receiver to share congestion control. The current Congestion Manager can only be used by applications that have their own end-to-end feedback about packet losses, but this is not the case for many of the applications currently using UDP. In addition, the current Congestion Manager does not easily support multiple congestion control mechanisms, or lend itself to the use of forms of TFRC where the state about past packet drops or marks is maintained at the receiver rather than at the sender. DCCP should be able to make use of CM where desired by the application, but we do not see any benefit in making the deployment of DCCP contingent on the deployment of CM itself.

3. Conventions and Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119].

3.1. Numbers and Fields

All multi-byte numerical quantities in DCCP, such as port numbers, Sequence Numbers, and arguments to options, are transmitted in network byte order (most significant byte first).

We occasionally refer to the "left" and "right" sides of a bit field. "Left" means towards the most significant bit, and "right" means towards the least significant bit.

Random numbers in DCCP are used for their security properties, and MUST be chosen according to the guidelines in [RFC 1750].

All operations on DCCP sequence numbers, and comparisons such as "greater" and "greatest", use circular arithmetic modulo 2**48. This form of arithmetic preserves the relationships between sequence numbers as they roll over from 2**48 - 1 to 0.

Reserved bitfields in DCCP packet headers MUST be set to zero by senders, and MUST be ignored by receivers, unless otherwise specified. This is to allow for future protocol extensions. In particular, DCCP processors MUST NOT reset a DCCP connection simply because a Reserved field has non-zero value [RFC 3360].

3.2. Parts of a Connection

Each DCCP connection runs between two endpoints, which we often name DCCP A and DCCP B. DCCP connections are actively initiated by one endpoint. The active endpoint is called the client, and the passive endpoint is called the server.

DCCP connections are bidirectional; data may pass from either endpoint to the other. This means that data and acknowledgements may be flowing in both directions simultaneously. Logically, however, a DCCP connection consists of two separate unidirectional connections, called half-connections. Each half-connection consists of the application data sent by one endpoint and the corresponding acknowledgements sent by the other endpoint. We can illustrate this as follows:

   +--------+  A-to-B half-connection:         +--------+
   |        |  -->  application data  -->      |        |
   |        |  <--  acknowledgements  <--      |        |
   | DCCP A |                                  | DCCP B |
   |        |  B-to-A half-connection:         |        |
   |        |  <--  application data  <--      |        |
   +--------+  -->  acknowledgements  -->      +--------+

Although they are logically distinct, in practice the half-connections overlap; a DCCP-DataAck packet, for example, contains application data relevant to one half-connection and acknowledgement information relevant to the other.

In the context of a single half-connection, the terms "HC-Sender" and "HC-Receiver" denote the endpoints sending application data and acknowledgements, respectively. For example, DCCP A is the HC-Sender and DCCP B is the HC-Receiver in the A-to-B half-connection.

3.3. Features

A DCCP feature is a connection attribute on whose value the two endpoints agree. Many properties of a DCCP connection are controlled by features, including the congestion control mechanisms in use on the two half-connections. The endpoints can achieve agreement through the exchange of feature negotiation options in DCCP headers, or through out-of-band communication.

DCCP features are identified by a feature number and an endpoint. The notation "F/X" represents the feature with feature number F located at DCCP endpoint X. Each valid feature number thus corresponds to two features, which are negotiated separately and need not have the same value. The two endpoints know, and agree on, the value of every valid feature. DCCP A is the "feature location" for all features F/A, and the "feature remote" for all features F/B.

3.4. Round-Trip Times

We sometimes refer to round-trip times -- for setting timers, for example. If no useful round-trip time estimate is available, a DCCP implementation SHOULD use 0.1 seconds instead.

The maximum segment lifetime, or MSL, is the maximum length of time a packet can survive in the network.
The default MSL is two minutes for this specification.

3.5. Security Limitation

DCCP is not robust against attackers who can snoop on a connection in progress, or who can guess valid sequence numbers in other ways. Applications desiring stronger security should use IPsec or application-level cryptography.

3.6. Robustness Principle

DCCP implementations will follow TCP's "general principle of robustness": "be conservative in what you do, be liberal in what you accept from others" [RFC 793].

4. Overview

DCCP's high-level connection dynamics echo those of TCP. Connections progress through three phases: initiation, including a three-way handshake; data transfer; and termination. Data can flow both ways over the connection. An acknowledgement framework lets senders discover how much data has been lost, and thus avoid unfairly congesting the network. Of course, DCCP provides unreliable datagram semantics, not TCP's reliable bytestream semantics. The application must package its data into explicit frames, and must retransmit its own data as necessary. It may be useful to think of DCCP as TCP minus bytestream semantics and reliability, or as UDP plus congestion control, handshakes, and acknowledgements.

4.1. Packet Types

Ten packet types implement DCCP's protocol functions. For example, every new connection attempt begins with a DCCP-Request packet sent by the client. A DCCP-Request packet thus resembles a TCP SYN; but DCCP-Request is a packet type, not a flag, so there's no way to send an unexpected combination such as TCP's SYN+FIN+ACK+RST.

Eight packet types occur during the progress of a typical connection, shown here. Note the three-way handshakes during initiation and termination.

   Client                                      Server
   ------                                      ------
                    (1) Initiation
   DCCP-Request -->
                                    <-- DCCP-Response
   DCCP-Ack -->
                    (2) Data transfer
   DCCP-Data, DCCP-Ack, DCCP-DataAck -->
                <-- DCCP-Data, DCCP-Ack, DCCP-DataAck
                    (3) Termination
                                    <-- DCCP-CloseReq
   DCCP-Close -->
                                       <-- DCCP-Reset

The two remaining packet types are used to resynchronize after bursts of loss.

Every DCCP packet starts with a 12-byte generic header. Particular packet types include additional fixed-size header data; for example, DCCP-Acks include an Acknowledgement Number. DCCP options and any application data follow the fixed-size header.

The packet types are as follows:

   DCCP-Request
      Sent by the client to initiate a connection (the first part of the three-way initiation handshake).

   DCCP-Response
      Sent by the server in response to a DCCP-Request (the second part of the three-way initiation handshake).

   DCCP-Data
      Used to transmit application data.

   DCCP-Ack
      Used to transmit pure acknowledgements.

   DCCP-DataAck
      Used to transmit application data with piggybacked acknowledgements.

   DCCP-CloseReq
      Sent by the server to request that the client close the connection.

   DCCP-Close
      Used to close the connection; elicits a DCCP-Reset in response.

   DCCP-Reset
      Used to terminate the connection, either normally or abnormally.

   DCCP-Sync, DCCP-SyncAck
      Used to resynchronize sequence numbers after large bursts of loss.

4.2. Sequence Numbers

Each DCCP packet carries a sequence number, so that losses can be detected and reported. Unlike TCP's byte-based sequence numbers, DCCP sequence numbers are packet-based: each packet sent increments the sequence number by one. For example:

   DCCP A                                      DCCP B
   ------                                      ------
   DCCP-Data(seqno 1) -->
   DCCP-Data(seqno 2) -->
                      <-- DCCP-Ack(seqno 10, ackno 2)
   DCCP-DataAck(seqno 3, ackno 10) -->
                              <-- DCCP-Data(seqno 11)

Even DCCP-Ack pure acknowledgements increment the sequence number.
In the example, DCCP B's second packet uses sequence number 11, since sequence number 10 was used for an acknowledgement. This lets endpoints detect lost acknowledgements. It also means that endpoints can get out of sync after long bursts of loss; the DCCP-Sync and DCCP-SyncAck packet types are used to recover (Section 7.5).

Since DCCP provides unreliable semantics, there are no retransmissions, and it doesn't make sense to have a TCP-style cumulative acknowledgement field. DCCP's Acknowledgement Number field equals the greatest sequence number received, rather than the smallest sequence number not received. Separate options indicate any intermediate sequence numbers that weren't received.

4.3. States

DCCP endpoints progress through different states during the course of a connection, corresponding roughly to the three phases of initiation, data transfer, and termination. The figure below shows the typical progress through these states for a client and server.

   Client                                             Server
   ------                                             ------
                   (0) No connection
   CLOSED                                             LISTEN

                   (1) Initiation
   REQUEST      DCCP-Request -->
                                <-- DCCP-Response     RESPOND
   PARTOPEN     DCCP-Ack or DCCP-DataAck -->

                   (2) Data transfer
   OPEN          <-- DCCP-Data, Ack, DataAck -->      OPEN

                   (3) Termination
                                <-- DCCP-CloseReq     CLOSEREQ
   CLOSING      DCCP-Close -->
                                <-- DCCP-Reset        CLOSED
   TIMEWAIT
   CLOSED

The nine possible states are as follows. Section 8 describes them in more detail.

   CLOSED
      Represents nonexistent connections.

   LISTEN
      Represents server sockets in the passive listening state. LISTEN and CLOSED are not associated with any particular DCCP connection.

   REQUEST
      A client socket enters this state, from CLOSED, after sending a DCCP-Request packet to try to initiate a connection.

   RESPOND
      A server socket enters this state, from LISTEN, after receiving a DCCP-Request from a client.

   PARTOPEN
      A client socket enters this state, from REQUEST, after receiving a DCCP-Response from the server.
      This state represents the third phase of the three-way handshake. The client may send application data in this state, but it MUST include an Acknowledgement Number on all of its packets.

   OPEN
      The central, data transfer portion of a DCCP connection. Client and server sockets enter this state from PARTOPEN and RESPOND, respectively. Sometimes we speak of SERVER-OPEN and CLIENT-OPEN states, corresponding to the server's OPEN state and the client's OPEN state.

   CLOSEREQ
      A server socket enters this state, from SERVER-OPEN, to signal that the connection is over, but the client must hold TIMEWAIT state.

   CLOSING
      Server and client sockets can both enter this state to close the connection.

   TIMEWAIT
      A socket remains in this state for 2MSL (4 minutes) after the connection has been torn down, to prevent mistakes due to the delivery of old packets.

4.4. Congestion Control

DCCP connections are congestion controlled, but unlike in TCP, DCCP applications have a choice of congestion control mechanism. In fact, the two half-connections can be governed by different mechanisms. Mechanisms are denoted by one-byte congestion control identifiers, or CCIDs. The endpoints negotiate their CCIDs during connection initiation. Each CCID describes how the HC-Sender limits data packet rates, how the HC-Receiver sends congestion feedback via acknowledgements, and so forth. CCIDs 2 and 3 are currently defined; CCID 0 is reserved, and CCID 1 is used for special purposes.

CCID 2 provides TCP-like Congestion Control, which is similar to that of TCP. The sender maintains a congestion window and sends packets until that window is full. Packets are acknowledged by the receiver. Dropped packets and ECN [RFC 3168] indicate congestion; the response to congestion is to halve the congestion window.
Acknowledgements in CCID 2 contain the sequence numbers of all received packets within some window, similar to a selective acknowledgement (SACK) [RFC 3517].

CCID 3 provides TFRC Congestion Control, an equation-based form of congestion control intended to respond to congestion more smoothly than CCID 2. The sender maintains a transmit rate, which it updates using the receiver's estimate of the packet loss and mark rate. Although CCID 3 behaves somewhat differently from TCP in the short term, it is designed to operate fairly with TCP over the long term.

Section 10 describes DCCP's CCIDs in more detail. The behaviors of CCIDs 2 and 3 are fully defined in separate profile documents [CCID 2 PROFILE] [CCID 3 PROFILE].

4.5. Features

DCCP endpoints generally use Change and Confirm options to negotiate and agree on feature values, although agreement may also be achieved using an out-of-band signalling channel. Feature negotiation will almost always happen on the connection initiation handshake, but it can begin at any time.

There are four feature negotiation options in all: Change L, Confirm L, Change R, and Confirm R. The "L" options are sent by the feature location, and the "R" options are sent by the feature remote. A Change R option says to the feature location, "change this feature value as follows". The feature location responds with Confirm L, meaning "I've changed it". Some features allow Change R options to contain multiple values, sorted in preference order. For example:

   Client                                        Server
   ------                                        ------
   Change R(CCID, 2) -->
                                 <-- Confirm L(CCID, 2)
              * agreement that CCID/Server = 2 *

   Change R(CCID, 3 4) -->
                            <-- Confirm L(CCID, 4, 4 2)
              * agreement that CCID/Server = 4 *

In the second exchange, the client requests that the server use either CCID 3 or CCID 4, with 3 preferred. The server chooses 4 and supplies its preference list, "4 2".
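
The server's choice in the second exchange can be mimicked with a short sketch. It assumes a server-priority style of reconciliation, in which the server picks the first entry of its own preference list that also appears in the client's list. That rule is only inferred here from the example (Section 6.3 is where reconciliation rules are specified), and the function name reconcile_server_priority is hypothetical.

```python
from typing import Optional, Sequence


def reconcile_server_priority(
    server_prefs: Sequence[int], client_prefs: Sequence[int]
) -> Optional[int]:
    """Return the first server preference that the client also lists.

    Assumption: this mirrors a server-priority rule as inferred from the
    example exchange. Returns None when the two lists share no value.
    """
    client_values = set(client_prefs)
    for value in server_prefs:
        if value in client_values:
            return value
    return None


# Second exchange: the client sends Change R(CCID, 3 4) and the
# server's own preference list is "4 2".
chosen = reconcile_server_priority(server_prefs=[4, 2], client_prefs=[3, 4])
print(chosen)  # 4
```

Under this assumed rule the client's ordering only decides which values are acceptable; the server's ordering decides which acceptable value wins.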
The Change L and Confirm R options are used for feature negotiations initiated by the feature location. In the following example, the server requests that CCID/Server be set to 3 or 2, with 3 preferred, and the client agrees.

   Client                                        Server
   ------                                        ------
                                <-- Change L(CCID, 3 2)
   Confirm R(CCID, 3, 3 2) -->
              * agreement that CCID/Server = 3 *

Section 6 describes the feature negotiation options further, including the retransmission strategies that make negotiation reliable.

4.6. Differences From TCP

Differences between DCCP and TCP apart from those discussed so far include:

   o Copious space for options (up to 1008 bytes).

   o Different acknowledgement formats. The CCID for a connection determines how much acknowledgement information needs to be transmitted. In CCID 2 (TCP-like), this is about one ack per 2 packets, and each ack must declare exactly which packets were received; in CCID 3 (TFRC), it's about one ack per RTT, and acks must declare at minimum just the lengths of recent loss intervals.

   o Denial-of-service (DoS) protection. Several mechanisms help limit the amount of state possibly-misbehaving clients can force DCCP servers to maintain. An Init Cookie option, analogous to TCP's SYN Cookies [SYNCOOKIES], avoids SYN-flood-like attacks. Only one connection endpoint need hold TIMEWAIT state; the DCCP-CloseReq packet, which may only be sent by the server, passes that state to the client. Various rate limits let servers avoid attacks that might force extensive computation or packet generation.

   o Distinguishing different kinds of loss. A Data Dropped option (Section 11.8) lets an endpoint declare that a packet was dropped because of corruption, because of receive buffer overflow, and so on. This facilitates research into more appropriate rate-control responses for these non-network-congestion losses (although currently such losses will cause a congestion response).
   o  Acknowledgement readiness. In TCP, a packet is acknowledged only
      when the data is queued for delivery to the application. This
      does not make sense in DCCP, where an application might request a
      drop-from-front receive buffer, for example. DCCP acknowledges a
      packet when its options have been processed. The Data Dropped
      option may later report that the packet's payload was discarded.

   o  No receive window. DCCP is a congestion control protocol, not a
      flow control protocol.

   o  No simultaneous open. Every connection has one client and one
      server.

   o  No half-closed states. DCCP has no states corresponding to TCP's
      FINWAIT and CLOSEWAIT, where one half-connection is explicitly
      closed while the other is still active.

4.7. Example Connection

   The progress of a typical DCCP connection is as follows. (This
   description is informative, not normative.)

         Client                                  Server
         ------                                  ------
     0.  [CLOSED]                                [LISTEN]
     1.  DCCP-Request -->
     2.                                      <-- DCCP-Response
     3.  DCCP-Ack -->
                                             <-- DCCP-Ack
     4.  DCCP-Data, DCCP-Ack, DCCP-DataAck -->
                      <-- DCCP-Data, DCCP-Ack, DCCP-DataAck
     5.                                      <-- DCCP-CloseReq
     6.  DCCP-Close -->
     7.                                      <-- DCCP-Reset
     8.  [TIMEWAIT]

   1.  The client sends the server a DCCP-Request packet specifying the
       client and server ports, the service being requested, and any
       features being negotiated, including the CCID that the client
       would like the server to use. The client may optionally
       piggyback an application request on the DCCP-Request packet,
       which the server may ignore.

   2.  The server sends the client a DCCP-Response packet indicating
       that it is willing to communicate with the client. This response
       indicates any features and options that the server agrees to,
       begins other feature negotiations as desired, and optionally
       includes an Init Cookie that wraps up all this information and
       which must be returned by the client for the connection to
       complete.

   3.  The client sends the server a DCCP-Ack packet that acknowledges
       the DCCP-Response packet. This acknowledges the server's initial
       sequence number and returns the Init Cookie if there was one in
       the DCCP-Response. It may also continue feature negotiation. The
       client may piggyback an application-level request on its final
       ack, producing a DCCP-DataAck packet.

   4.  The server and client then exchange DCCP-Data packets, DCCP-Ack
       packets acknowledging that data, and, optionally, DCCP-DataAck
       packets containing data with piggybacked acknowledgements. If
       the client has no data to send, then the server will send
       DCCP-Data and DCCP-DataAck packets, while the client will send
       DCCP-Acks exclusively.

   5.  The server sends a DCCP-CloseReq packet requesting a close.

   6.  The client sends a DCCP-Close packet acknowledging the close.

   7.  The server sends a DCCP-Reset packet with Reset Code 1,
       "Closed", and clears its connection state. DCCP-Resets are part
       of normal connection termination; see Section 5.6.

   8.  The client receives the DCCP-Reset packet and holds state for a
       reasonable interval of time to allow any remaining packets to
       clear the network.

   An alternative connection closedown sequence is initiated by the
   client:

   5b. The client sends a DCCP-Close packet closing the connection.

   6b. The server sends a DCCP-Reset packet with Reset Code 1,
       "Closed", and clears its connection state.

   7b. The client receives the DCCP-Reset packet and holds state for a
       reasonable interval of time to allow any remaining packets to
       clear the network.

5. Header Formats

   The DCCP header can be from 12 to 1020 bytes long. The initial 12
   bytes of the header have the same semantics for all packet types.
   Following this comes any additional fixed-length fields required by
   the packet type, and then a variable-length list of options. Some
   packet types allow application data to follow the header.

   +---------------------------------------+  -.
   |            Generic Header             |   |
   +---------------------------------------+   |
   | Additional Fields (depending on type) |   +- DCCP Header
   +---------------------------------------+   |
   |          Options (optional)           |   |
   +=======================================+  -'
   |      Application Data (optional)      |
   +---------------------------------------+

5.1. Generic Header

   The DCCP generic header takes different forms depending on the value
   of X, the Extended Sequence Numbers bit. If X is one, the Sequence
   Number field is 48 bits long and the generic header takes 16 bytes,
   as follows.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |           Dest Port           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data Offset  | CCVal | CsCov |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     |X|       |                                               .
   | Res |=| Type  |          Sequence Number (high bits)          .
   |     |1|       |                                               .
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   .          Sequence Number (low bits)           |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   If X is zero, only the low 24 bits of the Sequence Number are
   transmitted, and the generic header is 12 bytes long.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |           Dest Port           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data Offset  | CCVal | CsCov |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     |X|       |                                               |
   | Res |=| Type  |          Sequence Number (low bits)           |
   |     |0|       |                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The generic header fields are defined as follows.

   Source and Destination Ports: 16 bits each
      These fields identify the connection, similar to the
      corresponding fields in TCP and UDP. The Source Port represents
      the relevant port on the endpoint that sent this packet, the
      Destination Port the relevant port on the other endpoint. Source
      Ports SHOULD be chosen randomly, to reduce the likelihood of
      attack.

   Data Offset: 8 bits
      The offset from the start of the DCCP header to the beginning of
      the packet's application data, in 32-bit words.

   CCVal: 4 bits
      Used by the HC-Sender CCID. For example, the A-to-B CCID's
      sender, which is active at DCCP A, MAY send 4 bits of information
      per packet to its receiver by encoding that information in CCVal.
      The sender MUST set CCVal to zero unless its HC-Sender CCID
      specifies otherwise, and the receiver MUST ignore the CCVal field
      unless its HC-Receiver CCID specifies otherwise.

   Checksum Coverage (CsCov): 4 bits
      Checksum Coverage determines the parts of the packet that are
      covered by the Checksum field. This always includes the DCCP
      header and options, but some or all of the application data may
      be excluded. This can improve performance on noisy links for
      applications that can tolerate corruption. See Section 9.

   Checksum: 16 bits
      The Internet checksum of the packet's DCCP header (including
      options), a network-layer pseudoheader, and, depending on
      Checksum Coverage, some or all of the application data. See
      Section 9.

   Type: 4 bits
      The Type field specifies the type of the packet. The following
      values are defined:

         Type   Meaning
         ----   -------
           0    DCCP-Request
           1    DCCP-Response
           2    DCCP-Data
           3    DCCP-Ack
           4    DCCP-DataAck
           5    DCCP-CloseReq
           6    DCCP-Close
           7    DCCP-Reset
           8    DCCP-Sync
           9    DCCP-SyncAck
         10-15  Reserved

      Receivers MUST ignore any packets with reserved type. That is,
      packets with reserved type MUST NOT be processed and they MUST
      NOT be acknowledged as received.

   Reserved (Res): 3 bits
      Senders MUST set this field to all zeroes on generated packets,
      and receivers MUST ignore its value.

   Extended Sequence Numbers (X): 1 bit
      Set to one to indicate the use of an extended generic header with
      48-bit Sequence and Acknowledgement Numbers. DCCP-Data,
      DCCP-DataAck, and DCCP-Ack packets MAY set X to zero or one. All
      DCCP-Request, DCCP-Response, DCCP-CloseReq, DCCP-Close,
      DCCP-Reset, DCCP-Sync, and DCCP-SyncAck packets MUST set X to
      one; endpoints MUST ignore any such packets with X set to zero.
      High-rate connections SHOULD set X to one on all packets to gain
      increased protection against wrapped sequence numbers and
      attacks. See Section 7.6.

   Sequence Number: 24 or 48 bits
      Identifies the packet uniquely in the sequence of all packets the
      source sent on this connection. Sequence Number increases by one
      with every packet sent, including packets such as DCCP-Ack that
      carry no application data. See Section 7.

   All currently defined packet types except DCCP-Request and DCCP-Data
   carry an Acknowledgement Number in the four or eight bytes
   immediately following the generic header. When X=1, its format is:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |      Acknowledgement Number (high bits)       .
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   .       Acknowledgement Number (low bits)       |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   When X=0, only the low 24 bits of the Acknowledgement Number are
   transmitted.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |       Acknowledgement Number (low bits)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Acknowledgement Number: 24 or 48 bits
      Generally contains GSR, the Greatest Sequence Number Received on
      any acknowledgeable packet so far. A packet is acknowledgeable if
      and only if its header options were processed by the receiver.
      Section 7.4 describes this further. Options such as Ack Vector
      (Section 11.4) combine with the Acknowledgement Number to provide
      precise information about which packets have arrived.

      Acknowledgement Numbers on DCCP-Sync and DCCP-SyncAck packets
      need not equal GSR; see Section 5.7.

   Reserved: 8 bits
      Senders MUST set this field to all zeroes on generated packets,
      and receivers MUST ignore its value.

5.2. DCCP-Request Header

   A client initiates a DCCP connection by sending a DCCP-Request
   packet. These packets MAY contain application data.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /            Generic DCCP Header with X=1 (16 bytes)            /
   /                  with Type=0 (DCCP-Request)                   /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Service Code                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Options / Padding                       |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                       Application Data                        |
   |                              ...                              |

   Service Code: 32 bits
      Describes the service to which the client application wants to
      connect. Examples might include RTSP and DOOM. Service Codes are
      intended to make application protocols independent of well-known
      ports, and help middleboxes identify the protocol used on a given
      connection. See Section 8.1.2.

5.3. DCCP-Response Header

   The server responds to valid DCCP-Request packets with DCCP-Response
   packets. This is the second phase of the three-way handshake.
   DCCP-Response packets MAY contain application data.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /            Generic DCCP Header with X=1 (16 bytes)            /
   /                  with Type=1 (DCCP-Response)                  /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |      Acknowledgement Number (high bits)       .
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   .       Acknowledgement Number (low bits)       |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Service Code                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Options / Padding                       |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                       Application Data                        |
   |                              ...                              |

   Acknowledgement Number: 48 bits
      Contains GSR. Since DCCP-Responses are only sent during
      connection initiation, this will always equal the Sequence Number
      on a received DCCP-Request.

   Service Code: 32 bits
      Echoes the Service Code on a received DCCP-Request.

5.4. DCCP-Data, DCCP-Ack, and DCCP-DataAck Headers

   The central data transfer portion of every DCCP connection uses
   DCCP-Data, DCCP-Ack, and DCCP-DataAck packets. DCCP-Data packets
   carry application data.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /             Generic DCCP Header (12 or 16 bytes)              /
   /                    with Type=2 (DCCP-Data)                    /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Options / Padding                       |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                       Application Data                        |
   |                              ...                              |

   DCCP-Ack packets dispense with the data, but contain an
   Acknowledgement Number. They are used for pure acknowledgements.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /             Generic DCCP Header (12 or 16 bytes)              /
   /                    with Type=3 (DCCP-Ack)                     /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |            Acknowledgement Number             |
  (+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+)
  (.       Acknowledgement Number (low bits)       |   Reserved    |)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Options / Padding                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   (The parenthesized fields appear only when the header's Extended
   Sequence Numbers field is 1.)

   DCCP-DataAck packets carry both application data and an
   Acknowledgement Number: acknowledgement information is piggybacked
   on a data packet.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /             Generic DCCP Header (12 or 16 bytes)              /
   /                  with Type=4 (DCCP-DataAck)                   /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |            Acknowledgement Number             |
  (+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+)
  (.       Acknowledgement Number (low bits)       |   Reserved    |)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Options / Padding                       |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                       Application Data                        |
   |                              ...                              |

   A DCCP-Data or DCCP-DataAck packet may have a zero-length
   application data area, which indicates that the application sent a
   zero-length datagram. This differs from DCCP-Request and
   DCCP-Response packets, where an empty application data area
   indicates the absence of application data (as opposed to the
   presence of zero-length application data). Receivers MUST ignore the
   application data area in DCCP-Ack packets. DCCP-Ack senders will
   generally leave this area empty.

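   As an illustration of how the layouts in Sections 5.1 and 5.4 fit
   together, the following Python sketch decodes the generic header
   and, where present, the Acknowledgement Number. It is a minimal
   sketch under stated assumptions: bit 0 in the diagrams is taken to
   be the most significant bit of each byte, the names Header and
   parse_header are hypothetical, and neither the Checksum nor the Data
   Offset is validated.

```python
import struct
from typing import NamedTuple, Optional

# Per Section 5.1, DCCP-Request (0) and DCCP-Data (2) carry no
# Acknowledgement Number.
TYPES_WITHOUT_ACK = {0, 2}


class Header(NamedTuple):  # hypothetical container, not defined by DCCP
    src_port: int
    dst_port: int
    data_offset: int  # in 32-bit words
    ccval: int
    cscov: int
    checksum: int
    x: int
    ptype: int
    seqno: int
    ackno: Optional[int]


def parse_header(pkt: bytes) -> Header:
    """Decode the generic header and any Acknowledgement Number."""
    if len(pkt) < 12:
        raise ValueError("packet shorter than the 12-byte generic header")
    src, dst, doff, cc, cksum, flags = struct.unpack_from("!HHBBHB", pkt, 0)
    # Assumption: Res is the top 3 bits, X the next bit, Type the low 4.
    x = (flags >> 4) & 1
    ptype = flags & 0x0F
    generic_len = 16 if x else 12
    if len(pkt) < generic_len:
        raise ValueError("packet shorter than its generic header")
    # X=1: 48-bit number in bytes 9..14; X=0: low 24 bits in bytes 9..11.
    num_bytes = 6 if x else 3
    seqno = int.from_bytes(pkt[9:9 + num_bytes], "big")
    ackno = None
    if ptype not in TYPES_WITHOUT_ACK:
        ack_len = 8 if x else 4
        if len(pkt) < generic_len + ack_len:
            raise ValueError("packet too short for Acknowledgement Number")
        start = generic_len + 1  # skip the leading Reserved byte
        ackno = int.from_bytes(pkt[start:start + num_bytes], "big")
    return Header(src, dst, doff, cc >> 4, cc & 0x0F, cksum, x, ptype,
                  seqno, ackno)


# A DCCP-Ack (Type=3) with X=0, Sequence Number 258, and
# Acknowledgement Number 5; the port numbers are arbitrary.
ack = (struct.pack("!HHBBHB", 1000, 2000, 4, 0, 0, 0x03)
       + bytes([0, 1, 2])       # low 24 bits of the Sequence Number
       + bytes([0, 0, 0, 5]))   # Reserved byte, then 24-bit ack number
print(parse_header(ack))
```

   A receiver built along these lines would then use Data Offset to
   find the application data area, which it ignores for DCCP-Ack
   packets.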
   DCCP-Ack and DCCP-DataAck packets often include additional
   acknowledgement options, such as Ack Vector, as required by the
   congestion control mechanism in use.

5.5. DCCP-CloseReq and DCCP-Close Headers

   DCCP-CloseReq and DCCP-Close packets begin the handshake that
   normally terminates a connection. Either client or server may send a
   DCCP-Close packet, which will elicit a DCCP-Reset packet. Only the
   server can send a DCCP-CloseReq packet, which indicates that the
   server wants to close the connection, but does not want to hold its
   TIMEWAIT state.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /            Generic DCCP Header with X=1 (16 bytes)            /
   /         with Type=5 (DCCP-CloseReq) or 6 (DCCP-Close)         /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |      Acknowledgement Number (high bits)       .
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   .       Acknowledgement Number (low bits)       |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Options / Padding                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Receivers MUST ignore the application data area in DCCP-CloseReq and
   DCCP-Close packets.

5.6. DCCP-Reset Header

   DCCP-Reset packets unconditionally shut down a connection.
   Connections normally terminate with a DCCP-Reset, but resets may be
   sent for other reasons, including bad port numbers, bad option
   behavior, incorrect ECN Nonce Echoes, and so forth.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /            Generic DCCP Header with X=1 (16 bytes)            /
   /                   with Type=7 (DCCP-Reset)                    /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |      Acknowledgement Number (high bits)       .
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   .       Acknowledgement Number (low bits)       |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Reset Code   |    Data 1     |    Data 2     |    Data 3     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Options / Padding                       |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                          Error Text                           |
   |                              ...                              |

   Reset Code: 8 bits
      Represents the reason that the sender reset the DCCP connection.

   Data 1, Data 2, and Data 3: 8 bits each
      The Data fields provide additional information about why the
      sender reset the DCCP connection. The meanings of these fields
      depend on the value of Reset Code.

   Error Text (application data area)
      If present, Error Text is a human-readable text string,
      preferably in English and encoded in Unicode UTF-8, that
      describes the error in more detail. For example, a DCCP-Reset
      with Reset Code 11, "Aggression Penalty", might contain Error
      Text such as "Aggression Penalty: Received 3 bad ECN Nonce
      Echoes, assuming misbehavior".

   The following Reset Codes are currently defined. Unless otherwise
   specified, the Data 1, 2, and 3 fields MUST be set to 0 by the
   sender of the DCCP-Reset and ignored by its receiver. Section
   references describe concrete situations that will cause each Reset
   Code to be generated; they are not meant to be exhaustive.

   0, "Unspecified"
      Indicates the absence of a meaningful Reset Code. Use of Reset
      Code 0 is NOT RECOMMENDED: the sender should choose a Reset Code
      that more clearly defines why the connection is being reset.

   1, "Closed"
      Normal connection close.
      See Section 8.3.

   2, "Aborted"
      The sending endpoint gave up on the connection because of lack of
      progress. See Sections 8.1.1 and 8.1.5.

   3, "No Connection"
      No connection exists. See Section 8.3.1.

   4, "Packet Error"
      An unexpected packet type arrived; for example, a DCCP-Data
      packet arrived at a connection in the REQUEST state. See Section
      8.3.1. The Data 1 field equals the offending packet type.

   5, "Option Error"
      An option was erroneous, and the error was serious enough to
      warrant resetting the connection. See Sections 6.6.7, 6.6.8, and
      11.4. The Data 1 field equals the offending option type; Data 2
      and Data 3 equal the first two bytes of option data (or zero if
      the option had fewer than two bytes of data).

   6, "Mandatory Error"
      The sending endpoint could not process an option marked
      Mandatory. The Data fields report the option type and data of the
      unprocessed option (not the Mandatory option), using the format
      of Reset Code 5, "Option Error". See Section 5.8.2.

   7, "Connection Refused"
      The Destination Port didn't correspond to a port open for
      listening. Sent only in response to DCCP-Requests. See Section
      8.1.3.

   8, "Bad Service Code"
      The Service Code didn't equal the service code attached to the
      Destination Port. Sent only in response to DCCP-Requests. See
      Section 8.1.3.

   9, "Too Busy"
      The server is too busy to accept new connections. Sent only in
      response to DCCP-Requests. See Section 8.1.3.

   10, "Bad Init Cookie"
      The Init Cookie echoed by the client was incorrect or missing.
      See Section 8.1.4.

   11, "Aggression Penalty"
      This endpoint has detected congestion control-related misbehavior
      on the part of the other endpoint. See Sections 12.2 and 13.2.

   12-127, Reserved
      Receivers should treat these codes like Reset Code 0,
      "Unspecified".

   128-255, CCID-specific codes
      Semantics depend on the connection's CCIDs. See Section 10.4.
      Receivers should treat unknown CCID-specific Reset Codes like
      Reset Code 0, "Unspecified".

   The following table summarizes this information.

      Reset
      Code     Name                   Data 1      Data 2 & 3
      -----    ----                   ------      ----------
        0      Unspecified            0           0
        1      Closed                 0           0
        2      Aborted                0           0
        3      No Connection          0           0
        4      Packet Error           pkt type    0
        5      Option Error           option #    option data
        6      Mandatory Error        option #    option data
        7      Connection Refused     0           0
        8      Bad Service Code       0           0
        9      Too Busy               0           0
       10      Bad Init Cookie        0           0
       11      Aggression Penalty     0           0
      12-127   Reserved
      128-255  CCID-specific codes

5.7. DCCP-Sync and DCCP-SyncAck Headers

   DCCP-Sync packets help DCCP endpoints recover synchronization after
   bursts of loss, or recover from half-open connections. Each valid
   received DCCP-Sync immediately elicits a DCCP-SyncAck.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /            Generic DCCP Header with X=1 (16 bytes)            /
   /          with Type=8 (DCCP-Sync) or 9 (DCCP-SyncAck)          /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |      Acknowledgement Number (high bits)       .
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   .       Acknowledgement Number (low bits)       |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Options / Padding                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The Acknowledgement Number field has special semantics for DCCP-Sync
   and DCCP-SyncAck packets. First, the packet corresponding to a
   DCCP-Sync's Acknowledgement Number need not have been
   acknowledgeable. Thus, receivers MUST NOT assume that a packet was
   processed simply because it appears in the Acknowledgement Number
   field of a DCCP-Sync packet.
   This differs from all other packet types, where the Acknowledgement
   Number by definition corresponds to an acknowledgeable packet.
   Second, the Acknowledgement Number on any DCCP-SyncAck packet MUST
   correspond to the Sequence Number on an acknowledgeable DCCP-Sync
   packet. In the presence of reordering, this might not equal GSR.

   Receivers MUST ignore the application data area in DCCP-Sync and
   DCCP-SyncAck packets. Endpoints may find it useful to pad DCCP-Sync
   packets with "application data" when performing PMTU discovery; see
   Section 14.

5.8. Options

   Any DCCP packet may contain options, which occupy space at the end
   of the DCCP header. Each option is a multiple of 8 bits in length.
   The combination of all options MUST add up to a multiple of 32 bits.
   Individual options are not padded to multiples of 32 bits, however;
   any option may begin on any byte boundary. Any options present are
   included in the header checksum.

   The first byte of an option is the option type. Options with types 0
   through 31 are single-byte options. Other options are followed by a
   byte indicating the option's length. This length value includes the
   two bytes of option-type and option-length as well as any
   option-data bytes, and must therefore be greater than or equal to
   two.

   Options are processed sequentially, starting at the first option in
   the packet header.

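   The option layout rules above can be sketched as a small parser.
   This is an illustration only: the function name iter_options is
   hypothetical, Python is an arbitrary choice, and the sketch raises
   an error on any malformed option area instead of applying the Reset
   behavior that this document specifies elsewhere. The sample input
   is the Change L(CCID, 2 3) encoding from Section 6.5 followed by
   three Padding bytes.

```python
from typing import Iterator, Tuple


def iter_options(area: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (option_type, option_data) pairs in the order they appear."""
    if len(area) % 4 != 0:
        raise ValueError("option area must be a multiple of 32 bits")
    i = 0
    while i < len(area):
        opt_type = area[i]
        if opt_type <= 31:
            # Types 0 through 31 are single-byte options with no data.
            yield opt_type, b""
            i += 1
            continue
        if i + 1 >= len(area):
            raise ValueError("option is missing its length byte")
        length = area[i + 1]
        # The length counts the type and length bytes themselves.
        if length < 2 or i + length > len(area):
            raise ValueError("invalid option length")
        yield opt_type, area[i + 2:i + length]
        i += length


# Change L(CCID, 2 3) = 32,5,1,2,3, padded to a 32-bit boundary.
sample = bytes([32, 5, 1, 2, 3, 0, 0, 0])
for opt_type, data in iter_options(sample):
    print(opt_type, list(data))
```

   Because options are yielded in packet order, a caller can apply
   them sequentially as required above.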
   The following options are currently defined:

                 Option                                Section
      Type       Length     Meaning                    Reference
      ----       ------     -------                    ---------
        0          1        Padding                      5.8.1
        1          1        Mandatory                    5.8.2
        2          1        Slow Receiver                11.6
        3          1        Reset Congestion State       11.7
       4-31        1        Reserved
       32       variable    Change L                     6.1
       33       variable    Confirm L                    6.2
       34       variable    Change R                     6.1
       35       variable    Confirm R                    6.2
       36       variable    Init Cookie                  8.1.4
       37         4-5       NDP Count                    7.7
       38       variable    Ack Vector [Nonce 0]         11.4
       39       variable    Ack Vector [Nonce 1]         11.4
       40       variable    Data Dropped                 11.8
       41          6        Timestamp                    13.1
       42         6-10      Timestamp Echo               13.3
       43         4-6       Elapsed Time                 13.2
       44          4        Data Checksum                9.3
      45-127    variable    Reserved
      128-255   variable    CCID-specific options        10.4

   This section describes two generic options, Padding and Mandatory.
   Other options are described later.

5.8.1. Padding Option

   +--------+
   |00000000|
   +--------+
    Type=0

   Padding is a single byte option used to pad between or after
   options. It either ensures the application data begins on a 32-bit
   boundary (as required), or ensures alignment of following options
   (not mandatory).

5.8.2. Mandatory Option

   +--------+
   |00000001|
   +--------+
    Type=1

   Mandatory is a single byte option that marks the immediately
   following option as mandatory. Say that the immediately following
   option is OP. Then the Mandatory option has no effect if the
   receiving DCCP endpoint understands and processes OP. If the
   endpoint does not understand or process OP, however, then it MUST
   reset the connection using Reset Code 6, "Mandatory Error". For
   instance, the endpoint would reset the connection if it did not
   understand OP's type; if it understood OP's type, but not OP's data;
   if OP's data was invalid for OP's type; if OP was a feature
   negotiation option, and the endpoint did not understand the enclosed
   feature number; if the endpoint understood OP, but chose not to
   perform the action OP implies; and so forth.

   The connection is in error and should be reset with Reset Code 5,
   "Option Error" if option OP is absent (Mandatory was the last byte
   of the option list), or if option OP equals Mandatory. However, the
   combination "Mandatory Padding" is valid, and MUST behave like two
   bytes of Padding.

   Section 6.6.9 describes the behavior of Mandatory feature
   negotiation options in more detail.

6. Feature Negotiation

   Four DCCP options, Change L, Confirm L, Change R, and Confirm R,
   implement in-band feature negotiation. Change options initiate a
   negotiation; Confirm options complete that negotiation. The "L"
   options are sent by the feature location, and the "R" options are
   sent by the feature remote. Change options are retransmitted to
   ensure reliability.

   All these options have the same format. The first byte of option
   data is the feature number, and the second and subsequent data bytes
   hold one or more feature values. The feature values are generally
   arranged in a linear preference list, where the first value is most
   preferred.

   +--------+--------+--------+--------+--------
   |  Type  | Length |Feature#| Value(s) ...
   +--------+--------+--------+--------+--------

   Together, the feature number and the option type ("L" or "R")
   uniquely identify the feature to which an option applies. The exact
   format of the Value(s) area depends on the feature number.

6.1. Change Options

   Change L and Change R options initiate feature negotiation. Which
   option to use depends on where the negotiated feature is located. To
   start a negotiation for feature F/A, DCCP A must send a Change L
   option; to start a negotiation for F/B, it must send a Change R
   option. Change options are retransmitted until some response is
   received. Change options contain at least one Value, and thus have
   length at least 4.

               +--------+--------+--------+--------+--------
   Change L:   |00100000| Length |Feature#| Value(s) ...
               +--------+--------+--------+--------+--------
                Type=32

               +--------+--------+--------+--------+--------
   Change R:   |00100010| Length |Feature#| Value(s) ...
               +--------+--------+--------+--------+--------
                Type=34

6.2. Confirm Options

   Confirm L and Confirm R options complete feature negotiation, and
   are sent in response to Change R and Change L options, respectively.
   Confirm options MUST NOT be generated except in response to Change
   options. Any packet including a Confirm option MUST carry an
   Acknowledgement Number; thus, Confirm options are not allowed on
   DCCP-Request and DCCP-Data packets. Confirm options need not be
   retransmitted, since Change options are retransmitted as necessary.
   Normal Confirm options contain the selected Value, possibly followed
   by the sender's preference list.

               +--------+--------+--------+--------+--------
   Confirm L:  |00100001| Length |Feature#| Value(s) ...
               +--------+--------+--------+--------+--------
                Type=33

               +--------+--------+--------+--------+--------
   Confirm R:  |00100011| Length |Feature#| Value(s) ...
               +--------+--------+--------+--------+--------
                Type=35

   If an endpoint receives an invalid Change option -- with an unknown
   feature number, or an invalid value -- it will respond with an empty
   Confirm option containing no value. Such options have length 3.

6.3. Reconciliation Rules

   Reconciliation rules determine how the two sets of preferences for a
   given feature are resolved into a unique result. The reconciliation
   rule depends only on the feature number. Each reconciliation rule
   must have the property that the result is uniquely determined given
   the contents of Change options sent by the two endpoints.

   All current DCCP features use one of two reconciliation rules,
   server-priority ("SP") and non-negotiable ("NN").

6.3.1. Server-Priority

   The feature value is a fixed-length byte string (length determined
   by the feature number).
   Each Change option contains a preference list of values, with the
   most preferred value coming first. Each Confirm option contains the
   confirmed value, followed by the confirmer's preference list. Thus,
   the feature's current value will generally appear twice in Confirm
   options' data, once as the current value and once in the confirmer's
   preference list.

   To reconcile the preference lists, select the first entry in the
   server's list that also occurs in the client's list. If there is no
   shared entry, the feature's value MUST NOT change, and the Confirm
   option will confirm the feature's previous value (unless the Change
   option was Mandatory; see Section 6.6.9).

   A single feature negotiation may, because of loss or delay, contain
   retransmitted Change options and multiple Confirm options. Each of
   the retransmitted Change options MUST contain the same payload; see
   Section 6.6.3. For server-priority features, this means that an
   endpoint sending Change options MUST NOT change its preference list
   during a negotiation. However, the other endpoint MAY change its
   preference list at will, assuming it hasn't recently sent a Change
   option for the same feature. Reordering protection (Section 6.6.4)
   ensures that agreement is reached.

6.3.2. Non-Negotiable

   The feature value is a byte string. Each option contains exactly one
   feature value. The feature location signals a new value by sending a
   Change L option. The feature remote MUST accept any valid value,
   responding with a Confirm R option containing the new value, and it
   MUST send empty Confirm R options in response to invalid values
   (unless the Change L option was Mandatory; see Section 6.6.9).
   Change R and Confirm L options MUST NOT be sent for non-negotiable
   features. Non-negotiable features use the feature negotiation
   mechanism to achieve reliability.

6.4. Feature Numbers

   This document defines the following feature numbers.

                                          Rec'n  Initial         Section
      Number   Meaning                    Rule    Value   Req'd  Reference
      ------   -------                    -----   -----   -----  ---------
         0     Reserved
         1     Congestion Control ID       SP       2       Y    10
               (CCID)
         2     Allow Short Seqnos          SP       1       Y    7.6.1
         3     Sequence Window             NN     100       Y    7.5.4
         4     ECN Capable                 SP       1       Y    12.1
         5     Ack Ratio                   NN       2       N    11.3
         6     Send Ack Vector             SP       0       N    11.5
         7     Send NDP Count              SP       0       N    7.7.2
         8     Minimum Checksum Coverage   SP       0       N    9.2.1
         9     Check Data Checksum         SP       0       N    9.3.1
      10-127   Reserved
      128-255  CCID-specific features                            10.4

   Rec'n Rule
      The reconciliation rule used for the feature. SP is
      server-priority and NN is non-negotiable.

   Initial Value
      The initial value for the feature. Every feature has a known
      initial value.

   Req'd
      This column is "Y" iff every DCCP implementation MUST understand
      the feature. If it is "N", then the feature behaves like an
      extension (see Section 15), and it is safe to respond to Change
      options for the feature with empty Confirm options. Of course, a
      CCID might require the feature; a DCCP that implements CCID 2
      MUST support Ack Ratio and Send Ack Vector, for example.

6.5. Examples

   Here are three example feature negotiations for features located at
   the server, the first two for the Congestion Control ID feature, the
   last for the Ack Ratio.

          Client                                      Server
          ------                                      ------
      1.  Change R(CCID, 2 3 1) -->
          ("2 3 1" is client's preference list)
      2.                          <-- Confirm L(CCID, 3, 3 2 1)
                          (3 is the negotiated value;
                           "3 2 1" is server's pref list)
                 * agreement that CCID/Server = 3 *

      1.                  XXX <-- Change L(CCID, 3 2 1)
      2.                  Retransmission:
                              <-- Change L(CCID, 3 2 1)
      3.  Confirm R(CCID, 3, 2 3 1) -->
                 * agreement that CCID/Server = 3 *

      1.                          <-- Change L(Ack Ratio, 3)
      2.  Confirm R(Ack Ratio, 3) -->
                 * agreement that Ack Ratio/Server = 3 *

   This example shows a simultaneous negotiation.

          Client                                      Server
          ------                                      ------
      1a. Change R(CCID, 2 3 1) -->
       b.                         <-- Change L(CCID, 3 2 1)
      2a.                         <-- Confirm L(CCID, 3, 3 2 1)
       b. Confirm R(CCID, 3, 2 3 1) -->
                 * agreement that CCID/Server = 3 *

   Here are the byte encodings of several Change and Confirm options.
   Each option is sent by DCCP A.

   Change L(CCID, 2 3) = 32,5,1,2,3
      DCCP B should change CCID/A's value (feature number 1, a
      server-priority feature); DCCP A's preferred values are 2 and 3,
      in that preference order.

   Change L(Sequence Window, 1024) = 32,6,3,0,4,0
      DCCP B should change Sequence Window/A's value (feature number 3,
      a non-negotiable feature) to the 3-byte string 0,4,0 (the value
      1024).

   Confirm L(CCID, 2, 2 3) = 33,6,1,2,2,3
      DCCP A has changed CCID/A's value to 2; its preferred values are
      2 and 3, in that preference order.

   Empty Confirm L(126) = 33,3,126
      DCCP A doesn't implement feature number 126, or DCCP B's proposed
      value for feature 126/A was invalid.

   Change R(CCID, 3 2) = 34,5,1,3,2
      DCCP B should change CCID/B's value; DCCP A's preferred values
      are 3 and 2, in that preference order.

   Confirm R(CCID, 2, 3 2) = 35,6,1,2,3,2
      DCCP A has changed CCID/B's value to 2; its preferred values were
      3 and 2, in that preference order.

   Confirm R(Sequence Window, 1024) = 35,6,3,0,4,0
      DCCP A has changed Sequence Window/B's value to the 3-byte string
      0,4,0 (the value 1024).

   Empty Confirm R(126) = 35,3,126
      DCCP A doesn't implement feature number 126, or DCCP B's proposed
      value for feature 126/B was invalid.

6.6. Option Exchange

   A few basic rules govern feature negotiation option exchange.

   1. Every non-reordered Change option gets a Confirm option in
      response.

   2. Change options are retransmitted until a response for the latest
      Change is received.

   3. Feature negotiation options are processed in strictly increasing
      order by Sequence Number.

   The rest of this section describes the consequences of these rules
   in more detail.

6.6.1. Normal Exchange

   Change options are generated when a DCCP endpoint wants to change
   the value of some feature.
Generally, this will happen at the beginning of a connection, although it may happen at any time. We say the endpoint "generates" or "sends" a Change L or Change R option, but of course the option must be attached to a packet. The endpoint may attach the option to a packet it would have generated anyway (such as a DCCP-Request). Alternatively, it may create a "feature negotiation packet", often a DCCP-Ack or DCCP-Sync, just to carry the option. Feature negotiation packets MUST be rate-limited by the relevant congestion control mechanisms. In addition, an endpoint SHOULD generate at most one feature negotiation packet per round-trip time (0.1 seconds, if no RTT is available).

On receiving a Change L or Change R option, a DCCP endpoint examines the included preference list, reconciles that with its own preference list, calculates the new value, and sends back a Confirm R or Confirm L option, respectively, informing its peer of the new value. Every non-reordered Change option MUST result in a corresponding Confirm option, and any packet including a Confirm option MUST carry an Acknowledgement Number. Generated Confirm options may be attached to packets that would have been sent anyway (such as DCCP-Response or DCCP-SyncAck), or to new feature negotiation packets, as described above.

The Change-sending endpoint MUST wait to receive a corresponding Confirm option before changing its stored feature value. The Confirm-sending endpoint changes its stored feature value as soon as it sends the Confirm.

Endpoints MUST NOT send packets that contain more than one feature negotiation option referring to the same feature. Note, however, that a packet is allowed to contain one L option and one R option with the same feature number F, since the two options actually refer to different features (F/A and F/B).

6.6.2. Processing Received Options

DCCP endpoints exist in one of three states relative to each feature. STABLE is the normal state, where the endpoint knows the feature's value and thinks the other endpoint agrees. An endpoint enters the CHANGING state when it first sends a Change for the feature, and returns to STABLE once it receives a corresponding Confirm. The final state, UNSTABLE, indicates that an endpoint in CHANGING state changed its preference list, but has not yet transmitted a Change option with the new preference list.

Feature-related state transitions at the feature location are implemented as shown in the diagram below. For feature-related state transitions at the feature remote, switch the "L"s and "R"s. The diagram ignores sequence number and option validity issues; these are handled explicitly in the pseudocode that follows the diagram.

                          timeout/
   rcv Confirm R     app/protocol evt : snd Change L      rcv non-ack
   : ignore     +---------------------------------------+ : snd Change L
         +----+ |                                       |   +----+
         |    v |                    rcv Change R       v   |    v
     +------------+  rcv Confirm R   : calc new value, +------------+
     |            |  : accept value  snd Confirm L     |            |
     |   STABLE   |<-----------------------------------|  CHANGING  |
     |            |        rcv empty Confirm R         |            |
     +------------+        : revert to old value       +------------+
         |    ^                                            |    ^
         +----+                                  pref list |    | snd
     rcv Change R                                changes   |    | Change L
     : calc new value, snd Confirm L                       v    |
                                                       +------------+
                                                   +---|            |
                             rcv Confirm/Change R  |   |  UNSTABLE  |
                             : ignore              +-->|            |
                                                       +------------+

Endpoints SHOULD use the following pseudocode, which corresponds to the state diagram, to react to each feature negotiation option on each valid packet received.
The pseudocode refers to "P.seqno" and "P.ackno", which are properties of the packet; "O.type", and "O.len", which are properties of the option; "FGSR" and "FGSS", which are properties of the connection, and handle reordering as described in Section 6.6.4; "F.state", which is the feature's state (STABLE, CHANGING, or UNSTABLE); and "F.value", which is the feature's value.

   First, check for unknown features (Section 6.6.7);
   If F is unknown:
      If option was Mandatory:          /* Section 6.6.9 */
         Reset connection and return
      Otherwise, if O.type == Change R:
         Send Empty Confirm L on a future packet
      Return

   Second, check for reordering (Section 6.6.4);
   If F.state == UNSTABLE or P.seqno <= FGSR
         or (O.type == Confirm R and P.ackno < FGSS)
      Ignore option and return

   Third, process Change R options;
   If O.type == Change R:
      If option's value is valid:       /* Section 6.6.8 */
         Calculate new value
         Send Confirm L on a future packet
         Set F.state := STABLE
      Otherwise, if option was Mandatory:
         Reset connection and return
      Otherwise:
         Send Empty Confirm L on a future packet
         /* Remain in existing state. If that's CHANGING, this
            endpoint will retransmit its Change L option later. */

   Fourth, process Confirm R options (but only in CHANGING state).
   If F.state == CHANGING and O.type == Confirm R:
      If O.len > 3:                     /* nonempty */
         If option's value is valid:
            Set F.value := new value
         Otherwise:
            Reset connection and return
      Set F.state := STABLE

6.6.3. Loss and Retransmission

Packets containing Change and Confirm options might be lost or delayed by the network. Therefore, Change options are retransmitted to achieve reliability.

A CHANGING endpoint transmits another Change option once it realizes that it has not heard back from the other endpoint. The new Change option need not contain the same payload as the original; reordering protection will ensure that agreement is reached based on the most recently transmitted option.
The endpoint may piggyback its Change options on packets it would have sent anyway. If it generates new packets for feature negotiation, it MUST use an exponential-backoff timer. The timer is initially set to approximately one or two round-trip times (or 0.1-0.2 seconds, if no RTT is available), and pinned at roughly 32 RTTs.

A CHANGING endpoint MUST continue retransmitting Change options until it gets some response or the connection terminates.

Endpoints SHOULD NOT send Change options for a given feature more frequently than once per RTT. Otherwise, the reordering protection algorithms described in the next subsection may delay agreement, since no received Confirm option would acknowledge the most recently transmitted Change.

Confirm options are never retransmitted, but the Confirm-sending endpoint MUST generate a Confirm option after every non-reordered Change.

6.6.4. Reordering

Reordering might cause packets containing Change and Confirm options to arrive in an unexpected order. Endpoints MUST ignore feature negotiation options that do not arrive in strictly-increasing order by Sequence Number. The rest of this section presents two algorithms that fulfill this requirement.

The first algorithm introduces two sequence number variables that each endpoint maintains for the connection.

FGSR
   Feature Greatest Sequence Number Received: The greatest sequence number received, considering only valid packets that contained one or more feature negotiation options (Change and/or Confirm). This value is initialized to ISR - 1.

FGSS
   Feature Greatest Sequence Number Sent: The greatest sequence number sent, considering only packets that contained one or more non-retransmitted Change options. (Retransmitted Change options MUST have exactly the same contents as previously transmitted options, so limited reordering can safely be tolerated.) This value is initialized to ISS.
Each endpoint checks two conditions on sequence numbers to decide whether to process received feature negotiation options.

   1. If a packet's Sequence Number is less than or equal to FGSR, then its Change options MUST be ignored.

   2. If a packet's Sequence Number is less than or equal to FGSR, OR it has no Acknowledgement Number, OR its Acknowledgement Number is less than FGSS, then its Confirm options MUST be ignored.

Alternatively, an endpoint MAY maintain separate FGSR and FGSS values for every feature. FGSR(F/X) would equal the greatest sequence number received, considering only packets that contained Change or Confirm options applying to feature F/X; FGSS(F/X) would be defined similarly. This algorithm requires more state, but is slightly more forgiving to multiple overlapped feature negotiations. Either algorithm MAY be used; the first algorithm, with connection-wide FGSR and FGSS variables, is RECOMMENDED.

One consequence of these rules is that a CHANGING endpoint will ignore any Confirm option that does not acknowledge the latest Change option sent. This ensures that agreement, once achieved, used the most recent available information about the endpoints' preferences.

6.6.5. Preference Changes

Endpoints are allowed to change their preference lists at any time. However, an endpoint that changes its preference list while in the CHANGING state MUST transition to the UNSTABLE state. It will transition back to CHANGING once it has transmitted a Change option with the new preference list. This ensures that agreement is based on active preference lists. Without the UNSTABLE state, simultaneous negotiation -- where the endpoints began independent negotiations for the same feature at the same time -- might lead to the negotiation terminating with the endpoints thinking the feature had different values.

6.6.6. Simultaneous Negotiation

The two endpoints might simultaneously open negotiation for the same feature, after which an endpoint in the CHANGING state will receive a Change option for the same feature. Such received Change options can act as responses to the original Change options. The CHANGING endpoint MUST examine the received Change's preference list, reconcile that with its own preference list (as expressed in its generated Change options), and generate the corresponding Confirm option. It can then transition to the STABLE state.

6.6.7. Unknown Features

Endpoints may receive Change options referring to feature numbers they do not understand -- for instance, when an extended DCCP converses with a non-extended DCCP. Endpoints MUST respond to unknown Change options with Empty Confirm options (that is, Confirm options containing no data), which inform the CHANGING endpoint that the feature was not understood. However, if the Change option was preceded by a Mandatory option, the connection MUST be reset; see Section 6.6.9.

On receiving an empty Confirm option for some feature, the CHANGING endpoint MUST transition back to the STABLE state, leaving the feature's value unchanged. Section 15 suggests that the default value for any extension feature should correspond to "extension not available".

Some features are required to be understood by all DCCPs (see Section 6.4). The CHANGING endpoint SHOULD reset the connection (with Reset Code 5, "Option Error") if it receives an empty Confirm option for such a feature.

Since Confirm options are generated only in response to Change options, an endpoint should never receive a Confirm option referring to a feature number it does not understand. Endpoints MUST ignore such options.

6.6.8. Invalid Options

A DCCP endpoint might receive a Change or Confirm option that lists one or more values that it does not understand.
Some, but not all, such options are invalid, depending on the relevant reconciliation rule (Section 6.3). For instance:

   o  All features have length limitations, and options with invalid lengths are invalid. For example, the Ack Ratio feature takes 16-bit values, so valid "Confirm R(Ack Ratio)" options have option length 5.

   o  Some non-negotiable features have value limitations. The Ack Ratio feature takes two-byte, non-zero integer values, so a "Change L(Ack Ratio, 0)" option is never valid. Note that server-priority features do not have value limitations, since unknown values are handled as a matter of course.

   o  Any Confirm option that selects the wrong value, based on the two preference lists and the relevant reconciliation rule, is invalid.

   o  However, unexpected Confirm options -- that refer to unknown feature numbers, or that don't appear to be part of a current negotiation -- are considered valid, although they are ignored by the receiver.

An endpoint receiving an invalid Change option MUST respond with the corresponding empty Confirm option. An endpoint receiving an invalid Confirm option MUST reset the connection, with Reset Code 5, "Option Error".

6.6.9. Mandatory Feature Negotiation

Change options may be preceded by Mandatory options (Section 5.8.2). Mandatory Change options are processed like normal Change options, except that the following failure cases will cause the receiver to reset the connection with Reset Code 6, "Mandatory Failure", rather than send a Confirm option. The connection MUST be reset if:

   o  The Change option's feature number was not understood;

   o  The Change option's value was invalid, and the receiver would normally have sent an empty Confirm option in response; or

   o  For server-priority features, there was no shared entry in the two endpoints' preference lists.
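
The failure cases above, combined with the server-priority reconciliation rule (Section 6.3), can be sketched in a few lines. The following Python fragment is illustrative only and is not part of the protocol definition; the function names "reconcile" and "handle_change", the boolean parameters, and the returned action strings are hypothetical names chosen for this sketch.

```python
def reconcile(server_prefs, client_prefs):
    """Server-priority rule: first entry in the server's list that also
    occurs in the client's list, or None if the lists share no entry."""
    for value in server_prefs:
        if value in client_prefs:
            return value
    return None


def handle_change(known, valid, mandatory, server_prefs=(), client_prefs=()):
    """Decide how to answer a received Change option for a
    server-priority feature.

    Returns an (action, detail) pair. The action names are hypothetical:
      "reset"            -> reset the connection; detail is the Reset Code
      "empty_confirm"    -> send an empty Confirm option
      "confirm_previous" -> confirm the feature's previous value
      "confirm"          -> confirm the reconciled value given in detail
    """
    if not known or not valid:
        # Unknown feature number or invalid value.
        if mandatory:
            return ("reset", 6)  # Reset Code 6, "Mandatory Failure"
        return ("empty_confirm", None)
    agreed = reconcile(server_prefs, client_prefs)
    if agreed is None:
        # No shared entry: the feature's value must not change.
        if mandatory:
            return ("reset", 6)
        return ("confirm_previous", None)
    return ("confirm", agreed)
```

With the preference lists from the first example in Section 6.5, reconcile([3, 2, 1], [2, 3, 1]) selects 3, which is the value the server confirms there.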

There's no reason to mark Confirm options as Mandatory in this version of DCCP, since Confirm options are sent only in response to Change options and therefore can't mention potentially-invalid values or unexpected feature numbers.

6.6.10. Out-of-Band Agreement

An endpoint MUST NOT unilaterally change the value of any DCCP feature. However, endpoints MAY cooperatively change DCCP feature values without using in-band feature negotiation options. For example, features MAY be changed via negotiation over a separate signaling channel.

7. Sequence Numbers

DCCP uses sequence numbers to arrange packets into sequence, detect losses and network duplicates, and protect against attackers, half-open connections, and the delivery of very old packets. Every packet carries a Sequence Number; most packet types carry an Acknowledgement Number as well.

DCCP sequence numbers are packet-based. That is, the packets generated by each endpoint have Sequence Numbers that increase by one, modulo 2^48, for every packet. Even DCCP-Ack and DCCP-Sync packets, and other packets that don't carry user data, increment the Sequence Number. Since DCCP is an unreliable protocol, there are no true retransmissions; but effective retransmissions, such as retransmissions of DCCP-Request packets, also increment the Sequence Number. This lets DCCP implementations detect network duplication, retransmissions, and acknowledgement loss, and is a significant departure from TCP practice.

7.1. Variables

DCCP endpoints maintain a set of sequence number variables for each connection.

ISS
   The Initial Sequence Number Sent by this endpoint. This equals the Sequence Number of the first DCCP-Request or DCCP-Response sent.

ISR
   The Initial Sequence Number Received from the other endpoint. This equals the Sequence Number of the first DCCP-Request or DCCP-Response received.

GSS
   The Greatest Sequence Number Sent by this endpoint. Here, and elsewhere, "greatest" is measured in circular sequence space.

GSR
   The Greatest Sequence Number Received from the other endpoint on an acknowledgeable packet. (Section 7.4 defines "acknowledgeable" packets.)

GAR
   The Greatest Acknowledgement Number Received from the other endpoint on an acknowledgeable packet that was not a DCCP-Sync.

Some other variables are derived from these primitives.

SWL and SWH
   (Sequence Number Window Low and High) The extremes of the validity window for received packets' Sequence Numbers.

AWL and AWH
   (Acknowledgement Number Window Low and High) The extremes of the validity window for received packets' Acknowledgement Numbers.

7.2. Initial Sequence Numbers

The endpoints' initial sequence numbers are set by the first DCCP-Request and DCCP-Response packets sent. Initial sequence numbers MUST be chosen to avoid two problems:

   o  Delivery of old packets, where packets lingering in the network from an old connection are delivered to a new connection with the same addresses and port numbers.

   o  Sequence number attacks, where an attacker can guess the sequence numbers that a future connection would use [M85].

These problems are the same as problems faced by TCP, and DCCP implementations SHOULD use TCP's strategies to avoid them [RFC 793] [RFC 1948]. The rest of this section explains these strategies in more detail.

To address the first problem, an implementation MUST ensure that the initial sequence number for a given 4-tuple doesn't overlap with recent sequence numbers on previous connections with the same 4-tuple. ("Recent" means sent within 2 maximum segment lifetimes, or 4 minutes.)
The implementation MUST additionally ensure that the lower 24 bits of the initial sequence number don't overlap with the lower 24 bits of recent sequence numbers (unless the implementation plans to avoid short sequence numbers; see Section 7.6). An implementation that has state for a recent connection with the same 4-tuple can pick a good initial sequence number explicitly. Otherwise, it could tie initial sequence number selection to some clock, such as the 4-microsecond clock used by TCP [RFC 793]. Two separate clocks may be required, one for the upper 24 bits and one for the lower 24 bits.

To address the second problem, an implementation MUST provide each 4-tuple with an independent initial sequence number space. Then opening a connection doesn't provide any information about initial sequence numbers on other connections to the same host. RFC 1948 achieves this by adding a cryptographic hash of the 4-tuple and a secret to each initial sequence number. For the secret, RFC 1948 recommends a combination of some truly-random data [RFC 1750], an administratively-installed passphrase, the endpoint's IP address, and the endpoint's boot time, but truly-random data is sufficient. Care should be taken when changing the secret; such a change alters all initial sequence number spaces, which might make an initial sequence number for some 4-tuple equal a recently sent sequence number for the same 4-tuple. To avoid this problem, the endpoint might remember dead connection state for each 4-tuple or stay quiet for 2 maximum segment lifetimes around such a change.

7.3. Quiet Time

DCCP endpoints, like TCP endpoints, must take care before initiating connections when they boot. In particular, they MUST NOT send packets whose sequence numbers are close to the sequence numbers of packets lingering in the network from before the boot. The simplest way to enforce this rule is for DCCP endpoints to avoid sending any packets until one maximum segment lifetime (2 minutes) after boot.
Other enforcement mechanisms include remembering recent sequence numbers across boots, and reserving the upper 8 or so bits of initial sequence numbers for a persistent counter that decrements by two each boot. (The latter mechanism would require disallowing packets with short sequence numbers; see Section 7.6.1.)

7.4. Acknowledgement Numbers

Cumulative acknowledgements are meaningless in an unreliable protocol. Therefore, DCCP's Acknowledgement Number field has a different meaning than TCP's.

A packet is classified as "acknowledgeable" if and only if its options were processed by the receiving DCCP. This means, for example, that all acknowledgeable packets have valid header checksums and sequence numbers. The Acknowledgement Number MUST equal GSR, the Greatest Sequence Number Received on an acknowledgeable packet, for all packet types except DCCP-Sync and DCCP-SyncAck.

"Acknowledgeable" does not refer to data processing. Even acknowledgeable packets may have their application data dropped, due to receive buffer overflow or corruption, for instance. Data Dropped options report these data losses when necessary, letting congestion control mechanisms distinguish between network losses and endpoint losses. This issue is discussed further in Sections 11.4 and 11.8.

DCCP-Sync and DCCP-SyncAck packets' Acknowledgement Numbers differ as follows: The Acknowledgement Number on a DCCP-Sync packet corresponds to a received packet, but not necessarily an acknowledgeable packet; in particular, it might correspond to an out-of-sync packet whose options were not processed. The Acknowledgement Number on a DCCP-SyncAck packet always corresponds to an acknowledgeable DCCP-Sync packet; it might be less than GSR in the presence of reordering.

7.5. Validity and Synchronization

Any DCCP endpoint might receive packets that are not actually part of the current connection.
For instance, the network might deliver an old packet, an attacker might attempt to hijack a connection, or the other endpoint might crash, causing a half-open connection.

DCCP, like TCP, uses sequence number checks to detect these cases. Packets whose Sequence and/or Acknowledgement Numbers are out of range are called sequence-invalid, and are not processed normally.

Unlike TCP, DCCP requires a synchronization mechanism to recover from large bursts of loss. One endpoint might send so many packets during a burst of loss that when one of its packets finally got through, the other endpoint would label its Sequence Number as invalid. A handshake of DCCP-Sync and DCCP-SyncAck packets recovers from this case.

7.5.1. Sequence-Validity Rules

Sequence-validity depends on the received packet's type. This table shows the sequence and acknowledgement number checks applied to each packet; a packet is sequence-valid if it passes both tests, and sequence-invalid if it does not. Many of the checks refer to the sequence and acknowledgement number windows [SWL, SWH] and [AWL, AWH], which are defined in Section 7.5.3.

                                            Acknowledgement Number
   Packet Type     Sequence Number Check    Check
   -----------     ---------------------    ----------------------
   DCCP-Request    SWL <= seqno <= SWH (*)  N/A
   DCCP-Response   SWL <= seqno <= SWH (*)  AWL <= ackno <= AWH
   DCCP-Data       SWL <= seqno <= SWH      N/A
   DCCP-Ack        SWL <= seqno <= SWH      AWL <= ackno <= AWH
   DCCP-DataAck    SWL <= seqno <= SWH      AWL <= ackno <= AWH
   DCCP-CloseReq   GSR <  seqno <= SWH      GAR <= ackno <= AWH
   DCCP-Close      GSR <  seqno <= SWH      GAR <= ackno <= AWH
   DCCP-Reset      GSR <  seqno <= SWH      GAR <= ackno <= AWH
   DCCP-Sync       seqno >= SWL             AWL <= ackno <= AWH
   DCCP-SyncAck    seqno >= SWL             AWL <= ackno <= AWH

   (*) Check not applied if connection is in LISTEN or REQUEST state.
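
The table above can be exercised with a small sketch. The following Python fragment is illustrative and non-normative. It computes the windows using the formulas given in Section 7.5.3 and then applies the per-type checks. Several things here are assumptions of the sketch rather than statements of this document: the names "circ_le", "circ_lt", "windows", and "sequence_valid"; the circular comparison convention (A <= B when (B - A) mod 2^48 is below 2^47); the omission of the start-of-connection ISR/ISS adjustment; and the omission of the LISTEN/REQUEST exception marked (*).

```python
from collections import namedtuple

MOD = 1 << 48    # Sequence Numbers are taken modulo 2^48
HALF = 1 << 47   # assumed cutoff for circular comparison

Windows = namedtuple("Windows", "swl swh awl awh")


def circ_le(a, b):
    """True if a <= b in circular sequence space (assumed convention)."""
    return (b - a) % MOD < HALF


def circ_lt(a, b):
    """True if a < b in circular sequence space (assumed convention)."""
    return 0 < (b - a) % MOD < HALF


def windows(gsr, gss, w, w_prime):
    """Compute [SWL, SWH] and [AWL, AWH] per Section 7.5.3.

    w is the peer's Sequence Window value (W); w_prime is the local
    endpoint's own Sequence Window value (W').
    """
    swl = (gsr + 1 - w // 4) % MOD
    swh = (gsr + (3 * w + 3) // 4) % MOD   # GSR + ceil(3W/4)
    awl = (gss + 1 - w_prime) % MOD
    awh = gss % MOD
    return Windows(swl, swh, awl, awh)


def sequence_valid(ptype, seqno, ackno, win, gsr, gar):
    """Apply the basic (lenient) checks from the table for one packet."""
    in_seq = circ_le(win.swl, seqno) and circ_le(seqno, win.swh)
    if ptype in ("Request", "Data"):
        return in_seq                      # no Acknowledgement Number check
    in_ack = circ_le(win.awl, ackno) and circ_le(ackno, win.awh)
    if ptype in ("Response", "Ack", "DataAck"):
        return in_seq and in_ack
    if ptype in ("CloseReq", "Close", "Reset"):
        return (circ_lt(gsr, seqno) and circ_le(seqno, win.swh)
                and circ_le(gar, ackno) and circ_le(ackno, win.awh))
    if ptype in ("Sync", "SyncAck"):
        return circ_le(win.swl, seqno) and in_ack
    raise ValueError("unknown packet type: " + ptype)
```

For instance, an endpoint with GSR = 1, GSS = 10, and W = W' = 100 has SWH = 76, so a DCCP-Data packet with Sequence Number 101 fails the check while one with Sequence Number 50 passes.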

In general, packets are sequence-valid if their Sequence and Acknowledgement Numbers lie within the corresponding valid windows, [SWL, SWH] and [AWL, AWH]. The exceptions to this rule are as follows:

   o  Since DCCP-CloseReq, DCCP-Close, and DCCP-Reset packets end a connection, they cannot have Sequence Numbers less than or equal to GSR, or Acknowledgement Numbers less than GAR.

   o  DCCP-Sync and DCCP-SyncAck Sequence Numbers are not strongly checked. These packet types exist specifically to get the endpoints back into sync after bursts of loss; checking their Sequence Numbers would eliminate their usefulness.

The lenient checks on DCCP-Sync and DCCP-SyncAck packets allow continued operation after unusual events, such as endpoint crashes and large bursts of loss. There's no need for leniency when the endpoints are actively sending packets to one another. Therefore, DCCP implementations SHOULD use the following, more stringent checks for active connections. A connection is considered active if it has received valid packets from the other endpoint within the last several round-trip times, or 0.5 seconds, if the RTT is not known.

                                            Acknowledgement Number
   Packet Type     Sequence Number Check    Check
   -----------     ---------------------    ----------------------
   DCCP-Sync       SWL <= seqno <= SWH      AWL <= ackno <= AWH
   DCCP-SyncAck    SWL <= seqno <= SWH      AWL <= ackno <= AWH

Finally, an endpoint MAY apply the following more stringent checks to DCCP-CloseReq, DCCP-Close, and DCCP-Reset packets, further lowering the probability of successful blind attacks using those packet types. Since these checks can cause extra synchronization overhead and delay connection closing when packets are lost, they should be considered experimental.

                                            Acknowledgement Number
   Packet Type     Sequence Number Check    Check
   -----------     ---------------------    ----------------------
   DCCP-CloseReq   seqno == GSR + 1         GAR <= ackno <= AWH
   DCCP-Close      seqno == GSR + 1         GAR <= ackno <= AWH
   DCCP-Reset      seqno == GSR + 1         GAR <= ackno <= AWH

Note that sequence-validity is only one of the validity checks applied to received packets.

7.5.2. Handling Sequence-Invalid Packets

Sequence-invalid DCCP-Sync and DCCP-SyncAck packets MUST be ignored.

On receiving any other sequence-invalid packet, an endpoint (say, DCCP A) MUST reply with a DCCP-Sync packet. This packet MUST acknowledge the sequence-invalid packet's Sequence Number, not GSR. The DCCP-Sync MUST use a new Sequence Number, and thus will increase GSS; GSR will not change, however, since the received packet was sequence-invalid. DCCP A MUST NOT otherwise process sequence-invalid packets. For instance, it MUST NOT process their options.

On receiving a sequence-valid DCCP-Sync, the peer endpoint (DCCP B) MUST either respond with a DCCP-Reset packet, or update its GSR variable and reply with a DCCP-SyncAck packet. The DCCP-SyncAck packet's Acknowledgement Number will equal the DCCP-Sync's Sequence Number, not necessarily GSR. Upon receiving this DCCP-SyncAck, which will be sequence-valid since it acknowledges the DCCP-Sync, DCCP A will update its GSR variable, and the endpoints will be back in sync.

A DCCP endpoint MAY temporarily preserve sequence-invalid packets in case they become valid later. This can reduce the impact of bursts of loss by delivering more packets to the application. In particular, an endpoint MAY preserve sequence-invalid packets for up to 2 round-trip times (or 0.2 seconds, if the RTT is unknown); if, within that time, the relevant sequence windows change so that the packets become sequence-valid, the endpoint MAY process the packets again.
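
The DCCP-Sync and DCCP-SyncAck reply rules described earlier in this section can be sketched as follows. This Python fragment is illustrative and non-normative. The function names and the dictionary representation of packets are hypothetical. The sketch leaves out rate limiting, the DCCP-Reset alternative, and the preservation of sequence-invalid packets, and it models "update its GSR variable" as taking the later of the old GSR and the DCCP-Sync's Sequence Number in circular order, which is an assumption.

```python
MOD = 1 << 48    # Sequence Numbers wrap modulo 2^48
HALF = 1 << 47   # assumed cutoff for circular comparison


def reply_to_sequence_invalid(ptype, seqno, gss):
    """Return (reply, new_gss) for a sequence-invalid packet.

    Sequence-invalid DCCP-Sync and DCCP-SyncAck packets are ignored.
    Any other packet type is answered with a DCCP-Sync that acknowledges
    the offending packet's Sequence Number (not GSR) and uses a new
    Sequence Number, so GSS advances while GSR is left untouched.
    """
    if ptype in ("Sync", "SyncAck"):
        return None, gss
    new_gss = (gss + 1) % MOD
    return {"type": "Sync", "seq": new_gss, "ack": seqno}, new_gss


def reply_to_valid_sync(sync_seqno, gss, gsr):
    """Return (reply, new_gss, new_gsr) for a sequence-valid DCCP-Sync.

    The DCCP-SyncAck acknowledges the DCCP-Sync's Sequence Number, which
    is not necessarily equal to GSR.
    """
    if (sync_seqno - gsr) % MOD < HALF:
        gsr = sync_seqno               # assumed meaning of "update GSR"
    new_gss = (gss + 1) % MOD
    return {"type": "SyncAck", "seq": new_gss, "ack": sync_seqno}, new_gss, gsr
```

As a usage example, an endpoint with GSS = 10 that receives a sequence-invalid DCCP-Data packet with Sequence Number 101 produces a DCCP-Sync with Sequence Number 11 and Acknowledgement Number 101. A peer with GSS = 101 and GSR = 10 answers that DCCP-Sync with a DCCP-SyncAck carrying Sequence Number 102 and Acknowledgement Number 11.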

To protect itself against denial-of-service attacks (where an attacker sends many sequence-invalid packets, trying to force the receiver to send many DCCP-Syncs), a DCCP implementation SHOULD rate-limit the DCCP-Syncs sent in response to sequence-invalid packets.

Note that sequence-invalid DCCP-Reset packets cause DCCP-Syncs to be generated. This is because endpoints in an unsynchronized state (CLOSED, REQUEST, and LISTEN) might not have enough information to generate a proper DCCP-Reset on the first try. For example, if a peer endpoint is in CLOSED state and receives a DCCP-Data packet, it cannot guess the right Sequence Number to use on the DCCP-Reset it generates (since the DCCP-Data packet has no Acknowledgement Number). The DCCP-Sync generated in response to this bad reset serves as a challenge, and contains enough information for the peer to generate a proper DCCP-Reset. However, the new DCCP-Reset may carry a different Reset Code than the original DCCP-Reset; probably the new Reset Code will be 3, "No Connection". The endpoint SHOULD use information from the original DCCP-Reset when possible.

7.5.3. Sequence and Acknowledgement Number Windows

Each DCCP endpoint defines sequence validity windows that are subsets of the Sequence and Acknowledgement Number spaces. These windows correspond to packets the endpoint expects to receive in the next few round-trip times. The Sequence and Acknowledgement Number windows always contain GSR and GSS, respectively. The window widths are controlled by Sequence Window features for the two half-connections.

The Sequence Number validity window for packets from DCCP B is [SWL, SWH]. This window always contains GSR, the Greatest Sequence Number Received on a sequence-valid packet from DCCP B. It is W packets wide, where W is the value of the Sequence Window/B feature. One-fourth of the sequence window, rounded down, is less than or equal to GSR, and three-fourths is greater than GSR.
(This asymmetric placement assumes that bursts of loss are more common in the network than significant reordering.)

     invalid  |       valid Sequence Numbers        |  invalid
   <---------*|*===========*=======================*|*--------->
         GSR -|GSR + 1 -  GSR                  GSR +|GSR + 1 +
    floor(W/4)|floor(W/4)                 ceil(3W/4)|ceil(3W/4)
               = SWL                           = SWH

The Acknowledgement Number validity window for packets from DCCP B is [AWL, AWH]. The high end of the window, AWH, equals GSS, the Greatest Sequence Number Sent by DCCP A; the window is W' packets wide, where W' is the value of the Sequence Window/A feature.

     invalid  |    valid Acknowledgement Numbers    |  invalid
   <---------*|*===================================*|*--------->
      GSS - W'|GSS + 1 - W'                      GSS|GSS + 1
               = AWL                           = AWH

SWL and AWL are initially adjusted so that they are not less than the initial Sequence Numbers received and sent, respectively:

   SWL := max(GSR + 1 - floor(W/4), ISR),
   AWL := max(GSS - W' + 1, ISS).

These adjustments MUST be applied only at the beginning of the connection. (Long-lived connections may wrap sequence numbers so that they appear to be less than ISR or ISS; the adjustments MUST NOT be applied in that case.)

7.5.4. Sequence Window Feature

The Sequence Window/A feature determines the width of the Sequence Number validity window used by DCCP B, and the width of the Acknowledgement Number validity window used by DCCP A. DCCP A sends a "Change L(Sequence Window, W)" option to notify DCCP B that the Sequence Window/A value is W.

Sequence Window has feature number 3, and is non-negotiable. It takes 3- or 6-byte integer values, like DCCP sequence numbers. Change and Confirm options for Sequence Window are therefore either 6 or 9 bytes long. New connections start with Sequence Window 100 for both endpoints.

A proper Sequence Window/A value should reflect how many packets DCCP A expects to be in flight. Only DCCP A can anticipate this number.
Too-small values increase the risk of the endpoints getting out of sync after bursts of loss; too-large values increase the risk of connection hijacking. (The next section quantifies this risk.) One good guideline is for each endpoint to set Sequence Window to about five times the maximum number of packets it expects to send in a round-trip time. This value may not be available at connection initiation, when the round-trip time is unknown, but the endpoint can always send updates as the connection progresses.

7.5.5. Sequence Number Attacks

Sequence and Acknowledgement Numbers form DCCP's main line of defense against attackers. An attacker that cannot guess sequence numbers cannot easily manipulate or hijack a DCCP connection, and requirements like careful initial sequence number choice eliminate the most serious attacks.

An attacker might still send many packets with randomly chosen Sequence and Acknowledgement Numbers, however. If one of those probes ends up sequence-valid, it may shut down the connection or otherwise cause problems. The easiest such attacks to execute are:

   o  Send DCCP-Data packets with random Sequence Numbers. If one of these packets hits the valid sequence number window, the attack packet's application data may be inserted into the data stream.

   o  Send DCCP-Sync packets with random Sequence and Acknowledgement Numbers. If one of these packets hits the valid acknowledgement number window, the receiver will shift its sequence number window accordingly, getting out of sync with the correct endpoint -- perhaps permanently.

The attacker has to guess both Source and Destination Ports for any of these attacks to succeed. Additionally, the connection would have to be inactive for the DCCP-Sync attack to succeed, assuming the victim implemented the more stringent checks for active connections recommended in Section 7.5.1.
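
As an illustration only, the "hit" condition in the attacks above can be sketched as a circular window-membership test over the window bounds defined in Section 7.5.3. The Python sketch below is not part of the protocol specification. The names MOD48, seq_window, ack_window, and in_window are hypothetical; the per-packet-type validity rules of Section 7.5.1 are not modeled; and the connection-start adjustment of SWL and AWL against ISR and ISS is omitted.

```python
MOD48 = 1 << 48  # DCCP sequence numbers are 48 bits long and wrap around


def seq_window(gsr, w):
    """Return (SWL, SWH) given GSR and the Sequence Window width W."""
    swl = (gsr + 1 - w // 4) % MOD48          # GSR + 1 - floor(W/4)
    swh = (gsr + (3 * w + 3) // 4) % MOD48    # GSR + ceil(3W/4)
    return swl, swh


def ack_window(gss, w_prime):
    """Return (AWL, AWH) given GSS and the Sequence Window width W'."""
    awl = (gss + 1 - w_prime) % MOD48         # GSS + 1 - W'
    awh = gss % MOD48                         # AWH equals GSS
    return awl, awh


def in_window(x, low, high):
    """Circular test: is x within [low, high] modulo 2^48?"""
    return (x - low) % MOD48 <= (high - low) % MOD48


# Assumed example: a receiver with GSR = 1 and the default window W = 100.
swl, swh = seq_window(gsr=1, w=100)
print(in_window(2, swl, swh))        # True
print(in_window(101, swl, swh))      # False
print(in_window(10**6, swl, swh))    # False
```

Subtracting the low edge before reducing modulo 2^48 keeps the test correct when a window straddles the wrap-around point. In the assumed example the low edge wraps below zero, yet sequence number 2 is still accepted, while 101 and 10^6 fall outside the window, consistent with the examples in Section 7.5.6.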
To quantify the probability of success, let N be the number of attack packets the attacker is willing to send, W be the relevant sequence window width, and L be the length of sequence numbers (24 or 48). The attacker's best strategy is to space the attack packets evenly over sequence space. Then the probability of hitting one sequence number window is P = WN/2^L.

For N = 1000, W = 100, and L = 24, P is about 0.006. This is the probability of a successful DCCP-Data attack using short sequence numbers. (For reference, the easiest TCP attack -- sending a SYN with a random sequence number, which will cause a connection reset if it falls within the window -- will succeed with probability 0.002 for N = 1000, W = 8760 [a common default], and L = 32.) A connection can reduce this probability by requiring long sequence numbers; see Section 7.6.1.

The DCCP-Sync attack has L = 48, since DCCP-Sync packets use long sequence numbers exclusively, and attacks correspondingly have a smaller probability of success. For N = 10,000, W = 2000, and L = 48, a DCCP-Sync attack will succeed with probability 7*10^-8.

Attacks involving DCCP-CloseReq, DCCP-Close, and DCCP-Reset packets are more difficult still, since 48-bit Sequence and Acknowledgement Numbers must both be guessed.

7.5.6. Examples

In the following example, DCCP A and DCCP B recover from a large burst of loss that runs DCCP A's sequence numbers out of DCCP B's appropriate sequence number window.

   DCCP A                                                   DCCP B
   (GSS=1,GSR=10)                                   (GSS=10,GSR=1)
                 -->   DCCP-Data(seq 2)     XXX
                                ...
                 -->   DCCP-Data(seq 100)   XXX
                 -->   DCCP-Data(seq 101)           -->      ???
                                                 seqno out of range;
                                                   send Sync
      OK         <--   DCCP-Sync(seq 11, ack 101)   <--
                                                    (GSS=11,GSR=1)
                 -->   DCCP-SyncAck(seq 102, ack 11)   -->   OK
   (GSS=102,GSR=11)                               (GSS=11,GSR=102)

In the next example, a DCCP connection recovers from a simple blind attack.

   DCCP A                                                   DCCP B
   (GSS=1,GSR=10)                                   (GSS=10,GSR=1)
                *ATTACKER*  -->  DCCP-Data(seq 10^6)  -->  ???
                                                 seqno out of range;
                                                   send Sync
      ???       <--   DCCP-Sync(seq 11, ack 10^6)  <--
   ackno out of range; ignore
   (GSS=1,GSR=10)                                   (GSS=11,GSR=1)

The final example demonstrates recovery from a half-open connection.

   DCCP A                                                   DCCP B
   (GSS=1,GSR=10)                                   (GSS=10,GSR=1)
   (Crash)
   CLOSED                                                     OPEN
   REQUEST      -->   DCCP-Request(seq 400)         -->   ???
     !!         <--   DCCP-Sync(seq 11, ack 400)    <--   OPEN
   REQUEST      -->   DCCP-Reset(seq 401, ack 11)   -->   (Abort)
   REQUEST                                                CLOSED
   REQUEST      -->   DCCP-Request(seq 402)         -->   ...

7.6. Short Sequence Numbers

DCCP sequence numbers are 48 bits long. This large sequence space protects DCCP connections against some blind attacks, such as the injection of DCCP-Resets into the connection. However, DCCP-Data, DCCP-Ack, and DCCP-DataAck packets, which make up the body of any DCCP connection, may reduce header space by transmitting only the lower 24 bits of the relevant Sequence and Acknowledgement Numbers. The receiving endpoint will extend these numbers to 48 bits using the following pseudocode:

   procedure Extend_Sequence_Number(S, REF)
      /* S is a 24-bit sequence number from the packet header.
         REF is the relevant 48-bit reference sequence number:
         GSS if S is an Acknowledgement Number, and GSR if S is a
         Sequence Number. */
      set REF_low := low 24 bits of REF
      set REF_hi := high 24 bits of REF
      if REF_low (<) S          /* CIRCULAR comparison mod 2^24 */
            && S |<| REF_low:   /* NON-CIRCULAR comparison */
         return ((REF_hi + 1) << 24) | S
      otherwise:
         return (REF_hi << 24) | S

The two different kinds of comparison in the if statement detect when the low-order bits of the sequence space have wrapped. When this happens, the high-order bits are incremented.

7.6.1. Allow Short Sequence Numbers Feature

Endpoints can require that all packets use long sequence numbers by setting the Allow Short Sequence Numbers feature to false.
This can reduce the risk that data will be inappropriately injected into the connection. DCCP A sends a "Change R(Allow Short Seqnos, 0)" option to ask DCCP B to send only long sequence numbers.

Allow Short Sequence Numbers has feature number 2, and is server-priority. It takes one-byte Boolean values. DCCP B MUST NOT send packets with short sequence numbers when Allow Short Seqnos/B is zero. Values of two or more are reserved. New connections start with Allow Short Sequence Numbers 1 for both endpoints.

7.6.2. When to Avoid Short Sequence Numbers

Short sequence numbers increase the risks of certain kinds of attacks, including blind data injection, and reduce the rate DCCP connections can safely achieve. Very-high-rate DCCP connections, and connections with large sequence windows (Section 7.5.4), SHOULD NOT use short sequence numbers on their data packets.

The rate limitation imposed by short sequence numbers is easy to calculate. The sequence-validity mechanism assumes that the network does not deliver extremely old data. In particular, it assumes that the network must have dropped any packet by the time the connection wraps around and uses its sequence number again. We can easily calculate the maximum connection rate that can be safely achieved given this constraint. Let MSL equal the maximum segment lifetime, P equal the average DCCP packet size in bits, and L equal the length of sequence numbers (24 or 48 bits). Then the maximum safe rate, in bits per second, is R = P*(2^L)/(2*MSL).

For the default MSL of 2 minutes, 1500-byte DCCP packets, and short sequence numbers, the safe rate is therefore approximately 800 Mb/s. Of course, 2 minutes is a very large MSL for any networks that could sustain that rate with such small packets. Nevertheless, long sequence numbers allow much higher rates, up to 14 petabits a second for 1500-byte packets and the default MSL.

The probability o