Infra

Living Standard — Last Updated

Participate:
GitHub whatwg/infra (new issue, open issues)
IRC: #whatwg on Freenode
Commits:
GitHub whatwg/infra/commits
Snapshot as of this commit
@infrastandard
Translation (non-normative):
日本語

Abstract

The Infra Standard aims to define the fundamental concepts upon which standards are built.

Goals

Suggestions for more goals welcome.

1. Usage

To make use of the Infra Standard in a document titled X, use "X depends on the Infra Standard." Additionally, cross-referencing terminology is encouraged to avoid ambiguity.

Specification authors are also encouraged to add their specification to the list of dependent specifications in order to help the editors ensure that any future breaking changes to the Infra Standard are correctly reflected by any such dependencies.

2. Conventions

2.1. Conformance

All diagrams, examples, and notes are non-normative, as are all sections explicitly marked non-normative. Everything else is normative.

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in RFC 2119. [RFC2119]

These keywords have equivalent meaning when written in lowercase and cannot appear in non-normative content. Standards are encouraged to limit themselves to "must", "must not", "should", and "may", and to use these in their lowercase form as that is generally considered to be more readable.

2.2. Algorithms

Algorithms, and requirements phrased in the imperative as part of algorithms (such as "strip any leading spaces" or "return false"), are to be interpreted with the meaning of the keyword (e.g., "must") used in introducing the algorithm or step. If no such keyword is used, "must" is implied.

For example, were the spec to say:

To eat an orange, the user must:

  1. Peel the orange.
  2. Separate each slice of the orange.
  3. Eat the orange slices.

it would be equivalent to the following:

To eat an orange:

  1. The user must peel the orange.
  2. The user must separate each slice of the orange.
  3. The user must eat the orange slices.

Here the key word is "must".

Modifying the above example, if the algorithm was introduced only with "To eat an orange:", it would still have the same meaning, as "must" is implied.

Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms are intended to be easy to follow, and not intended to be performant.)

2.2.1. Control flow

The control flow of algorithms is such that a requirement to "return" or "throw" terminates the algorithm the statement was in. "Return" will hand the given value, if any, to its caller. "Throw" will make the caller automatically rethrow the given value, if any, and thereby terminate the caller’s algorithm. Using prose, the caller can instead "catch" the exception and perform another action.
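As a non-normative illustration, the "return", "throw", and "catch" behavior described above maps closely onto exceptions in a general-purpose language. The sketch below uses Python; the names SpecError, parse, caller_rethrows, and caller_catches are hypothetical and exist only for this example.

```python
class SpecError(Exception):
    """Hypothetical exception type standing in for a thrown value."""


def parse(value):
    if value is None:
        raise SpecError("no value")  # "throw": terminates parse
    return value * 2  # "return": hands the value to the caller


def caller_rethrows(value):
    # No "catch" here, so an exception from parse terminates this caller too.
    doubled = parse(value)
    return doubled + 1


def caller_catches(value):
    try:
        return parse(value)
    except SpecError:
        return 0  # the caller "catches" and performs another action


print(caller_rethrows(2))    # 5
print(caller_catches(None))  # 0
```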

An iteration’s flow can be controlled via requirements to continue or break. Continue will skip over any remaining steps in an iteration, proceeding to the next item. If no further items remain, the iteration will stop. Break will skip over any remaining steps in an iteration, and skip over any remaining items as well, stopping the iteration.

Let example be the list « 1, 2, 3, 4 ». The following prose would perform operation upon 1, then 2, then 3, then 4:

  1. For each item in example:

    1. Perform operation on item.

The following prose would perform operation upon 1, then 2, then 4. 3 would be skipped.

  1. For each item in example:

    1. If item is 3, then continue.
    2. Perform operation on item.

The following prose would perform operation upon 1, then 2. 3 and 4 would be skipped.

  1. For each item in example:

    1. If item is 3, then break.
    2. Perform operation on item.
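
The three iterations above can be sketched, non-normatively, in Python. The function run and its skip and stop parameters are hypothetical; appending to a result list stands in for "perform operation on item".

```python
def run(example, skip=None, stop=None):
    """Return the items that the operation was performed on, in order."""
    performed = []
    for item in example:
        if item == skip:
            continue  # skip the remaining steps, proceed to the next item
        if item == stop:
            break  # skip the remaining steps and all remaining items
        performed.append(item)  # stands in for "perform operation on item"
    return performed


example = [1, 2, 3, 4]
print(run(example))          # [1, 2, 3, 4]
print(run(example, skip=3))  # [1, 2, 4]
print(run(example, stop=3))  # [1, 2]
```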

2.3. Terminology

The word "or", in cases where both inclusive "or" and exclusive "or" are possible (e.g., "if either width or height is zero"), means an inclusive "or" (implying "or both"), unless it is called out as being exclusive (with "but not both").
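
A brief non-normative sketch of the difference, in Python, using the width and height example from the text. The function names are hypothetical.

```python
def either_is_zero(width, height):
    # Inclusive "or": true when one or both are zero.
    return width == 0 or height == 0


def exactly_one_is_zero(width, height):
    # Exclusive "or" ("but not both"): true when exactly one is zero.
    return (width == 0) != (height == 0)


print(either_is_zero(0, 0))       # True
print(exactly_one_is_zero(0, 0))  # False
print(exactly_one_is_zero(0, 5))  # True
```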

3. Primitive data types

3.1. Bytes

A byte is a sequence of eight bits, represented as a double-digit hexadecimal number in the range 0x00 to 0xFF, inclusive.

An ASCII byte is a byte in the range 0x00 to 0x7F, inclusive.

3.2. Byte sequences

A byte sequence is a sequence of bytes, represented as a space-separated sequence of bytes. Byte sequences with bytes in the range 0x00 to 0x7F, inclusive, can alternatively be written as a string, but using backticks instead of quotation marks, to avoid confusion with an actual string.

0x48 0x49 can also be represented as `HI`.

Headers, such as `Content-Type`, are byte sequences.
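
For illustration only, the two notations can be related in Python, whose bytes type is one possible model of a byte sequence. The helpers spec_notation and is_ascii_byte_sequence are hypothetical names, not part of this standard.

```python
def spec_notation(seq: bytes) -> str:
    """Render a byte sequence as space-separated double-digit hex bytes."""
    return " ".join(f"0x{b:02X}" for b in seq)


def is_ascii_byte_sequence(seq: bytes) -> bool:
    """True if every byte is an ASCII byte (0x00 to 0x7F, inclusive)."""
    return all(b <= 0x7F for b in seq)


hi = bytes([0x48, 0x49])
print(hi == b"HI")                 # True
print(spec_notation(hi))           # 0x48 0x49
print(is_ascii_byte_sequence(hi))  # True
```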

To byte-lowercase a byte sequence, increase each byte it contains, in the range 0x41 to 0x5A, inclusive, by 0x20.

To byte-uppercase a byte sequence, decrease each byte it contains, in the range 0x61 to 0x7A, inclusive, by 0x20.
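
A minimal, non-normative sketch of both operations in Python, assuming a byte sequence is modeled as a bytes object. The function names are chosen for this example. Bytes outside the stated ranges are left untouched.

```python
def byte_lowercase(seq: bytes) -> bytes:
    # Increase each byte in 0x41..0x5A (A-Z) by 0x20; leave others as-is.
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in seq)


def byte_uppercase(seq: bytes) -> bytes:
    # Decrease each byte in 0x61..0x7A (a-z) by 0x20; leave others as-is.
    return bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in seq)


print(byte_lowercase(b"Content-Type"))  # b'content-type'
print(byte_uppercase(b"Content-Type"))  # b'CONTENT-TYPE'
```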

3.3. Code points

A code point is a Unicode code point and is represented as a four-to-six digit hexadecimal number, typically prefixed with "U+". Often the name of the code point is also included in capital letters afterward, potentially with the rendered form of the code point in parentheses. [UNICODE]

The code point rendered as 🤔 is represented as U+1F914.

When referring to that code point, we might instead say "U+1F914 THINKING FACE (🤔)", instead of just "U+1F914", to provide extra context.
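
As a non-normative aid, the notation can be produced mechanically. The Python sketch below uses the standard unicodedata module to look up names; the functions notation and describe are hypothetical, and the names available depend on the Unicode version bundled with the Python runtime.

```python
import unicodedata


def notation(cp: str) -> str:
    """Format a single code point as U+ followed by four to six hex digits."""
    return f"U+{ord(cp):04X}"


def describe(cp: str) -> str:
    """Add the code point's name and rendered form, when a name is known."""
    name = unicodedata.name(cp, "")
    short = notation(cp)
    return f"{short} {name} ({cp})" if name else short


print(notation("\U0001F914"))  # U+1F914
print(describe("A"))           # U+0041 LATIN CAPITAL LETTER A (A)
```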

In certain contexts code points are prefixed with "0x" instead of "U+".

A scalar value is a code point that is not in the range U+D800 to U+DFFF, inclusive.

An ASCII code point is a code point in the range U+0000 to U+007F, inclusive.

An ASCII tab or newline is U+0009, U+000A, or U+000D.

An ASCII whitespace is U+0009, U+000A, U+000C, U+000D, or U+0020.

A C0 control is a code point in the range U+0000 to U+001F, inclusive.

A C0 control or space is a C0 control or U+0020.

An ASCII digit is a code point in the range U+0030 to U+0039, inclusive.

An ASCII upper hex digit is an ASCII digit or a code point in the range U+0041 to U+0046, inclusive.

An ASCII lower hex digit is an ASCII digit or a code point in the range U+0061 to U+0066, inclusive.

An ASCII hex digit is an ASCII upper hex digit or ASCII lower hex digit.

An ASCII upper alpha is a code point in the range U+0041 to U+005A, inclusive.

An ASCII lower alpha is a code point in the range U+0061 to U+007A, inclusive.

An ASCII alpha is an ASCII upper alpha or ASCII lower alpha.

An ASCII alphanumeric is an ASCII digit or ASCII alpha.

For the purposes of the above definitions, "whitespace", "alpha", and "alphanumeric" are mass nouns.
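
The definitions above translate directly into range checks. The following Python predicates are a non-normative sketch that takes a code point as an integer; the function names are chosen for this example, and the upper bound 0x10FFFF for code points comes from Unicode rather than from the text above.

```python
def is_scalar_value(cp: int) -> bool:
    # Any code point except the surrogate range U+D800 to U+DFFF.
    return 0x0000 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def is_ascii_code_point(cp: int) -> bool:
    return 0x0000 <= cp <= 0x007F


def is_ascii_tab_or_newline(cp: int) -> bool:
    return cp in (0x0009, 0x000A, 0x000D)


def is_ascii_whitespace(cp: int) -> bool:
    return cp in (0x0009, 0x000A, 0x000C, 0x000D, 0x0020)


def is_c0_control(cp: int) -> bool:
    return 0x0000 <= cp <= 0x001F


def is_c0_control_or_space(cp: int) -> bool:
    return is_c0_control(cp) or cp == 0x0020


def is_ascii_digit(cp: int) -> bool:
    return 0x0030 <= cp <= 0x0039


def is_ascii_upper_hex_digit(cp: int) -> bool:
    return is_ascii_digit(cp) or 0x0041 <= cp <= 0x0046


def is_ascii_lower_hex_digit(cp: int) -> bool:
    return is_ascii_digit(cp) or 0x0061 <= cp <= 0x0066


def is_ascii_hex_digit(cp: int) -> bool:
    return is_ascii_upper_hex_digit(cp) or is_ascii_lower_hex_digit(cp)


def is_ascii_upper_alpha(cp: int) -> bool:
    return 0x0041 <= cp <= 0x005A


def is_ascii_lower_alpha(cp: int) -> bool:
    return 0x0061 <= cp <= 0x007A


def is_ascii_alpha(cp: int) -> bool:
    return is_ascii_upper_alpha(cp) or is_ascii_lower_alpha(cp)


def is_ascii_alphanumeric(cp: int) -> bool:
    return is_ascii_digit(cp) or is_ascii_alpha(cp)
```

Note that U+000B is a C0 control but not ASCII whitespace, so is_ascii_whitespace(0x000B) is false in this sketch.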

3.4. Strings

A string is a sequence of code points. Strings are denoted by double quotes and monospace font.

"Hello, world!" is a string.

An ASCII string is a string whose code points are all ASCII code points.

To ASCII lowercase a string, replace all ASCII upper alpha in the string with the corresponding code points in ASCII lower alpha.

To ASCII uppercase a string, replace all ASCII lower alpha in the string with the corresponding code points in ASCII upper alpha.

A string A is an ASCII case-insensitive match for a string B, if the ASCII lowercase of A is the ASCII lowercase of B.
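
A non-normative Python sketch of these operations follows; the function names are chosen for this example. It deliberately avoids str.lower(), which applies Unicode case mappings and so would treat U+212A KELVIN SIGN as matching "k", whereas the ASCII operations leave non-ASCII code points unchanged.

```python
def ascii_lowercase(s: str) -> str:
    # Map only U+0041..U+005A to U+0061..U+007A; leave everything else alone.
    return "".join(chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in s)


def ascii_uppercase(s: str) -> str:
    # Map only U+0061..U+007A to U+0041..U+005A; leave everything else alone.
    return "".join(chr(ord(c) - 0x20) if "a" <= c <= "z" else c for c in s)


def is_ascii_case_insensitive_match(a: str, b: str) -> bool:
    return ascii_lowercase(a) == ascii_lowercase(b)


print(ascii_lowercase("Content-TYPE"))                 # content-type
print(is_ascii_case_insensitive_match("HTML", "html"))  # True
print(is_ascii_case_insensitive_match("\u212A", "k"))   # False
```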

4. Data structures

Conventionally, specifications have operated on a variety of vague specification-level data structures, based on shared understanding of their semantics. This generally works well, but can lead to ambiguities around edge cases, such as iteration order or what happens when you append an item to an ordered set that the set already contains. It has also led to a variety of divergent notation and phrasing, especially around more complex data structures such as maps.

This standard provides a small set of common data structures, along with notation and phrasing for working with them, in order to create common ground.

4.1. Lists

A list is a specification type consisting of a finite ordered sequence of items.

For notational convenience, a literal syntax can be used to express lists, by surrounding the list contents by « » characters and separating list items with a comma. An indexing syntax can be used by providing a zero-based index into a list inside square brackets.

Let example be the list « "a", "b", "c", "a" ». Then example[1] is the string "b".


To append to a list that is not an ordered set is to add the given item to the end of the list.

To prepend to a list that is not an ordered set is to add the given item to the beginning of the list.

The above definitions are modified when the list is an ordered set; see below for ordered set append and ordered set prepend.

To remove an item from a list is to remove all items from the list that match a given condition, or do nothing if none do.

Removing x from the list « x, y, z, x » is to remove all items from the list that are equal to x. The list now is equivalent to « y, z ».

Removing all items that start with the string "a" from the list « "a", "b", "ab", "ba" » is to remove the items "a" and "ab". The list is now equivalent to « "b", "ba" ».

A list contains an item if it appears in the list.

A list’s size is the number of items the list contains.

A list is empty if its size is zero.

To iterate over a list, performing a set of steps on each item in order, use phrasing of the form "For each item of list", and then operate on item in the subsequent prose.
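
The list operations above can be sketched, non-normatively, on top of a Python list. The helper names are hypothetical, and the condition for removal is modeled as a predicate function, which is one possible reading of "match a given condition".

```python
def list_append(items, item):
    items.append(item)  # add to the end


def list_prepend(items, item):
    items.insert(0, item)  # add to the beginning


def list_remove(items, condition):
    # Remove every item matching the condition, in place; no-op if none do.
    items[:] = [item for item in items if not condition(item)]


def list_contains(items, item):
    return item in items


def list_size(items):
    return len(items)


def list_is_empty(items):
    return list_size(items) == 0


example = ["a", "b", "c", "a"]
print(example[1])  # b

letters = ["x", "y", "z", "x"]
list_remove(letters, lambda item: item == "x")
print(letters)  # ['y', 'z']

words = ["a", "b", "ab", "ba"]
list_remove(words, lambda item: item.startswith("a"))
print(words)  # ['b', 'ba']
```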


The list type originates from the JavaScript specification (where it is capitalized, as List); we repeat some elements of its definition here for ease of reference, and provide an expanded vocabulary for manipulating lists. Whenever JavaScript expects a List, a list as defined here can be used; they are the same type. [ECMA-262]

A list whose items are all of a particular Web IDL type T can be converted to the corresponding sequence type sequence<T> by creating a sequence whose items are the items of the list. [WEBIDL]

4.1.1. Stacks

Some lists are designated as stacks. A stack is a list, but conventionally, the following operations are used to operate on it, instead of using append, prepend, or remove.

To push onto a stack is to append to it.

To pop from a stack is to remove its last item and return it, if the stack is not empty, or to return nothing otherwise.
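
A minimal non-normative sketch in Python, assuming a stack is modeled as a list and that "nothing" is modeled as None. Both choices are assumptions of this example.

```python
def push(stack, item):
    stack.append(item)  # push is append


def pop(stack):
    # Remove and return the last item, or return None if the stack is empty.
    return stack.pop() if stack else None


stack = []
push(stack, 1)
push(stack, 2)
print(pop(stack))  # 2
print(pop(stack))  # 1
print(pop(stack))  # None
```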

4.1.2. Queues

Some lists are designated as queues. A queue is a list, but conventionally, the following operations are used to operate on it, instead of using append, prepend, or remove.

To enqueue in a queue is to append to it.

To dequeue from a queue is to remove its first item and return it, if the queue is not empty, or to return nothing if it is.
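
A minimal non-normative sketch in Python, assuming a queue is modeled with collections.deque so that removing the first item is cheap, and that "nothing" is modeled as None. Both are assumptions of this example.

```python
from collections import deque


def enqueue(queue, item):
    queue.append(item)  # enqueue is append


def dequeue(queue):
    # Remove and return the first item, or return None if the queue is empty.
    return queue.popleft() if queue else None


queue = deque()
enqueue(queue, "first")
enqueue(queue, "second")
print(dequeue(queue))  # first
print(dequeue(queue))  # second
print(dequeue(queue))  # None
```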

4.1.3. Sets

Some lists are designated as ordered sets. An ordered set is a list with the additional semantic that it must not contain the same item twice.

Almost all cases on the web platform require an ordered set, instead of an unordered one, since interoperability requires that any developer-exposed enumeration of the set’s contents be consistent between browsers. In those cases where order is not required, we still use ordered sets; implementations can optimize based on the fact that the order is not observable.

To append to an ordered set is to do nothing if the set already contains the given item, or to perform the normal list append operation otherwise.

To prepend to an ordered set is to do nothing if the set already contains the given item, or to perform the normal list prepend operation otherwise.
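
The two operations can be sketched, non-normatively, in Python on top of a plain list. The function names are hypothetical. A linear membership check is used for clarity; an implementation could keep an auxiliary hash set for speed.

```python
def set_append(ordered_set, item):
    # Do nothing if the item is already present; otherwise add to the end.
    if item not in ordered_set:
        ordered_set.append(item)


def set_prepend(ordered_set, item):
    # Do nothing if the item is already present; otherwise add to the start.
    if item not in ordered_set:
        ordered_set.insert(0, item)


tokens = ["a", "b"]
set_append(tokens, "a")   # already present, no change
set_append(tokens, "c")
set_prepend(tokens, "b")  # already present, no change
set_prepend(tokens, "z")
print(tokens)  # ['z', 'a', 'b', 'c']
```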

4.2. Maps

An ordered map, or sometimes just "map", is a specification type consisting of a finite ordered sequence of key/value pairs, with no key appearing twice. Each key/value pair is called an entry.

As with ordered sets, by default we assume that maps must also be ordered for interoperability among implementations.

A literal syntax can be used to express ordered maps, by surrounding the contents with «[ ]» delimiters, denoting each entry as key → value, and separating entries with a comma. An indexing syntax can be used to look up and set values by providing a key inside square brackets.

Let example be the ordered map «[ "a" → `x`, "b" → `y` ]». Then example["a"] is the byte sequence `x`.


To get the value of an entry in an ordered map given a key is to retrieve the value of the existing entry if the map contains an entry with the given key, or to return nothing otherwise. We can also use the indexing syntax explained above.

To set the value of an entry in an ordered map to a given value is to update the value of any existing entry if the map contains an entry with the given key, or if none such exists, to add a new entry with the given key/value to the end of the map. We can also denote this by saying, for an ordered map map, key key, and value value, "set map[key] to value".

To remove an entry from an ordered map is to remove all entries from the map that match a given condition, or do nothing if none do. If the condition is having a certain key, then we can also denote this by saying, for an ordered map map and key key, "remove map[key]".

An ordered map contains an entry with a given key if there exists an entry with that key. We can also denote this by saying that, for an ordered map map and key key, "map[key] exists".

To get the keys of an ordered map, return a new ordered set whose items are each of the keys in the map’s entries.

An ordered map’s size is the size of the result of running get the keys on the map.

An ordered map is empty if its size is zero.

To iterate over an ordered map, performing a set of steps on each entry in order, use phrasing of the form "For each key → value of map", and then operate on key and value in the subsequent prose.
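
As a non-normative sketch, the ordered map operations can be modeled with a Python dict, which preserves insertion order (guaranteed since Python 3.7) and keeps an entry's position when its value is updated. The helper names are hypothetical, "nothing" is modeled as None, and the removal condition is modeled as a predicate over key and value.

```python
def map_get(m, key):
    return m.get(key)  # None stands in for "nothing"


def map_set(m, key, value):
    m[key] = value  # updates in place, or adds a new entry at the end


def map_remove(m, condition):
    # Remove every entry matching the condition; no-op if none do.
    for key in [k for k, v in m.items() if condition(k, v)]:
        del m[key]


def map_contains(m, key):
    return key in m


def map_keys(m):
    return list(m)  # keys in entry order, with no duplicates


def map_size(m):
    return len(map_keys(m))


example = {"a": b"x", "b": b"y"}
print(example["a"])  # b'x'

map_set(example, "a", b"z")  # existing key: order is unchanged
map_set(example, "c", b"w")  # new key: added at the end
print(map_keys(example))     # ['a', 'b', 'c']

map_remove(example, lambda key, value: key == "b")
for key, value in example.items():  # "For each key → value of map"
    print(key, value)
```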


An ordered map whose keys are all strings and whose values are all of the same Web IDL type TValue can be converted to the corresponding record type record<TKey, TValue> by first converting all its keys to the appropriate Web IDL string type TKey, and then creating corresponding record mappings for each pair of converted key/original value. [WEBIDL]

5. Namespaces

The HTML namespace is "http://www.w3.org/1999/xhtml".

The MathML namespace is "http://www.w3.org/1998/Math/MathML".

The SVG namespace is "http://www.w3.org/2000/svg".

The XLink namespace is "http://www.w3.org/1999/xlink".

The XML namespace is "http://www.w3.org/XML/1998/namespace".

The XMLNS namespace is "http://www.w3.org/2000/xmlns/".

Acknowledgments

Many thanks to Jungkee Song, Malika Aubakirova, Michael™ Smith, Mike West, Philip Jägenstedt, Simon Pieters, Tab Atkins, Tobie Langel, and Xue Fuqiao for being awesome!

This standard is written by Anne van Kesteren (Mozilla, annevk@annevk.nl) and Domenic Denicola (Google, d@domenic.me).

Per CC0, to the extent possible under law, the editors have waived all copyright and related or neighboring rights to this work.

References

Normative References

[ECMA-262]
ECMAScript Language Specification. URL: https://tc39.github.io/ecma262/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[UNICODE]
The Unicode Standard. URL: http://www.unicode.org/versions/latest/
[WEBIDL]
Cameron McCormack; Boris Zbarsky; Tobie Langel. Web IDL. URL: https://heycam.github.io/webidl/