Re: (RFC 3986) Clarification around interpreting literal plus in URLs (U+002B)

Julian Reschke <julian.reschke=40gmx.de@xxxxxxxxxxxxxx> · Sat, 15 Mar 2025 07:24:31 +0100

On 14.03.2025 11:30, Raghu Saxena wrote:
> Dear IETF Community,
>
> I was writing to seek clarification around URIs & HTTP, specifically how
> to handle "plus" ("+") symbols in them.
>
> For instance, if my server receives a request for `GET /page?key=A+B`,
> where the bytes over the wire are literally [0x412B42], should I
> interpret it as literally that byte sequence, or decode the "+" to a
> space, thereby interpreting the bytes as [0x412042]?

It depends.
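For illustration, a minimal Python sketch of the two possible readings of the bytes in the example, using the standard library's urllib.parse (Python is only an illustrative choice; other languages have equivalents). unquote applies plain RFC 3986 percent-decoding, while unquote_plus applies the form-style rule that additionally turns "+" into a space.

```python
from urllib.parse import unquote, unquote_plus

raw = b"A+B"  # bytes received on the wire for the value of "key"
print(raw.hex())  # 412b42

text = raw.decode("ascii")

# Generic RFC 3986 percent-decoding: "+" is an ordinary character and is kept.
as_uri = unquote(text)
# application/x-www-form-urlencoded decoding: "+" stands for a space.
as_form = unquote_plus(text)

print(as_uri.encode("ascii").hex())   # 412b42
print(as_form.encode("ascii").hex())  # 412042
```

Which of the two is right depends entirely on which encoding the sender applied.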

> From my reading of RFC3986, it seems that "+" is a reserved character,
> but it's not clear how it is to be interpreted / decoded.

From the point of view of the URI specification, "+" is not special and
does not need to be encoded (ABNF: query -> pchar -> sub-delims).
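A small sketch of that ABNF chain as a validity check, hand-transcribed from RFC 3986 (query = *( pchar / "/" / "?" ); pchar = unreserved / pct-encoded / sub-delims / ":" / "@"). The helper name is_valid_query is hypothetical. It accepts a bare "+" in a query but rejects an unencoded space.

```python
import re

# Character classes transcribed from the RFC 3986 ABNF.
UNRESERVED = r"A-Za-z0-9\-._~"   # ALPHA / DIGIT / "-" / "." / "_" / "~"
SUB_DELIMS = r"!$&'()*+,;="      # note that "+" is one of the sub-delims

# query = *( pchar / "/" / "?" )
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
QUERY_RE = re.compile(
    r"(?:[" + UNRESERVED + SUB_DELIMS + r":@/?]|%[0-9A-Fa-f]{2})*"
)


def is_valid_query(query: str) -> bool:
    """Hypothetical helper: True if `query` matches the RFC 3986 query rule."""
    return QUERY_RE.fullmatch(query) is not None


print(is_valid_query("key=A+B"))    # True: "+" needs no encoding
print(is_valid_query("key=A%20B"))  # True: percent-encoded space
print(is_valid_query("key=A B"))    # False: raw space is not allowed
```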

> The question behind the question is how spaces should be encoded to be
> URL safe; it seems "%20" is the recommended approach, however some
> languages (such as Golang[0]) implement query-escaping where the spaces
> (0x20) are replaced by a literal "+" (0x2B). This causes problems with
> some libraries, which then treat this as a literal "plus" (0x2B) and
> return unexpected results.

What you observe is a layering issue.
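The layering mismatch can be reproduced in a few lines of Python (again an illustrative choice). quote_plus, like Go's url.QueryEscape, produces form-style output with "+" for a space; a decoder that implements only RFC 3986 percent-decoding (unquote here) hands the "+" back as a literal plus.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

value = "A B"

pct_encoded = quote(value)        # "A%20B": plain percent-encoding
form_encoded = quote_plus(value)  # "A+B": form-style, space becomes "+"

# Matching encoder and decoder: the value survives the round trip.
assert unquote(pct_encoded) == value
assert unquote_plus(form_encoded) == value

# Mismatched layers: a generic URI decoder keeps "+" as a literal plus,
# so the original space silently turns into U+002B.
mismatched = unquote(form_encoded)
print(mismatched)  # A+B
```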

Typically, query parameters are generated by HTML form submissions, and
those use their own encoding on top of what is needed in URIs. This
format indeed maps "+" to a space. So a library should distinguish
between encoding into URIs, and encoding into query parameters used in
HTML forms (FWIW, that encoding is also used in POST payloads).

See https://url.spec.whatwg.org/#application/x-www-form-urlencoded.
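Assuming Python again, the standard library's urlencode and parse_qs implement the form layer by default (urlencode uses quote_plus, and parse_qs maps "+" to a space before percent-decoding). A short sketch with a value containing both a literal plus and a space:

```python
from urllib.parse import parse_qs, quote, unquote, urlencode

params = {"key": "A+B C"}

# Form layer: urlencode percent-encodes the literal "+" as %2B and turns
# the space into "+", because a bare "+" means a space in this format.
form = urlencode(params)
print(form)  # key=A%2BB+C
assert parse_qs(form) == {"key": ["A+B C"]}

# Plain URI layer: with safe="", quote escapes everything outside the
# unreserved set, so the space becomes %20 and the plus becomes %2B.
value = quote(params["key"], safe="")
print(value)  # A%2BB%20C

# This string decodes to the same value under either reading.
assert unquote(value) == "A+B C"
assert parse_qs("key=" + value) == {"key": ["A+B C"]}
```

Encoding the space as %20 and the plus as %2B avoids the ambiguity, since neither decoder can misread it.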

Best regards, Julian