Limited URL spoofing via crafted A-label

Description

We got the following report in Chromium (https://bugs.chromium.org/p/chromium/issues/detail?id=1063566):

Due to lack of proper label checking, IDNs like xn--domainname-.com are treated as safe-to-display Unicode IDNs, and after processing they are displayed in the URL bar as “domainname.com” (pic. 1), which can be used for spoofing attacks. Popular domains from domains.list [1], like google.com or fb.com, can’t be spoofed this way due to additional checks, but other sites can be. Another limitation is that it can’t be used to spoof IDNs.
How this works: before conversion to Unicode, labels in Chrome are checked only for the “xn--” prefix in IDNToUnicodeOneComponent() //src/components/url_formatter/url_formatter.cc, line 411, if prefix is in place, it calls ICU library function uidna_labelToUnicode() to convert A-label to Unicode. ICU library function u_strFromPunycode() (which is doing the actual conversion from punycode), before conversion looks for last delimiter(‘-’) and then copies all basic(in ICU terms) chars before it to array(dest) [see //src/third_party/icu/source/punycode.cpp line 407]
since condition in line 442: for(in=basicLength>0 ? basicLength+1 : 0; in<srcLength; /* no op */)
is not satisfied(in = srcLength, as there are no more symbols after last delimiter(‘-’)), the main loop is simply skipped, so after conversion we have only basic(in ICU terms) characters AND since there were no errors during conversion, the output is considered to be safe-to-display.
Example:

xn--chromium-.org → chromium.org

[1] domains.list: src/components/url_formatter/spoof_checks/top_domains/domains.list

Meacer says:

Rfc3490 has the following steps in ToUnicode definition:

....
5. Decode the sequence using the decoding algorithm in [PUNYCODE] and
fail if there is an error. Save a copy of the result of this
step.

6. Apply ToASCII.

7. Verify that the result of step 6 matches the saved copy from step
3, using a case-insensitive ASCII comparison.

...

Reporter says:

Thanks.
I've fixed it in url_formatter.cc (just added a check for trailing hyphen), but after reading RFC3492's Decoding procedure(Chapter6.2) I think ICU's u_strFromPunycode() should be fixed. In RFC's decoding procedure there is requirement to 'fail' if there was none code points to consume. Plus, in provided "Punycode Sample Implementation" there is function punycode_decode() which has the necessary check:
if (in >= input_length) return punycode_bad_input

Activity

Show:

Markus Scherer August 14, 2020 at 9:37 PM

I also submitted a Unicode bug report for these edge cases, proposing to add them to https://www.unicode.org/reports/tr46/#ProcessingStepPunycode

Markus Scherer August 14, 2020 at 9:36 PM

Fixed with PR #1234: validate ACE label edge cases

Check for `"xn--"` and `"xn--ASCII-"` and fail equivalently to IDNA2008 ToUnicode which verifies that its input round-trips via ToASCII back to the same string. These strings turn into `""` and `"ASCII"`, different from the input.

UTS #46 otherwise validates equivalently as well. The code here uses a much simpler test for these edge cases.

Markus Scherer August 13, 2020 at 11:49 PM

As the bug report points out, RFC 3492 ToUnicode says to double-check: Convert back from Punycode (apply ToASCII) and check that the result is the same as the input.

ICU actually implements UTS #46, not vanilla IDNA2008. I don’t see such a double-check in the ToUnicode or Processing parts of this spec.

Should we update the UTS #46 spec? Are there any other types of bad inputs that only a round-trip with the extra ToASCII will catch, or would it suffice to only check for this case?

Markus Scherer August 13, 2020 at 6:38 PM

in provided "Punycode Sample Implementation" there is function punycode_decode() which has the necessary check:
if (in >= input_length) return punycode_bad_input

Yes, but that’s in the nested loop for decoding a variable-length integer. The outer loop has the condition in < input_length so for an empty string after the last hyphen the whole outer loop is just skipped. See page 29 of the RFC: https://tools.ietf.org/html/rfc3492#page-29

Same with the spec for the Decoding Procedure: https://tools.ietf.org/html/rfc3492#section-6.2

The inner loop has “consume a code point, or fail if there was none to consume“ but the outer loop has “while the input is not exhausted do begin“ so does nothing if there is no input after the last delimiter.

That is, as far as I can tell, our Punycode implementation conforms to the spec.

I will take a look at the next level up.

Shane Carr March 28, 2020 at 12:08 AM

This is scheduled for ICU 68, which is expected in Q4 2020; however, if the fix is small, you could patch it in sooner than that, depending on Markus’s timeline.

Fixed

Details

Assignee

Reporter

Components

Labels

Reviewer

Priority

Time Needed

Hours

Fix versions

Created March 25, 2020 at 2:12 AM
Updated February 10, 2022 at 5:49 AM
Resolved August 14, 2020 at 9:36 PM

Flag notifications