Limited URL spoofing via crafted A-label

General

Other Data

General

Other Data

Description

We got the following report in Chromium (https://bugs.chromium.org/p/chromium/issues/detail?id=1063566):

Due to lack of proper label checking, IDNs like xn--domainname-.com are treated as safe-to-display Unicode IDNs, and after processing they are displayed in the URL bar as “domainname.com” (pic. 1), which can be used for spoofing attacks. Popular domains from domains.list [1], like google.com or fb.com, can’t be spoofed this way due to additional checks, but other sites can be. Another limitation is that it can’t be used to spoof IDNs.
How this works: before conversion to Unicode, labels in Chrome are checked only for the “xn--” prefix in IDNToUnicodeOneComponent() //src/components/url_formatter/url_formatter.cc, line 411, if prefix is in place, it calls ICU library function uidna_labelToUnicode() to convert A-label to Unicode. ICU library function u_strFromPunycode() (which is doing the actual conversion from punycode), before conversion looks for last delimiter(‘-’) and then copies all basic(in ICU terms) chars before it to array(dest) [see //src/third_party/icu/source/punycode.cpp line 407]
since condition in line 442: for(in=basicLength>0 ? basicLength+1 : 0; in<srcLength; /* no op */)
is not satisfied(in = srcLength, as there are no more symbols after last delimiter(‘-’)), the main loop is simply skipped, so after conversion we have only basic(in ICU terms) characters AND since there were no errors during conversion, the output is considered to be safe-to-display.
Example:

xn--chromium-.org → chromium.org

[1] domains.list: src/components/url_formatter/spoof_checks/top_domains/domains.list

Meacer says:

Rfc3490 has the following steps in ToUnicode definition:
....
5. Decode the sequence using the decoding algorithm in [PUNYCODE] and
fail if there is an error. Save a copy of the result of this
step.
6. Apply ToASCII.
7. Verify that the result of step 6 matches the saved copy from step
3, using a case-insensitive ASCII comparison.
...

Reporter says:

Thanks.
I've fixed it in url_formatter.cc (just added a check for trailing hyphen), but after reading RFC3492's Decoding procedure(Chapter6.2) I think ICU's u_strFromPunycode() should be fixed. In RFC's decoding procedure there is requirement to 'fail' if there was none code points to consume. Plus, in provided "Punycode Sample Implementation" there is function punycode_decode() which has the necessary check:
if (in >= input_length) return punycode_bad_input

Activity

Show:

Markus Scherer August 14, 2020 at 9:37 PM

I also submitted a Unicode bug report for these edge cases, proposing to add them to https://www.unicode.org/reports/tr46/#ProcessingStepPunycode

Markus Scherer August 14, 2020 at 9:36 PM

Fixed with PR #1234: validate ACE label edge cases

Check for `"xn--"` and `"xn--ASCII-"` and fail equivalently to IDNA2008 ToUnicode which verifies that its input round-trips via ToASCII back to the same string. These strings turn into `""` and `"ASCII"`, different from the input.

UTS #46 otherwise validates equivalently as well. The code here uses a much simpler test for these edge cases.

Markus Scherer August 13, 2020 at 11:49 PM

As the bug report points out, RFC 3492 ToUnicode says to double-check: Convert back from Punycode (apply ToASCII) and check that the result is the same as the input.

ICU actually implements UTS #46, not vanilla IDNA2008. I don’t see such a double-check in the ToUnicode or Processing parts of this spec.

Should we update the UTS #46 spec? Are there any other types of bad inputs that only a round-trip with the extra ToASCII will catch, or would it suffice to only check for this case?

Markus Scherer August 13, 2020 at 6:38 PM

in provided "Punycode Sample Implementation" there is function punycode_decode() which has the necessary check:
if (in >= input_length) return punycode_bad_input

Yes, but that’s in the nested loop for decoding a variable-length integer. The outer loop has the condition in < input_length so for an empty string after the last hyphen the whole outer loop is just skipped. See page 29 of the RFC: https://tools.ietf.org/html/rfc3492#page-29

Same with the spec for the Decoding Procedure: https://tools.ietf.org/html/rfc3492#section-6.2

The inner loop has “consume a code point, or fail if there was none to consume“ but the outer loop has “while the input is not exhausted do begin“ so does nothing if there is no input after the last delimiter.

That is, as far as I can tell, our Punycode implementation conforms to the spec.

I will take a look at the next level up.

Shane Carr March 28, 2020 at 12:08 AM

This is scheduled for ICU 68, which is expected in Q4 2020; however, if the fix is small, you could patch it in sooner than that, depending on Markus’s timeline.

Fixed

Details
Assignee
Markus Scherer
Reporter
Shane Carr
Components
idna
Labels
chrome
Reviewer
Mustafa Emre Acer
Priority
major
Time Needed
Hours
Fix versions
68.1

Created March 25, 2020 at 2:12 AM

Updated February 10, 2022 at 5:49 AM

Resolved August 14, 2020 at 9:36 PM

Limited URL spoofing via crafted A-label

Description

Activity

Markus Scherer August 14, 2020 at 9:37 PM

Markus Scherer August 14, 2020 at 9:36 PM

Markus Scherer August 13, 2020 at 11:49 PM

Markus Scherer August 13, 2020 at 6:38 PM

Shane Carr March 28, 2020 at 12:08 AM

Details
Assignee
Markus Scherer
Reporter
Shane Carr
Components
idna
Labels
chrome
Reviewer
Mustafa Emre Acer
Priority
major
Time Needed
Hours
Fix versions
68.1

Details

Assignee

Reporter

Components

Labels

Reviewer

Priority

Time Needed

Fix versions

Flag notifications

Something's gone wrong

Something's gone wrong

Limited URL spoofing via crafted A-label

Description

Activity

Markus Scherer August 14, 2020 at 9:37 PM

Markus Scherer August 14, 2020 at 9:36 PM

Markus Scherer August 13, 2020 at 11:49 PM

Markus Scherer August 13, 2020 at 6:38 PM

Shane Carr March 28, 2020 at 12:08 AM

DetailsAssigneeMarkus SchererMarkus SchererReporterShane CarrShane CarrComponentsidnaLabelschromeReviewerMustafa Emre AcerMustafa Emre AcerPrioritymajorTime NeededHoursFix versions68.1

Details

Assignee

Reporter

Components

Labels

Reviewer

Priority

Time Needed

Fix versions

Flag notifications

Something's gone wrong

Something's gone wrong

Details
Assignee
Markus Scherer
Reporter
Shane Carr
Components
idna
Labels
chrome
Reviewer
Mustafa Emre Acer
Priority
major
Time Needed
Hours
Fix versions
68.1