According to confusables.txt version 2.1 (shipped with ICU 4.6), U+017F (ſ) should be treated as confusable with U+0066 (f).
Currently, USpoof normalizes all input to NFKD, not NFD, before applying the confusable mapping. UTS #39 specifies NFD, not NFKD:
http://unicode.org/reports/tr39/
To see whether two strings X and Y are confusable according to a given table (abbreviated as X ≅ Y), an implementation uses a transform of X called a skeleton(X) defined by:
1. Converting X to NFD format, as described in [UAX15].
2. Successively mapping each source character in X to the target string according to the specified data table.
3. Reapplying NFD.
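For illustration only, the three steps quoted above can be sketched in Python with the standard `unicodedata` module. This is not ICU's implementation: the one-entry `CONFUSABLES` table and the function names `skeleton` and `confusable` are assumptions made up for this example, and the single mapping (U+017F to "f") is taken from the claim in this ticket rather than read from confusables.txt.

```python
import unicodedata

# Hypothetical one-entry table for illustration; real data comes from
# confusables.txt. The mapping itself is the one claimed in this ticket.
CONFUSABLES = {"\u017f": "f"}


def skeleton(text, table=CONFUSABLES):
    """Sketch of the quoted skeleton(X) definition, using NFD throughout."""
    # Step 1: convert X to NFD.
    normalized = unicodedata.normalize("NFD", text)
    # Step 2: map each source character to its target string, if any.
    mapped = "".join(table.get(ch, ch) for ch in normalized)
    # Step 3: reapply NFD.
    return unicodedata.normalize("NFD", mapped)


def confusable(x, y, table=CONFUSABLES):
    """Two strings are confusable under the table if their skeletons match."""
    return skeleton(x, table) == skeleton(y, table)
```

With NFD in step 1, U+017F reaches the mapping step unchanged, so under this toy table `confusable("\u017f", "f")` holds.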
Because USpoof normalizes to NFKD, U+017F is normalized to "s" before the confusable mapping is applied, and thus its skeleton differs from "f". See the attached test case, which reproduces the issue against ICU 4.6.
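The normalization difference can be checked independently of ICU. The snippet below is a small sketch using Python's standard `unicodedata` module (the variable names are made up for this example); it only shows how the two normalization forms treat U+017F and does not exercise USpoof itself.

```python
import unicodedata

LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S

# U+017F has no canonical decomposition, so NFD leaves it unchanged.
nfd_form = unicodedata.normalize("NFD", LONG_S)

# U+017F has a compatibility decomposition to U+0073, so NFKD turns it
# into a plain "s" and a later confusable lookup never sees U+017F.
nfkd_form = unicodedata.normalize("NFKD", LONG_S)

print(f"NFD:  U+{ord(nfd_form):04X}")
print(f"NFKD: U+{ord(nfkd_form):04X}")
```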
I made a patch to switch USpoof to NFD (attached), but it makes several intltest tests fail, since they assumed NFKD. If this patch looks like the right approach, should we just fix or remove the bad tests?
I'll apply the patch and fix whatever tests break. UTS 39 changed from specifying NFKD to NFD between revisions 3 and 4, and I overlooked it.
Milestone 4.7.1 deleted