UTrie2 was designed for lookups by code point, as well as from UTF-16 and UTF-8 text.
It has some special data structures for UTF-8 non-shortest forms. These are obsolete and unused since ICU 60, which changed the handling of ill-formed UTF-8 to collecting "maximal subparts" of valid sequences. (This is compatible with the W3C Encoding standard, so I do not expect it to change again.)
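For readers unfamiliar with the "maximal subparts" practice, here is a minimal sketch in Python. It is not ICU's code; the function names are hypothetical, and it only illustrates the general rule: each lead byte is followed only by trail bytes that can still belong to a well-formed sequence, and whatever was collected before the first bad byte becomes a single U+FFFD.

```python
def _trail_range(lead, position):
    """Allowed range for trail byte number `position` (1-based) after `lead`.

    Only the first trail byte is restricted further; the restrictions rule out
    non-shortest forms, surrogates, and values above U+10FFFF.
    """
    if position == 1:
        if lead == 0xE0:
            return (0xA0, 0xBF)
        if lead == 0xED:
            return (0x80, 0x9F)
        if lead == 0xF0:
            return (0x90, 0xBF)
        if lead == 0xF4:
            return (0x80, 0x8F)
    return (0x80, 0xBF)


def decode_maximal_subparts(data):
    """Decode UTF-8 bytes to a list of code points (hypothetical helper).

    Each maximal subpart of an ill-formed sequence yields one U+FFFD.
    """
    out = []
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        i += 1
        if lead < 0x80:
            out.append(lead)
            continue
        if 0xC2 <= lead <= 0xDF:
            need, cp = 1, lead & 0x1F
        elif 0xE0 <= lead <= 0xEF:
            need, cp = 2, lead & 0x0F
        elif 0xF0 <= lead <= 0xF4:
            need, cp = 3, lead & 0x07
        else:
            # Stray trail byte, C0/C1, or F5..FF: never starts a sequence.
            out.append(0xFFFD)
            continue
        ok = True
        for position in range(1, need + 1):
            lo, hi = _trail_range(lead, position)
            if i < n and lo <= data[i] <= hi:
                cp = (cp << 6) | (data[i] & 0x3F)
                i += 1
            else:
                # Stop here; the offending byte is re-examined as a new lead.
                ok = False
                break
        out.append(cp if ok else 0xFFFD)
    return out
```

With this rule, `E0 80 80` (a non-shortest form) is simply three separate errors, so no lookup data is needed for such sequences.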
It provides for different values for lead surrogate code //units// vs. code //points//, which complicates the API, builder and runtime. We might be able to let those callers that need this handle it themselves. (Normalization properties only?)
Size overhead: A UTrie2 with no data at all is about 4.6 kB. It compacts a large amount of data fairly well, but for collation, where a typical tailoring has only a small amount of data, we might be able to reduce the data size. With less overhead, we might also use more separate tries where we currently combine them or avoid them altogether.
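To make the fixed-overhead point concrete, here is a deliberately simplified sketch of a code point trie in Python. It is not UTrie2's actual layout: the single-level index, the block size of 32, and all names are assumptions chosen for illustration. It only shows the general trade-off that the index is a fixed cost, no matter how little data is stored.

```python
BLOCK_SHIFT = 5                 # assumed block size: 32 code points
BLOCK_SIZE = 1 << BLOCK_SHIFT
BLOCK_MASK = BLOCK_SIZE - 1
LIMIT = 0x110000                # one past the last code point


def build_trie(values, default=0):
    """Build (index, data) from a dict mapping code point -> value.

    Identical blocks are stored once, which is where the compaction
    of large, repetitive data sets comes from.
    """
    index = []
    data = []
    seen = {}                   # block contents -> offset into data
    for start in range(0, LIMIT, BLOCK_SIZE):
        block = tuple(values.get(c, default)
                      for c in range(start, start + BLOCK_SIZE))
        offset = seen.get(block)
        if offset is None:
            offset = len(data)
            seen[block] = offset
            data.extend(block)
        index.append(offset)
    return index, data


def lookup(trie, c):
    """Return the value for code point c: one index read, one data read."""
    index, data = trie
    return data[index[c >> BLOCK_SHIFT] + (c & BLOCK_MASK)]


empty = build_trie({})
small = build_trie({0x41: 7, 0x10FFFF: 9})
print(len(empty[0]), len(empty[1]))   # index entries, data entries
print(len(small[0]), len(small[1]))
```

In this sketch the empty trie already needs 34816 index entries for a single shared data block of 32 values, and adding two values only adds two more blocks. A real design reduces the fixed part with additional index levels and special-cased ranges, which is the kind of trade-off to benchmark.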
I have some ideas: http://site.icu-project.org/design/struct/utrie
I will experiment and benchmark in a branch.