Follow-up: strncpy doesn't always null-terminate
This is a follow-up ticket to
In short: strncpy doesn’t null terminate, which can lead to passing around non-null terminated buffers to functions that expect/require null terminated strings – leading to bugs, crashes, or security issues.
Some time ago, I was looking into replacing strncpy in the ICU sources. However, at the time there didn’t really seem to be a clear-cut universally accepted recommendation for what should “replace” strcpy (and strncpy). There’s some differing opinions and debate on how “best” to handle C-style strings. Though everyone agrees that strcpy is bad, and strncpy should also be avoided.
Todd C. Miller and Theo de Raadt (of OpenBSD and OpenSSH fame) obviously suggest using strlcpy.
FWIW, it’s now present on most Unix platforms (Solaris, the *BSDs, Mac OSX), and is present on Linux in the kernel but only for internal use. (There is libbsd on Linux though).
Ulrich Drepper (Red Hat) suggests using something like this instead:
((char *) mempcpy (dst, src, n)) = '\0';
However, this has two problems:
(1) mempcpy (note the “p”) isn’t standard at all – its a GNU extension (and seems to have less adoption than strlcpy).
(2) There’s no way to tell if the string was truncated prematurely.
Bruce Dawson (Google, formerly Microsoft/Valve) suggests using a hand-rolled wrapper over top of strncpy which ensures null-termination. (Since strlcpy isn’t available everywhere).
There’s also many other “string copy” functions that have varying levels of guarantees w.r.t null termination, error-handling, etc. However, many of these are platform specific [StringCch*], or have questionable error handling approaches (like terminating the program).
From what I could tell, it seemed like strlcpy was the most popular alternative. (Aside from using a higher level approach [ex: classes/wrappers in C++], or “just being more careful” when using strcpy or memcpy.).
Regarding performance concerns:
From what I could tell, it seems that when comparing a compiler provided strncpy to a library-provided strlcpy, the compiler optimized strncpy often ends up wasting more time due to the requirement that it fills the destination buffer with zeros, versus the strlcpy.
If we wanted to have something like strlcpy, then one way we could approach it would be to have new internal ICU API called “uprv_strlcpy” and then it could use whatever implementation was available.
FWIW, I also think it would be likely useful to have C++ style wrapper that infers the size of a fixed-size array (similar to Bruce Dawson’s approach). [Though I’m not sure on the name of it].
FYI My colleague was looking at icu4c/source/common/uloc_tag.cpp ultag_parse()
I’d like to, but I’m not sure how much time I’ll get to work on public ICU-related things. The ticket is marked for “future”, but I marked this “possible-68” if I get time.
Jeff: Do you plan to work on this for ICU 68?
Just to emphasize what Jeff said before: strncpy() is also stupidly inefficient because it writes not just one NUL (when there is space), it fills the rest of the destination with NUL bytes. This is almost never useful. So where we know what we are doing, some kind of memcpy+NUL replacement should be good.
Adding notes based on discussion in the ICU-TC on 2020-02-26.
There seems to be two main usages or scenarios/cases for this:
You either know ahead of time that copying the string is fine (maybe you computed the length, etc.) and thus using strncpy is actually fine.
You don’t know the exact length of string(s) and thus you probably shouldn’t even be using strncpy anyways and you should use something like CharString instead, which expands dynamically.
So adding in something like strlcpy seems like a hacky work-around to this which doesn’t really address the root of the problem.
Based on this, we should not add a uprv_strlcpy (or equivalent function) in ICU4C, and instead we should use this ticket to do the following:
Examine all usages of strncpy in ICU4C and determine if it falls into case (1) or (2) as mentioned above.
For usage that falls into case (1):
Consider possible replacing the call to strncpy with memcpy, as memcpy would be faster.
If the usage of strncpy is fine then we can perhaps suppress the warning if needed.
For usage that falls into case (2):
Replace the fixed size buffers with CharString, or some other dynamic buffer.