I have a byte[] array that is loaded from a file that I happen to know contains UTF-8. In some debugging code, I need to convert it to a string. Is there a one-liner that will do this? Under the covers it should be just an allocation and a memcopy, so even if it is not implemented, it should be possible.


"should be just an allocation and a memcopy" is not correct, because a .NET string is UTF-16 encoded. One Unicode character might take one UTF-8 code unit and one UTF-16 code unit; another might take two UTF-8 code units and one UTF-16 code unit; another three UTF-8 code units and one UTF-16 code unit; and another four UTF-8 code units and two UTF-16 code units. A memcopy might be able to widen, but it could not perform the UTF-8 to UTF-16 conversion.

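The point about code units can be checked directly. The following is a minimal sketch, assuming a modern .NET SDK with top-level statements; the sample string is made up for illustration and mixes characters that need 1, 2, 3 and 4 bytes in UTF-8.

```csharp
using System;
using System.Text;

// Hypothetical sample: 'A' (U+0041), 'é' (U+00E9), '€' (U+20AC), and U+1F600 (an emoji).
string text = "A\u00E9\u20AC\U0001F600";
byte[] utf8 = Encoding.UTF8.GetBytes(text);

// The one-liner the question asks for. It transcodes UTF-8 to UTF-16; it is not a plain copy.
string decoded = Encoding.UTF8.GetString(utf8);

Console.WriteLine(utf8.Length);    // UTF-8 code units (bytes): 1 + 2 + 3 + 4 = 10
Console.WriteLine(decoded.Length); // UTF-16 code units (chars): 1 + 1 + 1 + 2 = 5
```

Because the two lengths differ per character, the decoder has to inspect every byte rather than copy a block of memory.
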
August 19, 2019, 11 min 57 s

How does it handle null-terminated strings?


maazza, for some unknown reason it doesn't handle them at all. I'm calling it like System.Text.Encoding.UTF8.GetString(buf).TrimEnd('\0');.


Hi-Angel, unknown reason? The only reason null-terminated strings ever became popular was the C language, and even that was only because of a historical oddity (CPU instructions that dealt with null-terminated strings). .NET only uses null-terminated strings when interoperating with code that uses null-terminated strings (which are finally disappearing). It's perfectly valid for a string to contain NUL characters. And of course, while null-terminated strings are dead simple in ASCII (just build until you get the first zero byte), other encodings, including UTF-8, are not so simple.


One of the beautiful features of UTF-8 is that a shorter sequence is never a subsequence of a longer sequence. So a null-terminated UTF-8 string is simple.
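
A consequence of that property is that the byte 0x00 appears in UTF-8 only as the encoding of U+0000, never inside a multi-byte sequence. Here is a rough sketch of reading a null-terminated UTF-8 buffer on that basis; the buffer size and sample text are assumptions for illustration.

```csharp
using System;
using System.Text;

// Hypothetical fixed-size buffer, zero-filled, holding "naïve €" followed by padding.
byte[] buffer = new byte[16];
byte[] payload = Encoding.UTF8.GetBytes("na\u00EFve \u20AC");
Array.Copy(payload, buffer, payload.Length);

// No zero byte occurs inside the encoded multi-byte characters.
bool payloadHasZero = Array.IndexOf(payload, (byte)0) >= 0;

// So the first zero byte in the buffer marks the end of the text.
int terminator = Array.IndexOf(buffer, (byte)0);
int length = terminator >= 0 ? terminator : buffer.Length;
string s = Encoding.UTF8.GetString(buffer, 0, length);

Console.WriteLine(payloadHasZero); // False
Console.WriteLine(s.Length);       // 7 characters decoded from 10 bytes
```

This does not hold for UTF-16 or UTF-32, where zero bytes routinely appear inside ordinary characters.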


Well, good luck unpacking it if it has non-ASCII. Just use Convert.ToBase64String.
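
For arbitrary binary data, as opposed to text, a Base64 round trip might look like the sketch below. The byte values are invented for illustration and deliberately include a sequence that is not valid UTF-8. Note that Convert.FromBase64String throws a FormatException when its input is not Base64, for example a dash-separated hex string.

```csharp
using System;
using System.Linq;

// Hypothetical payload; 0xC3 0x28 is not a valid UTF-8 sequence.
byte[] data = { 0x00, 0xFF, 0xC3, 0x28, 0x80 };

// Pack to a printable string and unpack again without loss.
string packed = Convert.ToBase64String(data);
byte[] unpacked = Convert.FromBase64String(packed);
bool same = data.SequenceEqual(unpacked);

// Feeding FromBase64String something that is not Base64 fails.
bool threw = false;
try { Convert.FromBase64String("00-FF-C3-28-80"); }
catch (FormatException) { threw = true; }

Console.WriteLine(same);  // True
Console.WriteLine(threw); // True
```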


LINQ it: var decBytes2 = str.Split('-').Select(ch => Convert.ToByte(ch, 16)).ToArray();
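
The Split('-') in that one-liner suggests the string came from BitConverter.ToString, which is an assumption here since the producing code is not shown. Under that assumption, a round trip could be sketched as:

```csharp
using System;
using System.Linq;

// Hypothetical input bytes.
byte[] bytes = { 0x48, 0x69, 0xC3, 0xA9 };

// BitConverter.ToString produces uppercase hex pairs separated by dashes.
string str = BitConverter.ToString(bytes);

// Parse each hex pair back into a byte (the variable name is a string token, not a char).
byte[] decBytes2 = str.Split('-').Select(ch => Convert.ToByte(ch, 16)).ToArray();

Console.WriteLine(str);                           // 48-69-C3-A9
Console.WriteLine(decBytes2.SequenceEqual(bytes)); // True
```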


But this assumes that there is either an encoding BOM in the byte stream or that it is in UTF-8. But you can do the same with Encoding anyway. It doesn't magically solve the problem when you don't know the encoding.


I received a System.FormatException using your method when I unpacked it with Convert.FromBase64String.


AndrewJE, this will take forever to compute if you have a large byte array, like the ones used for pictures.


But not UTF-8 methinks?


UnicodeEncoding is the worst class name ever; Unicode isn't an encoding at all. That class is actually UTF-16. The little-endian version, I think.
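
The byte order is easy to confirm. A small sketch, using Encoding.Unicode (little-endian UTF-16) next to Encoding.BigEndianUnicode, and showing what happens when UTF-8 bytes are decoded with the wrong one:

```csharp
using System;
using System.Text;

// 'A' is U+0041. Little-endian puts the low byte first.
byte[] le = Encoding.Unicode.GetBytes("A");          // 41 00
byte[] be = Encoding.BigEndianUnicode.GetBytes("A"); // 00 41

// Decoding UTF-8 bytes as UTF-16 does not fail; it silently yields different text.
byte[] utf8 = Encoding.UTF8.GetBytes("Hi");
string wrong = Encoding.Unicode.GetString(utf8);
string right = Encoding.UTF8.GetString(utf8);

Console.WriteLine(BitConverter.ToString(le)); // 41-00
Console.WriteLine(BitConverter.ToString(be)); // 00-41
Console.WriteLine(wrong == "Hi");             // False
Console.WriteLine(right == "Hi");             // True
```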


This converts the byte array to a hexadecimal string representing each byte, which is generally not what you want when converting bytes to a string. If you do, then that's another question, see for example How do you convert Byte Array to Hexadecimal String, and vice versa?.


Not what the OP asked.


Mine does, actually. byteArr.TakeWhile(x => x != 0) is a quick and easy way to solve the null termination problem.
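
The difference between TakeWhile and the TrimEnd('\0') approach mentioned earlier in the thread shows up when the buffer holds stale data after the first terminator. A minimal sketch, with made-up buffer contents:

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical buffer: "hi", a terminator, leftover bytes "old", then zero padding.
byte[] byteArr = { (byte)'h', (byte)'i', 0, (byte)'o', (byte)'l', (byte)'d', 0, 0 };

// TrimEnd only strips trailing NULs, so the embedded NUL and the stale bytes survive.
string trimmed = Encoding.UTF8.GetString(byteArr).TrimEnd('\0');

// TakeWhile stops at the first zero byte, which matches C string semantics.
string cut = Encoding.UTF8.GetString(byteArr.TakeWhile(x => x != 0).ToArray());

Console.WriteLine(trimmed.Length); // 6: 'h', 'i', NUL, 'o', 'l', 'd'
Console.WriteLine(cut);            // hi
```

TakeWhile allocates an intermediate array; for large buffers, Array.IndexOf plus the GetString(bytes, index, count) overload avoids that copy.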


Where's your unit test ;)?


Didn't have one. But this function is in use for binary transmission on our company network, and so far 20 TB have been encoded and re-encoded correctly. So for me this function works :)
