I'm looking to convert a slice of bytes ([]byte) into a UTF-8 string.
I want to write a function like this:
func bytesToUTF8string(bytes []byte)(string){
// Take the slice of bytes and encode it to UTF-8 string
// return the UTF-8 string
}
What is the most efficient way to do this?
EDIT:
Specifically, I want to convert the output of crypto/rsa's EncryptPKCS1v15 or SignPKCS1v15 to a UTF-8 encoded string.
How can I do it?
func bytesToUTF8string(bytes []byte) string {
return string(bytes)
}
It's such a common, simple operation that it's arguably not worth wrapping in a function. Unless, of course, you need to translate from a different source encoding; that is an entirely different issue, with which the golang.org/x/text/encoding package might help.
Sorry if it's a stupid question or I didn't give enough information. I have a string which should represent an ID: "\x8f\x04.\x8b8\x8e\nP\xbd\xe3\vLf\xd6W*\x92vb\x8b2", and I'm confused about what it is. I tried to decode it as UTF-8, UTF-16, and GBK, but none of them work. I realize that \x means hexadecimal, but what are \v and \nP?
The text in the question looks like binary data encoded to a Go interpreted string literal. Use strconv.Unquote to convert the text back to binary data:
s, err := strconv.Unquote(`"\x8f\x04.\x8b8\x8e\nP\xbd\xe3\vLf\xd6W*\x92vb\x8b2"`)
if err != nil {
log.Fatal(err)
}
fmt.Printf("%x\n", s) // prints 8f042e8b388e0a50bde30b4c66d6572a9276628b32
fmt.Printf("%q\n", s) // prints "\x8f\x04.\x8b8\x8e\nP\xbd\xe3\vLf\xd6W*\x92vb\x8b2"
The Go language specification defines the syntax. The \n represents a byte with the value 10; the P that follows it is an ordinary character (byte 0x50). The \v represents a byte with the value 11. The \xXX is hexadecimal, as noted in the question.
I need to convert the data format of an int16 to a string representing its hexadecimal value.
I have tried some hex converters but they change the data instead of changing the formatting. I need it to be a string representation of its hexadecimal value.
data := (data from buffer)
fmt.Printf("BUFFER DATA : %X\n", data) // output print on screen D9DC (hex)
fmt.Println("BUFFER DATA : ", string(data)) // output print on screen [?]
fmt.Println("BUFFER DATA : ", data) // output print on screen 55772 (dec)
How can I convert the data format so it prints D9DC with fmt.Println?
Full code here https://play.golang.org/p/WVpMb9lh1Rx
Since fmt.Println doesn't accept format flags, it prints each variable depending on its type.
crc16.Checksum returns a 16-bit integer (the printed value 55772 only fits in a uint16, not an int16), so fmt.Println displays its decimal value, 55772, rather than the hexadecimal form D9DC.
If you want fmt.Println to print D9DC instead of the integer value, you have multiple choices.
1. Convert your integer into a string that contains the hexadecimal value (which means that if you change your integer, you will need to convert it into a string again before using it).
2. Create your own type with a String() method, which is an integer but is represented by its hexadecimal value when printed.
For the second option, your type could be something like this:
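// Hex is a uint16 rather than an int16: 55772 (0xD9DC) does not fit in an int16.
type Hex uint16
// String formats the value as upper-case hexadecimal, e.g. D9DC.
func (h Hex) String() string {
	return strings.ToUpper(strconv.FormatUint(uint64(h), 16))
}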
fmt.Println will use this method automatically, because having a String() method means the Hex type implements the fmt.Stringer interface. For more information, see:
https://tour.golang.org/methods/17
https://golang.org/pkg/fmt/#Stringer
I am trying to ensure that a string coming from an HTTP request is valid for use as a base64 URL parameter. I've been experimenting with base64.RawURLEncoding, as I assumed that encoding an invalid string would return an error, or at least that decoding the result would fail. However, it quite happily encodes and decodes the string regardless of the input.
https://play.golang.org/p/3sHUfl2NSJK
I have created the above playground showing the issue I'm having (albeit an extreme example). Is there another way of ascertaining whether a string consists entirely of valid base64 characters?
To clarify, Base64 is an encoding scheme which allows you to take arbitrary binary data and safely encode it into ASCII characters which can later be decoded into the original binary string.
That means that the "Base64-encode" operation can take literally any input and produce valid, encoded data. However, the "Base64-decode" operation will fail if its input string contains characters outside the set of ASCII characters that the encoding uses (meaning that the given string was not produced by a valid Base64 encoder).
To test whether a string contains a valid Base64 encoded sequence, call DecodeString on the relevant base64.Encoding value (for example base64.StdEncoding) and check whether the returned error is nil.
For example (Go Playground):
func IsValidBase64(s string) bool {
_, err := base64.StdEncoding.DecodeString(s)
return err == nil
}
func main() {
ss := []string{"ABBA", "T0sh", "Foo=", "Bogus\x01"}
for _, s := range ss {
if IsValidBase64(s) {
fmt.Printf("OK: valid Base64 %q\n", s)
} else {
fmt.Printf("ERR: invalid Base64 %q\n", s)
}
}
// OK: valid Base64 "ABBA"
// OK: valid Base64 "T0sh"
// OK: valid Base64 "Foo="
// ERR: invalid Base64 "Bogus\x01"
}
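Since the question is about base64.RawURLEncoding specifically, the same check can be sketched against that alphabet (no padding, with - and _ instead of + and /). The helper name isValidRawURLBase64 and the sample inputs are illustrative assumptions:

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// isValidRawURLBase64 is a hypothetical helper: it reports whether s
// decodes cleanly as unpadded, URL-safe base64.
func isValidRawURLBase64(s string) bool {
	_, err := base64.RawURLEncoding.DecodeString(s)
	return err == nil
}

func main() {
	for _, s := range []string{"wrrCquKA", "a-b_", "a+b/", "Zm9v="} {
		fmt.Printf("%q -> %v\n", s, isValidRawURLBase64(s))
	}
	// "wrrCquKA" -> true
	// "a-b_" -> true
	// "a+b/" -> false (+ and / belong to the standard alphabet)
	// "Zm9v=" -> false (the raw encodings reject padding)
}
```

Choosing the Encoding value that matches what the client is expected to send matters here: a string that is valid for StdEncoding can be rejected by RawURLEncoding and vice versa.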
base64 encoding works by interpreting an arbitrary bit stream as a string of 6-bit integers, which are then mapped one-by-one to the chosen base64 alphabet.
Your example string starts with these 8-bit bytes:
11000010 10111010 11000010 10101010 11100010 10000000
Re-arrange them into 6-bit numbers:
110000 101011 101011 000010 101010 101110 001010 000000
And map them to a base64 alphabet (here URL encoding):
w r r C q u K A
Since every 6-bit number can be mapped to a character in the alphabet (there's exactly 64 of them), there are no invalid inputs to base64. This is precisely what base64 is used for: turn arbitrary input into printable ASCII characters.
Decoding, on the other hand, can and will fail if the input contains bytes outside of the base64 alphabet — they can't be mapped back to the 6-bit integer.
Example string:
"\u0410\u043b\u0435\u043a\u0441\u0430\u043d\u0434\u0440\u044b! \n\u0421\u043f\u0430\u0441\u0438\u0431\u043e \ud83d\udcf8 link.ru \u0437\u0430 \n#hashtag Русское слово, an English word"
Without the \ud83d\udcf8 sequence, my function works well:
func convertUnicode(text string) string {
s, err := strconv.Unquote(`"` + text + `"`)
if err != nil {
// Error.Printf("can't convert: %s | err: %s\n", text, err)
return text
}
return s
}
My question is: how can I detect that the text contains entries of this kind? And how can I convert them to an emoji, or remove them from the text? Thanks.
It is not quite that simple: neither \ud83d nor \udcf8 is a valid code point on its own, but together they form a surrogate pair that UTF-16 uses to encode \U0001F4F8. strconv.Unquote does not combine the two halves for you, so you have to do that yourself.
Use strconv.Unquote to unquote as you did.
Convert to []rune for convenience.
Find surrogate pairs with unicode/utf16.IsSurrogate.
Combine surrogate pairs with unicode/utf16.DecodeRune.
Convert back to string.
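A variant of these steps can be sketched as follows. It combines the pairs before unquoting rather than after, on the assumption that strconv.Unquote may reject a lone surrogate escape (which would explain why the original function falls back to returning the text unchanged). The names surrogatePair and combineSurrogates are hypothetical, and the regular expression does not account for an escaped backslash directly in front of a \u sequence:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"unicode/utf16"
)

// surrogatePair matches an escaped high surrogate (\uD800-\uDBFF)
// immediately followed by an escaped low surrogate (\uDC00-\uDFFF).
var surrogatePair = regexp.MustCompile(`(?i)\\u(d[89ab][0-9a-f]{2})\\u(d[c-f][0-9a-f]{2})`)

// combineSurrogates replaces each escaped pair with the single
// character it encodes, using unicode/utf16.DecodeRune.
func combineSurrogates(text string) string {
	return surrogatePair.ReplaceAllStringFunc(text, func(m string) string {
		// m has the fixed shape \uXXXX\uYYYY, so the hex digits sit at 2..6 and 8..12.
		hi, _ := strconv.ParseUint(m[2:6], 16, 32)
		lo, _ := strconv.ParseUint(m[8:12], 16, 32)
		return string(utf16.DecodeRune(rune(hi), rune(lo)))
	})
}

// convertUnicode is the function from the question with the extra step added.
func convertUnicode(text string) string {
	s, err := strconv.Unquote(`"` + combineSurrogates(text) + `"`)
	if err != nil {
		return text
	}
	return s
}

func main() {
	in := `\u0421\u043f\u0430\u0441\u0438\u0431\u043e \ud83d\udcf8 link.ru`
	fmt.Println(convertUnicode(in)) // Спасибо 📸 link.ru
}
```

To remove the emoji instead of converting it, the replacement function could return an empty string. If the input is actually JSON, decoding it with encoding/json is another option, since that package combines surrogate pairs on its own.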
I am trying to write a simple TCP/IP client in Rust, and I need to print out the buffer I got from the server.
How do I convert a Vec<u8> (or a &[u8]) to a String?
To convert a slice of bytes to a string slice (assuming a UTF-8 encoding):
use std::str;
//
// pub fn from_utf8(v: &[u8]) -> Result<&str, Utf8Error>
//
// Assuming buf: &[u8]
//
fn main() {
let buf = &[0x41u8, 0x41u8, 0x42u8];
let s = match str::from_utf8(buf) {
Ok(v) => v,
Err(e) => panic!("Invalid UTF-8 sequence: {}", e),
};
println!("result: {}", s);
}
The conversion is in-place, and does not require an allocation. You can create a String from the string slice if necessary by calling .to_owned() on the string slice (other options are available).
If you are sure that the byte slice is valid UTF-8, and you don’t want to incur the overhead of the validity check, there is an unsafe version of this function, from_utf8_unchecked, which has the same behavior but skips the check.
If you need a String instead of a &str, you may also consider String::from_utf8 instead.
The library references for the conversion function:
std::str::from_utf8
std::str::from_utf8_unchecked
std::string::String::from_utf8
I prefer String::from_utf8_lossy:
fn main() {
let buf = &[0x41u8, 0x41u8, 0x42u8];
let s = String::from_utf8_lossy(buf);
println!("result: {}", s);
}
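It replaces invalid UTF-8 sequences with � (U+FFFD), so no error handling is required. That is handy when you don't need to detect invalid input, and I rarely do. Note that the result is a Cow<str> rather than a String: it borrows the input when the bytes are already valid UTF-8 and allocates only when a replacement is needed. It should make printing out what you're getting from the server a little easier.
If you need an owned String, call into_owned() on the result, since Cow is clone-on-write.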
If you actually have a vector of bytes (Vec<u8>) and want to convert to a String, the most efficient is to reuse the allocation with String::from_utf8:
fn main() {
let bytes = vec![0x41, 0x42, 0x43];
let s = String::from_utf8(bytes).expect("Found invalid UTF-8");
println!("{}", s);
}
In my case I just needed to turn the numbers into a string, not the numbers to letters according to some encoding, so I did
fn main() {
let bytes = vec![0x41, 0x42, 0x43];
let s = format!("{:?}", &bytes);
println!("{}", s);
}
To optimally convert a Vec<u8> possibly containing non-UTF-8 characters/byte sequences into a UTF-8 String without any unneeded allocations, you'll want to optimistically try calling String::from_utf8() then resort to String::from_utf8_lossy().
let buffer: Vec<u8> = ...;
let utf8_string = String::from_utf8(buffer)
    .unwrap_or_else(|non_utf8| String::from_utf8_lossy(non_utf8.as_bytes()).into_owned());
The approach suggested in the other answers will result in two owned buffers in memory even in the happy case (with valid UTF-8 data in the vector): one with the original u8 bytes and the other in the form of a String owning its characters. This approach will instead attempt to consume the Vec<u8> and marshal it as a Unicode String directly and only failing that will it allocate room for a new string containing the lossily UTF-8 decoded output.