Base64 encoding doesn't fail with invalid characters [closed] - go

I am trying to ensure a string coming from an HTTP request is valid for use in a base64 URL param. I've been experimenting with base64.RawURLEncoding, as I assumed encoding an invalid string would return an error, or at least that decoding the result would fail. However, it quite happily encodes and decodes the string regardless of the input.
https://play.golang.org/p/3sHUfl2NSJK
I have created the above playground showing the issue I'm having (albeit an extreme example). Is there another way of ascertaining whether a string consists entirely of valid base64 characters?

To clarify, Base64 is an encoding scheme which allows you to take arbitrary binary data and safely encode it into ASCII characters which can later be decoded into the original binary string.
That means the Base64 encode operation can take literally any input and produce valid encoded data. The Base64 decode operation, however, will fail if its input contains characters outside the set of ASCII characters the encoding uses, or is otherwise malformed (for example, wrong length or padding), meaning the string was not produced by a valid Base64 encoder.
To test whether a string contains a valid Base64-encoded sequence, call DecodeString on the encoding you expect (base64.StdEncoding in the example below; base64.RawURLEncoding for unpadded, URL-safe input) and check that the returned error is nil.
For example (Go Playground):
package main

import (
	"encoding/base64"
	"fmt"
)

// IsValidBase64 reports whether s decodes cleanly as standard, padded Base64.
func IsValidBase64(s string) bool {
	_, err := base64.StdEncoding.DecodeString(s)
	return err == nil
}

func main() {
	ss := []string{"ABBA", "T0sh", "Foo=", "Bogus\x01"}
	for _, s := range ss {
		if IsValidBase64(s) {
			fmt.Printf("OK: valid Base64 %q\n", s)
		} else {
			fmt.Printf("ERR: invalid Base64 %q\n", s)
		}
	}
	// OK: valid Base64 "ABBA"
	// OK: valid Base64 "T0sh"
	// OK: valid Base64 "Foo="
	// ERR: invalid Base64 "Bogus\x01"
}
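Since the question is about URL parameters, the same check can be sketched against the unpadded, URL-safe alphabet. This is a minimal sketch, not the answerer's code: isValidRawURLBase64 is a hypothetical helper name, and it assumes the parameter is expected to be unpadded (base64.RawURLEncoding).

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// isValidRawURLBase64 is a hypothetical helper: it reports whether s decodes
// cleanly as unpadded, URL-safe Base64 (alphabet A-Z, a-z, 0-9, '-', '_').
func isValidRawURLBase64(s string) bool {
	_, err := base64.RawURLEncoding.DecodeString(s)
	return err == nil
}

func main() {
	// "a+b/" uses the standard alphabet and "Zm9v=" carries padding,
	// so both are rejected by the raw URL encoding.
	for _, s := range []string{"T0sh", "a-b_", "a+b/", "Zm9v=", "not valid!"} {
		fmt.Printf("%q -> %v\n", s, isValidRawURLBase64(s))
	}
}
```

Note that the decoders in encoding/base64 ignore \r and \n in their input, so a stricter check may also want to reject those characters explicitly.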

base64 encoding works by interpreting an arbitrary bit stream as a string of 6-bit integers, which are then mapped one-by-one to the chosen base64 alphabet.
Your example string starts with these 8-bit bytes:
11000010 10111010 11000010 10101010 11100010 10000000
Re-arrange them into 6-bit numbers:
110000 101011 101011 000010 101010 101110 001010 000000
And map them to a base64 alphabet (here URL encoding):
w r r C q u K A
Since every 6-bit number can be mapped to a character in the alphabet (there are exactly 64 of them), there are no invalid inputs to base64. This is precisely what base64 is used for: turning arbitrary input into printable ASCII characters.
Decoding, on the other hand, can and will fail if the input contains bytes outside of the base64 alphabet, because they can't be mapped back to a 6-bit integer.
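The worked example above can be reproduced in a few lines. This is a small sketch: encodeURL and canDecodeURL are hypothetical helper names, and the six input bytes are the ones written out in binary above.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeURL (hypothetical helper) encodes b with the unpadded URL-safe alphabet.
func encodeURL(b []byte) string {
	return base64.RawURLEncoding.EncodeToString(b)
}

// canDecodeURL (hypothetical helper) reports whether s decodes without error.
func canDecodeURL(s string) bool {
	_, err := base64.RawURLEncoding.DecodeString(s)
	return err == nil
}

func main() {
	// 11000010 10111010 11000010 10101010 11100010 10000000
	in := []byte{0xC2, 0xBA, 0xC2, 0xAA, 0xE2, 0x80}

	enc := encodeURL(in)
	fmt.Println(enc) // wrrCquKA

	// Encoding never fails; decoding fails once a byte outside the
	// alphabet (here '*') appears in the input.
	fmt.Println(canDecodeURL(enc))        // true
	fmt.Println(canDecodeURL("wrrC*uKA")) // false
}
```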

Related

What is the code type of this string and how do I decode it?

Sorry if it's a stupid question or I didn't give enough information. I have a string which should represent an ID: "\x8f\x04.\x8b8\x8e\nP\xbd\xe3\vLf\xd6W*\x92vb\x8b2", and I'm confused about what it is. I tried to decode it as UTF-8, UTF-16, and GBK, but none of them works. I realize that \x means hexadecimal, but what are \v and \nP?
The text in the question looks like binary data encoded to a Go interpreted string literal. Use strconv.Unquote to convert the text back to binary data:
s, err := strconv.Unquote(`"\x8f\x04.\x8b8\x8e\nP\xbd\xe3\vLf\xd6W*\x92vb\x8b2"`)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("%x\n", s) // prints 8f042e8b388e0a50bde30b4c66d6572a9276628b32
fmt.Printf("%q\n", s) // prints "\x8f\x04.\x8b8\x8e\nP\xbd\xe3\vLf\xd6W*\x92vb\x8b2"
The Go language specification defines the syntax. The \n represents a byte with the value 10, and the P that follows it is simply the literal character P (byte value 80). The \v represents a byte with the value 11. The \xXX is hexadecimal as noted in the question.
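The byte values of these escapes can be checked directly. This is a minimal sketch; byteValues is a hypothetical helper name used only for illustration.

```go
package main

import "fmt"

// byteValues (hypothetical helper) returns the raw bytes of s.
func byteValues(s string) []byte {
	return []byte(s)
}

func main() {
	// "\n", "\v", a literal "P", and "\x8f" are one byte each
	// in an interpreted string literal.
	for i, b := range byteValues("\n\vP\x8f") {
		fmt.Printf("byte %d = %d (0x%02x)\n", i, b, b)
	}
}
```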

Getting wrong Base64 encoded result in GO [duplicate]

I have the following hex data, created by converting 5 values (name, numeric, and date fields) to TLV:
0115426f627320426173656d656e74205265636f726473020f3130303032353930363730303030330314323032322d30342d32355431353a33303a30305a040a323130303130302e393905093331353031352e3135
This hex data needs to be further encoded to Base64. I wrote the below code for that
// b64 is assumed to be an import alias: import b64 "encoding/base64"
func TLVsToBase64(v string) string { // v - the TLV in hex format
	encodedTLV := b64.StdEncoding.EncodeToString([]byte(v))
	return encodedTLV
}
The output (which is wrong) for the aforementioned hex data is below:
MDExNTQyNmY2MjczMjA0MjYxNzM2NTZkNjU2ZTc0MjA1MjY1NjM2ZjcyNjQ3MzAyMGYzMTMwMzAzMDMyMzUzOTMwMzYzNzMwMzAzMDMwMzMwMzE0MzIzMDMyMzIyZDMwMzQyZDMyMzU1NDMxMzUzYTMzMzAzYTMwMzA1YTA0MGEzMjMxMzAzMDMxMzAzMDJlMzkzOTA1MDkzMzMxMzUzMDMxMzUyZTMxMzU=
The desired output is:
ARVCb2JzIEJhc2VtZW50IFJlY29yZHMCDzEwMDAyNTkwNjcwMDAwMwMUMjAyMi0wNC0yNVQxNTozMDowMFoECjIxMDAxMDAuOTkFCTMxNTAxNS4xNQ==
I am new to Go, so please help me troubleshoot the issue. I might have missed something.
Your input is the hexadecimal representation of some data. Your expected output is not the Base64 encoding of the hex text itself (its UTF-8 bytes) but of the bytes that the hex digits represent. So decode the hex to bytes first, e.g. using hex.DecodeString():
func TLVsToBase64(v string) (string, error) { // v - the TLV in hex format
	data, err := hex.DecodeString(v)
	if err != nil {
		return "", err
	}
	encodedTLV := base64.StdEncoding.EncodeToString(data)
	return encodedTLV, nil
}
Testing it:
s := "0115426f627320426173656d656e74205265636f726473020f3130303032353930363730303030330314323032322d30342d32355431353a33303a30305a040a323130303130302e393905093331353031352e3135"
fmt.Println(TLVsToBase64(s))
Output is what you expect (try it on the Go Playground):
ARVCb2JzIEJhc2VtZW50IFJlY29yZHMCDzEwMDAyNTkwNjcwMDAwMwMUMjAyMi0wNC0yNVQxNTozMDowMFoECjIxMDAxMDAuOTkFCTMxNTAxNS4xNQ== <nil>
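To make the difference concrete on a tiny input, here is a sketch that encodes the same hex string both ways. The helper names base64OfText and base64OfBytes are hypothetical, and the input "4869" (the hex form of "Hi") is chosen only for illustration.

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// base64OfText encodes the hex digits themselves as text,
// which is the mistake made in the question.
func base64OfText(hexStr string) string {
	return base64.StdEncoding.EncodeToString([]byte(hexStr))
}

// base64OfBytes first decodes the hex digits into the bytes they represent,
// then Base64-encodes those bytes.
func base64OfBytes(hexStr string) (string, error) {
	data, err := hex.DecodeString(hexStr)
	if err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(data), nil
}

func main() {
	const h = "4869" // hex for the two bytes 'H', 'i'

	fmt.Println(base64OfText(h)) // NDg2OQ==

	s, err := base64OfBytes(h)
	fmt.Println(s, err) // SGk= <nil>
}
```

The first result is longer because each hex digit is treated as a full byte of input, which is also why the wrong output in the question is roughly twice the length of the desired one.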

Get UTF-8 encoded string from byte[]

I'm looking to convert a slice of bytes ([]byte) into a UTF-8 string.
I want to write a function like this:
func bytesToUTF8string(bytes []byte) string {
	// Take the slice of bytes and encode it to a UTF-8 string
	// return the UTF-8 string
}
What is the most efficient way to do this?
EDIT:
Specifically, I want to convert the output of rsa.EncryptPKCS1v15 or rsa.SignPKCS1v15 (package crypto/rsa) to a UTF-8 encoded string.
How can I do it?
func bytesToUTF8string(bytes []byte) string {
	return string(bytes)
}
It's such a common, simple operation that it's arguably not worth wrapping in a function. Unless, of course, you need to translate from a different source encoding; that is an entirely different issue, with which the golang.org/x/text/encoding package might help.
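One caveat for the RSA case mentioned in the question: the conversion above copies the bytes as they are and does not validate them, and ciphertext or signature bytes are arbitrary binary that will usually not be valid UTF-8. If the goal is a printable representation, one option (an assumption about the asker's intent, not something the answer proposes) is to check with utf8.Valid and fall back to Base64. The helper names isUTF8 and toPrintable are hypothetical.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"unicode/utf8"
)

// isUTF8 (hypothetical helper) reports whether b is well-formed UTF-8.
func isUTF8(b []byte) bool {
	return utf8.Valid(b)
}

// toPrintable (hypothetical helper) returns a Base64 text form of b,
// which is safe to store or transmit as a string.
func toPrintable(b []byte) string {
	return base64.StdEncoding.EncodeToString(b)
}

func main() {
	text := []byte("hello")
	binary := []byte{0xff, 0xfe} // stands in for ciphertext or signature bytes

	fmt.Println(isUTF8(text), string(text))
	fmt.Println(isUTF8(binary), toPrintable(binary))
}
```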

How to be definite about the amount of whitespace fmt.Fscanf consumes?

I am trying to implement a PPM decoder in Go. PPM is an image format that consists of a plaintext header and then some binary image data. The header looks like this (from the spec):
Each PPM image consists of the following:
A "magic number" for identifying the file type. A ppm image's magic number is the two characters "P6".
Whitespace (blanks, TABs, CRs, LFs).
A width, formatted as ASCII characters in decimal.
Whitespace.
A height, again in ASCII decimal.
Whitespace.
The maximum color value (Maxval), again in ASCII decimal. Must be less than 65536 and more than zero.
A single whitespace character (usually a newline).
I try to decode this header with the fmt.Fscanf function. The following call to fmt.Fscanf parses the header (not addressing the caveat explained below):
var magic string
var width, height, maxVal uint
fmt.Fscanf(input, "%2s %d %d %d", &magic, &width, &height, &maxVal)
The documentation of fmt states:
Note: Fscan etc. can read one character (rune) past the input they
return, which means that a loop calling a scan routine may skip some
of the input. This is usually a problem only when there is no space
between input values. If the reader provided to Fscan implements
ReadRune, that method will be used to read characters. If the reader
also implements UnreadRune, that method will be used to save the
character and successive calls will not lose data. To attach ReadRune
and UnreadRune methods to a reader without that capability, use
bufio.NewReader.
As the very next character after the final whitespace is already the beginning of the image data, I have to be certain about how much whitespace fmt.Fscanf consumed after reading Maxval. My code must work on whatever reader was provided by the caller, and parts of it must not read past the end of the header. Wrapping the input in a buffered reader is therefore not an option; the buffered reader might read more from the input than I actually want to read.
Some testing suggests that parsing a dummy character at the end solves the issue:
var magic string
var width, height, maxVal uint
var dummy byte
fmt.Fscanf(input, "%2s %d %d %d%c", &magic, &width, &height, &maxVal, &dummy)
Is that guaranteed to work according to the specification?
No, I would not consider that safe. While it works now, the documentation states that the function reserves the right to read past the value by one character unless you have an UnreadRune() method.
By wrapping your reader in a bufio.Reader, you can ensure the reader has an UnreadRune() method. You will then need to read the final whitespace yourself.
buf := bufio.NewReader(input)
fmt.Fscanf(buf, "%2s %d %d %d", &magic, &width, &height, &maxVal)
buf.ReadRune() // remove next rune (the whitespace) from the buffer.
// Read the image data from buf, not input: buf may already hold read-ahead bytes.
Edit:
As we discussed in the chat, you can assume the dummy char method works and then write a test so you know when it stops working. The test can be something like:
package main

import (
	"bytes"
	"fmt"
	"io"
	"testing"
)

func TestFmtBehavior(t *testing.T) {
	// use multireader to prevent r from implementing io.RuneScanner
	// the input is "data" followed by two spaces
	r := io.MultiReader(bytes.NewReader([]byte("data  ")))
	n, err := fmt.Fscanf(r, "%s%c", new(string), new(byte))
	if n != 2 || err != nil {
		t.Error("failed scan", n, err)
	}
	// the dummy char read 1 extra char past "data".
	// one byte should still remain
	if n, err := r.Read(make([]byte, 5)); n != 1 {
		t.Error("assertion failed", n, err)
	}
}
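If relying on fmt's read-ahead behavior feels too fragile, another option is to avoid fmt.Fscanf for the header and read it one byte at a time. This is a different technique from the ones discussed above, sketched under stated assumptions: readToken, readPPMHeader, and headerAndRest are hypothetical helper names, '#' comments in the header are not handled, and Maxval is not range-checked.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// isSpace reports whether b is one of the whitespace bytes listed in the
// header description quoted in the question.
func isSpace(b byte) bool {
	return b == ' ' || b == '\t' || b == '\r' || b == '\n'
}

// readToken (hypothetical helper) skips leading whitespace, then reads one
// byte at a time up to and including the single whitespace byte that ends
// the token. It never reads past that terminator.
func readToken(r io.Reader) (string, error) {
	var b [1]byte
	var tok []byte
	for {
		if _, err := io.ReadFull(r, b[:]); err != nil {
			return "", err
		}
		if !isSpace(b[0]) {
			tok = append(tok, b[0])
			continue
		}
		if len(tok) > 0 {
			return string(tok), nil
		}
	}
}

// readPPMHeader (hypothetical helper) parses "P6 <width> <height> <maxval>"
// and leaves r positioned at the first byte of image data.
func readPPMHeader(r io.Reader) (width, height, maxVal int, err error) {
	magic, err := readToken(r)
	if err != nil {
		return 0, 0, 0, err
	}
	if magic != "P6" {
		return 0, 0, 0, errors.New("ppm: bad magic number " + strconv.Quote(magic))
	}
	var vals [3]int
	for i := range vals {
		tok, err := readToken(r)
		if err != nil {
			return 0, 0, 0, err
		}
		if vals[i], err = strconv.Atoi(tok); err != nil {
			return 0, 0, 0, err
		}
	}
	return vals[0], vals[1], vals[2], nil
}

// headerAndRest (hypothetical helper) parses the header from s and returns
// whatever the header parser left unread.
func headerAndRest(s string) (w, h, m int, rest string, err error) {
	r := strings.NewReader(s)
	if w, h, m, err = readPPMHeader(r); err != nil {
		return 0, 0, 0, "", err
	}
	b, err := io.ReadAll(r)
	return w, h, m, string(b), err
}

func main() {
	// The image data deliberately starts with a newline byte to show that
	// only one whitespace byte is consumed after Maxval.
	w, h, m, rest, err := headerAndRest("P6\n2 3\n255\n\nDATA")
	fmt.Println(w, h, m, strconv.Quote(rest), err)
}
```

Single-byte reads are slow on an unbuffered file, but the header is only a few dozen bytes, and the caller's reader is left exactly at the start of the image data.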

Go string to ascii byte array

How can I encode my string as ASCII byte array?
If you're looking for a conversion, just do byteArray := []byte(myString)
The language spec details conversions between strings and certain slice types ([]byte for bytes, []rune for Unicode code points).
You may not need to do anything. If you only need to read bytes of a string, you can do that directly:
c := s[3]
cthom06's answer gives you a byte slice you can manipulate:
b := []byte(s)
b[3] = c
Then you can create a new string from the modified byte slice if you like:
s = string(b)
But you mentioned ASCII. If your string is ASCII to begin with, then you are done. If it contains something else, you have more to deal with and might want to post another question with more details about your data.
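Whether a string is ASCII to begin with can be checked before converting. This is a minimal sketch; isASCII is a hypothetical helper name, and it relies on the fact that every ASCII byte is below utf8.RuneSelf (0x80).

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isASCII (hypothetical helper) reports whether every byte of s is in the
// 7-bit ASCII range.
func isASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] >= utf8.RuneSelf {
			return false
		}
	}
	return true
}

func main() {
	for _, s := range []string{"hello", "h\u00e9llo"} {
		// []byte(s) yields ASCII bytes only when isASCII(s) is true.
		fmt.Printf("%q ascii=%v bytes=%v\n", s, isASCII(s), []byte(s))
	}
}
```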
