Convert one unicode string to array - go

How can I convert this string into an array or slice in Go?
The separators are the Unicode characters \u001e and \u001d.
inputString="\u001e456\u001dBernard Janv\u001d0022000\u001d250\u001d804\u001d1169\u001d\u001d168"

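Use strings.FieldsFunc to split the string on both separators (empty fields, such as the one between the two consecutive \u001d characters, are dropped):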
inputString := "\u001e456\u001dBernard Janv\u001d0022000\u001d250\u001d804\u001d1169\u001d\u001d168"
parts := strings.FieldsFunc(inputString, func(r rune) bool {
return r == '\u001d' || r == '\u001e'
})
for _, part := range parts {
fmt.Println(part)
}
https://play.golang.org/p/x_le2P3h8ry
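If the empty field between the two consecutive \u001d characters matters, a variant based on strings.Split keeps it. This is only a sketch: the helper name splitKeepEmpty is hypothetical, and it assumes that \u001e marks the start of a single record and that \u001d separates its fields, which the question does not state.

```go
package main

import (
	"fmt"
	"strings"
)

// splitKeepEmpty is a hypothetical helper. It assumes the input holds one
// record that starts with \u001e and whose fields are separated by \u001d.
func splitKeepEmpty(input string) []string {
	// Drop the leading record marker if it is present.
	record := strings.TrimPrefix(input, "\u001e")
	// Unlike strings.FieldsFunc, strings.Split keeps empty fields.
	return strings.Split(record, "\u001d")
}

func main() {
	input := "\u001e456\u001dBernard Janv\u001d0022000\u001d250\u001d804\u001d1169\u001d\u001d168"
	for i, part := range splitKeepEmpty(input) {
		fmt.Printf("%d: %q\n", i, part)
	}
}
```

With the input from the question this yields eight elements, the seventh being the empty string.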

Related

How do I handle empty values when initializing structs in go?

I'm trying to split a comma-separated string and use the values to initialize a struct. This is how I do it right now:
type Address struct {
Street string
City string
ZipCode string
}
s := strings.Split("street,city,zip", ",")
data := Address{Street: s[0], City: s[1], ZipCode: s[2]}
The problem I'm having is that I have to handle this input as well:
"street,"
"street,city"
Any idea how to do it without going out of range? I've looked into unpacking with the triple-dot syntax (...), but structs do not seem to support it.
Check the length of the slice before accessing the element:
data := Address{}
s := strings.Split("street,city,zip", ",")
data.Street = s[0]
if len(s) > 1 {
data.City = s[1]
}
if len(s) > 2 {
data.ZipCode = s[2]
}
If this comes up a lot, then write a simple helper function:
func get(s []string, i int) string {
if i >= len(s) {
return ""
}
return s[i]
}
Use it like this:
data := Address{Street: get(s, 0), City: get(s, 1), ZipCode: get(s, 2)}
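Putting the helper together with the struct from the question, a complete program might look like the sketch below. The wrapper parseAddress is a hypothetical name introduced here for illustration; Address and get are taken from the snippets above.

```go
package main

import (
	"fmt"
	"strings"
)

type Address struct {
	Street  string
	City    string
	ZipCode string
}

// get returns s[i], or the empty string when i is past the end of s.
func get(s []string, i int) string {
	if i >= len(s) {
		return ""
	}
	return s[i]
}

// parseAddress is a hypothetical wrapper: missing fields become empty
// strings, and fields beyond the third are ignored.
func parseAddress(line string) Address {
	s := strings.Split(line, ",")
	return Address{Street: get(s, 0), City: get(s, 1), ZipCode: get(s, 2)}
}

func main() {
	for _, in := range []string{"street,", "street,city", "street,city,zip"} {
		fmt.Printf("%q -> %+v\n", in, parseAddress(in))
	}
}
```

Because strings.Split with a non-empty separator always returns at least one element, even the empty input string produces a zero-valued Address rather than a panic.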
If you'd rather use slightly more memory and fewer checks, you could also do:
s := strings.Split("street,city,zip", ",")
if len(s) < 3 { // Change 3 to however many fields you expect
	// Without this guard, make panics on a negative length when the input has extra fields.
	s = append(s, make([]string, 3-len(s))...)
}
data := Address{Street: s[0], City: s[1], ZipCode: s[2]}
This appends empty strings to the slice so that it always has at least the expected number of elements. Playground example: https://play.golang.org/p/Igj6yT5fffl

How in golang to remove the last letter from the string?

Let's say I have a string called varString.
varString := "Bob,Mark,"
QUESTION: How to remove the last letter from the string? In my case, it's the second comma.
How to remove the last letter from the string?
In Go, character strings are UTF-8 encoded. Unicode UTF-8 is a variable-length character encoding which uses one to four bytes per Unicode character (code point).
For example,
package main
import (
"fmt"
"unicode/utf8"
)
func trimLastChar(s string) string {
r, size := utf8.DecodeLastRuneInString(s)
if r == utf8.RuneError && (size == 0 || size == 1) {
size = 0
}
return s[:len(s)-size]
}
func main() {
s := "Bob,Mark,"
fmt.Println(s)
s = trimLastChar(s)
fmt.Println(s)
}
Playground: https://play.golang.org/p/qyVYrjmBoVc
Output:
Bob,Mark,
Bob,Mark
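Here's a simpler method that also works for Unicode strings. It copies the string into a rune slice, so it allocates:
func removeLastRune(s string) string {
	r := []rune(s)
	if len(r) == 0 {
		return s // nothing to remove; slicing r[:len(r)-1] would panic here
	}
	return string(r[:len(r)-1])
}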
Playground link: https://play.golang.org/p/ezsGUEz0F-D
Something like this:
s := "Bob,Mark,"
s = s[:len(s)-1]
Note that this does not work if the last character is not represented by just one byte.
newStr := strings.TrimRightFunc(str, func(r rune) bool {
return !unicode.IsLetter(r) // or any other validation can go here
})
This will trim anything that isn't a letter on the right hand side.
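Since the actual goal in the question is to drop a trailing comma, strings.TrimSuffix may be the most direct fit. A minimal sketch, with dropTrailingComma as a hypothetical helper name:

```go
package main

import (
	"fmt"
	"strings"
)

// dropTrailingComma is a hypothetical helper. It removes at most one
// trailing comma and returns the string unchanged if there is none.
func dropTrailingComma(s string) string {
	return strings.TrimSuffix(s, ",")
}

func main() {
	fmt.Println(dropTrailingComma("Bob,Mark,")) // Bob,Mark
	fmt.Println(dropTrailingComma("Bob,Mark"))  // Bob,Mark
}
```

Unlike the slicing approaches, this only removes the character when it really is a comma, so it is safe to call on input that has no trailing separator.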

Escape unicode characters in json encode golang

Given the following example:
func main() {
buf := new(bytes.Buffer)
enc := json.NewEncoder(buf)
toEncode := []string{"hello", "wörld"}
enc.Encode(toEncode)
fmt.Println(buf.String())
}
I would like to have the output presented with escaped Unicode characters:
["hello","w\u00f6rld"]
Rather than:
["hello","wörld"]
I have attempted to write a function to quote the Unicode characters using strconv.QuoteToASCII and feed the results to Encode() however that results in double escaping:
func quotedUnicode(data []string) []string {
for index, element := range data {
quotedUnicode := strconv.QuoteToASCII(element)
// get rid of additional quotes
quotedUnicode = strings.TrimSuffix(quotedUnicode, "\"")
quotedUnicode = strings.TrimPrefix(quotedUnicode, "\"")
data[index] = quotedUnicode
}
return data
}
["hello","w\\u00f6rld"]
How can I ensure that the output from json.Encode contains correctly escaped Unicode characters?
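One possible approach, offered as a sketch rather than an established answer: encode normally, then rewrite every non-ASCII rune in the resulting JSON as a \uXXXX escape. This relies on the fact that in the encoder's output non-ASCII characters only occur inside JSON strings. The function name marshalASCII is hypothetical; runes above U+FFFF are written as UTF-16 surrogate pairs, as JSON requires.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"unicode/utf16"
	"unicode/utf8"
)

// marshalASCII is a hypothetical helper: it marshals v and then replaces
// every non-ASCII rune in the output with a \uXXXX escape sequence.
func marshalASCII(v interface{}) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	for _, r := range string(b) {
		switch {
		case r < utf8.RuneSelf:
			// Plain ASCII, including escapes the encoder already wrote.
			sb.WriteRune(r)
		case r > 0xFFFF:
			// Outside the Basic Multilingual Plane: emit a surrogate pair.
			hi, lo := utf16.EncodeRune(r)
			fmt.Fprintf(&sb, `\u%04x\u%04x`, hi, lo)
		default:
			fmt.Fprintf(&sb, `\u%04x`, r)
		}
	}
	return sb.String(), nil
}

func main() {
	out, err := marshalASCII([]string{"hello", "wörld"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // ["hello","w\u00f6rld"]
}
```

The double escaping in the question happens because the encoder sees the backslash produced by strconv.QuoteToASCII as ordinary string content and escapes it again; post-processing the encoded bytes avoids that.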

Replace a character at a specific location in a string

I know about strings.Replace(), and it works if you know exactly what to replace and how often it occurs. But what can I do if I want to replace a char at only a known position? I'm thinking of something like this:
randLetter := getRandomChar()
myText := "This is my text"
randPos := rand.Intn(len(myText) - 1)
newText := myText[:randPos] + randLetter + myText[randPos+1:]
But this does not replace the char at randPos, just inserts the randLetter at that position. Right?
I've written some code that replaces the character at position indexofcharacter with the replacement. It may not be the best method, but it works fine.
https://play.golang.org/p/9CTgHRm6icK
func replaceAtPosition(originaltext string, indexofcharacter int, replacement string) string {
	// indexofcharacter is a 1-based position: runes[indexofcharacter-1] is the rune that gets replaced.
	// Passing 0 or a value greater than the number of runes panics.
	runes := []rune(originaltext)
	partOne := string(runes[:indexofcharacter-1])
	partTwo := string(runes[indexofcharacter:])
	return partOne + replacement + partTwo
}
UTF-8 is a variable-length encoding. For example,
package main
import "fmt"
func insertChar(s string, c rune, i int) string {
if i >= 0 {
r := []rune(s)
if i < len(r) {
r[i] = c
s = string(r)
}
}
return s
}
func main() {
s := "Hello, 世界"
fmt.Println(s)
s = insertChar(s, 'X', len([]rune(s))-1)
fmt.Println(s)
}
Output:
Hello, 世界
Hello, 世X
A string is a read-only slice of bytes. You can't replace anything.
A single rune can consist of multiple bytes, so you should convert the string to an intermediate, mutable slice of runes anyway:
myText := []rune("This is my text")
randPos := rand.Intn(len(myText)) // Intn(n) returns a value in [0, n), so every position can be chosen
myText[randPos] = randLetter // randLetter must be a rune here
fmt.Println(string(myText))
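For text that is known to be ASCII-only, slicing the string around the position, as the expression in the question does once it is spelled out with myText, also replaces rather than inserts, because the second slice starts one byte after the position. The sketch below assumes single-byte characters throughout; replaceByteAt is a hypothetical helper name.

```go
package main

import "fmt"

// replaceByteAt is a hypothetical helper. It assumes s contains only
// single-byte (ASCII) characters and replaces the byte at index pos.
// An out-of-range pos leaves the string unchanged.
func replaceByteAt(s string, pos int, c byte) string {
	if pos < 0 || pos >= len(s) {
		return s
	}
	// s[pos+1:] skips the original byte at pos, so this replaces it.
	return s[:pos] + string(c) + s[pos+1:]
}

func main() {
	fmt.Println(replaceByteAt("This is my text", 5, 'X')) // This Xs my text
}
```

With multi-byte characters the byte index no longer corresponds to a character position, which is why the rune-based versions above are the safer default.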

Text processing in Go - how to convert string to byte?

I'm writing a small program to number paragraphs:
put a paragraph number in front of each paragraph, in the form [1]..., [2]....
The article title should be excluded.
Here is my program:
package main
import (
"fmt"
"io/ioutil"
)
var s_end = [3]string{".", "!", "?"}
func main() {
b, err := ioutil.ReadFile("i_have_a_dream.txt")
if err != nil {
panic(err)
}
p_num, s_num := 1, 1
for _, char := range b {
fmt.Printf("[%s]", p_num)
p_num += 1
if char == byte("\n") {
fmt.Printf("\n[%s]", p_num)
p_num += 1
} else {
fmt.Printf(char)
}
}
}
http://play.golang.org/p/f4S3vQbglY
I got this error:
prog.go:21: cannot convert "\n" to type byte
prog.go:21: cannot convert "\n" (type string) to type byte
prog.go:21: invalid operation: char == "\n" (mismatched types byte and string)
prog.go:25: cannot use char (type byte) as type string in argument to fmt.Printf
[process exited with non-zero status]
How to convert string to byte?
What is the general practice to process text? Read in, parse it by byte, or by line?
Update
I solved the problem by converting the byte buffer to a string and replacing the line breaks with a regular expression. (Thanks to @Tomasz Kłak for the regexp help.)
I put the code here for reference.
package main
import (
"fmt"
"io/ioutil"
"regexp"
)
func main() {
b, err := ioutil.ReadFile("i_have_a_dream.txt")
if err != nil {
panic(err)
}
s := string(b)
r := regexp.MustCompile("(\r\n)+")
counter := 1
repl := func(match string) string {
p_num := counter
counter++
return fmt.Sprintf("%s [%d] ", match, p_num)
}
fmt.Println(r.ReplaceAllStringFunc(s, repl))
}
"\n" is a string literal, so it cannot be converted to or compared with a byte. Use the rune literal '\n' instead: it is an untyped constant and can be compared with a byte directly.
A string cannot be converted into a byte in a meaningful way. Use one of the following approaches:
If you have a string literal like "a", consider using a rune literal like 'a' which can be converted into a byte.
If you want to take a byte out of a string, use an index expression like myString[42].
If you want to interpret the content of a string as a (decimal) number, use strconv.Atoi() or strconv.ParseInt().
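The three approaches can be sketched in one short program. The helper countNewlines is a hypothetical name used only for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// countNewlines is a hypothetical helper showing that a rune literal such
// as '\n' can be compared with a byte without any conversion.
func countNewlines(b []byte) int {
	n := 0
	for _, c := range b {
		if c == '\n' {
			n++
		}
	}
	return n
}

func main() {
	// 1. Rune literal compared with bytes.
	fmt.Println(countNewlines([]byte("one\ntwo\n"))) // 2

	// 2. Index expression: s[1] has type byte.
	s := "hello"
	fmt.Println(s[1] == 'e') // true

	// 3. Interpreting the content of a string as a decimal number.
	n, err := strconv.Atoi("42")
	fmt.Println(n, err) // 42 <nil>
}
```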
Please note that it is customary in Go to write programs that can deal with Unicode characters. Explaining how to do this would be too much for this answer, but there are tutorials out there which explain what kind of things to pay attention to.
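On the question of reading by byte or by line: for this task, reading line by line with bufio.Scanner is one reasonable option. The sketch below makes two assumptions that the question does not confirm: the first non-empty line is the title, and every other non-empty line is one paragraph. The names numberParagraphs and numberText are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// numberParagraphs is a hypothetical helper. It copies the first non-empty
// line (assumed to be the title) unchanged and prefixes every later
// non-empty line (assumed to be one paragraph) with "[n] ".
func numberParagraphs(r io.Reader) (string, error) {
	var sb strings.Builder
	sc := bufio.NewScanner(r)
	num := 0
	seenTitle := false
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.TrimSpace(line) == "":
			sb.WriteString("\n") // keep blank lines as they are
		case !seenTitle:
			seenTitle = true
			sb.WriteString(line + "\n")
		default:
			num++
			fmt.Fprintf(&sb, "[%d] %s\n", num, line)
		}
	}
	return sb.String(), sc.Err()
}

// numberText is a convenience wrapper for input that is already a string.
func numberText(text string) (string, error) {
	return numberParagraphs(strings.NewReader(text))
}

func main() {
	out, err := numberText("Title\n\nFirst paragraph.\nSecond paragraph.\n")
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

An *os.File can be passed to numberParagraphs directly. Note that bufio.Scanner rejects lines longer than 64 KiB by default; for very long paragraphs, enlarge the limit with the scanner's Buffer method.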
