I have a string that is comma separated, so it could be
test1, test2, test3 or test1,test2,test3 or test1, test2, test3.
I currently split this in Go with strings.Split(s, ","), but the resulting []string can contain elements with an arbitrary amount of leading or trailing whitespace.
How can I easily trim it off? What is the best practice here?
This is my current code:
var property = os.Getenv(env.templateDirectories)
if property != "" {
	var dirs = strings.Split(property, ",")
	for index, ele := range dirs {
		dirs[index] = strings.TrimSpace(ele)
	}
	return dirs
}
I come from Java and assumed that Go has map/reduce-style functionality as well, hence the question.
You can use strings.TrimSpace in a loop. To update the slice in place, range over the indexes rather than the values, since the value variable is only a copy of each element:
Go Playground Example
EDIT: To see the code without the click:
package main
import (
"fmt"
"strings"
)
func main() {
input := "test1, test2, test3"
slc := strings.Split(input , ",")
for i := range slc {
slc[i] = strings.TrimSpace(slc[i])
}
fmt.Println(slc)
}
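An easy way without looping, provided the values themselves never contain spaces (ReplaceAll removes every space, not only those around the commas):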
test := "2 , 123, 1"
result := strings.Split(strings.ReplaceAll(test," ","") , ",")
The encoding/csv package can handle this:
package main
import (
"encoding/csv"
"fmt"
"strings"
)
func main() {
for _, each := range []string{
"test1, test2, test3", "test1, test2, test3", "test1,test2,test3",
} {
r := csv.NewReader(strings.NewReader(each))
r.TrimLeadingSpace = true
s, e := r.Read()
if e != nil {
panic(e)
}
fmt.Printf("%q\n", s)
}
}
https://golang.org/pkg/encoding/csv#Reader.TrimLeadingSpace
If you already use regexp, you can split using a regular expression:
regexp.MustCompile(`\s*,\s*`).Split(test, -1)
This solution is probably slower than the standard Split + TrimSpace, but it is more flexible. For example, if you want to skip empty fields, you can use:
regexp.MustCompile(`(\s*,\s*)+`).Split(test, -1)
or, to allow multiple separators:
regexp.MustCompile(`\s*[,;]\s*`).Split(test, -1)
You can test it in the go playground.
Related
I have a string like this
xx5645645yyxx9879869yyxx3879870977yy
I want to get the following result using a loop:
xx5645645yy
xx9879869yy
xx3879870977yy
I have no idea how to do it; any kind of help is greatly appreciated, thanks.
You can use the strings.Split() function and split on "xx", then prepend "xx" back to each of the split substrings in the loop:
package main
import (
"fmt"
"strings"
)
func main() {
s := "xx5645645yyxx9879869yyxx3879870977yy"
items := strings.Split(s, "xx")[1:] // [1:] to skip the first, empty, item
for _, item := range items {
fmt.Println("xx" + item)
}
}
Which produces:
xx5645645yy
xx9879869yy
xx3879870977yy
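As an alternative sketch, a regular expression can pull out each chunk directly. This assumes the part between "xx" and "yy" is always digits, as in the sample input; the name chunks is made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// chunkRe matches "xx", one or more digits, then "yy".
var chunkRe = regexp.MustCompile(`xx\d+yy`)

// chunks returns every non-overlapping match in s, in order.
func chunks(s string) []string {
	return chunkRe.FindAllString(s, -1)
}

func main() {
	for _, c := range chunks("xx5645645yyxx9879869yyxx3879870977yy") {
		fmt.Println(c)
	}
}
```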
I can use the code below to check whether the text str contains either or both of the keys, i.e. whether it contains "MS", "dynamics", or both:
package main
import (
"fmt"
"regexp"
)
func main() {
keys := []string{"MS", "dynamics"}
keysReg := fmt.Sprintf("(%s %s)|%s|%s", keys[0], keys[1], keys[0], keys[1]) // => "(MS dynamics)|MS|dynamics"
fmt.Println(keysReg)
str := "What is MS dynamics, is it a product from MS?"
re := regexp.MustCompile(`(?i)` + keysReg)
matches := re.FindAllString(str, -1)
fmt.Println("We found", len(matches), "matches, that are:", matches)
}
I want the user to enter a phrase; I then trim unwanted words and characters and run the search as above.
Let's say the user input was This,is,a,delimited,string. I need to build the keys pattern dynamically as (delimited string)|delimited|string so that I can search str for all the matches, so I wrote the following:
s := "This,is,a,delimited,string"
t := regexp.MustCompile(`(?i),|\.|this|is|a`) // backticks contain the expression; (?i) makes it case-insensitive
v := t.Split(s, -1)
fmt.Println(len(v))
fmt.Println(v)
But I got the output as:
8
[ delimited string]
What is wrong with how I clean the input text? I'm expecting the output to be:
2
[delimited string]
Here is my playground
To quote the famous quip from Jamie Zawinski,
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
Two things:
Instead of trying to weed out garbage from the string ("cleaning" it), extract complete words from it instead.
Unicode is a complicated matter, so even after you have succeeded in extracting words, you have to make sure they are properly "escaped" so that they contain no characters which might be interpreted as RE syntax before you build a regexp from them.
package main
import (
"errors"
"fmt"
"regexp"
"strings"
)
func build(words ...string) (*regexp.Regexp, error) {
var sb strings.Builder
switch len(words) {
case 0:
return nil, errors.New("empty input")
case 1:
return regexp.Compile(regexp.QuoteMeta(words[0]))
}
quoted := make([]string, len(words))
for i, w := range words {
quoted[i] = regexp.QuoteMeta(w)
}
sb.WriteByte('(')
for i, w := range quoted {
if i > 0 {
sb.WriteByte('\x20')
}
sb.WriteString(w)
}
sb.WriteString(`)|`)
for i, w := range quoted {
if i > 0 {
sb.WriteByte('|')
}
sb.WriteString(w)
}
return regexp.Compile(sb.String())
}
var words = regexp.MustCompile(`\pL+`)
func main() {
allWords := words.FindAllString("\tThis\v\x20\x20,\t\tis\t\t,?a!,¿delimited?,string‽", -1)
re, err := build(allWords...)
if err != nil {
panic(err)
}
fmt.Println(re)
}
Further reading:
https://pkg.go.dev/regexp/syntax
https://pkg.go.dev/regexp#QuoteMeta
https://pkg.go.dev/unicode#pkg-variables and https://pkg.go.dev/unicode#Categories
How do I get the correct number of fields when using csv.NewReader?
package main
import (
"encoding/csv"
"fmt"
"log"
"strings"
)
func main() {
parser := csv.NewReader(strings.NewReader(`||""FOO""||`))
parser.Comma = '|'
parser.LazyQuotes = true
record, err := parser.Read()
if err != nil {
log.Fatal(err)
}
fmt.Printf("record length: %v\n", len(record))
}
https://go.dev/play/p/gg-KYRciWFH
It should return 5, but instead I'm getting 3:
record length: 3
Program exited.
EDIT
I'm actually working with a big CSV file containing many double quotes.
After examining your code, I decided to modify it slightly and then print the results:
package main
import (
"encoding/csv"
"fmt"
"log"
"strings"
)
func main() {
parser := csv.NewReader(strings.NewReader(`x||""FOO""|x|x\n`))
parser.Comma = '|'
parser.LazyQuotes = true
record, err := parser.Read()
if err != nil {
log.Fatal(err)
}
fmt.Printf("record length: %v, Data: %v\n", len(record), strings.Join(record, ", "))
}
When you run this, the data is printed as x, , "FOO"||x|x\n". My thought is that when an entry ends with two double quotes, the parser assumes the field is still quoted and therefore lumps the rest of the line into the third entry. This looks like a bug in how lazy quoting works in the csv package. However, the documentation for LazyQuotes says:
If LazyQuotes is true, a quote may appear in an unquoted field and a non-doubled quote may appear in a quoted field.
This doesn't mention anything about finding double quotes within double quotes. To fix this, you should either remove the quotes altogether or replace the double double-quotes ("") with double quotes (").
One other thing you might consider would be using the gocsv package. I've worked with this package in the past and it's reasonably stable. I'm not sure how it would respond to this specific issue, but it might be worth your time checking it out.
Note:
The encoding/csv package implements the RFC 4180 standard. If you have such input, that's not an RFC 4180 compliant CSV file and encoding/csv will not parse it properly.
You're misusing the quotes. Quoting a single field FOO is like this:
parser := csv.NewReader(strings.NewReader(`||"FOO"||`))
If you want the field to have the "FOO" value, you have to use 2 double quotes in a quoted field, so it should be:
parser := csv.NewReader(strings.NewReader(`||"""FOO"""||`))
This will output 5. Try it on the Go Playground.
What you have is this:
parser := csv.NewReader(strings.NewReader(`||""FOO""||`))
Since the second " character is not followed by a separator character, the field is not interrupted and the rest is processed as the content of the quoted field (which will terminate at the end of the line).
If you print the record:
fmt.Println(record)
fmt.Printf("%#v", record)
Output will be (try it on the Go Playground):
[ "FOO"||]
[]string{"", "", "\"FOO\"||"}
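The three quoting variants can be compared side by side with a short sketch. It deliberately leaves LazyQuotes at its default (false), which is an assumption that differs from the question's code, so the malformed input is reported as an error instead of being merged into one field. The helper name fields is made up for the example.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// fields parses a single '|'-separated record with strict quoting.
func fields(input string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	r.Comma = '|'
	return r.Read()
}

func main() {
	inputs := []string{
		`||"FOO"||`,     // quoted field, value FOO
		`||"""FOO"""||`, // quoted field, value "FOO"
		`||""FOO""||`,   // malformed quoting
	}
	for _, in := range inputs {
		rec, err := fields(in)
		if err != nil {
			fmt.Printf("%s: error: %v\n", in, err)
			continue
		}
		fmt.Printf("%s: %d fields, third = %q\n", in, len(rec), rec[2])
	}
}
```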
Quotes are part of the CSV format.
The problem is how the quotes are escaped for encoding/csv; you can try something like this:
package main
import (
"encoding/csv"
"fmt"
"log"
"strings"
)
func main() {
parser := csv.NewReader(strings.NewReader(`||FOO||`))
parser.Comma = '|'
parser.LazyQuotes = true
record, err := parser.Read()
if err != nil {
log.Fatal(err)
}
fmt.Printf("record length: %v\n", len(record))
fmt.Println(strings.Join(record, " /SEP/ "))
}
or like this:
package main
import (
"encoding/csv"
"fmt"
"log"
"strings"
)
func main() {
parser := csv.NewReader(strings.NewReader(`||"""FOO"""||`))
parser.Comma = '|'
parser.LazyQuotes = true
record, err := parser.Read()
if err != nil {
log.Fatal(err)
}
fmt.Printf("record length: %v\n", len(record))
fmt.Println(strings.Join(record, " SEP "))
}
This is an example of a multiple-choice question. I want to extract the Chinese text, such as "英国、法国", "加拿大、墨西哥", "葡萄牙、加拿大", and "墨西哥、德国", from the content in the following Go code, but it does not work.
package main
import (
"fmt"
"regexp"
"testing"
)
func TestRegex(t *testing.T) {
text := `( B )38.目前,亚马逊美国站后台,除了有美国站点外,还有( )站点。
A.英国、法国B.加拿大、墨西哥
C.葡萄牙、加拿大D.墨西哥、德国
`
fmt.Printf("%q\n", regexp.MustCompile(`[A-E]\.(\S+)?`).FindAllStringSubmatch(text, -1))
fmt.Printf("%q\n", regexp.MustCompile(`[A-E]\.`).Split(text, -1))
}
text:
( B )38.目前,亚马逊美国站后台,除了有美国站点外,还有( )站点。
A.英国、法国B.加拿大、墨西哥
C.葡萄牙、加拿大D.墨西哥、德国
pattern: [A-E]\.(\S+)?
Actual result: [["A.英国、法国B.加拿大、墨西哥" "英国、法国B.加拿大、墨西哥"] ["C.葡萄牙、加拿大D.墨西哥、德国" "葡萄牙、加拿大D.墨西哥、德国"]].
Expect result: [["A.英国、法国" "英国、法国"] ["B.加拿大、墨西哥" "加拿大、墨西哥"] ["C.葡萄牙、加拿大" "葡萄牙、加拿大"] ["D.墨西哥、德国" "墨西哥、德国"]]
I think it might be a greedy-matching problem, because my code reads option A and option B as a single option.
Non-greedy matching won't solve this. You would need a positive lookahead, which RE2 (the syntax Go's regexp package follows) doesn't support.
As a workaround, you can search for the labels only and extract the text in between manually:
re := regexp.MustCompile(`[A-E]\.`)
res := re.FindAllStringIndex(text, -1)
results := make([][]string, len(res))
for i, m := range res {
if i < len(res)-1 {
results[i] = []string{text[m[0]:m[1]], text[m[1]:res[i+1][0]]}
} else {
results[i] = []string{text[m[0]:m[1]], text[m[1]:]}
}
}
fmt.Printf("%q\n", results)
This should print:
[["A." "英国、法国"] ["B." "加拿大、墨西哥\n"] ["C." "葡萄牙、加拿大"] ["D." "墨西哥、德国\n"]]
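Another possible workaround, sketched under the assumption that the option text itself never contains the letters A to E or whitespace (true for this sample, but not in general), is to exclude those characters from the captured group so that each match stops at the next label. The name options is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// optionRe matches a label such as "A." followed by a run of characters
// that are neither A-E nor whitespace.
var optionRe = regexp.MustCompile(`[A-E]\.([^A-E\s]+)`)

// options returns each full match together with the captured option text.
func options(text string) [][]string {
	return optionRe.FindAllStringSubmatch(text, -1)
}

func main() {
	text := `( B )38.目前,亚马逊美国站后台,除了有美国站点外,还有( )站点。
A.英国、法国B.加拿大、墨西哥
C.葡萄牙、加拿大D.墨西哥、德国
`
	fmt.Printf("%q\n", options(text))
}
```

Unlike the index-based version, this does not include trailing newlines in the captured text, but it breaks as soon as an option contains one of the excluded characters.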
What is Go's equivalent of the Python commands below?
import argparse
parser = argparse.ArgumentParser(description="something")
parser.add_argument("-getList1",nargs='*',help="get 0 or more values")
parser.add_argument("-getList2",nargs='+',help="get 1 or more values")
I have seen that the flag package allows argument parsing in Golang.
But it seems to support only String, Int or Bool.
How can I pass a list of values to a flag in this format:
go run myCode.go -getList1 value1 value2
You can define your own flag.Value and use flag.Var() for binding it.
The example is here.
Then you can pass the flag multiple times, as follows:
go run your_file.go --list1 value1 --list1 value2
UPD: including the code snippet right here, just in case.
package main
import "flag"
type arrayFlags []string
func (i *arrayFlags) String() string {
return "my string representation"
}
func (i *arrayFlags) Set(value string) error {
*i = append(*i, value)
return nil
}
var myFlags arrayFlags
func main() {
flag.Var(&myFlags, "list1", "Some description for this param.")
flag.Parse()
}
You can at least have a list of arguments at the end of your command by using the flag.Args() function.
package main
import (
"flag"
"fmt"
)
var one string
func main() {
flag.StringVar(&one, "o", "default", "arg one")
flag.Parse()
tail := flag.Args()
fmt.Printf("Tail: %+q\n", tail)
}
Running my-go-app -o 1 this is the rest will print Tail: ["this" "is" "the" "rest"]
Use flag.String() to get the entire list of values for the argument you need and then split it up into individual items with strings.Split().
If you have a series of integer values at the end of the command line, this helper function will properly convert them and place them in a slice of ints:
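package main

import (
	"flag"
	"fmt"
	"log"
	"strconv"
)

// GetIntSlice converts each string to an int. It returns the first
// conversion error instead of silently storing a zero.
func GetIntSlice(args []string) ([]int, error) {
	ret := make([]int, 0, len(args))
	for _, str := range args {
		n, err := strconv.Atoi(str)
		if err != nil {
			return nil, err
		}
		ret = append(ret, n)
	}
	return ret, nil
}

func main() {
	flag.Parse()
	tail := flag.Args()
	fmt.Printf("Tail: %T, %+v\n", tail, tail)
	intSlice, err := GetIntSlice(tail)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("intSlice: %T, %+v\n", intSlice, intSlice)
}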
mac:demoProject sx$ go run demo2.go 1 2 3 4
Tail: []string, [1 2 3 4]
intSlice: []int, [1 2 3 4]