Split string on space (but not all..?) - go

I have a mock function that takes a single string argument containing multiple words, e.g. "hello1 hello2 hello3 hello4 hello5 hello6 hello7".
The function first checks whether the word content is in the string it received and, if so, goes on to split the string into arguments. If there is no match, it does something else.
My dosomething function handles this data and expects 5 arguments.
My question is: how can I split the string on spaces, but so that everything after hello5 becomes part of arg5 below?
I have no way of knowing in advance how many words will come in with mystring, so concatenating a fixed number of arguments will not work; it needs to be dynamic (that is my assumption).
I hope this makes sense.
func testing(mystring string) {
	matched, err := regexp.MatchString(`content`, mystring)
	if err != nil {
		panic(err)
	}
	if matched {
		// Collect every run of non-whitespace characters.
		r := regexp.MustCompile("[^\\s]+")
		arguments := r.FindAllString(mystring, -1)
		// Indexing starts at 1, so arguments[0] is skipped;
		// this panics if fewer than six words are present.
		arg1 := arguments[1]
		arg2 := arguments[2]
		arg3 := arguments[3]
		arg4 := arguments[4]
		arg5 := arguments[5]
		dosomething(arg1, arg2, arg3, arg4, arg5)
	} else {
		log.Println("Not matched")
	}
}

strings.SplitN does exactly what you want.
Here is a small demo:
package main
import (
"fmt"
"strings"
)
func main() {
fmt.Printf("%q\n", strings.SplitN("a b c d e f g", " ", 5))
}
Output:
["a" "b" "c" "d" "e f g"]
I also suggest adding an additional if statement to check that strings.SplitN returns a slice of the correct length.
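The length check suggested above could look roughly like the sketch below. The helper name splitArgs and its error message are made up for illustration. Note that strings.SplitN splits on every single space, so consecutive spaces in the input would produce empty fields.

```go
package main

import (
	"fmt"
	"strings"
)

// splitArgs is a hypothetical helper: it splits s on single spaces into
// exactly n fields, with the remainder of the string kept in the last field.
func splitArgs(s string, n int) ([]string, error) {
	parts := strings.SplitN(s, " ", n)
	if len(parts) != n {
		return nil, fmt.Errorf("expected %d arguments, got %d", n, len(parts))
	}
	return parts, nil
}

func main() {
	args, err := splitArgs("hello1 hello2 hello3 hello4 hello5 hello6 hello7", 5)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%q\n", args) // the fifth element is "hello5 hello6 hello7"

	// Too few words: the helper reports an error instead of panicking later.
	_, err = splitArgs("too few", 5)
	fmt.Println(err)
}
```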

If you are sure you're searching for a word and not a regex pattern, you can use this:
exists := strings.Index(str, "content")
This returns the index of the first occurrence of content, or -1 if it is not found. If there is a match, you can slice the string up to that index.
Here's a sample on playground to help you:
https://play.golang.org/p/QT39T6hStul
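As a rough illustration of the strings.Index approach, the sketch below uses a hypothetical helper, splitAtWord, that returns the text before the match and the text starting at the match. The helper and the sample input are assumptions, not taken from the linked playground.

```go
package main

import (
	"fmt"
	"strings"
)

// splitAtWord is a hypothetical helper. It reports whether word occurs in s
// and, if so, returns the part of s before the match and the part from the
// match onwards.
func splitAtWord(s, word string) (before, rest string, found bool) {
	idx := strings.Index(s, word)
	if idx < 0 {
		return s, "", false
	}
	return s[:idx], s[idx:], true
}

func main() {
	before, rest, found := splitAtWord("hello1 content hello2", "content")
	fmt.Printf("%q %q %v\n", before, rest, found)
}
```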

Related

How can I clean the text for search using RegEx

I can use the code below to check whether the text str contains either or both of the keys, i.e. whether it contains "MS", "dynamics", or both:
package main
import (
"fmt"
"regexp"
)
func main() {
keys := []string{"MS", "dynamics"}
keysReg := fmt.Sprintf("(%s %s)|%s|%s", keys[0], keys[1], keys[0], keys[1]) // => "(MS dynamics)|MS|dynamics"
fmt.Println(keysReg)
str := "What is MS dynamics, is it a product from MS?"
re := regexp.MustCompile(`(?i)` + keysReg)
matches := re.FindAllString(str, -1)
fmt.Println("We found", len(matches), "matches, that are:", matches)
}
I want the user to enter a phrase, which I then clean of unwanted words and characters before doing the search as above.
Let's say the user input was This,is,a,delimited,string and I need to build the keys pattern dynamically as (delimited string)|delimited|string so that I can search my variable str for all the matches. So I wrote the below:
s := "This,is,a,delimited,string"
t := regexp.MustCompile(`(?i),|\.|this|is|a`) // backticks are used here to contain the expression, (?i) for case insensitive
v := t.Split(s, -1)
fmt.Println(len(v))
fmt.Println(v)
But I got the output as:
8
[ delimited string]
What is wrong with my cleaning of the input text? I'm expecting the output to be:
2
[delimited string]
Here is my playground
To quote the famous quip from Jamie Zawinski,
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
Two things:
Instead of trying to weed out garbage from the string ("cleaning" it), extract complete words from it instead.
Unicode is a complicated matter; so even after you have succeeded in extracting words, you have to make sure they are properly "escaped", so that they contain no characters that might be interpreted as RE syntax, before building a regexp from them.
package main
import (
"errors"
"fmt"
"regexp"
"strings"
)
func build(words ...string) (*regexp.Regexp, error) {
var sb strings.Builder
switch len(words) {
case 0:
return nil, errors.New("empty input")
case 1:
return regexp.Compile(regexp.QuoteMeta(words[0]))
}
quoted := make([]string, len(words))
for i, w := range words {
quoted[i] = regexp.QuoteMeta(w)
}
sb.WriteByte('(')
for i, w := range quoted {
if i > 0 {
sb.WriteByte('\x20')
}
sb.WriteString(w)
}
sb.WriteString(`)|`)
for i, w := range quoted {
if i > 0 {
sb.WriteByte('|')
}
sb.WriteString(w)
}
return regexp.Compile(sb.String())
}
var words = regexp.MustCompile(`\pL+`)
func main() {
allWords := words.FindAllString("\tThis\v\x20\x20,\t\tis\t\t,?a!,¿delimited?,string‽", -1)
re, err := build(allWords...)
if err != nil {
panic(err)
}
fmt.Println(re)
}
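The program above keeps every word it finds, including This, is and a. If those should be dropped, as in the question's expected output, one possible sketch is to filter the extracted words against a stop-word list. The stopWords set and the keywords helper below are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wordRE matches runs of Unicode letters, as in the answer above.
var wordRE = regexp.MustCompile(`\pL+`)

// stopWords is a hypothetical list of words to ignore; adjust as needed.
var stopWords = map[string]bool{"this": true, "is": true, "a": true}

// keywords extracts whole words from input and drops the stop words,
// comparing case-insensitively.
func keywords(input string) []string {
	var out []string
	for _, w := range wordRE.FindAllString(input, -1) {
		if !stopWords[strings.ToLower(w)] {
			out = append(out, w)
		}
	}
	return out
}

func main() {
	v := keywords("This,is,a,delimited,string")
	fmt.Println(len(v)) // 2
	fmt.Println(v)      // [delimited string]
}
```

Because whole words are extracted rather than split on, the is inside This never produces an empty element.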
Further reading:
https://pkg.go.dev/regexp/syntax
https://pkg.go.dev/regexp#QuoteMeta
https://pkg.go.dev/unicode#pkg-variables and https://pkg.go.dev/unicode#Categories

How to convert the string representation of a Terraform set of strings to a slice of strings

I have a terratest where I get an output from terraform like so: s := "[a b]". The terraform output's value = toset([resource.name]), so it's a set of strings.
fmt.Printf("%T", s) reports string, but I need to iterate over the elements to perform further validation.
I tried the approach below, but it returns an error:
var v interface{}
if err := json.Unmarshal([]byte(s), &v); err != nil {
fmt.Println(err)
}
My current implementation to convert to a slice is:
s := "[a b]"
s1 := strings.Fields(strings.Trim(s, "[]"))
for _, v:= range s1 {
fmt.Println("v -> " + v)
}
Looking for suggestions to current approach or alternative ways to convert to arr/slice that I should be considering. Appreciate any inputs. Thanks.
Actually your current implementation seems just fine.
You can't use JSON unmarshaling because JSON strings must be enclosed in double quotes ".
strings.Fields, on the other hand, does exactly that: it splits a string around runs of one or more characters for which unicode.IsSpace returns true, which include \t, \n, \v, \f, \r and ' ' (space).
Moreover, this also works if terraform sends an empty set as [], as stated in the documentation:
returning [...] an empty slice if s contains only white space.
...which includes the case of s being empty "" altogether.
In case you need additional control over this, you can use strings.FieldsFunc, which accepts a function of type func(rune) bool so you can determine yourself what constitutes a "space". But since your input string comes from terraform, I guess it's going to be well-behaved enough.
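A minimal sketch of the strings.FieldsFunc variant follows. It assumes you want to treat the brackets themselves as separators, so no separate Trim call is needed. The helper name parseSet is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// parseSet is a hypothetical helper that splits the string form of a
// Terraform set, such as "[a b]", treating whitespace and the enclosing
// brackets as separators.
func parseSet(s string) []string {
	return strings.FieldsFunc(s, func(r rune) bool {
		return unicode.IsSpace(r) || r == '[' || r == ']'
	})
}

func main() {
	fmt.Printf("%q\n", parseSet("[a b]")) // ["a" "b"]
	fmt.Printf("%q\n", parseSet("[]"))    // []
}
```

Note that this would also split on brackets inside an element, so it only suits values that never contain them.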
There may be third-party packages that already implement this functionality, but unless your program already imports them, I think the native solution based on the standard lib is always preferable.
unicode.IsSpace actually also covers the higher runes 0x85 and 0xA0; when the input contains non-ASCII characters, strings.Fields falls back to FieldsFunc(s, unicode.IsSpace).
package main
import (
"fmt"
"strings"
)
func main() {
src := "[a b]"
dst := strings.Split(src[1:len(src)-1], " ")
fmt.Println(dst)
}
https://play.golang.org/p/KVY4r_8RWv6

How do Print and Printf differ from each other in Go?

I am new to Go and am learning the basic syntax and functions. I am confused about the difference between the Print and Printf functions. Their output looks similar, so what is the difference between them?
package main
import (
"fmt"
"bufio"
"os"
)
func main(){
reader := bufio.NewReader(os.Stdin)
fmt.Print("Enter Text: ")
str, _ := reader.ReadString('\n')
fmt.Printf(str)
fmt.Print(str)
}
I read https://golang.org/pkg/fmt/#Print to understand, but I did not understand it.
From the docs about Printing:
For each Printf-like function, there is also a Print function that takes no format and is equivalent to saying %v for every operand. Another variant Println inserts blanks between operands and appends a newline.
So Printf takes a format string, which lets you tell the fmt package how to format your values and embed them in a string alongside other text, whereas Print just outputs the values in their default format. Generally you'd prefer to use fmt.Printf, unless you're just debugging and want a quick output of some variables.
In your example you're sending the string you want to print as the format string by mistake, which works as long as the string contains no % characters, but is not the intended use. If you just want to print one variable in its default format it's fine to use Print.
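To make the distinction concrete, here is a small sketch of the two safe ways to print a string that came from user input. The render helper and the sample text are illustrative assumptions. The input deliberately contains a % character, which Printf would try to interpret as a verb if the text were passed as the format string itself.

```go
package main

import "fmt"

// render is a hypothetical helper that returns the same text produced two
// ways: with Sprint (default format) and with Sprintf plus an explicit %s verb.
func render(input string) (plain, formatted string) {
	plain = fmt.Sprint(input)
	formatted = fmt.Sprintf("%s", input)
	return plain, formatted
}

func main() {
	str := "100% done\n"

	// Print writes the value as it is.
	fmt.Print(str)

	// Printf writes it through a constant format string; the input is an
	// operand, so its % character is not treated as a verb.
	fmt.Printf("%s", str)

	// Printf is most useful when mixing values into surrounding text.
	fmt.Printf("read %d bytes: %q\n", len(str), str)
}
```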
Printf accepts a format string in which codes such as "%s" and "%d" mark the insertion points for values. Those values are then passed as additional arguments.
Example:
package main
import (
"fmt"
)
var(
a = 654
b = false
c = 2.651
d = 4 + 1i
e = "Australia"
f = 15.2 * 4525.321
)
func main(){
fmt.Printf("d for Integer: %d\n", a)
fmt.Printf("6d for Integer: %6d\n", a)
fmt.Printf("t for Boolean: %t\n", b)
fmt.Printf("g for Float: %g\n", c)
fmt.Printf("e for Scientific Notation: %e\n", f)
fmt.Printf("E for Scientific Notation: %E\n", f)
fmt.Printf("s for String: %s\n", e)
fmt.Printf("G for Complex: %G\n", d)
fmt.Printf("15s String: %15s\n", e)
fmt.Printf("-10s String: %-10s\n",e)
t:= fmt.Sprintf("Print from right: %[3]d %[2]d %[1]d\n", 11, 22, 33)
fmt.Println(t)
}
As per docs
Print: prints its operands (numbers included) in their default format, and does not add a line break at the end.
Printf: takes a format string as its first argument, so a number cannot be passed on its own, and does not add a line break at the end.
Printf is for printing formatted strings, which can make the printing code more readable.
For more detail visit this tutorial.

Remove lines containing certain substring in Golang

How do I remove lines that contain a certain substring from a []byte? In Ruby I usually do something like this:
lines = lines.split("\n").reject{|r| r.include? 'substring'}.join("\n")
How can I do this in Go?
You could emulate that with regexp:
re := regexp.MustCompile("(?m)[\r\n]+^.*substring.*$")
res := re.ReplaceAllString(s, "")
(The OP Kokizzu went with "(?m)^.*" +substr+ ".*$[\r\n]+")
See this example
func main() {
s := `aaaaa
bbbb
cc substring ddd
eeee
ffff`
re := regexp.MustCompile("(?m)[\r\n]+^.*substring.*$")
res := re.ReplaceAllString(s, "")
fmt.Println(res)
}
output:
aaaaa
bbbb
eeee
ffff
Note the use of regexp flag (?m):
multi-line mode: ^ and $ match begin/end line in addition to begin/end text (default false)
I believe using the bytes package for this task is better than using regexp.
package main
import (
"fmt"
"bytes"
)
func main() {
myString := []byte("aaaa\nsubstring\nbbbb")
lines := bytes.Replace(myString, []byte("substring\n"), []byte(""), 1)
fmt.Println(string(lines)) // Convert the bytes to string for printing
}
Output:
aaaa
bbbb
Try it here.
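The bytes.Replace call above removes only the exact text "substring\n". To drop every line that merely contains the substring, closer to the Ruby one-liner in the question, a sketch with bytes.Split and bytes.Contains might look like this. The helper name rejectLines is made up for illustration.

```go
package main

import (
	"bytes"
	"fmt"
)

// rejectLines is a hypothetical helper that returns data without the lines
// that contain substr. Lines are assumed to be separated by "\n".
func rejectLines(data, substr []byte) []byte {
	var kept [][]byte
	for _, line := range bytes.Split(data, []byte("\n")) {
		if !bytes.Contains(line, substr) {
			kept = append(kept, line)
		}
	}
	return bytes.Join(kept, []byte("\n"))
}

func main() {
	data := []byte("aaaa\nxx substring yy\nbbbb")
	fmt.Println(string(rejectLines(data, []byte("substring"))))
}
```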
The question title does not have the same meaning as the way the original question was worded. The Regex provided as the accepted solution did not solve for the use case I had of removing a whole line when finding a matching substring, like the question title indicates.
In order to remove lines that contain certain substrings in Go (title of the question), you could implement something in Go very similar to the Ruby code that Kokizzu wrote initially.
func removeLinesContainingAny(input string, toRemove []string) string {
	lines := strings.Split(input, "\n")
	kept := make([]string, 0, len(lines))
	for _, line := range lines {
		remove := false
		for _, rm := range toRemove {
			if strings.Contains(line, rm) {
				remove = true
				break
			}
		}
		if !remove {
			kept = append(kept, line)
		}
	}
	// Normalize the result so that it ends with exactly one newline.
	return strings.TrimSpace(strings.Join(kept, "\n")) + "\n"
}
See test cases here: https://go.dev/play/p/K-biYIO1kjk
In particular, the function collects the lines to keep in a new slice instead of deleting from lines while ranging over it; deleting in place can skip the line right after a match or panic with slice bounds out of range in certain cases.
The accepted solution does not work when the line to remove is the first line:
func main() {
s := `aaaaa substring
bbbb
cc substring ddd
eeee
ffff`
re := regexp.MustCompile("(?m)[\r\n]+^.*substring.*$")
res := re.ReplaceAllString(s, "")
fmt.Println(res)
}
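One way to cover the first-line case is to match the line together with the line break that follows it, rather than the one that precedes it. The sketch below is an assumption about how that could be written, not the accepted answer's pattern. The helper name removeLines is hypothetical, and regexp.QuoteMeta is used so that the substring is matched literally.

```go
package main

import (
	"fmt"
	"regexp"
)

// removeLines is a hypothetical helper that deletes every line containing
// substr, including its trailing line break if there is one.
func removeLines(s, substr string) string {
	re := regexp.MustCompile(`(?m)^.*` + regexp.QuoteMeta(substr) + `.*(?:\r?\n)?`)
	return re.ReplaceAllString(s, "")
}

func main() {
	s := "aaaaa substring\nbbbb\ncc substring ddd\neeee\nffff"
	fmt.Println(removeLines(s, "substring"))
}
```

If the last line of the input matches, the line break before it is left in place, so the result may end with a newline.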

go - print without space between items

fmt.Println("a","b")
I want to print the two strings without space padding, namely "ab", but the above will print "a b".
Go fmt
Do I just switch to using Printf ?
fmt.Printf("%s%s\n","a","b")
Plain old print will work if you make the last element "\n".
It will also be easier to read if you aren't used to printf style formatting.
See here on play
fmt.Println("a","b")
fmt.Print("a","b","\n")
fmt.Printf("%s%s\n","a","b")
will print:
a b
ab
ab
As stated in the docs:
Println formats using the default formats for its operands and writes
to standard output. Spaces are always added between operands and a
newline is appended. It returns the number of bytes written and any
write error encountered.
So you either need to do what you already said or you can concatenate the strings before printing:
fmt.Println("a"+"b")
Depending on your use case, you can use strings.Join(myStrings, "") for that purpose.
Println relies on doPrint(args, true, true), where the second argument is addspace and the third is addnewline. So Println with multiple arguments will always print a space.
It seems there is no call of doPrint(args, false, true) which is what you want.
Printf may be a solution, Print also but you should add a newline.
You'd have to benchmark to compare performance, but I'd rather use the following than a Printf:
fmt.Println(strings.Join([]string{"a", "b"}, ""))
Remember to import "strings", and see strings.Join documentation for an explanation.
The solution in my project:
package main
import "fmt"
var formatMap = map[int]string{
0: "",
1: "%v",
}
func Println(v ...interface{}) {
l := len(v)
if s, isOk := formatMap[l]; !isOk {
for i := 0; i < len(v); i++ {
s += "%v"
}
formatMap[l] = s
}
s := formatMap[l] + "\n"
fmt.Printf(s, v...)
}
func main() {
Println()
Println("a", "b")
Println("a", "b")
Println("a", "b", "c", 1)
}

Resources