conditional split of a string - go

I want to split a string of type 'a/b/c' and allow escaping by '\'.
For example:
'foo/bar\/2.2/baz':
a=foo
b=bar/2.2
c=baz
Is there any elegant way to split by '/', ignoring '\/'?

You have two basic approaches available to you, regardless of the language you're using.
Search for all occurrences of / that are not immediately preceded by \ and perform a split.
Replace all instances of \/ with some unique symbol that does not contain /, then split on /, and replace the unique symbol with / in each resulting part (or with \/ again if the escape should be kept).
From a computational standpoint, the former will be more efficient.
From a coding complexity standpoint, the latter will likely be easier to write.
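A minimal sketch of the second approach, assuming the input never contains a NUL byte. The function name splitEscaped and the placeholder value are illustrative choices, not part of the original answer:

```go
package main

import (
	"fmt"
	"strings"
)

// splitEscaped splits s on "/" while treating `\/` as a literal slash.
// The placeholder is an assumption: it must not occur anywhere in the input.
func splitEscaped(s string) []string {
	const placeholder = "\x00"
	// Hide escaped slashes so that Split cannot see them.
	masked := strings.ReplaceAll(s, `\/`, placeholder)
	parts := strings.Split(masked, "/")
	for i, p := range parts {
		// Restore each hidden slash, dropping the escape character.
		parts[i] = strings.ReplaceAll(p, placeholder, "/")
	}
	return parts
}

func main() {
	fmt.Printf("%q\n", splitEscaped(`foo/bar\/2.2/baz`)) // ["foo" "bar/2.2" "baz"]
}
```

This sketch does not treat `\\` as an escaped backslash, so an input such as `a\\/b` is still read as an escaped slash. If that matters, a character-by-character scan is the safer choice.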

You can use something along these lines:
func split(str string) []string {
	var parts []string
	var current bytes.Buffer
	escaped := false
	for _, r := range str {
		if r == '\\' && !escaped {
			escaped = true
		} else if r == '/' && !escaped {
			parts = append(parts, current.String())
			current.Reset()
		} else {
			escaped = false
			current.WriteRune(r)
		}
	}
	parts = append(parts, current.String())
	return parts
}
Go Playground link: https://play.golang.org/p/RwLwFlsAW2Q


Golang strings.EqualFold gives unexpected results

In golang (go1.17 windows/amd64) the program below gives the following result:
rune1 = U+0130 'İ'
rune2 = U+0131 'ı'
lower(rune1) = U+0069 'i'
upper(rune2) = U+0049 'I'
strings.EqualFold(İ, ı) = false
strings.EqualFold(i, I) = true
I thought that strings.EqualFold would check strings for equality under Unicode case folding; however, the above example seems to give a counter-example. Clearly both runes can be folded (by hand) into code points that are equal under case folding.
Question: is golang correct that strings.EqualFold(İ, ı) is false? I expected it to yield true. And if golang is correct, why would that be? Or is this behaviour according to some Unicode specification?
What am I missing here?
Program:
func TestRune2(t *testing.T) {
	r1 := rune(0x0130) // U+0130 'İ'
	r2 := rune(0x0131) // U+0131 'ı'
	r1u := unicode.ToLower(r1)
	r2u := unicode.ToUpper(r2)
	t.Logf("\nrune1 = %#U\nrune2 = %#U\nlower(rune1) = %#U\nupper(rune2) = %#U\nstrings.EqualFold(%s, %s) = %v\nstrings.EqualFold(%s, %s) = %v",
		r1, r2, r1u, r2u, string(r1), string(r2), strings.EqualFold(string(r1), string(r2)), string(r1u), string(r2u), strings.EqualFold(string(r1u), string(r2u)))
}
Yes, this is "correct" behaviour. These letters do not behave normally under case folding. See:
http://www.unicode.org/Public/UCD/latest/ucd/CaseFolding.txt
U+0130 has a full case folding "F" and a special "T" mapping:
T: special case for uppercase I and dotted uppercase I
- For non-Turkic languages, this mapping is normally not used.
- For Turkic languages (tr, az), this mapping can be used instead
of the normal mapping for these characters.
Note that the Turkic mappings do not maintain canonical equivalence
without additional processing.
See the discussions of case mapping in the Unicode Standard for more information.
I think there is no way to force package strings to use the tr or az mapping.
From the strings.EqualFold source - unicode.ToLower and unicode.ToUpper are not used.
Instead, it uses unicode.SimpleFold to see if a particular rune is "foldable" and therefore potentially comparable:
// General case. SimpleFold(x) returns the next equivalent rune > x
// or wraps around to smaller values.
r := unicode.SimpleFold(sr)
for r != sr && r < tr {
	r = unicode.SimpleFold(r)
}
The rune İ is not foldable, but its lowercase code point is:
r := rune(0x0130) // U+0130 'İ'
lr := unicode.ToLower(r) // U+0069 'i'
fmt.Printf("foldable? %v\n", r != unicode.SimpleFold(r)) // foldable? false
fmt.Printf("foldable? %v\n", lr != unicode.SimpleFold(lr)) // foldable? true
If a rune is not foldable (i.e. SimpleFold returns the rune itself), then that rune can only match itself and no other code point.
https://play.golang.org/p/105x0I714nS
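To see which runes a given rune can match under this scheme, one can walk its SimpleFold cycle until it wraps around. The following is only a sketch; foldOrbit is a hypothetical helper name and is not part of the standard library:

```go
package main

import (
	"fmt"
	"unicode"
)

// foldOrbit returns every rune that unicode.SimpleFold considers
// equivalent to r, starting with r itself.
func foldOrbit(r rune) []rune {
	orbit := []rune{r}
	for next := unicode.SimpleFold(r); next != r; next = unicode.SimpleFold(next) {
		orbit = append(orbit, next)
	}
	return orbit
}

func main() {
	// 'k' cycles through 'k', the Kelvin sign U+212A and 'K';
	// U+0130 only ever reaches itself.
	for _, r := range []rune{'i', 'k', 0x0130} {
		fmt.Printf("%#U -> %q\n", r, foldOrbit(r))
	}
}
```

A rune whose orbit has a single element can only be equal to itself under strings.EqualFold, which is why İ does not match i, I or ı.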

How to rewrite wiki links in Golang?

Been stuck trying to rewrite some text in Golang: http://play.golang.org/p/0hoXx7qA0b5
How do I match several [[]] links in a text string?
log.Printf("match: %+v", match) doesn't show the group matches clearly. Am I missing something that would help me work with matches, so I know whether it's a link with a title or not?
Is there a better approach than using regexp?
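It seems that (.*) is a greedy match, so you should limit what each group can match; excluding ] as well as | keeps a match from running into the next link on the same line. Based on your sample input, the |about part is optional.
var re = regexp.MustCompile(`\[\[([^\]|]+)(?:\|([^\]]+))?\]\]`)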
func relink(input string) string {
	var reform []string
	for _, match := range re.FindAllStringSubmatch(input, -1) {
		name, short := match[1], match[2]
		if short == "" {
			short = strings.ToLower(name)
		}
		reform = append(reform, fmt.Sprintf("[%s](%s)", name, short))
	}
	return strings.Join(reform, "\n")
}
Playground
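If the goal is to rewrite the links in place and keep the surrounding text, regexp's ReplaceAllStringFunc is one option. This is a sketch under assumptions: the Markdown output format follows the answer above, while the names wikiLink and rewriteLinks and the sample input are made up for illustration:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wikiLink matches [[Name]] and [[Name|target]]. Excluding "]" and "|" from
// the groups keeps a match from running into the next link on the line.
var wikiLink = regexp.MustCompile(`\[\[([^\]|]+)(?:\|([^\]]+))?\]\]`)

// rewriteLinks replaces every wiki link in text with a Markdown link and
// leaves the surrounding text untouched.
func rewriteLinks(text string) string {
	return wikiLink.ReplaceAllStringFunc(text, func(link string) string {
		m := wikiLink.FindStringSubmatch(link)
		name, target := m[1], m[2]
		if target == "" {
			// No explicit target: fall back to the lower-cased name.
			target = strings.ToLower(name)
		}
		return fmt.Sprintf("[%s](%s)", name, target)
	})
}

func main() {
	fmt.Println(rewriteLinks("See [[Home]] and [[About us|about]]."))
	// See [Home](home) and [About us](about).
}
```

An empty second submatch means the link had no title part, which answers the "link with a title or not" question without inspecting the log output.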

How to remove quotes from around a string in Golang

I have a string in Golang that is surrounded by quote marks. My goal is to remove all quote marks on the sides, but to ignore all quote marks in the interior of the string. How should I go about doing this? My instinct tells me to use a RemoveAt function like in C#, but I don't see anything like that in Go.
For instance:
"hello""world"
should be converted to:
hello""world
For further clarification, this:
"""hello"""
would become this:
""hello""
because the outer ones should be removed ONLY.
Use a slice expression:
s = s[1 : len(s)-1]
If there's a possibility that the quotes are not present, then use this:
if len(s) > 0 && s[0] == '"' {
	s = s[1:]
}
if len(s) > 0 && s[len(s)-1] == '"' {
	s = s[:len(s)-1]
}
playground example
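strings.Trim() removes every leading and trailing character that appears in the given cutset, and it leaves quotes in the interior of the string untouched. Note that it strips all of the outer quotes, not just one on each side, so """hello""" becomes hello rather than ""hello"".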
// strings.Trim() will remove all the occurrences from the left and right
s := `"""hello"""`
fmt.Println("Before Trim: " + s) // Before Trim: """hello"""
fmt.Println("After Trim: " + strings.Trim(s, "\"")) // After Trim: hello
// strings.Trim() will not remove any occurrences from inside the actual string
s2 := `""Hello" " " "World""`
fmt.Println("\nBefore Trim: " + s2) // Before Trim: ""Hello" " " "World""
fmt.Println("After Trim: " + strings.Trim(s2, "\"")) // After Trim: Hello" " " "World
Playground link - https://go.dev/play/p/yLdrWH-1jCE
Use slice expressions. You should write robust code that provides correct output for imperfect input. For example,
package main
import "fmt"
func trimQuotes(s string) string {
	if len(s) >= 2 {
		if s[0] == '"' && s[len(s)-1] == '"' {
			return s[1 : len(s)-1]
		}
	}
	return s
}
func main() {
	tests := []string{
		`"hello""world"`,
		`"""hello"""`,
		`"`,
		`""`,
		`"""`,
		`goodbye"`,
		`"goodbye"`,
		`goodbye"`,
		`good"bye`,
	}
	for _, test := range tests {
		fmt.Printf("`%s` -> `%s`\n", test, trimQuotes(test))
	}
}
Output:
`"hello""world"` -> `hello""world`
`"""hello"""` -> `""hello""`
`"` -> `"`
`""` -> ``
`"""` -> `"`
`goodbye"` -> `goodbye"`
`"goodbye"` -> `goodbye`
`goodbye"` -> `goodbye"`
`good"bye` -> `good"bye`
You can take advantage of slices to remove the first and last element of the slice.
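package main
import "fmt"
func main() {
	str := `"hello""world"`
	// Guard both index expressions so that an empty string or a lone quote does not panic.
	if len(str) > 0 && str[0] == '"' {
		str = str[1:]
	}
	if i := len(str) - 1; i >= 0 && str[i] == '"' {
		str = str[:i]
	}
	fmt.Println(str)
}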
Since a slice shares the underlying memory, this does not copy the string. It just changes the str slice to start one character over, and end one character sooner.
This is how the various bytes.Trim functions work.
A one-liner using regular expressions...
quoted = regexp.MustCompile(`^"(.*)"$`).ReplaceAllString(quoted,`$1`)
But it doesn't necessarily handle escaped quotes the way you might want.

When to use leading underscore in variable names in Go

Is there any special purpose of leading _ in a variable's name?
Example:
func (_m *MockTracker)...
from here.
There is no special meaning defined for a leading underscore in an identifier name in the spec:
Identifiers
Identifiers name program entities such as variables and types. An
identifier is a sequence of one or more letters and digits. The first
character in an identifier must be a letter.
identifier = letter { letter | unicode_digit } .
a
_x9
ThisVariableIsExported
αβ
Your sample is generated code from mockgen.go.
In the package you linked you'll see things like:
// Recorder for MockTracker (not exported)
type _MockTrackerRecorder struct {
	mock *MockTracker
}
The sanitize function in the mockgen package prepends an underscore to package names, and it seems that the underscore is otherwise used for consistency and to ensure that identifier names remain private (i.e. not exported, because only identifiers that start with an upper-case letter are exported). But it's not something that is defined in the Go spec.
// sanitize cleans up a string to make a suitable package name.
func sanitize(s string) string {
	t := ""
	for _, r := range s {
		if t == "" {
			if unicode.IsLetter(r) || r == '_' {
				t += string(r)
				continue
			}
		} else {
			if unicode.IsLetter(r) || unicode.IsDigit(r) || r == '_' {
				t += string(r)
				continue
			}
		}
		t += "_"
	}
	if t == "_" {
		t = "x"
	}
	return t
}
It seems that there is nothing regarding a leading _ in a variable name in the naming conventions.
From here: Effective Go
Another use case is for unexported global variables. It's a convention that many Go developers follow, and it is explained in this section of the Uber style guide.
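As a rough illustration of that convention, the sketch below uses made-up names (_defaultPort, _requestCount, handle); none of them come from the linked packages:

```go
package main

import "fmt"

// Hypothetical package-level names, prefixed with "_" in the style the
// Uber guide describes for unexported globals.
const _defaultPort = 8080

var _requestCount int

// handle uses _m as a parameter name; the compiler treats it like any
// other identifier. Only the bare "_" (the blank identifier) is special.
func handle(_m string) string {
	_requestCount++
	return fmt.Sprintf("%s:%d", _m, _defaultPort)
}

func main() {
	addr := handle("localhost")
	fmt.Println(addr)          // localhost:8080
	fmt.Println(_requestCount) // 1
}
```

The prefix makes it obvious at the point of use that a name is package-level state; the language itself attaches no meaning to it.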

How to strings.Split on newline?

I'm trying to do the rather simple task of splitting a string by newlines.
This does not work:
temp := strings.Split(result,`\n`)
I also tried ' instead of ` but no luck.
Any ideas?
You have to use "\n".
Splitting on `\n` searches for an actual \ followed by n in the text, not the newline byte.
playground
For those of us who at times work on Windows, it helps to remember to replace the line endings before splitting:
strings.Split(strings.ReplaceAll(windows, "\r\n", "\n"), "\n")
Go Playground
It does not work because you're using backticks:
Raw string literals are character sequences between back quotes ``. Within the quotes, any character is legal except back quote. The value of a raw string literal is the string composed of the uninterpreted (implicitly UTF-8-encoded) characters between the quotes; in particular, backslashes have no special meaning and the string may contain newlines.
Reference: http://golang.org/ref/spec#String_literals
So, when you're doing
strings.Split(result,`\n`)
you're actually splitting on the two consecutive characters "\" and "n", and not on the newline character "\n". To do what you want, simply use "\n" in double quotes instead of backticks.
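The difference between the two literals is easy to check. In this small sketch, countParts is a hypothetical helper added only for the demonstration:

```go
package main

import (
	"fmt"
	"strings"
)

// countParts reports how many pieces strings.Split produces for sep.
func countParts(text, sep string) int {
	return len(strings.Split(text, sep))
}

func main() {
	text := "one\ntwo\nthree"

	// Raw literal: two bytes, a backslash followed by the letter n.
	fmt.Println(len(`\n`), countParts(text, `\n`)) // 2 1

	// Interpreted literal: a single newline byte.
	fmt.Println(len("\n"), countParts(text, "\n")) // 1 3
}
```

With the raw literal the separator never occurs in the text, so Split returns the whole input as a single element.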
Your code doesn't work because you're using backticks instead of double quotes. However, you should be using a bufio.Scanner if you want to support Windows.
import (
	"bufio"
	"strings"
)
func SplitLines(s string) []string {
	var lines []string
	sc := bufio.NewScanner(strings.NewReader(s))
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	return lines
}
Alternatively, you can use strings.FieldsFunc (this approach skips blank lines)
strings.FieldsFunc(s, func(c rune) bool { return c == '\n' || c == '\r' })
import "regexp"
var lines []string = regexp.MustCompile("\r?\n").Split(inputString, -1)
MustCompile() creates a regular expression that matches both \r\n and \n, so the split works for either line ending.
Split() performs the split; the second argument sets the maximum number of parts, with -1 meaning unlimited.
' doesn't work because '\n' is a rune, not a string.
temp := strings.Split(result,'\n')
go compiler: cannot use '\u000a' (type rune) as type string in argument to strings.Split
definition: Split(s, sep string) []string
