strings - get characters before a digit - go

I have some strings such E2 9NZ, N29DZ, EW29DZ . I need to extract the chars before the first digit, given the above example : E, N, EW.
Am I supposed to use regex ? The strings package looks really nice but just doesn't seem to handle this case (extract everything before a specific type).
Edit:
To clarify the "question" I'm wondering what method is more idiomatic to go and perhaps likely to provide better performance.

For example,
package main
import (
"fmt"
"unicode"
)
func DigitPrefix(s string) string {
for i, r := range s {
if unicode.IsDigit(r) {
return s[:i]
}
}
return s
}
func main() {
fmt.Println(DigitPrefix("E2 9NZ"))
fmt.Println(DigitPrefix("N29DZ"))
fmt.Println(DigitPrefix("EW29DZ"))
fmt.Println(DigitPrefix("WXYZ"))
}
Output:
E
N
EW
WXYZ
If there is no digit, example "WXYZ", and you don't want anything returned, change return s to return "".

Not sure why almost everyone provided answers in everything but Go. Here is regex-based Go version:
package main
import (
"fmt"
"regexp"
)
func main() {
pattern, err := regexp.Compile("^[^\\d]*")
if err != nil {
panic(err)
}
part := pattern.Find([]byte("EW29DZ"))
if part != nil {
fmt.Printf("Found: %s\n", string(part))
} else {
fmt.Println("Not found")
}
}
Running:
% go run main.go
Found: EW
Go playground

We don't need regex for this problem. You can easily walk through on a slice of rune and check the current character with unicode.IsDigit(), if it's a digit: return. If it isn't: continue the loop. If there are no numbers: return the argument
Code
package main
import (
"fmt"
"unicode"
)
func UntilDigit(r []rune) []rune {
var i int
for _, v := range r {
if unicode.IsDigit(v) {
return r[0:i]
}
i++
}
return r
}
func main() {
fmt.Println(string(UntilDigit([]rune("E2 9NZ"))))
fmt.Println(string(UntilDigit([]rune("N29DZ"))))
fmt.Println(string(UntilDigit([]rune("EW29DZ"))))
}
Playground link

I think the best option is to use the index returned from strings.IndexAny which will return the first index of any character in a string.
func BeforeNumbers(str string) string {
value := strings.IndexAny(str,"0123456789")
if value >= 0 && value <= len(str) {
return str[:value]
}
return str
}
Will slice the string and return the subslice up to (but not including) the first character that's in the string "0123456789" which is any number.
Way later edit:
It would probably be better to use IndexFunc rather than IndexAny:
func BeforeNumbers(str string) string {
indexFunc := func(r rune) bool {
return r >= '0' && r <= '9'
}
value := strings.IndexFunc(str,indexFunc)
if value >= 0 && value <= len(str) {
return str[:value]
}
return str
}
This is more or less equivalent to the loop version, and eliminates a search over a long string to check for a match every character from my previous answer. But I think it looks cleaner than the loop version, which is obviously a manner of taste.

The code below will continue grabbing characters until it reaches a digit.
int i = 0;
String string2test = "EW29DZ";
String stringOutput = "";
while (!Character.isDigit(string2test.charAt(i)))
{
stringOutput = stringOutput + string2test.charAt(i);
i++;
}

Related

How can I clean the text for search using RegEx

I can use the below code to search if the text str contains any or both of the keys, i.e.if it contains "MS" or "dynamics" or both of them
package main
import (
"fmt"
"regexp"
)
func main() {
keys := []string{"MS", "dynamics"}
keysReg := fmt.Sprintf("(%s %s)|%s|%s", keys[0], keys[1], keys[0], keys[1]) // => "(MS dynamics)|MS|dynamics"
fmt.Println(keysReg)
str := "What is MS dynamics, is it a product from MS?"
re := regexp.MustCompile(`(?i)` + keysReg)
matches := re.FindAllString(str, -1)
fmt.Println("We found", len(matches), "matches, that are:", matches)
}
I want the user to enter his phrase, so I trim unwanted words and characters, then doing the search as per above.
Let's say the user input was: This,is,a,delimited,string and I need to build the keys variable dynamically to be (delimited string)|delimited|string so that I can search for my variable str for all the matches, so I wrote the below:
s := "This,is,a,delimited,string"
t := regexp.MustCompile(`(?i),|\.|this|is|a`) // backticks are used here to contain the expression, (?i) for case insensetive
v := t.Split(s, -1)
fmt.Println(len(v))
fmt.Println(v)
But I got the output as:
8
[ delimited string]
What is the wrong part in my cleaning of the input text, I'm expecting the output to be:
2
[delimited string]
Here is my playground
To quote the famous quip from Jamie Zawinski,
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
Two things:
Instead of trying to weed out garbage from the string ("cleaning" it), extract complete words from it instead.
Unicode is a compilcated matter; so even after you have succeeded with extracting words, you have to make sure your words are properly "escaped" to not contain any characters which might be interpreted as RE syntax before building a regexp of them.
package main
import (
"errors"
"fmt"
"regexp"
"strings"
)
func build(words ...string) (*regexp.Regexp, error) {
var sb strings.Builder
switch len(words) {
case 0:
return nil, errors.New("empty input")
case 1:
return regexp.Compile(regexp.QuoteMeta(words[0]))
}
quoted := make([]string, len(words))
for i, w := range words {
quoted[i] = regexp.QuoteMeta(w)
}
sb.WriteByte('(')
for i, w := range quoted {
if i > 0 {
sb.WriteByte('\x20')
}
sb.WriteString(w)
}
sb.WriteString(`)|`)
for i, w := range quoted {
if i > 0 {
sb.WriteByte('|')
}
sb.WriteString(w)
}
return regexp.Compile(sb.String())
}
var words = regexp.MustCompile(`\pL+`)
func main() {
allWords := words.FindAllString("\tThis\v\x20\x20,\t\tis\t\t,?a!,¿delimited?,string‽", -1)
re, err := build(allWords...)
if err != nil {
panic(err)
}
fmt.Println(re)
}
Further reading:
https://pkg.go.dev/regexp/syntax
https://pkg.go.dev/regexp#QuoteMeta
https://pkg.go.dev/unicode#pkg-variables and https://pkg.go.dev/unicode#Categories

How in golang to remove the last letter from the string?

Let's say I have a string called varString.
varString := "Bob,Mark,"
QUESTION: How to remove the last letter from the string? In my case, it's the second comma.
How to remove the last letter from the string?
In Go, character strings are UTF-8 encoded. Unicode UTF-8 is a variable-length character encoding which uses one to four bytes per Unicode character (code point).
For example,
package main
import (
"fmt"
"unicode/utf8"
)
func trimLastChar(s string) string {
r, size := utf8.DecodeLastRuneInString(s)
if r == utf8.RuneError && (size == 0 || size == 1) {
size = 0
}
return s[:len(s)-size]
}
func main() {
s := "Bob,Mark,"
fmt.Println(s)
s = trimLastChar(s)
fmt.Println(s)
}
Playground: https://play.golang.org/p/qyVYrjmBoVc
Output:
Bob,Mark,
Bob,Mark
Here's a much simpler method that works for unicode strings too:
func removeLastRune(s string) string {
r := []rune(s)
return string(r[:len(r)-1])
}
Playground link: https://play.golang.org/p/ezsGUEz0F-D
Something like this:
s := "Bob,Mark,"
s = s[:len(s)-1]
Note that this does not work if the last character is not represented by just one byte.
newStr := strings.TrimRightFunc(str, func(r rune) bool {
return !unicode.IsLetter(r) // or any other validation can go here
})
This will trim anything that isn't a letter on the right hand side.

Using default value in golang func

I'm trying to implement a default value according to the option 1 of the post Golang and default values. But when I try to do go install the following error pops up in the terminal:
not enough arguments in call to test.Concat1
have ()
want (string)
Code:
package test
func Concat1(a string) string {
if a == "" {
a = "default-a"
}
return fmt.Sprintf("%s", a)
}
// other package
package main
func main() {
test.Concat1()
}
Thanks in advance.
I don't think what you are trying to do will work that way. You may want to opt for option #4 from the page you cited, which uses variadic variables. In your case looks to me like you want just a string, so it'd be something like this:
func Concat1(a ...string) string {
if len(a) == 0 {
return "a-default"
}
return a[0]
}
Go does not have optional defaults for function arguments.
You may emulate them to some extent by having a special type
to contain the set of parameters for a function.
In your toy example that would be something like
type Concat1Args struct {
a string
}
func Concat1(args Concat1Args) string {
if args.a == "" {
args.a = "default-a"
}
return fmt.Sprintf("%s", args.a)
}
The "trick" here is that in Go each type has its respective
"zero value", and when producing a value of a composite type
using the so-called literal, it's possible to initialize only some of the type's fields, so in our example that would be
s := Concat1(Concat1Args{})
vs
s := Concat1(Concat1Args{"whatever"})
I know that looks clumsy, and I have showed this mostly for
demonstration purpose. In real production code, where a function
might have a dozen of parameters or more, having them packed
in a dedicate composite type is usually the only sensible way
to go but for a case like yours it's better to just explicitly
pass "" to the function.
Golang does not support default parameters. Accordingly, variadic arguments by themselves are not analogous. However, variadic functions with the use of error handling can 'resemble' the pattern. Try the following as a simple example:
package main
import (
"errors"
"log"
)
func createSeries(p ...int) ([]int, error) {
usage := "Usage: createSeries(<length>, <optional starting value>), length should be > 0"
if len(p) == 0 {
return nil, errors.New(usage)
}
n := p[0]
if n <= 0 {
return nil, errors.New(usage)
}
var base int
if len(p) == 2 {
base = p[1]
} else if len(p) > 2 {
return nil, errors.New(usage)
}
vals := make([]int, n)
for i := 0; i < n; i++ {
vals[i] = base + i
}
return vals, nil
}
func main() {
answer, err := createSeries(4, -9)
if err != nil {
log.Fatal(err)
}
log.Println(answer)
}
Default parameters work differently in Go than they do in other languages. In a function there can be one ellipsis, always at the end, which will keep a slice of values of the same type so in your case this would be:
func Concat1(a ...string) string {
but that means that the caller may pass in any number of arguments >= 0. Also you need to check that the arguments in the slice are not empty and then assign them yourself. This means they do not get assigned a default value through any kind of special syntax in Go. This is not possible but you can do
if a[0] == "" {
a[0] = "default value"
}
If you want to make sure that the user passes either zero or one strings, just create two functions in your API, e.g.
func Concat(a string) string { // ...
func ConcatDefault() string {
Concat("default value")
}

Get current position in stream from net/html tokenizer

I'm trying to figure out if there's a way to get the current character position of a tag using the golang.org/x/net/html tokenizer library?
Simplified code looks like:
func LookForForm(body string) {
reader := strings.NewReader(body)
tokenizer := html.NewTokenizer(reader)
idx := 0
lastIdx := 0
for {
token := tokenizer.Next()
lastIdx = idx
idx = int(reader.Size()) - int(reader.Len())
switch token {
case html.ErrorToken:
return
case html.StartTagToken:
t := tokenizer.Token()
tagName := strings.ToLower(t.Data)
if tagName == "form" {
fmt.Printf("found at form at %d\n", lastIdx)
return
}
}
}
}
This doesn't work (I think) because reader is not reading character-by-character but by chunks so my calculation of Size - Len is invalid. tokenizer maintains two private span structs ( https://github.com/golang/net/blob/master/html/token.go line 147) but I am unaware of how to access them.
One possible solution that just occurred to me is to make a "reader" that only reads a single character at a time so my Size and Len calculations are always correct. But, that seems like a hack and any suggestions would be appreciated.
You might be able to accomplish what you are trying to do (not what you want) with careful arithmetic using Tokenizer's Buffered method which returns the slice of bytes currently in buffer that have yet been tokenized. But I don't think you will get what you wanted, as <div><form></form></div> would probably buffer the whole string before give you the first div token. In that case the size of the buffered content is not helpful in calculating the solution.
Tokenizing mark up lang with nested structure will almost always need to buffer the input to work. the private span attribute should be quite useless as it is only a reference in it's buffer, not absolute position from the reader.
Since the html Tokenizer is not providing an API to access the raw position of a tag in the original data, to get want you wanted I probably would just do a strings.Index or bytes.Index on the raw buffer of the token to get the position:
strings.Index(body, string(tokenizer.Raw()))
A non-buffering reader ended up working ok for me. The implementation of the reader looks something like:
package rule
import (
"errors"
"io"
"unicode/utf8"
)
type Reader struct {
s string
i int64
z int64
prevRune int64 // index of the previously read rune or -1
}
func (r *Reader) String() string {
return r.s
}
func (r *Reader) Len() int {
if r.i >= r.z {
return 0
}
return int(r.z - r.i)
}
func (r *Reader) Size() int64 {
return r.z
}
func (r *Reader) Pos() int64 {
return r.i
}
func (r *Reader) Read(b []byte) (int, error) {
if r.i >= r.z {
return 0, io.EOF
}
r.prevRune = -1
b[0] = r.s[r.i]
r.i += 1
return 1, nil
}
Then the loop for the tokenizer is fairly easy to calculate:
reader := NewReader(body)
tokenizer := html.NewTokenizer(reader)
idx := 0
lastIdx := 0
tokenLoop:
for {
token := tokenizer.Next()
switch token {
case html.ErrorToken:
break tokenLoop
case html.EndTagToken, html.TextToken, html.CommentToken, html.SelfClosingTagToken:
lastIdx = int(reader.Pos())
case html.StartTagToken:
t := tokenizer.Token()
tagName := strings.ToLower(t.Data)
idx = int(reader.Pos())
if tagName == "form" {
fmt.Printf("found at form at %d\n", lastIdx)
return
}
}
}

How to create a case insensitive map in Go?

I want to have a key insensitive string as key.
Is it supported by the language or do I have to create it myself?
thank you
Edit: What I am looking for is a way to make it by default instead of having to remember to convert the keys every time I use the map.
Edit: My initial code actually still allowed map syntax and thus allowed the methods to be bypassed. This version is safer.
You can "derive" a type. In Go we just say declare. Then you define methods on your type. It just takes a very thin wrapper to provide the functionality you want. Note though, that you must call get and set with ordinary method call syntax. There is no way to keep the index syntax or optional ok result that built in maps have.
package main
import (
"fmt"
"strings"
)
type ciMap struct {
m map[string]bool
}
func newCiMap() ciMap {
return ciMap{m: make(map[string]bool)}
}
func (m ciMap) set(s string, b bool) {
m.m[strings.ToLower(s)] = b
}
func (m ciMap) get(s string) (b, ok bool) {
b, ok = m.m[strings.ToLower(s)]
return
}
func main() {
m := newCiMap()
m.set("key1", true)
m.set("kEy1", false)
k := "keY1"
b, _ := m.get(k)
fmt.Println(k, "value is", b)
}
Two possiblities:
Convert to uppercase/lowercase if you're input set is guaranteed to be restricted to only characters for which a conversion to uppercase/lowercase will yield correct results (may not be true for some Unicode characters)
Convert to Unicode fold case otherwise:
Use unicode.SimpleFold(rune) to convert a unicode rune to fold case. Obviously this is dramatically more expensive an operation than simple ASCII-style case mapping, but it is also more portable to other languages. See the source code for EqualsFold to see how this is used, including how to extract Unicode runes from your source string.
Obviously you'd abstract this functionality into a separate package instead of re-implementing it everywhere you use the map. This should go without saying, but then you never know.
Here is something more robust than just strings.ToLower, you can use
the golang.org/x/text/cases package. Example:
package main
import "golang.org/x/text/cases"
func main() {
s := cases.Fold().String("March")
println(s == "march")
}
If you want to use something from the standard library, I ran this test:
package main
import (
"strings"
"unicode"
)
func main() {
var (
lower, upper int
m = make(map[string]bool)
)
for n := '\u0080'; n <= '\u07FF'; n++ {
q, r := n, n
for {
q = unicode.SimpleFold(q)
if q == n { break }
for {
r = unicode.SimpleFold(r)
if r == n { break }
s, t := string(q), string(r)
if m[t + s] { continue }
if strings.ToLower(s) == strings.ToLower(t) { lower++ }
if strings.ToUpper(s) == strings.ToUpper(t) { upper++ }
m[s + t] = true
}
}
}
println(lower == 951, upper == 989)
}
So as can be seen, ToUpper is the marginally better choice.

Resources