Regular Expression with If Else Condition - go

I have a problem with an if-else condition in a regex. I have a file which contains the format below, and I want the returned value to be either 0.0.985 or 3.3.5-3811.
I was trying to use an if-else condition in the regex but was unable to do so. Can anyone explain the approach while solving the problem, please?
random-app-0.0.985.tgz
busy-app-7.3.1.2-3.3.5-3811-a19874elkc-123254376584.zip
Below is the Go code I am trying to use
package main

import (
    "fmt"
    "io/ioutil"
    "regexp"
)

func main() {
    content, err := ioutil.ReadFile("version.txt")
    if err != nil {
        fmt.Println(err)
    }
    version := string(content)
    re := regexp.MustCompile(`(\d+).(\d+).(\d+)|(\d+).(\d+).(\d+).(\d+)`)
    result := re.FindAllStringSubmatch(version, -1)
    for i := range result {
        fmt.Println(result[i][0])
    }
}
Output is coming like
0.0.985
7.3.1
2-3.3
5-3811
19874
123254376584

The following regexp can be used: [\d\.]+[\.-][\d]{2,}
package main

import (
    "fmt"
    "regexp"
)

func main() {
    var re = regexp.MustCompile(`(?m)[\d\.]+[\.-][\d]{2,}`)
    var str = `random-app-0.0.985.tgz
busy-app-7.3.1.2-3.3.5-3811-a19874elkc-123254376584.zip`
    for i, match := range re.FindAllString(str, -1) {
        fmt.Println(match, "found at index", i)
    }
}
The output
0.0.985 found at index 0
3.3.5-3811 found at index 1
playground
(?m) multi-line modifier. Causes ^ and $ to match the beginning/end of each line (not only the beginning/end of the string). In this case it does not make much difference; the pattern works without it.
[\d\.]+ matches, at least once (quantifier +), a sequence of digits or dots
[\.-] matches a dot or a hyphen
[\d]{2,} matches at least two digits (quantifier {2,})

One problem with your code is that in a regular expression . matches any character, while you intend it to match a literal dot. Use \. or [.] instead.
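For illustration, a minimal sketch of that fix (only the three-number branch of your original pattern, applied to the two sample filenames); the [\d\.]+[\.-][\d]{2,} pattern above is still what yields exactly 0.0.985 and 3.3.5-3811:
package main

import (
    "fmt"
    "regexp"
)

func main() {
    // With the dots escaped, . only matches a literal dot.
    re := regexp.MustCompile(`\d+\.\d+\.\d+`)
    str := `random-app-0.0.985.tgz
busy-app-7.3.1.2-3.3.5-3811-a19874elkc-123254376584.zip`
    fmt.Println(re.FindAllString(str, -1))
    // Prints [0.0.985 7.3.1 3.3.5]; fragments such as 2-3.3 and 19874 no longer match.
}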

Related

Go: CSV NewReader not getting the correct number of fields

How to get the correct number of fields when using NewReader?
package main

import (
    "encoding/csv"
    "fmt"
    "log"
    "strings"
)

func main() {
    parser := csv.NewReader(strings.NewReader(`||""FOO""||`))
    parser.Comma = '|'
    parser.LazyQuotes = true
    record, err := parser.Read()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("record length: %v\n", len(record))
}
https://go.dev/play/p/gg-KYRciWFH
It should return 5, but instead I'm getting 3:
record length: 3
EDIT
I'm actually working with a big CSV file containing many double quotes.
After examining your code, I decided to modify it slightly and then print the results:
package main

import (
    "encoding/csv"
    "fmt"
    "log"
    "strings"
)

func main() {
    parser := csv.NewReader(strings.NewReader(`x||""FOO""|x|x\n`))
    parser.Comma = '|'
    parser.LazyQuotes = true
    record, err := parser.Read()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("record length: %v, Data: %v\n", len(record), strings.Join(record, ", "))
}
When you run this, the data is printed as x, , "FOO"||x|x\n". My thought is that when you end your entry with two double-quotes, the parser assumes the string is still being quoted and therefore lumps the rest of the line into the third entry. This might look like a bug in how lazy quoting works in the csv package; however, when you examine the documentation for LazyQuotes, you'll see this:
If LazyQuotes is true, a quote may appear in an unquoted field and a non-doubled quote may appear in a quoted field.
This doesn't mention anything about finding double quotes within double quotes. To fix this, you should either remove the quotes altogether or replace the doubled double-quotes ("") with single double-quotes (").
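If pre-processing the input is acceptable, here is a minimal sketch of that second option, collapsing the doubled quotes before parsing (the input string is just the example from the question):
package main

import (
    "encoding/csv"
    "fmt"
    "log"
    "strings"
)

func main() {
    // Collapse doubled double-quotes before handing the data to the CSV reader.
    raw := `||""FOO""||`
    cleaned := strings.ReplaceAll(raw, `""`, `"`)
    parser := csv.NewReader(strings.NewReader(cleaned))
    parser.Comma = '|'
    parser.LazyQuotes = true
    record, err := parser.Read()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("record length: %v\n", len(record)) // should now report 5
}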
One other thing you might consider would be using the gocsv package. I've worked with this package in the past and it's reasonably stable. I'm not sure how it would respond to this specific issue, but it might be worth your time checking it out.
Note:
The encoding/csv package implements the RFC 4180 standard. Input like this is not an RFC 4180 compliant CSV file, so encoding/csv will not parse it properly.
You're misusing the quotes. Quoting a single field FOO is like this:
parser := csv.NewReader(strings.NewReader(`||"FOO"||`))
If you want the field to contain the literal value "FOO" (including the quotes), you have to double the quotes inside a quoted field, so it should be:
parser := csv.NewReader(strings.NewReader(`||"""FOO"""||`))
This will output 5. Try it on the Go Playground.
What you have is this:
parser := csv.NewReader(strings.NewReader(`||""FOO""||`))
Since the second " character is not followed by a separator character, the field is not interrupted and the rest is processed as the content of the quoted field (which will terminate at the end of the line).
If you print the record:
fmt.Println(record)
fmt.Printf("%#v", record)
Output will be (try it on the Go Playground):
[ "FOO"||]
[]string{"", "", "\"FOO\"||"}
Quotes are a part of the CSV format.
There is a problem with how go/csv handles this kind of quote escaping; you can try something like this:
package main

import (
    "encoding/csv"
    "fmt"
    "log"
    "strings"
)

func main() {
    parser := csv.NewReader(strings.NewReader(`||FOO||`))
    parser.Comma = '|'
    parser.LazyQuotes = true
    record, err := parser.Read()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("record length: %v\n", len(record))
    fmt.Println(strings.Join(record, " /SEP/ "))
}
or like this:
package main

import (
    "encoding/csv"
    "fmt"
    "log"
    "strings"
)

func main() {
    parser := csv.NewReader(strings.NewReader(`||"""FOO"""||`))
    parser.Comma = '|'
    parser.LazyQuotes = true
    record, err := parser.Read()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("record length: %v\n", len(record))
    fmt.Println(strings.Join(record, " SEP "))
}

Golang Convert UTF-8 string to ASCII [duplicate]

How can I remove all diacritics from a given UTF-8 encoded string using Go? E.g. transform the string "žůžo" => "zuzo". Is there a standard way?
You can use the libraries described in Text normalization in Go.
Here's an application of those libraries:
// Example derived from: http://blog.golang.org/normalization
package main

import (
    "fmt"
    "unicode"

    "golang.org/x/text/transform"
    "golang.org/x/text/unicode/norm"
)

func isMn(r rune) bool {
    return unicode.Is(unicode.Mn, r) // Mn: nonspacing marks
}

func main() {
    t := transform.Chain(norm.NFD, transform.RemoveFunc(isMn), norm.NFC)
    result, _, _ := transform.String(t, "žůžo")
    fmt.Println(result)
}
To expand a bit on the existing answer:
The internet standard for comparing strings of different character sets is called "PRECIS" (Preparation, Enforcement, and Comparison of Internationalized Strings in Application Protocols) and is documented in RFC7564. There is also a Go implementation at golang.org/x/text/secure/precis.
None of the standard profiles will do what you want, but it would be fairly straightforward to define a new profile that does. You would want to apply Unicode Normalization Form D ("D" for "Decomposition", which means the accents are split off into their own combining characters), then remove any combining characters as part of the additional mapping rule, and then recompose with the normalization rule. Something like this:
package main

import (
    "fmt"
    "unicode"

    "golang.org/x/text/secure/precis"
    "golang.org/x/text/transform"
    "golang.org/x/text/unicode/norm"
)

func main() {
    loosecompare := precis.NewIdentifier(
        precis.AdditionalMapping(func() transform.Transformer {
            return transform.Chain(norm.NFD, transform.RemoveFunc(func(r rune) bool {
                return unicode.Is(unicode.Mn, r)
            }))
        }),
        precis.Norm(norm.NFC), // This is the default; be explicit though.
    )
    p, _ := loosecompare.String("žůžo")
    fmt.Println(p, loosecompare.Compare("žůžo", "zuzo"))
    // Prints "zuzo true"
}
This lets you expand your comparison with more options later (e.g. width mapping, case mapping, etc.)
It's also worth noting that removing accents is almost never what you actually want to do when comparing strings like this; however, without knowing your use case I can't make that assertion about your project. To prevent the proliferation of precis profiles, it's good to use one of the existing profiles where possible. Also note that no effort was made to optimize the example profile.
transform.RemoveFunc is deprecated.
Instead you can use the Remove function from the runes package:
t := transform.Chain(norm.NFD, runes.Remove(runes.In(unicode.Mn)), norm.NFC)
result, _, _ := transform.String(t, "žůžo")
fmt.Println(result)
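For reference, a self-contained version of that runes.Remove approach (a sketch assuming the golang.org/x/text module is available):
package main

import (
    "fmt"
    "unicode"

    "golang.org/x/text/runes"
    "golang.org/x/text/transform"
    "golang.org/x/text/unicode/norm"
)

func main() {
    // Decompose, drop nonspacing marks (Mn), then recompose.
    t := transform.Chain(norm.NFD, runes.Remove(runes.In(unicode.Mn)), norm.NFC)
    result, _, err := transform.String(t, "žůžo")
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println(result) // zuzo
}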
For anyone looking to remove (or replace/flatten) Polish diacritics in Go, you can define a mapping for runes:
package main

import (
    "fmt"

    "golang.org/x/text/runes"
    "golang.org/x/text/secure/precis"
    "golang.org/x/text/transform"
    "golang.org/x/text/unicode/norm"
)

func main() {
    trans := transform.Chain(
        norm.NFD,
        precis.UsernameCaseMapped.NewTransformer(),
        runes.Map(func(r rune) rune {
            switch r {
            case 'ą':
                return 'a'
            case 'ć':
                return 'c'
            case 'ę':
                return 'e'
            case 'ł':
                return 'l'
            case 'ń':
                return 'n'
            case 'ó':
                return 'o'
            case 'ś':
                return 's'
            case 'ż':
                return 'z'
            case 'ź':
                return 'z'
            }
            return r
        }),
        norm.NFC,
    )
    result, _, _ := transform.String(trans, "ŻóŁć")
    fmt.Println(result)
}
On Go Playground: https://play.golang.org/p/3ulPnOd3L91

Don't understand func strings.TrimLeft in Go

I'm trying to test code that uses func strings.TrimLeft. I needed to see an MVCE of it in action, so I went to the API specification.
It came with an example, which I exported, with the following code:
package main

import (
    "fmt"
    "strings"
)

func main() {
    fmt.Print(strings.TrimLeft("¡¡¡Hello, Gophers!!!", "!¡"))
}
Upon running it, you get Hello, Gophers!!!
I decided to prepend the input string, changing the code to
package main

import (
    "fmt"
    "strings"
)

func main() {
    fmt.Print(strings.TrimLeft("irrelevant text¡¡¡Hello, Gophers!!!", "!¡"))
}
The result string is the whole input string: irrelevant text¡¡¡Hello, Gophers!!!
Aren't at least the cutset characters supposed to be removed?!
By convention, trim functions only operate on the leading or trailing part of a string.
TrimLeft only removes matching characters from the beginning of the string and stops at the first non-match. In your example, the "i" of "irrelevant" is the first character it checks; it fails the check, so trimming stops (i.e. nothing is removed).
TrimRight, by comparison, removes matching characters starting from the end of the string, working backwards in index order.
Aren't at least the cutset characters supposed to be removed?!
All of the ones at the beginning of the string. There are zero of those, so zero characters are removed.
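A small sketch showing both cases side by side (same strings and cutset as the question):
package main

import (
    "fmt"
    "strings"
)

func main() {
    cutset := "!¡"
    // Leading cutset characters are removed...
    fmt.Println(strings.TrimLeft("¡¡¡Hello, Gophers!!!", cutset)) // Hello, Gophers!!!
    // ...but trimming stops at the first non-cutset character, here the 'i' of "irrelevant".
    fmt.Println(strings.TrimLeft("irrelevant text¡¡¡Hello, Gophers!!!", cutset)) // unchanged
}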

TrimRight not working as I expected

Below is code using strings.TrimRight, on the latest Go version.
I am observing behavior that I may be misunderstanding; as I understand it, the code below should produce the output
Hello
But the output is
Hell
Why is that? Note that I have kept a space before Gophers in the cutset, so I expected it to remove " Gophers" from the primary string, leaving behind just Hello.
package main

import (
    "fmt"
    "strings"
)

func main() {
    result := strings.TrimRight("Hello Gophers", " Gophers")
    fmt.Println(result, len(result))
}
As documented, TrimRight removes all trailing characters that are in the cutset. Because o is in your cutset (" Gophers"), the final o of Hello is trimmed as well. If you want to remove that exact substring, use TrimSuffix instead.
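A quick sketch of the difference, using the strings from the question:
package main

import (
    "fmt"
    "strings"
)

func main() {
    // TrimRight treats " Gophers" as a set of characters to strip from the right.
    fmt.Println(strings.TrimRight("Hello Gophers", " Gophers")) // Hell
    // TrimSuffix removes the exact trailing substring, if present.
    fmt.Println(strings.TrimSuffix("Hello Gophers", " Gophers")) // Hello
}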

Does my gofmt work wrongly, or do I not understand something?

I suppose that my gofmt is not working the way it's supposed to; am I right?
Original file:
package main

import "fmt"

func main() {
    fmt.Printf("hello, world\n")
}
Then I did:
gofmt -r 'h -> H' -w "hello.go"
Content of the file after:
package H

import "fmt"

func H() {
    H
}
Presumably gofmt works as its authors intended, which might be different from what you expected.
The documentation says:
Both pattern and replacement must be valid Go expressions. In the pattern, single-character lowercase identifiers serve as wildcards matching arbitrary sub-expressions; those expressions will be substituted for the same identifiers in the replacement.
As you have only a single lowercase letter in the pattern, it matches every sub-expression, and each match is then replaced with H. Let's take your example further; consider this:
package main

import "fmt"

func compare(a, b int) {
    if a + b < a * b {
        fmt.Printf("hello, world\n")
    }
}
After the same gofmt command the above code becomes:
package H
import "fmt"
func H(H, H H) {
if H+H < H*H {
H
}
}
If this is not what you want, then you should use a more specific pattern expression.
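For example (an illustrative rule, not taken from the original post), a pattern that spells out the call it targets only rewrites single-argument fmt.Printf calls and leaves package and function names alone, because fmt and Printf are multi-character identifiers and are matched literally; only the single-character a acts as a wildcard:
gofmt -r 'fmt.Printf(a) -> fmt.Print(a)' -w hello.go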
