Extract part of string in Golang?

I'm learning Golang so I can rewrite some of my shell scripts.
I have URL's that look like this:
https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value
I want to extract the following part:
https://example-1.example.com/a/c482dfad3573acff324c/list.txt
In a shell script I would do something like this:
echo "$myString" | grep -o 'http://.*.txt'
What is the best way to do the same thing in Golang, only by using the standard library?

There are a few options:
// match regexp as in question
pat := regexp.MustCompile(`https?://.*\.txt`)
s := pat.FindString(myString)
// everything before the query
s := strings.Split(myString, "?")[0]
// same as previous, but avoids []string allocation
s := myString
if i := strings.IndexByte(s, '?'); i >= 0 {
s = s[:i]
}
// parse and clear query string
u, err := url.Parse(myString)
if err != nil {
log.Fatal(err) // malformed URL
}
u.RawQuery = ""
s := u.String()
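The last option is the most robust: it relies on a real URL parser rather than string matching, so it copes with inputs the other options mishandle, such as a query string that itself contains ".txt".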
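To see how the options can differ, here is a small sketch that runs three of them on an input whose query string also contains ".txt" and which carries a fragment. The function names (byRegexp, bySplit, byParse) and the sample URL are illustrative assumptions, not part of the answer above.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// txtPattern is the regexp from the first option.
var txtPattern = regexp.MustCompile(`https?://.*\.txt`)

// byRegexp returns the greedy regexp match, which runs to the LAST ".txt".
func byRegexp(s string) string { return txtPattern.FindString(s) }

// bySplit returns everything before the first '?'.
func bySplit(s string) string { return strings.Split(s, "?")[0] }

// byParse parses the URL and drops both the query and the fragment.
func byParse(s string) (string, error) {
	u, err := url.Parse(s)
	if err != nil {
		return "", err
	}
	u.RawQuery = ""
	u.Fragment = ""
	return u.String(), nil
}

func main() {
	// Hypothetical input: ".txt" appears again inside the query string.
	in := "https://example.com/a/list.txt?next=other.txt#top"

	fmt.Println(byRegexp(in)) // https://example.com/a/list.txt?next=other.txt
	fmt.Println(bySplit(in))  // https://example.com/a/list.txt

	s, err := byParse(in)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(s) // https://example.com/a/list.txt
}
```

The greedy `.*` is what makes the regexp overshoot here; the split and parse variants are unaffected because they key on the first '?' rather than on the file extension.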

You can use strings.IndexRune, strings.IndexByte, strings.Split, strings.SplitAfter, strings.FieldsFunc, url.Parse, regexp, or your own function.
The simplest way first:
Use i := strings.IndexRune(s, '?') or i := strings.IndexByte(s, '?') and then take s[:i], like this (output shown in comments):
package main
import "fmt"
import "strings"
func main() {
s := `https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value`
i := strings.IndexByte(s, '?')
if i != -1 {
fmt.Println(s[:i]) // https://example-1.example.com/a/c482dfad3573acff324c/list.txt
}
}
Or you can use url.Parse(s) (this is what I'd use):
package main
import "fmt"
import "net/url"
func main() {
s := `https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value`
u, err := url.Parse(s) // named u so the net/url package is not shadowed
if err == nil {
u.RawQuery = ""
fmt.Println(u.String()) // https://example-1.example.com/a/c482dfad3573acff324c/list.txt
}
}
Or you can use regexp.MustCompile(".*\\.txt"):
package main
import "fmt"
import "regexp"
var rgx = regexp.MustCompile(`.*\.txt`)
func main() {
s := `https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value`
fmt.Println(rgx.FindString(s)) // https://example-1.example.com/a/c482dfad3573acff324c/list.txt
}
Or you can use splits := strings.FieldsFunc(s, func(r rune) bool { return r == '?' }) and then take splits[0]:
package main
import "fmt"
import "strings"
func main() {
s := `https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value`
splits := strings.FieldsFunc(s, func(r rune) bool { return r == '?' })
fmt.Println(splits[0]) // https://example-1.example.com/a/c482dfad3573acff324c/list.txt
}
Or you can use splits := strings.Split(s, "?") and then take splits[0]:
package main
import "fmt"
import "strings"
func main() {
s := `https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value`
splits := strings.Split(s, "?")
fmt.Println(splits[0]) // https://example-1.example.com/a/c482dfad3573acff324c/list.txt
}
Or you can use splits := strings.SplitAfter(s, ".txt") and then take splits[0]:
package main
import "fmt"
import "strings"
func main() {
s := `https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value`
splits := strings.SplitAfter(s, ".txt")
fmt.Println(splits[0]) // https://example-1.example.com/a/c482dfad3573acff324c/list.txt
}
Or you can write your own function (the most independent way):
package main
import "fmt"
func left(s string) string {
for i, r := range s {
if r == '?' {
return s[:i]
}
}
return s // no '?' found, so there is nothing to strip
}
func main() {
s := `https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value`
fmt.Println(left(s)) // https://example-1.example.com/a/c482dfad3573acff324c/list.txt
}
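On newer toolchains there is one more standard-library option. The sketch below assumes Go 1.18 or later, where strings.Cut is available; the helper name beforeQuery is made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// beforeQuery returns everything before the first '?'.
// If s contains no '?', strings.Cut returns s unchanged as "before".
func beforeQuery(s string) string {
	before, _, _ := strings.Cut(s, "?")
	return before
}

func main() {
	s := `https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value`
	fmt.Println(beforeQuery(s)) // https://example-1.example.com/a/c482dfad3573acff324c/list.txt
}
```

Like the strings.IndexByte version, this avoids allocating a slice, and it needs no special case for input without a query string.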

If you are processing only URLs, you can use Go's net/url package (https://golang.org/pkg/net/url/) to parse the URL, clear the Query and Fragment parts (the Query here is parm1=value,parm2=value etc.), and keep the remaining scheme://host/path portion, as in the following example (https://play.golang.org/p/Ao0jU22NyA):
package main
import (
"fmt"
"net/url"
)
func main() {
u, _ := url.Parse("https://example-1.example.com/a/b/c/list.txt?parm1=value,parm2=https%3A%2F%2Fexample.com%2Fa%3Fparm1%3Dvalue%2Cparm2%3Dvalue#somefragment")
u.RawQuery, u.Fragment = "", ""
fmt.Printf("%s\n", u)
}
Output:
https://example-1.example.com/a/b/c/list.txt

I used the regexp package to extract a string from a string.
In this example I wanted to extract the text between <PERSON> and </PERSON>. I did this with the re expression, and then removed the <PERSON> and </PERSON> tags themselves with the re1 expression.
The for loop handles the case where there are multiple matches, and re1 is used for the replacement.
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile(`<PERSON>(.*?)</PERSON>`)
string_l := "java -mx500m -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz -textFile PatrickYe.txt -outputFormat inlineXML 2> /dev/null I complained to <ORGANIZATION>Microsoft</ORGANIZATION> about <PERSON>Bill Gates</PERSON>.They told me to see the mayor of <PERSON>New York</PERSON>.,"
x := re.FindAllString(string_l, -1)
fmt.Println(x)
for v, st := range x {
// strip the <PERSON> and </PERSON> tags, keeping only the name
re1 := regexp.MustCompile(`<(.?)PERSON>`)
y1 := re1.ReplaceAllLiteralString(st, "")
fmt.Println(v, st, " : sdf : ", y1)
}
}
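The second regexp can be avoided by reading the capture group directly. The following is a minimal sketch of that variation, assuming the same <PERSON> markup; the names personPattern and extractPersons are illustrative and not from the answer above.

```go
package main

import (
	"fmt"
	"regexp"
)

// personPattern captures the text between <PERSON> and </PERSON>.
var personPattern = regexp.MustCompile(`<PERSON>(.*?)</PERSON>`)

// extractPersons returns the contents of every <PERSON>...</PERSON> element.
func extractPersons(text string) []string {
	var names []string
	for _, m := range personPattern.FindAllStringSubmatch(text, -1) {
		// m[0] is the whole match including the tags; m[1] is the first group.
		names = append(names, m[1])
	}
	return names
}

func main() {
	text := "I complained to <ORGANIZATION>Microsoft</ORGANIZATION> about <PERSON>Bill Gates</PERSON>.They told me to see the mayor of <PERSON>New York</PERSON>."
	fmt.Println(extractPersons(text)) // [Bill Gates New York]
}
```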

Related

Using regular expressions in Go to Identify a common pattern

I'm trying to parse this string goats=1\r\nalligators=false\r\ntext=works.
contents := "goats=1\r\nalligators=false\r\ntext=works"
compile, err := regexp.Compile("([^#\\s=]+)=([a-zA-Z0-9.]+)")
if err != nil {
return
}
matchString := compile.FindAllStringSubmatch(contents, -1)
My output looks like [[goats=1 goats 1] [alligators=false alligators false] [text=works text works]]
What am I doing wrong in my expression that causes goats=1 to be included too? I only want [[goats 1]...]

For another approach, you can use the strings package instead:
package main
import (
"fmt"
"strings"
)
func parse(s string) map[string]string {
m := make(map[string]string)
for _, kv := range strings.Split(s, "\r\n") {
a := strings.SplitN(kv, "=", 2)
if len(a) != 2 {
continue // skip lines that are not key=value pairs
}
m[a[0]] = a[1]
}
return m
}
func main() {
m := parse("goats=1\r\nalligators=false\r\ntext=works")
fmt.Println(m) // map[alligators:false goats:1 text:works]
}
https://golang.org/pkg/strings

Need to convert 2 Dimensional array into string and replace the last comma with full stop.(Golang)

How can I create a string out of a multi-dimensional array, preferably using a goroutine or a channel, so that the last comma after the final element is replaced with a full stop?
Thanks
package main
import (
"fmt"
)
func main() {
pls := [][]string {
{"C", "C++"},
{"JavaScript"},
{"Go", "Rust"},
}
for _, v1 := range pls {
for _, v2 := range v1 {
fmt.Print(v2,", ")
}
}
}

I guess classic strings.Join would be easier to implement and maintain:
package main
import (
"fmt"
"strings"
)
func main() {
pls := [][]string{
{"C", "C++"},
{"JavaScript"},
{"Go", "Rust"},
}
var strs []string
for _, v1 := range pls {
s := strings.Join(v1, ", ")
strs = append(strs, s)
}
s := strings.Join(strs, ", ")
fmt.Println(s)
}
https://play.golang.org/p/2Nuv00PV5j
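Because strings.Join only puts the separator between elements, there is no trailing comma left to replace; the full stop can simply be appended. A small sketch along those lines (the function name sentence is an assumption for illustration):

```go
package main

import (
	"fmt"
	"strings"
)

// sentence flattens the nested slices, joins the items with ", "
// and terminates the result with a full stop.
func sentence(groups [][]string) string {
	var all []string
	for _, g := range groups {
		all = append(all, g...)
	}
	if len(all) == 0 {
		return ""
	}
	return strings.Join(all, ", ") + "."
}

func main() {
	pls := [][]string{
		{"C", "C++"},
		{"JavaScript"},
		{"Go", "Rust"},
	}
	fmt.Println(sentence(pls)) // C, C++, JavaScript, Go, Rust.
}
```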

How to check whether a file contains a string or not?

I searched Google for a function that matches a string against a file's contents but could not find one. I also tried strings.Contains(), but it gives the wrong result on a large input file.
Is there any function in Go for searching for a string in a file?
If not, is there another way to solve this problem?
Here is my code:
package main
import (
"bufio"
"fmt"
"io/ioutil"
"os"
"strings"
)
func main() {
reader := bufio.NewReader(os.Stdin)
fmt.Print("Enter text: ")
text, _ := reader.ReadString('\n')
// read the whole file at once
b, err := ioutil.ReadFile("input.txt")
if err != nil {
panic(err)
}
s := string(b)
// check whether s contains the substring text
fmt.Println(strings.Contains(s, text))
}

If I read your question correctly, you want to read a file and determine whether a string entered at the prompt is in that file. I think the problem you are seeing comes from the delimiter handling in reader.ReadString('\n'), not from strings.Contains(): ReadString returns the line including the trailing '\n', so you end up searching for the text plus a newline.
In my opinion it is a little easier to do this with fmt.Scanln, which reads the input without the newline and returns a result that I'm pretty sure is what you want. Try this variation of your code:
package main
import (
"fmt"
"io/ioutil"
"strings"
)
func main() {
var text string
fmt.Print("Enter text: ")
// get the sub string to search from the user
fmt.Scanln(&text)
// read the whole file at once
b, err := ioutil.ReadFile("input.txt")
if err != nil {
panic(err)
}
s := string(b)
// check whether s contains the substring text
fmt.Println(strings.Contains(s, text))
}

I am just adding a flag to use command line arguments. If nothing is passed it will prompt you :).
package main
import (
"flag"
"fmt"
"io/ioutil"
"strings"
)
//Usage go run filename -text=dataYouAreLookingfor
//if looking for Nissan in file the command will be
// go run filename -text=Nissan
func main() {
var text string
// use it as cmdline argument
textArg := flag.String("text", "", "Text to search for")
flag.Parse()
// if the command-line argument was not passed, prompt for it
if *textArg == "" {
fmt.Print("Enter text: ")
// get the substring to search for from the user
fmt.Scanln(&text)
} else {
text = *textArg
}
// read the whole file at once
b, err := ioutil.ReadFile("input.txt")
if err != nil {
panic(err)
}
s := string(b)
// check whether s contains the substring text
fmt.Println(strings.Contains(s, text))
}

How to access a capturing group from regexp.ReplaceAllFunc?

How can I access a capture group from inside ReplaceAllFunc()?
package main
import (
"fmt"
"regexp"
)
func main() {
body := []byte("Visit this page: [PageName]")
search := regexp.MustCompile("\\[([a-zA-Z]+)\\]")
body = search.ReplaceAllFunc(body, func(s []byte) []byte {
// How can I access the capture group here?
})
fmt.Println(string(body))
}
The goal is to replace [PageName] with <a href="/view/PageName">PageName</a>.
This is the last task under the "Other tasks" section at the bottom of the Writing Web Applications Go tutorial.

I agree that having access to the capture group inside your function would be ideal, but I don't think it's possible with regexp.ReplaceAllFunc.
The only way I can think of right now to do this with that function is:
package main
import (
"fmt"
"regexp"
)
func main() {
body := []byte("Visit this page: [PageName] [OtherPageName]")
search := regexp.MustCompile("\\[[a-zA-Z]+\\]")
body = search.ReplaceAllFunc(body, func(s []byte) []byte {
m := string(s[1 : len(s)-1])
return []byte("<a href=\"/view/" + m + "\">" + m + "</a>")
})
fmt.Println(string(body))
}
EDIT
There is one other way I know to do what you want. The first thing you need to know is that you can specify a non-capturing group using the syntax (?:re), where re is your regular expression. This is not essential, but it reduces the number of uninteresting submatches.
The next thing to know is regexp.FindAllSubmatchIndex. It returns a slice of slices, where each inner slice holds the index ranges of all submatches for one match of the regexp.
With these two things, you can construct a somewhat generic solution:
package main
import (
"fmt"
"regexp"
)
func ReplaceAllSubmatchFunc(re *regexp.Regexp, b []byte, f func(s []byte) []byte) []byte {
idxs := re.FindAllSubmatchIndex(b, -1)
if len(idxs) == 0 {
return b
}
l := len(idxs)
ret := append([]byte{}, b[:idxs[0][0]]...)
for i, pair := range idxs {
// replace internal submatch with result of user supplied function
ret = append(ret, f(b[pair[2]:pair[3]])...)
if i+1 < l {
ret = append(ret, b[pair[1]:idxs[i+1][0]]...)
}
}
ret = append(ret, b[idxs[len(idxs)-1][1]:]...)
return ret
}
func main() {
body := []byte("Visit this page: [PageName] [OtherPageName][XYZ] [XY]")
search := regexp.MustCompile("(?:\\[)([a-zA-Z]+)(?:\\])")
body = ReplaceAllSubmatchFunc(search, body, func(s []byte) []byte {
m := string(s)
return []byte("<a href=\"/view/" + m + "\">" + m + "</a>")
})
fmt.Println(string(body))
}

If you want to get a group inside ReplaceAllFunc, you can call ReplaceAllString on the matched text to extract the subgroup.
package main
import (
"fmt"
"regexp"
)
func main() {
body := []byte("Visit this page: [PageName]")
search := regexp.MustCompile("\\[([a-zA-Z]+)\\]")
body = search.ReplaceAllFunc(body, func(s []byte) []byte {
// How can I access the capture group here?
group := search.ReplaceAllString(string(s), `$1`)
fmt.Println(group)
// handle group as you wish
newGroup := "<a href='/view/" + group + "'>" + group + "</a>"
return []byte(newGroup)
})
fmt.Println(string(body))
}
When there are several groups, you can get each of them the same way, then handle each group and return the desired value.

You have to call ReplaceAllFunc first and within the function call FindStringSubmatch on the same regex again. Like:
func (p parser) substituteEnvVars(data []byte) ([]byte, error) {
var err error
substituted := p.envVarPattern.ReplaceAllFunc(data, func(matched []byte) []byte {
varName := p.envVarPattern.FindStringSubmatch(string(matched))[1]
value := os.Getenv(varName)
if len(value) == 0 {
log.Printf("Fatal error substituting environment variable %s\n", varName)
}
return []byte(value)
})
return substituted, err
}
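When the replacement is a fixed template rather than something computed per match, ReplaceAllFunc may not be needed at all: ReplaceAll expands $1 (or ${1}) in the replacement to the first capture group. A minimal sketch for the [PageName] case, where linkPattern and linkify are illustrative names:

```go
package main

import (
	"fmt"
	"regexp"
)

// linkPattern matches [PageName] and captures the name without the brackets.
var linkPattern = regexp.MustCompile(`\[([a-zA-Z]+)\]`)

// linkify replaces every [PageName] with a link to /view/PageName.
// ${1} in the template is expanded to the first capture group.
func linkify(body []byte) []byte {
	return linkPattern.ReplaceAll(body, []byte(`<a href="/view/${1}">${1}</a>`))
}

func main() {
	body := []byte("Visit this page: [PageName]")
	fmt.Println(string(linkify(body)))
	// Visit this page: <a href="/view/PageName">PageName</a>
}
```

The braces in ${1} are optional here, but they avoid surprises when the group reference is directly followed by a letter, digit, or underscore.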

How to scan a big.Int from standard input in Go

Is there a way to scan a big.Int directly from the standard input in Go? Right now I'm doing this:
package main
import (
"fmt"
"math/big"
)
func main() {
w := new(big.Int)
var s string
fmt.Scan(&s)
fmt.Sscan(s, w)
fmt.Println(w)
}
I also could have used .SetString. But, is there a way to Scan the big.Int directly from the standard input without scanning a string or an integer first?

For example,
package main
import (
"fmt"
"math/big"
)
func main() {
w := new(big.Int)
n, err := fmt.Scan(w)
fmt.Println(n, err)
fmt.Println(w.String())
}
Input (stdin):
295147905179352825857
Output (stdout):
1 <nil>
295147905179352825857

As far as I know - no, there's no other way. In fact, what you've got is the default example they have for scanning big.Int in the documentation.
package main
import (
"fmt"
"log"
"math/big"
)
func main() {
// The Scan function is rarely used directly;
// the fmt package recognizes it as an implementation of fmt.Scanner.
i := new(big.Int)
_, err := fmt.Sscan("18446744073709551617", i)
if err != nil {
log.Println("error scanning value:", err)
} else {
fmt.Println(i)
}
}
You can see the relevant section here - http://golang.org/pkg/math/big/#Int.Scan
