I have a string in Golang that is surrounded by quote marks. My goal is to remove all quote marks on the sides, but to ignore all quote marks in the interior of the string. How should I go about doing this? My instinct tells me to use a RemoveAt function like in C#, but I don't see anything like that in Go.
For instance:
"hello""world"
should be converted to:
hello""world
For further clarification, this:
"""hello"""
would become this:
""hello""
because the outer ones should be removed ONLY.
Use a slice expression:
s = s[1 : len(s)-1]
If there's a possibility that the quotes are not present, then use this:
if len(s) > 0 && s[0] == '"' {
s = s[1:]
}
if len(s) > 0 && s[len(s)-1] == '"' {
s = s[:len(s)-1]
}
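The conditional trim above can also be written with the standard library's strings.TrimPrefix and strings.TrimSuffix, each of which removes at most one occurrence and leaves the string unchanged when the prefix or suffix is absent. A minimal sketch; the helper name trimOuterQuotes is hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// trimOuterQuotes (hypothetical helper) removes at most one leading and at
// most one trailing double quote. Each side is checked independently, so a
// quote on only one side is also removed.
func trimOuterQuotes(s string) string {
	s = strings.TrimPrefix(s, `"`)
	return strings.TrimSuffix(s, `"`)
}

func main() {
	for _, s := range []string{`"hello""world"`, `"""hello"""`, `hello"`, `"`} {
		fmt.Printf("%s -> %s\n", s, trimOuterQuotes(s))
	}
}
```

Like the if-based version, this strips an unpaired quote too; if you only want to strip a matching pair, check both ends before slicing.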
strings.Trim() removes every leading and trailing occurrence of the characters in its cutset argument (here the double quote), so it strips all the outer quotes rather than exactly one: `"""hello"""` becomes hello, not ""hello"". Quotes in the interior of the string are left untouched.
package main
import (
"fmt"
"strings"
)
func main() {
// strings.Trim() removes all the occurrences from the left and right
s := `"""hello"""`
fmt.Println("Before Trim: " + s) // Before Trim: """hello"""
fmt.Println("After Trim: " + strings.Trim(s, "\"")) // After Trim: hello
// strings.Trim() does not remove any occurrences from inside the string
s2 := `""Hello" " " "World""`
fmt.Println("\nBefore Trim: " + s2) // Before Trim: ""Hello" " " "World""
fmt.Println("After Trim: " + strings.Trim(s2, "\"")) // After Trim: Hello" " " "World
}
Playground link - https://go.dev/play/p/yLdrWH-1jCE
Use slice expressions. You should write robust code that provides correct output for imperfect input. For example,
package main
import "fmt"
func trimQuotes(s string) string {
if len(s) >= 2 {
if s[0] == '"' && s[len(s)-1] == '"' {
return s[1 : len(s)-1]
}
}
return s
}
func main() {
tests := []string{
`"hello""world"`,
`"""hello"""`,
`"`,
`""`,
`"""`,
`goodbye"`,
`"goodbye"`,
`goodbye"`,
`good"bye`,
}
for _, test := range tests {
fmt.Printf("`%s` -> `%s`\n", test, trimQuotes(test))
}
}
Output:
`"hello""world"` -> `hello""world`
`"""hello"""` -> `""hello""`
`"` -> `"`
`""` -> ``
`"""` -> `"`
`goodbye"` -> `goodbye"`
`"goodbye"` -> `goodbye`
`goodbye"` -> `goodbye"`
`good"bye` -> `good"bye`
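You can use slice expressions to remove the first and last byte of the string. Check the length first so that empty or one-character input does not panic.
package main
import "fmt"
func main() {
str := `"hello""world"`
// Check the length before indexing: str[0] panics on an empty string.
if len(str) > 0 && str[0] == '"' {
str = str[1:]
}
// Check again: a lone `"` is empty after the first trim, and i would be -1.
if i := len(str) - 1; i >= 0 && str[i] == '"' {
str = str[:i]
}
fmt.Println(str)
}
Since slicing a string shares the underlying bytes, this does not copy the string. It just makes str start one byte later and end one byte sooner.
This is also how the various bytes.Trim (and strings.Trim) functions work.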
A one-liner using regular expressions...
quoted = regexp.MustCompile(`^"(.*)"$`).ReplaceAllString(quoted,`$1`)
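But it doesn't necessarily handle escaped quotes the way you might want, and because . does not match a newline by default, a quoted value that spans several lines is returned unchanged.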
Translated from here.
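A minimal sketch of the regex one-liner turned into a reusable helper, assuming you also want multi-line values handled: the (?s) flag lets . match newlines, and the pattern is compiled once at package level instead of on every call. The names quotedRe and unquoteOuter are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// quotedRe matches a whole string that starts and ends with a double quote.
// (?s) lets . match '\n' too; the greedy (.*) keeps any interior quotes.
var quotedRe = regexp.MustCompile(`(?s)^"(.*)"$`)

// unquoteOuter (hypothetical helper) strips one pair of outer quotes if both
// are present; otherwise it returns s unchanged.
func unquoteOuter(s string) string {
	return quotedRe.ReplaceAllString(s, `$1`)
}

func main() {
	fmt.Println(unquoteOuter(`"""hello"""`))
	fmt.Println(unquoteOuter("\"multi\nline\""))
	fmt.Println(unquoteOuter(`"`))
}
```

A single `"` has no closing quote for the pattern to match, so it is returned as-is. Escaped quotes are still not interpreted in any special way.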
Related
When I type a command and add a space before pressing Enter, it works fine, but it doesn't work if there is no space.
I have tried several ways to fix this, but have been unable to.
package main
import (
"bufio"
"fmt"
"os"
"strings"
)
func main() {
var notes []string
for {
fmt.Print("Enter a command and data: ")
reader := bufio.NewReader(os.Stdin)
line, _ := reader.ReadString('\n')
var joinedNote string
var note []string
splittedString := strings.Split(line, " ")
if splittedString[0] == "create" && len(splittedString) > 1 {
i := 1
for ; i < len(splittedString); i++ {
note = append(note, splittedString[i])
}
joinedNote = strings.Join(note, "")
notes = append(notes, joinedNote)
fmt.Println("[OK] The note was successfully created")
}
if splittedString[0] == "list" || string(line) == "list" {
for i, noteList := range notes {
newNote := strings.TrimSpace(noteList)
fmt.Printf("[Info] %d: %s!\n", i, newNote)
}
}
if splittedString[0] == "clear" || line == "clear" {
notes = nil
fmt.Println("[OK] All notes were successfully deleted")
}
if splittedString[0] == "exit" || line == "exit" {
fmt.Println("[Info] Bye!")
os.Exit(0)
}
}
}
The reason is that the line you capture from the user includes the trailing \n, and without a space before it, the \n gets tacked onto the word you are looking for (create\n does not equal create). The easiest fix is to remove the trailing \n yourself with line = line[:len(line)-1], although strings.TrimSpace(line) is more robust: it also removes the \r that precedes the \n on Windows, and it does not panic on an empty line.
Here is a slightly deeper dive. First, the documentation for ReadString says that it includes the delimiter you give it, in this case \n:
ReadString reads until the first occurrence of delim in the input, returning a string containing the data up to and including the delimiter.
So line will always end with \n unless you remove it yourself, or unless the input ends before a newline, in which case ReadString also returns a non-nil error (often io.EOF).
Your code worked when the word was followed by a space because strings.Split(line, " ") turned the input create \n into {"create", "\n"}, so splittedString[0] was exactly "create".
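A minimal sketch of the loop with the line terminator handled up front, assuming Go 1.18+ for strings.Cut. parseCommand is a hypothetical helper; the messages mirror the question's code, and the reader is created once outside the loop, since a fresh bufio.Reader per iteration can discard input the previous one had already buffered.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// parseCommand (hypothetical helper) drops the line terminator ("\n" or
// "\r\n") and surrounding spaces, then splits off the first word as the
// command and returns the rest of the line, unchanged, as the data.
func parseCommand(line string) (cmd, data string) {
	line = strings.TrimSpace(line)
	cmd, data, _ = strings.Cut(line, " ")
	return cmd, strings.TrimSpace(data)
}

func main() {
	var notes []string
	reader := bufio.NewReader(os.Stdin) // created once, outside the loop
	for {
		fmt.Print("Enter a command and data: ")
		line, err := reader.ReadString('\n')
		switch cmd, data := parseCommand(line); cmd {
		case "create":
			notes = append(notes, data)
			fmt.Println("[OK] The note was successfully created")
		case "list":
			for i, n := range notes {
				fmt.Printf("[Info] %d: %s!\n", i, n)
			}
		case "clear":
			notes = nil
			fmt.Println("[OK] All notes were successfully deleted")
		case "exit":
			fmt.Println("[Info] Bye!")
			return
		}
		if err != nil { // e.g. io.EOF: the input ended
			return
		}
	}
}
```

Because the command is compared after trimming, "list", "list " and "list\r\n" all behave the same.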
Let's say I have a string called varString.
varString := "Bob,Mark,"
QUESTION: How to remove the last letter from the string? In my case, it's the second comma.
How to remove the last letter from the string?
In Go, strings are conventionally UTF-8 encoded (string literals in Go source always are). Unicode UTF-8 is a variable-length character encoding which uses one to four bytes per Unicode character (code point).
For example,
package main
import (
"fmt"
"unicode/utf8"
)
func trimLastChar(s string) string {
r, size := utf8.DecodeLastRuneInString(s)
if r == utf8.RuneError && (size == 0 || size == 1) {
size = 0
}
return s[:len(s)-size]
}
func main() {
s := "Bob,Mark,"
fmt.Println(s)
s = trimLastChar(s)
fmt.Println(s)
}
Playground: https://play.golang.org/p/qyVYrjmBoVc
Output:
Bob,Mark,
Bob,Mark
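Here's a much simpler method that works for Unicode strings too, at the cost of converting the whole string to a []rune:
func removeLastRune(s string) string {
r := []rune(s)
if len(r) == 0 {
return s // nothing to remove; avoids a slice-bounds panic
}
return string(r[:len(r)-1])
}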
Playground link: https://play.golang.org/p/ezsGUEz0F-D
Something like this:
s := "Bob,Mark,"
s = s[:len(s)-1]
Note that this does not work if the last character takes more than one byte (any non-ASCII character), and it panics on an empty string.
newStr := strings.TrimRightFunc(str, func(r rune) bool {
return !unicode.IsLetter(r) // or any other validation can go here
})
This trims every trailing rune that isn't a letter, so it removes all trailing commas, spaces, digits and so on, not just the last character.
I'm trying to replace all HTML tags such as <div> and </div> with an empty string ("") in Go, using the regex pattern ^[^.\/]*$/g to match all closing tags, e.g. </div>.
My solution:
package main
import (
"fmt"
"regexp"
)
const Template = `^[^.\/]*$/g`
func main() {
r := regexp.MustCompile(Template)
s := "afsdf4534534!##!!#<div>345345afsdf4534534!##!!#</div>"
res := r.ReplaceAllString(s, "")
fmt.Println(res)
}
But the output is the same as the source string. What's wrong? Please help. Thanks.
Expected result: "afsdf4534534!##!!#345345afsdf4534534!##!!#"
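A short sketch of why the question's pattern leaves the input unchanged. Go's regexp package has no /.../g delimiters or flags, so /g is parsed as two literal characters that would have to appear after $ (the end of the text), and the pattern can never match; ReplaceAllString already replaces every match. The pattern <[^>]*> is a swapped-in illustration that gives the expected result for this input, and like any regex approach it is not robust against malformed HTML. Variable names are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

// questionRe is the pattern from the question: "/g" is literal text here.
var questionRe = regexp.MustCompile(`^[^.\/]*$/g`)

// tagRe (illustrative) matches anything from `<` up to the next `>`.
var tagRe = regexp.MustCompile(`<[^>]*>`)

func main() {
	s := "afsdf4534534!##!!#<div>345345afsdf4534534!##!!#</div>"
	fmt.Println(questionRe.MatchString(s)) // false
	// No "g" flag is needed: ReplaceAllString replaces all matches.
	fmt.Println(tagRe.ReplaceAllString(s, ""))
}
```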
For those who came here looking for a quick solution, there is a library that does this: bluemonday.
Package bluemonday provides a way of describing a whitelist of HTML elements and attributes as a policy, and for that policy to be applied to untrusted strings from users that may contain markup. All elements and attributes not on the whitelist will be stripped.
package main
import (
"fmt"
"github.com/microcosm-cc/bluemonday"
)
func main() {
// Do this once for each unique policy, and use the policy for the life of the program
// Policy creation/editing is not safe to use in multiple goroutines
p := bluemonday.StripTagsPolicy()
// The policy can then be used to sanitize lots of input and it is safe to use the policy in multiple goroutines
html := p.Sanitize(
`<a onblur="alert(secret)" href="http://www.google.com">Google</a>`,
)
// Output:
// Google
fmt.Println(html)
}
https://play.golang.org/p/jYARzNwPToZ
The Problem with RegEx
This is a very simple RegEx replace method that removes HTML tags from well-formatted HTML in a string.
strip_html_regex.go
package main
import "regexp"
const regex = `<.*?>`
// stripHtmlRegex uses a regular expression to remove HTML tags.
func stripHtmlRegex(s string) string {
r := regexp.MustCompile(regex)
return r.ReplaceAllString(s, "")
}
Note: this does not work well with malformed HTML. Don't use this.
A better way
Since a Go string can be indexed and sliced like a slice of bytes, it is easy to walk through the string and find the portions that are not inside an HTML tag. When we identify a valid portion of the string, we can simply take a slice of that portion and append it to a strings.Builder.
strip_html.go
package main
import (
"strings"
"unicode/utf8"
)
const (
htmlTagStart = 60 // Unicode `<`
htmlTagEnd = 62 // Unicode `>`
)
// Aggressively strips HTML tags from a string.
// It will only keep anything between `>` and `<`.
func stripHtmlTags(s string) string {
// Setup a string builder and allocate enough memory for the new string.
var builder strings.Builder
builder.Grow(len(s) + utf8.UTFMax)
in := false // True if we are inside an HTML tag.
start := 0 // The index of the previous start tag character `<`
end := 0 // The index of the previous end tag character `>`
for i, c := range s {
// Keep going if the character is not `<` or `>`
if c != htmlTagStart && c != htmlTagEnd {
continue
}
if c == htmlTagStart {
// Only update the start (and write the text before it) if we are not
// already in a tag. This makes sure we strip out `<<br>`, not just `<br>`.
if !in {
start = i
// Write the valid string between the previous tag end and this tag start.
builder.WriteString(s[end:start])
}
in = true
continue
}
// else c == htmlTagEnd
in = false
end = i + 1
}
// Save any text after the last `>`, unless we finished inside an unclosed tag.
// Doing this after the loop also works when the final character is multi-byte.
if !in {
builder.WriteString(s[end:])
}
s = builder.String()
return s
}
If we run these two functions on the OP's text and on some malformed HTML, you will see that the regex version does not give consistent results.
main.go
package main
import "fmt"
func main() {
s := "afsdf4534534!##!!#<div>345345afsdf4534534!##!!#</div>"
res := stripHtmlTags(s)
fmt.Println(res)
// Malformed HTML examples
fmt.Println("\n:: stripHTMLTags ::\n")
fmt.Println(stripHtmlTags("Do something <strong>bold</strong>."))
fmt.Println(stripHtmlTags("h1>I broke this</h1>"))
fmt.Println(stripHtmlTags("This is <a href='#'>>broken link</a>."))
fmt.Println(stripHtmlTags("I don't know ><where to <<em>start</em> this tag<."))
// Regex Malformed HTML examples
fmt.Println(":: stripHtmlRegex ::\n")
fmt.Println(stripHtmlRegex("Do something <strong>bold</strong>."))
fmt.Println(stripHtmlRegex("h1>I broke this</h1>"))
fmt.Println(stripHtmlRegex("This is <a href='#'>>broken link</a>."))
fmt.Println(stripHtmlRegex("I don't know ><where to <<em>start</em> this tag<."))
}
Output:
afsdf4534534!##!!#345345afsdf4534534!##!!#
:: stripHTMLTags ::
Do something bold.
I broke this
This is broken link.
start this tag
:: stripHtmlRegex ::
Do something bold.
h1>I broke this
This is >broken link.
I don't know >start this tag<.
Note that the RegEx method does not consistently remove all HTML tags. To be honest, I am not good enough at RegEx to write a pattern that properly handles stripping HTML.
Benchmarks
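Aside from being safer and more aggressive at stripping malformed HTML tags, stripHtmlTags is about 4 times faster than stripHtmlRegex in this benchmark. Note that stripHtmlRegex, as written, recompiles its regular expression on every call, which likely accounts for part of that gap.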
> go test -run=Calculate -bench=.
goos: windows
goarch: amd64
BenchmarkStripHtmlRegex-8 51516 22726 ns/op
BenchmarkStripHtmlTags-8 230678 5135 ns/op
If you want to replace all HTML tags, use a tag-stripping package such as html-strip-tags-go.
Using a regex to match HTML tags is not a good idea.
package main
import (
"fmt"
"github.com/grokify/html-strip-tags-go"
)
func main() {
text := "afsdf4534534!##!!#<div>345345afsdf4534534!##!!#</div>"
stripped := strip.StripTags(text)
fmt.Println(text)
fmt.Println(stripped)
}
Starting from Daniel Morell's function, I have created another function with some more options.
I am sharing it here in case it is useful to someone:
// CreateCleanWords takes a string and returns a slice with all the words in the string.
// It uses the htmlTagStart/htmlTagEnd constants from the stripHtmlTags answer
// and the strings, unicode and unicode/utf8 packages.
// rules:
// words of length >= minAcceptedLenght
// everything between < and > is discarded
// admitted characters: numbers, letters, and all characters in the validRunes map
// words not present in the wordBlackList map
// word separators are space or single quote (could be improved with a map of separators)
func CreateCleanWords(s string) []string {
// Setup a string builder and allocate enough memory for the new string.
var builder strings.Builder
builder.Grow(len(s) + utf8.UTFMax)
insideTag := false // True if we are inside an HTML tag.
var c rune
var managed bool = false
var valid bool = false
var finalWords []string
var singleQuote rune = '\''
var minAcceptedLenght = 4
var wordBlackList map[string]bool = map[string]bool{
"sull": false,
"sullo": false,
"sulla": false,
"sugli": false,
"sulle": false,
"alla": false,
"all": false,
"allo": false,
"agli": false,
"alle": false,
"dell": false,
"della": false,
"dello": false,
"degli": false,
"delle": false,
"dall": false,
"dalla": false,
"dallo": false,
"dalle": false,
"dagli": false,
}
var validRunes map[rune]bool = map[rune]bool{
'à': true,
'è': true,
'é': true,
'ì': true,
'ò': true,
'ù': true,
'€': true,
'$': true,
'£': true,
'-': true,
}
for _, c = range s {
managed = false
valid = false
//show := string(c)
//fmt.Println(show)
// found < from here on ignore characters
if !managed && c == htmlTagStart {
insideTag = true
managed = true
valid = false
}
// found > characters are valid now
if !managed && c == htmlTagEnd {
insideTag = false
managed = true
valid = false
}
// if we are inside an HTML tag, we don't check anything because we won't take anything
// until we reach the tag end
if !insideTag {
if !managed && unicode.IsSpace(c) || c == singleQuote {
// found space if I have a valid word let's add it to word array
// only bigger than 3 letters
if builder.Len() >= minAcceptedLenght {
word := strings.ToLower((builder).String())
//first check if the word is not in a black list
if _, ok := wordBlackList[word]; !ok {
// the word is not in blacklist let's add to finalWords
finalWords = append(finalWords, word)
}
}
// make builder ready for next token
builder.Reset()
valid = false
managed = true
}
// letters and digits are welcome
if !managed {
valid = unicode.IsLetter(c) || unicode.IsDigit(c)
managed = valid
}
// other italian runes accepted
if !managed {
_, valid = validRunes[c]
}
if valid {
builder.WriteRune(c)
}
}
}
// remember to check the last word after exiting from for!
// Apply the same rules as inside the loop: same length test, lowercased word.
if builder.Len() >= minAcceptedLenght {
word := strings.ToLower(builder.String())
//first check if the word is not in a black list
if _, ok := wordBlackList[word]; !ok {
// the word is not in blacklist let's add to finalWords
finalWords = append(finalWords, word)
}
builder.Reset()
}
return finalWords
}
We tried this in production, but under certain corner cases none of the proposed solutions really work. If you need something robust, check out the unexported tag-stripping function in the Go standard library (the html-strip-tags-go package is basically an export of it, under the BSD-3 license), or https://github.com/microcosm-cc/bluemonday, a popular library (also BSD-3) that we ended up using.
Improvement on Daniel Morell's answer. The difference concerns how len counts UTF-8 characters: len returns a number of bytes, and each UTF-8 character takes 1 to 4 bytes, so len("è") evaluates to 2. To handle that, this version converts the string to a []rune and works with rune indices throughout.
https://go.dev/play/p/xo7Mrx5qw-_J
// Aggressively strips HTML tags from a string.
// It will only keep anything between `>` and `<`.
func stripHTMLTags(s string) string {
// Supports UTF-8: some characters take more than 1 byte, i.e. len("è") -> 2,
// so all indices below are rune indices into d, never byte indices into s.
d := []rune(s)
// Setup a string builder and allocate enough memory for the new string.
// The builder counts bytes, so size it from len(s), not len(d).
var builder strings.Builder
builder.Grow(len(s) + utf8.UTFMax)
in := false // True if we are inside an HTML tag.
start := 0 // The rune index of the previous start tag character `<`
end := 0 // The rune index just after the previous end tag character `>`
for i, c := range d {
// Keep going if the character is not `<` or `>`
if c != htmlTagStart && c != htmlTagEnd {
continue
}
if c == htmlTagStart {
// Only update the start (and write the text before it) if we are not
// already in a tag. This makes sure we strip out `<<br>`, not just `<br>`.
if !in {
start = i
builder.WriteString(string(d[end:start]))
}
in = true
continue
}
// else c == htmlTagEnd
in = false
end = i + 1
}
// Save any text after the last `>`, unless we finished inside an unclosed tag.
if !in {
builder.WriteString(string(d[end:]))
}
s = builder.String()
return s
}
I want to split a string of the form 'a/b/c' and allow escaping with '\'.
For example:
'foo/bar\/2.2/baz':
a=foo
b=bar/2.2
c=baz
Is there any elegant way to split by '/', ignoring '\/'?
You have two basic approaches available to you, regardless of the language you're using.
Search for all occurrences of / that are not immediately preceded by \ and perform a split.
Replace all instances of \/ with some unique symbol that does not contain /, then split on /, and finally replace the unique symbol in each part with / (or with \/ again, if you want to keep the escape).
From a computational standpoint, the former will be more efficient.
From a coding complexity standpoint, the latter will likely be easier to write.
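You can use something in this style:
import "bytes"
// split splits str on '/'; a backslash makes the next character literal,
// so `\/` becomes part of the current field instead of splitting it.
func split(str string) []string {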
var parts []string
var current bytes.Buffer
escaped := false
for _, r := range str {
if r == '\\' && !escaped {
escaped = true
} else if r == '/' && !escaped {
parts = append(parts, current.String())
current.Reset()
} else {
escaped = false
current.WriteRune(r)
}
}
parts = append(parts, current.String())
return parts
}
Go Playground link : https://play.golang.org/p/RwLwFlsAW2Q
I have some strings such as E2 9NZ, N29DZ, EW29DZ. I need to extract the chars before the first digit; given the above examples: E, N, EW.
Am I supposed to use regex? The strings package looks really nice but just doesn't seem to handle this case (extract everything before a specific type of character).
Edit:
To clarify the "question": I'm wondering which method is more idiomatic in Go and likely to provide better performance.
For example,
package main
import (
"fmt"
"unicode"
)
func DigitPrefix(s string) string {
for i, r := range s {
if unicode.IsDigit(r) {
return s[:i]
}
}
return s
}
func main() {
fmt.Println(DigitPrefix("E2 9NZ"))
fmt.Println(DigitPrefix("N29DZ"))
fmt.Println(DigitPrefix("EW29DZ"))
fmt.Println(DigitPrefix("WXYZ"))
}
Output:
E
N
EW
WXYZ
If there is no digit, example "WXYZ", and you don't want anything returned, change return s to return "".
Not sure why almost everyone provided answers in everything but Go. Here is a regex-based Go version:
package main
import (
"fmt"
"regexp"
)
func main() {
pattern, err := regexp.Compile("^[^\\d]*")
if err != nil {
panic(err)
}
part := pattern.Find([]byte("EW29DZ"))
if part != nil {
fmt.Printf("Found: %s\n", string(part))
} else {
fmt.Println("Not found")
}
}
Running:
% go run main.go
Found: EW
We don't need regex for this problem. You can walk through a slice of runes and check each character with unicode.IsDigit(): if it's a digit, return the runes before it; if it isn't, continue the loop. If there are no digits, return the argument unchanged.
Code
package main
import (
"fmt"
"unicode"
)
func UntilDigit(r []rune) []rune {
var i int
for _, v := range r {
if unicode.IsDigit(v) {
return r[0:i]
}
i++
}
return r
}
func main() {
fmt.Println(string(UntilDigit([]rune("E2 9NZ"))))
fmt.Println(string(UntilDigit([]rune("N29DZ"))))
fmt.Println(string(UntilDigit([]rune("EW29DZ"))))
}
I think the best option is to use the index returned by strings.IndexAny, which returns the index of the first occurrence in a string of any of the given characters.
func BeforeNumbers(str string) string {
value := strings.IndexAny(str,"0123456789")
if value >= 0 && value <= len(str) {
return str[:value]
}
return str
}
This slices the string and returns the substring up to (but not including) the first character that appears in "0123456789", i.e. the first digit.
Way later edit:
It would probably be better to use IndexFunc rather than IndexAny:
func BeforeNumbers(str string) string {
indexFunc := func(r rune) bool {
return r >= '0' && r <= '9'
}
value := strings.IndexFunc(str,indexFunc)
if value >= 0 && value <= len(str) {
return str[:value]
}
return str
}
This is more or less equivalent to the loop version, and it avoids checking every character against the "0123456789" set as my previous answer does. I think it looks cleaner than the loop version, but that is obviously a matter of taste.
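The closure can also be replaced by unicode.IsDigit, which already has the func(rune) bool signature that strings.IndexFunc expects. A minimal sketch with a hypothetical helper name; note that, unlike the '0'..'9' check, unicode.IsDigit also accepts non-ASCII decimal digits.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// beforeFirstDigit (hypothetical helper) returns the part of s before the
// first rune for which unicode.IsDigit is true, or all of s if there is none.
// IndexFunc returns a byte index, so s[:i] is always a valid slice.
func beforeFirstDigit(s string) string {
	if i := strings.IndexFunc(s, unicode.IsDigit); i >= 0 {
		return s[:i]
	}
	return s
}

func main() {
	for _, s := range []string{"E2 9NZ", "N29DZ", "EW29DZ", "WXYZ"} {
		fmt.Println(beforeFirstDigit(s))
	}
}
```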
The Java code below keeps grabbing characters until it reaches a digit (or the end of the string).
int i = 0;
String string2test = "EW29DZ";
String stringOutput = "";
while (i < string2test.length() && !Character.isDigit(string2test.charAt(i)))
{
stringOutput = stringOutput + string2test.charAt(i);
i++;
}