How to split a string but keep delimiters as separate elements - go

I have several strings that include various symbols like the following two examples:
z=y+x
#symbol
and I want to split the strings such that I have the resulting slices:
[z = y + x]
[# symbol]
A few things I've looked at and tried:
I've looked at this question but it seems as though golang doesn't support lookarounds.
I know this solution exists using strings.SplitAfter, but I'm looking to have the delimiters as separate elements.
I tried replacing the symbol (e.g. "+") with some variant (e.g. "~+~") and doing a split on the surrounding characters (e.g. "~"), but this solution is far from elegant and runs into problems if I need to do a conditional replacement depending on the symbol (which golang doesn't seem to support either).
Perhaps I've misunderstood some of the previous questions and their respective solutions.

I used a modified version of Go's strings.Split implementation https://golang.org/src/strings/strings.go?s=7505:7539#L245
func Test(te *testing.T) {
	t := tester.New(te)
	t.Assert().Equal(splitCharsInclusive("z=y+x", "=+"), []string{"z", "=", "y", "+", "x"})
	t.Assert().Equal(splitCharsInclusive("#symbol", "#"), []string{"", "#", "symbol"})
}
func splitCharsInclusive(s, chars string) (out []string) {
	for {
		m := strings.IndexAny(s, chars)
		if m < 0 {
			break
		}
		out = append(out, s[:m], s[m:m+1])
		s = s[m+1:]
	}
	out = append(out, s)
	return
}
This is limited to single characters to split on. And passing something like splitCharsInclusive("(z)(y)(x)", "()") might not get you the output you want, as you'd get a few empty strings in the response. But hopefully this is a good starting point for the modifications you need.
Also, the Go version that I've linked calculates the length of the output slice in advance. That is a nice optimization that I've decided to omit, but it would likely be worth adding back.


Kotlin map not working with List of String

I have been working on code where I have to generate all possible ways to the target string. I am using the below-mentioned code.
Print Statement:
println("---------- How Construct -------")
println("${
window.howConstruct("purple", listOf(
"purp",
"p",
"ur",
"le",
"purpl"
))
}")
Function Call:
fun howConstruct(
    target: String,
    wordBank: List<String>,
): List<List<String>> {
    if (target.isEmpty()) return emptyList()
    var result = emptyList<List<String>>()
    for (word in wordBank) {
        if (target.indexOf(word) == 0) { // Starting with prefix
            val substring = target.substring(word.length)
            val suffixWays = howConstruct(substring, wordBank)
            val targetWays = suffixWays.map { way ->
                val a = way.toMutableList().apply {
                    add(word)
                }
                a.toList()
            }
            result = targetWays
        }
    }
    return result
}
Expected Output:-
[['purp','le'],['p','ur','p','le']]
Current Output:-
[]
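Your code is almost working; only a few small changes are needed to get the required output:
If the target is empty, return listOf(emptyList()) instead of emptyList().
Use add(0, word) instead of add(word).
Accumulate the matches with result = result + targetWays instead of result = targetWays; otherwise each matching word overwrites the matches found for earlier words (here the last word, "purpl", is a prefix of the target but leads nowhere, so it would leave the result empty).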
The first of those changes is the important one. Your function returns a list of matches; and since each match is itself a list of strings, it returns a list of lists of strings. Once your code has matched the entire target and calls itself one last time, it returned an empty list — i.e. no matches — instead of a list containing an empty list — meaning one match with no remaining strings.
The second change simply fixes the order of strings within each match, which was reversed (because it appended the prefix after the returned suffix match).
However, there are many other ways that code could be improved. Rather than list them all individually, it's probably easier to give an alternative version:
fun howConstruct(target: String, wordBank: List<String>
): List<List<String>>
    = if (target == "") listOf(emptyList())
    else wordBank.filter{ target.endsWith(it) } // Look for suffixes of the target in the word bank
        .flatMap { suffix: String ->
            howConstruct(target.removeSuffix(suffix), wordBank) // For each, recurse to search the rest
                .map{ it + suffix } } // And append the suffix to each match.
That does almost exactly the same as your code, except that it searches from the end of the string — matching suffixes — instead of from the beginning. The result is the same; the main benefit is that it's simpler to append a suffix string to a partial match list (using +) than to prepend a prefix (which is quite messy, as you found).
However, it's a lot more concise, mainly because it uses a functional style — in particular, it uses filter() to determine which words are valid suffixes, and flatMap() to collate the list of matches corresponding to each one recursively, as well as map() to append the suffix to each one (like your code does). That avoids all the business of looping over lists, creating lists, and adding to them. As a result, it doesn't need to deal with mutable lists or variables, avoiding some sources of confusion and error.
I've written it as an expression body (with = instead of { … }) for simplicity. I find that's simpler and clearer for short functions — this one is about the limit, though. It might fit as an extension function on String, since it's effectively returning a transformation of the string, without any side-effects — though again, that tends to work best on short functions.
There are also several small tweaks. It's a bit simpler — and more efficient — to use startsWith() or endsWith() instead of indexOf(); removePrefix() or removeSuffix() is arguably slightly clearer than substring(); and I find == "" clearer than isEmpty().
(Also, the name howConstruct() doesn't really describe the result very well, but I haven't come up with anything better so far…)
Many of these changes are of course a matter of personal preference, and I'm sure other developers would write it in many other ways! But I hope this has given some ideas.

Pointer operation isn't changing its reference within a slice

I just started learning go. I have a question about pointers.
In the code below, the following line in the code doesn't do what I expect:
last_line.Next_line = &line // slice doesn't change
I want the slice to be changed as well, not only the local variable last_line.
What am I doing wrong?
type Line struct {
	Text      string
	Prev_line *Line
	Next_line *Line
}
var (
	lines     []Line
	last_line *Line
)
for i, record := range records {
	var prev_line *Line = nil
	text := record[0]
	if i > 0 {
		prev_line = &lines[i-1]
	}
	line := Line{
		Text:      text,
		Prev_line: prev_line,
		Next_line: nil}
	if last_line != nil {
		last_line.Next_line = &line // slice doesn't change
	}
	lines = append(lines, line)
	last_line = &line
}
Your Line type is a fairly standard-looking doubly linked list. Your lines variable holds a slice of these objects. Combining these two is a bit unusual—not wrong, to be sure, just unusual. And, as Matt Oestreich notes in a comment, we don't know quite what is in records (just that range can be used on it and that after doing so, we can use record[0] to get to a single string value), so there might be better ways to deal with things.
If records itself is a slice or has a sensible len, we can allocate a slice of Line instances all at once, of the appropriate size:
lines = make([]Line, len(records))
Here is a sample on the Go Playground that does it this way.
If we can't really get a suitable len—e.g., if records is a channel whose length is not really relevant—then we might indeed want to allocate individual lines, but in this case, it may be more sensible to avoid keeping them as a slice in the first place. The doubly linked list alone will suffice.
Finally, if you really do want both a slice and this doubly linked list, note that using append may copy the slice's elements to a new, larger slice. If and when it does so, the pointers in any elements you set up earlier will point into the old, smaller slice. This is not invalid in terms of the language itself—those objects still exist and your pointers are keeping them "alive"—but it may not be what you intended at all. In this case, it makes more sense to set all the pointers at the end, after building up the lines slice, just as in the sample code I provided.
(The sample I wrote is deliberately slightly weird in a way that is likely to get your homework or test grade knocked down a bit, if this was an attempt to cheat on homework or a test. :-) )

How to convert global enum values to string in Godot?

The "GlobalScope" class defines many fundamental enums like the Error enum.
I'm trying to produce meaningful logs when an error occurs. However printing a value of type Error only prints the integer, which is not very helpful.
The Godot documentation on enums indicates that looking up the value should work in a dictionary-like fashion. However, trying to access Error[error_value] fails with:
The identifier "Error" isn't declared in the current scope.
How can I convert such enum values to string?
In the documentation you referenced, it explains that enums basically just create a bunch of constants:
enum {TILE_BRICK, TILE_FLOOR, TILE_SPIKE, TILE_TELEPORT}
# Is the same as:
const TILE_BRICK = 0
const TILE_FLOOR = 1
const TILE_SPIKE = 2
const TILE_TELEPORT = 3
However, the names of the identifiers of these constants only exist to make it easier for humans to read the code. They are replaced at runtime with something the machine can use, and are inaccessible later. If I want to print an identifier's name, I have to do so manually:
# Manually print TILE_FLOOR's name as a string, then its value.
print("The value of TILE_FLOOR is ", TILE_FLOOR)
So if your goal is to have descriptive error output, you should do so in a similar way, perhaps like so:
if unexpected_bug_found:
    # Manually print the error description, then actually return the value.
    print("ERR_BUG: There was an unexpected bug!")
    return ERR_BUG
Now the relationship with dictionaries is that dictionaries can be made to act like enumerations, not the other way around. Enumerations are limited to be a list of identifiers with integer assignments, which dictionaries can do too. But they can also do other cool things, like have identifiers that are strings, which I believe you may have been thinking of:
const MyDict = {
    NORMAL_KEY = 0,
    'STRING_KEY' : 1, # uses a colon instead of equals sign
}
func _ready():
    print("MyDict.NORMAL_KEY is ", MyDict.NORMAL_KEY) # valid
    print("MyDict.STRING_KEY is ", MyDict.STRING_KEY) # valid
    print("MyDict[NORMAL_KEY] is ", MyDict[NORMAL_KEY]) # INVALID
    print("MyDict['STRING_KEY'] is ", MyDict['STRING_KEY']) # valid
    # Dictionary['KEY'] only works if the key is a string.
This is useful in its own way, but even in this scenario, we assume to already have the string matching the identifier name explicitly in hand, meaning we may as well print that string manually as in the first example.
The naive approach I use, in a singleton (in fact, a file that contains a lot of static funcs, referenced by a class_name):
static func get_error(global_error_constant:int) -> String:
    var info := Engine.get_version_info()
    var version := "%s.%s" % [info.major, info.minor]
    var default := ["OK","FAILED","ERR_UNAVAILABLE","ERR_UNCONFIGURED","ERR_UNAUTHORIZED","ERR_PARAMETER_RANGE_ERROR","ERR_OUT_OF_MEMORY","ERR_FILE_NOT_FOUND","ERR_FILE_BAD_DRIVE","ERR_FILE_BAD_PATH","ERR_FILE_NO_PERMISSION","ERR_FILE_ALREADY_IN_USE","ERR_FILE_CANT_OPEN","ERR_FILE_CANT_WRITE","ERR_FILE_CANT_READ","ERR_FILE_UNRECOGNIZED","ERR_FILE_CORRUPT","ERR_FILE_MISSING_DEPENDENCIES","ERR_FILE_EOF","ERR_CANT_OPEN","ERR_CANT_CREATE","ERR_QUERY_FAILED","ERR_ALREADY_IN_USE","ERR_LOCKED","ERR_TIMEOUT","ERR_CANT_CONNECT","ERR_CANT_RESOLVE","ERR_CONNECTION_ERROR","ERR_CANT_ACQUIRE_RESOURCE","ERR_CANT_FORK","ERR_INVALID_DATA","ERR_INVALID_PARAMETER","ERR_ALREADY_EXISTS","ERR_DOES_NOT_EXIST","ERR_DATABASE_CANT_READ","ERR_DATABASE_CANT_WRITE","ERR_COMPILATION_FAILED","ERR_METHOD_NOT_FOUND","ERR_LINK_FAILED","ERR_SCRIPT_FAILED","ERR_CYCLIC_LINK","ERR_INVALID_DECLARATION","ERR_DUPLICATE_SYMBOL","ERR_PARSE_ERROR","ERR_BUSY","ERR_SKIP","ERR_HELP","ERR_BUG","ERR_PRINTER_ON_FIRE"]
    match version:
        "3.4":
            return default[global_error_constant]
    # Regexp to use on @GlobalScope documentation
    # \s+=\s+.+ replace by nothing
    # (\w+)\s+ replace by "$1", (with quotes and comma)
    printerr("you must check and add %s version in get_error()" % version)
    return default[global_error_constant]
So print(MyClass.get_error(err)) or assert(!err, MyClass.get_error(err)) is handy.
For non-globals I made the functions below; that was not your question, but it is highly related.
It would be useful to be able to access @GlobalScope and @GDScript in the same way; perhaps that isn't possible because of a memory cost?
static func get_enum_flags(_class:String, _enum:String, flags:int) -> PoolStringArray:
    var ret := PoolStringArray()
    var enum_flags := ClassDB.class_get_enum_constants(_class, _enum)
    for i in enum_flags.size():
        if (1 << i) & flags:
            ret.append(enum_flags[i])
    return ret
static func get_constant_or_enum(_class:String, number:int, _enum:="") -> String:
    if _enum:
        return ClassDB.class_get_enum_constants(_class, _enum)[number]
    return ClassDB.class_get_integer_constant_list(_class)[number]

Why would you use fmt.Sprint?

I really don't understand the benefit of using fmt.Sprint compared to add strings together with +. Here is an example of both in use:
func main() {
myString := fmt.Sprint("Hello", "world")
fmt.Println(myString)
}
and
func main() {
myString := "Hello " + "World"
fmt.Println(myString)
}
What is the differences and benefits of each?
In your example there is no real difference, as you are using Sprint simply to concatenate strings. That is indeed something that can be solved more easily with the '+' operator.
Take the following example, where you want to print a clear error message like "Product with ID '42' could not be found.". How does that look with your bottom approach?
productID := 42
myString := "Product with ID '" + productID + "' could not be found."
This would give an error (mismatched types string and int), because Go does not support concatenating values of different types.
So you would have to transform the type to a string first.
productID := 42
myString := "Product with ID '" + strconv.Itoa(productID) + "' could not be found."
And you would have to do this for every data type other than string.
The fmt package in Go, like similar formatting packages in almost any other language, solves this by handling the conversions for you and keeping your strings clear of masses of '+' operators.
Here is how the example would look using fmt:
product := 42
myString := fmt.Sprintf("Product with ID '%d' could not be found.", product)
Here %d is the formatting verb for 'print the argument as a base-10 integer'. See https://golang.org/pkg/fmt/#hdr-Printing for the various other ways of printing other types.
Compared to concatenation, fmt lets you write your strings in a clear way, separating the template text from the variables. It also makes printing data types other than strings much simpler.
fmt.Sprint is good for concatenating parameters of different types, as it uses reflection under the hood. So if you only need to concatenate strings, use "+", which is much faster; but if you need to concatenate a number with a string, fmt.Sprint does it directly:
message := fmt.Sprint(500, " internal server error") // leading space needed: Sprint adds a space only between two non-string operands
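One detail worth keeping in mind when using fmt.Sprint this way is its spacing rule: it adds a space between operands only when neither of them is a string, whereas fmt.Sprintln always separates operands with spaces and appends a newline. The small sketch below (the function name sprintExamples is just for illustration) shows the difference.

```go
package main

import "fmt"

// sprintExamples collects a few results to show where spaces are inserted.
func sprintExamples() []string {
	return []string{
		fmt.Sprint("Hello", "world"),               // both strings: no space added
		fmt.Sprint(500, "internal server error"),   // one string: no space added
		fmt.Sprint(500, 42),                        // neither is a string: space added
		fmt.Sprintln("Hello", "world"),             // always spaces, plus a newline
	}
}

func main() {
	for _, s := range sprintExamples() {
		fmt.Printf("%q\n", s)
	}
	// "Helloworld"
	// "500internal server error"
	// "500 42"
	// "Hello world\n"
}
```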
If you call a function with a concatenated string as its argument, the argument has to be evaluated before the call. If the function then chooses not to act on the argument (think of logging when the log level is lower than needed for printing), you have already incurred the overhead of concatenation.
This is very similar to your example: in one case you do the concatenation and in the other you don't.
With a high volume of those operations it may become noticeable. Again, logging is a good example.
In the specific case of Sprint it is not that relevant, of course, but perhaps it's good to be consistent?
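The logging point can be sketched as follows. The levelLogger type and its Debugf method are invented for this example and do not correspond to any particular logging library; the idea is only that passing a format string and arguments lets the callee skip the formatting work when the message would be discarded. The arguments themselves are still evaluated by the caller.

```go
package main

import (
	"fmt"
	"strings"
)

// levelLogger is a hypothetical logger that writes to an in-memory buffer.
type levelLogger struct {
	debug bool
	out   strings.Builder
}

// Debugf formats the message only if debug logging is enabled, so a
// disabled logger never pays for building the string.
func (l *levelLogger) Debugf(format string, args ...interface{}) {
	if !l.debug {
		return
	}
	fmt.Fprintf(&l.out, format, args...)
}

func main() {
	l := &levelLogger{}
	l.Debugf("product %d not found\n", 42) // dropped: no formatting happens
	l.debug = true
	l.Debugf("product %d not found\n", 42) // formatted and recorded
	fmt.Print(l.out.String())
}
```

By contrast, a call such as l.Debugf("product " + strconv.Itoa(id) + " not found") would build the full string before the logger has a chance to decide whether it is needed.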
Most of the arguments have already been made, except one. Localization with Sprintf is much easier and gives better-defined roles to the programmer and the translator (someone who speaks the foreign language). Of course, not every app really needs that. Let's choose between:
s := fmt.Sprintf(t("%s is %d and comes from %s"), name, age, place)
or
s := name + t(" is ") + strconv.Itoa(age) + t(" and comes from ") + place
Translating fragments of text is confusing. Sprintf also lets you format numbers and so on.
Like @Erwin (accepted answer) said, "mismatched types" is a problem with concatenation and using "strconv" seems overly complicated.
var unixTime = time.Now().Unix()
fmt.Println("This doesn't work: "+ string(unixTime))
fmt.Println("Unix timestamp, base 10: "+ strconv.FormatInt(unixTime, 10))
fmt.Println("Unix timestamp, Itoa: "+ strconv.Itoa(int(unixTime)))
// This looks cleaner, in my opinion...
fmt.Println("Unix timestamp, Sprint: "+ fmt.Sprint(unixTime))
Since web development usually involves concatenation of long strings that aren't going to stdout, I see Sprint as a useful tool.

Sorting strings containing numbers in a user friendly way

Being used to the standard way of sorting strings, I was surprised when I noticed that Windows sorts files by their names in a kind of advanced way. Let me give you an example:
Track1.mp3
Track2.mp3
Track10.mp3
Track20.mp3
I think that those names are compared (during sorting) based on letters and by numbers separately.
On the other hand, the following is the same list sorted in a standard way:
Track1.mp3
Track10.mp3
Track2.mp3
Track20.mp3
I would like to create a comparison algorithm in Delphi that would let me sort strings in the same way. At first I thought it would be enough to compare consecutive characters of two strings while they are letters. When a digit is found at the same position in both strings, I would read all the digits following it to form a number and then compare the numbers.
To give you an example, I'll compare "Track10" and "Track2" strings this way:
1) read characters while they are equal and while they are letters: "Track", "Track"
2) if a digit is found, read all following digits: "10", "2"
2a) if they are equal, go to 1 or else finish
Ten is greater than two, so "Track10" is greater than "Track2"
It had seemed that everything would be all right until I noticed, during my tests, that Windows considered "Track010" lower than "Track10", while I thought the first one was greater as it was longer (not mentioning that according to my algorithm both the strings would be equal, which is wrong).
Could you explain how exactly Windows sorts files by name, or do you perhaps have a ready-to-use algorithm (in any programming language) that I could build on?
Thanks a lot!
Mariusz
Jeff wrote up an article about this on Coding Horror. This is called natural sorting, where you effectively treat a group of digits as a single "character". There are implementations out there in every language under the sun, but strangely it's not usually built-in to most languages' standard libraries.
The mother of all sorts:
ls *.mp3 | sort --version-sort
The absolute easiest way, I found, was to isolate the string you want (in the OP's case, with Path.GetFileNameWithoutExtension()), remove the non-digits, convert to int, and sort. Using LINQ and some extension methods, it's a one-liner. In my case, I was working on directories:
Directory.GetDirectories(@"a:\b\c").OrderBy(x => x.RemoveNonDigits().ToIntOrZero())
Where RemoveNonDigits and ToIntOrZero are extension methods:
public static string RemoveNonDigits(this string value) {
return Regex.Replace(value, "[^0-9]", string.Empty);
}
public static int ToIntOrZero(this string toConvert) {
try {
if (toConvert == null || toConvert.Trim() == string.Empty) return 0;
return int.Parse(toConvert);
} catch (Exception) {
return 0;
}
}
The extension methods are common tools I use everywhere. YMMV.
Here's a Python approach:
import re
def tryint(s):
    """
    Return an int if possible, or `s` unchanged.
    """
    try:
        return int(s)
    except ValueError:
        return s
def alphanum_key(s):
    """
    Turn a string into a list of string and number chunks.
    >>> alphanum_key("z23a")
    ["z", 23, "a"]
    """
    return [ tryint(c) for c in re.split('([0-9]+)', s) ]
def human_sort(l):
    """
    Sort a list in the way that humans expect.
    """
    l.sort(key=alphanum_key)
And a blog post with more detail: https://nedbatchelder.com/blog/200712/human_sorting.html
