Golang: optimal way of typing associative slices? - go

I'm parsing loads of HTTP logs with the goal of telling how many requests each IP address generated.
The first thing I did is:
var hits = make(map[string]uint)
// so I could populate it with
hits[ipAddr]++
However, I would like to make it "typed", so that it would be immediately clear that the string key in hits is an IP address. I thought, well, maybe a struct can help me:
type Hit struct {
IP string
Count uint
}
But that way (I think) I'm losing performance, because now I have to actually search for the specific Hit to increment its count. I accept that I could be paranoid here, and could simply go for the loop:
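var hits []Hit
// TrackHit just damn tracks it
func TrackHit(ip string) {
for i := range hits {
if hits[i].IP == ip {
hits[i].Count++ // index into the slice; ranging by value would only modify a copy
return
}
}
hits = append(hits, Hit{ // append returns the grown slice, so it must be assigned back
IP: ip,
Count: 1, // first request seen from this ip
})
}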
But that just looks ... suboptimal. I think everything that can be written in one line makes you shine as a professional, and when 1 line turns into 13, I tend to feel "whaaa? Doing something wrong here, mom?"
Any typed one-liners here in Go?
Thanks

As Uvelichitel pointed out, you can use a typed string:
type IP string
var hits = make(map[IP]uint)
hits[IP("127.0.0.1")]++
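Or you could use the standard library's netip.Addr type (package net/netip, Go 1.18+). Note that net.IP cannot be used here: it is a byte slice, slices are not comparable, and so map[net.IP]uint does not compile.
var hits = make(map[netip.Addr]uint)
hits[netip.MustParseAddr("127.0.0.1")]++
Either would make it clear that you're referring to IPs, without the overhead of looping over a slice of structs for every increment. The latter validates the input, normalizes different spellings of the same address (for example, "::1" and "0:0:0:0:0:0:0:1" become one key), and gives you full stdlib support for any other IP manipulation you need to do, at the cost of parsing the strings. MustParseAddr panics on malformed input, so for untrusted log lines use netip.ParseAddr and check the error. Which one is better will depend on your specific use case.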

Related

Pointer operation isn't changing its reference within a slice

I just started learning go. I have a question about pointers.
In the code below, the following line in the code doesn't do what I expect:
last_line.Next_line = &line // slice doesn't change
I want the slice to be changed as well, not only the local variable last_line.
What am I doing wrong?
type Line struct {
Text string
Prev_line *Line
Next_line *Line
}
var (
lines []Line
last_line *Line
)
for i, record := range records {
var prev_line *Line = nil
text := record[0]
if i > 0 {
prev_line = &lines[i-1]
}
line := Line{
Text: text,
Prev_line: prev_line,
Next_line: nil}
if last_line != nil {
last_line.Next_line = &line // slice doesn't change
}
lines = append(lines, line)
last_line = &line
}
Your Line type is a fairly standard-looking doubly linked list. Your lines variable holds a slice of these objects. Combining these two is a bit unusual—not wrong, to be sure, just unusual. And, as Matt Oestreich notes in a comment, we don't know quite what is in records (just that range can be used on it and that after doing so, we can use record[0] to get to a single string value), so there might be better ways to deal with things.
If records itself is a slice or has a sensible len, we can allocate a slice of Line instances all at once, of the appropriate size:
lines = make([]Line, len(records))
Here is a sample on the Go Playground that does it this way.
If we can't really get a suitable len—e.g., if records is a channel whose length is not really relevant—then we might indeed want to allocate individual lines, but in this case, it may be more sensible to avoid keeping them as a slice in the first place. The doubly linked list alone will suffice.
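For the channel case, a sketch of building only the linked list might look like this. It assumes records is a <-chan []string whose elements are non-empty, and buildList is a hypothetical helper name. Each node is allocated on its own and linked through pointers to the nodes themselves, so nothing can move them afterwards.

```go
package main

import "fmt"

// Line is the node type from the question.
type Line struct {
	Text      string
	Prev_line *Line
	Next_line *Line
}

// buildList allocates one node per record and links it to the current tail;
// no slice is involved. Assumes every record has at least one field.
func buildList(records <-chan []string) (head *Line) {
	var tail *Line
	for record := range records {
		node := &Line{Text: record[0], Prev_line: tail}
		if tail == nil {
			head = node
		} else {
			tail.Next_line = node // tail is the real previous node, so the link sticks
		}
		tail = node
	}
	return head
}

func main() {
	ch := make(chan []string, 3)
	ch <- []string{"first"}
	ch <- []string{"second"}
	ch <- []string{"third"}
	close(ch)
	for l := buildList(ch); l != nil; l = l.Next_line {
		fmt.Println(l.Text)
	}
}
```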
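Finally, if you really do want both a slice and this doubly linked list, there are two things to watch. First, &line is the address of the local variable line, not of the element that append copies into lines; so on the next iteration, last_line.Next_line = &line updates that separate copy, and the slice element never changes. Second, append may copy the slice's elements to a new, larger backing array. If and when it does so, the pointers in any elements you set up earlier (such as prev_line = &lines[i-1]) will point into the old, smaller array. This is not invalid in terms of the language itself (those objects still exist and your pointers are keeping them "alive"), but it may not be what you intended at all. In this case, it makes more sense to set all the pointers at the end, after building up the lines slice, just as in the sample code I provided.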
(The sample I wrote is deliberately slightly weird in a way that is likely to get your homework or test grade knocked down a bit, if this was an attempt to cheat on homework or a test. :-) )
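The Playground sample mentioned in this answer is not reproduced here, but a minimal sketch of the same idea (size the slice up front, then set every pointer in a second pass) might look like this. It assumes records is a [][]string with non-empty rows, as encoding/csv would produce; buildLines is a hypothetical name.

```go
package main

import "fmt"

// Line is the node type from the question.
type Line struct {
	Text      string
	Prev_line *Line
	Next_line *Line
}

// buildLines fills a slice sized up front, then links the elements in a
// second pass, so every pointer refers to an element of the final slice.
func buildLines(records [][]string) []Line {
	lines := make([]Line, len(records))
	for i, record := range records {
		lines[i].Text = record[0]
	}
	// Link only after the slice is complete: no later append can move it.
	for i := range lines {
		if i > 0 {
			lines[i].Prev_line = &lines[i-1]
		}
		if i+1 < len(lines) {
			lines[i].Next_line = &lines[i+1]
		}
	}
	return lines
}

func main() {
	lines := buildLines([][]string{{"first"}, {"second"}, {"third"}})
	for l := &lines[0]; l != nil; l = l.Next_line {
		fmt.Println(l.Text)
	}
}
```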

How to convert global enum values to string in Godot?

The "GlobalScope" class defines many fundamental enums like the Error enum.
I'm trying to produce meaningful logs when an error occurs. However printing a value of type Error only prints the integer, which is not very helpful.
The Godot documentation on enums indicates that looking up the value should work in a dictionary like fashion. However, trying to access Error[error_value] errors with:
The identifier "Error" isn't declared in the current scope.
How can I convert such enum values to string?
In the documentation you referenced, it explains that enums basically just create a bunch of constants:
enum {TILE_BRICK, TILE_FLOOR, TILE_SPIKE, TILE_TELEPORT}
# Is the same as:
const TILE_BRICK = 0
const TILE_FLOOR = 1
const TILE_SPIKE = 2
const TILE_TELEPORT = 3
However, the names of the identifiers of these constants only exist to make it easier for humans to read the code. They are replaced at runtime with something the machine can use, and are inaccessible later. If I want to print an identifier's name, I have to do so manually:
# Manually print TILE_FLOOR's name as a string, then its value.
print("The value of TILE_FLOOR is ", TILE_FLOOR)
So if your goal is to have descriptive error output, you should do so in a similar way, perhaps like so:
if unexpected_bug_found:
# Manually print the error description, then actually return the value.
print("ERR_BUG: There was an unexpected bug!")
return ERR_BUG
Now the relationship with dictionaries is that dictionaries can be made to act like enumerations, not the other way around. Enumerations are limited to being a list of identifiers with integer assignments, which dictionaries can do too. But they can also do other cool things, like have identifiers that are strings, which I believe you may have been thinking of:
const MyDict = {
NORMAL_KEY = 0,
'STRING_KEY' : 1, # uses a colon instead of equals sign
}
func _ready():
print("MyDict.NORMAL_KEY is ", MyDict.NORMAL_KEY) # valid
print("MyDict.STRING_KEY is ", MyDict.STRING_KEY) # valid
print("MyDict[NORMAL_KEY] is ", MyDict[NORMAL_KEY]) # INVALID
print("MyDict['STRING_KEY'] is ", MyDict['STRING_KEY']) # valid
# Dictionary['KEY'] only works if the key is a string.
This is useful in its own way, but even in this scenario, we assume we already have the string matching the identifier name explicitly in hand, meaning we may as well print that string manually as in the first example.
The naive approach I took, in a singleton (in fact, in a file that contains a lot of static funcs and is referenced by a class_name):
static func get_error(global_error_constant:int) -> String:
var info := Engine.get_version_info()
var version := "%s.%s" % [info.major, info.minor]
var default := ["OK","FAILED","ERR_UNAVAILABLE","ERR_UNCONFIGURED","ERR_UNAUTHORIZED","ERR_PARAMETER_RANGE_ERROR","ERR_OUT_OF_MEMORY","ERR_FILE_NOT_FOUND","ERR_FILE_BAD_DRIVE","ERR_FILE_BAD_PATH","ERR_FILE_NO_PERMISSION","ERR_FILE_ALREADY_IN_USE","ERR_FILE_CANT_OPEN","ERR_FILE_CANT_WRITE","ERR_FILE_CANT_READ","ERR_FILE_UNRECOGNIZED","ERR_FILE_CORRUPT","ERR_FILE_MISSING_DEPENDENCIES","ERR_FILE_EOF","ERR_CANT_OPEN","ERR_CANT_CREATE","ERR_QUERY_FAILED","ERR_ALREADY_IN_USE","ERR_LOCKED","ERR_TIMEOUT","ERR_CANT_CONNECT","ERR_CANT_RESOLVE","ERR_CONNECTION_ERROR","ERR_CANT_ACQUIRE_RESOURCE","ERR_CANT_FORK","ERR_INVALID_DATA","ERR_INVALID_PARAMETER","ERR_ALREADY_EXISTS","ERR_DOES_NOT_EXIST","ERR_DATABASE_CANT_READ","ERR_DATABASE_CANT_WRITE","ERR_COMPILATION_FAILED","ERR_METHOD_NOT_FOUND","ERR_LINK_FAILED","ERR_SCRIPT_FAILED","ERR_CYCLIC_LINK","ERR_INVALID_DECLARATION","ERR_DUPLICATE_SYMBOL","ERR_PARSE_ERROR","ERR_BUSY","ERR_SKIP","ERR_HELP","ERR_BUG","ERR_PRINTER_ON_FIRE"]
match version:
"3.4":
return default[global_error_constant]
# Regexp to use on @GlobalScope documentation
# \s+=\s+.+ replace by nothing
# (\w+)\s+ replace by "$1", (with quotes and comma)
printerr("you must check and add %s version in get_error()" % version)
return default[global_error_constant]
So print(MyClass.get_error(err)) or assert(!err, MyClass.get_error(err)) is handy.
For non-global enums I made the following; it isn't what you asked, but it is closely related.
It would be useful to be able to access @GlobalScope and @GDScript the same way; maybe that isn't possible due to a memory cost?
static func get_enum_flags(_class:String, _enum:String, flags:int) -> PoolStringArray:
var ret := PoolStringArray()
var enum_flags := ClassDB.class_get_enum_constants(_class, _enum)
for i in enum_flags.size():
if (1 << i) & flags:
ret.append(enum_flags[i])
return ret
static func get_constant_or_enum(_class:String, number:int, _enum:="") -> String:
if _enum:
return ClassDB.class_get_enum_constants(_class, _enum)[number]
return ClassDB.class_get_integer_constant_list(_class)[number]

Efficient log parsing in golang

What would be an efficient way (in both performance and readability) of parsing lines in a log file and extracting points of interest?
For example:
*** Time: 2/1/2019 13:51:00
17.965 Pump 10 hose FF price level 1 limit 0.0000 authorise pending (Type 00)
17.965 Pump 10 State change LOCKED_PSTATE to CALLING_PSTATE [31]
38.791 Pump 10 delivery complete, Hose 1, price 72.9500, level 1, value 100.0000, volume 1.3700, v-total 8650924.3700, m-total 21885705.8800, T13:51:38
The things I need to extract are the pump number (10 for pump 10), the price level, the limit, the _PSTATE changes, the values from the "delivery complete" line, etc.
Currently I'm using a regular expression with capture groups for each kind of line. But it feels inefficient and there is quite a bit of duplication.
For example, I have a bunch of these:
reStateChange := regexp.MustCompile(`^(?P<offset>.*) Pump (?P<pump>\d{2}) State change (?P<oldstate>\w+_PSTATE) to (?P<newstate>\w+)_PSTATE`)
Then, inside a loop over the lines:
if match := reStateChange.FindStringSubmatch(text); len(match) > 0 {
matched = true
for i, value := range match {
result[reStateChange.SubexpNames()[i]] = value
}
} else if match := otherReMatch.FindStringSubmatch(text); len(match) > 0 {
matched = true
for i, value := range match {
result[otherReMatch.SubexpNames()[i]] = value // use this regex's own group names
}
} else if strings.Contains(text, "*** Time:") {
// handle the timestamp header line
}
It feels that there could be a much better way to do this. I would trade some performance for readability. The log files are only really 10MB max. Often smaller.
I'm after some suggestions on how to make this better in golang.
If all your log lines are similar to the sample you posted, they seem quite structured, so regular expressions might be a bit overkill and hard to generalize.
Another option would be to transform each of those lines into a slice of strings ([]string) by using strings.Fields, or even strings.FieldsFunc so that you can strip both whitespace and commas.
Then you can design an interface like:
type LogLineProcessor interface {
CanParse(line []string) bool
GetResultFrom(line []string) LogLineResult
}
Where LogLineResult is a struct containing the extracted information.
You can then define multiple structs with methods that implement LogLineProcessor (each implementation would look at specific positions on that []string to realize if it is a line it can process or not, like looking for the words "hose", "FF" and "price" in the positions it expects to find them).
The GetResultFrom implementations would also extract each data point from specific positions in the []string (it can rely on that information being there if it already determined it was one of the lines it can process).
You can create a var processors []LogLineProcessor, put all your processors in there, and then just iterate over that slice:
line := strings.Fields(text)
for _, processor := range processors {
if processor.CanParse(line) {
result := processor.GetResultFrom(line)
// do whatever needed with the result
}
}
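A self-contained sketch of this design with a single processor for the "State change" lines from the question. The LogLineResult fields, the stateChange type, and parseLine are assumptions made for illustration; the word positions it checks come from the sample line and would need adjusting for other formats. Unlike the loop above, parseLine stops at the first processor that accepts a line.

```go
package main

import (
	"fmt"
	"strings"
)

// LogLineResult holds the data extracted from one line (hypothetical shape).
type LogLineResult struct {
	Kind   string
	Pump   string
	Fields map[string]string
}

// LogLineProcessor is the interface from the answer.
type LogLineProcessor interface {
	CanParse(line []string) bool
	GetResultFrom(line []string) LogLineResult
}

// stateChange handles lines like:
// 17.965 Pump 10 State change LOCKED_PSTATE to CALLING_PSTATE [31]
type stateChange struct{}

func (stateChange) CanParse(line []string) bool {
	return len(line) >= 8 && line[1] == "Pump" && line[3] == "State" &&
		line[4] == "change" && line[6] == "to"
}

// GetResultFrom relies on CanParse having already checked the positions.
func (stateChange) GetResultFrom(line []string) LogLineResult {
	return LogLineResult{
		Kind: "state",
		Pump: line[2],
		Fields: map[string]string{
			"offset":   line[0],
			"oldstate": line[5],
			"newstate": line[7],
		},
	}
}

// parseLine returns the result from the first processor that accepts the line.
func parseLine(processors []LogLineProcessor, text string) (LogLineResult, bool) {
	line := strings.Fields(text)
	for _, p := range processors {
		if p.CanParse(line) {
			return p.GetResultFrom(line), true
		}
	}
	return LogLineResult{}, false
}

func main() {
	processors := []LogLineProcessor{stateChange{}}
	r, ok := parseLine(processors, "17.965 Pump 10 State change LOCKED_PSTATE to CALLING_PSTATE [31]")
	fmt.Println(ok, r.Pump, r.Fields["oldstate"], r.Fields["newstate"])
}
```

Adding a "delivery complete" or "hose ... price level" processor is then a matter of one more type and one more entry in the processors slice.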

Why would you use fmt.Sprint?

I really don't understand the benefit of using fmt.Sprint compared to adding strings together with +. Here is an example of both in use:
func main() {
myString := fmt.Sprint("Hello ", "World")
fmt.Println(myString)
}
and
func main() {
myString := "Hello " + "World"
fmt.Println(myString)
}
What are the differences and benefits of each?
In your example there is no real difference, as you are using Sprint simply to concatenate strings. That is indeed something which can be solved more easily with the + operator.
Take the following example, where you want to print a clear error message like "Product with ID '42' could not be found.". How does that look with your bottom approach?
productID := 42
myString := "Product with ID '" + productID + "' could not be found."
This would give an error (mismatched types string and int), because Go does not support concatenating values of different types.
So you would have to transform the type to a string first.
productID := 42
myString := "Product with ID '" + strconv.Itoa(productID) + "' could not be found."
And you would have to do this for every data type other than string.
The fmt package in Go and similar formatting packages in almost any other language solve this by helping you with the conversions and keeping your strings clear of mass '+' operators.
Here is how the example looks using fmt:
product := 42
myString := fmt.Sprintf("Product with ID '%d' could not be found.", product)
Here %d is the formatting verb for 'print the argument as a base-10 integer'. See https://golang.org/pkg/fmt/#hdr-Printing for the verbs for other types.
Compared to concatenation, fmt lets you write your strings clearly, separating the template/text from the variables. And it makes printing data types other than strings much simpler.
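To illustrate the last point with types other than int, here is a sketch that builds the same message both ways; describeConcat, describeSprintf, and the sample values are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// describeConcat converts every non-string value by hand before using +.
func describeConcat(name string, price float64, inStock bool) string {
	return name + " costs " + strconv.FormatFloat(price, 'f', 2, 64) +
		" (in stock: " + strconv.FormatBool(inStock) + ")"
}

// describeSprintf lets the verbs do the conversions: %s for the string,
// %.2f for the float, %t for the bool.
func describeSprintf(name string, price float64, inStock bool) string {
	return fmt.Sprintf("%s costs %.2f (in stock: %t)", name, price, inStock)
}

func main() {
	fmt.Println(describeConcat("Widget", 72.95, true))  // Widget costs 72.95 (in stock: true)
	fmt.Println(describeSprintf("Widget", 72.95, true)) // Widget costs 72.95 (in stock: true)
}
```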
fmt.Sprint is good for concatenating parameters of different types, as it uses reflection under the hood. So if you only need to concatenate strings, use "+", which is much faster; but if you need to concatenate a number and a string, use fmt.Sprint, like this (note the leading space: Sprint does not insert one next to a string operand):
message := fmt.Sprint(500, " internal server error")
If you call a function with a concatenated string as its argument, the concatenation has to be evaluated before the call. If the function then chooses not to act on the argument (think of logging when the message's level is below the configured threshold), you have already paid for the concatenation.
Your example shows the same split: in one case you concatenate up front, in the other you hand the parts to fmt.
With a high volume of those operations it may become noticeable. Again, logging is a good example.
In the specific case of Sprint it is not that relevant, of course, since Sprint formats immediately, but perhaps it's good to be consistent?
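A sketch of the logging case, using a hypothetical minimal logger rather than any real logging library: Debug receives a string that was already concatenated, while Debugf receives the format and arguments and only calls fmt.Sprintf if the message will be kept. The costly type is also hypothetical; it counts how many times its String method runs.

```go
package main

import "fmt"

// Levels for the hypothetical logger below.
const (
	levelDebug = iota
	levelInfo
)

type logger struct {
	level int
	out   []string
}

// Debug receives a finished message: any concatenation already happened at the call site.
func (l *logger) Debug(msg string) {
	if l.level <= levelDebug {
		l.out = append(l.out, msg)
	}
}

// Debugf defers formatting: Sprintf runs only if the message will be kept.
func (l *logger) Debugf(format string, args ...any) {
	if l.level <= levelDebug {
		l.out = append(l.out, fmt.Sprintf(format, args...))
	}
}

// costly stands in for a value that is expensive to render.
type costly struct{ renders *int }

func (c costly) String() string {
	*c.renders++
	return "costly value"
}

func main() {
	renders := 0
	v := costly{&renders}
	l := &logger{level: levelInfo} // debug output is disabled

	l.Debug("value: " + v.String()) // String runs even though the message is dropped
	l.Debugf("value: %s", v)         // String never runs: Sprintf is skipped
	fmt.Println(renders, len(l.out)) // 1 0
}
```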
Most of the arguments have already been made, except one: localization is much easier with Sprintf, and it gives better-defined roles to the programmer and the translator (someone who speaks the target language). Of course, not every app really needs that. Compare (t stands for whatever translation lookup function you use):
s := fmt.Sprintf(t("%s is %d and comes from %s"), name, age, place)
or
s := name + t(" is ") + strconv.Itoa(age) + t(" and comes from ") + place
Translating fragments of text is confusing. Sprintf also lets you format numbers and so on.
Like @Erwin (accepted answer) said, "mismatched types" is a problem with concatenation, and using strconv seems overly complicated.
var unixTime = time.Now().Unix()
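// string(unixTime) converts the integer to a single rune (U+FFFD here, since the value
// is not a valid code point), not to decimal digits; go vet warns about this conversion.
fmt.Println("This doesn't work: "+ string(unixTime))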
fmt.Println("Unix timestamp, base 10: "+ strconv.FormatInt(unixTime, 10))
fmt.Println("Unix timestamp, Itoa: "+ strconv.Itoa(int(unixTime)))
// This looks cleaner, in my opinion...
fmt.Println("Unix timestamp, Sprint: "+ fmt.Sprint(unixTime))
Since web development usually involves concatenation of long strings that aren't going to stdout, I see Sprint as a useful tool.

Simple way of getting key depending on value from hashmap in Golang

Given a hashmap in Golang which has a key and a value, what is the simplest way of retrieving the key given the value?
For example, the Ruby equivalent would be:
key = hashMap.key(value)
There is no built-in function to do this; you will have to make your own. Below is an example function that will work for map[string]int, which you can adapt for other map types:
func mapkey(m map[string]int, value int) (key string, ok bool) {
for k, v := range m {
if v == value {
key = k
ok = true
return
}
}
return
}
Usage:
key, ok := mapkey(hashMap, value)
if !ok {
panic("value does not exist in map")
}
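Since Go 1.18, the same function can be written once for any map type with generics. A sketch (keyOf is a hypothetical name). Map iteration order is unspecified, so if several keys share the value, which one is returned can vary between runs; the same is true of mapkey above.

```go
package main

import "fmt"

// keyOf returns some key whose value equals value, and whether one was found.
func keyOf[K comparable, V comparable](m map[K]V, value V) (K, bool) {
	for k, v := range m {
		if v == value {
			return k, true
		}
	}
	var zero K
	return zero, false
}

func main() {
	ports := map[string]int{"http": 80, "https": 443}
	if name, ok := keyOf(ports, 443); ok {
		fmt.Println(name) // https
	}
}
```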
The important question is: How many times will you have to look up the value?
If you only need to do it once, then you can iterate over the key, value pairs and keep the key (or keys) that match the value.
If you have to do the look up often, then I would suggest you make another map that has key, values reversed (assuming all keys map to unique values), and use that for look up.
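A sketch of building that reversed map once, assuming both the key and value types are comparable. invert is a hypothetical helper; it reports an error instead of silently dropping a key when two keys share a value.

```go
package main

import "fmt"

// invert builds a value->key map for repeated O(1) reverse lookups.
func invert[K comparable, V comparable](m map[K]V) (map[V]K, error) {
	rev := make(map[V]K, len(m))
	for k, v := range m {
		if prev, dup := rev[v]; dup {
			return nil, fmt.Errorf("value %v is shared by keys %v and %v", v, prev, k)
		}
		rev[v] = k
	}
	return rev, nil
}

func main() {
	ports := map[string]int{"http": 80, "https": 443}
	byPort, err := invert(ports)
	if err != nil {
		panic(err)
	}
	fmt.Println(byPort[443]) // https
}
```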
I am in the midst of working on a server based on Bitcoin, and there is a list of constants and byte codes for the payment scripts. The C++ version has both the identifiers with their codes and a separate function that returns the string version. So it's really not much extra work to take the original, with the opcodes as string keys and the bytes as values, and reverse it. The only thing that bothers me is values that map back to more than one key. But since those are just true and false, which overlap with zero and one, I store a string slice per byte: index 0 holds the number or opcode name, and index 1 holds the truth-value name.
Iterating over the list every time to identify the script command to execute would test, on average, 50% of the map's elements. It's much simpler to just have a reverse lookup table. Scripts may have to be executed as many as 10,000 times for a full block, so it makes no sense to save memory and pay in processing time instead.
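A sketch of a reverse table where one byte can map back to several names. The opcode subset is for illustration only (in Bitcoin script, OP_0/OP_FALSE share 0x00 and OP_1/OP_TRUE share 0x51); invertMulti is a hypothetical helper, and it sorts each name list so the order does not depend on map iteration.

```go
package main

import (
	"fmt"
	"sort"
)

// invertMulti maps each byte back to every opcode name that uses it.
func invertMulti(opcodes map[string]byte) map[byte][]string {
	rev := make(map[byte][]string, len(opcodes))
	for name, code := range opcodes {
		rev[code] = append(rev[code], name)
	}
	for _, names := range rev {
		sort.Strings(names) // sorts in place; the map holds the same backing array
	}
	return rev
}

func main() {
	opcodes := map[string]byte{
		"OP_0": 0x00, "OP_FALSE": 0x00,
		"OP_1": 0x51, "OP_TRUE": 0x51,
		"OP_DUP": 0x76,
	}
	byCode := invertMulti(opcodes)
	fmt.Println(byCode[0x51]) // [OP_1 OP_TRUE]
	fmt.Println(byCode[0x76]) // [OP_DUP]
}
```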
