Duplicate cache in Golang

I am trying to implement a simple duplicate check using an in-memory (L1) cache, with either of these two libraries: https://github.com/patrickmn/go-cache or https://github.com/karlseguin/ccache
Several goroutines fetch a large amount of data from an upstream API. Inside each one, a for range loop checks whether the key is a hit or a miss. On a hit, the loop should skip the record and continue to the next iteration. As you may have guessed, the cache must be concurrency safe, and I instantiate only one instance of it (x.cacheInstance).
for _, record := range data {
    ...
    exists := x.cacheInstance.Get(record.DocumentID)
    if exists == nil {
        x.cacheInstance.Set(record.DocumentID, true, time.Minute*10)
    } else {
        c.logger.Warnw("DocumentID duplicate located", "documentID", record.DocumentID)
        continue
    }
    ...
}
However, the application seems to hang and never finishes iterating over the records. Commenting out the cache check fixes the problem, and the program runs as expected.
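For reference, a minimal, self-contained sketch of the same hit/miss check using patrickmn/go-cache, whose Get returns the value together with a found flag, could look like the following; the record type and the sample data are placeholders, not taken from the original program:

package main

import (
    "fmt"
    "time"

    gocache "github.com/patrickmn/go-cache"
)

type record struct{ DocumentID string }

func main() {
    // One shared cache instance; go-cache is safe for concurrent use
    // by multiple goroutines.
    c := gocache.New(10*time.Minute, 15*time.Minute)

    data := []record{{"a"}, {"b"}, {"a"}}
    for _, r := range data {
        // Get returns (value, found); checking the bool avoids any
        // ambiguity around nil values.
        if _, found := c.Get(r.DocumentID); found {
            fmt.Println("duplicate DocumentID:", r.DocumentID)
            continue
        }
        c.Set(r.DocumentID, true, 10*time.Minute)
        fmt.Println("processing:", r.DocumentID)
    }
}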

Related

Inserting an empty map into Redis using HSET fails in Golang

I have some code to insert a map into Redis using the HSET command:
prefix := "accounts|asdfas"
data := make(map[string]string)

if _, err := conn.Do("HSET", redis.Args{}.Add(prefix).AddFlat(data)...); err != nil {
    return err
}
If data has values in it then this works, but if data is empty it fails with the following error:
ERR wrong number of arguments for 'hset' command
It seems that this is the result of the AddFlat function converting the map to an interleaved list of keys and their associated values. It makes sense that this wouldn't work when the map is empty, but I'm not sure how to deal with it. I'd rather not add a dummy value to the map, but that's about all I can think of. Is there a way to handle this that's more in line with how things are supposed to be done in Redis?
As a general rule of thumb, Redis doesn't allow empty data structures and never keeps one around (there is one exception to this, though).
Here's an example:
> HSET members foo bar
(integer) 1
> EXISTS members
(integer) 1
> HDEL members foo
(integer) 1
> EXISTS members
(integer) 0
As a result, if you want to keep your data structures around, they have to contain at least one member. You can add a dummy item inside the Hash and ignore it in your application logic, but this may not work well with other data structures such as Lists.
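Applied to the Go code from the question, that usually boils down to either skipping the HSET entirely when the map is empty, or writing a placeholder field that the application ignores. A minimal sketch with redigo; the _placeholder field name and the saveAccount helper are made up for the example:

package accounts

import "github.com/gomodule/redigo/redis"

// saveAccount writes data into a Redis hash at key prefix. When the map is
// empty it simply skips the write, which matches Redis's own behaviour of
// never keeping an empty hash around.
func saveAccount(conn redis.Conn, prefix string, data map[string]string) error {
    if len(data) == 0 {
        // Alternative: keep the key alive by writing a dummy field and
        // ignoring it when reading, e.g.
        //   data = map[string]string{"_placeholder": ""}
        return nil
    }
    _, err := conn.Do("HSET", redis.Args{}.Add(prefix).AddFlat(data)...)
    return err
}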

In the leaky-bucket algorithm, when the queue is not full, what is the correct logic to achieve a fixed rate?

I am learning the leaky bucket algorithm and want to get my hands dirty by writing some simple code with Redis plus Go's net/http.
When I searched here with the keywords redis, leaky and bucket, there were many similar questions, as shown in [1], which is nice. However, even after going through those threads and the wiki article [2], I still have trouble understanding the entire logic, so I suppose there is something I do not understand and am not aware of it. I would like to rephrase my understanding here; please correct me if I get it wrong.
The pseudo code:
key := "ip address, token or anything that can represent the client"
redis_queue_size := 5
interval_between_each_request := 7

request := obtain_http_request_from_somewhere()

if check_current_queue_size() < redis_queue_size:
    if is_queue_empty():
        add_request_to_the_queue()  // zadd "ip1" now() now() -- now() is something like seconds, milliseconds or nanoseconds, e.g. t = 1
        process_request(request)
    else:
        now := get_current_time()
        // Retrieve the first element in the queue and compute the timestamp at
        // which the new request is expected to execute, storing it together
        // with the current time, e.g.
        //   zadd "ip1" <time of the first element in the queue + interval_between_each_request> now
        add_request_to_redis_queue_with_timestamp(now, interval_between_each_request)  // e.g. zadd "ip" <timestamp as score> <timestamp the request is allowed to be executed>
        // The function below checks how much time is left before the current
        // request is allowed to execute. For instance, the first request is
        // stored in the queue with
        //   zadd "ip1" 1 1  // t = 1
        // and the second request arrives at t = 4 but is only allowed to be
        // executed at t = 8:
        //   zadd "ip1" 8 4  // where 4 := now, 8 := 1 + interval_between_each_request
        // so N will be 4.
        N := check_the_time_left_for_the_current_request_to_execute(now, interval_between_each_request)
        sleep(N)  // the request now waits 4 seconds before being processed
        process_request(http_request_obj)
else:
    return  // discard the request
I understand the part where the queue is full: the subsequent requests are simply discarded. What I think I am misunderstanding is the case where the queue is not full: how should the incoming requests be reshaped so that they are executed at a fixed rate?
I appreciate any suggestions.
[1]. https://stackoverflow.com/search?q=redis+leaky+bucket+&s=aa2eaa93-a6ba-4e31-9a83-68f791c5756e
[2]. https://en.wikipedia.org/wiki/Leaky_bucket#As_a_queue
If this is for simple rate limiting, the sliding-window approach using a sorted set is what we see implemented by most Redis users: https://github.com/Redislabs-Solution-Architects/RateLimitingExample/blob/sliding_window/app.py
If you are set on a leaky bucket, you might consider using a Redis stream per consumerID (API token, IP address, etc.) as follows:
a request comes in for a consumerID
XADD requests-[consumerID] MAXLEN [BUCKET SIZE]
spawn a goroutine for that consumerID if one is not already running
get the current time
if XLEN of requests-[consumerID] is 0, exit the goroutine
XREAD COUNT [number_of_requests_per_period] BLOCK [time period - 1 ms] STREAMS requests-[consumerID]
get the current time and sleep for the remainder of the time period
https://redis.io/commands#stream details how streams work
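A rough sketch of that flow in Go using the go-redis client might look as follows. This is only an illustration of the steps above, not production code: the stream naming, the handleRequest helper, and the XDEL of processed entries (so that XLEN can actually drop back to zero) are assumptions made here.

package leakybucket

import (
    "context"
    "time"

    "github.com/redis/go-redis/v9"
)

// enqueue is the "fill" side: add the incoming request to the consumer's
// stream, capping the stream length at the bucket size (XADD ... MAXLEN ...).
func enqueue(ctx context.Context, rdb *redis.Client, consumerID string, bucketSize int64) error {
    return rdb.XAdd(ctx, &redis.XAddArgs{
        Stream: "requests-" + consumerID,
        MaxLen: bucketSize,
        Values: map[string]interface{}{"ts": time.Now().UnixNano()},
    }).Err()
}

// drain is the goroutine spawned per consumerID: each period it reads at most
// ratePerPeriod queued requests and exits once the stream is empty.
func drain(ctx context.Context, rdb *redis.Client, consumerID string, ratePerPeriod int64, period time.Duration) {
    stream := "requests-" + consumerID
    for {
        start := time.Now()

        // Exit the goroutine once the bucket for this consumer is empty.
        if n, err := rdb.XLen(ctx, stream).Result(); err != nil || n == 0 {
            return
        }

        // Read at most one period's worth of queued requests from the start
        // of the stream.
        res, err := rdb.XRead(ctx, &redis.XReadArgs{
            Streams: []string{stream, "0"},
            Count:   ratePerPeriod,
            Block:   period - time.Millisecond,
        }).Result()
        if err == nil && len(res) > 0 && len(res[0].Messages) > 0 {
            ids := make([]string, 0, len(res[0].Messages))
            for _, m := range res[0].Messages {
                handleRequest(m.Values) // hypothetical request handler
                ids = append(ids, m.ID)
            }
            // Delete the processed entries so XLEN can reach zero.
            rdb.XDel(ctx, stream, ids...)
        }

        // Sleep for whatever is left of the period to keep the drain rate fixed.
        if rest := period - time.Since(start); rest > 0 {
            time.Sleep(rest)
        }
    }
}

func handleRequest(values map[string]interface{}) {
    // process the queued request here
}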
There are several ways to implement a leaky bucket, but the process should have two separate parts: one that puts things into the bucket, and another that removes them at a set interval if there is anything to remove.
You can use a separate goroutine that consumes the messages at a set interval. This simplifies your code: one code path only has to look at the queue size and drop packets, while the other simply consumes whatever is there.
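To illustrate that two-part split without Redis, here is a minimal in-process sketch, assuming a buffered channel as the bucket and a ticker as the fixed-rate drain; all names and timings are made up for the example:

package main

import (
    "fmt"
    "time"
)

func main() {
    const bucketSize = 5
    bucket := make(chan string, bucketSize) // the bucket: a bounded queue

    // Drain side: one goroutine leaks a single request per tick,
    // which gives the fixed output rate.
    go func() {
        ticker := time.NewTicker(700 * time.Millisecond)
        defer ticker.Stop()
        for range ticker.C {
            select {
            case req := <-bucket:
                fmt.Println("processing", req)
            default:
                // nothing to leak this tick
            }
        }
    }()

    // Fill side: incoming requests either fit in the bucket or are dropped.
    for i := 0; i < 20; i++ {
        req := fmt.Sprintf("request-%d", i)
        select {
        case bucket <- req:
            // accepted
        default:
            fmt.Println("bucket full, discarding", req)
        }
        time.Sleep(100 * time.Millisecond) // simulated arrival rate
    }

    time.Sleep(5 * time.Second) // let the drain goroutine finish the backlog
}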

Better way to deal with nil slice indexes

I have started a project for creating reports by utilizing Excel data and the various Go Excel libraries (excelize, tealeg's xlsx).
One of the biggest frustrations I have found is working with slices that have some nil indexes depending on the source of the data (blank rows in the input data come through as "nil" slice indexes when I use the xlsx library to pull data).
These nil slice indexes obviously throw an "index out of range" panic if I ever try to use them in one of my many for loops, which leads to the painstaking task of checking, every time I want to work with a slice index, that it isn't actually nil by using len() and cap() to death (excerpt of code below to illustrate):
//example code excerpt
for rowNumber, cellStringSlice := range inputSlice {
    for rowColumn, cellString := range cellStringSlice {
        //loop var declaration
        rowColumnHeading := 2
        rowNumberInc := rowNumber + 1
        rowNumberDec := rowNumber - 1

        if rowNumber > 0 {
            if len(inputSlice[rowNumber]) != 0 { //len check to stop index out of range issue with slice
                previousColACellValue = inputSlice[rowNumber][rowColumn]
                continue
            }
            if len(inputSlice[rowNumber+1]) != 0 { //len check to stop index out of range issue with slice
                nextColACellValue = inputSlice[rowNumber+1][rowColumn]
                continue
            }
        }
    }
}
I should specify that in this 2D slice I am using:
inputSlice[rowNumber][rowColumn]
the outer (proximal) slice indexed by rowNumber is never nil (there is always a row); however, the inner (distal) slice indexed by rowColumn can be nil in some instances. This is why my overall loop always enters the inner loop even when it is iterating through a row with no column data (i.e. inputSlice[rowNumber][rowColumn] = nil), and it brings a frequent need to handle index-out-of-range issues.
I can't just remove all the nil indexes and shift everything up, as they represent "blank rows" in the final Excel document I output these rows to.
So my question is: are there any useful Go functions or libraries which take care of nil indexes by swapping all nils for "" in slices and 2D/3D slices of type string? Or is it the programmer's task to always "sanitise" their slices by removing these nils, or to check for them every time they want to access an element?
I appreciate that I could write a for loop myself to swap all these nils for "", but writing such a function each time I work with slices of strings that contain (or might contain) nils seems a little bizarre to me.
Your outer loop ranges over inputSlice, so inputSlice[rowNumber] is always valid, and since the inner loop ranges over that row, the row is never empty inside it; thus the first check is unnecessary. If inputSlice[rowNumber] were nil or empty, the inner for loop would not even be entered.
The second check is necessary, but wrong:
if len(inputSlice[rowNumber+1]) != 0 {
If rowNumber is the last row, then inputSlice[rowNumber+1] does not exist and indexing it panics. You have to check the bound before the length:
if rowNumber+1 < len(inputSlice) && len(inputSlice[rowNumber+1]) != 0 {
    ...
}
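As for avoiding these checks everywhere: there is no standard library helper for this, but one normalisation pass right after reading the sheet is usually enough. A minimal sketch; the padRows name and the fixed column count are made up for the example:

package main

import "fmt"

// padRows makes every row at least width columns wide, replacing missing
// (nil or short) rows with empty strings so that indexing never panics.
func padRows(rows [][]string, width int) [][]string {
    out := make([][]string, len(rows))
    for i, row := range rows {
        padded := make([]string, width)
        copy(padded, row) // copies whatever cells the row actually has
        out[i] = padded
    }
    return out
}

func main() {
    input := [][]string{
        {"a", "b", "c"},
        nil,      // blank row pulled from the spreadsheet
        {"d"},    // short row
    }
    for _, row := range padRows(input, 3) {
        fmt.Printf("%q\n", row)
    }
}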

Is it possible to combine multiple SHA1 states to get a final state in Golang?

In Go 1.13, I have an upload server. This server accepts two types of upload: chunked and chunked+threaded.
For chunked uploads everything works as expected: I hash every chunk while it is being written to disk, and the user uploads the chunks one by one in the right order.
This means I can save each chunk's SHA1 state to disk using BinaryMarshaler, then read the previous state back and continue hashing the next chunks until I reach the final hash. The final hash gives me the whole file's SHA1 perfectly.
When the chunks arrive in order I can append to the existing state, but the problem starts with the threaded (simultaneous) upload.
hashComplete := sha256.New()

// read the previous state from disk
state, err := ioutil.ReadFile(ctxPath)
if err != nil {
    return err
}

if len(state) > 0 {
    unmarshaler, _ := hashComplete.(encoding.BinaryUnmarshaler)
    if err := unmarshaler.UnmarshalBinary(state); err != nil {
        return err
    }
}

// Here I'm writing the file to disk and hashing it at the same time.
// file is a simple File object.
writer := io.MultiWriter(file, hashComplete)
n, err := io.Copy(writer, src) // src is the source (io.Reader)

marshaler, _ := hashComplete.(encoding.BinaryMarshaler)
newState, err := marshaler.MarshalBinary()
if err != nil {
    return err
}
shaCtxFile.Write(newState) // Here I'm saving the last state to disk.

// Later, after the upload finishes, I read this file and get the SHA1 hex from it. It is correct.
That covers the chunked upload in a specific, known order. The other upload method is chunked+threaded: the user can upload chunks simultaneously and then send a final request to concatenate them in the given order.
I already calculate each chunk's SHA1 and save it to disk.
My question: is it possible to combine those states to get the final hash, or do I need to rehash after concatenating the chunks? Is there a way to combine those states?
Assuming you mean the final hash over the whole file, then no, you cannot combine multiple SHA-1 hashes over partial data to create a hash over the whole file, as if it was calculated at once. The reason for this is that the initial SHA-1 state is always the same, and rehashing will restart at that specific state. Furthermore, the final block is padded and a length is added (internal to the hash function) before the final hash value is calculated.
However, you can of course create a hash list or hash tree, where you define how big the blocks are. You then hash the concatenation of all the chunk hashes to create a topmost hash value. This gives you a different hash value than the plain SHA-1 over the file, but it is consistent with your definition and can be recalculated, even in a multi-threaded fashion. It is still unique for the data within the file (assuming, of course, that you feed in the chunk hashes sequentially), so it can be used to validate the integrity of the file. And, as far as I know, this is the only way to use multi-threaded hash calculation with a normal secure hash function.
For more information, look up Merkle trees.
Of course, SHA-1 has been broken for collision resistance, and unfortunately that is exactly what you are relying on here. So please use SHA-256; if 256 bits is too much, taking the leftmost 160 bits of the SHA-256 output is a more secure alternative.
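A minimal sketch of such a hash list, assuming the per-chunk digests are already available and using SHA-256 as suggested above (the topHash helper and the sample chunks are made up for the example):

package main

import (
    "crypto/sha256"
    "fmt"
)

// topHash combines the per-chunk digests, in file order, into one top-level
// hash. This is the hash-list construction, not the SHA-256 of the raw file.
func topHash(chunkHashes [][]byte) [32]byte {
    h := sha256.New()
    for _, ch := range chunkHashes {
        h.Write(ch) // feed the chunk digests in sequentially
    }
    var out [32]byte
    copy(out[:], h.Sum(nil))
    return out
}

func main() {
    // Each chunk can be hashed independently (and concurrently); only the
    // small digests need to be combined at the end.
    c1 := sha256.Sum256([]byte("chunk one"))
    c2 := sha256.Sum256([]byte("chunk two"))
    fmt.Printf("%x\n", topHash([][]byte{c1[:], c2[:]}))
}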

What does an empty select do?

I found the following code in net/http/httptest and wonder what the empty select statement does in Go.
go s.Config.Serve(s.Listener)
if *serve != "" {
    fmt.Fprintln(os.Stderr, "httptest: serving on", s.URL)
    select {}
}
An empty select{} statement blocks forever. It is similar to an empty for{} statement.
On most (all?) supported Go architectures, the empty select will yield the CPU; an empty for loop won't, i.e. it will "spin" at 100% CPU.
On Mac OS X, in Go, for {} will max out the CPU, and the process's STATE will be "running".
select {}, on the other hand, will not max out the CPU, and the process's STATE will be "sleeping".
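A tiny standalone program makes the difference easy to observe; it is only an illustration, not taken from the httptest code:

package main

import (
    "fmt"
    "time"
)

func main() {
    // A background goroutine that keeps doing work.
    go func() {
        for {
            fmt.Println("tick")
            time.Sleep(time.Second)
        }
    }()

    // Blocks main forever without burning CPU; replacing this with
    // `for {}` would keep one core at 100%.
    select {}
}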
The empty select statement just blocks the current goroutine.
As for why you'd do this, here is one reason: this snippet is equivalent:
if *serve != "" {
fmt.Fprintln(os.Stderr, "httptest: serving on", s.URL)
s.Config.Serve(s.Listener)
} else {
go s.Config.Serve(s.Listener)
}
It's better in that there isn't a wasted goroutine; it's worse in that there is now code repetition. The author optimized for less code repetition over a wasted resource. Note, however, that the permanently blocked goroutine is trivial to detect and may have zero cost over the duplicated version.
