NATS KV history larger than specified when creating bucket - go

We are using NATS with the KeyValue store feature (NATS KV). We develop Go microservices and use the NATS Go client. We are trying to leverage the history feature of NATS KV, so far without success.
At times we retrieve a larger history than the history size specified when creating the KV.
We create the KV using:
kv, _ := js.CreateKeyValue(&nats.KeyValueConfig{
	Bucket:       "some-bucket",
	Description:  "store for some-service",
	MaxValueSize: 0,
	History:      10, // should we ever get more than 10 elements when reading history?
	TTL:          TTL,
	MaxBytes:     5000000,
	Storage:      nats.MemoryStorage,
	Replicas:     0,
	Placement:    nil,
})
and we retrieve values using
kv.History("someId")
When we get more results than the specified History, several of the returned KeyValueEntry items carry the same delta value.
We are quite write intensive, and we also reuse the same key id a lot:
we write values under a key until a certain point,
call kv.Purge("someId"),
and then we may reuse "someId" later on in the process.
Writes and reads are asynchronous and concurrent.
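For reference, a stripped-down sketch of that cycle (error handling omitted, "someId" being a placeholder key):
// write values under the key until some point
for _, v := range values {
	_, _ = kv.Put("someId", v) // v is a []byte payload
}

// once the key is no longer needed, purge its history
_ = kv.Purge("someId")

// later on "someId" may be reused, and eventually its history is read back
entries, _ := kv.History("someId")
for _, e := range entries {
	// this is where we sometimes see more than History (10) entries,
	// several of them reporting the same Delta()
	_ = e.Value()
}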
Here is our client go.mod regarding nats:
github.com/nats-io/nats-server/v2 v2.8.4
github.com/nats-io/nats.go v1.16.0
and we run a NATS server at version 2.8.4.
Note: I have not dug far enough into the KV implementation details, but I am worried this is linked to JetStream. It seems a watcher is created on each call and re-reads all previous values regardless of the history size. That leads me to another question: is the KV history feature appropriate for read-intensive use cases?
Thanks for your help or pointers on this matter.

Related

Form Recognizer Heavy Workload

My use case is the following: once every day I upload 1000 single-page PDFs to Azure Storage and process them with Form Recognizer via the latest Python azure-form-recognizer client.
So far I'm using the async version of the client and I send the 1000 coroutines concurrently.
tasks = {asyncio.create_task(analyze_async(doc)): doc for doc in documents}
pending = set(tasks)

# Handle retry
while pending:
    # backoff in case of 429
    await asyncio.sleep(1)
    # concurrent calls, return when all completed
    finished, pending = await asyncio.wait(
        pending, return_when=asyncio.ALL_COMPLETED
    )
    # check if a task raised an exception and register it for a new run
    for task in finished:
        doc = tasks[task]
        if task.exception():
            new_task = asyncio.create_task(analyze_async(doc))
            tasks[new_task] = doc
            pending.add(new_task)
Now I'm not really comfortable with this setup, the main reason being the unpredictable successive states of the service within the same iteration: it can be up, then throw 429, then be up again, which is not deterministic enough for me. I was wondering whether another approach is possible. Do you think I should instead increase the number of transactions progressively, starting with 15 (the default TPS), then 50 ... 100, until the queue is empty? Or is there another option?
Thx
You need to enable CORS on the storage account and adjust it so that Form Recognizer can access this heavy workload.
Follow this procedure to handle the heavy workload in Form Recognizer:
Create a storage account to upload the files. Use page blobs for better performance. Redundancy is also required; choose ZRS (zone-redundant storage).
Go to CORS on the storage account and add the required URL: set the Allowed origins to https://formrecognizer.appliedai.azure.com.
Go to containers and upload the documents.
Use the container and blob information as the input for the recognizer. If you go through Form Recognizer Studio, the total size of the documents and the number of characters are limited, so the suggestion is to use Python code with the created container as the input folder.

clear prometheus metrics from collector

I'm trying to modify prometheus mesos exporter to expose framework states:
https://github.com/mesos/mesos_exporter/pull/97/files
A bit about the mesos exporter: it collects data from both the Mesos /metrics/snapshot endpoint and the /state endpoint.
The issue with the latter, both with the changes in my PR and with the existing metrics reported on slaves, is that once a metric is created it lasts forever (until the exporter is restarted).
So if, for example, a framework has completed, the metrics reported for this framework become stale (e.g. they still show the framework using CPU).
So I'm trying to figure out how I can clear those stale metrics. If I could just clear the entire mesosStateCollector each time before collect is done, that would be awesome.
There is a Delete method on the different Prometheus vectors (e.g. GaugeVec), but in order to delete a metric I need not only the label name but also the label value of the relevant metric.
OK, so it seems it was easier than I thought (if only I had been familiar with Go before approaching this task).
You just need to type-assert the collector to *prometheus.GaugeVec and reset it:
prometheus.NewGaugeVec(prometheus.GaugeOpts{
	Help:      "Total slave CPUs (fractional)",
	Namespace: "mesos",
	Subsystem: "slave",
	Name:      "cpus",
}, labels): func(st *state, c prometheus.Collector) {
	c.(*prometheus.GaugeVec).Reset() // <-- added this for each GaugeVec
	for _, s := range st.Slaves {
		c.(*prometheus.GaugeVec).WithLabelValues(s.PID).Set(s.Total.CPUs)
	}
},
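For anyone wondering what Reset actually does, here is a minimal self-contained sketch (label values made up): it drops every label combination the vector has seen so far, so only what you repopulate afterwards gets collected again.

package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	// A gauge vector similar in shape to the exporter's slave CPU metric.
	cpus := prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Namespace: "mesos",
		Subsystem: "slave",
		Name:      "cpus",
		Help:      "Total slave CPUs (fractional)",
	}, []string{"slave_pid"})

	cpus.WithLabelValues("slave(1)@10.0.0.1:5051").Set(4)
	cpus.WithLabelValues("slave(2)@10.0.0.2:5051").Set(8)

	// Reset drops all previously seen label combinations, so slaves or
	// frameworks that disappeared are no longer reported as stale series.
	cpus.Reset()

	// Only series set after the Reset are collected on the next scrape.
	cpus.WithLabelValues("slave(1)@10.0.0.1:5051").Set(4)

	ch := make(chan prometheus.Metric, 10)
	cpus.Collect(ch)
	close(ch)
	fmt.Println("series after reset:", len(ch)) // 1, not 2
}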

golang get kubernetes resources(30000+ configmaps) failed

I want to use client-go to get resources in a Kubernetes cluster. Due to the large amount of data, the connection is closed while I am fetching the configmaps:
stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 695; INTERNAL_ERROR
configmaps:
$ kubectl -n kube-system get cm |wc -l
35937
code:
cms, err := client.CoreV1().ConfigMaps("kube-system").List(context.TODO(), v1.ListOptions{})
I tried the Limit parameter; with it I can get some of the data, but I don't know how to get all of it.
cms, err := client.CoreV1().ConfigMaps("kube-system").List(context.TODO(), v1.ListOptions{Limit: 1000})
I'm new to Go. Any pointers as to how to go about it would be greatly appreciated.
The documentation for v1.ListOptions describes how it works:
limit is a maximum number of responses to return for a list call. If more items exist, the
server will set the continue field on the list metadata to a value that can be used with the
same initial query to retrieve the next set of results.
This means that you should examine the response, save the value of the continue field (as well as the actual results), then reissue the same command but with continue set to the just seen value. Repeat until the returned continue field is empty (or an error occurs).
See the API concepts page for details on handling chunking of large results.
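For example, a minimal sketch of that loop (namespace and page size chosen for illustration):

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func listAllConfigMaps(client kubernetes.Interface, namespace string) ([]corev1.ConfigMap, error) {
	var all []corev1.ConfigMap
	opts := metav1.ListOptions{Limit: 1000} // page size, tune as needed
	for {
		list, err := client.CoreV1().ConfigMaps(namespace).List(context.TODO(), opts)
		if err != nil {
			return nil, err
		}
		all = append(all, list.Items...)
		if list.Continue == "" { // an empty continue token means this was the last page
			return all, nil
		}
		opts.Continue = list.Continue // reissue the same query with the token
	}
}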
You should use a ListPager to paginate requests that need to query many objects. The ListPager supports buffering pages, so it performs better than handling the Limit and Continue values yourself.
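Here is a sketch of the same listing done with the pager helper from client-go (again, namespace and page size are just examples):

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/pager"
)

func listWithPager(client kubernetes.Interface, namespace string) error {
	p := pager.New(pager.SimplePageFunc(func(opts metav1.ListOptions) (runtime.Object, error) {
		return client.CoreV1().ConfigMaps(namespace).List(context.TODO(), opts)
	}))
	// EachListItem fetches the list in chunks and calls the callback once per object.
	return p.EachListItem(context.TODO(), metav1.ListOptions{Limit: 500}, func(obj runtime.Object) error {
		cm := obj.(*corev1.ConfigMap)
		_ = cm // process each ConfigMap here
		return nil
	})
}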

Traces not getting sampled in jaeger tracing

I'm new to using Jaeger tracing system and have been trying to implement it for a flask based microservices architecture. Below is my jaeger client config implemented in python:
config = Config(
    config={
        'sampler': {
            'type': 'const',
            'param': 1,
        },
        'logging': True,
        'reporter_batch_size': 1,
    },
    service_name=service,
)
I read somewhere that the sampling strategy is used to decide how many traces get sampled, especially for traces that don't carry any metadata. So, as per this config, does it mean that I'm sampling each and every trace, or just a few traces randomly? Mysteriously, when I pass random inputs to create spans for my microservices, the spans only show up after 4 to 5 minutes. I would like to understand this configuration spec better but have not been able to.
So, as per this config, does it mean that I'm sampling each and every trace, or just a few traces randomly?
Using the sampler type as const with 1 as the value means that you are sampling everything.
Mysteriously, when I pass random inputs to create spans for my microservices, the spans only show up after 4 to 5 minutes. I would like to understand this configuration spec better but have not been able to.
There are several things that might be happening. You might not be closing spans, for instance. I recommend reading the following two blog posts to try to understand what might be going on:
Help! Something is wrong with my Jaeger installation!
The life of a span
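For illustration, a minimal sketch of the same two points using the Go OpenTracing client (service name made up); the ideas carry over directly to the Python client: a const sampler with param 1 really does sample every trace, and spans only show up once they are finished and the reporter has been flushed.

package main

import (
	"io"
	"log"

	"github.com/opentracing/opentracing-go"
	"github.com/uber/jaeger-client-go"
	jaegercfg "github.com/uber/jaeger-client-go/config"
)

func initTracer() (opentracing.Tracer, io.Closer) {
	cfg := jaegercfg.Configuration{
		ServiceName: "some-service", // hypothetical service name
		Sampler: &jaegercfg.SamplerConfig{
			Type:  jaeger.SamplerTypeConst, // "const"
			Param: 1,                       // 1 = sample every trace
		},
		Reporter: &jaegercfg.ReporterConfig{
			LogSpans: true,
		},
	}
	tracer, closer, err := cfg.NewTracer()
	if err != nil {
		log.Fatal(err)
	}
	return tracer, closer
}

func main() {
	tracer, closer := initTracer()
	defer closer.Close() // flushes any buffered spans on shutdown

	span := tracer.StartSpan("do-work")
	// ... do the actual work ...
	span.Finish() // a span that is never finished is never reported
}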

Algorithm for checking timeouts

I have a finite set of tasks that need to be completed by clients. Clients get assigned a task on connection and keep getting new tasks after they finish the previous one. Each task needs to be completed by 3 unique clients. This makes sure that clients do not give wrong results for the tasks.
However, I don't want clients to take longer than 3000ms. As some tasks are dependent on each other, this could stall the progress.
The problem is that I'm having trouble checking timeouts of tasks - which should be done when no free tasks are available.
At the moment each task has a property called assignedClients which looks as follows:
assignedClients: [
  {
    client: Client,
    start: Date,
    completed: true
  },
  {
    client: Client,
    start: Date,
    completed: true
  },
  {
    client: Client,
    start: Date,
    completed: false
  }
]
All tasks (roughly 1000) are stored in a single array. Basically, when a client needs a new task, the pseudo-code is like this:
function onTaskRequest(client):
    for (task in tasks):
        if (task.assignedClients.length < 3):
            assignClientToTask(task, client)
            return
    // so no available tasks: look for timed-out assignments
    for (task in tasks):
        for (assigned in task.assignedClients):
            if (assigned.completed === false && Time.now() - assigned.start > 3000):
                removeOldClientFromAssignedClients(task, assigned)
                assignClientToTask(task, client)
                return
But this seems very inefficient. Is there an algorithm that is more effective?
What you want to do is store tasks in a priority queue (which is often implemented as a heap) by when they are available with the oldest first. When a client needs a new task you just peek at the top of the queue. If it can be scheduled at all, it can be scheduled on that task.
When a task is inserted it is given the current time as its priority. Once a task has picked up its full set of clients, you reinsert it with a priority equal to the expiry time of the oldest client that grabbed it.
If you're using a heap, then all operations should be no worse than O(log(n)) as compared to your current O(n) implementation.
Your data structure looks like JSON, in which case https://github.com/adamhooper/js-priority-queue is the first JavaScript implementation of a priority queue that turned up when I looked in Google. Your pseudocode looks like Python in which case https://docs.python.org/3/library/heapq.html is in the standard library. If you can't find an implementation in your language, https://en.wikipedia.org/wiki/Heap_(data_structure) should be able to help you figure out how to implement it.
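If it helps, here is a rough sketch of the approach (types and everything except the 3000ms threshold made up), using Go's container/heap purely as an illustration; the JavaScript and Python libraries above work the same way:

package main

import (
	"container/heap"
	"time"
)

// Task is keyed by the time at which it next becomes schedulable:
// "now" while it still has free slots, or the expiry (start + 3000ms) of
// its oldest outstanding assignment once all 3 slots are taken.
type Task struct {
	ID          string
	AvailableAt time.Time
	index       int // maintained by the heap
}

type TaskQueue []*Task

func (q TaskQueue) Len() int           { return len(q) }
func (q TaskQueue) Less(i, j int) bool { return q[i].AvailableAt.Before(q[j].AvailableAt) }
func (q TaskQueue) Swap(i, j int) {
	q[i], q[j] = q[j], q[i]
	q[i].index = i
	q[j].index = j
}
func (q *TaskQueue) Push(x interface{}) {
	t := x.(*Task)
	t.index = len(*q)
	*q = append(*q, t)
}
func (q *TaskQueue) Pop() interface{} {
	old := *q
	n := len(old)
	t := old[n-1]
	*q = old[:n-1]
	return t
}

// nextTask peeks at the earliest-available task in O(1); repositioning a
// task after changing AvailableAt is O(log n), versus the O(n) scans in
// the original pseudo-code.
func nextTask(q *TaskQueue, now time.Time) *Task {
	if q.Len() == 0 {
		return nil
	}
	top := (*q)[0]
	if top.AvailableAt.After(now) {
		return nil // nothing free and no assignment has timed out yet
	}
	return top
}

func main() {
	q := &TaskQueue{}
	heap.Init(q)
	heap.Push(q, &Task{ID: "a", AvailableAt: time.Now()})
	heap.Push(q, &Task{ID: "b", AvailableAt: time.Now().Add(3 * time.Second)})

	if t := nextTask(q, time.Now()); t != nil {
		// assign the requesting client here; once the task has 3 clients,
		// set AvailableAt to the oldest assignment's start + 3000ms and
		// fix its position in the heap:
		t.AvailableAt = time.Now().Add(3 * time.Second)
		heap.Fix(q, t.index)
	}
}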
