Immediately Display New Metrics

I am using Graphite and Coda Hale Metrics to track the number of times particular APIs are called, as well as the top 10 callers. I assign a metric to each user who calls the API and use Graphite to bring back the top 10.
The problem is that if it is a new user, i.e. a new metric, it will only be displayed in Graphite once the tool is refreshed. Has anyone come across a workaround for this? Is there some way Graphite can automatically detect new meters?
Just to be clear: I can see the top ten API callers for the last 30 minutes, unless it is a brand new user who has never logged in before.

It seems that graphite-web uses an on-disk index generated by a glorified find command. Another script is available that you can run as a cron job to update the metric index file.
Whenever you update the index file, the graphite-web process will detect it and reload it.
Since reloading the index might be heavy for a large number of metrics (around 1M), I would advise modifying the update script a bit to update the file conditionally (only if it differs, for instance).
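A minimal sketch of such a conditional update, assuming the default /opt/graphite/storage/whisper data directory and /opt/graphite/storage/index file locations (both paths are assumptions; adjust them to your installation). It rebuilds the metric list from the .wsp files and only replaces the index file when the contents actually differ:

import os

WHISPER_DIR = "/opt/graphite/storage/whisper"
INDEX_FILE = "/opt/graphite/storage/index"

def build_index(whisper_dir):
    # Turn whisper file paths like foo/bar/baz.wsp into metric names foo.bar.baz.
    metrics = []
    for root, _dirs, files in os.walk(whisper_dir):
        for name in files:
            if name.endswith(".wsp"):
                rel = os.path.relpath(os.path.join(root, name), whisper_dir)
                metrics.append(rel[:-len(".wsp")].replace(os.sep, "."))
    return sorted(metrics)

def update_index_if_changed(whisper_dir, index_file):
    new_index = "\n".join(build_index(whisper_dir)) + "\n"
    try:
        with open(index_file) as f:
            old_index = f.read()
    except FileNotFoundError:
        old_index = ""
    # Only touch the file (and thus trigger a reload) when something changed.
    if new_index != old_index:
        tmp = index_file + ".tmp"
        with open(tmp, "w") as f:
            f.write(new_index)
        os.replace(tmp, index_file)  # atomic swap

if __name__ == "__main__":
    update_index_if_changed(WHISPER_DIR, INDEX_FILE)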
EDIT: after testing, graphite-web does not seem to call the reloading code.


Flow Triggering Itself (Possibly), Each run hits past IDs that were edited

I am pretty new to Power Automate. I created a flow that triggers when an item is created or modified. It initializes some variables and then runs some switch cases to assign values to each of them. The variables then go into an array, and another variable is incremented to get the total of the array. I then have a conditional to assign a value to a column in the list. I tested the flow by going into the modern view of the list and clicking the save button. This worked a bunch of times, and I sent it for user testing. One of the users edited multiple items by double-clicking into the item, which saves after each column change (which I assume triggers a run of the flow).
The flow seemingly works but seemed to get bogged down at a point, based on the run history. I let it sit overnight and then tested again, and now it shows runs from multiple IDs at a time even though I only edited one specific item.
I had another developer take a look at my flow and he could not spot anything wrong with it. It never had a hard error in testing, only warnings about conditionals causing a loop, but all my conditionals resolve. Pictures included. I am just not sure what caveats I might be missing.
I am currently letting the flow sit to see if it finishes catching up. I have read about the concurrent-run (concurrency) option as well as conditions on the trigger itself. I am curious why it seems to run on two (or more) records all at once without me or anyone else editing each one.
You might be able to ignore updates made by the service account (the account used in the connections of the flow's actions) by using the following trigger condition expression:
@not(equals(triggerOutputs()?['body/Editor/Claims'], 'i:0#.f|membership|johndoe@contoso.onmicrosoft.com'))

ElasticSearch delete_by_query: Trying to create too many scroll contexts

I am trying to fix this error while running the delete_by_query API on AWS Elasticsearch:
Trying to create too many scroll contexts. Must be less than or equal to: [500]. This limit can be set by changing the [search.max_open_scroll_context] setting.
After going through a few posts I have a basic idea of what a scroll context is; however, my understanding of "open scroll contexts" remains murky.
I have a few questions regarding my understanding:
Does it mean that the API will open a scroll context for the specified time (scroll = some period) and, after processing (within the stipulated scroll time), open up a new context?
If so, do the previously processed contexts remain open until the API terminates?
I have 4 Java EC2 instances running, each of which will execute the delete_by_query API. Can this also cause too many scroll contexts to remain open, or is it unrelated?
Please do shed some light if there's anything lacking in my understanding.
Coming to fixing the error:
The straightforward solution would be to increase the search.max_open_scroll_context parameter; however, it has negative side effects, as mentioned in the tip/note section of this document.
Are there any other solutions?
Can increasing the batch size per scroll help?
Edit: The EC2 instances (2 in east-1 and 2 in west-2) are running a Spring application at a frequency of 1 s (this frequency is deliberate and can't be changed due to some restrictions), listening to SQS (in the corresponding regions) for messages; delete_by_query acts on these messages (deleting based on some parameter from the message received).
Note: The SQS queues have a considerable amount of data coming in.
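For reference, the straightforward fix mentioned above is a dynamic cluster setting, so on a cluster where the cluster settings API is reachable it could be raised roughly like this. This is only a sketch: the endpoint is a placeholder, the value of 1000 is arbitrary, and AWS-managed Elasticsearch domains may restrict which settings can be changed this way.

import requests

# Placeholder endpoint; adjust to your domain.
ES_ENDPOINT = "https://search-my-domain.us-east-1.es.amazonaws.com"

resp = requests.put(
    f"{ES_ENDPOINT}/_cluster/settings",
    json={"persistent": {"search.max_open_scroll_context": 1000}},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())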

In Firebase Database, how to read a large list then get child updates, efficiently

I want to load a large list (say 5000 items) from a Firebase Database as fast as possible and then get updates whenever a child is added or changed.
My approach so far is to use ObserveSingleEvent to load the entire list at startup:
pathRef.ObserveSingleEvent(DataEventType.Value, (snapshot) =>
and, when that has finished, use ObserveEvent for child changes and additions:
nuint observerHandle1 = pathRef.ObserveEvent(DataEventType.ChildChanged, (snapshot) =>
nuint observerHandle2 = pathRef.ObserveEvent(DataEventType.ChildAdded, (snapshot) =>
The problem is that the ChildAdded event is triggered 5000 times, most of which are usually unnecessary, and it takes a long time to complete.
(From the docs: "child_added is triggered once for each existing child and then again every time a new child is added to the specified path.")
And I can't just ignore the first update for each child, because it might be a genuine change (e.g. the database was updated just after ObserveSingleEvent completed).
So I have to check each item on the client to see whether it has changed. This is all very slow and inefficient.
I've seen posts suggesting a query with LimitToLast etc., but this is not reliable, as child changes may be missed.
My plan B is to skip the ObserveSingleEvent load of the list and just rely on the 5000 ChildAdded calls, but that seems crazy: it is slower to get the list initially and probably uses more of the user's data allowance.
I would think this is a common scenario, so what is the best solution?
Also, I have persistence enabled, so a couple of related questions:
1) If the entire list has been loaded before, does ObserveSingleEvent with Value just load from disk with no network use when online (assuming no changes have happened since the last download), or does it download the whole list again?
2) Similarly for the 5000 ChildAdded calls: if everything is already on disk, is there any network use when online if nothing has changed?
(I'm using C# for iOS, but I'm sure it's the same issue on other platforms.)
The wire traffic for a value listener and for a child_* listener is exactly the same. The events are purely a client-side interpretation of the underlying data.
If you need all children, consider using only the child_* listeners and dropping the pathRef.ObserveSingleEvent(DataEventType.Value, ...) call. It'll lead to the simplest code.
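As an illustration of why the wire traffic is identical, here is a minimal sketch using the Firebase Admin SDK's raw stream listener in Python (a different SDK from the question's C# bindings, chosen purely for illustration; the credentials file, database URL, and node path are placeholders). The server sends one initial event containing the whole node, followed by incremental events for each change; value and child_* callbacks are both derived from that same stream on the client.

import firebase_admin
from firebase_admin import credentials, db

# Placeholder credentials and database URL.
cred = credentials.Certificate("service-account.json")
firebase_admin.initialize_app(cred, {"databaseURL": "https://example-db.firebaseio.com"})

def on_event(event):
    # First event: event.path == "/" and event.data holds the entire list.
    # Later events: event.path points at the child that was added or changed.
    print(event.event_type, event.path, event.data)

listener = db.reference("items").listen(on_event)
# ... later, stop listening:
# listener.close()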

How to deal with Elasticsearch index delay

Here's my scenario:
I have a page that contains a list of users. I create a new user through my web interface and save it to the server. The server indexes the document in Elasticsearch and returns successfully. I am then redirected to the list page, which doesn't contain the new user, because it can take up to one second for documents to become available for search in Elasticsearch.
Near real-time search in elasticsearch.
The Elasticsearch guide says you can manually refresh the index, but advises against doing so in production:
...don’t do a manual refresh every time you index a document in production; it will hurt your performance. Instead, your application needs to be aware of the near real-time nature of Elasticsearch and make allowances for it.
I'm wondering how other people get around this. I wish there were an event or something I could listen for that would tell me when the document is available for search, but there doesn't appear to be anything like that. Simply waiting one second is plausible, but it seems like a bad idea because it could presumably take much less time than that.
Thanks!
Even though you can force ES to refresh itself, you've correctly noticed that it might hurt performance. One way around this, and what people often do (myself included), is to give an illusion of real-time. In the end, it's merely a UX challenge and not really a technical limitation.
When redirecting to the list of users, you could artificially include the record you've just created in that list, as if it had been returned by ES itself. Nothing prevents you from doing that. By the time the page is refreshed, the new user record will be returned by ES anyway, and nobody cares where the record came from; all the user cares about at that moment is seeing the record they have just created, simply because we're used to thinking sequentially.
Another way to achieve this is to load an empty user-list skeleton and then retrieve and display the list of users via Ajax or some other asynchronous call.
Yet another way is to provide a visual hint on the UI that something is happening in the background and that an update is to be expected very shortly.
In the end, it all boils down to not surprising users, but giving them enough clues about what has happened, what is happening, and what they should still expect to happen.
UPDATE:
Just for completeness' sake: this answer predates ES5, which introduced a way to make sure that an indexing call does not return until the document is visible to search (or an error code is returned). By using ?refresh=wait_for when indexing your data, you can be certain that when ES responds, the new data is searchable.
Elasticsearch 5 has an option to block an indexing request until the next refresh has occurred:
?refresh=wait_for
See: https://www.elastic.co/guide/en/elasticsearch/reference/5.0/docs-refresh.html#docs-refresh
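A minimal sketch of what this looks like with a recent version of the official Python client (the host, index name, and document are placeholders; with the raw REST API the equivalent is appending ?refresh=wait_for to the index request):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder host

# Block until the document is visible to search (or the request fails).
es.index(
    index="users",                 # hypothetical index name
    id="42",
    document={"name": "Jane Doe"},
    refresh="wait_for",
)

# A search issued after index() returns will see the new document.
resp = es.search(index="users", query={"match": {"name": "Jane"}})
print(resp["hits"]["total"])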
Here is a fragment of code showing what I did in my Angular application to cope with this. In the component:
async doNewEntrySave() {
  try {
    // Save the new document via the application's own API client.
    const resp = await this.client.createRequest(this.doc).toPromise();
    // Show a "waiting for update" hint while Elasticsearch refreshes,
    // then reload the list once the new record should be searchable.
    this.modeRefreshDelay = true;
    setTimeout(() => {
      this.modeRefreshDelay = false;
      this.refreshPage();
    }, 2500);
  } catch (err) {
    this.error.postError(err);
  }
}
In the template:
<div *ngIf="modeRefreshDelay">
<h2>Waiting for update ...</h2>
</div>
I understand this is a quick-and-dirty solution, but it illustrates how the user experience should work. Obviously it breaks if the real-world latency turns out to be more than 2.5 seconds. A fancier version would poll until the new record shows up in the page data (with a retry limit, of course).
Unless you completely redesign Elasticsearch, you will always have some latency between the successful index operation and the time when that document shows up in search results.
Data should be available immediately after indexing is complete. A couple of general questions:
Have you checked CPU and RAM to determine whether you are taxing your ES cluster? If so, you may need to beef up your hardware config to account for it. ES loves RAM!
Are you using NAS (network-attached storage) or virtualized storage like EBS? Elastic recommends against both because of the latency. If you can use DAS (direct-attached storage) and SSDs, you'll be in much, much better shape.
To give you an AWS example, moving from m4.xlarge instances to r3.xlarge made HUGE performance improvements for us.

Segmenting on users who have performed a behaviour not behaving as expected

I want to look at the effect of having performed a specific action sequence at any (tracked) time in the past on user retention and engagement.
The action sequence is that of performing an optional New User Flow.
This is signalled to Google Analytics by sending it appropriate events. That works fine: the events show up in reports as expected.
My problem is what happens to the results when I use these events to create segments. I have tried two different ways of creating a segment based on this in Advanced Segments: via Conditions (defining the segment by the end event, filtered over users rather than sessions), and via Sequences (defining start and end events, again filtered over users rather than sessions).
What I get when I look at various retention/loyalty reports, using either of these segments, is quite clearly a result that performs this segmentation within a session, not across a user's sessions. So for NUF completers, I am seeing all my loyalty/recency on Session 1, in which people are most likely to do the NUF, if they ever do it at all. This is not what I want. (Mind you, it is something that could be really useful in another context, with another event! But not for the new user flow.)
What are my options for getting what I want? I see two possible ways forward:
Using custom dimensions: assigning a custom dimension value in the code when the New User Flow is completed. However, I do not know whether this will solve the cross-session persistence problem.
Injecting a User ID, which we do not currently do, and (somehow!) using the reports that become available when you inject a User ID to do this.
Is either of these paths plausible? Is there a better way forward? Is it silly to even try to do this in Google Analytics? I'm far more familiar with app-tracking solutions (e.g. Flurry, Mixpanel, deltaDNA), which do this as a matter of course, than with Google Analytics, and the fact that this is at the very least awkward in Google Analytics has come as a bit of a surprise.
thanks,
Heather
