How to implement queue management system in Redis - algorithm

Im trying to implement queuing system using redis. So lets say we have this list:
> LPUSH listTickets 1 2 3 4 5 6 7 8 9
and as mobile app user someone was assigned number 4, and another user number 6. Now I want to display to them how many tickets are in front of them ( also to calculate the waiting estimated time). Now this may seem easy,
> LPOP listTicket
"4"
then we broadcast the result (the current ticket number to be called), then on mobile app everyone will subtract it from their own ticket number. for example my current ticket is 6 so 6-4=2 That way every user know how many tickets are ahead
However once you want to add feature to let the user delete their ticket or pushing it to the end of the queue, things get complicated. After deleting for example
> LRANGE listTicket 0 -1
1) "2"
2) "4"
3) "6"
4) "7"
5) "8"
when we LPOP listTicket we will get number 2 and the mobile app with ticket 6 will calculate 6-2 and get 4 which is the wrong calculation.
Do you have any algorithm in mind? Is getting the index of every ticket in the list every time someone delete their ticket an expensive process to calculate? Should I just send all tickets in the queue and let the users calculate their own position?
I want the system to be scale-able. So can the redis-node server handle total of 50 thousands (over different queues or lists) connected mobile apps getting their ranking in the queue they subscribed their ticket to.
I thought about using ZRANK but how will this load the server?

Related

Dataflow job has high data freshness and events are dropped due to lateness

I deployed an apache beam pipeline to GCP dataflow in a DEV environment and everything worked well. Then I deployed it to production in Europe environment (to be specific - job region:europe-west1, worker location:europe-west1-d) where we get high data velocity and things started to get complicated.
I am using a session window to group events into sessions. The session key is the tenantId/visitorId and its gap is 30 minutes. I am also using a trigger to emit events every 30 seconds to release events sooner than the end of session (writing them to BigQuery).
The problem appears to happen in the EventToSession/GroupPairsByKey. In this step there are thousands of events under the droppedDueToLateness counter and the dataFreshness keeps increasing (increasing since when I deployed it). All steps before this one operates good and all steps after are affected by it, but doesn't seem to have any other problems.
I looked into some metrics and see that the EventToSession/GroupPairsByKey step is processing between 100K keys to 200K keys per second (depends on time of day), which seems quite a lot to me. The cpu utilization doesn't go over the 70% and I am using streaming engine. Number of workers most of the time is 2. Max worker memory capacity is 32GB while the max worker memory usage currently stands on 23GB. I am using e2-standard-8 machine type.
I don't have any hot keys since each session contains at most a few dozen events.
My biggest suspicious is the huge amount of keys being processed in the EventToSession/GroupPairsByKey step. But on the other, session is usually related to a single customer so google should expect handle this amount of keys to handle per second, no?
Would like to get suggestions how to solve the dataFreshness and events droppedDueToLateness issues.
Adding the piece of code that generates the sessions:
input = input.apply("SetEventTimestamp", WithTimestamps.of(event -> Instant.parse(getEventTimestamp(event))
.withAllowedTimestampSkew(new Duration(Long.MAX_VALUE)))
.apply("SetKeyForRow", WithKeys.of(event -> getSessionKey(event))).setCoder(KvCoder.of(StringUtf8Coder.of(), input.getCoder()))
.apply("CreatingWindow", Window.<KV<String, TableRow>>into(Sessions.withGapDuration(Duration.standardMinutes(30)))
.triggering(Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardSeconds(30))))
.discardingFiredPanes()
.withAllowedLateness(Duration.standardDays(30)))
.apply("GroupPairsByKey", GroupByKey.create())
.apply("CreateCollectionOfValuesOnly", Values.create())
.apply("FlattenTheValues", Flatten.iterables());
After doing some research I found the following:
regarding constantly increasing data freshness: as long as allowing late data to arrive a session window, that specific window will persist in memory. This means that allowing 30 days late data will keep every session for at least 30 days in memory, which obviously can over load the system. Moreover, I found we had some ever-lasting sessions by bots visiting and taking actions in websites we are monitoring. These bots can hold sessions forever which also can over load the system. The solution was decreasing allowed lateness to 2 days and use bounded sessions (look for "bounded sessions").
regarding events dropped due to lateness: these are events that on time of arrival they belong to an expired window, such window that the watermark has passed it's end (See documentation for the droppedDueToLateness here). These events are being dropped in the first GroupByKey after the session window function and can't be processed later. We didn't want to drop any late data so the solution was to check each event's timestamp before it is going to the sessions part and stream to the session part only events that won't be dropped - events that meet this condition: event_timestamp >= event_arrival_time - (gap_duration + allowed_lateness). The rest will be written to BigQuery without the session data (Apparently apache beam drops an event if the event's timestamp is before event_arrival_time - (gap_duration + allowed_lateness) even if there is a live session this event belongs to...)
p.s - in the bounded sessions part where he demonstrates how to implement a time bounded session I believe he has a bug allowing a session to grow beyond the provided max size. Once a session exceeded the max size, one can send late data that intersects this session and is prior to the session, to make the start time of the session earlier and by that expanding the session. Furthermore, once a session exceeded max size it can't be added events that belong to it but don't extend it.
In order to fix that I switched the order of the current window span and if-statement and edited the if-statement (the one checking for session max size) in the mergeWindows function in the window spanning part, so a session can't pass the max size and can only be added data that doesn't extend it beyond the max size. This is my implementation:
public void mergeWindows(MergeContext c) throws Exception {
List<IntervalWindow> sortedWindows = new ArrayList<>();
for (IntervalWindow window : c.windows()) {
sortedWindows.add(window);
}
Collections.sort(sortedWindows);
List<MergeCandidate> merges = new ArrayList<>();
MergeCandidate current = new MergeCandidate();
for (IntervalWindow window : sortedWindows) {
MergeCandidate next = new MergeCandidate(window);
if (current.intersects(window)) {
if ((current.union == null || new Duration(current.union.start(), window.end()).getMillis() <= maxSize.plus(gapDuration).getMillis())) {
current.add(window);
continue;
}
}
merges.add(current);
current = next;
}
merges.add(current);
for (MergeCandidate merge : merges) {
merge.apply(c);
}
}

Youtube API - Subscriptions list returns different number of total results in set

I'm trying to get the complete list of my subscriptions. I've tried 3 methods, all of them returns different amount of subscriptions and I don't know what to do :)
1: Using Subscriptions: list with channel ID:
https://www.googleapis.com/youtube/v3/subscriptions?part=snippet&channelId=MY_CHANNEL_ID&maxResults=50&key=MY_API_KEY
"totalResults" is 942
2: Using Subscriptions: list with "mine" flag. the "totalResult" field is 991.
Where do 49 subscriptions appear from?
3: Open browser in incognite mode, go to
https://www.youtube.com/channel/MY_CHANNEL_ID
Click on "Channels" tab, scroll down to the end of the subscriptions list, open console and type something like that
document.querySelectorAll("#contents #items > *").length
I see 1039. Where do another 48 subscriptions come from?
And the 1039 seems to be the most accurace number - I have 6 subscriptions in a row and the last row has only 1 item. 173*6+1 = 1039
So the questions is - how do I get all the 1039 subscriptions by API? And why does it return wrong amount of subscriptions?
You are using Subscriptions: list and shouldn't have such kind of bugs with totalResults however maybe there is a YouTube Data API v3 endpoint bug as documented in Search: list totalResults is:
integer
The total number of results in the result set. Please note that the value is an approximation and may not represent an exact value. In addition, the maximum value is 1,000,000.
You should not use this value to create pagination links. Instead, use the nextPageToken and prevPageToken property values to determine whether to show pagination links.
So I would recommend you to enumerate all subscriptions you have with the different methods you explained and so count on your own by using nextPageToken.

How do you keep running count of aws spot fleet (request and maintain) request instances?

How can I get a running count of total number of spot fleet instances?
If the target capacity is 5 instances and sometime after if 2 of them somehow terminates and aws spot fleet automatically relaunch 2 instances to sustain the target capacity and I want to keep count of total instances launched as being 7 ( 5 target + 2 new instances because 2 of them terminated)
I can use this command to get the index value:
curl -s http://169.254.169.254/latest/meta-data/ami-launch-index
However, once the spot fleet is met and if one of the instances terminates for some reason, the spot fleet will relaunch another instance automatically to fulfill the TargetCapacity, but the launch index will revert to "0" vs incrementing by 1 from the very last number assigned.
thanks!

how to calculate network sync status - Logical discussion

Let me explain:
I am making a network routing protocol analyze function
I get line for NODE 1 saying
2->4 4->4 5->4 6->6
Meaning to get to 2 go to 4, to get to 4 go to 4, to go to 5 go to 4...
I also hove "god view" of the topology
And now for the Question :) :
How should i calculate the sync % of a node?
1) goodChoices/allNeededPaths (if both 0 then sync=1)
2) goodChoices/allmychoices * badOrMissingChoices/allNeededPaths
3) what is your idea
Problems that need to think about
If god view is NONE-CONNECTED but in my NODE topology i see 4->4 5->4 what % should i right?

eBay API error : You have exceeded your maximum call limit

I have a table of eBay itemid, and for each id I want to apply a reviseitem call, but from the second call I get the following error:
You have exceeded your maximum call limit of 3000 for 5 seconds. Try back after 5 seconds.
NB: I have just 4 calls.
How can I fix this problem?
ebay count the calls per second per unique IP's. So please make sure your all calls from your application must be less than 3000 per 5 seconds. hope this would help.
I have just finished an eBay project and this error can be misleading. eBay allow a certain amount of calla a day and if you exceed that amount in one 24 hour period you can get this error. You can get this amount increased by completing an Application Check form http://go.developer.ebay.com/developers/ebay/forums-support/certification
The eBay Trading API, to which your ReviseItem call belongs, allows up to 5000 calls per 24 hour period for all applications, and up to 1.5M calls / 24hrs for "Compatible Applications", i.e. applications that have undergone a vetting process called "Compatible Application Check". More details here: https://go.developer.ebay.com/developers/ebay/ebay-api-call-limits
However, that's just the generic, "Aggregate" call limit. There are different limits for specific calls, some of which are more liberal (AddItem: 100.000 / day) and others of which are more strict (SetApplication: 50 / day) than that. Additionally, there are hourly and periodic limits.
You can find out any application's applicable limits by executing the GetApiAccessRules call:
<GetApiAccessRulesResponse xmlns="urn:ebay:apis:eBLBaseComponents">
<Timestamp>2014-12-02T13:25:43.235Z</Timestamp>
<Ack>Success</Ack>
<Version>889</Version>
<Build>E889_CORE_API6_17053919_R1</Build>
<ApiAccessRule>
<CallName>ApplicationAggregate</CallName>
<CountsTowardAggregate>true</CountsTowardAggregate>
<DailyHardLimit>5000</DailyHardLimit>
<DailySoftLimit>5000</DailySoftLimit>
<DailyUsage>10</DailyUsage>
<HourlyHardLimit>6000</HourlyHardLimit>
<HourlySoftLimit>6000</HourlySoftLimit>
<HourlyUsage>0</HourlyUsage>
<Period>-1</Period>
<PeriodicHardLimit>10000</PeriodicHardLimit>
<PeriodicSoftLimit>10000</PeriodicSoftLimit>
<PeriodicUsage>0</PeriodicUsage>
<PeriodicStartDate>2006-02-14T07:00:00.000Z</PeriodicStartDate>
<ModTime>2014-01-20T11:20:44.000Z</ModTime>
<RuleCurrentStatus>NotSet</RuleCurrentStatus>
<RuleStatus>RuleOn</RuleStatus>
</ApiAccessRule>
<ApiAccessRule>
<CallName>AddItem</CallName>
<CountsTowardAggregate>false</CountsTowardAggregate>
<DailyHardLimit>100000</DailyHardLimit>
<DailySoftLimit>100000</DailySoftLimit>
<DailyUsage>0</DailyUsage>
<HourlyHardLimit>100000</HourlyHardLimit>
<HourlySoftLimit>100000</HourlySoftLimit>
<HourlyUsage>0</HourlyUsage>
<Period>-1</Period>
<PeriodicHardLimit>0</PeriodicHardLimit>
<PeriodicSoftLimit>0</PeriodicSoftLimit>
<PeriodicUsage>0</PeriodicUsage>
<ModTime>2014-01-20T11:20:44.000Z</ModTime>
<RuleCurrentStatus>NotSet</RuleCurrentStatus>
<RuleStatus>RuleOn</RuleStatus>
</ApiAccessRule>
You can try that out four your own application by pasting an AuthToken for your application into the form at https://ebay-sdk.intradesys.com/s/9a1158154dfa42caddbd0694a4e9bdc8 and then press "Execute call".
HTH.

Resources