I am trying out WSO2 CEP for our new requirement. Currently i have written a query to find timedout events.
from InputStream#window.time(5 minutes)
select *
insert into TimeoutRequest for expired-events.
But my requirement is, the 5 minutes mentioned in time window will vary from each request. Some request should get timeout in 5 mins and some in 10 minutes. How to pass dynamic value for window.time(n minutes). If we can do via Custom Transformer or Custom Window, i am not getting the right context on how to do this.
There can be different approaches to implement this:
Custom window - you can write you own window (extending the time window) which looks for a specific attribute in events to determine their timeout durations.
If there is only a limited set of time durations, you can simply define a window for each duration and point the incoming events to the relevant window using a filter.
e.g:
from InputStream[timeoutValue == 5]#window.time(5 minutes) select * insert into TimeoutRequest for expired-events
from InputStream[timeoutValue == 10]#window.time(10 minutes) select * insert into TimeoutRequest for expired-events
Related
I deployed an apache beam pipeline to GCP dataflow in a DEV environment and everything worked well. Then I deployed it to production in Europe environment (to be specific - job region:europe-west1, worker location:europe-west1-d) where we get high data velocity and things started to get complicated.
I am using a session window to group events into sessions. The session key is the tenantId/visitorId and its gap is 30 minutes. I am also using a trigger to emit events every 30 seconds to release events sooner than the end of session (writing them to BigQuery).
The problem appears to happen in the EventToSession/GroupPairsByKey. In this step there are thousands of events under the droppedDueToLateness counter and the dataFreshness keeps increasing (increasing since when I deployed it). All steps before this one operates good and all steps after are affected by it, but doesn't seem to have any other problems.
I looked into some metrics and see that the EventToSession/GroupPairsByKey step is processing between 100K keys to 200K keys per second (depends on time of day), which seems quite a lot to me. The cpu utilization doesn't go over the 70% and I am using streaming engine. Number of workers most of the time is 2. Max worker memory capacity is 32GB while the max worker memory usage currently stands on 23GB. I am using e2-standard-8 machine type.
I don't have any hot keys since each session contains at most a few dozen events.
My biggest suspicious is the huge amount of keys being processed in the EventToSession/GroupPairsByKey step. But on the other, session is usually related to a single customer so google should expect handle this amount of keys to handle per second, no?
Would like to get suggestions how to solve the dataFreshness and events droppedDueToLateness issues.
Adding the piece of code that generates the sessions:
input = input.apply("SetEventTimestamp", WithTimestamps.of(event -> Instant.parse(getEventTimestamp(event))
.withAllowedTimestampSkew(new Duration(Long.MAX_VALUE)))
.apply("SetKeyForRow", WithKeys.of(event -> getSessionKey(event))).setCoder(KvCoder.of(StringUtf8Coder.of(), input.getCoder()))
.apply("CreatingWindow", Window.<KV<String, TableRow>>into(Sessions.withGapDuration(Duration.standardMinutes(30)))
.triggering(Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardSeconds(30))))
.discardingFiredPanes()
.withAllowedLateness(Duration.standardDays(30)))
.apply("GroupPairsByKey", GroupByKey.create())
.apply("CreateCollectionOfValuesOnly", Values.create())
.apply("FlattenTheValues", Flatten.iterables());
After doing some research I found the following:
regarding constantly increasing data freshness: as long as allowing late data to arrive a session window, that specific window will persist in memory. This means that allowing 30 days late data will keep every session for at least 30 days in memory, which obviously can over load the system. Moreover, I found we had some ever-lasting sessions by bots visiting and taking actions in websites we are monitoring. These bots can hold sessions forever which also can over load the system. The solution was decreasing allowed lateness to 2 days and use bounded sessions (look for "bounded sessions").
regarding events dropped due to lateness: these are events that on time of arrival they belong to an expired window, such window that the watermark has passed it's end (See documentation for the droppedDueToLateness here). These events are being dropped in the first GroupByKey after the session window function and can't be processed later. We didn't want to drop any late data so the solution was to check each event's timestamp before it is going to the sessions part and stream to the session part only events that won't be dropped - events that meet this condition: event_timestamp >= event_arrival_time - (gap_duration + allowed_lateness). The rest will be written to BigQuery without the session data (Apparently apache beam drops an event if the event's timestamp is before event_arrival_time - (gap_duration + allowed_lateness) even if there is a live session this event belongs to...)
p.s - in the bounded sessions part where he demonstrates how to implement a time bounded session I believe he has a bug allowing a session to grow beyond the provided max size. Once a session exceeded the max size, one can send late data that intersects this session and is prior to the session, to make the start time of the session earlier and by that expanding the session. Furthermore, once a session exceeded max size it can't be added events that belong to it but don't extend it.
In order to fix that I switched the order of the current window span and if-statement and edited the if-statement (the one checking for session max size) in the mergeWindows function in the window spanning part, so a session can't pass the max size and can only be added data that doesn't extend it beyond the max size. This is my implementation:
public void mergeWindows(MergeContext c) throws Exception {
List<IntervalWindow> sortedWindows = new ArrayList<>();
for (IntervalWindow window : c.windows()) {
sortedWindows.add(window);
}
Collections.sort(sortedWindows);
List<MergeCandidate> merges = new ArrayList<>();
MergeCandidate current = new MergeCandidate();
for (IntervalWindow window : sortedWindows) {
MergeCandidate next = new MergeCandidate(window);
if (current.intersects(window)) {
if ((current.union == null || new Duration(current.union.start(), window.end()).getMillis() <= maxSize.plus(gapDuration).getMillis())) {
current.add(window);
continue;
}
}
merges.add(current);
current = next;
}
merges.add(current);
for (MergeCandidate merge : merges) {
merge.apply(c);
}
}
I'm trying to trigger a notification which has no expiry but must be closed by pressing the top-right X close button. Is this possible?
I've been able to trigger a timed notification which also closes when anywhere else is clicked. With this answer.
[reflection.assembly]::loadwithpartialname("System.Windows.Forms")
[reflection.assembly]::loadwithpartialname("System.Drawing")
$notify = new-object system.windows.forms.notifyicon
$notify.icon = [System.Drawing.SystemIcons]::Information
$notify.visible = $true
$notify.showballoontip(10,"New Chat!","You have received New Chat!",[system.windows.forms.tooltipicon]::None)
Per Microsofts NotifyIcon.ShowBalloonTip Method documentation, the actual timeout property is set by the current system settings.
Minimum and maximum timeout values are enforced by the operating system and are typically 10 and 30 seconds, respectively, however this can vary depending on the operating system. Timeout values that are too large or too small are adjusted to the appropriate minimum or maximum value. In addition, if the user does not appear to be using the computer (no keyboard or mouse events are occurring) then the system does not count this time towards the timeout.
According to a couple of more google searches, you can set the time for your profile through the Registry ( Regedit - HKEY_CURRENT_USER\Control Panel\Accessibility: MessageDuration - didn't work for me).
Through group policy, or using theSystemParametersInfo API which is out of my league to explain any further. Only reference I can find was configuring the Accessibility/System Parameter: SPI_SETMESSAGEDURATION.
Its C++ though and only other article I could find was this one:SystemParametersInfoA function.
Seems possible but, it will definitely be a hassle to get it working.
I have state change duration data between my object state in milliseconds.I am sending this data to graphite. I want to create a single stat panel which show me the percentage of the duration less than 20 seconds. How can I create it? Any idea or any similar scenario example will be useful.
myProjectName.FromStateToState.duration 10000ms
myProjectName.FromStateToState.duration 15000ms
myProjectName.FromStateToState.duration 21000ms
myProjectName.FromStateToState.duration 25000ms
myProjectName.FromStateToState.duration 30000ms
Assume for above scenario I expect my percentage should be %40. Because I have 5 duration data and 2 of them is less than 20 seconds. I am using Graphite as data source and Grafana as visualizing.
Temporary Solution
Because I couldn't get enough attention and any answer, I will add my temprorary solution to here. If I learn exact solution in the future I will post as an answer too.
Basically I created two counter like counterSuccess and counterFail. If state change duration is less than 20 seconds increase counterSuccess otherwise increase counterFail. Then get percentage of the success rate via following basic formula counterSuccess/(counterSuccess + counterFail).
Graphite commands at Grafana Panel:
A : sumSeries(myProjectName.FromStateToState.counterSuccess.count)
B : sumSeries(myProjectName.FromStateToState.counterFail.count)
C : sumSeries(#A, #B)
D : divideSeries(#A,#C)
I defined a single stat at grafana to show it as single percentage;
I need to collect event logs from Windows those are logged before 10 seconds. Using pull subscription I could collect already saved logs before execution of program and saving logs while program is running. I tried with the code available on MSDN:
Subscribing to Events
"I need to start to collect the event logged 10 seconds ago". Here I think I need to set value for LPWSTR pwsQuery to achieve that.
L"*[System/Level= 2]" gives the events with level equal to 2.
L"*[System/EventID= 4624]" gives events with eventID is 4624.
L"*[System/Level < 1]" gives events with level < 2.
Like that I need to set the value for pwsQuery to get event logged near 10 seconds. Can I do in the same way as above? If so how? If not what are the other ways to do it?
EvtSubscribe() gives you new events as they happen. You need to use EvtQuery() to get existing events that have already been logged.
The Consuming Events documentation shows a sample query that retrieves events beginning at a specific time:
// The following query selects all events from the channel or log file where the severity level is
// less than or equal to 3 and the event occurred in the last 24 hour period.
XPath Query: *[System[(Level <= 3) and TimeCreated[timediff(#SystemTime) <= 86400000]]]
So, you can use TimeCreated[timediff(#SystemTime) <= 10000] to get events in the last 10 seconds.
The TimeCreated element is documented here:
TimeCreated (SystemPropertiesType) Element
The timediff() function is described on the Consuming Events documentation:
The timediff function is supported. The function computes the difference between the second argument and the first argument. One of the arguments must be a literal number. The arguments must use FILETIME representation. The result is the number of milliseconds between the two times. The result is positive if the second argument represents a later time; otherwise, it is negative. When the second argument is not provided, the current system time is used.
I have a table of eBay itemid, and for each id I want to apply a reviseitem call, but from the second call I get the following error:
You have exceeded your maximum call limit of 3000 for 5 seconds. Try back after 5 seconds.
NB: I have just 4 calls.
How can I fix this problem?
ebay count the calls per second per unique IP's. So please make sure your all calls from your application must be less than 3000 per 5 seconds. hope this would help.
I have just finished an eBay project and this error can be misleading. eBay allow a certain amount of calla a day and if you exceed that amount in one 24 hour period you can get this error. You can get this amount increased by completing an Application Check form http://go.developer.ebay.com/developers/ebay/forums-support/certification
The eBay Trading API, to which your ReviseItem call belongs, allows up to 5000 calls per 24 hour period for all applications, and up to 1.5M calls / 24hrs for "Compatible Applications", i.e. applications that have undergone a vetting process called "Compatible Application Check". More details here: https://go.developer.ebay.com/developers/ebay/ebay-api-call-limits
However, that's just the generic, "Aggregate" call limit. There are different limits for specific calls, some of which are more liberal (AddItem: 100.000 / day) and others of which are more strict (SetApplication: 50 / day) than that. Additionally, there are hourly and periodic limits.
You can find out any application's applicable limits by executing the GetApiAccessRules call:
<GetApiAccessRulesResponse xmlns="urn:ebay:apis:eBLBaseComponents">
<Timestamp>2014-12-02T13:25:43.235Z</Timestamp>
<Ack>Success</Ack>
<Version>889</Version>
<Build>E889_CORE_API6_17053919_R1</Build>
<ApiAccessRule>
<CallName>ApplicationAggregate</CallName>
<CountsTowardAggregate>true</CountsTowardAggregate>
<DailyHardLimit>5000</DailyHardLimit>
<DailySoftLimit>5000</DailySoftLimit>
<DailyUsage>10</DailyUsage>
<HourlyHardLimit>6000</HourlyHardLimit>
<HourlySoftLimit>6000</HourlySoftLimit>
<HourlyUsage>0</HourlyUsage>
<Period>-1</Period>
<PeriodicHardLimit>10000</PeriodicHardLimit>
<PeriodicSoftLimit>10000</PeriodicSoftLimit>
<PeriodicUsage>0</PeriodicUsage>
<PeriodicStartDate>2006-02-14T07:00:00.000Z</PeriodicStartDate>
<ModTime>2014-01-20T11:20:44.000Z</ModTime>
<RuleCurrentStatus>NotSet</RuleCurrentStatus>
<RuleStatus>RuleOn</RuleStatus>
</ApiAccessRule>
<ApiAccessRule>
<CallName>AddItem</CallName>
<CountsTowardAggregate>false</CountsTowardAggregate>
<DailyHardLimit>100000</DailyHardLimit>
<DailySoftLimit>100000</DailySoftLimit>
<DailyUsage>0</DailyUsage>
<HourlyHardLimit>100000</HourlyHardLimit>
<HourlySoftLimit>100000</HourlySoftLimit>
<HourlyUsage>0</HourlyUsage>
<Period>-1</Period>
<PeriodicHardLimit>0</PeriodicHardLimit>
<PeriodicSoftLimit>0</PeriodicSoftLimit>
<PeriodicUsage>0</PeriodicUsage>
<ModTime>2014-01-20T11:20:44.000Z</ModTime>
<RuleCurrentStatus>NotSet</RuleCurrentStatus>
<RuleStatus>RuleOn</RuleStatus>
</ApiAccessRule>
You can try that out four your own application by pasting an AuthToken for your application into the form at https://ebay-sdk.intradesys.com/s/9a1158154dfa42caddbd0694a4e9bdc8 and then press "Execute call".
HTH.