Algorithm for randomly selecting objects

I want to implement a simulation: there are 1000 objects, and over a period of 1800 seconds each object is randomly selected (or whatever the action is). The number of selected objects over time follows a rough distribution: 30% are selected within 60 seconds, 40% after 60 seconds but within 300 seconds, 20% after 300 seconds but within 600 seconds, and 10% after 600 seconds.
So what is the probability for each object being selected every second?

This might be more appropriate for the Programmers section of Stack Exchange: Programmers Exchange
But just taking a quick swipe at this, you select 300 objects in the first 60 seconds, 400 objects in the next 240 seconds, 200 objects in the next 300 seconds, and 100 objects in the last 1200 seconds. That gives you a sense of objects per second for each second of your simulation.
So, for example, you select 5 objects per second for the first 60 seconds, so there is a 5/1000 or 0.5% probability of selecting any specific object in each second of those first 60 seconds.
I think that should lead you to the answer if I understand your question correctly.
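For what it's worth, here is a small Python sketch of that arithmetic. The 1000 objects, 1800 seconds and the 30/40/20/10% split come from the question; the per-object probability uses the same simplification as above (it ignores objects that have already been selected):

TOTAL_OBJECTS = 1000

# (phase length in seconds, fraction of objects selected in that phase)
phases = [(60, 0.30), (240, 0.40), (300, 0.20), (1200, 0.10)]

for length, fraction in phases:
    objects_in_phase = fraction * TOTAL_OBJECTS      # e.g. 300 in the first phase
    per_second = objects_in_phase / length           # e.g. 300 / 60 = 5 objects/s
    prob_per_object = per_second / TOTAL_OBJECTS     # e.g. 5 / 1000 = 0.5% per second
    print(f"{length:4d} s phase: {per_second:.2f} objects/s, "
          f"p(one specific object) ~ {prob_per_object:.2%} per second")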

Related

NiFi MergeRecords leaving out one file

I'm using NiFi to take in some user data and combine all the JSONs into one record. The MergeRecord processor is working just like I need, except it always leaves out one record (usually the same one every time). The processor is set to run every 60 seconds. I can't understand why, because there are only 56 records to merge. I've included images below for any help y'all may have.
Firstly, you have 56 FlowFiles; that does not necessarily mean 56 Records unless you have 1 Record per FlowFile.
You are using MergeRecord which counts Records, not files.
Your current config is set to Min 50 - Max 1000 Records
If you have 56 files with 1 Record in each, then merging 50 files is enough to meet the Minimum condition and release the bucket.
You also say Merge is set to run every 60 seconds, and perhaps this is not doing what you think it is. In almost all cases, Merge should be left to the default 0 sec schedule.
NiFi has no idea what "all" means; it takes an input and works on it - it does not know if or when the next input will come.
If every FlowFile is 1 Record, and it is categorically always 56 and that will never change, then your setting could be Min 56 - Max 56 and that will always merge all 56 at a time.
However, that is very inflexible - if the count suddenly changed to 57, you would need to modify the flow.
Instead, you could set the Min-Max to very high numbers, say 10,000-20,000 and then set a Max Bin Age to 60 seconds (and the processor scheduling back to 0 sec). This would have the effect of merging every Record that enters the processor until A) 10-20k Records have been merged, or B) 60 seconds expire.
Example scenarios:
A) All 56 arrive within the first 2 seconds of the flow starting
All 56 are merged into 1 file 60 seconds after the first file arrives
B) 53 arrive within the first 60 seconds, 3 arrive in the second 60 seconds
The first 53 are merged into 1 file 60 seconds after the first file arrives; the last 3 are merged into another file 60 seconds after the first of those 3 arrives
C) 10,000 arrive in the first 5 seconds
All 10k are merged immediately into 1 file; they do not wait for 60 seconds
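As a rough illustration of those release rules, here is a toy model in Python. This is not NiFi's code, just the "Minimum Number of Records or Max Bin Age, whichever comes first" behaviour described above, with the suggested 10,000 / 60-second settings:

import time

MIN_RECORDS = 10_000        # Minimum Number of Records
MAX_BIN_AGE_S = 60          # Max Bin Age

bin_count = 0               # records currently waiting in the bin
bin_opened_at = None        # time the first record of this bin arrived

def add_records(n):
    # Called whenever a FlowFile containing n records arrives.
    global bin_count, bin_opened_at
    if bin_opened_at is None:
        bin_opened_at = time.monotonic()
    bin_count += n
    maybe_release()

def maybe_release():
    # Called on every arrival, and periodically by the scheduler.
    global bin_count, bin_opened_at
    if bin_opened_at is None:
        return
    age = time.monotonic() - bin_opened_at
    if bin_count >= MIN_RECORDS or age >= MAX_BIN_AGE_S:
        print(f"merging {bin_count} records after {age:.0f} s")
        bin_count, bin_opened_at = 0, None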

Understanding Locust users and spawn rate

I am trying to understand the exact relationship between the number of users and the spawn rate in Locust, which I could not find anywhere in the documentation.
So assume that I set the number of users to be 100, the spawn rate to be 50 users per second, and the test duration to be 10 seconds.
Is this a correct interpretation that in 2 seconds (100/50) all the required 100 users are ready and for the next 8 seconds we have 100 constant users running the tasks?
Furthermore, if I use the LoadTestShape class where I can change the number of users and the spawn rate at every tick so that for the next tick I increase the number of users to 150, does it take 1 second to spawn another 50 users to reach 150, or it spawns a whole new batch of 150 users in 3 seconds?
Is this a correct interpretation that in 2 seconds (100/50) all the required 100 users are ready and for the next 8 seconds we have 100 constant users running the tasks?
Yes.
Furthermore, if I use the LoadTestShape class where I can change the number of users and the spawn rate at every tick so that for the next tick I increase the number of users to 150, does it take 1 second to spawn another 50 users to reach 150, or it spawns a whole new batch of 150 users in 3 seconds?
With LoadTestShape you set the target user count (the total) and the rate at which you want to approach that number, so it is the first option: only the missing 50 users are spawned, which at 50 users per second takes about 1 second.
Note that the exact rate of the ramp-up is not 100% guaranteed. If you set the new target to take effect after X seconds, you can't be sure you'll have all 150 users running at X+1 seconds.
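A minimal LoadTestShape sketch of that first option; the host, the timings and the task are placeholders, only the tick() contract (return a (user_count, spawn_rate) tuple, or None to stop) comes from Locust itself:

from locust import HttpUser, task, LoadTestShape

class ChatUser(HttpUser):
    host = "http://localhost:8080"   # placeholder host

    @task
    def chat(self):
        self.client.get("/")         # placeholder interaction

class StepShape(LoadTestShape):
    def tick(self):
        run_time = self.get_run_time()
        if run_time < 10:
            return (100, 50)   # ramp to 100 users at 50/s, reached after ~2 s
        if run_time < 20:
            # target is now 150: only the missing 50 users are spawned,
            # so at 50 users/s the step takes about 1 extra second
            return (150, 50)
        return None            # stop the test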

Apache Storm UI window

In the Apache Storm UI, Window specifies the past period of time for which the statistics apply, so it may be 10 min, 3 h, or 1 day. But when a topology is actually running, is the number of tuples emitted/transferred computed over this window time? The UI shows 10-minute statistics even before 10 actual minutes have passed, which doesn't make sense to me.
For example: emitted = 1764260 tuples, so is the rate of tuple emission 1764260/600 ≈ 2940 tuples/sec?
It does not display the average, it displays the total number of tuples emitted in the last period of time (10 min, 3h or 1 day).
Therefore, if you started the application 2 minutes ago, it will display all tuples emitted in the last two minutes, and you'll see that the number increases until you get to 10 minutes.
After 10 minutes, it will only show the number of tuples emitted in the last 10 minutes, and not an average of the tuples emitted. So if, for example, you started the application 30 minutes ago, it will display the number of tuples emitted between minutes 20 to 30.
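To make that concrete, here is a tiny Python sketch of a rolling count. This is not Storm's implementation, just the idea of the 10-minute window: the value shown is a total over the window, not an average or a rate:

import time
from collections import deque

WINDOW_S = 600                 # the 10-minute window
emit_times = deque()           # timestamps of emitted tuples

def record_emit():
    emit_times.append(time.monotonic())

def emitted_in_window():
    cutoff = time.monotonic() - WINDOW_S
    while emit_times and emit_times[0] < cutoff:
        emit_times.popleft()   # drop tuples older than the window
    return len(emit_times)     # a running total, not an average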

Why is average response time decreasing when we increase the number of users?

I am using JMeter to run a performance test with different numbers of users. With 1 user, the average response time is 1.4 seconds, but with more users it's logical that the average response time would go up; instead it is decreasing. Can anyone explain why? The test scenario is that I interact a few times (2-3 interactions) with a chat bot.
Please help me understand these confusing results below:
1 user, 30-second test: 1.3 seconds average response time
5 users, 60-second test: 0.92 seconds average response time
10 users, 60-second test: 0.93 seconds average response time
20 users, 120-second test: 0.92 seconds average response time
The first iteration of the first user often involves some overhead on the client side (most commonly DNS resolution) and can have some overhead on the server side (server "warm-up"). That overhead is not incurred by the following iterations or users.
Thus what you see as a reduction in average time is actually a reduction of the impact that the slower "first user, first iteration" execution has on the overall outcome. This is why it's important to collect a sufficient sample, so that such a local spike no longer matters much. My rule of thumb is at least 10,000 iterations before looking at any averages, although the comfort level is up to every tester to set.
Also, when increasing the number of users, you should not expect the average to get worse unless you have reached a saturation point; it should rather stay stable. So if you expect your app to support no more than 20 users, then your result is surprising, but if you expect the application to support 20,000 users, you should not see any degradation of the average at 20 users.
To test whether this is what is happening, run 1 user for much longer, so that the total number of iterations is similar to running 20 users. Roughly, you would need to increase the 1-user test to about 40 minutes (the 20-user test runs for 120 seconds with 20 times the concurrency, i.e. about 2400 user-seconds in total).
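To see the effect with made-up numbers (one hypothetical 5-second warm-up request followed by 0.9-second steady-state requests), a rough Python illustration:

warmup = [5.0]                       # hypothetical slow first iteration
short_run = warmup + [0.9] * 29      # ~30 s with 1 user
long_run = warmup + [0.9] * 2399     # roughly the iteration count of 20 users x 120 s

print(sum(short_run) / len(short_run))   # ~1.04 s - the warm-up inflates the average
print(sum(long_run) / len(long_run))     # ~0.90 s - the warm-up barely matters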

Speed per second on update interval less than a second

Let's, for example here, look at this widget. It reads from sysfs, more precisely the files:
/sys/class/net/wlan0/statistics/tx_bytes
/sys/class/net/wlan0/statistics/rx_bytes
And displays the bandwidth in megabits per second. Now, the drill is, the widget is set to update every 1/4 of a second (250 ms). How can the widget then calculate speed per second if a second has not passed? Does it multiply the number it gets by 4? What's the drill?
The values read from tx_bytes and rx_bytes are always current (they are cumulative byte counters). The widget just has to read the values every 250 ms and memorize at least the last 4 of them. On each update, the difference between the current value and the value read 1 second ago can be taken, divided by 125,000, and correctly reported as the bandwidth in megabits per second.
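A rough Python sketch of that approach; the paths and the 250 ms interval come from the question, error handling is omitted:

import time
from collections import deque

RX_PATH = "/sys/class/net/wlan0/statistics/rx_bytes"

def read_bytes(path):
    with open(path) as f:
        return int(f.read().strip())

# keep the current sample plus the previous 4, i.e. one full second of history at 250 ms
samples = deque(maxlen=5)

while True:
    samples.append(read_bytes(RX_PATH))
    if len(samples) == samples.maxlen:
        # difference over the last second, converted from bytes/s to Mbit/s
        mbit_per_s = (samples[-1] - samples[0]) / 125_000
        print(f"{mbit_per_s:.2f} Mbit/s")
    time.sleep(0.25)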
