I have a Firebase database with Items in it. There could potentially be up to 1000 items in the database.
I am looking to pull 45 random children out of the database to use.
Any idea how I can do this without pulling them all out first and then weeding them down to what I need?
Assign each item an index, e.g. 0-999:

-Jhsu498984
  item_name: "my item 0"
  item_index: 0
-Ynkkj93ov9
  item_name: "my item 24"
  item_index: 24
Then, with a random number generator, generate 45 distinct random numbers (matching item_index values) and query for those specific items.
or
create all of the items and, in a separate node, keep their node refs:

item_refs
  -Jhsu498984: true
  -Ynkkj93ov9: true

Then you just need to load the item_refs into an array, randomly pick 45 from the array, and query for those items.
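A minimal sketch of that second approach in Python (the Firebase read itself is omitted; item_refs is simulated as a plain dict, and random.sample guarantees the 45 picks are distinct):

```python
import random

# Simulated snapshot of the item_refs node (in the real app this dict
# would come from a single read of /item_refs in Firebase).
item_refs = {f"-key{i:04d}": True for i in range(1000)}

# random.sample picks 45 distinct keys, so no item is fetched twice.
chosen_keys = random.sample(list(item_refs.keys()), 45)

# Each chosen key would then be queried individually, e.g. something
# like ref.child("items").child(key) in a typical Firebase client.
```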
I have a dataset with n rows; how can I access a fixed number of rows at regular intervals through the whole dataset using Python?
For example, in a 100-row dataset I want to access 10 rows every 10 rows, like 1:10, 20:30, 40:50, 60:70, 80:90.
I could think of something like this
df.iloc[np.array([int(x/10) for x in df.index]) % 2 == 0]
It takes the index of the dataframe, divides it by 10 and casts it to an int. This basically just removes the last digit in this example.
With the modulo statement the first 10 rows are True, the next 10 False and so on. This is then used with iloc to get just the lines with the True value.
This requires a continuously increasing index, which is not the case if, for example, some rows were already filtered out; reset_index can be used to reset the index.
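A runnable version of that one-liner (the value column is just filler data):

```python
import numpy as np
import pandas as pd

# 100-row frame with a continuous 0..99 index.
df = pd.DataFrame({"value": range(100)})

# int(x / 10) drops the last digit of the index, so each run of ten
# rows maps to one number; % 2 == 0 keeps every other run of ten.
mask = np.array([int(x / 10) for x in df.index]) % 2 == 0
selected = df.iloc[mask]

# Keeps rows 0-9, 20-29, 40-49, 60-69, 80-89 (50 rows total).
```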
I have a table in which we maintain each user's login and logout times. Now I want to show the admin a table of the number of active users over time, like:
00:00 - 250
00:15 - 225
00:30 - 240
00:45 - 190
01:00 - 240
....
..
What algorithm should we use?
Thanks in advance :)
Make a list/array of pairs (time; incr = +1 for login, -1 for logout)
Sort the list by time
Set ActiveCount = 0
Traverse the list, adding incr to ActiveCount
The value of ActiveCount at every moment is the number of active users
Example with two users: (login: 0; logout: 4) and (login: 2; logout: 6)

list:  (0;+1), (2;+1), (4;-1), (6;-1)
count: 0  1  2  1  0
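The traversal above, sketched in Python on the same example:

```python
# (login, logout) pairs for the two example users.
sessions = [(0, 4), (2, 6)]

# Build (time, incr) events: +1 at login, -1 at logout.
events = [(t, +1) for t, _ in sessions] + [(t, -1) for _, t in sessions]
events.sort()  # tuple order also puts a logout before a login at the same time

active = 0
counts = [active]
for _, incr in events:
    active += incr
    counts.append(active)

# counts is now [0, 1, 2, 1, 0], matching the trace above.
```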
You can iterate over the list of login-logout pairs for all the users and increment the count in the appropriate bucket. If a particular user's session spans multiple buckets, you'll have to increment the count in each of those buckets.
That's all about the algorithm, probably the simplest one.
If you want to go into implementation detail, you can use a HashMap or unordered_map whose keys are the times at which you want to report the number of users; each value starts at zero, and you increment it each time a user's session covers that time.
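A sketch of the bucket version, using plain minutes since midnight and a dict in place of the HashMap (the session times are made up for illustration):

```python
sessions = [(0, 40), (10, 70), (20, 35)]  # (login, logout) in minutes

bucket_size = 15
buckets = {t: 0 for t in range(0, 90, bucket_size)}  # 00:00, 00:15, ...

for login, logout in sessions:
    for t in buckets:
        # A user counts toward a bucket if logged in at that instant.
        if login <= t < logout:
            buckets[t] += 1
```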
I made a search app with Elasticsearch.
Items have a name and a follower count. I use the follower count to boost the Elasticsearch result.
Ex: let's say I have two items, item_1 = [name = "abc def", follower = 1000] and item_2 = [name = "abc", follower = 10].
So when a user searches for "abc", even though item_2 is the exact match, I return item_1 as the most likely result.
This works just fine for me.
But I want to add a new feature to this.
I want to be able to detect items that are getting popular and boost their score.
So, I think I could store the follower count daily for a week or a month.
Like:

ItemNo  Day1  Day2    Day3    Day4    ...
1       1000  1030    1040    1050    ...
2       50    100     200     400     ...
3       1M    1.001M  1.002M  1.003M  ...
4       1.1M  1.1M    1.1M    1.1M    ...
So, say the daily follower count increases like this for items 1, 2, 3 and 4.
Then I should be able to detect the increase in follower count for item 2 and boost it over item 1.
Because even though item 1 has more followers, item 2 is gaining more followers every day.
But item 3 should not be boosted over item 4, because the percentage increase on item 3 is very small.
Bottom line: I want to be able to detect increasing popularity, but it should be based on the percentage of increase.
So, do you have any suggestions, or can you refer me to a paper that would help me solve this problem?
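One simple way to quantify "getting popular" (my suggestion, not from the question) is the average day-over-day relative growth, computed from the stored daily counts:

```python
def growth_rate(counts):
    """Average day-over-day relative increase in follower count."""
    ratios = [(b - a) / a for a, b in zip(counts, counts[1:])]
    return sum(ratios) / len(ratios)

# Daily follower counts from the table above.
items = {
    1: [1000, 1030, 1040, 1050],
    2: [50, 100, 200, 400],
    3: [1_000_000, 1_001_000, 1_002_000, 1_003_000],
    4: [1_100_000, 1_100_000, 1_100_000, 1_100_000],
}

rates = {item: growth_rate(days) for item, days in items.items()}
# Item 2 doubles every day (rate 1.0) and outranks item 1 (~0.016),
# while items 3 and 4 stay near zero despite huge absolute counts.
```

The resulting rate could then be folded into the Elasticsearch score, e.g. as an extra boost field recomputed once a day.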
Let's say I have a game with player ids. Each id can have multiple character names (playerNames), and we have a score for each of those names. I would like to total all the scores per id and calculate the percentage of that total contributed by each playerName.
So, for instance:
id playerName playerScore
01 Test 45
01 Test2 15
02 Joe 100
would output
id {(playerName, playerScore, percentScore)}
01 {(Test, 45, .75), (Test2, 15, .25)}
02 {(Joe, 100, 1.0)}
Here's how I did it:
data = LOAD 'someData.data' AS (id:int, playerName:chararray, playerScore:int);
grouped = GROUP data BY id;
withSummedScore = FOREACH grouped GENERATE SUM(data.playerScore) AS summedPlayerScore, FLATTEN(data);
-- cast to double so integer division doesn't truncate percentScore to 0
withPercentScore = FOREACH withSummedScore GENERATE data::id AS id, data::playerName AS playerName, ((double)data::playerScore / summedPlayerScore) AS percentScore;
percentScoreGroup = GROUP withPercentScore BY id;
Currently, I do this with 2 GROUP BY statements, and I was curious if they were both necessary, or if there's a more efficient way to do this. Can I reduce this to a single GROUP BY? Or, is there a way I can iterate over the bag of tuples and add percentScore to all of them without flattening the data?
No, you cannot do this without 2 GROUPs, and the reason is more fundamental than just Pig:
To get the total number of points you need a linear pass through the player's scores.
Then you need another linear pass over the player's scores to calculate each fraction. You cannot do this before you know the sum.
Having said that, if the number of playerNames per player is small, I'd write a UDF that takes a bag of player scores and outputs a bag of score-per-playerName tuples, since each GROUP generates a reducer and the process becomes ridiculously slow. A UDF that takes the bag would have to do those 2 linear passes as well, but if the bags are small enough that won't matter, and it will certainly be an order of magnitude faster than creating another reducer.
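The two passes such a UDF would perform, sketched in Python on the example rows (not Pig, just to show the shape of the computation):

```python
rows = [(1, "Test", 45), (1, "Test2", 15), (2, "Joe", 100)]

# Pass 1: sum the scores per id.
totals = {}
for pid, _, score in rows:
    totals[pid] = totals.get(pid, 0) + score

# Pass 2: divide each score by its id's total.
result = {}
for pid, name, score in rows:
    result.setdefault(pid, []).append((name, score, score / totals[pid]))

# result == {1: [('Test', 45, 0.75), ('Test2', 15, 0.25)],
#            2: [('Joe', 100, 1.0)]}
```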
OK, say I have a subreport that populates a chart from data in a table. I have a summary sum field that adds up the total of each row displayed. I am about to add two new rows that need to be displayed but not included in the sum. There is a field in the table that holds a number from 1-7. If I added these new rows to the database, I would assign them a negative number, like -1 and -2, to differentiate them from the other records. How can I set up a formula so that it sums all of the amount fields except for the records whose 'order' number, as we'll call it, is -1 or -2? Thanks!
Use a Running Total Field and set its evaluate formula to something like {new_field} >= 0, so it only adds the value to the sum when the record passes that test.
The way to accomplish this without a running total is with a formula like this:
if {OrderNum} >= 0 Then {Amount}