Closest to in active record - rails-activerecord

We have a model called "games" with the following attributes: id, contest_id:integer, high:integer
We need to get a list of games with the high closest to say 500 for each contest_id?
Can this be done only using ActiveRecord?
Note: The “high” value can be higher or lower than 500 (as long as it's the closest), and one game per contest_id.

Related

Need to optimize Neo4j Query

I am creating a social media app whose back-end is Neo4j. Here, I want to fetch friends activities on user's homepage. I have put a limit of 10 activities per scroll.
This is my current Neo4j Query:
MATCH (m:Person { id: '$id'})-[:FOLLOWS*1..1]->(f:Person)-[:HAS_ACTIVITY]->(la:Activity)-[:NEXT*0..]->(act:Activity)
WHERE act.verb IN ['post' ,'post_media_item' ,'watch_activity']
RETURN f, act ORDER BY act.published DESC LIMIT 10
In the above query user are depicted by Person node and User-feed is depicted by Activity node. "m" is current user whose friends "f" are depicted by the relationship "Follows". Activity for each user are interlinked with the relationship "NEXT" and are linked with Person node with relationship "HAS_ACTIVITY". It is intentional to keep maximum hops blank as a Person can have any number of activities.
For this query as Friend List of a person grows, so does the overall query execution time.
Currently for a user having 50 friends the time taken is 3-5 seconds.
I have to perform further aggregation from the result of this query which takes query execution time to 7-10 seconds.
Please find attached a screenshot of PROFILE and EXPLAIN:
How can I optimize my Cypher query so that I get the quickest result irrespective of a number of friends?
Neo4j Version used: 3.0.6
A simple thing would be to update to Neo4j 3.5.x at least.
If a person has all activities linked to it with HAS_ACTIVITY relationship, why do you need to further traverse the NEXT relationship? Would the following query not suffice?
MATCH (m:Person { id: '58c1370350b91a0005be0136'})-[:FOLLOWS*1..1]->(f:Person)-[:HAS_ACTIVITY]->(la:Activity)
WHERE la.verb IN ['post' ,'post_media_item' ,'watch_activity']
RETURN f, la ORDER BY la.published DESC LIMIT 10

Datadog distinct-like custom metrics

Given following scenario:
A lambda receives an event via SQS
The lambda receives a uuid pointing to an entity.
The lambda may fail with an error
SQS will retrial that particular entity several times
The lambda will be called with different entities thousand of times
Right now we monitor a custom error-count metric like myService.errorType.
Which gives us an exact number of how many times an error occurred - independent from a specific entity: If an entity can't be processed like 100 times, then the metric value will be 100.
What I'd like to have, though, is a distinct metric based on the UUID.
Example:
entity with id 123 fails 10 times
entity with id 456 succeeds
entity with id 789 fails 20 times
Then I'd like to have a metric with the value of 2 - because the processes failed for two entities only (and not for 30, as it would be reported right now).
While searching for a solution I found the possibility of using tags. But as the docs point out they are not meant for such a use-case:
Tags shouldn’t originate from unbounded sources, such as epoch timestamps, user IDs, or request IDs. Doing so may infinitely increase the number of metrics for your organization and impact your billing.
So are there any other possibilities to achieve my goals?
I've solved it now by verifying the status via code and by adding tags to the metrics:
occurrence:first
subsequent
This way I can filter in my dashboard for occurrence:first only.
To make sure things are clear, you have a metric called myService.errorType with a tag entity. This metric is a counter that will increase every time an entity is in error. You will then use this metric query:
sum:myService.errorType{*} by {entity}
When you speak about UUID, it seems that the cardinality is small (here you show 3). Which means that every hour you will have small amount of UUID available. In that case, adding UUID to the metric tags is not as critical as user ID, timestamp, etc. which have a limitless number of options.
I would invite you to add this uuid tag, and check the cardinality in the metric summary page to ensure it works.
Then to get the number of UUID concerned by errors, you can use something like:
count_not_null(sum:myService.errorType{*} by {uuid})
Finally, as an alternative, if the cardinality of UUID can go through the roof, I would invite you to work with logs or work with Christopher's solution which seems to limit the cardinality increase as well.

DAX COUNT/COUNTA functions

I've looked at many threads regarding COUNT and COUNTA, but I can't seem to figure out how to use it correctly.
I am new to DAX and am learning my way around. I have attempted to look this up and have gotten a little ways to where I need to be but not exactly. I think I am confused about how to apply a filter.
Here's the situation:
Four separate queries used to generate the data in the report; but only need to use two for the DAX function (Products and Display).
I have three columns I need to filter by, as follows:
Customer (Display or Products query; can do either)
Brand (Products query)
Location (Display query)
I want to count the columns based on if the data is unique.
Here's an example:
Customer: Big Box Buy;
Item: Lego Big Blocks;
Brand: Lego;
Location: Toys;
BREAK
Customer: Big Box Buy;
Item: Lego Star Wars;
Brand: Lego;
Location: Toys;
BREAK
Customer: Big Box Buy;
Item: Surface Pro;
Brand: Microsoft;
Location: Electronics;
BREAK
Customer: Little Shop on the Corner;
Item: Red Bicycle;
Brand: Trek;
Location: Racks;
In this example, no matter the fact that the items are different, we want to look at just the customer, the brand, and the location. We see in the first two records, the customer is "Big Box Buy" and the brand is "Lego" and the location is "Toys". This appears twice, but I want to count it distinct as "1". The next "Big Box Buy" store has the brand "Microsoft" and the location is "Electronics". It appears once and only once, and thus the distinct count is "1" anyway. This means that there are two separate entries for "Big Box Buy", both with a count of 1. And lastly there is "Little Shop on the Corner" which appears just once and is counted just once.
The "skeleton" of the code I have is basically just to see if I can get a count to work at all, which I can. It's the FILTER that I think is the problem (not used in the below example) judging by other threads I've read.
TotalDisplays = CALCULATE(COUNTA(products[Brand]))
Obviously I can't just count the amount of times a brand appears as that would give me duplicates. I need it unique based on if the following conditions are met:
Customer must be the same
Brand must be the same
Location must be the same
If so, we distinctly count it as one.
I know I ranted a bit and may seem to have gone in circles, but I was trying to figure out how to explain it. Please let me know if I need to edit this post or post clarification.
Many thanks in advance as I go through my journey with DAX!
I believe I have the answer. I used a NATURALINNERJOIN in DAX to create a new, merged table since I needed to reference all values in the same query (couldn't figure out how to do it otherwise). I also created an "unique identity" calculated column that combined data from multiple rows, but was hidden behind the scenes (not actually displayed on the report) so I could then take a measure of the unique values that way.
TotalDisplays = COUNTROWS(DISTINCT('GD-DP-Merge'[DisplayCountCalcCol]))
My calculated column is as follows:
DisplayCountCalcCol = 'GD-DP-Merge'[CustID] & 'GD-DP-Merge'[Brand] & 'GD-DP-Merge'[Location] & 'GD-DP-Merge'[Order#]
So the measure TotalDisplays now reports back the distinct count of rows based on the unique value of the customer ID, the brand, and the location of the item. I also threw in an order number just in case.
Thanks!
I am semi new to DAX and was struggling with Count and CountA formula, you post has helped me with answers. I would like to add the solution which i got for my query: Wanted count for Right Time start Achieved hence if anyone is looking for this kind of answer use below, filter will be selecting the table and adding string which you want to
RTSA:=calculate(COUNTA([RTS]),VEO_Daily_Services[RTS]="RTSA")

Efficient way to query

My app has a class that saves picture that users upload. Each object in the class has a city property that holds the name of the city that the picture was taken at, and a like property that tracks the number of likes.
I want to be able to send a query that returns one picture per city and each picture should have the highest ranking of likes in the city it belongs to. How can I do that?
One way which I first thought about is doing multiple queries by fetching the most liked picture of a city and save it in an array, and then do the same to other cities.
However, each country has more than one city, thus it's not that efficient.
Parse doesn't support the ordinary operations used in databases. Besides, I tried to use a compound query. Unfortunately, I can't set limit or ordering on the subqueries. Any good solution for this?
It would be easy using group by. Unfortunately, Parse does not support "select distinct" or "group by" features.
As you've suggested you need to fetch for each country all the cities, and for each one get the top most rated photo.
BUT, since Parse has strict restrictions on the duration time execution of a request ( 3 sec for an event listener, 7 sec for a custom function ), I suggest you to do this in a background job, saving in a new table the top rated photo for each city. In this way you can easily query the db from client. The Background jobs can be executed up to 15 minuted before parse drop them, so you could make that kind of queries without timeouts.
Hope it helps

Creating DAX peer measure

The scenario:
We are an insurance brokerage company. Our fact table is claim metrics current table. This table has unique rows for multiple claim sid-s, so that, countrows(claim current) gives the correct count of the number of unique claims. Now, this table also has clientsid and industrysid. The relation between client and industry here is that, 1 industry can have multiple clients, and 1 client can belong to only 1 industry.
Now, let us consider a fact called claimlagdays, which is present in the table at the granularity of claimsid.
Now, one requirement is that, we need to find out "peer" sum(claimlagdays). This, for a particular client, is basically calculated as:
sum(claimlagdays) for the industry of the client being filtered (minus) sum(claimlagdays) for this particular client. Let's call this measure A.
Similar to above, we need to calculate "peer" claim count , which is claimcount for the industry of the client being filtered (minus) claimcount for this particular client.
Let's call this measure B.
In the final calculation, we need to divide A by B, to get the "peer" average lag days.
So basically, the hard part here is this: find the industry of the particular client which is being filtered for, and then, apply this filter to the fact table (claim metrics current) to find out the total claim count/other metric only for this industry. then of course, subtract the client figure from this industry figure to get the "peer" measure. This has to be done for each row, keeping intact any other filters which might be applied in the slicer(date/business unit, etc.)
There are a couple of other filters static which need to be considered, which are present in other tables, such as "Claim Type"(=Indemnity/Medical) and Claim Status(=Closed).
My solution:
For measure B
I tried creating a calculated column, as:
Claim Count_WC_MO_Industry=COUNTROWS(FILTER(FILTER('Claim Metrics Current',RELATED('Claim WC'[WC Claim Type])="Medical" && RELATED('Coverage'[Coverage Code])="WC" && RELATED('Claim Status'[Status Code])="CL"),EARLIER('Claim Metrics Current'[IndustrySID])='Claim Metrics Current'[IndustrySID]))
Then I created the measure
Claim Count - WC MO Peer:=CALCULATE(SUM([Claim Count_WC_MO_Industry])/[Claim - Count])- [Claim - Count WC MO]
{I did a sum because, tabular model doesn't directly allow me to use a calculated column as a measure, without any aggregation. And also, that wouldn't make any sense since tabular model wouldn't understand which row to take}
The second part of the above measure is obviously, the claim count of the particular client, with the above-mentioned filters.
Problem with my solution:
The figures are all wrong.I am not getting a client-wise or year-wise separation of the industry counts or the peer counts. I am only getting a sum of all the industry counts in the measure.
My suspicion is that this is happening because of the sum which is being done. However, I don't really have a choice, do I, as I can't use a calculated column as a measure without some aggregation...
Please let me know if you think the information provided here is not sufficient and if you'd like me to furnish some data (dummy). I would be glad to help.
So assuming that you are filtering for the specific client via a frontend, it sounds like you just want
ClientLagDays :=
CALCULATE (
SUM ( 'Claim Metrics Current'[Lag Days] ),
Static Filters Here
)
Just your base measure of appropriate client lag days, including your static filters.
IndustryLagDays :=
CALCULATE (
[ClientLagDays],
ALL ( 'Claim Metrics Current'[Client] ),
VALUES ( 'Claim Metrics Current'[IndustrySID] )
)
This removes the filter on client but retains the filter on Industry to get the industry-wide total of lag days.
PeerLagDays:=[IndustryLagDays]-[ClientLagDays]
Straightforward enough.
And then repeat for claim counts, and then take [PeerLagDays] / [PeerClaimCount] for your [Average Peer Lag Days].

Resources