Find duplicates by LEVENSHTEIN distance - laravel

I'm trying to get duplicates with two conditions:
the standard groupBy method (by name and type)
and also by similarities on the name field.
Every article I have found on this compares records against a single given string, but here I want to compare each record with all other records in the same table.
So far I have tried the following query, which retrieves the distance correctly but does not allow grouping the results by distance:
ERROR: window functions are not allowed in HAVING LINE
$query->select('name', 'type', DB::raw('COUNT(*) as count'), DB::raw('LEVENSHTEIN(UPPER(name),UPPER(lag(name) OVER (order by name))) as distance'))
->groupBy('name', 'type')
->havingRaw('COUNT(*) > 1')
->orHavingRaw('LEVENSHTEIN(UPPER(name),UPPER(lag(name) OVER (order by name))) > 20');
How can I get duplicates using the Levenshtein distance on the same field across the whole table (PostgreSQL)?
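The query above compares each row only with its predecessor through lag(), whereas comparing every record with every other record is a pairwise (self-join) comparison. Here is a minimal sketch of that pairwise logic in plain Python, not the Laravel or PostgreSQL solution itself. The sample rows, the `near_duplicates` helper and the threshold of 2 are assumptions made for illustration. In PostgreSQL the same idea would likely be a self-join on the table using the levenshtein function from the fuzzystrmatch extension.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def near_duplicates(records, max_distance):
    """Pairs of rows with the same type whose upper-cased names are close.

    A distance of 0 covers the exact duplicates that groupBy would find.
    """
    pairs = []
    for (id_a, name_a, type_a), (id_b, name_b, type_b) in combinations(records, 2):
        if type_a != type_b:
            continue
        distance = levenshtein(name_a.upper(), name_b.upper())
        if distance <= max_distance:
            pairs.append((id_a, id_b, distance))
    return pairs


# Hypothetical sample rows: (id, name, type)
records = [
    (1, "Acme Corp", "client"),
    (2, "ACME Corp.", "client"),
    (3, "Globex", "client"),
    (4, "Acme Corp", "supplier"),
]
print(near_duplicates(records, max_distance=2))
```

Comparing all pairs grows quadratically with the number of rows, so on a large table it may be worth restricting the comparison first, for example to rows of the same type as done here.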

Related

Need to filter result set in DAX Power BI

I have the following simple relationship:
I have created the following visuals in Power BI:
I want to show Store Name, Orders (by the salesman selected in the slicer) and Total Orders in that store (ignoring the salesman selected in the slicer). I have created two very simple measures (visible in the visual above) and used them in a matrix visual. The visual shows all stores, while I want to show only those stores where Salesman X (the salesman selected in the slicer) has orders, i.e. I don't want the Store B row.
While solving this, I suspected the cause was that the visual is not cross-filtering. I used crossfilter but it made no difference. The data can be seen in the image below:
Please guide. Thanks in advance.
Try using the following measure in the visual in place of [Total Orders], while keeping the original [Total Orders] measure defined:
IF( ISBLANK([Orders Count]), BLANK(), [Total Orders])
Adding VALUES('Order'[Store ID]) to the measure solved the problem. The complete measure definition is as follows:
Total Orders = CALCULATE(
count('Order'[Order ID]),
REMOVEFILTERS(Salesman[Salesman Name]),
VALUES('Order'[Store ID]))
This solves the problem, but I could not understand how. VALUES brings only those stores where the salesman has orders. But when the salesman is removed from the filter context by REMOVEFILTERS, how does VALUES still bring only the stores where the salesman has orders?
a) You intend to utilize Store.salesmanName from Store in a slicer, meaning whatever is selected from there, you intend that selection to be applied on Order to give you the Order.StoreName. So when X is selected only A and C are returned.
b) Once that selection happens, you intend DAX to return the total count of each Order.StoreName whether it has a corresponding Store.salesmanID in Order.salesmanID or not. In other words, in this layer of the analysis, you want the previous selection to remain applied in the outer loop but to be ignored in the inner loop.
To be able to do that, you can do this,
totalCount =
VAR _store =
MAX ( 'Order'[storeID] ) //what is the max store ID
VAR _count =
CALCULATE (
COUNT ( 'Order'[SalesmanId] ),
FILTER ( ALL ( 'Order' ), 'Order'[storeID] = _store ) //remove any filters and apply the value from above explicitly in the filter
)
RETURN
_count
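The intended result can be restated outside DAX: for each store, count the selected salesman's orders, hide the store when that count is zero, and otherwise show the store's total over all salesmen. The following is only a plain-Python analogy of that logic, not a DAX implementation. The order rows and the `store_report` helper are hypothetical.

```python
# Hypothetical orders: (order_id, store, salesman)
orders = [
    (1, "A", "X"), (2, "A", "Y"),
    (3, "B", "Y"),
    (4, "C", "X"), (5, "C", "X"), (6, "C", "Z"),
]


def store_report(orders, salesman):
    """Map each visible store to (orders by the salesman, total orders in the store)."""
    report = {}
    for store in sorted({s for _, s, _ in orders}):
        own = sum(1 for _, s, m in orders if s == store and m == salesman)
        if own == 0:
            continue  # no orders for the selected salesman, so the row is hidden
        total = sum(1 for _, s, _ in orders if s == store)  # salesman ignored here
        report[store] = (own, total)
    return report


print(store_report(orders, "X"))
```

With these sample rows, store B drops out because salesman X has no orders there, while the totals for A and C still include orders from other salesmen.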

Power BI DAX measure: Count occurences of a value in a column considering the filter context of the visual

I want to count the occurrences of values in a column. In my case the value I want to count is TRUE().
Let's say my table is called Table and has two columns:
boolean   value
TRUE()    A
FALSE()   B
TRUE()    A
TRUE()    B
All solutions I found so far are like this:
count_true = COUNTROWS(FILTER(Table, Table[boolean] = TRUE()))
The problem is that I still want the visual (card), that displays the measure, to consider the filters (coming from the slicers) to reduce the table. So if I have a slicer that is set to value = A, the card with the count_true measure should show 2 and not 3.
As far as I understand, the FILTER function always overwrites the visual's filter context.
To further explain my intent: at an earlier point the TRUE/FALSE column had the values 1/0, and I could achieve my goal by just using the SUM function, which does not specify a filter context and simply acts within the visual's filter context.
I think the DAX you gave should work as long as it's a measure, not a calculated column. (Calculated columns cannot read filter context from the report.)
When evaluating the measure,
count_true = COUNTROWS ( FILTER ( Table, Table[boolean] = TRUE() ) )
the first argument inside FILTER is not necessarily the full table but that table already filtered by the local filter context (including report/page/visual filters along with slicer selections and local context from e.g. the rows/columns of a matrix visual).
So if you select Value = "A" via slicer, then the table in FILTER is already filtered to only include "A" values.
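As a rough analogy in plain Python (not DAX), the slicer first narrows the rows, and the counting function then only sees what is left. The rows are the four from the question. The `apply_slicer` and `count_true` helpers are hypothetical names used for illustration.

```python
# Rows of the example table from the question: (boolean, value)
rows = [(True, "A"), (False, "B"), (True, "A"), (True, "B")]


def apply_slicer(all_rows, selected_value=None):
    """Analogue of the filter context: keep only rows matching the slicer, if any."""
    if selected_value is None:
        return list(all_rows)
    return [row for row in all_rows if row[1] == selected_value]


def count_true(visible_rows):
    """Analogue of COUNTROWS(FILTER(Table, Table[boolean] = TRUE()))."""
    return sum(1 for flag, _ in visible_rows if flag)


print(count_true(apply_slicer(rows)))       # no slicer selection
print(count_true(apply_slicer(rows, "A")))  # slicer set to value = A
```

Without a selection the count covers all rows. With the slicer set to A it covers only the A rows, which is the behaviour the question asks for.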
I do not know for sure if this will fix your problem, but it is more efficient DAX in my opinion:
count_true = CALCULATE(COUNTROWS(Table), Table[boolean])
If you still have the issue after changing your measure to use this format, you may have an underlying issue with the model. There is also the function KEEPFILTERS that may apply here but I think using KEEPFILTERS is overcomplicating your case.

How to orderBy before sum

I have the following tables:
competitions (id, ...)
questions (id, ...)
teams (id, ...)
team_user (id, user_id, team_id, ...)
answer_user (user_id, competition_id, question_id, points, ...)
I am trying to build a query that lists all teams that participated in the competition with id = 20, sorted by the accumulated points of each team's users. However, sometimes two teams receive the same total, so the team that reached that total first should be listed before the others (based on the answers' created_at column).
I am able to get this list using the following query, but not able to sort based on answer_user created_at column:
$ranks = Team::withCount(['answers' => function ($q) use ($competition_id) {
$q->where('competition_id', $competition_id)
->select(DB::raw('sum(points)'));
}])
->where('competition_id', $competition_id)
->orderBy('answers_count', 'desc')
->get();
Edit 1
I can achieve the required sort when I replace
->select(DB::raw('sum(points)'))
with
->select(DB::raw('max(answer_user.created_at)'))
However, I actually want both aggregations to work. Basically, I want to find the total points each team scored, then sort ties by time, so that the first team to reach its score is shown first.
Please try to use sortBy() or sortByDesc() function after get()
https://laravel.com/docs/5.8/collections#method-sortbydesc
$ranks = Team::withCount(['answers' => function ($q) use ($competition_id) {
    $q->where('competition_id', $competition_id)
        ->select(DB::raw('sum(points)'));
}])
    ->where('competition_id', $competition_id)
    ->get()
    // withCount() exposes the aggregate as answers_count
    ->sortByDesc('answers_count');
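The ordering the question asks for has two keys: total points descending, then the time at which the team reached that total (its latest answer) ascending. In SQL terms this corresponds roughly to ordering by SUM(points) descending and then by MAX(created_at) ascending. The following is a minimal sketch of that two-key sort in plain Python, not an Eloquent query. The sample rows and the `rank_teams` helper are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows: (team_id, points, created_at), already limited to one competition
answers = [
    (1, 5, datetime(2020, 1, 1, 10, 0)),
    (1, 5, datetime(2020, 1, 1, 10, 30)),
    (2, 10, datetime(2020, 1, 1, 10, 15)),
    (3, 4, datetime(2020, 1, 1, 9, 0)),
]


def rank_teams(rows):
    """Team ids ordered by total points (desc), ties broken by earliest completion."""
    totals = defaultdict(int)
    reached_at = {}
    for team_id, points, created_at in rows:
        totals[team_id] += points
        # A team reaches its total when its latest answer is recorded.
        if team_id not in reached_at or created_at > reached_at[team_id]:
            reached_at[team_id] = created_at
    return sorted(totals, key=lambda team: (-totals[team], reached_at[team]))


print(rank_teams(answers))
```

Teams 1 and 2 both total 10 points here, and team 2 is ranked first because its last answer came earlier.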

Sum attributes of relation tables after performing division to them

I couldn't come up with an appropriate title, so please excuse that.
The situation is the following:
I've got two tables: montages and orders, where Montage belongs to Order.
My goal is to build a single MySQL query that returns a single float value representing a sum of values for multiple montages. For each montage in the query I need to divide the budget of its order by the number of montages that belong to the same order. The result of this division should be an attribute of each montage. Finally, I want to sum those attributes and retrieve a single value.
I've tried a lot of variation of something like the following, but none seemed to be written in correct syntax, so I kept getting errors:
$sum = App\Montage::where(/*this doesn't matter*/)
->join('orders', 'montages.order_id', '=', 'orders.id') //join the orders table
->select('montages.*, orders.budget') //include the budget column
->selectRaw('count(* where order_id = order_id) as all') //count all the montages of the same order and assign that count to the current montage
->selectRaw('(orders.budget / all) as daily_payment') //divide the budget of the order by the count; store the result as `daily_payment`
->sum('daily_payment') //sum the daily payments
I'm really lost with the proper syntax and can't figure it out. I'd estimate that to be a rather trivial sql task for people who know their stuff, but unfortunately I don't seem to be one of them... Any help is greatly appreciated!
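To pin down the arithmetic being asked for, here is a minimal sketch in plain Python rather than SQL or Eloquent. The sample data and the `sum_daily_payments` helper are hypothetical. It also assumes that the divisor is the number of all montages belonging to the order, not only the montages selected by the where clause.

```python
from collections import Counter

# Hypothetical data: order id -> budget, and montage id -> order id
orders = {1: 300.0, 2: 100.0}
montages = {10: 1, 11: 1, 12: 1, 13: 2}


def sum_daily_payments(selected_montage_ids):
    """Sum of (order budget / number of montages in that order) over the selection."""
    # Assumption: count ALL montages per order, not only the selected ones.
    montages_per_order = Counter(montages.values())
    return sum(
        orders[montages[m]] / montages_per_order[montages[m]]
        for m in selected_montage_ids
    )


print(sum_daily_payments([10, 11, 13]))
```

In this example order 1 has three montages, so each carries 100.0, and order 2 has one montage carrying 100.0, giving 300.0 for the three selected montages.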

Tableau - Filter Measure Based on Different Variables of the Same Dimension

I have the following dimensions: Patients and Collection Type (Blood or Tissue). Measure: Collections.
I am counting how many blood and tissue collections for each patient have been made.
Here is my table: Collections per Patient by Collection Type
Now I want to filter this table: I want to display only those Patients who have more than 2 Blood Collections and more than 2 Tissue Collections.
So, I want to see only Patient B, D, and E.
How can I do this?
There are a variety of ways you could accomplish your desired result. Probably one of the easier ways would be to unpivot your data so that 'blood collections' and 'tissue collections' are separate columns instead of one. I don't believe Tableau natively supports this while importing a data source currently; however, you can create two additional calculated fields to replicate an unpivot.
Blood Field:
IF [Collection_Type] = 'Blood'
THEN [Collection]
ELSE Null
END
Tissue Field:
IF [Collection_Type] = 'Tissue'
THEN [Collection]
ELSE Null
END
EDIT: Create a calculated field that contains your desired filter condition, e.g.:
(SUM([Blood Field]) > 2 AND SUM([Tissue Field]) > 2)
The calculated field will evaluate to TRUE or FALSE. Filter for records where this field is TRUE.
