Fuzzy Match (Same merchant, different rows) - fuzzy-search

I have a data set of 4k merchants who made some updates last December. I want to check whether two update descriptions from the same merchant are identical or match up to 80%.
The columns are:
Update Id, MerchantId, CreatedOn, Description
For updates made on the same date or on different dates, I want to find the percentage match between the descriptions.
Tools: R, Excel, Tableau
I have checked the character lengths of the descriptions, but matching only rows of the same length will be difficult.
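One possible way to score the descriptions, sketched here in Python rather than the tools listed above, is to compare every pair of updates within the same merchant and express the similarity as a percentage. The sample rows, the helper names (similarity, matches_per_merchant) and the choice of difflib's ratio as the similarity measure are all assumptions made for illustration; other measures (for example edit distance) would give different percentages.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample rows: (update_id, merchant_id, created_on, description)
updates = [
    (1, "M1", "2019-12-01", "Changed store opening hours"),
    (2, "M1", "2019-12-05", "Changed store opening hour"),
    (3, "M1", "2019-12-09", "New bank account added"),
    (4, "M2", "2019-12-02", "Updated address"),
]


def similarity(a, b):
    """Return a 0-100 similarity score between two strings, ignoring case."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def matches_per_merchant(rows, threshold=80.0):
    """Score every pair of updates that belong to the same merchant."""
    by_merchant = defaultdict(list)
    for row in rows:
        by_merchant[row[1]].append(row)

    results = []
    for merchant_id, merchant_rows in by_merchant.items():
        # combinations() yields each unordered pair exactly once
        for left, right in combinations(merchant_rows, 2):
            score = similarity(left[3], right[3])
            results.append(
                (merchant_id, left[0], right[0], round(score, 1), score >= threshold)
            )
    return results


pairs = matches_per_merchant(updates)
for pair in pairs:
    print(pair)
```

Because the comparison is pairwise within a merchant, the dates do not matter to the score; CreatedOn can be carried along in the result if same-date and different-date pairs need to be reported separately.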

Related

Google sheets Query, How to use query to select a value in rows based on header dates

I have a sheet with two tabs and a named range, and I'm trying to match and select a value in the Budget tab based on values from the Month tab using QUERY. The challenge I'm running into is how to use the dates in the header row of the Budget tab in the WHERE clause of the query.
Find the value from the named range Budget WHERE the Category in the Month tab matches the Category in Budget AND the date from the Month tab matches the date from the Budget tab; otherwise, return the value from the Lookup column of the Budget tab.
I think the sample sheet does a better job of showing this. Thanks in advance for any help.
https://docs.google.com/spreadsheets/d/1heN3I1tWiqBJ0LdHRK-26Koafk_8EudqfMy0cFWxFjI/edit?usp=sharing
I think that in this case, given the sheet layout, it is better to use VLOOKUPs.
Just use this ARRAYFORMULA across all months:
=ARRAYFORMULA(if($H$16:$H="","",if(VLOOKUP($H$16:$H,budget,ifna(match(I$15,Budget!$E$1:$K$1,0),COLUMNS(budget)),false)="", VLOOKUP($H$16:$H,budget,columns(budget),false), VLOOKUP($H$16:$H,budget,ifna(match(I$15,Budget!$E$1:$K$1,0),COLUMNS(budget)),false))))
Please note that I expanded the budget named range to cover the entire range, including Category and Lookup columns.

Spreadsheet - query-importrange sort by date and keep text in the same column

I am using 3 different spreadsheets which I have linked to a final spreadsheet, where specific columns are shown sorted by date ascending (Col2). The problem is that in the initial spreadsheets (where I am importing the data from), Col30 (which becomes Col2 in the final spreadsheet) contains both dates and text. What I need is for the final spreadsheet to sort by the dates and also show the text (in Col2 of the final spreadsheet, which imports data from Col30 of the 3 different spreadsheets).
The dates are sorted, but neither the text nor the rest of the data in the same row appears. The rows selected by "Col6 CONTAINS '"&$B$1&"'" only appear when I put a date in Col30 of the initial spreadsheets. Otherwise, when Col30 holds only text and no date, nothing is returned.
Any suggestions? Thank you in advance.
What I have tried so far, which works except that it does not show the text I need:
=QUERY(QUERY({IMPORTRANGE("url1 ";"sheet1!A2:AJ1000");IMPORTRANGE("url2 ";"sheet2!A2:AJ1000");IMPORTRANGE("url3 ";"sheet3!A2:AJ1000")};"Select Col5,Col30,Col31,Col21,Col22,Col23,Col24,Col34,Col35,Col36 where Col6 CONTAINS '"&$B$1&"'");"Select * where Col2 is not null order by Col2")
Here is what I believe you are trying to achieve:
=QUERY(
  {
    IMPORTRANGE("1usAXftvFrpCHz7LN43avWrWqSIO14iKM-pgwuG9jMeE";"Sheet1!A2:AJ")\
    ARRAYFORMULA(
      TO_TEXT(
        IMPORTRANGE("1usAXftvFrpCHz7LN43avWrWqSIO14iKM-pgwuG9jMeE";"Sheet1!AD2:AE")
      )
    )
  };
  "SELECT Col5,Col37,Col38,Col21,Col22,Col23,Col24,Col34,Col35,Col36 WHERE Col6='"&$B$1&"' AND Col37 is not null ORDER BY Col30, Col31"
)
Let's unpack the changes:
Remove the outer query. You don't need it; instead, add the condition and the ORDER BY to the first query.
Change the range to an open-ended one.
Add columns with the text version of the dates / times.
The last point is important because QUERY supports only a single data type per column. This means that when you were querying the date and time columns, you were losing the text (because it is of another type). Adding 2 more columns and forcing them to be text lets you include them in the result without losing information, and keeping the originals lets you order by them.
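The idea behind the extra text columns can be illustrated outside of Sheets with a small Python sketch. It only loosely mimics a typed column (values of the other type are treated as empty) and is not a model of how QUERY is implemented; the sample values and the helper name as_single_type are assumptions.

```python
from datetime import date

# Hypothetical stand-in for Col30: mostly dates, with one text entry.
col30 = [date(2020, 3, 1), "pending", date(2020, 1, 15)]


def as_single_type(values, keep_type):
    """Mimic a single-type column: values of any other type become None."""
    return [v if isinstance(v, keep_type) else None for v in values]


typed = as_single_type(col30, date)    # the text entry is lost here
text_copy = [str(v) for v in col30]    # a TO_TEXT-like copy keeps every value

# Order by the typed original (empty values last), but display the text copy.
rows = sorted(
    zip(typed, text_copy),
    key=lambda r: (r[0] is None, r[0] or date.min),
)
display = [text for _, text in rows]
print(display)
```

The typed column alone drops "pending", while the text copy still carries it, which is why the answer selects the text columns and orders by the originals.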
References
QUERY (Docs editors help)
TO_TEXT (Docs editors help)
ARRAYFORMULA (Docs editors help)

How to handle data duplicated due to same cases remaining each month in SPSS

I am writing a thesis on Airbnb's presence in Ireland and its effect on house prices. I've downloaded data from InsideAirbnb (.CSV), which describes each Airbnb host and house on a monthly basis. Each host has a unique host_id, each house has a unique house_id, and each host can have multiple house_ids.
Because these are monthly statistics, the same users are documented each month, which causes duplicates when the tables are merged. These duplicates have exactly the same data in every column except the date (written in the format mmm yyyy) and the Row_ID.
I'm not sure how to handle this data, as the duplicated rows obviously make it inaccurate. Is there a way to group the data based on the date, or should I have an array of date values in a single column for each house? Any suggestions would be greatly appreciated.
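As a rough illustration of the second option mentioned above (one record per house with a list of the months it appears in), here is a minimal Python sketch. The sample rows, the column order and the helper name collapse_by_house are assumptions, and whether collapsing is appropriate depends on the analysis; a panel-style analysis might instead keep one row per house per month.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical monthly snapshot rows: (row_id, host_id, house_id, month)
rows = [
    (1, "H1", "A", "Jan 2019"),
    (2, "H1", "A", "Feb 2019"),
    (3, "H1", "B", "Feb 2019"),
    (4, "H2", "C", "Jan 2019"),
]


def collapse_by_house(snapshot_rows):
    """Return one record per house_id with the months in which it was listed."""
    months = defaultdict(set)
    hosts = {}
    for _row_id, host_id, house_id, month in snapshot_rows:
        # Parse "mmm yyyy" so months sort chronologically, not alphabetically
        months[house_id].add(datetime.strptime(month, "%b %Y"))
        hosts[house_id] = host_id
    return {
        house_id: {
            "host_id": hosts[house_id],
            "months": [m.strftime("%b %Y") for m in sorted(seen)],
        }
        for house_id, seen in months.items()
    }


listings = collapse_by_house(rows)
for house_id, record in listings.items():
    print(house_id, record)
```

Row_ID is deliberately ignored when collapsing, since it differs between otherwise identical monthly rows.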

SSRS Matrix Bespoke Headers (Still from datasource!!)

I have to create a matrix in SSRS to detail the number of users leaving an organisation.
The columns will each represent a span of 1 week and the rows will each represent a department in the organisation. The detail portion will be a count of the people who have left that department in that week.
I have a leaving date field in the DB but nothing that flags the specific intervals I have been told to use. As a result, the matrix as it stands counts the users who have left each department, but each date column spans 1 day, not 1 week. Is there a way to force the column headers to respect the week intervals I want, given that they currently come from the dataset and are not hard-coded?
First, try to shape your data in SQL itself by using GROUP BY on the date so that each group covers a one-week period. That way you get all the data in the format you need.
I don't know what your columns are, so I am just showing one way to get the week groups from a table along with the count of people:
SELECT DATEPART(wk, datevaluecolumn) AS weekno
     , SUM(peopleleavingcolumn) AS totalvalue
FROM yourTable
GROUP BY DATEPART(wk, datevaluecolumn)
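The same week-bucketing idea can be sketched in Python for a quick check of the expected counts. The sample records and the helper name week_bucket are assumptions, and the sketch uses ISO week numbers, which may not number weeks the same way as DATEPART(wk, ...). It also keys on the year as well as the week, so that the same week number in different years is not merged.

```python
from collections import Counter
from datetime import date

# Hypothetical leaving records: (department, leaving_date)
leavers = [
    ("Sales", date(2014, 4, 7)),
    ("Sales", date(2014, 4, 9)),
    ("Sales", date(2014, 4, 14)),
    ("IT", date(2014, 4, 10)),
]


def week_bucket(d):
    """Return (ISO year, ISO week number) for a date."""
    iso = d.isocalendar()
    return (iso[0], iso[1])


# One counter entry per (department, week): the cells of the matrix
counts = Counter((dept, week_bucket(d)) for dept, d in leavers)
for (dept, week), total in sorted(counts.items()):
    print(dept, week, total)
```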

SSRS linking two matrix tables

I am using SSRS 2008.
I have a report with 2 different matrix tables having two different datasets as their sources.
The data comes fine in both the tables individually.
But my issue starts when I have to use data from one table to calculate a percentage in the second table.
Here are the details:
Table 1:
Contains columns: Date, Referal_Status ('1' for each valid row), Department
Table 2:
Contains Columns: Date, Membership_Status ('1' for each valid row), Department
In table 1, I need to show referral counts (sum of valid Status) grouped by month in columns, and grouped by department in rows. Also an additional row and column for totals of the same.
This is implemented with no issues.
In table 2, I need to show membership counts (sum of valid Status) AND the referrals-to-membership percentage, grouped by month in columns and by department in rows, with an additional row and column for the totals. The issue starts when I try to implement the percentage calculation.
Let's say I have the membership count for April 2014 in the membership table. How do I take the referral count for April 2014 from the referral table and compute the April 2014 percentage as referal_num/Membership_Num * 100?
The issue I face is that the two matrix tables have different scopes.
Please help me achieve this in the SSRS matrix tables.
Am I providing enough information about my issue? Please let me know if you need more information from me.
This is often a road to misery, but anyway ...
I would use the Lookup function to retrieve the referral count. You will need to concatenate your two key columns (Date and Department) into one expression.
This sounds great and often works well. However, when it doesn't work on odd rows or combinations of data, you are flying blind trying to debug it.
Good luck!
PS: Actually, for a reliable solution that is easy to debug, I would go back and combine the data upstream so it can be presented to SSRS in one dataset. I would probably use SSIS for this.
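The logic of looking up one table's count from the other by a combined date and department key can be sketched in Python as follows. This is only an illustration of the keyed lookup and the percentage formula from the question, not SSRS expression syntax; the sample rows, department names and the helper name referral_percentage are assumptions.

```python
from collections import Counter

# Hypothetical rows, one per valid status: (month, department)
referrals = [
    ("2014-04", "Cardiology"),
    ("2014-04", "Cardiology"),
    ("2014-04", "Oncology"),
]
memberships = [("2014-04", "Cardiology")] * 4 + [("2014-04", "Oncology")] * 4

# Counting by the combined (month, department) key plays the role of the
# concatenated key expression used for the lookup
referral_counts = Counter(referrals)
membership_counts = Counter(memberships)


def referral_percentage(key):
    """Return referrals / memberships * 100 for a key, or None if no members."""
    members = membership_counts.get(key, 0)
    if members == 0:
        return None  # avoid dividing by zero when a cell has no memberships
    return referral_counts.get(key, 0) / members * 100


percentages = {key: referral_percentage(key) for key in membership_counts}
for key, pct in percentages.items():
    print(key, pct)
```

A key that exists in one table but not the other is exactly the kind of odd combination that is hard to debug in the report, which is one reason combining the data upstream can be easier to verify.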
