Count data entries per day per person

Count data entries per day per person - sorting

My employees enter samples into our database through a form. I have access to the Google Sheet that houses all the info they provide. They enter multiple entries a day. I am trying to find the top performer: the person who has entered the most entries in one day.
How do I get it to search my data and tell me each person's name and how many samples they entered per day?
Ex.
Tim= 15 entries on Sept 10
John= 5 entries on Sept 10
Brian= 2 entries on Sept 10
Tim= 5 entries on Sept 9
John= 20 entries on Sept 9
Brian= 1 entry on Sept 9
I want it to look through the list, pick up that John's entries on the 9th are the highest, and output John's name and 20 entries. Until a day comes when someone enters more than 20 entries, it will hold John as my top performer.

you can use QUERY like:
=QUERY(A:C, "order by C desc limit 1", 0)
=QUERY(B2:E, "select B,count(B) where B is not null group by B label count(B)''", 0)
=QUERY(B2:E, "select B,count(B) where B is not null group by B pivot E", 0)
=ARRAYFORMULA(QUERY(SPLIT(TRANSPOSE(QUERY(TRANSPOSE(QUERY(
{B2:B&"♦"&TEXT(E2:E, "m/d/yyyy")},
"select Col1,count(Col1)
where Col1 is not null
group by Col1
order by count(Col1) desc
label count(Col1)''", 0))&"♦",,999^99)), "♦"),
"limit 1
offset 1", 0))
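For readers who want to check the logic outside the spreadsheet, here is a minimal Python sketch of the same idea: count rows per (name, date) pair, then keep the pair with the highest count. The `counts_in_question` data and the `top_performer` helper are hypothetical names built from the example in the question, not part of the sheet.

```python
from collections import Counter

# Hypothetical data: the per-day counts listed in the question,
# expanded into one (name, date) row per form submission.
counts_in_question = {
    ("Tim", "Sept 10"): 15,
    ("John", "Sept 10"): 5,
    ("Brian", "Sept 10"): 2,
    ("Tim", "Sept 9"): 5,
    ("John", "Sept 9"): 20,
    ("Brian", "Sept 9"): 1,
}
entries = [key for key, n in counts_in_question.items() for _ in range(n)]


def top_performer(rows):
    """Return ((name, date), count) for the busiest person-day."""
    per_person_day = Counter(rows)          # group by name and date, count rows
    return per_person_day.most_common(1)[0]  # highest count first


best = top_performer(entries)
print(best)  # (('John', 'Sept 9'), 20)
```

This mirrors the "group by, order by count desc, limit 1" shape of the QUERY formulas; if two person-days tie, only one of them is returned.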

Related

AWS QuickSight: Migrate SAMEPERIODLASTYEAR from Power BI

I'm working with Amazon QuickSight and I'm trying to migrate one piece of functionality from Power BI to QuickSight, but so far it has not been possible.
The idea is: I have a dataset with two columns, "date" and "sales". The user will have a filter on the column "date", and more than one date can be selected. Depending on the dates selected by the user, we need to get two KPIs. The first one is the sum of sales for those dates (this is already done). The second one is my problem: it should be the sum of the sales on the same days selected by the user but in the previous year (so it depends on which year was selected for each date).
Example:
DataSet:
DATE SALES
2020-01-05 1
2020-02-01 1
2020-06-10 4
2020-06-17 1
2021-01-01 1
2021-02-01 3
2021-06-10 3
2021-06-15 5
If the user selects the dates 2021-02-01, 2021-06-10 and 2021-06-15, the result should be:
KPI 1: Sum of sales (for those dates): 11
KPI 2: Sum of sales for those dates in the previous year: 5 -> (Days to use: 2020-02-01, 2020-06-10 and 2020-06-15)
Do you have any idea how I can calculate KPI 2? Any suggestions?
In Power BI this (KPI 2) was done with the function SAMEPERIODLASTYEAR.
Thanks in advance.

I was able to solve this issue, on the condition that the user filters data from both years, for example 2020 and 2021.
After that, I created the calculated field:
calc_sum_sales = sum(sales)
After that:
calc_sum_sales_last_year = ({calc_sum_sales}-periodOverPeriodDifference(sum(sales),date,YEAR,1))
Since periodOverPeriodDifference gives the current period's value minus the previous period's, subtracting it from the current sum leaves the previous year's sum.
After that, I created a table visual and added the field "date" under "group by" and the field "calc_sum_sales_last_year" under "value".
I then had to click on the date column and aggregate it by "year". That way I get just two rows, one for each year.
Finally, I changed the table visual to a "KPI" visual and got the expected result.
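To make the target numbers easy to verify, here is a small Python sketch of the two KPIs from the question. It is not QuickSight syntax, only a plain illustration of the "same dates, one year earlier" computation; the names `sales`, `previous_year` and `kpis` are hypothetical, and mapping 29 February to 28 February is an assumption.

```python
from datetime import date

# The example dataset from the question, keyed by date.
sales = {
    date(2020, 1, 5): 1,
    date(2020, 2, 1): 1,
    date(2020, 6, 10): 4,
    date(2020, 6, 17): 1,
    date(2021, 1, 1): 1,
    date(2021, 2, 1): 3,
    date(2021, 6, 10): 3,
    date(2021, 6, 15): 5,
}


def previous_year(d):
    """Same calendar day one year earlier."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        # Assumption: 29 Feb maps to 28 Feb in a non-leap year.
        return d.replace(year=d.year - 1, day=28)


def kpis(selected):
    """Return (sum for selected dates, sum for the same dates last year)."""
    current = sum(sales.get(d, 0) for d in selected)
    last_year = sum(sales.get(previous_year(d), 0) for d in selected)
    return current, last_year


selected = [date(2021, 2, 1), date(2021, 6, 10), date(2021, 6, 15)]
print(kpis(selected))  # (11, 5)
```

Dates with no row in the previous year (2020-06-15 here) simply contribute 0, which matches the expected KPI 2 of 5.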

How to put measures from multiple tables into one matrix in Power BI?

I have 8 tables with data of sold products. Each table is about a unique product. In Power BI, I want to create a matrix, containing the sold quantities (values) per product (rows), per month (columns), and the number of unique customers who bought the products.
Each of the 8 tables with the sales data has the following structure. So the App ID is different for each table, but is constantly the same within a table. Example for a Cars table:
Customer ID Month App ID
29273 2020-3 1
90283 2018-5 1
55824 2016-12 1
55824 2018-10 1
55824 2021-1 1
So, a bicycle table would have the same structure, but then the App ID's would be, for example 2, in the entire table.
I have two tables that are connected with the 8 product tables in a one-to-many relationship. The Calendar table based on the Month column, and the App table based on the App ID column.
The table Calendar:
Month
2015-1
2015-2
2015-3
2015-4
2015-5
...
...
The table Apps:
ID Name
1 Cars
2 Bicycle
3 Scooter
4 ...
So, the structure is:
I created the Calendar and Apps tables so that I could use them for the matrix, but so far it doesn't work the way I want. In the end, I want to create a matrix like this (where P = the number of products sold, and C = the number of customers in that month for that product):
Product 2015-1 2015-2 2015-3 2015-4 2015-5 ...
P C P C P C P C P C
Cars 3 2 5 5 7 6 2 1 4 2
Bicycle 11 9 17 14 5 5 4 4 8 6
Scooter ...
Skateboard ...
As mentioned, I made that Calendar and App table so that I can use the columns from it to fill the labels in the rows and columns. What I am unable to do is create such a 'general variable' of the number of products sold per product, and the number of customers associated with it.
Can someone explain to me how I can fill the matrix with the numbers of products (and customers) sold, so that the matrix looks like the one described above?

I think this is pretty straightforward. You actually don't need the 'Calendar' table, as it only contains the same info that is already in the 'Sales' table.
You should configure the matrix like this:
Rows: 'Name' (from the 'Apps' table)
Columns: 'Month' (from the 'Sales' table)
Values:
C = Count distinct of CustomerId (from the 'Sales' table) [this counts the unique customers per month and app]
P = Count of CustomerId (from the 'Sales' table) [this counts the rows of the 'Sales' table, which is your number of products if every row represents 1 sale]
The different aggregations (count distinct, count) can be found under the options of each field in Values.
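As a rough illustration of what the two aggregations compute per matrix cell, here is a Python sketch. It assumes the 8 product tables have been appended into one combined list (the 'Sales' table the answer refers to); the `apps` mapping, the last two rows of `sales`, and the `matrix_cells` helper are hypothetical additions for the example.

```python
from collections import defaultdict

# Lookup table: App ID -> product name (from the question's Apps table).
apps = {1: "Cars", 2: "Bicycle"}

# Combined sales rows as (customer_id, month, app_id).
# The first five rows are the Cars example from the question;
# the last two are hypothetical rows added to show P and C diverging.
sales = [
    (29273, "2020-3", 1),
    (90283, "2018-5", 1),
    (55824, "2016-12", 1),
    (55824, "2018-10", 1),
    (55824, "2021-1", 1),
    (55824, "2021-1", 1),  # hypothetical repeat purchase
    (11111, "2021-1", 2),  # hypothetical bicycle sale
]


def matrix_cells(rows):
    """Map (product, month) -> (P, C): row count and distinct customers."""
    customers = defaultdict(list)
    for customer_id, month, app_id in rows:
        customers[(apps[app_id], month)].append(customer_id)
    return {key: (len(ids), len(set(ids))) for key, ids in customers.items()}


cells = matrix_cells(sales)
print(cells[("Cars", "2021-1")])  # (2, 1): two products sold to one customer
```

P is the plain row count and C is the distinct count, which is why C can never exceed P in any cell.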

Select single random sample from group by in Hive

I have a table that looks like so:
Name Age Num_Hobbies Num_Shoes
Jane 31 10 2
Bob 23 3 4
Jane 60 2 200
Jane 31 100 6
Bob 10 8 7
etc etc
I would like to group this table by Name and Age, and at random pick one row from the rest of the columns.
In pandas, I would do the following:
df.groupby(['Name', 'Age']).apply(lambda x: x.sample(n=1))
In hive, I know how to create the group, but not how to choose a single random sample from group.
I saw this question on stack overflow: How to sample for each group in hive?
However, I do not understand how to apply Dynamic partitions or Hive bucketing to select a single sample from a group.

You can use rank() or row_number() with rand()
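select * from
(
-- my_table is a placeholder for your table name (table itself is a reserved word)
-- row_number() guarantees exactly one row per group, even if rand() ever ties
select t0.*, row_number() over (partition by name, age order by rand()) as rn
from my_table t0
) t
where rn = 1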
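The same "shuffle within each group, keep the first row" idea can be sketched in plain Python, which may help when checking the Hive result on a small extract. The `sample_one_per_group` helper is a hypothetical name; the rows are the ones from the question.

```python
import random
from collections import defaultdict

# Rows from the question: (name, age, num_hobbies, num_shoes).
rows = [
    ("Jane", 31, 10, 2),
    ("Bob", 23, 3, 4),
    ("Jane", 60, 2, 200),
    ("Jane", 31, 100, 6),
    ("Bob", 10, 8, 7),
]


def sample_one_per_group(rows, rng=random):
    """Group rows by (name, age) and pick one row uniformly from each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row[0], row[1])].append(row)
    return [rng.choice(members) for members in groups.values()]


picked = sample_one_per_group(rows)
for row in picked:
    print(row)
```

Only the ("Jane", 31) group has more than one candidate here, so that is the only row that can change between runs.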

obiee count distinct by as part of total count distinct

I need to create the following "numeric distribution" report in OBIEE:
item name,
count(distinct item_id by item_name),
count(distinct item_id by item_name)/count(distinct item_id)
(third column should be percent of count distinct by item name / count distinct by total)
How do I get the total count distinct? I'm stuck and would appreciate any help.
The result should be:
Item 1 , 10, 50%
Item 2 , 20, 100%
where the total number of customers is 20, and 10 of them buy Item_1, and 20 of them buy Item_2

Click on the measure in the pivot table, then click on 'Show Data As' > Percent of > Column.
That's it.
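For reference, the ratio described in the question (distinct count per item divided by the overall distinct count) can be sketched in Python as below. This is not OBIEE syntax. The `purchases` data is hypothetical, constructed to match the example of 20 customers, and it assumes the distinct count is taken over customers as in that example.

```python
from collections import defaultdict

# Hypothetical purchases as (customer_id, item_name):
# all 20 customers buy Item 2, and customers 1-10 also buy Item 1.
purchases = [(c, "Item 2") for c in range(1, 21)]
purchases += [(c, "Item 1") for c in range(1, 11)]


def distinct_share(rows):
    """Map item -> (distinct customers, share of all distinct customers)."""
    by_item = defaultdict(set)
    everyone = set()
    for customer_id, item in rows:
        by_item[item].add(customer_id)
        everyone.add(customer_id)
    total = len(everyone)
    return {item: (len(ids), len(ids) / total) for item, ids in by_item.items()}


report = distinct_share(purchases)
for item in sorted(report):
    count, share = report[item]
    print(f"{item}, {count}, {share:.0%}")
```

The denominator is the distinct count over all rows, not the sum of the per-item counts, so the percentages do not have to add up to 100%.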

Grouping data by date ranges

How do I select a range of data depending on the date range?
I have this data in my payment table, with dates in the format dd/mm/yyyy:
Id Date Amount
1 4/1/2011 300
2 10/1/2011 200
3 27/1/2011 100
4 4/2/2011 300
5 22/2/2011 400
6 1/3/2011 500
7 1/1/2012 600
The closing date is on the 27th of every month, so I would like to group all the data from the 27th to the 26th of the next month into one group.
That is, I would like the output to look like this:
Group 1
1 4/1/2011 300
2 10/1/2011 200
Group 2
1 27/1/2011 100
2 4/2/2011 300
3 22/2/2011 400
Group 3
1 1/3/2011 500
Group 4
1 1/1/2012 600

The context of your question is not clear. Are you querying a database?
If so, you are asking about dates, but it seems you have a column in string format.
First, convert your data to a datetime data type (or some equivalent; which DB engine are you using?), and then use a grouping criterion like this:
GROUP BY datepart(month, dateadd(day, -26, [datefield])), DATEPART(year, dateadd(day, -26, [datefield]))
EDIT:
So, you are in Linq?
Different language, same logic:
.GroupBy(x => DateTime
.ParseExact(x.Date, "d/M/yyyy", CultureInfo.InvariantCulture) // assumes the date field is a string; "M" is month ("mm" would be minutes), and "d/M" accepts one or two digits
.AddDays(-26)
.ToString("yyyyMM"));
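The shift-by-26-days trick can be checked against the sample data with a short Python sketch. Subtracting 26 days moves the 27th onto the 1st of the same month and the 26th onto the last day of the previous month, so each 27th-to-26th period lands in a single calendar month. The helper names `period_key` and `group_by_period` are hypothetical.

```python
from datetime import datetime, timedelta

# Payment rows from the question: (id, date as d/m/yyyy text, amount).
payments = [
    (1, "4/1/2011", 300),
    (2, "10/1/2011", 200),
    (3, "27/1/2011", 100),
    (4, "4/2/2011", 300),
    (5, "22/2/2011", 400),
    (6, "1/3/2011", 500),
    (7, "1/1/2012", 600),
]


def period_key(date_text):
    """Shift the date back 26 days and label it with that year and month."""
    d = datetime.strptime(date_text, "%d/%m/%Y")
    return (d - timedelta(days=26)).strftime("%Y%m")


def group_by_period(rows):
    """Map period key -> list of payment ids, in input order."""
    groups = {}
    for row in rows:
        groups.setdefault(period_key(row[1]), []).append(row[0])
    return groups


groups = group_by_period(payments)
print(groups)
# {'201012': [1, 2], '201101': [3, 4, 5], '201102': [6], '201112': [7]}
```

Note that each key names the month in which the period starts, so the period from 27/12/2010 to 26/01/2011 is labelled 201012.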

If you are going to do this frequently, it would be worth investing in a table that assigns a unique identifier to each month and the start and end dates:
CREATE TABLE MonthEndings
(
MonthID INTEGER NOT NULL PRIMARY KEY,
StartDate DATE NOT NULL,
EndDate DATE NOT NULL
);
INSERT INTO MonthEndings VALUES(201101, '27/12/2010', '26/01/2011');
INSERT INTO MonthEndings VALUES(201102, '27/01/2011', '26/02/2011');
INSERT INTO MonthEndings VALUES(201103, '27/02/2011', '26/03/2011');
INSERT INTO MonthEndings VALUES(201201, '27/12/2011', '26/01/2012');
You can then group accurately using:
SELECT M.MonthID, P.Id, P.Date, P.Amount
FROM Payments AS P
JOIN MonthEndings AS M ON P.Date BETWEEN M.StartDate and M.EndDate
ORDER BY M.MonthID, P.Date;
Any group headings etc. are best handled outside the DBMS: the SQL gets you the data in the correct sequence, and the software retrieving the data presents it to the user.
If you can't translate SQL to LINQ, that makes two of us. Sorry, I have never used LINQ, so I've no idea what is involved.

SELECT *, CASE WHEN datepart(day,date)<27 THEN datepart(month,date)
ELSE datepart(month,date) % 12 + 1 END as group_name
FROM payment
