How can I get an identifying number for each group? - sorting

I want output in the following form:
stnd_y person_id recu_day date sick_sym Admission
2002 100 20020929 02-09-29 A 1
2002 100 20020929 02-09-29 B 1
2002 100 20020929 02-09-29 D 1
2002 100 20020930 02-09-30 B 2
2002 100 20020930 02-09-30 E 2
2002 100 20021002 02-10-02 X 3
2002 100 20021002 02-10-02 W 3
2002 101 20020927 02-09-27 S 1
2002 101 20020927 02-09-27 O 1
2002 101 20020928 02-09-28 C 2
2002 102 20021001 02-10-01 F 1
2002 103 20021003 02-10-03 G 1
2002 104 20021108 02-11-08 H 1
2002 104 20021108 02-11-08 A 1
2002 104 20021112 02-11-12 B 2
proc sort data=a out=a1;
by person_id recu_fr_dt;
data a3;
set a1 ;
by person_id recu_fr_dt;
if first.person_id then adm+1;
run;
With the code above, the result is the following, which is not what I want:
stnd_y person_id recu_day date sick_sym Admission
2002 100 20020929 02-09-29 A 1
2002 100 20020929 02-09-29 B 2
2002 100 20020929 02-09-29 D 3
2002 100 20020930 02-09-30 B 4
2002 100 20020930 02-09-30 E 5
2002 100 20021002 02-10-02 X 6
2002 100 20021002 02-10-02 W 7
2002 101 20020927 02-09-27 S 1
2002 101 20020927 02-09-27 O 2
2002 101 20020928 02-09-28 C 3
2002 102 20021001 02-10-01 F 1
2002 103 20021003 02-10-03 G 1
2002 104 20021108 02-11-08 H 1
2002 104 20021108 02-11-08 A 2
2002 104 20021112 02-11-12 B 3
I also tried the following in SAS:
proc sort data=old out=new;
by person_id recu_day;
data new1;
set new;
retain admission 0;
by person_id recu_day;
if recu_day^=lag(recu_day) and(or) person_id^=lag(person_id) then
admission+1;
run;
And,
data new1;
set new ;
by person_id recu_day;
retain adm 0;
if first.person_id and(or) first.recu_day then admission=admission+1;
run;
But those do not work either. How can I fix this?
Thank you! :D
Please visit http://stackoverflow.com/questions/46076468/how-can-i-get-the-identification-number-with-each-groups/

Here's a modification to my answer to your previous question.
This time, it adds 1 to the adm variable each time the day changes for a given person_id, and resets the counter when a new person_id starts. The retain statement ensures that the current value is carried over to all subsequent rows where the person_id and recu_day are the same.
data have;
input stnd_y person_id recu_day date :yymmdd8. sick_sym $ Admission;
datalines;
2002 100 20020929 02-09-29 A 1
2002 100 20020929 02-09-29 B 1
2002 100 20020929 02-09-29 D 1
2002 100 20020930 02-09-30 B 2
2002 100 20020930 02-09-30 E 2
2002 100 20021002 02-10-02 X 3
2002 100 20021002 02-10-02 W 3
2002 101 20020927 02-09-27 S 1
2002 101 20020927 02-09-27 O 1
2002 101 20020928 02-09-28 C 2
2002 102 20021001 02-10-01 F 1
2002 103 20021003 02-10-03 G 1
2002 104 20021108 02-11-08 H 1
2002 104 20021108 02-11-08 A 1
2002 104 20021112 02-11-12 B 2
;
run;
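data want;
set have; /* have must already be sorted by person_id recu_day */
by person_id recu_day;
retain adm;
if first.person_id then adm=0; /* restart the counter for each person */
if first.recu_day then adm+1; /* new day for this person: next admission number */
run;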
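For readers who do not use SAS, the same logic can be sketched in Python. This is only an illustration of the counter described above, not a translation of the SAS step: the function name number_admissions and the (person_id, recu_day) tuple layout are made up for the example.

```python
from itertools import groupby


def number_admissions(rows):
    """Number each distinct recu_day within a person_id, starting at 1.

    rows: iterable of (person_id, recu_day) tuples (hypothetical layout).
    The rows are sorted first, mirroring the PROC SORT step.
    """
    out = []
    ordered = sorted(rows)
    for person_id, person_rows in groupby(ordered, key=lambda r: r[0]):
        adm = 0          # reset, like "if first.person_id then adm=0"
        last_day = None
        for _, recu_day in person_rows:
            if recu_day != last_day:   # plays the role of first.recu_day
                adm += 1
                last_day = recu_day
            out.append((person_id, recu_day, adm))
    return out


rows = [
    (100, 20020929), (100, 20020929), (100, 20020929),
    (100, 20020930), (100, 20020930),
    (100, 20021002), (100, 20021002),
    (101, 20020927), (101, 20020927), (101, 20020928),
    (102, 20021001),
]
result = number_admissions(rows)
admissions = [adm for _, _, adm in result]
print(admissions)
```

The counter only moves when the day changes inside one person, which is why repeated rows for the same day share a number.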

Related

Google sheet query 2 columns as search key and search

I ran into a problem with a Google Sheets function.
I have 2 tables. I want to use Date+User from table 1 as the key to look up values in table 2.
Example:
Table 1:
Date User Unit
2022/05/30 A 109
2022/05/30 B 119
2022/05/30 C 119
2022/05/29 D 109
2022/05/29 E 114
Table 2:
Date User Amount
2022/05/30 A 1
2022/05/30 B 2
2022/05/30 C 3
2022/05/30 D 41
2022/05/30 E 5
2022/05/29 D 6
2022/05/29 E 7
2022/05/29 F 81
2022/05/29 G 9
2022/05/29 A 101
2022/05/29 B 11
2022/05/29 C 121
2022/05/29 D 13
After the query, I hope the table looks like this.
Expected result:
Date User Unit Amount
2022/05/30 A 109 1
2022/05/30 B 119 2
2022/05/30 C 119 3
2022/05/29 D 109 6
2022/05/29 E 114 7
This is a sample google sheet
https://docs.google.com/spreadsheets/d/1oxhWMVPt-GziG10agob-xbiNYfKrZVFK9ro0Pj7tn6Y/edit#gid=0
Can I ask for help?
Many Thanks
Two options. The first pulls all matching combinations of Date and User:
=ARRAYFORMULA(
QUERY(
{E2:G,
IF(ISBLANK(E2:E),,
IFERROR(
VLOOKUP(
E2:E&"|"&F2:F,
{A2:A&"|"&B2:B,C2:C},
2,FALSE)))},
"select Col1, Col2, Col4, Col3
where Col4 is not null
label
Col1 'Date',
Col2 'User',
Col3 'Amount',
Col4 'Unit'"))
which returns
Date User Unit Amount
2022/05/30 A 109 1
2022/05/30 B 119 2
2022/05/30 C 119 3
2022/05/29 D 109 6
2022/05/29 E 114 7
2022/05/29 D 109 13
The second matches your output exactly, but does omit that second D value for the 29th (13)
=ARRAYFORMULA(
QUERY(
{IFERROR(
VLOOKUP(
UNIQUE(E2:E&"|"&F2:F),
{E2:E&"|"&F2:F,E2:G},
{2,3,4},FALSE)),
IFERROR(
VLOOKUP(
UNIQUE(E2:E&"|"&F2:F),
{A2:A&"|"&B2:B,C2:C},
2,FALSE))},
"where Col4 is not null
format Col1 'yyyy/mm/dd'"))
Both have been added to your sheet. If either of these work out for you, I can break it down.
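Until then, here is a rough Python sketch of the idea both formulas rely on: Date and User are glued together with "|" into one key, and that key is looked up in the other table. The variable names are invented for the illustration, and the data is the sample from the question.

```python
# Table 1 (Date, User, Unit) and Table 2 (Date, User, Amount) from the question.
table1 = [
    ("2022/05/30", "A", 109), ("2022/05/30", "B", 119), ("2022/05/30", "C", 119),
    ("2022/05/29", "D", 109), ("2022/05/29", "E", 114),
]
table2 = [
    ("2022/05/30", "A", 1), ("2022/05/30", "B", 2), ("2022/05/30", "C", 3),
    ("2022/05/30", "D", 41), ("2022/05/30", "E", 5),
    ("2022/05/29", "D", 6), ("2022/05/29", "E", 7), ("2022/05/29", "F", 81),
    ("2022/05/29", "G", 9), ("2022/05/29", "A", 101), ("2022/05/29", "B", 11),
    ("2022/05/29", "C", 121), ("2022/05/29", "D", 13),
]

# Composite key "Date|User" -> Unit, like {A2:A&"|"&B2:B, C2:C} in the formula.
unit_by_key = {f"{date}|{user}": unit for date, user, unit in table1}

# Option 1: keep every row of table 2 whose key exists in table 1.
all_matches = [
    (date, user, unit_by_key[f"{date}|{user}"], amount)
    for date, user, amount in table2
    if f"{date}|{user}" in unit_by_key
]

# Option 2: keep only the first row per key, like VLOOKUP over UNIQUE keys.
first_match = {}
for row in all_matches:
    first_match.setdefault((row[0], row[1]), row)
first_only = list(first_match.values())

print(len(all_matches), len(first_only))
```

The only difference between the two options is whether the repeated key for D on the 29th is kept or dropped.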

Retrieve the list of data from database using hibernate criteria

I have a table called employee_comp_field, where salary fields are available
comp_field id | year_id | compensation_field
1 101 salary
2 101 bonus
3 101 pf
4 101 allowance
5 102 salary
6 102 bonus
7 102 pf
8 102 allowance
Then I have another table, emp_compensation, where employee salary data is stored against each field. As you can see, emp_id 10 has three sets of records because he got three salary hikes in the same year (year_id=101); the sets can be told apart by the salary order (the comp_order column).
id | year_id | emp_id | comp_field_id | amount | comp_order
1 101 10 1 10000 1
2 101 10 2 1000 1
3 101 10 3 1000 1
4 101 10 4 100 1
5 101 10 1 12000 2
6 101 10 2 100 2
7 101 10 3 10000 2
8 101 10 4 10000 2
9 101 10 1 15000 3
10 101 10 2 500 3
11 101 10 3 150 3
12 101 10 4 1500 3
13 101 11 1 13000 1
14 101 11 2 1300 1
15 101 11 3 null 1
16 101 11 4 150 1
I want to get, for every employee, the records with the maximum salary order (comp_order).
My desired output is below:
id | year_id | emp_id | comp_field_id | amount | comp_order
9 101 10 1 15000 3
10 101 10 2 500 3
11 101 10 3 150 3
12 101 10 4 1500 3
13 101 11 1 13000 1
14 101 11 2 1300 1
15 101 11 3 null 1
16 101 11 4 150 1
As emp_id 10 got three salary hikes, I retrieve his records with salary order 3,
and as emp_id 11 got only one, I retrieve only his records with salary order 1.
Can someone please help me retrieve my desired output using Hibernate Criteria?
My thought is to first retrieve the whole list based on emp_id and then filter it with Java streams to get the desired output.
Please suggest the best possible way.
"The best possible way" is subjective. It can be the fastest, it can be the shortest. It could be anything.
I will give you an example of how you could build a query in MySQL to replicate your output. This might be tricky to express with Criteria, though, since the table is joined to itself.
select a.*
from emp_compensation a
left outer join emp_compensation b on a.emp_id = b.emp_id
and a.comp_field_id = b.comp_field_id
and a.comp_order < b.comp_order
where b.emp_id is null
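If you go with the in-memory idea from the question instead (fetch everything, then filter), the filtering step could look roughly like the sketch below. It is written in Python rather than Java streams, the CompRow type is invented for the example, and it keeps the rows with the highest comp_order per emp_id, which is a slightly different rule from the per-field self join above.

```python
from collections import namedtuple

# Hypothetical row type mirroring the emp_compensation columns.
CompRow = namedtuple(
    "CompRow", "id year_id emp_id comp_field_id amount comp_order"
)

rows = [
    CompRow(1, 101, 10, 1, 10000, 1), CompRow(2, 101, 10, 2, 1000, 1),
    CompRow(3, 101, 10, 3, 1000, 1), CompRow(4, 101, 10, 4, 100, 1),
    CompRow(5, 101, 10, 1, 12000, 2), CompRow(6, 101, 10, 2, 100, 2),
    CompRow(7, 101, 10, 3, 10000, 2), CompRow(8, 101, 10, 4, 10000, 2),
    CompRow(9, 101, 10, 1, 15000, 3), CompRow(10, 101, 10, 2, 500, 3),
    CompRow(11, 101, 10, 3, 150, 3), CompRow(12, 101, 10, 4, 1500, 3),
    CompRow(13, 101, 11, 1, 13000, 1), CompRow(14, 101, 11, 2, 1300, 1),
    CompRow(15, 101, 11, 3, None, 1), CompRow(16, 101, 11, 4, 150, 1),
]

# First pass: highest comp_order seen for each employee.
max_order = {}
for row in rows:
    max_order[row.emp_id] = max(max_order.get(row.emp_id, 0), row.comp_order)

# Second pass: keep only the rows that belong to that latest set.
latest = [row for row in rows if row.comp_order == max_order[row.emp_id]]
latest_ids = [row.id for row in latest]
print(latest_ids)
```

This loads every row into memory first, so for large tables pushing the filter into the query is likely to be the better trade-off.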

How to create industry-year average of all variables from firm- year data using Stata?

I have a panel dataset in the following format:
Firm Year Industry Sales Profit Export intensity R&D
1 2000 1 x x x x
2 2000 1 x x x x
3 2000 2 x x x x
4 2000 2 x x x x
1 2001 1 x x x x
2 2001 1 x x x x
3 2001 2 x x x x
4 2001 2 x x x x
1 2002 1 x x x x
2 2002 1 x x x x
3 2002 2 x x x x
4 2002 2 x x x x
1 2003 1 x x x x
2 2003 1 x x x x
3 2003 2 x x x x
4 2003 2 x x x x
I want to create industry average per year of all variables. The real data set has 2000 firms * 10 years observations and 25 industries.
If you want to maintain your data structure, the easiest way is probably to combine egen's by() option with a loop:
foreach v of varlist Sales Profit Export RD {
egen IndAvg`v' = mean(`v') , by(Industry Year)
}
E.g.,
clear all
input Firm Year Industry Sales Profit Export RD
1 2000 1 831 135 196 30
2 2000 1 44 847 885 780
3 2000 2 818 112 859 306
4 2000 2 777 700 903 858
1 2001 1 491 563 325 324
2 2001 1 411 468 927 720
3 2001 2 731 872 170 556
4 2001 2 587 273 833 656
1 2002 1 155 558 497 427
2 2002 1 210 853 792 575
3 2002 2 279 282 969 549
4 2002 2 683 176 902 538
1 2003 1 805 475 479 599
2 2003 1 226 178 37 225
3 2003 2 129 693 746 652
4 2003 2 347 509 406 102
end
foreach v of varlist Sales Profit Export RD {
egen IndAvg`v' = mean(`v') , by(Industry Year)
}
sort Industry Year Firm
li , sepby(Industry)
However, you may also want to look into collapse:
collapse (mean) Sales Profit Export RD , by(Industry Year)
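The difference between the two approaches can be sketched outside Stata as well. The Python below is only an illustration, using the year-2000 rows and two of the variables from the example data: the collapsed dictionary plays the role of collapse (one record per Industry-Year), and with_avg plays the role of the egen loop (every firm row kept, with the group mean attached). The variable names are invented for the sketch.

```python
from collections import defaultdict
from statistics import mean

# Year-2000 rows from the example data, restricted to two variables.
rows = [
    {"Firm": 1, "Year": 2000, "Industry": 1, "Sales": 831, "Profit": 135},
    {"Firm": 2, "Year": 2000, "Industry": 1, "Sales": 44, "Profit": 847},
    {"Firm": 3, "Year": 2000, "Industry": 2, "Sales": 818, "Profit": 112},
    {"Firm": 4, "Year": 2000, "Industry": 2, "Sales": 777, "Profit": 700},
]
variables = ["Sales", "Profit"]

# Group the firm rows by (Industry, Year).
groups = defaultdict(list)
for row in rows:
    groups[(row["Industry"], row["Year"])].append(row)

# "collapse": one record of means per Industry-Year.
collapsed = {
    key: {v: mean(r[v] for r in members) for v in variables}
    for key, members in groups.items()
}

# "egen ... by()": keep every firm row and attach the group means.
with_avg = [
    {**row, **{f"IndAvg{v}": collapsed[(row["Industry"], row["Year"])][v]
               for v in variables}}
    for row in rows
]

print(collapsed[(1, 2000)], with_avg[0]["IndAvgSales"])
```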

Pulling data after sorting the table

My problem is pulling the right observations from my data. My data is as below:
id term grade number
35 2005 I 0
35 2005 F 1
35 2005 W 2
46 2003 A 0
46 2003 B 1
46 2003 F 2
46 2003 I 3
I sorted my table and numbered the rows within each id 0, 1, 2 and so on. This is the example after sorting. What I need are the ids whose grades start with I, then F, then W, like id 35. So from this table I need the first three observations (id 35).
Here is one Proc SQL approach; you can also try a double DOW loop (2XDOW):
data have;
input (id term grade) (:$8.) number;
cards;
35 2005 I 0
35 2005 F 1
35 2005 W 2
46 2003 A 0
46 2003 B 1
46 2003 F 2
46 2003 I 3
;
proc sql;
create table want as
select * from have
group by id
having sum(GRADE='I' AND NUMBER=0) >0
AND sum(GRADE='F' AND NUMBER=1) >0
AND sum(GRADE='W' AND NUMBER=2) >0
;
QUIT;
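The HAVING clause above keeps a whole id group only when all three (grade, number) pairs occur somewhere in it. As a rough cross-check of that logic, here is the same rule sketched in Python; the names required, group_matches and want are made up for the illustration.

```python
from itertools import groupby

# (id, term, grade, number) rows from the example, already sorted by id.
rows = [
    ("35", "2005", "I", 0), ("35", "2005", "F", 1), ("35", "2005", "W", 2),
    ("46", "2003", "A", 0), ("46", "2003", "B", 1),
    ("46", "2003", "F", 2), ("46", "2003", "I", 3),
]

# Each pair must appear at least once within an id, like the three sum(...) > 0 tests.
required = [("I", 0), ("F", 1), ("W", 2)]


def group_matches(group):
    """True when every required (grade, number) pair occurs in the group."""
    return all(
        any(grade == g and number == n for _, _, grade, number in group)
        for g, n in required
    )


want = []
for _id, members in groupby(rows, key=lambda r: r[0]):
    members = list(members)
    if group_matches(members):
        want.extend(members)   # keep all rows of a qualifying id

print(want)
```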

Oracle 10g ROLLUP and TO_CHAR function

I have 2 queries:
select zam_klt_id,zam_order_date, count(*) as sum from orders
group by rollup (zam_order_date,zam_klt_id);
which produce output:
ZAM_KLT_ID ZAM_order_date SUM
---------- ------------------- ----------
1002 98/03/13 1
98/03/13 1
1004 98/03/14 1
98/03/14 1
1003 98/09/11 1
98/09/11 1
1003 99/01/05 1
99/01/05 1
1003 99/03/01 1
99/03/01 1
1003 99/07/26 1
99/07/26 1
1003 99/10/30 1
99/10/30 1
1002 00/05/08 1
00/05/08 1
1004 00/06/14 1
00/06/14 1
00/07/12 1
00/07/12 1
1000 00/12/10 2
00/12/10 2
1004 00/12/21 1
00/12/21 1
13
This is OK: under every date there is a short summary (count), like in
1000 00/12/10 2
00/12/10 2
However, I then wanted to know how many clients made orders in every year, so I changed the previous query (zam_order_date was changed to to_char(zam_order_date,'yyyy')):
select zam_klt_id,to_char(zam_order_date,'yyyy'), count(*) as sum from orders
group by rollup ((to_char(zam_order_date,'yyyy'),zam_klt_id));
which produces the output:
ZAM_KLT_ID TO_CHAR(ZAM_order_date,'YYYY') SUM
---------- ----------------------------------- ----------
1002 1998 1
1003 1998 1
1004 1998 1
1003 1999 4
2000 1
1000 2000 2
1002 2000 1
1004 2000 2
13
9 rows selected
This time there is no summary under every date (year in this case). I think the output should look like this:
ZAM_KLT_ID TO_CHAR(ZAM_order_date,'YYYY') SUM
---------- ----------------------------------- ----------
1002 1998 1
1003 1998 1
1004 1998 1
*1998* *3*
etc
Why are the summaries not added this time? Does it have something to do with the to_char function?
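One thing worth checking is the extra pair of parentheses in the second query rather than to_char itself. As far as I know, Oracle treats columns wrapped in their own parentheses inside ROLLUP as one composite column, so rollup ((year, client)) only adds the grand total, while rollup (year, client) also adds a subtotal per year. The Python sketch below only imitates that grouping-set expansion under this assumption; the function names and the small orders list (the 1998 and 1999 rows implied by the output above) are made up for the illustration.

```python
from collections import Counter


def rollup_grouping_sets(columns):
    """Expand ROLLUP arguments into grouping sets, longest prefix first.

    columns: list of column names; a tuple of names stands for one
    composite column, i.e. names wrapped in their own parentheses.
    """
    units = [c if isinstance(c, tuple) else (c,) for c in columns]
    return [
        tuple(name for unit in units[:n] for name in unit)
        for n in range(len(units), -1, -1)
    ]


def rollup_count(rows, columns, all_columns=("year", "client")):
    """count(*) per grouping set; columns rolled away show up as None."""
    counts = Counter()
    for grouping_set in rollup_grouping_sets(columns):
        for row in rows:
            key = tuple(
                row[name] if name in grouping_set else None
                for name in all_columns
            )
            counts[key] += 1
    return counts


orders = (
    [{"year": 1998, "client": c} for c in (1002, 1003, 1004)]
    + [{"year": 1999, "client": 1003}] * 4
)

nested = rollup_count(orders, ["year", "client"])        # rollup (year, client)
composite = rollup_count(orders, [("year", "client")])   # rollup ((year, client))

print(nested[(1998, None)], (1998, None) in composite)
```

If that is the cause, writing the clause as group by rollup (to_char(zam_order_date,'yyyy'), zam_klt_id), without the inner parentheses, would be the first thing to try.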
