Merging two files and ordering them - sorting

I want to merge two files into one and order the lines by the values in the second column. For example:
File 1:
+ 1.01 id 120
- 1.20 id 145
+ 2.15 id 411
(continues)
File 2:
r 0.21 id 4
r 1.78 id 85
r 102 id 850
(continues)
I want to merge them into one file, sorted in ascending order by column 2, like this:
File 3:
r 0.21 id 4
+ 1.01 id 120
- 1.20 id 145
r 1.78 id 85
+ 2.15 id 411
r 102 id 850
How could I do this?

How about:
sort -k2n file1 file2
Here, f1 and f2 are your files:
kent$ sort -k2n f1 f2
r 0.21 id 4
+ 1.01 id 120
- 1.20 id 145
r 1.78 id 85
+ 2.15 id 411
r 102 id 850
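If you would rather do the same thing from a script, here is a minimal Python sketch of the idea. The function name merge_sorted_by_column and the in-memory lists are only for illustration (they are not part of the original answer); in practice you would read the lines from your two files.

```python
def merge_sorted_by_column(lines_a, lines_b, column=1):
    """Return all lines from both inputs, ordered numerically by one column.

    column is zero-based, so column=1 means the second whitespace-separated field.
    """
    def key(line):
        # Split on whitespace and read the chosen field as a number
        return float(line.split()[column])

    # sorted() is stable, so lines with equal keys keep their input order
    return sorted(list(lines_a) + list(lines_b), key=key)


# Sample data copied from the question
file1 = ["+ 1.01 id 120", "- 1.20 id 145", "+ 2.15 id 411"]
file2 = ["r 0.21 id 4", "r 1.78 id 85", "r 102 id 850"]

merged = merge_sorted_by_column(file1, file2)
for line in merged:
    print(line)
```

Because the key is parsed as a float, 102 sorts after 2.15, which is the numeric behaviour that the n in -k2n asks sort for.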

Related

Hierarchical Update in Hive

I got a hive table as follows:
Table A
docid  corr_docid  header
100                a
101    100         b
102                c
105    101         d
106    102         e
107    106         f
108    107         g
109                h
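Is it possible to create another table from it?
A corr_docid value points to the document being corrected; for example, the row with corr_docid 107 corrects the document with docid 107.
The new table should give every document the docid of the last correction in its chain (newdocid), as follows:
Table B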
docid  corr_docid  header  newdocid
100                a       105
101    100         b       105
102                c       108
105    101         d       105
106    102         e       108
107    106         f       108
108    107         g       108
109                h       109
Is this possible in Hive?
You can try this plain SQL to get the desired result. It works only if you know the maximum depth of the hierarchy in advance, which is 4 here.
`select a.docid,
a.corr_docid,
case when b.docid is null then a.docid
when c.docid is null then b.docid
when d.docid is null then c.docid
else d.docid
end newdocid
from Table_A a left join Table_A b on a.docid = b.corr_docid
left join Table_A c on b.docid = c.corr_docid
left join Table_A d on c.docid = d.corr_docid ;`
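If the depth is not known in advance, the chain has to be followed until it ends. Here is a minimal Python sketch of that idea, outside Hive. The function name resolve_latest is made up for illustration, and it assumes each document is corrected by at most one other document.

```python
def resolve_latest(rows):
    """Map each docid to the last document in its correction chain.

    rows: list of (docid, corr_docid) pairs; corr_docid is None when the
    document does not correct anything.
    Assumption: a document is corrected by at most one other document.
    """
    # corrected_by[x] is the docid of the document that corrects x
    corrected_by = {corr: doc for doc, corr in rows if corr is not None}

    latest = {}
    for doc, _ in rows:
        current = doc
        seen = {current}
        # Walk forward through the corrections; the seen set guards against cycles
        while current in corrected_by and corrected_by[current] not in seen:
            current = corrected_by[current]
            seen.add(current)
        latest[doc] = current
    return latest


# Table A from the question (header column omitted)
table_a = [
    (100, None), (101, 100), (102, None), (105, 101),
    (106, 102), (107, 106), (108, 107), (109, None),
]

newdocid = resolve_latest(table_a)
for docid, _ in table_a:
    print(docid, newdocid[docid])
```

The loop does the same job as the chain of left joins, just without a fixed number of levels.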

SAS Dataset - Assign incremental value for each change in variable - sort by timestamp

Similar to
How to assign an ID to a group of variables
My dataset is sorted by ID, then timestamp. I need to create an 'order' variable that increments on each change in, say, Status, but the data must stay sorted by timestamp, so I think I am correct in suggesting that plain BY-group processing will not work. The Order field below illustrates what I seek...
ID Status Timestamp Order
188 3 12:15 1
188 4 12:45 2
188 4 13:10 2
188 3 14:20 3
189 10 11:00 1
189 11 13:00 2
189 10 13:30 3
189 10 13:35 3
The first and second '3's are separate, likewise the first and subsequent '10's.
You can use the NOTSORTED option to have SAS automatically set the FIRST.STATUS flag for you.
data want ;
set have ;
by id status notsorted;
if first.id then order=0;
order + first.status;
run;
As you mentioned, it is very similar to that other question. The trick here is to reset the counter at the first observation in each BY group.
data temp;
input ID $ Status $ Timestamp $;
datalines;
188 3 12:15
188 4 12:45
188 4 13:10
188 3 14:20
189 10 11:00
189 11 13:00
189 10 13:30
189 10 13:35
;
run;
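data temp2;
set temp;
by id;
/* Call LAG on every row so its queue always holds the previous row's status */
prev_status = lag(status);
if first.id then order = 0;
/* Count the first row of each ID too, even if its status matches the previous ID's last status */
if first.id or status ~= prev_status then order + 1;
drop prev_status;
run;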
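For anyone who wants to check the logic outside SAS, here is a minimal Python sketch of the same counter. The function name add_order is made up for illustration, and it assumes the rows are already sorted by ID and then by timestamp.

```python
def add_order(rows):
    """Append a counter that restarts per ID and increments on each status change.

    rows: list of (id, status, timestamp) tuples, already sorted by id and
    then by timestamp (an assumption carried over from the question).
    """
    result = []
    prev_id = prev_status = None
    order = 0
    for rec_id, status, timestamp in rows:
        if rec_id != prev_id:
            # First row of a new ID: restart the counter
            order = 1
        elif status != prev_status:
            # Same ID, but the status changed since the previous row
            order += 1
        result.append((rec_id, status, timestamp, order))
        prev_id, prev_status = rec_id, status
    return result


# Sample data copied from the question
have = [
    ("188", "3", "12:15"), ("188", "4", "12:45"), ("188", "4", "13:10"),
    ("188", "3", "14:20"), ("189", "10", "11:00"), ("189", "11", "13:00"),
    ("189", "10", "13:30"), ("189", "10", "13:35"),
]

want = add_order(have)
for row in want:
    print(*row)
```

Only the previous row is compared, so a status that comes back later (the second '3' for 188) gets a new order value instead of reusing the old one.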

Sum data in one column in a specific order in Spotfire

Does anyone know how to create a calculated column (in Spotfire) that sums data in order of increasing values in another column?
For example, what would the expression be to sum the data in [P] in increasing order of [K], for each [Well]?
Some example data:
Well Depth P K
A 85 0.191 108
A 85.5 0.192 102
A 87 0.17 49
A 88 0.184 47
A 89 0.192 50
B 298 0.215 177
B 298.5 0.2 177
B 300 .017 105
B 301 0.23 200
You can use:
Sum([P]) OVER (intersect([Well],AllPrevious([K])))
This returns the cumulative sum of P per Well, in ascending order of K. Rows that share the same K value (the two B rows with K = 177 below) get the same cumulative sum.
Well K P Cumulative Sum of P
A 47 0,184 0,184
A 49 0,17 0,354
A 50 0,192 0,546
A 102 0,192 0,738
A 108 0,191 0,929
B 105 0,017 0,017
B 177 0,215 0,432
B 177 0,2 0,432
B 200 0,23 0,662
Edit Based on OP's comment:
To get the cumulative sum in descending order of K, you can use:
Sum([P]) OVER (intersect([Well],AllNext([K])))
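To sanity-check the numbers outside Spotfire, here is a minimal Python sketch of the ascending case. The function name cumulative_p_by_k is made up for illustration, and it assumes (based on the result table above) that rows with equal K are all included in each other's sum.

```python
from collections import defaultdict


def cumulative_p_by_k(rows):
    """For each row, sum P over all rows of the same well with K <= this row's K.

    rows: list of (well, k, p) tuples in any order.
    Returns the sums in the same order as the input rows.
    """
    by_well = defaultdict(list)
    for well, k, p in rows:
        by_well[well].append((k, p))

    sums = []
    for well, k, _ in rows:
        # Ties on K are included, so rows sharing a K value get the same total
        sums.append(sum(p for other_k, p in by_well[well] if other_k <= k))
    return sums


# (Well, K, P) taken from the question's sample data
data = [
    ("A", 108, 0.191), ("A", 102, 0.192), ("A", 49, 0.17),
    ("A", 47, 0.184), ("A", 50, 0.192),
    ("B", 177, 0.215), ("B", 177, 0.2), ("B", 105, 0.017), ("B", 200, 0.23),
]

totals = [round(value, 3) for value in cumulative_p_by_k(data)]
for (well, k, p), total in zip(data, totals):
    print(well, k, p, total)
```

Changing the comparison to other_k >= k gives the descending variant.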

Computing lag in Hive by a variable

My input table looks like:
guest_id days
101 79
101 70
101 68
101 61
102 101
102 90
102 55
103 99
103 90
Note that days are in descending order within each guest_id.
Desired output table:
guest_id days days_diff
101 79 0
101 70 9
101 68 2
101 61 7
102 101 0
102 90 11
102 55 35
103 99 0
103 90 9
days_diff is the first-order difference within each guest_id (not across the whole days column).
You need to have a unique id column as well (otherwise Hive doesn't know about the order of your rows).
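Then you can self-join each row to the previous one (a.id = b.id + 1) to get your differences:
select a.guest_id,
a.days,
case when a.guest_id = b.guest_id then b.days - a.days else 0 end days_diff
from
input a
-- b is the row just before a; the left join keeps the very first row, which has no predecessor
left join input b on a.id = b.id + 1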
Edit: As pointed out by Kunal in the comments, Hive does have a LAG window function, which requires a PARTITION BY ... ORDER BY clause. You still need something to order your table by; for example, if you have a date column, you would use it like this:
SELECT guest_id,
days,
-- previous days minus current days; 0 for the first row of each guest
COALESCE(LAG(days) OVER (PARTITION BY guest_id ORDER BY date) - days, 0) AS days_diff
FROM input;
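To see what the expected days_diff values are, here is a minimal Python sketch of the per-guest difference. The function name days_diff is made up for illustration, and it assumes the rows are supplied in the intended order, which is exactly the ordering information Hive needs from an id or date column.

```python
def days_diff(rows):
    """Add the difference to the previous row's days, restarting for each guest.

    rows: list of (guest_id, days) tuples in the intended row order.
    """
    result = []
    prev_guest = prev_days = None
    for guest, days in rows:
        # The first row of each guest has no previous row, so its difference is 0
        diff = prev_days - days if guest == prev_guest else 0
        result.append((guest, days, diff))
        prev_guest, prev_days = guest, days
    return result


# Input table from the question
guests = [
    (101, 79), (101, 70), (101, 68), (101, 61),
    (102, 101), (102, 90), (102, 55),
    (103, 99), (103, 90),
]

output = days_diff(guests)
for row in output:
    print(*row)
```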

SSRS Multiple Column

I am new to SSRS.
I have a record like this:
Table Denomination ResultType Quantity
A1 1.00 FC 5
A1 1.00 FR 10
A1 2.00 FC 21
A1 2.00 FR 23
A1 5.00 FC 11
A1 5.00 FR 16
A2 1.00 FC 15
A2 1.00 FR 20
A2 2.00 FC 25
A2 2.00 FR 26
A2 5.00 FC 10
A2 5.00 FR 17
I am only able to do one part using a matrix. I also tried using PIVOT, but that did not work because Denomination is a dynamic field.
FC
Denomination 1.00 2.00 5.00
------------ ---- ---- ----
A1 5 21 11
A2 15 25 10
I want to populate something like this using tablix instead.
FC | FR
Denomination 1.00 2.00 5.00 | 1.00 2.00 5.00
------------ ---- ---- ---- ---- ---- ----
A1 5 21 11 10 23 16
A2 15 25 10 20 26 17
Thank you.
It might be a little late to help you but...
You should be able to do this by creating a matrix (tablix in BIDS 2005) with Table (the field with your A1, A2 data) as the Row Group and Denomination as the Column Group.
Then in the Column Groups, click on the Denomination group down arrow and Add Group->Parent Group and Group By ResultType. This will add the extra layer to the columns by FC and FR in your last example.
