nmtfmt on a specifc row instead of field - format

My goal is to simply convert the bytes to human readable format for vmstat output.
I can't find a solution that did it for a row, instead of field.
Example:
vmstat -a|numfmt --from-unit=1024 --to=iec
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free inact active si so bi bo in cs us sy id wa st
3 5 14333872 3056072 7379640 19487608 3 13 6893 1472 11 8 20 11 32 36 0
Expecting something like following
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free inact active si so bi bo in cs us sy id wa st
3 5 14GB 2.9GB 7.3GB 1.9GB 3 13 6893 1472 11 8 20 11 32 36 0

Related

Group columns and separate in R

I have a dataset, where I want to group 2 columns together (Proc Right and Days to Proc Right) and separate it from the next group of 2 columns (Proc Left and Days to Proc Left). During separation, I want to separate based on chronology of days to procedure, and assign 0 and NA to the other 2 columns which chronologically are later. I then want to create a new column pulling only the days to procedure.
To summarise:
Have this:
ID
Proc ID
Proc Right
Days to Proc Right
Proc Left
Days to Proc Left
1
108
4
41
4
168
1
105
4
169
4
42
1
101
3
270
0
NA
Want this:
ID
Proc ID
Proc Right
Days to Proc Right
Proc Left
Days to Proc Left
Days to Proc
1
108
4
41
0
NA
41
1
108
0
NA
4
168
168
1
105
0
NA
4
42
42
1
105
4
169
0
NA
169
1
101
3
270
0
NA
270
Would appreciate any help. Thanks
I have tried unite and cSplit, which separates the column groups, but doesn't help me assign 0 and NA to the other columns.

Creating subset of a data

I have a column called Project_Id which lists the names of many different projects, say Project A, Project B and so on. A second column lists the sales for each project.
A third column shows time series information. For example:
Project_ID Sales Time Series Information
A 10 1
A 25 2
A 31 3
A 59 4
B 22 1
B 38 2
B 76 3
C 82 1
C 23 2
C 83 3
C 12 4
C 90 5
D 14 1
D 62 2
From this dataset, I need to choose (and thus create a new data set) only those projects which have at least 4 time series points, say, to get the new dataset (How do I get this by using an R code, is my question):
Project_ID Sales Time Series Information
A 10 1
A 25 2
A 31 3
A 59 4
C 82 1
C 23 2
C 83 3
C 12 4
C 90 5
Could someone please help?
Thanks a lot!
I tried to do some filtering with R but had little success.

Finding tuples if it only exists in all occurrences of a constraint

Database (all entries are integers):
ID | BUDGET
1 | 20
8 | 20
10 | 20
5 | 4
9 | 4
10 | 4
1 | 11
9 | 11
Suppose my constraint is having a budget of >= 10.
I would want to return ID of 1 only in this case. How do I go about it?
I've tried taking the cross product of itself after selecting budget >= 10 and returning if id1 = id2 and budget1 <> budget2 but that does not work in the case where there's only 1 budget that is >= 10. (EG below)
ID | BUDGET
1 | 20
8 | 20
10 | 20
1 | 4
5 | 4
9 | 4
10 | 4
9 | 4
If I were to do what I did for the first example, nothing will be returned as budget1 <> budget2 will result in an empty table.
EDIT1: I can only use relational algebra to solve the problem. So SQL's exist, where and count keywords cant be used.
Edit2: Only project, select, rename, set difference, set union, left join, right join, full inner join, natural joins, set intersection and cross product allowed
The question is not completely clear to me. If you want to return all the ID for which there is a budget greater than 10, and no budget less than 10, the expression is simply the following:
π(ID)(σ(BUDGET>=10)(R)) - π(ID)(σ(BUDGET<10)(R))
If, an the other hand, you want all the ID which have all the budgets present in the relation and greater then 10, then we must use the ÷ operator:
R ÷ π(BUDGET)(σ(BUDGET>=10)(R))
From your comment, the second case is the correct one. Let’s see how to compute the division from its definition (applied to two generic relations R(A) and S(B)):
R ÷ S = πA-B(R) - πA-B((πA-B(R) x S) - R)
where R is the original relation, and
S = π(BUDGET)(σ(BUDGET>=10)(R)),
that is:
BUDGET
------
20
11
Starting from the inner expression:
πA-B(R) is equal to πID(R) =
ID
--
1
5
8
9
10
then πA-B(R) x S) is:
ID BUDGET
---------
1 20
1 11
5 20
5 11
8 20
8 11
9 20
9 11
10 20
10 11
then ((πA-B(R) x S) - R) is:
ID BUDGET
---------
5 20
5 11
8 11
9 20
10 20
then πA-B((πA-B(R) x S) - R) is:
ID
__
5
8
9
10
and, finally, subtracting this relation from πA-B(R) we obtain the result:
ID
--
1

SAS Dataset - Assign incremental value for each change in variable - sort by timestamp

Similar to
How to assign an ID to a group of variables
My dataset is sorted by ID, then timestamp. I need to create an 'order' variable, incrementing on each change in say Status, but my sort must remain time stamp, so I think I am correct in suggesting that by BY (group) will not work. The order field below illustrates what I seek...
ID Status Timestamp Order
188 3 12:15 1
188 4 12:45 2
188 4 13:10 2
188 3 14:20 3
189 10 11:00 1
189 11 13:00 2
189 10 13:30 3
189 10 13:35 3
The first and second '3's are separate, likewise the first and subsequent '10's.
You can use the NOTSORTED option to have SAS automatically set the FIRST.STATUS flag for you.
data want ;
set have ;
by id status notsorted;
if first.id then order=0;
order + first.status;
run;
As you mentioned, it is very similar to that other question. The trick here is to set the order of the first observation in each by group to zero.
data temp;
input ID $ Status $ Timestamp $;
datalines;
188 3 12:15
188 4 12:45
188 4 13:10
188 3 14:20
189 10 11:00
189 11 13:00
189 10 13:30
189 10 13:35
;
run;
data temp2;
set temp;
by id;
if first.id then order = 0;
if status ~= lag(status) then order + 1;
run;

Oracle Sql query to count time span with certain criteria

Oracle Sql query , I was trying to count the grand total for time difference that is greater than 2, but when I tried this it just counted all the rows from the query instead of just the rows that have the criteria I was looking for. Anybody have an idea of what I am missing or a better approach . Thanks
This is my query
select DC.CUST_FIRST_NAME,DC.CUST_LAST_NAME,oi.customer_id,oi.order_timestamp,oi.order_timestamp - LAG(oi.order_timestamp) OVER (ORDER BY oi.order_timestamp) AS "Difference(In Days)" ,
(select Count('Elapsed Order Difference')
from demo_orders oi,
demo_customers dc
where OI.CUSTOMER_ID = DC.CUSTOMER_ID
group by 'Elapsed Order Difference'
having count('Elapsed Order Difference') > 3
)Total
from demo_orders oi,
demo_customers dc
where OI.CUSTOMER_ID = DC.CUSTOMER_ID
Results
CUST_FIRST_NAME CUST_LAST_NAME CUSTOMER_ID ORDER_TIMESTAMP Difference(In Days) TOTAL
Eugene Bradley 7 8/14/2013 5:59:11 PM 10
William Hartsfield 2 8/28/2013 5:59:11 PM 14 10
Edward "Butch" OHare 4 9/8/2013 5:59:11 PM 11 10
Edward Logan 3 9/10/2013 5:59:11 PM 2 10
Edward Logan 3 9/20/2013 5:59:11 PM 10 10
Albert Lambert 6 9/25/2013 5:59:11 PM 5 10
Fiorello LaGuardia 5 9/30/2013 5:59:11 PM 5 10
William Hartsfield 2 10/8/2013 5:59:11 PM 8 10
John Dulles 1 10/14/2013 5:59:11 PM 6 10
Eugene Bradley 7 10/17/2013 5:59:11 PM 3 10
This is untested, but I think it might give you what you're after.
with raw_data as (
select
dc.cust_first_name, dc.cust_last_name,
oi.customer_id, oi.order_timestamp,
oi.order_timestamp - LAG(oi.order_timestamp) OVER
(ORDER BY oi.order_timestamp) AS "Difference(In Days)",
case
when oi.order_timestamp - LAG(oi.order_timestamp)
over (ORDER BY oi.order_timestamp) > 2 then 1
else 0
end as gt2
from
demo_orders oi,
demo_customers dc
where
oi.customer_id = dc.customer_id
)
select
cust_first_name, cust_last_name,
customer_id, order_timestamp,
"Difference(In Days)",
sum (gt2) over (partition by 1) as total
from raw_data
When you do Count('Elapsed Order Difference') above, you are counting every row, no matter what. You could have put count ('frog') or count (*) and have gotten the same result. The having count > 3 was already satisfied since the count of all rows was 10.
In general, I'd try to avoid using a scalar for a field in a query as you have in your example. I'm not saying it's never a good idea, but I would argue that there is usually a better way to do it. With 10 rows, you'll hardly notice a performance difference, but as your datasets grow, this can create issues.
Expected output:
fn ln id order date dif total
E B 7 8/14/2014 8
W H 2 8/28/2014 14 8
E O 4 9/8/2014 11 8
E L 3 9/10/2014 2 8
E L 3 9/20/2014 10 8
A L 6 9/25/2014 5 8
F L 5 9/30/2014 5 8
W H 2 10/8/2014 8 8
J D 1 10/14/2014 6 8
E B 7 10/17/2014 3 8

Resources