Sorting with -k - bash

I have this list:
DI bpg01001:PGE00 3:1 ------ 1 1 (No fault)
DI bpg01001:VOL00 2:13 ------ 1 1 (Normal)
DI dca06001:HPR00 3:12 ------ 1 1 (Normal)
DI dca06001:HUH00 3:15 ------ 1 1 (Normal)
DI dca06001:PWS00 3:14 ------ 1 1 (Normal)
DI dca06001:UOL00 3:13 ------ 1 1 (Normal)
DI rcf10001:ACO00 2:0 ------ 1 1 (Present)
DI rcf10001:BDC00 2:4 ------ 1 1 (Normal)
DI rcf10001:ERR00 2:2 ------ 1 1 (Normal)
DI rcf10001:ERS00 2:3 ------ 1 1 (Normal)
DO bpg01001:PGS00 1:4 ------ 0 0 (Stop)
My goal is to sort everything from 1:4 to 3:15, but | sort -k3 puts the lines in the wrong order for a human reader (it compares the field lexically, so for example 2:13 sorts before 2:4). Any ideas?

If your sort does not support -V, you can do a decorate / sort / undecorate to achieve what you are describing:
awk '{split($3,a,":"); print a[1],a[2],"|" $0}' file | sort -k1,1n -k2,2n | sed 's/^[^|]*|//'
DO bpg01001:PGS00 1:4 ------ 0 0 (Stop)
DI rcf10001:ACO00 2:0 ------ 1 1 (Present)
DI rcf10001:ERR00 2:2 ------ 1 1 (Normal)
DI rcf10001:ERS00 2:3 ------ 1 1 (Normal)
DI rcf10001:BDC00 2:4 ------ 1 1 (Normal)
DI bpg01001:VOL00 2:13 ------ 1 1 (Normal)
DI bpg01001:PGE00 3:1 ------ 1 1 (No fault)
DI dca06001:HPR00 3:12 ------ 1 1 (Normal)
DI dca06001:UOL00 3:13 ------ 1 1 (Normal)
DI dca06001:PWS00 3:14 ------ 1 1 (Normal)
DI dca06001:HUH00 3:15 ------ 1 1 (Normal)
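For comparison, the decorate / sort / undecorate idea maps directly onto a sort key in other languages; here is a purely illustrative Python sketch with a few of the records inlined as a list:

```python
# Sort records by the numeric major:minor pair in the third field,
# the same ordering that `sort -V` or the decorate trick produces.
records = [
    "DI bpg01001:VOL00 2:13 ------ 1 1 (Normal)",
    "DO bpg01001:PGS00 1:4 ------ 0 0 (Stop)",
    "DI dca06001:HPR00 3:12 ------ 1 1 (Normal)",
    "DI rcf10001:BDC00 2:4 ------ 1 1 (Normal)",
]

def pair(line):
    # "2:13" -> (2, 13), so 2:4 sorts before 2:13
    major, minor = line.split()[2].split(":")
    return (int(major), int(minor))

for line in sorted(records, key=pair):
    print(line)
```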

You may try sorting on field 3 with the version-sort option:
sort -Vk3 file
DO bpg01001:PGS00 1:4 ------ 0 0 (Stop)
DI rcf10001:ACO00 2:0 ------ 1 1 (Present)
DI rcf10001:ERR00 2:2 ------ 1 1 (Normal)
DI rcf10001:ERS00 2:3 ------ 1 1 (Normal)
DI rcf10001:BDC00 2:4 ------ 1 1 (Normal)
DI bpg01001:VOL00 2:13 ------ 1 1 (Normal)
DI bpg01001:PGE00 3:1 ------ 1 1 (No fault)
DI dca06001:HPR00 3:12 ------ 1 1 (Normal)
DI dca06001:UOL00 3:13 ------ 1 1 (Normal)
DI dca06001:PWS00 3:14 ------ 1 1 (Normal)
DI dca06001:HUH00 3:15 ------ 1 1 (Normal)
Update: if your sort doesn't support -V, then you can try this workaround:
awk '{print $3 "#" $0}' file | sort -t: -k1n -k2n | cut -d# -f2-
DO bpg01001:PGS00 1:4 ------ 0 0 (Stop)
DI rcf10001:ACO00 2:0 ------ 1 1 (Present)
DI rcf10001:ERR00 2:2 ------ 1 1 (Normal)
DI rcf10001:ERS00 2:3 ------ 1 1 (Normal)
DI rcf10001:BDC00 2:4 ------ 1 1 (Normal)
DI bpg01001:VOL00 2:13 ------ 1 1 (Normal)
DI bpg01001:PGE00 3:1 ------ 1 1 (No fault)
DI dca06001:HPR00 3:12 ------ 1 1 (Normal)
DI dca06001:UOL00 3:13 ------ 1 1 (Normal)
DI dca06001:PWS00 3:14 ------ 1 1 (Normal)
DI dca06001:HUH00 3:15 ------ 1 1 (Normal)


Writing A'B'CD+ABC' using two inverters and 5 2:1 multiplexers

The question says draw F(A,B,C,D)=∑(3,7,11,12,13). I derived A'B'CD+ABC'. I am trying to draw it using two inverters and 5 2:1 multiplexers, but I couldn't connect the output to the separate components I wrote. I know the correct answer, but I just couldn't understand it.
Here's the correct solution
Why is the last mux connected to 0 instead of 1, as we did for all the other components? And why did they tie 1 to that mux's 1 input in the answer?
OK, then maybe this will help:
F(A,B,C,D)=∑(3,7,11,12,13).
with 2 inverters and 5 2:1 muxes
truth table
ABCD R
0000 0
0001 0
0010 0
0011 1
0100 0
0101 0
0110 0
0111 1
1000 0
1001 0
1010 0
1011 1
1100 1
1101 1
1110 0
1111 0
kmap
 \ CD   00  01  11  10
AB \
 00      0   0   1   0
 01      0   0   1   0
 11      1   1   0   0
 10      0   0   1   0
expression
ABC'+A'CD+B'CD
simplifying
ABC'+(A'+B')CD
ABC'+(A'+B')''CD
ABC'+(AB)'CD
(AB)'CD + ABC'
aux truth table:
(AB)'CD  ABC'  (AB)'CD + ABC'
   0      0          0          see note 2
   0      1          1          see note 1
   1      0          1          see note 2
   1      1          1          see note 1
note 1: If ABC' is true (mux select is 1), then the output is true (the mux's 1 input is tied to 1).
note 2: If ABC' is false (mux select is 0), then the output is (AB)'CD (the mux's 0 input is fed (AB)'CD); the "see note 2" outputs are true only when (AB)'CD is true.
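As a sanity check, the simplified expression can be verified against the original minterm list by brute force; this small Python sketch (illustrative only) confirms that (AB)'CD + ABC' covers exactly ∑(3,7,11,12,13):

```python
# Enumerate all 16 input combinations and collect the minterms
# where (AB)'CD + ABC' evaluates to true.
minterms = []
for m in range(16):
    A, B, C, D = (m >> 3) & 1, (m >> 2) & 1, (m >> 1) & 1, m & 1
    f = ((not (A and B)) and C and D) or (A and B and not C)
    if f:
        minterms.append(m)
print(minterms)  # [3, 7, 11, 12, 13]
```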

Parsing Data from SQLPlus

Someone kindly dumped the data out of a number of tables in SQL*Plus.
Is there a nice awk or similar script to turn it into CSV or something more easily parsed to load into another system. Sadly getting it re-run is not an option.
They used
SQL> set pages 10000 lines 10000
followed by SELECT * from the table
The dump has the column names, a line of ----'s, and then the data lines. The structure looks like spaces or tabs between column names, and groups of --- whose length is probably the field width. The following is the column names, the ---'s, and the first 2 lines from one of the tables.
CM D ORDR_REF LINE_NUM SUPP BYR LINE_REVN TXT_NUM L L T G ACCPT_US A PERF ITEM MANUF PART_NO EC_ CMDTY CLSFCTN RCPT_CNT DESCR ST IN STORE EAN QUOM QTY_ON_ORDR QTY_OUTSTG QTY_ADVD QTY_ADVD_OUTSTG QTY_RECV QTY_REJECT QTY_CR QTY_INVCE_OUTSTG QTY_INVCD QTY_INVCE_HELD QTY_CR_OUTSTG QTY_CRDTD QTY_CR_HELD DLVRY_SI DATE_DUE DATE_ACK DATE_XPCT DATE_XPED XPED_USR XP LEASE CMMT_DATE A A MIN_AUTH ACT_AUTH CURR_AUTH_SEQ_NUM TAX TAX_DATE HA PUOM DSCNT_1 DSCNT_2 DSCNT_3 ENTRD_PRC PRC MIN_PRC P ENTRD_VAL MIN_ENTRD_VAL UNIT_COST VAL_ON_ORDR VAL_RECV VAL_OUTSTG VAL_ACCRU VAL_INVCE_OUTSTG VAL_INVCD VAL_INVCE_HELD VAL_CR_OUTSTG VAL_CRDTD VAL_CR_HELD VAL_REJECT VAL_CR VAL_TAX MIN_ORDR_VAL MIN_VAL_TAX L S CNTRCT_REF CNTRCT_LINE_NUM C GL_TRA AIRCRFT_RE AIRL FLGHT_ LEG_NUM SRVC_QTY RATE_PRC CHRG_VAL UPDT_DATE UPDT_TIME USR_DATA L VAT_NON_REC_VALUE VAT_REC_VALUE PEV_LINE_COST A
-- - -------------------- ---------- ------------ -------- ---------- ---------- - - - - -------- - ---- -------------------- ------------ -------------------- --- ---------------------- ---------- -------- ---------------------------------------- -- -- -------- ------------- ---- ----------- ---------- ---------- --------------- ---------- ---------- ---------- ---------------- ---------- -------------- ------------- ---------- ----------- -------- --------- --------- --------- --------- -------- -- -------------------- --------- - - ---------- ---------- ----------------- --- --------- -- ---- ---------- ---------- ---------- ---------- --- ---------- - ---------- ------------- ---------- ----------- ---------- ---------- ---------- ---------------- ---------- -------------- ------------- ---------- ----------- ---------- ---------- ---------- ------------ ----------- - - -------------------- --------------- - ------ ---------- ---- ------ ---------- ---------- ---------- ---------- --------- --------- ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- - ----------------- ------------- ------------- -
AR O PO415966 1 040960 LFOSTER 0 0 2 2 Y Stirrers and cleaning tabs - ivan 0 0 0 0 0 0 0 0 0 0 0 0 0 CIVIC 01-APR-20 01-JAN-00 01-APR-20 01-JAN-00 31-MAR-20 0 0 0 0 0 01 01-JAN-00 ER 0 0 0 0 0 1 75.51 0 75.51 75.51 75.51 0 0 0 75.51 0 0 0 0 0 0 15.1 0 0 0 0 022704 0 0 0 0 03-APR-20 01-JAN-00 2 0 15.1 75.51
AR O PO415967 1 015552 LFOSTER 0 0 2 2 Y extras to PO414840 - Sam 0 0 0 0 0 0 0 0 0 0 0 0 0 CIVIC 01-APR-20 01-JAN-00 01-APR-20 01-JAN-00 31-MAR-20 0 0 0 0 0 01 01-JAN-00 ER 0 0 0 0 0 1 60 0 60 60 60 0 0 0 60 0 0 0 0 0 0 12 0 0 0 0 022705 0 0 0 0 01-APR-20 01-JAN-00 2 0 12 60
You may not need to go external at all, but can use the CSV options natively in SQL itself.
Something like this?
MySQL Query to CSV
Figuring out the format was a challenge since it is a mix of tabs and spaces; however, it aligned with standard UNIX tab stops, so expand converted the tabs into correctly aligned spaces, and then the awk script did the rest.
The column names are whitespace-delimited, and the --- lines give the column widths to use with FIELDWIDTHS for the data.
A couple of bash one-liners to expand and then process:
for f in *.txt;do expand "$f" > "${f%.txt}.fix"; done
Then call awk to convert into delimited format
for f in *.fix;do awk -f parse.awk "$f" > "${f%.fix}.del"; done
The awk script (parse.awk) uses a couple of tricks. $1=$1 forces awk to re-split the record: using the default input FS for line 1, and FIELDWIDTHS after line 2. The print that follows then joins the fields with the output field separator (OFS), which is set to ¬ in BEGIN; you can use what you like. It was not in the data, so there is no need to escape commas etc.
NR==1 prints out the column names
NR==2 gets the field lengths by measuring the --- --- -----
NR>2 processes the rest of the file using the fixed widths set at line 2
BEGIN {
    OFS = "¬"                    # output delimiter; anything absent from the data
}
NR==1 {                          # header line: whitespace-delimited column names
    $1=$1                        # force a re-split so print joins fields with OFS
    print
}
NR==2 {                          # dashes line: derive the fixed field widths
    fw=""
    for(i=1;i<=NF;i++) {
        fw = fw length($i)+1 " " # width of each --- group plus its separator space
    }
    FIELDWIDTHS = fw             # gawk-specific; takes effect from the next record
}
NR>2 {                           # data lines: now split on the fixed widths
    $1=$1
    print
}
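The same width-detection idea (measure each --- group, then slice fixed-width fields) is easy to sketch outside gawk too; here is a purely illustrative Python version with made-up sample data and a made-up function name:

```python
# Sketch: derive field widths from the SQL*Plus dashes line, then slice
# each data row into fields and join them with a delimiter.
def parse_sqlplus(lines, delim="¬"):
    header, dashes, *rows = lines
    # each run of '-' plus its following separator space gives one field width
    widths = [len(g) + 1 for g in dashes.split()]
    out = [delim.join(header.split())]
    for row in rows:
        fields, pos = [], 0
        for w in widths:
            fields.append(row[pos:pos + w].strip())
            pos += w
        out.append(delim.join(fields))
    return out

sample = [
    "ID NAME      QTY",
    "-- --------- ---",
    "01 Stirrers    7",
    "02 Tabs       12",
]
for line in parse_sqlplus(sample):
    print(line)
```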

How to interpolate values in panel data using a loop

I have a panel dataset. My variable identifiers are cc for country codes and Year for years:
clear
input long cc float(sch Year)
2 0 1960
2 0 1961
2 0 1962
2 0 1963
2 0 1964
2 0 1965
2 0 1966
2 0 1967
2 0 1968
2 0 1969
2 0 1970
2 0 1971
2 0 1972
2 0 1973
2 0 1974
2 0 1975
2 0 1976
2 0 1977
2 .733902 1978
2 .7566 1979
2 .78 1980
2 .875 1981
2 .9225 1982
2 1.0174999 1983
2 1.0649999 1984
2 1.16 1985
2 1.2425 1986
2 1.28375 1987
2 1.36625 1988
2 1.4075 1989
2 1.49 1990
2 1.5825 1991
2 1.62875 1992
2 1.72125 1993
2 1.7675 1994
2 1.86 1995
2 1.935 1996
2 1.9725 1997
2 2.0475001 1998
2 2.085 1999
2 2.16 2000
2 2.27 2001
2 2.325 2002
2 2.435 2003
2 2.49 2004
2 2.6 2005
2 2.7575 2006
2 2.83625 2007
2 2.99375 2008
2 3.0725 2009
2 3.23 2010
2 3.15125 2011
2 3.190625 2012
2 3.1709375 2013
2 3.1807814 2014
2 3.1758595 2015
2 3.1783204 2016
2 3.17709 2017
2 3.177705 2018
4 0 1960
4 0 1961
4 0 1962
4 0 1963
4 0 1964
4 0 1965
4 0 1966
4 0 1967
4 0 1968
4 0 1969
4 0 1970
4 0 1971
4 0 1972
4 0 1973
4 0 1974
4 0 1975
4 0 1976
4 0 1977
4 4.657455 1978
4 4.8015 1979
4 4.95 1980
4 5.4 1981
4 5.625 1982
4 6.075 1983
4 6.3 1984
4 6.75 1985
4 7.02 1986
4 7.155 1987
4 7.425 1988
4 7.56 1989
4 7.83 1990
4 7.8275 1991
4 7.82625 1992
4 7.82375 1993
4 7.8225 1994
4 7.82 1995
4 8.195 1996
4 8.3825 1997
4 8.7575 1998
4 8.945 1999
4 9.32 2000
4 9.412499 2001
4 9.45875 2002
4 9.55125 2003
4 9.5975 2004
4 9.69 2005
4 9.73 2006
4 9.75 2007
4 9.79 2008
4 9.81 2009
4 9.85 2010
4 9.83 2011
4 9.84 2012
4 9.835 2013
4 9.8375 2014
4 9.83625 2015
4 9.836875 2016
4 9.836563 2017
4 9.83672 2018
end
I would like to interpolate the sch variable backwards to earlier years. Variable sch has observations over the years 1978-2018. Using the observation for 1978, I would like to impute the value for 1977:
sch_1977 = 0.97 * sch_1978
The code I have tried is the following:
forvalues y = 1977 1976 1975{
local i = `y' - 1958
bysort cc (Year): generate sch`y' = 0.97*sch[`i']
replace sch`y' = 0 if Year != `y'
replace sch = sch + sch`y'
}
Here, i corresponds to the row where the year 1978 is placed within each cc. Using a forvalues loop, in every iteration I wanted to create a new variable (sch1977, sch1976, sch1975) with the interpolated observation in the corresponding year and zeros for all other observations, and then add this new variable to sch. However, Stata complains that the code is invalid.
The following works for me. (Note that forvalues only accepts a numlist range such as 1977(-1)1975; for an arbitrary list of values you need foreach, as below.)
foreach x in 1977 1976 1975 {
local i = (2018 - 1960) - (2018 - `x') + 2
bysort cc (Year): generate sch_`x' = 0.97 * sch[`i']
replace sch_`x' = 0 if Year != `x'
replace sch = sch + sch_`x'
}
Results:
bysort cc (Year): list if inrange(Year, 1970, 1980), sepby(cc)
-> cc = 2
+-------------------------------------------------------+
| cc sch Year sch_1977 sch_1976 sch_1975 |
|-------------------------------------------------------|
11. | 2 0 1970 0 0 0 |
12. | 2 0 1971 0 0 0 |
13. | 2 0 1972 0 0 0 |
14. | 2 0 1973 0 0 0 |
15. | 2 0 1974 0 0 0 |
16. | 2 .6698126 1975 0 0 .6698126 |
17. | 2 .6905284 1976 0 .6905284 0 |
18. | 2 .7118849 1977 .7118849 0 0 |
19. | 2 .733902 1978 0 0 0 |
20. | 2 .7566 1979 0 0 0 |
21. | 2 .78 1980 0 0 0 |
+-------------------------------------------------------+
-> cc = 4
+-------------------------------------------------------+
| cc sch Year sch_1977 sch_1976 sch_1975 |
|-------------------------------------------------------|
11. | 4 0 1970 0 0 0 |
12. | 4 0 1971 0 0 0 |
13. | 4 0 1972 0 0 0 |
14. | 4 0 1973 0 0 0 |
15. | 4 0 1974 0 0 0 |
16. | 4 4.250733 1975 0 0 4.250733 |
17. | 4 4.382199 1976 0 4.382199 0 |
18. | 4 4.517731 1977 4.517731 0 0 |
19. | 4 4.657455 1978 0 0 0 |
20. | 4 4.8015 1979 0 0 0 |
21. | 4 4.95 1980 0 0 0 |
+-------------------------------------------------------+
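The loop is just the backwards recursion sch_y = 0.97 * sch_(y+1), seeded with the 1978 value; a quick illustrative Python check reproduces the cc = 2 values in the listing (up to Stata's float storage precision):

```python
# Backwards extrapolation: each earlier year is 0.97 times the year after it.
sch = {1978: 0.733902}  # observed 1978 value for cc = 2
for year in (1977, 1976, 1975):
    sch[year] = 0.97 * sch[year + 1]

for year in (1977, 1976, 1975):
    print(year, round(sch[year], 7))
```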

pandas: time difference in groupby

How to calculate time difference for each id between current row and next for
dataset below:
time id
2012-03-16 23:50:00 1
2012-03-16 23:56:00 1
2012-03-17 00:08:00 1
2012-03-17 00:10:00 2
2012-03-17 00:12:00 2
2012-03-17 00:20:00 2
2012-03-20 00:43:00 3
and get next result:
time id tdiff
2012-03-16 23:50:00 1 6
2012-03-16 23:56:00 1 12
2012-03-17 00:08:00 1 NA
2012-03-17 00:10:00 2 2
2012-03-17 00:12:00 2 8
2012-03-17 00:20:00 2 NA
2012-03-20 00:43:00 3 NA
I see that you need the result in minutes by id. Here is how to do it using diff() within a groupby:
# first convert to datetime with the right format
data['time'] = pd.to_datetime(data['time'], format='%Y-%m-%d %H:%M:%S')
# per-id gap to the previous row, in minutes; the first row of
# each group has no predecessor, hence NaN
data['tdiff'] = data.groupby('id')['time'].diff().dt.total_seconds() / 60
print(data)
output
time id tdiff
0 2012-03-16 23:50:00 1 NaN
1 2012-03-16 23:56:00 1 6.0
2 2012-03-17 00:08:00 1 12.0
3 2012-03-17 00:10:00 2 NaN
4 2012-03-17 00:12:00 2 2.0
5 2012-03-17 00:20:00 2 8.0
6 2012-03-20 00:43:00 3 NaN

Awk While and For Loop

I have two files (file1 and file2)
file1:
-11.61
-11.27
-10.47
file2:
NAME
NAME
NAME
I want to use awk to find each occurrence of NAME in file2 and insert the corresponding line of file1 before it: the 1st line before the first NAME, and so on. The desired output is
########## Energy: -11.61
NAME
########## Energy: -11.27
NAME
########## Energy: -10.47
NAME
I tried this code
#!/bin/bash
file=file1
while IFS= read line
do
# echo line is stored in $line
echo $line
awk '/MOLECULE/{print "### Energy: "'$line'}1' file2 > output
done < "$file"
But this was the output that I got
########## Energy: -10.47
NAME
########## Energy: -10.47
NAME
########## Energy: -10.47
NAME
I don't know why the script is putting only the last value of file1 before each occurrence of NAME in file2.
I appreciate your help!
Sorry if I wasn't clear in my question. Here are samples of my files (energy.txt and sample.mol2):
[user]$cat energy.txt
-11.61
-11.27
-10.47
[user]$cat sample.mol2
#<TRIPOS>MOLECULE
methane
5 4 1 0 0
SMALL
NO_CHARGES
#<TRIPOS>ATOM
1 C 2.8930 -0.4135 -1.3529 C.3 1 <1> 0.0000
2 H1 3.9830 -0.4135 -1.3529 H 1 <1> 0.0000
3 H2 2.5297 0.3131 -0.6262 H 1 <1> 0.0000
4 H3 2.5297 -1.4062 -1.0869 H 1 <1> 0.0000
5 H4 2.5297 -0.1476 -2.3456 H 1 <1> 0.0000
#<TRIPOS>BOND
1 1 2 1
2 1 3 1
3 1 4 1
4 1 5 1
#<TRIPOS>MOLECULE
ammonia
4 3 1 0 0
SMALL
NO_CHARGES
#<TRIPOS>ATOM
1 N 8.6225 -3.5397 -1.3529 N.3 1 <1> 0.0000
2 H1 9.6325 -3.5397 -1.3529 H 1 <1> 0.0000
3 H2 8.2858 -2.8663 -0.6796 H 1 <1> 0.0000
4 H3 8.2858 -4.4595 -1.1065 H 1 <1> 0.0000
#<TRIPOS>BOND
1 1 2 1
2 1 3 1
3 1 4 1
#<TRIPOS>MOLECULE
water
3 2 1 0 0
SMALL
NO_CHARGES
#<TRIPOS>ATOM
1 O 7.1376 3.8455 -3.4206 O.3 1 <1> 0.0000
2 H1 8.0976 3.8455 -3.4206 H 1 <1> 0.0000
3 H2 6.8473 4.4926 -2.7736 H 1 <1> 0.0000
#<TRIPOS>BOND
1 1 2 1
2 1 3 1
This is the output that I need
########## Energy: -11.61
#<TRIPOS>MOLECULE
methane
5 4 1 0 0
SMALL
NO_CHARGES
#<TRIPOS>ATOM
1 C 2.8930 -0.4135 -1.3529 C.3 1 <1> 0.0000
2 H1 3.9830 -0.4135 -1.3529 H 1 <1> 0.0000
3 H2 2.5297 0.3131 -0.6262 H 1 <1> 0.0000
4 H3 2.5297 -1.4062 -1.0869 H 1 <1> 0.0000
5 H4 2.5297 -0.1476 -2.3456 H 1 <1> 0.0000
#<TRIPOS>BOND
1 1 2 1
2 1 3 1
3 1 4 1
4 1 5 1
########## Energy: -11.27
#<TRIPOS>MOLECULE
ammonia
4 3 1 0 0
SMALL
NO_CHARGES
#<TRIPOS>ATOM
1 N 8.6225 -3.5397 -1.3529 N.3 1 <1> 0.0000
2 H1 9.6325 -3.5397 -1.3529 H 1 <1> 0.0000
3 H2 8.2858 -2.8663 -0.6796 H 1 <1> 0.0000
4 H3 8.2858 -4.4595 -1.1065 H 1 <1> 0.0000
#<TRIPOS>BOND
1 1 2 1
2 1 3 1
3 1 4 1
########## Energy: -10.47
#<TRIPOS>MOLECULE
water
3 2 1 0 0
SMALL
NO_CHARGES
#<TRIPOS>ATOM
1 O 7.1376 3.8455 -3.4206 O.3 1 <1> 0.0000
2 H1 8.0976 3.8455 -3.4206 H 1 <1> 0.0000
3 H2 6.8473 4.4926 -2.7736 H 1 <1> 0.0000
#<TRIPOS>BOND
1 1 2 1
2 1 3 1
paste -d "\n" <(sed 's/^/########## Energy: /' file1) file2
########## Energy: -11.61
NAME
########## Energy: -11.27
NAME
########## Energy: -10.47
NAME
Or, sticking with awk (incidentally, the original loop kept only the last value because > output truncates the output file on every iteration of the while loop):
awk '{
print "########## Energy: " $0
getline < "file2"
print
}' file1
Using awk:
awk 'NR==FNR{a[NR]=$0; next} /#<TRIPOS>MOLECULE/{print "########## Energy: " a[++i]}1' energy.txt sample.mol2
Explanation:
FNR - line number within the current file
NR - line number across all input files
NR==FNR{a[NR]=$0; next} is therefore true only for the first file, energy.txt,
so this statement populates array a with indexes 1,2,3,... and each line ($0) as the value
the /#<TRIPOS>MOLECULE/ search is executed on the 2nd file, sample.mol2
when the search succeeds, it prints the quoted static string followed by the next energy from the array built from the 1st file
++i advances the counter to the next array element after each print
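The two-file awk logic reduces to: remember the energies, then emit one banner before each MOLECULE marker. In Python terms (illustrative only, with abbreviated sample data):

```python
# Store the energies, then print one "Energy" banner before each
# #<TRIPOS>MOLECULE record line, mirroring the awk a[++i] counter.
energies = ["-11.61", "-11.27", "-10.47"]
mol2_lines = [
    "#<TRIPOS>MOLECULE", "methane", "#<TRIPOS>BOND",
    "#<TRIPOS>MOLECULE", "ammonia", "#<TRIPOS>BOND",
    "#<TRIPOS>MOLECULE", "water", "#<TRIPOS>BOND",
]

out, i = [], 0
for line in mol2_lines:
    if line == "#<TRIPOS>MOLECULE":
        out.append("########## Energy: " + energies[i])
        i += 1
    out.append(line)
print("\n".join(out))
```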
