Sort a file by a date field in a different format - shell

I am trying to sort a file by a date field. I realize this has been done before; however, I cannot find an example with the following date format:
Canada Goose + 1x03 + For the Triumph of Evil + Sep/30/2013
Rucksack + 10x03 + Everybody's Crying Mercy + Oct/03/13
Test + 4x01 + Season 4, Episode 1 + Jun/01/14
New Family + 3x03 + Double Date + Oct/01/2013
I tried this command, but it doesn't work:
sort -t '+' -k 4.8,4.11 -k 4.4M -k 4.1,4.2 -b Test.txt

If you have GNU awk installed, you may want to try this approach.
sort.awk
#!/bin/gawk -f

# Convert a date like "Sep/30/2013" (or "Oct/03/13") into epoch seconds.
function convertToSeconds(date, fields) {
    split(date, fields, /\//)
    fields[1] = months[tolower(fields[1])]          # month name -> month number
    fields[2] = sprintf("%02d", fields[2])          # zero-pad the day
    fields[3] = (length(fields[3]) == 2) ? sprintf("2%03d", fields[3]) : fields[3]  # "13" -> "2013"
    return mktime(sprintf("%s %s %s 00 00 00", fields[3], fields[1], fields[2]))
}

BEGIN {
    FS = "( \\+ )"
    months["jan"]="01"; months["feb"]="02"; months["mar"]="03"; months["apr"]="04"
    months["may"]="05"; months["jun"]="06"; months["jul"]="07"; months["aug"]="08"
    months["sep"]="09"; months["oct"]="10"; months["nov"]="11"; months["dec"]="12"
}

{
    arr[convertToSeconds($4)] = $0    # index each line by its date, in epoch seconds
}

END {
    asorti(arr, dst)                  # sort by the epoch-second indices
    for (i = 1; i <= FNR; ++i) {
        print arr[dst[i]]
    }
}
Give it execute permission, then run it:
$ chmod +x ./sort.awk
$ ./sort.awk Test.txt
To save the output to a new file, redirect it with the > operator:
$ ./sort.awk Test.txt > SortedTest.txt
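One caveat: the script indexes the array by epoch seconds alone, so two lines carrying the same date would overwrite each other, and asorti() compares indices as strings (fine here, where every epoch has the same number of digits). A minimal variant that sidesteps both, assuming gawk 4+ for PROCINFO["sorted_in"]:
#!/bin/gawk -f
# Same idea as sort.awk above, but keys are made unique with NR and
# traversal order is numeric rather than string-based.
function convertToSeconds(date, fields) {
    split(date, fields, /\//)
    return mktime(sprintf("%s %s %02d 00 00 00",
        (length(fields[3]) == 2 ? "20" fields[3] : fields[3]),   # "13" -> "2013"
        months[tolower(fields[1])], fields[2]))
}
BEGIN {
    FS = "( \\+ )"
    split("jan feb mar apr may jun jul aug sep oct nov dec", m, " ")
    for (i in m) months[m[i]] = sprintf("%02d", i)
}
{ arr[convertToSeconds($4) "." NR] = $0 }       # appending NR keeps keys unique
END {
    PROCINFO["sorted_in"] = "@ind_num_asc"      # numeric index order
    for (k in arr) print arr[k]
}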

** UPDATE 1 **
Revised the sort key to explicitly use the 4-digit year as a prefix, to circumvent year-end crossover issues.
Since the OP only wants to sort the date field, an exact epoch mapping isn't needed at all:
mawk '$++NF = 366 * (_ = (_ = ($4) % 100) + 1900 + 100 * (_ < 50)) \
            + _ * 10^8 + ($3) + (31) * \
              ((index(" JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC", toupper($2)) + 1) / 3 - 1)'
23284 SEP 30 2013 201300737036
23285 OCT 1 2013 201300737038
23287 OCT 3 2013 201300737040
23541 JUN 14 2014 201400737293
The 1st column is the original date-generation order (the correct rank ordering), the middle columns are the date (month, day, year), and the last column is the calculated sort-index value. I tested every date from Jan 1st 1950 to Dec 31st 2025, and this simplistic approach ranks them correctly, even though it doesn't bother to calculate exact Julian dates or exact leap years,
since the objective is merely a rank-ordering method that yields the same sorted output as exact epoch seconds.
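To see the arithmetic at work, the index appended to the first row above decomposes as:
2013 * 10^8             = 201300000000   (4-digit year prefix)
366 * 2013              =       736758   (year term)
31 * ((26 + 1)/3 - 1)   =          248   (month term; index of "SEP" in the string is 26)
day                     =           30
total                   = 201300737036   (the appended column)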

You're nearly there. Use sed, for example, to add the missing centuries; then the M option of the second KEYDEF works with GNU sort:
sed 's:/\([0-9][0-9]\)$:/20\1:' << 'HERE' |
f1 + f2 + f3 + NoV/30/15
f1 + f2 + f3 + Sep/30/2013
f1 + f2 + f3 + Oct/03/13
f1 + f2 + f3 + Jun/01/14
f1 + f2 + f3 + Oct/01/2013
f1 + f2 + f3 + mAr/11/11
f1 + f2 + f3 + oct/03/2013
f1 + f2 + f3 + juL/17/1998
HERE
LC_ALL=C sort -t '+' -k 4.9 -k 4.2M,4.4 -k 4.6,4.7
Output:
f1 + f2 + f3 + juL/17/1998
f1 + f2 + f3 + mAr/11/2011
f1 + f2 + f3 + Sep/30/2013
f1 + f2 + f3 + Oct/01/2013
f1 + f2 + f3 + Oct/03/2013
f1 + f2 + f3 + oct/03/2013
f1 + f2 + f3 + Jun/01/2014
f1 + f2 + f3 + NoV/30/2015
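For reference, the blank after the last '+' delimiter counts as character 1 of field 4, which is how the KEYDEF offsets line up:
# field 4     =  " Sep/30/2013"
# character   =   123456789...
# -k 4.9          primary key: the 4-digit year (chars 9-12)
# -k 4.2M,4.4     then chars 2-4 as a month name (M folds case, so "oct" and "Oct" compare equal)
# -k 4.6,4.7      then the day (chars 6-7)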

Related

Unix converting time format to integer value

I have the following text file.
Account1,2h 01m 00s
Account2,4h 25m 23s
Account3,5h 43m 59s
I wish to add up the hours, minutes and seconds in order to reduce each account's time to a total in minutes:
Account1 minute total = 121
Account2 minute total = 265
Account3 minute total = 343
I have the following in a bash file:
cat data.txt | cut -f2 -d','
This isolates the time values; however, from here I don't know what steps I would take to split out the hours, minutes and seconds, convert them to integers, and then convert the total to minutes. I have tried using a PARAM but to no avail.
If awk is an option, you can try this:
awk -F"[, ]" '{h=60; m=1; s=0.01666667}{split($2,a,/h/); split($3,b,/m/); split($4,c,/s/); print $1, "minute total = " int(a[1] * h + b[1] * m + c[1] * s)}' input_file
$ cat awk.script
BEGIN {
    FS = ",| "
}
{
    h = 60            # hours -> minutes
    m = 1
    s = 0.01666667    # roughly 1/60: seconds -> minutes
}
{
    split($2, a, /h/)
    split($3, b, /m/)
    split($4, c, /s/)
    print $1, "minute total = " int(a[1] * h + b[1] * m + c[1] * s)
}
Output:
awk -f awk.script input_file
Account1 minute total = 121
Account2 minute total = 265
Account3 minute total = 343
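If awk is not available, a pure-bash sketch (assuming the exact "Xh Ym Zs" layout shown above) does the same arithmetic; the 10# prefix stops values such as 09 being read as octal:
#!/bin/bash
# Sum h/m/s into whole minutes, in pure bash.
while IFS=, read -r account time; do
    read -r h m s <<< "$time"          # "2h 01m 00s" -> h=2h, m=01m, s=00s
    h=${h%h}; m=${m%m}; s=${s%s}       # strip the unit suffixes
    echo "$account minute total = $(( 10#$h * 60 + 10#$m + 10#$s / 60 ))"
done < data.txt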

How to align text in a file to look like a table in bash, based on a text pattern?

I have the following text
' 14411.7647 e0  - 2647.0588 e3  + 7352.9412 e12  + 14411.7647 e123  21828.2063'
'  - 2647.0588 e3  + 7352.9412 e12  7814.9002'
' 14411.7647 e0  + 14411.7647 e123  20381.3131'
' 14411.7647 e0  + 14411.7647 e123  20381.3131'
' 0.0000 e0  + 0.0000 e123  1.9293e-12'
' 14411.7647'
and I'd like to align it so it looks like a table, based on the eXXX terms. This could be an example output:
' 14411.7647 e0   - 2647.0588 e3   + 7352.9412 e12   + 14411.7647 e123   21828.2063'
'                 - 2647.0588 e3   + 7352.9412 e12                        7814.9002'
' 14411.7647 e0                                       + 14411.7647 e123   20381.3131'
' 14411.7647 e0                                       + 14411.7647 e123   20381.3131'
'     0.0000 e0                                       +     0.0000 e123   1.9293e-12'
'                                                                         14411.7647'
The most important part is to align the eXXX terms along with their coefficients.
UPDATE: the columns are originally separated by spaces. The output could be separated by tabs, for example.
UPDATE2: The first row indicates the total number of columns; no row has more columns than the first. The eXXX terms in the second and following rows may or may not be the same as in the first row, but you will never find more terms than in the first row, nor will they be out of order (i.e. e12 will always come after e3).
Can this be achieved using awk or similar?
$ cat tst.awk
BEGIN { OFS="\t" }
{
    # Get rid of all single quotes at the start/end of lines
    gsub(/^\047|\047$/,"")

    # Attach the +/- sign when present to the number to its right
    # to normalize how the fields are presented on each line.
    gsub(/\+ /,"+")
    gsub(/- /,"-")
}
NR==1 {
    # Consider each pair like "14411.7647 e0" to be one field with
    # "e0" as the key that determines the output order for that field
    # and "14411.7647" as the value associated with that key. Here
    # we create an array that remembers the order of the keys.
    for (i=1; i<=NF; i+=2) {
        key = $(i+1)
        fldNr2key[++numFlds] = key
    }
}
{
    # Populate an array that maps the key to its value
    delete key2val
    for (i=1; i<=NF; i+=2) {
        key = $(i+1)
        val = $i
        key2val[key] = val
    }

    # Print the values by the order of the keys
    out = ""
    for (fldNr=1; fldNr<=numFlds; fldNr++) {
        key = fldNr2key[fldNr]
        fld = ""
        if (key in key2val) {
            val = key2val[key]
            fld = val (key ~ /./ ? " " key : "")
            sub(/^[-+]/,"& ",fld)   # restore the blank after a leading +/-
        }
        out = out fld (fldNr<numFlds ? OFS : "")
    }
    print "\047 " out "\047"
}
Tab-separated output:
$ awk -f tst.awk file
' 14411.7647 e0 - 2647.0588 e3 + 7352.9412 e12 + 14411.7647 e123 21828.2063'
' - 2647.0588 e3 + 7352.9412 e12 7814.9002'
' 14411.7647 e0 + 14411.7647 e123 20381.3131'
' 14411.7647 e0 + 14411.7647 e123 20381.3131'
' 0.0000 e0 + 0.0000 e123 1.9293e-12'
' 14411.7647'
Visually tabular output (or use printfs with an appropriate width for each field in the script):
$ awk -f tst.awk file | column -s$'\t' -t
' 14411.7647 e0   - 2647.0588 e3   + 7352.9412 e12   + 14411.7647 e123   21828.2063'
'                 - 2647.0588 e3   + 7352.9412 e12                        7814.9002'
' 14411.7647 e0                                       + 14411.7647 e123   20381.3131'
' 14411.7647 e0                                       + 14411.7647 e123   20381.3131'
'     0.0000 e0                                       +     0.0000 e123   1.9293e-12'
'                                                                         14411.7647'
It looks like the fields can be split on runs of two or more spaces, so you can try FS = " *\047 *|  +". This way, the expected lines (based on NR==1) can be split into eXXX columns (from $2 to $(NF-2)), plus a regular column, if present, at $(NF-1); both $1 and $NF are always EMPTY.
$ cat t17.1.awk
BEGIN { FS = " *\047 *|  +"; OFS = "\t" }
# on the first line, set up the total N = NF,
# plus the keys and value lengths for the 'eXXX' cols,
# to sort and format the fields of all rows
NR == 1 {
    N = NF
    for (i = 2; i < N-1; i++) {
        n1 = split($i, a, " ")
        e_cols[i] = a[n1]
        e_lens[i] = length($i)
    }
    # the field length of the regular column, i.e. the non-eXXX col
    len_last = length($(NF-1))
}
{
    printf "\047 "
    # hash the e-key for fields '2' to 'NF-1';
    # include NF-1 in case the last regular column is missing
    for (i = 2; i < NF; i++) {
        n1 = split($i, a, " ")
        hash[a[n1]] = $i
    }
    # print the eXXX cols in the order seen at NR==1
    for (i = 2; i < N-1; i++) {
        printf("%*s%s", e_lens[i], hash[e_cols[i]], OFS)
    }
    # print the regular column at $(NF-1), or EMPTY if it is an eXXX col
    printf("%*s\047\n", len_last, match($(NF-1), / e[0-9]+$/) ? "" : $(NF-1))
    # reset the hash
    delete hash
}
Run the above script and you will get the following result. (Note: I appended one extra row to the input so that an eXXX column, + 14411.7647 e123, is the last thing on the line before the trailing '.)
$ awk -f t17.1.awk file.txt
' 14411.7647 e0   - 2647.0588 e3   + 7352.9412 e12   + 14411.7647 e123   21828.2063'
'                 - 2647.0588 e3   + 7352.9412 e12                        7814.9002'
' 14411.7647 e0                                       + 14411.7647 e123   20381.3131'
' 14411.7647 e0                                       + 14411.7647 e123   20381.3131'
'     0.0000 e0                                       +     0.0000 e123   1.9293e-12'
'                                                                         14411.7647'
'                                                     + 14411.7647 e123             '
Note:
you might need gawk for "%*s" to work in printf(); in case it's not working, try a fixed width, for example: printf("%18s%s", hash[e_cols[i]], OFS)
some of the values in the e-cols might be longer than the corresponding ones at NR==1; to fix this, you can manually specify an array of lengths, or just use a fixed number
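If your awk lacks %*s entirely, a portable trick is to splice the width into the format string at run time, e.g.:
printf("%" e_lens[i] "s%s", hash[e_cols[i]], OFS)    # concatenation builds "%18s%s" on the fly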

How does a loop work in awk? And how do we get matched data from two files?

I am trying to extract data from two files with a common column but I am unable to fetch the required data.
File1
A B C D E F G
Dec 3 abc 10 2B 21 OK
Dec 1 %xyZ 09 3F 09 NOK
Dec 5 mnp 89 R5 11 OK
File2
H I J K
abc 10 6.3 A9
xyz 00 0.2 2F
pqr 45 6.9 3c
I am able to output columns A B C D E F G, but unable to insert File2's columns between File1's columns.
My attempt:
awk 'FNR==1{next}
NR==FNR{a[$1]=$2; next}
{k=$3; sub(/^\%/,"",k)} k in a{print $1,$2,$3,$a[2,3,4],$4,$5,$6,$7; delete a[k]}
END{for(k in a) print k,a[k] > "unmatched"}' File2 File1 > matched
Required output:
matched:
A B I C J K D E F G
Dec 3 10 abc 6.3 A9 10 2B 21 OK
Dec 1 00 %xyZ 0.2 2F 09 3F 09 NOK
unmatched :
H I J K
pqr 45 6.9 3c
Could you please help me get this output? Thank you.
awk '
FNR == 1 { next }
FNR == NR {
    As[$3] = $0
    S3 = $3
    gsub(/%/, "", S3)
    ALs[tolower(S3)] = $3
    next
}
{
    Bs[tolower($1)] = $0
}
END {
    print "matched:"
    print "A B I C J K D E F G"
    for (B in Bs) {
        if (B in ALs) {
            split(As[ALs[B]] " " Bs[B], Fs)
            printf("%s %s %s %s %s %s %s %s %s %s\n",
                   Fs[1], Fs[2], Fs[9], Fs[3], Fs[10], Fs[11], Fs[4], Fs[5], Fs[6], Fs[7])
        }
    }
    print "unmatched :"
    print "H I J K"
    for (B in Bs) if (!(B in ALs)) print Bs[B]
}
' File1 File2
I added a constraint that was not defined in the question: the reference is matched ignoring case (%xyZ vs xyz).
Both files need to be kept in memory (arrays) to be treated at the END. The matching could be done at reading time; for ease of understanding I keep the output at the END level.
Your problem:
you mainly take a reference to the wrong file in your code (k=$3 is set while reading File2, but $3 is a field from File1's layout, ...)
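A single-pass sketch of that match-at-reading idea (headers left aside; File2 is loaded first, so only it has to fit in memory):
awk '
FNR == 1 { next }                        # skip both header lines
NR == FNR {                              # first file read: File2
    f2[tolower($1)] = $2 OFS $3 OFS $4   # store I J K, keyed by lower-cased H
    next
}
{                                        # second file read: File1
    k = tolower($3); sub(/^%/, "", k)    # normalize the join key (%xyZ -> xyz)
    if (k in f2) {
        split(f2[k], v, OFS)
        print $1, $2, v[1], $3, v[2], v[3], $4, $5, $6, $7
        delete f2[k]
    }
}
END {                                    # whatever is left was never matched
    for (k in f2) print k, f2[k] > "unmatched"
}' File2 File1 > matched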

Linux: how to use dynamic variables

I have made a shell script that does some calculations.
The user inputs 2 numbers:
the first number being the month (if the desired date is February 2010, for example, he puts in 2)
the second number being the year (if the desired date is February 2010, for example, he puts in 2010)
My script will then calculate the number of days that have passed from every day in January 2000 to the date the user has input, using the following code.
EDIT (had some stupid syntax errors in my code)
a=$(echo "(14-$1)/12" | bc)
y=$(echo "$2 + 4800 - $a" | bc)
m=$(echo "12 * $a - 3 + $1" | bc)
jdn=$(echo "dd + ((153 * $m +2)/5) + (365 * $y) + ($y/4) - ($y/100) + ($y/400) - 32045" | bc)
Because there are 31 days in a month (yes, in my script I will just assume every month has 31 days), the "dd" variable in the last line of code has to take 31 different values.
I wonder how to do this without copy-pasting the formula 31 times and changing the code each time.
It could be something like this, with the day in a for loop:
a=$(echo "(14 - $1) / 12" | bc)
y=$(echo "$2 + 4800 - $a" | bc)
m=$(echo "12 * $a - 3 + $1" | bc)
for dd in $(seq 1 31)
do
    jdn=$(echo "$dd + (153 * $m + 2) / 5 + 365 * $y + $y / 4 - $y / 100 + $y / 400 - 32045" | bc)
    echo "$jdn"
done
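As a sanity check of the formula (by hand, using bc's integer division), take March 1, 2000, i.e. mm=3, yyyy=2000, dd=1:
a   = (14 - 3) / 12 = 0
y   = 2000 + 4800 - 0 = 6800
m   = 12 * 0 - 3 + 3 = 0
jdn = 1 + (153*0 + 2)/5 + 365*6800 + 6800/4 - 6800/100 + 6800/400 - 32045
    = 1 + 0 + 2482000 + 1700 - 68 + 17 - 32045 = 2451605
which is indeed the Julian Day Number of 2000-03-01.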

Add a different value to each column in an array

How can I add a different value to each column in a bash script?
Example: Three functions f1(x), f2(x), f3(x) plotted over x
test.dat:
# x f1 f2 f3
1 0.1 0.01 0.001
2 0.2 0.02 0.002
3 0.3 0.03 0.003
Now I want to add a different offset value to each function:
values = 1 2 3
Desired result:
# x f1 f2 f3
1 1.1 2.01 3.001
2 1.2 2.02 3.002
3 1.3 2.03 3.003
So the first column should be unaffected; every other column gets its offset added.
I tried this, but it doesn't work:
declare -a energy_array=( 1 2 3 )
for (( i =0 ; i < ${#energy_array[#]} ; i ++ ))
do
local energy=${energy_array[${i}]}
cat "test.dat" \
| awk -v "offset=${energy}" \
'{ for(j=2; j<NF;j++) printf "%s",$j+offset OFS; if (NF) printf "%s",$NF; printf ORS} '
done
You can try the following:
declare -a energy_array=( 1 2 3 )
awk -v offset="${energy_array[*]}" '
    BEGIN { n = split(offset, a) }    # a[1]=1, a[2]=2, a[3]=3
    NR > 1 {
        for (j = 2; j <= NF; j++)
            $j = $j + a[j-1]          # field 2 gets a[1], field 3 gets a[2], ...
        print; next
    }
    1                                 # NR==1: print the header line unchanged
' test.dat
With output:
# x f1 f2 f3
1 1.1 2.01 3.001
2 1.2 2.02 3.002
3 1.3 2.03 3.003
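Why it works: "${energy_array[*]}" flattens the bash array into the single string "1 2 3", which split() rebuilds as the awk array a, so a[j-1] pairs offset 1 with field 2, offset 2 with field 3, and so on. The bare 1 after the block is the usual awk shorthand for "print the line", which is what lets the NR==1 header through unchanged. A quick way to see the mapping (a throwaway one-liner, not part of the answer):
$ awk -v offset="1 2 3" 'BEGIN { n = split(offset, a); for (j = 2; j <= n+1; j++) print "field " j " gets +" a[j-1] }'
field 2 gets +1
field 3 gets +2
field 4 gets +3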
