awk how to find first available date field? - bash

Fields 1,2,3,4 are date fields yyyy-mm-dd.
Delimited by ";"
"-" if no date.
Field 4 will always have a date.
Examples:
-; 2016-08-19; 2016-08-19; 2018-07-17; Beach-Rangiroa.jpg
-; -; -; 2018-09-12; MV3_0034-copy.webp
2016-12-10; 2016-12-10; 2016-12-20; 2018-07-18; Sukhothai-61.jpg
-; -; -; 2018-07-19; Gdu9Rwhu6W3Q5W6q_1Qag.jpg
Objective: use awk to print the 1st available date, checking fields 1,2,3,4 in order.
I've tried this:
awk -F";" '{if ($1!="-") print $1; else if ($2!="-") print $2; else if ($3!="-") print $3; else if ($4!="-") print $4}'
Results:
2016-08-19
-
-
bash version 4.3.48
I am trying to achieve this, e.g. for line 1 in the example:
2016-08-19; Beach-Rangiroa.jpg
echo '-; -; -; 2018-07-15; Stock-Photo-114398301.webp; WEBP; image/webp; 2000; 1333' | \
awk -F';' 'OFS=";" {for(i=1; i<5; ++i) { if ($i ~ /[0-9]{4}-[0-9]{,2}-[0-9]{,2}/) { print $i,$5,$6,$7,$8,$9; next; }}}'
Result:
2018-07-15; Stock-Photo-114398301.webp; WEBP; image/webp; 2000; 1333
This works nicely, except for the leading space on the date. Also, is there a method available to verify the date, e.g. date -d "%Y-%m-%d"?
Thank you.

This is a GNU-only gawk solution using FPAT:
awk 'BEGIN{FPAT="[0-9]{4}-[0-9]{,2}-[0-9]{,2}"}{print $1}' file1
2016-08-19
2018-09-12
2018-07-19
With FPAT you tell gawk what to treat as a field, in this case a whole regex. If the input line also contains a second date, it appears as $2; $NF returns the last date field of each line, NF the total number of date fields, and so on.
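For instance, to see how many dates each line carries and print the first and last of them (same FPAT, written with an explicit {1,2} interval):
awk 'BEGIN{FPAT="[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}"}{print NF, $1, $NF}' file1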

You can loop over the field numbers with a variable:
awk -F\; '{for(i=1; i<5; ++i) { if ($i ~ /[0-9]/) { print $i; next; }}}' in
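If you also want the filename after the date, as in the stated objective, a small extension of the same loop (a sketch, assuming the separator is "; " and the filename is always field 5) is:
awk -F'; ' '{for(i=1; i<5; ++i) if ($i ~ /[0-9]/) { print $i "; " $5; next }}' in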

Solution without awk:
You said you wanted the 1st available date. When you only want one line of output, you can use
grep -Eo "[0-9]{4}-[0-9]{2}-[0-9]{2}" inputfile | head -1
When you want the first date for each line, change the grep or use sed:
grep -Eo "[0-9]{4}-[0-9]{2}-[0-9]{2}.*" inputfile | cut -d';' -f1
# or
sed -r 's/([0-9]{4}-[0-9]{2}-[0-9]{2}).*/\1/; s/.*([0-9]{4}-[0-9]{2}-[0-9]{2})/\1/' inputfile
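As for the validation part of the question: GNU date (assumed available here, given the bash 4.3 environment) exits non-zero on an invalid date, so a rough check, not bulletproof, is:
d="2018-07-17"
date -d "$d" >/dev/null 2>&1 && echo "valid" || echo "invalid"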

Thank you all for your help.
I think this accomplishes the objective:
echo '-; -; -; 2018-07-25; Redwood-Forest-Sequoia-4.jpg; JPEG; image/jpeg; 1280; 720' | \
awk -F'; ' 'OFS="; " {for(i=1; i<5; ++i) { if ($i ~ /[0-9]{4}-[0-9]{,2}-[0-9]{,2}/) { print $i,$5,$6,$7,$8,$9; next; }}}'
Result:
2018-07-25; Redwood-Forest-Sequoia-4.jpg; JPEG; image/jpeg; 1280; 720
Best regards.

Related

BASH How to get minimum value from each row

I have a csv file like this:
-0.106992, -0.106992, -0.059528, -0.059528, -0.028184, -0.028184, 0.017793, 0.017793, 0.0, 0.220367
-0.094557, -0.094557, -0.063707, -0.063707, -0.020796, -0.020796, 0.003707, 0.003707, 0.200767, 0.200767
-0.106038, -0.106038, -0.056540, -0.056540, -0.015119, -0.015119, 0.032954, 0.032954, 0.237774, 0.237774
-0.049499, -0.049499, -0.006934, -0.006934, 0.026562, 0.026562, 0.067442, 0.067442, 0.260149, 0.260149
-0.081001, -0.081001, -0.039581, -0.039581, -0.008817, -0.008817, 0.029912, 0.029912, 0.222084, 0.222084
-0.046782, -0.046782, -0.000180, -0.000180, 0.030788, 0.030788, 0.075928, 0.075928, 0.266452, 0.266452
-0.082107, -0.082107, -0.026791, -0.026791, 0.001874, 0.001874, 0.052341, 0.052341, 0.249779, 0.249779
I want to get the minimum value from each row.
Expected output must be:
-0.106992
-0.094557
-0.106038
-0.049499
-0.081001
-0.046782
-0.082107
I tried to get it with awk, but it doesn't give the minimum values:
awk command:
awk '{m=$1; for (i=2; i<=NF; i++) if ($i < m) m = $i; print m}' file_name
output:
-0.028184,
-0.020796,
-0.015119,
-0.006934,
-0.008817,
-0.000180,
-0.026791,
Perl makes short work of this:
perl -MList::Util=min -F', ' -E 'say min @F' file.csv
-0.106992
-0.094557
-0.106038
-0.049499
-0.081001
-0.046782
-0.082107
Using any awk in any shell on every Unix box, whether you have blanks after each comma or not:
$ awk -F', *' '{min=$1; for (i=2;i<=NF;i++) if ($i<min) min=$i; print min}' file
-0.106992
-0.094557
-0.106038
-0.049499
-0.081001
-0.046782
-0.082107
with ruby :-D
ruby -F', ' -ane 'puts $F.map(&:to_f).min' file.csv
Your code is correct:
awk '{m=$1; for (i=2; i<=NF; i++) if ($i < m) m = $i; print m}' file_name
Except that you must add a comma to the field separator:
awk -F '[[:blank:],]' '{m=$1; for (i=2; i<=NF; i++) if ($i < m) m = $i; print m}' file_name
[[:blank:],] is spaces, tabs, and commas.
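The reason the original attempt printed the wrong values: with the default field separator the commas stay attached to the fields, so awk falls back to comparing them as strings, and lexically "-0.028184," sorts before "-0.106992,". A quick demonstration:
awk 'BEGIN { print ("-0.106992," < "-0.028184,") }'   # 0: as strings, "1" sorts after "0"
awk 'BEGIN { print (-0.106992 < -0.028184) }'         # 1: as numbers it is smaller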

AWK to display a column based on Column name and remove header and last delimiter

Id,responseId,name,test1,test2,bcid,stype
213,A_123456,abc,test,zzz,987654321,alpha
412,A_234566,xyz,test,xxx,897564322,gama
125,A_456314,ttt,qa,yyy,786950473,delta
222,A_243445,hds,test,fff,643528290,alpha
456,A_466875,sed,test,hhh,543819101,beta
I want to extract the columns responseId and bcid from the above. I found an answer which comes really close:
awk -F ',' -v cols=responseID,bcid '(NR==1){n=split(cols,cs,",");for(c=1;c<=n;c++){for(i=1;i<=NF;i++)if($(i)==cs[c])ci[c]=i}}{for(i=1;i<=n;i++)printf "%s" FS,$(ci[i]);printf "\n"}' <file_name>
however, it prints a trailing "," and also the header, as shown below.
responseId,bcid,
A_123456,987654321,
A_234566,897564322,
A_456314,786950473,
A_243445,643528290,
A_466875,543819101,
How can I make it not print the header and the "," after bcid?
Input
$ cat infile
Id,responseId,name,test1,test2,bcid,stype
213, A_123456, abc, test, zzz, 987654321, alpha
412, A_234566, xyz, test, xxx, 897564322, gama
125, A_456314, ttt, qa, yyy, 786950473, delta
222, A_243445, hds, test, fff, 643528290, alpha
456, A_466875, sed, test, hhh, 543819101, beta
Script
$ cat byname.awk
FNR==1{
  # map each requested header name to its column number
  split(header,h,/,/);
  for(i=1; i in h; i++)
  {
    for(j=1; j<=NF; j++)
    {
      if(tolower(h[i])==tolower($j)){ d[i]=j; break }
    }
  }
  next
}
{
  # print the requested columns in order, separated by OFS
  for(i=1; i in h; i++)
    printf("%s%s", i>1 ? OFS : "", i in d ? $(d[i]) : "");
  print "";
}
How to execute?
$ awk -v FS=, -v OFS=, -v header="responseID,bcid" -f byname.awk infile
A_123456, 987654321
A_234566, 897564322
A_456314, 786950473
A_243445, 643528290
A_466875, 543819101
One-liner
$ awk -v FS=, -v OFS=, -v header="responseID,bcid" 'FNR==1{split(header,h,/,/);for(i=1; i in h; i++){for(j=1; j<=NF; j++){if(tolower(h[i])==tolower($j)){ d[i]=j; break }}}next}{for(i=1; i in h; i++)printf("%s%s",i>1 ? OFS:"", i in d ?$(d[i]):"");print "";}' infile
A_123456, 987654321
A_234566, 897564322
A_456314, 786950473
A_243445, 643528290
A_466875, 543819101
Try:
awk '{NR==1?FS=",":FS=", ";$0=$0} {print $2 OFS $(NF-1)}' OFS=, Input_file
This checks whether the line is the 1st line, setting the delimiter to "," for it and to ", " for all other lines; $0=$0 then re-splits the record with the new separator. It prints the 2nd field and the 2nd-from-last field, with OFS (the output field separator) set to ",".
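Note that this still prints the header line; to drop it as asked, a minimal variant (assuming the input is uniformly comma-separated, as in the question's sample) is to skip the first record:
awk -F, 'NR>1 { print $2 OFS $(NF-1) }' OFS=, Input_file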

awk print something if column is empty

I am trying out a script in which a file [ file.txt ] has many columns, like:
abc|pqr|lmn|123
pqr|xzy|321|azy
lee|cha| |325
xyz| |abc|123
I would like to get the column list in a bash script using the awk command; if a column is empty it should print "blank", else print the column value.
I have tried the possibilities below, but they are not working:
cat file.txt | awk -F "|" {'print $2'} | sed -e 's/^$/blank/' // Using awk and sed
cat file.txt | awk -F "|" '!$2 {print "blank"} '
cat file.txt | awk -F "|" '{if ($2 =="" ) print "blank" } '
Please let me know how we can do that using awk or any other bash tools.
Thanks
I think what you're looking for is
awk -F '|' '{print match($2, /[^ ]/) ? $2 : "blank"}' file.txt
match(str, regex) returns the position in str of the first match of regex, or 0 if there is no match. So in this case, it will return a non-zero value if there is some non-blank character in field 2. Note that in awk, the index of the first character in a string is 1, not 0.
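Two throwaway checks of that return value:
awk 'BEGIN { print match("  x", /[^ ]/) }'   # prints 3: the first non-blank is the 3rd character
awk 'BEGIN { print match("   ", /[^ ]/) }'   # prints 0: no non-blank at all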
Here, I'm assuming that you're interested only in a single column.
If you wanted to be able to specify the replacement string from a bash variable, the best solution would be to pass the bash variable into the awk program using the -v switch:
awk -F '|' -v blank="$replacement" \
'{print match($2, /[^ ]/) ? $2 : blank}' file.txt
This mechanism avoids problems with escaping metacharacters.
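For example, with a hypothetical replacement string:
replacement="(empty)"
awk -F '|' -v blank="$replacement" '{print match($2, /[^ ]/) ? $2 : blank}' file.txt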
You can do it using this sed script:
sed -r 's/\| +\|/\|blank\|/g' File
abc|pqr|lmn|123
pqr|xzy|321|azy
lee|cha|blank|325
xyz|blank|abc|123
If you don't want the |:
sed -r 's/\| +\|/\|blank\|/g; s/\|/ /g' File
abc pqr lmn 123
pqr xzy 321 azy
lee cha blank 325
xyz blank abc 123
Or with awk:
awk '{gsub(/\| +\|/,"|blank|")}1' File
abc|pqr|lmn|123
pqr|xzy|321|azy
lee|cha|blank|325
xyz|blank|abc|123
You can use awk like this:
awk 'BEGIN{FS=OFS="|"} {for (i=1; i<=NF; i++) if ($i ~ /^ *$/) $i="blank"} 1' file
abc|pqr|lmn|123
pqr|xzy|321|azy
lee|cha|blank|325
xyz|blank|abc|123

unix time to date and replace in bash using awk

I am trying to convert unix time to date and time:
1436876820 blah1 stop none john
1436876820 blah0 continu none john
1436876821 blah2 stop good bob
I would like to convert the first column into two columns, date and time, as below:
14-07-15 13:27:00 blah1 stop none john
14-07-15 13:27:00 blah0 continu none john
14-07-15 13:27:01 blah2 stop good bob
etc.
So I have started to do the following.
IN="${1}"
for i in $(awk '{print $1}' ${IN});
do
DD=$(date -d #${i} +'%d-%m-%Y %H:%M:%S')
awk '{ ${1}="'"${DD}"'" }' < ${IN}
done
This does not work due to the awk syntax and gives this error:
awk: { ${1}="14-07-2015 13:27" }
awk: ^ syntax error
I could use sed instead of awk:
sed "s/^1........./${DD}/" ${IN}
Any help with awk is really welcome.
Al.
Get rid of the shell loop and just do it in one awk invocation:
awk '{
cmd = "date -d #" $1 " +\"%d-%m-%Y %H:%M:%S\""
if ( (cmd | getline dd) > 0 ) {
$1 = dd
}
close(cmd)
print
}' "$1"
If you have GNU awk you can just use its internal strftime() instead of the date+getline:
awk '{
$1 = strftime("%d-%m-%Y %H:%M:%S",$1)
print
}' "$1"

Edit text format with shell script

I am trying to make a script for text editing. In this case I have a text file named text.csv, which reads:
first;48548a;48954a,48594B
second;58757a;5875b
third;58756a;58576b;5867d;56894d;45864a
I want to transform the text into this:
first;48548a
first;48954a
first;48594B
second;58757a
second;5875b
third;58756a
third;58576b
third;5867d
third;56894d
third;45864a
What command should I use to make this happen?
I'd do this in awk.
Assuming your first line should have a ; instead of a ,:
$ awk -F\; '{for(n=2; n<=NF; n++) { printf("%s;%s\n",$1,$n); }}' input.txt
Untested.
Here is a pure bash solution that handles both , and ;.
while IFS=';,' read -a data; do
    id="${data[0]}"
    data=("${data[@]:1}")
    for item in "${data[@]}"; do
        printf '%s;%s\n' "$id" "$item"
    done
done < input.txt
UPDATED - alternate printing method based on chepner's suggestion:
while IFS=';,' read -a data; do
    id="${data[0]}"
    data=("${data[@]:1}")
    printf "$id;%s\n" "${data[@]}"
done < input.txt
awk -v FS=';' -v OFS=';' '{for (i = 2; i <= NF; ++i) { print $1, $i }}'
Explanation: awk implicitly splits input into records (by default separated by newlines, i.e. line == record), which are then split into numbered fields by the given field separator (FS for input, OFS for output).
For each record this script prints the first field (the record name) along with the i-th field, which is exactly what you need.
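If the comma on the first line is intentional rather than a typo, the same idea works with a character class as the field separator (splitting on either ";" or ",", mirroring the IFS=';,' trick in the bash solution above):
awk -v FS='[;,]' -v OFS=';' '{for (i = 2; i <= NF; ++i) print $1, $i}' text.csv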
Thanks all for your suggestions :D. It really gave me new knowledge.
