Shell scripting: validating a single awk variable

{gsub(/[ \t]+$/, "", $4); length($4) < 9 || length($4) > 12 } {print $4$1} {print length($4)} { fails4++ }
So I have this portion above that is supposed to validate the 4th field ($4): if the length is less than 9 or greater than 11 characters, it is supposed to fail the validation. Yet even after I print the length and get 11 characters, and even when I set the validation bound to greater than 12, it is still failing.
What I am trying to do is correctly account for the length of the field: if there are any white spaces in the $4 field, it should trim them, get the length, and fail if it is less than 9 or greater than 11 characters.
length($4) < 9 || length($4) > 11 {print $4$1} {print length($4)} { fails4++ }

It sounds like you want:
{gsub(/^[[:space:]]+|[[:space:]]+$/, "", $4); lgth=length($4)} lgth < 9 || lgth > 11{print $4 $1, lgth; fails4++}
If not, post some sample input and expected output.
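Hypothetically, with comma-separated input where field 4 carries stray spaces (the question doesn't include sample data, so these lines and the -F',' separator are made up for illustration), the suggested one-liner behaves like this:

```shell
# Made-up comma-separated input; field 4 is deliberately padded with spaces.
printf 'a,b,c, 123456789012 \nd,e,f,12345678\ng,h,i,1234567890\n' |
awk -F',' '
{
    gsub(/^[[:space:]]+|[[:space:]]+$/, "", $4)  # trim whitespace around $4
    lgth = length($4)                            # length of the trimmed field
}
lgth < 9 || lgth > 11 { print $4 $1, lgth; fails4++ }
END { print "failures:", fails4 + 0 }            # + 0 in case nothing failed
'
```

Line 1 fails because the trimmed length is 12, line 2 because it is 8; line 3 (length 10) passes, so the counter ends at 2.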


Using one awk output into another awk command

I have one file (an Excel file) which has some columns (not fixed; it changes dynamically), and I need to get the values for a couple of particular columns. I'm able to get the column numbers using one awk command and then print rows using these column numbers in another awk command. Is there any way I can combine them into one?
awk -F',' ' {for(i=1;i < 9;i++) {if($i ~ /CLIENT_ID/) {print i}}} {for(s=1;s < 2;s++) {if($s ~ /SEC_DESC/) {print s}}} ' <file.csv> | awk -F "," '!($5~/...[0-9]L/ && $21~/FUT /) {print $0}' <file.csv>
Gives me output as 5 and 9 for the columns (CLIENT_ID and SEC_DESC), which is their column number (this changes with different files).
Now using this column number, I get the desired output as follows:
awk -F "," '!($5~/...[0-9]L/ && $21~/FUT /) {print $0}' <file.csv>
How can I combine these into one command? Pass a variable from the first to the second?
Input (a CSV file with varying, dynamic columns; I'm interested in the following two):
CLIENT_ID SEC_DESC
USZ256 FUT DEC 16 U.S.
USZ256L FUT DEC 16 U.S. BONDS
WNZ256 FUT DEC 16 CBX
WNZ256L FUT DEC 16 CBX BONDS
The output gives me rows 2 and 4, which matched the regex pattern in the second awk command (using column numbers 5 and 21). These column numbers change per file, so I first have to get the column numbers using the first awk and then give them as input to the second awk.
I think I got it.
awk -F',' '
NR == 1 {
    for (i = 1; i <= NF; ++i) {
        if ($i == "CLIENT_ID") cl_col = i
        if ($i == "SEC_DESC") sec_col = i
    }
}
NR > 1 && !($cl_col ~ /...[0-9]L/ && $sec_col ~ /FUT /) { print $0 }
' RED_FUT_TST.csv
To solve your problem, test whether you're processing the first row and put the column-discovery logic there. Then, when processing the data rows, use the column numbers found in that first step.
(NR is an awk built-in variable containing the record number being processed. NF is the number of columns.)
E.g.:
$ cat red.awk
NR == 1 {
    for (i = 1; i <= NF; ++i) {
        if ($i == "CLIENT_ID") cl_col = i;
        if ($i == "SEC_DESC") sec_col = i;
    }
}
NR > 1 && $cl_col ~ /...[0-9]L/ && $sec_col ~ /FUT /
$ awk -F'\t' -f red.awk RED_FUT_TST.csv
USZ256L FUT DEC 16 U.S. BONDS
WNZ256L FUT DEC 16 CBX BONDS

Conditional replacement of field value in csv file

I am converting a lot of CSV files with bash scripts. They all have the same structure and the same header names. The values in the columns are variable of course. Col4 is always an integer.
Source file:
Col1;Col2;Col3;Col4
Name1;Street1;City1;2
Name2;Street2;City2;12
Name3;Street3;City3;15
Name4;Street4;City4;10
Name5;Street5;City5;3
Now when Col4 contains a certain value, for example "10", the value has to be changed to "10 pcs" and the complete line has to be duplicated.
For every 5 pcs one line.
So you could say that the number of duplicates is the value of Col4 divided by 5 and then rounded up.
So if Col4 = 10 I need 2 duplicates and if Col4 = 12, I need 3 duplicates.
Result file:
Col1;Col2;Col3;Col4
Name1;Street1;City1;2
Name2;Street2;City2;... of 12
Name2;Street2;City2;... of 12
Name2;Street2;City2;... of 12
Name3;Street3;City3;... of 15
Name3;Street3;City3;... of 15
Name3;Street3;City3;... of 15
Name4;Street4;City4;... of 10
Name4;Street4;City4;... of 10
Name5;Street5;City5;3
Can anyone help me put this in a script? Something with bash, sed, or awk; these are the languages I'm familiar with, although I'm interested in other solutions too.
Here is the awk code assuming that the input is in a file called /tmp/input
awk -F\; '$4 < 5 {print}; $4 > 5 {for (i = 0; i < ($4/5); i++) printf "%s;%s;%s;...of %s\n",$1,$2,$3,$4}' /tmp/input
Explanation:
There are two rules.
The first rule prints any rows where $4 is less than 5. This will also print the header:
$4 < 5 {print}
The second rule prints if $4 is greater than 5. The loop runs $4/5 times, rounded up (i stays strictly below the fractional bound):
$4 > 5 {for (i=0; i< ($4/5); i++) printf "%s;%s;%s;...of %s\n",$1,$2,$3,$4}
Output:
Col1;Col2;Col3;Col4
Name1;Street1;City1;2
Name2;Street2;City2;...of 12
Name2;Street2;City2;...of 12
Name2;Street2;City2;...of 12
Name3;Street3;City3;...of 15
Name3;Street3;City3;...of 15
Name3;Street3;City3;...of 15
Name4;Street4;City4;...of 10
Name4;Street4;City4;...of 10
Name5;Street5;City5;3
The code does not handle the case where $4 == 5. You can handle that by adding a third rule; I did not add that, but I think you get the idea.
Thanks Jay! This was just what I needed.
Here is the final awk code I'm using now:
awk -F\; '$4 == "Col4" {print}; $4 < 5 {print}; $4 == 5 {print}; $4 > 5 {for (i = 0; i < ($4/5); i++) printf "%s;%s;%s;...of %s\n",$1,$2,$3,$4}' /tmp/input
I added the rule below to print the header, because it wasn't printed
$4 == "Col4" {print}
I added the rule below to print the lines where the value is equal to 5
$4 == 5 {print}
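For what it's worth, the three patch rules can also be folded into two (a sketch along the same lines, with the sample rows piped in rather than read from /tmp/input): NR == 1 keeps the header, $4 <= 5 keeps small rows unchanged, and the loop duplicates the rest ceil($4/5) times because i stays below the fractional bound.

```shell
# Sample rows from the question; header and rows with Col4 <= 5 pass through,
# everything else is duplicated once per started batch of 5.
printf '%s\n' 'Col1;Col2;Col3;Col4' 'Name1;Street1;City1;2' \
    'Name2;Street2;City2;12' 'Name3;Street3;City3;15' \
    'Name4;Street4;City4;10' 'Name5;Street5;City5;3' |
awk -F';' 'NR == 1 || $4 <= 5 { print; next }
           { for (i = 0; i < $4 / 5; i++) printf "%s;%s;%s;...of %s\n", $1, $2, $3, $4 }'
```

This reproduces the result file above: 3 copies for 12 and 15, 2 copies for 10, and single lines for the header, 2, and 3.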

making awk and for statement smarter

I have the following commands (below), which I would like to make a bit smarter in two respects.
Can the for statement be made shorter? Something like:
for i in `seq 1 22` X;
Would that work?
And can the awk statement be made a bit smarter too? Something like:
awk '{print $1,$2,'$i',$4-$10,$12-$21}'
But that would subtract the value of column 10 from column 4, and column 21 from column 12. I want it to print columns 4 through 10, and so on. How do I do that?
Thanks a lot!
Sander
Original commands are below
grep 'alternate_ids' 1000g/aegscombo_pp_1000G_sum_stat_chrX.out > 1000g/aegscombo_pp_1000G_sum_stat_allchr.txt
for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X;
do
echo "Grepping data for chromosome: "$i
tail -n +13 1000g/aegscombo_pp_1000G_sum_stat_chr$i.out | wc -l
tail -n +13 1000g/aegscombo_pp_1000G_sum_stat_chr$i.out |
awk '{print $1,$2,'$i',$4,$5,$6,$7,$8,$9,$10,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21}' \
>> 1000g/aegscombo_pp_1000G_sum_stat_allchr.txt
done
for i in {1..22} X; do
If the number of fields to not print is smaller than the number of fields to print, you could try emptying the fields you want to ignore and then printing the whole line.
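For the "print columns 4 through 10" part, a loop over a field range keeps it concise (a sketch with a made-up 11-column line, not the actual summary-statistics files above):

```shell
# Hypothetical 11-column line: print only columns 4 through 10,
# joined by OFS, with a newline (ORS) after the last one.
echo 'c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11' |
awk '{ for (i = 4; i <= 10; i++) printf "%s%s", $i, (i < 10 ? OFS : ORS) }'
```

The same pattern extends to a second range (12 through 21) by switching the separator to ORS only after the final field of the last range.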
Any time you write a loop in shell just to manipulate text you have the wrong approach. The shell is just an environment from which to call tools and the UNIX tool for general purpose text processing is awk. Your script should look something like this:
awk '
BEGIN {
for (i=1; i<=22; i++) {
ARGV[ARGC++] = "1000g/aegscombo_pp_1000G_sum_stat_chr" i ".out"
}
ARGV[ARGC++] = "1000g/aegscombo_pp_1000G_sum_stat_chrX.out"
}
NR == FNR {
if (/alternate_ids/) {
print
}
next
}
FNR == 1{
chr = FILENAME
gsub(/^.*chr|\.out$/,"",chr)
print "Grepping data for chromosome:", chr | "cat>&2"
}
{
for (i=1; i<=21; i++) {
printf "%s%s", (i==3?chr:$i), (i<21?OFS:ORS)
}
}
' 1000g/aegscombo_pp_1000G_sum_stat_chrX.out > 1000g/aegscombo_pp_1000G_sum_stat_allchr.txt

Bash First Element in List Recognition

I'm very new to Bash, so I'm sorry if this question is actually very simple. I am dealing with a text file that contains many vertical lists of numbers 2-32, counting up by 2, and each number is followed by a line of other text. The problem is that some of the lists are missing numbers. Any pointers for code that could go through, check whether each number is there, and if not, add a line with the missing number?
One list might look like:
2 djhfbadsljfhdsalkfjads;lfkjs
4 dfhadslkfjhasdlkfjhdsalfkjsahf
6 dsa;fghds;lfhsdalfkjhds;fjdsklj
8 daflgkdsakfjhasdlkjhfasdjkhf
12 dlsagflakdjshgflksdhflksdahfl
All the way down to 32. How would I in this case make it so the 10 is recognized as missing and then added in above the 12? Thanks!
Here's one awk-based solution (formatted for readability, not necessarily how you would type it):
awk ' { value[0 + $1] = $2 }
END { for (i = 2; i < 34; i+=2)
print i, value[i]
}' input.txt
It basically just records the existing lines in a key/value pair (associative array), then at the end, prints all the records you care about, along with the (possibly empty) value saved earlier.
Note: if the first column needs to be seen as a string instead of an integer, this variant should work:
awk ' { value[$1] = $2 }
END { for (i = 2; i < 34; i+=2)
print i, value[i ""]
}' input.txt
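Running the first variant on a made-up sample (shortened garbage strings, with 10 and everything after 12 missing) shows the behaviour: missing keys print the number followed by an empty value, so those lines carry a trailing space from print's OFS.

```shell
# Sample list missing 10 and 14..32; missing entries come out as bare numbers.
printf '%s\n' '2 aa' '4 bb' '6 cc' '8 dd' '12 ee' |
awk '{ value[0 + $1] = $2 }
END { for (i = 2; i < 34; i += 2) print i, value[i] }'
```

This always emits exactly 16 lines (2 through 32), filled in where the input had data.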
You can use awk to figure out the missing line and add it back:
awk '$1==NR*2{i=NR*2+2} i < $1 { while (i<$1) {print i; i+=2} i+=2}
END{for (; i<=32; i+=2) print i} 1' file
Testing:
cat file
2 djhfbadsljfhdsalkfjads;lfkjs
4 dfhadslkfjhasdlkfjhdsalfkjsahf
6 dsa;fghds;lfhsdalfkjhds;fjdsklj
20 daflgkdsakfjhasdlkjhfasdjkhf
24 dlsagflakdjshgflksdhflksdahfl
awk '$1==NR*2{i=NR*2+2} i < $1 { while (i<$1) {print i; i+=2} i+=2}
END{for (; i<=32; i+=2) print i} 1' file
2 djhfbadsljfhdsalkfjads;lfkjs
4 dfhadslkfjhasdlkfjhdsalfkjsahf
6 dsa;fghds;lfhsdalfkjhds;fjdsklj
8
10
12
14
16
18
20 daflgkdsakfjhasdlkjhfasdjkhf
22
24 dlsagflakdjshgflksdhflksdahfl
26
28
30
32

from xyz to matrix with awk

I have a problem that I managed to solve with a workaround, so I am here hoping to learn more elegant solutions from you ;-)
I have to parse the output of a program: it writes a file of three columns x y z like this
1 1 11
1 2 12
1 3 13
1 4 14
2 1 21
2 2 22
2 3 23
2 4 24
3 1 31
3 2 32
3 3 33
3 4 34
4 1 41
4 2 42
4 3 43
4 4 44
in a matrix like this
11 12 13 14
21 22 23 24
31 32 33 34
41 42 43 44
I solved with a two line bash script like this
dim_matrix=$(awk 'END{print sqrt(NR)}' file_xyz) # since I know that the matrix has to be square and there are no blank lines in file_xyz
awk '{printf("%s%s",$3, !(NR%'${dim_matrix}'==0) ? OFS :ORS ) }' file_xyz
Can you please suggest me a way to perform the same only with awk?
awk does not do real multidimensional arrays, but you can fake it with a properly constructed string:
awk '
{mx[$1 "," $2] = $3}
END {
size=sqrt(NR)
for (x=1; x<=size; x++) {
for (y=1; y<=size; y++)
printf("%s ",mx[x "," y])
print ""
}
}
' filename
You can accomplish your example with a single awk call and a call to wc
awk -v "nlines=$(wc -l < filename)" '
BEGIN {size = sqrt(nlines)}
{printf("%s%s", $3, (NR % size == 0 ? ORS : OFS))
}' filename
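To see it work end to end, the sample triples can be generated on the fly (a temporary file stands in for the program's output file, since the line count and the data are read separately):

```shell
# Generate the 16 sample x y z triples, then reshape them into a 4x4 matrix
# using the single-awk-plus-wc approach above.
tmp=$(mktemp)
awk 'BEGIN { for (x = 1; x <= 4; x++) for (y = 1; y <= 4; y++) print x, y, x y }' > "$tmp"
awk -v "nlines=$(wc -l < "$tmp")" '
    BEGIN { size = sqrt(nlines) }
    { printf("%s%s", $3, (NR % size == 0 ? ORS : OFS)) }' "$tmp"
rm -f "$tmp"
```

This prints the expected square matrix, one row per group of size values.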
A "not so" readable version:
awk '($0=$NF x)&&ORS=NR%4?FS:RS' infile
Parameters added as per OP's request:
awk '
($0 = $NF x) && ORS = NR % n ? FS : RS
' n="$1" infile
In the script above I'm using $1, but you can use any shell variable.
The explanation follows:
$0 = $NF - set $0 (the entire current input record) to the current value of the last field ($NF).
ORS = NR % n ? FS : RS - using the ternary operator (expression ? return_this_if_true : return_this_otherwise), set the OutputRecordSeparator: when NR % n evaluates true (i.e. returns a value different from 0), set ORS to the current value of FS (the FieldSeparator - runs of whitespace characters by default); otherwise set it to RS (which defaults to a newline).
The x (an uninitialized variable, and thus a NULL string when used in concatenation) is needed in order to handle the output correctly when the last field is 0 (or an empty string). This is because the assignment statement in awk returns the assigned value: if $NF is 0, the condition would be false and the rest of the && boolean statement would be ignored.
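The role of x can be seen with a quick experiment on made-up input whose first record ends in 0 (parenthesized for clarity, with n fixed at 2):

```shell
# With the concatenated x, a last field of 0 still yields a non-empty string,
# so the pattern stays true and the record is printed:
printf '1 1 0\n1 2 12\n' | awk '($0 = $NF x) && (ORS = NR % 2 ? FS : RS)'
# Without x, $0 = $NF returns the numeric value 0 for the first record,
# the && short-circuits, and that record is silently dropped:
printf '1 1 0\n1 2 12\n' | awk '($0 = $NF) && (ORS = NR % 2 ? FS : RS)'
```

The first command prints both values ("0 12"); the second loses the 0 and prints only "12".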
I am not totally sure what you're trying to do; try this:
awk 'NR%4==0{print s " " $NF;s="";next}{s=s?s " " $NF:$NF}' file1
