find numbers divisible by 3 in csv file using shell script - bash

I have a CSV file with content like this:
1|2|3
4|5|6
7|8|9
Now I would like to find the numbers that are divisible by 3 using a shell script.
I would like to use awk for this. I am learning shell scripting, so could you please help me find a solution?

awk -F'|' '{for(i=1;i<=NF;i++)if(!($i%3))print $i}' file
This awk one-liner should do it.
With your example, the command outputs:
3
6
9
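One caveat with the one-liner above: awk evaluates a non-numeric or empty field as 0, and 0 % 3 is 0, so stray text would be printed too. A minimal sketch that only accepts integer fields, assuming the same |-separated input (the function name divisible_by and its divisor argument are illustrative, not part of the original answer):

```shell
# divisible_by N: read |-separated records on stdin and print each
# integer field that is an exact multiple of N (hypothetical helper).
divisible_by() {
    awk -F'|' -v n="$1" '{
        for (i = 1; i <= NF; i++)
            # accept only integer-looking fields; text and empty fields are skipped
            if ($i ~ /^-?[0-9]+$/ && $i % n == 0)
                print $i
    }'
}

printf '1|2|3\n4|5|6\n7|8|9\n' | divisible_by 3
```

Passing the divisor with -v keeps the awk program itself constant, so the same function works for any divisor.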

Using GNU awk, which allows for a multi-character record separator:
awk -v RS='[|[:space:]]+' '$0 % 3 == 0' file
This sets the record separator to one or more pipes or space characters, printing each record that divides evenly by 3.

You can use this awk:
awk -v RS='\\||\n' '$0 % 3 == 0' file
3
6
9
-v RS='\\||\n' sets the input record separator to | or newline, so each number arrives in $0 as its own record.
$0 % 3 == 0 (modulo) prints a record only when the remainder of $0 divided by 3 is zero.
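Both RS-based commands depend on awk treating RS as a regular expression, which gawk supports; POSIX leaves the behavior unspecified when RS is longer than one character. A sketch of a portable alternative, assuming it is acceptable to split on | with tr first (multiples_of_three is an illustrative name):

```shell
multiples_of_three() {
    # turn every | into a newline so each number sits on its own line,
    # then let awk print the non-empty lines whose value is a multiple of 3
    tr '|' '\n' | awk 'NF && $0 % 3 == 0'
}

printf '1|2|3\n4|5|6\n7|8|9\n' | multiples_of_three
```

The NF guard drops the empty lines that appear when two separators are adjacent; without it an empty line would count as 0 and be printed.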

Here's how to read a CSV file into a variable:
http://www.cyberciti.biz/faq/unix-linux-bash-read-comma-separated-cvsfile/
Here's how to find the length of the array to which the CSVs are saved:
L=${#array[@]}
Now run a for loop like this (bash arrays are indexed from 0):
for i in $(seq 0 $((L - 1))); do
    if [ $(( ${array[i]} % 3 )) -eq 0 ]; then
        echo "${array[i]}"
    fi
done
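The loop above needs the values in a bash array first. A sketch of the same idea that reads the numbers straight from the input, without an array; it assumes plain decimal integers without leading zeros (shell arithmetic would reject 08 as invalid octal), and the function name is made up for illustration:

```shell
print_multiples_of_three() {
    # split on | first, then test one number per loop iteration
    tr '|' '\n' | while IFS= read -r n; do
        case $n in
            ''|*[!0-9]*) continue ;;   # skip empty or non-numeric tokens
        esac
        if [ $((n % 3)) -eq 0 ]; then
            printf '%s\n' "$n"
        fi
    done
}

printf '1|2|3\n4|5|6\n7|8|9\n' | print_multiples_of_three
```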

Related

How can one dynamically create a new csv from selected columns of another csv file?

I dynamically iterate through a csv file and select columns that fit the criteria I need. My CSV is separated by commas.
I save these indexes to an array that looks like
echo "${cols_needed[@]}"
1 3 4 7 8
I then need to write these columns to a new file. As the array is created dynamically, I can't seem to find a command that selects them all at once. I have tried the cut, awk and paste commands.
awk -v fields=${cols_needed[@]} 'BEGIN{ n = split(fields,f) }
{ for (i=1; i<=n; ++i) printf "%s%s", $f[i], (i<n?OFS:ORS) }' test.csv
This throws an error as it cannot split the fields unless I hard code them (even then, it can only do 2), split on spaces.
fields="1 2"
I have tried to dynamically create -f parameters, but can only do so with one variable in a loop like so
for item in "${cols_needed[@]}";
do
cat test.csv | cut -f$item
done
which outputs one column at a time.
And I have tried to dynamically create it with commas - input as 1,3,4,7...
cat test.csv | cut -f${cols_needed[@]};
which also does not work!
Any help is appreciated! I understand awk does not work like bash and we cannot pass variables around in the same way. I feel like I'm going around in circles a bit! Thanks in advance.
Your first approach is ok, just:
change -v fields=${cols_needed[@]} to -v fields="${cols_needed[*]}", to pass the array as a single shell word
add FS=OFS="," to BEGIN, after splitting (you want to split on spaces, before FS is changed to ,)
ie. BEGIN {n = split(fields, f); FS=OFS=","}
Also, if there are no commas embedded in quoted csv fields, you can use cut:
IFS=,; cut -d, -f "${cols_needed[*]}" test.csv
If there are embedded commas, you can use gawk's FPAT, to only split fields on unquoted commas.
Here's an example using that.
# prepend $ to each number
for i in "${cols_needed[@]}"; do
fields[j++]="\$$i"
done
IFS=,
gawk -v FPAT='([^,]+)|(\"[^\"]+\")' -v OFS=, "{print ${fields[*]}}" test.csv
Injecting shell code in to an awk command is generally not great practice, but it's ok here IMO.
Expanding on my comments re: passing the bash array into awk:
Passing the array in as an awk variable:
$ cols_needed=(1 3 4 7 8)
$ typeset -p cols_needed
declare -a cols_needed=([0]="1" [1]="3" [2]="4" [3]="7" [4]="8")
$ awk -v fields="${cols_needed[*]}" 'BEGIN{n=split(fields,f); for (i=1;i<=n;i++) print i,f[i]}'
1 1
2 3
3 4
4 7
5 8
Passing the array in as a 'file' via process substitution:
$ awk 'FNR==NR{f[++n]=$1;next} END {for (i=1;i<=n;i++) print i,f[i]}' <(printf "%s\n" "${cols_needed[@]}")
1 1
2 3
3 4
4 7
5 8
As for OP's main question of extracting a specific set of columns from a .csv file ...
Borrowing dawg's .csv file:
$ cat file.csv
1,2,3,4,5,6,7,8
11,12,13,14,15,16,17,18
21,22,23,24,25,26,27,28
Expanding on the suggestion for passing the bash array in as an awk variable:
awk -v fields="${cols_needed[*]}" '
BEGIN { FS=OFS=","
n=split(fields,f," ")
}
{ pfx=""
for (i=1;i<=n;i++) {
printf "%s%s", pfx, $(f[i])
pfx=OFS
}
print ""
}
' file.csv
NOTE: this assumes OP has provided a valid list of column numbers; if there's some doubt as to the validity of the input (column) numbers then OP can add some logic to address said doubts (eg, are they integers? are they positive integers? do they reference a field (in file.csv) that actually exists?, etc)
This generates:
1,3,4,7,8
11,13,14,17,18
21,23,24,27,28
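The NOTE above leaves validation of the column numbers to the reader. One possible sketch of such checks, reading the CSV on stdin; the function name select_columns, the error messages and the exit codes 2 and 3 are all assumptions made for illustration:

```shell
# select_columns COL...: print the given columns of a comma-separated stdin.
select_columns() {
    awk -v fields="$*" '
    BEGIN {
        FS = OFS = ","
        n = split(fields, f, " ")
        # every requested column must be a positive integer
        for (i = 1; i <= n; i++)
            if (f[i] !~ /^[1-9][0-9]*$/) {
                printf "invalid column: %s\n", f[i] > "/dev/stderr"
                exit 2
            }
    }
    {
        # refuse lines that are too short before printing anything from them
        for (i = 1; i <= n; i++)
            if (f[i] + 0 > NF) {
                printf "line %d has no field %d\n", NR, f[i] > "/dev/stderr"
                exit 3
            }
        for (i = 1; i <= n; i++)
            printf "%s%s", $(f[i]), (i < n ? OFS : ORS)
    }'
}

printf '1,2,3,4,5,6,7,8\n11,12,13,14,15,16,17,18\n' | select_columns 1 3 4 7 8
```

With a bash array the call would be select_columns "${cols_needed[@]}" < file.csv.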
Suppose you have this variable in bash:
$ echo "${cols_needed[@]}"
3 4 7 8
And this CSV file:
$ cat file.csv
1,2,3,4,5,6,7,8
11,12,13,14,15,16,17,18
21,22,23,24,25,26,27,28
You can select columns of that csv file in awk this way:
awk '
BEGIN{FS=OFS=","}
FNR==NR{split($0, cols," "); next}
{
s=""
for (e=1;e<=length(cols); e++)
s=e<length(cols) ? s $(cols[e]) OFS : s $(cols[e])
print s
}' <(echo "${cols_needed[@]}") file.csv
Prints:
3,4,7,8
13,14,17,18
23,24,27,28
Or, you can do:
awk -v cw="${cols_needed[*]}" '
BEGIN{FS=OFS=","; split(cw, cols," ")}
{
s=""
for (e=1;e<=length(cols); e++)
s=e<length(cols) ? s $(cols[e]) OFS : s $(cols[e])
print s
}' file.csv
# same output
BTW, you can do this entirely with cut:
cut -d ',' -f $(IFS=, ; echo "${cols_needed[*]}") file.csv
3,4,7,8
13,14,17,18
23,24,27,28
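The cut variant can be wrapped in a small function so the change to IFS stays confined to a subshell. A sketch, assuming the CSV arrives on stdin and has no quoted fields with embedded commas (cut_columns is an illustrative name):

```shell
cut_columns() {
    # join the arguments with commas; IFS is changed only inside the subshell
    list=$(IFS=,; printf '%s' "$*")
    cut -d, -f "$list"
}

printf '1,2,3,4,5,6,7,8\n' | cut_columns 3 4 7 8
```

Keep in mind that cut always prints the selected fields in their input order, each one once, so cut_columns 8 3 still yields 3,8. If the columns need to be reordered or repeated, the awk versions are the ones to use.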

bash - how do I use 2 numbers on a line to create a sequence

I have this file content:
2450TO3450
3800
4500TO4560
And I would like to obtain something of this sort:
2450
2454
2458
...
3450
3800
4500
4504
4508
..
4560
Basically I would need a one liner in sed/awk that would read the values on both sides of the TO separator and inject those in a seq command or do the loop on its own and dump it in the same file as a value per line with an arbitrary increment, let's say 4 in the example above.
I know I can use a temp file, the read command and sort, but I would like to do it in a one-liner starting with cat filename | etc., as it is already part of a bigger script.
Correctness of the input is guaranteed, so the left side of TO is always smaller than the right side.
Thanks
Like this:
awk -F'TO' -v inc=4 'NF==1{print $1;next}{for(i=$1;i<=$2;i+=inc)print i}' file
or, if you like starting with cat:
cat file | awk -F'TO' -v inc=4 'NF==1{print $1;next}{for(i=$1;i<=$2;i+=inc)print i}'
Something like this might work:
awk -F TO '{system("seq " $1 " 4 " ($2 ? $2 : $1))}'
This would tell awk to system (execute) the command seq 10 4 10 for lines just containing 10 (which outputs 10), and something like seq 10 4 40 for lines like 10TO40. The output seems to match your example.
Given:
txt="2450TO3450
3800
4500TO4560"
You can do:
echo "$txt" | awk -F TO '{$2<$1 ? t=$1 : t=$2; for(i=$1; i<=t; i++) print i}'
If you want an increment greater than 1:
echo "$txt" | awk -F TO -v p=4 '{$2<$1 ? t=$1 : t=$2; for(i=$1; i<=t; i+=p) print i}'
Give a try to this:
sed 's/TO/ /' file.txt | while read first second; do if [ ! -z "$second" ] ; then seq $first 4 $second; else printf "%s\n" $first; fi; done
sed is used to replace TO with a space character.
read is used to read the line; if there are 2 numbers, seq is used to generate the sequence. Otherwise, the single number is printed.
This might work for you (GNU sed):
sed -r 's/(.*)TO(.*)/seq \1 4 \2/e' file
This evaluates the RHS of the substitution command if the LHS contains TO.
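The seq-based answers start one external process per input line. For comparison, a sketch that does the same expansion with shell built-ins only; expand_ranges and its optional increment argument (default 4) are assumptions for illustration, and the input is taken on stdin so it fits after cat filename |:

```shell
expand_ranges() {
    inc=${1:-4}
    while IFS= read -r line; do
        case $line in
            *TO*)
                start=${line%%TO*}   # text before TO
                end=${line##*TO}     # text after TO
                i=$start
                while [ "$i" -le "$end" ]; do
                    printf '%s\n' "$i"
                    i=$((i + inc))
                done
                ;;
            *)
                printf '%s\n' "$line"   # single value, printed unchanged
                ;;
        esac
    done
}

printf '10TO20\n30\n' | expand_ranges 4
```

Like seq, the loop stops at the last value that does not exceed the upper bound, so the bound itself is only printed when the increment lands on it.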

cut string in a specific column in bash

How can I cut the leading zeros in the third field so it will only be 6 characters?
xxx,aaa,00000000cc
rrr,ttt,0000000yhh
desired output
xxx,aaa,0000cc
rrr,ttt,000yhh
Here's a solution using awk:
echo " xxx,aaa,00000000cc
rrr,ttt,0000000yhh"|awk -F, -v OFS=, '{sub(/^0000/, "", $3)}1'
output
xxx,aaa,0000cc
rrr,ttt,000yhh
awk uses -F (or FS, for field separator) for the input, and you must set OFS (output field separator) for the output.
sub(/srchtarget/, "replacementstring", stringToFix) uses a regular expression to look for four 0s at the front (^) of the third field ($3).
The 1 is a shorthand for the print statement. A longhand version of the script would be
echo " xxx,aaa,00000000cc
rrr,ttt,0000000yhh"|awk -F, -v OFS=, '{sub(/^0000/, "", $3);print}'
# ---------------------------------------------------------^^^^^^
It's all related to awk's /pattern/{action} idiom.
IHTH
If you can assume there are always three fields and you want to strip off the first four zeros in the third field you could use a monstrosity like this:
$ cat data
xxx,0000aaa,00000000cc
rrr,0000ttt,0000000yhh
$ cat data | sed 's/\([^,]\+\),\([^,]\+\),0000\([^,]\+\)/\1,\2,\3/'
xxx,0000aaa,0000cc
rrr,0000ttt,000yhh
Another more flexible solution if you don't mind piping into Python:
cat data | python -c '
import sys
for line in sys.stdin:
    print(",".join([f[4:] if i == 2 else f for i, f in enumerate(line.strip().split(","))]))
'
This says "remove the first four characters of the third field but leave all other fields unchanged".
Using awk's substr should also work:
awk -F, -v OFS=, '{$3=substr($3,5,6)}1' file
xxx,aaa,0000cc
rrr,ttt,000yhh
It just takes 6 characters starting at position 5 of field 3 and assigns them back to field 3.
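Both awk answers hard-code the fact that the third field is 10 characters long. If the goal is "keep the last 6 characters, whatever the length", a sketch along these lines may be closer; keep_last and its width argument are illustrative names, and the data is read on stdin:

```shell
keep_last() {
    awk -F, -v OFS=, -v w="$1" '{
        len = length($3)
        # trim from the left only when the field is longer than the wanted width
        if (len > w)
            $3 = substr($3, len - w + 1)
        print
    }'
}

printf 'xxx,aaa,00000000cc\nrrr,ttt,0000000yhh\n' | keep_last 6
```

Fields that are already 6 characters or shorter pass through unchanged.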

AWK: parsing arguments in a loop

I'm trying to write a simple script that will display the fields specified by the user as bash arguments. For example I've got text file looks like this:
1 2 3 4 5
1 2 3 4 5
a b c d e
And for example user types:
./script.sh text 1 2 5
Where $1 = text, and other parameters (like $2 $3 and $4) are the fields, so output will look like this:
1 2 5
1 2 5
a b e
I've got this code, which prints all the columns given as arguments, but one below the other:
#!/bin/bash
text="$1"
shift
for x in $@; do
awk '{print $var}' var="$x" $text
done
Output for example ./script.sh text 1 2 5:
1
1
a
2
2
b
5
5
e
I guess output looks like that because loop "for" is outside of AWK. Is it a good solution for this task to place the loop inside AWK? I tried a few things but always have trouble with the syntax.
Thank you for your time and help!
file="$1"
shift
awk -v flds="$*" 'BEGIN{n=split(flds,f)} {for (i=1;i<=n;i++) printf "%s%s", $(f[i]), (i<n?OFS:ORS)}' "$file"
You don't need to loop over the params, pass all of them to awk with -v option:
awk -v v1=$2 -v v2=$3 -v v3=$4 '{print $v1, $v2, $v3;}' $1
You may want to perform additional checks such as whether the file ($1) contains enough fields, the file ($1) exists etc. But the idea is the same.
In your code, you are reading the file multiple times, each checking for only a particular field but to get your desired output, each line must be checked for multiple fields at the same time.
Pass the columns to awk and split them into an array and print the column corresponding to each value in the array:
file=$1
shift
p="$*"
awk -v l="$p" 'BEGIN{t=split(l,a," ")} {for (i=1;i<=t;i++) printf "%s ", $(a[i]); printf "\n"}' "$file"
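One of the answers above suggests adding checks, for example that the file exists. A sketch of how the split-based approach could look with a couple of such checks; the function name print_fields, the messages and the return codes are assumptions, not something from the answers:

```shell
# print_fields FILE FIELD...: print the listed fields of FILE, in the given order.
print_fields() {
    if [ "$#" -lt 2 ]; then
        echo "usage: print_fields FILE FIELD..." >&2
        return 2
    fi
    file=$1
    shift
    if [ ! -r "$file" ]; then
        echo "print_fields: cannot read $file" >&2
        return 1
    fi
    # the remaining arguments are joined into one string and split again inside awk
    awk -v flds="$*" '
        BEGIN { n = split(flds, f, " ") }
        {
            for (i = 1; i <= n; i++)
                printf "%s%s", $(f[i]), (i < n ? OFS : ORS)
        }
    ' "$file"
}
```

With the sample file from the question, print_fields text 1 2 5 reads the file once and prints the three fields side by side on each line.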

Awk changes tabs to spaces

Data:
Sandnes<space>gecom<tab>Hansen<tab>Ola<space>Timoteivn<space>10
I am substituting a specific column (ex:2th column) value with a variable in a file. So I am using the command:
varz="zipval"
awk -v VAR=$varz '{$2=VAR}1' OutputFile.log
awk replaces all the tabs with spaces after processing, so I tried OFS="\t".
But that turns every space into a tab:
Sandnes<tab>gecom<tab>Hansen<tab>zipval<tab>Timoteivn<tab>10
How can I handle this?
Thanks
Your problem is that awk splits your input on FS=[ \t]+ and then reassembles it with OFS=' ' or OFS='\t'. I don't think you can get around doing an extra split. Something like this works:
<data awk -v VAR="$varz" 'BEGIN { FS=OFS="\t" } { split($1, a, " +"); $1 = a[1]" "VAR } 1'
Output:
Sandnes zipval^IHansen^IOla Timoteivn 10
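If the word to replace is not always inside the first tab-separated field, another option is to never assign to a field at all, so awk never rebuilds the record. A sketch that walks the line with match() and copies the original separators through untouched; replace_word and its two arguments (word position, new value) are made up for illustration:

```shell
replace_word() {
    awk -v n="$1" -v val="$2" '{
        rest = $0; out = ""; k = 0
        # consume one run of non-blank characters per iteration
        while (match(rest, /[^ \t]+/)) {
            k++
            # copy the separator before the word as-is, then the word or its replacement
            out = out substr(rest, 1, RSTART - 1) (k == n ? val : substr(rest, RSTART, RLENGTH))
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }'
}

printf 'Sandnes gecom\tHansen\tOla Timoteivn 10\n' | replace_word 2 zipval
```

Note that awk interprets backslash escapes in a value passed with -v, so a replacement containing backslashes would need extra care.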
Use this script to pass the column number to your awk script:
varz="zipval"
awk -v VAR=$varz -v N=6 '{sub($N, VAR)}1' OutputFile.log
The below is working fine at my place:
> setenv var "hi"
> echo "1 2 3 4 5 6 7" | awk -v var1=$var '{$6=var1}1'
1 2 3 4 5 hi 7
>
You didn't post your desired output or even tell us which specific text you wanted replaced ("2th field" could mean several things) so this is a guess, but assuming your input file is tab-separated fields, you just need to quote your shell variable and assign FS as well as OFS:
varz="zipval"
awk -v VAR="$varz" 'BEGIN{FS=OFS="\t"} {$2=VAR} 1' OutputFile.log
I'd also recommend you don't use all-upper case for your variable name since that's used to identify awk builtin variables (NR, NF, etc.).
