Script returned '/usr/bin/awk: Argument list too long' when using -v in awk command - bash

Here is the part of my script that uses awk.
ids=`cut -d ',' -f1 $file | sed ':a;N;$!ba;s/\n/,/g'`
awk -vdata="$ids" -F',' 'NR > 1 {if(index(data,$2)>0){print $0",true"}else{print $0",false"}}' $input_file >> $output_file
This works perfectly, but then I tried to get the data from two or more files, like this:
ids=`cut -d ',' -f1 $file1 $file2 $file3 | sed ':a;N;$!ba;s/\n/,/g'`
It returned this error.
/usr/bin/awk: Argument list too long
As I researched, it was not caused by the number of files, but the number of ids fetched.
Does anybody have an idea on how to solve this? Thanks.

You could use an environment variable to pass the data to awk. In awk the environment variables are accessible via an array ENVIRON.
So try something like this:
export ids=`cut -d ',' -f1 $file | sed ':a;N;$!ba;s/\n/,/g'`
awk -F',' 'NR > 1 {if(index(ENVIRON["ids"],$2)>0){print $0",true"}else{print $0",false"}}' $input_file >> $output_file
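A minimal sketch of the ENVIRON idea on made-up data. The helper name flag_ids and the sample rows are assumptions for illustration, not part of the original script. The variable is placed in the environment of the single awk call only, instead of being exported for the whole shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every row after the header, append ",true" or
# ",false" depending on whether column 2 occurs in the id list ($1).
flag_ids() {
    # ids is put in the environment of this one awk invocation only
    ids="$1" awk -F',' 'NR > 1 {
        print $0 (index(ENVIRON["ids"], $2) > 0 ? ",true" : ",false")
    }' "$2"
}

# Made-up input file
tmp=$(mktemp)
printf '%s\n' 'name,id' 'a,2' 'b,5' 'c,9' > "$tmp"

result=$(flag_ids '2,3,9' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

As in the original command, index() does a substring search over the whole comma-joined list, so an id such as 2 would also be found inside 12.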

Change the way you generate your ids so they come out one per line, like this, which I use as a very simple way to generate ids 2,3 and 9:
echo 2; echo 3; echo 9
2
3
9
Now pass that as the first file to awk and your $input_file as the second file to awk:
awk '...' <(echo 2; echo 3; echo 9) "$input_file"
In bash you can generate a pseudo-file with the output of a process using <(some commands), and that is what I am using.
Now, in your awk, pick up the ids from the first file like this:
awk 'FNR==NR{ids[$1]++;next}' <(echo 2; echo 3; echo 9)
which will set ids[2]=1, ids[3]=1 and ids[9]=1.
Then pass both your files and add in your original processing:
awk -F',' 'FNR==NR{ids[$1]++;next} {if($2 in ids) print $0",true"; else print $0",false"}' <(echo 2; echo 3; echo 9) "$input_file"
So, for my final answer, your entire code will look like:
awk -F',' 'FNR==NR{ids[$1]++;next} {if($2 in ids) print $0",true"; else print $0",false"}' <(cut ... file1 file2 file3 | sed ...) "$input_file"
As @hek2mgl alludes in the comments, you can likely just pass the files which include the ids to awk "as is" and let awk find the ids itself rather than using cut and sed. If there are many, you can make them all come to awk as the first file with:
awk '...' <(cat file1 file2 file3) "$input_file"
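The two-input approach above can be sketched end to end as follows. The file names and contents are made up, and the FNR==1 rule is an assumption added here to skip the header of the second input the way NR > 1 did in the question.

```shell
#!/usr/bin/env bash
# Made-up sample data; in the question these would be the real CSV files.
dir=$(mktemp -d)
printf '%s\n' '2,x' '3,y' > "$dir/file1"
printf '%s\n' '9,z'       > "$dir/file2"
printf '%s\n' 'name,id' 'a,2' 'b,5' 'c,9' > "$dir/input"

# The first input (the process substitution) fills the ids array from
# column 1; the remaining rules only run for the second input.
result=$(awk -F',' '
    FNR==NR { ids[$1]; next }
    FNR==1  { next }
    { print $0 (($2 in ids) ? ",true" : ",false") }
' <(cat "$dir/file1" "$dir/file2") "$dir/input")

printf '%s\n' "$result"
rm -rf "$dir"
```

Because the ids arrive as file content rather than as a command-line argument, their number no longer counts against the argument length limit, and the lookup is an exact match instead of a substring search.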

There's 2 problems in your script:
awk -vdata="$ids" -F',' 'NR > 1 {if(index(data,$2)>0){print $0",true"}else{print $0",false"}}' $input_file >> $output_file
that could be causing that error:
-vdata=.. - that is gawk-specific, in other awks you need to leave a space between -v and data=. So if you aren't running gawk then idk what your awk will make of that statement but it might treat it as multiple args.
$input_file - you MUST quote shell variables unless you have a specific purpose in mind by leaving them unquoted. If $input_file contains globbing chars or spaces then you leaving it unquoted will cause them to be expanded into potentially multiple files/args.
So try this:
awk -v data="$ids" -F',' 'NR > 1 {if(index(data,$2)>0){print $0",true"}else{print $0",false"}}' "$input_file" >> "$output_file"
and see if you still have the problem. Your script does have other unrelated issues of course, some of which have already been pointed out, and you can post a followup question if you want help with those, but just FYI that awk script could be written more concisely as:
awk -v data="$ids" 'BEGIN{FS=OFS=","} NR > 1{print $0, (index(data,$2) ? "true" : "false")}'

Related

awk issue inside for loop

I have many files with different names that end with txt.
rtfgtq56.txt
fgutr567.txt
..
So I am running this command
for i in *txt
do
awk -F "\t" '{print $2}' $i | grep "K" | awk '{print}' ORS=';' | awk -F "\t" '{OFS="\t"; print $i, $1}' > ${i%.txt*}.k
done
My problem is that I want to add the name of every file in the first column, so I run this part:
awk -F "\t" '{OFS="\t"; print $i, $1}' > ${i%.txt*}
Here $i is meant to be the current file in the for loop,
but it did not work because awk can't read the shell's $i.
Do you know how I can solve it?
You want to refactor everything into a single Awk script anyway, and take care to quote your shell variables.
for i in *.txt
do
awk -F "\t" '/K/{a = a ";" $2}
END { print FILENAME, substr(a, 2) }' "$i" > "${i%.txt*}.k"
done
... assuming I untangled your logic correctly. The FILENAME Awk variable contains the current input file name.
More generally, if you genuinely want to pass a variable from a shell script to Awk, you can use
awk -v awkvar="$shellvar" ' .... # your awk script here
# Use awkvar to refer to the Awk variable'
Perhaps see also useless use of grep.
Using the -v option of awk, you can create an awk variable based on a shell variable.
awk -v i="$i" ....
Another possibility would be to make i an environment variable, which means that awk can access it via the predefined ENVIRON array, i.e. as ENVIRON["i"].
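Both ways of handing a shell value to awk can be sketched side by side. The helper names label_v and label_env are hypothetical, and the sample line is made up.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: prefix every line read from stdin with a label
# (for example the current file name), separated by a tab.

# Variant 1: pass the label as an awk variable with -v
label_v() {
    awk -v name="$1" 'BEGIN { OFS = "\t" } { print name, $0 }'
}

# Variant 2: pass the label through the environment of this one awk call
label_env() {
    name="$1" awk 'BEGIN { OFS = "\t" } { print ENVIRON["name"], $0 }'
}

printf 'K1;K2\n' | label_v   rtfgtq56.txt
printf 'K1;K2\n' | label_env rtfgtq56.txt
```

One difference worth knowing: awk interprets backslash escape sequences in a value given with -v, while a value read from ENVIRON is used exactly as it is.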

Replace one line of a file with another line in a second file if it matches the condition

I am wondering whether I can read each line of a.txt and compare it to each line in b.txt. If a line in a.txt matches the beginning of a line in b.txt, the matched line in b.txt should be replaced with the line from a.txt. So let's say there are two lines, alias cd /correct/path/ and alias cd /wrong/path/sth, in a.txt and b.txt respectively. After I run my command, I would like both files to contain alias cd /correct/path/. My own solution is to use two while...read loops and sed -i to replace the line, but I think it is clumsy and inefficient. I am looking for a cleaner and more efficient solution. Here is my code in case it helps:
awk 'NR==FNR { array[$0]; next } { delete array[$0] } END{for (key in array) { print key } }' a.txt b.txt > tmp
input="tmp"
while IFS= read -r line
do
echo "$line"
cat b.txt > n_tmp
n_input="$n_tmp"
while IFS= read -r n_line
do
if $n_line | awk '{print $1, $2}' == $line | awk '{print $1, $2}'; then
sed -i "s/$n_line/$line/" b.txt
fi
done < "$n_input"
rm -rf n_tmp
done < "$input"
rm -rf tmp
There are a few mistakes in this script, and most of them are in this line: if $n_line | awk '{print $1, $2}' == $line | awk '{print $1, $2}'; then. First, $n_line | awk '{print $1, $2}' does not produce the result I wanted, because nothing outputs the n_line variable; the shell tries to run its contents as a command instead. An echo is needed so that the string is written out and awk can process it. Second, the values being compared are not double-quoted. Lastly, the comparison needs to be wrapped in double brackets. So in the end it should look something like this:
a_string=`echo "$line" | awk '{print $1, $2}'`
b_string=`echo "$n_line" | awk '{print $1, $2}'`
if [[ "$a_string" == "$b_string" ]]; then
I decided to store the echoed output in a variable as well; it looks a bit cleaner and is easier to handle. There are still some other problems with this script, but as of now I think the primary issue is solved.
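The corrected comparison can be sketched as a small function. The name same_prefix is hypothetical, and the sample lines are the ones from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when two lines have the same first two
# whitespace-separated fields.
same_prefix() {
    local a_string b_string
    # echo writes the string so that awk has input to work on
    a_string=$(echo "$1" | awk '{print $1, $2}')
    b_string=$(echo "$2" | awk '{print $1, $2}')
    # quoted operands inside [[ ]] are compared as plain strings
    [[ "$a_string" == "$b_string" ]]
}

if same_prefix 'alias cd /correct/path/' 'alias cd /wrong/path/sth'; then
    echo "match"
fi
```

Wrapping the test in a function keeps the loop body short: the caller only has to check the exit status.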

Sed remove selected line to file using shell script variable

I have shell script variable var="7,8,9"
These are the line numbers I want to delete from the file using sed.
Here I tried:
sed -i "$var"'d' test_file.txt
But I got the error `sed: -e expression #1, char 4: unknown command: ,'
Is there any other way to remove the lines?
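The sed command doesn't accept a comma-delimited list of line numbers; in a sed address a comma denotes a range, so 7,9d deletes lines 7 through 9 but 7,8,9d is an error.
You can use this awk command, which uses a bit of bash string manipulation to form a regex from the given comma-separated line numbers: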
awk -v var="^(${var//,/|})$" 'NR !~ var' test_file.txt
This will set awk variable var as this regex:
^(7|8|9)$
And then condition NR !~ var ensures that we print only those lines that don't match above regex.
For inline editing, if you have gnu-awk newer than 4.0 (the inplace extension arrived in 4.1.0), use:
awk -i inplace -v var="^(${var//,/|})$" 'NR !~ var' test_file.txt
Or for older awk use:
awk -v var="^(${var//,/|})$" 'NR !~ var' test_file.txt > $$.tmp && mv $$.tmp test_file.txt
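A quick way to check the regex approach without touching a real file is to run it over generated line numbers. This is only a sketch; the ten-line input is made up.

```shell
#!/usr/bin/env bash
var="7,8,9"

# ${var//,/|} replaces every comma with a pipe, so the regex
# becomes ^(7|8|9)$ and matches exactly those line numbers.
regex="^(${var//,/|})$"
printf '%s\n' "$regex"

# Ten numbered lines stand in for test_file.txt; awk prints only the
# lines whose record number does not match the regex.
result=$(printf '%s\n' {1..10} | awk -v re="$regex" 'NR !~ re')
printf '%s\n' "$result"
```

The anchors ^ and $ matter: without them, a pattern built from 1 would also drop lines 10 through 19.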
I like sed, you were close to it. You just need to split each line number into a separate command. How about this:
sed -e "$(echo 1,3,4 | tr ',' '\n' | while read N; do printf '%dd;' $N; done)"
Do it like this:
sed -i "`echo $var|sed 's/,/d;/g'`d;" file
Another option to consider would be ed, with printf '%s\n' to put commands onto separate lines:
lines=( 9 8 7 )
printf '%s\n' "${lines[@]/%/d}" w | ed -s file
The array lines contains the line numbers to be deleted; it's important to put these in descending order! The expansion ${lines[@]/%/d} adds a d (delete) command to each line number and w writes to the file at the end. You can change this to ,p instead, to check the output before overwriting your file.
As an aside, for this example, you could also just use 7,9 as a single entry in the array.

Assigning deciles using bash

I'm learning bash, and here's a short script to assign deciles to the second column of file $1.
The complicating bit is the use of awk within the script, leading to ambiguous redirects when I run the script.
I would have gotten this done in SAS by now, but like the idea of two lines of code doing the job.
How can I communicate the total number of rows (${N}) to awk within the script? Thanks.
N=$(wc -l < $1)
cat $1 | sort -t' ' -k2gr,2 | awk '{$3=int((((NR-1)*10.0)/"${N}")+1);print $0}'
You can set an awk variable from the command line using -v.
N=$(wc -l < "$1" | tr -d ' ')
sort -t' ' -k2gr,2 "$1" | awk -v n="$N" '{$3=int((((NR-1)*10.0)/n)+1);print $0}'
I added tr -d to get rid of the leading spaces that wc -l puts in its result.
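The same pipeline can be sketched as a function and tried on a tiny made-up file. The name deciles and the four sample rows are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a decile as column 3, ranking the rows of
# the file in $1 by descending value of column 2.
deciles() {
    local n
    n=$(wc -l < "$1" | tr -d ' ')   # row count, stripped of padding
    sort -t' ' -k2gr,2 "$1" |
        awk -v n="$n" '{ $3 = int((((NR - 1) * 10.0) / n) + 1); print $0 }'
}

# Made-up input: four rows, so ranks 1..4 map to deciles 1, 3, 6 and 8
tmp=$(mktemp)
printf '%s\n' 'a 10' 'b 40' 'c 20' 'd 30' > "$tmp"
result=$(deciles "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

The shell variable never appears inside the single-quoted awk program; awk only sees its own variable n, which is what avoids the quoting trouble from the question.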

Awk: Drop last record separator in one-liner

I have a simple command (part of a bash script) that I'm piping through awk but can't seem to suppress the final record separator without then piping to sed. (Yes, I have many choices and mine is sed.) Is there a simpler way without needing the last pipe?
dolls=$(egrep -o 'alpha|echo|november|sierra|victor|whiskey' /etc/passwd \
| uniq | awk '{IRS="\n"; ORS=","; print}'| sed s/,$//);
Without the sed, this produces output like echo,sierra,victor, and I'm just trying to drop the last comma.
You don't need awk, try:
egrep -o ....uniq|paste -d, -s
Here is another example:
kent$ echo "a
b
c"|paste -d, -s
a,b,c
Also, I think your chained command could be simplified. awk could do it all in a one-liner.
Instead of egrep, uniq, awk, sed etc, all this can be done in one single awk command:
awk -F":" '!($1 in a){l=l $1 ","; a[$1]} END{sub(/,$/, "", l); print l}' /etc/passwd
Here is a small and quite straightforward one-liner in awk that suppresses the final record separator:
echo -e "alpha\necho\nnovember" | awk 'y {print s} {s=$0;y=1} END {ORS=""; print s}' ORS=","
Gives:
alpha,echo,november
So, your example becomes:
dolls=$(egrep -o 'alpha|echo|november|sierra|victor|whiskey' /etc/passwd | uniq | awk 'y {print s} {s=$0;y=1} END {ORS=""; print s}' ORS=",")
The benefit of using awk over paste or tr is that this also works with a multi-character ORS.
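To illustrate the multi-character case, here is a sketch that wraps the same awk program in a function. The name join_lines and the " | " separator are made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join the lines on stdin with the separator in $1.
# Each line is printed one step late, so the last line can be written in
# END with an empty ORS and gets no trailing separator.
join_lines() {
    awk 'y {print s} {s=$0; y=1} END {ORS=""; print s}' ORS="$1"
}

joined=$(printf '%s\n' alpha echo november | join_lines ' | ')
printf '%s\n' "$joined"
```

Since the separator is given as a command-line assignment, awk interprets backslash escape sequences in it.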
Since you tagged it bash here is one way of doing it:
#!/bin/bash
# Read the /etc/passwd file in to an array called names
while IFS=':' read -r name _; do
names+=("$name");
done < /etc/passwd
# Assign the content of the array to a variable
dolls=$( IFS=, ; echo "${names[*]}")
# Display the value of the variable
echo "$dolls"
echo "a
b
c" |
mawk 'NF-= _==$NF' FS='\n' OFS=, RS=
a,b,c
