How to modify a file using awk? - bash

awk '/<ul>/ {ul++} ul == 6 { getline } 1' /var/www/html/INFOSEC/english/test.html
If I run this line of code, the file is not modified; awk only prints the result to the shell. Can anyone help? Thanks.

The simplest solution is to just send the output to a file; you'll want to copy the file beforehand, because you cannot redirect output to the same file you're reading (the shell truncates the output file before awk gets a chance to read it).
cp test.html test.html.orig
awk 'your awk script here' test.html.orig >test.html
# and then optionally remove the copy:
rm test.html.orig
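To see the failure mode for yourself (using a throwaway demo file):
echo hello > file
awk '1' file > file
cat file
# (no output - the shell truncated file before awk could read it)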

Related

Adding headers to sometimes empty CSV file via bash script

I am trying to add a few headers to an occasionally empty CSV file (so sometimes the file will be empty and sometimes it will have data) via bash. So far I have tried using sed and awk.
Sed:
sed -i '1liAge,Sex,Location' $nameOfFile
Awk:
awk 'BEGIN{print "Age,Sex,Location"}1' $nameOfFile
Both of these do not appear to work, since no headers are added. When I echo $nameOfFile, csvInfo.csv is printed, so I don't believe there's anything wrong with $nameOfFile. Not sure where I'm going wrong here.
Your sed command requires there to already be at least one line in the file, so I don't think you can make that work reliably.
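You can see this with an empty file (GNU sed syntax):
: > empty.csv
sed '1iAge,Sex,Location' empty.csv
# prints nothing: with no input lines, there is no line 1 for the insert to attach to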
The Awk command you have should work as such, but it writes its results to standard output. If you have GNU Awk, you can use -i inplace to get behavior similar to sed -i; but the portable solution is to write the results to a temporary file, then move back on top of the old file.
awk 'BEGIN{print "Age,Sex,Location"}1' "$nameOfFile" >tmp
mv tmp "$nameOfFile"
See also When to wrap quotes around a shell variable.
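To see why the quotes matter, imagine a (hypothetical) filename containing a space:
nameOfFile="my file.csv"
awk 'BEGIN{print "Age,Sex,Location"}1' $nameOfFile     # word-splits into two filenames, "my" and "file.csv"
awk 'BEGIN{print "Age,Sex,Location"}1' "$nameOfFile"   # passed through as a single filename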
If you want to avoid adding a header when it's already there, you can keep the BEGIN block, but suppress printing of the old header, if present:
awk 'BEGIN{print "Age,Sex,Location"}
NR>1 || $0 !~ /^Age,Sex,Location$/' "$nameOfFile" >tmp
mv tmp "$nameOfFile"
Proper hygiene around temporary files is somewhat more involved for a production script, but if this is just for your personal use, let's leave it at this. For more information, perhaps review Removing created temp files in unexpected bash exit.
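For the curious, a sketch of that hygiene: mktemp generates a unique name, and a trap removes the temporary file even if the script is interrupted (the rm is a harmless no-op after a successful mv):
tmp=$(mktemp) || exit 1
trap 'rm -f "$tmp"' EXIT
awk 'BEGIN{print "Age,Sex,Location"}1' "$nameOfFile" >"$tmp" &&
mv "$tmp" "$nameOfFile"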

Prevent awk from interpreting variable contents?

I'm attempting to parse a make -n output to make sure only the programs I want to call are being called. However, awk tries to interpret the contents of the output and run (?) it. Errors are something like awk: fatal: Cannot find file 'make'. I have gotten around this by saving the output as a temporary file and then reading that into awk. However, I'm sure there's a better way; any suggestions?
EDIT: I'm using the output later in my script and would like to avoid saving a file to increase speed if possible.
Here's what isn't working:
my_input=$(make -n file)
my_lines=$(echo $my_input | awk '/bin/ { print $1 }') #also tried printf and cat
Here's what works but obviously takes longer than it has to because of writing the file:
make -n file > temp
my_lines=$(awk '/bin/ { print $1 }' temp)
Many thanks for your help!
You can parse the output directly as it is generated with the following command, and save the result in a file:
make -n file | grep bin > result.out
If you really want to go for an overkill awk solution, change your second line in the following way:
my_lines="$(awk '/bin/ { print }' temp)"

Bash - Comment out a specific line that partly matches a string in a file

I have a file with an argument:
testArgument=
It could have something after the equals sign or nothing, but I want to comment that line out and add a new line with the supplied info.
Before:
testArgument=Something
Results:
#testVariable=Something

#Comments to let the user know of why the change
testVariable=NewSomething
Should I loop it or should I use something like sed? I need it to be compatible with Ubuntu and Debian, and with bash.
You could use sed like this:
sed 's/^\(testArgument\)=.*/#&\n\n#Comment here\n\1=NewSomething/' file
& prints the full match in the replacement and \1 refers to the first capture group "testArgument".
To perform the substitution on the file in-place (i.e. replace the contents of the original file), add the -i switch. Otherwise, if you want to output the command to a new file, do sed '...' file > newfile.
If you are using a different version of sed that doesn't support \n newlines in the replacement, see this answer for some ways to deal with it.
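For example, with GNU sed you can do the in-place edit and keep a backup in one go by giving -i a suffix:
sed -i.bak 's/^\(testArgument\)=.*/#&\n\n#Comment here\n\1=NewSomething/' file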
Alternatively, using GNU awk:
gawk '/^testArgument/ {$0 = gensub(/^(testArgument)=.*/, "#\\0\n\n#Comment here\n\\1=NewSomething", 1)}1' file
You can use awk
awk '/^testArgument/ {$0="#"$0"\n\n#Comments to let the user know of why the change\ntestVariable=NewSomething"}1' file
cat file
some data
testArgument=Something
more data
awk '/^testArgument/ {$0="#"$0"\n\n#Comments to let the user know of why the change\ntestVariable=NewSomething"}1' file
some data
#testArgument=Something

#Comments to let the user know of why the change
testVariable=NewSomething
more data
To change the original file:
awk 'code....' file > tmp && mv tmp file

How to write the output to the same file using awk

awk '/^nameserver/ && !modif { printf("nameserver 127.0.0.1\n"); modif=1 } {print}' testfile.txt
It displays the output, but I want to write the output to the same file, testfile.txt in my example.
Not possible per se. You need a second temporary file because you can't read and overwrite the same file. Something like:
awk '(PROGRAM)' testfile.txt > testfile.tmp && mv testfile.tmp testfile.txt
The mktemp program is useful for generating unique temporary file names.
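Putting the two together with the question's program:
tmp=$(mktemp) &&
awk '/^nameserver/ && !modif { printf("nameserver 127.0.0.1\n"); modif=1 } {print}' testfile.txt > "$tmp" &&
mv "$tmp" testfile.txt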
There are some hacks for avoiding a temporary file, but they rely mostly on caching and read buffers and quickly get unstable for larger files.
Since GNU Awk 4.1.0, there is the "inplace" extension, so you can do:
$ gawk -i inplace '{ gsub(/foo/, "bar") }; { print }' file1 file2 file3
To keep a backup copy of original files, try this:
$ gawk -i inplace -v INPLACE_SUFFIX=.bak '{ gsub(/foo/, "bar") }
> { print }' file1 file2 file3
This can be used to simulate the GNU sed -i feature.
See: Enabling In-Place File Editing
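For comparison, the GNU sed command being simulated here (with the same .bak backups) would be:
sed -i.bak 's/foo/bar/g' file1 file2 file3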
Despite the fact that using a temp file is correct, I don't like it because:
you have to be sure not to erase another temp file (yes, you can use mktemp - it's a pretty useful tool)
you have to take care of deleting it (or moving it, like thiton said) INCLUDING when your script crashes or stops before the end (so deleting temp files at the end of the script is not that wise)
it generates I/O on disk (OK, not that much, but we can make it lighter)
So my method to avoid a temp file is simple:
my_output="$(awk '(PROGRAM)' source_file)"
echo "$my_output" > source_file
Note the use of double quotes both when grabbing the output from the awk command and when using echo; if you leave them out, you lose the newlines. (Bear in mind that this holds the whole file in a shell variable, and that command substitution strips trailing newlines.)
Had to make an account when seeing 'awk' and 'not possible' in one sentence. Here is an awk-only solution without creating a temporary file:
awk '{a[b++]=$0} END {for(c=0;c<b;c++)print a[c]>ARGV[1]}' file
This works because the output file, named via ARGV[1], is only opened in the END block, after the whole input has been read into memory.
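Applied to the question's command, the same idea would look something like this (a sketch; everything is buffered in memory and the file is only opened for writing in END):
awk '/^nameserver/ && !modif { a[b++]="nameserver 127.0.0.1"; modif=1 }
{ a[b++]=$0 }
END { for (c=0; c<b; c++) print a[c] > ARGV[1] }' testfile.txt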
You can also use sponge from moreutils.
For example
awk '!a[$0]++' file|sponge file
removes duplicate lines and
awk '{$2=10*$2}1' file|sponge file
multiplies the second column by 10.
Try including a redirection in the print statement of your awk script, so that the output goes to a new file. Here total is a calculated value:
print $total, total >> "new_file"
This inline writing worked for me: redirect the output from print back to the original file. (As noted above, this kind of trick relies on awk's read buffering, so only rely on it for small files.)
echo "1" > test.txt
awk '{$1++; print> "test.txt"}' test.txt
cat test.txt
#$> 2

Best way to modify a file when using pipes?

I often have shell programming tasks where I run into this pattern:
cat file | some_script > file
This is unsafe - cat may not have read in the entire file before some_script starts writing to it. I don't really want to write the result to a temporary file (it's slow, and I don't want the added complication of thinking up a unique new name).
Perhaps there is a standard shell command that will buffer a whole stream until EOF is reached? Something like:
cat file | bufferUntilEOF | script > file
Ideas?
Like many others, I like to use temporary files. I use the shell process ID as part of the temporary name so that if multiple copies of the script are running at the same time, they won't conflict. Finally, I only overwrite the original file if the script succeeds (using boolean operator short-circuiting - it's a little dense but very nice for simple command lines). Putting that all together, it would look like:
some_script < file > smscrpt.$$ && mv smscrpt.$$ file
This will leave the temporary file if the command fails. If you want to clean up on error, you can change that to:
some_script < file > smscrpt.$$ && mv smscrpt.$$ file || rm smscrpt.$$
BTW, I got rid of the poor use of cat and replaced it with input redirection.
You're looking for sponge.
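sponge, from the moreutils package, soaks up all of its input before opening and writing the output file, which is exactly the bufferUntilEOF behavior asked for:
some_script < file | sponge file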
Using a temporary file is the correct solution here. When you use a redirection like '>', it is handled by the shell, and no matter how many commands are in your pipeline, the shell is free to delete and overwrite the output file before any command is executed (during pipeline setup).
Another option is just to read the file into a variable:
file_contents=$(cat file)
echo "$file_contents" | script1 | script2 > file
Using mktemp(1) or tempfile(1) saves you the expense of having to think up a unique filename.
In response to the OP's question above about using sponge without external dependencies, and building on @D.Shawley's answer, you can have the effect of sponge with only a dependency on gawk, which is not uncommon on Unix or Unix-like systems:
cat foo | gawk -voutfn=foo '{lines[NR]=$0;} END {if(NR>0){print lines[1]>outfn;} for(i=2;i<=NR;++i) print lines[i] >> outfn;}'
The first print uses > so that the output file gets truncated; the NR>0 guard keeps an empty input from producing a spurious blank line.
To use this in a shell script, change -voutfn=foo to -voutfn="$1" or whatever syntax your shell uses for filename arguments. For example:
#!/bin/bash
cat "$1" | gawk -voutfn="$1" '{lines[NR]=$0;} END {if(NR>0){print lines[1]>outfn;} for(i=2;i<=NR;++i) print lines[i] >> outfn;}'
Note that, unlike real sponge, this may be limited to the size of RAM. sponge actually buffers in a temporary file if necessary.
Using a temporary file is IMO better than attempting to buffer the data in the pipeline.
It almost defeats the purpose of pipelines to buffer them.
I think you need to use mktemp. Something like this will work:
FILE=example-input.txt
TMP=$(mktemp)
some_script <"$FILE" >"$TMP"
mv "$TMP" "$FILE"
I think that the best way is to use a temp file. However, if you want another approach, you can use something like awk to buffer up the input into memory before your application starts receiving input. The following script will buffer all of the input into the lines array before it starts to output it to the next consumer in the pipeline.
{ lines[NR] = $0; }
END {
    for (line_no = 1; line_no <= NR; ++line_no) {
        print lines[line_no];
    }
}
You can collapse it into a one-liner if you want:
cat file | awk '{lines[NR]=$0;} END {for(i=1;i<=NR;++i) print lines[i];}' > file
Be aware, though, that the one-liner still has the original race: the shell truncates file while setting up the pipeline, possibly before cat has read it, so the buffering alone does not make it safe.
With all of that, I would still recommend using a temporary file for the output and then overwriting the original file with it.
