Using awk to put a header in a text file - bash

I have lots of text files and need to put a header on each one of them, depending on the data in each file.
This awk command accomplishes the task:
awk 'NR==1{first=$1}{sum+=$1;}END{last=$1;print NR,last,"L";}' my_text.file
But this prints to the screen, and I want to put the output at the top of each file, saving the modified file under the same name.
Here is what I've tried:
for i in *.txt
do
echo Processing ${i}
cat awk 'NR==1{first=$1}{sum+=$1;}END{last=$1;print NR,last,"L";}' "${i}" ${i} > $$.tmp && mv $$.tmp "${i}"
done
So I guess I can't use cat to put them as a header, or am I doing something wrong?
Thanks in advance

UPDATE:
with awk:
awk 'BEGIN{print "header"}1' test.txt
without awk, using cat and echo:
cat <(echo "header") test.txt
or, with a temporary file:
{ echo "header"; cat test.txt; } > $$.tmp && mv $$.tmp test.txt
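Applied in place across many files, the awk version becomes a loop like this (a sketch; the mktemp usage and the literal "header" are placeholders for your own header logic):

```shell
for f in *.txt; do
  tmp=$(mktemp) &&                           # safe unique temp file
  awk 'BEGIN{print "header"}1' "$f" > "$tmp" &&
  mv "$tmp" "$f"                             # replace original only on success
done
```

The && chain ensures the original file is only overwritten if every earlier step succeeded.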

I THINK what you're trying to do with your loop is:
for i in *.txt
do
echo "Processing $i"
awk 'NR==1{first=$1}{sum+=$1}END{last=$1;print NR,last,"L"}' "$i" > $$.tmp &&
cat "$i" >> $$.tmp &&
mv $$.tmp "$i"
done
but it's not clear what you're really trying to do, since you never use first or sum. Setting last in the END section is also a bad idea: it will not work across all awks, and there's a simple alternative.
If you update your question with some sample input and expected output we can help you.


redirect output of loop to current reading file

I have simple script that looks like
for file in `ls -rlt *.rules | awk '{print $9}'`
do
cat $file | awk -F"|" -v DATE=$(date +%Y"_"%m"_"%d) '!$3{$3=DATE} !$4{$4=DATE} 1' OFS="|" $file
done
How can I redirect the output of awk to the same file it is reading?
Before running the script, the files contain data like:
123|test||
After running the script, the files should contain:
123|test|2017_04_05|2017_04_05
You cannot redirect output back into the file awk is reading: the redirection would truncate the file before awk reads it.
The way is to use a temporary file, then replace the current one:
for file in `ls -1 *.rules `
do
TMP_FILE=/tmp/${file}_$$
awk -F"|" -v DATE=$(date +%Y"_"%m"_"%d) '!$3{$3=DATE} !$4{$4=DATE} 1' OFS="|" $file > ${TMP_FILE}
mv ${TMP_FILE} $file
done
I would modify Michael Vehrs's otherwise good answer as follows:
ls -rt *.rules | while read -r file
do
TMP_FILE="/tmp/${file}_$$"
awk -F"|" -v DATE=$(date +%Y"_"%m"_"%d) \
'!$3{$3=DATE} !$4{$4=DATE} 1' OFS="|" "$file" > "$TMP_FILE"
mv "$TMP_FILE" "$file"
done
Your question uses ls(1) to sort the files by time, oldest first. The above preserves that property. I removed the {} braces because they add nothing in a shell script when the variable name isn't being interpolated, and I added quotes to cope with file names that include whitespace.
If time-order doesn't matter, I'd consider an inside-out solution: in awk, write to a temporary file instead of standard output, and then rename it with system in an END block. Then if something goes wrong your input is preserved.
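That inside-out idea might look like this for a single file (a sketch; the .tmp suffix and the system() rename are my assumptions, and END only sees the last FILENAME, so run it one file at a time):

```shell
awk -F'|' -v OFS='|' -v DATE="$(date +%Y_%m_%d)" '
  !$3 { $3 = DATE }                    # fill an empty third field
  !$4 { $4 = DATE }                    # fill an empty fourth field
  { print > (FILENAME ".tmp") }        # write to a temp file, not stdout
  END { system("mv \"" FILENAME ".tmp\" \"" FILENAME "\"") }  # rename only at the end
' file.rules
```

If awk dies before reaching END, file.rules is left untouched.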
First of all, it is silly to use a combination of ls -rlt and awk when the only thing you need is the file name. You don't even need ls, because the glob is expanded by the shell, not by ls: simply use for file in *.rules. Since the date will be the same for every file (unless you run the command over midnight), it is sufficient to calculate it in advance:
date=$(date +%Y"_"%m"_"%d)
for file in *.rules
do
TMP_FILE=$(mktemp ${file}_XXXXXX)
awk -F"|" -v DATE=${date} '!$3{$3=DATE} !$4{$4=DATE} 1' OFS="|" $file > ${TMP_FILE}
mv ${TMP_FILE} $file
done
However, since awk also knows which file it is reading, you could do something like this:
awk -F"|" -v DATE=$(date +%Y"_"%m"_"%d) \
'!$3{$3=DATE} !$4{$4=DATE} { print > FILENAME ".tmp" }' OFS="|" *.rules
rename .tmp "" *.rules.tmp

Creating a script that checks to see if each word in a file

I am pretty new to Bash and scripting in general and could use some help. Each word in the first file is separated by \n while the second file could contain anything. If the string in the first file is not found in the second file, I want to output it. Pretty much "check if these words are in these words and tell me the ones that are not"
File1.txt contains something like:
dog
cat
fish
rat
file2.txt contains something like:
dog
bear
catfish
magic ->rat
I know I want to use grep (or do I?) and the command would be (to my best understanding):
$foo.sh file1.txt file2.txt
Now for the script...
I have no idea...
grep -iv $1 $2
Give this a try. It is straightforward and not optimized, but it does the trick (I think):
while read -r line ; do
fgrep -q "$line" file2.txt || echo "$line"
done < file1.txt
Below is a funny version, with 4 parallel fgrep invocations and an additional result.txt file:
> result.txt
nb_parallel=4
while read -r line ; do
while [ $(jobs | wc -l) -gt "$nb_parallel" ]; do sleep 1; done
fgrep -q "$line" file2.txt || echo "$line" >> result.txt &
done < file1.txt
wait
cat result.txt
You can increase the value 4 to run more parallel fgrep invocations, depending on the number of CPUs, cores, and IOPS available.
With the -f flag you can tell grep to read its patterns from a file:
grep -vf file2.txt file1.txt
To get a good match on complete lines, use
grep -vFxf file2.txt file1.txt
As @anubhava commented, this will not match substrings. To fix that, we can use the result of grep -Fof file1.txt file2.txt (all the relevant keywords).
Combining these will give
grep -vFxf <(grep -Fof file1.txt file2.txt) file1.txt
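A quick sanity check of that combined command with made-up data (not the question's files; the <( ) process substitution requires bash):

```shell
printf '%s\n' dog cat fish rat > words.txt   # invented sample data
printf '%s\n' dog bear catfish > text.txt
# "dog" matches, and "cat" and "fish" match inside "catfish",
# so only "rat" is reported as missing
grep -vFxf <(grep -oFf words.txt text.txt) words.txt   # prints: rat
```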
Using awk you can do:
awk 'FNR==NR{a[$0]; next} {for (i in a) if (index(i, $0)) next} 1' file2 file1
rat
You can simply do the following (note that comm requires both files to be sorted):
comm -2 -3 file1.txt file2.txt
and also:
diff -u file1.txt file2.txt
I know you were looking for a script, but I don't think there is any reason to write one; if you still want a script, you can just run these commands from it.
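Since comm expects sorted input and compares whole lines only (no substring matching), a sketch with process substitution (bash) looks like this; the sample data here is invented:

```shell
printf '%s\n' dog cat fish rat > file1.txt   # invented sample data
printf '%s\n' dog bear catfish > file2.txt
# -2 -3 (i.e. -23) suppresses lines unique to file2 and lines common to both,
# leaving only the lines unique to file1
comm -23 <(sort file1.txt) <(sort file2.txt)   # prints: cat, fish, rat
```

Note that "catfish" does not count as a match for "cat" or "fish" here, unlike the substring-aware grep and awk solutions.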
similar awk
$ awk 'NR==FNR{a[$0];next} {for(k in a) if(k~$0) next}1' file2 file1
rat

Unix: Split a file into two based on matched string

I want to split a file into two, but cannot find a way to do this.
Master.txt
Happy Birthday to you! [[#HAPPY]]
Stop it. [[#COMMAND]]
Make a U-turn. [[#COMMAND]]
I want to split into two files, with the 2nd file starting when it matches the regex pattern [[#
Output1.txt
Happy Birthday to you!
Stop it.
Make a U-turn.
Output2.txt
[[#HAPPY]]
[[#COMMAND]]
[[#COMMAND]]
I've tried using awk:
awk -v RS="[[#*" '{ print $0 > "temp" NR }'
but it doesn't give my desired output -- any help would be appreciated!
Here is one way with GNU awk:
awk -v RS='\\[\\[#|\n' 'NR%2{print $0>"Output1.txt";next}{print "[[#"$0>"Output2.txt"}' master
Test:
$ ls
master
$ cat master
Happy Birthday to you! [[#HAPPY]]
Stop it. [[#COMMAND]]
Make a U-turn. [[#COMMAND]]
$ awk -v RS='\\[\\[#|\n' 'NR%2{print $0>"Output1.txt";next}{print "[[#"$0>"Output2.txt"}' master
$ ls
master Output1.txt Output2.txt
$ head Out*
==> Output1.txt <==
Happy Birthday to you!
Stop it.
Make a U-turn.
==> Output2.txt <==
[[#HAPPY]]
[[#COMMAND]]
[[#COMMAND]]
A pure bash solution might be a little slower, but is very readable:
while read -r line; do
[[ $line =~ (.*)(\[\[#.*]]) ]]
printf "%s\n" "${BASH_REMATCH[1]}" >&3
printf "%s\n" "${BASH_REMATCH[2]}" >&4
done 3> output1.txt 4> output2.txt
you can write a small script like this:
#!/bin/ksh
sed -i -e 's/ \[\[#/,\[\[#/' $1
cut -d, -f1 $1 > $1.part1
cut -d, -f2 $1 > $1.part2
---------------------------------------------
or use a multi-command line:
# sed -i -e 's/ \[\[#/,\[\[#/' Master.txt ; cut -d, -f1 Master.txt > output1.txt ; cut -d, -f2 Master.txt > output2.txt
Simpler in sed, IMHO:
$ sed 's/^\([^[]*\).*/\1/' Master.txt > Output1.txt
$ sed 's/^[^[]*//' Master.txt > Output2.txt
sed -n 's/\[\[#/\
&/;P
/\n/ {s/.*\n//;H;}
$ {x;s/\n//;w Output2.txt
}' YourFile > Output1.txt
This does it in one sed invocation, but awk is better suited to this task.
This might work for you(GNU sed):
sed -n 's/\[\[#/\n&/;P;s/.*\n//w file3' file1 >file2
No need for GNU awk; this should work in any awk:
awk -F'\\[\\[#' '{print $1>"Output1.txt";print "[[#"$2>"Output2.txt"}' Master.txt
cat Output1.txt
Happy Birthday to you!
Stop it.
Make a U-turn.
cat Output2.txt
[[#HAPPY]]
[[#COMMAND]]
[[#COMMAND]]

Using the output of awk as the list of names in a for loop

How can I pass the output of awk to a for file in loop?
for file in awk '{print $2}' my_file; do echo $file done;
my_file contains the name of the files whose name should be displayed (echoed).
I get just a
>
instead of my normal prompt.
Use backticks or $(...) to substitute the output of a command:
for file in $(awk '{print $2}' my_file)
do
echo "$file"
done
for file in $(awk '{print $2}' my_file); do echo "$file"; done
The notation to use is $(...) or Command Substitution.
for file in $(awk '{print $2}' my_file)
do
echo $file
done
Where I assume that you do more in the body of the loop than just echo since you could then leave the loop out altogether:
awk '{print $2}' my_file
Or, if you miss typing semicolons and don't like to spread code over multiple lines for readability, then you can use:
for file in $(awk '{print $2}' my_file); do echo $file; done
You will also find in (mostly older) code the backticks used:
for file in `awk '{print $2}' my_file`
do
echo $file
done
Quite apart from being difficult to use in the Markdown used to format comments (and questions and answers) on Stack Overflow, the backticks are not as friendly, especially when nested, so you should recognize them and understand them but not use them.
Incidentally, the reason you got the > prompt is that this command line:
for file in awk '{print $2}' my_file; do echo $file done;
is missing a semicolon before the done. The shell was still waiting for the done. Had you typed done and return, you would have seen the output:
awk done
{print $2} done
my_file done
Using backticks or $(awk ...) for command substitution is an acceptable solution for a small number of files. However, consider using xargs for single commands or pipes, or a simple while read ... loop for more complex tasks (it works for simple ones too):
awk '...' |while read FILENAME; do
#do work with each file here using $FILENAME
done
This allows processing to begin as each file name arrives, instead of waiting for the whole awk script to complete, and it handles a larger set of file names (you can only pass so many arguments to a for x in ...; do loop). It will typically speed up your scripts and allow the same kinds of operations you would get in a for in loop, without its limitations.
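A minimal sketch of that while read pattern (the my_file contents and file names here are invented):

```shell
# hypothetical input: second column of my_file holds file names
printf 'a one.txt\nb two.txt\n' > my_file
awk '{print $2}' my_file | while read -r FILENAME; do
  echo "processing $FILENAME"    # do the real per-file work here
done
```

This prints "processing one.txt" and "processing two.txt", one line per file, as soon as awk emits each name.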

grep-ing multiple files

I want to grep multiple files in a directory and collect the output of each grep in a separate file. So if I grep 20 files, I should get 20 output files, each containing the searched item. Can anybody help me with this? Thanks.
Use a for statement:
for a in *.txt; do grep target "$a" > "$a.out"; done
just one gawk command
gawk '/target/ {print $0 > FILENAME".out"}' *.txt
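The same one-liner works in plain awk too if you parenthesize the redirect target; a quick check with invented files:

```shell
printf 'hit target\nmiss\n' > a.txt    # invented sample files
printf 'no match here\n'    > b.txt
awk '/target/ {print $0 > (FILENAME ".out")}' *.txt
cat a.txt.out    # prints: hit target  (b.txt.out is never created, since b.txt has no match)
```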
you can use just the shell, with no external commands:
for file in *.txt
do
while read -r line
do
case "$line" in
*pattern*) echo "$line" >> "${file%.txt}.out";;
esac
done < "$file"
done