Adding headers to sometimes empty CSV file via bash script - bash

I am trying to add a few headers to an occasionally empty CSV file (so sometimes the file will be empty and sometimes it will have data) via bash. So far I have tried using sed and awk.
Sed:
sed -i '1liAge,Sex,Location' $nameOfFile
Awk:
awk 'BEGIN{print "Age,Sex,Location"}1' $nameOfFile
Both of these do not appear to work, since no headers are added. When I echo $nameOfFile, csvInfo.csv is printed, so I don't believe there's anything wrong with $nameOfFile. Not sure where I'm going wrong here.

Your sed command requires there to already be at least one line in the file, so I don't think you can make that work reliably.
The Awk command you have should work as such, but it writes its results to standard output. If you have GNU Awk, you can use -i inplace to get behavior similar to sed -i; but the portable solution is to write the results to a temporary file, then move back on top of the old file.
awk 'BEGIN{print "Age,Sex,Location"}1' "$nameOfFile" >tmp
mv tmp "$nameOfFile"
See also When to wrap quotes around a shell variable.
If you want to avoid adding a header when it's already there, you can keep the BEGIN block, but suppress printing of the old header, if present:
awk 'BEGIN{print "Age,Sex,Location"}
NR>1 || $0 !~ /^Age,Sex,Location$/' "$nameOfFile" >tmp
mv tmp "$nameOfFile"
Proper hygiene around temporary files is somewhat more involved for a production script, but if this is just for your personal use, let's leave it at this. For more information, perhaps review Removing created temp files in unexpected bash exit.
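For a rough idea, that hygiene might be sketched like this (the sample data row is made up; mktemp and an EXIT trap do the heavy lifting):

```shell
# mktemp creates a unique temp file; the EXIT trap removes it even if
# the script is interrupted before the mv.
nameOfFile=csvInfo.csv
printf '25,F,Paris\n' > "$nameOfFile"    # made-up sample row for the demo
tmp=$(mktemp) || exit 1
trap 'rm -f "$tmp"' EXIT
awk 'BEGIN{print "Age,Sex,Location"}1' "$nameOfFile" > "$tmp" &&
mv "$tmp" "$nameOfFile"
```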

bash for with awk command inside

I have this piece of code in a bash script:
for file in "$(ls | grep .*.c)"
do
cat $file |awk '/.*open/{print $0}'|awk -v nomeprog=$file 'BEGIN{FS="(";printf "the file e %s with the open call:", nameprog}//{ print $2}'
done
this gives me this error:
awk: cmd. line:1: file.c
awk: cmd. line:1: ^ syntax error
I get this error when there is more than one .c file in the folder; with just one file it works.
Overall, you should probably follow Charles Duffy's recommendation to use more appropriate tools for the task. But I'd like to go over why the current script isn't working and how to fix it, as a learning exercise.
Also, two quick recommendations for shell script checking & troubleshooting: run your scripts through shellcheck.net to point out common mistakes, and when debugging put set -x before the problem section (and set +x after), so the shell will print out what it thinks is going on as the script runs.
The problem is due to how you're using the file variable. Let's look at what this does:
for file in "$(ls | grep .*.c)"
First, ls prints a list of files in the current directory, one per line. ls is really intended for interactive use, and its output can be ambiguous and hard to parse correctly; in a script, there are almost always better ways to get lists of filenames (and I'll show you one in a bit).
The output of ls is piped to grep .*.c, which is wrong in a number of ways. First, since that pattern contains a wildcard character ("*"), the shell will try to expand it into a list of matching filenames. If the directory contains any hidden (leading-".") .c files, the shell will replace the pattern with a list of those, and nothing is going to work right at all. Always quote the pattern argument to grep to prevent this.
But the pattern itself (".*.c") is also wrong; it searches for any number of arbitrary characters (".*"), followed by a single arbitrary character ("." -- this is in a regex, so "." is not treated literally), followed by a "c". And it searches for this anywhere in the line, so any filename that contains a "c" somewhere other than the first position will match. The pattern you want would be something like '[.]c$' (note that I wrapped it in single-quotes, so the shell won't try to treat $ as a variable reference like it would in double-quotes).
Then there's another problem, which is (part of) the problem you're actually experiencing: the output of that ls | grep is expanded in double-quotes. The double-quotes around it tell the shell not to do its usual word-split-and-wildcard-expand thing on the result. The common (but still wrong) thing to do here is to leave off the double-quotes, because word-splitting will probably break the list of filenames up into individual filenames, so you can iterate over them one-by-one. (Unless any filenames contain funny characters, in which case it can give weird results.) But with double-quotes it doesn't split them, it just treats the whole thing as one big item, so your loop runs once with file set to "src1.c\nsrc2.c\nsrc3.c" (where the \n's represent actual newlines).
This is the sort of trouble you can get into by parsing ls. Don't do it, just use a shell wildcard directly:
for file in *.c
This is much simpler, avoids all the confusion about regex pattern syntax vs wildcard pattern syntax, ambiguity in ls's output, etc. It's simple, clear, and it just works.
That's probably enough to get it working for you, but there are a couple of other things you really should fix if you're doing something like this. First, you should double-quote variable references (i.e. use "$file" instead of just $file). This is another part of the error you're getting; look at the second awk command:
awk -v nomeprog=$file 'BEGIN{FS="(";printf "the file e %s with the open call:", nameprog}//{ print $2}'
With file set to "src1.c\nsrc2.c\nsrc3.c", the shell will do its word-split-and-wildcard-expand thing on it, giving:
awk -v nomeprog=src1.c src2.c src3.c 'BEGIN{FS="(";printf "the file e %s with the open call:", nameprog}//{ print $2}'
awk will thus set its nomeprog variable to "src1.c", and then try to run "src2.c" as an awk program (on input files named "src3.c" and "BEGIN{FS=..."). "src2.c" is, of course, not a valid awk program, so you get a syntax error.
This sort of confusion is typical of the chaos that can result from unquoted variable references. Double-quote your variable references.
The other thing, which is much less important, is that you have a useless use of cat. Anytime you have the pattern:
cat somefile | somecommand
(and it's just a single file, not several that need to be concatenated together), you should just use:
somecommand <somefile
and in some cases like awk and grep, the command itself can take input filename(s) directly as arguments, so you can just use:
somecommand somefile
so in your case, rather than
cat "$file" | awk '/.*open/{print $0}' | awk -v nomeprog="$file" 'BEGIN{FS="(";printf "the file e %s with the open call:", nameprog}//{ print $2}'
I'd just use the following (note also that the awk program sets nomeprog but prints nameprog, which is never set; the names need to match):
awk '/.*open/{print $0}' "$file" | awk -v nomeprog="$file" 'BEGIN{FS="(";printf "the file e %s with the open call:", nomeprog}//{ print $2}'
(Although, as Charles Duffy pointed out, even that can be simplified quite a lot.)
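Putting those fixes together, the whole loop might look like the sketch below. The sample source file is made up so the sketch runs on its own, and as an extra simplification the two awk programs are merged into one:

```shell
# Hypothetical sample source file so the loop has something to process:
cat > sample1.c <<'EOF'
#include <fcntl.h>
int fd;
void f(void) {
    fd = open("data.txt", O_RDONLY);
}
EOF

# Wildcard instead of ls|grep, quoted "$file", one awk instead of cat|awk|awk:
for file in *.c; do
    awk -v nomeprog="$file" '
        BEGIN { FS = "("; printf "the file e %s with the open call: ", nomeprog }
        /open/ { print $2 }
    ' "$file"
done
```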

Prevent awk from interpreting variable contents?

I'm attempting to parse a make -n output to make sure only the programs I want to call are being called. However, awk tries to interpret the contents of the output and run (?) it. Errors are something like awk: fatal: Cannot find file 'make'. I have gotten around this by saving the output as a temporary file and then reading that into awk. However, I'm sure there's a better way; any suggestions?
EDIT: I'm using the output later in my script and would like to avoid saving a file to increase speed if possible.
Here's what isn't working:
my_input=$(make -n file)
my_lines=$(echo $my_input | awk '/bin/ { print $1 }') #also tried printf and cat
Here's what works but obviously takes longer than it has to because of writing the file:
make -n file > temp
my_lines=$(awk '/bin/ { print $1 }' temp)
Many thanks for your help!
You can parse the output directly as it is generated and save the result in a file:
make -n file | grep bin > result.out
If you really want to go for an overkill awk solution, change your second line in the following way:
my_lines="$(awk '/bin/ { print }' temp)"
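To avoid the temporary file altogether, you can also pipe the output straight into awk and capture the result. In this sketch, printf stands in for make -n so it is runnable on its own; in the real script the substitution would be my_lines=$(make -n file | awk '/bin/ { print $1 }'):

```shell
# printf fakes two lines of `make -n` output for the demo:
my_lines=$(printf '%s\n' '/usr/bin/gcc -c foo.c' 'echo done' |
           awk '/bin/ { print $1 }')
echo "$my_lines"    # prints /usr/bin/gcc
```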

sed delete lines from a logfile that respect numbers in another file

I have a logfile that is starting to grow in size, and I need to remove certain lines that match a given pattern from it. I used grep -nr for extracting the target lines and copied them in a temp file, but I can't figure how can I tell sed to delete those lines from the log file.
I have found something similar here: Delete line from text file with line numbers from another file but this doesn't actually delete the lines, it only prints the wanted output.
Can anyone give me a hint?
Thank you!
I think, what you really need is sed -i '/pattern/d' filename.
But to answer your question:
How to delete lines matching the line numbers from another file:
(Assuming that there are no special characters in the line_numbers file, just numbers one per line...)
awk 'NR==FNR{a[$0]=1; next}; !(FNR in a)' line_numbers input.log
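A quick runnable demonstration of that one-liner (file names and contents are made up):

```shell
printf '%s\n' 2 4 > line_numbers            # delete lines 2 and 4
printf '%s\n' a b c d e > input.log
# First pass records the line numbers; second pass skips those lines.
awk 'NR==FNR{a[$0]=1; next}; !(FNR in a)' line_numbers input.log
# prints:
# a
# c
# e
```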
If you already have a way of printing what you want to standard output, there's no reason why you can't just overwrite the original file. For example, to only print lines that don't match a pattern, you could use:
grep -v 'pattern' original > tmp && mv tmp original
This redirects the output of the grep command to a temporary file, then overwrites the original file. Any other solution that does this "in-place" is only pretending to do so, after all.
There are numerous other ways to do this, using sed as suggested in the comments, or awk:
awk '!/pattern/' original > tmp && mv tmp original
If you want to use sed and your file is growing continuously, then you will have to run sed -i '/REGEX/d' FILENAME periodically.
Instead, you can make use of syslog-ng. You just have to edit /etc/syslog-ng/syslog-ng.conf and create/edit an appropriate filter (something like: f_example { not match(REGEX); };), save the file, and restart the service.
Messages containing that particular pattern will then never be written to the log file. This way, your file not only stops growing, you also no longer need to process it periodically with sed or grep.
To remove a line with sed, you can do:
sed "${line}d" <originalLogF >tmpF
If you want to remove several lines, you can pass sed a script. Here I delete the first and second lines:
sed '1d;2d' <originalLogF >tmpF
If your log file is big, you will probably need two passes: the first generates the sed script in a file, and the second applies that sed script. But it will be more efficient to use only one pass if you are able to recognize the pattern directly (and not use "${line}d" at all). See Tom Fenech's or anishsane's answers; I think that is what you really need.
By the way, you have to preserve the inode (not only the file name), because most loggers keep the file open. So the final command (if you don't use sed -i) should be:
cat tmpF >originalLogF
By the way, the -i option of sed is not magic: sed writes to a temporary file, so if something appends to the log file concurrently, you can lose some lines.
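For completeness, the two-pass idea might be sketched like this (the pattern and file names are made up for the demo):

```shell
printf '%s\n' 'ok 1' 'ERROR bad' 'ok 2' > originalLogF    # demo data
# Pass 1: turn "grep -n" output ("2:ERROR bad") into a sed script ("2d").
grep -n 'ERROR' originalLogF | sed 's/:.*/d/' > deletions.sed
# Pass 2: apply the script, then copy back to preserve the inode.
sed -f deletions.sed originalLogF > tmpF
cat tmpF > originalLogF
```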

Read the n-th line of multiple files into a single output

I have some dump files called dump_mydump_0.cfg, dump_mydump_250.cfg, ..., all the way up to dump_mydump_40000.cfg. For each dump file, I'd like to take out the 16th line and collect all those lines into one single file.
I'm using sed, but I came across some syntax errors. Here's what I have so far:
for lineNo in 16 ;
for fileNo in 0,40000 ; do
sed -n "${lineNo}{p;q;}" dump_mydump_file${lineNo}.cfg >> data.txt
done
Considering your files are named with intervals of 250, you should get it working using:
for lineNo in 16; do
for fileNo in {0..40000..250}; do
sed -n "${lineNo}{p;q;}" dump_mydump_file${fileNo}.cfg >> data.txt
done
done
Note both the bash syntax corrections -- do, done, and {0..40000..250} --, and the input file name, that should depend on ${fileNo} instead of ${lineNo}.
Alternatively, with (GNU) awk:
awk "FNR==16{print;nextfile}" dump_mydump_{0..40000..250}.cfg > data.txt
(I used the filenames as shown in the OP as opposed to the ones which would have been generated by the bash for loop, if corrected to work. But you can edit as needed.)
The advantage is that you don't need the for loop, and you don't need to spawn a separate sed process for each of the 161 input files. But it's not a huge advantage.
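Here is a self-contained sketch of that approach, with three tiny generated files standing in for the real dumps, and the awk program in single quotes (safer than double quotes, which would let the shell expand anything that looks like a variable):

```shell
# Generate three 20-line stand-ins for the dump files:
for n in 0 250 500; do
    seq 1 20 | sed "s/^/file$n line /" > "dump_mydump_${n}.cfg"
done
# Print line 16 of each file, then skip straight to the next file:
awk 'FNR==16 { print; nextfile }' \
    dump_mydump_0.cfg dump_mydump_250.cfg dump_mydump_500.cfg > data.txt
cat data.txt
# file0 line 16
# file250 line 16
# file500 line 16
```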
This might work for you (GNU sed):
sed -ns '16wdata.txt' dump_mydump_{0..40000..250}.cfg

How to modify a file by using awk?

awk '/<ul>/ {ul++} ul == 6 { getline } 1' /var/www/html/INFOSEC/english/test.html
If I run this line of code, the shell does not modify the file; it only outputs the result to the shell. Can anyone help? Thanks!
The simplest solution is to just send the output to a file; you'll want to copy the input file first so that you aren't reading and writing the same file (redirecting awk's output to its own input file would truncate it before awk reads it).
cp test.html test.html.orig
awk 'your awk script here' test.html.orig >test.html
# and then optionally remove the copy:
rm test.html.orig
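Alternatively, if GNU awk 4.1+ is available, its -i inplace option edits the file directly, much like sed -i, so the copy/rename dance isn't needed. A sketch with a made-up three-line input, guarded in case gawk isn't installed:

```shell
printf '%s\n' '<ul>' 'one' '</ul>' > test.html      # tiny demo input
if command -v gawk >/dev/null 2>&1; then
    # Same program as in the question, but rewriting test.html in place:
    gawk -i inplace '/<ul>/ {ul++} ul == 6 { getline } 1' test.html
fi
```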
