Bash count non-commented lines and write to output with filename - bash

I would like to count the non-commented lines in multiple files and append the result to an output file
This is how I would count the non-commented lines for multiple files, but I don't know how to store the result together with the filename in an output.txt file.
for file in *txt
do
cat "$file" | sed '/^\s*#/d' | wc -l
done

You can write several things per line, and you can redirect the output of the whole loop to a file:
for file in *txt
do
echo -n $file' '
cat "$file" | sed '/^\s*#/d' | wc -l
done > output.txt
Also you can shorten the file processing down to:
egrep -v '^\s*#' "$file" | wc -l

for file in *txt
do
cat "$file" | sed '/^\s*#/d' | wc -l >> output.txt
done

Related

Estimate number of lines in a file and insert that value as first line

I have many files for which I have to estimate the number of lines in each file and add that value as first line. To estimate that, I used something like this:
wc -l 000600.txt | awk '{ print $1 }'
However, no success on how to do it for all files and then to add the value corresponding to each file as first line.
An example:
a.txt b.txt c.txt
>>print a
15
>> print b
22
>>print c
56
Then 15, 22 and 56 should be added respectively to: a.txt b.txt and c.txt
I appreciate the help.
You can add a pattern for example (LINENUM) in first line of file and then use the following script.
wc -l a.txt | awk 'BEGIN {FS =" ";} { print $1;}' | xargs -I {} sed -i 's/LINENUM/LINENUM:{}/' a.txt
or just use from this script:
wc -l a.txt | awk 'BEGIN {FS =" ";} { print $1;}' | xargs -I {} sed -i '1s/^/LINENUM:{}\n/' a.txt
This way you can add the line number as the first line for all *.txt files in current directory. Also using that group command here would be faster than inplace editing commands, in case of large files. Do not modify spaces or semicolons into the grouping.
for f in *.txt; do
{ wc -l < "$f"; cat "$f"; } > "${f}.tmp" && mv "${f}.tmp" "$f"
done
For iterate over the all file you can add use from this script.
for f in `ls *` ; do if [ -f $f ]; then wc -l $f | awk 'BEGIN {FS =" ";} { print $1;}' | xargs -I {} sed -i '1s/^/LINENUM:{}\n/' $f ; fi; done
This might work for you (GNU sed):
sed -i '1h;1!H;$!d;=;x' file1 file2 file3 etc ...
Store each file in memory and insert the last lines line number as the file size.
Alternative:
sed -i ':a;$!{N;ba};=' file?

How to get output of awk into a tab-delimited file merging two lines to a line every time?

I have multiple files in gz format and used this script which counts lines in each file and prints 1/4 of lines for each file:
for file in *.gz;
do echo $file;
gunzip -c $file | wc -l | awk '{print, $1/4}';
done
STDOUT:
AB.gz
12
CD.gz
4
How I can pipe outputs of awk into a tab-delimited file like this merging two lines each time:
AB.gz 12
CD.gz 4
I tried paste by piping | paste -sd '\t' > output.txt in the script but it didn't work.
You can use a script like this:
for file in *.gz; do
gzcat "$file" | awk -v fn="$file" -v OFS='\t' 'END{print fn, int(NR/4)}'
done
Do not echo a newline after the file:
for file in *.gz;
do
printf "%s " "${file}"
gunzip -c $file | wc -l | awk '{print, $1/4}';
done

One line command with variable, word count and zcat

I have many files on a server which contains many lines:
201701010530.contentState.csv.gz
201701020530.contentState.csv.gz
201701030530.contentState.csv.gz
201701040530.contentState.csv.gz
I would like with one line command this result:
170033|20170101
169865|20170102
170010|20170103
170715|20170104
The goal is to have the number of lines of each file, just by keeping the date which is already in the filename of the file.
I tried this but the result is not in one line but two...
for f in $(ls -1 2017*gz);do zcat $f | wc -l;echo $f | awk '{print substr($0,1,8)}';done
Thanks in advance guys.
Just use zcat file | wc -l to get the number of lines.
For the name, I understand it is enough to extract the first 8 characters:
$ t="201701030530.contentState.csv.gz"
$ echo "${t:0:8}"
20170103
All together:
for file in 2017*gz;
do
lines=$(zcat "$file" | wc -l)
printf "%s|%s\n" "$lines" "${file:0:8}"
done > myresult.csv
Note the usage of for file in 2017*gz; to go through the files matching the 2017*gz pattern: this suffices, no need to parse ls!
Use zgrep -c ^ file to count the lines, here encapsulated in awk:
$ awk 'FNR==1{ "zgrep -c ^ " FILENAME | getline s; print s "|" substr(FILENAME,1,8) }' *.gz
12|20170101
The whole "zgrep -c ^ " FILENAME should probably be in a var (s) and then s | getline s.

BASH Grep for Specific Email Address in CSV

I'm trying to compare two CSV files by reading the first line-by-line and grepping the second file for a match. Using Diff is not a viable solution. I seem to be having a problem with having the email address stored as a variable when I grep the second file.
#!/bin/bash
LANG=C
head -2 $1 | tail -1 | while read -r line; do
line=$( echo $line | sed 's/\n//g' )
echo $line
cat $2 | cut -d',' -f1 | grep -iF "$line"
done
Variable $line contains an email address that DOES exist in file $2, but I'm not getting any results.
What am I doing wrong?
File1
Email
email#verizon.net
email#gmail.com
email#yahoo.com
File2
email,,,,
email#verizon.net,,,,
email#gmail.com,,,,
email#yahoo.com,,,,
Given:
# csv_0.csv
email
me#me.com
you#me.com
fee#me.com
and
# csv_1.csv
email,foo,bar,baz,bim
bee#me.com,3,2,3,4
me#me.com,4,1,1,32
you#me.com,7,4,6,6
gee#me.com,1,2,2,6
me#me.com,5,7,2,34
you#me.com,22,3,2,33
I ran
$ pattern=$(head -2 csv_0.csv | tail -1 | sed s/,.*//g)
$ grep $pattern csv_1.csv
me#me.com,4,1,1,32
me#me.com,5,7,2,34
To do this for each line in csv_0.csv
#!/bin/bash
LANG=C
filename="$1"
{
read # don't read csv headers
while read line
do
pattern=$(echo $line | sed s/,.*//g)
grep $pattern $2
done
} <"$filename"
Then
$ ./csv_read.sh csv_2.csv csv_3.csv
me#me.com,4,1,1,32
me#me.com,5,7,2,34
you#me.com,7,4,6,6
you#me.com,22,3,2,33

awk parse filename and add result to the end of each line

I have number of files which have similar names like
DWH_Export_AUSTA_20120701_20120731_v1_1.csv.397.dat.2012-10-02 04-01-46.out
DWH_Export_AUSTA_20120701_20120731_v1_2.csv.397.dat.2012-10-02 04-03-12.out
DWH_Export_AUSTA_20120801_20120831_v1_1.csv.397.dat.2012-10-02 04-04-16.out
etc.
I need to get number before .csv(1 or 2) from the file name and put it into end of every line in file with TAB separator.
I have written this code, it finds number that I need, but i do not know how to put this number into file. There is space in the filename, my script breaks because of it.
Also I am not sure, how to send to script list of files. Now I am working only with one file.
My code:
#!/bin/sh
string="DWH_Export_AUSTA_20120701_20120731_v1_1.csv.397.dat.2012-10-02 04-01-46.out"
out=$(echo $string | awk 'BEGIN {FS="_"};{print substr ($7,0,1)}')
awk ' { print $0"\t$out" } ' $string
for file in *
do
sfx=$(echo "$file" | sed 's/.*_\(.*\).csv.*/\1/')
sed -i "s/$/\t$sfx/" "$file"
done
Using sed:
$ sed 's/.*_\(.*\).csv.*/&\t\1/' file
DWH_Export_AUSTA_20120701_20120731_v1_1.csv.397.dat.2012-10-02 04-01-46.out 1
DWH_Export_AUSTA_20120701_20120731_v1_2.csv.397.dat.2012-10-02 04-03-12.out 2
DWH_Export_AUSTA_20120801_20120831_v1_1.csv.397.dat.2012-10-02 04-04-16.out 1
To make this for many files:
sed 's/.*_\(.*\).csv.*/&\t\1/' file1 file2 file3
OR
sed 's/.*_\(.*\).csv.*/&\t\1/' file*
To make this changed get saved in the same file(If you have GNU sed):
sed -i 's/.*\(.\).csv.*/&\t\1/' file
Untested, but this should do what you want (extract the number before .csv and append that number to the end of every line in the .out file)
awk 'FNR==1 { split(FILENAME, field, /[_.]/) }
{ print $0"\t"field[7] > FILENAME"_aaaa" }' *.out
for file in *_aaaa; do mv "$file" "${file/_aaaa}"; done
If I understood correctly, you want to append the number from the filename to every line in that file - this should do it:
#!/bin/bash
while [[ 0 < $# ]]; do
num=$(echo "$1" | sed -r 's/.*_([0-9]+).csv.*/\t\1/' )
#awk -e "{ print \$0\"\t${num}\"; }" < "$1" > "$1.new"
#sed -r "s/$/\t$num/" < "$1" > "$1.mew"
#sed -ri "s/$/\t$num/" "$1"
shift
done
Run the script and give it names of the files you want to process. $# is the number of command line arguments for the script which is decremented at the end of the loop by shift, which drops the first argument, and shifts the other ones. Extract the number from the filename and pick one of the three commented lines to do the appending: awk gives you more flexibility, first sed creates new files, second sed processes them in-place (in case you are running GNU sed, that is).
Instead of awk, you may want to go with sed or coreutils.
Grab number from filename, with grep for variety:
num=$(<<<filename grep -Eo '[^_]+\.csv' | cut -d. -f1)
<<<filename is equivalent to echo filename.
With sed
Append num to each line with GNU sed:
sed "s/\$/\t$num" filename
Use the -i switch to modify filename in-place.
With paste
You also need to know the length of the file for this method:
len=$(<filename wc -l)
Combine filename and num with paste:
paste filename <(seq $len | while read; do echo $num; done)
Complete example
for filename in DWH_Export*; do
num=$(echo $filename | grep -Eo '[^_]+\.csv' | cut -d. -f1)
sed -i "s/\$/\t$num" $filename
done

Resources