How to get the output of awk into a tab-delimited file, merging every two lines into one? - bash

I have multiple files in gz format and used this script which counts lines in each file and prints 1/4 of lines for each file:
for file in *.gz; do
    echo "$file"
    gunzip -c "$file" | wc -l | awk '{print $1/4}'
done
STDOUT:
AB.gz
12
CD.gz
4
How can I pipe the output of awk into a tab-delimited file like this, merging two lines each time:
AB.gz 12
CD.gz 4
I tried paste by piping | paste -sd '\t' > output.txt in the script, but it didn't work.

You can use a script like this:
for file in *.gz; do
gzcat "$file" | awk -v fn="$file" -v OFS='\t' 'END{print fn, int(NR/4)}'
done
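Incidentally, the asker's paste idea also works: -s joins all lines into one, whereas `paste - -` pairs every two lines. A minimal demo with the same sample output:

```shell
# `paste - -` reads two lines at a time from stdin and joins them
# with a tab, turning alternating name/count lines into one record each.
printf '%s\n' AB.gz 12 CD.gz 4 | paste - -
# AB.gz	12
# CD.gz	4
```

In the original script this means appending `| paste - - > output.txt` after the loop's `done`.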

Do not echo a newline after the file name (a tab is printed instead, since the output should be tab-delimited):
for file in *.gz; do
    printf '%s\t' "$file"
    gunzip -c "$file" | wc -l | awk '{print $1/4}'
done

Related

Estimate number of lines in a file and insert that value as first line

I have many files for which I have to estimate the number of lines in each file and add that value as first line. To estimate that, I used something like this:
wc -l 000600.txt | awk '{ print $1 }'
However, I had no success doing this for all files and then adding the corresponding value as the first line of each file.
An example:
a.txt b.txt c.txt
>>print a
15
>> print b
22
>>print c
56
Then 15, 22 and 56 should be added respectively to: a.txt b.txt and c.txt
I appreciate the help.
You can add a placeholder pattern (for example LINENUM) as the first line of the file and then use the following script:
wc -l a.txt | awk 'BEGIN {FS =" ";} { print $1;}' | xargs -I {} sed -i 's/LINENUM/LINENUM:{}/' a.txt
or just use this script:
wc -l a.txt | awk 'BEGIN {FS =" ";} { print $1;}' | xargs -I {} sed -i '1s/^/LINENUM:{}\n/' a.txt
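For example, the second variant in action on a throwaway file (a sketch; GNU sed is assumed for -i, and `wc -l < file` is used so there is no filename column to strip):

```shell
cd "$(mktemp -d)"                      # work in a scratch directory
printf 'foo\nbar\n' > f.txt            # sample two-line file
# insert "LINENUM:<count>" as a new first line
wc -l < f.txt | xargs -I {} sed -i '1s/^/LINENUM:{}\n/' f.txt
head -1 f.txt                          # LINENUM:2
```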
This way you can add the line count as the first line of every *.txt file in the current directory. Using the group command below is also faster than the in-place editing commands above, in the case of large files. Do not modify the spaces or semicolons in the grouping.
for f in *.txt; do
{ wc -l < "$f"; cat "$f"; } > "${f}.tmp" && mv "${f}.tmp" "$f"
done
To iterate over all files, you can use this script:
for f in *; do if [ -f "$f" ]; then wc -l "$f" | awk '{ print $1 }' | xargs -I {} sed -i '1s/^/LINENUM:{}\n/' "$f"; fi; done
This might work for you (GNU sed):
sed -i '1h;1!H;$!d;=;x' file1 file2 file3 etc ...
Store each file in memory and insert the last line's line number, i.e. the file's length in lines, as the first line.
Alternative:
sed -i ':a;$!{N;ba};=' file?
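For reference, the same idea without -i on sample input (GNU sed; an explicit ; is added before the closing brace for safety). The N loop slurps the whole file, so when = runs, the line counter holds the total line count:

```shell
# :a;$!{N;ba;}  appends lines until the last one is read,
# then `=` prints the final line number (the line count)
# before the auto-printed file content.
printf 'a\nb\nc\n' | sed ':a;$!{N;ba;};='
# 3
# a
# b
# c
```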

Bash count non-commented lines and write to output with filename

I would like to count the non-commented lines in multiple files and append the result to an output file.
This is how I would count the non-commented lines for multiple files, but I don't know how to store the result together with the filename in an output.txt file.
for file in *txt
do
cat "$file" | sed '/^\s*#/d' | wc -l
done
You can write several things per line, and you can redirect the output of the whole loop to a file:
for file in *txt
do
printf '%s ' "$file"
sed '/^\s*#/d' "$file" | wc -l
done > output.txt
Also, you can shorten the file processing to:
grep -Ev '^\s*#' "$file" | wc -l
for file in *txt
do
sed '/^\s*#/d' "$file" | wc -l >> output.txt
done
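Putting the pieces together, a minimal end-to-end demo (the sample files and contents are made up; the results file is deliberately named counts.out, because a file named output.txt would itself match the *txt glob):

```shell
cd "$(mktemp -d)"                                # scratch directory
printf '# header\nfoo\nbar\n' > a.txt            # 2 non-commented lines
printf 'baz\n  # indented comment\n' > b.txt     # 1 non-commented line
for file in *txt
do
    # grep -c counts matching lines; -v inverts the match,
    # so this counts lines that are NOT comments
    printf '%s %s\n' "$file" "$(grep -cEv '^[[:space:]]*#' "$file")"
done > counts.out
cat counts.out
# a.txt 2
# b.txt 1
```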

Efficient way to get unique value from log file

There is a large log file of 10GB, and formatted as following:
node123`1493000000`POST /api/info`app_id=123&token=123&sign=abc
node456`1493000000`POST /api/info`app_id=456&token=456&sign=abc
node456`1493000000`POST /api/info`token=456&app_id=456&sign=abc
node456`1493000000`POST /api/info`token=456&sign=abc&app_id=456
Now I want to get unique app_ids from the log file. For example, the expected result of the log file above should be:
123
456
I do that with this shell script: awk -F 'app_id=' '{print $2}' $filename | awk -F '&' '{print $1}' | sort | uniq. Is there a more efficient way?
If your log file's name is log_file.txt, you can use this command:
grep -Po "(?<=app_id=)[0-9]+" log_file.txt | sort -u
Note that a fixed field position such as awk -F "[&=]" '{print $4}' is unreliable here, because the parameters can appear in any order.
Change the log file name to match yours (the awk field numbers below assume an Apache-style log):
awk '{print $17" "$18" "$19" "$20}' log.txt | sort -k1 | uniq >> z #apache
# for each extracted value, print it and count its occurrences in the log
while read -r x
do
echo "$x"
grep -c "$x" log.txt
done < z
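Since the parameters can appear in any order, one sketch that avoids fixed field positions is to split the backtick-delimited query-string field itself (the sample lines below mimic the question's format):

```shell
# Extract app_id regardless of its position among the &-separated parameters.
printf '%s\n' \
  'node123`1493000000`POST /api/info`app_id=123&token=123&sign=abc' \
  'node456`1493000000`POST /api/info`token=456&app_id=456&sign=abc' |
awk -F'`' '{
    n = split($4, kv, "&")             # $4 is the query string
    for (i = 1; i <= n; i++)
        if (kv[i] ~ /^app_id=/) {      # find the app_id parameter
            sub(/^app_id=/, "", kv[i]) # strip the key, keep the value
            print kv[i]
        }
}' | sort -u
# 123
# 456
```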

One line command with variable, word count and zcat

I have many files on a server which contains many lines:
201701010530.contentState.csv.gz
201701020530.contentState.csv.gz
201701030530.contentState.csv.gz
201701040530.contentState.csv.gz
I would like with one line command this result:
170033|20170101
169865|20170102
170010|20170103
170715|20170104
The goal is to have the number of lines of each file, just by keeping the date which is already in the filename of the file.
I tried this, but the result is on two lines instead of one...
for f in $(ls -1 2017*gz);do zcat $f | wc -l;echo $f | awk '{print substr($0,1,8)}';done
Thanks in advance guys.
Just use zcat file | wc -l to get the number of lines.
For the name, I understand it is enough to extract the first 8 characters:
$ t="201701030530.contentState.csv.gz"
$ echo "${t:0:8}"
20170103
All together:
for file in 2017*gz;
do
lines=$(zcat "$file" | wc -l)
printf "%s|%s\n" "$lines" "${file:0:8}"
done > myresult.csv
Note the usage of for file in 2017*gz; to go through the files matching the 2017*gz pattern: this suffices, no need to parse ls!
Use zgrep -c ^ file to count the lines, here encapsulated in awk:
$ awk 'FNR==1{ "zgrep -c ^ " FILENAME | getline s; print s "|" substr(FILENAME,1,8) }' *.gz
12|20170101
The command string "zgrep -c ^ " FILENAME should ideally be stored in a variable first, and then used as cmd | getline s, so that the pipe can be closed with close(cmd).
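That suggestion might look like this (a sketch; close(cmd) reaps each pipe before the next file is processed):

```shell
awk 'FNR==1{
    cmd = "zgrep -c ^ " FILENAME   # count lines in the compressed file
    cmd | getline s                # read the count into s
    close(cmd)                     # close the pipe before the next file
    print s "|" substr(FILENAME, 1, 8)
}' *.gz
```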

Shell script: count the copies of each line from a txt

I would like to count the copies of each line in a txt file, and I have tried many things so far, but none worked well. In my case the file has just one word on each line.
This was my last try
echo -n 'enter file for edit: '
read file
for line in $file ; do
echo 'grep -w $line $file'
done; <$file
For example:
input file
a
a
a
c
c
Output file
a 3
c 2
Thanks in advance.
$ sort < "$file" | uniq -c | awk '{print $2 " " $1}'
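For example, on the sample input from the question (sort groups identical lines so uniq -c can count them, and awk swaps the count and the word):

```shell
# sort: group identical lines together
# uniq -c: prefix each unique line with its count
# awk: reorder to "word count"
printf 'a\na\na\nc\nc\n' | sort | uniq -c | awk '{print $2 " " $1}'
# a 3
# c 2
```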
