awk execute same command on different files one by one - bash

Hi I have 30 txt files in a directory which are containing 4 columns.
How can I execute a same command on each file one by one and direct output to different file.
The command I am using is as below but its being applied on all the files and giving single output. All i want is to call each file one by one and direct outputs to a new file.
start=$1
patterns=''
for i in $(seq -43 -14); do
patterns="$patterns /cygdrive/c/test/kpi/SIGTRAN_Load_$(exec date '+%Y%m%d' --date="-${i} days ${start}")*"; done
cat /cygdrive/c/test/kpi/*$patterns | sed -e "s/\t/,/g" -e "s/ /,/g"| awk -F, 'a[$3]<$4{a[$3]=$4} END {for (i in a){print i FS a[i]}}'| sed -e "s/ /0/g"| sort -t, -k1,2> /cygdrive/c/test/kpi/SIGTRAN_Load.csv

Sth like this
for fileName in /path/to/files/foo*.txt
do
mangleFile "$fileName"
done
will mangle a list of files you give via globbing. If you want to generate the file name patterns as in your example, you can do it like this:
for i in $(seq -43 -14)
do
for fileName in /cygdrive/c/test/kpi/SIGTRAN_Load_"$(exec date '+%Y%m%d' --date="-${i} days ${start}")"*
do
mangleFile "$fileName"
done
done
This way the code stays much more readable, even if shorter solutions may exist.
The mangleFile of course then will be the awk call or whatever you would like to do with each file.

Use the following idiom:
for file in *
do
./your_shell_script_containing_the_above.sh $file > some_unique_id
done

You need to run a loop on all the matching files:
for i in /cygdrive/c/test/kpi/*$patterns; do
tr '[:space:]\n' ',\n' < "$i" | awk -F, 'a[$3]<$4{a[$3]=$4} END {for (i in a){print i FS a[i]}}'| sed -e "s/ /0/g"| sort -t, -k1,2 > "/cygdrive/c/test/kpi/SIGTRAN_Load-$i.csv"
done
PS: I haven't tried much to refactor your piped commands that can probably be shortened too.

Related

Combine multiple files into one including the file name

I have been looking around trying to combine multiple text files into including the name of the file.
My current file content is:
1111,2222,3333,4444
What I'm after is:
File1,1111,2222,3333,4444
File1,1111,2222,3333,4445
File1,1111,2222,3333,4446
File1,1111,2222,3333,4447
File2,1111,2222,3333,114444
File2,1111,2222,3333,114445
File2,1111,2222,3333,114446
I found multiple example how to combine them all but nothing to combine them including the file name.
Could you please try following. Considering that your Input_file names extensions are .csv.
awk 'BEGIN{OFS=","} {print FILENAME,$0}' *.csv > output_file
After seeing OP's comments if file extensions are .txt then try:
awk 'BEGIN{OFS=","} {print FILENAME,$0}' *.txt > output_file
Assuming all your files have a .txt extension and contain only one line as in the example, you can use the following code:
for f in *.txt; do echo "$f,$(cat "$f")"; done > output.log
where output.log is the output file.
Well, it works:
printf "%s\n" *.txt |
xargs -n1 -d $'\n' bash -c 'xargs -n1 -d $'\''\n'\'' printf "%s,%s\n" "$1" <"$1"' --
First output a newline separated list of files.
Then for each file xargs execute sh
Inside sh execute xargs for each line of file
and it executes printf "%s,%s\n" <filename> for each line of input
Tested in repl.
Solved using grep "" *.txt -I > $filename.

Concatenate files based on numeric sort of name substring in awk w/o header

I am interested in concatenate many files together based on the numeric number and also remove the first line.
e.g. chr1_smallfiles then chr2_smallfiles then chr3_smallfiles.... etc (each without the header)
Note that chr10_smallfiles needs to come after chr9_smallfiles -- that is, this needs to be numeric sort order.
When separate the two command awk and ls -v1, each does the job properly, but when put them together, it doesn't work. Please help thanks!
awk 'FNR>1' | ls -v1 chr*_smallfiles > bigfile
The issue is with the way that you're trying to pass the list of files to awk. At the moment, you're piping the output of awk to ls, which makes no sense.
Bear in mind that, as mentioned in the comments, ls is a tool for interactive use, and in general its output shouldn't be parsed.
If sorting weren't an issue, you could just use:
awk 'FNR > 1' chr*_smallfiles > bigfile
The shell will expand the glob chr*_smallfiles into a list of files, which are passed as arguments to awk. For each filename argument, all but the first line will be printed.
Since you want to sort the files, things aren't quite so simple. If you're sure the full range of files exist, just replace chr*_smallfiles with chr{1..99}_smallfiles in the original command.
Using some Bash-specific and GNU sort features, you can also achieve the sorting like this:
printf '%s\0' chr*_smallfiles | sort -z -n -k1.4 | xargs -0 awk 'FNR > 1' > bigfile
printf '%s\0' prints each filename followed by a null-byte
sort -z sorts records separated by null-bytes
-n -k1.4 does a numeric sort, starting from the 4th character (the numeric part of the filename)
xargs -0 passes the sorted, null-separated output as arguments to awk
Otherwise, if you want to go through the files in numerical order, and you're not sure whether all the files exist, then you can use a shell loop (although it'll be significantly slower than a single awk invocation):
for file in chr{1..99}_smallfiles; do # 99 is the maximum file number
[ -f "$file" ] || continue # skip missing files
awk 'FNR > 1' "$file"
done > bigfile
You can also use tail to concatenate all the files without header
tail -q -n+2 chr*_smallfiles > bigfile
In case you want to concatenate the files in a natural sort order as described in your quesition, you can pipe the result of ls -v1 to xargs using
ls -v1 chr*_smallfiles | xargs -d $'\n' tail -q -n+2 > bigfile
(Thanks to Charles Duffy) xargs -d $'\n' sets the delimiter to a newline \n in case the filename contains white spaces or quote characters
Using a bash 4 associative array to extract only the numeric substring of each filename; sort those individually; and then retrieve and concatenate the full names in the resulting order:
#!/usr/bin/env bash
case $BASH_VERSION in ''|[123].*) echo "Requires bash 4.0 or newer" >&2; exit 1;; esac
# when this is done, you'll have something like:
# files=( [1]=chr_smallfiles1.txt
# [10]=chr_smallfiles10.txt
# [9]=chr_smallfiles9.txt )
declare -A files=( )
for f in chr*_smallfiles.txt; do
files[${f//[![:digit:]]/}]=$f
done
# now, emit those indexes (1, 10, 9) to "sort -n -z" to sort them as numbers
# then read those numbers, look up the filenames associated, and pass to awk.
while read -r -d '' key; do
awk 'FNR > 1' <"${files[$key]}"
done < <(printf '%s\0' "${!files[#]}" | sort -n -z) >bigfile
You can do with a for loop like below, which is working for me:-
for file in chr*_smallfiles
do
tail +2 "$file" >> bigfile
done
How will it work? For loop read all the files from current directory with wild chard character * chr*_smallfiles and assign the file name to variable file and tail +2 $file will output all the lines of that file except the first line and append in file bigfile. So finally all files will be merged (accept the first line of each file) into one i.e. file bigfile.
Just for completeness, how about a sed solution?
for file in chr*_smallfiles
do
sed -n '2,$p' $file >> bigfile
done
Hope it helps!

How to capture first column values of a command?

I am new to shell scripting. I am trying to write a script that is suppose to run a command and use for loop to capture first column of the output and do further processing.
command: tst get files
output of this command is something like
NAME COUNT ADMIN
FileA.txt 30 adminA
FileB.txt 21 local
FileC.txt 9 local
FileD.txt 90 adminA
Here is what I have tried so far : UPDATED also want to run additional commands
#!/bin/bash
for f in $(tst get files)
do
echo "FILE :[${f}]"
tst setprimary ${f} && tst get dataload
done
the output I am seeing is something like
FILE :[NAME]
FILE :[COUNT]
FILE :[ADMIN]
FILE :[FileA.txt]
FILE :[30]
FILE :[adminA]
FILE :[FileB.txt]
FILE :[21]
FILE :[local]
FILE :[FileC.txt]
FILE :[9]
FILE :[local]
FILE :[FileD.txt]
FILE :[90]
FILE :[adminA]
I am looking for an output something like
FILE :[FileA.txt]
FILE :[FileB.txt]
FILE :[FileC.txt]
FILE :[FileD.txt]
What should I modify in the shell script to only capture NAME column values? Am I executing the tst get files command correctly in the for loop or is there a better way to execute a command and loop thru the results?
EDIT (Samuel Kirschner): you can do without the for loop entirely and just use awk to print the lines you're interested in
tst get files | awk 'NR > 1 {print "FILE :[" $1 "]"}'
If you want to keep the for loop for some reason and just extract the file name from the lines while skipping the header, you have a few choices. Awk is probably the easiest because of the NR builtin variable (which counts lines) and automatic field-splitting ($1 refers to the first field in the line, for instance), but you can use sed and cut as well.
You can use awk 'NR > 1 {print $1}' to get the first column (using any whitespace character as a delimiter while skipping the first line) or sed 1d | cut -d$'\t' -f1. Note that $'\t' is bash-specific syntax for a literal tab character, if your file is padded with spaces rather than using tabs to delimit fields, you can't use the sed ... | cut ... example.
i.e.
#!/bin/bash
for f in $(tst get files | awk 'NR > 1 {print $1}')
do
echo "FILE :[${f}]"
done
or
#!/bin/bash
for f in $(tst get files | sed 1d | cut -d$'\t' -f1)
do
echo "FILE :[${f}]"
done
to avoid unnecessary splitting in the for loop. It's best to set IFS to something specific outside the loop body to prevent 'a file with whitespace.txt' from being broken up.
OLD_IFS=IFS
IFS=$'\n\t'
for f in $(tst get files | sed 1d | cut -d$'\t' -f1)
do
echo "FILE :[${f}]"
done
You can just do:
tst get files | awk 'NR > 1 { printf "FILE :[%s]\n", $1 }'
Update: To answer extended problem as per comments below by OP:
while read -r file _; do
tst setprimary "$file" && tst get dataload
done < <(tst get files)
Or perl:
tst ... | perl -lanE 'say "File: [$F[0]]" if $.>1'
the variable $. contains the current line number

How to increment number in a file

I have one file with the date like below,let say file name is file1.txt:
2013-12-29,1
Here I have to increment the number by 1, so it should be 1+1=2 like..
2013-12-29,2
I tried to use 'sed' to replace and must be with variables only.
oldnum=`cut -d ',' -f2 file1.txt`
newnum=`expr $oldnum + 1`
sed -i 's\$oldnum\$newnum\g' file1.txt
But I get an error from sed syntax, is there any way for this. Thanks in advance.
Sed needs forward slashes, not back slashes. There are multiple interesting issues with your use of '\'s actually, but the quick fix should be (use double quotes too, as you see below):
oldnum=`cut -d ',' -f2 file1.txt`
newnum=`expr $oldnum + 1`
sed -i "s/$oldnum\$/$newnum/g" file1.txt
However, I question whether sed is really the right tool for the job in this case. A more complete single tool ranging from awk to perl to python might work better in the long run.
Note that I used a $ end-of-line match to ensure you didn't replace 2012 with 2022, which I don't think you wanted.
usually I would like to use awk to do jobs like this
following is the code might work
awk -F',' '{printf("%s\t%d\n",$1,$2+1)}' file1.txt
Here is how to do it with awk
awk -F, '{$2=$2+1}1' OFS=, file1.txt
2013-12-29,2
or more simply (this will file if value is -1)
awk -F, '$2=$2+1' OFS=, file1.txt
To make a change to the change to the file, save it somewhere else (tmp in the example below) and then move it back to the original name:
awk -F, '{$2=$2+1}1' OFS=, file1.txt >tmp && mv tmp file1.txt
Or using GNU awk, you can do this to skip temp file:
awk -i include -F, '{$2=$2+1}1' OFS=, file1.txt
Another, single line, way would be
expr cat /tmp/file 2>/dev/null + 1 >/tmp/file
this works if the file doesn't exist or if the file doesnt contain a valid number - in both cases the file is (re)created with a value of 1
awk is the best for your problem, but you can also do the calculation in shell
In case you have more than one rows, I am using loop here
#!/bin/bash
IFS=,
while read DATE NUM
do
echo $DATE,$((NUM+1))
done < file1.txt
Bash one liner option with BC. Sample:
$ echo 3 > test
$ echo 1 + $(<test) | bc > test
$ cat test
4
Also works:
bc <<< "1 + $(<test)" > test

"while read LINE do" and grep problems

I have two files.
file1.txt:
Afghans
Africans
Alaskans
...
where file2.txt contains the output from a wget on a webpage, so it's a big sloppy mess, but does contain many of the words from the first list.
Bashscript:
cat file1.txt | while read LINE; do grep $LINE file2.txt; done
This did not work as expected. I wondered why, so I echoed out the $LINE variable inside the loop and added a sleep 1, so i could see what was happening:
cat file1.txt | while read LINE; do echo $LINE; sleep 1; grep $LINE file2.txt; done
The output looks in terminal looks something like this:
Afghans
Africans
Alaskans
Albanians
Americans
grep: Chinese: No such file or directory
: No such file or directory
Arabians
Arabs
Arabs/East Indians
: No such file or directory
Argentinans
Armenians
Asian
Asian Indians
: No such file or directory
file2.txt: Asian Naruto
...
So you can see it did finally find the word "Asian". But why does it say:
No such file or directory
?
Is there something weird going on or am I missing something here?
What about
grep -f file1.txt file2.txt
#OP, First, use dos2unix as advised. Then use awk
awk 'FNR==NR{a[$1];next}{ for(i=1;i<=NF;i++){ if($i in a) {print $i} } } ' file1 file2_wget
Note: using while loop and grep inside the loop is not efficient, since for every iteration, you need to invoke grep on the file2.
#OP, crude explanation:
For meaning of FNR and NR, please refer to gawk manual. FNR==NR{a[1];next} means getting the contents of file1 into array a. when FNR is not equal to NR (which means reading the 2nd file now), it will check if each word in the file is in array a. If it is, print out. (the for loop is used to iterate each word)
Use more quotes and use less cat
while IFS= read -r LINE; do
grep "$LINE" file2.txt
done < file1.txt
As well as the quoting issue, the file you've downloaded contains CRLF line endings which are throwing read off. Use dos2unix to convert file1.txt before iterating over it.
Although usng awk is faster, grep produces a lot more details with less effort. So, after issuing dos2unix use:
grep -F -i -n -f <file_containing_pattern> <file_containing_data_blob>
You will have all the matches + line numbers (case insensitive)
At minimum this will suffice to find all the words from file_containing_pattern:
grep -F -f <file_containing_pattern> <file_containing_data_blob>

Resources