Pattern match in a column in bash

I want to grab the rows containing Subject01, Subject02, ... Subject50 from the text file and write each subject's rows to a separate file. Below is the code I tried, and the output files come out empty. Can anyone tell me what I am doing wrong?
Subject01/path/here 4
Subject01/path/here 1
Subject02/path/here 3
Subject03/path/here 5
Subject03/path/here 6
...
So one of the output files would look like this:
Subject03/path/here 5
Subject03/path/here 6
Here is the code I tried; it fails:
#!/bin/sh
subject=Subject
for i in {01..50}
do
    awk '{ if ($1 == "${subject}${i}") { print } }' output-0 > output-0-sub-$i
done

You can simply use grep for that:
for f in {01..50}; do
    grep "Subject$f" inputFile.txt >> "output-0-sub-$f"
    if [[ ! -s "output-0-sub-$f" ]] ; then
        rm "output-0-sub-$f"
    fi
done
The if condition is checking if the file is empty and if so, it is deleted.
You could also use a -f test to check whether the file exists first, but that depends on how your script works.
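As for why the original awk produced empty files: shell variables are not expanded inside single quotes, so awk compares $1 against the literal string "${subject}${i}"; and even with expansion, $1 is the whole first field (e.g. Subject01/path/here), so an equality test can never match. Note also that {01..50} is a bash feature, so the shebang should be #!/bin/bash rather than #!/bin/sh. A minimal sketch of a corrected loop, assuming the subject ID is always followed by a slash:
#!/bin/bash
subject=Subject
for i in {01..50}
do
    # -v passes the shell value into awk; test it as a prefix of field 1
    awk -v pre="${subject}${i}" 'index($1, pre "/") == 1' output-0 > "output-0-sub-$i"
done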

Related

Find words in multiple files and sort in another

Need help with "printf" and "for" loop.
I have individual files, each named after a user (e.g. john.txt, david.txt), containing the various commands that user ran. Examples of commands are SUCCESS, TERMINATED, FAIL, etc. The files have multiple lines of varying text, but each line contains one of the commands (one command per line).
Sample:
command: sendevent "-F" "SUCCESS" "-J" "xxx-ddddddddddddd"
command: sendevent "-F" "TERMINATED" "-J" "xxxxxxxxxxx-dddddddddddddd"
I need to go through each file, count the number of each command and put it in another output file in this format:
==== John ====
SUCCESS - 3
TERMINATED - 2
FAIL - 4
TOTAL 9
==== David ====
SUCCESS - 1
TERMINATED - 1
FAIL - 2
TOTAL 4
P.S. This code can be made more compact (e.g. there is no need for so many echo calls), but the following structure is used to make it clear what's happening:
ls | grep .txt | sed 's/.txt//' > names
for s in $(cat names)
do
    suc=$(grep "SUCCESS" "$s.txt" | wc -l)
    termi=$(grep "TERMINATED" "$s.txt" | wc -l)
    fail=$(grep "FAIL" "$s.txt" | wc -l)
    echo "=== $s ===" >> docs
    echo "SUCCESS - $suc" >> docs
    echo "TERMINATED - $termi" >> docs
    echo "FAIL - $fail" >> docs
    echo "TOTAL $(($termi+$fail+$suc))" >> docs
done
Output from my test files looked like:
===new===
SUCCESS - 0
TERMINATED - 0
FAIL - 0
TOTAL 0
===vv===
SUCCESS - 0
TERMINATED - 0
FAIL - 0
TOTAL 0
Based on karafka's suggestion, instead of building the names file for the for loop you can iterate over the glob directly:
for f in *.txt
do
    something
    # in order to print the required name without the .txt you can do:
    printf "%s\n" "${f::(-4)}"
done
awk to the rescue!
$ awk -v OFS=" - " '
    function pr() {s=0; for (k in a) {s += a[k]; print k, a[k]}; print "\nTOTAL " s "\n\n\n"}
    NR!=1 && FNR==1 {pr(); delete a}
    FNR==1 {print "==== " FILENAME " ===="}
    {a[$4]++}
    END {pr()}' file1 file2 ...
If your input file is not structured (the key is not always in the fourth field), you can do the same with a pattern match.
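For example, a minimal sketch of that pattern-match variant, assuming the keyword can appear anywhere on the line:
$ awk -v OFS=" - " '
    function pr() {s=0; for (k in a) {s += a[k]; print k, a[k]}; print "\nTOTAL " s "\n\n\n"}
    NR!=1 && FNR==1 {pr(); delete a}
    FNR==1 {print "==== " FILENAME " ===="}
    match($0, /SUCCESS|TERMINATED|FAIL/) {a[substr($0, RSTART, RLENGTH)]++}
    END {pr()}' file1 file2 ...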

Make the cat command operate recursively, looping through a directory

I have a large directory of data files which I am in the process of manipulating to get them in a desired format. They each begin and end 15 lines too soon, meaning I need to strip the first 15 lines off one file and paste them to the end of the previous file in the sequence.
To begin, I have written the following code to separate the relevant data into easy chunks:
#!/bin/bash
destination='media/user/directory/'
for file1 in `ls $destination*.ascii`
do
    echo $file1
    file2="${file1}.end"
    file3="${file1}.snip"
    sed -e '16,$d' $file1 > $file2
    sed -e '1,15d' $file1 > $file3
done
This worked perfectly, so the next step is the world's simplest cat command:
cat $file3 $file2 > outfile
However, what I need to do is stitch file2 to the previous file3.
See how these files are all sequential over time:
*_20090412T235945_20090413T235944_* ### April 13
*_20090413T235945_20090414T235944_* ### April 14
So I need to take the 15 lines snipped off the April 14 example above and paste it to the end of the April 13 example.
This doesn't have to be part of the original code; in fact, it would probably be best if it weren't. I was just hoping someone would be able to help me get this going.
Thanks in advance! If there is anything I have been unclear about and needs further explanation please let me know.
"I need to strip the first 15 lines off one file and paste them to the end of the previous file in the sequence."
If I understand what you want correctly, it can be done with one line of code:
awk 'NR==1 || FNR==16{close(f); f=FILENAME ".new"} {print>f}' file1 file2 file3
When this has run, the files file1.new, file2.new, and file3.new will be in the new form with the lines transferred. Of course, you are not limited to three files: you may specify as many as you like on the command line.
Example
To keep our example short, let's just strip the first 2 lines instead of 15. Consider these test files:
$ cat file1
1
2
3
$ cat file2
4
5
6
7
8
$ cat file3
9
10
11
12
13
14
15
Here is the result of running our command:
$ awk 'NR==1 || FNR==3{close(f); f=FILENAME ".new"} {print>f}' file1 file2 file3
$ cat file1.new
1
2
3
4
5
$ cat file2.new
6
7
8
9
10
$ cat file3.new
11
12
13
14
15
As you can see, the first two lines of each file have been transferred to the preceding file.
How it works
awk implicitly reads each file line-by-line. The job of our code is to choose which new file a line should be written to based on its line number. The variable f will contain the name of the file that we are writing to.
NR==1 || FNR==16 {close(f); f=FILENAME ".new"}
When we are reading the first line of the first file, NR==1, or the 16th line of whichever file we are currently on, FNR==16, we close the previous output file and set f to the name of the current file with .new appended.
For the short example, which transferred 2 lines instead of 15, we used the same code but with FNR==16 replaced with FNR==3.
print>f
This prints the current line to file f.
(If this were a shell script, we would use >>. This is not a shell script. This is awk.)
Using a glob to specify the file names
destination='media/user/directory/'
awk 'NR==1 || FNR==16{close(f); f=FILENAME ".new"} {print>f}' "$destination"*.ascii
Your task is not that difficult at all. You want to gather a list of all _end files in the directory (using a for loop and globbing, NOT looping over the output of ls). Once you have all the end files, you simply parse the dates out of each filename using parameter expansion with substring removal, say into d1 and d2 for date1 and date2 in:
stuff_20090413T235945_20090414T235944_end
      |  d1  |        |  d2  |
Then you simply subtract 1 from d1 into, say, date0 or d0, and construct the previous filename out of d0 and d1, using _snip instead of _end. Then just test for the existence of that previous _snip filename and, if it exists, paste your info from the current _end file to the previous _snip file, e.g.:
#!/bin/bash
for i in *end; do            ## find all _end files
    d1="${i#*stuff_}"        ## isolate first date in filename
    d1="${d1%%T*}"
    d2="${i%T*}"             ## isolate second date
    d2="${d2##*_}"
    d0=$((d1 - 1))           ## subtract 1 from first date to get the snip d1
    prev="${i/$d1/$d0}"      ## create previous 'snip' filename
    prev="${prev/$d2/$d1}"
    prev="${prev%end}snip"
    if [ -f "$prev" ]        ## test that prev snip file exists
    then
        printf "paste to : %s\n" "$prev"
        printf "    from : %s\n\n" "$i"
    fi
done
Test Input Files
$ ls -1
stuff_20090413T235945_20090414T235944_end
stuff_20090413T235945_20090414T235944_snip
stuff_20090414T235945_20090415T235944_end
stuff_20090414T235945_20090415T235944_snip
stuff_20090415T235945_20090416T235944_end
stuff_20090415T235945_20090416T235944_snip
stuff_20090416T235945_20090417T235944_end
stuff_20090416T235945_20090417T235944_snip
stuff_20090417T235945_20090418T235944_end
stuff_20090417T235945_20090418T235944_snip
stuff_20090418T235945_20090419T235944_end
stuff_20090418T235945_20090419T235944_snip
Example Use/Output
$ bash endsnip.sh
paste to : stuff_20090413T235945_20090414T235944_snip
    from : stuff_20090414T235945_20090415T235944_end

paste to : stuff_20090414T235945_20090415T235944_snip
    from : stuff_20090415T235945_20090416T235944_end

paste to : stuff_20090415T235945_20090416T235944_snip
    from : stuff_20090416T235945_20090417T235944_end

paste to : stuff_20090416T235945_20090417T235944_snip
    from : stuff_20090417T235945_20090418T235944_end

paste to : stuff_20090417T235945_20090418T235944_snip
    from : stuff_20090418T235945_20090419T235944_end
(of course replace stuff_ with your actual prefix)
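To do the actual stitching rather than just printing the pairing, a minimal sketch would replace the two printf calls with an append (each _end file holds the 15 lines that belong at the end of the previous file in the sequence):
        # append the lines from the current _end file
        # onto the end of the previous _snip file
        cat "$i" >> "$prev"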
Let me know if you have questions.
You could store the previous $file3 value in a variable (and skip the first pass with a -n check):
#!/bin/bash
destination='media/user/directory/'
prev=""
for file1 in "$destination"*.ascii
do
    echo "$file1"
    file2="${file1}.end"
    file3="${file1}.snip"
    sed -e '16,$d' "$file1" > "$file2"
    sed -e '1,15d' "$file1" > "$file3"
    if [ -n "$prev" ]; then
        # stitch this file's first 15 lines onto the previous file's snip;
        # write to a per-pair output so each pass does not clobber the last
        cat "$prev" "$file2" > "${prev}.joined"
    fi
    prev=$file3
done

Sorting and printing a file in bash UNIX

I have a file with a bunch of paths that look like so:
7 /usr/file1564
7 /usr/file2212
6 /usr/file3542
I am trying to use sort to pull out and print the path(s) with the most occurrences. Here is what I have so far:
cat temp | sort | uniq -c | sort -rk1 > temp
I am unsure how to only print the highest occurrences. I also want my output to be printed like this:
7 1564
7 2212
7 being the total number of occurrences and the other numbers being the file numbers at the end of the name. I am rather new to bash scripting so any help would be greatly appreciated!
To emit only the first line of output (with the highest number, since you're doing a reverse numeric sort immediately prior), pipe through head -n1.
To remove all content which is not either a number or whitespace, pipe through tr -cd '0-9[:space:]'.
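Put together, a minimal sketch that emits just the single top line in the requested digits-only format (the loop below handles ties):
sort -rn temp | head -n1 | tr -cd '0-9[:space:]'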
To filter for only the values with the highest number, allowing there to be more than one:
{
    read -r firstnum name && printf '%s\t%s\n' "$firstnum" "$name"
    while read -r num name; do
        [[ $num = $firstnum ]] || break
        printf '%s\t%s\n' "$num" "$name"
    done
} < temp
If you want to avoid sort and you are allowed to use awk, then you can do this:
awk '{
    if ($1 > maxcnt)       {s = $1 " " substr($2,10,4); maxcnt = $1}
    else if ($1 == maxcnt) {s = s "\n" $1 " " substr($2,10,4)}
} END {print s}' temp

search lines of file for email address - returning whole line, with bash

Suppose I have a file (sizes.txt)
daveclark#foo.com 0 23252 0
mikeclark#foo.com 0 45131 1
clark#foo.com 0 55235 0
joeclark#bar.net 33632 1
maryclark#bar.net 0 55523 0
clark#bar.net 0 99356 0
Now I have another file (users.txt)
clark#foo.com
clark#bar.net
What I want to do is find each line in sizes.txt for the specific email addresses in users.txt, using a loop, bash or a one-liner in CentOS. Here's the key point: I need to find the lines that contain only clark#foo.com and then clark#bar.net, meaning this should be one line only for each.
The simplest way that comes to mind...
for i in `cat users.txt`; do grep $i sizes.txt; done
...but this does not work, because processing the first line of users.txt returns the lines containing daveclark#foo.com, mikeclark#foo.com and clark#foo.com. I explicitly want the line containing "clark#foo.com" (the third line of sizes.txt). Processing the second line of users.txt has the same problem (it returns the maryclark#bar.net and clark#bar.net lines). I know this has to be something totally simple that I'm overlooking.
What you are looking for is an exact match with grep. In your case that would be the -w option.
So
for i in $(cat users.txt); do
    grep -w "^$i" sizes.txt
done
should do the trick.
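A single-pass variant is also possible, reading all the patterns from users.txt at once (-F treats them as fixed strings, -w still enforces the word boundary):
grep -wFf users.txt sizes.txt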
Cheers.
You can try something like this using only bash built-in functions and syntax:
while read -r user ; do
    while read -r s_user s_column_2 s_column_3 s_column_4 ; do
        [ "${s_user}" = "${user}" ] && printf "%b\t%b\t%b\t%b\n" "${s_user}" "${s_column_2}" "${s_column_3}" "${s_column_4}"
    done < sizes.txt
done < users.txt
This nested while loop can be slow with big sizes.txt files. In those cases you could use awk instead.
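For example, a minimal awk sketch of the same exact first-field match, reading both files in one pass:
awk 'NR==FNR {want[$1]; next} $1 in want' users.txt sizes.txt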

Shell script help to fix it please

For example, I run
sh mycode Manu gg44
and I need to get a file named Manu
with this content:
gg44
192.168.1.2. (second line; this number is explained below)
(In the directory DIR=/h/Manu/HOME/hosts there is already a file Alex:
cat Alex
ff55
198.162.1.1. (second line))
So mycode creates a file named Manu with gg44 as the first line and a generated IP as the second line.
BUT to generate the IP it has to compare against the IP in the Alex file, so the second line of Manu has to be 198.162.1.2. If there is more than one file in the directory, then we have to check the second lines of all the files and generate the next IP from them.
DIR=/h/Manu/HOME/hosts    # this is the directory where I have my files (structure of the files above)
for j in $1 $2            # $1 is Manu; $2 is gg44
do
    if [ -d $DIR ]        # checking if the directory exists (it exists already)
    then                  # if it exists
        for i in $*       # for every file in this directory do the operation
        do
            sort /h/ManuHOME/hosts/* | tail -2 | head -1   # get the second line of every file
            IFS="." read A B C D    # divide the number in the second line into 4 parts (our number 192.168.1.1. for example)
            if [ "$D" != 255 ]      # compare D (which is 1 in our example): if it is less than 255
            then
                D=`expr $D + 1`     # then increment it by 1
            else
                C=`expr $C + 1`     # otherwise increment C and make D=0
                D=0
            fi
            # get $2 (which is gg44 in the example) as the first line and A.B.C.D as the second line
            echo "$2 "\n" $A.$B.$C.$D." > /h/Manu/HOME/hosts/$1
        done
    fi
done
As a result it creates the file named Manu with the first line, but the second line is totally wrong: it gives me ...1.
I also get this error message:
sort: open failed: /h/u15/c2/00/c2rsaldi/HOME/hosts/yu: No such file or directory
yu n ...1.
#!/bin/bash
dir=/h/Manu/HOME/hosts
filename=$dir/$1
firstline=$2
# find the max last octet among the second lines of all current files:
maxIP=$(awk 'FNR==2' "$dir"/* | cut -d. -f4 | sort -nr | head -1)
ip=198.162.1.$(( maxIP + 1 ))
cat > "$filename" <<END
$firstline
$ip
END
I'll leave it up to you to decide what to do when you get more than 255 files...
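If you ever do need more than 255, a minimal sketch of one way to carry into the third octet (purely illustrative; it assumes all existing files still share the 198.162.1.x prefix):
# carry the last octet into the third one when it would pass 255
last=$(( maxIP + 1 ))
third=$(( 1 + last / 256 ))
last=$(( last % 256 ))
ip=198.162.$third.$last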
