Grep for multiple strings and output filename:target string:count in Linux shell

I want to grep for the strings "Manager" and "DBA" in all files in the current directory and produce output in the form
filename:target string:count

I don't know if you were after a one-liner or not but I think this should provide the output you would like.
Starting with two files:
test_file.txt:
HEre is some random text
Manager
Some stuff
Manager
Testing
This Manager did DBA
and this manager did DBA too
but these guys did not
and test_file2.txt:
This is another file
which checks the Manager
DBA thing
I ran this:
grep -o "Manager\|DBA" *.txt | sort | uniq -c | awk -F "[: ]+" '{printf "%s:%s:%s\n", $3, $4, $2}'
To get this output (note it is case sensitive):
test_file2.txt:DBA:1
test_file2.txt:Manager:1
test_file.txt:DBA:2
test_file.txt:Manager:3
Hope that's what you were after.
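If you want the matching to be case-insensitive, the same pipeline works with grep -i; just note that the matched strings (and therefore the counts) then come out in whatever case they appear in the files:
grep -io "Manager\|DBA" *.txt | sort | uniq -c | awk -F "[: ]+" '{printf "%s:%s:%s\n", $3, $4, $2}'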

Related

Filter awk system output with awk?

I need to use awk to see which users are logged in to the computer, create a file for each of them, and print inside that file the PIDs of the processes they're running. I've used this, but it does not work:
who | awk '{for(i = 0; i < NR; i++)
system("ps -u " $1 "| tail +2 | awk '{print $1}' >" $1".log")
}'
Is there any way to do this?
Thanks a lot!
To achieve your goal of using awk to create those files, I would start with ps rather than with who. That way, ps does more of the work so that awk can do less. Here is an example that might work for you. (No guarantees, obviously!)
ps aux | awk 'NR>1 {system("echo " $2 " >> " $1 ".txt")}'
Discussion:
The command ps aux prints a table describing each active process, one line at a time. The first column of each line contains the name of the process's user, the second column its PID. The line also contains lots of other information, which you can play with as you improve your script. That's what you pipe into awk. (All this is true for Linux and the BSDs. In Cygwin, the format is different.)
Inside awk, the pattern NR>1 gets rid of the first line of the output, which contains the table headers. This line is useless for the files you want awk to generate.
For all other lines in the output of ps aux, awk adds the PID of the current process (ie, $2) to the file username.txt, using $1 for username. Because we append with >> rather than overwriting with >, all PIDs run by the user username end up being listed, one line at a time, in the file username.txt.
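As a side note (not part of the original answer): awk can also write the per-user files itself, which avoids spawning a shell via system() for every line. A minimal sketch of that variant:
ps aux | awk 'NR>1 {print $2 >> ($1 ".txt")}'   # append each PID directly to <username>.txt
Here the parentheses build the file name by concatenating $1 with ".txt"; awk keeps each output file open until it exits, which is fine for the size of a process table.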
UPDATE (Alternative for when who is mandatory)
If using who is mandatory, as noted in a comment to the original post, I would use awk to strip the needless columns from the output of who and the header line from the output of ps.
for user in $(who | awk '{print $1}' | sort -u)   # who prints one line per session, with no header
do
    ps -u "$user" | awk 'NR>1' > "$user".txt      # drop the ps header line
done
For readers who wonder what the double-quotes around $user are about: they serve to guard against globbing (if $user contains asterisks (*)) and word splitting (if $user contains whitespace).
I will leave my original answer stand for the benefit of any readers with more freedom to choose the tools for their job.
Is that what you had in mind?

How to parse the output of `ls -l` into multiple variables in bash?

There are a few answers on this topic already, but pretty much all of them say that it's bad to parse the output of ls -l, and therefore suggest other methods.
However, I'm using ncftpls -l, and so I can't use things like shell globs or find – I think I have a genuine need to actually parse the ls -l output. Don't worry if you're not familiar with ncftpls, the output returns in exactly the same format as if you were just using ls -l.
There is a list of files at a public remote ftp directory, and I don't want to burden the remote server by re-downloading each of the desired files every time my cronjob fires. I want to check, for each one of a subset of files within the ftp directory, whether the file exists locally; if not, download it.
That's easy enough, I just use
tdy=`date -u '+%Y%m%d'`_
# Today's files
for i in $(ncftpls 'ftp://theftpserver/path/to/files' | grep ${tdy}); do
if [ ! -f $i ]; then
ncftpget "ftp://theftpserver/path/to/files/${i}"
fi
done
But I came upon the issue that sometimes the cron job will download a file that hasn't finished uploading, and so when it fires next, it skips the partially downloaded file.
So I wanted to add a check to make sure that for each file that I already have, the local file size matches the size of the same file on the remote server.
I was thinking along the lines of parsing the output of ncftpls -l and using awk, something like
for i in $(ncftpls -l 'ftp://theftpserver/path/to/files' | awk '{print $9, $5}'); do
...
x=filesize # somehow get the file size and the filename
y=filename # from $i on each iteration and store in variables
...
done
but I can't seem to get both the filename and the filesize from the server into local variables on the same iteration of the loop; $i alternates between $9 and $5 in the awk string with each iteration.
If I could manage to get the filename and filesize into separate variables with each iteration, I could simply use stat -c "%s" $i to get the local size and compare it with the remote size. Then it's a simple ncftpget on each remote file that I don't already have. I tinkered with syncing programs like lftp too, but didn't have much luck and would rather do it this way.
Any help is appreciated!
The for loop splits on any whitespace (space, tab, or newline), so IFS needs to be set before the loop; there are many existing questions about this.
IFS=$'\n' && for i in $(ncftpls -l 'ftp://theftpserver/path/to/files' | awk '{print $9, $5}'); do
    echo "$i" | awk '{print $NF}'    # filesize (last column)
    echo "$i" | awk '{NF--; print}'  # filename
    # filenames may contain spaces, so it is safer to keep the size as the last awk column
done
I think the better way is to use while rather than for:
ls -l | while read -r i
do
    echo "$i" | awk '{print $9, $5}'
    # split them into separate variables if you want:
    x=$(echo "$i" | awk '{print $5}')
    y=$(echo "$i" | awk '{print $9}')
done
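Putting the pieces together for the original goal (compare remote and local sizes, re-download on mismatch), a rough sketch could look like the following. It reuses the placeholder URL from the question and assumes, as the question's own awk does, that filenames contain no spaces, so column 9 is the name and column 5 the size:
ncftpls -l 'ftp://theftpserver/path/to/files' | while read -r line; do
    name=$(echo "$line" | awk '{print $9}')
    size=$(echo "$line" | awk '{print $5}')
    [ -z "$name" ] && continue                                # skip header or blank lines
    if [ ! -f "$name" ] || [ "$(stat -c '%s' "$name")" != "$size" ]; then
        ncftpget "ftp://theftpserver/path/to/files/${name}"   # missing or size mismatch: (re)download
    fi
done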

How to parse file to commands in bash

I'm trying to achieve a goal here, and while I know some of the partial steps, I am not successful in putting it all together. I'm looking for an inline command for one-off usage on multiple hosts. Let's say we have a SW repository file organized like this:
# comments
PROD_NAME:INSTALL_DIR:OPTIONS
PROD_NAME:INSTALL_DIR:OPTIONS
PROD_NAME:INSTALL_DIR:OPTIONS
Now, let's say we want to process the file and perform some copy action on every one of the products. I can pipe grep (to get rid of the comment lines) into a while/do loop, where I use awk to break each line down into the product name and its path, and build that into copy commands. That's too much nesting for my skill level, I'm afraid. Anyone who'd care to share?
You can use a bash loop to do the same:
$ while IFS=: read -r p i o;
do echo "cp $o $p $i";
done < <(grep -v '^#' file)
cp OPTIONS PROD_NAME INSTALL_DIR
cp OPTIONS PROD_NAME INSTALL_DIR
cp OPTIONS PROD_NAME INSTALL_DIR
Remove the echo to actually run the commands.
Comments can be removed by
grep -v '^#'
For awk you have to specify the field delimiter:
awk -F: '{print $1, $2, $3}'
In order to craft copy commands you have to pipe the result to a shell.
echo -e '# comments\nNAME:DIR:OPT' |
grep -v '^#' |
awk -F: '{print "cp", $3, $2, $1}' |
sh
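Be aware that generated commands like these break as soon as a name or directory contains spaces. A hedged variant that folds the comment filtering into awk and quotes the two path fields (keeping the first answer's cp OPTIONS PROD_NAME INSTALL_DIR order):
awk -F: '!/^#/ {printf "cp %s \"%s\" \"%s\"\n", $3, $1, $2}' file | sh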
Even better: read a book.
Or this:
http://linuxcommand.org/learning_the_shell.php
https://en.wikibooks.org/wiki/Bourne_Shell_Scripting

tail -f, awk and output to file >

I am attempting to filter a log file and am running into issues. What I have so far is the following, which does not work:
tail -f /var/log/squid/accesscustom.log | awk '/username/;/user-name/ {print $1; fflush("")}' | awk '!x[$0]++' > /var/log/squid/accesscustom-filtered.log
The goal is to take a file that contains
ipaddress1 username
ipaddress7
ipaddress2 user-name
ipaddress1 username
ipaddress5
ipaddress3 username
ipaddress4 user-name
and save to accesscustom-filtered.log
ipaddress1
ipaddress2
ipaddress3
ipaddress4
It works without the redirection to accesscustom-filtered.log, but something about the > isn't working right and the file ends up empty.
Edit: Changed the original example to be correct
Use tee:
tail -f /var/log/squid/accesscustom.log | awk '/username/ || /user-name/ {print $1}' | tee /var/log/squid/accesscustom-filtered.log
See also: Writing “tail -f” output to another file and Turn off buffering in pipe
Note: awk doesn't buffer like grep in the superuser example, so you shouldn't need to do anything special with your awk command. (more info)
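If the output does seem to lag behind with a particular awk implementation, keeping the fflush("") call from the question's own command is a harmless belt-and-braces addition; a hedged variant:
tail -f /var/log/squid/accesscustom.log | awk '/username/ || /user-name/ {print $1; fflush("")}' | tee /var/log/squid/accesscustom-filtered.log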

Bash Output different from command line

I have tried all kinds of filters using grep to try and solve this but just cannot crack it.
cpumem="$(ps aux | grep -v 'grep' | grep 'firefox-bin' | awk '{printf $3 "\t" $4}')"
I am extracting the CPU and Memory usage for a process and when I run it from the command line, I get the 2 fields outputted correctly:
ps aux | grep -v 'grep' | grep 'firefox-bin' | awk '{printf $3 "\t" $4}'
> 1.1 4.4
but the same command executed from within the bash script produces this:
cpumem="$(ps aux | grep -v 'grep' | grep 'firefox-bin' | awk '{printf $3 "\t" $4}')"
echo -e $cpumem
> 1.1 4.40.0 0.10.0 0.0
I am guessing that it is picking up 3 records, but I just don't know where from.
I am filtering out any other grep processes by using grep -v 'grep'. Can someone offer any suggestions, or a more reliable way?
Maybe you have 3 records because 3 firefox processes are running (or one is running and has spawned threads or children).
You can avoid the grep hassle by giving ps an option to select the processes, e.g. -C to select processes by name. With ps -C firefox-bin you get only the firefox processes. But this does not help at all when there is more than one process.
(You can also use ps options to output only the columns you want, so your line would be something like
ps -C firefox-bin --no-headers -o %cpu,%mem
).
For the triple record you must decide what should happen when more than one matching process is running. In a multi-user environment, with programs that spawn threads or child processes, there can always be situations where more than one process of a kind exists. There are many possible solutions, and none can be recommended outright, since you don't say what you are going to do with the values. One can think of approaches like selecting only processes belonging to one user, taking only the one with the lowest PID (or the process group leader), changing the enclosing bash script to use a loop that handles the multiple values, or making it behave differently when ps returns multiple results.
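For example, one possibility (a sketch only, not from the original answer, since the right behaviour depends on how the values are used) is to sum the CPU and memory columns over all matching processes:
cpumem="$(ps -C firefox-bin --no-headers -o %cpu,%mem | awk '{c += $1; m += $2} END {printf "%s\t%s", c, m}')"
echo "$cpumem"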
I was not able to reproduce the problem, but to help you debug, try printing $11 in your awk command; that will tell you which process each line is talking about:
cpumem="$(ps aux | grep -v 'grep' | grep 'firefox-bin' | awk '{printf $3 "\t" $4 "\t" $11 "\n"}')"
echo -e $cpumem
It's actually an easy fix for the output display; in your echo statement, wrap the variable in double-quotes:
echo -e "$cpumem"
Without double-quotes, newlines are not preserved; they are converted to single spaces (or dropped entirely). With quotes, the original text of the variable is preserved in the output.
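A quick illustration of that difference, with a throwaway variable (hypothetical, not from the post):
v="$(printf 'a\nb')"
echo $v      # unquoted: word splitting turns the newline into a single space -> a b
echo "$v"    # quoted: the newline is preserved and b prints on its own line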
If your output contains multiple processes (i.e. - multiple lines), that means your grep actually matched multiple lines. There's a chance a child-process is running for firefox-bin, maybe a plugin/container? With ps aux, the 11th column will tell you what the actual process is, so you can update your awk to be the following (for debugging):
awk '{printf $3 "\t" $4 "\t" $11}'
