Csh - Fetching fields via awk inside xargs - shell

I'm struggling to understand this behavior.
Script behavior: read a file (containing dates); print a list of files in a multi-level directory tree, get their size, and print the file size only (future step: sum the overall file size).
Starting script:
cat dates | xargs -I {} sh -c "echo '{}: '; du -d 2 "/folder/" | grep {} | head"
2000-03:
1000 /folder/2000-03balbasldas
2000-04:
12300 /folder/2000-04asdwqdas
[and so on]
But when I try to filter via awk on the first field, I still get the whole line:
cat dates | xargs -I {} sh -c "echo '{}: '; du -d 2 "/folder/" | grep {} | awk '{print $1}'"
2000-03:
1000 /folder/2000-03balbasldas
2000-04:
12300 /folder/2000-04asdwqdas
I've already approached it via divide et impera, and the following command on its own works just fine:
du -d 2 "/folder/" | grep '2000-03' | awk '{print $1}'
1000
I'm afraid that I'm missing something very trivial, but I haven't found anything so far.
Any idea? Thanks!
Input: directory containing folders named YYYY-MM-random_data and a file containing strings:
ls -l
drwxr-xr-x 2 user staff 68 Apr 24 11:21 2000-03-blablabla
drwxr-xr-x 2 user staff 68 Apr 24 11:21 2000-04-blablabla
drwxr-xr-x 2 user staff 68 Apr 24 11:21 2000-05-blablabla
drwxr-xr-x 2 user staff 68 Apr 24 11:21 2000-06-blablabla
drwxr-xr-x 2 user staff 68 Apr 24 11:21 2000-06-blablablb
drwxr-xr-x 2 user staff 68 Apr 24 11:21 2000-06-blablablc
[...]
cat dates
2000-03
2000-04
2000-05
[...]
Expected output: the sum of the disk space occupied by all the files contained in the folders whose names include the corresponding string from the file dates:
2000-03: 1000
2000-04: 2123
2000-05: 1222112
[...]
======
But in particular, I'm interested in why awk is not able to fetch the column $1 I asked it for.

OK, it seems I found the answer myself after a lot of research :D
I'll post it here, hoping that it will help somebody else out.
https://unix.stackexchange.com/questions/282503/right-syntax-for-awk-usage-in-combination-with-other-command-inside-xargs-sh-c
The trick was to escape the $ sign. Inside the double-quoted string passed to sh -c, the outer shell expands $1 (which is empty) before sh ever runs, so awk receives the program '{print }' and prints the whole line; escaping it as \$1 passes a literal $1 through to awk.
cat dates | xargs -I {} sh -c "echo '{}: '; du -d 2 "/folder/" | grep {} | awk '{print \$1}'"
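For the final summing step, here is a minimal sketch that avoids xargs (and its nested quoting) altogether; it assumes GNU du and the /folder/YYYY-MM-* layout shown above:
while read -r d; do
    # sum the first column of every du line matching this date
    total=$(du -d 2 /folder/ | grep "$d" | awk '{ s += $1 } END { print s }')
    echo "$d: $total"
done < dates
With the awk program in single quotes in the current shell, $1 and s need no escaping at all.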

Using GNU Parallel it looks like this:
parallel --tag "eval du -s folder/{}* | perl -ne '"'$s+=$_ ; END {print "$s\n"}'"'" :::: dates
--tag prepends the line with the date.
{} is replaced with the date.
eval du -s folder/{}* expands to all the dirs starting with the date and prints the disk usage of each.
perl -ne '$s+=$_ ; END {print "$s\n"}' sums up the output from du.
Finally, there is a bit of quoting trickery to get it all quoted correctly.
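For comparison, an untested variant of the same idea lets du -c do the summing and awk print the grand total; note that it runs into the very same \$ escaping issue inside the double quotes:
parallel --tag "du -sc folder/{}* | awk 'END { print \$1 }'" :::: dates
Here du -sc prints a per-directory line for each match plus a final total line, and the awk END block prints the first field of that last line.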

Related

grep -l does not behave as expected on piping to xargs [duplicate]

This question is a duplicate of: How to ignore xargs commands if stdin input is empty?
So I have a command like:
grep "\"tool\":\"SEETEST\"" * -l
which works great standalone - it prints out a list of the JSON files generated for the selected tool in the current directory.
But then, if I were to pipe it to xargs ls like that:
grep "\"tool\":\"SEETEST\"" * -l | xargs ls -lSh
It prints all the files in the current directory!
How do I make it print just the matched filenames and pipe them to ls sorted by size?
If there are no matches, grep prints nothing; xargs then runs ls with no file arguments, and ls without arguments lists the current directory:
#----------- current files in the directory
mortiz#florida:~/Documents/projects/bash/test$ ls -ltr
total 8
-rw-r--r-- 1 mortiz mortiz 585 Jun 18 12:13 json.example2
-rw-r--r-- 1 mortiz mortiz 574 Jun 18 12:14 json.example
#----------- using your command
mortiz#florida:~/Documents/projects/bash/test$ grep "\"title\": \"example\"" * -l
json.example
#-----------adding xargs to the previous command
mortiz#florida:~/Documents/projects/bash/test$ grep "\"title\": \"example\"" * -l | xargs ls -lSh
-rw-r--r-- 1 mortiz mortiz 574 Jun 18 12:14 json.example
#-----------adding purposely an error on "title"
mortiz#florida:~/Documents/projects/bash/test$ grep "\"titleo\": \"example\"" * -l | xargs ls -lSh
total 8.0K
-rw-r--r-- 1 mortiz mortiz 585 Jun 18 12:13 json.example2
-rw-r--r-- 1 mortiz mortiz 574 Jun 18 12:14 json.example
If you want to use xargs when grep may return no matches, add -r (--no-run-if-empty); it prevents xargs from running ls at all, and thus from listing every file in the current directory:
grep "\"titleo\": \"example\"" * -l | xargs -r ls -lSh

Script to generate a list to run a command

Sorry for the semi-vague title; I wasn't exactly sure how to word it. I'm looking to generate a list, excluding devices without a matching major/minor number, and run
lkdev -l hdiskn -a -c DATAn
for each hdisk and DATA device that have corresponding major/minor numbers.
In /dev, I have -
root# testbox /dev
#ls -l | grep -E "DATA|hdisk" | grep -v rhd
crw-r--r-- 1 root system 18, 3 Oct 03 10:50 DATA01
crw-r--r-- 1 root system 18, 2 Oct 03 10:50 DATA02
brw------- 1 root system 18, 1 Apr 12 2013 hdisk0
brw------- 1 root system 18, 0 Apr 12 2013 hdisk1
brw------- 1 root system 18, 3 Jan 14 2014 hdisk2
brw------- 1 root system 18, 2 Jan 14 2014 hdisk3
brw------- 1 root system 18, 4 Jan 14 2014 hdisk4
So essentially, I'm trying to create something where hdisk0,1,4 are all excluded, and hdisk2-3 are locked with DATA01 and DATA02, respectively.
I originally was trying to use sort and/or uniq to isolate/remove fields, but haven't been able to generate the desired list to even begin looking at running the command on each.
(As a note, I have several servers with hundreds of these. If it were just these few, I'd find a "simpler" way.)
(I can't test it right now, so please correct syntax errors if any)
You could play with sort and uniq as shown beneath:
ls -l | grep -E "DATA|hdisk" | sed -e 's/.* \([0-9]*, *[0-9]*\).*/\1/' | sort |
uniq -c | grep -v '^ *1 ' | sed 's/^ *[0-9]* //' | while read majorminor; do
    # print the names of all devices sharing this major, minor pair
    # (grep -v '^ *1 ' is anchored to the count column, so pairs containing " 1",
    # such as " 18,", are not accidentally dropped)
    ls -l | grep " ${majorminor}" | sed 's/.* //'
done
However, you should start by selecting the right lines directly, without counting:
for data in $(find /dev -type c -name "DATA*" | cut -d/ -f3); do
    # run this from /dev: cut stripped the directory prefix from the names
    majorminor="$(ls -l $data | sed -e 's/.* \([0-9]*, *[0-9]*\).*/\1/')"
    echo "$data <==> $(ls -l hdisk* | grep " ${majorminor}" | sed 's/.* //')"
done
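To actually emit the lkdev commands from those pairs, an untested sketch along the same lines (run from /dev, using the lkdev syntax given in the question):
cd /dev
for data in DATA*; do
    majorminor="$(ls -l "$data" | sed -e 's/.* \([0-9]*, *[0-9]*\).*/\1/')"
    hdisk="$(ls -l hdisk* | grep " ${majorminor}" | sed 's/.* //')"
    # skip DATA devices with no matching hdisk; drop the echo to really run it
    [ -n "$hdisk" ] && echo lkdev -l "$hdisk" -a -c "$data"
done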

pick up files based on dates in ksh script

I have this list of files. Now I have to pick the latest file based on some conditions.
3679 Jul 21 23:59 belk_rpo_error_**po9324892**_07212014.log
0 Jul 22 23:59 belk_rpo_error_**po9324892**_07222014.log
3679 Jul 23 23:59 belk_rpo_error_**po9324892**_07232014.log
22 Jul 22 06:30 belk_rpo_error_**po9324267**_07012014.log
0 Jul 20 05:50 belk_rpo_error_**po9999992**_07202014.log
411 Jul 21 06:30 belk_rpo_error_**po9999992**_07212014.log
742 Jul 21 07:30 belk_rpo_error_**po9999991**_07212014.log
0 Jul 23 2014 belk_rpo_error_**po9999991**_07232014.log
For a PARTICULAR Order_No (marked with ** ** above):
If the latest file is 0 kB, we discard it (and the rest of the files with the same Order_No as well).
If the latest file is non-zero, I take it (only the latest one),
then append the contents to a txt file.
My expected output would be:
411 Jul 21 06:30 belk_rpo_error_**po9999992**_07212014.log
3679 Jul 23 23:59 belk_rpo_error_**po9324892**_07232014.log
22 Jul 22 06:30 belk_rpo_error_**po9324267**_07012014.log
I am at my wits' end here. I can't seem to figure out how to compare dates in Unix. Any help is very appreciated.
You can try something like:
touch test.txt
for var in $(find . -type f ! -empty -exec ls -r {} \;); do
    cat "$var" >> test.txt
done
Untested:
use stat to emit date (epoch time), size, and filename;
use awk to filter out zero-length files and extract the order number;
sort by order number and date;
use awk to pick the last filename for each order number.
stat -c $'%Y\t%s\t%n' *.log |
awk -F'\t' -v OFS='\t' '
$2 > 0 {
split($3, a, /_/)
print a[4], $1, $3
}' |
sort -t $'\t' -k1,1 -k2,2n |
awk -F'\t' '
NR > 1 && $1 != prev_order {print filename}
{filename = $3; prev_order = $1}
END {print filename}
'
Note the sort order: the final awk needs its input grouped by order number with times ascending within each group, which is what -k1,1 -k2,2n provides.
If I understand your question, the resulting files need to be concatenated and appended to a file. If the above pipeline works OK, then pipe it into | xargs cat >> something.log
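One caveat: filtering out zero-length files up front means that if an order's latest file is empty, the pipeline falls back to an older non-empty file instead of discarding the order, which the question forbids. A variant sketch (same GNU stat assumption) keeps the size in play until the latest file per order is chosen:
stat -c $'%Y\t%s\t%n' *.log |
awk -F'\t' -v OFS='\t' '{ split($3, a, /_/); print a[4], $1, $2, $3 }' |
sort -t $'\t' -k1,1 -k2,2n |
awk -F'\t' '
    # on a change of order number, emit the previous order's latest file,
    # but only if that latest file is non-empty
    NR > 1 && $1 != prev { if (size > 0) print file }
    { prev = $1; size = $3; file = $4 }
    END { if (size > 0) print file }
'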

Can sed be used with find to print Fields and Folder Name

This small script:
touch ilFldsN9LS.txt
ls -l | grep "^d" > /home/userB/PLAY/LibTESTxOutputFiles/ilFldsN9LS_testTEST.txt
produces file content of this format:
drwxr-xr-x 2 userB userB 4096 Mar 23 22:40 BASH_Collection_FolderNESTY
drwxr-xr-x 2 userB userB 4096 Mar 24 17:33 BASH_Collection_Functionality
What I wish to achieve is to get output very much like the above, but using find.
Turning to find, this script (which, unlike the previous one, is recursive):
find . -type d \( ! -iname ".*" \) -exec ls -ld {} \; | grep "^d" |
tee -a /home/UserB/PLAY/LibTESTxOutputFiles/ilFldsR9FB.txt
produces file content of this format:
drwxr-xr-x 2 UserB UserB 4096 Mar 24 17:33 ./BASH_Collection_Functionality
drwxr-xr-x 2 UserB UserB 4096 Mar 25 16:04 ./LibTESTxOutputFiles/AdditionalTESTresults
adding some AWK to the script, like so:
find . -type d \( ! -iname ".*" \) -exec ls -ld {} \; | grep "^d" | awk '{ sub(/\.\//, " ");print}'\
| tee -a /home/innocentxlii/PLAY/LibTESTxOutputFiles/ilFldsR9FB.txt
produces output with the leading ./ stripped from the front of the path
and pads the gap with an extra space to ease reading:
drwxr-xr-x 2 UserB UserB 4096 Mar 24 17:33 BASH_Collection_Functionality
drwxr-xr-x 2 UserB UserB 4096 Mar 25 16:04 LibTESTxOutputFiles/AdditionalTESTresults
Where I have gotten stuck is that I have been trying to use sed to keep the fields but
list only the last folder in the path. For example, the last item above would become:
drwxr-xr-x 2 UserB UserB 4096 Mar 25 16:04 AdditionalTESTresults
Ideas? I tried literally dozens of sed variants but have realized something must be
wrong with this approach.
How about this: sed -r 's/(.* )(.*)\/(.*)$/\1\3/g' ? The greedy (.* ) captures everything up to the last space, and \3 captures what follows the last slash.
Check my test runs below:
$ echo "drwxr-xr-x 2 UserB UserB 4096 Mar 25 16:04 dir1/dir2/dir3/dir4" | sed -r 's/(.* )(.*)\/(.*)$/\1\3/g'
drwxr-xr-x 2 UserB UserB 4096 Mar 25 16:04 dir4
$ echo "drwxr-xr-x 2 UserB UserB 4096 Mar 25 16:04 dir1/dir2/dir3" | sed -r 's/(.* )(.*)\/(.*)$/\1\3/g'
drwxr-xr-x 2 UserB UserB 4096 Mar 25 16:04 dir3
$ echo "drwxr-xr-x 2 UserB UserB 4096 Mar 25 16:04 dir1/dir2" | sed -r 's/(.* )(.*)\/(.*)$/\1\3/g'
drwxr-xr-x 2 UserB UserB 4096 Mar 25 16:04 dir2
$ echo "drwxr-xr-x 2 UserB UserB 4096 Mar 25 16:04 dir1" | sed -r 's/(.* )(.*)\/(.*)$/\1\3/g'
drwxr-xr-x 2 UserB UserB 4096 Mar 25 16:04 dir1
NOTE: there is a space after the first asterisk in the sed command.
You don't need sed; you can do it all with awk:
find . -type d \( ! -iname ".*" \) -exec ls -ld {} \; |
awk '{n=split($NF,a,"/"); sub($NF, " "a[n])}1'
grep "^d" is redundant since you are already finding directories with find . -type d
awk Explanation:
n=split($NF,a,"/") splits the last field (denoted by $NF) by / and assigns it to array a
n gives the length of the array
a[n] will therefore return the string following the last / (i.e. the inner most directory)
sub($NF, " "a[n]) replaces the last field (denoted by $NF) with a space for padding (as per example) + inner most directory (denoted by a[n])
awk '{...}1': the trailing 1 is a condition that is always true, so awk applies the default action, print, to every line
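For example, feeding it one of the lines from above (note the two spaces before the directory, from the padding):
$ echo "drwxr-xr-x 2 UserB UserB 4096 Mar 25 16:04 ./LibTESTxOutputFiles/AdditionalTESTresults" | awk '{n=split($NF,a,"/"); sub($NF, " "a[n])}1'
drwxr-xr-x 2 UserB UserB 4096 Mar 25 16:04  AdditionalTESTresults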
EDIT: for cases where the directory name contains spaces:
find . -type d \( ! -iname ".*" \) -exec ls -ld {} \; |
awk -F '[0-9]+:[0-9]+ ' '{n=split($NF,a,"/"); sub($NF, " "a[n])}1'
specifying the input separator with -F '[0-9]+:[0-9]+ ' (matching the modification time) ensures the last field ($NF) is the file name, regardless of whether or not the directory name contains spaces
What about:
sed 's|\./||'
It removes the first ./, so in your line:
find . -type d \( ! -iname ".*" \) -exec ls -ld {} \; | sed 's|\./||'

Print unique names of users logged on with finger

I'm trying to write a shell script that prints the full names of users logged on to a machine. The finger command gives me a list of users, but there are many duplicates. How can I loop through and print out only the unique ones?
Edit:
This is the format of what finger gives me:
xxxx XX of group XXX pts/59 1:00 Feb 13 16:38
xxxx XX of group XXX pts/71 1:11 Feb 13 16:27
xxxx XX of group XXX pts/105 1d Feb 12 15:22
xxxx YY of group YYY pts/102 2:19 Feb 13 14:13
xxxx ZZ of group ZZZ pts/42 2d Feb 7 12:11
I'm trying to extract the full name (i.e. whatever comes before 'of group' in column 2), so I would be using awk together with finger.
What you want is actually fairly difficult in a shell script. Here is, for example, my full output of finger(1):
Login Name TTY Idle Login Time Office Phone
martin Martin Tournoij *v0 1d Wed 14:11
martin Martin Tournoij pts/2 22 Wed 15:37
martin Martin Tournoij pts/5 41 Thu 23:16
martin Martin Tournoij pts/7 31 Thu 23:24
martin Martin Tournoij pts/8 Thu 23:29
You want the full name, but this may contain 1 space (as per my example), or it may just be 'Teller' (no space), or it may be 'Captain James T. Kirk' (3 spaces). So you can't just use the space as delimiter. You could use the character position of 'TTY' in the header as an indicator, but that's not very elegant IMHO (especially with shell scripting).
My solution is therefore slightly different: we get only the username from finger(1), then we get the full name from /etc/passwd:
#!/bin/sh
# note: tail +2 and cut -w (split on whitespace) are BSD flags;
# GNU equivalents are tail -n +2 and awk '{print $1}'
prev=""
for u in $(finger | tail +2 | cut -w -f1 | sort); do
    [ "$u" = "$prev" ] && continue
    # anchor the grep on "name:" so e.g. "martin" doesn't also match "martina"
    echo "$u $(grep "^$u:" /etc/passwd | cut -d: -f5)"
    prev="$u"
done
Which gives me both the username and the full name:
martin Martin Tournoij
Obviously, you can also print just the real name (without the $u).
The sort and uniq coreutils commands can be used to remove duplicates.
finger | sort -u
This will remove all duplicate lines, but you will still see similar lines due to how verbose the finger command is. If you just want a list of usernames, you can filter it out further to be very specific.
finger | cut -d ' ' -f1 | sort -u
Now, you can take this one step further, and remove the "header/label" line printed out by the finger command.
finger | cut -d ' ' -f1 | sort -u | grep -iv login
Hope this helps.
Other possible solution:
finger | tail -n +2 | awk '{ print $1 }' | sort | uniq
tail -n +2 to omit the first line.
awk '{ print $1 }' to extract the first column.
sort to prepare input for uniq.
uniq to remove duplicates.
If you want to iterate use:
for user in $(finger | tail -n +2 | awk '{ print $1 }' | sort | uniq)
do
echo "$user"
done
Could this be simpler?
No spaces or any other special characters to worry about!
finger -l | awk '/^Login/'
Edit: To remove the content after 'of group':
finger -l | awk '/^Login/' | sed 's/of group.*//g'
Output:
Login: xx Name: XX
Login: yy Name: YY
Login: zz Name: ZZ
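To reduce that to unique full names only, a sketch building on the same finger -l output format shown above (the 'of group' handling assumes the asker's sample format):
finger -l | awk '/^Login:/' | sed -e 's/.*Name: //' -e 's/ *of group.*//' | sort -u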
