How to get the nth recent file in the nth last modified subdirectory using pipes - shell

I'm doing an exercise for an OS exam. It requires getting the 3rd most recent file of the 2nd last modified sub-directory inside the current directory. Then I have to print its lines in reverse order. I cannot use the tac command. The text suggests using (other than awk and sed): head, tail, wc.
I've succeeded in getting the filename of the requested file (but in too complex a way, I think). Now I have to print it in reverse. I think I can use this awk solution: https://stackoverflow.com/a/744093/11614625.
This is how I'm getting the filename:
ls -t | head | awk '{system("test -d \"" $0 "\" && echo \"" $0 "\"")}' | awk 'NR==2 {system("ls \"" $0 "\" | head")}' | awk 'NR==1'
How can I do better? And what if the 2nd directory or 3rd file doesn't exist?

See https://mywiki.wooledge.org/ParsingLs. Also, awk '{system("test -d \"" $0 "\" && echo \"" $0 "\"")}' is having the shell call awk to call system to call another shell to call test, which is clearly worse than just having the shell call test in the first place, if you were going to do that at all. Finally, any solution that reads the whole file into memory (as sed or a naive awk solution would) will fail for large files, since they'll exceed available memory.
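(For contrast, a minimal sketch of "just having the shell call test": it still parses ls, so all the ParsingLs caveats apply, and it is shown only to illustrate the layering point.)
ls -t | while IFS= read -r name; do
    [ -d "$name" ] && printf '%s\n' "$name"
done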
Unfortunately this is how to do what you want robustly:
dir="$(find . -mindepth 1 -maxdepth 1 -type d -printf '%T+\t%p\0' |
sort -rz |
awk -v RS='\0' 'NR==2{sub(/[^\t]+\t/,""); print; exit}')" &&
file="$(find "$dir" -mindepth 1 -maxdepth 1 -type f -printf '%T+\t%p\0' |
sort -rz |
awk -v RS='\0' 'NR==3{sub(/[^\t]+\t/,""); print; exit}')" &&
cat -n "$file" | sort -rn | cut -f2-
If any command in any of the pipes fails, the error message from the failing command will be printed, none of the subsequent commands will execute, and the overall exit status will be the failure status from that failing command.
I used cat | sort | cut rather than awk or sed to print the file in reverse because awk (unless you write demand paging in it) or sed would have to read the whole file into memory at once and so would fail for very large files, while sort is designed to handle large files by paging to tmp files as necessary and keeping only parts of the file in memory at a time, so it's limited only by how much free disk space you have on your device.
The above requires GNU tools to produce/handle NUL line endings. If you don't have those, change \0 to \n in the find commands, remove the z from the sort options, and remove -v RS='\0' from the awk commands, and be aware that the result will then only work if your directory and file names don't contain newlines.
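For reference, here is a sketch of that newline-delimited variant (it still assumes a find that supports -printf, and it will misbehave if any directory or file name contains a newline):
dir="$(find . -mindepth 1 -maxdepth 1 -type d -printf '%T+\t%p\n' |
sort -r |
awk 'NR==2{sub(/[^\t]+\t/,""); print; exit}')" &&
file="$(find "$dir" -mindepth 1 -maxdepth 1 -type f -printf '%T+\t%p\n' |
sort -r |
awk 'NR==3{sub(/[^\t]+\t/,""); print; exit}')" &&
cat -n "$file" | sort -rn | cut -f2-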

Related

Counting Python files with bash and awk always returns zero

I want to get the number of Python files on my desktop, and I have coded a small script for that. But the awk command does not work as I expected.
script
ls -l | awk '{ if($NF=="*.py") print $NF; }' | wc -l
I know that there are other ways to find the number of Python files on a PC, but I just want to know what I am doing wrong here.
ls -l | awk '{ if($NF=="*.py") print $NF; }' | wc -l
Your code counts files literally named *.py; you should use regex matching with the correct GNU AWK syntax. After fixing that, your code becomes
ls -l | awk '{ if($NF~/[.]py$/) print $NF; }' | wc -l
Note [.], which denotes a literal ., and $, which denotes the end of the string.
Your code might be further improved, as there is no need to use if here; a pattern-action will do. That is,
ls -l | awk '$NF~/[.]py$/{ print $NF; }' | wc -l
Moreover, you might easily implement the counting inside GNU AWK rather than deploying wc -l, as follows:
ls -l | awk '$NF~/[.]py$/{t+=1}END{print t}'
Here, t is increased by 1 for every matching line, and after all input is processed (that is, in END) it is printed. Observe that there is no need to declare the t variable in GNU AWK.
Don't try to parse the output of ls, see https://mywiki.wooledge.org/ParsingLs.
Beyond that, your awk script is failing because $NF=="*.py" is doing a literal string comparison of the last string of non-blanks against *.py, when you probably wanted a regexp comparison such as $NF~/\.py$/, and your print $NF would fail for any file names containing spaces.
If you really want to involve awk in this for some reason then, assuming the list of python files doesn't exceed ARG_MAX, it'd be:
awk 'BEGIN{print ARGC-1; exit}' *.py
but you could just do it in bash:
shopt -s nullglob
files=(*.py)
echo "${#files[@]}"
or if you want to have a pipe to wc -l for some reason and your files can't have newlines in their names then:
printf '%s\n' *.py | wc -l
gfind . -maxdepth 1 -type f -name "*.py" -print0 |
{m,g}awk 'END { print NR }' RS='\0' FS='^$'
or
{m,g}awk 'END { print --NF }' RS='^$' FS='\0'
879

grep: compare string from file with another string

I have a list of file paths that I need to compare with a string:
git_root_path=$(git rev-parse --show-toplevel)
list_of_files=.git/ForGeneratingSBConfigAlert.txt
cd $git_root_path
echo "These files needs new OSB config:"
while read -r line
do
modified="$line"
echo "File for compare: $modified"
if grep -qf "$list_of_files" <<< "$modified"; then
echo "Found: $modified"
fi
done < <(git status -s | grep -v " M" | awk '{if ($1 == "M") print $2}')
$modified is a string variable that stores the path to a file.
Pattern file example:
SVCS/resources/
SVCS/bus/projects/busCallout/
SVCS/bus/projects/busconverter/
SVCS/bus/projects/Resources/ (ignore .jar)
SVCS/bus/projects/Teema/
SVCS/common/
SVCS/domain/
SVCS/techutil/src/
SVCS/tech/mds/src/java/fi/vr/h/service/tech/mds/exception/
SVCS/tech/mds/src/java/fi/vr/h/service/tech/mds/interfaces/
SVCS/app/cashmgmt/src/java/fi/vr/h/service/app/cashmgmt/exception/
SVCS/app/cashmgmt/src/java/fi/vr/h/service/app/cashmgmt/interfaces/
SVCS/app/customer/src/java/fi/vr/h/service/app/customer/exception/
SVCS/app/customer/src/java/fi/vr/h/service/app/customer/interfaces/
SVCS/app/etravel/src/java/fi/vr/h/service/app/etravel/exception/
SVCS/app/etravel/src/java/fi/vr/h/service/app/etravel/interfaces/
SVCS/app/hermes/src/java/fi/vr/h/service/app/hermes/exception/
SVCS/app/hermes/src/java/fi/vr/h/service/app/hermes/interfaces/
SVCS/app/journey/src/java/fi/vr/h/service/app/journey/exception/
SVCS/app/journey/src/java/fi/vr/h/service/app/journey/interfaces/
SVCS/app/offline/src/java/fi/vr/h/service/app/offline/exception/
SVCS/app/offline/src/java/fi/vr/h/service/app/offline/interfaces/
SVCS/app/order/src/java/fi/vr/h/service/app/order/exception/
SVCS/app/order/src/java/fi/vr/h/service/app/order/interfaces/
SVCS/app/payment/src/java/fi/vr/h/service/app/payment/exception/
SVCS/app/payment/src/java/fi/vr/h/service/app/payment/interfaces/
SVCS/app/price/src/java/fi/vr/h/service/app/price/exception/
SVCS/app/price/src/java/fi/vr/h/service/app/price/interfaces/
SVCS/app/product/src/java/fi/vr/h/service/app/product/exception/
SVCS/app/product/src/java/fi/vr/h/service/app/product/interfaces/
SVCS/app/railcar/src/java/fi/vr/h/service/app/railcar/exception/
SVCS/app/railcar/src/java/fi/vr/h/service/app/railcar/interfaces/
SVCS/app/reservation/src/java/fi/vr/h/service/app/reservation/exception/
SVCS/app/reservation/src/java/fi/vr/h/service/app/reservation/interfaces/
kraken_test.txt
namaker_test.txt
shmaker_test.txt
I need to compare the file of search patterns with a string; is that possible using grep?
I'm not sure I understand the overall logic, but a few immediate suggestions come to mind.
You can avoid grep | awk in the vast majority of cases.
A while loop with a grep on a line at a time inside the loop is an antipattern. You probably just want to run one grep on the whole input.
Your question would still benefit from an explanation of what you are actually trying to accomplish.
cd "$(git rev-parse --show-toplevel)"
git status -s | awk '!/ M/ && $1 == "M" { print $2 }' |
grep -Fxf .git/ForGeneratingSBConfigAlert.txt
I was trying to think of a way to add back your human-readable babble, but on second thought, this program is probably better without it.
The -x option to grep might be wrong, depending on what you are really hoping to accomplish.
This should work:
git status -s | grep -v " M" | awk '{if ($1 == "M") print $2}' | \
grep --file=.git/ForGeneratingSBConfigAlert.txt --fixed-strings --line-regexp
Piping the awk output directly to grep avoids the while loop entirely. In most cases you'll find you don't really need to print debug messages and the like in it.
--file takes a file with one pattern to match per line.
--fixed-strings avoids treating any characters in the patterns as special.
--line-regexp anchors the patterns so that they only match if a full line of input matches one of the patterns.
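To make the effect of those options concrete, here is a small, hypothetical demonstration (the file names and their contents are made up for illustration):
printf '%s\n' 'SVCS/common/' 'SVCS/common/src/Foo.java' > input.txt
printf '%s\n' 'SVCS/common/' > patterns.txt
grep -Ff patterns.txt input.txt    # substring match: prints both lines
grep -Fxf patterns.txt input.txt   # whole-line match: prints only SVCS/common/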
All that said, could you clarify what you are actually trying to accomplish?

Bash/Shell - paths with spaces messing things up

I have a bash/shell function that is supposed to find files and then awk/copy the first file it finds to another directory. Unfortunately, if the directory that contains the file has spaces in its name, the whole thing fails, since the path gets truncated for some reason or another. How do I fix it?
If file.txt is in /path/to/search/spaces are bad/ it fails.
dir=/path/to/destination/ | find /path/to/search -name file.txt | head -n 1 | awk -v dir="$dir" '{printf "cp \"%s\" \"%s\"\n", $1, dir}' | sh
cp: /path/to/search/spaces: No such file or directory
If file.txt is in /path/to/search/spacesarebad/ it works, but notice there are no spaces. :-/
Awk's default field separator is whitespace. Simply change it to something else by doing:
awk -F"\t" ...
Your script should look like:
dir=/path/to/destination/ | find /path/to/search -name file.txt | head -n 1 | awk -F"\t" -v dir="$dir" '{printf "cp \"%s\" \"%s\"\n", $1, dir}' | sh
As pointed out in the comments, you don't really need all those steps; you could actually simply do (one-liner):
dir=/path/to/destination/ && path="$(find /path/to/search -name file.txt | head -n 1)" && cp "$path" "$dir"
Formatted code (which may look better, in this case ^^):
dir=/path/to/destination/
path="$(find /path/to/search -name file.txt | head -n 1)"
cp "$path" "$dir"
The double quotes keep the entire string together as a single word when the variable is expanded, so the IFS separator (whitespace by default) is not applied to it.
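A tiny illustration of the difference the quotes make (the paths are hypothetical):
path='/path/to/search/spaces are bad/file.txt'
cp $path /path/to/destination/     # unquoted: split into several arguments, cp fails
cp "$path" /path/to/destination/   # quoted: the whole path is passed as one argument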
If you think spaces are bad, wait till you get into trouble with newlines. Consider for example:
mkdir spaces\ are\ bad
touch spaces\ are\ bad/file.txt
mkdir newlines$'\n'are$'\n'even$'\n'worse
touch newlines$'\n'are$'\n'even$'\n'worse/file.txt
And:
find . -name file.txt
The head command assumes newline delimiters. You can get around the space and newline issues with GNU find and GNU grep (maybe others) by using \0 delimiters:
find . -name file.txt -print0 | grep -zm1 . | xargs -0 cp -t "$dir"
You could try this.
awk '{print substr($0, index($0,$9))}'
For example, this is the output of the ls -l command:
-rw-r--r--. 1 root root 73834496 Dec 6 10:55 File with spaces 2
If you use simple awk like this
# awk '{print $9}'
It returns only
# File
If used with the full command
# awk '{print substr($0, index($0,$9))}'
I get the whole output
File with spaces 2
Here
substr(s, a, b): returns b characters from string s, starting at position a. The parameter b is optional.
For example if the match is addr:192.168.1.133 and you use substr as follows
# awk '{print substr($2,6)}'
You get the IP, i.e. 192.168.1.133. Note that 6 is the position of the first character after addr:, counting from the a in addr.
So in the proper command, the $2 becomes $0 (the whole line), and index($0,$9) finds where field 9 starts, so everything from field 9 to the end of the line is printed. You can change that to index($0,$8) and see that the output changes to
# 10:55 File with spaces 2
index(IN, FIND)
This searches the string IN for the first occurrence of the string FIND, and returns the position in characters where that occurrence begins in the string IN.
I hope it helps. Moreover, if you are assigning this value to a variable in a script, then you need to enclose the variable in double quotes. Otherwise you will get errors if you do some other operation on the extracted file name.
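As a hedged sketch of that last point (the directory names here are hypothetical, and parsing ls is still fragile, as other answers note):
# take the first entry after ls -l's "total" line and keep any spaces in its name
name="$(ls -l /some/dir | awk 'NR==2 {print substr($0, index($0,$9))}')"
cp "/some/dir/$name" /some/destination/   # the quotes preserve the spaces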

Get the newest file based on timestamp

I am new to shell scripting, so I need some help with how to go about this problem.
I have a directory which contains files in the following format. The files are in a directory called /incoming/external/data.
AA_20100806.dat
AA_20100807.dat
AA_20100808.dat
AA_20100809.dat
AA_20100810.dat
AA_20100811.dat
AA_20100812.dat
As you can see, the filename includes a timestamp, i.e. [RANGE]_[YYYYMMDD].dat.
What I need to do is find out which of these files has the newest date using the timestamp in the filename (not the system timestamp), store that filename in a variable, move it to another directory, and move the rest to a different directory.
For those who just want an answer, here it is:
ls | sort -n -t _ -k 2 | tail -1
Here's the thought process that led me here.
I'm going to assume the [RANGE] portion could be anything.
Start with what we know.
Working Directory: /incoming/external/data
Format of the Files: [RANGE]_[YYYYMMDD].dat
We need to find the most recent [YYYYMMDD] file in the directory, and we need to store that filename.
Available tools (I'm only listing the relevant tools for this problem ... identifying them becomes easier with practice):
ls
sed
awk (or nawk)
sort
tail
I guess we don't need sed, since we can work with the entire output of ls command. Using ls, awk, sort, and tail we can get the correct file like so (bear in mind that you'll have to check the syntax against what your OS will accept):
NEWESTFILE=`ls | awk -F_ '{print $1 $2}' | sort -n -k 2,2 | tail -1`
Then it's just a matter of putting the underscore back in, which shouldn't be too hard.
EDIT: I had a little time, so I got around to fixing the command, at least for use in Solaris.
Here's the convoluted first pass (this assumes that ALL files in the directory are in the same format: [RANGE]_[yyyymmdd].dat). I'm betting there are better ways to do this, but this works with my own test data (in fact, I found a better way just now; see below):
ls | awk -F_ '{print $1 " " $2}' | sort -n -k 2 | tail -1 | sed 's/ /_/'
... while writing this out, I discovered that you can just do this:
ls | sort -n -t _ -k 2 | tail -1
I'll break it down into parts.
ls
Simple enough ... gets the directory listing, just filenames. Now I can pipe that into the next command.
awk -F_ '{print $1 " " $2}'
This is the AWK command. It allows you to take an input line and modify it in a specific way. Here, all I'm doing is specifying that awk should break the input wherever there is an underscore (_). I do this with the -F option. This gives me two halves of each filename. I then tell awk to output the first half ($1), followed by a space (" "), followed by the second half ($2). Note that the space was the part that was missing from my initial suggestion. Also, this is unnecessary, since you can specify a separator in the sort command below.
Now the output is split into [RANGE] [yyyymmdd].dat on each line. Now we can sort this:
sort -n -k 2
This takes the input and sorts it based on the 2nd field. The sort command uses whitespace as a separator by default. While writing this update, I found the documentation for sort, which allows you to specify the separator, so AWK and SED are unnecessary. Take the ls and pipe it through the following sort:
sort -n -t _ -k 2
This achieves the same result. Now you only want the last file, so:
tail -1
If you used awk to separate the filename (which is just adding extra complexity, so don't do it), you can replace the space with an underscore again with sed:
sed 's/ /_/'
Some good info here, but I'm sure most people aren't going to read down to the bottom like this.
This should work:
newest=$(ls | sort -t _ -k 2,2 | tail -n 1)
others=($(ls | sort -t _ -k 2,2 | head -n -1))
mv "$newest" newdir
mv "${others[@]}" otherdir
It won't work if there are spaces in the filenames, although you could modify the IFS variable to handle that (sketched below).
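For example, a sketch of that IFS tweak (it still parses ls, and it still breaks on newlines in file names):
IFS=$'\n'                                      # split command substitution results on newlines only
newest=$(ls | sort -t _ -k 2,2 | tail -n 1)
others=($(ls | sort -t _ -k 2,2 | head -n -1))
mv "$newest" newdir
mv "${others[@]}" otherdir
unset IFS                                      # restore the default word splitting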
Try:
$ ls -lr
Hope it helps.
Use:
ls -r -1 AA_*.dat | head -n 1
(assuming there are no other files matching AA_*.dat)
ls -1 AA* | sort -r | head -1
Due to the naming convention of the files, alphabetical order is the same as date order. In bash, '*' expands alphabetically (pathname expansion results are sorted; see the Pathname Expansion section of the bash manual), and ls certainly sorts its output, so the file with the newest date will be the last one alphabetically.
Therefore, in bash
mv $(ls | tail -1) first-directory
mv * second-directory
Should do the trick.
If you want to be more specific about the choice of file, then replace * with something else - for example AA_*.dat
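For instance, restricted to the AA_*.dat files (a sketch, assuming first-directory and second-directory already exist and the file names contain no spaces):
mv "$(ls AA_*.dat | tail -1)" first-directory   # newest by the date in the name
mv AA_*.dat second-directory                    # the rest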
My solution to this is similar to others, but a little simpler.
ls -tr | tail -1
What it actually does is rely on ls to sort the output by modification time (-t, with -r reversing it so the newest comes last), then use tail to get the last listed file name.
This solution will not work if the filename you require has a leading dot (e.g. .profile).
This solution does work if the file name contains a space.

Linux commands to output part of input file's name and line count

What Linux commands would you use, successively, for a bunch of files, to count the number of lines in each file and write to an output file with part of the corresponding input filename as part of the output line? So for example, if we were looking at the file LOG_Yellow and it had 28 lines, the output file would have a line like this (Yellow and 28 are tab separated):
Yellow 28
wc -l [filenames] | grep -v " total$" | sed s/[prefix]//
The wc -l generates the output in almost the right format; grep -v removes the "total" line that wc generates for you; sed strips the junk you don't want from the filenames.
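Concretely, with the LOG_ prefix from the question substituted for the placeholders, that pipeline might look like this (a sketch; note it prints the count before the name, not the tab-separated name-then-count order asked for):
wc -l LOG_* | grep -v " total$" | sed 's/LOG_//'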
wc -l * | head --lines=-1 > output.txt
produces output like this:
linecount1 filename1
linecount2 filename2
I think you should be able to work from here to extend to your needs.
edit: since I haven't seen the rules for your name extraction, I still leave the full name. However, unlike other answers, I'd prefer to use head rather than grep, which not only should be slightly faster, but also avoids the case of filtering out files named total*.
edit2 (having read the comments): the following does the whole lot:
wc -l * | head --lines=-1 | sed s/LOG_// | awk '{print $2 "\t" $1}' > output.txt
wc -l * | grep -v " total"
sends
28 Yellow
You can reverse it if you want (with awk, if you don't have spaces in file names):
wc -l * | egrep -v " total$" | sed s/[prefix]// | awk '{print $2 " " $1}'
Short of writing the script for you (a sketch follows this list):
'for' for looping through your files.
'echo -n' for printing the current file name.
'wc -l' for finding out the line count.
And don't forget to redirect ('>' or '>>') your results to your output file.
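A minimal sketch along those lines, assuming LOG_-prefixed files as in the question and tab-separated output (printf is used here instead of echo -n so the tab is emitted portably):
for f in LOG_*; do
    printf '%s\t' "${f#LOG_}"   # the part of the name after the LOG_ prefix
    wc -l < "$f"                # the line count alone, without the file name
done > output.txt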
