how to find the last modified file and then extract it - shell

Say I have 3 archive files:
a.7z
b.7z
c.7z
What I want is to find the last modified archive file and then extract it.
1st: find the last modified
2nd: extract it
1st:
ls -t | head -1
My question is how to do the 2nd step by chaining it onto the 1st command with "|"

You can do it like this:
7z e `ls -t | head -1`
The backticks embed the output of the first command in place (command substitution).
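The $(...) form of command substitution does the same thing and is easier to read and nest; quoting it also keeps a filename with spaces in one piece (names containing newlines still need the find-based approach further down):
7z e "$(ls -t | head -1)"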

You can also write more than one command together on a single line:
ls -t | head -1 && 7z e <file_name>.tar.7z
Here the second command extracts the .7z file, but you have to fill in <file_name> yourself.

Here is a safer method of extracting the last modified file in a directory:
find . -maxdepth 1 -type f -printf "%T@\0%p\0\0" |
awk -F '\0' -v RS='\0\0' '$1 > maxt{maxt=$1; maxf=$2} END{printf "%s%s", maxf, FS}' |
xargs -0 7z e
This requires GNU find and GNU awk.
The -printf format uses a single NUL character (\0) as the field separator and two NUL characters (\0\0) as the record separator for awk.
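To see how those records are laid out, you can make the NULs visible with tr (the timestamps below are made up):
$ find . -maxdepth 1 -type f -printf "%T@\0%p\0\0" | tr '\0' '|'
1588890000.1230000000|./a.7z||1588890100.4560000000|./b.7z||1588890200.7890000000|./c.7z||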

Related

Applying awk pattern to all files with same name, outputting each to a new file

I'm trying to recursively find all files with the same name in a directory, apply an awk pattern to them, and then output to the directory where each of those files lives a new updated version of the file.
I thought it was better to use a for loop than xargs, but I don't know exactly how to make this work...
for f in $(find . -name FILENAME.txt );
do awk -F"\(corr\)" '{print $1,$2,$3,$4}' ./FILENAME.txt > ./newFILENAME.txt $f;
done
Ultimately I would like to be able to remove multiple strings from the file at once using -F, but I'm not sure how to do that with awk.
Also, is there a way to remove "(cor*)" where the * represents a wildcard? I'm not sure how to do that while keeping the escape sequence for the parentheses.
Thanks!
To use (corr*) as a field separator where * is a glob-style wildcard, try:
awk -F'[(]corr[^)]*[)]' '{print $1,$2,$3,$4}'
For example:
$ echo '1(corr)2(corrTwo)3(corrThree)4' | awk -F'[(]corr[^)]*[)]' '{print $1,$2,$3,$4}'
1 2 3 4
To apply this command to every file under the current directory named FILENAME.txt, use:
find . -name FILENAME.txt -execdir sh -c 'awk -F'\''[(]corr[^)]*[)]'\'' '\''{print $1,$2,$3,$4}'\'' "$1" > ./newFILENAME.txt' Awk {} \;
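That quoting is dense, so here is the same command spread over several lines; the literal Awk becomes $0 inside sh -c (it shows up in error messages) and each found file becomes $1:
find . -name FILENAME.txt -execdir sh -c '
    awk -F"[(]corr[^)]*[)]" "{print \$1, \$2, \$3, \$4}" "$1" > ./newFILENAME.txt
' Awk {} \;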
Notes
Don't use:
for f in $(find . -name FILENAME.txt ); do
If any file or directory name contains whitespace or other shell-active characters, the results will be an unpleasant surprise.
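If you do want a loop, a whitespace-safe sketch (bash, since read -d '' is not POSIX) pairs find -print0 with NUL-delimited reads and writes each result next to its input file:
find . -name FILENAME.txt -print0 |
while IFS= read -r -d '' f; do
    # ${f%/*} is the directory part of the path produced by find
    awk -F'[(]corr[^)]*[)]' '{print $1,$2,$3,$4}' "$f" > "${f%/*}/newFILENAME.txt"
done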
Handling both parens and square brackets as field separators
Consider this test file:
$ cat file.txt
1(corr)2(corrTwo)3[some]4
To eliminate both types of separators and print the first four columns:
$ awk -F'[(]corr[^)]*[)]|[[][^]]*[]]' '{print $1,$2,$3,$4}' file.txt
1 2 3 4

SHELL printing just the part after . (DOT)

I need to find just the extensions of all files in a directory (if two files share an extension, it should appear only once). I already have a script, but its output is like:
test.txt
test2.txt
hello.iso
bay.fds
hellllu.pdf
I'm using grep -e '.' and it just highlights the dots.
I need just the extensions, in one variable, like txt,iso,fds,pdf.
Can anyone help? I had this working once, but it used an array, and today I found out it has to work on dash too.
You can use find with awk to get all unique extensions:
find . -type f -name '?*.?*' -print0 |
awk -F. -v RS='\0' '!seen[$NF]++{print $NF}'
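For example, with the files from the question (the output order follows find's traversal, so yours may differ):
$ find . -type f -name '?*.?*' -print0 |
  awk -F. -v RS='\0' '!seen[$NF]++{print $NF}'
txt
iso
fds
pdf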
This can be done with find as well, but I think this is easier:
for f in *.*; do echo "${f##*.}"; done | sort -u
If you want to assign a comma-separated list of the unique extensions to a variable:
ext=$(for f in *.*; do echo "${f##*.}"; done | sort -u | paste -sd,)
echo $ext
csv,pdf,txt
Alternatively, with ls:
ls -1 *.* | rev | cut -d. -f1 | rev | sort -u | paste -sd,
The rev/rev pair is required if you have more than one dot in the filename, assuming the extension is after the last dot. For any other directory, simply change *.* to dirpath/*.* in all the scripts.
If you don't assign the result to a variable, it is printed to standard output by default. If you want to pass the directory name to a script, put the code into a script file and replace dirpath with $1, assuming the directory will be the first argument to the script:
#!/bin/bash
# print the unique extensions in the directory passed as an argument
ls -1 "$1"/*.* ...
If you have subdirectories whose names contain dots, the scripts above will include them as well. To limit the results to regular files only, replace the ls part with:
find . -maxdepth 1 -type f -name "*.*" | ...
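Since the question says it has to work on dash, here is the whole thing as one POSIX-sh sketch (no arrays; note that -maxdepth is a GNU/BSD find extension, and filenames containing newlines would still confuse it):
#!/bin/sh
# collect the unique extensions of the regular files in $1 (default: .)
# into one comma-separated variable; runs under dash
dir=${1:-.}
ext=$(find "$dir" -maxdepth 1 -type f -name '*.*' |
      awk -F. '!seen[$NF]++ { print $NF }' |
      paste -sd, -)
echo "$ext"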

Find files that contain string match1 but does not contain match2

I am writing a shell script to find files which contain the string "match1" AND do not contain "match2".
I can do this in 2 parts:
grep -lr "match1" * > /tmp/match1
grep -Lr "match2" * > /tmp/match2
comm -12 /tmp/match1 /tmp/match2
Is there a way I can achieve this directly without going through the process of creating temporary files?
With bash's process substitution:
comm -12 <(grep -lr "match1" *) <(grep -Lr "match2" *)
Using GNU awk for multi-char RS:
awk -v RS='^$' '/match1/ && !/match2/ {print FILENAME}' *
I would use find together with awk. awk can check both matches in a single run, meaning you don't need to process all the files twice:
find -maxdepth 1 -type f -exec awk '/match1/{m1=1}/match2/{m2=1} END {if(m1 && !m2){print FILENAME}}' {} \;
Better explained in the multiline version:
# Set flag if match1 occurs
/match1/{m1=1}
# Set flag if match2 occurs
/match2/{m2=1}
# After all lines of the file have been processed print the
# filename if match1 has been found and match2 has not been found.
END {if(m1 && !m2){print FILENAME}}
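With GNU awk you can also reset the flags per file using the BEGINFILE/ENDFILE patterns, so a single awk process checks all the files instead of one process per file; a sketch:
find . -maxdepth 1 -type f -exec gawk '
    BEGINFILE { m1 = m2 = 0 }                    # reset the flags for each new file
    /match1/  { m1 = 1 }
    /match2/  { m2 = 1 }
    ENDFILE   { if (m1 && !m2) print FILENAME }  # verdict after each file
' {} +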
Is there a way I can achieve this directly without going through the process of creating temporary files?
Yes. You can use pipelines and xargs:
grep -lr "match1" * | xargs grep -Lr "match2"
The first grep prints the names of files containing matches to its standard output, as you know. The xargs command reads those file names from its standard input, and converts them into arguments to the second grep command, appending them after the ones already provided.
You can first search for the files containing match1 and then, using xargs, pass them to another grep with the -L or --files-without-match option.
grep -lr "match1" * | xargs grep -L "match2"

Sorting through numbered files for program execution

I have many files with the same format: mubunching-100302.0003.001_1c, mubunching-100302.0005.001_1c ...
I would like to feed a program many of these files that have a minimum value, e.g. only files with index *.0005.* and greater:
python Code.py mubunching-100302.0005.001_1c mubunching-100302.0008.001_1c ...
I am fairly new to bash and am not sure where to begin. Thanks for any help and suggestions!
You can get a list of all files matching your criteria like this:
ls | awk -F. '$2 >= 5 {print}'
This has awk compare the 2nd .-delimited field against 5, and only print the names for which this is true. If you then want to process these files with your Python script:
ls | awk -F. '$2 >= 5 {print}' | xargs python Code.py
For example, given a directory containing:
$ ls
mubunching-100302.0002.001_1c mubunching-100302.0005.001_1c
mubunching-100302.0003.001_1c mubunching-100302.0008.001_1c
This first command above will produce:
$ ls | awk -F. '$2 >= 5 {print}'
mubunching-100302.0005.001_1c
mubunching-100302.0008.001_1c
You could use find and awk to get the list of desired filenames:
find . -type f -name "mubunching*" | awk -F'[.]' '$(NF-1)>=5'
In order to pass the list to your program, use command substitution:
python Code.py $(find . -type f -name "mubunching*" | awk -F'[.]' '$(NF-1)>=5')
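If you would rather avoid parsing ls/find output altogether, here is a pure-bash sketch; it assumes the names follow the mubunching-<id>.<index>.<rest> pattern from the question:
#!/bin/bash
shopt -s nullglob                 # empty list instead of a literal pattern when nothing matches
files=()
for f in mubunching-*.*.*; do
    idx=${f#*.}                   # strip through the first dot       -> 0005.001_1c
    idx=${idx%%.*}                # keep the part before the next dot -> 0005
    if (( 10#$idx >= 5 )); then   # force base 10 so 0008 isn't read as octal
        files+=("$f")
    fi
done
python Code.py "${files[@]}"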

Bash/Shell - paths with spaces messing things up

I have a bash/shell function that is supposed to find files then awk/copy the first file it finds to another directory. Unfortunately if the directory that contains the file has spaces in the name the whole thing fails, since it truncates the path for some reason or another. How do I fix it?
If file.txt is in /path/to/search/spaces are bad/ it fails.
dir=/path/to/destination/ | find /path/to/search -name file.txt | head -n 1 | awk -v dir="$dir" '{printf "cp \"%s\" \"%s\"\n", $1, dir}' | sh
cp: /path/to/search/spaces: No such file or directory
If file.txt is in /path/to/search/spacesarebad/ it works, but notice there are no spaces. :-/
Awk's default separator is white space. Simply change it to something else by doing:
awk -F"\t" ...
Your script should look like:
dir=/path/to/destination/ | find /path/to/search -name file.txt | head -n 1 | awk -F"\t" -v dir="$dir" '{printf "cp \"%s\" \"%s\"\n", $1, dir}' | sh
As pointed by the comments, you don't really need all those steps, you could actually simply do (one-liner):
dir=/path/to/destination/ && path="$(find /path/to/search -name file.txt | head -n 1)" && cp "$path" "$dir"
Formatted code (which may look better, in this case ^^):
dir=/path/to/destination/
path="$(find /path/to/search -name file.txt | head -n 1)"
cp "$path" "$dir"
The "" are used to assign the entire content of the string to the variable, causing the separator IFS, which is a white space by default, not to be considered over the string.
If you think spaces are bad, wait till you get into trouble with newlines. Consider for example:
mkdir spaces\ are\ bad
touch spaces\ are\ bad/file.txt
mkdir newlines$'\n'are$'\n'even$'\n'worse
touch newlines$'\n'are$'\n'even$'\n'worse/file.txt
And:
find . -name file.txt
The head command assumes newline-delimited input. You can get around both the space and newline issues with GNU find and GNU grep (possibly other implementations too) by using NUL (\0) delimiters:
find . -name file.txt -print0 | grep -zm1 . | xargs -0 cp -t "$dir"
You could try this:
awk '{print substr($0, index($0,$9))}'
For example, this is a line of output from ls -l:
-rw-r--r--. 1 root root 73834496 Dec 6 10:55 File with spaces 2
If you use plain awk like this:
# awk '{print $9}'
It returns only
# File
If you use the full command:
# awk '{print substr($0, index($0,$9))}'
I get the whole output
File with spaces 2
Here, substr(s, a, b) returns b characters from string s, starting at position a. The parameter b is optional; without it, substr returns everything from position a to the end of the string.
For example, if the match is addr:192.168.1.133 and you use substr as follows:
# awk '{print substr($2,6)}'
You get the IP, i.e. 192.168.1.133. The 6 means "start at character 6": addr: is five characters (counting from the a in addr), so the address begins at position 6.
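A quick runnable check (the input line is made up to resemble ifconfig output, where $2 is addr:192.168.1.133):
$ echo 'inet addr:192.168.1.133 Bcast:192.168.1.255' | awk '{print substr($2,6)}'
192.168.1.133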
So in the full command, $0 (the whole line) takes the place of $2, and index($0,$9) finds where field 9 begins, so everything from column 9 to the end of the line is printed. Change it to index($0,$8) and the output becomes:
# 10:55 File with spaces 2
index(in, find)
This searches the string in for the first occurrence of the string find, and returns the position in characters where that occurrence begins in the string in.
I hope it helps. Moreover, if you assign this value to a variable in a script, you need to enclose the variable in double quotes when you use it; otherwise you will get errors when performing further operations on the extracted file name.
