I have a file called samples.list with sample IDs, and some files in my directory that I want to pattern-match against samples.list; I want the output to be the entries of samples.list that have no matching file.
samples.list
SRR1369385
SRR1352799
SRR1377262
SRR1400622
ls -lh
-rw-rw----+ 1 gen dbgap_6109 2.2G Jul 29 02:44 SRR1369385_1.fastq.gz
-rw-rw----+ 1 gen dbgap_6109 2.2G Jul 29 02:44 SRR1369385_2.fastq.gz
-rw-rw----+ 1 gen dbgap_6109 1.2G Jul 29 03:34 SRR1352799_1.fastq.gz
-rw-rw----+ 1 gen dbgap_6109 1.2G Jul 29 03:34 SRR1352799_2.fastq.gz
-rw-rw----+ 1 gen tnt_pipeli 2.2G Jul 29 01:44 sometxt.txt
The output I want (samples that did not match with the file names in the directory):
SRR1377262
SRR1400622
Code I tried:
grep -oFf `cat samples.list` ls -lh | grep -vFf - `cat samples.list`
I would really appreciate it if someone could guide me through the solution.
# find all files named in the way you want and print filenames
find . -maxdepth 1 -type f -name '*_*.fastq.gz' -printf "%f\n" |
# Remove everything except the SRR numbers
sed 's/_.*//' |
# Sort the list, remove duplicate elements
sort -u |
# join the list with samples and print only unmatched elements from samples
join -v1 -o 1.1 <(sort samples.list) -
Tested on repl.
Notes:
Do not use backticks `...`. Prefer $(...) instead; backticks are obsolete and deprecated syntax (see the bash-hackers wiki).
grep's -f option takes a filename, not the content of the file. You could do grep -f some_file.txt to grep with all the regexes stored in some_file.txt.
ls -lh writes its output to stdout. Running grep ... ls -lh would make grep try to search a file named ls (and what would -l and -h be for, if you want to search filenames?). You could use ls -1 | grep ..., but find . -maxdepth 1 | grep ... is better; see the grep-based sketch below.
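For completeness, here is a minimal grep-based sketch in the spirit of your attempt (it assumes the fastq files sit in the current directory; note that -f - makes grep read the patterns from stdin):
# Extract the sample IDs present as files, then print the samples.list
# entries that are not among them.
find . -maxdepth 1 -type f -name '*.fastq.gz' -printf '%f\n' |
    sed 's/_.*//' | sort -u |
    grep -vFxf - samples.list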
Try this:
awk -F_ 'NR==FNR{a[$1]=1;next}!($0 in a)' <(ls) samples.list
First it indexes everything before the first _ in each line of the ls output (NR==FNR is true for those lines), then it prints every line of samples.list that was not indexed (»if a line is not indexed, print it«).
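If you prefer not to parse ls at all, the same awk one-liner also works with find feeding the filenames; a sketch assuming GNU find and the *_N.fastq.gz naming from the question:
awk -F_ 'NR==FNR{a[$1]=1;next}!($0 in a)' \
    <(find . -maxdepth 1 -type f -name '*.fastq.gz' -printf '%f\n') samples.list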
Related
I am trying to get the info of the latest folder created in a particular path.
Here I am using the command below to fetch and filter the results so that I get only folders starting with 11, 12, 19:
ls_info=$(ls -lrt /orcl/grid/product |grep '11\|12\|19')
The output of ls_info is:
total 12
drwxrwx--- 3 oragrid oinstall 4096 May 21 2014 11.2.0.3
drwxr-xr-x 3 oragrid oinstall 4096 Feb 25 2019 11.2.0.4
How can I fetch "11.2.0.4" from this, which is the latest created folder?
Please suggest. Thanks.
Do not parse ls. Use find instead. First get the list of directories you want and print them with their modification timestamps. Then sort the list, keep the newest line and remove the timestamp. With GNU utilities you can do:
find /orcl/grid/product -mindepth 1 -maxdepth 1 -type d '(' -name '11*' -o -name '12*' -o -name '19*' ')' -printf "%Ts\t%f\n" | sort -n | cut -f2- | tail -n1
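If, as in your snippet, you want the result in a variable for later steps, the same pipeline can be wrapped in a command substitution; a minimal sketch under the same GNU-utilities assumption:
latest_dir=$(find /orcl/grid/product -mindepth 1 -maxdepth 1 -type d \
    \( -name '11*' -o -name '12*' -o -name '19*' \) -printf '%Ts\t%f\n' |
    sort -n | cut -f2- | tail -n1)
echo "$latest_dir"    # 11.2.0.4 with the listing shown above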
I have a directory of versioned files. The version of each file is indicated within its filename, e.g. "_v1".
Example
List of files shown by ls:
123_FileA_v1.txt
123_FileA_v2.txt
132_FileB_v1.txt
I want to run a command to see only the latest versions:
123_FileA_v2.txt
132_FileB_v1.txt
My first attempt was to list files by mtime using
ls -ltr
But in my case this doesn't give the result I need; I really want to derive the versions from the filenames.
What would be the best way to do it?
This will do it:
ls | awk -F '_' '!prefixes[$1]++'
Hope it helps!
Edit:
If you want to see specific info you can do:
ls | awk -F '_' '!prefixes[$1]++' | xargs ls -lh
This will work as long as there are no spaces in your filenames.
Edit:
As requested by @PaulHodges, here is the sample output:
$ ls -lh
total 0
drwxr-xr-x 5 Matias-Barrios Matias-Barrios 160B Feb 27 11:40 .
drwxr-xr-x 106 Matias-Barrios Matias-Barrios 3.3K Feb 27 11:39 ..
-rw-r--r-- 1 Matias-Barrios Matias-Barrios 0B Feb 27 11:40 132_FileB_v1.txt
-rw-r--r-- 1 Matias-Barrios Matias-Barrios 0B Feb 27 11:40 123_FileA_v2.txt
-rw-r--r-- 1 Matias-Barrios Matias-Barrios 0B Feb 27 11:40 123_FileA_v1.txt
$ ls | awk -F '_' '!prefixes[$1]++'
.
..
132_FileB_v1.txt
123_FileA_v2.txt
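If spaces in filenames are a concern here, one option is to feed awk NUL-separated names from find instead of ls; a sketch assuming GNU find and gawk (it still keeps the first name seen for each prefix):
find . -maxdepth 1 -type f -printf '%f\0' |
    awk -v RS='\0' -F '_' '!prefixes[$1]++'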
You could do something like
(
PATTERN="[0-9]{3}_[^_]*"
for prefix in `find . | egrep -o "$PATTERN" | sort -u`;
do
ls $prefix* | tail -1;
done
)
It will print
123_FileA_v2.txt
132_FileB_v1.txt
What happens here?
The surrounding parentheses ( ... ) run the code in a subshell and let you copy & paste the whole block as one unit.
The variable PATTERN is used to access all files starting with the same prefix.
The for prefix in `find . | egrep -o "$PATTERN" | sort -u` generates a list of file prefixes.
The ls $prefix* lists all files with the same prefix in alphanumerical order
The | tail -1 shows only the last entry of the former ls $prefix*
Edit
I decided to use find . instead of ls *. With that I hope to circumvent the issues with ls *. Please correct me, if I'm wrong!
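One extra caveat, with a hedged sketch: alphanumerical order puts _v10 before _v2, so if versions can ever go past 9 it is safer to sort each group with GNU sort -V:
(
PATTERN="[0-9]{3}_[^_]*"
for prefix in $(find . -maxdepth 1 -type f | grep -Eo "$PATTERN" | sort -u);
do
    ls "$prefix"* | sort -V | tail -n 1;   # -V is a version sort: v2 < v10
done
)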
I’m working on a requirement where I need to read the first line of the latest file in a directory. The directory can hold multiple files, but I want to read the first line of the latest file among those that have PPP in their file name.
I know how to read the first line of a file and write it into a file:
head -n 1 jsonPPPvp.txt > output.txt
But how can I pick the latest file (as per the timestamp) out of all the files in the directory that have PPP in their name?
Any suggestions please...!
Using find with -print0 and xargs -0 in a command substitution
Your optimal solution, though it still requires 4 subshells, protects against all caveats in filenames: find outputs nul-terminated filenames, xargs -0 turns that nul-terminated list into arguments for ls, which sorts by time in reverse, tail -n1 selects the last (newest) file, and head -n1 reads the first line of that file.
Using the -maxdepth 1 option to find limits the search to the current directory and prevents recursing into subdirectories (remove it if you want to search the entire directory tree below the current directory), e.g.
head -n1 $(find . -maxdepth 1 -type f -name "*PPP*" -print0 |
xargs -0 ls -rt |
tail -n 1)
In addition to working with nul-terminated filenames, this approach benefits from letting xargs form the list for ls to sort rather than looping to find the newest file.
It is maybe not the best solution, but it works (by latest file I mean the file modified with the most recent timestamp):
ls -ltra
total 32
drwxr-xr-x 3 allanrobert primarygroup 4096 Feb 15 17:37 ..
drwxr-xr-x 2 allanrobert primarygroup 4096 Feb 15 17:37 .
-rw-r--r-- 1 allanrobert primarygroup 6 Feb 15 17:40 file2PPP2
-rw-r--r-- 1 allanrobert primarygroup 6 Feb 15 17:40 other
-rw-r--r-- 1 allanrobert primarygroup 6 Feb 15 17:40 file3PPP3
-rw-r--r-- 1 allanrobert primarygroup 6 Feb 15 17:40 other2
-rw-r--r-- 1 allanrobert primarygroup 6 Feb 15 17:40 other1
-rw-r--r-- 1 allanrobert primarygroup 6 Feb 15 17:40 file1PPP
file content:
cat file1PPP
a
b
c
Command:
find . -maxdepth 1 -type f -name '*PPP*' -printf '%T+ %p\n' | sort -r | head -1 | cut -d' ' -f2 | xargs head -1
a
Beware of spaces in filenames!
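To tolerate spaces in filenames, a slight variation of the pipeline keeps everything after the first space and tells xargs to split on newlines only; a sketch assuming GNU find and GNU xargs:
find . -maxdepth 1 -type f -name '*PPP*' -printf '%T+ %p\n' |
    sort -r | head -n 1 | cut -d' ' -f2- | xargs -d '\n' head -n 1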
temp=$(ls -Art | head -n 1)
head -1 "$temp"
head -n 1 $(find ./ -name "*PPP*" -type f | xargs ls -rt1 | tail -n 1)
The drawback of the command above is that you must have a *PPP* file in your directory; otherwise the command produces a wrong result.
You can also try this:
ls -tr | grep "PPP" | tail -n 1 | xargs head -n 1
I am trying to output the number of directories in a given path on a SINGLE line. My desire is to output this:
X-many directories
Currently, with my bash script, I get this:
X-many
directories
Here's my code:
ARGUMENT=$1
ls -l $ARGUMENT | egrep -c '^drwx'; echo -n "directories"
How can I fix my output? Thanks
I suggest
echo "$(ls -l "$ARGUMENT" | egrep -c '^drwx') directories"
This uses the shell's feature of final newline removal for command substitution.
Do not pipe ls output to count directories, as you can get wrong results if special characters have been used in file/directory names.
To count directories use:
shopt -s nullglob
arr=( "$ARGUMENT"/*/ )
echo "${#arr[@]} directories"
The trailing / in the glob makes sure only directories in the "$ARGUMENT" path are matched.
shopt -s nullglob makes the glob expand to nothing when no directory matches (so the array stays empty instead of holding the literal pattern).
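Put together as a complete script, a minimal sketch (assuming the path arrives as the first argument, as in the question):
#!/bin/bash
ARGUMENT=$1
shopt -s nullglob              # empty array instead of the literal pattern
arr=( "$ARGUMENT"/*/ )         # trailing / matches directories only
echo "${#arr[@]} directories"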
As an alternative solution:
$ bc <<< "$(find /etc -maxdepth 1 -type d | wc -l)-1"
116
Another one:
$ count=0; while read curr_line; do count=$((count+1)); done < <(ls -l ~/etc | grep ^d); echo ${count}
116
This would also work correctly with spaces in the folder name:
$ ls -la
total 20
drwxrwxr-x 5 alex alex 4096 Jun 30 18:40 .
drwxr-xr-x 11 alex alex 4096 Jun 30 16:41 ..
drwxrwxr-x 2 alex alex 4096 Jun 30 16:43 asdasd
drwxrwxr-x 2 alex alex 4096 Jun 30 16:43 dfgerte
drwxrwxr-x 2 alex alex 4096 Jun 30 16:43 somefoler with_space
$ count=0; while read curr_line; do count=$((count+1)); done < <(ls -l ./ | grep ^d); echo ${count}
3
How can I write a bash script on Linux to determine which files in two directories have different permissions?
For example, I have two directories:
fold1 having two files:
1- file1 (-rw-rw-r--)
2- file2 (-rw-rw-r--)
fold2 having same-name files with different permissions:
1- file1 (-rwxrwxr-x)
2- file2 (-rw-rw-r--)
I need a script to output the file names that have different permissions,
so the script will print only file1
I am currently checking the permissions manually by displaying the files with:
for i in `find .`; do ls -l $i ls -l ../file2/$i; done
Parsing find . output with: for i in $(find .) is going to give you trouble for any filenames with spaces, newlines, or other perfectly normal characters:
$ touch "one file"
$ for i in `find .` ; do ls -l $i ; done
total 0
-rw-r--r-- 1 sarnold sarnold 0 2012-02-08 17:30 one file
ls: cannot access ./one: No such file or directory
ls: cannot access file: No such file or directory
$
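If you do want to iterate over find output, a nul-delimited read loop avoids the word splitting shown above; a minimal sketch:
# Handles spaces (and even newlines) in filenames safely.
find . -type f -print0 |
while IFS= read -r -d '' f; do
    ls -l "$f"
done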
Since permissions can also differ by owner or by group, I think you should include those as well. If you need to include the SELinux security label, the stat(1) program makes that easy to get as well via the %C directive:
for f in * ; do stat -c "%a%g%u" "$f" "../scatman/${f}" |
sort | uniq -c | grep -q '^\s*1' && echo "$f" is different ; done
(Do whatever you want for the echo command...)
Example:
$ ls -l sarnold/ scatman/
sarnold/:
total 0
-r--r--r-- 1 sarnold sarnold 0 2012-02-08 18:00 funky file
-rw-r--r-- 1 sarnold sarnold 0 2012-02-08 18:01 second file
-rw-r--r-- 1 root root 0 2012-02-08 18:05 third file
scatman/:
total 0
-rw-r--r-- 1 sarnold sarnold 0 2012-02-08 17:30 funky file
-rw-r--r-- 1 sarnold sarnold 0 2012-02-08 18:01 second file
-rw-r--r-- 1 sarnold sarnold 0 2012-02-08 18:05 third file
$ cd sarnold/
$ for f in * ; do stat -c "%a%g%u" "$f" "../scatman/${f}" | sort | uniq -c | grep -q '^\s*1' && echo "$f" is different ; done
funky file is different
third file is different
$
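If you prefer an explicit comparison over the sort | uniq -c trick, here is a sketch that walks one directory and compares mode, owner and group against the other; the fold1/fold2 names are taken from the question:
#!/bin/bash
# Print the names of files in fold1 whose mode/owner/group differ in fold2.
for path in fold1/*; do
    f=${path##*/}                      # basename
    [ -e "fold2/$f" ] || continue      # skip files that exist only in fold1
    if [ "$(stat -c '%a %u %g' "$path")" != "$(stat -c '%a %u %g' "fold2/$f")" ]; then
        echo "$f"
    fi
done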