Shell: printing just the part after the . (dot) - bash

I need to find just the extensions of all files in a directory (if two files share an extension, it should appear only once). I already have a script, but its output looks like this:
test.txt
test2.txt
hello.iso
bay.fds
hellllu.pdf
I'm using grep -e -e '.' and it just highlights the dots.
I need just the extensions, collected in one variable like txt,iso,fds,pdf.
Can anyone help? I had a working version once, but it used an array, and today I found out it has to work on dash too.

You can use find with awk to get all unique extensions:
find . -type f -name '?*.?*' -print0 |
awk -F. -v RS='\0' '!seen[$NF]++{print $NF}'

This can be done with find as well, but I think this is easier:
for f in *.*; do echo "${f##*.}"; done | sort -u
If you want to assign a comma-separated list of the unique extensions to a variable, you can do this:
ext=$(for f in *.*; do echo "${f##*.}"; done | sort -u | paste -sd,)
echo $ext
csv,pdf,txt
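One caveat worth noting (a sketch of an optional guard, not strictly part of the answer above): in dash, if nothing matches *.*, the unexpanded pattern is passed through as a literal word, so you may want to skip non-existent entries:
ext=$(for f in *.*; do [ -e "$f" ] || continue; printf '%s\n' "${f##*.}"; done | sort -u | paste -sd,)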
Alternatively, with ls:
ls -1 *.* | rev | cut -d. -f1 | rev | sort -u | paste -sd,
The rev/rev pair is required if a filename contains more than one dot, assuming the extension is whatever follows the last dot. For any other directory, simply change the *.* part to dirpath/*.* in all of the scripts.
I'm not sure I understand your comment. If you don't assign the result to a variable, it will simply print to standard output. If you want to pass the directory name to a script, put the code into a script file and replace dirpath with $1, assuming the directory will be the script's first argument:
#!/bin/bash
# print unique extension in the directory passed as an argument, i.e.
ls -1 "$1"/*.* ...
If you have subdirectories whose names contain extensions, the scripts above will include them as well; to limit the report to regular files, replace the ls ... part with:
find . -maxdepth 1 -type f -name "*.*" | ...
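Putting those pieces together, a rough dash-compatible sketch (the script name is illustrative, and it assumes a find that supports -maxdepth and filenames without newlines) that takes the directory as its first argument and prints the unique extensions as a comma-separated list:
#!/bin/sh
# unique_exts.sh (illustrative name): comma-separated list of unique
# extensions of the regular files directly inside the given directory.
dir=${1:-.}
find "$dir" -maxdepth 1 -type f -name '?*.?*' |
sed 's/.*\.//' |
sort -u |
paste -sd, -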

Related

filename group by a pattern and select only one from each group

I have the following files (as an example; there are 60000+ actually) and all the log files follow this pattern:
analyse-ABC008795-84865-201911261249.log
analyse-ABC008795-84866-201911261249.log
analyse-ABC008795-84867-201911261249.log
analyse-ABC008795-84868-201911261249.log
analyse-ABC008795-84869-201911261249.log
analyse-ABC008796-84870-201911261249.log
analyse-ABC008796-84871-201911261249.log
analyse-ABC008796-84872-201911261249.log
analyse-ABC008796-84873-201911261249.log
Only the numbers change between log files. I want to take one file from each category, where files are categorized by the ABC... number. So, as you can see, there are only two categories here:
analyse-ABC008795
analyse-ABC008796
So what I want is one file (let's say the first file) from each category. The output should look like this:
analyse-ABC008795-84865-201911261249.log
analyse-ABC008796-84870-201911261249.log
This should be done in a Bash/Linux environment, so that once I have this list I can use grep to check whether my "searching string" is contained in those files:
ls -l | <what should I do to group and get one file from each category> | grep "searching string"
With bash and awk.
files=(*.log)
printf '%s\n' "${files[@]}" | awk -F- '!seen[$2]++'
Or use find instead of a bash array for a more portable approach.
find . -type f -name '*.log' | awk -F- '!seen[$2]++'
If your find has the -printf action and you don't want the leading ./ on the filenames, add this before the pipe |:
-printf '%f\n'
The !seen[$2]++ removes the second and subsequent occurrences of each key, without having to sort the input first. $2 means the second field, as split by the -F- separator.
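To feed those representative files straight into the grep step from the question, one possible sketch (assuming GNU find's -printf and log names without spaces or newlines) is:
find . -maxdepth 1 -type f -name '*.log' -printf '%f\n' |
awk -F- '!seen[$2]++' |
xargs grep -l "searching string"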

How to grep files in date order

I can list the Python files in a directory from most recently updated to least recently updated with
ls -lt *.py
But how can I grep those files in that order?
I understand one should never try to parse the output of ls as that is a very dangerous thing to do.
You may use this pipeline with GNU utilities to achieve this:
find . -maxdepth 1 -name '*.py' -printf '%T@:%p\0' |
sort -z -t : -rnk1 |
cut -z -d : -f2- |
xargs -0 grep 'pattern'
This will handle filenames with special characters such as space, newline, glob etc.
find finds all *.py files in the current directory and prints the modification time (epoch value) + : + filename + a NUL byte
sort performs a reverse numeric sort on the first column, the timestamp
cut removes the first column (the timestamp) from the output
xargs -0 grep searches for the pattern in each file
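If you would rather keep the ordered list around for several searches, a possible bash-only variant (a sketch; it needs bash 4.4+ for mapfile -d '', and the py_files array name is just illustrative) is:
# collect the newest-first, NUL-delimited file list into an array
mapfile -d '' -t py_files < <(
  find . -maxdepth 1 -name '*.py' -printf '%T@:%p\0' |
  sort -z -t : -rnk1 |
  cut -z -d : -f2-
)
grep 'pattern' "${py_files[@]}"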
There is a very simple way if you want the list of files that contain the pattern, in chronological order:
grep -sil <searchpattern> <files-to-grep> | xargs ls -ltr
i.e. you grep for e.g. "hello world" in *.txt; with -sil you make the grep case insensitive (-i), suppress error messages (-s) and just list the matching files (-l); that list you then pass on to ls (| xargs), showing details (-l) and sorting by date (-t) in reverse order (-r), oldest first.

How to print the amount of files in a folder (recursively) seperated by extensions?

For example, I have a folder containing files of different types (.jpg, .png, .txt, ..) and would like to know how many files of each extension there are in my folder, separately.
The output would be something like this:
.jpg : 255
.png : 123
.txt : 12
No extension : 1
For now, I only know how to find how many files exist for one given extension using this command:
find /folderpath -type f -name '*.jpg' | wc -l
However, I would like it to discover the file extensions by itself.
Thanks for your help.
You can do this for a single directory with:
ls | grep '\.' | sed 's/.*\././' | sort | uniq -c
(I'm ignoring files with no . - tweak if you want something else)
I'd suggest fleshing this out into a script (say, extension_counts) that takes a list of directories, and for each one outputs the path followed by the report in the format you wish.
Quick and dirty version:
#!/bin/sh
for dir in $*; do
    echo $dir
    (cd $dir && ls | grep '\.' | sed 's/.*\././' | sort | uniq -c)
done
... but you should consider hardening this (see the sketch at the end of this answer).
Then for the recursive part, you can use find and xargs:
find . -type d | xargs extension_counts
You could be a bit smarter and do it all in one script file by defining extension_counts as a function, but that's an optimisation.
There are some pitfalls to parsing the output of ls (or find). In this case the only potential issue I can think of is filenames containing a newline (yes, this is possible). You could just accept that you're using a tool not designed for weird filenames, or you could write something more robust in a language with firmer data structures, such as Python, Perl, Ruby, Go, etc.
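As a reference point for that hardening, one slightly more careful sketch of extension_counts (it quotes its arguments and uses "$@", but is otherwise the same ls-based idea and keeps the same caveats) could be:
#!/bin/sh
# extension_counts: print each directory name followed by its
# per-extension file counts (rough sketch, still ls-based)
for dir in "$@"; do
    printf '%s\n' "$dir"
    (cd "$dir" && ls | grep '\.' | sed 's/.*\././' | sort | uniq -c)
done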
This could be done with a quick awk one-liner as well:
find /folderpath -type f -name '*.*' | awk -F"." 'BEGIN{OFS=" : "}{extensions[$NF]++}END{for (ext in extensions) { print ext, extensions[ext]}};'
That awk script splits each line on periods (-F".").
It sets the OFS (Output Field Separator) to " : " with BEGIN{OFS=" : "}.
It loads an array keyed by file extension, extensions[$NF], where $NF is the last field of the record; the array value is a running count (++).
When all lines are processed, it iterates over the array, for (ext in extensions), and prints each key and its count, {print ext, extensions[ext]}.
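Spelled out over several lines (purely for readability; this is the same program, not a different approach):
find /folderpath -type f -name '*.*' | awk -F"." '
BEGIN { OFS = " : " }                                      # key/count separator
{ extensions[$NF]++ }                                      # count by last dot-separated field
END { for (ext in extensions) print ext, extensions[ext] }'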
I would proceed this way:
list the file names (rather than the full paths produced by find):
find . -type f | rev | cut -d/ -f1 | rev
We reverse each line so that we can easily address the last field
reduce each name to its extension:
sed -E 's/^.*\././;t end;s/.*/No extension/;:end'
Here we remove everything up to the last dot (keeping the dot), or, if the substitution could not be done (because there was no dot), we replace the whole name with "No extension".
sort the result:
sort
group by extension and add the count:
uniq -c
The complete command is as follows:
find . -type f | rev | cut -d/ -f1 | rev | sed -E 's/^.*\././;t end;s/.*/No extension/;:end' | sort | uniq -c
Note that the presentation differs from yours, which could easily be fixed with an additional sed:
2 .119
1 .147
[...]
1 .Xauthority
1 .xml
1 .xsession-errors
2 .zip
1 .zshrc
48 No extension
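As mentioned above, an additional sed can reshape uniq -c's "count name" columns into the requested "name : count" layout (a sketch; it assumes the usual uniq -c spacing):
find . -type f | rev | cut -d/ -f1 | rev | sed -E 's/^.*\././;t end;s/.*/No extension/;:end' | sort | uniq -c | sed -E 's/^ *([0-9]+) (.*)/\2 : \1/'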

Bash, getting the latest folder based on its name which is a date

Can anyone tell me how to get, using bash, the name of the latest folder based on its name, which is formatted as a date? For example:
20161121/
20161128/
20161205/
20161212/
The output should be: 20161212
Just use GNU sort with the -nr flags for a reverse numerical sort:
find . ! -path . -type d | sort -nr | head -1
As an example structure, I have the following folders in my current path:
find . ! -path . -type d
./20161121
./20161128
./20161205
./20161212
See how the sort brings the folder you need to the top:
find . ! -path . -type d | sort -nr
./20161212
./20161205
./20161128
./20161121
and head -1 keeps the first entry alone:
find . ! -path . -type d | sort -nr | head -1
./20161212
To store it in a variable, use command substitution $() as:
myLatestFolder=$(find . ! -path . -type d | sort -nr | head -1)
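To get just the bare folder name (20161212, without the leading ./), you can strip the prefix with a parameter expansion, for example:
myLatestFolder=$(find . ! -path . -type d | sort -nr | head -1)
myLatestFolder=${myLatestFolder#./}   # drop the leading ./
echo "$myLatestFolder"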
Sorting everything seems like extra work if all you want is a single entry. It could especially be problematic if you need to sort a very large number of entries. Plus, you should note that find-based solutions will by default traverse subdirectories, which might or might not be what you're after.
$ shopt -s extglob
$ mkdir 20160110 20160612 20160614 20161120
$ printf '%d\n' 20+([0-9]) | awk '$1>d{d=$1} END{print d}'
20161120
$
While the pattern 20+([0-9]) doesn't precisely match dates (it's hard to validate dates without at least a couple of lines of code), we've at least got a bit of input validation via printf, and a simple "print the highest" awk one-liner to parse printf's results.
Oh, also, this handles any directory entries that are named appropriately, and does not validate that they are themselves directories. That too would require either an extra test or a different tool.
One method to require items to be directories would be the use of a trailing slash:
$ touch 20161201
$ printf '%s\n' 20+([0-9])/ | awk '$1>d{d=$1} END{print d}'
20161120/
But that loses the input validation (the %d format for printf).
If you felt like it, you could build a full pattern for your directory names though:
$ dates='20[01][0-9][01][0-9][0-3][0-9]'
$ printf '%s\n' $dates/ | awk '$1>d{d=$1} END{print d}'
20161120/

To show only file name without the entire directory path

ls /home/user/new/*.txt prints all txt files in that directory. However, it prints the output as follows:
[me#comp]$ ls /home/user/new/*.txt
/home/user/new/file1.txt /home/user/new/file2.txt /home/user/new/file3.txt
and so on.
I want to run the ls command from outside the /home/user/new/ directory, so I have to give the full directory path, yet I want the output to be only:
[me#comp]$ ls /home/user/new/*.txt
file1.txt file2.txt file3.txt
I don't want the entire path; only the filename is needed. This issue has to be solved using the ls command, as its output is meant for another program.
ls whateveryouwant | xargs -n 1 basename
Does that work for you?
Otherwise you can (cd /the/directory && ls) (yes, parentheses intended)
No need for xargs and all that; ls is more than enough.
ls -1 *.txt
This displays the names one per line.
There are several ways you can achieve this. One would be something like:
for filepath in /path/to/dir/*
do
    filename=$(basename "$filepath")
    # ... whatever you want to do with the file here
done
Use the basename command:
basename /home/user/new/*.txt
(cd dir && ls)
will only output filenames in dir. Use ls -1 if you want one per line.
(Changed ; to && as per Sactiw's comment).
You could add a sed script to your command line:
ls /home/user/new/*.txt | sed -r 's/^.+\///'
A fancy way to solve it is by using "rev" twice together with "cut":
find ./ -name "*.txt" | rev | cut -d '/' -f1 | rev
The selected answer did not work for me, as I had spaces, quotes and other strange characters in my filenames. To quote the input for basename, you should use:
ls /path/to/my/directory | xargs -n1 -I{} basename "{}"
This is guaranteed to work, regardless of what the files are called.
I prefer the basename approach, which fge has already answered.
Another way is:
ls /home/user/new/*.txt|awk -F"/" '{print $NF}'
One more, uglier, way is:
ls /home/user/new/*.txt| perl -pe 's/\//\n/g'|tail -1
Just hoping to be helpful to someone, as old problems seem to come back every now and again and I always find good tips here.
My problem was to list, in a text file, the names of all the "*.txt" files in a certain directory, without path and without extension, from a Datastage 7.5 sequence.
The solution we used is:
ls /home/user/new/*.txt | xargs -n 1 basename | cut -d '.' -f1 > name_list.txt
There are lots of ways to do this; you can simply try the following.
ls /home/user/new | grep '\.txt$'
Another method:
cd /home/user/new && ls *.txt
Here is another way:
ls -1 /home/user/new/*.txt|rev|cut -d'/' -f1|rev
You could also pipe to grep and pull everything after the last forward slash. It looks goofy, but I think a defensive grep should be fine unless (like some kind of maniac) you have forward slashes within your filenames.
ls folderpathwithcriteria | grep -P -o -e "[^/]*$"
When you want to list names in a path but they have different file extensions:
me#server:/var/backups$ ls -1 *.zip && ls -1 *.gz
