Scanning, grouping and counting file extensions on Linux - bash

Is there a way to scan a path and group and count the file extensions?

If I understand your question correctly, you can use this command:
ls -ls | awk '{print $10}' | grep "\." | awk -F. '{print $2}' | sort | uniq -c
which counts the extensions in the current path.
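If filenames can contain more than one dot, a rough sketch using shell parameter expansion keeps only the text after the last dot instead (non-recursive, current directory only):
for f in *.*; do
    printf '%s\n' "${f##*.}"    # strip everything up to and including the last dot
done | sort | uniq -c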

How to count files
To count how many files there are for each extension in a path, you can use one of the answers to the "Count files in a directory by extension" question on another site[1], e.g.:
ls | awk -F . '{print $NF}' | sort | uniq -c | awk '{print $2,$1}'
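A minor variation on the same pipeline (same caveats as above) sorts the most common extensions to the top before swapping the columns:
ls | awk -F . '{print $NF}' | sort | uniq -c | sort -rn | awk '{print $2,$1}'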
How to list grouped by extension
To group the files by extension you can simply use the -X option of ls:
ls -X
From the ls documentation:
--sort=WORD
sort by WORD instead of name:
none -U, extension -X, size -S, time -t, version -v
Note:
The concept of an extension is imported from DOS; under Unix there is only the file name, which may happen to contain more than one '.' character...
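As a quick illustration of that grouping (the filenames here are made up), GNU ls typically lists files without an extension first, then groups the rest by extension:
touch notes.txt draft.txt run.sh README    # hypothetical sample files
ls -X    # README, then run.sh, then draft.txt notes.txt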

Related

awk issue, summing lines in various files

I have a list of files starting with the word "output", and I want to sum up the total number of rows in all the files.
Here's my strategy:
for f in `find outpu*`;do wc -l $f | awk '{x+=$1}END{print $1}' ; done
If, before piping it on, there were a way I could do something like >> into a temporary variable and then run the awk command on that afterwards, I could accomplish this goal.
Any tips?
Use this to see the details and the sum:
wc -l output*
and this to see only the sum:
wc -l output* | tail -n1 | cut -d' ' -f1
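Note that wc right-aligns its counts, so cutting on a single space can yield an empty field when the total line is padded with leading blanks; a sketch of the same idea using awk, which splits on any run of whitespace:
wc -l output* | tail -n1 | awk '{print $1}'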
Here is some stuff for fun, check it out:
grep -c . out* | cut -d':' -f2- | paste -sd+ | bc
all lines, including empty ones:
grep -c '' out* | cut -d':' -f2- | paste -sd+ | bc
You can play with the grep pattern to put conditions on which lines in the files are counted.
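As an arbitrary illustration of such a condition, this totals only the lines that are not comments:
grep -c -v '^#' out* | cut -d':' -f2- | paste -sd+ | bc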
Watch out: the shell expands outpu* before find runs, so this find command only finds things in your current directory, and only works if there is at least one file matching outpu*.
One way of doing it:
awk 'END{print NR}' $(find . -name 'outpu*')
Provided that there is not an insane number of matching filenames that overflows the maximum command-line length limit of your shell.
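A variant that sidesteps the command-line length limit altogether is to let find hand the files to cat in batches (a sketch; -maxdepth 1 keeps it to the current directory):
find . -maxdepth 1 -type f -name 'output*' -exec cat {} + | wc -l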

Append xargs argument number as prefix

I want to analyze the most frequently occurring entries in (a column of) a logfile. To write out the detailed results, I am creating new directories from the output of something along the lines of
cat logs | cut -d',' -f 6 | sort | uniq -c | sort -rn | head -10 | \
awk '{print $2}' | xargs mkdir -p
Is there a way to create the directories with the sequence number of the argument, as processed by xargs, as a prefix? E.g. if "oranges" is the most frequent entry (of the column), the directory created should be named "1.oranges", and so on.
A quick (and dirty?) solution could be to pipe your directory names through cat -n in their proper order and then remove the whitespace separating the line number from the directory name, before passing them to xargs.
A better solution would be to modify your awk command:
... | awk '{ print NR "." $2 }' | xargs mkdir -p
The NR variable contains the record (i.e. line) number.
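To see what the modified pipeline produces, here is a self-contained sketch with made-up counts standing in for the sort | uniq -c output:
printf '%s\n' '42 oranges' '17 apples' '9 pears' | awk '{ print NR "." $2 }' | xargs mkdir -p
# creates the directories 1.oranges, 2.apples and 3.pears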

How would I search inside a tar file for executable files and use that number as a variable?

So far I have this:
Executables=$(cd $tarchive | awk '{print $1}' | grep -c 'd')
This searches for .sh files and counts them, in theory. I would like to do the same thing but for executable files.
Try
Executables=`tar -tvf archive.tar | grep "^...x" | awk '{print $6}'`
Explanations:
`command` is the same as $(command)
tar -tvf archive.tar lists the contents of archive.tar in verbose mode (necessary to get file permissions)
grep "^...x" gets all lines that start with three symbols (. stands for anything), followed by an x (meaning that this is an executable file)
awk '{print $6}' prints the 6th column, in this case the filename
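Since the question asks for a count rather than a list, one possible variation (a sketch, keeping the archive name archive.tar from above) uses grep -c and a leading - in the pattern so that directories are not counted:
Executables=$(tar -tvf archive.tar | grep -c "^-..x")    # count owner-executable regular files
echo "$Executables"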

find a directory with most recent update and with a particular prefix in name

I have a number of directories starting with the same prefix (say foo123, foo345, foo234, foo456h, ...) in a particular directory.
Now I want to find the directory with the prefix foo which was created (modified) most recently. What would be the best way to do this?
Very similar to the others:
ls -ltrd foo* | tail -1 | awk '{print $NF}'
or, if you want the whole list, do it without the awk; awk is just returning the name:
ls -ltr | grep '^d' | awk '{print $NF}' | grep '^foo' | tail -1
How about:
ls -tdF foo* | grep '/$' | head -1
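If you would rather not parse ls output at all, a sketch using GNU find (the %T@ format prints the modification time as seconds since the epoch):
find . -maxdepth 1 -type d -name 'foo*' -printf '%T@ %p\n' | sort -n | tail -1 | cut -d' ' -f2-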

Linux commands to output part of input file's name and line count

What Linux commands would you use, successively, for a bunch of files, to count the number of lines in each file and write to an output file with part of the corresponding input file's name as part of the output line? So for example, if we were looking at the file LOG_Yellow and it had 28 lines, then the output file would have a line like this (Yellow and 28 are tab separated):
Yellow 28
wc -l [filenames] | grep -v " total$" | sed s/[prefix]//
The wc -l generates the output in almost the right format; grep -v removes the "total" line that wc generates for you; sed strips the junk you don't want from the filenames.
wc -l * | head --lines=-1 > output.txt
produces output like this:
linecount1 filename1
linecount2 filename2
I think you should be able to work from here to extend to your needs.
edit: since I haven't seen the rules for your name extraction, I still leave the full name. However, unlike other answers I'd prefer to use head rather than grep, which not only should be slightly faster, but also avoids the case of filtering out files named total*.
edit2 (having read the comments): the following does the whole lot:
wc -l * | head --lines=-1 | sed s/LOG_// | awk '{print $2 "\t" $1}' > output.txt
wc -l * | grep -v " total"
sends
28 Yellow
You can reverse the order if you want (with awk, provided there are no spaces in the file names):
wc -l * | egrep -v " total$" | sed s/[prefix]// | awk '{print $2 " " $1}'
Short of writing the script for you:
'for' for looping through your files
'echo -n' for printing the current file name
'wc -l' for finding out the line count
And don't forget to redirect ('>' or '>>') your results to your output file; a sketch along these lines follows.
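Putting those pieces together, a minimal sketch (assuming the LOG_ prefix and tab-separated output from the question; printf is used instead of echo -n so the tab is explicit):
for f in LOG_*; do
    lines=$(wc -l < "$f")                     # just the count, no filename
    printf '%s\t%d\n' "${f#LOG_}" "$lines"    # name without the prefix, a tab, then the count
done > output.txt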

Resources