Extract part of a filename shell script - bash

In bash I would like to extract part of many filenames and save that output to another file.
The files are formatted as coffee_{SOME NUMBERS I WANT}.freqdist.
#!/bin/sh
for f in $(find . -name 'coffee*.freqdist')
That code will find all the coffee_{SOME NUMBERS I WANT}.freqdist files. Now, how do I make an array containing just {SOME NUMBERS I WANT} and write that to file?
I know that to write to file one would end the line with the following.
> log.txt
I'm missing the middle part though of how to filter the list of filenames.

You can do it natively in bash as follows:
filename=coffee_1234.freqdist
tmp=${filename#*_}
num=${tmp%.*}
echo "$num"
This is a pure bash solution. No external commands (like sed) are involved, so this is faster.
Append these numbers to a file using:
echo "$num" >> file
(You will need to delete/clear the file before you start your loop.)
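Putting the pieces above together, a minimal sketch of the whole loop might look like this. The scratch directory and the sample file names are assumptions made up for the demo; redirecting after done truncates log.txt once, so there is no need to clear it first.

```shell
#!/bin/bash
# Sketch: extract the number from each coffee_<number>.freqdist name.
workdir=$(mktemp -d)            # hypothetical scratch directory for the demo
cd "$workdir" || exit 1
touch coffee_512.freqdist coffee_707.freqdist

for f in coffee_*.freqdist; do
    [ -e "$f" ] || continue     # skip the literal pattern if nothing matched
    tmp=${f#*_}                 # drop everything up to the first underscore
    printf '%s\n' "${tmp%.*}"   # drop the final extension
done > log.txt                  # open the output file once, truncating it

cat log.txt
```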

If the intention is just to write the numbers to a file, you do not need the find command:
$ ls coffee*.freqdist
coffee112.freqdist coffee12.freqdist coffee234.freqdist
The following should do it, and its output can then be redirected to a file:
$ ls coffee*.freqdist | sed 's/coffee\(.*\)\.freqdist/\1/'
112
12
234

The previous answers have indicated some necessary techniques. This answer organizes the pipeline in a simple way that might apply to other jobs as well. (If your sed doesn't support ‘;’ as a separator, replace ‘;’ with ‘|sed’.)
$ ls */c*; ls c*
fee/coffee_2343.freqdist
coffee_18z8.x.freqdist coffee_512.freqdist coffee_707.freqdist
$ find . -name 'coffee*.freqdist' | sed 's/.*coffee_//; s/[.].*//' > outfile
$ cat outfile
512
18z8
2343
707
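Since the question asks for an array, here is a hedged sketch that collects the same values into a bash array while reading find's output NUL-delimited, so unusual file names do not break the loop. The directory layout and the names demo and nums are invented for illustration.

```shell
#!/bin/bash
# Sketch: collect the numeric parts into a bash array, then write them out.
demo=$(mktemp -d)                       # hypothetical scratch directory
mkdir "$demo/fee"
touch "$demo/coffee_512.freqdist" "$demo/fee/coffee_2343.freqdist"

nums=()
while IFS= read -r -d '' path; do
    name=${path##*/}                    # strip leading directories
    name=${name#coffee_}                # strip the fixed prefix
    nums+=("${name%.freqdist}")         # strip the fixed suffix
done < <(find "$demo" -name 'coffee_*.freqdist' -print0)

# Sort numerically so the output order does not depend on find.
printf '%s\n' "${nums[@]}" | sort -n > "$demo/outfile"
cat "$demo/outfile"
```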

Related

Script that lists all file names in a folder, along with some text after each name, into a txt file

I need to create a file that lists all the files in a folder into a text file, along with a comma and the number 15 after. For example
My folder has video.mp4, video2.mp4, picture1.jpg, picture2.jpg, picture3.png
I need the text file to read as follows:
video.mp4,15
video2.mp4,15
picture1.jpg,15
picture2.jpg,15
picture3.png,15
No spaces, just filename.ext,15 on each line. I am using a Raspberry Pi. I am aware that the command ls > filename.txt would put all the file names into a file, but how would I get a ,15 after every line?
Thanks
bash one-liner:
for f in *; do echo "$f,15" >> filename.txt; done
To avoid opening the output file on each iteration you may redirect the entire output with > filename.txt:
for f in *; do echo "$f,15"; done > filename.txt
$ printf '%s,15\n' *
picture1.jpg,15
picture2.jpg,15
picture3.png,15
video.mp4,15
video2.mp4,15
This will work if those are the only files in the directory. The format specifier %s,15\n will be applied to each of printf's arguments (the names in the current directory) and they will be outputted with ,15 appended (and a newline).
If there are other files, then the following would work too, regardless of whether there are files called like this or not:
$ printf '%s,15\n' video.mp4 video2.mp4 picture1.jpg picture2.jpg "whatever this is"
video.mp4,15
video2.mp4,15
picture1.jpg,15
picture2.jpg,15
whatever this is,15
Or, on all MP4, PNG and JPEG files:
$ printf '%s,15\n' *.mp4 *.jpg *.png
video.mp4,15
video2.mp4,15
picture1.jpg,15
picture2.jpg,15
picture3.png,15
Then redirect this to a file with printf ...as above... >output.txt.
If you're using Bash, then this will not make use of any external utility, as printf is built into the shell.
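One caveat with the printf approach: if a pattern matches nothing, bash passes the pattern itself to printf, so an empty directory would yield a line such as *.png,15. A small sketch of one way around that, assuming bash and using made-up sample files, is to enable nullglob and expand the patterns into an array first:

```shell
#!/bin/bash
# Sketch: printf over globs, guarding against patterns that match nothing.
demo=$(mktemp -d)                 # hypothetical scratch directory
cd "$demo" || exit 1
touch video.mp4 picture1.jpg "my clip.mp4"

shopt -s nullglob                 # unmatched patterns expand to nothing
files=( *.mp4 *.jpg *.png )       # no .png files exist in this demo

if (( ${#files[@]} > 0 )); then
    printf '%s,15\n' "${files[@]}" > filename.txt
fi
cat filename.txt
```

Quoting "${files[@]}" keeps names with spaces intact as single arguments.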
You need to do something like this:
#!/bin/bash
for i in $(ls folder_name); do
echo $i",15" >> filename.txt;
done
It's possible to do this in one line, however, if you want to create a script, consider code readability in the long run.
Edit 1: better solution
As @CristianRamon-Cortes suggested in the comments below, you should not rely on the output of ls because of the problems explained in this discussion: why not parse ls. As such, here's how you should write the script instead:
#!/bin/bash
cd folder_name || exit 1
for i in *; do
echo "$i,15" >> filename.txt
done
You can skip the part cd folder_name if you are already in the folder.
Edit 2: Enhanced solution:
As suggested by @kusalananda, you'd better do the redirection after done to avoid opening the file in each iteration of the for loop, so the script will look like this:
#!/bin/bash
cd folder_name || exit 1
for i in *; do
echo "$i,15"
done > filename.txt
A single command line using two msr commands that recursively (-r) searches for specific files:
msr -rp your-dir1,dir2,dirN -l -f "\.(mp4|jpg|png)$" -PAC | msr -t .+ -o '$0,15' -PIC > save-file.txt
If you want to sort by time, add --wt to first command like: msr --wt -l -rp your-dirs
To sort by size, add --sz; if you use both --sz and --wt, only the first one given takes effect.
If you want to exclude some directory, add like: --nd "^(test|garbage)$"
To remove the trailing \r\n in save-file.txt: msr -p save-file.txt -S -t "\s+$" -o "" -R
See msr.exe / msr.gcc48 etc in my open project https://github.com/qualiu/msr tools directory.
A solution without a loop:
ls | xargs -I {} echo {},15 > filename.txt

Using grep to find and estimate the total # of shell scripts in the current dir

New to UNIX, currently learning UNIX via secureshell in a class. We've been given a few basic assignments such as creating loops and finding files. Our last assignment asked us to
write code that will estimate the number of shell scripts in the current directory and then print out that total number as "Estimated number of shell script files in this directory:"
Unlike in our previous assignments we are now allowed to use conditional loops, we are encouraged to use grep and wc statements.
On a basic level I know I can enter
ls *.sh
to find all shell scripts in the current directory. Unfortunately, this doesn't estimate the total number or use grep. Hence my question, I imagine he wants us to go
grep -f .sh (or something)
but I'm not exactly sure if I am on the right path and would greatly appreciate any help.
Thank You
You can do it like:
echo "Estimated number of shell script files in this directory:" `ls *.sh | wc -l`
I'd do it this way:
find . -executable -execdir file {} + | egrep '\.sh: | Bourne| bash' | wc -l
Find all files in the current directory (.) which are executable.
For each file, run the file(1) command, which tries to guess what type of file it is (not perfect).
Grep for known patterns: filenames ending with .sh, or file types containing "Bourne" or "bash".
Count lines.
Careful, there's a trap: .sh files are not always shell scripts, and shell scripts need not end in .sh, since the extension is not mandatory.
What tells you a file is a shell script is the shebang #!/bin/*sh at the top of the file (I put a * because it could be bash, csh, tcsh or zsh, which are all shells). Hence the hint to use grep, so the best answer would be:
grep '^#!/bin/.*sh' * | wc -l
Without the trailing | wc -l, the grep output looks like this:
sensible-pager:#!/bin/sh
service:#!/bin/sh
shelltest:#!/bin/bash
smbtar:#!/bin/sh
grep uses regular expressions by default, so the pattern ^#!/bin/.*sh matches files containing a line that starts (the ^) with #!/bin/, followed by zero or more characters (.*), followed by sh.
You may test regex and get explanation of them on http://regex101.com
Pipe the result to wc -l to get the number of matching lines, which approximates the number of such files.
To display the result, backticks or $() in an echo line will do.
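The grep above matches a shebang anywhere in a file and only under /bin. As a hedged sketch, the check can be restricted to the first line of each regular file; the pattern used here is a heuristic of my own (any shebang line whose interpreter name ends in sh), and the three sample files are invented for the demo.

```shell
#!/bin/bash
# Sketch: estimate the number of shell scripts by inspecting line 1 only.
demo=$(mktemp -d)                                   # hypothetical scratch directory
cd "$demo" || exit 1
printf '#!/bin/sh\necho hi\n'        > a.sh
printf '#!/usr/bin/env bash\necho\n' > runme        # script without .sh extension
printf 'plain text\n#!/bin/sh\n'     > notes.sh     # shebang not on line 1

count=0
for f in *; do
    [ -f "$f" ] || continue                         # regular files only
    # Heuristic: line 1 is a shebang whose interpreter name ends in "sh".
    if head -n 1 "./$f" | grep -Eq '^#!.*sh([[:space:]]|$)'; then
        count=$((count + 1))
    fi
done

echo "Estimated number of shell script files in this directory: $count"
```

Here a.sh and runme are counted while notes.sh is not, which is the distinction the extension-based approaches cannot make.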
grep -l <string> *
will return a list of all files in the current directory that contain <string>. Pipe that output into wc -l and you have your answer.
Easiest way:
ls | grep '\.sh$' > tmp
wc tmp
That prints the number of lines, words and bytes in the 'tmp' file. Since 'tmp' has one line for each *.sh file in your working directory, the number of lines gives an estimate of how many shell scripts you have.
wc tmp | awk '{print $1}' # Use awk to keep only the line count, or...
wc -l tmp # ...which prints the number of lines followed by the file name
But as many people say, the only reliable way to know that a file is a shell script is to look at the first line and see whether it contains a shebang such as #!/bin/bash. If you want to develop it that way, keep in mind:
cat possible_script.x | head -n1 # That will give you the first line.

Shell script: Count number of files in a particular type extension in single folder

I am new with shell script.
I need to save the number of files with particular extension(.properties) in a variable using shell script.
I have used
ls |grep .properties$ |wc -l
but this command prints the number of properties files in the folder. How can I assign this value in a variable.
I have tried
count=${ls |grep .properties$ |wc -l}
But it is showing error like:
./replicate.sh: line 57: ${ls |grep .properties$ |wc -l}: bad substitution
What is this type of errors?
Please anyone help me to save the number of particular files in a variable for future use.
You're using the wrong brackets, it should be $() (command output substitution) rather than ${} (variable substitution).
count=$(ls -1 | grep '\.properties$' | wc -l)
You'll also notice I've used ls -1 to force one file per line in case your ls doesn't do this automatically for pipelines, and changed the pattern to match the . correctly.
You can also bypass the grep totally if you use something like:
count=$(ls -1 *.properties 2>/dev/null | wc -l)
Just watch out for "evil" filenames like those with embedded newlines for example, though my ls seems to handle these fine by replacing the newline with a ? character - that's not necessarily a good idea for doing things with files but it works okay for counting them.
There are better tools to use if you have such beasts and you need the actual file name, but they're rare enough that you generally don't have to worry about it.
You could use a loop with globbing:
count=0
for i in *.properties; do
count=$((count+1))
done
If you are using a shell that supports arrays, you can simply capture all such file names
files=( *.properties )
and then determine the number of array elements
count=${#files[@]}
(The above assumes bash; other shells may require slightly different syntax.)
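One pitfall with both the loop and the array: when no file matches, bash leaves the pattern unexpanded, so the count comes out as 1 instead of 0. A minimal sketch of the usual remedy, assuming bash and a made-up scratch directory, is to set nullglob first:

```shell
#!/bin/bash
# Sketch: counting glob matches correctly when there may be none.
demo=$(mktemp -d)                 # hypothetical empty scratch directory
cd "$demo" || exit 1

files=( *.properties )
echo "without nullglob: ${#files[@]}"   # the unmatched pattern itself is counted

shopt -s nullglob                 # unmatched patterns now expand to nothing
files=( *.properties )
empty_count=${#files[@]}

touch a.properties b.properties
files=( *.properties )
count=${#files[@]}
echo "empty: $empty_count, after creating two files: $count"
```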
You'd better use find instead of parsing ls. Then, use the var=$(command) syntax to store the value.
var=$(find . -maxdepth 1 -name "*\.properties" | wc -l)
Reference: Why you shouldn't parse the output of ls.
To solve the problem appearing if any file name contains new lines, you can use what chepner suggests in the comments:
var=$(find . -maxdepth 1 -name "*.properties" -exec echo 1 \; | wc -l)
so that each match prints a fixed character (here, 1) on its own line instead of the file name; counting those lines gives the correct result.
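A variation on the same idea, offered as a sketch: print a single character per match with printf and count characters with wc -c. The sample files, including one with an embedded newline in its name, are invented for the demo; -maxdepth is assumed to be available, as in the command above.

```shell
#!/bin/bash
# Sketch: count matches in a way that is immune to newlines in file names.
demo=$(mktemp -d)                 # hypothetical scratch directory
touch "$demo/a.properties" "$demo/$(printf 'odd\nname').properties" "$demo/b.txt"

# Print one character per match and count characters instead of lines.
count=$(find "$demo" -maxdepth 1 -type f -name '*.properties' -exec printf x \; | wc -c)
count=$((count))                  # normalise padding that some wc versions add
echo "$count"
```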
Use:
count=`ls|grep .properties$ | wc -l`
echo $count
You could write your assignment like this:
count=$(ls -q | grep -c '\.properties$')
or
count=$(ls -qA | grep -c '\.properties$')
if you want to include hidden files.
This works with all kinds of filenames because we're using ls with -q.
Sure, it's easier to link to some webpage that tells you to "never parse ls" than to read the ls manual and see there's a -q option (and that most implementations default to -q if the output is to a terminal device, which explains why some people here state their ls seems to handle filenames with newlines just fine by replacing the newline with a ? character).

Command to list all file types and their average size in a directory

I am working on a specific project where I need to work out the make-up of a large extract of documents so that we have a baseline for performance testing.
Specifically, I need a command that can recursively go through a directory and, for each file type, inform me of the number of files of that type and their average size.
I've looked at solutions like:
Unix find average file size,
How can I recursively print a list of files with filenames shorter than 25 characters using a one-liner? and https://unix.stackexchange.com/questions/63370/compute-average-file-size, but nothing quite gets me to what I'm after.
This du and awk combination should work for you:
du -a mydir/ | awk -F'[.[:space:]]' '/\.[a-zA-Z0-9]+$/ { a[$NF]+=$1; b[$NF]++ }
END{for (i in a) print i, b[i], (a[i]/b[i])}'
To give you something to start with: the script below prints each file and its size, line by line.
#!/usr/bin/env bash
DIR=ABC
cd "$DIR" || exit 1
find . -type f | while IFS= read -r line
do
# size=$(stat --format="%s" "$line") # For systems with GNU stat
size=$(perl -e 'print -s $ARGV[0],"\n"' "$line") # @Mark Setchell provided this command, but I have no OS X system to test it.
echo "$size $line"
done
Output sample
123 ./a.txt
23 ./fds/afdsf.jpg
The rest is your homework: from the output above, it should be easy to derive the file types and their average sizes.
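As one possible way to finish that step, here is a sketch that feeds "size path" lines into awk and reports, per extension, the file count and the average size. The sample input is hypothetical, and treating files without an extension (or dotfiles) as "(none)" is my own choice, not something the answer specifies.

```shell
#!/bin/bash
# Hypothetical sample in the "size path" format produced by the loop above.
sample='123 ./a.txt
77 ./b.txt
23 ./fds/afdsf.jpg'

summary=$(printf '%s\n' "$sample" | LC_ALL=C awk '
{
    size = $1
    path = $0
    sub(/^[0-9]+ /, "", path)            # keep the path, even if it has spaces
    n = split(path, parts, "/")
    name = parts[n]                      # last path component
    ext = "(none)"
    if (match(name, /\.[^.]+$/) && RSTART > 1)
        ext = substr(name, RSTART + 1)   # text after the last dot
    total[ext] += size
    count[ext]++
}
END {
    for (e in total)
        printf "%s %d %.1f\n", e, count[e], total[e] / count[e]
}' | sort)

echo "$summary"    # columns: extension, number of files, average size
```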
You can use "du" maybe:
du -a -c *.txt
Sample output:
104 M1.txt
8 in.txt
8 keys.txt
8 text.txt
8 wordle.txt
136 total
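On systems that follow the POSIX default, such as macOS, the sizes are in 512-byte blocks (GNU du uses 1024-byte blocks by default); you can change the unit with "-k" or "-m".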

Removing last n characters from Unix Filename before the extension

I have a bunch of files in a Unix directory:
test_XXXXX.txt
best_YYY.txt
nest_ZZZZZZZZZ.txt
I need to rename these files as
test.txt
best.txt
nest.txt
I am using ksh on AIX. Please let me know how I can accomplish the above using a single command.
Thanks,
In this case, it seems you have an _ to start every section you want to remove. If that's the case, then this ought to work:
for f in *.txt
do
g="${f%%_*}.txt"
echo mv "${f}" "${g}"
done
Remove the echo if the output seems correct, or replace the last line with done | ksh.
If the files aren't all .txt files, this is a little more general:
for f in *
do
ext="${f##*.}"
g="${f%%_*}.${ext}"
echo mv "${f}" "${g}"
done
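Because several files can collapse to the same target name (for example best_YYY.txt and best_ZZZ.txt), a slightly more defensive variant may be worth considering. The sketch below sticks to POSIX shell syntax so it should also run under ksh; the scratch directory via mktemp -d and the sample names are assumptions for the demo, and mktemp may not be available on every AIX installation.

```shell
#!/bin/sh
# Sketch: rename name_SUFFIX.ext to name.ext, refusing to overwrite.
demo=$(mktemp -d)                 # hypothetical scratch directory
cd "$demo" || exit 1
touch test_XXXXX.txt best_YYY.txt best_ZZZ.txt plain.txt

for f in *_*.*; do                # only names with an underscore and a dot
    [ -e "$f" ] || continue       # pattern matched nothing
    g="${f%%_*}.${f##*.}"         # part before first "_" plus the extension
    if [ -e "$g" ]; then
        echo "skip $f: $g already exists" >&2
    else
        mv -- "$f" "$g"
    fi
done
ls
```

In this demo best_ZZZ.txt is left alone because best.txt was already created from best_YYY.txt.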
If this is a one time (or not very often) occasion, I would create a script with
$ ls > rename.sh
$ vi rename.sh
:%s/\(.*\)/mv \1 \1/
(edit manually to remove all the XXXXX from the second file names)
:x
$ source rename.sh
If this need occurs frequently, I would need more insight into what XXXXX, YYY, and ZZZZZZZZZZZ are.
Addendum
Modify this to your liking:
ls | sed "{s/\(.*\)\(............\)\.txt$/mv \1\2.txt \1.txt/}" | sh
It transforms filenames by omitting 12 characters before .txt and passing the resulting mv command to a shell.
Beware: If there are non-matching filenames, it executes the filename—and not a mv command. I omitted a way to select only matching filenames.

Resources