Easiest way to compare two file lists in bash? - bash

I have a directory with files frame* (schema) and input_frame* (schema), where frame and input_frame are prefixes for two different types of files. If one takes just the characters after the prefixes and compares the two file lists, then the set of files frame* is always a subset of the set input_frame*.
I'd like to remove the files in input_frame* that don't have an equivalent member in frame*. Is there an easy way to do this in bash?

You can use:
for f in input_frame*; do [[ ! -f "${f#input_}" ]] && echo rm "$f"; done
Once you're satisfied with the output remove echo.

Something like this (test it first by using echo instead of rm):
for i in input_frame*; do
    if [ ! -e "${i/input_/}" ]; then
        rm "$i"
    fi
done

Since frame is itself a suffix of input_frame, you can accomplish this with a simple use of the # parameter expansion operator.
for f in input_frame*; do
[[ -f "${f#input_}" ]] || rm "$f"
done
For example, if f is input_frame97, then ${f#input_} expands to frame97. Just check if the modified file name exists, and if not, remove the original.
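A minimal sketch of the same idea wrapped in a function, so it can be tried on a scratch directory before touching real data. The function name `prune_orphans` and its `-n` dry-run argument are illustrative assumptions, not something from the answers above.

```bash
#!/bin/bash
# prune_orphans DIR [-n]
# Remove DIR/input_frameX when DIR/frameX does not exist.
# With -n as the second argument, only print what would be removed.
prune_orphans() {
    local dir=$1 dry_run=${2:-} f base
    for f in "$dir"/input_frame*; do
        [[ -e $f ]] || continue          # glob matched nothing
        base=${f##*/}                    # strip the directory part
        if [[ ! -e $dir/${base#input_} ]]; then
            if [[ $dry_run == -n ]]; then
                echo "would remove $f"
            else
                rm -- "$f"
            fi
        fi
    done
}
```

Calling `prune_orphans . -n` first plays the same role as the `echo rm` trick: nothing is deleted until the `-n` is dropped.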

Related

Identifying folder with name as largest number in the directory

There is a directory that contains folders named with numbers, and I have to find the folder with the largest number in that directory.
This is the script I've written to find that folder:
files='ls path/'
var=0
for file in $files
do
echo $file
tmp=$((file-"0"))
if [ $tmp -gt $var ]
then
var=$tmp
fi
done
echo $var
But it's not working. It gives the error below when I invoke the script with sudo ./restore2.sh.
ls
path/
./restore2.sh: line 6: path/: syntax error: operand expected (error token is "/")
0
Try this:
#!/bin/bash
files=`ls path/`
var=0
for file in $files
do
echo $file
tmp=$((file-"0"))
if [ $tmp -gt $var ]
then
var=$tmp
fi
done
echo $var
Note the backticks around ls path/ here, instead of single or double quotes.
That is the only statement I corrected, and it worked. Also add #!/bin/bash at the top of the script; this tells your system to run the script in a bash shell.
You're using single quotes instead of backticks in files='ls path/'. The shell treats that as a literal string instead of running it as a command.
Also, for that specific task, you can just do:
ls test | awk '{if($1 > largest){largest = $1}} END{print largest}'
To have it a bit simpler.
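A small sketch of the difference the answers describe. It uses `$(...)`, which is the modern equivalent of backticks; the variable names are made up for the demonstration. It also explains the trace in the question: with single quotes, `$files` holds the two words `ls` and `path/`. Inside `$((file-"0"))`, `ls` is read as an unset variable (value 0), while `path/` is not valid arithmetic, hence the "operand expected" error.

```bash
#!/bin/bash
# Single quotes keep the text literal; $(...) runs the command
# and captures its output.
literal='echo 3 1 2'
substituted=$(echo 3 1 2)

printf 'literal:     %s\n' "$literal"
printf 'substituted: %s\n' "$substituted"
```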
Use find instead:
find . -maxdepth 1 -type d -regextype "posix-extended" -regex "^.*[[:digit:]]+.*$" | sort -t/ -k2,2n | tail -1
Set the maxdepth to 1 to check for directories within this directory only and no deeper. Set the regular expression type to posix-extended and search for all directories that have one or more digits. Print the result and order through sort before taking the largest one with tail -1.
Does path/ have any files in it? It looks like it's empty.
You should be getting a completely different complaint...
You don't want the path info in the filename. Rather than strip it with ${file##*/}, just go there and use non-path'd names.
An adaptation using your own logic as its base -
cd /whatever/path/ # go where the files are
var=-1 # initialize comparator
for file in [0-9]* # each entry that starts with a digit
do [[ "$file" =~ [^0-9] ]] && continue # skip any file with nondigit contents
[[ -f "$file" ]] || continue # only process plain files
(( file > var )) && var=$file # remember largest seen
done
echo $var # report largest
If you are sure there will be no negative numbered filenames, this should do it.
If there can be valid negatives, then your initialization needs to be appropriately lower, and the exclusion of nondigits should include the minus sign, as well as the list of files to select.
Note that this doesn't parse ls and doesn't require piping through a sort or spawning any other processes -- it's all handled in the bash interpreter and should be pretty efficient.
If you are sure of your data, and know there aren't any negatives or files named just 0 or non-plain-file entries in the directory that match the [0-9]* pattern, you can simplify it to just
cd /whatever/path/ # go where the files are
for file in [0-9]*; do (( file > var )) && var=$file; done
echo $var # report largest
As an aside, if you wanted to preserve the "make a list first" logic, you should still NOT use ls. Use an array.
cd /wherever/your/files/are/
files=( [0-9]* )
for file in "${files[@]}"
do : ...
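A sketch of how the array-based loop might be completed for the original task, which asked about folders rather than plain files. The function name `largest_numeric_dir` is hypothetical, and it assumes only purely numeric directory names should count.

```bash
#!/bin/bash
# largest_numeric_dir DIR
# Print the largest purely numeric subdirectory name in DIR,
# or nothing if there is none.
largest_numeric_dir() {
    local dir=$1 entry name max=
    local entries=( "$dir"/*/ )          # trailing slash: directories only
    for entry in "${entries[@]}"; do
        name=${entry%/}                  # drop the trailing slash
        name=${name##*/}                 # drop the leading path
        [[ $name =~ ^[0-9]+$ ]] || continue
        # 10# forces base 10 so names like 08 are not read as octal
        if [[ -z $max ]] || (( 10#$name > 10#$max )); then
            max=$name
        fi
    done
    if [[ -n $max ]]; then
        printf '%s\n' "$max"
    fi
}
```

Starting with an empty `max` instead of 0 or -1 avoids having to pick an initial value that is lower than every valid name.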

How to recursively copy files while removing part of the path

I have hundreds of image files in a structure like this:
path/to/file/100/image1.jpg
path/to/file/9999/image765.jpg
path/to/file/333/picture2.jpg
I'd like to remove the 4th part of the path (100,9999,333, ...) so that I get this:
path/to/file/image1.jpg
path/to/file/image765.jpg
path/to/file/picture2.jpg
In this case the image file names have no duplicates, and the target directory could be named entirely differently if that makes things easier (e.g. the target could be "another/path/to/the/images/image1.jpg").
The solution might be some combination of find/cut/rename command.
How can I do this in bash?
Since you only have "hundreds" of files, it's quite possible that you don't need to do anything special, and can just write:
mv path/to/file/*/*.jpg path/to/file/
But depending on the number of files and lengths of their names, this may turn out to be more than the kernel will let you pass to a single command, in which case you may need to write a for-loop instead:
for file in path/to/file/*/*.jpg ; do
mv "$file" path/to/file/
done
(Of course, this assumes you have mv on your path. There's no Bash builtin for renaming a file, so any approach will depend on what else is available on your system. If you don't have mv, you'll need to adjust the above accordingly.)
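A sketch of the for-loop variant with a guard against overwriting, in case the "no duplicates" assumption turns out not to hold. The function name `flatten_jpgs` and its two arguments are illustrative assumptions.

```bash
#!/bin/bash
# flatten_jpgs SRC DEST
# Move SRC/*/*.jpg into DEST, skipping any name that already exists there.
flatten_jpgs() {
    local src=$1 dest=$2 file target
    mkdir -p -- "$dest"
    for file in "$src"/*/*.jpg; do
        [[ -f $file ]] || continue       # glob matched nothing, or not a regular file
        target=$dest/${file##*/}         # keep only the file name
        if [[ -e $target ]]; then
            echo "skipping $file: $target already exists" >&2
            continue
        fi
        mv -- "$file" "$target"
    done
}
```

For the layout in the question this would be called as `flatten_jpgs path/to/file path/to/file`, or with a different second argument to collect the images elsewhere.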
I recommend using ruakh's solution if it will work, but if you need to explicitly test for those numeric directories, here's an alternative.
I'm just using echo to pipe the list of names in, and to show the mv at the end, but you could use find (example in a comment) and remove the echo on the mv to make it live.
IFS=/
echo "path/to/file/100/image1.jpg
path/to/file/9999/image765.jpg
path/to/file/333/picture2.jpg" |
# find path/to/file -name "*.jpg" |
while read -r orig
do this=""
read -a line <<< "$orig"
for sub in "${line[@]}"
do if [[ "$sub" =~ ^[0-9]+$ ]]
then continue
else this="$this$sub/"
fi
done
old="${line[*]}"
echo mv "$old" "${this%/}"
done
mv path/to/file/100/image1.jpg path/to/file/image1.jpg
mv path/to/file/9999/image765.jpg path/to/file/image765.jpg
mv path/to/file/333/picture2.jpg path/to/file/picture2.jpg

bash rename files with prefix serial number

I have loads of files in a folder. I want to do two things:
prefix them with xxx three digit serial numbers - ascending: 001 002 and so on
remove the prefix from their names, so 001a.xyz = a.xyz
I intend to do this using a simple bash script. What's the most elegant and simple to understand way to do this?
edit
The files are on a removable device, and I cannot seem to set chmod +x on the script on the device. So how do I run a script from my home directory that will change the files in another directory?
To add prefixes:
counter=1
for f in *; do
printf -v prefix_str '%03d' "$((counter++))"
mv "$f" "${prefix_str}$f"
done
To remove prefixes (caution -- this may overwrite if you have two files with the same suffix but different prefixes):
for f in [0-9][0-9][0-9]*; do
mv "$f" "${f:3}"
done
Use mv -n to avoid overwriting when two files have the same suffix.
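On the edit in the question (running the script from the home directory against another directory): a sketch that takes the target directory as an argument and does the work in a subshell. The function name `add_serial_prefixes` is an assumption for illustration.

```bash
#!/bin/bash
# add_serial_prefixes DIR
# Prefix every entry in DIR with a three-digit serial number (001, 002, ...).
add_serial_prefixes() {
    local dir=$1
    (
        cd -- "$dir" || exit 1           # subshell: the caller's directory is unchanged
        counter=1
        for f in *; do
            [[ -e $f ]] || continue      # empty directory
            printf -v prefix '%03d' "$counter"
            mv -n -- "$f" "$prefix$f"    # -n: never overwrite an existing file
            counter=$((counter + 1))
        done
    )
}
```

If this is saved in the home directory with a final line such as `add_serial_prefixes "$1"`, it can be started as `bash ~/prefix.sh /path/to/device` (both names hypothetical). Invoking the script through `bash` means the script file does not need the execute bit, and the script never has to live on the removable device.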
This should work:
#!/bin/bash
count=1
for file in *; do
    if [[ $file =~ ^[0-9][0-9][0-9] ]]; then   # name already starts with a 3-digit prefix
        sfile="${file:3}"
        new=$(printf "%03d" "$count")
        mv "$file" "${new}${sfile}"
        ((count++))
    else
        new=$(printf "%03d" "$count")
        mv "$file" "${new}${file}"
        ((count++))
    fi
done
This script looks at each file in the current directory. If the file already has a prefix, the script removes it and assigns a new sequential prefix. If the file has no prefix, the script adds a sequential one.
The end result should be that all the files in your current directory (some with and some without prefixes) get new sequential prefixes.

How can I grep contents of files with bash only without using find or grep -r?

I have an assignment to write a bash program which if I type in the following:
-bash-4.1$ ./sample.sh path regex keyword
it produces output like this:
path/sample.txt:12
path/sample.txt:34
path/dir/sample1.txt:56
path/dir/sample2.txt:78
The numbers are the line numbers of the search results. I have absolutely no idea how I can achieve this in bash without using find or grep -r. I am allowed to use grep, sed, awk, …
Break the problem into parts.
First, you need to obtain the file names to search in. How can you list the files in a directory and its subdirectories? (Hint: there's a glob pattern for that.)
You need to iterate over the files. What form of loop should this be?
For each file, you need to read each line from the file in turn. There's a builtin for that.
For each line, you need to test whether the line matches the specified regexp. There's a construct for that.
You need to maintain a counter of the number of lines read in a file to be able to print the line number.
Search for globstar in the bash manual.
See https://unix.stackexchange.com/questions/18886/why-is-while-ifs-read-used-so-often-instead-of-ifs-while-read/18936#18936 regarding while read loops.
shopt -s globstar # to enable **/
GLOBIGNORE=.:.. # to match dot files
dir=$1; regex=$2
for file in "$dir"/**/*; do
[[ -f $file ]] || continue
n=1
while IFS= read -r line; do
if [[ $line =~ $regex ]]; then
echo "$file:$n"
fi
((++n))
done <"$file"
done
It's possible that your teacher didn't intend you to use the globstar feature, which is a relatively recent addition to bash (appeared in version 4.0). If so, you'll need to write a recursive function to recurse into subdirectories.
traverse_directory () {
for x in "$1"/*; do
if [ -d "$x" ]; then
traverse_directory "$x"
elif [ -f "$x" ]; then
grep "$regexp" "$x"
fi
done
}
Putting this into practice:
#!/bin/sh
regexp="$2"
traverse_directory "$1"
Follow-up exercise: the glob pattern * omits files whose name begins with a . (dot files). You can easily match dot files too by also looping over .*, i.e. for x in .* *; do …. However, this throws the function into an infinite loop, as it recurses forever into . (and also ..). How can you change the function to work with dot files as well?
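One possible answer to the exercise, as a sketch rather than the intended solution. The function name `traverse_with_dotfiles` is hypothetical. It passes the regexp as a second argument, prints `file:line` as the assignment asks, and assumes file names contain no colon, since `cut` splits on the first two colons.

```bash
#!/bin/bash
# traverse_with_dotfiles DIR REGEXP
# Like traverse_directory, but also visits dot files and prints file:line.
traverse_with_dotfiles() {
    local dir=$1 regexp=$2 x
    for x in "$dir"/.* "$dir"/*; do
        case ${x##*/} in
            .|..) continue ;;            # never recurse into . or ..
        esac
        if [ -d "$x" ] && [ ! -L "$x" ]; then   # skip symlinked dirs to avoid cycles
            traverse_with_dotfiles "$x" "$regexp"
        elif [ -f "$x" ]; then
            # -n adds line numbers; the extra /dev/null operand makes grep
            # print the file name even though only one real file is searched
            grep -n -- "$regexp" "$x" /dev/null | cut -d: -f1,2
        fi
    done
}
```

Unmatched patterns are left as literal text by the shell; they fail both the `-d` and `-f` tests and are skipped.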
while read
do
[[ $REPLY =~ foo ]] && echo $REPLY
done < file.txt

bash filename start matching

I've got a simple enough question, but no guidance yet through the forums or bash. The question is as follows:
I want to add a prefix string to each filename in a directory that matches *.h or *.cpp. HOWEVER, if the prefix has already been applied to the filename, do NOT apply it again.
Why the following doesn't work is something that has yet to be figured out:
for i in *.{h,cpp}
do
if [[ $i!="$pattern*" ]]
then mv $i $pattern$i
fi
done
you can try this:
for i in *.{h,cpp}
do
    # if the file name does not begin with $pattern, rename it
    if ! echo "$i" | grep -q "^$pattern"
    then mv "$i" "$pattern$i"
    fi
done
Others have shown replacement comparisons that work; I'll take a stab at why the original version didn't. There are two problems with the original prefix test: you need spaces between the comparison operator (!=) and its operands, and the asterisk was in quotes (meaning it gets matched literally, rather than as a wildcard). Fix these, and (at least in my tests) it works as expected:
if [[ $i != "$pattern"* ]]
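A short sketch that runs the three variants side by side on a name that already has the prefix. The values `lib_` and `lib_main.cpp` and the helper `has_prefix` are made up for the demonstration.

```bash
#!/bin/bash
# has_prefix NAME PREFIX: succeed if NAME starts with PREFIX.
has_prefix() {
    [[ $1 == "$2"* ]]        # quoted prefix is literal, unquoted * is a wildcard
}

pattern=lib_
i=lib_main.cpp

# No spaces: the whole expression is one non-empty word, so the test is always true.
[[ $i!="$pattern*" ]] && echo "no spaces: always true"

# Quoted asterisk: compares against the literal string 'lib_*'.
[[ $i != "$pattern*" ]] && echo "quoted *: reports a mismatch"

# Corrected form: the != test fails here because the prefix is present.
[[ $i != "$pattern"* ]] || echo "corrected: prefix detected"
```

All three messages are printed, which shows that the first two forms would wrongly rename a file that is already prefixed.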
#!/bin/sh
pattern=testpattern_
for i in *.h *.cpp; do
case "$i" in
$pattern*)
continue;;
*)
mv "$i" "$pattern$i";;
esac
done
This script will run in any POSIX shell, not just bash. (I wasn't sure if your question was "why isn't this working" or "how do I make this work", so I guessed it was the second.)
for i in *.{h,cpp}; do
    [ "${i#prefix}" = "$i" ] && mv "$i" "prefix$i"
done
Not exactly conforming to your script, but it should work. The check returns true if there is no prefix (i.e. if $i, with the prefix "prefix" removed, equals $i).
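A sketch of the same `#` check wrapped in a function that takes the prefix as a parameter and guards against patterns that match nothing. The function name `add_prefix_once` is hypothetical, and it assumes the files are in the current directory (no path component in the names).

```bash
#!/bin/bash
# add_prefix_once PREFIX FILE...
# Rename each FILE to PREFIX+FILE unless it already starts with PREFIX.
add_prefix_once() {
    local prefix=$1 f
    shift
    for f in "$@"; do
        [ -e "$f" ] || continue                  # unmatched glob left as literal text
        if [ "${f#"$prefix"}" = "$f" ]; then     # nothing stripped: no prefix yet
            mv -- "$f" "$prefix$f"
        fi
    done
}
```

A call such as `add_prefix_once testpattern_ *.h *.cpp` can be repeated safely, since files that already carry the prefix are left alone.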
