how to get a substring with varied length?

So I am writing a script which takes a path to a file (/path/to/file.ext) as input and extracts the directory part. If the directory (/path/to) does not exist, it will run mkdir -p /path/to and then touch file.ext.
My question is this: how can I use cut to get /path/to if the path has a potentially unknown number of /'s?
My script currently looks like this:
INPUT=$0
SUBSTRING_PATH=`$INPUT | cut -d'/' -f 2`
if [! -d $SUBSTRING_PATH]; then
mkdir -p $SUBSTRING_PATH
fi
touch $INPUT

Instead of cut, use dirname and basename:
input=/path/to/foo
dir=$(dirname "$input")
file=$(basename "$input")
Now $dir is /path/to and $file is foo.
dirname will also give you a valid directory for relative paths to the working directory (I mean that $(dirname file.txt) is .). This means, for example, that you can write "$dir/some/stuff/foo" without having to worry that you end up in a completely different directory tree (such as /some/stuff rather than ./some/stuff).
As @ruakh mentions in the comments, if you didn't have a directory but a string of tokens of which you wanted to discard the last (a line of a csv file, perhaps), one way to do it would be "${input%,*}", where the comma can be replaced by any delimiter. This suffix-removal expansion is specified by POSIX, so it is not limited to bash. I only edit this in because a future visitor might have better luck seeing it here than in the comments; for your particular use case, dirname and basename are a better fit.
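Putting this together for the script in the question, here is a minimal sketch. The function name touch_p and the demo path are made up for illustration; the only real commands involved are dirname, mkdir -p and touch.

```shell
# touch_p is a hypothetical helper name, not a standard command.
# It creates the parent directory of a path if needed, then touches the file.
touch_p() {
    target=$1
    dir=$(dirname "$target")          # e.g. /path/to for /path/to/file.ext
    mkdir -p -- "$dir" && touch -- "$target"
}

# Demo in a throwaway directory so nothing real is modified.
demo=$(mktemp -d)
touch_p "$demo/path/to/file.ext"
ls "$demo/path/to"
```

Since mkdir -p succeeds quietly when the directory already exists, the separate [ ! -d ... ] test from the question is not needed.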

Related

How can I create a rename script using multiple rules?

I constantly get a bunch of files named "Unknown.png" into a folder, and often times they get renamed "unknown (1).png, unknown (2).png" etc. This is a bit of a problem as sometimes when cleaning up files and moving them somewhere else I get asked if I want to replace or rename, etc.
So I decided to make a crontab task that renames the files to CB_RANDOM this way I don't even have to worry about potentially overwriting two files with the same name.
I have gotten this far: I find the files, replace the name Unknown with CB_ and add a random number.
The problem is the (x) at the end of the filename. I also figured out how to solve that: I just strip away any parentheses and numbers.
The problem is I can't figure out how to make the rename command follow both rules.
for u in (find -name unknown*); do
rCode = random
rename -v 's/unknown/CB_$rCode' $u
rename -v 's/[ ()0123456789]//g' $u
Ideally I'd like to apply both rules in the same line of code, especially since once the first line runs, $u won't be able to find the file for the second step.

No need for a loop:
find -name 'unknown*' -exec rename 's/unknown \([0-9]+\)\.(.*)$/"CB_".sprintf("%04s",int(rand(10000))).".".$1/e' {} \;
find all the files, starting in the current directory, recursively, with names similar to "unknown (1).png"
rename them with a resulting filename similar to "CB_0135.png"
This produces an error message if a filename already exists.

Your code should first be changed into
# find is a subcommand, use $()
# find a file with wildcard, use quotes
for u in $(find -name "unknown*"); do
# Is random a command? Use $()
rCode=$(random)
# Debug with echo, will show other problem
echo "File $u"
# $rCode will not be replaced by its value in single quotes
# Write a filename in double quotes, so it will not be split by a space
rename -v "s/unknown/CB_$rCode/" "$u"
rename -v 's/[ ()0123456789]//g' "$u"
done
The new line with echo shows that the loop is breaking up the filenames at the spaces. You can fix this with
while IFS= read -r u; do
# Use unique timestamp, not random value
rCode=$(date '+%Y%m%d_%H%M')
echo "File $u"
rename -v "s/unknown/CB_$rCode/" "$u"
rename -v 's/[ ()0123456789]//g' "$u"
done < <(find -name "unknown*")
I never use rename and would use
while IFS= read -r u; do
# Use unique timestamp, not random value
rCode=$(date '+%Y%m%d_%H%M')
# construct new filename.
# Restriction: Path to file is without newlines, spaces or parentheses
newfile=$(sed 's/[ ()]//g; s/.*unknown/&_'"${rCode}"'_/' <<< "$u")
echo "Moving file $u to ${newfile}"
mv "$u" "${newfile}"
done < <(find -name "unknown*")
EDIT:
I removed a sed command for renaming files with (something) in it:
# Removed command
newfile=$(sed 's/\(.*\)(\(.*\))/\1'"${rCode}"'_\2/' <<< "$u")
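Neither a random number nor a minute-resolution timestamp guarantees a unique name when several files are renamed in one run. As a rough sketch of another option, the function below picks the first unused number instead of a random one. The name rename_unknown is invented here, and the sketch assumes every matching file has an extension.

```shell
# rename_unknown is a hypothetical helper: it renames every file whose name
# starts with "unknown" to CB_<n>.<ext>, using the first free <n> in that folder.
rename_unknown() {
    find "${1:-.}" -type f -name 'unknown*' -exec sh -c '
        for path do
            dir=${path%/*}      # directory part (find always prints one)
            name=${path##*/}    # file name without the directory
            ext=${name##*.}     # text after the last dot (assumes there is one)
            n=1
            # Count upwards until the target name is not taken yet
            while [ -e "$dir/CB_$n.$ext" ]; do
                n=$((n + 1))
            done
            mv -- "$path" "$dir/CB_$n.$ext"
        done' sh {} +
}
```

Called as rename_unknown /some/folder, this handles "unknown (1).png" style names without any splitting on spaces, because find hands each path to the inner shell as its own argument.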

Scripting for file management with a very large amount of files

I have a three OSX machine setup that was using syncthing to keep shared drives synchronized remotely. Someone made some mistakes and a lot of files ended up getting renamed.
So all throughout this drive I have situations where there's a file of size 0KB named, for example, file.jpg and another file with real size named file.sync-confilct201705-4528.jpg. I need to search the entire drive recursively and, whenever I find a file with the sync-conflict string in it, check whether the same file without the 'sync-conflict' string also exists with a size of 0KB. If it does, I need to rename the sync-conflict file to overwrite the 0KB file.
I have considered tackling this with a bash script or a Perl script. Using bash I think just using the 'find' command with -regex would get me started but I don't really know how to process the results and run the next find test. I am studying and working on it.
Same problem with Perl. I can get through the first step using File::Find:find and select what I need using regex to filter out the files, but there again I am stuck getting to the next step, which would be finding the original file in the same directory and performing the necessary file move function.
In both of these cases I am willing to put in the time to figure it out, but I wonder what the caveats will be? Can both of these scenarios handle recursing a large number of files without exception? Is there perhaps a better approach anyone can recommend?

One good tool in Perl for this is File::Find::Rule.
Find all sync-conflict files, then test whether corresponding files exist and are zero size
use warnings;
use strict;
use FindBin qw($RealBin);
use File::Copy qw(move);
use File::Find::Rule;
my $dir = shift || '.'; # top of hierarchy to search (from command line, or ./)
my @conflict_files = File::Find::Rule
->file->name('*sync-conflict*.jpg')->in($dir);
foreach my $conflict (@conflict_files)
{
my ($file) = $conflict =~ m|(.*)\.sync-conflict|;
$file .= '.jpg';
if (-z "$RealBin/$file") {
print "Rename $conflict to $file\n";
#move($conflict, $file) or warn "Can't move $conflict to $file: $!";
}
}
This builds the original file's name for each sync-conflict file and applies the -z file test (one of the -X operators), which tests for both existence and zero size. Then it renames the file using move from the core module File::Copy.
Note that file-test operators need the full path while File::Find::Rule returns the path relative to the $dir it searches. I use $RealBin provided by FindBin, which is the path to the directory where the script was started with all links resolved, to build the full path for -z.
Uncomment the move line after sufficient testing (and with having made a backup first).
The code makes some assumptions about file names, please adjust as needed.
The $dir supplied on the command line is expected to be relative to the script's directory.

find is great. But as you've noted, you need more.
What find gets you in this scenario is the ability to search recursively and match certain patterns. As it happens, as of bash version 4 you can do that right in the shell.
(Note that macOS ships with bash version 3, so for this solution, you'll need to install bash 4 from Macports, Homebrew or Fink.)
$ shopt -s globstar nullglob
$ for file in **/*sync-confilct2017*.*; do echo mv -v "$file" "${file%sync-conf*}${file##*.}"; done
mv -v file.sync-confilct201705-4528.jpg file.jpg
mv -v foo/bar.sync-confilct201705-4528.ext foo/bar.ext
You can remove the echo to actually run the mv command.
The way this works is that the double asterisk, **, is treated by bash like a * that recurses. We're using parameter expansion to strip the parts of the filename we want in order to construct the "target" filename.
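The loop above renames every match without looking at the target. Here is a sketch of a variant that uses the same two parameter expansions but only overwrites a target that exists and is empty, and that does not need bash 4. The function name restore_conflicts is made up, and the pattern assumes the conflict marker is spelled ".sync-conflict"; adjust it to whatever your files actually contain.

```shell
# restore_conflicts is a hypothetical helper name.
# For each *.sync-conflict* file, rebuild the original name and move the
# conflict copy over it only if the original exists and has size zero.
restore_conflicts() {
    find "${1:-.}" -type f -name '*.sync-conflict*' -exec sh -c '
        for conflict do
            # text before ".sync-conflict" + "." + final extension
            original="${conflict%.sync-conflict*}.${conflict##*.}"
            # -f: regular file exists; ! -s: its size is zero
            if [ -f "$original" ] && [ ! -s "$original" ]; then
                mv -- "$conflict" "$original"
            fi
        done' sh {} +
}
```

Run it as restore_conflicts /path/to/drive on a copy first. Because find passes the paths as arguments to the inner shell, names with spaces are not a problem.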

Create a function to fix the name:
$ function fixname() { file="$1"; newname=$( echo "$file" | sed "s/\.sync-conflict.*\.jpg$/.jpg/" ); if [ -f "$newname" -a ! -s "$newname" ]; then mv "$file" "$newname"; fi; }
Or, spread out a bit:
function fixname() {
file="$1"
newname=$( echo "$file" | sed "s/\.sync-conflict.*\.jpg$/.jpg/" )
# If empty file exists
if [ -f "$newname" -a ! -s "$newname" ]; then
mv "$file" "$newname"
fi
}
Export the function:
$ export -f fixname
Run find to execute the function:
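$ find . -type f -name '*sync-conflict*.jpg' -exec bash -c 'fixname "$1"' bash {} \;
Note: the filename is passed as a positional argument ("$1") rather than pasted into the command string, so spaces or unusual characters in filenames are handled correctly.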

In shell, how do I delete numbered duplicate files?

I've got a directory with a few thousand files in it, named things like:
filename.ext
filename (1).ext
filename (2).ext
otherfile.ext
otherfile (1).ext
etc.
Most of the files with bracketed numbers are duplicates of the original, but in some cases they're not.
How can I keep my original files, delete the duplicates, but not lose the files that are different?
I know that I could rm *\).ext, but that obviously doesn't make sure that files match the original.
I'm using OS X, so I have a md5 program that functions sort of like md5sum in Linux, though it puts the hash at the end of the line instead of the beginning. I was thinking I could use an awk script to take the output of md5 *.ext | awk 'some script', find duplicates by md5, and delete them, but the command line is too long (bash: /sbin/md5: Argument list too long).
And I don't know what to write in the script. I was thinking of storing things in an array with this:
awk '{a[$NF]++} a[$NF]>1{sub(/).*/,""); sub(/.*(/,""); system("rm " $0);}'
But that always seems to delete my original.
What am I doing wrong? How do I do it right?
Thanks.

Your awk script deletes original files because when your files are sorted, "." (period) sorts after " " (space). So the first file that's seen is a numbered one, not the original, and subsequent checks (including the one against the original) compare files to that first numbered one.
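The sort order is easy to confirm. A small check, assuming the C locale (other locales may collate punctuation differently):

```shell
# In the C locale a space (0x20) sorts before a period (0x2E),
# so the numbered copy comes out ahead of the original.
first=$(printf '%s\n' 'filename.ext' 'filename (1).ext' | LC_ALL=C sort | head -n 1)
echo "$first"
```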
Not only does rm *\).txt fail to match the original, it loses files that may not have an original in the first place.
I wouldn't do this quite this way. Rather than checking every numbered file and verifying whether it matches an original, you can go through your list of originals, then delete the numbered files that match them.
Instead:
$ for file in *[^\)].txt; do echo "-- Found: $file"; rm -v "$(basename "$file" .txt)"\ \(*\).txt; done
You can expand this to check MD5's along the way. But it's more code, so I'll break it into multiple lines, in a script:
#!/bin/bash
shopt -s nullglob # Show nothing if a fileglob matches no files
for file in *[^\)].ext; do
md5=$(md5 -q "$file") # The -q option gives you only the message digest
echo "-- Found: $file ($md5)"
for duplicate in "$(basename "$file" .ext)"\ \(*\).ext; do
if [[ "$md5" = "$(md5 -q "$duplicate")" ]]; then
rm -v "$duplicate"
fi
done
done
As an alternative, you can probably get away with doing this a little more simply, with less CPU overhead than calculating MD5 digests. Unix and Linux have a shell tool called cmp, which compares two files byte by byte and reports through its exit status whether they are identical; the -s option suppresses its output. So:
#!/bin/bash
shopt -s nullglob
for file in *[^\)].ext; do
for duplicate in "$(basename "$file" .ext)"\ \(*\).ext; do
if cmp -s "$file" "$duplicate"; then
rm -v "$duplicate"
fi
done
done

If you don't need to use AWK, you could maybe do something simpler in bash:
for file in *\([0-9]*\)*; do
[ -e "$(echo "$file" | sed -e 's/ ([0-9][0-9]*)//')" ] && rm "$file"
done
Hope this helps a little =)

Shell script: execute cmd on a file, with additional processing of file name

So I am going to post a question about shell scripting again.
Problem Definition: For all files under a dir, ex.:
A_anything.txt, B_anything.txt, ......
I want to execute a script, say 'CMD', on each of them, with the output files named like:
A_result.txt, B_result.txt, ......
In addition, at the first line of these output file, I want to have the file name of the original one.
The 'find -exec' util seems to me unable to extract part of the file name.
Does someone know a solution to this problem, by any means(shell, python, find,etc)? Thank you!

cd /directory
for file in *.txt ; do
newfilename=`echo "$file" | sed 's/\(.*\)_.*/\1_result.txt/'`
echo "$file" > "$newfilename"
your-command "$file" >> "$newfilename"
done
HTH

Well, there's more than one way to do it (including using Perl, where that's the motto), but probably I'd write it like this:
find . -name '[A-Z]_*.txt' -type f -print0 |
xargs -0 modify_rename.sh
And then I'd write the script modify_rename.sh like this:
#!/bin/sh
for file in "$@"
do
dirname=$(dirname "$file")
basename=$(basename "$file" .txt)
leadname=${basename%_*}
outname="$dirname/${leadname}_result.txt"
# Optionally check for pre-existence of $outname
{
# Optionally echo "$basename.txt" instead of "$file"
echo "$file"
# Does this invocation of CMD write to standard output?
# If not, adjust invocation appropriately.
CMD "$file"
} > "$outname"
done
The advantage of this separation into separate scripting operations is that the rename/modify operation can be checked out separately from the search process - which runs less risk of zapping your entire directory structure with bad commands.
Bash has the tools to avoid invoking basename and dirname but the notation is moderately excruciating; I find the clarity of the command names worth having. I'd be happy if bash implemented them as built-ins. There are plenty of other ways to get the prefix of the file; this should be safe, though, even in the presence of spaces (tabs, newlines) in file or directory names because of the careful use of double quotes.
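For comparison, here is roughly what the parameter-expansion notation looks like. This is only a sketch: result_name is a made-up function, and it does not try to handle a file sitting directly in the root directory.

```shell
# result_name is a hypothetical helper: it prints the output file name
# for an input such as dir/A_anything.txt without calling dirname/basename.
result_name() {
    file=$1
    case $file in
        */*) dir=${file%/*} ;;   # strip the last /component
        *)   dir=. ;;            # no slash: current directory, like dirname
    esac
    base=${file##*/}             # strip everything up to the last slash
    printf '%s/%s_result.txt\n' "$dir" "${base%_*}"
}

result_name ./sub/A_anything.txt
```

The case statement is there because ${file%/*} leaves a name without a slash unchanged, whereas dirname would give ".".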

Batch renaming files with Bash

How can Bash rename a series of packages to remove their version numbers? I've been toying around with both expr and %%, to no avail.
Examples:
Xft2-2.1.13.pkg becomes Xft2.pkg
jasper-1.900.1.pkg becomes jasper.pkg
xorg-libXrandr-1.2.3.pkg becomes xorg-libXrandr.pkg

You could use bash's parameter expansion feature
for i in ./*.pkg ; do mv "$i" "${i/-[0-9.]*.pkg/.pkg}" ; done
Quotes are needed for filenames with spaces.

If all files are in the same directory the sequence
ls |
sed -n 's/\(.*\)\(-[0-9.]*\.pkg\)/mv "\1\2" "\1.pkg"/p' |
sh
will do your job. The sed command will create a sequence of mv commands, which you can then pipe into the shell. It's best to first run the pipeline without the trailing | sh so as to verify that the command does what you want.
To recurse through multiple directories use something like
find . -type f |
sed -n 's/\(.*\)\(-[0-9.]*\.pkg\)/mv "\1\2" "\1.pkg"/p' |
sh
Note that in sed's basic regular expressions the grouping sequence is parentheses preceded by a backslash, \( and \), rather than bare parentheses ( and ).
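To see what the sed stage generates before anything is executed, you can feed it a few sample names by hand. A small sketch (the variable name cmds and the README entry are just for the demo):

```shell
# Feed sample names through the same sed expression and capture the
# generated commands instead of piping them into sh.
cmds=$(printf '%s\n' Xft2-2.1.13.pkg jasper-1.900.1.pkg README |
    sed -n 's/\(.*\)\(-[0-9.]*\.pkg\)/mv "\1\2" "\1.pkg"/p')
printf '%s\n' "$cmds"
```

Names that do not match the pattern, such as README here, produce no command at all because of -n together with the p flag.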

I'll do something like this:
for file in *.pkg ; do
mv "$file" "$(echo "$file" | rev | cut -f2- -d- | rev).pkg"
done
assuming all your files are in the current directory. If not, try to use find as advised above by Javier.
EDIT: Also, this version doesn't use any bash-specific features, unlike others above, which makes it more portable.

We can assume sed is available on any *nix, but we can't be sure
it'll support sed -n to generate mv commands. (NOTE: Only GNU sed does this.)
Even so, with bash builtins and sed, we can quickly whip up a shell function to do this.
sedrename() {
if [ $# -gt 1 ]; then
sed_pattern=$1
shift
for file in "$@"; do
mv -v "$file" "$(sed "$sed_pattern" <<< "$file")"
done
else
echo "usage: $0 sed_pattern files..."
fi
}
Usage
sedrename 's|\(.*\)\(-[0-9.]*\.pkg\)|\1.pkg|' *.pkg
before:
./Xft2-2.1.13.pkg
./jasper-1.900.1.pkg
./xorg-libXrandr-1.2.3.pkg
after:
./Xft2.pkg
./jasper.pkg
./xorg-libXrandr.pkg
Creating target folders:
Since mv doesn't automatically create target folders, we can't use
our initial version of sedrename when the target is in a new folder.
It's a fairly small change, so it'd be nice to include that feature:
We'll need a utility function, abspath (or absolute path), since bash
doesn't have this built in.
abspath () { case "$1" in
/*)printf "%s\n" "$1";;
*)printf "%s\n" "$PWD/$1";;
esac; }
Once we have that we can generate the target folder(s) for a
sed/rename pattern which includes new folder structure.
This will ensure we know the names of our target folders. When we
rename we'll need to use it on the target file name.
# generate the rename target
target="$(sed "$sed_pattern" <<< "$file")"
# Use absolute path of the rename target to make target folder structure
mkdir -p "$(dirname "$(abspath "$target")")"
# finally move the file to the target name/folders
mv -v "$file" "$target"
Here's the full folder aware script...
sedrename() {
if [ $# -gt 1 ]; then
sed_pattern=$1
shift
for file in "$@"; do
target="$(sed "$sed_pattern" <<< "$file")"
mkdir -p "$(dirname "$(abspath "$target")")"
mv -v "$file" "$target"
done
else
echo "usage: $0 sed_pattern files..."
fi
}
Of course, it still works when we don't have specific target folders
too.
If we wanted to put all the songs into a folder, ./Beethoven/ we can do this:
Usage
sedrename 's|Beethoven - |Beethoven/|g' *.mp3
before:
./Beethoven - Fur Elise.mp3
./Beethoven - Moonlight Sonata.mp3
./Beethoven - Ode to Joy.mp3
./Beethoven - Rage Over the Lost Penny.mp3
after:
./Beethoven/Fur Elise.mp3
./Beethoven/Moonlight Sonata.mp3
./Beethoven/Ode to Joy.mp3
./Beethoven/Rage Over the Lost Penny.mp3
Bonus round...
Using this script to move files from folders into a single folder:
Assuming we wanted to gather up all the files matched, and place them
in the current folder, we can do it:
sedrename 's|.*/||' **/*.mp3
before:
./Beethoven/Fur Elise.mp3
./Beethoven/Moonlight Sonata.mp3
./Beethoven/Ode to Joy.mp3
./Beethoven/Rage Over the Lost Penny.mp3
after:
./Beethoven/ # (now empty)
./Fur Elise.mp3
./Moonlight Sonata.mp3
./Ode to Joy.mp3
./Rage Over the Lost Penny.mp3
Note on sed regex patterns
Regular sed pattern rules apply in this script; these patterns aren't
PCRE (Perl Compatible Regular Expressions). You could use sed's
extended regular expression syntax, with either sed -r or sed -E
depending on your platform.
See the POSIX compliant man re_format for a complete description of
sed basic and extended regexp patterns.

Here is a POSIX near-equivalent of the currently accepted answer. This trades the Bash-only ${variable/substring/replacement} parameter expansion for one which is available in any Bourne-compatible shell.
for i in ./*.pkg; do
mv "$i" "${i%-[0-9.]*.pkg}.pkg"
done
The parameter expansion ${variable%pattern} produces the value of variable with any suffix which matches pattern removed. (There is also ${variable#pattern} to remove a prefix.)
I kept the subpattern -[0-9.]* from the accepted answer although it is perhaps misleading. It's not a regular expression, but a glob pattern; so it doesn't mean "a dash followed by zero or more numbers or dots". Instead, it means "a dash, followed by a number or a dot, followed by anything". The "anything" will be the shortest possible match, not the longest. (Bash offers ## and %% for trimming the longest possible prefix or suffix, rather than the shortest.)
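A quick way to convince yourself of the shortest-versus-longest behaviour and of the glob semantics is to try the expansions on plain strings. The sample names below are only for illustration:

```shell
f=xorg-libXrandr-1.2.3.pkg
shortest=${f%-*}     # removes the shortest suffix starting with a dash
longest=${f%%-*}     # removes the longest suffix starting with a dash
echo "$shortest"     # xorg-libXrandr
echo "$longest"      # xorg

# The * is a glob, so it also swallows non-numeric text after the first digit.
g=foo-1beta.pkg
glob_result=${g%-[0-9.]*.pkg}.pkg
echo "$glob_result"  # foo.pkg
```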

I find that rename is a much more straightforward tool to use for this sort of thing. I found it on Homebrew for OSX
For your example I would do:
rename 's/-\d+(\.\d+)*\.pkg$/.pkg/' *.pkg
The 's' means substitute. The form is s/searchPattern/replacement/ files_to_apply. You need to use regex for this which takes a little study but it's well worth the effort.

Better to use sed for this, something like:
find . -type f -name "*.pkg" |
sed -n -e 's/\(\(.*\)-[0-9.]*\.pkg\)/\1 \2.pkg/p' |
while read -r nameA nameB; do
mv "$nameA" "$nameB"
done
Dealing with filenames that include spaces is left as an exercise: read splits each line on whitespace, so such names break this loop.

This seems to work assuming that
everything ends with .pkg
your version #'s always start with a "-" followed by a digit
strip off the .pkg, then strip off the "-" and everything after it
for x in $(ls); do echo $x $(echo $x | sed 's/\.pkg//g' | sed 's/-[0-9].*//g').pkg; done

I had multiple *.txt files to be renamed as .sql in same folder.
below worked for me:
for i in `ls *.txt | awk -F "." '{print $1}'`; do mv "$i.txt" "$i.sql"; done

Thank you for these answers. I had a similar problem: moving .nzb.queued files to .nzb files. The filenames had spaces and other cruft in them, and this solved my problem:
find . -type f -name "*.nzb.queued" |
sed -ne "s/^\(\(.*\)\.nzb\.queued\)$/mv -v \"\1\" \"\2.nzb\"/p" |
sh
It is based on the answer of Diomidis Spinellis.
The regex creates one group for the whole filename, and one group for the part before .nzb.queued and then creates a shell move command. With the strings quoted. This also avoids creating a loop in shell script because this is already done by sed.
