bash display parts of a variable that contains a path - bash

I am trying to output parts of a file path but remove the file name and some levels of the path.
Currently I have a for loop doing a lot of things, but I am creating a variable from the full file path and would like to strip some bits out.
For example
for f in $(find /path/to/my/file -name '*.ext')
will give me
$f = /path/to/my/file/filename.ext
What I want to do is printf/echo some of that variable. I know I can do:
printf '%s\n' "${f#/path/to/}"
my/file/filename.ext
But I would like to remove the filename and end up with:
my/file
Is there any easy way to do this without having to use sed/awk etc?

When you know which level of your path you want, you can use cut:
echo "/path/to/my/file/filename.ext" | cut -d/ -f4-5
When you want the last two levels of the path, you can use sed:
echo "/path/to/my/file/filename.ext" | sed 's#.*/\([^/]*/[^/]*\)/[^/]*$#\1#'
Explanation:
s/from/to/ and s#from#to# are equivalent, but the # form is easier to read when from or to contains slashes.
s/xx\(remember_me\)yy/\1/ will replace "xxremember_meyy" by "remember_me"
s/\(r1\) and \(r2\)/==\2==\1==/ will replace "r1 and r2" by "==r2==r1=="
.* is the longest match with any characters
[^/]* is the longest match without a slash
$ is end of the string for a complete match
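The question asks for a way without sed or awk; bash parameter expansion covers both cases above. A minimal sketch, assuming the /path/to/ prefix from the question (adjust it for real paths):

```shell
#!/usr/bin/env bash
# Example value from the question; in the real loop it comes from find.
f=/path/to/my/file/filename.ext

dir=${f%/*}            # remove the shortest "/*" suffix (the file name): /path/to/my/file
rel=${dir#/path/to/}   # remove the known leading prefix: my/file
printf '%s\n' "$rel"

# Last two levels without knowing the prefix (same result as the sed answer):
head=${dir%/*/*}       # remove the shortest "/*/*" suffix: /path/to
last2=${dir#"$head"/}  # remove that head plus its slash: my/file
printf '%s\n' "$last2"
```

Quoting "$head" inside the expansion makes it match literally, so characters such as * or ? in the path are not treated as patterns.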

Related

sed: Can't replace latest text occurrence including "-" dashes using variables

I'm trying to replace one string with another using sed and a variable. It works great until the variable's content includes a dash "-", which sed then tries to interpret.
Note that in this context I need to replace only the last occurrence of the source variable ${source}, which is why my sed command looks like this:
sed -e "s:${source}([^${source}]*)$:${dest}\1:"
sed is fairly new to me; I have always got my results with "replace" or awk whenever possible, but here I'm trying to make the code as versatile as possible, hence sed. If you can think of another solution, that is viable as well.
Example for the issue:
# mkdir "/home/youruser/TEST-master"
# source="TEST-master" ; dest="test-master" ; find /home/youruser/ -depth -type d -name '*[[:upper:]]*' | grep "TEST" | sed -e "s:${source}([^${source}]*)$:${dest}\1:"
sed: -e expression #1, char 46: Invalid range end
Given that I don't know how many dashes every single variable may contain, does any sed expert know how could I make this work?
Exact context: Open source project LinuxGSM for which I'm rewriting a function to recursively lowercase files and directories.
Bash function I'm working on and comment here: https://github.com/GameServerManagers/LinuxGSM/issues/1868#issuecomment-996287057
If I'm understanding the context right, the actual goal is to take a path that contains some uppercase characters in its last element, and create a version with the last element lowercased. For example, /SoMe/PaTh/FiLeNaMe would be converted to /SoMe/PaTh/filename. If that's the case, rather than using string substitution, use dirname and basename to split it into components, lowercase the last, then reassemble it:
parentdir=$(dirname "$src")
filename=$(basename "$src")
lowername=$(echo "$filename" | tr '[:upper:]' '[:lower:]')
dst="$parentdir/$lowername"
(Side note: it's important to quote the parameters to tr, to make sure the shell doesn't treat them as filename wildcards and replace them with lists of matching files.)
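The three steps above can be wrapped in a small function. A minimal sketch: lowercase_last is a hypothetical name, and the function only prints the new path; it does not rename anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print $1 with only its last path element lowercased.
lowercase_last() {
    local src=$1 parentdir filename lowername
    parentdir=$(dirname "$src")
    filename=$(basename "$src")
    # Quoted character classes, so the shell never glob-expands them.
    lowername=$(printf '%s\n' "$filename" | tr '[:upper:]' '[:lower:]')
    printf '%s/%s\n' "$parentdir" "$lowername"
}

lowercase_last /SoMe/PaTh/FiLeNaMe   # prints /SoMe/PaTh/filename
```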
As long as the paths contain at least one "/" and do not end with "/", you can use bash substitutions instead of dirname and basename:
parentdir="${src%/*}"
filename="${src##*/}"
As long as you're using bash v4.0 or later, you can also use a builtin substitution to do the lowercasing:
lowername="${filename,,}"
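For the recursive case the question describes, these pure-bash pieces can be combined with find -depth so that entries inside a directory are renamed before the directory itself. A minimal sketch, assuming bash 4.4 or later (for mapfile -d '' and ${var,,}) and a find that supports -mindepth and -print0 (GNU and BSD find do); lowercase_tree is a hypothetical name, and existing targets are skipped rather than overwritten:

```shell
#!/usr/bin/env bash
# Hypothetical helper: lowercase the names of all files and directories
# below $1 whose names contain uppercase letters.
lowercase_tree() {
    local root=$1 src parentdir filename dst
    local -a paths
    # Collect the whole list first, so find has finished walking the tree
    # before anything is renamed. -depth lists a directory's contents
    # before the directory itself, so children are renamed first.
    # -mindepth 1 leaves the starting directory itself alone.
    mapfile -d '' paths < <(find "$root" -mindepth 1 -depth -name '*[[:upper:]]*' -print0)
    for src in "${paths[@]}"; do
        parentdir=${src%/*}
        filename=${src##*/}
        dst=$parentdir/${filename,,}
        if [ -e "$dst" ]; then
            printf 'skipping %s: %s already exists\n' "$src" "$dst" >&2
            continue
        fi
        mv -- "$src" "$dst"
    done
}
```

Reading NUL-separated names keeps paths with spaces or newlines intact, which a plain find | grep | sed pipeline does not.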

UNIX change all the file extension for a list of files

I am a total beginner in this area so sorry if it is a dumb question.
In my shell script I have a variable named FILES, which holds the paths to log files, like this:
FILES="./First.log ./Second.log logs/Third.log"
and I want to create a new variable with the same files but a different extension, like this:
NEW_FILES="./First.txt ./Second.txt logs/Third.txt"
So I run this command:
NEW_FILES=$(echo "$FILES" | tr ".log" ".txt")
But I get this output, in which Second has turned into Secxnd and logs/ into txts/:
NEW_FILES="./First.txt ./Secxnd.txt txts/Third.txt"
I understand the . character is a special character, but I don't know how I can escape it. I have already tried to add a \ before the period but to no avail.
tr replaces characters with other characters. When you write tr .log .txt it replaces . with ., l with t, o with x, and g with t.
To perform string replacement you can use sed 's/pattern/replacement/g', where s means substitute and g means globally (i.e., replace multiple times per line).
NEW_FILES=$(echo "$FILES" | sed 's/\.log/.txt/g')
You could also perform this replacement directly in the shell without any external tools.
NEW_FILES=${FILES//\.log/.txt}
The syntax is similar to sed, with a global replacement being indicated by two slashes. With a single slash only the first match would be replaced.
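If the file list can live in a bash array instead of one space-separated string, each name stays intact even when it contains spaces, and the replacement can be anchored to the end of each name. A minimal sketch under that assumption: ${array[@]/%pattern/replacement} applies the substitution to every element, and the leading % anchors it at the end.

```shell
#!/usr/bin/env bash
# One array element per file, so names containing spaces stay intact.
files=(./First.log ./Second.log logs/Third.log)

# /% anchors the pattern at the end of each element: only a trailing
# ".log" is replaced, never one in the middle of a name.
new_files=("${files[@]/%.log/.txt}")

printf '%s\n' "${new_files[@]}"
```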
tr is not the tool you need. The goal of tr is to change characters on a 1-by-1 basis. You may not have noticed it, but Second was changed to Secxnd.
I think sed is better.
NEW_FILES=$(sed 's/\.log/.txt/g' <<< "$FILES")
It searches for the regular expression \.log and replaces each match with the string .txt. Note the \. in the regex: it matches a literal dot character and nothing else, whereas an unescaped . would match any character.

How to get last part of string in substring by variable number of delimiters

I am struggling a bit.
As I am writing a ksh script, I need to extract a substring of a string in which the number of occurrences of the delimiter varies.
This is because my string holds the name of a file that may have been compressed several times and therefore contains more than one dot (.).
These dots would be my delimiter, but as the supplier might include version numbers in the file name (e.g. software-v.3.2.4.tar.gz), I find no way to cut off the last suffix.
The process is as follows:
The file name is saved in a variable.
The file is decompressed a first time (removing the .gz suffix from the file).
Now I need to extract the .tar archive, but my variable still holds the .gz suffix, so the command fails because the file no longer has that suffix.
How do I remove the last suffix from my variable?
I cannot guarantee that the number of delimiters stays the same.
I tried several combinations of | rev | cut -d'.' | rev, but in this case I only get the suffix.
I also tried re-initializing the $fileName variable with the actual name of the file, but for that I would need to search the whole directory, which I am trying to avoid.
...
fileName="whole file name"
pathTo="path in which it should be moved after decompression"
if [ "$fileType" = "gz" ]; then
gzip $pathTo$fileName -d $pathTo
#Problem occurs right here
tar xfv $pathTo$fileName -C $pathTo
else
echo "Unknown Filetype. Cannot decompress. Script stopped."
exit
fi
...
I am thankful for any help.
Greets
Yonkske
Instead of | rev | cut -d'.' -f 1 | rev, which keeps only the last suffix, use | rev | cut -d'.' -f 2- | rev, which keeps everything except the last suffix.
Parameter expansion (variable substitution) is the best choice here: it is built into the shell, so it starts no external process and is also the fastest option.
filename='software-v.3.2.4.tar.gz'
echo ${filename##*.}
Output will be gz
This will not modify the value of the variable $filename.
if [[ "${filename##*.}" == "gz" ]]; then
How it works:
${var#pattern} - the value of var with the shortest match of pattern removed from the left
${var##pattern} - the same, but the longest matching piece is removed instead of the shortest
${var%pattern} - the value of var with the shortest match of pattern removed from the right
${var%%pattern} - the same, but the longest matching piece is removed instead of the shortest
There are more forms, but these are the relevant ones here.
The limitation is that these expansions cannot be nested; to apply two of them, assign the intermediate result to a variable first.
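Applied to the question's script, removing only the shortest .* suffix gives the name of the intermediate .tar file, however many dots the version number contains. A minimal sketch: the temporary directory and the small demo archive stand in for the question's pathTo and real file, and only POSIX expansions are used, so the same lines should also work in ksh:

```shell
#!/bin/sh
# Demo setup (stand-in for the real file): build a small .tar.gz whose
# name contains extra dots, as in the question.
pathTo=$(mktemp -d)/
fileName='software-v.3.2.4.tar.gz'
mkdir "${pathTo}src"
echo hello > "${pathTo}src/readme.txt"
tar -C "$pathTo" -cf - src | gzip > "$pathTo$fileName"
rm -r "${pathTo}src"

fileType=${fileName##*.}          # "gz": longest "*." prefix removed
if [ "$fileType" = "gz" ]; then
    gzip -d "$pathTo$fileName"    # replaces the .gz with software-v.3.2.4.tar
    tarName=${fileName%.*}        # "software-v.3.2.4.tar": shortest ".*" suffix removed
    tar -xf "$pathTo$tarName" -C "$pathTo"
else
    echo "Unknown file type. Cannot decompress. Script stopped." >&2
    exit 1
fi

cat "${pathTo}src/readme.txt"     # prints hello
```

Because each expansion strips only one suffix, the intermediate name is kept in its own variable (tarName) rather than nested inside another expansion.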

Ubuntu: Terminal remove part of string

I want to use the Ubuntu Terminal to rename several hundred files in a folder since I am not allowed to install anything.
The name of the files is in the following format:
ER201703_Company_Name_Something_9876543218_90087625374823.csv
Afterwards it should look like this:
ER201703_9876543218_90087625374823.csv
So I want to remove the middle part (Company_Name_Something), which sometimes contains 2, 3 or even 4 underscores. I wanted to create two strings, one for the front part and one for the back part. The front part is easy and already works, but I am struggling with the back part.
for name in *.csv;
do
charleng=${#name};
start=$(echo "$name" | grep -a '_9');
back=$(echo "$name" | cut -c $start-);
front=$(echo "$name" | cut -c1-9);
mv "$name" "$front$back";
done
I am trying to find the position of _9 and keep everything from there to the end of the string.
Best regards
Jan
If rename is installed (I think that's the case for Ubuntu) you can use the following command instead of your loop.
rename -n 's/^(ER\d*)\w*?(_9\w*)/$1$2/' *.csv
Remove the -n (no act) to apply the changes.
Explanation
s/.../.../ substitutes matches of the left regex with the right pattern.
(ER\d*) matches the first part (ER followed by some digits) and stores it inside $1 for later use.
\w*? matches the company part, that is as few (non-greedy) word characters (letters, numbers, underscore, ...) as possible.
(_9\w*) matches the second part and stores it inside $2 for later use.
$1$2 is the substitution of the previously matched parts. We only omit the company part.
awk -F'_' '{printf "mv %s %s_%s_%s\n",$0,$1,$(NF-1),$NF}'
Example:
kent$ awk -F'_' '{printf "mv %s %s_%s_%s\n",$0,$1,$(NF-1),$NF}' <<<"ER201703_Company_Name_Something_9876543218_90087625374823.csv"
mv ER201703_Company_Name_Something_9876543218_90087625374823.csv ER201703_9876543218_90087625374823.csv
This one-liner prints an mv old new command for each file name it reads, so feed it the names, e.g. printf '%s\n' *.csv | awk .... If the commands look right, pipe the output to sh (awk ... | sh) to perform the renames.
If your file names can contain spaces, quote them with double quotes in the printf format.
Here is an alternative, possibly more generic solution:
rename 's/^([^_]+(?=_))(?:\w+(?=_\d+))(_\d+_\d+\.csv)$/$1$2/' *.csv
In case the format of the file names changes, you want a robust regular expression:
([^_]+(?=_)) - match everything that is not an underscore, up to the first underscore, and store it in $1
(?:\w+(?=_\d+)) - match characters up to the trailing numbers; (?:...) is a non-capturing group, so nothing is stored
(_\d+_\d+\.csv) - match the two sets of numbers and the file extension and store them in $2
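The question's own idea of building a front and a back string also works without grep or cut, using only parameter expansion. A minimal sketch; strip_middle is a hypothetical helper name, and it assumes every name ends with two underscore-separated number fields as in the example:

```shell
#!/usr/bin/env bash
# Hypothetical helper: PREFIX_<company part>_NUMBER_NUMBER.csv
# becomes PREFIX_NUMBER_NUMBER.csv.
strip_middle() {
    local name=$1 front head back
    # Fewer than three underscores: nothing to strip, return unchanged.
    if [[ $name != *_*_*_* ]]; then
        printf '%s\n' "$name"
        return
    fi
    front=${name%%_*}       # before the first "_": ER201703
    head=${name%_*_*}       # before the last two "_" fields: ER201703_Company_Name_Something
    back=${name#"$head"}    # the rest: _9876543218_90087625374823.csv
    printf '%s%s\n' "$front" "$back"
}

# Dry run: print the mv commands; remove "echo" to actually rename.
for name in *.csv; do
    [ -e "$name" ] || continue      # no match: the glob stays literal
    new=$(strip_middle "$name")
    if [ "$new" != "$name" ]; then
        echo mv -- "$name" "$new"
    fi
done
```

Counting fields from the end with ${name%_*_*} is what makes the variable number of underscores in the company part irrelevant.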

Remove pattern in first occurrence from right to left in file name in bash

Say I have a file name string aa.bb.cc.xx.txt.
I would like to remove the last dot-delimited part before the extension (the .xx before .txt) to get aa.bb.cc.txt.
I don't want to use rev, cut and rev, because that takes three commands:
echo 'aa.bb.cc.xx.rpm' | rev | cut -d '.' --complement -s -f 2 | rev
Is there a better solution using bash alone?
Thanks
If you know the file ends with .txt, you can remove that as well, then put it back on.
$ oldname=aa.bb.cc.xx.txt
$ echo "${oldname%.*.txt}.txt"
aa.bb.cc.txt
%.*.txt removes the shortest string matching the pattern .*.txt (in this case, .xx.txt).
If the extension could be an arbitrary string, you can save it by removing everything but the extension as a prefix, then restoring it.
$ echo "${oldname%.*.*}.${oldname##*.}"
aa.bb.cc.txt
%.*.* removes the shortest suffix matching .*.* (here .xx.txt), and ##*. removes the longest prefix ending in . (here aa.bb.cc.xx.). Both operators also remove the . that delimits the matched prefix or suffix, which is why you need to add it back explicitly between the two expansions.
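The expansion above can be wrapped in a helper that works for any extension. A minimal sketch; drop_before_ext is a hypothetical name. The case guard matters: for a name with only one dot, such as aa.txt, ${name%.*.*} matches nothing, so the bare expansion would produce aa.txt.txt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop the last dot-separated part before the
# extension, e.g. aa.bb.cc.xx.txt -> aa.bb.cc.txt.
drop_before_ext() {
    local name=$1
    case $name in
        *.*.*) printf '%s.%s\n' "${name%.*.*}" "${name##*.}" ;;
        *)     printf '%s\n' "$name" ;;   # fewer than two dots: leave as is
    esac
}

drop_before_ext aa.bb.cc.xx.txt   # aa.bb.cc.txt
drop_before_ext aa.bb.cc.xx.rpm   # aa.bb.cc.rpm
```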
You can use sed as follows:
$ echo "aa.bb.cc.xx.txt" | sed 's/\.[^.]*\.txt$/.txt/'
aa.bb.cc.txt
If you want a general sed solution that works on any extension, you can do:
$ echo 'aa.bb.cc.xx.rpm' | sed 's/[^.]*\.\([^.]*\)$/\1/'
aa.bb.cc.rpm

Resources