Remove leading and trailing whitespace from directory names - bash

The problem is that Mac OS X lets folders get names like " Foo Bar /".
This is what I came up with on my own, and it fixes most of the problem:
for d in */ ; do
echo "$d" | xargs
done
Result: "Foo Bar /".
The only problem left is the last space between the directory name and the slash.
Can someone help get rid of that last space?

Try this for a bash-only approach:
dir="some dir /"
fixed=${dir/ \///}
echo "$fixed"
some dir/
Bash string manipulation:
SUBSTRING REMOVAL BEGINNING OF STRING:
${string#substring} - remove shortest substring match from beginning
${string##substring} - remove longest substring match from beginning
SUBSTRING REMOVAL END OF STRING:
${string%substring} - remove shortest matching substring from tail
${string%%substring} - remove longest matching substring from tail
SUBSTRING REPLACEMENT:
${string/substring/replacement} - replace first match of substring
${string//substring/replacement} - replace all matches of substring
${string/#substring/replacement} - replace substring at front of string
${string/%substring/replacement} - replace substring at tail of string
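The # and % forms above can also trim arbitrary leading and trailing whitespace, not just the single space before the slash. A sketch, with trim as a hypothetical helper name; it assumes the trailing slash added by the */ glob is removed first:

```shell
# Hypothetical helper: strip leading and trailing whitespace with parameter expansion only.
trim() {
  local s=$1
  # ${s%%[![:space:]]*} expands to the leading run of whitespace; remove it from the front.
  s=${s#"${s%%[![:space:]]*}"}
  # ${s##*[![:space:]]} expands to the trailing run of whitespace; remove it from the end.
  s=${s%"${s##*[![:space:]]}"}
  printf '%s\n' "$s"
}

dir=" Foo Bar /"
dir=${dir%/}                        # drop the slash that the */ glob appends
printf '[%s]\n' "$(trim "$dir")"    # prints [Foo Bar]
```

Inside the for d in */ loop from the question, the trimmed name could be compared with "${d%/}" and passed to mv -- only when the two differ.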

What about using sed like this:
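name=" Foo Bar /"
echo "$name" | sed 's/^[[:space:]]*//; s/[[:space:]]*\/$/\//' | tr -s ' '
The first sed expression removes the spaces before the name, the second removes the spaces between the name and the trailing slash, and tr squeezes any remaining runs of spaces to a single space. [[:space:]] is used instead of \s because \s is a GNU extension that the BSD sed on Mac OS X may not support.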

The problem is that Mac OS X lets folders get names like " Foo Bar /"
This is a Unix/Bash issue too. All characters except NUL and the forward slash are valid in file names, including control characters such as backspace, as well as UTF-8 characters.
You can use ${WORD%%filter} and ${WORD##filter} to help remove the extra spaces at the start and end of file names. You might also want to replace white space with an underscore character; Unix shell scripts usually work better when file names don't contain white space:
for file in *
do
    new_file=$(tr -s " " <<<"$file")
    if [ "$file" != "$new_file" ]; then
        mv -- "$file" "$new_file"
    fi
done
The tr -s " " command squeezes each run of spaces in the file name down to a single space. That applies to runs at the beginning and end of the name too, but a single leading or trailing space survives, so combine it with ${WORD##filter} and ${WORD%%filter} to remove those. This doesn't touch tab characters or newlines (although they could be added to the string).
for file in * assigns each file name to $file correctly, even when the name contains spaces. The <<< is a Bashism that redirects a string to STDIN; this is useful because tr only reads from STDIN. The $(...) tells Bash to run the command and substitute its output into the command line (removing the trailing newline that <<< adds).
This is a fairly simple script and may choke on other file names. I suggest you try it like this first:
for file in *
do
new_file=$(tr -s " "<<<"$file")
printf 'mv -- %q %q\n' "$file" "$new_file" >> output.txt
done
Then you can examine output.txt to verify that the mv commands will work. Once you've verified output.txt, you can run it as a bash script:
$ bash output.txt
That will then run the mv commands you've saved in output.txt.
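A sketch of the underscore substitution suggested in this answer, combined with the same review-first workflow. It assumes bash with extglob enabled, treats only spaces (not tabs) as white space, and plan_renames is a hypothetical name:

```shell
shopt -s extglob   # enable +( ) patterns; set before the function is defined

# Hypothetical helper: print (do not run) an mv command for every name in the
# current directory that has leading/trailing spaces or internal runs of spaces.
plan_renames() {
  local file new
  for file in *; do
    [ -e "$file" ] || continue          # skip the literal "*" in an empty directory
    new=${file##+( )}                   # remove leading spaces
    new=${new%%+( )}                    # remove trailing spaces
    new=${new//+( )/_}                  # replace each run of spaces with one underscore
    if [ -n "$new" ] && [ "$file" != "$new" ]; then
      printf 'mv -- %q %q\n' "$file" "$new"   # %q quotes names so bash can read them back
    fi
  done
}
```

Run plan_renames > output.txt inside the target directory, review output.txt (mv may overwrite or move into an existing entry with the target name), then run bash output.txt.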

Related

Replace exact matching word containing special character

I am trying to replace a word in a string that also contains the same word joined to other text by a special character.
Example:
string="this is a joke. this is a poor-joke. this is a joke-club"
I only want to replace the standalone word joke with coke, not the occurrences joined to a special character.
The command below replaces every occurrence of joke:
[chandu#mynode ~]$ echo $string | sed "s/joke/coke/g;"
this is a coke. this is a poor-coke. this is a coke-club
I tried using sed "s/\<joke\>/coke/g;", but even this replaces every occurrence.
Expected output:
this is a coke. this is a poor-joke. this is a joke-club
You can match the beginning and end of the word yourself if you want - to count as a word character (the \| alternation is a GNU sed extension):
$ sed 's/\(^\|[^a-zA-Z-]\)joke\([^a-zA-Z-]\|$\)/\1coke\2/g' <<<"$string"
this is a coke. this is a poor-joke. this is a joke-club
Using perl with look-arounds that require a space before the word joke and a space or period after it:
$ echo "$string" | perl -p -e 's/(?<=[ ])joke(?=[. ])/coke/g'
Output:
this is a coke. this is a poor-joke. this is a joke-club
Unfortunately, in your case the hyphen separates the string into different words: - is not a word character, so \b also matches between - and joke.
For example, if I change your string to:
string='this is a joke. this is a poorjoke. this is a jokeclub'
and I'm running the command:
echo $string | sed 's/\bjoke\b/coke/g'
(where \b stands for: word boundary), I get the following result:
this is a coke. this is a poorjoke. this is a jokeclub
But when I'm applying this same command on your string, I get (as you do):
this is a coke. this is a poor-coke. this is a coke-club
So, in your particular case, I'd try something like:
echo $string | sed 's/\([^-]\)\(joke\)\([^-]\)/\1coke\3/g'
Which produces the following result:
this is a coke. this is a poor-joke. this is a joke-club

Deleting lines beginning with a CR in a file directly

I want to write a ksh script that deletes all lines of a file that begin with a carriage return. The same script reuses the modified file afterwards, so I need to modify the file in place.
For example, here is my file in Notepad++ (with the line endings shown as CRLF, since it is a Windows-format file):
CE1;CPr1;CRLF
CE2;CPr2;CRLF
CRLF
CE3;CPr3;CRLF
CRLF
CRLF
and I want to obtain:
CE1;CPr1;CRLF
CE2;CPr2;CRLF
CE3;CPr3;CRLF
The script I wrote so far is:
sed -i '/^\n/d' ListeTable.lst
I also tried with \r and \R but nothing is working.
As I mentioned, the script goes on to reuse the modified file, like this (there is more):
echo -n "(CE = '$(tail -n 1 ListeTable.lst | cut -d$';' -f1)'and CPr = '$(tail -n 1 ListeTable.lst | cut -d$';' -f2)')"
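OK, I found a regex that works for this problem: '/^\s*$/d'. Here ^ and $ anchor at the start and end of the line, \s matches any whitespace character (spaces, tabs, and the carriage return \r), and * allows any number of them, including none. So lines that are empty or hold only a CR are deleted. (My earlier '/^\n/d' never matched because sed strips the newline from each line before applying the script.)
So the working code is: sed -i '/^\s*$/d' ListeTable.lst
Note that \s and -i without a suffix are GNU sed features; BSD sed needs -i '' and may not recognize \s, in which case [[:space:]] is the portable spelling.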
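A portable sketch of the same fix for the ksh script, for systems where GNU-only \s or sed -i may be unavailable: it uses [[:space:]] and a temporary file instead of -i. The function name is hypothetical, and the demo rebuilds the sample file from the question:

```shell
# Sketch: delete lines that are empty or contain only whitespace (a lone CR counts,
# because [[:space:]] includes \r) and write the result back into the same file.
# POSIX sh syntax, so it also runs under ksh; works with GNU and BSD sed.
delete_blank_lines_in_place() {
  tmp=$(mktemp) || return 1
  if sed '/^[[:space:]]*$/d' "$1" > "$tmp"; then
    cat "$tmp" > "$1"    # overwrite in place, keeping the original file's permissions
  fi
  rm -f "$tmp"
}

demo=$(mktemp)
printf 'CE1;CPr1;\r\nCE2;CPr2;\r\n\r\nCE3;CPr3;\r\n\r\n\r\n' > "$demo"
delete_blank_lines_in_place "$demo"
cat -v "$demo"           # remaining lines still end in ^M, the CR of CRLF
```

Writing back with cat rather than mv keeps the file's inode and permissions, so later commands in the script (such as the tail | cut line) see the same file.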

grep for a variable's content with a dot

I found many similar questions about my issue, but none of them solves it for me.
I need to grep for the contents of a variable followed by a dot, but escaping the dot after the variable doesn't work. For example:
The file content is
item.
newitem.
My variable contains item. and I want to grep for the exact word, so I think I must use -w rather than -F, but with this command I don't get the correct output:
cat file | grep -w "$variable\."
Do you have suggestions please?
Update: I need to correct my scenario. My file contains FQDNs, and I have to search for hostname. including the dot.
Unfortunately, grep -wF doesn't work:
My file is
hostname1.domain.com
hostname2.domain.com
and the command
cat file | grep -wF hostname1.
doesn't show any output. I have to find another solution and I'm not sure that grep could help.
If $variable contains item., you're searching for item.\., which is not what you want. In fact, you want -F, which interprets the pattern literally rather than as a regular expression. Add -x so that only whole lines match; with -F alone, newitem. would match as well.
var=item.
echo $'item.\nnewitem.' | grep -xF "$var"
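Try this, with $word holding the name without the dot (for example item):
grep "\b$word\."
\b: word boundary
\.: a literal dot (the backslash stops . from matching any character)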
The following awk solution may help you:
awk -v var="item." '$0==var' Input_file
You are expanding the variable and appending \. to it, which results in calling
cat file | grep -w "item.\.".
Since grep accepts files as arguments, calling grep "item\." file should do.
From man grep:
-w, --word-regexp
Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.
and:
The Backslash Character and Special Expressions
The symbols \< and \> respectively match the empty string at the beginning and end of a word. The symbol \b matches the empty string at the edge of a word, and \B matches the empty string provided it's not at the edge of a word. The symbol \w is a synonym for [[:alnum:]] and \W is a synonym for [^[:alnum:]].
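Because the last character of your pattern is ., -w requires it to be followed by a non-word character (anything other than [A-Za-z0-9_]); however, the next character is d, so the line is rejected.
grep '\<hostname1\.'
should work, because \< only checks that the character before the match is not a word constituent.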
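When the search string comes from a variable, as in the question, its dots (and any other regex metacharacters) can be escaped before the pattern is built. A sketch with a hypothetical escape_bre helper; the \< anchor is a GNU grep extension:

```shell
# Hypothetical helper: backslash-escape the characters that are special in a POSIX
# basic regular expression ([ ] \ . * ^ $) so the value is matched literally.
escape_bre() {
  printf '%s\n' "$1" | sed 's/[][\\.*^$]/\\&/g'
}

var=hostname1.
pattern="\\<$(escape_bre "$var")"     # -> \<hostname1\.
printf '%s\n' hostname1.domain.com hostname2.domain.com xhostname1.domain.com |
  grep -- "$pattern"
```

Only hostname1.domain.com is printed: in xhostname1.domain.com the h is preceded by a word character, so \< does not match there.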
You can dynamically construct the search pattern and then call grep
rexp='^hostname1\.'
grep "$rexp" file.txt
The single quotes tell bash not to interpret special characters in the assigned value. The double quotes let bash replace $rexp with its value. The caret (^) in the expression tells grep to look for lines starting with hostname1. (the escaped dot matches only a literal dot).
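For the updated FQDN case, a sketch that avoids regular expressions altogether: awk's index() returns 1 when the line begins with the literal string. starts_with is a hypothetical name; note that awk -v processes backslash escapes in the value, which does not matter for host names:

```shell
# Hypothetical helper: print lines (from file $2, or stdin when no file is given)
# that begin with the literal string $1. index() does a plain substring search,
# so the dot in "hostname1." needs no escaping.
starts_with() {
  awk -v p="$1" 'index($0, p) == 1' "${2:--}"
}

printf '%s\n' hostname1.domain.com hostname2.domain.com hostname10.domain.com |
  starts_with hostname1.
```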

Remove pattern in first occurrence from right to left in file name in bash

Say I have a file name string aa.bb.cc.xx.txt.
I would like to remove the first content between . and . (remove .xx) before the .txt to have aa.bb.cc.txt.
I don't want to use rev, cut and rev, because that takes three commands:
echo 'aa.bb.cc.xx.rpm' |rev | cut -d '.' --complement -s -f 2 |rev
Is there any better solution by using bash?
Thanks
If you know the file ends with .txt, you can remove that as well, then put it back on.
$ oldname=aa.bb.cc.xx.txt
$ echo "${oldname%.*.txt}.txt"
aa.bb.cc.txt
%.*.txt removes the shortest string matching the pattern .*.txt (in this case, .xx.txt).
If the extension could be an arbitrary string, you can save it by removing everything but the extension as a prefix, then restoring it.
$ echo "${oldname%.*.*}.${oldname##*.}"
aa.bb.cc.txt
%.*.* removes the shortest suffix containing two dots (here .xx.txt), and ##*. removes the longest prefix ending in a dot (here aa.bb.cc.xx.), leaving just the extension txt. Both operators remove the . that delimits the matched prefix or suffix, which is why you need to add it back explicitly between the two expansions.
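A sketch that wraps the generic one-liner in a hypothetical function with a guard: without it, a name with only one dot such as file.txt would come out as file.txt.txt, because %.*.* finds nothing to remove:

```shell
# Hypothetical helper: drop the last dot-separated field before the extension,
# e.g. aa.bb.cc.xx.txt -> aa.bb.cc.txt, for any extension.
remove_last_inner_field() {
  local name=$1
  case $name in
    *.*.*) ;;                               # need at least two dots to have an inner field
    *) printf '%s\n' "$name"; return ;;     # otherwise leave the name unchanged
  esac
  local ext=${name##*.}                     # text after the last dot:  txt
  local stem=${name%.*}                     # text before the last dot: aa.bb.cc.xx
  printf '%s.%s\n' "${stem%.*}" "$ext"      # drop ".xx" and re-attach the extension
}

remove_last_inner_field aa.bb.cc.xx.txt
remove_last_inner_field aa.bb.cc.xx.rpm
```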
You can use sed as follows:
$ echo "aa.bb.cc.xx.txt" | sed 's/\.[^.]*\.txt$/.txt/'
aa.bb.cc.txt
If you want a general sed solution that works on any extension, you can do:
$ echo 'aa.bb.cc.xx.rpm' | sed 's/[^.]*\.\([^.]*\)$/\1/'
aa.bb.cc.rpm

bash display parts of a variable that contains a path

I am trying to output parts of a file path but remove the file name and some levels of the path.
Currently I have a for loop doing a lot of things, but I am creating a variable from the full file path and would like to strip some bits out.
For example
for f in $(find /path/to/my/file -name '*.ext')
will give me
$f = /path/to/my/file/filename.ext
What I want to do is printf/echo some of that variable. I know I can do:
printf '%s\n' "${f#/path/to/}"
my/file/filename.ext
But I would like to remove the filename and end up with:
my/file
Is there any easy way to do this without having to use sed/awk etc?
When you know which level of your path you want, you can use cut:
echo "/path/to/my/file/filename.ext" | cut -d/ -f4-5
When you want the last two levels of the path, you can use sed:
echo "/path/to/my/file/filename.ext" | sed 's#.*/\([^/]*/[^/]*\)/[^/]*$#\1#'
Explanation:
s/from/to/ and s#from#to# are equivalent, but the # form helps when from or to contains slashes.
s/xx\(remember_me\)yy/\1/ will replace "xxremember_meyy" by "remember_me"
s/\(r1\) and \(r2\)/==\2==\1==/ will replace "r1 and r2" by "==r2==r1=="
.* is the longest match with any characters
[^/]* is the longest match without a slash
$ is end of the string for a complete match
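The question asked for a way to do this without sed/awk; a sketch using only bash parameter expansion, assuming $f holds a path like the one in the question:

```shell
# Sketch using only parameter expansion (no sed/awk/cut).
f=/path/to/my/file/filename.ext

dir=${f%/*}                       # strip the shortest "/..." suffix: /path/to/my/file
rel=${dir#/path/to/}              # strip a known prefix:              my/file
printf '%s\n' "$rel"

# Last two directory levels, without knowing the prefix:
grandparent=${dir%/*/*}           # shortest suffix with two slashes removed: /path/to
last_two=${dir#"$grandparent"/}   # my/file
printf '%s\n' "$last_two"
```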
