Sed replace unusual file extension arising from gmv - macOS

As a result of using gmv on a large nested directory to flatten it, I have a number of duplicate files separated out with the extensions "._1_", "._2_", etc. ( .... ._n_ )
e.g. "a.pdf.\_1\_"
i.e. it's
a(dot)pdf(dot)(back slash)1(back slash)
as opposed to
a(dot)pdf(dot)1
which I want to reduce back to "a.pdf".
I tried something like
sed -i .bak "s|.\_1\_||" *
which is usually reliable and doesn't require escape characters. However, it's giving me
"error: illegal byte sequence"
Grateful for help to fix this. This is in the Mac OS X Terminal. Ideally I'd like a generic solution to fix ._*_ forms where the * varies from 1 to 9.

There are two challenges here.
How to deal with the duplicate basenames. The suffixes '1', '2', ... were most likely added to designate different sections of a single file (maybe different pages of a PDF, etc.). Performing a rename that strips the suffixes may cause some important files to disappear.
How to deal with the "error: illegal byte sequence", which indicates that some special (Unicode) characters are part of the file name: usually bytes with values >= 0x80 (outside ASCII) that cannot be decoded according to the current locale. The fact that the file names are escaped (as per the OP's "a.pdf.\_1\_") may hint at additional characters that are not displayed (assuming the escaping was not added by the OP).
The proposed solution is to rename the files, placing the 'sequence' part that makes each file unique BEFORE the extension, so that the extension can still be used to determine the file type.
a.pdf._1_ => a._1_.pdf
The rename command to perform this task is:
rename 's/^(.*)\.pdf\.(_\d+_)$/$1.$2.pdf/' *.pdf._*_
Adjust the file name list as needed, and use -n to verify before running.

rename -n 's/\._1_$//' *._1_
works (remove the -n once tested).
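If rename is not installed (macOS does not ship one), a minimal sketch of the same "move the sequence in front of the extension" idea can be written with plain bash parameter expansion and mv. Note that sed -i rewrites the contents of each file, not its name, so it cannot do the renaming itself (running it over binary PDFs is also a likely source of the "illegal byte sequence" error). The function name fix_gmv_suffixes and the demo file names are hypothetical; the glob only matches the ._1_ to ._9_ forms from the question, and mv -n is used so an existing file is never overwritten.

```shell
#!/usr/bin/env bash
# Sketch: rename NAME.EXT._N_ to NAME._N_.EXT (N = 1..9) inside one directory.
# fix_gmv_suffixes is a hypothetical helper name, not an existing tool.
fix_gmv_suffixes() {
  local dir=$1 f name suffix base
  shopt -s nullglob                      # no matches -> loop body never runs
  for f in "$dir"/*._[1-9]_; do
    name=${f##*/}                        # e.g. a.pdf._1_
    suffix=${name##*.}                   # e.g. _1_
    base=${name%.*}                      # e.g. a.pdf
    [[ $base == *.* ]] || continue       # no extension to move it in front of
    # mv -n refuses to overwrite an existing target (GNU and BSD mv).
    mv -n -- "$f" "$dir/${base%.*}.${suffix}.${base##*.}"   # e.g. a._1_.pdf
  done
}

# Usage on a scratch directory:
demo=$(mktemp -d)
touch "$demo/a.pdf" "$demo/a.pdf._1_" "$demo/a.pdf._2_" "$demo/notes._3_"
fix_gmv_suffixes "$demo"
ls "$demo"
```

Files without an extension (notes._3_ above) are left alone, since there is nothing to move the suffix in front of.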

Related

Replace incorrectly displayed special chars in bash

I've uploaded a big number of files including their folder structure to my Ubuntu 12.04 LTS Server using WinSCP.
The goal is to access these files in Owncloud.
However, all files that contain special characters like German umlauts cause problems. In Owncloud's view, their name is cut off at the special character, and trying to view that folder or file will send you back to the folder root.
Using ls, the special character is always displayed as a question mark, e.g. "Motorschwei?en1.jpg"
What works is manually renaming them through "mv" in the shell. Inserting the special char properly, e.g. "Motorschweißen1.jpg" for this example, does work, but doing this for all of them would take ages.
Using find . -name "?" will not yield any hits.
Is there any way to replace all of those special characters, e.g. with an underscore?
Try the command rename:
rename 's/[^\w.-]/_/g' *
The above command will replace every character that is not a letter, digit, underscore, dot or hyphen with _ (keeping the dot means file extensions survive). See http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators and http://perldoc.perl.org/perlre.html#Special-Backtracking-Control-Verbs for the documentation of Perl regular expressions.
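Where the Perl rename is not available, a rough bash-only sketch of the same idea follows. It assumes it is acceptable to turn every byte outside A-Z, a-z, 0-9, '.', '_' and '-' into '_'. Running the loop in the C locale makes each byte of a mis-encoded umlaut count as one character, so a UTF-8 'ß' (two bytes) becomes two underscores. The function name sanitize_names is hypothetical, and mv -n skips a rename rather than overwrite a file that already has the cleaned-up name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace "special" bytes in names directly inside $1.
# The body is a subshell so the LC_ALL change does not leak to the caller.
sanitize_names() (
  export LC_ALL=C                            # one byte == one character
  dir=$1
  for f in "$dir"/*; do
    [ -e "$f" ] || [ -L "$f" ] || continue   # empty directory: glob stays literal
    name=${f##*/}
    new=${name//[!A-Za-z0-9._-]/_}           # every other byte becomes _
    [ "$new" = "$name" ] && continue         # nothing to change
    mv -n -- "$f" "$dir/$new"
  done
)

# Usage on a scratch directory:
demo=$(mktemp -d)
touch "$demo/Motorschweißen1.jpg" "$demo/a b.txt" "$demo/plain.txt"
sanitize_names "$demo"
ls "$demo"
```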

BASH: Replacing special character groups

I have a rather tricky request...
We use a special application which is connected to an Oracle database. For control reasons the application uses special characters which are defined by the application and saved in a LONG field of the database.
My task is to query the LONG field periodically and check for changes. To do that, I use a bash script to write the content to a file and compare the old and the new file with md5sum.
When there's a difference, I want to send the old file via mail. The problem is that the old file contains these special characters and I don't know how to replace them with, for example, a string which describes them.
I tried to replace them on the basis of their ASCII code, but this didn't work. I've also tried to replace them by their appearance in the file (they look like this: ^P). This didn't work either.
When viewing the file in a text editor like nano, the characters are visible as described above. But when using cat on the file, the content is only displayed until the first appearance of such a control character.
As far as I know there is no possibility to replace them while querying the database, because the content is in a LONG field.
I hope you can help me.
Thank you in advance.
Marco
^P is the Control-P character, which is decimal 16 or hexadecimal 0x10, also known as the Data Link Escape (DLE) character in ASCII.
To replace all occurrences of 0x10 in a file with another string we can use our friend gsed:
gsed "s/\x10/Data Link Escape/g" yourfile.txt
This should replace all occurrences of characters containing the hex value 0x10 with the text string "Data Link Escape". You'll probably want to use a different string - this is just an example.
Depending on the system you're using, you may be able to use the standard sed command if your version of sed recognizes the \xNN single-character escape codes. If there are multiple hex characters you need to replace, you may want to create a file containing your sed commands, one for each hexadecimal character you need to replace, and tell sed or gsed to use the commands in the file with the -f option (consult the sed or gsed man pages for details).
Share and enjoy.
You can use xxd to change the string to its hex representation, then use xxd -r to convert back.
Or, you can use uuencode and uudecode.
One option is to run the file through cat -v. This replaces nonprinting characters with visible representations (using the ^ notation for control characters):
$ echo $'\x10\x12\x13\x14\x16' | cat -v
^P^R^S^T^V
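A small sketch that ties the sed suggestions together, assuming bash and any sed: printf writes the control characters as octal escapes (\020 is 0x10, DLE; \022 is 0x12, DC2), both into a sample file and into a sed script, so sed only ever sees literal bytes and does not need \xNN support. The file names sample.txt and ctrl.sed and the <DLE>/<DC2> labels are made up for illustration.

```shell
#!/usr/bin/env bash
# Sample input containing two control characters (names are illustrative).
printf 'abc\020def\022ghi\n' > sample.txt

# One sed command per control character; printf turns \020 and \022 into the
# raw bytes, so the script works even with seds that lack \xNN escapes.
printf 's/\020/<DLE>/g\ns/\022/<DC2>/g\n' > ctrl.sed

sed -f ctrl.sed sample.txt    # → abc<DLE>def<DC2>ghi
```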

Testing "framework" for scripts with nonstandard filenames

There are many comments on some questions (especially for the shell) that say basically one or more of the following:
This will fail on file names that contain spaces, newlines, etc.,
This will fail if the file is a symbolic link (or not),
This will fail if $filename is a directory and not a regular file,
and so on.
I understand that every script needs its own testing environment, but
there are some common things that every script should be immune to.
So, my intention is to write a script that will create a directory hierarchy
with "specially crafted" file names for testing purposes.
The question is: what "special" file names are good for this test?
Currently I have (the script creates files and directories) with:
space in the file name
newline in the file name
file name that starts with one of:
- (like a command-line option)
# (comment char)
! (command history)
file name that contains one of:
| char (pipe)
() chars
* and ? (wildcards)
file name with unicode characters
all above for the directories
symbolic link to the directory
symbolic link to the file
Any other ideas about what I shouldn't miss?
What comes to my mind:
single and double quotes in the file name
the $ character at the start
several redirection characters like > < << <<<
the ~ char ($HOME)
the ';' (as command delimiter)
a backslash in the file name \
basically, go through the ASCII table and test all chars, if you think that you need this :)
Some other comments:
If you want to test scripts for the Stack Overflow questions, you should create one file with the OP's content (call it the "basic file").
All the "special files" above should then be symlinks to this basic file. With this method you can easily modify the content of the files (you need to change only one, the basic file).
Or, if symlinks are not a solution for you, use hard links.
Not directly about special characters in the file names, but it is good to care about:
file names that differ only in case, especially for images like image.jpg and image.JPG (the same file name, only a differently cased extension)
EDIT: Ideas from the comments:
Very long filenames, lots and lots of files, and very deep directory hierarchies (tripleee)
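A minimal sketch of such a generator in bash, following the suggestions above: one "basic file" plus symlinks with awkward names, the same names as directories, and a symlink to a directory. The directory name testdir, the content "basic content" and the exact name list are illustrative assumptions; extend the array with whatever else you need. The case-only variants (image.jpg/image.JPG) are left out because they collide on case-insensitive file systems such as the macOS default.

```shell
#!/usr/bin/env bash
set -euo pipefail

root=testdir                          # scratch location (illustrative)
rm -rf -- "$root"
mkdir -p -- "$root/dirs"

# The single "basic file" that every special name points to.
printf 'basic content\n' > "$root/basic"

names=(
  'with space'
  $'with\nnewline'
  '-starts-with-dash'
  '#starts-with-hash'
  '!starts-with-bang'
  '$starts-with-dollar'
  '~starts-with-tilde'
  'has|pipe'
  'has(parens)'
  'has*star?and-question'
  "has'single\"double-quotes"
  'has>redirect<chars'
  'has;semicolon'
  'has\backslash'
  'unicode-Motorschweißen'
)

for n in "${names[@]}"; do
  ln -s basic "$root/$n"              # relative symlink to the basic file
  mkdir -- "$root/dirs/$n"            # same name as a directory
done

ln -s dirs "$root/link-to-dirs"       # symlink to a directory
```

Because every special name is a symlink to testdir/basic, editing that one file changes the content seen through all of them.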

Concatenating strings fails when read from certain files

I have a web application that is deployed to a server. I am trying to create a script that, among other things, reads the current version of the web application from a properties file that is deployed along with the application.
The file looks like this:
//other content
version=[version number]
build=[buildnumber]
//other content
I want to create a variable that looks like this: version-buildnumber
Here is my script for it:
VERSION_FILE=myfile
VERSION_LINE="$(grep "version=" $VERSION_FILE)"
VERSION=${VERSION_LINE#$"version="}
BUILDNUMBER_LINE=$(grep "build=" $VERSION_FILE)
BUILDNUMBER=${BUILDNUMBER_LINE#$"build="}
THEVERSION=${VERSION}-${BUILDNUMBER}
The strange thing is that this works in some cases but not in others.
The problem I get is when I am trying to concatenate the strings (i.e. the last line above). In some cases it works perfectly, but in others characters from one string replace the characters from the other instead of being placed afterwards.
It does not work in these cases:
When I read from the deployed file
If I copy the deployed file to another location and read from there
It does work in these cases:
If I write a file from scratch and read from that one.
If I create my own file and then copy the content from the deployed file into my created file.
I find this very strange. Is there someone out there recognizing this?
It is likely that your files have carriage returns in them. You can fix that by running dos2unix on the file.
You may also be able to do it on the fly on the strings you're retrieving.
Here are a couple of ways:
Do it with sed instead of grep:
VERSION_LINE="$(sed -n "/version=/{s///;s/\r//g;p}" $VERSION_FILE)"
and you won't need the Bash parameter expansion to strip the "version=".
OR
Do the grep as you have it now and do a second parameter expansion to strip the carriage return.
VERSION=${VERSION_LINE#$"version="}
VERSION=${VERSION//$'\r'}
By the way, I recommend habitually using lowercase or mixed case variable names in order to reduce the chance of name collisions.
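To see the carriage-return explanation in action, here is a small sketch that writes a file with CRLF line endings (the values 1.2.3 and 45 are made up) and then strips the \r with the same ${var//$'\r'} expansion as above. printf '%q' is a bash builtin feature that makes the hidden \r visible instead of letting the terminal jump back to column 0.

```shell
#!/usr/bin/env bash
# Sample properties file with Windows (CRLF) line endings; values are invented.
printf '//other content\r\nversion=1.2.3\r\nbuild=45\r\n' > myfile

version_line=$(grep '^version=' myfile)
build_line=$(grep '^build=' myfile)
version=${version_line#version=}
build=${build_line#build=}

# Each value still ends in \r; %q prints it visibly instead of moving the
# cursor back to the start of the line (which is what garbles the output).
printf '%q\n' "${version}-${build}"

# Remove every carriage return, then concatenate.
version=${version//$'\r'}
build=${build//$'\r'}
the_version="${version}-${build}"
printf '%s\n' "$the_version"          # → 1.2.3-45
```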
Given this foo.txt:
//other content
version=[version number]
build=[buildnumber]
//other content
you can extract a version-build string more easily with awk:
awk -F'=' '$1 == "version" { version = $2}; $1 == "build" { build = $2}; END { print version"-"build}' foo.txt
I don't know why your script doesn't work. Can you provide an example of erroneous output?
From this sentence:
In some cases it works perfectly, but in others characters from one string replace the characters from the other instead of being placed afterwards.
I can't understand what's actually going on (I'm not a native English speaker so it's probably my fault).
Cheers,
Giacomo

Replace chars in file by index

I am looking for a reliable method to replace a sequence of chars in a text file. I know that the file will always follow a specific format and that I need to replace a specific range of chars (i.e. start at char 20 and replace the next 11 chars with '#').
I have found several examples using sed and awk which accomplish this on most files. However, the hangup in my case is that the range of chars in the file contains random gibberish chars, including several NUL chars. This causes the commands to stop processing.
I know that the simplest fix would be to go to the process that creates the file and not pad the file with NUL chars. However, the file is generated by a process buried within ancient COBOL running on a mainframe, and any changes there require nearly an act of Congress.
So, knowing that I am stuck with what I have, is there any way to manipulate the file from the command line that can successfully overwrite the NUL chars?
Thanks in advance.
GNU dd can do that
echo '###########'|dd of=FILENAME seek=20 bs=1 count=11 conv=notrunc
Make sure the echo command provides enough characters as input.
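A quick way to try the dd approach on a throwaway file, assuming bash: the sample (the name sample.dat is illustrative) has 20 bytes of header, 11 NUL bytes standing in for the gibberish, and a tail. printf is used instead of echo so no trailing newline is fed to dd, and conv=notrunc keeps the rest of the file intact. seek=20 skips 20 bytes, so the overwrite starts at the 21st byte (offset 20 when counting from 0).

```shell
#!/usr/bin/env bash
# Build a sample: 20 "A" bytes, 11 NUL bytes, then "TAIL" and a newline.
{
  printf 'A%.0s' {1..20}
  head -c 11 /dev/zero
  printf 'TAIL\n'
} > sample.dat

# Overwrite bytes 20..30 (0-based) with '#', leaving the file length unchanged.
printf '#%.0s' {1..11} | dd of=sample.dat bs=1 seek=20 count=11 conv=notrunc 2>/dev/null

cat sample.dat    # prints 20 A's, 11 #'s, then TAIL
```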
