Special character "à" not recognized in a path in Windows

EDIT: my original post was about PostgreSQL, since I thought the problem lay there. But after the investigation allowed by the comments, it turns out not to be a PostgreSQL issue, but a more general one related to the Windows command prompt.
I am trying to use the COPY command to copy a selection to a CSV file in PostgreSQL. For some reason, it is not working. At first I thought it was because of the blank spaces in the folder names, but it worked fine when I tested that in another case. Then I thought about the length of the path, but it seems I am far from the size limit.
After further testing, it seems the problem is related to the fact that my path contains the French character "à", which is not recognized by the command prompt.
Here is the path:
'\\10.000.000.00\data\SIEA\_0 Mise à niveau des boites\Zone NRA de Ferney\Ornex 04-05\APD_e\06_Fichier Adresse - IPE\ZAPM_01281_00004_ADDR_ETR.csv'
I tried to replace the character by some others, according to what I found online, but without success, possibly because I didn't find the proper character.
Does anyone have a clue on how to solve this?
Thanks
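A common cause worth checking: cmd.exe uses an OEM code page (often 850) that may not match the encoding psql expects, so "à" gets mangled before PostgreSQL ever sees the path. Switching the console code page before launching psql sometimes helps; a sketch for a Western European setup (the code-page number 1252 is an assumption about the locale):

```bat
:: show the active console code page
chcp
:: switch to Windows-1252 so accented characters like "a-grave" round-trip
chcp 1252
:: then start psql and run the COPY command with the accented path
```

If the client and console encodings still disagree, setting the PGCLIENTENCODING environment variable to match is another knob to try.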

Related

How to archive an entire FTP server where many of the filenames seem to include illegal characters

I am trying to use wget -m <address> to download the contents of an FTP server. A lot of the content is Icelandic and so contains a number of unusual characters, which I think are causing issues, as I keep seeing:
Incomplete or invalid multibyte sequence encountered
I have tried adding flags such as --restrict-file-names=nocontrol but to no avail.
I have also tried using lftp, but it doesn't seem to make any difference.
According to wget manual
If you specify ‘nocontrol’, then the escaping of the control
characters is also switched off.
so 'nocontrol' is actually more permissive than the default. The "weird characters" suggest you have an issue with getting the encoding right, and therefore the 'ascii' mode seems the best fit for your use case:
The ‘ascii’ mode is used to specify that any bytes whose values are
outside the range of ASCII characters (that is, greater than 127)
shall be escaped. This can be useful when saving filenames whose
encoding does not match the one used locally.
As I have no way to test this, please try it and report the result.
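Concretely, the suggested invocation would be a sketch along these lines (the host is a placeholder):

```shell
# mirror the server, escaping every non-ASCII byte in saved file names
wget -m --restrict-file-names=ascii ftp://ftp.example.is/
```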

Codemirror (AjaXplorer/Linux Webserver): ANSI-files with special characters

I have an AjaXplorer installation on a Linux web server.
One of the plugins of AjaXplorer is Codemirror - to view and edit text files.
Now I have the following situation: if I create a txt file on Windows (ANSI) and upload it into AjaXplorer (UTF-8), CodeMirror shows every special character as a question mark. Consequently the whole file will be saved with question marks instead of the special characters.
But if a file once is saved in UTF-8, the special characters will be saved correctly.
So the problem lies in opening the ANSI file. This is the point where I have to implement the solution, for example converting ANSI to UTF-8.
The 'funny' thing is that if I open a freshly uploaded ANSI file and a saved UTF-8 file with Vim on the Linux console, they look exactly the same, but the output in CodeMirror is different.
This is an uploaded ANSI file with special characters like ä and ö and ü
Output in CodeMirror: 'like ? and ? and ?'
This is a saved UTF-8 file with special characters like ä and ö and ü
Output in CodeMirror: 'like ä and ö and ü'
This is the CodeMirror class of AjaXplorer, and I think this must be the point where I could intervene:
https://github.com/mattleff/AjaXplorer/blob/master/plugins/editor.codemirror/class.CodeMirrorEditor.js
As you may see, I'm not a pro, and I have already tried some code pieces - otherwise I would already have the solution ;-) I would be happy if someone gave me a hint! Thank you!
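One hedged approach, not taken from the AjaXplorer code itself: convert the uploaded file from Windows-1252 ("ANSI") to UTF-8 on the server before CodeMirror opens it. A minimal sketch using iconv (the file name is a placeholder; in AjaXplorer this would have to be wired into the upload or open hook):

```shell
# convert a Windows "ANSI" (Windows-1252) text file to UTF-8 in place
iconv -f WINDOWS-1252 -t UTF-8 uploaded.txt > uploaded.utf8.txt
mv uploaded.utf8.txt uploaded.txt
```

Once the file's bytes are valid UTF-8, CodeMirror should show ä, ö and ü instead of question marks.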

A leading question mark in Oracle when using DataStage to import from text to Oracle

The question mark "?" appears only in front of the first field of the first row to be inserted.
Once, I changed the FTP upload file type to text/ascii (rather than binary) and it seemed to resolve the problem. But later it came back.
The server OS is AIX 5.3.
DataStage is 7.5x2.
Oracle is 11g.
I used UE to save the file to UTF-8, using Unix line endings.
Has anyone run into this before?
The question mark itself doesn't mean much, as it could just be a "mask" for some special character which is not recognized by the database. You didn't provide many details about your environment, so my opinions here are only a guess. I hope they can shed a little light.
How is the text file created? If it's a file created in a Windows environment, you're very likely to get characters like this due to the Windows line-break characters {CR}{LF}.
What is the datatype for the Oracle table?
A char datatype will "fill" every position up to the declared size of the field; I'd recommend using varchar instead in this case.
If that's not the case, I would open the file in hex mode and check the ASCII code of this specific character, then use a Trim (if parallel) or Convert (if server) to replace the character.
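To find that code without a hex editor, one way on AIX or any Unix is to dump the file's first bytes. Worth knowing: if the editor saved the file as UTF-8 with a byte-order mark, the three bytes EF BB BF at the start are a classic cause of a stray leading character (the file name below is a placeholder):

```shell
# dump the first 16 bytes in hex to identify the stray character
head -c 16 load_file.txt | od -An -tx1
```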
The convert function would be something like this:
Convert(Char([ascii_char_number]),'',[your_string])
Alternatively you can use the Trim function if your job is a parallel job
Trim([your_string],Char([ascii_char_number]),'L')
The option "L" removes all leading occurrences of the character. You might need to adapt this function to suit your needs. If you're not familiar with the Trim function, you can find more details in the DataStage online documentation.
The only warning I'd give is that you'll be deleting data from your original source, so make sure you're not removing any valid information when manipulating a file like this, as this is not considered good practice among the ETL gurus out there.
Any questions, give me a shout. Happy to help if I can.
Cheers
I had a similar issue where unprintable characters were displayed as '?' and DataStage threw a warning when processing these records. It was OK for me not to display those unprintable characters, so I used the ICONV function, which converts them into printable ones. There are multiple options; I chose the one which converts them to '.', which worked for me. More details are available in the IBM pages below:
https://www-01.ibm.com/support/knowledgecenter/SSZJPZ_11.3.0/com.ibm.swg.im.iis.ds.parjob.dev.doc/topics/r_deeref_String_Functions.html
http://docs.intersystems.com/ens201317/csp/docbook/DocBook.UI.Page.cls?KEY=RVBS_foconv
The conversion I used:
ICONV(column_name,"MCP")

Creating files in a *nix environment with an email address as its name

PLEASE don't tell me why you think it's a bad idea. Just tell me if it's a workable idea.
I want to create files in a folder with names like the following:
asdf@qwerty.com.eml
abc+def@asdf.net.eml
abc_def@sasdf.at.eml
Is there some fundamental incompatibility in the characters allowed in email addresses and those allowed by a unix system?
I will be having a bash script reading the file names, subtracting the ".eml" ending, converting it into the "correct" usable format and sending an email to the address.
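That script could be sketched as follows (a sketch only: the mail sending is stubbed out with echo, and the directory layout is an assumption):

```shell
#!/bin/sh
# loop over the .eml files and recover the address from each name
for f in *.eml; do
    addr="${f%.eml}"   # strip the trailing ".eml"
    # the quotes matter: characters like + and @ must stay inside "$addr"
    echo "would send mail to: $addr"
done
```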
A simple test showed that it saved the above as files called
asdf\@qwerty.com.eml
abc+def\@asdf.net.eml
abc_def\@sasdf.at.eml
The only characters not allowed in a *nix filename are \0 and /, neither of which is allowed in an email address anyway. How your shell may handle special symbols is another matter.
There are no characters disallowed in UNIX file names except / (directory separator) and ASCII 0 (string terminator), so there is no problem at a fundamental level.
Handling those file names in shell scripts is a different matter; it requires at least quoting every variable reference as "$FILENAME", and forgetting even one quotation will create a very rare, insidious bug. (Also, many other utilities will fail on strange characters such as | or newline unless you consistently use the -0 option.)
So yes, technically your bad idea is workable :-)
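The quoting and -0 advice can be sketched like this (directory and file names are examples):

```shell
# pass arbitrary file names to another utility with NUL delimiters,
# so spaces, newlines and symbols in the names cannot split arguments
find . -maxdepth 1 -name '*.eml' -print0 |
    xargs -0 -n1 printf 'processing: %s\n'
```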
Short answer:
przemek@linux-634b:~/tmp/email> touch john.smith@example.com
przemek@linux-634b:~/tmp/email> ls
john.smith@example.com
Works perfectly ;)
Long answer:
It depends on the filesystem you're using. See the Wikipedia entry which lists the allowed characters in file names. Most UNIX file systems support all characters that can appear in e-mail addresses. Non-UNIX filesystems, such as FAT, however, may cause problems.
Note that your problems may come from improper escaping. Check how you are creating your files.
What was your "simple test"?
Typing abc and hitting Tab?
Bash autocompletion will add a \ before every special character.
But this does not mean it is stored with a \ in its name.
Use ls to see the true name.
There is no problem with such file names on systems which treat file names as blobs and allow any byte sequence, e.g. Linux. But they are not portable to systems which treat file names as Unicode strings and disallow certain characters (Windows) or transform file names (Mac OS X, with its canonical decomposition).

Are there any invalid linux filenames?

If I wanted to create a string which is guaranteed not to represent a filename, I could put one of the following characters in it on Windows:
\ / : * ? | < >
e.g.
this-is-a-filename.png
?this-is-not.png
Is there any way to identify a string as 'not possibly a file' on Linux?
There are almost no restrictions - apart from '/' and '\0', you're allowed to use anything. However, some people think it's not a good idea to allow this much flexibility.
An empty string is the only truly invalid path name on Linux, which may work for you if you need only one invalid name. You could also use a string like "///foo", which would not be a canonical path name, although it could refer to a file ("/foo"). Another possibility would be something like "/dev/null/foo", since /dev/null has a POSIX-defined non-directory meaning. If you only need strings that could not refer to a regular file you could use "/" or ".", since those are always directories.
Technically it's not invalid, but files with a dash (-) at the beginning of their name will cause you a lot of trouble, because the name conflicts with command-line options.
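The standard workarounds are the "--" end-of-options marker or a "./" prefix (the file name is just an example):

```shell
touch -- -weird.txt    # "--" stops option parsing
ls ./-weird.txt        # a "./" prefix avoids the problem entirely
rm -- -weird.txt
```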
I personally find that a lot of the time the problem is not Linux but the applications one is using on Linux.
Take Amarok, for example. Recently I noticed that certain artists I had copied from my Windows machine were not appearing in the library. I checked and confirmed that the files were there, and then I noticed that certain characters in the folder names (named for the artist) were represented by a weird-looking square rather than an actual character.
In a shell terminal the filenames look even stranger: /Music/Albums/Einst$'\374'rzende\ Neubauten is an example of how strange.
While these files were definitely there, Amarok could not see them for some reason. I was able to use some shell trickery to rename them to sane versions, which I could then rename with ASCII-only characters using MusicBrainz Picard. Unfortunately, Picard was also unable to open the files until I renamed them, hence the need for a shell script.
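That kind of shell trickery can be sketched as a one-shot Latin-1-to-UTF-8 rename loop (an assumption: the stray bytes really are Latin-1, as $'\374' for ü suggests; the dedicated convmv tool does the same job more robustly):

```shell
#!/bin/sh
# rename files whose names are raw Latin-1 bytes into UTF-8 form;
# run once, only on names known to be Latin-1 (re-running would
# double-encode names that are already UTF-8)
for f in *; do
    new=$(printf '%s' "$f" | iconv -f LATIN1 -t UTF-8)
    if [ "$f" != "$new" ]; then
        mv -- "$f" "$new"
    fi
done
```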
Overall this is a tricky area, and it seems to get very thorny if you are trying to synchronise a music collection between Windows and Linux where certain folder or file names contain funky characters.
The safest thing to do is stick to ASCII-only filenames.
