How to automatically convert a CSV file from UTF-8 to ANSI in Windows?

I am using an old app that cannot handle UTF-8 characters.
The CSV files are delivered over SFTP into a folder and arrive as UTF-8 by default. Is there a way to automatically convert them from UTF-8 to ANSI?
Can this be done with a Windows setting, or do I have to write some code to convert them?
Thank you!

Historically, I have used my own custom tools to do this.
But these days, if you have WSL, Cygwin or MinGW installed, you can use the GNU iconv tool to do the conversion, after first using other tools (such as head -c 3) to check whether the input file starts with the 3-byte UTF-8-encoded BOM character used as a file-format marker.
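As a rough sketch (assuming "ANSI" here means the Windows-1252 code page, and using placeholder file names), the check and the conversion could look like this from a WSL or Cygwin shell:

# print the first three bytes; EF BB BF means a UTF-8 BOM is present
head -c 3 input.csv | od -An -tx1
# skip the 3-byte BOM, then convert; //TRANSLIT approximates characters
# that have no Windows-1252 equivalent instead of aborting
tail -c +4 input.csv | iconv -f UTF-8 -t WINDOWS-1252//TRANSLIT > output.csv

If a file has no BOM, drop the tail step and let iconv read it directly. To automate this, the same commands can be wrapped in a loop or a scheduled task that watches the SFTP drop folder.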

Related

git show HEAD~1:.gitignore command results in weird characters in Windows 64bit

I have these lines written in my .gitignore file:
logs/
main.log
*.log
bin/
*.bin
When I execute the command git show HEAD~1:.gitignore, I get this weird result: [screenshot of the git command output, showing <FF><FE> followed by garbage characters]
When I'm supposed to get this: [screenshot of the expected output, the lines listed above]
I run this command from PowerShell, cmd, Git Bash, Windows Terminal, Cygwin64 and the integrated terminal in VS Code, and I get the same result everywhere. .gitignore's encoding is UTF-16 LE. I even changed the encoding to ANSI and UTF-8 but still got the same result. I've been stuck on this for days; any help will be much appreciated!
The <FF><FE> is a byte order mark or BOM that marks the file as UTF-16-LE. The remainder of the file consists of UTF-16-LE characters. That is, this isn't a plain text file, it's a UTF-16 file. Git does not know how to read that (and the result is that its contents don't actually matter).
I even changed the encoding to ANSI and UTF-8 but still got the same result.
The actual bytes of the file matter: a UTF-8 file would, ideally, not begin with a BOM (UTF-8 files don't have byte order in the first place so these are just junk). The entire text of the file would then just be the UTF-8 data. This requires rewriting the file, i.e., you will need to make a new commit: the existing commit has a useless file in it.
To actually rewrite the file, you will need a tool that does it. You do not say how you changed the encoding. See UTF-16 to UTF-8 conversion (for scripting in Windows) for various options. It's possible you've already done this correctly, and made a new commit, because HEAD~1 is not the current commit, but rather some existing, previous commit. No existing commit can ever be changed, though the special name HEAD always refers to the current commit, so as you make new commits, the commit that HEAD means—and therefore the one that HEAD~1 means—changes over time.
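As a sketch of what that rewrite might look like from Git Bash or WSL (assuming GNU iconv and sed are available and that the file really is UTF-16-LE, as the BOM suggests):

# convert to UTF-8 and strip a leading UTF-8 BOM if iconv emits one
iconv -f UTF-16LE -t UTF-8 .gitignore | sed '1s/^\xEF\xBB\xBF//' > .gitignore.utf8
mv .gitignore.utf8 .gitignore
git add .gitignore
git commit -m "Re-encode .gitignore as UTF-8"

After that commit, git show HEAD:.gitignore should print the expected lines, while git show HEAD~1:.gitignore will keep showing the old UTF-16 bytes, because that older commit is unchanged.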

iconv on windows ubuntu subsystem

I downloaded a huge CSV file (7.98 GiB) in order to import it into a Postgres database. The problem is that the file is encoded in ISO-8859, and to import it into Postgres it must be in UTF-8.
So I tried to convert it to UTF-8 using the iconv command on the Ubuntu subsystem (integrated into Windows 10). The problem is that the output file stays empty according to its Properties window, and the command won't terminate unless I press Ctrl+C.
Here is my command:
iconv -t utf-8 < sirc-17804_9075_14209_201612_L_M_20170104_171522721.csv > xaus.csv
I've tried many variations but none of them populate the output file...
P.S. Sorry for my English, I'm French.
Edit: after a very long time the command outputs:
iconv: unable to allocate buffer for input: Cannot allocate memory
iconv appears to want to load the entire file into memory, which may be problematic for large files. See iconv-chunks for a possible solution; from the iconv-chunks description:
This script is just a wrapper that processes the input file in manageable chunks and writes it to standard output.
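If iconv-chunks isn't an option, a similar effect can be sketched with standard tools, assuming the source encoding is ISO-8859-1 (a single-byte encoding, so the file can be split at arbitrary byte offsets without cutting a character in half):

split -b 500M sirc-17804_9075_14209_201612_L_M_20170104_171522721.csv chunk_
for f in chunk_*; do iconv -f ISO-8859-1 -t UTF-8 "$f" >> xaus.csv; done
rm chunk_*

Each chunk stays small enough for iconv to buffer, and the converted pieces are appended to xaus.csv in order.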

How to stop new line conversion when zipping a Windows text file on a Unix machine

I want to zip a Windows .cmd file on an OS X server, using the zip command-line tool.
templateName="Windows_Project_Template"
zip -r -T -y -9 "${templateName}.zip" $templateName
When the file is unzipped on a Windows machine, the line endings have been converted, so the text file comes out without any line breaks on Windows. How can I work around this?
Thanks
While not a perfect solution (I can't find an option to handle everything as binary), you can force the \r\n with the --to-crlf option:
-l
--to-crlf
Translate the Unix end-of-line character LF into the MSDOS convention CR LF. This option should not be used on binary files. This option can be used on Unix if the zip file is intended for PKUNZIP under MSDOS. If the input files already contain CR LF, this option adds an extra CR. This is to ensure that unzip -a on Unix will get back an exact copy of the original file, to undo the effect of zip -l. See -ll for how binary files are handled.
Be careful, if the file already contains \r\n you will get \r\r\n.
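Applied to the command above, that just means adding -l. If some files might already contain \r\n, one way around the double-CR caveat (a sketch, assuming the unix2dos tool from the dos2unix package is installed on the OS X machine) is to normalise the line endings first and zip without -l:

# convert only the .cmd files to CRLF in place, then zip as binary
find "$templateName" -name '*.cmd' -exec unix2dos {} +
zip -r -T -y -9 "${templateName}.zip" "$templateName"

unix2dos leaves lines that already end in CRLF alone, so re-running it is harmless.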

How to recursively zip utf-8 files and folders from a batch file?

I have a folder named "Attachments2". I'm working in Windows 7 and am using zip.exe downloaded from Cygwin to zip this folder.
In this folder there are folders and files which have Hebrew characters in their names and content (in the files' content that is).
This is how the folder looks in the file system: [screenshot of the folder with Hebrew file and folder names]
When I try the following:
zip.exe -r results.zip Attachments2
the Hebrew file names inside the resulting archive come out garbled. The file contents themselves are unchanged from the originals.
Please help.
Regards,
Omer.
Windows's ZIP file shell extension (“Compressed Folders”) doesn't support Unicode filenames. Instead it takes the byte filename string and interprets it using a locale-specific legacy encoding (which varies from machine to machine and is never a UTF).
It looks like you've got some further mangling in the zipping-up process too, as it doesn't look like a straight UTF-8 misinterpretation; you could get a better idea of what the filenames in the ZIP actually are by opening it with another tool that does support Unicode (e.g. 7-Zip). But the point is likely moot: if you expect the consumers of the ZIP file to be Windows users, the only safe filename characters are ASCII.
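As the answer suggests, a Unicode-aware archiver makes it easier to see (and avoid) the mangling. A sketch, assuming the 7-Zip command-line tool 7z is on the PATH and that whoever extracts the archive also uses a Unicode-aware tool:

7z a results.zip Attachments2
7z l results.zip

7z l lists the stored file names, so you can check whether the Hebrew names survived intact; extracting with Windows Explorer's built-in Compressed Folders may still mangle them, which is why ASCII-only names remain the safest choice.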

JSch to download file with file name in non ASCII characters

I am using JSch's ChannelSftp to download files from a remote SFTP server to a local Linux machine. When the remote machine has files whose names contain non-ASCII characters, the downloaded file has ? in place of those non-ASCII characters.
For example a file with filename - test-測試中國人的字.txt present in the ftp server will appear as test-??????.txt after downloading on local machine.
Is there a way I can retain the non-ASCII characters after downloading, or automatically convert them to something more meaningful?
Here, the problem was that the client was not using UTF-8 encoding. The issue was solved by setting the encoding to UTF-8 via a JVM argument in the client application.
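For instance (a sketch; client.jar is a placeholder for the client application, and file.encoding is the usual JVM property for this):

java -Dfile.encoding=UTF-8 -jar client.jar

The same property can be added to the client's startup script (for example via a JAVA_OPTS-style variable, if the client uses one) so every run decodes the SFTP file names as UTF-8.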
