From UTF-8 to UTF-8-BOM using an AIX command/script

My AIX server generates files in UTF-8 by default. I need to convert a UTF-8 file to UTF-8-BOM. Please suggest a Unix/AIX command or script.
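There is no separate "UTF-8-BOM" encoding; a "UTF-8-BOM" file is just a UTF-8 file with the three-byte BOM (EF BB BF) prepended. A minimal sketch using standard POSIX tools available on AIX (the file names here are placeholders):

```shell
# Create a sample UTF-8 file (stand-in for the AIX-generated file).
printf 'hello\n' > input.txt

# Prepend the UTF-8 BOM (bytes EF BB BF) only if it is not already there.
if [ "$(head -c 3 input.txt)" != "$(printf '\357\273\277')" ]; then
  { printf '\357\273\277'; cat input.txt; } > output.txt
else
  cp input.txt output.txt
fi
```

printf '\357\273\277' emits the BOM as octal escapes, which works in POSIX shells that do not support \xNN hex escapes.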

Related

how to automatically set csv file from UTF-8 to ANSI in windows?

I am using an old app that cannot handle UTF-8 characters.
The CSV files are SFTPed to a folder and are UTF-8 by default. Is there a way to automatically convert them from UTF-8 to ANSI?
Can this be done with a Windows setting, or do I have to write some code to convert them?
Thank you!
Historically, I have used my own custom tools to do this.
But these days, if you have WSL, Cygwin, or MinGW installed, you can use the GNU iconv tool to do the conversion, after first using other tools (such as head -c 3) to check whether the input file starts with the 3-byte UTF-8-encoded BOM character as a file-format marker.
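A hedged sketch of that workflow (the file names are hypothetical, and "ANSI" is taken here to mean the common Windows-1252 code page):

```shell
# Sample UTF-8 CSV with a BOM and an accented character ("café,1").
printf '\357\273\277caf\303\251,1\n' > in.csv

# Strip a leading BOM if present, then transcode UTF-8 -> Windows-1252.
if [ "$(head -c 3 in.csv)" = "$(printf '\357\273\277')" ]; then
  tail -c +4 in.csv | iconv -f UTF-8 -t WINDOWS-1252 > out.csv
else
  iconv -f UTF-8 -t WINDOWS-1252 < in.csv > out.csv
fi
```

Note that iconv will fail on any character that has no Windows-1252 equivalent; GNU iconv can approximate such characters if you append //TRANSLIT to the target encoding name.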

iconv on Windows Ubuntu subsystem

I downloaded a huge CSV file (7.98 GiB) in order to import it into a Postgres database. The problem is that the file is encoded in ISO-8859, and to import it into Postgres it must be in UTF-8.
So I tried to convert it to UTF-8 using the iconv command on the Ubuntu subsystem (integrated in Windows 10). The problem is that the output file is still empty according to the Properties window, and the command won't terminate until Ctrl+C is pressed.
Here is my command:
iconv -t utf-8 < sirc-17804_9075_14209_201612_L_M_20170104_171522721.csv > xaus.csv
I've tried many syntaxes but none of them populates the output file...
P.S. Sorry for my English, I'm French.
Edit: after a very long time, the command outputs:
iconv: unable to allocate buffer for input: Cannot allocate memory
iconv appears to want to load the entire file into memory, which may be problematic for large files. See iconv-chunks for a possible solution; from the iconv-chunks description:
This script is just a wrapper that processes the input file in manageable chunks and writes it to standard output.

How to recursively zip utf-8 files and folders from a batch file?

I have a folder named "Attachments2". I'm working in Windows 7 and am using zip.exe, downloaded from Cygwin, to zip this folder.
In this folder there are folders and files whose names and contents contain Hebrew characters (in the files' contents, that is).
This is a snapshot of how the folder looks in the file system:
When trying the following :
zip.exe -r results.zip Attechments2
I get the following (the resulting filenames come out garbled):
The file contents themselves are the same as the originals.
Please help.
Regards,
Omer.
Windows's ZIP file shell extension (“Compressed Folders”) doesn't support Unicode filenames. Instead it takes the byte filename string and interprets it using a locale-specific legacy encoding (which varies from machine to machine and is never a UTF).
It looks like you've got some further mangling in the zipping-up process too, as it doesn't look like a straight UTF-8 misinterpretation; you could get a better idea of what the filenames in the ZIP really are by opening it with another tool that does support Unicode (e.g. 7-Zip). But the point is likely moot: if you expect the consumers of the ZIP file to be Windows users, the only safe filename characters are ASCII.

How do I get the Tanuki Wrapper log files to be UTF-8 encoded?

I have a working Java program that uses the Tanuki wrapper. The problem I have is that the wrapper log file is not UTF-8 encoded, but appears to be ASCII. The wrapper configuration file begins with:
#encoding=UTF-8
#include ..\..\Tomcat\conf\wrapper-license.conf
wrapper.java.command.loglevel=INFO
wrapper.lang.encoding=UTF-8
wrapper.debug=true
The wrapper starts and it starts the JVM successfully. But when I open the wrapper log file, Japanese characters (for example) appear as question-mark characters, i.e., ASCII character 0x3F. I double-checked by loading the log file in a hex editor.
The Tanuki Wrapper log file is put into a directory that contains Japanese characters -- for testing purposes. The log file is successfully created in that folder, so the wrapper is clearly able to read and process the UTF-8 characters. But when it logs the folder name in which it will create its logs, the folder name is logged as all ASCII 0x3f characters ('?').
How can I get the Tanuki Wrapper to encode its log file in UTF-8?
I have confirmation from Tanuki that the current wrapper software will always write its logs using the current system encoding. There is currently (as of 3.5.17) no way to configure the wrapper to write its logs in any different encoding, such as UTF-8.
Again, you can configure the encoding in which the wrapper will read the configuration file, but not the encoding with which it writes to its log file.

JSch to download file with file name in non ASCII characters

I am using JSch's ChannelSftp to download files from a remote SFTP server to a local Linux machine. When the remote machine has files with filenames containing non-ASCII characters, the downloaded file has ? in place of those non-ASCII characters.
For example a file with filename - test-測試中國人的字.txt present in the ftp server will appear as test-??????.txt after downloading on local machine.
Is there a way I can retain the non-ASCII characters after downloading, or automatically convert them to something more meaningful?
Here, the problem was that the client did not support UTF-8 encoding. The issue was solved by setting the encoding to UTF-8 via a JVM argument in the client application.
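The JVM argument referred to is the standard file.encoding system property, which sets the JVM's default charset. A sketch of the launch command (the jar name is a placeholder for the client application):

```shell
# Force the client JVM's default charset to UTF-8 so remote filenames are
# decoded as UTF-8 rather than the platform default.
java -Dfile.encoding=UTF-8 -jar sftp-client.jar
```

Alternatively, JSch's ChannelSftp has a setFilenameEncoding method that sets the filename charset on the channel itself, without changing the JVM-wide default.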