I downloaded a huge CSV file (7.98 GiB) in order to import it into a Postgres database. The problem is that the file is encoded in ISO-8859, and to import it into Postgres it must be in UTF-8.
So I tried to convert it to UTF-8 using the iconv command in the Ubuntu subsystem (integrated into Windows 10). The problem is that the output file stays empty according to the Properties window, and the command won't terminate until Ctrl+C is pressed.
Here is my command:
iconv -t utf-8 < sirc-17804_9075_14209_201612_L_M_20170104_171522721.csv > xaus.csv
I've tried many syntaxes but none of them populates the output file...
P.S. Sorry for my English, I'm French.
Edit: after a very long time the command outputs:
iconv: unable to allocate buffer for input: Cannot allocate memory
iconv appears to want to load the entire file into memory, which may be problematic for large files. See iconv-chunks for a possible solution; from the iconv-chunks description:
This script is just a wrapper that processes the input file in manageable chunks and writes it to standard output.
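If you'd rather not pull in an extra script, a similar chunked approach can be sketched with split and plain iconv (assuming the source really is ISO-8859-1, which is a single-byte encoding, so cutting on arbitrary byte boundaries is safe):
split -b 500M sirc-17804_9075_14209_201612_L_M_20170104_171522721.csv part_
for p in part_*; do
    # convert each ~500 MB piece separately, appending to the final UTF-8 file
    iconv -f ISO-8859-1 -t UTF-8 "$p" >> xaus.csv
done
rm part_*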
Related
I am using an old app that cannot handle UTF-8 characters.
The CSV files are SFTP'd to a folder with default UTF-8 encoding. Is there a way to automatically convert UTF-8 to ANSI?
Can this be done with a Windows setting, or do I have to write some code to convert them?
Thank you!
Historically, I have used my own custom tools to do this.
But these days, if you have WSL, Cygwin or MinGW installed, you can use the GNU iconv tool to do the conversion, after first using other tools (such as head -c 3) to check whether the input file starts with the 3-byte UTF-8-encoded BOM character used as a file-format marker.
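For example, a rough sketch of what the conversion could look like from WSL or Cygwin, assuming the app's "ANSI" means Windows-1252 and using placeholder names input.csv/output.csv:
# show the first three bytes; EF BB BF means a UTF-8 BOM is present
head -c 3 input.csv | od -An -tx1
# BOM present: skip the first 3 bytes, then convert UTF-8 to Windows-1252
tail -c +4 input.csv | iconv -f UTF-8 -t WINDOWS-1252//TRANSLIT > output.csv
# no BOM: convert the file directly
iconv -f UTF-8 -t WINDOWS-1252//TRANSLIT input.csv > output.csv
The //TRANSLIT suffix makes iconv approximate characters that have no ANSI equivalent instead of failing on them.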
I have url links to image files I want to retrieve from the internet.
I can download the files using curl without issue using:
curl "https://...web address..." > myfileName;
The image files are of various types, some .bmp, some .jpg, etc. I have been using sips in Terminal on macOS to convert each to .png files using:
sips -s format png downloadFileName --out newFileName.png
This works well on files I've saved as downloadFileName, regardless of the starting file type.
As I have many files to process I wanted to pipe the output of the curl download directly into sips, without saving an intermediate file.
I tried the following (which combines my two working steps without the intermediate file name):
curl "https://...web address..." | sips -s format png --out fileName.png
and got a "no file" error: Error 4: no file was specified.
I've searched the sips man page but cannot find a reference for piped input, and have been unable to find a useful answer searching SO or Google.
Is there a way to process an image downloaded using curl directly in sips without first saving the file?
I do not necessarily need the solution to use a pipe, or even be on one line. I have a script that will cycle through a few thousand urls and simply want to avoid saving lots of files that will be deleted a line later.
I should add, I do not necessarily need to use sips either. However, any solution must be able to handle image files of unknown type (which sips does admirably) as no file extension is present on the files.
Thanks
I don't have sips installed, but its manpage indicates that it cannot read from stdin. However, if you use Bash or Zsh (the macOS default now), you can use process substitution. In this example I use convert, which is part of ImageMagick and can also convert between different image types:
$ convert <(curl -s https://i.kym-cdn.com/entries/icons/mobile/000/018/012/this_is_fine.jpg) this_is_fine.png
$ file this_is_fine.png
this_is_fine.png: PNG image data, 800 x 450, 8-bit/color RGB, non-interlaced
After doing that, this_is_fine.png will be the only file in the directory, with no temporary files left behind.
Apparently sips only reads regular files, which makes it impossible to use /dev/stdin or named pipes.
However, it is possible using the mature and feature-rich convert command:
$ curl -sL https://picsum.photos/200.jpg | convert - newFilename.png
$ file newFilename.png
newFilename.png: PNG image data, 200 x 200, 8-bit/color RGB, non-interlaced
(First install ImageMagick via brew install imagemagick or sudo port install ImageMagick.)
ImageMagick permits image data to be read and written from the standard streams STDIN (standard in) and STDOUT (standard out), respectively, using a pseudo-filename of -.
source, section STDIN, STDOUT, and file descriptors
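For the "few thousand URLs" case, a simple loop avoids intermediate image files entirely. A sketch, assuming the URLs sit one per line in a hypothetical urls.txt and ImageMagick is installed:
n=0
while IFS= read -r url; do
    n=$((n + 1))
    # curl streams the raw image to stdout; "-" makes convert read it from stdin
    curl -sL "$url" | convert - "image_$n.png"
done < urls.txt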
Can someone show me how to use the PostScript deletefile operator to delete the input file after Ghostscript finishes converting it to a PDF file?
This appears to work for me, first creating the PDF file, then setting the permissions on the input file, and finally deleting the input file.
"C:/Program Files/gs/gs9.55.0/bin/gswin64c.exe" -q -sDEVICE#pdfwrite
-o "C:/Temp/Temp_0001.pdf"
-f "C:/Temp/Temp_0001.ps"
--permit-file-all=C:/Temp/Temp_0001.ps
-c (C:/Temp/Temp_0001.ps) deletefile
NOTE: Since I had to switch to Unix-style path separators (even though I am running this on Windows) for the permit-file-all and the deletefile, I decided to use the same convention for both the output and input files as well. Windows seems to be OK with that, and the convention was uniformly used for all paths/files.
I have a file ffmpeg_list_of_files.txt with the content
file '.\Output_0\forces_vs_radii.pdf'
file '.\Output_1\forces_vs_radii.pdf'
file '.\Output_2\forces_vs_radii.pdf'
file '.\Output_3\forces_vs_radii.pdf'
file '.\Output_4\forces_vs_radii.pdf'
and so on...
and then run ffmpeg -f concat -i ffmpeg_list_of_files.txt -c copy output.mkv as stated at
http://trac.ffmpeg.org/wiki/Concatenate
I, unfortunately, get the error
Line 1: unknown keyword ' ■f'
.\ffmpeg_list_of_files.txt: Invalid data found when processing input
in Windows PowerShell in Windows 10.
What am I doing wrong?
It's an encoding problem; I ran into the same one and solved it by changing the file encoding.
Steps to solve it:
Open ffmpeg_list_of_files.txt with Notepad, Notepad++ or a similar editor.
Change the encoding to UTF-8 without BOM. To do it, follow one of the following steps:
with Windows Notepad, this is done using the "Save As..." option: at the bottom, change the encoding to "UTF-8" and press "Save";
with Notepad++, select "Encoding" in the main menu, choose "Encode in UTF-8 without BOM", and save the file after that.
The exact names can change a little depending on the version, but following these steps it's pretty straightforward.
Note: In my case, when redirecting ls or dir to a file in PowerShell, the default encoding of the resulting file is UCS-2.
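If you are generating the list from PowerShell anyway, you can sidestep the manual re-encoding. A sketch for Windows PowerShell 5.x (where Out-File defaults to UTF-16 and -Encoding utf8 still adds a BOM), using the .NET API instead, which writes UTF-8 without a BOM; the 0..4 range just mirrors the example list above:
# build the concat list lines
$lines = 0..4 | ForEach-Object { "file '.\Output_$_\forces_vs_radii.pdf'" }
# WriteAllLines emits UTF-8 without a BOM, which ffmpeg's concat demuxer accepts
[System.IO.File]::WriteAllLines("$PWD\ffmpeg_list_of_files.txt", $lines)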
I have a very large CSV file, over 2.5GB, that, when importing into SQL Server 2005, gives an error message "Column delimiter not found" on a specific line (82,449).
The issue is with double quotes within the text for that column; in this instance, it's a note field in which someone wrote "Transferred money to ""MIKE"", Thnks".
Because the file is so large, I can't open it up in Notepad++ and make the change, which brought me to find VIM.
I am very new to VIM. I went through the tutorial document, which taught me how to change the file by using 82,449 G to find the line, l to move over to the spot, and x to delete the double quotes.
When I save the file using :saveas c:\Test VIM\Test.csv, only a portion of the file seems to be saved. The original file is 2.6 GB and the newly saved one is 1.1 GB; the original has 9,389,222 rows and the saved one has 3,751,878. I tried using the G command to jump to the bottom of the file before saving, which increased the size quite a bit but still didn't save the whole file; before using G, the saved file was only 230 MB.
Any ideas as to why I'm not saving the entire file?
You really need to use a "stream editor", something similar to sed on Linux, that lets you pipe your text through it, without trying to keep the entire file in memory. In sed I'd do something like:
sed 's/""MIKE""/"MIKE"/' < source_file_to_read > cleaned_file_to_write
There is a sed for Windows.
As a second choice, you could use a programming language like Perl, Python or Ruby to process the text line by line from the file, writing each line back out as it searches for the doubled quotes, changing the line in question, and continuing until the file has been completely processed.
VIM might be able to load the file, if your machine has enough free RAM, but it'll be a slow process. If it does, you can search from direct mode using:
:/""MIKE""/
and manually remove a doubled-quote, or have VIM make the change automatically using:
:%s/""MIKE""/"MIKE"/g
In either case, write, then close, the file using:
:wq
In VIM, direct mode is the normal state of the editor, and you can get to it using your ESC key.
You can also split the file into smaller, more manageable chunks, and then combine them back together afterwards. Here's a bash script that splits the file into equal parts:
#!/bin/bash
fspec=the_big_file.csv
num_files=10 # how many mini-files you want

total_lines=$(wc -l < "${fspec}")
((lines_per_file = (total_lines + num_files - 1) / num_files)) # round up

# split into files named part.aa, part.ab, ...
split --lines=${lines_per_file} "${fspec}" part.

echo "Total Lines = ${total_lines}"
echo "Lines per file = ${lines_per_file}"
wc -l part.*
I just tested it on a 1 GB file with 61,151,570 lines, and each resulting file was almost 100 MB.
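Once the offending chunk has been fixed, the pieces can be stitched back together; split's default suffixes sort correctly by name, so something like this should reproduce the full file:
cat part.* > the_big_file_fixed.csv
wc -l the_big_file_fixed.csv   # sanity check against the original line count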
Edit:
I just realized you are on Windows, so the above may not apply. You can use a utility like Simple Text Splitter, a Windows program that does the same thing.
When you're able to open the file without errors like E342: Out of memory!, you should be able to save the complete file, too. There should at least be an error on :w; a partial save without an error is a severe loss of data and should be reported as a bug, either on the vim_dev mailing list or at http://code.google.com/p/vim/issues/list
Which exact version of Vim are you using? Using GVIM 7.3.600 (32-bit) on Windows 7/x64, I wasn't able to open a 1.9 GB file without running out of memory. I was able to successfully open, edit, and save (fully!) a 3.9 GB file with the 64-bit version 7.3.000 from here. If you're not using that native 64-bit version yet, give it a try.