saving entire file in VIM - windows

I have a very large CSV file (over 2.5 GB) that, when imported into SQL Server 2005, fails with the error message "Column delimiter not found" on a specific line (82,449).
The issue is with double quotes within the text for that column. In this instance, it's a note field in which someone wrote "Transferred money to ""MIKE"", Thnks".
Because the file is so large, I can't open it in Notepad++ to make the change, which led me to VIM.
I am very new to VIM. I reviewed the tutorial document, which taught me how to change the file: 82449G to jump to the line, l to move over to the spot, and x to delete the double quotes.
When I save the file using :saveas c:\Test VIM\Test.csv, the result seems to be only a portion of the file. The original file is 2.6 GB and the newly saved one is 1.1 GB. The original file has 9,389,222 rows and the new one has 3,751,878. I tried using the G command to get to the bottom of the file before saving, which increased the size quite a bit but still didn't save the whole file; before using G, the saved file was only 230 MB.
Any ideas as to why I'm not saving the entire file?

You really need to use a "stream editor", something similar to sed on Linux, that lets you pipe your text through it, without trying to keep the entire file in memory. In sed I'd do something like:
sed 's/""MIKE""/"MIKE"/' < source_file_to_read > cleaned_file_to_write
There is a sed for Windows.
As a second choice, you could use a programming language like Perl, Python, or Ruby to process the file line by line: write each line out as you read it, change the line that contains the doubled quotes, and continue until the whole file has been processed.
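A minimal sketch of that line-by-line approach in Python. The function name fix_doubled_quotes is made up for illustration, and the replacement simply mirrors the sed command above. It works on raw bytes, which assumes the file is in an ASCII-compatible encoding (not UTF-16):

```python
def fix_doubled_quotes(src_path, dst_path, old=b'""MIKE""', new=b'"MIKE"'):
    """Copy src_path to dst_path one line at a time, replacing `old` with `new`.

    Returns the number of lines that were changed. Only one line is held in
    memory at a time, so the size of the file does not matter.
    """
    changed = 0
    # Binary mode leaves line endings (\r\n on Windows) and encoding untouched.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            fixed = line.replace(old, new)
            if fixed != line:
                changed += 1
            dst.write(fixed)
    return changed
```

You would call it with something like fix_doubled_quotes("source_file_to_read", "cleaned_file_to_write") and check that the returned count matches the number of lines you expected to change.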
VIM might be able to load the file if your machine has enough free RAM, but it'll be a slow process. If it does, you can search from normal mode using:
:/""MIKE""/
and manually remove a doubled-quote, or have VIM make the change automatically using:
:%s/""MIKE""/"MIKE"/g
In either case, write, then close, the file using:
:wq
In VIM, normal mode is the default state of the editor, and you can get back to it using your ESC key.

You can also split the file into smaller, more manageable chunks, and then combine them back. Here's a script in bash that can split the file into equal parts:
#!/bin/bash
fspec=the_big_file.csv
num_files=10 # how many mini-files you want
# count the lines without spawning an extra cat process
total_lines=$(wc -l < "${fspec}")
# round up so that no more than num_files parts are produced
((lines_per_file = (total_lines + num_files - 1) / num_files))
split --lines="${lines_per_file}" "${fspec}" part.
echo "Total Lines = ${total_lines}"
echo "Lines per file = ${lines_per_file}"
wc -l part.*
I just tested it on a 1 GB file with 61,151,570 lines, and each resulting file was almost 100 MB.
Edit:
I just realized you are on Windows, so the above may not apply. You can use a utility like Simple Text Splitter, a Windows program that does the same thing.
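If Python happens to be installed on the Windows machine, the same split can be sketched without bash. This is only an illustration of the script above, not a tested replacement for it; split_by_lines and its parameters are hypothetical names:

```python
def split_by_lines(src_path, num_files=10, prefix="part."):
    """Split src_path into at most num_files pieces of equal line count.

    The last piece may be shorter. Returns the list of paths written.
    """
    # First pass: count lines without loading the file into memory.
    with open(src_path, "rb") as src:
        total_lines = sum(1 for _ in src)
    # Ceiling division, same rounding as the bash arithmetic above.
    lines_per_file = max(1, -(-total_lines // num_files))

    paths = []
    out = None
    # Second pass: start a new output file every lines_per_file lines.
    with open(src_path, "rb") as src:
        for i, line in enumerate(src):
            if i % lines_per_file == 0:
                if out:
                    out.close()
                path = f"{prefix}{len(paths):03d}"
                paths.append(path)
                out = open(path, "wb")
            out.write(line)
    if out:
        out.close()
    return paths
```

Concatenating the returned pieces in order gives back the original file byte for byte, so the pieces can be edited individually and joined again afterwards.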

When you're able to open the file without errors like E342: Out of memory!, you should be able to save the complete file, too. There should at least be an error on :w. A partial save without an error is a severe loss of data and should be reported as a bug, either on the vim_dev mailing list or at http://code.google.com/p/vim/issues/list
Which exact version of Vim are you using? Using GVIM 7.3.600 (32-bit) on Windows 7/x64, I wasn't able to open a 1.9 GB file without running out of memory. I was able to successfully open, edit, and save (fully!) a 3.9 GB file with the 64-bit version 7.3.000 from here. If you're not using that native 64-bit version yet, give it a try.

Related

Loop Over Files as Input for Program, Rename and Write Output to Different Directory

I have a problem with writing the output of a program to a different directory when I loop different files as variables as inputs. I run this in the command line. The problem is that I do not know how to "tell" the program to put the output with a changed filename into another directory than the input directory.
Here is the command. It is a bioinformatics tool that requires specific input file formats, so I am sorry that I could not give a better example. The program is called computeMatrix and is part of a software toolbox called deeptools2.
command:
for f in ~/my/path/*spc_files*; do computeMatrix reference-point --referencePoint center --regionsFileName /target/region.bed --binSize 500 --scoreFileName "$f" --outFileName "$f.matrix"; done
So far, I tried to use the command basename to just get the filename and then change the directory before that. However I could not figure out:
if this is combinable
what is the correct order of the commands (e.g.:
outputFile='basename"$f"', "~/new/targetDir/'basename$f'")
There are probably other ways to solve the problem that I could not think of or find.
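One possible way to build the output name, sketched here in Python rather than the shell. The helper names output_path_for and build_command are made up for illustration; the computeMatrix flags are copied from the command in the question and nothing else about the tool is assumed:

```python
from pathlib import Path


def output_path_for(input_file, target_dir, suffix=".matrix"):
    """Return target_dir/<input file name><suffix>, dropping the input directory."""
    # Path.name plays the role of `basename`; expanduser resolves a leading ~.
    return Path(target_dir).expanduser() / (Path(input_file).name + suffix)


def build_command(input_file, target_dir):
    """Build the argument list for one computeMatrix call (flags from the question)."""
    out = output_path_for(input_file, target_dir)
    return [
        "computeMatrix", "reference-point",
        "--referencePoint", "center",
        "--regionsFileName", "/target/region.bed",
        "--binSize", "500",
        "--scoreFileName", str(input_file),
        "--outFileName", str(out),
    ]
```

A loop over Path("~/my/path").expanduser().glob("*spc_files*") that passes each build_command(f, "~/new/targetDir") to subprocess.run(..., check=True) would then write every result into the new directory.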

Vim crashing when navigating through file

I'm very very new to Vim. I've been using it for 2 days now (out of the womb new), and I've been having some problems navigating a certain Ruby file of mine without it crashing.
Before I get to the error message, here are the steps I did to reproduce the problem...
First I opened up the file as read-only with the :edit command.
If the file has no syntax coloring, turn it on with :syntax on. (For some reason it doesn't crash without it.)
Navigate up and down the file with j and k. (The crash reproduces more quickly when you put the cursor in a position where it would scatter more, for example at the end of a line.)
At first I thought something was wrong with my .rb file, but I was able to reproduce the same crash with the tk.rb file, which is located in the lib\ruby\2.2.0\ folder. It took more time with tk.rb, since the comments in the code make it harder to crash. (I recommend trying it on files with lots of lines like this.)
Here's a gif of me reproducing the problem and the file I was navigating through to reproduce the crash...
http://puu.sh/jHXXG/14d2cf6460.gif
http://puu.sh/jHVG2/fdae9e38fa.rar
I'm using Vim 7.4 and Windows 10. If any more information is needed, please ask in the comments. I would like to know how to resolve this. Vim looks like a really nice program. However, if it's gonna break itself and my heart from navigating with hjkl, I might have to travel back to the fork in the road and walk down the Emacs path.
As was indicated in the comments, you should open a bug report if the problem is indeed in Vim.
But first you should try the following:
Ensure you are using a version with the latest patches; there were some reports similar to the problem you are describing, and there is a chance that it has already been solved.
Check whether any setting or plugin, beyond :syntax, is triggering the problem. The procedure at Vim-FAQ 2.5 can be helpful. Some relevant parts follow:
2.5. I have a "xyz" (some) problem with Vim. How do I determine it is a
problem with my setup or with Vim? / Have I found a bug in Vim?
First, you need to find out, whether the error is in the actual
runtime files or any plugin that is distributed with Vim or whether it
is a simple side effect of any configuration option from your .vimrc
or .gvimrc. So first, start vim like this:
vim -u NONE -U NONE -N -i NONE
this starts Vim in nocompatible mode (-N), without reading your
viminfo file (-i NONE), without reading any configuration file (-u
NONE for not reading .vimrc file and -U NONE for not reading a .gvimrc
file) or even plugin.
If the error does not occur when starting Vim this way, then the
problem is either related to some plugin of yours or some setting in
one of your local setup files. You need to find out, what triggers the
error, you try starting Vim this way:
vim -u NONE -U NONE -N
If the error occurs, the problem is your .viminfo file. Simply delete
the viminfo file then. If the error does not occur, try:
vim -u ~/.vimrc --noplugin -N -i NONE
This will simply use your .vimrc as configuration file, but not load
any plugins. If the error occurs this time, the error is possibly
caused by some configuration option inside your .vimrc file. Depending
on the length of your vimrc file, it can be quite hard to trace the
origin within that file.
The best way is to add :finish command in the middle of your .vimrc.
Then restart again using the same command line. If the error still
occurs, the bug must be caused because of a setting in the first half
of your .vimrc. If it doesn't happen, the problematic setting must be
in the second half of your .vimrc. So move the :finish command to the
middle of that half, of which you know that triggers the error and
move your way along, until you find the problematic option. If your
.vimrc is 350 lines long, you need at a maximum 9 tries to find the
offending line (in practise, this can often be further reduced, since
often lines depend on each other).
If the problem does not occur, when only loading your .vimrc file, the
error must be caused by a plugin or another runtime file (indent
autoload or syntax script). Check the output of the :scriptnames
command to see what files have been loaded and for each one try to
disable each one by one and see which one triggers the bug. Often
files that are loaded by vim, have a simple configuration variable to
disable them, but you need to check inside each file separately.
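The "at a maximum 9 tries" figure in the quoted FAQ is just binary search over the lines of the .vimrc. A minimal sketch in Python, assuming exactly one offending line and an error that shows up whenever that line is loaded. find_offending_line and triggers_error are hypothetical names; triggers_error(k) stands in for "restart Vim with :finish placed after line k and see whether the error still occurs":

```python
import math


def find_offending_line(n_lines, triggers_error):
    """Bisect lines 1..n_lines and return (offending_line, tries).

    triggers_error(k) must return True when loading only lines 1..k
    still reproduces the error.
    """
    lo, hi = 1, n_lines  # the offending line is somewhere in [lo, hi]
    tries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tries += 1
        if triggers_error(mid):
            hi = mid       # error still occurs: culprit is in the first half
        else:
            lo = mid + 1   # error gone: culprit is in the second half
    return lo, tries


# Upper bound on the number of restarts for a 350-line file.
max_tries_for_350 = math.ceil(math.log2(350))
```

Each restart halves the range that can still contain the culprit, which is why the number of tries grows only with the logarithm of the file length.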
If the previous steps don't solve the problem, you could check similar bug reports and perhaps try some of the patches that haven't been merged yet:
long line with syntax highlighting crashes vim w/ 100% CPU
Segfault on 7.4 caused by syntax :on with Ruby file
vim_dev search

When was a file used by another program

I have a folder with a series of 750 CSV files. I wrote a Stata program that runs through each of these files and performs a single task. The files are quite big, and my program has been running for more than 12 hours.
Is there a way to know which of the CSV files was the last one used by Stata? I would like to know if the code is anywhere near finishing. I thought that sorting the 750 files by "last used" would do the trick, but it does not.
Next time I should be more careful about signalling how the process is going...
Thank you
From the OS X terminal, cd to the directory containing the CSV files, and run the command
ls -lut | head
which should show your files sorted by most recent access time, limited to the first ten lines of output. (The lowercase -u selects access time; on OS X, uppercase -U selects creation time instead.)
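The same access-time check can be sketched in Python, in case a script is more convenient than ls. The function name most_recently_accessed is made up for illustration. It reads st_atime, so it has the same limitation as ls: some file systems are mounted so that access times are not updated on every read, in which case the ordering will not reflect what Stata actually touched.

```python
import os


def most_recently_accessed(directory, limit=10, suffix=".csv"):
    """Return up to `limit` file names in `directory`, most recently accessed first."""
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            if entry.is_file() and entry.name.lower().endswith(suffix):
                # st_atime is the last access time reported by the file system.
                entries.append((entry.stat().st_atime, entry.name))
    entries.sort(reverse=True)
    return [name for _, name in entries[:limit]]
```

Calling most_recently_accessed(".") from the folder with the 750 files would list the candidates for "the file Stata is working on right now" at the top.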
On the most basic level you can use display and log your session:
clear
set more off
local myfiles file1 file2 file3
// loop over the names in the local macro and report progress for each one
foreach f of local myfiles {
    display "processing `f'"
    // <do-something-with-file>
}
See also help log.

How to Copy/Paste large number of lines of code in a terminal

I am working on the MySQL source. I need to copy a file with 12,000 lines of code in one go and paste it into another terminal or text document. How can I perform this task?
Use the cat command to print the file to the terminal, which lets you select all of the text and paste it into another terminal wherever you need it.

Modify text file in VMDK

I need to modify a text file in the file system of a VMDK file for a deployable OVA file. I have seen that it is possible to find the string (a test string in the text file needing modification) and replace it in the VMDK using sed, however whenever I try to replace it, it seems to work, but then nothing actually changes in the file. Here is what I have tried:
sed -i 's/oldstring/newstring/g' file.vmdk
The sha1sum before and after are identical, telling me the file has not been modified. Something gets goofed up, though, because I can no longer start the VM from that VMDK (it's OK, it's a test).
Does anyone have any programmatic suggestions here? I will be doing all of this in bash. If no one can suggest anything, I will move this question to Server Fault or Super User and ask if anyone knows how to mount the VMDK (without VMware-tools!!!) and then edit the file directly with sed or by some other means.
Perhaps there is some way of doing a binary search or something for the string to replace.
