Convert MySQL Dump File from INSERT to INSERT IGNORE

I have a huge dump file, about 40 GB in size, and I need to load it back into the database since some records are missing after a recovery.
Is there any easy way I can convert the INSERT statements into INSERT IGNORE in the dump file to avoid duplicate-entry errors? Loading the file into a text editor seems like a no-go to me.
Thank you very much in advance.

There is also a switch for mysqldump: --insert-ignore makes it write INSERT IGNORE statements instead of plain INSERT statements when the dump is generated.
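If the dump still has to be regenerated, a minimal sketch of that switch in use (the database name and credentials below are placeholders, not from the original question):

# --insert-ignore makes mysqldump emit INSERT IGNORE statements instead of plain INSERTs.
# mydb and the login are placeholders for your own values.
mysqldump -u root -p --insert-ignore mydb > dump_with_ignore.sql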

If you're using a Unix-like operating system, you can use sed:
sed 's/^INSERT/INSERT IGNORE/' file.sql > updated.sql

Use a text processing application like sed from the command line to do the search-replace on your imports.
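For a 40 GB dump it may also be worth streaming the rewrite straight into the mysql client, so a second 40 GB copy never lands on disk; a rough sketch, with mydb and the login as placeholders:

# Rewrite INSERT to INSERT IGNORE on the fly and feed the result directly to mysql.
sed 's/^INSERT/INSERT IGNORE/' dump.sql | mysql -u root -p mydb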

Related

Copy large datasets from Hive to local directory

I'm trying to copy data from a Hive table to my local directory.
The code that I am using is:
nohup hive -e "set hive.cli.print.header=true; set hive.resultset.use.unique.column.names=false; select * from sample_table;" | sed 's/[\t]/|/g' > /home/sample.txt &
The issue is the file will be around 400 GB and the process takes forever to complete.
Is there any better way to do it, like compressing the file as it is being generated?
I need to have the data as a .txt file, but I'm not able to find a quick workaround for this problem.
Any smart ideas would be really helpful.
Have you tried doing it with the -getmerge option of the hadoop command? That's typically what I use to merge Hive text tables and export them to a local share drive.
hadoop fs -getmerge ${SOURCE_DIR}/table_name ${DEST_DIR}/table_name.txt
I think the sed command would also be slowing things down significantly. If you do the character replacement in Hive prior to extracting the data, that would be faster than a single-threaded sed command running on your edge node.
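For example, Hive itself can write pipe-delimited text with a plain INSERT OVERWRITE, so no sed pass is needed afterwards; a hedged sketch, where /tmp/sample_export is a placeholder output directory (note that, unlike hive.cli.print.header, this does not emit a header row):

# Let Hive write pipe-delimited text in parallel instead of piping 400 GB through sed.
hive -e "
  INSERT OVERWRITE LOCAL DIRECTORY '/tmp/sample_export'
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
  SELECT * FROM sample_table;
"
# The directory will contain one or more plain-text files that can then be concatenated.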

Saving hive queries

I need to know how we can store a query I have written at the command line, just like we do in SQL (we use Ctrl+S in SQL Server).
I have heard that HiveQL queries use the .q or .hql extension. Is there any way I can do the same by saving the list of commands I am executing?
Sure, whatever IDE you use, you can just save your file as myfile.q and then run it from the command line as
hive -f myfile.q
You can also do
hive -f myfile.q > myfileResults.log
if you want to pipe your results into a log file.
Create a new file using the cat command (you can even use an editor). Write all the queries you want to perform inside the file:
$cat > MyQueries.hql
query1
query2
.
.
Ctrl+D
Note: the .hql or .q extension is not necessary. It is just for our reference, to identify that the file is a Hive query file.
You can execute all the queries inside the file in one go using
$hive -f MyQueries.hql
You can use Hue or the Hive web interface to access Hive instead of the terminal. It provides a UI from which you can write and execute queries, and it solves the copy problem too.
http://gethue.com/
https://cwiki.apache.org/confluence/display/Hive/HiveWebInterface

expdp dump file is big

I am trying to make a dump of an Oracle database using the expdp tool. I added the exclude=statistics option to the command line to make the resulting dmp file smaller, but the file is still very big even with this setting. Is there some other setting that can be used to make the dmp file smaller? The database is almost empty and the dmp file is around 230 MB. Thank you.
Split into multiple dump files
expdp usr1/usr1 tables=tbl_test directory=dp_dir dumpfile=test_dump_%u.dmp filesize=20m
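For reference, the exclude=statistics from the question and the filesize split can be combined in a single run; a sketch reusing the same example names as above:

# Skip statistics and cap each piece at 20 MB; %u expands to a sequence number
# (test_dump_01.dmp, test_dump_02.dmp, ...).
expdp usr1/usr1 tables=tbl_test directory=dp_dir exclude=statistics dumpfile=test_dump_%u.dmp filesize=20m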

Strange issue running HiveQL using the -e option from a .sh file

I have checked Stack Overflow but could not find any help, which is why I am posting a new question.
The issue is related to executing HiveQL with the -e option from a .sh file.
If I run Hive as $ bin/hive, everything works fine and all databases and tables are displayed properly.
If I run Hive as $ ./hive, or $ hive (as set in the PATH variable), or $HIVE_HOME/bin/hive, only the default database is displayed, and without any table information.
I am learning Hive and trying to execute a Hive command using $HIVE_HOME/bin/hive -e from a .sh file, but it always gives "database not found".
So I understand that it is something related to reading the metadata, but I am not able to understand why it behaves this way.
However, hadoop commands work fine from anywhere.
Below is one command I am trying to execute from the .sh file:
$HIVE_HOME/bin/hive -e 'LOAD DATA INPATH hdfs://myhost:8040/user/hduser/sample_table INTO TABLE rajen.sample_table'
Information:
I am using hive-0.13.0 and hadoop-1.2.1.
Can anybody please explain how to solve or work around this issue?
Can you correct the query first? Hive expects the path in a LOAD statement to be enclosed in quotes.
Try this first from the shell: $HIVE_HOME/bin/hive -e "LOAD DATA INPATH '/user/hduser/sample_table' INTO TABLE rajen.sample_table"
Or put your command in a test.hql file and test it with $ hive -f test.hql
--test.hql
LOAD DATA INPATH '/user/hduser/sample_table' INTO TABLE rajen.sample_table
I finally was able to fix the issue.
The issue was that I had kept the default Derby setup for the Hive metastore (metastore_db), so wherever I triggered the hive -e command from, it would create a new local copy of metastore_db.
So I created the metastore in MySQL instead, which made it global, and now the same metastore database is used no matter where I trigger hive -e from.
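For anyone hitting the same thing, a rough sketch of what moving the metastore to MySQL involves; the database name, user and password below are hypothetical, while the javax.jdo.option.* names are the standard Hive settings:

# 1. Create a MySQL database and user for the metastore (names/credentials are placeholders).
mysql -u root -p <<'SQL'
CREATE DATABASE metastore;
CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hivepass';
GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'localhost';
SQL
# 2. Point Hive at it in $HIVE_HOME/conf/hive-site.xml:
#      javax.jdo.option.ConnectionURL        = jdbc:mysql://localhost/metastore
#      javax.jdo.option.ConnectionDriverName = com.mysql.jdbc.Driver
#      javax.jdo.option.ConnectionUserName   = hive
#      javax.jdo.option.ConnectionPassword   = hivepass
#    (and drop the MySQL JDBC driver jar into $HIVE_HOME/lib)
# 3. Initialise the schema; schematool ships with Hive 0.12+.
$HIVE_HOME/bin/schematool -dbType mysql -initSchema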

Saving entire file in VIM

I have a very large CSV file, over 2.5GB, that, when importing into SQL Server 2005, gives an error message "Column delimiter not found" on a specific line (82,449).
The issue is with double quotes within the text for that column; in this instance, it's a note field where someone wrote "Transferred money to ""MIKE"", Thnks".
Because the file is so large, I can't open it up in Notepad++ and make the change, which brought me to find VIM.
I am very new to VIM. I reviewed the tutorial document, which taught me how to change the file by using 82449G to jump to the line, l to move over to the spot, and x to delete the double quotes.
When I save the file using :saveas c:\Test VIM\Test.csv, only a portion of the file seems to be saved. The original file is 2.6 GB and the newly saved one is 1.1 GB; the original has 9,389,222 rows and the saved one has 3,751,878. I tried using the G command to jump to the bottom of the file before saving, which increased the size quite a bit but still didn't save the whole file; before using G, the saved file was only 230 MB.
Any ideas as to why I'm not saving the entire file?
You really need to use a "stream editor", something similar to sed on Linux, that lets you pipe your text through it, without trying to keep the entire file in memory. In sed I'd do something like:
sed 's/""MIKE""/"MIKE"/' < source_file_to_read > cleaned_file_to_write
There is a sed for Windows.
As a second choice, you could use a programming language like Perl, Python or Ruby to process the text line by line: read each line, check it for the doubled quotes, fix the line in question, and write it back out until the file has been completely processed.
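For example, a Perl one-liner run from the shell does exactly that kind of streaming line-by-line rewrite (the file names are placeholders):

# Stream the CSV through Perl, collapsing the doubled quotes as each line passes through.
perl -pe 's/""MIKE""/"MIKE"/g' source_file.csv > cleaned_file.csv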
VIM might be able to load the file if your machine has enough free RAM, but it'll be a slow process. If it does, you can search from normal mode using:
:/""MIKE""/
and manually remove a doubled-quote, or have VIM make the change automatically using:
:%s/""MIKE""/"MIKE"/g
In either case, write, then close, the file using:
:wq
In VIM, normal mode is the default state of the editor, and you can get back to it by pressing the ESC key.
You can also split the file into smaller, more manageable chunks and then combine them back afterwards. Here's a bash script that can split the file into roughly equal parts:
#!/bin/bash
fspec=the_big_file.csv
num_files=10 # how many mini-files you want
total_lines=$(wc -l < "${fspec}")
((lines_per_file = (total_lines + num_files - 1) / num_files))
split --lines=${lines_per_file} "${fspec}" part.
echo "Total Lines = ${total_lines}"
echo "Lines per file = ${lines_per_file}"
wc -l part.*
I just tested it on a 1 GB file with 61151570 lines, and each resulting file was almost 100 MB.
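Once the offending part has been fixed, the pieces can be stitched back together; a small sketch (split names the pieces part.aa, part.ab, ..., so the glob keeps them in order):

# Rebuild the file from the edited pieces and sanity-check the line count.
cat part.* > the_big_file.fixed.csv
wc -l the_big_file.fixed.csv    # should match the original total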
Edit:
I just realized you are on Windows, so the above may not apply. You can use a utility like Simple Text Splitter, a Windows program that does the same thing.
When you're able to open the file without errors like E342: Out of memory!, you should be able to save the complete file, too. There should at least be an error on :w; a partial save without an error would be a severe loss of data and should be reported as a bug, either on the vim_dev mailing list or at http://code.google.com/p/vim/issues/list
Which exact version of Vim are you using? Using GVIM 7.3.600 (32-bit) on Windows 7/x64, I wasn't able to open a 1.9 GB file without running out of memory. I was able to successfully open, edit, and save (fully!) a 3.9 GB file with the 64-bit version 7.3.000 from here. If you're not using that native 64-bit version yet, give it a try.
