After opening a text file larger than 2.5 MB, DataGrip opens the file in read-only mode. If I then edit the text file as a table and export the table with "dump data to file", it only writes the first 2.5 MB to the file; the rest of the file is never processed.
How do I make DataGrip export the entire file, instead of just the first 2.5 MB?
I already tried increasing the file size limit in the config, but if I go past 100 MB it requires more than 8 GB of RAM to continue.
I am sorry, but there is no workaround for now. Can you please describe the whole task? Perhaps I can help you.
Related
Create a stored procedure that will read a .csv file from an Oracle server path using a file-read operation, query the data in some table X, and write the output to a .csv file.
After reading the .csv file, I need to compare the .csv data with the table data and update a few columns in the .csv file.
Oracle works best with data in the database. UPDATE is one of the most frequently used commands.
But modifying a file that resides in some directory seems somewhat out of scope. There are other programming languages you should use, I believe. However, if a hammer is the only tool you have, every problem looks like a nail.
I can think of two options.
One is to load the file into the database. Use SQL*Loader to do that if the file resides on your PC, or - if you have access to the database server and the DBA granted you read/write privileges on a directory (an Oracle object which points to a filesystem directory) - use it as an external table. Once you load the data, modify it and export it back (i.e. create a new CSV file) using spool.
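For the spool part, a minimal SQL*Plus sketch could look like this (the table name x_table, its columns, and the output path are illustrative assumptions, not from the original question):

-- Export the (already loaded and modified) data as a new CSV file.
set heading off
set pagesize 0
set feedback off
set linesize 32767
set trimspool on

spool /tmp/x_table_export.csv

select col1 || ',' || col2 || ',' || col3
from   x_table;

spool off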
Another option is to use the UTL_FILE package. It also requires access to a directory on the database server. Using the A(ppend) option you can add rows to the original file, but I don't think you can edit it, so this option, in the end, finishes like the previous one: by creating a new file (but this time using UTL_FILE).
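A minimal sketch of that approach, assuming a directory object named DATA_DIR with read/write privileges already granted; the table, columns, and file name are illustrative:

-- Writes a new CSV file via UTL_FILE instead of editing the original in place.
-- DATA_DIR, x_table_new.csv, and the columns are assumptions for illustration.
DECLARE
  l_out  UTL_FILE.FILE_TYPE;
BEGIN
  l_out := UTL_FILE.FOPEN('DATA_DIR', 'x_table_new.csv', 'W');

  FOR r IN (SELECT col1, col2, col3 FROM x_table) LOOP
    UTL_FILE.PUT_LINE(l_out, r.col1 || ',' || r.col2 || ',' || r.col3);
  END LOOP;

  UTL_FILE.FCLOSE(l_out);
END;
/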
Conclusion? Don't use a database management system to modify files. Use another tool.
I want to query a .gz file which I had imported into a Hive table, but when I use queries that require a MapReduce job, for example:
select count(*) from test;
it shows the errors below:
java.io.IOException: incorrect header check
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:228)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:111)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)
I checked and found that ZLIB is the default compression codec.
I tried with a bzip file and it was OK.
But how can I use a .gz file?
How can I change the default codec so that it supports gz files?
I had a similar problem. In my case the issue was that the files in the folder were of different formats: a few were CSV and others were Parquet. Once I kept a single file format, the issue was resolved.
I faced the same error: although I could read the first few records, counting the number of records failed with the same error.
I solved the problem just by renaming my plain (un-compressed) file to .txt. Previously the file had a different extension; I renamed it to .txt. Also, if you un-compress the file, you can read data from it.
And if you want to test it, run a count of the number of records as explained above; it will do a complete scan, which will tell you exactly whether the data is loaded correctly or not.
I posted this solution at one other place as well.
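For reference, a hedged HiveQL sketch (table, columns, and path are made up for illustration): Hive reads genuinely gzip-compressed files in a plain TEXTFILE table based on the .gz extension, so no codec change is needed as long as the extension matches the actual compression:

-- STORED AS TEXTFILE is the important part; names and path are illustrative.
CREATE TABLE test (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Load a file that really is gzip-compressed and keeps its .gz extension.
LOAD DATA LOCAL INPATH '/tmp/data.csv.gz' INTO TABLE test;

-- Forces a full MapReduce scan, which verifies that decompression works.
SELECT COUNT(*) FROM test;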
When trying to load a CSV file into an Oracle table through ODI, ODI is not able to fetch the data from the CSV file. The CSV file format is the issue here, with all the data on a single line. But when we open the CSV file in Excel and then save it as CSV, the format changes, the data gets arranged properly, and then we are able to import it through ODI.
The problem is that we need to import the original CSV file, whatever format it is in. Is there a possibility of doing this?
SQL*Loader is the first thing that comes to my mind. I use it a lot.
SQL Developer will be a better option if you don't want to work with command-line utilities.
Try using external tables: you can configure how the CSV should be read in the external table definition.
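As a rough sketch of that configuration (the directory object DATA_DIR, the file name, and the columns are assumptions for illustration):

-- External table reading the original CSV as-is; adjust the ACCESS PARAMETERS
-- to match the actual record and field delimiters of the file.
CREATE TABLE csv_ext (
  col1 NUMBER,
  col2 VARCHAR2(100),
  col3 VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    OPTIONALLY ENCLOSED BY '"'
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('original.csv')
)
REJECT LIMIT UNLIMITED;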
I'm reading a JSON file and I wish to make some changes to it. After modification I would like to overwrite the same JSON file. When I do that, MapReduce throws a "FileAlreadyExists" exception. Please give me a solution for overwriting the same file. I'm not interested in deleting the file and creating a new one; I just want to overwrite it.
HDFS does not allow writes into existing files. You have to delete the files first and re-write them. In-place updates to files are not supported in HDFS; HDFS was designed to provide high read throughput on existing data. So the feature you are expecting is not available in HDFS.
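If the JSON data happens to be managed through Hive (an assumption on my part; the table and directory names below are illustrative), the usual pattern is therefore to rewrite the output rather than update it in place, for example:

-- Rewrites the contents of the target HDFS directory with the modified records;
-- json_staging is an assumed table holding the modified data.
INSERT OVERWRITE DIRECTORY '/data/json/output'
SELECT *
FROM json_staging;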
I am getting some errors loading my files into BigSheets, both directly from HDFS (files that are the output of Pig scripts) and also raw data that is lying on the local hard disk.
I have observed that whenever I load the files and issue a row count to see if all the data has been loaded into BigSheets, I see a smaller number of rows being loaded.
I have checked that the files are consistent and have proper delimiters (tab- or comma-separated fields).
The size of my file is around 2 GB and I have used either the *.csv or *.tsv format.
Also, in some cases when I have tried to load a file from Windows directly, the file sometimes loads successfully with the row count matching the actual number of lines in the data, and sometimes with a lower row count.
Sometimes, even when a fresh file is used for the first time, it gives the correct result, but if I do the same operation the next time, some rows are missing.
Kindly share your experience with BigSheets and any solutions to such problems where the entire data is not being loaded. Thanks in advance.
The data that you originally load into BigSheets is only a subset. You have to run the sheet to apply it to the full dataset.
http://www-01.ibm.com/support/knowledgecenter/SSPT3X_3.0.0/com.ibm.swg.im.infosphere.biginsights.analyze.doc/doc/t0057547.html?lang=en