So I'm aware that Snowflake doesn't really have an Oracle BLOB equivalent, but I'm just curious: how are others out there addressing the need to have BLOB data from Oracle in their data warehouse? Specifically where the general 16 MB limit on VARCHAR and 8 MB limit on BINARY is not enough.
These are some examples I have come across for "specifically where the general 16 MB limit on VARCHAR and 8 MB limit on BINARY is not enough":
Storing more than 16 MB of data in Snowflake – a VARIANT holds 16 MB of compressed data
How To Load Data Into Snowflake – Snowflake Data Load Best Practices
- Using a Snowflake stage is a great way to plan the upload (a sketch of that approach follows below)
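One pattern I've seen for this (sketched below, not official Snowflake guidance) is to keep the oversized BLOBs out of table columns entirely: upload each BLOB as a file to a Snowflake stage and store only a stage path plus metadata in a regular table. The sketch assumes the python-oracledb and snowflake-connector-python packages; the table, stage, and column names (source_docs, doc_refs, @blob_stage) are hypothetical.

```python
# Sketch: keep oversized Oracle BLOBs out of Snowflake columns entirely.
# Pull each BLOB, PUT the file to an internal stage, and store only a
# stage path + metadata row in Snowflake. Table, stage, and column names
# (source_docs, doc_refs, @blob_stage) are hypothetical.
import os
import oracledb
import snowflake.connector

ora = oracledb.connect(user="scott", password="tiger", dsn="orahost/orcl")
sf = snowflake.connector.connect(account="my_account", user="me",
                                 password="secret", database="DW",
                                 schema="RAW", warehouse="LOAD_WH")

with ora.cursor() as ocur, sf.cursor() as scur:
    ocur.execute("SELECT doc_id, payload FROM source_docs")
    for doc_id, blob in ocur:
        data = blob.read()                      # may well exceed 8 MB / 16 MB
        local_path = f"/tmp/doc_{doc_id}.bin"
        with open(local_path, "wb") as fh:
            fh.write(data)

        # PUT compresses and uploads the file to the named internal stage.
        scur.execute(f"PUT file://{local_path} @blob_stage AUTO_COMPRESS=TRUE")

        # Only a lightweight reference lands in a Snowflake table.
        scur.execute(
            "INSERT INTO doc_refs (doc_id, stage_path, byte_size) "
            "VALUES (%s, %s, %s)",
            (doc_id, f"@blob_stage/doc_{doc_id}.bin.gz", len(data)),
        )
        os.remove(local_path)
```

The trade-off is that the BLOB content itself isn't queryable in Snowflake; consumers fetch it from the stage (or from external cloud storage) using the stored path.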
Related
In BigQuery I have a table storing 237 GB of data. I don't have any columns on which I can create a partition, as it does not store any date fields.
When I use it in a query, the processing says 77 GB of data will be processed, but in bytes shuffled I see 7 GB of data.
What is the actual amount of data processed here?
Is there any way I could restructure this table?
BigQuery operates column-wise. If you only choose the columns you really need in a query then you're optimizing cost already. Traditionally databases operate row-wise, so this can be a bit counter-intuitive.
There's also this great blog article on optimizing for costs.
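To make that concrete, here's a minimal sketch (assuming the google-cloud-bigquery client and a hypothetical table name) that dry-runs the same query with and without SELECT *, showing that the bytes billed track the columns you reference rather than the full 237 GB table; the 7 GB of bytes shuffled is a separate execution statistic, not the billing figure.

```python
# Sketch: use dry runs to see how column selection drives BigQuery cost.
# The 77 GB estimate is the size of the referenced columns, not the whole
# 237 GB table. The table name (my_project.my_dataset.big_table) is hypothetical.
from google.cloud import bigquery

client = bigquery.Client()
cfg = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

for sql in (
    "SELECT * FROM `my_project.my_dataset.big_table`",
    "SELECT col_a, col_b FROM `my_project.my_dataset.big_table`",
):
    job = client.query(sql, job_config=cfg)
    print(f"{job.total_bytes_processed / 1e9:8.1f} GB would be billed for: {sql}")
```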
Do I need to be able to fit my entire database in memory to use Oracle's Database In-Memory?
No, you can selectively declare a subset of your database to be in-memory. Since Database In-Memory is targeted at analytic workloads it populates selected objects into an in-memory area in columnar format. This allows analytic queries to scan the columnar data much faster than in the row format.
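For illustration, here's a minimal sketch of declaring only a subset of tables in-memory, using the python-oracledb driver; the table names are hypothetical, while the INMEMORY / NO INMEMORY clauses and the V$IM_SEGMENTS view are standard Database In-Memory features.

```python
# Sketch: selectively populate only the hot analytic tables into the
# In-Memory column store, leaving everything else on disk in row format.
# Table names (SALES, ARCHIVE_ORDERS) are hypothetical.
import oracledb

conn = oracledb.connect(user="admin", password="secret", dsn="orahost/orclpdb1")
with conn.cursor() as cur:
    # Only this table is declared in-memory (columnar, compressed for queries).
    cur.execute("ALTER TABLE sales INMEMORY MEMCOMPRESS FOR QUERY LOW PRIORITY HIGH")
    # Cold data stays out of the in-memory area.
    cur.execute("ALTER TABLE archive_orders NO INMEMORY")

    # Check what has actually been populated into the in-memory area.
    cur.execute("SELECT segment_name, bytes_not_populated FROM v$im_segments")
    for name, not_populated in cur:
        print(name, "fully populated" if not_populated == 0 else "still populating")
```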
Bulk data insertion into Greenplum from Oracle through JDBC, with plain-text data: the write speed is very slow, about 200 rows per second. Is there any good solution?
When I insert data from Oracle into HDFS with the same configuration, I get about 20,000 rows per second.
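One common remedy for this pattern (just a sketch, not from the thread above) is to stop issuing one INSERT per row and stream the batch through COPY instead, which Greenplum accepts because it speaks the PostgreSQL protocol. The sketch below uses Python with python-oracledb and psycopg2 rather than JDBC, purely to illustrate the idea; connection details and table names are hypothetical.

```python
# Sketch: row-by-row INSERTs into Greenplum are typically the bottleneck;
# streaming the rows through COPY (or large batched inserts) is usually
# orders of magnitude faster. Connection details and table names are hypothetical.
import csv
import io
import oracledb
import psycopg2

ora = oracledb.connect(user="scott", password="tiger", dsn="orahost/orcl")
gp = psycopg2.connect(host="gp-master", dbname="dw", user="loader", password="secret")

with ora.cursor() as ocur, gp.cursor() as gcur:
    ocur.execute("SELECT id, name, amount FROM source_table")
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in ocur:
        writer.writerow(row)

    buf.seek(0)
    # COPY streams the whole batch in one round trip instead of one INSERT per row.
    gcur.copy_expert("COPY target_table (id, name, amount) FROM STDIN WITH CSV", buf)
    gp.commit()
```

For very large tables the same idea applies in chunks, so the buffer never holds more than a few hundred thousand rows at a time.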
During my first tests with BigQuery, I've observed that the size of a table imported in BigQuery is much bigger than its original representation in Hadoop.
Here are the numbers that I get:
ORC original Hadoop table: 2GB
Avro compressed representation to load the data into BigQuery: 6.4 GB
(test: Avro uncompressed: 45.8 GB)
size in BigQuery (Capacitor format): 47.1 GB
This table has 11 million rows, with 366 columns (most of them being "strings").
Is this normal behavior for BigQuery? I thought that Capacitor optimized the data in a very efficient way.
Is there any way to see the internal structures of my data in BigQuery, to understand what is going wrong and what causes so much space to be used?
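A likely explanation is that the size BigQuery reports is the logical (uncompressed) size derived from the data types, not the compressed Capacitor bytes on disk, so 47 GB for 11 million rows across 366 mostly-string columns is not necessarily a problem. One way to see where the volume comes from is to dry-run a SELECT of each column and compare how many bytes each would scan; the sketch below assumes the google-cloud-bigquery client and a hypothetical table id.

```python
# Sketch: break the reported table size down by column using per-column dry runs.
# The table id (my_project.my_dataset.wide_table) is hypothetical.
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my_project.my_dataset.wide_table"
table = client.get_table(table_id)
print(f"rows={table.num_rows}, logical bytes={table.num_bytes / 1e9:.1f} GB")

cfg = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
sizes = []
for field in table.schema:
    job = client.query(f"SELECT `{field.name}` FROM `{table_id}`", job_config=cfg)
    sizes.append((job.total_bytes_processed, field.name, field.field_type))

# Print the ten widest columns; wide STRING columns usually dominate.
for nbytes, name, ftype in sorted(sizes, reverse=True)[:10]:
    print(f"{nbytes / 1e9:6.2f} GB  {name} ({ftype})")
```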
I am trying to migrate data from a SQL database to HBase with Hadoop. The problem is that my database is 70 GB in SQL, but it takes around 400 GB once I have transferred it to Hadoop. Why is that, and is there any way to reduce the space used?
Also, how much disk space would be required if my SQL database held 800 GB of data?
After a lot of searching, I found that I was storing my data in Hadoop's default format, i.e. text format, which consumes far more space than other storage formats. Also, Manjunath is correct: reducing the replication factor might reduce the storage space, but it can cause problems as well. For further information on this topic, please refer to the link below:
http://datametica.com/rcorc-file-format/
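To illustrate the point about storage formats, here is a minimal sketch (paths and schema are hypothetical, and it assumes PySpark is available on the cluster) that rewrites text data already in HDFS as compressed ORC; plain text combined with HDFS's default 3x replication is usually what turns 70 GB of relational data into several hundred GB.

```python
# Sketch: converting ingested text/CSV data to a columnar, compressed format
# such as ORC usually shrinks it dramatically compared with plain text.
# Paths and the delimiter are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("text-to-orc").getOrCreate()

# Raw rows as they landed in HDFS, stored as delimited text.
df = spark.read.csv("hdfs:///data/imported/customers_text", sep="\t", header=True)

# Rewrite as compressed ORC; the same rows typically take a fraction of the space.
df.write.mode("overwrite").orc("hdfs:///data/warehouse/customers_orc", compression="zlib")

spark.stop()
```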