Ruby - updating large amounts of data in a Postgres DB on Heroku

I have a Ruby application with a Postgres DB on Heroku.
Some of the data was migrated incorrectly.
I want to update the data in about 5,000 rows. I do not want to blow the DB away and re-migrate.
What would be the best way to do this?
I have updated small amounts of data using ActiveRecord-style SQL, but I am not sure how to approach a larger amount.
Thanks in advance,
Maggs

It really depends on how many fields you are changing and what type of data you are specifically trying to change.
Does each row need its own specific value, or are you able to run a loop that iterates through each row and updates the value? Please be more specific about the data and the kind of editing you are trying to do.
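For what it's worth, if the fix can be expressed as a rule (say, correcting one mangled column), a common ActiveRecord pattern for a few thousand rows is to walk them in batches from a one-off script or rake task (e.g. run with heroku run). This is only a sketch: the Contact model, the LIKE filter, and the whitespace correction are made-up stand-ins for the real data.

    # One-off script, e.g. `heroku run rails runner fix_data.rb`.
    # "Contact", the filter and the correction are hypothetical placeholders.
    Contact.where("name LIKE ?", "% ").find_in_batches(batch_size: 500) do |batch|
      Contact.transaction do
        batch.each do |contact|
          contact.update!(name: contact.name.strip)  # apply the per-row fix
        end
      end
    end

    # If every affected row needs the same value, one SQL UPDATE is far cheaper:
    # Contact.where(region: nil).update_all(region: "unknown")

The batch/transaction combination keeps memory flat and commits once per batch instead of once per row; update_all skips Ruby entirely when no per-row logic is needed.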

Related

Impact on Oracle DB of frequent creation and dropping of tables

I have a use case where we need to store some data for a specific period (let's say 10k rows for 5 minutes), and since the data is huge it can't be held in Java memory. Please help me understand which of the following is the best approach, and why:
1. Create a permanent table and introduce a column that lets me fetch and drop the rows per session.
2. Create a separate temporary table for each session and drop it just after processing.
Thanks in advance.
It sounds like you just need to make use of GTTs (global temporary tables); please check out the documentation linked below. You don't need to worry about truncating or dropping the table, as the data is only kept for as long as the session lasts.
Documentation
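For illustration, a GTT is created once as ordinary DDL, and each session then sees only the rows it inserted itself; ON COMMIT PRESERVE ROWS keeps those rows until the session ends (DELETE ROWS would clear them at every commit). The sketch below uses made-up names and issues the DDL from Ruby via the ruby-oci8 gem only to keep a single language on this page; the SQL is the part that matters.

    require 'oci8'  # ruby-oci8 gem; connection details are placeholders

    conn = OCI8.new('app_user', 'secret', '//dbhost/ORCL')

    # Created once, not per session: every session gets its own private rows.
    conn.exec(<<~SQL)
      CREATE GLOBAL TEMPORARY TABLE session_scratch (
        id      NUMBER,
        payload VARCHAR2(4000)
      ) ON COMMIT PRESERVE ROWS
    SQL

    conn.logoff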

Force Oracle to use indexes for queries over a DB link

I use stored procedures on DB instance "A" to store data in a GTT. To get the original data I have to go over a DB link to DB instance "B", so I put the whole query together and send it to the remote instance.
This works fine, but sometimes it seems that Oracle is not choosing the best plan or the correct indexes for the queries. Is there a way to force Oracle to use specific indexes? I tried using hints, but honestly I didn't understand the difference between all the options.
Thanks for helping me!
There is a huge temptation to optimize a query one way when you want it to work another way. Adding hints is a temporary solution which can backfire on you when the amount or type of data in the table changes or when you upgrade to a newer version with a newer optimizer.
First, determine that there is a problem. Are all queries taking too long? Just some? Only the first one?
The easiest thing to do is to make sure the indexes on that table are up to date. Then look at optimizing the query by using the explain plan feature to see what indexes are being used.
It's also prudent to examine your data to see whether the query is selecting different things, or different numbers of records, at different points in time, if the data is time-based.
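As a concrete starting point for the explain-plan step, something like the sketch below (object names are made up; it is driven from Ruby via ruby-oci8 only to keep a single language on this page) shows which access paths and indexes the optimizer actually chose for the DB-link query:

    require 'oci8'

    conn = OCI8.new('app_user', 'secret', '//dbhost/ORCL')

    # Ask Oracle to plan the remote query without executing it.
    conn.exec("EXPLAIN PLAN FOR " \
              "SELECT * FROM remote_table@dblink_b WHERE some_col = 42")

    # Read the plan back; the OPERATION and OBJECT_NAME columns show whether
    # an index was used, and which one.
    cursor = conn.exec("SELECT plan_table_output FROM TABLE(DBMS_XPLAN.DISPLAY)")
    while row = cursor.fetch
      puts row[0]
    end
    conn.logoff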

MarkLogic 8 - Reporting and Aggregation from Large Collection

Say I have a collection with 100 million records/documents in it.
I want to create a series of reports that involve summing of values in certain columns and grouping by various columns.
What references for XQuery and/or MarkLogic can anyone point me to that will allow me to do this quickly?
I saw cts:avg-aggregate, which looks fine, but then I need to group as well.
Also, since I am dealing with a large amount of data and it will take some time to go through it all, I am thinking about setting this up as a job that runs at night to update the report.
I thought of using CoRB to run through the records and then do something with the output. Is this the right approach to reporting with MarkLogic?
Perhaps this guide would help:
http://developer.marklogic.com/blog/group-by-the-marklogic-way
You have several options, which are discussed in the post linked above:
xdmp:estimate
cts:element-value-co-occurrences
cts:value-tuples + cts:frequency
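As a rough sketch of the cts:value-tuples + cts:frequency option: the XQuery below returns every distinct (region, amount) pair with its frequency, from which a per-region sum can be computed (amount × frequency, added up per region). The element names, collection, port, and credentials are invented, both elements would need element range indexes, and the Ruby wrapper simply posts the query to MarkLogic's REST /v1/eval endpoint from a scheduled job.

    require 'net/http'
    require 'uri'

    # Hypothetical elements 'region' and 'amount'; both need element range indexes.
    xquery = <<~XQY
      for $t in cts:value-tuples(
                  (cts:element-reference(xs:QName("region")),
                   cts:element-reference(xs:QName("amount"))),
                  ("item-frequency"),
                  cts:collection-query("sales"))
      return ($t, cts:frequency($t))
    XQY

    uri = URI("http://localhost:8000/v1/eval")
    req = Net::HTTP::Post.new(uri)
    req.set_form_data("xquery" => xquery)
    req.basic_auth("admin", "admin")  # real deployments typically use digest auth
    res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
    puts res.body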

Database design for an enterprise-level Oracle application - normalization and duplication

I am developing an enterprise application with an Oracle backend. I am designing a core part of the DB architecture now and I have some questions about it.
First and most important: most of my tables need to preserve old data. For example,
consider a table with the fields
Contract No, Contract Name, Contract Person, Contract Email
I have a record like
12, xxx, yyy, xxx@zzz.ccc
and someone modifies it to
12, xxx, zzz, xxx@zzz.ccc
At any point in time I need to display the new record while still having a copy of the old record.
So what I thought was to insert a duplicate record with the old data, update the fields that changed, and keep track of active records with a flag such as "is_active" = 1.
The downside is that this creates redundancy in the table and seems like a bad design, but any other model seems unnecessarily complex and this one seems cleaner to me. Also, I don't see any performance issues with having duplicate records. So please let me know if this is OK, or whether I am missing something here.
Sometimes, where there is a one-to-many relationship, my assumption is to have a mapping table where I map the multiple entities as individual records by repeating the master ID and changing the child ID in each record. Is this the right way to do it, or is there a better way?
Is there a book on database best practices?
Is normalization completely mandatory?
Thanks.
The database I'm dealing with is Oracle 11g on a two-node RAC cluster.
Also, I don't see any performance issues with having duplicate records.
Assume you have a row that, over time, has 15 updates to it. If you don't store any temporal data (if you don't store different versions of the row), you end up storing one row. If you do store temporal data, you end up storing 15 rows.
You also need more indexes, because the id number is no longer sufficient to identify a single row.
If you have only relatively small tables, you probably won't see any performance difference. (There will be one, but it probably won't be noticeable to users.) But a table that has 10 million rows will perform differently than a table that has 150 million rows. (15 versions per row, times 10 million rows.)
Sometimes, where there is a one-to-many relationship, my assumption is to have a mapping table where I map the multiple entities as individual records by repeating the master ID and changing the child ID in each record. Is this the right way to do it, or is there a better way?
You probably need to know which child rows belong to which parent rows. So you need more than a single master id for the key. The master id alone doesn't tell you which version of that row in the parent table applies to a given child row.
Is there a book on database best practices?
There are books on temporal databases. The first one that I know of is Snodgrass's Developing Time-Oriented Database Applications in SQL. It's available in several formats, and it's free. It's also kind of old, but the information in it is important to understand if you're going to be building a temporal database. Also, think about reading Date's book Temporal Data and the Relational Model.
Wikipedia has an article that summarizes the ideas behind temporal databases.
Is normalization completely mandatory?
That's a meaningless question. You will have different issues with tables normalized to 2NF than you'll have with tables normalized to 5NF or 6NF.
I would keep the old/history records in a separate table. Create an upd/del trigger to populate your audit/history table for you, and keep only the most current data in your main table.
See here for an example. Many other similar examples exist on SO.
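A minimal sketch of that separate-history-table idea, with made-up table, column, and trigger names (the audit logic is Oracle PL/SQL; it is issued from Ruby via ruby-oci8 only to keep a single language on this page):

    require 'oci8'

    conn = OCI8.new('app_user', 'secret', '//dbhost/ORCL')

    # History table mirrors the live CONTRACT table plus audit columns.
    conn.exec(<<~SQL)
      CREATE TABLE contract_hist (
        contract_no     NUMBER,
        contract_name   VARCHAR2(100),
        contract_person VARCHAR2(100),
        contract_email  VARCHAR2(100),
        changed_at      TIMESTAMP,
        change_type     VARCHAR2(6)
      )
    SQL

    # Before an UPDATE or DELETE touches a row, copy the old values away.
    conn.exec(<<~SQL)
      CREATE OR REPLACE TRIGGER contract_audit
      BEFORE UPDATE OR DELETE ON contract
      FOR EACH ROW
      DECLARE
        v_type VARCHAR2(6);
      BEGIN
        v_type := CASE WHEN UPDATING THEN 'UPDATE' ELSE 'DELETE' END;
        INSERT INTO contract_hist
        VALUES (:old.contract_no, :old.contract_name, :old.contract_person,
                :old.contract_email, SYSTIMESTAMP, v_type);
      END;
    SQL

    conn.logoff

The main table then only ever holds the current version of each row, while the history table collects the old versions on the side.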

Rhomobile inserting into local database using CSV or XML from external web server

I am currently developing a Rhomobile application. I have a backend database which holds customer information. From the web server I get a CSV string (or XML; I am able to parse the XML using REXML) which contains all the customers. Each time I sync the device I am going to reset the customer table on the device and re-insert all the data from the backend database. I am not using RhoSync, and the device will be using property bag.
Is it possible to use the CSV or XML data to insert into the customers table? If so, how would I go about it?
At the moment the only option I can see that would work is to manually loop through the CSV/XML and insert the rows into the database; this isn't very elegant.
Any help will be much appreciated, sorry if this is a dumb question; still relatively new to this framework.
I have come to the conclusion that the only way is to loop through the CSV/XML; with the help of a database transaction this doesn't take long.
Using a fixed schema also increases performance a lot, as property bag has to do per-column inserts (so if you have lots of columns, there are lots of inserts per record).
Also, in Rhomobile garbage collection is turned off, so if you are trying to process large data sets your device will quickly run out of memory. Enabling it solves this issue:
GC.enable
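To make the loop-plus-transaction approach above concrete, here is a rough sketch; the Customer model, its column names, and the CSV column order are assumptions, while the transaction calls are Rhom's documented API:

    # Reset the local table, then re-insert every customer from the CSV string
    # inside a single Rhom transaction.
    def import_customers(csv_string)
      db = ::Rho::RHO.get_src_db('Customer')
      db.start_transaction
      begin
        Customer.delete_all
        csv_string.each_line do |line|
          # Assumed column order: id, name, email.  A real CSV parser is safer
          # if fields can themselves contain commas.
          id, name, email = line.strip.split(',')
          Customer.create('customer_id' => id, 'name' => name, 'email' => email)
        end
        db.commit
      rescue Exception => e
        db.rollback
        raise e
      end
    end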
