Spring Boot application with Postgres: indexes not being used during first use - spring

I have a Spring Boot application that is using a Postgres database. When the application is deployed I need to run a transactional operation that uploads a zip file that is used to populate the database. The application is checking for duplicate rows before inserting them (because users can upload duplicate data that should just be ignored).
The problem I am having is that the first time I upload the file, even thought the indexes are created, they are not being used when checking for the existence of a row. My theory is that this happens because the query plan is deciding not to use the index because it is checking the original statistics, which show that the tables are empty. If I upload a small zip file first, then the problem goes away because the tables now have data.
I have two questions. First, is my theory correct or is there some other reason for this behaviour? Also, if so, is there a way to force Postgres to update the query plan it uses at some predefined interval within the same transaction and can this be done using JPA? Any ideas are appreciated.

Just in case someone runs into this issue, I'll post the solution I found. It appears my theory was correct. The queries will not use the indexes until some statistics are collected. One way to force this is to call ANALYZE after a number of rows have been written to the database. You can do this using a native query like this:
entityManager.createNativeQuery("ANALYZE " + tbl).executeUpdate();
You can wrap this call in a try catch and ignore any exceptions that might occur if you change the database engine. I couldn't find a way of doing this in a database-independent way but this approach works fine and now the initial upload performs as expected.

Related

Duplicate records are inserting into database with spring transaction

We have 10 servers.Some flight related data will come to the servers.From servers that data will come to our application.Means same data can come to our application more than one time,but finally i need to save that data only once in the database.So we are checking in the database before inserting the data.If that record is already not exist in the database then only we are going to save the data.But for some reason we are getting duplicate records in the database.
Is it necessary using synchronization in this scenario.
What might be the problem here.Thanks in advance...
In our company the way we deal with multiple data sources where same piece of information may go through is by utilizing batches.
What we found is by doing this at code level (java and .NET), we would invest a lot of devops time and still have duplications.
By implementing a batching process we stored everything locally and process using 2 batch jobs.
1st will ensure quality of data and remove duplications
2nd will compress and push data to our persistence service (we use XCOM to push straight into a db queue which then plugs the data in).
If you can implement something similar because you have a central point of entry upon which you can implement proper quality gates.
Hope our example helps, if not let me know happy to remove this. :)

Realistic Data Backup method for Parse.com

We are building an iOS app with Parse.com, but still can't figure out the right way to backup data efficiently.
As a premise, we have and will have a LOT of data store rows.
Say we have a class with 1million rows, assume we have it backed up, then want to bring it back to Parse, after a hazardous situation (like data loss on production).
The few solutions we have considered are the following:
1) Use external server for backup
BackUp:
- use the REST API to constantly back up data to a remote MySQL server (we chose MySQL for customized analytics purpose, since it's way faster and easier to handle data with MySQL for us)
ImportBack:
a) - recreate JSON objects from MySQL backup and use the REST API to send back to Parse.
Say we use the batch operation which permits 50 simultaneous objects to be created with 1 query, and assume it takes 1 sec for every query, 1million data sets will take 5.5hours to transfer to Parse.
b) - recreate one JSON file from MySQL backup and use the Dashboard to import data manually.
We just tried with 700,000 records file with this method: it took about 2 hours for the loading indicator to stop and show the number of rows in the left pane, but now it never opens in the right pane (it says "operation time out") and it's over 6hours since the upload started.
So we can't rely on 1.b, and 1.a seems to take too long to recover from a disaster (if we have 10 million records, it'll be like 55 hours = 2.2 days).
Now we are thinking about the following:
2) Constantly replicate data to another app
Create the following in Parse:
- Production App: A
- Replication App: B
So while A is in production, every single query will be duplicated to B (using background job constantly).
The downside is of course that it'll eat up the burst limit of A as it'll simply double the amount of query. So not ideal thinking of scaling up.
What we want is something like AWS RDS which gives an option to automatically backup daily.
I wonder how this could be difficult for Parse since it's based on AWS infra.
Please let me know if you have any idea on this, will be happy to share know-hows.
P.S.:
We’ve noticed an important flaw in the above 2) idea.
If we replicate using REST API, all the objectIds of all Classes will be changed, so every 1to1 or 1toMany relations will be broken.
So we think about putting a uuid for every object class.
Is there any problem about this method?
One thing we want to achieve is
query.include(“ObjectName”)
( or in Obj-C “includeKey”),
but I suppose that won’t be possible if we don’t base our app logic on objectId.
Looking for a work around for this issue;
but will uuid-based management be functional under Parse’s Datastore logic?
Parse has never lost production data. While we don't currently offer automated backups, you can request one any time you like, and we're working on making all of this even nicer. Additionally, it's easier in most cases to import the JSON export file through the data browser rather than using the REST batch.
I can confirm that today, Parse did lost my data. Or at least it appeared to be so.
After several errors where detected on multiple apps (agreed by Parse Status twitter account), we could not retrieve data for an app, without any error.
It was because an entire column of one of our class (type pointer) disappeared and data was not present anymore in the dashboard.
We are using this pointer column to filter / retrieve data, so the returned queries and collections were empty.
So we decided to recreate the column manually. By chance, recreating the column, with the same name and type, solved the issue and the data was still there... I can't explain it but I really thought, and the app reacted as if, data were lost.
So an automated backup and restore option is mandatory, it is not an option.
On December 2015 parse.com released a new dashboard with an improved export feature.
Just select your app, click on "App Settings" -> "General" -> "Export app data". Parse generates a json-file for every class in your app and sends an email to you, if the export-progress is done.
UPDATE:
Sad but true, parse.com is winding down: http://blog.parse.com/announcements/moving-on/
I had the same issue of backing up parse server data. As parse server is using mongodb that is why backing up data is not an issue I have just done a simple thing. downloaded the mongodb backup from the server. And then restored it using
mongorestore /path-to-mongodump (extracted files)
As parse has been turned to open source.Therefore we can adopt this technique.
For accidental deletes, writing a cloud function 'beforedelete' to backup the current row to another class would work.
For regular backups, manual export of changed records (use filter) will be useful. For recovery this requires you to write scripts / use import option (not so sure) in data browser. You could also write a cloud function replicate data on your backup server (haven't tried this yet).
However there are some limitations to cloud code that you should consider before venturing into it:
https://parse.com/docs/cloud_code_guide#functions-resource

Exporting 8million records from Oracle to MongoDB

Now I have an Oracle Database with 8 millions records and I need to move them to MongoDB.
I know how to import some data to MongoDB with JSON file using import command but I want to know that is there a better way to achieve this regarding these issues.
Due to the limit of execution time, how to handle it?
The database is going up every seconds so what's the plan to make sure that every records have been moved.
Due to the limit of execution time, how to handle it?
Don't do it with the JSON export / import. Instead you should write a script that reads the data, transforms into the correct format for MongoDB and then inserts it there.
There are a few reasons for this:
Your tables / collections will not be organized the same way. (If they are, then why are you using MongoDB?)
This will allow you to monitor progress of the operation. In particular you can output to log files every 1000th entry or so to get some progress and be able to recover from failures.
This will test your new MongoDB code.
The database is going up every seconds so what's the plan to make sure that every records have been moved.
There are two strategies here.
Track the entries that are updated and re-run your script on newly updated records until you are caught up.
Write to both databases while you run the script to copy data. Then once you've done the script and everything it up to date, you can cut over to just using MongoDB.
I personally suggest #2, this is the easiest method to manage and test while maintaining up-time. It's still going to be a lot of work, but this will allow the transition to happen.

Rhomobile inserting into local database using CSV or XML from external web server

I am currently developing a Rhomobile application. I have a backend database which holds customer information. I have got from the webserver a csv string (or XML - I am able to parse the XML using REXML) which contains all the customers. Each time I sync the device I am going to reset the customer table on the device and re-insert all data from the backend database. I am not using RhoSync and the device will be using property bag.
Is it possible to use the CSV or XML data to insert into the customers table? If so, how would I go about it?
At the moment the only option I can see that would work would be to manually loop through the CSV/XML and insert into the database manually; this isn't very elegant.
Any help will be much appreciated, sorry if this is a dumb question; still relatively new to this framework.
I have come to the conclusion that the only way is to loop through the csv/xml, which with the help of a database transaction this doesn't take long.
Using fixed schema also increases the performance a lot as property bag has to do column inserts (so if you have lots of columns - there is lots of inserts per record).
Also in Rhomobile garbage collection is turned off, so if you are trying to process large data sets your device will quickly run out of memory:
GC.enable
The above solves this issue

Can DB2 tell a web-app when a table data is updated?

I have a table of non trivial size on a DB2 database that is updated X times a day per user input in another application. This table is also read by my web-app to display some info to another set of users. I have a large number of users on my web app and they need to do lots of fuzzy string lookups with data that is up-to-the-minute accurate. So, I need a server side cache to do my fuzzy logic on and to keep the DB from getting hammered.
So, what's the best option? I would hate to pull the entire table every minute when the data changes so rarely. I could setup a trigger to update a timestamp of a smaller table and poll that to see if I need refresh my cache, but that seems hacky to.
Ideally I would like to have DB2 tell my web-app when something changes, or at least provide a very lightweight mechanism to detect data level changes.
I think if your web application is running in WebSphere, setting up MQ would be a pretty good solution.
You could write triggers that use the MQ Series routines to add things to a queue, and your web app could subscribe to the queue and listen for updates.
If your web app is not in WebSphere then you could still look at this option but it might be more difficult.
A simple solution could be to have a timestamp (somewhere) for the latest change on to table.
The timestamp could be located in a small table/view that is updated by either the application that updates the big table or by an update-trigger on the big table.
The update-triggers only task would be to update the "help"-timestamp with currenttimestamp.
Then the webapp only checks this timestamp.
If the timestamp is newer then what the webapp has then the data is reread from the big table.
A "low-tech"-solution thats fairly non intrusive to the exsisting system.
Hope this solution fits your setup.
Regards
Sigersted
Having the database push a message to your webapp is certainly doable via a variety of mechanisms (like mqseries, etc). Similar and easier is to write a java stored procedure that gets kicked off by the trigger and hands the data to your cache-maintenance interface. But both of these solutions involve a lot of versioning dependencies, etc that could be a real PITA.
Another option might be to reconsider the entire approach. Is it possible that instead of maintaining a cache on your app's side you could perform your text searching on the original table?
But my suggestion is to do as you (and the other poster) mention - and just update a timestamp in a single-row table purposed to do this, then have your web-app poll that table. Similarly you could just push the changed rows to this small table - and have your cache-maintenance program pull from this table. Either of these is very simple to implement - and should be very reliable.

Resources