Lazy Logical Replication with MonetDB - monetdb

I'm trying to set up MonetDB across three machines, one master and two replicas, using lazy logical replication.
For now I'm testing everything on a single machine, using the commands I took from this old issue.
Everything goes according to plan until the first problem: when creating tables or inserting data I get the following errors, which I was not able to find on Google:
Error in optimizer wlc: TypeException:user.main[17]:'wlc.predicate' undefined in: X_0:any := wlc.predicate("alpha":str, "id":str);
Error in optimizer wlc: TypeException:user.main[50]:'wlc.predicate' undefined in: X_0:any := wlc.predicate("beta":str, "id":str);
Error in optimizer wlc: TypeException:user.main[77]:'wlc.depend' undefined in: X_0:any := wlc.depend("beta":str, X_1:lng);
I got around this by setting the optimizer pipeline (optpipe) to minimal_pipe, but I would like to know why this happens so I don't have to do that.
The second problem appears when I try CALL wlr.replicate; it returns:
Perhaps a missing wlr.master() call.
How do I correctly set up replication?
Thanks in advance.

The wlc/wlr features are experimental, de facto deprecated in current releases of MonetDB, and completely removed as of the next major release. Replication in MonetDB is a topic currently under revision. You might be better off filing a feature request on MonetDB's GitHub page.
You might also consider looking into the concepts of replica and remote tables. But those are definitely not solutions by themselves, and if used as such they implement replication at the SQL layer rather than the infrastructure layer.
In the short run, though, I do not expect the open-source community can help you much here. Consider commercial support if that is feasible.
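For reference only, here is a minimal sketch of the call sequence implied by the errors above, assuming the legacy wlc/wlr interface (the procedure names come from the question and its error messages; the database name is a placeholder, and none of this is verified against a current release):
SET optimizer = 'minimal_pipe';      -- the workaround from the question for the wlc errors

-- On the replica: point it at the master database first; the message
-- "Perhaps a missing wlr.master() call" suggests this step was skipped.
CALL wlr.master('master_dbname');    -- placeholder master database name

-- Then ask the replica to replay the master's change log.
CALL wlr.replicate();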

Related

How to build statistics on a procedure's execution in Oracle 12.2?

I have a ~2,300 line package, which is split into many procedures and functions. It is running slower than I would like. Many years ago, on a previous release of Oracle (9i or 11g), I had a similar problem and was able to build a hierarchical structure which contained everything that was executed in the procedure/package and how much time was spent on each item.
I cannot seem to find a tutorial/blog that shows how to accomplish this. It is probably done with the DBMS_STATS package, but I find Oracle's documentation unsuitable for task-oriented problem solving. It may be great if you want to learn everything there is to know about a subject, but generally all I need to know is how to solve the issue I am currently working on.
At any rate can someone point me to how I can get the runtime statistics of a run of an Oracle Procedure?
There are two options:
dbms_profiler - records times for each statement executed (see the docs).
dbms_hprof - similar, but collects statistics on a hierarchy of PL/SQL calls (see the docs).
Either method will require some setup using SYS (DBA) access. See the setup instructions for dbms_profiler.
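As a rough sketch of a dbms_profiler session, assuming the profiler tables have already been created with proftab.sql and that my_package.my_proc is a placeholder for the unit you want to profile:
BEGIN
  DBMS_PROFILER.START_PROFILER(run_comment => 'tuning run 1');  -- start collecting
  my_package.my_proc;                                           -- placeholder: the code under test
  DBMS_PROFILER.STOP_PROFILER;                                  -- stop collecting
END;
/

-- Time per line of the profiled units, worst offenders first
-- (total_time is reported in nanoseconds).
SELECT u.unit_owner, u.unit_name, d.line#,
       d.total_occur, ROUND(d.total_time / 1e9, 3) AS total_secs
FROM   plsql_profiler_units u
JOIN   plsql_profiler_data  d
       ON d.runid = u.runid AND d.unit_number = u.unit_number
WHERE  u.runid = (SELECT MAX(runid) FROM plsql_profiler_runs)
ORDER  BY d.total_time DESC;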
Did you try the dbms_utility.get_time method, as described in THIS post?
Here is a link to the original question, which is almost the same as yours:
https://www.quora.com/How-can-I-log-the-execution-time-of-a-stored-procedures-in-a-table-in-Oracle-database
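For a quick-and-dirty measurement, dbms_utility.get_time can bracket the call and report elapsed wall-clock time in hundredths of a second (the procedure name below is a placeholder):
DECLARE
  t_start PLS_INTEGER;
BEGIN
  t_start := DBMS_UTILITY.GET_TIME;   -- hundredths of a second
  my_package.my_proc;                 -- placeholder: the code under test
  DBMS_OUTPUT.PUT_LINE('Elapsed: ' || (DBMS_UTILITY.GET_TIME - t_start) / 100 || ' seconds');
END;
/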

Using tbl and src_monetdblite to access data

Sorry if this question has been asked elsewhere; I can't find it. I'm working through some basic examples in MonetDBLite.
> dbGetQuery(dbcon, "SELECT MAX(mpg) FROM mtcars WHERE cyl = 8")
L3
1 19.2
works, but
> ms <- MonetDBLite::src_monetdblite("./DB")
> t <- tbl(ms, "mtcars")
Error in UseMethod("tbl") :
no applicable method for 'tbl' applied to an object of class
"c('src_monetdb', 'src_sql', 'src')"
It seems that it's trying to assign the db to t, not the table.
Any suggestions would be greatly appreciated.
I've been perusing resources and found a useR2016 presentation and noticed a difference here:
> ms
src: MonetDBEmbeddedConnection
tbls: mtcars
Curious...
I'm a huge fan of using MonetDBLite together with dplyr. My addition to Hannes Mühleisen's (thanks for the package!) answer would be that it appears that the order you load the packages can matter. Loading MonetDBLite after dplyr and dbplyr seems to be the key for me. Loading MonetDBLite first causes errors similar to the one nzgwynn noted.
Sometimes I could connect to the database with no problems. Other times I would get error messages like:
Error in UseMethod("db_query_fields") : no applicable method for 'db_query_fields' applied to an object of class "MonetDBEmbeddedConnection"
Like nzgwynn, I was puzzled about why it would work sometimes but not others. Restarting and reinstalling wouldn't necessarily fix it for me.
This clue, from an issue filed about sparklyr, led me to explore the package loading order:
https://github.com/rstudio/sparklyr/issues/38
As noted there with sparklyr, and as I've noticed with other R database packages, MonetDBLite will load and attach automatically if the Global Environment already contains a connection object. My problem was that I had an src_monetdb object in my workspace, which was causing MonetDBLite to load upon starting RStudio. So while I thought I was loading it after dplyr and dbplyr, it was really loading first. If I clear the workspace and then restart, I can load the packages in the preferred order. So far, this method has worked.
I've seen starting with a clean workspace advised as good practice generally, e.g.: https://twitter.com/hadleywickham/status/561146907519500288. Starting with a fresh workspace loses you no time either given MonetDBLite's speedy query ability.
Lastly, I would put an enthusiastic pitch in for using MonetDBLite. I saw it mentioned on RStudio's database page and was immediately impressed by how easy it was to set up and how fast it is. It's the best way I've found for working with a ~2 GB dataset in R. When exploring the data interactively, the dplyr queries run so quickly that it feels like I'm working with the data in memory. And if all I want to do is load the whole dataset into memory, MonetDBLite is as fast or faster than other methods I've tried, like read.fst() from the fst package.
I closed R and opened it again and the same code worked fine...
You need to call library("dplyr") before using tbl and friends. Also make sure you have dbplyr installed.
Update: Also, please make sure there is no connection object (src) in a stored workspace loaded at startup. Loading connections from .Rdata files does not work! Instead, create the connection/src from scratch every time you run a script.

executePackage seems to take a long time to launch subpackage

I am a relative beginner at SSIS so I may be doing something silly.
I have a process that involves looping over a heterogeneous queue and processing the objects one at a time. The process is currently being done in 'set logic' and it's dropping stuff. I was asked to rework it in a looping manner, so that decision has been made for me.
I have chosen to implement queue logic in 1 package and the actual processing in another package.
This is all going relatively well considering...
I now have the process up and running, but it's slow: 9 seconds per item. Clearly I can't present this solution. :-)
One thing I notice: 1.5 to 2 seconds of each loop are spent on the Execute Package task in the queue loop.
I can't figure out how to get a hard number; I am using the flashing-green-box method of performance tuning. The other steps seem to be very fast. Adding indexes, changing SQL to stored procedures, all the usual tricks have helped.
Is the UI reliable at all with regard to boxes turning white/yellow/green? Some tasks report times in the Progress tab, some don't seem to. So I am counting yellow time.
Should calling a subpackage be that expensive? One change I made was setting 'RunInASeparateProcess' to FALSE. I did that because otherwise the subpackage produces the following message:
Error: 0xC0012024 at Script Task: The task "Script Task" cannot run on this edition of Integration Services. It requires a higher level edition.
Task failed: Script Task
The reading I have done seems to advocate multiple packages. Does anyone have any counter-patterns? Should I stay the course? I started changing to one package. Copy/paste doesn't seem to work well with Sequence Containers. I would also need to recreate all the variables in the parent package. Doable, but I'm not sure that is the answer.
Does anyone know of any tuning resources/websites/books they would be willing to share?
Update - I have been tearing things down in an effort to figure out what the problem is. I was thinking it was the package configurations passing variable values. I don't think that is it. I can pass variables to another package with nothing in it and it is fast.
I can make the trivial subpackage slow by adding the two connection managers to it.
I suddenly realize I may be making and breaking a connection to both an Oracle server and a SQL Server in both the main package and the subpackage.
Am I correct in this observation?
Is there any way I can reuse the connection between the two packages?
When I google it, most of what I see is suggestions for passing the connection string.
UPDATE - I combined the two packages into one. The performance is now about 1.25 seconds per item, down from about 9. The only thing I can point to that changed is that I am now reusing a single connection instead of making multiple connections.
Thanks, I appreciate any help you are kind enough to offer.
Greg
Once you enable logging, I'd suggest running the package from a command window using dtexec. While that doesn't perfectly duplicate the server environment, it does have the advantages of (a) eliminating BIDS as a potential performance issue and (b) being something you can do without jumping through change control hoops.

Rails, how to migrate a large amount of data?

I have a Rails 3 app running an older version of Spree (an open source shopping cart). I am in the process of updating it to the latest version. This requires me to run numerous migrations on the database to be compatible with the latest version. However, the app's current database is roughly 300 MB, and running the migrations on my local machine (Mac OS X 10.7, 4 GB RAM, 2.4 GHz Core 2 Duo) takes over three days to complete.
I was able to decrease this time to only 16 hours using an Amazon EC2 instance (High-I/O On-Demand Instances, Quadruple Extra Large). But 16 hours is still too long as I will have to take down the site to perform this update.
Does anyone have any other suggestions to lower this time? Or any tips to increase the performance of the migrations?
FYI: using Ruby 1.9.2, and Ubuntu on the Amazon instance.
Dropping indices beforehand and adding them again afterwards is a good idea (see the sketch after these tips).
Also replacing .where(...).each with .find_each and perhaps adding transactions could help, as already mentioned.
Replace .save! with .save(:validate => false), because during the migrations you are not getting random inputs from users; you should be making known-good updates, and validations account for much of the execution time. Using .update_attribute would also skip validations where you're only updating one field.
Where possible, use fewer AR objects in a loop. Instantiating and later garbage collecting them takes CPU time and uses more memory.
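A sketch of the index advice above, with a made-up table and index name and PostgreSQL-style syntax (other databases differ slightly in the DROP INDEX form):
-- Before the data-heavy migrations: drop indexes on the tables being rewritten.
DROP INDEX index_orders_on_user_id;

-- ... run the migrations ...

-- Afterwards: recreate the index so the database rebuilds it in one pass.
CREATE INDEX index_orders_on_user_id ON orders (user_id);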
Maybe you have already considered this:
Tell the db not to bother making sure everything is on disk (no WAL, no fsync, etc.); you then effectively have an in-memory db, which should make a very big difference. (Since you have taken the db offline, you can just restore from a backup in the unlikely event of power loss or similar.) Turn fsync/WAL back on when you are done.
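For example, if the database is PostgreSQL (an assumption; the question does not say, and ALTER SYSTEM requires 9.4 or later, otherwise edit postgresql.conf directly), the durability settings could be relaxed for the migration window along these lines:
-- Relax durability only while the site is offline for the migration.
ALTER SYSTEM SET synchronous_commit = off;
ALTER SYSTEM SET fsync = off;              -- unsafe for normal operation
ALTER SYSTEM SET full_page_writes = off;
SELECT pg_reload_conf();

-- ... run the migrations, then restore the safe defaults ...
ALTER SYSTEM RESET synchronous_commit;
ALTER SYSTEM RESET fsync;
ALTER SYSTEM RESET full_page_writes;
SELECT pg_reload_conf();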
It is likely that you can do some of the migrations before you take the db offline. Test this in a staging environment, of course. That big user migration might very well be possible to do live. Make sure that you don't do it in a transaction; you might need to modify the migrations a bit.
I'm not familiar with your exact situation, but I'm sure there are even more things you can do if this isn't enough.
This answer is more about approach than a specific technical solution. If your main criteria is minimum downtime (and data-integrity of course) then the best strategy for this is to not use rails!
Instead you can do all the heavy work up-front and leave just the critical "real time" data migration (I'm using "migration" in the non-Rails sense here) as a step during the switchover.
So you have your current app with its db schema and the production data. You also (presumably) have a development version of the app based on the upgraded Spree gems, with the new db schema but no data. All you have to do is figure out a way of transforming the data between the two. This can be done in a number of ways, for example using pure SQL and temporary tables where necessary, or using SQL and Ruby to generate insert statements. These steps can be split up so that data that is fairly "static" (reference tables, products, etc.) can be loaded into the db ahead of time, and the data that changes more frequently (users, sessions, orders, etc.) can be done during the migration step.
You should be able to script this export-transform-import procedure so that it is repeatable and have tests/checks after it's complete to ensure data integrity. If you can arrange access to the new production database during the switchover then it should be easy to run the script against it. If you're restricted to a release process (eg webistrano) then you might have to shoe-horn it into a rails migration but you can run raw SQL using execute.
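A sketch of the "pure SQL and temporary tables" idea from the paragraph above, with invented table and column names and PostgreSQL-style syntax:
-- Stage the transformed data in a scratch table ahead of the switchover...
CREATE TEMPORARY TABLE staged_products AS
SELECT id, name, price_cents / 100.0 AS price
FROM   legacy_products;

-- ...then load it into the new schema with one set-based statement.
INSERT INTO products (id, name, price)
SELECT id, name, price
FROM   staged_products;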
Take a look at this gem.
https://github.com/zdennis/activerecord-import/
data = []
data << Order.new(:order_info => 'test order')
Order.import data
Unfortunately, the downvoted solution is the only one. What is really slow in Rails is the ActiveRecord models; they are not suited for tasks like this.
If you want a fast migration, you will have to do it in SQL.
There is another approach, but you will always have to rewrite most of the migrations...
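To illustrate the SQL-only route, a data migration that Rails would otherwise do row by row can usually be expressed as a set-based statement; the table and column names below are made up:
-- Backfill a new column for every row in one statement instead of
-- instantiating an ActiveRecord object per row.
UPDATE line_items
SET    total_cents = quantity * unit_price_cents
WHERE  total_cents IS NULL;

-- Or move data into a restructured table in bulk.
INSERT INTO order_addresses (order_id, street, city, zipcode)
SELECT id, ship_street, ship_city, ship_zip
FROM   orders;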

ORA-03113 while executing a sql query

I have a 400-line SQL query which throws an exception within 30 seconds:
ORA-03113: end-of-file on communication channel
Below are things to note:
I have set the timeout as 10 mins
There is one last condition which, when removed, resolves this error.
This error came only recently when I analyzed indexes.
The troubling condition is like this:
AND UPPER (someMultiJoin.someColumn) LIKE UPPER ('%90936%')
So my assumption is that the query is getting terminated from the server side, apparently because it is identified as a resource hog.
Is my assumption correct? How should I go about fixing this problem?
EDIT: I tried to get the explain plan for the faulty query, but the explain plan query also gives me an ORA-03113 error. I understand that my query is not very performant, but why should that cause an ORA-03113 error? I am running the query from Toad, and no alert log or trace is generated. My db version is
Oracle9i Enterprise Edition Release 9.2.0.7.0 - Production
One possible cause of this error is a thread crash on the server side. Check whether the Oracle server has generated any trace files, or logged any errors in its alert log.
You say that removing one condition from the query causes the problem to go away. How long does the query take to run without that condition? Have you checked the execution plans for both versions of the query to see if adding that condition is causing some inefficient plan to be chosen?
I've had similar connection dropping issues with certain variations on a query. In my case connections dropped when using rownum under certain circumstances. It turned out to be a bug that had a workaround by adjusting a certain Oracle Database configuration setting. We went with a workaround until a patch could be installed. I wish I could remember more specifics or find an old email on this but I don't know that the specifics would help address your issue. I'm posting this just to say that you've probably encountered a bug and if you have access to Oracle's support site (support.oracle.com) you'll likely find that others have reported it.
Edit:
I had a quick look at Oracle support. There are more than 1000 bugs related to ORA-03113 but I found one that may apply:
Bug 5015257: QUERY FAILS WITH ORA-3113 AND COREDUMP WHEN QUERY_REWRITE_ENABLED='TRUE'
To summarize:
Identified in 9.2.0.6.0 and fixed in 10.2.0.1.
Running a particular (not identified) query causes ORA-03113.
Running explain on the query does the same.
There is a core file in $ORACLE_HOME/dbs.
Workaround is to set QUERY_REWRITE_ENABLED to false: alter system set query_rewrite_enabled = FALSE;
Another possibility:
Bug 3659827: ORA-3113 FROM LONG RUNNING QUERY
9.2.0.5.0 through 10.2.0.0
Problem: Customer has a long-running query that consistently produces ORA-3113 errors.
On the customer's system they receive core.log files but do not receive any errors in the alert.log. On the test system I used, I received ORA-7445 errors.
Workaround: set "_complex_view_merging"=false at session level or instance level.
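If that bug matches, the workaround quoted above translates to the following statements (hidden underscore parameters should normally only be changed on Oracle Support's advice):
-- Session-level workaround from the bug note
ALTER SESSION SET "_complex_view_merging" = FALSE;

-- Or instance-wide, with the appropriate privileges
ALTER SYSTEM SET "_complex_view_merging" = FALSE;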
You can safely remove the UPPER on both sides if you are matching against numbers (which are not case sensitive); this can reduce the time spent evaluating the LIKE condition.
AND UPPER (someMultiJoin.someColumn) LIKE UPPER ('%90936%')
Is equivalent to:
AND someMultiJoin.someColumn LIKE '%90936%'
Numbers are not affected by UPPER (and % is independent of character casing).
From the information so far it looks like a back-end crash, as Dave Costa suggested some time ago. Were you able to check the server logs?
Can you get the plan with set autotrace traceonly explain? Does it happen from SQL*Plus locally, or only with a remote connection? Certainly sounds like an ORA-600 on the back-end could be the culprit, particularly if it's at parse time. The successful run taking longer than the failing one seems to rule out a network problem. I suspect it's failing quite quickly but the client is taking up to 30 seconds to give up on the dead connection, or the server is taking that long to write trace and core files.
Which probably leaves you the option of patching (if you can find a relevant fix for the specific ORA-600 on Metalink) or upgrading the DB; or rewriting the query to avoid it. You may get some ideas for how to do that from Metalink if it's a known bug. If you're lucky it might be as simple as a hint, if the extra condition is having an unexpected impact on the plan. Is someMultiJoin.someColumn part of an index that's used in the successful version? It's possible the UPPER is confusing it and you could persuade it back on to the successful plan by hinting it to use the index anyway, but that's obviously rather speculative.
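A hedged illustration of those two suggestions in SQL*Plus; the table name, index name, and column list are placeholders, and the INDEX hint only helps if the optimizer would otherwise abandon that index:
-- Show the plan without actually executing the statement
SET AUTOTRACE TRACEONLY EXPLAIN

-- Hypothetical hint nudging the optimizer back to the index used by the
-- successful variant of the query.
SELECT /*+ INDEX(someMultiJoin idx_somecolumn) */ someMultiJoin.id
FROM   some_table someMultiJoin
WHERE  UPPER(someMultiJoin.someColumn) LIKE UPPER('%90936%');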
It means you have been disconnected. This is not likely to be due to the query being a resource hog.
I have seen cases where the connection to the DB runs over a NAT, and because there is no traffic it closes the tunnel and thus drops the connection. Generally, if you use connection pooling you won't get this.
As @Daniel said, the network connection to the server is being broken. You might take a look at End-of-file on communication channel to see if it offers any useful suggestions.
Share and enjoy.
This is often a bug in the Cost Based Optimizer with complex queries.
What you can try to do is to change the execution plan. E.g. use WITH to pull some subqueries out, or use the SELECT /*+ RULE */ hint to prevent Oracle from using the CBO. Dropping the statistics also helps, because Oracle then uses another execution plan.
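For instance, a hypothetical fragment showing both plan-changing tricks mentioned above; the table and column names are invented:
-- Option 1: pull a subquery out with WITH so it is evaluated as its own block.
WITH recent_orders AS (
  SELECT order_id, customer_id
  FROM   orders
  WHERE  order_date > SYSDATE - 30
)
SELECT c.name, o.order_id
FROM   customers c, recent_orders o
WHERE  o.customer_id = c.customer_id;

-- Option 2: fall back to the rule-based optimizer for the problematic query.
SELECT /*+ RULE */ c.name, o.order_id
FROM   customers c, orders o
WHERE  o.customer_id = c.customer_id
AND    o.order_date > SYSDATE - 30;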
If you can upgrade the database, make a test installation of 9.2.0.8 and see if the error is gone there.
Sometimes it helps to make a dump of the schema, drop everything in it and import the dump again.
I was having the same error; in my case it was caused by the length of the query.
By reducing that length, I had no more problems.
