Pentaho CassandraInput slow response time

I have a problem that affects the response time of the queries from CassandraInput. I use Datastax Enterprise 3.2.4 - Cassandra 1.2.13.2.
If I run any query directly from the Cassandra client, the response is considerably faster than the same query executed by the CassandraInput step in Pentaho Data Integration.
What can cause this?
And above all, is there a way to improve the response time of the CassandraInput step in Pentaho?
I hope that some of you might have some suggestions.
Thank you
Federica

Generally this should not happen.
Try the following and check whether performance improves.
Open spoon.bat or spoon.sh (depending on your OS) and change the setting below.
Adjust the value according to the amount of RAM on your machine.
PENTAHO_DI_JAVA_OPTIONS="-Xmx2g"
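For example, on Linux you could set the variable in the shell before launching Spoon. This is only a minimal sketch; the heap sizes below are placeholders and should be sized to your machine's RAM:

# Give the PDI JVM a larger heap before starting Spoon.
# The -Xms/-Xmx values are illustrative; adjust to the RAM you actually have.
export PENTAHO_DI_JAVA_OPTIONS="-Xms1g -Xmx4g"
./spoon.sh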

Related

CouchDB Performance: 1.6.1 vs 2.1.1

We are looking at upgrading our CouchDB on our RHEL servers from 1.6.1 to 2.1.1. Before we do that, though, we wanted to run a performance test. So we created a JMeter test that goes directly against the database. It does not use any random values, so that the test would be exactly the same, and we could compare the two results. This is just a standalone server, we are not using clustering. I ran the tests the exact same way for both. I ran the tests for 1.6.1, and then installed 2.1.1 on the same machine. And I created the database new for each test run. [I also updated Erlang to R19.3.]
The results were very shocking:
Average response times:
1.6.1: 271.15 ms
2.1.1: 494.32 ms
POST and PUTs were really bad ...
POST:
1.6.1: 38.25 ms
2.1.1: 250.18 ms
PUT:
1.6.1: 37.33 ms
2.1.1: 358.76 ms
We are just using the default values for all the config options, except that we changed 1.6.1 to have delayed_commits = false (that is now the default in 2.1.1). I'm wondering if there's some default that changed that would make 2.1.1 so bad.
When I ran the CouchDB setup from the Fauxton UI, it added the following to my local.ini:
[cluster]
n = 1
Is that causing CouchDB to try to use clustering, or is that the same as if there were no entries here at all? One other thing, I deleted the _global_changes database, since it seemed as if it would add extra processing that we didn't need.
Is that causing CouchDB to try to use clustering, or is that the same as if there were no entries here at all?
It's not obvious from your description. If you set up CouchDB 2.0 as clustered, then that's how it will work. This is something you should know depending on the setup instructions you followed: http://docs.couchdb.org/en/2.1.1/install/setup.html
You can tell by locating the files on disk and seeing if they are in a shards directory or not.
I'm pretty sure you want at least two, so setting n = 1 doesn't seem like something you should be doing.
If you're trying to run in single node follow the instructions I linked above to do that.
One other thing, I deleted the _global_changes database, since it seemed as if it would add extra processing that we didn't need.
You probably don't want to delete random parts of your database unless there are instructions saying this is OK?
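To do the on-disk check mentioned above, a rough sketch (the data directory path is a common default; use whatever database_dir is set to in your local.ini):

# List the CouchDB data directory; adjust the path to match your install.
ls /var/lib/couchdb/
ls /var/lib/couchdb/shards/
# Per-range subdirectories such as 00000000-1fffffff under shards/ indicate the
# clustered shard layout; plain .couch files directly in the data directory are
# the old single-file layout.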

HUE beeswax can not load databases when lots people do queries

Our group members use Beeswax to run Hive queries.
When only a few people run queries, everything is fine.
But when many people run queries at the same time, or within a short period, something strange happens: the database list keeps loading and the whole browser becomes unresponsive. During this time, the CPU load on the Hive servers and metastore servers stays above one core.
So I have to wait a long time or restart the whole Hive server, and then everything is okay again.
I checked the logs of Hue and the Hive server but found nothing useful.
I have been working on this issue for a very long time but could not find a solution.
Because of network restrictions, I cannot paste screenshots or logs here.
Can anyone help me?
I found this picture on the internet and marked it up. When the problem happens, there is a loading icon there and it is impossible to change the database.

Neo4j Cypher queries really slow after upgrade to 2.1.3

This morning, with some struggles (see: Upgrading a neo4j database from 2.0.1 to 2.1.3 fails), I upgraded my database from version 2.0.1 to 2.1.3. My main goal with the upgrade was to gain performance on certain queries (see: Cypher SORT performance).
Everything seems to be working, except for the fact that all Cypher queries - without exception - have become much, much, much slower. Queries that used to take 75ms now take nearly 2000ms.
As I was running on an A1 (1xCPU, ~2GB RAM) VM in Azure, I thought that giving Neo4j some more RAM and an extra core would help, but after upgrading to an A2 VM I get more or less the same results.
I'm now wondering, did I lose my indexes by taking a backup and upgrading/using that db? I have perhaps 50K nodes in my db, so it's not that spectacular, right?
I'm now still running on an A2 VM (2xCPU, ~4GB RAM), but had to downgrade to 2.0.1 again.
UPDATE: #1 2014-08-12
After reading Michael's first comment on how to inspect my indexes using the shell, I did the following:
With my 2.0.1 database service running (and performing well), i executed Neo4jShell.bat and then executed the Schema command. This yielded the following response:
I uninstalled the 2.0.1 service using the Neo4jInstall.bat remove command.
I installed the 2.1.3 service using the Neo4jInstall install command.
With my 2.1.3 database service running, I again executed the Neo4jShell.bat and then executed the schema command. This yielded the following response:
I think it is safe to conclude that either the migration process (in 2.1.3) or the backup process (in 2.0.1) has removed the indexes from my database. This does explain why my backed up database is much smaller (~110MB) than the online database (~380MB). After migration to 2.1.3, my database became even smaller (~90MB).
Question is now, is it just a matter of recreating my indexes and be done with it?
UPDATE: #2 2014-08-12
I guess I have answered my own question. After recreating the constraints and indexes, my queries perform like they used to (some even faster, as expected).
Eventually, it turned out that in the process of backing up my database (in version 2.0.1) or during the migration process at startup (in version 2.1.3) I lost my indexes and constraints. The obvious solution is to manually recreate them (http://docs.neo4j.org/chunked/stable/cypher-schema.html) and be on your way.
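For anyone landing here later, recreating a lost index and constraint from the Neo4j shell looks roughly like the following (Neo4j 2.x Cypher syntax, run inside Neo4jShell.bat); the Person label and the email/name properties are placeholders, so substitute whatever your original schema output listed:

schema
CREATE CONSTRAINT ON (p:Person) ASSERT p.email IS UNIQUE;
CREATE INDEX ON :Person(name);
schema

The second schema call should show the new index and constraint coming ONLINE once population has finished.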

OBIEE:how to reload rpd file quickly?

I'm new to Oracle BIEE. My development environment is now installed, and the project is fairly large. We are using multi-user development. The problem happens when one developer publishes the RPD to the network and wants to test the data: the server takes so long to reload the RPD file that I can hardly wait. When multiple users want to test the RPD file it is even worse. Is there any other way to solve this problem, or how can I make the BIEE server reload the RPD file more quickly?
It's hard to say specifically without knowing a bit more about your setup, but here are a few general advice pointers:
When stopping the service, OBI will wait for any running queries to complete before stopping, so make sure there's nothing running before you try to do this.
Make sure you're only restarting the BI Server component; you don't need to wait for the other services to restart if you're just changing the RPD (if you're on 11g, deploying through EM should mean this happens anyway, so you don't need to worry). See the sketch after this list.
If you're using 11g, you could try incremental updates by creating patches.
Check whether the hardware you're running on is adequate, most importantly that you've enough RAM so it's not having to page out to disk when it loads the RPD.
Remove anything unused from the RPD to make it smaller.
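As a sketch of the "restart only the BI Server" point above (11g, using opmnctl): the instance path and component name below are the usual defaults and may differ on your install.

# Restart just the BI Server component instead of the full stack.
# ORACLE_INSTANCE and coreapplication_obis1 are defaults; adjust to your environment.
cd $ORACLE_INSTANCE/bin
./opmnctl status
./opmnctl restartproc ias-component=coreapplication_obis1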

MySQL database backup: performance issues

Folks,
I'm trying to set up a regular backup of a rather large production database (half a gig) that has both InnoDB and MyISAM tables. I've been using mysqldump so far, but I find that it's taking increasingly longer periods of time, and the server is completely unresponsive while mysqldump is running.
I wanted to ask for your advice: how do I either
Make mysqldump backup non-blocking - assign low priority to the process or something like that, OR
Find another backup mechanism that will be better/faster/non-blocking.
I know of the existence of MySQL Enterprise Backup product (http://www.mysql.com/products/enterprise/backup.html) - it's expensive and this is not an option for this project.
I've read about setting up a second server as a "replication slave", but that's not an option for me either (this requires hardware, which costs $$).
Thank you!
UPDATE: more info on my environment: Ubuntu, latest LAMPP, Amazon EC2.
If replication to a slave isn't an option, you could leverage the filesystem, depending on the OS you're using:
Consistent backup with Linux Logical Volume Manager (LVM) snapshots.
MySQL backups using ZFS snapshots.
The joys of backing up MySQL with ZFS...
I've used ZFS snapshots on a quite large MySQL database (30GB+) as a backup method and it completes very quickly (never more than a few minutes) and doesn't block. You can then mount the snapshot somewhere else and back it up to tape, etc.
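To give a flavour of the LVM route, the usual pattern is: take a global read lock, snapshot the volume holding the MySQL data directory, release the lock, then copy from the snapshot at leisure. A minimal sketch, where the volume names, mount points and credentials are all placeholders:

#!/bin/sh
# The read lock only lasts while the client session stays open, so the lock,
# snapshot and unlock are all issued through one mysql connection, using the
# client's "system" command to create the snapshot. Add credentials as needed.
mysql -u root <<'EOF'
FLUSH TABLES WITH READ LOCK;
system lvcreate --snapshot --size 1G --name mysql-snap /dev/vg0/mysql
UNLOCK TABLES;
EOF

# Mount the snapshot read-only, copy the data off, then drop the snapshot.
mkdir -p /mnt/mysql-snap
mount -o ro /dev/vg0/mysql-snap /mnt/mysql-snap
tar czf /backup/mysql-$(date +%F).tar.gz -C /mnt/mysql-snap .
umount /mnt/mysql-snap
lvremove -f /dev/vg0/mysql-snap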
Edit: (my previous answer suggested a slave DB to back up from, but then I noticed Alex ruled that out in his question.)
There's no reason your replication slave can't run on the same hardware, assuming the hardware can keep up. Grab a source tarball, ./configure --prefix=/dbslave; make; make install; and you'll have a second mysql server living completely under /dbslave.
EDIT2: Replication has a bunch of other benefits, as well. For instance, with replication running, you may be able to recover the binlog and replay it on top of your last backup to recover the extra data after certain kinds of catastrophes.
EDIT3: You mention you're running on EC2. Another, somewhat contrived idea to keep costs down is to try setting up another instance with an EBS volume. Then use the AWS api to spin this instance up long enough for it to catch up with writes from the binary log, dump/compress/send the snapshot, and then spin it down. Not free, and labor-intensive to set up, but considerably cheaper than running the instance 24x7.
Try mk-parallel-dump utility from maatkit (http://www.maatkit.org/)
regards,
Something you might consider is using binary logs here, through a method called 'log shipping'. Just before every backup, issue a command to flush the binary logs, and then you can copy all except the current binary log out via your regular file system operations.
The advantage of this method is that you're not locking up the database at all: when MySQL opens the next binary log in sequence, it releases all the file locks on the prior logs, so processing shouldn't be affected. Tar 'em, zip 'em in place, do as you please, then copy it out as one file to your backup system.
Another advantage of using binary logs is that you can restore up to any point in time if the logs are available. I.e. you have last year's full backup and every log from then to now, but you want to see what the database looked like on Jan 1st, 2011. You can issue a restore 'until 2011-01-01' and when it stops, you're at Jan 1st, 2011 as far as the database is concerned.
I've had to use this once to reverse the damage a hacker caused.
It is definitely worth checking out.
Please note... binary logs are USUALLY used for replication. Nothing says you HAVE to.
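A rough sketch of the shipping step and of a point-in-time replay; the binlog directory, file prefix and backup host below are placeholders, and the date is just the example from above:

# Close the current binary log and ship all the older, closed ones.
mysqladmin -u root -p flush-logs
cd /var/lib/mysql
latest=$(ls mysql-bin.[0-9]* | sort | tail -n 1)   # the log now being written
for f in mysql-bin.[0-9]*; do
    [ "$f" = "$latest" ] && continue               # skip the active log
    rsync -a "$f" backuphost:/backups/binlogs/
done

# Point-in-time restore: replay the shipped logs on top of the last full backup,
# stopping at the moment of interest.
mysqlbinlog --stop-datetime="2011-01-01 00:00:00" /backups/binlogs/mysql-bin.* | mysql -u root -p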
Adding to what Rich Adams and timdev have already suggested, write a cron job that gets triggered during a low-usage period to perform the task as suggested, to avoid high CPU utilization.
Check mysql-parallel-dump also.
