We have done a Proof of Concept (POC) with Hyperledger Composer and Fabric v1.
We have used the query functionality in Composer in our code.
We noticed that when there are a lot of records on the chain, query performance degrades linearly.
I understand it is still an experimental feature, but the slowdown is significant enough to limit the viability of putting some use cases into production.
Has anyone come across this? Any suggestion(s)?
Have you experimented with adding indexes to CouchDB?
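Creating one is a single HTTP call against the peer's CouchDB, since it exposes the standard Mango _index endpoint. A minimal, hypothetical sketch in Java; the host, database name, and indexed fields are assumptions you would need to adapt to your own channel and model:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class CreateCouchIndex {
        public static void main(String[] args) throws Exception {
            // Mango index on the fields the Composer query filters by.
            // "$class" is Composer's type discriminator; "owner" is illustrative.
            String indexDef = "{"
                    + "\"index\": {\"fields\": [\"$class\", \"owner\"]},"
                    + "\"name\": \"owner-index\","
                    + "\"type\": \"json\""
                    + "}";

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:5984/mychannel/_index")) // assumed host/db
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(indexDef))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }

Without an index, CouchDB answers Mango selectors with a full database scan, which would match the linear degradation described above.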
As the title says: is there any issue with doing CRUD on data directly through Elasticsearch, without a relational database (MySQL/PostgreSQL) behind it?
I know Elasticsearch is good at searching, but if the data is updated frequently, could performance suffer?
And if every update request uses setRefreshPolicy(IMMEDIATE), could that also hurt performance?
Elasticsearch will likely outperform a relational database on similar hardware, though workloads can vary. However, Elasticsearch can do this because it has made certain design decisions that are different from those of a relational database.
ElasticSearch is eventually consistent. This means that queries immediately after your insert might still get old results. There are things that can be done to mitigate this but nothing will eliminate the possibility.
Prior to version 5.x, Elasticsearch was pretty good at losing data when bad things happened. The 5.x release was all about making Elasticsearch more robust in that regard, and data loss is no longer the problem it once was, though the potential still exists, particularly if you make configuration mistakes.
If you frequently modify documents in Elasticsearch, you will generate large numbers of deleted documents, as every update writes a new document and marks the old one as deleted. Over time those deleted documents are merged away, or you can force the system to clean them out, but if you are doing rapid modifications this could present a problem for you.
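If those deleted documents do pile up, you can ask Elasticsearch to clean them out explicitly with the force-merge API. A rough sketch with the Java high-level REST client (host and index name are placeholders, and force merges are expensive, so run them during quiet periods):

    import org.apache.http.HttpHost;
    import org.elasticsearch.action.admin.indices.forcemerge.ForceMergeRequest;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestHighLevelClient;

    public class ExpungeDeletes {
        public static void main(String[] args) throws Exception {
            try (RestHighLevelClient client = new RestHighLevelClient(
                    RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
                // Rewrite only the segments that contain deleted documents,
                // reclaiming the space left behind by frequent updates.
                ForceMergeRequest request = new ForceMergeRequest("my-index");
                request.onlyExpungeDeletes(true);
                client.indices().forcemerge(request, RequestOptions.DEFAULT);
            }
        }
    }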
The application I am working on uses Elasticsearch as the backend, with 9 microservices connecting to it. Writes are few compared to reads. Our write APIs have a performance requirement of at most 3 seconds.
We have configured a refresh interval of 1 second and almost always use WAIT_FOR instead of IMMEDIATE, occasionally using NONE for asynchronous updates.
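For readers who haven't used it, this is roughly what WAIT_FOR looks like in the Java high-level REST client, where the enum value is spelled WAIT_UNTIL (the index name and document here are made up):

    import org.apache.http.HttpHost;
    import org.elasticsearch.action.index.IndexRequest;
    import org.elasticsearch.action.support.WriteRequest;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.common.xcontent.XContentType;

    public class WaitForRefresh {
        public static void main(String[] args) throws Exception {
            try (RestHighLevelClient client = new RestHighLevelClient(
                    RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
                IndexRequest request = new IndexRequest("orders")
                        .id("1")
                        .source("{\"status\":\"shipped\"}", XContentType.JSON)
                        // Block until the next scheduled refresh makes this write
                        // searchable, rather than forcing an immediate refresh.
                        .setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL);
                client.index(request, RequestOptions.DEFAULT);
            }
        }
    }

With a 1-second refresh interval, the call returns within about a second without creating the extra tiny segments that IMMEDIATE would.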
Since replacing MongoDB with PouchDB in my Ionic app, the app feels a little sluggish, and I would like to know if there is a way to speed it up. The database in question currently contains fewer than 100 documents and is slow even when usage is purely local. We are using secondary indexes. Is this the cause of the performance drop? Would we be better off using allDocs() and then searching through the database manually? I read that it would be faster, but the posts were over a year old and things may have changed since then. I also tried the WebSQL adapter, but it didn't really affect the speed. Are there other adapters or things I could try?
On such a small database, a secondary index would not be faster than allDocs() in my experience, but I would not expect the performance difference to be noticeable (I have used both on a small local database). You might try compacting the databases regularly if you have not already, as this can make the database smaller and more efficient. Like you, I have tried different adapters (IndexedDB and WebSQL) but could not see much difference in speed.
I just started learning about Redis. I installed it on my laptop and wrote a simple Java client. I have an Elasticsearch instance that handles queries coming in from a web-based application. It's pretty fast, but I'm wondering if there is a practical case where I could front the Elasticsearch instance with Redis to speed up response time for clients. With my very limited Redis knowledge, I'm wondering whether storing the responses from ES queries in Redis would be practical or provide any value. More generally, can someone give me an example of how ES and Redis are used together? Thanks.
One use case for having Redis in the picture is to use it as temporary buffer when loading documents into Elasticsearch via Logstash.
Since Redis is basically a cache, its main purpose is to serve data quickly that would otherwise not be promptly available, because the backend service you're querying is not fast enough. Since you are saying that your Elasticsearch instance is "pretty fast" (whatever that means), why would you want to cache the responses?
Also, when you put a cache into the picture, new concerns arise, most importantly: how, when, and how often do you expire the cache? If your data in Elasticsearch is fairly stable, you might benefit from a cache. However, if your data in Elasticsearch changes frequently, you'll constantly be fighting stale data in your Redis cache, and that's a problem you don't want to have.
In my opinion, it's much better to spend time improving your ES queries and mappings to deliver blazing-fast data than to spend it tuning a cache that might be useful 1% of the time.
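That said, if you still want to experiment with it, the usual shape is a cache-aside lookup keyed on the query, with a TTL so stale entries age out on their own. A minimal, hypothetical sketch using Jedis (searchElasticsearch and the key scheme are stand-ins for your own code):

    import redis.clients.jedis.Jedis;

    public class CachedSearch {
        private static final int TTL_SECONDS = 60; // how long a cached response stays valid

        public static String search(Jedis jedis, String queryKey) {
            String cached = jedis.get(queryKey);
            if (cached != null) {
                return cached; // cache hit: skip Elasticsearch entirely
            }
            String response = searchElasticsearch(queryKey); // cache miss: run the real query
            jedis.setex(queryKey, TTL_SECONDS, response);    // store with an expiry
            return response;
        }

        // Placeholder for the actual Elasticsearch call.
        private static String searchElasticsearch(String queryKey) {
            return "{}";
        }
    }

The TTL is the crude answer to the expiry question above: you accept up to TTL_SECONDS of staleness in exchange for the cache hits.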
I want to disable SPARQL query caching on a Fuseki server. Can it be disabled, and how? I'm considering the following approaches:
A command-line argument - there doesn't appear to be one for this
The settings file (*.ttl) - I couldn't find a directive to disable caching
Editing the server code - I'd rather not do that :(
Please tell me how I can disable caching.
What caching are you talking about?
As discussed in JENA-388, the current default behaviour is actually to add headers that disable caching, so there is no HTTP-level caching.
If you are using the TDB backend then there are caches used to improve query performance, and those are not configurable AFAIK. Even if you could turn them off, doing so would likely drastically worsen performance, so it would not be a good idea.
Edit
The --mem option uses a pure in-memory dataset so there is no caching. Be aware that this will actually be much slower than using TDB as you scale up your data and is only faster at small dataset sizes.
If you are looking to benchmark, there are much better ways to eliminate the effect of caches than turning them off, since disabling caches (even when you can) won't give you realistic performance numbers. There are several real-world ways to eliminate cache effects:
Run warmups - either some fixed number or until you see the system reach a steady state.
Eliminate outliers in your statistics - discard the best and worst N results and compute your statistics over the remainder.
Use query parameterisation - take a query template and substitute different constants into it each time, ensuring you aren't issuing an identical query on every run (see the sketch after this list). Query plan caching could still come into effect, but as Jena doesn't do plan caching anyway it won't matter for your tests.
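As a concrete illustration of the parameterisation point, Jena's ParameterizedSparqlString can do the substitution for you; the template and IRIs below are invented for the example:

    import org.apache.jena.query.ParameterizedSparqlString;
    import org.apache.jena.query.Query;

    public class ParameterisedQueries {
        public static void main(String[] args) {
            // A template whose ?s variable is replaced with a different
            // constant on each iteration, so no two runs issue an
            // identical query and result caches cannot simply replay.
            ParameterizedSparqlString pss = new ParameterizedSparqlString(
                    "SELECT ?p ?o WHERE { ?s ?p ?o }");

            String[] subjects = {
                    "http://example.org/resource/1",
                    "http://example.org/resource/2"};
            for (String subject : subjects) {
                pss.setIri("s", subject);
                Query query = pss.asQuery();
                System.out.println(query);
                // ... execute against the dataset/endpoint and record timings ...
            }
        }
    }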
You may want to take a look at my 2012 SemTech talk Practical SPARQL Benchmarking and the associated SPARQL Query Benchmarker tool. We've been working on a heavily revised version of the tool lately which has a lot of new features such as support for query parameterisation.
I am working on an application that serves about 100,000 searches every day, and we can safely assume about the same number of updates/insertions/deletions in the database daily. The current application uses native SQL, and we intend to migrate it to Hibernate and use Hibernate Search.
As there are continuous changes in the database records, we need to enable automatic indexing. The management has concerns about the performance impact automatic indexing can cause.
Scheduled batch indexing is not an option, as changes to the records have to be available for search as soon as they are made.
I have searched for performance statistics of some kind but have found none.
Can anybody who has already worked on Hibernate Search and faced a similar situation share their thoughts?
Thanks for the help.
It might work fine, but it's hard to guess without a baseline. I have experience with even more searches / day and after some fine tuning it works well, but it's impossible to know if that will apply for your scenario without trying it out.
If normal tuning fails and NRT doesn't prove fast enough, you can always shard the indexes, use a multi-master configuration, and plug in a distributed second-level cache such as Infinispan: combined, this architecture can achieve linear scalability, provided you have the time to set it up and reasonable hardware.
It's hard to say what kind of hardware you will need, but it's a safe bet that it will be more efficient than native SQL solutions. I would suggest making a POC and seeing how far you can get on a single node; if your queries are a good fit for Lucene you might not need more than a single server. Beware that Lucene is much faster at queries than at updates, so since you estimate the same number of writes as searches, the problem is unlikely to be searches/second but rather writes (updates)/second and total data (index) size. The latest Hibernate Search introduced an NRT index manager, which suits such use cases well.
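For anyone evaluating this, automatic indexing in Hibernate Search is driven by entity annotations and is on by default; a minimal sketch in the Hibernate Search 4.x/5.x style (entity and field names invented):

    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import org.hibernate.search.annotations.Field;
    import org.hibernate.search.annotations.Indexed;

    // With automatic indexing (the default), Hibernate Search updates the
    // Lucene index whenever a transaction changes an indexed entity, which
    // gives the as-soon-as-changed searchability the question requires.
    @Entity
    @Indexed
    public class Record {

        @Id
        @GeneratedValue
        private Long id;

        @Field // makes this property full-text searchable
        private String title;

        // getters/setters omitted for brevity
    }

Switching the index over to the NRT manager is then just configuration, e.g. the property hibernate.search.default.indexmanager = near-real-time, with no code changes.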