We are developing an app in Dart in which we need to fetch more than 50K rows at once (when the app loads); the data is then used in other sections of the app for further calculations. We are using the Firebase Realtime Database and we are facing some serious performance issues.
It currently takes around 40 seconds to load 50K rows (we are on the free database tier, not sure if that is the reason), but we have also observed that when multiple users use the app, loading the 50K rows takes around 1 minute 20 seconds and database load peaks at 100%.
Can you please suggest how we can improve performance in the Firebase Realtime Database?
If I split the data into two collections but keep it in the same JSON tree, would that help?
Could it be because we are currently using the free database tier for testing?
We have tried creating an index in the "Rules" section on one key field, but that did not help much. Is there any way we can improve this?
Could it be because we are currently using the free database tier for testing?
All Firebase Realtime Database instances run on the same infrastructure. There is no difference based on the plan your project is on.
Can you please suggest how we can improve performance in the Firebase Realtime Database?
The best way to improve performance is to only load data that you're going to show to the user right away. In any client-side application it's unlikely that the user will look at 50K items, let alone right when the application starts.
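For example, instead of reading the whole node, you can page through it with a query. Below is a minimal sketch using the Firebase Admin SDK for Python; the `rows` path, the `createdAt` child key, and the page size are placeholders, and the FlutterFire `firebase_database` plugin exposes equivalent query methods (`orderByChild`, `limitToFirst`) on the Dart side.

```python
# Minimal sketch: load one page of rows instead of the whole node.
# "service-account.json", the database URL, the "rows" path and the
# "createdAt" child key are all placeholders.
import firebase_admin
from firebase_admin import credentials, db

cred = credentials.Certificate("service-account.json")
firebase_admin.initialize_app(cred, {
    "databaseURL": "https://<your-project>.firebaseio.com",
})

# Define ".indexOn": "createdAt" under this path in the Rules section so the
# server can order and filter efficiently.
first_page = (
    db.reference("rows")
    .order_by_child("createdAt")
    .limit_to_first(25)
    .get()
)
print(len(first_page or {}))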
If you need 50K items to show the initial data to the user, that typically means that you're aggregating that data in some way, and showing that aggregate to the user. Consider doing that aggregation when you write the data to the database, and store the aggregation result in the database. Then you can load just that result in each client, instead of having each client do its own aggregation.
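If the aggregate is something simple like a count or a sum, one hedged way to maintain it at write time is a transaction next to each write. The `sales` and `aggregates/sales` paths and the field names below are placeholders, and a Cloud Functions trigger is another common place to do the same thing.

```python
# Continuing with the app initialized above; paths and field names are placeholders.
from firebase_admin import db

def record_sale(amount):
    # Write the detail row as usual.
    db.reference("sales").push({"amount": amount})

    # Keep a running aggregate up to date so clients never need the raw rows.
    def bump(current):
        current = current or {"count": 0, "total": 0}
        current["count"] += 1
        current["total"] += amount
        return current

    db.reference("aggregates/sales").transaction(bump)
```

Each client then reads only `aggregates/sales`, a single small node, instead of 50K rows.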
For more data modeling tips, read NoSQL data modeling and watch Firebase for SQL developers. I'd also recommend watching Getting to know Cloud Firestore, which is for Cloud Firestore, but contains many great tips that apply to all NoSQL databases.
Related
I've spent a lot of time reading and watching videos of people talking about how they use tools designed for handling huge datasets and real-time processing in their architectures. And while I understand what it is that tools like Hadoop/Cassandra/Kafka etc. do, no one seems to explain how the data gets from these large processing tools to rendering something on a client/webpage.
From what I understand of big data tools, you can't build your application the same way you would a standard web app querying MySQL, which makes sense given the size of the data that flows through these tools. However, for all this talk of "realtime data analytics", I cannot find any explanation of how the actual analytics get put in front of someone as a chart, table, etc.
explain how the data gets from these large processing tools to rendering something on a client/webpage.
With respect to this, one way would be to process the big data with Spark or Hadoop and store the results in an RDBMS. Then have your web app pull data from the RDBMS to render charts, tables, etc. I can share examples from my own work if you need more information.
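As a rough sketch of that pattern (not the poster's actual pipeline), here is a PySpark job that aggregates raw events and writes the result to a Postgres table the web app can query directly. The input path, column names, JDBC URL, credentials, and table name are all placeholders.

```python
# Rough sketch of the Spark -> RDBMS hand-off; paths, columns and credentials
# are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-aggregates").getOrCreate()

events = spark.read.parquet("s3://my-bucket/events/")

daily = (
    events.groupBy("page")
    .agg(F.count("*").alias("views"), F.countDistinct("user_id").alias("visitors"))
)

# The web app then renders charts from this small, indexed table.
daily.write.jdbc(
    url="jdbc:postgresql://reports-db:5432/analytics",
    table="page_views_daily",
    mode="overwrite",
    properties={"user": "etl", "password": "***", "driver": "org.postgresql.Driver"},
)
```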
Impala supports ODBC/JDBC interfaces. So, you actually could hook up a web app to it the same way you do with MySQL.
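For instance, the same DB-API style you would use with a MySQL connector also works against Impala through the impyla client; the host, port, and table name below are placeholders.

```python
# Hedged example: querying Impala with the impyla DB-API client, just like a
# MySQL connector. Host, port and table are placeholders.
from impala.dbapi import connect

conn = connect(host="impala-host", port=21050)
cur = conn.cursor()
cur.execute(
    "SELECT page, COUNT(*) AS hits FROM pageviews GROUP BY page ORDER BY hits DESC LIMIT 10"
)
for page, hits in cur.fetchall():
    print(page, hits)
conn.close()
```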
Other stores you might want to check out are HBase, Kudu, and Solr. In some realtime architectures the data ends up in one of those, and all of them have some sort of API that you can use from your web app to access their data.
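As one hedged illustration on the HBase side, the happybase client gives a web app straightforward read access through HBase's Thrift server; the host, table name, and row-key layout here are made up for the example.

```python
# Reading rows through the HBase Thrift server with happybase.
# Host, table name and row-key layout are placeholders.
import happybase

conn = happybase.Connection("hbase-host")  # Thrift server, default port 9090
table = conn.table("metrics")

# Pull recent rows for one entity and hand them to the web layer.
for key, data in table.scan(row_prefix=b"sensor-42#", limit=100):
    print(key, data)

conn.close()
```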
If you want a simple solution for realtime data processing and analytics, check out the new Stride API, which enables developers to collect, process, and analyze streaming data and then either visualize summary data in Stride or push processed data out to applications in realtime. This is a very easy way to build the kind of realtime reporting dashboards and monitoring / alerting systems you described above.
Take a look at the Stride API technical docs for examples and more info on how to implement this.
I have been given the task of performance testing Solr. I am completely new to Solr and have no idea how to approach the testing.
The Solr instance we are using consumes a lot of RAM and CPU; because of that, our application hangs and returns server error messages.
What would be the right way to test Solr, and is any tool required to create multiple concurrent threads?
According to the Solr Quick Start guide
Searching
Solr can be queried via REST clients, cURL, wget, Chrome POSTMAN, etc., as well as via the native clients available for many programming languages.
so you can use "usual" HTTP Request samplers to mimic multiple users concurrently using Solr.
References:
Building a Web Test Plan
Testing SOAP/REST Web Services Using JMeter
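For a quick sanity check outside JMeter, a single request against Solr's select handler looks like the sketch below; a JMeter HTTP Request sampler sends essentially the same request, just under concurrency. The `techproducts` core name and query are placeholders.

```python
# A single request against Solr's select handler; a JMeter HTTP Request
# sampler sends essentially the same thing. "techproducts" is a placeholder
# core/collection name.
import requests

resp = requests.get(
    "http://localhost:8983/solr/techproducts/select",
    params={"q": "memory", "rows": 10, "wt": "json"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json()["response"]["numFound"])
```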
For search applications, the number of requests by itself usually isn't as important as the query profile. There's a lot of internal caching going on, and the only useful way to do decent performance testing is to use your actual query logs to replicate the query profile that represents your users. You'll also have to use the actual data that you have in your Solr server, so you get (at least) close to the same cardinality for fields and values.
This would mean using the same filters, the same kinds of queries, and the same kind of simultaneous load. Since you probably want to go above the load you see in production, consider replaying logs from several days as if they were a single day. Be sure to include weekends as well as weekdays, and if you have a particularly bad day, such as Black Friday for e-commerce, keep those logs available so you're able to replicate that profile.
There are (many) tools to fire the HTTP requests at Solr, but be sure to use a query profile and sets of queries that actually represent how you're using Solr. Otherwise you're just hitting the query cache every single time, or you have data that doesn't represent the actual data in your dataset, which will give you completely irrelevant response times (i.e. random data performs a lot worse than actual English text where tokens are repeated across documents).
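A minimal sketch of that log-replay idea, assuming `queries.txt` holds one logged query string per line (extracted from your own Solr or access logs); the core name and worker count are placeholders you would tune to match the production concurrency you want to exceed.

```python
# Sketch of replaying logged queries with concurrent workers.
from concurrent.futures import ThreadPoolExecutor
import time
import requests

SOLR = "http://localhost:8983/solr/mycore/select"  # placeholder core name

def fire(q):
    start = time.monotonic()
    r = requests.get(SOLR, params={"q": q, "rows": 10}, timeout=30)
    return r.status_code, time.monotonic() - start, q

with open("queries.txt") as f:
    queries = [line.strip() for line in f if line.strip()]

with ThreadPoolExecutor(max_workers=20) as pool:
    for status, elapsed, q in pool.map(fire, queries):
        print(f"{status} {elapsed:.3f}s {q}")
```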
You can use SolrMeter to do the performance testing; see the SolrMeter wiki for details.
The most important thing is not how to test the queries, but putting together the scenarios you want to test, ones that mimic the real application usage you see.
Initially you need to decide what you want to find out with your test. Do you want to find bottlenecks? Do you want to find out whether your current setup can meet business requirements? Do you need to find the breaking points of your current architecture and setup?
Solr using a lot of CPU is very often related to indexing, and high memory usage might be related to segment merging, so it sounds like you need to define your scenarios:
How much content should you push to Solr for it to perform indexing?
How many queries do you need to send?
What are the features of the queries (facets, highlighting, spellcheck, etc.)?
You could use JMeter to test the throughput of your search application, and you could also check IO, load, CPU usage, and RAM usage on each Solr instance.
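To cover that second half, here is a rough resource sampler you could run on each Solr node while the JMeter test is going; the 5-second interval and plain-text output are arbitrary choices.

```python
# Rough resource sampler to run on each Solr node during the load test.
import psutil

while True:
    cpu = psutil.cpu_percent(interval=5)      # % CPU averaged over 5 s (also sleeps)
    mem = psutil.virtual_memory().percent     # % RAM in use
    load1, _, _ = psutil.getloadavg()         # 1-minute load average
    io = psutil.disk_io_counters()
    print(
        f"cpu={cpu}% mem={mem}% load1={load1:.2f} "
        f"read_MB={io.read_bytes / 1e6:.0f} write_MB={io.write_bytes / 1e6:.0f}"
    )
```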
I am optimizing Postgres with Ruby on Rails. For the last few days I have been finding that my site loads slowly. The application uses various queries with joins across 3-4 tables to fetch the data.
Could you help me with what I need to do to improve the performance of the application at the database level?
Have a look here. You need to capture and analyze all the database activity and go from there. The link below points to an open source tool (pgBadger, a PostgreSQL log analyzer) that does exactly that.
http://dalibo.github.io/pgbadger/
We are building a mobile app with a Rails CMS to manage it.
What does our app look like?
Every admin user of the app can set up one private channel with a very small amount of data:
About 50 short strings.
Users can then download the app, register a few different channels, and fetch the data from the server to their devices. The data is stored locally and is not fetched again unless the admin user updates it (and we assume that won't happen very often). Every channel will be available to no more than 500 devices.
The users can contribute to the channel, but that data will be stored on S3 and not in the database.
2 important points:
Most of the channels will be active for about 5 months and for roughly 500 users, but most of the activity will happen within the same couple of days.
Every channel is for a small number of users (500), but we hope :) to get to hundreds of thousands of admin users.
Building the CMS with Rails, we found that using SimpleDB is more straightforward than using DynamoDB. But as we are not server experts, we have seen the limitations of SimpleDB and don't know whether it could handle the amount of data transfer we will have (if our app succeeds). Another important point is that DynamoDB costs are much higher and not dependent on usage, while SimpleDB will be much cheaper at the beginning.
The question is:
Can SimpleDB meet our needs?
Could we migrate to DynamoDB later if our service grows?
Starting out with a new project and not really knowing what to expect from the usage, I'd say the better option is to go with SimpleDB. It doesn't sound like your usage is going to be very high, so SimpleDB should be able to handle that without a problem. The real power of DynamoDB comes in when you really have a lot of load, and it doesn't seem you fall into that category.
If you design your application correctly, switching from SimpleDB to DynamoDB should be a simple task if you decide at some point that SimpleDB is not working out. I do this kind of switch all the time with other components in my software. Since both databases are NoSQL, you shouldn't have a problem converting between the two; just make sure that any features you use in SimpleDB are also available in DynamoDB. Design your data model with both in mind: DynamoDB has stricter requirements around keys and indexes, so make sure the two will stay compatible. A sketch of that kind of abstraction follows below.
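Something like the following, written in Python rather than Ruby purely to show the shape of the abstraction. The `channels` table/domain name and `channel_id` key are placeholders, and note that boto3 only covers DynamoDB; a SimpleDB implementation of the same interface would sit on the legacy boto library (`boto.sdb`).

```python
from abc import ABC, abstractmethod

import boto3

class ChannelStore(ABC):
    @abstractmethod
    def save(self, channel_id, strings): ...

    @abstractmethod
    def load(self, channel_id): ...

class DynamoChannelStore(ChannelStore):
    def __init__(self, table_name="channels"):
        # Assumes a table with "channel_id" as its partition key already exists.
        self._table = boto3.resource("dynamodb").Table(table_name)

    def save(self, channel_id, strings):
        self._table.put_item(Item={"channel_id": channel_id, "strings": strings})

    def load(self, channel_id):
        item = self._table.get_item(Key={"channel_id": channel_id}).get("Item", {})
        return item.get("strings", [])

# The CMS only ever talks to ChannelStore, so swapping in a SimpleDB-backed
# implementation later is a one-line change at the composition root.
```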
That being said, plenty of people have been using SimpleDB for their applications, and I don't expect you would see any performance problems unless your product really takes off, at which point you can invest the resources to move to DynamoDB.
Aside from all that, there is the pricing, as you already mentioned. SimpleDB is the obvious solution for your use case.
I have an application that talks to several internal and external sources using SOAP, REST services, or plain database stored procedures. Obviously, performance and stability are major issues I am dealing with. Even when the endpoints are performing at their best, for large sets of data I easily see calls that take tens of seconds.
So I am trying to improve the performance of my application by prefetching the data and storing it locally, so that at least the read operations are fast.
While my application is the major consumer and producer of the data, some of it can also be changed from outside my application, which I have no control over. If I used caching, I would never know when to invalidate the cache when such data changes from outside.
So I think my only option is to have a job scheduler running that continually updates the database. I could prioritize the users based on how often they log in and use the application.
I am talking about 50 thousand users and at least 10 endpoints that are terribly slow and can sometimes take a minute for a single call. Would something like Quartz give me the scale I need? And how would I get around the scheduler becoming a single point of failure?
I am just looking for something that doesn't require high maintenance and speeds up at least some of the less complicated subsystems, if not most. Any suggestions?
This does sound like you might need a data warehouse. You would update the data warehouse from the various sources, on whatever schedule was necessary. However, all the read-only transactions would come from the data warehouse, and would not require immediate calls to the various external sources.
This assumes you don't need realtime access to the most up to date data. Even if you needed data accurate to within the past hour from a particular source, that only means you would need to update from that source every hour.
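A hedged sketch of that hourly refresh, pulling from one slow endpoint into a local read store; the endpoint URL, the SQLite target, and the table layout are placeholders, and any RDBMS or ETL tool could play the same role.

```python
import sqlite3
import time

import requests

def refresh_orders():
    # One slow upstream call; assumes the endpoint returns a list of dicts
    # with id/status/total keys (a made-up shape for illustration).
    rows = requests.get("https://internal.example.com/api/orders", timeout=120).json()
    with sqlite3.connect("warehouse.db") as db:
        db.execute(
            "CREATE TABLE IF NOT EXISTS orders (id TEXT PRIMARY KEY, status TEXT, total REAL)"
        )
        db.executemany(
            "INSERT OR REPLACE INTO orders (id, status, total) VALUES (:id, :status, :total)",
            rows,
        )

if __name__ == "__main__":
    while True:               # one slow call per hour instead of one per user request
        refresh_orders()
        time.sleep(3600)
```

All read-only traffic then hits the local store, and only the refresh job ever waits on the slow sources.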
You haven't said what platforms you're using. If you were using SQL Server 2005 or later, I would recommend SQL Server Integration Services (SSIS) for updating the data warehouse. It's made for just this sort of thing.
Of course, depending on your platform choices, there may be alternatives that are more appropriate.
Here are some resources on SSIS and data warehouses. I know you've stated you will not be using Microsoft products. I include these links as a point of reference: these are the products I was talking about above.
SSIS Overview
Typical Uses of Integration Services
SSIS Documentation Portal
Best Practices for Data Warehousing with SQL Server 2008