Is it normal that CockroachDB Serverless uses 500K RUs in 19 hours with no connections? - cockroachdb

I set up a CockroachDB Serverless cluster for a school project. The only thing I have done is create one database with one table containing 6 rows, but when I look at the dashboard I have already used 500K RUs. This seems like a huge amount to me, but I'm new to cloud databases, so I don't know whether this is normal behavior or not. I'm just worried I will run out of RUs without doing anything on the database. The attached graph shows the RU usage over a period when there were no connections and the console wasn't open. Can anyone clarify this for me?

I think this explanation is more likely to be the reason:
https://www.cockroachlabs.com/docs/cockroachcloud/serverless-faqs.html#my-cluster-doesnt-have-any-current-co[…]ing-rus-when-there-are-no-connections
To summarize, the monitoring console uses up some RUs. So if you have a browser tab open with the console, it will use RUs even if you don't have any connections open.
As that FAQ says, the console can use roughly 8 RUs per second, so the fix is simply not to leave the console tab open. Over 19 hours that adds up to roughly 550,000 RUs, right in line with what you're seeing.
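Spelled out, the arithmetic is:
8 RU/s × 3,600 s/hour × 19 hours = 547,200 RUs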
On the stats point, note that auto-stats collection is only triggered when data in the table changes.

I believe what you're seeing is automatic metrics collection. You can read more about it in this FAQ.

Related

Hadoop vs Cassandra: Which is better for the following scenario?

There is a situation in our systems in which the user can view and "close" a report. After they close it, the report is moved to a temporary table inside the database where it is kept for 24 hours, and then moved to an archives table (where the report is stored for the next 7 years). At any point during those 7 years, a user can "reopen" the report and work on it. The problem is that the archives storage is getting large, and finding/reopening reports tends to be time-consuming. I also need to get statistics on the archives from time to time (e.g. report dates, clients, average length "opened", etc.). I want to use a big data approach but I am not sure whether to use Hadoop, Cassandra, or something else. Can someone provide some guidelines on how to get started and decide what to use?
If your archive is large and you'd like to get reports from it, you won't be able to use just Cassandra, as it has no easy means of aggregating the data. You'll end up co-locating Hadoop and Cassandra on the same nodes.
In my experience, archives (write once, read many) are not the best use case for Cassandra if you have a lot of writes (we tried it as the backend for a backup system). Depending on your compaction strategy, you'll pay either in space or in IOPS for it: added changes are propagated through the SSTable hierarchy, resulting in many more writes than the original change.
It is not possible to answer your question in full without knowing other variables: how much hardware (servers, and their RAM/CPU/HDD/SSD) are you going to allocate? What is the size of each 'report' entry? How many reads/writes do you usually serve daily? How large is your archive storage now?
Cassandra might work fine. Keep two tables, reports and reports_archive. Define the schema using a TTL of 24 hours and 7 years:
CREATE TABLE reports (
    ...
) WITH default_time_to_live = 86400              -- 24 hours
  AND compaction = {'class': 'TimeWindowCompactionStrategy'};

CREATE TABLE reports_archive (
    ...
) WITH default_time_to_live = 220752000          -- 7 years (86400 * 365 * 7); the option takes a literal value
  AND compaction = {'class': 'TimeWindowCompactionStrategy'};
Use the new Time Window Compaction Strategy (TWCS), shown above, to minimize write amplification. It can also be advantageous to store the report metadata and the report binary data in separate tables.
For roll-up analytics, use Spark with Cassandra. You don't mention the size of your data, but roughly speaking 1-3 TB per Cassandra node should work fine. Using RF=3 you'll need at least three nodes.

access-2013 report opens very slow

I have a weird problem here with a report which I use every day.
I moved from XP to Windows 7 some time ago and use Access 2013.
(The language is German, so sorry, I can only guess what the views are called in English.)
"Suddenly" (I really can't say when this started), opening the report in "report view" takes VERY long, around 1 minute or so. Then switching to "page view" and formatting the report takes only 2 or 3 seconds. Switching back to report view again takes 1 minute.
The report has a complex query as its data source (in fact, a UNION of 8 sub-queries). Opening this query on its own displays the data after 1 second, which is OK.
All tables are "linked" from the same ODBC data source, which points to a MySQL server on our network.
For further testing, I opened every table the queries use, one after another. Opening each of these tables takes around 9 seconds, and it doesn't matter whether it's a small or big table. Always these 9 seconds.
The ODBC data source is defined using the IP address of the server, not the name, so I don't consider it a name server problem or timeout.
What could cause this slowdown when opening tables?
I'm puzzled.
Here are a few steps I would try:
- Take a fresh copy of the Access app that runs on one of those "fast clients" and see if that solves the issue.
- Compare performance with a fast client after setting the same default printer.
- Check the version of the ODBC driver on both machines, and if you rely on a DSN, compare the DSNs as well.

CouchDB - Mobile application architecture - Replication performance

I built a mobile application based on CouchDB.
For security reasons, I have to make sure that a document can be read only by the users allowed to do so. As I cannot manage access rights at the document level, I create one CouchDB database per user, and I replicate documents from my main CouchDB database into each user database with a filtered replication.
This model works very well, but today I faced huge performance issues.
I tried to make all my replications continuous, filtered and bidirectional, but after 80 users (so 81 databases and 160 simultaneous continuous replications) there were too many replications, and my CouchDB service started to slow down and even crashed sometimes. Note that all the databases are on the same server (and I cannot have more than one server).
I tried to put in place "manual" replications, but even this way, when I need to replicate a document from my main database to all 80 user databases, each filtered replication from the main database to a user database takes around 30 seconds.
Maybe I have an issue with my replication filter: for each document I store a list of users allowed to see it. As each user has their own database, I replicate into it only the documents that user is allowed to see. Here is my replication filter function:
function(doc, req) {
    if (doc.userList) {
        if (doc.userList.indexOf(req.query.username) > 1) {
            return true;
        }
    }
    return false;
}
The goal of my application is to reach around 1000 users, which is totally impossible with the current architecture/performance.
I have three questions:
1. Even if I suspect it's not possible: is it possible to have about 1000 databases in continuous replication on the same server?
2. Is there anything wrong with my replication filter? Is there any way to improve it to get faster database replications?
3. If the current architecture is not good at all, what kind of architecture would you advise in my case?
Thank you very much!
We finally changed our overall project architecture.
The main server cannot handle more than about 100 replicated databases even if the configuration limits are raised; after 80 synchronized databases the CouchDB logs start to explode. I may be wrong, but I think this kind of architecture is simply not possible on a single server.
Here is the solution we put in place.
We removed all the per-user databases, plugged all our mobile applications directly into the main database, and do a filtered replication directly against the main database (http://pouchdb.com/api.html#replication), using "Example 3: filter function inside of a design document" from that page.
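For reference, a minimal sketch of that kind of filtered pull replication with PouchDB (the database URL, the app/by_user design-document filter and the username value are placeholder names, not our real ones):

var PouchDB = require('pouchdb');

var localDB  = new PouchDB('local_reports');                   // on-device database
var remoteDB = new PouchDB('https://couch.example.com/main');  // the single main CouchDB database

// Pull only the documents the server-side filter lets through for this user.
localDB.replicate.from(remoteDB, {
  live: true,                           // keep replicating as new changes arrive
  retry: true,                          // reconnect automatically after network errors
  filter: 'app/by_user',                // filter function stored in a design document on the server
  query_params: { username: 'alice' }   // available as req.query.username inside the filter
}).on('error', function (err) {
  console.error('replication error', err);
});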
This new model is now working; we did some stress tests and didn't hit any issue up to 1000 simultaneous users.
Just be aware that to replicate a database, PouchDB asks CouchDB for all the modifications applied to the main database since the last synchronisation (even for filtered replication). So when you create a new PouchDB database and synchronise it, if your main CouchDB is old and has a long change history (check the CouchDB _changes API), it can take a very (very) long time!
Step 0 is always to identify the bottleneck. My first guess, based on the scenario you outlined, would be to look at I/O performance. Check out
GET /_stats/couchdb
and
GET /_active_tasks
Each database gets its own read and write file descriptors, so as the number of open databases on the server increases, so do the I/O resources required. Hope this helps.
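For example, a quick way to poll both endpoints from Node (localhost:5984 with no authentication is an assumption; adjust for your setup):

var http = require('http');

// Print the server-wide stats and the currently running tasks (replications, compactions, ...).
['/_stats/couchdb', '/_active_tasks'].forEach(function (path) {
  http.get({ host: 'localhost', port: 5984, path: path }, function (res) {
    var body = '';
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () { console.log(path, body); });
  });
});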

Realistic Data Backup method for Parse.com

We are building an iOS app with Parse.com, but still can't figure out the right way to back up data efficiently.
As a premise, we have and will have a LOT of rows in the data store.
Say we have a class with 1 million rows; assume we have it backed up and then want to bring it back to Parse after a hazardous situation (like data loss in production).
The few solutions we have considered are the following:
1) Use external server for backup
BackUp:
- use the REST API to constantly back up data to a remote MySQL server (we chose MySQL for customized analytics purposes, since it's way faster and easier for us to handle data with MySQL)
ImportBack:
a) - recreate JSON objects from MySQL backup and use the REST API to send back to Parse.
Say we use the batch operation, which permits 50 objects to be created with one request, and assume each request takes 1 second; 1 million records will then take about 5.5 hours to transfer back to Parse.
b) - recreate one JSON file from the MySQL backup and use the Dashboard to import data manually.
We just tried this method with a 700,000-record file: it took about 2 hours for the loading indicator to stop and show the number of rows in the left pane, but the right pane never opens (it says "operation timed out"), and it's been over 6 hours since the upload started.
So we can't rely on 1.b, and 1.a seems to take too long to recover from a disaster (if we have 10 million records, it'll be around 55 hours, i.e. about 2.3 days).
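For reference, those estimates come from:
1,000,000 objects ÷ 50 objects per request = 20,000 requests ≈ 20,000 s ≈ 5.5 hours
10,000,000 objects ÷ 50 objects per request = 200,000 requests ≈ 200,000 s ≈ 55 hours ≈ 2.3 days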
Now we are thinking about the following:
2) Constantly replicate data to another app
Create the following in Parse:
- Production App: A
- Replication App: B
So while A is in production, every single query will be duplicated to B (using a background job running constantly).
The downside, of course, is that it will eat into A's burst limit, as it simply doubles the number of requests. So it's not ideal when thinking about scaling up.
What we want is something like AWS RDS, which gives an option to automatically back up daily.
I wonder how this could be difficult for Parse, since it's built on AWS infrastructure.
Please let me know if you have any idea on this, will be happy to share know-hows.
P.S.:
We've noticed an important flaw in idea 2) above.
If we replicate using the REST API, all the objectIds of all classes will change, so every 1-to-1 or 1-to-many relation will be broken.
So we are thinking about adding a uuid field to every class.
Is there any problem with this method?
One thing we want to keep working is
query.include("ObjectName")
(or includeKey in Obj-C),
but I suppose that won't be possible if we don't base our app logic on objectId.
We're looking for a workaround for this issue, but will uuid-based management work under Parse's datastore logic?
Parse has never lost production data. While we don't currently offer automated backups, you can request one any time you like, and we're working on making all of this even nicer. Additionally, it's easier in most cases to import the JSON export file through the data browser rather than using the REST batch.
I can confirm that today Parse did lose my data. Or at least it appeared that way.
After several errors were detected on multiple apps (acknowledged by the Parse Status Twitter account), we could not retrieve data for one app, without any error being reported.
It was because an entire column of one of our classes (a pointer type) disappeared and the data was no longer present in the dashboard.
We use this pointer column to filter/retrieve data, so the returned queries and collections were empty.
So we decided to recreate the column manually. By chance, recreating the column with the same name and type solved the issue, and the data was still there... I can't explain it, but I really thought the data was lost, and the app behaved as if it were.
So an automated backup and restore feature is mandatory, not optional.
In December 2015, parse.com released a new dashboard with an improved export feature.
Just select your app and click "App Settings" -> "General" -> "Export app data". Parse generates a JSON file for every class in your app and emails you when the export is done.
UPDATE:
Sad but true, parse.com is winding down: http://blog.parse.com/announcements/moving-on/
I had the same issue of backing up Parse Server data. Since Parse Server uses MongoDB, backing up the data is not really an issue: I just downloaded the MongoDB dump from the server, and then restored it using
mongorestore /path-to-mongodump (extracted files)
As Parse has been open-sourced, we can adopt this technique.
For accidental deletes, writing a 'beforeDelete' Cloud Code function to back up the current row to another class would work.
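For example, a minimal sketch of such a hook using the parse.com-era Cloud Code API (the class names Report and ReportBackup are made up for illustration):

Parse.Cloud.beforeDelete("Report", function(request, response) {
  // Copy the row that is about to be deleted into a separate backup class.
  var backup = new Parse.Object("ReportBackup");
  backup.set("original", request.object.toJSON());
  backup.save(null, {
    success: function() { response.success(); },                                    // allow the delete to proceed
    error: function(error) { response.error("backup failed: " + error.message); }   // block the delete if the backup failed
  });
});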
For regular backups, a manual export of changed records (use a filter) will be useful. For recovery this requires you to write scripts or use the import option (not so sure) in the data browser. You could also write a cloud function to replicate data to your backup server (haven't tried this yet).
However there are some limitations to cloud code that you should consider before venturing into it:
https://parse.com/docs/cloud_code_guide#functions-resource

How many Queries per second is normal for this site?

I have a small site running the Flynax classifieds software. I get 10-15 concurrent users at most. Sometimes I get a very high load average that results in outages and downtime on my server.
I run
root#host [~]# mysqladmin proc stat
and I see this :
Uptime: 111346 Threads: 2 Questions: 22577216 Slow queries: 5159 Opens: 395 Flush tables: 1 Open tables: 285 Queries per second avg: 202.766
Is 202.766 queries per second normal for a small site like mine?!
The hosting company says my app is poorly coded and must be optimized.
The Flynax developers say the server is weak and must be replaced.
I'm not sure what to do; any help is much appreciated.
202.766 queries per second isn't normal for a small website like the one you described.
Probably some queries are running in a loop, and that is why you see such statistics.
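To put that figure in perspective (a rough calculation using the 10-15 concurrent users mentioned in the question):
202.766 queries/s = 22,577,216 questions ÷ 111,346 s of uptime
202.766 queries/s ÷ 15 users ≈ 13.5 queries per second per user
If each user loads a page every 10-30 seconds, that is on the order of 135-400 queries per page load.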
As far as I know, the latest Flynax versions have a MySQL debug option; using it
you can see how many queries run on each page and how long each query takes.
Cheers
