MongoDB is getting slower and slower - performance

I continuously insert data into MongoDB, using a different collection or database (named by timestamp) for each batch. I delete the oldest data and keep about three days' worth, roughly 200 GB, in Mongo. The mapped and vsize figures keep increasing, but res stays under 10 GB. I have also measured that Mongo's response time keeps growing. Do you know the reason? I would appreciate any insight you can share.

Please make sure that you are using indexes correctly.
For example, if you look users up by the email field, you have to build an index on that field:
db.users.ensureIndex({ email: 1 })
To learn more about indexes, please follow the link: http://docs.mongodb.org/manual/indexes/
The explain() output will also be very useful to you. You can see detailed information about your queries with the following command:
db.users.find({ email: "user@example.com" }).explain()
explain() will tell you a lot about your query. To read more about it, please follow the official documentation: http://docs.mongodb.org/manual/reference/method/cursor.explain/
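For reference, in the legacy explain() format an indexed lookup looks roughly like this (the values below are illustrative, not from your database):
{
    "cursor" : "BtreeCursor email_1",
    "n" : 1,
    "nscanned" : 1,
    "millis" : 0
}
A "cursor" of "BasicCursor" with "nscanned" near the collection's document count means the query is scanning the whole collection instead of using an index.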
So, if you are sure that the indexes are built correctly, please post the output of explain(); it will help us find the problem.

Related

How to look up S&P 500 constituents history, added and removed dates, etc.

I am trying to get a historical list of the S&P 500's underlying stock mix: all tickers, the dates they were added to the S&P 500 index, the dates they were removed, and the resulting mix for each period over the years. I did some searching, but don't seem to have any luck.
If anyone can suggest good search keywords, or a place to look, it would be appreciated; this is something very specific.
I currently use backtrader to work on some data. If there is a systematic way to get the data, please let me know as well.
Many thanks.
You can access this data systematically in QuantRocket, via the data provider Sharadar:
https://www.quantrocket.com/data/?filter=sharadar
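For example, pulling the constituents history into pandas might look like the sketch below. The function names are from memory of the QuantRocket client's quantrocket.fundamental module and the CSV layout is assumed, so check the current docs:
# A sketch, not verified against the current QuantRocket API
from quantrocket.fundamental import collect_sharadar_sp500, download_sharadar_sp500
import pandas as pd

# Fetch the Sharadar S&P 500 constituents history into the local QuantRocket database
collect_sharadar_sp500(country="US")

# Export it to CSV; each row records a ticker with the date it was
# added to or removed from the index
download_sharadar_sp500("sp500_changes.csv")
changes = pd.read_csv("sp500_changes.csv")
print(changes.head())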

Neo4j - count relationships vs store the no. of relationships

I'm enjoying working with Neo4j as I build a social network, and it is working well for me. Please help with these points:
1) I'm stuck deciding whether to store the number of likes on a post (de-normalized) somewhere in the database, or to count the number of edges to that post dynamically every time (see the Cypher sketch below).
For example, when retrieving the "post" JSON data, I would need to count the number of edges every time I generate the JSON, for each user who requests it.
2) I'm stuck deciding on the best way to notify users about likes or comments.
For example, I want to push a notification to the user saying "John and 3 others also commented on Cena's post".
This notification might be updated as the number of comments grows, so it helps to use count(*) rather than storing the counter somewhere, because then I can easily fetch the count for "${count} new replies on your post". But I'm worried about the performance.
3) Can I use Redis or another memcache alongside Neo4j? Does that make a "significant" difference?
Please help me decide which is better.
P.S.: Please keep in mind the efficiency and scalability of the application.
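For point 1, counting likes on the fly is a single Cypher aggregation. A minimal sketch, assuming (:User)-[:LIKES]->(:Post) relationships and a postId property (placeholders for whatever your model actually uses):
// Count the LIKES edges pointing at one post at read time
MATCH (p:Post {postId: $postId})<-[:LIKES]-(:User)
RETURN count(*) AS likes
Neo4j stores per-node relationship degrees, so counting a single relationship type on one node is usually cheap; it is worth benchmarking before de-normalizing the counter.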

Postgres tsvector_update_trigger sometimes takes minutes

I have configured full-text search on a table in my Postgres database. Pretty simple stuff: firstname, lastname and email. This works well and is fast.
However, I sometimes experience very long delays when inserting a new entry into the table: the insert keeps running for minutes and also generates huge WAL files (we use the WAL files for replication).
Is there anything I need to be aware of with my full-text index? Does Postgres perhaps restructure it from time to time for performance reasons? My index is currently around 400 MB.
Thanks in advance!
Christian
Given the size of the WAL files, I suspect you are right that it is an index update/rebalancing that is causing the issue. However, I have to wonder what else is going on.
I would recommend against storing tsvectors in separate columns. A better way is to build an index on to_tsvector()'s output; you can have multiple indexes for multiple languages if you need them. So instead of a trigger that takes, say, a field called description and stores the tsvector in desc_tsvector, I would recommend just doing:
CREATE INDEX mytable_description_tsvector_idx
    ON mytable USING gin (to_tsvector('english', description));
(Index expressions must be immutable, so the two-argument form of to_tsvector() is required here, and GIN is the usual index type for full-text search.)
Now, if you need a consistent search interface across a whole table, there are more elegant ways of doing this using "table methods"; see the sketch after this paragraph.
In general, the functional-index approach has fewer issues associated with it than anything else.
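For illustration, a "table method" here is just a function that takes the table's row type, which Postgres lets you call with attribute notation, as if it were a column. A minimal sketch (the function name search_vector and the 'english' configuration are my own choices):
-- A function over mytable's row type...
CREATE FUNCTION search_vector(mytable) RETURNS tsvector AS $$
    SELECT to_tsvector('english', $1.description);
$$ LANGUAGE sql IMMUTABLE;

-- ...callable with attribute notation:
SELECT * FROM mytable m WHERE m.search_vector @@ to_tsquery('english', 'foo');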
A second thing you should be aware of is partial indexes: if you need to, you can index only the records of interest. For example, if most of my queries only check the last year, I can do the following (a partial-index predicate must also be immutable, so you pin a constant cutoff date rather than calling now(), and re-create the index periodically):
CREATE INDEX mytable_description_tsvector_recent_idx
    ON mytable USING gin (to_tsvector('english', description))
    WHERE created_at > DATE '2014-01-01';
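Note that the planner will only use an expression index when the query repeats the indexed expression, so searches should be written against the same to_tsvector('english', description) call, e.g.:
SELECT *
  FROM mytable
 WHERE to_tsvector('english', description) @@ to_tsquery('english', 'foo & bar');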

Microsoft Access equivalent of explain in MySQL

I'm working on a very large query in an inherited application. It is a large insert query that pulls from 4 tables with well over a million records. I know, I would also rather have this in SQL Server, but there is no infrastructure at this customer to do that :-)
This query has worked for over a year. However, the source tables keep growing, and last week it threw the dreaded 'out of system resources' error. Bummer...!
I think it is possible to optimize this query. Working in MySQL, I would use the EXPLAIN command to see where optimization might be possible. Is there an equivalent in Access? I cannot seem to find it....
kind regards,
Paul
Probably Jet ShowPlan is closest to what you want. You will have to set a registry key; query plan information then gets dumped to a text file named SHOWPLAN.OUT. You can read about the details in this TechRepublic article: Use Microsoft Jet's ShowPlan to write more efficient queries
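For reference, on Jet 4.0 (.mdb files) the switch is usually the registry value below. This is a sketch, so back up your registry first, and note that the ACE engine used by .accdb files keeps its Debug key under the Office "Access Connectivity Engine" branch instead:
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Debug]
"JETSHOWPLAN"="ON"
While the value is ON, SHOWPLAN.OUT is appended in the current working directory each time a query is compiled; set it back to "OFF" when you are done, since the logging slows queries down.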
Also try the Performance Analyzer wizard. You can ask it to examine your query alone, or also the tables and other queries used by that query.
If you haven't compacted the database recently, see whether that improves performance. Compacting also updates index statistics which allows the engine to make better decisions for the query plan.

Informix query performance problem

The following SQL takes 62 seconds to return:
select getCreditBalance(Customerid)
from business_apply
where serialno = '20101013000005'
How can I tune it?
Please tell me the steps in detail; I just want to know what I should do to tune it.
We use IDS 9.04.
Since I am querying through JDBC, I can't see the output of SET EXPLAIN ON. Should I execute the query in dbaccess (with SET EXPLAIN ON)?
My problem is that I can't get the execution plan; once I can get it, I will post it here.
You've not given us very much to work on.
Basic questions
What is the type of the column 'SerialNo'?
If it is a numeric column, don't quote the value you are searching for.
Is there an index on 'SerialNo'?
The index is important; the type is not so important.
Crucial question
What does the getCreditBalance() procedure do?
Auxiliary questions
Which version of Informix are you using? Is it IDS or SE or something else?
When did you last run UPDATE STATISTICS?
Is there a problem connecting to the database, or is it definitely just this query that is slow?
What language are you using to submit the query?
Are there any networks with huge latencies involved?
Which isolation level are you running at?
How big is the Business_Apply table?
What is the size of each row?
How many rows?
Which other tables are accessed by the getCreditBalance() procedure?
How big are they?
Do they have appropriate indexes?
What sort of machine is the Informix server running on?
What does the query plan tell you when you run with SET EXPLAIN on?
Is there any chance you've got a failing disk and the o/s is taking forever to read it?
Make sure there is an index on serialno and tune the code in the getCreditBalance function. Without knowing what that does, it's hard to give you any additional help.
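If you want concrete first steps, the usual sequence from dbaccess looks like this (a sketch: the index name is invented, and on IDS the plan is written to a file named sqexplain.out in the current directory):
-- Index the lookup column, if it is not indexed already
CREATE INDEX ix_business_apply_serialno ON business_apply (serialno);

-- Refresh optimizer statistics so the new index is considered
UPDATE STATISTICS MEDIUM FOR TABLE business_apply;

-- Capture the query plan
SET EXPLAIN ON;
SELECT getCreditBalance(Customerid)
  FROM business_apply
 WHERE serialno = '20101013000005';
SET EXPLAIN OFF;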
