Why is stmt.Close slow? - go

We implemented a ClickHouse driver according to sql/driver with the ConnPrepareContext interface, which uses a stmt to do the real querying work.
For some queries, our driver spends most of its time in stmt.Close(). I can find the query execution time in ClickHouse's system.query_log.
For 10 out of 4000 queries in our production environment, I found they executed very fast in ClickHouse but took far too long in the driver.
Take one query as an example: it ran for 100ms in ClickHouse and returned 0 rows, but the driver spent 10s in the Close() method.
This issue is not easy to reproduce, so I want to ask for help. If this is a database/sql package issue, it should show up not only in my driver but also in other drivers. Has anyone met the same issue?
Can you give some suggestions on how to debug this?
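One way to see where Close() spends its time is to enable Go's block and mutex profiles and look at the stacks that accumulate while the slow queries run. A minimal sketch (the pprof listen address and the placement in main are illustrative, not part of our driver):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/ handlers
	"runtime"
)

func main() {
	runtime.SetBlockProfileRate(1)     // record every blocking event (lock and channel waits)
	runtime.SetMutexProfileFraction(1) // record every mutex contention event

	// Serve the profiles on a side port; inspect them with
	//   go tool pprof http://localhost:6060/debug/pprof/mutex
	//   go tool pprof http://localhost:6060/debug/pprof/block
	go func() { log.Println(http.ListenAndServe("localhost:6060", nil)) }()

	// ... run the workload that shows the slow stmt.Close() here ...
	select {} // placeholder so the sketch keeps serving the profiles
}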

I found some hints.
In one sql.DB.QueryContext() call, our driver will:
step 1: call sql.DB.PrepareContext() to prepare a sql.Stmt,
step 2: call sql.Stmt.QueryContext(), and
step 3: run sql.Stmt.finalClose() as a result of Rows.Close().
In both step 1 and step 2, sql.Stmt acquires a sql.driverConn and appends it to sql.Stmt.css.
In step 3, the sql.Stmt is removed from each sql.driverConn in sql.Stmt.css, which requires taking each sql.driverConn's lock. This causes contention.
In our case, we don't care so much about security, so we will implement the QueryerContext interface from Go's sql/driver to avoid the unnecessary PrepareContext; see the sketch below.
Refer to http://go-database-sql.org/prepared.html
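A minimal sketch of that change, assuming a connection type named clickhouseConn (the type name and the method body are placeholders, not our real driver code):

package chdriver

import (
	"context"
	"database/sql/driver"
	"errors"
)

// clickhouseConn stands in for the driver's real connection type.
type clickhouseConn struct {
	// driver-specific fields (socket, settings, ...)
}

// QueryContext implements driver.QueryerContext. When a connection provides
// this method, database/sql's DB.QueryContext calls it directly instead of
// doing an implicit PrepareContext / Stmt.Close pair, so no sql.Stmt (and no
// sql.Stmt.css bookkeeping) is involved.
func (c *clickhouseConn) QueryContext(ctx context.Context, query string, args []driver.NamedValue) (driver.Rows, error) {
	// Placeholder: a real driver would send the query and args to ClickHouse
	// here and return a driver.Rows over the response.
	return nil, errors.New("not implemented in this sketch")
}

// Compile-time check that the interface is satisfied.
var _ driver.QueryerContext = (*clickhouseConn)(nil)

With this in place, DB.QueryContext makes a single driver call per query instead of the prepare/query/close sequence described above.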

Related

Client application hangs when inserting into table on Oracle using ArrayBinding

Here is our environment:
.Net version: 4.5
Database: Oracle 12.1.0.2 (odp.net)
We are using LLBL "Adapter" but I don't think that has anything to do with the issue
LLBLGen Pro version: 4.1
LLBLGen Pro runtime: 4.1.13.1213
When we do an insert (always into different tables, which we use for a short period and then remove), we use the following code:
int numRecords = strings.Count();
var insertCmd = "insert into " + tableName + " (StringField) values (:StringField)";
var oracleCommand = new OracleCommand();
oracleCommand.CommandText = insertCmd;
oracleCommand.CommandType = CommandType.Text;
oracleCommand.BindByName = true;
oracleCommand.ArrayBindCount = numRecords;
oracleCommand.Parameters.Add(":StringField", OracleDbType.NVarchar2, strings.ToArray(), ParameterDirection.Input);
// this is an LLBL adapter. Like I said, I think the issue is below the LLBL layer.
this.adapter.ExecuteActionQuery(new ActionQuery(oracleCommand));
When the database is getting hit hard with multiple of these inserts in parallel, we get the following error and the insert call never returns from the database.
WG_6.Index_586.TVD: An exception was caught during the execution of an action query: ORA-24381: error(s) in array DML
ORA-12592: TNS:bad packet
ORA-12592: TNS:bad packet
ORA-12592: TNS:bad packet
ORA-12592: TNS:bad packet
ORA-03111: break received on communication channel
ORA-03111: break received on communication channel
ORA-03111: break received on communication channel
On the database, using Toad's session browser, I can see that the "Current Statement" is correct.
insert into schemaX.tableY(StringField) values(:Stringfield)
Under the Waits tab in Toad, there is the following message:
“Waiting for SQL*Net more data from client - waited X hundred seconds, so far” and the X keeps incrementing until we hit our database timeout.
We tried with batches of 1 million and this gave us the best performance for our scenario. However, this hanging issue arose. I then decreased the ArrayBindCount to 500K, 100K, 50K, 10K and then 5K. Only when I used 5K did it stop happening.
A couple of notes:
This happens more frequently when the database is on a different physical machine than the client. When using a local VM, it rarely happens. The network that we are using is generally very reliable with no other noted issues.
From the error message (ORA-12592: TNS:bad packet), it seems that the issue might be on the client, perhaps related to code in the "Oracle.DataAccess.Client" (ODAC) dll.
My next steps for troubleshooting are to use Reflector to debug the call from the ODAC code and also to get more reliable client side tracing while forcing this error to occur.
I had the same situation when trying to insert into an Oracle table using ArrayBinding.
Using a small number for oracleCommand.ArrayBindCount seemed to reduce the frequency of the errors (the same ones as yours), but did not eliminate them.
The solution was to use the managed data access provider. I suggest you get the latest ODP.NET, add a reference to ManagedDataAccess, and change to:
using Oracle.ManagedDataAccess.Client;
using Oracle.ManagedDataAccess.Types;
This fixed the problem in my case, with no need to change anything else in the code.

How to Fix Read timed out in Elasticsearch

I used Elasticsearch-1.1.0 to index tweets.
The indexing process is okay.
Then I upgraded the version. Now I use Elasticsearch-1.3.2, and I get this message randomly:
Exception happened: Error raised when there was an exception while talking to ES.
ConnectionError(HTTPConnectionPool(host='127.0.0.1', port=8001): Read timed out. (read timeout=10)) caused by: ReadTimeoutError(HTTPConnectionPool(host='127.0.0.1', port=8001): Read timed out. (read timeout=10)).
Snapshot of the randomness:
Happened --33s-- Happened --27s-- Happened --22s-- Happened --10s-- Happened --39s-- Happened --25s-- Happened --36s-- Happened --38s-- Happened --19s-- Happened --09s-- Happened --33s-- Happened --16s-- Happened
--XXs-- = after XX seconds
Can someone point out on how to fix the Read timed out problem?
Thank you very much.
It's hard to give a direct answer since the error you're seeing might be associated with the client you are using. However, a solution might be one of the following:
1. Increase the default timeout globally when you create the ES client by passing the timeout parameter. Example in Python:
es = Elasticsearch(timeout=30)
2. Set the timeout per request made by the client. Taken from the Elasticsearch Python docs:
# only wait for 1 second, regardless of the client's default
es.cluster.health(wait_for_status='yellow', request_timeout=1)
The above will give the cluster some extra time to respond.
Try this:
es = Elasticsearch(timeout=30, max_retries=10, retry_on_timeout=True)
It might not fully avoid ReadTimeoutError, but it minimizes them.
Read timeouts can also happen when query size is large. For example, in my case of a pretty large ES index size (> 3M documents), doing a search for a query with 30 words took around 2 seconds, while doing a search for a query with 400 words took over 18 seconds. So for a sufficiently large query even timeout=30 won't save you. An easy solution is to crop the query to the size that can be answered below the timeout.
For what it's worth, I found that this seems to be related to a broken index state.
It's very difficult to reliably recreate this issue, but I've seen it several times; operations run as normal except certain ones which periodically seem to hang ES (specifically refreshing an index it seems).
Deleting an index (curl -XDELETE http://localhost:9200/foo) and reindexing from scratch fixed this for me.
I recommend periodically clearing and reindexing if you see this behaviour.
Increasing various timeout options may immediately resolve issues, but does not address the root cause.
Provided the Elasticsearch service is available and the indexes are healthy, try increasing the Java minimum and maximum heap sizes: see https://www.elastic.co/guide/en/elasticsearch/reference/current/jvm-options.html.
TL;DR: edit /etc/elasticsearch/jvm.options and set -Xms1g and -Xmx1g.
You should also check that everything is fine with Elasticsearch itself. Some shards can be unavailable; here is a nice doc about possible reasons for unassigned shards: https://www.datadoghq.com/blog/elasticsearch-unassigned-shards/

python slow to check if mongodb record found

I have a Python (3.2) request that goes to MongoDB, and the request itself runs fast enough. When I then perform an if-statement check to see whether any records were found, it takes 50 times as long:
Line # Hits Time Per Hit % Time Line Contents
==============================================================
58 27623 6475988 234.4 1.7 itemInDB = db.mainData.find({"x":item[x]}).limit(1)
59
60 #existing item in db
61 27623 293419802 10622.3 77.6 if itemInDB.count():
What on earth is the cause of that if statement taking so long?! I presume there must be a better way to check whether a record was found, but Google has come up empty.
Thanks for the help.
Perhaps a Better Way
If you're only interested in returning one value, you might want to use find_one instead of find. It will stop looking for values after one has been found, as opposed to find, which has to run through the collection:
itemInDB = db.mainData.find_one({"x":item[x]})
if itemInDB:
print("Item found")
else:
print("Item not found")
For Your Example
According to the PyMongo docs, when querying the count of a cursor, you can pass in a parameter (True or False) to take into account any skip or limit calls previously made to the cursor. The default for that parameter is False (namely, not taking those calls into account). That may be affecting the performance of your count query.
Gauging Query Performance
If you want to see how your query will be carried out by mongo, you can call explain on your cursor:
db.coll.find({"x":4}).explain()
The explain function is also implemented in PyMongo.
Turns out it was due to the find() function and not the if statement. I created an index on "x" (as I should have anyway). Changed the find to find_one and removed the .count() from the if statement. Overall 75% faster.

Is there a way to fix the initial Go/Golang FreeTDS+ODBC call being slow?

I'm using the ODBC driver with FreeTDS by Brainman (http://code.google.com/p/odbc), and I'm noticing that my initial DB stored procedure call is slow, while the subsequent ones are fast.
Here are the results of a test:
=== RUN TestLoadElitePartner
ELAPSED TrafficCheck: 456.0261ms
ELAPSED LoadEliteInfo: 123.0071ms
ELAPSED LiveFeed: 128.0073ms
--- PASS: TestLoadElitePartner (0.71 seconds)
PASS
The first call is 456ms, and the subsequent calls are more realistic at ~125ms each, but still a bit slow. If I SWAP the order in which these three DB queries are called, the first one is always slow (~450ms) and the subsequent ones faster (~125ms), presumably due to connection pooling.
Is there a way to FIX the initial call being THAT slow? 450ms does not seem reasonable to me.
My DB connection string points directly to an IP address, so this is not a DNS issue. I've read on http://freetds.schemamania.org/userguide/seemtooslow.htm that DNS and logging are the main reasons FreeTDS can be slow, but I do not have logging enabled and I am connecting directly by IP. Is there anything else I can do to optimize my ODBC/FreeTDS setup?
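One thing that may help is paying the connection cost at startup instead of on the first user-facing call. A sketch, assuming database/sql with the odbc driver linked above (the DSN is a placeholder, and the import path and registered driver name follow that project; adjust if your version differs):

package main

import (
	"database/sql"
	"log"

	_ "code.google.com/p/odbc" // FreeTDS/ODBC driver referenced above; registers itself as "odbc"
)

func main() {
	db, err := sql.Open("odbc", "DSN=mydb;UID=user;PWD=secret") // placeholder connection string
	if err != nil {
		log.Fatal(err)
	}
	db.SetMaxIdleConns(4) // keep established connections around for reuse

	// Ping forces a real connection (TCP plus TDS login handshake) to be made
	// now, so the first stored-procedure call does not absorb that cost.
	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
}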

can't reproduce/verify the performance claims in graph databases and neo4j in action books

UPDATE: I have put up a follow-up question that contains updated scripts and a clearer setup on neo4j performance compared to mysql (how can it be improved?). Please continue there. /UPDATE
I have some problems verifying the performance claims made in the "graph databases" book (page 20) and in the neo4j in action book (chapter 1).
To verify these claims I created a sample dataset of 100000 'person' entries with 50 'friends' each, and tried to query for e.g. friends 4 hops away. I used the very same dataset in mysql. With friends of friends over 4 hops, mysql returns in 0.93 secs, while neo4j needs 65-75 secs (on repeated calls).
How can I improve this miserable outcome, and verify the claims made in the books?
A bit more detail:
I run the whole setup on an i5-3570K with 16GB RAM, using Ubuntu 12.04 64-bit, Java "1.7.0_25", mysql 5.5.31 and neo4j-community-2.0.0-M03 (I get a similar outcome with 1.9).
All code/sample data can be found on https://github.com/jhb/neo4j-experiements/ (to be used with 2.0.0). The resulting sample data in different formats can be found on https://github.com/jhb/neo4j-testdata.
To use the scripts you need a python with mysql-python, requests and simplejson installed.
the dataset is created with friendsdata.py and stored to friends.pickle
friends.pickle gets imported to neo4j using import_friends_neo4j.py
friends.pickle gets imported to mysql using import_friends_mysql.py
I added indexes on t_user_friend.* in mysql
I added "create index on :node(noscenda_name)" in neo4j
To make life a bit easier the friends.*.bz2 contain sql and cypher statements to create those datasets in mysql and neo4j 2.0 M3.
Mysql performance
I first warm mysql up by querying:
select count(distinct name) from t_user;
select count(distinct name) from t_user;
Then, for the real measurement I do
python query_friends_mysql.py 4 10
This creates the following sql statement (with changing t_user.names):
select
count(*)
from
t_user,
t_user_friend as uf1,
t_user_friend as uf2,
t_user_friend as uf3,
t_user_friend as uf4
where
t_user.name='person8601' and
t_user.id = uf1.user_1 and
uf1.user_2 = uf2.user_1 and
uf2.user_2 = uf3.user_1 and
uf3.user_2 = uf4.user_1;
and repeats this 4-hop query 10 times. The queries need around 0.95 secs each. Mysql is configured to use a key_buffer of 4G.
neo4j performance testing
I have modified neo4j.properties:
neostore.nodestore.db.mapped_memory=25M
neostore.relationshipstore.db.mapped_memory=250M
and the neo4j-wrapper.conf:
wrapper.java.initmemory=2048
wrapper.java.maxmemory=8192
To warm up neo4j I do
start n=node(*) return count(n.noscenda_name);
start r=relationship(*) return count(r);
Then I start using the transactional http endpoint (but I get the same results using the neo4j-shell).
Still warming up, I run
./bin/python query_friends_neo4j.py 3 10
This creates a query of the form (with varying person ids):
{"statement": "match n:node-[r*3..3]->m:node where n.noscenda_name={target} return count(r);", "parameters": {"target": "person3089"}
after the 7th call or so each call needs around 0.7-0.8 secs.
Now for the real thing (4 hops) I do
./bin/python query_friends_neo4j.py 4 10
creating
{"statement": "match n:node-[r*4..4]->m:node where n.noscenda_name={target} return count(r);", "parameters": {"target": "person3089"}
and each call takes between 65 and 75 secs.
Open questions/thoughts
I'd really like to see the claims in the books be reproducible and correct, with neo4j faster than mysql instead of orders of magnitude slower.
But I don't know what I am doing wrong... :-(
So, my big hopes are:
I didn't do the memory settings for neo4j correctly
The query I use for neo4j is completely wrong
Any suggestions to get neo4j up to speed are highly welcome.
Thanks a lot,
Joerg
2.0 has not been performance optimized at all, so you should use 1.9.2 for comparison.
(If you use 2.0 - did you create an index for n.noscenda_name?)
You can check the query plan with profile start ....
With 1.9 please use a manual index or node_auto_index for noscenda_name.
Can you try these queries:
start n=node:node_auto_index(noscenda_name={target})
match n-->()-->()-->m
return count(*);
Fulltext indexes are also more expensive than exact indexes, so keep the exact auto-index for noscenda_name.
I can't get your importer to run; it fails at some point. Perhaps you can share the finished neo4j database.
python importer.py
reading rels
reading nodes
delete old
Traceback (most recent call last):
File "importer.py", line 9, in <module>
g.query('match n-[r]->m delete r;')
File "/Users/mh/java/neo/neo4j-experiements/neo4jconnector.py", line 99, in query
return self.call(payload)
File "/Users/mh/java/neo/neo4j-experiements/neo4jconnector.py", line 71, in call
self.transactionurl = result.headers['location']
File "/Library/Python/2.7/site-packages/requests-1.2.3-py2.7.egg/requests/structures.py", line 77, in __getitem__
return self._store[key.lower()][1]
KeyError: 'location'
Just to add to what Michael said, in the book I believe the authors are referring to a comparison that was done in the Neo4j in Action book - it's described in the free first chapter of that book.
At the top of page 7 they explain that they were using the Traversal API rather than Cypher.
I think you'll struggle to get Cypher near that level of performance at the moment so if you want to do those types of queries you'll want to use the Traversal API directly and then perhaps wrap it in an unmanaged extension.
