Managing shared resources with threads in Huey - thread-safety

I have to update many rows (increment one value in each rows) in peewee database (SqliteDatabase). Some objects can be uncreated so I have to create them with default values before working with them. I would use ways which are in peewee docs (Atomic updates) but I couldn't figure out how to mix model.get_or_create() and in [my_array].
So I decided to make queries in a transaction to commit it once at the end (I hope it does).
Why I'm writting in stack overflow is because I don't know how to work with db.atomic() with threading (I tested with 4 workers) in Huey because .atomic() locks the connection (peewee.OperationalError: database is locked). I've tried to use #huey.lock_task but it's not a solution of my problem as I've found.
Code of my class:
class Article(Model):
name = CharField()
mention_number = IntegerField(default=0)
class Meta:
database = db
Code of my task:
#huey.task(priority=30)
def update(names): # "names" is a list of strings
with db.atomic():
for name in names:
article, success = Article.get_or_create(name=name)
article.mention_number += 1
article.save()

Well, if you're using a recent version of Sqlite (3.24 or newer) you can use Postgres-style upsert queries. This is well supported by Peewee: http://docs.peewee-orm.com/en/latest/peewee/api.html#Insert.on_conflict
To answer the other question about shared resources, it's not clear from your example what you would like to happen... Sqlite only allows one write transaction at a time. So if you are running several threads, only one of them may be writing at any given time.
Peewee stores database connections in a thread local, so Peewee databases can be safely used in multithreaded applications.
You didn't mention why huey lock_task wouldn't work.
Another suggestion is to try using WAL-mode with Sqlite, as WAL-mode allows multiple reader transactions to co-exist with a single writer.

Related

Where is the best place to store an application setting that needs to be updated frequently in ServiceNow

I have a scheduled script execution that needs to persist a value between runs. It is updated with each run. Using gs.setProperty seemed like the natural place until I came across this:
Care should be taken when setting system properties (sys_properties)
using this method as it causes a system-wide cache flush. Each flush
can cause system degradation while the caches rebuild. If a value must
be updated often, it should not be stored as a system property. In
general, you should only place values in the sys_properties table that
do not frequently change.
Creating a separate table to store a single scalar value seems like overkill. Is there a better place to store it?
You could set a preference if you need it in the instance. Another place could be the events table. Log the event with the data in parm1 or parm2 and on next run query the most recent event.
I'd avoid making a table as that has cost implications for some clients. I agree with the sys_properties.
var encrypter = new GlideEncrypter();
var encrypted = encrypter.encrypt('Super Secret Phrase');
gs.info('encrypted: ' + encrypted);
var decrypted = encrypter.decrypt(encrypted);
gs.info('decrypted: ' + decrypted);
/**
*** Script: encrypted: g/bXLJHa7xNRMKZEo5q/YtLMEdse36ED
*** Script: decrypted: Super Secret Phrase
*/
This way only administrators could really read this data. Also if I recall correctly, the sysevent table is cleared after 7 days. You could have the job remove the event as soon as it has it in memory.

Concurrency in Redis for flash sale in distributed system

I Am going to build a system for flash sale which will share the same Redis instance and will run on 15 servers at a time.
So the algorithm of Flash sale will be.
Set Max inventory for any product id in Redis
using redisTemplate.opsForValue().set(key, 400L);
for every request :
get current inventory using Long val = redisTemplate.opsForValue().get(key);
check if it is non zero
if (val == null || val == 0) {
System.out.println("not taking order....");
}
else{
put order in kafka
and decrement using redisTemplate.opsForValue().decrement(key)
}
But the problem here is concurrency :
If I set inventory 400 and test it with 500 request thread,
Inventory becomes negative,
If I make function synchronized I cannot manage it in distributed servers.
So what will be the best approach to it?
Note: I can not go for RDBMS and set isolation level because of high request count.
Redis is monothreaded, so running a Lua Script on it is always atomic.
You can define then a Lua script on your Redis instance and running it from your Spring instances.
Your Lua script would just be a sequence of operations to execute against your redis instance (the only one to have the correct value of your stock) and returns the new value for instance or an error if the value is negative.
Your Lua script is basically a Redis transaction, there are other methods to achieve Redis transaction but IMHO Lua is the simplest above all (maybe the least performant, but I have found that in most cases it is fast enough).

can't reproduce/verify the performance claims in graph databases and neo4j in action books

UPDATE I have put up a follow up question that contains updated scripts and and a clearer setup on neo4j performance compared to mysql (how can it be improved?). Please continue there./UPDATE
I have some problems verifying the performance claims made in the "graph databases" book (page 20) and in the neo4j (chapter 1).
To verify these claims I created a sample dataset of 100000 'person' entries with 50 'friends' each, and tried to query for e.g. friends 4 hops away. I used the very same dataset in mysql. With friends of friends over 4 hops mysql returns in 0.93 secs, while neo4j needs 65 -75 secs (on repeated calls).
How can I improve this miserable outcome, and verify the claims made in the books?
A bit more detail:
I run the whole setup on a i5-3570K with 16GB Ram, using ubuntu12.04 64bit, java version "1.7.0_25" and mysql 5.5.31, neo4j-community-2.0.0-M03 (I get a similar outcome with 1.9)
All code/sample data can be found on https://github.com/jhb/neo4j-experiements/ (to be used with 2.0.0). The resulting sample data in different formats can be found on https://github.com/jhb/neo4j-testdata.
To use the scripts you need a python with mysql-python, requests and simplejson installed.
the dataset is created with friendsdata.py and stored to friends.pickle
friends.pickle gets imported to neo4j using import_friends_neo4j.py
friends.pickle gets imported to mysql using import_friends_mysql.py
I add indexes on t_user_friend.* in mysql
I added "create index on :node(noscenda_name) in neo4j
To make life a bit easier the friends.*.bz2 contain sql and cypher statements to create those datasets in mysql and neo4j 2.0 M3.
Mysql performance
I first warm mysql up by querying:
select count(distinct name) from t_user;
select count(distinct name) from t_user;
Then, for the real meassurment I do
python query_friends_mysql.py 4 10
This creates the following sql statement (with changing t_user.names):
select
count(*)
from
t_user,
t_user_friend as uf1,
t_user_friend as uf2,
t_user_friend as uf3,
t_user_friend as uf4
where
t_user.name='person8601' and
t_user.id = uf1.user_1 and
uf1.user_2 = uf2.user_1 and
uf2.user_2 = uf3.user_1 and
uf3.user_2 = uf4.user_1;
and repeats this 4 hop query 10 times. The queries need around 0.95 secs each. Mysql is configured to use a key_buffer of 4G.
neo4j performance testing
I have modified neo4j.properties:
neostore.nodestore.db.mapped_memory=25M
neostore.relationshipstore.db.mapped_memory=250M
and the neo4j-wrapper.conf:
wrapper.java.initmemory=2048
wrapper.java.maxmemory=8192
To warm up neo4j I do
start n=node(*) return count(n.noscenda_name);
start r=relationship(*) return count(r);
Then I start using the transactional http endpoint (but I get the same results using the neo4j-shell).
Still warming up, I run
./bin/python query_friends_neo4j.py 3 10
This creates a query of the form (with varying person ids):
{"statement": "match n:node-[r*3..3]->m:node where n.noscenda_name={target} return count(r);", "parameters": {"target": "person3089"}
after the 7th call or so each call needs around 0.7-0.8 secs.
Now for the real thing (4 hops) I do
./bin/python query_friends_neo4j.py 4 10
creating
{"statement": "match n:node-[r*4..4]->m:node where n.noscenda_name={target} return count(r);", "parameters": {"target": "person3089"}
and each call takes between 65 and 75 secs.
Open questions/thoughts
I'd really like see the claims in the books to be reproducable and correct, and neo4j faster then mysql instead of magnitudes slower.
But I don't know what I am doing wrong... :-(
So, my big hopes are:
I didn't do the memory settings for neo4j correctly
The query I use for neo4j is completely wrong
Any suggestions to get neo4j up to speed are highly welcome.
Thanks a lot,
Joerg
2.0 has not been performance optimized at all, so you should use 1.9.2 for comparison.
(if you use 2.0 - did you create an index for n.noscenda_name)
You can check the query plan with profile start ....
With 1.9 please use a manual index or node_auto_index for noscenda_name.
Can you try these queries:
start n=node:node_auto_index(noscenda_name={target})
match n-->()-->()-->m
return count(*);
Fulltext indexes are also more expensive than exact indexes, so keep the exact auto-index for noscenda_name.
can't get your importer to run, it fails at some point, perhaps you can share the finished neo4j database
python importer.py
reading rels
reading nodes
delete old
Traceback (most recent call last):
File "importer.py", line 9, in <module>
g.query('match n-[r]->m delete r;')
File "/Users/mh/java/neo/neo4j-experiements/neo4jconnector.py", line 99, in query
return self.call(payload)
File "/Users/mh/java/neo/neo4j-experiements/neo4jconnector.py", line 71, in call
self.transactionurl = result.headers['location']
File "/Library/Python/2.7/site-packages/requests-1.2.3-py2.7.egg/requests/structures.py", line 77, in __getitem__
return self._store[key.lower()][1]
KeyError: 'location'
Just to add to what Michael said, in the book I believe the authors are referring to a comparison that was done in the Neo4j in Action book - it's described in the free first chapter of that book.
At the top of page 7 they explain that they were using the Traversal API rather than Cypher.
I think you'll struggle to get Cypher near that level of performance at the moment so if you want to do those types of queries you'll want to use the Traversal API directly and then perhaps wrap it in an unmanaged extension.

Performance transaction scope against DB2 table

We are trying to track down the cause of a performance problem.
We have a table with a single row that contains a primary key and a counter. Within a transaction we read the value of the counter, increment the value by one and save the new value.
The read and update is done using Entity Framework, we use a serializable transaction scope, we need to ensure that a counter value is read once only.
Most of the time this takes 0.1 seconds, then sometimes it takes over 1 second. We have not been able to find any pattern as to why this happen.
Has anyone else experienced variable performance when using transaction scope? Would it help to drop using transaction scope and set the transaction directly on the connection?
I remember commenting on this question a long time ago, but recently some developers in my shop have started using TransactionScope, and have also run into performance issues. While trying to search for some information, this came up pretty high in the Google Search results.
The issue we ran into was that, apparently chaining commands (INSERTs, etc.) with BeginChain does not work when using a TransactionScope (at least on the version we are running, Client v9.7.4.4 connecting to DB2 z/OS v 10).
I thought that I would leave a workaround for the issue we ran into (slow performance when running lots [1k+] of INSERTs under TransactionScope, but ran fine when the scope was removed and chaining was allowed). I'm not really sure if it would help the original question directly, but there are some options if you look in the IBM.Data.DB2.dll classes that allow you to update rows using a DB2DataAdapter and an underlying DataTable.
Here's some example code in VB.NET:
Private Sub InsertByBulk(tableName As String, insertCollection As List(Of Object))
Dim curTimestamp = Date.Now
Using scope = New TransactionScope
'Something that opens a connection to DB2, may vary
Using conn = GetDB2Connection()
'Dumb query to get DataTable from the ResultSet
Dim sql = String.Format("SELECT * FROM {0} WHERE 1=0", tableName)
Using adapter = New DB2DataAdapter(sql, conn)
Using table As New DataTable
adapter.FillSchema(table, SchemaType.Source)
For Each item In insertCollection
Dim row = table.NewRow()
row("ID") = item.Id
row("CHAR_FIELD") = item.CharField
row("QUANTITY") = item.Quantity
row("UPDATE_TIMESTAMP") = curTimestamp
table.Rows.Add(row)
Next
Using bc = New DB2BulkCopy(conn)
bc.DestinationTableName = tableName
bc.WriteToServer(table)
End Using 'BulkCopy
End Using 'DataTable
End Using 'DataAdapter
End Using 'Connection
scope.Complete()
End Using
End Sub
We have now solved this problem.
The root of the problem was that the DB2 provider does not support transaction promotion. This results in Transaction Scope using MSDTC distributed transactions for everything.
We replaced the use of Transaction Scope with transactions set on the database connection.
Composite services that included the code in the question were then reduced from 3 seconds to 0.3 seconds.

Slow Update/insert into SQL Server CE using LinqToDatasets

I have a mobile app that is using LinqToDatasets to update/insert into a SQL Server CE 3.5 File.
My Code looks like this:
// All the MyClass Updates
MyTableAdapter myTableAdapter = new MyTableAdapter();
foreach (MyClassToInsert myClass in updates.MyClassChanges)
{
// Update the row if it is already there
int result = myTableAdapter.Update(myClass.FirstColumn,
myClass.SecondColumn,
myClass.FirstColumn);
// If the row was not there then insert it.
if (result == 0)
{
myTableAdapter.Insert(myClass.FirstColumn, myClass.SecondColumn);
}
}
This code is used to keep the hand held database in sync with the server database. Problem is if it is a full update (first time for example) there are a lot of updates (about 125). That makes this code (and more loops like it take a very long time (I have three such loops that take over 30 seconds each).
Is there a faster or better way to do updates/inserts like this?
(I did see this Codeplex Project, but I could not see how to make it work with both updates and inserts.)
You should always use SqlCeResultSet for data access on mobile devices for maximum performance and memory usage. You must identify the data to be inserted and then use code like the SqlCeBulkCopy sample, and use similar code by using the Seek and Update methods of the SqlCeResultSet.

Resources