Update document after printing result - rethinkdb

I'm trying to retrieve a list of documents, do something with each returned document, then update its status to flag that it's been processed. This is what I have:
cursor = r.db("stuff").table("test").filter(r.row["subject"] == "books").run()
for document in cursor:
    print(document["subject"])
    document.update({"processed": True})
This seems to run OK, but the "processed" field does not get updated as I would have expected. I'm probably approaching this incorrectly, so any pointers would be appreciated.
UPDATE
This seems to work OK, but I can't help thinking it's somewhat inefficient:
cursor = r.db("stuff").table("test").filter(r.row["subject"] == "books").run()
for document in cursor:
    print(document["subject"])
    r.db("certs").table("test").get(document['id']).update({"tpp_processed": True}).run()

1. Using a for_each
Instead of doing an update with a run every time you want to update a single document, you can save the changes in an array and then use for_each to update all the documents in one query. That would look something like this:
cursor = r.table('30693613').filter(r.row["subject"] == "book").run(conn)
arr = list(cursor)
for row in arr:
    row['processed'] = True

(r.expr(arr)
    .for_each(lambda row: r.table('30693613').get(row["id"]).update(row))
    .run(conn))
Instead of doing N network calls (one for every update), this executes only one network call.
2. Building an update array and using for_each
You can also do something similar where you build an array of updates and run just one query at the end:
cursor = r.db("stuff").table("test").filter(r.row["subject"] == "books").run(conn)
updated_rows = []
for document in cursor:
    print(document["subject"])
    updated_rows.append({"id": document["id"], "tpp_processed": True})

# Afterwards...
(r.expr(updated_rows)
    .for_each(lambda row: r.table('30693613').get(row["id"]).update(row))
    .run(conn))
3. Using noreply
Finally, you can keep your query exactly as it is and just run it with noreply. That way, your code keeps running and doesn't wait for the database to send back a response.
cursor = r.db("stuff").table("test").filter(r.row["subject"] == "books").run(conn)
for document in cursor:
    print(document["subject"])
    r.db("certs").table("test").get(document['id']).update({"tpp_processed": True}).run(conn, noreply=True)

Related

Proper Upsert (Atomic Update Counter Field or Insert Document) with RethinkDB

After looking at some SO questions and issues on the RethinkDB GitHub, I have failed to come to a clear conclusion about whether an atomic upsert is possible.
Essentially I would like to perform the same operation as ZINCRBY using Redis.
If member does not exist in the sorted set, it is added with increment
as its score (as if its previous score was 0.0). If key does not
exist, a new sorted set with the specified member as its sole member
is created.
The current implementation appears to differ from almost all databases that I have used: the data is replaced or inserted, not updated. This is a simple use case, like updating the last visit, the number of clicks, or a product quantity. So I must be missing something very obvious, because I cannot see a simple way to do this.
Yes, it is possible. After a get on the key, perform an atomic replace. Something like this might work:
function set_or_increment_score(player, points) {
    return r.table('scores').get(player).replace(
        row => ({
            id: player,
            score: r.branch(
                row.eq(null),
                points,
                row('score').add(points))
        }));
}
It has the following behaviour:
> set_or_increment_score("alice", 1).run(conn)
{ inserted: 1 }
> set_or_increment_score("alice", 2).run(conn)
{ replaced: 1 }
It works because get returns null when the document doesn't exist, and a replace on a non-existing document turns into an insert. See the documentation for replace.
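The same pattern in the Python driver might look roughly like this (a sketch, assuming a 'scores' table and the classic import rethinkdb as r driver style):

import rethinkdb as r

def set_or_increment_score(conn, player, points):
    # get() yields null for a missing id, and replace() on a missing document
    # becomes an insert, which is what makes this an atomic upsert.
    return r.table('scores').get(player).replace(
        lambda row: {
            'id': player,
            'score': r.branch(
                row.eq(None),            # no document yet: start from points
                points,
                row['score'] + points)   # otherwise increment the stored score
        }
    ).run(conn)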
So I ended up using the following code to work around the no-update issue.
r.db("test").table("t").insert(
{id:"A", type:"player", species:"warrior", score:0, xp:0, armor:0},
{conflict: function(id, oldDoc, newDoc) {
return newDoc.merge(oldDoc).merge(
{armor: oldDoc("armor").add(1)});
}
}
)
Do you think this is more readable/elegant or do you see any issues with the code compared to your sample?
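For comparison, the same conflict-based upsert through the Python driver might look roughly like this (a sketch, assuming RethinkDB 2.3+, where conflict accepts a resolution function):

r.db("test").table("t").insert(
    {"id": "A", "type": "player", "species": "warrior",
     "score": 0, "xp": 0, "armor": 0},
    # On a primary-key conflict, keep the stored document's fields
    # and bump its armor by one instead of overwriting it.
    conflict=lambda id, old_doc, new_doc:
        new_doc.merge(old_doc).merge({"armor": old_doc["armor"] + 1})
).run(conn)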

Groovy Sql rows

Hello, I am trying to get rows using a Groovy Sql connection, but it returns the records as a List inside a List. The following:
Sql sql = new Sql(dataSource)
List<GroovyRowResult> row = sql.rows('select * from user where username=:userName and password=:password', [userName: 'groovy', password: '123'])
returns the result as [[record as a map]].
Can anyone help me figure out why the result is a List inside a List? How can I get it as a single-level List using the rows method?
Your results are coming back as a list of maps, not a list of lists. Look at the ':' and ',' characters in the inner part. You can use standard Groovy extraction of values from these.
In your case, it looks like you're doing a primary-key search that will only return one result, so use firstRow instead; that way you don't have to extract the single map result from the list.
See the documentation for the groovy Sql class for examples.
In the more general case where you are returning multiple rows, then your data probably looks like this:
[[username:"foo", password:"foopass"], [username:"bar", password:"barpass"]]
Assuming the line:
def results = sql.rows('select * from user')
You can then do things like spread operators:
assert results.username == ["foo", "bar"]
assert results.password == ["foopass", "barpass"]
or iterate over the results
results.each { println it.username }
==> foo
==> bar
or use any of the many Collection functions
println results.collect { "${it.username} -> ${it.password}" }
==> [ "foo -> foopass", "bar -> barpass" ]
I think your main issue was not recognising a single map entry in a list.
It doesn't return a List inside a List; it returns a List of Maps, with each map containing the columns selected in your select.
So if you want all of the usernames selected (as a List), you can just do:
def usernames = row.username
If you just want a single row, you can do:
GroovyRowResult row = sql.firstRow('select * from user where username=:userName and password=:password', [userName: 'groovy', password: '123'])
This will then effectively just be a map, with each key being a selected field name and each value being that field's value from the first row.

Select one unique instance from LINQ query

I'm using LINQ to SQL to obtain data from a set of database tables. The database design is such that given a unique ID from one table (Table A) one and only one instance should be returned from an associated table (Table B).
Is there a more concise way to compose this query and ensure that only one item was returned without using the .Count() extension method like below:
var set = from itemFromA in this.dataContext.TableA
          where itemFromA.ID == inputID
          select itemFromA.ItemFromB;

if (set.Count() != 1)
{
    // Exception!
}

// Have to get individual instance using FirstOrDefault or Take(1)
FirstOrDefault helps somewhat but I want to ensure that the returned set contains only one instance and not more.
It sounds like you want Single:
var set = from itemFromA in this.dataContext.TableA
          where itemFromA.ID == inputID
          select itemFromA.ItemFromB;

var onlyValue = set.Single();
Documentation states:
Returns the only element of a sequence, and throws an exception if there is not exactly one element in the sequence.
Of course that means you don't get to customize the message of the exception... if you need to do that, I'd use something like:
// Make sure that even if something is hideously wrong, we only transfer data
// for two elements...
var list = set.Take(2).ToList();
if (list.Count != 1)
{
    // Throw an exception
}

var item = list[0];
The benefit of this over your current code is that it will avoid evaluating the query more than once.

NHibernate IQueryable doesn't seem to delay execution

I'm using NHibernate 3.2 and I have a repository method that looks like:
public IEnumerable<MyModel> GetActiveMyModel()
{
    return from m in Session.Query<MyModel>()
           where m.Active == true
           select m;
}
Which works as expected. However, sometimes when I use this method I want to filter it further:
var models = MyRepository.GetActiveMyModel();
var filtered = from m in models
               where m.ID < 100
               select new { m.Name };
Which produces the same SQL as the first one, so the second filter and the select must be done after the fact. I thought the whole point of LINQ was that it forms an expression tree that is only unravelled when it's needed, so the correct SQL for the job can be created, saving database requests.
If not, it means all of my repository methods have to return exactly what is needed and I can't make use of LINQ further down the chain without taking a penalty.
Have I got this wrong?
Updated
In response to the comment below: I omitted the line where I iterate over the results, which causes the initial SQL to be run (WHERE Active = 1); the second filter (ID < 100) is obviously done in .NET.
Also, if I replace the second chunk of code with
var models = MyRepository.GetActiveMyModel();
var filtered = from m in models
               where m.Items.Count > 0
               select new { m.Name };
It generates the initial SQL to retrieve the active records and then runs a separate SQL statement for each record to find out how many Items it has, rather than writing something like I'd expect:
SELECT Name
FROM MyModel m
WHERE Active = 1
AND (SELECT COUNT(*) FROM Items WHERE MyModelID = m.ID) > 0
You are returning IEnumerable<MyModel> from the method, which will cause in-memory evaluation from that point on, even if the underlying sequence is IQueryable<MyModel>.
If you want to allow code after GetActiveMyModel to add to the SQL query, return IQueryable<MyModel> instead.
You're running IEnumerable's extension method Where instead of IQueryable's. It will still evaluate lazily and give the same output; however, it evaluates the IQueryable on entry, so you're filtering the collection in memory instead of against the database.
When you later add an extra condition on another table (the count), it has to lazily fetch each and every one of the Items collections from the database since it has already evaluated the IQueryable before it knew about the condition.
(Yes, I would also like the extension methods on IEnumerable to instead be virtual members, but, alas, they're not.)

Is this a LINQ lazy loading problem?

Something very strange is happening in my program:
I run this query (agt.DefaultNr == 1) on a collection and get 3 items as a result:
IEnumerable<Agent> favAgents =
    from agt in builtAgents where agt.DefaultNr == 1 select agt;
For every item I set DefaultNr = 0:
foreach (Agent noFavAgt in favAgents)
{
    noFavAgt.DefaultNr = 0;
}
I do another query, but for some reason my favAgents collection is now empty!
IEnumerable<Agent> smallAgents = (from agt in favAgents
                                  where agt.tempResultCount < 30
                                  orderby agt.tempResultCount descending
                                  select agt);
What is going on here?
Is this a LINQ lazy loading problem?
It looks like there is some kind of re-query after I set all the items to 0, because my collection is empty!
This is not lazy loading, it's deferred execution. When you define your initial enumerable, you're defining a query, not a collection. You're correct that it's performing a requery; every time you iterate over favAgents, it will execute the query that you defined. If you want to create a list based off of that query that doesn't change, add ToList().
var favAgents =
    (from agt in builtAgents where agt.DefaultNr == 1 select agt).ToList();
Doing this will create a list in memory and cache the results of the query at that point in time.
Yes, your favAgents collection will be empty now - you've "turned off" the bit in each element of it that made it match the query! If you iterate over favAgents twice, it will execute the query twice. favAgents represents the query itself, not the results.
If you want to preserve one particular set of results, use ToList or something similar:
favAgents = favAgents.ToList();
That will materialize the query - perform it once and then remember the results in a list, basically. ToArray would have the same effect, but store the results in an array instead.
