Are ActiveRecord dynamic attribute-based finders thread-safe? - ruby

According to this (older) post, these Rails 3 finders have race conditions. Something like
User.find_or_create_by_username(:username => 'uuu', :password => 'xxx')
could create two records under some conditions.
Is this still relevant for Rails 3.0+? Thanks

Yes, it is. Between the moment the first statement checks for the record and the moment it creates the object, a second statement can run in parallel.
There's no exclusive lock.
The best way to prevent this is to add a uniqueness validation to your model and a unique index to your database. That way, the database will raise an error if you try to create two records with the same values.
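The check-then-insert race and the unique-index remedy can be sketched in plain Ruby. This is only a model: FakeUsersTable, DuplicateKey and find_or_create are hypothetical names standing in for a table with a unique index on username, and the Mutex plays the part of the database enforcing that index atomically. In a real Rails app the rescue would presumably target ActiveRecord::RecordNotUnique instead.

```ruby
# Hypothetical error raised when the "unique index" is violated.
class DuplicateKey < StandardError; end

# Hypothetical stand-in for a users table with a unique index on username.
class FakeUsersTable
  def initialize
    @rows = {}
    @lock = Mutex.new # models the database checking the index atomically
  end

  def find(username)
    @lock.synchronize { @rows[username] }
  end

  def insert(username, attrs)
    @lock.synchronize do
      raise DuplicateKey, username if @rows.key?(username)
      @rows[username] = attrs.merge(username: username)
    end
  end

  def count
    @lock.synchronize { @rows.size }
  end
end

# Find the row, or create it; if another thread inserted it between our
# find and our insert, the index rejects us and we read the winner's row.
def find_or_create(table, username, attrs)
  table.find(username) || table.insert(username, attrs)
rescue DuplicateKey
  table.find(username)
end

table = FakeUsersTable.new
threads = Array.new(10) do
  Thread.new { find_or_create(table, 'uuu', { password: 'xxx' }) }
end
results = threads.map(&:value)
```

However the threads interleave, the table ends up with one row, and every caller gets that same row back.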

Related

Avoid Multiple Select on an Activerecord

I currently have the following code to find all the unique IDs of comments placed by a user. This works "fine". However, it's really slow for users with a lot of comments and I'm trying to figure out if there is a more elegant way of handling this as it doesn't seem to be the best solution.
def find_unique_user_grades
  @comments = []
  @environment.users.includes(:comments).map(&:comments).select do |comments|
    comments.select { |comment| @comments.push(comment.id) }
  end
  @comments.uniq!
end
I'm hoping someone can help me with this.
You should always prefer to do this in the database, not in Ruby. The code you've posted will load all users from the database, and then convert the raw row data to ActiveRecord objects, which is (comparatively) extremely expensive. You don't need any of that data to join through to comments. Then, you'll do the same for every user's comments (query and create ActiveRecord objects) and again, you don't need any of that to get at the comment's id column.
What you're after (assuming I've guessed correctly at the shape of your schema) is a simple join followed by a pluck. This will run a single query and return a single array of numbers, without any of the cost of loading users, creating objects, loading comments, creating objects, or iterating in Ruby.
Finally, it will also perform the distinct query in the database, where it can take advantage of any relevant indexes, rather than uniqing in Ruby.
The correct query is something close to:
@environment.users.joins(:comments).distinct.pluck('comments.id')

How to check unique entity in the controller

Before persisting my entity I would like to check if it doesn't already exist according to three fields.
I know how to use the annotation "UniqueEntity" but it doesn't work for me because I can't use a conventional "formType".
To summarise, my question is: In symfony 2 what's the best way to perform a unique entity check in the controller?
I already thought about getting an array of IDs and then using an "in_array" call to decide whether to persist my entity or not. But I'm not sure about the efficiency of that method.
I expect that entities that already exist in my database (according to the three fields) are not persisted.
Thank you for your answers.
It's not a very good approach but if you can't use UniqueEntity, you can execute a findBy on your repository and decide to persist your entity or not.
$entityExists = $em->getRepository('MyBundle:MyEntity')->findBy(array('field1' => $value1,'field2' => $value2,'field3' => $value3));

JPA add a condition to every single query automatically

Before anything else, I must say this: the table design is not my decision. We protested, but to no avail, so please don't tell me not to create tables like that.
We have a database in which each table has a flag. This flag indicates which environment a row belongs to: production or test data.
On the server side, we have one variable, currently stored in a ThreadLocal, that indicates which environment the request belongs to. It has the same value as the flag in the database.
Our requirement is that if a request belongs to the test environment, we must select only records that belong to that environment. We would need to add a condition to every query we make to the database, something like:
SELECT t FROM TABLE t WHERE t.flag = :environment
But we would have to update every single query, and update every object to set this flag before it is inserted or updated in the database. This would require a lot of effort, as our system was built long ago and is not a work in progress. It would also bring a lot of risk if someone forgot to add the condition to a new query.
So is there any way to insert a condition that checks this flag value for every query without having to manually edit the query string? Something like an interceptor that puts this condition in?
Which JPA provider?
With Hibernate, you could try using a @Filter.
Multitenancy could be another option, but probably an overkill in your scenario.
Finally, since you flagged the question with Oracle, perhaps the easiest approach would be to provide dedicated schemas (per environment) with views for every single table in your db, filtered by the flag column. Not sure if you're allowed to do that, though.
With some of the above, you would need a global entity listener to populate the flag field of your entities before they are persisted.

Why in the world would I have_many relationships?

I just ran into an interesting situation about relationships and databases. I am writing a ruby app and for my database I am using postgresql. I have a parent object "user" and a related object "thingies" where a user can have one or more thingies. What would be the advantage of using a separate table vs just embedding data within a field in the parent table?
Example from ActiveRecord:
using a related table:
def change
  create_table :users do |i|
    i.text :name
  end
  create_table :thingies do |i|
    i.integer :thingie
    i.text :discription
  end
end
class User < ActiveRecord::Base
  has_many :thingies
end
class Thingie < ActiveRecord::Base
  belongs_to :user
end
using an embedded data structure (multidimensional array) method:
def change
  create_table :users do |i|
    i.text :name
    i.text :thingies, array: true # example contents: [[thingie,discription],[thingie,discription]]
  end
end
class User < ActiveRecord::Base
end
Relevant Information
I am using Heroku with Heroku Postgres as my database. I am on their free plan, which limits me to 10,000 rows. This makes me want to use the multidimensional array approach, but I don't really know.
Embedding a data structure in a field can work for simple cases but it prevents you from taking advantage of relational databases. Relational databases are designed to find, update, delete and protect your data. With an embedded field containing its own wad-o-data (array, JSON, xml etc), you wind up writing all the code to do this yourself.
There are cases where the embedded field might be more suitable, but for this question I will use an example that highlights the advantages of the related-table approach.
Imagine a User and Post example for a blog.
For an embedded post solution, you would have a table something like this (pseudocode; this is probably not valid DDL):
create table Users (
  id int auto_increment,
  name varchar(200),
  post text[][]
)
With related tables, you would do something like
create table Users (
  id int auto_increment,
  name varchar(200)
)
create table Posts (
  id int auto_increment,
  user_id int,
  content text
)
Object Relational Mapping (ORM) tools: With the embedded post, you will be writing the code manually to add posts to a user, navigate through existing posts, validate them, delete them etc. With the separate table design, you can leverage the ActiveRecord (or whatever object relational system you are using) tools for this which should keep your code much simpler.
Flexibility: Imagine you want to add a date field to the post. You can do it with an embedded field, but you will have to write code to parse your array, validate the fields, update the existing embedded posts etc. With the separate table, this is much simpler. In addition, let's say you want to add an Editor to your system who approves all the posts. With the relational example this is easy. As an example, to find all posts edited by 'Bob' with ActiveRecord (assuming Editor has_many :posts), you would just need:
Editor.find_by(name: 'Bob').posts
For the embedded side, you would have to write code to walk through every user in the database, parse every one of their posts and look for 'Bob' in the editor field.
Performance: Imagine that you have 10,000 users with an average of 100 posts each. Now you want to find all posts made on a certain date. With the embedded field, you must loop through every record, parse the entire array of all posts, extract the dates and check them against the one you want. This will chew up both CPU and disk I/O. With the separate table, you can easily index the date field and pull out the exact records you need without parsing every post from every user.
Standards: Using a vendor specific data structure means that moving your application to another database could be a pain. Postgres appears to have a rich set of data types, but they are not the same as MySQL, Oracle, SQL Server etc. If you stick with standard data types, you will have a much easier time swapping backends.
These are the main issues I see off the top of my head. I have made this mistake and paid the price for it, so unless there is a super-compelling reason to do otherwise, I would use the separate table.
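The performance point can be sketched in plain Ruby, with in-memory structures standing in for the two table designs. Everything here is hypothetical (the Post struct, the sample data, and the Hash that plays the role of a database index on the date column); it only illustrates the difference between scanning and parsing every embedded post and looking rows up through an index.

```ruby
require 'date'

# Hypothetical row type for the related-table design.
Post = Struct.new(:user_id, :content, :posted_on)

# Embedded design: each user carries an array of [content, date-string] pairs.
embedded_users = [
  { name: 'John', posts: [['Hello', '2015-01-01'], ['Again', '2015-01-02']] },
  { name: 'Ann',  posts: [['Hi', '2015-01-02']] }
]

# Finding posts for a date means visiting every user and parsing every post.
def embedded_posts_on(users, date)
  users.flat_map do |user|
    matching = user[:posts].select { |(_content, raw)| Date.parse(raw) == date }
    matching.map { |(content, _raw)| content }
  end
end

# Related-table design: one row per post, plus an index keyed by date.
posts = [
  Post.new(1, 'Hello', Date.new(2015, 1, 1)),
  Post.new(1, 'Again', Date.new(2015, 1, 2)),
  Post.new(2, 'Hi',    Date.new(2015, 1, 2))
]
posts_by_date = posts.group_by(&:posted_on) # stands in for a database index

target  = Date.new(2015, 1, 2)
scanned = embedded_posts_on(embedded_users, target)
indexed = posts_by_date.fetch(target, []).map(&:content)
```

Both approaches find the same posts, but the embedded version touches every post of every user, while the indexed version goes straight to the matching rows.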
What if users John and Ann have the same thingies? The records will be duplicated, and if you decide to change the name of a thingie you will have to change two or more records. If the thingie is stored in a separate table, you only have to change one record. FYI: https://en.wikipedia.org/wiki/Database_normalization
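A small plain-Ruby sketch of that duplication problem, using made-up data: in the embedded layout each user holds a private copy of the thingie's name, while in the normalized layout users only reference the thingie by id. The variable names and the shared-thingie layout are assumptions for illustration, not the asker's schema.

```ruby
# Embedded: John and Ann each hold their own copy of thingie 1.
embedded = {
  'John' => [[1, 'old name']],
  'Ann'  => [[1, 'old name']]
}

# Renaming thingie 1 means touching every user who has a copy.
embedded_writes = 0
embedded.each_value do |thingies|
  thingies.each do |pair|
    next unless pair[0] == 1
    pair[1] = 'new name'
    embedded_writes += 1
  end
end

# Normalized: thingies live in their own table, users reference them by id.
thingies      = { 1 => 'old name' }
user_thingies = { 'John' => [1], 'Ann' => [1] }

# Renaming is a single write, and every user sees it.
thingies[1] = 'new name'
normalized_writes = 1

names_seen = user_thingies.values.flatten.map { |id| thingies[id] }.uniq
```

The embedded rename costs one write per user who owns the thingie; the normalized rename is always one write.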
Benefits of one to many:
Easier ORM (Object Relational Mapping) integration. You can use it either way, but you have to define your tables with native sql. Having distinct tables is easier and you can make use of auto-generated mappings.
Your space limitation of 10,000 rows will go further with the one to many relationship in the case that 2 or more people can have the same "thingies."
Handle users and thingies separately. In some cases, you might only care about people or thingies, not their relationship with each other. Some examples: updating a username or thingy description, or getting a list of all thingies (or all users). Selecting from the single table can make these harder to work with.
Maintenance and manipulation is easier. In the case that a user or a thingy is updated (name change, email address update, etc), you only need to update 1 record in their table instead of writing update statements "where user_id=?".
Enforceable database constraints. What if a thingy is not owned by anyone? Is the user column now nillable? It would have to be in the single table case, so you could not enforce a simple "not nillable" username, for example.
There are a lot of reasons of course. If you are using a relational database, you should make use of the one to many by separating your objects (users and thingies) as separate tables. Considering your limitation on number of records and that the size of your dataset is small (under 10,000), you shouldn't feel the down side of normalized data.
The short truth is that there are benefits of both. You could, for example, get faster read times from the single table approach because you don't need complicated joins.
Here is a good reference with the pros/cons of both (normalized is the multiple table approach and denormalized is the single table approach).
http://www.ovaistariq.net/199/databases-normalization-or-denormalization-which-is-the-better-technique/
Besides the benefits others mentioned, there is also one thing about standards. If you are working on this app alone, then that's not a problem, but if someone else wants to change something, then the nightmare starts.
It may take that person a lot of time just to understand how it works, and modifying something like this will take even more time. This way, a simple improvement may become really time-consuming. And at some point, you will be working with other people. So always code as if the guy who ends up working with your code is a brutal psychopath who knows where you live.

mutual exclusion in joomla

I created an extension for joomla using:
$id=$database->insertid();
I just realized that if two users logged in to the site insert two records at the same time, this statement will return the same value in both cases.
In PHP you can solve this problem with transactions.
How do I solve this problem in Joomla?
If the table you are working with extends JTable, then make sure that you have included the check-out functionality that is optionally part of it. This just means adding a couple of fields like the ones in the content table. This will prevent two people from editing the same row at the same time, which creates a race condition in which one or the other will lose their data.
Please note that both the PHP and Joomla functions that return the last insert ID rely on the MySQL implementation, and MySQL returns the last ID inserted on the currently open connection, so concurrency is not an issue.
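The per-connection behaviour described above can be modelled with a short Ruby sketch. FakeServer and FakeConnection are hypothetical classes, not Joomla or MySQL APIs; they only illustrate why two users inserting at the same time do not see each other's ids, assuming each request runs on its own connection.

```ruby
# Hypothetical server that hands out auto-increment ids.
class FakeServer
  def initialize
    @next_id = 0
    @lock = Mutex.new
  end

  def allocate_id
    @lock.synchronize { @next_id += 1 }
  end
end

# Hypothetical connection that remembers only its own last insert id.
class FakeConnection
  attr_reader :last_insert_id

  def initialize(server)
    @server = server
    @last_insert_id = nil
  end

  def insert(_row)
    @last_insert_id = @server.allocate_id
  end
end

server = FakeServer.new
alice  = FakeConnection.new(server)
bob    = FakeConnection.new(server)

alice.insert({ title: 'first' })  # Alice's connection records its own id
bob.insert({ title: 'second' })   # Bob's insert does not change Alice's value
```

After both inserts, alice.last_insert_id still refers to Alice's row and bob.last_insert_id to Bob's, because the value is tracked per connection rather than globally.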
@iacoposk8 You are right, it might be possible in a very rare case. In that case, try adding the currently logged-in user's ID to your SQL query, or somewhere else, so that it doesn't cause any conflict. I hope you get what I want to say. Thanks
