I would like to build a application like facebook (actually has nothing to do with facebook, but for the nature of the question we can say so).
I currently have a table named Post and another named Comment and of course I would represent the one-to-many relationship between them (I read the documentation here but wasn't really helpful to me).
In Comment I created a column with a pointer to the Post class with the parent Post.
In Post I created then a column with an Array where will be stored the related comment's id.
(each post will have a number of comments not very high, between 10 and 100).
The technique used here is the best? There are more efficient methods?
If your array is only storing the objectIDs for the comments then it's probably more idiomatic to use a Relation as the column type rather than an Array.
A Relation is more efficient in that the ID's aren't returned when you retrieve your Post object, so your Post objects will transfer faster, and it has the same disadvantages as storing the object ID's in an Array in that you'll still have to run a query to get the Comment objects. The only possible downside I can see is that if you need to have the number of comments, you can calculate this based on the size of the array, but with a Relation you'll have to run a count query (or maintain a separate count field).
With an Array you're introducing a slight data maintenance/integrity overhead as well. If your users have the ability to delete comments, then you'll also need to remove the comment ID from the array. And this will require a permissive ACL (to allow a commenter to edit a post they may not have created, and because of this they'll have the ability to edit any value in the post), or you'll have to have a before/after save action to update the Post when a Comment is deleted.
Related
I am using a dynamoDB table with GraphQL api to store posts. I want a user to be able to mark certain posts as favorites.
I have thought to create a relation table of user to post, but I also thought to just add an array of userId's to the post object with all the userIds of users who have made that post a favorite.
My understanding is a UUID is 16 bytes so even if say 10,000 users favorite the object then that array will be 160kb. Not insignificant but manageable to pass that much data with the object each time it is loaded.
Just wondering what is the best practice for this scenario. I'm pretty new to nosql.
With dynamoDB you have to think about access patterns first:
To get the favorite posts of a user, store a postsIds array in the user table
To get the users who like a post, store a likerIds array in the post table
To get a bidirectional link, do both of the above
Please also keep in mind that:
You can select fields when getting a document (only select the fields you are interested in)
I don't see a scenario where you would load 10k usernames and display them
The above solution looks pretty good for common scenarios.
More advanced solution:
There could be a more powerful way to do that using range keys. For instance:
Hash Key: postID
range key: likerID
title
...
post1
MyFancyPost
post1
user1
post1
user2
This structures is more powerful, and could store a lot of connections without having any "big" field in the post model.
you could easily paginate, and count the list of likers
can handle many more likers for only one post
I'm trying to dramatically cut down on pricey DB queries for an app I'm building, and thought I should perhaps just return IDs of a child collection (then find the related object from my React state), rather than returning the children themselves.
I suppose I'm asking, if I use 'pluck' to just return child IDs, is that more efficient than a general 'get', or would I be wasting my time with that?
Yes,pluck method is just fine if you are trying to retrieving a Single Column from tables.
If you use get() method it will retrieve all information about child model and that could lead to a little slower process for querying and get results.
So in my opinion, You are using great method for retrieving the result.
Laravel has also different methods for select queries. Here you can look Selects.
The good practice to perform DB select query in a application, is to select columns that are necessary. If id column is needed, then id column should be selected, instead of all columns. Otherwise, it will spend unnecessary memory to hold unused data. If your mind is clear, pluck and get are the same:
Model::pluck('id')
// which is the same as
Model::select('id')->get()->pluck('id');
// which is the same as
Model::get(['id'])->pluck('id');
I know i'm a little late to the party, but i was wondering this myself and i decided to research it. It proves that one method is faster than the other.
Using Model::select('id')->get() is faster than Model::get()->pluck('id').
This is because Illuminate\Support\Collection::pluck will iterate over each returned Model and extract only the selected column(s) using a PHP foreach loop, while the first method will make it cheaper in general as it is a database query instead.
I just ran into an interesting situation about relationships and databases. I am writing a ruby app and for my database I am using postgresql. I have a parent object "user" and a related object "thingies" where a user can have one or more thingies. What would be the advantage of using a separate table vs just embedding data within a field in the parent table?
Example from ActiveRecord:
using a related table:
def change
create_table :users do |i|
i.text :name
end
create_table :thingies do |i|
i.integer :thingie
i.text :discription
end
end
class User < ActiveRecord::Base
has_many :thingies
end
class Thingie < ActiveRecord::Base
belongs_to :user
end
using an embedded data structure (multidimensional array) method:
def change
create_table :users do |i|
i.text :name
i.text :thingies, array: true # example contents: [[thingie,discription],[thingie,discription]]
end
end
class User < ActiveRecord::Base
end
Relevant Information
I am using heroku and heroku-posgres as my database. I am using their free option, which limits me to 10,000 rows. This seems to make me want to use the multidimensional array way, but I don't really know.
Embedding a data structure in a field can work for simple cases but it prevents you from taking advantage of relational databases. Relational databases are designed to find, update, delete and protect your data. With an embedded field containing its own wad-o-data (array, JSON, xml etc), you wind up writing all the code to do this yourself.
There are cases where the embedded field might be more suitable, but for this question as an example I will use a case that highlights the advantages of a related table approch.
Imagine a User and Post example for a blog.
For an embedded post solution, you would have a table something like this (psuedocode - these are probably not valid ddl):
create table Users {
id int auto_increment,
name varchar(200)
post text[][],
}
With related tables, you would do something like
create table Users {
id int auto_increment,
name varchar(200)
}
create table Posts {
id auto_increment,
user_id int,
content text
}
Object Relational Mapping (ORM) tools: With the embedded post, you will be writing the code manually to add posts to a user, navigate through existing posts, validate them, delete them etc. With the separate table design, you can leverage the ActiveRecord (or whatever object relational system you are using) tools for this which should keep your code much simpler.
Flexibility: Imagine you want to add a date field to the post. You can do it with an embedded field, but you will have to write code to parse your array, validate the fields, update the existing embedded posts etc. With the separate table, this is much simpler. In addition, lets say you want to add an Editor to your system who approves all the posts. With the relational example this is easy. As an example to find all posts edited by 'Bob' with ActiveRecord, you would just need:
Editor.where(name: 'Bob').posts
For the embedded side, you would have to write code to walk through every user in the database, parse every one of their posts and look for 'Bob' in the editor field.
Performance: Imagine that you have 10,000 users with an average of 100 posts each. Now you want to find all posts done on a certain date. With the embedded field, you must loop through every record, parse the entire array of all posts, extract the dates and check agains the one you want. This will chew up both cpu and disk i/0. For the database, you can easily index the date field and pull out the exact records you need without parsing every post from every user.
Standards: Using a vendor specific data structure means that moving your application to another database could be a pain. Postgres appears to have a rich set of data types, but they are not the same as MySQL, Oracle, SQL Server etc. If you stick with standard data types, you will have a much easier time swapping backends.
These are the main issues I see off the top. I have made this mistake and paid the price for it, so unless there is a super-compelling reason do do otherwise, I would use the separate table.
what if users John and Ann have the same thingies? the records will be duplicated and if you decide to change the name of thingie you will have to change two or more records. If thingie is stored in the separate table you have to change only one record. FYI https://en.wikipedia.org/wiki/Database_normalization
Benefits of one to many:
Easier ORM (Object Relational Mapping) integration. You can use it either way, but you have to define your tables with native sql. Having distinct tables is easier and you can make use of auto-generated mappings.
Your space limitation of 10,000 rows will go further with the one to many relationship in the case that 2 or more people can have the same "thingies."
Handle users and thingies separately. In some cases, you might only care about people or thingies, not their relationship with each other. Some examples, updating a username or thingy description, getting a list of all thingies (or all users). Selecting from the single table can make it harding to work with.
Maintenance and manipulation is easier. In the case that a user or a thingy is updated (name change, email address update, etc), you only need to update 1 record in their table instead of writing update statements "where user_id=?".
Enforceable database constraints. What if a thingy is not owned by anyone? Is the user column now nillable? It would have to be in the single table case, so you could not enforce a simple "not nillable" username, for example.
There are a lot of reasons of course. If you are using a relational database, you should make use of the one to many by separating your objects (users and thingies) as separate tables. Considering your limitation on number of records and that the size of your dataset is small (under 10,000), you shouldn't feel the down side of normalized data.
The short truth is that there are benefits of both. You could, for example, get faster read times from the single table approach because you don't need complicated joins.
Here is a good reference with the pros/cons of both (normalized is the multiple table approach and denormalized is the single table approach).
http://www.ovaistariq.net/199/databases-normalization-or-denormalization-which-is-the-better-technique/
Besides the benefits other mentioned, there is also one thing about standards. If you are working on this app alone, then that's not a problem, but if someone else would want to change something, then the nightmare starts.
It may take this guy a lot of time to understand how it works alone. And modifing something like this will take even more time. This way, some simple improvement may be really time consuming. And at some point, you will be working with other people. So always code like the guy who works with your code at the end is the brutal psychopath who knows where you live.
in my data model I take a statement of a user with hashtags, each hashtag is turned into a node and their co-occurrence is the relationship between them. For each relationship I need to take into account:
the user who created it rel.user property
the time it was created - rel.timestamp property
the context it was created in - rel.context property
the statement it was made in - rel.statement property
Now, Neo4J doesn't allow relationship property indexing and so when I do the search that requires me to retrieve and evaluate those properties, it takes a very long time. Specifically, when I do a Cypher request of the kind:
MERGE hashtag1-[rel:TO
{context:"deb659c0-a18d-11e3-ace9-1fa4c6cf2894",
statement:"824acc80-aaa6-11e3-88e3-453baabaa7ed",
user:"b9745f70-a13f-11e3-98c5-476729c16049"}]->hashtag2
ON CREATE
SET
rel.uid="824f6061-aaa6-11e3-88e3-453baabaa7ed",
rel.timestamp="13947117878770000";
This request first checks if there is a relationship with those properties and if there is, it won't do anything, but if there is none, it will add a new one (with a unique ID and timestamp). So then because evaluation of each relationship has to take place – and they are not indexed – it takes a very long time for this request to go through. Now I'm having a problem with such request because I'm dealing with about 100 nodes and 300 relations at one query (the one above is only 1 type, there are also a few others added to the query but those above are the most expensive ones).
Therefore the actual question:
Does anybody know of a good way to keep those relationship properties and to somehow make them work faster, so they can be retrieved and evaluated when needed faster? Or do you think I should use a different type of request (if yes, which?)
Thank you!
This almost looks to me as if you relationship should actually be a node, which then would be connected to nodes:
context
user
statement
tag1
tag2
tagN
Then you can have sensible merge options (.e.g merge on UID).
Currently you loose the power of the graph model for your relationships.
This is also discussed in the graph-databases book in the chapter with the email domain.
Do you already have your hashtag1 and hashtag2 nodes available?
And if so, how many rels already exist between these?
What Cypher has to do for this to work, is to go over each of those relationships and compare all 3 properties (which I'm not sure will fit into shortstring storage) so they have to be loaded if they are not in the cache. You can check your store files, if you have a large string-store file then those uid's might not fit into the property records and have to be loaded separately.
What is the memory config of your server (heap and mmio)?
All that adds up.
I can do bit of coding in ruby. I just touched objects and I am not so object literate, I mean I do not think in objects yet :-)
I have data that I scrape from the forum on regular basis. I need fields like
author, date posted, title, category, number of views, etc etc = array in my point of view.
Then I want to be able to these in ruby
save the whole lot (quick solution is csv or xml - later probably some sql database)
sort it by field
load/read my file to update fields and do some statistics, extract some data
add new fields easily in case I need to
edit, modify my "file/database" outside ruby.
I believe that I can do every operation like change the number of views of post, change the date of the last reply in the post etc etc either using array or object.
so my Question is: would you use
...................................... custom class/object or array?
could you tell why?
It would seem logical to me, at least, to make an object for storing and working with the data that you're scraping. Typically, you'd have instance variables for each of the fields that you have mentioned (author, title, category, views, date_posted) and probably some methods to populate them from the scraped data as well as read/write them.
In terms of storing the data for these objects, using an ORM such as ActiveRecord or DataMapper makes this very easy. An ORM let's you map the data in a data store, such as MySQL, to the corresponding Ruby objects. It will also provide a bunch of convenience methods for saving, updating and querying those objects.
However, it might be a good learning experience to try writing your own methods to map the data to XML files.
Do you mean "would you use an array or a custom-made class" do process this data.
What I would probably do is create a class that stores the data you want internally as an array or hash. You would then have methods of that class you could call to perform the tasks that you describe.
An object encapsulates data with behaviour i.e. functions or operations that can be performed on data. However, array is just a data structure that has a collection of element. Basically data structures expose data and have no meaningful functions.
Since you want to perform save, sort, update, stat, etc operations on your collected data so it makes sense to have a Post object with data/attributes (like author, date posted, title, category, etc.) and the operations/methods you would like to perform on your data. Abstracting the data and behaviour of your object into a class will make your code easy to maintain and understand where you can easily see the responsibility of the class by the methods defined in that class and how those methods change the state of your object by manipulating the object attributes/data.