What's the difference between the serialize and store methods? - ruby

I couldn't find much information online, but it seems that either method, used in the model, enables the same functionality. How are they different, and when should one be used over the other?
Example code:
```ruby
class User < ActiveRecord::Base
  store :extra_stuff
  serialize :extra_stuff_too
end
```
Thanks!

Store wraps serialize so that you can keep a hash in a single column on your record. You can't, however, query the data inside a store with ordinary SQL conditions, because the column just holds serialized text.
Serialize saves the attribute in the record's column as YAML (by default).
Serialize can store an array of things:
[thing1, thing2, thing3]
Store deals in hashes of key/value pairs:
{thing1: "thing1 value", thing2: "thing2 value"}
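To make the difference concrete without a Rails app, here is a plain-Ruby sketch of what ends up in the column. It assumes the default YAML coder and only mimics the dump/load step; dump_column and load_column are hypothetical helpers, not ActiveRecord API.

```ruby
require 'yaml'

# Hypothetical helpers that mimic a serialized column: the value is
# dumped to a YAML string on save and parsed back on load.
# (Assumes the default YAML coder; ActiveRecord can use other coders.)
def dump_column(value)
  YAML.dump(value)
end

def load_column(text)
  # safe_load only builds plain types (strings, numbers, arrays, hashes)
  YAML.safe_load(text)
end

# serialize: any YAML-able value, for example an array
things = ["thing1", "thing2", "thing3"]

# store: always a hash of key/value pairs
extra_stuff = { "thing1" => "thing1 value", "thing2" => "thing2 value" }

array_text = dump_column(things)
hash_text  = dump_column(extra_stuff)

# Either way the database only sees a string, which is why ordinary SQL
# conditions can't reach the individual keys inside it.
```

In a real Rails model, store also accepts an accessors: option (for example store :extra_stuff, accessors: [:color]) that generates color and color= methods backed by the hash, which serialize does not do.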

Related

How to store filtered data?

I'm building an API which handles purely the storage of data.
Let's imagine that in Redis I've remembered (cached) the key foo:123 for 20 minutes. It holds an Eloquent Collection, since I use the collection later on rather than returning the raw JSON.
As an example, the foo collection could look like this:
```
[
  {
    "name": "Doe",
    "first_name": "John",
    "age": 42,
    "favorite_color": "red"
  },
  {
    "name": "Example",
    "first_name": "Eric",
    "age": 37,
    "favorite_color": "black"
  },
  ....
]
```
How would I store a new collection, which has the same structure but entries having black as favorite_color? Would I have to store something like foo:123:black? Do I store the full collection and filter it down manually? Or is this done completely differently when using Redis?
Q: How would I store a new collection, which has the same structure but entries having black as favorite_color?
Why not just map over the collection and set favorite_color to black (https://laravel.com/docs/5.4/collections#method-map)? Then store the result however you want: either overwrite the old data or create new data.
Q: Would I have to store something like foo:123:black?
It's not clear to me what you're asking for here. I'm not sure we can tell you how to store it or which naming convention to use; that is your choice.
Q: Do I store the full collection and filter it down manually?
This is something you need to decide based on your app's requirements. You can serialize a collection and store it, but it may waste a lot of memory if you do this with many different collections that differ only slightly.
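A rough Ruby sketch of the options discussed (the thread itself is PHP/Laravel, so this only illustrates the idea). A plain Hash stands in for Redis, the data is serialized as JSON, and black_entries is a hypothetical helper; "foo:123:black" is simply the naming idea from the question.

```ruby
require 'json'

# A plain Hash stands in for Redis here (assumption); with a real client
# you would SET and GET the same JSON strings, usually with an expiry.
cache = {}

people = [
  { "name" => "Doe", "first_name" => "John", "age" => 42, "favorite_color" => "red" },
  { "name" => "Example", "first_name" => "Eric", "age" => 37, "favorite_color" => "black" }
]
cache["foo:123"] = JSON.generate(people)

# Option A: keep one cached copy and filter it each time it is read.
def black_entries(cache, key)
  JSON.parse(cache.fetch(key)).select { |person| person["favorite_color"] == "black" }
end

# Option B: materialize the filtered subset under its own key.
cache["foo:123:black"] = JSON.generate(black_entries(cache, "foo:123"))

# The answer's suggestion: map over the entries, setting favorite_color
# to black; merge returns new hashes, so people itself is unchanged.
recolored = people.map { |person| person.merge("favorite_color" => "black") }
```

Option A pays for a parse and a filter on every read; option B pays in memory for every stored variant, which is the trade-off the answer warns about.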

Confusion about hash tables

I am currently studying for some interviews, and I've heard that at some of these interviews people are asked to build a data structure from scratch, including a hash table. However, I am having some trouble really understanding hash tables from a programming perspective.
I've been building these data structures from scratch in C++, and I know that with templates I can create linked lists, dynamic arrays, binary search trees, etc. that can store basically any type of object (as long as that object is the only type stored in that instance of the data structure). So I assume I could create a template, or "generic", hash table that, depending on how it is instantiated, stores a particular type of object. But two things confuse me:
I know that, through a hash function, different keys are mapped to different indices in the array that makes up the hash table. But let's say you use the hash table you created to store objects of type Book, and then you create another hash table to store objects of type People. Obviously, different types of objects have different member attributes, and one of these attributes would have to be the key. Would this mean that every object you would ever want to store in your hash table must have at least one attribute with the same name? Your hash function needs some key value to hash, so it would have to know which attribute of the object to use as the key. So, for example, every object you want to store in this hash table would need an attribute called "key" that the hash function can map to an index of the array, no? Otherwise, how would it know what "key" to hash?
This also leads to the problem of the hash function. I've read that, depending on the dataset you're given, some hash functions are better than others. So if the choice of hash function depends on the dataset, how could you possibly create a hash table data structure that can store any type of object?
So am I just overthinking this? Should I just learn to create a simple hash table that hashes integers when practicing for my interviews? And are hash tables in real life created generically, or do people usually write a different hash table depending on the type of data they have?
If this question is better suited for the Computer Science theory stack exchange, please let me know. I am just finding these little details are keeping me from truly understanding this data structure.
You need to separate the hash table from the hash function; they are different responsibilities.
There are two common practices for keeping your hash table generic while still being able to hash objects properly:
1. Assume your template type (call it T) implements a hash method, and use it. You don't care how it is implemented, as long as it exists.
2. In addition to the template type, take a template hash function hash(T) that must be provided when declaring a hash table.
This solves both problems: the user, who knows the data distribution better than the library writer, supplies the hash function, and the supplied hash function works on the supplied type, regardless of which attribute acts as the "key".
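If you choose the second option, you can provide default hash functions for primitive and other well-known types, so users storing standard types don't need to reinvent the wheel every time they use the library. (This is what C++'s std::unordered_map does: its Hash template parameter defaults to std::hash<Key>.)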
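The two options can be sketched in Ruby (rather than C++) as a tiny separate-chaining table: by default it uses the key's own hash method (option 1), and the caller may pass a block as the hash function (option 2). It also shows that the stored objects need no attribute called "key": the caller chooses the key when inserting. ChainedHashTable and Book are hypothetical names, and the table never resizes, so treat it as a sketch rather than production code.

```ruby
# Minimal separate-chaining hash table (a sketch: fixed bucket count,
# no resizing). The table never looks inside the stored values; it only
# needs a way to turn a key into an integer.
class ChainedHashTable
  attr_reader :size

  def initialize(bucket_count = 16, &hash_fn)
    @buckets = Array.new(bucket_count) { [] }
    # Option 2: a caller-supplied hash function (the block).
    # Option 1 fallback: the key's own #hash, which every Ruby object has.
    @hash_fn = hash_fn || ->(key) { key.hash }
    @size = 0
  end

  def put(key, value)
    bucket = bucket_for(key)
    pair = bucket.find { |k, _| k.eql?(key) }
    if pair
      pair[1] = value # existing key: overwrite the value in place
    else
      bucket << [key, value]
      @size += 1
    end
    value
  end

  def get(key)
    pair = bucket_for(key).find { |k, _| k.eql?(key) }
    pair && pair[1]
  end

  private

  def bucket_for(key)
    # Integer#% with a positive divisor is never negative in Ruby,
    # so negative hash values still map to a valid bucket index.
    @buckets[@hash_fn.call(key) % @buckets.size]
  end
end

Book = Struct.new(:isbn, :title)

# The caller picks the key (here the ISBN) and a deliberately simple,
# hypothetical hash function for it.
books = ChainedHashTable.new { |isbn| isbn.bytes.sum }
book = Book.new("isbn-0001", "Example Book")
books.put(book.isbn, book)
```

Keys are compared with eql?, the same convention Ruby's built-in Hash uses, so a custom hash function must still agree with eql? for lookups to work.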

How do I prevent a parse.com user from seeing parts of their user data? i.e. set an ACL per field on the User class?

I add new users.
Let's presume we add a field called 'additionaldata1' to the Parse User class.
I do NOT want the user to be able to see the data stored in 'additionaldata1' and as such don't want it returned when I query the current parse users.
Since the code is a web app, I don't want a user to be able to 'hack' the local code to bring back all of their user object data.
So my question is: how do I ensure that certain fields, such as 'additionaldata1', are NEVER returned on the parse.com user object? Do I have to set up an additional class that is related to the user and set its ACL to no read? Or can I set an ACL per field on the User class?
EDIT/UPDATE: I believe I worked this out myself. It doesn't appear to be possible to set an ACL per field on a class. Instead, I have to put this data into an additional class with a RELATION and set the ACL on that class to 'no read' and 'no write'. That way only Cloud Code can see the values (because it uses the master key), and I can run any validation and queries via Cloud Code wherever I need that data to stay secure and private from the user.
This case is covered in the Parse docs under one-to-one relations: https://www.parse.com/docs/relations_guide#onetoone_anchor.
They recommend splitting the data into two classes linked by a one-to-one relation:
In Parse, a one-to-one relationship is great for situations where you need to split one object into two objects. These situations should be rare, but two examples include:
Limiting visibility of some user data. In this scenario, you would split the object in two, where one portion of the object contains data that is visible to other users, while the related object contains data that is private to the original user (and protected via ACLs).
Splitting up an object for size. In this scenario, your original object is greater than the 128K maximum size permitted for an object, so you decide to create a secondary object to house extra data. It is usually better to design your data model to avoid objects this large, rather than splitting them up. If you can't avoid doing so, you can also consider storing large data in a Parse File.

Would you use an array or a custom-made class for simple data manipulation? (ruby)

I can do a bit of coding in Ruby. I've only just started with objects and I'm not very object-literate; I mean, I don't think in objects yet :-)
I have data that I scrape from a forum on a regular basis. I need fields like author, date posted, title, category, number of views, etc., which, from my point of view, is an array.
Then I want to be able to do the following in Ruby:
- save the whole lot (the quick solution is CSV or XML, later probably an SQL database)
- sort it by field
- load/read my file to update fields, compute some statistics and extract some data
- add new fields easily in case I need to
- edit and modify my "file/database" outside Ruby.
I believe I can do every operation, like changing a post's number of views or the date of its last reply, using either an array or an object.
So my question is: would you use a custom class/object or an array?
Could you tell me why?
It would seem logical to me, at least, to make an object for storing and working with the data that you're scraping. Typically, you'd have instance variables for each of the fields that you have mentioned (author, title, category, views, date_posted) and probably some methods to populate them from the scraped data as well as read/write them.
In terms of storing the data for these objects, using an ORM such as ActiveRecord or DataMapper makes this very easy. An ORM lets you map the data in a data store, such as MySQL, to the corresponding Ruby objects. It also provides a bunch of convenience methods for saving, updating and querying those objects.
However, it might be a good learning experience to try writing your own methods to map the data to XML files.
Do you mean "would you use an array or a custom-made class" to process this data?
What I would probably do is create a class that stores the data you want internally as an array or hash. You would then have methods of that class you could call to perform the tasks that you describe.
An object encapsulates data with behaviour, i.e. functions or operations that can be performed on that data. An array, however, is just a data structure that holds a collection of elements. Data structures like this expose their data and have no behaviour specific to your domain.
Since you want to perform save, sort, update, statistics and similar operations on your collected data, it makes sense to have a Post object with data/attributes (like author, date posted, title, category, etc.) and the operations/methods you would like to perform on that data. Abstracting the data and behaviour of your object into a class makes your code easier to maintain and understand: you can see the class's responsibilities from the methods it defines, and how those methods change the state of your object by manipulating its attributes.
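A minimal sketch of such a Post class, using a Struct with the fields from the question. JSON is used for saving because it is easy to edit outside Ruby; that format choice, the ISO 8601 date handling, and the helper names (to_storage_hash, dump_posts, parse_posts) are assumptions, not part of the answers above.

```ruby
require 'json'
require 'date'

# A minimal Post type with the fields from the question; adding a field
# later means adding one more member here.
Post = Struct.new(:author, :date_posted, :title, :category, :views, keyword_init: true) do
  # Dates are stored as ISO 8601 strings so the JSON stays hand-editable.
  def to_storage_hash
    to_h.merge(date_posted: date_posted.iso8601)
  end

  def self.from_storage_hash(hash)
    attrs = hash.transform_keys(&:to_sym)
    new(**attrs.merge(date_posted: Date.iso8601(attrs[:date_posted])))
  end
end

# Hypothetical helpers: turn a list of posts into JSON text and back.
# Use File.write / File.read to keep the text on disk.
def dump_posts(posts)
  JSON.pretty_generate(posts.map(&:to_storage_hash))
end

def parse_posts(text)
  JSON.parse(text).map { |hash| Post.from_storage_hash(hash) }
end

posts = [
  Post.new(author: "alice", date_posted: Date.new(2011, 3, 2), title: "Hello", category: "general", views: 10),
  Post.new(author: "bob", date_posted: Date.new(2011, 3, 1), title: "Ruby tips", category: "ruby", views: 3)
]

# Fields are read by name instead of by array index.
most_viewed = posts.max_by(&:views)
by_date = posts.sort_by(&:date_posted)
total_views = posts.sum(&:views)
```

Compared with a bare array such as ["alice", date, "Hello", "general", 10], code like post.views keeps working when a field is added or the order changes.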

How would you represent a relational entity as a single unit of retrievable data in BerkeleyDB?

BerkeleyDB is the database equivalent of a Ruby Hash or a Python dictionary, except that you can store multiple values for a single key.
My question is: If you wanted to store a complex datatype in a storage structure like this, how could you go about it?
In a normal relational table, if you want to represent a Person, you create a table with columns of particular data types:
Person
-id:integer
-name:string
-age:integer
-gender:string
When it's written out like this, you can see how a person might be understood as a set of key/value pairs:
id=1
name="john";
age=18;
gender="male";
Decomposing the person into individual key/value pairs (name="john") is easy.
But in order to use the BerkeleyDB format to represent a Person, you would need some way of recomposing the person from its constituent key/value pairs.
For that, you would need to impose some artificial encapsulating structure to hold a Person together as a unit.
Is there a way to do this?
EDIT: As Robert Harvey's answer indicates, there is an entity persistence feature in the Java edition of BerkeleyDB. Unfortunately, because I will be connecting to BerkeleyDB from a Ruby application using Moneta, I will be using the standard edition, which I believe requires me to build a custom solution in the absence of this support.
You can always serialize (called marshalling in Ruby) the data as a string and store that instead. The serialization can be done in several ways.
With YAML (advantages: human-readable, multiple implementations in different languages):
```ruby
require 'yaml'
str = person.to_yaml
```
With marshalling (Ruby-only, even Ruby-version-specific):
```ruby
Marshal.dump(person)
```
This only works cleanly if Person does not refer to other objects that you don't want included in the dump. For example, references to other persons would need to be handled differently.
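The caveat about references is easy to see with Marshal, which serializes everything reachable from the object. A minimal sketch, assuming a hypothetical Person struct with a friend reference (not part of the original question); replacing the reference with an id is one possible workaround, not the only one.

```ruby
# Hypothetical Person that refers to another Person.
Person = Struct.new(:id, :name, :friend)

john = Person.new(1, "john", nil)
jane = Person.new(2, "jane", john)

# Marshal walks the whole object graph: dumping jane also embeds a copy
# of john, and the loaded copy is no longer the same object as john.
restored = Marshal.load(Marshal.dump(jane))

# One way to keep the reference out of the dump: store the friend's id
# instead, and look the friend up again after loading.
flat = jane.to_h.merge(friend: jane.friend&.id)
flat_restored = Marshal.load(Marshal.dump(flat))
```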
If your datastore supports it (and BerkeleyDB does, AFAICT), I'd just store a representation of the object's attributes keyed by the object's id, without splitting the attributes across different keys.
E.g. given:
Person
-id:1
-name:"john"
-age:18
-gender:"male"
I'd store the YAML representation in BerkeleyDB under the key person_1:
```yaml
--- !ruby/object:Person
attributes:
  id: 1
  name: john
  age: 18
  gender: male
```
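A minimal sketch of the single-key approach. A plain Hash stands in for the BerkeleyDB store (Moneta adapters offer a similar Hash-like [] / []= interface, but check its documentation). Unlike the YAML above, this dumps a plain attributes hash rather than the whole object, so it can be read back with YAML.safe_load; save_person and load_person are hypothetical helpers.

```ruby
require 'yaml'

Person = Struct.new(:id, :name, :age, :gender, keyword_init: true)

# Hypothetical helpers; a plain Hash stands in for the key/value store.
def save_person(store, person)
  # Dump only the attributes (string keys) so safe_load can read them back.
  store["person_#{person.id}"] = YAML.dump(person.to_h.transform_keys(&:to_s))
end

def load_person(store, id)
  text = store["person_#{id}"]
  return nil if text.nil?

  attrs = YAML.safe_load(text)
  Person.new(**attrs.transform_keys(&:to_sym))
end

store = {}
save_person(store, Person.new(id: 1, name: "john", age: 18, gender: "male"))
```

One read returns the whole record, and the value stays human-readable if you inspect the store directly.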
If instead you need to store each attribute under its own key in the datastore (why?), make sure each key is linked to the record's identifying attribute, which for an ActiveRecord model is the id.
In this case you'd store these keys in BerkeleyDB:
person_1_name="john";
person_1_age=18;
person_1_gender="male";
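Recomposing a Person from per-attribute keys could look like the sketch below. It assumes the person_<id>_<attribute> naming shown above, uses a plain Hash as the store, and writes values as strings (as a byte-oriented store would hand them back), so age is converted back on load; the helper names are hypothetical.

```ruby
Person = Struct.new(:id, :name, :age, :gender, keyword_init: true)

# Attributes kept under their own keys, e.g. "person_1_name".
ATTRIBUTES = %i[name age gender].freeze

# Hypothetical helpers; a plain Hash stands in for the store.
def save_attributes(store, person)
  ATTRIBUTES.each do |attr|
    store["person_#{person.id}_#{attr}"] = person[attr].to_s
  end
end

def load_attributes(store, id)
  return nil unless store.key?("person_#{id}_name")

  values = ATTRIBUTES.to_h { |attr| [attr, store["person_#{id}_#{attr}"]] }
  values[:age] = Integer(values[:age]) # restore the one non-string field
  Person.new(id: id, **values)
end

store = {}
save_attributes(store, Person.new(id: 1, name: "john", age: 18, gender: "male"))
```

Each load now takes one lookup per attribute, and the type conversions have to be kept in sync by hand, which is one reason the single-key approach is usually simpler.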
Have a look at the documentation for the Entity annotation type (Java edition):
http://www.oracle.com/technology/documentation/berkeley-db/je/java/com/sleepycat/persist/model/Entity.html
