Relation Table data structure in Clojure

I am looking for a Clojure data structure that works like a relational table (as in relational databases).
A map (even a bidirectional one) id -> (val1, val2, val3, ...) does not do the job. If, for example, I want to find all rows with val2 = "something", it will take O(n).
But I want to search a column in O(log n)!

Searching for rows in a database with a column predicate but without an index is O(n), as every row has to be checked against the predicate. If there is an index on a column that your predicate uses, the database can look up the value as a key in the index and fetch all the matching rows directly. That is usually an O(log n) operation (it depends on the index implementation; for a B-tree it is O(log n)).
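As a rough illustration of why an index turns an O(n) scan into an O(log n) lookup, here is a minimal Python sketch (not Clojure, and not how any particular database implements it) that keeps a sorted secondary index on one column and finds matches with two binary searches via `bisect`. The `SortedIndex` class and the sample rows are hypothetical.

```python
from bisect import bisect_left, bisect_right


class SortedIndex:
    """Hypothetical secondary index on one column: sorted (value, row_id) pairs."""

    def __init__(self, rows, column):
        # rows: dict of row_id -> row dict; building the index costs O(n log n)
        self.entries = sorted((row[column], row_id) for row_id, row in rows.items())
        self.keys = [value for value, _ in self.entries]

    def lookup(self, value):
        # Two binary searches, O(log n), then O(k) to collect the k matching ids
        lo = bisect_left(self.keys, value)
        hi = bisect_right(self.keys, value)
        return [row_id for _, row_id in self.entries[lo:hi]]


rows = {
    1: {"name": "John", "age": 30, "gender": "male"},
    2: {"name": "Jane", "age": 25, "gender": "female"},
    3: {"name": "Joe", "age": 40, "gender": "male"},
}
by_gender = SortedIndex(rows, "gender")
print(by_gender.lookup("male"))  # [1, 3]
```

Unlike a B-tree, a sorted Python list makes each insert O(n), so this only models the lookup side of an index.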
I am not aware of an out-of-the-box Clojure data structure with such characteristics, as the built-in ones are usually single-purpose (e.g. a map is an associative data structure for lookup by a single key, not by multiple keys as in a DB with multiple indexes). You would rather need a library providing some kind of in-memory database, for example (as mentioned by Thumbnail in his comment) DataScript, or even an in-memory SQL DB with a JDBC interface (e.g. H2, HSQLDB or Apache Derby, using their in-memory stores).
I am not sure of your specific requirements, but you could also implement some of the features yourself using the basic Clojure API. For example, you could use a Clojure set as your "table" and enhance it with some functions from clojure.set:
Your table:
(require 'clojure.set) ; make sure the clojure.set namespace is loaded
(def my-table #{{:id 1 :name "John" :age 30 :gender :male}
                {:id 2 :name "Jane" :age 25 :gender :female}
                {:id 3 :name "Joe" :age 40 :gender :male}})
And specify your indices (each one is a map from a {column value} map to the set of matching rows):
(def by-id (clojure.set/index my-table [:id]))
(def by-age (clojure.set/index my-table [:age]))
(def by-gender (clojure.set/index my-table [:gender]))
And then use your indices when querying/filtering your table:
(clojure.set/intersection
  (by-age {:age 30})
  (by-gender {:gender :male}))
;; => #{{:id 1, :name "John", :age 30, :gender :male}}
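For comparison, the same idea can be sketched in Python. The `index` function below is a hypothetical helper that mimics the shape of what `clojure.set/index` returns (a map from a key projection to the set of matching rows); rows are namedtuples so that they are hashable and can live in a set.

```python
from collections import defaultdict, namedtuple

Row = namedtuple("Row", ["id", "name", "age", "gender"])

my_table = {
    Row(1, "John", 30, "male"),
    Row(2, "Jane", 25, "female"),
    Row(3, "Joe", 40, "male"),
}


def index(rows, keys):
    """Hypothetical analogue of clojure.set/index: group rows by their values for keys."""
    idx = defaultdict(set)
    for row in rows:
        # The index key is the row projected onto the indexed columns
        idx[tuple((k, getattr(row, k)) for k in keys)].add(row)
    return dict(idx)


by_age = index(my_table, ["age"])
by_gender = index(my_table, ["gender"])

# Each lookup is a hash-map access; intersecting the candidate sets applies both filters
result = by_age.get((("age", 30),), set()) & by_gender.get((("gender", "male"),), set())
print(result)  # {Row(id=1, name='John', age=30, gender='male')}
```

Both this sketch and `clojure.set/index` build hash maps, so lookups are close to constant time rather than O(log n). The trade-off is that each index is a snapshot of the table and has to be rebuilt (or updated alongside the table) when rows change.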

Related

RethinkDB composite primary key Join

I was wondering how you would do a join on a table with a composite primary key.
The composite key is achieved by using an array in the primary key field:
Table 1
{id: key1, other: data}
Table 2
{id: [key1, key2], other: data}
So what I want is to join on table2.id[0] with table1
r.table("table1").eq_join("id[0]", r.table("table2")).run()
You cannot use eqJoin here because it requires keys to be strictly equal (strings are not arrays and vice versa).
This strict equality is also what gives eqJoin the best performance among the join operations, which is why eqJoin is designed to accept a field name only, not an expression.
You seem to want innerJoin, which can handle your case at the cost of some performance (actually I'm not sure about the real performance implications):
r.table('table1')
  .innerJoin(
    r.table('table2'),
    (doc1, doc2) => doc1('id').eq(doc2('id').nth(0))
  )
Note that innerJoin lets you use the kind of expression you were trying to use in your question (for eqJoin, "id[0]" is merely a literal field name).
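To make the semantics concrete, here is a plain-Python sketch (not ReQL) of what the innerJoin predicate above computes: every pair of documents is tested, i.e. a nested loop. For contrast, `eq_join_first_key` is a hypothetical helper that first builds a lookup on table1's id and then probes it with each `id[0]`, which is roughly the kind of shortcut an equality join on an indexed key can take. The sample documents are made up; the `left`/`right` output shape mirrors ReQL join results.

```python
table1 = [{"id": "key1", "other": "a"}, {"id": "keyX", "other": "b"}]
table2 = [{"id": ["key1", "key2"], "other": "c"}, {"id": ["key9", "key2"], "other": "d"}]


def inner_join(left, right, pred):
    # Nested loop: pred is evaluated for all len(left) * len(right) pairs
    return [{"left": l, "right": r} for l in left for r in right if pred(l, r)]


def eq_join_first_key(left, right):
    # Hypothetical helper: build a hash lookup on left["id"] once, then probe with id[0]
    by_id = {doc["id"]: doc for doc in left}
    return [{"left": by_id[r["id"][0]], "right": r}
            for r in right if r["id"][0] in by_id]


joined = inner_join(table1, table2, lambda d1, d2: d1["id"] == d2["id"][0])
print(len(joined))  # 1
```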

jdbc/insert! on sqlite3 does not manage more than two rows

I am trying to batch-write to a sqlite3 db using a pooled connection, as described in the Clojure Cookbook.
It works for up to two rows. When I insert three rows I get a java.lang.ClassCastException: clojure.lang.MapEntry cannot be cast to clojure.lang.Named.
Here's my code:
(def db-spec {:classname "org.sqlite.JDBC"
              :subprotocol "sqlite"
              :subname "sqlite.db"
              :init-pool-size 1
              :max-pool-size 1
              :partitions 1})
(jdbc/db-do-commands
  *pooled-db*
  (jdbc/create-table-ddl
    :play
    [[:primary_id :integer "PRIMARY KEY AUTOINCREMENT"]
     [:data :text]]))
(jdbc/insert! *pooled-db* :play {:data "hello"} {:data "hello"})
(jdbc/insert! *pooled-db* :play {:data "hello"} {:data "hello"} {:data "hello"})
What am I missing here?
Thanks
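See the docs for this example: https://github.com/clojure/java.jdbc
In java.jdbc 0.7.x, insert! takes a single row; its arities are (insert! db table row), (insert! db table row opts), (insert! db table cols values) and (insert! db table cols values opts). With two maps, the second map is silently treated as the options map, so only one row is inserted. With three maps, the first map is treated as the column list, and turning its map entries into column names is where the ClassCastException comes from. For several rows, use insert-multi!: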
(j/insert-multi! mysql-db :fruit
  [{:name "Apple" :appearance "rosy" :cost 24}
   {:name "Orange" :appearance "round" :cost 49}])
The API docs say this:
insert-multi! (function)
Usage: (insert-multi! db table rows)
       (insert-multi! db table cols-or-rows values-or-opts)
       (insert-multi! db table cols values opts)
Given a database connection, a table name and either a sequence of maps (for
rows) or a sequence of column names, followed by a sequence of vectors (for
the values in each row), and possibly a map of options, insert that data into
the database.
When inserting rows as a sequence of maps, the result is a sequence of the
generated keys, if available (note: PostgreSQL returns the whole rows).
When inserting rows as a sequence of lists of column values, the result is
a sequence of the counts of rows affected (a sequence of 1's), if available.
Yes, that is singularly unhelpful. Thank you getUpdateCount and executeBatch!
The :transaction? option specifies whether to run in a transaction or not.
The default is true (use a transaction). The :entities option specifies how
to convert the table name and column names to SQL entities.
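The same batching idea outside Clojure: Python's built-in `sqlite3` module runs one prepared INSERT over many parameter rows with `executemany`, which plays roughly the role that `insert-multi!` plays in java.jdbc. This is a sketch against an in-memory database using the `play` table from the question, not a description of java.jdbc internals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE play (primary_id INTEGER PRIMARY KEY AUTOINCREMENT, data TEXT)"
)

rows = [{"data": "hello"}, {"data": "hello"}, {"data": "hello"}]

# One prepared statement executed once per row; :data is filled from each dict
with conn:  # commits the whole batch as a single transaction
    conn.executemany("INSERT INTO play (data) VALUES (:data)", rows)

count = conn.execute("SELECT COUNT(*) FROM play").fetchone()[0]
print(count)  # 3
```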

Better ways to traverse a map of maps

I'm doing some data analytics aggregations, and here is my data structure:
{
  12300 {
    views {
      page-1 {
        link-2 40
        link-6 9
      }
      page-7 {
        link-3 9
        link-11 8
      }
    }
    buttons {
      page-1 {
        link-22 2
      }
    }
  }
  34000 ....
}
Where 12300 and 34000 are time values.
What I want to do is to traverse that data structure and insert entries into a database, something like this:
insert into views (page, link, hits, time) values (page-1, link-2, 40, 12300)
insert into views (page, link, hits, time) values (page-1, link-6, 9, 12300)
What would be an idiomatic way to code that? Am I complicating the data structure? Do you suggest any better way to collect the data?
Assuming you have a jdbc connection from clojure.java.jdbc, this should come close to what you want.
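;; do-prepared takes the SQL followed by one vector of values per row, hence apply
(apply jdbc/do-prepared
       "INSERT INTO views (page, link, hits, time) VALUES (?, ?, ?, ?)"
       ;; m is the map of maps from the question; one row per leaf
       (for [[time data] m
             [data-type page-data] data
             [page links] page-data
             [link hits] links]
         [page link hits time]))
;; why aren't we using data-type, e.g. buttons?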
Edit for clarified problem
(let [m '{12300 {views {page-1 {link-2 40
                                link-6 9}
                        page-7 {link-3 9
                                link-11 8}}
                 buttons {page-1 {link-22 2}}}
          34000 {views {page-2 {link-2 5}}}}]
  (doseq [[table rows] (group-by :table (for [[time tables] m
                                               [table-name page-data] tables
                                               [page links] page-data
                                               [link hits] links]
                                           {:table table-name, :row [page link hits time]}))]
    ;; one batched INSERT per table; the table name is a symbol such as views
    (apply jdbc/do-prepared
           (format "INSERT INTO %s (page, link, hits, time) VALUES (?, ?, ?, ?)" table)
           (map :row rows))))
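The nested `for` above flattens the map of maps into one row per leaf and groups the rows by table. A Python sketch of the same traversal, using the sample data from the answer, might look like this; `rows_by_table` is a hypothetical helper, and the SQL is only built as a string here (no database connection).

```python
from collections import defaultdict

m = {
    12300: {"views": {"page-1": {"link-2": 40, "link-6": 9},
                      "page-7": {"link-3": 9, "link-11": 8}},
            "buttons": {"page-1": {"link-22": 2}}},
    34000: {"views": {"page-2": {"link-2": 5}}},
}


def rows_by_table(data):
    # Walk time -> table -> page -> link -> hits, emitting one row per leaf
    grouped = defaultdict(list)
    for time, tables in data.items():
        for table, pages in tables.items():
            for page, links in pages.items():
                for link, hits in links.items():
                    grouped[table].append((page, link, hits, time))
    return dict(grouped)


grouped = rows_by_table(m)
for table, rows in grouped.items():
    # In real code, pass rows to a batch insert such as cursor.executemany(sql, rows)
    sql = f"INSERT INTO {table} (page, link, hits, time) VALUES (?, ?, ?, ?)"
    print(sql, len(rows))
```

Table names cannot be bound as `?` parameters, which is why both versions format them into the SQL string; that is only safe when the keys come from trusted data.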
Simple solution: take advantage of the fact that you are using maps of maps and use the get-in and assoc-in functions to view and change the data. See these for examples:
http://clojuredocs.org/clojure_core/clojure.core/get-in
http://clojuredocs.org/clojure_core/clojure.core/assoc-in
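For illustration, here are minimal Python analogues of `get-in` and `assoc-in`. The names `get_in` and `assoc_in` are hypothetical helpers, not a library API. Like the Clojure functions, `assoc_in` returns an updated copy and leaves the original untouched, copying only the dicts along the path.

```python
def get_in(m, path, default=None):
    """Follow path through nested dicts; return default if any key is missing."""
    for key in path:
        if not isinstance(m, dict) or key not in m:
            return default
        m = m[key]
    return m


def assoc_in(m, path, value):
    """Return a copy of m with value set at path, creating dicts as needed."""
    key, rest = path[0], path[1:]
    copy = dict(m)  # shallow copy: branches off the path are shared, not duplicated
    copy[key] = assoc_in(m.get(key, {}), rest, value) if rest else value
    return copy


data = {12300: {"views": {"page-1": {"link-2": 40}}}}
print(get_in(data, [12300, "views", "page-1", "link-2"]))  # 40
updated = assoc_in(data, [12300, "views", "page-1", "link-6"], 9)
```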
Advanced solution: use functional zippers. This allows you to traverse and change a tree-like structure in a functional manner.
An example here:
http://en.wikibooks.org/wiki/Clojure_Programming/Examples/API_Examples/Advanced_Data_Structures#zipper
If you've got special data structures, not maps of maps, you can create a zipper yourself with clojure.zip/zipper by supplying the three functions it requires (branch?, children and make-node) along with the root. After that, all zipper functions will work on your data structure, too.

Queries on ActiveRecord Association collection object

I have a set of rows which I've fetched from a table; let's say the model is Rating. After fetching, I have, say, 100 entries from the database.
The Rating object might look like this:
table_ratings
t.integer :num
So what I now want to do is perform some calculations on these 100 rows without performing any other queries. I can do this by running an additional query:
r = Rating.all
good = r.where('num = 2') # this runs an additional query
"#{good}/#{r.length}"
This is a very rough idea of what I want to do, and I have other more complex output to build. Let's imagine I have over 10 different calculations I need to perform on these rows, and so aliasing these in a sql query might not be the best idea.
What would be an efficient way to replace the r.where query above? Is there a Ruby method or a gem which would give me a similar query api into ActiveRecord Association collection objects?
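Rating.all returns all Rating objects (an Array in Rails 3; in Rails 4 and later an ActiveRecord::Relation, which you can turn into an Array with to_a). From there, shift your focus to selecting and mapping over the array, e.g.:
@perfect_ratings = r.select { |x| x.num == 100 }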
See:
http://www.ruby-doc.org/core-1.9.3/Array.html
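The core idea of the answer is to load the rows once and compute every statistic in memory. A language-neutral sketch of that idea in Python, using a hypothetical list of `num` values standing in for the fetched Rating rows:

```python
from collections import Counter

# Hypothetical stand-in for the num column of the already-loaded Rating rows
ratings = [2, 3, 2, 5, 1, 2, 4, 5]

total = len(ratings)
good = [n for n in ratings if n == 2]  # like r.select { |x| x.num == 2 }
counts = Counter(ratings)              # one pass yields the count for every value

print(f"{len(good)}/{total}")  # 3/8
average = sum(ratings) / total
```

Each further calculation reuses the same in-memory list, so none of them issues another query.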
ADDITIONAL COMMENTS:
Of the methods available on Array, these are the ones I find myself using most frequently:
To check a variable against multiple values:
%w[dog cat llama].include? @pet_type # returns true if @pet_type is one of them, e.g. 'cat'
To create another array (map and collect are aliases):
%w[dog cat llama].map { |pet| pet.capitalize } # ["Dog", "Cat", "Llama"]
To sort and drop duplicates:
%w[dog cat llama dog].sort.uniq # ["cat", "dog", "llama"]
Others I use a lot: << to append an element, + to concatenate arrays, flatten to turn nested arrays into a single-level array, count, length or size for the number of elements, and join.
Finally, here is an example of join:
%w[dog cat llama].join(' or ') # "dog or cat or llama"

MongoDB performance. Embedded documents search speed

I was wondering which is faster in MongoDB: having a few parent documents with big arrays of embedded documents inside them, or having many parent documents with few embedded documents inside each.
This question only concerns query speed. I'm not concerned with the amount of repeated information, unless you tell me that it influences the search speed. (I don't know if MongoDB automatically indexes IDs.)
Example:
Given the following entities, each with only an Id field:
Class (8 different classes)
Student (100 different students)
In order to associate students with classes, would I be taking most advantage of MongoDB's speed if I:
stored all students in arrays inside the classes they attend, or
kept, inside each student, an array of the classes they attend?
This is just an example; a real situation would involve thousands of documents.
I am going to search for specific students inside a given class.
If so, you should have a Student collection with a field set to the class (storing just the class id is probably better than an embedded, duplicated class document).
Otherwise, you will not be able to query for students properly:
db.students.find ({ class: 'Math101', gender: 'f' , age: 22 })
will work as expected, whereas storing the students inside the classes they attend
{ _id: 'Math101', student: [
  { name: 'Jim', gender: 'm', age: 22 }, { name: 'Mary', gender: 'f', age: 23 }
] }
has (in addition to duplication) the problem that the query
db.classes.find ( { _id: 'Math101', 'student.gender': 'f', 'student.age': 22 })
will give you the Math class with all students, as long as there is at least one female student and at least one 22-year-old student in it (who could be male).
You can only get a list of the main documents, and it will contain all embedded documents, unfiltered, see also this related question.
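The pitfall can be reproduced without a database. The Python sketch below is a simplified model (an assumption, not MongoDB's query engine) of how conditions on dotted array fields behave: each condition is satisfied if any array element matches, independently of the other conditions. `elem_match` models the same-element semantics of MongoDB's `$elemMatch` operator, which still returns the whole class document when it matches. The class data is hypothetical.

```python
classes = [
    {"_id": "Math101", "student": [
        {"name": "Jim", "gender": "m", "age": 22},
        {"name": "Mary", "gender": "f", "age": 23},
    ]},
]


def dotted_match(doc, array_field, conditions):
    # Each condition may be satisfied by a *different* array element
    return all(
        any(elem.get(field) == value for elem in doc[array_field])
        for field, value in conditions.items()
    )


def elem_match(doc, array_field, conditions):
    # All conditions must hold on the *same* array element
    return any(
        all(elem.get(field) == value for field, value in conditions.items())
        for elem in doc[array_field]
    )


query = {"gender": "f", "age": 22}
print([c["_id"] for c in classes if dotted_match(c, "student", query)])  # ['Math101']
print([c["_id"] for c in classes if elem_match(c, "student", query)])    # []
```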
I don't know if MongoDb automatically indexes Id
The only automatic index is the primary key _id of the "main" document. Any _id field of embedded documents is not automatically indexed, but you can create such an index manually.
