Clojure jdbc - query single column flattened result

I'm trying to read data (circa 760k rows) from a single column into one (flattened) vector. The result of clojure.java.jdbc/query is a seq of maps, e.g. ({:key "a"} {:key "b"} ...). With the option :as-arrays? true provided, [[:key] ["a"] ["b"] ...] is returned. To flatten the result, I've also used the option :row-fn first and got [:key "a" "b" ...]. Finally, I've applied rest to get rid of the :key.
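Put together, that pipeline looks roughly like this (a sketch; db-spec, tbl and key stand in for my actual spec, table and column):

(require '[clojure.java.jdbc :as jdbc])

;; Sketch of the steps described above; db-spec, tbl and key are placeholders.
(def keys-column
  (rest                               ; drop the leading :key
    (jdbc/query db-spec
                ["SELECT key FROM tbl"]
                {:as-arrays? true     ; rows come back wrapped in vectors
                 :row-fn first})))    ; unwrap each single-element row
;; => ("a" "b" ...)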
Wrapping and unwrapping of rows with vectors seems like a lot of unnecessary work. I'm also not happy with performance. Is there a faster / more idiomatic way? I've tried...
(jdbc/with-db-connection [con -db-spec-]
  (with-open [^Statement stmt (.createStatement (:connection con))
              ^ResultSet res (.executeQuery stmt query)]
    (let [ret (ArrayList.)]
      (while (.next res)
        (.add ret (.getString res 1)))
      (into [] ret))))
... but it's not much faster, and it's ugly.
EDIT
A nicer way to do it is via transducers (see here):
(into []
      (map :key)
      (jdbc/reducible-query
        connection
        ["SELECT key FROM tbl"]
        {:raw? true}))

You can just use :row-fn :key. Not sure what performance you are expecting, but on my i5 PC, retrieving 760K records took ~3 seconds (H2 file-based database):
(time
  (count
    (jdbc/query db ["select top 760000 key from table1"] {:row-fn :key})))
;; => 760000
;; "Elapsed time: 3003.456295 msecs"

Related

Clojurescript, how to access a map within an indexed sequence

(println (get-in @app-state ["my-seq"]))
Returns the following sequence with type cljs.core/IndexedSeq
([-Jg95JpA3_N3ejztiBBM {create_date 1421803605375,
                        website "www.example.com", first_name "one"}]
 [-Jg95LjI7YWiWE233eK1 {create_date 1421803613191,
                        website "www.example.com", first_name "two"}]
 [-Jg95MwOBXxOuHyMJDlI {create_date 1421803618124,
                        website "www.example.com", first_name "three"}])
How can I access the maps in the sequence by uid? For example, the map belonging to
-Jg95LjI7YWiWE233eK1
If you need the order of the data, you have the following possibilities:
Store the data once in order and once as a map. So, when adding a new entry, do something like:
(defn add-entry
  [uid data]
  (swap! app-state update-in ["my-seq"]
         #(-> %
              (update-in [:sorted] conj data)
              (update-in [:by-uid] assoc uid data))))
With lookup functions being:
(defn sorted-entries
  []
  (get-in @app-state ["my-seq" :sorted]))

(defn entry-by-uid
  [uid]
  (get-in @app-state ["my-seq" :by-uid uid]))
This has the best lookup performance but will use more memory and make the code a little bit more complex.
Search for the entry in the seq:
(defn entry-by-uid
  [uid]
  (->> (get @app-state "my-seq")
       (filter (comp #{uid} first))
       (first)))
In the worst case, this has to traverse the whole seq to find your entry.
If order does not matter, I recommend storing the data as a map in the first place:
(defn add-entry
  [uid data]
  (swap! app-state assoc-in ["my-seq" uid] data))

(defn entry-by-uid
  [uid]
  (get-in @app-state ["my-seq" uid]))
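For example, usage of the map-based version might look like this (a sketch; the uid and data below are just illustrative values, and app-state is assumed to be an atom wrapping a map):

;; Hypothetical usage of add-entry / entry-by-uid from above.
(def app-state (atom {"my-seq" {}}))

(add-entry "-Jg95LjI7YWiWE233eK1"
           {"create_date" 1421803613191
            "website" "www.example.com"
            "first_name" "two"})

(entry-by-uid "-Jg95LjI7YWiWE233eK1")
;; => {"create_date" 1421803613191, "website" "www.example.com", "first_name" "two"}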

Dynamic values passing in Prepared Statement

I am using clojure.java.jdbc for database access in Clojure.
I wanted to use prepared statements with select.
From my previous question I got an answer like this:
(jdbc/query (:conn dbinfo)
            ["select * from users where username = ? and password = ?"
             "harikk09"
             "amma123"])
It is working as well.
Now I want to make this parameter list dynamic, so I wrote a function like:
(defn values-builder [param] (:value (param 1)))
which actually works correctly and returns a collection of values. Checking with a println,
(println (map values-builder params))
gives
(harikk09 amma123)
But when I tried to execute it like this, where sql-query is the previously mentioned query,
(jdbc/query (:conn dbinfo) sql-query (map values-builder params))
it throws an exception:
Caused by: java.lang.IllegalArgumentException: No value supplied for key:
clojure.lang.LazySeq@ab5111fa
Can anyone help me to rectify this error?
I think Clojure expects a list of parameters without () or [].
The JDBC query and prepared values together need to be a collection. So you need to make a collection out of a string and a collection of parametrized values. To prepend a single item onto the front of a collection, use cons
(jdbc/query (:conn dbinfo) (cons sql-query (map values-builder params)))
Use apply to splice in the arguments
(apply jdbc/query (:conn dbinfo) sql-query (map values-builder params))
Update: as noted below, apply won't work since the SQL needs to be in a vector with the params.
In that case you need to cons the SQL query onto the generated params list:
(jdbc/query (:conn dbinfo) (cons sql-query (map values-builder params)))
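If you prefer not to use cons, building the vector with into gives the same result (a sketch, assuming sql-query, params and values-builder as above):

;; Start from a vector containing the SQL string and append the generated values.
(jdbc/query (:conn dbinfo)
            (into [sql-query] (map values-builder params)))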

clojure sqlkorma library: out of memory error

I'm doing what I thought was a fairly straightforward task: run a SQL query (over about 65K rows of data) using the sqlkorma library (http://sqlkorma.com), transform each row in some way, and then write the results to a CSV file. I don't really think that 65K rows is all that large given that I have an 8GB laptop, but I also assumed that a SQL result set would be lazily fetched and so the whole thing would never be held in memory at the same time. So I was really surprised when I ended up with this stack trace:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at clojure.lang.PersistentHashMap$BitmapIndexedNode.assoc(PersistentHashMap.java:777)
at clojure.lang.PersistentHashMap.createNode(PersistentHashMap.java:1101)
at clojure.lang.PersistentHashMap.access$600(PersistentHashMap.java:28)
at clojure.lang.PersistentHashMap$BitmapIndexedNode.assoc(PersistentHashMap.java:749)
at clojure.lang.PersistentHashMap$TransientHashMap.doAssoc(PersistentHashMap.java:269)
at clojure.lang.ATransientMap.assoc(ATransientMap.java:64)
at clojure.lang.PersistentHashMap.create(PersistentHashMap.java:56)
at clojure.lang.PersistentHashMap.create(PersistentHashMap.java:100)
at clojure.lang.PersistentArrayMap.createHT(PersistentArrayMap.java:61)
at clojure.lang.PersistentArrayMap.assoc(PersistentArrayMap.java:201)
at clojure.lang.PersistentArrayMap.assoc(PersistentArrayMap.java:29)
at clojure.lang.RT.assoc(RT.java:702)
at clojure.core$assoc.invoke(core.clj:187)
at clojure.core$zipmap.invoke(core.clj:2715)
at clojure.java.jdbc$resultset_seq$thisfn__204.invoke(jdbc.clj:243)
at clojure.java.jdbc$resultset_seq$thisfn__204$fn__205.invoke(jdbc.clj:243)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:60)
at clojure.lang.Cons.next(Cons.java:39)
at clojure.lang.PersistentVector.create(PersistentVector.java:51)
at clojure.lang.LazilyPersistentVector.create(LazilyPersistentVector.java:31)
at clojure.core$vec.invoke(core.clj:354)
at korma.db$exec_sql$fn__343.invoke(db.clj:203)
at clojure.java.jdbc$with_query_results_STAR_.invoke(jdbc.clj:669)
at korma.db$exec_sql.invoke(db.clj:202)
at korma.db$do_query$fn__351.invoke(db.clj:225)
at clojure.java.jdbc$with_connection_STAR_.invoke(jdbc.clj:309)
at korma.db$do_query.invoke(db.clj:224)
at korma.core$exec.invoke(core.clj:474)
at db$query_db.invoke(db.clj:23)
at main$_main.doInvoke(main.clj:32)
at clojure.lang.RestFn.applyTo(RestFn.java:137)
As far as I can tell from the stack, it has not made it outside the query code (meaning it hasn't reached my transformation/write-to-CSV code at all). If it matters, my SQL is fairly straightforward, basically SELECT * FROM my_table WHERE SOME_ID IS NOT NULL AND ROWNUM < 65000 ORDER BY some_id ASC. This is Oracle (to explain rownum above), but I don't think that matters.
EDIT:
Code sample:
(defmacro query-and-print [q] `(do (dry-run ~q) ~q))
(defn query-db []
  (query-and-print
    (select my_table
            (where (and (not= :MY_ID "BAD DATA")
                        (not= :MY_ID nil)
                        (raw (str "rownum < " rows))))
            (order :MY_ID :asc))))
; args contains rows 65000, and configure-app sets up the jdbc
; connection string, and sets a var with rows value
(defn -main [& args]
  (when (configure-app args)
    (let [results (query-db)
          dedup (dedup-with-merge results)]
      (println "Result size: " (count results))
      (println "Dedup size: " (count dedup))
      (to-csv "target/out.csv" (transform-data dedup)))))
clojure.java.jdbc creates lazy sequences:
(defn resultset-seq
  "Creates and returns a lazy sequence of maps corresponding to
   the rows in the java.sql.ResultSet rs. Based on clojure.core/resultset-seq
   but it respects the current naming strategy. Duplicate column names are
   made unique by appending _N before applying the naming strategy (where
   N is a unique integer)."
  [^ResultSet rs]
  (let [rsmeta (.getMetaData rs)
        idxs (range 1 (inc (.getColumnCount rsmeta)))
        keys (->> idxs
                  (map (fn [^Integer i] (.getColumnLabel rsmeta i)))
                  make-cols-unique
                  (map (comp keyword *as-key*)))
        row-values (fn [] (map (fn [^Integer i] (.getObject rs i)) idxs))
        rows (fn thisfn []
               (when (.next rs)
                 (cons (zipmap keys (row-values)) (lazy-seq (thisfn)))))]
    (rows)))
Korma fully realizes the sequence by pouring every row into a vector:
(defn- exec-sql [{:keys [results sql-str params]}]
  (try
    (case results
      :results (jdbc/with-query-results rs (apply vector sql-str params)
                 (vec rs))
      :keys (jdbc/do-prepared-return-keys sql-str params)
      (jdbc/do-prepared sql-str params))
    (catch Exception e
      (handle-exception e sql-str params))))
Besides the with-lazy-results route in https://github.com/korma/Korma/pull/66, as a completely different way to resolve the problem, you can simply increase the heap size available to your JVM by setting the appropriate flag. JVMs are not allowed to use all the free memory on your machine; they are strictly limited to the amount you tell them they're allowed to use.
One way to do this is to set :jvm-opts ["-Xmx4g"] in your project.clj file. (Adjust the exact heap size as necessary.) Another way is to do something like:
export JAVA_OPTS=-Xmx4g
lein repl # or whatever launches your Clojure process
The with-lazy-results route is better in the sense that you can operate on any sized result set, but it's not merged into mainline Korma and requires some updating to work with recent versions. It's good to know how to adjust the JVM's allowed heap size anyway.
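As yet another (non-Korma) alternative: if your transformation can consume rows one at a time, you can bypass Korma for this particular query and let clojure.java.jdbc process the lazy row seq while the connection is still open, e.g. via its :result-set-fn option. A sketch, assuming a plain db-spec and hypothetical transform-row / write-csv-row! helpers:

;; Sketch only: db-spec, transform-row and write-csv-row! are placeholders.
;; The rows are consumed inside :result-set-fn, so the full result set is
;; never realised in memory at once.
(require '[clojure.java.jdbc :as jdbc])

(jdbc/query db-spec
            [(str "SELECT * FROM my_table "
                  "WHERE some_id IS NOT NULL AND rownum < 65000 "
                  "ORDER BY some_id ASC")]
            {:result-set-fn (fn [rows]
                              (doseq [row rows]
                                (write-csv-row! (transform-row row))))})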

Access Java fields dynamically in Clojure?

I'm a novice in Clojure and Java.
In order to access a static Java field in Clojure you can do:
Classname/staticField
which is just the same as
(. Classname staticField)
(correct me if I'm wrong)
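For example, both forms read the same static field at the REPL:

user=> Math/PI
3.141592653589793
user=> (. Math PI)
3.141592653589793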
How can I access a static field when the name of the field is held within a variable? i.e.:
(let [key-stroke 'VK_L
      key-event KeyEvent/key-stroke])
I want key-stroke to be evaluated into the symbol VK_L before it tries to access the field.
In this case you might want to do something like the following:
user=> (def m 'parseInt)
#'user/m
user=> `(. Integer ~m "10")
(. java.lang.Integer parseInt "10")
user=> (eval `(. Integer ~m "10"))
10
If you feel like it's a bit hack-ish, it's because it really is such. There might be better ways to structure the code, but at least this should work.
I made a small Clojure library that translates public fields to Clojure maps using the reflection API:
(ns your.namespace
  (:use nl.zeekat.java.fields))

(deftype TestType
  [field1 field2])

; fully dynamic:
(fields (TestType. 1 2))
=> {:field1 1 :field2 2}

; optimized for specific class:
(def-fields rec-map TestType)
(rec-map (TestType. 1 2))
=> {:field1 1 :field2 2}
See https://github.com/joodie/clj-java-fields
Reflection is probably the proper route to take, but the easiest way is
(eval (read-string (str "KeyEvent/" key-stroke)))
This will just convert the generated string into valid Clojure and then evaluate it.
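For completeness, a reflection-based sketch (the static-field-value helper below is hypothetical, not part of any library):

(import 'java.awt.event.KeyEvent)

;; Look up a public static field by name via the reflection API.
;; Field.get takes an instance argument; for static fields we pass nil.
(defn static-field-value [^Class cls field-name]
  (.get (.getField cls (name field-name)) nil))

(static-field-value KeyEvent 'VK_L)
;; => 76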
You can construct the right call as an s-expression and evaluate it as follows:
(let [key-stroke 'VK_L
      key-event (eval `(. KeyEvent ~key-stroke))]
  ;; ...do things with your key-event...
  )
However, if what you are trying to do is test whether a particular key has been pressed, you may not need any of this: I usually find that it is easiest to write code something like:
(cond
  (= KeyEvent/VK_L key-value) "Handle L"
  (= KeyEvent/VK_M key-value) "Handle M"
  :else "Handle some other key")

Can you have hash tables in Lisp?

Can you have hash tables or dicts in Lisp? I mean the data structure that is a collection of pairs (key, value) where values can be accessed using keys.
Common Lisp has at least four different ways to do that (key value storage):
property lists (:foo 1 :bar 2)
assoc lists ((:foo . 1) (:bar . 2))
hash tables
CLOS objects (slot-value foo 'bar) to get and (setf (slot-value foo 'bar) 42) to set. The slot name can be stored in a variable: (let ((name 'bar)) (slot-value foo name)) .
For simple usage assoc lists or property lists are fine. With a larger number of elements they tend to get 'slow'. Hash tables are 'faster' but have their own tradeoffs. CLOS objects are used like in many other object systems. The keys are the slot-names defined in a CLOS class. Though it is possible to program variants that can add and remove slots on access.
Of course - Common Lisp has hash tables.
(setq a (make-hash-table))
(setf (gethash 'color a) 'brown)
(setf (gethash 'name a) 'fred)
(gethash 'color a) => brown
(gethash 'name a) => fred
(gethash 'pointy a) => nil
Property lists are good for very small examples for demonstration purposes, but for any real need their performance is abysmal, so use hash tables.
If you're referring to Common Lisp, hash tables are provided by a type called hash-table.
Using these tables involves creating one with function make-hash-table, reading values with gethash, setting them by using gethash as a place in concert with setf, and removing entries with remhash.
The mapping from key value to hash code is available outside of hash tables with the function sxhash.
Clojure has a built-in map type:
user=> (def m {:foo "bar" :baz "bla"})
#'user/m
user=> (m :foo)
"bar"
See http://clojure.org/data_structures
Sure. Here's the SRFI defining the standard hash table libraries in Scheme:
http://srfi.schemers.org/srfi-69/srfi-69.html
There are built-in hash tables that use a system hash function (typically SXHASH), and where you can choose among a few different equality checkers (EQ, EQL, EQUAL, or EQUALP, depending on what you consider to be "the same" key).
If the built-in hash tables are not good enough, there's also a generic hash table library. It will accept any pair of "hash generator"/"key comparator" and build you a hash table. However, it relies on having a good hash function to work well and that is not necessarily trivial to write.
In Lisp it's usually called a property list.
