Yes, I am aware of the Emacs profiler feature. I'm looking for something similar to the time keyword in Bash, something like:
(time (myfunc))
which would return or print the time taken by the myfunc call. Is there such a thing?
benchmark.el provides benchmark-run and benchmark-run-compiled, as well as an interactive benchmark command. For example:
C-u 512 M-x benchmark (sort (number-sequence 1 512) '<)
Elapsed time: 0.260000s (0.046000s in 6 GCs)
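For non-interactive use you can call benchmark-run directly; a minimal sketch (the example numbers are just the ones from the interactive run above and will vary):

(require 'benchmark)

;; Returns (elapsed-seconds number-of-GC-runs seconds-spent-in-GC).
(benchmark-run 512
  (sort (number-sequence 1 512) #'<))
;; => e.g. (0.26 6 0.046)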
The timer used by all those functions is the benchmark-elapse macro, which you can also use directly if desired:
ELISP> (require 'benchmark)
ELISP> (benchmark-elapse
         (sit-for 2))
2.00707889
I found exactly what I was looking for at http://nullprogram.com/blog/2009/05/28/
(defmacro measure-time (&rest body)
  "Measure and return the running time of the code block."
  (declare (indent defun))
  (let ((start (make-symbol "start")))
    `(let ((,start (float-time)))
       ,@body
       (- (float-time) ,start))))
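A quick usage sketch of the macro (the exact figure will differ from machine to machine):

(measure-time
  (dotimes (_ 1000000)
    (sqrt 2.0)))
;; => elapsed time in seconds, as a float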
I have the following directories:
mod01
mod02
mod03
...
mod100
When I use
(list (directory-files dir t "\\(mod\\)\\([0-9]\\)" nil))
the output is:
mod01
mod02
mod03
...
mod10
mod100
...
mod99
As you can see, mod100 is not in the correct position. The desired output is:
mod01
mod02
...
mod10
mod11
...
mod100
Thank you for your advice.
Supply a custom predicate function extracting the numeric part:
(sort (directory-files dir t "\\(mod\\)\\([0-9]\\)" nil)
      (lambda (x y)
        (< (string-to-number (replace-regexp-in-string ".*mod\\([[:digit:]]+\\).*" "\\1" x))
           (string-to-number (replace-regexp-in-string ".*mod\\([[:digit:]]+\\).*" "\\1" y)))))
As described in the help documentation for directory-files, the default sort uses the predicate string-lessp, and (string-lessp "100" "9") returns t. You could instead pass a non-nil NOSORT argument and sort the result yourself with cl-sort, extracting the numeric part of each name (a sketch of that follows the shell-based version below). If you are on a machine with access to sort -V, you could also just wrap a shell command,
(defun my-sort (&optional dir)
  (interactive "D")
  (with-temp-buffer
    (shell-command
     (concat "ls " (shell-quote-argument (or dir default-directory)) " | sort -V")
     (current-buffer))
    ;; The final t drops the empty string left by the trailing newline.
    (split-string (buffer-string) "\n" t)))
Using sort's version sorting should result in the desired ordering.
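For completeness, a minimal sketch of the NOSORT + cl-sort idea mentioned above; my-mod-number is a hypothetical helper introduced here, and dir is your directory variable:

(require 'cl-lib)

;; Sketch only: fetch the names unsorted (NOSORT non-nil), then sort
;; numerically on the "modNN" suffix.
(defun my-mod-number (file)
  "Return the number in FILE's \"modNN\" component."
  (string-to-number
   (replace-regexp-in-string ".*mod\\([[:digit:]]+\\).*" "\\1" file)))

(cl-sort (directory-files dir t "mod[0-9]+" t)
         #'< :key #'my-mod-number)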
Preamble
I was looking through the source code in clojure.core for no particular reason.
I started reading defmacro ns; here is the abridged source:
(defmacro ns
  "...docstring..."
  {:arglists '([name docstring? attr-map? references*])
   :added "1.0"}
  [name & references]
  (let [...
        ; Argument processing here.
        name-metadata (meta name)]
    `(do
       (clojure.core/in-ns '~name)
       ~@(when name-metadata
           `((.resetMeta (clojure.lang.Namespace/find '~name) ~name-metadata)))
       (with-loading-context
         ~@(when gen-class-call (list gen-class-call))
         ~@(when (and (not= name 'clojure.core) (not-any? #(= :refer-clojure (first %)) references))
             `((clojure.core/refer '~'clojure.core)))
         ~@(map process-reference references))
       (if (.equals '~name 'clojure.core)
         nil
         (do (dosync (commute @#'*loaded-libs* conj '~name)) nil)))))
Looking Closer
While trying to read it I noticed some unusual macro patterns; in particular, consider:
~@(when name-metadata
    `((.resetMeta (clojure.lang.Namespace/find '~name) ~name-metadata)))
The clojure.core version
Here is a standalone working extraction from the macro:
(let [name-metadata 'name-metadata
      name 'name]
  `(do
     ~@(when name-metadata
         `((.resetMeta (clojure.lang.Namespace/find '~name) ~name-metadata)))))
=> (do (.resetMeta (clojure.lang.Namespace/find (quote name)) name-metadata))
When I ran this code I couldn't help but wonder why there is a double list at the point `((.resetMeta.
My version
I found that by replacing the unquote-splicing (~@) with a plain unquote (~), the double list becomes unnecessary. Here is a working standalone example:
(let [name-metadata 'name-metadata
      name 'name]
  `(do
     ~(when name-metadata
        `(.resetMeta (clojure.lang.Namespace/find '~name) ~name-metadata))))
=> (do (.resetMeta (clojure.lang.Namespace/find (quote name)) name-metadata))
My Question
Thus, why does clojure.core choose this seemingly extraneous way of doing things?
My Own Thoughts
Is this an artifact of convention?
Are there other similar instances where this is used in more complex ways?
~ always emits a form; ~@ can potentially emit nothing at all. Thus sometimes one uses ~@ to splice in a single expression conditionally:
;; always yields the form (foo x)
;; (with x replaced with its macro-expansion-time value):
`(foo ~x)

;; results in (foo) if x is nil, (foo x) otherwise:
`(foo ~@(if x [x]))
That's what's going on here: the (.resetMeta …) call is emitted within the do form that ns expands to only if name-metadata is truthy (non-false, non-nil).
In this instance, it doesn't really matter – one could use ~, drop the extra brackets and accept that the macroexpansion of an ns form with no name metadata would have an extra nil in the do form. For the sake of a prettier expansion, though, it makes sense to use ~# and only emit a form to handle name metadata when it is actually useful.
I want to extract the URLs from a Reddit page; my code is:
#lang racket
(require net/url)
(require html)
(define reddit (string->url "http://www.reddit.com/r/programming/search?q=racket&sort=relevance&restrict_sr=on&t=all"))
(define in (get-pure-port reddit #:redirections 5))
(define response-html (read-html-as-xml in))
(define content-0 (list-ref response-html 0))
(close-input-port in)
The content-0 above is
(element
(location 0 0 15)
(location 0 0 82)
...
I'm wondering how to extract specific content from it.
Usually it's more convenient to deal with HTML as x-expressions instead of the html module's structs.
Also you should probably use call/input-url to handle closing the port automatically.
You can combine both of these ideas by defining a read-html-as-xexpr function and using it like this:
#lang racket/base
(require html
net/url
xml)
(define (read-html-as-xexpr in) ;; input-port? -> xexpr?
  (caddr
   (xml->xexpr
    (element #f #f 'root '()
             (read-html-as-xml in)))))

(define reddit (string->url "http://www.reddit.com/r/programming/search?q=racket&sort=relevance&restrict_sr=on&t=all"))

(call/input-url reddit
                get-pure-port
                read-html-as-xexpr)
That will return a big x-expression like:
'(html
((lang "en") (xml:lang "en") (xmlns "http://www.w3.org/1999/xhtml"))
(head
()
(title () "programming: search results")
(meta
((content " reddit, reddit.com, vote, comment, submit ")
(name "keywords")))
(meta
((content "reddit: the front page of the internet") (name "description")))
(meta ((content "origin") (name "referrer")))
(meta ((content "text/html; charset=UTF-8") (http-equiv "Content-Type")))
... snip ...
How to extract specific pieces of that?
For simple HTML where I don't expect the overall structure to change, I will often just use match.
However a more correct and robust way to go about it is to use the xml/path module.
UPDATE: I noticed your question started by asking about extracting URLs. Here's the example updated to use se-path*/list to get all the href attributes of all the <a> elements:
#lang racket/base
(require html
net/url
xml
xml/path)
(define (read-html-as-xexprs in) ;; (-> input-port? xexpr?)
  (caddr
   (xml->xexpr
    (element #f #f 'root '()
             (read-html-as-xml in)))))

(define reddit (string->url "http://www.reddit.com/r/programming/search?q=racket&sort=relevance&restrict_sr=on&t=all"))

(define xe (call/input-url reddit
                           get-pure-port
                           read-html-as-xexprs))

(se-path*/list '(a #:href) xe)
Result:
'("#content"
"http://www.reddit.com/r/announcements/"
"http://www.reddit.com/r/Art/"
"http://www.reddit.com/r/AskReddit/"
"http://www.reddit.com/r/askscience/"
"http://www.reddit.com/r/aww/"
"http://www.reddit.com/r/blog/"
"http://www.reddit.com/r/books/"
"http://www.reddit.com/r/creepy/"
"http://www.reddit.com/r/dataisbeautiful/"
"http://www.reddit.com/r/DIY/"
"http://www.reddit.com/r/Documentaries/"
"http://www.reddit.com/r/EarthPorn/"
"http://www.reddit.com/r/explainlikeimfive/"
"http://www.reddit.com/r/Fitness/"
"http://www.reddit.com/r/food/"
... snip ...
I'm trying to have a function write a database SQL dump to a text file from a select statement. The volume returned can be very large, and I'm interested in doing this as fast as possible.
With a large result set I also need to log, every x-interval, the total number of rows written and how many rows per second have been written since the last x-interval. I have a (map ) that is actually doing the write inside a (with-open ), so I believe the side effect of logging rows completed should happen there. (See comments in code.)
My questions are:
How do I write "rows per second" during each interval and "total rows so far"?
Is there anything additional I should keep in mind while writing large JDBC result sets to a file (or named pipe, bulk loader, etc.)?
Does the (doall ) around the (map ) function fetch all results... making it non-lazy and potentially memory intensive?
Would fixed width be possible as an option? I believe that would be faster for a named pipe to bulk loader. The trade-off would be on disk i/o in place of CPU utilization for downstream parsing. However this might require introspection on the result set returned (with .getMetaData?)
(ns metadata.db.table-dump
  [:use
   [clojure.pprint]
   [metadata.db.connections]
   [metadata.db.metadata]
   [clojure.string :only (join)]
   [taoensso.timbre :only (debug info warn error set-config!)]]
  [:require
   [clojure.java.io :as io]
   [clojure.java.jdbc :as j]
   [clojure.java.jdbc.sql :as sql]])

(set-config! [:appenders :spit :enabled?] true)
(set-config! [:shared-appender-config :spit-filename] "log.log")

(let [field-delim "\t"
      row-delim "\n"
      report-seconds 10
      sql "select * from comcast_lineup "
      joiner (fn [v] (str (join field-delim v) row-delim))
      results (rest (j/query local-postgres [sql] :as-arrays? true :row-fn joiner))]
  (with-open [wrtr (io/writer "test.txt")]
    (doall
     (map #(.write wrtr %)
          ;; Somehow in here I want to log with (info) rows written so
          ;; far, and "rows per second" every 10 seconds.
          results)))
  (info "Completed write"))
A couple of general tips:
At the JDBC level you may need to use setFetchSize to avoid loading the entire resultset into RAM before it even gets to Clojure. See What does Statement.setFetchSize(nSize) method really do in SQL Server JDBC driver?
Make sure clojure.java.jdbc is actually returning a lazy seq (it probably is); if not, consider resultset-seq.
doall will indeed force the whole thing to be in RAM; try doseq instead
Consider using an atom to keep count of rows written as you go; you can use this to write rows-so-far, etc.
Sketch:
(let [ .. your stuff ..
      start (System/currentTimeMillis)
      row-count (atom 0)]
  (with-open [^java.io.Writer wrtr (io/writer "test.txt")]
    (doseq [row results]
      (.write wrtr row)
      (swap! row-count inc)
      (when (zero? (mod @row-count 10000))
        (println (format "written %d rows" @row-count))
        (println (format "rows/s %.2f" (rate-calc-here)))))))
You may get some use out of my answer to Idiomatic clojure for progress reporting?
To your situation specifically
1) You could add an index to your map as the second argument to the anonymous function; then, in the function you are mapping, look at the index to see which row you are writing, and use that to update an atom.
user> (def stats (atom {}))
#'user/stats
user> (let [start-time (. (java.util.Date.) getTime)]
        (dorun (map (fn [line index]
                      (println line) ; write to log file here
                      (reset! stats [{:lines index
                                      :start start-time
                                      :end (. (java.util.Date.) getTime)}]))
                    ["line1" "line2" "line3"]
                    (rest (range)))))
line1
line2
line3
nil
user> @stats
[{:lines 3, :start 1383183600216, :end 1383183600217}]
user>
The contents of stats can then be printed/logged every few seconds to update the UI
3) You most certainly want to use dorun instead of doall because, as you suspect, doall will run out of memory on a large enough data set. dorun drops the results as they are produced, so you can run it on arbitrarily large data if you are willing to wait long enough.
How can I write an Emacs Lisp function to find all hrefs in an HTML file and extract all of the links?
Input:
<html>
<a href="http://www.stackoverflow.com" _target="_blank">StackOverFlow</a>
<h1>Emacs Lisp</h1>
<a href="http://news.ycombinator.com" _target="_blank">Hacker News</a>
</html>
Output:
http://www.stackoverflow.com|StackOverFlow
http://news.ycombinator.com|Hacker News
I've seen the re-search-forward function mentioned several times during my search. Here's what I think I need to do, based on what I've read so far:
(defun extra-urls (file)
...
(setq buffer (...
(while
(re-search-forward "http://" nil t)
(when (match-string 0)
...
))
I took Heinzi's solution and came up with the final solution that I needed. I can now take a list of files, extract all URL's and titles, and place the results in one output buffer.
(defun extract-urls (fname)
  "Extract HTML href URLs and titles to buffer 'new-urls.csv' in | separated format."
  (setq in-buf (set-buffer (find-file fname)))  ; Save for clean up
  (beginning-of-buffer)  ; Need to do this in case the buffer is already open
  (setq u1 '())
  (while
      (re-search-forward "^.*<a href=\"\\([^\"]+\\)\"[^>]+>\\([^<]+\\)</a>" nil t)
    (when (match-string 0)                            ; Got a match
      (setq url (match-string 1))                     ; URL
      (setq title (match-string 2))                   ; Title
      (setq u1 (cons (concat url "|" title "\n") u1)) ; Build the list of URLs
      ))
  (kill-buffer in-buf)  ; Don't leave a mess of buffers
  (progn
    (with-current-buffer (get-buffer-create "new-urls.csv") ; Send results to new buffer
      (mapcar 'insert u1))
    (switch-to-buffer "new-urls.csv")))  ; Finally, show the new buffer

;; Create a list of files to process
;;
(mapcar 'extract-urls '("/tmp/foo.html"
                        "/tmp/bar.html"))
If there is at most one link per line and you don't mind some very ugly regular expression hacking, run the following code on your buffer:
(defun getlinks ()
(beginning-of-buffer)
(replace-regexp "^.*<a href=\"\\([^\"]+\\)\"[^>]+>\\([^<]+\\)</a>.*$" "LINK:\\1|\\2")
(beginning-of-buffer)
(replace-regexp "^\\([^L]\\|\\(L[^I]\\)\\|\\(LI[^N]\\)\\|\\(LIN[^K]\\)\\).*$" "")
(beginning-of-buffer)
(replace-regexp "
+" "
")
(beginning-of-buffer)
(replace-regexp "^LINK:\\(.*\\)$" "\\1")
)
It replaces all links with LINK:url|description, deletes all lines containing anything else, deletes empty lines, and finally removes the "LINK:".
Detailed HOWTO: (1) If your HTML file uses <href instead of <a href, correct that first, (2) copy the above function into the Emacs *scratch* buffer, (3) hit C-x C-e after the final ")" to load the function, (4) load your example HTML file, (5) execute the function with M-: (getlinks).
Note that the linebreaks in the third replace-regexp are important. Don't indent those two lines.
You can use the 'xml library; examples of using the parser are found here. To parse your particular file, the following does what you want:
(defun my-grab-html (file)
  (interactive "fHtml file: ")
  (let ((res (car (xml-parse-file file)))) ; 'car because xml-parse-file returns a list of nodes
    (mapc (lambda (n)
            (when (consp n) ; don't operate on the whitespace, xml preserves whitespace
              (let ((link (cdr (assq 'href (xml-node-attributes n)))))
                (when link
                  (insert link)
                  (insert "|")
                  (insert (car (xml-node-children n))) ; grab the text for the link
                  (insert "\n")))))
          (xml-node-children res))))
This does not recursively parse the HTML to find all the links, but it should get you started in the direction of the general solution.
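For reference, a minimal usage sketch; it assumes the HTML file is well-formed enough for xml-parse-file to handle and that my-grab-html is defined as above:

(with-current-buffer (get-buffer-create "*links*")
  (erase-buffer)
  (my-grab-html "/tmp/foo.html") ; inserts "url|text" lines into *links*
  (buffer-string))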