Lexer/filter for comments - comments

Is there an OCaml tool that allows filtering comments in source files, similar to gcc -E?
Ideally, I'm looking for something that will remove everything but comments, but the other way around would also be useful.
For instance, if there is a way to use camlp4/campl5/ppx to obtain OCaml comments (including non-OCamldoc comments defined with a single asterisk), I would like to know. I haven't had much success looking for comment nodes in Camlp4's AST (though I know it must exist, because there are even bugs related to the fact that Camlp4 modifies their placement).
Here's an example: in the following file:
(*** three asterisks *)
let f () =
Format.printf "end"
let () =
(* one asterisk (* nested comment *) *)
Printf.printf "hello world\n";
(** two asterisks *)
f();
()
I'd like to ideally obtain:
(*** three asterisks *)
(* one asterisk (* nested comment *) *)
(** two asterisks *)
The whitespace between them and the presence or absence of (* *) are mostly irrelevant, but it should preserve comments of all kinds. My immediate purpose is to be able to filter it to a spell checker, but cleaning comments (i.e. having a filter that strips comments only) could also be useful: I could clean the comments and then use diff to obtain what has been removed.

You can use ocamldoc with a custom generator that will dump comments using the textual representation.

I have made some interesting experiments with camlp5, playing along with the idea of pretty-printing "" for any code item. The following code:
let ignore _ _ _ = ""
let rule f = Extfun.(extend f [Evar (),false, fun _ -> Some ignore])
let () =
Eprinter.extend Pcaml.pr_str_item None [ None, rule ];
Eprinter.extend Pcaml.pr_sig_item None [ None, rule ]
will disable the pretty printing of any str_item (i.e. toplevel items of module implementation) or sig_item (toplevel items of module interfaces), by extending the corresponding default printer with a catch-all rule that output an empty string for any str_item. Compile pr_comment.ml with
ocamlfind ocamlc -c -package camlp5 pr_comment.ml
and use it as
camlp5o pr_o.cmo path/to/pr_comment.cmo -o only_comment.ml my_file.ml

Well, there is now a lexer based on ocamlwc that strips everything but the comments in the code, called ocaml-comment-sieve. It is based on the simple lexer used in ocamlwc.
However, this tool is GPL-licensed (because it is derived from ocamlwc, which is GPL-licensed), so it cannot be posted here. Still, it does satisfy my requirements, so until someone suggests a better way, I'll consider it as an answer.

Related

Compile error using fast pipe operator after pipe last in ReasonML

The way that the "fast pipe" operator is compared to the "pipe last" in many places implies that they are drop-in replacements for each other. Want to send a value in as the last parameter to a function? Use pipe last (|>). Want to send it as the first parameter? Use fast pipe (once upon a time |., now deprecated in favour of ->).
So you'd be forgiven for thinking, as I did until earlier today, that the following code would get you the first match out of the regular expression match:
Js.String.match([%re "/(\\w+:)*(\\w+)/i"], "int:id")
|> Belt.Option.getExn
-> Array.get(1)
But you'd be wrong (again, as I was earlier today...)
Instead, the compiler emits the following warning:
We've found a bug for you!
OCaml preview 3:10-27
This has type:
'a option -> 'a
But somewhere wanted:
'b array
See this sandbox. What gives?
Looks like they screwed up the precedence of -> so that it's actually interpreted as
Js.String.match([%re "/(\\w+:)*(\\w+)/i"], "int:id")
|> (Belt.Option.getExn->Array.get(1));
With the operators inlined:
Array.get(Belt.Option.getExn, 1, Js.String.match([%re "/(\\w+:)*(\\w+)/i"], "int:id"));
or with the partial application more explicit, since Reason's syntax is a bit confusing with regards to currying:
let f = Array.get(Belt.Option.getExn, 1);
f(Js.String.match([%re "/(\\w+:)*(\\w+)/i"], "int:id"));
Replacing -> with |. works. As does replacing the |> with |..
I'd consider this a bug in Reason, but would in any case advise against using "fast pipe" at all since it invites a lot of confusion for very little benefit.
Also see this discussion on Github, which contains various workarounds. Leaving #glennsl's as the accepted answer because it describes the nature of the problem.
Update: there is also an article on Medium that goes into a lot of depth about the pros and cons of "data first" and "data last" specifically as it applies to Ocaml / Reason with Bucklescript.

How can I use rules suggested by solve_direct? (by (rule …) doesn't always work)

Sometimes <statement> solve_direct (which I usually invoke via <statement> try) lists a number of library theorems and says “The current goal can be solved directly with: …”.
Let <theorem> be one search result of solve_direct, then in most cases I can prove <statement> by (rule theorem).
Sometimes, however, such a proof is not accepted, resulting in the error message “Failed to apply initial proof method”.
Is there a general, different technique for reusing theorems found by solve_direct?
Or does it depend on the individual situation? I could try to work out a minimal example and attach it to this question.
Personally, I just tend to just use:
apply (metis thm)
which works most of the time without forcing me to think very hard (but will still occasionally fail if tricky resolution is required).
Other methods that will also typically work include:
apply (rule thm) (* If "thm" has no premises. *)
apply (erule thm) (* If "thm" has a single premise. *)
apply (erule thm, assumption+) (* If "thm" has multiple premises. *)
Why is there no one single answer? The answer is a little complex:
Internally, solve_direct calls find_theorems solves, which then performs the following:
fun etacn thm i = Seq.take (! tac_limit) o etac thm i;
(* ... *)
if Thm.no_prems thm then rtac thm 1 goal
else (etacn thm THEN_ALL_NEW (Goal.norm_hhf_tac THEN' Method.assm_tac ctxt)) 1 goal;
This is the ML code for something similar to rule thm if there are no premises on the rule, or:
apply (erule thm, assumption+)
if there are multiple premises on the rule. As commented by Brian on your question, the above might still fail if there are complex meta-logical connectives in the assumptions (which the norm_hhf_tac deals with, but is not directly exposed as an Isabelle method as far as I am aware).
If you wanted, you could write a new method that exposes the tactic used by find_theorems directly, as follows:
ML {*
fun solve_direct_tac thm ctxt goal =
if Thm.no_prems thm then rtac thm 1 goal
else (etac thm THEN_ALL_NEW (Goal.norm_hhf_tac THEN' Method.assm_tac ctxt)) 1 goal;
*}
method_setup solve =
{* Attrib.thm >> (fn thm => fn ctxt =>
SIMPLE_METHOD' (K (solve_direct_tac thm ctxt ))) *}
"directly solve a rule"
This could then be used as follows:
lemma "⟦ a; b ⟧ ⟹ a ∧ b"
by (solve conjI)
which should hopefully solve anything solve_direct throws at you.
I found another way of using solve_direct's suggestions with by rule. When certain very basic rules from the library, such as Hilbert_Choice.someI2, are suggested, it seems that one of the facts in context actually is a rule itself, which may be applicable. The following worked for me at least in two concrete situations (source):
re-examine the “rule-like” fact, the other facts (if any) and the goal
if necessary, reorder the other facts
do the proof using <other_facts> ... by (rule <rule-like-fact>)
You can try fact or rule_tac. If I recall correctly, rule sometimes fails to apply a given rule in the presence of other facts and I am not entirely sure why; that question will have to be answered by someone who is more familiar with the implementation details of these methods than I am.

Equivalent of python 'pass' in drracket

Does anyone know if DrRacket has got an equivalent of Python's pass statement or any other idiom that can be used to instruct the executing code to do nothing?
If you want to write an empty statement (one with no useful result), maybe this will work for you:
(void)
... But it'd be better if you demonstrated with an example what exactly is that you want to do. Anyway, here's a link to the appropriate section in the documentation.
Here is a way to do your hash example.
(hash-ref (hash) "not there" void)
But now you have to check whether you get back a void or a value you want. You may also be interested in hash-update! or hash-update (docs), which combines checking for an existing key with a default behavior for the case where the key does not exist.
As I write this answer, your question is kind of spread among (a) your original question and (b) your comment to Oscar's answer:
Its a case of wanting to indexing a hash map with a key that may or may not exist so if a no key found error returns, I just want to ignore the error and continue with the execution.
Taking that last part literally: The general way to handle -- and effectively "ignore" -- an exception is with-handlers:
(hash-ref (hash) "nothing")
;; hash-ref: no value found for key
;; key: "nothing"
(with-handlers ([exn:fail? (lambda (exn)
"hum dee dum")])
(hash-ref (hash) "nothing"))
;; "hum dee dum"
This doesn't seem to have anything to do with Python pass as described here, but maybe pass can be used to ignore errors in Python; I just don't know it well enough
If you can deal with a syntactic form, then begin works.
(if <conditional> (begin))
if you need a function then this works
(lambda ignore #f)
And you might just avoid the need for 'something that does nothing' by reworking your logic leading to the 'nothing'.

How to trace a program for debugging in OCaml ?

I have a general question regarding coding practices...
While debugging, at some point of my code, I need some code to print the current state; When I don't debug, I don't want to leave the code there because it bothers visibility of other code...
It is hard to package them into one function because most of the time, it involves local variables, and I don't want to pass everything as arguments...
So how do you generally manage this kind of "printing/checking" code? is there any good practice?
I used to have a debug function, that would only print the final string only if a flag was set. Now, I prefer to just add if statements:
they are not much longer
nothing is computed is the condition is false
it is easy while reading the code to see that it is only for debugging
I also used to have camlp4 macros, that would generate if statements from function applications, but it only works in projects where camlp4 is used, which I tend to avoid nowadays.
Note that, usually, I use not one debug flag, but many debug flags, one per module, and then meta-tags that will trigger debugging of several modules or orthogonal aspects. They are put in a hashtable as list of flags, and I can set them with either an argument or an environment variable.
I often use a debug function which prints value only when a debug flag is set to true:
let debug_flag = ref false
let debug fmt =
if !debug_flag then Printf.eprintf fmt
else Printf.ifprintf stderr fmt
I use a logging syntax extension:
http://toss.svn.sourceforge.net/viewvc/toss/trunk/Toss/caml_extensions/pa_log.ml?revision=1679&view=markup
You can also pass line number to the logging function (which is hard-coded to AuxIO.log in the source above) using Loc.start_line _loc (perhaps I'll add it).
Note that the conditional should be part of the syntax extension, this way it will not compute the arguments if it does not need to print them. Also, we have the flexible "printf" syntax.
Also, I use a command in Emacs:
(defun camldev-insert-log-entry ()
(interactive)
(insert "(* {{{ log entry *)
LOG 2 \"\";
(* }}} *)")
(goto-char (- (point) 12)))
together with `folding-mode'.

OCaml delimiters and scopes

I'm learning OCaml and although I have years of experience with imperative programming languages (C, C++, Java) I'm getting some problems with delimiters between declarations or expressions in OCaml syntax.
Basically I understood that I have to use ; to concatenate expressions and the value returned by the sequence will be the one of last expression used, so for example if I have
exp1; exp2; exp3
it will be considered as an expression that returns the value of exp3. Starting from this I could use
let t = something in exp1; exp2; exp3
and it should be ok, right?
When am I supposed to use the double semicol ;;? What does it exactly mean?
Are there other delimiters that I must use to avoid syntax errors?
I'll give you an example:
let rec satisfy dtmc state pformula =
match (state, pformula) with
(state, `Next sformula) ->
let s = satisfy_each dtmc sformula
and adder a state =
let p = 0.;
for i = 0 to dtmc.matrix.rows do
p <- p +. get dtmc.matrix i state.index
done;
a +. p
in
List.fold_left adder 0. s
| _ -> []
It gives me syntax error on | but I don't get why.. what am I missing? This is a problem that occurs often and I have to try many different solutions until it suddently works :/
A side question: declaring with let instead that let .. in will define a var binding that lasts whenever after it has been defined?
What I basically ask is: what are the delimiters I have to use and when I have to use them. In addition are there differences I should consider while using the interpreter ocaml instead that the compiler ocamlc?
Thanks in advance!
The ;; delimiter terminates a top-level entity. In the ocaml toplevel (interpreter), it signals to the interpreter that a particular piece of input is finished and should be evaluated.
In programs to be compiled with ocamlc or ocamlopt, you don't need it near as often, as consecutive top-level let (without in), module, type, exception, and similar statements automatically signal the beginning of a new "phrase". If you include a top-level expression in a module that is to be evaluated only for its side-effects (such as generating some output or registering a module), you'll need a ;; before it to tell the compiler to stop compiling the previous phrase and start compiling a new thing. Otherwise, if the previous thing is a let, it will assume that the new expression is part of the let. For example:
let msg = "Hello, world";; (* we need ;; here *)
print_endline msg;; (* ;; is optional here, unless we have another expression *)
When you do and don't need ;; is somewhat subtle, so I usually terminate all my module-level entities with it so I don't have to worry about when it is and isn't needed.
; is used to separate sequential "statements" within a single expression. So foo; bar is a single sequential expression composed of foo and bar, while foo;; bar is only valid at the top level of a module and signifies two expressions.
On let without in: that construct is only valid in a module definition and variables so bound will be bound through the end of the module. Often, this is just the end of the file; if you have nested modules, however, its scope can be more limited. It does not work inside another expression or definition such as a function definition, unless it is within a local module definition.
let p = 0.;
This is the error. The ; needs to be an in. You can't use let without in only to define global functions, you can't use it inside an expression.
A side question: declaring with let instead that let .. in will define a var binding that lasts whenever after it has been defined?
You can only ever use one or the other (except in the interactive interpreter where you are allowed to mix expressions and definitions). When defining a global function or value, you need let without in. Inside an expression you need let with in.
;; is used to terminate input and start interpreting in ocaml REPL, it has no special meaning when compiling with ocamlc or ocamlopt.
You cannot assign to arbitrary value with <- operator, you have to use ref type for mutable variables:
let p = ref 0. in
for i = 0 to dtmc.matrix.rows do
p := !p +. get dtmc.matrix i state.index
done;
a +. !p

Resources