Screen scraping in clojure - ruby

I googled, but I can't find a satisfactory answer. This SO question is related but kinda old as well as the exact opposite of what I am looking for: a way to do screen-scraping using XPath, not CSS selectors.
I've used enlive for some basic screen-scraping but sometimes one needs the power of XPath selectors. So here it is:
Is there any equivalent to Nokogiri or lxml for clojure (java)? What is the state of the "pure java Nokogiri"? Any way to use the library from clojure? Any better alternatives than this hack?

There are a couple of possibilities here.
Several of these require semi-well formed XML to work. If you don't have it, I would pair clj-tagsoup with hiccup to produce the XML (parse with clj-tag-soup, which produces a form that hiccup and write out as XML) and work with that.
First, just use the native JDK capabilities. Assuming the document is well formed enough, try using clj-xpath which provides a wrapper around the native JDK parsing.
If that doesn't suffice, consider taking a more Clojure data structure based route. A simpler path could just use the output of TagSoup and a combination of maps, filters, and nths.
If you need something more advanced, consider using zippers to provide structure around the data, making it easier to manipulate. Use clojure.xml/parse and clojure.zip/xml-zip to produce the zipper, and go from there. An example can be found at http://techbehindtech.com/2010/06/25/parsing-xml-in-clojure/.
Using the native structures is my preferred route for anything complicated, as you can bring the full power of the language to bear.
If you provide a sample of why you need XPath, I can provide some sample code.

Related

Is there an easy way to replace a deprecated method call in Xcode?

So iOS 6 deprecates presentModalViewController:animated: and dismissModalViewControllerAnimated:, and it replaces them with presentViewController:animated:completion: and dismissViewControllerAnimated:completion:, respectively. I suppose I could use find-replace to update my app, although it would be awkward with the present* methods, since the controller to be presented is different every time. I know I could handle that situation with a regex, but I don't feel comfortable enough with regex to try using it with my 1000+-files-big app.
So I'm wondering: Does Xcode have some magic "update deprecated methods" command or something? I mean, I've described my particular situation above, but in general, deprecations come around with every OS release. Is there a better way to update an app than simply to use find-replace?
You might be interested in Program Transformation Systems.
These are tools that can automatically modify source code, using pattern-directed source-to-source transformations ("if you see this source-level pattern, replace it by that source-level pattern") that operate on code structures rather than text. Done properly, these transformations can be reliable and semantically correct, and they're a lot easier to write than low-level procedural code that navigates and smashes nanoscopic actual tree structures.
It is not the case that using such tools is easy; such tools have to know how to parse the language of interest into compiler data structures, (e.g., ObjectiveC), process the patterns, and regenerate compilable source code from the modified structures. Even with the basic transformation engine, somebody needs to carefully define parsers (and unparsers!) for the dialects of the languages of interest. And it takes time to learn how to use such a even if you have such parsers/unparsers. This is worth it if the changes you need to make are "regular" (in the program transformation sense, not the regexp sense) and widespread (as yours seem to be).
Our DMS Software Reengineering toolkit has an ObjectiveC front end, and can carry out such transformations.
no there is no magic like that

Scala dynamic class management

I would like to know if the following is possible in Scala (but I think the question can be applied also to Java):
Create a Scala file dynamically (ok, no problem here)
Compile it (I don't think this would be a real problem)
Load/Unload the new class dynamically
Aside from knowing if dynamic code loading/reloading is possible (it's possible in Java so I think it's feasible also in Scala) I would like also to know the implication of this in terms of performance degradation (I could have many many classes, with no name clash but really many of them!).
TIA!
P.S.: I know other questions about class loading in Scala exist, but I haven't been able to find an answer about performance!
Yes, everything you want to do is certainly possible. You might like to take a look at ScalaMock, which is an example of creating Scala source code dynamically. And at SBT which is an example of calling the compiler from code. And then there are many different systems that load classes dynamically - look at the documentation for loadLibrary as a starting point.
But, depending on what you want to achieve, you might like to look at Scala Macros instead. They provide the same kind of flexibility as you would get by generating source code and then compiling it, but without many of the downsides of that approach. The original version of ScalaMock used to work by generating source code, but I'm in the process of moving to using macros instead.
It's all possible in Scala, as is clearly demonstrated by the REPL. It's even going to be relatively easy with Scala 2.10.

Parsing XML, how is this actually done? [duplicate]

So, just as a fun project, I decided I'd write my own XML parser. No, not to parse a specific document, and no, not using an XML parser library. I mean writing code to parse out any XML document into a usable data structure. Just because I like the challenge. :-)
With that said, so far it's proved to be... interesting. It's not as easy to parse (especially when you start taking into account special characters, CDATA, empty tags, comments, etc.) as it initially looked.
Are there any well documented XML parsing algorithms or explanations anywhere that anyone knows of? It seems like there are well-documented Queue and Stack and BTree and etc. etc. etc. implementations everywhere, but I'm not sure I've ever seen a simple, well-documented XML parser algorithm...
I repeat: I am not looking for a pre-built parser library! I am looking for information on how to create my own pre-built parser library! Do not tell me "use expat" or "use SAX" or whatever. That's not what I'm asking for.
Antlr offers a tutorial on parsing XML. It breaks the process down into phases: lexing, parsing, tree parsing, etc. Looks pretty interesting.
I don't know if it would be "cheating" in your book, but you could try parsing your XML with a ready-built all-purpose language parser like ANTLR. The result would be a list of tokens (if you just use the lexer) or a parse tree (if you include the parser) and you could then re-build the parse tree almost 1:1 into an XML structure.
Maybe. I haven't thought about the ways in which XML might be different from "normal" ANTLR fodder like programming languages, and whether you would be able to define a suitable grammar.
VTD-XML is probably the simplest parsing technique possible...
http://expat.sourceforge.net/
Expat is an XML parser library written in C. It is a stream-oriented parser in which an application registers handlers for things the parser might find in the XML document (like start tags). An introductory article on using Expat is available on xml.com.

How can one get a list of Mathematica's built-in global rewrite rules?

I understand that over a thousand built-in rewrite rules in Mathematica populate the global rules table by default. Is there any way to get Mathematica to give a full or even partial list of those rules?
The best way is to get a job at Wolfram Research.
Failing that, I think that for things not completely compiled into the kernel you can recover most of the rules/definitions. Look at
Attributes[fn]
where fn is the command that you're interested in. If it returns
{Protected, ReadProtected}
then there's something you can get a look at (although often it's just a MakeBoxes (formatting) definition or a AutoLoad/Stub type definition). To see what's there run
Unprotect[fn];
ClearAttributes[fn, ReadProtected];
??fn
Quite often you'll have to run an example of the command to load it if it was a stub. You'll also have to dig down from the user-facing commands to the back-end implementations.
Eventually you'll most likely reach a core command that is compiled into the kernel that you can not see the details of.
I previously mentioned this in tips for creating Graph diagrams and it got a mention in What is in your Mathematica tool bag?.
An good example, with a nice bite-sized and digestible bit of code is Experimental`AngularSlider[] mentioned in Circular/Angular slider. I'll leave it up to you to look at the code produced.
Another example is something like BoxWhiskerChart, where you need to call it once in order to load all of the code. Then you see that BoxWhiskerChart proceeds to call Charting`iBoxWhiskerChart which you'll have to unprotect to look at, etc...

Unifying enums across multiple languages

I have one large project with components in multiple languages that each depend on some of the same enum values. What solutions have you come up with to unify enums across multiple arbitrary languages? I can think of a few, but I'm looking for the best solution.
(In my implementation, I'm using Php, Java, Javascript, and SQL.)
You can put all of the enums in a text file, then use a code generator to write out the appropriate syntax for each language from that common file so that each component has the enums. Make that text file the authoritative source of information.
You can express the text file in XML but I'd think a tab-delimited flat file would work just fine.
Make them in a format that every language can understand or has a library for. I am using JSON for this at the moment.
Then you can include it with two ways:
For development: Load it from a file/URL at runtime
good for small changes you want too see immediately
slow
For productive usage: Include it in the files
using a build script
fast
no instant feedback
I would apply the dry principle and using code generator as such you could add anew language easely even if it has not enum natively existing.

Resources