Preprocessing methods of H2O AutoML

What preprocessing methods are used in the AutoML pipeline? It would be nice to have a brief summary of all these steps in the docs. By preprocessing, I mean one-hot encoding, normalization, imputation, etc.
Thank you,
Yassine

Currently, only target encoding is done, but more preprocessing steps are being added. See the docs for more info.

SAT Solver: SAT4J - more examples?

I haven't used a SAT solver before, so I started to learn how to use SAT4J. Mostly I am using its API, but I sometimes find it hard to understand what some arguments (in classes or methods) mean or what format/type is acceptable. For example:
public BinaryClause(IVecInt ps, ILits voc)
My question is whether there are some usage examples that can help me better understand the features implemented in SAT4J?
Thank you in advance!
You can find some usage examples of most features in the unit tests:
http://www.sat4j.org/maven234/org.ow2.sat4j.core/xref-test/index.html
The BinaryClause class is not meant to be used by end users:
http://www.sat4j.org/maven234/org.ow2.sat4j.core/apidocs/index.html
We try to keep the user-level documentation up to date. The developer-level code may change over time, so it may lack documentation.
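To complement the links, here is a minimal sketch of the user-level API (my own example, not taken from the SAT4J docs): clauses are added through ISolver and VecInt as DIMACS-style integer literals, so internal classes like BinaryClause never need to be touched directly.
import org.sat4j.core.VecInt;
import org.sat4j.minisat.SolverFactory;
import org.sat4j.specs.ContradictionException;
import org.sat4j.specs.ISolver;
import org.sat4j.specs.TimeoutException;

public class Sat4jExample {
    public static void main(String[] args) throws ContradictionException, TimeoutException {
        ISolver solver = SolverFactory.newDefault();
        solver.newVar(3); // variables 1..3
        // Clauses use DIMACS conventions: positive ints are literals, negative ints are negated literals.
        solver.addClause(new VecInt(new int[] { 1, -2 }));  // x1 OR NOT x2
        solver.addClause(new VecInt(new int[] { 2, 3 }));   // x2 OR x3
        solver.addClause(new VecInt(new int[] { -1, -3 })); // NOT x1 OR NOT x3
        if (solver.isSatisfiable()) {
            // model() returns a satisfying assignment, again in DIMACS form
            System.out.println(java.util.Arrays.toString(solver.model()));
        } else {
            System.out.println("UNSAT");
        }
    }
}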

CoreNLP: How can I get only collapsed dependencies?

I'm parsing over 60,000 sentences with CoreNLP to get dependency relations.
Because I only need collapsed dependencies, the other dependency types -- basic and collapsed-cc-processed -- are redundant for my use, and they make it hard to build my own code, which takes the XML output as input.
Can I get only collapsed dependencies?
If so, please let me know.
Thanks.
There is currently no way to do this. Computing the additional representations takes very little computation, and so they are always reported. They should be marked specially in the XML output, however, so hopefully it's not hard to filter out the correct representation in your downstream code.
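As one way to cope downstream, here is a minimal sketch using only the JDK's DOM API that strips every dependencies element whose type is not the collapsed one before your own code sees the XML. The element name "dependencies" and the attribute value "collapsed-dependencies" are assumptions based on typical CoreNLP XML output, so verify them against your files.
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class KeepCollapsedDeps {
    public static void main(String[] args) throws Exception {
        // args[0] = CoreNLP XML input, args[1] = filtered XML output
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new File(args[0]));
        NodeList deps = doc.getElementsByTagName("dependencies");
        // Collect first, then remove, so the live NodeList isn't mutated while iterating.
        List<Element> toRemove = new ArrayList<>();
        for (int i = 0; i < deps.getLength(); i++) {
            Element e = (Element) deps.item(i);
            if (!"collapsed-dependencies".equals(e.getAttribute("type"))) {
                toRemove.add(e);
            }
        }
        for (Element e : toRemove) {
            e.getParentNode().removeChild(e);
        }
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.transform(new DOMSource(doc), new StreamResult(new File(args[1])));
    }
}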

Find image in another image using JavaCV

I want to find an image in another image. I already tried a "template matching" approach, but I don't know how to make it invariant to changes in scale, rotation, perspective, etc.
I have read about feature detection and suspect that using SIFT features might be the best approach. Besides that, I need a feature detection implementation that uses JavaCV, not OpenCV directly.
Is there any implementation using feature detection, or any other suggestion for my problem?
If you understand the basics of JavaCV, you can look at the ObjectFinder example that ships with JavaCV.
ObjectFinder # code.google.com
This example shows you how to do the important steps to solve your problem.
Before using the ObjectFinder you have to call the following method to load the non-free modules (e.g. SURF):
com.googlecode.javacv.cpp.opencv_nonfree.initModule_nonfree();
Just for completeness: you can use the image feature matching capabilities provided by OpenCV, described here. There is even a full implementation of a well-working matcher with JavaCV (in Scala though, but it's easily portable to Java).
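As a rough illustration of that pipeline, here is a minimal sketch of descriptor matching with the OpenCV classes bundled in current (Bytedeco) JavaCV. It uses ORB instead of SURF so that no non-free module has to be loaded; the package and class names are assumptions based on the JavaCV 1.5+ presets and do not apply to the old com.googlecode.javacv releases mentioned above, and the file names are hypothetical.
import org.bytedeco.opencv.opencv_core.DMatchVector;
import org.bytedeco.opencv.opencv_core.KeyPointVector;
import org.bytedeco.opencv.opencv_core.Mat;
import org.bytedeco.opencv.opencv_features2d.BFMatcher;
import org.bytedeco.opencv.opencv_features2d.ORB;
import static org.bytedeco.opencv.global.opencv_core.NORM_HAMMING;
import static org.bytedeco.opencv.global.opencv_imgcodecs.IMREAD_GRAYSCALE;
import static org.bytedeco.opencv.global.opencv_imgcodecs.imread;

public class FindTemplate {
    public static void main(String[] args) {
        Mat scene = imread("scene.jpg", IMREAD_GRAYSCALE);       // hypothetical input files
        Mat template = imread("template.jpg", IMREAD_GRAYSCALE);

        // Detect keypoints and compute binary descriptors for both images.
        ORB orb = ORB.create();
        KeyPointVector kpScene = new KeyPointVector(), kpTemplate = new KeyPointVector();
        Mat descScene = new Mat(), descTemplate = new Mat();
        orb.detectAndCompute(scene, new Mat(), kpScene, descScene);
        orb.detectAndCompute(template, new Mat(), kpTemplate, descTemplate);

        // Brute-force matching with Hamming distance (ORB descriptors are binary).
        BFMatcher matcher = new BFMatcher(NORM_HAMMING, true);
        DMatchVector matches = new DMatchVector();
        matcher.match(descTemplate, descScene, matches);
        System.out.println("matches: " + matches.size());
        // With enough good matches, the template's location in the scene can be
        // estimated from the matched keypoints (e.g. via a homography).
    }
}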

How can one get a list of Mathematica's built-in global rewrite rules?

I understand that over a thousand built-in rewrite rules in Mathematica populate the global rules table by default. Is there any way to get Mathematica to give a full or even partial list of those rules?
The best way is to get a job at Wolfram Research.
Failing that, I think that for things not completely compiled into the kernel you can recover most of the rules/definitions. Look at
Attributes[fn]
where fn is the command that you're interested in. If it returns
{Protected, ReadProtected}
then there's something you can get a look at (although often it's just a MakeBoxes (formatting) definition or an AutoLoad/Stub-type definition). To see what's there, run
Unprotect[fn];
ClearAttributes[fn, ReadProtected];
??fn
Quite often you'll have to run an example of the command to load it if it was a stub. You'll also have to dig down from the user-facing commands to the back-end implementations.
Eventually you'll most likely reach a core command that is compiled into the kernel, whose details you cannot see.
I previously mentioned this in tips for creating Graph diagrams and it got a mention in What is in your Mathematica tool bag?.
A good example, with a nice bite-sized and digestible bit of code, is Experimental`AngularSlider[], mentioned in Circular/Angular slider. I'll leave it up to you to look at the code produced.
Another example is something like BoxWhiskerChart, where you need to call it once in order to load all of the code. Then you see that BoxWhiskerChart proceeds to call Charting`iBoxWhiskerChart which you'll have to unprotect to look at, etc...

How to build a static code analysis tool?

I'm in the process of understanding and building a static code analysis tool for a proprietary language from a big company. The reason for doing this: I have to review a rather large code base, a static code analysis tool would help a lot, and they do not have one for the language so far.
I would like to know how one goes about building a static code analysis tool, e.g. Lint or Splint for C.
Any books, articles, blogs, sites, etc. would help.
Thanks.
I know this is an old post, but the answers don't really seem that satisfactory. This article is a pretty good introduction to the technology behind static analysis tools, and it has several links to examples.
A good book is "Secure Programming with Static Analysis" by Brian Chess and Jacob West.
You need good infrastructure, such as a parser, a tree builder, tree analyzers, symbol table builders, and flow analyzers, and then, to get on with your specific task, you need to code specific checks for the specific problems of interest to you, using all that infrastructure machinery.
Building all that foundation machinery is actually pretty hard, and it doesn't help you do your specific task. People don't write the operating system for every application they code; why should you build all the infrastructure? Like an OS, it is better if you simply acquire good infrastructure.
People will tell you to use lex and yacc. That's kind of like suggesting you use the real-time kernel part of the OS: useful, but far from all the infrastructure you really need.
Our DMS Software Reengineering Toolkit provides all the necessary infrastructure. It has been used to define many language front ends as well as many tools for such languages.
Such infrastructure would allow you to define your specific nonstandard language relatively quickly, and then get on with your task of coding your special checks.
There is a blog post by DeepSource that covers what you need to know to build an understanding of static code analysis, and it equips you with the basic theory and the right tools so that you can write analyzers on your own.
Here’s the link: https://deepsource.io/blog/introduction-static-code-analysis/
1. Obviously you need a parser for the language. A good high-level AST is useful.
2. You need to enumerate a set of "mistakes" in the language. Without knowing more about the language in question, we can't help here. Examples: unallocated pointers in C, etc.
3. Combine the AST with the mistakes in #2; a minimal sketch of this step follows below.
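To make step 3 concrete, here is a minimal, self-contained sketch of running one check over an AST. Everything in it is hypothetical: the toy Stmt/Assign/Print node types and the rule being checked are mine, not from any real tool. The parser from step 1 is assumed to have produced the tree, and the analyzer walks it looking for one enumerated mistake: a variable read before it is ever assigned.
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A toy AST for a made-up language: statements are either assignments or prints.
interface Stmt {}
record Assign(String target, String sourceVarOrNull) implements Stmt {}
record Print(String var) implements Stmt {}

public class UseBeforeAssignCheck {
    // The "mistake" from step 2 being checked: a variable is read before any assignment to it.
    static void check(List<Stmt> program) {
        Set<String> assigned = new HashSet<>();
        for (Stmt s : program) {
            if (s instanceof Print p && !assigned.contains(p.var())) {
                System.out.println("warning: '" + p.var() + "' read before assignment");
            }
            if (s instanceof Assign a) {
                if (a.sourceVarOrNull() != null && !assigned.contains(a.sourceVarOrNull())) {
                    System.out.println("warning: '" + a.sourceVarOrNull() + "' read before assignment");
                }
                assigned.add(a.target());
            }
        }
    }

    public static void main(String[] args) {
        // In a real tool this tree would come from the parser (step 1).
        List<Stmt> program = List.of(
                new Print("x"),        // x used before it is assigned -> flagged
                new Assign("x", null), // x = <literal>
                new Assign("y", "x"),  // y = x; x is assigned by now -> fine
                new Print("y"));
        check(program);
    }
}
The same shape scales up: the real front end supplies the AST plus symbol and flow information, and each "mistake" from step 2 becomes one traversal (or one pattern) over that tree.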
