cross validation in crf++ - crf

I was wondering how to do cross validation in CRF++. It was written in the documentation:
crf_learn -f 3 -c 1.5 template_file train_file model_file
-c float:
With this option, you can change the hyper-parameter for the CRFs. With larger C value,
CRF tends to overfit to the give training corpus. This parameter trades the balance
between overfitting and underfitting. The results will significantly be influenced
by this parameter. You can find an optimal value by using held-out data or more
general model selection method such as cross validation.
How can one do cross validation as mentioned in this manual?

The manual is trying to say that you can figure out the optimal value of C parameter by performing cross validation or testing on a held-out set yourself. CRF++ doesn't have such a functionality.
Thanks

If my understanding is correct, CRF++ doesn't have built-in cross-validation functionality. We have to do it separately.
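Doing it separately could look roughly like the sketch below. It assumes crf_learn and crf_test are on the PATH, that the data file uses the usual CRF++ layout (one token per line, gold tag in the last column, blank line between sentences), and that plain token accuracy is an acceptable score. The helper names (read_sentences, k_folds, token_accuracy, cross_validate) and the cv_tmp directory are made up for illustration; none of this ships with CRF++.

```python
import subprocess
from pathlib import Path

def read_sentences(text):
    """Split CRF++ data into sentences (lists of lines), using blank lines as separators."""
    sentences, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences

def k_folds(sentences, k):
    """Yield (train, held_out) sentence lists for each of the k folds."""
    for i in range(k):
        held_out = sentences[i::k]
        train = [s for j, s in enumerate(sentences) if j % k != i]
        yield train, held_out

def write_sentences(path, sentences):
    """Write sentences back in CRF++ format, one blank line after each sentence."""
    Path(path).write_text("".join("\n".join(s) + "\n\n" for s in sentences))

def token_accuracy(crf_test_output):
    """crf_test appends the predicted tag as the last column; the gold tag is the one before it."""
    correct = total = 0
    for line in crf_test_output.splitlines():
        cols = line.split()
        if len(cols) >= 2:
            total += 1
            correct += cols[-1] == cols[-2]
    return correct / total if total else 0.0

def cross_validate(template, data_file, c_values, k=5, workdir="cv_tmp"):
    """Return {c: mean held-out token accuracy}. Runs crf_learn/crf_test once per fold and c value."""
    work = Path(workdir)
    work.mkdir(exist_ok=True)
    sentences = read_sentences(Path(data_file).read_text())
    scores = {}
    for c in c_values:
        fold_scores = []
        for i, (train, held_out) in enumerate(k_folds(sentences, k)):
            train_f, test_f, model_f = (work / f"{n}{i}" for n in ("train", "test", "model"))
            write_sentences(train_f, train)
            write_sentences(test_f, held_out)
            subprocess.run(["crf_learn", "-c", str(c), template, str(train_f), str(model_f)],
                           check=True, capture_output=True)
            out = subprocess.run(["crf_test", "-m", str(model_f), str(test_f)],
                                 check=True, capture_output=True, text=True).stdout
            fold_scores.append(token_accuracy(out))
        scores[c] = sum(fold_scores) / len(fold_scores)
    return scores
```

Calling something like cross_validate("template_file", "train_file", [0.5, 1.0, 1.5, 3.0]) and picking the c with the highest score would then be the model selection step the manual alludes to. For chunking or NER tasks an F-score (e.g. from conlleval) is probably a better criterion than token accuracy.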


Optimize other parameters than the predefined, using step-wise algorithm in optuna.integration.lightgbm

As far as I understand, the LightGBM integration in Optuna uses a step-wise algorithm to optimize the hyper-parameters such as lambda_l1, lambda_l2 etc.
Although it is great, I would very much like to add additional parameters such as learning_rates.
I know I can just use Optuna the "regular" way, but since the integrated LightGBM part should be way faster, I would prefer using that.
Is there a way to add additional parameters to optimize, or are we forced to use all of (and only) the specified parameters? I can see there's, e.g., a parameter called learning_rates, but the docs do not specify what it does or how to use it (I think it's the learning rate for each tree). Setting it in the lgb.train like
model = lgb.train(
    params,
    dtrain,
    valid_sets=[dtrain, dval],
)

Pros and Cons of GraphQL query directives

I am faced with the following decision:
query {
  # Option A
  date(format: "DD/MM/YYYY")
  # Option B
  date @formatDate(format: "DD/MM/YYYY")
}
and am not entirely sure which option to go with. In both cases, date per se returns an integer timestamp; formatting it yields a string.
Personally, I think that B is the better option, given the following arguments:
Pro:
- Separation of arguments for data fetching and post-processing
- Reusability (@formatDirective needs to be defined once and could be used on any field returning a date (custom scalar), whereas A would need an implementation for each field that provides a date)
- Extendability (allows for directive chaining, and also makes it easy to add new directives at a later stage)
Con:
- A good number of articles (even the Apollo docs) discourage the use of query directives
I am not (yet haha) interested in arguments regarding the actual implementation. I would like to work my way backwards, starting with the result I want to achieve.
Should I move this question to the Software Engineering Stack Exchange, does it fit better there?

PyFMI parameter estimation and handling of fixed model parameters different from default

I have started using parameter estimation in PyFMI with the procedure model.estimate(), and it works well.
From the documentation (Andersson et al 2016) as well as practical use, I understand that model parameters that are not estimated are taken from the compiled FMU model. It would be very practical to have an option to provide a dictionary with a set of fixed parameter values that differ from the model defaults. Is there any way to provide that?
The current workflow for a larger model built up of parts from libraries is that you need to make a copy of these models, set the parameters to the proper values in the code, and then compile it. It is a somewhat tedious procedure. Perhaps I have misunderstood something?
Andersson et al (2016): "PyFMI: A Python package for…"
https://portal.research.lu.se/portal/files/7201641/pyfmi_tech.pdf
From my contact Christian Winther at Modelon, I learned that I have understood the workflow correctly. He also sees the advantage of being able to provide a list (or dictionary) of parameters that are changed from the defaults and remain constant during parameter estimation. It may come in a future update.

How to pass functions as arguments to other functions in Julia without sacrificing performance?

EDIT to try to address @user2864740's edit and comment: I am wondering if there is any information particularly relevant to 0.4rc1/rc2 or in particular a strategy or suggestion from one of the Julia developers more recent than those cited below (particularly @StefanKarpinski's Jan 2014 answer in #6 below). Thx
Please see e.g.
https://groups.google.com/forum/#!topic/julia-users/pCuDx6jNJzU
https://groups.google.com/forum/#!topic/julia-users/2kLNdQTGZcA
https://groups.google.com/forum/#!msg/julia-dev/JEiH96ofclY/_amm9Cah6YAJ
https://github.com/JuliaLang/julia/pull/10269
https://github.com/JuliaLang/julia/issues/1090
Can I add type information to arguments that are functions in Julia?
Performance penalty using anonymous function in Julia
(As a fairly inexperienced Julia user) my best synthesis of this information, some of which seems to be dated, is that the best practice is either "avoid doing this" or "use FastAnonymous.jl."
I'm wondering what the bleeding edge latest and greatest way to handle this is.
[Longer version:]
In particular, suppose I have a big hierarchy of functions. I would like to be able to do something like
function transform(function_one::Function{from A to B},
                   function_two::Function{from B to C},
                   function_three::Function{from A to D})
    function::Function{from Set{A} to Dict{C,D}}(set_of_As::Set{A})
        Dict{C,D}([function_two(function_one(a)) => function_three(a)
                   for a in set_of_As])
    end
end
Please don't take the code too literally. This is a narrow example of a more general form of transformation I'd like to be able to do regardless of the actual specifics of the transformation, BUT I'd like to do it in such a way that I don't have to worry (too much) about checking the performance (that is, beyond the normal worries I'd apply in any non-function-with-function-as-parameter case) each time I write a function that behaves this way.
For example, in my ideal world, the correct answer would be "so long as you annotate each input function with @anon before you call this function with those functions as arguments, then you're going to do as well as you can without tuning to the specific case of the concrete arguments you're passing."
If that's true, great--I'm just wondering if that's the right interpretation, or if not, if there is some resource I could read on this topic that is closer to a "logically" presented synthesis than the collection of links here (which are more a stream of collective consciousness or history of thought on this issue).
The answer is still "use FastAnonymous.jl," or create "functor types" manually (see NumericFuns.jl).
If you're using julia 0.4, FastAnonymous.jl works essentially the same way that official "fast closures" will eventually work in base julia. See https://github.com/JuliaLang/julia/issues/11452#issuecomment-125854499.
(FastAnonymous is implemented in a very different way on julia 0.3, and has many more weaknesses.)

Complicated Algorithm - How to store rules separate from processing code?

I'm working on a project which will do some complicated analysis on user-supplied input. There will be 3 parts of the code:
1) Input supplied by user, such as keywords
2) Rules, such as if keyword 1 is repeated 3 times in keyword 5, do this, etc.
3) The analysis itself, which executes the rules, processes the user input, and generates the necessary output.
Naturally this will lead to a lot of spaghetti code and many, many if statements in the processing code. I want to avoid that, and keep the rules (i.e. the if statements) separate from the code which loops through the user input and generates the output.
How can I do that, i.e. what is the best way?
If you have enough rules that you want to externalize, you could try using a business rules engine, like Drools in Java.
A business rules engine is a software system that executes one or more business rules in a runtime production environment. The rules might come from legal regulation ("An employee can be fired for any reason or no reason but not for an illegal reason"), company policy ("All customers that spend more than $100 at one time will receive a 10% discount"), or other sources. (Wikipedia)
It could be a bit of overhead, depending on what you're trying to do. In my company we use this kind of tool for our quality analysis tool.
Store it in XML. Easy to parse and update.
I designed a code generator that is controlled from an XML file.
For each command I had an entry in the XML. I processed the node to generate the opcode for that command. The node itself contains the actions I need to perform to get the opcode. For some commands I had to look something up in a database; all of that went into this XML file as well.
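To make the XML idea concrete for the keyword case in the question, here is a minimal Python sketch. The element and attribute names (rule, keyword, min_count, action) and the action strings are invented for illustration; they are not from the code generator described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule file content; in practice this would live in its own .xml file.
RULES_XML = """
<rules>
  <rule keyword="foo" min_count="3" action="flag_repeat"/>
  <rule keyword="bar" min_count="1" action="flag_present"/>
</rules>
"""

def load_rules(xml_text):
    """Parse rules into plain dicts so the processing code never touches XML."""
    root = ET.fromstring(xml_text)
    return [
        {
            "keyword": node.get("keyword"),
            "min_count": int(node.get("min_count")),
            "action": node.get("action"),
        }
        for node in root.findall("rule")
    ]

def apply_rules(rules, words):
    """Return the action of every rule whose keyword occurs at least min_count times."""
    return [r["action"] for r in rules if words.count(r["keyword"]) >= r["min_count"]]

rules = load_rules(RULES_XML)
actions = apply_rules(rules, ["foo", "x", "foo", "bar", "foo"])
```

The processing loop stays the same when rules are added or changed; only the XML is edited.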
Well, I doubt that huge if statements are necessary if polymorphism is applied correctly.
Actually, you need a proper domain model for your rules. This goes somewhat in the direction of the command pattern, maybe in combination with the state machine pattern, depending on the complexity of your code.
Once you have your model, defining rules is just a matter of instantiating them correctly.
This could be done by having an xml definition, which is parsed and transformed into your model. But the new modern and even more fancy way would be using DSLs. If you program in Java and have a certain freedom about your libraries, this would be a proper use case for Embedded DSLs with Groovy. Basically you would need a Builder which constructs your model, that's all.
You can always implement a factory that creates certain strategies according to the parameters passed in. Then you use those strategies in your code without any if statements.
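A small Python sketch of that factory-plus-strategies idea, assuming rules are predicates over a list of words. The registry, the decorator, and the rule names ("repeated", "absent") are all hypothetical and only there to show the shape:

```python
from typing import Callable, Dict, List

# Registry mapping a rule name to the function that builds its strategy.
STRATEGIES: Dict[str, Callable[..., Callable[[List[str]], bool]]] = {}

def strategy(name):
    """Decorator that registers a strategy builder under the given name."""
    def register(builder):
        STRATEGIES[name] = builder
        return builder
    return register

@strategy("repeated")
def repeated(keyword, times):
    # Strategy: true when the keyword occurs at least `times` times.
    return lambda words: words.count(keyword) >= times

@strategy("absent")
def absent(keyword):
    # Strategy: true when the keyword does not occur at all.
    return lambda words: keyword not in words

def make_rule(name, **params):
    """Factory: look up the strategy builder by name and configure it."""
    return STRATEGIES[name](**params)

rules = [
    make_rule("repeated", keyword="foo", times=3),
    make_rule("absent", keyword="baz"),
]
words = ["foo", "bar", "foo", "foo"]
results = [rule(words) for rule in rules]  # the processing loop has no rule-specific ifs
```

New kinds of rules are added by registering another builder; the loop that evaluates them does not change.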
If it's just detecting keywords, a finite state machine or similar. If it's doing more, then other pattern matching systems, such as rules engines.
Adding an embedded scripting language to your application might help. The rules would then be expressed in scripts, executed by the applications on processing.
The idea is that scripts are easy to change and contain high level logic that will be executed by your application in details.
There are a lot of scripting languages available for this: Lua, Python, Falcon, Squirrel, AngelScript, etc.
Have a look at rule engines!
The approach from Lars may also be arguable.
