Optimize parameters other than the predefined ones using the step-wise algorithm in optuna.integration.lightgbm

As far as I understand, the LightGBM integration in Optuna uses a step-wise algorithm to optimize hyper-parameters such as lambda_l1, lambda_l2, etc.
Although it is great, I would very much like to add additional parameters such as learning_rates.
I know I can just use Optuna the "regular" way, but since the integrated LightGBM part should be much faster, I would prefer to use that.
Is there a way to add additional parameters to optimize, or are we forced to use all of (and only) the specified parameters? I can see there is, e.g., a parameter called learning_rates, but the docs do not specify what it does or how to use it (I think it's the learning rate for each tree). Setting it in lgb.train like this:
model = lgb.train(
    params,
    dtrain,
    valid_sets=[dtrain, dval],
)
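
For context, here is a minimal, self-contained sketch of the drop-in tuner setup described above. It assumes (as I understand the integration) that parameters placed in params which are not part of the tuner's step-wise search space, such as learning_rate, are simply passed through to LightGBM unchanged; the toy data is made up for illustration:

import numpy as np
import optuna.integration.lightgbm as lgb  # drop-in replacement for lightgbm's train API

# Toy data, just to keep the sketch self-contained.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)
X_train, X_val, y_train, y_val = X[:400], X[400:], y[:400], y[400:]

dtrain = lgb.Dataset(X_train, label=y_train)
dval = lgb.Dataset(X_val, label=y_val, reference=dtrain)

params = {
    "objective": "binary",
    "metric": "binary_logloss",
    "verbosity": -1,
    # learning_rate is not one of the step-wise-tuned parameters, so (under the
    # assumption above) it stays fixed at this value while lambda_l1, lambda_l2,
    # num_leaves, etc. are tuned.
    "learning_rate": 0.05,
}

model = lgb.train(
    params,
    dtrain,
    valid_sets=[dtrain, dval],
    num_boost_round=100,
)

Note that this only fixes learning_rate by hand; it does not add it to the step-wise search, which is the part the question is actually asking about.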

Does cscope have a query language/api?

I am trying to do some deep code analysis on a Python 2 codebase that is large and messy enough that most analysis tools I've tried have not worked. I have, however, been able to use pycscope to generate a cscope database. I can even use it to do basic things like find function usages or all functions called directly from a given function.
To me, it seems like the fact that there is a database, and that it can be used for simple things, means it should be possible to use it for more complex things too. For example, I'd like to find all functions that are called from within a given function recursively, or all code paths that depend on that function.
However, the cscope documentation is very light and I'm not much of a C expert. Is there an actual query language for it, or an extension that knows how to use the database in this manner?
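
For what it's worth, here is a rough sketch of the kind of scripting cscope's line-oriented mode allows. It assumes a cscope database in the current directory (e.g. the one generated by pycscope), that query field 2 ("functions called by this function") behaves as on a standard cscope build, and that the exact flags and output layout may differ slightly between versions; the function name "main" is just a placeholder.

import subprocess

def callees(func, db_dir="."):
    """Functions that cscope reports as called by `func` (query field 2)."""
    # -d: do not rebuild the database, -L: line-oriented output,
    # -2 <pattern>: "Find functions called by this function".
    out = subprocess.run(
        ["cscope", "-d", "-L", "-2", func],
        cwd=db_dir, capture_output=True, text=True,
    ).stdout
    found = set()
    for line in out.splitlines():
        parts = line.split()
        # Result lines typically look like "<file> <function> <line number> <text>".
        if len(parts) >= 3:
            found.add(parts[1])
    return found

def transitive_callees(func, db_dir="."):
    """All functions reachable from `func` by repeatedly following call edges."""
    seen, stack = set(), [func]
    while stack:
        for callee in callees(stack.pop(), db_dir):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen

print(transitive_callees("main"))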

Seaborn global hue order

I very often use the hue argument to distinguish between categories, but it seems like seaborn isn't consistent in how it matches hues to categories (from what I've read, it depends on the plotted data, in particular its order). I would like to avoid passing the hue_order argument everywhere, because I know I will forget it at some point and not notice it (which will lead to misinterpretations, since I will assume the hues are correct).
Is there a way to set the hue_order globally (a fixed order for all plots)?
Even better, would it be possible to make all categorical indexes behave the same (e.g., alphanumeric order)?
For now I use the following ugly strategy:
SNS_SETTINGS = dict(hue_order=[...])
sns.displot(df, **SNS_SETTINGS, x="time", kind="ecdf", hue="algorithm")
A very practical solution is to add the hue parameter to the SNS_SETTINGS dictionary as well. This coupling will ensure the needed consistency across your plots.
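A minimal sketch of that suggestion; the column and category names are made up for illustration:

import pandas as pd
import seaborn as sns

SNS_SETTINGS = dict(hue="algorithm", hue_order=["baseline", "variant_a", "variant_b"])

df = pd.DataFrame({
    "time": [1.0, 2.5, 1.7, 3.2, 2.1, 2.9],
    "algorithm": ["baseline", "variant_a", "variant_b"] * 2,
})

# Every plot that unpacks SNS_SETTINGS now uses the same hue column and the
# same category-to-colour mapping.
sns.displot(df, **SNS_SETTINGS, x="time", kind="ecdf")
sns.displot(df, **SNS_SETTINGS, x="time", kind="kde")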
Another solution, which may or may not be adequate in your specific case, is to define custom functions with functools.partial, defining the parameters once to get shorter function calls:
from functools import partial
displot_by_algorithm = partial(sns.displot, hue="algorithm", hue_order=[...])
This way, you can later call
displot_by_algorithm(df, x="time", kind="ecdf")
Of course, you will have to define such a function for each of the plotting functions you want to use, so the trade-off might not be worth it.
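As for the second part of the question (making categorical data behave consistently everywhere): as far as I know, seaborn respects the category order of a pandas Categorical column, so another option, not covered above, is to convert the relevant columns to an ordered categorical dtype once, right after loading the data. A minimal sketch with made-up data:

import pandas as pd
import seaborn as sns

ALGORITHM_ORDER = ["baseline", "variant_a", "variant_b"]  # hypothetical fixed order

df = pd.DataFrame({
    "time": [1.0, 2.5, 1.7, 3.2, 2.1, 2.9],
    "algorithm": ["variant_b", "baseline", "variant_a"] * 2,
})
# Declare the column as an ordered categorical once; plots that use it as hue
# should then share the same ordering without passing hue_order each time.
df["algorithm"] = pd.Categorical(df["algorithm"], categories=ALGORITHM_ORDER, ordered=True)

sns.displot(df, x="time", kind="ecdf", hue="algorithm")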

Why do we need to separate or break down one Use Case into two or more use cases?

Why do you need to, in many instances, separate or break down one Use Case into two or more use cases?
The only reason to split a use case in multiple use cases is to share a significant piece of functionality among multiple use cases by isolating that piece of functionality in a separate use case.
Example: 'search product information' may be a separate use case included by use cases 'buy product' and 'hire product'.
Apart from 'include' there are also examples of the same principle using 'extend' or 'generalize'.
By doing so, you prevent the shared behaviour from being copied into multiple use cases, which reduces the chance of inconsistencies creeping in.
In the previous example: We want to make sure that customers don't get a different way to search for product information when buying compared to when hiring products. With an included use case, people who read the use cases are immediately aware of that fact.
First of all: you don't. Starting to do that means you are doing functional analysis. The point of use case synthesis is to find the goal(s) (i.e. the added value) the different actors have when interacting with the system under consideration. It's quite futile to separate a goal into sub-goals at that level: either you have some added value or you don't. So if someone has settled on a use case and tries to break it down, then the use case is either wrong (not a real use case) or the breakdown is useless, since the use case already shows the added value.
My personal opinion about include and extend: they are basically evil and a wrong concept introduced by techies (which most of the UML designers are) with no business background. Using them means you are already starting functional analysis. But UCs are synthesized from requirements. That is, you drag your net through that requirements soup and fish out those that fit together to build a story which makes sense - and which delivers added value: a use case.
And as always: read Bittner/Spence about use cases.

How to pass functions as arguments to other functions in Julia without sacrificing performance?

EDIT to try to address @user2864740's edit and comment: I am wondering if there is any information particularly relevant to 0.4rc1/rc2, or in particular a strategy or suggestion from one of the Julia developers more recent than those cited below (particularly @StefanKarpinski's Jan 2014 answer in #6 below). Thanks.
Please see e.g.
https://groups.google.com/forum/#!topic/julia-users/pCuDx6jNJzU
https://groups.google.com/forum/#!topic/julia-users/2kLNdQTGZcA
https://groups.google.com/forum/#!msg/julia-dev/JEiH96ofclY/_amm9Cah6YAJ
https://github.com/JuliaLang/julia/pull/10269
https://github.com/JuliaLang/julia/issues/1090
Can I add type information to arguments that are functions in Julia?
Performance penalty using anonymous function in Julia
(As a fairly inexperienced Julia user) my best synthesis of this information, some of which seems to be dated, is that the best practice is either "avoid doing this" or "use FastAnonymous.jl."
I'm wondering what the bleeding edge latest and greatest way to handle this is.
[Longer version:]
In particular, suppose I have a big hierarchy of functions. I would like to be able to do something like
function transform(function_one::Function{from A to B},
function_two::Function{from B to C},
function_three::Function{from A to D})
function::Function{from Set{A} to Dict{C,D}}(set_of_As::Set{A})
Dict{C,D}([function_two(function_one(a)) => function_three(a)
for a in set_of_As])
end
end
Please don't take the code too literally. This is a narrow example of a more general form of transformation I'd like to be able to do regardless of the actual specifics of the transformation, BUT I'd like to do it in such a way that I don't have to worry (too much) about checking the performance (that is, beyond the normal worries I'd apply in any non-function-with-function-as-parameter case) each time I write a function that behaves this way.
For example, in my ideal world, the correct answer would be "so long as you annotate each input function with @anon before you call this function with those functions as arguments, then you're going to do as well as you can without tuning to the specific case of the concrete arguments you're passing."
If that's true, great--I'm just wondering if that's the right interpretation, or if not, if there is some resource I could read on this topic that is closer to a "logically" presented synthesis than the collection of links here (which are more a stream of collective consciousness or history of thought on this issue).
The answer is still "use FastAnonymous.jl," or create "functor types" manually (see NumericFuns.jl).
If you're using julia 0.4, FastAnonymous.jl works essentially the same way that official "fast closures" will eventually work in base julia. See https://github.com/JuliaLang/julia/issues/11452#issuecomment-125854499.
(FastAnonymous is implemented in a very different way on julia 0.3, and has many more weaknesses.)

Abstracting away from data structure implementation details in Clojure

I am developing a complex data structure in Clojure with multiple sub-structures.
I know that I will want to extend this structure over time, and may at times want to change the internal structure without breaking different users of the data structure (for example I may want to change a vector into a hashmap, add some kind of indexing structure for performance reasons, or incorporate a Java type)
My current thinking is:
Define a protocol for the overall structure with various accessor methods
Create a mini-library of functions that navigate the data structure e.g. (query-substructure-abc param1 param2)
Implement the data structure using defrecord or deftype, with the protocol methods defined to use the mini-library
I think this will work, though I'm worried it is starting to look like rather a lot of "glue" code. It probably also reflects my greater familiarity with object-oriented approaches.
What is the recommended way to do this in Clojure?
I think that deftype might be the way to go; however, I'd take a pass on the accessor methods. Instead, look into clojure.lang.ILookup and clojure.lang.Associative; these are interfaces which, if you implement them for your type, will let you use get / get-in and assoc / assoc-in, making for a far more versatile solution (not only will you be able to change the underlying implementation, but perhaps also to use functions built on top of Clojure's standard collections library to manipulate your structures).
A couple of things to note:
You should probably start with defrecord, using get, assoc & Co. with the standard defrecord implementations of ILookup, Associative, IPersistentMap and java.util.Map. You might be able to go a pretty long way with it.
If/when these are no longer enough, have a look at the sources for emit-defrecord (a private function defined in core_deftype.clj in Clojure's sources). It's pretty complex, but it will give you an idea of what you may need to implement.
Neither deftype nor defrecord currently defines any factory functions for you, but you should probably write them yourself. Sanity checking goes inside those functions (and/or the corresponding tests).
The more conceptually complex operations are of course a perfect fit for protocol functions built on the foundation of get & Co.
Oh, and have a look at gvec.clj in Clojure's sources for an example of what some serious data structure code written using deftype might look like. The complexity here is of a different kind from what you describe in the question, but still, it's one of the few examples of custom data structure programming in Clojure currently available for public consumption (and it is of course excellent quality code).
Of course this is just what my intuition tells me at this time. I'm not sure that there is much in the way of established idioms at this stage, what with deftype not actually having been released and all. :-)
