Using LightGBM with average precision recall score

I am using LightGBM and would like to use average precision recall as a metric.
I tried defining feval:
cv_result = lgb.cv(params=params, train_set=lgb_train, feature_name=Rel_Feat_Names, feval=APS)
where APS is defined as:
def APS(preds, train_data):
    y_pred_val = []
    y_test_val = []
    for i, stat in enumerate(train_data.get_label.isnull()):
        if ~stat:
            y_pred_val.append(preds[i])
            y_test_val.append(train_data.get_label[i])
    aps = average_precision_score(np.array(y_test_val), np.array(y_pred_val))
    return aps
and I get an error:
TypeError: Unknown type of parameter:feval, got:function
I also tried to use "MAP" as the metric:
cv_result = lgb.cv(params=params, train_set=lgb_train, feature_name=Rel_Feat_Names, metric="MAP")
but got the following error:
"lightgbm.basic.LightGBMError: For MAP metric, there should be query information"
I can't find what query information is required.
How can I use feval correctly and define the query information required for "MAP"?
Thanks

Right now you can use map (alias mean_average_precision) as your metric, as described here, but to answer the question of applying feval correctly:
The output of a customized metric should be a tuple of (eval_name, eval_result, is_higher_better), so in your case:
def APS(preds, train_data):
    aps = average_precision_score(train_data.get_label(), preds)
    return 'aps', aps, True
Then also include the following in your params: 'objective': 'binary', 'metric': 'None'
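For completeness, a minimal end-to-end sketch of how the pieces fit together (hedged: X, y and Rel_Feat_Names are placeholders for your own data, and the exact result keys can vary between LightGBM versions):
import lightgbm as lgb
from sklearn.metrics import average_precision_score

def APS(preds, train_data):
    # custom eval: (eval_name, eval_result, is_higher_better)
    aps = average_precision_score(train_data.get_label(), preds)
    return 'aps', aps, True

lgb_train = lgb.Dataset(X, label=y, feature_name=Rel_Feat_Names)  # placeholder data
params = {'objective': 'binary', 'metric': 'None'}
cv_result = lgb.cv(params=params, train_set=lgb_train, feval=APS)
print(cv_result.keys())  # the per-iteration 'aps' mean/std should appear here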

Related

Fix deprecation warning `Dangerous query method` on `.order`

I have a custom gem which creates an ActiveRecord query with input that comes from an Elasticsearch instance.
# record_ids: the ids returned by the ES results
# order: the order of the ids that ES returns
search_class.where(search_class.primary_key => record_ids).order(order)
Right now the implementation builds the order string directly into the order variable, so it looks like this: ["\"positions\".\"id\" = 'fcdc924a-21da-440e-8d20-eec9a71321a7' DESC"]
This works fine but throws a deprecation warning, which ultimately will not work in Rails 6.
DEPRECATION WARNING: Dangerous query method (method whose arguments are used as raw SQL) called with non-attribute argument(s): "\"positions\".\"id\" = 'fcdc924a-21da-440e-8d20-eec9a71321a7' DESC". Non-attribute arguments will be disallowed in Rails 6.0. This method should not be called with user-provided values, such as request parameters or model attributes. Known-safe values can be passed by wrapping them in Arel.sql()
So I tried a couple of different approaches, but all of them without success.
order = ["\"positions\".\"id\" = 'fcdc924a-21da-440e-8d20-eec9a71321a7' DESC"]
# Does not work since order is an array
.order(Arel.sql(order))
# No errors but only returns an ActiveRecord_Relation
# on .inspect it returns `PG::SyntaxError: ERROR: syntax error at or near "["`
.order(Arel.sql("#{order}"))
# .to_sql: ORDER BY [\"\\\"positions\\\".\\\"id\\\" = 'fcdc924a-21da-440e-8d20-eec9a71321a7' DESC\"]"
order = ['fcdc924a-21da-440e-8d20-eec9a71321a7', ...]
# Won't work since it's only for integer values
.order("idx(ARRAY#{order}, #{search_class.primary_key})")
# .to_sql ORDER BY idx(ARRAY[\"fcdc924a-21da-440e-8d20-eec9a71321a7\", ...], id)
# Only returns an ActiveRecord_Relation
# on .inspect it returns `PG::InFailedSqlTransaction: ERROR:`
.order("array_position(ARRAY#{order}, #{search_class.primary_key})")
# .to_sql : ORDER BY array_position(ARRAY[\"fcdc924a-21da-440e-8d20-eec9a71321a7\", ...], id)
I am sort of stuck since Rails will force attribute arguments in the future and there is no option to opt out of this. Since the order is a code-generated array and I have full control of the values, I am curious how I can implement this. Maybe someone has had this issue before and can give some useful insight or ideas?
You could try applying Arel.sql to the elements of the array; that should work, i.e.
search_class.where(search_class.primary_key => record_ids)
.order(order.map {|i| i.is_a?(String) ? Arel.sql(i) : i})
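To sketch this end to end (hedged: "positions" and the UUIDs are just the example values from the question, and the fragments are assumed to be generated entirely by your own code, never from user input, since Arel.sql does not escape anything):
order = record_ids.map do |id|
  # each fragment is built by our own code, so marking it as known-safe SQL is OK
  Arel.sql(%("positions"."id" = '#{id}' DESC))
end

search_class
  .where(search_class.primary_key => record_ids)
  .order(order)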

How to get word2index from gensim

According to the docs, we can read a word2vec model with gensim like this:
model = KeyedVectors.load_word2vec_format('word2vec.50d.txt', binary=False)
This gives an index-to-word mapping, e.g. model.index2word[2]. How can I derive the inverted mapping (word-to-index) based on this?
The word-to-index mappings are in the KeyedVectors vocab property, a dictionary whose values are objects that include an index property.
For example:
word = "whatever" # for any word in model
i = model.vocab[word].index
model.index2word[i] == word # will be true
An even simpler solution is to enumerate index2word:
word2index = {token: token_index for token_index, token in enumerate(w2v.index2word)}
word2index['hi'] == 30308 # True
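As a quick sanity check (a hedged sketch, assuming w2v is the KeyedVectors object loaded above and a gensim 3.x-style vocab attribute), the two approaches should agree for every word:
# both mappings should give the same index for each word in the model
assert all(w2v.vocab[word].index == idx for word, idx in word2index.items())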

How to use Resolv::DNS::Resource::Generic

I would like to better understand how Resolv::DNS handles records that are not directly supported. These records are represented by the Resolv::DNS::Resource::Generic class, but I could not find documentation about how to get the data out of this record.
Specifically, my zone will contain SSHFP and TLSA records, and I need a way to get to that data.
Through reverse engineering, I found the answer - documenting it here for others to see.
Please note that this involves undocumented features of the Resolv::DNS module, and the implementation may change over time.
Resource Records that the Resolv::DNS module does not understand are represented not through the Generic class itself, but rather through a subclass whose name represents the type and class of the DNS response - for instance, an SSHFP record (type 44) will be represented as Resolv::DNS::Resource::Generic::Type44_Class1.
The object will contain a method "data" that gives you access to the RDATA of the record, in plain binary format.
Thus, to access an SSHFP record, here is how to get it:
def handle_sshfp(rr)
  # the RDATA is a string, but it contains binary data
  data = rr.data.bytes
  algo = data[0].to_s
  fptype = data[1].to_s
  fp = data[2..-1]
  hex = fp.map { |b| b.to_s(16).rjust(2, '0') }.join(':')
  puts "The SSHFP record is: #{fptype} #{algo} #{hex}"
end
Resolv::DNS.open do |dns|
  all_records = dns.getresources('myfqdn.example.com', Resolv::DNS::Resource::IN::ANY) rescue nil
  all_records.each do |rr|
    if rr.is_a? Resolv::DNS::Resource::Generic then
      classname = rr.class.name.split('::').last
      handle_sshfp(rr) if classname == "Type44_Class1"
    end
  end
end
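The question also mentions TLSA records; they follow the same pattern, just with a different generated class name (Type52_Class1, since TLSA is type 52) and a different RDATA layout. A hedged sketch, assuming the layout from RFC 6698 (usage, selector, matching type, then the certificate association data):
def handle_tlsa(rr)
  data = rr.data.bytes
  usage = data[0]
  selector = data[1]
  mtype = data[2]
  # remaining bytes are the certificate association data, shown here as hex
  cert_assoc = data[3..-1].map { |b| b.to_s(16).rjust(2, '0') }.join
  puts "The TLSA record is: #{usage} #{selector} #{mtype} #{cert_assoc}"
end

# inside the loop above:
# handle_tlsa(rr) if classname == "Type52_Class1"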

PIG: Cannot turn (key, (tuple_of_3_things)) into (key, tupelement1, tupelement2, tupelement3)

I have a relation, reflat1. Below are the output of DESCRIBE and DUMP.
reflat1: {cookie: chararray,tupofstuff: (category: chararray,weight: double,lasttime: long)}
(key1,(613,1.0,1410155702)
(key2,(iOS,1.0,1410155702)
(key3,(G.M.,1.0,1410155702)
Yes, I notice that the parentheses do not get closed. I have no clue why. Perhaps the missing closing parentheses are the source of all of my problems.
I want to transform it to a relation (let's call it reflat2) with 4 fields, which would ideally look like:
(key1, 613, 1.0,1410155702)
(key2, iOS, 1.0,1410155702)
(key3, G.M., 1.0,1410155702)
But my code is NOT working. Below is the relevant bit.
reflat2 = foreach reflat1 {
    GENERATE
        cookie as cookie,
        tupofstuff.(category) as category,
        tupofstuff.(weight) as weight,
        tupofstuff.(lasttime) as lasttime;
};
r1 = LIMIT reflat2 100;
dump r1;
Which leads to the schema I'd expect:
DESCRIBE reflat2
reflat2: {cookie: chararray,category: chararray,weight: double,lasttime: long}
But gives an error on the dump:
Unable to open iterator for alias r1
When I look at the errors on the failed MapReduce jobs, I see:
java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple
Which is weird, because if anything I'm casting a tuple to a string (and a double and a long), not vice versa.
Using FLATTEN on a tuple brings the elements of the tuple up to the top level. You should use FLATTEN as below:
reflat2 = foreach reflat1 GENERATE cookie, FLATTEN(tupofstuff);
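If you also want to name and type the flattened fields explicitly, a hedged variation using Pig's FLATTEN ... AS syntax:
-- FLATTEN expands the tuple; AS renames and types the resulting fields
reflat2 = FOREACH reflat1 GENERATE
    cookie,
    FLATTEN(tupofstuff) AS (category:chararray, weight:double, lasttime:long);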
Hope this helps.

A ruby SPARQL query that SELECT(s) all triples for a given subject

I'm using the following Ruby library to query a SPARQL endpoint.
http://ruby-rdf.github.io/sparql-client/
I'm really close to achieving what I want. With the following code I can print out all the triples in my database.
sparql = SPARQL::Client.new("http://localhost:3030/ds/query")
query = sparql.select.where([:s, :p, :o]).offset(100).limit(1000)
query.each_solution do |solution|
puts solution.inspect
end
But now I want to change this just slightly and select all the triples for a given subject. I thought the following would work, but it doesn't.
sparql = SPARQL::Client.new("http://localhost:3030/ds/query")
itemname = "lectio1"
query = sparql.select.where(["<http://scta.info/items/#{itemname}>", :p, :o]).offset(100).limit(1000)
query.each_solution do |solution|
puts solution.inspect
end
This would work in straightforward SPARQL syntax, but somehow replacing the symbol :s with the literal subject I want to query doesn't work. The error that Sinatra gives me is:
expected subject to be nil or a term, was "<http://scta.info/items/lectio1>";
As you noted in the comments,
The error that Sinatra gives me is: expected subject to be nil or a term, was "<http://scta.info/items/lectio1>"
You're passing the method a string, but it's expecting nil or a term. The kind of term that it's expecting is an RDF::Term. In your particular case, you want a URI (which is a subclass of RDF::Resource, which is a subclass of RDF::Term). You can create the reference that you're looking for with
RDF::URI.new("http://scta.info/items/#{itemname}")
so you should be able to update your code to the following (and depending on your imports, you might be able to drop the RDF:: prefix):
sparql = SPARQL::Client.new("http://localhost:3030/ds/query")
itemname = "lectio1"
query = sparql.select.where([RDF::URI.new("http://scta.info/items/#{itemname}"), :p, :o]).offset(100).limit(1000)
query.each_solution do |solution|
puts solution.inspect
end
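Since only ?p and ?o remain unbound in that pattern, each solution carries just those two bindings. A hedged sketch of reading them directly (solution[:p] and solution[:o] are the standard RDF::Query::Solution accessors):
query.each_solution do |solution|
  # each solution binds only the variables still present in the pattern (?p, ?o)
  puts "#{solution[:p]} -> #{solution[:o]}"
end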
