How does LightGBM encode categorical features?

I have the following structure for one LightGBM tree:
{'split_index': 0,
'split_feature': 41,
'split_gain': 97.25859832763672,
'threshold': '3||4||8',
'decision_type': '==',
'default_left': False,
'missing_type': 'None',
'internal_value': 0,
'internal_weight': 0,
'internal_count': 73194,
'left_child': {'split_index': 1,
The feature at node 0 is categorical, and I pass it to the model with dtype "category".
Where can I find the mapping between these numeric values and the original categories?

The numbers you see are the values of the codes attribute of your categorical features. For example:
import pandas as pd
s = pd.Series(['a', 'b', 'a', 'a', 'b'], dtype='category')
print(s.cat.codes)
# 0 0
# 1 1
# 2 0
# 3 0
# 4 1
# dtype: int8
So in this case 0 is a and 1 is b.
You can build a mapping from the category code to the value with something like the following:
dict(enumerate(s.cat.categories))
# {0: 'a', 1: 'b'}
If the categories in your column don't match the ones in the model, LightGBM will update them.
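As a minimal sketch of how that mapping could be applied to a split like the one in the question, the snippet below turns a threshold string such as '3||4||8' back into category labels. The helper name decode_threshold and the example category list are hypothetical; with a real DataFrame you would pass something like list(df[col].cat.categories) for the column behind split_feature 41.

```python
def decode_threshold(threshold, categories):
    """Translate a categorical threshold such as '3||4||8' into category labels."""
    # The threshold lists category codes separated by '||'
    codes = [int(code) for code in threshold.split("||")]
    # Each code is a position in the ordered list of categories
    return [categories[code] for code in codes]


# Hypothetical categories, in the order pandas would store them
categories = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]

labels = decode_threshold("3||4||8", categories)
print(labels)  # ['d', 'e', 'i']
```

This only holds if the categories are the ones the model was trained with, since the codes are positions in that ordered list.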


traversing graph and creating dynamic variable

I have a simple graph with few nodes and these nodes have attributes such as "type" and "demand".
import networkx as nx

def mygraph():
    G = nx.Graph()
    G.add_nodes_from([("N1", {"type": "parent", "demand": 10}),
                      ("N2", {"type": "parent", "demand": 12}),
                      ("N3", {"type": "parent", "demand": 25}),
                      ("S1", {"type": "server", "demand": 12}),
                      ("S2", {"type": "server", "demand": 20})])
    return G  # returned so that mymodel() can use the graph
I am passing this graph to another function in pyomo library. The dummy pyomo function is as follows:
def mymodel():
    g = mygraph()
    **VARIABLES**
    model.const1 = Constraint(my constraint1)
    model.const2 = Constraint(my constraint2)
    model.obj1 = Objective(my objective)
    status = SolverFactory('glpk')
    results = status.solve(model)
    assert_optimal_termination(results)
    model.display()
mymodel()
I am trying to do two things:
1. In the graph function mygraph(), find the total number of nodes in the graph G with attribute type == parent.
2. In the Pyomo function mymodel(), create as many new VARIABLES as there are nodes with attribute type == parent. In the case above, my program must create 3 new variables, since 3 nodes have type == parent. The values of these new variables should come from the demand attribute of the corresponding node, so it should be something like this:
new_var1 = demand of node 1 (i.e., node1_demand = 10 in this case)
new_var2 = demand of node 2 (i.e., node2_demand = 12)
new_var3 = demand of node 3 (i.e., node3_demand = 25)
For the first part you can loop over the nodes:
sum(1 for n,attr in G.nodes(data=True) if attr['type']=='parent')
# 3
# or to get all types
from collections import Counter
c = Counter(attr['type'] for n,attr in G.nodes(data=True))
# {'parent': 3, 'server': 2}
c['parent']
# 3
c['server']
# 2
For the second part (which also answers the first part if you check the length):
{n: attr['demand'] for n,attr in G.nodes(data=True) if attr['type']=='parent'}
# or
[attr['demand'] for n,attr in G.nodes(data=True) if attr['type']=='parent']
Output:
{'N1': 10, 'N2': 12, 'N3': 25}
# or
[10, 12, 25]
Instantiating attributes:
def mymodel():
    G = mygraph()
    nodes = [attr['demand']
             for n, attr in G.nodes(data=True)
             if attr['type'] == 'parent']
    # initialize model?
    for i, n in enumerate(nodes, start=1):
        # use the loop index i so that each attribute gets a distinct name
        setattr(model, f'const{i}', Constraint(something with n))
    # ...
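To see the setattr pattern run end to end without the solver, here is a minimal sketch under loud assumptions: node_attrs is a plain-dict stand-in for G.nodes(data=True), and model is a types.SimpleNamespace standing in for the Pyomo model object, not a real Pyomo model. The attribute names new_var1, new_var2, ... follow the naming in the question.

```python
from types import SimpleNamespace

# Plain-dict stand-in for G.nodes(data=True), using the question's data
node_attrs = {
    "N1": {"type": "parent", "demand": 10},
    "N2": {"type": "parent", "demand": 12},
    "N3": {"type": "parent", "demand": 25},
    "S1": {"type": "server", "demand": 12},
    "S2": {"type": "server", "demand": 20},
}

# Demand of every parent node, keyed by node name
demands = {n: attr["demand"]
           for n, attr in node_attrs.items()
           if attr["type"] == "parent"}

# Hypothetical stand-in for the Pyomo model object
model = SimpleNamespace()

# Create one attribute per parent node: new_var1, new_var2, ...
for i, (node, demand) in enumerate(demands.items(), start=1):
    setattr(model, f"new_var{i}", demand)

print(len(demands))                                    # 3
print(model.new_var1, model.new_var2, model.new_var3)  # 10 12 25
```

Keeping the values in the demands dict is often easier to work with than numbered attributes, because you can look a value up by node name instead of reconstructing an attribute name.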

Sort and return dict by specific list value

I found this: Sort keys in dictionary by value in a list in Python
and it is almost what I want. I want to sort exactly as is defined in the above post, i.e., by a specific item in the value list, but I want to return the entire original dictionary sorted by the specified list entry, not a list of the keys.
My last try has failed:
details = {'India': ['New Dehli', 'A'],
'America': ['Washington DC', 'B'],
'Japan': ['Tokyo', 'C']
}
print('Country-Capital List...')
print(details)
print()
temp1 = sorted(details.items(), key=lambda value: details[value][1])
print(temp1)
The error:
{'India': ['New Dehli', 'A'], 'America': ['Washington DC', 'B'], 'Japan': ['Tokyo', 'C']}
Traceback (most recent call last):
File "C:/Users/Mark/PycharmProjects/main/main.py", line 11, in <module>
temp1 = sorted(details.items(), key=lambda value: details[value][1])
File "C:/Users/Mark/PycharmProjects/main/main.py", line 11, in <lambda>
temp1 = sorted(details.items(), key=lambda value: details[value][1])
TypeError: unhashable type: 'list'
You are trying to use the (key, value) pair from the dict.items() sequence as a key. Because value is a list, this fails, as keys must be hashable and lists are not.
Just use the value directly:
temp1 = sorted(details.items(), key=lambda item: item[1][1])
I renamed the lambda argument to item to make it clearer what is being passed in. item[1] is the value from the (key, value) pair, and item[1][1] is the second entry in each list.
Demo:
>>> details = {'India': ['New Dehli', 'A'],
... 'America': ['Washington DC', 'B'],
... 'Japan': ['Tokyo', 'C']
... }
>>> sorted(details.items(), key=lambda item: item[1][1])
[('India', ['New Dehli', 'A']), ('America', ['Washington DC', 'B']), ('Japan', ['Tokyo', 'C'])]
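sorted() returns a list of (key, value) pairs, while the question asks for the whole dictionary back. A minimal sketch, assuming Python 3.7 or later (where dict preserves insertion order): pass the sorted pairs to dict(). The name sorted_details is only for illustration, and reverse=True is used here so that the reordering is visible with this data.

```python
details = {'India': ['New Dehli', 'A'],
           'America': ['Washington DC', 'B'],
           'Japan': ['Tokyo', 'C']}

# Sort the (key, value) pairs by the second list entry, then rebuild a dict.
# Insertion order is preserved, so the new dict iterates in sorted order.
sorted_details = dict(
    sorted(details.items(), key=lambda item: item[1][1], reverse=True)
)

print(sorted_details)
# {'Japan': ['Tokyo', 'C'], 'America': ['Washington DC', 'B'], 'India': ['New Dehli', 'A']}
```

On older Python versions, collections.OrderedDict can be used in place of dict for the same effect.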

Ruby arrays: can each_with_index iterate in steps?

I have a txt file of records:
firstname lastname dob ssn status1 status2 status3 status4 firstname lastname dob ...
I can get this into an array:
tokens[0] = firstname
...
tokens[8] = firstname (of record 2).
tokens[9] = lastname (of record 2) and so on.
I want to iterate over tokens array in steps so I can say:
record1 = tokens[index] + tokens[index+1] + tokens[index+2] etc.
and the step (in the above example 8) would handle the records:
record2, record3 etc etc.
step 0 index is 0
step 1 (step set to 8 so index is 8)
etc.
I guess I should say these records are coming from a txt file that I called .split on:
file = File.open(ARGV[0], 'r')
line = ""
while !file.eof?
  line = file.readline
end
##knowing a set is how many fields, do it over and over again.
tokens = line.split(" ")
Does this help?
tokens = (1..80).to_a #just an array
tokens.each_slice(8).with_index {|slice, index|p index; p slice}
#0
#[1, 2, 3, 4, 5, 6, 7, 8]
#1
#[9, 10, 11, 12, 13, 14, 15, 16]
#...
Using each_slice you could also assign variables to your fields inside the block:
tokens.each_slice(8) { |firstname, lastname, dob, ssn, status1, status2, status3, status4|
  puts "firstname: #{firstname}"
}

Carrot2 circle chart

Does anyone know how to create a circle chart like the one used in Carrot2?
The mbostock/d3 gallery has good visualizations for Carrot2 output.
This carrot2-rb ruby client for Carrot2 returns an object with a clusters array. The scores and phrases attributes can be used in a simple doughnut chart.
More dynamic visualizations like expandable dendrograms are possible with tree structures like flare.json.
Here is a zoomable wheel based on Carrot2 results.
This is the CoffeeScript code I wrote to create flare.json from the documents elements of the clusters.
clusters = [{"id":0,"size":3,"phrases":["Coupon"],"score":0.06441151442396735,"documents":["0","1","2"],"attributes":{"score":0.06441151442396735}},{"id":1,"size":2,"phrases":["Exclusive"],"score":0.7044284368639101,"documents":["0","1"],"attributes":{"score":0.7044284368639101}},{"id":2,"size":1,"phrases":["Other Topics"],"score":0.0,"documents":["3"],"attributes":{"other-topics":true,"score":0.0}}]
get_children = (index, index2, clusters, documents) ->
  unless index == (clusters.length - 1) # If not last cluster
    orphans = {'name': ''}
    intr = _.intersection(documents, clusters[index2].documents)
    if intr.length > 0 # continue drilling
      if index2 < (clusters.length - 1) # Up until last element.
        # Get next layer of orphans
        orphan_docs = _.difference(intr, clusters[index2 + 1].documents)
        if orphan_docs.length > 0
          orphans = {'name': orphan_docs, 'size': orphan_docs.length}
        if _.intersection(intr, clusters[index2 + 1].documents).length > 0
          return [orphans, {'name': clusters[index2+1].phrases[0], 'children': get_children(index, (index2 + 1), clusters, intr)}]
        else
          return [orphans]
      else
        # At second to last cluster, so terminate here
        return [{'name': intr}]
    else # No intersection, so return bundle of current documents.
      return [{'name': documents}]
  return [{'name': _.intersection(clusters[index].documents, clusters[index2].documents)}]
get_flare = (clusters) ->
  # Make root object
  flare =
    name: "root"
    children: []
  children = flare.children
  _.each(clusters[0..(clusters.length - 2)], (cluster, index) -> # All clusters but the last. (It has already been compared to previous ones)
    # All documents for all remaining clusters in array
    remaining_documents = _.flatten(_.map clusters[(index + 1)..clusters.length], (c) ->
      c.documents
    )
    root_child = {'name': cluster.phrases[0], 'children': []}
    # Get first layer of orphans
    orphan_docs = _.difference(cluster.documents, remaining_documents)
    if orphan_docs.length > 0
      root_child.children.push {'name': orphan_docs, size: orphan_docs.length}
    for index2 in [(index + 1)..(clusters.length - 1)] by 1
      if _.intersection(cluster.documents, clusters[index2].documents).length > 0
        root_child.children.push {'name': clusters[index2].phrases[0], 'children': get_children(index, (index2), clusters, cluster.documents)}
    children.push root_child
  )
  flare
# Call get_flare only after both functions have been assigned
flare = get_flare clusters
You can buy their Circles JavaScript component: http://carrotsearch.com/circles-overview

Parsing text in Ruby

I'm working on a script for importing component information into SketchUp. A very helpful individual on their help page assisted me in creating one that works on an "edited" line-by-line text file. Now I'm ready to take it to the next level: importing directly from the original file created by FreePCB.
The portion of the file I wish to use is below: "sample_1.txt"
[parts]
part: C1
ref_text: 1270000 127000 0 -7620000 1270000 1
package: "CAP-AX-10X18-7X"
value: "4.7pF" 1270000 127000 0 1270000 1270000 1
shape: "CAP-AX-10X18-7"
pos: 10160000 10160000 0 0 0
part: IC1
ref_text: 1270000 177800 270 2540000 2286000 1
package: "DIP-8-3X"
value: "JRC 4558" 1270000 177800 270 10668000 508000 0
shape: "DIP-8-3"
pos: 2540000 27940000 0 90 0
part: R1
ref_text: 1270000 127000 0 3380000 -600000 1
package: "RES-CF-1/4W-4X"
value: "470" 1270000 127000 0 2180000 -2900000 0
shape: "RES-CF-1/4W-4"
pos: 15240000 20320000 0 270 0
The word [parts], in brackets, is just a section heading. The information I wish to extract is the reference designator, shape, position, and rotation. I already have code to do this from a reformatted text file, using IO.readlines(file).each{ |line| data = line.split(" ");.
My current method uses a text file reformatted as follows: "sample_2.txt"
C1 CAP-AX-10X18-7 10160000 10160000 0 0 0
IC1 DIP-8-3 2540000 27940000 0 90 0
R1 RES-CF-1/4W-4 15240000 20320000 0 270 0
I then use an array to extract data[0], data[1], data[2], data[3], and data[5].
There is one additional step: appending ".skp" to the end of the package name, so that the script can insert components with the same name as the package.
I would like to extract the information from the first example without having to reformat the file, as I do for the second example. In other words, I know how to pull information from a single string split by spaces. How do I do it when the text for one array appears on more than one line?
Thanks in advance for any help ;-)
EDIT: Below is the full code to parse "sample_2.txt", that was re-formatted prior to running the script.
# import.rb - extracts component info from text file
# Launch file browser
file=UI.openpanel "Open Text File", "c:\\", "*.txt"
# Do for each line, what appears in braces {}
IO.readlines(file).each{ |line| data = line.split(" ");
  # Append second element in array "data[1]", with SketchUp file extension
  data[1] += ".skp"
  # Search for component with same name as data[1], and insert in component browser
  component_path = Sketchup.find_support_file data[1] ,"Components"
  component_def = Sketchup.active_model.definitions.load component_path
  # Create transformation from "origin" to point "location", convert data[] to float
  location = [data[2].to_f, data[3].to_f, 0]
  translation = Geom::Transformation.new location
  # Convert rotation "data[5]" to radians, and into float
  angle = data[5].to_f*Math::PI/180.to_f
  rotation = Geom::Transformation.rotation [0,0,0], [0,0,1], angle
  # Insert an instance of component in model, and apply transformation
  instance = Sketchup.active_model.entities.add_instance component_def, translation*rotation
  # Rename component
  instance.name=data[0]
# Ending brace for "IO.readlines(file).each{"
}
Results in the following output, from running "import.rb" to open "sample_2.txt".
C1 CAP-AX-10X18-7 10160000 10160000 0
IC1 DIP-8-3 2540000 27940000 90
R1 RES-CF-1/4W-4 15240000 20320000 270
I am trying to get the same results from the unedited original file "sample_1.txt", without the extra step of stripping information out of the file in Notepad to produce "sample_2.txt". The keywords followed by a colon (part, shape, pos) appear only in this part of the document and nowhere else, but the document is rather lengthy, and I need the script to ignore everything that appears before and after the [parts] section.
Your question is not clear, but this:
text.scan(/^\s+shape: "(.*?)"\s+pos: (\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/)
will give you:
[["CAP-AX-10X18-7", "10160000", "10160000", "0", "0", "0"],
["DIP-8-3", "2540000", "27940000", "0", "90", "0"],
["RES-CF-1/4W-4", "15240000", "20320000", "0", "270", "0"]]
Added after change in the question
This:
text.scan(/^\s*part:\s*(.*?)$.*?\s+shape:\s*"(.*?)"\s+pos:\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/m)
will give you
[["C1", "CAP-AX-10X18-7", "10160000", "10160000", "0", "0", "0"],
["IC1", "DIP-8-3", "2540000", "27940000", "0", "90", "0"],
["R1", "RES-CF-1/4W-4", "15240000", "20320000", "0", "270", "0"]]
Added after the second change in the question
This:
text.scan(/^\s*part:\s*(.*?)$.*?\s+shape:\s*"(.*?)"\s+pos:\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)/m)
will let you capture numbers even if they are negative.
Not sure exactly what you're asking, but hopefully this helps you get what you're looking for.
parts_text = <<EOS
[parts]
part: C1
ref_text: 1270000 127000 0 -7620000 1270000 1
package: "CAP-AX-10X18-7X"
value: "4.7pF" 1270000 127000 0 1270000 1270000 1
shape: "CAP-AX-10X18-7"
pos: 10160000 10160000 0 0 0
part: IC1
ref_text: 1270000 177800 270 2540000 2286000 1
package: "DIP-8-3X"
value: "JRC 4558" 1270000 177800 270 10668000 508000 0
shape: "DIP-8-3"
pos: 2540000 27940000 0 90 0
part: R1
ref_text: 1270000 127000 0 3380000 -600000 1
package: "RES-CF-1/4W-4X"
value: "470" 1270000 127000 0 2180000 -2900000 0
shape: "RES-CF-1/4W-4"
pos: 15240000 20320000 0 270 0
EOS
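# Splitting on /\n\n/ assumes the parts are separated by blank lines
parts = parts_text.split(/\n\n/)
split_parts = parts.map { |p| p.split(/\n/) }
split_parts.each do |part|
  # Remove leading and trailing whitespace from every line of the part
  stripped = part.map(&:strip)
  stripped.each do |line|
    p line.split(" ")
  end
end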
This could be done much more efficiently with regular expressions, but I opted for methods that you might already be familiar with.
