clusteredPoints of cluster result disappear [mahout] - hadoop

I got the following CSV and TEXT results from clusterdump.
CSV:
0,Sports_38.txt
1,Sports_23.txt
2,Sports_36.txt
3,Sports_13.txt
4,Sports_31.txt,Sports_32.txt
5,Sports_28.txt,Sports_29.txt
6,Sports_2.txt
9,Sports_15.txt
TEXT:
{"identifier":"VL-1","r":[],"c":[...,"n":7}
Top Terms:
什 => 15.829998016357422
利物浦 => 13.629814147949219
克 => 11.317766189575195
格 => 10.938775062561035
特 => 10.842317581176758
尔 => 10.447234153747559
切尔西 => 9.742402076721191
比赛 => 8.247735023498535
表现 => 7.909337520599365
批评 => 7.462332725524902
I noticed that the CSV file lists only one point for VL-1, but the TEXT file reports 7 points for VL-1 (its "n" equals 7).
Why did some points disappear, and how can I get the cluster of every point?
Thanks a lot.

I also got empty clusteredPoints when the data set was a little bigger.
I finally found the reason myself.
clusterClassificationThreshold, the 8th parameter of Kmeans.run, should be 0 (Mahout 1.0). With a higher threshold, vectors whose probability of belonging to a cluster falls below it are left out of clusteredPoints.
See: http://mail-archives.apache.org/mod_mbox/mahout-user/201211.mbox/%3C50B62629.5020700@windwardsolutions.com%3E

Related

How to hash by choosing the key and the string value with Ruby

I'm trying to build a hash from URLs with Ruby, but the URLs differ in length from one to another, so splitting them into key/value pairs doesn't give me the right result.
Two examples of my URLs:
url1="Services/tech_name/prise/name_Prise/service_name/sites/xxxx/yyyy/devices/AAAA/wan/16170515?startDate=2021-01-18T23:00:00.000Z&endDate=2021-01-19T08:22:42.000Z&timeProfile=1&tz=CET"
url2="Services/tech_name/prise/name_Prise/service_name/sites/xxxx/yyyy/devices/AAAA/BBBB/wan/1617051?startDate=2021-01-18T23:00:00.000Z&endDate=2021-01-19T08:22:42.000Z&timeProfile=1&tz=CET"
Example of my code to hash url1:
url1="Services/tech_name/prise/name_Prise/service_name/sites/xxxx/yyyy/devices/AAAA/wan/16170515?startDate=2021-01-18T23:00:00.000Z&endDate=2021-01-19T08:22:42.000Z&timeProfile=1&tz=CET"
# Treat the query string as one more path segment, then split on "/"
spliturl = url1.gsub("?", "/")
url = spliturl.split("/")
# Hash[*array] needs an even number of elements, so pad with nil if needed
url.push(nil) unless url.count.even?
h = Hash[*url]
puts h
My result:
{"Services"=>"name_services", "prise"=>"name_prise", "tech"=>"sites", "xxxx"=>"yyyy", "devices"=>"AAAA", "wan"=>"16170515", "startDate=2021-01-18T23:00:00.000Z&endDate=2021-01-19T08:22:42.000Z&timeProfile=1&tz=CET"=>nil}
"sites" has ended up as a value, and the values that belong to "sites" have ended up as a key and its value:
{"tech"=>"sites", "xxxx"=>"yyyy", "devices"=>"AAAA", "wan"=>"16170515"}
But the result I would like to have from url1:
{"sites" => "xxxx/yyyy", "devices" => "AAAA", "wan" => "16170515"}
and from url2:
{"sites" => "xxxx/yyyy", "devices" => "AAAA/BBBB", "wan" => "1617051"}
One idea for solving this is a regular expression with named captures:
result = url1.match(/\/sites\/(?<sites>.*)\/devices\/(?<devices>.*)\/wan\/(?<wan>.*)\?/)
Then read the values from the match:
result[:sites]   # => "xxxx/yyyy"
result[:devices] # => "AAAA"
result[:wan]     # => "16170515"
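As a minimal sketch, the same pattern can be wrapped in a small helper that returns a plain hash, so both example URLs go through one code path. The helper name extract_segments and the constant URL_PATTERN are made up for illustration; the URLs are the two from the question, written without the stray spaces.

```ruby
# Same pattern as in the answer, written with %r{} so the slashes need no escaping.
# Each named group is greedy, so it may itself contain slashes ("xxxx/yyyy").
URL_PATTERN = %r{/sites/(?<sites>.*)/devices/(?<devices>.*)/wan/(?<wan>.*)\?}

# Hypothetical helper (not part of any library): returns a hash with the
# keys "sites", "devices" and "wan", or nil when the URL does not match.
def extract_segments(url)
  match = URL_PATTERN.match(url)
  return nil unless match

  # Pair each group name with its captured text.
  match.names.zip(match.captures).to_h
end

url1 = "Services/tech_name/prise/name_Prise/service_name/sites/xxxx/yyyy/devices/AAAA/wan/16170515?startDate=2021-01-18T23:00:00.000Z&endDate=2021-01-19T08:22:42.000Z&timeProfile=1&tz=CET"
url2 = "Services/tech_name/prise/name_Prise/service_name/sites/xxxx/yyyy/devices/AAAA/BBBB/wan/1617051?startDate=2021-01-18T23:00:00.000Z&endDate=2021-01-19T08:22:42.000Z&timeProfile=1&tz=CET"

p extract_segments(url1)
p extract_segments(url2)
```

Because the pattern anchors on the literal segment names /sites/, /devices/ and /wan/, it does not matter how many path parts sit between them, which is what breaks the split-into-pairs approach.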

Magento attribute type issue

I'm using Magento EE 1.14.2 and trying to export all attributes with their types through a web service. Everything works except for the weight attribute, which is a system attribute. Its type should be text, but it shows up as weight. I checked a fresh Magento install as well. My code is:
$attribute = Mage::getSingleton('eav/config')->getAttribute('catalog_product', 'weight');
print_r($attribute);
My output looks like this:
[attribute_id] => 80
[entity_type_id] => 4
[attribute_code] => weight
[attribute_model] =>
[backend_model] =>
[backend_type] => decimal
[backend_table] =>
[frontend_model] =>
[frontend_input] => weight
[frontend_label] => Weight
[frontend_class] =>
[source_model] =>
Any hints or ideas?
Thanks.

Not able to style Excel with spreadsheet gem (Ruby)

I'm trying to style an Excel file following the question "ruby spreadsheet row background color", but nothing happens.
Here is my code.
My formats:
pass_format = Spreadsheet::Format.new :color=> :blue, :pattern_fg_color => :green, :pattern => 1
fail_format = Spreadsheet::Format.new :color=> :blue, :pattern_fg_color => :red, :pattern => 1
skip_format = Spreadsheet::Format.new :color=> :blue, :pattern_fg_color => :yellow, :pattern => 1
I try to use them here (showing just one; the rest are chosen by if/else branches):
sheet1.row(counter).default_format = skip_format
sheet1[counter, 3] = 'Skipped'
sheet1.row(counter).default_format = skip_format
sheet1.row(counter).set_format(3, skip_format)
counter is the row I am currently in. I am not sure whether I should format first or write first. What am I doing wrong, and how do I fix it?
The format is actually being applied, as .inspect shows:
#<Spreadsheet::Format:0x007f082f9c1d58 @font=#<Spreadsheet::Font:0x007f082f9c1a88 @name="Arial", @color=:red, @previous_fast_key=nil, @size=nil, @weight=nil, @italic=nil, @strikeout=nil, @outline=nil, @shadow=nil, @escapement=nil, @underline=nil, @family=:swiss, @encoding=nil>, @number_format="GENERAL", @rotation=0, @pattern=1, @bottom_color=:black, @top_color=:black, @left_color=:black, @right_color=:black, @diagonal_color=:black, @pattern_fg_color=:yellow, @pattern_bg_color=:pattern_bg, @regexes={:date=>/[YMD]/, :date_or_time=>/[hmsYMD]/, :datetime=>/([YMD].*[HS])|([HS].*[YMD])/, :time=>/[hms]/, :number=>/[#]/}, @used_merge=0>
But even though it shows the color red here, in Excel it's black. :(
I am editing the original file and then writing it to a new file with
book.write "Result.xls"
Is that the wrong approach? I am going to try creating a new workbook before editing, and will update.
Well, it was not possible to format the existing Excel file and then write it out as a new one; the formatting was lost.
To work around this I created a new workbook (populated with the data read from the old file), formatted it as I wanted, and then used
book.write "xxx.xls"

Magento get Shipping collectRates() attributes in observer

In the collectRates method of a shipping module, I set a few values:
$method->setCarrier('test_customrate');
$method->setCarrierTitle($this->getConfigData('title'));
$method->setMethod('test_customrate');
$method->setMethodTitle($this->getConfigData('name'));
$method->setPrice($this->getConfigData('price'));
$method->setCost(2);
$method->setUsername($this->getConfigData('username'));
$method->setPassword($this->getConfigData('password'));
$result->append($method);
Where are these values stored in the checkout session?
I can't find them anywhere.
I have since found that the code below, run in the observer, returns some of the values set above. However, values such as cost, username and password are missing.
$rates = Mage::getSingleton('checkout/session')->getQuote()->getShippingAddress()
    ->getShippingRatesCollection();
foreach ($rates as $rate) {
    Mage::log($rate->getData());
}
This returns something with the following structure:
2013-06-01T15:36:10+00:00 DEBUG (7): Array
(
[rate_id] => 852
[address_id] => 93
[created_at] => 2013-06-01 15:36:06
[updated_at] => 2013-06-01 15:36:09
[carrier] => test_customrate
[carrier_title] => test_customrate
[code] => test_customrate_test_customrate
[method] => test_customrate
[method_description] =>
[price] => 0.0000
[method_title] => test123
[error_message] =>
)
I worked around this by reading the values directly from the shipping module config, like this:
Mage::getStoreConfig('section/group/field');
So I'm just reading the data defined in system.xml, as I did in collectRates().
However, this is quite static, and I would still prefer a way to get it straight from the order.
For now this fixes my problem. If anyone knows another way, feel free to answer.

capped collection mongodb

I have an issue with MongoDB.
I'm currently working with the Ruby MongoDB driver and something strange is going on:
I need to insert 20 documents into a capped collection, but the following code inserts only 3 documents and I can't figure out why:
coll = db.create_collection("test", :capped => true, :max => 20)
1024.times { @pad_string += " " }
20.times {
  coll.insert({
    :HostName => @hostname,
    :CommandLine => @cmdline,
    :Pid => "1111",
    :BlockName => @blockname,
    :ExitCode => 0,
    :StartTime => Time.now,
    :EndTime => Time.utc(2000,"jan",1,00,00,00),
    :StdErr => @pad_string,
    :Stdout => @pad_string
  })
}
The point is that I insert @pad_string, which holds 1024 preallocated spaces. As soon as I run 1024.times{@pad_string +=" "} before inserting, at most 3 documents are inserted.
When you cap a collection by number of objects, you also have to cap it by size. I wonder what size the Ruby driver is sending down.
Try this:
coll = db.create_collection("test",:capped => true, :size=>100000, :max=>20)
Then tweak the size to whatever works for you (it's in bytes).
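To make "tweak the size" a bit more concrete, here is a rough back-of-the-envelope sketch in plain Ruby (no driver calls) for a lower bound on :size. The 1024-byte padding in the two fields StdErr and Stdout comes from the question; the 256 bytes allowed for the remaining fields and BSON overhead is only an assumption, so measure a real document before relying on it.

```ruby
# Rough lower bound for the :size option (in bytes) of the capped collection.
PAD_BYTES          = 1024 # each padded field holds 1024 spaces (from the question)
PADDED_FIELDS      = 2    # StdErr and Stdout
OTHER_FIELDS_BYTES = 256  # ASSUMPTION: generous guess for the other fields plus BSON overhead
MAX_DOCS           = 20   # the :max value used in the question

# Hypothetical helper: total bytes needed to hold max_docs documents.
def estimated_capped_size(max_docs, bytes_per_doc)
  max_docs * bytes_per_doc
end

bytes_per_doc = PAD_BYTES * PADDED_FIELDS + OTHER_FIELDS_BYTES
min_size      = estimated_capped_size(MAX_DOCS, bytes_per_doc)

puts "approx. bytes per document: #{bytes_per_doc}"
puts "minimum :size for #{MAX_DOCS} documents: #{min_size}"
```

Under these assumptions the estimate stays below the 100000 bytes suggested above, so that value leaves some headroom for 20 documents of this shape.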
