How many lines and documents should the training data for the OpenNLP categorizer contain?

I am following the Apache OpenNLP documentation. I understood the sentence detector, the tokenizer and the name finder, but I got stuck on the categorizer: I cannot understand how to create a model for categorization.
I understand that I need to create a training file. The format is clear: each line holds a category, a space, and then the document. The file is saved with a .train extension.
So I created the following file:
Refund What is the refund status for my order #342 ?
NewOffers Are there any new offers for your products ?
I ran this command:
opennlp DoccatTrainer -model en-doccat.bin -lang en -data en-doccat.train -encoding UTF-8
It starts training and then fails with an error. This is the console output:
Indexing events using cutoff of 5
Computing event counts... done. 2 events
Indexing... Dropped event Refund:[bow=What, bow=is, bow=the, bow=refund, bow=status, bow=for, bow=my, bow=order, bow=#342, bow=?]
Dropped event NewOffers:[bow=Are, bow=there, bow=any, bow=new, bow=offers, bow=for, bow=your, bow=products, bow=?]
done.
Sorting and merging events... Done indexing.
Incorporating indexed data for training...
Exception in thread "main" java.lang.NullPointerException
at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
at opennlp.maxent.GIS.trainModel(GIS.java:256)
at opennlp.model.TrainUtil.train(TrainUtil.java:184)
at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:162)
at opennlp.tools.cmdline.doccat.DoccatTrainerTool.run(DoccatTrainerTool.java:61)
at opennlp.tools.cmdline.CLI.main(CLI.java:222)
I cannot figure out why this throws a NullPointerException. I also tried adding two more lines, with no result:
Refund What is the refund status for my order #342 ?
NewOffers Are there any new offers for your products ?
Refund Can I place a refund request for electronics ?
NewOffers Is there any new offer on buying worth 5000 ?
I found a blog post that does pretty much the same thing, and when I try its training file, it works like a charm. What is wrong with my file, and how do I resolve the error?
Running opennlp DoccatTrainer on its own prints the help text, so the PATH is not the issue. Any help is appreciated.
EDIT: I changed the file to
Refund What is the refund status for my order #342 ? Can I place a refund request for clothes ?
NewOffers Are there any new offers for your products ? what are the offers on new products or new offers on old products?
Refund Can I place a refund request for electronics ?
NewOffers Is there any new offer on buying worth 5000 ?
and it works. I thought it had something to do with the documents (apparently each should be two sentences), so I removed the last two lines to make it:
Refund What is the refund status for my order #342 ? Can I place a refund request for clothes ?
NewOffers Are there any new offers for your products ? what are the offers on new products or new offers on old products?
But then it fails again. So the question boils down to: what kind of data, format and documents does it need?
Thanks

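You have to add more training data, because the default cutoff is 5: a word (feature) must occur at least 5 times in the whole training file to be kept. Every token in your documents occurs fewer times than that, so each document is left without features and is dropped ("Dropped event"), leaving nothing to train on. As a rule of thumb, add more than 5 samples for each category.
Please refer to this blog post: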
http://madhawagunasekara.blogspot.com/2014/11/nlp-categorizer.html

You can use the -cutoff flag in your DoccatTrainer command to change the default. In your case, adding -cutoff 1 keeps every feature that is seen at least once:
opennlp DoccatTrainer -model en-doccat.bin -lang en -data en-doccat.train -encoding UTF-8 -cutoff 1
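To see why every event gets dropped, here is a rough Python simulation of the cutoff step. It assumes whitespace tokenization and that a feature is kept only if its total count across the training file is at least the cutoff, which is consistent with the "Indexing events using cutoff of 5" and "Dropped event" lines in the log. It is a sketch of the idea, not OpenNLP's actual code; `surviving_events` is a hypothetical helper.

```python
from collections import Counter

def surviving_events(lines, cutoff=5):
    """Approximate the cutoff step of maxent training (sketch, not OpenNLP code).

    Each line is "<category> <document text>". Every whitespace token is a
    bag-of-words feature; features seen fewer than `cutoff` times in the whole
    file are discarded, and a document left with no features is dropped.
    """
    docs = []
    for line in lines:
        category, text = line.split(" ", 1)
        docs.append((category, text.split()))

    # Count each feature over the entire training set, not per document.
    counts = Counter(tok for _, tokens in docs for tok in tokens)

    kept = []
    for category, tokens in docs:
        features = [tok for tok in tokens if counts[tok] >= cutoff]
        if features:  # otherwise the event is "Dropped"
            kept.append((category, features))
    return kept

original = [
    "Refund What is the refund status for my order #342 ?",
    "NewOffers Are there any new offers for your products ?",
]
print(len(surviving_events(original)))            # 0: nothing left to train on
print(len(surviving_events(original, cutoff=1)))  # 2
```

In this simulation, the edited four-line file works only because "?" happens to occur exactly 5 times in it, so every document keeps at least that one feature. In the two-line edited file "?" occurs 3 times and everything is dropped again, which matches the failure described in the question.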

Related

Magento - Set Order as Complete for Free Downloadable Product

I searched a lot on this site but couldn't find a specific answer to my problem, so here I go. Thanks in advance.
I have Store Credit and Coupons working on my store, alongside credit card payments. Store credit works fine, and so do the coupons, but I cannot get those purchases set to COMPLETE. They stay "Pending", and I need them Complete so that the download is enabled: if a downloadable product's order is Pending, it cannot be downloaded.
Where I have worked the most is app/code/core/Mage/Payment/Model/Method/Free.php.
At the bottom of this file I have:
/**
 * Get config payment action, do nothing if status is pending
 *
 * @return string|null
 */
public function getConfigPaymentAction()
{
    return $this->getConfigData('order_status') == 'pending' ? null : parent::getConfigPaymentAction();
}
It effectively says "if pending, do nothing", whereas what I want is: if it's pending, change it to Complete.
I'm also using Zero Subtotal Checkout with its status set to "Complete", but that doesn't seem to work, or something is overriding it.
By default, users can only download once the invoice has been created (order complete). You can, however, change this setting so that users can download products on pending orders.
Here's how to do it:
Go to System->Configuration->Catalog->Downloadable Product Options and change the value for Order Item Status to Enable Download to Pending.
Soledad, there are two approaches to this issue. First, set up the status for downloadable products correctly and use the Zero Subtotal Checkout payment method. Second, if that does not work, add the status manually. I've written an article about it on my blog. I hope it helps.

reading EMV card using PPSE and not PSE

I'm trying to read the data off a contactless Visa Paywave card.
For the Paywave, I have to submit a SELECT using PPSE (2PAY.SYS.DDF01) instead of PSE (1PAY.SYS.DDF01).
The EMV book 1, section 11.3.4, table 43 only describes how to interpret the response for a successful SELECT command using PSE. Does anyone know or can refer me to a source that shows how to process the data returned from a successful SELECT command using PPSE?
Here's my request APDU:
00A404000e325041592e5359532e444446303100
Here's the response:
6F2F840E325041592E5359532E4444463031A51DBF0C1A61184F07A0000000031010500A564953412044454249548701019000
I understand tags 84, A5 and BF0C in the response. According to the examples for reading the PSE, I should be able to just send GET PROCESSING OPTIONS (to get the AIP and AFL) with a null PDOL after this successful response, as follows: 80A80000830000.
But request 80A80000830000 returns error code 6985 - Command not allowed; conditions of use not satisfied.
I also tried reading all the files after successfully selecting the PPSE, by traversing every single SFI (0-30) and every single record (0-16) of each SFI. Yes, I also shifted the SFI left by 3 bits and bitwise-ORed it with 0x04. But I got no data.
I'm stuck, any help that would point me into getting some info from my Paywave card would be appreciated!
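For reference, the two commands discussed in this thread can be built as below. This is a minimal Python sketch; `select_by_name` and `read_record` are hypothetical helper names. The SELECT layout reproduces the request APDU shown above byte for byte, and READ RECORD uses P1 = record number and P2 = (SFI << 3) | 4, as described in the question. The range checks assume EMV's convention of SFIs 1 to 30 and record numbers starting at 1.

```python
def select_by_name(name: bytes) -> bytes:
    """SELECT by DF name: CLA=00 INS=A4 P1=04 P2=00, Lc, name, Le=00."""
    return bytes([0x00, 0xA4, 0x04, 0x00, len(name)]) + name + b"\x00"

def read_record(sfi: int, record: int) -> bytes:
    """READ RECORD: CLA=00 INS=B2 P1=record, P2=(SFI << 3) | 4, Le=00."""
    if not 1 <= sfi <= 30:
        raise ValueError("SFI must be in 1..30")
    if not 1 <= record <= 255:
        raise ValueError("record numbers start at 1")
    return bytes([0x00, 0xB2, record, (sfi << 3) | 0x04, 0x00])

ppse = select_by_name(b"2PAY.SYS.DDF01")
print(ppse.hex())                 # 00a404000e325041592e5359532e444446303100
print(read_record(1, 1).hex())    # 00b2010c00

# Selecting an application by AID uses the same command with the AID bytes.
visa = select_by_name(bytes.fromhex("A0000000031010"))
```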
Have you tried this tool from EMVLAB http://www.emvlab.org/emvtags/
Using that tool, your response decodes as follows:
http://www.emvlab.org/tlvutils/?data=6F2F840E325041592E5359532E4444463031A51DBF0C1A61184F07A0000000031010500A564953412044454249548701019000
2PAY.SYS.DDF01 is for contactless (e.g. NFC ) cards, while 1PAY.SYS.DDF01 is for contact cards.
1. After successfully selecting the PSE (SW1 SW2 = 90 00), you only need to look for the SFI (tag 88), which is a mandatory field in the returned FCI template.
2. Using that SFI, read its records starting from record number 1 until you get 6A83 (RECORD_NOT_FOUND). E.g. if your SFI is 1, you send a READ RECORD for SFI 1 with record_number=1, which will probably succeed. Then you increment record_number to 2 and send READ RECORD again, then 3, and so on. Repeat until you get 6A83 as the status.
3. The records you read contain the ADF entries (at least one). You then compare the ADF Names you read against the applications your terminal supports, also taking the ASI (Application Selection Indicator) into account. At the end you have a list of possible ADFs (the candidate list).
All the above steps (1-3) are documented in chapter 12.3.2 of Book 1 (v4.3) of the EMV spec.
Finally, you have to make the final selection (chapter 12.4 of Book 1).
Read chapters 12.3 to 12.4 of Book 1 for all the detailed steps.
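The record-reading loop described above can be sketched in Python as follows. The `transmit` callable is a hypothetical stand-in for whatever reader library you use (it takes a command APDU and returns the response data plus SW1SW2), and `FakeCard` is an invented test double, not real card contents.

```python
from typing import Callable, Dict, List, Tuple

SW_OK = 0x9000
SW_RECORD_NOT_FOUND = 0x6A83

# Hypothetical transport: command APDU in, (response data, SW1SW2) out.
Transmit = Callable[[bytes], Tuple[bytes, int]]

def read_record_apdu(sfi: int, record: int) -> bytes:
    # P1 = record number, P2 = (SFI << 3) | 4 ("P1 holds a record number")
    return bytes([0x00, 0xB2, record, (sfi << 3) | 0x04, 0x00])

def read_directory(transmit: Transmit, sfi: int) -> List[bytes]:
    """Read records 1, 2, 3, ... of the directory SFI until 6A83."""
    records = []
    for record in range(1, 256):
        data, sw = transmit(read_record_apdu(sfi, record))
        if sw == SW_RECORD_NOT_FOUND:
            break  # past the last record: the directory has been read
        if sw != SW_OK:
            raise RuntimeError(f"READ RECORD {record} failed with SW {sw:04X}")
        records.append(data)
    return records

class FakeCard:
    """Test double holding records keyed by (SFI, record number)."""

    def __init__(self, records: Dict[Tuple[int, int], bytes]):
        self.records = records
        self.sent: List[bytes] = []

    def transmit(self, apdu: bytes) -> Tuple[bytes, int]:
        self.sent.append(apdu)
        record, sfi = apdu[2], apdu[3] >> 3
        if (sfi, record) in self.records:
            return self.records[(sfi, record)], SW_OK
        return b"", SW_RECORD_NOT_FOUND

card = FakeCard({(1, 1): b"\x70\x00", (1, 2): b"\x70\x00"})  # invented contents
print(len(read_directory(card.transmit, sfi=1)))  # 2
```

Each returned record would then be parsed for its directory entries to build the candidate list.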
You seem to have the flow mixed up a bit. You want to:
1. Send 1PAY or 2PAY; for the cards I've tested it doesn't actually matter which. This returns a list of the AIDs available on the card. Alternatively, you can select an AID straight away if you know it's there, but good practice is to check first.
2. Get the list of AIDs returned in response to 1PAY/2PAY. In PayWave's case this will probably be A0000000031010 if you sent 2PAY, but you may get more if you send 1PAY.
3. Select one of the AIDs sent back (or one you already know is on the card).
4. Then loop through the SFIs and records, sending the READ RECORD command to get the data.
You don't have to send GET PROCESSING OPTIONS before sending READ RECORD, even though that's how a normal transaction flow goes.
I think the information you're looking for is available from this Visa website, but only if you're a registered and/or licensed Visa partner.
EDIT: Looking at the resulting TLV structure under BF0C:
tag=0xBF0C, length=0x1A
  tag=0x61, length=0x18
    tag=0x4F, length=0x07, value=0xA0000000031010 // looks like an AID to me
    tag=0x50, length=0x0A, value="VISA DEBIT"
    tag=0x87, length=0x01, value=0x01
I would guess that you need to first select A0000000031010 before getting the processing options.
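The structure above can be recovered programmatically. Below is a minimal BER-TLV walker in Python (a sketch without checks for truncated input; `parse_tlv` and `find_all` are hypothetical names) that parses the raw SELECT PPSE response from the question and pulls out the AID (tag 4F) and application label (tag 50). It treats a tag as constructed when bit 0x20 of its first byte is set, and it skips '00' bytes between objects.

```python
def parse_tlv(data: bytes) -> list:
    """Return [(tag_hex, value)], where value is a nested list for
    constructed tags and raw bytes for primitive ones."""
    items = []
    i = 0
    while i < len(data):
        if data[i] == 0x00:  # filler byte between data objects
            i += 1
            continue
        start = i
        first = data[i]
        i += 1
        if (first & 0x1F) == 0x1F:  # tag continues in following bytes
            while data[i] & 0x80:
                i += 1
            i += 1
        tag = data[start:i].hex().upper()
        length = data[i]
        i += 1
        if length & 0x80:  # long form, e.g. 81 nn or 82 nn nn
            n = length & 0x7F
            length = int.from_bytes(data[i:i + n], "big")
            i += n
        value = data[i:i + length]
        i += length
        if first & 0x20:  # constructed: the value is itself TLV
            items.append((tag, parse_tlv(value)))
        else:
            items.append((tag, value))
    return items

def find_all(items: list, tag: str):
    """Yield the value of every occurrence of `tag`, depth-first."""
    for t, v in items:
        if t == tag:
            yield v
        if isinstance(v, list):
            yield from find_all(v, tag)

response = bytes.fromhex(
    "6F2F840E325041592E5359532E4444463031A51DBF0C1A61184F07A0000000031010"
    "500A564953412044454249548701019000")
data, sw = response[:-2], response[-2:]  # strip the 90 00 status word
tlv = parse_tlv(data)
aids = [v.hex().upper() for v in find_all(tlv, "4F")]
labels = [v.decode("ascii") for v in find_all(tlv, "50")]
print(aids)    # ['A0000000031010']
print(labels)  # ['VISA DEBIT']
```

The AID found this way is what gets passed to a SELECT command (00 A4 04 00 07 A0000000031010 00) before reading records.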
I was selecting application 2PAY.SYS.DDF01 when I should have been selecting AID 0xA0000000031010. It looks like there are no records under 2PAY.SYS.DDF01.
But there was one record under application 0xA0000000031010. After I selected this application, I performed a READ RECORD, and the first record gave me the PAN and all the card info I wanted.
Thanks everyone for chiming in.

Magento CE 1.7.0.2 - Editing order not canceling original

I'm not sure exactly what change could have caused this issue: if I edit a processing order and place the new one, the original one is not canceled. I looked on Google and Stack Overflow for an existing solution but came up empty.
Steps to Reproduce (Scenario):
1. I need to edit an order because the customer forgot to add an item to it, so I click "Edit" on that order, which is in "Processing" status.
2. I place the order.
3. Looking at the Sales->Orders list, I can see that the original order is still in Processing status, which is the error. The new order has the same order # with "-1" appended at the end, which is good.
So I was wondering whether anyone else has experienced such an issue. It used to cancel the original order once the new one was placed. The JavaScript warning that pops up after clicking "Edit" says it will place a new order and mark the current one as Canceled, so something is wrong. Nothing seems out of the ordinary in my config.
EDIT: Guess nobody has experienced an issue such as this. I can't think of anything that would cause this. Since this post I've upgraded Magento to CE 1.7.0.2.
Thanks,
George
EDIT: Screenshot attached:
You can't cancel an order that is already Processing (invoiced), Shipped or Complete!
You can only cancel an order in the NEW state.
In your case you had an invoiced order and want to re-order:
1. Press Edit and go ahead with the new order (it will be suffixed with -1, -2 and so on, because this means the order is linked to the previous one).
2. Go to the original order and refund it completely (it will then be in the Closed state/status).
You need to understand the work flow of the order and the operations you can take on the order in EACH STATE

Scraping Real Time Visitors from Google Analytics

I have a lot of sites and want to build a dashboard showing the number of real time visitors on each of them on a single page. (would anyone else want this?) Right now the only way to view this information is to open a new tab for each site.
Google doesn't have a real-time API, so I'm wondering whether it is possible to scrape this data. Eduardo Cereto found that Google transfers the real-time data over the realtime/bind network request. Does anyone more savvy have an idea of how I should start? Here's what I'm thinking:
Figure out how to authenticate programmatically
Inspect all of the realtime/bind requests to see how they change. Does each request have a unique key? Where does that come from? Below is my breakdown of the request:
https://www.google.com/analytics/realtime/bind?VER=8
&key= [What is this? Where does it come from? 21 character lowercase alphanumeric, stays the same each request]
&ds= [What is this? Where does it come from? 21 character lowercase alphanumeric, stays the same each request]
&pageId=rt-standard%2Frt-overview
&q=t%3A0%7C%3A1%3A0%3A%2Ct%3A11%7C%3A1%3A5%3A%2Cot%3A0%3A0%3A4%2Cot%3A0%3A0%3A3%2Ct%3A7%7C%3A1%3A10%3A6%3D%3DREFERRAL%3B%2Ct%3A10%7C%3A1%3A10%3A%2Ct%3A18%7C%3A1%3A10%3A%2Ct%3A4%7C5%7C2%7C%3A1%3A10%3A2!%3Dzz%3B%2C&f
The q variable URI decodes to this (what the?):
t:0|:1:0:,t:11|:1:5:,ot:0:0:4,ot:0:0:3,t:7|:1:10:6==REFERRAL;,t:10|:1:10:,t:18|:1:10:,t:4|5|2|:1:10:2!=zz;,&f
&RID=rpc
&SID= [What is this? Where does it come from? 16 character uppercase alphanumeric, stays the same each request]
&CI=0
&AID= [What is this? Where does it come from? integer, starts at 1, increments weirdly to 150 and then 298]
&TYPE=xmlhttp
&zx= [What is this? Where does it come from? 12 character lowercase alphanumeric, changes each request]
&t=1
Inspect all of the realtime/bind responses to see how they change. How does the data come in? It looks like some altered JSON. How many times do I need to connect to get the data? Where in there is the number of active visitors on the site? Here is a dump of sample data:
19
[[151,["noop"]
]
]
388
[[152,["rt",[{"ot:0:0:4":{"timeUnit":"MINUTES","overTimeData":[{"values":[49,53,52,40,42,55,49,41,51,52,47,42,62,82,76,71,81,66,81,86,71,66,65,65,55,51,53,73,71,81],"name":"Total"}]},"ot:0:0:3":{"timeUnit":"SECONDS","overTimeData":[{"values":[0,1,1,1,1,0,1,0,1,1,1,0,2,0,2,2,1,0,0,0,0,0,2,1,1,2,1,2,0,5,1,0,2,1,1,1,2,0,2,1,0,5,1,1,2,0,0,0,0,0,0,0,0,0,1,1,0,3,2,0],"name":"Total"}]}}]]]
]
388
[[153,["rt",[{"ot:0:0:4":{"timeUnit":"MINUTES","overTimeData":[{"values":[52,53,52,40,42,55,49,41,51,52,47,42,62,82,76,71,81,66,81,86,71,66,65,65,55,51,53,73,71,81],"name":"Total"}]},"ot:0:0:3":{"timeUnit":"SECONDS","overTimeData":[{"values":[2,1,1,1,1,1,0,1,0,1,1,1,0,2,0,2,2,1,0,0,0,0,0,2,1,1,2,1,2,0,5,1,0,2,1,1,1,2,0,2,1,0,5,1,1,2,0,0,0,0,0,0,0,0,0,1,1,0,3,2],"name":"Total"}]}}]]]
]
388
[[154,["rt",[{"ot:0:0:4":{"timeUnit":"MINUTES","overTimeData":[{"values":[53,53,52,40,42,55,49,41,51,52,47,42,62,82,76,71,81,66,81,86,71,66,65,65,55,51,53,73,71,81],"name":"Total"}]},"ot:0:0:3":{"timeUnit":"SECONDS","overTimeData":[{"values":[0,3,1,1,1,1,1,0,1,0,1,1,1,0,2,0,2,2,1,0,0,0,0,0,2,1,1,2,1,2,0,5,1,0,2,1,1,1,2,0,2,1,0,5,1,1,2,0,0,0,0,0,0,0,0,0,1,1,0,3],"name":"Total"}]}}]]]
]
Let me know if you can help with any of the items above!
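The q parameter from the request breakdown can be decoded and split with Python's standard library. This sketch leaves out the trailing &f, since an unescaped & in a query string starts a new parameter and so is not part of q; `split_q` is a hypothetical helper. Note that two of the decoded sub-queries, ot:0:0:4 and ot:0:0:3, appear verbatim as keys in the "rt" responses of the dump, which suggests (but does not prove) that each response entry answers one sub-query.

```python
from urllib.parse import unquote

def split_q(raw_q: str) -> list:
    """Percent-decode the q value and split it into comma-separated sub-queries."""
    return [part for part in unquote(raw_q).split(",") if part]

raw_q = ("t%3A0%7C%3A1%3A0%3A%2Ct%3A11%7C%3A1%3A5%3A%2Cot%3A0%3A0%3A4%2C"
         "ot%3A0%3A0%3A3%2Ct%3A7%7C%3A1%3A10%3A6%3D%3DREFERRAL%3B%2C"
         "t%3A10%7C%3A1%3A10%3A%2Ct%3A18%7C%3A1%3A10%3A%2C"
         "t%3A4%7C5%7C2%7C%3A1%3A10%3A2!%3Dzz%3B%2C")
parts = split_q(raw_q)
for part in parts:
    print(part)

# The text before the first ':' looks like a data-type prefix ("t" or "ot").
data_types = [p.split(":", 1)[0] for p in parts]
```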
Google has since launched a Real Time Reporting API for exactly this. With it you can easily retrieve the number of real-time visitors, along with several other Google Analytics dimensions and metrics: https://developers.google.com/analytics/devguides/reporting/realtime/dimsmets/
It is quite similar to the Google Analytics API. To start developing with it, see:
https://developers.google.com/analytics/devguides/reporting/realtime/v3/devguide
With Google Chrome I can see the data on the Network Panel.
The request endpoint is https://www.google.com/analytics/realtime/bind
The connection seems to stay open for about 2.5 minutes, and during that time it keeps receiving more and more data.
After those 2.5 minutes the connection is closed and a new one is opened.
On the Network panel you can only see the data for the connections that are terminated. So leave it open for 5 minutes or so and you can start to see the data.
I hope that can give you a place to start.
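Because the connection stays open and keeps delivering data, a client needs to split the stream into messages. In the dump in the question, the number before each chunk looks like a length prefix: the noop chunk, counting its line breaks, is exactly 19 characters long. Assuming that framing holds (an inference from the sample, not documented behaviour), a minimal Python splitter could look like this; `parse_chunks` is a hypothetical name, and it counts characters, which equals bytes only for ASCII payloads.

```python
import json

def parse_chunks(buffer: str):
    """Split length-prefixed frames (a length line, then that many
    characters of JSON) off the front of `buffer`.

    Returns (messages, rest): the decoded payloads of all complete frames,
    plus the unconsumed tail to prepend to the next read.
    """
    messages = []
    pos = 0
    while True:
        newline = buffer.find("\n", pos)
        if newline == -1:
            break  # length line not complete yet
        length = int(buffer[pos:newline])
        start = newline + 1
        if start + length > len(buffer):
            break  # payload not fully received yet
        messages.append(json.loads(buffer[start:start + length]))
        pos = start + length
    return messages, buffer[pos:]

sample = '19\n[[151,["noop"]\n]\n]\n'  # first chunk of the dump in the question
messages, rest = parse_chunks(sample)
print(messages)  # [[[151, ['noop']]]]
```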
Having Google in the loop seems pretty redundant. I suggest serving a common element on demand from the dashboard server and including it by absolute URL on every page to be monitored for a given site. The script that outputs the item can read the IP address of the requesting browser; these can all be logged to a database and filtered for uniqueness, giving a real-time head count.
<?php
$user_ip = $_SERVER["REMOTE_ADDR"];
// Some MySQL to insert $user_ip into the database table for website XXX goes here
$file = 'tracking_image.gif';
$type = 'image/gif';
header('Content-Type: ' . $type);
header('Content-Length: ' . filesize($file));
readfile($file);
?>
Addendum:
A database can also add a timestamp to every row of data it stores. This can be used to further filter results and provide the number of visitors in the last hour or minute.
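The logging and counting described above can be sketched in Python, with the standard-library sqlite3 module standing in for MySQL. The `hits` table and the function names are hypothetical; `record_hit` is what the tracking-image script would call, and `active_visitors` counts distinct IPs seen within a time window. Counting IPs undercounts visitors who share an address (for example behind the same NAT).

```python
import sqlite3
import time

def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS hits (site TEXT, ip TEXT, ts REAL)")
    return db

def record_hit(db, site, ip, ts=None):
    """One row per tracking-image request, stamped with the current time."""
    db.execute("INSERT INTO hits (site, ip, ts) VALUES (?, ?, ?)",
               (site, ip, time.time() if ts is None else ts))
    db.commit()

def active_visitors(db, site, window_seconds=300, now=None):
    """Distinct IPs that requested the image for `site` within the window."""
    now = time.time() if now is None else now
    (count,) = db.execute(
        "SELECT COUNT(DISTINCT ip) FROM hits WHERE site = ? AND ts >= ?",
        (site, now - window_seconds)).fetchone()
    return count

db = open_db()
record_hit(db, "example.com", "1.2.3.4", ts=1000)
record_hit(db, "example.com", "1.2.3.4", ts=1100)  # same visitor again
record_hit(db, "example.com", "5.6.7.8", ts=1150)
record_hit(db, "example.com", "9.9.9.9", ts=500)   # too old for the window
print(active_visitors(db, "example.com", 300, now=1200))  # 2
```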
Client-side JavaScript with AJAX, for fine-tuning (or overkill)
The onblur and onfocus JavaScript events can be used to tell whether the page is visible, and the result can be passed back to the dashboard server via AJAX. http://www.thefutureoftheweb.com/demo/2007-05-16-detect-browser-window-focus/
When a visitor closes a page, this can also be detected with the JavaScript onunload handler on the body tag, and AJAX can be used to send data back to the server one last time before the browser finally closes the page.
If you also wish to collect some information about the visitor, as Google Analytics does, this page https://panopticlick.eff.org/ has a lot of JavaScript that can be examined and adapted.
I needed/wanted realtime data for personal use so I reverse-engineered their system a little bit.
Instead of binding to /bind I get data from /getData (no pun intended).
At /getData the minimum request is apparently: https://www.google.com/analytics/realtime/realtime/getData?pageId&key={{propertyID}}&q=t:0|:1
Here's a short explanation of the possible query parameters and syntax. Please remember that these are all guesses and I don't know all of them:
Query Syntax: pageId&key=propertyID&q=dataType:dimensions|:page|:limit:filters
Values:
pageID: Required but seems to only be used for internal analytics.
propertyID: a{{accountID}}w{{webPropertyID}}p{{profileID}}, as specified at the Documentation link below. You can also find this in the URL of all analytics pages in the UI.
dataType:
t: Current data
ot: Overtime/Past
c: Unknown, returns only a "count" value
dimensions (| separated or alone), most values are only applicable for t:
1: Country
2: City
3: Location code?
4: Latitude
5: Longitude
6: Traffic source type (Social, Referral, etc.)
7: Source
8: ?? Returns (not set)
9: Another location code? longer.
10: Page URL
11: Visitor Type (new/returning)
12: ?? Returns (not set)
13: ?? Returns (not set)
14: Medium
15: ?? Returns "1"
page:
At first this seems to work for pagination but after further analysis it looks like it's also used to specify which of the 6 pages (Overview, Locations, Traffic Sources, Content, Events and Conversions) to return data for.
For some reason 0 returns an impossibly high metrictotal
limit: Result limit per page, maximum of 50
filters:
Syntax is as specified at the Documentation 2 link below, except that OR is specified using | instead of a comma. Example: 6==CUSTOM;1==United%20States
You can also combine multiple queries in one request by comma-separating them (e.g. q=t:1|2|:1|:10,t:6|:1|:10).
Following the above "documentation", if you wanted to build a query that requests the page URL and city of the top 10 active visitors with a traffic source type of CUSTOM located in the US you would use this URL: https://www.google.com/analytics/realtime/realtime/getData?key={{propertyID}}&pageId&q=t:10|2|:1|:10:6==CUSTOM;1==United%20States
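The guessed syntax dataType:dimensions|:page|:limit:filters can be assembled mechanically. The Python sketch below only builds strings; `build_q` and `build_url` are hypothetical helpers, the endpoint is the undocumented one reverse-engineered in this answer (it may no longer respond), and the property ID in the usage line is made up.

```python
from urllib.parse import quote

# Undocumented endpoint quoted from the answer; it may no longer exist.
GETDATA_URL = "https://www.google.com/analytics/realtime/realtime/getData"

def build_q(data_type, dimensions, page, limit=None, filters=None):
    """Assemble dataType:dimensions|:page|:limit:filters (guessed syntax)."""
    q = f"{data_type}:{'|'.join(str(d) for d in dimensions)}|:{page}"
    if limit is not None:
        q += f"|:{limit}"
    if filters:
        q += ":" + ";".join(filters)  # e.g. 6==CUSTOM;1==United States
    return q

def build_url(property_id, *queries):
    """Comma-join queries; percent-encode only unsafe characters such as spaces."""
    q = ",".join(queries)
    return f"{GETDATA_URL}?key={property_id}&pageId&q={quote(q, safe=':|=;,!')}"

q = build_q("t", [10, 2], 1, 10, ["6==CUSTOM", "1==United States"])
print(build_url("a12345w67890p13579", q))  # hypothetical property ID
```

With these inputs the printed URL matches the example above, apart from the property ID.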
Documentation
Documentation 2
I hope that my answer is readable and (although it's a little late) sufficiently answers your question and helps others in the future.

magento order id increment jumps

For some reason, order IDs (increment_id in the sales_flat_order table) are not incrementing sequentially on my Magento 1.6.1 store. This is how it looks after a number of live orders were placed:
increment_id created_at updated_at
100000001 2011-12-14 12:35:24 2011-12-14 12:35:25
100000002 2011-12-14 13:02:39 2011-12-14 13:02:39
100000003 2011-12-14 13:04:18 2011-12-14 13:04:18
100000004 2012-02-01 16:54:58 2012-02-01 16:54:58
100000005 2012-03-14 12:22:35 2012-03-14 12:22:35
100000006 2012-03-20 13:10:48 2012-03-20 13:10:48
100000011 2012-03-29 20:58:48 2012-03-29 20:58:48
100000012 2012-03-29 21:06:43 2012-03-29 21:06:43
100000013 2012-03-30 10:48:20 2012-03-30 10:48:21
100000014 2012-03-30 13:05:40 2012-03-30 13:05:41
100000015 2012-04-03 15:51:01 2012-04-03 15:51:02
100000016 2012-04-19 15:00:49 2012-04-19 15:00:50
100000017 2012-05-09 12:09:21 2012-05-09 12:09:22
100000019 2012-05-24 05:35:35 2012-05-24 05:35:36
100000020 2012-05-24 05:41:11 2012-05-24 05:41:12
100000008 2012-05-24 05:48:52 2012-05-24 05:48:53
Why does Magento sometimes skip increments? Worse, in my example the order with increment 100000008 comes after 100000020. Does anyone know why this is happening and whether there's a way to fix it?
This is normal, albeit understandably disconcerting.
When Magento enters the checkout process it 'reserves' an increment_id and places it on the quote (cart) object. You can see the code that gets an increment id at:
Mage_Eav_Model_Entity_Type::fetchNewIncrementId()
The last used ID for each store is stored in eav_entity_store. If a customer abandons their cart (i.e. the quote object) before completing the checkout process, the reserved increment_id will never show up on an order. You can sometimes see this effect in the order numbers coming in on a busy store: occasionally a really old order ID comes through among the day's orders, from a customer who is checking out an old cart.
This behaviour exists so that Magento can send payment gateways the final order ID (increment_id) before the order is completed, allowing the gateway to associate the order ID with the order. If the customer abandons the payment process at the gateway, the order ID is dead (or, more correctly, still attached to the quote).
You can see this happening in the PayPal express module at:
Mage_Paypal_Model_Express_Checkout::start()
which calls
Mage_Sales_Model_Quote::reserveOrderId()
If you want to find your 'missing' increment_ids, take a look in sales_flat_quote under the field reserved_order_id. You should see them attached to unconverted quote objects (carts).
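A toy Python model shows how reservation on the quote produces both symptoms in the question: gaps, and an old number arriving late. The classes are hypothetical and simplified; real Magento also re-reserves a new ID if the reserved one has already been used by an order, which this sketch ignores.

```python
class IncrementCounter:
    """Stands in for the per-store 'last used id' kept in eav_entity_store."""

    def __init__(self, last_id=100000000):
        self.last_id = last_id

    def fetch_new_increment_id(self):
        self.last_id += 1
        return self.last_id

class Quote:
    """A cart that reserves an order id once and reuses it (simplified)."""

    def __init__(self, counter):
        self.counter = counter
        self.reserved_order_id = None

    def reserve_order_id(self):
        if self.reserved_order_id is None:
            self.reserved_order_id = self.counter.fetch_new_increment_id()
        return self.reserved_order_id

def place_orders():
    counter = IncrementCounter()
    old_cart = Quote(counter)
    old_cart.reserve_order_id()       # 100000001 reserved, customer wanders off
    abandoned = Quote(counter)
    abandoned.reserve_order_id()      # 100000002 reserved, never converted: a gap
    fresh = Quote(counter)
    orders = [fresh.reserve_order_id()]         # 100000003 becomes an order first
    orders.append(old_cart.reserve_order_id())  # old cart returns: 100000001 last
    return orders, abandoned.reserved_order_id

print(place_orders())  # ([100000003, 100000001], 100000002)
```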
This behaviour can create issues with some payment gateways; Moneris comes to mind. When you send Moneris' hosted paypage the same order id twice, it chokes and creates a cryptic error state for the customer. This condition occurs when the customer visits the hosted pay page, backs out and re-visits the page. Hence in some cases, it is necessary to re-generate the order id associated with the quote object programmatically.
I was facing the same issue, but only when the server was hit with a huge amount of load. The issue occurs because the database runs into locks while converting the quote into an order. On further inspection, I found that Magento writes to the sales_flat_order_grid table within the transaction, right after inserting into the sales_flat_order table. With concurrent queries, this caused locking collisions. The real solution is to move the sales_flat_order_grid update out of the transaction.
The link helped me understand the issue
The patch resolved the issue for me.
You have to remove the _afterSave() method from Mage_Sales_Model_Abstract and add:
public function afterCommitCallback()
{
    // Update the grid table only after the surrounding transaction has committed
    if (!$this->getForceUpdateGridRecords()) {
        $this->_getResource()->updateGridRecords($this->getId());
    }
    parent::afterCommitCallback();
}
Let me know if it solves the problem for you.
We have had this same issue multiple times over the past couple of months. On checking our payment service provider's transaction list, we saw thousands of low-value (micro) transactions declined due to potential fraud. My opinion is that a fraudster is using our checkout process to probe their list of cards and find out which cards are valid and which are dead. I have reported it to Action Fraud, our web host and our payment provider.
In summary, my advice would be to check your PSP's transaction list for the same time period.
Good luck with it,
Brisc.
