Yahoo Pipes: Missing Filter Rules - filter

I am trying to build a simple RSS filter in Yahoo Pipes for the first time.I would like to filter by date.
I have a bunch of RSS feeds from a "fetch feed" data source and I connect them to a filter. I am now supposed to get a item.y:published.utime field in the Rules dropdown. However, it is not there: I do not have any of the item.y fields at all.
If I start by editing an example filter, these fields are there. What is going on?

Dropdown lists include only the basic options as these are most commonly used by novices making simple pipes. Yahoo developers also left many options off the dropdown lists because some of them are quite long and would spoil the layout such as this example.. *item.y:published.day_ordinal_suffix*
If you want to use the special 'y' fields as sources of data then simply type them by hand into the rules box of the filter module using the keyboard device attached to your computer.

Related

Applying different parsefilters to each domain in the same topology

I am trying to crawl different websites (e-commerce websites) and extract specific information from the pages of each website (i.e. product price, quantity, date of publication, etc.).
My question is: how to configure the parsing since each website has a different HTML layout which means I need different Xpaths for the same item depending on the website? Can we add multiple parser bolts in the topology for each website? If yes, how can we assign different parsefilters.json files to each parser bolt?
You need #586. At the moment there is no way to do it but to put all your XPATH expressions regardless of the site you want to use them on in the parsefilters.json.
You can't assign different parsefilters.json to the various instances of a bolt.
UPDATE however you could have multiple XpathFilters sections within the parseFilters.json. Each could cover a specific source, however, there is currently no way of constraining which source a parse filter gets applied to. You could extend XPathFilter so that it takes some extra config e.g. regular expression a URL must match in order to be applied. That would work quite nicely I think.
I've recently added JsoupFilters which will be in the next release. These should be useful for your use case but that still doesn't solve the issue that you need an implementation of the filter that organizes the resources per host. It shouldn't be too hard to implement taking the URL filter one as a example and would also make a very nice contribution to the project.

Kofax Seperate Main Invoice from Supporting Document without using Seperator sheet

When a batch gets created documents should get separated automatically without using separator sheet or Barcode separator.
How can I classify documents for Invoice and supporting document.
In our project we get many invoices with supporting document so the scanning person has to insert the separator sheets manually, so to avoid this we want to automatically classify the supporting documents.
In general the concept would be that you would enable separation in the project and then train your classes with examples to be used for the layout or content classifiers.
However, as I'm sure you've seen, the obstacle with invoices is that they are different enough between vendors that it would not reliably classify all to an Invoice class. Similarly with "Supporting Documents" which are likely to be very different from each other, so unfortunately there isn't a completely easy answer without separator sheets (or barcode stickers affixed to supporting docs).
What you might want to do is write code in the one of the separation events like Document_AfterSeparate event. Despite the name, the document has not yet been split at this point, but the classifiers have run. See Scripting Help topic "Server Script Events Sequence > Document Separation > Standard Document Separation" for more detail. Setting the SplitPage property on the CDocPage (pXDoc.CDoc.Pages.ItemByIndex(lPage).SplitPage) will allow you to use your own logic to determine which pages to separate.
For example if you know that you will always have single page invoices, you can split on the first page and classify accordingly. Or you can try to search for something that indicates the end of the invoice like "Total" or other characteristics. There is an example of how you can use locators to help separation in the Scripting Help topic "Script Samples > Use Locator Results for Standard Document Separation". The example uses a Barcode Locator, but the same concept works if you wanted to try it with a Format Locator or anything else.
Without Separator sheets you will need a smart classification software like Kofax Transformation Module (KTM). Its kind of expensive. you will need to verify the cost saving and ROI.

What data structure is best to represent search filters?

I'm implementing a native iOS application, which provides a feature to search for articles using keywords and several search filters such as author, year, publisher etc. Each of these search filters has several options such as 2014, 2013, and 2012 for the year filter, or Academe Research Journals, Annex Publishers, and Elmer Press for the publisher filter. Each of these options has a BOOL stating whether the option is selected or not. I need an object that wraps the search keywords and search filters so that I can send it to the server, which is responsible for the search operation.
Which data structure should I use to represent these search filters in the wrapper class?
Something like XML comes to mind:
<year>2014</year>
<publisher>Annex Publishers</publisher>
Although I find it rather bulky. I'd probably prefer something like:
year=2014|publisher=Annex Publishers
You'll need to escape = and | appearing in the field names or values, but this is easy to do.
This could just be a single string sent across.
In terms of actual data structures, you could have a map of field name to value, only containing entries where the option is selected. Or you could have a class containing pointers / references for each field, set to null if the option is not selected.
Another totally different consideration is to use an enumerated type, i.e. mapping each possible value to an integer, typically resulting in less memory used and faster (and possibly more robust) code, depending on how exactly this is done.
You could map it as follows, for example:
Academe Research Journals 0
Annex Publishers 1
Elmer Press 2
Then, rather than sending "Annex Publishers" as publisher, you could just send 1.
year=2014|publisher=1
The extension for multiple possible values for a field can be done in various ways, but it's fairly easy to do:
<year>2014</year>
<year>2013</year>
<publisher>Annex Publishers</publisher>
Or:
year=2014,2013|publisher=Annex Publishers

How do I take each line of a text file and insert them into a web form? Specifically, for testing domain name availability

I wrote a Ruby script that appended "data" to the beginning of every word of the English dictionary, and then filtered out various strings using different parameters, and now I want to use a site like namecheap or gandi.net in order to take each of these strings and insert them into the domain name availability checker in order to determine which ones are available.
It is my understanding that this will involve making a POST HTTP request of some kind, as well as grabbing the element in question, but I don't really understand the dynamics of what to read about in order to do this kind of thing.
I imagine that after a few requests I will be limited, but as a learning exercise I am still curious as to how I would go about doing this.
I inspected the element (on namecheap) to see what the tag looked like, to find any uniquely identifiable class/id names that I could use to grab that specific part of the source, and found that inside a fieldset tag, there was a line of HTML that I can't seem to paste here, so here is a picture:
Thanks in advance for any guidance in helping me learn about web scripting!

Autocomplete vs Drop-down. When to use?

I've read somewhere (can't remember/find where) an article about web usability describing when to use drop downs and when to use autocomplete fields.
Basically, the article says that the human brain cannot store more then the last five options presented to choose.
For example, in a profile form, where there is your current occupation, and the system gives you a bunch of options, when you read the sixth options, your brain can't remember the first one anymore. This example is a great place to use an autocomplete field, where the user types something that he thinks that would be his occupation and then select the better from the few options filtered.
I would like to hear your opinion about this subject.
When should I use a drop-down and when I should use a Autocomplete field?
For a limited list, don't use an autocomplete edit box or combobox, but use a listbox where all values are visible all at once. For limited lists, especially with static content of up to about 8 items, this takes up real estate, but presents the user with a better immediate overview.
For less than 5 items a radiogroup or checkbox group (multiple selections) may also be better.
For lists whose content is dynamic, like a list of contacts, a (scrolling) listbox or combobox are appropriate because you never know how many items will be in the list. To keep it manageable, you will need to allow for some kind of filtering and/or autocomplete.
Autocomplete usually suffers from the fact that what the users types needs to match a string from the beginning. I hate those except for when they are used to complete a value based on what I typed in that (type of) field before. E.g. what browsers nowadays offer when filling out online forms.
Allowing a user to start typing in a combobox usually suffers the same drawback. But admittedly it doesn't need to if the filtering is based on "like %abc%" instead of "starts with abc"
When dealing with lists that can have many similar items, I really like the way GMail's "To" field handles it. You start typing any part of someone's name or e-mail address and GMail will drop down a list presenting all the contacts whose name or e-mail address contains the characters you have typed so far anywhere within them. Using the up and down keys changes the selection in the dropped down list (without affecting what you have typed) and pressing enter adds the currently selected item to the "To" field. By far the best user experience I have had so far when having to select something from a list.
Haven't found any components yet that can do this, but it's not too hard to "fake" by combining an edit box and a listbox that drops down when you start typing and has its contents is filtered based on what has been typed so far.
I would use 2 criteria,
1) How long is the list, if the list contains 5 elements you better use a combobox as it will be easier for the user (better UX)
2) In case the list is long, how easy for the user to remember the prefix of what he/she is looking for... if it's not easy, using autocomplete is irrelevant..
I'd say it depends on the diversity in the list, and the familiarity with the list items.
If for example the list contained over 5 car brands (list items I'm familiar with), no problem.
If on the other hand the list has over 5 last names, it could take me some more time before I'd make a selection.
You should probably just try out both options and trust your gut on which you find easier to use.
Here's the opposite approach:
The worst time to use an auto-complete box is when you have a finite and relatively small set of options, and the user doesn't know the range of valid options. For example, if you're selling used cars and you have a mixed bag of brands, simply listing the brands in a combobox is more efficient and easier to browse than an auto-complete method.
Being able to remember the last 5 options is most likely irrelevant unless you have a giant list of options and are requiring the user to select the most relevant item.
An alternative approach is to use both. I believe Dojo has a widget that acts as both a combobox and an auto-complete field. You can either choose to start typing and it will narrow down the possible options, or you can interact with it with your mouse and browse it like a combobox.
I usually look at how big the list is going to be. If there are going to be more than 15 options then it just seems easier to find if they can just start typing.
The other circumstance for me is when there is an other option and they can free type it. This essentially eliminates the need for two controls since you can combine in one.
The main difference has nothing to do with usability but more to do with what defines the acceptable inputs.
You normally use a ComboBox when you have a predefined list of acceptable inputs (e.g. an Enum or list of occupations).
An auto-complete field is best used when there are many known inputs BUT custom input is accepted as well. The user will become frustrated if they type "Programmer" in as their occupation but it wasn't one of the pre-defined, acceptable inputs, and they are given a message that their input is not valid.
Keep in mind that ComboBoxes do allow you to type in them to select the first matching option. Some types of ComboBoxes (depending on the UI framework you are using) even allow free-form text fields at the top or side of the field to search or add to the list.
Of coarse the best way to determine what YOUR user will prefer is testing: A/B, field, user, etc.
Hope this helps you solve your dilemma!

Resources