How to test only relevant items and remove popular items from the test set with Crossfold in LensKit 3.0-M2 - lenskit

How can I configure the crossfold task to generate test sets that contain only relevant items (that is, items rated higher than the user's average rating) and that exclude popular items?
Is it possible with lenskit-3.0?

It is not directly possible - the crossfolder does not take into account rating values.
A patch that added such capabilities would be welcome; you can also directly emit train/test sets yourself in Python or R.
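As a rough illustration of emitting the train/test sets yourself in Python, here is a minimal sketch. It does not use any LensKit API; the function name, the `(user, item, rating)` tuple layout, and the definition of "popular" as the `n_popular` most-rated items are all assumptions made for this example.

```python
from collections import Counter, defaultdict
import random


def relevant_holdout_split(ratings, n_test_per_user=1, n_popular=1, seed=0):
    """Split (user, item, rating) tuples into train and test lists.

    A rating is eligible for the test set only if it is strictly above
    that user's mean rating and its item is not among the `n_popular`
    most-rated items. Everything else stays in the training set.
    """
    rng = random.Random(seed)

    # Group ratings by user and compute each user's mean rating.
    by_user = defaultdict(list)
    for user, item, rating in ratings:
        by_user[user].append((item, rating))
    user_mean = {u: sum(r for _, r in rows) / len(rows) for u, rows in by_user.items()}

    # Popularity (assumption): the number of ratings an item received.
    item_counts = Counter(item for _, item, _ in ratings)
    popular = {item for item, _ in item_counts.most_common(n_popular)}

    train, test = [], []
    for user, rows in by_user.items():
        eligible = [i for i, r in rows if r > user_mean[user] and i not in popular]
        held_out = set(rng.sample(eligible, min(n_test_per_user, len(eligible))))
        for item, rating in rows:
            target = test if item in held_out else train
            target.append((user, item, rating))
    return train, test
```

The two lists can then be written out as files and used as pre-split data instead of crossfold output.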

Related

Calling one feature file as background step into another using cucumber ruby

I have a feature file A with 8-9 lines of steps covering one scenario. I now need to use feature file A as a background step in feature file B, reducing the number of steps to 3-4.
My approach:
In feature file B, reduce the steps from feature file A to 3-4 lines, use helper methods, and add them as background steps.
Feature file A:
Feature: I want to create an event
  Background: User is Logged In
    Given a logged in user
  Scenario: Creating an event
    Given I select event
    And I add event details
    And I add start and end time
    Then Timings will be added successfully
    When I add ticket information and continue
    And Publish my event
    Then I verify event will be created successfully
Feature file B:
Feature: Place an order
  Background: Event is created
    Given a logged in user
    When I select event and fill in required details
    Then event should be published
I'm concerned about duplication. I'm using feature file A as a background step in feature file B with fewer steps, but functionally both feature files test the same feature.
Please suggest a better approach if possible. Thank you.
Fundamentally, using helper methods to reduce the number of steps is only something you should do if the compressed steps still convey the information correctly. Here is an example (you don't need helpers here either) of a good use case.
Given I have a party of 2/1/0 # This means adults/children/infants
And the child is under 12
And I am flexible on my flights
And I am going to Spain
When I search for flights
Given I have Spanish flights displayed # You could also add the pax in here if you wanted
Now, if you want to use helper methods, that is also fine, but remember that Cucumber is primarily a tool for encouraging collaboration, as well as providing documentation, testing and specification in the same place. So when you try to DRY up your lines, think about whether you actually just want to "compress" the lines down.
i.e.
Given('I am {int} years old') do |age|
  #person.age = age
end
Given('my name is {word}') do |name|
  #person.name = name
end
Given('my hometown is {string}') do |hometown|
  #person.location = hometown
end
can become
Given('I am {word}, {int} years old from {string}') do |name, age, hometown|
  #person.name = name
  #person.age = age
  #person.location = hometown
end
Hopefully some of these tips give you something to think about.

Is there a way to represent a measurement with multiple values as an Observation?

I have some measurement data (such as echo measurements) where a single measurement can have multiple values associated with it.
Is there a standard way to represent multiple values for a single measurement as an Observation? If so, what is the best way?
I notice that an Observation can have multiple components. Should I put the LOINC code for my measurement at the Observation level and put each value at the component level? Or do I have to use extensions?
Thanks!
I am not sure exactly what your data looks like but here are a couple of patterns:
There is the SampledData datatype, which can be used for data streams such as an EKG.
example
If you have discrete values that are all interpreted together within an observation (they can't stand alone as independent observations), then use components: Observation.code = code, Observation.value[x] is empty, Observation.component.code = code, Observation.component.value[x] = result value. Here is an example of this pattern.
In some cases you will have an Observation.value[x] as well.
Note that Observation.component.code is required for each component.
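To make the component pattern concrete, here is a small sketch that builds such an Observation as a Python dict and prints it as JSON. It uses the familiar blood pressure panel (LOINC 85354-9 with systolic 8480-6 and diastolic 8462-4) as a stand-in for an echo measurement; the numeric values and the `component` helper are illustrative assumptions, and a real resource would normally also carry a subject and a time.

```python
import json

LOINC = "http://loinc.org"
UCUM = "http://unitsofmeasure.org"


def component(code, text, value):
    """Build one Observation.component entry: a code plus a valueQuantity."""
    return {
        "code": {"coding": [{"system": LOINC, "code": code}], "text": text},
        "valueQuantity": {
            "value": value,
            "unit": "mmHg",
            "system": UCUM,
            "code": "mm[Hg]",
        },
    }


observation = {
    "resourceType": "Observation",
    "status": "final",
    # The measurement as a whole is identified at the Observation level.
    "code": {
        "coding": [{"system": LOINC, "code": "85354-9"}],
        "text": "Blood pressure",
    },
    # No top-level value[x]: the results live entirely in the components.
    "component": [
        component("8480-6", "Systolic blood pressure", 120),   # illustrative value
        component("8462-4", "Diastolic blood pressure", 80),   # illustrative value
    ],
}

print(json.dumps(observation, indent=2))
```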
Using component is not appropriate for grouping independent observations together. That grouping is done using DiagnosticReport.result or Observation.related. The DiagnosticReport resource is typically used for reporting diagnostics in response to an order.

Mahout Content Based Recommendation Engine

I am working on a content-based recommendation problem. My data set is stored in MongoDB in JSON format.
Problem Statement
There are items that have their own properties, and users have preferences regarding each property. I want to predict how much user A will like item x by comparing the item's properties with the user's preferences for those same properties, and then build a recommendation system that recommends items to users based on their preferences.
I am thinking of using Mahout and the CBAYES classifier algorithm to predict how much item x will be liked by user A, but I haven't found any example or data set for implementing CBAYES with Mahout.
If you can suggest another classifier algorithm, please do.
You can calculate "how much item x will be liked by User A" by using cosine similarity. See the following link for more information.
Reference link: What's difference between Collaborative Filtering Item-based recommendation and Content-based recommendation
Regards,
Rajasekar
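A minimal sketch of the cosine-similarity idea in plain Python (not Mahout) follows. It assumes the user's preferences and each item's properties have already been encoded as numeric vectors over the same ordered list of properties; the property names and numbers are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here: an all-zero vector matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical property order: [action, comedy, romance]
user_preferences = [0.9, 0.1, 0.4]
items = {
    "item_x": [1.0, 0.0, 0.5],
    "item_y": [0.0, 1.0, 0.0],
}

# Rank items by how closely their properties match the user's preferences.
ranked = sorted(
    items.items(),
    key=lambda kv: cosine_similarity(user_preferences, kv[1]),
    reverse=True,
)
print([name for name, _ in ranked])
```

The similarity score itself can serve as the "how much will the user like it" estimate, with higher values meaning a closer match.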

dynamically classify categories

I am new at the idea of programming algorithms. I can work with simplistic ideas, but my current project requires that I create something a bit more complicated.
I'm trying to create a categorization system based on keywords and subsets of 'general' categories that filter down into more detailed categories that requires as little work as possible from the user.
I.E.
Sports >> Baseball >> Pitching >> Nolan Ryan
So, if a user decides they want to talk about "Baseball" and they filter the search, I would like to also include "Sports".
User enters: "baseball"
User is then taken to Sports >> Baseball
Now, I understand that this would be impossible without a living, breathing, dynamic program that connects those two categories in some way. It would also require some user input initially, and many more inputs throughout the lifetime of the software in order to maintain it and keep it up to date.
But alas, asking for such an algorithm would be frivolous without detailing very concrete specifics about what I'm trying to do. And I'm not trying to ask for a handout.
Instead, I am curious if people are aware of similar systems that have already been implemented and if there is documentation out there describing how it has been done. Or even some real life examples of your own projects.
In short, I have a 'plan' but it requires more user input than I really want. I feel getting more info on the subject would be the best course of action before jumping head first into developing this program.
Thanks
IMHO it isn't as hard as you think. What you want is called tagging, and you can do it automatically just by recording the correlation between tags (i.e. a tag holds its own meaningful information plus its relations to other tags). Then, when a user selects a tag, you relate it to the others by looking it up in your ADT collection (which can be as simple as an array).
Tag:
  Sport
Related Tags:
  Football
  Soccer
  ...
I'm hoping this helps!
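The tag-relation idea above could be sketched roughly like this. A plain dict stands in for the ADT collection, and the tag names and the `expand_tag` helper are hypothetical.

```python
# Each tag maps to the tags it is related to.
RELATED_TAGS = {
    "sport": ["football", "soccer", "baseball"],
    "baseball": ["sport", "pitching"],
    "pitching": ["baseball"],
}


def expand_tag(tag, related=RELATED_TAGS):
    """Return the selected tag followed by its directly related tags."""
    key = tag.strip().lower()
    return [key] + [t for t in related.get(key, []) if t != key]
```

Calling `expand_tag("Baseball")` would then bring in "sport" and "pitching" alongside the tag the user actually picked.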
It sounds like what you want to do is create a tree/menu structure, and then be able to rapidly retrieve the "breadcrumb" for any given key in the tree.
Here's what I would think:
Create the tree with all the branches. It's okay if you want branches to share keys - as long as you can give the user a "choice" of "Multiple found, please choose which one... ?"
For every key in the tree, generate the breadcrumb. This is time-consuming, and if the tree is very large and updated regularly then it may be better done offline, in the cloud, or via Hadoop, etc.
Store the key and the breadcrumb in a key/value store such as Redis, or in memory/cached as desired. You'll want every value to be an array if you want to share keys across categories/branches.
When the user selects a key - the key is looked up in the store, and if the resulting value contains only one match, then you simply construct the breadcrumb to take the user where you want them to go. If it has multiple, you give them a choice.
I would even say, if you need something more organic, say a user can create "new topic" dynamically from anywhere else, then you might want to not use a tree at all after the initial import - instead just update your key/value store in real-time.
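Here is a minimal in-memory sketch of the tree, breadcrumb generation, and lookup steps described above. A Python dict stands in for the key/value store, and the category tree and function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical category tree: each key maps to a dict of its children.
TREE = {
    "Sports": {
        "Baseball": {"Pitching": {"Nolan Ryan": {}}},
        "Softball": {"Pitching": {}},
    },
}


def build_breadcrumb_index(tree):
    """Map each lower-cased key to every breadcrumb (root-to-node path) ending at it."""
    index = defaultdict(list)

    def walk(node, path):
        for name, children in node.items():
            crumb = path + [name]
            index[name.lower()].append(crumb)
            walk(children, crumb)

    walk(tree, [])
    return dict(index)


def lookup(index, user_input):
    """Return all breadcrumbs for the key; more than one means the user must choose."""
    return index.get(user_input.strip().lower(), [])


index = build_breadcrumb_index(TREE)
print(" >> ".join(lookup(index, "baseball")[0]))
```

Because every value is a list, a shared key such as "pitching" returns two breadcrumbs here, which is the case where the user would be asked to choose.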

Tag/Keyword based recommendation

I am wondering what algorithm would be a good fit for a tag-driven e-commerce environment:
Each item has several tags, e.g.:
Item name: "Metallica - Black Album CD", Tags: "metallica", "black-album", "rock", "music"
Each user has several tags and friends (other users) bound to them, e.g.:
Username: "testguy", Interests: "python", "rock", "metal", "computer-science"
Friends: "testguy2", "testguy3"
I need to generate recommendations for such users by checking their interest tags and combining them in a sophisticated way.
Ideas:
A hybrid recommendation algorithm can be used, as each user has friends (a mixture of collaborative + context based recommendations).
Maybe user tags can be used to find similar users (peers) and generate recommendations from them.
Maybe tags can be matched directly between users and items.
Any suggestion is welcome. Any Python-based library is also welcome, as I will be building this experimental engine in Python.
1) Weight your tags.
Tags fall into several groups of interest:
My tags that none of my friends share
Tags a number of my friends share, but I don't
My tags that are shared by a number of my friends.
(sometimes you may want to consider friend-of-a-friend tags too, but in my experience the effort hasn't been worth it. YMMV.)
Identify all tags that the person and/or the person's friends have in interests, and attach a weight to the tags for this individual. One simple possible formula for tag weight is
(tag_is_in_my_list) * 2 + (friends_with_tag)/(number_of_friends)
Note the magic number 2, which makes your own opinion worth twice as much as that of all of your friends put together. Feel free to tweak :-)
2) Weight your items
For each item that has any of the tags in your list, just add up all of the weighted values of the tags. A higher value = more interest.
3) Apply a threshold.
The simplest way is to show the user the top n results.
More sophisticated systems also apply anti-tags (i.e. topics of non-interest) and do many other things, but I have found this simple formula effective and quick.
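The three steps above can be sketched in a few lines of Python. This is only an illustration of the weighting formula as stated; the function names and the sample friends' tags are assumptions, and the handling of a user with no friends (friend share of 0) is a choice made here.

```python
def tag_weights(my_tags, friends_tags, own_weight=2):
    """Step 1: weight = own_weight if the tag is mine, plus the share of friends with it."""
    my_tags = set(my_tags)
    friend_sets = [set(tags) for tags in friends_tags]
    all_tags = my_tags.union(*friend_sets)
    n_friends = len(friend_sets)
    weights = {}
    for tag in all_tags:
        friend_share = (
            sum(tag in tags for tags in friend_sets) / n_friends if n_friends else 0.0
        )
        weights[tag] = own_weight * (tag in my_tags) + friend_share
    return weights


def recommend(items, weights, n=3):
    """Steps 2 and 3: sum the tag weights per item and keep the top n scored items."""
    scores = {
        name: sum(weights.get(tag, 0.0) for tag in tags)
        for name, tags in items.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, score) for name, score in ranked[:n] if score > 0]
```

For "testguy" with two friends whose (invented) tags are `["rock", "jazz"]` and `["jazz", "python"]`, the tag "rock" gets weight 2 + 1/2 = 2.5, so an item tagged "rock" outranks one tagged only "jazz" (weight 1.0).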
If you can, track down a copy of O'Reilly's Programming Collective Intelligence, by Toby Segaran. There's a model solution in it for exactly this problem (with a whole bunch of really, really good other stuff).
Your problem is similar to product recommendation engines, such as Amazon's well-publicized site. These use a learning approach called association rules, which basically builds a conditional probability of user X buying product Y based on common features Z between the user and the product. A lot of open source toolkits implement association rules, such as Orange and Weka.
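As a rough sketch of the conditional probability behind an association rule, the following computes the confidence of a rule "antecedent implies consequent" from a list of tag sets. It is not how Orange or Weka are invoked; the function name and the data layout are assumptions.

```python
def rule_confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as count(A and C) / count(A)."""
    antecedent, consequent = set(antecedent), set(consequent)
    # Keep only the transactions that contain every antecedent tag.
    with_antecedent = [t for t in transactions if antecedent <= set(t)]
    if not with_antecedent:
        return 0.0
    with_both = [t for t in with_antecedent if consequent <= set(t)]
    return len(with_both) / len(with_antecedent)
```

With transactions made of each purchase's tags plus the buyer's interest tags, a high confidence for a rule such as {"rock"} implies {"cd"} would suggest recommending CDs to users interested in rock.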
You can accomplish this easily using Drools, a terrific rules engine that we have used to build several recommendation engines. Its Python Semantic module lets you specify your rules in Python.
I would use a Restricted Boltzmann Machine. It gets around the problem of similar but not identical tags quite neatly.

Resources