dynamically classify categories - algorithm

I am new at the idea of programming algorithms. I can work with simplistic ideas, but my current project requires that I create something a bit more complicated.
I'm trying to create a categorization system based on keywords, in which 'general' categories filter down into more detailed ones, and which requires as little work as possible from the user.
For example:
Sports >> Baseball >> Pitching >> Nolan Ryan
So, if a user decides they want to talk about "Baseball" and they filter the search, I would like to also include "Sports".
User enters: "baseball"
User is then taken to Sports >> Baseball
Now I understand that this would be impossible without a living, breathing, dynamic program that connects those two categories in some way. It would also require some user input initially, and many more inputs throughout the lifetime of the software in order to maintain it and keep it up to date.
But asking for such an algorithm would be frivolous without detailing very concrete specifics about what I'm trying to do, and I'm not asking for a handout.
Instead, I am curious if people are aware of similar systems that have already been implemented and if there is documentation out there describing how it has been done. Or even some real life examples of your own projects.
In short, I have a 'plan' but it requires more user input than I really want. I feel getting more info on the subject would be the best course of action before jumping head first into developing this program.
Thanks

IMHO it isn't as hard as you think. What you want is called tagging, and you can do it automatically just by setting up the correlations between tags (i.e. a tag holds its own meaningful information plus its relations to other tags). Then, when the user selects a tag, you find the related ones by looking them up in your ADT collection (which can be as simple as an array).
Tag: Sport
Related tags: Football, Soccer, ...
I'm hoping this helps!
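A minimal sketch of that idea, assuming relations are symmetric (if Sport is related to Football, then Football is related to Sport). The `TagIndex` class and its method names are made up for illustration, and "Baseball" is added to the sample data only to match the question:

```python
from collections import defaultdict


class TagIndex:
    """Stores tags and the (symmetric) relations between them."""

    def __init__(self):
        self._related = defaultdict(set)

    def relate(self, tag, other):
        # Record the relation in both directions so a lookup works from either tag.
        self._related[tag].add(other)
        self._related[other].add(tag)

    def related_to(self, tag):
        # Sorted for stable output; unknown tags yield an empty list.
        return sorted(self._related.get(tag, set()))


index = TagIndex()
index.relate("Sport", "Football")
index.relate("Sport", "Soccer")
index.relate("Sport", "Baseball")

print(index.related_to("Baseball"))  # ['Sport']
print(index.related_to("Sport"))     # ['Baseball', 'Football', 'Soccer']
```

A plain dictionary of sets is probably enough to start with; a database table of tag pairs would play the same role once the data needs to persist.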

It sounds like what you want to do is create a tree/menu structure, and then be able to rapidly retrieve the "breadcrumb" for any given key in the tree.
Here's what I would think:
Create the tree with all the branches. It's okay if you want branches to share keys - as long as you can give the user a "choice" of "Multiple found, please choose which one... ?"
For every key in the tree, generate the breadcrumb. This is time-consuming, and if the tree is very large and updated regularly then it may be better done offline, in the cloud, or via Hadoop, etc.
Store the key and the breadcrumb in a key/value store such as Redis, or in memory/cached as desired. You'll want every value to be an array if you want to share keys across categories/branches.
When the user selects a key - the key is looked up in the store, and if the resulting value contains only one match, then you simply construct the breadcrumb to take the user where you want them to go. If it has multiple, you give them a choice.
I would even say, if you need something more organic, say a user can create "new topic" dynamically from anywhere else, then you might want to not use a tree at all after the initial import - instead just update your key/value store in real-time.
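The steps above can be sketched roughly as follows. This is only an illustration: the sample tree, the `build_breadcrumb_index` and `lookup` names, and the use of an in-memory dict in place of a real key/value store are all assumptions.

```python
from collections import defaultdict

# Hypothetical category tree: each key maps to its child categories.
# "Coaching" appears on two branches to show a shared key.
TREE = {
    "Sports": {
        "Baseball": {"Pitching": {"Nolan Ryan": {}}, "Coaching": {}},
        "Football": {"Coaching": {}},
    },
}


def build_breadcrumb_index(tree):
    """Walk the tree once and map each lowercased key to all of its breadcrumbs."""
    index = defaultdict(list)

    def walk(node, path):
        for key, children in node.items():
            crumb = path + [key]
            index[key.lower()].append(crumb)
            walk(children, crumb)

    walk(tree, [])
    return dict(index)


def lookup(index, term):
    """Return every breadcrumb for the term; more than one means the user must choose."""
    return index.get(term.strip().lower(), [])


INDEX = build_breadcrumb_index(TREE)

for crumb in lookup(INDEX, "baseball"):
    print(" >> ".join(crumb))  # Sports >> Baseball

print(len(lookup(INDEX, "coaching")))  # 2, so ask the user which one they meant
```

Storing a list of breadcrumbs per key is what makes the "Multiple found, please choose" case fall out naturally; the same mapping could be written to a key/value store instead of a dict.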

Related

Unified IDs of geographical locations

I want to use locations' titles in my app, like 'Chicago, Illinois, USA', or 'Surrey, British Columbia, Canada', or one of the many Springfields.
I am going to add them to the DB one by one during the app lifecycle, no need to add all at once, and think that it would be nice to identify them all with unique IDs. I could just go from 1 to n, as a key.
But for future potential flexibility I could use some criteria to make sure I will get that very Springfield when I decode and enter its ID somewhere, like Google.
Maybe I can use lat/lon data from public sources, e.g. Wikipedia, and turn the pair into a key? Or maybe there are already some IDs assigned by authorities or some agency that are kind of a standard?
One possibility is to use a GeoHash of the location. This would give you a unique code for each well-positioned location you are using. An added bonus is that it would also let you estimate how close locations are to each other.
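For illustration, here is a small self-contained sketch of the standard geohash encoding (alternating longitude and latitude bits, packed into base-32 characters). The function names are my own, and in practice an existing geohash library would probably be a better choice. Note that a geohash identifies a grid cell rather than a named place, and that a long shared prefix implies two points are close, while two close points on opposite sides of a cell boundary may share only a short prefix.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=9):
    """Encode a latitude/longitude pair as a geohash string of the given length."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def common_prefix_length(a, b):
    """Number of leading characters two geohashes share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


print(geohash_encode(57.64911, 10.40744, 6))  # u4pruy
```

The precision controls the cell size, so a key intended to tell two nearby places apart needs enough characters for the cells not to overlap.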

Can Groups be used to emulate the "class" or "struct" data structures from other languages

Is there a data structure within LiveCode that can be used as a "holder" for associated data, letting me handle it collectively? I come from a Java / JavaScript / C background so I am looking for a Class or Struct sort of data structure.
I've found examples of Groups, which seem to have some of this functionality, but it feels a bit like I'm bending the language to meet my needs.
As a specific example, suppose I had an image field on my screen that would randomly display an image and, when pressed, play an associated sound clip. I'd expect to create a list of "structures" that contained the path to the image and the path to the associated sound clip, and use that data to populate the image field and to decide what sound clip to play.
Would a Group be the correct structure to use in this case? Or am I approaching this in a way that isn't really fitting with the way LiveCode works?
It takes a little getting used to, but the xTalk world is much simpler and more open than any ordinary procedural language. So much of what you once had to manage is no longer required.
So when splash21 said that you could store all your image and sound references in a custom property, he was really saying that the LiveCode environment contains intrinsic, high level functionality that makes these sorts of things instantly accessible, and the only thing required of you is to call for them, and they simply work.
The only way to appreciate this is to make a few simple programs, to really see what is possible. Make your application. Everything you mentioned can be accomplished with perhaps a dozen lines of code in a single handler. I recommend that you join the LiveCode use list and forums. The community is vibrant and eager to help, frequently with full blown solutions to specific problems, but more importantly, as guides and mentors to new users.
Craig Newman
Arrays in LiveCode are actually associative arrays (like hash maps): each key is associated with a value, and that value may itself be an array.
Chapter 5.5.7 of the User's Guide says:
Array elements may contain nested or sub-elements, making them multi-dimensional. This type of array is ideal for processing hierarchical data structures such as trees or XML. To access a sub-element, simply declare it using an additional set of square brackets.
put "ABC" into myVariable["myKeyName"]["aChildElement"]
see also
How to store pictures in a stack?
Dave- I'm hoping to get a struct-like container implemented in the near future. Meanwhile you can, as splash21 mentioned, use custom properties (or better yet, custom property sets) to do what you want. This will give you a pseudo-struct for each object and you can implement the file and sound specifications into the properties. And if you use that in conjunction with a behavior object you'll end up very close to a real inheritable class formation.

Database structure advice needed about services provided by user

My project calls for something similar to yellow pages:
Storing users' services and providing look-up by service.
My current solution is very rigid and cumbersome -
I have 3 preset lists: Industry, Trade, Specialty.
User starts at top level and defines his service(s).
I also create a "search in place" option where providing strings performs a look-up in the string of "Industry - Trade - Specialty".
I noticed Google, LinkedIn and yellow-pages provide a much easier solution where users can put in free text, and the system will give results for Certified Public Accountant even if search term is CPA.
Any thoughts on a smarter, more efficient and easy for the user solution?
I am not looking for the exact db structure, the general algorithm will suffice.
Thank you.
Have you considered a tag structure? That allows entries in multiple places, tag hierarchies, tag "remaps" (standardize tag names).
Basically, you do NOT have a hierarchy of entries: the entries are flat, but they have tags attached, and the tags can form hierarchies. This is the flexibility you need: a company may provide multiple services, and you may want to have tags standardized, too.
Simply said:
A table (or more - I keep it general) for the entries
A table for tags, parenting itself (form a tree)
An entry-tag table linking entries to tags.
The tag table must allow crosslinks (alternative tag, status etc.).
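A rough sketch of such a structure, using SQLite through Python's standard `sqlite3` module. All table and column names, the sample rows, and the `find_entries` helper are assumptions made for illustration; the "crosslink" is modeled here as a single `alias_of` column, which is only one of several ways to do it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entry (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE tag (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    parent_id INTEGER REFERENCES tag(id),  -- self-reference forms the tree
    alias_of INTEGER REFERENCES tag(id)    -- crosslink: remap to the standard tag
);
CREATE TABLE entry_tag (
    entry_id INTEGER NOT NULL REFERENCES entry(id),
    tag_id INTEGER NOT NULL REFERENCES tag(id),
    PRIMARY KEY (entry_id, tag_id)
);
""")
conn.executemany("INSERT INTO tag VALUES (?, ?, ?, ?)", [
    (1, "Finance", None, None),
    (2, "Certified Public Accountant", 1, None),
    (3, "CPA", None, 2),  # alternative name pointing at the standard tag
])
conn.execute("INSERT INTO entry VALUES (1, 'Smith & Co.')")
conn.execute("INSERT INTO entry_tag VALUES (1, 2)")


def find_entries(conn, term):
    """Resolve the search term to its standard tag, then list the tagged entries."""
    rows = conn.execute(
        """
        SELECT e.name
        FROM tag t
        JOIN entry_tag et ON et.tag_id = COALESCE(t.alias_of, t.id)
        JOIN entry e ON e.id = et.entry_id
        WHERE t.name = ? COLLATE NOCASE
        ORDER BY e.name
        """,
        (term,),
    ).fetchall()
    return [name for (name,) in rows]


print(find_entries(conn, "CPA"))  # ['Smith & Co.']
```

This query only matches the exact tag or its alias; including entries tagged with descendants of a tag (searching "Finance" and finding accountants) would need an extra step such as a recursive query over `parent_id`, which is not shown here.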

How can I do "related tags"?

I have tags on my website, and I input them one by one when I create a blog post. I love Gmail's new feature that asks whether you want to include X in a mail when you type Y's name, if you often include both of them in the same messages.
I'd like to do something similar on my website, but I don't know how to represent the tags' "related-ness" in an object or database... Thoughts?
It all boils down to creating associations between certain characteristics of your posts and certain tags, and then, when you press the "publish" button, analysing the new post and proposing all tags that match its characteristics.
This can be done in several ways from a "totally hard-coded" association to some sort of "learning AI"... and everything in-between.
Hard-coded solutions
These are the simplest algorithms to implement. You should first decide which characteristics of your post are relevant for tagging (e.g. its length if you tag posts "short" or "long", the presence of photos or videos if you tag them "multimedia-content", etc.). The most obvious, however, is to focus on which words are used in posts. For example, you could build a mapping like this:
tag_hint_words = {
    'code-development': ['programming', 'language', 'python',
                         'function', 'object', 'method'],
    'family': ['Theresa', 'kids', 'uncle Ben', 'holidays'],
}
Then you would check your post for the presence of the words in the list (the code between [ and ] ) and propose the tag (the word before :) as a possible candidate.
A common approach is to give "scores", in other words a number that indicates how likely a given tag is to be the right one. For example, if your post contained the sentence...
After months of programming, we finally left for the summer holidays at uncle Ben's cottage. Theresa and the kids were ecstatic!
...despite the presence of the word "programming", the program should indicate family as the most likely tag to use, as there are many more words hinting at it.
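A minimal sketch of such scoring, assuming the score is simply the number of hint-word occurrences (a raw count, not a true probability). The `score_tags` function is hypothetical:

```python
import re

tag_hint_words = {
    "code-development": ["programming", "language", "python",
                         "function", "object", "method"],
    "family": ["Theresa", "kids", "uncle Ben", "holidays"],
}


def score_tags(post, hints=tag_hint_words):
    """Count hint-word occurrences per tag and return (tag, score) pairs, best first."""
    text = post.lower()
    scores = {}
    for tag, words in hints.items():
        total = 0
        for word in words:
            # Whole-word, case-insensitive match; also handles multi-word hints.
            pattern = r"\b" + re.escape(word.lower()) + r"\b"
            total += len(re.findall(pattern, text))
        if total:
            scores[tag] = total
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


post = ("After months of programming, we finally left for the summer holidays "
        "at uncle Ben's cottage. Theresa and the kids were ecstatic!")
print(score_tags(post))  # [('family', 4), ('code-development', 1)]
```

Dividing each score by the post length, or by the number of hint words per tag, would be one way to keep tags with long hint lists from dominating.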
Learning AI's
One of the obvious limitations of the above method is that if, say, one day you pick up Java besides Python, you would probably need to change your code to include words like "java" or "oracle" too. The same applies if you create new tags.
To circumvent this limitation (and have some fun!!) you could try to implement a learning algorithm. Learning algorithms are those that refine their outcome the more you use them (so they indeed... learn!). Some algorithms require initial training (many spam filters and voice recognition programs need this initial "primer"). Some don't.
I am absolutely no expert on the subject, but two common approaches are the Naive Bayes classifier and some flavour of neural network.
Although the Wikipedia pages might look scary, both are surprisingly easy to implement (at least in Python). Here's the recording of a lecture at PyCon 2009 on the subject "Easy AI with Python". I found it very informative and even somehow inspiring! :)
HTH!
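To give a feel for the Naive Bayes option mentioned above, here is a small from-scratch sketch (multinomial Naive Bayes over word counts with add-one smoothing). The class name, the tokenizer and the four training posts are invented for illustration; a real tagger would need far more training data, and it picks only a single best tag.

```python
import math
import re
from collections import Counter, defaultdict


class NaiveBayesTagger:
    """Multinomial Naive Bayes over word counts, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # tag -> word -> count
        self.post_counts = Counter()             # tag -> number of training posts
        self.vocabulary = set()

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    def train(self, text, tag):
        words = self.tokenize(text)
        self.word_counts[tag].update(words)
        self.post_counts[tag] += 1
        self.vocabulary.update(words)

    def predict(self, text):
        """Return the tag with the highest log-probability for the text."""
        words = self.tokenize(text)
        total_posts = sum(self.post_counts.values())
        vocab_size = len(self.vocabulary)
        best_tag, best_score = None, -math.inf
        for tag, n_posts in self.post_counts.items():
            score = math.log(n_posts / total_posts)  # prior: share of posts with this tag
            tag_total = sum(self.word_counts[tag].values())
            for word in words:
                count = self.word_counts[tag][word]
                # Add-one smoothing so unseen words do not zero out the tag.
                score += math.log((count + 1) / (tag_total + vocab_size))
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag


tagger = NaiveBayesTagger()
tagger.train("wrote a python function to parse the config", "code-development")
tagger.train("debugging a method on a java object", "code-development")
tagger.train("holidays with the kids at the lake", "family")
tagger.train("uncle Ben visited Theresa and the kids", "family")

print(tagger.predict("Learning java and python this week"))  # code-development
print(tagger.predict("The kids loved the holidays"))         # family
```

Each time a post is published with its final tags, calling `train` again lets the tagger pick up new vocabulary without any change to the code.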
You should have a look at this post:
Any suggestions for a db schema for storing related keywords?
If you're looking for a schema for storing related tags it will help.
Relevancy searches where multiple agents play a part are usually done using collaborative filtering. You might want to take a look at that.
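Full collaborative filtering is more involved, but a much simpler stand-in that captures the "often used together" idea is to count how often tags co-occur on the same post. The sketch below does only that; the function names and sample posts are assumptions for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(posts_tags):
    """Count, for every tag, how often each other tag appears on the same post."""
    together = defaultdict(Counter)
    for tags in posts_tags:
        for a, b in combinations(sorted(set(tags)), 2):
            together[a][b] += 1
            together[b][a] += 1
    return together


def suggest_related(together, tag, limit=3):
    """Return up to `limit` tags most often used alongside `tag`."""
    ranked = sorted(together.get(tag, Counter()).items(),
                    key=lambda kv: (-kv[1], kv[0]))  # by count, then alphabetically
    return [name for name, _ in ranked[:limit]]


posts = [
    ["python", "django", "web"],
    ["python", "django"],
    ["python", "numpy"],
    ["family", "holidays"],
]
together = build_cooccurrence(posts)
print(suggest_related(together, "python"))  # ['django', 'numpy', 'web']
```

In a database, the same information could live in a table of (tag_a, tag_b, count) rows that is updated whenever a post is saved.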
Look up Clustering (Machine Learning algorithm). Don't be intimidated by math, it's a pretty straightforward algorithm. Check out Machine Learning for Hackers for simpler explanations of many Machine Learning algorithms and methods.

Shortening a "Long" parameter list

I'm refactoring one of my projects - a shopping cart. One of the tightly coupled areas of my code is the "Viewer" class - to generate information for the user to see, it often needs a combination of two or more of the following objects:
The store catalog.
The customer's order.
The customer's mailing information.
I can't really break up the display methods, for various reasons.
Martin Fowler's Refactoring identifies this as a "Long parameter list" smell. The relevant refactoring here is "Introduce parameter object." I am, however, hesitant to do that, as doing so would couple loosely related data. It would also lock me into a very narrow one-to-one relationship between those three objects. While that would work for my application as it is now, it makes no real-world sense. (While there is only one store catalog, there can be many "Customer mailing information" objects, and each of those may be related to many "Customer's order" objects.)
Does anyone have any elegant solutions to this?
A parameter list of three parameters needs no refactoring. Start worrying when you reach, say, 8 or 10 parameters.
Try to name the thing that binds a catalog, an order, and an address. Start, maybe with CatalogOrderAddressTuple. Ugly, isn't it? Well, maybe as a utility class of your Viewer, it should just be an inner class, where you could get by with just Tuple - or Data. Still ugly.
It doesn't sound like these belong as actual fields to the Viewer - but explore what that would look like, how your code would change if each Viewer were simply constructed with the data on which it operates.
As ammoQ & Ryan Prior have said, this isn't much of a smell, but I'd say it's worth playing with some alternatives before giving up entirely.
As stated by ammoQ, looking for refactoring opportunities with so few parameters is stretching.
See also: KISS and YAGNI.
It seems to me that introducing a parameter object would have the opposite effect of locking you to a one-to-one relationship between those three objects.
If a single customer address is being passed around, and in the future someone decides to separate billing address and shipping address, then it would probably be simpler to add the new address to a parameter object, rather than add a new parameter up and down the call stack.
(This is just an example, of course. Ideally, the address information would exist separately in the order and customer objects, since all information on posted orders should be immutable, even if the customer's address changes. But that wasn't what you were asking about!)
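A small sketch of that argument, assuming Python dataclasses; `ViewContext`, `Address`, the field shapes and the `order_summary` method are all hypothetical. The parameter object is built per call, so it does not fix any one-to-one relationship between catalog, order and address, and a billing address can be added later as an optional field without changing existing call sites.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Address:
    street: str
    city: str


@dataclass(frozen=True)
class ViewContext:
    """Bundles what one display call needs; created fresh for each call."""
    catalog: dict   # product id -> product name (hypothetical shape)
    order: list     # product ids in the customer's order
    shipping_address: Address
    billing_address: Optional[Address] = None  # added later; old callers unaffected


class Viewer:
    def order_summary(self, ctx: ViewContext) -> str:
        items = ", ".join(ctx.catalog[pid] for pid in ctx.order)
        # Fall back to the shipping address when no billing address was given.
        billing = ctx.billing_address or ctx.shipping_address
        return f"{items} | ship to {ctx.shipping_address.city} | bill to {billing.city}"


catalog = {1: "Mug", 2: "Pen"}
ctx = ViewContext(catalog, [1, 2], Address("1 Main St", "Chicago"))
print(Viewer().order_summary(ctx))  # Mug, Pen | ship to Chicago | bill to Chicago
```

With three plain parameters, the same change would mean adding a fourth parameter to every method along the call path.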
