I've gone through the documentation and tried to understand the Phrase List feature. Although I'm sure of the purpose of the Phrase List feature itself, I couldn't intuitively grasp the purpose of the "interchangeable" option.
Any thorough explanation would be appreciated.
#Srichakradhar, at your suggestion, I'm posting my answer to your Gitter question here on Stack Overflow as well, to benefit the community as a whole:
"...regarding your question on phrase lists, happy to speak high-levelly on what the feature does :)
So ultimately the goal with LUIS is to understand the meaning of the user’s input (utterance), and through calculations, it returns to you a score indicating how confident it is about the meaning of the input. Using phrase lists is one of the ways to improve the accuracy of determining the meaning of the user’s utterance. More specifically, adding features to a phrase list can put more weight on the score of an intent or entity.
Here are a couple of examples to illustrate the high-level concept of how features help determine the intent/entity score and, in turn, predict the meaning of the user’s utterance:
For example, if I wanted to describe a class called Tablet, features I could use to describe it could include screen, size, battery, color, etc. If an utterance mentions any of the features, it’ll add points/weight to the score of predicting that the utterance’s meaning is describing Tablet. However, features that would be good to include in a phrase list are words that are maybe foreign, proprietary, or perhaps just rare. For example, maybe I would add, “SurfacePro”, “iPad”, or “Wugz” (a made-up tablet brand) to the phrase list of Tablet. Then if a user’s utterance includes “Wugz”, more points/weight would be put onto predicting that Tablet is the right entity to an utterance.
Or maybe the intent is Book.Flight and the features include “Book”, “Flight”, “Cairo”, “Seattle”, etc. If the utterance is “Book me a flight to Cairo”, points/weight would be added to the score of the Book.Flight intent for “Book”, “flight”, and “Cairo”.
Now, regarding interchangeable vs. non-interchangeable phrase lists.
Maybe I had a Cities phrase list that included “Seattle”, “Cairo”, “L.A.”, etc. I would make sure that the phrase list is non-interchangeable, because that indicates that yes, “Seattle” and “Cairo” are somehow similar to one another, but they are not synonyms: I can’t use them interchangeably, that is, one in place of the other. (“book flight to Cairo” is different from “book flight to Seattle”)
But if I had a phrase list of Coffee that included features “Coffee”, “Starbucks”, “Joe”, and marked the list as interchangeable, I’m specifying that the features in the list are interchangeable. (“I’d like a cup of coffee” means the same as “I’d like a cup of Joe”)
For more on Phrase Lists - Phrase List features in LUIS
For more on improving prediction - Tutorial: Add phrase list to improve predictions"
Taken from documentation (here):
A phrase list may be interchangeable or non-interchangeable. An
interchangeable phrase list is for values that are synonyms, and a
non-interchangeable phrase list is intended for values that aren't
synonyms but are similar in another way.
There is also a great reply here on MSDN:
Choose "Exchangeable" when the list of words or phrases in your feature
form a class or group -- for example, months like "January",
"February", "March"; or names like "John", "Mary", "Frank". These
features are "exchangeable" in the sense that an utterance where one
word/phrase appears would be labeled similarly if the word/phrase were
exchanged with another. For example, if "show the calendar for January" has the same intent as "show the calendar for February", this
suggests choosing "exchangeable".
Choose "Not exchangeable" for words/phrases that are useful in your
domain, but which do not form a class or group. For example, the
words "calendar", "email", "show", and "send" might be relevant to
your domain, but might all be associated with different intents, like
"show my calendar" or "send an email".
If you're not sure, you can try either and see if there's any
difference in performance.
This is basically a modelling question. Clinicians keep a lot of important information documented in clinical notes for various types of encounters. How does the FHIR specification suggest modelling these notes? The FHIR documentation does not provide clear guidance on this.
Appreciate your help in advance.
There's a sub-WG on that:
http://wiki.hl7.org/index.php?title=ClinicalNote_FHIR_Resource_Proposal
I haven't had to deal hugely with interop (yet), so I've been sticking that type of thing in Narrative:
https://www.hl7.org/FHIR/narrative.html
Would a Clinical Impression cover it?
That doesn't seem quite correct as a clinical impression has some really specific fields whereas a note will typically be a block of text.
I think it would be through DiagnosticReport and DocumentReference.
The Clinical Notes proposal on the wiki defines several categories, which are listed on that page. But if you are looking for a specific category that is different from the ones listed (like Surgery/Pathology), it would be difficult with ClinicalNotes. DocumentReference can achieve this.
I have tags on my website, and I input them one by one when I create a blog post. I love Gmail's new feature that asks whether you want to include X in a mail when you type Y's name and you often include both of them in the same messages.
I'd like to do something similar on my website, but I don't know how to represent the tags' "related-ness" in an object or database ... thoughts ?
It all boils down to creating associations between certain characteristics of your posts and certain tags, and then, when you press the "publish" button, analysing the new post and proposing all tags that match its characteristics.
This can be done in several ways from a "totally hard-coded" association to some sort of "learning AI"... and everything in-between.
Hard-coded solutions
These are the simplest algorithms to implement. You should first decide which characteristics of your post are relevant for tagging (e.g., its length if you tag posts "short" or "long", the presence of photos or videos if you tag them "multimedia-content", etc.). The most obvious, however, is to focus on which words are used in posts. For example, you could build a mapping like this:
tag_hint_words = {
    'code-development': ['programming', 'language', 'python',
                         'function', 'object', 'method'],
    'family': ['Theresa', 'kids', 'uncle Ben', 'holidays'],
}
Then you would check your post for the presence of the words in the list (the code between [ and ] ) and propose the tag (the word before :) as a possible candidate.
A common approach is to give "scores", or in other words to assign a number that indicates how likely it is that a given tag is the right one. For example, if your post contained the sentence...
After months of programming, we finally left for the summer holidays at uncle Ben's cottage. Theresa and the kids were ecstatic!
...despite the presence of the word "programming" the program should indicate family as the most likely tag to use, as there are many more words hinting.
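The scoring idea can be sketched in a few lines of Python. This is only a minimal illustration, assuming that a tag's score is simply the number of hint-word matches in the post; the function name score_tags and the constant TAG_HINT_WORDS are made up for this example.

```python
import re
from collections import Counter

# Hypothetical hint-word mapping, mirroring the example mapping in this answer.
TAG_HINT_WORDS = {
    "code-development": ["programming", "language", "python",
                         "function", "object", "method"],
    "family": ["Theresa", "kids", "uncle Ben", "holidays"],
}


def score_tags(post, hints=TAG_HINT_WORDS):
    """Return a Counter mapping each tag to its number of hint-word matches."""
    text = post.lower()
    scores = Counter()
    for tag, words in hints.items():
        for word in words:
            # Whole-word, case-insensitive match; multi-word hints work too.
            pattern = r"\b" + re.escape(word.lower()) + r"\b"
            scores[tag] += len(re.findall(pattern, text))
    return scores


post = ("After months of programming, we finally left for the summer "
        "holidays at uncle Ben's cottage. Theresa and the kids were ecstatic!")
scores = score_tags(post)
best_tag, best_score = scores.most_common(1)[0]
print(best_tag, best_score)  # family 4
```

A real implementation would probably normalise the scores (for instance by post length) and only propose tags above some threshold, but raw counts are enough to rank the candidates here.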
Learning AI's
One of the obvious limitations of the above method is that, if one day you pick up Java besides Python, you would probably need to change your code and include words like "java" or "oracle" too. The same applies if you create new tags.
To circumvent this limitation (and have some fun!!) you could try to implement a learning algorithm. Learning algorithms are those that refine their outcome the more you use them (so they indeed... learn!). Some algorithms require initial training (many spam filters and voice recognition programs need this initial "primer"). Some don't.
I am absolutely no expert on the subject, but two common AI's are: the Naive Bayes Classifier and some flavour of Neural network.
Although the WP pages might look scary, they are surprisingly easy to implement (at least in Python). Here's the recording of a lecture at PyCon 2009 on the subject "Easy AI with Python". I found it very informative and even somehow inspiring! :)
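To give an idea of how small such a classifier can be, here is a toy multinomial Naive Bayes tagger with add-one smoothing. It is a sketch under simplifying assumptions (one tag per post, a crude tokenizer), the class name NaiveBayesTagger and the training sentences are invented for illustration, and it is not taken from the lecture mentioned above.

```python
import math
import re
from collections import Counter, defaultdict


class NaiveBayesTagger:
    """Tiny multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # tag -> word -> count
        self.post_counts = Counter()             # tag -> number of posts seen
        self.vocabulary = set()

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z']+", text.lower())

    def train(self, text, tag):
        words = self.tokenize(text)
        self.word_counts[tag].update(words)
        self.post_counts[tag] += 1
        self.vocabulary.update(words)

    def predict(self, text):
        words = self.tokenize(text)
        total_posts = sum(self.post_counts.values())
        best_tag, best_log_prob = None, -math.inf
        for tag in self.post_counts:
            # log P(tag) + sum over words of log P(word | tag)
            log_prob = math.log(self.post_counts[tag] / total_posts)
            total_words = sum(self.word_counts[tag].values())
            for word in words:
                count = self.word_counts[tag][word] + 1  # add-one smoothing
                log_prob += math.log(count / (total_words + len(self.vocabulary)))
            if log_prob > best_log_prob:
                best_tag, best_log_prob = tag, log_prob
        return best_tag


tagger = NaiveBayesTagger()
tagger.train("Wrote a python function and a new method today", "code-development")
tagger.train("Debugging an object in my favourite programming language", "code-development")
tagger.train("Took the kids to see uncle Ben for the holidays", "family")
tagger.train("Theresa and the kids baked a cake", "family")
print(tagger.predict("The kids loved the holidays"))  # family
```

Every post you tag by hand can be fed back through train(), which is what makes the suggestions improve over time without editing any word lists.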
HTH!
You should have a look at this post :
Any suggestions for a db schema for storing related keywords?
If you're looking for a schema for storing related tags it will help.
Relevancy searches where multiple agents play a part are usually done using Collaborative filtering. You might want to give that a look see.
Look up Clustering (Machine Learning algorithm). Don't be intimidated by math, it's a pretty straightforward algorithm. Check out Machine Learning for Hackers for simpler explanations of many Machine Learning algorithms and methods.
Assuming that I know nothing about anything and that I'm starting programming TODAY, what would you say I need to learn in order to start working with Natural Language Processing?
I've been struggling with some string parsing methods, but so far it is just annoying me and making me write ugly code. I'm looking for some fresh ideas on how to create something like the Remember The Milk API to parse user input, in order to provide fast data entry that is based not on fields but on simple one-line phrases.
EDIT: RTM is a to-do list system. To enter a task you don't need to fill in each field (task name, due date, location, etc.). You can simply type a phrase like "Dentist appointment monday at 2PM in WhateverPlace" and it will parse it and fill in all the fields for you.
I don't have any kind of technical constraints since it's going to be a personal project but I'm more familiar with .NET world. Actually, I'm not sure this is a matter of language but if it's necessary I'm more than willing to learn a new language to do it.
My project is related to personal finances so the phrases are more like "Spent 10USD on Coffee last night with my girlfriend" and it would fill location, amount of $$$, tags and other stuff.
Thanks a lot for any kind of directions that you might give me!
This does not appear to require full NLP. Simple pattern-based information extraction will probably suffice. The basic idea is to tokenize the text, then recognize/classify certain keywords, and finally recognize patterns/phrases.
In your example, tokenizing gives you "Dentist", "appointment", "monday", "at", "2PM", "in", "WhateverPlace". Your tool will recognize that "monday" is a day of the week, "2PM" is a time, etc. Finally, you can find patterns like [at] [TIME] and [in] [Place] and use those to fill in the fields.
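The tokenize, classify, and match-patterns steps could look roughly like this in Python. It is a minimal sketch, not a general parser: the function names, the token labels, and the assumption that everything after "in" is the place are all choices made for this illustration.

```python
import re

DAYS = {"monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"}
TIME_RE = re.compile(r"^\d{1,2}(:\d{2})?(am|pm)$", re.IGNORECASE)


def classify(token):
    """Label a token as DAY, TIME, a keyword (AT/IN), or a plain WORD."""
    lowered = token.lower()
    if lowered in DAYS:
        return "DAY"
    if TIME_RE.match(token):
        return "TIME"
    if lowered in ("at", "in"):
        return lowered.upper()
    return "WORD"


def parse_task(phrase):
    tokens = phrase.split()                  # step 1: tokenize
    labels = [classify(t) for t in tokens]   # step 2: classify tokens
    task = {"name": [], "day": None, "time": None, "place": None}
    i = 0
    while i < len(tokens):                   # step 3: recognize patterns
        label = labels[i]
        nxt = labels[i + 1] if i + 1 < len(tokens) else None
        if label == "DAY":
            task["day"] = tokens[i]
        elif label == "AT" and nxt == "TIME":    # pattern: [at] [TIME]
            task["time"] = tokens[i + 1]
            i += 1
        elif label == "IN" and nxt == "WORD":    # pattern: [in] [Place]
            # Simplification: treat the rest of the phrase as the place.
            task["place"] = " ".join(tokens[i + 1:])
            break
        else:
            task["name"].append(tokens[i])
        i += 1
    task["name"] = " ".join(task["name"])
    return task


result = parse_task("Dentist appointment monday at 2PM in WhateverPlace")
print(result)
# {'name': 'Dentist appointment', 'day': 'monday', 'time': '2PM', 'place': 'WhateverPlace'}
```

Anything that does not fit a known pattern falls through to the task name, which keeps the parser forgiving when the user types something unexpected.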
A framework like GATE may help, but even that may be a larger hammer than you really need.
Have a look at NLTK; it's a good resource for beginner programmers interested in NLP.
http://www.nltk.org/
It is written in Python, which is one of the easier programming languages.
Now that I understand your problem, here is my solution:
You can develop a kind of restricted vocabulary, in which all amounts must end with a $ sign and any time must be in the form 00:00 and/or end with AM/PM. For detecting items, you can use a list of objects from an ontology such as Open Cyc. Open Cyc can provide you with a list of all objects, such as beer, coffee, bread, and milk; this will help you detect objects in the short phrase. Still, it would be a very fuzzy approach.
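As a rough sketch of the restricted-vocabulary idea in Python: the regular expressions encode the "ends with $" and "00:00, optionally followed by AM/PM" conventions, and KNOWN_ITEMS is a tiny hand-written stand-in for whatever list you would export from an ontology such as Open Cyc. All names here are hypothetical, and no real Open Cyc API is used.

```python
import re

# Stand-in for an ontology lookup; this tiny set is purely illustrative.
KNOWN_ITEMS = {"beer", "coffee", "bread", "milk"}

AMOUNT_RE = re.compile(r"\b(\d+(?:\.\d+)?)\$")  # e.g. 10$ or 4.50$
TIME_RE = re.compile(r"\b(\d{1,2}:\d{2})(?:\s?(AM|PM))?\b", re.IGNORECASE)


def parse_expense(phrase):
    """Extract the amount, time, and known items from a one-line phrase."""
    amount = AMOUNT_RE.search(phrase)
    time = TIME_RE.search(phrase)
    words = re.findall(r"[a-z]+", phrase.lower())
    return {
        "amount": float(amount.group(1)) if amount else None,
        "time": time.group(0) if time else None,
        "items": [w for w in words if w in KNOWN_ITEMS],
    }


expense = parse_expense("Spent 10$ on coffee at 09:30 PM with my girlfriend")
print(expense)
# {'amount': 10.0, 'time': '09:30 PM', 'items': ['coffee']}
```

The stricter the input conventions, the simpler these patterns stay; free-form input such as "last night" would need additional rules.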
The QA manager where I work just informed me there is a bug in my desktop app due to the sign-on prompt being "Operator Id" when it should be "Operator ID". Her argument being that "Id" refers to the ego portion of Freud's "psychic apparatus" and is not semantically correct.
Now being an anal engineer (AE) I of course had to go and look up Id vs ID, and from my cursory investigations (Google) it seems ID is just as commonly used for Freud's ego as Id is.
So my reasoning would be that Id is a shortened version of "Identifier" and is more correct or at least more commonly used than ID which would typically indicate a two word abbreviation.
I could just change the UI, but then I wouldn't be holding up my profession as an AE, so I was wondering if there are any best practices or references for this sort of thing that I could use to support my argument. Keep in mind that this question relates to the user interface and not the source code, where abbreviations and casing are a whole different branch of philosophy.
According to Merriam-Webster, the abbreviation is "ID". If "Id" were a correct abbreviation, it would have to be "Id." with the period.
Personally, I use "Id". The compiler doesn't care but my eyes do. Compare:
GetIDByWhatever <-- looks terrible
GetIdByWhatever <-- oh so pretty!
Aesthetics is more important than grammar when it comes to code, always. (Update: 4 years later, I don't stand by this statement anymore)
The 'D' doesn't stand for anything, so I've always considered it an abbreviation, not an acronym - and therefore I too use 'Id', not 'ID'.
I don't know about your QA's reasoning - words can have more than one meaning - this is not unusual in English :)
But it looks like the common usage is actually 'ID' (right or wrong :P), which is probably the format your users would expect.
As a UAUA (ultra-anal usability analyst), please use ID instead of Id.
Visually, it's more recognizable in English. Grammatically, "Id" is a word (rhymes with "squid") and the Freudian definition has been given above. We're never verbally asked to show "id", but "ID." I.D. is fine but passé, as the periods imply multiple words.
So.
Just use ID, okay?
OK.
It's interesting that so many feel "Id" should be the way to go.
I feel "ID" is appropriate because it hints at how we pronounce it -- I.D. Also, when I read "Id" in a running sentence, I sometimes have to come back and read it again just to ensure it's not a typo for "is" or "it".
So, as a technical writer, this is an issue that comes up for me quite regularly when reviewing other people's work, whether it be programmers, BAs or other writers. Typically, "id" refers to ego, as others have said before me, and the accepted abbreviation for identification is "ID". Just because plenty of people don't know or understand the rule doesn't mean that they are correct (sorry to be blunt). Mind you, the rules for punctuation and spelling are, to a large degree, almost as changeable as fashion!
However, what no-one seems to have asked is, does your company have a standard? At the end of the day if your company has a style guide and they have covered this topic in that guide, you should follow the guide. If it is not covered, then may I suggest that you raise the issue with the person that maintains the guide and include any stakeholders in the conversation. Consistency is key here. If the company you work for doesn't have a style guide, then perhaps it is time to start one!
Hope this helps...
The QA manager's line of reasoning is silly. Lots of English words have multiple meanings. "Lead", "lead", "lead" (metal, be at the front of, or a connector).
I would just try to be consistent with the capitalization used elsewhere in the app.
User interface and code are very different beasts...
"ID" is the correct answer for a user interface.
In code, consistency is your friend. Whether you like it or not, go with what is already there elsewhere in the code. If it's not there, then read up and make a decision, or get with the team and work out a way to go that everyone can agree to. Consistency makes life so much easier.
I prefer Id because, when used with other 2-letter text, it doesn't become a single all-caps word.
Photovoltaics systems ... PVID (one word or 2?) PvId (much more clear).
ID = Idaho! Id = Freud! Let the OCD begin!
There is a little OcD in all of us!
Anyway, the Google style guide says this:
"ID: Not Id or id, except in string literals or enums. In some contexts, best to spell out as identifier or identification."
I'm going with that.
Microsoft is more vague, from what I could find.
As a short version of Identifier, I would use Id. Also, ID looks freaky when you have functions like
getUserIDByName()
Multiple capitals in domain terms are quite problematic with CamelCase, as they can produce ambiguities and therefore inconsistency in your interfaces and naming.
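The ambiguity is easy to demonstrate with a deliberately naive splitter that starts a new word at every capital letter. This Python sketch uses a made-up helper, split_camel_case; real tools often apply smarter heuristics, but they still have to guess where an all-caps run ends.

```python
import re


def split_camel_case(identifier):
    """Naively split an identifier, starting a new word at each capital."""
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]", identifier)


print(split_camel_case("getUserIdByName"))  # ['get', 'User', 'Id', 'By', 'Name']
print(split_camel_case("getUserIDByName"))  # ['get', 'User', 'I', 'D', 'By', 'Name']
print(split_camel_case("PvId"))             # ['Pv', 'Id']
print(split_camel_case("PVID"))             # ['P', 'V', 'I', 'D']
```

With "Id", the word boundaries can be recovered mechanically; with "ID", the splitter cannot tell whether "PVID" is one term, two, or four.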
How would you say it if you were reading out loud? I'd pronounce the two letters. ID is correct, analogous with similar abbreviations such as TV. (No dots, please, as the letters don't stand for anything.)
When I'm dealing with abbreviations like this, I like to format them in small block capitals, but that's just a personal taste. Capitals, anyway.
(But I probably would continue to use Id in the code itself.)
I think it depends on how we pronounce it. We don't say "id", but "ai-di". ID is pronounced as two sounds, so people capitalize the D to avoid reading "id" as a word. It's more like a character symbol. I like "ID" more, just because it's nicer.
ID is correct. I was in the Id camp for years, simply because I believed it was an argument between abbreviations and acronyms. Then one day I learned of a 3rd type (a subtype, really) called initialisms: abbreviations that are read one letter at a time rather than pronounced as a single word. Initialisms are all caps.