How can I retrieve codings grouped by code categories in the RQDA package in RStudio?

Here is an example of the data I am using:
"Q7: How does gender income inequality manifest in the communities in which you live and/or work? What do you believe is needed to help close the wealth gap between men and women as well as among women of different races in the county?
Wouldn’t go into pay equity because that would get dismissed. The wealth gap is a more compelling argument.
Equity is more emotional. And wealth gap is more numbers. It goes toward the same thing though."
I am running RStudio and using library(RQDA) and library(tidyverse).
I am trying to analyze several qualitative interviews formatted in question/answer form as in the provided example. I finished the coding process and now I'm trying to find themes. While coding, I created code categories that correspond with each interview question with the hopes that I would be able to pull out all the codings per code category now. Unfortunately, I cannot figure out how to do it and would appreciate some assistance!
thanks

I am not sure about this but I understood code categories to be helpful for structuring your work by your theoretical perspective. If you create a code category for each interview question (i.e., the topic of the question is your theme/code category), you may have various codes belonging to one "code category", which might not have that much in common. Alternatively, you could create cases (case 1 might be the answers to the first interview question, etc.): "b) Open a file, select part of the file, select a case name, then click button "Link" in "Cases" tab, you can thus link the selected part of file to the selected case" (http://rqda.r-forge.r-project.org/documentation_2.html).

Related

text mining/analyse user commands/questions algorithm or library

I have a financial application and would like to add the ability to take a user command typed into a textbox and then perform the right action. For example, the user could write "show the revenue in the last 10 days" and it would show them the revenue. The point is that it should really understand the meaning of the question, so the previous statement would bring the same results as "do I got any revenue in the last 10 days" or something like that - BI (something like the Wolfram|Alpha engine).
I wonder if there are any open-source libraries, algorithm books or anything else that I can use to learn the subject. As for open-source libraries, I don't mind which language they are written in.
I've read about this subject and saw many engines and services (OpenNLP, Apache UIMA, CoreNLP etc.) but did not figure out if they're right for my needs.
Any answer or suggestion is welcome.
Many thanks!
The field you're talking about is usually called "natural language processing". It's hard, and an active field of research. There are various libraries which you could consider based on your preferred programming language and use case:
http://en.wikipedia.org/wiki/List_of_natural_language_processing_toolkits
I've used NLTK a little bit. This field is seriously difficult to get right, so you might want to try to restrict your application to some small set of verbs and nouns such that people are using a controlled vocabulary in the first instance, and then try to extend it beyond that.

How can I do "related tags"?

I have tags on my website, and I input them one by one when I create a blog post. I love Gmail's new feature that asks whether you want to include X in a mail when you type Y's name, if you often include both of them in the same messages.
I'd like to do something similar on my website, but I don't know how to represent the tags "related-ness" in an object or database ... thoughts ?
It all boils down to creating associations between certain characteristics of your posts and certain tags, and then - when you press the "publish" button - analysing the new post and proposing all tags matched with your post characteristics.
This can be done in several ways from a "totally hard-coded" association to some sort of "learning AI"... and everything in-between.
Hard-coded solutions
These are the simplest algorithms to implement. You should first decide which characteristics of your post are relevant for tagging (e.g. its length if you tag posts "short" or "long", the presence of photos or videos if you tag them "multimedia-content", etc.). The most obvious choice, however, is to focus on which words are used in posts. For example you could build a mapping like this:
tag_hint_words = {
    'code-development': ['programming', 'language', 'python',
                         'function', 'object', 'method'],
    'family': ['Theresa', 'kids', 'uncle Ben', 'holidays'],
}
Then you would check your post for the presence of the words in the list (the code between [ and ] ) and propose the tag (the word before :) as a possible candidate.
A common approach is to give "scores", or in other words to attach a number that indicates how likely it is that a given tag is the right one. For example, if your post contained the sentence...
After months of programming, we finally left for the summer holidays at uncle Ben's cottage. Theresa and the kids were ecstatic!
...despite the presence of the word "programming" the program should indicate family as the most likely tag to use, as there are many more words hinting.
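A minimal sketch of that scoring idea, assuming the score is simply the number of hint words found in the post. The function name score_tags is made up for this example, and matching is done with plain case-insensitive substring search, which is crude (it would also count "object" inside "objection").

```python
# Hint-word table copied from the mapping above.
tag_hint_words = {
    'code-development': ['programming', 'language', 'python',
                         'function', 'object', 'method'],
    'family': ['Theresa', 'kids', 'uncle Ben', 'holidays'],
}


def score_tags(post, hints):
    """Count how many hint words of each tag occur in the post."""
    text = post.lower()
    scores = {}
    for tag, words in hints.items():
        # Substring matching handles multi-word hints such as 'uncle Ben'
        # without needing a tokenizer.
        scores[tag] = sum(1 for word in words if word.lower() in text)
    return scores


post = ("After months of programming, we finally left for the summer "
        "holidays at uncle Ben's cottage. Theresa and the kids were ecstatic!")
scores = score_tags(post, tag_hint_words)
best_tag = max(scores, key=scores.get)
print(scores)    # {'code-development': 1, 'family': 4}
print(best_tag)  # family
```

Dividing each count by the number of hint words for that tag would stop tags with long word lists from winning by sheer size.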
Learning AI's
One of the obvious limitations of the above method is that - say one day you pick up Java besides Python - you would probably need to change your code and include words like "java" or "oracle" too. The same applies if you create new tags.
To circumvent this limitation (and have some fun!!) you could try to implement a learning algorithm. Learning algorithms are those that refine their outcome the more you use them (so they indeed... learn!). Some algorithms require initial training (many spam filters and voice recognition programs need this initial "primer"). Some don't.
I am absolutely no expert on the subject, but two common AI's are: the Naive Bayes Classifier and some flavour of Neural network.
Although the WP pages might look scary, they are surprisingly easy to implement (at least in Python). Here's the recording of a lecture at PyCon 2009 on the subject "Easy AI with Python". I found it very informative and even somehow inspiring! :)
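To give a feel for how small a Naive Bayes classifier can be, here is a minimal sketch (my own illustration, not the code from the lecture). The class name, the four training sentences and the whitespace tokenization are all made up for the example. A real tagger would need far more training posts.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTagger:
    """Tiny multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.tag_counts = Counter()              # training posts seen per tag
        self.word_counts = defaultdict(Counter)  # word frequencies per tag
        self.vocabulary = set()

    def train(self, text, tag):
        words = text.lower().split()
        self.tag_counts[tag] += 1
        self.word_counts[tag].update(words)
        self.vocabulary.update(words)

    def predict(self, text):
        words = text.lower().split()
        total_posts = sum(self.tag_counts.values())
        best_tag, best_score = None, float('-inf')
        for tag, count in self.tag_counts.items():
            # log P(tag) + sum of log P(word | tag); logs avoid underflow.
            score = math.log(count / total_posts)
            tag_total = sum(self.word_counts[tag].values())
            denominator = tag_total + len(self.vocabulary)
            for word in words:
                seen = self.word_counts[tag][word]
                score += math.log((seen + 1) / denominator)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag


tagger = NaiveBayesTagger()
tagger.train("python function object method", "code-development")
tagger.train("programming language python", "code-development")
tagger.train("kids holidays uncle cottage", "family")
tagger.train("theresa kids summer holidays", "family")
print(tagger.predict("kids on holidays"))  # family
print(tagger.predict("python function"))   # code-development
```

Every time you accept or correct a proposed tag you can call train() again with the post, which is where the "learning" comes from.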
HTH!
You should have a look at this post :
Any suggestions for a db schema for storing related keywords?
If you're looking for a schema for storing related tags it will help.
Relevancy searches where multiple agents play a part are usually done using Collaborative filtering. You might want to give that a look see.
Look up Clustering (Machine Learning algorithm). Don't be intimidated by math, it's a pretty straightforward algorithm. Check out Machine Learning for Hackers for simpler explanations of many Machine Learning algorithms and methods.

What's needed for NLP?

Assuming that I know nothing about anything and that I'm starting programming TODAY, what would you say I need to learn in order to start working with Natural Language Processing?
I've been struggling with some string parsing methods but so far it is just annoying me and making me write ugly code. I'm looking for some fresh ideas on how to build something like the Remember The Milk API, which parses the user's input, in order to provide fast data entry that is based not on fields but on simple one-line phrases.
EDIT: RTM is todo list system. So in order to enter a task you don't need to type in each field to fill values (task name, due date, location, etc). You can simply type in a phrase like "Dentist appointment monday at 2PM in WhateverPlace" and it will parse it and fill all fields for you.
I don't have any kind of technical constraints since it's going to be a personal project but I'm more familiar with .NET world. Actually, I'm not sure this is a matter of language but if it's necessary I'm more than willing to learn a new language to do it.
My project is related to personal finances so the phrases are more like "Spent 10USD on Coffee last night with my girlfriend" and it would fill location, amount of $$$, tags and other stuff.
Thanks a lot for any kind of directions that you might give me!
This does not appear to require full NLP. Simple pattern-based information extraction will probably suffice. The basic idea is to tokenize the text, then recognize/classify certain keywords, and finally recognize patterns/phrases.
In your example, tokenizing gives you "Dentist", "appointment", "monday", "at", "2PM", "in", "WhateverPlace". Your tool will recognize that "monday" is a day of the week, "2PM" is a time, etc. Finally, you can find patterns like [at] [TIME] and [in] [Place] and use those to fill in the fields.
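A rough sketch of those three steps (tokenize, classify tokens, match patterns), assuming whitespace tokenization and only the two patterns mentioned above. The names classify and parse_task and the field names are invented for this example, and a place is assumed to be a single word.

```python
import re

DAYS = {'monday', 'tuesday', 'wednesday', 'thursday',
        'friday', 'saturday', 'sunday'}
TIME_PATTERN = re.compile(r'^\d{1,2}(:\d{2})?(am|pm)$', re.IGNORECASE)


def classify(token):
    """Label a token as DAY, TIME, or plain WORD."""
    if token.lower() in DAYS:
        return 'DAY'
    if TIME_PATTERN.match(token):
        return 'TIME'
    return 'WORD'


def parse_task(phrase):
    tokens = phrase.split()
    labels = [classify(t) for t in tokens]
    fields = {'name': [], 'day': None, 'time': None, 'place': None}
    i = 0
    while i < len(tokens):
        token, label = tokens[i], labels[i]
        next_label = labels[i + 1] if i + 1 < len(tokens) else None
        if label == 'DAY':
            fields['day'] = token
        elif token.lower() == 'at' and next_label == 'TIME':
            fields['time'] = tokens[i + 1]   # pattern [at] [TIME]
            i += 1
        elif token.lower() == 'in' and next_label == 'WORD':
            fields['place'] = tokens[i + 1]  # pattern [in] [Place]
            i += 1
        else:
            fields['name'].append(token)     # everything else is the task name
        i += 1
    fields['name'] = ' '.join(fields['name'])
    return fields


task = parse_task("Dentist appointment monday at 2PM in WhateverPlace")
print(task)
# {'name': 'Dentist appointment', 'day': 'monday', 'time': '2PM', 'place': 'WhateverPlace'}
```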
A framework like GATE may help, but even that may be a larger hammer than you really need.
Have a look at NLTK, it's a good resource for beginner programmers interested in NLP.
http://www.nltk.org/
It is written in python which is one of the easier programming languages.
Now that I understand your problem, here is my solution:
You can develop a kind of restricted vocabulary, in which all amounts must end with a $ sign and any time must be in the form 00:00 and/or end with AM/PM. For detecting items, you can use a list of objects from an ontology such as OpenCyc. OpenCyc can provide you with a list of objects such as beer, coffee, bread, milk, etc. This will help you to detect objects in the short phrase. Still, it would be a very fuzzy approach.
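As an illustration only, such a restricted vocabulary could be checked with a couple of regular expressions. The KNOWN_ITEMS set below is a hard-coded stand-in for whatever list an ontology would give you, and extract_expense and its field names are made up for the example.

```python
import re

# Restricted-vocabulary rules assumed for this sketch:
# amounts end with a $ sign, e.g. 10$ or 4.50$
AMOUNT = re.compile(r'\b(\d+(?:\.\d+)?)\$')
# times look like 8:30, 8:30 PM or 8PM
TIME = re.compile(r'\b(\d{1,2}:\d{2}(?:\s?[AP]M)?|\d{1,2}\s?[AP]M)\b',
                  re.IGNORECASE)
# Stand-in for an ontology lookup (hypothetical, hard-coded list).
KNOWN_ITEMS = {'beer', 'coffee', 'bread', 'milk'}


def extract_expense(phrase):
    amount = AMOUNT.search(phrase)
    time = TIME.search(phrase)
    words = re.findall(r'[a-z]+', phrase.lower())
    return {
        'amount': float(amount.group(1)) if amount else None,
        'time': time.group(1) if time else None,
        'items': [w for w in words if w in KNOWN_ITEMS],
    }


expense = extract_expense("Spent 10$ on coffee and bread at 8:30 PM")
print(expense)
# {'amount': 10.0, 'time': '8:30 PM', 'items': ['coffee', 'bread']}
```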

Tag/Keyword based recommendation

I am wondering what algorithm would be clever to use for a tag-driven e-commerce environment:
Each item has several tags. IE:
Item name: "Metallica - Black Album CD", Tags: "metallica", "black-album", "rock", "music"
Each user has several tags and friends (other users) bound to them. IE:
Username: "testguy", Interests: "python", "rock", "metal", "computer-science"
Friends: "testguy2", "testguy3"
I need to generate recommendations to such users by checking their interest tags and generating recommendations in a sophisticated way.
Ideas:
A Hybrid recommendation algorithm can be used as each user has friends.(mixture of collaborative + context based recommendations).
Maybe using user tags, similar users (peers) can be found to generate recommendations.
Maybe directly matching tags between users and items via tags.
Any suggestion is welcome. Any Python-based library is also welcome, as I will be writing this experimental engine in Python.
1) Weight your tags.
Tags fall into several groups of interest:
My tags that none of my friends share
Tags a number of my friends share, but I don't
My tags that are shared by a number of my friends.
(sometimes you may want to consider friend-of-a-friend tags too, but in my experience the effort hasn't been worth it. YMMV.)
Identify all tags that the person and/or the person's friends have in interests, and attach a weight to the tags for this individual. One simple possible formula for tag weight is
(tag_is_in_my_list) * 2 + (friends_with_tag)/(number_of_friends)
Note the magic number 2, which makes your own opinion worth twice as much as that of all of your friends put together. Feel free to tweak :-)
2) Weight your items
For each item that has any of the tags in your list, just add up all of the weighted values of the tags. A higher value = more interest.
3) Apply a threshold.
The simplest way is to show the user the top n results.
More sophisticated systems also apply anti-tags (i.e. topics of non-interest) and do many other things, but I have found this simple formula effective and quick.
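The three steps above can be sketched roughly as follows. The function names, the friends' interests and every item except the Metallica CD are invented for the example. Only the weight formula is taken from the text.

```python
def tag_weights(my_tags, friends_tags):
    """Step 1. friends_tags holds one set of interest tags per friend."""
    all_tags = set(my_tags).union(*friends_tags)
    n_friends = len(friends_tags)
    weights = {}
    for tag in all_tags:
        friends_with_tag = sum(1 for tags in friends_tags if tag in tags)
        friend_share = friends_with_tag / n_friends if n_friends else 0.0
        # (tag_is_in_my_list) * 2 + (friends_with_tag) / (number_of_friends)
        weights[tag] = (tag in my_tags) * 2 + friend_share
    return weights


def recommend(items, weights, top_n=3):
    """Steps 2 and 3. items maps an item name to its set of tags."""
    scores = {name: sum(weights.get(tag, 0.0) for tag in tags)
              for name, tags in items.items()}
    ranked = sorted((name for name in scores if scores[name] > 0),
                    key=lambda name: scores[name], reverse=True)
    return ranked[:top_n]


my_tags = {'python', 'rock', 'metal', 'computer-science'}
friends_tags = [{'rock', 'jazz'}, {'jazz', 'cooking'}]  # made-up friends' interests
items = {
    'Metallica - Black Album CD': {'metallica', 'black-album', 'rock', 'music'},
    'Learning Python (book)': {'python', 'computer-science', 'book'},
    'Jazz Standards CD': {'jazz', 'music'},
    'Garden Hose': {'garden'},
}
weights = tag_weights(my_tags, friends_tags)
print(recommend(items, weights, top_n=2))
# ['Learning Python (book)', 'Metallica - Black Album CD']
```

Here "rock" gets 2 + 1/2 = 2.5 because it is my tag and one of two friends shares it, while "jazz" gets only 2/2 = 1.0 because both friends like it but I don't.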
If you can, track down a copy of O'Reilly's Programming Collective Intelligence, by Toby Segaran. There's a model solution in it for exactly this problem (with a whole bunch of really, really good other stuff).
Your problem is similar to product recommendation engines, such as Amazon's well publicized site. These use a learning algorithm called association rules, which basically build a conditional probability of user X buying product Y based on common features Z between the user and product. A lot of open source toolkits implement association rules, such as Orange and Weka.
You can use the Python Semantic module for Drools to specify your rules in python scripting language. You can accomplish this easily using Drools. It is a terrific rules engine that we used to solve several recommendation engines.
I would use a Restricted Boltzmann Machine. Gets around the problem of similar but not identical tags quite neatly.

How should centenarian date of birth fields be handled?

"A centenarian is a person who has attained the age of 100 years or more." - Wikipedia
There are several ways to prompt a user for Date of Birth, but let's say we've chosen the drop down method.
How would you handle the oldest selectable date? Do you pick an arbitrary year (such as 1875) and populate to present?
Or, do you consult some resource for a record breaking age (Jeanne Calment, age 122), add a couple of years, and populate backwards?
Why not a text input box where they type in "1899" or whatever? When it is received you can validate that it is a legitimate number based on whatever criteria you use. I get annoyed by listboxes to select year of birth, because listboxes should not have that many values in them.
Rereading the question, you are assuming that listbox is the only option. In that case, 130 years ago seems like a good enough cutoff to me. If you're worried that the next world-record breaker will happen to be using your system, why not go with something like 200 years ago. Although I'd still say you should just use a text box.
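For the text box route, the validation can be quite short. This is only a sketch: the function name, the error messages and the requirement of exactly four digits are my assumptions, and the 130-year cutoff is the one suggested above.

```python
from datetime import date

MAX_AGE_YEARS = 130  # cutoff suggested above; adjust to taste


def parse_birth_year(raw, today=None):
    """Return the year as an int, or raise ValueError with a reason."""
    today = today or date.today()
    text = raw.strip()
    # isascii() rules out non-ASCII digits that isdigit() would accept.
    if not (text.isascii() and text.isdigit() and len(text) == 4):
        raise ValueError("year must be exactly four digits")
    year = int(text)
    if year > today.year:
        raise ValueError("year is in the future")
    if year < today.year - MAX_AGE_YEARS:
        raise ValueError("year is more than %d years ago" % MAX_AGE_YEARS)
    return year


# With a fixed "today" of 2024-01-01, 1899 is accepted and 1875 is rejected.
print(parse_birth_year("1899", today=date(2024, 1, 1)))  # 1899
```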
As you're going to have to validate the input server-side regardless of the control used, why not use a standard text input?
<input type="text" maxlength="4"/>
This:
Eliminates the centenarian problem.
Easier to use, especially for older individuals (Assuming the list is in descending order).
Smaller page size (don't need to include 100+ <option>....</option> tags).
If you must use a <select> box, I agree with your Wikipedia methodology. Ceiling the age of the oldest recorded person (126 -> 130 or 140) would be fairly risk-free.
It depends I think. Are you creating an application for young people?
If the application needs to be accessible for everyone, just create a configurable check and update it now and then. Look at http://en.wikipedia.org/wiki/Oldest_people#Ten_oldest_people_currently_living for the oldest possible date.
A date picker is preferable, don't use irritating listboxes with 100+ values.
My opinion on data entry like this is to cater to the user, not the programmer.
While, yes, a drop-down causes the least error handling, it's also tedious for the users.
Go for text entry that you need to validate. More code is needed in the back [allowing for both 2 digit years and 4 digit], but it's an easier experience for the user.
If you want to stick to drop downs, just pick an old enough date. Personally I think drop downs are bad both for the user and for you. I find selecting my birthday from a list of a hundred numbers annoying (even though Firefox lets me select the date by typing it). I also think a selection list makes people more likely to input fake years than if they had to type the year in.
If you can live with JavaScript, a Combobox might be the best of both worlds. You can list 100 years in the list, and let older people type. This has only a minor ethical problem, i.e. you purposefully make elderly people type =)
