Parser to parse search terms and extract valuable information [closed] - algorithm

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I would like to understand a user's search term. Suppose someone searches for "staples in NY": I would like to recognize that it's a location search where the keyword is "staples" and the location is New York. Similarly, if someone types "cat in hat", the parser should not flag that as a location search; here the entire keyword is "cat in hat".
Is there any algorithm or open source library available that can parse a search term and recognize that it's a comparison (like A vs B) or a location-based search (like A in X)?

The problem you describe is called information extraction. A host of algorithms exist, the simplest being regexp matching and the best being structured machine learning. Try regexps first, and look at something like NLTK if you know Python.
Distinguishing "staples in NY" from "cat in hat" is possible if your program knows that "NY" is a location. You can tell either by the capitals or because "NY" occurs in a list called a gazetteer.
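To make the gazetteer idea concrete, here is a minimal Java sketch. It assumes queries of the shape "<keyword> in <place>" and uses a tiny, hypothetical in-memory gazetteer; the class name `LocationSearchDetector` and its entries are illustrative only. A real system would load place names from a geographic database, and would still need the capitalization cue (or a trained model) for places the list does not contain.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical detector: a query of the form "<keyword> in <place>" counts as a
// location search only when <place> is listed in the gazetteer.
final class LocationSearchDetector {

    record LocationSearch(String keyword, String location) {}

    // Tiny illustrative gazetteer mapping lower-cased names to a canonical place name.
    private static final Map<String, String> GAZETTEER = Map.of(
            "ny", "New York",
            "new york", "New York",
            "boston", "Boston");

    // Greedy first group, so the split happens at the last " in " of the query.
    private static final Pattern KEYWORD_IN_PLACE =
            Pattern.compile("^(.+)\\s+in\\s+(.+)$", Pattern.CASE_INSENSITIVE);

    static Optional<LocationSearch> parse(String query) {
        Matcher m = KEYWORD_IN_PLACE.matcher(query.trim());
        if (!m.matches()) {
            return Optional.empty(); // no " in ": a plain keyword search
        }
        String place = GAZETTEER.get(m.group(2).trim().toLowerCase(Locale.ROOT));
        if (place == null) {
            return Optional.empty(); // "hat" is not a place, so "cat in hat" stays one keyword
        }
        return Optional.of(new LocationSearch(m.group(1).trim(), place));
    }
}
```

With this sketch, `parse("staples in NY")` yields keyword "staples" and location "New York", while `parse("cat in hat")` yields an empty result because "hat" is not in the gazetteer.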
The problem in general is AI-complete, so expect to put in lots of hard work if you want good results.

You should write such linguistic rules using grammar-based tools such as GATE and http://code.google.com/p/graph-expression/.
Example:
Token+ in (LocationLookup).

I'm not entirely sure, but here are two approaches based on my experience with parsing:
Define a grammar which can parse the expression and collect values / parameters. You might also build a dictionary of keywords from which you can deduce the type of search.
Be strict when defining your grammar, so that the expression itself tells you the type of search,
e.g. LOC: A in B, VALUE: $ to Euro, etc.
For parser generators, see ANTLR, or jcup with jflex.
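A hand-written Java sketch of the strict-grammar idea, rather than an actual ANTLR or jcup grammar: the only accepted compound form is assumed to be `operand OPERATOR operand`, and the operator word, looked up in a small keyword dictionary, decides the search type. The class name, the operator list and the type names are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical strict grammar:  query := operand OPERATOR operand | operand
// The OPERATOR word must come from the keyword dictionary and alone decides the type.
final class OperatorGrammar {

    enum Type { COMPARISON, CONVERSION, LOCATION, KEYWORD }

    record Parsed(Type type, String left, String right) {}

    // Keyword dictionary (illustrative): operator word -> type of search.
    private static final Map<String, Type> OPERATORS = Map.of(
            "vs", Type.COMPARISON,
            "versus", Type.COMPARISON,
            "to", Type.CONVERSION,
            "in", Type.LOCATION);

    static Parsed parse(String query) {
        List<String> tokens = Arrays.asList(query.trim().split("\\s+"));
        // The first dictionary word with at least one token on each side is the operator.
        for (int i = 1; i < tokens.size() - 1; i++) {
            Type type = OPERATORS.get(tokens.get(i).toLowerCase(Locale.ROOT));
            if (type != null) {
                String left = String.join(" ", tokens.subList(0, i));
                String right = String.join(" ", tokens.subList(i + 1, tokens.size()));
                return new Parsed(type, left, right);
            }
        }
        // No operator found: the whole query is a single keyword.
        return new Parsed(Type.KEYWORD, String.join(" ", tokens), "");
    }
}
```

Here "iphone vs android" parses as a COMPARISON and "100 $ to Euro" as a CONVERSION with operands "100 $" and "Euro". Because the grammar is strict, "cat in hat" is also accepted as a LOCATION query; that is only acceptable if users are expected to follow the grammar, otherwise the operands still need checking, e.g. against a gazetteer.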

Related

Substrings in SCPI Message Mapping

I'm new to SAP SCPI development and have been learning it for weeks now. Today's exercise is all about message mapping and becoming familiar with the message mapping environment. However, I am stuck figuring out how to use the "substring" expression in my mapping.
The requirement is to turn emails like "abcd.12345#company.com" into just "abcd.12345", i.e. remove the "#" and the domain.
I asked my instructor, and he suggested using the "substring" expression to shorten the email, but I am still having trouble doing it. My problem is that my source .xsd file has many records and the emails vary in length.
I am unsure what to enter in the message mapping, or how to write a custom Groovy script function instead, so I came to Stack Overflow for help.
(Screenshot: Message Mapping Substring)
I am open to reading and understanding your takes on this. Many thanks!
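The usual way around varying lengths is to compute the end index for each record instead of hard-coding it: find the position of the "#" and take the substring before it. Below is a plain Java sketch of that logic; the class and method names are hypothetical. Groovy runs on the JVM and its strings are java.lang.String, so the same indexOf/substring calls should work inside a custom Groovy function, but how that function is wired into the SCPI message mapping is not shown here.

```java
// Hypothetical helper: keep everything before the first separator character
// (the question's sample data uses '#'), or the whole input if there is none.
final class EmailLocalPart {
    static String beforeSeparator(String email, char separator) {
        if (email == null) {
            return null; // leave empty source fields untouched
        }
        int idx = email.indexOf(separator);
        // idx is computed per record, so emails of any length are handled.
        return idx >= 0 ? email.substring(0, idx) : email;
    }
}
```

For example, `beforeSeparator("abcd.12345#company.com", '#')` returns "abcd.12345". If the real data uses "@" rather than "#", pass that character instead.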

Is it correct to say that any question having the term “Softmax function” is a duplicate copy?

I would like your comments on whether it is right to say that any question containing the term ‘softmax function’ is a duplicate of the other questions containing that term.
I think it is wrong to say that any question with the term “Softmax function” is a duplicate. Why? Because the Cambridge Dictionary defines the term “duplicate copy” as “being an exact copy of something, or something that is an exact copy of something else”.
A diversity of opinions, questions and answers on the term “softmax function” should be welcome, and such questions should not be called duplicates. Instead, they should be called related and grouped under related questions.
For more information on how to apply softmax in machine learning, see the question below:
How to use softmax output in python for neural-network and machine-learning to interpret Multinomial Logit Model?

Is it wrong to use routes versus query string parameters? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
I have a Web API controller with two actions. One action returns a list of all entities from the database. The second action takes a query string parameter and filters the entities.
The search action is wired up to use the query string parameter. It works, but a documentation tool we use fails because the two action signatures are the same (the tool does not take the query string into account).
Is it wrong to move the search parameter from the query string to an attribute in the route? Is this an acceptable approach? Will it cause problems I'm not thinking about?
Currently, this is a URL I use:
domain.com/api/entities?q=xyz
I'm considering moving to a route-based approach:
domain.com/api/entities/xyz
If you are implementing a search feature, or other type of feature where you need to have multiple optional parameters, it is better to use query string parameters. You can supply all of them, some of them, or none of them and put them in any order, and it will just work.
// Anything Goes
/controller/action?a=123&b=456&c=789
/controller/action?c=789&a=123&b=456
/controller/action?b=456&c=789
/controller/action?c=789
/controller/action
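To see why order and omission don't matter, here is a minimal Java sketch that parses a query string into a name-to-value map. The `QueryStrings` class is hypothetical; in Web API the framework's model binding does the equivalent for you.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Sketch: parse "a=123&b=456" into a map. Values are looked up by name, so the
// order of the pairs does not matter and any of them may be left out.
final class QueryStrings {
    static Map<String, String> parse(String query) {
        Map<String, String> params = new HashMap<>();
        if (query == null || query.isEmpty()) {
            return params; // "/controller/action" with no query string at all
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            // Decode percent-escapes such as %20 back into the original characters.
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }
}
```

`parse("c=789&a=123&b=456")` and `parse("a=123&b=456&c=789")` produce equal maps, and a parameter that was not supplied simply comes back as null.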
On the other hand, if you use URL paths and routing, a parameter can only be optional if there is not another parameter to the right of it.
// OK
/controller/action/123/456/789
/controller/action/123/456
/controller/action/123
// Not OK (omitting a leading value shifts the rest left:
// 456 would be bound to the first parameter and 789 to the second)
/controller/action/456/789
/controller/action/789
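By contrast, path segments are bound by position. This hypothetical Java sketch binds the segments after a fixed prefix to parameter names strictly left to right, which shows why the "Not OK" paths cannot mean what was intended: in `/controller/action/456/789`, 456 lands in `a` and 789 in `b`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical positional binder for a route template such as
// "/controller/action/{a}/{b}/{c}": segments are bound strictly left to right.
final class PositionalRoute {
    static Map<String, String> bind(String path, String prefix, List<String> names) {
        Map<String, String> values = new LinkedHashMap<>();
        if (!path.startsWith(prefix)) {
            return values; // not this route
        }
        String rest = path.substring(prefix.length());
        if (!rest.isEmpty() && !rest.startsWith("/")) {
            return values; // e.g. "/controller/actions" is a different route
        }
        if (rest.startsWith("/")) {
            rest = rest.substring(1);
        }
        if (rest.isEmpty()) {
            return values; // all parameters omitted
        }
        String[] segments = rest.split("/");
        // The i-th segment always goes to the i-th name, so a value cannot skip a slot.
        for (int i = 0; i < segments.length && i < names.size(); i++) {
            values.put(names.get(i), segments[i]);
        }
        return values;
    }
}
```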
It is possible by customizing routing to be able to pass optional values in any order, but it seems like a long way to go when query strings are a natural fit for this scenario.
Another factor to consider is whether the values being put into the URL contain unsafe characters that need to be encoded. It is poor form, and sometimes not feasible, to encode the path of the URL, but the rules about which encoded characters can appear in a query string are more relaxed. Since URLs don't allow literal spaces, a multi-word text search term is a better fit for the query string, where the space can be encoded and is preserved as-is, than for the path, where you would have to swap each space for a - and then change it back to a space on the server side before running the query.
search = "foo bar"
// Spaces OK
/controller/action?search=foo%20bar (works fine and server is able to interpret)
// Spaces Not OK
/controller/action/foo bar/ (not possible)
/controller/action/foo%20bar/ (may work, but a questionable design choice)
/controller/action/foo-bar/ (may work, but requires conversion on the server)
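For comparison, here is a small Java sketch that builds both URL styles with `java.net.URI`'s five-argument constructor (scheme, authority, path, query, fragment), which percent-encodes characters that are illegal in each component. The host and the `/api/entities` path come from the question; the `SearchUrls` class is hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Builds the two URL styles from the question with java.net.URI, whose multi-argument
// constructors percent-encode characters that are illegal in each component.
final class SearchUrls {
    static String queryStyle(String term) {
        // Note: '&' and '=' are legal in a query and are not escaped by this constructor,
        // so a term containing them would need encoding first (e.g. with URLEncoder).
        return build("/api/entities", "q=" + term);
    }

    static String pathStyle(String term) {
        return build("/api/entities/" + term, null);
    }

    private static String build(String path, String query) {
        try {
            return new URI("https", "domain.com", path, query, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

`queryStyle("foo bar")` gives `https://domain.com/api/entities?q=foo%20bar`, while `pathStyle("foo bar")` gives `https://domain.com/api/entities/foo%20bar`, the "may work, but questionable" form listed above.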
Finally, another choice worth considering is using POST instead of GET, as that would mean these values wouldn't need to be in the URL at all.

Is there a best practice to documenting a Command Line Interface? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I have designed a few programs that have a CLI and want to document them in as standard a way as possible. Are there any agreed-upon conventions for the best way to do this?
An example:
Let's say the program is "sayHello" and it takes two parameters: name and message. So a standard call would look like this:
> sayHello "Bob" "You look great"
Okay, so my command usage would look something like this:
sayHello [name] [message]
That may already be a mistake if brackets have a specific meaning in usage commands. But let's go a step farther and say "message" is optional:
sayHello [name] [message (optional)]
And then just one more wrinkle, what if there is a default we want to denote:
sayHello [name] [message (optional: default 'you look good')]
I realize this usage statement looks a little obtuse at this point. I'm really asking whether there are somewhat agreed-upon standards for how to write these. I have a sneaking suspicion that the parentheses and brackets all have specific meanings.
While I am unaware of any official standard, there are some efforts to provide conventions-by-framework. Docopt is one such framework, and may suit your needs here. In their own words:
docopt helps you:
define interface for your command-line app, and
automatically generate parser for it.
There are implementations for many programming languages, including shell.
You might want to look at the manuals for common Unix commands (e.g. man grep) or the help documentation for Windows commands (e.g. find /?) and use them as a general guide. If you pick either of those patterns (or use elements common to both), you'll at least surprise the fewest people.
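As a sketch of those conventions applied to the sayHello example: square brackets mark optional elements (as in POSIX utility synopses, man pages and docopt), angle brackets mark placeholders the user fills in (docopt style), and the default goes into the argument description using docopt's `[default: ...]` notation rather than into the synopsis. The `UsageText` class and its `Arg` record are hypothetical.

```java
import java.util.List;

// Hypothetical usage-text builder following widespread conventions:
//   <name>   a placeholder the user must replace (docopt style)
//   [ ... ]  an optional element (POSIX synopses, man pages, docopt)
// Defaults are listed in the argument descriptions, not in the synopsis.
final class UsageText {
    record Arg(String name, String description, boolean required, String defaultValue) {}

    static String render(String program, List<Arg> args) {
        StringBuilder synopsis = new StringBuilder("usage: ").append(program);
        StringBuilder details = new StringBuilder();
        for (Arg a : args) {
            String placeholder = "<" + a.name() + ">";
            synopsis.append(' ').append(a.required() ? placeholder : "[" + placeholder + "]");
            details.append("\n  ").append(placeholder).append("  ").append(a.description());
            if (a.defaultValue() != null) {
                details.append(" [default: ").append(a.defaultValue()).append(']');
            }
        }
        return synopsis.append(details.toString()).toString();
    }
}
```

For the question's example this renders the synopsis `usage: sayHello <name> [<message>]`, followed by one description line per argument, with `[default: you look good]` appended to the message line.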
Apache Commons also has some classes in the commons-cli package that will print usage information for your particular set of command-line options.
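import org.apache.commons.cli.HelpFormatter;
import org.apache.commons.cli.OptionBuilder;
import org.apache.commons.cli.Options;

public class MyApp {
    public static void main(String[] args) {
        // OptionBuilder is deprecated in newer commons-cli releases in favour of Option.builder()
        Options options = new Options();
        // -f/--file: required, takes one argument shown as <FILE> in the help text
        options.addOption(OptionBuilder.withLongOpt("file")
                .withDescription("The file to be processed")
                .hasArg()
                .withArgName("FILE")
                .isRequired()
                .create('f'));
        // -v/--version: optional flag
        options.addOption(OptionBuilder.withLongOpt("version")
                .withDescription("Print the version of the application")
                .create('v'));
        // -h/--help: optional flag without a description
        options.addOption(OptionBuilder.withLongOpt("help").create('h'));
        String header = "Do something useful with an input file\n\n";
        String footer = "\nPlease report issues at http://example.com/issues";
        HelpFormatter formatter = new HelpFormatter();
        // Last argument true: generate the "usage:" line automatically from the options
        formatter.printHelp("myapp", header, options, footer, true);
    }
}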
Using the above will generate help output that looks like:
usage: myapp -f [-h] [-v]
Do something useful with an input file
-f,--file <FILE> The file to be processed
-h,--help
-v,--version Print the version of the application
Please report issues at http://example.com/issues

Searching a datastore for related topics by keyword

For example, how does StackOverflow decide that other questions are similar?
When I typed in the question above and then tabbed to this memo control I saw a list of existing questions which might be the same as the one I am asking.
What technique is used to find similar questions?
I got an email from team#stackoverflow.com on Mar 20 that mentions how it works:
the "ask a question" search is exclusively on title and will not match anything in the body. It is a mystery to me why people think it's better.
The last sentence refers to the search bar, which I've found is less useful when I'm trying to find a specific question I've already seen.
I think it's plain old word matching. However, I might add that this feature does not work as well as I would like it to. It's much better to do a Google search with the site:stackoverflow.com prefix than to rely on SO to provide relevant suggestions.
Poorly -- using MS SQL Full Text Search, I believe. You'll have better luck using Lucene, IMO. For more background on the topic see the Wikipedia article on Lucene or the general topic of information retrieval.
The matching program would store an index of all questions. When you ask a question, all keywords in your question are matched against the index. This is similar to Google Search. Lucene open source search can be (and with high probability has been) used for this. Since the results are not quite accurate, I presume they index just the headlines of the questions, as an approximation.
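A minimal Java sketch of that matching, assuming (as the answer presumes) that only titles are indexed. It builds an inverted index from each title word to the questions containing it and ranks candidates by how many distinct query words they share with the title. The class name and the tiny stop-word list are illustrative; Lucene adds text analysis (stemming, stop words) and relevance scoring such as TF-IDF or BM25 on top of the same basic structure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Sketch of title-only keyword matching with an inverted index (word -> question ids).
final class TitleIndex {
    // Tiny illustrative stop-word list; real engines use larger lists plus stemming.
    private static final Set<String> STOP = Set.of("a", "an", "the", "how", "do", "i", "to", "in", "of", "is");

    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> titles = new ArrayList<>();

    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!w.isEmpty() && !STOP.contains(w)) {
                words.add(w);
            }
        }
        return words;
    }

    int add(String title) {
        int id = titles.size();
        titles.add(title);
        for (String w : new HashSet<>(tokenize(title))) {
            postings.computeIfAbsent(w, k -> new HashSet<>()).add(id);
        }
        return id;
    }

    List<String> similar(String query, int limit) {
        // Score = number of distinct query words that also occur in the title.
        Map<Integer, Integer> scores = new HashMap<>();
        for (String w : new HashSet<>(tokenize(query))) {
            for (int id : postings.getOrDefault(w, Set.of())) {
                scores.merge(id, 1, Integer::sum);
            }
        }
        List<Integer> ids = new ArrayList<>(scores.keySet());
        ids.sort((x, y) -> {
            int byScore = Integer.compare(scores.get(y), scores.get(x)); // higher score first
            return byScore != 0 ? byScore : Integer.compare(x, y);       // then older question first
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < ids.size() && i < limit; i++) {
            result.add(titles.get(ids.get(i)));
        }
        return result;
    }
}
```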
The other related keyword is collaborative filtering, the algorithm popularized by Amazon to recommend products based on behavior of other similar customers. In the current case, an alternative algorithm based on collaborative filtering is: keywords are extracted from the question, then tags associated (in the history) with the keywords are found. Questions which have those tags are returned. Well, experiments are needed to see whether it works well at all.
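A rough Java sketch of the tag-based variant described above: count how often each title keyword has co-occurred with each tag in past questions, predict tags for the new question from those counts, and return past questions that carry any predicted tag. Strictly speaking this is keyword/tag co-occurrence rather than user-based collaborative filtering; the class name, the keyword filter and the `minCount` threshold are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical tag-based "related questions": keywords -> historically associated tags -> questions.
final class TagSuggester {
    record Question(String title, Set<String> tags) {}

    private final List<Question> history = new ArrayList<>();
    private final Map<String, Map<String, Integer>> keywordTagCounts = new HashMap<>();

    private static Set<String> keywords(String text) {
        Set<String> words = new HashSet<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (w.length() > 2) {
                words.add(w); // crude filter standing in for a stop-word list
            }
        }
        return words;
    }

    void addHistory(Question q) {
        history.add(q);
        for (String k : keywords(q.title())) {
            Map<String, Integer> counts = keywordTagCounts.computeIfAbsent(k, x -> new HashMap<>());
            for (String tag : q.tags()) {
                counts.merge(tag, 1, Integer::sum);
            }
        }
    }

    // Tags whose summed co-occurrence with the new title's keywords reaches minCount.
    Set<String> predictTags(String title, int minCount) {
        Map<String, Integer> totals = new HashMap<>();
        for (String k : keywords(title)) {
            keywordTagCounts.getOrDefault(k, Map.of())
                    .forEach((tag, n) -> totals.merge(tag, n, Integer::sum));
        }
        Set<String> tags = new TreeSet<>();
        totals.forEach((tag, n) -> { if (n >= minCount) tags.add(tag); });
        return tags;
    }

    // Past questions sharing at least one predicted tag, in insertion order.
    List<String> related(String title, int minCount) {
        Set<String> tags = predictTags(title, minCount);
        List<String> out = new ArrayList<>();
        for (Question q : history) {
            if (!Collections.disjoint(q.tags(), tags)) {
                out.add(q.title());
            }
        }
        return out;
    }
}
```

As the answer says, whether this works well at all is an empirical question; the threshold in particular would have to be tuned on real data.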
