I am trying to extract an entity that contains dashes; it's a product ID. Even though I have trained LUIS with samples that contain dashes, it only returns the characters up to the first dash. For example, ABC123-100 returns ABC123. How can I get LUIS to recognize the whole ID?
You can add a "Pattern Feature" and define the entity patterns that you need to identify as the entity. That will solve your problem.
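Independently of LUIS, the shape of such an ID can be sketched as a regular expression. This is only an illustration of the pattern you would teach the service; the exact pattern and the `extract_product_ids` helper are assumptions, not LUIS syntax:

```python
import re

# Hypothetical product-ID shape: letters, digits, then one or more
# dash-separated numeric suffixes (e.g. ABC123-100).
PRODUCT_ID = re.compile(r"\b[A-Z]+\d+(?:-\d+)+\b")

def extract_product_ids(text):
    """Return every full product ID, including the part after the dash."""
    return PRODUCT_ID.findall(text)
```

With a pattern like this, `extract_product_ids("order ABC123-100 today")` keeps the full `ABC123-100` instead of stopping at the dash.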
I have a scenario where I want to search for 'bank of india', but the documents retrieved include hits for 'reserve bank of india', 'state bank of india', etc. Basically, the named entity in the search string is part of other named entities as well.
What are the ways to avoid this in Elasticsearch?
If you use the keyword type instead of text as the mapping for your entity field, you will no longer get those partial matches. keyword says "treat this text as a single unit" (named entities are like this), while text says "treat each word as a unit and consider the field a bag of words", so the query looks for the most word matches, regardless of order or whether all of the words are there. There are queries that can require order (match_phrase) or require all words to match (the minimum_should_match parameter), but I like to use the term query if you follow the keyword mapping strategy. Does that make sense?
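As a sketch, the mapping and query bodies might look like this (the field name "entity" is made up for illustration), with a miniature in-memory comparison of what the two mappings mean:

```python
# keyword mapping: the whole field value is one indexed unit.
keyword_mapping = {
    "mappings": {"properties": {"entity": {"type": "keyword"}}}
}

# term query: no analysis is applied, so only the exact value matches.
term_query = {"query": {"term": {"entity": "bank of india"}}}

# What the two strategies mean, in miniature:
docs = ["bank of india", "reserve bank of india", "state bank of india"]

# text + match behaves like a bag-of-words lookup: any word overlap hits...
match_hits = [d for d in docs if set("bank of india".split()) & set(d.split())]

# ...while keyword + term compares the whole value, so only the exact
# entity survives.
term_hits = [d for d in docs if d == "bank of india"]
```

The bag-of-words simulation hits all three documents, while the whole-value comparison returns only 'bank of india'.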
Using Laravel TNTSearch, how can I map different forms of a keyword so that they produce the same result on a query?
For example, given the keyword Softwareentwickler, I want the keyword Software-Entwickler to return the same result as well.
Is there any workaround for this case?
The technique you want is called query expansion. You simply replace the incoming query with the one you want and then perform the search.
Take a look at this
Here, when someone searches for Russia, it will replace the query with russian federation.
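A minimal sketch of the idea, assuming a hand-built expansion table and a hyphen-folding normalization step (these names are illustrative, not TNTSearch API calls):

```python
# Expansion table: map normalized query forms to a canonical replacement.
EXPANSIONS = {
    "russia": "russian federation",
}

def normalize(term):
    # Fold case and drop hyphens so "Software-Entwickler"
    # and "Softwareentwickler" become the same key.
    return term.lower().replace("-", "")

def expand_query(term):
    """Replace the query with its expanded form before searching."""
    key = normalize(term)
    return EXPANSIONS.get(key, key)
```

With this, `expand_query("Russia")` yields "russian federation", and both Softwareentwickler and Software-Entwickler collapse to the same query string before the search runs.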
I am using Talend to do a fuzzy match, and I want to match between two columns of the same table. But tFuzzyMatch takes its Match column from one input and its Lookup from another input.
What I need is to take both the Match column and the Lookup from a single source.
I am using Talend 5.5.1
You can perform in-line fuzzy matching by using the tMatchGroup component. This will search for groups of matching/potentially matching records inside the single flow.
The tMatchGroup also allows the output to be split by match likelihood, as specified by thresholds, giving separate outputs for confident matches, suspect matches and rows that are unique.
The tMatchGroup component also has a graphical wizard showing expected match groups from a sample taken from the input data which can be useful when attempting to tweak the matching algorithms and parameters.
For example, a very basic job using the tMatchGroup component may look like this:
Where the tMatchGroup is configured so that a match group must have an exact age but will apply Jaro-Winkler to the name columns:
You can see in the above screenshot that the tMatchGroup has a match group containing 2 records, one with the name "Tom" and the other with the name "Thom" and they both have the exact same age.
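Outside of Talend, the Jaro-Winkler comparison that tMatchGroup applies to the name columns can be sketched in plain Python. This is a textbook formulation of the metric, not Talend's own implementation:

```python
def jaro(s1, s2):
    """Plain Jaro similarity (textbook formulation)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if not len1 or not len2:
        return 0.0
    window = max(len1, len2) // 2 - 1
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count transpositions between the matched characters.
    transpositions = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                transpositions += 1
            k += 1
    transpositions //= 2
    return (matches / len1 + matches / len2 +
            (matches - transpositions) / matches) / 3

def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Boost the Jaro score for strings sharing a common prefix (max 4 chars)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1.0 - j)
```

"Tom" and "Thom" score 0.925 under this metric, which is why a match group can still pair them when combined with an exact match on age.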
This can be achieved by creating a duplicate input source: one is used as the main flow while the other is used as the lookup. The rest of the configuration is the same as you would set otherwise.
In the tFuzzyMatch settings, you can then compare two columns of the same table (the component is effectively tricked by using two input components that point to the same source) by pointing the Lookup and Matching Column settings at the two columns you need to compare, one from each input, which logically come from the same source.
Hope it helps.
For a phrase search, we want to bring up results only if there's an exact match (without ignoring stopwords). If it's a non-phrase search, we are fine displaying results even if the root form of the word matches etc.
We currently pass our data through StandardTokenizer, StopFilter, PorterStemFilter and LowerCaseFilter. Because of this, when a user searches for "password management", the search brings up results containing "password manager".
If I remove the StemFilter, I will no longer be able to match the root form of the word for non-phrase queries. I was thinking of indexing the same data in two fields of the document.
I have asked the same question at Different indexing and search strategies on same field without doubling index size?. However, folks at the office are not happy about indexing the same data in two fields (we currently have around 20 text fields in the Lucene document). Is there any way to support both of the cases listed above using TokenFilters?
Say, for a StopFilter, change it so that it emits both the input token and ? (for an ignored word) with the same position increment. Similarly, have the StemFilter emit both the input token and the stemmed token at the same position. Basically, input and output tokens (even ignored ones) would share positions.
Is it safe to go ahead with this approach? Has anyone else faced the requirements listed here? Are there any Filters readily available which do something similar to what I mentioned in my approach?
Thanks
I don't understand what you mean by "input and output tokens." Are you storing the data twice - once as stemmed and once non-stemmed?
If you aren't storing it twice, I don't think your method will work. Suppose the stored word is jumping and they search for jumped. Your query parser can emit jump and jumped but it still won't match jumping unless you have a value stored as jump.
And if you're going to store the value once as stemmed and once as non-stemmed, then why not just store it in two fields? Then you won't have to deal with weird tokenizer changes.
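The two-field idea can be sketched outside Lucene with a toy index. The suffix-stripping "stemmer" here is a crude stand-in for PorterStemFilter, and all names are made up for illustration:

```python
def toy_stem(word):
    # Crude suffix stripping, standing in for a real stemmer.
    for suffix in ("ement", "ing", "ed", "er", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def index_doc(text):
    """Index the same text twice: verbatim and stemmed."""
    tokens = text.lower().split()
    return {
        "exact": tokens,                           # phrase searches go here
        "stemmed": [toy_stem(t) for t in tokens],  # loose searches go here
    }

def phrase_match(doc, phrase):
    """Exact consecutive match against the unstemmed field."""
    words = phrase.lower().split()
    exact = doc["exact"]
    return any(exact[i:i + len(words)] == words
               for i in range(len(exact) - len(words) + 1))

def loose_match(doc, query):
    """Root-form match against the stemmed field."""
    stems = set(doc["stemmed"])
    return all(toy_stem(w) in stems for w in query.lower().split())
```

For a document containing "password manager", the phrase query "password management" no longer matches, while the non-phrase query still does, which is exactly the split behavior the question asks for.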
It is simple to describe:
q=mydynamicfield_txt:"video"
I want hits only when mydynamicfield is exactly "video".
Put the other way round: how do I suppress hits where "video" is only part of the field value (like "home video")?
Is this supported in Solr 3.1 out of the box, or do I have to add my own special brackets like "SOLRSTARTSOLR video SOLRENDSOLR" to my index, so I can later retrieve my term between "START" and "END"? A kind of manual regex anchoring.
That would be a PITA because it needs special handling in the index/GUI and breaks highlighting.
What is the way to go?
regards
Peter
(=PA=)
One solution is to create an untokenized (KeywordAnalyzed) field and search within it; the whole field value will then be a single distinct token in the Solr index.
The other solution is to write a filter that reads the token count from the index and compares it with the number of query tokens, i.e. filters out documents where doc_tokens > query_tokens, assuming that all query tokens matched.
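The keyword-analyzed-field idea from the first suggestion can be sketched with a toy in-memory index (this is an illustration of the analysis difference, not Solr code):

```python
docs = ["video", "home video", "video games"]

# Tokenized field: each word is a token, so "video" hits partial matches.
tokenized_index = [doc.split() for doc in docs]

def search_tokenized(term):
    """Bag-of-words lookup: hits any document containing the term."""
    return [d for d, toks in zip(docs, tokenized_index) if term in toks]

# Untokenized (keyword-analyzed) field: the whole value is one token,
# so only an exact field value matches.
def search_keyword(term):
    return [d for d in docs if d == term]
```

Here `search_tokenized("video")` hits all three documents, while `search_keyword("video")` returns only the document whose field is exactly "video".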