How to make a Searchkick/Elasticsearch where clause case-insensitive? - ruby

I'm using Elasticsearch with Ruby (Searchkick) on my EquityContract model. By default, the where filter is case-sensitive, so searching for Food gives different results from searching for food.
[21] pry(main)> EquityContract.search('*', {:load=>false, :where=>{:industry=>"FOOD"}, limit:1})
effective_url=http://127.0.0.1:9200/equity_contracts_development/_search response_code=200 return_code=ok total_time=0.002279
EquityContract Search (5.9ms) curl http://127.0.0.1:9200/equity_contracts_development/_search?pretty -d '{"query":{"filtered":{"query":{"match_all":{}},"filter":{"and":[{"term":{"industry":"FOOD"}}]}}},"size":1,"from":0}'
=> #<Searchkick::Results:0x0000010b65c5c8
#klass=
EquityContract(id: integer, ticker: text, name: string, country: string, currency: string, instrument: string),
#options=
{:page=>1,
:per_page=>1,
:padding=>0,
:load=>false,
:includes=>nil,
:json=>false,
:match_suffix=>"analyzed",
:highlighted_fields=>[]},
#response=
{"took"=>1,
"timed_out"=>false,
"_shards"=>{"total"=>5, "successful"=>5, "failed"=>0},
"hits"=>{"total"=>0, "max_score"=>nil, "hits"=>[]}}>
whereas I do get results when I run the same search with Food:
[23] pry(main)> EquityContract.search('*', {:load=>false, :where=>{:industry=>"Food"}, limit:1})
ETHON: performed EASY effective_url=http://127.0.0.1:9200/equity_contracts_development/_search response_code=200 return_code=ok total_time=0.002795
EquityContract Search (7.5ms) curl http://127.0.0.1:9200/equity_contracts_development/_search?pretty -d '{"query":{"filtered":{"query":{"match_all":{}},"filter":{"and":[{"term":{"industry":"Food"}}]}}},"size":1,"from":0}'
=> #<Searchkick::Results:0x000001112d1880
#klass=
EquityContract(id: integer, ticker: text, name: string, country: string, currency: string, instrument: string),
#options=
{:page=>1,
:per_page=>1,
:padding=>0,
:load=>false,
:includes=>nil,
:json=>false,
:match_suffix=>"analyzed",
:highlighted_fields=>[]},
#response=
{"took"=>1,
"timed_out"=>false,
"_shards"=>{"total"=>5, "successful"=>5, "failed"=>0},
"hits"=>
{"total"=>73,
"max_score"=>1.0,
"hits"=>
[{"_index"=>"equity_contracts_development_20160320195353552",
"_type"=>"equity_contract",
"_id"=>"1181",
"_score"=>1.0,
"_source"=>
{"name"=>"Some name",
"ticker"=>"some ticker",
"country"=>"SA",
How can I make this case-insensitive so that I get the same results for both?

I see that Searchkick generates a term query, but what you need is a full-text query. I'm not familiar with Searchkick, so I can't tell you how to do that with it.
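For comparison, here is a minimal sketch in plain Ruby (hashes serialized with the standard json library) of the body Searchkick logged in the question, a term filter, next to a full-text match query on the same field. Whether the match version really ignores case is an assumption: it only does so if industry is mapped with an analyzer that lowercases; on a not_analyzed field a match query is looked up as one exact term, just like a term query.

```ruby
require "json"

# The body Searchkick sent, copied from the log in the question: a term
# filter, which compares "FOOD" with the indexed terms exactly.
term_body = {
  query: {
    filtered: {
      query: { match_all: {} },
      filter: { and: [{ term: { industry: "FOOD" } }] }
    }
  },
  size: 1,
  from: 0
}

# A full-text alternative: a match query runs "FOOD" through the field's
# analyzer before the lookup. ASSUMPTION: "industry" is analyzed with a
# lowercasing analyzer; otherwise this behaves like the term filter above.
match_body = {
  query: { match: { industry: "FOOD" } },
  size: 1,
  from: 0
}

puts JSON.generate(term_body)
puts JSON.generate(match_body)
```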
According to the official Elasticsearch documentation:
Term query
Queries like the term or fuzzy queries are low-level queries that have no analysis phase. They operate on a single term. A term query for the term Foo looks for that exact term in the inverted index and calculates the TF/IDF relevance _score for each document that contains the term.
Full-text query
Queries like the match or query_string queries are high-level queries that understand the mapping of a field ... If you query a full-text (analyzed) field, they will first pass the query string through the appropriate analyzer to produce the list of terms to be queried.
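The difference the documentation describes can be imitated with a toy in-memory inverted index in plain Ruby. This is only a model of the idea, not how Elasticsearch works internally: analyze is a hypothetical stand-in for a real analyzer (it lowercases and splits on non-word characters), and the sample industry values are invented.

```ruby
# Toy model of term vs. match lookups over an inverted index.
# Illustrative only; Elasticsearch's real analysis chain is configurable.

# Hypothetical stand-in for an analyzer: lowercase, then split into words.
def analyze(text)
  text.downcase.scan(/\w+/)
end

# Invented sample values for the industry field, keyed by document id.
DOCS = { 1 => "Food", 2 => "Food & Beverage", 3 => "Retail" }.freeze

# not_analyzed field: the whole value is indexed as one exact term.
KEYWORD_INDEX = Hash.new { |h, k| h[k] = [] }
DOCS.each { |id, value| KEYWORD_INDEX[value] << id }

# analyzed field: every token produced by the analyzer becomes a term.
ANALYZED_INDEX = Hash.new { |h, k| h[k] = [] }
DOCS.each { |id, value| analyze(value).uniq.each { |t| ANALYZED_INDEX[t] << id } }

# term query: no analysis phase; the value must equal an indexed term.
def term_query(index, value)
  index.fetch(value, [])
end

# match query: analyze the query text with the same analyzer, then look up
# every resulting term (any term may match, like match's default "or").
def match_query(index, text)
  analyze(text).flat_map { |t| index.fetch(t, []) }.uniq.sort
end

p term_query(KEYWORD_INDEX, "FOOD")   # prints []
p term_query(KEYWORD_INDEX, "Food")   # prints [1]
p match_query(ANALYZED_INDEX, "FOOD") # prints [1, 2]
p match_query(ANALYZED_INDEX, "food") # prints [1, 2]
```

The case-insensitivity comes entirely from running the same analyzer on both the indexed values and the query text.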

Related

How to prevent slow match / match_phrase queries for keywords in Kibana?

How can I make a match query on certain fields behave like a term query?
I have a large Elasticsearch index of events. Each event has an eventid field consisting of a random hex string (e.g. f4fc38c993c1a8273f9c40eedc9050b7) as well as some other fields. The eventid is indexed as keyword in Elasticsearch.
If I query based on this field in Kibana, the query often runs into timeouts, because Kibana automatically generates a match query for eventid:f4fc38c993c1a8273f9c40eedc9050b7.
If I set a manual filter using { "query": { "term": { "eventid": "f4fc38c993c1a8273f9c40eedc9050b7" } } } (so a term instead of match query) I get a response quite quickly.
As I understand it, these should be practically equivalent: keyword fields aren't analyzed, so a match query on one should behave like a term query.
What am I missing?
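To make the comparison concrete, here are the two request bodies side by side as Ruby hashes. The match body is a generic match query on the same field, assumed for illustration; it is not necessarily the exact query Kibana generates.

```ruby
require "json"

event_id = "f4fc38c993c1a8273f9c40eedc9050b7"

# The manual filter from the question: an exact term lookup.
term_body = { query: { term: { eventid: event_id } } }

# A plain match query on the same keyword field (illustrative; Kibana's
# generated query may differ). A keyword field is not tokenized, so the
# query string is looked up as one whole term.
match_body = { query: { match: { eventid: event_id } } }

puts JSON.generate(term_body)
puts JSON.generate(match_body)
```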

How can I find the true score from Elasticsearch query string with a wildcard?

My ElasticSearch 2.x NEST query string search contains a wildcard:
Using NEST in C#:
var results = _client.Search<IEntity>(s => s
    .Index(Indices.AllIndices)
    .AllTypes()
    .Query(qs => qs
        .QueryString(qsq => qsq.Query("Micro*")))
    .From(pageNumber)
    .Size(pageSize));
This produces a request similar to:
$ curl -XGET 'http://localhost:9200/_all/_search?q=Micro*'
This code was derived from the Elasticsearch documentation page on covariant search results. The results are covariant: they are of mixed types and come from multiple indices. The problem I'm having is that all of the hits come back with a score of 1.
This is regardless of type or boosting. Can I boost by type or, alternatively, is there a way to reveal or "explain" the search result so I can order by score?
Multi-term queries such as the wildcard query are given, by default, a constant score equal to the query's boost. You can change this behaviour using .Rewrite():
var results = client.Search<IEntity>(s => s
    .Index(Indices.AllIndices)
    .AllTypes()
    .Query(qs => qs
        .QueryString(qsq => qsq
            .Query("Micro*")
            .Rewrite(RewriteMultiTerm.ScoringBoolean)
        )
    )
    .From(pageNumber)
    .Size(pageSize)
);
With RewriteMultiTerm.ScoringBoolean, the rewrite method first translates each term into a should clause in a bool query and keeps the scores as computed by the query.
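The NEST call above corresponds to a query_string query with a rewrite parameter. As a hedged sketch, assuming the Elasticsearch 2.x query DSL, where the ScoringBoolean rewrite is spelled scoring_boolean, the request body built as a Ruby hash looks roughly like this (the size value is a hypothetical placeholder):

```ruby
require "json"

# query_string with a wildcard; "rewrite" asks Elasticsearch to expand the
# wildcard into a bool query of should clauses that keep their real scores,
# instead of the default constant score.
body = {
  query: {
    query_string: {
      query: "Micro*",
      rewrite: "scoring_boolean"
    }
  },
  from: 0,  # stands in for pageNumber in the C# example
  size: 10  # stands in for pageSize (hypothetical value)
}

puts JSON.pretty_generate(body)
```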
Note that this can be CPU-intensive, and there is a default limit of 1024 bool query clauses that is easily hit on a large document corpus; running your query against the complete Stack Overflow data set (questions, answers and users), for example, hits the clause limit for questions. You may instead want to analyze the text with an analyzer that uses an edge n-gram token filter.
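The edge n-gram idea can be illustrated in plain Ruby: at index time each token is expanded into its leading substrings, so a later prefix search becomes an ordinary exact-term lookup that is scored normally instead of a wildcard. The edge_ngrams and index_terms helpers are hypothetical; their min_gram/max_gram parameters mirror the names of the Elasticsearch filter's settings, but a real setup would configure the token filter in the index's analysis settings.

```ruby
# Hypothetical helper imitating an edge n-gram token filter: it returns the
# leading substrings of a token, from min_gram up to max_gram characters.
def edge_ngrams(token, min_gram: 2, max_gram: 10)
  upper = [max_gram, token.length].min
  (min_gram..upper).map { |n| token[0, n] }
end

# Index-time expansion of a document's text (lowercased, split into words).
def index_terms(text)
  text.downcase.scan(/\w+/).flat_map { |word| edge_ngrams(word) }.uniq
end

terms = index_terms("Microsoft Office")
# A search for "micro" (no wildcard needed) is now an exact term match.
puts terms.include?("micro")
```

At query time the search text would normally be analyzed without the edge n-gram filter, so that "micro" is looked up as a single term rather than being expanded as well.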
Wildcard searches return a constant score of 1 by default.
You can boost by a particular type. See this:
How to boost index type in elasticsearch?

How to find all documents with a specific string in a field? (Elasticsearch)

I have a document with fields:
"provider": "AppStore",
"device_model": "iPad3,6[graphicsDeviceName: PowerVR SGX 554]",
"days_in_game": 34,
I need to get all documents whose device_model contains the string iPad. Is that possible?
There are two types of search queries in Elasticsearch, i.e. term queries and match queries. The match query first analyzes the query string, then looks for documents containing the resulting words and returns results ranked by how closely they match.
The term query is essentially a yes-or-no query: it returns only the documents that contain an exact match.
I think a term-level query is a better fit for your case. Since the field does not contain the exact word iPad but something like iPad3, you should use a prefix, wildcard or possibly regexp query, depending on what your documents actually contain (take a look at this).
You could use the following query:
{
  "query": {
    "prefix": {
      "device_model": "iPad"
    }
  }
}
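What the prefix query does with an untokenized value can be imitated in plain Ruby with String#start_with?. This is a minimal sketch assuming device_model is indexed as a single not_analyzed term; the second sample document is invented. Like the prefix query, the comparison is case-sensitive and does not analyze "iPad"; if the field were analyzed instead, its indexed terms would typically be lowercased and a prefix of "iPad" would not match them.

```ruby
# Sample documents; the first device_model value comes from the question,
# the second document is made up for illustration.
DOCS = [
  { "provider" => "AppStore",
    "device_model" => "iPad3,6[graphicsDeviceName: PowerVR SGX 554]",
    "days_in_game" => 34 },
  { "provider" => "GooglePlay",
    "device_model" => "Nexus 7",
    "days_in_game" => 12 }
].freeze

# Like a prefix query on an untokenized field: keep documents whose whole
# stored value starts with the given string (exact, case-sensitive).
def prefix_query(docs, field, prefix)
  docs.select { |doc| doc[field].to_s.start_with?(prefix) }
end

prefix_query(DOCS, "device_model", "iPad").each { |doc| puts doc["device_model"] }
```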

Query string query without analysis in ES?

I have a field "animal" that is not_analyzed. Is a document with "animal": "fox" searchable with a query string query if a user passes in "fo" as the query string? Or would the user have to pass in "fox" in order to match that document?
"fo" won't match "fox" if you're using not_analyzed.
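There are three options for the index setting of a string field in an Elasticsearch mapping:
analyzed: the text is run through an analyzer and the resulting terms are indexed. (With the standard analyzer, "FOX" matches "fox", but "fo" still doesn't, unless the analyzer produces partial terms such as n-grams.)
not_analyzed: the text is indexed (made searchable) exactly as it is. ("fo" doesn't match "fox"; only "fox" does.)
no: the field is not searchable.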
reference
If you search for fo*, it will work; otherwise you need to search for the exact term.
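A toy Ruby model of the three options makes the difference concrete. It is a simplification under stated assumptions: analyzed stands for a lowercase-and-split analyzer, a trailing * is treated as a prefix match, only single-word queries are modeled, and nothing else of the query_string syntax is covered.

```ruby
# Toy model of the three legacy "index" options for a string field.
# Assumption: "analyzed" behaves like a lowercase-and-split analyzer.
def indexed_terms(value, mode)
  case mode
  when :analyzed     then value.downcase.scan(/\w+/)
  when :not_analyzed then [value] # stored verbatim as a single term
  when :no           then []      # nothing indexed, so never searchable
  else
    raise ArgumentError, "unknown mode: #{mode}"
  end
end

# Exact term lookup, or a prefix lookup when the query ends in "*".
def searchable?(value, mode, query)
  terms = indexed_terms(value, mode)
  if query.end_with?("*")
    prefix = query.chomp("*")
    terms.any? { |term| term.start_with?(prefix) }
  else
    needle = mode == :analyzed ? query.downcase : query
    terms.include?(needle)
  end
end

[["fo", :not_analyzed], ["fox", :not_analyzed], ["fo*", :not_analyzed],
 ["fox", :no], ["fo", :analyzed]].each do |query, mode|
  puts "#{mode} / #{query.inspect}: #{searchable?('fox', mode, query)}"
end
```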

ElasticSearch: Matching multiple queries

I am using Tire (the Elasticsearch Ruby gem) and want to match a few fields against the keyword "community marketing". However, I also want Elasticsearch to return results for the keyword "communities marketing". The standard analyzer does not reduce "communities" to "community", so they are indexed as separate terms.
How do I get ElasticSearch to return me results for both "community marketing" and "communities marketing"? I prefer to do this in query time, rather than index time. I'm fine with ElasticSearch standard analyzer and prefer not to mess around with it.
fields = ["title", "popular_hash_tags"]
keyword = "communities marketing"
keyword2 = "community marketing"
s = Tire.search "articles" do
  query do
    match fields, keyword, :operator => "AND"
    # NOW I also want to match keyword2??
  end
end
I suggest digging through the query DSL of Elasticsearch. You will find a lot of interesting stuff.
For instance, the "should" clause of a bool filter.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-bool-filter.html
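As a hedged sketch of that suggestion, here is a request body built as a Ruby hash, assuming a bool query (the query-context counterpart of the bool filter linked above) with one should clause per keyword. multi_match with operator "and" is used to approximate the original AND behaviour; the Tire DSL needed to produce this body is not shown.

```ruby
require "json"

fields   = ["title", "popular_hash_tags"]
keywords = ["communities marketing", "community marketing"]

# One should clause per keyword. With multi_match's default best_fields type,
# operator "and" applies per field, so all words of a keyword must appear in
# the same field for that clause to match.
body = {
  query: {
    bool: {
      should: keywords.map { |kw| { multi_match: { query: kw, fields: fields, operator: "and" } } },
      minimum_should_match: 1 # at least one keyword clause has to match
    }
  }
}

puts JSON.pretty_generate(body)
```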
