Elasticsearch: ignore word breakers

I'm new to Elasticsearch and I've got a problem with querying.
I indexed strings like these:
my-super-string
my-other-string
my-little-string
These strings are slugs.
So there are no spaces, only alphanumeric characters and hyphens. The mapping for the related field is only "type=string".
I'm using a query like this:
{ "query":{ "query_string":{ "query": "*"+<MY_QUERY>+"*", "rewrite": "top_terms_10" } }}
Here "MY_QUERY" is also a slug, for example "my-super".
When searching for "my" I get results.
When searching for "my-super" I get no results, and I'd like to get "my-super-string".
Can someone help me with this? Thanks!

I would suggest using match_phrase instead of a query string with leading and trailing wildcards. Even the standard analyzer should be able to split a slug into tokens correctly, so there is no need for wildcards.
curl -XPUT "localhost:9200/slugs/doc/1" -d '{"slug": "my-super-string"}'
echo
curl -XPUT "localhost:9200/slugs/doc/2" -d '{"slug": "my-other-string"}'
echo
curl -XPUT "localhost:9200/slugs/doc/3" -d '{"slug": "my-little-string"}'
echo
curl -XPOST "localhost:9200/slugs/_refresh"
echo
echo "Searching for my"
curl "localhost:9200/slugs/doc/_search?pretty=true&fields=slug" -d '{"query" : { "match_phrase": {"slug": "my"} } }'
echo
echo "Searching for my-super"
curl "localhost:9200/slugs/doc/_search?pretty=true&fields=slug" -d '{"query" : { "match_phrase": {"slug": "my-super"} } }'
echo
echo "Searching for my-other"
curl "localhost:9200/slugs/doc/_search?pretty=true&fields=slug" -d '{"query" : { "match_phrase": {"slug": "my-other"} } }'
echo
echo "Searching for string"
curl "localhost:9200/slugs/doc/_search?pretty=true&fields=slug" -d '{"query" : { "match_phrase": {"slug": "string"} } }'
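To see why match_phrase finds "my-super-string" for the query "my-super", here is a minimal Python sketch that runs without a server. The analyze function is only a rough stand-in for the standard analyzer (it lowercases and splits on anything that is not a letter or digit), and phrase_match is a hypothetical helper, not an Elasticsearch API.

```python
import re

def analyze(text):
    # Rough stand-in for the standard analyzer (an assumption):
    # lowercase, then split on every run of non-alphanumeric characters.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def phrase_match(query, field_value):
    # True when the query tokens occur as one consecutive run in the field tokens.
    q, d = analyze(query), analyze(field_value)
    if not q:
        return False
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

slugs = ["my-super-string", "my-other-string", "my-little-string"]
for query in ["my", "my-super", "my-other", "string"]:
    print(query, "->", [s for s in slugs if phrase_match(query, s)])
```

Both the indexed slug and the query are split into the same tokens, so "my-super" becomes the phrase [my, super] and matches only the first slug.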
Alternatively, you can create your own analyzer that splits slugs into tokens only on "-":
curl -XDELETE localhost:9200/slugs
curl -XPUT localhost:9200/slugs -d '{
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 0,
"analysis": {
"analyzer" : {
"slug_analyzer" : {
"tokenizer": "slug_tokenizer",
"filter" : ["lowercase"]
}
},
"tokenizer" :{
"slug_tokenizer" : {
"type": "pattern",
"pattern": "-"
}
}
}
}
},
"mappings" :{
"doc" : {
"properties" : {
"slug" : {"type": "string", "analyzer" : "slug_analyzer"}
}
}
}
}'
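As a rough illustration of what such an analyzer produces, the sketch below mimics a pattern tokenizer on "-" followed by a lowercase filter. The function name slug_analyzer is reused only for readability, and dropping empty pieces is an assumption of this sketch.

```python
import re

def slug_analyzer(text, pattern="-"):
    # Split only where the pattern matches, mimicking a pattern tokenizer.
    # Empty pieces (e.g. from "a--b") are dropped; that is an assumption here.
    tokens = [t for t in re.split(pattern, text) if t]
    # Lowercase filter applied after tokenizing.
    return [t.lower() for t in tokens]

print(slug_analyzer("My-Super-String"))   # ['my', 'super', 'string']
print(slug_analyzer("my super-string"))   # ['my super', 'string']
```

The second call shows the design choice: only the hyphen is a token boundary, so anything else (here a space) stays inside a token.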

Elasticsearch bash script not working but if I copy and paste in the terminal it works

I have created a bash script to test my index, but it gives different results than when I paste the same curl commands directly into my terminal. I run the script with sh [file-name]. What could be happening?
#!/bin/sh
alias curl="curl -s"
echo "Delete index"
curl -X DELETE "localhost:9200/products?pretty"
echo
echo "Create index"
curl -X POST "localhost:9200/products?pretty" -d '
{
"products" : {
"settings" : {
"index" : {
"analysis" : {
"filter" : {
"my_synonym" : {
"ignore_case" : "true",
"expand" : "true",
"type" : "synonym",
"synonyms" : [ "pote, foundation"] }
},
"analyzer" : {
"folding_analyzer" : {
"filter" : [ "standard", "lowercase", "asciifolding", "my_synonym" ],
"tokenizer" : "standard"
}
}
},
"number_of_shards" : "1",
"number_of_replicas" : "0"
}
},
"mappings" : {
"product" : {
"dynamic" : "false",
"properties" : {
"brand_name" : {
"type" : "string",
"index_options" : "offsets",
"analyzer" : "folding_analyzer"
},
"product_name" : {
"type" : "string",
"index_options" : "offsets",
"analyzer" : "folding_analyzer"
}
}
}
}
}
}
'
echo
echo "Info index"
curl -XGET 'localhost:9200/products/_settings,_mappings?pretty'
echo
echo "Test doc:"
curl -X POST "localhost:9200/products/product/1?pretty" -d '{
"product_name": "Foundation Brush",
"brand_name": "Bobbi Brown"
}'
echo
echo "Test doc:"
curl -X POST "localhost:9200/products/product/2?pretty" -d '{
"product_name": "Foundation Primer",
"brand_name": "Laura Mercier"
}'
echo
echo "Test doc:"
curl -X POST "localhost:9200/products/product/3?pretty" -d '{
"product_name": "Lock-It Tattoo Foundation",
"brand_name": "Kat Von D"
}'
echo
echo "Test doc:"
curl -X POST "localhost:9200/products/product/4?pretty" -d '{
"product_name": "Diorskin Airflash Spray Foundation",
"brand_name": "Dior"
}'
echo
echo "Test doc:"
curl -X POST "localhost:9200/products/product/5?pretty" -d '{
"product_name": "Diorskin Airflash Spray Lancôme",
"brand_name": "Dior"
}'
echo
echo "Info index"
curl -XGET 'localhost:9200/products/_settings,_mappings?pretty'
echo
echo "Search all"
curl -X GET "localhost:9200/products/_search?pretty" -d '{
"query": {
"match_all": {}
}
}'
echo
Indexed documents in Elasticsearch are not immediately available for searching. They only show up in search results after a refresh operation takes place. By default, this operation occurs automatically every second, so when you paste the commands one by one, a refresh happens while you are fetching the settings and mappings.
When you run the bash script, there is simply no time for a refresh to take place. So you need to add an explicit refresh yourself after the last index command:
curl -XPOST 'http://localhost:9200/products/_refresh?pretty'
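The behaviour can be pictured with a toy model. This is only a sketch of the idea: ToyIndex and its methods are hypothetical names, and the real engine is far more involved. Documents land in a buffer first and become visible to search only once refresh has run.

```python
class ToyIndex:
    """Toy model of near-real-time search (not Elasticsearch code)."""

    def __init__(self):
        self._buffer = []      # indexed, but not yet searchable
        self._searchable = []  # visible to search

    def index(self, doc):
        self._buffer.append(doc)

    def refresh(self):
        # Make everything indexed so far visible to search.
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search_all(self):
        return list(self._searchable)

idx = ToyIndex()
idx.index({"product_name": "Foundation Brush"})
idx.index({"product_name": "Foundation Primer"})
before = idx.search_all()   # what a script sees when it searches immediately
idx.refresh()
after = idx.search_all()    # what it sees after an explicit refresh
print(len(before), len(after))  # 0 2
```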

How to perform wildcard search on a date field?

I have a field containing values like 2011-10-20, with the mapping:
"joiningDate": { "type": "date", "format": "dateOptionalTime" }
The following query ends up in a SearchPhaseExecutionException.
"wildcard" : { "ingestionDate" : "2011*" }
It seems ES (v1.1) doesn't support this directly. This post suggests the idea of scripting (an unaccepted answer says even more). I'll try that; I'm just asking whether anyone has done it already.
Expectation
A search for the string 13 should match all documents whose joiningDate field has one of these values:
2011-10-13
2013-01-11
2100-13-02
I'm not sure I understand your needs correctly, but I would suggest using a range query on the date field.
The query below returns the documents whose joiningDate lies between 2011-01-01 and 2012-01-01.
{
"query": {
"range": {
"joiningDate": {
"gt": "2011-01-01",
"lt": "2012-01-01"
}
}
}
}
I hope this could help you.
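If the prefix is always a year, the range body can be generated. The helper below is a hypothetical sketch (year_range_query is not part of any client library). It uses "gte" for the lower bound so that January 1 itself is included, which differs slightly from the "gt" in the query above.

```python
import json

def year_range_query(field, year):
    # Hypothetical helper: build a range query covering one calendar year.
    # "gte" includes January 1; "lt" excludes January 1 of the next year.
    return {
        "query": {
            "range": {
                field: {
                    "gte": "%04d-01-01" % year,
                    "lt": "%04d-01-01" % (year + 1),
                }
            }
        }
    }

print(json.dumps(year_range_query("joiningDate", 2011), indent=2))
```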
Edit (searching for dates that contain "13" itself)
I suggest using the "multi field" functionality of Elasticsearch.
It lets you index the "joiningDate" field as two different field types at the same time.
Try the example below.
Create an index:
curl -XPUT 'localhost:9200/blacksmith'
Define a mapping in which the type of the "joiningDate" field is "multi_field":
curl -XPUT 'localhost:9200/blacksmith/my_type/_mapping' -d '{
"my_type" : {
"properties" : {
"joiningDate" : {
"type": "multi_field",
"fields" : {
"joiningDate" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"verbatim" : {
"type" : "string",
"index" : "not_analyzed"
}
}
}
}
}
}'
Index 4 documents (3 of them contain "13"):
curl -s -XPOST 'localhost:9200/blacksmith/my_type/1' -d '{ "joiningDate": "2011-10-13" }'
curl -s -XPOST 'localhost:9200/blacksmith/my_type/2' -d '{ "joiningDate": "2013-01-11" }'
curl -s -XPOST 'localhost:9200/blacksmith/my_type/3' -d '{ "joiningDate": "2130-12-02" }'
curl -s -XPOST 'localhost:9200/blacksmith/my_type/4' -d '{ "joiningDate": "2014-12-02" }' # no 13
Run the wildcard query against the "joiningDate.verbatim" field, NOT the "joiningDate" field:
curl -XGET 'localhost:9200/blacksmith/my_type/_search?pretty' -d '{
"query": {
"wildcard": {
"joiningDate.verbatim": {
"wildcard": "*13*"
}
}
}
}'
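Because the verbatim sub-field stores each date as one unanalyzed string, the wildcard behaves like plain glob matching on that string. The sketch below checks this offline with Python's fnmatch, used here only as a stand-in for the wildcard query (fnmatch also understands [..] character sets, which are not used in this example).

```python
from fnmatch import fnmatchcase

# The four documents indexed above, keyed by document id.
dates = {1: "2011-10-13", 2: "2013-01-11", 3: "2130-12-02", 4: "2014-12-02"}

def wildcard_hits(pattern, values):
    # Ids of the values that match the glob pattern as whole strings.
    return sorted(doc_id for doc_id, v in values.items() if fnmatchcase(v, pattern))

print(wildcard_hits("*13*", dates))   # [1, 2, 3]
print(wildcard_hits("2011*", dates))  # [1]
```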

elasticsearch: indexing & searching arabic text

When I put the following things into elasticsearch-1.0.1 I expect the search queries to return the posts with id 200 and 201. But I get nothing returned. Hits:0.
What am I doing wrong? I'm searching exactly for what I put in, but get nothing out... (here's the test code for download: http://petoria.de/tmp/arabictest.sh).
But please keep in mind: I want to use the Arabic analyzer, because I want to develop my own analyzer later.
Best,
Koem
curl -XPOST localhost:9200/posts -d '{
"settings" : {
"number_of_shards" : 1
}
}'
curl -XPOST localhost:9200/posts/post/_mapping -d '
{
"post" : {
"properties" : {
"arabic_text" : { "type" : "string", "index" : "analyzed", "store" : true, "analyzer" : "arabic" },
"english_text" : { "type" : "string", "index" : "analyzed", "store" : true, "analyzer" : "english" }
}
}
}'
curl -XPUT 'http://localhost:9200/posts/post/200' -d '{
"english_text" : "palestinian1",
"arabic_text" : "فلسطينيه"
}'
curl -XPUT 'http://localhost:9200/posts/post/201' -d '{
"english_text" : "palestinian2",
"arabic_text" : "الفلسطينية"
}'
# search for palestinian1
curl -XGET 'http://localhost:9200/posts/post/_search' -d '{
"query": {
"query_string" : {
"analyzer" : "arabic",
"query" : "فلسطينيه"
}
}
}'
# search for palestinian2
curl -XGET 'http://localhost:9200/posts/post/_search' -d '{
"query": {
"query_string" : {
"analyzer" : "arabic",
"query" : "الفلسطينية"
}
}
}'
Just add the encoding to your request. You can do this by specifying the "Content-Type" header as below:
Content-Type:text/html;charset=UTF-8
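For a client written in Python, the same idea could look like the sketch below: the body is serialized and explicitly encoded as UTF-8 before it is sent. Only the encoding step is shown and no request is made. The header value is copied from the line above, and whether the server honours it is an assumption.

```python
import json

doc = {"english_text": "palestinian1", "arabic_text": "فلسطينيه"}

# Keep the Arabic characters as-is (no \uXXXX escapes) and encode explicitly.
body = json.dumps(doc, ensure_ascii=False).encode("utf-8")

# Header value taken from the answer above (an assumption, not verified here).
headers = {"Content-Type": "text/html;charset=UTF-8"}

# Decoding with the same charset gives back the original document.
decoded = json.loads(body.decode("utf-8"))
print(decoded == doc)
```

If the terminal or client sends the bytes in a different encoding, the server sees different characters than the ones indexed, so the query terms cannot match.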

How can I boost certain fields over others in elasticsearch?

My goal is to apply the boost to field "name" (see example below), but I have two problems when I search for "john":
search is also matching {name: "dany", message: "hi bob"} when name is "dany" and
search is not boosting name over message (rows with name="john" should be on the top)
The gist is on https://gist.github.com/tomaspet262/5535774
(since stackoverflow's form submit returned 'Your post appears to contain code that is not properly formatted as code', which was formatted properly).
I would suggest using query-time boosting instead of index-time boosting:
# DELETE
curl -XDELETE 'http://localhost:9200/test'
echo
# CREATE
curl -XPUT 'http://localhost:9200/test?pretty=1' -d '{
"settings": {
"analysis" : {
"analyzer" : {
"my_analyz_1" : {
"filter" : [
"standard",
"lowercase",
"asciifolding"
],
"type" : "custom",
"tokenizer" : "standard"
}
}
}
}
}'
echo
# DEFINE
curl -XPUT 'http://localhost:9200/test/posts/_mapping?pretty=1' -d '{
"posts" : {
"properties" : {
"name" : {
"type" : "string",
"analyzer" : "my_analyz_1"
},
"message" : {
"type" : "string",
"analyzer" : "my_analyz_1"
}
}
}
}'
echo
# INSERT
curl localhost:9200/test/posts/1 -d '{"name": "john", "message": "hi john"}'
curl localhost:9200/test/posts/2 -d '{"name": "bob", "message": "hi john, how are you?"}'
curl localhost:9200/test/posts/3 -d '{"name": "john", "message": "bob?"}'
curl localhost:9200/test/posts/4 -d '{"name": "dany", "message": "hi bob"}'
curl localhost:9200/test/posts/5 -d '{"name": "dany", "message": "hi john"}'
echo
# REFRESH
curl -XPOST localhost:9200/test/_refresh
echo
# SEARCH
curl "localhost:9200/test/posts/_search?pretty=1" -d '{
"query": {
"multi_match": {
"query": "john",
"fields": ["name^2", "message"]
}
}
}'
I'm not sure if this is relevant in this case, but when testing with such small amounts of data, I always use 1 shard instead of the default settings, to rule out issues caused by distributed calculation.
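The effect of "name^2" can be illustrated with a toy scorer. This is a deliberately simplified sketch: it counts term occurrences per field and multiplies by the boost, whereas real Elasticsearch scoring uses TF/IDF and combines fields differently. The function names are hypothetical.

```python
import re

# The five documents inserted above, keyed by id.
docs = {
    1: {"name": "john", "message": "hi john"},
    2: {"name": "bob", "message": "hi john, how are you?"},
    3: {"name": "john", "message": "bob?"},
    4: {"name": "dany", "message": "hi bob"},
    5: {"name": "dany", "message": "hi john"},
}

def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def score(doc, query, boosts):
    # Toy score: boosted count of query terms in each field, summed over fields.
    q = tokens(query)
    return sum(
        boost * sum(tokens(doc.get(field, "")).count(t) for t in q)
        for field, boost in boosts.items()
    )

def search(query, boosts):
    scored = [(score(d, query, boosts), doc_id) for doc_id, d in docs.items()]
    # Highest score first; ties broken by id; non-matching documents dropped.
    return [i for s, i in sorted(scored, key=lambda p: (-p[0], p[1])) if s > 0]

print(search("john", {"name": 2.0, "message": 1.0}))  # [1, 3, 2, 5]
```

Documents whose name is "john" (1 and 3) rank above those that only mention john in the message, and document 4 is not returned at all.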

No query registered for [match]

I'm working through some examples in the ElasticSearch Server book and trying to write a simple match query
{
"query" : {
"match" : {
"displayname" : "john smith"
}
}
}
This gives me the error:
{\"error\":\"SearchPhaseExecutionException[Failed to execute phase [query],
....
SearchParseException[[scripts][4]: from[-1],size[-1]: Parse Failure [Failed to parse source
....
QueryParsingException[[kb.cgi] No query registered for [match]]; }
I also tried
{
"match" : {
"displayname" : "john smith"
}
}
as per examples on http://www.elasticsearch.org/guide/reference/query-dsl/match-query/
EDIT: I think the remote server I'm using is not running the latest version (0.20.5), because using "text" instead of "match" seems to make the query work.
I've seen a similar issue reported here: http://elasticsearch-users.115913.n3.nabble.com/Character-escaping-td4025802.html
It appears the remote server I'm using is not running the latest version of ElasticSearch (0.20.5), so the "match" query is not supported. Using "text" instead works.
I came to this conclusion after seeing a similar issue reported here: http://elasticsearch-users.115913.n3.nabble.com/Character-escaping-td4025802.html
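If a client has to talk to servers of different ages, the query name could be picked from the server version. The helper below is a hypothetical sketch. The default threshold match_since=(0, 19, 9) is an assumption about when "match" replaced "text" and should be checked against the release notes of the server in use.

```python
def full_text_query(field, text, server_version, match_since=(0, 19, 9)):
    # Hypothetical helper: use "match" on newer servers, "text" on older ones.
    # match_since is an assumed threshold, not a verified fact.
    name = "match" if tuple(server_version) >= tuple(match_since) else "text"
    return {"query": {name: {field: text}}}

print(full_text_query("displayname", "john smith", (0, 20, 5)))
print(full_text_query("displayname", "john smith", (0, 19, 0)))
```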
Your first query looks fine, but perhaps the way you use it in the request is not correct. Here is a complete example that works:
curl -XDELETE localhost:9200/test-idx
curl -XPUT localhost:9200/test-idx -d '{
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 0
}
},
"mappings": {
"doc": {
"properties": {
"name": {
"type": "string", "index": "analyzed"
}
}
}
}
}
'
curl -XPUT localhost:9200/test-idx/doc/1 -d '{
"name": "John Smith"
}'
curl -XPOST localhost:9200/test-idx/_refresh
echo
curl "localhost:9200/test-idx/_search?pretty=true" -d '{
"query": {
"match" : {
"name" : "john smith"
}
}
}
'
echo
