How to perform wildcard search on a date field? - elasticsearch

I have a field containing values like 2011-10-20, with the mapping:
"joiningDate": { "type": "date", "format": "dateOptionalTime" }
The following query ends up in a SearchPhaseExecutionException.
"wildcard" : { "ingestionDate" : "2011*" }
It seems ES (v1.1) doesn't support this out of the box. This post suggests the idea of scripting (an unaccepted answer says even more). I'll try that; just asking if anyone has done it already?
Expectation
A search string 13 should match all documents where the joiningDate field has values:
2011-10-13
2013-01-11
2130-12-02
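For the record, the scripting route mentioned above would look roughly like the sketch below: a script filter that renders the date back into a string and substring-matches it. This is only a sketch, assuming ES 1.x script filters with the default MVEL engine and the joiningDate field from the mapping above; untested.
# sketch: format the date value as a string, then check it for the substring "13"
curl -XPOST 'localhost:9200/_search?pretty' -d '{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "script": {
          "script": "doc[\"joiningDate\"].date.toString(\"yyyy-MM-dd\").contains(\"13\")"
        }
      }
    }
  }
}'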

I'm not sure if I understand your needs correctly, but I would suggest using a range query for the date field.
The query below should return the results you want:
{
  "query": {
    "range": {
      "joiningDate": {
        "gte": "2011-01-01",
        "lt": "2012-01-01"
      }
    }
  }
}
I hope this helps.
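If you don't want to compute the end date by hand, range queries in ES 1.x also accept date math, so the same year can be written like this (a sketch; "||+1y" anchors the math at the given date):
{
  "query": {
    "range": {
      "joiningDate": {
        "gte": "2011-01-01",
        "lt": "2011-01-01||+1y"
      }
    }
  }
}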
Edit (searching for dates containing "13" itself)
I suggest using the "multi field" functionality of Elasticsearch.
It means you can index the "joiningDate" field as two different field types at the same time.
Please see and try the example code below.
Create an index
curl -XPUT 'localhost:9200/blacksmith'
Define a mapping in which the "joiningDate" field is of type "multi_field".
curl -XPUT 'localhost:9200/blacksmith/my_type/_mapping' -d '{
  "my_type" : {
    "properties" : {
      "joiningDate" : {
        "type" : "multi_field",
        "fields" : {
          "joiningDate" : {
            "type" : "date",
            "format" : "dateOptionalTime"
          },
          "verbatim" : {
            "type" : "string",
            "index" : "not_analyzed"
          }
        }
      }
    }
  }
}'
Index 4 documents (3 of them containing "13")
curl -s -XPOST 'localhost:9200/blacksmith/my_type/1' -d '{ "joiningDate": "2011-10-13" }'
curl -s -XPOST 'localhost:9200/blacksmith/my_type/2' -d '{ "joiningDate": "2013-01-11" }'
curl -s -XPOST 'localhost:9200/blacksmith/my_type/3' -d '{ "joiningDate": "2130-12-02" }'
curl -s -XPOST 'localhost:9200/blacksmith/my_type/4' -d '{ "joiningDate": "2014-12-02" }' # no 13
Run the wildcard query against the "joiningDate.verbatim" field, NOT the "joiningDate" field.
curl -XGET 'localhost:9200/blacksmith/my_type/_search?pretty' -d '{
  "query": {
    "wildcard": {
      "joiningDate.verbatim": {
        "wildcard": "*13*"
      }
    }
  }
}'

Related

elasticsearch: getMinuteOfDay() applied to time() in date range filter

I'm trying to build an elasticsearch query to return documents for times between midnight and the current time of day, for all dates. For example, if I run the query at 09:00:00, then the query will return any document with a timestamp between midnight and 09:00:00 regardless of the date.
Here's an example dataset:
curl -XPUT localhost:9200/test/dt/_mapping -d '{"dt" : {"properties" : {"created_at" : {"type" : "date", "format": "yyyy-MM-dd HH:mm:ss" }}}}'
curl -XPOST localhost:9200/test/dt/1 -d '{ "created_at": "2014-10-09 07:00:00" }'
curl -XPOST localhost:9200/test/dt/2 -d '{ "created_at": "2014-10-09 14:00:00" }'
curl -XPOST localhost:9200/test/dt/3 -d '{ "created_at": "2014-10-08 08:00:00" }'
curl -XPOST localhost:9200/test/dt/4 -d '{ "created_at": "2014-10-08 15:00:00" }'
curl -XPOST localhost:9200/test/dt/5 -d '{ "created_at": "2014-10-07 09:00:00" }'
curl -XPOST localhost:9200/test/dt/6 -d '{ "created_at": "2014-10-07 16:00:00" }'
and an example filter:
curl -XPOST localhost:9200/test/dt/_search?pretty -d '{
  "query": {
    "filtered" : {
      "filter" : {
        "bool": {
          "must": [
            {
              "script" : {
                "script" : "doc[\"created_at\"].date.getMinuteOfDay() < 600"
              }
            }
          ]
        }
      }
    }
  }
}'
where 600 is a static parameter that I want to replace with a dynamic minutes parameter depending on the time of day when the query is run.
My best attempt is to use the getMinuteOfDay() method to filter each doc, but my problem is how to get getMinuteOfDay() for the current time. I've tried variations using time() instead of the hard-coded parameter 600 in the above query, but can't figure it out.
Any ideas?
This is pretty late, but it can be done with DateTime.now(). You should change your query to:
GET test/dt/_search
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "script": {
                "script": "doc[\"created_at\"].date.getMinuteOfDay() < DateTime.now().getMinuteOfDay()"
              }
            }
          ]
        }
      }
    }
  }
}
Hope this helps!
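If you would rather compute the cutoff client-side, another option is to pass it in as a script parameter instead of hard-coding it. A sketch (the max_minutes parameter name is made up here):
# 540 = 09:00 expressed as minutes of the day; "max_minutes" is an arbitrary param name
curl -XPOST localhost:9200/test/dt/_search?pretty -d '{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "doc[\"created_at\"].date.getMinuteOfDay() < max_minutes",
          "params": { "max_minutes": 540 }
        }
      }
    }
  }
}'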

Elasticsearch index aliases (w/ routing) and parent/child docs

I'm trying to set up an index with the following characteristics:
The index houses data for many projects. Most work is project-specific, so I set up aliases for each project, using project_id as the routing field. (And as an associated term filter.)
The data in question have a parent/child structure. For simplicity, let's call the doc types "mama" and "baby."
So we create the index and aliases:
curl -XDELETE http://localhost:9200/famtest
curl -XPOST http://localhost:9200/famtest -d '{
  "mappings" : {
    "mama" : {
      "properties" : {
        "project_id" : { "type" : "string", "index" : "not_analyzed" }
      }
    },
    "baby" : {
      "_parent" : { "type" : "mama" },
      "properties" : {
        "project_id" : { "type" : "string", "index" : "not_analyzed" }
      }
    }
  }
}'
curl -XPOST "http://localhost:9200/_aliases" -d '
{ "actions":
[ { "add":
{ "alias": "family1",
"index": "famtest",
"routing": "100",
"filter":
{ "term": { "project_id": "100" } }
}
} ]
}'
curl -XPOST "http://localhost:9200/_aliases" -d '
{ "actions":
[ { "add":
{ "alias": "family2",
"index": "famtest",
"routing": "200",
"filter":
{ "term": { "project_id": "200" } }
}
} ]
}'
Now let's make some mamas:
curl -XPOST localhost:9200/family1/mama/1 -d '{ "name" : "Family 1 Mom", "project_id" : "100" }'
curl -XPOST localhost:9200/family2/mama/2 -d '{ "name" : "Family 2 Mom", "project_id" : "200" }'
These documents are now available via /familyX/_search. So now we want to add a baby:
curl -XPOST localhost:9200/family1/baby/1?parent=1 -d '{ "name": "Fam 1 Baby","project_id" : "100" }'
Unfortunately, ES doesn't like that:
{"error":"ElasticSearchIllegalArgumentException[Alias [family1] has index routing associated with it [100], and was provided with routing value [1], rejecting operation]","status":400}
So... any idea how to use alias routing and still set the parent id? If I understand this right, it shouldn't be a problem: all project operations ("family1", in this case) go through the alias, so parent and child docs will wind up on the same shard anyway. Is there some alternative way to set the parent id, without interfering with the routing?
Thanks. Please let me know if I can be more specific.
Interesting question! As you already know the parent id is used for routing too since children must be indexed in the same shard as the parent documents. What you're trying to do is fine, since parent and children would fall into the same family, thus in the same shard anyway since you configured the routing in the family alias.
But I'm afraid the parent id, which is itself used as a routing value, takes priority over the routing defined in the alias; overriding the alias routing is not allowed, and that's why you get the error. In fact, if you try again providing the routing explicitly in your index request, it works:
curl -XPOST 'localhost:9200/family1/baby/1?parent=1&routing=100' -d '{ "name": "Fam 1 Baby","project_id" : "100" }'
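To confirm the child is then reachable through the alias, a has_child query is one way to check (a sketch; the match on the baby's name is just an example):
curl -XGET 'localhost:9200/family1/_search?pretty' -d '{
  "query": {
    "has_child": {
      "type": "baby",
      "query": { "match": { "name": "baby" } }
    }
  }
}'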
I would file a GitHub issue with a curl recreation.

How can I boost certain fields over others in elasticsearch?

My goal is to apply the boost to the field "name" (see example below), but I have two problems when I search for "john":
the search also matches {name: "dany", message: "hi bob"}, where name is "dany", and
the search does not boost name over message (documents with name="john" should be at the top)
The gist is on https://gist.github.com/tomaspet262/5535774
(since stackoverflow's form submit returned 'Your post appears to contain code that is not properly formatted as code', which was formatted properly).
I would suggest using query time boosting instead of index time boosting.
# DELETE
curl -XDELETE 'http://localhost:9200/test'
echo
# CREATE
curl -XPUT 'http://localhost:9200/test?pretty=1' -d '{
  "settings": {
    "analysis" : {
      "analyzer" : {
        "my_analyz_1" : {
          "filter" : [
            "standard",
            "lowercase",
            "asciifolding"
          ],
          "type" : "custom",
          "tokenizer" : "standard"
        }
      }
    }
  }
}'
echo
# DEFINE
curl -XPUT 'http://localhost:9200/test/posts/_mapping?pretty=1' -d '{
  "posts" : {
    "properties" : {
      "name" : {
        "type" : "string",
        "analyzer" : "my_analyz_1"
      },
      "message" : {
        "type" : "string",
        "analyzer" : "my_analyz_1"
      }
    }
  }
}'
echo
# INSERT
curl localhost:9200/test/posts/1 -d '{"name": "john", "message": "hi john"}'
curl localhost:9200/test/posts/2 -d '{"name": "bob", "message": "hi john, how are you?"}'
curl localhost:9200/test/posts/3 -d '{"name": "john", "message": "bob?"}'
curl localhost:9200/test/posts/4 -d '{"name": "dany", "message": "hi bob"}'
curl localhost:9200/test/posts/5 -d '{"name": "dany", "message": "hi john"}'
echo
# REFRESH
curl -XPOST localhost:9200/test/_refresh
echo
# SEARCH
curl "localhost:9200/test/posts/_search?pretty=1" -d '{
"query": {
"multi_match": {
"query": "john",
"fields": ["name^2", "message"]
}
}
}'
I'm not sure if this is relevant in this case, but when testing with such small amounts of data, I always use 1 shard instead of the default settings, to rule out scoring artifacts from distributed calculation.
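Concretely, that just means folding shard settings into the index creation call above, along these lines (a sketch, reusing the analyzer from the CREATE step):
curl -XPUT 'http://localhost:9200/test?pretty=1' -d '{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "my_analyz_1": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["standard", "lowercase", "asciifolding"]
        }
      }
    }
  }
}'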

No query registered for [match]

I'm working through some examples in the ElasticSearch Server book and trying to write a simple match query:
{
  "query" : {
    "match" : {
      "displayname" : "john smith"
    }
  }
}
This gives me the error:
{\"error\":\"SearchPhaseExecutionException[Failed to execute phase [query],
....
SearchParseException[[scripts][4]: from[-1],size[-1]: Parse Failure [Failed to parse source
....
QueryParsingException[[kb.cgi] No query registered for [match]]; }
I also tried
{
  "match" : {
    "displayname" : "john smith"
  }
}
as per examples on http://www.elasticsearch.org/guide/reference/query-dsl/match-query/
EDIT: I think the remote server I'm using is not the latest 0.20.5 version because using "text" instead of "match" seems to allow the query to work
I've seen a similar issue reported here: http://elasticsearch-users.115913.n3.nabble.com/Character-escaping-td4025802.html
It appears the remote server I'm using is not the latest 0.20.5 version of ElasticSearch, consequently the "match" query is not supported - instead it is "text", which works
I came to this conclusion after seeing a similar issue reported here: http://elasticsearch-users.115913.n3.nabble.com/Character-escaping-td4025802.html
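For reference, the same search written with the older query name would be (a sketch; "text" was the name of this query before it was renamed to "match"):
{
  "query" : {
    "text" : {
      "displayname" : "john smith"
    }
  }
}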
Your first query looks fine, but perhaps the way you use it in the request is not correct. Here is a complete example that works:
curl -XDELETE localhost:9200/test-idx
curl -XPUT localhost:9200/test-idx -d '{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "name": {
          "type": "string", "index": "analyzed"
        }
      }
    }
  }
}'
curl -XPUT localhost:9200/test-idx/doc/1 -d '{
  "name": "John Smith"
}'
curl -XPOST localhost:9200/test-idx/_refresh
echo
curl "localhost:9200/test-idx/_search?pretty=true" -d '{
  "query": {
    "match" : {
      "name" : "john smith"
    }
  }
}'
echo

Elasticsearch ignore words breakers

I'm new to Elasticsearch and I've got a problem with querying.
I indexed strings like these:
my-super-string
my-other-string
my-little-string
These strings are slugs, so there are no spaces, only alphanumeric characters and hyphens. The mapping for the related field is just "type=string".
I'm using a query like this:
{ "query":{ "query_string":{ "query": "*"+<MY_QUERY>+"*", "rewrite": "top_terms_10" } }}
Where "MY_QUERY" is also a slug. Something like "my-super" for example.
When searching for "my" i get results.
When searching for "my-super" i get no results and i'd like to have "my-super-string".
Can someone help me on this? Thanks!
I would suggest using match_phrase instead of a query string with leading and trailing wildcards. Even the standard analyzer should be able to split slugs into tokens correctly, so there is no need for wildcards.
curl -XPUT "localhost:9200/slugs/doc/1" -d '{"slug": "my-super-string"}'
echo
curl -XPUT "localhost:9200/slugs/doc/2" -d '{"slug": "my-other-string"}'
echo
curl -XPUT "localhost:9200/slugs/doc/3" -d '{"slug": "my-little-string"}'
echo
curl -XPOST "localhost:9200/slugs/_refresh"
echo
echo "Searching for my"
curl "localhost:9200/slugs/doc/_search?pretty=true&fields=slug" -d '{"query" : { "match_phrase": {"slug": "my"} } }'
echo
echo "Searching for my-super"
curl "localhost:9200/slugs/doc/_search?pretty=true&fields=slug" -d '{"query" : { "match_phrase": {"slug": "my-super"} } }'
echo
echo "Searching for my-other"
curl "localhost:9200/slugs/doc/_search?pretty=true&fields=slug" -d '{"query" : { "match_phrase": {"slug": "my-other"} } }'
echo
echo "Searching for string"
curl "localhost:9200/slugs/doc/_search?pretty=true&fields=slug" -d '{"query" : { "match_phrase": {"slug": "string"} } }'
Alternatively, you can create your own analyzer that will split slugs into tokens only on "-":
curl -XDELETE localhost:9200/slugs
curl -XPUT localhost:9200/slugs -d '{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "analysis": {
        "analyzer" : {
          "slug_analyzer" : {
            "tokenizer": "slug_tokenizer",
            "filter" : ["lowercase"]
          }
        },
        "tokenizer" : {
          "slug_tokenizer" : {
            "type": "pattern",
            "pattern": "-"
          }
        }
      }
    }
  },
  "mappings" : {
    "doc" : {
      "properties" : {
        "slug" : { "type": "string", "analyzer" : "slug_analyzer" }
      }
    }
  }
}'
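You can sanity-check what the custom analyzer produces with the _analyze API (a sketch):
# expected tokens: "my", "super", "string"
curl 'localhost:9200/slugs/_analyze?analyzer=slug_analyzer&pretty' -d 'my-super-string'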
