null_value mapping in Elasticsearch

I have created a mapping for a tweetb type in a twitter index:
curl -XPUT http://www.mydomain.com:9200/twitter/tweetb/_mapping -d '{
"twitter": {
"mappings": {
"tweetb": {
"properties": {
"message": {
"type": "string",
"null_value": "NA"
}
}
}
}
}
}'
Then, I put one document:
curl -XPUT http://www.mydomain.com:9200/twitter/tweetb/1 -d '{"message": null}'
Then, I tried to get the inserted doc back:
curl -XGET http://www.mydomain.com:9200/twitter/tweetb/1
And that returned:
{
"_index": "twitter",
"_type": "tweetb",
"_id": "1",
"_version": 2,
"found" : true,
"_source" : { "message": null }
}
I was expecting "message" : "NA" in the _source field. However, it looks like "null_value" isn't working. Am I missing something?

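The "null_value" mapping option does not change the stored value; it only changes the value that is indexed and therefore used in searches. The _source field always returns the original JSON you sent, so it still shows null.
If you search for your "message" using "NA", the document should appear in the results: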
curl -XPOST http://www.mydomain.com:9200/twitter/tweetb/_search -d '{
"query" : {
"match" : { "message" : "NA" }
}
}'
Note that the hit still shows the actual value, null, in its _source. If you now add a second document whose raw value is literally "NA" and repeat the search, both documents should be returned for the above query: one with the value "NA" and the other with null.
The same applies to other query types, because they all run against the indexed terms. With the default (standard) analyzer the null_value "NA" is indexed as the lowercase term "na", and a regexp query is not analyzed, which is why a lowercase n.* matches but, somewhat surprisingly, N.* does not:
curl -XPOST http://www.mydomain.com:9200/twitter/tweetb/_search -d '{
"query" : {
"regexp" : { "message" : "n.*" }
}
}'
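To make the difference between the stored and the indexed value concrete, here is a minimal Python sketch of the behaviour described above. It is a toy model, not Elasticsearch code: `ToyIndex`, `analyze`, and the method names are hypothetical, and the analyzer is assumed to do nothing more than split text into words and lowercase them, roughly as the standard analyzer does.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: word tokens, lowercased."""
    return [token.lower() for token in re.findall(r"\w+", text)]


class ToyIndex:
    """Toy model of one string field with a null_value (hypothetical, not ES code)."""

    def __init__(self, null_value=None):
        self.null_value = null_value
        self.sources = {}  # doc id -> original JSON, returned as _source
        self.terms = {}    # indexed term -> set of doc ids

    def index(self, doc_id, message):
        # _source keeps exactly what was sent, including an explicit null.
        self.sources[doc_id] = {"message": message}
        # Only the indexed value is replaced by null_value.
        indexed = self.null_value if message is None else message
        if indexed is not None:
            for term in analyze(indexed):
                self.terms.setdefault(term, set()).add(doc_id)

    def match(self, text):
        # A match query analyzes its input before looking up terms.
        hits = set()
        for term in analyze(text):
            hits |= self.terms.get(term, set())
        return sorted(hits)

    def regexp(self, pattern):
        # A regexp query is not analyzed; it is tested against the indexed terms.
        hits = set()
        for term, ids in self.terms.items():
            if re.fullmatch(pattern, term):
                hits |= ids
        return sorted(hits)


idx = ToyIndex(null_value="NA")
idx.index("1", None)  # explicit null
idx.index("2", "NA")  # literal string "NA"

print(idx.match("NA"))    # both documents
print(idx.regexp("n.*"))  # both documents, because the indexed term is "na"
print(idx.regexp("N.*"))  # nothing, because no indexed term starts with "N"
print(idx.sources["1"])   # the source still holds the null (None in Python)
```

The point of the sketch is that `sources` and `terms` are filled separately: the substitution only ever touches `terms`, which is why a GET keeps returning null.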

Related

Prefix Query not working in ElasticSearch

I've loaded the 'accounts.json' data from the following link into an ES instance on my machine:
https://www.elastic.co/guide/en/elasticsearch/reference/1.5/_exploring_your_data.html
This adds 1000 docs to the index 'bank' with the type 'account'. Simple enough!
Every doc has this structure:
{
"account_number": 0,
"balance": 16623,
"firstname": "Bradshaw",
"lastname": "Mckenzie",
"age": 29,
"gender": "F",
"address": "244 Columbus Place",
"employer": "Euron",
"email": "bradshawmckenzie@euron.com",
"city": "Hobucken",
"state": "CO"
}
Now I'm trying to run a simple 'prefix' query on this index.
Here is one that works just fine (comes back with plenty of correct results):
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "prefix" : { "address" : "963" } }
}'
Here is another one (this one doesn't work):
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "prefix" : { "firstname" : "Op" } }
}'
But there is definitely a matching record that the previous request should have returned. The following match query finds it:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "match" : { "firstname" : "Opal" } }
}'
I have also verified the mapping and there doesn't seem to be any difference in the 2 fields, 'firstname' and 'address':
curl -XGET 'localhost:9200/bank/_mapping/account?pretty'
Here is the relevant mapping portion for those 2:
"address": {
"type": "string"
}
"firstname": {
"type": "string"
}
Can't figure out why one prefix query works and the other one doesn't. Any pointers on what I'm missing?
I think you'll find that this will do what you expect:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "prefix" : { "firstname" : "op" } }
}'
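The reason is that, since you have not specified an analyzer, the standard analyzer is used, which converts tokens to lowercase at index time, so "Opal" is indexed as the term "opal". The prefix query is not analyzed, so "Op" is compared as-is against the lowercase terms and finds nothing, whereas the match query analyzes (and lowercases) "Opal" before searching. The address prefix "963" works because digits have no case.
Here is some code that I used to test out my suspicion: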
http://sense.qbox.io/gist/a4f087cee78fd694dd4223eb56e842e1cd1d5847
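The effect can also be reproduced outside Elasticsearch with a small Python sketch. It is only an approximation under stated assumptions: `standard_analyze` and `prefix_query` are hypothetical helpers, the analyzer is reduced to "split into words and lowercase", and the address "963 Example Street" is a made-up value rather than a record from accounts.json.

```python
import re


def standard_analyze(text):
    # Simplified stand-in for the standard analyzer: word tokens, lowercased.
    return [token.lower() for token in re.findall(r"\w+", text)]


def prefix_query(indexed_terms, prefix):
    # The prefix is compared with the indexed terms as-is (no analysis).
    return [term for term in indexed_terms if term.startswith(prefix)]


firstname_terms = standard_analyze("Opal")
address_terms = standard_analyze("963 Example Street")  # made-up address

print(prefix_query(firstname_terms, "Op"))  # no hit: the indexed term is "opal"
print(prefix_query(firstname_terms, "op"))  # hit
print(prefix_query(address_terms, "963"))   # hit: digits are unaffected by lowercasing
```

If case-sensitive prefixes are needed, the usual alternative is to map the field as not_analyzed so the original casing is what gets indexed.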

How to perform wildcard search on a date field?

I have a field containing values like 2011-10-20, with this mapping:
"joiningDate": { "type": "date", "format": "dateOptionalTime" }
The following query ends in a SearchPhaseExecutionException:
"wildcard" : { "joiningDate" : "2011*" }
It seems ES (v1.1) does not support wildcard queries on date fields. This post suggests scripting (the unaccepted answer goes into more detail). I'll try that, but I'm asking in case anyone has already done it.
Expectation
The search string 13 should match all documents whose joiningDate field has any of these values:
2011-10-13
2013-01-11
2100-13-02
I'm not sure I understand your needs correctly, but I would suggest using a "range query" on the date field.
The query below should return the results you want for the 2011* case:
{
"query": {
"range": {
"joiningDate": {
"gte": "2011-01-01",
"lt": "2012-01-01"
}
}
}
}
I hope this helps.
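The idea behind the range query is that a year prefix such as 2011* corresponds to a half-open date interval. A minimal Python sketch of that equivalence follows; `year_range` and `in_range` are hypothetical helper names, and the sketch assumes an inclusive lower bound so that 1 January itself is covered.

```python
from datetime import date


def year_range(year):
    """Return (gte, lt) bounds that cover every date in the given year."""
    return date(year, 1, 1), date(year + 1, 1, 1)


def in_range(value, gte, lt):
    # Inclusive lower bound, exclusive upper bound.
    return gte <= value < lt


gte, lt = year_range(2011)
dates = [date(2011, 1, 1), date(2011, 10, 20), date(2012, 1, 1)]
matches = [d.isoformat() for d in dates if in_range(d, gte, lt)]
print(matches)
```

With an exclusive lower bound ("gt") the first day of the year would be left out, which is usually not what a 2011* search intends.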
Edit (searching for dates that contain "13"):
I suggest using the "multi field" functionality of Elasticsearch.
It lets you index the "joiningDate" field as two different field types at the same time.
Please try the example code below.
Create an index:
curl -XPUT 'localhost:9200/blacksmith'
Define a mapping in which the type of the "joiningDate" field is "multi_field":
curl -XPUT 'localhost:9200/blacksmith/my_type/_mapping' -d '{
"my_type" : {
"properties" : {
"joiningDate" : {
"type": "multi_field",
"fields" : {
"joiningDate" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"verbatim" : {
"type" : "string",
"index" : "not_analyzed"
}
}
}
}
}
}'
Index 4 documents (3 of which contain "13"):
curl -s -XPOST 'localhost:9200/blacksmith/my_type/1' -d '{ "joiningDate": "2011-10-13" }'
curl -s -XPOST 'localhost:9200/blacksmith/my_type/2' -d '{ "joiningDate": "2013-01-11" }'
curl -s -XPOST 'localhost:9200/blacksmith/my_type/3' -d '{ "joiningDate": "2130-12-02" }'
curl -s -XPOST 'localhost:9200/blacksmith/my_type/4' -d '{ "joiningDate": "2014-12-02" }' # no 13
Run the wildcard query against the "joiningDate.verbatim" field, NOT the "joiningDate" field:
curl -XGET 'localhost:9200/blacksmith/my_type/_search?pretty' -d '{
"query": {
"wildcard": {
"joiningDate.verbatim": {
"wildcard": "*13*"
}
}
}
}'
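Why the verbatim sub-field helps can be illustrated with a short Python sketch: once the date is also kept as an untouched string, a wildcard is plain pattern matching on that string. This is a rough model only. The `wildcard` function is hypothetical, and `fnmatchcase` is used as a stand-in because it shares the `*` and `?` wildcards (it additionally understands `[...]` sets, which this sketch does not rely on).

```python
from fnmatch import fnmatchcase

# The verbatim (not_analyzed) copies of the four documents indexed above.
verbatim = {
    "1": "2011-10-13",
    "2": "2013-01-11",
    "3": "2130-12-02",
    "4": "2014-12-02",
}


def wildcard(values, pattern):
    # Match the pattern against each whole string, case-sensitively.
    return sorted(doc_id for doc_id, value in values.items()
                  if fnmatchcase(value, pattern))


print(wildcard(verbatim, "*13*"))   # documents 1, 2 and 3
print(wildcard(verbatim, "2011*"))  # document 1 only
```

Keep in mind that a pattern with a leading `*` has to be checked against every term of the field, so this kind of query can be slow on large indices.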

Return document on update elasticsearch

Let's say I'm updating user data:
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
"doc" : {
"name" : "new_name"
},
"fields": ["_source"]
}'
Here's an example of what I'm getting back when I perform an update:
{
"_index" : "test",
"_type" : "type1",
"_id" : "1",
"_version" : 4
}
How do I perform an update that returns the given document post update?
The documentation is a little misleading about returning fields when performing an Elasticsearch update. The update actually follows the same approach as the Index API: the parameter is passed on the URL, not as a field in the request body.
In your case you would submit:
curl -XPOST 'localhost:9200/test/type1/1/_update?fields=_source' -d '{
"doc" : {
"name" : "new_name"
}
}'
In my testing in Elasticsearch 1.2.1 it returns something like this:
{
"_index":"test",
"_type":"testtype",
"_id":"1",
"_version":9,
"get": {
"found":true,
"_source": {
"user":"john",
"body":"testing update and return fields",
"name":"new_name"
}
}
}
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-update.html
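For anyone issuing the update from code rather than curl, here is a small Python sketch that only builds the request, so the split between URL parameter and body is visible. Nothing is sent. The helper name `build_update_request` and the base URL are assumptions for illustration, and the `fields` parameter is used as described in the answer above for Elasticsearch 1.x.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_update_request(base_url, index, doc_type, doc_id, partial_doc, fields=None):
    """Build (but do not send) an _update request; hypothetical helper."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}/_update"
    if fields:
        # The fields to return go into the query string, not the body.
        url += "?" + urlencode({"fields": ",".join(fields)})
    # The body carries only the partial document.
    body = json.dumps({"doc": partial_doc}).encode("utf-8")
    return Request(url, data=body, method="POST")


req = build_update_request(
    "http://localhost:9200", "test", "type1", "1",
    {"name": "new_name"}, fields=["_source"],
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the resulting object to `urllib.request.urlopen` would send it to a running node.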

Elasticsearch index aliases (w/ routing) and parent/child docs

I'm trying to set up an index with the following characteristics:
The index houses data for many projects. Most work is project-specific, so I set up aliases for each project, using project_id as the routing field. (And as an associated term filter.)
The data in question have a parent/child structure. For simplicity, let's call the doc types "mama" and "baby."
So we create the index and aliases:
curl -XDELETE http://localhost:9200/famtest
curl -XPOST http://localhost:9200/famtest -d '
{ "mappings" :
{ "mama" :
{ "properties" :
{ "project_id" : { "type" : "string", "index" : "not_analyzed" } }
},
"baby" :
{ "_parent" :
{ "type" : "mama" },
"properties" :
{ "project_id" : { "type" : "string", "index" : "not_analyzed" } }
}
}
}'
curl -XPOST "http://localhost:9200/_aliases" -d '
{ "actions":
[ { "add":
{ "alias": "family1",
"index": "famtest",
"routing": "100",
"filter":
{ "term": { "project_id": "100" } }
}
} ]
}'
curl -XPOST "http://localhost:9200/_aliases" -d '
{ "actions":
[ { "add":
{ "alias": "family2",
"index": "famtest",
"routing": "200",
"filter":
{ "term": { "project_id": "200" } }
}
} ]
}'
Now let's make some mamas:
curl -XPOST localhost:9200/family1/mama/1 -d '{ "name" : "Family 1 Mom", "project_id" : "100" }'
curl -XPOST localhost:9200/family2/mama/2 -d '{ "name" : "Family 2 Mom", "project_id" : "200" }'
These documents are now available via /familyX/_search. So now we want to add a baby:
curl -XPOST 'localhost:9200/family1/baby/1?parent=1' -d '{ "name": "Fam 1 Baby","project_id" : "100" }'
Unfortunately, ES doesn't like that:
{"error":"ElasticSearchIllegalArgumentException[Alias [family1] has index routing associated with it [100], and was provided with routing value [1], rejecting operation]","status":400}
So... any idea how to use alias routing and still set the parent id? If I understand this right, it shouldn't be a problem: all project operations ("family1", in this case) go through the alias, so parent and child docs will wind up on the same shard anyway. Is there some alternative way to set the parent id, without interfering with the routing?
Thanks. Please let me know if I can be more specific.
Interesting question! As you already know, the parent id is also used for routing, since children must be indexed in the same shard as their parent documents. What you're trying to do is fine: parent and children belong to the same family, so they end up in the same shard anyway because of the routing configured on the family alias.
But I'm afraid the parent id takes priority here: it is used as the routing value ("1") and would overwrite the routing defined in the alias ("100"). Overriding alias routing is not allowed, which is why you get the error. In fact, if you try again and provide the routing explicitly in your index request, it works:
curl -XPOST 'localhost:9200/family1/baby/1?parent=1&routing=100' -d '{ "name": "Fam 1 Baby","project_id" : "100" }'
I would file a GitHub issue with a curl recreation.
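The conflict can be summarised in a few lines of Python. This is a simplified model inferred from the error message and the workaround above, not the actual Elasticsearch implementation; `resolve_routing` and its parameters are hypothetical names.

```python
def resolve_routing(alias_routing=None, routing=None, parent=None):
    """Simplified model of how an index request ends up with a routing value."""
    # Without an explicit routing, the parent id is used as the routing value.
    requested = routing if routing is not None else parent
    if alias_routing is not None:
        # An alias with index routing rejects any different routing value.
        if requested is not None and requested != alias_routing:
            raise ValueError(
                f"Alias has index routing [{alias_routing}] and was provided "
                f"with routing value [{requested}], rejecting operation"
            )
        return alias_routing
    return requested


# Child indexed through the alias with only ?parent=1: rejected.
try:
    resolve_routing(alias_routing="100", parent="1")
except ValueError as err:
    print("rejected:", err)

# Same request with &routing=100: accepted, routed like the parent document.
print(resolve_routing(alias_routing="100", routing="100", parent="1"))
```

In this model the explicit routing simply has to equal the alias routing, which also keeps the child on the same shard as its parent.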

No query registered for [match]

I'm working through some examples in the ElasticSearch Server book and trying to write a simple match query:
{
"query" : {
"match" : {
"displayname" : "john smith"
}
}
}
This gives me the error:
{"error":"SearchPhaseExecutionException[Failed to execute phase [query],
....
SearchParseException[[scripts][4]: from[-1],size[-1]: Parse Failure [Failed to parse source
....
QueryParsingException[[kb.cgi] No query registered for [match]]; }
I also tried
{
"match" : {
"displayname" : "john smith"
}
}
as per examples on http://www.elasticsearch.org/guide/reference/query-dsl/match-query/
EDIT: I think the remote server I'm using is not running the latest version (0.20.5), because using "text" instead of "match" allows the query to work.
I've seen a similar issue reported here: http://elasticsearch-users.115913.n3.nabble.com/Character-escaping-td4025802.html
It appears the remote server I'm using is not running the latest version (0.20.5) of ElasticSearch, so the "match" query is not supported; the older "text" query works instead.
I came to this conclusion after seeing a similar issue reported here: http://elasticsearch-users.115913.n3.nabble.com/Character-escaping-td4025802.html
Your first query looks fine, but perhaps the way you send it in the request is not correct. Here is a complete example that works:
curl -XDELETE localhost:9200/test-idx
curl -XPUT localhost:9200/test-idx -d '{
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 0
}
},
"mappings": {
"doc": {
"properties": {
"name": {
"type": "string", "index": "analyzed"
}
}
}
}
}
'
curl -XPUT localhost:9200/test-idx/doc/1 -d '{
"name": "John Smith"
}'
curl -XPOST localhost:9200/test-idx/_refresh
echo
curl "localhost:9200/test-idx/_search?pretty=true" -d '{
"query": {
"match" : {
"name" : "john smith"
}
}
}
'
echo
