Format reading ElasticSearch dates - elasticsearch

This is my mapping for one of the properties in my ElasticSearch model:
"timestamp": {
  "type": "date",
  "format": "dd-MM-yyyy||yyyy-MM-dd'T'HH:mm:ss.SSSZ||epoch_millis"
}
I'm not sure if I'm misunderstanding the documentation. It clearly says:
The first format will also act as the one that converts back from milliseconds to a string representation.
And that is exactly what I want. I would like to be able to read directly (if possible) the dates as dd-MM-yyyy.
Unfortunately, when I look at the document itself (that is, querying the ElasticSearch endpoint directly, not via the application layer), I still get:
"timestamp" : "2014-01-13T15:48:25.000Z",
What am I missing here?

As @Val mentioned, the document's _source returns the value exactly as it was indexed.
However, if you want to view the date in a particular format regardless of how it was indexed, you can use Script Fields. Note that the script is applied at query time.
The query below should do what you need:
POST <your_index_name>/_search
{
  "query": {
    "match_all": { }
  },
  "script_fields": {
    "timestamp": {
      "script": {
        "inline": "def sf = new SimpleDateFormat(\"dd-MM-yyyy\");def dt = new Date(doc['timestamp'].value);def mydate = sf.format(dt);return mydate;"
      }
    }
  }
}
Let me know how it goes.
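If running a script on every query is not desirable, the same conversion can be done on the client side. The following is a minimal Python sketch, not part of the original answer: the function names are hypothetical, and it assumes UTC, whereas SimpleDateFormat in the script above uses the JVM's default time zone.

```python
from datetime import datetime, timezone


def format_epoch_millis(millis: int) -> str:
    """Render epoch milliseconds as dd-MM-yyyy (UTC assumed)."""
    moment = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return moment.strftime("%d-%m-%Y")


def reformat_source_timestamp(value: str) -> str:
    """Reformat an ISO-8601 string, as returned in _source, to dd-MM-yyyy."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.strftime("%d-%m-%Y")


# The value from the question, reformatted after it has been fetched.
print(reformat_source_timestamp("2014-01-13T15:48:25.000Z"))  # 13-01-2014
```

The first helper mirrors what the script field does with the millisecond value; the second works directly on the string that the endpoint returns.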

Related

Elastic Search - Accessing a member of an element inside a list

I'm relatively new to elastic search and have a question about accessing an element inside of an element inside of a list. The structure is as follows:
{
  "TestA": "1",
  "TestB": {
    "TestC": "2",
    "TestD": [
      {
        "TestE": "3",
        "TestF": "4"
      },
      {
        "TestE": "5",
        "TestF": "6"
      }
    ]
  }
}
Given this structure, I want to return all documents in which TestF has a value of 6. Is this possible with the following query?
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "TestB.TestD.TestF": "6"
          }
        }
      ]
    }
  }
}
Would {"match": {"TestB.TestD.TestF": "6"}} be able to search through each element of TestD, or would I need some other command to iterate through the list? This is with Elasticsearch 5.0. Thanks in advance!
Yes, your match query should find the results you are looking for. Elasticsearch flattens arrays when it puts them in the inverted index. For more information, check out the docs:
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html#_how_arrays_of_objects_are_flattened
Arrays of inner object fields do not work the way you may expect.
Lucene has no concept of inner objects, so Elasticsearch flattens
object hierarchies into a simple list of field names and values.
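To make the flattening concrete, here is a small Python sketch that mimics the idea on the document from the question. It is only an illustration of the concept described in the quoted docs, not Elasticsearch's actual implementation, and the flatten helper is a hypothetical name.

```python
from collections import defaultdict


def flatten(doc):
    """Collect every leaf value under its dotted field path, merging arrays."""
    fields = defaultdict(list)

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}.{key}" if path else key)
        elif isinstance(node, list):
            # Array elements share the same path, so their values are merged.
            for item in node:
                walk(item, path)
        else:
            fields[path].append(node)

    walk(doc, "")
    return dict(fields)


doc = {
    "TestA": "1",
    "TestB": {
        "TestC": "2",
        "TestD": [
            {"TestE": "3", "TestF": "4"},
            {"TestE": "5", "TestF": "6"},
        ],
    },
}

flat = flatten(doc)
print(flat["TestB.TestD.TestF"])  # ['4', '6']
```

Because both TestF values end up under the single path TestB.TestD.TestF, a match on "6" finds the document. The pairing between TestE and TestF within each element is lost in this view, which is the situation the linked nested documentation addresses.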

ElasticSearch - creating exceptions for fuzzy terms

I have a simple Elasticsearch query that does a text field search with a fuzziness distance of one:
GET /jobs/_search
{
  "query": {
    "fuzzy": {
      "attributes.title": {
        "value": "C#",
        "fuzziness": 1
      }
    }
  }
}
The above query does exactly what it is told to do, but I have cases where I don't want a word to resolve (with fuzziness) to another specific word. In this case, I don't want C# to also return C++ results. Similarly, I don't want cat to return car results.
However, I still need the fuzziness option in case someone actually misspelled cat. In that case it can return results for both cat and car.
I think this is possible with some bool query combination, something like this:
bool:
  // should
    // match query without fuzzy
    // bool
      // must
        // must with fuzzy query
      // must_not with match query
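One possible reading of that outline, written as a Python helper that builds the request body. This is a sketch under assumptions: the function name and the excluded_terms parameter are hypothetical, the body has not been run against a cluster, and whether it separates C# from C++ in practice depends on how the field is analyzed.

```python
import json


def fuzzy_with_exceptions(field, term, excluded_terms, fuzziness=1):
    """Exact match OR (fuzzy match AND NOT any explicitly excluded term)."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Branch 1: the term itself, without fuzziness.
                    {"match": {field: term}},
                    # Branch 2: fuzzy matches, minus the listed exceptions.
                    {
                        "bool": {
                            "must": [
                                {"fuzzy": {field: {"value": term, "fuzziness": fuzziness}}}
                            ],
                            "must_not": [
                                {"match": {field: excluded}} for excluded in excluded_terms
                            ],
                        }
                    },
                ]
            }
        }
    }


body = fuzzy_with_exceptions("attributes.title", "C#", ["C++"])
print(json.dumps(body, indent=2))
```

Keeping the exception list as a parameter means pairs such as cat/car can be handled with the same helper by passing a different term and exclusion.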

Converting stringified float to float in Elasticsearch

I have a mapping in an Elasticsearch index with a string field called duration. However, duration is actually a float; it is passed in as a string from my provisioning chain, so it always looks something like this: "0.12". Now I'd like to create a new index with a new mapping, where the duration field is a float. Here's what I've done, which isn't working at all, either for old entries or for incoming new ones.
First, I create my new index with my new mapping by doing the following:
PUT new_index
{
  "mappings": { "new_mapping": { "properties": { "duration": {"type": "float"}, ... } } }
}
I then check that the new mapping is really in place using:
GET new_index/_mapping
I then copy the contents of the old index into the new one:
POST _reindex
{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index"
  }
}
However, when I look at the entries in new_index, be it the ones I've added with that last POST or the new ones that have come in since through my provisioning chain, the duration entry is still a string, even when its _type is new_mapping.
What am I doing wrong here? Or is there simply no way to convert a string to a float within Elasticsearch?
The duration field in the new index will be indexed as a float (as per your mapping). However, if the duration field in the source document is still a string, it will stay a string in the _source, even though it is indexed as a float.
You can do a range query "from 1.00 to 3.00" on the new index and compare with what you get in the old index. Since the old index will run a lexical range (because of the string type) you might get results with a duration of 22.3, while in the new index you'll only get durations that are really between 1.00 and 3.00.
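The difference between the two range behaviours can be reproduced outside Elasticsearch. The Python sketch below uses made-up duration values (they are illustrative, not from the question) and compares a string comparison with a numeric one over the same bounds.

```python
# Illustrative duration values, stored as strings like in the old index.
durations = ["0.12", "1.5", "22.3", "2.75", "3.00"]


def lexical_range(values, low, high):
    """Compare as strings, character by character (string-typed field)."""
    return [v for v in values if low <= v <= high]


def numeric_range(values, low, high):
    """Compare as numbers (float-typed field)."""
    return [v for v in values if low <= float(v) <= high]


print(lexical_range(durations, "1.00", "3.00"))  # ['1.5', '22.3', '2.75', '3.00']
print(numeric_range(durations, 1.0, 3.0))        # ['1.5', '2.75', '3.00']
```

"22.3" passes the lexical test because its first character, "2", sorts between "1" and "3", which is the effect described above for the old index.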

Elasticsearch 2.x index mapping _id

I ran ElasticSearch 1.x (happily) for over a year. Now it's time for some upgrading - to 2.1.x. The nodes should be turned off and then (one-by-one) on again. Seems easy enough.
But then I ran into trouble. The major problem is the field _uid, which I created myself so that I knew the exact location of a document from any other one (by hashing a value). This way I knew that only that exact document would be returned. During the upgrade I got
MapperParsingException[Field [_uid] is a metadata field and cannot be added inside a document. Use the index API request parameters.]
But when I try to map my former _uid to _id (which should also be good enough) I get something similar.
The reason why I used the _uid param is because the lookup time is a lot lower than a termsQuery (or the like).
How can I still use the _uid or _id field in each document for the fast (and exact) lookup of specific documents? Note that I have to fetch thousands of exact documents at a time, so I need an ID-like query. It can also happen that the _uid or _id of a document does not exist (in that case I want, like now, a 'false-like' result).
Note: The upgrade from 1.x to 2.x is pretty big (Filters gone, no dots in names, no default access to _xxx)
Update (to no avail):
Updating the mapping of _uid or _id using:
final XContentBuilder mappingBuilder = XContentFactory.jsonBuilder().startObject().startObject(type).startObject("_id").field("enabled", "true").field("default", "xxxx").endObject()
.endObject().endObject();
CLIENT.admin().indices().prepareCreate(index).addMapping(type, mappingBuilder)
.setSettings(Settings.settingsBuilder().put("number_of_shards", nShards).put("number_of_replicas", nReplicas)).execute().actionGet();
results in:
MapperParsingException[Failed to parse mapping [XXXX]: _id is not configurable]; nested: MapperParsingException[_id is not configurable];
Update: Changed the name to _id instead of _uid, since the latter is built out of _type#_id. So then I'd need to be able to write to _id.
Since there appears to be no way around setting the _uid and _id, I'll post my solution. I mapped all documents which had a _uid to uid (for internal referencing). At some point it came to me that you can set the relevant id yourself when indexing.
To bulk insert documents with an id you can:
final BulkRequestBuilder builder = client.prepareBulk();
for (final Doc doc : docs) {
    builder.add(client.prepareIndex(index, type, doc.getId()).setSource(doc.toJson()));
}
final BulkResponse bulkResponse = builder.execute().actionGet();
Notice the third argument: it may be null (or you can use the two-argument version), in which case the id will be generated by ES.
To then get some documents by id you can:
final List<String> uids = getUidsFromSomeMethod(); // ids for documents to get
final MultiGetRequestBuilder builder = CLIENT.prepareMultiGet();
builder.add(index_name, type, uids);
final MultiGetResponse multiResponse = builder.execute().actionGet();
for (final MultiGetItemResponse response : multiResponse.getResponses()) {
    if (only_want_to_know_whether_it_exists) {
        // in this case I simply want to know whether the doc exists
        final boolean exists = response.getResponse().isExists();
        exist.add(exists);
    } else {
        // retrieve the doc as JSON (the source is on each item's response, not on the builder)
        final String string = response.getResponse().getSourceAsString();
        // handle JSON
    }
}
If you only want one document:
client.prepareGet().setIndex(index).setType(type).setId(id);
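For the batched existence check, the REST multi-get endpoint (_mget) accepts a list of ids and reports a found flag per document. The Python sketch below only builds the request body and reads a response; the helper names are hypothetical and the sample response is hand-written for illustration, not captured from a cluster.

```python
def build_mget_body(ids):
    """Request body for GET <index>/<type>/_mget with a plain list of ids."""
    return {"ids": list(ids)}


def existence_flags(mget_response):
    """Map each requested id to whether the document was found."""
    return {doc["_id"]: doc.get("found", False) for doc in mget_response["docs"]}


body = build_mget_body(["1", "2", "missing"])

# Hand-written response in the shape _mget returns (illustrative only).
sample_response = {
    "docs": [
        {"_index": "my_index", "_type": "my_type", "_id": "1", "found": True},
        {"_index": "my_index", "_type": "my_type", "_id": "2", "found": True},
        {"_index": "my_index", "_type": "my_type", "_id": "missing", "found": False},
    ]
}

print(existence_flags(sample_response))
```

An id that does not exist simply comes back with found set to false, which gives the 'false-like' result asked for in the question.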
Doing the same lookup using curl is shown in the mapping-id-field documentation (note: exact copy):
# Example documents
PUT my_index/my_type/1
{
  "text": "Document with ID 1"
}
PUT my_index/my_type/2
{
  "text": "Document with ID 2"
}
GET my_index/_search
{
  "query": {
    "terms": {
      "_id": [ "1", "2" ]
    }
  },
  "script_fields": {
    "UID": {
      "script": "doc['_id']"
    }
  }
}

ElasticSearch - specify range for a string field

I am trying to retrieve the mentions of years between 1933 and 1949 from a string field called text. However, I cannot seem to find a working range query for that. What I have tried so far crashes:
{"query":
{"query_string":
{
"text": [1933 TO 1949]
}
}
}
I have also tried it like this:
{"query":
{"filtered":
{"query":{"match_all":{}},
"filter":{"range":{"text":[1933 TO 1949]}
}
}
}
but it still crashes.
A sample text field looks like the one below, containing a mention of the year 1933:
"Primera División 1933 (Argentinië), seizoen in de Argentijnse voetbalcompetitie\n* Primera Divisió n 1933 (Chili), seizoen in de Chileense voetbalcompetitie\n* Primera División 1933 (Uruguay), seizoen in de Uruguayaanse voetbalcompetitie\n \n "
However, I also have documents not containing any years inside, and I would like to filter all the documents to preserve only the ones mentioning years in a given period. I read here http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html that the range query can be applied to text fields as well, and I don't want to use any intermediate solution to identify dates inside texts.
What I basically want to achieve is to be able to get the same results as when using a search URI query:
urltomyindex/_search?q=text:%7B1933%20TO%201949%7D%27
which works perfectly.
Is it still possible to achieve my goal? Any help much appreciated!
This should do it:
GET index1/type1/_search
{
  "query": {
    "filtered": {
      "filter": {
        "terms": {
          "fieldNameHere": [
            "1933",
            "1934",
            "1935",
            "1936",
            "1937",
            "1938",
            "1939",
            "1940",
            "1941",
            "1942",
            "1943",
            "1944",
            "1945",
            "1946",
            "1947",
            "1948",
            "1949"
          ]
        }
      }
    }
  }
}
If you know you're going to be needing this kind of search frequently it would be much better to create a new field "yearPublished" or something like that so you can search it as a number vs a text field.
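The list of years in the terms filter does not have to be typed out by hand. A minimal Python sketch that generates the same request body for any inclusive period; the function name is hypothetical, and the filtered query shape is copied from the answer above rather than checked against a particular Elasticsearch version.

```python
import json


def year_terms_filter(field, first_year, last_year):
    """Build a terms filter listing every year in [first_year, last_year]."""
    years = [str(year) for year in range(first_year, last_year + 1)]
    return {"query": {"filtered": {"filter": {"terms": {field: years}}}}}


body = year_terms_filter("text", 1933, 1949)
print(json.dumps(body, indent=2))
```

The years are emitted as strings because they are matched against tokens of a text field, not against a numeric field.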
