Changing live data coming into Elasticsearch? - elasticsearch

I've been given a setup where a program creates live data and posts it into Elasticsearch.
I am trying to visualise this data in Kibana, but I'm running into many problems, such as numbers for a field being of type string instead of integer, or certain fields being missing entirely.
For now, the main thing that would help is having certain fields be integer instead of string. How do I go about this? Is it possible?
I have no access to the source code of the system creating the live events data.
Thanks in advance.
Update: I should also mention that for now I am restricted to Elasticsearch version 2.4.

If your data is coming straight into Elasticsearch, your options are limited.
The best option is to have the program that is creating the data send valid, properly formatted data.
If that's not an option, you can set your Elasticsearch mapping to force the field to be numeric. This will have the side-effect of dropping all documents where this field is not numeric.
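For example, a minimal 2.x mapping sketch along these lines (the index, type and field names here are just placeholders for whatever your events actually contain):

```
curl -XPUT 'http://localhost:9200/events' -d '
{
  "mappings": {
    "event": {
      "properties": {
        "response_time": { "type": "integer" }
      }
    }
  }
}'
```

Numeric fields coerce string values like "42" by default; anything that cannot be parsed as a number causes the whole document to be rejected unless you also set ignore_malformed.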
There is also the Elasticsearch ingest node, which allows for some (Logstash-like) transformations of the data. Converting a field's type is one of the available "processors".
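Note that the ingest node only exists from Elasticsearch 5.0 onwards, so it won't help on 2.4. As a sketch of what it looks like there (pipeline and field names are placeholders):

```
curl -XPUT 'http://localhost:9200/_ingest/pipeline/to-int' -d '
{
  "description": "convert a string field to an integer before indexing",
  "processors": [
    { "convert": { "field": "response_time", "type": "integer" } }
  ]
}'

# documents indexed with ?pipeline=to-int get the field converted
curl -XPUT 'http://localhost:9200/events/event/1?pipeline=to-int' -d '{ "response_time": "123" }'
```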

Related

Revert ElasticSearch field type without losing data

I have a Kibana instance that is not owned by me.
As of now it contains 1000+ fields (generated from JSON inputs).
There are a few fields which were (maybe manually) configured as String, and a few visualizations which used these fields.
At some point the index field list was refreshed and these fields became Number, and for some time they were indexed as numbers.
Now we have conflicts: "The type of this field changes across indices. It is unavailable for many analysis functions."
Is there any chance to convert it to String without losing the data?
In general this will require the following procedure:
1. Back up the offending indices using some tool.
2. Delete the indices.
3. Restore the indices from the backup, but make sure to change the mapping of that type first.
I would recommend using https://www.npmjs.com/package/elasticdump for this purpose.
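As a rough sketch of that cycle with elasticdump (the index name and host are placeholders, and you would edit the dumped mapping file to change the field back to string before restoring):

```
# dump the mapping and the data to local files
elasticdump --input=http://localhost:9200/my_index --output=my_index_mapping.json --type=mapping
elasticdump --input=http://localhost:9200/my_index --output=my_index_data.json --type=data

# edit my_index_mapping.json so the offending field is a string again,
# then delete and recreate the index
curl -XDELETE 'http://localhost:9200/my_index'
elasticdump --input=my_index_mapping.json --output=http://localhost:9200/my_index --type=mapping

# finally, restore the documents
elasticdump --input=my_index_data.json --output=http://localhost:9200/my_index --type=data
```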

Dealing with random failure datatypes in Elasticsearch 2.X

So I'm working on a system that logs bad data sent to an API along with the full request. I would love to be able to see this in Kibana.
The issue is that the datatypes can be random, so when I send them to the bad_data field indexing fails if the value doesn't match the original mapping.
Does anyone have a suggestion for the right way to handle this?
(Elasticsearch 2.x is required due to a sub-dependency.)
You could use the ignore_malformed flag in your field mappings. In that case values with the wrong format will not be indexed, but the document will still be saved.
See the Elastic documentation for more information.
If you want to be able to query such fields as the original text, you can use fields (multi-fields) in your mapping to also index the raw text values and get fast queries on them.
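For example, a 2.x mapping sketch combining both ideas (index, type and field names are placeholders): ignore_malformed on a numeric field so bad values are skipped instead of failing the document, and a not_analyzed multi-field to keep the raw text queryable:

```
curl -XPUT 'http://localhost:9200/api_logs' -d '
{
  "mappings": {
    "bad_request": {
      "properties": {
        "status_code": {
          "type": "integer",
          "ignore_malformed": true
        },
        "bad_data": {
          "type": "string",
          "fields": {
            "raw": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }
  }
}'
```

You can then query bad_data for full-text matches and bad_data.raw for exact values.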

Set up Elasticsearch suggesters that can return suggestions from different data types

We're in the process of setting up Amazon Elasticsearch Service (running Elasticsearch version 2.3).
We have different types of data (that I'm currently thinking of as different document types within the same index).
We have a generic search in an app where we want an inline autocomplete function, that is, a completion suggester returning hits from all different data (document) types. How can that be set up?
When querying suggesters you have to specify an index, so that's why I wanted to keep all the data in the same index. According to the documentation, the completion suggester considers all documents in the index.
Setting up the completion suggester for the first document type was pretty straightforward and is working great. However, as far as I can see you have to specify a suggest field when querying. That would have been all good had it not been for the error message we get when setting up the mapping for the second document type:
Type: illegal_argument_exception Reason: "[suggest] is defined as an object in mapping [name_of_document_type] but this name is already used for a field in other types"
Writing this question I see that it's possible to specify more than one suggester in a single suggest query. Maybe that is what we have to do to solve it? (I.e. get X results from Y suggesters and compare the scores to get the one suggestion we want to present to the user.)
One of the core principles of good data design for Elasticsearch (as with many data stores) is to optimise your data storage for ease of reading. Usually, this means embracing duplication.
With this in mind, I'd suggest having a separate autocomplete index with a mapping that's designed specifically for the suggester queries.
Whenever you insert or write one of your other documents, map it to your autocomplete type and add or update it in your autocomplete index at the same time (or, depending on how up-to-date it needs to be, create an offline process to update your autocomplete index e.g. every day).
Then, when you do your suggest query, you can just use your autocomplete index and not worry about dealing with different types of documents with different fields.
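As a sketch (assuming ES 2.3 and placeholder index, type and field names), the dedicated index could look something like this, with every source document type mapped into the same suggest field:

```
curl -XPUT 'http://localhost:9200/autocomplete' -d '
{
  "mappings": {
    "item": {
      "properties": {
        "suggest": { "type": "completion" }
      }
    }
  }
}'

# index an entry copied from one of your source documents
curl -XPUT 'http://localhost:9200/autocomplete/item/1' -d '
{
  "suggest": { "input": ["Elasticsearch in Action"], "output": "Elasticsearch in Action" }
}'

# query suggestions across everything in the autocomplete index
curl -XPOST 'http://localhost:9200/autocomplete/_suggest' -d '
{
  "items": {
    "text": "elas",
    "completion": { "field": "suggest" }
  }
}'
```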

Using ElasticSearch to store data without indexing or analysis (NEST client)

We are using ES via the NEST client for search, and we'd like to try to leverage it to store some reports that the system generates as well.
The reports are strings containing CSV data and they can be quite large, 100 MB+, and we've run into some problems. First we were exceeding the 100 MB limit set in the HTTP config, so I increased that and the error stopped.
Now we're getting System.OutOfMemoryExceptions.
With the reports, we don't need to analyse them or have them tokenized and indexed. We just need to be able to get them back out by their ID to send along to the browser. I haven't had much luck finding details on how to use ES as a dumb key-value store, though, or whether that would help with the memory problem.
It also crossed my mind to zip-compress the data before sending it into ES, but again, I'm not sure if that would help or what would be involved.
I don't know how you have it currently configured, but you could try the string type with an index value of "not_analyzed" or "no". I'd also give the binary type a try. You should set store to "true" for either approach. That should prevent ES from attempting to analyze and index the field.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html
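Something along these lines as a mapping sketch (index, type and field names are placeholders), with the string variant shown and the binary variant noted in a comment:

```
# store the report but do not analyze or index it
curl -XPUT 'http://localhost:9200/reports' -d '
{
  "mappings": {
    "report": {
      "properties": {
        "csv_data": {
          "type": "string",
          "index": "no",
          "store": true
        }
      }
    }
  }
}'

# alternatively, use { "type": "binary", "store": true } and send the
# report as a base64-encoded string
```

A plain GET /reports/report/<id> will then return the report from _source without any search involved.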

Recommended way to store data in elasticsearch

I want to use Elasticsearch on my backend and I have a few questions:
My DB contains semi-structured data of products, i.e. each product may have different attributes inside it.
I want to be able to search a text on most of the fields and also search a text on one specific field.
What is the recommended way to store the documents in ES? Should I store all text in one field (maybe using the _all feature) or leave it in different fields?
My concern with different fields is that I might end up with a lot of indexed fields (because I have many different product attributes).
I'm using Couchbase as my main DB.
What is the recommended way to move the documents from it to ES, assuming I need to make some modifications to the documents?
Should I update the index from my code explicitly or use an external tool?
Thanks,
It depends on how many docs you are indexing at a time. If the number of docs is large (say, more than 2 million), then it's better to store everything in one field, which will save time while indexing.
If the number of docs is small, then index them field by field and search on the _all field. This gives a clearer view of the data and is really helpful for deciding what to display and what not to display.
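To illustrate the two search styles (index and field names are placeholders), searching everything via the _all field versus one specific field:

```
# free-text search across all fields via the _all field
curl -XPOST 'http://localhost:9200/products/_search' -d '
{
  "query": { "match": { "_all": "red running shoes" } }
}'

# search one specific attribute only
curl -XPOST 'http://localhost:9200/products/_search' -d '
{
  "query": { "match": { "brand": "nike" } }
}'
```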
