Using the Elasticsearch JS client, if I want to get all the indices, it provides an API: https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/api-reference.html#api-cat-indices
However, the output format is a string.
If I want to find the latest index, I need to manipulate the string: split it by spaces, iterate through the array, filter the entries by some criteria, and so on.
Instead, is it possible to get JSON output from the API?
PS: I did notice this thread, which suggests using /*/_aliases, and that works well, but I was wondering how to leverage the Elasticsearch JS client APIs.
The cat APIs are meant to be consumed by humans (hence not JSON).
If you want to get JSON data, you can use the indices.stats call (which hits the /_stats API endpoint).
client.indices.stats({
index: "_all",
level: "indices"
}, function(err, res) {
// res contains JSON data about indices stats
});
UPDATE:
Actually, the cat APIs also return JSON data, if you specify the parameter format: json in the request:
client.cat.indices({"format": "json"}, function(err, res) {
...
});
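To get back to the original goal of finding the latest index: the cat API also accepts the h (columns) and s (sort) parameters, so something along these lines should return the indices as JSON objects already sorted by creation date (just a sketch; exact parameter support depends on your Elasticsearch and client versions):
client.cat.indices({
  format: "json",
  h: "index,creation.date",   // only return the index name and creation date columns
  s: "creation.date:desc"     // newest first
}, function(err, res) {
  // res is an array of objects; res[0].index would be the most recently created index
});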
I've been working with Elasticsearch for a few days. As I'm creating a CRUD, I've come across the updateByQuery method. I'm working with NestJS, and the way I'm updating a field is:
await this.elasticSearch.updateByQuery(
{
index: 'my_index_user',
body:{
query:{
match:{
name: 'user_name',
}
},
script: {
inline : 'ctx._source.name = "new_user_name"'
}
}
}
);
My question is:
Why does Elasticsearch need this syntax, 'ctx._source.name = "new_user_name"', to specify what the new value of the name field should be? What is ctx._source in this context?
As mentioned in the official doc on source filtering, this is how you fetch a field's value from the _source (the value as it was sent to Elasticsearch, stored as-is, without going through the analysis process).
Let's take the example of a text field with the standard analyzer (the default) applied, in which you store the value foo bar. Elasticsearch breaks the value up during the analysis process, and the two tokens foo and bar are stored in its inverted index. But if you want to see the original value, i.e. foo bar, you can check the _source and get it from there.
Hence, it's always better to have the original value (untouched by the analysis process) in the _source, and by using this API you are updating the field value there. This also helps when you later want to reindex into a new index or change the way the field is analyzed, as you still have the original value in the _source.
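To illustrate the difference, here is a rough sketch using the JS client (index and field names are placeholders, and a recent client is assumed so the document type can be omitted): the analysis process produces the tokens that land in the inverted index, while the _source, which is what ctx._source exposes in an update script, keeps the original JSON.
client.index({
  index: "my_index_user",
  id: "1",
  body: { name: "foo bar" }   // text field, standard analyzer
}, function(err) {
  // The analysis process produces the tokens ["foo", "bar"] for the inverted index:
  client.indices.analyze({
    body: { analyzer: "standard", text: "foo bar" }
  }, function(err, res) {
    // res.tokens -> [{ token: "foo", ... }, { token: "bar", ... }]
  });
  // The _source still holds the original, unanalyzed value -- this is what
  // ctx._source.name refers to inside the update script:
  client.get({ index: "my_index_user", id: "1" }, function(err, res) {
    // res._source -> { name: "foo bar" }
  });
});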
I've been trying to filter on a non-indexed nested JSON field, taking into account the following sample document (retrieved using Kibana Dev Tools).
{
"id": "",
"timestamp": "",
"innerData": {
"innerField": "",
"innerJson": """{ inner json in here }"""
}
}
I've got several questions, since I've been trying to filter on keys within innerJson (not indexed) without any success.
Is it triple-quoted in Kibana for the sake of readability, since, being JSON, it contains several quotes? Is it possible to filter values of the innerData.innerJson map the same way you would for regular document fields such as innerData.innerField?
GET /my_index/_search
{
"query": {
"wildcard": {
"innerData.innerJson.INNER_JSON_FIELD": {
"value": "*sample*"
}
}
}
}
I've been playing around with ES for a bit, and considering a fairly big index (approximately 100M entries), I assume that such a filter on a non-indexed field would be an expensive operation? Any recommended optimization to avoid hitting performance too hard?
JSON itself does not support triple quotes, so yes, it's a Kibana readability convention. When you inspect the proxied requests, you can see that all double quotes inside of double quotes get escaped --> " \"...\" ".
As to the field innerJson -- since it's not indexed, it's not searchable, so there's really no way to filter on it, let alone access the stringified JSON's inner properties. Leaving it as text and wildcarding on it is very expensive, but it would be possible if it were indexed.
How many key-value pairs does the inner JSON have? What prevents you from parsing it before ingesting into ES?
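If nothing prevents it, a minimal sketch of that pre-parsing step (the document shape is taken from the question, the values are placeholders, and the regular JS client is assumed):
var doc = {
  id: "some-id",
  timestamp: "2021-01-01T00:00:00Z",
  innerData: {
    innerField: "some value",
    innerJson: "{\"INNER_JSON_FIELD\": \"a sample value\"}"   // stringified JSON
  }
};

// Parse the stringified JSON before indexing so its keys become real, mapped fields.
doc.innerData.innerJson = JSON.parse(doc.innerData.innerJson);

client.index({ index: "my_index", body: doc }, function(err, res) {
  // innerData.innerJson.INNER_JSON_FIELD is now indexed and can be queried with a
  // regular match/term query instead of an expensive wildcard on a string blob.
});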
I'm trying to add a new field whose value comes from hashing an existing field's value. So I want to do:
my_index.hashedusername (new field) = crc32(my_index.username) (existing field)
For example:
POST _update_by_query
{
"query": {
"match_all": {}
},
"script" : {
"source": "ctx._source.hashedusername = crc32(ctx._source.username);"
}
}
Please give me an idea of how to do this.
java.util.zip.CRC32 is not available in the shared Painless API, so mocking that package will be non-trivial -- perhaps even unreasonable.
I'd suggest computing the CRC32 hashes beforehand and only then sending the docs to ES. Alternatively, scroll through all your documents, compute the hash, and bulk-update them.
The Painless API was designed to perform comparatively simple tasks, and CRC32 is certainly outside of its purpose.
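A rough sketch of the scroll-and-bulk-update approach (the crc-32 npm package is an assumption on my part, and the index/field names are taken from your question):
var crc32 = require("crc-32");                  // assumed npm package; crc32.str() returns a signed 32-bit hash
var elasticsearch = require("elasticsearch");
var client = new elasticsearch.Client({ host: "localhost:9200" });

function updateBatch(hits, done) {
  // One bulk "update" action per document, adding the hashedusername field.
  var body = [];
  hits.forEach(function(hit) {
    body.push({ update: { _index: hit._index, _id: hit._id } });
    body.push({ doc: { hashedusername: crc32.str(hit._source.username) } });
  });
  client.bulk({ body: body }, done);
}

client.search({
  index: "my_index",
  scroll: "1m",
  body: { query: { match_all: {} } }
}, function onPage(err, res) {
  if (err || res.hits.hits.length === 0) return;   // finished (or failed)
  updateBatch(res.hits.hits, function() {
    client.scroll({ scrollId: res._scroll_id, scroll: "1m" }, onPage);
  });
});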
I have some polled data sources which I want to pass to Elasticsearch:
{ "foo": { "bar": [
{"name": "Hello", "route":"5A", "lat":"2345678", "lon":"2345678" },
{"name": "World", "route":"5D", "lat":"8765432", "lon":"8798765" },
]}}
Ideally I want to pass all the array contents to Elasticsearch to be created as documents.
I want to take the passed-in JSON, unwrap it, and then essentially run a filter to add a new key-value pair:
<filter>
  @type record_transformer
  <record>
    geo_coord ${record["lat"]},${record["lon"]}
  </record>
</filter>
Which seems pretty simple to me.
Ideally I want to have fluentd handle the polling of the 6 websites, then ingest as above and submit the results as documents to the Elasticsearch instance I set up.
I am currently just working in a Docker container to see if I can get this all working, so right now I am trying to do some simple tests with stdout.
Ultimately, I catch the above payload, but how do I add a new property to each of the list items? Once the parsing of the data is done, I should easily be able to ship it to Elasticsearch.
I'm using this command to import data into RethinkDB:
rethinkdb import --force -f ${folder}/json/data.json --table test.data -c localhost:28015
It imports the data perfectly. But some of the fields in my JSON represent time:
{
"id": "1",
"date": "2015-09-19",
"time": {
"begin": "09:00",
"end": "10:30"
}
}
When I try to query these fields (date, or time.begin and time.end) treating them as time, RethinkDB doesn't understand it and throws an exception:
r.db('test').table('data').filter(function(t) {
return t("date").date()
})
RqlRuntimeError: Not a TIME pseudotype: `"2015-09-19"` in:
r.db("test").table("data").filter(function(var_43) { return var_43("date").date(); })
^^^^^^^^^^^^^^
Is there any way to specify for RethinkDB which fields in the JSON have the time type?
JSON doesn't provide a standard way of specifying a time field, but there are a couple ways you can do this with RethinkDB: either modify the data before or after inserting it. RethinkDB time objects are more than just the strings you have shown here, and contain millisecond time resolution along with timezone data.
Time objects can be constructed using r.now(), r.time(), r.epoch_time(), and r.ISO8601(). Because of the format of your time strings, I would use r.ISO8601(). It is important to note that your data doesn't appear to contain timezone information, so you should be sure that your data won't return incorrect results if they are all put in the same timezone.
Another thing to keep in mind when using times in RethinkDB is that the data will be converted into an appropriate time object in your client. Since it appears that you are using Javascript, you will get back a Date object. For Python, you would get a datetime.datetime object, etc. If you would rather get the raw time pseudotype format (see below), you can specify timeFormat: "raw" as a global optarg to your query (see the documentation for run() for details).
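For example, with the JavaScript driver the raw format can be requested per query like this (a sketch reusing the table from the question):
// Return raw TIME pseudotype objects instead of JavaScript Date objects.
r.db('test').table('data').run(conn, { timeFormat: 'raw' }, function(err, cursor) {
  // time fields come back as { "$reql_type$": "TIME", "epoch_time": ..., "timezone": "..." }
});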
Post-process the data inside RethinkDB
This is probably the easiest option, and what I would recommend. After importing your data, you can run a query to modify each row to convert the strings into time objects. Based on the format of your data, this should work:
r.db('test').table('data').replace(function(row) {
return row.merge({
'begin_time': r.ISO8601(row('date').add('T').add(row('time')('begin')), { defaultTimezone: '+00:00' }),
'end_time': r.ISO8601(row('date').add('T').add(row('time')('end')), { defaultTimezone: '+00:00' })
}).without('date', 'time');
}).run(conn, callback)
This replaces the date and time fields from all the rows in your test.data table with begin_time and end_time time objects that can be used as you expect. The defaultTimezone field is required because the time string doesn't contain timezone information, but you should change these values to whatever is appropriate.
Modify the JSON data
This is a bit lower-level and can be tricky, but if you don't mind getting your hands dirty, this could be more suited to your needs.
RethinkDB time objects are communicated in JSON using a particular format to represent a 'pseudotype'. These are types not standardized in JSON that still exist in RethinkDB. The format for a time pseudotype looks like this:
{
"$reql_type$": "TIME",
"epoch_time": 1413843783.195,
"timezone": "+00:00"
}
Where epoch_time is the number of seconds since the UNIX epoch (Jan 1, 1970). If the data you are importing follows this format, you can insert this directly and it will be interpreted by the database as a valid time object. It would be up to you to modify the data you are importing, but your example row would look something like this:
{
"id": "1",
"begin_time": {
"$reql_type$": "TIME",
"epoch_time": 1442653200,
"timezone": "+00:00"
},
"end_time': {
"$reql_type$": "TIME",
"epoch_time": 1442658600,
"timezone": "+00:00"
}
}
My same caveat for timezones applies here as well.
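If you go this route, the epoch_time values above can be computed from the original date and time strings while rewriting the JSON; here is a minimal Node.js sketch, assuming the strings are meant as UTC:
// "2015-09-19" + "09:00" (treated as UTC) -> seconds since the UNIX epoch.
function toEpochSeconds(date, time) {
  return new Date(date + "T" + time + ":00Z").getTime() / 1000;
}

toEpochSeconds("2015-09-19", "09:00");  // 1442653200 -> begin_time.epoch_time
toEpochSeconds("2015-09-19", "10:30");  // 1442658600 -> end_time.epoch_time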