Elasticsearch scripting: updating every array element

Elasticsearch version: 5.4.1
I have a document like this:
{'_id': 'AWUe4rSpgJ6eZRtHLwRC',
'_index': 'test',
'_score': 1,
'_source': {'data': [1, 2, 3, 4]},
'_type': 'test'}
and I am trying to add a number to every element in data.
This is what I tried:
{
'query':{
'match_all':{}
},
'script':{
'inline': "for (int i=0;i<ctx._source.data.size();i++) {ctx._source.data[i]=ctx._source.data[i] + 1000000};",
'lang':'painless'
}
}
but I got:
{'reason': "unexpected token [';'] was expecting one of [<EOF>]."}
I am new to Painless. How can I make it correct?
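A likely cause, judging by the error message, is the ';' placed after the closing brace of the for block: Painless does not seem to accept an empty statement there, so the parser expects the script to end. Below is a minimal Python sketch that builds the same request body without that trailing semicolon. The helper name build_update_body and the parameter name inc are hypothetical, and passing the increment through params is a design choice rather than part of the original request. The body is presumably meant for the _update_by_query endpoint, which the question does not state.

```python
import json


def build_update_body(increment):
    """Build a request body that adds `increment` to every element of `data`."""
    # Painless source. Note: no ';' after the closing brace of the for block.
    source = (
        "for (int i = 0; i < ctx._source.data.size(); i++) {"
        " ctx._source.data[i] = ctx._source.data[i] + params.inc;"
        " }"
    )
    return {
        "query": {"match_all": {}},
        "script": {
            "inline": source,  # key used in the question (Elasticsearch 5.x)
            "lang": "painless",
            "params": {"inc": increment},  # hypothetical parameter name
        },
    }


body = build_update_body(1000000)
print(json.dumps(body, indent=2))
```

Keeping the number in params rather than in the script text means the script source stays identical across calls with different increments.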

Related

How to return latest distinct rows in elasticsearch ignoring a timestamp field

I have documents like:
{'foo': 'foo', 'bar': 'bar', ..., 'timestamp': 3}
{'foo': 'diffval', 'bar': 'diffval', ..., 'timestamp': 2}
{'foo': 'foo', 'bar': 'bar', ..., 'timestamp': 2}
{'foo': 'foo', 'bar': 'bar', ..., 'timestamp': 1}
Which I search for via _search?from=0&size=20&sort=timestamp%3Adesc
I would like to now search for just the latest distinct row - e.g:
{'foo': 'foo', 'bar': 'bar', ..., 'timestamp': 3}
{'foo': 'diffval', 'bar': 'diffval', ..., 'timestamp': 2}
But I would like to do this without explicitly listing the foo, bar, ... fields, as there could be many of them and they are not consistently present; only the timestamp field is always there.
I have found a solution: before storing each document, I add a hash field computed from all fields except the timestamp. Then I use the collapse functionality in OpenSearch, and the hits return the latest document for each distinct hash.
GET /.../_search?sort=timestamp%3Adesc
{
"collapse": {
"field": "hash",
"inner_hits": {
"name": "",
"size": 0,
"sort": [
{
"timestamp": {
"order": "desc"
}
}
]
}
}
}
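The hashing step described above could be sketched in Python roughly as follows. This is only an illustration under assumptions: the function names content_hash and latest_distinct are hypothetical, SHA-256 over a key-sorted JSON dump is one possible choice of hash, and the sample documents are simplified to the foo and bar fields. latest_distinct merely mimics in memory what sorting by timestamp and collapsing on hash is expected to return. As far as I know, collapse needs the hash field to be a single-valued keyword or numeric field.

```python
import hashlib
import json


def content_hash(doc, ignore=("timestamp",)):
    """Hash every field except the ignored ones, independent of key order."""
    payload = {k: v for k, v in doc.items() if k not in ignore}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def latest_distinct(docs):
    """In-memory stand-in for sort=timestamp:desc plus collapse on the hash."""
    seen = set()
    result = []
    for doc in sorted(docs, key=lambda d: d["timestamp"], reverse=True):
        digest = content_hash(doc)
        if digest not in seen:
            seen.add(digest)
            result.append(doc)
    return result


# Simplified sample documents (only foo and bar, no other fields).
docs = [
    {"foo": "foo", "bar": "bar", "timestamp": 3},
    {"foo": "diffval", "bar": "diffval", "timestamp": 2},
    {"foo": "foo", "bar": "bar", "timestamp": 2},
    {"foo": "foo", "bar": "bar", "timestamp": 1},
]

# Documents as they would be stored, with the extra hash field added.
to_index = [{**doc, "hash": content_hash(doc)} for doc in docs]

for doc in latest_distinct(docs):
    print(doc)
```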

How to use whereIn in Laravel to get only the rows that match all items of the array

Hi, I want to use whereIn to get data from a table:
$values = DB::table('attribute_product')->whereIn('value_id' , [1,5])->get();
but I want to get only the rows of products that have all of the [1,5] items, not just one of them.
My table data:
{
"attribute_id": 1,
"product_id": 1,
"value_id": 1
},
{
"attribute_id": 12,
"product_id": 1,
"value_id": 2
},
{
"attribute_id": 13,
"product_id": 1,
"value_id": 3
},
{
"attribute_id": 14,
"product_id": 1,
"value_id": 4
},
{
"attribute_id": 1,
"product_id": 8,
"value_id": 1
},
{
"attribute_id": 12,
"product_id": 8,
"value_id": 5
},
{
"attribute_id": 13,
"product_id": 8,
"value_id": 10
},
{
"attribute_id": 14,
"product_id": 8,
"value_id": 11
}
I want to return only the rows of the product that has both value_ids [1,5]:
{
"attribute_id": 1,
"product_id": 8,
"value_id": 1
},
{
"attribute_id": 12,
"product_id": 8,
"value_id": 5
},
but the code I wrote above returns:
{
"attribute_id": 1,
"product_id": 1,
"value_id": 1
},
{
"attribute_id": 1,
"product_id": 8,
"value_id": 1
},
{
"attribute_id": 12,
"product_id": 8,
"value_id": 5
},
This should work:
$values = [1, 5];
$filtered = DB::table('attribute_product')
->whereIn('value_id', $values)
->get()
->groupBy('product_id')
->filter(function ($product) use ($values) {
return $product->pluck('value_id')
->intersect($values)
->count() === count($values);
})
->flatten();
PS: I don't like this solution too much since it does the calculation in memory. You should make use of relationships to do this at database level.
You can use Laravel's groupBy:
$values = DB::table('attribute_product')->orderBy('product_id', 'desc')->whereIn('value_id', [1, 5])->groupBy('value_id')->get();
I recreated the database sample you shared:
INSERT INTO
attribute_product(attribute_id, product_id, value_id)
VALUES
(1, 1, 1), (12, 1, 2), (13, 1, 3), (14, 1, 4), (1, 8, 1), (12, 8, 5), (13, 8, 10), (14, 8, 11);
I came up with this raw query:
SELECT
attribute_product.*,
SUM(value_id) as filter
FROM
`attribute_product`
WHERE
value_id IN(1, 5)
GROUP BY
product_id
HAVING
filter = 6;
Then I built the query with Illuminate\Database\Query\Builder:
$filter = [1, 5];
DB::table('attribute_product')
->groupBy('product_id')
->select([
'attribute_product.*',
DB::raw("SUM(value_id) as filter"),
])
->whereIn('value_id', $filter)
->having('filter', '=', array_sum($filter))
->get();
This solution is handled entirely by the database engine, which spares your server the load of manipulating Collections.
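For comparison, here is a hedged sketch of a variant that counts distinct matching value_ids per product instead of summing them, and then returns the individual rows of the qualifying products. It is not the query from this answer: it swaps SUM for COUNT(DISTINCT ...), and it uses Python's sqlite3 with an in-memory table as a stand-in for the real database. The function name rows_matching_all is hypothetical.

```python
import sqlite3

# Same sample rows as in the INSERT above: (attribute_id, product_id, value_id).
rows = [
    (1, 1, 1), (12, 1, 2), (13, 1, 3), (14, 1, 4),
    (1, 8, 1), (12, 8, 5), (13, 8, 10), (14, 8, 11),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attribute_product "
    "(attribute_id INTEGER, product_id INTEGER, value_id INTEGER)"
)
conn.executemany("INSERT INTO attribute_product VALUES (?, ?, ?)", rows)


def rows_matching_all(conn, value_ids):
    """Return rows of products that have every value_id in value_ids."""
    placeholders = ",".join("?" for _ in value_ids)
    sql = f"""
        SELECT attribute_id, product_id, value_id
        FROM attribute_product
        WHERE value_id IN ({placeholders})
          AND product_id IN (
              SELECT product_id
              FROM attribute_product
              WHERE value_id IN ({placeholders})
              GROUP BY product_id
              HAVING COUNT(DISTINCT value_id) = ?
          )
        ORDER BY product_id, attribute_id
    """
    # Parameters follow the order of the '?' marks: outer IN, inner IN, count.
    params = [*value_ids, *value_ids, len(set(value_ids))]
    return conn.execute(sql, params).fetchall()


result = rows_matching_all(conn, [1, 5])
print(result)
```

Counting distinct values avoids relying on the sum of the ids, which could also be reached by other combinations of matching rows (for example, a product with several rows for the same value_id).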
Opinion
I feel that this is a tricky way to reach your goal, which suggests to me that your database design doesn't fit the business logic/use cases very well.
I think that a good database design makes complex data retrieval possible with simple queries (using joins, of course) or intuitive Eloquent relationships.
OLD ANSWER
$filter_value_id = [1, 5];
$values_by_product = DB::table('attribute_product')
->whereIn('value_id', $filter_value_id)
->get()
->groupBy('product_id');
foreach ($values_by_product as $product => $value) {
echo "product id: $product<br>";
if ($value->count() === sizeof($filter_value_id))
dump($value);
}

Cannot aggregate in Elasticsearch

I have a service with logs in elasticsearch. I want to get users who have used my service.
Detailed log lines were returned on my request, but I want to get a unique "kubernetes.pod_name":
{
"size": 10000,
"_source": ["kubernetes.pod_name"],
"query": {"bool": {"filter": [
{"match": {"kubernetes.labels.app" : "jupyterhub"}},
{"match_phrase": {"log": "200 GET"}}
]}},
"aggs": {"pods": {"terms": {"field": "kubernetes.pod_name"}}}
}
Why aren't the log lines grouped in the "aggs" section? What should I do to get unique users?
Update:
My query returns:
{'took': 614,
'timed_out': False,
'_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0},
'hits': {'total': 17703,
'max_score': 0.0,
'hits': [{'_index': 'dwh-dev-2020-10-14',
'_type': 'container_log',
'_id': 'vQ6vJHUBU_u817onY-cZ',
'_score': 0.0,
'_source': {'kubernetes': {'pod_name': 'jupyter-lyisova-2evg'}}},
{'_index': 'dwh-dev-2020-10-14',
'_type': 'container_log',
'_id': 'xA6vJHUBU_u817onY-cZ',
'_score': 0.0,
'_source': {'kubernetes': {'pod_name': 'jupyter-lyisova-2evg'}}},
{'_index': 'dwh-dev-2020-10-14',
'_type': 'container_log',
'_id': '6g6vJHUBU_u817onY-cZ',
'_score': 0.0,
'_source': {'kubernetes': {'pod_name': 'jupyter-bogdanov'}}},
...
I want to get 20 lines instead of 17703 where each line corresponds to a unique "kubernetes.pod_name"
You can combine a terms aggregation with a filter aggregation:
{
"aggs": {
"labels_filter": {
"filter": {
"bool": {
"filter": [
{
"match": {
"kubernetes.labels.app": "jupyterhub"
}
},
{
"match_phrase": {
"log": "200 GET"
}
}
]
}
},
"aggs": {
"pods": {
"terms": {
"field": "kubernetes.pod_name"
}
}
}
}
}
}
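Whichever request is used, the grouped values are not in hits.hits; they come back as buckets under the aggregations key of the response. A minimal Python sketch of reading them is below. The function name unique_pod_names is hypothetical, and sample_response is an illustrative shape with made-up doc_count values, not real output. Note that a terms aggregation returns 10 buckets by default, so getting 20 pod names would need a larger "size" inside the terms block, and "size": 0 at the top level suppresses the individual log lines.

```python
def unique_pod_names(response, path=("pods",)):
    """Collect bucket keys from a (possibly nested) terms aggregation."""
    node = response.get("aggregations", {})
    for name in path:
        node = node.get(name, {})
    return [bucket["key"] for bucket in node.get("buckets", [])]


# Illustrative response shape; the doc_count values are made up.
sample_response = {
    "hits": {"total": 17703, "hits": []},
    "aggregations": {
        "pods": {
            "buckets": [
                {"key": "jupyter-lyisova-2evg", "doc_count": 2},
                {"key": "jupyter-bogdanov", "doc_count": 1},
            ]
        }
    },
}

print(unique_pod_names(sample_response))
```

For the nested form with a filter aggregation named labels_filter, the call would be unique_pod_names(response, path=("labels_filter", "pods")).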

Find the keywords that matched the query in Elastic Search

I'm using Elastic Search to search several indices.
When the user performs a query, the matches are split between 1 or 2 keywords that yield results. I'd like to be able to know for every hit, which keyword it originated from.
So if the user searched for "ventolin for asthma", I'd like to know which hits are for "ventolin" and which are for "asthma".
That is, for this query:
{
'query': {
'multi_match': {
'query': 'ventolin for asthma',
'fuzziness': 2,
'prefix_length': 1,
'type': 'best_fields',
'fields': ['term*']
}
}
}
And these hits:
{
...
'hits': {
'total': {
'value': 287,
'relation': 'eq'
},
'max_score': 10.301256,
'hits': [{
'_index': 'normalized-term-mapping',
'_type': '_doc',
'_id': '194526',
'_score': 10.301256,
'_source': {
'term': 'Ventolin Evohaler',
...
}
}, {
'_index': 'normalized-term-mapping',
'_type': '_doc',
'_id': '194362',
'_score': 8.529675,
'_source': {
'term': 'Childhood Asthma',
...
}
},
...
]
}
}
I want to match the first hit with the keyword Ventolin and the second hit with Asthma.
Note that:
I use fuzziness == 2, so the keywords may not exactly match the hit term
The indices use an analyzer (not a complex one but not trivial)
I can try and write code to match the terms with the query, but that would effectively mean reimplementing the elastic analysis in code which is not a great solution.
Is there a way to get the matched term from the original query from Elastic?
Yes, there is a way to get the matched terms using the Highlight API.
You're using a multi_match query so the default highlight options may be sufficient for you. You do need to specify the fields you want to highlight with something like this:
{
'query': {
'multi_match': {
'query': 'ventolin for asthma',
'fuzziness': 2,
'prefix_length': 1,
'type': 'best_fields',
'fields': ['term*']
}
},
'highlight': {
'fields': {
'term*': {}
}
}
}
However, this won't return an array of matched items. Instead, you will get the fields with existing matches marked (usually with HTML, but you can customize it). You could use that markup to post-process and isolate the individual matches if you need them.
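The post-processing mentioned above could look roughly like this in Python, assuming the default highlight tags <em> and </em> are in use. The function name matched_tokens is hypothetical, and sample_hit only illustrates the expected shape of a hit with a highlight section; it is not actual output from the query.

```python
import re

# Default highlight markup; adjust if pre_tags/post_tags are customized.
EM_TAG = re.compile(r"<em>(.*?)</em>")


def matched_tokens(hit):
    """Return the highlighted pieces of text found in one search hit."""
    tokens = []
    for fragments in hit.get("highlight", {}).values():
        for fragment in fragments:
            tokens.extend(EM_TAG.findall(fragment))
    return tokens


# Illustrative hit; the highlight content is an assumed example.
sample_hit = {
    "_id": "194526",
    "_source": {"term": "Ventolin Evohaler"},
    "highlight": {"term": ["<em>Ventolin</em> Evohaler"]},
}

print(matched_tokens(sample_hit))
```

This yields the text in the document that matched, not the query keyword itself; with fuzziness enabled the two can differ, so mapping back to the original keyword still needs some comparison on your side.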

elasticsearch response hits is not showing up

I am utilizing elasticsearch and after running a search, this is the response I get
{'took': 7, 'timed_out': False, '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0}, 'hits': {'total': 1, 'max_score': 0.2876821, 'hits': []}}
My question is why is hits.total = 1 but hits.hits is empty?
Here is the query I used:
"query": {
"bool": {
"must": [
{"match_phrase": { "theName": "bill" } }
]
}
}
I know data exists in my node because when I did the search below (with the same url + index + type in the post request), I got hits.hits to be filled with the result.
"query" : {
"match_all" : {}
}
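I had from = 40 in the request, and that was causing the issue: with only one matching document, the requested page starts past the last result, so hits.hits is empty while hits.total still reports 1.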
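The effect can be reproduced with a toy model of paging in Python. This is only an illustration of the from/size arithmetic, not Elasticsearch code; the function name page is hypothetical, and the default size of 10 mirrors the search default.

```python
def page(matches, from_=0, size=10):
    """Toy model: total counts all matches, hits holds only the requested page."""
    return {"total": len(matches), "hits": matches[from_:from_ + size]}


matches = ["bill"]  # a single matching document

print(page(matches, from_=40))  # total is 1, but the page is empty
print(page(matches))            # from_ = 0 returns the document
```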
