How Group By Queries in Elasticsearch? - elasticsearch

I am working on a project where I would like to query the wikipedia database so I could bring some results ... I am using a Javascript application and Jquery with a self suggestion field as the user types a list of words should appear ....
For this to happen I need to create a query in elasticsearch by bringing all categories and only them into the results.
Type a SQL = SELECT Field FROM Table Group By Field ..
How would I do this?
My Test is basic and return all objects....
GET _search
{
"query": {
"query_string": {
"fields": ["category"],
"query": "*"
}
}
}
Part of result is:
{
"took": 34,
"timed_out": false,
"_shards": {
"total": 13,
"successful": 13,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2117924,
"max_score": 1,
"hits": [
{
"_index": "ptwikionary_content",
"_type": "page",
"_id": "41115",
"_score": 1,
"_source": {
"redirect": [],
"template": [
"Predefinição:-pt-",
"Predefinição:cabeçalho-idioma",
"Predefinição:nome categoria",
"Predefinição:pt",
"Predefinição:nome língua",
"Predefinição:flex.pt",
"Predefinição:link preto",
"Predefinição:paroxítona",
"Predefinição:gramática",
"Predefinição:gramática/core",
"Predefinição:etimologia",
"Predefinição:escopo",
"MediaWiki:Log"
],
"content_model": "wikitext",
"heading": [
"Português",
"Adjetivo",
"Antônimo",
"Etimologia"
],
"source_text": """
={{-pt-}}=
==Adjetivo==
{{flex.pt|s=desumilde|p=desumildes}}
{{paroxítona|de|su|mil|de}}, {{gramática|c2g}}
# que não é [[humilde]]
===Antônimo===
* [[humilde]]
=={{etimologia|pt}}==
: {{escopo|Morfologia}} [[des-]] + [[humilde]].
[[Categoria:Adjetivo (Português)]]
""",
"version_type": "external",
"wiki": "ptwiktionary",
"auxiliary_text": [
" Singular Plural Masculino desumilde desumildes Feminino"
],
"language": "pt",
"title": "desumilde",
"version": 2491983,
"external_link": [],
"namespace_text": "",
"namespace": 0,
"text_bytes": 274,
"incoming_links": 2,
"text": "de.su.mil.de, comum aos dois géneros que não é humilde humilde (Morfologia) des- + humilde.",
"category": [
"!Entrada (Português)",
"Polissílabo (Português)",
"Paroxítona (Português)",
"Entrada com etimologia (Português)",
"Adjetivo (Português)"
],
"defaultsort": false,
"outgoing_link": [
"desumilde",
"desumildes",
"des-",
"feminino",
"humilde",
"masculino",
"plural",
"português",
"singular",
"Categoria:Português"
],
"timestamp": "2018-03-29T21:53:29Z",
"popularity_score": 0.000065252908710355
}
},
{
How i Get category list from elasticsearch??? Type Sql :

Use aggregation for this. Here's an example:
GET _search
{
"aggs" : {
"categories" : {
"terms" : { "field" : "category" }
}
}
}

Related

How to get inner hits field values in Nest or Elastic.Net library ? Alterantivly how to specify output type in Nest or Elastic.Net library?

I am new to elasticsearch and I am having troubles with the Nest/Elastic.Net library.
I would like to retrieve not the entire document but just part of it. I am able to do it in Postman but I cannot do it via Elastic.Net library or Nest library.
Document structure looks like following
{
“Doc_id”: “id_for_cross_refference_with_othersystem”
“Ocr”:[
{
“word”: “example_word1”,
“box”: [],
“cord”: “some_number”,
},
{
“word”: “example_word2”,
“box”: [],
“cord”: “some_number2”,
}
]
}
The document has a huge amount of properties but I am interested only in Doc_id , ocr.word, ocr.box and ocr.cord.
The following postman request fully satisfies my needs :
{
"query": {
"bool": {
"must": [
{
"match": {
"doc_id": "2a558865-7dc2-4e4d-ad02-3f683159984e"
}
},
{
"nested": {
"path": "ocr",
"query": {
"match": {
"ocr.word": "signing"
}
},
"inner_hits": {
"_source": {
"includes":[
"ocr.word",
"ocr.box",
"ocr.conf"
]
}
}
}
}
]
}
},
"_source":"false"
}
Result of that request is following :
{
"took": 9,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": 18.99095,
"hits": [
{
"_index": "irrelevant",
"_type": "irrelevant",
"_id": "irrelevant",
"_score": 18.99095,
"_source": {},
"inner_hits": {
"ocr": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 7.9260864,
"hits": [
{
"_index": "irrelevant",
"_type": "irrelevant",
"_id": "irrelevant",
"_nested": {
"field": "ocr",
"offset": 11
},
"_score": 7.9260864,
"_source": {
"box": [
],
"conf": "96.452858",
"word": "signing"
}
}
]
}
}
}
},
{
"_index": "there_rest _of_object_is_ommited",
},
{
"_index": "there_rest _of_object_is_ommited",
}
]
}
}
However when I try to convert that request to Nest Query DSL I am not able to achieve the same result.
When I try to use the NEST library I don’t see any way to provide output result model/type. It looks like the Type of Document should match the output type which is not my case.
Query that I am using :
var searchResponse = client2.Search<Model>(s => s
.Query(q1 => q1.Bool(b1 => b1.Must(s1 =>
s1.Match(m => m.Field(f => f.doc_id).Query("2a558865-7dc2-4e4d-ad02-3f683159984e")),
s2 => s2.Nested(n => n.Path("ocr").Query(q2 => q2.Bool(b => b.Must(m => m.Match(m => m.Field(f => f.ocr.First().word).Query("signing")))))
.InnerHits(ih => ih.Source(s => s.Includes(i => i.Field(f => f.ocr.First().word).Field(f => f.ocr.First().conf))))
)
)))
.Source(false)
);
Due to the fact that the Model type is created for a document and it doesn’t match the output type I am getting [null, null, null] as the output .
There is property such properties as Hits in ISearchResponse? But when I look into it I cannot see values of fields.
I tried using a low level client (Elastic.Net) and providing json request as a string. But It looks like there is not way of specifying the output type either. When I ran my code with the low level library it returns me 3 object of class Model with empty fields.
My questions are :
Is it possible to specify output type different from document type for Nest query DSL or Elatic.Net library ?
Is it possible to get values of the fields that I specified in request for inner hits with help of Nest or Elastic.Net libraries?
How would you solve such problem ? I mean we have huge documents and we don’t want to pass unnecessary information back and forth. The inner hits approach looks like a neat solution for us but it doesn’t look like it works with the recommended libraries Unless I am doing some silly mistake.
NOTE: I can achieve desired result with simple use of HTTPClient and manually doing what I need , but I hope to leverage library that is written for this purpose(Nest or Elastic.Net).

Error while remote indexing with elasticsearch

I'am trying to move from an ES cluster to another, in order to plan an update.
The both are same version (6.4). To achieve this, i'am using this command :
curl -XPOST -H "Content-Type: application/json" http://new_cluster/_reindex -d#reindex.json
And the reindex.json, is looking like this :
{
"source": {
"remote": {
"host": "http://old_cluster:9199"
},
"index": "megabase.33.2",
"query": {
"match_all": {}
}
},
"dest": {
"index": "megabase.33.2"
}
}
I whitelisted one the new cluster the old cluster, and its works but i can't go to the end of the migration of data, because i have this error, and i don't understand what it means here :
{
"took":1762,
"timed_out":false,
"total":8263428,
"updated":5998,
"created":5001,
"deleted":0,
"batches":11,
"version_conflicts":0,
"noops":0,
"retries":{
"bulk":0,
"search":0
},
"throttled_millis":0,
"requests_per_second":-1.0,
"throttled_until_millis":0,
"failures":[
{
"index":"megabase.33.2",
"type":"persona",
"id":"noYOA3IBTWbNbLJUqk6T",
"cause":{
"type":"mapper_parsing_exception",
"reason":"failed to parse [adr_inse]",
"caused_by":{
"type":"illegal_argument_exception",
"reason":"For input string: \"2A004\""
}
},
"status":400
}
]
}
The record in the original cluster look like this :
{
"took": 993,
"timed_out": false,
"_shards": {
"total": 4,
"successful": 4,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0,
"hits": [
{
"_index": "megabase.33.2",
"_type": "persona",
"_id": "noYOA3IBTWbNbLJUqk6T",
"_score": 0,
"_source": {
"address": "Obfucated",
"adr_inse": "2A004",
"age": 10,
"base": "Obfucated",
"city": "Obfucated",
"cp": 20167,
"email_md5": "Obfucated",
"fraicheur": "2020-01-12T19:39:04+01:00",
"group": 1,
"latlon": "Obfucated",
"partner": "Obfucated",
"partnerbase": 2,
"sex": 2,
"sms_md5": "Obfucated"
}
}
]
}
}
Any clue on what i'am doing wrong ?
Thanks a lot
Found out, the mapping is not well created when using only the reindex method.
So i drop the new indice, recreate mapping using elasticdump :
elasticdump --input=http://oldcluster/megabase.33.2 --output=http://newcluster/megabase.33.2 --type=mapping
Then run the previous script, everything works flawless (and was rather quick)

Generating data tables in elastic search

I'm trying to make a data table which consists of some calculations
******************************************************
** Bidder * Request * CPM * Revenue * Response Time **
******************************************************
I've created an index which holds all the data, so my data is stored in following format:
{
"data": {
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 78,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "nits_media_bid_won",
"_type": "nits_media_data_collection",
"_id": "MIyt6m8BWa2IbVphmPUh",
"_score": 1,
"_source": {
"bidderCode": "appnexus",
"width": 300,
"height": 600,
"statusMessage": "Bid available",
"adId": "43d59b34fd61b5",
"requestId": "2c6d19dcc536c3",
"mediaType": "banner",
"source": "client",
"cpm": 0.5,
"creativeId": 98493581,
"currency": "USD",
"netRevenue": true,
"ttl": 300,
"adUnitCode": "/19968336/header-bid-tag-0",
"appnexus": {
"buyerMemberId": 9325
},
"meta": {
"advertiserId": 2529885
},
"originalCpm": 0.5,
"originalCurrency": "USD",
"auctionId": "a628c0c0-bd4d-4f2a-9011-82fab780910e",
"responseTimestamp": 1580190231422,
"requestTimestamp": 1580190231022,
"bidder": "appnexus",
"timeToRespond": 400,
"pbLg": "0.50",
"pbMg": "0.50",
"pbHg": "0.50",
"pbAg": "0.50",
"pbDg": "0.50",
"pbCg": null,
"size": "300x600",
"adserverTargeting": {
"hb_bidder": "appnexus",
"hb_adid": "43d59b34fd61b5",
"hb_pb": "0.50",
"hb_size": "300x600",
"hb_source": "client",
"hb_format": "banner"
},
"status": "rendered",
"params": [
{
"placementId": 13144370
}
],
"nits_account": "asjdfagsd2384vasgd19",
"nits_url": "http://nitsmedia.local/run-ad",
"session_id": "YTGpETKSk2nHwLRB6GbP",
"timestamp": "2020-01-28T05:43:51.702Z",
"geo_data": {
"continent": "North America",
"address_format": "{{recipient}}\n{{street}}\n{{city}} {{region_short}} {{postalcode}}\n{{country}}",
"alpha2": "US",
"alpha3": "USA",
"country_code": "1",
"international_prefix": "011",
"ioc": "USA",
"gec": "US",
"name": "United States of America",
"national_destination_code_lengths": [
3
],
"national_number_lengths": [
10
],
"national_prefix": "1",
"number": "840",
"region": "Americas",
"subregion": "Northern America",
"world_region": "AMER",
"un_locode": "US",
"nationality": "American",
"postal_code": true,
"unofficial_names": [
"United States",
"Vereinigte Staaten von Amerika",
"États-Unis",
"Estados Unidos",
"アメリカ合衆国",
"Verenigde Staten"
],
"languages_official": [
"en"
],
"languages_spoken": [
"en"
],
"geo": {
"latitude": 37.09024000000000143018041853792965412139892578125,
"latitude_dec": "39.44325637817383",
"longitude": -95.7128909999999990532160154543817043304443359375,
"longitude_dec": "-98.95733642578125",
"max_latitude": 71.5388001000000031126546673476696014404296875,
"max_longitude": -66.8854170000000038953658076934516429901123046875,
"min_latitude": 18.77629999999999910187398199923336505889892578125,
"min_longitude": 170.595699999999993679011822678148746490478515625,
"bounds": {
"northeast": {
"lat": 71.5388001000000031126546673476696014404296875,
"lng": -66.8854170000000038953658076934516429901123046875
},
"southwest": {
"lat": 18.77629999999999910187398199923336505889892578125,
"lng": 170.595699999999993679011822678148746490478515625
}
}
},
"currency_code": "USD",
"start_of_week": "sunday"
}
}
},
//Remaining data set....
]
},
}
}
So as per my data set I want to fetch all unique bidderCode (which will be represented as Bidder in the table) and make the data with calculation respective to it. For example
Request - This will be total number of docs count in aggregation
CPM - CPM will be sum of all CPM divided by 1000
Revenue - Total CPM multiplied by 1000
Response time - Average of (responseTimestamp - requestTimestamp)
How can I achieve this, I'm bit confused with it. I tried building the blocks by:
return $this->elasticsearch->search([
'index' => 'nits_media_bid_won',
'body' => [
'query' => $query,
'aggs' => [
'unique_bidders' => [
'terms' => ['field' => 'bidderCode.keyword']
],
'aggs' => [
'sum' => [
'cpm' => [
'field' => 'cpm',
'script' => '_value / 1000'
]
]
],
]
]
]);
But it is showing me error:
{
"error":{
"root_cause":[
{
"type":"x_content_parse_exception",
"reason":"[1:112] [sum] unknown field [cpm], parser not found"
}
],
"type":"x_content_parse_exception",
"reason":"[1:112] [sum] unknown field [cpm], parser not found"
},
"status":400
}
I'm new to this help me out in it. Thanks.
ElasticSearch isn't wrong -- you've swapped the aggregation name with its type. It cannot parse the agg type cpm.
Here's the corrected query:
GET nits_media_bid_won/_search
{
"size": 0,
"aggs": {
"unique_bidders": {
"terms": {
"field": "bidderCode.keyword",
"size": 10
},
"aggs": {
"cpm": { <----------
"sum": { <----------
"field": "cpm",
"script": "_value / 1000"
}
}
}
}
}
}

How to save debug json to database in laravel?

I make a rest API with auth with api_token, and I use debugger by Lanin to show when the error happened.
If api_token is wrong the debug will show something like below
{
"error": "Unauthenticated.",
"debug": {
"database": {
"total": 1,
"items": [
{
"connection": "mysql",
"query": "select * from `users` where `api_token` = '630dcedd96aa370891b1649a972b577b2f262c90093b1903ae0a2ea63cf2c07cs' limit 1;",
"time": 2.89
}
]
},
"cache": {
"hit": {
"keys": [
"424f74a6a7ed4d4ed4761507ebcd209a6ef0937b"
],
"total": 1
},
"miss": {
"keys": [
"424f74a6a7ed4d4ed4761507ebcd209a6ef0937b",
"424f74a6a7ed4d4ed4761507ebcd209a6ef0937b:timer",
"424f74a6a7ed4d4ed4761507ebcd209a6ef0937b"
],
"total": 3
},
"write": {
"keys": [
"424f74a6a7ed4d4ed4761507ebcd209a6ef0937b:timer",
"424f74a6a7ed4d4ed4761507ebcd209a6ef0937b"
],
"total": 2
},
"forget": {
"keys": [],
"total": 0
}
},
"profiling": [
{
"event": "request-time",
"time": 0.2898838520050049
}
],
"memory": {
"usage": 11831808,
"peak": 11896088
}
}
}
and I want to save error log to the database. How to save and read data if I want to save this to the table database? I want to save the query
"debug": {
"database": {
"total": 1,
"items": [
{
"connection": "mysql",
**"query": "select * from `users` where `api_token` = '630dcedd96aa370891b1649a972b577b2f262c90093b1903ae0a2ea63cf2c07cs' limit 1;",
"time": 2.89**
}
]
},
With an appropriate model (Debug with "connection", "query" and "time" field), you could decode the json into an array and create the model from that.
Something like this :
$jsonArray = json_decode($rawJson, true);
$debug = Debug:create($jsonArray);

No results from search when passing more than one parameter in user metadata

I want to apply document level security in elastic, but once I provide more than one value in user metadata I get no matches.
I am creating a role and a user in elastic and passing values inside user metadata to the role on whose basis the search should happen. It works fine if I give one value.
For creating role:
PUT _xpack/security/role/my_policy
{
"indices": [{
"names": ["my_index"],
"privileges": ["read"],
"query": {
"template": {
"source": "{\"bool\": {\"filter\": [{\"terms_set\": {\"country_name\": {\"terms\": {{#toJson}}_user.metadata.country_name{{/toJson}},\"minimum_should_match_script\":{\"source\":\"params.num_terms\"}}}}]}}"
}
}
}]
}
And for user:
PUT _xpack/security/user/jack_black
{
"username": "jack_black",
"password":"testtest",
"roles": ["my_policy"],
"full_name": "Jack Black"
"email": "jb#tenaciousd.com",
"metadata": {
"country_name": ["india" , "japan"]
}
}
I expect the output to be results for india and japan only. If the user searches for anything else they should get no results.
However, I do not see any results at all:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}

Resources