Sort data inside a doc in Elasticsearch - sorting

I have a result of an Elasticsearch (6.8) query like this:
{
  "popular": 10,
  "sales": [
    { "price": 2 },
    { "price": 1 }
  ]
},
{
  "popular": 5,
  "sales": [
    { "price": 4 },
    { "price": 3 }
  ]
}
It is sorted by popular desc.
I also want the prices inside sales sorted asc.
I tried sorting by popular desc and sales.price asc, but that does not give me the result I want.
The result I want is:
{
  "popular": 10,
  "sales": [
    { "price": 1 },
    { "price": 2 }
  ]
},
{
  "popular": 5,
  "sales": [
    { "price": 3 },
    { "price": 4 }
  ]
}
Can I do this in Elasticsearch?

I don't think you can do this: Elasticsearch returns and sorts whole documents, so a sort clause orders the hits themselves, not the data inside each hit.
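For illustration, here is roughly what the sort from the question looks like in 6.x (a sketch; it assumes sales is mapped as a nested field, and the index name products is made up). It orders the hits by popular, then by each document's minimum sales.price, but the sales array inside each _source comes back exactly as it was indexed:

GET /products/_search
{
  "sort": [
    { "popular": { "order": "desc" } },
    {
      "sales.price": {
        "order": "asc",
        "nested": { "path": "sales" }
      }
    }
  ]
}

If reading the sorted prices from response metadata is acceptable, a nested query with inner_hits and a sort inside it can return the matching sales entries in order; the _source array itself still stays untouched, so otherwise the reordering has to happen client-side or at index time.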

Related

Trino count query for array

I have a JSON value in a column which is an array of objects. My requirement is to find the count of objects matching a filter.
Row 1: Column 1
{
"createdBy": 2,
"teams": [
{
"companyId" : 1,
"teamId": 1
},
{
"companyId" : 1,
"teamId": 2
}
]
}
Row 2: Column 1
{
"createdBy": 2,
"teams": [
{
"companyId" : 1,
"teamId": 3
},
{
"companyId" : 1,
"teamId": 4
}
]
}
Here, companyId 1 is present in 4 places, and I need a query that returns that count.
The query I tried:
select count(1) from a where any_match(
split(
trim(
'[]'
FROM
json_query(
a.teams,
'lax $.companyId' WITH ARRAY WRAPPER
)
),
',"'
),
x -> trim(
'"'
FROM
x
) = 2
)
Here it returns 1 per row because of any_match; I am not sure how to get the size instead.
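One approach that might work (a sketch; col1 stands in for the actual JSON column name, and it assumes the column is a VARCHAR holding the JSON shown above) is to cast the teams array to ARRAY(ROW(...)), filter it with a lambda, and sum the per-row counts with cardinality:

SELECT
  sum(
    cardinality(
      filter(
        CAST(json_extract(json_parse(col1), '$.teams')
             AS ARRAY(ROW(companyId INTEGER, teamId INTEGER))),
        t -> t.companyId = 1   -- keep only objects matching the filter
      )
    )
  ) AS matching_objects
FROM a

For the two rows above this returns 4, since every teams entry has companyId 1.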

Laravel: How to get a count from two tables?

I have two tables.
The first table:
- name_id
- name
The second table:
- counter_id
- name_id
- counter
I made a count button: each click inserts a '1' row into the counters table.
Finally, I want the result to look like this:
peter 4
sam 3
My code is:
$data = DB::table('name')
->join('counters', 'counters.name_id', '=', 'name.name_id')
->select('name.name', 'counters.name_id')
->get();
return $data;
and the result now is:
[
{
"name": "peter ",
"name_id": 1
},
{
"name": "peter ",
"name_id": 1
},
{
"name": "peter ",
"name_id": 1
},
{
"name": "sam",
"name_id": 2
}
]
You can do something like this by using collections in Laravel:
$data = [
[
"name" => "peter ",
"name_id" => 1
],
[
"name" => "peter ",
"name_id" => 1
],
[
"name" => "peter ",
"name_id" => 1
],
[
"name" => "sam",
"name_id" => 2
]
];
collect($data)->groupBy('name_id')->map(function ($items) {
    return [
        'name' => $items[0]['name'],
        'total' => count($items),
    ];
});
I solved it this way:
$data = DB::table('name')
    ->join('counters', 'counters.name_id', '=', 'name.name_id')
    ->select('name.name', DB::raw('count(counters.name_id) as total'))
    ->groupBy('name.name')
    ->get();

Nested documents to Elasticsearch using Logstash

Hi all, I am trying to index documents from MSSQL Server into Elasticsearch using Logstash. I want the documents to be ingested as nested documents, but I am getting an aggregate exception error.
Here is all my code.
Create table department(
ID Int identity(1,1) not null,
Name varchar(100)
)
Insert into department(Name)
Select 'IT Application development'
union all
Select 'HR & Marketing'
Create table Employee(
ID Int identity(1,1) not null,
emp_Name varchar(100),
dept_Id int
)
Insert into Employee(emp_Name,dept_Id)
Select 'Mohan',1
union all
Select 'parthi',1
union all
Select 'vignesh',1
Insert into Employee(emp_Name,dept_Id)
Select 'Suresh',2
union all
Select 'Jithesh',2
union all
Select 'Venkat',2
The final select statement:
SELECT
De.id AS id,De.name AS deptname,Emp.id AS empid,Emp.emp_name AS empname
FROM department De LEFT JOIN employee Emp ON De.id = Emp.dept_Id
ORDER BY De.id
The result should be like this (the original screenshot is omitted; rows reconstructed from the data above):
id | deptname                   | empid | empname
1  | IT Application development | 1     | Mohan
1  | IT Application development | 2     | parthi
1  | IT Application development | 3     | vignesh
2  | HR & Marketing             | 4     | Suresh
2  | HR & Marketing             | 5     | Jithesh
2  | HR & Marketing             | 6     | Venkat
My Elasticsearch mapping:
PUT /departments
{
"mappings": {
"properties": {
"id":{
"type":"integer"
},
"deptname":{
"type":"text"
},
"employee_details":{
"type": "nested",
"properties": {
"empid":{
"type":"integer"
},
"empname":{
"type":"text"
}
}
}
}
}
}
My Logstash config file:
input {
jdbc {
jdbc_driver_library => ""
jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
jdbc_connection_string => "jdbc:sqlserver://EC2AMAZ-J90JR4A\SQLEXPRESS:1433;databaseName=xxxx;"
jdbc_user => "xxxx"
jdbc_password => "xxxx"
statement => "SELECT
De.id AS id,De.name AS deptname,Emp.id AS empid,Emp.emp_name AS empname
FROM department De LEFT JOIN employee Emp ON De.id = Emp.dept_Id
ORDER BY De.id"
}
}
filter{
aggregate {
task_id => "%{id}"
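# note: the event['field'] bracket syntax used below was removed in Logstash 5.x,
# which is what raises the aggregate exception here; event.get('field') is required,
# as in the working example further down. The aggregate filter also needs a single
# pipeline worker (-w 1) to keep related rows in one map.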
code => "
map['id'] = event['id']
map['deptname'] = event['deptname']
map['employee_details'] ||= []
map['employee_details'] << {'empId' => event['empid'], 'empname' => event['empname'] }
"
push_previous_map_as_event => true
timeout => 5
timeout_tags => ['aggregated']
}
}
output{
stdout{ codec => rubydebug }
elasticsearch{
hosts => "https://d9bc7cbca5ec49ea96a6ea683f70caca.eastus2.azure.elastic-cloud.com:4567"
user => "elastic"
password => "****"
index => "departments"
action => "index"
document_type => "departments"
document_id => "%{id}"
}
}
While running Logstash I am getting the error below (the error screenshot and an Elasticsearch screenshot were attached for reference but are not included here).
My Elasticsearch output should be something like this:
{
"took" : 398,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "departments",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"id" : 1,
"deptname" : "IT Application development"
"employee_details" : [
{
"empid" : 1,
"empname" : "Mohan"
},
{
"empid" : 2,
"empname" : "Parthi"
},
{
"empid" : 3,
"empname" : "Vignesh"
}
]
}
}
]
}
}
Could anyone please help me resolve this issue? I want the empname and empid of all employees to be inserted as nested documents under their respective department. Thanks in advance.
Instead of the aggregate filter I used jdbc_streaming, and it is working fine; it might be helpful to someone looking at this post.
input {
jdbc {
jdbc_driver_library => "D:/Users/xxxx/Desktop/driver/mssql-jdbc-7.4.1.jre12-shaded.jar"
jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
jdbc_connection_string => "jdbc:sqlserver://EC2AMAZ-J90JR4A\SQLEXPRESS:1433;databaseName=xxx;"
jdbc_user => "xxx"
jdbc_password => "xxxx"
statement => "Select Policyholdername,Age,Policynumber,Dob,Client_Address,is_active from policy"
}
}
filter{
jdbc_streaming {
jdbc_driver_library => "D:/Users/xxxx/Desktop/driver/mssql-jdbc-7.4.1.jre12-shaded.jar"
jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
jdbc_connection_string => "jdbc:sqlserver://EC2AMAZ-J90JR4A\SQLEXPRESS:1433;databaseName=xxxx;"
jdbc_user => "xxxx"
jdbc_password => "xxxx"
statement => "select claimnumber,claimtype,is_active from claim where policynumber = :policynumber"
parameters => {"policynumber" => "policynumber"}
target => "claim_details"
}
}
output {
elasticsearch {
hosts => "https://e5a4a4a4de7940d9b12674d62eac9762.eastus2.azure.elastic-cloud.com:9243"
user => "elastic"
password => "xxxx"
index => "xxxx"
action => "index"
document_type => "_doc"
document_id => "%{policynumber}"
}
stdout { codec => rubydebug }
}
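With this config, each policy event carries its claims as an array of rows under the claim_details target field; the rubydebug output looks roughly like this (field values are made up for illustration):

{
  "policynumber" => "PN-1001",
  "policyholdername" => "John",
  "claim_details" => [
    {"claimnumber" => "CL-1", "claimtype" => "auto", "is_active" => true}
  ]
}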
You can also try to make use of the aggregate filter plugin in Logstash. Check this:
Inserting Nested Objects using Logstash
https://xyzcoder.github.io/2020/07/29/indexing-documents-using-logstash-and-python.html
I am only showing a single nested object here, but you can also build multiple arrays of items.
input {
jdbc {
jdbc_driver_library => "/usr/share/logstash/javalib/mssql-jdbc-8.2.2.jre11.jar"
jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
jdbc_connection_string => "jdbc:sqlserver://host.docker.internal;database=StackOverflow2010;user=pavan;password=pavankumar#123"
jdbc_user => "pavan"
jdbc_password => "pavankumar#123"
statement => "select top 500 p.Id as PostId,p.AcceptedAnswerId,p.AnswerCount,p.Body,u.Id as userid,u.DisplayName,u.Location
from StackOverflow2010.dbo.Posts p inner join StackOverflow2010.dbo.Users u
on p.OwnerUserId=u.Id"
}
}
filter {
aggregate {
task_id => "%{postid}"
code => "
map['postid'] = event.get('postid')
map['accepted_answer_id'] = event.get('acceptedanswerid')
map['answer_count'] = event.get('answercount')
map['body'] = event.get('body')
map['user'] = {
'id' => event.get('userid'),
'displayname' => event.get('displayname'),
'location' => event.get('location')
}
event.cancel()"
push_previous_map_as_event => true
timeout => 30
}
}
output {
elasticsearch {
hosts => ["http://elasticsearch:9200", "http://elasticsearch:9200"]
index => "stackoverflow_top"
}
stdout {
codec => rubydebug
}
}
So that example covers multiple ways of inserting data, such as aggregate, JDBC streaming, and other scenarios.

Filter on values present as a list of objects in elastic search

I am stuck at a point where I am unable to construct a query against a particular field that is actually a list of objects.
Sample document:
"sample_object" : {
"name" : "TRS",
"name_number" : 2096873,
"dob" : "2011-02-09",
"sample_nested_object_1" : {
"id" : 6,
"name" : "example name"
},
"sample_nested_object_list_1" : [
{
"id" : 15,
"number" : 12
},
{
"id" : 18,
"number" : 15
}
],
"sample_nested_object_list_2" : [
{
"id" : 2958179,
"name" : "example name 1",
"type" : "example type"
},
{
"id" : 2958180,
"name" : "example name 2",
"type" : "example type"
}
],
"sample_nested_object_2" : {
"id" : 4,
"name" : "sample name"
}
}
I have successfully made queries that filter according to name, name_number, dob.
I have to query the data set according to the filters applied in the front end. So I would receive a list of id(s) for sample_nested_object_list_1 and have to find the sample_object(s) whose sample_nested_object_list_1 contains any of the provided id(s). I also have to do the same for sample_nested_object_1.
UPDATE:
Index mapping:
sample_object= {
'name': StringField(),
'name_number': IntegerField(),
'dob': DateField(),
'sample_nested_object_1': ObjectField(
properties={
'id': IntegerField(),
'name': StringField(),
}
),
'sample_nested_object_list_1': NestedField(
properties={
'id': IntegerField(),
'number': IntegerField(),
}
),
'sample_nested_object_list_2': NestedField(
properties={
'id': IntegerField(),
'name': StringField(),
'type': StringField(),
}
),
'sample_nested_object_2': ObjectField(
properties={
'id': IntegerField(),
'name': StringField(),
}
),
}
Query:
I am unable to work out how to construct a query that retrieves the sample_object(s) that have sample_nested_object_list_1 with id 15 or 16 and also sample_nested_object_1 with an id of 6.
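A minimal sketch of such a query against the mapping above (the index name sample_index is made up): sample_nested_object_list_1 is a NestedField, so it needs a nested query, while sample_nested_object_1 is a plain ObjectField and can be filtered with an ordinary term clause:

GET /sample_index/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "nested": {
            "path": "sample_nested_object_list_1",
            "query": {
              "terms": { "sample_nested_object_list_1.id": [15, 16] }
            }
          }
        },
        { "term": { "sample_nested_object_1.id": 6 } }
      ]
    }
  }
}

Each id list from the front end maps to one terms clause (terms matches any of the provided ids), and all clauses in the filter array must match.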

Ruby MongodB - improving speed when working with multiple collections

I'm using MongoDB from Ruby via the mongo gem.
I have the following scenario:
for each document in a collection, say coll1, look at key1 and key2
search for a document in another collection, say coll2, with matching values for key1 and key2
if there is a match, add to the document fetched in #2 a new key key3, whose value is set to the value of key3 in the document referenced in #1
insert the updated hash into a new collection coll3
The general guideline with MongoDB has been to handle cross collection operations in application code.
So I do the following:
client = Mongo::Client.new([ '127.0.0.1:27017' ], :database => some_db,
:server_selection_timeout => 5)
cursor = client[:coll1].find({}, { :projection => {:_id => 0} }) # exclude _id
cursor.each do |doc|
doc_coll2 = client[:coll2].find('$and' => [{:key1 => doc[:key1]}, {:key2 => doc[:key2] }]).limit(1).first # no find_one method
if(doc_coll2 && doc[:key3])
doc_coll2[:key3] = doc[:key3]
doc_coll2.delete(:_id) # remove key :_id
client[:coll3].insert_one(doc_coll2)
end
end
This works, but it takes a long time to finish: approximately 250 ms per document in coll1, or about 3600 s (1 hour) for ~15000 records. That seems like a lot, and is probably down to reading documents one at a time, doing the check in application code, and then writing documents back one at a time to the new collection.
Is there a way to make this operation faster? Is the way I'm doing it even the right approach?
Example documents
coll1
{
"_id" : ObjectId("588610ead0ae360cb815e55f"),
"key1" : "115384042",
"key2" : "276209",
"key3" : "10101122317876"
}
coll2
{
"_id" : ObjectId("788610ead0ae360def15e88e"),
"key1" : "115384042",
"key2" : "276209",
"key4" : 10,
"key5" : 4,
"key6" : 0,
"key7" : "false",
"key8" : 0,
"key9" : "false"
}
coll3
{
"_id" : ObjectId("788610ead0ae360def15e88e"),
"key1" : "115384042",
"key2" : "276209",
"key3" : "10101122317876",
"key4" : 10,
"key5" : 4,
"key6" : 0,
"key7" : "false",
"key8" : 0,
"key9" : "false"
}
A solution would be to use aggregation instead and do all of this in a single query:
perform a join on key1 field with $lookup
unwind the array with $unwind
keep doc where coll1.key2 == coll2.key2 with $redact
reformat the document with $project
write it to coll3 with $out
so the query would be:
db.coll1.aggregate([
{ "$lookup": {
"from": "coll2",
"localField": "key1",
"foreignField": "key1",
"as": "coll2_doc"
}},
{ "$unwind": "$coll2_doc" },
{ "$redact": {
"$cond": [
{ "$eq": [ "$key2", "$coll2_doc.key2" ] },
"$$KEEP",
"$$PRUNE"
]
}},
{
$project: {
key1: 1,
key2: 1,
key3: 1,
key4: "$coll2_doc.key4",
key5: "$coll2_doc.key5",
key6: "$coll2_doc.key6",
key7: "$coll2_doc.key7",
key8: "$coll2_doc.key8",
key9: "$coll2_doc.key9",
}
},
{$out: "coll3"}
], {allowDiskUse: true} );
and db.coll3.find() would return
{
"_id" : ObjectId("588610ead0ae360cb815e55f"),
"key1" : "115384042",
"key2" : "276209",
"key3" : "10101122317876",
"key4" : 10,
"key5" : 4,
"key6" : 0,
"key7" : "false",
"key8" : 0,
"key9" : "false"
}
Edit: MongoDB 3.4 solution
If you don't want to specify all keys in the $project stage, you can take advantage of $addFields and $replaceRoot, two new operators introduced in MongoDB 3.4
the query would become:
db.coll1.aggregate([
{ "$lookup": {
"from": "coll2",
"localField": "key1",
"foreignField": "key1",
"as": "coll2_doc"
}},
{ "$unwind": "$coll2_doc" },
{ "$redact": {
"$cond": [
{ "$eq": [ "$key2", "$coll2_doc.key2" ] },
"$$KEEP",
"$$PRUNE"
]
}},
{$addFields: {"coll2_doc.key3": "$key3" }},
{$replaceRoot: {newRoot: "$coll2_doc"}},
{$out: "coll3"}
], {allowDiskUse: true})
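Since the question drives everything from the Ruby mongo gem, the same 3.4 pipeline can also be run from Ruby; a sketch, reusing the client from the question:

pipeline = [
  { '$lookup' => { 'from' => 'coll2', 'localField' => 'key1',
                   'foreignField' => 'key1', 'as' => 'coll2_doc' } },
  { '$unwind' => '$coll2_doc' },
  { '$redact' => { '$cond' => [{ '$eq' => ['$key2', '$coll2_doc.key2'] },
                               '$$KEEP', '$$PRUNE'] } },
  { '$addFields' => { 'coll2_doc.key3' => '$key3' } },
  { '$replaceRoot' => { 'newRoot' => '$coll2_doc' } },
  { '$out' => 'coll3' }
]
# Aggregation views are lazy; iterating forces execution, and $out writes coll3 server-side.
client[:coll1].aggregate(pipeline, allow_disk_use: true).to_a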
After toying around with this for some time, I realized that indexes had not been added. Adding indexes reduces the query run time by orders of magnitude.
To add the indexes, do the following:
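// note: ensureIndex is deprecated in recent MongoDB versions; createIndex takes the same key spec.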
db.coll1.ensureIndex({"key1": 1, "key2": 1});
db.coll2.ensureIndex({"key1": 1, "key2": 1});
With the indexes in place, the overall query run time dropped to a small fraction of what it was earlier.
The lesson: when working with large data sets, index the fields used in finds; that alone cuts query run time dramatically.
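For the Ruby driver used in the question, the equivalent index creation would be (a sketch):

client[:coll1].indexes.create_one(key1: 1, key2: 1)
client[:coll2].indexes.create_one(key1: 1, key2: 1)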
