ElasticSearch URI Search null field

I need to create a query via URI to filter all data between two dates and also if this date field is null.
For example:
I have the field "creation_date" on some objects, and I want objects that do not have this field to also appear in the results.
I tried something like the query below:
http://localhost//elasticsearch/channels/channel/_search?q=channel.schedule.creation_date:[2018-06-19 TO 2018-12-22] OR channel.schedule.creation_date: NULL
The date range comparison works fine; the problem is getting the NULL values.
Edited
Source sample:
"_source": {
"channel": {
"activated": false,
"approved": false,
"content": "Jvjv",
"creation_date": "2018-06-21T13:06:10.000Z",
"facebookLink": "J jv",
"id": "Kvjvjv",
"instagramId": "Jvjv",
"name": "Kbkbkvk",
"ownerId": "sZtxdhiNbNY9sr2DtiCzlgJfsqb2",
"plan": 0,
"purpose": "Jvjv",
"recurrence": 1,
"segment": "Jvjvjv",
"twitterId": "Jvjv",
"youtubeId": "Jvj"
}
}
}

You can do this using the NOT(_exists_:field_name) constraint.
Can you try this?
http://localhost//elasticsearch/channels/channel/_search?q=channel.schedule.creation_date:[2018-06-19 TO 2018-12-22] OR NOT(_exists_:channel.schedule.creation_date)
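For reference, here is a minimal sketch of issuing the same search from Python with the requests library; the localhost:9200 endpoint is an assumption based on the URL in the question, and requests takes care of URL-encoding the query string:

import requests

# Date range match OR missing field, using the query-string mini-language.
q = ('channel.schedule.creation_date:[2018-06-19 TO 2018-12-22] '
     'OR (NOT _exists_:channel.schedule.creation_date)')
resp = requests.get(
    'http://localhost:9200/channels/channel/_search',
    params={'q': q},
)
print(resp.json()['hits']['total'])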

Related

Is it possible to use cockroach gen_random_uuid() function inside JSON data while inserting into JSON datatype in cockroachDB

I am new to CockroachDB and was wondering if the following is possible.
One of the columns in my table is of JSON type, and the sample data in it is as follows:
{
    "first_name": "Lola",
    "friends": 547,
    "last_name": "Dog",
    "location": "NYC",
    "online": true,
    "Education": [
        {
            "id": "4ebb11a5-8e9a-49dc-905d-fade67027990",
            "UG": "UT Austin",
            "Major": "Electrical",
            "Minor": "Electronics"
        },
        {
            "id": "6724adfa-610a-4efe-b53d-fd67bd3bd9ba",
            "PG": "North Eastern",
            "Major": "Computers",
            "Minor": "Electrical"
        }
    ]
}
Is there a way to replace the "id" field in JSON as below to get the id generated dynamically?
"id": gen_random_uuid(),
Yes, this should be possible. To generate JSON data that includes a randomly-generated UUID, you can use a query like:
root@:26257/defaultdb> select jsonb_build_object('id', gen_random_uuid());
                jsonb_build_object
--------------------------------------------------
  {"id": "d50ad318-62ba-45c0-99a4-cb7aa32ad1c3"}
If you want to update JSON data that already exists in place, you can use the jsonb_set function (see JSONB Functions).
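As a rough sketch of that approach from Python, assuming a table named people with a JSONB column named data (both names are made up for illustration), and using psycopg2, since CockroachDB speaks the PostgreSQL wire protocol:

import psycopg2

conn = psycopg2.connect('postgresql://root@localhost:26257/defaultdb')
with conn:
    with conn.cursor() as cur:
        # Overwrite the "id" of the first Education entry with a fresh UUID.
        # Table/column names are hypothetical; adjust the path for other rows.
        cur.execute("""
            UPDATE people
            SET data = jsonb_set(
                data,
                ARRAY['Education', '0', 'id'],
                to_jsonb(gen_random_uuid()::STRING)
            )
        """)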

Indexing In ElasticSearch For Auditing

There is a microservice-based architecture wherein each service has a different type of entity. For example:
Service-1:
{
    "entity_type": "SKU",
    "sku": "123",
    "ext_sku": "201",
    "store": "1",
    "product": "abc",
    "timestamp": 1564484862000
}
Service-2:
{
    "entity_type": "PRODUCT",
    "product": "abc",
    "parent": "xyz",
    "description": "curd",
    "unit_of_measure": "gm",
    "quantity": "200",
    "timestamp": 1564484863000
}
Service-3:
{
    "entity_type": "PRICE",
    "meta": {
        "store": "1",
        "sku": "123"
    },
    "price": "200",
    "currency": "INR",
    "timestamp": 1564484962000
}
Service-4:
{
    "entity_type": "INVENTORY",
    "meta": {
        "store": "1",
        "sku": "123"
    },
    "in_stock": true,
    "inventory": 10,
    "timestamp": 1564484864000
}
I want to write an Audit Service backed by Elasticsearch, which will ingest all these entities and index them based on entity_type, store, sku, and timestamp.
Will Elasticsearch be a good choice here? Also, how will the indexing work? For example, if I search for store=1, it should return all the different entities that have store as 1. Secondly, will I be able to get all the entities between two timestamps?
Will ES and Kibana (to visualize) be good choices here?
Yes. Your use case is pretty much exactly what is described in the docs under filter context:
In filter context, a query clause answers the question “Does this document match this query clause?” The answer is a simple Yes or No — no scores are calculated. Filter context is mostly used for filtering structured data, e.g.
Does this timestamp fall into the range 2015 to 2016?
Is the status field set to published?
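As a hedged sketch of what those filters could look like against such an audit index (the index name audit, the local endpoint, and flat store/timestamp fields are assumptions; entities that nest store under meta would need meta.store instead):

import requests

# Match store=1 AND a timestamp window, in filter context (no scoring).
body = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"store": "1"}},
                {"range": {"timestamp": {"gte": 1564484862000,
                                         "lte": 1564484962000}}},
            ]
        }
    }
}
resp = requests.post('http://localhost:9200/audit/_search', json=body)
for hit in resp.json()['hits']['hits']:
    print(hit['_source']['entity_type'])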

mgo with aggregation and grouping

I am trying to perform a query using the Go mgo package to effectively get distinct values from a join. I understand that this might not be the best paradigm to work with in Mongo.
Something like this:
pipe := []bson.M{
    {
        "$group": bson.M{
            "_id": bson.M{"user": "$user"},
        },
    },
    {
        "$match": bson.M{
            "_id":  bson.M{"$exists": 1},
            "user": bson.M{"$exists": 1},
            "date_updated": bson.M{
                "$gt": durationDays,
            },
        },
    },
    {
        "$lookup": bson.M{
            "from":         "users",
            "localField":   "user",
            "foreignField": "_id",
            "as":           "user_details",
        },
    },
    {
        "$lookup": bson.M{
            "from":         "organizations",
            "localField":   "organization",
            "foreignField": "_id",
            "as":           "organization_details",
        },
    },
}
err := d.Pipe(pipe).All(&result)
If I comment out the $group section, the query returns the join as expected.
If I run it as is, I get NULL.
If I move the $group to the bottom of the pipe, I get an array response with null values.
Is it possible to do an aggregation with a $group (with the goal of simulating DISTINCT)?
The reason you're getting NULL is that your $match filter is filtering out all of the documents after the $group stage.
After your first $group stage, the documents look like this:
{"_id": { "user": "foo"}},
{"_id": { "user": "bar"}},
{"_id": { "user": "baz"}}
They no longer contain the other fields, i.e. user, date_updated, and organization. If you would like to keep their values, you can use a Group Accumulator Operator. Depending on your use case you may also benefit from Aggregation Expression Variables.
As an example using the mongo shell, let's use the $first operator, which picks the first occurrence. This may make sense for organization but not for date_updated, so please choose a more appropriate accumulator operator for that field.
{"$group": {
"_id":"$user",
"date_updated": {"$first":"$date_updated"},
"organization": {"$first":"$organization"}
}
}
Note that the above also replaces {"_id":{"user":"$user"}} with simpler {"_id":"$user"}.
Next we'll add a $project stage to rename the _id field produced by the group stage back to user, and carry the other fields along unmodified.
{"$project": {
"user": "$_id",
"date_updated": 1,
"organization": 1
}
}
Your $match stage can be simplified to just the date_updated filter. We can remove _id because it's no longer relevant at this point in the pipeline, and if you would like to make sure that you only process documents that have a user value, you should place a $match before the $group. See Aggregation Pipeline Optimization for more.
So, all of those combined will look something like this:
[
    {"$group": {
        "_id": "$user",
        "date_updated": {"$first": "$date_updated"},
        "organization": {"$first": "$organization"}
    }},
    {"$project": {
        "user": "$_id",
        "date_updated": 1,
        "organization": 1
    }},
    {"$match": {
        "date_updated": {"$gt": durationDays}
    }},
    {"$lookup": {
        "from": "users",
        "localField": "user",
        "foreignField": "_id",
        "as": "user_details"
    }},
    {"$lookup": {
        "from": "organizations",
        "localField": "organization",
        "foreignField": "_id",
        "as": "organization_details"
    }}
]
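If it helps to verify the stages outside of Go first, here is a rough pymongo sketch of running the same pipeline; the database and collection names are placeholders, and durationDays should be whatever cutoff the Go code uses:

from pymongo import MongoClient

db = MongoClient('mongodb://localhost:27017')['mydb']
durationDays = 0  # placeholder: use the same cutoff as the Go code

pipeline = [
    {"$group": {
        "_id": "$user",
        "date_updated": {"$first": "$date_updated"},
        "organization": {"$first": "$organization"},
    }},
    {"$project": {"user": "$_id", "date_updated": 1, "organization": 1}},
    {"$match": {"date_updated": {"$gt": durationDays}}},
    {"$lookup": {"from": "users", "localField": "user",
                 "foreignField": "_id", "as": "user_details"}},
    {"$lookup": {"from": "organizations", "localField": "organization",
                 "foreignField": "_id", "as": "organization_details"}},
]
for doc in db.mycollection.aggregate(pipeline):
    print(doc)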
Lastly (I know you're already aware of this): based on the database schema above, with users and organizations collections, you may want to re-consider embedding some values, depending on your application's use case. You may find 6 Rules of Thumb for MongoDB Schema Design useful.

Aggregations on PyElasticSearch (pyes)

I wish to calculate value-count aggregations on some indexed product data, but I seem to be getting some parameters in the ValueCountAgg constructor wrong.
An example of such indexed data is as follows -:
{
    "_index": "test-index",
    "_type": "product_product",
    "_id": "1",
    "_score": 1,
    "_source": {
        "code": "SomeProductCode1",
        "list_price": 10,
        "description": null,
        "displayed_on_eshop": "true",
        "active": "true",
        "tree_nodes": [],
        "id": 1,
        "category": {},
        "name": "This is Product",
        "price_lists": [
            {
                "price": 10,
                "id": 1
            },
            {
                "price": 10,
                "id": 2
            }
        ],
        "attributes": {
            "color": "blue",
            "attrib": "something",
            "size": "L"
        },
        "type": "goods"
    }
}
I'm calculating aggregations as follows -:
for attribute in filterable_attributes:
    count = ValueCountAgg(
        name='count_' + attribute, field='attributes.' + attribute
    )
    query.agg.add(count)
where query is a pyes.query.Query object wrapped inside a pyes.query.Search object. filterable_attributes is a list of attribute names, such as color and size.
I have tried setting field=attribute as well, but it seems to make no difference. The resultset that I obtain on conducting the search has the following as its aggs attribute -:
{'count_size': {'value': 0}, 'count_color': {'value': 0}}
where size and color are indexed inside the attributes dictionary as shown above. These are evidently wrong results, and I think it is because I am not setting field properly.
Where am I going wrong?
I've found where I was going wrong.
According to Scoping Aggregations, the scope of an aggregation is by default associated with its query. My query was returning zero results, so I had to modify the search phrase accordingly.
I got the required results after that, and the aggregations are coming out right.
{'count_size': {'value': 3}, 'count_color': {'value': 3}}
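For anyone hitting the same thing, here is a minimal sketch of the fixed setup, reusing the pyes objects from the question; the match-all query and the pyes.aggs import path are assumptions, standing in for whatever corrected search actually matches your documents:

from pyes import ES
from pyes.aggs import ValueCountAgg
from pyes.query import MatchAllQuery, Search

conn = ES('localhost:9200')

# Aggregations are scoped to the query, so the query must match documents.
search = Search(MatchAllQuery())
for attribute in ('color', 'size'):
    search.agg.add(ValueCountAgg(
        name='count_' + attribute, field='attributes.' + attribute
    ))

results = conn.search(search, indices='test-index',
                      doc_types='product_product')
print(results.aggs)  # e.g. {'count_size': {'value': 3}, ...}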

CouchDB: Trouble querying a view with a key using rewrites

In CouchDB I have created a view called "zip"; the map function looks like this:
function (doc) {
    if (doc.type == 'zip') {
        emit(doc.zip_code, doc)
    }
}
I then added a bunch of docs related to zip codes; a sample doc looks like this:
{
    "_id": "zip/48114",
    "_rev": "1-990b2c4f682ed0b6a27e2fa0c066c93d",
    "zip_code": 48114,
    "state": null,
    "county": null,
    "rep_code1": "INTL2",
    "rep_code2": "MI1",
    "type": "zip"
}
Now when I query the view directly, like so:
http://localhost:5984/partslocator/_design/partslocator/_view/zip?key=48114
I get back the row that I am expecting:
{
    "total_rows": 41683,
    "offset": 20391,
    "rows": [
        {
            "id": "zip/48114",
            "key": 48114,
            "value": {
                "_id": "zip/48114",
                "_rev": "1-990b2c4f682ed0b6a27e2fa0c066c93d",
                "zip_code": 48114,
                "state": null,
                "county": null,
                "rep_code1": "INTL2",
                "rep_code2": "MI1",
                "type": "zip"
            }
        }
    ]
}
I have then set up a vhost and am using rewrites, and my rewrite for 'zip' looks like this.
{from: "/zip/:zip", to: "_view/zip", query: {"key": ":zip"}}
To me this seems like it should be correct; however, when I try to query the view through the rewrite URL, it always returns zero rows.
rewrite url:
http://partslocatordev.com:5984/zip/48114
response:
{
    "total_rows": 41683,
    "offset": 41683,
    "rows": []
}
Am I missing anything here?
Note: I am using rewrites in the same fashion with other views and they work, but I cannot figure out why this one in particular isn't working.
It's likely that the rewriter is querying zip?key=":zip" (a JSON string) rather than zip?key=:zip (a number), so the string key never matches the numeric keys in the view. You can use a formats field in your rewriter to declare how the arguments should be typed. In this case, try this:
{
    from: "/zip/:zip",
    to: "_view/zip",
    query: {"key": ":zip"},
    formats: {
        "zip": "int"
    }
}
Alternatively, in your map function, emit a string as the key rather than a number, like this:
function (doc) {
    if (doc.type == 'zip') {
        emit(String(doc.zip_code), doc)
    }
}
That will handle cases where the zipcode isn't an integer, like in the UK.
