Group by multiple fields in Spring Data MongoDB Aggregation


I have an aggregation operation criteria, and at some point, I need to group by two fields, but one of them is a nested field inside the object. I have this:
criteria.add(group("name", "amount.currency").count().as("number"))
But I get this error:
FieldPath field names may not contain '.'
How can I group by multiple fields, including nested ones, in Spring Data MongoDB?

Related

Filter by timestamp in nested fields in Kibana 4

Is it possible to create a mapping (e.g. nested) that allows to filter by individual timestamps of orders, where the orders are nested properties of the indexed documents (products)?
In other words, I would like to define a time range in Kibana and receive a list of matching products that contain any orders matching the given time range.

As far as I know, Kibana cannot handle nested JSON, so you first need to flatten it into standard JSON.

As per your question: "I would like to define a time range in Kibana and receive a list of matching products that contain any orders matching the given time range."
For this, you can load the data into Kibana and filter by time in the Kibana UI using the Time Filter, which will show you the orders matching the time range based on the timestamp field.

Elasticsearch query for what fields have a given type?

I have an elasticsearch (version 1.7) cluster with multiple indices. Each index has multiple doc_types, and each has fields w/ a variety of types. I'd like to get a list of field names for a given field type. This would be a necessarily nested list. For example, I'd like to query for field type "string", and return {index1: {doc_type1.1: [field1.1.1, field1.1.2], ...}} -- the leaves of this nested dict are only those fields w/ the given type. So the hits for this query won't be documents but rather a subset of the cluster's mapping. Is this possible using Elasticsearch?
One solution: I know I can get the mapping as a dict using Python, then work on the mapping dict to recover this nested list. But I think there should be an elasticsearch way of doing this, not a Python solution. In my searches through the documentation I just keep finding the "type filter" which filters by doc_type, not field type.

There's currently no way of achieving this. The _mapping endpoint will return all fields of the requested mapping type(s).
However, there might be a way, provided your fields have a special naming convention hinting at their type, for instance name_str (string field for "name"), age_int (integer field for "age"), etc. In this case, you could use response filtering on the _mapping call and retrieve only the fields ending with _str:
curl -XGET 'localhost:9200/yourindex/_mapping/yourtype?filter_path=*_str'
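For comparison, the client-side approach mentioned in the question (fetch the mapping as a dict, then walk it) is short to write. This is only a minimal sketch: it assumes the mapping has the usual index -> "mappings" -> doc_type -> "properties" shape, the function name fields_of_type and the sample mapping are made up for illustration, and multi-field sub-fields (the "fields" key) are not covered.

```python
def fields_of_type(mapping, wanted_type):
    """Return {index: {doc_type: [dotted field paths]}} for fields of wanted_type."""

    def walk(properties, prefix=""):
        found = []
        for name, spec in sorted(properties.items()):
            path = prefix + name
            if spec.get("type") == wanted_type:
                found.append(path)
            # Object and nested fields carry their own "properties" block.
            if "properties" in spec:
                found.extend(walk(spec["properties"], path + "."))
        return found

    result = {}
    for index, body in mapping.items():
        for doc_type, type_mapping in body.get("mappings", {}).items():
            fields = walk(type_mapping.get("properties", {}))
            if fields:
                result.setdefault(index, {})[doc_type] = fields
    return result


# Hypothetical mapping, shaped like the dict a mapping request returns.
sample_mapping = {
    "index1": {
        "mappings": {
            "doc_type1": {
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "address": {
                        "properties": {
                            "city": {"type": "string"},
                            "zip": {"type": "integer"},
                        }
                    },
                }
            }
        }
    }
}

print(fields_of_type(sample_mapping, "string"))
```

In practice the mapping dict would come from the cluster (for example via a client's get-mapping call) rather than being written by hand.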

Elasticsearch: Exclude non-matching nested objects in results

In Elasticsearch, is there any way to exclude the nested objects that don't match a particular query/filter from the resulting _source?
For example, let's say that a document has four objects in a nested field. Querying on the required filters results in only matching objects 1 and 3. When we get the results via _source, we will pull back the entire document along with objects 1,2,3,4.
Is it possible to exclude objects 2 and 4 from the results? Or is that something that we have to re-iterate and exclude using application-side logic?

At the moment there is no way to include only the matched nested objects in the result.
There is an inner_hits feature coming out in elasticsearch 1.5.0 which should help with this.

You can achieve this with inner_hits, which returns only the matching nested objects. You can then exclude the nested field itself from _source.
Suggested by Val at:
ElasticSearch - Get only matching nested objects with All Top level fields in search response
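A rough sketch of what such a request body could look like, built as a Python dict. The nested path "orders" and the field "orders.status" are hypothetical, and the exact key for source filtering ("exclude" here) differs between Elasticsearch versions, so treat this as an assumption to check against your version's documentation.

```python
import json


def nested_query_with_inner_hits(path, nested_query):
    """Build a search body that returns only the matching nested objects."""
    return {
        # Drop the full nested array from _source (key name is version-dependent).
        "_source": {"exclude": [path]},
        "query": {
            "nested": {
                "path": path,
                "query": nested_query,
                # Ask for the matching nested objects alongside each hit.
                "inner_hits": {},
            }
        },
    }


# Hypothetical nested path and field name, for illustration only.
body = nested_query_with_inner_hits("orders", {"match": {"orders.status": "shipped"}})
print(json.dumps(body, indent=2))
```

The matching nested objects then appear under inner_hits in each hit, while the top-level fields still come back in _source.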

Elasticsearch indexed database table column structure

I have a question regarding the setup of my elasticsearch index. I have created a table which I have rivered into elasticsearch. The table is built by a script that queries multiple tables to denormalize the data, making it easier to index by a unique id at a 1:1 ratio.
An example of a set of fields I have is street, city, state, zip, which I can query on. My question is: should I keep those fields individually indexed, or concatenate them into one big field like address which contains all of the previous fields? Or should I put in the extra time to set up parent-child indexes?
The use case example is that I have a customer with billing info coming from one direction, and I want to query elasticsearch to see if that customer already exists, or at least return the closest result.
I know this question is more conceptual than programming; I just can't find any information on best practices.

Concatenation
For the first part of your question: I wouldn't concatenate the different fields into a field containing all information. Having multiple fields gives you the advantage of calculating facets and aggregates on those fields, e.g. how many customers are from a specific city or have a specific zip. You can still use a match or multi_match query to query for information from different fields.
In addition to having the information in separate fields I would use multifields with an analyzed and not_analyzed part (fieldname.raw). This again allows for aggregates, facets and sorting.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/mapping-multi-field-type.html
Think of 'New York': if you analyze it, it will be stored as ['New', 'York'] and you will not be able to see all people from 'New York'. What you'd see are all people from 'New' and all people from 'York'.
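The effect on aggregates can be simulated without Elasticsearch. The sketch below is a simplification: it treats analysis as a plain whitespace split (a real analyzer typically also lowercases the tokens), and the city values are made up.

```python
from collections import Counter

# Made-up city values for four customers.
cities = ["New York", "New York", "New Orleans", "York"]

# Analyzed field: every value is split into tokens before it is indexed.
analyzed_counts = Counter(token for city in cities for token in city.split())

# not_analyzed field (e.g. city.raw): every value is kept as a single term.
raw_counts = Counter(cities)

print(analyzed_counts)  # counts per token
print(raw_counts)       # counts per full city name
```

Counting on the analyzed tokens gives buckets for 'New', 'York' and 'Orleans', while the raw values give one bucket per actual city, which is what a facet or aggregate on city needs.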
_all field
There is a special _all field in elasticsearch which does the concatenation in the background. You don't have to do it yourself. It is possible to enable/disable it.
Parent Child relationship
Concerning whether to use nested objects or a parent-child relationship: I think a parent-child relationship is more appropriate for your case. Inner objects that are not explicitly mapped with type nested are stored in a 'flattened' way, i.e. the information from the objects in an array is stored as part of one object. Consider the following example:
You have orders for two clients:
client: 'Samuel Thomson'
orderline: 'Strong Thinkpad'
orderline: 'Light Macbook'
client: 'Jay Rizzi'
orderline: 'Strong Macbook'
With flattened inner objects, if you search for clients who ordered 'Strong Macbook', you get both clients. This is because 'Samuel Thomson' and his orders are stored together, i.e. ['Strong' 'Thinkpad' 'Light' 'Macbook'], so there is no distinction between the two orderlines.
With parent-child documents, the orderlines for the same client are not mixed and preserve their identity.
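The difference between flattened storage and separately stored orderlines can be simulated in plain Python. This is only an illustration of the matching logic, not of how Elasticsearch works internally; the helper names are made up, and the data is the example above.

```python
clients = [
    {"client": "Samuel Thomson", "orderlines": ["Strong Thinkpad", "Light Macbook"]},
    {"client": "Jay Rizzi", "orderlines": ["Strong Macbook"]},
]


def tokens(text):
    """Very rough stand-in for analysis: lowercase and split on whitespace."""
    return set(text.lower().split())


def search_flattened(clients, query):
    """All orderline tokens of a client are pooled into one bag of terms."""
    wanted = tokens(query)
    hits = []
    for c in clients:
        pooled = set()
        for line in c["orderlines"]:
            pooled |= tokens(line)
        if wanted <= pooled:
            hits.append(c["client"])
    return hits


def search_separate(clients, query):
    """Each orderline keeps its identity; all query terms must match one line."""
    wanted = tokens(query)
    return [
        c["client"]
        for c in clients
        if any(wanted <= tokens(line) for line in c["orderlines"])
    ]


print(search_flattened(clients, "Strong Macbook"))
print(search_separate(clients, "Strong Macbook"))
```

The flattened search returns both clients, because 'Strong' and 'Macbook' both occur somewhere in Samuel Thomson's pooled terms; the per-orderline search returns only Jay Rizzi.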

Indexing nested documents in Elasticsearch with same field names

If I have an object of class Car that has a nested object of class Engine, where both classes have a field named "id", do I have to do anything special when I create the mapping? Or is it sufficient to add the type "nested" to the engine mapping?
Elasticsearch head GUI is showing unexpected rows, but the search seems to give the correct result so it would be good to know if I need to do anything else in the mapping if two or more objects have the same field name.
Seems like the structured query builder returns the engine document with the id that I search for when I select car.id from the dropdown.

There shouldn't be any problem, you can just use the dot notation to refer to the fields in the nested documents.
Also, if you have a single engine per car you don't need to declare the engine as nested in your mapping.
