Search by filters using views in CouchDB - view

I have a CouchDB database where I store models like this:
{
  "_id": "id",
  "_rev": "rev",
  "field_1": "test",
  "field_2": 45,
  "field_3": 15,
  "object_1": {
    "field_1_1": 123,
    "field_1_2": 125
  }
}
And I want to search for models by specific parameters in different ranges (filters).
For example, in one situation I need to find all the models with
field_2 from 10 to 50
field_3 from 10 to 20
object_1.field_1_1 from 100 to 150, object_1.field_1_2 from 120 to 130
In another case I need to find just all the models with field_2 from 10 to 50.
At the moment I have a view like this:
function (doc) {
  // Compound key: one array element per filterable field.
  emit([doc.field_2, doc.field_3, doc.object_1.field_1_1, doc.object_1.field_1_2], 1);
}
It generates this result:
{"id":"id","key":[45,15,123,125],"value":1}
I can use this array key to fetch the models I need, and I can use "startkey" and "endkey" to express ranges.
But is there a more efficient way to search by different filters in CouchDB, where the user selects which filters to apply and the rest are skipped? How can I combine different parameters?
And how can I skip parameters that were not chosen for the search (as in the second case)?
Thank you.

In CouchDB 2.x you can query the database through the /db/_find endpoint using Mango expressions.
Check the selector syntax to see whether it covers your needs.
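As a rough illustration of how optional filters could map onto a Mango selector, here is a minimal sketch. The helper name buildSelector and the { min, max } filter shape are assumptions made for this example and are not part of CouchDB; the field names follow the question. Only the $gte and $lte operators and dotted paths for nested fields are used.

```javascript
// Build a Mango selector from whichever range filters the user selected.
// Filters that were not chosen (null/undefined) are left out of the selector.
// buildSelector and the { min, max } shape are hypothetical, for illustration.
function buildSelector(filters) {
  const selector = {};
  for (const [field, range] of Object.entries(filters)) {
    if (!range) continue; // this filter was skipped by the user
    const condition = {};
    if (range.min !== undefined) condition.$gte = range.min;
    if (range.max !== undefined) condition.$lte = range.max;
    if (Object.keys(condition).length > 0) selector[field] = condition;
  }
  return selector;
}

// First case from the question: all four ranges are active.
const allFilters = buildSelector({
  "field_2": { min: 10, max: 50 },
  "field_3": { min: 10, max: 20 },
  "object_1.field_1_1": { min: 100, max: 150 },
  "object_1.field_1_2": { min: 120, max: 130 },
});

// Second case: only field_2 is selected, field_3 is explicitly skipped.
const oneFilter = buildSelector({
  "field_2": { min: 10, max: 50 },
  "field_3": null,
});

// JSON body that would be POSTed to /db/_find.
const requestBody = JSON.stringify({ selector: oneFilter });
console.log(requestBody);
```

Without a matching index, a selector like this is likely to scan the whole database, so it is probably worth creating indexes (via /db/_index) on the fields that are filtered most often.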

Related

ElasticSearch: query for N items of each category

I have an index of goods in ElasticSearch (5.5), in which every product has a field "category" with values like "GLOVES", "COAT", "TOWEL".
With the terms query I can select items belonging to several categories, e.g.
{
"terms": {
"div_id": ["COAT", "DRESS", "JACKET"]
}
}
Now the problem is that I want the response to contain several items of each type, say at least 3 (given that the total size of the response is 15 records).
I have no clear idea how to do this. The straightforward query may return any number of items from any category. The closest I have come is adding random_score, which makes the result diverse, but the mix then depends on what share of the index each category takes up.
I suspect there is a different approach, but I can't seem to guess the right keywords.
Thanks in advance!
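You may want to try the top hits aggregation (top_hits), documented in the Elasticsearch reference, as a sub-aggregation of a terms aggregation on the category field.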
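A minimal sketch of what such a request body could look like, assuming div_id is the category field from the question and is mapped so that a terms aggregation can run on it. The function name and the aggregation names by_category and sample_items are made up for this example.

```javascript
// Build a search body with one terms bucket per category and a top_hits
// sub-aggregation that returns a few documents from each bucket.
// buildPerCategoryQuery, by_category and sample_items are hypothetical names.
function buildPerCategoryQuery(categories, hitsPerCategory) {
  return {
    size: 0, // skip the flat hit list, only the aggregation is needed
    query: { terms: { div_id: categories } },
    aggs: {
      by_category: {
        terms: { field: "div_id", size: categories.length },
        aggs: {
          sample_items: { top_hits: { size: hitsPerCategory } },
        },
      },
    },
  };
}

// Three categories with five hits each gives at most 15 records in total.
const body = buildPerCategoryQuery(["COAT", "DRESS", "JACKET"], 5);
console.log(JSON.stringify(body, null, 2));
```

The documents then come back inside each bucket of the aggregation rather than in the top-level hits list, so the client has to collect them from there.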

Elasticsearch question, should I have duplicate data along 2 different indices? Not sure how to set up the data

Edit: 3 different indices. Sorry about the title :c
I am trying to learn Elasticsearch as fast as I can, but I think I've seriously confused myself here. How should I set this data up?
I have 3 major searches:
1: Search by pokemon name. Eg: Show all Charizard in the system.
2: Search by trainer name Eg: Show all of John Doe's pokemon/checkins at the pokecenter.
3: Search by checkins at the pokecenter.
Should each of these be in its own separate index? I come primarily from an SQL background, so my instinct is to have separate tables for all of these. But that isn't how Elasticsearch works, so I am really confused here.
Should I have a separate index for each pokemon?
And then another separate index for each trainer?
And then another separate index for each checkin at the pokecenter?
Query return examples
1: Search by pokemon name.
{
  "1": {
    "id": 9239329,
    "pokeId": 6,
    "name": "Charizard",
    "trainerId": 2932
  }
}
2: Search by trainer name
{
  "1": {
    "id": 2932,
    "name": "John Doe",
    "pokemon": [
      9239329
    ]
  }
}
3: Search by checkins at the pokecenter.
{
  "1": {
    "id": 3232,
    "date": "11/11/1111",
    "pokemon": [
      9239329
    ],
    "trainerId": 2932
  }
}
But if I have a separate index for EACH of these, while that would be fast, wouldn't it mean horrendous data duplication?
It depends on the scope of the project:
The ideal way is to give each one its own index. This lets you scale them differently if needed, move them to another cluster, and give each one different replica settings.
The quick way is to have the checkins as an index, with the trainer as a nested object and the pokemon as a nested object under that.
Note: nested queries are slower, and writing queries that return exactly what you want is a little trickier.

Search in multiple indexes in elastica

I am looking for a way to search in more than one index at the same time using Elastica.
I have an index products, and an index user.
products contains {product_id, product_name, price} and user contains {product_id, user_name, date}. The product_id is the same in both: in products each product_id is unique, but in user it is not, since a user can buy the same product multiple times.
Anyway, I want to automatically get the price of a product from the products index while searching through the user index.
I know that we can search over multiple indexes like so (correct me if I'm wrong) :
$search = new \Elastica\Search($client);
$search->addIndex('users')
->addType('user')
->addIndex('products')
->addType('product');
But the problem is, when I write an aggregation on the products_id for example and then create a new query with some filters :
$products_agg = new \Elastica\Aggregation\Terms('products_id');
$products_agg->setField('products_id')->setSize(0);
$query = new \Elastica\Query();
$query->addAggregation($products_agg);
$query->setQuery($bool);
$search->setQuery($query);
How does elastica know in which index to search? How can I link this products_id to the other index?
The Elastica library supports the Multi Search API, which executes several search requests within a single API call. The endpoint for it is _msearch.
The request format is similar to the bulk API: the first line of each pair is a header that names the index or indices to search, and the second line is the usual search request body.
{"index" : "products", "type": "products"}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10} // write your own query to get price
{"index" : "users", "type" : "user"}
{"query" : {"match_all" : {}}} // query for user
See the test case in Multi/SearchTest.php for a usage example.
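To make the header/body pairing concrete, here is a small sketch that serializes searches into the newline-delimited body that _msearch expects. The helper buildMsearchBody is a made-up name for this example and is not part of Elastica; the index names are taken from the question. Real request bodies must be plain JSON lines, so any annotations belong in the surrounding code rather than inside the payload.

```javascript
// Serialize (header, body) pairs into newline-delimited JSON for _msearch.
// Each search contributes two lines, and the payload ends with a newline.
// buildMsearchBody is a hypothetical helper, for illustration only.
function buildMsearchBody(searches) {
  return searches
    .map(({ header, body }) => JSON.stringify(header) + "\n" + JSON.stringify(body) + "\n")
    .join("");
}

const ndjson = buildMsearchBody([
  // Query against the products index (replace with a query that fetches prices).
  { header: { index: "products" }, body: { query: { match_all: {} }, from: 0, size: 10 } },
  // Query against the users index.
  { header: { index: "users" }, body: { query: { match_all: {} } } },
]);

console.log(ndjson);
```

Each search in the batch still runs independently, so this saves round trips but does not join the two result sets.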
Basically, you want to join two indices on a common field, as in SQL.
What you can do is model your data in a single index using the join datatype:
https://www.elastic.co/guide/en/elasticsearch/reference/master/parent-join.html
Index all documents in the same index.
Make every product document a parent.
Make every user document a child.
Then use parent-child aggregations and queries:
https://www.elastic.co/guide/en/elasticsearch/reference/master/parent-join.html#_parent_join_queries_and_aggregations
NOTE: be aware of the performance implications of parent-child mapping:
https://www.elastic.co/guide/en/elasticsearch/reference/master/parent-join.html#_parent_join_and_performance
Another option is to store all of the product's information with every user document that buys it.
This uses extra space and goes against the normalisation rules of relational data modelling.
But since this is a search engine, Elasticsearch recommends denormalising and duplicating data rather than using parent-child relationships.
you can try the following:
1- naming indexes with specific name like the following
myFirstIndex-myProjectName
mySecIndex-myProjectName
myThirdIndex-myProjectName
and so on.
2- that lets me use * in the index part of the request, because it accepts wildcards, so I can search across multiple indices like this using Kibana Dev Tools
GET *-myProjectName/_search
{
  "_source": {
    "excludes": [ "*" ]
  },
  "query": { "match_all": {} }
}
This will search every index whose name ends with -myProjectName.
You can't join two indices with different mappings in a single query. The best way to solve your problem is to run two queries (an application-side join): the first runs the aggregations on the user index, and the second fetches the prices.
Another option would be to add the price to the user index. Sometimes you have to sacrifice a little space for better usability.
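The application-side join could look roughly like the sketch below. The two arrays are made-up sample data standing in for the hits of the two queries, and attachPrices is a hypothetical helper. In practice the second query would probably be a terms query on the product_id values collected from the first.

```javascript
// Hypothetical sample hits standing in for the results of the two queries.
const userHits = [
  { product_id: 1, user_name: "alice", date: "2018-01-05" },
  { product_id: 2, user_name: "bob", date: "2018-01-06" },
  { product_id: 1, user_name: "bob", date: "2018-01-07" },
];
const productHits = [
  { product_id: 1, product_name: "pen", price: 2.5 },
  { product_id: 2, product_name: "notebook", price: 4 },
];

// Join in application code: look up each purchase's price by product_id.
// Purchases whose product is missing from the second result get price null.
function attachPrices(purchases, products) {
  const priceById = new Map(products.map((p) => [p.product_id, p.price]));
  return purchases.map((purchase) => ({
    ...purchase,
    price: priceById.has(purchase.product_id) ? priceById.get(purchase.product_id) : null,
  }));
}

const joined = attachPrices(userHits, productHits);
console.log(joined);
```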

Rethinkdb multiple level grouping

Let's say I have a table with documents like:
{
  "country": 1,
  "merchant": 2,
  "product": 123,
  ...
}
Is it possible to group all the documents into a final json structure like:
[
  {
    <country_id>: {
      <merchant_id>: {
        <product_id>: <# docs with this product_id/merchant_id/country_id>,
        ... (other product_id and so on)
      },
      ... (other merchant_id and so on)
    },
    ... (other country_id and so on)
  }
]
And if yes, what would be the best and most efficient way?
I have more than a million of these documents, on 4 shards with powerful servers (22 Gb cache each)
I have tried this (in the data explorer, in JS, for the moment):
r.db('foo')
.table('bar')
.indexCreate('test1', function(d){
return [d('country'), d('merchant'), d('product')]
})
and then
r.db('foo')
.table('bar')
.group({index: 'test1'})
But the data explorer seems to hang, still working on it as you can see...
.group({index: 'test1'}).count() will do something pretty similar to what you want, except it won't produce the nested document structure. To produce the nested document structure it would probably be easiest to ungroup, then map over the ungrouped values to produce objects of the form you want, then merge all of them.
The problem with group queries on the whole table, though, is that they won't stream: you'll need to traverse the whole table to get the end result back. The data explorer is meant for small queries, and I think it times out if your query takes more than 5 minutes to return, so if you're traversing a giant table it would probably be better to run that query from one of the clients.
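As one possible way to get the nested structure, the sketch below does the final merge on the client in plain JavaScript rather than in ReQL, which is a swap from the map-and-merge approach suggested above. It assumes the rows come from .group({index: 'test1'}).count().ungroup(), which yields objects with group and reduction fields. The sample rows and the nestCounts helper are made up for illustration.

```javascript
// Hypothetical rows in the shape produced by
// .group({index: 'test1'}).count().ungroup():
// group is the [country, merchant, product] key, reduction is the count.
const rows = [
  { group: [1, 2, 123], reduction: 4 },
  { group: [1, 2, 124], reduction: 1 },
  { group: [1, 3, 123], reduction: 2 },
  { group: [2, 2, 123], reduction: 7 },
];

// Fold the flat rows into { country: { merchant: { product: count } } }.
function nestCounts(ungrouped) {
  const result = {};
  for (const { group: [country, merchant, product], reduction } of ungrouped) {
    const byMerchant = (result[country] = result[country] || {});
    const byProduct = (byMerchant[merchant] = byMerchant[merchant] || {});
    byProduct[product] = reduction;
  }
  return result;
}

const nested = nestCounts(rows);
console.log(JSON.stringify(nested, null, 2));
```

The fold itself is a single pass over the grouped rows, so the expensive part remains the table traversal done by the group query.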

Use one field to compare to another field and filter in oData

Let's say I have data like this (lots of it):
{
"name" : "Coffee",
"quantity": 100,
"restock": 10
}
I want to use an OData $filter to show me ONLY items where the quantity is LESS than the restock number.
Is it possible to do something like $filter=quantity lt restock?
I know that specific example fails. Is there a way to do this? Or do I need to fetch everything and post process it?
That query should absolutely be possible in most (all?) versions of OData: see http://services.odata.org/V4/OData/OData.svc/Products?$filter=Rating lt Price for a working example.

Resources