How to index a document using Elasticsearch NEST dynamically? - elasticsearch

I want to put documents from different customers into different indexes in Elasticsearch.
Documents share a common structure plus a customer-specific part, named Data.
C#
class Entity
{
    public Guid CustomerId { get; set; }
    public IDictionary<string, object> Data { get; set; }
}
JSON
{
    "customerId": "10000000-0000-0000-0000-000000000000",
    "data": {......}
}
Say we have 10,000 "customers" with a million documents each. "Customers" may be created and deleted dynamically.
Is it good idea to put documents from different customers to different indexes in Elasticsearch?
Is it possible to create a new index in Elasticsearch based on customerId field of inserted document dynamically? How to do it using .Net client?
I'm looking for something like:
var index = entity.CustomerId;
client.CreateDocument<Entity>(entity, index);
PS I'm using Elasticsearch v7.6

Well, I found an answer. Still, it would be nice if someone could comment on whether this is a good strategy.
Here we go:
var indexName = entity.CustomerId.ToString();
var request = new IndexRequest<Entity>(entity, indexName);
var response = client.Index(request);
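A fuller sketch of the same pattern, assuming NEST 7.x and a cluster at localhost:9200 (both assumptions; names are illustrative):

```csharp
using System;
using System.Collections.Generic;
using Nest;

var client = new ElasticClient(
    new ConnectionSettings(new Uri("http://localhost:9200")));

var entity = new Entity
{
    CustomerId = Guid.NewGuid(),
    Data = new Dictionary<string, object> { ["field"] = "value" }
};

// Index names must be lowercase; Guid.ToString() already is.
var indexName = entity.CustomerId.ToString();

// The index on the request overrides the client's default index; with
// action.auto_create_index enabled (the cluster default) the index is
// created on first write, so no explicit CreateIndex call is needed.
var response = client.Index(entity, i => i.Index(indexName));
if (!response.IsValid)
    Console.Error.WriteLine(response.DebugInformation);

class Entity
{
    public Guid CustomerId { get; set; }
    public IDictionary<string, object> Data { get; set; }
}
```

On the strategic question: each index carries cluster-state and shard overhead, so at 10,000 customers an index per customer is usually discouraged in favor of fewer shared indices with routing (or a customerId filter); an index template matching the naming pattern is the usual way to control mappings for indices created on the fly.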

Related

Return subset of Redis Values matching specific property?

Let's say I have this kind of model:
public class MyModel
{
public long ID { get; set; }
public long ParentModelID { get; set; }
public long ReferenceID1 { get; set; }
public long ReferenceID2 { get; set; }
}
There are more attributes, but for example's sake, it is just this. There are around 5,000-10,000 rows of this model, currently stored in a Redis set.
Is there an efficient way in REDIS to query only a subset of the whole Data Set? For example, in LINQ I can do:
allModels.Where(m => m.ParentModelID == my_id);
or
allModels.Where(m => m.ReferenceID1 == my_referenceid);
Basically, I want to search the dataset without returning the whole thing and running LINQ queries over it, because querying and returning 10,000 rows just to get 100 is not efficient.
You can use an OHM (Object-Hash Mapper, like ORM) in your favorite language to achieve the LINQ-like behavior. There are quite a few listed under the "Higher level libraries and tools" section of the [Redis Clients page](https://redis.io/clients).
Alternatively, you can implement it yourself using the patterns described at https://redis.io/topics/indexes.
You can't use something like LINQ in Redis out of the box. Redis is just a key-value store, so it doesn't have the same principles or luxuries as a relational database. It doesn't have queries or relations, so something like LINQ just doesn't translate at all.
As a workaround, you could segment your data using different keys. Each key could reference a set that stores values with a specific range of reference Ids. That way you wouldn't need to retrieve all 10,000 items.
I would also recommend looking at hashes; depending on your use case they might be more appropriate than a set, as they're better at storing complex data objects.
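A minimal sketch of that segmentation/secondary-index idea with StackExchange.Redis (the key naming scheme here is an assumption, not part of the question):

```csharp
using StackExchange.Redis;

// Secondary index sets per looked-up value, so a query fetches only the
// matching IDs instead of scanning all 10,000 models.
var redis = ConnectionMultiplexer.Connect("localhost");
var db = redis.GetDatabase();

void AddModel(MyModel m)
{
    // The model itself, stored as a hash keyed by ID.
    db.HashSet($"model:{m.ID}", new[]
    {
        new HashEntry("ParentModelID", m.ParentModelID),
        new HashEntry("ReferenceID1", m.ReferenceID1),
        new HashEntry("ReferenceID2", m.ReferenceID2),
    });
    // Secondary indexes: one set of model IDs per queried value.
    db.SetAdd($"idx:parent:{m.ParentModelID}", m.ID);
    db.SetAdd($"idx:ref1:{m.ReferenceID1}", m.ID);
}

// Equivalent of allModels.Where(m => m.ParentModelID == 42):
RedisValue[] matchingIds = db.SetMembers("idx:parent:42");

public class MyModel
{
    public long ID { get; set; }
    public long ParentModelID { get; set; }
    public long ReferenceID1 { get; set; }
    public long ReferenceID2 { get; set; }
}
```

Note that deletes and updates must also remove the ID from the index sets, which is the maintenance cost of this pattern.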

How to not index nested collection but still store in Elasticsearch using NEST?

Is there a way in NEST to skip a nested collection or type from being indexed, but still include it with the document?
I can use the below to skip the property completely, not just from being indexed:
[ElasticProperty(OptOut=true)]
public List<MyClass> subtype { get; set; }
Using Index=FieldIndexOption.no appears to have no effect (the mapping looks the same):
[ElasticProperty(Index=FieldIndexOption.no)]
public List<MyClass> subtype { get; set; }
I want to avoid specifying FieldIndexOption.no on each property of the nested type. Is there another way?
Edit 1:
Code for creating the index:
elasticClient.CreateIndex("MyParentClass", new IndexSettings());
elasticClient.MapFromAttributes<MyParentClass>();
We're currently on version 0.12 of NEST (upgrade pending).
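No answer is quoted here, but for reference: in current NEST (7.x, well past the 0.12 used in the question) the object mapping's enabled setting does exactly this, keeping the field in _source while skipping indexing entirely. A sketch (names are illustrative):

```csharp
using System.Collections.Generic;
using Nest;

public class MyParentClass
{
    public string Title { get; set; }

    // enabled:false -> kept in _source, but never parsed or indexed
    [Object(Enabled = false)]
    public List<MyClass> Subtype { get; set; }
}

public class MyClass
{
    public string Whatever { get; set; }   // hypothetical payload
}

public static class Setup
{
    public static void CreateIndex(IElasticClient client)
    {
        // Fluent equivalent of the attribute, applied at index creation:
        client.Indices.Create("myindex", c => c
            .Map<MyParentClass>(m => m
                .AutoMap()
                .Properties(p => p
                    .Object<MyClass>(o => o
                        .Name(n => n.Subtype)
                        .Enabled(false)))));
    }
}
```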

Elasticsearch NEST - SortAscending doesn't sort documents

I am trying to sort the result set based on a field name, but Sort doesn't work with string types.
Tried Code:-
public class Company
{
    public long Number { get; set; }
    public string Name { get; set; }
}
My problem is: sorting is not done when I use the SortAscending API, like below:
var resultSet = client.Search<Company>(s => s
    .Type("Company")
    .From(0)
    .Size(200)
    .QueryString("Stack OverFlow")
    .SortAscending(f => f.Name));
Note: Documents are listed as Sorted if I set field name as Number(f => f.Number)
Please help
Your issue with sorting on the name field in your index is probably related to the fact that the field is being analyzed/tokenized. From the Elasticsearch Sort Guide:
For string based types, the field sorted on should not be analyzed / tokenized.
Therefore, you need to provide an additional field that is not analyzed/tokenized to perform your sort against. You can accomplish this by adding an additional field to your documents and setting the mapping for that type/field to not_analyzed or you can leverage multi_field (now just fields in version 1.x) on your existing name field. Please refer to the following for guidance on how to accomplish either of these options:
Multi-Fields (or Fields in v1.X)
Mapping
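The links above target Elasticsearch 1.x. On current Elasticsearch (5.x+), not_analyzed and multi_field became keyword sub-fields, and NEST 7.x expresses the same fix as below (a sketch; the sub-field name "raw" is an assumption):

```csharp
using Nest;

public class Company
{
    public long Number { get; set; }
    public string Name { get; set; }
}

public static class SortExample
{
    public static void Run(IElasticClient client)
    {
        // Map Name as analyzed text with a non-analyzed keyword sub-field.
        client.Indices.Create("companies", c => c
            .Map<Company>(m => m
                .Properties(p => p
                    .Text(t => t
                        .Name(n => n.Name)
                        .Fields(f => f
                            .Keyword(k => k.Name("raw")))))));

        // Search as before, but sort on the keyword sub-field.
        var result = client.Search<Company>(s => s
            .Index("companies")
            .From(0)
            .Size(200)
            .Query(q => q.QueryString(qs => qs.Query("Stack OverFlow")))
            .Sort(so => so.Ascending("name.raw")));
    }
}
```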

using NEST with Elastic Search for collections

I'm trying to get my hands dirty with Elastic Search via the NEST .Net api and running into a couple of problems. I suspect I've misunderstood something, or am modelling my docs incorrectly but would appreciate some help.
I have a document with collections in it. A trite example below:
public class Company
{
    public DateTime RegisteredOn { get; set; }
    public string Name { get; set; }
    [ElasticProperty(Type = FieldType.nested)]
    public List<Employee> Employees { get; set; }
}
public class Employee
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    [ElasticProperty(Type = FieldType.nested)]
    public List<SalesFigure> SalesFigures { get; set; }
}
public class SalesFigure
{
    public int AverageMonthlySaleValue { get; set; }
    public int AverageVolumeSold { get; set; }
}
I've created an index with some data at each level of the hierarchy, and before indexing I called client.MapFromAttributes<Company>();
The following works, but I'd like to understand how I'd find all companies with employees with a FirstName of "Bob", and/or all companies with employees who have an AverageMonthlySaleValue > $1100:
client.Search<Company>(query => query.Index("companies").Type("company")
    .From(0)
    .Size(100)
    .Filter(x => x.Term(n => n.Name, "Microsoft")));
Nested queries/filters have been suggested, as have suggestions that I ought to flatten my document, which I can do; but I'm trying to create a model that better represents the real domain, so I'm in a quandary.
Equally, I know that I'll also have to use facets at some point so want to structure everything correctly to support that.
Thanks
Tim
So it turns out there wasn't much wrong with the structure of my document. The example is trite; the real collection property I was querying was a string, not an int, so case sensitivity kicked in.
I had to change the query to use a lowercase string value for comparison. Something like the following worked:
client.Search<Company>(query => query.Index("companies")
    .Type("company")
    .From(0)
    .Size(100)
    .Filter(x => x.Term("company.employees.firstName", "microsoft")));
I've still to work out how to use a lambda in place of "company.employees.firstName", but it works for now.
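For the original question's nested part (companies with an employee named Bob), the lambda form being looked for is a nested query. A sketch in current NEST 7.x syntax (the 0.x-era API in the question differs; the lowercase "bob" matters because the default analyzer lowercases terms):

```csharp
using System.Collections.Generic;
using System.Linq;
using Nest;

// Minimal versions of the question's classes, with nested mappings assumed.
public class Company
{
    public string Name { get; set; }
    public List<Employee> Employees { get; set; }
}

public class Employee
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class NestedQueryExample
{
    public static ISearchResponse<Company> Run(IElasticClient client)
    {
        return client.Search<Company>(s => s
            .Index("companies")
            .Query(q => q
                .Nested(n => n
                    .Path(p => p.Employees)            // lambda instead of a string path
                    .Query(nq => nq
                        .Term(t => t
                            .Field(f => f.Employees.First().FirstName)
                            .Value("bob"))))));
    }
}
```

The .First() in the field expression is how NEST addresses a field inside a collection; it is mapping syntax only and is never executed against the list.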

How to use a Dictionary or Hashtable for LINQ query performance underneath an OData service

I am very new to OData (only started on it yesterday) so please excuse me if this question is too dumb :-)
I have built a test project as a Proof of Concept for migrating our current web services to OData. For this test project, I am using Reflection Providers to expose POCO classes via OData. These POCO classes come from in-memory cache. Below is the code so far:
public class DataSource
{
    public IQueryable<Category> CategoryList
    {
        get
        {
            List<Category> categoryList = GetCategoryListFromCache();
            return categoryList.AsQueryable();
        }
    }

    // below property is only required to allow navigation
    // from Category to Product via OData urls
    // eg: OData.svc/CategoryList(1)/ProductList(2) and so on
    public IQueryable<Product> ProductList
    {
        get
        {
            return null;
        }
    }
}

[DataServiceKeyAttribute("CategoryId")]
public class Category
{
    public int CategoryId { get; set; }
    public string CategoryName { get; set; }
    public List<Product> ProductList { get; set; }
}

[DataServiceKeyAttribute("ProductId")]
public class Product
{
    public int ProductId { get; set; }
    public string ProductName { get; set; }
}
To the best of my knowledge, OData is going to use LINQ behind the scenes to query these in-memory objects (i.e. the List in this case) if somebody navigates to OData.svc/CategoryList(1)/ProductList(2) and so on.
Here is the problem though: In the real world scenario, I am looking at over 18 million records inside the cache representing over 24 different entities.
The current production web services make very good use of .NET Dictionary and Hashtable collections to ensure very fast look ups and to avoid a lot of looping. So to get to a Product having ProductID 2 under Category having CategoryID 1, the current web services just do 2 look ups, ie: first one to locate the Category and the second one to locate the Product inside the Category. Something like a btree.
I wanted to know how could I follow a similar architecture with OData where I could tell OData and LINQ to use Dictionary or Hashtables for locating records rather than looping over a Generic List?
Is it possible using Reflection Providers or I am left with no other choice but to write my custom provider for OData?
Thanks in advance.
You will need to process expression trees, so you will need at least a partial IQueryable implementation over the underlying LINQ to Objects. For this you don't need a full-blown custom provider, though; just return your IQueryable from the properties on the context class.
In that IQueryable you would have to recognize filters on the "key" properties (.Where(p => p.ProductId == 2)) and translate them into a dictionary/hashtable lookup. Then you can use LINQ to Objects to process the rest of the query.
But if the client issues a query with a filter which doesn't touch the key property, it will end up doing a full scan. Although your custom IQueryable could detect that and fail such queries if you choose.
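The key-recognition step can be sketched in plain C# with an expression-tree check (the helper name and the KeyValuePair stand-in for Product are illustrative, not an OData API):

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Sketch: pull a "key == constant" comparison out of a predicate, so the
// lookup can hit a Dictionary instead of scanning the whole list.
static bool TryGetKeyLookup<T>(Expression<Func<T, bool>> predicate,
                               string keyProperty, out int key)
{
    key = 0;
    if (predicate.Body is BinaryExpression b &&
        b.NodeType == ExpressionType.Equal &&
        b.Left is MemberExpression m &&
        m.Member.Name == keyProperty &&
        b.Right is ConstantExpression c &&   // literal constants only; captured
        c.Value is int value)                // variables need extra handling
    {
        key = value;
        return true;
    }
    return false;
}

var products = new Dictionary<int, string> { [1] = "A", [2] = "B" };

Expression<Func<KeyValuePair<int, string>, bool>> filter = p => p.Key == 2;
if (TryGetKeyLookup(filter, "Key", out var id))
    Console.WriteLine(products[id]);   // dictionary hit, no scan: prints "B"
```

A real IQueryable wrapper would do this inside its provider's Execute method and fall back to LINQ to Objects for everything it doesn't recognize.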
