How to store only a limited number of docs in Elasticsearch - elasticsearch

I need to store only, say, 10 documents in a particular index. The 11th document should replace the oldest one, i.e. the 1st, so that the index holds at most 10 documents at any time.
I am using Elasticsearch from Go.

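If you want to store only 10 documents,
compute the document ID as (document number % 10) + 1.
Use the returned value as the Elasticsearch _id field.
The formula only returns values from 1 to 10, so you can always index the document: a write to an existing _id overwrites the older document.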
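A minimal sketch of the modulo scheme, written in Python for brevity (the same arithmetic carries over to Go). A plain dict stands in for the Elasticsearch index, and `slot_id` and `CAPACITY` are hypothetical names, not part of any client library.

```python
CAPACITY = 10  # maximum number of documents to keep


def slot_id(document_no: int, capacity: int = CAPACITY) -> int:
    """Map a running document number to an _id in 1..capacity."""
    return (document_no % capacity) + 1


# A dict stands in for the index: writing to an existing _id overwrites it.
index = {}
for document_no in range(1, 26):  # store 25 documents
    index[slot_id(document_no)] = {"seq": document_no}

print(len(index))               # number of documents currently held
print(slot_id(1), slot_id(11))  # the 11th document reuses the 1st one's _id
```

Because the ID is derived from the running document number alone, no count query is needed before writing; the caller only has to keep track of that number.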

I'm assuming you will have fixed names for the documents, like 1, 2, ..., 10 or whatever. One approach is to use Redis to implement a circular list https://redis.io/commands/rpoplpush#pattern-circular-list (you can also implement the circular list yourself in code).
Basically, you should follow these steps:
Load the 10 values, in order, into the circular list, let's say 1, 2, 3, ..., 10.
When you want to store a document, extract an element from the Redis list; for our list that element will be 1.
Run a count query on your Elasticsearch index to get the number of documents in the index.
If the count is less than 10, send an insert request with your data, using the number extracted from the list as the document name. If the count equals 10, send an update request to Elasticsearch instead.
The circular list progresses like this:
The initial state of the list is [1, 2, ..., 10]. You extract 1, and after extraction it goes to the end of the list: [2, 3, ..., 10, 1].
On the second extraction the list is [2, 3, ..., 10, 1]. You extract 2, and after extraction it goes to the end of the list: [3, 4, ..., 1, 2].
And so on.
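The rotation described above can be sketched without Redis. This is an in-memory illustration only: a `collections.deque` stands in for the Redis list, a dict stands in for the Elasticsearch index, and `next_slot` is a hypothetical helper name.

```python
from collections import deque


def next_slot(ring: deque) -> int:
    """Take the slot at the head of the ring and move it to the tail."""
    slot = ring[0]
    ring.rotate(-1)  # the head element becomes the last element
    return slot


ring = deque(range(1, 11))  # [1, 2, ..., 10]
index = {}                  # stand-in for the Elasticsearch index

for seq in range(1, 13):    # store 12 documents
    slot = next_slot(ring)
    # In this sketch a write to an existing key replaces the old entry,
    # so the insert and update cases collapse into a single assignment.
    index[slot] = {"seq": seq}

print(list(ring))  # state of the ring after 12 extractions
print(len(index))  # number of documents held
```

With a real Redis list, the rotation would be a single atomic command, which matters when several writers share the list.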

Related

Google Sheets sum mapped text values by group

I'm looking to count the groups that have at least one occurrence of specific text values in a column. E.g., if any of term_01, term_02, or term_03 are in column B, for 1 or more records associated with a specific value in column A, count 1.
This is not too difficult with a helper column, but I'm trying to do it in one go, with a single formula.
Another way to think about this is that unless every record for a given column A value (e.g. group_01) has a value of term_04 in column B, add 1 to the value displayed in cell I1.
Solution with a helper column:
The helper can be created with an array formula:
Unfortunately, replacing the [sum_range] in SUMIF() with an array fails with "Error: Argument must be a range"
Is there a way to pass an array of values to SUMIF() instead of a range? Am I going about solving this problem the wrong way?
I would still be interested to know if there's a general solution to the question asked in the title, but I solved this specific problem with a slightly different approach.
If the number of records with a term_04 value equals the total number of records for that group, assign 0, otherwise assign 1, and sum the result.
=ArrayFormula(SUM(IF(COUNTIFS($D$3:$D$10, UNIQUE(D3:D10), $E$3:$E$10, $A$6)=(COUNTIF($D$3:$D$10, UNIQUE(D3:D10))), 0, 1)))
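To make the logic of the formula easier to check, here is the same counting rule as a small Python sketch. The records are hypothetical sample data standing in for columns D and E, and `EXCLUDED` plays the role of the term_04 cell.

```python
from collections import defaultdict

# Hypothetical (group, term) records standing in for columns D and E.
records = [
    ("group_01", "term_01"), ("group_01", "term_04"),
    ("group_02", "term_04"), ("group_02", "term_04"),
    ("group_03", "term_02"), ("group_03", "term_03"),
]
EXCLUDED = "term_04"

total = defaultdict(int)     # records per group (the COUNTIF part)
excluded = defaultdict(int)  # records per group equal to EXCLUDED (the COUNTIFS part)
for group, term in records:
    total[group] += 1
    if term == EXCLUDED:
        excluded[group] += 1

# A group scores 0 when every one of its records is EXCLUDED, otherwise 1.
count = sum(0 if excluded[g] == total[g] else 1 for g in total)
print(count)
```

Here only group_02 consists entirely of term_04 records, so it is the only group that does not contribute to the sum.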

How to query both a parent and a child index?

I have a parent index users and a child index purchase. Purchase has a field purchase_count, the running number of the purchase made by a user: the first purchase of a user has purchase_count = 1, the second has 2, and so on.
I want a query that returns the total number of users, the number of users who made a first purchase, the number who made a second, and so on. For example: All: 100, 1: 10, 2: 6, 3: 3, etc.
I know how to do it in two requests: first get the count of all users, then run a terms aggregation on purchases over the purchase_count field. Can I do it in a single query?
Elasticsearch has a datatype called parent-join (previously parent-child): https://www.elastic.co/guide/en/elasticsearch/reference/current/parent-join.html
That datatype requires parent and child documents to live in a single index. There are no joins across indices in Elasticsearch.
You probably want to look into parent-join for your use case, but you'll have to restructure your data to reside in a single index.
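As a rough sketch of what the restructured single index could look like, the request bodies below are built as plain Python dicts. The join field name `relation` and the relation names `user` and `purchase` are assumptions chosen for illustration, and the bodies have not been run against a cluster. The sketch also assumes one purchase document per user per purchase_count value, so that each bucket's document count equals the number of users who reached that purchase.

```python
import json

# Hypothetical mapping for a single index holding both users and purchases.
mapping = {
    "mappings": {
        "properties": {
            "relation": {"type": "join", "relations": {"user": "purchase"}},
            "purchase_count": {"type": "integer"},
        }
    }
}

# One search request with a filter aggregation per document type:
# "all_users" counts user documents, "purchases" buckets by purchase_count.
search_body = {
    "size": 0,
    "aggs": {
        "all_users": {"filter": {"term": {"relation": "user"}}},
        "purchases": {
            "filter": {"term": {"relation": "purchase"}},
            "aggs": {"by_count": {"terms": {"field": "purchase_count"}}},
        },
    },
}

print(json.dumps(search_body, indent=2))
```

In this layout the total would be read from the document count of `all_users`, and the per-purchase numbers from the `by_count` buckets of the same response.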

How to get distinct domain counts from a Frame in H2O

In H2O, after parsing a .csv file into a Frame object, how can we get the count of each distinct value in a particular column (Vec)?
For example, consider a column Fruits which has apple 3 times and mango 2 times. After parsing it to a frame, we can get distinct values using the domain() method, but how do you get distinct values along with their counts? In the example, I would be looking for:
apple,3
mango,2
You're looking for h2o.table.
From R:
fr <- as.h2o(iris)
h2o.table(fr[,"Species"])
From Python:
fr["Species"].table()

How to apply the StringToWordVector filter

I am trying to use the Weka GUI to classify some textual data.
I am using the StringToWordVector filter with the attribute indices option left at its default value, first-last.
However, when I change it to something such as 1, 500-last
it gives me an "invalid range list" error.
Initially my ARFF file has only 2 attributes:
class
text
Is there anything I am doing wrong?
I am pretty sure there are a lot of words in the text file, and when I run the filter with the default first-last it gives me a full 10,000 attributes.
The attribute indices option takes the index (or indices) of the attributes whose values you wish to convert to a word vector. You have two attributes: class with index 1 and text with index 2.
Setting first-last selects both; it very likely does nothing with class, since that is usually a single value, and builds a word vector from the text attribute.
To cut to the chase, your only options in this case are 2 or first-last, and the result will be the same. 500 is out of range since you have only 2 attributes.
PS. If you wish to use just a range of words from the resulting word vector, you can apply the Remove filter and specify the indices of the columns (words) you wish to remove.
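To illustrate why 500-last is rejected, here is a small Python sketch of how a 1-based range list could be resolved against the number of attributes. This is not Weka's implementation; `resolve_indices` is a hypothetical helper written only to show the bounds check.

```python
from typing import List


def resolve_indices(spec: str, num_attributes: int) -> List[int]:
    """Expand a 1-based range list such as "first-last" or "1,2" into indices."""

    def to_index(token: str) -> int:
        token = token.strip()
        if token == "first":
            return 1
        if token == "last":
            return num_attributes
        index = int(token)
        if not 1 <= index <= num_attributes:
            raise ValueError(f"index {index} is outside 1..{num_attributes}")
        return index

    selected = []
    for part in spec.split(","):
        start, _, end = part.partition("-")
        low = to_index(start)
        high = to_index(end) if end else low
        selected.extend(range(low, high + 1))
    return selected


# With the two attributes from the question (class = 1, text = 2):
print(resolve_indices("first-last", 2))
print(resolve_indices("2", 2))
try:
    resolve_indices("1, 500-last", 2)
except ValueError as err:
    print("invalid range list:", err)
```

The indices refer to the attributes of the input data (two here), not to the thousands of word attributes the filter produces afterwards.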

With G-WAN's key/value store, can more than one index be created for an entity?

With G-WAN's key/value store, can I create more than one index for a single type of entity?
Also, can I query more than one index at once, such as finding an item with age > 5 and height > 100 if I have indexed age and height?
"Can I create more than one index for a given single type of entity?"
If you mean having several indexes for multiple fields in a record (more than one value for a key), then yes, you can. Just look at the kv.c example: http://gwan.ch/source/kv.c
"Can I query more than one index at once, such as find an item with age > 5 and height > 100 if I indexed age and height?"
You can easily write a function to do that: find the records that appear both in the first search on the first index and in the second search on the second index.
This is very fast, as the results are returned sorted.
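The combining function mentioned above can be sketched as a linear merge of two sorted result lists. This Python sketch is not G-WAN code: it assumes both searches return record keys sorted in the same order, and the key lists are hypothetical sample data.

```python
def intersect_sorted(a, b):
    """Return the keys present in both sorted lists, in sorted order."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] cannot appear later in b, skip it
        else:
            j += 1  # b[j] cannot appear later in a, skip it
    return result


# Hypothetical record keys returned by each index search, already sorted.
age_over_5 = [2, 3, 5, 8, 13]
height_over_100 = [1, 3, 8, 21]

matches = intersect_sorted(age_over_5, height_over_100)
print(matches)
```

Because both inputs are sorted, each list is walked once, so the cost grows with the combined length of the two result sets rather than with their product.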
