Elasticsearch-py Bulk Percolate Functionality

I'm trying to get bulk percolate (i.e. mpercolate) working in Elasticsearch-py, but I haven't been able to find an example online. The single percolate function works for me, so this runs fine:
doc = {'doc' : {'field1' : 'this is a value', 'field2' : 'this is another value'}}
res = es.percolate(index = 'my_index', doc_type = 'my_doc_type', body = doc)
The documentation I've read so far seems to imply that for a bulk submission I need to send each header and body as a string, separated by newlines. So I tried:
head = {'percolate' : {'index' : 'my_index', 'type' : 'my_doc_type'}}
doc = {'doc' : {'field1' : 'this is a value', 'field2' : 'this is another value'}}
doc2 = {'doc' : {'field1' : 'values and values', 'field2' : 'billions of values'}}
query_list = [head, doc, head, doc2]
my_body = '\n'.join([str(qry) for qry in query_list])
res = es.mpercolate(body = my_body)
which gives me a generic "elasticsearch.exceptions.TransportError". Does anyone have a working example I can adapt?

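You don't have to serialize the data yourself. Just pass query_list in as body and the client should do the right thing. Note that str() on a dict produces Python repr output with single quotes, which is not valid JSON, and that is probably why the hand-built body is rejected.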
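To see what goes wrong with the hand-built body, here is a small sketch that needs no Elasticsearch server. It compares the str() version from the question with a newline-delimited JSON body built with json.dumps. The helper is_valid_json is a name made up for this illustration, and the final commented-out call only repeats what the answer suggests.

```python
import json

head = {'percolate': {'index': 'my_index', 'type': 'my_doc_type'}}
doc = {'doc': {'field1': 'this is a value', 'field2': 'this is another value'}}
doc2 = {'doc': {'field1': 'values and values', 'field2': 'billions of values'}}
query_list = [head, doc, head, doc2]

# str() yields Python repr with single quotes, which is not JSON.
broken_body = '\n'.join(str(qry) for qry in query_list)

# One JSON document per line, with a trailing newline.
ndjson_body = '\n'.join(json.dumps(qry) for qry in query_list) + '\n'


def is_valid_json(line):
    """Hypothetical helper: True if the line parses as JSON."""
    try:
        json.loads(line)
        return True
    except ValueError:
        return False


broken_ok = all(is_valid_json(line) for line in broken_body.splitlines())
ndjson_ok = all(is_valid_json(line) for line in ndjson_body.splitlines())
print(broken_ok, ndjson_ok)

# With a real client, the answer's suggestion would be:
# res = es.mpercolate(body=query_list)
```

Passing the list itself avoids the problem entirely, since the serialization is then left to the client.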

Related

EF.Functions.Contains including multiple keywords

I need to search on multiple columns (LearningModuleDesc and LearningModuleContent, which works using the || operator below), but I also need to search on multiple keywords. .NET Core 2.2 and EF Core do not support passing a string array to Contains (as in the example below), so some guidance on how to go about this would be great.
string[] stringarray = new string[] { "mill", "smith" };
var results = _context.LearningModules
.Where(x => EF.Functions.Contains(x.LearningModuleDesc, stringarray)
|| EF.Functions.Contains(x.LearningModuleContent, stringarray)
);
If I understand correctly, you are looking for something like this:
var results = _context.LearningModules.Where(
    x => stringarray.Any(t => x.LearningModuleDesc.Contains(t)) ||
         stringarray.Any(t => x.LearningModuleContent.Contains(t)));
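The logic of that predicate can be sketched outside EF as plain Python: a row matches if any keyword occurs in either column. This only illustrates the any-keyword, either-column check. It is a plain substring test on in-memory data, which is not the same thing as SQL Server full-text search, and the sample rows are made up.

```python
keywords = ["mill", "smith"]

# Made-up rows standing in for the LearningModules table.
modules = [
    {"LearningModuleDesc": "Operating the mill", "LearningModuleContent": "Safety basics"},
    {"LearningModuleDesc": "Welding", "LearningModuleContent": "written by smith"},
    {"LearningModuleDesc": "Painting", "LearningModuleContent": "Brush care"},
]


def matches(module, keywords):
    """True if any keyword is a substring of either column."""
    return (
        any(k in module["LearningModuleDesc"] for k in keywords)
        or any(k in module["LearningModuleContent"] for k in keywords)
    )


results = [m["LearningModuleDesc"] for m in modules if matches(m, keywords)]
print(results)
```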

How to search for an exact sub-string in AWS CloudSearch?

When I search for 'bcde' I would like to get all of the following matches:
'abcde'
'bcdef'
'abcdef'
How can I achieve this result in AWS CloudSearch (preferably with the simple query parser)? A prefix search will not give me the first result. Is there any other way?
After a few unsuccessful attempts with examples, I settled on the following:
I created a text-array field and stored every suffix of the string in it, from back to front, and it worked.
Example: my string is "abcde" and I search for "bcde". On the plain text field this would not match.
But the text-array field holds the following strings:
e, de, cde, bcde, abcde. So the search finds "abcde", because the term "bcde" is in the text-array field.
But what if I search for "bcd"? That term is not in the text-array field.
That is fine: the stored string "bcde" starts with "bcd", so a prefix search still finds it. IT WORKS! =)
My PHP code for the insert looks like this:
$term = "abcde";
$arrStr = str_split($term);
$arrTerms = [];
$aux = 1;
// Collect every suffix of $term: "e", "de", "cde", "bcde", "abcde"
foreach ($arrStr as $str) {
    $arrTerms[] = substr($term, ($aux * -1));
    $aux++;
}
// [your_id] is a placeholder for the document ID
$data = [
    'type' => 'add',
    'id' => [your_id],
    'fields' => [
        'id' => [your_id],
        'field-text' => $term,
        'field-text-array' => $arrTerms
    ],
];
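The suffix trick above can be checked with a short Python sketch. It generates the same suffix list as the PHP loop and then emulates a prefix search over the stored suffixes. The function names suffixes and substring_match are made up for this illustration, and the prefix test with startswith only stands in for whatever prefix search the index performs.

```python
def suffixes(term):
    """All suffixes of term, shortest first: 'abcde' -> e, de, cde, bcde, abcde."""
    return [term[-n:] for n in range(1, len(term) + 1)]


def substring_match(query, term):
    """Emulate a prefix search over the stored suffixes of term."""
    return any(s.startswith(query) for s in suffixes(term))


stored = ["abcde", "bcdef", "abcdef", "xyz"]
hits = [t for t in stored if substring_match("bcde", t)]
print(suffixes("abcde"))
print(hits)
```

Every substring of a string is a prefix of one of its suffixes, which is why storing the suffixes and searching by prefix covers the substring case. The cost is index size, since a string of length n contributes n terms.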
If your index field is of type "text", a simple structured query will return all the matches that include your query string.
Example
Query: ( and part_part_number:'009' )
Result:
1. _score: 10.379914, part_part_number: 009
2. _score: 10.379914, part_part_number: A-009-DY
3. _score: 10.379914, part_part_number: BY-009

NEST MultiGet search all types possible?

I have document ids that are unique across all types, and I would like to check which documents already exist in the Elasticsearch index. I tried this search:
var duplicateCheck = _elasticClient
.MultiGet(m => m.GetMany<object>(notices.Select(s => s.Id)).Fields("Id"));
but it returns the wrong result: every document has its found property set to false.
Update: here is a workaround.
var existingDocIds = _elasticClient.Search<ZPBase>(s => s
    .AllTypes()
    .Query(q => q.Ids(notices.Select(z => z.Id)))
    .Fields("Id")
    .Take(notices.Count)
);
notices = notices.Where(q => !existingDocIds.Hits.Any(s => s.Id == q.Id)).ToList();
From the Multi Get API documentation I realized that you can use something similar to the following code to solve your problem:
var response = _elasticClient.MultiGet(m => m
.Index(MyIndex)
.Type("")
.GetMany<ZPBase>(noticeIds));
Note the empty string passed as the Type.

Delete from elasticsearch all items where field does not match a certain value

I'm trying to delete all items from an Elasticsearch index where the field time_crawl_started does NOT match a specific value. I'm using a match_all query in combination with a NOT filter.
This is what I got so far:
$client = new Elasticsearch\Client();
$params = Array(
'index' => ...,
'type' => ...
);
$params['body']['query']['filtered']['query']['match_all'] = Array();
$params['body']['query']['filtered']['filter']['not']['term']['time_crawl_started'] = $someDate;
$client->deleteByQuery($params);
The problem is that this deletes all items, even ones having time_crawl_started set to $someDate, which is simply a datetime such as "2014-02-17 19:13:31".
How should I change this to delete only the items that don't have the correct date?
The problem was that the time_crawl_started field was analyzed, so the value was split into separate tokens at index time. A term filter on the full datetime string therefore never matched, and the NOT filter matched every document. I had to create the index manually (as opposed to automagically by just inserting a new document into a non-existing index) and specify a mapping for my item type, setting 'index' => 'not_analyzed' for time_crawl_started.
I ended up using a script filter like this:
$params['body']['query']['filtered']['query']['match_all'] = Array();
$params['body']['query']['filtered']['filter']['script']['script'] = "doc['time_crawl_started'].value != \"" . $someDate . "\"";
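To see why the analyzed field broke the NOT filter, here is a rough Python sketch. The analyze function is a crude stand-in for a text analyzer (an assumption for illustration, not Elasticsearch's actual tokenizer), and term_matches mimics an exact term lookup against the indexed terms.

```python
import re


def analyze(value):
    """Crude stand-in for an analyzer: lowercase and split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", value.lower())


def term_matches(indexed_terms, term):
    """A term filter compares the query term to the indexed terms exactly."""
    return term in indexed_terms


stored = "2014-02-17 19:13:31"

analyzed_terms = analyze(stored)   # several short tokens
not_analyzed_terms = [stored]      # the whole value kept as one term

analyzed_hit = term_matches(analyzed_terms, stored)
not_analyzed_hit = term_matches(not_analyzed_terms, stored)
print(analyzed_terms, analyzed_hit, not_analyzed_hit)
```

With the analyzed field the term lookup finds nothing, so negating it selects every document, which matches the behaviour described in the question.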

Paginate the results from Amazon product search

I am using the amazon_product gem to search for books on Amazon.
The search itself works fine, but it returns only the first 10 books.
I want to get all the search results and paginate them. How can I do this?
My code looks like this:
req = AmazonProduct["us"]
req.configure do |c|
c.key = "KEY"
c.secret = "SECRET_KEY"
c.tag = "TAG"
end
resp = req.search("Books", :power => params[:book][:search_term], :sort => "reviewrank")
@books = resp.to_hash["Items"]["Item"]
Their API definition at http://webservices.amazon.com/AWSECommerceService/AWSECommerceService.wsdl
lists the parameters "RelatedItemPage" and "ItemPage".
You should give this a try:
resp = req.search("Books", :power => params[:book][:search_term], :itemPage => 20)
Hope this helps.
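Since a page parameter returns one page per request, collecting all results means looping over page numbers. Below is a generic Python sketch of that loop. It does not call the Amazon API: fake_search_page and CATALOG are made-up stand-ins for a real request, and max_pages is an assumed cap you would set according to whatever limit the service imposes.

```python
def fetch_all(search_page, max_pages):
    """Request pages 1..max_pages and stop early on the first empty page."""
    items = []
    for page in range(1, max_pages + 1):
        batch = search_page(page)
        if not batch:
            break
        items.extend(batch)
    return items


# Made-up data source standing in for the real search request.
CATALOG = [f"book-{i}" for i in range(1, 26)]


def fake_search_page(page, page_size=10):
    """Hypothetical stand-in: return one page of results, 10 per page."""
    start = (page - 1) * page_size
    return CATALOG[start:start + page_size]


all_books = fetch_all(fake_search_page, max_pages=5)
print(len(all_books))
```

In the Ruby code from the question, the equivalent would be to repeat the search call with an increasing page value and append each page's items to one list.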
