I have a web-based frontend (localhost, currently) that uses Ajax to query Solr.
It's working well, but if I submit a single space (nothing else) in the input/search box, the URL in the browser shows
...#q=%20
and in that circumstance I get a 400 error, and my web page stalls (doesn't refresh), apparently waiting for a response from Solr.
By comparison, if I submit a semicolon (;) rather than a space, then the page immediately refreshes, albeit with no results (displaying 0 to 0 of 0; expected).
My question is: what triggers the space-only (" ", i.e. %20) query fault in Solr, and how do I address it in solrconfig.xml?
Update: the following solrconfig.xml entry resolved the q=%20 issue described above.
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="hl">off</str>
    <str name="wt">html</str>
    <str name="df">query</str>
    <!-- *** ADDITION: *** -->
    <str name="defType">edismax</str>
  </lst>
</requestHandler>
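To verify the change, here is a minimal sketch (not the actual frontend code) that sends the same space-only query directly to the /select handler. The host, port, and collection name "mycollection" are placeholders, and wt=json is used only to make the check easy to read:

import requests

# Space-only query that previously produced the 400 error.
params = {
    "q": " ",
    "rows": 10,
    "wt": "json",   # JSON instead of the handler's default, just for this check
}
resp = requests.get("http://localhost:8983/solr/mycollection/select", params=params)
print(resp.status_code)                      # should no longer be 400 with defType=edismax
print(resp.json()["response"]["numFound"])   # hit count for the blank query (presumably 0, as in the semicolon case)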
Note: development code, dummy data.
Before: https://www.youtube.com/watch?v=GhkA4XlqWds&list=PLTtHvbtVBhk6zAqSJ3A1-shsD_h5i3IuP&index=2
After: https://www.youtube.com/watch?v=507DdPOx1xA&list=PLTtHvbtVBhk6zAqSJ3A1-shsD_h5i3IuP&index=1
Additional discussion (reddit:Solr): https://old.reddit.com/r/Solr/comments/kt97ql/solr_query_with_space_only_q20_gives_error/
Related
I am trying to retrieve data from an index in Elasticsearch. I configured the "QueryElasticSearchHttp" processor and it works just fine. However, when I try to use the ScrollElasticsearchHttp processor with the same URL, query, and index properties, with 'scroll' set to the default 1 minute, it doesn't work.
I get an error response of 404: "Elasticsearch returned code 404 with message Not found".
I am also tailing the log on the ES cluster and I see this error:
[DEBUG][o.e.a.s.TransportSearchScrollAction] [2] Failed to execute query phase
org.elasticsearch.transport.RemoteTransportException:[127.0.0.1:9300][indices:data/read/search[phase/query+fetch/scroll]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [2]
at org.elasticsearch.search.SearchService.getExecutor(SearchService.java:457) ~[elasticsearch-7.5.2.jar:7.5.2]
I am on Apache NiFi 1.10.0
Here is the config for the processor:
I should see a total of 441 hits, and with page size 20 I should see 23 queries being made to ES.
But I don't get a single result back. I have tried higher values for "scroll" and also played around with "page size" to no avail.
I also noticed that even though the ScrollElasticsearchHttp processor is set to run every 1m, I don't see the error repeated every minute in the ES log.
Update:
When I cleared the state via UI: "View state" -> "Clear State", I was able to make a single call, that returned a page full of hits in one flowfile.
However, there are more pages to be retrieved. How do I make the processor fetch the next page?
My understanding was that the single invocation of the ScrollElasticsearchHttp will page through all the result sets and bring in each page as one flowfile. Is this not correct?
Decrease the scheduling time to around 10-20 seconds; then, every 10-20 seconds, the processor will fetch the next set of records based on your page size.
You can check the state value while the fetch is in progress, i.e. you will find a scroll id in it. Once the fetch is complete, the state value changes to "finishedQuery": true.
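For context on why the scheduling interval and the stored scroll id matter, here is a rough sketch of the underlying Elasticsearch scroll API that ScrollElasticsearchHttp drives. This is not NiFi code; the index name and query are placeholders for the processor's properties:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# First call: run the query and open a scroll context kept alive for 1 minute
# (the processor's "scroll" property).
page = es.search(index="my-index", scroll="1m", size=20,
                 body={"query": {"match_all": {}}})
scroll_id = page["_scroll_id"]   # roughly what the processor keeps in its state

while page["hits"]["hits"]:
    for hit in page["hits"]["hits"]:
        pass  # each page of hits corresponds to one flowfile
    # Each follow-up call passes the scroll id back. If more time than the scroll
    # keep-alive passes between calls, the context expires and ES reports
    # SearchContextMissingException, as in the DEBUG trace above.
    page = es.scroll(scroll_id=scroll_id, scroll="1m")
    scroll_id = page["_scroll_id"]

es.clear_scroll(scroll_id=scroll_id)  # release the context when finished

This is why a processor scheduled every 1m against a 1m scroll keep-alive can get stuck, while a 10-20 second schedule keeps the context alive between pages.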
I am attempting to upload a large number of documents - about 7 million.
I have created actions for each document to be added and split them up into about 260 files, about 30K documents each.
Here is the format of the actions:
a = someDocument  # the document body, with nested fields
esActionFromFile = [{
    '_index': 'mt-interval-test-9',
    '_type': 'doc',
    '_id': 5641254,
    '_source': a,
    '_op_type': 'create'
}]
I have tried using helpers.bulk, helpers.parallel_bulk, and helpers.streaming_bulk and have had partial success using helpers.bulk and helpers.streaming_bulk.
Each time I run a test, I delete, and then recreate the index using:
# Refresh Index
es.indices.delete(index=index, ignore=[400, 404])
es.indices.create(index=index, body=mappings_request_body)
When I am partially successful, many documents are loaded, but eventually I get a 409 version conflict error.
I am aware that there can be version conflicts created when there has not been sufficient time for ES to process the deletion of individual documents after doing a delete by query.
At first, I thought that something similar was happening here. However, I realized that I am often getting the errors from files the first time they have ever been processed (i.e. even if the deletion was causing issues, this particular file had never been loaded, so there couldn't be a conflict).
The _id value I am using is the primary key from the original database I am extracting the data from, so I am certain the ids are unique. Furthermore, I have checked for unintentional duplication of records in my actions arrays and in the files I created them from, and there are no duplicates.
I am at a loss to explain why this is happening, and struggling to find a solution to upload my data.
Any assistance would be greatly appreciated!
There should be information attached to the 409 response that should tell you exactly what's going wrong and which document caused it.
Another thing that could cause this is a retry: when elasticsearch-py cannot connect to the cluster, it will resend the request to a different node. In some complex scenarios a request can thus end up being sent twice. This is especially true if you have enabled the retry_on_timeout option.
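One way to surface that per-document detail is to let the bulk helper report failures instead of raising. A rough sketch, assuming the same client and the esActionFromFile actions from the question:

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()

failed = []
for ok, result in helpers.streaming_bulk(es, esActionFromFile,
                                         raise_on_error=False,
                                         raise_on_exception=False):
    if not ok:
        # result looks like {'create': {'_index': ..., '_id': ..., 'status': 409, 'error': {...}}}
        op_type, info = result.popitem()
        failed.append(info)
        print(info.get("_id"), info.get("status"), info.get("error"))

print(len(failed), "actions failed")

Logging the 'error' object for a few of the 409s should show which _id values are being rejected and why, which usually narrows the cause down to duplicate ids or retried requests.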
I am trying to configure warmup queries in solrconfig.xml on Solr version 4.10.3, but no matter how we do it, the cache seems to disappear after about a minute or so, and then the first search again takes about 20 seconds, with subsequent searches returning immediately.
The query looks like this (filter is the variable search-term):
solr/Nyheder/select?q=overskrift:" & filter & "+OR+underrubrik:" & filter & "+OR+tekst:" & filter&fl=id+oprettet+overskrift+underrubrik+tekst+pix
&sort=oprettet+desc
And the solrconfig.xml section (which does not seem to help at all) looks like this (it is similar for event="firstSearcher"):
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">oprettet desc</str>
      <str name="fl">id oprettet overskrift underrubrik tekst pix</str>
    </lst>
    <lst>
      <str name="q">overskrift:* OR underrubrik:* OR tekst:*</str>
      <str name="sort">oprettet desc</str>
      <str name="fl">id oprettet overskrift underrubrik tekst pix</str>
    </lst>
  </arr>
</listener>
Edit: added commit configuration
<autoCommit>
  <maxTime>120000</maxTime>
  <openSearcher>true</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>
Every time you do a soft commit or a hard commit, your caches are more or less invalidated, since any type of commit will generally open a new searcher.
You probably have a soft commit or hard commit set to about 1 minute.
Check this link and see the sections Soft Commits and Hard commits.
It turned out that this was not related to Solr at all but to internal routing.
For those interested: the IPv6 DNS lookup was tried first and had to time out before the IPv4 address was used, so the delay came from a DNS lookup plus timeout and NOT from Solr.
I'm still trying to get to know Hazelcast and have to decide whether to use it or not.
I wrote a simple application in which I start up the cache on (single-node) server startup and load the map at the same time with about 400 entries. The object itself has two String fields. I have a service class that looks up the cache and tries to get all the values from the map.
However, I'm getting an OutOfMemoryError on Java heap space while trying to get the values out of the Hazelcast map. Eventually we plan to move to a 5-node cluster to start with.
Following is the cache spring config:
<hz:hazelcast id="instance">
  <hz:config>
    <hz:group name="dev" password=""/>
    <hz:properties>
      <hz:property name="hazelcast.merge.first.run.delay.seconds">5</hz:property>
      <hz:property name="hazelcast.merge.next.run.delay.seconds">5</hz:property>
    </hz:properties>
    <hz:network port="5701" port-auto-increment="false">
      <hz:join>
        <hz:multicast enabled="true" />
      </hz:join>
    </hz:network>
  </hz:config>
</hz:hazelcast>
<hz:map instance-ref="instance" id="statusMap" name="statuses" />
Following is where the error occurs:
map = instance.getMap("statuses");
Set<Status> statuses = (Set<Status>) map.values();
return statuses;
Any other method of IMap works fine. I tried getting the keySet and size and both worked fine. It is only when I try to get the values that the OutofMemory error shows up.
java.lang.OutOfMemoryError: Java heap space
I've tried the above with a standalone Java application and it works fine. I've also monitored with VisualVM and don't see any spike in used heap memory when the error occurs, which is all the more confusing. The available heap is 1 GB and the used heap was about 70 MB when the error occurred.
However, when I take out cache implementation from the application, it works fine going to the Database and getting the data.
I've also tried playing around with the Tomcat VM args, with no success. It is the same OutOfMemoryError when I access IMap.values(), with or without a SqlPredicate. Any help or direction in this matter will be greatly appreciated.
Thanks.
As the exception mentions, you're running out of heap space because the values() method tries to return all deserialized values at once. If they don't fit into memory, you'll likely get an OOME.
You can use paging to prevent this from happening: http://hazelcast.org/docs/latest/manual/html-single/hazelcast-documentation.html#paging-predicate-order-limit-
How big are your 400 entries?
And as Chris said, the whole data set is being pulled into memory.
In the future we'll replace this with an iteration-based approach, where we'll only pull small chunks into memory instead of the whole thing.
I figured out the issue. The Status object was implementing com.hazelcast.nio.serialization.Portable for serialization, but I had not configured the corresponding serialization factory. After I configured the factory as follows, it worked fine:
<hz:serialization>
  <hz:portable-factories>
    <hz:portable-factory factory-id="1" class-name="ApplicationPortableFactory" />
  </hz:portable-factories>
</hz:serialization>
Apologies for not giving the complete background initially; I only noticed it myself later on. Thanks for replying, though. I wasn't aware of the paging predicate, and I'm now using it for sorting and paging results. Thanks again.
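For completeness, here is a rough sketch of the paging approach, using the hazelcast-python-client rather than the Spring/Java setup above, purely to illustrate reading values() one page at a time. The predicate helpers (paging, true) are that client's names and should be checked against your client version:

import hazelcast
from hazelcast.predicate import paging, true

client = hazelcast.HazelcastClient()
statuses = client.get_map("statuses").blocking()

page_query = paging(true(), 50)   # match everything, 50 entries per page
while True:
    page = statuses.values(page_query)
    if not page:
        break
    for status in page:
        pass                      # handle one page at a time instead of all values
    page_query.next_page()

client.shutdown()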
Since we have found a solution for Hibernate, the server side loads the data very fast: less than a second for thousands of records and more. Now the problem is transporting the data from the server to the browser. Two issues:
1. The datagrid always waits until the ArrayCollection is fully loaded. We overcame this by specifying:
<destination id="someAssemble">
  <properties>
    <use-transactions>true</use-transactions>
    <source>com.assembler.SomeAssembler</source>
    <scope>application</scope>
    <item-class>vo.SomeVo</item-class>
    <network>
      <paging enabled="true" pageSize="50" />
    </network>
  </properties>
</destination>
With this, the datagrid started displaying quickly. The problem is that the server stops loading data at row 51. Is there a way to force Flex to keep loading the data in the background (by config or by code)?
2. If I try to load a big ArrayCollection (for example, more than 20K records), it locks up the whole browser. Is it possible to load it smoothly behind the scenes?
Please help! Thank you