I am currently assessing if we can move our Solr based backend to Elasticsearch.
However, something I can't work out is whether Elasticsearch has an equivalent of a custom request handler in Solr (as would be configured in solrconfig.xml).
For context, in our Solr configuration, we have a number of statically defined request handlers with a set of pre-configured facets, ranged facets, facet pivots. Something akin to the below, configured in solrconfig.xml:
<requestHandler name="/foo" class="solr.SearchHandler">
<lst name="defaults">
<str name="fl">
field1,
field2
</str>
<str name="facet.field">bar</str>
<str name='facet.range'>range_facet</str>
<str name='f.range_facet.facet.range.start'>0</str>
<str name='f.range_facet.facet.range.end'>10</str>
<str name='f.range_facet.facet.range.gap'>1</str>
</lst>
</requestHandler>
I could then GET a set of documents directly from that RequestHandler with something like this http://solr-host:8983/solr/collection-name/foo?q=*:*
and Solr would return a document set with only the desired fields and facets. Fundamentally, the application executing the query does not need to know about (or be configured to request) the fields and facets to return at query time.
My question is this - in Elasticsearch, is there an ability to configure an endpoint that would return only the desired aggregations and/or fields without having to post those to the API at the time of the query?
There is a good article on this: https://sematext.com/blog/2014/04/29/parametrizing-queries-in-solr-and-elasticsearch/ . Elasticsearch uses search templates in place of request handlers: the query body is defined once and the caller supplies only parameters. Templates can also be stored on the cluster and referenced by id. See the Template Query documentation.
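As a rough illustration of the template approach, here is a minimal sketch that builds the two JSON bodies involved: one to store a template (sent with PUT _scripts/foo) and one to run it (sent to collection-name/_search/template). The template id "foo", the parameter name "q", and the mapping of the Solr range facet to a histogram aggregation are assumptions made for this example, not something taken from the article. Note that extended_bounds only widens the histogram; it does not cut off values outside 0 to 10 the way facet.range.start/end do.

```python
import json

# Hypothetical template id; it plays the role of the "/foo" request handler.
TEMPLATE_ID = "foo"


def stored_template_body():
    """Body for PUT _scripts/foo: fields and aggregations are fixed server-side."""
    return {
        "script": {
            "lang": "mustache",
            "source": {
                # Counterpart of fl=field1,field2
                "_source": ["field1", "field2"],
                # "q" is an assumed parameter name filled in at search time
                "query": {"query_string": {"query": "{{q}}"}},
                "aggs": {
                    # Counterpart of facet.field=bar
                    "bar": {"terms": {"field": "bar"}},
                    # Approximate counterpart of facet.range with gap=1
                    "range_facet": {
                        "histogram": {
                            "field": "range_facet",
                            "interval": 1,
                            "extended_bounds": {"min": 0, "max": 10},
                        }
                    },
                },
            },
        }
    }


def search_request_body(q):
    """Body for collection-name/_search/template: only the query text is sent."""
    return {"id": TEMPLATE_ID, "params": {"q": q}}


if __name__ == "__main__":
    print(json.dumps(stored_template_body(), indent=2))
    print(json.dumps(search_request_body("*:*"), indent=2))
```

With this layout the calling application only knows the template id and the query text, which is close to what the /foo handler provides in Solr.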
The documentation and recommendation for using stored_fields feature in ElasticSearch has been changing. In the latest version (7.9), stored_fields is not recommended - https://www.elastic.co/guide/en/elasticsearch/reference/7.9/search-fields.html
Is there a reason for this?
Whereas in version 7.4.0, there is no such negative comment - https://www.elastic.co/guide/en/elasticsearch/reference/7.4/mapping-store.html
What is the guidance in using this feature? Is using _source filtering a better option? I ask because in some other doc, _source filtering is supposed to kill performance - https://www.elastic.co/blog/found-optimizing-elasticsearch-searches
If you use _source or _fields you will quickly kill performance. They access the stored fields data structure, which is intended to be used when accessing the resulting hits, not when processing millions of documents.
What is the best way to filter fields and not kill performance with Elasticsearch?
Source filtering is the recommended way to fetch fields. The blog post is what is confusing you, because it is talking about a different use case. Read the statement below carefully.
_source is intended to be used when accessing the resulting hits, not when processing millions of documents.
By default, Elasticsearch returns only 10 hits per search, which can be changed with the size parameter. If you only want a few field values from those hits, source filtering makes perfect sense, because it is applied to the final result set and not to every document matching the query.
If, on the other hand, you use a script that reads field values from _source in order to filter or score the results, the source has to be loaded for every candidate document. That is the second part of the statement above ("not when processing millions of documents").
Apart from that, all field values are already kept in the _source field, which is enabled by default. Marking individual fields as stored (disabled by default to keep the index small) would take extra space just to retrieve values you can already get from _source.
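To make the first case concrete, here is a minimal sketch of a search body that uses source filtering on the returned hits. The field names ("title", "price") and the match query are made up for illustration; only the size and _source keys matter here.

```python
import json


def top_hits_request(query_text, fields, size=10):
    """Build a search body that returns only `fields` for the top `size` hits."""
    return {
        # Source filtering is applied to these hits only, not to all matches
        "size": size,
        "_source": fields,
        # "title" is a hypothetical field name used for the example
        "query": {"match": {"title": query_text}},
    }


if __name__ == "__main__":
    print(json.dumps(top_hits_request("laptop", ["title", "price"]), indent=2))
```

The body would be sent to the index's _search endpoint; the cost of loading _source is then bounded by size rather than by the number of matching documents.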
Below are our master and slave (14 slaves) configurations
Master
<requestHandler name="/replication"
class="MyCustomizedReplicationHandler" >
<lst name="master">
<str name="replicateAfter">optimize</str>
<str name="replicateAfter">startup</str>
</lst>
</requestHandler>
Slave
<requestHandler name="/replication"
class="MyCustomizedReplicationHandler" >
<lst name="slave">
<str name="masterUrl">http://${masterHostName}/solr-master-4.2.0/${solr.core.name}</str>
<str name="pollInterval">04:00:00</str>
</lst>
</requestHandler>
We have a batch job scheduled to run once a day, so only the first replication after it has changes to transfer, and query response time is high during that replication. The following replications on the same day find no changes.
Solr Query
(AC_SEARCH:(belfast*) AND (TYPE:(ARP) OR HAS_AIRPORTS:(true)))
I have given the raw Solr query here, but the application uses a Solr client to communicate with Solr.
The response time for the same kind of query differs depending on whether replication is running.
Please help me fix this. Is there anything I need to change in the configuration?
I am working on integrating Solr into my application. I have a list of keywords associated with each product. I use a multivalued field for the keywords and have indexed it. The problem is that I want to boost search results based on the position of each item in the multivalued field, in order. (Currently I don't see the multivalued field in order in the search result, which I will fix later.)
If I do this on my side, I need to add separate search fields, index them in Solr, and set a boost for each of them.
But I want to know whether, with a list in a multivalued field in Solr, I can do something like that without the cost of a db schema change.
I am very new to Solr, so if the question is too basic, please point me to any resource that gives a hint on how to solve the problem. I am currently reading the Apache Solr documentation and so far couldn't find anything that helps.
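For reference, the separate-fields alternative mentioned in the question could be queried roughly as follows. This is only a sketch of that workaround, not a way to boost by position inside a single multivalued field. The field names keyword_1, keyword_2, keyword_3 and the linear boost scheme are assumptions; the edismax parser and its qf parameter with field^boost syntax are standard Solr.

```python
from urllib.parse import urlencode


def boosted_qf(fields_in_order):
    """Return an edismax qf string where earlier fields get higher boosts."""
    n = len(fields_in_order)
    return " ".join(f"{name}^{n - i}" for i, name in enumerate(fields_in_order))


def solr_params(user_query, fields_in_order):
    """Build the URL query string for a /select request using edismax."""
    return urlencode(
        {"defType": "edismax", "q": user_query, "qf": boosted_qf(fields_in_order)}
    )


if __name__ == "__main__":
    # Hypothetical per-position keyword fields
    print(solr_params("shoes", ["keyword_1", "keyword_2", "keyword_3"]))
```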
We have a large index of 200 GB. We have query requirements where we facet on 5-6 fields (whitespace tokenized). I have read in the Solr documentation that faceting on a tokenized field populates the fieldValueCache. But for some reason all the facets are cached in the FieldCache rather than the fieldValueCache. Can someone explain why this is happening?
I guess this is because Solr now favors docValues over the fieldValueCache.
https://issues.apache.org/jira/browse/LUCENE-5666
If you want to use the fieldValueCache, you can do so via the JSON facet API.
https://issues.apache.org/jira/browse/SOLR-8466
Here is some more discussion of the changes:
https://issues.apache.org/jira/browse/SOLR-7190
Here is a related discussion on Stack Overflow:
lucene Fields vs. DocValues
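A minimal sketch of what such a JSON facet request could look like is below. It assumes a terms facet with method set to "uif" (UnInvertedField), which is my reading of what the JSON facet suggestion refers to; the field name "tags" is made up. Whether this actually populates the fieldValueCache depends on your Solr version and on the field not having docValues, so check the cache statistics in the admin UI after running it.

```python
import json


def json_facet_request(field, method="uif", limit=10):
    """Build a JSON Request API body with one terms facet on `field`."""
    return {
        "query": "*:*",
        # Return facet counts only, no documents
        "limit": 0,
        "facet": {
            f"{field}_terms": {
                "type": "terms",
                "field": field,
                "limit": limit,
                # Assumption: "uif" asks for the UnInvertedField implementation
                "method": method,
            }
        },
    }


if __name__ == "__main__":
    # "tags" is a hypothetical whitespace-tokenized field
    print(json.dumps(json_facet_request("tags"), indent=2))
```

The body would be POSTed to the collection's /query handler with Content-Type application/json.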
I have a question about how indexed properties work in Alfresco 4.1.6 with SOLR 1.4.
I use something like this for my queries:
SearchParameters sp = new SearchParameters();
sp.addStore(StoreRef.STORE_REF_WORKSPACE_SPACESSTORE);
sp.setLanguage(SearchService.LANGUAGE_FTS_ALFRESCO);
sp.setQuery(query);
ResultSet results = getSearchService().query(sp);
where the query variable is something like this:
PATH:" /app:company_home/app:user_homes/cm:_x0030_123//*" AND
((#cm\:title:food) OR (#cm\:name:abcde) OR (TEXT:valles) OR
(#doc\:custom_property:"report") OR (#doc\:custom_property2:"report")
AND (#doc\:custom_property3:"report") AND TYPE:"{my.model}voc_document"
In my model.xml I specify which custom properties are indexed:
<index enabled="true">
My question is: how does SOLR 1.4 use the indexes if the search query contains two or more indexed properties? Does it work like Oracle, which picks the best index and uses only that one? Or does SOLR combine all the indexed properties and use every index in the query?
I need this answer to decide how many properties to index in my model.xml. Maybe indexing a lot of properties doesn't give the most efficient result and it is better to index only a few.
And finally, one more question. I use LANGUAGE_FTS_ALFRESCO, but I can see that LANGUAGE_SOLR_FTS_ALFRESCO also exists. Are they the same? Do I need to use the second one if I use SOLR?
Thanks a lot!
Best regards
There is only one "index". Every field you mark as indexable (which is enabled by default) ends up in your solr index. Alfresco takes your query and sends it to SOLR for processing.
If you don't have a lot of documents, you can go ahead and index every field. By far the biggest impact on indexing and search is the full text index of the content field, which is enabled by default also.
LANGUAGE_FTS_ALFRESCO will use whatever index subsystem you have enabled. In later versions it may use SOLR or the database depending on your configuration. LANGUAGE_SOLR_FTS_ALFRESCO forces SOLR, so if you don't have SOLR enabled, you will get an error.
Regards!