How Alfresco and SOLR work with indexes in queries - oracle

I have a question about how indexed properties work in Alfresco 4.1.6 with SOLR 1.4.
I use something like this for my queries:
SearchParameters sp = new SearchParameters();
sp.addStore(StoreRef.STORE_REF_WORKSPACE_SPACESSTORE);
sp.setLanguage(SearchService.LANGUAGE_FTS_ALFRESCO);
sp.setQuery(query);
ResultSet results = getSearchService().query(sp);
where the query variable is something like this:
PATH:"/app:company_home/app:user_homes/cm:_x0030_123//*" AND
((@cm\:title:food) OR (@cm\:name:abcde) OR (TEXT:valles) OR
(@doc\:custom_property:"report") OR (@doc\:custom_property2:"report"))
AND (@doc\:custom_property3:"report") AND TYPE:"{my.model}voc_document"
In my model.xml I specify which custom properties are indexed:
<index enabled="true">
My question is: how does SOLR 1.4 use the indexes if I put two or more indexed properties in the search query? Does it work like Oracle, which tries to pick the best index and uses only that one? Or does SOLR combine all the indexed properties and use all the indexes in the query?
I need this answer to decide how many properties to index in my model.xml. Maybe indexing a lot of properties doesn't give the most efficient result, and it is better to index only a few.
And finally, one more question. I use LANGUAGE_FTS_ALFRESCO, but I can see that LANGUAGE_SOLR_FTS_ALFRESCO also exists. Are they the same? Do I need to use the second one if I use SOLR?
Thanks a lot!
Best regards

There is only one "index". Every field you mark as indexable (which is enabled by default) ends up in your solr index. Alfresco takes your query and sends it to SOLR for processing.
If you don't have a lot of documents, you can go ahead and index every field. By far the biggest impact on indexing and search is the full text index of the content field, which is enabled by default also.
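To make the "only one index" point more concrete, here is a minimal Java sketch of a toy inverted index. It is not how Lucene or SOLR is actually implemented, and the ToyIndex class and its methods are hypothetical names made up for this illustration. The idea it tries to show is that every indexed property contributes "field:term" entries to the same structure, and a boolean query simply intersects or unions the matching document sets, rather than choosing one "best" index.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: real Lucene/SOLR indexes are far more elaborate.
class ToyIndex {
    // One map for the whole index: "field:term" -> ids of documents containing it.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Index one property value of one document, split on whitespace and lowercased.
    void add(int docId, String field, String value) {
        for (String term : value.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            postings.computeIfAbsent(field + ":" + term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Ids of the documents that contain the term in the given field.
    Set<Integer> term(String field, String term) {
        String key = field + ":" + term.toLowerCase(Locale.ROOT);
        return new TreeSet<>(postings.getOrDefault(key, new TreeSet<>()));
    }

    // AND: keep only documents present in both sets.
    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    // OR: keep documents present in either set.
    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }
}
```

With this sketch, a query such as (cm:title:food OR cm:name:abcde) AND doc:custom_property3:report would be evaluated as ToyIndex.and(ToyIndex.or(idx.term("cm:title", "food"), idx.term("cm:name", "abcde")), idx.term("doc:custom_property3", "report")), so every property named in the query takes part in the lookup.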
LANGUAGE_FTS_ALFRESCO will use whatever index subsystem you have enabled. In later versions it may use SOLR or the database depending on your configuration. If you use LANGUAGE_SOLR_FTS_ALFRESCO, you are forcing SOLR, so if you don't have SOLR enabled, you will get an error.
Regards!

Related

Boosting search result based on each item in Solr Multivalued field

I am working on integrating Solr into my application. I have a list of keywords associated with each product. I use a multivalued Keyword field and have indexed it. The problem is that I want to boost search results based on each item in the multivalued field in the Solr index, in order. (Currently I don't see any order in the search results for the multivalued field, which I will fix later.)
If I want to do this on my side, I need to add different search fields, index them through Solr, and set a boost for each of them.
But I want to know whether, if I use a list as a multivalued field in Solr, I can do something like that without the cost of a DB schema change.
I am very new to Solr, so if you find the question too basic, please point me to any resource that gives me a hint on how to solve the problem. I am currently reading the Apache Solr documentation and so far haven't found anything that helps.

Can we migrate non-stored index data in SOLR to Elasticsearch?

We are currently using SOLR for full-text search and are planning to move from SOLR to Elasticsearch. During this process I read somewhere that there are plugins available which will migrate data from SOLR to Elasticsearch, but that they can't migrate fields which are not stored in SOLR. Is there a plugin available which will migrate non-stored index data from SOLR to Elasticsearch? If so, please let me know.
Currently I am using the SOLR-to-ES plugin, but it won't migrate the non-stored index data.
Thanks
If the field is not stored, then you don't have the original value. If you have it indexed, what is in there is the value after it has gone through the analysis chain, so it is probably different from the original one (it has no stopwords, is probably lowercased, maybe stemmed, and so on).
There are a couple of cases in which you might still have the original content even though the field is not stored:
Indexed field: if it has been analyzed with just the keyword tokenizer, then the indexed value is the original value.
docValues: if the field has docValues=true, then the original value is also kept. This feature was introduced later, so your index might not be using it.
The issue is that the common plugins might not take advantage of those cases where stored=true is not strictly necessary. You need to check them.
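The difference between an analyzed field and a keyword-tokenized one can be illustrated with a small Java sketch. This is only a rough imitation of an analysis chain, not Solr's actual analyzers: the ToyAnalysis class, its method names, and the stopword list are all assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Rough imitation of two analysis chains; not Solr's real analyzers.
class ToyAnalysis {
    // Hypothetical stopword list, for illustration only.
    private static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "of", "is"));

    // Text-style chain: split on non-alphanumerics, lowercase, drop stopwords.
    static List<String> analyzeText(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.split("[^\\p{L}\\p{N}]+")) {
            String term = token.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !STOPWORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Keyword-style chain: the whole input becomes one unchanged term.
    static List<String> analyzeKeyword(String value) {
        return Collections.singletonList(value);
    }
}
```

For the input "The Price of Tea", analyzeText yields the terms price and tea, from which the original string cannot be rebuilt, while analyzeKeyword keeps "The Price of Tea" as a single term. That is why only the second kind of field could be migrated from the index alone.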

Reindexing Elasticsearch or updating indexes?

I am now using Elasticsearch, and I can't figure out how to update an Elasticsearch index, type, or document without deleting and reindexing. Or is deleting and reindexing the best way to achieve it?
So if I have products in my SQL product table, should I delete the product type and reindex it, or even reindex the entire DB in Elasticsearch? What is the best practice, and how can I achieve it?
I would preferably like to do it with Nest, but if it is easier, ElasticSearch works for me as well.
Thanks
This can be a real challenge! Historic records in elasticsearch will need to be reindexed when the template changes. New records will automatically be formatted according to the template you specify.
Using this link has helped us a lot:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-templates.html
You'll want to be sure to have the logstash filter set up to match the fields in your template.

How to search for multiple strings in very large database

I want to search for multiple strings in a very large database. These strings are part of different attributes of a database table. I have tried string search using LIKE in an SQL query, but it takes a long time to return results. I am using an Oracle database.
Should I use database indexing? I found that Lucene can be used for this.
I also got some suggestions to use big data concepts. Which approach should I use?
The easiest way is:
1.) add an index to the columns you want to search through
2.) use Oracle Text, as @lalitKumarB wrote
The most powerful way is:
3.) use a separate search engine (Solr, Elasticsearch).
But you will probably have to change your application so that it explicitly uses the search index for searching through the data.
I was in the same situation some years ago, trying to search text in a big database. After a while I found out that database-based search will never reach the performance of a dedicated search engine. And you will have many more search features working out of the box if you use Solr (for example), such as spelling correction, "More like this", ...
One option is to hold the data in Oracle, search in Solr, and return only the ID of the document, so that you load just the one row from Oracle that is referenced by the ID.
The second option is to keep Oracle as the base data pool for your search engine and search in Solr (or Elasticsearch), returning the whole document/row from Solr, not only the ID. Then you don't need to load the data from the database any more.
The best option depends on your needs.
You have the choice between Elasticsearch, Solr, or Lucene.
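The two options described above can be sketched in Java with in-memory maps standing in for Oracle and for the search engine. Nothing here talks to a real database or to Solr: the ToySearchSetup class and all of its members are hypothetical and only meant to show where the data comes from in each case.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// In-memory stand-ins only; no real Oracle or Solr access happens here.
class ToySearchSetup {
    // Stand-in for the Oracle table: primary key -> full row.
    private final Map<Long, String> databaseRows = new HashMap<>();
    // Stand-in for the search engine: term -> ids of matching documents.
    private final Map<String, Set<Long>> searchIndex = new HashMap<>();
    // Only used by option 2: the engine keeps its own copy of each document.
    private final Map<Long, String> storedInIndex = new HashMap<>();

    // Save the row in the "database" and feed it to the "search engine".
    void index(long id, String row) {
        databaseRows.put(id, row);
        storedInIndex.put(id, row);
        for (String term : row.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            searchIndex.computeIfAbsent(term, k -> new LinkedHashSet<>()).add(id);
        }
    }

    private Set<Long> matchingIds(String term) {
        return searchIndex.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.<Long>emptySet());
    }

    // Option 1: the engine returns ids only; rows are then loaded from the database.
    List<String> searchThenLoad(String term) {
        List<String> rows = new ArrayList<>();
        for (Long id : matchingIds(term)) {
            rows.add(databaseRows.get(id));
        }
        return rows;
    }

    // Option 2: the engine returns its stored copies; the database is not touched.
    List<String> searchStored(String term) {
        List<String> docs = new ArrayList<>();
        for (Long id : matchingIds(term)) {
            docs.add(storedInIndex.get(id));
        }
        return docs;
    }
}
```

In this sketch both methods return the same rows. The trade-off is that option 1 costs an extra database lookup per hit, while option 2 needs the engine to hold a second copy of the data, which then has to be kept in sync with the database.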

Do elasticsearch queries touch the DB?

Just starting to use elasticsearch with haystack in django using postgres, and I'm pretty happy with it so far.
I'm wondering if the search queries (filters) through ES will submit a query to the DB or do they use data gathered during indexing?
Given that I can delete the data in the DB and still search, the answer seems to be that the queries do not touch the DB and only touch the index.
Also, I found this documentation on the matter:
http://django-haystack.readthedocs.org/en/latest/best_practices.html#avoid-hitting-the-database
Further, this is also from the docs:
For example, one great way to leverage this is to pre-rendering an
object’s search result template DURING indexing. You define an
additional field, render a template with it and it follows the main
indexed record into the index. Then, when that record is pulled when
it matches a query, you can simply display the contents of that field,
which avoids the database hit.
