How to specify use of STC algorithm in Solr admin console?

I have a test Solr environment using Carrot2 on Ubuntu. With the Carrot2 workbench I can alternate between the three defined algorithms (Lingo, STC, kmeans). How do I do the same thing in the Solr admin query tool? Is it an argument passed with the clustering parameter?
All 3 algorithms are defined in the solrconfig.xml, which is by and large a copy of the example from collection1. I infer it is using Lingo by default, but I'm not sure how to switch it to STC if I'd like.

You can use the clustering.engine parameter to select the clustering engine at query time. To cluster the results with STC, add clustering.engine=stc to your request. In the Solr Admin UI, you can pass the extra parameter in the Raw Query Parameters field of the query screen. You may also need to switch from the default request handler to the one that has result clustering enabled.
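For example, assuming the stock collection1 example config (which defines a /clustering request handler with clustering enabled), the full request could look like the line below; the clustering=true&clustering.engine=stc part is what goes into the Raw Query Parameters field:

http://localhost:8983/solr/collection1/clustering?q=*:*&rows=100&clustering=true&clustering.engine=stc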

Related

How to create visualisation on the fly using a script in Kibana 4

I have a requirement where I need to create different visualizations for different users, differing only slightly in the query parameters. So I am considering creating a script that will enable me to do this. Has anyone done this on Kibana 4? Some pointers on how to create a visualization using a query would be of great help.
I would also like to create Dashboards on the fly but that can wait till I get this one sorted out.
If you want to go ahead with the Java plugin (as mentioned in the comments), here are the steps:
Create different visualizations with different X-axis parameters. Visualizations are basically JSON strings, so you can write Java code that changes the value of the X-axis aggregation based on the mapping that you have. Each chart will then have a different id.
While creating a custom dashboard based on the user, check the mapping between the user and the visualization and use the following command to add the visualization:
// ".kibana" is where Kibana 4 stores saved objects; visualizationId and visualizationJson are placeholders
client.prepareIndex(".kibana", "visualization", visualizationId).setSource(visualizationJson).execute();
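For reference, a minimal end-to-end sketch using the Elasticsearch 1.x Java API (host, port, document id, and the JSON body are assumptions; Kibana 4 keeps its saved objects in the .kibana index):

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

// Connect to the cluster (adjust host/port for your setup).
Client client = new TransportClient()
        .addTransportAddress(new InetSocketTransportAddress("localhost", 9300));

// The visualization is just a JSON document; here visState would already
// contain the per-user X-axis aggregation substituted by your script.
String visualizationJson = "{\"title\":\"chart-for-user-a\",\"visState\":\"...\"}";

// Kibana 4 stores visualizations in the .kibana index under type "visualization".
client.prepareIndex(".kibana", "visualization", "chart-for-user-a")
        .setSource(visualizationJson)
        .execute()
        .actionGet();

client.close();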

Forcing filter execution in ElasticSearch

Is there a way to force a (query) filter to be executed for every query, irrespective of whether or not it is present in the search request? In my case, I have a native search script which is used to filter documents based on a dynamically changing list that is maintained outside of the Elasticsearch instance. Since I do not control all the clients which query the server, I can't guarantee that they will do the filtering properly or add a reference to the script in the request, and I would therefore like to force the filter execution within the ES server itself. Is this (easily) achievable? (I am using ES 1.7.0/2.0.)
TIA
If users can submit arbitrary requests to the cluster, then there is absolutely nothing that you can do to stop them from doing whatever they want to do.
You really only have two options here:
If users can select arbitrary queries/filters, but you control the index or indices that they go to, then you can use filtered aliases to limit what they can see.
Use Shield (not free) to restrict which indices/aliases any given request can access (with the aliases using filters to hide data).
Aliases are definitely the way to go. Create an alias per client if you need a different filter per client, and ask each client to talk to that alias.
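For example, a minimal sketch with the ES 1.x Java API (assuming client is an org.elasticsearch.client.Client; index, alias, and field names are made up):

import org.elasticsearch.index.query.FilterBuilders;

// Create a filtered alias "client_a" over the "logs" index so that any search
// sent to the alias only sees documents whose client_id is "a".
client.admin().indices().prepareAliases()
        .addAlias("logs", "client_a", FilterBuilders.termFilter("client_id", "a"))
        .execute()
        .actionGet();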

Kibana: How to visualise based on two fields

I have imported weblogs into Elasticsearch via Logstash. This has completed successfully.
I have a field in the log file (clientip) that is always populated and another field that is sometimes populated (trueclientip). I want to aggregate based on the coalescing of the two; e.g. if trueclientip is not empty then use it, otherwise use clientip.
How can I do this with the Visualisation in Kibana? Do I need to generate a scripted field or is there another approach?
Thanks.
Define a scripted field with this formula: doc['trueclientip'].value ? doc['trueclientip'].value : doc['clientip'].value, and use it in your aggregations.
But there is a downside to this scripted-field functionality combined with the ip type: what you get back from the script is the number itself (which is logical, because scripted fields in Kibana 4 only support Lucene expressions as a language), not the string representation. IPs are internally stored as long numbers in Lucene.
For example, 127.0.0.1 is represented internally as 2130706433. And this is what you will see in Visualize.
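A quick Java sketch of that arithmetic, just to illustrate the packing:

// 127.0.0.1 packed into a 32-bit value: 127*2^24 + 0*2^16 + 0*2^8 + 1
long ip = (127L << 24) | (0L << 16) | (0L << 8) | 1L;
System.out.println(ip); // prints 2130706433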
It is not ideal, indeed; it would be good to have a more advanced scripting language for scripted fields, but a GitHub issue for that already exists.

How Alfresco and SOLR works with indexes queries

I have a question about how indexed properties work in Alfresco 4.1.6 with SOLR 1.4.
I use something like this for my queries:
SearchParameters sp = new SearchParameters();
sp.addStore(StoreRef.STORE_REF_WORKSPACE_SPACESSTORE);
sp.setLanguage(SearchService.LANGUAGE_FTS_ALFRESCO);
sp.setQuery(query);
ResultSet results = getSearchService().query(sp);
where the query variable is something like this:
PATH:" /app:company_home/app:user_homes/cm:_x0030_123//*" AND
((#cm\:title:food) OR (#cm\:name:abcde) OR (TEXT:valles) OR
(#doc\:custom_property:"report") OR (#doc\:custom_property2:"report")
AND (#doc\:custom_property3:"report") AND TYPE:"{my.model}voc_document"
In my model.xml I specify which custom properties are indexed:
<index enabled="true">
My question is... how does SOLR 1.4 work with the indexes if I put two or more indexed properties in the search query? Like Oracle, which picks the best index and uses only that one? Or does SOLR combine all the indexed properties and use all the indexes in the query?
I need this answer to decide how many indexed properties to declare in my model.xml. Maybe declaring a lot of indexes doesn't give me the best and most efficient result, and it is better to index only a few properties.
And finally, one more question. I use LANGUAGE_FTS_ALFRESCO, but I can see that a LANGUAGE_SOLR_FTS_ALFRESCO also exists. Are they the same? Do I need to use the second one if I use SOLR?
Thanks a lot!
Best regards
There is only one "index". Every field you mark as indexable (which is enabled by default) ends up in your SOLR index. Alfresco takes your query and sends it to SOLR for processing.
If you don't have a lot of documents, you can go ahead and index every field. By far the biggest impact on indexing and search is the full-text index of the content field, which is also enabled by default.
LANGUAGE_FTS_ALFRESCO will use whatever index subsystem you have enabled; in later versions it may use SOLR or the database, depending on your configuration. LANGUAGE_SOLR_FTS_ALFRESCO forces SOLR, so if you don't have SOLR enabled, you would get an error.
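So, with SOLR enabled, explicitly forcing it in the code from the question is just a one-line change:

sp.setLanguage(SearchService.LANGUAGE_SOLR_FTS_ALFRESCO);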
Regards!

How to search for multiple strings in very large database

I want to search for multiple strings in a very large database. These strings are part of different attributes of a database table. I have tried string search using LIKE in an SQL query, but it takes a lot of time to get results. I am using an Oracle database.
Should I use database indexing? I found that Lucene can be used for this.
I also got some suggestions to use big data concepts. Which approach should I use?
The easiest way is:
1.) adding an index to the columns you would like to search through
2.) using Oracle Text, as #lalitKumarB wrote (see the sketch after the options below)
The most powerful way is:
3.) using a separate search engine (Solr, Elasticsearch).
But you will probably have to change your application in order to explicitly use the search engine for searching through the data.
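As a hedged sketch of option 2.), with a made-up table and column: create an Oracle Text CONTEXT index once, then query with CONTAINS instead of LIKE:

CREATE INDEX idx_docs_body ON docs(body) INDEXTYPE IS CTXSYS.CONTEXT;

SELECT id FROM docs WHERE CONTAINS(body, 'search_term') > 0;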
I was in the same situation some years ago, trying to search text in a big database. After a while I found out that database-based search will never reach the performance of a dedicated search engine. And you will get many more search features working out of the box if you use Solr (for example), like spelling correction, "More Like This", ...
One option is to hold the data in Oracle, search in Solr, and return the ID of the document in order to load from Oracle only the one row that is referenced by that ID.
The second option is to keep Oracle as the base data pool for your search engine and to search in Solr (or Elasticsearch), returning the whole document/row from Solr, not only the ID. That way you don't need to load the data from the database any more.
The best option depends on your needs.
You have the choice between Elasticsearch, Solr, or Lucene.
