Sort by a truncated date in SOLR - sorting

Is it possible to sort results from a SOLR query only on a truncated value of a date field (e.g. only on year-month excluding day and time)?
For example if I have
<doc>
<field name="id">b</field>
<field name="pubdate">2016-10-12T00:00:00Z</field>
<field name="title">b</field>
</doc>
<doc>
<field name="id">c</field>
<field name="pubdate">2016-10-13T00:00:00Z</field>
<field name="title">c</field>
</doc>
<doc>
<field name="id">a</field>
<field name="pubdate">2016-09-01T00:00:00Z</field>
<field name="title">a</field>
</doc>
I would like a sort equivalent to sort=pubdate desc,title asc, but with pubdate compared only by year and month, so that I obtain:
b
c
a
I looked at Function Queries but I couldn't find any date truncate function.
Solr version is 5.5.0 and pubdate is a TrieDateField.
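To make the requested ordering concrete, here is a minimal Python sketch that applies it client-side to the three sample documents. It is not a Solr feature or a server-side solution, only an illustration of the sort key being asked for (year-month descending, then title ascending). The helper name year_month is made up for this example.

```python
from datetime import datetime

# The three sample documents from the question.
docs = [
    {"id": "b", "pubdate": "2016-10-12T00:00:00Z", "title": "b"},
    {"id": "c", "pubdate": "2016-10-13T00:00:00Z", "title": "c"},
    {"id": "a", "pubdate": "2016-09-01T00:00:00Z", "title": "a"},
]


def year_month(pubdate):
    """Truncate an ISO-8601 UTC timestamp to a (year, month) tuple (hypothetical helper)."""
    parsed = datetime.strptime(pubdate, "%Y-%m-%dT%H:%M:%SZ")
    return (parsed.year, parsed.month)


# Python's sort is stable, so sort on the secondary key (title asc) first,
# then on the primary key (truncated date desc).
by_title = sorted(docs, key=lambda d: d["title"])
ordered = sorted(by_title, key=lambda d: year_month(d["pubdate"]), reverse=True)
ordered_ids = [d["id"] for d in ordered]
print(ordered_ids)
```

Documents b and c share the same year-month, so the title decides their relative order, while a falls into an earlier month and comes last.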

Related

Use condition to get only records with non-numbers values in a fetchXml Dynamics CRM query

I'm querying Dynamics CRM using fetchXML.
I have an entity with an attribute (placeName) whose value is either a text string or a number.
I would like a condition that selects only the records with a non-numeric value.
I didn't find a solution in the Dynamics documentation, but maybe there is one using an "out-of-the-box" or custom condition.
This is my current fetch query:
<fetch mapping="logical" distinct="true" version="1.0">
<entity name="locations">
<attribute name="placeID" />
<!-- placeName can hold values like "home" or 100; I want only the records where it is not a number -->
<attribute name="placeName" />
</entity>
</fetch>
Though I couldn't find it documented anywhere, you can use the like operator with a regex-ish syntax.
For example, the following query would retrieve systemuser records whose domainname contains at least one digit:
<fetch version="1.0" output-format="xml-platform" mapping="logical" distinct="false">
<entity name="systemuser">
<filter>
<condition attribute="domainname" operator="like" value="%[0-9]%" />
</filter>
</entity>
</fetch>
And in your case, the following would retrieve only the records whose placeName contains no digits at all (so a mixed value such as "home2" is excluded as well):
<fetch version="1.0" output-format="xml-platform" mapping="logical" distinct="false">
<entity name="locations">
<filter>
<condition attribute="placeName" operator="not-like" value="%[0-9]%" />
</filter>
</entity>
</fetch>
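To see what the two conditions select, here is a small Python emulation. It assumes the pattern behaves like a SQL-style LIKE, where % matches any run of characters and [0-9] matches a single digit, so %[0-9]% means "contains a digit somewhere". This only mimics the matching logic and is not how Dynamics evaluates the query. The function names and most of the sample values are made up for illustration.

```python
import re

# Assumed meaning of the LIKE pattern %[0-9]%: the value contains at least one digit.
CONTAINS_DIGIT = re.compile(r"[0-9]")


def like_contains_digit(value):
    """Emulates operator="like" value="%[0-9]%" (hypothetical helper)."""
    return CONTAINS_DIGIT.search(value) is not None


def not_like_contains_digit(value):
    """Emulates operator="not-like" value="%[0-9]%" (hypothetical helper)."""
    return not like_contains_digit(value)


# "home" and "100" come from the question; the other values are hypothetical.
place_names = ["home", "100", "office", "42", "room 7"]
selected = [name for name in place_names if not_like_contains_digit(name)]
print(selected)
```

Note that "room 7" is filtered out too: the condition rejects any value containing a digit, not only values that are entirely numeric.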

Solr - PatternCaptureGroupFilterFactory does not index regexp result

I am using Solr 7. I have a field "date" of type "pdate" that contains a date such as
2019-01-24T14:43:13Z.
<field name="date" type="pdate" stored="true"/>
From this date, I want to index the value 14:43:13 in an "hour" field.
To do that, I created a new field
<field name="hour" type="parsed_hour" indexed="true" stored="true"/>
I used a copyField to put the value of my "date" into "hour":
<copyField source="date" dest="hour"/>
and set up a PatternCaptureGroupFilterFactory to make sure that "hour" contains only the time portion (hh:mm:ss) of the "date" value:
<fieldType name="pdate" class="solr.DatePointField" docValues="true"/>
<fieldType name="parsed_hour" class="solr.TextField">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.PatternCaptureGroupFilterFactory" pattern="(\d{2}:\d{2}:\d{2})" preserve_original="false"/>
</analyzer>
</fieldType>
I tested the regex and it captures the right group:
https://regex101.com/r/r7Id29/1
But after indexing, my "hour" field is just an exact copy of my "date" field, without any modification.
I expected "hour" to have a value of "14:43:13".
Also, if I test using the Analysis screen in the Solr Admin UI, I see that my date is properly processed by the PatternCaptureGroupFilterFactory to output my hour.
The problem is that the result of the PatternCaptureGroupFilterFactory is just not pushed to the "hour" field...
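The capture itself can be reproduced outside Solr. This is a minimal Python check of the same pattern against the sample value from the question. It only confirms what the regex extracts and says nothing about how Solr stores or indexes the field.

```python
import re

# Same pattern as in the parsed_hour field type.
TIME_PATTERN = re.compile(r"(\d{2}:\d{2}:\d{2})")

sample_date = "2019-01-24T14:43:13Z"
match = TIME_PATTERN.search(sample_date)
captured = match.group(1) if match else None
print(captured)
```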

unable to understand sorting data on query solr

I am trying to understand a Solr sort clause I found in legacy code:
q=*:*
sort=product(if(salesAmount,salesAmount,0.05), query($sortbq)) desc,
sortbq=*:*^10.000 brand:"nike"^1.600
fl=salesAmount,queryVal:query($sortbq)
Sample documents in the result look like this:
<!-- FOR brand=nike -->
<doc>
<double name="salesAmount">91743.75</double>
<str name="brand">Nike</str>
<float name="queryVal">2.3159266</float>
</doc>
<!-- FOR brand!=nike -->
<doc>
<str name="prdId">1070694</str>
<double name="sls_amt">92660.75</double>
<str name="brand">Lee</str>
<float name="queryVal">0.19959758</float>
</doc>
Can anybody please explain how query($sortbq) calculates a single value on which the sorting is done? I ran the Solr query with debug=true and got the values below in the debug section:
<str name="1139424">
1.0 = *:*, product of: 1.0 = boost 1.0 = queryNorm
</str>
<str name="1011619">
1.0 = *:*, product of: 1.0 = boost 1.0 = queryNorm
</str>
PS : If any one chooses to down-vote this question, please do mention reason in comments.
Try putting your sort clause in the bq parameter of the Solr query and adding debug.explain.structured=true.
The debug output will then show how the sort score is calculated.
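As a rough illustration of how the sort clause combines its two inputs, here is a Python sketch of product(if(salesAmount,salesAmount,0.05), query($sortbq)). It assumes that if(...) treats a missing or zero salesAmount as false, and that queryVal in the sample output is the per-document value of query($sortbq). It also assumes sls_amt in the second sample document is that document's sales amount. How Solr arrives at queryVal itself is not modelled here.

```python
def sort_value(sales_amount, query_val):
    """Hypothetical re-implementation of the sort function from the question.

    sales_amount: the document's salesAmount, or None when the field is missing.
    query_val: the score of the document against $sortbq (queryVal in the output).
    """
    # if(salesAmount, salesAmount, 0.05): fall back to 0.05 when missing or zero.
    base = sales_amount if sales_amount else 0.05
    # product(...): multiply the base by the query score.
    return base * query_val


# Values taken from the two sample documents in the question.
nike = sort_value(91743.75, 2.3159266)
lee = sort_value(92660.75, 0.19959758)
# Hypothetical document without a sales amount.
no_sales = sort_value(None, 2.3159266)

print(nike > lee)
```

With these numbers the Nike document sorts ahead of the Lee document under desc ordering, even though its sales amount is slightly lower, because its query score is much higher.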

Sort results in alphabetical order with type=text_en

I have a solr text field as follows.
<field name="news_headline_ln_en" type="text_en" indexed="true" stored="true"/>
When I sort the results with the following query, they are not returned in correct alphabetical order.
http://localhost:8983/solr/news/select?fl=news_headline_ln_en&indent=on&q=*:*&rows=100&sort=news_headline_ln_en%20desc&start=0&wt=json
Result response:
{
"responseHeader":{
"status":0,
"QTime":45,
"params":{
"q":"*:*",
"indent":"on",
"fl":"news_headline_ln_en",
"start":"11610",
"sort":"news_headline_ln_en asc",
"rows":"12021",
"wt":"json",
"_":"1478085256196"}},
"response":{"numFound":12621,"start":11610,"docs":[
{
"news_headline_ln_en":"Eleven stocks up despite UAE markets decline"},
{
"news_headline_ln_en":"\nOil Prices Decline on Fed Rate Rise Jitters"},
{
"news_headline_ln_en":"Euro unemployment rate declines in February"},
{
"news_headline_ln_en":"Investors Holding’s Q4 profits decrease"},
{
"news_headline_ln_en":"DED honors ‘On Time’ in Oud Metha for excellence"},
{
"news_headline_ln_en":"\nTreasures From The Deep -- WSJ"},
{
"news_headline_ln_en":"Tunisia shares deepen early losses"},
{
"news_headline_ln_en":"EGX deepens losses in week"},
{
As you can see, it is not sorted alphabetically. Does anyone know a possible reason? I'd appreciate any help.
You can't. text_en isn't suited for sorting, as it tokenizes the input and breaks the text up into separate tokens. These tokens are not usable for sorting.
The solution is to add a copyField instruction that copies the content from the text_en field over to a field that is suitable for sorting, such as a string field or a text field with a KeywordTokenizer (which will allow you to lowercase the string, but keep it as a single token - if you want the sort to be case insensitive). If you're using a string field, you'll have to lowercase the field before indexing it yourself if you want the sort to be case insensitive.
<copyField source="news_headline_ln_en" dest="news_headline_ln_en_sort" />
... and then use sort=news_headline_ln_en_sort for sorting. You can use the maxChars setting if you only need to copy the beginning of the original string (for example, if you're sorting by the start of an article, you probably only need the first 20-40 characters of the article for the sort to be useful).
Also see defining fields and the Schema API.
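As a sketch of what such a sort field effectively holds, the following Python snippet builds one lowercased, optionally truncated key per headline and sorts on it. It only emulates the idea of a single-token, case-insensitive sort key. The function name and the 40-character limit are assumptions for illustration, not Solr settings taken from the question.

```python
def sort_key(headline, max_chars=40):
    """Hypothetical emulation of a copied sort field.

    Keeps only the first max_chars characters (similar in spirit to maxChars)
    and lowercases the whole string as one token.
    """
    return headline[:max_chars].lower()


# Headlines taken from the response in the question.
headlines = [
    "Tunisia shares deepen early losses",
    "Eleven stocks up despite UAE markets decline",
    "EGX deepens losses in week",
    "Euro unemployment rate declines in February",
]

ordered = sorted(headlines, key=sort_key)
for headline in ordered:
    print(headline)
```

Because each headline contributes exactly one key, the ordering follows the start of the string rather than any individual word inside it.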

Indexing field in ElasticSearch

In a Solr schema we set the indexed attribute of a field to true or false, which controls whether the field can be used in search queries.
e.g.:
<field name="features" type="text" indexed="true" stored="true" multiValued="true"/>
How can I achieve the same functionality in Elasticsearch? I know there is a mapping called "_index", but I'm not sure what it does.
Can anyone help me with this?
When you define a mapping you can use the "index" attribute, but it is not a boolean. It can hold one of three values. As stated in the Elastic docs:
The index attribute controls how the string will be indexed. It can contain one of three values:
analyzed: First analyze the string and then index it. In other words, index this field as full text.
not_analyzed: Index this field, so it is searchable, but index the value exactly as specified. Do not analyze it.
no: Don’t index this field at all. This field will not be searchable.
The default value of index for a string field is analyzed. If we want to map the field as an exact value, we need to set it to not_analyzed:
The usage is:
"field_name": {
"type": "string",
"index": "not_analyzed"
}
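For completeness, here is a minimal Python sketch that builds such a mapping for the features field from the question and prints it as JSON. It assumes an Elasticsearch version that still has the string field type quoted above. The helper name string_field_mapping and the surrounding "properties" wrapper are illustrative, and no request is sent to a cluster.

```python
import json

# The three values the quoted documentation lists for the "index" attribute.
ALLOWED_INDEX_VALUES = {"analyzed", "not_analyzed", "no"}


def string_field_mapping(index_value):
    """Build the mapping for one string field (hypothetical helper)."""
    if index_value not in ALLOWED_INDEX_VALUES:
        raise ValueError(f"unsupported index value: {index_value!r}")
    return {"type": "string", "index": index_value}


# Map "features" as an exact, non-analyzed value.
mapping = {"properties": {"features": string_field_mapping("not_analyzed")}}
print(json.dumps(mapping, indent=2))
```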
