I have XML like this:
<Root>
  <a1>
    <a>test</a>
    <b>
      <c>1</c>
      <c>2</c>
    </b>
  </a1>
  <a1>
    <a>test2</a>
    <b>
      <c>3</c>
      <c>4</c>
    </b>
  </a1>
</Root>
I will import data from this XML into Solr.
I am using XPathEntityProcessor and I want to concatenate the values of the <c> nodes, which would result in "1,2" and "3,4".
Is there any way to achieve this?
Why do you need to concatenate them? Solr supports multivalued fields out of the box; you just need to declare them that way.
However, if you really do want to, use DIH to put them into a multiValued field and then concatenate them either with a custom/script transformer or, in Solr 4+, with update.chain and an Update Request Processor. There is one that can concatenate.
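As a sketch of the update-processor route, solrconfig.xml could define a chain using ConcatFieldUpdateProcessorFactory (the chain name and field name here are assumptions for illustration):

```xml
<!-- solrconfig.xml: hypothetical chain that joins all values of "c" into one -->
<updateRequestProcessorChain name="concat-c">
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">c</str>
    <str name="delimiter">,</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain would then be selected per update request with update.chain=concat-c.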
It is not possible with XPath in Solr.
The following query would work with any XPath 2.0-compatible query processor, which Solr's does not seem to be:
//b/string-join(c/text(), ",")
I’m trying to tag my data according to a lookup table.
The lookup table has these fields:
• Key - represents the field name in the data I want to tag.
In the real data the field is a subfield of the "Headers" field.
An example for the "Key" field:
"Server" (* is a wildcard).
• Value - represents the wanted value of the field mentioned above.
The value in the lookup table is only a part of a string in the real data value.
An example for the "Value" field:
"Avtech".
• Vendor - the value I want to add to the real data if a combination of field and value is found in a document.
An example for combination in the real data:
“Headers.Server : Linux/2.x UPnP/1.0 Avtech/1.0”
A match for that document in the lookup table will be:
Key= Server (with wildcard on both sides).
Value= Avtech(with wildcard on both sides)
Vendor= Avtech
So basically I'll need to add a field to that document with the value "Avtech".
The subfields in "Headers" are dynamic fields that change from document to document.
If a match is not found, I'll need to set the tag field to the value "Unknown".
I've tried to use the enrich processor, using the lookup table as the source data, with "Value" as the match field and "Vendor" as the enrich field.
In the enrich processor I didn't know how to refer to the field, since it's dynamic, and I wanted to search for the value anywhere in the "Headers" subfields.
Also, I don't think there will be a match between the "Value" in the lookup table and the value of the Headers subfield, since the "Value" field in the lookup table is a substring with wildcards on both sides.
I could use some help accomplishing what I'm trying to do, and with how to search with wildcards inside an enrich processor,
or if you have another idea besides the enrich processor, such as a parent-child and lookup-terms mechanism.
Thanks!
Adi.
There are two ways to accomplish this:
Using the combination of Logstash & Elasticsearch
Using only the Elasticsearch ingest node
Constraint: you need to know the position of the Vendor term occurring in the Header field.
Approach 1
If so, then you can use the grok filter to extract the term, and based on the term found, do a lookup to get the corresponding value.
Reference
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
https://www.elastic.co/guide/en/logstash/current/plugins-filters-kv.html
https://www.elastic.co/guide/en/logstash/current/plugins-filters-jdbc_static.html
https://www.elastic.co/guide/en/logstash/current/plugins-filters-jdbc_streaming.html
Approach 2
Create an index consisting of KV pairs. In the ingest node, create a pipeline that consists of a grok processor followed by an enrich processor. The grok would work the same way as in Approach 1. And you seem to have already got the enrich part working.
Reference
https://www.elastic.co/guide/en/elasticsearch/reference/current/grok-processor.html
If you are able to isolate the subfield within the Header where the term of interest is present, then it would make things easier for you.
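A minimal sketch of the ingest-pipeline idea, assuming the vendor appears as the last "Name/version" token of a Headers.Server subfield (the pipeline name, grok pattern, and field names are all assumptions for illustration):

```json
PUT _ingest/pipeline/vendor-tag
{
  "processors": [
    {
      "grok": {
        "field": "Headers.Server",
        "patterns": ["%{GREEDYDATA} %{WORD:vendor_term}/%{NUMBER:vendor_version}"],
        "ignore_failure": true
      }
    },
    {
      "set": {
        "field": "Vendor",
        "value": "Unknown",
        "if": "ctx.vendor_term == null"
      }
    }
  ]
}
```

The extracted vendor_term could then feed the enrich processor, with the set processor acting as the "Unknown" fallback when nothing matched.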
The example text is like this: 172.16.17.39:6693, 172.16.17.34:6693 or 172.16.17.39:6693, 172.16.17.34:6693, 172.16.17.34:6694. When I use upstream_addr:"172.16.17.39:6693, 172.16.17.34:6693" I get the exact items, but when I use the methods below:
upstream_addr:","
upstream_addr:"\,"
upstream_addr:,
upstream_addr:\,
upstream_addr:','
upstream_addr:'\,'
I get nothing back. I have no clue, even after reading the Lucene query documentation and the Elastic documentation several times.
Below is the example of the upstream_addr:
upstream_addr:"172.16.6.133:6671, 172.16.6.134:6671, 172.16.6.176:6671"
upstream_addr:"172.16.6.111:6671"
upstream_addr:"172.16.6.134:6671, 172.16.6.176:6671"
upstream_addr:"172.16.6.111:6671, 172.16.6.134:6671, 172.16.6.176:6671"
upstream_addr:"172.16.6.176:6671"
upstream_addr:"172.16.6.176:6671, 192.168.0.127:6671"
I need to get all the items with more than one ip:port pair. I found that all the items with more than one ip:port pair contain a comma, so I want to get the items with a comma in them.
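A likely cause is that an analyzed text field strips punctuation at index time, so no indexed term contains a comma and all of the comma queries above match nothing. As a sketch, assuming the mapping has an un-analyzed keyword sub-field (upstream_addr.keyword is an assumption about the mapping), a wildcard query could match values containing a comma:

```json
GET your-index/_search
{
  "query": {
    "wildcard": {
      "upstream_addr.keyword": { "value": "*,*" }
    }
  }
}
```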
I want to sort Solr results by sku code.
Here is my query to sort the result:
http://localhost:8983/solr/test_core/select?sort=skucode+asc&q=*skucode*&wt=xml
The skucode tag stores data in a numeric field:
<str name="id">39902395</str>
<arr name="skucode"><long>5076501</long></arr>
I have stored the data in Solr using an XML file.
It gives the error: can not sort on multivalued field: skucode.
Please let me know how to store the data without multiValued, or how to change it from the backend.
Your query will work only if the field on which you are sorting is single-valued, not multi-valued. You can use a function query to sort on a multi-valued field:
http://localhost:8983/solr/test_core/select?sort=field(skucode,max)+ASC&q=*skucode*&wt=xml
For more information on function queries: https://cwiki.apache.org/confluence/display/solr/Function+Queries
The second approach, with which the query you are using will also work, is to change the data type for skucode to make it single-valued. To do that, change the type for skucode from longs to long in schema.xml:
<field name="skucode" type="long"/>
P.S.: after changing the data type you need to re-index for the changes to take effect.
I am making a Solr request and passing the ids of the docs in the fq parameter. Solr, as expected, returns the docs sorted by their score. I want Solr to return the docs in the same order as I sent the ids.
So is there a value for the sort parameter that I can give to get the desired result?
Instead of passing them to fq, use the elevator component.
Edit the elevate.xml file in your Solr conf folder:
<query text="ipod">
<doc id="S03" />
<doc id="S01" />
<doc id="S02" />
</query>
Irrespective of the Solr score, it returns docs in the order you gave in the elevate.xml file.
OR
Use the bq (boost query) parameter with edismax.
example:
defType=edismax&q="ipod"&bq=id:S03^4+id:S02^3+id:S01^
The default sort order, if no sort is provided and all scores are identical (for example when using only fq's), should be the internal Lucene ID, meaning you should get the documents back in the order they were added.
You can ask for this explicitly by using sort=_docid_ asc.
But if this is really important, note that documents get reordered when updated. If you want to keep the original ordering, add a field that keeps track of the time the document was added and sort by that field.
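A sketch of such a field in schema.xml (the field name is an assumption, and in older schemas the date type may be named differently than pdate):

```xml
<!-- schema.xml: capture indexing time so it can be used for sorting -->
<field name="indexed_at" type="pdate" indexed="true" stored="true" default="NOW"/>
```

Then sort with sort=indexed_at asc.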
I am using Solr to index data and to store some fields. I am using the field <field name="Content" type="string" indexed="true" stored="true" multiValued="true"/>; the data is in base64-encoded format.
For the Content field I want to search that data using keywords which are in plain text; by decoding the base64 I can find the keyword in the content.
(Like Elasticsearch with the attachment field type, where we pass base64-encoded data and can then search within it.)
I'm using this query in the Solr browser but am not able to find the result:
http://localhost:8983/solr/collection/select?q=Content%3A*English*&wt=json&indent=true
Solr does not know your content is base64. Furthermore, type=string is not tokenized.
So you need to do some pre-processing, probably as a custom element somewhere. If you just want to search the field, you probably don't need to store it (just index it) and could have a custom UpdateRequestProcessor that does the base64 decoding.
If you want to actually store the field, then the processing needs to happen as the very first step of the indexing pipeline, so you need a custom CharFilter before tokenization.
Unfortunately, neither component exists in the base distribution right now. You would have to code it in Java or, if you are using an UpdateRequestProcessor, in JavaScript.
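Alternatively, the decoding can happen client-side before the documents ever reach Solr, so that a normal tokenized text field can be searched directly. A minimal sketch (the field name Content comes from the question; the helper name and sample doc are illustrative):

```python
import base64

def decode_content(doc, field="Content"):
    """Replace base64-encoded values of `field` with their decoded plain text."""
    doc[field] = [
        base64.b64decode(v).decode("utf-8")
        for v in doc.get(field, [])
    ]
    return doc

# Example: a doc as it would be sent to Solr, with base64-encoded content.
doc = {"id": "1", "Content": [base64.b64encode(b"some English text").decode("ascii")]}
decoded = decode_content(doc)
# decoded["Content"] is now ["some English text"]
```

The decoded docs can then be posted to Solr as usual, with Content declared as a tokenized text type instead of string.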