Configuring an Elasticsearch index

I was exploring the documentation and Q&A on how to configure an index in Elasticsearch, and at some point I got really confused. I found two different ways (?) of doing it, but I can't figure out what the difference is.
(1) this one: Elasticsearch: Constructing mappings for Java Client, which seems to be a single YAML file containing all the definitions for the index bookshelf (in the given example)
(2) a definition of a tweet: http://www.elasticsearch.org/guide/reference/mapping/object-type/ (JSON)
To me, (1) seems more comprehensive. What confuses me is that (1) has mappings defined and (2) has properties - what is the difference? And what is the correct/better way of defining fields for the types of an index?

They're essentially the same. (1) is just configuration being fed to a client, which will eventually output JSON that looks like (2).
The reason (2) looks odd to you is that it's documenting/demonstrating a specific type of mapping, not telling you how to create index mappings.
If you're using an Elasticsearch client, consult its documentation for how it wants you to specify mappings. If you want to talk to Elasticsearch directly via REST commands, read the mapping documentation (read it anyway to understand how to construct mappings in general).
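To make the relationship concrete, here is a minimal sketch of an index-creation request. The index name comes from the question's bookshelf example; the type and field names are made up, and the syntax matches the older Elasticsearch versions contemporary with the linked docs (where mapping types and the string field type still exist):
PUT /bookshelf
{
  "mappings": {
    "book": {
      "properties": {
        "title":  { "type": "string" },
        "author": { "type": "string" }
      }
    }
  }
}
Here mappings is the top-level container for all the types in the index, while properties is the section inside each type (or inside an object field) that lists its fields - which is why you see both terms.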

Related

Kibana (elasticsearch visualization) - how to add a plot based on a sub-string of a field?

I have a field in my logs called json_path containing data like /nfs/abc/123/subdir/blah.json, and I want to create a count plot on part of the string - abc here, i.e. the third chunk when splitting on the token /. I have tried all sorts of online answers, but they're all partial answers (nothing I could easily understand how to use or integrate). I've tried running POST/GET queries in the Console, which all failed due to syntax errors I couldn't manage to debug (they complained about newline control characters, when there were none that I could see, even in a text editor that explicitly shows control characters). I also tried Management -> Index Patterns -> Scripted Field, but after adding my code there, Kibana basically crashed (stopped working temporarily) until I removed that Scripted Field.
All this Elasticsearch and Kibana stuff is annoyingly difficult; all the docs expect you to be an expert in their tool, rather than just an engineer who needs to visualize some data.
I don't really want to add a new data field in my log-generation code, because then all my old logs will be unsupported (they have the relevant data; it just needs a bit of string processing before visualization). I know I could probably back-annotate the old logs, but the whole Kibana/Elasticsearch experience is just frustrating, and I don't use it enough to justify learning such detailed procedures (I actually learned a bunch of this stuff a year ago, and then promptly forgot it through lack of use).
You cannot plot on a sub-string of a field unless you extract that sub-string into a new field. I can understand the frustration of learning a new product, but to achieve what you want, that sub-string value has to live in a field of its own. Scripted fields are generally used to modify a field. To extract a sub-string from a field, I'd recommend using an Ingest Node processor such as the grok processor. This will add a new field which you can use to plot in Kibana visualizations.
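As a rough sketch (the pipeline name and the dataset field name are made up, and the pattern assumes paths always look like /nfs/<chunk>/...; adjust it to your data), an ingest pipeline that pulls the third /-separated chunk out of json_path could look like:
PUT _ingest/pipeline/extract-dataset
{
  "processors": [
    {
      "grok": {
        "field": "json_path",
        "patterns": ["^/%{WORD}/%{WORD:dataset}/"]
      }
    }
  ]
}
Documents indexed with ?pipeline=extract-dataset would then carry a dataset field ("abc" for /nfs/abc/123/subdir/blah.json) that Kibana can aggregate on; existing documents would need to be re-indexed through the pipeline for the field to appear.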

How to search on FHIR using complex nested queries

I haven't really found examples or instructions showing what a complex nested query should look like when searching a FHIR resource.
Some examples (pseudo-code):
(name=Mary AND gender=female) OR (address-city=Springfield AND address-state=NY)
((name=Mary AND gender=female) OR (address-city=Springfield AND address-state=NY)) AND active=true
Is that even possible? If yes, how?
FHIR supports quite an elaborate search syntax, but it isn't a full query language. The searches you want cannot be done in one go, unless you have access to the server and can implement the queries there yourself.
If you have access/influence server-side, you can implement a named query and then use the _query search parameter to execute it (see http://www.hl7.org/fhir/search.html#query).
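For example, if the server implemented a hypothetical named query called myComplexSearch, you could invoke it like this (any additional parameters are whatever the server-side definition declares):
GET [fhir endpoint]/Patient?_query=myComplexSearch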
If you don't have that access, you can perform your queries in a couple of steps. For example, your first one would take two queries:
GET [fhir endpoint]/Patient?name=Mary&gender=female
GET [fhir endpoint]/Patient?address-city=Springfield&address-state=NY
Both would give you a Bundle of results. The two Bundles together (minus any patients that appear in both) would be the complete list of matching resources you were looking for.
For the second example query, you would need to supply both GETs with &active=true.
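Concretely, that second example would become:
GET [fhir endpoint]/Patient?name=Mary&gender=female&active=true
GET [fhir endpoint]/Patient?address-city=Springfield&address-state=NY&active=true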

Partial Indexing of an XML file (Bleve)

I am evaluating a couple different libraries to see which one will best fit what I need.
Right now I am looking at Bleve, but I am happy to use any library.
I am looking to index full files, except for specific ones which are in XML format. For those I only want Bleve to index specific tags, as most of the tags are worthless to search. I am trying to evaluate whether this is possible but, being new to Bleve, I am not sure which part I need to customize.
The documentation is very good, but I can't seem to find this answer. All I need is an explanation with keywords and steps; no code is required. I just need a push, as I have spent hours spinning my wheels with Google searches and I am getting nowhere.
There are probably many ways to approach this. Here's one.
Bleve indexes documents, which are collections of key/value metadata pairs.
In your case, a document could be represented by two key/value pairs: the name of the .xml file (to uniquely identify the document) and the content of the file.
type Doc struct {
	Name string // unique identifier: the .xml file name
	Body string // searchable text extracted from the file
}
The issue is that the Body is XML, and Bleve doesn't support XML out of the box.
A way to address this is to pre-process the XML file by stripping unwanted tags and content. You can do that with the encoding/xml standard library; see the sketch after the example links below.
For an example of a similar task, you can look at the code of https://github.com/blevesearch/fosdem-search/
In there they index files in a custom format (https://github.com/blevesearch/fosdem-search/blob/master/fosdem.ical) by parsing them into a form they can submit to Bleve for indexing (https://github.com/blevesearch/fosdem-search/blob/master/ical.go).
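A minimal sketch of that pre-processing step (the tag names title and summary, the file names, and Bleve's v2 import path are assumptions; substitute the tags that matter in your files):
package main

import (
	"encoding/xml"
	"io"
	"log"
	"os"
	"strings"

	"github.com/blevesearch/bleve/v2"
)

type Doc struct {
	Name string
	Body string
}

// extractTags streams through the XML and keeps only the character data
// that appears inside the wanted tags.
func extractTags(r io.Reader, wanted map[string]bool) (string, error) {
	dec := xml.NewDecoder(r)
	var sb strings.Builder
	depth := 0 // >0 while we are inside a wanted tag
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if wanted[t.Name.Local] {
				depth++
			}
		case xml.EndElement:
			if wanted[t.Name.Local] && depth > 0 {
				depth--
			}
		case xml.CharData:
			if depth > 0 {
				sb.Write(t)
				sb.WriteByte(' ')
			}
		}
	}
	return sb.String(), nil
}

func main() {
	f, err := os.Open("example.xml") // made-up file name
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	body, err := extractTags(f, map[string]bool{"title": true, "summary": true})
	if err != nil {
		log.Fatal(err)
	}

	idx, err := bleve.New("example.bleve", bleve.NewIndexMapping())
	if err != nil {
		log.Fatal(err)
	}
	if err := idx.Index("example.xml", Doc{Name: "example.xml", Body: body}); err != nil {
		log.Fatal(err)
	}
}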

What's the difference between calling ThinkingSphinx.search and ModelName.search...?

I have started using Thinking Sphinx for text search, but can anybody explain the difference between these two ways of calling a Thinking Sphinx search? I see that both return the same result and work fine on my local system. But does it behave differently in another environment, like production?
From the documentation of Thinking Sphinx:
You can use all the same syntax to search across all indexed models in your application:
ThinkingSphinx.search 'pancakes'
So if you call search on ThinkingSphinx, it searches across all of your indexed models, whereas calling search on a single model only returns results for that model.
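For example, assuming a hypothetical indexed Article model:
Article.search 'pancakes'        # matches within Article only
ThinkingSphinx.search 'pancakes' # matches across every indexed model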

Manually specifying how to build an index?

I'm looking into Thinking Sphinx for its potential to solve an indexing problem. It looks like it has a very specific API for telling it which fields to index on a model. I don't like having this layer of abstraction in my way without being able to sidestep it. The thing is, I don't trust Sphinx to be able to interpret my model properly, as this model could have any conceivable property. Basically, I want to encode JSON in an RDBMS. In a way, I'm looking to make an RDBMS behave like MongoDB (RDBMSes have features I don't want to do without). If TS or some other index could be made to understand my models, this could work. Is it possible to manually provide key/value pairs to TS?
"person.name.first" => "John", "person.name.last" => "Doe", "person.age" => 32,
"person.address" => "123 Main St.", "person.kids" => ["Ed", "Harry"]
Is there another indexing tool that could be used from Ruby to index JSON?
(By the way, I have explored a wide variety of NoSQL databases. I am trying to address a very specific set of requirements.)
As Matchu has pointed out in the comments, Sphinx usually interacts directly with the database. This is why Thinking Sphinx is built the way it is.
However, Sphinx (but not Thinking Sphinx) can also accept an XML data format - so if you want to go down that path, feel free. You're going to have to understand the underlying Sphinx structure much more deeply than you would with the normal relational database/ActiveRecord and Thinking Sphinx approach. Riddle may be useful for building a solution, but you'll still need to understand Sphinx itself first.
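For a sense of what that involves, Sphinx's XML input format (xmlpipe2) looks roughly like this - the flattened field names are made up, since the dotted keys from the question aren't valid Sphinx names:
<?xml version="1.0" encoding="utf-8"?>
<sphinx:docset>
  <sphinx:schema>
    <sphinx:field name="person_name_first"/>
    <sphinx:field name="person_name_last"/>
    <sphinx:attr name="person_age" type="int"/>
  </sphinx:schema>
  <sphinx:document id="1">
    <person_name_first>John</person_name_first>
    <person_name_last>Doe</person_name_last>
    <person_age>32</person_age>
  </sphinx:document>
</sphinx:docset>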
Basically, when you're specifying what you want to index - that is, when you want to build your own index - you're using the Map part of Map/Reduce. CouchDB supports exactly this. The only problem I ran into with Couch is that I wanted to query other document objects as the basis of my Map/Reduce, since those documents would contain metadata about how I want to build my indexes. This goes against the grain of Map/Reduce, however, as you have to map a document in isolation, with no external data. If you need external data, it would instead be denormalized into your documents.
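As an illustration, a CouchDB map function over the example document above might look like this (the choice of last name as the key is arbitrary):
function (doc) {
  // emit one index entry per document, keyed by last name
  if (doc.person && doc.person.name) {
    emit(doc.person.name.last, null);
  }
}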
