I have to insert a document into Elasticsearch. Some of the fields in the document can be null. Should I store them as null or as an empty string?
I just want to know the pros and cons of storing a null versus an empty string.
NOTE: I don't have to run any queries on the fields that are null.
If you are not planning to run any queries on those fields, one good approach is to use a static mapping and simply not index them. If you don't need to search on these fields but still want them to be part of your index, you can also use the store parameter and choose not to store them.
This way the fields are still part of _source, which keeps each field exactly as it was passed in (for example, if you pass null, it stays null), so you can retrieve them later.
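A minimal sketch of such a static mapping, written as a Python dict so it can be serialized and sent as the mapping body. The field names `title` and `middle_name` are hypothetical, and `"index": false` is the spelling used by recent Elasticsearch versions (older releases used `"index": "no"`), so treat it as an assumption about your version.

```python
import json

# Hypothetical fields: `title` is searchable, `middle_name` is kept in _source only.
mapping = {
    "properties": {
        "title": {"type": "text"},
        # Not indexed, so it cannot be queried, but it still comes back in _source.
        "middle_name": {"type": "keyword", "index": False},
    }
}

# A document may carry null for the unindexed field; _source keeps it as passed in.
document = {"title": "Example", "middle_name": None}

mapping_json = json.dumps(mapping)
document_json = json.dumps(document)
print(mapping_json)
print(document_json)
```

Whether to send null or leave the field out is then mostly a matter of what you want to read back from _source.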
Regarding the pros and cons, we would need to understand your use case before commenting on them; my answer is more about how to do it differently and optimize.
Note: refer to the null_value mapping parameter to understand the concept in detail.
Related
I am still learning Elasticsearch. I wanted to know: if the type of the value of a particular key is not fixed, can we still index it?
For example, firstName can be "xyz" in one document and an object in another document of the same type. There is a huge combination of such fields, all of which can have either a string or an object as the value, so I can't simply isolate the string ones and the object ones in different indexes.
Thanks
Elasticsearch doesn't support this.
Elasticsearch does have features to "auto detect" what the type for a field should be. However, the first time it sees a field, it will make its guess, and then every subsequent record that has that field will have to match.
In your case, if a record where firstName was a string was indexed first, then all records where firstName is an object will throw an error when you try to index them in Elasticsearch. If an object was indexed first, all the records where firstName is a string would fail.
Elasticsearch is designed to help you get started quickly, but ultimately there's no shortcut and you will have to:
- Design a schema that tells Elasticsearch which settings to use for each field
- Do the work in the code that imports records into Elasticsearch to decide how you want to import them
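One way the import-side work could look is sketched below: every record is normalized before indexing so that firstName always has the same shape. Wrapping strings in an object with a `value` key is an assumption made for illustration, not something Elasticsearch requires; any consistent shape would do.

```python
from typing import Any, Dict


def normalize_first_name(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record where firstName is always an object."""
    normalized = dict(record)
    value = normalized.get("firstName")
    if isinstance(value, str):
        # Wrap plain strings so every document uses the same object shape.
        normalized["firstName"] = {"value": value}
    return normalized


records = [
    {"firstName": "xyz"},
    {"firstName": {"value": "abc", "source": "import"}},
]
prepared = [normalize_first_name(r) for r in records]
print(prepared)
```

With every document shaped the same way, the first-seen type can no longer conflict with later records.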
I have a Spring Data JPA entity that contains a "price" field of type Double. The users need to be able to filter the records on that field (between a min and a max). The problem is that the value in the DB can be null for some records. My Spring Data JPA repository uses a native query like "price BETWEEN :priceFrom AND :priceTo". If the user does not specify anything in the filter conditions, all records, including the ones where price is null, should be returned. However, this query does not return those records. I know I can create a new method with the query "price IS NULL", check the filter values in my service layer, and call the null version if nothing is specified. But I have multiple fields with the same requirement, so this results in a lot of duplicate methods to maintain. Is there a better approach to handle this situation?
It seems to me that you can specify:
(:priceFrom is null and :priceTo is null and price is null)
OR price between :priceFrom and :priceTo
If priceFrom and priceTo are entered, the second part of the OR is used; otherwise it selects the records where price is null.
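Note that the condition as written returns only the null-price rows when no bounds are given. The sketch below is a variant, not the condition above: it drops the `price is null` test from the first branch so that all rows come back when both bounds are missing, which is what the question asks for. It runs against an in-memory SQLite table (the `product` table and its rows are made up), and how null parameters are bound may differ with other databases and drivers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany(
    "INSERT INTO product (id, price) VALUES (?, ?)",
    [(1, 5.0), (2, 15.0), (3, None)],
)

# First branch: no bounds given, so every row matches (including NULL prices).
# Second branch: bounds given, so only prices inside the range match.
QUERY = """
    SELECT id FROM product
    WHERE (:priceFrom IS NULL AND :priceTo IS NULL)
       OR price BETWEEN :priceFrom AND :priceTo
    ORDER BY id
"""


def find_ids(price_from, price_to):
    params = {"priceFrom": price_from, "priceTo": price_to}
    return [row[0] for row in conn.execute(QUERY, params)]


no_filter = find_ids(None, None)
with_filter = find_ids(1.0, 10.0)
print(no_filter, with_filter)
```

The same pattern can be repeated per field in one query, which avoids a separate repository method for each combination.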
Since you are using Spring Data JPA, Specifications should solve this for you. This is a solution based on the JPA Criteria API.
For any sufficiently complex API, searching and filtering your resources by very simple fields is not enough. A query language is more flexible and allows you to filter down to exactly the resources you need. Hence you should easily be able to program for NULL (in the scenario that you currently need) and anything else that you might need.
This scales to multiple fields and is easy to code and configure. Here are a few links which will give you more insight into it:
Spring Blog
Tutorial 1
Tutorial 2
Hopefully, this is helpful.
If I create the first document of its type, or put a mapping, is an index created for each field?
Obviously, if I set "index" to "analyzed" or "not_analyzed", the field is indexed.
Is there a way to store a field so that it can be retrieved but never searched by? I imagine this would save a lot of space. If I set "index" to "no", will this save space?
Will I still be able to search by this field, only more slowly, or will it be totally unsearchable?
Is there a way to make a field indexed after some documents have been inserted, if I change my mind?
For example, I might have a mapping:
{
  "book": {
    "properties": {
      "title": {"type": "string", "index": "not_analyzed"},
      "shelf": {"type": "long", "index": "no"}
    }
  }
}
So I want to be able to search by title, but also retrieve the shelf the book is on.
index:no will indeed not create an index for that field, so that saves some space. Once you've done that, you can no longer search on that particular field.
It is also useful in this context to know about the _source field, which is returned by default and includes all the fields you've stored. http://www.elasticsearch.org/guide/reference/mapping/source-field/
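The difference between "indexed" and "returned in _source" can be pictured with a toy model. This is not Elasticsearch code, only a sketch of the idea: a field without an inverted index entry cannot be found by a search, yet it is still readable from the stored original document. All names here are invented for illustration.

```python
from collections import defaultdict

# Toy model, not Elasticsearch: only fields listed here get an inverted index.
INDEXED_FIELDS = {"title"}

sources = {}                 # doc id -> original document (plays the role of _source)
inverted = defaultdict(set)  # (field, value) -> set of doc ids


def index_document(doc_id, doc):
    """Keep the original document and index only the searchable fields."""
    sources[doc_id] = doc
    for field, value in doc.items():
        if field in INDEXED_FIELDS:
            inverted[(field, value)].add(doc_id)


def search(field, value):
    """Return matching doc ids; unindexed fields never match anything."""
    return sorted(inverted.get((field, value), set()))


index_document(1, {"title": "Dune", "shelf": 7})

by_title = search("title", "Dune")
by_shelf = search("shelf", 7)
shelf_from_source = sources[by_title[0]]["shelf"]
print(by_title, by_shelf, shelf_from_source)
```

Searching by title finds the book and the shelf can be read from the stored document, while searching by shelf finds nothing.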
As to your second question:
You can't change your mind halfway. If you want to index a particular field later on, you have to reindex the documents.
That's why you may want to reconsider setting index:no, etc. In fact, a good strategy to begin with is to not define a schema for fields at all, unless you're 100% sure you need, for instance, a non-default analyzer for a particular field. Otherwise ES will use generally usable defaults.
I'm looking to search for a particular JSON document in a bucket and I don't know its document ID; all I know is the value of one of the sub-keys. I've looked through the API documentation but I'm still confused when it comes to my particular use case:
In mongo I can do a dynamic query like:
bucket.get({ "name" : "some-arbritrary-name-here" })
With Couchbase I'm under the impression that you need to create an index (for example on the name property) and use startKey / endKey, but this feels wrong: could you still end up with multiple documents being returned? It would be nice to be able to pass a parameter to the view so that an exact match could be performed. Also, how would we handle multi-dimensional searches, i.e. name and category?
I'd like to do as much of the filtering as possible on the Couchbase instance and ideally narrow it down to one record, rather than having to filter when it comes back to the app tier. Something like passing a dynamic value to the mapping function and only emitting documents that match.
I know you can use LINQ with Couchbase to filter, but if I've read the docs correctly this filtering is still done client-side. At least if we could narrow down the returned dataset to a sensible subset, client-side filtering wouldn't be such a big deal.
Cheers
So you are correct on one point: you need to create a view (which is indeed an index) to be able to query on the content of the JSON document.
In your case you have to create a view with this kind of code:
function (doc, meta) {
  // Checking a type field before emitting is just good practice.
  if (doc.type == "yourtype") {
    emit(doc.name, null); // key = name; no value is needed for a lookup
  }
}
This will create an index, distributed across all the nodes of your cluster, that you can now use in your application. You can point to a specific value using the "key" parameter.
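To make the lookup behaviour concrete, here is a toy Python model of a view. It is not the Couchbase SDK, only a sketch of how emitted keys map to documents: an exact-key lookup can still return several documents when they share a name, and emitting a compound key (in a real view, an array such as name plus category) narrows the match to both fields. The documents and function names are invented for illustration.

```python
from collections import defaultdict

docs = {
    "doc1": {"type": "product", "name": "widget", "category": "tools"},
    "doc2": {"type": "product", "name": "widget", "category": "toys"},
    "doc3": {"type": "user", "name": "widget"},
}


def build_view(map_fn):
    """Run map_fn over every document and group document ids by emitted key."""
    index = defaultdict(list)
    for doc_id, doc in docs.items():
        for key in map_fn(doc):
            index[key].append(doc_id)
    return index


def by_name(doc):
    if doc.get("type") == "product":
        yield doc["name"]


def by_name_and_category(doc):
    if doc.get("type") == "product":
        # A compound key allows an exact match on both fields at once.
        yield (doc["name"], doc["category"])


name_hits = build_view(by_name).get("widget", [])
pair_hits = build_view(by_name_and_category).get(("widget", "toys"), [])
print(name_hits, pair_hits)
```

Looking up by name alone returns both products, while the compound key narrows the result to a single document.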
I am developing an API using CodeIgniter and MongoDB.
In this system I am saving the full name and _id of the users that the selected user is following.
What is the best thing to do regarding the _id? Store it as an object or as a string?
If I store it as an object, I need to convert it to a string when echoing out followers; otherwise the output looks strange.
My question really is: is it OK to store the _id as a string rather than an object?
What is the downside of storing it as a string?
Thankful for all input!
Performance for requests (and updates) is much better with ObjectId. Moreover, ObjectIds take up very little space.
From the official docs:
BSON includes a binary data datatype for storing byte arrays. Using
this will make the id values, and their respective keys in the _id
index, twice as small.
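The "twice as small" figure in the quote can be checked with plain Python, without any MongoDB driver. The id value below is a made-up example; an ObjectId holds 12 raw bytes, while its usual hex string form takes 24 characters.

```python
# Hypothetical id value, written as the usual 24-character hex string.
hex_id = "507f1f77bcf86cd799439011"

as_string = hex_id.encode("utf-8")  # what is stored if the id is kept as a string
as_binary = bytes.fromhex(hex_id)   # the 12 raw bytes an ObjectId holds

print(len(as_string), len(as_binary))
```

The saving applies to every stored reference and to the corresponding index keys, so it adds up when you keep lists of follower ids.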
Here are 2 links that can help you:
- http://www.mongodb.org/display/DOCS/Optimizing+Object+IDs
- http://www.mongodb.org/display/DOCS/Object+IDs
When you use ObjectId, the generated _id is unique across all your machines. So if you use sharding, you don't have to worry about _id conflicts. See how ObjectId is generated in the specification.
But if you use a string, you have to generate it carefully.