I am developing an API using CodeIgniter and MongoDB.
In this system I am saving the full name and _id of the users that the selected user
is following.
What is the best way to store the _id: as an object or as a string?
If I store it as an object, I need to convert it to a string when echoing out followers; otherwise
the output looks strange.
My question is really: is it OK to store the _id as a string rather than as an object?
What is the downside of storing it as a string?
Thankful for all input!
Performance for queries (and updates) is noticeably better with ObjectId. Moreover, an ObjectId is compact: 12 bytes, versus 24 bytes for the same id written as a hex string.
From the official docs:
BSON includes a binary data datatype for storing byte arrays. Using
this will make the id values, and their respective keys in the _id
index, twice as small.
Here are two links that can help you:
- http://www.mongodb.org/display/DOCS/Optimizing+Object+IDs
- http://www.mongodb.org/display/DOCS/Object+IDs
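To make the size difference concrete, here is a minimal sketch in Python using only the standard library. The hex value is made up for illustration, and the sketch only compares the raw id bytes; it does not measure BSON overhead or index size.

```python
# A 24-character hex string, as an ObjectId looks when it is echoed out.
# (Illustrative value, not taken from the question.)
hex_id = "507f1f77bcf86cd799439011"

# The same id in binary form, which is what an ObjectId holds internally.
raw_id = bytes.fromhex(hex_id)

string_size = len(hex_id.encode("utf-8"))  # bytes used by the string form
binary_size = len(raw_id)                  # bytes used by the binary form

print(string_size, binary_size)

# Converting back for output is cheap, so storing the binary form does not
# prevent you from echoing a readable id.
round_trip = raw_id.hex()
```

In other words, the conversion to a string can be left to the output layer while the stored value stays compact.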
When you use ObjectId, the generated _id is unique across all your machines, so if you use sharding you do not have to worry about _id conflicts. See the ObjectId specification for how the value is generated.
If you use a string instead, you have to generate it carefully yourself.
Related
My log POCO has several fixed properties, like user id, timestamp, with a flexible data bag property, which is a JSON representation of any kind of extra information I'd like to add to the log. This means the property names could be anything within this data bag, bringing me 2 questions:
How can I configure the mapping so that the data bag property, which is of type string, would be mapped to a JSON object during the indexing, instead of being treated as a normal string?
With the data bag object having arbitrary property names, meaning the overall document type could have a huge number of properties inside, would this hurt the search performance?
To convert the string into a JSON object, you can use an ingest pipeline with the JSON processor:
https://www.elastic.co/guide/en/elasticsearch/reference/master/json-processor.html
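As a rough sketch of what such a pipeline could look like, the Python snippet below builds a pipeline body and imitates its effect on one document locally. The field names `data_bag` and `data_bag_parsed` are assumptions made for illustration; only the `json` processor and its `field` / `target_field` options come from the Elasticsearch documentation.

```python
import json

# Hypothetical field name: "data_bag" holds the JSON string in the log document.
pipeline_body = {
    "description": "Parse the data bag string into a JSON object",
    "processors": [
        {"json": {"field": "data_bag", "target_field": "data_bag_parsed"}}
    ],
}

# This is the body you would send with: PUT _ingest/pipeline/<pipeline-name>
print(json.dumps(pipeline_body, indent=2))


def apply_json_processor(doc, field, target_field):
    """Local imitation of the processor: parse the string, store the object."""
    out = dict(doc)
    out[target_field] = json.loads(doc[field])
    return out


log = {"user_id": 42, "data_bag": '{"action": "login", "attempts": 2}'}
parsed = apply_json_processor(log, "data_bag", "data_bag_parsed")
print(parsed["data_bag_parsed"])
```

The local function is only there to show the transformation; in a real setup the pipeline runs inside Elasticsearch at index time.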
It depends on your queries. If you use free-text search across all fields, then yes, a huge number of fields will slow the query down. If you use queries like "field":"value", then no, the number of fields is not a problem for searches. You can find additional information about query optimization here:
https://www.elastic.co/guide/en/elasticsearch/reference/7.15/tune-for-search-speed.html#search-as-few-fields-as-possible
The other question is: what do you mean by "huge number"? 1000? 10000? 100000? As an optimization, I recommend using dynamic templates so that each string field is automatically indexed as "keyword" only, rather than as text + keyword. This setting cuts the number of fields in half.
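A dynamic template along those lines might look like the sketch below, written as a Python dict that serializes to the index-creation body. The template name `strings_as_keyword` is arbitrary, and the body assumes the typeless mapping format of Elasticsearch 7.x.

```python
import json

# Body for: PUT <index-name>
# Every newly seen string field is mapped as "keyword" only, instead of the
# default "text" field with a "keyword" sub-field.
index_body = {
    "mappings": {
        "dynamic_templates": [
            {
                "strings_as_keyword": {  # template name is arbitrary
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword"},
                }
            }
        ]
    }
}

print(json.dumps(index_body, indent=2))
```

Keep in mind that keyword fields only match exact values, so this fits "field":"value" style queries rather than free-text search.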
I have to insert a document in elasticsearch. Some of the fields in document can be null. So should I store them as null or as empty string?
I just want to know the pros and cons of storing it as null vs empty string.
NOTE : I don't have to run any queries on those fields which are null.
If you are not planning to run any queries on those fields, one good option is to use a static mapping and simply not index them. If you don't need to search on these fields but still want them to be part of your index, you can use the store param and choose not to store them.
This way the fields will still be part of _source, which keeps field values exactly as they were sent (for example, if you pass null, it stays null), so you can retrieve them later.
Regarding the pros and cons, we would need to understand your use case before commenting on them; my answer is more about how to do it differently and optimize.
Note: refer to the null_value mapping parameter to understand the concept in detail.
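As an illustration of the "don't index them" option, here is a small sketch of a static mapping expressed as a Python dict. The field names `title` and `optional_note` are hypothetical; `"index": false` is the mapping parameter that keeps a field out of the search index while leaving it in _source.

```python
import json

# Body for: PUT <index-name>
mapping_body = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            # Hypothetical nullable field: kept in _source, but not searchable.
            "optional_note": {"type": "keyword", "index": False},
        }
    }
}

# A document whose optional field is null; the null survives serialization
# and is what _source would hand back on retrieval.
doc = {"title": "hello", "optional_note": None}
serialized = json.dumps(doc)

print(json.dumps(mapping_body, indent=2))
print(serialized)
```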
I am still learning Elasticsearch. I wanted to know: if the type of the value of a particular key is not fixed, can we still index it?
For example, firstName can be "xyz" in one document and an object in another document of the same type. There is a huge number of such fields, each of which can have either a string or an object as its value, so I cannot simply isolate the string ones and the object ones in different indexes.
Thanks
Elasticsearch doesn't support this.
Elasticsearch does have features to "auto detect" what the type for a field should be. However, the first time it sees a field, it will make its guess, and then every subsequent record that has that field will have to match.
In your case, if a record where firstName was a string was indexed first, then all records where firstName is an object will throw an error when you try to index them in Elasticsearch. If an object was indexed first, all the records where firstName is a string would fail.
Elasticsearch is designed to help you get started quickly, but ultimately there's no shortcut and you will have to:
- Design a schema that tells Elasticsearch good settings to use for each field
- Do the work in your code that imports records into Elasticsearch to make decisions about how you want to import them
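One way the import-side work could look is sketched below in Python: every value of a mixed-type field is coerced to a single shape before indexing. Wrapping strings as `{"text": ...}` and the helper names are assumptions made for illustration, not something Elasticsearch prescribes.

```python
def normalize_name(value):
    """Coerce a mixed string/object value to one shape: always an object."""
    if isinstance(value, dict):
        return value
    if isinstance(value, str):
        # Hypothetical convention: wrap plain strings under a "text" key.
        return {"text": value}
    raise TypeError("unsupported type for name field: %r" % type(value))


def prepare_for_index(doc, fields=("firstName",)):
    """Return a copy of doc in which the listed fields have a consistent type."""
    out = dict(doc)
    for field in fields:
        if out.get(field) is not None:
            out[field] = normalize_name(out[field])
    return out


as_string = prepare_for_index({"firstName": "xyz"})
as_object = prepare_for_index({"firstName": {"given": "x", "alias": "yz"}})
print(as_string)
print(as_object)
```

With this in place, firstName is an object in every indexed document, so the order in which records arrive no longer matters.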
I want the search response to contain only documents with specified doc ids. On Stack Overflow I found this question (Lucene filter with docIds), but as far as I understand, it creates an additional field in the document and then searches by that field. Is there another way to deal with it?
Lucene's docids are intended only to be internal keys. You should not be using them as search keys, or storing them for later use. Those ids are subject to change without warning. They will be changed when updating or reindexing documents, and can change at other times, such as segment merges, as well.
If you want your documents to have a unique identifier, you should generate that key separate from the docId, and index it as a field in your document.
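A minimal sketch of generating such a key in application code, in Python. The field name `doc_key` is hypothetical; the point is only that the identifier is created outside Lucene and travels with the document.

```python
import uuid


def with_doc_key(doc):
    """Return a copy of doc carrying its own stable key in a regular field."""
    out = dict(doc)
    # Keep an existing key; otherwise generate a random UUID.
    out.setdefault("doc_key", str(uuid.uuid4()))
    return out


first = with_doc_key({"title": "a"})
second = with_doc_key({"title": "b"})
kept = with_doc_key({"title": "c", "doc_key": "already-set"})
print(first["doc_key"], second["doc_key"])
```

You would then index `doc_key` as a non-analyzed field and search on it, instead of relying on the internal docid.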
I am using elasticsearch as a document database and each record I create has a guid id that the system uses for the record id. Business people want to offer a feature to let the user have their own auto file name convention based on date and how many records were created so far this day/month.
What I need is to prevent duplicate user file names. Is there a way to set up an indexed field to be unique, like a SQL unique constraint?
You'd need to use the field that is supposed to be unique as the id of your documents. By default, a new document with an existing id overwrites the existing document with the same id, but you can switch to op_type=create in order to get back an error if a document with the same id already exists.
There's no way to have the same behaviour with arbitrary fields though, only the _id field works that way. I would probably consider handling this logic in the application layer instead of within elasticsearch.
One solution is to use the uniqueId field value as the document ID and to use op_type=create when storing the documents in ES. This way you can make sure your uniqueId field has a unique value and will not be overwritten by another document with the same value.
On this, the Elasticsearch documentation says:
The index operation also accepts an op_type that can be used to force a create operation, allowing for "put-if-absent" behavior. When create is used, the index operation will fail if a document by that id already exists in the index.
Here is an example of using the op_type parameter:
$ curl -XPUT 'http://localhost:9200/es_index/es_type/unique_a?op_type=create' -H 'Content-Type: application/json' -d '{
    "user" : "kimchy",
    "uniqueId" : "unique_a"
}'
The first time you run the request above it succeeds; running it again returns an error (HTTP 409 Conflict) because a document with that id already exists.
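The same put-if-absent call can be sketched from application code. The Python snippet below uses only the standard library; the helper names are hypothetical, and the URL assumes the typeless `_doc` endpoint of newer Elasticsearch versions rather than the `es_type` path shown above. Only the request is built here; `create_if_absent` needs a running cluster and is not called.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def build_create_request(base_url, index, doc_id, doc):
    """Build a PUT request that fails if doc_id already exists."""
    # quote() keeps unusual characters in the id from breaking the URL.
    url = "{}/{}/_doc/{}?op_type=create".format(
        base_url.rstrip("/"), index, urllib.parse.quote(doc_id, safe="")
    )
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_if_absent(request):
    """Send the request: True if created, False if the id was taken (409)."""
    try:
        with urllib.request.urlopen(request):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 409:
            return False
        raise


req = build_create_request(
    "http://localhost:9200", "es_index", "unique_a",
    {"user": "kimchy", "uniqueId": "unique_a"},
)
print(req.get_method(), req.full_url)
```

Returning False on a conflict lets the caller decide what to do with a duplicate, for example asking the user for a different file name.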
You can map the column you want a unique constraint on to _id.
Here is a sample river that uses PostgreSQL. You can change the database driver and DB URL according to your setup.
curl -XPUT localhost:9200/_river/simple_jdbc_river/_meta -d "{\"type\":\"jdbc\",\"jdbc\":{\"strategy\":\"simple\",\"poll\":\"1s\",\"driver\":\"org.postgresql.Driver\",\"url\":\"jdbc:postgresql://DB-URL/DB-INSTANCE\",\"user\":\"USERNAME\",\"password\":\"PASSWORD\",\"sql\":\"select t.id as _id,t.name from topic as t \",\"digesting\" : true},\"index\":{\"index\":\"jdbc\",\"type\":\"topic_jdbc_river1\"}}"
As of ES 7.5, there is no extra "constraint" in the mapping that enforces uniqueness on a custom field.
But you can still work around it with your own application UUID, which can be used explicitly as the _id (which is implicitly unique) to achieve your goal.
PUT <your_index_name>/_doc/<your_app_uuid>
{
"a_field": "a_value"
}
Another approach might be to generate the string you store in a field that should be unique by integrating an auto-incrementing integer. This way you ensure from the start that your field values are unique.
You would put your file name together like this:
<current day/month>_<auto-incremented integer>
Auto-incrementing integers are not supported by Elasticsearch per se, but you could mimic them using this approach. If you happen to use Node.js, you can use the es-sequence module.
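The naming scheme itself can be sketched in a few lines of Python. The class below is a hypothetical in-memory stand-in for the sequence: it only guarantees uniqueness within one process, so a real system would need to keep the counter in a shared store.

```python
import datetime
from collections import defaultdict


class FileNameSequence:
    """In-memory counter per day; an illustration, not a shared sequence."""

    def __init__(self):
        self._counters = defaultdict(int)

    def next_name(self, day):
        key = day.strftime("%Y-%m-%d")
        self._counters[key] += 1
        # <current day>_<auto-incremented integer>
        return "{}_{}".format(key, self._counters[key])


seq = FileNameSequence()
day = datetime.date(2024, 1, 15)
first_name = seq.next_name(day)
second_name = seq.next_name(day)
next_day_name = seq.next_name(datetime.date(2024, 1, 16))
print(first_name, second_name, next_day_name)
```

Because the counter restarts for each day, the date prefix is what keeps names from different days apart.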