Primary Keys and CouchDB - ruby

CouchDB's versioning is an absolute boon to the application I'm writing, but each of the objects I want to represent in the database has its own unique identifier (let's call it my_id), so I don't really need the _id field.
Is there a way for me to tell CouchDB that I want to make my field the primary key (not _id)?
I'm using Ruby's couchrest_model, so I know I can do Model.find_by_my_id(params[:my_id]) if I've put view_by :my_id in my class, but this feels like I'm storing an _id for no purpose. Should I care?

Would it not be possible to provide your own id when you create the document, instead of the default one CouchDB assigns? I don't know whether Ruby's couchrest can do it, but it's available in the CouchDB API.
See here: http://wiki.apache.org/couchdb/HTTP_Document_API#PUT
The document ID is passed in the URL.
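A minimal sketch of that PUT in plain Ruby (stdlib Net::HTTP only, not couchrest), assuming a hypothetical server at localhost:5984 and a hypothetical database named things. The request path carries my_id as the document id, so CouchDB stores no separately generated _id. The request is built but not sent. Updating an existing document this way also requires its current _rev in the body; without it CouchDB rejects the write with a 409 conflict.

```ruby
require "erb"
require "json"
require "net/http"
require "uri"

# Hypothetical server and database name; adjust for your setup.
COUCH_URL = URI("http://localhost:5984/")
DB_NAME = "things"

# Build (but do not send) a PUT that stores a document under our own id.
# CouchDB takes the document id from the URL: PUT /{db}/{docid}
def build_put(my_id, attrs)
  path = "/#{DB_NAME}/#{ERB::Util.url_encode(my_id)}"
  req = Net::HTTP::Put.new(path, "Content-Type" => "application/json")
  req.body = JSON.generate(attrs)
  req
end

# Sending it would look like this (not called here, since it needs a live server).
def send_request(req)
  Net::HTTP.start(COUCH_URL.host, COUCH_URL.port) { |http| http.request(req) }
end

req = build_put("widget-42", { "name" => "Widget", "color" => "blue" })
puts "#{req.method} #{req.path}" # prints "PUT /things/widget-42"
```

Reading the document back is then a plain GET /things/widget-42, which covers the find-by-my_id lookup without needing a separate view.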

Related

Updating fields of a Couchbase document if it exists, using Go

I am using the gocb library. I want to update a specific field of a document.
However, if the document does not exist, I don't want to do anything; I will just produce an error message.
You could say I should first retrieve the full document, make the update, and then insert it. That is possible, but I would rather use a ready-made method for this if there is one, since I don't want to retrieve the document. I just want to update some of its fields.
Is there a way to do this in the gocb library?
If you want to update parts of the document, look at sub-document operations. They transmit only the accessed sections of the document over the network, which makes them more efficient for small changes.
Example: https://couchbase.live/examples/basic-go-subdoc-mutate
If you want to rewrite the entire document, you are looking for Replace(), which replaces an existing document with a new one. It is similar to Upsert(), except that it can only replace existing documents and cannot create new ones.
General reference: https://docs.couchbase.com/server/current/guides/updating-data.html
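The Go calls themselves aren't reproduced here. As a hedged illustration of the semantics described above, here is a toy in-memory collection in Ruby; ToyCollection, DocumentNotFound and the method names are hypothetical and are not the gocb API. Upsert creates or overwrites, while Replace and the sub-document style update only touch an existing document and raise an error otherwise. As far as I know, gocb's MutateIn behaves the same way under its default store semantics, failing with a document-not-found error when the key is missing, which matches the "update only if it exists" requirement.

```ruby
# Hypothetical in-memory stand-in for a Couchbase collection; not the gocb API.
class DocumentNotFound < StandardError; end

class ToyCollection
  def initialize
    @docs = {}
  end

  def get(id)
    @docs.fetch(id) { raise DocumentNotFound, id }
  end

  # Upsert: create the document, or overwrite it if it already exists.
  def upsert(id, doc)
    @docs[id] = doc
  end

  # Replace: overwrite an existing document; fail if it is missing.
  def replace(id, doc)
    raise DocumentNotFound, id unless @docs.key?(id)
    @docs[id] = doc
  end

  # Sub-document style update: change only the given top-level fields
  # of an existing document; fail if it is missing.
  def mutate_in(id, fields)
    doc = get(id)
    fields.each { |path, value| doc[path] = value }
    doc
  end
end

coll = ToyCollection.new
coll.upsert("user::1", { "name" => "Ann", "age" => 30 })
coll.mutate_in("user::1", { "age" => 31 }) # only "age" changes

begin
  coll.mutate_in("user::2", { "age" => 5 })
rescue DocumentNotFound => e
  puts "no such document: #{e.message}" # prints "no such document: user::2"
end
```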

Map multiple values to a unique column in Elasticsearch

I want to use Elasticsearch to process some WhatsApp chats, so I am starting by planning the data load.
The problem is that the data exported from WhatsApp doesn't contain a real unique id per user. It only contains the user's name as taken from the contact directory of the device the chat was exported from (e.g., a user can change their number or have two numbers in the same group).
Because of that, I need to create a custom, explicit mapping table between the user names and a self-generated unique id, which gets populated in an additional column.
So my question is: "How can I implement this kind of explicit mapping in Elasticsearch to generate an additional unique column?" Alternatively, a valid answer could be a totally different approach to the problem.
P.S. As I write this, I think the solution could be in the ingestion process, for example in a Python script, but I still want to post the question to understand whether this is something Elasticsearch can do by itself.
Yes, do it during indexing.
If you store the data that maps each name to its id in a separate index, you can do this with an enrich processor in an ingest pipeline, which adds whichever value you want to each document as it is indexed.
Also, Elasticsearch doesn't have columns, only fields.
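A sketch of the enrich setup described above, written as the sequence of REST requests (method, path, JSON body) built in Ruby so the bodies can be inspected. The index name chat-users, the fields name, user_id and sender, and the policy and pipeline names are all hypothetical; the request shapes follow the Elasticsearch enrich policy and ingest pipeline APIs.

```ruby
require "json"

# Hypothetical names: a "chat-users" index holding one document per known
# display name, e.g. { "name" => "Bob (work phone)", "user_id" => "u-001" }.
POLICY   = "chat-users-policy"
PIPELINE = "chat-user-lookup"

# Each step is [HTTP method, path, body or nil], in the order it would be sent.
def enrich_setup_steps
  [
    ["PUT", "/_enrich/policy/#{POLICY}", {
      "match" => {
        "indices"       => "chat-users",
        "match_field"   => "name",
        "enrich_fields" => ["user_id"]
      }
    }],
    # The policy must be executed to build its internal enrich index.
    ["POST", "/_enrich/policy/#{POLICY}/_execute", nil],
    ["PUT", "/_ingest/pipeline/#{PIPELINE}", {
      "processors" => [{
        "enrich" => {
          "policy_name"  => POLICY,
          "field"        => "sender", # hypothetical field on each chat message
          "target_field" => "sender_info"
        }
      }]
    }]
  ]
end

enrich_setup_steps.each do |method, path, body|
  puts "#{method} #{path}"
  puts JSON.pretty_generate(body) if body
end

# Chat messages would then be indexed through the pipeline, e.g.
#   POST /chats/_doc?pipeline=chat-user-lookup
```

Messages indexed this way carry a sender_info object holding the matched name and user_id. An executed policy is a snapshot of the source index, so after adding names to chat-users you need to execute the policy again before the new mappings are used.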

Getting started with Bleve using BoltDB

I am trying to wrap my head around Bleve, and I understand everything that is going on in the tutorials, videos and documentation. However, I get very confused when I use it with BoltDB and don't know where to start.
Say I have an existing BoltDB database called data.db, populated with values of struct type Person:
type Person struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
	Age  int    `json:"age"`
	Sex  string `json:"sex"`
}
How do I index this data so that I can do a search? How do I handle the indexing of data that will be stored in the database in the future?
Any help will be highly appreciated.
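Bleve uses BoltDB as one of several backend stores, and its index is separate from where you store your application data. To index your data in Bleve, simply add each record to your index. Bleve document ids are strings, so convert the integer ID first (with strconv):
index.Index(strconv.Itoa(person.ID), person)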
That index exists separately from your application data (whether it's in Bolt, Postgres, etc).
To retrieve your data, construct a search request with bleve.NewSearchRequest(query), then call index.Search(). This returns a SearchResult whose Hits field gives you the ID of each matching object. You can use that ID to look the object up in your application data store.
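To make that separation concrete, here is a toy in Ruby (not Bleve itself; ToySearchIndex and app_store are hypothetical). The search index stores only term-to-id postings, a search returns ids (the role SearchResult.Hits plays), and the records themselves are then fetched from the application store by id. Ids are strings here, as in Bleve.

```ruby
# Hypothetical toy search index, kept separately from the application store.
class ToySearchIndex
  def initialize
    @postings = Hash.new { |h, term| h[term] = [] }
  end

  # Like index.Index(id, data): record which document ids contain which terms.
  def index(id, doc)
    doc.each_value do |value|
      value.to_s.downcase.split(/\W+/).reject(&:empty?).each do |term|
        @postings[term] << id unless @postings[term].include?(id)
      end
    end
  end

  # Returns only the matching ids ("hits"), never the documents themselves.
  def search(term)
    @postings.fetch(term.downcase, [])
  end
end

# The application's own data, keyed by string id (stands in for Bolt, Postgres, etc.).
app_store = {
  "1" => { "id" => 1, "name" => "Bob Smith", "age" => 15, "sex" => "M" },
  "2" => { "id" => 2, "name" => "Alice Jones", "age" => 22, "sex" => "F" }
}

idx = ToySearchIndex.new
app_store.each { |id, person| idx.index(id, person) }

hits = idx.search("bob")
people = hits.map { |id| app_store[id] } # second step: look up by id
puts people.map { |person| person["name"] }.inspect # prints ["Bob Smith"]
```

New records are handled the same way: whenever the application writes a record to its own store, it also calls index with the same id.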
Disclaimer: I am the author of BoltDB.
How you index your data depends on how you want to query for it.
If you want to query by any arbitrary fields, like {Age:15, Name:"Bob"} then BoltDB isn't an awesome fit for your problem.
BoltDB is simply a key value store with fast access to sequential keys and efficient prefix seeking. It's not really a replacement for general use databases.
You likely want something more like a document store (e.g., MongoDB) or an RDBMS (e.g., PostgreSQL).
If you just want something embedded that uses simple files, you could also use SQLite from Go.
If you want to search by only a single field, like ID or Name, then use that as the key.
If lookup speed doesn't matter at all, I guess you can use Bolt to just iterate over the entire db, parse the JSON and check the fields. But that's probably the worst approach you could take.
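A toy sketch of the "use the field you search by as the key" advice, in Ruby with a hypothetical SortedKV class standing in for a Bolt bucket (it is not the BoltDB API). Keys are kept in sorted byte order, and a lookup seeks to the first key at or after a prefix and walks forward, which is the access pattern Bolt is fast at. Putting the name first and the zero-padded id second in each key lets one prefix scan find every person with a given name.

```ruby
require "json"

# Hypothetical stand-in for a Bolt bucket: sorted keys plus a prefix seek.
class SortedKV
  def initialize
    @keys = [] # kept sorted in byte order
    @vals = {}
  end

  def put(key, value)
    unless @vals.key?(key)
      i = @keys.bsearch_index { |k| k >= key } || @keys.size
      @keys.insert(i, key)
    end
    @vals[key] = value
  end

  def get(key)
    @vals[key]
  end

  # Seek to the first key >= prefix, then walk forward while keys still match.
  def prefix_scan(prefix)
    i = @keys.bsearch_index { |k| k >= prefix } || @keys.size
    out = []
    while i < @keys.size && @keys[i].start_with?(prefix)
      out << [@keys[i], @vals[@keys[i]]]
      i += 1
    end
    out
  end
end

people = [
  { "id" => 1, "name" => "Bob", "age" => 15, "sex" => "M" },
  { "id" => 2, "name" => "Alice", "age" => 22, "sex" => "F" },
  { "id" => 3, "name" => "Bob", "age" => 40, "sex" => "M" }
]

db = SortedKV.new
people.each do |person|
  # Zero-padded id keeps entries for the same name in numeric order.
  db.put(format("name:%s:%08d", person["name"], person["id"]), JSON.generate(person))
end

bobs = db.prefix_scan("name:Bob:").map { |_key, json| JSON.parse(json)["id"] }
p bobs # prints [1, 3]
```

Any query on a field that is not part of the key (say, Age) would still need a full scan, which is why a document store or RDBMS fits arbitrary queries better.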

Is it possible to get a RethinkDB document only knowing the UUID (no table)?

If UUIDs are unique across RethinkDB, I was wondering whether you could get a document having only its UUID, without knowing the table it resides in.
I am thinking of something like:
r.db('test').get('[UUID]').run()
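You can write r.db('test').tableList().map(function(table){return r.table(table).get(UUID);}). The result has one entry per table in the database, with null for each table that has no document with that primary key.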
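A small Ruby model of what that query returns, using a hypothetical in-memory hash of tables rather than the RethinkDB driver. It performs one primary-key lookup per table, giving an array with nil (null in ReQL) for every table that lacks the key; dropping the nils leaves the document if it exists. The cost grows with the number of tables, since each one is probed.

```ruby
# Hypothetical in-memory model of one database: table name => { primary key => document }.
tables = {
  "users" => { "a1b2" => { "id" => "a1b2", "name" => "Ann" } },
  "posts" => { "c3d4" => { "id" => "c3d4", "title" => "Hello" } },
  "tags"  => {}
}

# Mirrors tableList().map(table -> table.get(uuid)): one lookup per table, nil if absent.
def get_anywhere(tables, uuid)
  tables.keys.map { |table| tables[table][uuid] }
end

results = get_anywhere(tables, "c3d4")
found = results.compact # drop the per-table misses
puts found.map { |doc| doc["id"] }.inspect # prints ["c3d4"]
```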

MongoDB GridFS one-to-one query efficiency in Ruby

I'm using MongoDB with Sinatra for an iPhone app.
I have a users MongoDB collection and a picture GridFS collection. Each user has one picture, so, initially, I just set the ObjectId for the picture to be the same as the corresponding user. That made it easy to get the picture of a user with just one query, given the user's ObjectId. Then, I was planning to store the MD5 hash of the picture in the user object so that the iPhone would know to download the picture only if the MD5 hash had changed. This would work, but I had to modify the Grid Ruby class to get the MD5.
But then Kyle Banker suggested that I just store the picture_id, instead of the MD5, in the user object. But if I do that, given a user's ObjectId, I'd have to first query the picture_id from the user and then query the picture (two queries). Is there a way to get the picture in one query, given a user's ObjectId? Reading up on GridFS indexes, I think I can store the user's ObjectId in the metadata of the picture and then set an index on that field. That way, I could do it in one query. If that's correct, what does the code look like to do that in Ruby?
Then again, should I even bother? I could just as easily use the picture_id to query the picture, which is what I'll do for now, but it'd also be nice, from a syntactic perspective, to be able to query the picture (in one fast, indexed query) by the user_id. Kinda like Facebook's Graph API lets you do, e.g., http://graph.facebook.com/mattdipasquale/picture.
Sure. As you suggest, just store the user_id somewhere in the picture's file object, and build an index on that field.
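A toy Ruby model of that layout, not the mongo gem API. Each files document carries the owner's id under metadata.user_id (GridFS files documents have a metadata field for application data), and a hash stands in for an index on that field, so finding a user's picture is a single indexed lookup. ToyFilesCollection is hypothetical, and its uniqueness check reflects the one-picture-per-user assumption from the question. In MongoDB itself the equivalent is an index on metadata.user_id in the fs.files collection, queried with a filter such as { "metadata.user_id" => user_id }.

```ruby
require "securerandom"

# Hypothetical stand-in for a GridFS files collection (fs.files); not the mongo gem API.
class ToyFilesCollection
  def initialize
    @docs = {}       # _id => files document
    @by_user_id = {} # plays the role of a unique index on metadata.user_id
  end

  def insert(doc)
    user_id = doc.dig("metadata", "user_id")
    raise ArgumentError, "picture already stored for #{user_id}" if @by_user_id.key?(user_id)
    @docs[doc["_id"]] = doc
    @by_user_id[user_id] = doc["_id"]
    doc["_id"]
  end

  # Same idea as querying { "metadata.user_id" => user_id } through the index:
  # one lookup from user id to the picture's files document.
  def find_by_user_id(user_id)
    file_id = @by_user_id[user_id]
    file_id && @docs[file_id]
  end
end

files = ToyFilesCollection.new
user_id = SecureRandom.hex(12) # 24 hex chars, standing in for the user's ObjectId
files.insert({
  "_id" => SecureRandom.hex(12),
  "filename" => "avatar.jpg",
  "metadata" => { "user_id" => user_id }
})

pic = files.find_by_user_id(user_id)
puts pic["filename"] # prints "avatar.jpg"
```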
