Is it possible to get a RethinkDB document only knowing the UUID (no table)? - rethinkdb

If UUIDs are unique across RethinkDB, I was wondering whether you could get a document having only its UUID, without knowing the table it resides in.
I am thinking of something like:
r.db('test').get('[UUID]').run()

You can write r.db('test').tableList().map(function(table){return r.table(table).get(UUID);}).

Related

Map multiple values to a unique column in Elasticsearch

I want to work with Elasticsearch to process some Whatsapp chats. So I am initially planning the data load.
The problem is that the data exported from Whatsapp, doesn't contain a real unique id per user but it only contains the name of the user taken from the contact directory of the device where the chat is exported (ie. a user can change the number or have two numbers in the same group).
Because of that, I need to create a custom explicit mapping table between the user names and a self-generated unique id, that gets populated in an additional column.
Then, my question is: "How can I implement such kind of explicit mapping in Elasticsearch to generate an additional unique column?". Alternatively, a valid answer could be a totally different approach to the problem.
PS. As I write, I think the solution could be in the ingestion process, like in a python script, but I still want to post the question to understand if this is something that Elasticsearch can do by itself.
yes, do it during the index process
if you had the data that maps the name and the id stored in a separate index you could do this with an enrich processor when you index the data to add whichever value you want to the document via a pipeline
also - Elasticsearch doesn't have columns, only fields

Getting started with Bleve using BoltDB

I am trying to wrap my head around Bleve and I understand everything that is going on in the tutorials, videos and documentation. I however get very confused when I am using it on BoltDB and don't know how to start.
Say I have an existing BoltDB database called data.db populated with values of struct type Person
type Person struct {
ID int `json:"id"`
Name string `json:"name"`
Age int `json:"age"`
Sex string `json:"sex"`
}
How do I index this data so that I can do a search? How do I handle the indexing of data that will be stored in the database in the future?
Any help will be highly appreciated.
Bleve uses BoltDB as one of several backend stores and is separate from where you store your application data. To index your data in Bleve, simply add your Index:
index.Index(person.ID, person)
That index exists separately from your application data (whether it's in Bolt, Postgres, etc).
To retrieve your data, you'll need to construct a search request using bleve.NewSearchRequest(), then call Index.Search(). This will return a SearchResult which includes a Hits field where you can retrieve the ID for your object. You can use this to look up the object in your application data store.
Disclaimer: I am the author of BoltDB.
How you index your data depends on how you want to query for it.
If you want to query by any arbitrary fields, like {Age:15, Name:"Bob"} then BoltDB isn't an awesome fit for your problem.
BoltDB is simply a key value store with fast access to sequential keys and efficient prefix seeking. It's not really a replacement for general use databases.
You likely want something more like a document store (ie: MongoDB) or RDBMS (ie: PostgreSQL).
If you just wanted something that uses simple files and is embedded, you could also use SQlite with the Go module
If you want to search by only a single field, like ID or Name, then use that as the key.
If lookup speed doesn't matter at all, I guess you can use Bolt to just iterate over the entire db, parse the json and check the fields. But that's probably the worst approach you could take.

Unique column in Parse

There seems to have no way to enforce uniqueness in Parse. I have a parse object in which one of the fields is a url. I require that url to be unique as it is the most important field in the object: everything else is meta data about it. So is there a way to override the objectId of a Parse Object? That way I can always check for existence in a simple way before trying to create the object.
Actually this might still fail. So any ideas that do not include cloud code?
I know this can be done because Parse is able to ensure username and email are unique for each User object. I am hoping it's an elegant solution and someone here knows how I might accomplish something similar. After all, database tables with unique fields are commonplace.

Primary Keys and CouchDB

CouchDB's versioning is an absolute boon to the application I'm writing, but each of the objects I want to represent in the database has it's own unique identifier (let's call it my_id), so I don't really need the _id field.
Is there a way for me to tell CouchDB that I want to make my field the primary hey (not _id)?
I'm using ruby's couchrest_model, so I know I can do Model.find_by_my_id(params[:my_id]) if I've put view_by :my_id in my class, but this feels like I'm storing an _id for no purpose. Should I care?
would it not be possible to, when you create the document, provide your own id instead of the default one couchb assigns? I don't know if ruby's couchrest can do it, but it's available in the CouchDB API
See here: http://wiki.apache.org/couchdb/HTTP_Document_API#PUT
The document ID is passed into the url.

Random ID generation on Sign Up - Database Performance

I am making a site that each account will have an ID.
But, I didn't want to make it incrementable, meaning:
id=1
id=2
...
id=1000
What I want is to have random IDs:
id=2355
id=5647734
id=23532
...
(The reason is to avoid robots to check all accounts profiles by just incrementing a ID in URL - and maybe other reason, but that is not the question)
But, I am worried about performance on registration.
It will be something like this:
while (RANDOM_ID is not taken): generate new RANDOM_ID
On generating a new ID for the new account, I will query database (MySQL) to check if the ID exists, for each generation.
Is there any better solution for this?
Is there any disadvantage of using random IDs?
Thanks in advance.
There are many, many reasons not to do this:
Your solution, as written, is not transactionally-safe; two transactions at the same time could both generate the same "random" ID.
If you serialize the transaction in order to make it safe, you will slaughter performance because the query will keep every single collision row locked until it finds a spare ID.
Using a random ID as the primary key will fragment the hell out of your clustered index. This is bad enough with uuids - the whole point of an auto-generated identity column is so you can generate a safe sequence out of it.
Why not use a regular primary key, but just don't use that in any of your URLs? Generate a secondary non-sequential ID along with it - such as a uuid - index it, and use this column in any public-facing segments of your application instead of the primary key if you are really worried about security.
You can use UUIDs. It's a unique identifier generated based partly on timestamp. It's almost certainly guaranteed to be unique so you don't have to do a query to check.
i do not know what language you're using, but there should be library or sample code for this for most languages.
Yes you can use UUID but keep your auto_increment field. Just add a new field and set it so something like: md5(microtime(true).rand()) or whatever other method you like and use that unike key along the site to make the links instead to expose the primary key in urls.

Resources