I am trying to wrap my head around Bleve, and I understand everything that is going on in the tutorials, videos, and documentation. However, I get very confused when using it with BoltDB and don't know where to start.
Say I have an existing BoltDB database called data.db populated with values of the struct type Person:
type Person struct {
    ID   int    `json:"id"`
    Name string `json:"name"`
    Age  int    `json:"age"`
    Sex  string `json:"sex"`
}
How do I index this data so that I can do a search? How do I handle the indexing of data that will be stored in the database in the future?
Any help will be highly appreciated.
Bleve uses BoltDB as one of several backend stores, and its index is separate from where you store your application data. To index your data in Bleve, add each record to the index (Bleve document IDs are strings, so convert the int ID):
index.Index(strconv.Itoa(person.ID), person)
That index exists separately from your application data (whether it's in Bolt, Postgres, etc.).
To retrieve your data, construct a search request using bleve.NewSearchRequest(), then call Index.Search(). This returns a SearchResult whose Hits field gives you the ID of each matching object; use that ID to look up the full record in your application data store.
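A minimal sketch of that whole flow, assuming the index lives at people.bleve and the application data sits in a hypothetical "people" bucket in data.db, keyed by the stringified ID (all names here are illustrative):

package main

import (
    "encoding/json"
    "fmt"
    "strconv"

    "github.com/blevesearch/bleve/v2"
    "github.com/boltdb/bolt"
)

// Person as defined in the question.
type Person struct {
    ID   int    `json:"id"`
    Name string `json:"name"`
    Age  int    `json:"age"`
    Sex  string `json:"sex"`
}

func main() {
    // The Bleve index lives in its own directory, next to (not inside) data.db.
    index, err := bleve.New("people.bleve", bleve.NewIndexMapping())
    if err != nil {
        panic(err)
    }

    p := Person{ID: 1, Name: "Bob", Age: 42, Sex: "male"}

    // Bleve document IDs are strings, so convert the int ID.
    if err := index.Index(strconv.Itoa(p.ID), p); err != nil {
        panic(err)
    }

    // Search the index; each hit carries the ID we indexed under.
    query := bleve.NewMatchQuery("Bob")
    result, err := index.Search(bleve.NewSearchRequest(query))
    if err != nil {
        panic(err)
    }

    // Use the hit IDs to fetch the full records from BoltDB.
    db, err := bolt.Open("data.db", 0600, nil)
    if err != nil {
        panic(err)
    }
    defer db.Close()

    db.View(func(tx *bolt.Tx) error {
        b := tx.Bucket([]byte("people")) // assumes records were stored in a "people" bucket
        if b == nil {
            return nil
        }
        for _, hit := range result.Hits {
            if v := b.Get([]byte(hit.ID)); v != nil {
                var found Person
                json.Unmarshal(v, &found)
                fmt.Printf("%+v\n", found)
            }
        }
        return nil
    })
}

To handle future writes, index each new record in Bleve in the same place you store it in Bolt, so the two stay in step.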
Disclaimer: I am the author of BoltDB.
How you index your data depends on how you want to query for it.
If you want to query by any arbitrary fields, like {Age: 15, Name: "Bob"}, then BoltDB isn't an awesome fit for your problem.
BoltDB is simply a key-value store with fast access to sequential keys and efficient prefix seeking. It's not really a replacement for general-purpose databases.
You likely want something more like a document store (e.g. MongoDB) or an RDBMS (e.g. PostgreSQL).
If you just want something that uses simple files and is embedded, you could also use SQLite with a Go driver (e.g. mattn/go-sqlite3).
If you want to search by only a single field, like ID or Name, then use that field as the key (see the sketch below).
If lookup speed doesn't matter at all, you can use Bolt to just iterate over the entire database, parse the JSON, and check the fields. But that's probably the worst approach you could take.
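As a sketch of that single-field approach, here's what keying by Name might look like (bucket name is made up):

package main

import (
    "encoding/json"
    "fmt"

    "github.com/boltdb/bolt"
)

// Person as in the question.
type Person struct {
    ID   int    `json:"id"`
    Name string `json:"name"`
    Age  int    `json:"age"`
    Sex  string `json:"sex"`
}

func main() {
    db, err := bolt.Open("data.db", 0600, nil)
    if err != nil {
        panic(err)
    }
    defer db.Close()

    p := Person{ID: 1, Name: "Bob", Age: 15, Sex: "male"}

    // Key the record by the one field you query on (Name here).
    err = db.Update(func(tx *bolt.Tx) error {
        b, err := tx.CreateBucketIfNotExists([]byte("people-by-name"))
        if err != nil {
            return err
        }
        buf, err := json.Marshal(p)
        if err != nil {
            return err
        }
        return b.Put([]byte(p.Name), buf)
    })
    if err != nil {
        panic(err)
    }

    // A lookup by Name is now a single Get.
    db.View(func(tx *bolt.Tx) error {
        var found Person
        v := tx.Bucket([]byte("people-by-name")).Get([]byte("Bob"))
        if v != nil {
            json.Unmarshal(v, &found)
            fmt.Printf("%+v\n", found)
        }
        return nil
    })
}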
Related
I want to work with Elasticsearch to process some Whatsapp chats. So I am initially planning the data load.
The problem is that the data exported from WhatsApp doesn't contain a real unique ID per user; it only contains the user's name as taken from the contact directory of the device where the chat was exported (i.e. a user can change their number, or have two numbers in the same group).
Because of that, I need to create a custom explicit mapping table between the user names and a self-generated unique ID, which gets populated in an additional column.
Then, my question is: "How can I implement such kind of explicit mapping in Elasticsearch to generate an additional unique column?". Alternatively, a valid answer could be a totally different approach to the problem.
PS: As I write this, I think the solution could be in the ingestion process, e.g. in a Python script, but I still want to post the question to understand whether this is something Elasticsearch can do by itself.
Yes, do it during the indexing process.
If you had the data that maps the name to the ID stored in a separate index, you could do this with an enrich processor: when you index the data, an ingest pipeline adds whichever value you want to the document.
Also note: Elasticsearch doesn't have columns, only fields.
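If you'd rather do the mapping client-side during ingestion instead (the approach the question's PS hints at), here's a minimal sketch, in Go for consistency with the rest of this page, though the questioner mentioned Python; all types and field names are made up:

package main

import (
    "fmt"

    "github.com/google/uuid"
)

// Message is a hypothetical parsed WhatsApp chat line.
type Message struct {
    UserName string
    Text     string
    UserID   string // filled in before indexing
}

func main() {
    // Explicit mapping from display name to a stable, self-generated ID.
    // Persist this map (file, separate index, etc.) so IDs survive across loads.
    ids := map[string]string{}

    msgs := []Message{
        {UserName: "Alice", Text: "hi"},
        {UserName: "Bob", Text: "hello"},
        {UserName: "Alice", Text: "bye"},
    }

    for i := range msgs {
        id, ok := ids[msgs[i].UserName]
        if !ok {
            // First time we see this name: mint a new ID for it.
            id = uuid.NewString()
            ids[msgs[i].UserName] = id
        }
        msgs[i].UserID = id
        // ...then index msgs[i] into Elasticsearch with the extra user_id field.
        fmt.Printf("%+v\n", msgs[i])
    }
}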
I'm integrating GraphQL into my application and trying to figure out if this scenario is possible.
I have a schema for a Record type and a query that returns a list of Records from my service. Schema looks something like:
type Query {
    records(someQueryParam: String!): [Record]!
}

type Record {
    id: String!
    otherId: String!
    <other fields here>
}
There are some places in my application where I need to access a Record using the otherId value (because that's all I have access to). Currently, I do that with a mapping of otherId to id values that's populated after all the Records are downloaded. I use the map to go from otherId to id, and then use the id value to index into the collection of Record objects, to avoid iterating through the whole thing. (This collection used to be populated using a separate REST call, before I started using Apollo GQL.)
I'd like to remove my dependency on this mapping if possible. Since the Records are all in the Apollo cache once they've been loaded, I'd like to just query the cache for the Record in question using the otherId value. My service doesn't currently have that kind of lookup, so I don't have an existing query that I can cache in parallel. (i.e. there's no getIdFromOtherId).
tl;dr: Can I query my Apollo cache using something other than the id of an object?
You can't query the cache by otherId for the same reason you don't want to search through the record set to find the matching item: the id is part of the item's cache key, and without the key, Apollo can't directly access the item. Apollo's default cache is a key-value store, not a database you can query however you like.
It's probably necessary to build a query into your data source that maps between otherId and id; searching through the entire record set for your item would obviously be horribly inefficient at scale.
I have a User struct with ID and LoginName fields, and I want this struct to be accessible by either of these fields with a single call to the DB. I know BoltDB is not supposed to handle arbitrary field indexing (unlike SQL), but this case is a little different, as I happen to know in advance the additional field to be used as an index.
So is there some kind of secondary key or multiple-key indexing? Or maybe some strategy that I fail to see? If not, then I'll just implement it with two calls; I'd just prefer a "cleaner" solution...
Thanks!
No, it's not there. BoltDB is a lot like Go: clean and simple. And building a layer on top is easy. BoltDB's update transactions make it trivial to update two or more buckets atomically, or not at all, so creating an update transaction that keeps two or more buckets in sync is easy. But it sounds like you know that and just wanted to check that you aren't missing something.
There is no secondary key indexing in BoltDB, but you can implement it.
You can store a LoginName-to-ID mapping in another bucket, and it will technically be the "secondary key" for your struct. That is, first obtain the primary key (ID) from the secondary key (LoginName), and then fetch the User struct.
If most of your calls are by LoginName, invert this: store the User struct under the LoginName key and keep an ID-to-LoginName mapping instead.
Be careful: you have to maintain consistency on your own, so remember to update both buckets together.
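A minimal sketch of that two-bucket pattern (bucket names are illustrative); both writes happen in one Update transaction, so they commit or fail together, which keeps the buckets consistent:

package main

import (
    "encoding/json"
    "fmt"
    "strconv"

    "github.com/boltdb/bolt"
)

type User struct {
    ID        int    `json:"id"`
    LoginName string `json:"login_name"`
}

// putUser stores the User under its ID and maintains the LoginName -> ID
// mapping in a second bucket, atomically.
func putUser(db *bolt.DB, u User) error {
    return db.Update(func(tx *bolt.Tx) error {
        users, err := tx.CreateBucketIfNotExists([]byte("users"))
        if err != nil {
            return err
        }
        logins, err := tx.CreateBucketIfNotExists([]byte("logins"))
        if err != nil {
            return err
        }
        buf, err := json.Marshal(u)
        if err != nil {
            return err
        }
        id := []byte(strconv.Itoa(u.ID))
        if err := users.Put(id, buf); err != nil {
            return err
        }
        return logins.Put([]byte(u.LoginName), id)
    })
}

// getUserByLogin resolves LoginName -> ID -> User. Assumes the buckets
// exist (i.e. putUser has run at least once).
func getUserByLogin(db *bolt.DB, login string) (*User, error) {
    var u *User
    err := db.View(func(tx *bolt.Tx) error {
        id := tx.Bucket([]byte("logins")).Get([]byte(login))
        if id == nil {
            return nil
        }
        v := tx.Bucket([]byte("users")).Get(id)
        if v == nil {
            return nil
        }
        u = new(User)
        return json.Unmarshal(v, u)
    })
    return u, err
}

func main() {
    db, err := bolt.Open("users.db", 0600, nil)
    if err != nil {
        panic(err)
    }
    defer db.Close()

    if err := putUser(db, User{ID: 1, LoginName: "bob"}); err != nil {
        panic(err)
    }
    u, err := getUserByLogin(db, "bob")
    if err != nil {
        panic(err)
    }
    fmt.Printf("%+v\n", u)
}

Note that getUserByLogin does both reads inside a single View transaction, so resolving LoginName to ID to User is still one call to the DB.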
If UUIDs are unique across RethinkDB, I was wondering whether you could get a document having only its UUID, without knowing the table it resides in.
I am thinking of something like:
r.db('test').get('[UUID]').run()
You can write:
r.db('test').tableList().map(function(table){ return r.table(table).get(UUID); })
CouchDB's versioning is an absolute boon to the application I'm writing, but each of the objects I want to represent in the database has its own unique identifier (let's call it my_id), so I don't really need the _id field.
Is there a way for me to tell CouchDB that I want to make my_id the primary key (not _id)?
I'm using Ruby's couchrest_model, so I know I can do Model.find_by_my_id(params[:my_id]) if I've put view_by :my_id in my class, but this feels like I'm storing an _id for no purpose. Should I care?
Would it not be possible, when you create the document, to provide your own ID instead of the default one CouchDB assigns? I don't know if Ruby's couchrest can do it, but it's available in the CouchDB API.
See here: http://wiki.apache.org/couchdb/HTTP_Document_API#PUT
The document ID is passed in the URL.
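For illustration, a minimal sketch of that PUT over plain HTTP from Go (the database name, document ID, and server URL are made up; couchrest wraps this same API):

package main

import (
    "bytes"
    "fmt"
    "io"
    "net/http"
)

func main() {
    // Use your own identifier (my_id) as the document ID by putting it in the URL.
    body := bytes.NewBufferString(`{"name": "example"}`)
    req, err := http.NewRequest(http.MethodPut,
        "http://localhost:5984/mydb/my-id-123", body)
    if err != nil {
        panic(err)
    }
    req.Header.Set("Content-Type", "application/json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    // CouchDB responds with the id and the first _rev for the new document.
    out, _ := io.ReadAll(resp.Body)
    fmt.Println(resp.Status, string(out))
}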