I have a table in a PostgreSQL database where each row contains a JSON structure like the following, and I'm wondering what the best way to index it is:
{
  "name": "Mr. Jones",
  "wish_list": [
    {"present_name": "Counting Crows",
     "present_link": "www.amazon.com"},
    {"present_name": "Justin Bieber",
     "present_link": "www.amazon.com"}
  ]
}
I'd like to put an index on each present_name within the wish_list array. The goal is to be able to find, through an index, each row where the person wants a particular gift.
I've been reading up on how to create an index on JSON, which makes sense. The problem I'm having is creating an index on each element of an array within a JSON object.
My best guess is to use something like the json_array_elements function and create an index on each item it returns.
Thanks for a push in the right direction!
Please check the JSONB Indexing section in the Postgres documentation.
For your case, the index definition could be the following:
CREATE INDEX idx_gin_wishlist ON your_table USING gin ((jsonb_column -> 'wish_list'));
It will store copies of every key and value inside wish_list, but you should be careful that your queries actually hit the index. You should use the @> containment operator:
SELECT jsonb_column->'wish_list'
FROM your_table
WHERE jsonb_column->'wish_list' @> '[{"present_link": "www.amazon.com", "present_name": "Counting Crows"}]';
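Since @> is a containment check, a partial object is enough. To find every row where the wish list contains a particular present_name (the original goal), a query like this should also use the index:
SELECT jsonb_column->'wish_list'
FROM your_table
WHERE jsonb_column->'wish_list' @> '[{"present_name": "Counting Crows"}]';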
I strongly suggest checking these existing answers:
How to query for array elements inside JSON type
Index for finding an element in a JSON array
Can someone please help me with how to execute a bulk insert with the header "Content-Type: application/x-ndjson" in elastic4s? I have tried this:
client.execute {
  bulk(
    indexInto("cars" / "car").source(getCarsFromJson)
  ).refresh(RefreshPolicy.WaitFor)
}.await
It works for one element in the JSON, but when I add another element to the JSON, no elements are added to Elasticsearch.
Are you sure you are using the right syntax? Shouldn't it say "cars/car" instead of "cars" / "car"?
The source method on indexInto will not accept multiple JSON objects, because you're trying to put multiple documents inside a single document insert.
Instead, you will need to take your JSON, parse it into objects, and iterate over them, adding an insert document for each one.
Something like the following:
def getCarsFromJson: Seq[String] = ??? // must return a sequence of JSON strings, one per car

val inserts = getCarsFromJson.map { car => indexInto("cars" / "car").source(car) }

client.execute {
  bulk(inserts: _*).refresh(RefreshPolicy.WaitFor)
}
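If the cars arrive as one JSON array (as the question suggests), getCarsFromJson can parse the array and re-serialize each element. A minimal sketch, assuming circe as the JSON library and a raw input string (both assumptions, not part of the original code):
import io.circe.parser.parse

// Hypothetical helper: split a raw JSON array string into one JSON string per element.
def getCarsFromJson(raw: String): Seq[String] =
  parse(raw)            // Either[ParsingFailure, Json]
    .toOption
    .flatMap(_.asArray) // expect a top-level JSON array of car objects
    .getOrElse(Vector.empty)
    .map(_.noSpaces)    // re-serialize each car as a compact JSON string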
Can anyone help with a MongoTemplate question?
I have a record structure with nested arrays, and I want to update a specific entry in a second-level array. I can find the appropriate entry easily enough, but the Set path needs the indexes of both array entries, and the '$' only refers to the leaf item. For example, if I had an array of teams which contained an array of players, I would need to generate an update path like:
val query = Query(Criteria.where("teams.players.playerId").`is`(playerId))
val update = Update()
with(update) {
    set("teams.$.players.$.name", player.name)
}
This fails, as the '$' can only be used once and refers to the index in the players array; I need a way to generate the equivalent of '$' for the index in the teams array.
I am thinking that I need a separate aggregate query using something like this, but I can't get it to work:
project().and(ArrayOperators.arrayOf("markets").indexOf("")).`as`("index")
Any ideas for this Mongo newbie?
For others who are facing a similar issue: one option is to use arrayFilters in UpdateOptions. But it looks like MongoTemplate in Spring does not yet support the use of UpdateOptions directly. Hence, what can be done is the following.
A sample for a document which contains an array of arrayObj (each of which contains another array of arrayObj):
Bson filter = eq("arrayObj.arrayObj.id", "12345");
UpdateResult result = mongoTemplate.getDb().getCollection(collectionName)
    .updateOne(filter,
        new Document("$set", new Document("arrayObj.$[].arrayObj.$[x].someField", "someValueToUpdate")),
        new UpdateOptions().arrayFilters(
            Arrays.asList(Filters.eq("x.id", "12345"))
        ));
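Applied to the original teams/players structure, the same idea might look like this (a sketch only; the "teams" collection name and the playerId/newName variables are assumptions):
// "$[]" walks every team; "$[p]" targets only the players matched by the array filter below.
Bson filter = Filters.eq("teams.players.playerId", playerId);
UpdateResult result = mongoTemplate.getDb().getCollection("teams")
    .updateOne(filter,
        new Document("$set", new Document("teams.$[].players.$[p].name", newName)),
        new UpdateOptions().arrayFilters(
            Arrays.asList(Filters.eq("p.playerId", playerId))
        ));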
Using the following query:
r.db('somedb').table('sometable')('users')
I get the following data from the result:
[
[
{
"fn": "dpw",
"u": "usertwo"
},
{
"fn": "dwd",
"u": "userone"
}
]
]
I would like to take the field "u", specify, say, "usertwo", and get the value of "fn" for that "u". I want the result filtered using ReQL, so that I am not just parsing the JSON result in Node.js, as the result will eventually be enormous. What would be the best and most efficient approach? I am new to RethinkDB and would appreciate it if you could explain the answer as best you can.
I'm not sure exactly what you want, but from my understanding, this is what you are looking for:
r.db('somedb').table('sometable')('users').filter(function(user) {
return user("u").eq("usertwo")
})("fn")
You seem to have an array of arrays of users. If that was not a typo, the query should probably be:
r.db('somedb').table('sometable')('users').nth(0).filter(function(user) {
return user("u").eq("usertwo")
})("fn")
In our RethinkDB database, we have a table for orders, and a separate table that stores all the order items. Each entry in the OrderItems table has the orderId of the corresponding order.
I want to write a query that gets all SHIPPED order items (just the items from the OrderItems table ... I don't want the whole order). But whether the order is "shipped" is stored in the Order table.
So, is it possible to write a query that filters the OrderItems table based on the "shipped" value for the corresponding order in the Orders table?
If you're wondering, we're using the JS version of RethinkDB.
UPDATE:
OK, I figured it out on my own! Here is my solution. I'm not positive that it is the best way (and it certainly isn't super efficient), so if anyone else has ideas I'd still love to hear them.
I did it by running a .merge() to create a new field based on the Order table, then did a filter based on that value.
A semi-generalized query with filter from another table for my problem looks like this:
r.table('orderItems')
  .merge(function(orderItem){
    return {
      orderShipped: r.table('orders').get(orderItem('orderId')).pluck('shipped') // plucking just the "shipped" value, since I don't want the entire order
    }
  })
  .filter(function(orderItem){
    return orderItem('orderShipped')('shipped').gt(0) // filtering based on that new "shipped" value
  })
This will be much easier:
r.table('orderItems').filter(function(orderItem){
  return r.table('orders').get(orderItem('orderId'))('shipped').default(0).gt(0)
})
And to avoid a NULL result, it is better to add '.default(0)'.
It's probably better to create a proper index before doing any finding. Without an index, you cannot find documents in a table with more than 100,000 elements.
Also, filter does not use secondary indexes; getAll does.
A proper way is to use getAll and concatMap.
First, create the indexes:
r.table("orderItems").indexCreate("orderId")
r.table("orders").indexCreate("shipStatus", r.row("shipped").default(0).gt(0))
With those indexes, we can find all shipped orders:
r.table("orders").getAll(true, {index: "shipStatus"})
Now we will use concatMap to transform each order into its matching order items:
r.table("orders")
.getAll(true, {index: "shipStatus"})
.concatMap(function(order) {
return r.table("orderItems").getAll(order("id"), {index: "orderId"}).coerceTo("array")
})
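For comparison, the same join can also be written with eqJoin against the orderId index (a standard ReQL pattern, not from the answers above; a sketch):
r.table("orders")
  .getAll(true, {index: "shipStatus"})
  .eqJoin("id", r.table("orderItems"), {index: "orderId"}) // pair each order with its items
  .map(function(row) { return row("right") })              // keep only the orderItem side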
Let's say a comments table has the following structure:
id | author | timestamp | body
I want to use an index to efficiently execute the following query:
r.table('comments').getAll("me", {index: "author"}).orderBy('timestamp').run(conn, callback)
Is there another efficient method I can use?
It looks like indexes are currently not supported on the filtered result of a table. When creating an index for timestamp and adding it as a hint in orderBy('timestamp', {index: 'timestamp'}), I'm getting the following error:
RqlRuntimeError: Indexed order_by can only be performed on a TABLE. in:
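For reference, the failing attempt reconstructed from the description above would look like:
r.table('comments')
  .getAll("me", {index: "author"})
  .orderBy('timestamp', {index: 'timestamp'}) // fails: indexed orderBy only works directly on a table
  .run(conn, callback)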
This can be accomplished with a compound index on the "author" and "timestamp" fields. You can create such an index like so:
r.table("comments").index_create("author_timestamp", lambda x: [x["author"], x["timestamp"]])
Then you can use it to perform the query like so:
r.table("comments")
.between(["me", r.minval], ["me", r.maxval]
.order_by(index="author_timestamp)
The between works like the get_all did in your original query, because it gets only documents that have the author "me" and any timestamp. Then we do an order_by on the same index, which orders by the timestamp (since all of the keys have the same author). The key here is that you can only use one index per table access, so we need to cram all this information into the same index.
It's currently not possible to chain a getAll with an orderBy, using indexes twice.
Ordering with an index can be done only on a table right now.
NB: The command to orderBy with an index is orderBy({index: 'timestamp'}) (no need to repeat the key)
The answer by Joe Doliner was selected, but it seems wrong to me.
First, in the between command no index was specified, therefore between will use the primary index.
Second, between returns a selection:
table.between(lowerKey, upperKey[, {index: 'id', leftBound: 'closed', rightBound: 'open'}]) → selection
and orderBy cannot run on a selection with an index; only a table can use an index:
table.orderBy([key1...], {index: index_name}) → selection<stream>
selection.orderBy(key1, [key2...]) → selection<array>
sequence.orderBy(key1, [key2...]) → array
You want to create what's called a "compound index." After that, you can query it efficiently.
//create compound index
r.table('comments')
  .indexCreate(
    'author__timestamp', [r.row("author"), r.row("timestamp")]
  )

//the query
r.table('comments')
  .between(
    ['me', r.minval],
    ['me', r.maxval],
    {index: 'author__timestamp'}
  )
  .orderBy({index: r.desc('author__timestamp')}) //or "r.asc"
  .skip(0) //pagi
  .limit(10) //nation!
I like using two underscores for compound indexes. It's just stylistic. Doesn't matter how you choose to name your compound index.
Reference: How to use getall with orderby in RethinkDB