Dexie.js bulkDelete - argument keys: array or object?

The Dexie.js documentation says the following about the bulkDelete syntax:
https://dexie.org/docs/Table/Table.bulkDelete()
db.table.bulkDelete(keys)
I don't understand what keys means in this context. Is it only an array of primary keys, or can objects also be passed, as in the following example where parent_id is an indexed property?
e.g. schema
db.version(1).stores({nodes: "++id,parent_id, name"});
and then do a bulk delete like this:
db.table.bulkDelete({parent_id: 3, parent_id: 4})

keys is an array of primary keys. In your case, an array of numbers.
For example:
await db.table.bulkDelete([1,2,3]);
This will delete the entries with ids 1, 2 and 3.
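If the goal is instead to delete by the indexed parent_id rather than by primary key, bulkDelete won't take an object like the one in the question; the usual route is a where() query combined with Collection.delete(). A minimal sketch, assuming the nodes schema above:

// Delete every node whose indexed parent_id is 3 or 4.
// Returns a promise that resolves to the number of deleted records.
await db.nodes.where('parent_id').anyOf([3, 4]).delete();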

Related

After filtering a plucked Laravel collection, the indexed array changes to an associative array

I have a collection of Eloquent models, such as the User model. I use the pluck method to get only the post_id values from this collection, which gives me an indexed array of post_ids. But when I use the filter or unique method on this indexed array, the result changes to an associative array. I don't want an associative array as the result; I just want the unique post_ids in an indexed array. Laravel is automatically changing my result.
$this->posts->pluck('post_id')->unique('post_id')
The result is: { "1": 1, "2": 2 }.
Is this a bug, or have I made a mistake in how I'm fetching the data?
You can use groupBy like this:
$this->posts->groupBy('post_id')->pluck('post_id');
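Another common fix is to simply reset the keys after unique() with values(), which re-indexes the collection. A minimal sketch, assuming $this->posts holds models with a post_id attribute:

// unique() keeps the original keys, so call values() to get a plain indexed array again
$uniqueIds = $this->posts->pluck('post_id')->unique()->values();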

Index JSON Array in Postgres DB

I have a table where each row has a JSON structure like the following, which I'm trying to index in a PostgreSQL database, and I was wondering what the best way to do it is:
{
  "name": "Mr. Jones",
  "wish_list": [
    {"present_name": "Counting Crows",
     "present_link": "www.amazon.com"},
    {"present_name": "Justin Bieber",
     "present_link": "www.amazon.com"}
  ]
}
I'd like to put an index on each present_name within the wish_list array. The goal here is that I'd like to be able to find each row where the person wants a particular gift through an index.
I've been reading on how to create an index on a JSON which makes sense. The problem I'm having is creating an index on each element of an array within a JSON object.
The best guess I have is using something like the json_array_elements function and creating an index on each item returned through that.
Thanks for a push in the right direction!
Please check the JSONB Indexing section in the Postgres documentation.
For your case, the index definition may be the following:
CREATE INDEX idx_gin_wishlist ON your_table USING gin ((jsonb_column -> 'wish_list'));
It will store copies of every key and value inside wish_list, but you should be careful to write queries that actually hit the index. You should use the @> containment operator:
SELECT jsonb_column->'wish_list'
FROM your_table
WHERE jsonb_column->'wish_list' @> '[{"present_link": "www.amazon.com", "present_name": "Counting Crows"}]';
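Since the goal is to find rows by a particular present_name regardless of the link, note that jsonb containment only requires the keys you list, so a query on just that key should also use the index. A sketch, reusing the your_table and jsonb_column placeholders from above:

SELECT *
FROM your_table
WHERE jsonb_column->'wish_list' @> '[{"present_name": "Counting Crows"}]';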
I strongly suggest checking these existing answers:
How to query for array elements inside JSON type
Index for finding an element in a JSON array

Elasticsearch: Remove duplicates from index

I have an index with multiple duplicate entries. They have different ids but the other fields have identical content.
For example:
{id: 1, content: 'content1'}
{id: 2, content: 'content1'}
{id: 3, content: 'content2'}
{id: 4, content: 'content2'}
After removing the duplicates:
{id: 1, content: 'content1'}
{id: 3, content: 'content2'}
Is there a way to delete all duplicates and keep only one distinct entry without manually comparing all entries?
This can be accomplished in several ways. Below I outline two possible approaches:
1) If you don't mind generating new _id values and reindexing all of the documents into a new collection, then you can use Logstash and the fingerprint filter to generate a unique fingerprint (hash) from the fields that you are trying to de-duplicate, and use this fingerprint as the _id for documents as they are written into the new collection. Since the _id field must be unique, any documents that have the same fingerprint will be written to the same _id and therefore deduplicated (a rough config sketch follows this list).
2) You can write a custom script that scrolls over your index. As each document is read, you create a hash from the fields that you consider to define a unique document (in your case, the content field). Then use this hash as the key in a dictionary (aka hash table). The value associated with this key would be a list of all of the documents' _ids that generate this same hash. Once you have all of the hashes and associated lists of _ids, you can execute a delete operation on all but one of the _ids that are associated with each identical hash. Note that this second approach does not require writing documents to a new index in order to de-duplicate, as you would delete documents directly from the original index.
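For the Logstash route, a hedged pipeline sketch might look like the following. The index names are assumptions, and the content field is taken from the example documents above; treat the exact settings as a starting point rather than a drop-in config:

input {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "original_index"
  }
}
filter {
  fingerprint {
    # hash the field(s) that define a duplicate
    source => ["content"]
    target => "[@metadata][fingerprint]"
    method => "SHA256"
    # any constant string; with a key set, an HMAC digest is used
    key => "dedup"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "deduplicated_index"
    # identical content -> identical fingerprint -> same _id -> deduplicated
    document_id => "%{[@metadata][fingerprint]}"
  }
}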
I have written a blog post and code that demonstrates both of these approaches at the following URL: https://alexmarquardt.com/2018/07/23/deduplicating-documents-in-elasticsearch/
Disclaimer: I am a Consulting Engineer at Elastic.
I use Rails, and if necessary I will re-import things with the FORCE=y command, which removes and re-indexes everything for that index and type. However, I'm not sure what environment you are running ES in. The only issue I can see is if the data source you are importing from (i.e. a database) has duplicate records. I guess I would first see whether the data source could be fixed, if that is feasible, and then re-index everything; otherwise you could try to create a custom import method that only indexes one of the duplicate items for each record.
Furthermore, and I know this doesn't address your wish to remove the duplicate entries, but you could simply customize your search so that you only return one of the duplicate ids, either by most recent "timestamp" or by indexing deduplicated data and grouping by your content field (see if this post helps, and see the query sketch after this answer). Even though this would still retain the duplicate records in your index, at least they won't come up in the search results.
I also found this as well: Elasticsearch delete duplicates
I tried thinking of many possible scenarios for you to see if any of those options work or at least could be a temp fix.
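As a hedged illustration of the "only return one of the duplicates" idea, field collapsing can group hits by a keyword field at query time. The index name and the content.keyword sub-field are assumptions (the latter exists only if content was mapped with the default dynamic mapping):

GET your_index/_search
{
  "query": { "match_all": {} },
  "collapse": { "field": "content.keyword" }
}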
Here is a script I created based on Alexander Marquardt's answer.
import hashlib
from elasticsearch import Elasticsearch, helpers

ES_HOST = 'localhost:9200'
es = Elasticsearch([ES_HOST])

def scroll_over_all_docs(index_name='squad_docs'):
    # Map each content hash to the list of document ids that share it
    dict_of_duplicate_docs = {}
    index_docs_count = es.cat.count(index_name, params={"format": "json"})
    total_docs = int(index_docs_count[0]['count'])
    count = 0
    for hit in helpers.scan(es, index=index_name):
        count += 1
        text = hit['_source']['text']
        id = hit['_id']
        hashed_text = hashlib.md5(text.encode('utf-8')).digest()
        dict_of_duplicate_docs.setdefault(hashed_text, []).append(id)
        if count % 100 == 0:
            print(f'Progress: {count} / {total_docs}')
    return dict_of_duplicate_docs

def delete_duplicates(duplicates, index_name='squad_docs'):
    for hash, ids in duplicates.items():
        if len(ids) > 1:
            print(f'Number of docs: {len(ids)}. Number of docs to delete: {len(ids) - 1}')
            # Keep the first id and delete every other document with the same hash
            for id in ids:
                if id == ids[0]:
                    continue
                res = es.delete(index=index_name, doc_type='_doc', id=id)
                id_deleted = res['_id']
                results = res['result']
                print(f'Document id {id_deleted} status: {results}')
            reminder_doc = es.get(index=index_name, doc_type='_all', id=ids[0])
            print('Remaining document:')
            print(reminder_doc)

def main():
    dict_of_duplicate_docs = scroll_over_all_docs()
    delete_duplicates(dict_of_duplicate_docs)

if __name__ == "__main__":
    main()

RethinkDB: removing item from array in one table by value from another table

I use RethinkDB in my project and have the following table structure:
data_item {
  id: "generated_thing",
  slug: "slug"
}
aggregation_of_data_items {
  items: ["some", "ids", "from", "data_item", "table"]
}
When I delete an item from the content (data_item) table, I want to keep the data consistent, i.e. delete its ID from the aggregation_of_data_items.items array. Is there any way to do this in one request (something like $pull or $pullAll in MongoDB)?
To delete an item from an array you can do the following (this is in Python but it works in any supported language):
def remove(doc, value):
    return doc.replace(
        lambda doc: doc.merge({"items": doc["items"].set_difference([value])}))
Now we just need to run a query that does both; the easiest way is to put both operations in an array and run it as a single expression:
[r.table("data_item").get(id).delete(),
remove(r.table("aggregation_of_..").get(...), id)]
.run()
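For reference, a roughly equivalent sketch with the JavaScript driver; aggId (the key of the aggregation document) and conn are assumed to exist:

r.expr([
  r.table('data_item').get(id).delete(),
  r.table('aggregation_of_data_items').get(aggId).update(function(doc) {
    // remove the deleted id from the items array
    return { items: doc('items').setDifference([id]) };
  })
]).run(conn);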

RethinkDB index for filter + orderby

Let's say a comments table has the following structure:
id | author | timestamp | body
I want to use an index to efficiently execute the following query:
r.table('comments').getAll("me", {index: "author"}).orderBy('timestamp').run(conn, callback)
Is there another efficient method I can use?
It looks like an index is currently not supported for a filtered result of a table. When I create an index for timestamp and add it as a hint in orderBy('timestamp', {index: 'timestamp'}), I get the following error:
RqlRuntimeError: Indexed order_by can only be performed on a TABLE. in:
This can be accomplished with a compound index on the "author" and "timestamp" fields. You can create such an index like so:
r.table("comments").index_create("author_timestamp", lambda x: [x["author"], x["timestamp"]])
Then you can use it to perform the query like so:
r.table("comments")
.between(["me", r.minval], ["me", r.maxval]
.order_by(index="author_timestamp)
The between works like the get_all did in your original query because it gets only documents that have the author "me" and any timestamp. Then we do an order_by on the same index, which orders by the timestamp (since all of the keys have the same author). The key here is that you can only use one index per table access, so we need to cram all of this information into the same index.
It's currently not possible to chain a getAll with an orderBy using indexes twice.
Ordering with an index can be done only on a table right now.
NB: The command to orderBy with an index is orderBy({index: 'timestamp'}) (no need to repeat the key)
The answer by Joe Doliner was selected but it seems wrong to me.
First, in the between command, no index was specified, therefore between will use the primary index.
Second, between returns a selection:
table.between(lowerKey, upperKey[, {index: 'id', leftBound: 'closed', rightBound: 'open'}]) → selection
and orderBy cannot run on a selection with an index; only a table can use an index.
table.orderBy([key1...], {index: index_name}) → selection<stream>
selection.orderBy(key1, [key2...]) → selection<array>
sequence.orderBy(key1, [key2...]) → array
You want to create what's called a "compound index." After that, you can query it efficiently.
//create compound index
r.table('comments')
.indexCreate(
'author__timestamp', [r.row("author"), r.row("timestamp")]
)
//the query
r.table('comments')
.between(
['me', r.minval],
['me', r.maxval],
{index: 'author__timestamp'}
)
.orderBy({index: r.desc('author__timestamp')}) //or "r.asc"
.skip(0) //pagi
.limit(10) //nation!
I like using two underscores for compound indexes. It's just stylistic. Doesn't matter how you choose to name your compound index.
Reference: How to use getall with orderby in RethinkDB

Resources