When should I use CREATE and when MERGE in Cypher queries? - memgraphdb

I've seen that sometimes CREATE is used to create nodes, and in other situations, MERGE is used. What's the difference, and when should one be used in place of another?

CREATE does just what it says. It creates, and if that means creating duplicates, well then it creates.
MERGE does the same thing as CREATE, but also checks to see if a node already exists with the properties you specify. If it does, then it doesn't create. This helps avoid duplicates.
Here's an example: I use CREATE twice to create a person with the same name.
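For instance, a minimal sketch (the Person label and name property are made up here, since the original example isn't shown):

CREATE (:Person {name: 'Alice'});
CREATE (:Person {name: 'Alice'});
// MATCH (n:Person {name: 'Alice'}) RETURN n; now returns two separate nodes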

CREATE should be used when you are absolutely certain that the information doesn't already exist in the database (for example, when you are loading data). MERGE is used whenever there is a possibility that the node or relationship already exists and you don't want to duplicate it. MERGE shouldn't be used everywhere, though, as it's considerably slower than the CREATE clause.
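The MERGE counterpart, using the same made-up label and property as the sketch above:

MERGE (:Person {name: 'Alice'});
MERGE (:Person {name: 'Alice'});
// Only one Alice node exists afterwards; the second MERGE matches the node created by the first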

Related

Running code when a model is deleted via cascade/bulk deletion

I have a table that basically represents uploads, so when an instance of the model representing this table is deleted, I want the file it represents to be deleted from my uploads folder.
The way I've gone about this thus far is basically overriding the delete method, so that, before the model instance is deleted, the file will be as well.
Problem: not only does this not work for cascade deletions, it also doesn't work if I delete a Collection....
I've looked at Events, like Model::deleting, but they suffer from exactly the same problem (namely they're not triggered by cascade deletions or bulk deletions).
I have also considered a SQL trigger, but it doesn't seem like I can delete files from SQL (inform me if I can, I'd love it! I'm using MySQL, btw).
Do I have an option that is classier than just making a separate query and iterating over it deleting the files every time I need to do a bulk deletion/cascade, or is this really it?
Take a look at https://laravel-news.com/laravel-model-events-getting-started
You need to define an event inside your model.
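A minimal sketch of what that can look like, assuming a hypothetical Upload model with a path attribute (note that, as the question points out, model events only fire for deletes that go through an Eloquent model instance, not for bulk query deletes or database-level cascades):

<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Model;
use Illuminate\Support\Facades\Storage;

class Upload extends Model
{
    protected static function boot()
    {
        parent::boot();

        // Fires before an Upload instance is deleted via Eloquent,
        // but not for bulk deletes or cascade deletes done in the database
        static::deleting(function (Upload $upload) {
            Storage::delete($upload->path);
        });
    }
}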

Check and put in Hbase

I have a case in which we need to insert records into an HBase table, where 90% of the records coming from the source are repeated. In this case,
is it advisable to first query HBase for the record and, if it is not present, then call put,
or
just simply call put?
Which of the above is better in terms of performance?
Both HTable methods checkAndPut() and exists() require access to the table data, which could hurt you badly if you receive lots of write requests and the data is not in the memstore.
Plain writes in HBase are usually not so expensive, so if you have a good rowKey design and you're already avoiding hot regions, I'd just stick to overwriting the data.
If you don't want to re-insert existing records you can use the checkAndPut method of HTable. With this, the put will be applied only if the condition you specify is met, so you could check for the existence of a column and put only if it doesn't exist yet.
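A rough sketch using the older HTable client API (the table, family, and qualifier names are made up; newer clients use Table and addColumn, so treat this as illustrative):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class CheckAndPutExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "my_table");

        byte[] row = Bytes.toBytes("row-key-1");
        byte[] family = Bytes.toBytes("cf");
        byte[] qualifier = Bytes.toBytes("col");

        Put put = new Put(row);
        put.add(family, qualifier, Bytes.toBytes("value"));

        // Apply the put only if the column does not exist yet (expected value = null)
        boolean applied = table.checkAndPut(row, family, qualifier, null, put);
        System.out.println("Row written: " + applied);

        table.close();
    }
}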
I kind of agree with both answers. It is true that before using the CAS (Check And Set) mechanism, one has to revisit the design first and see if it is possible to refactor it to use plain writes instead. However, in some cases this is not trivial.
Another thing I would make sure of before using checkAndPut() is that this operation requires isolation when updating values; HBase only guarantees it when rewriting, not when updating.
And lastly, check whether it is possible to use Append instead of checkAndPut.

Make row in a table read only on oracle?

I have a table with many rows.
For testing purposes my colleagues are also using the same table. The problem is that sometimes they delete the rows I was testing with, and sometimes I delete theirs.
So is there any way in Oracle to make some specific rows read-only, so that others cannot delete or edit them?
Thanks.
There are a number of different ways to tackle this problem.
As Sun Tzu said, the best thing would be if you and your colleagues use data sets which do not collide.
For instance perhaps you could each have your own database instance, on local PCs; whether this will suit depends on a number of factors, not the least of which is your licensing arrangements with Oracle. Alternatively, you could have separate schemas in a shared database; depending on your application you may need to use synonyms or special connections.
Another approach: everybody builds their own data sets, known as test fixtures. This is a good policy, because testing is only truly valid when it runs against a known state; if we make assumptions regarding the presence or absence of data how valid are our test results? The point is, the tests should clean up after themselves, removing any data created in fixtures and by the running of tests. With this tactic you need to agree ranges of IDs for each team member: they must only use records within their ranges for testing or development work.
I prefer these sorts of approach because they don't really change the way the application works (arguably except using different schemas and synonyms). More draconian methods are available.
If you have Enterprise Edition you can use Row Level Security to protect your records. This is an extension of the last point: you will need a mechanism for identifying your records, and some infrastructure to identify ownership within the session. But in addition to preventing other users from deleting your data you can also prevent them inserting, updating or even viewing records which are within your range of IDs. Find out more.
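A rough sketch of such a policy, assuming a hypothetical MY_TABLE with an OWNER_ID column holding the database user who "owns" each row (the schema, table, column, and policy names here are all made up):

-- Policy function: only rows owned by the current session user are affected
CREATE OR REPLACE FUNCTION protect_my_rows (p_schema IN VARCHAR2, p_table IN VARCHAR2)
  RETURN VARCHAR2
IS
BEGIN
  RETURN 'owner_id = SYS_CONTEXT(''USERENV'', ''SESSION_USER'')';
END;
/

-- Attach the predicate to UPDATE and DELETE statements against MY_TABLE
BEGIN
  DBMS_RLS.ADD_POLICY(
    object_schema   => 'APP',
    object_name     => 'MY_TABLE',
    policy_name     => 'MY_TABLE_OWNER_ONLY',
    function_schema => 'APP',
    policy_function => 'PROTECT_MY_ROWS',
    statement_types => 'UPDATE, DELETE');
END;
/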
A lighter solution is to use a trigger, as A B Cade suggests. You will still need to identify your records and who is connected (because presumably from time to time you will still want to delete your own records).
One last strategy: take your ball home. Get the table in the state you want it and make a data pump export. For extra vindictiveness you can truncate the table at this point. Then any time you want to use the table you run a data pump import. This will reset the table's state, wiping out any existing data. This is just an extreme version of test scripts creating their own data.
You can create a trigger that prevents deleting some specific rows.
CREATE OR REPLACE TRIGGER trg_dont_delete
  BEFORE DELETE
  ON <your_table_name>
  FOR EACH ROW
BEGIN
  IF :OLD.ID IN (<IDs of rows you don't want to be deleted>) THEN
    raise_application_error(-20001, 'Do not delete my records!!!');
  END IF;
END;
/
Of course you can make it smarter: make the IF statement depend on the connected user, or get the record IDs from another table, and so on.
Oracle supports row-level locking. You can prevent others from deleting the rows that you are using. To learn more, check this link.
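A sketch of what that looks like (the table name and IDs are hypothetical); note the lock is held only until your transaction ends:

-- In your session: lock the rows you are testing with
SELECT *
  FROM my_table
 WHERE id IN (101, 102)
   FOR UPDATE;

-- Another session's DELETE or UPDATE of those rows now blocks
-- until you COMMIT or ROLLBACK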

Building Oracle DB; Good Directory Layout

I'm looking for advice on how to best organize a new Oracle schema and dependent files in my project directory - with the sequences, triggers, DDL, etc. I've been using one monolithic file called schema.sql for some time, but I'm wondering if there's a best practice? Something like...
database/
  tables/
    person.sql
    group.sql
  sequences/
    person.sequence
    group.sequence
  triggers/
    new_person.trigger
Penny for your thoughts or a URL that I may have missed!
Thank you!
Storing DDL by object type is a reasonable approach-- anything is likely to be easier to navigate than a monolithic SQL script. Personally, though, I'd much rather have DDL organized by function. If you're building an accounting system, for example, you probably have a series of objects to manage accounts payable and a separate set of objects to manage accounts receivable along with some core objects for managing the general ledger accounts. That would lead to something along the lines of
database/
  general_ledger/
    tables/
    packages/
    sequences/
  accounts_receivable/
    tables/
    packages/
    sequences/
  accounts_payable/
    tables/
    packages/
    sequences/
As the system gets more complex, that hierarchy would naturally get deeper over time. This sort of approach would more naturally mirror the way non-database code is stored in source control. You wouldn't have a single directory of Java classes in a directory structure like
middle_tier/
  java/
    Foo.java
    Bar.java
You would organize the classes that implement the same sorts of business logic together and separate from the classes that implement different bits of business logic.
One item to consider is those SQLs which can act as 'latest only' scripts. These include CREATE OR REPLACE PROCEDURE/FUNCTION/TRIGGER etc. You run the latest version and you are not worried about what may have previously existed in the database.
On the other hand you have tables where you may start off with a CREATE TABLE followed by several ALTER TABLEs as changes to the schema evolve. And if you are doing an upgrade you may want to apply several of the ALTER TABLE scripts (preferably in order).
I'd argue against a 'functional grouping' unless it is really obvious where the lines are drawn. You probably don't want to be in a position where you have a USERS table in one group and a USER_AUTHORITIES in another and an AUTHORITY group in a third.
If you do have decent separation, then they are probably in separate schemas and you do want to keep schemas distinct (since you can have the same object names in different schemas).
The division-by-object-type arrangement, with the addition of a "schema" directory below the database directory works well for me.
I've worked with source control systems that have the additional division-by-function layer - if there are many objects it adds additional searching if you're trying to cross-reference the source control file with the object that you see in a database GUI navigator that generally groups objects by type. It's also not always clear how an object should be classified this way.
Consider adding a "grants" directory for the grants made by that schema to other schemas or roles, with one file per grantee. If you have "rule-based" grants such as "the APPLICATION_USER role always gets SELECT on all of schema X's tables", then write a PL/SQL anonymous block to perform this action. (You might be tempted to reverse-engineer the grants after they get put in place by some ad-hoc method, but it's easy to miss something when new tables or views are added to the application).
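For instance, a sketch of such an anonymous block for the "SELECT on all of schema X's tables" rule mentioned above (run as schema X; the APPLICATION_USER role name comes from that example):

BEGIN
  FOR t IN (SELECT table_name FROM user_tables) LOOP
    EXECUTE IMMEDIATE 'GRANT SELECT ON ' || t.table_name || ' TO application_user';
  END LOOP;
END;
/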
Standardize on a delimiter for all scripts and you'll make your life easier if you start deploying through a build utility such as Ant. Using "/" (vs. ";") works for both SQL statements as well as PL/SQL anonymous blocks.
In our projects we use somewhat combined approach: we have a core of our program as a root and other functionalities in subfolders:
root/
  plugins/
    auth/
    mail/
    report/
    etc.
In all these folders we have both DDL and DML scripts; almost all of them can be run more than once, e.g. all packages are defined as create or replace..., all data insertion scripts check whether the data already exists, and so on. This gives us the opportunity to run almost all scripts without worrying that we might break something.
Obviously this scenario can't be applied to create table and similar statements. For these scripts we have manually written a small bash script that extracts the specified files and runs them without failing on particular ORA errors, like ORA-00955: name is already used by an existing object.
Also, all files are mixed within the directories but differ by extension: .seq goes for a sequence, .tbl for a table, .pkg for a package interface, .bdy for a package body, .trg for a trigger, and so on.
Also we have a naming convention denoting prefixes for all of our files: we can have a cl_oper.tbl table with cl_oper.seq and cl_oper.trg for its sequence and trigger, and cl_oper_processing.pkg together with cl_oper_processing.bdy holding the logic for the mentioned objects. With this naming convention it's very easy to see in a file manager all the files connected with some unit of logic in our project (whereas grouping in directories by object type does not provide this).
Hope this information helps you somehow. Please leave comments if you have any questions.

Thread-safe unique entity instance in Core Data

I have a Message entity that has a messageID property. I'd like to ensure that there's only ever one instance of a Message entity with a given messageID. In SQL, I'd just add a unique constraint to the messageID column, but I don't know how to do this with Core Data. I don't believe it can be done in the data model itself, so how do you go about it?
My initial thought is to use a validation method to do a fetch on the NSManagedObject's context for the ID, see if it finds anything but itself, and if so, fails the validation. I suspect this will work - but I'm worried about the performance of something like that. I went through a lot of effort to minimize the fetch requests needed for the entire import routine, and having it validate by performing a fetch for every single new message entity seems a bit excessive. I can get all pre-existing objects I need and identify all the new objects I need to insert into the store using just two fetch queries before I do the actual work of importing and connecting everything together. This would add a fetch to every single update or insert in addition to those two - which would seem to eliminate any performance advantage I had by pre-processing the import data in the first place!
The main reason this is an issue is that the importer can (potentially) run several batches concurrently on several threads and may include some overlapping/duplicate data that needs to ultimately result in just one object in the store and not duplicate entries. Is there a reasonable way to do this and does what I'm asking for make sense for Core Data?
The only way to guarantee uniqueness is to do a fetch. Fortunately you can just do a -countForFetchRequest:error: and check to see if it is zero or not. That is the least expensive way to guarantee uniqueness at this time.
You can probably accomplish this in the validation or run it in the loop that is processing the data. Personally I would do it before creating the NSManagedObject so that you do not incur unnecessary allocations when a record already exists.
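A rough sketch of that check as it might sit inside the import loop (the Message entity and messageID attribute come from the question; the context variable and error handling are simplified):

NSFetchRequest *request = [[NSFetchRequest alloc] init];
request.entity = [NSEntityDescription entityForName:@"Message"
                               inManagedObjectContext:context];
request.predicate = [NSPredicate predicateWithFormat:@"messageID == %@", messageID];

NSError *error = nil;
NSUInteger count = [context countForFetchRequest:request error:&error];
if (count == 0) {
    // No Message with this messageID exists yet, so it is safe to insert one
    NSManagedObject *message = [NSEntityDescription insertNewObjectForEntityForName:@"Message"
                                                              inManagedObjectContext:context];
    [message setValue:messageID forKey:@"messageID"];
}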
I don't think there is a way to easily guarantee an attribute is unique without doing a lot of work on your own. You can, of course use CFUUIDCreate to create a globally unique UUID, which should be unique, even in a multithreaded environment. But...
The objectID (type NSManagedObjectID) of all managed objects is guaranteed to be unique within the persistent store coordinator. Since you can add arbitrarily many persistent stores to the coordinator, this guarantee basically guarantees that the objectIDs are globally unique. Why don't you use the objectID as your messageID? You can't, of course, change the objectID once it's assigned (and it won't get assigned until the context containing the inserted object is saved; until then it will be a temporary but still unique ID).
So you have an NSManagedObjectContext for each thread, backed by the same persistent store, is that correct? And before you save the NSManagedObjectContext, you'd like to make sure the messageID is unique, that is, that you are not updating an existing row, and that it is not in one of the other contexts, correct?
Given that model (correct me if I misunderstand), I think you'd be better served having one object that manages access to the persistent store. That way, all threads would update one context and you can do your validation in there, using Marcus's -countForFetchRequest:error: suggestion. Granted, that places a bottleneck on this operation.
Just to add my 2 cents: I think inconsistencies will occur sooner or later anyway, and the only way to mitigate them seems to be to do it on an application-level with rather complex code.
So in my case I decided to allow duplicate values for what are supposed to be "unique" fields.
I added code, however, that detects these problems later (e.g. when a fetch that should return 1 object returns more than 1) and fixes them when they occur (usually by deleting).
It's a "go ahead, make a mistake, I'll fix it later for you" strategy.
This is not ideal, of course, but a valid way to attack this problem, imho.
