Improve Neo4j Cypher query performance

USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM "file:///sample.csv"
AS row WITH row
MATCH (server:Server { name:row.`SOURCE_HOST`, source:'sample' })
MATCH (server__1:Server__1 { name:row.`TARGET_HOST`, source:'sample' })
MERGE (server)-[:DEPENDS_ON]->(server__1)
This query is taking a very long time to create the relationships.
There are 1,875 Server and Server__1 nodes each.
Thanks in advance.

The best approach to defining a graph schema is to have a unique id for each entity in your graph, so that you can look up nodes by that id. You can then define a unique constraint on that property, which will speed up query execution. For example, if the name property were enough to identify a Server node, you could first define a unique constraint:
CREATE CONSTRAINT constraint_name
ON (s:Server) ASSERT s.name IS UNIQUE;
Then, when you search for the server by that property, the query will perform much better:
MATCH (server:Server {name:row.`SOURCE_HOST`})
This is the best approach as far as I have seen. If you don't have a unique property, you could create one by combining two properties, in your case name + source. If that is not an option for you, you can create an index on each of the name and source properties, or a composite index over both; in the Enterprise edition you can also add a node key constraint on the two properties to enforce uniqueness of the combination.
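For illustration, here is a minimal sketch of the same setup driven from the Neo4j Python driver, using the 3.x-era schema syntax to match the statements above; the bolt URI and credentials are assumptions, and the composite index statements are an optional alternative to the single-property constraint:
# Sketch: create the schema objects first, then run the load so the MATCHes hit an index.
# Assumes a local Neo4j reachable at bolt://localhost:7687 with the credentials below.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Unique constraint on Server.name (also creates a backing index).
    session.run("CREATE CONSTRAINT ON (s:Server) ASSERT s.name IS UNIQUE")
    # Or composite indexes on both lookup properties (3.4+ syntax).
    session.run("CREATE INDEX ON :Server(name, source)")
    session.run("CREATE INDEX ON :Server__1(name, source)")

    # The load itself is unchanged; it runs as an auto-commit query so that
    # USING PERIODIC COMMIT is allowed.
    session.run("""
        USING PERIODIC COMMIT
        LOAD CSV WITH HEADERS FROM "file:///sample.csv" AS row
        MATCH (server:Server { name: row.`SOURCE_HOST`, source: 'sample' })
        MATCH (server__1:Server__1 { name: row.`TARGET_HOST`, source: 'sample' })
        MERGE (server)-[:DEPENDS_ON]->(server__1)
    """)

driver.close()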

Related

Inject a list into SpringData to be used as a kind of virtual database table

I have a database table on which a sorted query needs to be done.
To do the sorting, a join on another table is required. The problem is that this other table does not exist in the database, because we read the required data from a CSV file at the service's startup and keep it as an in-memory list.
Is it possible to somehow inject this list as a kind of virtual database table into Spring Data, so that it could use this list to do the required join and sorting?
As far as I know, the only other options I have would be to create a real database table from this in-memory list or load the whole table and do the sorting in the service itself.
You can add a special order by expression through e.g. Spring Data Specification, but that is going to be very ugly. In HQL it looks like this:
case rootAlias.attribute when 'value1' then 1 when 'value2' then 2 ... else null end
which will return some integer value by which you can sort ascending or descending, based on the mapping you have.
Even if you have lots of values, I would rather recommend you don't do a join at all, and instead try to make this attribute of your main table sortable, so that you don't need this mapping. You could, for example, create a trigger that maintains a column based on the mapping, which can then be used for sorting directly. If you do all your changes through JPA/Hibernate, you could also use a @PreUpdate/@PrePersist listener to handle the maintenance of this column.
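The Spring/HQL version is what I'd use, but if it helps to see the mechanics of that CASE-driven sort in isolation, here is a small standalone sketch in Python with SQLite; the table, columns and the priority mapping are made-up stand-ins for the CSV-backed list:
# Sketch of the CASE-based sort, using SQLite purely for illustration.
# "priority_by_code" stands in for the in-memory list loaded from the CSV.
import sqlite3

priority_by_code = {"GOLD": 1, "SILVER": 2, "BRONZE": 3}   # hypothetical CSV data

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, code TEXT)")
conn.executemany("INSERT INTO customer (name, code) VALUES (?, ?)",
                 [("Ann", "SILVER"), ("Bob", "GOLD"), ("Cid", "BRONZE")])

# Build "CASE code WHEN ? THEN ? ... ELSE NULL END" from the in-memory mapping.
whens = " ".join("WHEN ? THEN ?" for _ in priority_by_code)
params = [p for item in priority_by_code.items() for p in item]
order_expr = f"CASE code {whens} ELSE NULL END"

rows = conn.execute(
    f"SELECT name, code FROM customer ORDER BY {order_expr} ASC", params
).fetchall()
print(rows)   # [('Bob', 'GOLD'), ('Ann', 'SILVER'), ('Cid', 'BRONZE')]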

Recursive database viewing

I have this situation: starting from a table, I have to check all the records that match a key. If records are found, I have to check another table using a key from the first table, and so on, across more or less five levels. Is there a way to do this recursively, or do I have to write all the code "by hand"? The language I am using is Visual FoxPro. If this is not possible, is it at least possible to use recursion to populate a treeview?
You can set a relation between tables. For example:
USE table_1.dbf IN 0 SHARED
USE table_2.dbf IN 0 SHARED
SET ORDER TO TAG key_field OF table_2.cdx IN table_2
SET RELATION TO key_field INTO table_2 ADDITIVE IN table_1
The first two commands open table_1 and table_2. Then you have to set the order/index of table_2; if you don't have an index on the key field, this will not work. The final command sets the relation between the two tables on the key field.
From here you can browse both tables and table_2's records will be filtered based on table_1's key field. Hope this helps.
If the tables have similar structure or you only need to look at a few fields, you could write a recursive routine that receives the name of the table, the key to check, and perhaps the fields you need to check as parameters. The tricky part, I guess, is knowing what to pass down to the next call.
I don't think I can offer any more advice without at least seeing some table structures.
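Since the question is more about the shape of the recursion than about VFP syntax, here is a rough Python sketch of such a routine; the tables are faked as in-memory lists of dicts, and the mapping of which key leads to which next table is entirely made up:
# Sketch of the recursion over ~5 levels; tables are faked as in-memory lists of dicts.
# NEXT_TABLE encodes which field in the current table leads to which next table.
TABLES = {
    "orders":    [{"id": 1, "customer_id": 10}],
    "customers": [{"id": 10, "region_id": 7}],
    "regions":   [{"id": 7}],
}
NEXT_TABLE = {                       # current table -> (field to follow, next table)
    "orders":    ("customer_id", "customers"),
    "customers": ("region_id", "regions"),
}

def walk(table, key, depth=0, max_depth=5):
    matches = [r for r in TABLES[table] if r["id"] == key]
    for rec in matches:
        print("  " * depth + f"{table}: {rec}")
        nxt = NEXT_TABLE.get(table)
        if nxt and depth < max_depth:
            field, next_table = nxt
            walk(next_table, rec[field], depth + 1, max_depth)

walk("orders", 1)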
Sorry for answering so late, but the problem was, of course, that recursion wasn't a viable solution, since I had to search inside multiple tables. So I resolved it by doing a simple two-level search in the tables that I needed.
Thank you very much for the help, and sorry again for answering so late.

Queries in DynamoDB

I have an application written in Node.js that needs to find ONE row based on a city name (this could just be the table's name; different cities will be categorized as different tables) and a field named "currentJobLoads", which is a number. For example, a user might want to find ONE row with the city name "Chicago" and the lowest currentJobLoads. How can I achieve this in DynamoDB without scan operations (since a scan would be slower and can only read so much data before it gets terminated)? Any suggestions would be highly appreciated.
You didn't specify what your current partition key and sort key for the table are, but I'm guessing the currentJobLoads field isn't one of them. So you would need to create a Global Secondary Index on the currentJobLoads field, at which point you will be able to run query operations against that field.
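As a sketch of what that query could look like (shown with Python's boto3 for brevity; the same parameters exist in the Node.js SDK), assuming the per-city table has a GSI whose partition key is a constant "bucket" attribute written on every item and whose sort key is currentJobLoads; all of these names are assumptions:
# Sketch: fetch the single row with the lowest currentJobLoads via a GSI.
# Assumes table "Chicago" has a GSI "currentJobLoads-index" with partition key
# "bucket" (a constant value on every item) and sort key "currentJobLoads".
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Chicago")

resp = table.query(
    IndexName="currentJobLoads-index",
    KeyConditionExpression=Key("bucket").eq("ALL"),
    ScanIndexForward=True,   # ascending by the sort key, i.e. lowest load first
    Limit=1,                 # only the single best row
)
lowest = resp["Items"][0] if resp["Items"] else None
print(lowest)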

Enumerate indexes on an Extensible Storage Engine (ESENT) table

Background
I'm writing an adapter for ESE to .NET and LINQ in a Google Code project called eselinq. One important function I can't seem to figure out is how to get a list of indexes defined for a table. I need to be able to list available indexes so the LINQ part can automatically determine when indexes can be used. This will allow much more efficient plans for user queries if appropriate indexes can be found.
There are two related functions for querying index information:
JetGetTableIndexInfo - get index information by tableID
JetGetIndexInfo - get index information by tableName
These only differ in how the related table is specified (name or tableid). It sounds like these would support the function I want but all the info levels seem to require that I already have a certain index to query information for. The only exception is JET_IdxInfoCount, but that only counts how many indexes are present.
JET_IdxInfo with its JET_INDEXLIST sounds plausible but it only lists the columns on a specific index.
Alternatives
I am aware that I could get the index information another way, like annotations on .NET types corresponding to database tables, or by requiring an index mapping to be provided ahead of time. I think there's enough introspection implemented to make everything else work out of the box without the user supplying extra information, except for this one function.
Another option may be to examine the system tables to find the related index objects, but this would mean depending on an undocumented interface.
To satisfy this question, I want a supported method of enumerating the indexes (just the name would be sufficient) on a table.
You are correct about JetGetTableIndexInfo, JetGetIndexInfo and JET_IdxInfo. The twist is that the data is returned in a somewhat complex way: a temporary table is returned containing a row for the index and then a row for each column in the index. To get just the index names you will need to skip the column rows (the column count is given by the value of the columnidcColumn column in the first row).
For a .NET example of how to decipher this, look at the ManagedEsent project. In the MetaDataHelpers.cs file there is a method called GetIndexInfoFromIndexlist that extracts all the data from the temporary table.

How do I implement threaded comments?

I am developing a web application that needs to support threaded comments. I need the ability to rearrange the comments based on the number of votes received (identical to how threaded comments work on reddit).
I would love to hear input from the SO community on how to do it.
How should I design the comments table?
Here is the structure I am using now:
Comment
id
parent_post
parent_comment
author
points
What changes should be done to this structure?
How should I get the details from this table to display them in the correct manner?
(Implementation in any language is welcome. I just want to know how to do it in the best possible manner)
What do I need to take care of while implementing this feature so that there is less load on the CPU/database?
Thanks in advance.
Storing trees in a database is a subject which has many different solutions. It depends on if you want to retrieve a subhierarchy as well (so all children of item X) or if you just want to grab the entire set of hierarchies and build the tree in an O(n) way in memory using a dictionary.
Your table has the advantage that you can fetch all comments on a post in 1 go, by filtering on the parentpost. As you've defined the comment's parent in the textbook/naive way, you have to build the tree in memory (see below). If you want to obtain the tree from the DB, you need a different way to store a tree:
See my description of a pre-calc based approach here:
http://www.llblgen.com/tinyforum/GotoMessage.aspx?MessageID=17746&ThreadID=3208
or by using balanced trees as described by CELKO, or yet another approach:
http://www.sqlteam.com/article/more-trees-hierarchies-in-sql
If you fetch everything in a hierarchy in memory and build the tree there, it can be more efficient due to the fact that the query is pretty simple: select .. from Comment where ParentPost = #id ORDER BY ParentComment ASC
After that query, you build the tree in memory with just one dictionary which keeps track of the tuple CommentID - Comment. You now walk through the resultset and build the tree on the fly: for every comment you run into, you can look up its parent comment in the dictionary and then store the currently processed comment in that dictionary as well.
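A compact sketch of that single pass in Python (field names follow the table in the question; parent_comment is assumed to be None for top-level comments, and rows are assumed to arrive with parents before their children):
# Sketch: rebuild the comment tree from the flat resultset in one pass.
def build_tree(rows):
    by_id = {}                      # comment id -> node (the single dictionary)
    roots = []
    for row in rows:                # rows ordered so parents appear before children
        node = dict(row, children=[])
        by_id[node["id"]] = node
        parent = by_id.get(node["parent_comment"])
        if parent is not None:
            parent["children"].append(node)
        else:
            roots.append(node)
    return roots

rows = [
    {"id": 1, "parent_post": 42, "parent_comment": None, "author": "a", "points": 10},
    {"id": 2, "parent_post": 42, "parent_comment": 1,    "author": "b", "points": 3},
    {"id": 3, "parent_post": 42, "parent_comment": 1,    "author": "c", "points": 7},
]
print(build_tree(rows))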
Couple things to also consider...
1) When you say "sort like reddit" based on rank or date, do you mean the top level or the whole thing?
2) When you delete a node, what happens to the branches? Do you re-parent them? In my implementation, I'm thinking that the editors will decide: either hide the node and display it as "comment hidden" along with the visible children, hide the comment and its children, or nuke the whole tree. Re-parenting should be easy (just set the children's parent to the deleted node's parent), but anything involving the whole tree seems to be tricky to implement in the database.
I've been looking at the ltree module for PostgreSQL. It should make database operations involving parts of the tree a bit faster. It basically lets you set up a field in the table that looks like:
ltreetest=# select path from test where path <# 'Top.Science';
path
------------------------------------
Top.Science
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
However, it doesn't ensure any kind of referential integrity on its own. In other words, you can have a record for "Top.Science.Astronomy" without having a record for "Top.Science" or "Top". But what it does let you do is stuff like:
-- hide the children of Top.Science
UPDATE test SET hide_me=true WHERE path #> 'Top.Science';
or
-- nuke the cosmology branch
DELETE FROM test WHERE path #> 'Top.Science.Cosmology';
If combined with the traditional "comment_id"/"parent_id" approach using stored procedures, I'm thinking you can get the best of both worlds. You can quickly traverse the comment tree in the database using your "path" and still ensure referential integrity via "comment_id"/"parent_id". I'm envisioning something like:
CREATE TABLE comments (
comment_id SERIAL PRIMARY KEY,
parent_comment_id int REFERENCES comments(comment_id) ON UPDATE CASCADE ON DELETE CASCADE,
thread_id int NOT NULL REFERENCES threads(thread_id) ON UPDATE CASCADE ON DELETE CASCADE,
path ltree NOT NULL,
comment_body text NOT NULL,
hide boolean not null default false
);
The path string for a comment would look like:
<thread_id>.<parent_id_#1>.<parent_id_#2>.<parent_id_#3>.<my_comment_id>
Thus a root comment of thread "102" with a comment_id of "1" would have a path of:
102.1
And a child whose comment_id is "3" would be:
102.1.3
Some children of "3" with ids of "31" and "54" would be:
102.1.3.31
102.1.3.54
To hide the node "3" and its kids, you'd issue this:
UPDATE comments SET hide=true WHERE path #> '102.1.3';
I dunno though--it might add needless overhead. Plus I don't know how well maintained ltree is.
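To make the path bookkeeping concrete, here is a rough sketch of an insert that derives a new comment's path from its parent, written in Python with psycopg2 against the comments table above; the connection string and the two-step insert-then-update are assumptions (a trigger or a single INSERT ... SELECT would work just as well):
# Sketch: insert a reply and compute its ltree path from the parent's path.
# Assumes the "comments" table above and a PostgreSQL with the ltree extension.
import psycopg2

conn = psycopg2.connect("dbname=forum")

def add_comment(thread_id, parent_comment_id, body):
    with conn, conn.cursor() as cur:
        # Insert first so we have the new comment_id to append to the path;
        # the thread_id is used as a temporary placeholder path.
        cur.execute(
            "INSERT INTO comments (parent_comment_id, thread_id, path, comment_body) "
            "VALUES (%s, %s, %s::ltree, %s) RETURNING comment_id",
            (parent_comment_id, thread_id, str(thread_id), body),
        )
        comment_id = cur.fetchone()[0]
        if parent_comment_id is None:
            new_path = f"{thread_id}.{comment_id}"           # e.g. 102.1
        else:
            cur.execute("SELECT path FROM comments WHERE comment_id = %s",
                        (parent_comment_id,))
            new_path = f"{cur.fetchone()[0]}.{comment_id}"   # e.g. 102.1.3
        cur.execute("UPDATE comments SET path = %s::ltree WHERE comment_id = %s",
                    (new_path, comment_id))
        return comment_id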
Your current design is basically fine for small hierarchies (fewer than a thousand items).
If you want to fetch at a certain level or depth, add a 'level' field to your structure and compute it as part of the save.
If performance is an issue, use a decent cache.
I'd add the following new fields to the above table:
thread_id: identifier for all comments attached to a specific object
date: the comment date (allows fetching the comments in order)
rank: the comment rank (allows fetching the comment order by ranking)
Using these fields you'll be able to:
fetch all comments in a thread in a single op
order comments in a thread either by date or rank
Unfortunately, if you want to keep your DB queries close to the SQL standard, you'll have to recreate the tree in memory. Some DBs offer special queries for hierarchical data (e.g. Oracle).
./alex
