Are composite indexes supported in Memgraph? - memgraphdb

Can I create an index on more than one property for a given label? I am running a query like:
MATCH (n:Node) WHERE n.id = xyz AND n.source = abc;
and I want to know whether I can create an index over multiple properties and, if not, whether there is a good way to query for nodes matching multiple properties.

Memgraph does not support composite indexes, but you can create multiple label-property indexes. For example, run
CREATE INDEX ON :Node(id);
CREATE INDEX ON :Node(source);
To check that they were created properly, run SHOW INDEX INFO;.
Use EXPLAIN or PROFILE on the query (see the documentation on inspecting and profiling queries) to compare the available plans and test the performance. A single label-property index may be good enough.
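To illustrate why one label-property index may be enough, here is a minimal Python sketch of the general idea: look up candidates through an index on one property, then filter them on the other. It is a conceptual model only, not how Memgraph is implemented, and the sample nodes and the helper names build_index and find are made up for the illustration.

```python
from collections import defaultdict

# Hypothetical in-memory nodes; the property names mirror the question.
nodes = [
    {"id": 1, "source": "abc"},
    {"id": 1, "source": "def"},
    {"id": 2, "source": "abc"},
]

def build_index(rows, prop):
    """Map each value of `prop` to the positions of the rows that hold it."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[prop]].append(pos)
    return index

id_index = build_index(nodes, "id")

def find(rows, index, id_value, source_value):
    # The index on `id` narrows the candidates; `source` is checked by a plain filter.
    candidates = index.get(id_value, [])
    return [rows[pos] for pos in candidates if rows[pos]["source"] == source_value]

matches = find(nodes, id_index, 1, "abc")
print(matches)
```

If the indexed property is selective, few candidates are left for the filter to check, which is the kind of thing PROFILE can confirm on real data.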

Related

How to create index for relationships when using apoc.algo.dijkstra in neo4j

I want to create appropriate indexes on relationships to improve performance when using the apoc.algo.dijkstra algorithm.
My query looks like this:
MATCH (a:Waypoint {name: 'nameTwo'}), (b:Waypoint{name: 'nameOne'}) CALL apoc.algo.dijkstra(a, b, 'STREET_A>|STREET_B>', 'distance') yield path as path, weight as distance RETURN path, distance;
I want indexes for the relationship types 'STREET_A' and 'STREET_B'.
I tried to create indexes like this one, but it does not seem to make a performance difference:
CREATE INDEX STREET_A_INDEX IF NOT EXISTS FOR ()-[r:STREET_A]-() ON (r.distance)
This is the result with PROFILE:
[Image: query plan]
Is it at all possible to make apoc.algo.dijkstra more performant via an index?
I don't think a relationship index will help in this case. Indexes are used in Neo4j to find the starting point of a traversal - in your case that would be finding the two Waypoint nodes. From the query plan, it looks like you already have an index that is being used to find those nodes.
Neo4j has what's called "index-free adjacency". This means that a traversal from one node to another following a relationship doesn't use an index, rather the operation is more like following pointers - so a relationship index would not be used in your example.
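A rough way to picture this is the Python sketch below. It is a toy model, not Neo4j's storage engine or the APOC implementation: name_index stands in for the index that finds the two Waypoint nodes, and adjacency stands in for the relationships stored with each node. All data and names are hypothetical.

```python
import heapq

# Hypothetical name index, used once to find the start and end nodes.
name_index = {"nameOne": 0, "nameTwo": 1}

# Each node stores its outgoing relationships directly: (target node, distance).
adjacency = {
    0: [],
    1: [(2, 4.0), (3, 1.0)],
    2: [(0, 1.0)],
    3: [(2, 2.0)],
}

def dijkstra(adjacency, start, goal):
    """Return the smallest total distance from start to goal, or None."""
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        # Expansion follows the node's own relationship list; no index lookup.
        for target, weight in adjacency[node]:
            candidate = dist + weight
            if candidate < best.get(target, float("inf")):
                best[target] = candidate
                heapq.heappush(queue, (candidate, target))
    return None

start = name_index["nameTwo"]
goal = name_index["nameOne"]
shortest = dijkstra(adjacency, start, goal)
print(shortest)
```

The index is consulted only for the two lookups at the bottom. Everything inside the loop walks relationship lists, which is why an index on the distance property has nothing to speed up there.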

faster search for a substring through large document

I have a CSV file of more than 1M records written in English plus another language. I have to build a UI that takes a keyword, searches through the document, and returns the records where that keyword appears. I look for the keyword in two columns only.
Here is how I implemented it:
First, I created a Postgres database for the data stored in the CSV file. Then I built a classic website where the user can enter a keyword. This is the SQL query that I use (in Spring Boot):
SELECT * FROM table WHERE col1 LIKE %:keyword% OR col2 LIKE %:keyword%;
Right now it works perfectly fine, but I was wondering how to make the search faster. Was using SQL better than a classic document search?
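If the document is only searched once and then thrown away, loading it into a database is overhead. Instead, you can search the file directly with a parallel stream over the lines returned by Files.lines (java.nio.file), which uses multiple threads to search the file concurrently:
List<Record> result;
// Files.lines takes a Path and should be closed, hence try-with-resources.
try (Stream<String> lines = Files.lines(Paths.get("some/path"))) {
    result = lines
        .parallel()
        .unordered()
        .map(l -> lineToRecord(l))
        .filter(r -> r.getCol1().contains(keyword) || r.getCol2().contains(keyword))
        .collect(Collectors.toList());
}
NOTE: you need to provide the lineToRecord() method and the Record class, import Files, Paths, Stream and Collectors, and handle the IOException that Files.lines can throw.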
If the document is going to be searched over and over again, then you can think about indexing it. This means pre-processing the document to suit the search requirements, which in this case are the keywords of col1 and col2. An index is like a Map in Java, e.g.:
Map<String, Record> col1Index
But since you have "LIKE" semantics, this is not so easy to do: splitting the string on white space is not enough, because the keyword could match a substring. So in this case it might be best to look for a tool to help. Typically this would be something like Solr/Lucene.
Databases can also provide similar functionality, e.g.: https://www.postgresql.org/docs/current/pgtrgm.html
For LIKE queries, you should look at the pg_trgm extension and a GIN index with the gin_trgm_ops operator class. You shouldn't need to change the query at all, just build the index on each column. Or maybe one multi-column index.
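To give a feel for how a trigram index can serve substring searches, here is a small Python sketch of the idea. It is a simplification and not pg_trgm's actual implementation: it ignores the padding pg_trgm applies to words, matches case-insensitively (closer to ILIKE than LIKE), and the sample rows and function names are made up.

```python
from collections import defaultdict

def trigrams(text):
    """Return the set of 3-character substrings of text, lowercased."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}

def build_trigram_index(values):
    """Map each trigram to the set of row positions whose value contains it."""
    index = defaultdict(set)
    for pos, value in enumerate(values):
        for gram in trigrams(value):
            index[gram].add(pos)
    return index

def search(values, index, keyword):
    grams = trigrams(keyword)
    if not grams:
        # Keywords shorter than three characters cannot use the index.
        candidates = range(len(values))
    else:
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Candidates share every trigram but may still not contain the keyword, so recheck.
    needle = keyword.lower()
    return sorted(pos for pos in candidates if needle in values[pos].lower())

rows = ["Boston harbour tours", "New York bagels", "Lisbon to Boston flights"]
index = build_trigram_index(rows)
hits = search(rows, index, "boston")
print(hits)
```

The index only narrows the set of rows to recheck, so the final substring test is still needed. Very short keywords fall back to scanning everything in this sketch.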

Which is the best way to index the data from relational database table of One to many relationship

Can you please let me know the best way to index the records in Elasticsearch for my scenario?
My scenario is:
1) I need to index around 40 million records from an Oracle table whose entries have one-to-many relationships. The uniqueness of a record is based on a composite key of 4 columns.
2) After indexing, search should support "full text search" on all the fields.
3) Filtering and sorting on selected fields need to be supported.
After going through the official documentation I found a couple of options, but I want to know which of the approaches below would be most useful:
1) For each record in the table, create an entry in the Elasticsearch index
2) Create a nested JSON object based on the composite key and then add it to the Elasticsearch index
3) The parent-child relationship mechanism and application-side joins are not suitable for my scenario
Thanks
Girish T S
Your question is not particularly clear; here's how I understand it: you have 40M child records in one table, each with a reference to a parent record.
You want to index your records so as to be able to search for a parent record whose children match certain criteria.
There are two solutions here:
Indexing one document per parent, with all children indexed as nested documents within the parent
Indexing each child record as a separate document, with a parent-child relationship in Elasticsearch
The first solution will have better performance, but it means that every time a child is updated, the full parent document must be reindexed with all its children.
In any case you're saying that a parent-child scheme is not suitable for your case, so you're left with only the first solution.
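As a rough illustration of the first solution, the Python sketch below groups flat rows by a four-column composite key into one document per key, with the remaining fields collected under a children list. The column names k1 to k4, the detail field, and the doc_id field are assumptions made for the example, not names from the question. The children field would also need to be mapped as a nested type in the index for nested queries to work.

```python
from collections import defaultdict

# Hypothetical flat rows as they might come from the table; k1..k4 stand in
# for the four composite-key columns and `detail` for a child field.
rows = [
    {"k1": "A", "k2": 1, "k3": "x", "k4": 10, "detail": "first"},
    {"k1": "A", "k2": 1, "k3": "x", "k4": 10, "detail": "second"},
    {"k1": "B", "k2": 2, "k3": "y", "k4": 20, "detail": "third"},
]
KEY_COLUMNS = ("k1", "k2", "k3", "k4")

def to_nested_documents(rows):
    """Group rows by composite key into one document per key."""
    grouped = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in KEY_COLUMNS)
        child = {k: v for k, v in row.items() if k not in KEY_COLUMNS}
        grouped[key].append(child)
    documents = []
    for key, children in grouped.items():
        doc = dict(zip(KEY_COLUMNS, key))
        # A deterministic id derived from the key lets a re-index overwrite the same document.
        doc["doc_id"] = "|".join(str(part) for part in key)
        doc["children"] = children
        documents.append(doc)
    return documents

documents = to_nested_documents(rows)
print(documents[0])
```

This also makes the trade-off visible: changing one child means rebuilding and resending the whole document for its key.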

Sort by a different index's values

Given two indexes, I'm trying to sort the first based on values of the second.
For example, Index 1 ('Products') has fields id, name. Index 2 ('Prices') has fields id, price.
I'm struggling to figure out how to sort 'Products' by 'Prices'.price, assuming the ids match. The reason for this question is that, hypothetically, the 'Products' index becomes very large (with duplicate ids), and updating all documents becomes expensive.
Elasticsearch is a document-based store rather than a column-based store. What you're looking for is a way to JOIN the two indices, but this is not supported in Elasticsearch. The 'Elasticsearch way' of storing these documents is to have one index that contains all relevant data. If you're worried about update procedures taking very long, look into creating an index with an alias. When you need to do a major update, write it to a new index, and only when you're done switch the alias target to the new index. This allows you to update your data seamlessly.
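For the alias switch, a minimal sketch of the request body is shown below in Python. The body follows the actions format of Elasticsearch's _aliases endpoint. The alias and index names (products, products_v1, products_v2) are hypothetical, and sending the request is left out.

```python
import json

def alias_swap_actions(alias, old_index, new_index):
    """Build a request body that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

# Hypothetical names: searches go to the alias, never to a concrete index.
body = alias_swap_actions("products", "products_v1", "products_v2")
print(json.dumps(body, indent=2))
```

Sending both actions in a single POST to _aliases means the remove and the add are applied together, so searches against the alias do not see a moment where it points at no index.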

Oracle string search performance issue

I have a simple search stored procedure in Oracle 11gR2 on a table with over 1.6 million records. I am puzzled by the fact that if I search for a word inside a column, such as '%boston%', it takes 12 seconds. I have an index on the name column.
select description from travel_websites where name like '%boston%';
If I only search for a word that starts with Boston, like 'boston%', it takes only 0.15 seconds.
select description from travel_websites where name like 'boston%';
I added an index hint to try to force the optimizer to use my index on the name column, but it did not help either.
select description /*+ index name_idx */ from travel_websites where name like '%boston%';
Any advice would be greatly appreciated.
You cannot use an index range scan for a predicate that has a leading wildcard (i.e. like '%boston%'). This makes sense if you think about how an index is stored on disk: if you don't know the first character of the string you are searching for, you can't traverse the index to look for entries that match that string. You may be able to do a full scan of the index, where you read every leaf block and check each name to see if it contains the string you want. But that requires a full scan of the index, plus a visit to the table for every ROWID you get from the index in order to fetch any columns that are not part of the index you just scanned. Depending on the relative size of the table and the index and on how selective the predicate is, the optimizer may easily decide that it is quicker to just do a table scan if you're searching with a leading wildcard.
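The difference can be sketched in a few lines of Python, using a sorted list as a stand-in for the ordered keys of a B-tree index. This is only an analogy for the access pattern, not Oracle's implementation, and the sample names are made up.

```python
from bisect import bisect_left

# Hypothetical indexed values, kept sorted the way a B-tree keeps its keys.
names = sorted([
    "austin deals", "boston hotels", "boston tours",
    "visit boston", "west boston inn",
])

def prefix_search(sorted_values, prefix):
    """Jump to the first key >= prefix, then read until the prefix stops matching."""
    start = bisect_left(sorted_values, prefix)
    matches = []
    for value in sorted_values[start:]:
        if not value.startswith(prefix):
            break
        matches.append(value)
    return matches

def contains_search(sorted_values, fragment):
    """The ordering does not help here, so every key has to be inspected."""
    return [value for value in sorted_values if fragment in value]

starts_with = prefix_search(names, "boston")
contains = contains_search(names, "boston")
print(starts_with)
print(contains)
```

The prefix search touches only the matching keys plus one more, while the substring search reads the whole list, which corresponds to a range scan versus a full index scan.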
Oracle does support full text search, but you have to use Oracle Text, which requires that you build an Oracle Text index on the name column and use the CONTAINS operator to do the search rather than a LIKE query. Oracle Text is a very robust product, so there are quite a few options to consider in building the index, refreshing the index, and building the query, depending on how sophisticated you want to get.
Your index hint is not correctly specified. Assuming there is an index on name, that the name of that index is name_idx, and that you want to force a full scan of the index (just to reiterate, a range scan on the index is not a valid option if there is a leading wildcard), you would need something like
select /*+ index(travel_websites name_idx) */ description
from travel_websites
where name like '%boston%'
There is no guarantee, however, that a full index scan is going to be any more efficient than a full table scan. And it is entirely possible that the optimizer is choosing the index full scan already without the hint (you don't specify what the query plans are for the three queries).
Oracle (and as far as I know most other databases) by default indexes strings so that the index can only be used to look up string matches from the start of the string. That means, a LIKE 'boston%' (startswith) will be able to use the index, while a LIKE '%boston' (endswith) or LIKE '%boston%' (contains) will not.
If you really need indexes that can find substrings fast, you can't use the regular index types for strings, but you can use text indexes, which sadly may require slightly different query syntax.
