How do I create a Vertex with mandatory values in 3.1.1 MultiModel API - orientdb3.0

We are migrating from TinkerPop 2.6 to the MMAPI but could not find a way to create a vertex with mandatory values.
In Tinkerpop we do this:
OrientGraphFactory factory = new OrientGraphFactory("remote:localhost/ogm-test", "root", "toor").setupPool(1, 10);
OrientGraph g = factory.getTx();
OrientVertex v1 = g.addVertex("class:SimpleVertexEx", "svex", "directTest");
and in MMAPI:
OrientDB dbServer = new OrientDB("remote:localhost", OrientDBConfig.defaultConfig());
ODatabaseSession db = dbServer.open("ogm-test", "root", "toor");
db.begin();
OVertex v1 = db.newVertex("class:SimpleVertexEx");
v1.setProperty("svex", "directTest");
v1.save();
but this fails at the newVertex line. How should we do this?

I made a mistake translating the code.
In TinkerPop you must write
g.addVertex("class:SimpleVertexEx", "svex", "directTest");
but the "class:" prefix is not necessary in the MMAPI.
The correct statement is:
OVertex v1 = db.newVertex("SimpleVertexEx");
After that you get a vertex you can fill in.
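Putting it together, a minimal sketch of the whole flow under the MMAPI, using the same connection details as in the question (the try-with-resources block and the explicit commit/close are additions assumed here, not taken from the original snippets):
import com.orientechnologies.orient.core.db.ODatabaseSession;
import com.orientechnologies.orient.core.db.OrientDB;
import com.orientechnologies.orient.core.db.OrientDBConfig;
import com.orientechnologies.orient.core.record.OVertex;

public class CreateVertexExample {
    public static void main(String[] args) {
        OrientDB dbServer = new OrientDB("remote:localhost", OrientDBConfig.defaultConfig());
        try (ODatabaseSession db = dbServer.open("ogm-test", "root", "toor")) {
            db.begin();
            // Class name only, without the "class:" prefix used by the TinkerPop API.
            OVertex v1 = db.newVertex("SimpleVertexEx");
            v1.setProperty("svex", "directTest");
            v1.save();
            db.commit();
        }
        dbServer.close();
    }
}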

Related

Losing the crs when writing to .gpkg with geopandas

When I write my .gpkg I am losing the CRS. I have tried setting the CRS with .set_crs, and adding the CRS when writing the .gpkg (which produces a warning: "fiona._env - WARNING - dataset filename.gpkg does not support layer creation option EPSG").
My code:
for layername in fiona.listlayers(file):
    vector = geopandas.read_file(file, layer=layername)
    vector.set_crs(4326)
    vector.to_file(filename + ".gpkg", layer=layername, driver='GPKG')
or
for layername in fiona.listlayers(file):
    vector = geopandas.read_file(file, layer=layername)
    vector.to_file(filename + ".gpkg", layer=layername, driver='GPKG', epsg=4326)
neither works.
vector.set_crs(4326) does not work in place by default. You either need to assign it or specify inplace=True.
for layername in fiona.listlayers(file):
    vector = geopandas.read_file(file, layer=layername)
    # vector.set_crs(4326, inplace=True)  # one option
    vector = vector.set_crs(4326)  # other option
    vector.to_file(filename + ".gpkg", layer=layername, driver='GPKG')
Your second attempt does not work because to_file does not have the epsg keyword you are trying to use; it simply gets lost among the arguments passed down to Fiona and GDAL, which silently ignore it.

Very slow connection to Snowflake from Databricks

I am trying to connect to Snowflake using R in Databricks. My connection works and I can make queries and retrieve data successfully; however, it can take more than 25 minutes simply to connect, although once connected all my queries are quick thereafter.
I am using the sparklyr function 'spark_read_source', which looks like this:
query <- spark_read_source(
  sc = sc,
  name = "query_tbl",
  memory = FALSE,
  overwrite = TRUE,
  source = "snowflake",
  options = append(sf_options, client_Q)
)
where 'sf_options' is a list of connection parameters which looks similar to this:
sf_options <- list(
  sfUrl = "https://<my_account>.snowflakecomputing.com",
  sfUser = "<my_user>",
  sfPassword = "<my_pass>",
  sfDatabase = "<my_database>",
  sfSchema = "<my_schema>",
  sfWarehouse = "<my_warehouse>",
  sfRole = "<my_role>"
)
and my query is a string appended to the 'options' argument, e.g.
client_Q <- 'SELECT * FROM <my_database>.<my_schema>.<my_table>'
I can't understand why it is taking so long; if I run the same query from RStudio using a local Spark instance and 'dbGetQuery', it is instant.
Is spark_read_source the problem? Is it an issue between Snowflake and Databricks? Or something else? Any help would be great. Thanks.

How to set the starting point when using the Redis scan command in spring boot

I want to migrate 70 million records from Redis (sentinel mode) to Redis (cluster mode).
ScanOptions options = ScanOptions.scanOptions().build();
Cursor<byte[]> c = sentinelTemplate.getConnectionFactory().getConnection().scan(options);
while (c.hasNext()) {
    count++;
    String key = new String(c.next());
    key = key.trim();
    String value = (String) sentinelTemplate.opsForHash().get(key, "tc");
    //Thread.sleep(1);
    clusterTemplate.opsForHash().put(key, "tc", value);
}
I want to resume the scan from a certain point, because the Redis connection gets disconnected partway through.
How can I set the starting point when using the Redis SCAN command in Spring Boot?
Moreover, whenever the program is run with the code above, the connection is broken after almost 20 million records have been moved.
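One possible approach, sketched below: the Spring Data Cursor API always starts SCAN from cursor 0, but the SCAN cursor itself is just a value the server hands back, so you can drive SCAN with a native client (Jedis 3.x assumed here) and persist the last cursor so a restarted job can pass it back in. The connection details and the copy step are placeholders, not taken from the original code.
import redis.clients.jedis.Jedis;
import redis.clients.jedis.ScanParams;
import redis.clients.jedis.ScanResult;

public class ResumableScan {
    public static void main(String[] args) {
        // Placeholder connection; point this at the sentinel-backed source instance.
        try (Jedis source = new Jedis("localhost", 6379)) {
            // Start from "0" for a fresh run, or from a cursor saved before the disconnect.
            String cursor = args.length > 0 ? args[0] : ScanParams.SCAN_POINTER_START;
            ScanParams params = new ScanParams().count(1000);
            do {
                ScanResult<String> page = source.scan(cursor, params);
                for (String key : page.getResult()) {
                    // Copy the hash field to the cluster here, e.g. via clusterTemplate.
                }
                cursor = page.getCursor();
                // Persist the cursor (file, DB, ...) so the job can resume from here.
                System.out.println("last cursor: " + cursor);
            } while (!"0".equals(cursor));
        }
    }
}
Note that a resumed SCAN only guarantees that keys present for the whole scan are visited at least once; keys written during the migration may be missed or returned twice.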

How to do _cat/indices/<index_name_with_reg_ex> with JAVA API?

I have some indices named test-1-in, test-2-in, test-3-in. I want to do _cat/indices/test-*-in from the Java API. How can I do this?
I tried using the IndexAdminClient but had no luck.
Given an ElasticSearch Client object:
client.admin().indices()
        .getIndex(new GetIndexRequest().indices("regex-*"))
        .actionGet().getIndices();
In addition to Mario's answer, use the following to retrieve the indices with the Elasticsearch 6.4.0 high level REST client:
GetIndexRequest request = new GetIndexRequest().indices("*");
GetIndexResponse response = client.indices().get(request, RequestOptions.DEFAULT);
String[] indices = response.getIndices();
I have a solution:
final ClusterStateRequest clusterStateRequest = new ClusterStateRequest();
clusterStateRequest.clear().metaData(true);
final IndicesOptions strictExpandIndicesOptions = IndicesOptions.strictExpand();
clusterStateRequest.indicesOptions(strictExpandIndicesOptions);
ClusterStateResponse clusterStateResponse = client.admin().cluster().state(clusterStateRequest).get();
clusterStateResponse.getState().getMetadata().getIndices()
This will give all indices. After that, the regex matching has to be done manually; this is what the _cat implementation does in the Elasticsearch source code.
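As an illustration, a minimal sketch of that manual matching step, assuming the index names have already been pulled out of the cluster state response into plain strings (all variable names and sample values here are illustrative):
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class IndexNameFilter {
    public static void main(String[] args) {
        // Names as they might come back from getIndices(); illustrative values only.
        List<String> allIndices = Arrays.asList("test-1-in", "test-2-in", "test-3-in", "other-index");

        // Translate the wildcard expression "test-*-in" into a regular expression.
        Pattern pattern = Pattern.compile("test-.*-in");

        List<String> matching = allIndices.stream()
                .filter(name -> pattern.matcher(name).matches())
                .collect(Collectors.toList());

        System.out.println(matching); // prints [test-1-in, test-2-in, test-3-in]
    }
}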
In case you want to cat indices with ?v option:
IndicesStatsRequestBuilder indicesStatsRequestBuilder =
        new IndicesStatsRequestBuilder(client, IndicesStatsAction.INSTANCE);
IndicesStatsResponse response = indicesStatsRequestBuilder.execute().actionGet();
for (Map.Entry<String, IndexStats> m : response.getIndices().entrySet()) {
    System.out.println(m);
}
Each of the entries contains the document count, storage usage, etc. You can run this for all indices or filter for specific ones.
PS: Tested with version 5.6.0.

Using apply functions in SparkR

I am currently trying to implement some functions using sparkR version 1.5.1. I have seen older (version 1.3) examples, where people used the apply function on DataFrames, but it looks like this is no longer directly available. Example:
x = c(1,2)
xDF_R = data.frame(x)
colnames(xDF_R) = c("number")
xDF_S = createDataFrame(sqlContext,xDF_R)
Now, I can use the function sapply on the data.frame object
xDF_R$result = sapply(xDF_R$number, ppois, q=10)
When I use a similar logic on the DataFrame
xDF_S$result = sapply(xDF_S$number, ppois, q=10)
I get the error message "Error in as.list.default(X) : no method for coercing this S4 class to a vector".
Can I somehow do this?
This is possible with user defined functions in Spark 2.0.
wrapper = function(df){
  out = df
  out$result = sapply(df$number, ppois, q=10)
  return(out)
}

xDF_S2 = dapplyCollect(xDF_S, wrapper)
identical(xDF_S2, xDF_R)
# [1] TRUE
Note you need a wrapper function like this because you can't pass the extra arguments in directly, but that may change in the future.
Native R functions do not support Spark DataFrames. We can use user-defined functions in SparkR to execute native R code. These run on the executors, so the required libraries must be available on all the executors.
For example, suppose we have a custom function holt_forecast which takes in a data.table as an argument.
Sample R code
sales_R_df %>%
  group_by(product_id) %>%
  do(holt_forecast(data.table(.))) %>%
  data.table(.) -> dt_holt
For using UDFs, we need to specify the schema of the output data.frame returned by the execution of the native R method. This schema is used by Spark to generate back the Spark DataFrame.
Equivalent SparkR code
Define the schema:
dt_holt_schema <- structType(
  structField("product_id", "integer"),
  structField("audit_date", "date"),
  structField("holt_unit_forecast", "double"),
  structField("holt_unit_forecast_std", "double")
)
Execute the method (assuming sales_Spark_df is the Spark DataFrame built from the sales data):
# gapply runs the function once per product_id group and builds the result using dt_holt_schema
dt_holt <- gapply(sales_Spark_df, "product_id", function(key, x) {
  library(data.table)
  library(lubridate)
  library(dplyr)
  library(forecast)
  sales <- data.table(x)
  y <- data.frame(key, holt_forecast(sales))
}, dt_holt_schema)
Reference: https://shbhmrzd.medium.com/stl-and-holt-from-r-to-sparkr-1815bacfe1cc
