How to use the startsWith and endsWith functions in SparkR 2.3.0?

I can't work out how to use the startsWith and endsWith functions in SparkR 2.3.0.
I thought I could use them like dplyr's starts_with helper, as shown below, but an error occurred.
I would appreciate it if someone could explain the correct usage.
> df <- read.df("/hadoop/tmp/iris.csv", "csv", header = "true")
> showDF(select(df, startsWith(columns(df), "Sepal")))
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function 'select' for signature '"SparkDataFrame", "logical"'

The startsWith and endsWith functions operate on columns, not on a dataframe.
To do the select you are attempting you can use
df <- as.DataFrame(iris)
df_sepal <- select(df, names(df)[grepl("Sepal", names(df))])
To use startsWith() you need to pass a column as an argument, as well as the string you are checking. For example,
df_v <- filter(df, startsWith(df$Species, "v") == TRUE)
will filter for only the rows where Species begins with 'v' (versicolor, virginica)
df_a <- filter(df, endsWith(df$Species, "a") == TRUE)
will filter for only the rows where Species ends with 'a' (setosa, virginica)

Related

pass dataframe column as parameter in xpath

I am using xpath in pyspark to extract from xml which is stored as a column in a table.
Below works fine
entity_id="D8"
dfquestionstep=df_source_xml.selectExpr("disclosure_entity_id",
f'xpath(xml,"*/entities/entity[@type=\'TI\']/entity[@type=\'UNDERWRITING\']/entity[@type=\'DISCLOSURES\']/entity[@id=\'{entity_id}\']/entity[@type=\'DECISION_PATH\']/entity[@type=\'QUESTION_STEP\']/@id") QUESTION_STEP_ID'
)
PROBLEM
Now I want to pass disclosure_entity_id, which is a column in the dataframe with values like D8, D9, etc., in place of entity_id, i.e. entity[@id=disclosure_entity_id]
But all I get is [] as result when I execute like this, i.e. xpath fails to find anything.
Is there a way to pass the DF column directly as argument to XPATH like above?
Some testdata:
data = [
['a','<x><a>a1</a><b>b1</b><c>c1</c></x>'],
['b','<x><a>a2</a><b>b2</b><c>c2</c></x>'],
['c','<x><a>a3</a><b>b3</b><c>c3</c></x>'],
]
df = spark.createDataFrame(data, ['col','data'])
Attempt 1:
Creating a column with an XPath expression can be done:
from pyspark.sql import functions as f
df.withColumn('my_path', f.concat(f.lit('//'), f.col('col'))) \
    .selectExpr('xpath(data, my_path)').show()
But unfortunately the code above only yields the error message
AnalysisException: cannot resolve 'xpath(`data`, `my_path`)' due to data type mismatch:
path should be a string literal; line 1 pos 0;
The path parameter of the xpath function has to be a constant string. This string is parsed before Spark even looks at the data.
Attempt 2:
Another option is to use an udf and use standard Python functions to process the XPath expression inside of the udf:
import xml.etree.ElementTree as ET
from pyspark.sql import types as T
def find_val(col, data):
    # Parse the row's XML and look up the element named by the other column
    result = ET.fromstring(data).find(f'.//{col}')
    if result is not None:
        return result.text
find_val_udf=f.udf(find_val, returnType=T.StringType())
df.select('col', 'data', find_val_udf('col', 'data')).show(truncate=False)
Output:
+---+----------------------------------+-------------------+
|col|data                              |find_val(col, data)|
+---+----------------------------------+-------------------+
|a  |<x><a>a1</a><b>b1</b><c>c1</c></x>|a1                 |
|b  |<x><a>a2</a><b>b2</b><c>c2</c></x>|b2                 |
|c  |<x><a>a3</a><b>b3</b><c>c3</c></x>|c3                 |
+---+----------------------------------+-------------------+
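The same udf approach can be adapted to the attribute-based path from the question. The following is only a sketch in plain Python: the sample XML and the function name find_question_step_ids are hypothetical, and the path is shortened to the part below the DISCLOSURES entity. It relies on the limited XPath subset that xml.etree.ElementTree supports, which includes [@attrib='value'] predicates.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real XML structure is an assumption.
sample_xml = """
<entities>
  <entity type="TI">
    <entity type="UNDERWRITING">
      <entity type="DISCLOSURES">
        <entity id="D8">
          <entity type="DECISION_PATH">
            <entity type="QUESTION_STEP" id="Q1"/>
            <entity type="QUESTION_STEP" id="Q2"/>
          </entity>
        </entity>
        <entity id="D9">
          <entity type="DECISION_PATH">
            <entity type="QUESTION_STEP" id="Q3"/>
          </entity>
        </entity>
      </entity>
    </entity>
  </entity>
</entities>
"""


def find_question_step_ids(entity_id, xml_text):
    """Return the id attributes of QUESTION_STEP entities below the given disclosure entity."""
    root = ET.fromstring(xml_text.strip())
    # The path is built per row, so entity_id may come from another column.
    path = (
        f".//entity[@type='DISCLOSURES']/entity[@id='{entity_id}']"
        "/entity[@type='DECISION_PATH']/entity[@type='QUESTION_STEP']"
    )
    return [node.get("id") for node in root.findall(path)]


for entity_id in ("D8", "D9", "D10"):
    print(entity_id, find_question_step_ids(entity_id, sample_xml))
```

In Spark, a function like this could be registered with f.udf and a return type of T.ArrayType(T.StringType()), then called with the disclosure_entity_id and xml columns. Because the id is interpolated into the path, values containing a quote character would need separate handling.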

Compare two arrays using QueryDSL

Need assistance with QueryDSL predicate composition - how to write QueryDSL predicate to compare two arrays(find any UUID matches between two arrays) using && operator like this:
select '{e48f54d5-9845-4987-a53d-e0ecfe3dbb43}'::uuid[] && '{e48f54d5-9845-4987-a53d-e0ecfe3dbb43,4e9a43f2-cb23-4f1b-9f7f-c09687d97570}'::uuid[];
Using:
Cockroach - v20.1.7,
QueryDSL - v4.3.1
Tried the following way:
private BooleanBuilder createPredicates(QPlayer player, List<UUID> otherUuids) {
predicates.and(player.listOfUuids.any().in(otherUuids)); // player.listOfUuids is type of ListPath<java.util.UUID, ComparablePath<java.util.UUID>>
return predicates;
}
But it raises an exception:
java.lang.IllegalStateException: name property not available for path of type COLLECTION_ANY. Use getElement() to access the generic path element.
Also tried to create booleanTemplate like this:
predicates.and(Expressions.booleanTemplate("{0} && '{{1}}'::uuid[]", player.listOfUuids, StringUtils.join(",", otherUuids)));
It returns such a SQL:
select ... where player.business_unit_ids && '{$1}'::uuid[]
But executing it raises an exception:
io.r2dbc.postgresql.ExceptionFactory$PostgresqlNonTransientResourceException: [08P01] received too many type hints: 1 vs 0 placeholders in query
This happens because it interprets the extra '{' and '}', which are required to wrap the values as a uuid array, as another placeholder. It also does not respect special-character escaping or Unicode.
Any thoughts how two array comparison might be achieved using QueryDSL?
Figured out how to add desired predicate with && overlap operator:
predicates.and(Expressions.booleanTemplate("{0} && {1}::uuid[]", arg0, String.format("{%s}", arg1.stream().map(UUID::toString).collect(joining(",")))))
And it's working based on example query:
select '{e48f54d5-9845-4987-a53d-e0ecfe3dbb43,e48f54d5-9845-4987-a53d-e0ecfe3dbb45}'::uuid[] && '{e48f54d5-9845-4987-a53d-e0ecfe3dbb40,e48f54d5-9845-4987-a53d-e0ecfe3dbb45}'::uuid[];
I could not find support for the && overlap operator in QueryDSL's Ops class, so I don't know of another way to write this predicate.

eXist-DB / XQuery StringValue cannot be cast to AnyURIValue (using compression:zip)

In eXist 4.4/XQuery 3.1, I am building a function to compress a number of xml files into a zip using compression:zip.
I have one function which collects all the URIs for the documents to be compressed, schedule:get-document-uris-for-zip(xmlid as xs:string). This function returns lists of documents like the following:
/db/apps/deheresi/data/MS609-0001.xml
/db/apps/deheresi/data/MS609-0002.xml
/db/apps/deheresi/data/MS609-0003.xml
/db/apps/deheresi/data/MS609-0004.xml
/db/apps/deheresi/data/MS609-0005.xml
/db/apps/deheresi/data/MS609-0006.xml
/db/apps/deheresi/data/MS609-0007.xml
/db/apps/deheresi/data/MS609-0008.xml
/db/apps/deheresi/data/MS609-0009.xml
/db/apps/deheresi/data/MS609-0010.xml
This function is called by the compression function as follows
declare function schedule:create-zip-by-batch()
{
let $batch := doc(concat($globalvar:URIdocuments,"document_collections.xml"))
for $entry in $batch//collection[@compile="y"]
let $zipobject := compression:zip(schedule:get-document-uris-for-zip($entry/string(@xml:id)),false())
let $zipstore := xmldb:store("/db/apps/deheresi/documents",
"MS609_tei.zip",
$zipobject)
return $zipstore
};
This is throwing a cast error as follows, but I can't identify how to resolve this...
org.exist.xquery.value.StringValue cannot be cast to org.exist.xquery.value.AnyURIValue
Many thanks in advance.
Edit - I'm adding here the part of the function schedule:get-document-uris-for-zip(xmlid as xs:string) which outputs the list of URIs. The URIs are built through string concatenation:
(: get names of documents which meet criteria :)
let $list := xmldb:get-child-resources("/db/apps/deheresi/data")[starts-with(., $y/string(@filename)) and ends-with(., $y/string(@ext))]
(: create URI for each document :)
return
for $n in $list
return concat("/db/apps/deheresi/data/",$n)
You're right to find this function a bit confusing. The $sources parameter of the (eXist-specific) compression:zip() function is typed as if it were quite flexible, as xs:anyType()+. But really it is quite strict about the two types of item it accepts: a sequence of URIs (i.e., of type xs:anyURI), or a sequence of <entry> elements:
<entry name="filename.ext"
type="collection|uri|binary|xml|text"
method="deflate|store"
>data</entry>
See https://exist-db.org/exist/apps/fundocs/view.html?uri=http://exist-db.org/xquery/compression#zip.2.
The problem with your code is that you are passing strings in your $sources parameter, and have not cast these strings as xs:anyURI.
Here is sample working code:
xquery version "3.1";
let $prepare :=
(
xmldb:create-collection("/db", "test"),
xmldb:store("/db/test", "test.xml", <test/>)
)
let $zip := compression:zip("/db/test/test.xml" cast as xs:anyURI, false())
return
xmldb:store("/db/test", "test.zip", $zip)

Linq to XML + chaining expressions

I have the following code that's repeated:
var ccaNumber = (from r in xDoc.Elements("ResultSet").Elements("DataRow")
where Convert.ToInt32(r.Element("PaymentPlanNumber").Value) == payPlan.OrderNumber
Ideally, I want to create the above as an expression then add my clause to the end of it.
So, I created the expression as follows:
Expression currExp = from r in xDoc.Elements("ResultSet").Elements("DataRow")
where Convert.ToInt32(r.Element("PaymentPlanNumber").Value) == payPlan.OrderNumber;
I now want to combine them:
var ccaNumber = (currExp select r.Element("CreditCardAuthorityNumber").Value).FirstOrDefault();
However I now get the following error:
Invalid expression term ')'
Any suggestions?
ta,
yogi
I think you are mixing things up here.
What you can do is:
var items = from r in xDoc.Elements("ResultSet").Elements("DataRow")
where Convert.ToInt32(r.Element("PaymentPlanNumber").Value) == payPlan.OrderNumber
select r;
This declares items as an IEnumerable of elements that match your where condition.
And then you can use those defined items like this:
var ccaNumber = items.Select(item=>item.Element("CreditCardAuthorityNumber").Value).FirstOrDefault();
However, this all relies on lazy evaluation, so you need to take care with multiple enumerations. Here is a pretty in-depth explanation that is way better than my English.
When adding to an existing expression, you need to use the lambda (method) syntax, not the LINQ query syntax.
Try:
var ccaNumber = (currExp
.Select(r=>r.Element("CreditCardAuthorityNumber").Value))
.FirstOrDefault();

Database query using array in ruby

I'm trying to find all rows whose value is in an array. This is my code:
require 'sqlite3'
db = SQLite3::Database.new('test.sqlite')
res = db.query("SELECT w1.synsetid
FROM words w1
WHERE w1.wordid IN (?)", arr)
arr: array of strings
And I get this error
SQLite3::RangeException: bind or column index out of range
Any Help?
The second argument to query is meant to be an array of placeholder values:
- (Object) query(sql, bind_vars = [], *args)
This is a convenience method for creating a statement, binding
parameters to it, and calling execute:
The query method doesn't know that it should treat your arr array specially, it just sees one placeholder and multiple values.
I think you have to do this the hard way: build the appropriate number of placeholders and paste them into the SQL. Something like this:
placeholders = (['?'] * arr.length).join(',')
res = db.query("select ... where w1.wordid in (#{placeholders})", arr)
You know exactly what is in placeholders so you don't have to worry about using string interpolation and injection issues when building your SQL like this.
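For comparison, the same placeholder-building idea can be sketched with Python's standard sqlite3 module. This is an illustration only: the in-memory table, its column types, the sample rows, and the function name synset_ids_for are all assumptions made for the example, not part of the original schema.

```python
import sqlite3


def synset_ids_for(conn, word_ids):
    """Return synsetid values for every row whose wordid is in word_ids."""
    if not word_ids:
        # "IN ()" is not valid SQL, so short-circuit on an empty list.
        return []
    # One "?" per value; only fixed text is interpolated into the SQL.
    placeholders = ",".join("?" for _ in word_ids)
    sql = (
        "SELECT synsetid FROM words "
        f"WHERE wordid IN ({placeholders}) ORDER BY synsetid"
    )
    # The values themselves are still bound by the driver.
    return [row[0] for row in conn.execute(sql, list(word_ids))]


# Hypothetical data set up in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (wordid TEXT, synsetid INTEGER)")
conn.executemany(
    "INSERT INTO words VALUES (?, ?)",
    [("a", 1), ("b", 2), ("c", 3)],
)
conn.commit()

print(synset_ids_for(conn, ["a", "c"]))
```

As in the Ruby version, the interpolated string contains only question marks and commas, so user-supplied values never reach the SQL text directly.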
If you're using Rails already then you could also wrap your SQLite tables with ActiveRecord and then use the usual ActiveRecord interface:
words = Word.where(:wordid => arr)
