How to use cq in a quasiquote to return the matched pattern - scala-macros

I am trying to generate a case of the form case authorDao: AuthorDao => authorDao, so that the match returns the Dao subclass itself.
When I use this quasi quote:
val daoType = TypeName(daoName)
val caseTerm = TermName(daoName.toLowerCase)
cases.append(cq"$caseTerm: $daoType => $caseTerm")
It generates this
case (authordao @ ((_): AuthorDao)) => authordao
And if I do this
cases.append(cq"${q"$caseTerm: $daoType"} => $caseTerm")
It does this
case ((authordao): AuthorDao) => authordao
Both produce compile errors.

After some googling, I found the answer here:
Scala multiple type pattern matching
Basically,
case authordao: AuthorDao => authordao is equivalent to
case authordao @ AuthorDao(_) => authordao
So the final code is this
val daoTerm = TermName(daoName)
val caseType = TypeName(daoName.toLowerCase())
val caseExpr = TermName(daoName.toLowerCase)
cases.append(cq"$caseType @ $daoTerm(_) => $caseExpr")
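For context, here is a minimal sketch of how the generated cases might then be accumulated and spliced into a full match with q"..."; the runtime universe, the daoNames list, the Dao type and the use of a TermName for the bound variable are illustrative assumptions, not part of the original question (inside a macro you would use c.universe instead):

import scala.collection.mutable.ListBuffer
import scala.reflect.runtime.universe._

val daoNames = List("AuthorDao", "BookDao")      // assumed input
val cases = ListBuffer.empty[CaseDef]
daoNames.foreach { daoName =>
  val daoTerm  = TermName(daoName)               // extractor (companion object)
  val caseTerm = TermName(daoName.toLowerCase)   // variable bound in the pattern
  cases += cq"$caseTerm @ $daoTerm(_) => $caseTerm"
}
// Splice every accumulated CaseDef into one match expression.
val matchTree = q"(dao: Dao) => dao match { case ..$cases }"
println(showCode(matchTree))

Splicing with case ..$cases is what turns the buffer of CaseDefs into the body of the match.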

Related

require solution for string replacement

I have some code where I have to replace a string with another string.
My file contains
secondaryPort = 7504
The code below
filtered_data = filtered_data.gsub(
  /secondaryPort=\d+/,
  'secondaryPort=' + node['server']['secondaryPort']
)
should replace my file with
secondaryPort = 7555
but it fails to do so.
Make sure you account for the spaces around the equals sign in your string:
filtered_data = 'secondaryPort = 7504'
=> 'secondaryPort = 7504'
# with literal spaces
filtered_data.gsub(/secondaryPort = \d+/, 'secondaryPort = 7555')
=> 'secondaryPort = 7555'
# with regex character class for literal space
filtered_data.gsub(/secondaryPort\s{1}=\s{1}\d+/, 'secondaryPort = 7555')
=> 'secondaryPort = 7555'

Oracle decode logic implementation using Slick

I have the following problem: there is SQL with the Oracle DECODE function:
SELECT u.URLTYPE, u.URL
FROM KAA.ENTITYURLS u
JOIN KAA.ENTITY e
ON decode(e.isurlconfigured, 0, e.urlparentcode, 1, e.CODE, NULL) = u.ENTITYCODE
JOIN CASINO.Casinos c ON e.casinocode = c.code
WHERE e.NAME = $entityName
AND C.NAME = $casinoName
I'm trying to express this SQL in my Slick code, like:
val queryUrlsEntityName = for {
  entityUrl <- entityUrls
  entity <- entities.filter(e =>
    e.name.trim === entityName &&
      entityUrl.entityCode.asColumnOf[Option[Int]] == (e.isURLConfigured match {
        case Some(0) => e.urlParentCode
        case Some(1) => e.code.asColumnOf[Option[Int]]
        case _ => None
      })
  )
  casino <- casinos.filter(_.name.trim === casinoName) if entity.casinoCode == casino.code
} yield entityUrl
But I don't understand how I can implement the matching of values in the line
case Some(0) => e.urlParentCode
because I'm getting this error:
constructor cannot be instantiated to expected type;
[error] found : Some[A]
[error] required: slick.lifted.Rep[Option[Int]]
[error] case Some(0) => e.urlParentCode
Thanks for any advice.
You should rewrite the pattern-matching section so that both sides have matching types: either compare against the required Rep[Option[Int]] rather than a plain Option[Int], or transform the Rep[Option[Int]] into an Option[Int] first. Rep is only Slick's lifted replacement for the column's datatype. I would prefer the first variant; this answer shows how to make the transformation from Rep, or you can use map directly:
map(u => u.someField).result.map(_.headOption).map {
  case Some(0) => .....
  case _ => .....
}
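If you prefer to keep the whole DECODE inside the query, another option is to expand it into plain boolean column expressions, so no Scala-side pattern match (and no Some(...) constructor) is needed at all. A rough sketch, reusing the entityUrls, entities and casinos tables from the question and assuming isURLConfigured is a Rep[Option[Int]] (column names and types may need small adjustments):

val queryUrlsEntityName = for {
  entityUrl <- entityUrls
  entity <- entities.filter { e =>
    e.name.trim === entityName &&
      ((e.isURLConfigured === 0 && entityUrl.entityCode.asColumnOf[Option[Int]] === e.urlParentCode) ||
       (e.isURLConfigured === 1 && entityUrl.entityCode.asColumnOf[Option[Int]] === e.code.asColumnOf[Option[Int]]))
  }
  casino <- casinos.filter(_.name.trim === casinoName) if entity.casinoCode === casino.code
} yield entityUrl

This mirrors what DECODE does on the Oracle side: each decode branch becomes an equality guarded by the corresponding isURLConfigured value, and the whole condition is translated to SQL instead of being evaluated in Scala.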

join in Spark outputs wrong result whereas map-side join is correct

My Spark version is 1.2.0, and here's the scenario:
There are two RDDs, namely RDD_A and RDD_B, whose data structures are both RDD[(spid, the_same_spid)]. RDD_A has 20,000 lines whereas RDD_B has 3,000,000,000 lines. I intend to calculate the line count of RDD_B whose 'spid' exists in RDD_A.
My first implementation is quite mainstream, applying the join method from RDD_B on RDD_A:
val currentDay = args(0)
val conf = new SparkConf().setAppName("Spark-MonitorPlus-LogStatistic")
val sc = new SparkContext(conf)
//---RDD A transforming to RDD[(spid, spid)]---
val spidRdds = sc.textFile("/diablo/task/spid-date/" + currentDay + "-spid-media").map(line =>
line.split(",")(0).trim).map(spid => (spid, spid)).partitionBy(new HashPartitioner(32));
val logRdds: RDD[(LongWritable, Text)] = MzFileUtils.getFileRdds(sc, currentDay, "")
val logMapRdds = MzFileUtils.mapToMzlog(logRdds)
//---RDD B transforming to RDD[(spid, spid)]---
val tongYuanRdd = logMapRdds.filter(kvs => kvs("plt") == "0" && kvs("tp") == "imp").map(kvs => kvs("p").trim).map(spid => (spid, spid)).partitionBy(new HashPartitioner(32));
//---join---
val filteredTongYuanRdd = tongYuanRdd.join(spidRdds);
println("Total TongYuan Imp: " + filteredTongYuanRdd.count())
However, the result is incorrect (bigger than Hive's). When I change the join from a reduce-side join to a map-side join as below, the result is exactly the same as Hive's:
val conf = new SparkConf().setAppName("Spark-MonitorPlus-LogStatistic")
val sc = new SparkContext(conf)
//---RDD A transforming to RDD[(spid, spid)]---
val spidRdds = sc.textFile("/diablo/task/spid-date/" + currentDay + "-spid-media").map(line =>
line.split(",")(0).trim).map(spid => (spid, spid)).partitionBy(new HashPartitioner(32));
val logRdds: RDD[(LongWritable, Text)] = MzFileUtils.getFileRdds(sc, currentDay, "")
val logMapRdds = MzFileUtils.mapToMzlog(logRdds)
//---RDD B transforming to RDD[(spid, spid)]---
val tongYuanRdd = logMapRdds.filter(kvs => kvs("plt") == "0" && kvs("tp") == "imp").map(kvs => kvs("p").trim).map(spid => (spid, spid)).partitionBy(new HashPartitioner(32));
//---join---
val globalSpids = sc.broadcast(spidRdds.collectAsMap());
val filteredTongYuanRdd = tongYuanRdd.mapPartitions({
iter =>
val m = globalSpids.value
for {
(spid, spid_cp) <- iter
if m.contains(spid)
} yield spid
}, preservesPartitioning = true);
println("Total TongYuan Imp: " + filteredTongYuanRdd.count())
As you can see, the only difference between the above two code snippets is the 'join' part.
So, are there any suggestions on addressing this problem? Thanks in advance!
Spark's join doesn't enforce uniqueness of keys, and when a key is duplicated it actually outputs the cross product for that key. Using cogroup and only outputting one k/v pair for each key, or mapping to just the ids and then using intersection, will do the trick.
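For example, here is a rough sketch of the cogroup variant, reusing tongYuanRdd and spidRdds from the question (the matchedLineCount name is just for illustration):

// Count the RDD_B lines whose key also appears in RDD_A, without letting
// duplicate keys blow the count up the way join's cross product does.
val matchedLineCount = tongYuanRdd
  .cogroup(spidRdds)
  .map { case (_, (bRows, aRows)) => if (aRows.nonEmpty) bRows.size.toLong else 0L }
  .fold(0L)(_ + _)
println("Total TongYuan Imp: " + matchedLineCount)

Each key appears exactly once after cogroup, so duplicate spids in RDD_A can no longer multiply the count, while every matching RDD_B line is still counted.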

Slow performance in spark streaming

I am using Spark Streaming 1.1.0 locally (not in a cluster).
I created a simple app that parses the data (about 10,000 entries), stores it in a stream and then makes some transformations on it. Here is the code:
def main(args: Array[String]) {
  val master = "local[8]"
  val conf = new SparkConf().setAppName("Tester").setMaster(master)
  val sc = new StreamingContext(conf, Milliseconds(110000))
  val stream = sc.receiverStream(new MyReceiver("localhost", 9999))
  val parsedStream = parse(stream)

  parsedStream.foreachRDD(rdd =>
    println(rdd.first() + "\nRULE STARTS " + System.currentTimeMillis()))

  val result1 = parsedStream
    .filter(entry => entry.symbol.contains("walking")
      && entry.symbol.contains("true") && entry.symbol.contains("id0"))
    .map(_.time)

  val result2 = parsedStream
    .filter(entry =>
      entry.symbol == "disappear" && entry.symbol.contains("id0"))
    .map(_.time)

  val result3 = result1
    .transformWith(result2, (rdd1, rdd2: RDD[Int]) => rdd1.subtract(rdd2))

  result3.foreachRDD(rdd =>
    println(rdd.first() + "\nRULE ENDS " + System.currentTimeMillis()))

  sc.start()
  sc.awaitTermination()
}

def parse(stream: DStream[String]) = {
  stream.flatMap { line =>
    val entries = line.split("assert").filter(entry => !entry.isEmpty)
    entries.map { tuple =>
      val pattern = """\s*[(](.+)[,]\s*([0-9]+)+\s*[)]\s*[)]\s*[,|\.]\s*""".r
      tuple match {
        case pattern(symbol, time) =>
          new Data(symbol, time.toInt)
      }
    }
  }
}

case class Data(symbol: String, time: Int)
I have a batch duration of 110,000 milliseconds in order to receive all the data in one batch. I believed that, even locally, Spark is very fast. In this case, it takes about 3.5 seconds to execute the rule (between "RULE STARTS" and "RULE ENDS"). Am I doing something wrong, or is this the expected time? Any advice?
I was using case matching in a lot of my jobs and it killed performance, more than when I introduced a JSON parser. Also try tweaking the batch time on the StreamingContext; it made quite a bit of difference for me. Also, how many local workers do you have?
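As one concrete thing to try on the case-matching side, here is a hedged sketch of the parse function with the regex compiled once (instead of once per entry) and the partial pattern match replaced by findFirstMatchIn; it assumes the same Data case class and DStream imports as the question and leaves the rest of the job unchanged:

def parse(stream: DStream[String]): DStream[Data] = {
  // Compile the regex once, not for every entry.
  val pattern = """\s*[(](.+)[,]\s*([0-9]+)+\s*[)]\s*[)]\s*[,|\.]\s*""".r
  stream.flatMap { line =>
    line.split("assert")
      .filter(_.nonEmpty)
      .flatMap { entry =>
        // Option-based extraction: entries that don't match are dropped
        // instead of throwing a MatchError.
        pattern.findFirstMatchIn(entry).map(m => Data(m.group(1), m.group(2).toInt))
      }
  }
}

Whether this helps noticeably depends on your data, so measure it against the original parse before keeping it.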

LINQ DataLoadOptions.LoadWith

This is the code I have for loading my data entities.
DataLoadOptions dlo = new DataLoadOptions();
dlo.LoadWith<msPlaylistItem>(m => m.tbMedia);
dlo.LoadWith<tbMedia>(a => a.tbArtists);
dlo.LoadWith<msNote>(n => n.tbMedia.msNotes);
db.LoadOptions = dlo;
dlo.LoadWith<msNote>(n => n.tbMedia.msNotes); is the line I am having a problem with. It produces the error "The expression specified must be of the form p.A, where p is the parameter and A is a property or field member."
What I am trying to do is load the notes that are related to the each tbMedia object.
This is the correct line:
dlo.AssociateWith<tbMedia>(t => t.msNotes.Where(n => n.MediaId == n.tbMedia.id));
