What does the merge parameter of a token do? - ace-editor

In the ACE overview on creating syntax highlighters, under the section 'Defining States', the token definitions include the parameter merge: true. But I cannot find an explanation of what this does. What is its purpose?
Excerpt from example:
this.$rules = {
    "start" : [ {
        token : "text",
        merge : true,
        regex : "<\\!\\[CDATA\\[",
        next : "cdata"
    } ],
    "cdata" : [ {
        token : "text",
        regex : "\\]\\]>",
        next : "start"
    }, {
        token : "text",
        merge : true,
        regex : "\\s+"
    }, {
        token : "text",
        merge : true,
        regex : ".+"
    } ]
};

Figured out by trial and error: Setting the merge property to true in a token will cause the token to be merged with the following token, both in the token list in memory and as a rendered span in the UI DOM, but only if the following token also evaluates to the same token type. I'm using this to merge the compound SQL tokens IS NULL and IS NOT NULL:
In the following rules, the 3 tokens [is + (spaces) + null], or the 5 tokens [is + (spaces) + not + (spaces) + null], will be merged into a single token. If is is followed by something other than not or null, that something will be flagged as invalid. It will not be merged because, even though the previous token is still in merge mode, the resulting token class (invalid) for the next token is different.
this.$rules = {
    "start": [{
        token : "keyword.operator",
        regex: "[iI][sS]\\b",
        merge: true,
        next: "is_keyword"
    }],
    "is_keyword": [{
        token: "keyword.operator",
        regex: "\\s+",
        merge: true
    }, {
        token: "keyword.operator",
        regex: "[nN][oO][tT]\\b",
        merge: true
    }, {
        token: "keyword.operator",
        regex: "[nN][uU][lL][lL]\\b",
        next: "start"
    }, {
        token: "invalid",
        regex: ".+",
        next: "start"
    }]
};
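For completeness, here is a minimal sketch of where rules like these typically live in an ACE mode: a highlight-rules module that extends TextHighlightRules. The module paths and the MySqlHighlightRules name are illustrative assumptions, not part of the original example.
define(function(require, exports, module) {
    var oop = require("../lib/oop");
    var TextHighlightRules = require("./text_highlight_rules").TextHighlightRules;

    var MySqlHighlightRules = function() {
        // The $rules object from the example above goes here.
        this.$rules = {
            "start": [{
                token: "keyword.operator",
                regex: "[iI][sS]\\b",
                merge: true,
                next: "is_keyword"
            }]
            // ... remaining states ("is_keyword", etc.) as shown above
        };
    };

    oop.inherits(MySqlHighlightRules, TextHighlightRules);
    exports.MySqlHighlightRules = MySqlHighlightRules;
});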

Related

How to use a column name for data in an array inside an object from a JSON response in DataTables ajax?

I'm trying to figure out how to write the column name for the data when the data is an array inside an object.
If the returned JSON looks like:
"data" : {
    "name" : "aaa"
}
I would use this code:
ajax : {
    url : url,
    type : 'GET'
},
"scrollX" : true,
destroy : true,
columns : [ {
    data : 'name'
} ]
But what if the returned JSON looks like this?
"data" : {
    "detail" : [
        {
            "name" : "abc"
        }
    ]
}
I tried to write the code below, but it does not work. Can someone help me with this issue?
ajax : {
    url : url,
    type : 'GET'
},
"scrollX" : true,
destroy : true,
columns : [ {
    data.detail : 'name'
} ]
This is where the dataSrc option for your DataTables ajax can be used:
ajax : {
    url : url,
    type : 'GET',
    dataSrc: 'data.detail'
},
This option tells DataTables that the iteration through your JSON should start at the data.detail array of objects.
Normally (as in your first JSON example) you do not need this directive, as the starting point is assumed to be data. This is a DataTables extension to the standard set of jQuery ajax options. You can read more about it here.
The relevant part is:
you can use Javascript dotted object notation to get a data source for multiple levels of object / array nesting.
EDIT:
Just to clarify: After making the above change, you can use your original approach again to identify column data:
columns : [
    { data : 'name' },
    { data : 'another_name' },
    { ... }
]
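Putting the two pieces together, a minimal sketch of the full initialization (assuming a table element with id example and that each object in data.detail has a name field):
$('#example').DataTable({
    ajax: {
        url: url,
        type: 'GET',
        dataSrc: 'data.detail'   // start iterating at the nested data.detail array
    },
    scrollX: true,
    destroy: true,
    columns: [
        { data: 'name' }         // 'name' is a field of each object inside data.detail
    ]
});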

How do I implement a PatternAnalyzer in elastic4s and Elasticsearch to exclude results with a certain field value?

I'm trying to perform a query on my index and get all reviews that do NOT have a reviewer with a gravatar image. To do this I have implemented a PatternAnalyzerDefinition with a host pattern:
"^https?\\:\\/\\/([^\\/?#]+)(?:[\\/?#]|$)"
that should match and extract the host of URLs, so that:
https://www.gravatar.com/avatar/blablalbla?s=200&r=pg&d=mm
becomes:
www.gravatar.com
The mapping:
clientProvider.getClient.execute {
    create.index(_index).analysis(
        phraseAnalyzer,
        PatternAnalyzerDefinition("host_pattern", regex = "^https?\\:\\/\\/([^\\/?#]+)(?:[\\/?#]|$)")
    ).mappings(
        "reviews" as (
            .... Cool mappings
            "review" inner (
                "grade" typed LongType,
                "text" typed StringType index "not_analyzed",
                "reviewer" inner (
                    "screenName" typed StringType index "not_analyzed",
                    "profilePicture" typed StringType analyzer "host_pattern",
                    "thumbPicture" typed StringType index "not_analyzed",
                    "points" typed LongType index "not_analyzed"
                ),
                .... Other cool mappings
            )
        ) all(false)
} map { response =>
    Logger.info("Create index response: {}", response)
} recover {
    case t: Throwable => play.Logger.error("Error creating index: ", t)
}
The query:
val reviewQuery = (search in path)
    .query(
        bool(
            must(
                not(
                    termQuery("review.reviewer.profilePicture", "www.gravatar.com")
                )
            )
        )
    )
    .postFilter(
        bool(
            must(
                rangeFilter("review.grade") from 3
            )
        )
    )
    .size(size)
    .sort(by field "review.created" order SortOrder.DESC)

clientProvider.getClient.execute {
    reviewQuery
}.map(_.getHits.jsonToList[ReviewData])
Check the index for the mapping:
reviewer: {
    properties: {
        id: {
            type: "long"
        },
        points: {
            type: "long"
        },
        profilePicture: {
            type: "string",
            analyzer: "host_pattern"
        },
        screenName: {
            type: "string",
            index: "not_analyzed"
        },
        state: {
            type: "string"
        },
        thumbPicture: {
            type: "string",
            index: "not_analyzed"
        }
    }
}
When I perform the query, the pattern matching does not seem to work. I still get reviews whose reviewer has a gravatar image.
What am I doing wrong? Maybe I have misunderstood the PatternAnalyzer?
I'm using
"com.sksamuel.elastic4s" %% "elastic4s" % "1.5.9",
I guess once again RTFM is in order here:
The docs state:
IMPORTANT: The regular expression should match the token separators, not the tokens themselves.
This means that in my case the matched token www.gravatar.com will not be part of the tokens after analyzing the field.
Instead use the Pattern Capture Token Filter
First declare a new CustomAnalyzerDefinition:
val hostAnalyzer = CustomAnalyzerDefinition(
    "host_analyzer",
    StandardTokenizer,
    PatternCaptureTokenFilter(
        name = "hostFilter",
        patterns = List[String]("^https?\\:\\/\\/([^\\/?#]+)(?:[\\/?#]|$)"),
        preserveOriginal = false
    )
)
Then add the analyzer to the field:
"review" inner (
"reviewer" inner (
"screenName" typed StringType index "not_analyzed",
"profilePicture" typed StringType analyzer "hostAnalyzer",
"thumbPicture" typed StringType index "not_analyzed",
"points" typed LongType index "not_analyzed"
)
)
create.index(_index).analysis(
someAnalyzer,
phraseAnalyzer,
hostAnalyzer
).mappings(
And voilà, it works. A very handy tool for inspecting the tokens stored in the index is calling:
/[index]/[collection]/[id]/_termvector?fields=review.reviewer.profilePicture&pretty=true
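For reference, the elastic4s definitions above should translate to roughly the following Elasticsearch analysis settings (a sketch; the exact JSON produced may differ slightly between versions):
"settings": {
    "analysis": {
        "filter": {
            "hostFilter": {
                "type": "pattern_capture",
                "preserve_original": false,
                "patterns": ["^https?\\:\\/\\/([^\\/?#]+)(?:[\\/?#]|$)"]
            }
        },
        "analyzer": {
            "host_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["hostFilter"]
            }
        }
    }
}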

RethinkDB - delete a nested object

I am trying to delete a nested object, but instead of disappearing it is being replaced by an empty object.
Here's the structure of my documents:
[
    {
        "id": "0",
        "name": "Employee 1",
        "schedules": [
            {"Weekdays": "yes"},
            {"Weekends": "no"}
        ]
    },
    {
        "id": "1",
        "name": "Employee 2",
        "schedules": [
            {"Weekdays": "no"},
            {"Weekends": "yes"}
        ]
    }
]
Let's say I want to delete "Weekends". Here's my code:
r.db('shank').table('teachers').replace(r.row.without({'schedules': 'Weekends'})).run(connection, function(err, result) {
    if (err) throw err;
    // confirmation stuff
});
Now if I look at my table, the documents have this:
"schedules": [
{"Weekdays": "yes"},
{}
]
I also tried changing it to follow the syntax described here, by making it:
r.row.without({'schedules': {'Weekends'}})
but I got an "unexpected token '}'" error. Any idea what's up?
This should work:
r.db('test').table('test').get('0').update(function(doc) {
    return doc.merge({
        schedules: doc("schedules").filter(function(schedule) {
            return schedule.hasFields('Weekends').not()
        })
    })
})
The field schedules is an array; is that expected? Is there a reason why it's not just an object with two fields, Weekends and Weekdays? That would make things way easier.
Your last attempt is close to what worked for me; you just need to add a true in the nested JSON object:
r.row.without({'schedules': {'Weekends': true}})
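For completeness, here is the accepted filter-based approach generalized to every document (a sketch, reusing the db/table names from the question):
// Remove the "Weekends" entry from the schedules array of every teacher.
r.db('shank').table('teachers').update(function(doc) {
    return {
        schedules: doc('schedules').filter(function(schedule) {
            return schedule.hasFields('Weekends').not();
        })
    };
}).run(connection, function(err, result) {
    if (err) throw err;
    // result.replaced reports how many documents were changed
});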

Mongo DB MapReduce: Emit key from array based on condition

I am new to MongoDB, so excuse me if this is rather trivial. I would really appreciate the help.
The idea is to generate a histogram over some specific values, in this case the MIME types of some files. For that I am using a map-reduce job.
I have a MongoDB collection with documents of the following form:
{
    "_id" : ObjectId("4fc5ed3e67960de6794dd21c"),
    "name" : "some name",
    "uid" : "some app specific uid",
    "collection" : "some name",
    "metadata" : [
        {
            "key" : "key1",
            "value" : "Plain text",
            "status" : "SINGLE_RESULT"
        },
        {
            "key" : "key2",
            "value" : "text/plain",
            "status" : "SINGLE_RESULT"
        },
        {
            "key" : "key3",
            "value" : 3469,
            "status" : "OK"
        }
    ]
}
Please note that almost every document contains more metadata key/value entries than shown here.
Map Reduce job
I tried doing the following:
function map() {
    var mime = "";
    this.metadata.forEach(function (m) {
        if (m.key === "key2") {
            mime = m.value;
        }
    });
    emit(mime, {count: 1});
}

function reduce(key, values) {
    var res = {count: 0};
    values.forEach(function (v) { res.count += v.count; });
    return res;
}

db.collection.mapReduce(map, reduce, {out: {inline: 1}})
This seems to work for a small number of documents (~15K), but the problem is that iterating through all metadata key/value entries takes a lot of time during the mapping phase. When running this on more documents (~1 million), the operation takes forever.
So my question is:
Is there some way in which I can emit the MIME type (the value) directly instead of iterating through all keys and selecting it? Or is there a better way to write the map-reduce functions?
Something like emit (this.metadata.value {$where this.metadata.key:"key2"}) or similar...
Thanks for your help!
Two thoughts ...
First thought: How attached are you to this document schema? Could you instead have the metadata field value as an embedded document rather than an embedded array, like so:
{
    "_id" : ObjectId("4fc5ed3e67960de6794dd21c"),
    "name" : "some name",
    "uid" : "some app specific uid",
    "collection" : "some name",
    "metadata" : {
        "key1" : {
            "value" : "Plain text",
            "status" : "SINGLE_RESULT"
        },
        "key2" : {
            "value" : "text/plain",
            "status" : "SINGLE_RESULT"
        },
        "key3" : {
            "value" : 3469,
            "status" : "OK"
        }
    }
}
Then your map step does away with the loop entirely:
function map() {
    emit( this.metadata["key2"].value, { count : 1 } );
}
At that point, you might even be able to cast this as a "group" command rather than a "mapReduce".
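If you are on a MongoDB version with the aggregation framework, here is a hedged alternative sketch (assuming the restructured schema above, not the group command itself) that avoids map-reduce entirely:
// Histogram of MIME types: one result document per distinct metadata.key2.value
db.collection.aggregate([
    { $group: { _id: "$metadata.key2.value", count: { $sum: 1 } } }
])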
Second thought: Absent a schema change like that, particularly if "key2" appears early in the metadata array, you could at least exit the loop early once the key is found to save yourself some iterations. Note that forEach cannot be broken out of, so a plain for loop is used, like so:
function map() {
    var mime = "";
    for (var i = 0; i < this.metadata.length; i++) {
        if (this.metadata[i].key === "key2") {
            mime = this.metadata[i].value;
            break; // stop scanning once key2 has been found
        }
    }
    emit(mime, {count: 1});
}
Not sure if either path is the key to victory, but hopefully helpful thoughts. Best of luck!

Check for id existence in param Array with Elasticsearch custom script field

Is it possible to add a custom script field that is a Boolean and returns true if the document's id exists in an array that is sent as a param?
Something like this https://gist.github.com/2437370
What would be the correct way to do this with mvel?
Update:
Having trouble getting it to work as specified in Imotov's answer.
Mapping:
place: {
    properties: {
        _id: { index: "not_analyzed", store: "yes" }
    }
}
Sort:
:sort=>{:_script=>{:script=>"return friends_visits_ids.contains(_fields._id.value)", :type=>"string", :params=>{:friends_visits_ids=>["4f8d425366eaa71471000011"]}, :order=>"asc"}}
I don't get any errors; the documents just don't get sorted right.
Update 2
Oh, and I do get this back on the documents:
"sort"=>["false"]
You were on the right track. It just might be more efficient to store the list of ids in a map instead of an array if the list is large.
"sort" : {
"_script" : {
"script" : "return friends_visits_ids.containsKey(_fields._id.value)",
"type" : "string",
"params": {
"friends_visits_ids": { "1" : {}, "2" : {}, "4" : {}}
}
}
}
Make sure that the id field is stored; otherwise _fields._id.value will return null for all records.
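Applied to the script field asked about in the question (rather than a sort), the same idea would look roughly like this (a sketch; the is_friend field name and the match_all query are illustrative):
{
    "query": { "match_all": {} },
    "script_fields": {
        "is_friend": {
            "script": "friends_visits_ids.containsKey(_fields._id.value)",
            "params": {
                "friends_visits_ids": { "1": {}, "2": {}, "4": {} }
            }
        }
    }
}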
