UTF-8 formatting in SPARQL - utf-8

How can I "say" to SPARQL that ?churchname is in UTF-8 formatting? because response is like:Pražský hrad
PREFIX lgv: <http://linkedgeodata.org/vocabulary#>
PREFIX abc: <http://dbpedia.org/class/yago/>
SELECT ?churchname
WHERE
{
<http://dbpedia.org/resource/Prague> geo:geometry ?gm .
?church a lgv:castle .
?church geo:geometry ?churchgeo .
?church lgv:name ?churchname .
FILTER ( bif:st_intersects (?churchgeo,?gm, 10))
}
GROUP BY ?churchname
ORDER BY ?churchname

Bit of a non-answer, I'm afraid: there is no way to do this in SPARQL. SPARQL works on character data (not bytes) so encoding isn't something it's concerned with.
There are a couple of reason you might have this problem. Firstly you might be handling the results incorrectly. Check whether the raw results do indeed have the encoding problem.
If you're doing the right thing then problem you are seeing is that the data is broken, and it certainly looks like a encoding issue has crept in upstream.
Your options are:
See if the SPARQL endpoint supports an extension function to change the encoding. I'm fairly sure this doesn't exist, but perhaps someone from virtuoso (which it looks like you're using) knows better.
Fix the results when you get them. Suboptimal, I know,

Related

DMQL2 Query Syntax for PHRets v2 Seach() to include filter arguments?

(It's been a while since I've been here.)
I've been using the first version of PHRets v1 for years, and understood it well enough to get by, but now I'm trying to understand the advantages of v2.6.2. I've got it all installed and the basics are working fine. My issues are pretty much with comprehending fine points of query syntax that goes into the rets=>Search() statement. (I'm much more familiar with SQL statements). Specifically, I'd like to have a query return a list of properties, EXCLUDING those which already have the status of "Sold".
Here's where I am stuck: If I start with this
`$results = $rets->Search('Property', 'A','*',['Select' => 'LIST_8,LIST_105,LIST_15,LIST_19,listing_office_shortid']);`
That works well enough. BUT I'd like to fit in a filter like:
"LIST_15 != Sold", or "NOT LIST_15=Sold"...something like that. I don't get how to fit/type that into a PHRets Search().
I like PHRets but it is so hard to find well-organized/complete documentation about specific things like this. Thanks in advance.
As in my comment above I've figured out that the filter goes in the third argument position ('*', as in the original question). The tricky thing was having to find a specific "sold" code for each class of properties and placing it in that position like so: '(LIST_15=~B4ZIT1Y75TZ)', (notice the =~ combination of characters that means "does not equal" in this context). I've found the code strings for each of the property types (not clear WHY they would need to be unique for each type of property: "Sold" is Sold for any type, after all) but the correct code for a single-family residential property (type 'A' ...at least for the MLS in which I have to search is:
$results = $rets->Search('Property', 'A','(LIST_15=~B4ZIT1Y75TZ)',['Select' => 'LIST_8,LIST_105,LIST_15,LIST_19,listing_office_shortid']);
(again, the code to go with LIST_15 will be different for the different types of properties.) I think there is a better answer that involves more naturalistic language, but this works and I guess I will have to be satisfied with it for now. I hope this is of some use to anyone else struggling with this stuff.

socket.io with strange timestamp format (?)

I see requests to socket.io containing parameter t to be like LZywzeV, LZz5lk7 and similar.
All examples that i found so far used second- or millisecond-based UNIX timestamps.
Has anyone ever seen a timestamp format like this? (It is not base64-encoded).
I started looking a site that uses Socket.io today, and got the same problem, trying to look for the protocol definition was useless.
I figured this format is something called yeast
TBH, really don't know why people invent this sort of things instead of use
base64(timestamp.getBytes())
pseudocode instead.
A yeast decode algorithm in Python is as follow:
from datetime import datetime
a='0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_'
b={a[i]: i for i in range(len(a))}
c=0
for d in "LZywzeV":
c=c*64+b[d]
print(c)
print(datetime.fromtimestamp(c/1000))
The output of that code is:
1481712065055
2016-12-14 07:41:05
to #jeremoquai:
It is easy, is matter of invert the algorithm:
def yeast(d):
r=""
while d!=0:
r='0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_'[d&63]+r
d>>=6
return r
so, if you run
yeast(1481712065055)
it returns LZywzeV

Primitive type as data structure for API Blueprint

I want to use primitive type for describe data structure. Like so:
# Data Structures
## Video Delete (enum[number])
+ `0` - Successful deletion.
+ `1` - Error occured.
And the output is.
{
"enum": [
1,
0
],
"$schema": "http://json-schema.org/draft-04/schema#"
}
So description is missing. I've tried to put description in different places. I did a lot of things (do not wanna talk about them). Also I've tried to add info to enum values like so:
+ `0` (number) - Successful deletion.
I do not know whether this problem deals with MSON syntax or Aglio generator.
The syntax above is supported by MSON as far as I can tell. The problem is that Aglio doesn't do anything with the description, and when I went to look into adding it I realized that it isn't really supported in JSON Schema. There seem to be two methods people use to get around that fact:
Add the enumerated value descriptions to the main description, the Olio theme 1.6.2 has support for this but the C++ parser seems to still have some bugs around this feature:
## Video Delete (enum[number]) - 0 for success, 1 for error
Use a weird oneOf syntax where you create sets of single enums with a description. I don't recommend this.
Unfortunately the first option requires work on your part and can't easily be done in Aglio. Does anyone else have a better description and some samples of MSON input -> JSON Schema output?

Find HTML Tags in Properties

My current issue is to find HTML-Tags inside of property values. I thought it would be easy to search with a query like /jcr:root/content/xgermany//*[jcr:contains(., '<strong>')] order by #jcr:score
It looks like there is a problem with the chars < and > because this query finds everything which has strong in it's property. It finds <strong>Some Text</strong> but also This is a strong man.
Also the Query Builder API didn't helped me.
Is there a possibility to solve it with a XPath or SQL Query or do I have to iterate through the whole content?
I don't fully understand why it finds This is a strong man as a result for '<strong>', but it sounds like the unexpected behavior comes from the "simple search-engine syntax" for the second argument to jcr:contains(). Apparently the < > are just being ignored as "meaningless" punctuation.
You could try quoting the search term:
/jcr:root/content/xgermany//*[jcr:contains(., '"<strong>"')]
though you may have to tweak that if your whole XPath expression is enclosed in double quotes.
Of course this will not be very robust even if it works, since you're trying to find HTML elements by searching for fixed strings, instead of actually parsing the HTML.
If you have an specific jcr:primaryType and the targeted properties you can do something like this
select * from nt:unstructured where text like '%<strong>%'
I tested it , but you need to know the properties you are intererested in.
This is jcr-sql syntax
Start using predicates like a champ this way all of this will make sense to you!
HTML Encode <strong>
HTML Decimal <strong>
Query builder is your friend:
Predicates: (like a CHAMP!)
path=/content/geometrixx
type=nt:unstructured
property=text
property.operation=like
property.value=%<strong>%
Have go here:
http://localhost:4502/libs/cq/search/content/querydebug.html?charset=UTF-8&query=path%3D%2Fcontent%2Fgeometrixx%0D%0Atype%3Dnt%3Aunstructured%0D%0Aproperty%3Dtext%0D%0Aproperty.operation%3Dlike%0D%0Aproperty.value%3D%25%3Cstrong%3E%25
Predicates: (like a CHAMP!)
path=/content/geometrixx
type=nt:unstructured
property=text
property.operation=like
property.value=%<strong>%
Have a go here:
http://localhost:4502/libs/cq/search/content/querydebug.html?charset=UTF-8&query=path%3D%2Fcontent%2Fgeometrixx%0D%0Atype%3Dnt%3Aunstructured%0D%0Aproperty%3Dtext%0D%0Aproperty.operation%3Dlike%0D%0Aproperty.value%3D%25%26lt%3Bstrong%26gt%3B%25
XPath:
/jcr:root/content/geometrixx//element(*, nt:unstructured)
[
jcr:like(#text, '%<strong>%')
]
SQL2 (already covered... NASTY YUK..)
SELECT * FROM [nt:unstructured] AS s WHERE ISDESCENDANTNODE([/content/geometrixx]) and text like '%<strong>%'
Although I'm sure it's entirely possible with a string of predicates, it's possibly heading down the wrong route. Ideally it would be better to parse the HTML when it is stored or published.
The required information would be stored on simple properties on the node in question. The query will then be a lot simpler with just a property = value query, than lots of overly complex query syntax.
It will probably be faster too.
So if you read in your HTML with something like HTMLClient and then parse it with a OSGI service, that can accurately save these properties for you. Every time the HTML is changed the process would update these properties as necessary. Just some thoughts if your SQL is getting too much.

Bestw.d format syntax in SAS

I am converting a character variable to a numeric variable. I am using a bestw.d format. I also tried just best. as the format in the input statement and this worked fine. I cant find any mention of just using best. instead of bestw. in SAS help, though I know from SAS help that the d can be omitted. I have been playing around with using just the best.and I am wondering if there is a default w assigned when just using best..
All formats have a default w. It is not generally good practice to use best. (or <format>.) in most cases, as you should know and plan for the specific width needed, but it always exists.
Also, informat and format have different defaults in many cases where there are identically named informat and format.
In the case of bestw., the default length is 12. See this documentation page for details.
I always find it's worth using a worked example, this shows the different outcomes when using lengths on the BEST. format:
data _NULL_;
a=1031564321300.302;
put '==================================';
put 'Different "BEST" formats';
put '==================================';
put 'BEST8. - ' a best8.;
put 'BEST12. - ' a best12.;
put 'BEST13. - ' a best13.;
put '==================================';
put 'BEST. - ' a best.;
put '==================================';
run;
You can run this in your environment and check the outcome. On my machine it looks like this:
==================================
Different "BEST" formats
==================================
BEST8. - 1.032E12
BEST12. - 1.0315643E12
BEST13. - 1031564321300
==================================
BEST. - 1.0315643E12
==================================
i.e. It looks like BEST12. is the matching format when no width is specified.

Resources