Optimizing SPARQL queries on Fuseki using ttl - ajax

I am generating a TTL file with Apache Jena and querying it through Apache Fuseki, but I am noticing that the more data I add, the slower the queries become. The results are displayed on a webpage via an AJAX call from JavaScript. Is there a better configuration, or a way I can optimize the queries? I know for a fact that the TTL file is going to keep growing as more data is added.
For example, this is one of the queries I am using:
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX locah: <http://data.archiveshub.ac.uk/def/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>

SELECT ?Register ?Id ?Date ?Type ?Lang ?TypeOfCover ?Spine ?Note (COUNT(?Deed) AS ?TotalDeeds)
WHERE {
  ?Register dcterms:creator ?Author .
  ?Register dcterms:identifier ?Id .
  ?Register dcterms:type ?Type .
  OPTIONAL { ?Register locah:dateCreatedAccumulatedString ?Date . }
  OPTIONAL { ?Register <http://natarchives.com.mt/notarypedia/ontology#coverType> ?TypeOfCover . }
  OPTIONAL { ?Register <http://natarchives.com.mt/notarypedia/ontology#spineWidth> ?Spine . }
  OPTIONAL { ?Register <http://www.w3.org/2004/02/skos/core#note> ?Note . }
  OPTIONAL { ?Register ore:aggregates ?Deed . }
  OPTIONAL { ?Register dcterms:language ?Lang . }
  ?Author foaf:firstName "Giacomo" .
  ?Author foaf:familyName "Zabbara" .
}
GROUP BY ?Register ?Id ?Date ?Type ?Lang ?TypeOfCover ?Spine ?Note
This query takes about 10 seconds to return its results, which is quite annoying for someone who wants to display them immediately in a UI.
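For reference, the AJAX side is just a standard SPARQL-over-HTTP request against the Fuseki endpoint; a minimal sketch using fetch (the host and the /notarypedia/sparql dataset path are placeholders, not my actual setup):
// POST the SELECT query above to Fuseki and ask for JSON results.
const sparqlQuery = `...`;  // the full SELECT query shown above
const endpoint = "http://localhost:3030/notarypedia/sparql";  // placeholder dataset path

fetch(endpoint, {
  method: "POST",
  headers: {
    "Content-Type": "application/x-www-form-urlencoded",
    "Accept": "application/sparql-results+json"
  },
  body: "query=" + encodeURIComponent(sparqlQuery)
})
  .then(response => response.json())
  .then(data => {
    // data.results.bindings is the array of result rows rendered in the UI
    console.log(data.results.bindings.length, "rows");
  });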

Related

Mindmeld Elasticsearch index and QuestionAnswerer

I am using the Mindmeld blueprint application (kwik_e_mart) to understand how the QuestionAnswerer retrieves data from the relevant knowledge base data file (I am new to Mindmeld, OOP and Elasticsearch).
See code snippet below:
from mindmeld.components import QuestionAnswerer

config = {"model_type": "keyword"}
qa = QuestionAnswerer(app_path='kwik_e_mart', config=config)
qa.load_kb(app_namespace='kwik_e_mart', index_name='stores',
           data_file='kwik_e_mart/data/stores.json',
           app_path='kwik_e_mart', config=config, clean=True)
Output - Loading Elasticsearch index stores: 100%|██████████| 25/25 [00:00<00:00, 495.28it/s]
Output - Loaded 25 documents
Although Elasticsearch is able to load all 25 documents (see output above), I am unable to retrieve any document with an index greater than 9.
stores = qa.get(index='stores')
stores[0]
Output: - {'address': '23 Elm Street, Suite 800, Springfield, OR, 97077',
'store_name': '23 Elm Street',
'open_time': '7:00',
'location': {'lon': -123.022029, 'lat': 44.046236},
'phone_number': '541-555-1100',
'id': '1',
'close_time': '19:00',
'_score': 1.0}
However, stores[10] gives an error:
stores[10]
Output: - IndexError Traceback (most recent call last)
<ipython-input-12-08132a2cd460> in <module>
----> 1 stores[10]
IndexError: list index out of range
I am not sure why documents at an index higher than 9 are unreachable. My understanding is that the Elasticsearch index is still pointing to the remote blueprint data (http/middmeld/blueprint...) and not to the local folder.
Not sure how to resolve this. Any help is much appreciated.
By default, the get() method only returns 10 records per search - so only stores[0] through stores[9] will be valid.
You can add the size= option to your get() to increase the number of records it returns:
stores = qa.get(index='stores', size=25)
See the bottom of the QuestionAnswerer section in the Mindmeld documentation for more info.
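With size=25 the full list should come back, so indexing past 9 works; a quick check (hypothetical, using the same objects as above):
stores = qa.get(index='stores', size=25)
print(len(stores))   # should now be 25
print(stores[10])    # no longer raises IndexError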

Dumping Beam-scores to an HDF File

I have a translation model (TM) which synthesizes its hypotheses using beam search. For analysis purposes, I would like to study all hypotheses in each beam emitted by the TM’s ChoiceLayer. I’m able to fetch the hypotheses for each input sequence from the TM’s ChoiceLayer and write them to my file system, using the HDFDumpLayer:
'__SEARCH_dump_beam__': {
    'class': 'hdf_dump',
    'from': ['output'],
    'filename': "<my-path>/beams.hdf",
    'is_output_layer': True
}
But besides the hypotheses, I would also like to store the score of each hypothesis. I’m able to fetch the beam scores from the ChoiceLayer using a ChoiceGetBeamScoresLayer, but I was not able to dump the scores using an HDFDumpLayer:
'get_scores': {'class': 'choice_get_beam_scores', 'from': ['output']},
'__SEARCH_dump_scores__': {
    'class': 'hdf_dump',
    'from': ['get_scores'],
    'filename': "<my-path>/beam_scores.hdf",
    'is_output_layer': True
}
Running the config like this makes RETURNN complain about the ChoiceGetBeamScoresLayer output not having a time axis:
Exception creating layer root/'__SEARCH_dump_scores__' of class HDFDumpLayer with opts:
{'filename': '<my-path>/beam_scores.hdf',
'is_output_layer': True,
'name': '__SEARCH_dump_scores__',
'network': <TFNetwork 'root' train=False search>,
'output': Data(name='__SEARCH_dump_scores___output', shape=(), time_dim_axis=None, beam=SearchBeam(name='output/output', beam_size=12, dependency=SearchBeam(name='output/prev:output', beam_size=12)), batch_shape_meta=[B]),
'sources': [<ChoiceGetBeamScoresLayer 'get_scores' out_type=Data(shape=(), time_dim_axis=None, beam=SearchBeam(name='output/output', beam_size=12, dependency=SearchBeam(name='output/prev:output', beam_size=12)), batch_shape_meta=[B])>]}
Unhandled exception <class 'AssertionError'> in thread <_MainThread(MainThread, started 139964674299648)>, proc 31228.
...
File "<...>/returnn/repository/returnn/tf/layers/basic.py", line 6226, in __init__
line: assert self.sources[0].output.have_time_axis()
locals:
self = <local> <HDFDumpLayer '__SEARCH_dump_scores__' out_type=Data(shape=(), time_dim_axis=None, beam=SearchBeam(name='output/output', beam_size=12, dependency=SearchBeam(name='output/prev:output', beam_size=12)), batch_shape_meta=[B])>
self.sources = <local> [<ChoiceGetBeamScoresLayer 'get_scores' out_type=Data(shape=(), time_dim_axis=None, beam=SearchBeam(name='output/output', beam_size=12, dependency=SearchBeam(name='output/prev:output', beam_size=12)), batch_shape_meta=[B])>]
output = <not found>
output.have_time_axis = <not found>
AssertionError
I tried to alter the shape of the score data using ExpandDimsLayer and EvalLayer, with several different configurations, but those all led to different errors.
I’m sure I am not the first person trying to dump beam scores. Can anybody tell me how to do that properly?
To answer the question:
The HDFDumpLayer assertion you are hitting simply means that this is not implemented yet (dumping data without a time axis is not supported).
You could create a pull request and add this support to HDFDumpLayer.
If you just want to dump this information in any way, not necessarily via HDFDumpLayer, there are a couple of other options:
With task="search", point search_output_layer at a layer which still includes the beam information, i.e. not at the output of a DecideLayer.
That will simply dump all hypotheses including their beam scores (see the sketch below).
Alternatively, use a custom EvalLayer and dump the scores in whatever way you want.
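A minimal sketch of the first option as global config entries (the layer name "output" and the output path are assumptions here, and the exact option names should be checked against your RETURNN version):
task = "search"
# Point search_output_layer at the ChoiceLayer itself ("output" in this sketch),
# not at a DecideLayer, so every hypothesis in the beam is kept together with its score.
search_output_layer = "output"
search_output_file = "<my-path>/beams_with_scores.py"   # placeholder path
search_output_file_format = "py"  # Python-literal dump, one entry per input sequence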

SDO_UTIL.TO_GMLGEOMETRY returns an srsName starting with SDO: instead of the expected EPSG:

We have code which builds a WFS request to fetch some open data. In this code we use the Oracle function SDO_UTIL.TO_GMLGEOMETRY to complete the request with the necessary GML data.
The query used is something like:
select
sdo_util.to_gmlgeometry(geom)
from geom_table;
Result should be something like:
<gml:Polygon srsName="EPSG:28992" xmlns:gml="http://www.opengis.net/gml">
  <gml:outerBoundaryIs>
    <gml:LinearRing>
      <gml:coordinates decimal="." cs="," ts=" ">185610.0,320243.0 185620.0,320243.0 185620.0,320233.0 185610.0,320233.0 185610.0,320243.0 </gml:coordinates>
    </gml:LinearRing>
  </gml:outerBoundaryIs>
</gml:Polygon>
However, in some cases the srsName in the result comes out as "SDO:28992" instead of "EPSG:28992".
Does anyone know how to make sure the srsName is returned with the EPSG prefix?
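One pragmatic workaround, sketched here under the assumption that the underlying SRID mapping itself cannot be changed, is to post-process the generated GML and rewrite the prefix (table and column names as in the query above):
select replace(sdo_util.to_gmlgeometry(geom), 'srsName="SDO:', 'srsName="EPSG:') as gml
from geom_table;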

How to measure moped insert runtime in a ruby mongoid app?

I'm currently writing a Moped log parser in order to monitor Moped query runtimes.
It works great for the QUERY command, using the runtime parameter, but INSERT and UPDATE have no runtime parameter. Every INSERT and UPDATE is followed by a getLastError COMMAND which contains a runtime.
Here are some samples of moped logs:
QUERY with runtime
MOPED: 127.0.0.1:27017 QUERY database=X collection=X selector=X
flags=[] limit=-1 skip=0 batch_size=nil fields=nil runtime: 0.6950ms
INSERT without runtime but with COMMAND
MOPED: 127.0.0.1:27017 INSERT database=X collection=X documents=X flags=[]
COMMAND database=X command={:getlasterror=>1, :w=>1}
runtime: 0.4750ms
I'm pretty sure that the COMMAND runtime is for the getlasterror call and not for my INSERT.
So is there a way to get this runtime info for an INSERT query?
Instead of using a log parser, I use something like this and it works great:
ActiveSupport::Notifications.subscribe('query.moped') do |name, start, finish, id, payload|
  runtime = (finish - start) * 1000  # milliseconds
  moped_ops = payload[:ops]
  moped_ops.each do |op|
    unless op.collection == '$cmd'
      query = op.class.name.split('::').last.downcase
      query = op.selector.first[0].to_s.gsub(/\$/, '') if query == 'command'
      # do something with "#{op.database}.#{op.collection}.#{query}" and runtime, e.g.:
      puts "#{op.database}.#{op.collection}.#{query}: #{runtime.round(4)}ms"
    end
  end
end
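The runtime here is measured by ActiveSupport instrumentation around each batch of operations, so INSERT and UPDATE get a runtime even though the corresponding Moped log lines carry none.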

With Zabbix API, how do I get the values of items/resources rather than just the ID's?

I have some data in a Custom Screen in Zabbix, and would like to pull the data from the screen via the API. I'm using this Ruby gem: https://github.com/express42/zabbixapi
I'm able to successfully connect and query, but the results I'm getting are not very useful:
p zbx.query(
  :method => "item.get",
  :params => {
    :itemids => "66666",
    :output => "extend"
  }
)
# [{"itemid"=>"66666", "type"=>"0", "snmp_community"=>"", "snmp_oid"=>"", "hostid"=>"77777", "name"=>"Fro Packages", "key_"=>"system.sw.packages[davekey1|davekey2|davekey3|davekey4]", "delay"=>"300", "history"=>"90", "trends"=>"365", "status"=>"0", "value_type"=>"1", "trapper_hosts"=>"", "units"=>"", "multiplier"=>"0", "delta"=>"0", "snmpv3_securityname"=>"", "snmpv3_securitylevel"=>"0", "snmpv3_authpassphrase"=>"", "snmpv3_privpassphrase"=>"", "formula"=>"1", "error"=>"", "lastlogsize"=>"0", "logtimefmt"=>"", "templateid"=>"88888", "valuemapid"=>"0", "delay_flex"=>"", "params"=>"", "ipmi_sensor"=>"", "data_type"=>"0", "authtype"=>"0", "username"=>"", "password"=>"", "publickey"=>"", "privatekey"=>"", "mtime"=>"0", "flags"=>"0", "filter"=>"", "interfaceid"=>"25", "port"=>"", "description"=>"", "inventory_link"=>"0", "lifetime"=>"30", "snmpv3_authprotocol"=>"0", "snmpv3_privprotocol"=>"0", "state"=>"0", "snmpv3_contextname"=>""}]
You can see that it's returning a bunch of IDs and metadata for the item, including the correct key, but I can't seem to get the actual plain-text values, which is the data I'm interested in.
I started with the screen_id, then got the screenitem_id, now the item_id, but I don't seem to be getting any closer to what I want!
Thanks for any help
Getting items or getting hosts means getting their description, not the data. You are after history (see the sketch below). Reading the actual Zabbix user manual and API docs is highly recommended.
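For example, with the same gem the recent values for that item can be pulled via history.get; a minimal sketch (history => 1 matches the item's value_type of "1", i.e. character data):
p zbx.query(
  :method => "history.get",
  :params => {
    :itemids => "66666",
    :history => 1,            # 1 = character values, matching the item's value_type above
    :output => "extend",
    :sortfield => "clock",
    :sortorder => "DESC",     # newest first
    :limit => 5
  }
)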
