I have tried to install a Virtuoso machine with DBpedia data to run a DBpedia endpoint on my own server. I have followed everything in this guide.
However, every time I try to access the endpoint at http://ec2-ami-public-dns-cname/resource/Bob_Marley or simply http://ec2-ami-public-dns-cname/, I get a CONNECTION_REFUSED response.
I tried the same request with wget from inside the machine and still got the same response.
Why is that?
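A connection that is refused even from inside the machine usually means nothing is listening on that port. A minimal sketch of that check, assuming Python is available on the instance; the helper name is_port_open is hypothetical, and the port to test (80 is implied by the URLs above) is an assumption about how the server is configured:

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise,
        # e.g. ECONNREFUSED when no process is listening on the port.
        return sock.connect_ex((host, port)) == 0
```

Calling is_port_open("127.0.0.1", 80) on the instance itself would show whether any server process is accepting connections there, independent of firewall or security group settings.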
That instance-backed AMI was originally built in 2008 and last substantially updated in 2012; it is based on Virtuoso 6 and populated with DBpedia 3.8. The guide you linked needs a number of related updates, which are in progress.
I think you will likely be much happier today with the current, EBS-backed, Pay-as-you-go AMI, based on Virtuoso 7, and populated with DBpedia 2015 a/k/a DBpedia 3.10.
Also, for future reference, assistance with Virtuoso and other products from OpenLink Software (my employer) is often delivered more quickly and accurately through the Virtuoso Users mailing list, our public Support Forums, and confidential support cases.
This guide misses an important step.
Based on this guide, you still need to configure the clusters by running:
[root#your_machine]# . /opt/virtuoso/bin/virtuoso-mkclusters.sh
Our current software solution uses a local ES installation (1 cluster and 1 node) to store documents so that the user can search them later. Ingestion is not continuous; it happens roughly once a month, using bulk requests. The document set isn't huge and the documents are small. This solution has been working without problems on normal laptop PCs (i5 with 8 GB RAM), since the use case does not require high performance.
Now we're facing 2 new requirements for our software solution:
Should be branded for other customers
The same final user (using the same machine) should be able to work with several instances of our solution (from different customers)
With these 2 new requirements the current solution cannot be used, because all documents would be indexed in the same node using the same index. Searches would then show documents from different customers.
A first approach to solve this issue was to index documents per customer, that is, to create one index per customer and index/search documents on the corresponding index. However, we're thinking of another solution that gives us the following:
ES indexed information must be easily removed from the system (e.g. by removing the data folder)
Each customer may want to use a newer version of our solution (e.g. one that uses ES 7) whereas others will remain on older versions (e.g. ES 6)
Based on this, I think that the solution would be to have several ES installations on the same PC, each one with its customer dependent configuration:
Different cluster
Different node name and port
Different ES version
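A minimal sketch of what such a per-customer configuration could look like, written as a small Python helper that renders the relevant elasticsearch.yml lines. The settings cluster.name, node.name, http.port and path.data are standard Elasticsearch settings; the function name, the naming scheme and the directory layout are assumptions made only for illustration:

```python
from pathlib import PurePosixPath


def render_es_config(customer: str, http_port: int, base_dir: str) -> str:
    """Render elasticsearch.yml content for one customer-specific installation."""
    # Hypothetical layout: <base_dir>/<customer>/data holds that customer's indices,
    # so removing this folder removes that customer's indexed information.
    data_dir = PurePosixPath(base_dir) / customer / "data"
    lines = [
        f"cluster.name: {customer}-cluster",  # separate cluster per customer
        f"node.name: {customer}-node-1",      # separate node name
        f"http.port: {http_port}",            # separate HTTP port
        f"path.data: {data_dir}",             # separate data folder
    ]
    return "\n".join(lines) + "\n"
```

For example, render_es_config("acme", 9201, "/opt/es") would produce a configuration that keeps the hypothetical customer "acme" on its own port and data folder.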
My questions then would be: has anyone faced a similar use case? Would there be performance issues from installing several ES instances and letting their services run continuously at the same time? Which problems could arise from this configuration?
Any help would be appreciated.
UPDATE
Based on the answer received and for possible future answers, I would like to clarify a bit more about the architecture of our solution + ES:
Our solution is a desktop application executed on normal laptop PCs
Single user
Even if more than one customer specific solution is installed in the PC, only 1 will be active at a time
Searches will be executed sporadically when the user wants to search for a specific document (as if someone opens Wikipedia to search for an article)
So topics as ...
Infrastructure failure
Data replication
Performance at high search demand
... are not critical
You can run multiple installations of ES on the same machine in production, but it has a lot of disadvantages.
Ideally, you should have at least 1 replica of each shard, and it should be placed on another physical machine (node) so that the system can recover from an infrastructure failure. This is done to improve the resiliency of your system.
In production, it's common to come across a use case where a single shard is not enough and you need to break your index into multiple primary shards to make it horizontally scalable. If you only use 1 physical server, having multiple shards will not help you.
Having multiple installations also doesn't help when one installation receives a lot of traffic, consumes all the physical resources (RAM, CPU, disk) and brings down all the other installations in production. It also becomes difficult to isolate the root cause and fix the issue quickly, because an ES installation is not stateless: you cannot just start the same installation on another machine without moving all its data and configuration.
Basically, yours is a truly tenant-based SaaS application, and looking at your requirements, you should design your system with the points below in mind:
Upgrading the ES version is sometimes not straightforward and involves a lot of breaking changes in your application code as well, so just having a cluster running the latest version will not solve the problem. Hence your application should expose a tenant (your customer) registration API which also takes the ES version the customer wants to use, and your code should handle it accordingly.
"ES indexed information must be easily removed from the system": I don't see what the issue is here. You can simply delete it using the ES API, which is the recommended way of doing it, instead of doing it manually.
I hope my answer is clear. Let me know if I missed any of your requirements or if you need further clarification.
Based on the update to the question, I am adding the points below:
As the OP mentioned, this is a very small desktop application and not a server-side application, so it's very important not to mix and store the content of each customer together. Anybody can install an ES web admin tool like https://github.com/lmenezes/cerebro and read the data of other customers.
The best solution in your case is to have a single installation of ES, in the version specified by the customer, with just 1 index pertaining to the customer running the desktop application. You can then easily use the delete API, as I mentioned earlier.
There is no need for multiple installations at all. Even when they are not active, they still consume local disk space (which matters even more for a desktop app) and can cause this and this issue. Storing unnecessary information in a desktop app is also not a clean design, and it raises a security issue, which is the much bigger concern in general.
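The "one index per customer plus delete API" idea can be sketched as follows. This is only an illustration under stated assumptions: ES is assumed to listen on its default local HTTP port 9200, the URL paths follow the ES 7 style REST API (PUT /<index>/_doc/<id>, POST /<index>/_search, DELETE /<index>), and the helper names and the "docs-" prefix are hypothetical. The functions only build the HTTP method and URL; sending the request is left to whatever HTTP client the application already uses.

```python
import re

ES_BASE_URL = "http://localhost:9200"  # assumption: default local ES HTTP endpoint


def index_name(customer: str) -> str:
    """Derive a per-customer index name (ES index names must be lowercase)."""
    slug = re.sub(r"[^a-z0-9_-]+", "-", customer.lower()).strip("-")
    return f"docs-{slug}"  # hypothetical naming scheme


def index_request(customer: str, doc_id: str) -> tuple:
    """Method and URL for indexing one document into the customer's index."""
    return ("PUT", f"{ES_BASE_URL}/{index_name(customer)}/_doc/{doc_id}")


def search_request(customer: str) -> tuple:
    """Method and URL for searching only the customer's index."""
    return ("POST", f"{ES_BASE_URL}/{index_name(customer)}/_search")


def delete_index_request(customer: str) -> tuple:
    """Method and URL for removing all indexed data of one customer."""
    return ("DELETE", f"{ES_BASE_URL}/{index_name(customer)}")
```

Because every request is scoped to one index, a search for one customer never touches another customer's documents, and removing a customer's data is a single DELETE on that index.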
I have seen the sample projects on your website for Dexie.Syncable, such as sync-server and sync-client, and they all seem to write to a database directly instead of interacting with a web API. I am looking for a little help on where to get started beyond the examples on the website. The API I am trying to write a gateway for is DreamFactory.
Also, it looks like version 2 beta has had many improvements to Dexie.Syncable.
I would recommend building a new server project based on either WebSocketSyncServer.js or the GitHub repo of sync-server. However, I cannot give details on how to call REST APIs instead of working directly against a database or memory. I would suggest using ES2017 async/await, since your API calls are asynchronous.
Maybe you could try getting more help on https://github.com/nponiros/sync_server by filing an issue there.
I have a REST API.
It offers the services get person, get price, get route.
How can I determine how long each call to each of these services takes?
For example, get person is very fast (5 ms); get route takes 2 s as it needs to make a remote call to the Google API.
I could take the time at the beginning of the request and just before the response is sent, compute the difference and log that to a database.
But that would add quite a bit of overhead, so how would you do it? Would you do it at all, or just rely on on-machine profiling? What tools would you use to minimize overhead?
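For reference, the "timestamp at the start and at the end" approach described above can be sketched like this. It is a minimal illustration in Python, not tied to any particular web framework; the decorator name timed and the in-memory timings list are hypothetical, and writing the records to a database is deliberately left out:

```python
import functools
import time

# Collected (service name, elapsed milliseconds) records; a real system
# would flush these to a log or database in batches rather than per call.
timings = []


def timed(func):
    """Record how long each call to the wrapped service function takes."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs whether the call succeeded or raised an exception.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            timings.append((func.__name__, elapsed_ms))
    return wrapper
```

Decorating a handler such as get_person with @timed adds two clock reads and one list append per call, so the measurement itself is cheap; the cost mostly depends on where the records are written afterwards.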
What I want is to determine if there is any component that in production could have low availability.
Thank you
So it looks like you want 2 things:
Minimal impact on your production environment
Figuring out how much each request takes
In that case I would go for the IIS logs. With Windows Azure Diagnostics you get this out of the box by adding the module and configuring it. As a result, your IIS logs will be stored in your storage account.
After that you can download these logs and use Log Parser to execute some interesting queries that let you find the slowest pages, the pages with the most hits, the pages with the most exceptions, and so on. Log Parser can be a little hard to work with if you have never used it before. Take a look at the blog post by Scott Hanselman covering the Log Parser Lizard GUI tool: Analyze your Web Server Data and be empowered with LogParser and Log Parser Lizard GUI.
This powerful tool can give you all the information you need with minimal impact on your production instances.
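As an illustration of the kind of query meant here, the following Python sketch computes the average time-taken per URL from IIS logs in W3C extended format. It is not Log Parser itself, only a stand-in that shows the idea; it assumes the logs were configured to include the cs-uri-stem and time-taken fields (time-taken is reported in milliseconds), and the function name is hypothetical:

```python
from collections import defaultdict


def average_time_taken(lines):
    """Return (uri, average ms) pairs, slowest first, from W3C log lines."""
    fields = []
    totals = defaultdict(lambda: [0, 0])  # uri -> [sum of ms, request count]
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the columns of the entries that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        uri, taken = row.get("cs-uri-stem"), row.get("time-taken")
        if uri is None or taken is None or not taken.isdigit():
            continue  # skip entries without usable timing data
        totals[uri][0] += int(taken)
        totals[uri][1] += 1
    averages = {uri: total / count for uri, (total, count) in totals.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)
```

Feeding it the lines of a downloaded log file, for example via open(path) as the lines argument, gives a list with the slowest endpoints at the top, which is the same information the Log Parser queries mentioned above would surface.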
Supposedly ReportsAnywhere will talk with MongoDB to generate reports. I am not sure if it is using the JDBC driver or a different method. Can someone help me set up the ReportsAnywhere database driver/connection to MongoDB?
So the ReportsAnywhere author makes mention of building such a product here. Based on his description, this is not using a JDBC driver but pulling data via the native driver.
It looks like he did a presentation at MongoBerlin in October 2010; unfortunately, I cannot find any videos or slides from the presentation. The website is also completely devoid of examples.
Given that ReportsAnywhere is a paid-for product and this is an advertised feature, your best bet may be to go directly to ReportsAnywhere. Maybe you'll get lucky on SO, but I would definitely contact their team directly. Looks like Hans, the creator, is also available on twitter.
Recently I stumbled across mongoDB, couchDB etc.
I am hoping to have a play with this type of database and was wondering how much access to the hosting server one needs to get it running.
If anyone has any knowledge of this, I would love to know whether it can be set up to work when your app is hosted via a 'normal' hosting company.
I use Mongo, and so I'm really only speaking for Mongo, but your typical web hosting environment wouldn't allow you to set up your own database. You'd want root-level (admin) access to the server to set up Mongo. To get that, you'd want something like a VPS or a dedicated server.
However, to just play around with Mongo, I'd recommend downloading the binary for your OS and giving it a run. Their JavaScript shell interface is very easy to use.
Hope that helps!
Tim
Various ways:
1) There are many free MongoDB hosting services available. Try DotCloud.com. Many others are listed here: http://www.cloudhostingguru.com/mongoDB-server-hosting.php
2) If you are asking specifically about shared hosting, the answer is mostly no. But if you can run MongoDB somewhere else (for example with one of the services from the link above) and want to connect from your website, it is probably possible if your host allows you to install your own extensions (for PHP).
3) VPS
How about virtual private server hosting? The host gives you what looks like an entire machine... hard drive, CPU, memory. You get to install whatever you want, since it's your (virtual) machine.
In terms of MongoDB, as others have said, you need the ability to install the MongoDB software and run it (normally as a daemon). However, hosted services are just beginning to appear, such as MongoHQ. Perhaps something like this might be appropriate once it's out of beta (or if you request an invite).
It appears hosted CouchDB services are also popping up, such as couch.io or Cloudant. I personally have no experience with Couch so I can be less certain than with Mongo, but I'd imagine that again to run it yourself, you'd need to install the software (and thus require root access).
If you don't currently have a VPS or dedicated server (or the cloud-based versions of the aforementioned), perhaps moving your data out to a dedicated hosted service would be an ideal way to go to avoid the pain and expense of changing your hosting setup.
You can host your application and your database on different hosting servers.
For MongoDB you can use MongoHQ or MongoLab with 0.5 GB of space for free.