Our J2EE-based software is using Libreoffice headless services (usually located on another servers) to convert documents, supplying them via TCP/IP soffice port. Typical system may include up to 10 Libreoffice instances on 2-3 VMs to manage heavy load.
Long-living libreoffice headless instance have some issues in production:
1. It may freeze after converting severe number of documents.
2. It may freeze because of two simultaneous connection made to one instance.
It require special treatment and is quite hard to track all connections to libreoffice instances in threaded Java application. And its even more hard to test them for being hung and restart them, because they are intended to run as services on another server.
We spent much time to create special spooling code, but it is still far from solving the real issues in production.
Once I found a special piece of software acting as a proxy-server for Libreoffice instances. It open TCP/IP port and act like Libreoffice to any connecting party and connect to libreoffice instance by itself. It also tracks LO instances states and restart them if needed or after certain number of documents processed. However today I have spent all day searching for it and found nothing like this.
Is there any software acting as described proxy-server or maybe there is another approach which I should consider?
Related
I want to create another instance of dspace to work on different projects. however I do not know how or if it will conflict with this running.
While it's technically possible, I would advise against this for the following 3 key reasons:
Quite a few configuration aspects of DSpace still count on a Tomcat restart to take effect. If you have two instances running in the same Tomcat, it means you have to bring both of them down when you want to update one of them.
Performance related issues are already far from trivial to debug in DSpace, even if you have only one instance running in one Tomcat. If you run two instances, it is very likely that you will only make this more difficult.
This kind of setup is non-standard. As with all non-standard setups, you will find it much harder to get community support, as very few other people will be in the same boat.
So ... either run two VMs, or just two Tomcat processes on one VM.
If after these warnings, you still want to do it, the basics would be to run all of the webapps you want twice in the tomcat, on different ports. The minimum you would need are 2x XMLUI OR JSPUI and 2x SOLR. It could be possible to run one solr webapp, and keep 2 search, statistics, authority and oai indexes in this one SOLR webapp, but I don't know what the side effects could be.
1) Obviously each instance should be installed to a different set of directories.
2a) Create a separate Context for each instance. That will give them different paths: http://legion.example.com/one/, http://legion.example.com/two/.... I do this on my development workstation all the time.
2b) You can also create separate domains and IP addresses, bind them to multiple Host objects in a single Tomcat configuration: http://one.example.com/, http://two.example.com/.... I have four low-volume production DSpace instances running in one Tomcat instance on a midsized host.
Each DSpace instance needs its own database, but PostgreSQL can host dozens. You should consider creating separate database user accounts for each.
You'll also need separate Handle resolvers for each DSpace, just the same as if each instance was on its own host. When configured for DSpace, the Handle resolver uses the DSpace database instead of its own, so it's specific to a single instance.
Solr ought to be able to serve several sets of cores for several DSpaces, but you'll have to do a fair amount of configuration to keep them distinct and ensure that each DSpace is using its own set. You'll learn a lot more about Solr than you need to know for the captive Solr instance that gets installed with a single DSpace.
But then, you'll also learn a lot more about Tomcat than you need to know for a single DSpace....
If you declare your Contexts in external files ([Tomcat]/config/Catalina/localhost/one.xml etc.) and you have automatic deployment set up right, you can just 'touch' one of the Contexts to restart it without restarting a whole Tomcat. Otherwise you can use the Tomcat Manager webapp to do this. Consider well whether you want to have the Manager running, though, because it is quite powerful and it is exposed on the network. I run such applications on yet another, non-routable address so they can't be reached from Outside.
DSpace is not small, so you will need to ensure that you have enough memory to run several instances and that Tomcat's memory limits are adjusted accordingly. I would suggest also installing a resource monitor such as Psi Probe and glancing at it regularly. The above comments on performance are spot-on.
Learning to make all this work was loads of fun, and took quite some time. On the other hand, for development you may prefer something like https://github.com/DSpace/vagrant-dspace, a packaged virtual machine with DSpace and friends inside.
What is the best way for multiple client programs to
communicate with a single server program, all running
on a single Windows computer? All written in VB6.
I'd appreciate recommendations of how you might solve
this problem.
NOTE: we are working on transition to .NET, but have to
add a capability to the V6B version before the .NET will
be ready.
The possibilities include TPC connections, named pipes,
shared memory, messages, files, and more.
A client passes the server a string as input, and the server
combines it with data known only to the server, to generate
another string which is returned to the client. Both strings
are only about 100 characters long. The server is contacted
only when a new file needs to be opened, and so it is a very
low volume of communication... probably a flurry of 10 calls
within 15 seconds, followed by an hour of idle time.
But it is possible that two clients would choose about the
same time to request information. Blocking/Locking are certainly
acceptable, as the server will be done with each request in
well under a second, and several seconds of delay is unimportant
to any of the programs.
The server's algorithm is complex, and for several reasons important
to the application should not be replicated in each helper program.
That is the reason for needing a server.
Background:
I am adding capability to a large existing legacy program.
This single program has several other legacy programs which
act as helpers and are run when the user makes certain
choices. These programs are started with a shell command,
and are not just separate threads. For instance, one helper
loads new data from a DVD drive onto the hard drive. Another
helper just displays a chart of the current positions of
the planets.
This is a LARGE commercial legacy program that happens to be
written in VB6. We are working to convert it and all the
helper programs to .NET, but must first release a new version
under vb6 with this added capability. (Please don't tell me
to not use VB6, as we are already moving elsewhere.)
We need a temporary VB6 solution.
VB6 does TCP and UDP extremely well via the standard Winsock Control component included in Pro and Enterprise Editions. A lot of shadetree coders do seem to struggle with it though. This is probably the most obvious route since the only other native IPC in VB6 would be COM/DCOM and DDE, however MSMQ provided excellent support for VB6 as well.
The downside of IP-based protocols is their limited namespace and resulting high probability of collisions (64K port numbers, many set aside for standard applications, ephemeral port ranges, etc.). They're also somewhat "heavyweight" but considering the vast resources of even the oldest PCs still in service and your light traffic requirements you can ignore that in deciding.
Another option you've considered is Named Pipes.
This offers a number of advantages in your situation. For one thing the namespace is much larger requiring only a unique name, which in the post-Win9x era can be up to 256 characters long making uniqueness fairly easy to achieve. For another, as long as your firewalls permit "File and Print Sharing" you're all set on that front.
Also, for your application you only seem to require an RPC-style mechanism rather than arbitrary bidirectional streams or messages. TransactNamedPipe() calls in your clients might be ideal. Named Pipes work over a LAN, but within one PC they are quite fast and light weight.
While VB6 doesn't come with a Named Pipe component such a thing is fairly easy to create as long as extremely high performance isn't required. You can use Timer-based polling in the server instead of trying to implement overlapped I/O to get asynchronicity. I put one together a couple of years ago and have had good luck with this approach.
I published a fairly stable rendition of this a while back at PipeRPC - RPC Over Named Pipes. There is an older and a somewhat newer version there with examples of use and documentation. As designed, clients make "calls" passing a Byte array of request parameters and receiving back a Byte array of response results. You can also shove Unicode Strings though with no changes, letting the compiler coerce the types.
Just one "drop in" UserControl for both clients and servers.
Looking back at this question:
The server's algorithm is complex, and for several reasons important
to the application should not be replicated in each helper program.
That is the reason for needing a server.
If that's really the concern why not just create a shared DLL that all programs use?
For a one-off upgrade release to an existing VB6 application being moved to a newer platform, I would stress keeping the modification as simple and straightforward as possible. As a result, I wouldn't go down any routes involving shared memory or anything relatively unusual.
A few options, none perfectly simple, but at least some ideas:
Expose a COM object in the server code that performs the translation, and can be consumed by the client apps. The clients instantiate the object from the server as an out-of-process object, and let COM handle all the marshalling, etc.
Does the server have any network awareness? VB6 doesn't do sockets/tcp natively very well, but if you've had a reason to add that in, you might be able to leverage it to perform a socket-based connection and data exchange.
The server and client could each poll a common resource folder for the presence of a specific file that constituted inbound/outbound requests for the translation service you describe. Not very elegant, but it might be the simplest.
Just a few ideas to give you some things to think about. Hope that's helpful in some way. Good luck!
I need to deploy an application onto some Windows machines for purposes of data collection from a group of people (i.e. the application will be used to gather responses to a series of survey questions). The process is interactive, alternating between displays of text and images with specific timing requirements. I have put together a prototype application using HTML and JavaScript that implements the survey. However, there are some unique constraints on the deployment environment that have me stuck:
While the machine is Internet-connected, the client requires that the survey application must run fully local to the PC that it runs on. Therefore, sending the survey results to a remote server is not permissible. Obviously, saving to a local file from a Web browser is typically not permitted for security reasons.
Installation of applications onto the machines that will run the survey is not permitted.
The configuration of the machines is not known specifically a priori, but I can assume some recent version of Windows with IE8+.
The "no remote access" requirement was a late comer, and has thrown a wrench into the plan of just writing a simple Web application that could post results to an HTTP server. I'm now looking for the easiest way forward. Two main approaches come to mind:
Use a GUI framework that provides a control that can display HTML/JavaScript; running a full-blown application on the PC would allow me to save the results to the filesystem. I've never done this, but it seems like in this day and age it shouldn't be too difficult. This would allow me to reuse much of my existing prototype implementation, but I would need some way of transferring the results (which would be stored in a JavaScript data structure) outside of the Web control to where the rest of the application could access it.
Reimplement the entire application using some GUI framework (I've used PyQt successfully before, although not on Windows). This approach is obviously less desirable than #1 due to the lack of reuse. However, it may be necessary if #1 isn't feasible.
Any recommendations for the best way to go? Ideally, I'm looking for a solution that can be run in a "portable" manner from a USB thumbdrive or similar.
Have you looked at HTML Applications (HTA)? They work in IE5+ and can use Windows Scripting Host to write to local drives and UNC shares...
Maybe you can use a portable web server with a scripting language on the server side. http://code.google.com/p/mongoose/ Mongoose, for example, you can run PHP, CGI, etc. .. scripts. Then, simply create a script to save a file to your hard drive. And let the rest of the application in the same manner.
Use a script to start the web server, and perhaps a portable web browser like K-Meleon to start the application http://kmeleon.sourceforge.net/ This is highly configurable. Or start the system explorer to your localhost URL.
The only problem may be that the user has to modify the firewall for the first time you run the server?
Does anyone have an experience running clustered Tigase XMPP servers on Amazon's EC2, primarily I wish to know about anything that might trip me up that is non-obvious. (For example apparently running Ejabberd on EC2 can cause issues due to Mnesia.)
Or if you have any general advice to installing and running Tigase on Ubuntu.
Extra information:
The system I’m developing uses XMPP just to communicate (in near real-time) between a mobile app and the server(s).
The number of users will initially be small, but hopefully will grow. This is why the system needs to be scalable. Presumably for a just a few thousand users you wouldn’t need a cc1.4xlarge EC2 instance? (Otherwise this is going to be very expensive to run!)
I plan on using a MySQL database hosted in Amazon RDS for the XMPP server database.
I also plan on creating an external XMPP component written in Python, using SleekXMPP. It will be this external component that does all the ‘work’ of the server, as the application I’m making is quite different from instant messaging. For this part I have not worked out how to connect an external XMPP component written in Python to a Tigase server. The documentation seems to suggest that components are written specifically for Tigase - and not for a general XMPP server, using XEP-0114: Jabber Component Protocol, as I expected.
With this extra information, if you can think of anything else I should know about I’d be glad to know.
Thank you :)
I have lots of experience. I think there is a load of non-obvious problems. Like the only reliable instance to run application like Tigase is cc1.4xlarge. Others cause problems with CPU availability and this is just a lottery whether you are lucky enough to run your service on a server which is not busy with others people work.
Also you need an instance with the highest possible I/O to make sure it can cope with network traffic. The high I/O applies especially to database instance.
Not sure if this is obvious or not, but there is this problem with hostnames on EC2, every time you start instance the hostname changes and IP address changes. Tigase cluster is quite sensitive to hostnames. There is a way to force/change the hostname for the instance, so this might be a way around the problem.
Of course I am talking about a cluster for millions of online users and really high traffic 100k XMPP packets per second or more. Generally for large installation it is way cheaper and more efficient to have a dedicated servers.
Generally Tigase runs very well on Amazon EC2 but you really need the latest SVN code as it has lots of optimizations added especially after tests on the cloud. If you provide some more details about your service I may have some more suggestions.
More comments:
If it comes to costs, a dedicated server is always cheaper option for constantly running service. Unless you plan to switch servers on/off on hourly basis I would recommend going for some dedicated service. Costs are lower and performance is way more predictable.
However, if you really want/need to stick to Amazon EC2 let me give you some concrete numbers, below is a list of instances and how many online users the cluster was able to reliably handle:
5*cc1.4xlarge - 1mln 700k online users
1*c1.xlarge - 118k online users
2*c1.xlarge - 127k online users
2*m2.4xlarge (with 5GB RAM for Tigase) - 236k online users
2*m2.4xlarge (with 20GB RAM for Tigase) - 315k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 400k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 312k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 327k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 280k online users
A few more comments:
Why amount of memory matters that much? This is because CPU power is very unreliable and inconsistent on all but cc1.4xlarge instances. You have 8 virtual CPUs but if you look at the top command you often see one CPU is working and the rest is not. This insufficient CPU power leads to internal queues grow in the Tigase. When the CPU power is back Tigase can process waiting packets. The more memory Tigase has the more packets can be queued and it better handles CPU deficiencies.
Why there is 5*m2.4xlarge 4 times? This is because I repeated tests many times at different days and time of the day. As you can see depending on the time and date the system could handle different load. I guess this is because Tigase instance shared CPU power with some other services. If they were busy Tigase suffered from CPU under power.
That said I think with installation of up to 10k online users you should be fine. However, other factors like roster size greatly matter as they affect traffic, and load. Also if you have other elements which generate a significant traffic this will put load on your system.
In any case, without some tests it is impossible to tell how really your system behaves or whether it can handle the load.
And the last question regarding component:
Of course Tigase does support XEP-0114 and XEP-0225 for connecting external components. So this should not be a problem with components written in different languages. On the other hand I recommend using Tigase's API for writing component. They can be deployed either as internal Tigase components or as external components and this is transparent for the developer, you do not have to worry about this at development time. This is part of the API and framework.
Also, you can use all the goods from Tigase framework, scripting capabilities, monitoring, statistics, much easier development as you can easily deploy your code as internal component for tests.
You really do not have to worry about any XMPP specific stuff, you just fill body of processPacket(...) method and that's it.
There should be enough online documentation for all of this on the Tigase website.
Also, I would suggest reading about Python support for multi-threading and how it behaves under a very high load. It used to be not so great.
Can you suggest how to create a test environment to simulate various types of bandwidths and traffic in a web app?
Or maybe an open source program which does this against localhost?
I think this is a very important subject when programming web apps but it is not a usual topic, the only way i can imagine to create such kind of environment is to use some kind of proxy in a local network but before start looking into the squid documentation i would like to hear your suggestions.
if you're using apache you may want to take a look at apache ab
There are two approaches to shape network traffic to simulate a network link:
Run some software on the client or server that sits somewhere in the networking stack and shapes the traffic between the app and the network interface
Run the traffic shaping software on a dedicated machine with 2 network interfaces through which your traffic is routed
(2) is a better solution if you don't want to install software on the client or server (and possibly impact performance), but requires more hardware fiddling.
Some other features you might want to think about are what shaping parameters can be simulated. Most do delay and packet loss, some do jitter and bandwidth limiting as well. Some solutions can selectively filter traffic (for instance by port number, TCP or UDP etc).
Here is a list of some of the systems I've found:
Open Source or Freeware
DummyNet is an open source BSD Unix-based for dedicated devices. It is not clear if the software is being actively maintained
NistNet is an open source Linux-based system for dedicated devices. The software has not been actively maintained for several years.
Commercial
Apposite Technoligies sell dedicated hardware solutions for simulating WAN links, with a Web based GUI for configuring the settings and collecting traffic measurements
East Coast DataCom sell hardware dedicated simulators for simulating routers and modems
Itrinegy offer both dedicated device solutions, and solutions for running on clients or servers.
Network FX offer several dedicated device products for simulating network impairments between the client & server
NetLimiter is a client side system that allows throttling of individual applications, and includes a firewall.
Shunra Software offer a range of products, from high end enterprise WAN simulation and testing, to a simple client-resident emulator.
The closest I can think of is doing something similar with VEDekstop from Shunra..
Simulating High Latency and Low Bandwidth in Testing of Database Applications
Shunra VE Desktop Standard is a Windows-based client software solution that simulates a wide area network link so that you can test applications under a variety of current and potential network conditions – directly from your desktop.
I wrote a php script awhile back which used CURL to run a sequence of page requests against my server which represented a typical use scenario. I had it output the times that it took for the server to respond to each of the requests. I then had another script which spawned a bunch of these test case scripts simultaneously for a sustained period and correlated the results into a file which I could then look at in a spreadsheet to see average times. This way I could simulate the number of users hitting the site that I wanted. The limitations are that you need to run the test script on a different server to the web server and that the client machine can become too loaded to give meaningful results past a certain point. I've since left the job otherwise I would paste the scripts here.
If you are running a Linux box as your server, Linux box as your client, or have the capability to put (perhaps a VM) a Linux router between your client and server, you can use NetEm.
NetEm is a Linux TC (Traffic Control) discipline which can delay (i.e. add latency) packets leaving a host. Although it's tricky to set up clever rules (e.g. add latency to some traffic, not to others), it's easy to add a simple "delay everything leaving the interface by 50ms" type rules and some recipes are provided.
By sticking a Linux VM between your client and server, you can simulate as much latency as you like. And you can turn it on and off dynamically. Linux has other TC disciplines which can be combined with NetEm to restrict bandwidth (but the script to set this up can be somewhat complicated). NetEm can also randomly drop packets.
I use it and it works a treat :)
Web Application Stress Tool (WAST) from Microsoft is what you need.
http://www.microsoft.com/downloads/details.aspx?familyid=e2c0585a-062a-439e-a67d-75a89aa36495&displaylang=en
I haven't used it for years (lack of need, not because I'd found anything else), but xat webspeed would be the first thing I would point toward
As other people have mentioned, Apache's ab (comes with Apache, so you probably have it already) is good.
Other good options are:
HP's LoadRunner Apache
Jakarta's JMeter
Tsung (if you want to get your erlang on)
I personally like ab and JMeter the best.
We use Loadrunner to do bandwidth and traffic simulation in our App. Loadrunner is can start agents on various machines and you can simulate one machine as running on dialup modem v/s another on DSL v/s another on Cable internet.
We also use Loadrunner to simulate various kinds of traffic conditions from 10 user run to 500 user run. We can also insert think times in the script and simulate a real user executing the http request. The best part is that it comes with a recording studio where it will plug in with Internet explorer and you can record the whole scenario/Usecase that can be as simple as hitting one page to a full blown 50-60 page script or more.
i found this little java program that works great : sloppy
yet not a proffesional solution but it works for simple tests, i guess it uses java streams and buffers to slow down the connection .
Have you looked at Tsung? It's a great utility for seeing if your website will scale in event of attack, I mean massive popularity. We use it for our web frontend, and our internal systems too.
If you're interested in performing your tests out of your browser, there is also a really great Firefox plug-in.
Do not forget about Wanulator (http://www.wanulator.de/).
The name Wanulator comes from "WAN" and "simulator. This pretty much describes what the software does: It simulates different Internet conditions such as delay or packet loss. Furthermore it simulates user access line speeds e.g. modem, ISDN or ADSL.
Wanulator is currently packaged as a Linux boot CD based on SLAX. This will give you a full out of the box experience. You can turn any PC into a test-system within a blink - just by booting the Wanulator CD. The package already includes useful client SW such as web-browser and network sniffer (Wireshark). Nevertheless if the PC has 2 network interfaces the system can run as an intermediate system between your server and your client - as a switch - without any configuration hassles.