SAN-based storage with Chronicle Queue

I started using Chronicle Queue a few days back, and I was going through its documentation:
Chronicle Queue does not support operating off any network file system, be it NFS, AFS, SAN-based storage or anything else. The reason for this is those file systems do not provide all the required primitives for memory-mapped files Chronicle Queue uses.
When I tried writing to and reading from a SAN-mounted location, I was able to do it.
Can someone explain what exactly is meant by 'Chronicle does not support operating off any network file system'?

If you only ever access a SAN or NFS drive from one machine, this should work. However, if you access it from two machines, you are likely to see inconsistent state, as the order in which pages are flushed to the underlying storage is not reliable.
For failover and distribution of queues, we recommend using Chronicle Queue Enterprise.
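To make the distinction concrete, something like the following single-host round trip is presumably what worked in your test (the SAN path is hypothetical). It succeeds because only one machine is mapping the queue files; the documented limitation is about correctness when a second machine maps the same files, not about whether a single writer/reader happens to work:

    import net.openhft.chronicle.queue.ChronicleQueue;
    import net.openhft.chronicle.queue.ExcerptAppender;
    import net.openhft.chronicle.queue.ExcerptTailer;

    public class SanQueueDemo {
        public static void main(String[] args) {
            // Hypothetical SAN-mounted directory; fine while only this host touches it.
            try (ChronicleQueue queue = ChronicleQueue.singleBuilder("/mnt/san/my-queue").build()) {
                ExcerptAppender appender = queue.acquireAppender();
                appender.writeText("hello");               // memory-mapped write on this host

                ExcerptTailer tailer = queue.createTailer();
                System.out.println(tailer.readText());     // reads back "hello" on the same host
            }
        }
    }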

Related

How to detect Windows file closures locally and on network drives

I'm working on a Win32-based document management system that employs an automatic check-in/check-out model. The model it currently uses for tracking documents in use (monitoring the processes of the applications that open the documents) is not particularly robust, so I'm researching alternatives.
Check-outs are easy, as the DocMgt application is responsible for launching the other application (Word, Adobe, Notepad, etc.) and passing it the document.
It's the automatic check-in requirement that is more difficult. When the user closes the document in Word/Adobe/Notepad, ideally the DocMgt system would be automatically notified so it can perform an automatic check-in of the updated document.
To complicate things further the document is likely to be stored on a network drive not a local drive.
Anyone got any tips on API calls, techniques or architectures to support this sort of functionality?
I'm not expecting a magic 3 line solution, the research I've done so far leads me to believe that this is far from a trivial problem and will require some significant work to implement. I'm interested in all suggestions whether they're for a full or part solution.
What you describe is a common task. It is perfectly doable, though not without its share of hassle. Here I assume that the files are closed on the computer where your code can run (even if the files are stored on the mounted network share).
There exist two approaches to controlling the files when they are used: the filter and the virtual filesystem.
The filter sits in the middle, between the process and the filesystem (any filesystem, whether local, network or fully virtual), and intercepts file requests that go to this filesystem. It is required that the filter code runs on the computer through which the requests pass (this requirement seems to be met in your scenario).
The virtual filesystem is an endpoint for the requests that come from the applications. When you implement the virtual filesystem, you handle all requests, so you always fully control the lifetime of the files. As the filesystem is virtual, you are free to keep the files anywhere including the real disk (local or network) or even in the cloud.
The benefit of the filter approach is that you can control individual files that reside on real disks, while the virtual filesystem can be mounted only as a new drive letter or into an empty directory on an NTFS drive, which is not always feasible. At the same time, sitting in the middle, the filter is to some extent more restricted in what it can do, and the files can be altered while the filter is not running. Finally, filters are more complicated and potentially error-prone, as they sit in the middle and must play nice with other filters and with endpoints.
I don't have specific recommendations, but if a separate drive letter is an option, I would recommend the virtual filesystem.
Our company developed (and continues to maintain for the new owner) two products, CBFS Filter and CBFS Connect, which let you create a filter and a virtual filesystem respectively, all in user mode. Those products are used in many software titles, including some document management systems (which is close to what you do). You will find both products on their website.
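If a much lighter (and only partial) approach is acceptable, you can watch the checked-out document's folder for change events; on Windows, java.nio.file.WatchService is backed by ReadDirectoryChangesW. Note that this tells you the document was saved, not that the editing application has closed it, and it is unreliable on network drives, which is exactly why the filter/virtual-filesystem approaches above exist. A minimal sketch, with a hypothetical directory path:

    import java.nio.file.*;

    public class DocWatch {
        public static void main(String[] args) throws Exception {
            // Hypothetical folder containing the checked-out document.
            Path dir = Paths.get("C:\\checkout");
            WatchService watcher = FileSystems.getDefault().newWatchService();
            dir.register(watcher, StandardWatchEventKinds.ENTRY_MODIFY,
                                  StandardWatchEventKinds.ENTRY_DELETE);
            while (true) {
                WatchKey key = watcher.take();            // blocks until an event arrives
                for (WatchEvent<?> event : key.pollEvents()) {
                    // A modify event means the document was saved, not that the
                    // editing application has closed it.
                    System.out.println(event.kind() + ": " + event.context());
                }
                if (!key.reset()) break;                  // directory no longer accessible
            }
        }
    }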

Web server with redundant MariaDB databases

I would like some advice on how to set up a web server in HA with a PHP application, using virtual machines running Red Hat Linux.
The idea is to have two virtual web servers sharing a common document root via NFS or iSCSI, plus two MariaDB databases replicating the data.
I have some documentation to follow, but I'd like to know your opinion, in particular about how to cope with the replication of the databases, which must be redundant.
Many Thanks
Riccardo
Do not try to share files. The mysqld code does not know how to coordinate things between two instances touching the same data files.
Then you talk about replication. This involves separate data files. But, if you put them on the same physical drive, what is the advantage? If the drive crashes, you lose both.
Read more about HA solutions, and make a priority list of what things "keep you up at night". Drive failure, motherboard failure, network corruption, floods, earthquakes, bats in the belfry, etc.

Need for frequent disk access?

In the famous benchmarking of Phoenix: http://www.phoenixframework.org/blog/the-road-to-2-million-websocket-connections
I noticed that I/O-optimized machines were used. Rackspace says:
Work-optimized server types
I/O-optimized servers are assigned networking resources and use local high-speed SSD drives for storage. I/O-optimized servers work best for applications that require frequent or sustained disk access, like databases.
Is there an explanation for why Phoenix would need frequent disk access? Are channels stored in memory or on disk? Is I/O optimization the first priority when deciding on server specs for production?
I wrote that blog post! I can confirm that I/O optimized instances were not a priority, those just happen to be the types of machines that Rackspace kindly donated to us.
In our case, the important things for us were:
Number of cores (this was important for the sharding optimization mentioned in the post; the new Elixir 1.4 Registry will also parallelize the broadcasts, per the docs on using it as a PubSub). A rough sketch of the sharding idea follows after this list.
Memory - as you can see in the post, running out of RAM was an issue when we were testing against the 15GB instances.
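The actual implementation is in Elixir (ETS tables and, later, the Registry), but to illustrate why the core count mattered, here is a hypothetical Java sketch of the sharding idea: subscriptions are split across several tables, roughly one per core, so a broadcast can fan out over the shards in parallel instead of contending on a single table. None of these names are Phoenix's API.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.CopyOnWriteArrayList;
    import java.util.function.Consumer;

    public class ShardedPubSub {
        // One subscription table per shard, roughly one shard per core.
        private final List<Map<String, List<Consumer<String>>>> shards = new ArrayList<>();

        public ShardedPubSub(int shardCount) {
            for (int i = 0; i < shardCount; i++) {
                shards.add(new ConcurrentHashMap<>());
            }
        }

        private Map<String, List<Consumer<String>>> shardFor(Object key) {
            return shards.get(Math.floorMod(key.hashCode(), shards.size()));
        }

        public void subscribe(String topic, Consumer<String> subscriber) {
            // Subscribers are spread across shards, so no single table becomes a hot spot.
            shardFor(subscriber)
                    .computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>())
                    .add(subscriber);
        }

        public void broadcast(String topic, String message) {
            // Each shard can be walked by a different core during the fan-out.
            shards.parallelStream().forEach(shard ->
                    shard.getOrDefault(topic, Collections.emptyList())
                         .forEach(subscriber -> subscriber.accept(message)));
        }
    }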

ActiveMQ Shared File System Master Slave on Amazon EC2

We want to use an ActiveMQ master/slave configuration based on a shared file system on Amazon EC2 - that's the final goal. Operating system should be Ubuntu 12.04, but that shouldn't make too much difference.
Why not a master/slave configuration based on RDS? We've tried that and it's easy to set up (including multi-AZ). However, it is relatively slow and the failover takes approximately three minutes - so we want to find something else.
Which shared file system should we use? We did some research and came to the following conclusion (which might be wrong, so please correct me):
GlusterFS is often suggested and should support multiple AZs fine.
NFSv4 should work (while NFSv3 is said to corrupt the file system), but I didn't see too many references to it on EC2 (rather: I asked about NFS and got the suggestion to use GlusterFS). Is there any particular reason for that?
Ubuntu's Ceph isn't stable yet.
Hadoop Distributed File System (HDFS) sounds like overkill to me and the NameNode would again be a single point of failure.
So GlusterFS it is? We found hardly any success stories; instead, rather dissuasive entries in the bug tracker without any real explanation: "I would not recommend using GlusterFS with master/slave shared file system with more than probably 15 enqueued/dequeued messages per second." Does anyone know why, or is anyone successfully using ActiveMQ on GlusterFS with a lot of messages?
EBS or ephemeral storage? Since GlusterFS should replicate all data, we could use ephemeral storage, or are there any advantages to using EBS? (IMHO snapshots are not relevant for our scenario.)
We'll probably try out GlusterFS, but according to Murphy's Law we'll run into problems at the worst possible moment. So we'd rather try to avoid that by (hopefully) getting a few more opinions on this. Thanks in advance!
PS: Why didn't I post this on ServerFault? It would be a better fit, but on SO there are 10 times more posts about this topic, so I stuck with the flock.
Just an idea, but with ActiveMQ 5.7 (or maybe already 5.6) you can have pluggable lockers (http://activemq.apache.org/pluggable-storage-lockers.html). So it might be an option to use the filesystem as storage and RDS just as a locking mechanism. Note that I have never tried this before.
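Whichever shared filesystem (or locker) you end up with, the client side of a shared-store master/slave pair stays the same: clients list both brokers in a failover: URI and reconnect to whichever broker currently holds the store lock. A minimal sketch, with hypothetical broker hostnames:

    import javax.jms.Connection;
    import javax.jms.JMSException;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class FailoverClient {
        public static void main(String[] args) throws JMSException {
            // Hypothetical hostnames; the failover transport keeps retrying and switches
            // to the slave once it has acquired the shared store lock and starts listening.
            ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(
                    "failover:(tcp://broker-a:61616,tcp://broker-b:61616)?randomize=false");
            Connection connection = factory.createConnection();
            connection.start();
            // ... create sessions, producers and consumers as usual ...
            connection.close();
        }
    }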

HadoopFS (HDFS) as distributed file storage

I'm considering using HDFS as a horizontally scaling file storage system for our client video hosting service. My main concern is that HDFS wasn't developed for these needs; it is more "an open source system currently being used in situations where massive amounts of data need to be processed".
We don't want to process the data, just store it, and build on top of HDFS something like a small internal Amazon S3 analog.
A probably important point is that the stored file sizes will be quite big, from 100 MB to 10 GB.
Has anyone used HDFS for such purposes?
If you are using an S3 equivalent, then it should already provide a distributed, mountable file system, no? Perhaps you can check out OpenStack at http://openstack.org/projects/storage/.
The main disadvantage would be the lack of POSIX semantics. You can't mount the drive, and you need special APIs to read and write from it. The Java API is the main one. There is a project called libhdfs that makes a C API over JNI, but I've never used it. Thriftfs is another option.
I'm also not sure about the read performance compared to other alternatives. Maybe someone else knows. Have you checked out other distributed filesystems like Lustre?
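To give a feel for the "special APIs" point, here is a minimal sketch of writing and reading a file through the Hadoop FileSystem Java API; the NameNode URI and paths are hypothetical:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsStoreDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical NameNode address; adjust to your cluster.
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

            Path video = new Path("/videos/clip-0001.mp4");
            try (FSDataOutputStream out = fs.create(video)) {
                out.write("...video bytes...".getBytes());   // in practice, stream the upload
            }

            try (FSDataInputStream in = fs.open(video)) {
                byte[] buffer = new byte[4096];
                int read = in.read(buffer);                   // in practice, loop until EOF
                System.out.println("read " + read + " bytes");
            }
            fs.close();
        }
    }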
You may want to consider MongoDB for this. They have GridFS, which will allow you to use it as storage. You can then horizontally scale your storage through shards and provide fault tolerance with replication.
http://docs.mongodb.org/manual/core/gridfs/
http://docs.mongodb.org/manual/replication/
http://docs.mongodb.org/manual/sharding/
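For completeness, here is a minimal sketch of storing and retrieving a file with GridFS via the MongoDB Java driver (the modern driver API, which postdates the docs linked above; the connection string and names are hypothetical):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoDatabase;
    import com.mongodb.client.gridfs.GridFSBucket;
    import com.mongodb.client.gridfs.GridFSBuckets;
    import org.bson.types.ObjectId;

    public class GridFsDemo {
        public static void main(String[] args) {
            // Hypothetical connection string and database name.
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoDatabase db = client.getDatabase("videos");
                GridFSBucket bucket = GridFSBuckets.create(db);

                // GridFS splits large files into chunks, so it is not limited by the 16 MB document cap.
                ObjectId id = bucket.uploadFromStream("clip-0001.mp4",
                        new ByteArrayInputStream("...video bytes...".getBytes()));

                ByteArrayOutputStream out = new ByteArrayOutputStream();
                bucket.downloadToStream(id, out);
                System.out.println("downloaded " + out.size() + " bytes");
            }
        }
    }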

Resources