WebSphere MQ - Disk space taken by all messages in the queue? - ibm-mq

In WebSphere MQ, we can easily find out how many messages are in a local queue using the CURDEPTH attribute of the queue.
But how can I find the actual disk space taken by these messages? Messages in the queue may be of different sizes, i.e. they may take up different amounts of disk space.
Thanks in advance.

The name of the "queue" file on the disk does not map exactly to the queue name.
For normal queues created with older versions of IBM MQ, the file is called q and sits in the directory /var/mqm/qmgrs/QMGR/queues/QUEUE_NAME, where the . character in the queue name is replaced with !.
For normal queues created with newer versions of MQ (I believe 7.5 and later) the actual file is /var/mqm/qmgrs/QMGR/queues/QUEUE_NAME, where the . character in the queue name is replaced with !; it is no longer a directory containing a file called q.
For dynamic queues the directory or file name will not contain the actual dynamic queue name at all and will be similar to !!GHOST!DEADBEEF!0!DEADBEEF!99.
To find the exact location of the queue file use the dspmqfls command as in the example below:
dspmqfls -m QMGR -t ql SYSTEM.DEFAULT.LOCAL.QUEUE
The output will look like this:
WebSphere MQ Display MQ Files
QLOCAL SYSTEM.DEFAULT.LOCAL.QUEUE
/var/mqm/qmgrs/QMGR/queues/SYSTEM!DEFAULT!LOCAL!QUEUE
Note that the output looks the same whether the location is a directory or the actual file. If you check and it is a directory, look inside it for the file named q; if it is a file, that is the actual "queue" file.
Example of a queue Directory:
$ ls -ld /var/mqm/qmgrs/QMGR/queues/SYSTEM!DEFAULT!LOCAL!QUEUE
drwxrwx--- 2 mqm mqm 96 Apr 7 2010 /var/mqm/qmgrs/QMGR/queues/SYSTEM!DEFAULT!LOCAL!QUEUE
Example of a queue File:
$ ls -ld /var/mqm/qmgrs/QMGR/queues/SYSTEM!DEFAULT!LOCAL!QUEUE
-rw-rw---- 1 mqm mqm 2048 Jul 19 2016 /var/mqm/qmgrs/QMGR/queues/SYSTEM!DEFAULT!LOCAL!QUEUE
NOTE: APAR IT09611, which applies to IBM MQ v7.5.0.0 through 7.5.0.5, can cause some queue file names to be truncated; this is fixed in 7.5.0.6.
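Putting that together, a rough way to measure the disk space used by one queue is to resolve the location with dspmqfls and then run du against it. The queue manager and queue name below are just examples, and the file includes headers and free space, so treat the number as an approximation:
dspmqfls -m QMGR -t ql SYSTEM.DEFAULT.LOCAL.QUEUE
du -sk '/var/mqm/qmgrs/QMGR/queues/SYSTEM!DEFAULT!LOCAL!QUEUE'   # -s sums a directory (older layout) or reports the single file (newer layout)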

You look at the size of the queue file to determine the disk space taken by all messages in a queue. The queue file will be located under the qmgrs/<queue manager name>/queues folder. The queue file name will be the same as the queue name.

Thanks Shashi.
For others, the full path of the queue file is
/var/mqm/qmgrs/QMANAGER_FOLDER/queues/QUEUE_You_Want/q
QMANAGER_FOLDER - The queue manager directory
QUEUE_You_Want - The queue you are looking for
The size of the file 'q' is enough to determine the total disk space taken by the queue. So if a file system fills up because of messages on some queue, we can work out from here how much disk space each queue is taking.
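For example, if a filesystem is filling up, a quick (and approximate) way to see which queues are the biggest is to sort the sizes under the queues directory. QMGR is a placeholder for your queue manager's directory, and remember that dynamic queues show up under mangled names like the !!GHOST example above:
du -sk /var/mqm/qmgrs/QMGR/queues/* | sort -rn | head -20   # largest queue files/directories first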

Related

Sequential file processing in webmethods

Can a webMethods file port read files in a specific sequence? For example, if I have files 001.xml, 003.xml, and 002.xml in the monitoring folder, can webMethods be configured/customized to read the files in filename order, i.e. 001.xml, 002.xml, 003.xml?
AFAIK, the files are read in no particular order. To process them in a particular order, you'd probably have to read them, publish an event (containing the file name among other data) or store the file somewhere, and then process the published events / stored files as you like.
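In shell terms (just a generic illustration of the idea, not webMethods flow code; the directory and handle_one_file are placeholders), processing stored files in filename order boils down to something like:
for f in /path/to/stored/*.xml; do   # shell globs expand in sorted order, so 001.xml precedes 002.xml
  handle_one_file "$f"               # placeholder for whatever actually processes each file
done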

Storing mapreduce intermediate output on a remote server

I use a Hadoop (version 1.2.0) cluster of 16 nodes, one with a public IP (the master) and 15 connected through a private network (the slaves).
Is it possible to use a remote server (in addition to these 16 nodes) for storing the output of the mappers? The problem is that the nodes are running out of disk space during the map phase and I cannot compress map output any more.
I know that mapred.local.dir in mapred-site.xml is used to set a comma-separated list of dirs where the tmp files are stored. Ideally, I would like to have one local dir (the default one) and one directory on the remote server. When the local disk fills up, I would like to use the remote disk.
I am not very sure about this, but as per the link (http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml) it says:
The local directory is a directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk i/o. Directories that do not exist are ignored.
Also there are some other properties which you should check out. These might be of help:
mapreduce.tasktracker.local.dir.minspacestart: If the space in mapreduce.cluster.local.dir drops under this, do not ask for more tasks. Value in bytes
mapreduce.tasktracker.local.dir.minspacekill: If the space in mapreduce.cluster.local.dir drops under this, do not ask more tasks until all the current ones have finished and cleaned up. Also, to save the rest of the tasks we have running, kill one of them, to clean up some space. Start with the reduce tasks, then go with the ones that have finished the least. Value in bytes.
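If you want to experiment with those thresholds, they are ordinary mapred-site.xml properties. The byte values below are made-up examples, not recommendations, and if I remember correctly Hadoop 1.x (which you are on) uses the older names mapred.local.dir.minspacestart / mapred.local.dir.minspacekill:
<property>
<name>mapreduce.tasktracker.local.dir.minspacestart</name>
<value>1073741824</value> <!-- example only: stop accepting new tasks below ~1 GB free -->
</property>
<property>
<name>mapreduce.tasktracker.local.dir.minspacekill</name>
<value>536870912</value> <!-- example only: start killing running tasks below ~512 MB free -->
</property>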
The solution was to use the iSCSI technology. A technician helped us out to achieve that, so unfortunately I am not able to provide more details on that.
We mounted the remote disk to a local path (/mnt/disk) on each slave node, and created a tmp directory there, with rwx privileges for all users.
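In shell terms the per-node preparation amounted to roughly the following (the iSCSI target setup itself is omitted, and /dev/sdb1 is a purely hypothetical device name):
mkdir -p /mnt/disk
mount /dev/sdb1 /mnt/disk    # hypothetical device; in our case it was the iSCSI-backed disk
mkdir -p /mnt/disk/tmp
chmod a+rwx /mnt/disk/tmp    # rwx for all users, as described above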
Then, we changed the $HADOOP_HOME/conf/mapred-site.xml file and added the property:
<property>
<name>mapred.local.dir</name>
<value>/mnt/disk/tmp</value>
</property>
Initially, we had two comma-separated values for that property, with the first being the default value, but it still didn't work as expected (we still got some "No space left on device" errors). So we left only one value there.

Write Path HDFS

Introduction
Follow-up question to this question.
A File has been provided to HDFS and has been subsequently replicated to three DataNodes.
If the same file is going to be provided again, HDFS indicates that the file already exists.
Based on this answer a file will be split into blocks of 64MB (depending on the configuration settings). A mapping of the filename and the blocks will be created in the NameNode. The NameNode knows in which DataNodes the blocks of a certain file reside. If the same file is provided again, the NameNode knows that blocks of this file exist on HDFS and will indicate that the file already exists.
If the content of a file is changed and provided again, does the NameNode update the existing file, or is the check restricted to the mapping of filename to blocks and in particular the filename? Which process is responsible for this?
Which process is responsible for splitting a file into blocks?
Example Write path:
According to this documentation the Write Path of HBase is as follows:
Possible Write Path HDFS:
file provided to HDFS e.g. hadoop fs -copyFromLocal ubuntu-14.04-desktop-amd64.iso /
FileName checked against the FSImage to see whether it already exists. If this is the case, the message "file already exists" is displayed
file split into blocks of 64 MB (depending on configuration settings). Question: which process is responsible for this block splitting?
blocks replicated on DataNodes (the replication factor can be configured)
Mapping of FileName to blocks (metadata) stored in the EditLog located on the NameNode
Question
What does the HDFS Write Path look like?
If the content of a file is changed and provided again, does the NameNode update the existing file, or is the check restricted to the mapping of filename to blocks and in particular the filename?
No, it does not update the file. The name node only checks if the path (file name) already exists.
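As a quick illustration (re-using the file name from the question), the second copy attempt is rejected because the path already exists; to replace the content you have to remove the old file first:
hadoop fs -copyFromLocal ubuntu-14.04-desktop-amd64.iso /   # first copy succeeds
hadoop fs -copyFromLocal ubuntu-14.04-desktop-amd64.iso /   # rejected: the path already exists
hadoop fs -rm /ubuntu-14.04-desktop-amd64.iso               # remove the old file...
hadoop fs -copyFromLocal ubuntu-14.04-desktop-amd64.iso /   # ...then the copy succeeds with the new content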
What does the HDFS Write Path look like?
This is explained in detail in this paper: "The Hadoop Distributed File System" by Shvachko et al. In particular, read Section 2.C (and check Figure 1):
"When a client writes, it first asks the NameNode to choose DataNodes to host replicas of the first block of the file. The client organizes a pipeline from node-to-node and sends the data. When the first block is filled, the client requests new DataNodes to be chosen to host replicas of the next block. A new pipeline is organized, and the client sends the further bytes of the file. Choice of DataNodes for each block is likely to be different. The interactions among the client, the NameNode and the DataNodes are illustrated in Fig. 1."
NOTE: A book chapter based on this paper is available online too. And a direct link to the corresponding figure (Fig. 1 on the paper and 8.1 on the book) is here.

Does hadoop use folders and subfolders

I have started learning Hadoop and have just completed setting up a single node as demonstrated in the Hadoop 1.2.1 documentation.
Now I was wondering:
When files are stored in this type of FS, should I use a hierarchical mode of storage - like folders and sub-folders as I do in Windows - or are files just written in as long as they have a unique name?
Is it possible to add new nodes to the single-node setup if, say, somebody were to use it in a production environment? Or can a single node simply be converted to a cluster without loss of data by adding more nodes and editing the configuration?
This one I can google but what the hell! I am asking anyway, sue me. What is the maximum number of files I can store in HDFS?
When files are stored in this type of FS, should I use a hierarchical mode of storage - like folders and sub-folders as I do in Windows - or are files just written in as long as they have a unique name?
Yes, use the directories to your advantage. Generally, when you run jobs in Hadoop, if you pass along a path to a directory, it will process all files in that directory. So you really have to use them anyway.
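For example (the paths and the jar location are only illustrative), you might organize input by date and point a job at the whole directory:
hadoop fs -mkdir /data/logs/2014-01-01                # directories work much like on a local FS
hadoop fs -put access.log /data/logs/2014-01-01/
hadoop jar $HADOOP_HOME/hadoop-examples-1.2.1.jar wordcount /data/logs/2014-01-01 /out/wc   # processes every file in the input directory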
Is it possible to add new nodes to the single-node setup if, say, somebody were to use it in a production environment? Or can a single node simply be converted to a cluster without loss of data by adding more nodes and editing the configuration?
You can add/remove nodes as you please (unless by single-node, you mean pseudo-distributed... that's different)
This one I can google but what the hell! I am asking anyway, sue me. What is the maximum number of files I can store in HDFS?
Lots
To expand on climbage's answer:
The maximum number of files is a function of the amount of memory available to your NameNode server. There is some loose guidance that each metadata entry in the NameNode requires somewhere between 150-200 bytes of memory (it varies by version).
From this you'll need to extrapolate to the number of files and the number of blocks you have for each file (which can vary depending on file and block size), and then you can estimate, for a given memory allocation (2G / 4G / 20G etc.), how many metadata entries (and therefore files) you can store.
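As a back-of-the-envelope sketch, assuming ~150 bytes per metadata object and one file object plus one block object per small file (so ~300 bytes per file):
echo $(( 4 * 1024 * 1024 * 1024 / 300 ))   # a 4 GB NameNode heap gives room for roughly 14 million single-block files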

WebSphere MQ q program read/write to file

I use the following to write queue contents to a file:
q -xb -ITESTQ -mTEST > messages.out
I had 3 binary messages in the queue that got written to the file successfully. Now I have a need to load the same file back to the queue (same queue at a later time). When I do:
q -xb -oTESTQ -mTEST < messages.out
It puts 9 messages instead of 3. I am guessing the formatting is misread while the file is loaded. I've noticed there is a -X option in the q program. What is it used for? What other options do I have?
You really need to look at the QLoad program (SupportPac MO03) for this. Same author as the Q program and every bit as good a tool. Also free. As the author explains in the manual:
Ever since I released my MA01 (Q Utility) SupportPac I have had
periodic requests to explain how it can be used to unload, and
subsequently reload, messages from a queue. The answer has always been
that this is not what MA01 is for and that surely there must be a
utility available. Well, after sufficient numbers of these requests I
looked for a utility myself and didn’t really find anything which
fitted the bill. What was needed was a very simple, some would say
unsophisticated, program which unloaded a queue into a text file. The
notion of a text file was important because a number of users wanted
the ability to change the file once it had been created. I also find
that text based files are more portable and so this seemed useful if
we want to unload a queue, say on Windows, and then load the messages
again on a Solaris machine. The disadvantage of this approach is that
the file is larger than it would be in binary mode. Storing data using
the hex representation of the character rather than the character
itself essentially uses twice as much space. However, in general I do
not envisage people using this program to unload vast amounts of
message data but a few test messages or a few rogue messages on the
dead letter queue which are then changed and reloaded elsewhere.
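The unload/reload usage looks roughly like the example below (flags from memory of the MO03 manual, so double-check them there; the same tool ships with MQ v8 and later as dmpmqmsg):
qload -m TEST -i TESTQ -f testq.unload   # browse TESTQ and unload the messages to a text file (-I would get them destructively)
qload -m TEST -o TESTQ -f testq.unload   # later: load the saved messages back onto TESTQ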

Resources