Azure Storage Queue very slow from a worker role in the cloud, but not from my machine - performance

I'm running a very simple test against queues in the real Azure Storage and, for reasons I can't explain, the test runs considerably faster from my computer than from the worker role deployed to Azure. I'm not using Dev Storage when I test locally; my .cscfg has the connection string to the real storage account.
The storage account and the roles are in the same affinity group.
The test consists of a web role and a worker role. The page tells the worker which test to run, then the worker runs it and returns the time consumed. This specific test measures how long it takes to get 1000 messages from an Azure Queue in batches of 32 messages. First I run it in debug from VS; then I deploy the app to Azure and run it from there.
The results are:
From my computer: 34805.6495 ms.
From Azure role: 7956828.2851 ms.
That would mean it is faster to access queues from outside Azure than from inside, which doesn't make sense.
I'm testing like this:
private TestResult InQueueScopeDo(String test, Guid id, Int64 itemCount)
{
CloudStorageAccount account = CloudStorageAccount.Parse(_connectionString);
CloudQueueClient client = account.CreateCloudQueueClient();
CloudQueue queue = client.GetQueueReference(Guid.NewGuid().ToString());
try
{
queue.Create();
PreTestExecute(itemCount, queue);
List<Int64> times = new List<Int64>();
Stopwatch sw = new Stopwatch();
for (Int64 i = 0; i < itemCount; i++)
{
sw.Start();
Boolean valid = ItemTest(i, itemCount, queue);
sw.Stop();
if (valid)
times.Add(sw.ElapsedTicks);
sw.Reset();
}
return new TestResult(id, test + " with " + itemCount.ToString() + " elements", TimeSpan.FromTicks(times.Min()).TotalMilliseconds,
TimeSpan.FromTicks(times.Max()).TotalMilliseconds,
TimeSpan.FromTicks((Int64)Math.Round(times.Average())).TotalMilliseconds);
}
finally
{
queue.Delete();
}
return null;
}
The PreTestExecute puts the 1000 items on the queue with 2048 bytes each.
And this is what happens in the ItemTest method for this test:
Boolean done = false;
public override bool ItemTest(long itemCurrent, long itemCount, CloudQueue queue)
{
if (done)
return false;
CloudQueueMessage[] messages = null;
while ((messages = queue.GetMessages((Int32)itemCount).ToArray()).Any())
{
foreach (var m in messages)
queue.DeleteMessage(m);
}
done = true;
return true;
}
I don't know what I'm doing wrong: same code, same connection string, and I get these results.
Any idea?
UPDATE:
The problem seems to be in the way I calculate it.
I replaced times.Add(sw.ElapsedTicks); with times.Add(sw.ElapsedMilliseconds); and replaced this block:
return new TestResult(id, test + " with " + itemCount.ToString() + " elements",
TimeSpan.FromTicks(times.Min()).TotalMilliseconds,
TimeSpan.FromTicks(times.Max()).TotalMilliseconds,
TimeSpan.FromTicks((Int64)Math.Round(times.Average())).TotalMilliseconds);
with this one:
return new TestResult(id, test + " with " + itemCount.ToString() + " elements",
times.Min(),times.Max(),times.Average());
And now the results are similar, so apparently there is a difference in how the precision is handled or something. I will research this later on.

The problem was apparently caused by the different nature of Stopwatch ticks and TimeSpan ticks, as discussed here.
Stopwatch.ElapsedTicks Property
Stopwatch ticks are different from DateTime.Ticks. Each tick in the DateTime.Ticks value represents one 100-nanosecond interval. Each tick in the ElapsedTicks value represents the time interval equal to 1 second divided by the Frequency.
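To make the difference concrete, here is a small arithmetic sketch. It is written in JavaScript purely to illustrate the unit conversion and does not call any .NET API; the function names and the sample frequencies are made up for the example. It shows that reading a Stopwatch tick count as if it were a 100-nanosecond tick only gives the right answer when the timer frequency happens to be 10,000,000 ticks per second. With any other frequency, the same elapsed time comes out scaled by a constant factor, which would be consistent with two machines reporting wildly different numbers for similar work.

```javascript
// Illustrative arithmetic only: these helpers are hypothetical, not .NET APIs.
const TIMESPAN_TICKS_PER_SECOND = 10000000; // one TimeSpan/DateTime tick = 100 ns

// Correct conversion: Stopwatch ticks are 1/frequency seconds each.
function stopwatchTicksToMs(elapsedTicks, frequency) {
  return (elapsedTicks / frequency) * 1000;
}

// What TimeSpan.FromTicks(sw.ElapsedTicks) effectively does:
// it assumes every tick is 100 ns regardless of the timer frequency.
function misreadAsTimeSpanMs(elapsedTicks) {
  return (elapsedTicks / TIMESPAN_TICKS_PER_SECOND) * 1000;
}

// One real second measured on timers with different (made-up) frequencies.
for (const frequency of [10000000, 2500000, 2000000000]) {
  const ticksForOneSecond = frequency; // by definition of frequency
  console.log(
    'frequency=' + frequency,
    'correct ms=' + stopwatchTicksToMs(ticksForOneSecond, frequency),
    'misread ms=' + misreadAsTimeSpanMs(ticksForOneSecond)
  );
}
```

In .NET itself, sw.Elapsed (a TimeSpan) and sw.ElapsedMilliseconds already apply the frequency, which is why switching to ElapsedMilliseconds in the UPDATE above produced comparable results.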

How is your CPU utilization? Is it possible that your code is spiking the CPU and your workstation is much faster than your Azure node?

IBM MQ tuning for transferring a large number of files

I have a project to transfer files using IBM MQ. There are 10000 clients and one data center. The largest file is almost 8MB. The MQ cluster contains three MQ managers, each on a different Windows server. Each MQ manager has 5 channels for clients and 5 channels for the data center. There are two test cases, and in each one the clients are evenly distributed across the MQ managers. The most important requirement is that no file is lost.
Case 1:
Every client sends 50 files to the data center at the same time. The file sizes are between 150KB and 5MB.
In this case, the total size of the files each client sends is almost 80MB.
Case 2 :
The data center sends 10 identical files to every client at the same time. In this case, I create a topic named `myTopic` and the 10000 clients subscribe to it. The data center publishes the 10 identical files to the topic.
The MQ managers are under heavy load. I have already set some attributes in IBM MQ:
Queue Manager:
Max handles: 100000
Maximum message length: 100MB
Max channels: 10000
Is there any attribute that could increase the performance?
5/11 update:
First, I have modified the setup of case 2 above. I have a data center server with a 4-core CPU and 32G RAM. I use 4 client servers to simulate 10000 clients, and each client server has a 4-core CPU and 16G RAM.
In case 1, it takes about 37 minutes when 1000 clients send files to the data center. The data center server runs out of memory when it receives files from 2000 clients. I found that 20G of memory is used for buffer/cache. Here is the Java code I use to receive files:
try {
String filePath = ConfigReader.getInstance().getConfig("filePath");
MQMessage mqMsg = new MQMessage();
mqMsg.messageId = CMQC.MQMI_NONE;
mqMsg.correlationId = CMQC.MQCI_NONE;
mqMsg.groupId = CMQC.MQGI_NONE;
int flag = 1;
while (true) {
try {
MQQueueManager queueManager = new MQQueueManager("QMGR1");
int option = CMQC.MQTOPIC_OPEN_AS_SUBSCRIPTION | CMQC.MQSO_DURABLE;
MQTopic subscriber = queueManager.accessTopic("", "myTopic", option, null, "datacenter");
subscriber.get(mqMsg);
if (mqMsg.getDataLength() != 0) {
String fileName = filePath + "_file" + flag + ".txt";
byte[] b = new byte[mqMsg.getDataLength()];
mqMsg.readFully(b);
System.out.println("Receive " + fileName + ", complete time: " + System.currentTimeMillis());
Path path = Paths.get(fileName);
System.out.println("Write " + fileName + ", start time: " + System.currentTimeMillis());
Files.write(path, b);
System.out.println("Write " + fileName + ", complete time: " + System.currentTimeMillis());
flag++;
}
} catch (MQException e) {
// e.printStackTrace();
if (e.reasonCode != 2033) {
e.printStackTrace();
}
} finally {
mqMsg.clearMessage();
mqMsg.messageId = CMQC.MQMI_NONE;
mqMsg.correlationId = CMQC.MQCI_NONE;
mqMsg.groupId = CMQC.MQGI_NONE;
}
}
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
I use a byte array to read each message and write it to disk. Is it possible that the byte arrays are not released and take up 20G of memory?
In case 2, I find that if I send a 5MB file to myTopic, which has 1000 subscribers on MQ manager01, MQ manager01 takes a long time to sync with the other cluster members. The disks on the MQ servers are very busy. There is another problem: sometimes it takes only 7 seconds to send a 5MB file, and sometimes it takes 90 seconds. Here is my Java code to send files:
try {
MQQueueManager queueManager = new MQQueueManager("QMGR1");
MQTopic publisher = queueManager.accessTopic("myTopic", "", CMQC.MQTOPIC_OPEN_AS_PUBLICATION,
CMQC.MQOO_OUTPUT);
System.out.println("---- start publish , time: " + System.currentTimeMillis() + " ----");
publisher.put(InMemoryDataProvider.getInstance().getMessage("my5MBFile"));
System.out.println("---- end publish , time: " + System.currentTimeMillis() + " ----");
publish.getPublisher().close();
} catch (MQException e) {
System.out.println("threadNum: " + publish.getThreadNo() + " publish error");
if (e.reasonCode != 2033) {
e.printStackTrace();
}
}
A couple of things.
MQ has FTE, which transfers files for you. I think it uses non-persistent messages, so you avoid the disk overhead.
You might try checking your .ini files for parameters like ClntRcvBuffSize=0 (see here). A value of 0 means use the operating system's values.
TCP used to send data in short chunks (64KB), wait until the packets had been acknowledged, and then send more. If the connection is reliable, you get higher throughput by sending bigger logical packets, a technique known as Dynamic Right Sizing. See here.
It works best when the connection is long-lived and sending a lot of data. For example, the first few chunks may be 64KB, then the size increases a bit to 128KB, and eventually up to 100MB (or more) if needed.
You need to set both ends.
Depending on the platform, you can use the ss command (the netstat replacement) to display the various window sizes.
For your QM-to-QM channels, specify a large batchsz and batchlim, though this may make your disk I/O worse as the data reaches the remote end faster.

Hyperledger Composer performance (adding assets) is very low

The following code is a simple test of how many entities can be added per second or minute.
createAsset calls the backend (http://localhost:3000) and adds data using POST.
When I ran the test with this code, it took 23 seconds to add 10 entities.
I am using Composer 0.19.12 and Fabric 1.1. According to some GitHub threads, performance improved with CouchDB indexing. How can I use that feature? (I need to check again, but it seems to be a default feature of recent Composer versions.)
addEntities: async function() {
var start = 0;
var end = start + 100;
var sd = new Date();
console.log(sd.getHours()+':'+sd.getMinutes()+':'+sd.getSeconds()+'.'+sd.getMilliseconds());
for(var i = start; i<end; i++) {
entityData.id = i.toString();
await this.createAsset('/Entity', 'model.Entity', entityData);
}
var ed = new Date();
var totalTime = new Date(ed.getTime()-sd.getTime());
console.log(totalTime.getMinutes()+':'+totalTime.getSeconds()+'.'+totalTime.getMilliseconds());
},
My model is really simple as follows.
asset Entity identified by id {
o String id
}
Following david_k's advice, I changed the test code to send multiple transactions as follows.
addEntities: async function() {
var start = 15000;
var dataNumber = 1200;
var loopNumber = 400;
var end = start + dataNumber;
var sd = new Date();
console.log(sd.getHours()+':'+sd.getMinutes()+':'+sd.getSeconds()+'.'+sd.getMilliseconds());
var tasks = [];
for(var i = start; i<end; i++) {
entityData.id = i.toString();
if((i-start)%loopNumber === loopNumber - 1) {
await this.createAsset('/Entity', 'model.Entity', entityData);
console.log('--- i: ' + i + ' loops completed');
}
else {
this.createAsset('/Entity', 'model.Entity', entityData);
}
}
var ed = new Date();
var totalTime = new Date(ed.getTime()-sd.getTime());
console.log(totalTime.getMinutes()+':'+totalTime.getSeconds()+'.'+totalTime.getMilliseconds());
},
The purpose of the change is to send multiple requests at the same time, and it seems to work because performance is much better than with the previous code. However, throughput is still only around 8 TPS. The original test code managed 1 transaction every 2 to 3 seconds, so this is a big improvement, but 8 TPS is nowhere near usable for a commercial application, or even for testing. Could someone give some advice?
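For reference, the "send several requests at once" idea can also be expressed with Promise.all, so that every request in a batch is awaited and a failed request surfaces as a rejection instead of being silently dropped. This is only a sketch under assumptions: createAsset below is a hypothetical stand-in that resolves after a short timer (the real one POSTs to the REST server), and addEntitiesInBatches and its parameters are names invented for the example. Each request gets its own entity object, so the sketch does not depend on when createAsset reads the shared entityData.

```javascript
// Hypothetical stand-in for the real createAsset, which POSTs to the REST server.
function createAsset(path, type, data) {
  return new Promise((resolve) => setTimeout(() => resolve(data.id), 5));
}

// Submit `count` entities starting at id `start`, `batchSize` requests at a time.
async function addEntitiesInBatches(start, count, batchSize, create = createAsset) {
  const createdIds = [];
  for (let offset = 0; offset < count; offset += batchSize) {
    const batchEnd = Math.min(offset + batchSize, count);
    const batch = [];
    for (let i = offset; i < batchEnd; i++) {
      // A fresh object per request: in-flight requests never share mutable state.
      const entity = { id: (start + i).toString() };
      batch.push(create('/Entity', 'model.Entity', entity));
    }
    // Wait for the whole batch; if any request rejects, the error is thrown here.
    const ids = await Promise.all(batch);
    createdIds.push(...ids);
  }
  return createdIds;
}
```

A call such as addEntitiesInBatches(15000, 1200, 400) would mirror the numbers used in the test above; the batch size is the knob to experiment with.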
That sounds about right given your example code. I am assuming you are using one of two networks: the fabric-dev-servers package, a very simple Fabric network meant to help users get started developing a business network and trying it out on Hyperledger Fabric, or the byfn network from the multi-org tutorial, a Hyperledger Fabric example of a two-organisation consortium that demonstrates the operational steps Composer requires in a multi-org Fabric setup.
Hyperledger Fabric is a distributed ledger technology based around eventual consistency. Composer implements a submit/notify model: once a transaction has been submitted, the client is notified when that transaction has been committed to the ledger. You can configure which peers in the network should notify you, but the default is all of them, so the REST server responds only once all peers have committed the transaction to the ledger.
Hyperledger Fabric doesn't commit individual transactions. It batches them into blocks, and those blocks get committed to the ledger. It waits a period of time before building a block from the transactions that have been submitted for ordering, so a block can contain one or more transactions. You need to configure how Fabric batches transactions into blocks (the orderer's BatchTimeout and BatchSize settings) to suit your use case.
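As a rough back-of-the-envelope model of why block batching caps throughput, consider a client that keeps a fixed number of requests in flight and waits for each to be committed. If fewer transactions are pending than the orderer's maximum per block, the block is only cut when the batch timeout expires, so throughput is roughly the number of in-flight requests divided by the timeout. The function name and the sample values below are assumptions for illustration, and the model ignores endorsement, network, and commit time entirely.

```javascript
// Rough model, not Fabric code: estimates throughput when blocks are cut by timeout.
// inFlight:        requests the client keeps outstanding while waiting for commits
// batchTimeoutSec: how long the orderer waits before cutting a block
// maxMessageCount: maximum number of transactions per block
function timeoutBoundTps(inFlight, batchTimeoutSec, maxMessageCount) {
  if (inFlight >= maxMessageCount) {
    // Blocks fill up before the timeout; this simple model no longer applies.
    return null;
  }
  // Each block carries `inFlight` transactions and takes one timeout to be cut.
  return inFlight / batchTimeoutSec;
}

// Illustrative values only: a 2 second timeout and at most 10 transactions per block.
console.log(timeoutBoundTps(1, 2, 10)); // 0.5, one awaited request at a time
console.log(timeoutBoundTps(8, 2, 10)); // 4
```

With one awaited request at a time and a timeout of a couple of seconds, the model gives about one transaction every two seconds, the same order of magnitude as the sequential test in the question.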

Communication between website and windows

This is a general question about how to approach a problem.
For my technical degree I would like to build a website application that connects to a Windows machine, sends a request to PowerShell (e.g. Get-Process), and displays the result on the website.
I'm not sure whether PowerShell Web Access can be modified like that. Is there any other solution,
such as a service that I could communicate with?
-mateusz
You can use PowerShell runspaces. This is an example, but in your case you might have to change it to suit the authentication methods you have to use...
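// Requires: using System.Collections.ObjectModel; using System.Management.Automation; using System.Management.Automation.Runspaces;
PSCredential credential = new PSCredential(user, secure_pw);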
WSManConnectionInfo connectionInfo = new WSManConnectionInfo();
connectionInfo.AuthenticationMechanism = AuthenticationMechanism.Credssp;
connectionInfo.ProxyAuthentication = AuthenticationMechanism.Negotiate;
connectionInfo.OperationTimeout = 4 * 60 * 1000; // 4 minutes.
connectionInfo.OpenTimeout = 1 * 60 * 1000; // 1 minute.
connectionInfo.Credential = credential;
Runspace rs = RunspaceFactory.CreateRunspace(connectionInfo);
rs.Open();
using (PowerShell PowerShellInstance = PowerShell.Create())
{
string hostname = "my-host";
PowerShellInstance.Runspace = rs;
PowerShellInstance.AddScript("param([string]$hostname) Get-Process -ComputerName $hostname");
PowerShellInstance.AddParameter("hostname", hostname);
// invoke execution on the pipeline (collecting output)
Collection<PSObject> PSOutput = PowerShellInstance.Invoke();
// do something with the errors found.
if (PowerShellInstance.Streams.Error.Count > 0)
{
foreach (var error in PowerShellInstance.Streams.Error)
{
Console.WriteLine(error.Exception.Message);
}
}
}
rs.Dispose();
If you do it this way, I recommend doing a bit of research on PowerShellInstance.AddScript vs PowerShellInstance.AddCommand and how the parameters have to be handled.

Temboo call hangs Arduino

I am using an Arduino Uno with the Desloo W5100 Ethernet shield. Whenever I try to make calls to Parse using Temboo, the device just hangs. Sometimes for minutes...sometimes indefinitely. Here is what I run:
void updateParseDoorState() {
if (!ENABLE_DOOR_STATE_PUSHES) {
Serial.println("Door state pushing disabled. Skipping.");
return;
}
Serial.println("Pushing door state to database...");
TembooChoreo UpdateObjectChoreo(client);
// Invoke the Temboo client
UpdateObjectChoreo.begin();
// Set Temboo account credentials
UpdateObjectChoreo.setAccountName(TEMBOO_ACCOUNT);
UpdateObjectChoreo.setAppKeyName(TEMBOO_APP_KEY_NAME);
UpdateObjectChoreo.setAppKey(TEMBOO_APP_KEY);
// Set profile to use for execution
UpdateObjectChoreo.setProfile("ParseAccount");
// Set Choreo inputs
String ObjectIDValue = "xxxxxxxxxx";
UpdateObjectChoreo.addInput("ObjectID", ObjectIDValue);
String ClassNameValue = "DoorState";
UpdateObjectChoreo.addInput("ClassName", ClassNameValue);
String ObjectContentsValue = (currentState == OPEN) ? "{\"isOpen\":true}" : "{\"isOpen\":false}";
UpdateObjectChoreo.addInput("ObjectContents", ObjectContentsValue);
// Identify the Choreo to run
UpdateObjectChoreo.setChoreo("/Library/Parse/Objects/UpdateObject");
// Run the Choreo; when results are available, print them to serial
int returnStatus = UpdateObjectChoreo.run();
if (returnStatus != 0){
setEthernetIndicator(EthernetStatus::SERVICES_DISCONNECTED);
Serial.print("Temboo error: "); Serial.println(returnStatus);
// read the name of the next output item
String returnResultName = UpdateObjectChoreo.readStringUntil('\x1F');
returnResultName.trim(); // use “trim” to get rid of newlines
Serial.print("Return result name: "); Serial.println(returnResultName);
// read the value of the next output item
String returnResultData = UpdateObjectChoreo.readStringUntil('\x1E');
returnResultData.trim(); // use “trim” to get rid of newlines
Serial.print("Return result data: "); Serial.println(returnResultData);
}
/*while(UpdateObjectChoreo.available()) {
char c = UpdateObjectChoreo.read();
Serial.print(c);
}*/
UpdateObjectChoreo.close();
Serial.println("Pushed door state to database!");
Serial.println("Waiting 30s to avoid overloading Temboo...");
delay(30000);
}
I get this in the serial monitor:
Current state:6666ÿ &‰ SP S P U WR SR R PR P 66Temboo error: 223
This indicates some type of HTTP error, but I never get to print what the error is, because the serial monitor is stuck there forever and eventually disconnects.
I work at Temboo.
It sounds like you might be running out of memory on your board (a common occurrence on resource-constrained hardware like Arduino). You can find our tutorial on how to conserve memory usage while using Temboo here:
https://temboo.com/hardware/profiles
Feel free to get in touch with Temboo Support at any time if you have further questions - we're always available and happy to help.

Redis / RabbitMQ - Pub / Sub - Performances

I wrote a little test for a simple scenario:
One publisher and one subscriber
The publisher sends 1000000 messages
The subscriber receives the 1000000 messages
First test with RabbitMQ, fanout exchange, RabbitMQ node type RAM: 320 seconds
Second test with Redis, basic pub/sub: 24 seconds
Am I missing something? Why such a difference? Is this a configuration problem or something else?
First scenario: one node.js process for the subscriber and one for the publisher, each with one connection to RabbitMQ through the amqp node module.
Second scenario: one node.js process for the subscriber and one for the publisher, each with one connection to Redis.
Any help understanding this is welcome. I can share the code if needed.
I'm pretty new to all of this.
What I need is a high-performance pub/sub messaging system. I'd like to have clustering capabilities.
To run my test, I just launch the RabbitMQ server (default configuration) and use the following:
Publisher.js
var sys = require('sys');
var amqp = require('amqp');
var nb_messages = process.argv[2];
var connection = amqp.createConnection({url: 'amqp://guest:guest@localhost:5672'});
connection.addListener('ready', function () {
exchangeName = 'myexchange';
var start = end = null;
var exchange = connection.exchange(exchangeName, {type: 'fanout'}, function(exchange){
start = (new Date()).getTime();
for(i=1; i <= nb_messages; i++){
if (i%1000 == 0){
console.log("x");
}
exchange.publish("", "hello");
}
end = (new Date()).getTime();
console.log("Publishing duration: "+((end-start)/1000)+" sec");
process.exit(0);
});
});
Subscriber.js
var sys = require('sys');
var amqp = require('amqp');
var nb_messages = process.argv[2];
var connection = amqp.createConnection({url: 'amqp://guest:guest@localhost:5672'});
connection.addListener('ready', function () {
exchangeName = 'myexchange';
queueName = 'myqueue'+Math.random();
var queue = connection.queue(queueName, function (queue) {
queue.bind(exchangeName, "");
queue.start = false;
queue.nb_messages = 0;
queue.subscribe(function (message) {
if (!queue.start){
queue.start = (new Date()).getTime();
}
queue.nb_messages++;
if (queue.nb_messages % 1000 == 0){
console.log('+');
}
if (queue.nb_messages >= nb_messages){
queue.end = (new Date()).getTime();
console.log("Ending at "+queue.end);
console.log("Receive duration: "+((queue.end - queue.start)/1000));
process.exit(0);
}
});
});
});
Check to ensure that:
Your RabbitMQ queue is not configured as persistent (since that would require disk writes for each message)
Your prefetch count on the subscriber side is 0
You are not using transactions or publisher confirms
There are other things which could be tuned, but without knowing the details of your test it's hard to guess. I would just make sure that you are comparing "apples to apples".
Most messaging products can be made to go as fast as humanly possible at the expense of various guarantees (like delivery assurance, etc) so make sure you understand your application's requirements first. If your only requirement is for data to get shoveled from point A to point B and you can tolerate the loss of some messages, pretty much every messaging system out there can do that, and do it well. The harder part is figuring out what you need beyond raw speed, and tuning to meet those requirements as well.
