How can you configure a Beckhoff program to work with a variable number of physical IO blocks? - twincat

At work we are building an airline machine. It is a machine which holds bicycle frames and it has several stations.
Depending on how many stations there are, the number of physical IO blocks on the EtherCAT bus may differ. This may differ per customer.
The number of stations can be entered via a user interface, so the Beckhoff can calculate how much IO should be present... in theory, that is.
We would like one single program for this machine which works even if not all IO is present on the EtherCAT bus, but we do not know how.
We have found out about conditional pragmas, but that is our last resort.

This is possible to achieve. I've worked on a project where parts of the EtherCAT topology were changing every minute.
You achieve this with a combination of EtherCAT couplers/junctions with identity switches (such as the EK1101-0010) and the Hot Connect functionality of EtherCAT. Depending on your real-time requirements and how quickly you need to be able to switch the EtherCAT fieldbus slaves, you might also want to consider Fast Hot Connect.
Using the above, you can change your hardware configuration at runtime.

I don't think it is possible to change the number of IO links while the program is executing. Whenever a change is made to the IO links, you have to reactivate the configuration.
As you mentioned, you can use conditional pragmas in combination with TcLinkTo attributes to change the IO links.

Related

Vulkan multiple logical devices from one physical device consequences

When creating a logical device, we can specify multiple queues for one queue family index, as we can see in VkDeviceQueueCreateInfo, and we can pass an array of this struct in VkDeviceCreateInfo.
So my first idea for creating, for instance, a transfer queue and a graphics queue is to use the same logical device with two different VkDeviceQueueCreateInfo structs at device creation.
But can I create two logical devices from the same physical device, each with a different VkDeviceQueueCreateInfo (one for graphics, and one for transfer)?
And if yes, what would be the benefits or drawbacks of one solution versus the other?
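For concreteness, the single-device variant might look roughly like this. This is only a sketch: the queue family indices 0 and 1 are placeholders for illustration, and real code would query vkGetPhysicalDeviceQueueFamilyProperties to find a graphics-capable and a transfer-capable family first.
```c
#include <vulkan/vulkan.h>

/* Sketch: one logical device with a graphics queue and a transfer queue.
 * The family indices (0 and 1) are assumed for illustration only. */
VkDevice create_device_with_two_queues(VkPhysicalDevice physical,
                                       VkQueue *graphics, VkQueue *transfer)
{
    float priority = 1.0f;

    VkDeviceQueueCreateInfo queues[2] = {
        { .sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,
          .queueFamilyIndex = 0,  /* assumed graphics-capable family */
          .queueCount = 1,
          .pQueuePriorities = &priority },
        { .sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,
          .queueFamilyIndex = 1,  /* assumed transfer-capable family */
          .queueCount = 1,
          .pQueuePriorities = &priority },
    };

    VkDeviceCreateInfo info = {
        .sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO,
        .queueCreateInfoCount = 2,
        .pQueueCreateInfos = queues,
    };

    VkDevice device = VK_NULL_HANDLE;
    if (vkCreateDevice(physical, &info, NULL, &device) != VK_SUCCESS)
        return VK_NULL_HANDLE;

    vkGetDeviceQueue(device, 0, 0, graphics); /* family 0, queue index 0 */
    vkGetDeviceQueue(device, 1, 0, transfer); /* family 1, queue index 0 */
    return device;
}
```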
Generally speaking, when assessing possible ways to do things in Vulkan, you should pick the way that seems to require the least stuff.
In this case, you're trying to select between multiple queues and multiple devices. Well, the multi-queue method obviously requires less stuff; you have one device with many queues (in theory), rather than many devices with many queues (one from each device). Same number of queues, but more devices. So pick the one with less.
The Vulkan API is not trying to trick you into taking the slower path. If using multiple devices with one queue per-device were the best option, then Vulkan wouldn't have multiple queues as an option at all.
To get into more detail, you say that you want to do memory transfers outside of graphics operations. OK, fine.
Physical devices do not have to provide multiple queues. Some devices provide exactly one queue family, which can have exactly one VkQueue created from it. Obviously if a device allows for multiple queues, you should use them. But if it only allows for one, then you might have reason to think that you should just create multiple devices and work that way.
Even in this case, do not do this.
Here's the thing: if a GPU could actually do multiple operations independently such that they overlap... they'd expose multiple queues. The fact that a physical device does not do so therefore suggests that independent execution of different operations simply is not possible at the GPU level.
This means that, even if you use multiple devices, the transfer and graphics operation will almost certainly execute in some order. That is, whichever vkQueueSubmit is issued first will be the one that does its work first.
So using multiple devices gives you no actual GPU execution overlap (in theory). You've gained nothing, and you've lost explicit control over the order in which these operations are issued.
Now, it may be that the execution of a transfer operation on the graphics queue will not inhibit the execution of rendering commands on that same queue. That is, transfer operations can start, then rendering commands can start while the transfer completes via DMA or something. So they start executing in order, but finish executing in any order.
Even if that's the case, working across devices doesn't give you any advantages here. As previously noted, you lose control over the order in which these commands are submitted. Graphics commands tend to hog the command queue, while a single transfer command could (on such a system) be processed and then execute in the background while processing unrelated commands. In such a case, it's important to send any transfer commands before graphics commands for a particular frame.
And if you have two devices, you have to have 2 vkQueueSubmit calls rather than one. And vkQueueSubmit calls are not known for being fast.
There are a host of other reasons not to try multi-device stuff for this. For example, if later rendering operations need access to the transferred data, this means you need external memory and external synchronization primitives to synchronize access between the devices. And so on.

Synchronizing a counter across a network

I have two computers that can talk to each other over a serial connection. The connection is made over a wireless network. There is a variable, changing delay in communications between the two systems. On both systems I have a counter running that increments by 1 every ms. They both start as soon as the applications start. Say each computer is started at a different time. How can I, over the serial connection, synchronize the counters so that systemA.counter will equal systemB.counter, and so that both counters increment at the same time (or as close as possible)?
Ideally once synchronized the counters would drift only slowly apart so that once every 3 or 4 thousand incs I could re-synchronize.
I'm looking for good resources on the topic, example algorithms, example code (C/C++), anything to point me in the right direction.
Update
This is a closed system, no internet. For all intents and purposes there is no real protocol at all, besides an open serial line over the wireless link. That link at the moment is Bluetooth, but I'm thinking of moving it to a ZigBee mesh. There are currently 2 nodes, but if I had 30 nodes all running this same application, I would want them all to synchronize. There is no client/server designation, just a couple of devices running the same program with a counter. I don't have access to anything like time, just this counter that increments once a millisecond and whatever algorithm I can put in place.
Once I get this working, I would like to put in place a positioning and mapping system, but to figure out distances between nodes I need accurate timing synchronized across the devices.
If you use these counters to order events in a system, you should look at vector clocks or Lamport timestamps.
The obvious resource is NTP, which is documented for example at http://www.eecis.udel.edu/~mills/ntp.html and the links off there. Basically, it uses timestamped message exchanges to adjust the frequency at which the local clocks run. The protocol has been around for years and has been the subject of continuous research - I can't see a single set of slides there that immediately makes it clear how it works. You might be better off seeing if there is already an NTP implementation available than trying to re-implement it yourself.
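The core of the exchange (timestamp a request, timestamp the reply, assume the delay is symmetric) can be sketched in a few lines of C. Note this is only a sketch: counter_now(), serial_send_request() and serial_recv_reply() are hypothetical helpers standing in for whatever your platform provides.
```c
#include <stdint.h>

extern uint32_t counter_now(void);          /* local 1 ms tick counter   */
extern void serial_send_request(void);      /* ask the peer for its count */
extern uint32_t serial_recv_reply(void);    /* blocks until reply arrives */

/* Returns the estimated offset to add to the local counter so that it
 * matches the peer's. Repeat periodically and keep the sample with the
 * smallest round trip, since that one has the least queuing delay. */
int32_t estimate_offset(void)
{
    uint32_t t0 = counter_now();            /* request leaves us          */
    serial_send_request();
    uint32_t remote = serial_recv_reply();  /* peer's counter at reply    */
    uint32_t t1 = counter_now();            /* reply arrives              */

    /* Assume the one-way delay is half the round trip: the peer read
     * "remote" at roughly local time (t0 + t1) / 2. */
    uint32_t midpoint = t0 + (t1 - t0) / 2;
    return (int32_t)(remote - midpoint);
}
```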
It appears (e.g. from searching) that there is a small industry of people working on time synchronisation algorithms, especially in the context of wireless sensor networks. One jumping-off point, apart from searches, is the survey paper at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.85.2012 - Time synchronization in sensor networks: A survey (2004)

What's the fastest way to copy a folder over the network to multiple servers (Python)?

As the title says, what I would like to accomplish is this: given a package (the size may vary between 500 MB and 1 GB), I would like to copy it to around 40 servers at the same time (concurrently). I've been using a script that runs one copy at a time, so I'm considering these possibilities:
1. Use the multiprocessing library and create a single process for each copy so that they can run concurrently - although I think I might end up with an I/O bottleneck, and processes cannot share the same data.
2. I'm not using a single internet connection, but a huge corporate WAN.
Can anyone tell me whether there is any other more effective (faster) way to achieve the same thing? Or some other way to solve it? (I can run this task from a 2-core workstation.)
1) I have no experience with this, but it looks like a fit for your use case:
http://code.google.com/p/pysendfile/
sendfile(2) is a system call which provides a "zero-copy" way of copying data from one file descriptor to another (a socket). The phrase "zero-copy" refers to the fact that all of the copying of data between the two descriptors is done entirely by the kernel, with no copying of data into userspace buffers. This is particularly useful when sending a file over a socket (e.g. FTP).
and
When do you want to use it?
Basically any application sending files over the network can take advantage of sendfile(2).
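For illustration, here is what using the underlying sendfile(2) syscall looks like from C (a minimal sketch; it assumes sock_fd is an already-connected TCP socket, and error handling is trimmed for brevity):
```c
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Stream a whole file to a connected socket with sendfile(2) (Linux). */
int send_whole_file(int sock_fd, const char *path)
{
    int file_fd = open(path, O_RDONLY);
    if (file_fd < 0)
        return -1;

    struct stat st;
    fstat(file_fd, &st);

    off_t offset = 0;
    while (offset < st.st_size) {
        /* The kernel copies directly from the page cache to the socket;
         * the data never makes a round trip through a userspace buffer. */
        ssize_t sent = sendfile(sock_fd, file_fd, &offset,
                                st.st_size - offset);
        if (sent <= 0) {
            close(file_fd);
            return -1;
        }
    }
    close(file_fd);
    return 0;
}
```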
2) Another option would be to use a torrent library. I recently learned (skip to 31:00 for the torrent stuff) that Facebook distributes its daily software updates via torrent (and updates 1000s of servers with 1.5 GB binaries within 15 min or so).
Assume your machines have 1 Gbit connections. You'll get 800 Mbit/s if you're lucky/work at it, and it'll take ~10 s to copy each 1 GByte, so 6-7 minutes to update those machines one after another. If that's good enough, the only thing you need to do is work on using the 1 Gbit efficiently to hit that target (what are you seeing from your current scripts? OK, 1 Gbit may be ambitious on a WAN, but you can do a similar analysis). Multiprocessing might or might not help here... but it's not going to magically get you more bandwidth.
If it's not good enough, I'd consider either:
Going P2P (see miku's answer): as soon as one machine has a bit of the data, it can share it with other machines using its own bandwidth. How much this helps depends to some extent on your network topology (the existence of other bottleneck points).
Looking into multicast, if the network is enough under your control that you can get the stuff routed appropriately (this seems pretty unlikely for a WAN, but maybe one day in an IPv6 wonderland...). Instead of copying the same data 40 times (assuming it is the same each time), you just broadcast it once and all the receivers pick it up simultaneously. Multicast UDP isn't reliable (it's intended more for IPTV, I think), but there have been attempts to build reliable file transfer tools using multicast tech, e.g. OpenPGM and MS's own implementation.
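For a flavor of the send side, here is a minimal C sketch of a multicast datagram; the group address 239.0.0.1 and port 5000 are arbitrary examples, and reliability (retransmits, ordering) is exactly the part that tools like OpenPGM add on top:
```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);

    struct sockaddr_in group;
    memset(&group, 0, sizeof(group));
    group.sin_family = AF_INET;
    group.sin_addr.s_addr = inet_addr("239.0.0.1"); /* example group addr */
    group.sin_port = htons(5000);                   /* example port       */

    /* One send; every receiver subscribed to the group gets a copy. */
    const char payload[] = "one send, every subscribed receiver gets it";
    sendto(sock, payload, sizeof(payload), 0,
           (struct sockaddr *)&group, sizeof(group));

    close(sock);
    return 0;
}
```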

How to do performance and scalability testing without clear requirements?

Any idea how to do performance and scalability testing if no clear performance requirements have been defined?
More information about my application.
The application has 3 components. One component can only run on Linux; the other two components are Java programs, so they can run on Linux/Windows/Mac... The 3 components can be deployed to one box, or each component can be deployed to its own box. Deployment is very flexible. The Linux-only component will capture raw TCP/IP packets over the network; one Java component will then get that raw data from it, assemble it into the data end users will need, and output it to hard disk as data files. The last Java component will upload data from the data files to my database in batches.
In the absence of 'must be able to perform X iterations within Y seconds...' type requirements, how about these kinds of things:
Does it take twice as long for twice the size of dataset? (yes = good)
Does it take 10x as long for twice the size of dataset? (yes = bad)
Is it CPU bound?
Is it RAM bound (e.g. lots of swapping to virtual memory)?
Is it IO / Disk bound?
Is there a certain data-set size at which performance suddenly falls off a cliff?
Surprisingly this is how most perf and scalability tests start.
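As an illustration of probing the first two questions, a crude scaling check might look like this minimal C sketch, where process_dataset is a stand-in for the real code under test:
```c
#include <stdio.h>
#include <stddef.h>
#include <time.h>

/* Stand-in workload for demonstration: replace with the code under test. */
static volatile unsigned long sink;
static void process_dataset(size_t n_items)
{
    for (size_t i = 0; i < n_items; i++)
        sink += i;
}

int main(void)
{
    /* Double the dataset size each round and time the workload. */
    for (size_t n = 1000; n <= 64000; n *= 2) {
        struct timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);
        process_dataset(n);
        clock_gettime(CLOCK_MONOTONIC, &end);

        double secs = (end.tv_sec - start.tv_sec)
                    + (end.tv_nsec - start.tv_nsec) / 1e9;
        /* Time roughly doubling per row suggests linear scaling;
         * quadrupling suggests something like O(n^2). */
        printf("%8zu items: %.6f s\n", n, secs);
    }
    return 0;
}
```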
You can clearly do the testing without criteria; you just define the tests and measure the results. I think your question is more along the lines of 'how can I establish test passing criteria without performance requirements?'. Actually, this is not at all uncommon. Many new projects have no clear criteria established. Informally it would be something like 'if it cannot do X per second, we failed'. But once you pass X per second (and you'd better!), is X the 'pass' criteria? Usually not. What happens is that you establish a new baseline, and your performance tests guard against regression: you compare your current numbers with the best you got and decide if the new build is 'acceptable' as a build-validation pass (usually orgs will settle here at something like 70-80% as acceptable, open perf bugs, and make sure that by ship time you get back to 90-95% or 100%+). So basically the performance tests themselves become their own requirement.
Scalability is a bit more complicated, because there is no upper limit. The scope of your test should be to find out where the product breaks. Throw enough load at anything and eventually it will break. You need to know where that limit is and, very importantly, find out how your product breaks. Does it give a nice error message and revert, or does it spill its guts on the floor?
Define your own. Take the initiative and describe the performance goals yourself.
To answer any better, we'd have to know more about your project.
If there has been 'no performance requirement defined', then why are you even testing this?
If there is a performance requirement defined, but it is 'vague', can you indicate in what way it is vague, so that we can better help you?
Short of that, start from the 'vague' requirement, and pick a reasonable target that at least in your opinion meets or exceeds the vague requirement, then go back to the customer and get them to confirm that your clarification meets their requirements and ideally get formal sign-off on that.
Some definitions / assumptions:
Performance = how quickly the application responds to user input, e.g. web page load times
Scalability = how many peak concurrent users the application can handle.
First, performance. Performance testing can be quite simple, such as measuring and recording page load times in a development environment and using techniques like application profiling to identify and fix bottlenecks.
Load. To execute a load test there are four key factors; you will need to get all of these in place to be successful.
1. Good usage models of how users will use your site and/or application. This can be easy if the application is already in use, but it can be extremely difficult if you are launching something new, e.g. a Facebook application.
If you can't get targets as requirements, do some research and make some educated assumptions, document and circulate them for feedback.
2. Tools. You need performance testing scripts and tools that can execute the scenarios defined in step 1, with the number of expected users from step 1. (This can be quite expensive.)
3. Environment. You will need a production-like environment that is isolated, so your tests can produce reproducible results. (This can also be very expensive.)
4. Technical experts. Once the application and environment start breaking, you will need to be able to identify the faults and re-configure the environment and/or re-code the application once faults are found.
Generally most projects have a "performance testing" box that they need to tick because of some past failure; however, they never plan or budget to do it properly. I normally recommend either budgeting for and doing scalability testing properly, or saving your money and not doing it at all. Trying to half do it on the cheap is a waste of time.
However any good developer should be able to do performance testing on their local machine and get some good benefits.
rely on tools (fxcop comes to mind)
rely on common sense
If you want to test performance and scalability with no requirements, then you should create your own requirements/specs that can be met within the timeline/deadline given to you. After defining said requirements, you should run them by your supervisor and see if he/she agrees.
To test scalability (assuming you're testing a program/website):
Create lots of users and data and check if your system and database can handle it. MyISAM table type in MySQL can get the job done.
To test performance:
Optimize the code, check it over a slow internet connection, etc.
Short answer: Don't do it!
In order to get a (better) definition, write a performance test concept that you can discuss with the experts who should define the requirements.
Make assumptions for everything you don't know and document these assumptions explicitly. Assumptions comprise everything that may be relevant to your system's behaviour under load. Correct assumptions will be approved by the experts, incorrect ones will provoke reactions.
For all of those who have read Tom DeMarco's latest book (Adrenaline Junkies ...): this is the strawman pattern. Most people who are not willing to write a specification from scratch will not hesitate to give feedback on your document. Because you need to guess several times when writing your version, you need to be prepared to be laughed at when it is reviewed. But at least you will have better information.
The way I usually approach problems like this is just to get a real or simulated realistic workload and make the program go as fast as possible, within reason. Then if it can't handle the load I need to think about faster hardware, doing parts of the job in parallel, etc.
The performance tuning is in two parts.
Part 1 is the synchronous part, where I tune each "thread", under realistic workload, until it really has little room for improvement.
Part 2 is the asynchronous part, and it is hard work, but it needs to be done. For each "thread" I extract a time-stamped log file of when each message was sent, when each message was received, and when each received message was acted upon. I merge these logs into a common timeline of events. Then I go through all of it, or randomly selected parts, and trace the flow of messages between processes. I want to identify, for each message sequence, what its purpose is (i.e. is it truly necessary), and whether there are delays between the time of receipt and the time of processing, and if so, why.
I've found in this way I can "cut out the fat", and asynchronous processes can run very quickly.
Then if they don't meet requirements, whatever they are, it's not like the software can do any better. It will either take hardware or a fundamental redesign.
Although no clear performance and scalability goals are defined, we can use the high-level description of the three components you mention to derive general performance/scalability goals.
Component 1: This seems like a network I/O bound component, so you can use any available network load simulator to generate various workloads to saturate the link. Scalability can be measured by varying the workload (10 MB, 100 MB, 1000 MB links) and measuring the response time, or more precisely, the delay associated with receiving the raw data. You can also measure the working set of the component's box to get a realistic idea of your server requirements (how much extra memory is needed to receive X more workload of packets, etc.).
Component 2: This component has 2 parts: an I/O bound part (receiving data from Component 1) and a CPU bound part (assembling the packets). You can look at the problem as a whole; make sure to saturate your link when you want to measure the CPU bound part. If it is a multi-threaded component, you can look for ways to improve it if you don't get 100% CPU utilization. You can measure the time required to assemble X messages, and from this calculate the average wait time to process a message; this can be used later to derive the general performance characteristics of your system and provide an SLA for your users (you are going to guarantee a response time within X milliseconds, for example).
Component 3: Completely I/O bound, depending on both your hard disk bandwidth and the back-end database server you use. However, you can measure how much you saturate disk I/O to optimize throughput, and how many I/O operations you require to read X MB of data, and improve around these parameters.
Hope that helps.
Thanks

What simple method can I use to debug an embedded processor without serial port or video?

We have a small embedded system without any video or serial ports (i.e. we can't output text via printf).
We would like to track the progress of our code through the initialization sequence.
Are there some simple things we can do to help with this?
It is not running any OS, and the hardware platform is somewhat customizable.
The simplest, most scalable solution is state LEDs. Toggle LEDs based on actions, either in binary form or when certain actions occur, if you can narrow your focus.
The most powerful will be a hardware JTAG device. You don't even need to set breakpoints - simply being able to stop the application and inspect the state of memory may be enough. Note that some hardware platforms do not support "fancy" options such as memory watches or hardware breakpoints. The former is usually worked around with constantly stopping the processor and reading memory (turns your 10MHz system into a 1kHz system), while the latter is sometimes performed using code replacement (replace the targeted instruction with a different jump), which sometimes masks other problems. Be aware of these issues and which embedded processors they apply to.
There are a few strategies you can employ to help with debugging:
If you have Output Pins available, you can hook them up to LEDs (or an oscilloscope) and toggle the output pins high/low to indicate that certain points have been reached in the code.
For example, 1 blink might be program loaded, 2 blinks foozbar initialized, 3 blinks accepting inputs...
If you have multiple output lines available, you can use a 7 segment LED to convey more information (numbers/letters instead of blinks).
If you have the ability to read memory and have some RAM available, you can use the sprintf function to do printf-like debugging, but instead of going to a screen/serial port, the output is written to memory.
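A minimal C sketch of that idea, assuming a JTAG probe or memory-read tool can dump log_buf later (the buffer size and names are arbitrary examples):
```c
#include <stdarg.h>
#include <stdio.h>

#define LOG_SIZE 1024           /* arbitrary example size */

static char log_buf[LOG_SIZE];  /* dump this region with your debug tool */
static volatile unsigned log_pos;

void debug_log(const char *fmt, ...)
{
    char line[64];
    va_list args;
    va_start(args, fmt);
    int n = vsnprintf(line, sizeof(line), fmt, args);
    va_end(args);

    if (n < 0)
        return;
    if (n >= (int)sizeof(line))
        n = sizeof(line) - 1;   /* message was truncated */

    /* Circular buffer: when full, the oldest messages are overwritten,
     * so a memory dump always shows the most recent activity. */
    for (int i = 0; i < n; i++) {
        log_buf[log_pos] = line[i];
        log_pos = (log_pos + 1) % LOG_SIZE;
    }
}
```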
It depends on the type of debugging that you're trying to do - in particular if you're after a temporary method of tracing or if you are trying to provide a tool that can be used as an indication of status during the life of the project (or product).
For one-off, in-depth source tracing and debugging, an in-circuit debugger (e.g. JTAG) can be very helpful. However, they are most helpful where your debugging requires setting breakpoints and investigating memory and registers - which makes them of little benefit where you are dealing with time-critical problems.
Where you need to determine program state without having a significant impact on the execution of your system the use of LEDs connected to spare I/O pins will be helpful. These can also be used as the input to a digital storage oscilloscope (DSO) or logic analyzer.
This technique can be made more powerful by selecting unique patterns of pulses that will be identifiable on the DSO.
For a more versatile debugging tool, though, a serial port is a good solution. To save cost and PCB real estate, you may find it useful to use a plug-in module that contains the RS232 converters.
If you are trying to provide a longer-term indication of status as part of the normal operation of your product, LEDs are again a cheap and simple method. However, in this situation it is best to choose patterns of pulses that are slow enough to be easily identified by visual inspection. This way, over time you will learn the particular pattern that represents "normal" behavior.
You can easily emulate serial communications (UARTs) using bit-banging from the IO pins of the system. Hook it to one of the card's pins and attach an RS232 converter there (TTL to RS232 converters are easy to either buy or build), which goes to your PC's serial port.
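A minimal C sketch of the transmit side; gpio_write() and delay_one_bit() are hypothetical board-support helpers (at 9600 baud, one bit period is about 104 microseconds):
```c
#include <stdint.h>

extern void gpio_write(int level);  /* drive the TX pin high/low */
extern void delay_one_bit(void);    /* busy-wait one bit period  */

/* Transmit one byte as 8N1: start bit, 8 data bits LSB first, stop bit. */
void uart_tx_byte(uint8_t byte)
{
    gpio_write(0);                  /* start bit */
    delay_one_bit();

    for (int i = 0; i < 8; i++) {   /* data bits, LSB first */
        gpio_write((byte >> i) & 1);
        delay_one_bit();
    }

    gpio_write(1);                  /* stop bit (line idles high) */
    delay_one_bit();
}
```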
A JTAG debugger is also an option, though cumbersome to set up.
If you don't have JTAG, the LEDs suggested by the others are a great idea - although you do tend to end up in a test/rebuild cycle to try to track down the issue.
If you've got more time, and spare hardware pins, and memory to spare, you could always bit-bash a low speed serial interface. I've found that pretty useful in the past.
Others have suggested some pretty good ideas using output pins, so I won't suggest that, although it can be a very good solution, and is very cost effective. If your budget and target processor support it, a hardware trace system, (either an old fashioned emulator, or a fancy BDM with bus snooping trace support) can be great for this type of thing. It's very expensive though.
The idea of using a bit-banged software UART is nice, but there's some effort required in writing one and also you need some free timers and interrupts. If your hardware has any other unused serial interface (SPI, I2C, ..), using them would be easier. With a small microcontroller you could convert the interface to RS-232.
If you have to go for bit-banging, making it a synchronous serial interface might be a simpler alternative, as it wouldn't be as timing-critical.
