I just figured out that I have compiled and programmed my AT32UC3B0512 project using the AT32UC3B0256 as the target device.
My application seems to work without problems. Is that possible? What are the differences between the AT32UC3B0512 and AT32UC3B0256 (besides flash and RAM size)?
In most cases the program memory (flash) size is the only difference.
You selected a smaller target than the chip you actually have,
so the only effect is that the toolchain limits your program size to less than what the real chip can hold.
The functionality is not affected at all (tested for quite some time on the L0, A0 and A3 series).
The only thing you have to be careful with on UC3 chips is pin incompatibility between packages and series:
for example, TQFP is very different from BGA,
and the same package with a different pin count is also incompatible.
Likewise, you cannot swap a UC3A0 for a UC3A3.
The last two or three digits indicate only the memory size
and mostly do not affect SW/HW compatibility.
[Note]
#SergioFormiggini is right: the AT32UC3B0256 does not have the DAC.
It is the first time I have seen, on Atmel chips, a hardware difference between parts that differ only in memory size.
Unless they also changed the memory map and/or GPIO mappings, you should be fine.
Related
I found two statements in the Cortex-M3 guide (the red book):
1. Cortex-M3 supports both little and big endianness.
2. After reset, endianness cannot be changed dynamically.
So is it indirectly saying that the endianness setting should be changed in the reset handler? Is that so?
If yes, then how do I change the endianness? That is, which register do I need to configure, and where do I configure it (in the reset handler or in an exception handler)?
I know it is not actually a good idea to change endianness,
but still, out of curiosity, I wanted to see whether the Cortex-M3 really supports both endiannesses or not.
The Cortex-M architecture can be configured to support either big-endian or little-endian operation.
However, a specific Cortex-M implementation can only support one endianness -- it's hard-wired into the silicon, and cannot be changed. Every implementation I'm aware of has chosen little-endian.
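If you just want to confirm at run time which way a given part was wired, ARMv7-M exposes a read-only ENDIANNESS bit in the AIRCR register of the System Control Block; a minimal bare-metal sketch (plain C, no vendor header assumed):

    #include <stdint.h>

    /* AIRCR lives at 0xE000ED0C on ARMv7-M; bit 15 (ENDIANNESS) is
     * read-only and reflects how the core was configured at reset. */
    #define SCB_AIRCR (*(volatile uint32_t *)0xE000ED0CUL)

    static int core_is_big_endian(void)
    {
        return (int)((SCB_AIRCR >> 15) & 1u);  /* 1 = big-endian (BE-8), 0 = little-endian */
    }

On the implementations mentioned above (all little-endian) this will read back 0.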
You need to be reading the ARM documentation directly. The Technical Reference Manual touches on things like this. If you actually had the source to the Cortex-M3 when building it into a chip, then you would see the outer layers and/or config options that you can touch.
From the Cortex-M3 TRM:
SETEND always faults. A configuration pin selects Cortex-M3
endianness.
And then we find it in another place:
The processor contains a configuration pin, BIGEND, that enables you
to select either the little-endian or BE-8 big-endian format. This
configuration pin is sampled on reset. You cannot change endianness
when out of reset.
Technically it would be possible to build a chip where you could choose: it could be designed with an external strap connected to BIGEND; it could use a fuse or some other non-volatile setting that you change before popping reset on the ARM core; or some other processor or logic could manage booting of the ARM core, and you would talk to or program that before releasing reset on the ARM core.
In general it is a bad idea to go against the grain on the default endianness for an architecture, ARM in particular now that there are two flavors, with the later one (BE-8) being more painful than BE-32. Granted, there are other toolchains than gcc, but even with those, the vast majority of users and the vast majority of the indirect testing is in the native (little-endian) mode. I would even wonder how thoroughly tested the big-endian logic is: does anyone outside ARM's design verification actually push that mode? Did they test it hard enough?
Have you tried actually building big-endian Cortex-M3 code? Since the Cortex-M uses a 16-bit instruction set (with Thumb-2 extensions), how does that interact with BE-8? With BE-8 on a full-sized ARM running ARM instructions, the 32-bit data swaps but the 32-bit instructions do not. Perhaps this is in the TRM and I should read more, but does it work the same way on the Cortex-M: the 16-bit instructions do not swap but the data does? What about on a full-sized ARM running Thumb instructions? And does the toolchain match what the hardware expects?
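For what it's worth, GNU tools do let you try: arm-none-eabi-gcc accepts -mbig-endian (and the linker -EB), but you also need big-endian multilibs and a C library built the same way, which most prebuilt toolchains do not ship; that in itself is a hint of how rarely this path gets exercised.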
BTW, what that implies is that there is a signal named BIGEND in the logic you interface with when you are building a chip around the Cortex-M3, and you can go into that logic and change the default setting for BIGEND (I assume they provide one), or, as mentioned above, you can add logic in your chip to make it a runtime option rather than a compile-time one.
Can anyone tell me how I should configure Xilinx ISE to get the fastest FPGA programming speed?
I have a Spartan 3 Starter Board (the FPGA chip is an XC3S200). I'm not sure what the programming cable is called, but I plug one side into my computer's LPT1 (parallel port) and the other side into the JTAG header on the Spartan 3 board. Currently it takes over a minute to program, and my project isn't that big (it's only beginner stuff).
Also, I've noticed that I now need to turn on the "Create ReadBack Data Files" and "Create Mask File" options. If they aren't checked, I can't program the FPGA.
Can anyone tell me what the best Xilinx ISE options for FPGA programming are?
Thanks :)
P.S. I should mention that I am running Xilinx ISE inside a Windows XP virtual machine. Could that be what is causing the speed problem? If it is, can you at least tell me how to get rid of that Mask file? Before, I didn't create it and everything worked ...
The full-device bitstream (the "file" you load onto the FPGA) for any given device is always the same size; it does not depend on the functionality. An "empty" design's bitstream will be the same size as one that uses most of the FPGA's resources.
Configuration time depends on the configuration mode (serial, parallel 8/16/32 bits) and the configuration hardware (FPGA, PROM, Flash, programming cable capabilities, settings, and operating frequencies).
It sounds like you are using the very old parallel cable (PC-III?); try getting hold of the more recent USB ones and you should get faster programming times for JTAG (which is serial, btw).
1 minute is a looong time for a device that small.
Is that including a certain amount of compile time, or does iMPACT take a long time to load on your system?
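As a rough sanity check (assuming the XC3S200's full bitstream of roughly 1 Mbit, per the Spartan-3 data sheet): even at a 1 MHz JTAG clock the raw shift time is on the order of a second, so most of that minute is almost certainly overhead in iMPACT, the parallel-port driver, or the virtual machine rather than the bitstream itself.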
Anyway, speed improvements can come from:
Bitstream compression - you can turn this option on to produce a smaller bitstream, which loads faster
Faster configuration clock (change this in the configuration software, e.g. iMPACT) - no idea how fast the parallel cable will go, but I can do 12 MHz with my USB cable on most boards I've tried, and 6 MHz on some that weren't quite so well designed.
Try to see if bit-banging is possible. Haven't tried this myself but you might want to read this thread.
http://forums.xilinx.com/t5/Silicon-Devices-Others/FX2-Micrcontroller-JTAG-FPGA-configuration/td-p/41602
RRS
Is it possible to program multiple PIC microcontrollers using only one PICkit 2 programmer? The microcontrollers are connected in a daisy chain, with PGC, PGD and MCLR of the PIC to be programmed connected to GPIOs of the programming PIC.
I may be wrong, but I do not think this will work well, as MPLAB X will want to read back the written data to verify that the programming operation succeeded.
Alternatively, have you considered using PICkit 3s in their "independent of a computer" mode? The PICkit 3 can be configured to burn a specific program into a target PIC without a computer attached. I am wondering if having an "army" of these might address your issues.
I don't believe so. Just for fun, after finding this question I took two 12F508s that were known to be good.
To prove that they were good, I used IPE to load a previously tested program onto both devices. The devices worked as expected. I then used IPE's "fill memory" tool to program both devices to all empty (every address holds 0x00), except for the oscillator calibration memory location (I've had trouble with that area in the past, so I always disable reading and writing to that location).
I then connected both chips up to the programmer in parallel and tried to program them with the same program. This is where everything went horribly awry.
For some reason, the programmer got confused and wrote a value of 0xFF to all addresses, including the out-of-range addresses. I verified that this was what actually happened by disconnecting the chips from the circuit and reading them independently.
Luckily for me, I had run into this problem repeatedly before, and so had built a programmer out of an Arduino and some extra circuitry, so that I can ignore the stupid "oscillator calibration data invalid" error and reprogram that location with the correct instruction. It takes a long time to read and write memory, but it saves otherwise bricked chips.
In shorter words: No, this does not work, and it may actually "brick" your chips.
I seem to be under the impression that FPGAs can be updated while the chip is running, and I need to know whether that is correct.
From what I've read, it seems that you can change the FPGA netlist on demand, the same way you can change the program that's running on a processor. Yes, I know that an FPGA is not a processor.
Is my assumption correct, and if not then how come?
Most of the time, you load the configuration for the entire FPGA in one go, and all logic stops running during the reconfiguration process.
It sounds like you want to reload a subset of the FPGA, while the remainder continues running. You would need a device with special support for partial reconfiguration. There's more information on Wikipedia.
==> EDIT: I stand corrected: EETimes article on partial reconfiguration
You will generally need to reset the FPGA so that it can be reprogrammed.
At a system level, reconfiguration is possible. You can have a software application running on a PC or embedded system that reprograms the FPGA as needed. Depending on the application or software license, you can program different FPGA designs easily. You cannot, however, significantly alter the design structure, such as I/Os, logic cells, DSP configs, memory blocks, etc.
FPGAs have a bunch of logic cells that need to be initialized by a stream of configuration bits. This stream of bits usually comes from a flash chip located off the device, although some devices have the flash memory on-board.
Partial Reconfiguration means the ability to configure just some of the logic cells while the rest are in use. This is specific to particular models.
Total reconfiguration is possible even if your device doesn't support it - you would need to reprogram the flash chip and then issue a Reset or reload command when done.
Some devices have more than one configuration image in the configuration flash. The device will load the first image, and if it doesn't like it, it will load the second (or a subsequent) image. This can be for redundancy, or for different feature sets.
Some of the SOC FPGAs (like Xilinx Zynq) use the microprocessor core to load the FPGA. In this case, the microprocessor core can change the FPGA as much as it wants while running.
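As a concrete illustration of that last point (a minimal sketch, assuming a Zynq-7000 running Xilinx's Linux with the legacy xdevcfg driver and a hypothetical bitstream path; newer kernels use the FPGA Manager framework instead):

    #include <stdio.h>
    #include <stdlib.h>

    /* Reprogram the Zynq PL from Linux user space by streaming a
     * bitstream (converted to .bin) into the devcfg character device. */
    int main(void)
    {
        FILE *in  = fopen("/lib/firmware/design.bit.bin", "rb");  /* hypothetical path */
        FILE *out = fopen("/dev/xdevcfg", "wb");
        char buf[4096];
        size_t n;

        if (!in || !out) {
            perror("open");
            return EXIT_FAILURE;
        }
        while ((n = fread(buf, 1, sizeof buf, in)) > 0)
            fwrite(buf, 1, n, out);   /* the driver pushes this into the PL */

        fclose(in);
        fclose(out);
        return EXIT_SUCCESS;
    }

The processor keeps running throughout; only the programmable logic is reloaded.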
Yes I know that an FPGA is not a processor.
An FPGA is a type of processor, but it is not a type of CPU.
Most FPGAs only have volatile storage so you have to update them whilst they're on. This doesn't mean that you can change their operation any time you want. That's dynamic reconfiguration and only supported by a subset of FPGAs.
I have a CUDA application that works fine on one computer (with a GTX 275) and runs about 100 times slower on another, with a GeForce 8400. My suspicion is that there is some kind of fallback that makes the code actually run on the CPU rather than on the GPU.
Is there a way to actually make sure that the code is running on the GPU?
Is this fallback documented somewhere?
What conditions may trigger it?
EDIT: The code is compiled for compute capability 1.1, which is what the 8400 has.
Couldn't it just be that the gap in performance is that large? This link indicates that the 8400 operates at 22-62 GFLOPS, and this link indicates that the GTX 275 operates at 1010.88 GFLOPS.
There are a number of possible reasons for this.
Presumably you're not using the emulation device. Can you run the device query sample from the SDK? That will show if you have the toolkit and driver installed correctly.
You can also query the device properties from within your app to check what device you are attached to.
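For example, a minimal sketch using the CUDA runtime API (error handling omitted):

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        int dev = 0;
        struct cudaDeviceProp prop;

        cudaGetDevice(&dev);                  /* device the runtime will use */
        cudaGetDeviceProperties(&prop, dev);

        printf("Using device %d: %s (compute capability %d.%d)\n",
               dev, prop.name, prop.major, prop.minor);
        return 0;
    }

If this prints something other than your GeForce 8400, the runtime is not using the device you think it is.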
The 8400 is much lower performance than the GTX275, so it could be real, but see the next point.
One of the major changes in going from compute capability 1.1 to 1.2 and beyond is the way memory accesses are handled. In 1.1 you have to be very careful not only to coalesce your memory accesses but also to make sure that each half-warp is aligned; otherwise each thread will issue its own 32-byte transaction. In 1.2 and beyond, alignment is not such an issue, as the hardware degrades gracefully to minimise transactions.
This, combined with the lower performance of the 8400, could also account for what you are seeing.
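To put a rough number on that: with compute capability 1.1, a misaligned half-warp of 16 threads each reading a 4-byte word issues 16 separate 32-byte transactions (512 bytes moved), where an aligned, coalesced access would be a single 64-byte transaction, roughly an 8x waste of bandwidth before the 8400's lower raw throughput even comes into play.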
If I remember correctly, you can list all available devices (and choose which device to use for your kernel) from the host code. You could try to determine whether the available device is the software emulation device and issue a warning.
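Something along these lines (a sketch; the check assumes the old toolkits' behaviour of reporting the CPU emulation pseudo-device with compute capability 9999.9999):

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        int count = 0;
        cudaGetDeviceCount(&count);

        for (int i = 0; i < count; ++i) {
            struct cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            printf("Device %d: %s (%d.%d)\n", i, prop.name, prop.major, prop.minor);

            /* Old CUDA toolkits listed a CPU emulation pseudo-device as 9999.9999. */
            if (prop.major == 9999)
                fprintf(stderr, "Warning: device %d is emulation, not a real GPU\n", i);
        }

        cudaSetDevice(0);   /* pick a device explicitly instead of relying on the default */
        return 0;
    }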