How does the output register data path work in the 6502?

How does the output register data path work in the 6502? - processor

I am currently developing a subset of the 6502 in LogiSim and at the current stage I am determining which parts to implement and what can be cut out. One of my main resources is Hanson's Block Diagram.
I am currently trying to determine how exactly the output register and its data path works. In this diagram, it looks to me like the data output register goes back onto the bus through the Input Data Latch, but also back into the instruction register.
This confuses me because usually the Address lines to the right of the diagram are sent back into the program memory (not pictured) and not back onto the bus as pictured.
How exactly does this data path work? As a follow up, Is it possible to simplify this area to only take the output and send it to a display instead of back into the processor as pictured?

This confuses me because usually the Address lines to the right of the diagram are sent back into the program memory (not pictured) and not back onto the bus as pictured.
The address bus works differently from the data bus. The address bus is always Output, but the data bus can be Input or Output. We say that the databus is tristate; it either reads, or writes, or does neither. Each pin d0 thru d7 has a simple circuit involving a couple of transistors that controls this. In the case of the 6502, each and every cycle the CPU is either reading something or writing something. In other words, from the 6502's point of view, every cycle is either a read or write cycle.
I am currently trying to determine how exactly the output register and its data path works.
Have a look: the Input Data Latch and Predecode Register are loaded with each φ2. But the Output Data Register is loaded with each φ1. φ1 and φ2 are the two phases of the CPU clock. This arrangement leaves enough time for, say a value to pass from the Input Data Latch, through the ALU, and into the Output Data Register for example.
The Data Output Register's output goes to the Data Bus Tristate Buffers. As you can see, that is controlled by R/W and also by φ2. If it's a read cycle, nothing happens there. So if it's a write cycle, that means the value in the Data Output Register (which was loaded with the previous φ1) is going to be put onto the databus. It also will get loaded into the Predecode Register and into the Input Data Latch.
In this diagram, it looks to me like the data output register goes back onto the bus through the Input Data Latch, but also back into the instruction register.
Absolutely. Anything that the CPU outputs could also get loaded into the Input Data Latch and the Predecode Register. But that doesn't matter, since an instruction will always start with a read cycle, which is the opcode fetch, so the Input Data Latch and the Predecode Register will get overwritten then with the proper value.

Related

Proper way to change state on a state machine in VHDL

I'm working on a FPGA project where I need to read data from an image sensor. This sensor has different image modes (like test pattern, frame, binning, etc.) and in order to change image mode I need to look for specific signals before writing into the registers.
I have inherited some code that I need to fix since the image sensor sometimes gets stuck when we change image mode.
Concerning the change of image mode, a state machine is used.
The following piece of code shows how the registers for changing mode are currently written.
Essentially, when we want to change mode, we need to wait that the signal MODE_SIG_HIGH becomes high before writing into the registers. Then, when this condition happens, we check what mode we want to set. For example, to set set test pattern, we check if bit S2 is set. Then we performs all the operations to actually change mode (line 10).
01. ...
02. WHEN MODE_SIG_HIGH =>
03. NEXT_ST <= MODE_SIG_HIGH;
04. ...
05. IF S2 = '1' THEN
06. -- configure the sensor to
07. NEXT_ST <= CONFIGURE_TEST_PATTERN;
08. END IF;
09. ...
10. WHEN CONFIGURE_TEST_PATTERN =>
11. ...
I'm having a debate with a friend of mine concerning what is the best way to change state when a new event happens. The above solution doesn't seem right to me.
As far as I understood, when we enter a sate, all the instructions contained in that state are performed in parallel. Therefore, concerning the above piece of code, when we enter the state MODE_SIG_HIGH the instruction at line 03 is executed in parallel to the IF condition. My point is that if the bit S2 is set to 1, the IF condition is true and we end up assigning the value CONFIGURE_TEST_PATTERN to the NEXT_ST. An this ends up in assigning two different values to the same variable (in parallel), in line 03 and in line 07. Am I right or am I missing some basic behavior? The reason for having the instruction at line 3 is because after we enter MODE_SIG_HIGH, it could take some clock cycles before we see on of the mode bits set.

As far as I understood, when we enter a sate, all the instructions
contained in that state are performed in parallel.
Not quite. The only things in VHDL which are concurrent ('performed in parallel') are:
processes
concurrent signal assignments
component instantiations
concurrent procedure calls
concurrent assertions (inc.PSL)
generates
blocks
The code inside a process or subprogram (function/prodedure) executes sequentially. This is where you do your conventional programming, using sequential statements (ie. nothing in the list above). These are your standard control constructs (if, case, loop, etc), sequential signal assignments, and so on. If you carry out signal (or variable) assignments in a sequential region, the last one wins, just like a conventional programming language. There are scheduling rules that make this happen, but you don't need to know about those (yet!)

what happens during context switch between two processes in linux?

Let's say process p1 is executing with its own address space(stack,heap,text). When context switch happens, i understand that all the current cpu registers are pushed into PCB before loading process p2. Then TLB is flushed and loaded with p2 address mapping and starts executing with its own address spaces.
What i would like know is the state of p1 address space. Will it be copied to disk and updates its page table before loading process p2?

The specifics of a context switch depend upon the underlying hardware. However, context switches are basically the same, even among different system.
The mistake you have is " i understand that all the current cpu registers are pushed into stack before loading process p2". The registers are stored in an area of memory that is usually called the PROCESS CONTEXT BLOCK (or PCB) whose structure is defined by the processor. Most processors have instructions for loading and saving the process context (i.e., its registers) into this structure. In the case of Intel, this can require multiple instructions saving to multiple blocks because of all the different register sets (e.g. FPU, MMX).
The outgoing process does not have to be written to disk. It may paged out if the system needs more memory but it is possible that it could stay entirely in memory and be ready to execute.
A context switch is simply the exchange of one processor's saved register values for another's.

Linux Driver and API architecture for a data acquisition device

We're trying to write a driver/API for a custom data acquisition device, which captures several "channels" of data. For the sake of discussion, let's assume this is a several-channel video capture device. The device is connected to the system via an 8xPCIe Gen-1 link, which has a theoretical throughput of 16Gbps. Our actual data rate will be around 2.8Gbps (~350MB/sec).
Because of the data rate requirement, we think we have to be careful about the driver/API architecture. We've already implemented a descriptor based DMA mechanism and the associated driver. For example, we can start a DMA transaction for 256KB from the device and it completes successfully. However, in this implementation we're only capturing the data in the kernel driver, and then dropping it and we aren't streaming the data to the user-space at all. Essentially, this is just a small DMA test implementation.
We think we have to separate the problem into three sections: 1. Kernel driver 2. Userspace API 3. User Code
The acquisition device has a register in the PCIe address space which indicates whether there is data to read for any channel from the device. So, our kernel driver must poll for this bit-vector. When the kernel driver sees this bit set, it starts a DMA transaction. The user application however does not need to know about all these DMA transactions and data, until an entire chunk of data is ready (For example, assume that the device provides us with 16 lines of video data per transaction, but we need to notify the user only when the entire video frame is ready). We need to only transfer entire frames to the user application.
Here was our first attempt:
Our user-side API allows a user application to register a function callback for a "channel".
The user-side API has a "start" function, which can be called by the user application, which uses ioctl to send a start message to the kernel driver.
In the kernel driver, upon receiving the start message, we started a kernel thread, which continuously monitors the "data ready" bit-vector, and when it sees new data, copies it over to a driver-allocated (kmalloc) buffer. It keeps doing this until the size of the collected data reaches the "frame size".
At this point a custom linux SIGNAL (similar to SIGINT, SIGHUP, etc) is sent to the process which is running the driver. Our API catches this signal and then calls back the appropriate user callback function.
The user callback function calls a function in the API (transfer_data), which uses an ioctl call to send a userspace buffer address to the kernel, and the kernel completes the data transfer by doing a copy_to_user of the channel frame data to userspace.
All of the above is working OK, except that the performance is abysmal. We can only achieve about 2MB/sec of transfer rate. We need to completely re-write this and we're open to any suggestions or pointers to examples.
Other notes:
Unfortunately, we can not change anything in the hardware device. So we must poll for the "data-ready" bit and start DMA based on that bit.
Some people suggested to look at Infiniband drivers as a reference, but we're completely lost in that code.

You're probably way past this now, but if not here's my 2p.
It's hard to believe that your card can't generate interrupts when
it has transferred data. It's got a DMA engine, and it can handle
'descriptors', which are presumably elements of a scatter-gather
list. I'll assume that it can generate a PCIe 'interrupt'; YMMV.
Don't bother trawling the kernel for existing similar drivers. You
might get lucky, but I suspect not.
You need to write a blocking read, which you supply a large memory buffer to. The driver read op (a) gets gets a list of user pages for your user buffer and locks them in memory (get_user_pages); (b) creates a scatter list with pci_map_sg; (c) iterates through the list (for_each_sg); (d) for each entry writes the corresponding physical bus address and data length to the DMA controller as what I presume you're calling a 'descriptor'.
The card now has a list of descriptors which correspond to the physical bus addresses of your large user buffer. When data arrives at the card, it writes it directly into user space, into your user buffer, while your user-level read is still blocked. When it has finished the descriptor list, the card has to be able to interrupt, or it's useless. The driver responds to the interrupt and unblocks your user-level read.
And that's it. The details are nasty, of course, and poorly documented, but that should be the basic architecture. If you really haven't got interrupts you can set up a timer in the kernel to poll for completion of transfer, but if it is really a custom card you should get your money back.

Intel Pin Tool: Get instruction from address

I'm using Intel's Pin Tool to do some binary instrumentation, and was wondering if there an API to get the instruction byte code at a given address.
Something like:
instruction = getInstructionatAddr(addr);
where addr is the desired address.
I know the function Instruction (used in many of the simple/manual examples) given by Pin gets the instruction, but I need to know the instructions at other addresses. I perused the web with no avail. Any help would be appreciated!
CHEERS

wondering if there an API to get the instruction byte code at a given
address
Yes, it's possible but in a somewhat contrived way: with PIN you are usually interested in what is executed (or manipulated through the executed instructions), so everything outside the code / data flow is not of any interest for PIN.
PIN is using (and thus ships with) Intel XED which is an instruction encoder / decoder.
In your PIN installation you should have and \extra folder with two sub-directories: xed-ia32 and xed-intel64 (choose the one that suits your architecture). The main include file for XED is xed-interface.h located in the \include folder of the aforementioned directories.
In your Pintool, given any address in the virtual space of your pintooled program, use the PIN_SafeCopy function to read the program memory (and thus bytes at the given address). The advantage of PIN_SafeCopy is that it fails graciously even if it can't read the memory, and can read "shadowed" parts of the memory.
Use XED to decode the instruction bytes for you.
For an example of how to decode an instruction with XED, see the first example program.
As the small example uses an hardcoded buffer (namely itext in the example program), replace this hardcoded buffer with the destination buffer you used in PIN_SafeCopy.
Obviously, you should make sure that the memory you are reading really contains code.
AFAIK, it is not possible to get an INS type (the usual type describing an instruction in PIN) from an arbitrary address as only addresses in the code flow will "generate" an INS type.
As a side note:
I know the function Instruction (used in many of the simple/manual
examples) given by Pin gets the instruction
The Instruction routine used in many PIN example is called an "Instrumentation routine": its name is not relevant in itself.

Pin_SafeCopy may help you. This API could copy memory content from the address space of target process to one specified buffer.

Event using FTD2XX_NET.DLL

I am using a FT232RL chip with FTD2XX_NET.dll I've made a program which writes and reads data to/from AVR atmega32 mcu. First writes data, then reads data as answer.
Now, i want to make an event which indicated me if there's available unreaded data, only when AVR sends data to FTDI buffer and ONLY then. Whithout forcing my program to making loops for checking available data. For my purpose, i want to do the mcu to sends data only when he wants, and the PC must to knows when there's new data in FTDI buffer's chip.
I know that It's impossible for the pc to know when AVR sending data to the FTDI. But this which I mean it's that I need some way for my program to know if FTDI have New unreaded data to it's own buffer.
I don't won't to running read operator over and over in an infinity loop as I do now.

You should create a read thread which does your reading in the background. Then from that thread you can signal an even to notify another part of your application when you have data. I'm not sure what language you are using but you should easily be able to find an example of threading and event notification with a Google search.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio