What is an assembly routine and how to use it in Go? - go

I am reading a Go tutorial (The Way To Go), and it mentions something called an assembly routine. Can somebody please explain what that is? It also talks about functions implemented 'outside' Go, with no body... what is the purpose of that?
The quote is as follows:
To declarer[sic] a function implemented outside Go, such as an assembly routine, you simply give the name and signature, and no body:
func flushICache(begin, end uintptr) // implemented externally
I tried to search online, but it seems hard to find any tutorial about assembly routines. What are they? And what does "implemented outside Go" mean?

Please read this and follow the links there.
To cite it
Machine code or machine language is a set of instructions executed
directly by a computer's central processing unit (CPU). Each
instruction performs a very specific task, such as a load, a jump, or
an ALU operation on a unit of data in a CPU register or memory.
<…>
All practical programs today are written in higher-level languages or
assembly language.
Go's own reference documentation on its assembler support is this.

Related

How to implement new instruction in linux KVM at unused x86 opcode

As part of understanding virtualization, I am trying to extend KVM and define a new instruction. The instruction will use previously unused opcodes.
Reference: ref.x86asm.net/coder32.html
Now, let's say there is an instruction like 'CPUID' (which causes a VM exit), and I want to add a new instruction, say 'NEWCPUID', which is similar to 'CPUID' in privilege and is trapped by the hypervisor, but differs in its implementation.
After going through some online resources, I was able to understand how to define new system calls, but I am not sure about which all files in linux source code do I need to add the code for NEWCPUID? Is there a better way than only relying on 'find' command?
I am facing the following challenges:
1. Which places in the Linux source code do I need to add code to?
2. How can this new instruction be mapped to a previously unused opcode?
As I am completely new to this field and willing to learn, can someone explain to me in short how to go about this task? I need the right direction to achieve this. If there is a reference/tutorial/blog describing the process, it would be of great help!
Here are answers to some of your questions:
... but I am not sure about which all files in linux source code do I need to add the code for NEWCPUID?
A - The right place to add emulation for KVM is arch/x86/kvm/emulate.c. Take a look at how opcode_table[] is defined and the hooks to the functions that the entries execute. The basic idea is that the guest executes an undefined instruction such as "db 0xunused"; this results in an exit since the instruction is undefined. In KVM, you look at the rip from the VMCS/VMCB and determine whether it's an instruction KVM knows about (such as NEWCPUID), and then KVM calls x86_emulate_instruction().
...Is there a better way than only relying on 'find' command?
A - Yes, pick an example system call and then use a symbol cross reference such as cscope.
...n me in short how to go about this task?
A - As I mentioned in 1, first of all find a way for the guest to attempt to execute this unused opcode (such as the db trick). I think the assembler will try to reject unknown opcodes. So, that's the first step. Second, check whether your instruction causes a vmexit. For this, you can use tracing. Tracing emits a lot of output, so you have to use some filter options. If tracing is overwhelming, simply printk something in vmx_handle_exit (vmx.c). Finally, find a way to hook to your custom function from here. KVM already has handle_exception() to handle guest exceptions; that would be a good place to insert your custom function. See how this function calls emulate_instruction to emulate an exception to be injected to the guest.
I have deliberately skipped some of the questions since I consider them essential to figure out yourself in the process of learning. BTW, I think this may not be the best way to understand virtualization. A better way might be to write your own userspace hypervisor that utilizes KVM services via /dev/kvm, or maybe just a standalone hypervisor.

How to properly use Golang packages in the standard library or third-party with Goroutines?

Hi Golang programmers,
First of all I apologize if my question is not very clear initially but I'm trying to understand the proper usage pattern when writing Golang code that uses Goroutines when using the standard lib or other libraries.
Let me elaborate: Suppose I import some package that I didn't have a hand in writing and that I want to use. Let's say this package does a simple HTTP GET request to a website such as Flickr. If I want a concurrent request, I can just prefix the function call with the go keyword. But how do I know that this package doesn't already make some internal go calls itself when doing the request, therefore making my go calls redundant?
Do Golang packages typically say in the documentation that their method is "greened"? Or perhaps they provide two versions of a method, one that is green and one that is straight synchronous?
In my quest to understand Go idioms and usage patterns I feel like when using even packages in the standard lib that I can't be sure if my go commands are necessary. I suppose I can profile the calls, or write test code but that feels odd to have to figure out if a func is already "green".
I suppose another possibility is that it's up to me to study the source code of whatever I'm using and understand how it should be used and if the go keyword is necessary.
If anybody can shed some light on this or point me to the right documentation or even a Golang screencast, I'd much appreciate it. I think Rob Pike briefly mentions in one talk that a good client API written in Go is just written in a typical synchronous manner, and it's up to the caller of that API to choose whether to make it green or not.
Thanks for your time,
-Ralph
If a function / method returns some value(s), or has a side effect (like io.Reader.Read), then it's necessarily a synchronous thing. Unless documented otherwise, no safety for concurrent use by multiple goroutines should be assumed.
If it accepts a closure (callback) or a channel or if it returns a channel - then it is often an asynchronous thing. If that's the case, it's normally either obvious or explicitly documented. Asynchronous stuff like this is usually safe for concurrent use by multiple goroutines.

Writing a Ruby extension in Go (golang)

Are there some tutorials or practical lessons on how to write an extension for Ruby in Go?
Go 1.5 added support for building shared libraries that are callable from C (and thus from Ruby via FFI). This makes the process easier than in pre-1.5 releases (when it was necessary to write the C glue layer), and the Go runtime is now usable, making this actually useful in real life (goroutines and memory allocations were not possible before, as they require the Go runtime, which was not useable if Go was not the main entry point).
goFuncs.go:
package main
import "C"
//export GoAdd
func GoAdd(a, b C.int) C.int {
	return a + b
}
func main() {} // Required but ignored
Note that the //export GoAdd comment is required for each exported function; the symbol after export is how the function will be exported.
goFromRuby.rb:
require 'ffi'
module GoFuncs
  extend FFI::Library
  ffi_lib './goFuncs.so'
  attach_function :GoAdd, [:int, :int], :int
end
puts GoFuncs.GoAdd(41, 1)
The library is built with:
go build -buildmode=c-shared -o goFuncs.so goFuncs.go
Running the Ruby script produces:
42
Normally I'd try to give you a straight answer but the comments so far show there might not be one. So, hopefully this answer with a generic solution and some other possibilities will be acceptable.
One generic solution: compile high level language program into library callable from C. Wrap that for Ruby. One has to be extremely careful about integration at this point. This trick was a nice kludge to integrate many languages in the past, usually for legacy reasons. Thing is, I'm not a Go developer and I don't know that you can compile Go into something callable from C. Moving on.
Create two standalone programs: Ruby and Go program. In the programs, use a very efficient way of passing data back and forth. The extension will simply establish a connection to the Go program, send the data, wait for the result, and pass the result back into Ruby. The communication channel might be OS IPC, sockets, etc. Whatever each supports. The data format can be extremely simple if there's no security issues and you're using predefined message formats. That further boosts speed. Some of my older programs used XDR for binary format. These days, people seem to use things like JSON, Protocol Buffers and ZeroMQ style wire protocols.
Variation of second suggestion: use ZeroMQ! Or something similar. ZeroMQ is fast, robust and has bindings for both languages. It manages the whole above paragraph for you. Drawbacks are that it's less flexible wrt performance tuning and has extra stuff you don't need.
The tricky part of using two processes and passing data between them is a speed penalty. The overhead might not justify leaving Ruby. However, Go has great native performance and concurrency features that might justify coding part of an application in it versus a scripting language like Ruby. (Probably one of your justifications for your question.) So, try each of these strategies. If you get a working program that's also faster, use it. Otherwise, stick with Ruby.
Maybe a less appealing option: use something other than Go that has similar advantages, allows calls from C, and can be integrated. Although it's not very popular, Ada is a possibility. It's long been strong in native code, (restricted) concurrency, reliability, low-level support, cross-language development and IDE (GNAT). Also, Julia is a new language for high-performance technical and parallel programming that can be compiled into a library callable from C. It has a JIT too. Maybe changing the problem statement from Ruby+Go to Ruby+(more suitable language) will solve the problem?
As of Go 1.5, there's a new build mode that tells the Go compiler to output a shared library and a C header file:
-buildmode c-shared
(This is explained in more detail in this helpful tutorial: http://blog.ralch.com/tutorial/golang-sharing-libraries/)
With the new build mode, you no longer have to write a C glue layer yourself (as previously suggested in earlier responses). Once you have the shared-library and the header file, you can proceed to use FFI to call the Go-created shared library (example here: https://www.amberbit.com/blog/2014/6/12/calling-c-cpp-from-ruby/)

How to detect who's issuing a wrong kfree

I suspect a double kfree in my kernel code. Basically, I have a data structure that is kzalloced and kfreed in a module. I notice that the same address is allocated and then allocated again without being freed in the module.
I would like to know what technique I should use to find where the wrong kfree is issued.
1.
Yes, kmemleak is an excellent tool, especially suitable for system-wide analysis.
Note that if you are going to use it to analyze a kernel module, you may need to save the addresses of the ELF sections containing the code of the module (.text, .init.text, ...) when the module is loaded. This may help you decipher the call stacks in kmemleak's report. It usually makes sense to ask kmemleak to produce a report after the module has been unloaded, but kmemleak cannot resolve the addresses at that time.
While a module is loaded, the addresses of its sections can be found in the files in /sys/module/<module_name>/sections/.
After you have found the section each code address in the report belongs to and the corresponding offset into that section, you can use objdump, gdb, addr2line or a similar tool to obtain more detailed information about where the event of interest occurred.
2.
Besides that, if you are working on an x86 system and you would like to analyze a single kernel module, you can also use the KEDR LeakCheck tool.
Unlike kmemleak, most of the time, it is not required to rebuild the kernel to be able to use KEDR.
The instructions on how to build and use KEDR are here. A simple example of how LeakCheck can be used is described in "Detecting Memory Leaks" section.
Have you tried enabling the kmemleak detection code?
See Documentation/kmemleak.txt for details.

Are there any good reference implementations available for command line implementations for embedded systems?

I am aware that this is nothing new and has been done several times. But I am looking for some reference implementation (or even just reference design) as a "best practices guide". We have a real-time embedded environment and the idea is to be able to use a "debug shell" in order to invoke some commands. Example: "SomeDevice print reg xyz" will request the SomeDevice sub-system to print the value of the register named xyz.
I have a small set of routines that is essentially made up of 3 functions and a lookup table:
a function that gathers a command line - it's simple; there's no command line history or anything, just the ability to backspace or press escape to discard the whole thing. But if I thought fancier editing capabilities were needed, it wouldn't be too hard to add them here.
a function that parses a line of text argc/argv style (see Parse string into argv/argc for some ideas on this)
a function that takes the first arg on the parsed command line and looks it up in a table of commands & function pointers to determine which function to call for the command, so the command handlers just need to match the prototype:
int command_handler( int argc, char* argv[]);
Then that function is called with the appropriate argc/argv parameters.
Actually, the lookup table also has pointers to basic help text for each command, and if the command is followed by '-?' or '/?' that bit of help text is displayed. Also, if 'help' is used as a command, the command table is dumped (possibly only a subset if a parameter is passed to the 'help' command).
Sorry, I can't post the actual source - but it's pretty simple and straightforward to implement, and functional enough for pretty much all the command line handling needs I've had for embedded systems development.
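The structure described above (split the line into arguments, look up the first one in a table, call the handler) can be sketched as follows. This is written in Go purely for illustration; on a real embedded target it would more likely be C with the command_handler prototype shown above. The command name, the help text, and the dispatch function are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// handler mirrors the argc/argv-style prototype: it receives the parsed
// arguments (args[0] is the command name) and returns a status code.
type handler func(args []string) int

// command pairs a handler with its basic help text.
type command struct {
	help string
	run  handler
}

// commands is the lookup table; "echo" is a hypothetical example entry.
var commands = map[string]command{
	"echo": {
		help: "echo ARGS... : print the arguments",
		run: func(args []string) int {
			fmt.Println(strings.Join(args[1:], " "))
			return 0
		},
	},
}

// dispatch splits a line on whitespace, looks up the first word in the
// table and calls its handler. It returns -1 for an empty line or an
// unknown command, and prints the help text if the command is followed
// by -? or /?.
func dispatch(line string) int {
	args := strings.Fields(line)
	if len(args) == 0 {
		return -1
	}
	cmd, ok := commands[args[0]]
	if !ok {
		fmt.Println("unknown command:", args[0])
		return -1
	}
	if len(args) > 1 && (args[1] == "-?" || args[1] == "/?") {
		fmt.Println(cmd.help)
		return 0
	}
	return cmd.run(args)
}

func main() {
	dispatch("echo hello world") // prints hello world
	dispatch("echo -?")          // prints the help text for echo
}
```

Adding a command is then just a matter of adding one table entry, which is what keeps this design small enough for a debug shell.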
You might bristle at this response, but many years ago we did something like this for a large-scale embedded telecom system using lex/yacc (nowadays I guess it would be flex/bison, this was literally 20 years ago).
Define your grammar, define ranges for parameters, etc... and then let lex/yacc generate the code.
There is a bit of a learning curve, as opposed to rolling a 1-off custom implementation, but then you can extend the grammar, add new commands & parameters, change ranges, etc... extremely quickly.
You could check out libcli. It emulates Cisco's CLI and apparently also includes a telnet server. That might be more than you are looking for, but it might still be useful as a reference.
If your needs are quite basic, a debug menu which accepts simple keystrokes, rather than a command shell, is one way of doing this.
For registers and RAM, you could have a sub-menu which just does a memory dump on demand.
Likewise, to enable or disable individual features, you can control them via keystrokes from the main menu or sub-menus.
One way of implementing this is via a simple state machine. Each screen has a corresponding state which waits for a keystroke, and then changes state and/or updates the screen as required.
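A rough sketch of such a keystroke-driven state machine is shown below, again in Go only for illustration. The two screens, the key bindings, and the next function are hypothetical; a real menu would read keys from the serial port and perform an actual memory dump.

```go
package main

import "fmt"

// state identifies the screen currently shown.
type state int

const (
	mainMenu state = iota
	memoryMenu
)

// next returns the state reached from s when key is pressed, together
// with the text to show. Unrecognised keys leave the state unchanged.
func next(s state, key rune) (state, string) {
	switch s {
	case mainMenu:
		if key == 'm' {
			return memoryMenu, "memory menu: d=dump, q=back"
		}
	case memoryMenu:
		switch key {
		case 'd':
			return memoryMenu, "dumping memory (placeholder)"
		case 'q':
			return mainMenu, "main menu: m=memory"
		}
	}
	return s, "unrecognised key"
}

func main() {
	s := mainMenu
	// Simulate the keystrokes m, d, q in place of a real input device.
	for _, key := range "mdq" {
		var screen string
		s, screen = next(s, key)
		fmt.Println(screen)
	}
}
```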
vxWorks includes a command shell, that embeds the symbol table and implements a C expression evaluator so that you can call functions, evaluate expressions, and access global symbols at runtime. The expression evaluator supports integer and string constants.
When I worked on a project that migrated from vxWorks to embOS, I implemented the same functionality. Embedding the symbol table required a bit of gymnastics since it does not exist until after linking. I used a post-build step to parse the output of the GNU nm tool to create a symbol table as a separate load module. In an earlier version I did not embed the symbol table at all, but rather created a host-shell program that ran on the development host where the symbol table resided, and communicated with a debug stub on the target that could perform function calls to arbitrary addresses and read/write arbitrary memory. This approach is better suited to memory-constrained devices, but you have to be careful that the symbol table you are using and the code on the target are from the same build. Again, that was an idea I borrowed from vxWorks, which supports both the target- and host-based shell with the same functionality. For the host shell, vxWorks checksums the code to ensure the symbol table matches; in my case it was a manual (and error-prone) process, which is why I implemented the embedded symbol table.
Although initially I only implemented memory read/write and function call capability I later added an expression evaluator based on the algorithm (but not the code) described here. Then after that I added simple scripting capabilities in the form of if-else, while, and procedure call constructs (using a very simple non-C syntax). So if you wanted new functionality or test, you could either write a new function, or create a script (if performance was not an issue), so the functions were rather like 'built-ins' to the scripting language.
To perform the arbitrary function calls, I used a function pointer typedef that took an arbitrarily large (24) number of arguments. Using the symbol table, you find the function address, cast it to the function pointer type, and pass it the real arguments plus enough dummy arguments to make up the expected number, and thus maintain a suitable (if wasteful) stack frame.
On other systems I have implemented a Forth threaded interpreter, which is a very simple language to implement, but has a less than user friendly syntax perhaps. You could equally embed an existing solution such as Lua or Ch.
For a small, lightweight thing you could use Forth. It's easy to get going (Forth kernels are SMALL);
look at figForth, LINa and GnuForth.
Disclaimer: I don't Forth, but OpenBoot and the PCI bus do, and I've used them and they work really well.
Alternative UI's
Deploy a web server on your embedded device instead. Even serial will work with SLIP, and the UI can be reasonably complex (or even serve up a JAR and get really, really complex).
If you really need a CLI, then you can point at a link and get a telnet.
One alternative is to use a very simple binary protocol to transfer the data you need, and then make a user interface on the PC, using e.g. Python or whatever is your favourite development tool.
The advantage is that it minimises the code in the embedded device, and shifts as much of it as possible to the PC side. That's good because:
It uses up less embedded code space—much of the code is on the PC instead.
In many cases it's easier to develop a given functionality on the PC, with the PC's greater tools and resources.
It gives you more interface options. You can use just a command line interface if you want. Or, you could go for a GUI, with graphs, data logging, whatever fancy stuff you might want.
It gives you flexibility. Embedded code is harder to upgrade than PC code. You can change and improve your PC-based tool whenever you want, without having to make any changes to the embedded device.
If you want to look at variables—If your PC tool is able to read the ELF file generated by the linker, then it can find out a variable's location from the symbol table. Even better, read the DWARF debug data and know the variable's type as well. Then all you need is a "read-memory" protocol message on the embedded device to get the data, and the PC does the decoding and displaying.
