Attempting to use (SSE4) blendvpd with inline assembly in gcc

I would like to let the compiler choose registers automatically by parameterizing the inline assembly in my C code, but I'm having some trouble. Can anyone tell me what is going wrong? If I use the code that I have commented out (pinning regVal1 to %xmm0), it compiles and produces the expected result. But if I leave it commented out as written here, I get this assembler error:
/tmp/ccJxmSbm.s: Assembler messages:
/tmp/ccJxmSbm.s:81: Error: the first operand of `blendvpd' must be `%xmm0'
Also, if I do nothing other than remove the printf statement, the code block compiles successfully too. So it has something to do with moving parameters around to prepare for the printf call. I have explicitly put in the "Yz" constraint which is supposed to force the use of %xmm0, but it looks like it is not being honored.
Here is the code in question:
#include <stdio.h>
const unsigned long long myConst[2] = {0x0000000000000000,0xffffffffffffffff};
const unsigned long long myConst2[2] = {0x0000000000000000,0x1111111111111111};
const unsigned long long myConst3[2] = {0x0123456789abcdef,0x0000000000000000};
#define ASSIGN_CONST128( val, const ) \
val = *((__uint128_t *)const);
int main( void )
{
register __uint128_t regVal1 /* asm("%xmm0") */ ;
register __uint128_t regVal2;
register __uint128_t regVal3;
ASSIGN_CONST128( regVal1, myConst );
ASSIGN_CONST128( regVal2, myConst2 );
ASSIGN_CONST128( regVal3, myConst3 );
asm( "blendvpd %[mask], %[val1], %[val2]" :
[val2] "+x" (regVal3) :
[mask] "Yz" (regVal1),
[val1] "x" (regVal2) );
printf( "REGVAL1: %016llx%016llx (original=%016llx%016llx)\n"
"REGVAL2: %016llx%016llx (original=%016llx%016llx)\n"
"REGVAL3: %016llx%016llx (original=%016llx%016llx)\n",
(unsigned long long)(regVal1>>64), (unsigned long long)regVal1,
myConst[1], myConst[0],
(unsigned long long)(regVal2>>64), (unsigned long long)regVal2,
myConst2[1], myConst2[0],
(unsigned long long)(regVal3>>64), (unsigned long long)regVal3,
myConst3[1], myConst3[0] );
// Expected result:
// REGVAL1: ffffffffffffffff0000000000000000 (original=ffffffffffffffff0000000000000000)
// REGVAL2: 11111111111111110000000000000000 (original=11111111111111110000000000000000)
// REGVAL3: 11111111111111110123456789abcdef (original=00000000000000000123456789abcdef)
}
I appreciate any thoughts.

Why not just use the relevant intrinsic?
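regVal3 = _mm_blendv_pd (regVal3, regVal2, regVal1);
The argument order is (a, b, mask): each lane is taken from b where the sign bit of the mask lane is set, and from a otherwise, so the mask (regVal1) goes last. As others have noted, regVal1, regVal2 and regVal3 should all be declared as __m128d.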
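To make the lane selection concrete, here is a minimal sketch of the blend applied to the question's constants. The helper name blend_lanes is hypothetical (it is not from the original post). It uses _mm_blendv_pd when the file is compiled with SSE4.1 enabled (for example with GCC's -msse4.1) and otherwise falls back to a scalar loop that spells out the same per-lane rule.

```c
#include <stdint.h>
#ifdef __SSE4_1__
#include <smmintrin.h>   /* SSE4.1 intrinsics, including _mm_blendv_pd */
#endif

/*
 * blend_lanes is a hypothetical helper, not part of the original post.
 * For each 64-bit lane i:
 *     out[i] = (bit 63 of mask[i] is set) ? src[i] : dst[i]
 * Index 0 is the low lane, matching the myConst arrays in the question.
 */
static void blend_lanes(const uint64_t dst[2], const uint64_t src[2],
                        const uint64_t mask[2], uint64_t out[2])
{
#ifdef __SSE4_1__
    __m128d a = _mm_castsi128_pd(_mm_loadu_si128((const __m128i *)dst));
    __m128d b = _mm_castsi128_pd(_mm_loadu_si128((const __m128i *)src));
    __m128d m = _mm_castsi128_pd(_mm_loadu_si128((const __m128i *)mask));
    /* Argument order is (a, b, mask): b is chosen where the mask sign bit is set */
    __m128d r = _mm_blendv_pd(a, b, m);
    _mm_storeu_si128((__m128i *)out, _mm_castpd_si128(r));
#else
    /* Scalar fallback implementing the same per-lane selection */
    for (int i = 0; i < 2; i++)
        out[i] = (mask[i] >> 63) ? src[i] : dst[i];
#endif
}
```

Called with dst = myConst3, src = myConst2 and mask = myConst, this should reproduce the REGVAL3 value the question lists as expected: the high lane comes from myConst2 and the low lane is left untouched.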

Related

compiler segfault when printf is added (gcc 10.2 aarch64_none-elf- from arm)

I know this may not be a proper Stack Overflow question, but here it is.
This is a function in scripts/dtc/libfdt/fdt_ro.c of u-boot v2021.10.
const void *fdt_getprop_namelen(const void *fdt, int nodeoffset,
const char *name, int namelen, int *lenp)
{
int poffset;
const struct fdt_property *prop;
printf("uuu0 nodeoffset = 0x%x, name = %s, namelen = %d\n", nodeoffset, name, namelen);
prop = fdt_get_property_namelen_(fdt, nodeoffset, name, namelen, lenp,
&poffset);
//printf("uuu1 prop = 0x%lx, *lenp = 0x%x, poffset = 0x%x\n", prop, *lenp, poffset);
if (!prop)
return NULL;
/* Handle realignment */
if (fdt_chk_version() && fdt_version(fdt) < 0x10 &&
(poffset + sizeof(*prop)) % 8 && fdt32_to_cpu(prop->len) >= 8)
return prop->data + 4;
return prop->data;
}
When I build the program with the second printf uncommented, the compiler itself segfaults.
I have no idea why. Is it purely a compiler problem (I think so, since a compiler should never crash)? Or could it be caused by a mistake somewhere else in my code? Is there any way to find the cause of the segfault?

If you're getting a segmentation fault when running the compiler itself, the compiler has a bug. There are some errors in your code, but those should result in compile-time diagnostics (warnings or error messages), never a compile-time crash.
The code in your question is incomplete (missing declarations for fdt_get_property_namelen_, printf, NULL, etc.). Reproduce the problem with a complete self-contained source file and submit a bug report: https://gcc.gnu.org/bugzilla/

printf("uuu1 prop = 0x%lx, *lenp = 0x%x, poffset = 0x%x\n", prop, *lenp, poffset);
prop is a pointer, so I'd use %p instead of %lx
lenp is a pointer, so I'd make sure that it points to valid memory
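A minimal sketch of both suggestions, written as a standalone helper so it can be tried outside U-Boot. The name format_prop_debug and its parameters are hypothetical and not part of libfdt; the sketch assumes lenp may legitimately be NULL and therefore checks it before dereferencing.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * format_prop_debug is a hypothetical helper, not part of libfdt or U-Boot.
 * It prints the pointer with %p (which expects a void *) and only
 * dereferences lenp after checking that it is not NULL.
 * Returns the value returned by snprintf.
 */
static int format_prop_debug(char *out, size_t out_size,
                             const void *prop, const int *lenp, int poffset)
{
    if (lenp == NULL)
        return snprintf(out, out_size,
                        "prop = %p, lenp = NULL, poffset = 0x%x",
                        (void *)prop, (unsigned int)poffset);

    return snprintf(out, out_size,
                    "prop = %p, *lenp = 0x%x, poffset = 0x%x",
                    (void *)prop, (unsigned int)*lenp, (unsigned int)poffset);
}
```

The exact text produced by %p is implementation-defined, so the output differs between C libraries.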

How to dump/list all kernel symbols with addresses from Linux kernel module?

In a kernel module, how to list all the kernel symbols with their addresses?
The kernel should not be re-compiled.
I know "cat /proc/kallsyms" is an interface for this, but how can I get the symbols directly from kernel data structures, using functions like kallsyms_lookup_name?

Example
Working module code:
#include <linux/module.h>
#include <linux/kallsyms.h>
static int prsyms_print_symbol(void *data, const char *namebuf,
struct module *module, unsigned long address)
{
pr_info("### %lx\t%s\n", address, namebuf);
return 0;
}
static int __init prsyms_init(void)
{
kallsyms_on_each_symbol(prsyms_print_symbol, NULL);
return 0;
}
static void __exit prsyms_exit(void)
{
}
module_init(prsyms_init);
module_exit(prsyms_exit);
MODULE_AUTHOR("Sam Protsenko");
MODULE_DESCRIPTION("Module for printing all kernel symbols");
MODULE_LICENSE("GPL");
Explanation
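kernel/kallsyms.c implements /proc/kallsyms. Some of its functions are available for external use; they are exported via the EXPORT_SYMBOL_GPL() macro, so your module must declare a GPL-compatible license to use them. (Note that kernels from 5.7 onward no longer export kallsyms_lookup_name() and kallsyms_on_each_symbol() to modules, so the code below applies to older kernels.) Those functions are: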
kallsyms_lookup_name()
kallsyms_on_each_symbol()
sprint_symbol()
sprint_symbol_no_offset()
To use those functions, include <linux/kallsyms.h> in your module. It should be mentioned that CONFIG_KALLSYMS must be enabled (=y) in your kernel configuration.
To print all the symbols, use the kallsyms_on_each_symbol() function. Its declaration in the header is commented as follows:
/* Call a function on each kallsyms symbol in the core kernel */
int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
unsigned long), void *data);
where fn is your callback function, called for each symbol found, and data is a pointer to some private data of yours (it is passed as the first parameter to your callback).
The callback function must have the following signature:
int fn(void *data, const char *namebuf, struct module *module,
unsigned long address);
This function will be called for each kernel symbol with the following parameters:
data: the pointer to your private data that you passed as the last argument to kallsyms_on_each_symbol()
namebuf: the name of the current kernel symbol
module: NULL for symbols built into the core kernel; the examples here just ignore it
address: the address of the current kernel symbol
The return value should normally be 0 (a non-zero return value interrupts the iteration through the symbols).
Supplemental
Answering the questions in your comment.
Also, is there a way to output the size of each function?
Yes, you can use the sprint_symbol() function mentioned above to do that. It prints symbol information in the following format:
symbol_name+offset/size [module_name]
Example:
psmouse_poll+0x0/0x30 [psmouse]
Module name part can be omitted if symbol is built-in.
I tried the module and see the result with "dmesg". But a lot of symbols are missing such as "futex_requeue". The output symbol number is about 10K, while it is 100K when I use "nm vmlinux".
This is most likely because your printk buffer is too small to hold all the output of the module above.
Let's improve the module a bit so that it provides the symbol information via a miscdevice. Let's also add the function size to the output, as requested. The code is as follows:
#include <linux/device.h>
#include <linux/fs.h>
#include <linux/kallsyms.h>
#include <linux/module.h>
#include <linux/miscdevice.h>
#include <linux/sizes.h>
#include <linux/uaccess.h>
#include <linux/vmalloc.h>
#define DEVICE_NAME "prsyms2"
/* 16 MiB is sufficient to store information about approx. 200K symbols */
#define SYMBOLS_BUF_SIZE SZ_16M
struct symbols {
char *buf;
size_t pos;
};
static struct symbols symbols;
/* ---- misc char device definitions ---- */
static ssize_t prsyms2_read(struct file *file, char __user *buf, size_t count,
loff_t *pos)
{
return simple_read_from_buffer(buf, count, pos, symbols.buf,
symbols.pos);
}
static const struct file_operations prsyms2_fops = {
.owner = THIS_MODULE,
.read = prsyms2_read,
};
static struct miscdevice prsyms2_misc = {
.minor = MISC_DYNAMIC_MINOR,
.name = DEVICE_NAME,
.fops = &prsyms2_fops,
};
/* ---- module init/exit definitions ---- */
static int prsyms2_store_symbol(void *data, const char *namebuf,
struct module *module, unsigned long address)
{
struct symbols *s = data;
int count;
/* Stop before writing if the next record might not fit: 32 bytes cover
 * the hex address, tab, newline and final NUL; KSYM_SYMBOL_LEN bounds
 * what sprint_symbol() can emit */
if (s->pos + KSYM_SYMBOL_LEN + 32 > SYMBOLS_BUF_SIZE)
return -ENOMEM;
/* Append address of current symbol */
count = sprintf(s->buf + s->pos, "%lx\t", address);
s->pos += count;
/* Append name, offset, size and module name of current symbol */
count = sprint_symbol(s->buf + s->pos, address);
s->pos += count;
s->buf[s->pos++] = '\n';
return 0;
}
static int __init prsyms2_init(void)
{
int ret;
ret = misc_register(&prsyms2_misc);
if (ret)
return ret;
symbols.pos = 0;
symbols.buf = vmalloc(SYMBOLS_BUF_SIZE);
if (symbols.buf == NULL) {
ret = -ENOMEM;
goto err1;
}
dev_info(prsyms2_misc.this_device, "Populating symbols buffer...\n");
ret = kallsyms_on_each_symbol(prsyms2_store_symbol, &symbols);
if (ret != 0) {
ret = -EINVAL;
goto err2;
}
symbols.buf[symbols.pos] = '\0';
dev_info(prsyms2_misc.this_device, "Symbols buffer is ready!\n");
return 0;
err2:
vfree(symbols.buf);
err1:
misc_deregister(&prsyms2_misc);
return ret;
}
static void __exit prsyms2_exit(void)
{
vfree(symbols.buf);
misc_deregister(&prsyms2_misc);
}
module_init(prsyms2_init);
module_exit(prsyms2_exit);
MODULE_AUTHOR("Sam Protsenko");
MODULE_DESCRIPTION("Module for printing all kernel symbols");
MODULE_LICENSE("GPL");
And here is how to use it:
$ sudo insmod prsyms2.ko
$ sudo cat /dev/prsyms2 >symbols.txt
$ wc -l symbols.txt
$ sudo rmmod prsyms2
The file symbols.txt will contain all kernel symbols (both built-in and from loaded modules) in the following format:
ffffffffc01dc0d0 psmouse_poll+0x0/0x30 [psmouse]
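If symbols.txt is going to be post-processed in user space, each line can be split back into its fields. The following is only a sketch under the assumption that every line has the layout shown above (hex address, whitespace, then name+offset/size and an optional [module]); struct sym_line and parse_sym_line are hypothetical names, and the buffer sizes are arbitrary choices for illustration.

```c
#include <stdio.h>

/* Hypothetical record type for one line of symbols.txt */
struct sym_line {
    unsigned long long address;
    char name[128];
    unsigned long long offset;
    unsigned long long size;
    char module[64];   /* empty string when no [module] part is present */
};

/*
 * Parse one line such as
 *     ffffffffc01dc0d0 psmouse_poll+0x0/0x30 [psmouse]
 * Returns 0 on success, -1 if the line does not match the expected layout.
 */
static int parse_sym_line(const char *line, struct sym_line *out)
{
    int n;

    out->module[0] = '\0';
    /* %llx accepts an optional 0x prefix; %63[^]] reads up to the closing ']' */
    n = sscanf(line, "%llx %127[^+]+%llx/%llx [%63[^]]]",
               &out->address, out->name, &out->offset, &out->size,
               out->module);

    /* Four fields are mandatory; the module name is optional */
    return (n >= 4) ? 0 : -1;
}
```

Reading the file line by line with fgets() and passing each line to this function would be one way to filter or sort the dump without touching the kernel again.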
It seems that I can use kallsyms_lookup_name() to find the address of the function, can then use a function pointer to call the function?
Yes, you can. If I recall correctly, it's called reflection. Below is an example of how to do so:
typedef int (*custom_print)(const char *fmt, ...);
custom_print my_print;
my_print = (custom_print)kallsyms_lookup_name("printk");
if (my_print == 0) {
pr_err("Unable to find printk\n");
return -EINVAL;
}
my_print(KERN_INFO "### printk found!\n");

text mode cursor doesn't appear in qemu vga emulator

I have a problem with the function that updates the cursor position in text mode.
The function definition and declaration are:
#include <sys/io.h>
signed int VGAx = 0,VGAy=0;
void setcursor()
{
uint16_t position = VGAx+VGAy*COLS;
outb(0x0f, 0x03d4);
outb((position<<8)>>8,0x03d5);
outb(0x0e,0x03d4);
outb(position>>8,0x03d5);
}
and the file sys/io.h
static inline unsigned char inb (unsigned short int port)
{
unsigned char value;
asm ("inb %0, %%al":"=rm"(value):"a"(port));
return value;
}
static inline void outb(unsigned char value, unsigned short int port)
{
asm volatile ("outb %%al, $0"::"rm"(value), "a"(port));
}
Before I used the function, the cursor sometimes showed as a blinking underscore and sometimes did not appear at all; after using it, no cursor appears.
Here is the main function that runs:
#include <vga/vga.h>
int kmain(){
setcursor();
setbgcolor(BLACK);
clc();
setforecolor(BLUE);
terminal_write('h');
setcursor();
return 0;
}
I tried using this function
void enable_cursor() {
outb(0x3D4, 0x0A);
char curstart = inb(0x3D5) & 0x1F; // get cursor scanline start
outb(0x3D4, 0x0A);
outb(0x3D5, curstart | 0x20); // set enable bit
}
which is provided here, but I got this error:
inline asm: operand type mismatch for 'in'
Any help is appreciated.
EDIT
I tried to fix the wrong inb and outb:
static inline unsigned char inb (unsigned short int port)
{
unsigned char value;
asm volatile("inb %1, %0" : "=a"(value) : "Nd"(port));
return value;
}
static inline void outb(unsigned char value, unsigned short int port)
{
asm volatile ("outb %%al, $0"::"Nd"(value), "a"(port));
}
I guess this is the right definition, but still no cursor appeared.
EDIT 2
I followed the given answer and defined the io.h file as the following
static inline unsigned char inb (unsigned short int port)
{
unsigned char value;
asm volatile ("inb %1, %0" : "=a"(value) : "Nd"(port));
return value;
}
static inline void outb(unsigned char value, unsigned short int port)
{
asm volatile ("outb %0, %1"::"a"(value), "Nd"(port));
}
I would like to mention that I also added enable_cursor(); to the beginning of kmain. The compile-time error is now fixed, but still no cursor appears (which is the main problem).
EDIT 3
I would like to point out that a version of the whole code is available on GitHub, in case anyone wants access to pieces of code that are not shown in the question.

inb and outb Function Bugs
This code for inb is incorrect:
static inline unsigned char inb (unsigned short int port)
{
unsigned char value;
asm ("inb %0, %%al":"=rm"(value):"a"(port));
return value;
}
A few problems with it:
It seems you have the parameters to inb reversed. See the instruction set reference for inb. Remember that in AT&T syntax (that you are using in your GNU Assembler code) the operands are reversed. The instruction set reference shows them in Intel format.
The port number is either specified as an immediate 8 bit value or passed in the DX register. The proper constraint for specifying the DX register or an immediate 8 bit value for inb/outb is Nd. See my Stackoverflow answer here for an explanation of the constraint Nd.
The destination that the value read is returned in is either AL/AX/EAX so a constraint =rm on the output that says an available register or memory address is incorrect. It should be =a in your case.
Your code should be something like:
static inline unsigned char inb (unsigned short int port)
{
unsigned char value;
asm volatile ("inb %1, %0" : "=a"(value) : "Nd"(port));
return value;
}
Your assembler template for outb is incorrect:
static inline void outb(unsigned char value, unsigned short int port)
{
asm volatile ("outb %%al, $0"::"rm"(value), "a"(port));
}
A couple problems with it:
The port number is either specified as an immediate 8 bit value or passed in the DX register. The proper constraint for specifying the DX register or an immediate 8 bit value for inb/outb is Nd. See my Stackoverflow answer here for an explanation of the constraint Nd.
The value to output on the port has to be specified in AL/AX/EAX so a constraint rm on the value that says an available register or memory address is incorrect. It should be a in your case. See the instruction set reference for outb
The code should probably look something like:
static inline void outb(unsigned char value, unsigned short int port)
{
asm volatile ("outb %0, %1"::"a"(value), "Nd"(port));
}
Enabling and Disabling the Cursor
I had to look up the VGA registers about the cursor and found this document on the cursor start register which says:
Cursor Start Register (Index 0Ah)
-------------------------------------------------
|  7  |  6  |  5  |  4  |  3  |  2  |  1  |  0  |
-------------------------------------------------
|     |     | CD  |   Cursor Scan Line Start    |
-------------------------------------------------
CD -- Cursor Disable
This field controls whether or not the text-mode cursor is displayed. Values are:
0 -- Cursor Enabled
1 -- Cursor Disabled
Cursor Scan Line Start
The important thing is that the cursor is disabled when bit 5 is set. In your GitHub setcursor function you do this:
outb(curstart | 0x20, 0x3D5);
curstart | 0x20 sets bit 5 (0x20 = 0b00100000). If you want to clear bit 5 and enable the cursor, take the bitwise NOT (~) of the bitmask and bitwise AND (&) it with curstart. It should look like this:
outb(curstart & ~0x20, 0x3D5);
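The bit manipulation can be checked without any hardware. This is a small sketch with hypothetical helper names (cursor_start_enabled and cursor_start_disabled are not from the question's repository); it only computes the byte that would be written back to the Cursor Start Register.

```c
#include <stdint.h>

/* Bit 5 (CD) of the Cursor Start Register: 1 = cursor disabled */
#define CURSOR_DISABLE_BIT 0x20u

/* Hypothetical helper: register value with the cursor enabled (CD cleared) */
static uint8_t cursor_start_enabled(uint8_t reg)
{
    return (uint8_t)(reg & ~CURSOR_DISABLE_BIT);
}

/* Hypothetical helper: register value with the cursor disabled (CD set) */
static uint8_t cursor_start_disabled(uint8_t reg)
{
    return (uint8_t)(reg | CURSOR_DISABLE_BIT);
}
```

Both helpers leave the scan line start bits (4..0) unchanged, so the cursor shape is preserved when toggling visibility.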
VGA Function Bugs
Once you have the cursor properly enabled it will render the cursor in the foreground color (attribute) for the particular video location it is currently over. One thing I noticed is that your clc routine does this:
vga_deref_80x24(VGAx,VGAy) = \
vga_encode_80x24(' ',BgColor,BgColor);
The thing to observe is that you set the attribute for the foreground and background colors to BgColor . If you set the bgcolor to black before calling clc it will flash a black underline cursor on a black background rendering it invisible on any screen location. For the cursor to be visible it must be on a screen location where the foreground and background are different colors. One way to see if this works is to change the code to:
vga_deref_80x24(VGAx,VGAy) = \
vga_encode_80x24(' ',BgColor,ForeColor);
I think it is a bug that you are clearing it with encoding vga_encode_80x24(' ',BgColor,BgColor); I think you mean to use vga_encode_80x24(' ',BgColor,ForeColor);
Now in your kmain function you need to set ForeColor and BgColor before calling clc, and they must be different colors to make the cursor visible. You have this code:
setbgcolor(BLACK);
clc();
setforecolor(BLUE);
It should now be:
setbgcolor(BLACK);
setforecolor(BLUE);
clc();
Now if the cursor is rendered anywhere on an unwritten location on the screen it will flash BLUE underline on BLACK background.
This should solve your cursor problem. However, I noticed that you also use the encoding vga_encode_80x24(' ',BgColor,BgColor); in your VGA scrolldown and terminal_control functions. I think this is a bug as well, and that you should use vga_encode_80x24(' ',BgColor,ForeColor); instead. You do seem to set it properly in terminal_write.
If you want to change the color of the cursor at any location you could write a function that changes the foreground attribute under the cursor location without changing the background color. Make sure the two attributes (Foreground and background color) are different for the cursor to be visible. If you wish to hide the cursor you can set foreground and background color the same color for the screen location the cursor is currently at.
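The visibility rule can be expressed as a small check on the attribute byte. This sketch assumes the standard VGA text-mode layout (foreground in bits 0-3, background in bits 4-6, bit 7 treated as blink); vga_attr and cursor_would_show are hypothetical names and are not the vga_encode_80x24 macro from the question's repository.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: build an attribute byte from foreground and
 * background color indices (background limited to 3 bits here, assuming
 * bit 7 is used for blink) */
static uint8_t vga_attr(uint8_t fg, uint8_t bg)
{
    return (uint8_t)(((bg & 0x07u) << 4) | (fg & 0x0Fu));
}

/* Hypothetical helper: the cursor is drawn in the cell's foreground color,
 * so it can only be seen when foreground and background differ */
static bool cursor_would_show(uint8_t attr)
{
    uint8_t fg = attr & 0x0Fu;
    uint8_t bg = (attr >> 4) & 0x07u;

    return fg != bg;
}
```

With BLACK as 0 and BLUE as 1, a cell cleared with black on black fails the check, while blue on black passes it.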

The problem is in your outb code. Also be aware of the order of the port and value parameters; the outb below takes the port first.
The following works for me:
static inline unsigned char inb (unsigned short int port)
{
unsigned char value;
asm volatile ("inb %1, %0" : "=a"(value) : "Nd"(port));
return value;
}
static inline void outb (unsigned short int port, unsigned char value)
{
asm volatile ("outb %b0,%w1": :"a" (value), "Nd" (port));
}
void update_cursor(int x, int y)
{
uint16_t pos = y * 80 + x;
outb(0x3D4, 0x0F);
outb(0x3D5, (uint8_t) (pos & 0xFF));
outb(0x3D4, 0x0E);
outb(0x3D5, (uint8_t) ((pos >> 8) & 0xFF));
}
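The arithmetic in update_cursor can be tried on its own. This is a sketch with hypothetical helper names; it only computes the linear cell offset and the two bytes that would be written to the cursor location registers (index 0x0F for the low byte, 0x0E for the high byte), without touching any I/O port.

```c
#include <stdint.h>

/* Hypothetical helper: linear cell offset for column x, row y */
static uint16_t cursor_offset(unsigned int x, unsigned int y, unsigned int cols)
{
    return (uint16_t)(y * cols + x);
}

/* Low byte, written after selecting register index 0x0F */
static uint8_t cursor_low_byte(uint16_t pos)
{
    return (uint8_t)(pos & 0xFFu);
}

/* High byte, written after selecting register index 0x0E */
static uint8_t cursor_high_byte(uint16_t pos)
{
    return (uint8_t)((pos >> 8) & 0xFFu);
}
```

For an 80-column screen, column 79 of row 24 gives offset 1999 (0x07CF), so 0xCF goes to the low register and 0x07 to the high one.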

CUDA constant memory issue: invalid device symbol with cudaGetSymbolAddress

I am trying to set constant values on my GPU's constant memory before launching a kernel which needs these values.
My code (simplified):
__constant__ size_t con_N;
int main()
{
size_t N;
size_t* dev_N = NULL;
cudaError_t cudaStatus;
//[...]
cudaStatus = cudaGetSymbolAddress((void **)&dev_N, &con_N);
if (cudaStatus != cudaSuccess) {
cout<<"cudaGetSymbolAddress (dev_N) failed: "<<cudaGetErrorString(cudaStatus)<<endl;
}
I planned to cudaMemcpy my N to dev_N afterwards.
However, all I get at this point in the code is:
cudaGetSymbolAddress (dev_N) failed: invalid device symbol
I'm working with CUDA 6.5 so it's not a quoted symbol issue, as it is in most of the Q&A I've been checking so far.
I tried to replace con_N with con_N[1] (and remove the & before con_N in cudaGetSymbolAddress parameters): same result.
As the prototype of this function is cudaGetSymbolAddress(void **devPtr , const void* symbol ), I guessed it wanted to be given my symbol's address. However, I tried with cudaStatus = cudaGetSymbolAddress((void **)&dev_N, (const void*) con_N); and I got the same message.
I'm also getting the very same error message when I remove cudaGetSymbolAddress((void **)&dev_N, &con_N) and go directly with cudaMemcpyToSymbol(&con_N, &N, sizeof(size_t)) instead.
I'm afraid I missed something essential. Any help will be greatly appreciated.

The correct usage of cudaGetSymbolAddress is
cudaGetSymbolAddress((void **)&dev_N, con_N)
I'm showing this with the simple example below.
As the documentation explains, the symbol should physically reside on the device. Accordingly, using &con_N in the API call appears to be meaningless: cudaGetSymbolAddress is a host API, and taking the address of something residing on the device directly from host code should not be possible. Perhaps the prototype appearing in the CUDA Runtime API document would read better as
template<class T>
cudaError_t cudaGetSymbolAddress (void **devPtr, const T symbol)
with a device symbol reference instead of a device symbol address.
#include <stdio.h>
#include <stdlib.h>
__constant__ int const_symbol;
/********************/
/* CUDA ERROR CHECK */
/********************/
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
if (code != cudaSuccess)
{
fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
if (abort) exit(code);
}
}
/***************/
/* TEST KERNEL */
/***************/
__global__ void kernel() {
printf("Address of symbol from device = %p\n", &const_symbol);
}
/********/
/* MAIN */
/********/
int main()
{
int *pointer = NULL;
gpuErrchk(cudaGetSymbolAddress((void**)&pointer, const_symbol));
kernel<<<1,1>>>();
/* Wait for the kernel so its device-side printf output is flushed */
gpuErrchk(cudaDeviceSynchronize());
printf("Address of symbol from host = %p\n", (void *)pointer);
return 0;
}

In my opinion, one line of your code should be changed as follows:
cudaStatus = cudaGetSymbolAddress((void **)&dev_N, con_N);
Hope this helps you.

Dereferencing void* warnings on Xcode

I'm aware of this SO question and this SO question. The element of novelty in this one is in its focus on Xcode, and in its use of square brackets to dereference a pointer to void.
The following program compiles with no warning in Xcode 4.5.2, compiles with a warning on GCC 4.2 and, even though I don't have Visual Studio right now, I remember that it would consider this a compiler error, and MSDN and the Internet agree.
#include <stdio.h>
int main(int argc, const char * argv[])
{
int x = 24;
void *xPtr = &x;
int *xPtr2 = (int *)&xPtr[1];
printf("%p %p\n", xPtr, xPtr2);
}
If I change the third line of the body of main to:
int *xPtr2 = (int *)(xPtr + 1);
It compiles with no warnings on both GCC and Xcode.
I would like to know how I can turn this silence into warnings or errors, on GCC and especially Xcode/LLVM, including the fact that function main is int but does not explicitly return any value (by the way, I think -Wall does the trick on GCC).

That isn't wrong at all.
The compiler doesn't know how big the pointed-to object is; to it, a void * is much like an untyped array (void[] ~ void *).
That's why a char * used as a string needs to be \0-terminated.
You cannot turn on a warning for that, as it isn't possible to determine the size of the memory a pointer points to at compile time:
void *v = NULL;
((char *)v)[1] = 0; // invalid at run time: v points to nothing
void *v = malloc(sizeof(int)*2);
((char *)v)[1] = 0; // valid: v points to allocated memory
Note: typed inline on SO, sorry for any non-working code.
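On the question of getting a diagnostic: GCC and Clang both document a -Wpointer-arith option (also turned on by -Wpedantic in GCC) that warns about code depending on the size of void, which GNU C treats as 1 as an extension. Whether a given Xcode release enables it by default is something to check in its build settings. Independently of warnings, the portable way to step through memory behind a void * is to cast to a complete type first. The helpers below are a hypothetical sketch of that; note that the question's xPtr + 1 advances by one byte under the GNU extension, not by one int.

```c
#include <stddef.h>

/* Hypothetical helper: advance a void pointer by a number of bytes using
 * standard C, by casting to unsigned char * before the arithmetic */
static void *advance_bytes(void *p, size_t nbytes)
{
    return (unsigned char *)p + nbytes;
}

/* Hypothetical helper: pointer to the int that follows the one p points to;
 * the cast makes the element size explicit */
static int *next_int(void *p)
{
    return (int *)p + 1;
}
```

As for main: since C99, reaching the end of main without a return statement is defined to return 0, so a conforming compiler is not required to warn about it.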
