Linux kernel on_each_cpu run on only one cpu (cpu0)

I have this:
static void per_cpu_print(void *info)
{
    printk("cccpuID = %d\n", smp_processor_id());
}

static void test(void)
{
    on_each_cpu(per_cpu_print, NULL, 1);
}
test() runs before rest_init() in the kernel boot sequence.
I only see one print: cccpuID = 0
I'm running Ubuntu 16 with the 4.15 kernel inside VMware Workstation, and I gave the VM 4 cores. I haven't changed anything in the kernel regarding SMP.

Related

Device number linux module issue

I'm developing character device drivers for Poky 1.7, a Linux distribution built with Yocto, installed on a Freescale imx6sx target board. They're simple modules that are only used to read/write I/O registers of the SoC.
Everything worked well as long as I loaded the modules after login, by simply calling this in the target console:
modprobe mydevice
The problems began when I started loading these modules automatically at boot time. On the first system startup after flashing my board, my custom modules are loaded and I can open them and do read/write operations, so they're working.
After restarting the system (switching it off and on, or rebooting), the modules are still loaded, but I'm unable to use them.
After some trials, I found that the problem is the dynamic allocation of the device number they use, in particular the function alloc_chrdev_region() called in the module's __init function.
In fact, the very first time the system is switched on after being flashed, the device number of the device file in /dev matches the device number in /sys/class/myclass/mydevice/dev, while from the first reboot onward they no longer match. Briefly:
First boot:
/dev/mydevice device number [Major, Minor] = 247,0
/sys/class/myclass/mydevice/dev device number [Major, Minor] = 247,0
Second and later boots:
/dev/mydevice device number [Major, Minor] = 247,0
/sys/class/myclass/mydevice/dev device number [Major, Minor] = 248,0
So it seems like the device file is kept from the previous boot instead of being updated at every boot.
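One way to confirm the mismatch described above from user space is to compare the device number stored in the /dev node with the one sysfs reports. The following is a minimal sketch, not part of the original module: the helper names parse_sysfs_dev and dev_node_matches_sysfs are made up for illustration, and it assumes a glibc system where major() and minor() come from <sys/sysmacros.h>.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Parse a sysfs "dev" attribute of the form "MAJOR:MINOR" (e.g. "248:0").
 * Returns 0 on success, -1 if the text is malformed. */
static int parse_sysfs_dev(const char *text, unsigned int *maj, unsigned int *min)
{
    return sscanf(text, "%u:%u", maj, min) == 2 ? 0 : -1;
}

/* Compare the device number of a character device node with the number
 * published in a sysfs "dev" attribute.
 * Returns 1 if they match, 0 if they differ, -1 on any error. */
static int dev_node_matches_sysfs(const char *node_path, const char *sysfs_dev_path)
{
    struct stat st;
    char buf[32];
    unsigned int maj, min;
    FILE *f;

    if (stat(node_path, &st) != 0 || !S_ISCHR(st.st_mode))
        return -1;

    f = fopen(sysfs_dev_path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);

    if (parse_sysfs_dev(buf, &maj, &min) != 0)
        return -1;

    return major(st.st_rdev) == maj && minor(st.st_rdev) == min;
}
```

Assuming the paths from the listing above, dev_node_matches_sysfs("/dev/mydevice", "/sys/class/myclass/mydevice/dev") would be expected to return 0 in the "second and later boots" case, since 247,0 and 248,0 differ.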
Here is the main code for the __init and __exit functions:
static struct class *g_deviceClass;
static struct cdev g_characterDevice;
static dev_t g_deviceNumber;

static int __init npe_power_drv_init(void)
{
    int32_t result;

    if ( alloc_chrdev_region(&g_deviceNumber, 0, 1, DEVICE_NAME) < 0 )
    {
        return -EAGAIN;
    }
    cdev_init(&g_characterDevice, &npe_power_drv_ops);
    g_deviceClass = class_create(THIS_MODULE, CLASS_NAME);
    if( device_create(g_deviceClass, NULL, g_deviceNumber, NULL, DEVICE_NAME) == NULL )
    {
        class_destroy(g_deviceClass);
        unregister_chrdev_region(g_deviceNumber, 1);
        return -EAGAIN;
    }
    result = cdev_add(&g_characterDevice, g_deviceNumber, 1);
    if( result < 0 )
    {
        device_destroy(g_deviceClass, g_deviceNumber);
        class_destroy(g_deviceClass);
        unregister_chrdev_region(g_deviceNumber, 1);
        return -EAGAIN;
    }
    return 0;
}

static void __exit npe_power_drv_exit(void)
{
    device_destroy(g_deviceClass, g_deviceNumber);
    class_destroy(g_deviceClass);
    cdev_del(&g_characterDevice);
    unregister_chrdev_region(g_deviceNumber, 1);
}

module_init(npe_power_drv_init);
module_exit(npe_power_drv_exit);
Using static device number allocation with register_chrdev_region(), everything works fine. Sure, I can implement my code that way, but I would like to know why I get this behaviour.
Many thanks for any help
Andrea

java.lang.UnsatisfiedLinkError: Can't load IA 32-bit .dll

Hi I am getting this error:
Error loading win32com: java.lang.UnsatisfiedLinkError: C:\Program
Files\Java\jre1.8.0_60\bin\win32com.dll: Can't load IA 32-bit .dll on
a AMD 64-bit platform
while running this program:
public static void main(String args[]) {
    Enumeration ports = CommPortIdentifier.getPortIdentifiers();
    System.out.println(ports.hasMoreElements());
    while (ports.hasMoreElements()) {
        CommPortIdentifier port = (CommPortIdentifier) ports.nextElement();
        String type;
        switch (port.getPortType()) {
        case CommPortIdentifier.PORT_PARALLEL:
            type = "Parallel";
            break;
        case CommPortIdentifier.PORT_SERIAL:
            type = "Serial";
            break;
        default: // Shouldn't happen
            type = "Unknown";
            break;
        }
        System.out.println(port.getName() + ": " + type);
    }
    // System.out.println(port.getName());
}
I am using Java 1.8.
Thanks in advance

You are probably running a 64 bit JVM.
First, check your JVM by entering this on cmd line:
java -version
If the output mentions "64-Bit" (for example "64-Bit Server VM"), you're running a 64-bit JVM, which cannot load 32-bit DLLs.
You can use a 32-bit JVM instead.
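To check whether a particular DLL such as win32com.dll is itself 32-bit, one option is to read the Machine field of its PE header. The sketch below is an illustration only: the function name pe_machine and the two PE_MACHINE_* macro names are hypothetical, and it works on a buffer that the caller has already filled with the first bytes of the file. It relies on the documented PE layout: the 4-byte little-endian offset at 0x3C points to the "PE\0\0" signature, which is followed by the 2-byte Machine value (0x014c for x86, 0x8664 for x64).

```c
#include <stddef.h>
#include <string.h>

#define PE_MACHINE_I386  0x014c  /* 32-bit x86 image */
#define PE_MACHINE_AMD64 0x8664  /* 64-bit x64 image */

/* Return the Machine field of a PE image held in buf, or -1 if the
 * buffer is too short or does not look like a PE file. */
static long pe_machine(const unsigned char *buf, size_t len)
{
    unsigned long pe_off;

    /* DOS header: "MZ" magic, with e_lfanew stored at offset 0x3C. */
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;

    pe_off = (unsigned long)buf[0x3c] |
             ((unsigned long)buf[0x3d] << 8) |
             ((unsigned long)buf[0x3e] << 16) |
             ((unsigned long)buf[0x3f] << 24);

    /* Need 4 signature bytes plus 2 Machine bytes inside the buffer. */
    if (pe_off > len || len - pe_off < 6)
        return -1;
    if (memcmp(buf + pe_off, "PE\0\0", 4) != 0)
        return -1;

    /* Machine is a little-endian 16-bit value right after the signature. */
    return (long)(buf[pe_off + 4] | (buf[pe_off + 5] << 8));
}
```

If the value comes back as PE_MACHINE_I386 for the DLL named in the error message, that matches the diagnosis above: a 32-bit native library cannot be loaded into a 64-bit JVM process.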

FreeBSD newbus driver loading successfully but can't create /dev/** file and debugging

I am installing a new newbus driver on FreeBSD 10.0. After compiling with make, the driver.ko file is created and kldload loads it successfully: kldload returns 0 and I can see the module in the kldstat output. But when I attempt to use the driver by opening the /dev/** file, the file does not exist.
I think this /dev/** file should be created by the make_dev function, which is called in the device_attach method. To test whether kldload reaches this attach function, I added printf and uprintf calls to the driver, but I cannot see any output on the console or in dmesg.
The problem is that even after adding printf at the beginning (after the local variable definitions) of the device_identify and device_probe functions, I still can't see any output on the console or in dmesg.
My question: even if there is a problem with the physical device (not present, etc.), shouldn't I see the output of printf in the device_identify method, which (I think) is called by kldload at the start?
Am I making a mistake in debugging a newbus driver with printf? (I also tried a hello_world device driver, and with that driver I do get the printf output in dmesg.)
Mainly, how can I test/debug what happens when this driver is loaded with kldload?
Below are some parts of my driver code (I think I should at least see MSG1, but I do not):
struct mydrv_softc
{
    device_t dev;
};

static devclass_t mydrv_devclass;

static struct cdevsw mydrv_cdevsw = {
    .d_version = D_VERSION,
    .d_name = "mydrv",
    .d_flags = D_NEEDGIANT,
    .d_open = mydrv_open,
    .d_close = mydrv_close,
    .d_ioctl = mydrv_ioctl,
    .d_write = mydrv_write,
    .d_read = mydrv_read
};

static void mydrv_identify (driver_t *driver, device_t parent) {
    devclass_t dc;
    device_t child;

    printf("MSG1: The process inside the identfy function.");
    dc = devclass_find("mydrv");
    if (devclass_get_device(dc, 0) == NULL) {
        child = BUS_ADD_CHILD(parent, 0, "mydrv", -1);
    }
}

static int mydrv_probe(device_t dev) {
    printf("MSG2: The process inside the probe function.");
    mydrv_init();
    if (device_get_unit(dev) != 0)
        return (ENXIO);
    device_set_desc(dev, "FreeBSD Device Driver");
    return (0);
}

static int mydrv_attach(device_t dev) {
    struct mydrv_softc *sc;

    device_printf(dev, "MSG3: The process will make attachment.");
    sc = (struct mydrv_softc *) device_get_softc(dev);
    sc->dev = (device_t)make_dev(&mydrv_cdevsw, 0, UID_ROOT, GID_WHEEL, 0644, "mydrv_drv");
    return 0;
}

static int mydrv_detach(device_t dev) {
    struct mydrv_softc *sc;

    sc = (struct mydrv_softc *) device_get_softc(dev);
    destroy_dev((struct cdev*)(sc->dev));
    bus_generic_detach(dev);
    return 0;
}

static device_method_t mydrv_methods[] = {
    DEVMETHOD(device_identify, mydrv_identify),
    DEVMETHOD(device_probe, mydrv_probe),
    DEVMETHOD(device_attach, mydrv_attach),
    DEVMETHOD(device_detach, mydrv_detach),
    { 0, 0 }
};

static driver_t mydrv_driver = {
    "mydrv",
    mydrv_methods,
    sizeof(struct mydrv_softc),
};

DRIVER_MODULE(mydrv, ppbus, mydrv_driver, mydrv_devclass, 0, 0);

If you don't see your printf output on the console, then your device functions are probably not being called. Can you show us your module's code?
Have you used DRIVER_MODULE() or DEV_MODULE()?
What parent bus are you using?

I guess printf works fine, but I prefer device_printf because it also prints the device name, which makes it easier to look through logs or dmesg output. Also leave multiple debug prints and check the log files on your system. Most device driver messages are logged in /var/log/messages, but check other log files too.
Are you running your code on a virtual machine? Some device drivers don't show up their device files in /dev if the OS is running on a virtual machine. You should probably run your OS on actual hardware for the device file to show up.
As far as I know, you can't see the output in dmesg if you cannot find the corresponding device file in /dev but you may have luck with logs as I mentioned.
The easiest way to debug is of course using the printf statements. Other than this, you can debug the kernel using gdb running on another system. I am not familiar with the exact process but I know you can do this. Google it.

Interrupt performance on linux kernel with RT patches - should be better?

I have run into somewhat inconsistent IRQ/ISR performance on Freescale's i.MX233 running Linux kernel 3.8.13 with the CONFIG_PREEMPT_RT patches.
I am a little surprised that this processor (ARM9, 454 MHz) is unable to keep up even with 74 kHz IRQ requests.
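To put the numbers above in perspective, here is a small worked calculation of the time and cycle budget per interrupt. It is only arithmetic on the figures quoted in the question (454 MHz, 74 kHz); the function names are made up for the example, and it ignores everything that actually eats into the budget (interrupt entry/exit, cache misses, other interrupts).

```c
/* Time between two consecutive interrupts, in microseconds. */
static double irq_period_us(double irq_hz)
{
    return 1e6 / irq_hz;
}

/* Upper bound on CPU cycles available per interrupt if the CPU did
 * nothing else at all. */
static double cycles_per_irq(double cpu_hz, double irq_hz)
{
    return cpu_hz / irq_hz;
}
```

For the values in the question, irq_period_us(74e3) is about 13.5 microseconds and cycles_per_irq(454e6, 74e3) is about 6135 cycles, so any stretch of roughly 13.5 microseconds with interrupts disabled is enough to delay an edge into the next period.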
In my kernel config I have set the following flags:
CONFIG_TINY_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT=y
CONFIG_PREEMPT_RT_BASE=y
CONFIG_HAVE_PREEMPT_LAZY=y
CONFIG_PREEMPT_LAZY=y
CONFIG_PREEMPT_RT_FULL=y
CONFIG_PREEMPT_COUNT=y
CONFIG_DEBUG_PREEMPT=y
There is basically nothing running on the system (it was created by buildroot), and I set a PWM to generate a 74 kHz pulse that serves as the interrupt source.
Then, in the ISR, I just toggle another GPIO output pin and check the output.
What I find is that sometimes I miss an interrupt.
[image: capture showing the missed interrupt]
The toggling of the output pin also seems to be a bit inconsistent: the output pin is usually toggled within a "5% window", which might still be acceptable. But I worry that when I start implementing the data transfer logic, instead of just toggling the pin, I might run into further problems...
My simple driver code looks like this:
// needed includes
uint16_t INPUT_IRQ = 39;
uint16_t OUTPUT_GPIO = 38;
struct test_device *device;

//Prototypes
void irqtest_exit(void);
int irqtest_init(void);
void free_device(void);

//Default functions
module_init(irqtest_init);
module_exit(irqtest_exit);

//triggering flag
uint16_t pulse = 0x1;

irqreturn_t irq_handle_function(int irq, void *device_id)
{
    pulse = !pulse;
    gpio_set_value(OUTPUT_GPIO, pulse);
    return IRQ_HANDLED;
}

struct test_device {
    int huuhaa;
};

void free_device() {
    if (device)
        kfree(device);
}

int irqtest_init(void) {
    int result = 0;

    device = kmalloc(sizeof *device, GFP_KERNEL);
    device->huuhaa = 10;
    printk("IRB/irqtest_init: Inserting IRQ module\n");
    printk("IRB/irqtest_init: Requesting GPIO (%d)\n", INPUT_IRQ);
    result = gpio_request_one(INPUT_IRQ, GPIOF_IN, "PWM input");
    if (result != 0) {
        free_device();
        printk("IRB/irqtest_init: Failed to set GPIO (%d) as input.. exiting\n", INPUT_IRQ);
        return -EINVAL;
    }
    result = gpio_request_one(OUTPUT_GPIO, GPIOF_OUT_INIT_LOW , "IR OUTPUT");
    if (result != 0) {
        free_device();
        printk("IRB/irqtest_init: Failed to set GPIO (%d) as output.. exiting\n", OUTPUT_GPIO);
        return -EINVAL;
    }
    //Set our desired interrupt line as input
    result = gpio_direction_input(INPUT_IRQ);
    if (result != 0) {
        printk("IRB/irqtest_init: Failed to set IRQ as input.. exiting\n");
        free_device();
        return -EINVAL;
    }
    //Set flags for our interrupt, guessing here..
    irq_flags |= IRQF_NO_THREAD;
    irq_flags |= IRQF_NOBALANCING;
    irq_flags |= IRQF_TRIGGER_RISING;
    irq_flags |= IRQF_NO_SOFTIRQ_CALL;
    //register interrupt
    result = request_irq(gpio_to_irq(INPUT_IRQ), irq_handle_function, irq_flags, "irq testing", device);
    if (result != 0) {
        printk("IRB/irqtest_init: Failed to reserve GPIO 38\n");
        return -EINVAL;
    }
    printk("IRB/irqtest_init: insert success\n");
    return 0;
}

void irqtest_exit(void) {
    if (device)
        kfree(device);
    gpio_free(INPUT_IRQ);
    gpio_free(OUTPUT_GPIO);
    printk("IRB/irqtest_exit: Removing irqtest module\n");
}

int irqtest_open(struct inode *inode, struct file *filp) {return 0;}
int irqtest_release(struct inode *inode, struct file *filp) {return 0;}
On the system, I have the following interrupts registered after the driver is loaded:
# cat /proc/interrupts
           CPU0
 16:      36379         -  MXS Timer Tick
 17:          0         -  mxs-spi
 18:       2103         -  mxs-dma
 60:          0  gpio-mxs  irq testing
118:          0         -  mxs-spi
119:          0         -  mxs-dma
120:          0         -  RTC alarm
124:          0         -  8006c000.serial
127:      68050         -  uart-pl011
128:        151         -  ci13xxx_imx
Err:          0
I wonder if the flags I set for my IRQ are good? I noticed that with this configuration I can no longer reach the console, so the kernel seems totally consumed with servicing this 74 kHz trigger. This can't be right?
I suppose it's not a big deal for me, since this only happens during data transfer, but I still feel I'm doing something wrong.
Also, I wonder if it would be more efficient to map the registers with ioremap and toggle the output with direct memory writes?
Is there some way I could raise the priority of the interrupt even higher? Or could I somehow lock the kernel for the duration of the data transfer (~400 ms) and generate the timing for the output some other way?
Edit: Forgot to add /proc/interrupts output to the question...

What you are experiencing here is interrupt jitter. This is to be expected on Linux, because the kernel regularly disables interrupts for various tasks (entering a spinlock, handling an interrupt, etc.).
This will happen regardless of whether you have PREEMPT_RT or not, so expecting to generate a 74 kHz signal with regular interrupts is pretty much unrealistic.
Now, ARM has higher-priority interrupts called FIQs, which are not masked when the kernel disables regular interrupts.
Linux doesn't use FIQs, and is not built to deal with the fact that an FIQ could be used, so you won't be able to use the generic kernel framework.
From a Linux driver development point of view, however, it's not really different as long as you keep this in mind: you have to write a handler and associate it with an IRQ. You'll also have to poke into the interrupt controller to make it generate a FIQ for the interrupt you want to use. The details of how to do that are platform-dependent: some platforms have functions for it (like imx25 and mxc_set_irq_fiq), some others don't. imx23/28 don't, so you'll have to do it by hand.
The only catch is that the functions to set up a FIQ handler only work with a handler written in assembly, so you'll have to rewrite your handler in assembly (with your current code, that should be trivial though).
You can find additional details in the blog post Alexandre posted (http://free-electrons.com/blog/fiq-handlers-in-the-arm-linux-kernel/), where you'll find working code, samples, and explanations of how it all works together.

You can have a look at what my colleague Maxime Ripard did using an FIQ on a similar SoC (i.mx28) :
http://free-electrons.com/blog/fiq-handlers-in-the-arm-linux-kernel/

Try these flags:
int irq_flags;
...
irq_flags = IRQF_TRIGGER_RISING | IRQF_EARLY_RESUME;
I have kernel 3.8.11 and can't find the IRQF_NO_SOFTIRQ_CALL define. Is it only in 3.8.13?
Also, I don't see where irq_flags is defined. Where is it?

How to make a fast context switch from one process to another?

I need to run unsafe native code in a sandbox process, and I need to reduce the bottleneck of switching between processes. Both processes (controller and sandbox) share two auto-reset events and a coherent view of a mapped file (shared memory) that is used for communication.
To keep this post short, I removed the initialization from the sample code, but the events are created by the controller, duplicated using DuplicateHandle, and then sent to the sandbox process before any work starts.
Controller source:
void inSandbox(HANDLE hNewRequest, HANDLE hAnswer, volatile int *shared) {
    int before = *shared;
    for (int i = 0; i < 100000; ++i) {
        // Notify sandbox of a new request and wait for answer.
        SignalObjectAndWait(hNewRequest, hAnswer, INFINITE, FALSE);
    }
    assert(*shared == before + 100000);
}

void inProcess(volatile int *shared) {
    int before = *shared;
    for (int i = 0; i < 100000; ++i) {
        newRequest(shared);
    }
    assert(*shared == before + 100000);
}

void newRequest(volatile int *shared) {
    // In this test, the request only increments an int.
    (*shared)++;
}
Sandbox source:
void sandboxLoop(HANDLE hNewRequest, HANDLE hAnswer, volatile int *shared) {
    // Wait for the first request from controller.
    assert(WaitForSingleObject(hNewRequest, INFINITE) == WAIT_OBJECT_0);
    for(;;) {
        // Perform request.
        newRequest(shared);
        // Notify controller and wait for next request.
        SignalObjectAndWait(hAnswer, hNewRequest, INFINITE, FALSE);
    }
}

void newRequest(volatile int *shared) {
    // In this test, the request only increments an int.
    (*shared)++;
}
Measurements:
inSandbox() - 550ms, ~350k context switches, 42% CPU (25% kernel, 17% user).
inProcess() - 20ms, ~2k context switches, 55% CPU (2% kernel, 53% user).
The machine runs Windows 7 Pro on a Core 2 Duo P9700 with 8 GB of memory.
An interesting fact is that the sandbox solution uses 42% of the CPU versus 55% for the in-process solution. Another noteworthy fact is that the sandbox solution incurs 350k context switches, which is much more than the 200k context switches that we can infer from the source code.
I need to know if there's a way to reduce the overhead of transferring control to another process. I already tried using pipes instead of events, and it was much worse. I also tried using no event at all, by making the sandbox call SuspendThread(GetCurrentThread()) and making the controller call ResumeThread(hSandboxThread) on every request, but the performance was similar to using events.
If you have a solution that uses assembly (like performing a manual context switch) or Windows Driver Kit, please let me know as well. I don't mind having to install a driver to make this faster.
I heard that Google Native Client does something similar, but I only found this documentation. If you have more information, please let me know.

The first thing to try is raising the priority of the waiting thread. This should reduce the number of extraneous context switches.
Alternatively, since you're on a 2-core system, using spinlocks instead of events would make your code much much faster, at the cost of system performance and power consumption:
void inSandbox(volatile int *lock, volatile int *shared)
{
    int i, before = *shared;
    for (i = 0; i < 100000; ++i) {
        *lock = 1;
        while (*lock != 0) { }
    }
    assert(*shared == before + 100000);
}

void newRequest(volatile int *shared) {
    // In this test, the request only increments an int.
    (*shared)++;
}

void sandboxLoop(volatile int *lock, volatile int * shared)
{
    for(;;) {
        while (*lock != 1) { }
        newRequest(shared);
        *lock = 0;
    }
}
In this scenario, you should probably set thread affinity masks and/or lower the priority of the spinning thread so that it doesn't compete with the busy thread for CPU time.
Ideally, you'd use a hybrid approach. When one side is going to be busy for a while, let the other side wait on an event so that other processes can get some CPU time. You could trigger the event a little ahead of time (using the spinlock to retain synchronization) so that the other thread will be ready when you are.
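For reference, the same spin-based handshake can be written with C11 atomics instead of plain volatile int, which makes the ordering between the flag and the shared data explicit. This is only a sketch under assumptions that differ from the question: it uses two POSIX threads in one process rather than two Windows processes with a mapped file, the struct mailbox layout and the extra state 2 (shutdown) are invented for the example, and run_ping_pong is a hypothetical driver function.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical mailbox. In the cross-process case this would live in
 * the shared mapping instead of on the controller's stack. */
struct mailbox {
    atomic_int lock;  /* 0 = idle, 1 = request pending, 2 = shut down */
    int shared;       /* payload, only touched by whoever owns the turn */
};

/* Sandbox side: spin until a request arrives, serve it, hand back. */
static void *sandbox_loop(void *arg)
{
    struct mailbox *mb = arg;

    for (;;) {
        int state;

        while ((state = atomic_load_explicit(&mb->lock,
                                             memory_order_acquire)) == 0) { }
        if (state == 2)
            return NULL;
        mb->shared++;  /* the "request": increment an int */
        atomic_store_explicit(&mb->lock, 0, memory_order_release);
    }
}

/* Controller side: post `requests` requests, spinning for each answer.
 * Returns the final counter value, or -1 if the thread cannot start. */
static int run_ping_pong(int requests)
{
    struct mailbox mb;
    pthread_t sandbox;
    int i;

    atomic_init(&mb.lock, 0);
    mb.shared = 0;
    if (pthread_create(&sandbox, NULL, sandbox_loop, &mb) != 0)
        return -1;

    for (i = 0; i < requests; ++i) {
        atomic_store_explicit(&mb.lock, 1, memory_order_release);
        while (atomic_load_explicit(&mb.lock, memory_order_acquire) != 0) { }
    }

    atomic_store_explicit(&mb.lock, 2, memory_order_release);
    pthread_join(sandbox, NULL);
    return mb.shared;
}
```

As with the volatile version, this only pays off when each side has its own core; on a single core every handoff has to wait for the scheduler to preempt the spinning thread.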
