The book LDD (Linux Device Drivers) says this about the function blk_queue_segment_boundary():
Some devices cannot handle requests that cross a particular size
memory boundary; if your device is one of those, use this function
to tell the kernel about that boundary. For example, if your device
has trouble with requests that cross a 4-MB boundary, pass in a mask
of 0x3fffff. The default mask is 0xffffffff.
I don't quite understand what the boundary means here. For example, I have a virtual block device that is in fact backed by 4MB files, so I want a request never to cross a 4MB boundary:
unsigned long sector = blk_rq_pos(req);   /* starting sector of the request */
unsigned long offset = sector << 9;       /* sector number to byte offset (512-byte sectors) */
unsigned long nbytes = blk_rq_bytes(req); /* total bytes in the request */
int file_offset = offset % (1 << 22);     /* offset within the 4MB backing file */
What I want is that (file_offset + nbytes) never exceeds 4MB, but in practice it sometimes does.
So, am I misunderstanding blk_queue_segment_boundary()?
Some controllers (particularly IDE) cannot handle DMA requests that cross memory region boundaries at 4MB. Think of it as segment:index addressing, where the index cannot be larger than the set boundary. Note that this constrains where a segment's data sits in memory, not where the request falls on disk.
There is also blk_queue_max_segment_size(). Both are used to construct correct requests for the device; requests get reordered and merged.
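A rough sketch of how a driver might express that 4MB constraint (here q is a hypothetical struct request_queue pointer; note that both limits bound each DMA segment's memory range, not the request's position on disk):
blk_queue_segment_boundary(q, 0x3fffff);   /* no segment may cross a 4MB address boundary */
blk_queue_max_segment_size(q, 1 << 22);    /* no single segment may exceed 4MB */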
There are other uses. For example, from xen-blkfront.c:
/* Each segment in a request is up to an aligned page in size. */
blk_queue_segment_boundary(rq, PAGE_SIZE - 1);
blk_queue_max_segment_size(rq, PAGE_SIZE);
Requests are limited to PAGE_SIZE for better performance.
I tried the usual Windows way: I passed nullptr as the output buffer pointer and 0 as its size. AcceptSecurityContext fails with SEC_E_INSUFFICIENT_MEMORY. I was expecting to get the needed size in OutSecBuff.cbBuffer, but it is 0. I then call it again with a huge buffer. The call succeeds but the context is invalid and later calls fail.
// Query needed buffer size
secStatus = AcceptSecurityContext(&hcred, &hctxt, &InBuffDesc, attr,
                                  SECURITY_NATIVE_DREP, &hctxt, &OutBuffDesc, &attr, nullptr);
if (SEC_E_INSUFFICIENT_MEMORY == secStatus)
{
    // Allocate a buffer of the required size
    OutSecBuff.cbBuffer = *pcbOut;
    OutSecBuff.pvBuffer = pOut;
    // Call again with the buffer of the required size
    secStatus = AcceptSecurityContext(&hcred, &hctxt, &InBuffDesc, attr,
                                      SECURITY_NATIVE_DREP, &hctxt, &OutBuffDesc, &attr, nullptr);
}
If I preallocate huge buffer, everything works fine.
I would like to dynamically allocate buffer of needed size.
SSPI takes a different approach. When you query the security package with QuerySecurityPackageInfo, the maximum size of the output buffer is returned in the cbMaxToken field. You allocate the buffer once and can be sure its size will be enough for all requests.
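A minimal sketch of that pattern (assuming the NTLM package; substitute whatever package you actually use, and add error handling):
PSecPkgInfo pkgInfo = NULL;
TCHAR pkgName[] = TEXT("NTLM");
if (QuerySecurityPackageInfo(pkgName, &pkgInfo) == SEC_E_OK)
{
    // cbMaxToken is an upper bound for every token this package can produce
    OutSecBuff.pvBuffer = LocalAlloc(LMEM_FIXED, pkgInfo->cbMaxToken);
    OutSecBuff.cbBuffer = pkgInfo->cbMaxToken;
    FreeContextBuffer(pkgInfo);
}
// ... pass OutSecBuff to AcceptSecurityContext as before; LocalFree the buffer when done ...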
I was looking at my process in Task Manager after pulling a large amount of data into a CEditView and then setting it back to a small amount. I noticed the commit size stayed large. I saw the same thing in VMMap, so I ran "Memory Usage" in the VS2017 Diagnostic Tools, and the allocation ultimately comes from the ::SetWindowText() call. So that call obviously allocates a large buffer on the heap, but when I set the text back to a small amount, that allocation stays large. The question is: is there a way I can have the edit control free up the memory it doesn't need for smaller amounts of text, to reduce the committed memory? Say, I free it up before setting the new text and it reallocates as needed?
Thanks!!
// From within CEditView
// based on an answer by RbMm using EM_GETHANDLE/EM_SETHANDLE and further
// from https://stackoverflow.com/questions/5500237/how-to-use-em-sethandle-on-edit-control
// reduce size to 64K if larger than that and not needed
#define EDITctrlLimitSize 65536
// check if we should reduce the size of the buffer
HLOCAL horgmem = (HLOCAL) SendMessage(EM_GETHANDLE, 0, 0);
SIZE_T sizeused = LocalSize(horgmem);
// character size: UNICODE builds use wide chars; ANSI builds depend on ComCtl v6
int cbCh = sizeof(TCHAR) > 1 ? sizeof(TCHAR) : IsUsingComCtlV6() ? sizeof(WCHAR) : sizeof(char);
if (sizeused > EDITctrlLimitSize && (string.GetLength() * cbCh) < EDITctrlLimitSize) {
    // EM_GETHANDLE says to not change the data, yet EM_SETHANDLE tells you to
    // get the handle then LocalFree it, so why not just LocalReAlloc it and set
    // it again.
    HLOCAL hnewmem = (HLOCAL) LocalReAlloc(horgmem, EDITctrlLimitSize, LMEM_MOVEABLE);
    if (hnewmem) {
        // zero the full buffer because it seems like a good thing to do
        LPVOID pnewmem = LocalLock(hnewmem);
        if (pnewmem) {
            memset(pnewmem, 0, EDITctrlLimitSize);
            LocalUnlock(hnewmem);
        }
        // voila - process memory reduced.
        SendMessage(EM_SETHANDLE, (WPARAM) hnewmem, 0);
    }
}
I'm altering someone else's code. They used PNGs, which are loaded via BufferedImage. I need to load a TGA instead, which is simply an 18-byte header followed by BGR triplets. I have the textures loaded and running, but I get a gray box instead of the texture. I don't even know how to DEBUG this.
Textures are loaded in a ByteBuffer:
final static int datasize = (WIDTH*HEIGHT*3) *2; // Double buffer size for OpenGL // not +18 no header
static ByteBuffer buffer = ByteBuffer.allocateDirect(datasize);
FileInputStream fin = new FileInputStream("/Volumes/RAMDisk/shot00021.tga");
FileChannel inc = fin.getChannel();
inc.position(18); // skip header
buffer.clear(); // prepare for read
int ret = inc.read(buffer);
fin.close();
I've followed this: how-to-manage-memory-with-texture-in-opengl ... because I am updating the texture once per frame, like video.
Called once:
GL11.glBindTexture(GL11.GL_TEXTURE_2D, textureID);
GL11.glTexParameteri(GL11.GL_TEXTURE_2D, GL11.GL_TEXTURE_WRAP_S, GL11.GL_CLAMP);
GL11.glTexParameteri(GL11.GL_TEXTURE_2D, GL11.GL_TEXTURE_WRAP_T, GL11.GL_CLAMP);
GL11.glTexParameteri(GL11.GL_TEXTURE_2D, GL11.GL_TEXTURE_MAG_FILTER, GL11.GL_NEAREST);
GL11.glTexParameteri(GL11.GL_TEXTURE_2D, GL11.GL_TEXTURE_MIN_FILTER, GL11.GL_NEAREST);
GL11.glTexImage2D(GL11.GL_TEXTURE_2D, 0, GL11.GL_RGB, width, height, 0, GL11.GL_RGB, GL11.GL_UNSIGNED_BYTE, (ByteBuffer) null);
assert(GL11.GL_NO_ERROR == GL11.glGetError());
Called repeatedly:
GL11.glBindTexture(GL11.GL_TEXTURE_2D, textureID);
GL11.glTexSubImage2D(GL11.GL_TEXTURE_2D, 0, 0, 0, width, height, GL11.GL_RGB, GL11.GL_UNSIGNED_BYTE, byteBuffer);
assert(GL11.GL_NO_ERROR == GL11.glGetError());
return textureID;
The render code hasn't changed and is based on:
GL11.glDrawArrays(GL11.GL_TRIANGLES, 0, this.vertexCount);
Make sure you set the texture sampling mode, especially the min filter: glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR). The default setting is mip mapped (GL_NEAREST_MIPMAP_LINEAR), so unless you upload mip maps the texture is incomplete and you will get a blank result.
So either set the texture to not use mip maps, or generate them. One way to do the latter is to call glGenerateMipmap after the glTexImage2D call.
(see https://www.khronos.org/opengles/sdk/docs/man/xhtml/glTexParameter.xml).
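In C-style GL, the two options above look like this (glGenerateMipmap assumes GL 3.0+ or a framebuffer-object extension; width, height, and pixels stand in for your actual image data):
/* Option 1: no mip maps - sample only the base level */
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
/* Option 2: keep the mip-mapped default and generate the levels
   after uploading the base image */
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0,
             GL_RGB, GL_UNSIGNED_BYTE, pixels);
glGenerateMipmap(GL_TEXTURE_2D);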
It's a very common GL pitfall and something people just tend to know after getting bitten by it a few times.
There is no easy way to debug stuff like this. There are good GL debugging tools in, for example, Xcode, but they will not tell you about this case.
Debugging GPU code is always a hassle. I would bet my money on big industry progress in this area as more companies discover the power of the GPU. Until then, I'll share my two best GPU debugging friends:
1) Define a function to print OGL errors:
int printOglError(const char *file, int line)
{
/* Returns 1 if an OpenGL error occurred, 0 otherwise. */
GLenum glErr;
int retCode = 0;
glErr = glGetError();
while (glErr != GL_NO_ERROR) {
printf("glError in file %s # line %d: %s\n", file, line, gluErrorString(glErr));
retCode = 1;
glErr = glGetError();
}
return retCode;
}
#define printOpenGLError() printOglError(__FILE__, __LINE__)
And call it after your render draw calls (possible earlier errors will also show up):
GL11.glDrawArrays(GL11.GL_TRIANGLES, 0, this.vertexCount);
printOpenGLError();
This alerts you if you make an invalid operation (which might just be your case), but you usually have to find where the error occurs by trial and error.
2) Check out gDEBugger, free software with tons of GPU memory information.
[Edit]:
I would also recommend using the open-source lib DevIL - it's quite competent at loading various image formats.
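For instance, a minimal native-side loading sketch with DevIL (error handling omitted; the path is the one from the question):
#include <IL/il.h>

ILuint img;
ilInit();
ilGenImages(1, &img);
ilBindImage(img);
if (ilLoadImage("/Volumes/RAMDisk/shot00021.tga")) {
    ilConvertImage(IL_RGB, IL_UNSIGNED_BYTE);   /* normalize to tightly packed RGB */
    int width  = ilGetInteger(IL_IMAGE_WIDTH);
    int height = ilGetInteger(IL_IMAGE_HEIGHT);
    unsigned char *pixels = ilGetData();
    /* ... upload pixels with glTexSubImage2D ... */
}
ilDeleteImages(1, &img);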
Thanks to Felix: by not calling glTexSubImage2D (leaving the memory valid but uninitialized), I noticed a remnant pattern left by the default memory. This indicated that the texture is being displayed, and that the load is most likely the problem.
UPDATE:
The problem with the code above is essentially the buffer. The buffer has room for 1024*1024 pixels, but it is only partially filled by the read, leaving the position marker of the ByteBuffer at 2359296 (1024*768*3) instead of 3145728 (1024*1024*3). This gives the error:
Number of remaining buffer elements is ..., must be at least ...
I thought that OpenGL needed space to return data, so I doubled the size of the buffer to compensate for the error:
final static int datasize = (WIDTH*HEIGHT*3) *2; // Double buffer size for OpenGL // not +18 no header
This was wrong. What is needed is the flip() function (big THANKS to Reto Koradi for the small hint about the buffer rewind) to put the ByteBuffer into read mode. Since the buffer is only partially filled, the OpenGL buffer check gives an error. The correct fix is not to double the buffer size; instead, use buffer.position(buffer.capacity()) to mark the buffer as filled before doing the flip():
final static int datasize = (WIDTH*HEIGHT*3); // not +18 no header
buffer.clear(); // prepare for read
int ret = inc.read(buffer);
fin.close();
buffer.position(buffer.capacity()); // make sure buffer is completely FILLED!
buffer.flip(); // flip buffer to read mode
To figure this out, it helps to hardcode the contents of the buffer to make sure the OpenGL calls are working, which isolates the load problem. Then, once the OpenGL calls are correct, concentrate on loading the buffer. As suggested by Felix K, it is good to make sure one texture has been drawn correctly before calling glTexSubImage2D repeatedly.
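For example, a hardcoded test pattern along these lines takes the file loader out of the equation entirely (a sketch in C-style GL; WIDTH, HEIGHT, and textureID as in the question):
/* Upload a solid red pattern instead of file data. If the quad
   renders red, the GL path works and the loader is at fault. */
static unsigned char pattern[WIDTH * HEIGHT * 3];

void upload_test_pattern(void) {
    for (int i = 0; i < WIDTH * HEIGHT; ++i) {
        pattern[3*i + 0] = 0xFF;   /* R */
        pattern[3*i + 1] = 0x00;   /* G */
        pattern[3*i + 2] = 0x00;   /* B */
    }
    glBindTexture(GL_TEXTURE_2D, textureID);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, WIDTH, HEIGHT,
                    GL_RGB, GL_UNSIGNED_BYTE, pattern);
}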
Some ideas which might cause the issue:
Your texture might be getting disposed of somewhere. I don't know the whole code, but I guess there is a glDeleteTextures somewhere, and it could cause issues if called at the wrong time.
Are the texture width and height powers of two? If not, this might be an issue depending on your hardware; old hardware sometimes won't support non-power-of-two images.
The texture parameters may have changed between the draw calls at some other point (make a debug check of the parameters with glGetTexParameter).
There could be a loading issue when loading the next image (edit: or even the first image). Check if the first image is displayed without loading the next images. If so, it must be one of the cases above.
I am currently porting my DCF77 library (you can find the source code on GitHub) from Arduino (AVR based) to Arduino Due (ARM Cortex M3).
The library requires precise 1ms timing, so an obvious candidate is to use SysTick. Conveniently, the Arduino Due is already set up for SysTick interrupts at 1 kHz.
However, my (AVR) DCF77 library is able to tune its timing once it locks onto the DCF77 signal. This is done by manipulating the timer reload values like so:
void isr_handler() {
    cumulated_phase_deviation += adjust_pp16m;
    // 1 / 250 / 64000 = 1 / 16 000 000
    if (cumulated_phase_deviation >= 64000) {
        cumulated_phase_deviation -= 64000;
        // cumulated drift exceeds 1 timer step (4 microseconds)
        // drop one timer step to realign
        OCR2A = 248;
    } else if (cumulated_phase_deviation <= -64000) {
        // cumulated drift exceeds 1 timer step (4 microseconds)
        // insert one timer step to realign
        cumulated_phase_deviation += 64000;
        OCR2A = 250;
    } else {
        // 249 + 1 == 250 == 250 000 / 1000 = (16 000 000 / 64) / 1000
        OCR2A = 249;
    }
    DCF77_Clock_Controller::process_1_kHz_tick_data(the_input_provider());
}
I want to port this to the ARM processor. In the ARM information center I found the following documentation.
Configuring SysTick
...
To configure the SysTick you need to load the SysTick Reload Value
register with the interval required between SysTick events. The timer
interrupt or COUNTFLAG bit (in the SysTick Control and Status
register) is activated on the transition from 1 to 0, therefore it
activates every n+1 clock ticks. If a period of 100 is required 99
should be written to the SysTick Reload Value register. The SysTick
Reload Value register supports values between 1 and 0x00FFFFFF.
If you want to use the SysTick to generate an event at a timed
interval, for example 1ms, you can use the SysTick Calibration Value
Register to scale your value for the Reload register. The SysTick
Calibration Value Register is a read-only register that contains the
number of pulses for a period of 10ms, in the TENMS field (bits 0 to
23). This register also has a SKEW bit (30) that is used to indicate
that the calibration for 10ms in the TENMS section is not exactly 10ms
due to small variations in clock frequency. Bit 31 is used to indicate
if the reference clock is provided.
...
Unfortunately I did not find anything on how SysTick->LOAD and SysTick->CALIB are connected. That is: if I want to throttle or accelerate SysTick, do I need to manipulate the LOAD or the CALIB value? And which values do I need to put into these registers?
Searching the internet did not bring up any better hints. Maybe I am searching in the wrong places.
Is there anywhere a more detailed reference for these questions? Or maybe even some good examples?
Comparing the ATmega328 datasheet with the Cortex-M3 TRM, the standout point is that the timers work opposite ways round: on the AVR, you load a value into OCR2A and wait for the timer in TCNT2 to count up to it, whereas on the M3 you load the delay value into SYST_RVR and the system counts down from that value to 0 in SYST_CVR.
The big difference for calibration is that, because the comparison value is fixed at 0 and you can only adjust the reload value, you might see more latency compared to adjusting the comparison value directly (assuming the counter reload happens at the same time the interrupt is generated).
The read-only value in SYST_CALIB (if indeed it even exists, being implementation-defined and optional), is merely for relating SYSTICK ticks to actual wallclock time - when first initialising the timer, you need to know the tick frequency in order to pick an appropriate reload value for your desired period, so having a register field that says "this many reference clock ticks happen in 10ms (possibly)" offers some possibility of calculating that at runtime in a portable fashion, rather than having to hard-code a value that might need changing for different devices.
In this case, however, not only does having an even-more-accurate external clock to synchronise against make this less important, but crucially, the firmware has already configured the timer for you. Thus you can assume that whatever value is in SYST_RVR represents close-enough-to-1kHz and work from there - in fact, to simply fine-tune the 1kHz period you don't even need to know what the actual value is; just do SysTick->LOAD++ or SysTick->LOAD-- if the error gets too big in either direction.
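A minimal sketch of that fine-tuning idea (phase_error and THRESHOLD are hypothetical stand-ins for the library's drift bookkeeping; SysTick->LOAD is the CMSIS name for SYST_RVR):
volatile int32_t phase_error;   /* hypothetical: updated by the DCF77 lock logic */
#define THRESHOLD 42            /* hypothetical tuning constant */

void SysTick_Handler(void) {
    if (phase_error > THRESHOLD) {
        SysTick->LOAD--;        /* shorten subsequent 1ms periods slightly */
    } else if (phase_error < -THRESHOLD) {
        SysTick->LOAD++;        /* lengthen subsequent 1ms periods slightly */
    }
    /* ... 1 kHz tick processing goes here ... */
}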
Delving a bit deeper, the SAM3X datasheet shows that for the particular M3 implementation in that SoC, SYSTICK has a 10.5 MHz reference clock, therefore the SYST_CALIB register should give a value of 105000 ticks for 10ms. Except it doesn't, because apparently Atmel thought it would be really clever to make the unambiguously-named TENMS field give the tick count for 1ms, 10500, instead. Wonderful.
Just so that others do not have to dig around like I had to, here is what I found out in addition.
In arduino-1.5.8/hardware/arduino/sam/system/CMSIS/CMSIS/Include/core_cm*.h there is code to manipulate SysTick. In particular, in core_cm3.h there is a function
static __INLINE uint32_t SysTick_Config(uint32_t ticks)
{
if (ticks > SysTick_LOAD_RELOAD_Msk) return (1); /* Reload value impossible */
SysTick->LOAD = (ticks & SysTick_LOAD_RELOAD_Msk) - 1; /* set reload register */
NVIC_SetPriority (SysTick_IRQn, (1<<__NVIC_PRIO_BITS) - 1); /* set Priority for Cortex-M0 System Interrupts */
SysTick->VAL = 0; /* Load the SysTick Counter Value */
SysTick->CTRL = SysTick_CTRL_CLKSOURCE_Msk |
SysTick_CTRL_TICKINT_Msk |
SysTick_CTRL_ENABLE_Msk; /* Enable SysTick IRQ and SysTick Timer */
return (0); /* Function successful */
}
Then in arduino-1.5.8/hardware/arduino/sam/variants/arduino_due_x/variant.cpp in function init there is
// Set Systick to 1ms interval, common to all SAM3 variants
if (SysTick_Config(SystemCoreClock / 1000))
{
// Capture error
while (true);
}
Since SystemCoreClock evaluates to 84000000, it follows that this compiles like SysTick_Config(84000). I verified against a DCF77 module that SysTick_Config(84001) will slow SysTick down while SysTick_Config(83999) will speed it up.
I am developing an application in Atmel Studio 6 using the xMega32a4u. I'm using the TWI libraries provided by Atmel. Everything is going well for the most part.
Here is my issue: in order to update the OLED display I am using (SSD1306 controller, 128x32), the entire contents of the display RAM must be written immediately following the I2C START condition, slave address, and control byte, so the display knows to enter the data into the display RAM. If the control byte does not immediately precede the display RAM payload, nothing works.
I am using a Saleae logic analyzer to verify that the bus is doing what it should.
Here is the function I am using to write the display:
void OLED_buffer(){ // Used to write contents of display buffer to OLED
    uint8_t data_array[513];
    data_array[0] = SSD1306_DATA_BYTE;
    for (int i = 0; i < 512; ++i){
        data_array[i+1] = buffer[i];
    }
    OLED_command(SSD1306_SETLOWCOLUMN | 0x00);
    OLED_command(SSD1306_SETHIGHCOLUMN | 0x00);
    OLED_command(SSD1306_SETSTARTLINE | 0x00);
    twi_package_t buffer_send = {
        .chip = OLED_BUS_ADDRESS,
        .buffer = data_array,
        .length = 513
    };
    twi_master_write(&TWIC, &buffer_send);
}
Clearly, this is very inefficient, as each call to this function copies the entire array "buffer" into a new array "data_array", one element at a time. The point of this is to insert the control byte (SSD1306_DATA_BYTE = 0x40) at the front so that the entire package is sent at once with the control byte in the right place. I could make the original "buffer" array one element larger and keep the control byte as its first element to skip the copy, but that makes the size 513 rather than 512, and might break some of the text/graphics functions that manipulate this array and depend on it being exactly 512 bytes.
Now, I thought I could write the code like this:
void OLED_buffer(){ // Used to write contents of display buffer to OLED
    uint8_t data_byte = SSD1306_DATA_BYTE;
    OLED_command(SSD1306_SETLOWCOLUMN | 0x00);
    OLED_command(SSD1306_SETHIGHCOLUMN | 0x00);
    OLED_command(SSD1306_SETSTARTLINE | 0x00);
    twi_package_t data_control_byte = {
        .chip = OLED_BUS_ADDRESS,
        .buffer = &data_byte,
        .length = 1
    };
    twi_master_write(&TWIC, &data_control_byte);
    twi_package_t buffer_send = {
        .chip = OLED_BUS_ADDRESS,
        .buffer = buffer,
        .length = 512
    };
    twi_master_write(&TWIC, &buffer_send);
}
That doesn't work. The first twi_master_write command sends START, address, control byte, STOP. The next one sends START, address, data buffer, STOP. Because the control byte is missing from the latter transaction, this does not work. All I need is to insert a 0x40 byte between the address byte and the buffer array when it is sent over the I2C bus. twi_master_write is a function provided by the Atmel TWI libraries. I've tried to examine the libraries to figure out their inner workings, but I can't make sense of them.
Surely, instead of figuring out how to recreate a twi_write function to work the way I need, there is an easier way to prepend this control byte? Ideally one not as wasteful of clock cycles as my first code example? Realistically the display still updates very fast, more than enough for my needs, but that doesn't change the fact that this is inefficient code.
I appreciate any advice you all may have. Thanks in advance!
How about having buffer and data_array point to the same uint8_t[513] array, but with buffer starting at its second element? Then you can continue to use buffer as you do today, but also use data_array directly, without first having to copy all the elements from buffer.
uint8_t data_array[513];
uint8_t *buffer = &data_array[1];
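With that aliasing in place the copy loop disappears entirely. A sketch of the resulting function, reusing the twi_package_t setup from the question:
uint8_t data_array[513];
uint8_t *buffer = &data_array[1];   /* text/graphics code keeps using buffer[0..511] */

void OLED_buffer(){ // control byte is already in front of the data
    data_array[0] = SSD1306_DATA_BYTE;
    OLED_command(SSD1306_SETLOWCOLUMN | 0x00);
    OLED_command(SSD1306_SETHIGHCOLUMN | 0x00);
    OLED_command(SSD1306_SETSTARTLINE | 0x00);
    twi_package_t buffer_send = {
        .chip = OLED_BUS_ADDRESS,
        .buffer = data_array,
        .length = 513
    };
    twi_master_write(&TWIC, &buffer_send);
}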