Binding numeric keypad keys with Emacs 24 and OS X

I'm currently trying out the Git HEAD version of Emacs 24 on OS X, as described in this article:
http://www.viget.com/extend/emacs-24-rails-development-environment-from-scratch-to-productive-in-5-minu/
I'd like to bind some of the numeric keypad keys on the Macintosh extended keyboard to Emacs functions, but it doesn't seem to work. When I use C-h k to check the key details, the key presses are not recognized. The same happens if I refer to them in a (global-set-key (kbd "kp-minus") ...) setting.
Is this an issue with using the development version of Emacs 24 or is it something about the Macintosh keyboard hardware and how Emacs sees it? Can anyone advise on the best way to go about this?
Thanks in advance,
Stu

I had the same issue with the keypad keys when building Emacs 24, and Emacs 23 has the same problem. I patched the Emacs 24 code as follows to fix it. I'm not sure this is a good solution, but it works well enough for me.
index 91f0cbb..d537ee3 100644
--- a/src/nsterm.m
+++ b/src/nsterm.m
@@ -87,6 +87,7 @@ static unsigned convert_ns_to_X_keysym[] =
NSBeginFunctionKey, 0x58,
NSSelectFunctionKey, 0x60,
NSPrintFunctionKey, 0x61,
+ NSClearLineFunctionKey, 0x0B,
NSExecuteFunctionKey, 0x62,
NSInsertFunctionKey, 0x63,
NSUndoFunctionKey, 0x65,
@@ -134,6 +135,35 @@ static unsigned convert_ns_to_X_keysym[] =
0x1B, 0x1B /* escape */
};
+static unsigned convert_nskeypad_to_X_keysym[] =
+{
+ /* Arrow keys are both function and keypad keys */
+ NSLeftArrowFunctionKey, 0x51,
+ NSUpArrowFunctionKey, 0x52,
+ NSRightArrowFunctionKey, 0x53,
+ NSDownArrowFunctionKey, 0x54,
+
+ 0x41, 0xAE, /* KP_Decimal */
+ 0x43, 0xAA, /* KP_Multiply */
+ 0x45, 0xAB, /* KP_Add */
+ 0x4B, 0xAF, /* KP_Divide */
+ 0x4E, 0xAD, /* KP_Subtract */
+ 0x51, 0xBD, /* KP_Equal */
+ 0x52, 0xB0, /* KP_0 */
+ 0x53, 0xB1, /* KP_1 */
+ 0x54, 0xB2, /* KP_2 */
+ 0x55, 0xB3, /* KP_3 */
+ 0x56, 0xB4, /* KP_4 */
+ 0x57, 0xB5, /* KP_5 */
+ 0x58, 0xB6, /* KP_6 */
+ 0x59, 0xB7, /* KP_7 */
+ 0x5B, 0xB8, /* KP_8 */
+ 0x5C, 0xB9, /* KP_9 */
+
+ // The Enter key is on the keypad, but the keypad modifier isn't set
+ NSEnterCharacter, 0x8D
+};
+
static Lisp_Object Qmodifier_value;
Lisp_Object Qalt, Qcontrol, Qhyper, Qmeta, Qsuper, Qnone;
@@ -1924,13 +1954,33 @@ ns_convert_key (unsigned code)
unsigned keysym;
/* An array would be faster, but less easy to read. */
for (keysym = 0; keysym < last_keysym; keysym += 2)
- if (code == convert_ns_to_X_keysym[keysym])
- return 0xFF00 | convert_ns_to_X_keysym[keysym+1];
+
+ if (code == convert_ns_to_X_keysym[keysym]) {
+ return 0xFF00 | convert_ns_to_X_keysym[keysym+1];
+ }
return 0;
/* if decide to use keyCode and Carbon table, use this line:
return code > 0xff ? 0 : 0xFF00 | ns_keycode_to_xkeysym_table[code]; */
}
+static unsigned
+ns_convert_keypad (unsigned code)
+/* --------------------------------------------------------------------------
+ Internal call used by NSView-keyDown.
+ -------------------------------------------------------------------------- */
+{
+ const unsigned last_keysym = (sizeof (convert_nskeypad_to_X_keysym)
+ / sizeof (convert_nskeypad_to_X_keysym[0]));
+ unsigned keysym;
+ /* An array would be faster, but less easy to read. */
+ for (keysym = 0; keysym < last_keysym; keysym += 2) {
+ if (code == convert_nskeypad_to_X_keysym[keysym]) {
+ return 0xFF00 | convert_nskeypad_to_X_keysym[keysym+1];
+ }
+ }
+ return 0;
+}
+
char *
x_get_keysym_name (int keysym)
@@ -4503,10 +4553,10 @@ ns_term_shutdown (int sig)
Mouse_HLInfo *hlinfo = MOUSE_HL_INFO (emacsframe);
int code;
unsigned fnKeysym = 0;
- int flags;
static NSMutableArray *nsEvArray;
static BOOL firstTime = YES;
int left_is_none;
+ unsigned int flags = [theEvent modifierFlags];
NSTRACE (keyDown);
@@ -4550,9 +4600,13 @@ ns_term_shutdown (int sig)
code = ([[theEvent charactersIgnoringModifiers] length] == 0) ?
0 : [[theEvent charactersIgnoringModifiers] characterAtIndex: 0];
/* (Carbon way: [theEvent keyCode]) */
+
/* is it a "function key"? */
- fnKeysym = ns_convert_key (code);
+ if (code < 0x00ff && (flags & NSNumericPadKeyMask) )
+ fnKeysym = ns_convert_keypad([theEvent keyCode]);
+ else
+ fnKeysym = ns_convert_key(code);
if (fnKeysym)
{
/* COUNTERHACK: map 'Delete' on upper-right main KB to 'Backspace',
@@ -4565,7 +4619,6 @@ ns_term_shutdown (int sig)
/* are there modifiers? */
emacs_event->modifiers = 0;
- flags = [theEvent modifierFlags];
if (flags & NSHelpKeyMask)
emacs_event->modifiers |= hyper_modifier;
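The patch's lookup functions note that "an array would be faster, but less easy to read." A C99 designated-initializer table can be both. The sketch below is not part of either patch: it is a standalone illustration that reuses the macOS key codes and X keysym values from convert_nskeypad_to_X_keysym, stored here with the 0xFF00 prefix already applied. The names keypad_keycode_to_keysym and convert_keypad_keycode are hypothetical, and the patch's arrow-key and Enter entries are left out for brevity.

```c
#include <stdint.h>

/* macOS virtual key code -> X11 keysym for the keypad keys listed in the patch.
   Entries not listed are 0, meaning "not a keypad key". */
static const uint16_t keypad_keycode_to_keysym[0x80] = {
    [0x41] = 0xFFAE, /* KP_Decimal */
    [0x43] = 0xFFAA, /* KP_Multiply */
    [0x45] = 0xFFAB, /* KP_Add */
    [0x4B] = 0xFFAF, /* KP_Divide */
    [0x4E] = 0xFFAD, /* KP_Subtract */
    [0x51] = 0xFFBD, /* KP_Equal */
    [0x52] = 0xFFB0, /* KP_0 */
    [0x53] = 0xFFB1, /* KP_1 */
    [0x54] = 0xFFB2, /* KP_2 */
    [0x55] = 0xFFB3, /* KP_3 */
    [0x56] = 0xFFB4, /* KP_4 */
    [0x57] = 0xFFB5, /* KP_5 */
    [0x58] = 0xFFB6, /* KP_6 */
    [0x59] = 0xFFB7, /* KP_7 */
    [0x5B] = 0xFFB8, /* KP_8 (0x5A is skipped, as in the patch) */
    [0x5C] = 0xFFB9, /* KP_9 */
};

/* Direct array lookup instead of the patch's linear search over pairs. */
static unsigned convert_keypad_keycode(unsigned keycode)
{
    return keycode < 0x80 ? keypad_keycode_to_keysym[keycode] : 0;
}
```

In keyDown this would be called with [theEvent keyCode] when NSNumericPadKeyMask is set, in the same place the patch calls ns_convert_keypad.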

I tried this with an older Emacs:
This is GNU Emacs 22.2.1 (powerpc-apple-darwin9.5.0, GTK+ Version 2.10.13)
It was built from the MacPorts collection with +gtk +x11 and runs under the XQuartz 2.1.6 X11 server (xorg-server 1.4.2-apple33). When I check with C-h l, I get
<kp-0> ... <kp-9>
for the digits, and
<kp-enter> <kp-subtract> <kp-multiply> <kp-divide> <kp-equal>
for the other keys.
I'd suggest building the latest Emacs from MacPorts with the +gtk and +x11 options.
Then get the latest XQuartz and run Emacs over X11. (I prefer this to the more native builds because Emacs then behaves the same whether it runs remotely on another OS, usually via ssh -Y, or locally.)
I'll upgrade my ports to the latest Emacs next week and add the results for it as well.

The problem is specific to the Cocoa port of Emacs; it did not exist in the Carbon port, which is Emacs 22. I have updated the patch I posted above, and it now works better. It will likely work with the Emacs 23 code base using Xcode 3. If, like me, you are using Xcode 4, you will need the Emacs 24 code base, which is currently only available in the Git repository. Here is a very nice description of building Emacs 24 with Xcode 4:
http://mikbe.tk/2011/04/18/build-emacs-with-xcode-4/

Here's an Emacs 23.3 port of the Emacs 24 patch that M. D. Marchionna posted on this page.
--- nsterm-orig.m 2011-11-13 17:51:47.000000000 -0500
+++ nsterm.m 2011-11-13 17:39:56.000000000 -0500
@@ -87,6 +87,7 @@
NSBeginFunctionKey, 0x58,
NSSelectFunctionKey, 0x60,
NSPrintFunctionKey, 0x61,
+ NSClearLineFunctionKey, 0x0B,
NSExecuteFunctionKey, 0x62,
NSInsertFunctionKey, 0x63,
NSUndoFunctionKey, 0x65,
@@ -134,6 +135,33 @@
0x1B, 0x1B /* escape */
};
+static unsigned convert_nskeypad_to_X_keysym[] =
+{
+ /* Arrow keys are both function and keypad keys */
+ NSLeftArrowFunctionKey, 0x51,
+ NSUpArrowFunctionKey, 0x52,
+ NSRightArrowFunctionKey, 0x53,
+ NSDownArrowFunctionKey, 0x54,
+
+ 0x41, 0xAE, /* KP_Decimal */
+ 0x43, 0xAA, /* KP_Multiply */
+ 0x45, 0xAB, /* KP_Add */
+ 0x4B, 0xAF, /* KP_Divide */
+ 0x4E, 0xAD, /* KP_Subtract */
+ 0x51, 0xBD, /* KP_Equal */
+ 0x52, 0xB0, /* KP_0 */
+ 0x53, 0xB1, /* KP_1 */
+ 0x54, 0xB2, /* KP_2 */
+ 0x55, 0xB3, /* KP_3 */
+ 0x56, 0xB4, /* KP_4 */
+ 0x57, 0xB5, /* KP_5 */
+ 0x58, 0xB6, /* KP_6 */
+ 0x59, 0xB7, /* KP_7 */
+ 0x5B, 0xB8, /* KP_8 */
+ 0x5C, 0xB9, /* KP_9 */
+ // The Enter key is on the keypad, but the keypad modifier isn't set
+ NSEnterCharacter, 0x8D
+};
/* Lisp communications */
Lisp_Object ns_input_file, ns_input_font, ns_input_fontsize, ns_input_line;
@@ -1842,6 +1870,23 @@
return code > 0xff ? 0 : 0xFF00 | ns_keycode_to_xkeysym_table[code]; */
}
+static unsigned
+ns_convert_keypad (unsigned code)
+/* --------------------------------------------------------------------------
+ Internal call used by NSView-keyDown.
+ -------------------------------------------------------------------------- */
+{
+ const unsigned last_keysym = (sizeof (convert_nskeypad_to_X_keysym)
+ / sizeof (convert_nskeypad_to_X_keysym[0]));
+ unsigned keysym;
+ /* An array would be faster, but less easy to read. */
+ for (keysym = 0; keysym < last_keysym; keysym += 2) {
+ if (code == convert_nskeypad_to_X_keysym[keysym]) {
+ return 0xFF00 | convert_nskeypad_to_X_keysym[keysym+1];
+ }
+ }
+ return 0;
+}
char *
x_get_keysym_name (int keysym)
@@ -4349,7 +4394,7 @@
struct ns_display_info *dpyinfo = FRAME_NS_DISPLAY_INFO (emacsframe);
int code;
unsigned fnKeysym = 0;
- int flags;
+ unsigned int flags = [theEvent modifierFlags];
static NSMutableArray *nsEvArray;
static BOOL firstTime = YES;
@@ -4397,6 +4442,9 @@
/* (Carbon way: [theEvent keyCode]) */
/* is it a "function key"? */
+ if (code < 0x00ff && (flags & NSNumericPadKeyMask) )
+ fnKeysym = ns_convert_keypad([theEvent keyCode]);
+ else
fnKeysym = ns_convert_key (code);
if (fnKeysym)
{
@@ -4410,8 +4458,6 @@
/* are there modifiers? */
emacs_event->modifiers = 0;
- flags = [theEvent modifierFlags];
-
if (flags & NSHelpKeyMask)
emacs_event->modifiers |= hyper_modifier;


eBPF Validation error when trying to hash a string (process name)

Hi, I am trying to generate a 32-bit hash of the full process name in eBPF. These process names can be long and will not fit on the stack, hence the per-CPU array used as a "heap". I am currently using libbpf-bootstrap as a prototype, from here: https://github.com/libbpf/libbpf-bootstrap.git The verifier rejects my hash function. What is the problem here? I am stumped.
The meat of the code is:
uint32_t map_id = 0;
char *map_val = bpf_map_lookup_elem(&heap, &map_id);
if (!map_val)
return 0;
int bytes_read = bpf_probe_read_str(map_val, sizeof(e->filename), (void *)ctx + fname_off);
if (bytes_read > 0) {
map_val[ (bytes_read - 1) & (4096 -1) ] = 0;
uint32_t key = hash( (unsigned char*)map_val);
bpf_printk("process_exec count: %u, hash: %u, full path: %s\n", bytes_read -1, key, map_val);
}
The hash function is:
uint32_t hash(unsigned char *str)
{
int c;
uint32_t hash = 5381;
while ( c = *str++ )
hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
return hash;
}
I get a verifier error:
; hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
91: (27) r4 *= 33
; hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
92: (0f) r4 += r1
; while ( c = *str++ )
93: (71) r1 = *(u8 *)(r2 +0)
R0=inv(id=6,smin_value=-4096,smax_value=4095) R1_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R2_w=map_value(id=0,off=4096,ks=4,vs=4096,imm=0) R4_w=inv(id=0) R6=ctx(id=0,off=0,umax_value=65535,var_off=(0x0; 0xffff)) R7=map_value(id=0,off=0,ks=4,vs=4096,imm=0) R8=invP0 R10=fp0 fp-8=mmmm???? fp-16=mmmmmmmm fp-24=mmmm???? fp-32=mmmmmmmm
invalid access to map value, value_size=4096 off=4096 size=1
R2 min value is outside of the allowed memory range
processed 32861 insns (limit 1000000) max_states_per_insn 4 total_states 337 peak_states 337 mark_read 4
-- END PROG LOAD LOG --
libbpf: prog 'handle_exec': failed to load: -13
libbpf: failed to load object 'bootstrap_bpf'
libbpf: failed to load BPF skeleton 'bootstrap_bpf': -13
Failed to load and verify BPF skeleton
Here is the complete diff for my use case:
diff --git a/examples/c/bootstrap.bpf.c b/examples/c/bootstrap.bpf.c
index d0860c0..c93ed58 100644
--- a/examples/c/bootstrap.bpf.c
+++ b/examples/c/bootstrap.bpf.c
@@ -20,6 +20,13 @@ struct {
__uint(max_entries, 256 * 1024);
} rb SEC(".maps");
+struct {
+ __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
+ __uint(key_size, sizeof(u32));
+ __uint(max_entries, 1);
+ __uint(value_size, 4096);
+} heap SEC(".maps");
+
const volatile unsigned long long min_duration_ns = 0;
SEC("tp/sched/sched_process_exec")
@@ -58,6 +65,22 @@ int handle_exec(struct trace_event_raw_sched_process_exec *ctx)
/* successfully submit it to user-space for post-processing */
bpf_ringbuf_submit(e, 0);
+
+
+ uint32_t map_id = 0;
+ char *map_val = bpf_map_lookup_elem(&heap, &map_id);
+ if (!map_val)
+ return 0;
+
+ int bytes_read = bpf_probe_read_str(map_val, sizeof(e->filename), (void *)ctx + fname_off);
+ if (bytes_read > 0) {
+ // tell the verifier that the index (bytes_read - 1) is between 0 and 4095
+ map_val[ (bytes_read - 1) & (4096 -1) ] = 0;
+
+ uint32_t key = hash( (unsigned char*)map_val);
+ bpf_printk("process_exec count: %u, hash: %u, full path: %s\n", bytes_read -1, key, map_val);
+ }
+
return 0;
}
@@ -109,4 +132,3 @@ int handle_exit(struct trace_event_raw_sched_process_template* ctx)
bpf_ringbuf_submit(e, 0);
return 0;
}
-
diff --git a/examples/c/bootstrap.h b/examples/c/bootstrap.h
index b49e022..d268e56 100644
--- a/examples/c/bootstrap.h
+++ b/examples/c/bootstrap.h
@@ -4,7 +4,7 @@
#define __BOOTSTRAP_H
#define TASK_COMM_LEN 16
-#define MAX_FILENAME_LEN 127
+#define MAX_FILENAME_LEN 4096
struct event {
int pid;
@@ -16,4 +16,15 @@ struct event {
bool exit_event;
};
+static inline
+uint32_t hash(unsigned char *str)
+{
+ int c;
+ uint32_t hash = 5381;
+ while ( c = *str++ )
+ hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
+
+ return hash;
+}
+
#endif /* __BOOTSTRAP_H */
TL;DR: you need to ensure that you never read past the end of the map value, i.e., check that str never goes past its initial value + 4095.
Verifier error explanation.
; while ( c = *str++ )
93: (71) r1 = *(u8 *)(r2 +0)
R0=inv(id=6,smin_value=-4096,smax_value=4095) R1_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R2_w=map_value(id=0,off=4096,ks=4,vs=4096,imm=0) R4_w=inv(id=0) R6=ctx(id=0,off=0,umax_value=65535,var_off=(0x0; 0xffff)) R7=map_value(id=0,off=0,ks=4,vs=4096,imm=0) R8=invP0 R10=fp0 fp-8=mmmm???? fp-16=mmmmmmmm fp-24=mmmm???? fp-32=mmmmmmmm
invalid access to map value, value_size=4096 off=4096 size=1
R2 min value is outside of the allowed memory range
The verifier is telling you that your code may attempt to read one byte (size=1) from the map value at offset 4096 (off=4096). The loop stops only at a NUL byte, and the verifier cannot assume one exists, so it follows the path where str keeps advancing until it reaches that offset. Since the map value has a size of 4096 (value_size=4096), that read would land past the end of the map value, an out-of-bounds memory access. Hence, the verifier rejects the program.
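A minimal sketch of the fix, assuming a kernel whose verifier supports bounded loops (Linux 5.3 or later) and the 4096-byte map value from the question: cap the djb2 loop at the map's value size so the verifier can prove every access stays inside the map value. The names hash_bounded and MAX_HASH_LEN are illustrative, and whether a particular kernel accepts this exact form is not guaranteed.

```c
#include <stdint.h>

/* Must match value_size of the "heap" map in the question (assumption). */
#define MAX_HASH_LEN 4096

/* djb2 ("hash * 33 + c") over a NUL-terminated string, reading at most
   MAX_HASH_LEN bytes so that every access stays inside the map value. */
static inline uint32_t hash_bounded(const unsigned char *str)
{
    uint32_t hash = 5381;
    for (int i = 0; i < MAX_HASH_LEN; i++) {
        unsigned char c = str[i];
        if (c == 0)
            break;
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
    }
    return hash;
}
```

In the program it replaces the original call: uint32_t key = hash_bounded((unsigned char *)map_val); For NUL-terminated strings shorter than 4096 bytes it returns the same value as the original hash().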

Analog measurement incorrect on Teensy 2.0++

I have a joystick wired up to my Teensy 2.0++ and I want to read its analog values.
I took this implementation from PJRC:
static uint8_t aref = (1<<REFS0); // default to AREF = Vcc, this is a 5V Vcc Teensy
void analogReference(uint8_t mode)
{
aref = mode & 0xC0;
}
// Mux input
int16_t adc_read(uint8_t mux)
{
#if defined(__AVR_AT90USB162__)
return 0;
#else
uint8_t low;
ADCSRA = (1<<ADEN) | ADC_PRESCALER; // enable ADC
ADCSRB = (1<<ADHSM) | (mux & 0x20); // high speed mode
ADMUX = aref | (mux & 0x1F); // configure mux input
ADCSRA = (1<<ADEN) | ADC_PRESCALER | (1<<ADSC); // start the conversion
while (ADCSRA & (1<<ADSC)) ; // wait for result
low = ADCL; // must read LSB first
return (ADCH << 8) | low; // must read MSB only once!
#endif
}
// Arduino compatible pin input
int16_t analogRead(uint8_t pin)
{
#if defined(__AVR_ATmega32U4__)
static const uint8_t PROGMEM pin_to_mux[] = {
0x00, 0x01, 0x04, 0x05, 0x06, 0x07,
0x25, 0x24, 0x23, 0x22, 0x21, 0x20};
if (pin >= 12) return 0;
return adc_read(pgm_read_byte(pin_to_mux + pin));
#elif defined(__AVR_AT90USB646__) || defined(__AVR_AT90USB1286__)
if (pin >= 8) return 0;
return adc_read(pin);
#else
return 0;
#endif
}
I have my X and Y pins wired up to F1 and F0, and I want to retrieve values with the following code:
long map(long x, long in_min, long in_max, long out_min, long out_max) // map method shamelessly ripped from Arduino
{
return (x - in_min) * (out_max - out_min) / (in_max - in_min) + out_min;
}
joy_ly = map(analogRead(0), 0, 65535, 0, 255);
joy_lx = map(analogRead(1), 0, 65535, 0, 255);
I measured my joystick with a multimeter and it works perfectly (around 2.43 V at center, 0 V at minimum and 5 V at maximum), but the center value always ends up very close to zero.
Is there anything I'm doing wrong?
NOTE: This is an AT90USB1286 chip.
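The ADC is 10-bit, so its maximum value is 1023, not 65535. With the wrong input range, a centered reading of about 497 maps to 497 * 255 / 65535 = 1 in integer arithmetic, which is why the center looks like zero. Use map(analogRead(0), 0, 1023, 0, 255) instead.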
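A minimal sketch of the corrected scaling, assuming the right-adjusted 10-bit result that adc_read above returns (0 to 1023). It reuses the question's map function; ADC_MAX and adc_to_byte are illustrative names, not part of the PJRC code.

```c
#include <stdint.h>

/* Arduino-style linear map from the question (integer arithmetic). */
long map(long x, long in_min, long in_max, long out_min, long out_max)
{
    return (x - in_min) * (out_max - out_min) / (in_max - in_min) + out_min;
}

/* Full-scale reading of the 10-bit ADC. */
#define ADC_MAX 1023L

/* Convert a raw 10-bit ADC reading to the 0..255 range. */
static uint8_t adc_to_byte(int16_t raw)
{
    return (uint8_t)map(raw, 0, ADC_MAX, 0, 255);
}
```

Usage: joy_ly = adc_to_byte(analogRead(0)); A right shift (raw >> 2) gives a similar 0..255 range more cheaply, though it rounds slightly differently.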

C getting a raw keypress with no stdlib

I am working on a very basic operating system as a learning experience, and I am trying to start with key presses. I am making a freestanding executable, so there is no standard library. How would I go about taking input from a keyboard? I have figured out how to print to the screen through video memory.
/*
* kernel.c
* */
void cls(int line) { // clear the screen
char *vidptr = (char*) 0xb8000;
/* 25 lines each of 80 columns; each element takes 2 bytes */
unsigned int x = 0;
while (x < 80 * 25 * 2) {
// blank character
vidptr[x] = ' ';
// attribute-byte - light grey on black screen
x += 1;
vidptr[x] = 0x07;
x += 1;
}
line = 0;
}
void printf(const char *str, int line, char attr) { // write a string to video memory
char *vidptr = (char*) 0xb8000;
unsigned int y =0, x = 0;
while (str[y] != '\0') {
// the character's ascii
vidptr[x] = str[y];
x++;
// attribute byte (colour) for the character just written
vidptr[x] = attr;
x++;
y++;
}
}
void kmain(void) {
unsigned int line = 0;
cls(line);
printf("Testing the Kernel", line, 0x0a);
}
and my assembly:
;; entry point
bits 32 ; nasm directive - 32 bit
global entry
extern _kmain ; kmain is defined in the c file
section .text
entry: jmp start
;multiboot spec
align 4
dd 0x1BADB002 ; black magic
dd 0x00 ; flags
dd -(0x1BADB002 + 0x00) ; checksum. m+f+c should be zero
start:
cli ; block interrupts
mov esp, stack_space ; set stack pointer
call _kmain
hlt ; halt the CPU
section .bss
resb 8192 ; 8KB for stack
stack_space:
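One common approach in a hobby kernel is to poll the PS/2 keyboard controller: wait until bit 0 (output buffer full) of status port 0x64 is set, then read the scancode from data port 0x60. Polling works with interrupts disabled, which matches the cli in the entry code. The sketch below assumes an x86 PS/2 controller (or BIOS USB legacy emulation, as in QEMU), scancode set 1 as delivered with the controller's default translation, a US layout, and GCC/Clang inline assembly; the function names and the partial translation table are illustrative.

```c
#include <stdint.h> /* freestanding header, available without a libc */

/* Read one byte from an x86 I/O port. */
static inline uint8_t inb(uint16_t port)
{
    uint8_t value;
    __asm__ volatile ("inb %1, %0" : "=a"(value) : "Nd"(port));
    return value;
}

#define KBD_DATA_PORT   0x60 /* PS/2 controller data port */
#define KBD_STATUS_PORT 0x64 /* PS/2 controller status port */

/* US-layout scancode set 1 make codes 0x00..0x39 -> ASCII; 0 = not printable. */
static const char scancode_to_ascii[58] = {
    0, 27, '1', '2', '3', '4', '5', '6', '7', '8', '9', '0', '-', '=', '\b',
    '\t', 'q', 'w', 'e', 'r', 't', 'y', 'u', 'i', 'o', 'p', '[', ']', '\n',
    0, 'a', 's', 'd', 'f', 'g', 'h', 'j', 'k', 'l', ';', '\'', '`',
    0, '\\', 'z', 'x', 'c', 'v', 'b', 'n', 'm', ',', '.', '/', 0,
    '*', 0, ' '
};

/* Translate a scancode; returns 0 for key releases and non-printable keys. */
char scancode_translate(uint8_t scancode)
{
    if (scancode & 0x80)                    /* break code: key released */
        return 0;
    if (scancode >= sizeof scancode_to_ascii)
        return 0;
    return scancode_to_ascii[scancode];
}

/* Busy-wait until the controller has a byte for us, then return it. */
uint8_t kbd_read_scancode(void)
{
    while ((inb(KBD_STATUS_PORT) & 0x01) == 0) /* bit 0: output buffer full */
        ;
    return inb(KBD_DATA_PORT);
}

/* Block until a printable key is pressed and return its ASCII value. */
char kbd_getchar(void)
{
    for (;;) {
        char c = scancode_translate(kbd_read_scancode());
        if (c)
            return c;
    }
}
```

Shift, Ctrl and the extended (0xE0-prefixed) keys are ignored here; tracking them means remembering make/break codes for those keys between calls.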

Efficient sse shuffle mask generation for left-packing byte elements

What would be an efficient way to optimize the following code with SSE?
uint16_t change1= ... ;
uint8_t* pSrc = ... ;
uint8_t* pDest = ... ;
if(change1 & 0x0001) *pDest++ = pSrc[0];
if(change1 & 0x0002) *pDest++ = pSrc[1];
if(change1 & 0x0004) *pDest++ = pSrc[2];
if(change1 & 0x0008) *pDest++ = pSrc[3];
if(change1 & 0x0010) *pDest++ = pSrc[4];
if(change1 & 0x0020) *pDest++ = pSrc[5];
if(change1 & 0x0040) *pDest++ = pSrc[6];
if(change1 & 0x0080) *pDest++ = pSrc[7];
if(change1 & 0x0100) *pDest++ = pSrc[8];
if(change1 & 0x0200) *pDest++ = pSrc[9];
if(change1 & 0x0400) *pDest++ = pSrc[10];
if(change1 & 0x0800) *pDest++ = pSrc[11];
if(change1 & 0x1000) *pDest++ = pSrc[12];
if(change1 & 0x2000) *pDest++ = pSrc[13];
if(change1 & 0x4000) *pDest++ = pSrc[14];
if(change1 & 0x8000) *pDest++ = pSrc[15];
So far I am using a rather large lookup table for this, but I would really like to get rid of it:
SSE3Shuffle::Entry& e0 = SSE3Shuffle::g_Shuffle.m_Entries[change1];
_mm_storeu_si128((__m128i*)pDest, _mm_shuffle_epi8(*(__m128i*)pSrc, e0.mask));
pDest += e0.offset;
Assuming:
change1 = _mm_movemask_epi8(bytemask);
offset = popcnt(change1);
On large buffers, using two shuffles and a 1 KiB table is only ~10% slower than using one shuffle and a 1 MiB table. My attempts at generating the shuffle mask via prefix sums and bit twiddling run at about half the speed of the table-based methods (solutions using pext/pdep were not explored).
Reducing table size: use two lookups into a 2 KiB table instead of one lookup into a 1 MiB table. Always keep the top-most byte: if that byte is to be discarded, it doesn't matter what byte ends up at that position (down to 7-bit indices, or a 1 KiB table). Further reduce the possible combinations by manually packing the two bytes in each 16-bit lane (down to a 216-byte table).
The following example strips whitespace from text using SSE4.1. If only SSSE3 is available then blendv can be emulated. The 64-bit halves are re-combined by overlapping writes to memory, but they could be re-combined in the xmm register (as seen in the AVX2 example).
#include <stdint.h>
#include <smmintrin.h> // SSE4.1
size_t despacer (void* dst_void, void* src_void, size_t length)
{
uint8_t* src = (uint8_t*)src_void;
uint8_t* dst = (uint8_t*)dst_void;
if (length >= 16) {
// table of control characters (space, tab, newline, carriage return)
const __m128i lut_cntrl = _mm_setr_epi8(' ', 0, 0, 0, 0, 0, 0, 0, 0, '\t', '\n', 0, 0, '\r', 0, 0);
// bits[4:0] = index -> ((trit_d * 0) + (trit_c * 9) + (trit_b * 3) + (trit_a * 1))
// bits[15:7] = popcnt
const __m128i sadmask = _mm_set1_epi64x(0x8080898983838181);
// adding 8 to each shuffle index is cheaper than extracting the high qword
const __m128i offset = _mm_cvtsi64_si128(0x0808080808080808);
// shuffle control indices
static const uint64_t table[27] = {
0x0000000000000706, 0x0000000000070600, 0x0000000007060100, 0x0000000000070602,
0x0000000007060200, 0x0000000706020100, 0x0000000007060302, 0x0000000706030200,
0x0000070603020100, 0x0000000000070604, 0x0000000007060400, 0x0000000706040100,
0x0000000007060402, 0x0000000706040200, 0x0000070604020100, 0x0000000706040302,
0x0000070604030200, 0x0007060403020100, 0x0000000007060504, 0x0000000706050400,
0x0000070605040100, 0x0000000706050402, 0x0000070605040200, 0x0007060504020100,
0x0000070605040302, 0x0007060504030200, 0x0706050403020100
};
const uint8_t* end = &src[length & ~15];
do {
__m128i v = _mm_loadu_si128((__m128i*)src);
src += 16;
// detect spaces
__m128i mask = _mm_cmpeq_epi8(_mm_shuffle_epi8(lut_cntrl, v), v);
// shift w/blend: each word now only has 3 states instead of 4
// which reduces the possibilities per qword from 128 to 27
v = _mm_blendv_epi8(v, _mm_srli_epi16(v, 8), mask);
// extract bitfields describing each qword: index, popcnt
__m128i desc = _mm_sad_epu8(_mm_and_si128(mask, sadmask), sadmask);
size_t lo_desc = (size_t)_mm_cvtsi128_si32(desc);
size_t hi_desc = (size_t)_mm_extract_epi16(desc, 4);
// load shuffle control indices from pre-computed table
__m128i lo_shuf = _mm_loadl_epi64((__m128i*)&table[lo_desc & 0x1F]);
__m128i hi_shuf = _mm_or_si128(_mm_loadl_epi64((__m128i*)&table[hi_desc & 0x1F]), offset);
// store an entire qword then advance the pointer by how ever
// many of those bytes are actually wanted. Any trailing
// garbage will be overwritten by the next store.
// note: little endian byte memory order
_mm_storel_epi64((__m128i*)dst, _mm_shuffle_epi8(v, lo_shuf));
dst += (lo_desc >> 7);
_mm_storel_epi64((__m128i*)dst, _mm_shuffle_epi8(v, hi_shuf));
dst += (hi_desc >> 7);
} while (src != end);
}
// tail loop
length &= 15;
if (length != 0) {
const uint64_t bitmap = 0xFFFFFFFEFFFFC1FF;
do {
uint64_t c = *src++;
*dst = (uint8_t)c;
dst += ((bitmap >> (c & 63)) & 1) | ((c + 0xC0) >> 8); // mask the count: a 64-bit shift by >= 64 is undefined
} while (--length);
}
// return pointer to the location after the last element in dst
return (size_t)(dst - ((uint8_t*)dst_void));
}
Whether the tail loop should be vectorized or use cmov is left as an exercise for the reader. Writing each byte unconditionally/branchlessly is fast when the input is unpredictable.
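The tail loop's constant can be rebuilt from the characters it drops, which also shows exactly what it matches: 0xFFFFFFFEFFFFC1FF has bits 9 to 13 and bit 32 cleared, i.e. \t \n \v \f \r and space. So the scalar tail also drops \v and \f, which lut_cntrl in the vector loop does not. A sketch with hypothetical helper names make_keep_bitmap and keep_byte; characters passed to make_keep_bitmap are assumed to be below 64.

```c
#include <stdint.h>

/* Clear bit c of an all-ones bitmap for every character c in `drop` (c < 64). */
static uint64_t make_keep_bitmap(const char *drop)
{
    uint64_t bitmap = ~UINT64_C(0);
    for (; *drop; drop++)
        bitmap &= ~(UINT64_C(1) << ((unsigned char)*drop & 63));
    return bitmap;
}

/* 1 if byte c is kept, 0 if it is dropped. Bytes >= 64 are always kept via the
   second term; the shift count is masked because shifting by >= 64 is undefined. */
static unsigned keep_byte(uint64_t bitmap, uint8_t c)
{
    return (unsigned)(((bitmap >> (c & 63)) & 1) | (((uint64_t)c + 0xC0) >> 8));
}
```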
Using AVX2 to generate the shuffle control mask using an in-register table is only slightly slower than using large precomputed tables.
#include <stdint.h>
#include <immintrin.h>
// probably needs improvement...
size_t despace_avx2_vpermd(const char* src_void, char* dst_void, size_t length)
{
uint8_t* src = (uint8_t*)src_void;
uint8_t* dst = (uint8_t*)dst_void;
const __m256i lut_cntrl2 = _mm256_broadcastsi128_si256(_mm_setr_epi8(' ', 0, 0, 0, 0, 0, 0, 0, 0, '\t', '\n', 0, 0, '\r', 0, 0));
const __m256i permutation_mask = _mm256_set1_epi64x( 0x0020100884828180 );
const __m256i invert_mask = _mm256_set1_epi64x( 0x0020100880808080 );
const __m256i zero = _mm256_setzero_si256();
const __m256i fixup = _mm256_set_epi32(
0x08080808, 0x0F0F0F0F, 0x00000000, 0x07070707,
0x08080808, 0x0F0F0F0F, 0x00000000, 0x07070707
);
const __m256i lut = _mm256_set_epi32(
0x04050607, // 0x03020100', 0x000000'07
0x04050704, // 0x030200'00, 0x0000'0704
0x04060705, // 0x030100'00, 0x0000'0705
0x04070504, // 0x0300'0000, 0x00'070504
0x05060706, // 0x020100'00, 0x0000'0706
0x05070604, // 0x0200'0000, 0x00'070604
0x06070605, // 0x0100'0000, 0x00'070605
0x07060504 // 0x00'000000, 0x'07060504
);
// hi bits are ignored by pshufb, used to reject movement of low qword bytes
const __m256i shuffle_a = _mm256_set_epi8(
0x7F, 0x7E, 0x7D, 0x7C, 0x7B, 0x7A, 0x79, 0x78, 0x07, 0x16, 0x25, 0x34, 0x43, 0x52, 0x61, 0x70,
0x7F, 0x7E, 0x7D, 0x7C, 0x7B, 0x7A, 0x79, 0x78, 0x07, 0x16, 0x25, 0x34, 0x43, 0x52, 0x61, 0x70
);
// broadcast 0x08 then blendd...
const __m256i shuffle_b = _mm256_set_epi32(
0x08080808, 0x08080808, 0x00000000, 0x00000000,
0x08080808, 0x08080808, 0x00000000, 0x00000000
);
for( uint8_t* end = &src[(length & ~31)]; src != end; src += 32){
__m256i r0,r1,r2,r3,r4;
unsigned int s0,s1;
r0 = _mm256_loadu_si256((__m256i *)src); // asrc
// detect spaces
r1 = _mm256_cmpeq_epi8(_mm256_shuffle_epi8(lut_cntrl2, r0), r0);
r2 = _mm256_sad_epu8(zero, r1);
s0 = (unsigned)_mm256_movemask_epi8(r1);
r1 = _mm256_andnot_si256(r1, permutation_mask);
r1 = _mm256_sad_epu8(r1, invert_mask); // index_bitmap[0:5], low32_spaces_count[7:15]
r2 = _mm256_shuffle_epi8(r2, zero);
r2 = _mm256_sub_epi8(shuffle_a, r2); // add space cnt of low qword
s0 = ~s0;
r3 = _mm256_slli_epi64(r1, 29); // move top part of index_bitmap to high dword
r4 = _mm256_srli_epi64(r1, 7); // number of spaces in low dword
r4 = _mm256_shuffle_epi8(r4, shuffle_b);
r1 = _mm256_or_si256(r1, r3);
r1 = _mm256_permutevar8x32_epi32(lut, r1);
s1 = _mm_popcnt_u32(s0);
r4 = _mm256_add_epi8(r4, shuffle_a);
s0 = s0 & 0xFFFF; // isolate low oword
r2 = _mm256_shuffle_epi8(r4, r2);
s0 = _mm_popcnt_u32(s0);
r2 = _mm256_max_epu8(r2, r4); // pin low qword bytes
r1 = _mm256_xor_si256(r1, fixup);
r1 = _mm256_shuffle_epi8(r1, r2); // complete shuffle mask
r0 = _mm256_shuffle_epi8(r0, r1); // despace!
_mm_storeu_si128((__m128i*)dst, _mm256_castsi256_si128(r0));
_mm_storeu_si128((__m128i*)&dst[s0], _mm256_extracti128_si256(r0,1));
dst += s1;
}
// tail loop
length &= 31;
if (length != 0) {
const uint64_t bitmap = 0xFFFFFFFEFFFFC1FF;
do {
uint64_t c = *src++;
*dst = (uint8_t)c;
dst += ((bitmap >> (c & 63)) & 1) | ((c + 0xC0) >> 8); // mask the count: a 64-bit shift by >= 64 is undefined
} while (--length);
}
return (size_t)(dst - ((uint8_t*)dst_void));
}
For posterity, the 1 KiB version (generating the table is left as an exercise for the reader).
static const uint64_t table[128] __attribute__((aligned(64))) = {
0x0706050403020100, 0x0007060504030201, ..., 0x0605040302010700, 0x0605040302010007
};
const __m128i mask_01 = _mm_set1_epi8( 0x01 );
__m128i vector0 = _mm_loadu_si128((__m128i*)src);
__m128i vector1 = _mm_shuffle_epi32( vector0, 0x0E );
__m128i bytemask0 = _mm_cmpeq_epi8( ???, vector0); // detect bytes to omit
uint32_t bitmask0 = _mm_movemask_epi8(bytemask0) & 0x7F7F;
__m128i hsum = _mm_sad_epu8(_mm_add_epi8(bytemask0, mask_01), _mm_setzero_si128());
vector0 = _mm_shuffle_epi8(vector0, _mm_loadl_epi64((__m128i*) &table[(uint8_t)bitmask0]));
_mm_storel_epi64((__m128i*)dst, vector0);
dst += (uint32_t)_mm_cvtsi128_si32(hsum);
vector1 = _mm_shuffle_epi8(vector1, _mm_loadl_epi64((__m128i*) &table[bitmask0 >> 8]));
_mm_storel_epi64((__m128i*)dst, vector1);
dst += (uint32_t)_mm_cvtsi128_si32(_mm_unpackhi_epi64(hsum, hsum));
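Since generating the table is left as an exercise, here is one possible generator. It is a reconstruction inferred from the four entries that are shown: for a 7-bit omit mask m (bit 7 is never set because of the & 0x7F7F), list the indices of the kept bytes first, which always includes byte 7, then the indices of the omitted bytes. build_table_1k is a hypothetical name; it reproduces the visible entries, but the elided middle of the original table cannot be checked against it.

```c
#include <stdint.h>

/* Entry m (0..127): set bits of m are bytes to omit; byte 7 is always kept.
   Kept byte indices come first (in order), followed by the omitted ones. */
void build_table_1k(uint64_t table[128])
{
    for (unsigned m = 0; m < 128; m++) {
        uint64_t entry = 0;
        unsigned out = 0;
        for (unsigned i = 0; i < 8; i++)   /* kept bytes first */
            if (!(m & (1u << i)))
                entry |= (uint64_t)i << (8 * out++);
        for (unsigned i = 0; i < 8; i++)   /* then the omitted bytes */
            if (m & (1u << i))
                entry |= (uint64_t)i << (8 * out++);
        table[m] = entry;
    }
}
```

Filling the tail with the omitted indices (rather than zeros) is harmless: those bytes are overwritten by the next store.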
https://github.com/InstLatx64/AVX512_VPCOMPRESSB_Emu has some benchmarks.
If one is willing to use BMI2 (available on Haswell and later), one can use pext to first compress the unwanted nibbles out of a uint64_t, and then use pdep to scatter the result into a shuffle mask.
// Step 1 -- replicate each mask bit to a whole nibble (bit i -> nibble i = 0xF)
uint64_t change4 = _pdep_u64(change1, 0x1111111111111111ULL) * 0x0F;
// Step 2 -- extract the indices of the wanted bytes from an array of nibbles
uint64_t indices = _pext_u64(0xFEDCBA9876543210ULL, change4);
// Step 3 -- spread the nibbles to octets (low 8 -> low qword, high 8 -> high qword)
uint64_t high = _pdep_u64(indices >> 32, 0x0F0F0F0F0F0F0F0FULL);
uint64_t low = _pdep_u64(indices, 0x0F0F0F0F0F0F0F0FULL);
// Step 4 -- use these two halves as the shuffle mask to compress pSrc
__m128i compressed = _mm_shuffle_epi8(_mm_loadu_si128((const __m128i*)pSrc),
                                      _mm_set_epi64x((long long)high, (long long)low));
// Step 5 -- store 16 bytes unaligned
_mm_storeu_si128((__m128i*)pDst, compressed);
// Step 6 -- advance the target pointer by the number of kept bytes
pDst += _mm_popcnt_u32(change1);
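To check steps 1 to 3 on a machine without BMI2, the same shuffle control can be computed with bit-by-bit software versions of pdep and pext. This is a slow reference model for testing, not a replacement for the intrinsics; soft_pdep, soft_pext and shuffle_control are hypothetical names.

```c
#include <stdint.h>

/* Software pdep: deposit the low bits of src at the set-bit positions of mask. */
static uint64_t soft_pdep(uint64_t src, uint64_t mask)
{
    uint64_t out = 0;
    for (uint64_t bit = 1; mask; bit <<= 1) {
        uint64_t lowest = mask & -mask;  /* lowest set bit of mask */
        if (src & bit)
            out |= lowest;
        mask &= mask - 1;                /* clear it */
    }
    return out;
}

/* Software pext: gather the bits of src at the set-bit positions of mask. */
static uint64_t soft_pext(uint64_t src, uint64_t mask)
{
    uint64_t out = 0;
    for (uint64_t bit = 1; mask; bit <<= 1) {
        uint64_t lowest = mask & -mask;
        if (src & lowest)
            out |= bit;
        mask &= mask - 1;
    }
    return out;
}

/* Steps 1-3: pshufb control bytes for keep-mask change1 (bit i = keep byte i). */
static void shuffle_control(uint16_t change1, uint8_t ctrl[16])
{
    uint64_t change4 = soft_pdep(change1, 0x1111111111111111ULL) * 0x0F;
    uint64_t indices = soft_pext(0xFEDCBA9876543210ULL, change4);
    uint64_t low  = soft_pdep(indices,       0x0F0F0F0F0F0F0F0FULL);
    uint64_t high = soft_pdep(indices >> 32, 0x0F0F0F0F0F0F0F0FULL);
    for (int i = 0; i < 8; i++) {
        ctrl[i]     = (uint8_t)(low  >> (8 * i));
        ctrl[i + 8] = (uint8_t)(high >> (8 * i));
    }
}
```

Control bytes past popcount(change1) come out as 0 and select pSrc[0]; that is fine because step 6 only advances pDst past the kept bytes.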
Other variants (based on a cumulative sum, or on sorting the 'X's (zero bits) out of XX23456789abXXef) will first require some technique to spread the bits of the uint16_t evenly across an __m128i (i.e., the reverse of movemask_epi8).
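One common way to do that spreading with plain SSE2 is to broadcast the low mask byte into lanes 0 to 7 and the high byte into lanes 8 to 15, then test bit (i % 8) in each lane. A sketch; inverse_movemask_epi8 is a hypothetical name.

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2 */

/* Inverse of _mm_movemask_epi8: lane i is 0xFF if bit i of mask is set, else 0x00. */
static __m128i inverse_movemask_epi8(uint16_t mask)
{
    const uint64_t ones = 0x0101010101010101ULL;
    /* Low mask byte in lanes 0..7, high mask byte in lanes 8..15. */
    __m128i v = _mm_set_epi64x((long long)(ones * (mask >> 8)),
                               (long long)(ones * (mask & 0xFF)));
    /* Lane i selects bit (i % 8): bytes 0x01, 0x02, ..., 0x80 in each qword. */
    const __m128i bits = _mm_set1_epi64x((long long)0x8040201008040201ULL);
    return _mm_cmpeq_epi8(_mm_and_si128(v, bits), bits);
}
```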
The 64K-entry LUT can, however, be split into top and bottom parts:
int c = change1 & 0xff;
int p = __builtin_popcount(c);
uint64_t a = LUT256[c]; // low part of index
uint64_t b = LUT256[change1 >> 8]; // top part of index
b += addlut9[p]; // 0x0101010101010101 * p
// Then must concatenate b|a at pth position of 'a'
if (p < 8)
{
a |= b << (8*(8-p));
b >>= 8*p;
}
__m128i d = _mm_shuffle_epi8(_mm_loadu_si128((const __m128i*)pSrc), _mm_set_epi64x(b, a));
// and continue with steps 5 and 6 as before

Mifare read APDU command received 63 00

Hi all!
I'm trying to read data from a MIFARE Classic 1K card.
To get the ID,
I send: 0xFF 0xCA 0x00 0x00 0x00
Received: 0x00 0x00 0x00 0x00 0x00 0x00 - is this normal?
To load the authentication key into the reader,
I send: 0xFF 0x82 0x00 0x00 0x06 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF
Received: 90 00 - it's OK
To authenticate block 01,
I send: 0xFF 0x86 0x00 0x00 0x05 0x01 0x00 0x01 0x60 0x00
Received: 90 00 - it's OK
To read data from block 01,
I send: 0xFF 0xB0 0x00 0x01 0x0F
Received: 63 00 - as I understand it, this is an authentication error
I can't understand why.
My code:
#include "stdafx.h"
#include "Winscard.h"
LPTSTR pmszReaders = NULL;
LPTSTR pmszCards = NULL;
LPTSTR pReader;
LPTSTR pCard;
LONG lReturn, lReturn2;
DWORD cch = SCARD_AUTOALLOCATE;
SCARDCONTEXT hSC;
SCARD_READERSTATE readerState;
LPCTSTR readerName = L"ACS ACR1222 1S Dual Reader 0";
SCARDHANDLE hCardHandle;
DWORD dwAP;
BYTE pbRecv[50];
DWORD dwRecv;
BYTE cmdGetData[] = {0xFF, 0xCA, 0x00, 0x00, 0x00};
BYTE cmdLoadKey[] = {0xFF, 0x82, 0x00, 0x00, 0x06, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF};
BYTE cmdAuthBlock01[] = {0xFF, 0x86, 0x00, 0x00, 0x05, 0x01, 0x00, 0x01, 0x60, 0x00};
BYTE cmdReadBlock01[] = {0xFF, 0xB0, 0x00, 0x01, 0x0F};
int _tmain(int argc, _TCHAR* argv[]) {
lReturn = SCardEstablishContext(SCARD_SCOPE_USER, NULL, NULL, &hSC);
if ( SCARD_S_SUCCESS != lReturn )
printf("Failed SCardEstablishContext\n");
else {
lReturn = SCardListReaders(hSC, NULL, (LPTSTR)&pmszReaders, &cch );
if (lReturn != SCARD_S_SUCCESS) {
printf("Failed SCardListReaders\n");
} else {
pReader = pmszReaders;
while ( '\0' != *pReader ) {
printf("Reader: %S\n", pReader );
pReader = pReader + wcslen((wchar_t *)pReader) + 1;
}
}
memset(&readerState,0,sizeof(readerState));
readerState.szReader = pmszReaders;
lReturn = SCardConnect( hSC, pmszReaders, SCARD_SHARE_EXCLUSIVE, SCARD_PROTOCOL_T0 | SCARD_PROTOCOL_T1, &hCardHandle, &dwAP );
if ( SCARD_S_SUCCESS != lReturn ) {
printf("Failed SCardConnect\n");
system("pause");
exit(1);
} else {
printf("Success SCardConnect\n");
switch ( dwAP ) {
case SCARD_PROTOCOL_T0:
printf("Active protocol T0\n");
break;
case SCARD_PROTOCOL_T1:
printf("Active protocol T1\n");
break;
case SCARD_PROTOCOL_UNDEFINED:
default:
printf("Active protocol unnegotiated or unknown\n");
break;
}
}
dwRecv = sizeof(pbRecv); // in: size of pbRecv; out: number of bytes received
lReturn = SCardTransmit(hCardHandle, SCARD_PCI_T1, cmdGetData, sizeof(cmdGetData), NULL, pbRecv, &dwRecv);
if ( SCARD_S_SUCCESS != lReturn ) {
printf("Failed SCardTransmit\n");
} else {
printf("Success SCardTransmit\n");
printf("Read %u bytes\n", dwRecv);
for(byte i=0;i<dwRecv;i++) {
printf("%x ", pbRecv[i]);
}
printf("\n");
}
dwRecv = sizeof(pbRecv); // reset before every call: SCardTransmit overwrites it
lReturn = SCardTransmit(hCardHandle, SCARD_PCI_T1, cmdLoadKey, sizeof(cmdLoadKey), NULL, pbRecv, &dwRecv);
if ( SCARD_S_SUCCESS != lReturn ) {
printf("Failed SCardTransmit\n");
} else {
printf("Success SCardTransmit\n");
printf("Read %u bytes\n", dwRecv);
for(byte i=0;i<dwRecv;i++) {
printf("%x ", pbRecv[i]);
}
printf("\n");
}
dwRecv = sizeof(pbRecv);
lReturn = SCardTransmit(hCardHandle, SCARD_PCI_T1, cmdAuthBlock01, sizeof(cmdAuthBlock01), NULL, pbRecv, &dwRecv);
if ( SCARD_S_SUCCESS != lReturn ) {
printf("Failed SCardTransmit\n");
} else {
printf("Success SCardTransmit\n");
printf("Read %u bytes\n", dwRecv);
for(byte i=0;i<dwRecv;i++) {
printf("%x ", pbRecv[i]);
}
printf("\n");
}
dwRecv = sizeof(pbRecv);
lReturn = SCardTransmit(hCardHandle, SCARD_PCI_T1, cmdReadBlock01, sizeof(cmdReadBlock01), NULL, pbRecv, &dwRecv);
if ( SCARD_S_SUCCESS != lReturn ) {
printf("Failed SCardTransmit\n");
} else {
printf("Success SCardTransmit\n");
printf("Read %u bytes\n", dwRecv);
for(byte i=0;i<dwRecv;i++) {
printf("%x ", pbRecv[i]);
}
printf("\n");
}
}
lReturn = SCardDisconnect(hCardHandle, SCARD_LEAVE_CARD);
if ( SCARD_S_SUCCESS != lReturn ) {
printf("Failed SCardDisconnect\n");
} else {
printf("Success SCardDisconnect\n");
}
system("pause");
return 0;
}
Can anyone explain why I got 63 00?
Thanks.
AFAIR, your read command has to be "0xFF, 0xB0, 0x00, BLOCK, 0x10". You request a length of 0x0F, which is decimal 15, but you have to read 16 bytes, which is 0x10.
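A small sketch of the read command this answer describes, in the same C style as the question. build_read_block_apdu is a hypothetical helper; the byte layout mirrors the question's cmdReadBlock01, with Le changed to 0x10 (16 bytes, one full block).

```c
#include <stddef.h>
#include <stdint.h>

#define MIFARE_BLOCK_SIZE 16 /* bytes per MIFARE Classic block */

/* Fill `apdu` with the read command FF B0 00 <block> <Le>; returns its length. */
static size_t build_read_block_apdu(uint8_t apdu[5], uint8_t block)
{
    apdu[0] = 0xFF;              /* CLA */
    apdu[1] = 0xB0;              /* INS: read binary */
    apdu[2] = 0x00;              /* P1 */
    apdu[3] = block;             /* P2: block number */
    apdu[4] = MIFARE_BLOCK_SIZE; /* Le: 0x10 = 16 bytes */
    return 5;
}
```

The result can be passed to SCardTransmit in place of cmdReadBlock01, with the returned length as cbSendLength.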
Hope this helps
A MIFARE Classic 1K tag has 16 sectors; each sector contains 4 blocks, and each block contains 16 bytes.
Sector 0 contains Block (0,1,2,3)
Sector 1 contains Block (4,5,6,7)
Sector 2 contains Block (8,9,10,11)
Sector 3 contains Block (12,13,14,15)....
Before reading from or writing to a block, you must authenticate its sector using key A or key B of that sector. Once authentication is complete, you can read or write.
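For a MIFARE Classic 1K card the block-to-sector rule above is plain integer division by 4. A sketch with hypothetical helper names, useful for checking that the block you authenticate and the block you read lie in the same sector (the sector's access bits must still permit the operation).

```c
#include <stdbool.h>

#define BLOCKS_PER_SECTOR 4u /* MIFARE Classic 1K: 16 sectors x 4 blocks */

/* Sector that contains a given block (0..63). */
static unsigned sector_of_block(unsigned block)
{
    return block / BLOCKS_PER_SECTOR;
}

/* First block of a sector, e.g. sector 1 -> block 4. */
static unsigned first_block_of_sector(unsigned sector)
{
    return sector * BLOCKS_PER_SECTOR;
}

/* True if both blocks are in the same sector, so one authentication covers both. */
static bool same_sector(unsigned auth_block, unsigned data_block)
{
    return sector_of_block(auth_block) == sector_of_block(data_block);
}
```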
Using this command, you can authenticate sector 0 with key A (0x60):
byte[] authenticationByte = new byte[] { (byte) 0xFF, (byte) 0x86, (byte) 0x00,
(byte) 0x00, (byte) 0x05, (byte) 0x01, (byte) 0x00, (byte) 0x00,
(byte) 0x60, (byte) 0x00 };
When authentication succeeds, you will get 90 00, which is the success status. Otherwise the response is 63 00, which means authentication failed. Once sector 0 is authenticated, you can read blocks 0, 1, 2 and 3, because sector 0 contains those four blocks.
Your problem here is that you are authenticating sector 1 but trying to read data from sector 0's blocks.
For more details you can read this Answer.
Sorry for bad English
