Linux kernel image strings extraction - linux-kernel

I'm trying to extract the strings from a binary Linux kernel image
(this phenomenon happens in all types of images I've tried: bzImage, vmlinuz, vmlinux, .... and not in a specific one)
Simply running 'strings' on the image prints many strings with a prefix character, for example:
"4netlink: %d bytes leftover after parsing attributes in process `%s'."
However, looking at the kernel sources, the corresponding string should not include the "4" prefix.
Opening the file in a hex editor, I've seen that the string is actually preceded by:
'\x00\x01' and only then '\x34' ("4")
My guess is this is some kind of pointer to a special section, or something of the sort,
because many other strings include "3" and other numbers (and even characters).
I would appreciate any information on the matter.
Thanks!

The prefixes OP is seeing are KERN_<LEVEL> prefixes. These are special string literals that are prepended to the main printk format string, using C's concatenation of adjacent string literals. For example:
printk(KERN_ERR "Something has gone wrong!\n");
From kernel version 3.6 onwards, these KERN_<LEVEL> prefix macros are defined in "include/linux/kern_levels.h" and begin with the ASCII SOH character "\001" followed by the log level as an ASCII digit for the numeric levels, or some other ASCII character for special meanings. The string for KERN_DEFAULT changed from "\001" "d" to "" (empty string) in kernel version 5.1. The string for KERN_CONT changed from "" (empty string) to "\001" "c" in kernel version 4.9.
From kernel version 2.6.37 to 3.5.x, the KERN_<LEVEL> prefix macros were defined in "include/linux/printk.h" and used a different format with the level specified between angle brackets, for example KERN_WARNING was defined as "<4>", KERN_DEFAULT was defined as "<d>", and KERN_CONT was defined as "<c>".
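For reference, the modern (3.6+) definitions look roughly like this (abridged from "include/linux/kern_levels.h"; check the header in your own kernel tree for the exact contents, since KERN_DEFAULT and KERN_CONT changed in later versions as noted above):
#define KERN_SOH      "\001"          /* ASCII start-of-header */
#define KERN_EMERG    KERN_SOH "0"    /* system is unusable */
#define KERN_ALERT    KERN_SOH "1"    /* action must be taken immediately */
#define KERN_CRIT     KERN_SOH "2"    /* critical conditions */
#define KERN_ERR      KERN_SOH "3"    /* error conditions */
#define KERN_WARNING  KERN_SOH "4"    /* warning conditions */
#define KERN_NOTICE   KERN_SOH "5"    /* normal but significant condition */
#define KERN_INFO     KERN_SOH "6"    /* informational */
#define KERN_DEBUG    KERN_SOH "7"    /* debug-level messages */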
Besides printk, there are other macros for generating kernel logs, some of which specify the KERN_<LEVEL> part implicitly. OP's example from "lib/nlattr.c":
pr_warn_ratelimited("netlink: %d bytes leftover after parsing attributes in process `%s'.\n",
rem, current->comm);
Here, the pr_warn_ratelimited macro is defined in "include/linux/printk.h" as:
#define pr_warn_ratelimited(fmt, ...) \
printk_ratelimited(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
There is a lot going on there, but pr_fmt(fmt) is one or more string literals including the fmt macro parameter, so the string passed to printk_ratelimited is constructed from concatenated string literals beginning with those from the expansion of KERN_WARNING.
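To connect this back to the strings(1) output in the question, here is a small stand-alone sketch (ordinary userspace code, not kernel code; the two macro definitions mirror the 3.6+ kernel values, and adjacent string literal concatenation behaves the same in C and C++):
// Minimal sketch of how the adjacent string literals concatenate.
#include <cstdio>

#define KERN_SOH     "\001"
#define KERN_WARNING KERN_SOH "4"

int main()
{
    const char fmt[] = KERN_WARNING "netlink: %d bytes leftover after parsing attributes in process `%s'.\n";
    // fmt[0] is the unprintable SOH byte (1), fmt[1] is the ASCII digit '4'.
    // strings(1) skips the SOH and prints from the '4' onwards, which is
    // exactly the "4netlink: ..." line seen in the question.
    std::printf("first byte: %d, second byte: %c\n", fmt[0], fmt[1]);
    return 0;
}
The '\x00' seen in the hex editor just before the '\x01' is simply the NUL terminator of the previous string in the image, not part of this one.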

Related

What exactly does KERN_INFO expand to, and where is it implemented?

In this question: Why doesn't the function printk() use a comma to separate parameters?, someone said KERN_INFO expands to "\001" "6". I know the first \0 is the null character, but then what is the 01? I suppose it to be one in octal. When the preprocessor concatenates it together into "\0016", the rest after the null is 016, which is 14 in decimal. So I looked it up in ASCII and found it to be 0E SO (shift out)? That doesn't make sense to me, and it should have something to do with logging (as that is the purpose of printk). So what is the meaning of the KERN_INFO macro sequence after expansion?
Also, I tried to look in the source, in /usr/include/linux/kernel.h, but didn't find the macro there. So is it in kernel.h or somewhere else?
"\001" "6" is two string literals that will be concatenated (with any other adjacent string literals) into a single string literal. (The concatenation is done at translation phase 6 as defined in the C standard.)
The first of those string literals, "\001" contains a single octal escape sequence, defining a single character. An octal escape sequence in a string literal or a character constant consists of the backslash (\) followed by from 1 to 3 octal digits (001 in this case). In this case, the single character has numeric code 1, which corresponds to the ASCII SOH (start of heading) character.
The string literal "\0016" contains sequences for two characters '\001' and '6', because an octal escape sequence is always terminated after at most 3 octal digits.
Escape sequences do not cross the boundary between adjacent string literals. (Escape sequences are expanded at translation phase 3, so are already expanded before adjacent string literals are concatenated at translation phase 6). Therefore, the pair of string literals "\1" "6" is equivalent (after concatenation) to the single string literal "\0016", not "\16".
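A quick way to check this (the snippet is my own illustration; the behaviour is identical in C and C++):
// "\001" "6" concatenates to a two-character string,
// whereas "\16" is a single character with value 14 (0x0E).
#include <cstdio>
#include <cstring>

int main()
{
    const char a[] = "\001" "6";  // two chars: 0x01 and '6'
    const char b[] = "\16";       // one char: octal 16 = decimal 14
    std::printf("a: length %zu, bytes %d %d\n", std::strlen(a), a[0], a[1]); // 2, 1 54
    std::printf("b: length %zu, byte %d\n", std::strlen(b), b[0]);           // 1, 14
    return 0;
}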
As mentioned by @Peter L., the KERN_INFO macro and other "kernel level" macros are defined in "include/linux/kern_levels.h" in the Linux kernel source. Actually, that has been true since kernel version 3.6. Before kernel version 3.6, they were defined in "include/linux/printk.h" and used a different string format with the kernel level number specified between angle brackets (for example, KERN_INFO used to be defined as "<6>").
The purpose of these kernel level macros is to prefix the format string parameter of the printk function with special codes to designate the log-level to use for the message written to the kernel log (apart from KERN_CONT which specifies that the message is to be appended to the previous message).

How can I create a file with null bytes in the filename?

For a security test, I need to pass a file that contains null characters in its content and its filename.
For the body content, it's easy to use printf:
$ printf "Hello\00, Null!" > containsnull.txt
$ xxd containsnull.txt
0000000: 4865 6c6c 6f00 2c20 4e75 6c6c 21 Hello., Null!
But how can I create a file with the null bytes in the name?
Note: A solution in bash, python or nodejs is preferred if possible
It's impossible to create a file name containing a null byte through POSIX or Windows APIs. On all the Unix systems that I'm aware of, it's impossible to create a file name containing a null byte, even with an ill-behaved application that bypasses the normal API, because the kernel itself treats all of its file name inputs as null-terminated strings. I believe this is true on Windows as well but I'm not completely sure.
As an application programmer, in terms of security, this means that you don't need to worry about a file name containing null bytes, if you're sure that what you have is the name of a file. On the other hand, if you're given a string and told to use it as a file name, for example if you're programming a server and letting the client choose file names, you need to ensure that this string does not contain null bytes. That's just one requirement among others including the string length, the presence of a directory separator (/ or \), reserved names (. and .., reserved Windows file names such as nul.txt or prn), etc. On most Unix systems, on their native filesystem, the constraints for a file name are: no null byte or slash, length between 1 and some maximum, and the two names . and .. are reserved. Windows, and non-native filesystems on Unix, have additional constraints (it's possible to put a / in a file name through direct kernel calls on Windows).
To put a null byte in a file's contents, just write a string to a file using any language that allows null bytes in strings. In bash, you cannot store a null byte in a string, so you need to use another method such as printf '\0' or echo "abc" | tr b '\0'.
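As a concrete illustration of the "null-terminated name" point above, here is a small sketch (C++ rather than the bash/python/nodejs the question asks for, but every language ends up at the same C-string interface; the file name is made up):
#include <cstdio>
#include <string>

int main()
{
    // The std::string happily holds an embedded NUL byte...
    std::string name("contains\0null.txt", 17);
    // ...but fopen() receives a NUL-terminated char*, so the kernel only
    // ever sees the name "contains"; the "\0null.txt" part is ignored.
    if (std::FILE *f = std::fopen(name.c_str(), "w")) {
        std::fputs("Hello\0, Null!", f);  // note: fputs also stops at the NUL
        std::fclose(f);
    }
    return 0;
}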
You don't have to worry about file names containing null bytes on Unixes and Windows because they cannot.
However, a file name that is being treated as UTF-8 can specify the NUL character (U+0000) using invalid "overlong" sequences: two-, three- or four-byte UTF-8 sequences that have all zeros in their code point payload bits.
This can be a security issue. For instance, a UTF-8 decoder that doesn't check for this can end up generating a wchar_t character value of 0, which then unexpectedly terminates the wide character string.
For instance, the byte sequence C0 80 is an overlong encoding for NUL. This is evidently used by something called "Modified UTF-8" specifically for the purpose of encoding NUL characters that don't terminate the C string being used to hold the UTF-8 data.
If you're doing security testing, this is relevant; you can test whether programs are susceptible to NUL character (and other) injection via overlong encodings.
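As a concrete illustration (my own sketch; the bit arithmetic is just the standard UTF-8 layout for a two-byte sequence, not any particular library's API):
#include <cstdio>

int main()
{
    unsigned char seq[] = { 0xC0, 0x80 };  // overlong two-byte encoding of NUL
    // Naive decode of a two-byte sequence: 110xxxxx 10yyyyyy -> xxxxxyyyyyy
    unsigned cp = ((seq[0] & 0x1Fu) << 6) | (seq[1] & 0x3Fu);
    std::printf("naively decoded code point: U+%04X\n", cp);  // prints U+0000
    // A conforming decoder must reject this: any two-byte sequence that
    // decodes below U+0080 is overlong, and overlong forms are invalid UTF-8.
    if (cp < 0x80)
        std::puts("overlong sequence: a strict decoder rejects it");
    return 0;
}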
Try $'\u000d'
Not actually a null byte, but probably close enough to confuse people since you have to look really close to see that the last character is a D and not a 0, as it will usually print (if not just a blank) as the little box with hex codes in it.
Discovered this when I found a directory in my $HOME, named that...

Significance of an ampersand in VB6 function name?

I just got a bunch of legacy VB6 (!) code dumped on me and I keep seeing functions declared with an ampersand at the end of the name, for example, Private Declare Function ShellExecute& ...
I've been unable to find an answer to the significance of this, nor have I been able to detect any pattern in use or signature of the functions that have been named thusly.
Anyone know if those trailing ampersands mean anything to the compiler, or at least if there's some convention that I'm missing? So far, I'm writing it off as a strange programmer, but I'd like to know for sure if there's any meaning behind it.
It means that the function returns a Long (i.e. 32-bit integer) value.
It is equivalent to
Declare Function ShellExecute(...) As Long
The full list of suffixes is as follows:
Integer   %
Long      &
Single    !
Double    #
Currency  @
String    $
As Philip Sheard has said, it is a type-declaration character for a Long. They are still present in .NET; see this MSDN link and this VB6 article.
From the second article:
The rules for forming a valid VB variable name are as follows:
(1) The first character must be a letter A through Z (uppercase or
lowercase letters may be used). Succeeding characters can be letters,
digits, or the underscore (_) character (no spaces or other characters
allowed).
(2) The final character can be a "type-declaration character". Only
some of the variable types can use them, as shown below:
Data Type    Type-Declaration Character
String       $
Integer      %
Long         &
Single       !
Double       #
Currency     @
Use of type-declaration
characters in VB is not encouraged; the modern style is to use the
"As" clause in a data declaration statement.

Allowed characters in map key identifier in YAML?

Which characters are and are not allowed in a key (i.e. example in example: "Value") in YAML?
The YAML 1.2 specification simply advises using printable characters, with explicit control characters being excluded (see here):
In constructing key names, characters the YAML spec uses to denote syntax or special meaning need to be avoided (e.g. # denotes comment, > denotes folding, - denotes list, etc.).
Essentially, you are left to the relative coding conventions (restrictions) by whatever code (parser/tool implementation) that needs to consume your YAML document. The more you stick with alphanumerics the better; it has simply been our experience that the underscore has worked with most tooling we have encountered.
It has been a shared practice with others we work with to convert the period character . to an underscore character _ when mapping namespace syntax that uses periods to YAML. Some people have similarly used hyphens successfully, but we have seen it misconstrued in some implementations.
Any character (if properly quoted by either single quotes 'example' or double quotes "example"). Please be aware that the key does not have to be a scalar ('example'). It can be a list or a map.
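A few illustrative cases (made-up keys; any YAML 1.2 parser should accept this document, though some language bindings refuse to load non-scalar keys like the last one):
plain-key: no quoting needed
"key with spaces, # and :": needs double quotes
'single quoted @ key': also fine
? [a, complex, key]
: the key above is a sequence, not a scalar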

User-defined literals suffix, with "_digit..."?

A user-defined literal suffix in C++0x should be an identifier that
starts with _ (underscore) (17.6.4.3.5) and
should not begin with _ followed by an uppercase letter (17.6.4.3.2):
Each name that [...] begins with an underscore followed by an uppercase letter is reserved to the implementation for any use.
Is there any reason why such a suffix may not start with _ followed by a digit? E.g. _4 or _3musketeers?
Musketeer dartagnan = "d'Artagnan"_3musketeers;
int num = 123123_4; // to be interpreted in base4 system?
string s = "gdDadndJdOhsl2"_64; // base64decoder
The precedent for identifiers of the form _<number> is the function argument placeholder object mechanism in std::placeholders (§20.8.9.1.3), which defines an implementation-defined number of such symbols.
This is a good thing, because it means the user cannot #define any identifier of that form. §17.6.4.3.1/1:
A translation unit that includes a standard library header shall not #define or #undef names declared in any standard library header.
The name of the user-defined literal function is operator "" _123, not simply _123, so there is no direct conflict between your name and the library name, even in the presence of using namespace std::placeholders;.
My 2¢, though, is that you would be better off with an operator "" _baseconv and encoding the base within the literal, "123123_4"_baseconv.
Edit: Looking at Johannes' (deleted) answer, there may be concern that _123 could be used as a macro by the implementation. This is certainly in the realm of theory, as the implementation would have little to gain by such preprocessor use. Furthermore, if I'm not mistaken, the reason for hiding these symbols in std::placeholders, not std itself, is that such names are more likely to be used by the user, such as by inclusion of Boost Bind (which does not hide them inside a named namespace).
The tokens are not reserved for use by the implementation globally (17.6.4.3.2), and there is precedent for their use, so they are at least as safe as, say, forward.
"can" vs "may".
can denotes ability where may denotes permission.
Is there a reason why you would not have permission to the start a user-defined literal suffix with _ followed by a digit?
Permission implies coding standards or best practices. The examples you provide seem to show that _\d would be a fine suffix if used correctly (to denote a numeric base). Unfortunately your question can't have a well-thought-out answer, as no one has experience with this new language feature yet.
Just to be clear user-defined literal suffixes can start with _\d.
An underscore followed by a digit is a legal user-defined literal suffix.
The function signature would be:
operator"" _4();
so it couldn;t get eaten by a placeholder.
The literal would be a single preprocessor token:
123123_4;
so the _4 would not get clobbered by a placeholder or a preprocessor symbol.
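For completeness, a minimal compilable sketch of the base-4 idea from the question (the _4 suffix and its base-4 meaning are the asker's hypothetical, not anything standard; this uses the raw literal operator form, which receives the digit characters as a string):
#include <cstdio>
#include <stdexcept>

// Interpret the digit sequence of the literal as a base-4 number.
unsigned long long operator"" _4(const char *digits)
{
    unsigned long long value = 0;
    for (const char *p = digits; *p != '\0'; ++p) {
        if (*p < '0' || *p > '3')
            throw std::invalid_argument("not a base-4 digit");
        value = value * 4 + static_cast<unsigned long long>(*p - '0');
    }
    return value;
}

int main()
{
    unsigned long long num = 123123_4;   // 123123 in base 4
    std::printf("%llu\n", num);          // prints 1755
    return 0;
}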
My reading of 17.6.4.3.5 is that suffixes not containing a leading underscore risk collision with the implementation or future library additions. They also collide with existing suffixes: F, L, ULL, etc. One of the rationales for user-defined literals is that a new type (such as decimals, for example) could be defined as a pure library extension, including literals with suffixes d, df, dl.
Then there's the question of style and readability. Personally, I think I would lose sight of the suffix in 1234_3. Maybe, maybe not.
Finally, there was some idea that didn't make it into the standard (but which I kind of like) to have _ be a literal separator for numbers, as in Ada and Ruby. So you could have 123_456_789 to visually separate thousands, for example. Your suffix would break if that ever went through.
I knew I had some papers on this subject:
Digital Separators describes a proposal to use _ as a digit separator in numeric literals.
Ambiguity and Insecurity with User-Defined Literals describes the evolution of ideas about literal suffix naming and namespace reservation, and efforts to deconflict user-defined literals against a future digit separator.
It just doesn't look that good for the _ digit separator.
I had an idea though: how about either a backslash or a backtick for digit separator? It isn't as nice as _ but I don't think there would be any collision as long as the backslash was inside the stream of digits. The backtick has no lexical use currently that I know of.
i = 123\456\789;
j = 0xface\beef;
or
i = 123`456`789;
j = 0xface`beef;
This would leave _123 as a literal suffix.
