Understanding KMemLeak backtrace - memory-management

I would like a better grasp of all the details shown in KMemLeak's output. Take this trace from the wiki, for example:
unreferenced object 0xffff89862ca702e8 (size 32):
comm "modprobe", pid 2088, jiffies 4294680594 (age 375.486s)
hex dump (first 32 bytes):
6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
backtrace:
[<00000000e0a73ec7>] 0xffffffffc01d2036
[<000000000c5d2a46>] do_one_initcall+0x41/0x1df
[<0000000046db7e0a>] do_init_module+0x55/0x200
[<00000000542b9814>] load_module+0x203c/0x2480
[<00000000c2850256>] __do_sys_finit_module+0xba/0xe0
[<000000006564e7ef>] do_syscall_64+0x43/0x110
[<000000007c873fa6>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Are the addresses on the left the locations in the code segment where the calls to the functions in the trace take place?
What does the +0x41/0x1df refer to?
Why is the function name sometimes missing (e.g. 0xffffffffc01d2036)?
I also noticed that some lines end with square brackets. What do they mean, and when are they used?
Can KMemLeak detect any other kinds of issues besides "unreferenced object"?
Is it possible to add to those traces the paths of the files in which the functions appear (similar to how Valgrind's Memcheck traces look)?
Given two traces, when trying to determine whether they point to the same root cause, how can the age, the PID, the hex dump, or the object's size and address come in handy? That is, how can anything besides the backtrace help in better understanding a possible issue?
Thanks.

Related

calculate hash of binary file containing certain bytes

I'm having trouble understanding how to "manually" calculate the hash (SHA-256) of a file that consists of certain bytes.
To put into an example:
I have this binary file consisting of these bytes.
2C F2 BA A3 0E 26 5A 3B 2A 1F 01 4A 01 66 60 02
How to get following (correct) hash of the file? ea3cbd30dc6c18914d2cdafdd8bec0ff4ce5995c7b484cce3237900336abb574
1. Convert all bytes to ASCII.
2. Hash the ASCII string to get the correct hash of the file.
Doing this manually is not recommended, since e.g. copy and paste, or other factors can easily distort your ASCII string. So optimally this is written within a program to calculate everything altogether.
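A minimal sketch in Python of the "write a program" route, assuming the 16 bytes listed in the question are the entire file content and that hashing the file's raw bytes is what the steps above amount to. It uses only the standard hashlib module; the helper names are made up for illustration, and whether the result equals the digest quoted in the question has not been verified here.

```python
import hashlib

# The bytes listed in the question, written as a hex string.
HEX_BYTES = "2C F2 BA A3 0E 26 5A 3B 2A 1F 01 4A 01 66 60 02"


def sha256_of_hex_bytes(hex_text: str) -> str:
    """Turn a hex listing into raw bytes and return their SHA-256 hex digest."""
    raw = bytes.fromhex(hex_text)  # spaces between byte pairs are skipped
    return hashlib.sha256(raw).hexdigest()


def sha256_of_file(path: str) -> str:
    """Hash a file's raw content in chunks, without any text conversion."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    print(sha256_of_hex_bytes(HEX_BYTES))
```

Opening the file in binary mode ("rb") avoids the copy-and-paste distortions mentioned above, because no character encoding is involved at any point.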

Algo to find redundant data in a file

I have a binary file in which one record is repeated multiple times. The file consists only of this record, repeated some number of times.
I don't know the size of the record. What is the best algorithm to extract the record and find out how many times it is repeated?
For example suppose I have a file with following memory representation in hex. (ignore file headers and all stuff)
3F 5C BA 3F 5C BA 3F 5C BA 3F 5C BA 3F 5C BA 3F 5C BA 3F 5C BA 3F 5C
BA 3F 5C BA 3F 5C BA 3F 5C BA 3F 5C BA 3F 5C BA 3F 5C BA 3F 5C BA 3F
5C BA
So here my record is 3F 5C BA (3 bytes), and it is repeated 15 times.
How can I get these values (the size of the record and the number of times it is repeated)? This can be done using Rabin-Karp, but is there a better, more efficient way to do it?
One possibility is to take the size of the file and factor it. For example, if the file size was 1280, then you know that the record size is one of the following:
1,2,4,5,8,10,16,20,32,40,64,80,128,160,256,320,640,1280
You could then test each of those assumptions until you find a match or exhaust the possibilities.
Of course, this assumes that the file is not truncated or otherwise corrupted.
That's probably not the most efficient way to do it, but it's quick to code and could work quite fast enough for your purposes. It rather depends on how large your files are and how often you'll want to do this. Sometimes the brute force solution is the right solution, even if it's not the "best" solution.
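The factoring approach can be sketched in Python roughly as follows. The function names are made up for illustration, and the code assumes, as noted above, that the file holds whole records only (no header, no truncation).

```python
def divisors(n: int) -> list[int]:
    """All positive divisors of n in ascending order."""
    small, large = [], []
    d = 1
    while d * d <= n:
        if n % d == 0:
            small.append(d)
            if d != n // d:
                large.append(n // d)
        d += 1
    return small + large[::-1]


def find_record_by_factoring(data: bytes) -> tuple[int, int]:
    """Return (record_size, repeat_count) for the shortest repeating record.

    Assumes data is non-empty and consists of whole records only.
    """
    n = len(data)
    for size in divisors(n):
        record = data[:size]
        # Candidate sizes are tried smallest first, so the first hit is the
        # shortest record. size == n always matches (one big record).
        if record * (n // size) == data:
            return size, n // size
    raise ValueError("empty input")
```

For a 1280-byte file, divisors(1280) yields exactly the 18 candidate sizes listed above, so at most 18 comparisons of the whole file are needed.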
You can look at suffix trees: insert all suffixes of your string into a suffix tree, count the number of times each substring occurs, then traverse the tree to find your answer.
1. Start with the assumption that the length l of your record is 1.
2. Check if your assumption is correct by comparing all subsequent blocks of size l. Stop as soon as you find a mismatch.
3. If no mismatch is found, you are finished. RETURN.
4. Search for the next occurrence of the block with length l. This gives you another candidate record length. If the next matching block starts at index i (zero based), set l = i and go to step 2.
If you know that there is always a solution, you might be able to speed up step 2 a bit. If you checked 50% of the data, you can stop.
Note: This answer assumes that you are looking for the shortest possible record. If all your bytes are, for instance, FF, you could find a lot of other solutions than l=1 (e.g. only one big record).
Example: Start with a record of size 1, in your case 3F. Then check whether this is the complete record by checking whether all subsequent bytes are 3F as well. You can stop with the next byte because it differs. Now look for the next 3F. It occurs at index 3 (zero based). Now you know your record is at least 3 bytes long. Assume your record is 3 bytes long. Check if all subsequent three byte blocks match your record. Done!
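The step-by-step search described in this answer could look something like the Python sketch below. The function names are illustrative. Two details are assumptions not spelled out in the answer: the search for the next occurrence starts just past the current candidate length, so l always grows, and a candidate length that does not divide the file size is treated as a mismatch.

```python
def is_record_length(data: bytes, l: int) -> bool:
    """Check whether data is made up entirely of copies of its first l bytes."""
    if len(data) % l != 0:
        return False
    record = data[:l]
    for start in range(l, len(data), l):
        if data[start:start + l] != record:
            return False  # stop at the first mismatch
    return True


def find_shortest_record(data: bytes) -> tuple[int, int]:
    """Return (record_length, repeat_count) for the shortest repeating record."""
    if not data:
        raise ValueError("empty input")
    l = 1
    while not is_record_length(data, l):
        # Look for the next place where the current candidate block reappears;
        # its index is the next candidate record length.
        i = data.find(data[:l], l + 1)
        l = i if i != -1 else len(data)  # no reoccurrence: one big record
    return l, len(data) // l
```

With the record 3F 5C BA repeated several times, the loop rejects l = 1 at the second byte, finds the next 3F at index 3, and accepts l = 3, which mirrors the walkthrough above.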

Finding the Encryption or Hashing method used

I am trying to find out what algorithm the client application is using to return the session key.
When I initiate a connection, the server first sends a unique session key. The client then has to respond with an encrypted or hashed password, sent together with the username.
Sample network trace between client and server: (username: serv1ce / password: test12)
App received from Server << 52 d7 1c 3f 9f 2c 05 c9 (one time session key)
App sent to Server >> 11 83 2d 7d ff 0c 51 8c 53 45 52 56 31 43 45 20
The "53 45 52 56 31 43 45 20" part is the username in clear text as bytes values (serv1ce).
Does anyone know how the bytes "11 83 2d 7d ff 0c 51 8c" have been created with the password 'test12' and the 64bit (8bytes) session key "52 d7 1c 3f 9f 2c 05 c9" ?
If they are using a cryptographically secure hash, then in principle from input and output you should not be able to discover this.
In practice they are returning 8 bytes, which is 64 bits, which suggests that they are using a truncated digest, possibly some variant of MD5 (a full MD5 digest is 128 bits). If they follow typical practice they are likely to have created a string somehow which includes some combination of the username, password, session key, and a secret hash, then hashed it. (Note that I said typical practice, not best practice. The best practice is to use something slow to calculate for this purpose, such as bcrypt.) If you figure out the magic combination, you have the answer.
You have two decent approaches. The simplest is brute force search. If you search for md5 gpu cracking you can find plenty of tools that let you offload MD5 calculations to your video card. These are ideal for brute force search, and can let you try an astonishing number of variations on the above theme quite quickly. (The feasibility of this attack is why people should use bcrypt for this sort of stuff.)
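As a rough CPU-only illustration of the brute force idea, the Python sketch below tries every ordering of the known values, hashes each with a few common algorithms, and checks whether the 8 observed bytes appear at the start or end of the digest. Everything about the candidate space is an assumption made for illustration: the choice of algorithms, plain concatenation, and truncation to the first or last 8 bytes. Nothing here is known to match what the application really does, and a miss only rules out these particular guesses.

```python
import hashlib
from itertools import permutations


def candidate_inputs(parts):
    """Yield (label, bytes) for every ordering of every non-empty subset of parts."""
    names = list(parts)
    for r in range(1, len(names) + 1):
        for combo in permutations(names, r):
            yield "+".join(combo), b"".join(parts[name] for name in combo)


def search(parts, target, algorithms=("md5", "sha1", "sha256")):
    """Return a description of every tried construction whose digest
    starts or ends with target."""
    hits = []
    for label, blob in candidate_inputs(parts):
        for algo in algorithms:
            digest = hashlib.new(algo, blob).digest()
            if digest[:len(target)] == target:
                hits.append(f"{algo}({label}) first {len(target)} bytes")
            if digest[-len(target):] == target:
                hits.append(f"{algo}({label}) last {len(target)} bytes")
    return hits


if __name__ == "__main__":
    # Values taken from the network trace in the question.
    parts = {
        "password": b"test12",
        "session_key": bytes.fromhex("52d71c3f9f2c05c9"),
        "username": bytes.fromhex("5345525631434520"),
    }
    target = bytes.fromhex("11832d7dff0c518c")
    print(search(parts, target))
```

Extending the search (case variations of the password, separators, other truncations, an unknown secret) only means adding entries to parts or more checks inside search, which is where the GPU tools mentioned above become useful.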
The other is that you have the application. There are various ways to trace what actually happens inside of the application as it is doing that computation. Succeed in figuring out that, and you'll have the answer.

output only blocks of size n with offset of multiple of stride k from start of binary input in shell

Given a block size of n and another size k, I am looking for a way to output only the blocks whose offset from the start of the input is a multiple of k.
Imagine a file consisting of a number of 4-tuples of 2-byte data. Given this input, I want only the first entry of each tuple.
example input:
00 00 11 11 22 22 33 33
44 44 55 55 66 66 77 77
88 88 99 99 aa aa bb bb
cc cc dd dd ee ee ff ff
example output with n=2 and k=8:
00 00 44 44 88 88 cc cc
which is only the first "column" of the input.
Now while it would be simple to do this in perl, python, I need this functionality in a shell script as the target system does not have perl or python but only basic utilities. I'm hoping there is a way to misuse an existing tool for that. If it is not possible I would write some C doing that but I would like to avoid it.
One usecase would be to extract one audio channel from a raw audio file.
A term you might search for (other than "zebra stripes") is "stride." That's what some people call this idea of skipping k bytes each time.
It's not entirely clear from your post, but it looks like you actually want to be able to insert this filter in a pipeline and have it consume raw bytes and output the same. If this is the case, I'm not sure how it can be done easily in plain shell script, so would suggest you either hunker down and write it in C, or get Python or something installed on the target system.
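If Python does end up on the target system, the filter is only a few lines. The sketch below is one possible shape for it, with a made-up function name; it reads k bytes at a time from standard input and writes the first n bytes of each group to standard output, so it can sit in a pipeline.

```python
import sys


def strided_blocks(stream, n, k):
    """Yield the first n bytes of every k-byte group read from stream."""
    if not 0 < n <= k:
        raise ValueError("need 0 < n <= k")
    while True:
        group = stream.read(k)
        if not group:
            break
        # A short final group (truncated input) is passed through as far as it goes.
        yield group[:n]


# Run as a filter only when exactly two arguments (n and k) are given.
if __name__ == "__main__" and len(sys.argv) == 3:
    n, k = int(sys.argv[1]), int(sys.argv[2])
    for block in strided_blocks(sys.stdin.buffer, n, k):
        sys.stdout.buffer.write(block)
```

Saved under a hypothetical name such as stride.py, it would be invoked as `python3 stride.py 2 8 < input.raw > column0.raw` for the example in the question.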

Searching Binary Data in Ruby

Using only pure ruby (or justifiably commonplace gems) is there an efficient way to search a large binary document for a specific string of bytes?
Deeper context: the MPEG-4 container format is a 4-byte indexed serialised data structure. Without having to parse the structure fully (I can assume it is valid), I want to pull out specific tags.
For those of you that haven't come across this 'dmap' serialization before it works something like this:
<4-byte length><4-byte tag><4-byte length><4-byte type definition><8 bytes of something I can't remember><data>
eg, this defines the 'tvsh' (or TV Show) tag as being 'Futurama'
00 00 00 20 ...
74 76 73 68 tvsh
00 00 00 18 ....
64 61 74 61 data
00 00 00 01 ....
00 00 00 00 ....
46 75 74 75 Futu
72 61 6D 61 rama
The exact structure isn't really important, I'd like to write a method which can pull out the show name when I give it 'tvsh' or that it's season 2 if I give it 'tvsn'.
My first plan would be to use Regular Expressions, but I get the (unjustified) feeling that this would be slow.
Let me know your thoughts! Thanks in advance
In Ruby you can use the /n flag when creating your regex to tell Ruby that your input is 8-bit data.
You could use /(.{4})tvsh(.{4})data(.{8})([\x20-\x7F]+)/n to match 4 bytes, tvsh, 4 bytes, data, 8 bytes, and any number of ASCII characters. Note that . does not match a newline byte (0x0A) unless you also add the m flag, which matters when the length fields are arbitrary binary data. I don't see any reason why this regex would be significantly slower to execute than hand-coding a similar search. If you don't care about the 4-byte and 8-byte blocks, /tvsh.{4}data.{8}([\x20-\x7F]+)/n should be nearly as fast as a literal text search for tvsh.
If I understand your description correctly, the whole file consists of a number of such "blocks" of a fixed structure?
In that case, I suggest scanning them one by one and skipping the ones not of interest to you. Each step should do the following:
1. Read 8 bytes (using IO#read, IO#readbytes on old Ruby versions, or a similar method).
2. From the header you just read, extract the size (first 4 bytes) and the tag (next 4 bytes).
3. If the tag is the one you need, skip the following 16 bytes and read size-24 bytes.
4. If the tag is not of interest, skip the rest of the block, i.e. the following size-8 bytes (the 8 header bytes have already been read).
5. Repeat.
For skipping bytes, you can use IO#seek.
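The scan-and-skip loop is sketched here in Python rather than Ruby; read and seek on a Python file object play the role of IO#read and IO#seek. The layout is an assumption taken from the example bytes in the question: a big-endian 4-byte size that counts the whole block including its 8-byte header, a 4-byte tag, 16 bytes of inner header, then the payload. Under that assumption an uninteresting block is skipped by seeking size - 8 bytes past its header. The function name is made up for illustration.

```python
import io
import struct


def find_tag(stream, wanted: bytes):
    """Return the payload of the first block tagged `wanted`, or None.

    Assumed layout per block: 4-byte big-endian size (whole block),
    4-byte tag, 16 bytes of inner header, then the payload.
    """
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return None  # end of input, tag not found
        size, tag = struct.unpack(">I4s", header)
        if size < 8:
            return None  # malformed size; stop instead of looping forever
        if tag == wanted:
            stream.seek(16, io.SEEK_CUR)  # skip inner length, 'data', 8 bytes
            return stream.read(max(size - 24, 0))
        stream.seek(size - 8, io.SEEK_CUR)  # skip the rest of this block
```

Called as `find_tag(open(path, "rb"), b"tvsh")` on a file laid out like the example, this would return the show name as bytes; real files with nested containers would need the scan to descend into the enclosing blocks first.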
Theoretically you can use regexes against any arbitrary data, including binary strings. HTH.

Resources