How to compute SHA1 of an array in Linux kernel

I'm trying to compute the SHA1 of an integer array in the Linux kernel. I have gone through crypto.c/crypto.h and security/integrity/ima/ima_crypto.c, but I can't figure out how to init and then update the SHA1 computation. Can someone point me to a tutorial or guide on how to go about doing this?

There's a pretty good introduction to the Linux cryptography API in Documentation/crypto/api-intro.txt. Also check out fs/ecryptfs/crypto.c for a real-life example of how the functions are used.
Here's a quick summary, though, to get you started:
Step 1: Declaration
Create some local variables:
struct scatterlist sg;
struct hash_desc desc;
char *plaintext = "plaintext goes here";
size_t len = strlen(plaintext);
u8 hashval[20];
A struct scatterlist is used to hold your plaintext in a format the crypto.h functions can understand, while a struct hash_desc is used to configure the hashing.
The variable plaintext holds our plaintext string, while hashval will hold the hash of our plaintext.
Finally, len holds the length of the plaintext string.
Note that while I'm using ASCII plaintext in this example, you can pass an integer array as well -- just store the total memory size in len and replace every instance of plaintext with your integer array:
int myarr[4] = { 1, 3, 3, 7 };
size_t len = sizeof(myarr);
Be careful though: an int element generally has a size greater than a byte, so storing integer values in an int array won't have the same internal representation as a char array -- you may end up with null bytes as padding in between values.
Furthermore, if your intention is to hash the ASCII representation of your integers, you will have to first convert the values in your array to a string character sequence (perhaps using sprintf).
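For instance, a minimal sketch of that conversion (the loop and buffer size are illustrative; the kernel provides snprintf):
int myarr[4] = { 1, 3, 3, 7 };
char text[64];
size_t len = 0;
int i;

/* Build the ASCII representation "1337" and hash that instead. */
for (i = 0; i < 4; i++)
    len += snprintf(text + len, sizeof(text) - len, "%d", myarr[i]);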
Step 2: Initialization
Initialize sg and desc:
sg_init_one(&sg, plaintext, len);
desc.tfm = crypto_alloc_hash("sha1", 0, CRYPTO_ALG_ASYNC);
desc.flags = 0;
Notice that "sha1" is passed to crypto_alloc_hash; this can be set to "md5" for MD5 hashing, or any other supported string in order to use the respective hashing method.
Step 3: Hashing
Now perform the hashing with three function calls:
crypto_hash_init(&desc);
crypto_hash_update(&desc, &sg, len);
crypto_hash_final(&desc, hashval);
crypto_hash_init configures the hashing engine according to the supplied struct hash_desc.
crypto_hash_update performs the actual hashing method on the plaintext.
Finally, crypto_hash_final copies the hash to a character array.
Step 4: Cleanup
Free allocated memory held by desc.tfm:
crypto_free_hash(desc.tfm);
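Putting the four steps together, here is a minimal sketch of a complete helper with the error handling the snippets above leave out (the function name is made up; this targets older kernels that still provide the hash_desc interface -- later kernels replaced it with the crypto_shash API):
#include <linux/crypto.h>
#include <linux/err.h>
#include <linux/scatterlist.h>

/* Compute the SHA1 of data[0..len) into out, which must hold 20 bytes. */
static int sha1_digest(const void *data, unsigned int len, u8 *out)
{
    struct scatterlist sg;
    struct hash_desc desc;
    int ret;

    sg_init_one(&sg, data, len);

    desc.tfm = crypto_alloc_hash("sha1", 0, CRYPTO_ALG_ASYNC);
    if (IS_ERR(desc.tfm))
        return PTR_ERR(desc.tfm);
    desc.flags = 0;

    ret = crypto_hash_init(&desc);
    if (!ret)
        ret = crypto_hash_update(&desc, &sg, len);
    if (!ret)
        ret = crypto_hash_final(&desc, out);

    crypto_free_hash(desc.tfm);
    return ret;
}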
See also
how to use CryptoAPI in the linux kernel 2.6


How to change a boost::multiprecision::cpp_int from big endian to little endian

I have a boost::multiprecision::cpp_int in big endian and have to change it to little endian. How can I do that? I tried with boost::endian::conversion but that did not work.
boost::multiprecision::cpp_int bigEndianInt("0xe35fa931a0000");
boost::multiprecision::cpp_int littleEndianInt;
littleEndianInt = boost::endian::endian_reverse(bigEndianInt);
The memory layout of Boost Multiprecision types is an implementation detail, so you cannot assume much about it anyway (they're not supposed to be bitwise serializable).
Just read a random section of the docs:
MinBits
Determines the number of Bits to store directly within the object before resorting to dynamic memory allocation. When zero, this field is determined automatically based on how many bits can be stored in union with the dynamic storage header: setting a larger value may improve performance as larger integer values will be stored internally before memory allocation is required.
It's not immediately clear that you have any chance at some level of "normal int behaviour" in memory layout. The only exception would be when MinBits==MaxBits.
Indeed, we can static_assert that the sizes of cpp_int with such backend configs match the corresponding byte sizes.
It turns out that there's even a tag in the backend base class to indicate triviality (truly promising): trivial_tag, so let's use it:
Live On Coliru
#include <boost/multiprecision/cpp_int.hpp>

namespace mp = boost::multiprecision;

template <int bits> using simple_be =
    mp::cpp_int_backend<bits, bits, mp::unsigned_magnitude>;

template <int bits> using my_int = mp::number<simple_be<bits>, mp::et_off>;

using my_int8_t   = my_int<8>;
using my_int16_t  = my_int<16>;
using my_int32_t  = my_int<32>;
using my_int64_t  = my_int<64>;
using my_int128_t = my_int<128>;
using my_int192_t = my_int<192>;
using my_int256_t = my_int<256>;

template <typename Num>
constexpr bool is_trivial_v = Num::backend_type::trivial_tag::value;

int main() {
    static_assert(sizeof(my_int8_t) == 1);
    static_assert(sizeof(my_int16_t) == 2);
    static_assert(sizeof(my_int32_t) == 4);
    static_assert(sizeof(my_int64_t) == 8);
    static_assert(sizeof(my_int128_t) == 16);

    static_assert(is_trivial_v<my_int8_t>);
    static_assert(is_trivial_v<my_int16_t>);
    static_assert(is_trivial_v<my_int32_t>);
    static_assert(is_trivial_v<my_int64_t>);
    static_assert(is_trivial_v<my_int128_t>);

    // however it doesn't scale
    static_assert(sizeof(my_int192_t) != 24);
    static_assert(sizeof(my_int256_t) != 32);
    static_assert(not is_trivial_v<my_int192_t>);
    static_assert(not is_trivial_v<my_int256_t>);
}
Concluding: you can have a trivial int representation up to a certain point, after which you get the allocator-based dynamic-limb implementation no matter what.
Note that using unsigned_packed instead of unsigned_magnitude representation never leads to a trivial backend implementation.
Note that triviality might depend on compiler/platform choices (e.g. it's likely that cpp_128_t uses some builtin compiler/standard library support on GCC).
Given this, you MIGHT be able to pull off what you wanted to do with hacks IF your backend configuration supports triviality. Sadly, I think it requires you to manually overload endian_reverse for the 128-bit case, because the GCC builtins do not include __builtin_bswap128, nor does Boost Endian define one.
I'd suggest working off the information here How to make GCC generate bswap instruction for big endian store without builtins?
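For reference, a sketch of what such an overload could delegate to: composing two 64-bit swaps, since GCC has __builtin_bswap64 but no __builtin_bswap128 (assumes a compiler with unsigned __int128 support; the function name is made up):
#include <stdint.h>

/* Byte-swap all 16 bytes: swap each 64-bit half, then exchange halves. */
static unsigned __int128 bswap128(unsigned __int128 x)
{
    uint64_t hi = (uint64_t)(x >> 64);
    uint64_t lo = (uint64_t)x;
    return ((unsigned __int128)__builtin_bswap64(lo) << 64)
           | __builtin_bswap64(hi);
}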
Final Demo (not complete)
#include <boost/multiprecision/cpp_int.hpp>
#include <boost/endian/buffers.hpp>
#include <iostream>

namespace mp = boost::multiprecision;
namespace be = boost::endian;

template <int bits> void check() {
    using T = mp::number<mp::cpp_int_backend<bits, bits, mp::unsigned_magnitude>, mp::et_off>;
    static_assert(sizeof(T) == bits / 8);
    static_assert(T::backend_type::trivial_tag::value);

    be::endian_buffer<be::order::big, T, bits, be::align::no> buf;
    buf = T("0x0102030405060708090a0b0c0d0e0f00");
    std::cout << std::hex << buf.value() << "\n";
}

int main() {
    check<128>();
}
(Changing be::order::big to be::order::native obviously makes it compile. The other way to complete it would be to have an ADL accessible overload for endian_reverse for your int type.)
This is both trivial and, in the general case, unanswerable. Let me explain:
For a general N-bit integer, where N is a large number, there is unlikely to be any well-defined byte order; indeed, even for 64- and 128-bit integers there are more than 2 possible orders in use: https://en.wikipedia.org/wiki/Endianness#Middle-endian.
On any platform, with any native endianness, you can always extract the bytes of a cpp_int; the first example here: https://www.boost.org/doc/libs/1_73_0/libs/multiprecision/doc/html/boost_multiprecision/tut/import_export.html#boost_multiprecision.tut.import_export.examples shows you how. When exporting bytes like this, they are always most significant byte first, so you can subsequently rearrange them how you wish. You should not, however, rearrange them and load them back into a cpp_int, as the class won't know what to do with the result!
If you know that the value is small enough to fit into a native integer type, then you can simply cast to the native integer and use a system API on the result. As in endian_reverse(static_cast<int64_t>(my_cpp_int)). Again, don't assign the result back into a cpp_int as it requires native byte order.
If you wish to check whether a value is small enough to fit in an N-bit integer for the approach above, you can use the msb function, which returns the index of the most significant bit in the cpp_int. Add one to that to obtain the number of bits used, filter out the zero case, and the code looks like:
unsigned bits_used = my_cpp_int.is_zero() ? 0 : msb(my_cpp_int) + 1;
Note that all of the above use completely portable code - no hacking of the underlying implementation is required.

Why does golang implement different behavior on the `[]` operator between slice and map? [duplicate]

This question already has answers here:
Why are map values not addressable? (2 answers)
Closed 4 years ago.
type S struct {
    e int
}

func main() {
    a := []S{{1}}
    a[0].e = 2

    b := map[int]S{0: {1}}
    b[0].e = 2 // error: cannot assign to struct field b[0].e in map
}
a[0] is addressable but b[0] is not.
I know the first 0 is an index and the second 0 is a key.
Why does Go implement it like this? Is there any further consideration behind it?
I've read the map source code in github.com/golang/go/src/runtime, and the map structure already supports indirectkey and indirectvalue when maxKeySize and maxValueSize are small enough.
type maptype struct {
    ...
    keysize       uint8 // size of key slot
    indirectkey   bool  // store ptr to key instead of key itself
    valuesize     uint8 // size of value slot
    indirectvalue bool  // store ptr to value instead of value itself
    ...
}
I think that if the Go designers had wanted this syntax, it would be easy to support now.
Of course, indirectkey/indirectvalue may cost more resources, and the GC would also need to do more work.
So is performance the only reason for not supporting this?
Or is there some other consideration?
In my opinion, supporting syntax like this would be valuable.
As far as I know, that's because a[0] can be replaced with the address of the slice's backing array; similarly, a[1] can be replaced with that address plus elemSize*1.
In the case of a map you cannot do that: where a value lives depends on the hash algorithm, on the key/value pairs, and on how many of them there are.
Entries are also rearranged from time to time.
So a specific computation is needed in order to get the address of a value.
Arrays and slices are easily addressable, but for a map it takes the equivalent of multiple function calls or structure lookups.
If the compiler were to inline whatever computation is needed at every such expression, the binary size would grow substantially, and moreover the hash algorithm can keep changing from version to version.

Generating integer within range from unique string in ruby

I have code that should take a unique string (for example, "d86c52ec8b7e8a2ea315109627888fe6228d") from a client and return an integer greater than 2200000000 and less than 5800000000. It's important that the generated int is not random; it should always be the same for a given unique string. What is the best way to generate it without using a DB?
Now it looks like this:
did = "d86c52ec8b7e8a2ea315109627888fe6228d"
min_cid = 2200000000
max_cid = 5800000000
cid = did.hash.abs.to_s.split.last(10).to_s.to_i
if cid < min_cid
cid += min_cid
else
while cid > max_cid
cid -= 1000000000
end
end
Here's the problem: your range of numbers has only 3.6x10^9 possible values, whereas your sample unique string (which looks like a hex integer with 36 digits) has 16^36 possible values (i.e. many more). So when mapping your string into your integer range there will be collisions.
The mapping function itself can be pretty straightforward; I would do something like the code below (also, consider using only a part of the input string for the integer conversion, e.g. the first seven digits, if performance becomes critical):
def my_hash(str, min, max)
  range = (max - min).abs
  (str.to_i(16) % range) + min
end

my_hash(did, min_cid, max_cid) # => 2461595789
[Edit] If you are using Ruby 1.8 and your adjusted range can be represented as a Fixnum, just use the hash value of the input string object instead of parsing it as a big integer. Note that this strategy might not be safe in Ruby 1.9 (per the comment by @DataWraith), as object hash values may be randomized between invocations of the interpreter, so you would not get the same hash number for the same input string when you restart your application:
def hash_range(obj, min, max)
  (obj.hash % (max - min).abs) + [min, max].min
end

hash_range(did, min_cid, max_cid) # => 3886226395
And, of course, you'll have to decide what to do about collisions. You'll likely have to persist a bucket of input strings which map to the same value and decide how to resolve the conflicts if you are looking up by the mapped value.
You could generate a 32-bit CRC, drop one bit, and add the result to 2.2 billion. That gives you a max value of about 4.35 billion, within your range.
Alternatively, you could use all 32 bits of the CRC, but when the result is too large, append a zero to the input string and recalculate, repeating until you get a value in range.
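For concreteness, here is a sketch of the first suggestion in C using zlib's crc32 (the function name is made up; link with -lz). Dropping the top bit keeps the CRC below 2^31, so the result lands in [2200000000, 4347483647], inside the requested range:
#include <stdint.h>
#include <string.h>
#include <zlib.h>

/* Deterministically map a string to an integer in the required range. */
static uint64_t string_to_cid(const char *s)
{
    uint32_t crc = (uint32_t)crc32(0L, (const unsigned char *)s, (uInt)strlen(s));
    return 2200000000ULL + (crc & 0x7FFFFFFFUL);
}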

Little endian data and sha 256

I have to generate sha256 hashes of data that is in little endian form. I would like to know if I have to convert it to big endian first, before using the sha 256 algorithm, or if the algorithm is "endian-agnostic".
EDIT: Sorry, I think I wasn't clear. What I would like to know is the following: the sha256 algorithm requires padding the end of a message with certain bits. The first step is to add a 1 at the end of the message, then to pad it with zeros up to the end. At the very end, you must add the length of the message in bits. What I would like to know is whether this padding can be performed in little endian. For example, for a 640-bit message, I could write the last word as 0x280 (in big endian), or 0x8002000 (in little endian). Can this padding be done in little endian?
SHA256 is endian-agnostic if all you want is a good hash. But if you are writing SHA256 and want the same results as a correct implementation, then you must play games on little-endian hardware. SHA256 combines arithmetic addition (mod 2^32) with boolean operations, and thus is not endian-agnostic internally.
The SHA-256 implementation itself should take care of padding - you shouldn't have to deal with that unless you're implementing your own specialized SHA-256 code. If you are, note that the padding rules specified in the "pre-processing step" say that the length is a 64-bit big-endian integer. See SHA-2 - Wikipedia
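To make the big-endian length rule concrete, here is a sketch of the padding step for the simple case where the padding fits in the final block (i.e. the block holds fewer than 56 message bytes); a real implementation also handles the spill-over-into-an-extra-block case:
#include <stddef.h>
#include <stdint.h>

/* Pad a final 64-byte block in place: 'used' message bytes already
 * occupy block[0..used), and total_bytes is the whole message length.
 * Assumes used < 56. */
static void sha256_pad_final(uint8_t block[64], size_t used, uint64_t total_bytes)
{
    uint64_t bit_len = total_bytes * 8;
    size_t i = used;
    int shift;

    block[i++] = 0x80;              /* the single 1 bit, then zeros */
    while (i < 56)
        block[i++] = 0x00;
    for (shift = 56; shift >= 0; shift -= 8)
        block[i++] = (uint8_t)(bit_len >> shift);  /* MSB first */
}
For the 640-bit message in the question this writes 0x02 0x80 as the last two bytes: the length is laid down most significant byte first regardless of the host's endianness.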
It's hard to even figure out what "endian-agnostic" would mean, but the order of all the bits, bytes and words for a hash algorithm matter a whole lot, so I sure wouldn't use that term.
Let me reply regarding SHA-256 as well as SHA-512.
In short:
The algorithm itself is endian-agnostic. The endian-sensitive parts are when data is imported from a byte buffer into the algorithm's working variables and when it is exported back to the digest result, also a byte buffer. If the import/export includes casting, then endianness matters.
Where could casting occur:
In SHA-512 there is a working buffer of 128 bytes.
In my code it's defined like this:
union {
    U64 w[80];          /* see the U64 example below */
    byte buffer[128];
};
Input data is copied to this byte buffer, and then work is done on w. This means the data was cast to a 64-bit type, and it will have to be swapped; in my case it's swapped on little-endian machines.
A better method would be to prepare a get macro that takes each byte and places it in its correct place in the U64 type.
When the algorithm is done, the digest result is output from the working variables to some byte buffer; if this is done by memcpy, it will also have to be swapped.
Another casting could occur when implementing SHA-512 (which is designed for 64-bit machines) on 32-bit machines. In my case I have a 64-bit type that is defined:
typedef struct {
    uint high;
    uint low;
} U64;
Assume I define it for little endian as well, as follows:
typedef struct {
    uint low;
    uint high;
} U64;
And then the k constant array is initialized like this:
static const SHA_U64 k[80] = {
    {0xD728AE22, 0x428A2F98}, {0x23EF65CD, 0x71374491}, ...
};
But I need the logical value of k[0].high to be the same on any machine, so in this example I would need another k array with the high and low values swapped.
After the data is stored in the working parameters, any bitwise manipulation will have the same result on both big- and little-endian machines.
A good method is to avoid any casting:
Import bytes from the input buffer into your working parameters using a macro.
Work with logical values without thinking about the memory mapping.
Export the output to the digest result with a macro.
Macros for taking 32 bits from a byte buffer into an int32 (BE = big endian, LE = little endian):
#define GET_BE_BYTES_FROM32(a) \
    ((((NQ_UINT32)(a)[0]) << 24) | \
     (((NQ_UINT32)(a)[1]) << 16) | \
     (((NQ_UINT32)(a)[2]) << 8)  | \
     ((NQ_UINT32)(a)[3]))

#define GET_LE_BYTES_FROM32(a) \
    ((((NQ_UINT32)(a)[3]) << 24) | \
     (((NQ_UINT32)(a)[2]) << 16) | \
     (((NQ_UINT32)(a)[1]) << 8)  | \
     ((NQ_UINT32)(a)[0]))
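As a usage sketch, with the GET_BE_BYTES_FROM32 macro above in scope (NQ_UINT32 is assumed to be a typedef for a 32-bit unsigned type; the original code doesn't show it):
#include <stdint.h>
#include <stdio.h>

typedef uint32_t NQ_UINT32;   /* assumption: matches the macros above */

int main(void)
{
    unsigned char block[4] = { 0x01, 0x02, 0x03, 0x04 };
    NQ_UINT32 w0 = GET_BE_BYTES_FROM32(block);

    /* Prints 01020304 on both big- and little-endian hosts, because the
     * macro assembles the value byte by byte instead of casting. */
    printf("%08lx\n", (unsigned long)w0);
    return 0;
}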

What's the best way of hashing this complex structure in VB6?

I have the following structures defined (names are anonymised, but data types are correct):
Public Type ExampleDataItem
    Limit As Integer     ' could be any value 0-999
    Status As Integer    ' could be any value 0-2
    ValidUntil As Date   ' always a valid date
End Type

Public Type ExampleData
    Name As String       ' could be 5-20 chars long
    ValidOn As Date      ' could be a valid date or 1899-12-30 representing "null"
    Salt As Integer      ' random value 42-32767
    Items(0 To 13) As ExampleDataItem
End Type
I would like to generate a 32-bit hash code for an ExampleData instance. Minimising hash collisions is important, performance and data order is not important.
So far I have got (in pseudocode):
Serialise all members into one byte array.
Loop through the byte array, reading 4 bytes at a time into a Long value.
XOR all the Long values together.
I can't really post my code because it's heavily dependent on utility classes to do the serialisation, but if anyone wants to see it regardless then I will post it.
Will this be OK, or can anyone suggest a better way of doing it?
EDIT:
This code is being used to implement part of a software licensing system. The purpose of the hash is to confirm whether the data entered by the end user equals the data entered by the tech support person. The hash must therefore:
Be very short. That's why I thought 32 bits would be most suitable, because it can be rendered as a 10-digit decimal number on screen. This is easy, quick and unambiguous to read over the telephone and type in.
Be derived from all the fields in the data structure, with no extra artificial keys or any other trickery.
The hash is not required for lookup, uniqueness testing, or to store ExampleData instances in any kind of collection, but only for the one purpose described above.
Can you use the CRC32? Steve McMahon has an implementation. Combine that with a bit of base32 encoding and you've got something short enough to read over the phone.
Considering that performance is not an objective, if file size is not important and you want a unique value for each item, just add an ID field. Its data type is a string. Then use this function to generate a GUID, which will be a unique ID. Use it as a key for a dictionary or collection.
Public Type GUID
    Data1 As Long
    Data2 As Integer
    Data3 As Integer
    Data4(7) As Byte
End Type

Public Type GUID2 ' 15 bytes total
    Data1(14) As Byte
End Type

Public Declare Function CoCreateGuid Lib "OLE32.DLL" (pGuid As GUID) As Long

Public Function GetGUID() As String
    Dim VBRIG_PROC_ID_STRING As String
    VBRIG_PROC_ID_STRING = "GetGUID()"
    Dim lResult As Long
    Dim lguid As GUID
    Dim MyguidString As String
    Dim MyGuidString1 As String
    Dim MyGuidString2 As String
    Dim MyGuidString3 As String
    Dim DataLen As Integer
    Dim StringLen As Integer
    Dim i As Integer

    On Error GoTo error_olemsg

    lResult = CoCreateGuid(lguid)
    If lResult = 0 Then
        MyGuidString1 = Hex$(lguid.Data1)
        StringLen = Len(MyGuidString1)
        DataLen = Len(lguid.Data1)
        MyGuidString1 = LeadingZeros(2 * DataLen, StringLen) & MyGuidString1
        'First 4 bytes (8 hex digits)

        MyGuidString2 = Hex$(lguid.Data2)
        StringLen = Len(MyGuidString2)
        DataLen = Len(lguid.Data2)
        MyGuidString2 = LeadingZeros(2 * DataLen, StringLen) & Trim$(MyGuidString2)
        'Next 2 bytes (4 hex digits)

        MyGuidString3 = Hex$(lguid.Data3)
        StringLen = Len(MyGuidString3)
        DataLen = Len(lguid.Data3)
        MyGuidString3 = LeadingZeros(2 * DataLen, StringLen) & Trim$(MyGuidString3)
        'Next 2 bytes (4 hex digits)

        GetGUID = MyGuidString1 & MyGuidString2 & MyGuidString3

        For i = 0 To 7
            MyguidString = MyguidString & Format$(Hex$(lguid.Data4(i)), "00")
        Next i
        'MyguidString contains the last 8 bytes of the GUID (16 hex digits)

        GetGUID = GetGUID & MyguidString
    Else
        GetGUID = "00000000" ' return zeros if function unsuccessful
    End If
    Exit Function

error_olemsg:
    GetGUID = "00000000"
    Exit Function
End Function

Public Function LeadingZeros(ExpectedLen As Integer, ActualLen As Integer) As String
    LeadingZeros = String$(ExpectedLen - ActualLen, "0")
End Function
EDIT: the question has now been edited to clarify that the goal is detecting typing errors, not minimizing collisions between totally different values. In that case Dan F's answer is the best one IMHO, not my offering below (wonderful though it is).
You could use the Microsoft CryptoAPI rather than rolling your own hash algorithm.
For instance this Microsoft article on using CryptoAPI from VB6 should get you started.
Or this from Edanmo on mvps.org for hashing a string in VB6.
EDIT: Following comment. If you insist on a 32-bit value, it will be hard to minimize hash collisions. My algorithm book suggests using Horner's method as a decent general purpose hashing algorithm. I don't have time right now to find out more information and implement in VB6. CopyMemory would probably be useful :)
You may be overthinking it, or I'm not understanding the issue. You could essentially just
hash(CStr(Salt) + Name + CStr(ValidOn) + AnyOtherStrings)
There is no particular need to go through the process of serializing into a byte array and XORing values. In fact, XORing values together in that way is more likely to create hash collisions where you aren't intending them.
Edit: I think I understand now. You're creating your own hash value by XORing the data together? It's unfortunately quite likely to give collisions. I know VB6 doesn't include any hashing algorithms, so you may be best importing and using something like Phil Fresle's SHA256 implementation.
