generate master key tls1.2

generate master key tls1.2 - https

I'm trying to generate the master key for an HTTPS conversation manually, using the openssl command line. I have recorded all of the pertinent data from my HTTPS conversation up to the point of the client key exchange, change cipher spec, and encrypted handshake message. From what I have read in RFC 5246 so far, this requires:
master_secret = PRF(pre_master_secret, "master secret",ClientHello.random + ServerHello.random) [0..47];
Reading further in the RFC, the PRF expands to:
P_hash(secret, seed) = HMAC_hash(secret, A(1) + seed) + HMAC_hash(secret, A(2) + seed) + ...
where secret is the premaster secret from the client,
A(0) = seed = "master secret" + ClientHello.random + ServerHello.random
A(1) = HMAC_hash(secret, A(0))
A(2) = HMAC_hash(secret, A(1))
iterating until I have the 48 bytes needed for the master secret.
If my assumptions are correct, I was hoping to iterate on the openssl command line to get my 48 bytes, if that is possible, running something like the command below twice or as many times as needed. I understand this just echoes the value to the screen, which I would of course store for use in the next iteration.
echo -n "value" | openssl dgst -sha1 -hmac "key"
Am I off base on my interpretation of the RFC or is something like this possible? Am I missing any steps if my interpretation is correct?
Regards
David B

Your question is not entirely clear to me (English is not my mother tongue),
but I implemented the PRF myself and tested it against a test vector.
It works fine.
#include <array>
#include <cstddef>
#include <vector>
// TLS 1.2 PRF (RFC 5246, section 5). HMAC<H> is a separate HMAC class that is not shown here.
template<class H> class PRF
{//H is hash function usually sha256
public:
    template<class It> void secret(const It begin, const It end) {
        for(It it = begin; it != end; it++) secret_.push_back(*it);
        hmac_.key(secret_.begin(), secret_.end());
    }
    void label(const char* p) {
        while(*p) label_.push_back(*p++);
    }
    template<class It> void seed(const It begin, const It end) {
        for(It it = begin; it != end; it++) seed_.push_back(*it);
    }
    std::vector<unsigned char> get_n_byte(int n) {
        auto seed = label_;//seed = label + seed_
        seed.insert(seed.end(), seed_.begin(), seed_.end());
        std::vector<unsigned char> r, v;
        std::vector<std::array<unsigned char, H::output_size>> vA;
        vA.push_back(hmac_.hash(seed.begin(), seed.end()));//A(1) = HMAC(secret, A(0)), where A(0) = seed
        while(r.size() < static_cast<std::size_t>(n)) {
            v.clear();//v = A(i) + seed
            v.insert(v.end(), vA.back().begin(), vA.back().end());
            v.insert(v.end(), seed.begin(), seed.end());
            auto h = hmac_.hash(v.begin(), v.end());//HMAC(secret, A(i) + seed)
            r.insert(r.end(), h.begin(), h.end());
            vA.push_back(hmac_.hash(vA.back().begin(), vA.back().end()));//A(i+1)
        }
        r.resize(n);//truncate to exactly n bytes
        return r;
    }
protected:
    HMAC<H> hmac_;
    std::vector<unsigned char> secret_, label_, seed_;
};
As you can see in the code, A(i+1) is generated by applying the HMAC to the previously generated A(i).
The code is not an abstract description but a concrete example.
I hope it gives you the information you need.
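A hedged sketch of the same expansion in self-contained form, under a few assumptions: the HMAC is passed in as a callable so that no crypto library is needed here, and the names Bytes, HmacFn, p_hash and prf are made up for this example. RFC 5246 uses SHA-256 as the PRF hash for the cipher suites it defines, so for the master secret the callable would be HMAC-SHA256 keyed with the pre_master_secret, with label "master secret", seed ClientHello.random + ServerHello.random, and n = 48.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
// Hypothetical hook: hmac(key, message) -> digest. Plug a real HMAC-SHA256 in here.
using HmacFn = std::function<Bytes(const Bytes& key, const Bytes& msg)>;

// P_hash(secret, seed) truncated to n bytes (RFC 5246, section 5).
Bytes p_hash(const HmacFn& hmac, const Bytes& secret, const Bytes& seed, std::size_t n) {
    Bytes out;
    Bytes a = seed;                          // A(0) = seed
    while (out.size() < n) {
        a = hmac(secret, a);                 // A(i) = HMAC(secret, A(i-1))
        Bytes input = a;                     // input = A(i) + seed
        input.insert(input.end(), seed.begin(), seed.end());
        Bytes block = hmac(secret, input);   // HMAC(secret, A(i) + seed)
        if (block.empty()) throw std::runtime_error("HMAC returned no data");
        out.insert(out.end(), block.begin(), block.end());
    }
    out.resize(n);                           // keep exactly n bytes
    return out;
}

// PRF(secret, label, seed) = P_hash(secret, label + seed)
Bytes prf(const HmacFn& hmac, const Bytes& secret, const std::string& label,
          const Bytes& seed, std::size_t n) {
    Bytes full_seed(label.begin(), label.end());
    full_seed.insert(full_seed.end(), seed.begin(), seed.end());
    return p_hash(hmac, secret, full_seed, n);
}
```

With HMAC-SHA256 (32-byte output), 48 bytes take two iterations, with the second block truncated. For such cipher suites, the -sha1 in the command from the question would need to become -sha256.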

Related

Split encrypted messages into chunks and put them together again

I want to send GPG encrypted data via GET request of known format.
Issue #1: Data block size in the request is limited (4096 symbols), and it is not enough for a typical GPG message. So, I need to chunk it.
Issue #2: Chunks may be sent in the wrong order. Each chunk must have a unique message ID and serial number, so the messages can be put together.
GPG has the method to send encrypted data in text format (armoring). RFC 2440 standard allows chunking armored messages:
BEGIN PGP MESSAGE, PART X/Y
Used for multi-part messages, where the armor is split amongst Y
parts, and this is the Xth part out of Y.
BEGIN PGP MESSAGE, PART X
Used for multi-part messages, where this is the Xth part of an
unspecified number of parts. Requires the MESSAGE-ID Armor Header
to be used.
Unfortunately, I've found no evidence that this feature is implemented in GPG.
There is also no word about chunking public keys, which can be huge too.
So I dropped the idea of using native GPG armor for chunking.
My current home-made solution: the binary encrypted data is split into chunks, and each chunk is put into a block that contains a UUID (a MessageID analog), the serial number of the block, the total number of blocks, and a CRC checksum of the block.
Like that:
[ UUID ][ Number ][ Total ][ Chunk of encrypted data ][ Checksum ]
Putting the message back together from those blocks is a bigger challenge, but doable as well.
But I want a cleaner solution, preferably in C++.
Could you help me?

Qt provides very simple methods for data serialization. I created a class to chunk, store, and rebuild binary data, and for now I don't think I need anything simpler.
But if someone knows a better solution, please share it with me.
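#include <QByteArray>
#include <QByteArrayList>
#include <QByteArrayView>
#include <QDataStream>
#include <QException>
#include <QMap>
#include <QUuid>
#include <stdexcept> // std::runtime_error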
enum CHUNKER {
MESSAGE_READY = 0,
BLOCK_ADDED
};
struct ChunkedMessage {
QUuid UUID;
QByteArray Data;
};
class Chunker {
public:
Chunker();
~Chunker();
static quint16 GetChecksum(QByteArray *Block);
static QByteArrayList ArmorData(QByteArray *Data, qsizetype *ChunkSize);
CHUNKER AddBlock(QByteArray *Block, ChunkedMessage *Message);
private:
struct MessageBlock {
QUuid UUID;
quint32 Number;
quint32 Total;
QByteArray Data;
};
QMap<QUuid, quint32> Sizes;
QMap<QUuid, QMap<quint32, Chunker::MessageBlock>*> Stack;
MessageBlock DearmorChunk(QByteArray *Block);
bool CheckIntegrity(QUuid *UUID, QByteArray *Reconstructed);
};
Chunker::Chunker() { }
Chunker::~Chunker() { qDeleteAll(Stack); } // free the block maps of messages that never completed
quint16 Chunker::GetChecksum(QByteArray *Block) { return qChecksum(QByteArrayView(*Block), Qt::ChecksumIso3309); }
QByteArrayList Chunker::ArmorData(QByteArray *Data, qsizetype *ChunkSize) {
QByteArrayList Result;
QUuid UUID = QUuid::createUuid();
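// With the default stream format, QDataStream writes a QByteArray as a 4-byte length prefix plus its bytes,
// so the overhead is (4 + 16) for the UUID, 4 + 4 for the counters, 4 for the data prefix and 2 for the checksum.
const qsizetype Overhead = (sizeof(quint32) + 16) + 2 * sizeof(quint32) + sizeof(quint32) + sizeof(quint16);
const qsizetype RealChunkSize = (*ChunkSize) - Overhead;
if (RealChunkSize <= 0) throw std::runtime_error("Chunk size is too small");
// Ceiling division, so an exact multiple does not produce an extra empty block.
const quint32 ChunkCount = qMax<qsizetype>(1, ((*Data).length() + RealChunkSize - 1) / RealChunkSize);
for (quint32 Pos = 0; Pos < ChunkCount; Pos++) {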
QByteArray Block;
QDataStream Stream(&Block, QIODeviceBase::WriteOnly);
Stream << UUID.toRfc4122() << (Pos + 1) << ChunkCount << (*Data).mid(Pos * RealChunkSize, RealChunkSize);
Stream << Chunker::GetChecksum(&Block);
Result.push_back(Block);
}
return Result;
}
Chunker::MessageBlock Chunker::DearmorChunk(QByteArray *Block) {
Chunker::MessageBlock Result;
QDataStream Stream(Block, QIODeviceBase::ReadOnly);
QByteArray ClearBlock = (*Block).chopped(sizeof(quint16));
QByteArray BytesUUID;
quint16 Checksum;
Stream >> BytesUUID >> Result.Number >> Result.Total >> Result.Data >> Checksum;
Result.UUID = QUuid::fromRfc4122(QByteArrayView(BytesUUID));
if (Chunker::GetChecksum(&ClearBlock) != Checksum) throw std::runtime_error("Checksums are not equal");
return Result;
}
bool Chunker::CheckIntegrity(QUuid *UUID, QByteArray *Reconstructed) {
quint32 Size = this->Sizes[*UUID];
if (this->Stack[*UUID]->size() > Size) throw std::runtime_error("Corrupted message blocks");
if (this->Stack[*UUID]->size() < Size) return false;
for (quint32 Counter = 0; Counter < Size; Counter++) {
if (!(this->Stack[*UUID]->contains(Counter + 1))) return false;
(*Reconstructed).append((*(this->Stack[*UUID]))[Counter + 1].Data);
}
return true;
}
CHUNKER Chunker::AddBlock(QByteArray *Block, ChunkedMessage *Message) {
Chunker::MessageBlock DecodedBlock = Chunker::DearmorChunk(Block);
if (!this->Sizes.contains(DecodedBlock.UUID)) {
this->Sizes[(QUuid)DecodedBlock.UUID] = (quint32)DecodedBlock.Total;
this->Stack[(QUuid)DecodedBlock.UUID] = new QMap<quint32, Chunker::MessageBlock>;
}
(*(this->Stack[DecodedBlock.UUID]))[(quint32)(DecodedBlock.Number)] = Chunker::MessageBlock(DecodedBlock);
QByteArray ReconstructedData;
if (this->CheckIntegrity(&DecodedBlock.UUID, &ReconstructedData)) {
(*Message).UUID = (QUuid)(DecodedBlock.UUID);
(*Message).Data = (QByteArray)ReconstructedData;
this->Sizes.remove(DecodedBlock.UUID);
delete this->Stack[DecodedBlock.UUID];
this->Stack.remove(DecodedBlock.UUID);
return CHUNKER::MESSAGE_READY;
}
return CHUNKER::BLOCK_ADDED;
}
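For readers without Qt, the split-and-reorder core of the idea can be sketched in standard C++. This is a minimal sketch under stated assumptions, not a replacement for the class above: Block, split and reassemble are hypothetical names, and the UUID and checksum fields are left out so that only the numbering logic is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One numbered piece of a message (UUID and checksum omitted in this sketch).
struct Block {
    std::uint32_t number;  // 1-based serial number
    std::uint32_t total;   // total number of blocks in the message
    std::string data;      // chunk of the (encrypted) payload
};

// Splits payload into blocks of at most chunk_size bytes.
std::vector<Block> split(const std::string& payload, std::size_t chunk_size) {
    std::vector<Block> blocks;
    if (chunk_size == 0) return blocks;
    // Ceiling division; an empty payload still yields one empty block.
    const std::size_t total =
        payload.empty() ? 1 : (payload.size() + chunk_size - 1) / chunk_size;
    for (std::size_t i = 0; i < total; ++i) {
        blocks.push_back(Block{static_cast<std::uint32_t>(i + 1),
                               static_cast<std::uint32_t>(total),
                               payload.substr(i * chunk_size, chunk_size)});
    }
    return blocks;
}

// Rebuilds the payload from blocks received in any order.
// Returns false while blocks are missing; trusts the total of the first block.
bool reassemble(const std::vector<Block>& received, std::string& out) {
    if (received.empty()) return false;
    std::map<std::uint32_t, std::string> by_number;  // sorted by serial number
    for (const Block& b : received) by_number[b.number] = b.data;
    const std::uint32_t total = received.front().total;
    if (by_number.size() != total) return false;
    if (by_number.begin()->first != 1 || by_number.rbegin()->first != total) return false;
    out.clear();
    for (const auto& entry : by_number) out += entry.second;
    return true;
}
```

Keying the map by serial number means arrival order does not matter, and duplicates simply overwrite each other.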

Is there any atomicMul() in cuda? [duplicate]

There are atomicAdd and atomicSub, but it seems that atomicMul and atomicDiv don't exist! Is that possible? I need to implement the following code:
atomicMul(&accumulation[index],value)
How can I do this?

OK, I solved it. But I don't understand how atomicMul works, and I don't know how to write it for floats.
#include <stdio.h>
#include <cuda_runtime.h>
// Atomic multiply for doubles, built on a compare-and-swap loop; returns the previous value.
__device__ double atomicMul(double* address, double val)
{
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val * __longlong_as_double(assumed)));
    } while (assumed != old); // retry if another thread changed the value in between
    return __longlong_as_double(old);
}
__global__ void try_atomicMul(double* d_a, double* d_out)
{
    atomicMul(d_out, d_a[threadIdx.x]);
}
int main()
{
    double h_a[] = {5, 6, 7, 8}, h_out = 1;
    double *d_a, *d_out;
    cudaMalloc((void **)&d_a, 4 * sizeof(double));
    cudaMalloc((void **)&d_out, sizeof(double));
    cudaMemcpy(d_a, h_a, 4 * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_out, &h_out, sizeof(double), cudaMemcpyHostToDevice);
    dim3 blockDim(4);
    dim3 gridDim(1);
    try_atomicMul<<<gridDim, blockDim>>>(d_a, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(double), cudaMemcpyDeviceToHost);
    printf("%f \n", h_out);
    cudaFree(d_a);
    cudaFree(d_out); // the original freed only d_a
    return 0;
}

I'll supplement horus' answer with what I understand about atomicCAS. Some details may be wrong, because I have not looked inside atomicCAS and have only read its documentation (atomicCAS, Atomic Functions). Corrections are welcome.
How atomicMul works
As I understand it, atomicCAS(int* address, int compare, int val) behaves as follows:
1. Copy *address into old (i.e. old = *address).
2. Store (old == compare ? val : old) to *address. (After this step, old and *address may differ, depending on whether the comparison matched.)
3. Return old.
The behavior is easier to follow alongside the definition of atomicMul.
unsigned long long int* address_as_ull = (unsigned long long int*)address;
unsigned long long int oldValue = *address_as_ull, assumed; // Renamed 'old' to 'oldValue' to avoid confusion with 'old' inside atomicCAS.
do {
    assumed = oldValue;
    // Other threads can access and modify *address_as_ull between the line above and the line below.
    oldValue = atomicCAS(address_as_ull, assumed,
                         __double_as_longlong(val * __longlong_as_double(assumed)));
} while (assumed != oldValue);
return __longlong_as_double(oldValue);
What we want to do is read the value at address (the same location that address_as_ull points to), multiply it by some value, and write the result back. The problem is that other threads can access and modify *address between the read, the multiply, and the write.
To make sure no other thread interfered, we check whether *address still equals what we assumed to be there. Say another thread modifies *address after assumed = oldValue and before oldValue = atomicCAS(...). The modified value of *address is then copied into the old variable inside atomicCAS (see the first step of atomicCAS above).
Since atomicCAS updates *address according to *address = (old == compare ? val : old), and old no longer equals compare, *address is left unchanged.
atomicCAS then returns old, which goes into oldValue; because it differs from assumed, the loop runs again and retries with the fresh value. When *address is not modified between the read and the write, the new value (the val argument of atomicCAS) is written to *address and the loop ends.
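The same retry loop can be tried on the host with std::atomic, which may make the control flow easier to step through in a debugger. This is only an analogy under my own assumptions, not CUDA code: atomic_mul is a made-up name, and compare_exchange_weak plays the role of atomicCAS.

```cpp
#include <atomic>
#include <cassert>

// Host-side analogue of the device atomicMul: multiplies target by val
// atomically and returns the value that was there before.
double atomic_mul(std::atomic<double>& target, double val) {
    double expected = target.load();  // like: assumed = *address
    // compare_exchange_weak stores expected * val only if target still holds
    // `expected`. On failure it overwrites `expected` with the current value,
    // so the next iteration recomputes the product from fresh data.
    while (!target.compare_exchange_weak(expected, expected * val)) {
        // retry
    }
    return expected;
}
```

Because the product is recomputed from the freshly observed value after every failed exchange, concurrent callers cannot lose each other's updates.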
How to write it for float
short answer :
__device__ float atomicMul(float* address, float val)
{
int* address_as_int = (int*)address;
int old = *address_as_int, assumed;
do {
assumed = old;
old = atomicCAS(address_as_int, assumed, __float_as_int(val *
__float_as_int(assumed)));
} while (assumed != old); return __int_as_float(old);
}
I didn't test it, so there may be errors. Correct me if I'm wrong.
How does it work:
For some reason, atomicCAS only supports integer types. So we have to reinterpret the float/double value as an integer to pass it to the function, and then reinterpret the integer result back as float/double. Compared with the double version above, I changed double to float and unsigned long long to int, because float and int have the same size.

Kyungsu's answer was almost correct. On the line assigning old = atomicCAS(...), though, he used __float_as_int where he should have used __int_as_float to convert assumed. I corrected his code below:
__device__ float atomicMul(float* address, float val)
{
    //Implementation of atomic multiplication
    //See https://stackoverflow.com/questions/43354798/atomic-multiplication-and-division
    int* address_as_int = (int*)address;
    int old = *address_as_int;
    int assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_int, assumed, __float_as_int(val * __int_as_float(assumed)));
    } while (assumed != old);
    return __int_as_float(old);
}

How to solve the "R0 invalid mem access 'inv'" error when loading an eBPF file object

I'm trying to load an eBPF object in the kernel with libbpf, with no success, getting the error specified in the title. But let me show how simple my BPF *_kern.c is.
SEC("entry_point_prog")
int entry_point(struct xdp_md *ctx)
{
int act = XDP_DROP;
int rc, i = 0;
struct global_vars *globals;
struct ip_addr addr = {};
struct some_key key = {};
void *temp;
globals = bpf_map_lookup_elem(&globals_map, &i);
if (!globals)
return XDP_ABORTED;
rc = some_inlined_func(ctx, &key);
addr = key.dst_ip;
temp = bpf_map_lookup_elem(&some_map, &addr);
switch(rc)
{
case 0:
if(temp)
{
// no rocket science here ...
} else
act = XDP_PASS;
break;
default:
break;
}
return act; // this gives the error
//return XDP_<whatever>; // this works fine
}
More precisely, the libbpf error log is the following:
105: (bf) r4 = r0
106: (07) r4 += 8
107: (b7) r8 = 1
108: (2d) if r4 > r3 goto pc+4
R0=inv40 R1=inv0 R2=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R3=pkt_end(id=0,off=0,imm=0) R4=inv48 R5=inv512 R6=inv1 R7=inv17 R8=inv1 R10=fp0,call_-1 fp-16=0 fp-32=0 fp-40=0
109: (69) r3 = *(u16 *)(r0 +2)
R0 invalid mem access 'inv'
I really don't see any problem here. I mean, this is so so simple, and yet it breaks. Why shouldn't this work? What am I missing? Either the verifier went crazy, or I'm doing something very stupid.

OK, so after 3 days, more precisely 3 x 8 hrs = 24 hrs worth of code hunting, I think I've finally found the nagging problem.
The problem was in some_inlined_func() all along; it was more tricky than challenging. I'm writing down a code template here that explains the issue, so others can see it and hopefully spend less than 24 hrs of headache; I went through hell for this, so stay focused.
__always_inline static
int some_inlined_func(struct xdp_md *ctx, /* other non important args */)
{
if (!ctx)
return AN_ERROR_CODE;
void *data = (void *)(long)ctx->data;
void *data_end = (void *)(long)ctx->data_end;
struct ethhdr *eth;
struct iphdr *ipv4_hdr = NULL;
struct ipv6hdr *ipv6_hdr = NULL;
struct udphdr *udph;
uint16_t ethertype;
eth = (struct ethhdr *)data;
if (eth + 1 > data_end)
return AN_ERROR_CODE;
ethertype = __constant_ntohs(eth->h_proto);
if (ethertype == ETH_P_IP)
{
ipv4_hdr = (void *)eth + ETH_HLEN;
if (ipv4_hdr + 1 > data_end)
return AN_ERROR_CODE;
// stuff non related to the issue ...
} else if (ethertype == ETH_P_IPV6)
{
ipv6_hdr = (void *)eth + ETH_HLEN;
if (ipv6_hdr + 1 > data_end)
return AN_ERROR_CODE;
// stuff non related to the issue ...
} else
return A_RET_CODE_1;
/* here's the problem, but ... */
udph = (ipv4_hdr) ? ((void *)ipv4_hdr + sizeof(*ipv4_hdr)) :
((void *)ipv6_hdr + sizeof(*ipv6_hdr));
if (udph + 1 > data_end)
return AN_ERROR_CODE;
/* it actually breaks HERE, when dereferencing 'udph' */
uint16_t dst_port = __constant_ntohs(udph->dest);
// blablabla other stuff here unrelated to the problem ...
return A_RET_CODE_2;
}
So, why does it break at that point? I think it's because the verifier assumes ipv6_hdr could be NULL. That can never actually happen: execution only reaches that point if either ipv4_hdr or ipv6_hdr has been set (for a packet that is neither IPv4 nor IPv6, the function returns earlier). Apparently the verifier isn't able to infer that. The log is consistent with this: R0=inv40 is a plain scalar equal to NULL + sizeof(struct ipv6hdr), which is 40 bytes, rather than a packet pointer, so the load through r0 is rejected. However, there's a catch: the verifier is happy if the validity of ipv6_hdr is also checked explicitly, like this:
if (ipv4_hdr)
udph = (void *)ipv4_hdr + sizeof(*ipv4_hdr);
else if (ipv6_hdr)
udph = (void *)ipv6_hdr + sizeof(*ipv6_hdr);
else return A_RET_CODE_1; // this is redundant
It also works if we do this:
// "(ethertype == ETH_P_IP)" instead of "(ipv4_hdr)"
udph = (ethertype == ETH_P_IP) ? ((void *)ipv4_hdr + sizeof(*ipv4_hdr)) :
((void *)ipv6_hdr + sizeof(*ipv6_hdr));
So it seems to me there's something strange about the verifier here: it's not smart enough (maybe it doesn't need to be?) to realize that if execution ever gets to this point, it's only because ctx refers to either an IPv4 or an IPv6 packet.
How does all of this explain the complaint about return act; within entry_point()? Simple, just bear with me. some_inlined_func() doesn't change ctx, and its remaining args aren't used by entry_point() either. When entry_point() returns act, the result depends on the outcome of some_inlined_func(), so the code for some_inlined_func() is emitted and the verifier complains at that point. When it returns XDP_<whatever>, neither the switch-case body nor some_inlined_func() affects the result of entry_point(), so the compiler (with -O2) is smart enough to realize there's no point in producing assembly for some_inlined_func() and the whole switch-case. To conclude: in the XDP_<whatever> case the verifier was happy because, although the problem lies in some_inlined_func(), the generated BPF assembly contains none of it, so there was nothing for the verifier to check. Makes sense?
Is this BPF "limitation" known? Is there any document listing such known limitations? I didn't find any.

Insert a map into other map with a unique type key (2)

Hello, I am new to maps in C++ and have a question about copying one map into another map of the same type. Details below.
I initially declared the maps like this:
map<string,int> objmap, obj_process;
for(int i = 0; i < 10; i++) {
objmap["process" + to_string(i)] = i + 10; // some processing; to_string is just for illustration, my real keys are named strings for all 10 values
}
for example:
objmap["process_today"]=1;
objmap["process_yesterday"]=-1;
objmap["process_tommorow"]=2;
Now I want to define something like the following, where each key in obj_process is the corresponding objmap key without the "process" prefix and everything else stays the same:
obj_process["today"] = objmap["process_today"];
Instead of defining all 10 by hand, is there some simple code for this? I used 10 as an example, but I actually have about 200 different strings as map keys.
I already asked a question about the exact opposite case; now that I try it the other way round I ran into an issue, and I hope to find some help.

If you can initialize both at the same time, the solution is straightforward:
const std::vector<std::string> days = {"today", "yesterday", /*...*/};
for(const auto& d : days)
{
objmap["process_" + d] = foo();
obj_process[d] = foo();
}
If you cannot, you should be able to iterate over objmap and get rid of the "process_" prefix with some basic string manipulation:
constexpr auto prefix_length = 8; // length of "process_"
for (const auto& p : objmap)
{
    const auto& key = p.first;
    const auto processed_key = key.substr(prefix_length); // throws std::out_of_range if the key is shorter than the prefix
    obj_process[processed_key] = p.second;
}
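If some keys might not carry the prefix at all, a slightly more defensive variant checks for it before stripping. This is a sketch under my own assumptions: strip_prefix is a hypothetical helper name, and skipping non-matching keys is just one possible policy.

```cpp
#include <cassert>
#include <map>
#include <string>

// Copies every entry whose key starts with `prefix` into a new map,
// with the prefix removed from the key. Other entries are skipped.
std::map<std::string, int> strip_prefix(const std::map<std::string, int>& src,
                                        const std::string& prefix) {
    std::map<std::string, int> out;
    for (const auto& entry : src) {
        const std::string& key = entry.first;
        // compare() over the first prefix.size() characters; a key shorter
        // than the prefix compares unequal instead of throwing.
        if (key.compare(0, prefix.size(), prefix) == 0) {
            out[key.substr(prefix.size())] = entry.second;
        }
    }
    return out;
}
```

Usage would look like obj_process = strip_prefix(objmap, "process_"); taking the prefix as a parameter also avoids the hard-coded length.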

vsnprintf on an ATMega2560

I am using a toolkit to do some Elliptic Curve Cryptography on an ATMega2560. When I try to use the print functions in the toolkit, I get an empty string. I know the print functions work because the x86 version prints the variables without a problem. I am not experienced with the ATMega and would appreciate any help. The print code is included below.
Code to print a big number (it itself calls a util_print)
void bn_print(bn_t a) {
int i;
if (a->sign == BN_NEG) {
util_print("-");
}
if (a->used == 0) {
util_print("0\n");
} else {
#if WORD == 64
util_print("%lX", (unsigned long int)a->dp[a->used - 1]);
for (i = a->used - 2; i >= 0; i--) {
util_print("%.*lX", (int)(2 * (BN_DIGIT / 8)),
(unsigned long int)a->dp[i]);
}
#else
util_print("%llX", (unsigned long long int)a->dp[a->used - 1]);
for (i = a->used - 2; i >= 0; i--) {
util_print("%.*llX", (int)(2 * (BN_DIGIT / 8)),
(unsigned long long int)a->dp[i]);
}
#endif
util_print("\n");
}
}
The code to actually print a big number variable:
static char buffer[64 + 1];
void util_printf(char *format, ...) {
#ifndef QUIET
#if ARCH == AVR
char *pointer = &buffer[1];
va_list list;
va_start(list, format);
vsnprintf(pointer, 128, format, list);
buffer[0] = (unsigned char)2;
va_end(list);
#elif ARCH == MSP
va_list list;
va_start(list, format);
vprintf(format, list);
va_end(list);
#else
va_list list;
va_start(list, format);
vprintf(format, list);
fflush(stdout);
va_end(list);
#endif
#endif
}
Edit: I do have UART initialized and can output printf statements to a console.

I'm one of the authors of the RELIC toolkit. The current util_printf() function is used to print inside the Avrora simulator, for debugging purposes. I'm glad that you could adapt the code to your purposes. As a side note, the buffer size problem was already fixed in more recent releases of the toolkit.
Let me know if you have further problems with the library. You can either contact me personally or write directly to the discussion group.
Thank you!

vsnprintf stores its output in the given buffer (here, the address held in pointer). For it to show up on the console (through UART), you must send the buffer yourself, for example with printf: try adding printf("%s", pointer) after the vsnprintf call. If you're using avr-libc, don't forget to initialize the standard streams before making any call to printf.
By the way, your code is vulnerable to a buffer overflow: buffer[64 + 1] means the buffer is only 65 bytes, and pointer starts at buffer[1], leaving 64 bytes, yet vsnprintf(pointer, 128, format, list) allows up to 128 bytes to be written. Pass a size of at most 64 to avoid the overflow.
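A minimal sketch of the bounded pattern on a hosted compiler, assuming nothing about the toolkit itself: format_bounded is a hypothetical helper, the size passed in is the real capacity of the destination, and the return value of vsnprintf is used to detect truncation.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Formats into buf, which has room for `size` bytes including the
// terminating NUL. Returns true if the whole string fit, false if the
// output was truncated or formatting failed.
bool format_bounded(char* buf, std::size_t size, const char* format, ...) {
    va_list args;
    va_start(args, format);
    // vsnprintf never writes more than `size` bytes and returns the length
    // the full output would have had (negative on error).
    const int needed = std::vsnprintf(buf, size, format, args);
    va_end(args);
    return needed >= 0 && static_cast<std::size_t>(needed) < size;
}
```

Called as format_bounded(pointer, sizeof(buffer) - 1, format, ...), the size follows the buffer declaration instead of a separate constant such as 128.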

Alright so I found a workaround to print the bn numbers to a stdout on an ATMega2560. The toolkit comes with a function that writes a variable to a string (bn_write_str). So I implemented my own print function as such:
void print_bn(bn_t a)
{
    char print[BN_SIZE]; // max precision of a bn number
    int bi = bn_bits(a); // get the number of bits of the number
    bn_write_str(print, bi, a, 16); // 16 indicates the radix (hexadecimal)
    printf("%s\n", print);
}
This function will print a bn number in hexadecimal format.
Hope this helps anyone using the RELIC toolkit with an AVR.
This skips the util_print calls.
