PGP: Where can I find a list of supported algorithms (name+number)? - gnupg

When you generate a PGP key pair, you can choose a public-key algorithm:
$ gpg --expert --full-gen-key
gpg (GnuPG) 2.2.19; Copyright (C) 2019 Free Software Foundation, Inc.
Please select what kind of key you want:
(1) RSA and RSA (default)
(2) DSA and Elgamal
(3) DSA (sign only)
(4) RSA (sign only)
(7) DSA (set your own capabilities)
(8) RSA (set your own capabilities)
(9) ECC and ECC
(10) ECC (sign only)
(11) ECC (set your own capabilities)
(13) Existing key
(14) Existing key from card
Your selection?
When you list or browse PGP public keys, the algorithms used by a key are represented as numbers. Example with a simple RSA 2048 key:
$ gpg --export me#localhost.com | gpg --list-packets --verbose
...
:public key packet:
version 4, algo 1, created 1531406055, expires 0s 0
...
:signature packet: algo 1, keyid 47F915B113C9BC18
version 4, created 1531406055, md5len 0, sigclass 0x13
digest algo 2, begin of digest 7a 9c
...
:public sub key packet:
version 4, algo 1, created 1531406055, expires 0
I'm talking here about the numbers in algo 1, digest algo 2, etc.
I'm looking for a complete list that gives the name of each algorithm for a given algo number.
I found a list in RFC 4880 (OpenPGP Message Format):
ID Algorithm
-- ---------
1 - RSA (Encrypt or Sign) [HAC]
2 - RSA Encrypt-Only [HAC]
3 - RSA Sign-Only [HAC]
16 - Elgamal (Encrypt-Only) [ELGAMAL] [HAC]
17 - DSA (Digital Signature Algorithm) [FIPS186] [HAC]
18 - Reserved for Elliptic Curve
19 - Reserved for ECDSA
20 - Reserved (formerly Elgamal Encrypt or Sign)
21 - Reserved for Diffie-Hellman (X9.42,
as defined for IETF-S/MIME)
100 to 110 - Private/Experimental algorithm
But this list seems to be incomplete: if I generate a key with an ECC algorithm (Elliptic Curve Cryptography) and Curve 25519, the public key algo is 22, which is not in the list above.
However, the gpg binary is aware of this algo name:
$ gpg --list-keys
pub ed25519 2022-04-06 [SC]
7D438CA8D0C6D57EA168521C2C800B246796CFC9
uid [ultimate] John <john.doe#ed25519.org>
sub cv25519 2022-04-06 [E]
Is there an up-to-date list of all available algos and their associated numbers somewhere?

Not sure this fully covers your needs, but in addition to RFC 4880 (sections 9.1 to 9.4), which has the following lists:
9.1. Public-Key Algorithms
9.2. Symmetric-Key Algorithms
9.3. Compression Algorithms
9.4. Hash Algorithms
Here's what I could find:
Elliptic Curve Cryptography (ECC) in OpenPGP
RFC6637, section 5 - https://www.rfc-editor.org/rfc/rfc6637#section-5
"Unknown algorithm 22" thread
https://lists.gnupg.org/pipermail/gnupg-devel/2017-April/032762.html
Algorithm 22 seems to be listed in this thread:
Right we are a bit faster than the specs. The OpenPGP WG agreed on
using 22 for EdDSA in mid 2014. The draft-koch-eddsa-for-openpgp-00
specified the algorithms; meanwhile superseded by
draft-ietf-openpgp-rfc4880bis-01.
+-----------+----------------------------------------------------+
| ID | Algorithm |
+-----------+----------------------------------------------------+
| 1 | RSA (Encrypt or Sign) [HAC] |
| 2 | RSA Encrypt-Only [HAC] |
| 3 | RSA Sign-Only [HAC] |
| 16 | Elgamal (Encrypt-Only) [ELGAMAL] [HAC] |
| 17 | DSA (Digital Signature Algorithm) [FIPS186] [HAC] |
| 18 | ECDH public key algorithm |
| 19 | ECDSA public key algorithm [FIPS186] |
| 20 | Reserved (formerly Elgamal Encrypt or Sign) |
| 21 | Reserved for Diffie-Hellman |
| | (X9.42, as defined for IETF-S/MIME) |
| 22 | EdDSA [I-D.irtf-cfrg-eddsa] |
| 100--110 | Private/Experimental algorithm |
+-----------+----------------------------------------------------+
Note: just in case it helps you as it helped me, "digest" is the output of a hash algorithm.
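To make the numbers in the --list-packets output easier to read, the tables above can be turned into a small lookup. This is only a sketch: the public-key names come from the table quoted in this answer, the hash names from RFC 4880 section 9.4, and the annotate helper is a made-up name for illustration, not part of gpg.

```python
import re

# Public-key algorithm IDs, taken from the table quoted above.
PUBKEY_ALGOS = {
    1: "RSA (Encrypt or Sign)",
    2: "RSA Encrypt-Only",
    3: "RSA Sign-Only",
    16: "Elgamal (Encrypt-Only)",
    17: "DSA",
    18: "ECDH",
    19: "ECDSA",
    22: "EdDSA",
}

# Hash (digest) algorithm IDs from RFC 4880, section 9.4.
HASH_ALGOS = {
    1: "MD5",
    2: "SHA-1",
    3: "RIPEMD-160",
    8: "SHA-256",
    9: "SHA-384",
    10: "SHA-512",
    11: "SHA-224",
}


def annotate(line):
    """Append the algorithm name after each 'algo N' / 'digest algo N'."""
    def add_name(match):
        # 'digest algo N' refers to a hash; a bare 'algo N' to a public-key algorithm.
        table = HASH_ALGOS if match.group(1) else PUBKEY_ALGOS
        name = table.get(int(match.group(2)), "unknown")
        return f"{match.group(0)} [{name}]"
    return re.sub(r"(digest )?algo (\d+)", add_name, line)


print(annotate("version 4, algo 1, created 1531406055, expires 0"))
print(annotate("digest algo 2, begin of digest 7a 9c"))
```

Piping the real gpg output through annotate line by line would label each number; IDs missing from the two dictionaries are reported as unknown rather than guessed.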

Related

Extract raw (octet) private key from PKCS8 EC with OpenSSL CLI

Is there a way to extract a raw/octet private key from EC PKCS#8?
Here is an example file (private.pem):
-----BEGIN PRIVATE KEY-----
MIH3AgEAMBAGByqGSM49AgEGBSuBBAAjBIHfMIHcAgEBBEIA1tZ6QFxLWMJyp7vO
pDNj2Wbu2or9QaxJ3ehpi1qaVF/otjrx3Q/AMso4W9a6YQ4heDCH1rned0C2VdyK
f8n0bcugBwYFK4EEACOhgYkDgYYABAGi+uY7a67sTbwOAK/+aNUewZ3haLUV4INx
Fnk6E1iNee0YvyQ5XJrowSWjW6YfBTjYKKKYeaV5s2QTbzhvgvqL3gD1EgXNbfB9
27lO2Luy0EYxOPLxtBhCEgGnlkzHVwZaKK3+qJpR+D6oVe7l0hgBfoIYzkJgpQPC
1lblIG8qAtQEGg==
-----END PRIVATE KEY-----
If I run:
# openssl ec -in private.pem -text -noout
I am getting:
read EC key
Private-Key: (521 bit)
priv:
00:d6:d6:7a:40:5c:4b:58:c2:72:a7:bb:ce:a4:33:
63:d9:66:ee:da:8a:fd:41:ac:49:dd:e8:69:8b:5a:
9a:54:5f:e8:b6:3a:f1:dd:0f:c0:32:ca:38:5b:d6:
ba:61:0e:21:78:30:87:d6:b9:de:77:40:b6:55:dc:
8a:7f:c9:f4:6d:cb
pub:
04:01:a2:fa:e6:3b:6b:ae:ec:4d:bc:0e:00:af:fe:
68:d5:1e:c1:9d:e1:68:b5:15:e0:83:71:16:79:3a:
13:58:8d:79:ed:18:bf:24:39:5c:9a:e8:c1:25:a3:
5b:a6:1f:05:38:d8:28:a2:98:79:a5:79:b3:64:13:
6f:38:6f:82:fa:8b:de:00:f5:12:05:cd:6d:f0:7d:
db:b9:4e:d8:bb:b2:d0:46:31:38:f2:f1:b4:18:42:
12:01:a7:96:4c:c7:57:06:5a:28:ad:fe:a8:9a:51:
f8:3e:a8:55:ee:e5:d2:18:01:7e:82:18:ce:42:60:
a5:03:c2:d6:56:e5:20:6f:2a:02:d4:04:1a
ASN1 OID: secp521r1
NIST CURVE: P-521
I need the "priv" value in a binary format or at least as a string in hex: 00d6d67a405c4b58c272a7bbcea43363d966eeda8afd41ac49dde8698b5a9a545fe8b63af1dd0fc032ca385bd6ba610e21783087d6b9de7740b655dc8a7fc9f46dcb so I can convert it to binary with xxd.
How can I do that?
I can always do something like this:
openssl ec -in private.pem -text -noout | tr '\n' ' ' | grep -Po '(?<=priv:).*(?=pub:)' | tr -cd '[0-9a-f]'
but it's a terrible approach, imho.
Still not pretty but an alternative to your solution can be achieved with asn1parse. Inspecting the output for the entire keyfile:
$ openssl asn1parse -in private.pem
0:d=0 hl=3 l= 247 cons: SEQUENCE
3:d=1 hl=2 l= 1 prim: INTEGER :00
6:d=1 hl=2 l= 16 cons: SEQUENCE
8:d=2 hl=2 l= 7 prim: OBJECT :id-ecPublicKey
17:d=2 hl=2 l= 5 prim: OBJECT :secp521r1
24:d=1 hl=3 l= 223 prim: OCTET STRING [HEX DUMP]:3081DC020101044200D6D67A405C4B58C272A7BBCEA43363D966EEDA8AFD41AC49DDE8698B5A9A545FE8B63AF1DD0FC032CA385BD6BA610E21783087D6B9DE7740B655DC8A7FC9F46DCBA00706052B81040023A18189038186000401A2FAE63B6BAEEC4DBC0E00AFFE68D51EC19DE168B515E0837116793A13588D79ED18BF24395C9AE8C125A35BA61F0538D828A29879A579B364136F386F82FA8BDE00F51205CD6DF07DDBB94ED8BBB2D0463138F2F1B418421201A7964CC757065A28ADFEA89A51F83EA855EEE5D218017E8218CE4260A503C2D656E5206F2A02D4041A
The actual key(pair) information starts at offset 24, an octet string that itself is an ASN.1 object. Zooming in to that location:
$ openssl asn1parse -in private.pem -strparse 24
0:d=0 hl=3 l= 220 cons: SEQUENCE
3:d=1 hl=2 l= 1 prim: INTEGER :01
6:d=1 hl=2 l= 66 prim: OCTET STRING [HEX DUMP]:00D6D67A405C4B58C272A7BBCEA43363D966EEDA8AFD41AC49DDE8698B5A9A545FE8B63AF1DD0FC032CA385BD6BA610E21783087D6B9DE7740B655DC8A7FC9F46DCB
74:d=1 hl=2 l= 7 cons: cont [ 0 ]
76:d=2 hl=2 l= 5 prim: OBJECT :secp521r1
83:d=1 hl=3 l= 137 cons: cont [ 1 ]
86:d=2 hl=3 l= 134 prim: BIT STRING
The OCTET STRING is the private component and can be extracted and converted to binary using the awk and xxd tools as follows:
$ openssl asn1parse -in private.pem -strparse 24 | awk -F ":" '/OCTET STRING/ {print $4}' | xxd -r -p > private.der
After writing this initial approach, I realized that the ec tool in combination with a simpler asn1parse invocation works as well:
$ openssl ec -in private.pem | openssl asn1parse | awk -F ":" '/OCTET STRING/ {print $4}' | xxd -r -p > private.der
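If you would rather not depend on the text layout of asn1parse, the same structure can be walked by hand. The following is a minimal sketch, not a general ASN.1 parser: it assumes the input is the inner ECPrivateKey structure shown in the HEX DUMP at offset 24 (SEQUENCE, version INTEGER, private key OCTET STRING), and the helper names read_tlv and ec_private_octets are made up for illustration.

```python
# Inner ECPrivateKey structure, copied from the HEX DUMP at offset 24.
ECPRIVATEKEY_HEX = (
    "3081DC"              # SEQUENCE, 220 bytes
    "020101"              # INTEGER 1 (version)
    "0442"                # OCTET STRING, 66 bytes: the private key
    "00D6D67A405C4B58C272A7BBCEA43363D966EEDA8AFD41AC49DDE8698B5A9A54"
    "5FE8B63AF1DD0FC032CA385BD6BA610E21783087D6B9DE7740B655DC8A7FC9F4"
    "6DCB"
    "A00706052B81040023"  # [0] curve OID (secp521r1)
    "A181890381860004"    # [1] BIT STRING header, uncompressed point marker
    "01A2FAE63B6BAEEC4DBC0E00AFFE68D51EC19DE168B515E0837116793A13588D"
    "79ED18BF24395C9AE8C125A35BA61F0538D828A29879A579B364136F386F82FA"
    "8BDE00F51205CD6DF07DDBB94ED8BBB2D0463138F2F1B418421201A7964CC757"
    "065A28ADFEA89A51F83EA855EEE5D218017E8218CE4260A503C2D656E5206F2A02D4041A"
)


def read_tlv(buf, pos):
    """Return (tag, value_start, value_end) of the DER element at pos."""
    tag = buf[pos]
    length = buf[pos + 1]
    pos += 2
    if length & 0x80:
        # Long form: the low 7 bits give the number of length bytes.
        count = length & 0x7F
        length = int.from_bytes(buf[pos:pos + count], "big")
        pos += count
    return tag, pos, pos + length


def ec_private_octets(der):
    """Extract the private key OCTET STRING from an ECPrivateKey blob."""
    tag, start, _ = read_tlv(der, 0)
    if tag != 0x30:
        raise ValueError("expected SEQUENCE")
    tag, _, version_end = read_tlv(der, start)
    if tag != 0x02:
        raise ValueError("expected version INTEGER")
    tag, key_start, key_end = read_tlv(der, version_end)
    if tag != 0x04:
        raise ValueError("expected private key OCTET STRING")
    return der[key_start:key_end]


der = bytes.fromhex(ECPRIVATEKEY_HEX)
priv = ec_private_octets(der)
print(len(priv), priv.hex())
```

Writing priv to a file opened in binary mode gives the same bytes as the xxd -r -p step above.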

Create a passwordless SSH key with one command line

I want to create the SSH key via a script
Since I am going to set an empty passphrase anyway, I'm wondering if there is a way to create an SSH key with an empty passphrase in one command line.
ssh-keygen -t rsa -b 4096 -C "temp#example.com"
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): tmp_rsa
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in tmp_rsa.
Your public key has been saved in tmp_rsa.pub.
The key fingerprint is:
SHA256:R1SEZr34y/4oBRTK3AieT4b8fw1PDG1kUekTfFuHoDA temp#example.com
The key's randomart image is:
+---[RSA 4096]----+
| . E o*+.=+o|
| o * B=..+.++|
| = B++...+ *|
| = .o .+ + |
| S .o. o .|
| o o= |
| .o..o |
| ..o. |
| oo.. |
+----[SHA256]-----+
I'm trying to create my tmp_rsa file with no passphrase using one command line.
Any direction on how to do that would be a huge help!
ssh-keygen has options to specify the passphrase and output file on the command line:
-f filename
Specifies the filename of the key file.
-N new_passphrase
Provides the new passphrase.
So:
$ ssh-keygen -t rsa -b 4096 -C "temp#example.com" -N '' -f tmp_rsa
Generating public/private rsa key pair.
Your identification has been saved in tmp_rsa.
Your public key has been saved in tmp_rsa.pub.
The key fingerprint is:
SHA256:1S94g+i4SBQeqUwQ7uuEx2ZOfp+z8dR+jFBC4qVRj5Y temp#example.com
The key's randomart image is:
+---[RSA 4096]----+
|o. .. |
|.. .o o+ . |
| .. +. *E o . |
|.o o oo..o.o . |
| .o o Soo + . |
|.... o.. . o |
|.o* . o o..o |
|oB ....* .. o |
| .o...=o. .. |
+----[SHA256]-----+
$ ls
tmp_rsa tmp_rsa.pub
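Since the goal is to do this from a script, here is a hedged sketch of the same invocation driven from Python. It assumes ssh-keygen is on the PATH; keygen_command and generate_key are hypothetical helper names, and only the flags shown above are used.

```python
import subprocess


def keygen_command(key_path, comment, bits=4096):
    """Build the ssh-keygen argument list; -N '' sets an empty passphrase."""
    return [
        "ssh-keygen",
        "-t", "rsa",
        "-b", str(bits),
        "-C", comment,
        "-N", "",        # empty passphrase, so no prompt is shown
        "-f", key_path,  # output file, so no filename prompt is shown
    ]


def generate_key(key_path, comment):
    """Run ssh-keygen; raises CalledProcessError if it exits non-zero."""
    subprocess.run(keygen_command(key_path, comment), check=True)


print(keygen_command("tmp_rsa", "temp#example.com"))
```

Calling generate_key("tmp_rsa", "temp#example.com") would then create tmp_rsa and tmp_rsa.pub. Note that ssh-keygen still asks before overwriting an existing key file, so a script should remove or rename old keys first.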

What algorithm to use to format an EDIFACT file?

I work with EDIFACT messages and have developed lots of tools to help me parse and extract the relevant information out of the raw file format.
Something I have always struggled with is presenting the raw EDIFACT. I typically just copy the message into Microsoft Word, do a find and replace for the segment separator and view the contents line by line.
I have always wanted to display the EDIFACT file in its hierarchical format but cannot for the life of me work out a method to do this.
Below is a small extract of a raw EDIFACT message.
The left side shows how I get the data (not including line numbers); the right side shows how I want it to be displayed, based on a customer's specification.
01. UNA -UNA
02. UNB -UNB
03. UNH -UNH
04. BGM -BGM
05. DTM - | DTM
06. DTM - | DTM
07. DTM - | DTM
08. NAD - | NAD
09. NAD - | NAD
10. NAD - | NAD
11. GIS - | GIS
12. LIN - | | LIN
13. LOC - | | | LOC
14. LOC - | | | LOC
15. LOC - | | | LOC
16. RFF - | | | RFF
17. QTY - | | | QTY
18. QTY - | | | QTY
19. RFF - | | | | RFF
20. DTM - | | | | | DTM
21. SCC - | | | SCC
22. QTY - | | | | QTY
23. DTM - | | | | | DTM
24. DTM - | | | | | DTM
25. SCC - | | | SCC
26. QTY - | | | | QTY
27. DTM - | | | | | DTM
28. DTM - | | | | | DTM
29. SCC - | | | SCC
30. QTY - | | | | QTY
31. DTM - | | | | | DTM
32. QTY - | | | | QTY
33. DTM - | | | | | DTM
34. SCC - | | | SCC
35. QTY - | | | | QTY
36. DTM - | | | | | DTM
37. NAD - | | | NAD
38. CTA - | | | | CTA
39. COM - | | | | | COM
40. SCC - | | | | SCC
41. QTY - | | | | | QTY
42. UNT -UNT
43. UNZ -UNZ
You can see that the data is tree based, and it is described by a specification that is sent to me. One specification for the above EDIFACT message is as follows:
Tag St Max Lvl
0000 1 UNA C 1 0 SERVICE STRING ADVICE
0000 2 UNB M 1 0 INTERCHANGE HEADER
0010 3 UNH M 1 0 MESSAGE HEADER
0020 4 BGM M 1 0 BEGINNING OF MESSAGE
0030 5 DTM M 10 1 DATE/TIME/PERIOD
0040 6 FTX C 5 1 FREE TEXT
0080 SG2 C 99 1 NAD
0090 7 NAD M 1 1 NAME AND ADDRESS
0190 SG6 C 9999 1 GIS-SG7-SG12
0200 8 GIS M 1 1 GENERAL INDICATOR
0210 SG7 C 1 2 NAD
0220 9 NAD M 1 2 NAME AND ADDRESS
0370 SG12 C 9999 2 LIN-LOC-FTX-SG13-SG15-SG17-SG22
0380 10 LIN M 1 2 LINE ITEM
0450 11 LOC C 999 3 PLACE/LOCATION IDENTIFICATION
0470 12 FTX C 5 3 FREE TEXT
0480 SG13 C 10 3 RFF
0490 13 RFF M 1 3 REFERENCE
0540 SG15 C 10 3 QTY-SG16
0550 14 QTY M 1 3 QUANTITY
0570 SG16 C 10 4 RFF-DTM
0580 15 RFF M 1 4 REFERENCE
0590 16 DTM C 1 5 DATE/TIME/PERIOD
0600 SG17 C 999 3 SCC-SG18
0610 17 SCC M 1 3 SCHEDULING CONDITIONS
0620 SG18 C 999 4 QTY-DTM
0630 18 QTY M 1 4 QUANTITY
0640 19 DTM C 2 5 DATE/TIME/PERIOD
0760 SG22 C 999 3 NAD-SG24-SG27
0770 20 NAD M 1 3 NAME AND ADDRESS
0830 SG24 C 5 4 CTA-COM
0840 21 CTA M 1 4 CONTACT INFORMATION
0850 22 COM C 5 5 COMMUNICATION CONTACT
0920 SG27 M 999 4 SCC-SG28
0940 SG28 M 999 5 QTY
0950 24 QTY M 1 5 QUANTITY
1030 25 UNT M 1 0 MESSAGE TRAILER
0000 26 UNZ M 1 0 INTERCHANGE TRAILER
The important columns are Tag, St (M=Mandatory, C=Conditional), Max (maximum number of times it can repeat), and Lvl (how deep in the tree it is).
The tags that start with SG denote a loop (a segment group).
The problem I face is that the format is very flexible: it can have conditional segments, conditional loops, and repeated segments. Trying to think of a method that can handle all of this has been my issue.
Starting from the top of the above specification, you can immediately see that the DTM tag can be repeated up to a maximum of 10 times. In the sample EDIFACT message, it appears only 3 times, on lines 5, 6 and 7. Following the specification, FTX may appear but does not in my sample message; then there is an SG2 tag, which means the following NAD tag may repeat up to 99 times.
Moving slightly ahead, inside the LIN tag (which is under the SG12 group, which can repeat up to 9999 times and in many cases does repeat a number of times), it comes to the first QTY tag.
According to the specification, this segment (in group SG15) can have a conditional group (SG16: RFF and DTM) under it. Using my sample, you can see that lines 17 and 18 both have the QTY segment, but only line 18 has this conditional group too.
Similar things start happening when you look into the SCC segments.
What I have in my mind, is to be able to enter that specification into some sort of file format, then run the raw EDIFACT message against the rules of this specification so the output is hierarchy based so it's easy to see at a glance what data relates to what segment and a way to check to see if the EDIFACT message is valid.
What I have trouble with, is the actual algorithm or process to do that conversion.
I have tried naive approaches, like going line by line but then it gets messy when I am trying to work out if the current line is in a group, or a repeat or something else.
I have tried a recursive approach: splitting the entire EDIFACT by the largest group (the SG12-LIN group), then recursively processing each of those splits and building an output. This has been my best approach yet, but it is still far from working, with many false readings because my logic is not right.
I basically need to be able to pick a segment of the message, determine where in the hierarchy it should be, and display it.
I am at a loss as to how I can solve this. I am sure there is a nice simple method for doing this, but I just cannot work it out.
I would be most grateful for any assistance.
Slight update.
I have converted the specification into an XML file following the hierarchy of said specification. This XML file now contains all the groups and the various attributes related to each tag. Now I have a start on what the EDIFACT needs to conform to.
If I go through it on paper (and in my head), I can build the output I am after with a bit of forward thinking, so my new idea is to "scan ahead" in the EDIFACT file and build a probability-based result.
A bit like how a chess AI looks ahead a few moves.
I can help you with most of the things you want (I have done them myself). But this is not easily done in a short written answer with no interaction.
So if you want more information, just contact me. (No, this is not a commercial thing.)
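For what it's worth, one possible way to attack the hierarchy problem from the question is a recursive walk over the specification, where each SG group is entered only when the next segment matches the group's first (trigger) segment. The sketch below is only an illustration under several assumptions: the specification is a hand-trimmed subset of the table in the question, matching is greedy with no scan-ahead, mandatory/conditional status is not validated, and Seg, Group and walk are made-up names.

```python
from dataclasses import dataclass


@dataclass
class Seg:
    tag: str
    max: int
    level: int


@dataclass
class Group:
    name: str
    max: int
    items: list  # the first item must be the group's trigger Seg


# Trimmed subset of the specification from the question.
SPEC = [
    Seg("UNH", 1, 0), Seg("BGM", 1, 0), Seg("DTM", 10, 1),
    Group("SG2", 99, [Seg("NAD", 1, 1)]),
    Group("SG6", 9999, [
        Seg("GIS", 1, 1),
        Group("SG12", 9999, [
            Seg("LIN", 1, 2), Seg("LOC", 999, 3),
            Group("SG13", 10, [Seg("RFF", 1, 3)]),
            Group("SG15", 10, [
                Seg("QTY", 1, 3),
                Group("SG16", 10, [Seg("RFF", 1, 4), Seg("DTM", 1, 5)]),
            ]),
            Group("SG17", 999, [
                Seg("SCC", 1, 3),
                Group("SG18", 999, [Seg("QTY", 1, 4), Seg("DTM", 2, 5)]),
            ]),
        ]),
    ]),
    Seg("UNT", 1, 0),
]


def walk(items, tags, pos, out):
    """Consume tags from pos following items; return the new position."""
    for item in items:
        count = 0
        if isinstance(item, Seg):
            # Plain segment: take as many repeats as the spec allows.
            while count < item.max and pos < len(tags) and tags[pos] == item.tag:
                out.append((item.level, tags[pos]))
                pos += 1
                count += 1
        else:
            # Group: repeat while the next tag is the group's trigger segment.
            trigger = item.items[0].tag
            while count < item.max and pos < len(tags) and tags[pos] == trigger:
                pos = walk(item.items, tags, pos, out)
                count += 1
    return pos


tags = "UNH BGM DTM DTM NAD GIS LIN LOC RFF QTY QTY RFF DTM SCC QTY DTM UNT".split()
out = []
end = walk(SPEC, tags, 0, out)
for level, tag in out:
    print("| " * level + tag)
```

If end is smaller than len(tags) after the walk, the remaining segments did not fit the specification at that point, which gives a rough validity check. A specification where the same tag can legally start two different branches at the same position would need the scan-ahead idea from the question instead of this greedy matching.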

Segment multilanguage parallel text

I have multi-language text that contains a message translated to several languages.
For example:
English message
Russian message
Ukrainian message
The order of the languages is not fixed.
I would like to devise some kind of supervised/unsupervised learning algorithm to do the segmentation automatically, and extract each translation in order to create a parallel corpus of data.
Could you suggest any papers/approaches?
I am not able to get the proper keywords for googling.
The most basic approach to your problem would be to generate a bag of words from your document. To sum up, a bag of words is a matrix where each row is a line in your document and each column is a distinct term.
For instance, if your document is like this :
hello world
привет мир
привіт світ
You will have this matrix :
   | hello | world | привет | мир | привіт | світ
l1 | 1     | 1     | 0      | 0   | 0      | 0
l2 | 0     | 0     | 1      | 1   | 0      | 0
l3 | 0     | 0     | 0      | 0   | 1      | 1
You can then apply clustering or classification algorithms (such as k-means or SVMs) according to your needs.
For more details, I would suggest to read this paper which provides a great summary of techniques.
Regarding keywords for googling, I would say text analysis, text mining or information retrieval are a good start.
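The bag-of-words matrix described above can be built in a few lines of plain Python. This is just a sketch of the data structure (whitespace tokenization, no normalization); bag_of_words is a made-up helper name, and a real pipeline would more likely use a library vectorizer.

```python
def bag_of_words(lines):
    """Return (vocabulary, matrix) with one row per line, one column per term."""
    vocab = []
    index = {}
    # First pass: assign a column to each distinct term, in order of appearance.
    for line in lines:
        for word in line.split():
            if word not in index:
                index[word] = len(vocab)
                vocab.append(word)
    # Second pass: count the occurrences of each term per line.
    matrix = []
    for line in lines:
        row = [0] * len(vocab)
        for word in line.split():
            row[index[word]] += 1
        matrix.append(row)
    return vocab, matrix


lines = ["hello world", "привет мир", "привіт світ"]
vocab, matrix = bag_of_words(lines)
print(vocab)
for row in matrix:
    print(row)
```

Lines in different languages share few or no columns, which is what a clustering or classification step would then exploit.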
Why don't you try some language identification software? These tools report > 90% accuracy:
langid.py https://github.com/saffsd/langid.py
TextCat http://odur.let.rug.nl/~vannoord/TextCat/
Linguine http://www.jmis-web.org/articles/v16_n3_p71/index.html

Little-Endian Signed Integer

I know the WAV file format uses signed integers for 16-bit samples. It also stores them in little-endian order, meaning the lowest 8 bits come first, then the next, etc. Is the special sign bit on the first byte, or is the special sign bit always on the most significant bit (highest value)?
Meaning:
Which one is the sign bit in the WAV format?
++---+---+---+---+---+---+---+---++---+---+---+---+---+---+---+---++
|| a | b | c | d | e | f | g | h || i | j | k | l | m | n | o | p ||
++---+---+---+---+---+---+---+---++---+---+---+---+---+---+---+---++
--------------------------- here -> ^ ------------- or here? -> ^
i or p?
signed int, little endian:
  byte 1 (lsb)      byte 2 (msb)
----------------------------------
7|6|5|4|3|2|1|0 | 7|6|5|4|3|2|1|0|
----------------------------------
                  ^
                  |
               Sign bit
You only need to concern yourself with that when reading/writing a short int to some external media. Within your program, the sign bit is the most significant bit in the short, no matter if you're on a big or little endian platform.
The sign bit is the most significant bit on any two's-complement machine (like the x86), and thus will be in the last byte in a little-endian format.
Just because I didn't want to be the one not including ASCII art... :)
+---------------------------------------+---------------------------------------+
|              first byte               |              second byte              |
+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
| 0  | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 | 13 | 14 | 15 |
+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
  ^--- lsb                                               msb / sign bit -----^
Bits are basically represented "backwards" from how most people think about them, which is why the high byte is last. But it's all consistent; "bit 15" comes after "bit 0" just as addresses ought to work, and is still the most significant bit of the most significant byte of the word. You don't have to do any bit twiddling, because the hardware talks in terms of bytes at all but the lowest levels -- so when you read a byte, it looks exactly like you'd expect. Just look at the most significant bit of your word (or the last byte of it, if you're reading a byte at a time), and there's your sign bit.
Note, though, that two's complement doesn't exactly designate a particular bit as the "sign bit". That's just a very convenient side effect of how the numbers are represented. For 16-bit numbers, -x is equal to 65536-x rather than 32768+x (which would be the case if the upper bit were strictly the sign).
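A quick way to convince yourself is to decode a few two-byte samples. This is a small sketch using Python's struct module; decode_sample and sign_bit_set are made-up helper names, and the byte strings are hand-picked examples rather than data from a real WAV file.

```python
import struct


def decode_sample(two_bytes):
    """Decode one 16-bit little-endian signed sample ('<h')."""
    (value,) = struct.unpack("<h", two_bytes)
    return value


def sign_bit_set(two_bytes):
    """The sign bit is the top bit of the second (most significant) byte."""
    return bool(two_bytes[1] & 0x80)


# The top bit of the FIRST byte only contributes 128 to the value.
print(decode_sample(b"\x80\x00"), sign_bit_set(b"\x80\x00"))
# The top bit of the SECOND byte makes the sample negative.
print(decode_sample(b"\x00\x80"), sign_bit_set(b"\x00\x80"))
# Two's complement: -1 is stored as 65536 - 1 = 0xFFFF.
print(decode_sample((65536 - 1).to_bytes(2, "little")))
```

The first sample decodes to a positive value and the second to a negative one, which matches the answers above: the sign lives in the last byte on disk.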

Resources