On macOS, I use zsh in the terminal and ran the command 'man sort > sort-man.txt'.
When I open sort-man.txt with Sublime Text, I see many 'BS' markers.
What does 'BS' stand for in Sublime Text on macOS?
Could it be an encoding issue?
The man command outputs a “bold” character by printing the character, then printing a backspace character, then printing the character again. Thus:
:; man sort | hexdump -C | head
00000000 0a 53 4f 52 54 28 31 29 20 20 20 20 20 20 20 20 |.SORT(1) |
00000010 20 20 20 20 20 20 20 20 20 20 20 42 53 44 20 47 | BSD G|
00000020 65 6e 65 72 61 6c 20 43 6f 6d 6d 61 6e 64 73 20 |eneral Commands |
00000030 4d 61 6e 75 61 6c 20 20 20 20 20 20 20 20 20 20 |Manual |
00000040 20 20 20 20 20 20 20 20 53 4f 52 54 28 31 29 0a | SORT(1).|
00000050 0a 4e 08 4e 41 08 41 4d 08 4d 45 08 45 0a 20 20 |.N.NA.AM.ME.E. |
            ^  ^  ^
            |  |  +--- ASCII N
            |  +------ ASCII Backspace
            +--------- ASCII N
Way back in the days of physical terminals that printed on paper, this would have the effect of overstriking the character, making it appear bolder.
These days, your terminal emulator app interprets a sequence like this by changing the color or font of the character.
I guess Sublime Text shows the backspace character as BS.
Consulting the man man page, I find this under “TIPS”:
To get a plain text version of a man page, without backspaces and underscores, try
# man foo | col -b > foo.mantxt
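If you already have a file full of these overstrike sequences, the same cleanup can be approximated in a few lines of Python. This is only a rough sketch of the idea behind col -b (keep the last character written to each column), not a reimplementation of it; the function name strip_overstrikes is made up for illustration.

```python
import re

def strip_overstrikes(text):
    """Remove each 'character + backspace' pair, keeping the character
    that was printed last in that column."""
    # "." matches the character that gets overstruck, "\x08" is ASCII backspace.
    return re.sub(r".\x08", "", text)

# Bold "NAME" as it appears in the hexdump: 4e 08 4e 41 08 41 4d 08 4d 45 08 45
print(strip_overstrikes("N\bNA\bAM\bME\bE"))   # prints: NAME

# Underlining uses the same trick with an underscore as the first character.
print(strip_overstrikes("_\bf_\bo_\bo"))       # prints: foo
```

For real man pages, piping through col -b as the man page suggests is still the simpler route.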
I have a CSV file with text and numbers.
If a number is bigger than 1000, it is formatted like this: 1 000,
so it appears to have a space as the thousands separator, but the character is not a regular space. I tried to remove it with sed, which worked where there was a real space, but not for this format.
It is also not a TAB; I removed all the TABs with "expand -t 1".
The following is a line that demonstrates the issue:
x17_Provident_GDN_REMARKETING_provident.hu_listák;Display_Hálózat;Szeged;2021-03-09;Kedd;Mobil;HUF;1 736;9;130.83;0.00
In the penultimate row, column 8 (1 736) is the problem.
And running this: grep -E -m 1 -e '[;]1[^;]+736[;]' <yourfile.csv | hexdump -C
gives:
00000000 78 31 37 5f 50 72 6f 76 69 64 65 6e 74 5f 47 44 |x17_Provident_GD|
00000010 4e 5f 52 45 4d 41 52 4b 45 54 49 4e 47 5f 70 72 |N_REMARKETING_pr|
00000020 6f 76 69 64 65 6e 74 2e 68 75 5f 6c 69 73 74 c3 |ovident.hu_list.|
00000030 a1 6b 3b 44 69 73 70 6c 61 79 5f 48 c3 a1 6c c3 |.k;Display_H..l.|
00000040 b3 7a 61 74 3b 53 7a 65 67 65 64 3b 32 30 32 31 |.zat;Szeged;2021|
00000050 2d 30 33 2d 30 39 3b 4b 65 64 64 3b 4d 6f 62 69 |-03-09;Kedd;Mobi|
00000060 6c 3b 48 55 46 3b 31 c2 a0 37 33 36 3b 39 3b 31 |l;HUF;1..736;9;1|
00000070 33 30 2e 38 33 3b 30 2e 30 30 0a |30.83;0.00.|
0000007b
It's a 2-byte, UTF-8 encoded non-breaking space (U+00A0): c2 a0.
You can use perl to safely remove it.
perl -pe 's/\xc2\xa0//g' dirty.csv > clean.csv
Once we knew it was a no-break space, I simply removed it with sed on macOS, typing the character between the slashes with:
Option+Space
cat test4.csv | sed 's/ //g'
Similar to perl, you can use GNU sed with LC_ALL=C:
LC_ALL=C sed 's/\xc2\xa0//g'
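If neither perl nor GNU sed is at hand, the same byte-level replacement can be sketched in Python. It works on raw bytes, like the perl one-liner, so other UTF-8 characters such as á (c3 a1) are left alone. The function name remove_nbsp is just an illustration.

```python
NBSP_UTF8 = b"\xc2\xa0"   # UTF-8 encoding of U+00A0, the bytes seen in the hexdump

def remove_nbsp(data):
    """Delete every UTF-8 encoded no-break space from a bytes object."""
    return data.replace(NBSP_UTF8, b"")

# Tail of the sample line from the question, with the no-break space in "1 736".
line = "Mobil;HUF;1\u00a0736;9;130.83;0.00\n".encode("utf-8")
print(remove_nbsp(line).decode("utf-8"), end="")   # prints: Mobil;HUF;1736;9;130.83;0.00
```

To process a whole file, read it in binary mode ("rb"), pass the contents through remove_nbsp, and write the result in binary mode ("wb").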
I want to create a bash script to parse the data returned by this command:
openvpn --show-pkcs11-ids /usr/lib/libeTPkcs11.so
The typical output is :
The following objects are available for use.
Each object shown below may be used as parameter to
--pkcs11-id option please remember to use single quote mark.
Certificate
DN: XXX
Serial: XXXX
Serialized id: XXXX
Certificate
DN: XXXX
Serial: XXXX
Serialized id: XXXX
Certificate
DN: XXXXX
Serial: XXXX
Serialized id: XXXX
I want to get a bash array containing 3 elements: the 3 "Certificate" blocks. I tried a lot of splitting methods, but all of them only produce output from an echo command, not an actual array.
Any ideas?
Thanks!
This is a case where it would be much simpler (and much, much faster) to use awk. awk provides arrays and is much more capable at processing input records than read. With awk you simply write rules to be applied to each line of input. In your case you just need to recognize whether the line begins with "DN:", "Serial:", or "Serialized". You can then store the associated value in a separate array, say arrays dn, serial, and serid. To accomplish this in awk you need nothing more than:
awk '
$1 == "Certificate" {n++}; # increment n
NF == 2 { # fill dn & serial array
$1 == "DN:" && dn[n]=$2
$1 == "Serial:" && serial[n]=$2
}
NF == 3 { # fill serid array
$1 == "Serialized" && serid[n]=$3
}
END { # output results
print "\nDN:\t\tSerial:\t\tSerialized id:"
for (i in dn) print dn[i], "\t\t", serial[i], "\t\t", serid[i]
}' file
Above, if the first field ($1) is "Certificate" you just increment a counter. If there are 2 fields in the line (NF == 2), you check whether the line begins with "DN:" or "Serial:" and add the 2nd field to the proper array. If the line has 3 fields ("Serialized", "id:" and your value), you store the value in the serid array.
With all values stored, you can iterate over the arrays in the END rule, providing any output you need. Above it simply outputs the content in tabular form. You can just copy/middle-mouse-paste in the command line to test.
Example Use/Output
$ awk '
> $1 == "Certificate" {n++}; # increment n
> NF == 2 { # fill dn & serial array
> $1 == "DN:" && dn[n]=$2
> $1 == "Serial:" && serial[n]=$2
> }
> NF == 3 { # fill serid array
> $1 == "Serialized" && serid[n]=$3
> }
> END { # output results
> print "\nDN:\t\tSerial:\t\tSerialized id:"
> for (i in dn) print dn[i], "\t\t", serial[i], "\t\t", serid[i]
> }' file
DN: Serial: Serialized id:
XXX XXXX XXXX
XXXX XXXX XXXX
XXXXX XXXX XXXX
For large file processing, awk will be orders of magnitude faster than looping in a shell script. Let me know if this satisfies your needs or if you need additional help.
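If the parsed values need to end up in a real data structure rather than printed text, the same line-by-line rules can be sketched in Python. This is an illustration of the rule logic above, not part of the awk solution; the names parse_certificates and LABELS are made up, and the sample text is the placeholder output from the question.

```python
LABELS = ("DN:", "Serial:", "Serialized id:")

def parse_certificates(text):
    """Return one dict per "Certificate" block, keyed by label without the colon."""
    certs = []
    for raw in text.splitlines():
        line = raw.strip()                 # tolerate any mix of leading spaces/tabs
        if line == "Certificate":
            certs.append({})               # start a new block, like n++ in awk
            continue
        for label in LABELS:
            if certs and line.startswith(label):
                certs[-1][label.rstrip(":")] = line[len(label):].strip()
                break
    return certs

SAMPLE = """\
The following objects are available for use.

Certificate
       DN:             XXX
       Serial:         XXXX
       Serialized id:  XXXX

Certificate
       DN:             XXXX
       Serial:         XXXX
       Serialized id:  XXXX
"""

certs = parse_certificates(SAMPLE)
print(len(certs))      # prints: 2
print(certs[0])        # prints: {'DN': 'XXX', 'Serial': 'XXXX', 'Serialized id': 'XXXX'}
```

Unlike the NF checks in the awk version, this keeps everything after the label, so a DN value containing spaces would be stored whole.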
Edit Per-Comment
If you are dealing with a file that uses mixed tabs and spaces as separators, that can present a problem when awk parses with the default field separator. To treat a sequence of mixed spaces/tabs as a separator, with GNU awk you can provide a regular expression for the separator. For instance, a sequence of one or more spaces or tabs can be specified as -F'[ \t]+'. The example below makes use of that separator. (Note: the field numbers change as a result, because the leading whitespace on each indented line now produces an empty first field.)
awk -F'[ \t]+' '
$1 == "Certificate" {n++}; # increment n
NF == 3 { # fill dn & serial array
$2 == "DN:" && dn[n]=$3
$2 == "Serial:" && serial[n]=$3
}
NF == 4 { # fill serid array
$2 == "Serialized" && serid[n]=$4
}
END { # output results
print "\nDN:\t\tSerial:\t\tSerialized id:"
for (i in dn) print dn[i], "\t\t", serial[i], "\t\t", serid[i]
}' f
Example Use/Output
With your same data you would then have:
$ awk -F'[ \t]+' '
> $1 == "Certificate" {n++}; # increment n
> NF == 3 { # fill dn & serial array
> $2 == "DN:" && dn[n]=$3
> $2 == "Serial:" && serial[n]=$3
> }
> NF == 4 { # fill serid array
> $2 == "Serialized" && serid[n]=$4
> }
> END { # output results
> print "\nDN:\t\tSerial:\t\tSerialized id:"
> for (i in dn) print dn[i], "\t\t", serial[i], "\t\t", serid[i]
> }' f
DN: Serial: Serialized id:
XXX XXXX XXXX
XXXX XXXX XXXX
XXXXX XXXX XXXX
Not knowing what the space/tab makeup of your posted text actually is, this should handle either case.
Further Update Posting Input Contents Taken From Question
The following is the input file f (or file) used with the examples above. It was taken from your question, but there is no guarantee the space/tab makeup is the same, given the copy/paste into the question. The last example above should handle it regardless. The only other caveat is that if you are feeding awk a file with DOS line endings, it won't work. You can check by running the utility file yourfilename, which will report if DOS CRLF line endings are present. You can then use dos2unix yourfilename to correct the problem and convert the file to Unix/POSIX line endings.
Example Input File
$ cat f
The following objects are available for use.
Each object shown below may be used as parameter to
--pkcs11-id option please remember to use single quote mark.
Certificate
DN: XXX
Serial: XXXX
Serialized id: XXXX
Certificate
DN: XXXX
Serial: XXXX
Serialized id: XXXX
Certificate
DN: XXXXX
Serial: XXXX
Serialized id: XXXX
Hexdump of Contents
$ hexdump -Cv f
00000000 54 68 65 20 66 6f 6c 6c 6f 77 69 6e 67 20 6f 62 |The following ob|
00000010 6a 65 63 74 73 20 61 72 65 20 61 76 61 69 6c 61 |jects are availa|
00000020 62 6c 65 20 66 6f 72 20 75 73 65 2e 0a 45 61 63 |ble for use..Eac|
00000030 68 20 6f 62 6a 65 63 74 20 73 68 6f 77 6e 20 62 |h object shown b|
00000040 65 6c 6f 77 20 6d 61 79 20 62 65 20 75 73 65 64 |elow may be used|
00000050 20 61 73 20 70 61 72 61 6d 65 74 65 72 20 74 6f | as parameter to|
00000060 0a 2d 2d 70 6b 63 73 31 31 2d 69 64 20 6f 70 74 |.--pkcs11-id opt|
00000070 69 6f 6e 20 70 6c 65 61 73 65 20 72 65 6d 65 6d |ion please remem|
00000080 62 65 72 20 74 6f 20 75 73 65 20 73 69 6e 67 6c |ber to use singl|
00000090 65 20 71 75 6f 74 65 20 6d 61 72 6b 2e 0a 0a 43 |e quote mark...C|
000000a0 65 72 74 69 66 69 63 61 74 65 0a 20 20 20 20 20 |ertificate. |
000000b0 20 20 44 4e 3a 20 20 20 20 20 20 20 20 20 20 20 | DN: |
000000c0 20 20 58 58 58 0a 20 20 20 20 20 20 20 53 65 72 | XXX. Ser|
000000d0 69 61 6c 3a 20 20 20 20 20 20 20 20 20 58 58 58 |ial: XXX|
000000e0 58 0a 20 20 20 20 20 20 20 53 65 72 69 61 6c 69 |X. Seriali|
000000f0 7a 65 64 20 69 64 3a 20 20 58 58 58 58 0a 0a 43 |zed id: XXXX..C|
00000100 65 72 74 69 66 69 63 61 74 65 0a 20 20 20 20 20 |ertificate. |
00000110 20 20 44 4e 3a 20 20 20 20 20 20 20 20 20 20 20 | DN: |
00000120 20 20 58 58 58 58 0a 20 20 20 20 20 20 20 53 65 | XXXX. Se|
00000130 72 69 61 6c 3a 20 20 20 20 20 20 20 20 20 58 58 |rial: XX|
00000140 58 58 0a 20 20 20 20 20 20 20 53 65 72 69 61 6c |XX. Serial|
00000150 69 7a 65 64 20 69 64 3a 20 20 58 58 58 58 0a 0a |ized id: XXXX..|
00000160 43 65 72 74 69 66 69 63 61 74 65 0a 20 20 20 20 |Certificate. |
00000170 20 20 20 44 4e 3a 20 20 20 20 20 20 20 20 20 20 | DN: |
00000180 20 20 20 58 58 58 58 58 0a 20 20 20 20 20 20 20 | XXXXX. |
00000190 53 65 72 69 61 6c 3a 20 20 20 20 20 20 20 20 20 |Serial: |
000001a0 58 58 58 58 0a 20 20 20 20 20 20 20 53 65 72 69 |XXXX. Seri|
000001b0 61 6c 69 7a 65 64 20 69 64 3a 20 20 58 58 58 58 |alized id: XXXX|
000001c0 0a |.|
000001c1
Let me know the results of your file examination.
You can use AWK to do that. It is a tool specifically created for transforming table-like output.
openvpn --show-pkcs11-ids /usr/lib/libeTPkcs11.so | grep 'Certificate\|DN:\|Serial:\|Serialized id:' | awk -v RS="Certificate" '{print $2,$4,$7}'
Explanation:
grep 'Certificate\|DN:\|Serial:\|Serialized id:' - Choose only interesting lines of output
awk -v RS="Certificate" '{print $2,$4,$7}' - See below comment
Comment: AWK lets you change the record separator with the "-v RS=" parameter. By default it is a newline, so each line of the file is a record, but it can be changed, e.g. to "Certificate".
The output is not an array, but every certificate is described on a separate line that you can pipe further to another tool.
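For comparison, here is the same record-separator idea as a small Python sketch: split the whole output on the word "Certificate" and collapse each chunk to one line. It assumes the word "Certificate" never appears inside a DN or other value; the function name split_records is hypothetical.

```python
def split_records(text, separator="Certificate"):
    """Split text on the separator and return each non-empty record as a
    single line with runs of whitespace collapsed to one space."""
    chunks = text.split(separator)[1:]     # drop the preamble before the first record
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]

output = """\
The following objects are available for use.

Certificate
       DN:             XXX
       Serial:         XXXX
       Serialized id:  XXXX

Certificate
       DN:             XXXX
       Serial:         XXXX
       Serialized id:  XXXX
"""

for record in split_records(output):
    print(record)
# prints:
# DN: XXX Serial: XXXX Serialized id: XXXX
# DN: XXXX Serial: XXXX Serialized id: XXXX
```

Here the result is an actual list with one element per certificate block, which is closer to what the question asks for.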
I am using robocopy with /NP and capturing stdout and stderr using >>log.txt 2>&1. I am copying all files from the "Outbound" directory to an "Archive" directory.
robocopy "%PNAME%\Outbound" "%ARCHIVE_PATH%" * /NP /MT:32
The output appears to contain character sequences that are classic problems with DOS vs. UNIX line endings.
New File 3381 DIM_DATE_CCYYMM.txt^M 0% ^M100%
New File 340759 DIM_DATE_CCYYMMDD_PAID_DT.txt^M100%
New File 340730 DIM_DATE_CCYYMMDD_SVC_DT.txt^M100%
Looking into the log file, there are indeed instances of 0x0D which are not followed by 0x0A. Is this just the way Microsoft envisioned the output? Can I do anything to change it short of manipulating the log file post-production?
000670 64 5c 0d 0a 09 20 20 20 20 4e 65 77 20 46 69 6c
d \ \r \n \t N e w F i l
000680 65 20 20 09 09 20 20 20 20 33 33 38 31 09 44 49
e \t \t 3 3 8 1 \t D I
000690 4d 5f 44 41 54 45 5f 43 43 59 59 4d 4d 2e 74 78
M _ D A T E _ C C Y Y M M . t x
0006a0 74 0d 20 20 30 25 20 20 0d 31 30 30 25 20 20 0d
t \r 0 % \r 1 0 0 % \r
0006b0 0a 09 20 20 20 20 4e 65 77 20 46 69 6c 65 20 20
\n \t N e w F i l e
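Nothing here shows a robocopy switch that avoids these bytes, so the following only covers the fallback the question hoped to avoid: cleaning the log afterwards. It is a minimal Python sketch that counts carriage returns not followed by a line feed and, on the assumption that everything after the first bare CR on a line is progress output, drops it. The function names are made up for illustration.

```python
import re

BARE_CR = re.compile(rb"\r(?!\n)")   # a 0x0D that is not followed by 0x0A

def count_bare_cr(data):
    """Count carriage returns that are not part of a CRLF pair."""
    return len(BARE_CR.findall(data))

def drop_progress(data):
    """For each CRLF-terminated line, keep only the text before the first bare CR."""
    lines = data.split(b"\r\n")
    return b"\r\n".join(line.split(b"\r", 1)[0] for line in lines)

# Shaped like the log excerpt: file name, then "  0%  " and "100%  " after bare CRs.
log = b"\t    New File  \t\t    3381\tDIM_DATE_CCYYMM.txt\r  0%  \r100%  \r\n"
print(count_bare_cr(log))    # prints: 2
print(drop_progress(log))    # prints: b'\t    New File  \t\t    3381\tDIM_DATE_CCYYMM.txt\r\n'
```

Read and write the log in binary mode so that Python does not translate the line endings itself.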
I came across this ascii-style ASCII table.
Of course I can store it in a file named ascii and use cat ascii to display its content.
But I want it to behave more like a command.
UPDATE
While reading CS:APP I found that there is no need to store it in a file and use other commands.
Just run man ascii
If your shell supports aliases, you can do:
alias ascii='cat ~/ascii'
Then just type ascii et voila!
If you're using bash, put the above line in your .bashrc to persist it across logins. Other shells have similar features.
Dec Hex     Dec Hex     Dec Hex     Dec Hex     Dec Hex     Dec Hex     Dec Hex     Dec Hex
  0 00 NUL   16 10 DLE   32 20       48 30 0     64 40 @     80 50 P     96 60 `    112 70 p
  1 01 SOH   17 11 DC1   33 21 !     49 31 1     65 41 A     81 51 Q     97 61 a    113 71 q
  2 02 STX   18 12 DC2   34 22 "     50 32 2     66 42 B     82 52 R     98 62 b    114 72 r
  3 03 ETX   19 13 DC3   35 23 #     51 33 3     67 43 C     83 53 S     99 63 c    115 73 s
  4 04 EOT   20 14 DC4   36 24 $     52 34 4     68 44 D     84 54 T    100 64 d    116 74 t
  5 05 ENQ   21 15 NAK   37 25 %     53 35 5     69 45 E     85 55 U    101 65 e    117 75 u
  6 06 ACK   22 16 SYN   38 26 &     54 36 6     70 46 F     86 56 V    102 66 f    118 76 v
  7 07 BEL   23 17 ETB   39 27 '     55 37 7     71 47 G     87 57 W    103 67 g    119 77 w
  8 08 BS    24 18 CAN   40 28 (     56 38 8     72 48 H     88 58 X    104 68 h    120 78 x
  9 09 HT    25 19 EM    41 29 )     57 39 9     73 49 I     89 59 Y    105 69 i    121 79 y
 10 0A LF    26 1A SUB   42 2A *     58 3A :     74 4A J     90 5A Z    106 6A j    122 7A z
 11 0B VT    27 1B ESC   43 2B +     59 3B ;     75 4B K     91 5B [    107 6B k    123 7B {
 12 0C FF    28 1C FS    44 2C ,     60 3C <     76 4C L     92 5C \    108 6C l    124 7C |
 13 0D CR    29 1D GS    45 2D -     61 3D =     77 4D M     93 5D ]    109 6D m    125 7D }
 14 0E SO    30 1E RS    46 2E .     62 3E >     78 4E N     94 5E ^    110 6E n    126 7E ~
 15 0F SI    31 1F US    47 2F /     63 3F ?     79 4F O     95 5F _    111 6F o    127 7F DEL
I found that wcslen() in VC++ 2010 returns the correct count of letters, whereas Xcode does not.
For example, the code below returns the correct value, 11, in VC++ 2010, but returns an incorrect 17 in Xcode 4.2.
const wchar_t *p = L"123abc가1나1다";
size_t plen = wcslen(p);
I guess Xcode stores the wchar_t string as UTF-8 in memory. This is another strange thing.
How can I get 11 in Xcode too, just like in VC++?
I ran this program on a Mac Mini running MacOS X 10.7.2 (Xcode 4.2):
#include <stdio.h>
#include <wchar.h>
int main(void)
{
const wchar_t p[] = L"123abc가1나1다";
size_t plen = wcslen(p);
if (fwide(stdout, 1) <= 0)
{
fprintf(stderr, "Failed to make stdout wide-oriented\n");
return -1;
}
wprintf(L"String <<%ls>>\n", p);
putwc(L'\n', stdout);
wprintf(L"Length = %zu\n", plen);
for (size_t i = 0; i < sizeof(p)/sizeof(*p); i++)
wprintf(L"Character %zu = 0x%X\n", i, p[i]);
return 0;
}
When I do a hex dump of the source file, I see:
0x0000: 23 69 6E 63 6C 75 64 65 20 3C 73 74 64 69 6F 2E #include <stdio.
0x0010: 68 3E 0A 23 69 6E 63 6C 75 64 65 20 3C 77 63 68 h>.#include <wch
0x0020: 61 72 2E 68 3E 0A 0A 69 6E 74 20 6D 61 69 6E 28 ar.h>..int main(
0x0030: 76 6F 69 64 29 0A 7B 0A 20 20 20 20 63 6F 6E 73 void).{. cons
0x0040: 74 20 77 63 68 61 72 5F 74 20 70 5B 5D 20 3D 20 t wchar_t p[] =
0x0050: 4C 22 31 32 33 61 62 63 EA B0 80 31 EB 82 98 31 L"123abc...1...1
0x0060: EB 8B A4 22 3B 0A 20 20 20 20 73 69 7A 65 5F 74 ...";. size_t
0x0070: 20 70 6C 65 6E 20 3D 20 77 63 73 6C 65 6E 28 70 plen = wcslen(p
0x0080: 29 3B 0A 20 20 20 20 69 66 20 28 66 77 69 64 65 );. if (fwide
0x0090: 28 73 74 64 6F 75 74 2C 20 31 29 20 3C 3D 20 30 (stdout, 1) <= 0
0x00A0: 29 0A 20 20 20 20 7B 0A 20 20 20 20 20 20 20 20 ). {.
0x00B0: 66 70 72 69 6E 74 66 28 73 74 64 65 72 72 2C 20 fprintf(stderr,
0x00C0: 22 46 61 69 6C 65 64 20 74 6F 20 6D 61 6B 65 20 "Failed to make
0x00D0: 73 74 64 6F 75 74 20 77 69 64 65 2D 6F 72 69 65 stdout wide-orie
0x00E0: 6E 74 65 64 5C 6E 22 29 3B 0A 20 20 20 20 20 20 nted\n");.
0x00F0: 20 20 72 65 74 75 72 6E 20 2D 31 3B 0A 20 20 20 return -1;.
0x0100: 20 7D 0A 20 20 20 20 77 70 72 69 6E 74 66 28 4C }. wprintf(L
0x0110: 22 53 74 72 69 6E 67 20 3C 3C 25 6C 73 3E 3E 5C "String <<%ls>>\
0x0120: 6E 22 2C 20 70 29 3B 0A 20 20 20 20 70 75 74 77 n", p);. putw
0x0130: 63 28 4C 27 5C 6E 27 2C 20 73 74 64 6F 75 74 29 c(L'\n', stdout)
0x0140: 3B 0A 20 20 20 20 77 70 72 69 6E 74 66 28 4C 22 ;. wprintf(L"
0x0150: 4C 65 6E 67 74 68 20 3D 20 25 7A 75 5C 6E 22 2C Length = %zu\n",
0x0160: 20 70 6C 65 6E 29 3B 0A 20 20 20 20 66 6F 72 20 plen);. for
0x0170: 28 73 69 7A 65 5F 74 20 69 20 3D 20 30 3B 20 69 (size_t i = 0; i
0x0180: 20 3C 20 73 69 7A 65 6F 66 28 70 29 2F 73 69 7A < sizeof(p)/siz
0x0190: 65 6F 66 28 2A 70 29 3B 20 69 2B 2B 29 0A 20 20 eof(*p); i++).
0x01A0: 20 20 20 20 20 20 77 70 72 69 6E 74 66 28 4C 22 wprintf(L"
0x01B0: 43 68 61 72 61 63 74 65 72 20 25 7A 75 20 3D 20 Character %zu =
0x01C0: 30 78 25 58 5C 6E 22 2C 20 69 2C 20 70 5B 69 5D 0x%X\n", i, p[i]
0x01D0: 29 3B 0A 20 20 20 20 72 65 74 75 72 6E 20 30 3B );. return 0;
0x01E0: 0A 7D 0A .}.
0x01E3:
The output when compiled with GCC is:
String <<123abc
Length = 11
Character 0 = 0x31
Character 1 = 0x32
Character 2 = 0x33
Character 3 = 0x61
Character 4 = 0x62
Character 5 = 0x63
Character 6 = 0xAC00
Character 7 = 0x31
Character 8 = 0xB098
Character 9 = 0x31
Character 10 = 0xB2E4
Character 11 = 0x0
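Note that the string is truncated at the first non-ASCII character. I thought that was probably a bug in the system, but it seems a little unlikely that I'd manage to find one on my first attempt at using wprintf(), so it is more likely I'm doing something wrong. One possibility is that the program never calls setlocale(), so the stream is still converting wide characters under the "C" locale.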
You're right: in the multi-byte UTF-8 source code, the string occupies 17 bytes (8 one-byte ASCII characters, and 3 characters each encoded using 3 bytes). So, a raw strlen() on the source string would return 17.
GCC version is:
i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Just for giggles, I tried clang, and I get a different result. Compiled using:
clang -o row row.c -Wall -std=c99
using:
Apple clang version 2.1 (tags/Apple/clang-163.7.1) (based on LLVM 3.0svn)
Target: x86_64-apple-darwin11.3.0
Thread model: posix
The output when compiled with clang is:
String <<123abc가1나1다>>
Length = 17
Character 0 = 0x31
Character 1 = 0x32
Character 2 = 0x33
Character 3 = 0x61
Character 4 = 0x62
Character 5 = 0x63
Character 6 = 0xEA
Character 7 = 0xB0
Character 8 = 0x80
Character 9 = 0x31
Character 10 = 0xEB
Character 11 = 0x82
Character 12 = 0x98
Character 13 = 0x31
Character 14 = 0xEB
Character 15 = 0x8B
Character 16 = 0xA4
Character 17 = 0x0
So, now the string appears correctly, but the length is given as 17 instead of 11. Superficially, you can take your choice of bugs - string looks OK (in a terminal - /Applications/Utilities/Terminal - acclimatized to UTF8) but length is wrong, or length is right but string does not appear correctly.
I note that sizeof(wchar_t) in both gcc and clang is 4.
The left hand does not understand what the right hand is doing. I think there's a case for claiming both are broken, in different ways.
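The two numbers are easy to reproduce outside of C. The Python sketch below is only an illustration of the arithmetic, not of either compiler's behaviour: counting code points gives the 11 reported by GCC and VC++, and counting UTF-8 bytes gives the 17 reported by clang. The Hangul characters are written as escapes so the snippet does not depend on the encoding of the source file.

```python
# "123abc" + U+AC00 + "1" + U+B098 + "1" + U+B2E4, the string from the question
s = "123abc\uac001\ub0981\ub2e4"

code_points = len(s)                   # one unit per character
utf8_bytes = len(s.encode("utf-8"))    # 8 ASCII bytes + 3 characters * 3 bytes

print(code_points, utf8_bytes)         # prints: 11 17

# The first Hangul character as a code point and as UTF-8 bytes, matching
# 0xAC00 in the GCC listing and EA B0 80 in the clang listing.
print(hex(ord(s[6])))                  # prints: 0xac00
print(s[6].encode("utf-8").hex())      # prints: eab080
```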