I have got sorted file that looks like this:
2019 02 09 07 00
2019 02 09 07 00
2019 02 09 08 00
2019 02 09 08 00
2019 02 09 08 00
2019 02 09 08 00
2019 02 09 08 00
However, when I run uniq -c over that file, it doesn't count occurrences as one would expect:
1 2019 02 09 07 00
1 2019 02 09 07 00
1 2019 02 09 08 00
1 2019 02 09 08 00
1 2019 02 09 08 00
1 2019 02 09 08 00
1 2019 02 09 08 00
The desired output should look like this:
2 2019 02 09 07 00
5 2019 02 09 08 00
I am looking for portable, POSIX compliant solution in shell. Thanks!
Alright, I have found the solution. I was processing the file by line and indeed uniq was working with only the line and not the file as a whole.
Related
I'm triying to find a string in a txt format and each time it's found then look for an specific string to change for another string.
Imagine the next hexa txt:
02 11 86 05 01 01 01 a0 11 60 0f 80 02 07 80 a1
09 06 07 04 00 00 01 00 1d 03 8a 02 01 2a 02 01
b7 09 01 27 30 22 a0 0a 80 08 33 04 03 92 22 14
00 11 86 05 01 01 01 a0 11 60 0f 80 02 07 80 a1
09 06 07 04 00 00 01 00 1d 03 8a 02 01 2a 02 01
b7 09 01 27 30 22 a0 0a 80 08 33 04 03 92 22 14
I need that each time I encounter a 2a sequence to look for 09 01 sequence and replace with 03 02.
Expected output:
02 11 86 05 01 01 01 a0 11 60 0f 80 02 07 80 a1
09 06 07 04 00 00 01 00 1d 03 8a 02 01 2a 02 01
b7 03 02 27 30 22 a0 0a 80 08 33 04 03 92 22 14
00 11 86 05 01 01 01 a0 11 60 0f 80 02 07 80 a1
09 06 07 04 00 00 01 00 1d 03 8a 02 01 2a 02 01
b7 03 02 27 30 22 a0 0a 80 08 33 04 03 92 22 14
Im triying something this:
sed -i 's/09 01\(.*2a\)/03 02/g' packet.txt
I would do this with awk:
$ awk ' { for ( i = 1; i <= NF; ++i ) {
if ( $i == "2a" )
r = 1
if ( r && $i == "09" && $(i+1) == "01" ) {
r = 0
$i = "03"
$++i = "02"
}
}
}
1 ' hexa.txt > hexa.txt.modified
Grep the differences:
$ sdiff hexa.txt hexa.txt.modified
02 11 86 05 01 01 01 a0 11 60 0f 80 02 07 80 a1 02 11 86 05 01 01 01 a0 11 60 0f 80 02 07 80 a1
09 06 07 04 00 00 01 00 1d 03 8a 02 01 2a 02 01 09 06 07 04 00 00 01 00 1d 03 8a 02 01 2a 02 01
b7 09 01 27 30 22 a0 0a 80 08 33 04 03 92 22 14 | b7 03 02 27 30 22 a0 0a 80 08 33 04 03 92 22 14
00 11 86 05 01 01 01 a0 11 60 0f 80 02 07 80 a1 00 11 86 05 01 01 01 a0 11 60 0f 80 02 07 80 a1
09 06 07 04 00 00 01 00 1d 03 8a 02 01 2a 02 01 09 06 07 04 00 00 01 00 1d 03 8a 02 01 2a 02 01
b7 09 01 27 30 22 a0 0a 80 08 33 04 03 92 22 14 | b7 03 02 27 30 22 a0 0a 80 08 33 04 03 92 22 14
Assuming you mean: "only replace if it occurs after 2a", then you can do it by transforming the bytes, so that only one 2a occurs on each line, e.g.:
<hexa.txt tr '\n' ' ' | sed 's/2a/\n&/g'
Now all you need to do is only replace 09 01 when the line starts with 2a, e.g.:
sed -E 's/(^2a.*) 09 01/\1 03 02/'
Now go back to the original formatting, i.e. 16 bytes per line:
tr '\n' ' ' | xargs -n16
All together:
<hexa.txt tr '\n' ' ' | sed 's/2a/\n&/g' |
sed -E 's/(^2a.*) 09 01/\1 03 02/' |
tr '\n' ' ' | xargs -n16
Output:
02 11 86 05 01 01 01 a0 11 60 0f 80 02 07 80 a1
09 06 07 04 00 00 01 00 1d 03 8a 02 01 2a 02 01
b7 03 02 27 30 22 a0 0a 80 08 33 04 03 92 22 14
00 11 86 05 01 01 01 a0 11 60 0f 80 02 07 80 a1
09 06 07 04 00 00 01 00 1d 03 8a 02 01 2a 02 01
b7 03 02 27 30 22 a0 0a 80 08 33 04 03 92 22 14
If this can help,
cat *.txt | sed '/2a/s/09 01/02 03/g'
Alternative awk solution using GNU awk:
awk 'BEGIN { RS="2a" } { ORS=RS } $0 ~ /09 01/ { $0=gensub("09 01","03 02","g",$0)}1' file
Set 2a as the record separator. Check each record for "09 01". If it exists, replace "09 01" with "03 02" with the gensub function and set this as $0. Use short hand 1 to print the record after setting the output record separator the same as the record separator.
This might work for you (GNU sed):
sed -zE 's/^/\x00/ # introduce a unique delimiter
:a;/\x00$/{s///;b} # remove delimiter at end-of-file
/\x002a/!{s/\x00(.)/\1\x00/;ba} # if not 2a pass over next char
s//2a\x00/ # next char is 2a prep for next string
:b;/\x00$/ba # is it end of file
/\x0009(\s)01/{s//03\102\x00/;ba} # replace string and prep for 2a again
s/\x00(.)/\1\x00/;bb' file # not desired string so pass over char
Since the desired string (in this case09 01) may occur on another line or within in the same line twice or more, line processing is not feasible. Processing must be at character level and in this solution the entire file is processed as one string (see -z option).
Two cases are identified:
The key (in this case 2a), processing within the place holder :a.
The string to be replaced (09 01 with 03 02), processing within the place holder :b.
Once the key is identified, processing passes to the next case. Once the desired string has been replaced, processing is passed back the first case. Either case can terminate processing when the end-of-file is encountered.
N.B. The solution relies on the file not containing the null character hex 00.
I am currently using C# to retrieve frames from a borescope (via the FFMPEG library). However, I came across a problem weeks ago and I can't solve it.
The images are returned in JPEG format (since the borescope stream is MJPEG).
Some images come without quality problems, but others come with a strange line in the middle
followed by random staining. (At the end of the question there is an example of a normal image and one with problems).
Analyzing the structure of the files, I realized that there are some differences, but I don't really understand JPEG's binary structure very well, and I can't tell what is corrupted.
Getting to know what is corrupted in the image, which culminates in the quality problem, is very important to me because, through this, I can discard the frame using C#. However, without understanding this problem, I can't even discard the frame, much less fix it.
So, having the image without quality problems as a reference, what is the problem with the binary structure of the image with quality problems?
Examples:
JPEG 1: Image without quality problems
Image's preview (just to see the quality, do not download from here)
JPEG 2: Image with quality problems
Image's preview (just to see the quality, do not download from here)
It's possible to look into binary structure of images through online HEX editors like: Online hex editor, Hexed or Hex-works.
Thank you for reading and have a nice day.
There are at least 2 issues with the file.
The first I can detect with ImageMagick by running this command:
magick identify -verbose image.jpg
and it tells me that the data segment ends prematurely.
Image: outExemplo0169.jpeg
Format: JPEG (Joint Photographic Experts Group JFIF format)
Mime type: image/jpeg
Class: DirectClass
Geometry: 640x480+0+0
Units: Undefined
Colorspace: sRGB
Type: TrueColor
Base type: Undefined
Endianess: Undefined
Depth: 8-bit
Channel depth:
Red: 8-bit
Green: 8-bit
Blue: 8-bit
Channel statistics:
Pixels: 307200
Red:
min: 0 (0)
max: 255 (1)
mean: 107.234 (0.420527)
standard deviation: 66.7721 (0.261851)
kurtosis: -0.67934
skewness: 0.577494
entropy: 0.92876
Green:
min: 0 (0)
:2020-02-26T18:59:19+00:00 0:00.057 0.070u 7.0.9 Resource identify[80956]: resource.c/RelinquishMagickResource/1067/Resource
Memory: 3686400B/0B/32GiB
identify: Corrupt JPEG data: premature end of data segment `outExemplo0169.jpeg' # warning/jpeg.c/JPEGWarningHandler/399.
The second I can see with exiftool when I run this command:
exiftool -v -v -v outExemplo0169.jpeg
ExifToolVersion = 11.11
FileName = outExemplo0169.jpeg
Directory = .
FileSize = 66214
FileModifyDate = 1582743337
FileAccessDate = 1582743559
FileInodeChangeDate = 1582743337
FilePermissions = 33188
FileType = JPEG
FileTypeExtension = JPG
MIMEType = image/jpeg
JPEG APP0 (14 bytes):
0006: 4a 46 49 46 00 01 01 00 00 01 00 01 00 00 [JFIF..........]
+ [BinaryData directory, 9 bytes]
| JFIFVersion = 1 1
| - Tag 0x0000 (2 bytes, int8u[2]):
| 000b: 01 01 [..]
| ResolutionUnit = 0
| - Tag 0x0002 (1 bytes, int8u[1]):
| 000d: 00 [.]
| XResolution = 1
| - Tag 0x0003 (2 bytes, int16u[1]):
| 000e: 00 01 [..]
| YResolution = 1
| - Tag 0x0005 (2 bytes, int16u[1]):
| 0010: 00 01 [..]
| ThumbnailWidth = 0
| - Tag 0x0007 (1 bytes, int8u[1]):
| 0012: 00 [.]
| ThumbnailHeight = 0
| - Tag 0x0008 (1 bytes, int8u[1]):
| 0013: 00 [.]
JPEG SOF0 (15 bytes):
0018: 08 01 e0 02 80 03 01 21 00 02 11 01 03 11 01 [.......!.......]
ImageWidth = 640
ImageHeight = 480
EncodingProcess = 0
BitsPerSample = 8
ColorComponents = 3
JPEG DQT (130 bytes):
002b: 00 03 03 03 03 03 03 04 03 03 03 04 04 04 05 06 [................]
003b: 09 06 06 05 05 06 0c 08 09 07 09 0e 0c 0e 0e 0d [................]
004b: 0c 0d 0d 0f 11 15 12 0f 10 14 10 0d 0d 13 19 13 [................]
005b: 14 16 17 18 18 18 0f 12 1a 1c 1a 17 1c 15 17 18 [................]
006b: 17 01 04 04 04 06 05 06 0b 06 06 0b 17 0f 0d 0f [................]
007b: 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 [................]
008b: 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 [................]
[snip 18 bytes]
JPEG DHT (416 bytes):
00b1: 00 00 01 05 01 01 01 01 01 01 00 00 00 00 00 00 [................]
00c1: 00 00 01 02 03 04 05 06 07 08 09 0a 0b 10 00 02 [................]
00d1: 01 03 03 02 04 03 05 05 04 04 00 00 01 7d 01 02 [.............}..]
00e1: 03 00 04 11 05 12 21 31 41 06 13 51 61 07 22 71 [......!1A..Qa."q]
00f1: 14 32 81 91 a1 08 23 42 b1 c1 15 52 d1 f0 24 33 [.2....#B...R..$3]
0101: 62 72 82 09 0a 16 17 18 19 1a 25 26 27 28 29 2a [br........%&'()*]
0111: 34 35 36 37 38 39 3a 43 44 45 46 47 48 49 4a 53 [456789:CDEFGHIJS]
[snip 304 bytes]
JPEG SOS
JPEG EOI
Unknown trailer (50 bytes at offset 0x10274):
10274: 42 6f 75 6e 64 61 72 79 45 42 6f 75 6e 64 61 72 [BoundaryEBoundar]
10284: 79 53 00 00 01 00 90 08 01 00 fb 4b db 6a 2a 22 [yS.........K.j*"]
10294: 00 00 2a 22 00 00 01 00 01 00 80 02 00 00 e0 01 [..*"............]
102a4: 00 00
So there are 50 extraneous bytes at the end including the text string "BoundaryEBoundaryS" which may be recognisable to you as coming from somewhere else in your processing chain?
One test you could do for JPEG quality is check the last 2 bytes are a valid EOI which means it should end in FF D9 - see here.
I have these 2 arrays representing hexadecimal numbers and I want to write to a file in binary format.
I convert to hexadecimal string like this:
a=["A2","48","04","03","EE","72","B4","6B"]
b=["1A","28","18","06","07","00","11","86","05","01","01","01","A0"]
hex_string1 = a.map{|b| b.to_i(16)}.pack("C*")
hex_string2 = b.map{|b| b.to_i(16)}.pack("C*")
Now I want to write to a file the hex_string2 first and then prepend (with offset "0") the hex_string1 to the file.
I'm proceeding like this but the output is incorrect.
File.binwrite("outfile.bin",hex_string2)
File.binwrite("outfile.bin",hex_string1,0)
The current output is:
A2 48 04 03 EE 72 B4 6B 05 01 01 01 A0
And the correct content within the "output.bin" would be like this:
A2 48 04 03 EE 72 B4 6B 1A 28 18 06 07 00 11 86 05 01 01 01 A0
How would be the way to do this?
You should write second string with the offset by the size of first string:
File.binwrite("outfile.bin",hex_string2,hex_string1.size)
File.binwrite("outfile.bin",hex_string1,0)
In this case you'll get exactly what you want:
A2 48 04 03 EE 72 B4 6B 1A 28 18 06 07 00 11 86 05 01 01 01 A0
I am missing some episodes of the TV series friends, and I would like to know how many files I am missing per season. I would like to print out the last episode of each season and the number of files for each season.
The files have the format:
Friends S01E01 The Pilot.mkv
Friends S10E11 The One Where the Stripper Cries.mkv
The following bash script/oneliner should give you what you need, with details because it might help if you have the last episode of a season but earlier episodes are missing:
#!/bin/bash
ls Friends* | cut -c10-14 | \
awk -F'E' '{arr[$1]=arr[$1]" "$2; num[$1]++;} END { for (i in arr) printf "Season %s (%2d files) : %s\n", i, num[i], arr[i] }' | \
sort
Using awk, arrays with index being the number of the season are incremented to count the number of episodes, and also print the list of episode numbers so you can easily see which ones are missing. I used cut with columns 10 to 14 because in this case, we can safely assume that the numbers are where we want them.
The output is as follows:
Season 01 ( 9 files) : 01 02 03 04 05 06 07 08 09
Season 02 (10 files) : 01 02 03 04 05 06 07 08 09 10
Season 03 (10 files) : 01 02 03 04 05 06 07 08 09 10
Season 04 (10 files) : 01 02 03 04 05 06 07 08 09 10
Season 05 ( 9 files) : 01 03 04 05 06 07 08 09 10
Season 06 (10 files) : 01 02 03 04 05 06 07 08 09 10
Season 07 (10 files) : 01 02 03 04 05 06 07 08 09 10
Season 08 (10 files) : 01 02 03 04 05 06 07 08 09 10
Season 09 (10 files) : 01 02 03 04 05 06 07 08 09 10
Season 10 ( 7 files) : 01 02 03 04 05 06 10
The following bash scipt will work:
#!/bin/bash
for i in {01..10}
do
ls Friends\ S$i* | tail -n 1
ls Friends\ S$i* | wc -l
printf "\n"
done
It will produce results as follows:
Friends S01E24 The One Where Rachel Finds Out.mkv
24
Friends S02E24 The One with Barry and Mindy's Wedding.mkv
24
I am communicating with a servo via RS232 serial. The built-in functions that came with my servo are too slow (25 ms for a simple 54 byte message on a 57,600 baud port), so I am trying to write my own communication functions, however the built-in functions are not documented. I have used a port monitor to determine what information is being sent to the servo and I need help deciphering the results.
I used the built-in functions to command the servo to "goto" incrementally increasing steps (1, 2, 3, etc.). This resulted 5 packets being sent to the servo for each "goto" command. The first 4 packets are identical for each "goto" command. I have attached about 50 hex packet below (1 per line). If you need more, post, and we can work something out.
10 13 04 20 00 01 B6 24 E9 68
10 13 04 20 00 00 AE 24 54 82
10 13 04 20 00 00 B5 24 8B 0B
10 13 04 20 00 01 43 01 71 9B
The 5th packet varies based on the step the motor is being commanded to move to. I have included 1 packet here as an example. I have attached a file with about 1000 of these packets (1 per line).
10 13 08 20 03 01 11 25 0A 00 00 00 81 CF
The first 8 bytes of this packet (10 13 08 20 03 01 11 25) appear to be the actual "goto" command. They remain the same no matter what step is specified.
The last 6 bytes (0A 00 00 00 81 CF) change based upon the step that is requested. In the file I attached, I instructed the servo to initially goto step "0", then "1", "2", etc. The first 4 bytes appear to be a little-endian integer corresponding to the number of steps (i.e. the sample command I showed above instructs the servo to goto step 10 decimal).
My question regards the last 2 bytes of the command. They appear to vary randomly, but whenever the specified step is the same they match. This leads me to believe that these 2 bytes are a checksum of some kind. My question to you is: how is the checksum calculated?
I have already tried xor'ing all the bytes, both singly and in 2 byte pairs, and I tried Fletcher's checksum, and a simple checksum (sum of all bytes). I also checked the 2's complement of each of these methods (though I certainly wouldn't mind someone checking to make sure I didn't make a mistakes in the calculations). Does anyone have any ideas?
10 13 08 20 03 01 11 25 00 00 00 00 E9 64
10 13 08 20 03 01 11 25 01 00 00 00 9F D0
10 13 08 20 03 01 11 25 02 00 00 00 04 0C
10 13 08 20 03 01 11 25 04 00 00 00 23 95
10 13 08 20 03 01 11 25 05 00 00 00 55 21
10 13 08 20 03 01 11 25 06 00 00 00 CE FD
10 13 08 20 03 01 11 25 07 00 00 00 B8 49
10 13 08 20 03 01 11 25 08 00 00 00 6C A7
10 13 08 20 03 01 11 25 09 00 00 00 1A 13
10 13 08 20 03 01 11 25 0A 00 00 00 81 CF
10 13 08 20 03 01 11 25 0C 00 00 00 A6 56
10 13 08 20 03 01 11 25 0D 00 00 00 D0 E2
10 13 08 20 03 01 11 25 0F 00 00 00 3D 8A
10 13 08 20 03 01 11 25 10 10 00 00 00 17 FA
10 13 08 20 03 01 11 25 11 00 00 00 84 77
10 13 08 20 03 01 11 25 12 00 00 00 1F AB
10 13 08 20 03 01 11 25 13 00 00 00 69 1F
10 13 08 20 03 01 11 25 14 00 00 00 38 32
10 13 08 20 03 01 11 25 15 00 00 00 4E 86
10 13 08 20 03 01 11 25 16 00 00 00 D5 5A
10 13 08 20 03 01 11 25 17 00 00 00 A3 EE
10 13 08 20 03 01 11 25 18 00 00 00 77 00
10 13 08 20 03 01 11 25 19 00 00 00 01 B4
10 13 08 20 03 01 11 25 1A 00 00 00 9A 68
10 13 08 20 03 01 11 25 1B 00 00 00 EC DC
10 13 08 20 03 01 11 25 1C 00 00 00 BD F1
10 13 08 20 03 01 11 25 1D 00 00 00 CB 45
10 13 08 20 03 01 11 25 1E 00 00 00 50 99
10 13 08 20 03 01 11 25 1F 00 00 00 26 2D
10 13 08 20 03 01 11 25 20 00 00 00 DE 2A
10 13 08 20 03 01 11 25 21 00 00 00 A8 9E
10 13 08 20 03 01 11 25 22 00 00 00 33 42
10 13 08 20 03 01 11 25 24 00 00 00 14 DB
10 13 08 20 03 01 11 25 25 00 00 00 62 6F
10 13 08 20 03 01 11 25 26 00 00 00 F9 B3
10 13 08 20 03 01 11 25 27 00 00 00 8F 07
10 13 08 20 03 01 11 25 28 00 00 00 5B E9
10 13 08 20 03 01 11 25 29 00 00 00 2D 5D
10 13 08 20 03 01 11 25 2A 00 00 00 B6 81
10 13 08 20 03 01 11 25 2B 00 00 00 C0 35
10 13 08 20 03 01 11 25 2C 00 00 00 91 18
10 13 08 20 03 01 11 25 2D 00 00 00 E7 AC
10 13 08 20 03 01 11 25 2E 00 00 00 7C 70
10 13 08 20 03 01 11 25 2F 00 00 00 0A C4
10 13 08 20 03 01 11 25 30 00 00 00 C5 8D
10 13 08 20 03 01 11 25 31 00 00 00 B3 39
10 13 08 20 03 01 11 25 32 00 00 00 28 E5
10 13 08 20 03 01 11 25 33 00 00 00 5E 51
10 13 08 20 03 01 11 25 34 00 00 00 0F 7C
10 13 08 20 03 01 11 25 35 00 00 00 79 C8
10 13 08 20 03 01 11 25 36 00 00 00 E2 14
10 13 08 20 03 01 11 25 37 00 00 00 94 A0
10 13 08 20 03 01 11 25 38 00 00 00 40 4E
10 13 08 20 03 01 11 25 39 00 00 00 36 FA
10 13 08 20 03 01 11 25 3A 00 00 00 AD 26
10 13 08 20 03 01 11 25 3B 00 00 00 DB 92
10 13 08 20 03 01 11 25 3C 00 00 00 8A BF
10 13 08 20 03 01 11 25 3D 00 00 00 FC 0B
10 13 08 20 03 01 11 25 3E 00 00 00 67 D7
10 13 08 20 03 01 11 25 3F 00 00 00 11 63
this is a late answer, but hopefully this can help for other CRC re-engineering tasks:
Your CRC is a derivation of the so-called "16 bit width CRC as designated by CCITT", but with "init value zero".
The CRC is calculated from byte position 3 to byte position 12 of your example data. e.g.
08 20 03 01 11 25 00 00 00 00
The full CRC specification according to our CRC specification overview is:
CRC:16,1021,0000,0000,No,No
The problem was not only to find the right CRC polynomial, but finding the following answers:
Which part of the data is included in the CRC calculation, and which is not.
Which init value to use? Apply final xor value?
Does this algorithm expect reflected input or output values?
Again, see our manual description or the Boost CRC library on what this means.
What I did is running a brute-force script that simply tries out several popular 16 bit CRC polynomials with all kinds of combinations of start/end positions, initial values, reflected versions. Here is how the processing output looked:
Finding CRC for test message (HEX): 10 13 08 20 03 01 11 25 00 00 00 00 E9 64
Trying CRC spec : CRC:16,1021,FFFF,0000,No,No
Trying CRC spec : CRC:16,8005,0000,0000,No,No
Trying CRC spec : CRC:16,8005,FFFF,0000,No,No
Trying CRC spec : CRC:16,1021,FFFF,FFFF,No,No
Trying CRC spec : CRC:16,1021,0000,FFFF,No,No
Trying CRC spec : CRC:16,1021,0000,0000,No,No
Found it!
Relevant sequence for checksum from startpos=3 to endpos=12
08 20 03 01 11 25 00 00 00 00
CRC spec: CRC:16,1021,0000,0000,No,No
CRC result: E9 64 (Integer = 59748)
With the result I could re-calculate the checksum of your example telegrams correctly
19.09.2016 12:18:12.764 [TX] - 10 13 08 20 03 01 11 25 00 00 00 00 E9 64
19.09.2016 12:18:14.606 [TX] - 10 13 08 20 03 01 11 25 01 00 00 00 9F D0
19.09.2016 12:18:16.030 [TX] - 10 13 08 20 03 01 11 25 02 00 00 00 04 0C
I uploaded the documented CRC finder example script which works with the free Docklight Scripting V2.2 evaluation. I assume this can be very useful for other CRC re-engineering puzzles, too.
The example also helped to solve Stackoverflow question 22219796