Removing duplicate blocks of lines from a file - bash

I have a certain file structure like this
>ID1
data about ID1....
................
................
>ID2
data about ID2....
................
................
................
................
>ID3
data about ID3....
................
................
...............
>ID1
data about ID1....
................
>ID5
data about ID5....
................
................
I want to remove the blocks whose ID has already appeared; in the example above that is ID1. Note that only the ID line is the same, the data after it may differ. I want to keep the first block for each ID and remove all later ones. How can I do this in a shell script?

In awk
awk '/^>/{p=!($0 in a);a[$0]}p' file1
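For readers who find the one-liner terse, here is the same logic spelled out as a minimal Python sketch. The function name and the sample data are made up for illustration. Like the awk version, it keeps a block only if its header line has not been seen before, and it drops any lines that precede the first header.

```python
def drop_duplicate_blocks(lines):
    """Keep only the first block for each '>' header line."""
    seen = set()      # header lines encountered so far (awk: array a)
    keep = False      # whether the current block is printed (awk: p)
    kept = []
    for line in lines:
        if line.startswith(">"):
            header = line.rstrip("\n")
            keep = header not in seen
            seen.add(header)
        if keep:
            kept.append(line)
    return kept


if __name__ == "__main__":
    sample = [">ID1\n", "a\n", ">ID2\n", "b\n", ">ID1\n", "c\n", ">ID5\n", "d\n"]
    print("".join(drop_duplicate_blocks(sample)), end="")
```

To run it on a real file, pass an open file object instead of the sample list, since iterating over a file yields its lines.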


Ruby process: broken /proc/self/environ

There is a really strange situation in my Ruby-based processes: their /proc/self/environ is really broken. For some reason, the ENV inside Ruby looks fine but I'd like to understand what's going on.
Processes are started using bundle exec, for example bundle exec sidekiq. The end result is that the /proc/<pid>/environ file only contains a couple of bytes (usually 4) of the invoked command plus a bunch of zeroes. In the example above, the environ file would look like
$ sudo hexdump -C /proc/2613895/environ
00000000 65 6b 69 71 00 00 00 00 00 00 00 00 00 00 00 00 |ekiq............|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000390 00 00 00 00 00 00 00 |.......|
00000397
When invoking another command line, for example a rake task, the environ file would contain the last couple of characters of the rake task name.
Since the environ file cannot be modified by the process itself after it starts, it must have been set by whoever made the execve call, but I am at a loss as to who might be responsible and why.
This seems to mostly happen when processes are started through systemd, but none of the other processes started by systemd show the same behaviour; only the ones started through bundle exec so I'm thinking that it's not related to systemd.
The /proc/$pid/environ file normally only contains the environment passed to the process when it was created. It does not reflect any changes to its environment the process may have made after it began execution. Furthermore, it simply exposes the portion of the stack of the process which contained the original env vars. If the process modifies those stack locations that will be reflected in the content of that procfs file. See, for example, https://unix.stackexchange.com/questions/302948/change-proc-pid-environ-after-process-start.
I would complain to the Ruby team since they shouldn't be stomping on the original env var stack locations.
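To check whether a given environ file still holds a usable environment, the raw bytes can be split on NUL and parsed as KEY=VALUE pairs. This is only a rough sketch: the function name is hypothetical, and entries without an "=" (such as the leftover "ekiq" bytes in the question) are simply skipped.

```python
def parse_environ(raw):
    """Parse the NUL-separated KEY=VALUE entries of an environ file."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" not in entry:
            continue  # empty entry or overwritten garbage, not a variable
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


if __name__ == "__main__":
    print(parse_environ(b"PATH=/usr/bin\0HOME=/root\0"))
    # An overwritten region like the one in the question yields no variables.
    print(parse_environ(b"ekiq" + b"\0" * 12))
```

On Linux the input could come from reading /proc/<pid>/environ in binary mode, which requires permission to read that process's procfs entries.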

File format for Macos Finder Alias File Version 3

I'm trying to read the LoginItems preferences file to change a network volume that a user has in their login items and that mounts when they log in.
I can read the login items using NSUserDefaults and getting it as an NSDictionary. The problem is that the volume mount details are stored as binary data. The data appears to be an Alias file format except it's missing the header: 62 6F 6F 6B 00 00 00 00 6D 61 72 6B 00 00 00 00
I've tried using the current NSURL resourceValuesForKeys:fromBookmarkData: but it did not return the remote share. I realised that the machine they were created on is a 10.6 machine and these calls were only made available in 10.11.
So I tried using the old deprecated Core Carbon libraries PTRToHand and FSCopyAliasInfo to get information out of this file but all it says is that it is an Alias pointing to /Volumes/ShareMount and makes no mention of which server it is trying to connect to.
The binary data does show the string smb mount for the server.
0000000: 0000 0000 00a0 0003 0001 0000 0000 0000 ................
0000010: 0000 4244 6375 0001 ffff ffff ffff ffff ..BDcu..........
0000020: 0000 0000 0000 0000 0000 1201 fffe 0000 ................
0000030: 0000 0000 0000 ffff ffff 000e 000a 0004 ................
0000040: 006d 0061 0072 006b 000f 000a 0004 006d .m.a.r.k.......m
0000050: 0061 0072 006b 0012 0000 0013 000d 2f56 .a.r.k......../V
0000060: 6f6c 756d 6573 2f6d 6172 6b00 0009 002b olumes/mark....+
0000070: 002b 6369 6673 0000 0100 0000 736d 623a .+cifs......smb:
0000080: 2f2f 6d61 726b 4031 302e 3130 312e 3232 //mark@10.101.22
0000090: 322e 3138 352f 6d61 726b 0000 ffff 0000 2.185/mark......
I've compared this data with the Wikipedia article on macOS aliases and with a Python module that reads aliases. Both of them specify that the version number should be 2; however, the byte in this position (byte 6) is 3, and after this byte the layout of my data no longer matches what they describe.
Does anyone know about this Version 3 Alias format?
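For what it's worth, the two facts visible in the dump can be pulled out without understanding the rest of the format. The sketch below is not a version 3 parser. It assumes the first 8 bytes are laid out as in the documented version 2 header (4-byte creator code, 2-byte record size, 2-byte version, all big-endian), which matches the dump's 00a0 (160 bytes) and 0003, and it simply searches the raw bytes for the smb:// string. Both function names are made up.

```python
import re
import struct


def alias_header(data):
    """Return (record_size, version), assuming the v2-style 8-byte header."""
    _creator, size, version = struct.unpack(">4sHH", data[:8])
    return size, version


def find_smb_url(data):
    """Return the first smb:// URL found in the raw bytes, or None."""
    match = re.search(rb"smb://[\x21-\x7e]+", data)
    return match.group().decode("ascii") if match else None


if __name__ == "__main__":
    # First 8 bytes of the dump, then the bytes from offset 0x7c onwards.
    sample = bytes.fromhex(
        "0000000000a00003"
        "736d623a2f2f6d61726b4031302e3130312e3232"
        "322e3138352f6d61726b0000ffff0000"
    )
    print(alias_header(sample))
    print(find_smb_url(sample))
```

Searching for the string is brittle, since it relies on the URL being stored as plain ASCII followed by a NUL, but that is how it appears in this particular dump.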
I've found two web pages that seem to document the exact same problem I'm having:
Reversing Mac Alias v3 Data Objects (only available via the Wayback Machine). The author has only parsed one file, and an SMB mount alias is slightly different.
Apple's Bookmark Data Exposed, which pretty much describes the steps I have followed myself before ending up asking this question.
Thanks

Replace bootloader on sama5d3 from the running linux system

I'd like to replace the first stage bootloader in the nand flash on a sama5d36 based system running 4.1.0-linux4sam_5.1 and buildroot-2016.02.
I can replace the kernel image with flashcp just fine, but when I try it with the bootloader, flashcp runs without errors, but the system doesn't boot afterwards, stays at the ROMBOOT prompt.
buildroot:~# flashcp -v at91bootstrap.bin /dev/mtd0
Erasing block: 1/1 (100%)
Writing kb: 14/14 (100%)
Verifying kb: 14/14 (100%)
buildroot:~# reboot
[...]
Sent SIGKILL to all processes
Requesting system reboot
�RomBOOTRestarting system
Then I can write the same bootloader image with sam-ba, and it will boot, so the image is good. How can it be flashed in Linux, without user intervention?
There should be a 208 byte header preceding the actual boot code at the beginning of the flash.
From the SAMA5D3 datasheet (which I should have read before posting the question):
After Initialization and Reset command, the Boot Program reads the first page without an ECC check, to determine if the NAND parameter header is present. The header is made of 52 times the same 32-bit word (for redundancy reasons) which must contain NAND and PMECC parameters used to correctly perform the read of the rest of the data in the NAND.
The header is of course there when I dump the contents of the boot sector
buildroot:~# hd < /dev/mtd0 | head -4
00000000 05 24 90 c0 05 24 90 c0 05 24 90 c0 05 24 90 c0 |.$...$...$...$..|
*
000000d0 0e 00 00 ea 05 00 00 ea 05 00 00 ea 05 00 00 ea |................|
000000e0 05 00 00 ea cc 3b 00 00 06 00 00 ea 06 00 00 ea |.....;..........|
The first four bytes are repeated over and over, and the ARM jump table begins at offset 0xD0 (= 208 = 52 * 4).
sam-ba takes care of this header when it writes the boot sector, but the Linux mtd driver and flashcp treat it as ordinary data, so I have to supply it myself.
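A minimal sketch of supplying the header by hand: repeat the 32-bit word 52 times and put the result in front of the boot code. The word used here (05 24 90 c0) is just the one from the dump above and encodes the NAND and PMECC parameters of that particular board, so it has to be taken from your own flash or datasheet. The function and the output file name are hypothetical.

```python
HEADER_REPEATS = 52  # the datasheet says the 32-bit word is stored 52 times


def with_nand_header(header_word, boot_code):
    """Prepend the 208-byte NAND parameter header to the boot code."""
    if len(header_word) != 4:
        raise ValueError("header word must be exactly 4 bytes")
    return header_word * HEADER_REPEATS + boot_code


if __name__ == "__main__":
    # Header word as it appears in the dump of /dev/mtd0 (board specific).
    word = bytes.fromhex("052490c0")
    with open("at91bootstrap.bin", "rb") as src:
        image = with_nand_header(word, src.read())
    # Hypothetical output name; this is the file one would then flash.
    with open("at91bootstrap_with_header.bin", "wb") as dst:
        dst.write(image)
```

The resulting file would then be written with flashcp in place of the bare at91bootstrap.bin.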

tcpdump: server client communication

I'm capturing the communication between a server and a client with tcpdump -X. I noticed a pattern and I'm not sure I fully understand it. In the following I have replaced all the header data (IP and TCP, 20 bytes each, so the payload starts at the fifth hex group of the third row in every packet) with an X, and accordingly all the ASCII with dots. The payload, which my question refers to, is still readable.
Below you see the exact pattern that happens every ~15 seconds between the server and the client. If you convert "0500 0000 0000" (payload of first packet) from hex to binary to decimal and look that up in an ascii table you will notice that the first message that the client is sending to the server is "ENQ". Then the server responds with "06". Again if you convert that 06 (hex > binary > ascii table) you notice that the server actually says "ACK". In the third packet the client sends zeros (why???). After that there's silence and after 15 seconds this exact pattern repeats itself.
So here are my questions:
Is this a known pattern and what is it good for? Must be some sort of "hey, you still there buddy?" communication (please confirm if true?). I'm quite new to exploring network communication. But why does the ENQ come with trailing zeros (first packet)? I mean the ACK (second packet) only comes in with 1 byte which is sufficient and makes sense. I would expect the same for the ENQ? And why does the client send zeros in the third packet before the pattern repeats itself? What's the purpose of that?
10:22:10.579188 IP CLIENT > SERVER
0x0000: XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX ................
0x0010: XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX ................
0x0020: XXXX XXXX XXXX XXXX 0500 0000 0000 ..............
10:22:10.579360 SERVER > CLIENT
0x0000: XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX ................
0x0010: XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX ................
0x0020: XXXX XXXX XXXX XXXX 06 .........
10:22:10.779322 CLIENT > SERVER
0x0000: XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX ................
0x0010: XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX ................
0x0020: XXXX XXXX XXXX XXXX 0000 0000 0000 ..............
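The lookup described above (hex to ASCII table) can be done mechanically. This is a small sketch with a hand-written table that covers only the three control codes appearing in these payloads; anything else is shown as hex. The function name is made up.

```python
# Names of the ASCII control codes that occur in the payloads above.
CONTROL_NAMES = {0x00: "NUL", 0x05: "ENQ", 0x06: "ACK"}


def describe_payload(hex_text):
    """Map each payload byte to its ASCII control name, or to its hex value."""
    payload = bytes.fromhex(hex_text)  # spaces between groups are ignored
    return [CONTROL_NAMES.get(b, "0x%02x" % b) for b in payload]


if __name__ == "__main__":
    for packet in ("0500 0000 0000", "06", "0000 0000 0000"):
        print(packet, "->", describe_payload(packet))
```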
Edited:
These look like keep-alive messages (https://www.rfc-editor.org/rfc/rfc1122#page-101), but normally those are ACK packets with no data or with one octet of garbage data. Your packets are filled with XX, so I can't tell whether they are ACK packets, and only one of them carries a single octet of (probably) garbage data. Could you show the flag byte of each packet (byte number 34)?
BTW, as far as I know there is no packet type named ENQ. When you write ENQ, do you mean a flag or something else?
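To read the flag byte without counting by hand, something like the following sketch could be used. It assumes the bytes start at the IP header (no link-layer header in front), in which case a 20-byte IP header puts the TCP flags at offset 33 counting from zero, that is, the 34th byte. The function name and the test packet are made up.

```python
# TCP flag names, lowest bit first.
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]


def tcp_flags(ip_packet):
    """Return the names of the TCP flags set in a raw IPv4 packet."""
    ip_header_len = (ip_packet[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    flags = ip_packet[ip_header_len + 13]      # flags are byte 13 of the TCP header
    return [name for bit, name in enumerate(FLAG_NAMES) if flags & (1 << bit)]


if __name__ == "__main__":
    # Synthetic packet: 20-byte IP header, 20-byte TCP header, flags = 0x18.
    packet = bytes([0x45]) + bytes(19) + bytes(13) + bytes([0x18]) + bytes(6)
    print(tcp_flags(packet))
```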
Previous answer:
IP and TCP headers do indeed have 20 bytes each (usually), but don't forget about the first 14 bytes, which are the Ethernet header. So a valid TCP/IP packet usually starts with 14 + 20 + 20 = 54 bytes of headers, while your packets have 46 or 41 bytes. They are not TCP packets. Maybe this article will be helpful:
https://www.pacificsimplicity.ca/blog/reading-packet-hex-dumps-manually-no-wireshark
Check the Ethernet header to see which protocol is used (bytes 13-14, compared against http://standards-oui.ieee.org/ethertype/eth.txt).

How to export FTP-data from several packets

I've been trying for hours to solve this, and googling like a maniac as well. How do I export the FTP-data from a bunch of packets? When you export HTTP objects in Wireshark, a few clicks are enough to reassemble all the packets into a single file, and then you can just open the HTML page.
Let's say you downloaded a .zip file (through FTP) and you captured this with Wireshark. Now I want to export all those FTP-data packets containing the .zip file to a copy of the .zip file. How can I do that? I managed to get hexdumps (I think that's what they are called) of all the packets, and they look like this:
0000 00 50 56 ca 11 d8 00 50 18 03 39 80 08 00 45 00 .PV....P..9...E.
0010 04 34 06 34 40 00 2d 06 d3 6f c1 e7 ec 2a c0 a8 .4.4@.-..o...*..
etc...
Maybe I can convert that somehow? Or is there some other way?
You can use Bro to extract files from FTP traffic (and other protocols as well). Simply run it as follows:
bro -r trace.pcap 'FTP::extract_file_types = /.*/'
The pattern controls the MIME type of the files to extract. Change -r <trace> to -i <interface> when sniffing on a network interface. Bro creates log files in the directory from which it is run. In addition to the basic logs, you'll now find files named
ftp-item_<SERVER-IP>:<SERVER-DATA-PORT>-<CLIENT-IP>:<CLIENT-PORT>.dat
which contain the payload of the FTP data.
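As for converting the hexdump text from the question back into bytes, a rough Python sketch is below. It assumes each line has the form shown there (an offset, up to 16 two-digit hex bytes, then the ASCII column), and the function name is made up. Note that this recovers whole frames, including the Ethernet, IP and TCP headers, so the headers would still have to be stripped and the payloads put in order before you get the .zip file back; letting a tool such as Bro do the reassembly is far less error-prone.

```python
import re

OFFSET = re.compile(r"^[0-9a-fA-F]{4,}$")
HEX_BYTE = re.compile(r"^[0-9a-fA-F]{2}$")


def hexdump_to_bytes(text):
    """Convert 'offset  hex bytes  ascii' lines back into raw bytes."""
    out = bytearray()
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 2 or not OFFSET.match(tokens[0]):
            continue  # not a hexdump line
        # At most 16 byte columns follow the offset; stop at the ASCII column.
        for token in tokens[1:17]:
            if not HEX_BYTE.match(token):
                break
            out.append(int(token, 16))
    return bytes(out)


if __name__ == "__main__":
    dump = (
        "0000 00 50 56 ca 11 d8 00 50 18 03 39 80 08 00 45 00 .PV....P..9...E.\n"
        "0010 04 34 06 34 40 00 2d 06 d3 6f c1 e7 ec 2a c0 a8 .4.4@.-..o...*..\n"
    )
    print(hexdump_to_bytes(dump).hex())
```

A short final line whose ASCII column happens to split into two-character hex-looking tokens would be misread, which is one more reason to treat this as a sketch only.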
