I am using git 1.7.2.3 via cygwin on Windows 7 and seeing strange artifacts appearing in some of my source files when switching branches. git status reports everything as unchanged yet they crazy characters are present. I've confirmed on GitHub that the files are as they should be in the repo.
My Copy:
⼀⼀⼀ 㰀猀甀洀洀愀爀礀㸀ഀഀ
/// Set up method.
⼀⼀⼀ 㰀⼀猀甀洀洀愀爀礀㸀ഀഀ
[SetUp]
瀀甀戀氀椀挀 漀瘀攀爀爀椀搀攀 瘀漀椀搀 匀攀琀甀瀀⠀⤀ഀഀ
{
琀栀椀猀⸀匀挀漀瀀攀 㴀 渀攀眀 吀爀愀渀猀愀挀琀椀漀渀匀挀漀瀀攀⠀⤀㬀ഀഀ
琀栀椀猀⸀琀攀猀琀䤀琀攀洀 㴀 渀攀眀 嘀椀攀眀䐀漀挀甀洀攀渀琀䠀椀猀琀漀爀礀⠀ ഀഀ
625016,
㔀㜀㤀㤀㘀Ⰰ ഀഀ
'T',
㌀㐀㠀㌀㔀㈀㤀Ⰰ ഀഀ
DateTime.Parse("2003-01-08 09:57:04.957"),
㌀Ⰰ ഀഀ
"Invoice (PG-PS) - SUPP(11/16/2008)",
∀䘀䤀一䄀一䌀䔀∀Ⰰ ഀഀ
DateTime.Parse("2008-04-11 11:15:07.770"),
䀀∀尀尀䐀伀匀䬀尀䌀䜀䐀伀䌀匀尀㌀㜀㐀㤀㐀尀㐀㘀 㐀㘀尀戀椀氀猀氀椀瀀开 㠀㘀㐀㠀⸀搀漀挀∀⤀㬀ഀഀ
}
Repo Copy:
/// <summary>
/// Set up method.
/// </summary>
[SetUp]
public override void Setup()
{
this.Scope = new TransactionScope();
this.testItem = new ViewDocumentHistory(
625016,
57996,
'T',
3483529,
DateTime.Parse("2003-01-08 09:57:04.957"),
3,
"Invoice (PG-PS) - SUPP(11/16/2008)",
"FINANCE",
DateTime.Parse("2008-04-11 11:15:07.770"),
#"\\DOSK\CGDOCS\374914\46046\bilslip_1081648.doc");
}
I'm also using a .gitattributes file to ensure line endings are correct since we are developing on Windows.
*.cs eol=crlf text
*.csproj eol=crlf text
*.sln eol=crlf text
*.xml eol=crlf text
The text is an addition by me to attempt to fix the problem as git diff was interpreting the file as binary when I modified it. Didn't have any effect.
This also occurs on fresh checkouts in 1.7.2.3 but not in 1.6.5.1 (mysysgit) as far as I can tell. The caveat is that 1.6 doesn't support .gitattributes which I need for working on Windows. This seems to be a fairly new bug and I haven't changed any configuration.
Does anyone have any idea what could be causing this?
edit:
hexdump -C ViewDocumentHistoryTests.cs | sed -n "130,212p"
000008d0 00 20 00 20 00 2f 00 2f 00 2f 00 20 00 3c 00 73 |. . ./././. .<.s|
000008e0 00 75 00 6d 00 6d 00 61 00 72 00 79 00 3e 00 0d |.u.m.m.a.r.y.>..|
000008f0 00 0d 0a 00 20 00 20 00 20 00 20 00 20 00 20 00 |.... . . . . . .|
00000900 20 00 20 00 2f 00 2f 00 2f 00 20 00 53 00 65 00 | . ./././. .S.e.|
00000910 74 00 20 00 75 00 70 00 20 00 6d 00 65 00 74 00 |t. .u.p. .m.e.t.|
00000920 68 00 6f 00 64 00 2e 00 0d 00 0d 0a 00 20 00 20 |h.o.d........ . |
00000930 00 20 00 20 00 20 00 20 00 20 00 20 00 2f 00 2f |. . . . . . ././|
00000940 00 2f 00 20 00 3c 00 2f 00 73 00 75 00 6d 00 6d |./. .<./.s.u.m.m|
00000950 00 61 00 72 00 79 00 3e 00 0d 00 0d 0a 00 20 00 |.a.r.y.>...... .|
00000960 20 00 20 00 20 00 20 00 20 00 20 00 20 00 5b 00 | . . . . . . .[.|
00000970 53 00 65 00 74 00 55 00 70 00 5d 00 0d 00 0d 0a |S.e.t.U.p.].....|
00000980 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 |. . . . . . . . |
00000990 00 70 00 75 00 62 00 6c 00 69 00 63 00 20 00 6f |.p.u.b.l.i.c. .o|
000009a0 00 76 00 65 00 72 00 72 00 69 00 64 00 65 00 20 |.v.e.r.r.i.d.e. |
000009b0 00 76 00 6f 00 69 00 64 00 20 00 53 00 65 00 74 |.v.o.i.d. .S.e.t|
000009c0 00 75 00 70 00 28 00 29 00 0d 00 0d 0a 00 20 00 |.u.p.(.)...... .|
000009d0 20 00 20 00 20 00 20 00 20 00 20 00 20 00 7b 00 | . . . . . . .{.|
000009e0 0d 00 0d 0a 00 20 00 20 00 20 00 20 00 20 00 20 |..... . . . . . |
000009f0 00 20 00 20 00 20 00 20 00 20 00 20 00 74 00 68 |. . . . . . .t.h|
00000a00 00 69 00 73 00 2e 00 53 00 63 00 6f 00 70 00 65 |.i.s...S.c.o.p.e|
00000a10 00 20 00 3d 00 20 00 6e 00 65 00 77 00 20 00 54 |. .=. .n.e.w. .T|
00000a20 00 72 00 61 00 6e 00 73 00 61 00 63 00 74 00 69 |.r.a.n.s.a.c.t.i|
00000a30 00 6f 00 6e 00 53 00 63 00 6f 00 70 00 65 00 28 |.o.n.S.c.o.p.e.(|
00000a40 00 29 00 3b 00 0d 00 0d 0a 00 0d 00 0d 0a 00 20 |.).;........... |
00000a50 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 |. . . . . . . . |
00000a60 00 20 00 20 00 20 00 74 00 68 00 69 00 73 00 2e |. . . .t.h.i.s..|
00000a70 00 74 00 65 00 73 00 74 00 49 00 74 00 65 00 6d |.t.e.s.t.I.t.e.m|
00000a80 00 20 00 3d 00 20 00 6e 00 65 00 77 00 20 00 56 |. .=. .n.e.w. .V|
00000a90 00 69 00 65 00 77 00 44 00 6f 00 63 00 75 00 6d |.i.e.w.D.o.c.u.m|
00000aa0 00 65 00 6e 00 74 00 48 00 69 00 73 00 74 00 6f |.e.n.t.H.i.s.t.o|
00000ab0 00 72 00 79 00 28 00 20 00 0d 00 0d 0a 00 20 00 |.r.y.(. ...... .|
00000ac0 20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00 | . . . . . . . .|
00000ad0 20 00 20 00 20 00 20 00 20 00 20 00 20 00 36 00 | . . . . . . .6.|
00000ae0 32 00 35 00 30 00 31 00 36 00 2c 00 20 00 0d 00 |2.5.0.1.6.,. ...|
00000af0 0d 0a 00 20 00 20 00 20 00 20 00 20 00 20 00 20 |... . . . . . . |
00000b00 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 |. . . . . . . . |
00000b10 00 20 00 35 00 37 00 39 00 39 00 36 00 2c 00 20 |. .5.7.9.9.6.,. |
00000b20 00 0d 00 0d 0a 00 20 00 20 00 20 00 20 00 20 00 |...... . . . . .|
00000b30 20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00 | . . . . . . . .|
00000b40 20 00 20 00 20 00 27 00 54 00 27 00 2c 00 20 00 | . . .'.T.'.,. .|
00000b50 0d 00 0d 0a 00 20 00 20 00 20 00 20 00 20 00 20 |..... . . . . . |
00000b60 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 |. . . . . . . . |
00000b70 00 20 00 20 00 33 00 34 00 38 00 33 00 35 00 32 |. . .3.4.8.3.5.2|
00000b80 00 39 00 2c 00 20 00 0d 00 0d 0a 00 20 00 20 00 |.9.,. ...... . .|
00000b90 20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00 | . . . . . . . .|
00000ba0 20 00 20 00 20 00 20 00 20 00 20 00 44 00 61 00 | . . . . . .D.a.|
00000bb0 74 00 65 00 54 00 69 00 6d 00 65 00 2e 00 50 00 |t.e.T.i.m.e...P.|
00000bc0 61 00 72 00 73 00 65 00 28 00 22 00 32 00 30 00 |a.r.s.e.(.".2.0.|
00000bd0 30 00 33 00 2d 00 30 00 31 00 2d 00 30 00 38 00 |0.3.-.0.1.-.0.8.|
00000be0 20 00 30 00 39 00 3a 00 35 00 37 00 3a 00 30 00 | .0.9.:.5.7.:.0.|
00000bf0 34 00 2e 00 39 00 35 00 37 00 22 00 29 00 2c 00 |4...9.5.7.".).,.|
00000c00 0d 00 0d 0a 00 20 00 20 00 20 00 20 00 20 00 20 |..... . . . . . |
00000c10 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 |. . . . . . . . |
00000c20 00 20 00 20 00 33 00 2c 00 20 00 0d 00 0d 0a 00 |. . .3.,. ......|
00000c30 20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00 | . . . . . . . .|
*
00000c50 22 00 49 00 6e 00 76 00 6f 00 69 00 63 00 65 00 |".I.n.v.o.i.c.e.|
00000c60 20 00 28 00 50 00 47 00 2d 00 50 00 53 00 29 00 | .(.P.G.-.P.S.).|
00000c70 20 00 2d 00 20 00 53 00 55 00 50 00 50 00 28 00 | .-. .S.U.P.P.(.|
00000c80 31 00 31 00 2f 00 31 00 36 00 2f 00 32 00 30 00 |1.1./.1.6./.2.0.|
00000c90 30 00 38 00 29 00 22 00 2c 00 20 00 0d 00 0d 0a |0.8.).".,. .....|
00000ca0 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 |. . . . . . . . |
*
00000cc0 00 22 00 46 00 49 00 4e 00 41 00 4e 00 43 00 45 |.".F.I.N.A.N.C.E|
00000cd0 00 22 00 2c 00 20 00 0d 00 0d 0a 00 20 00 20 00 |.".,. ...... . .|
00000ce0 20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00 | . . . . . . . .|
00000cf0 20 00 20 00 20 00 20 00 20 00 20 00 44 00 61 00 | . . . . . .D.a.|
00000d00 74 00 65 00 54 00 69 00 6d 00 65 00 2e 00 50 00 |t.e.T.i.m.e...P.|
00000d10 61 00 72 00 73 00 65 00 28 00 22 00 32 00 30 00 |a.r.s.e.(.".2.0.|
00000d20 30 00 38 00 2d 00 30 00 34 00 2d 00 31 00 31 00 |0.8.-.0.4.-.1.1.|
00000d30 20 00 31 00 31 00 3a 00 31 00 35 00 3a 00 30 00 | .1.1.:.1.5.:.0.|
00000d40 37 00 2e 00 37 00 37 00 30 00 22 00 29 00 2c 00 |7...7.7.0.".).,.|
00000d50 20 00 0d 00 0d 0a 00 20 00 20 00 20 00 20 00 20 | ...... . . . . |
00000d60 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 |. . . . . . . . |
00000d70 00 20 00 20 00 20 00 40 00 22 00 5c 00 5c 00 44 |. . . .#.".\.\.D|
00000d80 00 4f 00 53 00 4b 00 5c 00 43 00 47 00 44 00 4f |.O.S.K.\.C.G.D.O|
00000d90 00 43 00 53 00 5c 00 33 00 37 00 34 00 39 00 31 |.C.S.\.3.7.4.9.1|
00000da0 00 34 00 5c 00 34 00 36 00 30 00 34 00 36 00 5c |.4.\.4.6.0.4.6.\|
00000db0 00 62 00 69 00 6c 00 73 00 6c 00 69 00 70 00 5f |.b.i.l.s.l.i.p._|
00000dc0 00 31 00 30 00 38 00 31 00 36 00 34 00 38 00 2e |.1.0.8.1.6.4.8..|
00000dd0 00 64 00 6f 00 63 00 22 00 29 00 3b 00 0d 00 0d |.d.o.c.".).;....|
00000de0 0a 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00 |.. . . . . . . .|
00000df0 20 00 7d 00 0d 00 0d 0a 00 0d 00 0d 0a 00 20 00 | .}........... .|
It appears this is some sort of encoding problem.
You're saving your files as UTF-16, the encoding that Windows text editors misleadingly call “Unicode”.
UTF-16 is not ASCII-compatible and so won't work properly with the diff tool used by git. What you're getting is a single byte change to the input on every newline (presumably due to conversion between LF and Windows CRLF line endings) causing the two-byte alignment of UTF-16 code units to be out by one, causing the low byte and high byte to be swapped:
original text: < s u m m a r y >
representation in UTF-16LE: 3C 00 73 00 75 00 6D 00 6D 00 61 00 72 00 79 00 3E 00
accidentally misaligned: 00 3C 00 73 00 75 00 6D 00 6D 00 61 00 72 00 79 00 3E
decoded from misaligned: 㰀 猀 甀 洀 洀 愀 爀 礀 㸀
Save your files in an ASCII-compatible encoding and you'll not have this trouble. Preferably: UTF-8-without-BOM.
Related
Short Version
Is there any documentation on the Outlook RenPrivateAppointment clipboard format used to transfer appointments?
Long version
As a reminder, for anything on the clipboard, the source application can present you the data in a number of different formats. The receiver can go through the list, in order, and decide which format it understands the best.
In the case of my Outlook appointment, the formats are:
0: "RenPrivateSourceFolder" (IStream)
1: "RenPrivateMessages" (IStream)
2: "RenPrivateItem" (HGlobal)
3: "FileGroupDescriptor" (HGlobal)
4: CFSTR_FILEDESCRIPTOR (HGlobal)
5: CFSTR_FILENAME (File)
6: CFSTR_FILECONTENTS (IStream, IStorage)
7: "Object Descriptor" (HGlobal)
8: "RenPrivateAppointment" (IStream)
9: CF_TEXT (HGlobal)
10: CF_UNICODETEXT (HGlobal)
Looking at the content of the various formats, the most promising looks like the RenPrivateAppointment format:
01 00 00 00 C0 C8 1E 0D 60 CE 1E 0D 01 00 00 00 ....ÀÈ.`Î......
6A CB 1E 0D 79 CB 1E 0D 41 00 00 00 41 73 6B 20 jË..yË..A...Ask
71 75 65 73 74 69 6F 6E 20 61 62 6F 75 74 20 61 question about a
70 70 6F 69 6E 74 6D 65 6E 74 20 63 6C 69 70 62 ppointment clipb
6F 61 72 64 20 66 6F 72 6D 61 74 20 6F 6E 20 53 oard format on S
74 61 63 6B 6F 76 65 72 66 6C 6F 77 00 02 00 00 tackoverflow...
00 02 00 00 00 18 00 00 00 00 00 00 00 BC B9 6E ............¼¹n
9C 12 F8 D3 43 AC B7 74 81 5E F0 3D FC 04 D2 97 œ.øÓC¬·t.^ð=ü.Ò—
00 00 00 00 00 00 00 00 00 02 00 00 00 00 00 00 ...............
00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 FF 92 81 02 41 00 73 00 6B 00 20 00 71 00 75 .ÿ’.A.s.k. .q.u
00 65 00 73 00 74 00 69 00 6F 00 6E 00 20 00 61 .e.s.t.i.o.n. .a
00 62 00 6F 00 75 00 74 00 20 00 61 00 70 00 70 .b.o.u.t. .a.p.p
00 6F 00 69 00 6E 00 74 00 6D 00 65 00 6E 00 74 .o.i.n.t.m.e.n.t
00 20 00 63 00 6C 00 69 00 70 00 62 00 6F 00 61 . .c.l.i.p.b.o.a
00 72 00 64 00 20 00 66 00 6F 00 72 00 6D 00 61 .r.d. .f.o.r.m.a
00 74 00 20 00 6F 00 6E 00 20 00 53 00 74 00 61 .t. .o.n. .S.t.a
00 63 00 6B 00 6F 00 76 00 65 00 72 00 66 00 6C .c.k.o.v.e.r.f.l
00 6F 00 77 00 00 00 01 00 00 00 00 00 FF FF FF .o.w.........ÿÿÿ
FF ÿ
Some of this can be interpreted:
Clipboard format "RenPrivateAppointment"
01 00 00 00 ; always 0x00000001 (Version 1?)
C0 C8 1E 0D ; Start day of appt. minutes from 1/1/1601 0x0D1EC8C0 = 220,121,280 minutes = 7/11/2019 12:00 am
60 CE 1E 0D ; End day of appt. minutes from 1/1/1601 0x0D1ECE60 = 220,122,720 minutes = 7/12/2019 12:00 am
01 00 00 00 ; 0x00000001 (fixed)
6A CB 1E 0D ; Start of appt. minutes from 1/1/1601 0x0D1ECB6A = 220,121,962 minutes = 7/11/2019 11:22 am
79 CB 1E 0D ; End of appt. minutes from 1/1/1601 0x0D1ECB79 = 220,121,977 minutes = 7/11/2019 11:37 am
; "Ask question about appointment clipboard format on Stackoverflow.\0"
41 00 00 00 ; String length prefix, including null terminator (0x00000041 = 65 characters)
41 73 6B 20 71 75 65 73 Ask ques
74 69 6F 6E 20 61 62 6F tion abo
75 74 20 61 70 70 6F 69 ut appoi
6E 74 6D 65 6E 74 20 63 ntment c
6C 69 70 62 6F 61 72 64 lipboard
20 66 6F 72 6D 61 74 20 format
6F 6E 20 53 74 61 63 6B on Stack
6F 76 65 72 66 6C 6F 77 overflow
00 .
02 00 00 00 ; 0x0000002 = 2
02 00 00 00 ; 0x0000002 = 2
18 00 00 00 ; 0x00000018 = 24
00 00 00 00 ; 0x00000000 = 0
BC B9 6E 9C 12 F8 D3 43 ; always
AC B7 74 81 5E F0 3D FC ; always
04 D2 97 00 ; varies (~32 ticks per day) 0x0097D204 = 9,949,700
00 00 00 00
00 00 00 00
02 00 00 00 ; 0x00000002 = 2
00 00 00 00
01 00 00 00 ; 0x00000001 = 1
00 00 00 00
00 00 00 00
00 00 00 00
FF 92 81 02 ; always 0x028192FF
; N"Ask question about appointment clipboard format on Stackoverflow\0"
41 00 73 00 6B 00 20 00 71 00 75 00 65 00 73 00 A.s.k. .q.u.e.s.
74 00 69 00 6F 00 6E 00 20 00 61 00 62 00 6F 00 t.i.o.n. .a.b.o.
75 00 74 00 20 00 61 00 70 00 70 00 6F 00 69 00 u.t. .a.p.p.o.i.
6E 00 74 00 6D 00 65 00 6E 00 74 00 20 00 63 00 n.t.m.e.n.t. .c.
6C 00 69 00 70 00 62 00 6F 00 61 00 72 00 64 00 l.i.p.b.o.a.r.d.
20 00 66 00 6F 00 72 00 6D 00 61 00 74 00 20 00 .f.o.r.m.a.t. .
6F 00 6E 00 20 00 53 00 74 00 61 00 63 00 6B 00 o.n. .S.t.a.c.k.
6F 00 76 00 65 00 72 00 66 00 6C 00 6F 00 77 00 o.v.e.r.f.l.o.w.
00 00 ..
01 00 ; padding to DWORD
00 00 00 00
FF FF FF FF ; footer
Is there any documentation on RenPrivateAppointment, or any other the other formats that would allow rich interactions by the user?
Note: This is not automating Outlook. This is handling the IDataObject placed on the clipboard by Outlook. I want to retrieve:
start time
end time
description
See also
C# parse outlook calendar item (i'm not in C#)
microsoft.public.win32.programmer.ole: Identify correctly outlook items in Drag and Drop.
There is a project on GitHub that parses the RenPrivateAppointment clipboard format: https://github.com/yasoonOfficial/outlook-dndprotocol
The RenPrivateAppointment format isn't documented. You may read about that on the DragDrop Event in Outlook Calendar thread which has an official comment from a VSTO team member. Also, you may take a look at the Drag and Drop with Outlook page.
I have been trying to parse an ASCII text file of the following format --
0 0 0x2de0 [0x98]: PERF_RECORD_MMAP -1/0: [0xffffffffc06ae000(0x5000) # 0]: x /lib/modules/4.4.0-83-generic/kernel/net/ipv4/netfilter/nf_reject_ipv4.ko
0x2e78 [0x90]: event: 1
.
. ... raw event: size 144 bytes
. 0000: 01 00 00 00 01 00 90 00 ff ff ff ff 00 00 00 00 ................
. 0010: 00 30 6b c0 ff ff ff ff 00 50 00 00 00 00 00 00 .0k......P......
. 0020: 00 00 00 00 00 00 00 00 2f 6c 69 62 2f 6d 6f 64 ......../lib/mod
. 0030: 75 6c 65 73 2f 34 2e 34 2e 30 2d 38 33 2d 67 65 ules/4.4.0-83-ge
. 0040: 6e 65 72 69 63 2f 6b 65 72 6e 65 6c 2f 6e 65 74 neric/kernel/net
. 0050: 2f 69 70 76 34 2f 6e 65 74 66 69 6c 74 65 72 2f /ipv4/netfilter/
. 0060: 69 70 74 5f 52 45 4a 45 43 54 2e 6b 6f 00 2e 6b ipt_REJECT.ko..k
. 0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
. 0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0 0 0x2e78 [0x90]: PERF_RECORD_MMAP -1/0: [0xffffffffc06b3000(0x5000) # 0]: x /lib/modules/4.4.0-83-generic/kernel/net/ipv4/netfilter/ipt_REJECT.ko
0x2f08 [0x88]: event: 1
.
. ... raw event: size 136 bytes
. 0000: 01 00 00 00 01 00 88 00 ff ff ff ff 00 00 00 00 ................
. 0010: 00 80 6b c0 ff ff ff ff 00 50 00 00 00 00 00 00 ..k......P......
. 0020: 00 00 00 00 00 00 00 00 2f 6c 69 62 2f 6d 6f 64 ......../lib/mod
. 0030: 75 6c 65 73 2f 34 2e 34 2e 30 2d 38 33 2d 67 65 ules/4.4.0-83-ge
. 0040: 6e 65 72 69 63 2f 6b 65 72 6e 65 6c 2f 6e 65 74 neric/kernel/net
. 0050: 2f 6e 65 74 66 69 6c 74 65 72 2f 78 74 5f 74 63 /netfilter/xt_tc
. 0060: 70 75 64 70 2e 6b 6f 00 00 00 00 00 00 00 00 00 pudp.ko.........
. 0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
. 0080: 00 00 00 00 00 00 00 00
........[some other data]........
0x11590 [0x30]: PERF_RECORD_AUXTRACE size: 0x2002a0 offset: 0 ref: 0x2d44e6441a3c2 idx: 0 tid: -1 cpu: 0
.
. ... Intel Processor Trace data: size 2097824 bytes
. 00000000: 02 82 02 82 02 82 02 82 02 82 02 82 02 82 02 82 PSB
. 00000010: 00 00 00 PAD
. 00000013: 99 20 MODE.TSX TXAbort:0 InTX:0
. 00000015: 99 01 MODE.Exec 64
. 00000017: 7d 08 45 06 81 ff ff 00 FUP 0xffff81064508
. 0000001f: 00 00 00 00 00 00 00 PAD
. 00000026: 02 43 00 76 49 1f 00 00 PIP 0xfa4bb00 (NR=0)
. 0000002e: 00 00 00 00 00 00 00 00 PAD
--- continued ---
The file will have several headers - as you can see in my snippet here.
PERF_RECORD_MMAP and PERF_RECORD_AUXTRACE
There will be other headers in the file as well.
What I want is that all the headers having PERF_RECORD_AUXTRACE in my text file should only be considered. All the data following the PERF_RECORD_AUXTRACE in my file should only be collected (i.e. all of the data starting with Intel Processor Trace Data). The PERF_RECORD_AUXTRACE header also has a size field with the use of which I can specify how much of data is there to be collected within the PERF_RECORD_AUXTRACE header.
Edit #1 :
So basically, given the above input file snippet, I want the output to be of the following form (all the lines after record containing PERF_RECORD_AUXTRACE)...
.
. ... Intel Processor Trace data: size 2097824 bytes
. 00000000: 02 82 02 82 02 82 02 82 02 82 02 82 02 82 02 82 PSB
. 00000010: 00 00 00 PAD
. 00000013: 99 20 MODE.TSX TXAbort:0 InTX:0
. 00000015: 99 01 MODE.Exec 64
. 00000017: 7d 08 45 06 81 ff ff 00 FUP 0xffff81064508
. 0000001f: 00 00 00 00 00 00 00 PAD
. 00000026: 02 43 00 76 49 1f 00 00 PIP 0xfa4bb00 (NR=0)
. 0000002e: 00 00 00 00 00 00 00 00 PAD
--- continued ---
EDIT #2 : This is another requirement that I have --
If I have an input snippet like below --
0 0 0x230 [0x60]: PERF_RECORD_MMAP -1/0: [0xffffffff81000000(0x3f000000) # 0xffffffff81000000]: x [kernel.kallsyms]_text
0x290 [0x88]: event: 1
.
. ... raw event: size 136 bytes
. 0000: 01 00 00 00 01 00 88 00 ff ff ff ff 00 00 00 00 ................
. 0010: 00 00 00 c0 ff ff ff ff 00 90 00 00 00 00 00 00 ................
. 0020: 00 00 00 00 00 00 00 00 2f 6c 69 62 2f 6d 6f 64 ......../lib/mod
. 0030: 75 6c 65 73 2f 34 2e 34 2e 30 2d 38 33 2d 67 65 ules/4.4.0-83-ge
. 0040: 6e 65 72 69 63 2f 6b 65 72 6e 65 6c 2f 64 72 69 neric/kernel/dri
. 0050: 76 65 72 73 2f 61 74 61 2f 6c 69 62 61 68 63 69 vers/ata/libahci
. 0060: 2e 6b 6f 00 00 00 00 00 00 00 00 00 00 00 00 00 .ko.............
. 0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
. 0080: 00 00 00 00 00 00 00 00 ........
0x11590 [0x30]: PERF_RECORD_AUXTRACE size: 0x2002a0 offset: 0 ref: 0x2d44e6441a3c2 idx: 0 tid: -1 cpu: 0
.
. ... Intel Processor Trace data: size 2097824 bytes
. 00000000: 02 82 02 82 02 82 02 82 02 82 02 82 02 82 02 82 PSB
. 00000010: 00 00 00 PAD
. 00000013: 99 20 MODE.TSX TXAbort:0 InTX:0
. 00000015: 99 01 MODE.Exec 64
. 00000017: 7d 08 45 06 81 ff ff 00 FUP 0xffff81064508
. 0000001f: 00 00 00 00 00 00 00 PAD
. 00000026: 02 43 00 76 49 1f 00 00 PIP 0xfa4bb00 (NR=0)
. 0000002e: 00 00 00 00 00 00 00 00 PAD
. 00000036: 02 c8 c2 3a 7c 00 00 00 VMCS 0x7c3ac2
0 0 0x290 [0x88]: PERF_RECORD_MMAP -1/0: [0xffffffffc0000000(0x9000) # 0]: x /lib/modules/4.4.0-83-generic/kernel/drivers/ata/libahci.ko
0x318 [0x98]: event: 1
.
. ... raw event: size 152 bytes
. 0000: 01 00 00 00 01 00 98 00 ff ff ff ff 00 00 00 00 ................
. 0010: 00 90 00 c0 ff ff ff ff 00 50 00 00 00 00 00 00 .........P......
. 0020: 00 00 00 00 00 00 00 00 2f 6c 69 62 2f 6d 6f 64 ......../lib/mod
. 0030: 75 6c 65 73 2f 34 2e 34 2e 30 2d 38 33 2d 67 65 ules/4.4.0-83-ge
. 0040: 6e 65 72 69 63 2f 6b 65 72 6e 65 6c 2f 64 72 69 neric/kernel/dri
. 0050: 76 65 72 73 2f 76 69 64 65 6f 2f 66 62 64 65 76 vers/video/fbdev
. 0060: 2f 63 6f 72 65 2f 66 62 5f 73 79 73 5f 66 6f 70 /core/fb_sys_fop
. 0070: 73 2e 6b 6f 00 00 00 00 00 00 00 00 00 00 00 00 s.ko............
. 0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
. 0090: 00 00 00 00 00 00 00 00 ........
0x11590 [0x30]: PERF_RECORD_AUXTRACE size: 0x2002a0 offset: 0 ref: 0x2d44e6441a3c2 idx: 0 tid: -1 cpu: 0
.
. ... Intel Processor Trace data: size 2097824 bytes
. 00000000: 02 82 02 82 02 82 02 82 02 82 02 82 02 82 02 82 PSB
. 00000010: 00 00 00 PAD
. 00000013: 99 20 MODE.TSX TXAbort:0 InTX:0
. 00000015: 99 01 MODE.Exec 64
. 00000017: 7d 08 45 06 81 ff ff 00 FUP 0xffff81064508
. 0000001f: 00 00 00 00 00 00 00 PAD
. 00000026: 02 43 00 76 49 1f 00 00 PIP 0xfa4bb00 (NR=0)
. 0000002e: 00 00 00 00 00 00 00 00 PAD
. 00000036: 02 c8 c2 3a 7c 00 00 00 VMCS 0x7c3ac2
I only would need the data under the records containing PERF_RECORD_AUXTRACE just like this. It would be great if the first line that contains
Intel Processor Trace Data : size 2097824 bytes
can also be avoided from my output.
. 00000000: 02 82 02 82 02 82 02 82 02 82 02 82 02 82 02 82 PSB
. 00000010: 00 00 00 PAD
. 00000013: 99 20 MODE.TSX TXAbort:0 InTX:0
. 00000015: 99 01 MODE.Exec 64
. 00000017: 7d 08 45 06 81 ff ff 00 FUP 0xffff81064508
. 0000001f: 00 00 00 00 00 00 00 PAD
. 00000026: 02 43 00 76 49 1f 00 00 PIP 0xfa4bb00 (NR=0)
. 0000002e: 00 00 00 00 00 00 00 00 PAD
. 00000000: 02 82 02 82 02 82 02 82 02 82 02 82 02 82 02 82 PSB
. 00000010: 00 00 00 PAD
. 00000013: 99 20 MODE.TSX TXAbort:0 InTX:0
. 00000015: 99 01 MODE.Exec 64
. 00000017: 7d 08 45 06 81 ff ff 00 FUP 0xffff81064508
. 0000001f: 00 00 00 00 00 00 00 PAD
. 00000026: 02 43 00 76 49 1f 00 00 PIP 0xfa4bb00 (NR=0)
. 0000002e: 00 00 00 00 00 00 00 00 PAD
Edit #3 : This is what I initially tried to do.. but which obviously does not work!
cat "$file" | gawk -F' ' -- '
/PERF_RECORD_AUXTRACE / {
offset = strtonum($1)
hsize = strtonum(substr($2, 2))
size = strtonum($5)
idx = strtonum($11)
ext = ""
ofile = sprintf("raw-pt.txt")
begin = offset + hsize
cmd = sprintf("dd if=%s of=%s conv=notrunc oflag=append ibs=1 " \
"count=%d status=none", file, ofile, size)
#!cmd = sprintf("sed p")
if (dry_run != 0) {
print cmd
}
else {
system(cmd)
}
}
I am not quite sure how can I properly parse this file to exactly get what I want. I also am not sure if using Python would help.
How to resolve this ?
To get the output you say you want from the input you posted is just:
awk 'f; /PERF_RECORD_AUXTRACE/{f=1}' file
If that's not actually all you want then edit your question to clarify your requirements and provide different sample input/output that more truly demonstrates your problem if necessary.
I am working with neural network to classify images.
I have some files generated by a CytoVision Platform. I would like to use the images in those files but I need to extract them somehow.
These .slide files contain several images of apparently 16kb each one.
I have developed a program in C that I am currently running on linux to extract each 16kb in files. I should build a header in order to use those images.
I don't know which format they have.
If I look at the entire file as a bitmap with FileAlyzer I can see this:
File as a bitmap
This link should allow anyone to download an example file:
https://ufile.io/2ibdq
This is what it seems to be one image header:
42 4D 31 00 00 00 00 00 40 8F 40 05 00 9E 5F 98 D7 47 60 A1 40 01 04 4D 65 74 31 00 00 00 00 00 40 8F 40 05 00 64 31 2E 29 B5 46 DC 40 01 04 4D 65 74 32 00 00 00 00 00 40 8F 40 05 00 87 7D 26 70 88 C0 C5 40 01 04 4D 65 74 33 00 00 00 00 00 40 8F 40 05 00 C8 97 53 05 BB 0D 0F 41 01 04 54 65 78 31 00 00 00 00 00 00 D0 40 05 00 00 00 00 00 00 40 5C 40 07 04 54 65 78 32 00 00 00 00 00 00 D0 40 05 00 00 00 00 00 00 00 44 40 07 04 54 65 78 33 00 00 00 00 00 00 D0 40 05 00 00 00 00 00 00 90 76 40 07 04 54 65 78 34 00 00 00 00 00 00 D0 40 05 00 00 00 00 00 00 F4 CD 40 07 0A 43 68 72 6F 6D 73 41 72 65 61 00 00 00 00 00 4C BD 40 05 00 F3 76 84 D3 82 85 74 40 07 08 42 6F 75 6E 64 61 72 79 00 00 00 00 00 88 B3 40 05 00 D9 CE F7 53 E3 AD 7E 40 07 04 41 72 65 61 00 00 00 00 00 88 B3 40 05 00 20 EF 55 2B 13 0B 85 40 07 07 4F 62 6A 65 63 74 73 00 00 00 00 00 00 69 40 05 00 00 00 00 00 00 00 18 40 03 04 43 69 72 63 00 00 00 00 00 40 8F 40 05 00 9D E5 51 0E 5C 34 65 40 03 03 42 47 52 00 00 00 00 00 40 8F 40 05 00 7D 0C CE C7 E0 AC 86 40 03 04 54 65 78 35 00 00 00 00 00 00 D0 40 05 00 00 00 00 00 00 00 53 40 07 04 41 52 41 54 00 00 00 00 00 40 8F 40 05 00 86 89 F7 23 A7 79 7E 40 07 05 43 6C 61 73 73 00 00 00 00 00 00 F0 3F 05 00 00 00 00 00 00 00 F0 BF 00 01 00 00 00 01 00 00 00
With notepad++ I can see the previous hex like this:
BM1 #? ??G`?Met1 #? d1.)??Met2 #? ?&p?bMet3 #? ?S?ATex1 ? #\#Tex2 ? D#Tex3 ? ?#Tex4 ? ??
ChromsArea L? ???t#Boundary ?# ???~#Area ?# ?bObjects i# #Circ #? ?Q\4e#BGR #? }??#Tex5 ? S#ARAT #? Ð??~#Class ?? ?? #
Hope someone can give me an idea about the format of the images and what info I can extract from the header.
I want to reverse old .exe file; I'm 90% sure it is Delphi (class names are being with 'T' => "TCommonDialog")
I can't load file into IDR (becouse it is not valid PE-executable?) jet still .exe works just fine and icon is showing just right.
I was trying to maniupulate header but every time I just corrupt .exe
Header with MZ:
00000000 4d 5a 00 01 01 00 00 00 08 00 10 00 ff ff 08 00 |MZ..............|
00000010 00 01 00 00 00 00 00 00 40 00 00 00 00 00 00 00 |........#.......|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 |................|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000080
Further is longer header, but with NE; I was trying to change it to PE.
At this point I don't know what am I doing, I just mess with everything
00000000 4e 45 06 01 17 07 5a 00 00 00 00 00 0a 03 16 00 |NE....Z.........|
00000010 00 20 00 40 0e 00 01 00 00 00 16 00 16 00 0c 00 |. .#............|
00000020 0e 00 40 00 f0 00 9f 06 a9 06 c1 06 71 08 00 00 |..#.........q...|
00000030 0e 00 04 00 00 00 02 00 00 00 00 00 00 00 0a 03 |................|
00000040 e4 36 cc 3d 10 1d cd 3d e5 3a 47 3e 10 1d 47 3e |.6.=...=.:G>..G>|
00000050 d3 3e 6f 3f 10 1d 70 3f d6 42 96 3f 10 1d 97 3f |.>o?..p?.B.?...?|
00000060 d9 46 20 3a 10 1d 21 3a 8f 4a 09 33 10 1d 0a 33 |.F :..!:.J.3...3|
00000070 cd 4d 99 30 10 1d 9a 30 df 50 1c 33 10 1d 1c 33 |.M.0...0.P.3...3|
00000080 17 54 7b 9a 10 1d 7b 9a d3 5d 33 3c 10 1d 33 3c |.T{...{..]3<..3<|
00000090 90 00 17 3a 50 1d 17 3a 57 04 c1 2f 50 1d c1 2f |...:P..:W../P../|
000000a0 5b 07 2c 41 50 1d 2d 41 77 0b b9 66 50 1d ba 66 |[.,AP.-Aw..fP..f|
000000b0 f5 11 28 70 50 1d 28 70 1f 19 cc 22 50 1d cc 22 |..(pP.(p..."P.."|
000000c0 55 1b 00 6f 50 1d 00 6f 6c 22 84 7a 50 1d 85 7a |U..oP..ol".zP..z|
000000d0 45 2a 31 51 50 1d 31 51 65 2f 8d 30 50 0d 8d 30 |E*1QP.1Qe/.0P..0|
000000e0 99 32 c6 1f 50 0d c6 1f eb 34 5d 1f 59 0d 8c 2f |.2..P....4].Y../|
000000f0 04 00 03 80 01 00 00 00 00 00 a9 61 30 00 10 1c |...........a0...|
00000100 01 80 00 00 00 00 0e 80 01 00 00 00 00 00 d9 61 |...............a|
00000110 10 00 10 1c e4 03 00 00 00 00 0a 80 18 00 00 00 |................|
00000120 00 00 e9 61 30 00 30 1c ed 03 00 00 00 00 19 62 |...a0.0........b|
00000130 10 11 30 1c f1 03 00 00 00 00 29 73 80 04 30 1c |..0.......)s..0.|
00000140 fb 03 00 00 00 00 a9 77 40 00 30 1c 04 04 00 00 |.......w#.0.....|
00000150 00 00 e9 77 f0 04 30 1c 11 04 00 00 00 00 d9 7c |...w..0........||
00000160 30 00 30 1c 1c 04 00 00 00 00 09 7d 80 00 30 1c |0.0........}..0.|
00000170 25 04 00 00 00 00 89 7d e0 00 30 1c 39 04 00 00 |%......}..0.9...|
00000180 00 00 69 7e 40 00 30 1c 48 04 00 00 00 00 a9 7e |..i~#.0.H......~|
00000190 c0 01 30 1c 54 04 00 00 00 00 69 80 d0 05 30 1c |..0.T.....i...0.|
000001a0 5b 04 00 00 00 00 39 86 50 00 30 1c 65 04 00 00 |[.....9.P.0.e...|
000001b0 00 00 89 86 60 00 30 1c 72 04 00 00 00 00 e9 86 |....`.0.r.......|
000001c0 50 00 30 1c 80 04 00 00 00 00 39 87 50 00 30 1c |P.0.......9.P.0.|
000001d0 8f 04 00 00 00 00 89 87 40 00 30 1c 9c 04 00 00 |........#.0.....|
000001e0 00 00 c9 87 50 00 30 1c a9 04 00 00 00 00 19 88 |....P.0.........|
000001f0 40 00 30 1c b6 04 00 00 00 00 59 88 50 00 30 1c |#.0.......Y.P.0.|
00000200 c3 04 00 00 00 00 a9 88 50 00 30 1c d1 04 00 00 |........P.0.....|
00000210 00 00 f9 88 50 00 30 1c de 04 00 00 00 00 49 89 |....P.0.......I.|
00000220 40 00 30 1c eb 04 00 00 00 00 89 89 70 01 30 1c |#.0.........p.0.|
00000230 f8 04 00 00 00 00 f9 8a 90 00 30 1c 00 05 00 00 |..........0.....|
00000240 00 00 02 80 17 00 00 00 00 00 89 8b 10 00 30 1c |..............0.|
00000250 05 05 00 00 00 00 99 8b 10 00 30 1c 0d 05 00 00 |..........0.....|
00000260 00 00 a9 8b 10 00 30 1c 18 05 00 00 00 00 b9 8b |......0.........|
00000270 10 00 30 1c 1f 05 00 00 00 00 c9 8b 10 00 30 1c |..0...........0.|
00000280 29 05 00 00 00 00 d9 8b 10 00 30 1c 33 05 00 00 |).........0.3...|
00000290 00 00 e9 8b 10 00 30 0c 3c 05 00 00 00 00 f9 8b |......0.<.......|
000002a0 10 00 30 0c 45 05 00 00 00 00 09 8c 10 00 30 1c |..0.E.........0.|
000002b0 4c 05 00 00 00 00 19 8c 10 00 30 1c 51 05 00 00 |L.........0.Q...|
000002c0 00 00 29 8c 10 00 30 1c 57 05 00 00 00 00 39 8c |..)...0.W.....9.|
000002d0 10 00 30 1c 5c 05 00 00 00 00 49 8c 10 00 30 1c |..0.\.....I...0.|
000002e0 63 05 00 00 00 00 59 8c 20 00 30 1c 68 05 00 00 |c.....Y. .0.h...|
000002f0 00 00 79 8c 20 00 30 1c 6f 05 00 00 00 00 99 8c |..y. .0.o.......|
00000300 20 00 30 1c 74 05 00 00 00 00 b9 8c 20 00 30 1c | .0.t....... .0.|
00000310 79 05 00 00 00 00 d9 8c 20 00 30 1c 7f 05 00 00 |y....... .0.....|
00000320 00 00 f9 8c 20 00 30 1c 88 05 00 00 00 00 19 8d |.... .0.........|
00000330 20 00 30 1c 90 05 00 00 00 00 39 8d 20 00 30 1c | .0.......9. .0.|
00000340 98 05 00 00 00 00 59 8d 20 00 30 1c 9e 05 00 00 |......Y. .0.....|
00000350 00 00 79 8d 20 00 30 1c a6 05 00 00 00 00 01 80 |..y. .0.........|
00000360 06 00 00 00 00 00 99 8d 20 00 30 1c 01 80 00 00 |........ .0.....|
00000370 00 00 c9 8d 20 00 30 1c 02 80 00 00 00 00 f9 8d |.... .0.........|
00000380 20 00 30 1c 03 80 00 00 00 00 29 8e 20 00 10 1c | .0.......). ...|
00000390 04 80 00 00 00 00 59 8e 20 00 10 1c 05 80 00 00 |......Y. .......|
000003a0 00 00 89 8e 20 00 30 1c 06 80 00 00 00 00 0c 80 |.... .0.........|
000003b0 06 00 00 00 00 00 b9 8d 10 00 30 1c fb ff 00 00 |..........0.....|
000003c0 00 00 e9 8d 10 00 30 1c fc ff 00 00 00 00 19 8e |......0.........|
000003d0 10 00 30 1c fd ff 00 00 00 00 49 8e 10 00 30 1c |..0.......I...0.|
000003e0 fe ff 00 00 00 00 79 8e 10 00 30 1c ff ff 00 00 |......y...0.....|
000003f0 00 00 a9 8e 10 00 30 1c fa ff 00 00 00 00 06 80 |......0.........|
00000400 11 00 00 00 00 00 b9 8e 20 00 30 1c 01 8f 00 00 |........ .0.....|
00000410 00 00 d9 8e 20 00 30 1c 02 8f 00 00 00 00 f9 8e |.... .0.........|
00000420 20 00 30 1c 03 8f 00 00 00 00 19 8f 20 00 30 1c | .0......... .0.|
00000430 04 8f 00 00 00 00 39 8f 20 00 30 1c 05 8f 00 00 |......9. .0.....|
00000440 00 00 59 8f 20 00 30 1c 06 8f 00 00 00 00 79 8f |..Y. .0.......y.|
00000450 20 00 30 1c 07 8f 00 00 00 00 99 8f 10 00 30 1c | .0...........0.|
00000460 08 8f 00 00 00 00 a9 8f 10 00 30 1c 09 8f 00 00 |..........0.....|
00000470 00 00 b9 8f 20 00 30 1c 0a 8f 00 00 00 00 d9 8f |.... .0.........|
00000480 20 00 30 1c 0b 8f 00 00 00 00 f9 8f 20 00 30 1c | .0......... .0.|
00000490 f9 8f 00 00 00 00 19 90 20 00 30 1c fa 8f 00 00 |........ .0.....|
000004a0 00 00 39 90 10 00 30 1c fb 8f 00 00 00 00 49 90 |..9...0.......I.|
000004b0 10 00 30 1c fd 8f 00 00 00 00 59 90 10 00 30 1c |..0.......Y...0.|
000004c0 fe 8f 00 00 00 00 69 90 10 00 30 1c ff 8f 00 00 |......i...0.....|
000004d0
If you look at the information provided by fileformat.info, you'll see that this is very likely an NE executable. These also start with MZ, but the rest is different.
Reading the bytes at offsets 0000000C and 0000000D, this is probably a Windows 3.x Protected Mode program. If it is made with Delphi, that can only have been Delphi 1, which did not produce PE executables, but 16 bit Windows executables instead.
I have a Macbook Pro and am getting contradictory answers when I try to determine its endian-ness.
Method 1
python -c "import sys;print sys.byteorder" tells me I am on a little endian system
Method 2
I have a text file. I used iconv to convert it into UTF16. Its supposed to detect the endianness of the computer and convert it into that format. So here I go:
iconv -f utf-8 -t utf-16 file.txt > utf16.txt
file utf16.txt
utf16.txt: Big-endian UTF-16 Unicode English text
vi utf16.txt works and hexdump -C utf16.txt shows:
00000000 fe ff 00 33 00 39 00 38 00 31 00 36 00 30 00 38 |...3.9.8.1.6.0.8|
00000010 00 09 00 54 00 69 00 61 00 20 00 4a 00 75 00 61 |...T.i.a. .J.u.a|
00000020 00 6e 00 61 00 20 00 52 00 69 00 76 00 65 00 72 |.n.a. .R.i.v.e.r|
00000030 00 09 00 54 00 69 00 61 00 20 00 4a 00 75 00 61 |...T.i.a. .J.u.a|
00000040 00 6e 00 61 00 20 00 52 00 69 00 76 00 65 00 72 |.n.a. .R.i.v.e.r|
00000050 00 09 00 52 00 69 00 6f 00 20 00 54 00 69 00 61 |...R.i.o. .T.i.a|
00000060 00 6a 00 75 00 61 00 6e 00 61 00 2c 00 52 00 69 |.j.u.a.n.a.,.R.i|
00000070 00 6f 00 20 00 54 00 69 00 6a 00 75 00 61 00 6e |.o. .T.i.j.u.a.n|
00000080 00 61 00 2c 00 52 00 ed 00 6f 00 20 00 54 00 69 |.a.,.R...o. .T.i|
00000090 00 6a 00 75 00 61 00 6e 00 61 00 2c 00 54 00 69 |.j.u.a.n.a.,.T.i|
if I convert it to little-endian and manually insert a BOM like this:
( printf "\xff\xfe" ; iconv -f utf-8 -t utf-16le file.txt ) > UTF16LEBOM.txt
file UTF16LEBOM.txt
UTF16LEBOM.txt: Little-endian UTF-16 Unicode English text
vi UTF16LEBOM.txt works
and hexdump -C UTF16LEBOM.txt shows
00000000 ff fe 33 00 39 00 38 00 31 00 36 00 30 00 38 00 |..3.9.8.1.6.0.8.|
00000010 09 00 54 00 69 00 61 00 20 00 4a 00 75 00 61 00 |..T.i.a. .J.u.a.|
00000020 6e 00 61 00 20 00 52 00 69 00 76 00 65 00 72 00 |n.a. .R.i.v.e.r.|
00000030 09 00 54 00 69 00 61 00 20 00 4a 00 75 00 61 00 |..T.i.a. .J.u.a.|
00000040 6e 00 61 00 20 00 52 00 69 00 76 00 65 00 72 00 |n.a. .R.i.v.e.r.|
00000050 09 00 52 00 69 00 6f 00 20 00 54 00 69 00 61 00 |..R.i.o. .T.i.a.|
00000060 6a 00 75 00 61 00 6e 00 61 00 2c 00 52 00 69 00 |j.u.a.n.a.,.R.i.|
00000070 6f 00 20 00 54 00 69 00 6a 00 75 00 61 00 6e 00 |o. .T.i.j.u.a.n.|
00000080 61 00 2c 00 52 00 ed 00 6f 00 20 00 54 00 69 00 |a.,.R...o. .T.i.|
00000090 6a 00 75 00 61 00 6e 00 61 00 2c 00 54 00 69 00 |j.u.a.n.a.,.T.i.|
From this link:
The other approach is to include a magic number, such as 0xFEFF,
before every piece of data. If you read the magic number and it is
0xFEFF, it means the data is in the same format as your machine, and
all is well.
If you read the magic number and it is 0xFFFE (it is backwards), it
means the data was written in a format different from your own. You'll
have to translate it.
Who is right and why am I getting contradictory answers?
"Endian-ness of my Macbook Pro" means nothing. You need to be more specific; different applications will have different impressions. As you've just seen, you can arbitrarily encode bytes in a file. In the end, a series of bytes is just that, and files are ultimately just a series of bytes that can be read in either fashion. In the context of programming (Stack Overflow) what's important is knowing a) Whether the input you are getting is in Big Endian or Little Endian, and b) Whether the output you send should be in Big Endian or Little Endian.
If your question is the conventional reading of files, the answer is usually Little Endian. But, for example, network data tends to be Big Endian.