How can two 100% identical files have different sizes? - macos

I have two 100% identical, nearly empty .sh shell script files on my Mac:
encrypt.sh: 299 bytes
decrypt.sh: 13 bytes (this size is correct: 11 characters plus two newlines)
(Screenshots omitted: the contents and hexdumps of encrypt.sh and decrypt.sh, and the file info window of each.)
They have the exact same hexdump, so how is it possible that they have different sizes?

The Mac OS X file system implements forks, so the larger file most likely has something stored in its resource fork.
Use ls -l@ to get more details.
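You can also check from the command line with xattr -l <file>; a resource fork shows up as the com.apple.ResourceFork attribute. Or, as a minimal Python sketch assuming the filenames from the question, read the fork through macOS's ..namedfork path:

import os

# On macOS, a file's resource fork is exposed at "<file>/..namedfork/rsrc",
# so its size can be compared with the data fork size that ls -l reports.
for name in ("encrypt.sh", "decrypt.sh"):
    data_size = os.path.getsize(name)
    try:
        rsrc_size = os.path.getsize(name + "/..namedfork/rsrc")
    except OSError:
        rsrc_size = 0                   # no resource fork present
    print(f"{name}: data fork {data_size} B, resource fork {rsrc_size} B")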

Related

Copying file in Windows 10 changes its size

I copied a large file to a new directory in Windows 10 by dragging the file from Explorer to a folder in Eclipse. The file size of the copied file changed even though fc shows the original and new files as identical. The original file has a size of 209,715,200 bytes (200 MiB):
c:\>dir c:\Users\GeoffAlexander\Documents\Python\200MiB.txt
Volume in drive C is Windows
Volume Serial Number is 0447-709A
Directory of c:\Users\GeoffAlexander\Documents\Python
08/13/2019 09:42 AM 209,715,200 200MiB.txt
1 File(s) 209,715,200 bytes
0 Dir(s) 268,331,835,392 bytes free
The new file has a size of 211,812,352 bytes:
c:\>dir c:\Users\GeoffAlexander\Desktop\200MiB.txt
Volume in drive C is Windows
Volume Serial Number is 0447-709A
Directory of c:\Users\GeoffAlexander\Desktop
08/15/2019 09:11 AM 211,812,352 200MiB.txt
1 File(s) 211,812,352 bytes
0 Dir(s) 268,232,798,208 bytes free
The fc command shows the files as being identical:
c:\>fc c:\Users\GeoffAlexander\Documents\Python\200MiB.txt c:\Users\GeoffAlexander\Desktop\200MiB.txt
Comparing files C:\USERS\GEOFFALEXANDER\DOCUMENTS\PYTHON\200MiB.txt and C:\USERS\GEOFFALEXANDER\DESKTOP\200MIB.TXT
FC: no differences encountered
Why does the copied file get a new size? How can two files with different sizes be identical? Is Windows 10 incorrectly reporting the size of the new file?
I'm running Windows 10 Enterprise Build 1809 (OS Build 17763.615) if that makes any difference.
It turns out the file size change wasn't due to the copying of the file. Rather, the file size change occurred when checking the file in to RTC (Rational Team Concert). The RTC check-in was converting existing LF line delimiters into CRLF line delimiters (Windows line delimiters). See the RTC documentation on file content types and line delimiters for details.
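The numbers bear this out: the copy grew by 211,812,352 - 209,715,200 = 2,097,152 bytes, i.e. one added CR for each of the file's 2,097,152 LF-terminated lines (which is also why fc, comparing line by line in its default text mode, reported no differences; fc /b would compare bytes). As a minimal Python sketch, assuming the path from the question, you can predict the post-conversion size by counting bare LFs:

# Count LFs that are not already part of a CRLF pair; each one gains a
# CR byte when the file is converted to Windows line endings.
path = r"c:\Users\GeoffAlexander\Documents\Python\200MiB.txt"
with open(path, "rb") as f:
    data = f.read()      # fine for a sketch; stream in chunks for huge files
bare_lf = data.count(b"\n") - data.count(b"\r\n")
print("current size:", len(data))
print("size after CRLF conversion:", len(data) + bare_lf)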

macOS size command shows a really large number?

> size /bin/ls
__TEXT __DATA __OBJC others dec hex
20480 4096 0 4294983680 4295008256 10000a000
How could it be that ls is 4GB? Is size not meant to be used on executables? I have 4GB of RAM, so is it just showing me the amount of memory it can use?
On macOS, 64-bit apps have a 4GB page zero by default. Page zero is a chunk of the address space starting at address 0 which allows no access. This is what causes access violations when a program dereferences a null pointer.
64-bit Mac programs use a 4GB page zero so that, should any valid pointer get accidentally truncated to 32 bits by a program bug (e.g. cast to int and back to a pointer), it will be invalid and cause a crash as soon as possible. That helps to find and fix such bugs.
The page zero segment in the Mach-O executable file doesn't actually use 4GB on disk. It's just a bit of metadata that tells the kernel and dynamic loader how much address space to reserve for it. It seems that size is including the virtual size of all segments, regardless of whether they take up space on disk or not.
Also, the page zero doesn't consume actual RAM when the program is loaded, either. Again, there's just some bookkeeping data to track the fact that the lower 4GB of the address space is reserved.
The size being reported for "others", 4294983680 bytes, is 0x100004000 in hex. That's the 4GB page zero (0x100000000) plus another 16KB (0x4000), which matches the __LINKEDIT segment shown below.
You can use the -m option to size to get more detail:
$ size -m /bin/ls
Segment __PAGEZERO: 4294967296
Segment __TEXT: 20480
    Section __text: 13599
    Section __stubs: 456
    Section __stub_helper: 776
    Section __const: 504
    Section __cstring: 1150
    Section __unwind_info: 148
    total 16633
Segment __DATA: 4096
    Section __got: 40
    Section __nl_symbol_ptr: 16
    Section __la_symbol_ptr: 608
    Section __const: 552
    Section __data: 40
    Section __bss: 224
    Section __common: 140
    total 1620
Segment __LINKEDIT: 16384
total 4295008256
You can also use the command otool -lV /bin/ls to see the loader commands of the executable, including the one establishing the __PAGEZERO segment.
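To see where that total comes from without otool, here is a minimal Python sketch that walks the LC_SEGMENT_64 load commands of a thin 64-bit Mach-O file and prints each segment's virtual size next to its on-disk size (a universal/fat binary would first need one slice extracted, e.g. with lipo -thin):

import struct

MH_MAGIC_64 = 0xFEEDFACF    # thin little-endian 64-bit Mach-O
LC_SEGMENT_64 = 0x19

def print_segments(path):
    with open(path, "rb") as f:
        data = f.read()
    magic, _, _, _, ncmds, _, _, _ = struct.unpack_from("<8I", data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin 64-bit Mach-O; extract a slice with lipo -thin first")
    off = 32                # load commands follow the 32-byte mach_header_64
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<2I", data, off)
        if cmd == LC_SEGMENT_64:
            # segment_command_64: segname[16], then vmaddr, vmsize, fileoff, filesize
            segname, _vmaddr, vmsize, _fileoff, filesize = struct.unpack_from("<16s4Q", data, off + 8)
            name = segname.rstrip(b"\x00").decode()
            print(f"{name:<16} vmsize={vmsize:<12} filesize={filesize}")
        off += cmdsize

print_segments("/bin/ls")   # __PAGEZERO: vmsize 4294967296, filesize 0

The vmsize column is what size sums up; __PAGEZERO contributes 4GB of address space but zero bytes of file.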
The size command outputs information about a binary executable and how it runs; it is not about the file itself. The 4GB number might be (that is just my guess) related to the virtual address space needed to run it.
I don't have a MacOSX system (it is proprietary and tied to hardware that I dislike and find too expensive). But on Linux (which is mostly POSIX, like MacOSX), size /bin/ls gives:
text data bss dec hex filename
124847 4672 4824 134343 20cc7 /bin/ls
while ls -l /bin/ls shows
-rwxr-xr-x 1 root root 138856 Feb 28 16:30 /bin/ls
Of course, when ls is running, it has some data (notably bss) which does not correspond to any part of the executable file.
Try man size on your system to get an explanation. For Linux, see size(1) (it gives info about sections of an ELF executable) and ls(1) (it gives the file size).
On MacOSX, executables follow the Mach-O format.
On Linux, if you try size on a non-executable file such as /etc/passwd, you get
size: /etc/passwd: file format not recognized
and I guess you would get a similar error message on MacOSX if you tried that.
Think of size as giving executable size information. The name is historical and a bit misleading.

Mainframe pkunzip generates PEX013W Record(s) being truncated to lrecl=

I'm sending binary .gz files from Linux to z/OS via ftps. The file transfers seem to be fine, but when the mainframe folks pkunzip the file, they get a warning:
PEX013W Record(s) being truncated to lrecl= 996. Record# 1 is 1000 bytes.
Currently I’m sending the site commands:
SITE TRAIL
200 SITE command was accepted
SITE CYLINDERS PRIMARY=50 SECONDARY=50
200 SITE command was accepted
SITE RECFM=VB LRECL=1000 BLKSIZE=32000
200 SITE command was accepted
SITE CONDDISP=delete
200 SITE command was accepted
TYPE I
200 Representation type is Image
...
250 Transfer completed successfully.
QUIT
221 Quit command received. Goodbye.
They could read the file after the pkunzip, but having a warning is not a good thing.
Output from pkunzip:
SDSF OUTPUT DISPLAY RMD0063A JOB22093 DSID 103 LINE 25 COLUMNS 02- 81
COMMAND INPUT ===> SCROLL ===> CSR
PCM123I Authorized services are unavailable.
PAM030I INPUT Archive opened: TEST.FTP.SOA5021.GZ
PAM560I ARCHIVE FASTSEEK processing is disabled.
PDA000I DDNAME=SYS00001,DISP_STATUS=MOD,DISP_NORMAL=CATALOG,DISP_ABNORMAL=
PDA000I SPACE_TYPE=TRK,SPACE_TYPE=CYL,SPACE_TYPE=BLK
PDA000I SPACE_PRIMARY=4194304,SPACE_DIRBLKS=5767182,INFO_ALCFMT=00
PDA000I VOLUMES=DPPT71,INFO_CNTL=,INFO_STORCLASS=,INFO_MGMTCLASS=
PDA000I INFO_DATACLASS=,INFO_VSAMRECORG=00,INFO_VSAMKEYOFF=0
PDA000I INFO_COPYDD=,INFO_COPYMDL=,INFO_AVGRECU=00,INFO_DSTYPE=00
PEX013W Record(s) being truncated to lrecl= 996. Record# 1 is 1000 bytes.
PEX002I TEST.FTP.SOA5021
PEX003I Extracted to TEST.FTP.SOA5021I.TXT
PAM140I FILES: EXTRACTED EXCLUDED BYPASSED IN ERROR
PAM140I 1 0 0 0
PMT002I PKUNZIP processing complete. RC=00000004 4(Dec) Start: 12:59:48.86 End
Is there a better set of site commands to transfer a .gz file from Linux to z/OS to avoid this error?
**** Update ****
Using SaggingRufus's answer below, it turns out it doesn't much matter how you send the .gz file, as long as it's binary. His suggestion pointed us to the parameters passed to pkunzip for the output file, which was VB and was truncating 4 bytes off each record.
Because it is a variable-blocked file, 4 bytes of each record are taken up by the record descriptor word. Allocate the file with an LRECL of 1004 and it will be fine.
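The arithmetic is easy to check in Python: a V-format record is a 4-byte Record Descriptor Word followed by the data, and LRECL counts the RDW (a minimal sketch; the 1000-byte payload matches the record size from the warning):

import struct

# RDW = big-endian halfword record length (including the RDW itself),
# followed by two zero bytes.
payload = b"X" * 1000                   # the 1000-byte record from PEX013W
rdw = struct.pack(">HH", 4 + len(payload), 0)
record = rdw + payload
print(len(record))                      # 1004, the minimum LRECL that avoids truncation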
Rather than generating a .zip file, perhaps generate a .tar.gz file and transfer it to z/OS UNIX? Tar is shipped with z/OS by default, and Rocket Software provides a port of gzip that is optimized for z/OS.

Finding the size of a file using ls and du: what is the difference? [duplicate]

This question already has answers here:
Size() vs ls -la vs du -h which one is correct size?
(3 answers)
Closed 8 years ago.
There is a file named today.log on my server.
ls -l today.log shows 400GB.
du -sh today.log shows 240GB.
What is the difference between ls and du?
du shows how much disk the file uses; ls shows how big the file is. These two values can be different.

Files with holes can take up less space than their size. As an example, consider a file with a single byte at position 183738475 (a randomly typed number). That file can be stored on disk using a single block: whenever the kernel asks the filesystem for bytes other than that single byte, the filesystem reports them as zero, so there is no need to store anything. (Not all filesystems work this way.) But the size of the file is 183738475, so ls will report that, while du will report how many blocks the filesystem actually uses. du -h reports the number of blocks used times the block size, converted to a human-readable format.

On the other hand, most files do not completely fill the blocks of the filesystem, so they take up more space than their size: a file with a single byte still takes up at least one full block (512 or 1024 bytes, typically). Keep in mind that the actual numbers will vary depending on your filesystem. For example:
$ echo > foo; ls -l foo |awk '{print $5}'; du foo; du -h foo
1
8 foo
4.0K foo
This file is one byte in size but consumes 8 blocks on disk, and the block size is 512 so those 8 blocks consume 4k. (My filesystem has been optimized for large files, and small files waste a lot of space.)
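Here is the same thing as a minimal Python sketch, using the offset from the hole example above (the allocated size assumes a filesystem that supports sparse files):

import os

# Seeking past EOF before writing leaves a hole: st_size is what ls -l
# reports, st_blocks * 512 is what du reports.
path = "sparse.bin"                     # scratch file for the demo
with open(path, "wb") as f:
    f.seek(183_738_474)                 # skip ~183 MB without storing anything
    f.write(b"\x01")                    # one byte at the end -> size 183738475
st = os.stat(path)
print("apparent size:", st.st_size)
print("allocated on disk:", st.st_blocks * 512)
os.remove(path)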

How to determine compression method of a ZIP/RAR file

I have a few zip and rar files that I'm working with, and I'm trying to analyze the properties of how each file was compressed (compression level, compression algorithm (e.g. deflate, LZMA, BZip2), dictionary size, word size, etc.), and I haven't figured out a way to do this yet.
Is there any way to analyze the files to determine these properties, with software or otherwise?
Cheers and thanks!
This is a fairly old question, but I wanted to throw in my two cents anyway, since some of the methods above weren't as easy for me to use.
You can also determine this with 7-Zip: after opening the archive, there is a "Method" column showing how each file was compressed (screenshot omitted).
For ZIP: yes, zipinfo.
For RAR: the headers are easily found with either 7-Zip or WinRAR; read the attached documentation.
Via 7-Zip (or p7zip) command line:
7z l -slt archive.file
If looking specifically for the compression method:
7z l -slt archive.file | grep -e '^---' -e '^Path =' -e '^Method ='
I suggest hachoir-wx to have a look at these files. Install it like any other Python package, or try ActivePython with PyPM on Windows. When you have the necessary hachoir packages installed, you can run the GUI like this:
python C:\Python27\Scripts\hachoir-wx
It enables you to browse through the data fields of RAR and ZIP files (screenshot omitted).
For RAR files, have a look at the technote.txt file in the WinRAR installation directory. It gives detailed information on the RAR specification. You will probably be interested in these:
HEAD_FLAGS Bit flags: 2 bytes
0x10 - information from previous files is used (solid flag)
bits 7 6 5 (for RAR 2.0 and later)
0 0 0 - dictionary size 64 KB
0 0 1 - dictionary size 128 KB
0 1 0 - dictionary size 256 KB
0 1 1 - dictionary size 512 KB
1 0 0 - dictionary size 1024 KB
1 0 1 - dictionary size 2048 KB
1 1 0 - dictionary size 4096 KB
1 1 1 - file is directory
Dictionary size can be found in the WinRAR GUI too.
METHOD Packing method 1 byte
0x30 - storing
0x31 - fastest compression
0x32 - fast compression
0x33 - normal compression
0x34 - good compression
0x35 - best compression
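For illustration, here is a small Python sketch that decodes the dictionary-size bits and the packing-method byte according to the two tables above (the sample values are made up):

def rar_dictionary(head_flags: int) -> str:
    # Bits 7..5 of HEAD_FLAGS select the dictionary size (RAR 2.0+).
    bits = (head_flags >> 5) & 0b111
    return "entry is a directory" if bits == 0b111 else f"dictionary size {64 << bits} KB"

def rar_method(method_byte: int) -> str:
    # METHOD byte from the file header: 0x30 = storing ... 0x35 = best.
    names = {0x30: "storing", 0x31: "fastest", 0x32: "fast",
             0x33: "normal", 0x34: "good", 0x35: "best"}
    return names.get(method_byte, "unknown")

print(rar_dictionary(0x0020))   # bits 001 -> dictionary size 128 KB
print(rar_method(0x33))         # -> normal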
Wikipedia also notes:
The RAR compression utility is proprietary, with a closed algorithm. RAR is owned by Alexander L. Roshal, the elder brother of Eugene Roshal. Version 3 of RAR is based on Lempel-Ziv (LZSS) and prediction by partial matching (PPM) compression, specifically the PPMd implementation of PPMII by Dmitry Shkarin.
For ZIP files I would start by having a look at the specifications and the ZIP Wikipedia page. These are probably interesting:
general purpose bit flag: (2 bytes)
compression method: (2 bytes)
For the ZIP files, there is a command zipinfo.
The zipfile Python module can be used to get info about a ZIP file.
The ZipInfo class provides information like filename, compress_type, compress_size, file_size, etc.
A Python snippet to get the filename and compress type of each file in a ZIP archive:
import zipfile

with zipfile.ZipFile(path_to_zipfile, 'r') as zf:
    for info in zf.infolist():
        print(f'filename: {info.filename}')
        print(f'compress type: {info.compress_type}')
This lists every filename and its compression type (an integer), which can be looked up to find the compression method.
You can get a lot more info about the files using infolist().
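To turn those integers into names, you can use the constants the zipfile module itself defines (a small sketch; archive.zip is a placeholder path):

import zipfile

METHOD_NAMES = {
    zipfile.ZIP_STORED: "stored (no compression)",
    zipfile.ZIP_DEFLATED: "deflate",
    zipfile.ZIP_BZIP2: "bzip2",
    zipfile.ZIP_LZMA: "lzma",
}

with zipfile.ZipFile("archive.zip") as zf:
    for info in zf.infolist():
        method = METHOD_NAMES.get(info.compress_type, f"unknown ({info.compress_type})")
        print(info.filename, "->", method)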
The Python module linked in the accepted answer is no longer available; the zipfile module might help.
The type is easy: just look at the file headers (PK for ZIP, Rar for RAR).
As for the rest, I doubt that information is available in the compressed content.
