Algorithm for compressing small files (345 Bytes) [closed]

I'm looking for software that can losslessly compress a small amount of noisy data; example:
ZmNiYWNma3F5gYqSmqCkpqenpKGdmJONiIN/e3d1c3FxcXFxcnN0dXZ3eXp7fHx9fn5/gIGDhISFhYWFhIOCgYGAgICBgoOEhYeIiYmJiYiGhIF+fHl3dHNyc3N1d3p9gIOGiYuMjY2NjIuKiIeFg4GAfnx7eXl4eHl5enx9fn5/gICAgYGBgYGBgYGBgoKDhISEhISEhIOCgoGAgH9/fn5+fn5/gICAgICAgICAgICAgICAf39/gICBgoOFhoeIiIiIiIeGhYOCgH99fHt6enp6e3x9foCAgoKDg4ODgoKBgH59fHt7e3t8fX5/gIKDhIWFhoaGhoaFhISDgoKBgQ==
Original data: 345B (100%),
gzip: 280B (81%),
bzip2: 289B (84%),
lzop: 415B (120%),
Are there any other methods I should try?

Since the data is Base64 encoded (every 3 bytes become 4 bytes), the first step would be to decode it (the compressed data will be binary anyway):
344 bytes -> 256 bytes
Then, a simple test with standard WinZip shows a compression to 170 bytes (COMP_DEFLATE block). You should get about the same with gzip/zlib.
This could probably be made somewhat smaller with a higher compression level.
Compressing the original (Base64) data gives 243 bytes (data block inside the .zip file; the full zip file is 359 bytes, but you don't need all that extra data).
So using zlib on the decoded data should compress it to about 170 bytes.
Looking at the decoded data, an even better compression would be possible. But that depends on the other data having the same structure.
Hex dump of the decoded data (a lot of values are repeating, or only changing slightly):
66 63 62 61 63 66 6B 71 79 81 8A 92 9A A0 A4 A6
A7 A7 A4 A1 9D 98 93 8D 88 83 7F 7B 77 75 73 71
71 71 71 71 72 73 74 75 76 77 79 7A 7B 7C 7C 7D
7E 7E 7F 80 81 83 84 84 85 85 85 85 84 83 82 81
81 80 80 80 81 82 83 84 85 87 88 89 89 89 89 88
86 84 81 7E 7C 79 77 74 73 72 73 73 75 77 7A 7D
80 83 86 89 8B 8C 8D 8D 8D 8C 8B 8A 88 87 85 83
81 80 7E 7C 7B 79 79 78 78 79 79 7A 7C 7D 7E 7E
7F 80 80 80 81 81 81 81 81 81 81 81 81 82 82 83
84 84 84 84 84 84 84 83 82 82 81 80 80 7F 7F 7E
7E 7E 7E 7E 7F 80 80 80 80 80 80 80 80 80 80 80
80 80 80 80 7F 7F 7F 80 80 81 82 83 85 86 87 88
88 88 88 88 87 86 85 83 82 80 7F 7D 7C 7B 7A 7A
7A 7A 7B 7C 7D 7E 80 80 82 82 83 83 83 83 82 82
81 80 7E 7D 7C 7B 7B 7B 7B 7C 7D 7E 7F 80 82 83
84 85 85 86 86 86 86 86 85 84 84 83 82 82 81 81
After only a quick look, it should be possible to reach about 2 to 3 bits per byte on average, resulting in 64 to 96 bytes.
A closer look at the data:
Most values don't change that much. If all data is similar to this, a high compression rate could be achieved using some custom code. For example, the differences could be stored in 1, 2, 3 or 4 bits depending on the block of data (4 bits only needed for the first data points). Another approach, instead of full custom code, would be to compress the differences (delta values) with an existing algorithm (zlib, Huffman coding, and others).
Decimal values with 2 rounds of delta-encoding:
102
99 -3
98 -1 2
97 -1 0
99 2 3
102 3 1
107 5 2
113 6 1
121 8 2
129 8 0
138 9 1
146 8 -1
154 8 0
160 6 -2
164 4 -2
166 2 -2
167 1 -1
167 0 -1
164 -3 -3
161 -3 0
157 -4 -1
152 -5 -1
147 -5 0
141 -6 -1
136 -5 1
131 -5 0
127 -4 1
123 -4 0
119 -4 0
117 -2 2
115 -2 0
113 -2 0
113 0 2
113 0 0
113 0 0
113 0 0
114 1 1
115 1 0
116 1 0
117 1 0
118 1 0
119 1 0
121 2 1
122 1 -1
123 1 0
124 1 0
124 0 -1
125 1 1
126 1 0
126 0 -1
127 1 1
128 1 0
129 1 0
131 2 1
132 1 -1
132 0 -1
133 1 1
133 0 -1
133 0 0
133 0 0
132 -1 -1
131 -1 0
130 -1 0
129 -1 0
129 0 1
128 -1 -1
128 0 1
128 0 0
129 1 1
130 1 0
131 1 0
132 1 0
133 1 0
135 2 1
136 1 -1
137 1 0
137 0 -1
137 0 0
137 0 0
136 -1 -1
134 -2 -1
132 -2 0
129 -3 -1
126 -3 0
124 -2 1
121 -3 -1
119 -2 1
116 -3 -1
115 -1 2
114 -1 0
115 1 2
115 0 -1
117 2 2
119 2 0
122 3 1
125 3 0
128 3 0
131 3 0
134 3 0
137 3 0
139 2 -1
140 1 -1
141 1 0
141 0 -1
141 0 0
140 -1 -1
139 -1 0
138 -1 0
136 -2 -1
135 -1 1
133 -2 -1
131 -2 0
129 -2 0
128 -1 1
126 -2 -1
124 -2 0
123 -1 1
121 -2 -1
121 0 2
120 -1 -1
120 0 1
121 1 1
121 0 -1
122 1 1
124 2 1
125 1 -1
126 1 0
126 0 -1
127 1 1
128 1 0
128 0 -1
128 0 0
129 1 1
129 0 -1
129 0 0
129 0 0
129 0 0
129 0 0
129 0 0
129 0 0
129 0 0
130 1 1
130 0 -1
131 1 1
132 1 0
132 0 -1
132 0 0
132 0 0
132 0 0
132 0 0
132 0 0
131 -1 -1
130 -1 0
130 0 1
129 -1 -1
128 -1 0
128 0 1
127 -1 -1
127 0 1
126 -1 -1
126 0 1
126 0 0
126 0 0
126 0 0
127 1 1
128 1 0
128 0 -1
128 0 0
128 0 0
128 0 0
128 0 0
128 0 0
128 0 0
128 0 0
128 0 0
128 0 0
128 0 0
128 0 0
128 0 0
128 0 0
127 -1 -1
127 0 1
127 0 0
128 1 1
128 0 -1
129 1 1
130 1 0
131 1 0
133 2 1
134 1 -1
135 1 0
136 1 0
136 0 -1
136 0 0
136 0 0
136 0 0
135 -1 -1
134 -1 0
133 -1 0
131 -2 -1
130 -1 1
128 -2 -1
127 -1 1
125 -2 -1
124 -1 1
123 -1 0
122 -1 0
122 0 1
122 0 0
122 0 0
123 1 1
124 1 0
125 1 0
126 1 0
128 2 1
128 0 -2
130 2 2
130 0 -2
131 1 1
131 0 -1
131 0 0
131 0 0
130 -1 -1
130 0 1
129 -1 -1
128 -1 0
126 -2 -1
125 -1 1
124 -1 0
123 -1 0
123 0 1
123 0 0
123 0 0
124 1 1
125 1 0
126 1 0
127 1 0
128 1 0
130 2 1
131 1 -1
132 1 0
133 1 0
133 0 -1
134 1 1
134 0 -1
134 0 0
134 0 0
134 0 0
133 -1 -1
132 -1 0
132 0 1
131 -1 -1
130 -1 0
130 0 1
129 -1 -1
129 0 1
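The delta-then-compress idea described above can be tried in a few lines. This is only a minimal Python sketch, not the code behind any of the sizes quoted in this thread: it uses the first two rows of the hex dump as sample input, the helper names `delta_encode` and `delta_decode` are made up for the illustration, and zlib stands in for "an existing algorithm". Sizes depend on the input and on the compressor settings, so none are quoted here.

```python
import zlib

# First two rows of the hex dump above, used as a small sample.
SAMPLE = bytes.fromhex(
    "66 63 62 61 63 66 6B 71 79 81 8A 92 9A A0 A4 A6 "
    "A7 A7 A4 A1 9D 98 93 8D 88 83 7F 7B 77 75 73 71"
)


def delta_encode(data: bytes, rounds: int = 1) -> bytes:
    """Replace each byte by its difference to the previous byte (mod 256)."""
    out = bytearray(data)
    for _ in range(rounds):
        previous = 0
        for i in range(len(out)):
            current = out[i]
            out[i] = (current - previous) & 0xFF
            previous = current
    return bytes(out)


def delta_decode(data: bytes, rounds: int = 1) -> bytes:
    """Invert delta_encode by accumulating the differences (mod 256)."""
    out = bytearray(data)
    for _ in range(rounds):
        previous = 0
        for i in range(len(out)):
            previous = (previous + out[i]) & 0xFF
            out[i] = previous
    return bytes(out)


if __name__ == "__main__":
    for rounds in (0, 1, 2):
        encoded = delta_encode(SAMPLE, rounds)
        packed = zlib.compress(encoded, 9)
        # The round trip must give back the original bytes exactly.
        assert delta_decode(zlib.decompress(packed), rounds) == SAMPLE
        print(f"{rounds} delta round(s): {len(SAMPLE)} -> {len(packed)} bytes")
```

On inputs this small, the fixed overhead of the zlib container matters, so it is worth comparing several round counts on real data before settling on one.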

Here is a complete set of classes that can be used to compress this data with the following approach:
First, Base64-decode the string to get bytes (344 characters -> 256 bytes)
Then delta-encode the bytes (that is, subtract the previous byte from each byte and store the differences)
Then apply another round of delta-encoding (a third round was worse than two)
Then compress the deltas using Huffman compression
This approach got down to 76 bytes including the necessary overhead to decompress later on.
A full Mercurial repository with the code can be found here.
Note! The code likely contains bugs around edge-cases such as empty or near-empty inputs. A suite of unit-tests can be found in the above linked repository but more testing is likely needed in order to trust this code for production usage.
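To get a feel for what Huffman coding buys on a given set of delta values before writing any bit-stream code, the payload size can be estimated from the symbol frequencies alone. The following is a rough Python sketch under stated assumptions: it is not a port of the classes below, the function names are hypothetical, and it counts only the encoded symbols, ignoring the signature, the length field and the serialized tree.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(data: bytes) -> dict:
    """Return {symbol: code length in bits} for a Huffman code built on data."""
    freq = Counter(data)
    if len(freq) == 1:
        # A lone symbol gets a zero-length code in this sketch.
        return {next(iter(freq)): 0}
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(n, next(tiebreak), [sym]) for sym, n in freq.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    while len(heap) > 1:
        n1, _, syms1 = heapq.heappop(heap)
        n2, _, syms2 = heapq.heappop(heap)
        # Every symbol under the merged node moves one level deeper.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (n1 + n2, next(tiebreak), syms1 + syms2))
    return lengths


def huffman_payload_bits(data: bytes) -> int:
    """Number of bits needed for the encoded symbols only."""
    lengths = huffman_code_lengths(data)
    return sum(lengths[b] for b in data)


if __name__ == "__main__":
    deltas = bytes([0, 0, 1, 255, 0, 1, 0, 0])  # made-up sample deltas
    bits = huffman_payload_bits(deltas)
    print(f"{len(deltas)} symbols -> {bits} payload bits")
```

Because small files leave little room to amortize the tree, the gap between this estimate and the real output size can be significant.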
Test program:
static void Main(string[] args)
{
string inputString = "ZmNiYWNma3F5gYqSmqCkpqenpKGdmJONiIN/e3d1c3FxcXFxcnN0dXZ3eXp7fHx9fn5/gIGDhISFhYWFhIOCgYGAgICBgoOEhYeIiYmJiYiGhIF+fHl3dHNyc3N1d3p9gIOGiYuMjY2NjIuKiIeFg4GAfnx7eXl4eHl5enx9fn5/gICAgYGBgYGBgYGBgoKDhISEhISEhIOCgoGAgH9/fn5+fn5/gICAgICAgICAgICAgICAf39/gICBgoOFhoeIiIiIiIeGhYOCgH99fHt6enp6e3x9foCAgoKDg4ODgoKBgH59fHt7e3t8fX5/gIKDhIWFhoaGhoaFhISDgoKBgQ==";
byte[] original = Convert.FromBase64String(inputString);
Console.WriteLine($"Original String: {inputString.Length}");
Console.WriteLine($"Original bytes: {original.Length}");
byte[] deltaEncoded = DeltaEncoderDecoder.Encode(original, 2);
byte[] compressed = HuffmanCompression.Compress(deltaEncoded);
Console.WriteLine($"Compressed bytes: {compressed.Length}");
byte[] deltaDecoded = HuffmanCompression.Decompress(compressed);
byte[] decompressed = DeltaEncoderDecoder.Decode(deltaDecoded, 2);
Console.WriteLine($"Decompressed bytes: {decompressed.Length}");
Console.WriteLine($"Decompressed == original: {decompressed.Length == original.Length && Enumerable.Range(0, original.Length).All(index => original[index] == decompressed[index])}");
}
Output:
Original String: 344
Original bytes: 256
Compressed bytes: 76
Decompressed bytes: 256
Decompressed == original: True
Here are the necessary classes to both compress and decompress the data:
public static class HuffmanCompression
{
internal const byte CompressedSignature = 255;
internal const byte UncompressedSignature = 0;
[NotNull]
public static byte[] Compress([NotNull] byte[] input)
{
if (input == null)
throw new ArgumentNullException(nameof(input));
if (input.Length == 0)
return input;
var rootNode = GetNodesFromRawInput(input);
var bitStrings = GetBitStringsFromTree(rootNode);
var output = new MemoryStream(input.Length);
var writer = new BitStreamWriter(output);
writer.Write(CompressedSignature);
writer.Write(input.Length);
WriteNodes(writer, rootNode);
WriteStrings(writer, bitStrings, input);
writer.Flush();
if (output.Length < input.Length + 1)
return output.ToArray();
return EncodeAsUncompressed(input);
}
[NotNull]
private static byte[] EncodeAsUncompressed([NotNull] byte[] input)
{
var output = new MemoryStream();
output.WriteByte(UncompressedSignature);
output.Write(input, 0, input.Length);
return output.ToArray();
}
private static void WriteStrings([NotNull] BitStreamWriter writer, [NotNull] string[] bitStrings, [NotNull] byte[] input)
{
foreach (byte value in input)
{
Assume(bitStrings[value] != null);
foreach (char bitChar in bitStrings[value])
writer.Write(bitChar == '1');
}
}
private static void WriteNodes([NotNull] BitStreamWriter writer, [NotNull] Node node)
{
if (node.Left == null)
{
writer.Write(false);
writer.Write(node.Value);
}
else
{
Assume(node.Right != null);
writer.Write(true);
WriteNodes(writer, node.Left);
WriteNodes(writer, node.Right);
}
}
[NotNull, ItemNotNull]
private static string[] GetBitStringsFromTree([NotNull] Node node)
{
var result = new string[256];
TraverseToGetBitStringsFromTree(node, string.Empty, result);
return result;
}
private static void TraverseToGetBitStringsFromTree([NotNull] Node node, [NotNull] string prefix, [NotNull, ItemNotNull] string[] dictionary)
{
if (node.Left != null)
{
Assume(node.Right != null);
TraverseToGetBitStringsFromTree(node.Left, prefix + "0", dictionary);
TraverseToGetBitStringsFromTree(node.Right, prefix + "1", dictionary);
}
else
dictionary[node.Value] = prefix;
}
[NotNull]
private static Node GetNodesFromRawInput([NotNull] byte[] input)
{
var occurances = new int[256];
foreach (byte value in input)
occurances[value]++;
var nodes = new List<Node>(256);
for (int index = 0; index < 256; index++)
if (occurances[index] > 0)
nodes.Add(new Node
{
Occurances = occurances[index],
Value = (byte)index
});
while (nodes.Count > 1)
{
nodes.Sort((n1, n2) =>
{
Assume(n1 != null && n2 != null);
return n1.Occurances.CompareTo(n2.Occurances);
});
Assume(nodes[0] != null && nodes[1] != null);
nodes[0] = new Node
{
Left = nodes[0],
Right = nodes[1],
Occurances = nodes[0].Occurances + nodes[1].Occurances
};
nodes.RemoveAt(1);
}
Assume(nodes[0] != null);
return nodes[0];
}
[NotNull]
public static byte[] Decompress([NotNull] byte[] input)
{
if (input == null)
throw new ArgumentNullException(nameof(input));
if (input.Length == 0)
return input;
if (input[0] != CompressedSignature)
return DecodeUncompressed(input);
var reader = new BitStreamReader(new MemoryStream(input));
reader.ReadByte(); // skip signature
int length = reader.ReadInt32();
var rootNode = ReadNodes(reader);
var output = new byte[length];
for (int index = 0; index < length; index++)
output[index] = DecompressOneByte(reader, rootNode);
return output;
}
private static byte DecompressOneByte([NotNull] BitStreamReader reader, [NotNull] Node node)
{
while (node.Left != null)
{
if (reader.ReadBit())
node = node.Right;
else
node = node.Left;
Assume(node != null);
}
return node.Value;
}
[NotNull]
private static Node ReadNodes([NotNull] BitStreamReader reader)
{
if (reader.ReadBit())
return new Node
{
Left = ReadNodes(reader),
Right = ReadNodes(reader)
};
return new Node
{
Value = reader.ReadByte()
};
}
[NotNull]
private static byte[] DecodeUncompressed([NotNull] byte[] input)
{
return input.Skip(1).ToArray();
}
}
public class BitStreamReader
{
[NotNull]
private readonly MemoryStream _Source;
private byte _Buffer;
private int _InBuffer;
public BitStreamReader([NotNull] MemoryStream source)
{
if (source == null)
throw new ArgumentNullException(nameof(source));
_Source = source;
}
public bool ReadBit()
{
if (_InBuffer == 0)
FillBuffer();
return (_Buffer & BitStreamConstants.BitMasks[8 - _InBuffer--]) != 0;
}
public byte ReadByte()
{
if (_InBuffer == 8)
{
_InBuffer = 0;
return _Buffer;
}
return (byte)((ReadBit() ? 128 : 0) | (ReadBit() ? 64 : 0) | (ReadBit() ? 32 : 0) | (ReadBit() ? 16 : 0) | (ReadBit() ? 8 : 0) | (ReadBit() ? 4 : 0) | (ReadBit() ? 2 : 0) | (ReadBit() ? 1 : 0));
}
public int ReadInt32()
{
int result = 0;
if (ReadBit())
result |= ReadByte();
if (ReadBit())
result |= ReadByte() << 8;
if (ReadBit())
result |= ReadByte() << 16;
if (ReadBit())
result |= ReadByte() << 24;
return result;
}
private void FillBuffer()
{
int value = _Source.ReadByte();
if (value < 0)
throw new InvalidOperationException("Read past end of source stream");
_Buffer = (byte)value;
_InBuffer = 8;
}
}
public class BitStreamWriter
{
[NotNull]
private readonly MemoryStream _Target;
private byte _Buffer;
private int _InBuffer;
public BitStreamWriter([NotNull] MemoryStream target)
{
if (target == null)
throw new ArgumentNullException(nameof(target));
_Target = target;
}
public void Flush()
{
if (_InBuffer == 0)
return;
_Target.WriteByte(_Buffer);
_Buffer = 0;
_InBuffer = 0;
}
public void Write(bool bit)
{
unchecked
{
if (bit)
_Buffer = (byte)(_Buffer | 1 << (7 - _InBuffer));
if (++_InBuffer == 8)
Flush();
}
}
public void Write(byte value)
{
for (int index = 0; index < 8; index++)
Write((value & BitStreamConstants.BitMasks[index]) != 0);
}
public void Write(int value)
{
byte b0 = (byte)(value & 0xff);
byte b1 = (byte)((value >> 8) & 0xff);
byte b2 = (byte)((value >> 16) & 0xff);
byte b3 = (byte)((value >> 24) & 0xff);
Write(b0 != 0);
if (b0 != 0)
Write(b0);
Write(b1 != 0);
if (b1 != 0)
Write(b1);
Write(b2 != 0);
if (b2 != 0)
Write(b2);
Write(b3 != 0);
if (b3 != 0)
Write(b3);
}
}
internal static class BitStreamConstants
{
[NotNull]
public static readonly byte[] BitMasks = { 128, 64, 32, 16, 8, 4, 2, 1 };
public const byte CompressedSignature = 255;
}
public static class DeltaEncoderDecoder
{
[NotNull]
public static byte[] Encode([NotNull] byte[] input, int iterations)
{
if (input == null)
throw new ArgumentNullException(nameof(input));
var output = new byte[input.Length];
Buffer.BlockCopy(input, 0, output, 0, input.Length);
while (iterations-- > 0)
{
byte previous = 0;
for (int index = 0; index < output.Length; index++)
{
byte current = output[index];
output[index] = (byte)(current - previous);
previous = current;
}
}
return output;
}
[NotNull]
public static byte[] Decode([NotNull] byte[] input, int iterations)
{
if (input == null)
throw new ArgumentNullException(nameof(input));
var output = new byte[input.Length];
Buffer.BlockCopy(input, 0, output, 0, input.Length);
while (iterations-- > 0)
{
byte previous = 0;
for (int index = 0; index < output.Length; index++)
{
output[index] = (byte)(previous + output[index]);
previous = output[index];
}
}
return output;
}
}

Related

how to strip out non-human-readable character at the start of each line using Xcode

I am trying to set up Xcode to get rid of non-human readable characters in legacy text files recovered from 8” floppy disks created in 1986. The files were created in QDOS, a proprietary disk operating system using a text-based Music Composition Language application aka MCL.
I aim to write a C program to read the ASCII file character by character, filter out non-printable characters from the source file, and save the result to a destination file, thereby making it possible to view the file contents in exactly the same format a composer would have seen in 1986.
When Xcode reads the legacy text file, the unwanted character appears as the first human readable character of every line except the first line.
!B=24:Af
* BAR 1
G2,6
* BAR 2 & 3
!G2,1/4:Bf2,1/4:C2,1/4:Ef2,1/4:F3,1/4:G3,35/4:D3:A4
"* BAR 4
#Bf4:G4,2:D3:A4:Bf4
$* BAR 5
%D4,2:C4,3:F5
&* BAR 6
'D4:Bf4:A4,2:G4:D3:?
(* BAR 7 &
A hex dump of the above text file shows the two ascii bytes $0D (Carriage Return) followed by $1C (File Separator). These two bytes plus the byte that follows immediately after them, are the characters I am trying to remove.
0000: 1C 1D 21 42 3D 32 34 3A 41 66 0A 1C 1E 2A 20 20 ¿¿!B=24:Af¬¿¿*
0010: 20 20 20 20 20 20 20 20 20 42 41 52 20 31 0A 1C BAR 1¬¿
0020: 1F 47 32 2C 36 0A 1C 20 2A 20 20 20 20 20 20 20 ¿G2,6¬¿ *
0030: 20 20 20 20 42 41 52 20 32 20 26 20 33 0A 1C 21 BAR 2 & 3¬¿!
0040: 47 32 2C 31 2F 34 3A 42 66 32 2C 31 2F 34 3A 43 G2,1/4:Bf2,1/4:C
0050: 32 2C 31 2F 34 3A 45 66 32 2C 31 2F 34 3A 46 33 2,1/4:Ef2,1/4:F3
0060: 2C 31 2F 34 3A 47 33 2C 33 35 2F 34 3A 44 33 3A ,1/4:G3,35/4:D3:
0070: 41 34 0A 1C 22 2A 20 20 20 20 20 20 20 20 20 20 A4¬¿"*
0080: 20 42 41 52 20 34 20 0A 1C 23 42 66 34 3A 47 34 BAR 4 ¬¿#Bf4:G4
0090: 2C 32 3A 44 33 3A 41 34 3A 42 66 34 0A 1C 24 2A ,2:D3:A4:Bf4¬¿$*
00A0: 20 20 20 20 20 20 20 20 20 20 20 42 41 52 20 35 BAR 5
00B0: 0A 1C 25 44 34 2C 32 3A 43 34 2C 33 3A 46 35 0A ¬¿%D4,2:C4,3:F5¬
00C0: 1C 26 2A 20 20 20 20 20 20 20 20 20 20 20 42 41 ¿&* BA
00D0: 52 20 36 0A 1C 27 44 34 3A 42 66 34 3A 41 34 2C R 6¬¿'D4:Bf4:A4,
00E0: 32 3A 47 34 3A 44 33 3A 3F 0A 1C 28 2A 20 20 20 2:G4:D3:?¬¿(*
00F0: 20 20 20 20 20 20 20 20 42 41 52 20 37 20 26 20 BAR 7 &
I created an Xcode Command Line Tool Project. When I select Type : Plain Text and Text Encoding : Unicode (UTF-8) in the Xcode Inspectors Window the same single printable character is visible. I chose those settings because my MacOS expects en_AU.UTF-8.
The C code that follows makes an identical copy of the text file without identifying individual characters. Essentially it will read old file contents and write a new file successfully. The hex dump for the output file is identical to the hex dump above.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, const char * argv[]) {
    char filename[] = {"~/Desktop/MCLRead/bell1.ss"} ;
    printf("MCLRead\n\t%s\n", filename);
    FILE* fin = fopen(filename, "r");
    if (!fin) { perror("input error"); return 0; }
    FILE* fout = fopen("output.txt", "w");
    if (!fout) { perror("fout error"); return 0; }
    fseek(fin, 0, SEEK_END); // go to the end of file
    size_t filesize = ftell(fin); // get file size
    fseek(fin, 0, SEEK_SET); // go back to the beginning
    //allocate enough memory
    char* buffer = malloc(filesize * sizeof(char));
    //read one character at a time (or `fread` the whole file)
    size_t i = 0;
    while (1)
    {
        int c = fgetc(fin);
        if (c == EOF) break;
        //save to buffer
        buffer[i++] = (char)c;
    }
However when I compile, build and run this in Xcode the characters are unrecognisable regardless of the Type or Text Encoding settings in the Xcode Inspectors Window. The following error message appears in the Console Window
error: No such file or directory
Program ended with exit code: 0
When I run the same code in the Terminal Window it generates an output text file but the characters are unrecognisable
Desktop % gcc main.c
Desktop % ./a.out output.txt
Desktop % cat output.txt
cat results in a string of 128 ? characters in the Terminal Command Line - a total of 128 even though the file contains more than a thousand characters in total.
Can someone give me any clues for making this text file readable in a format that allows the non-human-readable characters to be stripped from the start of each line?
Please note, I am not asking for help to write the C code but rather what Text Format will make the unwanted 8-bit characters readable so I can remove them (a slight refinement on the question I asked initially). Any further help would be most appreciated. Thanks in advance.
Note
This post has been revised in response to comments.
The hex dump has been done as text rather than as an image. This offers the most reliable way to share the text file for anyone who wants to test what I have done

The problem can be solved easily by reading each byte as a 7-bit binary value into an int rather than a char. The source bytes are examined as hex values, stored as integers, and written out as text.
Note. There is no EOF character. MCL used the word 'END' at the end of the file. Because it has been salvaged from a floppy disk image, the file sometimes has a trailing string of hex E5 characters written on the floppy disk when it was formatted. At other times where the format track is already overwritten the file has a trailing string of zeros.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CR 0x0D         // ASCII Carriage Return
#define FS 0x1C         // ASCII File Separator
#define FD_FORMAT 0xE5  // floppy disk format track

int main(int argc, const char * argv[])
{
    char fname[20];
    printf("\n Enter MCL file name : ");
    if (scanf("%19s", fname) != 1)              // bounded read, cannot overflow fname
        return EXIT_FAILURE;
    printf("\n\t%s\n", fname);

    int a = 0;                                  // byte read two steps ago
    int b = a;                                  // byte read one step ago

    FILE* fin = fopen(fname, "r");              // init read
    if (!fin)
    {
        perror("input error");
        return EXIT_FAILURE;
    }
    FILE* fout = fopen("output.txt", "w");      // init write
    if (!fout)
    {
        perror("fout error");
        fclose(fin);
        return EXIT_FAILURE;
    }

    fseek(fin, 0, SEEK_END);                    // look for end of file
    size_t fsize = ftell(fin);                  // get file size
    fseek(fin, 0, SEEK_SET);                    // go back to the start

    int* buffer = malloc(fsize * sizeof(int));  // allocate buffer
    if (!buffer)
    {
        perror("malloc error");
        fclose(fin);
        fclose(fout);
        return EXIT_FAILURE;
    }

    size_t n = 0;                               // number of bytes saved
    while (1)
    {
        int c = fgetc(fin);                     // read one byte at a time
        if (c < CR) break;                      // stop at EOF or a low control code
        if (c == FD_FORMAT) break;              // stop at floppy format track
        printf("\t%X", a);
        printf("\t%X", b);
        if ((a != CR) && (b != FS))             // skip the marker byte after CR FS
        {
            printf("\t%0X\n", c);
            buffer[n++] = c;                    // save to buffer
        }
        a = b;
        b = c;
    }

    for (size_t i = 0; i < n; i++)              // write out only the saved bytes
        fputc(buffer[i], fout);

    free(buffer);
    fclose(fin);
    fclose(fout);
    return 0;
}
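The same filtering idea can be sketched on raw bytes in a few lines of Python. This is not a translation of the C program above, only an illustration under assumptions taken from the hex dump in the question: every unwanted marker is a 0x1C (File Separator) byte followed by exactly one counter byte, and any 0xE5 or 0x00 byte marks the start of trailing filler. The function name `strip_line_markers` is hypothetical.

```python
FS = 0x1C               # ASCII File Separator that starts each marker
PADDING = (0xE5, 0x00)  # assumed trailing filler from the disk image


def strip_line_markers(raw: bytes) -> bytes:
    """Drop every FS byte together with the single byte that follows it."""
    out = bytearray()
    i = 0
    while i < len(raw):
        byte = raw[i]
        if byte in PADDING:
            break       # assumed end of the useful data
        if byte == FS:
            i += 2      # skip FS and the counter byte after it
            continue
        out.append(byte)
        i += 1
    return bytes(out)


if __name__ == "__main__":
    # First 16 bytes of the hex dump in the question.
    sample = bytes.fromhex("1C 1D 21 42 3D 32 34 3A 41 66 0A 1C 1E 2A 20 20")
    print(strip_line_markers(sample))  # prints b'!B=24:Af\n*  '
```

Working on `bytes` avoids any text-encoding decision until after the markers are gone; the cleaned result can then be decoded as ASCII.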

Is there a better way to write HQL queries in Spring Data JPA that has many conditions?

Below I present a hypothetical scenario where the age of a user is converted to words in user resource and I want to sort the user data based on this new string i.e. eight has the highest priority in this order.
@Repository
public interface UserRepository extends JpaRepository<User, Long>, QueryByExampleExecutor<User> {
@Query(value = "SELECT DISTINCT(u) FROM user u ORDER BY " +
"CASE WHEN u.age = 8 THEN 1 WHEN u.age = 18 THEN 2 WHEN u.age = 80 THEN 3 " +
"WHEN u.age = 88 THEN 4 WHEN u.age = 85 THEN 5 WHEN u.age = 84 THEN 6 WHEN u.age = 89 THEN 7 WHEN u.age = 81 THEN 8 " +
"WHEN u.age = 87 THEN 9 WHEN u.age = 86 THEN 10 WHEN u.age = 83 THEN 11 WHEN u.age = 82 THEN 12 WHEN u.age = 11 THEN " +
"13 WHEN u.age = 15 THEN 14 WHEN u.age = 50 THEN 15 WHEN u.age = 58 THEN 16 WHEN u.age = 55 THEN 17 WHEN u.age = 54 " +
"THEN 18 WHEN u.age = 59 THEN 19 WHEN u.age = 51 THEN 20 WHEN u.age = 57 THEN 21 WHEN u.age = 56 THEN 22 WHEN u.age = " +
"53 THEN 23 WHEN u.age = 52 THEN 24 WHEN u.age = 5 THEN 25 WHEN u.age = 40 THEN 26 WHEN u.age = 48 THEN 27 WHEN u.age " +
"= 45 THEN 28 WHEN u.age = 44 THEN 29 WHEN u.age = 49 THEN 30 WHEN u.age = 41 THEN 31 WHEN u.age = 47 THEN 32 WHEN u" +
".age = 46 THEN 33 WHEN u.age = 43 THEN 34 WHEN u.age = 42 THEN 35 WHEN u.age = 4 THEN 36 WHEN u.age = 14 THEN 37 WHEN u.age = 9 THEN 38 WHEN u.age = 19 THEN 39 WHEN u.age = 90 THEN 40 WHEN u.age = 98 THEN 41 WHEN u.age = 95 THEN 42 WHEN u.age = 94 THEN 43 WHEN u.age = 99 THEN 44 WHEN u.age = 91 THEN 45 WHEN u.age = 97 THEN 46 WHEN u.age = 96 THEN 47 WHEN u.age = 93 THEN 48 WHEN u.age = 92 THEN 49 WHEN u.age = 1 THEN 50 WHEN u.age = 100 THEN 51 WHEN u.age = 7 THEN 52 WHEN u.age = 17 THEN 53 WHEN u.age = 70 THEN 54 WHEN u.age = 78 THEN 55 WHEN u.age = 75 THEN 56 WHEN u.age = 74 THEN 57 WHEN u.age = 79 THEN 58 WHEN u.age = 71 THEN 59 WHEN u.age = 77 THEN 60 WHEN u.age = 76 THEN 61 WHEN u.age = 73 THEN 62 WHEN u.age = 72 THEN 63 WHEN u.age = 6 THEN 64 WHEN u.age = 16 THEN 65 WHEN u.age = 60 THEN 66 WHEN u.age = 68 THEN 67 WHEN u.age = 65 THEN 68 WHEN u.age = 64 THEN 69 WHEN u.age = 69 THEN 70 WHEN u.age = 61 THEN 71 WHEN u.age = 67 THEN 72 WHEN u.age = 66 THEN 73 WHEN u.age = 63 THEN 74 WHEN u.age = 62 THEN 75 WHEN u.age = 10 THEN 76 WHEN u.age = 13 THEN 77 WHEN u.age = 30 THEN 78 WHEN u.age = 38 THEN 79 WHEN u.age = 35 THEN 80 WHEN u.age = 34 THEN 81 WHEN u.age = 39 THEN 82 WHEN u.age = 31 THEN 83 WHEN u.age = 37 THEN 84 WHEN u.age = 36 THEN 85 WHEN u.age = 33 THEN 86 WHEN u.age = 32 THEN 87 WHEN u.age = 3 THEN 88 WHEN u.age = 12 THEN 89 WHEN u.age = 20 THEN 90 WHEN u.age = 28 THEN 91 WHEN u.age = 25 THEN 92 WHEN u.age = 24 THEN 93 WHEN u.age = 29 THEN 94 WHEN u.age = 21 THEN 95 WHEN u.age = 27 THEN 96 WHEN u.age = 26 THEN 97 WHEN u.age = 23 THEN 98 WHEN u.age = 22 THEN 99 WHEN u.age = 2 THEN 100 ELSE 0 END ASC")
Page<User> findAllByCustomOrder(Pageable pageable);
}
My User class is like this -
@Entity(name = "user")
public class User {
    private @Id @GeneratedValue Long id;
    private final String firstName;
    private final String lastName;
    private final Integer age;
    // standard getters, setters and constructors

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || Hibernate.getClass(this) != Hibernate.getClass(o)) return false;
        User user = (User) o;
        return id != null && Objects.equals(id, user.id);
    }

    @Override
    public int hashCode() {
        return getClass().hashCode();
    }
}
My UserResource class is like this -
public class UserResource {
private String firstName;
private String lastName;
private String age;
// standard getters, setters and constructors
}
In my original situation I have over 2000 distinct conditions so this method is clearly not the right way to go.
An alternative solution to this might be somehow passing a custom comparator to Spring Data JPA's method so that it uses that comparator while sorting the data. But I could not find if there exists such a convenient mechanism.
Any help would be much appreciated. :pray:

Since you seem to care about database-based pagination, your best bet is to use some sort of SQL-based sorting. You could move this big CASE expression into a database function, or put the mapping into a table that you join, and sort by a column of that table. A comparator wouldn't work, because the sorting has to be done by the database for pagination to work properly.
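The mapping-table variant can be tried outside Spring to see the shape of the query. The sketch below uses Python's built-in `sqlite3` module instead of JPA, purely as an illustration: the table names `app_user` and `age_rank`, the sample rows and the helper names are all made up, and the four priorities are copied from the first branches of the CASE expression in the question. Ages without a mapping row sort with priority 0, mirroring the `ELSE 0`.

```python
import sqlite3


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE app_user (id INTEGER PRIMARY KEY, first_name TEXT, age INTEGER);
        CREATE TABLE age_rank (age INTEGER PRIMARY KEY, priority INTEGER NOT NULL);
        """
    )
    conn.executemany(
        "INSERT INTO age_rank (age, priority) VALUES (?, ?)",
        [(8, 1), (18, 2), (80, 3), (88, 4)],
    )
    conn.executemany(
        "INSERT INTO app_user (first_name, age) VALUES (?, ?)",
        [("Ann", 88), ("Bob", 8), ("Cy", 80), ("Di", 18), ("Ed", 33)],
    )
    return conn


def page_by_custom_order(conn: sqlite3.Connection, limit: int, offset: int):
    """Return one page of (first_name, age), sorted by the mapped priority."""
    return conn.execute(
        """
        SELECT u.first_name, u.age
        FROM app_user AS u
        LEFT JOIN age_rank AS r ON r.age = u.age
        ORDER BY COALESCE(r.priority, 0), u.id
        LIMIT ? OFFSET ?
        """,
        (limit, offset),
    ).fetchall()


if __name__ == "__main__":
    conn = demo_connection()
    print(page_by_custom_order(conn, 2, 0))
    print(page_by_custom_order(conn, 2, 2))
```

The extra `u.id` in the ORDER BY is an addition of this sketch: it keeps the order of rows with equal priority stable from one page to the next.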

An allocation problem of people into groups getting them to meet as little as possible

I have been trying to solve this for quite a while without success.
I searched Wikipedia but have not found any theory that could help me.
Here is the problem.
I have a group of n players (more than 7).
I have a game (Diplomacy, for those who know it!) that requires 7 players, one for each of these roles: E, F, G, I, A, R and T (countries, in fact).
I want to set up a tournament (many games).
There will be n games.
(*) Every player plays in 7 different games, with a different role each time
(**) Every game gets 7 different players
=> That is very easy to do.
However, things get tough when you want to limit interactions between players.
What I want is for any player to interact (interact = play in the same game) at most once with any other player.
(In other words, I want to prevent players from making deals such as: "I help you in game A, you help me in game B".)
So:
Question 1: For which n is this possible? (obviously at least 50)
Question 2: When it is possible, how do you do it?
Question 3: What is the algorithm to minimize these interactions when it is not possible?
For the record, I did implement a trial-and-error program in Python (using recursion), which works quite well, but I can never get the maximum number of interactions between players limited to 1 (endless calculations).
Thanks for any help!
PS This is no homework, it is for actually designing game tournaments :-)

I did some doodling and I think, if I understand you correctly, that you can do the following:
Use 28 people for 7 roles in 7 games, with any two players meeting at most once.
If each position within a game is assigned a distinct role, then no person plays the same role twice in the games they play.
Python
# -*- coding: utf-8 -*-
"""
https://stackoverflow.com/questions/71143133/an-allocation-problem-of-people-into-groups-getting-them-to-meet-as-little-as-po
Created on Sat Feb 19 21:15:02 2022

@author: paddy3118

By hand to get the algo:
    0 roles, 0 people for 0 interactions
    1 role, 1 person
    2 roles, 3 people: p0+p1, p1+p2
    3 roles, 012, 134, 235 = 6 people
    4 roles, 0123, 1456, 2478, 3579 = 10 people
"""
from itertools import count


def new_person():
    yield from count()


def people_allocation(roles: int=4):
    games, new_p = [], count()
    for i in range(roles):
        game = []
        games.append(game)
        for j in range(i):
            game.append(games[j][i])
        for _ in range(i, roles):
            game.append(next(new_p))
    return games, next(new_p)


for roles in range(8):
    print(f"Roles = {roles}:")
    games, people = people_allocation(roles)
    print(f" Takes {people} people in the following games")
    print(' ',
          ', '.join('+'.join(f"P{x}" for x in game) for game in games))
Output
Roles = 0:
Takes 0 people in the following games
Roles = 1:
Takes 1 people in the following games
P0
Roles = 2:
Takes 3 people in the following games
P0+P1, P1+P2
Roles = 3:
Takes 6 people in the following games
P0+P1+P2, P1+P3+P4, P2+P4+P5
Roles = 4:
Takes 10 people in the following games
P0+P1+P2+P3, P1+P4+P5+P6, P2+P5+P7+P8, P3+P6+P8+P9
Roles = 5:
Takes 15 people in the following games
P0+P1+P2+P3+P4, P1+P5+P6+P7+P8, P2+P6+P9+P10+P11, P3+P7+P10+P12+P13, P4+P8+P11+P13+P14
Roles = 6:
Takes 21 people in the following games
P0+P1+P2+P3+P4+P5, P1+P6+P7+P8+P9+P10, P2+P7+P11+P12+P13+P14, P3+P8+P12+P15+P16+P17, P4+P9+P13+P16+P18+P19, P5+P10+P14+P17+P19+P20
Roles = 7:
Takes 28 people in the following games
P0+P1+P2+P3+P4+P5+P6, P1+P7+P8+P9+P10+P11+P12, P2+P8+P13+P14+P15+P16+P17, P3+P9+P14+P18+P19+P20+P21, P4+P10+P15+P19+P22+P23+P24, P5+P11+P16+P20+P23+P25+P26, P6+P12+P17+P21+P24+P26+P27
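A schedule like the ones printed above can be checked mechanically for the "meet at most once" condition. The helper below is a small sketch with a hypothetical name, `max_meetings`; it checks only the pairwise-meeting condition, not how many games each player takes part in or which roles they play.

```python
from collections import Counter
from itertools import combinations


def max_meetings(games):
    """Return the largest number of games shared by any pair of players."""
    meetings = Counter()
    for game in games:
        # Each unordered pair of distinct players in a game meets once.
        for pair in combinations(sorted(set(game)), 2):
            meetings[pair] += 1
    return max(meetings.values(), default=0)


if __name__ == "__main__":
    # The "Roles = 4" schedule from the output above.
    four_roles = [[0, 1, 2, 3], [1, 4, 5, 6], [2, 5, 7, 8], [3, 6, 8, 9]]
    print(max_meetings(four_roles))  # prints 1
```

A result of 1 means no two players share more than one game; anything larger points at a pair that could trade favours across games.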

Assuming that n ≥ 78 or so, the following simple hill climbing algorithm with periodic restarts will return a solution.
The algorithmic idea is to initialize games where each player plays each role exactly once, then drive the number of conflicts to zero (where a conflict is a player playing two roles in a single game, or two players meeting each other more than once) by choosing two random games and a random role and swapping the players involved. We restart every 10^7 steps because that seems to work well in practice.
Doubtless we could do a little better with constraint programming.
#include <algorithm>
#include <array>
#include <cstdlib>
#include <iostream>
#include <random>
#include <vector>
constexpr int r = 7;
int main() {
int n;
std::cin >> n;
if (n <= r * (r - 1)) {
return EXIT_FAILURE;
}
std::uniform_int_distribution<int> uniform_game(0, n - 1);
std::uniform_int_distribution<int> uniform_role(0, r - 1);
std::random_device device;
std::default_random_engine engine(device());
while (true) {
std::vector<std::array<int, r>> games(n);
for (int i = 0; i < n; i++) {
for (int j = 0; j < r; j++) {
games[i][j] = (i + j) % n;
}
}
int badness = 0;
std::vector<std::vector<int>> pair_counts(n, std::vector<int>(n, 0));
auto count = [&badness, &games, &pair_counts](int i, int j, int increment) {
for (int k = 0; k < r; k++) {
if (k == j) continue;
auto [a, b] = std::minmax(games[i][j], games[i][k]);
badness -= pair_counts[a][b] > (a != b ? 2 : 0);
pair_counts[a][b] += increment;
badness += pair_counts[a][b] > (a != b ? 2 : 0);
}
};
for (int i = 0; i < n; i++) {
for (int j = 0; j < r; j++) {
count(i, j, 1);
}
}
for (long t = 0; t < 10000000; t++) {
int i1;
int i2;
do {
i1 = uniform_game(engine);
i2 = uniform_game(engine);
} while (i1 == i2);
int j = uniform_role(engine);
auto swap_players = [&]() {
count(i2, j, -2);
count(i1, j, -2);
std::swap(games[i1][j], games[i2][j]);
count(i1, j, 2);
count(i2, j, 2);
};
int old_badness = badness;
swap_players();
if (old_badness < badness) {
swap_players();
} else if (badness < old_badness) {
std::cerr << badness << '\n';
}
if (badness <= 0) {
for (int i = 0; i < n; i++) {
for (int j = 0; j < r; j++) {
if (j) std::cout << ' ';
std::cout << games[i][j];
}
std::cout << '\n';
}
return EXIT_SUCCESS;
}
}
}
}
Sample output:
21 38 61 75 77 2 22
70 31 75 7 15 59 69
28 52 29 73 59 23 40
61 45 16 65 35 15 55
12 72 44 45 46 14 10
57 1 3 38 11 6 49
20 6 7 26 0 74 18
54 73 67 58 6 55 75
73 77 63 36 3 0 45
37 57 55 28 34 43 7
17 46 36 66 16 7 48
74 9 24 22 17 73 15
36 50 4 69 28 65 6
59 62 12 32 24 20 51
38 8 59 17 10 19 39
6 53 21 70 13 71 56
55 33 49 59 5 27 36
15 71 33 54 43 18 29
60 36 8 40 71 51 67
19 49 9 34 45 53 60
41 26 73 21 72 35 19
14 64 42 15 57 63 62
44 11 23 27 9 16 21
2 7 68 9 63 52 54
35 18 2 3 60 64 17
29 17 50 41 31 61 57
47 10 32 25 75 28 35
30 48 64 6 32 39 44
46 22 71 35 20 31 11
43 76 41 11 47 60 14
56 2 1 33 74 10 37
51 41 39 0 33 70 34
32 5 18 31 23 76 68
65 43 53 8 27 46 73
63 44 26 5 70 24 28
62 75 66 71 44 3 41
16 69 54 13 22 41 5
58 19 52 57 36 22 70
42 25 22 44 55 56 76
4 30 35 51 76 13 9
72 13 17 62 40 77 43
53 16 10 68 50 58 64
64 47 70 55 38 40 20
50 59 14 48 1 9 71
40 63 19 76 69 1 12
24 35 69 56 68 57 27
67 34 15 72 56 66 32
68 3 46 37 25 21 59
77 54 47 19 4 44 31
76 67 38 46 52 50 33
25 15 77 1 51 26 23
3 61 40 4 53 5 74
45 51 74 29 48 68 47
75 0 57 23 30 8 72
13 27 28 14 19 67 2
9 20 72 42 65 33 3
49 58 48 20 41 37 77
22 12 37 67 39 47 53
26 42 11 61 37 36 13
1 60 5 39 29 75 46
48 40 65 10 54 34 26
23 66 60 50 42 54 24
8 74 13 63 49 32 50
27 29 58 12 7 38 42
7 4 45 24 64 25 8
71 24 30 47 2 49 16
31 28 56 60 12 48 0
10 23 62 49 67 69 61
34 21 31 30 58 62 1
66 70 27 18 61 30 25
0 37 76 64 66 29 65
69 39 43 2 26 45 58
39 65 25 74 62 11 52
5 56 20 52 14 17 30
33 68 6 77 8 12 66
11 55 51 53 18 72 63
52 32 0 43 21 42 4
18 14 34 16 73 4 38
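A schedule like the sample output above can be checked independently of the search code. The following is a minimal Python sketch, assuming the constraint is the one encoded by the badness counter in the C++ program: no player appears twice in a game, and no pair of players shares more than one game. The function name count_violations is made up for this illustration.

```python
from collections import Counter
from itertools import combinations


def count_violations(games):
    """Return (repeated_pairs, duplicate_players) for a list of games.

    Each game is a sequence of player numbers. This is an illustrative
    checker, not part of the original program.
    """
    pair_counts = Counter()
    duplicate_players = 0
    for game in games:
        # A player listed more than once in the same game is a violation.
        duplicate_players += len(game) - len(set(game))
        # Count every unordered pair of distinct players once per game.
        for a, b in combinations(sorted(set(game)), 2):
            pair_counts[(a, b)] += 1
    # A pair is "repeated" if it shares more than one game.
    repeated_pairs = sum(1 for c in pair_counts.values() if c > 1)
    return repeated_pairs, duplicate_players
```

A valid schedule should give (0, 0). To check real output, each line can be split into integers and the resulting list of lists passed to the function.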

In the end I think I solved this problem.
I am explaining my approach here in case it helps anyone else facing a similar problem.
I used two "classical" algorithms:
1. Trial and error to get a first, low-quality configuration, with iterations that first try the players that have the fewest interactions and are already in more games.
2. Hill climbing to improve the quality of the configuration: pick a swap at random (between a conflicting and a non-conflicting player, or between two conflicting players if all players are conflicting) and keep the result only if it increases quality. Quality is measured by the worst number of conflicts (usually 2) and the number of occurrences.
I reached the following conclusions:
always a solution above 100
never a solution below 90
Thanks for all the support you provided!

How to sort the numbers 1, ..., n lexicographically without converting the numbers to strings?

Given an input n, how can I print the following sequence without converting the numbers to strings?
For example:
n = 100
Output: 1 10 100 11 12 13 14 15 16 17 18 19 2 20 ...
n = 15
Output: 1 10 11 12 13 14 15 2 3 4 5 6 7 8 9
n = 20
Output: 1 10 11 12 13 14 15 16 17 18 19 2 20 3 4 5 6 7 8 9
What is the key idea here?
My initial attempt only prints each number followed by that number times 10, 100, and so on (1, 10, 100, then 2, 20, ...):
int n = 100;
for (int i = 1; i <= n; i++) {
    int x = i;
    while (x <= n) {
        System.out.println(x);
        x *= 10;
    }
}

I figured out a solution.
Call it with an initial k = 1,
for example printnums(15, 1):
void printnums(int n, int k) {
    if (k > n) {
        return;
    }
    for (int i = 0; i < 10; i++) {
        if (k <= n) {
            System.out.println(k);
            k *= 10;
            printnums(n, k);
            k /= 10;
            k++;
            if (k % 10 == 0) return;
        }
    }
}
I don't know whether there is a more optimized one.
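On the question of optimization: the recursive version already visits each number once, so any gain is mostly in avoiding the recursion. Below is a sketch of an iterative variant in Python that walks the same order using only arithmetic on a single counter. The name lexical_order is chosen here for illustration and is not from the post.

```python
def lexical_order(n):
    """Return 1..n in lexicographic order using integer arithmetic only."""
    result = []
    cur = 1
    for _ in range(n):
        result.append(cur)
        if cur * 10 <= n:
            # Go one digit deeper: 1 -> 10 -> 100 ...
            cur *= 10
        else:
            # Climb back up while the last digit is 9 or the next
            # sibling would exceed n, then move to the next sibling.
            while cur % 10 == 9 or cur + 1 > n:
                cur //= 10
            cur += 1
    return result


print(*lexical_order(15))  # 1 10 11 12 13 14 15 2 3 4 5 6 7 8 9
```

Apart from the output list, this keeps only one integer of state, so it should not hit recursion limits for large n.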

Here is a condensed solution in Python, based on the OP's own answer:
def genRangeLexiSorted(n, k=1):
    for i in range(k, min(k+10-k%10, n+1)):
        yield i
        for j in genRangeLexiSorted(n, 10*i):
            yield j
def printnums(n):
    print(*list(genRangeLexiSorted(n)))
Then the calls
printnums(1)
printnums(9)
printnums(11)
printnums(20)
printnums(100)
give the following outputs:
1
1 2 3 4 5 6 7 8 9
1 10 11 2 3 4 5 6 7 8 9
1 10 11 12 13 14 15 16 17 18 19 2 20 3 4 5 6 7 8 9
1 10 100 11 12 13 14 15 16 17 18 19 2 20 21 22 23 24 25 26 27 28 29 3 30 31 32 33 34 35 36 37 38 39 4 40 41 42 43 44 45 46 47 48 49 5 50 51 52 53 54 55 56 57 58 59 6 60 61 62 63 64 65 66 67 68 69 7 70 71 72 73 74 75 76 77 78 79 8 80 81 82 83 84 85 86 87 88 89 9 90 91 92 93 94 95 96 97 98 99
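If an ordinary sort is acceptable, another string-free option is a purely arithmetic sort key. The sketch below is an alternative to the generator above, not a restatement of it: it scales every number up to the digit count of n and breaks ties by the original digit count, so that a prefix such as 1 sorts before 10 and 100. The helper names num_digits, lexi_key and lexi_sorted are invented for this example.

```python
def num_digits(x):
    """Count the decimal digits of a positive integer without str()."""
    d = 1
    while x >= 10:
        x //= 10
        d += 1
    return d


def lexi_key(x, width):
    """Sort key: x padded with trailing zeros to `width` digits, then its length."""
    d = num_digits(x)
    return (x * 10 ** (width - d), d)


def lexi_sorted(n):
    """Return 1..n in lexicographic order via an arithmetic sort key."""
    width = num_digits(n)
    return sorted(range(1, n + 1), key=lambda x: lexi_key(x, width))


print(*lexi_sorted(11))  # 1 10 11 2 3 4 5 6 7 8 9
```

This costs O(n log n) for the sort, so the generator is likely the better choice when only the sequence 1..n is needed. The key is more useful when sorting an arbitrary collection of positive integers.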

How to draw a circle in MATLAB with different ranges on the X and Y axes

I would like to draw a circle in MATLAB.
However, my X and Y data have different ranges, and I suspect this is why the circles do not look round.
Could anyone advise me on how to improve the code?
%% Data set
fData = [ 3.6 79
1.8 54
3.333 74
2.283 62
4.533 85
2.883 55
4.7 88
3.6 85
1.95 51
4.35 85
1.833 54
3.917 84
4.2 78
1.75 47
4.7 83
2.167 52
1.75 62
4.8 84
1.6 52
4.25 79
1.8 51
1.75 47
3.45 78
3.067 69
4.533 74
3.6 83
1.967 55
4.083 76
3.85 78
4.433 79
4.3 73
4.467 77
3.367 66
4.033 80
3.833 74
2.017 52
1.867 48
4.833 80
1.833 59
4.783 90 ]
[n,dim]=size(fData);
rng(1);
idx = randsample(n,2)
X = fData(~ismember(1:n,idx),:); % Training data
Y = fData(idx,:)
for j = 1:length(Y)
c = Y(j,:);
pos = [c-r 2*r 2*r];
rectangle('Position',pos,'Curvature',[1 1])
%axis equal
end
How can I make both circles in the image perfectly round?
Thank you.

You need to set the aspect ratio you want for your data. Replacing the %axis equal line in your code with
daspect([1 1 1])
should do the job.
