Simple encryption algorithm for homework. not getting decryption working properly - algorithm

This is a homework question that I can't get my head around at all
Its a very simple encryption algorithm. You start with a string of characters as your alphabet:
ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!, .
Then ask the user to enter there own string that will act as a map such as:
0987654321! .,POIUYTREWQASDFGHJKLMNBVCXZ
Then the program uses this to make a map and allows you to enter text that gets encrypted.
For example MY NAME IS JOSEPH would be encrypted as .AX,0.6X2YX1PY6O3
This is all very easy, however he said that its a one to one mapping and thus implied that if I enter .AX,0.6X2YX1PY6O3 back into the program I will get out MY NAME IS JOSEPH
This doesn't happen, because .AX,0.6X2YX1PY6O3 becomes Z0QCDZQGAQFOALDH
The mapping only works to decrypt when you go backwards but the question implies that the program just loops and runs the one algorithm every time.
Even if some could say that it is possible I would be happy, I have pages and pages of paper filled up with possible workings, but I came up with nothing, the only solution to run the algorithm backwards back I don't think we are allowed to do that.
Any ideas?
Edit:
Unfortunately I can't get this to work (Using the orbit computation idea) What am I doing wrong?
//import scanner class
import java.util.Scanner;
public class Encryption {
static Scanner inputString = new Scanner(System.in);
//define alphabet
private static String alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!, .";
private static String map;
private static int[] encryptionMap = new int[40];//mapping int array
private static boolean exit = false;
private static boolean valid = true;
public static void main(String[] args) {
String encrypt, userInput;
userInput = new String();
System.out.println("This program takes a large reordered string");
System.out.println("and uses it to encrypt your data");
System.out.println("Please enter a mapping string of 40 length and the same characters as below but in different order:");
System.out.println(alpha);
//getMap();//don't get user input for map, for testing!
map=".ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!, ";//forced input for testing only!
do{
if (valid == true){
System.out.println("Enter Q to quit, otherwise enter a string:");
userInput = getInput();
if (userInput.charAt(0) != 'Q' ){//&& userInput.length()<2){
encrypt = encrypt(userInput);
for (int x=0; x<39; x++){//here I am trying to get the orbit computation going
encrypt = encrypt(encrypt);
}
System.out.println("You entered: "+userInput);
System.out.println("Encrypted Version: "+encrypt);
}else if (userInput.charAt(0) == 'Q'){//&& userInput.length()<2){
exit = true;
}
}
else if (valid == false){
System.out.println("Error, your string for mapping is incorrect");
valid = true;//reset condition to repeat
}
}while(exit == false);
System.out.println("Good bye");
}
static String encrypt(String userInput){
//use mapping array to encypt data
String encrypt;
StringBuffer tmp = new StringBuffer();
char current;
int alphaPosition;
int temp;
//run through the user string
for (int x=0; x<userInput.length(); x++){
//get character
current = userInput.charAt(x);
//get location of current character in alphabet
alphaPosition = alpha.indexOf(current);
//encryptionMap.charAt(alphaPosition)
tmp.append(map.charAt(alphaPosition));
}
encrypt = tmp.toString();
return(encrypt);
}
static void getMap(){
//get a mapping string and validate from the user
map = getInput();
//validate code
if (map.length() != 40){
valid = false;
}
else{
for (int x=0; x<40; x++){
if (map.indexOf(alpha.charAt(x)) == -1){
valid = false;
}
}
}
if (valid == true){
for (int x=0; x<40; x++){
int a = (int)(alpha.charAt(x));
int y = (int)( map.charAt(x));
//create encryption map
encryptionMap[x]=(a-y);
}
}
}
static String getInput(){
//get input(this repeats)
String input = inputString.nextLine();
input = input.toUpperCase();
if ("QUIT".equals(input) || "END".equals(input) || "NO".equals(input) || "N".equals(input)){
StringBuffer tmp = new StringBuffer();
tmp.append('Q');
input = tmp.toString();
}
return(input);
}
}

You will (probably) not get your original string back if you apply that substitution again. I say probably because you can construct such inputs (they all do things like if A->B then B->A). But most inputs won't do that. You would have to construct the reverse map to decrypt.
However, there is a trick you can do if you're only allowed to go forward. Keep applying the mapping and you'll eventually return to your original input. The number of times you'll have to do that depends on your input. To figure out how many times, compute the orbit of each character, and take the least common multiple of all the orbit sizes. For your input the orbits are size 1 (T->T, W->W), 2 (B->9->B H->3->H U->R->U P->O->P), 4 (C->8->N->,->C), 9 (A->...->Y->A), and 17 (E->...->V->E). The LCM of all those is 612, so 611 forward mappings applied to the ciphertext will return you to the plaintext.

Well, you can get your string back this way only if you do reverse mapping. One to one mapping means that a single letter of your default alphabet maps to only one letter of your new alphabet and vice versa. I.e. you can't map ABCD to ABBA. It doesn't imply that you can get your initial string by doing a second round of encryption.
The thing you have described can be achieved if you use a finite alphabet and a displacement to encode your string. You can choose the displacement in such a way that after a number of rounds of encryption totalDisplacement mod alphabetSize == 0 Than you will get your string back going only forward.

Related

JAVA (I've tried reversing two String using for loop

(1) Can anyone tell me how do I declare array org[] and rev[] where I'm storing my two strings the original and reverse string?
(2) When I'm trying to store the characters of original String in org[] using charAt() they are throwing me an error.
Can anyone help me out on reversing two strings without making use of reverse() instead making use of for loop ?
String a = "12345";
char [] original= a.toCharArray();
char[] reverse = new char[a.length()];
int j =0;
for(int i=reverse.length-1; i>=0; i--) {
reverse[j] = original[i];
j++;
}
System.out.println(new String(reverse));
Here's a quick method you can use (must go inside your class):
public static void getCharacters(String input, char[] forward, char[] reverse)
{
for(int i = 0; i < input.length(); i++)
{
forward[i] = input.charAt(i);
reverse[i] = input.charAt(input.length() - 1 - i);
}
}
This assumes that forward and reverse were already allocated, and were initialized to the length of the string input. Example usage from inside another method:
String input = "Hello, world!";
char[] forward = new char[input.length()];
char[] reverse = new char[input.length()];
getCharacters(input, forward, reverse);
// forward and reverse now contain the characters from input

splitting up the contents of a single line

I just went through a problem, where input is a string which is a single word.
This line is not readable,
Like, I want to leave is written as Iwanttoleave.
The problem is of separating out each of the tokens(words, numbers, abbreviations, etc)
I have no idea where to start
The first thought that came to my mind is making a dictionary and then mapping accordingly but I think making a dictionary is not at all a good idea.
Can anyone suggest some algorithm to do it ?
First of all, create a dictionary which helps you to identify if some string is a valid word or not.
bool isValidString(String s){
if(dictionary.contains(s))
return true;
return false;
}
Now, you can write a recursive code to split the string and create an array of actually useful words.
ArrayList usefulWords = new ArrayList<String>; //global declaration
void split(String s){
int l = s.length();
int i,j;
for(i = l-1; i >= 0; i--){
if(isValidString(s.substr(i,l)){ //s.substr(i,l) will return substring starting from index `i` and ending at `l-1`
usefulWords.add(s.substr(i,l));
split(s.substr(0,i));
}
}
}
Now, use these usefulWords to generate all possible strings. Maybe something like this:
ArrayList<String> splits = new ArrayList<String>[10]; //assuming max 10 possible outputs
ArrayList<String>[] allPossibleStrings(String s, int level){
for(int i = 0; i < s.length(); i++){
if(usefulWords.contains(s.substr(0,i)){
splits[level].add(s.substr(0,i));
allPossibleStrings(s.substr(i,s.length()),level);
level++;
}
}
}
Now, this code gives you all possible splits in a somewhat arbitrary manner. eg.
dictionary = {cat, dog, i, am, pro, gram, program, programmer, grammer}
input:
string = program
output:
splits[0] = {pro, gram}
splits[1] = {program}
input:
string = iamprogram
output:
splits[0] = {i, am, pro, gram} //since `mer` is not in dictionary
splits[1] = {program}
I did not give much thought to the last part, but I think you should be able to formulate a code from there as per your requirement.
Also, since no language is tagged, I've taken the liberty of writing the code in JAVA-like syntax as it is really easy to understand.
Instead of using a Dictionary, I'd suggest you use a Trie with all your valid words (the whole English dictionary?). Then you can start moving one letter at a time in your input line and the trie at the same time. If the letter leads to more results in the trie, you can continue expanding the current word, and if not, you can start looking for a new word in the trie.
This won't be a forward only search for sure, so you'll need some sort of backtracking.
// This method Generates a list with all the matching phrases for the given input
List<string> CandidatePhrases(string input) {
Trie validWords = BuildTheTrieWithAllValidWords();
List<string> currentWords = new List<string>();
List<string> possiblePhrases = new List<string>();
// The root of the trie has an empty key that points to all the first letters of all words
Trie currentWord = validWords;
int currentLetter = -1;
// Calls a backtracking method that creates all possible phrases
FindPossiblePhrases(input, validWords, currentWords, currentWord, currentLetter, possiblePhrases);
return possiblePhrases;
}
// The Trie structure could be something like
class Trie {
char key;
bool valid;
List<Trie> children;
Trie parent;
Trie Next(char nextLetter) {
return children.FirstOrDefault(c => c.key == nextLetter);
}
string WholeWord() {
Debug.Assert(valid);
string word = "";
Trie current = this;
while (current.Key != '\0')
{
word = current.Key + word;
current = current.parent;
}
}
}
void FindPossiblePhrases(string input, Trie validWords, List<string> currentWords, Trie currentWord, int currentLetter, List<string> possiblePhrases) {
if (currentLetter == input.Length - 1) {
if (currentWord.valid) {
string phrase = ""
foreach (string word in currentWords) {
phrase += word;
phrase += " ";
}
phrase += currentWord.WholeWord();
possiblePhrases.Add(phrase);
}
}
else {
// The currentWord may be a valid word. If that's the case, the next letter could be the first of a new word, or could be the next letter of a bigger word that begins with currentWord
if (currentWord.valid) {
// Try to match phrases when the currentWord is a valid word
currentWords.Add(currentWord.WholeWord());
FindPossiblePhrases(input, validWords, currentWords, validWords, currentLetter, possiblePhrases);
currentWords.RemoveAt(currentWords.Length - 1);
}
// If either the currentWord is a valid word, or not, try to match a longer word that begins with current word
int nextLetter = currentLetter + 1;
Trie nextWord = currentWord.Next(input[nextLetter]);
// If the nextWord is null, there was no matching word that begins with currentWord and has input[nextLetter] as the following letter.
if (nextWord != null) {
FindPossiblePhrases(input, validWords, currentWords, nextWord, nextLetter, possiblePhrases);
}
}
}

Creating a unique filename from a list of alphanumeric strings

I apologize for creating a similar thread to many that are out there now, but I mainly wanted to also get some insight on some methods.
I have a list of Strings (could be just 1 or over a 1000)
Format = XXX-XXXXX-XX where each one is alphanumeric
I am trying to generate a unique string (currently 18 in length but probably could be longer ensuring not to maximize file length or path length) that I could reproduce if I have that same list. Order doesn't matter; although I may be interested if its easier to restrict the order as well.
My current Java code is follows (which failed today, hence why I am here):
public String createOutputFileName(ArrayList alInput, EnumFPFunction efpf, boolean pHeaders) {
/* create file name based on input list */
String sFileName = "";
long partNum = 0;
for (String sGPN : alInput) {
sGPN = sGPN.replaceAll("-", ""); //remove dashes
partNum += Long.parseLong(sGPN, 36); //(base 36)
}
sFileName = Long.toString(partNum);
if (sFileName.length() > 19) {
sFileName.substring(0, 18); //Max length of 19
}
return alInput;
}
So obviously just adding them did not work out so well I found out (also think I should take last 18 digits and not first 18)
Are there any good methods out there (possibly CRC related) that would work?
To assist with my key creation:
The first 3 characters are almost always numeric and would probably have many duplicate (out of 100, there may only be 10 different starting numbers)
These characters are not allowed - I,O
There will never be a character then a number in the last two alphachar subset.
I would use the system time. Here's how you might do it in Java:
public String createOutputFileName() {
long mills = System.currentTimeMillis();
long nanos = System.nanoTime();
return mills + " " + nanos;
}
If you want to add some information about the items and their part numbers, you can, of course!
======== EDIT: "What do I mean by batch object" =========
class Batch {
ArrayList<Item> itemsToProcess;
String inputFilename; // input to external process
boolean processingFinished;
public Batch(ArrayList<Item> itemsToProcess) {
this.itemsToProcess = itemsToProcess;
inputFilename = null;
processingFinished = false;
}
public void processWithExternal() {
if(inputFilename != null || processingFinished) {
throw new IllegalStateException("Cannot initiate process more than once!");
}
String base = System.currentTimeMillis() + " " + System.nanoTime();
this.inputFilename = base + "_input";
writeItemsToFile();
// however you build your process, do it here
Process p = new ProcessBuilder("myProcess","myargs", inputFilename);
p.start();
p.waitFor();
processingFinished = true;
}
private void writeItemsToFile() {
PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter(inputFilename)));
int flushcount = 0;
for(Item item : itemsToProcess) {
String output = item.getFileRepresentation();
out.println(output);
if(++flushcount % 10 == 0) out.flush();
}
out.flush();
out.close();
}
}
In addition to GlowCoder's response, I have thought of another "decent one" that would work.
Instead of just adding the list in base 36, I would do two separate things to the same list.
In this case, since there is no way for negative or decimal numbers, adding every number and multiplying every number separately and concatenating these base36 number strings isn't a bad way either.
In my case, I would take the last nine digits of the added number and last nine of the multiplied number. This would eliminate my previous errors and make it quite robust. It obviously is still possible for errors once overflow starts occurring, but could also work in this case. Extending the allowable string length would make it more robust as well.
Sample code:
public String createOutputFileName(ArrayList alInput, EnumFPFunction efpf, boolean pHeaders) {
/* create file name based on input list */
String sFileName1 = "";
String sFileName2 = "";
long partNum1 = 0; // Starting point for addition
long partNum2 = 1; // Starting point for multiplication
for (String sGPN : alInput) {
//remove dashes
sGPN = sGPN.replaceAll("-", "");
partNum1 += Long.parseLong(sGPN, 36); //(base 36)
partNum2 *= Long.parseLong(sGPN, 36); //(base 36)
}
// Initial strings
sFileName1 = "000000000" + Long.toString(partNum1, 36); // base 36
sFileName2 = "000000000" + Long.toString(partNum2, 36); // base 36
// Cropped strings
sFileName1 = sFileName1.substring(sFileName1.length()-9, sFileName1.length());
sFileName2 = sFileName2.substring(sFileName2.length()-9, sFileName2.length());
return sFileName1 + sFileName2;
}

Lossless compression in small blocks with precomputed dictionary

I have an application where I am reading and writing small blocks of data (a few hundred bytes) hundreds of millions of times. I'd like to generate a compression dictionary based on an example data file and use that dictionary forever as I read and write the small blocks. I'm leaning toward the LZW compression algorithm. The Wikipedia page (http://en.wikipedia.org/wiki/Lempel-Ziv-Welch) lists pseudocode for compression and decompression. It looks fairly straightforward to modify it such that the dictionary creation is a separate block of code. So I have two questions:
Am I on the right track or is there a better way?
Why does the LZW algorithm add to the dictionary during the decompression step? Can I omit that, or would I lose efficiency in my dictionary?
Thanks.
Update: Now I'm thinking the ideal case be to find a library that lets me store the dictionary separate from the compressed data. Does anything like that exist?
Update: I ended up taking the code at http://www.enusbaum.com/blog/2009/05/22/example-huffman-compression-routine-in-c and adapting it. I am Chris in the comments on that page. I emailed my mods back to that blog author, but I haven't heard back yet. The compression rates I'm seeing with that code are not at all impressive. Maybe that is due to the 8-bit tree size.
Update: I converted it to 16 bits and the compression is better. It's also much faster than the original code.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
namespace Book.Core
{
public class Huffman16
{
private readonly double log2 = Math.Log(2);
private List<Node> HuffmanTree = new List<Node>();
internal class Node
{
public long Frequency { get; set; }
public byte Uncoded0 { get; set; }
public byte Uncoded1 { get; set; }
public uint Coded { get; set; }
public int CodeLength { get; set; }
public Node Left { get; set; }
public Node Right { get; set; }
public bool IsLeaf
{
get { return Left == null; }
}
public override string ToString()
{
var coded = "00000000" + Convert.ToString(Coded, 2);
return string.Format("Uncoded={0}, Coded={1}, Frequency={2}", (Uncoded1 << 8) | Uncoded0, coded.Substring(coded.Length - CodeLength), Frequency);
}
}
public Huffman16(long[] frequencies)
{
if (frequencies.Length != ushort.MaxValue + 1)
{
throw new ArgumentException("frequencies.Length must equal " + ushort.MaxValue + 1);
}
BuildTree(frequencies);
EncodeTree(HuffmanTree[HuffmanTree.Count - 1], 0, 0);
}
public static long[] GetFrequencies(byte[] sampleData, bool safe)
{
if (sampleData.Length % 2 != 0)
{
throw new ArgumentException("sampleData.Length must be a multiple of 2.");
}
var histogram = new long[ushort.MaxValue + 1];
if (safe)
{
for (int i = 0; i <= ushort.MaxValue; i++)
{
histogram[i] = 1;
}
}
for (int i = 0; i < sampleData.Length; i += 2)
{
histogram[(sampleData[i] << 8) | sampleData[i + 1]] += 1000;
}
return histogram;
}
public byte[] Encode(byte[] plainData)
{
if (plainData.Length % 2 != 0)
{
throw new ArgumentException("plainData.Length must be a multiple of 2.");
}
Int64 iBuffer = 0;
int iBufferCount = 0;
using (MemoryStream msEncodedOutput = new MemoryStream())
{
//Write Final Output Size 1st
msEncodedOutput.Write(BitConverter.GetBytes(plainData.Length), 0, 4);
//Begin Writing Encoded Data Stream
iBuffer = 0;
iBufferCount = 0;
for (int i = 0; i < plainData.Length; i += 2)
{
Node FoundLeaf = HuffmanTree[(plainData[i] << 8) | plainData[i + 1]];
//How many bits are we adding?
iBufferCount += FoundLeaf.CodeLength;
//Shift the buffer
iBuffer = (iBuffer << FoundLeaf.CodeLength) | FoundLeaf.Coded;
//Are there at least 8 bits in the buffer?
while (iBufferCount > 7)
{
//Write to output
int iBufferOutput = (int)(iBuffer >> (iBufferCount - 8));
msEncodedOutput.WriteByte((byte)iBufferOutput);
iBufferCount = iBufferCount - 8;
iBufferOutput <<= iBufferCount;
iBuffer ^= iBufferOutput;
}
}
//Write remaining bits in buffer
if (iBufferCount > 0)
{
iBuffer = iBuffer << (8 - iBufferCount);
msEncodedOutput.WriteByte((byte)iBuffer);
}
return msEncodedOutput.ToArray();
}
}
public byte[] Decode(byte[] bInput)
{
long iInputBuffer = 0;
int iBytesWritten = 0;
//Establish Output Buffer to write unencoded data to
byte[] bDecodedOutput = new byte[BitConverter.ToInt32(bInput, 0)];
var current = HuffmanTree[HuffmanTree.Count - 1];
//Begin Looping through Input and Decoding
iInputBuffer = 0;
for (int i = 4; i < bInput.Length; i++)
{
iInputBuffer = bInput[i];
for (int bit = 0; bit < 8; bit++)
{
if ((iInputBuffer & 128) == 0)
{
current = current.Left;
}
else
{
current = current.Right;
}
if (current.IsLeaf)
{
bDecodedOutput[iBytesWritten++] = current.Uncoded1;
bDecodedOutput[iBytesWritten++] = current.Uncoded0;
if (iBytesWritten == bDecodedOutput.Length)
{
return bDecodedOutput;
}
current = HuffmanTree[HuffmanTree.Count - 1];
}
iInputBuffer <<= 1;
}
}
throw new Exception();
}
private static void EncodeTree(Node node, int depth, uint value)
{
if (node != null)
{
if (node.IsLeaf)
{
node.CodeLength = depth;
node.Coded = value;
}
else
{
depth++;
value <<= 1;
EncodeTree(node.Left, depth, value);
EncodeTree(node.Right, depth, value | 1);
}
}
}
private void BuildTree(long[] frequencies)
{
var tiny = 0.1 / ushort.MaxValue;
var fraction = 0.0;
SortedDictionary<double, Node> trees = new SortedDictionary<double, Node>();
for (int i = 0; i <= ushort.MaxValue; i++)
{
var leaf = new Node()
{
Uncoded1 = (byte)(i >> 8),
Uncoded0 = (byte)(i & 255),
Frequency = frequencies[i]
};
HuffmanTree.Add(leaf);
if (leaf.Frequency > 0)
{
trees.Add(leaf.Frequency + (fraction += tiny), leaf);
}
}
while (trees.Count > 1)
{
var e = trees.GetEnumerator();
e.MoveNext();
var first = e.Current;
e.MoveNext();
var second = e.Current;
//Join smallest two nodes
var NewParent = new Node();
NewParent.Frequency = first.Value.Frequency + second.Value.Frequency;
NewParent.Left = first.Value;
NewParent.Right = second.Value;
HuffmanTree.Add(NewParent);
//Remove the two that just got joined into one
trees.Remove(first.Key);
trees.Remove(second.Key);
trees.Add(NewParent.Frequency + (fraction += tiny), NewParent);
}
}
}
}
Usage examples:
To create the dictionary from sample data:
var freqs = Huffman16.GetFrequencies(File.ReadAllBytes(#"D:\nodes"), true);
To initialize an encoder with a given dictionary:
var huff = new Huffman16(freqs);
And to do some compression:
var encoded = huff.Encode(raw);
And decompression:
var raw = huff.Decode(encoded);
The hard part in my mind is how you build your static dictionary. You don't want to use the LZW dictionary built from your sample data. LZW wastes a bunch of time learning since it can't build the dictionary faster than the decompressor can (a token will only be used the second time it's seen by the compressor so the decompressor can add it to its dictionary the first time its seen). The flip side of this is that it's adding things to the dictionary that may never get used, just in case the string shows up again. (e.g., to have a token for 'stackoverflow' you'll also have entries for 'ac','ko','ve','rf' etc...)
However, looking at the raw token stream from an LZ77 algorithm could work well. You'll only see tokens for strings seen at least twice. You can then build a list of the most common tokens/strings to include in your dictionary.
Once you have a static dictionary, using LZW sans the dictionary update seems like an easy implementation but to get the best compression I'd consider a static Huffman table instead of the traditional 12 bit fixed size token (as George Phillips suggested). An LZW dictionary will burn tokens for all the sub-strings you may never actually encode (e.g, if you can encode 'stackoverflow', there will be tokens for 'st', 'sta', 'stac', 'stack', 'stacko' etc.).
At this point it really isn't LZW - what makes LZW clever is how the decompressor can build the same dictionary the compressor used only seeing the compressed data stream. Something you won't be using. But all LZW implementations have a state where the dictionary is full and is no longer updated, this is how you'd use it with your static dictionary.
LZW adds to the dictionary during decompression to ensure it has the same dictionary state as the compressor. Otherwise the decoding would not function properly.
However, if you were in a state where the dictionary was fixed then, yes, you would not need to add new codes.
Your approach will work reasonably well and it's easy to use existing tools to prototype and measure the results. That is, compress the example file and then the example and test data together. The size of the latter less the former will be the expected compressed size of a block.
LZW is a clever way to build up a dictionary on the fly and gives decent results. But a more thorough analysis of your typical data blocks is likely to generate a more efficient dictionary.
There's also room for improvement in how LZW represents compressed data. For instance, each dictionary reference could be Huffman encoded to a closer to optimal length based on the expected frequency of their use. To be truly optimal the codes should be arithmetic encoded.
I would look at your data to see if there's an obvious reason it's so easy to compress. You might be able to do something much simpler than LZ78. I've done both LZ77 (lookback) and LZ78 (dictionary).
Try running a LZ77 on your data. There's no dictionary with LZ77, so you could use a library without alteration. Deflate is an implementation of LZ77.
Your idea of using a common dictionary is a good one, but it's hard to know whether the files are similar to each other or just self-similar without doing some tests.
The right track is to use an library -- almost every modern language have a compression library. C#, Python, Perl, Java, VB.net, whatever you use.
LZW save some space by depending the dictionary on previous inputs. It have an initial dictionary, and when you decompress something, you add them to the dictionary -- so the dictionary is growing. (I am omitting some details here, but this is the general idea)
You can omit this step by supply the whole (complete) dictionary as the initial one. But this would cost some space.
I find this aproach quite interesting for repeated log entries and something I would like to explore using.
Can you share the compression statistics for using this approach for your use case so I can compare it with other alternatives?
Have you considered having the common dictionary grow over time or is that not a valid option?

Algorithm to format text to Pascal or camel casing

Using this question as the base is there an alogrithm or coding example to change some text to Pascal or Camel casing.
For example:
mynameisfred
becomes
Camel: myNameIsFred
Pascal: MyNameIsFred
I found a thread with a bunch of Perl guys arguing the toss on this question over at http://www.perlmonks.org/?node_id=336331.
I hope this isn't too much of a non-answer to the question, but I would say you have a bit of a problem in that it would be a very open-ended algorithm which could have a lot of 'misses' as well as hits. For example, say you inputted:-
camelCase("hithisisatest");
The output could be:-
"hiThisIsATest"
Or:-
"hitHisIsATest"
There's no way the algorithm would know which to prefer. You could add some extra code to specify that you'd prefer more common words, but again misses would occur (Peter Norvig wrote a very small spelling corrector over at http://norvig.com/spell-correct.html which might help algorithm-wise, I wrote a C# implementation if C#'s your language).
I'd agree with Mark and say you'd be better off having an algorithm that takes a delimited input, i.e. this_is_a_test and converts that. That'd be simple to implement, i.e. in pseudocode:-
SetPhraseCase(phrase, CamelOrPascal):
if no delimiters
if camelCase
return lowerFirstLetter(phrase)
else
return capitaliseFirstLetter(phrase)
words = splitOnDelimiter(phrase)
if camelCase
ret = lowerFirstLetter(first word)
else
ret = capitaliseFirstLetter(first word)
for i in 2 to len(words): ret += capitaliseFirstLetter(words[i])
return ret
capitaliseFirstLetter(word):
if len(word) <= 1 return upper(word)
return upper(word[0]) + word[1..len(word)]
lowerFirstLetter(word):
if len(word) <= 1 return lower(word)
return lower(word[0]) + word[1..len(word)]
You could also replace my capitaliseFirstLetter() function with a proper case algorithm if you so wished.
A C# implementation of the above described algorithm is as follows (complete console program with test harness):-
using System;
class Program {
static void Main(string[] args) {
var caseAlgorithm = new CaseAlgorithm('_');
while (true) {
string input = Console.ReadLine();
if (string.IsNullOrEmpty(input)) return;
Console.WriteLine("Input '{0}' in camel case: '{1}', pascal case: '{2}'",
input,
caseAlgorithm.SetPhraseCase(input, CaseAlgorithm.CaseMode.CamelCase),
caseAlgorithm.SetPhraseCase(input, CaseAlgorithm.CaseMode.PascalCase));
}
}
}
public class CaseAlgorithm {
public enum CaseMode { PascalCase, CamelCase }
private char delimiterChar;
public CaseAlgorithm(char inDelimiterChar) {
delimiterChar = inDelimiterChar;
}
public string SetPhraseCase(string phrase, CaseMode caseMode) {
// You might want to do some sanity checks here like making sure
// there's no invalid characters, etc.
if (string.IsNullOrEmpty(phrase)) return phrase;
// .Split() will simply return a string[] of size 1 if no delimiter present so
// no need to explicitly check this.
var words = phrase.Split(delimiterChar);
// Set first word accordingly.
string ret = setWordCase(words[0], caseMode);
// If there are other words, set them all to pascal case.
if (words.Length > 1) {
for (int i = 1; i < words.Length; ++i)
ret += setWordCase(words[i], CaseMode.PascalCase);
}
return ret;
}
private string setWordCase(string word, CaseMode caseMode) {
switch (caseMode) {
case CaseMode.CamelCase:
return lowerFirstLetter(word);
case CaseMode.PascalCase:
return capitaliseFirstLetter(word);
default:
throw new NotImplementedException(
string.Format("Case mode '{0}' is not recognised.", caseMode.ToString()));
}
}
private string lowerFirstLetter(string word) {
return char.ToLower(word[0]) + word.Substring(1);
}
private string capitaliseFirstLetter(string word) {
return char.ToUpper(word[0]) + word.Substring(1);
}
}
The only way to do that would be to run each section of the word through a dictionary.
"mynameisfred" is just an array of characters, splitting it up into my Name Is Fred means understanding what the joining of each of those characters means.
You could do it easily if your input was separated in some way, e.g. "my name is fred" or "my_name_is_fred".

Resources