How to get a substring in some length for special chars like Chinese - freemarker

For example, I can get 80 chars with ${description?substring(0, 80)} if description is in English, but for Chinese chars I get only about 10 chars, and there is always a garbage char at the end.
How can I get 80 chars for any language?

FreeMarker relies on String#substring to do the actual (UTF-16-chars-based?) substring calculation, which doesn't work well with Chinese characters. Instead one should use Unicode code points. Based on this post and FreeMarker's own substring builtin, I hacked together a FreeMarker TemplateMethodModelEx implementation which operates on code points:
public class CodePointSubstring implements TemplateMethodModelEx {
@Override
public Object exec(List args) throws TemplateModelException {
int argCount = args.size(), left = 0, right = 0;
String s = "";
if (argCount != 3) {
throw new TemplateModelException(
"Error: Expecting 1 string and 2 numerical arguments here");
}
try {
TemplateScalarModel tsm = (TemplateScalarModel) args.get(0);
s = tsm.getAsString();
} catch (ClassCastException cce) {
String mess = "Error: Expecting string argument here";
throw new TemplateModelException(mess);
}
try {
TemplateNumberModel tnm = (TemplateNumberModel) args.get(1);
left = tnm.getAsNumber().intValue();
tnm = (TemplateNumberModel) args.get(2);
right = tnm.getAsNumber().intValue();
} catch (ClassCastException cce) {
String mess = "Error: Expecting numerical argument here";
throw new TemplateModelException(mess);
}
return new SimpleScalar(getSubstring(s, left, right));
}
private String getSubstring(String s, int start, int end) {
int[] codePoints = new int[Math.max(0, end - start)];
int length = s.length();
int i = 0;
int index = 0; // index of the current code point, not of the UTF-16 char
for (int offset = 0; offset < length && i < codePoints.length;) {
int codepoint = s.codePointAt(offset);
if (index >= start) {
codePoints[i] = codepoint;
i++;
}
index++;
offset += Character.charCount(codepoint);
}
return new String(codePoints, 0, i);
}
}
You can put an instance of it into your data model root, e.g.
SimpleHash root = new SimpleHash();
root.put("substring", new CodePointSubstring());
template.process(root, ...);
and use the custom substring method in FTL:
${substring(description, 0, 80)}
I tested it with non-Chinese characters, which still worked, but so far I haven't tried it with Chinese characters. Maybe you want to give it a try.
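The same code point slicing can also be sketched without a hand-written loop, using the standard String.offsetByCodePoints method to translate code point indices into UTF-16 offsets. The class and method names below (CodePointSlice, slice) are hypothetical, and clamping out-of-range indices is an assumption of this sketch, not something the answer above specifies.

```java
public class CodePointSlice {
    /**
     * Returns the code points in [start, end) of s, clamping both
     * indices to the valid range (an assumption made for this sketch).
     */
    public static String slice(String s, int start, int end) {
        int total = s.codePointCount(0, s.length());
        int from = Math.max(0, Math.min(start, total));
        int to = Math.max(from, Math.min(end, total));
        // Translate code point indices into UTF-16 char offsets.
        int beginOffset = s.offsetByCodePoints(0, from);
        int endOffset = s.offsetByCodePoints(beginOffset, to - from);
        return s.substring(beginOffset, endOffset);
    }

    public static void main(String[] args) {
        // U+20000 lies outside the BMP, so it occupies two chars in a Java String.
        String text = "a\uD840\uDC00b";
        // Two code points are returned, stored in three UTF-16 chars.
        System.out.println(slice(text, 0, 2).length());
    }
}
```

A template method like the one above could delegate to such a helper; the surrogate pair is never split because both offsets fall on code point boundaries.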

Related

Is there a way to avoid the truncation of attached properties when using Appcenter with Xamarin?

Here's my code:
Crashes.TrackError(ex,
new Dictionary<string, string> {
{"RunQuery", "Exception"},
{"sql", s },
{"Device Model", DeviceInfo.Model },
{"Exception", ex.ToString()}
});
Everything works but I find that Appcenter limits the length of the parameters to 125 characters so it's useless for me as I can never see all of the sql or the ex string.
Has anyone found a way to get around this?
I ran into the same problem. My solution was to break my string into groups of 125 character strings and iterate through while logging. I chatted with AppCenter support. They have no way of extending this length currently.
Here is a scrubbed version of my code:
var tokenChunks = LoggingHelper.SplitBy(extremelyLongString, 120);
string title = "Long string here";
var props = new Dictionary<string, string>();
int item = 0;
foreach(string chunk in tokenChunks)
{
string chunkIndex = string.Format("item: {0}", item++);
props.Add(chunkIndex, chunk);
}
Analytics.TrackEvent(title, props);
Where the LoggingHelper class is:
public static class LoggingHelper
{
public static IEnumerable<string> SplitBy(this string str, int chunkLength)
{
if (String.IsNullOrEmpty(str)) throw new ArgumentException();
if (chunkLength < 1) throw new ArgumentException();
for (int i = 0; i < str.Length; i += chunkLength)
{
if (chunkLength + i > str.Length)
chunkLength = str.Length - i;
yield return str.Substring(i, chunkLength);
}
}
}
I should give credit to this post https://stackoverflow.com/a/8944374/117995 by @oleksii for the SplitBy method.

No need to check this! skip

So what I'm trying to do but clearly struggling to execute is
a single line in the text f
import java.util.Scanner;
import java.io.*;
public class hello
{
public static void main(String[] args) throws IOException
{
Scanner Keyboard = new Scanner(System.in);
System.out.print();
String response = Keyboard.nextLine();
File inFile = new File(response);
Scanner route = new Scanner(inFile);
while ()
{
System.out.print(");
String word = Keyboard.next();
String Street = route.next();
String stopNum = route.next();
You are closing your file after you read one "line" (actually, I'm not sure how many lines you're reading - you don't call nextLine). You also aren't parsing the line. Also, I'd prefer a try-with-resources over an explicit close (and many of your variables look like class names). Finally, you need to check if the line matches your criteria. That might be done like,
Scanner keyboard = new Scanner(System.in);
System.out.print("Enter filename >> ");
String response = keyboard.nextLine();
File inFile = new File(response);
System.out.print("Enter tram tracker ID >> ");
String word = keyboard.nextLine(); // <-- read a line. Bad idea to leave trailing
// new lines.
try (Scanner route = new Scanner(inFile)) {
while (route.hasNextLine()) {
String[] line = route.nextLine().split("\\^");
String street = line[0];
String stopNum = line[1];
String trkID = line[2];
String road = line[3];
String suburb = line[4];
if (!trkID.equals(word)) {
continue;
}
System.out.printf("street: %s, stop: %s, id: %s, road: %s, suburb: %s%n",
street, stopNum, trkID, road, suburb);
}
}
Your code prints everything in the file.
To print a line with a given ID:
You can first buffer all lines of the file into an ArrayList like this in the main method:
ArrayList<String> lines = new ArrayList<>();
while (route.hasNextLine())
{
lines.add(route.nextLine());
}
Then create a method to find a line with a specific ID:
public static int find(ArrayList<String> information, int ID)
{
String idString = "" + ID;
ListIterator<String> li = information.listIterator();
String currentLine = "";
int index = 0;
while(li.hasNext())
{
currentLine = li.next();
int count = 0;
int index1 = 0;
int index2 = 0;
/*Trying to locate the string between the 2nd and 3rd ^ */
for(int i = 0; i < currentLine.length(); i++)
{
if(currentLine.substring(i, i+1).equals("^"))
{
count++;
if(count == 2)
index1 = i;
else if(count == 3)
{
index2 = i;
break;
}
}
}
if(currentLine.substring(index1+1, index2).equals (idString))
return(index);
index++;
}
//If no such ID found, return -1;
return -1;
}
In the main method:
System.out.println("enter an ID");
int ID = Integer.parseInt(Keyboard.next());
int lineNumber = find(lines, ID);
if(lineNumber == -1)
System.out.println("no information found");
else
System.out.println(lines.get(lineNumber));

Java - For loop within ArrayList<String> method only returns one element

I am trying to use an ArrayList of string values from one table, modify the strings based on whether or not the string ends with ".tif" or ".tiff", then transfer the resulting strings to a new table. However, when I invoke this method, the new table only receives the first modified string. I'm not sure what is wrong with my logic, the first element of the original table would be checked to see if it satisfies a condition (either ending in ".tif" or ".tiff") then from there that string would be modified, added to the ArrayList fData, then iterate to the next table value. I don't understand why the method doesn't return more than one element contained within fData?
public ArrayList<String> getTableData() {
StringBuilder str = new StringBuilder();
String fString = null;
ArrayList<String> fData = new ArrayList<String>();
while(filePaths != null) {
int size = filePaths.size();
for (int i = 0; i <= size; i++) {
String pathName = filePaths.get(i);
if (pathName.endsWith(".tif")) {
int pathLength = pathName.length();
str = new StringBuilder(filePaths.get(i));
str.insert(pathLength - 4, "_Data");
fString = str.toString();
fData.add(fString);
tableModel2.addRow(new String[] { fString });
return fData;
}
else if (pathName.endsWith(".tiff")) {
int pathLength = pathName.length();
str = new StringBuilder(filePaths.get(i));
str.insert(pathLength - 5, "_Data");
fString = str.toString();
fData.add(fString);
tableModel2.addRow(new String[] { fString });
return fData;
}
}
tableModel2.fireTableDataChanged();
}
return null;
}
It appears that you are returning from getTableData() as soon as you do a single replacement. Instead, you should return only after having iterated over every file path.
Remove the return statements inside the loops and instead replace return null at the end with return fData.
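A minimal sketch of the corrected loop follows. The class name TiffRenamer and the method addDataSuffix are hypothetical, and the Swing table model is left out so the example runs on its own. Two further details of the original code would likely need attention once the early returns are gone: the bound i <= size reads one element past the end of the list, and while (filePaths != null) never terminates because filePaths is not changed inside the loop.

```java
import java.util.ArrayList;
import java.util.List;

public class TiffRenamer {
    /** Inserts "_Data" before the .tif/.tiff extension of every matching path. */
    public static List<String> addDataSuffix(List<String> filePaths) {
        List<String> fData = new ArrayList<>();
        if (filePaths == null) {
            return fData; // nothing to convert
        }
        for (String pathName : filePaths) {
            int extLength;
            if (pathName.endsWith(".tiff")) {
                extLength = 5;
            } else if (pathName.endsWith(".tif")) {
                extLength = 4;
            } else {
                continue; // not a TIFF file, skip it
            }
            StringBuilder str = new StringBuilder(pathName);
            str.insert(pathName.length() - extLength, "_Data");
            fData.add(str.toString());
        }
        return fData; // returned only after every path was visited
    }

    public static void main(String[] args) {
        System.out.println(addDataSuffix(List.of("a.tif", "b.tiff", "c.png")));
    }
}
```

In the original method, each converted string could be added to tableModel2 inside the loop, with the single return placed after it.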

Find anagram of input on set of strings..?

Given a set of strings (large set), and an input string, you need to find all the anagrams of the input string efficiently. What data structure will you use. And using that, how will you find the anagrams?
Things that I have thought of are these:
Using maps
a) eliminate all words with more/less letters than the input.
b) put the input characters in map
c) Traverse the map for each string and see if all letters are present with their count.
Using Tries
a) Put all strings which have the right number of characters into a trie.
b) traverse each branch and go deeper if the letter is contained in the input.
c) if leaf reached the word is an anagram
Can anyone find a better solution?
Are there any problems that you find in the above approaches?
Build a frequency-map from each word and compare these maps.
Pseudo code:
class Word
string word
map<char, int> frequency
Word(string w)
word = w
for char in word
int count = frequency.get(char)
if count == null
count = 0
count++
frequency.put(char, count)
boolean is_anagram_of(that)
return this.frequency == that.frequency
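The pseudo code above might translate to Java roughly as follows. This is only a sketch: the class name FrequencyAnagram and its method names are hypothetical, and the comparison is case-sensitive, which is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class FrequencyAnagram {
    /** Counts how often each character occurs in the word. */
    public static Map<Character, Integer> frequency(String word) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : word.toCharArray()) {
            counts.merge(c, 1, Integer::sum); // start at 1, otherwise add 1
        }
        return counts;
    }

    /** Two words are anagrams when their frequency maps are equal. */
    public static boolean isAnagram(String a, String b) {
        // The length check is a cheap early exit before building the maps.
        return a.length() == b.length() && frequency(a).equals(frequency(b));
    }

    public static void main(String[] args) {
        System.out.println(isAnagram("listen", "silent")); // true
        System.out.println(isAnagram("doll", "dole"));     // false
    }
}
```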
You could build a hashmap where the key is sorted(word), and the value is a list of all the words that, sorted, give the corresponding key:
private Map<String, List<String>> anagrams = new HashMap<String, List<String>>();
void buildIndex(){
for(String word : words){
String sortedWord = sortWord(word);
if(!anagrams.containsKey(sortedWord)){
anagrams.put(sortedWord, new ArrayList<String>());
}
anagrams.get(sortedWord).add(word);
}
}
Then you just do a lookup for the sorted word in the hashmap you just built, and you'll have the list of all the anagrams.
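For completeness, here is a self-contained sketch of that index, including a possible sortWord helper, which the snippet above uses but does not define. The class name AnagramIndex and the lookup method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AnagramIndex {
    private final Map<String, List<String>> anagrams = new HashMap<>();

    /** Builds the index once; every word is filed under its sorted letters. */
    public AnagramIndex(List<String> words) {
        for (String word : words) {
            anagrams.computeIfAbsent(sortWord(word), k -> new ArrayList<>()).add(word);
        }
    }

    /** Sorts the characters of a word to form its canonical key. */
    static String sortWord(String word) {
        char[] chars = word.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    /** Returns all indexed words that are anagrams of the input. */
    public List<String> lookup(String input) {
        return anagrams.getOrDefault(sortWord(input), Collections.emptyList());
    }

    public static void main(String[] args) {
        AnagramIndex index = new AnagramIndex(Arrays.asList("god", "dog", "doll", "life"));
        System.out.println(index.lookup("odg")); // [god, dog]
    }
}
```

Building the index costs one sort per word, after which each query is a single sort of the input plus one hash lookup.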
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
/*
* Program to find anagrams in a given array of strings.
*
* With sorting, the time complexity is O(n) + O(k log k), where k is the length of a word.
*
* Without sorting (counting letters instead), the complexity is O(n).
* **/
public class FindAnagramsOptimized {
public static void main(String[] args) {
String[] words = { "gOd", "doG", "doll", "llod", "lold", "life",
"sandesh", "101", "011", "110" };
System.out.println(getAnaGram(words));
}
// Space Complexity O(n)
// Time Complexity O(nLogn)
static Set<String> getAnaGram(String[] allWords) {
// Internal Data Structure for Keeping the Values
class OriginalOccurence {
int occurence;
int index;
}
Map<String, OriginalOccurence> mapOfOccurence = new HashMap<>();
int count = 0;
// Loop Time Complexity is O(n)
// Space Complexity O(K+2K), here K is unique words after sorting on
// characters
for (String word : allWords) {
String key = sortedWord(word);
if (key == null) {
// Keep count aligned with the position in allWords for skipped words
count++;
continue;
}
if (!mapOfOccurence.containsKey(key)) {
OriginalOccurence original = new OriginalOccurence();
original.index = count;
original.occurence = 1;
mapOfOccurence.put(key, original);
} else {
OriginalOccurence tempVar = mapOfOccurence.get(key);
tempVar.occurence += 1;
mapOfOccurence.put(key, tempVar);
}
count++;
}
Set<String> finalAnagrams = new HashSet<>();
// Loop works in O(K), here K is unique words after sorting on
// characters
for (Map.Entry<String, OriginalOccurence> anaGramedWordList : mapOfOccurence.entrySet()) {
if (anaGramedWordList.getValue().occurence > 1) {
finalAnagrams.add(allWords[anaGramedWordList.getValue().index]);
}
}
return finalAnagrams;
}
// Array sort works in O(n log n)
// Customized sorting for characters only works in O(n) time.
private static String sortedWord(String word) {
// int[] asciiArray = new int[word.length()];
int[] asciiArrayOf26 = new int[26];
// char[] lowerCaseCharacterArray = new char[word.length()];
// int characterSequence = 0;
// Ignore Case Logic written in lower level
for (char character : word.toCharArray()) {
if (character >= 97 && character <= 122) {
// asciiArray[characterSequence] = character;
if (asciiArrayOf26[character - 97] != 0) {
asciiArrayOf26[character - 97] += 1;
} else {
asciiArrayOf26[character - 97] = 1;
}
} else if (character >= 65 && character <= 90) {
// asciiArray[characterSequence] = character + 32;
if (asciiArrayOf26[character + 32 - 97] != 0) {
asciiArrayOf26[character + 32 - 97] += 1;
} else {
asciiArrayOf26[character + 32 - 97] = 1;
}
} else {
return null;
}
// lowerCaseCharacterArray[characterSequence] = (char)
// asciiArray[characterSequence];
// characterSequence++;
}
// Arrays.sort(lowerCaseCharacterArray);
StringBuilder sortedWord = new StringBuilder();
int asciiToIndex = 0;
// This Logic uses for reading the occurrences from array and copying
// back into the character array
for (int asciiValueOfCharacter : asciiArrayOf26) {
if (asciiValueOfCharacter != 0) {
if (asciiValueOfCharacter == 1) {
sortedWord.append((char) (asciiToIndex + 97));
} else {
for (int i = 0; i < asciiValueOfCharacter; i++) {
sortedWord.append((char) (asciiToIndex + 97));
}
}
}
asciiToIndex++;
}
// return new String(lowerCaseCharacterArray);
return sortedWord.toString();
}
}

How to decode protobuf binary response

I have created a test app that can recognize some images using Google Goggles. It works for me, but I receive a binary protobuf response. I have no proto files, just the binary response. How can I get data from it? (I sent an image of a bottle of beer and got the following response):
A
TuborgLogo9 HoaniText���;�)b���2d8e991bff16229f6"�
+TR=T=AQBd6Cl4Kd8:X=OqSEi:S=_rSozFBgfKt5d9b0
+TR=T=6rLQxKE2xdA:X=OqSEi:S=gd6Aqb28X0ltBU9V
+TR=T=uGPf9zJDWe0:X=OqSEi:S=32zTfdIOdI6kuUTa
+TR=T=RLkVoGVd92I:X=OqSEi:S=P7yOhvSAOQW6SRHN
+TR=T=J1FMvNmcyMk:X=OqSEi:S=5Z631_rd2ijo_iuf�
need to get string "Tuborg" and if possible type - "Logo"
You can decode with protoc:
protoc --decode_raw < msg.bin
Or, in Java:
UnknownFieldSet.parseFrom(msg).toString()
This will show you the top level fields. Unfortunately it can't know the exact details of field types. long/int/bool/enum etc are all encoded as Varint and all look the same. Strings, byte-arrays and sub-messages are length-delimited and are also indistinguishable.
Some useful details here: https://github.com/dcodeIO/protobuf.js/wiki/How-to-reverse-engineer-a-buffer-by-hand
If you follow the code in UnknownFieldSet.mergeFrom() you'll see how you could try to decode sub-messages, falling back to strings if that fails, but it's not going to be very reliable.
There are 2 spare values for the wire type in the protocol. It would have been really helpful if Google had used one of these to denote sub-messages. (And the other for null values, perhaps.)
Here's some very crude, rushed code which attempts to produce something useful for diagnostics. It guesses at the data types, and in the case of strings and sub-messages it will print both alternatives in some cases. Please don't trust any values it prints:
public static String decodeProto(byte[] data, boolean singleLine) throws IOException {
return decodeProto(ByteString.copyFrom(data), 0, singleLine);
}
public static String decodeProto(ByteString data, int depth, boolean singleLine) throws IOException {
final CodedInputStream input = CodedInputStream.newInstance(data.asReadOnlyByteBuffer());
return decodeProtoInput(input, depth, singleLine);
}
private static String decodeProtoInput(CodedInputStream input, int depth, boolean singleLine) throws IOException {
StringBuilder s = new StringBuilder("{ ");
boolean foundFields = false;
while (true) {
final int tag = input.readTag();
int type = WireFormat.getTagWireType(tag);
if (tag == 0 || type == WireFormat.WIRETYPE_END_GROUP) {
break;
}
foundFields = true;
protoNewline(depth, s, singleLine);
final int number = WireFormat.getTagFieldNumber(tag);
s.append(number).append(": ");
switch (type) {
case WireFormat.WIRETYPE_VARINT:
s.append(input.readInt64());
break;
case WireFormat.WIRETYPE_FIXED64:
s.append(Double.longBitsToDouble(input.readFixed64()));
break;
case WireFormat.WIRETYPE_LENGTH_DELIMITED:
ByteString data = input.readBytes();
try {
String submessage = decodeProto(data, depth + 1, singleLine);
if (data.size() < 30) {
boolean probablyString = true;
String str = new String(data.toByteArray(), Charsets.UTF_8);
for (char c : str.toCharArray()) {
if (c < '\n') {
probablyString = false;
break;
}
}
if (probablyString) {
s.append("\"").append(str).append("\" ");
}
}
s.append(submessage);
} catch (IOException e) {
s.append('"').append(new String(data.toByteArray())).append('"');
}
break;
case WireFormat.WIRETYPE_START_GROUP:
s.append(decodeProtoInput(input, depth + 1, singleLine));
break;
case WireFormat.WIRETYPE_FIXED32:
s.append(Float.intBitsToFloat(input.readFixed32()));
break;
default:
throw new InvalidProtocolBufferException("Invalid wire type");
}
}
if (foundFields) {
protoNewline(depth - 1, s, singleLine);
}
return s.append('}').toString();
}
private static void protoNewline(int depth, StringBuilder s, boolean noNewline) {
if (noNewline) {
s.append(" ");
return;
}
s.append('\n');
for (int i = 0; i <= depth; i++) {
s.append(INDENT);
}
}
I'm going to assume the real question is how to decode protobufs and not how to read binary from the wire using Java.
The answer to your question can be found here
Briefly, on the wire, protobufs are encoded as 3-tuples of <key,type,value>, where:
the key is the field number assigned to the field in the .proto schema
the type is one of <varint, 64-bit, length-delimited, start-group, end-group, 32-bit>. It contains just enough information to decode the value of the 3-tuple; namely, it tells you how long the value is.
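As a rough illustration of that layout: on the wire, the key and type are packed together into one varint, with the field number in the upper bits and the wire type in the lowest three bits. The sketch below decodes such a tag by hand without the protobuf library. The class name WireTagDemo and its method names are hypothetical, and the reader does no bounds or overflow checking, so it is for diagnostics only.

```java
public class WireTagDemo {
    /**
     * Reads one base-128 varint starting at pos[0] and advances pos[0].
     * No protection against truncated or over-long input (sketch only).
     */
    public static long readVarint(byte[] buf, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = buf[pos[0]++];
            result |= (long) (b & 0x7F) << shift; // low 7 bits carry data
            if ((b & 0x80) == 0) {                // high bit clear: last byte
                return result;
            }
            shift += 7;
        }
    }

    /** The field number is stored in the tag above the three wire-type bits. */
    public static int fieldNumber(long tag) {
        return (int) (tag >>> 3);
    }

    /** The wire type occupies the lowest three bits of the tag. */
    public static int wireType(long tag) {
        return (int) (tag & 0x7);
    }

    public static void main(String[] args) {
        // Bytes 08 96 01: field 1, wire type 0 (varint), value 150.
        byte[] msg = {0x08, (byte) 0x96, 0x01};
        int[] pos = {0};
        long tag = readVarint(msg, pos);
        long value = readVarint(msg, pos);
        System.out.println(fieldNumber(tag) + " " + wireType(tag) + " " + value); // 1 0 150
    }
}
```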

Resources