How can I use the stream filter() method with a regexp predicate to get a negated list? - java-8

I am trying to filter out anything that does not match the regexp.
What I want is to collect into a list every city name that contains characters other than a-z, 0-9 and -, so that I can deal with the city names with invalid characters afterwards.
But whatever I try, I end up with either a list of valid cities or an IllegalArgumentException thrown for a list that contains only valid city names.
String str;
List<String> invalidCharactersList = cityName.stream()
.filter(Pattern.compile("[^a-z0-9-]*$").asPredicate())
.collect(toList());
// Check for invalid names
if (!invalidCharactersList.isEmpty()) {
str = (inOut) ? "c" : "q";
throw new IllegalArgumentException("City name characters "
+ str + ": for city name " + invalidCharactersList.get(0)
+ ": fails constraint city names [a-z, 0-9, -]");
}
Here is some test data. The check fails on the first list (c); I want it to fail on the last one (q).
List<String> c = new ArrayList<>(Arrays.asList("fastcity", "bigbanana", "xyz"));
List<Integer> x = new ArrayList<>(Arrays.asList(23, 23, 23));
List<Integer> y = new ArrayList<>(Arrays.asList(1, 10, 20));
List<String> q = new ArrayList<>(Arrays.asList("fastcity*", "bigbanana", "xyz&"));
Following is output:

@Holger
filter(Pattern.compile("[^a-z0-9-]").asPredicate())
Thanks, this works fine.
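For anyone wondering why the original pattern accepted everything: asPredicate() tests with find(), and "[^a-z0-9-]*$" can match the empty string at the end of any input, so every city name passes the filter. Here is a minimal sketch of the working version, run against the two test lists from the question. The class and method names (CityNameCheck, invalidNames) are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class CityNameCheck {
    // Matches one character outside the allowed set a-z, 0-9 and '-'.
    private static final Pattern INVALID_CHAR = Pattern.compile("[^a-z0-9-]");

    // Returns the names that contain at least one disallowed character.
    static List<String> invalidNames(List<String> names) {
        return names.stream()
                // asPredicate() uses find(), so one bad character anywhere is enough
                .filter(INVALID_CHAR.asPredicate())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Valid list: nothing is collected.
        System.out.println(invalidNames(Arrays.asList("fastcity", "bigbanana", "xyz")));
        // Invalid list: the two offending names are collected.
        System.out.println(invalidNames(Arrays.asList("fastcity*", "bigbanana", "xyz&")));
    }
}
```

With this, the isEmpty() check in the question only throws for the list q.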


I wrote code to update the capitalization of the first name in Zoho, but it's not working

Here's the Deluge script that should capitalize the first letter of the name and make the other letters lowercase, but it isn't working:
a = zoho.crm.getRecordById("Contacts",input.ID);
d = a.get("First_Name");
firstChar = d.subString(0,1);
otherChars = d.removeFirstOccurence(firstChar);
Name = firstChar.toUppercase() + otherChars.toLowerCase();
mp = map();
mp.put("First_Name",d);
b = zoho.crm.updateRecord("Contacts", Name,{"First_Name":"Name"});
info Name;
info b;
I tried capitalizing the first letter and making the other letters lowercase, but it isn't working as expected.
Try using concat
Name = firstChar.toUppercase().concat( otherChars.toLowerCase() );
Try removing the double quotes from the Name value in the following statement. The reason is that Name is a variable holding the case-adjusted name, but "Name" is the literal string "Name".
From:
b = zoho.crm.updateRecord("Contacts", Name,{"First_Name":"Name"});
To
b = zoho.crm.updateRecord("Contacts", Name,{"First_Name":Name});

LINQ: select rows where any word of string start with a certain character

I want to extract from a table all rows where a (string) column contains at least one word that starts with a specified character.
Example:
Row 1: 'this is the first row'
Row 2: 'this is th second row'
Row 3: 'this is the third row'
If the specified character is T, I would extract all 3 rows.
If the specified character is S, I would extract only the second row.
...
Please help me
Assuming you mean "space delimited sequence of characters, or begin to space or space to end" by "word", then you can split on the delimiter and test them for matches:
var src = new[] {
"this is the first row",
"this is th second row",
"this is the third row"
};
var findChar = 'S';
var lowerFindChar = char.ToLower(findChar); // char has no instance ToLower(); use the static method
var matches = src.Where(s => s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Any(w => w.ToLower()[0] == lowerFindChar));
The LINQ Enumerable.Any method tests a sequence to see if any element matches, so you can split each string into a sequence of words and see if any word begins with the desired letter, compensating for case.
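For comparison, the same split-and-test idea can be sketched with Java streams, where anyMatch plays the role of Any. This is only an illustration under the same "space-delimited word" assumption; the class and method names (WordStartFilter, rowsWithWordStarting) are made up here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class WordStartFilter {
    // Keeps the rows in which at least one space-delimited word starts with c (case-insensitive).
    static List<String> rowsWithWordStarting(List<String> rows, char c) {
        char wanted = Character.toLowerCase(c);
        return rows.stream()
                .filter(row -> Arrays.stream(row.split(" "))
                        .filter(word -> !word.isEmpty()) // skip empty entries from repeated spaces
                        .anyMatch(word -> Character.toLowerCase(word.charAt(0)) == wanted))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "this is the first row",
                "this is th second row",
                "this is the third row");
        System.out.println(rowsWithWordStarting(rows, 'S')); // only the second row
        System.out.println(rowsWithWordStarting(rows, 'T')); // all three rows
    }
}
```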
Try this:
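rows.Where(r => Regex.IsMatch(r, "(^| )[Tt]"))
You can replace the Tt with Ss (both assuming you want either upper case or lower case). The (^| ) part also catches a word at the very start of the row, which a pattern beginning with a plain space would miss.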
The problem of course is, what is a "word"?
Is the character sequence 'word' in the sentence above a word according to your definition? It doesn't start with a space, not even a white-space.
A definition of a word could be:
Define wordCharacter: something like A-Z, a-z.
Define word:
- the non-empty sequence of wordCharacters at the beginning of a string followed by a non-wordcharacter
- or the non-empty sequence of wordCharacters at the end of a string preceded by a non-wordcharacter
- any non-empty sequence of wordCharacters in the string preceded and followed by a non-wordcharacter
Define start of word: the first character of a word.
String: "Some strange characters: 'A', 9, äll, B9 C$ X?"
- Words: Some, strange, characters, A
- Not Words: 9, äll, B9, C$, X?
So you first have to specify precisely what you mean by word, then you can define functions.
I'll write it as an extension method of IEnumerable<string>. Usage will look similar to LINQ. See Extension Methods Demystified
// Extension methods must be declared in a static class.
static bool IsWordCharacter(char c) {... TODO: implement your definition of word character}
static IEnumerable<string> SplitIntoWords(this string text)
{
    // TODO: exception if text null
    if (text.Length == 0) yield break;
    int startIndex = 0;
    while (startIndex != text.Length)
    {   // not at end of string. Find the beginning of the next word:
        while (startIndex < text.Length && !IsWordCharacter(text[startIndex]))
        {
            ++startIndex;
        }
        // now startIndex points to the first character of the next word
        // or to the end of the text
        if (startIndex != text.Length)
        {   // found the beginning of a word.
            // the first character after the word is either the first non-word character,
            // or the end of the string
            int indexAfterWord = startIndex + 1;
            while (indexAfterWord < text.Length && IsWordCharacter(text[indexAfterWord]))
            {
                ++indexAfterWord;
            }
            // all characters from startIndex to indexAfterWord-1 are word characters,
            // so together they form a word
            int wordLength = indexAfterWord - startIndex;
            yield return text.Substring(startIndex, wordLength);
            // continue searching after this word, otherwise the loop never ends
            startIndex = indexAfterWord;
        }
    }
}
Now that you've got a procedure to split any string into your definition of words, your query will be simple:
IEnumerable<string> texts = ...
char specifiedChar = 'T';
// keep only those texts that have at least one word that starts with specifiedChar:
var textsWithWordThatStartsWithSpecifiedChar = texts
// split the text into words
// keep only the words that start with specifiedChar
// if there is such a word: keep the text
.Where(text => text.SplitIntoWords()
.Where(word => word.Length > 0 && word[0] == specifiedChar)
.Any());
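If all you need is the yes/no answer, you do not have to materialize the words at all: a character is the start of a word when it is a word character and the character before it is not (or it is the first character of the string). A rough Java sketch of that check, assuming word characters are just A-Z and a-z as in the definition above; the class and method names are made up for illustration.

```java
final class WordStarts {
    // Assumed definition of a word character: ASCII letters only.
    static boolean isWordCharacter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // True if some word in text starts with the specified character (case-sensitive).
    static boolean hasWordStartingWith(String text, char specified) {
        for (int i = 0; i < text.length(); i++) {
            // A word starts where a word character follows a non-word character or the string start.
            boolean atWordStart = isWordCharacter(text.charAt(i))
                    && (i == 0 || !isWordCharacter(text.charAt(i - 1)));
            if (atWordStart && text.charAt(i) == specified) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasWordStartingWith("this is th second row", 's'));
        System.out.println(hasWordStartingWith("this is the first row", 's'));
    }
}
```

Note that this is case-sensitive, like the word[0] == specifiedChar comparison in the query above.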
var yourChar = "s";
var texts = new List<string> {
"this is the first row",
"this is th second row",
"this is the third row"
};
var result = texts.Where(p => p.StartsWith(yourChar) || p.Contains(" " + yourChar));
EDITED:
Alternative way (I'm not sure it works in linq query)
var result = texts.Where(p => (" " + p).Contains(" " + yourChar));
You can use .ToLower() if you want a case-insensitive check.

How to reverse tokenization after running tokens through name finder?

After using NameFinderME to find the names in a series of tokens, I would like to reverse the tokenization and reconstruct the original text with the names that have been modified. Is there a way I can reverse the tokenization operation in the exact way in which it was performed, so that the output is the exact structure as the input?
Example
Hello my name is John. This is another sentence.
Find sentences
Hello my name is John.
This is another sentence.
Tokenize sentences.
> Hello
> my
> name
> is
> John.
>
> This
> is
> another
> sentence.
My code that analyzes the tokens above looks something like this so far.
TokenNameFinderModel model3 = new TokenNameFinderModel(modelIn3);
NameFinderME nameFinder = new NameFinderME(model3);
List<Span[]> spans = new List<Span[]>();
foreach (string sentence in sentences)
{
String[] tokens = tokenizer.tokenize(sentence);
Span[] nameSpans = nameFinder.find(tokens);
string[] namedEntities = Span.spansToStrings(nameSpans, tokens);
//I want to modify each of the named entities found
//foreach(string s in namedEntities) { modifystring(s) };
spans.Add(nameSpans);
}
Desired output, perhaps masking the names that were found.
Hello my name is XXXX. This is another sentence.
In the documentation, there is a link to this post describing how to use the detokenizer. I don't understand how the operations array relates to the original tokenization (if at all)
https://issues.apache.org/jira/browse/OPENNLP-216
Create instance of SimpleTokenizer.
String sentence = "He said \"This is a test\".";
SimpleTokenizer instance = SimpleTokenizer.INSTANCE;
Tokenize the sentence using tokenize(String str) method from SimpleTokenizer
String tokens[] = instance.tokenize(sentence);
The operations array must contain one operation per token, so both arrays must have the same length.
Store the operation name tokens.length times in the operations array.
Operation operations[] = new Operation[tokens.length];
String oper = "MOVE_RIGHT"; // please refer above list for the list of operations
for (int i = 0; i < tokens.length; i++)
{ operations[i] = Operation.parse(oper); }
System.out.println(operations.length);
Here the operation array length will be equal to the tokens array length.
Now create an instance of DetokenizationDictionary by passing tokens and operations arrays to the constructor.
DetokenizationDictionary detokenizeDict = new DetokenizationDictionary(tokens, operations);
Pass DetokenizationDictionary instance to the DictionaryDetokenizer class to detokenize the tokens.
DictionaryDetokenizer dictDetokenize = new DictionaryDetokenizer(detokenizeDict);
DictionaryDetokenizer.detokenize requires two parameters. a). tokens array and b). split marker
String st = dictDetokenize.detokenize(tokens, " ");
Output:
Use the Detokenizer.
String text = detokenize(myTokens, null);
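A different approach, if the goal is to get back exactly the original text with only the names changed, is to skip detokenization and work with character offsets instead: keep the start/end offset of every token, then replace the name tokens directly in the original string. OpenNLP tokenizers expose tokenizePos(String) for the offsets. The sketch below does not use OpenNLP at all; it fakes the tokenizer with a regex and takes the name token indexes as given, so tokenOffsets, mask and the class name are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OffsetMasking {
    // Stand-in tokenizer (an assumption, not OpenNLP): runs of word characters,
    // or single punctuation characters. Each entry is {start, end} in the original text.
    static List<int[]> tokenOffsets(String text) {
        List<int[]> offsets = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+|[^\\w\\s]").matcher(text);
        while (m.find()) {
            offsets.add(new int[] { m.start(), m.end() });
        }
        return offsets;
    }

    // Replaces the tokens at the given indexes (ascending order assumed) with the replacement.
    static String mask(String text, List<int[]> offsets, int[] nameTokens, String replacement) {
        StringBuilder sb = new StringBuilder(text);
        // Work right to left so that earlier offsets remain valid after each replacement.
        for (int i = nameTokens.length - 1; i >= 0; i--) {
            int[] span = offsets.get(nameTokens[i]);
            sb.replace(span[0], span[1], replacement);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Hello my name is John. This is another sentence.";
        // Token 4 is "John" with the stand-in tokenizer above.
        System.out.println(mask(text, tokenOffsets(text), new int[] { 4 }, "XXXX"));
    }
}
```

Everything outside the replaced spans, including whitespace and punctuation, is left untouched, so nothing needs to be reconstructed.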

LINQ| How to transform only the first letter to lowercase

Using LINQ, how can I transform only the first letter of s.Password to lowercase?
if (s.Password == password){}
I want the first char of s.Password to be in lower case.
I tried:
if( s.Password[0].toString().toLower() + s.Password(1) ) == password ){}
If you would like to make a decision based on an item's position in LINQ, you can use Select that takes a Func with two parameters - the item and its index:
var pwd = "BadPassword";
var res = new string(
pwd.Select((c, i) => i==0 ? char.ToLower(c) : c).ToArray()
); // produces badPassword
The lambda above converts the initial character at i==0 to lower case, while leaving all other characters in place.
Note: LINQ is not necessary for this conversion. You can do the same thing in one line by using Substring:
var res = char.ToLower(pwd[0]) + pwd.Substring(1);
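The substring approach carries over to other languages almost unchanged. As a small sketch in Java (the class and method names are made up, and the null/empty guard is an addition that the one-liner above does not have):

```java
final class FirstLetter {
    // Lowercases only the first character; the rest of the string is left as it is.
    static String lowerFirst(String s) {
        if (s == null || s.isEmpty()) {
            return s; // nothing to convert
        }
        return Character.toLowerCase(s.charAt(0)) + s.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(lowerFirst("BadPassword")); // badPassword
    }
}
```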

how to detect all caps word in a string

I am new to Java. I have a text file containing different words per line, and I want to read that file as a string in order to count the words that are written in all caps (abbreviations). The exception is that if a word starts with "@" or "#", it should not be counted. For example I have:
OMG terry is cute #HAWT SMH
The result will be:
Abbreviations = 2.
or
terry likes TGIF parties #ANDERSON
The result will be:
Abbreviations = 1.
Please help
Try to use the .split(String regex) and .contains(CharSequence s) methods.
I think they will help you a lot.
Function split:
http://www.tutorialspoint.com/java/java_string_split.htm
Function contains:
http://www.tutorialspoint.com/java/lang/string_contains.htm
String str1 = "OMG terry is cute #HAWT SMH";
String str2 = "terry likes TGIF parties #ANDERSON";
Pattern p = Pattern.compile("(?>\\s)([A-Z]+)(?=\\s)");
Matcher matcher = p.matcher(" "+str1+" ");//pay attention! adding spaces
// before and after to catch potentials in
// beginning/end of the sentence
int i=0;
while (matcher.find()) {
i++; //count how many matches were found
}
System.out.println("matches: "+i); // prints 2
matcher = p.matcher(" "+str2+" ");
i=0;
while (matcher.find()) {
i++;
}
System.out.println("matches: "+i); // prints 1
OUTPUT:
matches: 2
matches: 1
Here is a bazooka for your spider problem.
(mystring+" ").split("(?<![@#A-Z])[A-Z]{2,}").length-1;
Pad the string with a space (because .split removes trailing empty strings).
Split on the pattern "not preceded by @, # or another capital letter, and this is two or more capital letters". The capital letter in the lookbehind matters: without it, the AWT in #HAWT would still match. This returns an array of the substrings that are not part of abbreviations.
Take the length of the array and subtract 1.
Example:
mystring = "OMG terry is cute #HAWT SMH";
String[] arr = (mystring+" ").split("(?<![@#A-Z])[A-Z]{2,}");
//arr is now {"", " terry is cute #HAWT ", " "}, three strings
return arr.length-1; //returns 2
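If the regex tricks feel fragile, a plainer alternative is to split the line on whitespace and test each token on its own. A minimal sketch, assuming an abbreviation means a token made of two or more capital letters and nothing else, and that tokens starting with # or @ are skipped; the class and method names are made up for illustration.

```java
final class AbbreviationCounter {
    // Counts whitespace-separated tokens that consist only of two or more capital letters.
    static int count(String line) {
        int abbreviations = 0;
        for (String token : line.trim().split("\\s+")) {
            if (token.startsWith("#") || token.startsWith("@")) {
                continue; // hashtags and mentions are ignored
            }
            if (token.matches("[A-Z]{2,}")) {
                abbreviations++;
            }
        }
        return abbreviations;
    }

    public static void main(String[] args) {
        System.out.println("Abbreviations = " + count("OMG terry is cute #HAWT SMH"));
        System.out.println("Abbreviations = " + count("terry likes TGIF parties #ANDERSON"));
    }
}
```

Because matches() tests the whole token, something like "SMH." with trailing punctuation would not be counted; strip punctuation first if that matters for your file.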
