Iterate lists of different length using Java 8 streams

I am trying to iterate over three lists of different sizes, but I can't work out the logic for retrieving data from them and storing it in another list.
I was able to handle up to two lists until I added some more filtering on the elements. For now I am using three for loops, but I want to use Java 8 streams if possible. Can someone please suggest the correct logic for the iteration below?
public class CustomDto {
    public static void main(String... args) {
        List<String> list1 = Arrays.asList("Hello", "World!");
        List<String> list2 = Arrays.asList("Hi", "there");
        List<String> list3 = Arrays.asList("Help Me");
        Map<Integer, Object> map = new HashMap<>();
        for (int i = 0; i < list1.size(); i++) {
            List<String> list4 = new LinkedList<>();
            for (int j = 0; j < list2.size(); j++) {
                for (int k = 0; k < list3.size(); k++) {
                    if (!(list2.get(j).equals(list3.get(k))))
                        list4.add(list2.get(j));
                }
                if (j > list4.size() - 1) {
                    list4.add(null);
                }
            }
            map.put(i, list4);
        }
    }
}
All I want is to convert the above code into a stream, in which I can iterate one list inside another and use the index of one from the other.

public static void main(String... args) {
    List<String> list1 = Arrays.asList("Hello", "World!");
    List<String> list2 = Arrays.asList("Hi", "there");
    List<String> list3 = Arrays.asList("Help Me");
    List<String> list4 = concat(list1, list2, list3); // you can add as many lists as you like here
    System.out.println(list4);
}

@SafeVarargs
private static List<String> concat(List<String>... lists) {
    return Stream.of(lists)
            .flatMap(List::stream)
            .collect(Collectors.toList());
}
Output
[Hello, World!, Hi, there, Help Me]
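If the goal is index-based iteration over lists of different lengths rather than flat concatenation, one way is an `IntStream.range` over the largest size. This is a hedged sketch, not the asker's exact filtering logic: the `at` helper and the "keep the element of list2 unless list3 has the same value at that index" rule are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ZipByIndex {
    // Returns the element at index i, or null when the list is shorter.
    static <T> T at(List<T> list, int i) {
        return i < list.size() ? list.get(i) : null;
    }

    public static void main(String[] args) {
        List<String> list1 = Arrays.asList("Hello", "World!");
        List<String> list2 = Arrays.asList("Hi", "there");
        List<String> list3 = Arrays.asList("Help Me");

        int maxSize = IntStream.of(list1.size(), list2.size(), list3.size()).max().getAsInt();

        // For each index, keep the element of list2 unless list3 holds an
        // equal value at the same position; Objects.equals is null-safe.
        List<String> list4 = IntStream.range(0, maxSize)
                .mapToObj(i -> Objects.equals(at(list2, i), at(list3, i)) ? null : at(list2, i))
                .collect(Collectors.toList());

        System.out.println(list4); // [Hi, there]
    }
}
```

The `IntStream.range` gives you the index that `List::stream` alone does not, which is what the nested for loops in the question rely on.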

Try this: create a multidimensional array out of the Lists; from there you can use streams too.
Customdto[][] listArray = new Customdto[][] {
        l1.toArray(new Customdto[]{}),
        l2.toArray(new Customdto[]{}),
        l3.toArray(new Customdto[]{})
};
int size = listArray[0].length > listArray[1].length && listArray[0].length > listArray[2].length
        ? listArray[0].length
        : (listArray[1].length > listArray[2].length ? listArray[1].length : listArray[2].length);
for (int i = 0; i < size; i++) {
    if (listArray[0].length > i && listArray[1].length > i && listArray[2].length > i
            && listArray[0][i].equals(listArray[1][i])
            && listArray[1][i].getCalendarDate().equals(listArray[2][i].getCalendarDate())) {
        l4.add(listArray[1][i]);
    } else {
        l4.add(null);
    }
}
Tried with the below input:
List<Customdto> l1 = new ArrayList<Customdto>();
List<Customdto> l2 = new ArrayList<Customdto>();
List<Customdto> l3 = new ArrayList<Customdto>();
List<Customdto> l4 = new ArrayList<Customdto>();
l1.add(new Customdto(1));
l1.add(new Customdto(2));
l1.add(new Customdto(3));
l1.add(new Customdto(4));
l2.add(new Customdto(1));
l2.add(new Customdto(2));
l2.add(new Customdto(3));
l3.add(new Customdto(1));
l3.add(new Customdto(2));
Output is
[Customdto [id=1], Customdto [id=2], null, null]

Related

Algorithm to remove duplicated location on list

I have a service that finds journeys and removes duplicated visited cities.
public static void main(String[] args) {
    List<List<String>> allPaths = new ArrayList<>();
    allPaths.add(List.of("Newyork", "Washington", "Los Angeles", "Chicago"));
    allPaths.add(List.of("Newyork", "Washington", "Houston"));
    allPaths.add(List.of("Newyork", "Dallas"));
    allPaths.add(List.of("Newyork", "Columbus", "Chicago"));
    Set<String> orphanageLocations = new HashSet<>();
    removeDuplicatedLocation(allPaths, orphanageLocations);
    // expected allPaths:
    // "Newyork","Washington","Los Angeles","Chicago"
    // "Newyork","Dallas"
    // "Newyork","Columbus"
    // expected orphanageLocations:
    // "Houston"
}

private static void removeDuplicatedLocation(List<List<String>> allPaths, Set<String> orphanageLocations) {
    // do something here
}
In allPaths I store all the paths from an origin to other cities, but some paths may contain the same city; for example, Washington appears in both the first and second paths.
I need a service that removes such duplicated cities: when two paths share a city, we keep the path that visits more cities.
The service should also return the cities that can no longer be visited. For example, the second path's "Washington" is duplicated with the first path, so we remove it from the second path (which has fewer cities than the first); then there is no path to "Houston" available, so it becomes an orphan.
Other test cases:
public static void main(String[] args) {
    List<List<String>> allPaths = new ArrayList<>();
    allPaths.add(List.of("Newyork", "Washington", "Los Angeles", "Chicago", "Dallas"));
    allPaths.add(List.of("Newyork", "Los Angeles", "Houston", "Philadenphia"));
    allPaths.add(List.of("Newyork", "Dallas"));
    allPaths.add(List.of("Newyork", "Columbus", "San Francisco"));
    Set<String> orphanageLocations = new HashSet<>();
    removeDuplicatedLocation(allPaths, orphanageLocations);
    // expected allPaths:
    // "Newyork","Washington","Los Angeles","Chicago","Dallas"
    // "Newyork","Columbus","San Francisco"
    // expected orphanageLocations:
    // "Houston","Philadenphia"
}
Would somebody suggest an algorithm to solve this?
---Edit 1: I updated my dirty solution here; still waiting for a better one.
private static void removeDuplicatedLocation(List<List<String>> allPaths, Set<String> orphanageLocations) {
    // sort to make sure the longest path is on top
    List<List<String>> sortedPaths = allPaths.stream()
            .sorted((a, b) -> Integer.compare(b.size(), a.size()))
            .collect(Collectors.toList());
    for (int i = 0; i < sortedPaths.size() - 1; i++) {
        List<String> path = sortedPaths.get(i);
        orphanageLocations.removeIf(path::contains);
        for (int j = i + 1; j < sortedPaths.size(); j++) {
            for (int k = 1; k < path.size(); k++) {
                Iterator<String> iterator = sortedPaths.get(j).iterator();
                boolean isRemove = false;
                while (iterator.hasNext()) {
                    String city = iterator.next();
                    if (isRemove && !path.contains(city)) {
                        orphanageLocations.add(city);
                    }
                    if (StringUtils.equals(city, path.get(k))) {
                        isRemove = true;
                    }
                    if (isRemove) {
                        iterator.remove();
                    }
                }
            }
        }
    }
    // remove path if it's only the origin
    sortedPaths.removeIf(item -> item.size() == 1);
    allPaths.clear();
    allPaths.addAll(sortedPaths);
}
---Edit 2: Thanks for the solution from #devReddit; I made a small test with a huge number of routes.
The more cities in each path, the slower that solution is:
public static void main(String[] args) {
    List<List<String>> allPaths = new ArrayList<>();
    List<List<String>> allPaths2 = new ArrayList<>();
    List<String> locations = Stream.of("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N",
            "O", "P", "Q", "R", "S", "T", "U", "V", "X", "Y", "Z").collect(Collectors.toList());
    Random rand = new Random();
    int numberOfRoute = 10000;
    String origin = "NY";
    for (int i = 0; i < numberOfRoute; i++) {
        List<String> route = new ArrayList<>();
        List<String> route2 = new ArrayList<>();
        route.add(origin);
        route2.add(origin);
        //int routeLength = rand.nextInt(locations.size());
        int routeLength = 10;
        while (route.size() < routeLength) {
            int randomIndex = rand.nextInt(locations.size() - 1);
            if (!route.contains(locations.get(randomIndex))) {
                route.add(locations.get(randomIndex));
                route2.add(locations.get(randomIndex));
            }
        }
        allPaths.add(route);
        allPaths2.add(route2);
    }
    System.out.println("Process for " + allPaths2.size() + " routes");
    Set<String> orphanageLocations2 = new HashSet<>();
    long startTime2 = System.currentTimeMillis();
    removeDuplicatedLocation3(allPaths2, orphanageLocations2); // uncle Bob solution
    long endTime2 = System.currentTimeMillis();
    System.out.println(allPaths2);
    System.out.println(orphanageLocations2);
    System.out.println("Total time uncleBob solution(ms):" + (endTime2 - startTime2));
    System.out.println("Process for " + allPaths.size() + " routes");
    Set<String> orphanageLocations = new HashSet<>();
    long startTime = System.currentTimeMillis();
    removeDuplicatedLocation(allPaths, orphanageLocations); // devReddit solution
    long endTime = System.currentTimeMillis();
    System.out.println(allPaths);
    System.out.println(orphanageLocations);
    System.out.println("Total time devReddit solution(ms):" + (endTime - startTime));
}
// devReddit solution
private static void removeDuplicatedLocation(List<List<String>> allPaths, Set<String> orphanageLocations) {
    List<List<String>> sortedFixedPaths = allPaths // List.of produces immutable lists,
            .stream()                              // so you can't directly remove a string from the list;
            .sorted((a, b) -> Integer.compare(b.size(), a.size())) // this fixed list will be needed later
            .collect(Collectors.toList());
    List<List<String>> sortedPaths = sortedFixedPaths // the list is regenerated through a manual deep copy:
            .stream()                                 // a single string is generated from the streams of
            .map(path ->                              // each List<String> and a new list created; this one is mutable
                    new ArrayList<>(Arrays.asList(String.join(",", path).split(","))))
            .collect(Collectors.toList());
    Set<List<String>> valuesToBeRemoved = new HashSet<>();
    String source = sortedPaths.get(0).get(0);
    Map<String, List<Integer>> cityMapOfIndex = generateHashMap(sortedPaths, source); // this hashmap keeps track of the existence of cities in different lists
    removeDuplicates(cityMapOfIndex, sortedPaths); // this method removes the duplicates from the smaller paths
    cityMapOfIndex.forEach((cityName, value) -> { // this block checks whether any mid element in the path is gone,
        int index = value.get(0);                 // adds the remaining cities to the orphan list
        List<String> list = sortedPaths.get(index); // and removes the path from the result list
        int indexInPath = list.indexOf(cityName);
        if (indexInPath != sortedFixedPaths.get(index).indexOf(cityName)) {
            orphanageLocations.add(cityName);
            sortedPaths.get(index).remove(indexInPath);
        }
    });
    valuesToBeRemoved.add(new ArrayList<>(Collections.singleton(source))); // to handle the case where only the source remains in the path
    sortedPaths.removeAll(valuesToBeRemoved);                              // after removing other duplicates
    allPaths.clear();
    allPaths.addAll(sortedPaths);
}
private static void removeDuplicates(Map<String, List<Integer>> cityMapOfIndex, List<List<String>> sortedPaths) {
    for (Map.Entry<String, List<Integer>> entry : cityMapOfIndex.entrySet()) {
        List<Integer> indexList = entry.getValue();
        while (indexList.size() > 1) {
            int index = indexList.get(indexList.size() - 1); // get the last index, i.e. the smallest list of cities where this entry exists
            sortedPaths.get(index).remove(entry.getKey());   // remove the city from the list
            indexList.remove((Integer) index);               // update the index list of occurrences
        }
        cityMapOfIndex.put(entry.getKey(), indexList);
    }
}

private static Map<String, List<Integer>> generateHashMap(List<List<String>> sortedPaths, String source) {
    Map<String, List<Integer>> cityMapOfIndex = new HashMap<>();
    for (int x = 0; x < sortedPaths.size(); x++) {
        int finalX = x;
        sortedPaths.get(x)
                .forEach(city -> {
                    if (!city.equalsIgnoreCase(source)) { // add entries for all except the source
                        List<Integer> indexList = cityMapOfIndex.containsKey(city) // check whether there's already an entry,
                                ? cityMapOfIndex.get(city) : new ArrayList<>();    // to avoid data loss due to overwriting
                        indexList.add(finalX);               // add the index of the list of strings
                        cityMapOfIndex.put(city, indexList); // add or update the map with the current indexList
                    }
                });
    }
    return cityMapOfIndex;
}
// Bob solution
private static void removeDuplicatedLocation3(List<List<String>> allPaths, Set<String> orphanageLocations) {
    // sort to make sure the longest path is on top
    List<List<String>> sortedPaths = allPaths.stream()
            .sorted((a, b) -> Integer.compare(b.size(), a.size()))
            .collect(Collectors.toList());
    for (int i = 0; i < sortedPaths.size() - 1; i++) {
        List<String> path = sortedPaths.get(i);
        orphanageLocations.removeIf(path::contains);
        for (int j = i + 1; j < sortedPaths.size(); j++) {
            for (int k = 1; k < path.size(); k++) {
                Iterator<String> iterator = sortedPaths.get(j).iterator();
                boolean isRemove = false;
                while (iterator.hasNext()) {
                    String city = iterator.next();
                    if (isRemove && !path.contains(city)) {
                        orphanageLocations.add(city);
                    }
                    if (StringUtils.equals(city, path.get(k))) {
                        isRemove = true;
                    }
                    if (isRemove) {
                        iterator.remove();
                    }
                }
            }
        }
    }
    // remove path if it's only the origin
    sortedPaths.removeIf(item -> item.size() == 1);
    allPaths.clear();
    allPaths.addAll(sortedPaths);
}
Here is one of the results:
Test with route length 6:
Process for 10000 routes
[[NY, Q, Y, T, S, X], [NY, E], [NY, V, A, H, N], [NY, J, L, I], [NY, D], [NY, O], [NY, C], [NY, P, M], [NY, F], [NY, K], [NY, U], [NY, G], [NY, R], [NY, B]]
[]
Total time uncleBob solution(ms):326
Process for 10000 routes
[[NY, Q, Y, T, S, X], [NY, E], [NY, V], [NY, J, L], [NY, D], [NY, O]]
[A, B, C, F, G, H, I, K, M, N, P, R, U]
Total time devReddit solution(ms):206
With route length 10:
Process for 10000 routes
[[NY, J, V, G, A, I, B, R, U, S], [NY, L, X, Q, M, E], [NY, K], [NY, Y], [NY, F, P], [NY, N], [NY, H, D], [NY, T, O], [NY, C]]
[]
Total time uncleBob solution(ms):292
Process for 10000 routes
[[NY, J, V, G, A, I, B, R, U, S]]
[C, D, E, F, H, K, L, M, N, O, P, Q, T, X, Y]
Total time devReddit solution(ms):471
Also, the results are not the same from the same input; mine returns more valid routes.
Actually, this is not what I expected, because the solution from #devReddit looks better & faster.
Thanks
Your provided solution is O(m^2 * n^2). I've figured out a solution which has O(n^2) time complexity. Necessary comments have been added as explanation.
The core method, removeDuplicatedLocation:
private static void removeDuplicatedLocation(List<List<String>> allPaths, Set<String> orphanageLocations) {
    List<List<String>> sortedFixedPaths = allPaths // List.of produces immutable lists,
            .stream()                              // so you can't directly remove a string from the list;
            .sorted((a, b) -> Integer.compare(b.size(), a.size())) // this fixed list will be needed later
            .collect(Collectors.toList());
    List<List<String>> sortedPaths = sortedFixedPaths // the list is regenerated through a manual deep copy:
            .stream()                                 // a single string is generated from the streams of
            .map(path ->                              // each List<String> and a new list created; this one is mutable
                    new ArrayList<>(Arrays.asList(String.join(",", path).split(","))))
            .collect(Collectors.toList());
    Set<List<String>> valuesToBeRemoved = new HashSet<>();
    String source = sortedPaths.get(0).get(0);
    Map<String, List<Integer>> cityMapOfIndex = generateHashMap(sortedPaths, source); // this hashmap keeps track of the existence of cities in different lists
    removeDuplicates(cityMapOfIndex, sortedPaths); // this method removes the duplicates from the smaller paths
    cityMapOfIndex.entrySet().stream().forEach(city -> { // this block checks whether any mid element in the path is gone,
        String cityName = city.getKey();                 // adds the remaining cities to the orphan list
        int index = city.getValue().get(0);              // and removes the path from the result list
        List<String> list = sortedPaths.get(index);
        int indexInPath = list.indexOf(cityName);
        if (indexInPath != sortedFixedPaths.get(index).indexOf(cityName)) {
            orphanageLocations.add(cityName);
            sortedPaths.get(index).remove(indexInPath);
        }
    });
    valuesToBeRemoved.add(new ArrayList<>(Collections.singleton(source))); // to handle the case where only the source remains in the path
    sortedPaths.removeAll(valuesToBeRemoved);                              // after removing other duplicates
    allPaths.clear();
    allPaths.addAll(sortedPaths);
}
The removeDuplicates and generateHashMap methods used in the aforementioned stub are given below:
private static void removeDuplicates(Map<String, List<Integer>> cityMapOfIndex, List<List<String>> sortedPaths) {
    for (Map.Entry<String, List<Integer>> entry : cityMapOfIndex.entrySet()) {
        List<Integer> indexList = entry.getValue();
        while (indexList.size() > 1) {
            int index = indexList.get(indexList.size() - 1); // get the last index, i.e. the smallest list of cities where this entry exists
            sortedPaths.get(index).remove(entry.getKey());   // remove the city from the list
            indexList.remove((Integer) index);               // update the index list of occurrences
        }
        cityMapOfIndex.put(entry.getKey(), indexList);
    }
}

private static Map<String, List<Integer>> generateHashMap(List<List<String>> sortedPaths, String source) {
    Map<String, List<Integer>> cityMapOfIndex = new HashMap<>();
    for (int x = 0; x < sortedPaths.size(); x++) {
        int finalX = x;
        sortedPaths.get(x)
                .stream()
                .forEach(city -> {
                    if (!city.equalsIgnoreCase(source)) { // add entries for all except the source
                        List<Integer> indexList = cityMapOfIndex.containsKey(city) // check whether there's already an entry,
                                ? cityMapOfIndex.get(city) : new ArrayList<>();    // to avoid data loss due to overwriting
                        indexList.add(finalX);               // add the index of the list of strings
                        cityMapOfIndex.put(city, indexList); // add or update the map with the current indexList
                    }
                });
    }
    return cityMapOfIndex;
}
Please let me know if you have any questions.
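The longest-path-wins idea can also be sketched in a single longest-first pass with a HashSet of already-kept cities. This is a minimal sketch, not either answerer's exact code; the `dedup`/`cut` names are made up here, and the assumption is that processing paths longest-first and keeping each city only on its first appearance reproduces the expected outputs from the question:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class DedupPaths {
    // A city is kept only in the first (longest) path that reaches it; once a
    // path hits an already-seen city it is cut there, and cities stranded
    // behind the cut become orphans.
    static void dedup(List<List<String>> allPaths, Set<String> orphans) {
        List<List<String>> sorted = allPaths.stream()
                .sorted((a, b) -> Integer.compare(b.size(), a.size()))
                .collect(Collectors.toList());
        String origin = sorted.get(0).get(0);
        Set<String> seen = new HashSet<>();
        List<List<String>> result = new ArrayList<>();
        for (List<String> path : sorted) {
            List<String> kept = new ArrayList<>();
            kept.add(origin);
            boolean cut = false;
            for (String city : path.subList(1, path.size())) {
                if (!cut && seen.add(city)) {
                    kept.add(city);
                    orphans.remove(city); // reachable again via this path
                } else {
                    cut = true;                // first duplicate cuts the path here
                    if (!seen.contains(city)) {
                        orphans.add(city);     // stranded behind the cut
                    }
                }
            }
            if (kept.size() > 1) {
                result.add(kept); // drop paths reduced to just the origin
            }
        }
        allPaths.clear();
        allPaths.addAll(result);
    }

    public static void main(String[] args) {
        List<List<String>> allPaths = new ArrayList<>();
        allPaths.add(List.of("Newyork", "Washington", "Los Angeles", "Chicago"));
        allPaths.add(List.of("Newyork", "Washington", "Houston"));
        allPaths.add(List.of("Newyork", "Dallas"));
        allPaths.add(List.of("Newyork", "Columbus", "Chicago"));
        Set<String> orphans = new HashSet<>();
        dedup(allPaths, orphans);
        System.out.println(allPaths);
        // [[Newyork, Washington, Los Angeles, Chicago], [Newyork, Columbus], [Newyork, Dallas]]
        System.out.println(orphans); // [Houston]
    }
}
```

Each city is looked up in the HashSet in O(1), so the pass is linear in the total number of cities after the initial sort; ties between equal-length paths are broken by their original order, which may matter for some inputs.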

C# coding: I need to create several text files and populate them with unsorted lists of integers, then read each back from its file into a list

ArrayList List;
String FileName;
static void Main(string[] args)
{
List<int> Integers = new List<int>();
Console.WriteLine("Please pick desired list size");
Console.WriteLine("Use the respective number (1)Small, (2)Medium, (3)Large, or (4)XLarge");
int size = int.Parse(Console.ReadLine());
Randomgen(List);
}
static void Randomgen(int Size, ArrayList List)
{
StreamWriter SW = new StreamWriter(FileName); ;
switch (Size)
{
case 1:
Random random = new Random();
for (int i = 0; i < 101; i++)
{
List.Add(new Integers(random.Next(1, int.MaxValue)));
}
break;
case 2:
Random randomM = new Random();
for (int i = 0; i < 2001; i++)
{
List.Add(new Integers(randomM.Next(1, int.MaxValue)));
}
break;
case 3:
Random randomL = new Random();
for (int i = 0; i < 20001; i++)
{
List.Add(new Integers(randomL.Next(1, int.MaxValue)));
}
break;
case 4:
Random randomXL = new Random();
for (int i = 0; i < 200001; i++)
{
List.Add(new Integers(randomXL.Next(1, int.MaxValue)));
}
break;
}
}
static void populateListFromFile(string FileName, ArrayList List)
{
StreamReader Input = new StreamReader(FileName);
while (!Input.EndOfStream)
{
List.Add(new Integers(int.Parse(Input.ReadLine())));
}
Console.WriteLine("File has been successfully imported");
}
}
I'm trying to create 1 of 4 different text files based on the user's selection using the switch case, with an ArrayList of unsorted integers, and then I want to stream-read it back into an ArrayList using my populateListFromFile method, so that I can sort them later using 3 other sorting methods. The whole point is to measure the efficiency of the algorithms over different amounts of data (list sizes). The main method is giving me trouble, though. Most of my knowledge is self-studied, so please bear with me.
So what exactly is the problem you are facing? From what I can see, your Randomgen function accepts 2 parameters, int Size and ArrayList List. However, in your call to it in Main you are only providing 1 parameter, Randomgen(List);, therefore the whole switch case inside that function won't run.

How to arrange array of objects with same property value?

I have a person model with two properties, like this:
int id;
String name;
and some object with this data:
person0 = {1,"James"};
person1 = {2,"James"};
person2 = {3,"James"};
person3 = {4,"Barbara"};
person4 = {5,"Barbara"};
person5 = {6,"Ramses"};
and array contain objects:
firstArray = [person0, person1, person2, person3, person4, person5];
How can I then get this array:
secondArray = [
[person0, person1, person2],
[person3, person4],
[person5]
]
Thank you.
If the language does not matter:
map = new Map();
for (persona of personas) {
    name = persona.name;
    arrayForName = map.get(name);
    if (arrayForName == null) {
        arrayForName = [];
        map.put(name, arrayForName);
    }
    arrayForName.add(persona);
}
The idea is to have a map (a collection of key->value pairs) whose values are in turn arrays.
To add elements efficiently, you iterate only once through the data and add a new array each time a new key (i.e. the name) is discovered.
In Java it would be something like:
Map<String, List<Persona>> map = new HashMap<>();
for (Persona persona : personas) {
    String name = persona.getName();
    List<Persona> listForName = map.get(name);
    if (listForName == null) {
        listForName = new ArrayList<Persona>();
        map.put(name, listForName);
    }
    listForName.add(persona);
}
Try this code in Java Android:
ArrayList<ArrayList<Person>> secondArr = new ArrayList<>();
ArrayList<Person> tempArr = new ArrayList<>();
for (int i = 0; i < firstArr.size(); i++) {
    if ((i + 1) >= firstArr.size()) {
        tempArr.add(firstArr.get(i));
        secondArr.add(tempArr);
    } else {
        if (firstArr.get(i).name.equals(firstArr.get(i + 1).name)) {
            tempArr.add(firstArr.get(i));
        } else {
            tempArr.add(firstArr.get(i));
            secondArr.add(tempArr);
            tempArr = new ArrayList<>();
        }
    }
}
Finally, secondArr is prepared.
And if the list is not sorted, we can use code like this:
for (int i = 0; i < firstArr.size(); i++) {
    boolean isAdd = false;
    for (int j = 0; j < secondArr.size(); j++) {
        if (secondArr.get(j).get(0).getName().equals(firstArr.get(i).getName())) {
            secondArr.get(j).add(firstArr.get(i));
            isAdd = true;
            break;
        }
    }
    if (!isAdd) {
        ArrayList<Person> arrayList = new ArrayList<>();
        arrayList.add(firstArr.get(i));
        secondArr.add(arrayList);
    }
}

Java 8 apply list of functions on list of values

Task: we've got a list of mappers which must be applied to a list of arguments. How can we do this?
My not-really-good variant:
public static final Function<List<IntUnaryOperator>, UnaryOperator<List<Integer>>> multifunctionalMapper =
        lst -> {
            UnaryOperator<List<Integer>> uOp = new UnaryOperator<List<Integer>>() {
                @Override
                public List<Integer> apply(List<Integer> integers) {
                    final int[] curFunct = new int[1];
                    List<Integer> newLst = integers;
                    for (int i = 0; i < lst.size(); i++) {
                        curFunct[0] = i;
                        newLst = newLst.stream()
                                .map(curInt -> lst.get(curFunct[0]).applyAsInt(curInt))
                                .collect(Collectors.toList());
                    }
                    return newLst;
                }
            };
            return uOp;
        };
The list of mappers, addOneMuxTwoTransformation:
public static final UnaryOperator<List<Integer>> addOneMuxTwoTransformation =
        multifunctionalMapper.apply(Arrays.asList(x -> x + 1, x -> x * 2));
Test:
addOneMuxTwoTransformation.apply(Arrays.asList(1,2,3)).stream().forEach(System.out::println);
will print:
4
6
8
How can the multifunctionalMapper's code be reduced?
Is this what you are trying to do?
List<IntUnaryOperator> ops = Arrays.asList(a -> a + 1, a -> a * 2);
IntUnaryOperator reduce = ops.stream().reduce(a -> a, IntUnaryOperator::andThen);
IntStream.of(1, 2, 3).map(reduce).forEach(System.out::println);
The next solution is:
public static final Function<List<IntUnaryOperator>, UnaryOperator<List<Integer>>> multifunctionalMapper =
        lstFunc -> lstVals -> lstVals.stream()
                .map(curValue -> lstFunc.stream()
                        .reduce(IntUnaryOperator::andThen)
                        .orElse(x -> x)
                        .applyAsInt(curValue))
                .collect(Collectors.toList());
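Putting the reduction into a runnable form: this sketch composes the whole mapper list once, outside the per-element `map`, so the operator list is not re-reduced for every value (the `pipeline` name is made up here for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntUnaryOperator;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class ComposeMappers {
    // Compose the whole mapper list into one operator, then apply it per value.
    static UnaryOperator<List<Integer>> pipeline(List<IntUnaryOperator> mappers) {
        IntUnaryOperator composed = mappers.stream()
                .reduce(IntUnaryOperator.identity(), IntUnaryOperator::andThen);
        return values -> values.stream()
                .map(composed::applyAsInt)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        UnaryOperator<List<Integer>> addOneMulTwo =
                pipeline(Arrays.asList(x -> x + 1, x -> x * 2));
        System.out.println(addOneMulTwo.apply(Arrays.asList(1, 2, 3))); // [4, 6, 8]
    }
}
```

`andThen` applies the operators left to right, so `x -> x + 1` runs before `x -> x * 2`, matching the question's expected 4, 6, 8.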

How to add tags to a parsed tree that has no tag?

For example, take this parse tree from the Stanford Sentiment Treebank:
"(2 (2 (2 near) (2 (2 the) (2 end))) (3 (3 (2 takes) (2 (2 on) (2 (2 a) (2 (2 whole) (2 (2 other) (2 meaning)))))) (2 .)))",
where the number is the sentiment label of each node.
I want to add POS tagging information to each node, such as:
"(NP (ADJP (IN near)) (DT the) (NN end)) "
I have tried to directly parse the sentence, but the resulting tree is different from the one in the Sentiment Treebank (maybe because of the parser version or parameters; I have tried to contact the author, but there has been no response).
How can I obtain the tagging information?
I think the code in edu.stanford.nlp.sentiment.BuildBinarizedDataset should be helpful. The main() method steps through how these binary trees can be created in Java code.
Some key lines to look out for in the code:
LexicalizedParser parser = LexicalizedParser.loadModel(parserModel);
TreeBinarizer binarizer = TreeBinarizer.simpleTreeBinarizer(parser.getTLPParams().headFinder(), parser.treebankLanguagePack());
...
Tree tree = parser.apply(tokens);
Tree binarized = binarizer.transformTree(tree);
You can access the node tag information from the Tree object. You should look at the javadoc for edu.stanford.nlp.trees.Tree to see how to access this information.
Also, in this answer I have some code that shows accessing a Tree:
How to get NN and NNS from a text?
You want to look at the label() of each tree and subtree to get the tag for a node.
Here is the reference on GitHub to BuildBinarizedDataset.java:
https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/sentiment/BuildBinarizedDataset.java
Please let me know if anything is unclear about this and I can provide further assistance!
First, you need to download the Stanford Parser
Set up
private LexicalizedParser parser;
private TreeBinarizer binarizer;
private CollapseUnaryTransformer transformer;
parser = LexicalizedParser.loadModel(PCFG_PATH);
binarizer = TreeBinarizer.simpleTreeBinarizer(
parser.getTLPParams().headFinder(), parser.treebankLanguagePack());
transformer = new CollapseUnaryTransformer();
Parse
Tree tree = parser.apply(tokens);
Access POSTAG
public String[] constTreePOSTAG(Tree tree) {
Tree binarized = binarizer.transformTree(tree);
Tree collapsedUnary = transformer.transformTree(binarized);
Trees.convertToCoreLabels(collapsedUnary);
collapsedUnary.indexSpans();
List<Tree> leaves = collapsedUnary.getLeaves();
int size = collapsedUnary.size() - leaves.size();
String[] tags = new String[size];
HashMap<Integer, Integer> index = new HashMap<Integer, Integer>();
int idx = leaves.size();
int leafIdx = 0;
for (Tree leaf : leaves) {
Tree cur = leaf.parent(collapsedUnary); // go to preterminal
int curIdx = leafIdx++;
boolean done = false;
while (!done) {
Tree parent = cur.parent(collapsedUnary);
if (parent == null) {
tags[curIdx] = cur.label().toString();
break;
}
int parentIdx;
int parentNumber = parent.nodeNumber(collapsedUnary);
if (!index.containsKey(parentNumber)) {
parentIdx = idx++;
index.put(parentNumber, parentIdx);
} else {
parentIdx = index.get(parentNumber);
done = true;
}
tags[curIdx] = parent.label().toString();
cur = parent;
curIdx = parentIdx;
}
}
return tags;
}
Here is the full source code of ConstituencyParse.java:
Run it with these parameters:
java ConstituencyParse -tokpath outputtoken.toks -parentpath outputparent.txt -tagpath outputag.txt < input_sentence_in_text_file_one_sent_per_line.txt
(Note: the source code is adapted from the treelstm repo; you also need to modify preprocess-sst.py to call the ConstituencyParse.java file below.)
import edu.stanford.nlp.process.WordTokenFactory;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.Word;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.process.PTBTokenizer;
import edu.stanford.nlp.util.StringUtils;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.parser.lexparser.TreeBinarizer;
import edu.stanford.nlp.tagger.maxent.MaxentTagger;
import edu.stanford.nlp.trees.GrammaticalStructure;
import edu.stanford.nlp.trees.GrammaticalStructureFactory;
import edu.stanford.nlp.trees.PennTreebankLanguagePack;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.Trees;
import edu.stanford.nlp.trees.TreebankLanguagePack;
import edu.stanford.nlp.trees.TypedDependency;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.StringReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.HashMap;
import java.util.Properties;
import java.util.Scanner;
public class ConstituencyParse {
private boolean tokenize;
private BufferedWriter tokWriter, parentWriter, tagWriter;
private LexicalizedParser parser;
private TreeBinarizer binarizer;
private CollapseUnaryTransformer transformer;
private GrammaticalStructureFactory gsf;
private static final String PCFG_PATH = "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz";
public ConstituencyParse(String tokPath, String parentPath, String tagPath, boolean tokenize) throws IOException {
this.tokenize = tokenize;
if (tokPath != null) {
tokWriter = new BufferedWriter(new FileWriter(tokPath));
}
parentWriter = new BufferedWriter(new FileWriter(parentPath));
tagWriter = new BufferedWriter(new FileWriter(tagPath));
parser = LexicalizedParser.loadModel(PCFG_PATH);
binarizer = TreeBinarizer.simpleTreeBinarizer(
parser.getTLPParams().headFinder(), parser.treebankLanguagePack());
transformer = new CollapseUnaryTransformer();
// set up to produce dependency representations from constituency trees
TreebankLanguagePack tlp = new PennTreebankLanguagePack();
gsf = tlp.grammaticalStructureFactory();
}
public List<HasWord> sentenceToTokens(String line) {
List<HasWord> tokens = new ArrayList<>();
if (tokenize) {
PTBTokenizer<Word> tokenizer = new PTBTokenizer(new StringReader(line), new WordTokenFactory(), "");
for (Word label; tokenizer.hasNext(); ) {
tokens.add(tokenizer.next());
}
} else {
for (String word : line.split(" ")) {
tokens.add(new Word(word));
}
}
return tokens;
}
public Tree parse(List<HasWord> tokens) {
Tree tree = parser.apply(tokens);
return tree;
}
public String[] constTreePOSTAG(Tree tree) {
Tree binarized = binarizer.transformTree(tree);
Tree collapsedUnary = transformer.transformTree(binarized);
Trees.convertToCoreLabels(collapsedUnary);
collapsedUnary.indexSpans();
List<Tree> leaves = collapsedUnary.getLeaves();
int size = collapsedUnary.size() - leaves.size();
String[] tags = new String[size];
HashMap<Integer, Integer> index = new HashMap<Integer, Integer>();
int idx = leaves.size();
int leafIdx = 0;
for (Tree leaf : leaves) {
Tree cur = leaf.parent(collapsedUnary); // go to preterminal
int curIdx = leafIdx++;
boolean done = false;
while (!done) {
Tree parent = cur.parent(collapsedUnary);
if (parent == null) {
tags[curIdx] = cur.label().toString();
break;
}
int parentIdx;
int parentNumber = parent.nodeNumber(collapsedUnary);
if (!index.containsKey(parentNumber)) {
parentIdx = idx++;
index.put(parentNumber, parentIdx);
} else {
parentIdx = index.get(parentNumber);
done = true;
}
tags[curIdx] = parent.label().toString();
cur = parent;
curIdx = parentIdx;
}
}
return tags;
}
public int[] constTreeParents(Tree tree) {
Tree binarized = binarizer.transformTree(tree);
Tree collapsedUnary = transformer.transformTree(binarized);
Trees.convertToCoreLabels(collapsedUnary);
collapsedUnary.indexSpans();
List<Tree> leaves = collapsedUnary.getLeaves();
int size = collapsedUnary.size() - leaves.size();
int[] parents = new int[size];
HashMap<Integer, Integer> index = new HashMap<Integer, Integer>();
int idx = leaves.size();
int leafIdx = 0;
for (Tree leaf : leaves) {
Tree cur = leaf.parent(collapsedUnary); // go to preterminal
int curIdx = leafIdx++;
boolean done = false;
while (!done) {
Tree parent = cur.parent(collapsedUnary);
if (parent == null) {
parents[curIdx] = 0;
break;
}
int parentIdx;
int parentNumber = parent.nodeNumber(collapsedUnary);
if (!index.containsKey(parentNumber)) {
parentIdx = idx++;
index.put(parentNumber, parentIdx);
} else {
parentIdx = index.get(parentNumber);
done = true;
}
parents[curIdx] = parentIdx + 1;
cur = parent;
curIdx = parentIdx;
}
}
return parents;
}
// convert constituency parse to a dependency representation and return the
// parent pointer representation of the tree
public int[] depTreeParents(Tree tree, List<HasWord> tokens) {
GrammaticalStructure gs = gsf.newGrammaticalStructure(tree);
Collection<TypedDependency> tdl = gs.typedDependencies();
int len = tokens.size();
int[] parents = new int[len];
for (int i = 0; i < len; i++) {
// if a node has a parent of -1 at the end of parsing, then the node
// has no parent.
parents[i] = -1;
}
for (TypedDependency td : tdl) {
// let root have index 0
int child = td.dep().index();
int parent = td.gov().index();
parents[child - 1] = parent;
}
return parents;
}
public void printTokens(List<HasWord> tokens) throws IOException {
int len = tokens.size();
StringBuilder sb = new StringBuilder();
for (int i = 0; i < len - 1; i++) {
if (tokenize) {
sb.append(PTBTokenizer.ptbToken2Text(tokens.get(i).word()));
} else {
sb.append(tokens.get(i).word());
}
sb.append(' ');
}
if (tokenize) {
sb.append(PTBTokenizer.ptbToken2Text(tokens.get(len - 1).word()));
} else {
sb.append(tokens.get(len - 1).word());
}
sb.append('\n');
tokWriter.write(sb.toString());
}
public void printParents(int[] parents) throws IOException {
StringBuilder sb = new StringBuilder();
int size = parents.length;
for (int i = 0; i < size - 1; i++) {
sb.append(parents[i]);
sb.append(' ');
}
sb.append(parents[size - 1]);
sb.append('\n');
parentWriter.write(sb.toString());
}
public void printTags(String[] tags) throws IOException {
StringBuilder sb = new StringBuilder();
int size = tags.length;
for (int i = 0; i < size - 1; i++) {
sb.append(tags[i]);
sb.append(' ');
}
sb.append(tags[size - 1]);
sb.append('\n');
tagWriter.write(sb.toString().toLowerCase());
}
public void close() throws IOException {
if (tokWriter != null) tokWriter.close();
parentWriter.close();
tagWriter.close();
}
public static void main(String[] args) throws Exception {
String TAGGER_MODEL = "stanford-tagger/models/english-left3words-distsim.tagger";
Properties props = StringUtils.argsToProperties(args);
if (!props.containsKey("parentpath")) {
System.err.println(
"usage: java ConstituencyParse -deps - -tokenize - -tokpath <tokpath> -parentpath <parentpath>");
System.exit(1);
}
// whether to tokenize input sentences
boolean tokenize = false;
if (props.containsKey("tokenize")) {
tokenize = true;
}
// whether to produce dependency trees from the constituency parse
boolean deps = false;
if (props.containsKey("deps")) {
deps = true;
}
String tokPath = props.containsKey("tokpath") ? props.getProperty("tokpath") : null;
String parentPath = props.getProperty("parentpath");
String tagPath = props.getProperty("tagpath");
ConstituencyParse processor = new ConstituencyParse(tokPath, parentPath, tagPath, tokenize);
Scanner stdin = new Scanner(System.in);
int count = 0;
long start = System.currentTimeMillis();
while (stdin.hasNextLine() && count < 2) {
String line = stdin.nextLine();
List<HasWord> tokens = processor.sentenceToTokens(line);
//end tagger
Tree parse = processor.parse(tokens);
// produce parent pointer representation
int[] parents = deps ? processor.depTreeParents(parse, tokens)
: processor.constTreeParents(parse);
String[] tags = processor.constTreePOSTAG(parse);
// print
if (tokPath != null) {
processor.printTokens(tokens);
}
processor.printParents(parents);
processor.printTags(tags);
// print tag
StringBuilder sb = new StringBuilder();
int size = tags.length;
for (int i = 0; i < size - 1; i++) {
sb.append(tags[i]);
sb.append(' ');
}
sb.append(tags[size - 1]);
sb.append('\n');
count++;
if (count % 100 == 0) {
double elapsed = (System.currentTimeMillis() - start) / 1000.0;
System.err.printf("Parsed %d lines (%.2fs)\n", count, elapsed);
}
}
long totalTimeMillis = System.currentTimeMillis() - start;
System.err.printf("Done: %d lines in %.2fs (%.1fms per line)\n",
count, totalTimeMillis / 1000.0, totalTimeMillis / (double) count);
processor.close();
}
}