Parallel Access to Elements in a Groovy List - data-structures

This is a simple efficiency question about the Groovy language. I have a Customer object that has an id, and I would like to copy those IDs into another list. In my view each copy is an independent (atomic) operation, so the work could be parallelized.
For example, sequential execution:
public List<Long> extractIds(List<Customer> customerList) {
List<Long> customerIds = new ArrayList<Long>();
customerList.each { it -> customerIds.add(it.id) }
// return the collected ids; each() itself returns the original list
return customerIds
}
Question: What is the most efficient way to transfer the IDs in the above example when holding a large volume of customers?

The simplest method would be:
public List<Long> extractIds(List<Customer> customerList) {
customerList.id
}
Or, if you want to do it in a multi-threaded fashion, you can use gpars:
import static groovyx.gpars.GParsPool.withPool
public List<Long> extractIds(List<Customer> customerList) {
withPool {
customerList.collectParallel { it.id }
}
}
But you may find that the first, single-threaded method is quicker for this simple example, because the parallel version has to spin up a thread pool and synchronize the collection of results from different threads.
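For comparison, the same two approaches can be sketched in plain Java with streams. This is only an illustration, not part of the original answer: the Customer class and the IdExtraction wrapper are hypothetical stand-ins, and whether the parallel version is actually faster has to be measured on real data.

```java
import java.util.List;
import java.util.stream.Collectors;

class IdExtraction {
    // Hypothetical stand-in for the Customer class from the question.
    static class Customer {
        private final long id;
        Customer(long id) { this.id = id; }
        long getId() { return id; }
    }

    // Sequential version: a single pass with no thread-pool overhead.
    static List<Long> extractIds(List<Customer> customers) {
        return customers.stream()
                .map(Customer::getId)
                .collect(Collectors.toList());
    }

    // Parallel version: the work is split across the common fork-join pool.
    // The source list is ordered, so the result keeps the input order.
    static List<Long> extractIdsParallel(List<Customer> customers) {
        return customers.parallelStream()
                .map(Customer::getId)
                .collect(Collectors.toList());
    }
}
```

Reading a field is so cheap that the cost of splitting and merging tends to dominate, which is the same trade-off the answer describes for GPars.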

Related

SpringBatch write to different entities

I have processed a "wrapperObject" (AimResponse in this case).
Depending on the property "type" I map to Document or SourceSpace object.
Then I need to persist these entities. I found an example similar to this one:
@Override
public void write(List<? extends List<AimResponse>> list)
throws Exception {
List<SourceSpace> sourceSpaces = new ArrayList<>();
List<Document> documents = new ArrayList<>();
for(List<AimResponse> item:list) {
for(AimResponse i:item) {
if(i.getType().indexOf("folder") >= 0) {
SourceSpace sourceSpace = Mapper.aimResponseToSourceSpace(i);
sourceSpace.setStatus(Status.FOUND.name());
sourceSpaces.add(sourceSpace);
} else if(i.getType().indexOf("document") >= 0) {
Document document = Mapper.aimResponseToDocument(i);
document.setStatus(Status.FOUND.name());
documents.add(document);
}
}
}
if(!CollectionUtils.isEmpty(sourceSpaces)) {
sourceSpaceWriter.write(sourceSpaces);
}
if(!CollectionUtils.isEmpty(documents)) {
documentWriter.write(documents);
}
}
In this example I'm not able to instantiate JdbcBatchItemWriter. In any case, I think it would be better if the processor could split the items into two different lists and call two different writers, each with its own type, but I guess that's not possible.
Any help is appreciated.
ClassifierCompositeItemWriter is what you are looking for. It allows you to classify items according to given criteria and call the corresponding writer.
In your case, you can classify items based on their type (i.getType()) and use a writer for each type. You can find an example of how to use that writer here.
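The classify-then-delegate idea can be sketched in plain Java as follows. This sketch deliberately does not use Spring Batch classes: ClassifyingWriter, its string keys and the Consumer-based writers are all hypothetical names made up for illustration. In Spring Batch itself, ClassifierCompositeItemWriter is configured with a Classifier that maps each item to the ItemWriter that should handle it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical illustration of routing each item to a writer chosen by a classifier.
class ClassifyingWriter<T> {
    private final Function<T, String> classifier;
    private final Map<String, Consumer<List<T>>> writers;

    ClassifyingWriter(Function<T, String> classifier,
                      Map<String, Consumer<List<T>>> writers) {
        this.classifier = classifier;
        this.writers = writers;
    }

    void write(List<? extends T> items) {
        // Group the chunk by classification key, keeping first-seen key order.
        Map<String, List<T>> groups = new LinkedHashMap<>();
        for (T item : items) {
            groups.computeIfAbsent(classifier.apply(item), k -> new ArrayList<>()).add(item);
        }
        // Hand each group to the writer registered for its key.
        for (Map.Entry<String, List<T>> entry : groups.entrySet()) {
            Consumer<List<T>> writer = writers.get(entry.getKey());
            if (writer == null) {
                throw new IllegalStateException("No writer for key: " + entry.getKey());
            }
            writer.accept(entry.getValue());
        }
    }
}
```

With this shape, each delegate only ever sees items of its own kind, so the if/else branching from the question's write method moves into the classifier.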

Apache Mahout - Read preference value from String

I'm in a situation where I have a dataset that consists of the classical UserID, ItemID and preference values, however they are all strings.
I have managed to read the UserID and ItemID strings by overriding the readItemIDFromString() and readUserIDFromString() methods in the FileDataModel class (which is part of the Mahout library). However, unless I am mistaken, there doesn't seem to be any support for converting preference values.
If anyone has some input to what an approach to this problem could be I would greatly appreciate it.
To illustrate what I mean, here is an example of my UserID string "Conversion":
@Override
protected long readUserIDFromString(String value) {
if (memIdMigtr == null) {
memIdMigtr = new ItemMemIDMigrator();
}
long retValue = memIdMigtr.toLongID(value);
if (null == memIdMigtr.toStringID(retValue)) {
try {
memIdMigtr.singleInit(value);
} catch (TasteException e) {
e.printStackTrace();
}
}
return retValue;
}
String getUserIDAsString(long userId) {
return memIdMigtr.toStringID(userId);
}
And the implementation of the AbstractIDMigrator:
public class ItemMemIDMigrator extends AbstractIDMigrator {
private FastByIDMap<String> longToString;
public ItemMemIDMigrator() {
this.longToString = new FastByIDMap<String>(10000);
}
public void storeMapping(long longID, String stringID) {
longToString.put(longID, stringID);
}
public void singleInit(String stringID) throws TasteException {
storeMapping(toLongID(stringID), stringID);
}
public String toStringID(long longID) {
return longToString.get(longID);
}
}
Mahout is deprecating the old recommenders based on Hadoop. We have a much more modern offering based on a new algorithm called Correlated Cross-Occurrence (CCO). It is built using Spark for 10x greater speed and gives real-time query results when combined with a query server.
This method ingests strings for user-id and item-id and produces results with the same ids, so you don't need to manage those anymore. You really should have a look at the new system; I'm not sure how long the old one will be supported.
Mahout docs here: http://mahout.apache.org/users/algorithms/recommender-overview.html and here: http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html
The entire system described, with SDK, input storage, model training and real-time queries, is part of the Apache PredictionIO project. Docs for PIO and the "Universal Recommender" are here: http://predictionio.incubator.apache.org/ and here: http://actionml.com/docs/ur

Traversing and filtering a Set comparing its objects' getters to an Array using Stream

I've got some working, inelegant code here:
The custom object is:
public class Person {
private int id;
public int getId() { return this.id; }
}
And I have a class containing a Set<Person> allPersons that holds all available subjects. I want to extract a new Set<Person> based upon one or more IDs of my choosing. I've written something that works using a nested enhanced for loop, but it strikes me as inefficient because it makes a lot of unnecessary comparisons. I am getting used to working with Java 8, but can't quite figure out how to compare the Set against an array. Here is my working, but verbose code:
public class MyProgram {
private Set<Person> allPersons; // contains 100 people with Ids 1-100
public Set<Person> getPersonById(int[] ids) {
Set<Person> personSet = new HashSet<>(); // or any type of set
for (int i : ids) {
for (Person p : allPersons) {
if (p.getId() == i) {
personSet.add(p);
}
}
}
return personSet;
}
}
And to get my result, I'd call something along the lines of:
Set<Person> resultSet = getPersonById(new int[] {2, 56, 66});
//resultSet would then contain 3 people with the corresponding ID
My question is: how would I convert the getPersonById method into something that streams allPersons and keeps each person whose ID matches any one of the ints in the parameter array? I thought of some filter operation, but since the parameter is an array, I can't work out how to match against its elements.
The working answer to this is:
return allPersons.stream()
.filter(p -> (Arrays.stream(ids).anyMatch(i -> i == p.getId())) )
.collect(Collectors.toSet());
However, following the second half of @Flown's suggestion, if the program were designed to hold a Map, that would also work (and much more efficiently).
As you said, you can introduce a Stream::filter step using a Stream::anyMatch operation.
public Set<Person> getPersonById(int[] ids) {
Objects.requireNonNull(ids);
if (ids.length == 0) {
return Collections.emptySet();
}
return allPersons.stream()
.filter(p -> IntStream.of(ids).anyMatch(i -> i == p.getId()))
.collect(Collectors.toSet());
}
If the method is called often, then it would be a good idea to map each Person to its id in a Map<Integer, Person>. The advantage is that a lookup is much faster than iterating over the whole set of Person objects. Then your algorithm may look like this:
private Map<Integer, Person> idMapping;
public Set<Person> getPersonById(int[] ids) {
Objects.requireNonNull(ids);
return IntStream.of(ids)
.filter(idMapping::containsKey)
.mapToObj(idMapping::get)
.collect(Collectors.toSet());
}
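The answer does not show how idMapping gets populated. One possible way, sketched here under the assumption that ids are unique, is to build it once from allPersons with Collectors.toMap. The PersonIndex wrapper class and its constructor are hypothetical names used only for this illustration.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class PersonIndex {
    // Minimal stand-in for the Person class from the question.
    static class Person {
        private final int id;
        Person(int id) { this.id = id; }
        int getId() { return id; }
    }

    private final Map<Integer, Person> idMapping;

    PersonIndex(Set<Person> allPersons) {
        // Built once; toMap throws IllegalStateException if two persons share an id.
        this.idMapping = allPersons.stream()
                .collect(Collectors.toMap(Person::getId, Function.identity()));
    }

    Set<Person> getPersonById(int[] ids) {
        Objects.requireNonNull(ids);
        return IntStream.of(ids)
                .mapToObj(idMapping::get)   // null when the id is unknown
                .filter(Objects::nonNull)   // drop unknown ids
                .collect(Collectors.toSet());
    }
}
```

Each call then costs one hash lookup per requested id instead of a scan over every Person, at the price of keeping the map in sync whenever the set of persons changes.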

Using eager loading with specification pattern

I've implemented the specification pattern with Linq as outlined here https://www.packtpub.com/article/nhibernate-3-using-linq-specifications-data-access-layer
I now want to add the ability to eager load and am unsure about the best way to go about it.
The generic repository class in the linked example:
public IEnumerable<T> FindAll(Specification<T> specification)
{
var query = GetQuery(specification);
return Transact(() => query.ToList());
}
public T FindOne(Specification<T> specification)
{
var query = GetQuery(specification);
return Transact(() => query.SingleOrDefault());
}
private IQueryable<T> GetQuery(
Specification<T> specification)
{
return session.Query<T>()
.Where(specification.IsSatisfiedBy());
}
And the specification implementation:
public class MoviesDirectedBy : Specification<Movie>
{
private readonly string _director;
public MoviesDirectedBy(string director)
{
_director = director;
}
public override
Expression<Func<Movie, bool>> IsSatisfiedBy()
{
return m => m.Director == _director;
}
}
This is working well. I now want to add the ability to eager load. I understand that NHibernate eager loading can be done by using Fetch on the query.
What I am looking for is whether to encapsulate the eager loading logic within the specification or to pass it into the repository, and also the Linq/expression tree syntax required to achieve this (i.e. an example of how it would be done).
A possible solution would be to extend the Specification class to add:
public virtual IEnumerable<Expression<Func<T, object>>> FetchRelated
{
get
{
return Enumerable.Empty<Expression<Func<T, object>>>();
}
}
And change GetQuery to something like:
return specification.FetchRelated.Aggregate(
session.Query<T>().Where(specification.IsSatisfiedBy()),
(current, related) => current.Fetch(related));
Now all you have to do is override FetchRelated when needed
public override IEnumerable<Expression<Func<Movie, object>>> FetchRelated
{
get
{
return new Expression<Func<Movie, object>>[]
{
m => m.RelatedEntity1,
m => m.RelatedEntity2
};
}
}
An important limitation of this implementation I just wrote is that you can only fetch entities that are directly related to the root entity.
An improvement would be to support arbitrary levels (using ThenFetch), which would require some changes in the way we work with generics (I used object to allow combining different entity types easily).
You wouldn't want to put the Fetch() call into the specification, because it's not needed. Specification is just for limiting the data that can then be shared across many different parts of your code, but those other parts could have drastically different needs in what data they want to present to the user, which is why at those points you would add your Fetch statements.

IList with an implicit sort order

I'd like to create an IList<Child> that maintains its Child objects in a default/implicit sort order at all times (i.e. regardless of additions/removals to the underlying list).
What I'm specifically trying to avoid is the need for all consumers of said IList<Child> to explicitly invoke IEnumerable<T>.OrderBy() every time they want to enumerate it. Apart from violating DRY, such an approach would also break encapsulation as consumers would have to know that my list is even sorted, which is really none of their business :)
The solution that seemed most logical/efficient was to expose IList<Child> as IEnumerable<Child> (to prevent List mutations) and add explicit Add/Remove methods to the containing Parent. This way, I can intercept changes to the List that necessitate a re-sort, and apply one via Linq:
public class Child {
public string StringProperty;
public int IntProperty;
}
public class Parent{
private IList<Child> _children = new List<Child>();
public IEnumerable<Child> Children{
get
{
return _children;
}
}
private void ReSortChildren(){
_children = new List<Child>(_children.OrderBy(c=>c.StringProperty));
}
public void AddChild(Child c){
_children.Add(c);
ReSortChildren();
}
public void RemoveChild(Child c){
_children.Remove(c);
ReSortChildren();
}
}
Still, this approach doesn't intercept changes made to the underlying Child.StringProperty (which in this case is the property driving the sort). There must be a more elegant solution to such a basic problem, but I haven't been able to find one.
EDIT:
I wasn't clear that I would prefer a LINQ-compatible solution. I'd rather not resort to using .NET 2.0 constructs (i.e. SortedList).
What about using a SortedList<>?
One way you could go about it is to have Child publish an event OnStringPropertyChanged which passes along the previous value of StringProperty. Then create a derivation of SortedList that overrides the Add method to hookup a handler to that event. Whenever the event fires, remove the item from the list and re-add it with the new value of StringProperty. If you can't change Child, then I would make a proxy class that either derives from or wraps Child to implement the event.
If you don't want to do that, I would still use a SortedList, but internally manage the above sorting logic anytime the StringProperty needs to be changed. To be DRY, it's preferable to route all updates to StringProperty through a common method that correctly manages the sorting, rather than accessing the list directly from various places within the class and duplicating the sort management logic.
I would also caution against allowing the controller to pass in a reference to Child, which allows it to manipulate StringProperty after the Child has been added to the list.
public class Parent{
private SortedList<string, Child> _children = new SortedList<string, Child>();
public ReadOnlyCollection<Child> Children{
get { return new ReadOnlyCollection<Child>(_children.Values); }
}
public void AddChild(string stringProperty, int data, Salamandar sal){
_children.Add(stringProperty, new Child(stringProperty, data, sal));
}
public void RemoveChild(string stringProperty){
_children.Remove(stringProperty);
}
private void UpdateChildStringProperty(Child c, string newStringProperty) {
if (c == null) throw new ArgumentNullException("c");
// re-key the child so the SortedList stays ordered by StringProperty
RemoveChild(c.StringProperty);
c.StringProperty = newStringProperty;
_children.Add(newStringProperty, c);
}
public void CheckSalamandar(string s) {
if (_children.ContainsKey(s)) {
var c = _children[s];
if (c.Salamandar.IsActive) {
// update StringProperty through our method
UpdateChildStringProperty(c, new string(c.StringProperty.Reverse().ToArray()));
// update other properties directly
c.Number++;
}
}
}
}
I think that if you derive from KeyedCollection, you'll get what you need. That is only based on reading the documentation, though.
EDIT:
If this works, it won't be easy, unfortunately. Neither the underlying lookup dictionary nor the underlying List in this guy is sorted, nor are they exposed enough such that you'd be able to replace them. It might, however, provide a pattern for you to follow in your own implementation.
