Groovy: Optimize code to find duplicate elements - performance

I have an invoiceList as shown below, which is a List<Map<String, String>>, and I am trying to find out whether all the invoices have the same SENDER_COUNTRY and CLIENT_COUNTRY. If they do not, a message is added to a JSON array.
[
[INVOICE_DATE:20150617, INVOICE_NUMBER:617151,SENDER_COUNTRY:USA, CLIENT_COUNTRY:USA]
[INVOICE_DATE:20150617, INVOICE_NUMBER:617152,SENDER_COUNTRY:CAD, CLIENT_COUNTRY:MEX]
[INVOICE_DATE:20150617, INVOICE_NUMBER:617153,SENDER_COUNTRY:CAD, CLIENT_COUNTRY:MEX]
]
JSONArray jsonArray = new JSONArray();
def senderCountry = invoiceList[0]['SENDER_COUNTRY']
def clientCountry = invoiceList[0]['CLIENT_COUNTRY']
invoiceList.each{ it ->
if(it['SENDER_COUNTRY'] != senderCountry)
jsonArray.add((new JSONObject()).put("SENDER_COUNTRY","Multiple sender Countries Associated"));
if(it['CLIENT_COUNTRY'] != clientCountry)
jsonArray.add((new JSONObject()).put("CLIENT_COUNTRY","Multiple Client Countries Associated"));
}
I feel this code can be refactored/optimized to a better version in Groovy, can someone please help me with it?

What about this (note that my answer does not improve on performance):
def invoiceList = [
[INVOICE_DATE:20150617, INVOICE_NUMBER:617151,SENDER_COUNTRY:'USA', CLIENT_COUNTRY:'USA'],
[INVOICE_DATE:20150617, INVOICE_NUMBER:617152,SENDER_COUNTRY:'CAD', CLIENT_COUNTRY:'MEX'],
[INVOICE_DATE:20150617, INVOICE_NUMBER:617153,SENDER_COUNTRY:'CAD', CLIENT_COUNTRY:'MEX']
]
def jsonarray = []
// size() must be called as a method: on a Map, `.size` is a lookup of the key 'size' and yields null
if (invoiceList.countBy{ it.CLIENT_COUNTRY }.size() > 1)
jsonarray << [CLIENT_COUNTRY: "Multiple client countries associated"]
if (invoiceList.countBy{ it.SENDER_COUNTRY }.size() > 1)
jsonarray << [SENDER_COUNTRY: "Multiple sender countries associated"]
groovy.json.JsonOutput.toJson(jsonarray)
// Result: [{"CLIENT_COUNTRY":"Multiple client countries associated"},{"SENDER_COUNTRY":"Multiple sender countries associated"}]
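For comparison, the same check can be done by collecting the distinct values of each field in a single pass over the list. The sketch below is in Python rather than Groovy, and the names `MESSAGES` and `country_warnings` are made up for illustration; the message texts are the ones used in the answer above.

```python
# Hypothetical helper: one warning per field that holds more than one distinct value.
MESSAGES = {
    "CLIENT_COUNTRY": "Multiple client countries associated",
    "SENDER_COUNTRY": "Multiple sender countries associated",
}

def country_warnings(invoices):
    # Collect the distinct values seen for each field in a single pass.
    seen = {field: set() for field in MESSAGES}
    for invoice in invoices:
        for field in MESSAGES:
            seen[field].add(invoice[field])
    # A field with two or more distinct values gets its warning.
    return [{field: text} for field, text in MESSAGES.items() if len(seen[field]) > 1]

invoice_list = [
    {"INVOICE_NUMBER": 617151, "SENDER_COUNTRY": "USA", "CLIENT_COUNTRY": "USA"},
    {"INVOICE_NUMBER": 617152, "SENDER_COUNTRY": "CAD", "CLIENT_COUNTRY": "MEX"},
    {"INVOICE_NUMBER": 617153, "SENDER_COUNTRY": "CAD", "CLIENT_COUNTRY": "MEX"},
]
print(country_warnings(invoice_list))
```

Unlike the loop in the question, this produces at most one message per field, no matter how many invoices differ.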

Here is another version to achieve the same.
def invoiceList = [
[INVOICE_DATE:20150617, INVOICE_NUMBER:617151,SENDER_COUNTRY:'USA', CLIENT_COUNTRY:'USA'],
[INVOICE_DATE:20150617, INVOICE_NUMBER:617152,SENDER_COUNTRY:'CAD', CLIENT_COUNTRY:'MEX'],
[INVOICE_DATE:20150617, INVOICE_NUMBER:617153,SENDER_COUNTRY:'CAD', CLIENT_COUNTRY:'MEX']
]
def getFilteredList = { map ->
map.collect{ k,v -> invoiceList.countBy{ it."$k" }.findAll{it.value > 1}.collectEntries{[it.key,v] } }
}
//You may change the description in the values of below map
def findEntries = [CLIENT_COUNTRY: 'Multiple Client Countries found', SENDER_COUNTRY: 'Multiple Sender Countries found']
println groovy.json.JsonOutput.toJson(getFilteredList(findEntries))
Output:
[{"MEX":"Multiple Client Countries found"},{"CAD":"Multiple Sender Countries found"}]
You can quickly try online Demo
EDIT: The OP asked for an additional behaviour: the result should be empty when all client countries or all sender countries are the same.
Use below script:
def invoiceList = [
[INVOICE_DATE:20150617, INVOICE_NUMBER:617151,SENDER_COUNTRY:'CAD', CLIENT_COUNTRY:'USA'],
[INVOICE_DATE:20150617, INVOICE_NUMBER:617152,SENDER_COUNTRY:'CAD', CLIENT_COUNTRY:'MEX'],
[INVOICE_DATE:20150617, INVOICE_NUMBER:617153,SENDER_COUNTRY:'CAD', CLIENT_COUNTRY:'MEX']
]
def getFilteredList = { map->
map.collect{ k,v -> invoiceList.countBy{ it."$k" }.findAll{it.value > 1 && (it.value != invoiceList.size())}.collectEntries{ [it.key,v] } }.findAll{it.size()>0}
}
//You may change the description in the values of the map below
def findEntries = [CLIENT_COUNTRY: 'Multiple Client Countries found', SENDER_COUNTRY: 'Multiple Sender Countries found']
println groovy.json.JsonOutput.toJson(getFilteredList(findEntries))
Quickly try online Demo
EDIT2: The OP further requested that the output be changed to:
[ {"message": "Multiple clients, Multiple sender"}]
def invoiceList = [
[INVOICE_DATE:20150617, INVOICE_NUMBER:617151,SENDER_COUNTRY:'CAD1', CLIENT_COUNTRY:'USA'],
[INVOICE_DATE:20150617, INVOICE_NUMBER:617152,SENDER_COUNTRY:'CAD', CLIENT_COUNTRY:'MEX'],
[INVOICE_DATE:20150617, INVOICE_NUMBER:617153,SENDER_COUNTRY:'CAD', CLIENT_COUNTRY:'MEX']
]
def getFilteredList = { map->
def result = map.collect{ k,v -> invoiceList.countBy{ it."$k" }.findAll{it.value > 1 && (it.value != invoiceList.size())}.collect{ v } }.findAll{it.size()>0}
result ? [[message : result.flatten().join(',') ]] : []
}
//You may change the description in the values of the map below
def findEntries = [CLIENT_COUNTRY: 'Multiple Client Countries found', SENDER_COUNTRY: 'Multiple Sender Countries found']
println groovy.json.JsonOutput.toJson(getFilteredList(findEntries))
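The combined-message shape can also be sketched in Python. Note that this is not a port of the Groovy closure: it flags a field as soon as it holds more than one distinct value, whereas the closure above looks for values that repeat without covering the whole list. The name `combined_message` is hypothetical.

```python
def combined_message(invoices, descriptions):
    # Keep the description of every field that holds more than one distinct value.
    found = [
        text
        for field, text in descriptions.items()
        if len({invoice[field] for invoice in invoices}) > 1
    ]
    # One object with a comma-joined message, or an empty list when nothing was flagged.
    return [{"message": ",".join(found)}] if found else []

descriptions = {
    "CLIENT_COUNTRY": "Multiple Client Countries found",
    "SENDER_COUNTRY": "Multiple Sender Countries found",
}
invoices = [
    {"SENDER_COUNTRY": "CAD1", "CLIENT_COUNTRY": "USA"},
    {"SENDER_COUNTRY": "CAD", "CLIENT_COUNTRY": "MEX"},
    {"SENDER_COUNTRY": "CAD", "CLIENT_COUNTRY": "MEX"},
]
print(combined_message(invoices, descriptions))
```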

If you can add a new library to your project, you could use GPars:
@Grab(group='org.codehaus.gpars', module='gpars', version='1.0.0')
import static groovyx.gpars.GParsPool.withPool
def invoiceList = [
[INVOICE_DATE:20150617, INVOICE_NUMBER:617151,SENDER_COUNTRY:'USA', CLIENT_COUNTRY:'USA'],
[INVOICE_DATE:20150617, INVOICE_NUMBER:617152,SENDER_COUNTRY:'CAD', CLIENT_COUNTRY:'MEX'],
[INVOICE_DATE:20150617, INVOICE_NUMBER:617153,SENDER_COUNTRY:'CAD', CLIENT_COUNTRY:'MEX']
]
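// several worker threads add to this list, so it is wrapped to make add() thread-safe
def jsonArray = Collections.synchronizedList([])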
def senderCountry = invoiceList[0]['SENDER_COUNTRY']
def clientCountry = invoiceList[0]['CLIENT_COUNTRY']
withPool( 4 ) {
invoiceList.eachParallel{
if(it['SENDER_COUNTRY'] != senderCountry)
jsonArray.add((new JSONObject()).put("SENDER_COUNTRY","Multiple sender Countries Associated"));
if(it['CLIENT_COUNTRY'] != clientCountry)
jsonArray.add((new JSONObject()).put("CLIENT_COUNTRY","Multiple Client Countries Associated"))
}
}
This creates a thread pool with 4 workers that scan invoiceList in parallel.

Related

How to convert dozens of fields to Nifi Attributes?

I have JSON with dozens of fields. How can I easily convert them to NiFi attributes?
I used EvaluateJsonPath, but it requires entering the values one by one.
I will use these attributes in Phoenix DB. When I use ConvertJSONToSQL, it doesn't work.
Can you help with this issue?
Sample JoltTransformJSON content is as follows:
{
"AAAA": "AAAA",
"BBBB": "BBBB",
"CCCC": "CCCC",
"DDDD": "DDDD",
"EEEE": "EEEE",
"FFFF": "FFFF",
"GGGG": "GGGG",
"HHHH": "HHHH",
...
...
...
}
I want to map the JSON fields to NiFi attributes without entering them one by one in EvaluateJsonPath.
Edit: I found this script for ExecuteGroovyScript, which handles it.
import org.apache.commons.io.IOUtils
import org.apache.nifi.processor.io.InputStreamCallback
import java.nio.charset.*
def flowFile = session.get();
if (flowFile == null) {
return;
}
def slurper = new groovy.json.JsonSlurper()
def attrs = [:] as Map<String,String>
session.read(flowFile,
{ inputStream ->
def text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
def obj = slurper.parseText(text)
obj.each {k,v ->
attrs[k] = v.toString()
}
} as InputStreamCallback)
flowFile = session.putAllAttributes(flowFile, attrs)
session.transfer(flowFile, REL_SUCCESS)
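Stripped of the NiFi session calls, the script does two things: it parses the flow file content as JSON and stores every top-level field as a string attribute. A minimal sketch of that core step in plain Python follows; `json_to_attributes` and the `count` field are made up for illustration, and nothing here is NiFi API.

```python
import json

def json_to_attributes(text):
    # Parse the content and turn every top-level field into a string attribute.
    obj = json.loads(text)
    return {key: str(value) for key, value in obj.items()}

content = '{"AAAA": "AAAA", "BBBB": "BBBB", "count": 3}'
print(json_to_attributes(content))
```

Only top-level keys are handled; a nested object or array ends up as the string form of the whole value, which is also what `v.toString()` does in the Groovy script.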

DRF: problem with writing nested serializer with unique field

I need to add some tags to a post when creating it. They have a many to many relationship, and tag has a unique name field. But I get an already exists error.
Here is my setup:
class Tag(models.Model):
    name = models.SlugField(max_length=100, unique=True)
class Post(models.Model):
    (...)
    tags = models.ManyToManyField(Tag, related_name='posts')
class PostSerializer(serializers.HyperlinkedModelSerializer):
    tags = TagSerializer(many=True)
    def create(self, validated_data):
        tags_data = validated_data.pop('tags')
        post = Post.objects.create(**validated_data)
        for tag_data in tags_data:
            try:
                tag = Tag.objects.get(name=tag_data['name'])
            except Tag.DoesNotExist:
                tag = Tag.objects.create(**tag_data)
            post.tags.add(tag)
        return post
    class Meta:
        model = Post
        (...)
Now when I post the following data to create a Post:
{
(...),
"tags": [{"name": "someExistentTag"}, {"name": "someTag"}]
}
serializer.is_valid is called prior to create and I get the following response:
{
"tags": [
{
"name": [
"tag with this name already exists."
]
},
{}
]
}
What is your solution?
Here is the first thing I got working; get tags away from post and validate them manually (which I'm not sure is done right). Yet I'd like to see a better solution.
class PostSerializer(HyperlinkedModelSerializer):
    tags = TagSerializer(many=True)
    def __init__(self, instance=None, data=empty, **kwargs):
        super().__init__(instance, data, **kwargs)
        if hasattr(self, 'initial_data'):
            self.tags = self.initial_data.get('tags', [])
            if 'tags' in self.initial_data:
                self.initial_data['tags'] = []
    def create(self, validated_data):
        tags_data = self.tags
        existing_tags = []
        new_tags_data = []
        for tag_data in tags_data:
            try:
                tag = Tag.objects.get(name=tag_data['name'])
            except KeyError:
                raise ValidationError("Field 'name' for tag is required.")
            except Tag.DoesNotExist:
                new_tags_data.append(tag_data)
            else:
                existing_tags.append(tag)
        new_tags_serializer = TagSerializer(data=new_tags_data, many=True)
        new_tags_serializer.is_valid(raise_exception=True)
        validated_data.pop('tags')
        post = Post.objects.create(**validated_data)
        for tag_data in new_tags_data:
            tag = Tag.objects.create(**tag_data)
            post.tags.add(tag)
        for tag in existing_tags:
            post.tags.add(tag)
        return post
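The heart of this workaround is splitting the submitted tags into those that already exist and those that still have to be created, so that only the new ones go through validation. A framework-free sketch of that split is shown here; the function `split_tags` and the `stored` set are hypothetical stand-ins, with the set playing the role of the database lookup.

```python
def split_tags(tags_data, existing_names):
    # Separate submitted tags into ones already stored and ones that still need creating.
    existing, new = [], []
    for tag_data in tags_data:
        if "name" not in tag_data:
            raise ValueError("Field 'name' for tag is required.")
        if tag_data["name"] in existing_names:
            existing.append(tag_data)
        else:
            new.append(tag_data)
    return existing, new

stored = {"someExistentTag"}  # stand-in for the names already in the Tag table
payload = [{"name": "someExistentTag"}, {"name": "someTag"}]
existing, new = split_tags(payload, stored)
print(existing, new)
```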

Nested serializer doesn't pick up correct ID

There are two models, they are defined this way:
class ShoppingList(models.Model):
    id = models.CharField(max_length=40, primary_key=True)
    name = models.CharField(max_length=40)
    session_id = models.CharField(max_length=40)
    config_file = models.FileField(upload_to=upload_config_file)
    def __str__(self):
        return self.id
class FetchedData(models.Model):
    model_id = models.CharField(max_length=40)
    config_id = models.ForeignKey(BillOfMaterial, on_delete=models.CASCADE, default=0)
    config_name = models.CharField(max_length=40)
    def __str__(self):
        return self.model_id
And serialized like this:
class FetchedDataSerializer(serializers.ModelSerializer):
    file_fields = serializers.SerializerMethodField()
    class Meta:
        model = FetchedData
        fields = ('model_id', 'config_id', 'config_name', 'file_fields')
    def get_file_fields(self, obj):
        print(obj)
        # queryset = ShoppingList.objects.filter(config_file = obj) ## (1)
        queryset = BillOfMaterial.objects.all() ## (2)
        return [ShoppingListSerializer(cf).data for cf in queryset]
I was advised* to implement the solution marked as (1) in the serializer above, but when it's on, I get responses with an empty array, for example:
[
{
"model_id": "6553",
"config_id": "2322",
"config_name": "Config No. 1",
"file_fields": []
}
]
Meanwhile, with option (2) turned on and option (1) commented out, I get all the instances displayed:
[
{
"model_id": "6553",
"config_id": "2322",
"config_name": "Config No. 1",
"file_fields": [
{
"id": "2322",
"name": "First Example",
"session_id": "9883",
"config_file": "/uploads/2322/eq-example_7DQDsJ4.json"
},
{
"id": "4544",
"name": "Another Example",
"session_id": "4376",
"config_file": "/uploads/4544/d-jay12.json"
}
]
}
]
The print(obj) call always prints a model_id value, whereas I guess it should output file_fields.id.
How should I re-build this piece of code to be able to display only the file_field with id matching config_id of the parent?
*This is a follow-up of an issue described here: TypeError: 'FieldFile' object is not callable
In FetchedData model I added this method:
def config_link(self):
    return self.config_id.config_file
(it binds config_file from ShoppingList model).
FetchedDataSerializer should then look like this:
class FetchedDataSerializer(serializers.ModelSerializer):
    file_link = serializers.SerializerMethodField()
    class Meta:
        model = FetchedData
        fields = ('model_id', 'config_id', 'config_name', 'file_link')
    def get_file_link(self, obj):
        return obj.config_link()
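The matching rule the question asks for, keeping only the entry whose id equals the parent's config_id, can be illustrated without Django. In this sketch plain dictionaries stand in for the model instances and `files_for` is a hypothetical name, so it shows the rule only, not the Django query.

```python
def files_for(fetched, shopping_lists):
    # Keep only the shopping lists whose id equals the parent's config_id.
    return [item for item in shopping_lists if item["id"] == fetched["config_id"]]

fetched = {"model_id": "6553", "config_id": "2322", "config_name": "Config No. 1"}
shopping_lists = [
    {"id": "2322", "name": "First Example", "session_id": "9883"},
    {"id": "4544", "name": "Another Example", "session_id": "4376"},
]
print(files_for(fetched, shopping_lists))
```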

Identify new records in a merged collection of objects in custom validation

I am using the function below to merge two collections, one from the DB and the other from user input. During validation I want to verify which records come from the DB and which are local.
def tags=(arg)
if arg.is_a?(Array)
@ids = arg.map{|x| x[:id]}
tags << Tags.where(id: @ids )
else
super
end
end
I tried the following inside a custom validation function but it is returning false for all records even records entered by user:
def tags_validator
tags.each do |item|
errors.add(:new_record , item.new_record?)
end
end
My JSON is as follows (I am just passing IDs for tags):
{
"data": {
"name": "Johnny English",
"tags" :
[ {"id" :3 } , {"id" :4 } ]
}
}

How to take that data from api and write a function

Okay, it's been many days. I have been going back and forth with this api, and would like to get the following results.
Here is the problem.
uri = URI('https://api.wmata.com/StationPrediction.svc/json/GetPrediction/All')
uri.query = URI.encode_www_form({'api_key' => 'ihaveit',})
request = Net::HTTP::Get.new(uri.request_uri)
@response = Net::HTTP.start(uri.host, uri.port, :use_ssl => uri.scheme == 'https') do |http|
http.request(request)
end
@data = JSON.parse @response.body
Now I have @data parsed as JSON. I am trying to write a class Trains and use the following data.
{
"Car": "6",
"Destination": "SilvrSpg",
"DestinationCode": "B08",
"DestinationName": "Silver Spring",
"Group": "1",
"Line": "RD",
"LocationCode": "A01",
"LocationName": "Metro Center",
"Min": "3"
},
Here is the code for class Trains
class Trains
def initialize(car, destination, destinationcode, destinationname, group, line, locationcode, locationname)
@car = car
@destination = destination
@destinationcode = destinationcode
@destinationname = destinationname
@group = group
@line = locationcode
@locationname = locationname
end
And now I'm stuck on the next step. I am totally new to APIs. I can write the following for a static class.
def to_s
puts(@car + @destination + @destinationcode + @destinationname + @group + @line + @locationname)
end
= Trains.new
puts Train
end
I've got this so far.
class TrainLoader < Struct.new(:car, :destination, :destinationcode, :destinationname, :group, :line, :locationcode, :locationname)
class Trains
end
t = Trains.new(@data["Car"],@data["DestinationCode"], @data[DestinationName],@data[Group],@data[Line], @data[LocationCode], @data[LocationName], @data[Min])
You already have a class, so just write a to_s function:
class Trains
def initialize(car, destination, destinationcode, destinationname, group, line, locationcode, locationname)
@car = car
@destination = destination
@destinationcode = destinationcode
@destinationname = destinationname
@group = group
@line = line
@locationcode = locationcode
@locationname = locationname
end
# inspect attributes with own `to_s` method
def to_s
"#{@car} #{@destination} #{@destinationcode} #{@destinationname}"
end
end
Now create an instance of your class:
>> train = Trains.new(@data["Car"], @data["Destination"], @data["DestinationCode"])..
>> train.to_s
But I can suggest a more elegant solution, using Struct:
>> data = {
?> "Car": "6",
?> "Destination": "SilvrSpg",
?> "DestinationCode": "B08",
?> "DestinationName": "Silver Spring",
?> "Group": "1",
?> "Line": "RD",
?> "LocationCode": "A01",
?> "LocationName": "Metro Center",
?> "Min": "3"
>> }
>> class Trains < Struct.new(*data.keys)
?> # create a struct whose attributes are the keys of the data hash, e.g. `Car` or `Min`
>> end
>> t = Trains.new(*data.values)
# create instance of Trains class with values from a data hash `"6"`
=> #<struct Trains Car="6", Destination="SilvrSpg", DestinationCode="B08", DestinationName="Silver Spring", Group="1", Line="RD", LocationCode="A01", LocationName="Metro Center", Min="3">
>> t.Car
=> "6"
>>
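The same idea, building the class from the keys of the parsed hash instead of listing every attribute by hand, carries over to other languages. As a rough Python analogue (not part of the Ruby answer), `collections.namedtuple` can be created from the dictionary keys; the name `Train` is chosen here for illustration.

```python
from collections import namedtuple

data = {
    "Car": "6",
    "Destination": "SilvrSpg",
    "DestinationCode": "B08",
    "DestinationName": "Silver Spring",
    "Group": "1",
    "Line": "RD",
    "LocationCode": "A01",
    "LocationName": "Metro Center",
    "Min": "3",
}

# Field names come straight from the keys, so they must be valid identifiers.
Train = namedtuple("Train", list(data))
t = Train(**data)
print(t.Car, t.DestinationName)
```

This only works while every key is a valid identifier; a key containing a space or a dash would need renaming first.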
