How to convert dozens of fields to NiFi attributes? - apache-nifi

I have JSON with dozens of fields. How can I easily convert them to NiFi attributes?
I used EvaluateJsonPath, but it requires entering the values one by one.
I will use these attributes in Phoenix DB; when I use ConvertJSONToSQL it doesn't work...
Can you help with this issue?
JoltTransformJSON sample content is as follows:
{
    "AAAA": "AAAA",
    "BBBB": "BBBB",
    "CCCC": "CCCC",
    "DDDD": "DDDD",
    "EEEE": "EEEE",
    "FFFF": "FFFF",
    "GGGG": "GGGG",
    "HHHH": "HHHH",
    ...
}
I want to map the JSON fields to NiFi attributes without entering them one by one in EvaluateJsonPath.

Edit: I found this script for ExecuteGroovyScript, and it handles it:
import org.apache.commons.io.IOUtils
import java.nio.charset.*

def flowFile = session.get()
if (flowFile == null) {
    return
}

def slurper = new groovy.json.JsonSlurper()
def attrs = [:] as Map<String,String>

// Read the flowfile content and collect every top-level field as a string
session.read(flowFile, { inputStream ->
    def text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
    def obj = slurper.parseText(text)
    obj.each { k, v ->
        attrs[k] = v.toString()
    }
} as InputStreamCallback)

// Copy the collected map onto the flowfile as attributes
flowFile = session.putAllAttributes(flowFile, attrs)
session.transfer(flowFile, REL_SUCCESS)
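For the sample content above, this puts one attribute per top-level field on the flowfile (AAAA=AAAA, BBBB=BBBB, and so on). Note that nested objects or arrays would simply be stringified by toString(), so the script works best on flat JSON like this sample.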

Related

DRF: problem with writing nested serializer with unique field

I need to add some tags to a post when creating it. They have a many-to-many relationship, and Tag has a unique name field, but I get an "already exists" error.
Here is my setup:
class Tag(models.Model):
    name = models.SlugField(max_length=100, unique=True)

class Post(models.Model):
    (...)
    tags = models.ManyToManyField(Tag, related_name='posts')

class PostSerializer(serializers.HyperlinkedModelSerializer):
    tags = TagSerializer(many=True)

    def create(self, validated_data):
        tags_data = validated_data.pop('tags')
        post = Post.objects.create(**validated_data)
        for tag_data in tags_data:
            try:
                tag = Tag.objects.get(name=tag_data['name'])
            except Tag.DoesNotExist:
                tag = Tag.objects.create(**tag_data)
            post.tags.add(tag)
        return post

    class Meta:
        model = Post
        (...)
Now when I post the following data to create a Post:
{
    (...),
    "tags": [{"name": "someExistentTag"}, {"name": "someTag"}]
}
serializer.is_valid is called prior to create and I get the following response:
{
    "tags": [
        {
            "name": [
                "tag with this name already exists."
            ]
        },
        {}
    ]
}
What is your solution?
Here is the first thing I got working: take the tags out of the post data and validate them manually (which I'm not sure is done right). I'd still like to see a better solution.
class PostSerializer(HyperlinkedModelSerializer):
    tags = TagSerializer(many=True)

    def __init__(self, instance=None, data=empty, **kwargs):
        super().__init__(instance, data, **kwargs)
        if hasattr(self, 'initial_data'):
            self.tags = self.initial_data.get('tags', [])
            if 'tags' in self.initial_data:
                self.initial_data['tags'] = []

    def create(self, validated_data):
        tags_data = self.tags
        existing_tags = []
        new_tags_data = []
        for tag_data in tags_data:
            try:
                tag = Tag.objects.get(name=tag_data['name'])
            except KeyError:
                raise ValidationError("Field 'name' for tag is required.")
            except Tag.DoesNotExist:
                new_tags_data.append(tag_data)
            else:
                existing_tags.append(tag)
        new_tags_serializer = TagSerializer(data=new_tags_data, many=True)
        new_tags_serializer.is_valid(raise_exception=True)
        validated_data.pop('tags')
        post = Post.objects.create(**validated_data)
        for tag_data in new_tags_data:
            tag = Tag.objects.create(**tag_data)
            post.tags.add(tag)
        for tag in existing_tags:
            post.tags.add(tag)
        return post
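A common alternative is to stop the nested serializer from enforcing uniqueness at validation time and resolve the tags with get_or_create instead. A minimal sketch, assuming the "already exists" error comes from the UniqueValidator that DRF auto-adds for the unique name field:

# Sketch: declare the nested TagSerializer without the auto-added
# UniqueValidator, so is_valid() no longer rejects existing tag names.
class TagSerializer(serializers.ModelSerializer):
    class Meta:
        model = Tag
        fields = ('name',)
        extra_kwargs = {'name': {'validators': []}}  # drop UniqueValidator

class PostSerializer(serializers.HyperlinkedModelSerializer):
    tags = TagSerializer(many=True)

    def create(self, validated_data):
        tags_data = validated_data.pop('tags')
        post = Post.objects.create(**validated_data)
        for tag_data in tags_data:
            # get_or_create reuses an existing tag or makes a new one
            tag, _ = Tag.objects.get_or_create(name=tag_data['name'])
            post.tags.add(tag)
        return post

    class Meta:
        model = Post
        (...)

The database-level unique constraint still protects against duplicates; you are only removing the serializer-level check that made existing names fail validation.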

How to convert Unix time to a timestamp with NiFi?

I have an HTTP response as a list of JSON records [{},{},{}]. Within those records we have some dates in Unix epoch-millisecond format ("1651030211980"), but we need them in timestamp format.
[
    {
        "username_user": "json",
        "surname": "file",
        "creationDate": "1651030211980",
        "modificationDate": "1651030211980"
    },
    {},
    {}
]
The result needs to look like this:
[
    {
        "username": "json",
        "surname": "file",
        "creationDate": "YYYY-MM-DD HH:MM:SS",
        "modificationDate": "YYYY-MM-DD HH:MM:SS"
    },
    {},
    {}
]
You could use ConvertRecord (JsonTreeReader, JsonRecordSetWriter), where you can redefine the fields.
Here is a decent example converting JSON to CSV, but the idea is the same: https://rihab-feki.medium.com/converting-json-to-csv-with-apache-nifi-a9899ca3f24b
As an alternative, you could use ExecuteGroovyScript:
import groovy.json.*

def ff = session.get()
if (!ff) return

// Convert an epoch-millisecond value to a formatted timestamp string
def formatDate = { s -> new Date(s as Long).format('yyyy-MM-dd HH:mm:ss') }

// Rewrite the flowfile content in place
ff.write { rawIn, rawOut ->
    def data = new JsonSlurper().parse(rawIn)
    data.each { obj ->
        if (obj.creationDate) obj.creationDate = formatDate(obj.creationDate)
        if (obj.modificationDate) obj.modificationDate = formatDate(obj.modificationDate)
    }
    rawOut.withWriter { w -> w << new JsonBuilder(data).toPrettyString() }
}
REL_SUCCESS << ff
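Note the script assumes the date fields hold epoch milliseconds; since they arrive as JSON strings, the "s as Long" coercion in formatDate converts them before constructing the Date, and the format string can be adjusted to whatever timestamp layout you need.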

Return any data from a query using GraphQL, Graphene and Python

I am receiving the following error:
{
    "errors": [
        {
            "message": "Unknown argument \"project_id\" on field \"get_project_detail_summary\" of type \"Query\".",
            "locations": [
                {
                    "line": 2,
                    "column": 30
                }
            ]
        }
    ]
}
With the following query:
query GetProjectDetailSummary($project_id: Int) {
    get_project_detail_summary(project_id: $project_id) {
        comments {
            ... on ManagerCommentNode {
                id
                text
                created
            }
            ... on VendorCommentNode {
                id
                text
                created
            }
            ... on TenantCommentNode {
                id
                text
                created
            }
        }
    }
}
With the following backend code, how can I get to the breakpoint? Or how do I send back custom data given a number?
class CommentsUnion(graphene.types.union.Union):
    class Meta:
        name = 'CommentsUnion'
        types = (ManagerCommentNode, VendorCommentNode, TenantCommentNode, )

class ProjectSummaryInput(graphene.InputObjectType):
    project_id = graphene.Int()

class ProjectSummaryNode(graphene.ObjectType):
    Input = ProjectSummaryInput
    project_id = graphene.Int()
    comments = graphene.List(CommentsUnion)

    @classmethod
    def resolve_comments(self, *args, **kwargs):
        import pdb; pdb.set_trace()
        return ProjectSummary.select_related('comments').objects.filter(comments__created__lt=dt)

class Query(graphene.ObjectType):
    get_project_detail_summary = Field(ProjectSummaryNode)
Regarding the error: be sure to add a kwarg (e.g. project_id in this example; its absence is the reason for the "unknown argument on field" error) to the graphene.Field for get_project_detail_summary, like so:
class Query(graphene.ObjectType):
    get_project_detail_summary = Field(ProjectSummaryNode,  # see below for example
        project_id=graphene.Int()  # kwarg here
    )

    def resolve_get_project_detail_summary(self, info, **kwargs):
        return ProjectSummary.objects.get(id=kwargs.get('project_id'))
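With project_id declared on the Field, the $project_id variable in the original query now validates, and its value arrives in resolve_get_project_detail_summary via kwargs.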
Regarding returning arbitrary data: one way is to return a graphene.String, but this untypes the response by putting everything inside a single string:
import json

from django.core.serializers.json import DjangoJSONEncoder

class ProjectSummaryRecentUpdatesNode(graphene.ObjectType):
    Input = ProjectSummaryInput
    recent_updates = graphene.String()

    def resolve_recent_updates(self, resolve, **kwargs):
        instance = Project.objects.get(id=resolve.variable_values.get("project_id"))
        things = instance.things.all()
        # these are all from different models, and the list is a bit longer than this
        querysets = (
            ("scheduled", get_scheduled(things, resolve, **kwargs)),
            ("completed", get_completed(things, resolve, **kwargs)),
            ("invoices", get_invoices(things, resolve, **kwargs)),
            ("expenditures", get_expenditures(things, resolve, **kwargs)),
            ("comments", get_comments(things, resolve, **kwargs)),
            ("files", get_files(things, resolve, **kwargs)),
        )
        recent_updates = []
        for update_type, qs in querysets:
            for item in qs:
                item.update({"recent_update_type": update_type})
                recent_updates.append(item)
        return json.dumps(recent_updates, cls=DjangoJSONEncoder)
And on the frontend, using import { Query } from "react-apollo", we can JSON.parse the field, which holds the "any" (JSON-serialized) data:
<Query
  query={GET_PROJECT_DETAIL_SUMMARY_RECENT_UPDATES}
  fetchPolicy="network-only"
  variables={{ project_id: this.props.projectId }}
>
  {({ loading, error, data }: QueryResult) => {
    if (
      data &&
      data.get_project_detail_summary_recent_updates &&
      data.get_project_detail_summary_recent_updates.recent_updates
    ) {
      console.log(JSON.parse(data.get_project_detail_summary_recent_updates.recent_updates));
    }
    return null; // render the parsed updates here instead of just logging them
  }}
</Query>
Side note: if there isn't a large list of data types, create a Union-like object that has all of the fields needed from the different models, or an actual Union, which seems like the correct way, since the types are then not lost:
class ProjectSummaryNode(graphene.ObjectType):
    # from FileNode graphene.ObjectType class
    file__id = graphene.Int()
    file__brief_description = graphene.String()
    file_count = graphene.Int()
    last_file_created = graphene.String()
    last_file = graphene.String()

    # from UploaderNode graphene.ObjectType class
    uploader__first_name = graphene.String()
    uploader__last_name = graphene.String()

    # from CommentNode graphene.ObjectType class
    comment = graphene.String()

    class Meta:
        name = "ProjectSummaryNode"
        # example of Union: the above fields would be commented out, as they are part of
        # the different graphene.ObjectType classes, and this line below would be uncommented
        # types = ( FileNode, UploaderNode, CommentNode, )
Still open to suggestions on a better way to do this.

Logstash escape JSON Keys

I have multiple systems that send data as a JSON request body. This is my simple config file:
input {
  http {
    port => 5001
  }
}
output {
  elasticsearch {
    hosts => "elasticsearch:9200"
  }
}
In most cases this works just fine, and I can look at the JSON data in Kibana.
In some cases the JSON will not be processed. It has something to do with JSON escaping: for example, if a key contains a '.', the JSON will not be processed.
I cannot control the JSON. Is there a way to escape these characters in a JSON key?
Update: As mentioned in the comments, I'll give an example JSON string (the content is altered, but I've tested this string and it behaves the same as the original):
{
    "http://example.com": {
        "a": "",
        "b": ""
    }
}
My research finally brings me back to my own post.
Before Elasticsearch 2.0, dots in keys were allowed; since version 2.0 this is no longer the case.
One user in the Logstash forum developed a Ruby script that takes care of the dots in JSON keys:
filter {
  ruby {
    init => "
      def remove_dots hash
        new = Hash.new
        hash.each { |k,v|
          if v.is_a? Hash
            v = remove_dots(v)
          end
          new[ k.gsub('.','_') ] = v
          if v.is_a? Array
            v.each { |elem|
              if elem.is_a? Hash
                elem = remove_dots(elem)
              end
              new[ k.gsub('.','_') ] = elem
            } unless v.nil?
          end
        } unless hash.nil?
        return new
      end
    "
    code => "
      event.instance_variable_set(:@data, remove_dots(event.to_hash))
    "
  }
}
All credits go to @hanzmeier1234 (Field name cannot contain '.').
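As an aside, there is also a dedicated de_dot filter plugin that performs the same key renaming; a minimal sketch, assuming the plugin is installed (it is not bundled with every Logstash distribution):

filter {
  de_dot {
    nested => true  # also rename keys inside nested objects
  }
}

By default it replaces each '.' with '_', which matches what the Ruby script above does.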

How to add new key/value pair to existing JSON object in Ruby

How could I append a new key/value pair to an existing JSON object in Ruby?
My output is:
{
    "2d967df3-ee07-4e40-8f65-7bbff59bbb7e": {
        "name": "Book1",
        "author": "Author1"
    }
}
I want to achieve something like this when I add a new key/value pair:
{
    "2d967df3-ee07-4e40-8f65-7bbff59bbb7e": {
        "name": "Book1",
        "author": "Author1"
    },
    "c55a3632-9bed-4a41-ae40-c1abfe0f332a": {
        "name": "Book2",
        "author": "Author2"
    }
}
This is my method to write to a JSON file:
def create_book(name, author)
  tempHash = {
    SecureRandom.uuid => {
      "name" => name,
      "author" => author
    }
  }
  File.open("./books/book.json", "w") do |f|
    f.write(JSON.pretty_generate(tempHash))
  end
end
To clarify, I need to add a second entry to the original file. I tried using append (<<), and that's where my code fails:
file = File.read("./books/book.json")
data_hash = JSON.parse(file)
newJson = data_hash << tempHash
If you want to add it to an existing file, then you should read the JSON first, extract the data from it, and then add the new hash to an array.
Maybe something like this will solve your problem:
def create_book(name, author)
  tempHash = {
    SecureRandom.uuid => {
      "name" => name,
      "author" => author
    }
  }
  data_from_json = JSON[File.read("./books/book.json")]
  data_from_json = [data_from_json] if data_from_json.class != Array
  File.open("./books/book.json", "w") do |f|
    f.write(JSON.pretty_generate(data_from_json << tempHash))
  end
end
There are also other ways, like manipulating the JSON as a plain string, but for safety you should extract the data and then generate a new JSON file.
If you need the new key/value pair to be in the same JSON element as the previous data, then instead of shoveling (<<) the hashes together, merge them. This also lets you put the new key/value pair at the start or the end of the element, by flipping which hash you merge first.
So, take Maxim's solution from Apr 14 '15, but modify it to merge the two hashes together:
data_from_json = JSON[File.read("./books/book.json")]
File.open("./books/book.json", "w") do |f|
  f.write(JSON.pretty_generate(data_from_json.merge(tempHash)))
end
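Putting it together, a minimal sketch of the merge-based variant (add_book is a hypothetical name; it assumes the file already contains a JSON object rather than an array, and falls back to an empty hash on the first write):

require 'json'
require 'securerandom'

def add_book(name, author)
  new_entry = {
    SecureRandom.uuid => {
      "name" => name,
      "author" => author
    }
  }
  path = "./books/book.json"
  # Start from the existing object, or an empty hash if the file is missing
  data = File.exist?(path) ? JSON[File.read(path)] : {}
  File.open(path, "w") do |f|
    f.write(JSON.pretty_generate(data.merge(new_entry)))
  end
end

Because merge produces a single hash, the file keeps the shape shown in the desired output: one top-level object with a UUID key per book.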
