How do I set up Postman with nested form data so that I can execute a collection file? - postman-collection-runner

The Postman team provides examples at https://www.getpostman.com/docs/v6/postman/collection_runs/working_with_data_files
However, in my situation, I don't have a simple flat file like the one in their example. I have a nested data structure. I want to know how I can prepare the form data in a nested manner so that the data file is correctly searched and replaced.
Ex:
Name:{{Name}}
Age:{{age}}
addressId:{{addressId}}
addressName:{{addressName}}
addressLine1:{{addressLine1}}
---- the following is what I'm unsure about: how to indicate the remaining form data, since it is an array of addresses. The number of addresses will not vary, but it is a child of the above.
My thought was:
Addresses:
[
addressId:{{addressId}}
addressName:{{addressName}}
addressLine1:{{addressLine1}}
]
This form data will be used to load multiple individuals with their addresses, reading off a data file with 300 records. Sample data file structure below:
[
  {
    "id": 1,
    "Name": "user1",
    "Age": 34,
    "Addresses": [
      {
        "addressId": 1001,
        "addressName": "home",
        "addressLine1": "123 XYZ St"
      }
    ]
  },
  {
    "id": 2,
    "Name": "user2",
    "Age": 35,
    "Addresses": [
      {
        "addressId": 1002,
        "addressName": "home",
        "addressLine1": "124 XYZ St"
      }
    ]
  },
  {
    "id": 3,
    "Name": "user3",
    "Age": 34,
    "Addresses": [
      {
        "addressId": 1003,
        "addressName": "home",
        "addressLine1": "125 XYZ St"
      }
    ]
  }
]

All you really need to do is construct the request body in the format you require using the placeholder {{...}} syntax.
In the data file use those same names in either a CSV or a JSON file and assign the values that you require. When the collection is run, it will replace the variables in the request.
[
{
"addressOne": "blah road",
"addressTwo": "blah avenue",
"addressThree": "blah street"
}
]
If those keys were the placeholders in your request body, then those values will be used as the data set when the collection is run.
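For the nested case in the question, one way (a sketch, assuming the endpoint accepts a raw JSON body rather than flat form data) is to keep the nesting in the request body and keep the data file flat, with the placeholders at the leaf level:
{
  "Name": "{{Name}}",
  "Age": {{Age}},
  "Addresses": [
    {
      "addressId": {{addressId}},
      "addressName": "{{addressName}}",
      "addressLine1": "{{addressLine1}}"
    }
  ]
}
A matching flat JSON data file, one record per iteration (values taken from the sample above):
[
  { "Name": "user1", "Age": 34, "addressId": 1001, "addressName": "home", "addressLine1": "123 XYZ St" },
  { "Name": "user2", "Age": 35, "addressId": 1002, "addressName": "home", "addressLine1": "124 XYZ St" }
]
Since the number of addresses is fixed, each address field can simply be its own key in the data file.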

Related

Efficient data structure for searching data only in documents a user can access

Problem description:
The goal is to efficiently query strings from a set of JSON documents while respecting document-level security, such that a user is only able to retrieve data from documents they have access to.
Suppose we have the following documents:
Document document_1, which has no restrictions:
{
"id": "document_1",
"set_of_strings_1": [
"the",
"quick",
"brown"
],
"set_of_strings_2": [
"fox",
"jumps",
"over",
],
"isPublic": true
}
Document document_2, which can only be accessed by 3 users:
{
"id": "document_2",
"set_of_strings_1": [
"the"
"lazy"
],
"set_of_strings_2": [
"dog",
],
"isPublic": false,
"allowed_users": [
"Alice",
"Bob",
"Charlie"
]
}
Now suppose user Bob (has access to both documents) makes the following query:
getStrings(
user_id: "Bob",
set_of_strings_id: "set_of_strings_1"
)
The correct response should be the union of set_of_strings_1 from both documents:
["the", "quick", "brown", "lazy"]
Now suppose user Dave (has access to document_1 only) makes the following query:
getStrings(
user_id: "Dave",
set_of_strings_id: "set_of_strings_1"
)
The correct response should be set_of_strings_1 from document_1:
["the", "quick", "brown"]
A further optimization is to handle prefix tokens. E.g. for the query
getStrings(
user_id: "Bob",
set_of_strings_id: "set_of_strings_1",
token: "t"
)
The correct response should be:
["the"]
Note: empty token should match all strings.
However, I am happy to perform a simple in-memory prefix-match after the strings have been retrieved. The bottleneck here is expected to be the number of documents, not the number of strings.
What I have tried:
Approach 1: Naive approach
The naive solution here would be to:
put all the documents in a SQL database
perform a full-table scan to get all the documents (we can have millions of documents)
iterate through all the documents to figure out user permissions
filter out the set of documents the user can access
iterate through the filtered list to get all the strings
This is too slow.
Approach 2: Inverted indices
Another approach considered is to create an inverted index from users to documents, e.g.
users   | documents_they_can_see
user_1  | document_1, document_2, document_3
user_2  | document_1
user_3  | document_1, document_4
This will efficiently give us the document ids, which we can use against some other index to construct the string set.
If this next step is done naively, it still involves a linear scan through all the documents the user is able to access. To avoid this, we can create another inverted index mapping document_id#set_of_strings_id to the corresponding set of strings; then we take the union of all those sets to get the result, and run the prefix match afterwards. However, this involves computing the union of a large number of sets.
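A rough sketch of the two indices, using the example documents above (the wrapper key names here are made up for illustration):
{
  "user_to_documents": {
    "Bob": ["document_1", "document_2"],
    "Dave": ["document_1"]
  },
  "doc_set_to_strings": {
    "document_1#set_of_strings_1": ["the", "quick", "brown"],
    "document_2#set_of_strings_1": ["the", "lazy"]
  }
}
Answering Bob's query then means looking up his document ids, looking up each document_id#set_of_strings_1 key, and taking the union.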
Approach 3: Caching
Use redis with the following data model:
key                        | value
user_id#set_of_strings_id  | [String]
Then we perform prefix match in-memory on the set of strings we get from the cache.
We want this data to be fairly up-to-date so the source-of-truth datastore still needs to be performant.
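For example, Bob's earlier query would be served by a single cache entry along the lines of (a sketch of the data model, not actual Redis syntax):
{
  "Bob#set_of_strings_1": ["the", "quick", "brown", "lazy"]
}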
I don't want to reinvent the wheel. Is there a data structure or some off-the-shelf system that does what I am trying to do?

How to get multiple entries of a similar pattern from the same input line using a grok pattern?

I'm aware of grok patterns and am able to parse and store data into Elasticsearch using a Logstash configuration file with filter and grok patterns.
For example:
If the data input line is:
Start-field1|field2|field3
then
field1, field2, field3 are parsed and stored into Elasticsearch successfully without any problem.
But now I have an input line like the one below:
Start-field1|field2|field3#Start-field1|field2|field3#Start-field1|field2|field3
meaning there are multiple occurrences of the required pattern in the same input line, with Start marking the beginning of each pattern and # separating the patterns.
Is there any way to fetch all such fields and store into elastic search?
You can use (?:case1|case2) for as many occurrences as you want. In each case you can put the same pattern, like this: (?:|Start-your_pattern) (?:|Start-your_pattern) (?:|Start-your_pattern)
You can also use the same field name multiple times; this way each field is stored with all of its values. This is done by using %{DATA:field1}, %{DATA:field2} and %{DATA:field3} multiple times.
For this example:
Start - John 5 apples # Start - Joe 10 beers # Start - Max 2 eggs
Use this pattern:
Start - %{DATA:field1} %{DATA:field2} %{DATA:field3} (?:|# Start - %{DATA:field1} %{DATA:field2} %{DATA:field3}) (?:|# Start - %{DATA:field1} %{DATA:field2} %{DATA:field3})
You get this output:
{
"field1": [
[
"John",
"Joe",
"Max"
]
],
"field2": [
[
"5",
"10",
"2"
]
],
"field3": [
[
"apples",
"beers",
"eggs"
]
]
}
Try it on http://grokdebug.herokuapp.com/

Custom key-value pair in FHIR JSON

I have mapped around 20 fields from the sample data. All of them come under the Observation category. I have a field/label which we created to denote each patient.
"PatientLabel" : 0
I understand FHIR is all about a fixed set of items, but is there a way to include this in the FHIR JSON? I would need this information while processing.
"extension": [
{
"name": "PatientLabel",
"value": 0
}
]
I tried the above; the FHIR validator is throwing an error.
Extensions don't have a 'name', they have a 'url'. Also, 'value' is a polymorphic type, so you'd need "valueInteger" or "valueDecimal". That said, "0" seems an unusual value for something titled "patient label". Normally, the Patient would be in the Observation.subject. If you just have a label and not a reference, you could use subject.display and not need an extension at all...
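A minimal sketch of what that could look like, with an integer value and a made-up extension URL (your own URL/StructureDefinition would go there):
"extension": [
  {
    "url": "http://example.org/fhir/StructureDefinition/patient-label",
    "valueInteger": 0
  }
]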

Group multi record CSV to JSON conversion

I have the below sample CSV data coming in a multi-record format. I want to convert it to the JSON format shown below. I am using NiFi 1.8.
CSV:
id,name,category,status,country
1,XXX,ABC,Active,USA
1,XXX,DEF,Active,HKG
1,XXX,XYZ,Active,USA
Expected JSON:
{
"id":"1",
"status":"Active",
"name":[
"ABC",
"DEF",
"XYZ"
],
"country":[
"USA",
"HKG"
]
}
I tried FetchFile -> ConvertRecord, but it converts every CSV record into its own JSON object.
The ideal way would be to use the QueryRecord processor to run an Apache Calcite SQL query that groups by and collects as a set to get your desired output.
But I don't know exactly which functions we can use in Apache Calcite :(
(or)
You can store the data in HDFS and then create a temporary/staging table on top of the HDFS directory.
Use the SelectHiveQL processor to run the below query:
select to_json(
named_struct(
'id',id,
'status',status,
'category',collect_set(category),
'country',collect_set(country)
)
) as jsn
from <db_name>.<tab_name>
group by id,status
This will result in an output flowfile like:
+-----------------------------------------------------------------------------------+
|jsn |
+-----------------------------------------------------------------------------------+
|{"id":"1","status":"Active","category":["DEF","ABC","XYZ"],"country":["HKG","USA"]}|
+-----------------------------------------------------------------------------------+
You can remove the header by setting the CSV header option to false in case of CSV output.

Elasticsearch performance impact on choosing mapping structure for index

I am receiving data in a format like:
{
  "name": "index_name",
  "status": "good",
  "datapoints": [
    {
      "paramType": "ABC",
      "batch": [
        { "time": "timestamp1<epoch in sec>", "value": "123" },
        { "time": "timestamp2<epoch in sec>", "value": "123" }
      ]
    },
    {
      "paramType": "XYZ",
      "batch": [
        { "time": "timestamp1<epoch in sec>", "value": "123" },
        { "time": "timestamp2<epoch in sec>", "value": "124" }
      ]
    }
  ]
}
I would like to store the data in Elasticsearch in such a way that I can query based on a time range, status, or paramType.
As mentioned here, I can define datapoints or batch as a nested data type, which will allow indexing objects inside the array.
Another way I can think of is dividing the structure into separate documents, e.g.
{
  "name": "index_name",
  "status": "good",
  "paramType": "ABC",
  "time": "timestamp<epoch in sec>",
  "value": "123"
}
Which one will be the most efficient way?
If I choose the 2nd way, I know there may be ~1000 elements in the batch array and 10-15 paramType values, which means ~15k documents will be generated and 15k*5 fields (= 75K) key-value pairs will be repeated in the index.
Here, the advantages and disadvantages of using nested are explained, but no performance-related stats are provided. In my case, there won't be any updates to the inner objects, so I am not sure which one will be better. Also, I have two nested objects, so I would like to know how I can query for data within a time range if I use nested.
A flat structure will perform better than nested. Nested queries are slower compared to term queries; also, while indexing, a single nested document is internally represented as a bunch of documents, just indexed in the same block.
As long as your requirements are met, the second option works better.
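To illustrate querying the flat documents, here is a sketch of a bool query filtering on status, paramType, and a time range (the index name and epoch values are made up; this assumes keyword mappings for status and paramType and a date or numeric mapping for time):
POST /index_name/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "status": "good" } },
        { "term": { "paramType": "ABC" } },
        { "range": { "time": { "gte": 1546300800, "lte": 1546387200 } } }
      ]
    }
  }
}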
