How to convert a CSV to JSON array using the Miller command line tool?

Using the Miller command line tool I want to convert a CSV file with headers into a JSON array.
Currently I am using this command: mlr --icsv --ojson cat sample.csv > sample.json
It is outputting JSON, but not in array format.
This is the sample CSV input:
Keyword, Weight, Quantity
Apple, 10, 2345
Orange, 23, 467
Banana, 2345, 2345
And this is the output I am getting from Miller:
{ "Keyword": "Apple", "Weight": 10, "Quantity": 2345 }
{ "Keyword": "Orange", "Weight": 23, "Quantity": 467 }
{ "Keyword": "Banana", "Weight": 2345, "Quantity": 2345 }
As you can see this output is JSON Lines, not an array format.
I want the JSON to be an array, like this:
[
{ "Keyword": "Apple", "Weight": 10, "Quantity": 2345 },
{ "Keyword": "Orange", "Weight": 23, "Quantity": 467 },
{ "Keyword": "Banana", "Weight": 2345, "Quantity": 2345 }
]
What is the correct Miller command for that?

Figured it out.
You need to use --jlistwrap.
So the full command becomes: mlr --icsv --ojson --jlistwrap cat sample.csv > sample.json
Which outputs this:
[
{ "Keyword": "Apple", "Weight": 10, "Quantity": 2345 }
,{ "Keyword": "Orange", "Weight": 23, "Quantity": 467 }
,{ "Keyword": "Banana", "Weight": 2345, "Quantity": 2345 }
]
It's not formatted beautifully (the commas lead the lines, and nothing is indented), but it is a valid JSON array.
After running through a tool to auto-format the JSON it would look like this:
[
  {
    "Keyword": "Apple",
    "Weight": 10,
    "Quantity": 2345
  },
  {
    "Keyword": "Orange",
    "Weight": 23,
    "Quantity": 467
  },
  {
    "Keyword": "Banana",
    "Weight": 2345,
    "Quantity": 2345
  }
]
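One such tool is jq (assuming it is installed); it can be chained straight onto the Miller command:
mlr --icsv --ojson --jlistwrap cat sample.csv | jq . > sample.json
jq . simply parses and re-emits the array with standard two-space indentation, producing output like the listing above.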

Related

How to add unique to each request in JMeter

I have a JSON array with n elements containing fields like productName and productId. I would like to generate a unique id in productId for each element and for each request.
Currently, I'm reading productId from a .csv file, but for each request the same productId is applied to all the elements. For example:
test.csv
productId
10
11
12
13
14
In JMeter it substitutes like below for request 1:
[
  { "productName": "Apple", "productId": "10" },
  { "productName": "Apple", "productId": "10" },
  { "productName": "Apple", "productId": "10" },
  { "productName": "Apple", "productId": "10" }
]
request 2:
[
  { "productName": "Apple", "productId": "11" },
  { "productName": "Apple", "productId": "11" },
  { "productName": "Apple", "productId": "11" },
  { "productName": "Apple", "productId": "11" }
]
But what I'm expecting is that the first request should be
[
  { "productName": "Apple", "productId": "10" },
  { "productName": "Apple", "productId": "11" },
  { "productName": "Apple", "productId": "12" },
  { "productName": "Apple", "productId": "13" }
]
And the second request should be like below, and so on:
[
  { "productName": "Apple", "productId": "14" },
  { "productName": "Apple", "productId": "15" },
  { "productName": "Apple", "productId": "16" },
  { "productName": "Apple", "productId": "17" }
]
productId should be generated with some random id for each request, and that id applied to all the elements in the JSON. How can we achieve this in JMeter?
You can generate a random unique value with the JMeter function __UUID.
Replace the productId value with the following:
${__UUID}
Example
[
  { "productName": "Apple", "productId": "${__UUID}" },
  { "productName": "Apple", "productId": "${__UUID}" },
  { "productName": "Apple", "productId": "${__UUID}" },
  { "productName": "Apple", "productId": "${__UUID}" }
]
As per CSV Data Set Config documentation:
By default, the file is only opened once, and each thread will use a different line from the file. However the order in which lines are passed to threads depends on the order in which they execute, which may vary between iterations. Lines are read at the start of each test iteration. The file name and mode are resolved in the first iteration.
In other words, CSV Data Set Config advances one line per thread per iteration, so every reference to the productId variable inside a single request resolves to the same value, which is exactly the behaviour described in the question.
If you want to generate a random number you can just go for the __Random() function, which produces a random number within the given range:
[
  { "productName": "Apple", "productId": "${__Random(1,2147483647,)}" },
  { "productName": "Apple", "productId": "${__Random(1,2147483647,)}" },
  { "productName": "Apple", "productId": "${__Random(1,2147483647,)}" },
  { "productName": "Apple", "productId": "${__Random(1,2147483647,)}" }
]
Note that __Random does not guarantee uniqueness across calls; if the ids must be unique, the __UUID approach above is the safer choice. More information on the JMeter Functions concept: Apache JMeter Functions - An Introduction
Another solution could be using the __CSVRead function instead of the CSV Data Set Config element.
Note:
You will have to remove the column names, i.e. the first row.
Ensure you have sufficient test data in the CSV file.
[
  { "productName": "Apple", "productId": "${__CSVRead(productIds.csv,0)}${__CSVRead(productIds.csv,next)}" },
  { "productName": "Apple", "productId": "${__CSVRead(productIds.csv,0)}${__CSVRead(productIds.csv,next)}" },
  { "productName": "Apple", "productId": "${__CSVRead(productIds.csv,0)}${__CSVRead(productIds.csv,next)}" },
  { "productName": "Apple", "productId": "${__CSVRead(productIds.csv,0)}${__CSVRead(productIds.csv,next)}" }
]
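Here each ${__CSVRead(productIds.csv,0)} reads column 0 of the current line and the paired ${__CSVRead(productIds.csv,next)} advances the file pointer to the next line, so a body with four elements consumes four lines per request, which matches the expected output above. With the header row removed, productIds.csv is just one value per line:
10
11
12
13
14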

Jsonata, merging array of objects

I have an array of objects that I would like to reformat using a JSONata expression:
{
  "items": [
    {
      "time": 1575417919282,
      "message": {
        "data": 21,
        "type": "temperature"
      }
    },
    {
      "time": 1575417919282,
      "message": {
        "data": 45,
        "type": "temperature"
      }
    }
  ]
}
Desired format
[
  {
    "data": 21,
    "type": "temperature",
    "time": 1575417919282
  },
  {
    "data": 45,
    "type": "temperature",
    "time": 1575417919282
  }
]
Is there an easy one-liner for this? I started with merging time into the message object using $merge([$.items.message, {"time":$.items.time}]) but this gives me
{
  "data": 45,
  "type": "temperature",
  "time": [
    1575417919282,
    1575417919282
  ]
}
I'm finding the documentation hard to follow. How do you start with just merging two objects iteratively?
This will do it:
items.{
  "data": message.data,
  "type": message.type,
  "time": time
}
http://try.jsonata.org/SJZDsyHTr
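If you'd rather stick with the $merge approach from the question, the fix is to apply it per item, inside the items mapping, rather than to the whole array at once; a sketch that should be equivalent:
items.$merge([message, {"time": time}])
Within the items. context, message and time refer to the fields of each individual item, so each result object gets its own time value rather than an array of all of them.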

How to find records matching the result of a previous search using ElasticSearch Painless scripting

I have the index I attached below.
Each doc in the index holds the name and height of Alice or Bob and the age at which the height was measured. Measurements taken at the age of 10 are flagged as "baseline_height_at_age_10": true
I need to do the following:
1. Find the height of Alice and Bob at age 10.
2. Return, for Alice and Bob, the records where the height is lower than their height at age 10.
So my question is: can Painless do this type of search? I'd appreciate it if you could point me at a good example of that.
Also: is ElasticSearch Painless even a good approach for this problem? Can you suggest a better one?
The Index Mappings
PUT /test/
{
  "mappings": {
    "_doc": {
      "properties": {
        "first_name": {
          "type": "keyword",
          "fields": {
            "raw": {
              "type": "text"
            }
          }
        },
        "surname": {
          "type": "keyword",
          "fields": {
            "raw": {
              "type": "text"
            }
          }
        },
        "baseline_height_at_age_10": {
          "type": "boolean"
        },
        "age": {
          "type": "integer"
        },
        "height": {
          "type": "integer"
        }
      }
    }
  }
}
The Index Data
POST /test/_doc/alice_green_8_110
{
  "first_name": "Alice",
  "surname": "Green",
  "age": 8,
  "height": 110,
  "baseline_height_at_age_10": false
}
POST /test/_doc/alice_green_10_120
{
  "first_name": "Alice",
  "surname": "Green",
  "age": 10,
  "height": 120,
  "baseline_height_at_age_10": true
}
POST /test/_doc/alice_green_13_140
{
  "first_name": "Alice",
  "surname": "Green",
  "age": 13,
  "height": 140,
  "baseline_height_at_age_10": false
}
POST /test/_doc/alice_green_23_170
{
  "first_name": "Alice",
  "surname": "Green",
  "age": 23,
  "height": 170,
  "baseline_height_at_age_10": false
}
POST /test/_doc/bob_green_8_120
{
  "first_name": "Bob",
  "surname": "Green",
  "age": 8,
  "height": 120,
  "baseline_height_at_age_10": false
}
POST /test/_doc/bob_green_10_130
{
  "first_name": "Bob",
  "surname": "Green",
  "age": 10,
  "height": 130,
  "baseline_height_at_age_10": true
}
POST /test/_doc/bob_green_15_160
{
  "first_name": "Bob",
  "surname": "Green",
  "age": 15,
  "height": 160,
  "baseline_height_at_age_10": false
}
POST /test/_doc/bob_green_21_180
{
  "first_name": "Bob",
  "surname": "Green",
  "age": 21,
  "height": 180,
  "baseline_height_at_age_10": false
}
You should be able to do it just using aggregations. Assuming people only ever get taller and the measurements are accurate, you could restrict the query to only those documents aged 10 or under, find the max height of those, then filter the results to exclude the baseline record:
POST test/_search
{
  "size": 0,
  "query": {
    "range": {
      "age": {
        "lte": 10
      }
    }
  },
  "aggs": {
    "names": {
      "terms": {
        "field": "first_name",
        "size": 10
      },
      "aggs": {
        "max_height": {
          "max": {
            "field": "height"
          }
        },
        "non-baseline": {
          "filter": {
            "match": {
              "baseline_height_at_age_10": false
            }
          },
          "aggs": {
            "top_hits": {
              "top_hits": {
                "size": 10
              }
            }
          }
        }
      }
    }
  }
}
I've posted the same question, with emphasis on Painless scripting, on the ElasticSearch support forum (How to find records matching the result of a previous search using ElasticSearch Painless scripting), and the answer was:
"I don't think the Painless approach will work here. You cannot use
the results of one query to execute a second query with Painless.
The two-step approach that you outline at the end of your post is the
way to go."
The bottom line is that you cannot use a result from one query as an input to another query. You can filter and aggregate and more, but not this.
So the approach, as I understand it, is to do a first search, process the data, and then do an additional search. This basically translates to:
1. Search for the record where first_name=Alice and baseline_height_at_age_10=true.
2. Process externally, to extract the value of height for Alice at age 10.
3. Search for Alice's records where her height is lower than the value extracted externally.
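As a sketch, the two searches could look like this (assuming step 1 returns Alice's baseline height of 120, as in the sample data above):
POST test/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "first_name": "Alice" } },
        { "term": { "baseline_height_at_age_10": true } }
      ]
    }
  }
}
Then, after reading height (120) out of the returned hit:
POST test/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "first_name": "Alice" } },
        { "range": { "height": { "lt": 120 } } }
      ]
    }
  }
}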

Extract keywords from fields

I want to write a query to analyze one or more fields. That is, the current analyzers require text to work on; instead of passing text I want to pass a field and get back its analyzed value.
If I have a document like this:
{
  "desc": "A document description",
  "name": "This name is not original",
  "amount": 3000
}
I would like to return something like the below:
{
  "desc": ["document", "description"],
  "name": ["name", "original"],
  "amount": 3000
}
You can use Term Vectors or Multi Term Vectors to achieve what you're looking for:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-multi-termvectors.html
You'd have to specify the IDs of the documents you want, as well as the fields, and it will return an array of analyzed tokens for each document, along with certain other info which you can easily disable.
GET /exampleindex/_doc/_mtermvectors
{
  "ids": [ "1", "2" ],
  "parameters": {
    "fields": [ "*" ]
  }
}
Will return something along the lines of:
"docs": [
{
"_index": "exampleindex",
"_type": "_doc",
"_id": "1",
"_version": 2,
"found": true,
"took": 0,
"term_vectors": {
"desc": {
"field_statistics": {
"sum_doc_freq": 5,
"doc_count": 2,
"sum_ttf": 5
},
"terms": {
"amazing": {
"term_freq": 1,
"tokens": [
{
"position": 1,
"start_offset": 3,
"end_offset": 10
}
]
},
"an": {
"term_freq": 1,
"tokens": [
{
"position": 0,
"start_offset": 0,
"end_offset": 2
}
]
}
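To cut the response down to just the terms (the "other info" mentioned above), the statistics and token details can be switched off in the parameters block; a sketch:
GET /exampleindex/_doc/_mtermvectors
{
  "ids": [ "1", "2" ],
  "parameters": {
    "fields": [ "*" ],
    "field_statistics": false,
    "positions": false,
    "offsets": false,
    "payloads": false
  }
}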

ElasticSearch: how to rank on skill rating in full text search?

I have quite a simple case but can't find a good way to solve it:
I have people whose skills are rated. They also have other information attached (e.g. city). All of this is in my ElasticSearch index.
Example:
John, Paris, Python: 7/10
Boris, Paris, Python: 3/10
Mike, Frankfurt, Python: 7/10
I would like to perform a text search only to find people.
If I type "Python", the better rated someone is, the higher they should rank.
If I type "Python Paris", it should return all people in Paris sorted by Python rating.
Here is an example of a people document in the ES index:
{
  "_index": "senso",
  "_type": "talent",
  "_id": "12469",
  "_version": 1,
  "found": true,
  "_source": {
    "id": 12469,
    "nickname": "Roger",
    "first_name": "Moore",
    "last_name": "Bond",
    "companyName": null,
    "email": "example#example.org",
    "city": "Marseille",
    "region": "Provence-Alpes-Côte d'Azur",
    "internalGlobalRating": 5,
    "declaredDailyPrice": 650,
    "declaredAnnualSalaryTarget": null,
    "boughtDailyPrice": null,
    "soldDailyPrice": null,
    "skillsRatings": [
      { "skillName": "Direction Artistique Web", "skillId": 1298, "rating": 9 },
      { "skillName": "UX Design", "skillId": 1295, "rating": 9 },
      { "skillName": "Identité Visuelle", "skillId": 1319, "rating": 8 },
      { "skillName": "Illustrator", "skillId": 1425, "rating": 9 },
      { "skillName": "Photoshop", "skillId": 1427, "rating": 9 },
      { "skillName": "InDesign", "skillId": 1426, "rating": 9 }
    ],
    "expertises": [
      { "name": "Direction Artistique Web", "id": 1298 },
      { "name": "UX Design", "id": 1295 },
      { "name": "Identité Visuelle", "id": 1319 }
    ],
    "missionTypes": [
      { "name": "Freelance sur place", "id": 2 },
      { "name": "Freelance en télétravail", "id": 3 },
      { "name": "Forfait", "id": 4 }
    ],
    "tools": [
      { "name": "Illustrator", "id": 1425 },
      { "name": "Photoshop", "id": 1427 },
      { "name": "InDesign", "id": 1426 }
    ],
    "themes": [],
    "medias": [],
    "organizationType": { "id": 2, "name": "Studio" },
    "source": { "id": 2 },
    "spokenLanguages": [ { "id": 2 }, { "id": 3 } ],
    "mainLanguage": { "id": 1, "name": "Français" },
    "created": "2011-10-05T20:17:52+02:00",
    "updated": "2017-07-03T15:59:11+02:00",
    "applicationDate": "2011-10-05T20:17:52+02:00",
    "portfolio": {
      "id": 95,
      "visible": true,
      "submissionTime": "2017-01-13T18:20:31+01:00",
      "isDisplayed": 1,
      "isPublic": 1
    }
  }
}
I wonder which approach I should choose: tweaking at index time, custom queries, or both?
Any clue on how to tackle this problem would be appreciated.
Thank you.
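One possible direction (a sketch, not from the original thread): if skillsRatings is mapped as a nested field, a nested function_score query can make a skillName match contribute the stored rating to the score, while a should clause on city boosts people in the right place:
POST senso/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "skillsRatings",
            "score_mode": "max",
            "query": {
              "function_score": {
                "query": { "match": { "skillsRatings.skillName": "Python" } },
                "field_value_factor": { "field": "skillsRatings.rating" }
              }
            }
          }
        },
        { "match": { "city": "Paris" } }
      ]
    }
  }
}
With this shape, "Python" alone ranks the better-rated people higher (the match score is multiplied by the rating), and "Python Paris" adds the city clause on top. The nested mapping is an assumption; the sample document above does not show how skillsRatings is currently indexed.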
