Apache Drill selecting empty arrays from JSON

I am using Apache Drill to store large JSON files, which I then query using the Drill API as follows:
{
"queryType": "SQL",
"query": "select * from db.table.`/path/to/JSON.json` w "
}
This correctly returns the data. However, some of the JSON files have an empty array.
For example, the following is the JSON stored in the database:
{
"key1": ["array", "of", "data"],
"key2": ["array", "of", "data"],
"key3": ["array", "of", "data"],
"key4": ["array", "of", "data"],
"key5": ["array", "of", "data"],
"key6": ["array", "of", "data"],
"key7": []
}
When I retrieve this data, it is returned as follows:
{
"columns": [
"key1",
"key2",
"key3",
"key4",
"key5",
"key6"
],
"rows": [
{}
]
}
key7 is missing. How do I get the response to show this key even though it may be empty for some of the stored JSON files?

Drill is schema-less, so if a column has no data in any row, Drill ignores that column. If you know the column is required, you may need to use a "case" or "if" expression to add a default value, or create a view.
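If changing the query is not an option, one client-side workaround is to fill in the missing keys after the response arrives. The sketch below is an assumption rather than anything Drill provides: `fill_missing_columns` and the list of expected keys are hypothetical names, and the response is assumed to have the `columns`/`rows` shape shown in the question.

```ruby
require 'json'

# Hypothetical helper: adds any expected key that was omitted from the
# response, using an empty array as the default value.
def fill_missing_columns(response, expected_keys)
  # Array union keeps the existing column order and appends the missing names.
  columns = response.fetch('columns', []) | expected_keys
  rows = response.fetch('rows', []).map do |row|
    expected_keys.each_with_object(row.dup) do |key, filled|
      filled[key] = [] unless filled.key?(key)
    end
  end
  response.merge('columns' => columns, 'rows' => rows)
end

# Sample response modeled on the question (values shortened for illustration).
raw = '{"columns": ["key1", "key2"], "rows": [{"key1": ["a"], "key2": ["b"]}]}'
fixed = fill_missing_columns(JSON.parse(raw), %w[key1 key2 key7])
puts JSON.pretty_generate(fixed)
```

This keeps the query unchanged, at the cost of maintaining the list of expected keys in application code.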


Is it possible to use the CockroachDB gen_random_uuid() function inside JSON data while inserting into a JSON column in CockroachDB?

I am new to CockroachDB and was wondering whether the following is possible.
One of the columns in my table is of JSON type, and the sample data in it is as follows:
{
"first_name": "Lola",
"friends": 547,
"last_name": "Dog",
"location": "NYC",
"online": true,
"Education": [
{
"id": "4ebb11a5-8e9a-49dc-905d-fade67027990",
"UG": "UT Austin",
"Major": "Electrical",
"Minor": "Electronics"
},
{
"id": "6724adfa-610a-4efe-b53d-fd67bd3bd9ba",
"PG": "North Eastern",
"Major": "Computers",
"Minor": "Electrical"
}
]
}
Is there a way to replace the "id" field in JSON as below to get the id generated dynamically?
"id": gen_random_uuid(),
Yes, this should be possible. To generate JSON data that includes a randomly-generated UUID, you can use a query like:
root@:26257/defaultdb> select jsonb_build_object('id', gen_random_uuid());
jsonb_build_object
--------------------------------------------------
{"id": "d50ad318-62ba-45c0-99a4-cb7aa32ad1c3"}
If you want to update in place JSON data that already exists, you can use the jsonb_set function (see JSONB Functions).
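If the document is assembled in application code before the insert, the UUID can also be generated on the client. This is a different technique from the answer above: it swaps `gen_random_uuid()` for Ruby's `SecureRandom.uuid`. The helper name `with_generated_ids` is hypothetical, and the sample data is a shortened version of the document in the question.

```ruby
require 'json'
require 'securerandom'

# Hypothetical helper: returns a copy of the profile in which every
# "Education" entry carries a freshly generated UUID under "id".
def with_generated_ids(profile)
  education = profile.fetch('Education', []).map do |entry|
    entry.merge('id' => SecureRandom.uuid)
  end
  profile.merge('Education' => education)
end

profile = {
  'first_name' => 'Lola',
  'Education' => [
    { 'UG' => 'UT Austin', 'Major' => 'Electrical' },
    { 'PG' => 'North Eastern', 'Major' => 'Computers' }
  ]
}

# The resulting string can be passed as the JSON column value in an INSERT.
puts JSON.generate(with_generated_ids(profile))
```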

Kibana/Elastic Query on multiple terms in same array element

In my Elasticsearch Index I have documents which contain an array of uniform elements, like this:
Document 1:
"listOfElements": {
"entries": [{
"key1": "value1",
"int1": 4,
"key2": "value2"
}, {
"key1": "value1",
"int1": 7,
"key2": "value2"
}
]
}
Document 2:
"listOfElements": {
"entries": [{
"key1": "value1",
"int1": 5,
"key2": "value2"
}, {
"key1": "value1",
"int1": 7,
"key2": "value2"
}
]
}
Now I want to create a query that returns all documents which have, e.g. key1:value1 AND int1:4 in the same entry element.
However, if I only query for "key1:value1 AND int1:4", I get all documents that have key1:value1 in some entry and int1:4 in some entry, so I would get both documents from the above example.
Is there any way to query for multiple fields that have to be in the same array element?

Elastic Search. Search by sub-collection value

I need help with a specific ES query.
I have objects in an Elasticsearch index. Here is an example of one of them (Participant):
{
"_id": null,
"ObjectID": 6008,
"EventID": null,
"IndexName": "crmws",
"version_id": 66244,
"ObjectData": {
"PARTICIPANTTYPE": "2",
"STATE": "ACTIVE",
"EXTERNALID": "01010111",
"CREATORID": 1006,
"partAttributeList":
[
{
"SYSNAME": "A",
"VALUE": "V1"
},
{
"SYSNAME": "B",
"VALUE": "V2"
},
{
"SYSNAME": "C",
"VALUE": "V2"
}
],
....
I need to find only the entities that match on a single partAttributeList entry: for example, the whole Participant entity that has SYSNAME=A and VALUE=V1 in the same partAttributeList entry.
If I use the usual matches:
{"match": {"ObjectData.partAttributeList.SYSNAME": "A"}},
{"match": {"ObjectData.partAttributeList.VALUE": "V1"}}
Of course, I will find more objects than I really need. Here is an example of a redundant object that can be found:
...
{
"SYSNAME": "A",
"VALUE": "X"
},
{
"SYSNAME": "B",
"VALUE": "V1"
}..
As I understand it, you are trying to search multiple fields of the same object for exact matches of a piece of text, so please try this:
https://www.elastic.co/guide/en/elasticsearch/guide/current/multi-query-strings.html
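The over-matching described in the question can be modeled in plain Ruby. This is only a sketch of the matching logic, not an Elasticsearch query; the method names `flat_match?` and `same_entry_match?` are hypothetical, and the sample entries are taken from the question.

```ruby
# Models two independent match clauses: each condition may be
# satisfied by a different entry of the list.
def flat_match?(entries, sysname, value)
  entries.any? { |e| e['SYSNAME'] == sysname } &&
    entries.any? { |e| e['VALUE'] == value }
end

# Models the desired behavior: a single entry must satisfy both conditions.
def same_entry_match?(entries, sysname, value)
  entries.any? { |e| e['SYSNAME'] == sysname && e['VALUE'] == value }
end

wanted = [
  { 'SYSNAME' => 'A', 'VALUE' => 'V1' },
  { 'SYSNAME' => 'B', 'VALUE' => 'V2' }
]
redundant = [
  { 'SYSNAME' => 'A', 'VALUE' => 'X' },
  { 'SYSNAME' => 'B', 'VALUE' => 'V1' }
]

puts flat_match?(wanted, 'A', 'V1')          # true
puts flat_match?(redundant, 'A', 'V1')       # true (the unwanted hit)
puts same_entry_match?(wanted, 'A', 'V1')    # true
puts same_entry_match?(redundant, 'A', 'V1') # false
```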

How to select objects from JSON file and push into new file when they fail API validation

I am working with an API which accepts some JSON objects (sent as post request) and fails others based on certain criteria.
I am trying to compile a "log" of the objects which have failed and ones which have been validated successfully so I don't have to manually copy and paste them each time. (There are hundreds of objects).
Basically if the API returns "false", I want to push that object into a file, and if it returns true, all those objects go into another file.
I have tried to read a bunch of documentation / blogs on "select, detect, reject" etc enumerators but my problem is very different from the examples given.
I have written some pseudo code in my ruby file below and I think I'm going along the right lines, but need a bit of guidance to complete the task:
require 'json'
require 'httparty'

restaurants = JSON.parse(File.read('pretty-minified.json'))
restaurants.each do |restaurant|
  create_response = HTTParty.post("https://api.hailoapp.com/business/create",
    {
      :body => restaurant.to_json,
      :headers => { "Content-Type" => "text", "Accept" => "application/x-www-form-urlencoded", "Authorization" => "token #{api_token}" }
    })
  data = create_response.to_hash
  alert = data["valid"]
  if alert == false
    # select restaurant json objects which return false and push into new file
    # false_rest = restaurants.detect { |r| r == false }
    File.open('false_objects.json', 'w') do |file|
      file << JSON.pretty_generate(false_rest) # false_rest is not defined yet (see the commented line above)
    end
  else
    # select restaurant json objects which return true and push into another file
    File.open('true_objects.json', 'w') do |file|
      file << JSON.pretty_generate() # the argument is still missing here
    end
  end
end
An example of the output (JSON) from the API is as follows:
{"id":"102427","valid":true}
{"valid":false}
The JSON file is basically a huge array of hashes (or objects); here is a short excerpt:
[
{
"id": "223078",
"name": "3 South Place",
"phone": "+442032151270",
"email": "3sp@southplacehotel.com",
"website": "",
"location": {
"latitude": 51.5190536,
"longitude": -0.0871038,
"address": {
"line1": "3 South Place",
"line2": "",
"line3": "",
"postcode": "EC2M 2AF",
"city": "London",
"country": "UK"
}
}
},
{
"id": "210071",
"name": "5th View Bar & Food",
"phone": "+442077347869",
"email": "waterstones.piccadilly@elior.com",
"website": "http://www.5thview.com",
"location": {
"latitude": 51.5089594,
"longitude": -0.1359897,
"address": {
"line1": "Waterstone's Piccadilly",
"line2": "203-205 Piccadilly",
"line3": "",
"postcode": "W1J 9HA",
"city": "London",
"country": "UK"
}
}
},
{
"id": "239971",
"name": "65 & King",
"phone": "+442072292233",
"email": "hello@65king.com",
"website": "http://www.65king.com/",
"location": {
"latitude": 51.5152533,
"longitude": -0.1916538,
"address": {
"line1": "65 Westbourne Grove",
"line2": "",
"line3": "",
"postcode": "W2 4UJ",
"city": "London",
"country": "UK"
}
}
}
]
Assuming you want to filter by emails ending with elior.com (this condition can easily be changed):
NB: The data above looks like a JavaScript variable; it is not a valid Ruby object. I assume you received it from somewhere as a string, which is why it is parsed with JSON:
require 'json'
array = JSON.parse(restaurants) # restaurants is a string: '[{....... as you received it
result = array.group_by do |e|
# more sophisticated condition goes here
e['email'] =~ /elior\.com$/ ? true : false
end
File.open('false_objects.json', 'w') do |file|
file << JSON.pretty_generate(result[false])
end
File.open('true_objects.json', 'w') do |file|
file << JSON.pretty_generate(result[true])
end
result is a hash containing two elements, keyed by the booleans true and false:
#⇒ {
# true => [..valids here ..],
# false => [..invalids here..]
# }
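The same idea can be applied to the API's "valid" flag instead of the email condition. The sketch below is an assumption about how that might look: `split_by_validation`, `write_json`, and the `validator` callable are hypothetical names, and the stub stands in for the real HTTParty call using an arbitrary rule, purely so the example runs without network access.

```ruby
require 'json'

# Splits the objects into [accepted, rejected] using the "valid" flag.
# `validator` is a hypothetical callable that submits one object and returns
# the parsed response hash, e.g. {"id" => "102427", "valid" => true}.
def split_by_validation(restaurants, validator)
  restaurants.partition { |restaurant| validator.call(restaurant)['valid'] == true }
end

# Writes a whole collection in one go, so that results from earlier
# objects are not overwritten by later ones.
def write_json(path, objects)
  File.open(path, 'w') { |file| file << JSON.pretty_generate(objects) }
end

# Stand-in validator for illustration only; the rule is arbitrary.
stub = lambda do |restaurant|
  if restaurant['website'].to_s.empty?
    { 'valid' => false }
  else
    { 'id' => restaurant['id'], 'valid' => true }
  end
end

sample = [
  { 'id' => '223078', 'website' => '' },
  { 'id' => '210071', 'website' => 'http://www.5thview.com' }
]
accepted, rejected = split_by_validation(sample, stub)
puts "accepted: #{accepted.length}, rejected: #{rejected.length}"
```

With the two arrays in hand, `write_json('true_objects.json', accepted)` and `write_json('false_objects.json', rejected)` each write their file once, after all objects have been checked.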

read specific part of a text file Ruby

Hi, I converted a PDF to a txt file in Ruby 1.9.3.
Here is part of the txt file:
[["Rate", "Card", "February", "29,", "2012"]]
[["Termination", "Color", "Test", "No", "Rate", "Currency", "Notes"]]
[["x", "A", "CAMEL", "56731973573", "$", "0.1400", "USD", "30/45/100%"]]
[["y", "A", "CARDINAL", "56731972501", "$", "0.1400", "USD", "30/45/100%"]]
[["z", "A", "CARNELIAN", "56731971654", "$", "0.1400", "USD", "30/45/100%"]]
.....
....
[["Rate", "Card", "February", "29,", "2012"]]
[["Termination", "Color", "Test", "No", "Rate", "Currency", "Notes"]]
I store every line in a different array, but the problem is that I don't want to read the first two lines, which appear many times in my txt file because they are the header of every page of the PDF. Any idea how to do that? Thanks!
You can read the file into an array and reject the lines you do not need:
rejected = [
'[["Rate", "Card", "February", "29,", "2012"]]',
'[["Termination", "Color", "Test", "No", "Rate", "Currency", "Notes"]]',
]
# readlines keeps the trailing newline on each line, so strip it before comparing
lines = File.readlines('/path/to/file').reject { |line| rejected.include?(line.chomp) }
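One detail worth checking in this approach is the line terminator: lines read from a file normally end in "\n", while the header strings in the list do not, so a direct comparison would never match. The sketch below demonstrates this on in-memory text via StringIO; the helper name `body_lines` is hypothetical, and the sample is a shortened version of the file in the question.

```ruby
require 'stringio'

HEADERS = [
  '[["Rate", "Card", "February", "29,", "2012"]]',
  '[["Termination", "Color", "Test", "No", "Rate", "Currency", "Notes"]]'
].freeze

# Hypothetical helper: returns the lines of `io` without their line
# terminators, skipping any line that equals one of the header strings.
def body_lines(io, headers)
  io.each_line.map(&:chomp).reject { |line| headers.include?(line) }
end

text = [
  HEADERS[0],
  HEADERS[1],
  '[["x", "A", "CAMEL", "56731973573", "$", "0.1400", "USD", "30/45/100%"]]',
  HEADERS[0],
  HEADERS[1]
].join("\n") + "\n"

puts body_lines(StringIO.new(text), HEADERS)
```

For a real file, `File.open(path) { |f| body_lines(f, HEADERS) }` would pass the file handle in place of the StringIO.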
