Trying to query a json hive table built on top of json data. Using json2Hive was able to generate DDL and was able to create table after removing unnecessary fields.
create external table user_tables.sample_json_table (
`apps` struct<
`app`: array<struct<
`id`: string,
`queue`: string,
`finalstatus`: string,
`trackingurl`: string,
`applicationtype`: string,
`applicationtags`: string,
`startedtime`: string,
`launchtime`: string,
`finishedtime`: string,
`memoryseconds`: string,
`vcoreseconds`: string,
`resourcesecondsmap`: struct<
`entry`: struct<
`key`: string,
`value`: string
>
>
>
>
>
)
row format serde 'org.apache.hadoop.hive.serde2.JsonSerDe'
location '/xyz/location/;
Now, stuck trying to figure out how to query each field from the below schema ?
checked several articles but all of them are case specific, and need a generic explanation or example how to query each field under array/struct :)
I only care about the multiple 'app' subsection entries and would like them to be imported onto another table with separate fields for each fields.
Sample json data:
{"apps":{"app":[{"id":"application_282828282828_12717","user":"xyz","name":"xyz-4b6bdae2-1a0c-4772-bd8e-0d7454268b82","queue":"root.users.dummy","state":"finished","finalstatus":"succeeded","progress":100.0,"trackingui":"history","trackingurl":"http://dang:8088/proxy/application_282828282828_12717/","diagnostics":"session stats:submitteddags=1, successfuldags=1, faileddags=0, killeddags=0\n","clusterid":282828282828,"applicationtype":"aquaman","applicationtags":"ABC,xyz_20221107070124_2beb5d90-24c7-4b1b-b977-3c9af1397195,userid=dummy","priority":0,"startedtime":1667822485626,"launchtime":1667822485767,"finishedtime":1667822553365,"elapsedtime":67739,"amcontainerlogs":"http://dingdong:8042/node/containerlogs/container_e65_282828282828_12717_01_000001/xyz","amhosthttpaddress":"dingdong:8042","amrpcaddress":"dingdong:46457","masternodeid":"dingdong:8041","allocatedmb":-1,"allocatedvcores":-1,"reservedmb":-1,"reservedvcores":-1,"runningcontainers":-1,"memoryseconds":1264304,"vcoreseconds":79,"queueusagepercentage":0.0,"clusterusagepercentage":0.0,"resourcesecondsmap":{"entry":{"key":"memory-mb","value":"1264304"},"entry":{"key":"vcores","value":"79"}},"preemptedresourcemb":0,"preemptedresourcevcores":0,"numnonamcontainerpreempted":0,"numamcontainerpreempted":0,"preemptedmemoryseconds":0,"preemptedvcoreseconds":0,"preemptedresourcesecondsmap":{},"logaggregationstatus":"succeeded","unmanagedapplication":false,"amnodelabelexpression":"","timeouts":{"timeout":[{"type":"lifetime","expirytime":"unlimited","remainingtimeinseconds":-1}]}},{"id":"application_282828282828_12724","user":"xyz","name":"xyz-94962a3e-d230-4fd0-b68b-01b59dd3299d","queue":"root.users.dummy","state":"finished","finalstatus":"succeeded","progress":100.0,"trackingui":"history","trackingurl":"http://dang:8088/proxy/application_282828282828_12724/","diagnostics":"session stats:submitteddags=1, successfuldags=1, faileddags=0, killeddags=0\n","clusterid":282828282828,"applicationtype":"aquaman","applicationtags":"ZZZ_,xyz_20221107070301_e6f788db-e39c-49b6-97d5-6a02ff994c00,userid=dummy","priority":0,"startedtime":1667822585231,"launchtime":1667822585437,"finishedtime":1667822631435,"elapsedtime":46204,"amcontainerlogs":"http://ding:8042/node/containerlogs/container_e65_282828282828_12724_01_000002/xyz","amhosthttpaddress":"ding:8042","amrpcaddress":"ding:46648","masternodeid":"ding:8041","allocatedmb":-1,"allocatedvcores":-1,"reservedmb":-1,"reservedvcores":-1,"runningcontainers":-1,"memoryseconds":5603339,"vcoreseconds":430,"queueusagepercentage":0.0,"clusterusagepercentage":0.0,"resourcesecondsmap":{"entry":{"key":"memory-mb","value":"5603339"},"entry":{"key":"vcores","value":"430"}},"preemptedresourcemb":0,"preemptedresourcevcores":0,"numnonamcontainerpreempted":0,"numamcontainerpreempted":0,"preemptedmemoryseconds":0,"preemptedvcoreseconds":0,"preemptedresourcesecondsmap":{},"logaggregationstatus":"time_out","unmanagedapplication":false,"amnodelabelexpression":"","timeouts":{"timeout":[{"type":"lifetime","expirytime":"unlimited","remainingtimeinseconds":-1}]}},{"id":"application_282828282828_12736","user":"xyz","name":"xyz-1a9c73ef-2992-40a5-aaad-9f0688bb04f4","queue":"root.users.dummy","state":"finished","finalstatus":"succeeded","progress":100.0,"trackingui":"history","trackingurl":"http://dang:8088/proxy/application_282828282828_12736/","diagnostics":"session stats:submitteddags=1, successfuldags=1, faileddags=0, killeddags=0\n","clusterid":282828282828,"applicationtype":"aquaman","applicationtags":"BLAHBLAH,xyz_20221107070609_8d261352-3efa-46c5-a5a0-8a3cd745d180,userid=dummy","priority":0,"startedtime":1667822771170,"launchtime":1667822773663,"finishedtime":1667822820351,"elapsedtime":49181,"amcontainerlogs":"http://dong:8042/node/containerlogs/container_e65_282828282828_12736_01_000001/xyz","amhosthttpaddress":"dong:8042","amrpcaddress":"dong:34266","masternodeid":"dong:8041","allocatedmb":-1,"allocatedvcores":-1,"reservedmb":-1,"reservedvcores":-1,"runningcontainers":-1,"memoryseconds":1300011,"vcoreseconds":89,"queueusagepercentage":0.0,"clusterusagepercentage":0.0,"resourcesecondsmap":{"entry":{"key":"memory-mb","value":"1300011"},"entry":{"key":"vcores","value":"89"}},"preemptedresourcemb":0,"preemptedresourcevcores":0,"numnonamcontainerpreempted":0,"numamcontainerpreempted":0,"preemptedmemoryseconds":0,"preemptedvcoreseconds":0,"preemptedresourcesecondsmap":{},"logaggregationstatus":"succeeded","unmanagedapplication":false,"amnodelabelexpression":"","timeouts":{"timeout":[{"type":"lifetime","expirytime":"unlimited","remainingtimeinseconds":-1}]}},{"id":"application_282828282828_12735","user":"xyz","name":"xyz-d5f56a0a-9c6b-4651-8f88-6eaff5953777","queue":"root.users.dummy","state":"finished","finalstatus":"succeeded","progress":100.0,"trackingui":"history","trackingurl":"http://dang:8088/proxy/application_282828282828_12735/","diagnostics":"session stats:submitteddags=1, successfuldags=1, faileddags=0, killeddags=0\n","clusterid":282828282828,"applicationtype":"aquaman","applicationtags":"HAHAHA_,xyz_20221107070605_a082d9d8-912f-4278-a2ef-5dfe66089fd7,userid=dummy","priority":0,"startedtime":1667822766897,"launchtime":1667822766999,"finishedtime":1667822796759,"elapsedtime":29862,"amcontainerlogs":"http://dung:8042/node/containerlogs/container_e65_282828282828_12735_01_000001/xyz","amhosthttpaddress":"dung:8042","amrpcaddress":"dung:42765","masternodeid":"dung:8041","allocatedmb":-1,"allocatedvcores":-1,"reservedmb":-1,"reservedvcores":-1,"runningcontainers":-1,"memoryseconds":669695,"vcoreseconds":44,"queueusagepercentage":0.0,"clusterusagepercentage":0.0,"resourcesecondsmap":{"entry":{"key":"memory-mb","value":"669695"},"entry":{"key":"vcores","value":"44"}},"preemptedresourcemb":0,"preemptedresourcevcores":0,"numnonamcontainerpreempted":0,"numamcontainerpreempted":0,"preemptedmemoryseconds":0,"preemptedvcoreseconds":0,"preemptedresourcesecondsmap":{},"logaggregationstatus":"succeeded","unmanagedapplication":false,"amnodelabelexpression":"","timeouts":{"timeout":[{"type":"lifetime","expirytime":"unlimited","remainingtimeinseconds":-1}]}}]}}
sample query output :
id | queue | finalStatus | trackingurl |....
-----------------------------------------------------------
application_282828282828_12717 | root.users.dummy | succeeded | ...
application_282828282828_12724 | root.users.dummy2 | failed | ....
For anyone looking to perform something similar ,I found this article very helpful with clear explanation: https://community.cloudera.com/t5/Support-Questions/Complex-Json-transformation-using-Hive-functions/m-p/236476
Below is the query to parse using LATERAL VIEW EXPLODE in case people on the same boat:
select ex1.* from user_tables.sample_json_table cym LATERAL VIEW OUTER inline(cym.apps.app) ex1;
| id | queue | finalstatus | trackingurl | applicationtype | applicationtags | startedtime | launchtime | finishedtime | memoryseconds | vcoreseconds | resourcesecondsmap |
| ------------------------------- | ----------------- | ----------- | ------------------------------------------------------- | --------------- | --------------------------------------------------------------------------------------- | ------------- | ------------- | ------------- | ------------- | ------------ | ---------------------------------------- |
| application_1667627410794_12717 | root.users.dummy2 | succeeded | http://dang:8088/proxy/application_1667627410794_12717/ | tez | \_xyz,test-app-24c7-4b1b-b977-3c9af1397195,userid=dummy1 | 1667822485626 | 1667822485767 | 1667822553365 | 1264304 | 79 | {"entry":{"key":"vcores","value":"79"}} |
| application_1667627410794_12724 | root.users.dummy3 | succeeded | http://dang:8088/proxy/application_1667627410794_12724/ | tez | \_generate_stuff,hive_20221107070301_e6f788db-e39c-49b6-97d5-6a02ff994c00,userid=dummy3 | 1667822585231 | 1667822585437 | 1667822631435 | 5603339 | 430 | {"entry":{"key":"vcores","value":"430"}} |
| application_1667627410794_12736 | root.users.dummy1 | succeeded | http://dang:8088/proxy/application_1667627410794_12736/ | tez | \_sample_job,test-zzz-3efa-46c5-a5a0-8a3cd745d180,userid=dummy1 | 1667822771170 | 1667822773663 | 1667822820351 | 1300011 | 89 | {"entry":{"key":"vcores","value":"89"}} |
| application_1667627410794_12735 | root.users.dummy2 | succeeded | http://dang:8088/proxy/application_1667627410794_12735/ | tez | \_mixed_article,placebo_2-912f-4278-a2ef-5dfe66089fd7,userid=dummy2 | 1667822766897 | 1667822766999 | 1667822796759 | 669695 | 44 | {"entry":{"key":"vcores","value":"44"}} |
Add. Note: Although my requirement no longer needs it, but If anyone can suggest how to further parse the last field resourcesecondsmap to populate map key value would be great to know! basically use key value as field and value as actual value in field:
Desired Output:
| id | queue | finalstatus | trackingurl | applicationtype | applicationtags | startedtime | launchtime | finishedtime | memoryseconds | vcoreseconds | vcores-value |
| ------------------------------- | ----------------- | ----------- | ------------------------------------------------------- | --------------- | --------------------------------------------------------------------------------------- | ------------- | ------------- | ------------- | ------------- | ------------ | ------------ |
| application_1667627410794_12717 | root.users.dummy2 | succeeded | http://dang:8088/proxy/application_1667627410794_12717/ | tez | \_xyz,test-app-24c7-4b1b-b977-3c9af1397195,userid=dummy1 | 1667822485626 | 1667822485767 | 1667822553365 | 1264304 | 79 | 79 |
| application_1667627410794_12724 | root.users.dummy3 | succeeded | http://dang:8088/proxy/application_1667627410794_12724/ | tez | \_generate_stuff,hive_20221107070301_e6f788db-e39c-49b6-97d5-6a02ff994c00,userid=dummy3 | 1667822585231 | 1667822585437 | 1667822631435 | 5603339 | 430 | 430 |
I run same query in 2 environnements with huge performance différence : 0.015 sec vs 25sec.
Exlain plan :
+------+-------------+---------------+--------+------------------------------------+---------+---------+---------------------------------------------------------------------------------------------------------------------------+------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+---------------+--------+------------------------------------+---------+---------+---------------------------------------------------------------------------------------------------------------------------+------+----------+---------------------------------+
| 1 | SIMPLE | company1_ | const | PRIMARY | PRIMARY | 152 | const | 1 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | user2_ | ref | PRIMARY | PRIMARY | 152 | const | 1032 | 100.00 | Using where |
| 1 | SIMPLE | vacationpr5_ | eq_ref | PRIMARY | PRIMARY | 304 | user2_.ID_COMPANY_VACATION_PROFILE,.user2_.ID_VACATION_PROFILE | 1 | 100.00 | Using index |
| 1 | SIMPLE | vacationac0_ | ref | PRIMARY,I_VACATION_ACCUMULATION_EA | PRIMARY | 304 | const,.user2_.ID_USER | 4 | 100.00 | Using where |
| 1 | SIMPLE | vacationty3_ | eq_ref | PRIMARY | PRIMARY | 304 | const,.vacationac0_.ID_VACATION_TYPE | 1 | 100.00 | Using where |
| 1 | SIMPLE | vacationst6_ | eq_ref | PRIMARY | PRIMARY | 608 | user2_.ID_COMPANY_VACATION_PROFILE,.user2_.ID_VACATION_PROFILE,const,.vacationac0_.ID_VACATION_TYPE | 1 | 100.00 | Using where |
| 1 | SIMPLE | translatio9_ | eq_ref | PRIMARY | PRIMARY | 919 | vacationty3_.ID_COMPANY_TRANSLATION,.vacationty3_.ID_TRANSLATION | 1 | 100.00 | Using index |
| 1 | SIMPLE | descriptio10_ | eq_ref | PRIMARY, | PRIMARY | 951 | vacationty3_.ID_COMPANY_TRANSLATION,.vacationty3_.ID_TRANSLATION,const | 1 | 100.00 | Using where |
| 1 | SIMPLE | listvalue4_ | ALL | NULL | NULL | NULL | NULL | 5284 | 100.00 | Using where |
| 1 | SIMPLE | translatio7_ | eq_ref | PRIMARY | PRIMARY | 919 | listvalue4_.ID_COMPANY_TRANSLATION,.listvalue4_.ID_TRANSLATION | 1 | 100.00 | Using index |
| 1 | SIMPLE | descriptio8_ | eq_ref | PRIMARY | PRIMARY | 951 | listvalue4_.ID_COMPANY_TRANSLATION,.listvalue4_.ID_TRANSLATION,const | 1 | 100.00 | Using where |
+------+-------------+---------------+--------+------------------------------------+---------+---------+---------------------------------------------------------------------------------------------------------------------------+------+----------+---------------------------------+
next explain plan :
+------+-------------+---------------+--------+------------------------------------+---------+---------+---------------------------------------------------------------------------------------------------------------------------------------+------+----------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+---------------+--------+------------------------------------+---------+---------+---------------------------------------------------------------------------------------------------------------------------------------+------+----------+-------------------------------------------------+
| 1 | SIMPLE | company1_ | const | PRIMARY | PRIMARY | 152 | const | 1 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | user2_ | ref | PRIMARY, | PRIMARY | 152 | const | 1050 | 100.00 | Using where |
| 1 | SIMPLE | vacationpr5_ | eq_ref | PRIMARY | PRIMARY | 304 | validation2.user2_.ID_COMPANY_VACATION_PROFILE,validation2.user2_.ID_VACATION_PROFILE | 1 | 100.00 | Using index |
| 1 | SIMPLE | vacationac0_ | ref | PRIMARY,I_VACATION_ACCUMULATION_EA | PRIMARY | 304 | const,validation2.user2_.ID_USER | 5 | 100.00 | Using where |
| 1 | SIMPLE | vacationty3_ | eq_ref | PRIMARY | PRIMARY | 304 | const,validation2.vacationac0_.ID_VACATION_TYPE | 1 | 100.00 | Using where |
| 1 | SIMPLE | vacationst6_ | eq_ref | PRIMARY | PRIMARY | 608 | validation2.user2_.ID_COMPANY_VACATION_PROFILE,validation2.user2_.ID_VACATION_PROFILE,const,validation2.vacationac0_.ID_VACATION_TYPE | 1 | 100.00 | Using where |
| 1 | SIMPLE | translatio9_ | eq_ref | PRIMARY | PRIMARY | 919 | validation2.vacationty3_.ID_COMPANY_TRANSLATION,validation2.vacationty3_.ID_TRANSLATION | 1 | 100.00 | Using index |
| 1 | SIMPLE | descriptio10_ | eq_ref | PRIMARY, | PRIMARY | 951 | validation2.vacationty3_.ID_COMPANY_TRANSLATION,validation2.vacationty3_.ID_TRANSLATION,const | 1 | 100.00 | Using where |
| 1 | SIMPLE | listvalue4_ | ALL | NULL | NULL | NULL | NULL | 5282 | 100.00 | Using where; Using join buffer (flat, BNL join) |
| 1 | SIMPLE | translatio7_ | eq_ref | PRIMARY | PRIMARY | 919 | validation2.listvalue4_.ID_COMPANY_TRANSLATION,validation2.listvalue4_.ID_TRANSLATION | 1 | 100.00 | Using index |
| 1 | SIMPLE | descriptio8_ | eq_ref | PRIMARY, | PRIMARY | 951 | validation2.listvalue4_.ID_COMPANY_TRANSLATION,validation2.listvalue4_.ID_TRANSLATION,const | 1 | 100.00 | Using where |
+------+-------------+---------------+--------+------------------------------------+---------+---------+---------------------------------------------------------------------------------------------------------------------------------------+------+----------+-------------------------------------------------+
How I can force to use join buffer (flat, BNL join) the first environment is the production one and has more memory and CPU.
In first environment :
join_buffer_size............ 16777216
join_buffer_space_limit..... 2097152
In second environment :
join_buffer_size............ 262144
join_buffer_space_limit..... 2097152
Is there any link/ratio between join_buffer_size and join_buffer_space_limit?
We configure 16Mo on join_buffer_size because it is a mysqlTuner hint.
I set join_buffer_space_limit at 128Mo and it resolves performance issue.
So mysqlTuner doesn't give hint for this configuration key.
SET GLOBAL join_buffer_space_limit = 1024 * 1024 * 128;
It takes time (hour) to improve performances.
https://mariadb.com/kb/en/library/multi-range-read-optimization/