Explode function returning single row - hadoop

The field type is Array. "select col from sample_table" returns the output below.
["[-80.86598534884,35.53423185253291],[-80.86598789514547,35.53423048990488],[-80.86598794307857,35.53423046392442]"]
When I run
select explode(col)
from sample_table
I get the output below, which is a single row.
[-80.86598534884,35.53423185253291],[-80.86598789514547,35.53423048990488],[-80.86598794307857,35.53423046392442]
I want the output in 3 rows as below.
[-80.86598534884655,35.53423185253291]
[-80.86598789514547,35.53423048990488]
[-80.86598794307857,35.53423046392442]
According to the Hive tutorial, the explode function should return multiple rows, but I don't see that happening.

The input you have given appears to be an array with only one element: a single string that contains all three coordinate pairs. The explode function treats it as an array of size one and therefore returns the result in a single row.
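A rough Python analogue of the situation (not Hive code; the variable names are only illustrative) shows why one row comes back: the array holds one string, so there is only one element to emit. Parsing that string into separate pairs first gives three elements. In Hive, the analogous fix would presumably be to split the string into separate array elements before calling explode.

```python
import json

# The column value from the question: an array with ONE string element.
col = ["[-80.86598534884,35.53423185253291],[-80.86598789514547,35.53423048990488],[-80.86598794307857,35.53423046392442]"]

# "Exploding" an array yields one row per element, so here: one row.
exploded_rows = list(col)
print(len(exploded_rows))

# Wrapping the string in brackets makes it a valid JSON list of pairs,
# which can then be exploded into one row per coordinate pair.
points = json.loads("[" + col[0] + "]")
for point in points:
    print(point)
```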

How do I sort a data range but only return one column? (Sheets)

I have a data range in Google Sheets where I want to sort the data by column B, but only return column A. If it matters, column A is a string, column B is integers.
Using =SORT(A1:B10,2,FALSE) returns both columns A and B, sorted by column B...but I only want it to return column A.
I've also tried:
=QUERY((SORT(A1:B10,2,FALSE)),"select *") <- does exactly the same as sort, tried just for testing
=QUERY((SORT(A1:B10,2,FALSE)),"select col1") <- #value error
=QUERY((SORT(A1:B10,2,FALSE)),"select A") <- #value error (also tried "select A:A" and "select A1:A10")
=QUERY((SORT(A1:B10,2,FALSE)),"select Stat") <- #value error
I've also tried all of the above, but starting with =QUERY(A1:B10,SORT(...
Am I using QUERY wrong? Is SORT not what I want? I could just use SORT in a hidden part of the sheet, then reference the column I want but that feels cheaty, I want to know if there's a way to do what I want to do.
In SORT, the first argument is the range you want returned, the second is the column to sort by, and the third says whether the order is ascending. You can add further sort columns the same way; they do not need to be part of the returned range or adjacent to it, but they must have the same number of rows. Try this:
=SORT(A1:A10,B1:B10,FALSE)
Alternatively, sort the whole range and keep only its first column with INDEX:
=INDEX(SORT(A1:B10,2,FALSE),,1)

How to use DWItemStatus in PowerBuilder

I'm learning PowerBuilder, and I don't know how to use these: DWItemStatus, GetNextModified, ModifiedCount, GetItemStatus, NotModified!, DataModified!, New!, NewModified!
Please help me.
Thanks for reading!
These relate to the status of rows in a datawindow. Generally the rows are retrieved from a database but this doesn't always have to be the case - data can be imported from a text file, XML, JSON, etc. as well.
DWItemStatus - these values are constants and describe how the data would be changed in the database.
Values are:
NotModified! - data unchanged since retrieved
DataModified! - data in one or more columns has changed
New! - row is new but no values have been assigned
NewModified! - row is new and at least one value has been assigned to a column.
So in terms of SQL, a row which is not modified would not generate any SQL to the DBMS. A DataModified row would typically generate an UPDATE statement. A NewModified row would typically generate an INSERT statement; a row that is still New generally generates no SQL, since no values have been assigned to it yet.
GetNextModified is a method to search a set of rows in a datawindow to find the modified rows within that set. The method takes a row parameter (the row after which the search starts) and a buffer parameter. The datawindow buffers are Primary!, Filter!, and Delete!. In general you would only look at the Primary buffer.
ModifiedCount is a method to determine the number of rows which have been modified in a datawindow. Note that deleting a row is not considered a modification. To find the number of rows deleted use the DeletedCount method.
GetItemStatus is a method to get the status of a column within a row in a data set in a datawindow. It takes the parameters row, column (name or number), and DWBuffer.
So now an example of using this:
// loop through rows checking for changes
IF dw_dash.ModifiedCount() > 0 THEN
    ldw = dw_dash
    ll = ldw.GetNextModified(0, Primary!)
    DO WHILE ll > 0
        // watch value changed
        IF ldw.GetItemStatus(ll, 'watch', Primary!) = DataModified! THEN
            event we_post_item(ll, 'watch', ldw)
        END IF
        // followup value changed
        IF ldw.GetItemStatus(ll, 'followupdate', Primary!) = DataModified! THEN
            event we_post_item(ll, 'followupdate', ldw)
        END IF
        ll = ldw.GetNextModified(ll, Primary!)
    LOOP
    ldw.ResetUpdate() // reset the modified flags
END IF
In this example we first check to see if any row in the datawindow has been modified. Then we get the first modified row and check if either the 'watch' or 'followupdate' column was changed. If so, we trigger an event to do something. We then move on to the next modified row and so on. Finally we reset the modified flags so the rows now show as not being modified.

pymysql-How to return the results of a query in the form of a list of tuples

Let's say I have this code:
sql_query = "select actor.actor_id from actor where actor=%s"
cursor.execute(sql_query, (actorID,))
result = cursor.fetchall()
return result
What should I do to my code so the results are in the form of a list of tuples? Also, I want the first tuple to contain the column names of my query.
For example: [("Name", "Id"), ("Jim", 7), ("Tom", 13)]
Here is an example:
cur.execute('''SELECT * FROM patient_login''')
results = cur.fetchall()
nested_tuple_list = []
for result in results:
    nested_tuple_list.append(result)
print(nested_tuple_list)
As you can see, we run our select statement with cur.execute(). We fetch all of the results and store them in the variable results. We then loop over them; each result is a tuple representing one row in our DB, and we append it to the end of the list. When we print the list, here is the output:
[(4, 'sikudabo', 'monkey1'), (83, 'sikudabo2', 'monkey2')]
We end up with a list of tuples.
Here is another answer that simply grabs the column names and stores them in a nested tuple:
cur.execute('''DESCRIBE patient_login''')
results = cur.fetchall()
nested_tuple_list = []
nested_tuple_list_2 = []
for result in results:
    # the first field of each DESCRIBE row is the column name
    nested_tuple_list.append(result[0])
nested_tuple_list = tuple(nested_tuple_list)
nested_tuple_list_2.append(nested_tuple_list)
print(nested_tuple_list_2)
The DESCRIBE command in MySQL describes the table by telling you which columns exist within it, along with characteristics of each column such as its datatype and whether it is part of the primary key. Fetching its results returns one tuple per column. In the for loop we take the first field of each of these tuples, which is the column name, and append it to the first empty list. After we have appended all of the column names, we change the list to a tuple. We then append that tuple to the second list, so the list contains a single tuple holding all of the column names. Here is the output:
[('ID', 'username', 'password')]
If you want each element in the list to be a tuple in and of itself, here is the code:
nested_tuple_list = tuple(nested_tuple_list)
nested_tuple_list_2 = [(x,) for x in nested_tuple_list]
print(nested_tuple_list_2)
We can do a list comprehension with x representing each column name, and here is the output for the columns in my table:
[('ID',), ('username',), ('password',)]
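To get the column names and the rows from the same query, without a separate DESCRIBE, one option is the cursor's description attribute, which is part of the Python DB-API and is also exposed by pymysql cursors. The sketch below uses the standard-library sqlite3 module as a stand-in so it runs without a MySQL server; the actor table, its data and the rows_with_header helper are made up for illustration.

```python
import sqlite3


def rows_with_header(cursor, sql, params=()):
    """Run a query and return [column-name tuple, row tuple, row tuple, ...]."""
    cursor.execute(sql, params)
    # Each entry of cursor.description is a sequence whose first item
    # is the column name.
    header = tuple(column[0] for column in cursor.description)
    return [header] + [tuple(row) for row in cursor.fetchall()]


# Stand-in database (hypothetical table and data).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE actor (name TEXT, id INTEGER)")
cur.executemany("INSERT INTO actor VALUES (?, ?)", [("Jim", 7), ("Tom", 13)])

result = rows_with_header(cur, "SELECT name, id FROM actor ORDER BY id")
print(result)  # [('name', 'id'), ('Jim', 7), ('Tom', 13)]
```

With pymysql the helper should work unchanged on a pymysql cursor, except that placeholders in the SQL are written as %s rather than ?.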

Hashing methodology for collection of strings and integer ranges

I have data like the following:
I need to match the content with the input provided for the Content and Range fields and return the matching rows. As you can see, the Content field is a collection of strings and the Range field is a range between two numbers. I am looking at hashing the data so it can be matched against the hashed input. For the Content field, I was thinking about iterating through the collection and storing the hashcode of each individual string. For the Range field, I was looking at using interval trees. The challenge is this: once I hash the Content input and the Range input, how do I find out whether that hashcode is present among the hashcodes generated for the collection of strings in the Content field, and likewise for the Range field?
Please let me know if there are any alternative ways in which this can be achieved. Thanks.
There is a simple solution to your problem: Inverted Index.
For each item in content, create the inverted index that maps 'Content' to 'RowID', i.e. create another table of 2 columns viz. Content(string), RowIDs(comma separated strings).
For your first row, add the entries {Azd, 1}, {Zax, 1}, {Gfd, 1}..., {Mni, 1} in that table. For the second row, add entries for new Content strings. For the Content string already present in the first row ('Gfd', for example), just append the new row id to the entry you created for first row. So, Gfd's row will look like {Gfd, 1,2}.
When done processing, you will have the table that will have 'Content' strings mapped to all the rows in which this content string is present.
Do the same inverted indexing to map 'Range' to 'RowID', creating another table with the columns Range(int), RowIDs(comma separated strings).
Now you will have a table whose rows tell you which row ids each range value is present in.
Finally, for each query that you have to process, get the corresponding Content and Range rows from the inverted index tables and take the intersection of their comma separated lists. You will get your answer.
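A minimal Python sketch of the two inverted indexes, under some assumptions: the rows below are hypothetical (only the strings Azd, Zax, Gfd and Mni come from the description above), sets of row ids stand in for the comma separated strings, and each range is expanded into its individual integers, which is only practical when the ranges are small.

```python
from collections import defaultdict

# Hypothetical data: row id -> (content strings, inclusive integer range).
rows = {
    1: ({"Azd", "Zax", "Gfd", "Mni"}, (1, 5)),
    2: ({"Gfd", "Qwe"}, (4, 8)),
}

content_index = defaultdict(set)  # content string -> row ids containing it
range_index = defaultdict(set)    # integer -> row ids whose range covers it

for row_id, (contents, (low, high)) in rows.items():
    for item in contents:
        content_index[item].add(row_id)
    for value in range(low, high + 1):
        range_index[value].add(row_id)


def match(content, value):
    """Return the sorted row ids that contain `content` and cover `value`."""
    # .get avoids inserting empty entries into the defaultdicts on a miss.
    hits = content_index.get(content, set()) & range_index.get(value, set())
    return sorted(hits)


print(match("Gfd", 4))  # [1, 2]
print(match("Gfd", 7))  # [2]
print(match("Azd", 7))  # []
```

For wide ranges, the interval tree mentioned in the question could replace range_index while the intersection step stays the same.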

Using XPath to find rows where a specific column has value

I'm having trouble using XPath to find a row in a table where a specific column contains a value. The table has 10 columns where 2 of them will show Yes|No but I'm only interested in finding the value in one of the columns (the 4th one). My initial attempt was this:
//table[@id='myTable']/tbody/tr/td[text() = 'Yes']
but it finds cells from both columns. I thought I could try something like this, but it's not a valid expression:
//table[@id='myTable']/tbody/tr/td[4]/text()='Yes'
Any suggestions? Thanks.
You can try it this way:
//table[@id='myTable']/tbody/tr[td[4][. = 'Yes']]
This XPath returns each row (tr) whose fourth td child has the value "Yes".
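For a quick check in Python, here is a sketch against a made-up five-column table. The standard-library xml.etree.ElementTree module supports only a subset of XPath and, as far as I know, does not accept the nested predicate above, so the sketch uses an equivalent form: select the fourth td, keep it if its text is 'Yes', then step up to the parent row with '..'. A full XPath engine such as lxml should accept the expression as written.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: columns 2 and 4 both hold Yes/No values.
html = """
<html><body><table id="myTable"><tbody>
<tr><td>a</td><td>Yes</td><td>x</td><td>No</td><td>z</td></tr>
<tr><td>b</td><td>No</td><td>x</td><td>Yes</td><td>z</td></tr>
<tr><td>c</td><td>Yes</td><td>x</td><td>Yes</td><td>z</td></tr>
</tbody></table></body></html>
"""

root = ET.fromstring(html)

# Fourth td whose text is 'Yes', then '..' to move up to its tr.
rows = root.findall(".//table[@id='myTable']/tbody/tr/td[4][.='Yes']/..")

# Identify each matching row by the text of its first cell.
matched = [row.find("td").text for row in rows]
print(matched)  # ['b', 'c']
```

Row a has "Yes" only in its second column, so it is not matched.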
