Combining tab delimited files based on a column value - bash

Suppose I have two tab-delimited files that share a column. Both files have a header line that gives a label to each column. What's an easy way to take the union of the two tables, i.e. take the columns from A and B, but do so according to the value of column K?
For example, table A might be:
employee_id name
123 john
124 mary
and table B might be:
employee_id age
124 18
123 22
Then the union based on column 1 ("employee_id") should yield the table:
employee_id name age
123 john 22
124 mary 18
I'd like to do this using Unix utilities like cut. How can this be done?

You can use the join utility, but your files need to be sorted first:
join file1 file2
See man join for more information.
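For the sample tables above (recreated here under the hypothetical names tableA.txt and tableB.txt), one way to keep the header on top is to join the header lines and the sorted bodies separately:

```shell
# Recreate the sample tables (tab-delimited, header first).
printf 'employee_id\tname\n123\tjohn\n124\tmary\n' > tableA.txt
printf 'employee_id\tage\n124\t18\n123\t22\n' > tableB.txt

# Join the header lines, then join the sorted bodies on column 1.
join -t $'\t' <(head -n 1 tableA.txt) <(head -n 1 tableB.txt) > joined.txt
join -t $'\t' <(tail -n +2 tableA.txt | sort) <(tail -n +2 tableB.txt | sort) >> joined.txt
cat joined.txt
```

The `<(...)` process substitution is a bash feature, so run this under bash rather than plain sh.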

Here's a start; I'll leave you to format the headers as needed. (Note that NR>1 skips only the first file's header line, since NR keeps counting across files, which is why the second header leaks into the output below.)
$ awk 'NR>1{a[$1]=a[$1]" "$2}END{for(i in a)print a[i],i}' tableA.txt tableB.txt
age employee_id
john 22 123
mary 18 124
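A small tweak on that awk: FNR resets per file, so FNR>1 skips both headers, and printing the key first plus piping through sort makes the order deterministic (for (i in a) iterates in unspecified order). File names are the hypothetical ones from above:

```shell
# Recreate the sample tables (tab-delimited, header first).
printf 'employee_id\tname\n123\tjohn\n124\tmary\n' > tableA.txt
printf 'employee_id\tage\n124\t18\n123\t22\n' > tableB.txt

# FNR>1 skips the header of *each* file; a[$1] accumulates the second column.
awk 'FNR>1 {a[$1] = a[$1] " " $2} END {for (i in a) print i a[i]}' tableA.txt tableB.txt | sort
```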
Another way:
$ join <(sort tableA.txt) <(sort tableB.txt)
123 john 22
124 mary 18
employee_id name age
Note that sorting pushes the header line below the data here; experiment with the join options as needed (see the man or info page).

Try:
paste file1 file2 > file3
Note that paste pairs lines by position rather than by key, so this only works if both files list the keys in the same order, and it keeps the duplicate key column.
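paste matches lines purely by position, not by key value. With the sample tables from the question (recreated under hypothetical file names), the rows pair up incorrectly because the IDs appear in different orders:

```shell
# Recreate the sample tables (tab-delimited, header first).
printf 'employee_id\tname\n123\tjohn\n124\tmary\n' > tableA.txt
printf 'employee_id\tage\n124\t18\n123\t22\n' > tableB.txt

# Line 2 of A is glued to line 2 of B regardless of the key values:
paste tableA.txt tableB.txt
```

The second output row is "123 john 124 18", i.e. john's row carries mary's age.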

Related

Generating new table in Power BI dynamically from another table

I am working with Power BI, where I have a table with a column of names, each with a date attached. I need to modify this table to create extra rows giving two further dates for each name, so that each name has 3 rows: the original date, the day before, and the day after.
For example if I had
Person | Date
Luke | 2021-06-01
Adam | 2021-05-12
Ben | 2021-04-28
This would be modified to be
Person | Date
Luke | 2021-05-31
Luke | 2021-06-01
Luke | 2021-06-02
Adam | 2021-05-11
Adam | 2021-05-12
Adam | 2021-05-13
Ben | 2021-04-27
Ben | 2021-04-28
Ben | 2021-04-29
The dataset I have is many thousands of names. Does anyone know how to create the output in a new table?
One way to achieve that is to add a couple of custom columns, named PrevDate and NextDate for example, with formulas along the lines of Date.AddDays([Date], -1) and Date.AddDays([Date], 1).
This will give you all 3 dates per person, but in 3 separate columns.
To combine them into a single column, select all three date columns and click Transform -> Unpivot Columns.
If you want, you can delete the Attribute column if it is not needed.

Uniq a column and print out number of rows in that column

I have a file, with header
name, age, id, address
Smith, 18, 201392, 19 Rand Street, USA
Dan, 19, 029123, 23 Lambert Rd, Australia
Smith, 20, 192837, 61 Apple Rd, UK
Kyle, 25, 245123, 103 Orange Rd, UK
And I'd like to sort out duplicates on names, so the result will be:
Smith, 18, 201392, 19 Rand Street, USA
Dan, 19, 029123, 23 Lambert Rd, Australia
Kyle, 25, 245123, 103 Orange Rd, UK
# prints 3 for 3 unique rows at column name
I've tried sort -u -t, -k1,1 file and awk -F"," '!_[$1]++' file, but they don't work because I have commas in my address field.
Well, you changed the functionality since the original post, but this should get you the unique names in your file (assuming it's named data), unsorted:
#!/bin/bash
sed "1 d" data | awk -F"," '!_[$1]++ { print $1 }'
If you need to sort, append | sort to the command line above.
And append | wc -l to the command line to count lines.
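Putting it together (the file name data and the sample rows are taken from the question): the commas inside the address don't actually matter here, because only the first comma is needed to isolate $1, the name. So deduplicating on the name and counting can be done with awk alone:

```shell
# Sample data from the question.
cat > data <<'EOF'
name, age, id, address
Smith, 18, 201392, 19 Rand Street, USA
Dan, 19, 029123, 23 Lambert Rd, Australia
Smith, 20, 192837, 61 Apple Rd, UK
Kyle, 25, 245123, 103 Orange Rd, UK
EOF

# NR>1 skips the header; !seen[$1]++ keeps the first full row per name.
awk -F',' 'NR>1 && !seen[$1]++' data
# Count the unique names:
awk -F',' 'NR>1 && !seen[$1]++' data | wc -l
```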

Powerquery - appending the same table to itself using differing columns

So I have a list of properties and a list of the next four servicing dates
e.g:
Property| Last | Next1 | Next2 | Next3 | Next4 |
123 Road| 01-2019 |03-2019| 05-2019| 07-2019| 09-2019|
444 Str | 01-2019 |07-2019| 01-2020| 07-2020| 01-2021|
etc.
I want to see:
Property | Date
123 Road | 01-2019
444 Str | 01-2019
123 Road | 03-2019
123 Road | 05-2019
123 Road | 07-2019
444 Str | 07-2019
etc.
In SQL this would be a UNION; in Power Query I think it's an append, but I'm not sure how to go about it, i.e. how to select columns from a table and then append a table with a different selection. I can append the full table easily, but not just certain columns.
Select the date columns and do Transform > Unpivot Columns.
Then you can rename the Value column to Date, remove the Attribute column if you want, and sort as desired.

Store result of query in an array in shell scripting

I want to store the rows returned by a query in an array in a Unix shell script.
I tried this :
array=`sqlplus -s $DB <<eof
select customer_id from customer;
eof`;
When I try to print it, it shows this result:
echo ${array[0]};
CUSTOMER_ID ----------- 1 2 51 52 101 102 103 104 105 106 108 11 rows selected.
But I want to store each row as an element by excluding column_name and that "11 rows selected" sentence.
Thanks in Advance.
To create an array you need this syntax in BASH:
array=($(command))
or:
declare -a array=($(command))
For your sqlplus command:
array=($(sqlplus -s "$DB"<<eof
SET PAGESIZE 0;
select customer_id from customer;
eof))
and then print it as:
printf "%s\n" "${array[@]}"
Just note that the unquoted $(...) expansion is subject to word splitting on whitespace and to glob expansion.
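If your bash is 4.0 or newer, mapfile (a.k.a. readarray) avoids the word-splitting and glob issues entirely, since it splits on newlines only. sqlplus isn't needed to demonstrate the capture itself; here its output is simulated with printf:

```shell
# Simulated query output, one customer_id per line
# (with sqlplus you would feed it the real command via process substitution).
mapfile -t array < <(printf '1\n2\n51\n52\n101\n')

printf '%s\n' "${array[@]}"   # one element per line
echo "${#array[@]}"           # number of rows captured: 5
```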

shell script inserting "$" into a formatted column and adding new column

Hi guys, pardon my bad English. I managed to display my data neatly using the column program in the code below. But how do I add a "$" to the Price column? And secondly, how do I add a new total column (Price * Sold) and display it with "$" as well?
(echo "Title:Author:Price:Quantity:Sold" && cat BookDB.txt) | column -s: -t
Output:
Title Author Price Quantity Sold
The Godfather Mario Puzo 21.50 50 20
The Hobbit J.R.R Tolkien 40.50 50 10
Romeo and Juliet William Shakespeare 102.80 200 100
The Chronicles of Narnia C.S.Lewis 35.90 80 15
Lord of the Flies William Golding 29.80 125 25
Memories of a Geisha Arthur Golden 35.99 120 50
I guess you could do it with awk (line break added before && for readability):
(echo "Title:Author:Price:Quantity:Sold:Total"
&& awk -F: '{printf ("%s:%s:$%.2f:%d:%d:$%.2f\n",$1,$2,$3,$4,$5,$3*$5)}' BookDB.txt) | column -s: -t
Using %.2f rather than %d for the Price and Total fields keeps the cents, which %d would truncate.
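A self-contained version to try (the sample rows are copied from the question's output; "Total" is my name for the computed column). With %.2f, 21.50 * 20 comes out as $430.00 instead of a truncated $430:

```shell
# Sample data in the Title:Author:Price:Quantity:Sold format.
cat > BookDB.txt <<'EOF'
The Godfather:Mario Puzo:21.50:50:20
The Hobbit:J.R.R Tolkien:40.50:50:10
EOF

(echo "Title:Author:Price:Quantity:Sold:Total"
 awk -F: '{printf "%s:%s:$%.2f:%d:%d:$%.2f\n", $1, $2, $3, $4, $5, $3*$5}' BookDB.txt
) | column -s: -t
```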
