Shell script for getting limited Values from Delimited Values - bash

I need a shell script for the following.
I have 14 variables to fill from input such as:
scenario 1
input:
a###b###c###d###e###f###g###e###f###g###h###i###j###k###l###m###n
scenario 2
input:
a###b###c###d###e###f###g###e###f###g###h###i###j###k###l###m###n###
I want output as
op1 = a
op2 = b
op3 = c
op4 = d
op5 = e
op6 = g
op7 = f
op8 = h
op9 = i
op10 = j
op11 = k
op12 = l
op13 = m
op14 = n
Here op1 to op14 are the variables in which I have to store the values.

First replace ### with a single, unique delimiter character, e.g. #, then use read to read the result into an array. Each element of the array then holds one value. This is shown below.
$ input="a###b###c###d###e###f###g###e###f###g###h###i###j###k"
$ IFS='#' read -a arr <<< "${input//###/#}"
$ echo ${arr[0]}
a
$ echo ${arr[1]}
b
$ echo ${arr[13]}
k
# print out the whole array
$ for (( i=0; i<${#arr[@]}; i++ ))
> do
> echo "Element at index $i is ${arr[i]}"
> done
Element at index 0 is a
Element at index 1 is b
Element at index 2 is c
Element at index 3 is d
Element at index 4 is e
Element at index 5 is f
Element at index 6 is g
Element at index 7 is e
Element at index 8 is f
Element at index 9 is g
Element at index 10 is h
Element at index 11 is i
Element at index 12 is j
Element at index 13 is k

Related

How do I change an array of numbers into corresponding letters of the alphabet

I have an array called variable that contains the numbers 1-26. I am trying to use a for loop in bash to go through each number in the array and associate it with a letter of the alphabet, as tr only lets me translate the first few letters of the alphabet. An example of my code is below.
Note: I am using bash
#!/bin/bash
for p1 in "${variable[@]}"; do
if (( $p1 == 1 )); then
newvar+='a'
elif (( $p1 == 2 )); then
newvar+='b'
...... and so on down to z
I am trying to create the string newvar, which contains these translated letters. However, when I run this it only shows me a, which is the translation of the very first number. Why doesn't this work?
for p1 in "${variable[@]}"; do
chars+=( $((p1 + 96)) )
done
printf '%b' "$(printf '\\%03o' "${chars[@]}")"
Maybe:
# alphabet=(a b c d e f g h i j k l m n o p q r s t u v w x y z)
alphabet=({a..z})
letters=(8 5 12 12 15 23 15 18 12 4)
phrase=''
for i in "${letters[@]}"; do
phrase+="${alphabet[i-1]}"
done
echo $phrase
helloworld

ls command, default (alphabetical) sorting order

Here is a piece of code:
$> ls
` = _ ; ? ( ] @ \ % 1 4 7 a B d E g H J l M o P r S u V x Y
^ > - : ' ) { $ & + 2 5 8 A c D f G i k L n O q R t U w X z
< | , ! " [ } * # 0 3 6 9 b C e F h I K m N p Q s T v W y Z
I'm listing all ASCII characters (each element is a folder), and I'm trying to understand the default sorting order of the ls command.
I understand that there is a case-insensitive comparison to sort alphabetic characters, with digits coming first.
I have some trouble understanding how special characters are sorted, and I'm not able to find anything clear. I was thinking it could be related to the ASCII table, but the order shown here really makes no sense with respect to it. Where is this order coming from?
Thanks
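For comparison, here is a minimal Python sketch of what a strict ASCII (code point) ordering would look like for a handful of those names; the sample names are picked only for illustration. As far as I know, GNU ls sorts with the collation rules of the current locale (LC_COLLATE) rather than by raw byte value, which would explain why the listing above does not follow the ASCII table; running ls with LC_ALL=C should give the byte order instead.

```python
# A few of the single-character directory names from the listing above
# (an illustrative subset, not the full set).
names = ["a", "B", "_", "1", "Z", "b", "$"]

# Plain code point order: punctuation, digits, and letters are interleaved
# purely by their numeric ASCII values, and upper case precedes lower case.
ascii_order = sorted(names)

# Case-insensitive order, closer to what the question observes for letters.
# sorted() is stable, so "B" stays ahead of "b" as in the input list.
caseless_order = sorted(names, key=str.lower)

print(ascii_order)     # ['$', '1', 'B', 'Z', '_', 'a', 'b']
print(caseless_order)  # ['$', '1', '_', 'a', 'B', 'b', 'Z']
```

Neither of these is the locale collation itself; the sketch only shows that byte order would put all upper-case letters before all lower-case ones, which the listing clearly does not do.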

gsub many columns simultaneously based on different gsub conditions?

I have a file with the following data-
Input-
A B C D E F
A B B B B B
C A C D E F
A B D E F A
A A A A A F
A B C B B B
If any of the other rows starting from row 2 have the same letter as row 1, they should be changed to 1. Basically, I'm trying to find out how similar any of the rows are to the first row.
Desired Output-
1 1 1 1 1 1
1 1 B B B B
C A 1 1 1 1
1 1 D E F A
1 A A A A 1
1 1 1 B B B
The first row has become all 1 since it is identical to itself (obviously). In the second row, the first and second columns are identical to the first row (A B) and hence they become 1 1. And so on for the other rows.
I have written the following code which does this transformation-
for seq in {1..1}; #Iterate over the rows (in this case just row 1)
do
    for position in {1..6}; #Iterate over the columns
    do
        #Define the letter in the first row with which I'm comparing the rest of the rows
        aa=$(awk -v pos=$position -v line=$seq 'NR == line {print $pos}' f)
        #If it matches, gsub it to 1
        awk -v var=$aa -v pos=$position '{gsub (var, "1", $pos)} 1' f > temp
        #Save this intermediate file and now act on this
        mv temp f
    done
done
As you can imagine, this is really slow because that nested loop is expensive. My real data is a 60x10000 matrix and it takes about 2 hours for this program to run on that.
I was hoping you could help me get rid of the inner loop so that I can do all 6 gsubs in a single step. Maybe putting them in an array of their own? My awk skills aren't that great yet.
You can use this simpler awk command to do the job. It will be much faster because it avoids both the nested shell loops and the repeated awk invocations inside them:
awk '{for (i=1; i<=NF; i++) {if (NR==1) a[i]=$i; if (a[i]==$i) $i=1} } 1' file
1 1 1 1 1 1
1 1 B B B B
C A 1 1 1 1
1 1 D E F A
1 A A A A 1
1 1 1 B B B
EDIT:
As per the comments below, here is what you can do to get the number of matches (the sum of the 1s) in each row:
awk '{sum=0; for (i=1; i<=NF; i++) { if (NR==1) a[i]=$i; if (a[i]==$i) $i=1; sum+=$i}
print $0, sum}' file
1 1 1 1 1 1 6
1 1 B B B B 2
C A 1 1 1 1 4
1 1 D E F A 2
1 A A A A 1 2
1 1 1 B B B 3
Input
$ cat f
A B C D E F
A B B B B B
C A C D E F
A B D E F A
A A A A A F
A B C B B B
Desired output
$ awk 'FNR==1{split($0,a)}{for(i=1;i<=NF;i++)if (a[i]==$i) $i=1}1' f
1 1 1 1 1 1
1 1 B B B B
C A 1 1 1 1
1 1 D E F A
1 A A A A 1
1 1 1 B B B
Explanation
FNR==1{ .. }
When awk reads first record of current file, do things inside braces
split(string, array [, fieldsep [, seps ] ])
Divide string into pieces separated by fieldsep and store the pieces
in array and the separator strings in the seps array.
split($0,a)
split current record or row ($0) into pieces by fieldsep (default is whitespace, as
we have not supplied a 3rd argument) and store the pieces in array a
So array a contains data from first row
a[1] = A
a[2] = B
a[3] = C
a[4] = D
a[5] = E
a[6] = F
for(i=1;i<=NF;i++)
Loop through all the fields of each record of the file, until the end of the file.
if (a[i]==$i) $i=1
if the first row's value in the current column (i) is equal to
the current row's value in that column, set the current column value to 1 (that is, modify the field)
Now that the column values are modified, just print the modified row
}1
1 always evaluates to true, so awk performs the default action {print $0}
Update, in response to this request in the comments:
Same question here, I have a second part of the program that adds up
the numbers in the rows. I.e. You would get 6, 2, 4, 2, 2, 3 for this
output. Can your program be tweaked to get these values out at this
step itself?
$ awk 'FNR==1{split($0,a)}{s=0;for(i=1;i<=NF;i++)if(a[i]==$i)s+=$i=1;print $0,s}' f
1 1 1 1 1 1 6
1 1 B B B B 2
C A 1 1 1 1 4
1 1 D E F A 2
1 A A A A 1 2
1 1 1 B B B 3
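To cross-check the awk logic on a small sample, the same column-wise comparison can be sketched in Python. This is only an illustration under assumptions: the function name mark_matches is made up here, fields are split on whitespace, and the count is the number of matching columns rather than a numeric sum of the fields.

```python
def mark_matches(rows):
    """Compare every row with the first row, column by column.

    Returns a list of (new_row, matches) pairs where fields equal to the
    first row's field in the same column are replaced by "1" and `matches`
    is how many fields were replaced.
    """
    first = rows[0]
    result = []
    for row in rows:
        new_row = []
        matches = 0
        for i, cell in enumerate(row):
            # Columns beyond the first row's width can never match.
            if i < len(first) and cell == first[i]:
                new_row.append("1")
                matches += 1
            else:
                new_row.append(cell)
        result.append((new_row, matches))
    return result


sample = """A B C D E F
A B B B B B
C A C D E F
A B D E F A
A A A A A F
A B C B B B"""

rows = [line.split() for line in sample.splitlines()]
marked = mark_matches(rows)
for new_row, matches in marked:
    print(" ".join(new_row), matches)
```

On the sample input this prints the same six lines as the awk one-liner above, ending in the counts 6, 2, 4, 2, 2, 3.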

Sorting in O(n) intersect

Let S1 and S2 be two sets of integers (they are not necessarily disjoint).
We know
that |S1| = |S2| = n (i.e. each set has n integers).
Each set is stored in an array of length n, where
its integers are sorted in ascending order.
Let k ≥ 1 be an integer.
Design an algorithm to find the
k smallest integers in S1 ∩ S2 in O(n) time.
This is what I have so far:
Create a new array called Intersection
For each e in S1 add e to hashset in O(n) time
For each e in S2 check if e exists in hashset in O(n) time
If e exists in hashset add e to Intersection
Once comparisons are done sort Intersection by count sort in O(n) time
return the first k integers
Thus O(n) + O(n) + O(n) = O(n)
Am I on the right track?
Yes, you're definitely on the right track but there's actually no need at all to generate a hash-table or extra set. As your two sets are already sorted, you can simply run an index/pointer through both of them, looking for the common elements.
For example, to find the first common element from the two sets, use the following pseudo-code:
start at first index of both sets
while more elements in both sets, and current values are different:
    if set1 value is less than set2 value:
        advance set1 index
    else:
        advance set2 index
At the end of that, the set1 index will refer to an intersect point, provided that neither index has moved beyond the last element of its respective list. You can then just use that method in a loop to find the first k intersection values.
Here's a proof of concept in Python 3 that gives you the first three numbers that are in the two lists (multiples-of-two and multiples-of-three). The full intersection would be {0, 6, 12, 18, 24} but you will see that it will only extract the first three of those:
# Create the two lists to be used for intersection.
set1 = [i * 2 for i in range(15)] ; print(set1) # doubles
set2 = [i * 3 for i in range(15)] ; print(set2) # trebles
idx1 = 0 ; count1 = len(set1)
idx2 = 0 ; count2 = len(set2)
# Only want first three.
need = 3
while need > 0:
    # Continue until we find next intersect or end of a list.
    while idx1 < count1 and idx2 < count2 and set1[idx1] != set2[idx2]:
        # Advance pointer of list with lowest value.
        if set1[idx1] < set2[idx2]:
            idx1 += 1
        else:
            idx2 += 1
    # Break if reached end of a list with no intersect.
    if idx1 >= count1 or idx2 >= count2:
        break
    # Otherwise print intersect and advance to next list candidate.
    print(set1[idx1]) ; need -= 1
    idx1 += 1 ; idx2 += 1
The output is, as expected:
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
[0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42]
0
6
12
If you needed a list at the end rather than just printing out the intersect points, you would simply initialise an empty container before the loop and then append each value to it rather than printing it. This then becomes a little more like your proposed solution, but with the advantage of not needing hash tables or sorting.
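A minimal sketch of that list-building variant, assuming both inputs are Python lists already sorted in ascending order (the function name first_k_common is just a label chosen for this example):

```python
def first_k_common(s1, s2, k):
    """Return up to k smallest values that occur in both ascending lists."""
    common = []
    i = j = 0
    # Stop once k values are collected or either list is exhausted.
    while len(common) < k and i < len(s1) and j < len(s2):
        if s1[i] < s2[j]:
            i += 1          # s1 is behind, advance it
        elif s1[i] > s2[j]:
            j += 1          # s2 is behind, advance it
        else:
            common.append(s1[i])  # equal values: an intersect point
            i += 1
            j += 1
    return common


doubles = [n * 2 for n in range(15)]
trebles = [n * 3 for n in range(15)]
print(first_k_common(doubles, trebles, 3))   # [0, 6, 12]
print(first_k_common(doubles, trebles, 10))  # [0, 6, 12, 18, 24]
```

Each loop iteration advances at least one index, so the loop runs at most 2n times, which keeps the whole thing O(n). Asking for more values than the intersection holds simply returns everything that was found.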
Create two arrays, call them arr1 and arr2, of size array_size and populate them with integer values in ascending order. Create two indexes, call them i and j, that will be used to iterate over arr1 and arr2 respectively, and initialize them to 0. Compare the current values of the two arrays: if arr1[i] is less than arr2[j] then increment i, else if arr1[i] is greater than arr2[j] then increment j, else the values intersect and we can record this value and increment both. Once we have recorded k intersecting values we can stop iterating. In the worst case, when no intersections occur between the two sets of values, we have to iterate to the end of both arrays, which takes i + j steps, i.e. O(n).
Here is the solution in bash:
#!/bin/bash
#-------------------------------------------------------------------------------
# Design an algorithm to find the k smallest integers in S1 ∩ S2 in O(n) time.
#-------------------------------------------------------------------------------
typeset -a arr1 arr2 arr_answer
typeset -i array_size=20 k=5
function populate_arrs {
    typeset -i counter=0
    while [[ ${counter} -lt ${array_size} ]]; do
        arr1[${counter}]=$((${counter} * 2))
        arr2[${counter}]=$((${counter} * 3))
        counter=${counter}+1
    done
    printf "%8s" "Set1: "; printf "%4d" ${arr1[*]}; printf "\n"
    printf "%8s" "Set2: "; printf "%4d" ${arr2[*]}; printf "\n\n"
}
function k_smallest_integers_main {
    populate_arrs
    typeset -i counter=0 i=0 j=0
    # Stop after k matches, or when either array is exhausted
    # (otherwise the loop never ends if fewer than k values intersect).
    while [[ ${counter} -lt ${k} && ${i} -lt ${array_size} && ${j} -lt ${array_size} ]]; do
        if [[ ${arr1[${i}]} -eq ${arr2[${j}]} ]]; then
            arr_answer[${counter}]=${arr1[${i}]}
            counter=${counter}+1; i=${i}+1; j=${j}+1
        elif [[ ${arr1[${i}]} -lt ${arr2[${j}]} ]]; then
            i=${i}+1
        else
            j=${j}+1
        fi
    done
    printf "%8s" "Answer: "; printf "%4d" ${arr_answer[*]}; printf "\n"
}
k_smallest_integers_main
Output:
Set1: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
Set2: 0 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57
Answer: 0 6 12 18 24
In Python:
i1= 0; i2= 0
while k > 0 and i1 < n and i2 < n:
    if S1[i1] < S2[i2]:
        i1+= 1
    elif S1[i1] > S2[i2]:
        i2+= 1
    else:
        Process(S1[i1], S2[i2])
        i1+= 1; i2+= 1
        k-= 1
Execution will perform fewer than k calls to Process if there aren't sufficiently many elements in the intersection.

Find the closest values: Multiple columns conditions

Following my first question here, I want to extend the condition: find the closest value between two different files using both the first and the second column, and print specific columns.
File1
1 2 3 4 a1
1 4 5 6 b1
8 5 9 11 c1
File 2
1 1 3 a
1 2 5 b
1 2.1 4 c
1 4 6 d
2 4 5 e
9 4 1 f
9 5 2 g
9 6 2 h
11 10 14 i
11 15 5 j
So, for example, for each $1 in file 1 I need to find the closest value of $1 in file 2, and then, among those rows, also search for the closest $2.
Output:
1 2 a1*
1 2 b*
1 4 b1
1 4 d
8 5 c1
9 5 g
* The first line comes from file 1 and the second from file 2: for the 1st column of file 1 the closest value in the 1st column of file 2 is 1, and the second condition is that the 2nd column must also be the closest value, which in this case is 2. I print $1,$2,$5 from file 1 and $1,$2,$4 from file 2.
The same procedure applies to the rest of the output.
The solution for finding the closest value is in my other post and was given by @Tensibai.
But any solution will work.
Thanks!
Sounds a little convoluted but works:
function closest(array,searched) {
    distance=999999; # this should be higher than the max index to avoid returning null
    split(searched,skeys,OFS)
    # Get the first part of key
    for (x in array) { # loop over the array to get its keys
        split(x,mkeys,OFS) # split the array key
        (mkeys[1]+0 > skeys[1]+0) ? tmp = mkeys[1] - skeys[1] : tmp = skeys[1] - mkeys[1] # +0 to compare numerically, ternary operator to reduce code, compute the diff between the key and the target
        if (tmp < distance) { # if the distance is less than the preceding one, update
            distance = tmp
            found1 = mkeys[1] # and save the key actually found closest
        }
    }
    # At this point we have the first part of key found, let's redo the work for the second part
    distance=999999;
    for (x in array) {
        split(x,mkeys,OFS)
        if (mkeys[1] == found1) { # Filter on the first part of key
            (mkeys[2]+0 > skeys[2]+0) ? tmp = mkeys[2] - skeys[2] : tmp = skeys[2] - mkeys[2] # +0 to compare numerically, ternary operator to reduce code, compute the diff between the key and the target
            if (tmp < distance) { # if the distance is less than the preceding one, update
                distance = tmp
                found2 = mkeys[2] # and save the key actually found closest
            }
        }
    }
    # Now we got the second field, woot
    return (found1 OFS found2) # return the combined key from our two searches
}
{
    if (NR>FNR) { # If we changed file (FNR, the per-file record number, is less than NR, the overall record number) fill the second array
        b[($1 OFS $2)] = $4 # make an array with "$1 $2" as key and $4 as value
    } else {
        key = ($1 OFS $2) # Make the key to avoid too much computation accessing it later
        akeys[max++] = key # store the array keys to ensure order at end as for (x in array) does not guarantee the order
        a[key] = $5 # make an array with the key stored previously and $5 as value
    }
}
END { # Now we ended parsing the two files, print the result
    for (i = 0; i < max; i++) { # loop over the stored keys by numeric index, which keeps the file order
        print akeys[i],a[akeys[i]] # print the value for the first array (key then value)
        if (akeys[i] in b) { # if the same key exists in second file
            print akeys[i],b[akeys[i]] # then print it
        } else {
            bindex = closest(b,akeys[i]) # call the function to find the closest key from second file
            print bindex,b[bindex] # print what we found
        }
    }
}
Note that I'm using OFS to combine the fields, so if you change it for the output the script will still behave properly.
WARNING: This should do for relatively short files, but as the array from the second file is now traversed twice, each search will take twice as long. END OF WARNING
There is room for a better search algorithm if your files are sorted (but that was not the case in the previous question, and you wished to keep the order from the file). A first improvement in that case: break out of the for loop as soon as the distance starts to be greater than the preceding one.
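The early-exit idea can be sketched in Python, under the assumption that the candidate values are already sorted in ascending order (the awk script above does not require this, and the name closest_sorted is made up for the example). In a sorted list the distance to the target first shrinks and then only grows, so the scan can stop at the first increase.

```python
def closest_sorted(values, target):
    """Return the value nearest to target from an ascending list.

    Returns None for an empty list. On a tie the earlier (smaller)
    value is kept, because only a strictly smaller distance replaces it.
    """
    best = None
    best_distance = None
    for value in values:
        distance = abs(value - target)
        if best_distance is not None and distance > best_distance:
            break  # distances only grow from here on, so stop scanning
        if best_distance is None or distance < best_distance:
            best, best_distance = value, distance
    return best


# First column of File 2 from the question, which happens to be sorted.
file2_col1 = [1, 1, 1, 1, 2, 9, 9, 9, 11, 11]
print(closest_sorted(file2_col1, 8))  # 9
# Second-column values of the File 2 rows whose first column is 9.
print(closest_sorted([4, 5, 6], 5))   # 5
```

This is still a linear scan in the worst case; it only saves the tail of the list after the best match.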
Output from your sample files:
$ mawk -f closest2.awk f1 f2
1 2 a1
1 2 b
1 4 b1
1 4 d
8 5 c1
9 5 g
