How can I create array of lines in this case? - bash

Given a file in which each line can contain more than one word, with a single space between adjacent words, for example:
a a a a
b b b b
c c
d d
a a a a
How can I create an array so that element number i holds line number i, but WITHOUT DUPLICATES BETWEEN THE ELEMENTS OF THE ARRAY?
For the file above, we would need to create this array:
Array[0]="a a a a" , Array[1]="b b b b" , Array[2]="c c" , Array[3]="d d".
(The name of the file is passed to the script as an argument.)
I know how to create an array that contains all the lines, something like this:
Array=()
while read line; do
    Array=("${Array[@]}" "${line}")
done < $1
But how can I feed the sorted (and uniq'd) output of the file into the while read loop?

You should be able to use done < <(sort "$1" | uniq) in place of done < $1.
The <() syntax (process substitution) runs a separate set of commands in a subshell and presents their output as a file-like object you can redirect from.
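Putting it together, a minimal sketch of the whole script might look like this (the file name comes in as $1, as in the question; with bash 4+ you could also replace the loop with mapfile -t Array < <(sort "$1" | uniq)):
#!/bin/bash
# Build an array whose elements are the unique lines of the file passed as $1.
Array=()
while IFS= read -r line; do
    Array+=("$line")
done < <(sort "$1" | uniq)

# Show the result.
declare -p Array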

Related

How do I print out 2 separate arrays with new lines in bash script

So basically I want to be able to print out 2 separate arrays with newlines between each element.
Sample output I'm looking for:
a x
b y
(a and b being part of one array, x and y being a separate array)
Currently I'm using:
printf "%s\n" "${words[@]} ${newWords[@]}"
But the output comes out like:
a
b x
y
As bash is tagged, you could use paste from GNU coreutils with each array as an input:
$ words=(a b)
$ newWords=(x y)
$ paste <(printf '%s\n' "${words[@]}") <(printf '%s\n' "${newWords[@]}")
a x
b y
TAB is the default column separator but you can change it with option -d.
If you have array items that might contain newlines, you can switch to e.g. NUL-delimited strings by using the -z flag and producing each input using printf '%s\0'.
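For instance, here is a small sketch of that NUL-delimited variant (it assumes a GNU paste recent enough to support -z; the NUL-terminated records are consumed with read -d '' so any embedded newlines survive intact):
words=($'one\ntwo' b)   # the first element contains an embedded newline
newWords=(x y)

# Pair the two arrays on NUL boundaries, then read each NUL-terminated record back.
while IFS= read -r -d '' pair; do
    printf '<%s>\n' "$pair"
done < <(paste -z <(printf '%s\0' "${words[@]}") <(printf '%s\0' "${newWords[@]}"))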
What does "${words[@]} ${newWords[@]}" produce? Let's put that expansion into another array and see what's inside it:
words=(a b)
newWords=(x y)
tmp=("${words[@]} ${newWords[@]}")
declare -p tmp
declare -a tmp=([0]="a" [1]="b x" [2]="y")
So, the last element of the first array and the first element of the second array are joined as a string; the other elements remain individual.
paste with 2 process substitutions is a good way to solve this. If you want to do it in plain bash, iterate over the indices of the arrays:
for idx in "${!words[@]}"; do
    printf '%s\t%s\n' "${words[idx]}" "${newWords[idx]}"
done

Bash: nested loop one way comparison

I have a question about nested loops in bash.
I have an input file with one file name per line (full path).
I read this file and then I make a nested loop:
for i in $filelines ; do
    echo $i
    for j in $filelines ; do
        ./program $i $j
    done
done
The program I run within the loop is pretty slow.
Basically it compare the file A with the file B.
I want to skip the A vs A comparison (i.e. comparing one file with itself), AND
I want to avoid permutations (i.e. for files A and B, only perform A against B and not B against A).
What is the simplest way to perform this?
Version 2: this one takes care of permutations
#!/bin/bash

tmpunsorted="/tmp/compare_unsorted"
tmpsorted="/tmp/compare_sorted"

>$tmpunsorted

while read linei
do
    while read linej
    do
        if [ "$linei" != "$linej" ]
        then
            echo "$linei $linej" | tr " " "\n" | sort | tr "\n" " " >>$tmpunsorted
            echo >>$tmpunsorted
        fi
    done <filelines
done <filelines

sort $tmpunsorted | uniq > $tmpsorted

while read linecompare
do
    echo "./program $linecompare"
done <$tmpsorted

# Cleanup
rm -f $tmpunsorted
rm -f $tmpsorted
What is done here:
I use the while loop to read each line, twice, i and j
if the value of the lines is the same, forget them, no use to consider them
if they are different, output them into a file ($tmpunsorted). They are sorted in alphabetical order before going to the $tmpunsorted file. This way the arguments are always in the same order, so "a b" and "b a" end up the same in the unsorted file.
I then apply sort | uniq on $tmpunsorted, so the result is a list of individual argument pairs.
finally loop on the $tmpsorted file, and call the program on each individual pair.
Since I do not have your program, I did an echo, which you should remove to use the script.
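An alternative sketch in plain bash that avoids the temporary files: read filelines into an array once (mapfile needs bash 4+) and use index-based nested loops, starting the inner index one past the outer one, so each pair is generated exactly once and a file is never compared with itself. Unlike the temp-file version it does not deduplicate repeated lines, so it assumes each file name appears only once in filelines.
#!/bin/bash
# Read every line of the input file into an array.
mapfile -t files < filelines

for (( i = 0; i < ${#files[@]}; i++ )); do
    # Start j at i+1: this skips A-vs-A and never emits a pair in both orders.
    for (( j = i + 1; j < ${#files[@]}; j++ )); do
        ./program "${files[i]}" "${files[j]}"
    done
done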

read column from csv file in terminal ignoring the header

I'm writing a simple .ksh file to read a single column from a .csv file and then print the output to the screen:
fname=($(cut -d, -f2 "myfile.csv"))
# loop through these names
for i in ${fname[@]};
do echo "$i"
done
This works fine, but I don't want to return the header row, that is, the first row of the file. How would I alter the cut command so that it ignores the first value or string? In this case the header is called 'NAME'. I want to print all of the other rows of this file.
That being said, is it easier to loop through from 2:fname as the code is currently written or is it best to alter the cut command?
You could do
fname=($(sed 1d myfile.csv | cut -d, -f2))
Alternatively, since the index of the first element of the array is 0, you can start the loop at index 1:
for i in "${fname[@]:1}"; do
Demo:
$ a=(a b c d e f)
$ echo "${a[@]:1}"
b c d e f
Note, you should always put the array expansion in double quotes.
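For completeness, here is a small sketch combining both suggestions (bash syntax, as in the answer above; myfile.csv and the column number come from the question, and the word-splitting in the array assignments assumes the names contain no spaces):
#!/bin/bash
# Option 1: drop the header line before extracting column 2.
fname=($(sed 1d myfile.csv | cut -d, -f2))
for i in "${fname[@]}"; do
    echo "$i"
done

# Option 2: keep the header in the array but start looping at index 1.
fname=($(cut -d, -f2 myfile.csv))
for i in "${fname[@]:1}"; do
    echo "$i"
done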

BASH: "while read line ???"

I understand the format below...
while read line
do
etc...
However, I saw this yesterday and haven't been able to figure out what var would be in the following:
while read pkg var
do
etc...
Thanks
The while read loop will assign the values to the variables one by one, but all the remaining parts of the line go to the last variable.
For example, I have a file like:
a b c d
When I run the command:
$ while read x y
do
echo $x
echo $y
done < file
Result:
a
b c d
That is, $y gets "b c d".
Of course, if you only assign one var (line), then $line will get the whole line.
The read builtin will read multiple whitespace-separated (or, really, separated by whatever is in $IFS) values.
echo a b c | (read x y z; echo "$y")
#=> b
If there are more fields than variables passed to read, the last variable gets the rest of the line.
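Applied to the question's while read pkg var, a quick demonstration (the input line here is made up purely for illustration):
echo "openssl 1.1.1k security update" | while read pkg var; do
    echo "pkg=$pkg"   # first field:      openssl
    echo "var=$var"   # rest of the line: 1.1.1k security update
done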

Compare Lines of file to every other line of same file

I am trying to write a program that will print out every line from a file with another line of that file added at the end, basically creating pairs from a portion of each line. If the line is the same, it will do nothing. Also, it must avoid repeating the same pairs. A B is the same as B A
In short
FileInput:
otherstuff A
otherstuff B
otherstuff C
otherstuff D
Output:
A B
A C
A D
B C
B D
C D
I was trying to do this with a BASH script, but was having trouble because I could not get my nested while loops to work. It would read the first line, compare it to each other line, and then stop (Basically only outputting the first 3 lines in the example output above, the outer while loop only ran once).
I also suspect I might be able to do this using MATLAB, so suggestions using that are also welcome.
Here is the bash script that I have thus far. As I said, it is not printing out correctly for me, as the outer loop only runs once.
#READS IN file from terminal
FILE1=$1
#START count at 0
count0=
exec 3<&0
exec 0< $FILE1
while read LINEa; do
while read LINEb; do
eventIDa=$(echo $LINEa | cut -c20-23)
eventIDb=$(echo $LINEb | cut -c20-23)
echo $eventIDa $eventIDb
done
done
Using bash:
#!/bin/bash
[ -f "$1" ] || { echo >&2 "File not found"; exit 1; }
mapfile -t lines < <(cut -c20-23 <"$1" | sort | uniq)
for i in "${!lines[@]}"; do
    elem1=${lines[$i]}
    unset lines[$i]
    for elem2 in "${lines[@]}"; do
        echo "$elem1" "$elem2"
    done
done
This will read a file given as a parameter on the command line, sort and filter out duplicates, and output all combinations. You can modify the parameter to cut to adjust to your particular input file.
Due to the particular way you seem to intend to use cut, your input example above won't work. Instead, use something with the correct line length, such as:
123456789012345678 A
123456789012345678 B
123456789012345678 C
123456789012345678 D
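If that sample input is saved as, say, input.txt and the script above as pairs.sh (both names are just placeholders), a run produces every unordered pair exactly once, matching the output asked for in the question:
$ ./pairs.sh input.txt
A B
A C
A D
B C
B D
C D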
Assuming the otherstuff is not relevant (otherwise you can of course add it later), this should do the trick in Matlab:
combnk({'A' 'B' 'C' 'D'},2)
