Split elements of an array into multiple arrays in bash - bash

I need to read a file into an array.
Then I need to store only the first column of each line in a new array.
Example file:
aa,1,2,3
bb,4,5,2
cc,7,1,4
mapfile -t arrFile < file
So in arrFile I have all the rows:
${arrFile[0]} returns 'aa,1,2,3'
echo ${arrFile[0]} | cut -d "," -f1 returns 'aa'
How can I copy the first column of each element of arrFile into another array, preferably without looping in a while?

Why copy? Perhaps it is enough if you simply use ${arrFile[0]%%,*}?
Or you can copy, using arr2=( "${arrFile[@]%%,*}" )
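To see what both suggestions do with the example file from the question (expected results shown in the comments):
mapfile -t arrFile < file            # arrFile=( 'aa,1,2,3' 'bb,4,5,2' 'cc,7,1,4' )
echo "${arrFile[0]%%,*}"             # strips from the first comma onward: prints 'aa'
arr2=( "${arrFile[@]%%,*}" )         # same expansion applied to every element
printf '%s\n' "${arr2[@]}"           # prints aa, bb, cc on separate lines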

Related

Bash array containing output from 'find' function is incorrectly structured

I am trying to create an array in bash that contains filenames for a subset of files stored in a single folder. I want the array to contain only filenames with the common string "zzz", and I want the array to contain one filename per element. I have been trying to use the find function to get filenames containing "zzz", and store the results in myarray.
Here is what I'm doing:
# Define folder containing files
file_dir=./my_files
# Define the common string
pattern="*zzz*"
# Store find output to myarray
readarray -d ' ' -t myarray < <(find ${file_dir} -name ${pattern})
# Print myarray
echo $myarray
Output:
./my_files/abc_zzz_1.nii.gz ./my_files/def_zzz_763.nii.gz ./my_files/ghi_zzz_628.nii.gz
myarray contains the correct filenames; however, it does not appear to be structured in a way that allows indexing. I would like to be able to index the nth filename in myarray with ${myarray[n]}, but it seems that the full output from find is stored in a single element. echo ${myarray[0]} prints the same output as above, while echo ${myarray[1]} prints an empty line.
I figured that the whole output from find was being stored as a single string in ${myarray[0]}, so I tried to break the string up using:
read -r -a myarray2 <<< "${myarray[0]}"
...but this did not work as intended, because echo ${myarray2} only returns a single filename.
What am I doing wrong here?
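For reference, the usual fix (not from the original thread, just the commonly used pattern) is to keep find's output NUL-delimited and tell readarray to split on NUL, so one filename lands in each element even if names contain spaces:
# needs bash 4.4+ for readarray -d
file_dir=./my_files
pattern="*zzz*"
readarray -d '' -t myarray < <(find "${file_dir}" -name "${pattern}" -print0)
echo "${myarray[1]}"     # second filename on its own
declare -p myarray       # shows one filename per element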

Reading numeric values from grep output in bash

I have a file filled with rows of text. I'm interested in a group of these: every line starts with the same word, and each line contains two numbers I have to process later, always in the same position. For example:
Round trip time was 49.9721 milliseconds in repetition 5 tcp_ping received 128 bytes back
I was thinking of using grep to grab the wanted rows into a new file, and then putting the content of this new file into an array so I can access it easily during processing, but this isn't working. Any tips?
#!/bin/bash
InputFile="../data/N.dat"
grep "Round" ../data/tcp_16.out > "$InputFile"
IFS=' ' read -a array <<< "$InputFile"
If the numbers are all you care about, you can read only the numbers in.
I'd also strongly suggest extracting the values you're going to be analyzing into arrays, like so, rather than storing the full lines as strings:
ms_time_arr=( )   # array: map repetitions to ms_time
bytes_arr=( )     # array: map repetitions to bytes
while read -r ms_time repetition bytes_back _; do
    # log to stderr to show that we read the data
    echo "At $ms_time ms, repetition $repetition, got $bytes_back back" >&2
    ms_time_arr[$repetition]=$ms_time
    bytes_arr[$repetition]=$bytes_back
done < <(grep -e 'Round' <../data/N.dat | tr -d '[:alpha:]_')
# more logging, to show that array contents survive the loop
declare -p ms_time_arr bytes_arr
This works by using tr to delete all alphabetic characters (plus the underscore left over from tcp_ping), leaving only the numbers, punctuation and whitespace, so read picks up the milliseconds, the repetition number and the byte count as the first three fields.
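Once the arrays are populated, later processing can index them by repetition number. As a small illustration (the averaging itself is my own example, not part of the original answer; bc is used for the floating-point arithmetic):
total=0; count=0
for rep in "${!ms_time_arr[@]}"; do            # iterate over the recorded repetition numbers
    total=$(bc <<< "$total + ${ms_time_arr[$rep]}")
    count=$((count + 1))
done
echo "average over $count repetitions: $(bc <<< "scale=4; $total / $count") ms"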

Want to sort a file based on another file in unix shell

I have 2 files refer.txt and parse.txt
refer.txt contains the following
julie,remo,rob,whitney,james
parse.txt contains
remo/hello/1.0,remo/hello2/2.0,remo/hello3/3.0,whitney/hello/1.0,julie/hello/2.0,julie/hello/3.0,rob/hello/4.0,james/hello/6.0
Now my output.txt should list the files in parse.txt based on the order specified in refer.txt
An example of what output.txt should be:
julie/hello/2.0,julie/hello/3.0,remo/hello/1.0,remo/hello2/2.0,remo/hello3/3.0,rob/hello/4.0,whitney/hello/1.0,james/hello/6.0
I have tried the following code:
sort -nru refer.txt parse.txt
but no luck.
Please assist me. TIA.
You can do that using gnu-awk (a regex record separator like RS=',|\n' is a GNU awk extension):
awk -F/ -v RS=',|\n' 'FNR==NR{a[$1] = (a[$1])? a[$1] "," $0 : $0 ; next}
{s = (s)? s "," a[$1] : a[$1]} END{print s}' parse.txt refer.txt
Output:
julie/hello/2.0,julie/hello/3.0,remo/hello/1.0,remo/hello2/2.0,remo/hello3/3.0,rob/hello/4.0,whitney/hello/1.0,james/hello/6.0
Explanation:
-F/                      # use / as the field separator
-v RS=',|\n'             # use comma or newline as the record separator
FNR == NR {              # while processing the first file, parse.txt
    a[$1] = (a[$1])? a[$1] "," $0 : $0   # build an array keyed on the 1st field, holding all the
                                         # records for each key (julie, remo, rob, etc.), comma-joined
}
{                        # while processing the second file, refer.txt
    s = (s)? s "," a[$1] : a[$1]         # aggregate the values by looking up each key from refer.txt
}
END { print s }          # print all the values
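To write the result straight into output.txt as the question asks, the same command just needs a redirection:
awk -F/ -v RS=',|\n' 'FNR==NR{a[$1] = (a[$1])? a[$1] "," $0 : $0 ; next}
{s = (s)? s "," a[$1] : a[$1]} END{print s}' parse.txt refer.txt > output.txt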
In pure native bash (4.x):
# read each file into an array
IFS=, read -r -a values <parse.txt
IFS=, read -r -a ordering <refer.txt
# create a map from content before "/" to comma-separated full values in preserved order
declare -A kv=( )
for value in "${values[@]}"; do
    key=${value%%/*}
    if [[ ${kv[$key]} ]]; then
        kv[$key]+=",$value" # already exists, comma-separate
    else
        kv[$key]="$value"
    fi
done
# go through refer list, putting full value into "out" array for each entry
out=( )
for value in "${ordering[@]}"; do
    out+=( "${kv[$value]}" )
done
# print "out" array in comma-separated form
IFS=,
printf '%s\n' "${out[*]}" >output.txt
If you're getting more output fields than you have input fields, you're probably trying to run this with bash 3.x. Since associative array support is mandatory for correct operation, this won't work.
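If you'd rather fail loudly on an old bash than get wrong output, a small guard near the top can check (my addition, not part of the original answer):
# associative arrays (declare -A) need bash 4.0 or newer
if (( BASH_VERSINFO[0] < 4 )); then
    echo "this script needs bash 4.0+ (found $BASH_VERSION)" >&2
    exit 1
fi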
tr , "\n" <refer.txt | cat -n >person_id.txt # 'cat -n' is not POSIX; sed and paste could substitute
cat person_id.txt | while read -r person_id person_key
do
    printf '%s\n' "$person_id" > "$person_key"
done
tr , "\n" <parse.txt | sed 's/^\([^\/]*\)\(\/.*\)$/\1 \1\2/' >person_data.txt
cat person_data.txt | while read -r foreign_key person_data
do
    person_id="$(<"$foreign_key")"
    printf '%s %s\n' "$person_id" "$person_data" >>merge.txt
done
sort merge.txt >output.txt
A textbook data-processing approach: a person id table and a person data table, merged on a common key field, which is the first name of the person:
[person_key] [person_id]
- person id table, a unique sortable 'id' for each person (line number in this instance, since that is the desired sort order), and key for each person (their first name)
[person_key] [person_data]
- person data table, the data for each person indexed by 'person_key'
[person_id] [person_data]
- a merge of the 'person_id' table and 'person_data' table on 'person_key', which can then be sorted on person_id, giving the output as requested
The trick is to implement an associative array using files, the file name being the key (in this instance 'person_key'), the content being the value. [Essentially a random access file implemented using the filesystem.]
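The idea in isolation, as a minimal sketch (the ./kvstore directory name is mine, purely for illustration):
mkdir -p ./kvstore                    # one directory plays the role of one associative array
printf '%s\n' "1" > ./kvstore/julie   # "kv[julie]=1": the file name is the key, the content the value
value="$(< ./kvstore/julie)"          # look the key up again by reading the file back
echo "$value"                         # prints 1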
This actually adds a step to the otherwise simple, though not very efficient, task of grepping parse.txt with each value in refer.txt; which of the two is more efficient, I'm not sure.
NB: The above code is very unlikely to work out of the box.
NBB: On reflection, probably a better way of doing this would be to use the file system to build a random-access index of parse.txt, and then treat refer.txt as a batch job, printing out from that index the data for each of the names read in from refer.txt in turn:
# 1) index the data file on the required field
mkdir -p ./person_data
cat person_data.txt | while read -r data
do
    key="$(printf '%s\n' "$data" | sed 's/\/.*$//')"   # alt. `cut -d'/' -f1`
    printf '%s\n' "$data" >>./person_data/"$key"
done
# 2) run the batch job
cat refer_data.txt | while read -r key
do
    cat ./person_data/"$key"
done
However, having said that, using egrep is probably just as rigorous a solution, at least for small datasets, and I would most certainly use that approach given the specific question posed. (Or maybe not! The above could well prove faster as well as more robust.)
Command
while read -r line; do
    grep -w "^$line" <(tr , "\n" < parse.txt)
done < <(tr , "\n" < refer.txt) | paste -s -d , -
Key points
For both files, newlines are translated to commas using the tr command (without actually changing the files themselves). This is useful because while read and grep work under the assumption that your records are separated by newlines instead of commas.
while read will read in every name from refer.txt (i.e. julie, remo, etc.) and then use grep to retrieve lines from parse.txt containing that name.
The ^ in the regex ensures matching is only performed at the start of the string and not in the middle (thanks to @CharlesDuffy's comment below), and the -w option for grep allows whole-word matching only. For example, this ensures that "rob" only matches "rob/..." and not "robby/..." or "throb/...".
The paste command at the end will comma-separate the results. Removing this command will print each result on its own line.
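A quick way to convince yourself of the ^ and -w behaviour (the sample names are made up for the demonstration):
printf '%s\n' rob/hello/4.0 robby/hello/9.0 throb/x/1.0 | grep -w "^rob"
# prints only rob/hello/4.0: 'robby' fails the whole-word test, 'throb' fails the start-of-line anchor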

Argument length is too big. How to chunk it up?

I have a Python script which takes a filename and a set of offsets (comma separated) and reads the corresponding lines defined by the offsets.
do
python fileOffset.py /mnt/media1/file $offsets >> tmpfile
done
$offsets provides the comma-separated string containing the file pointers (e.g. 12,123,121134). This works fine until I get a very long string of offsets, which throws an 'argument list too long' error. As a workaround I have written the following code, which splits the offsets and calls fileOffset.py once per offset.
IFS=', ' read -r -a array <<< "$offsets"
for element in "${array[@]}"
do
    python fileOffset.py /mnt/media1/$file $element >> tmpfile
done
But this makes processing of the file very slow. How could I make it faster?
You can use xargs :
tr ',' '\n' <<< "$offsets" | xargs -n 1000 | sed 's/ /,/g' | xargs -L 1 python fileOffset.py /mnt/media1/$file >> tmpfile
However, I'm with @FrederikPihil's comment: do it all in Python, as you are already spawning a Python process on each iteration.
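For completeness, a more readable sketch of the same batching idea; it assumes, as in the question, that fileOffset.py accepts a single comma-separated list of offsets per call, and the batch size of 1000 is arbitrary:
tr ',' '\n' <<< "$offsets" |       # one offset per line
    xargs -n 1000 |                # regroup into lines of at most 1000 offsets each
    while read -r batch; do
        python fileOffset.py /mnt/media1/$file "${batch// /,}" >> tmpfile   # rejoin with commas
    done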

Iterate over a file using two values on the same line

I need to pass a series of value pairs as arguments to a C++ program. So I wrote this script:
while read randomNumbers; do
lambda = $randomNumbers | cut -f1 -d ' '
mi = $randomNumbers | cut -f2 -d ' '
./queueSim mm1-queue $lambda $mi
done < "randomNumbers"
where the first arg is the first value of each line in the file "randomNumbers" and the second one is the second value (of course). I got a segfault and a "command not found".
How can I assign the values taken from each line to lambda and mi, and pass these variables to the C++ program?
There's no need for cut. Let read split the line for you:
while read lambda mi; do
    ./queueSim mm1-queue $lambda $mi
done < randomNumbers
Note that it is also commonly used in conjunction with IFS to split the input line on different fields. For example, to parse /etc/passwd ( a file with colon separated lines ), you will often see:
while IFS=: read username passwd uid gid info home shell; do ...
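For instance, a self-contained variant of that /etc/passwd idiom, just to illustrate the IFS trick (printing two of the fields):
while IFS=: read -r username passwd uid gid info home shell; do
    printf '%s logs in with %s\n' "$username" "$shell"
done < /etc/passwd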
I would recommend assigning the values like this:
lambda=$(echo $randomNumbers | cut -f1 -d ' ')
mi=$(echo $randomNumbers | cut -f2 -d ' ')
The way you do it, you actually try to run a command named after whatever the current content of $randomNumbers is.
Edit:
Another thing: since your columns are delimited by a whitespace character, you could also just read the entire line into an array whose elements are separated by whitespace as well. One way to achieve this is:
columns=( $(echo "$randomNumbers" | grep -o "[^ ]*") )
./queueSim mm1-queue ${columns[@]::2}
The first line matches all the substrings that do not contain any spaces and puts them, one per element, into the array columns. The second line does the same thing as the corresponding one in your implementation: it inserts the first two columns as parameters. The selection is done with slicing: you take the entire array ${columns[@]} and apply the bound ::2, which expands to the elements starting at position 0 up to, but not including, position 2.
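A quick illustration of that slice syntax on a throwaway array:
demo=( 10 20 30 40 )
echo "${demo[@]::2}"    # prints: 10 20
echo "${demo[@]:1:2}"   # prints: 20 30 (offset 1, length 2)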
