convert factor to numeric in bash - bash

What's the most efficient way to convert a factor vector (not all levels are unique) into a numeric vector in bash? The values in the numeric vector do not matter as long as each represents a unique level of the factor.
To illustrate, this would be the R equivalent to what I want to do in bash:
numeric<-seq_along(levels(factor))[factor]
I.e.:
factor
AV1019A
ABG1787
AV1019A
B77hhA
B77hhA
numeric
1
2
1
3
3
Many thanks.

This is most probably not the most efficient way, but it may be something to start with.
#!/bin/bash
input_data=$( mktemp )
map_file=$( mktemp )
# your example written to a file
echo -e "AV1019A\nABG1787\nAV1019A\nB77hhA\nB77hhA" >> $input_data
# create a map <numeric, factor> and write to file
idx=0
for factor in $( cat $input_data | sort -u )
do
echo $idx $factor
let idx=$idx+1
done > $map_file
# go through your file again and replace values with keys
while read line
do
key=$( cat $map_file | grep -e ".* ${line}$" | awk '{print $1}' )
echo $key
done < $input_data
# cleanup
rm -f $input_data $map_file
I initially wanted to use associative arrays, but they are a bash 4+ feature and are not available everywhere. If you have bash 4, you need one file less, which is obviously more efficient.
#!/bin/bash
# your example written to a file
input_data=$( mktemp )
echo -e "AV1019A\nABG1787\nAV1019A\nB77hhA\nB77hhA" >> $input_data
# declare an array
declare -a factor_map=($( cat $input_data | sort -u | tr "\n" " " ))
# go through your file replace values with keys
while read line
do
echo ${factor_map[@]/$line//} | cut -d/ -f1 | wc -w | tr -d ' '
done < $input_data
# cleanup
rm -f $input_data
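For comparison, here is a minimal sketch of the associative-array idea mentioned above, assuming bash 4+ and non-empty input lines. The function name factor_to_numeric is made up for illustration, and numbers are handed out in order of first appearance rather than in sorted order, which the question allows.

```shell
#!/usr/bin/env bash
# Sketch: map each distinct line on stdin to an integer (needs bash 4+).
# factor_to_numeric is a hypothetical helper name, not part of any tool.
factor_to_numeric() {
    local -A seen=()      # level -> assigned number
    local next=1 line
    while IFS= read -r line; do
        # Hand out the next free number the first time a level appears.
        if [[ -z ${seen[$line]+set} ]]; then
            seen[$line]=$next
            next=$((next + 1))
        fi
        printf '%s\n' "${seen[$line]}"
    done
}

# The example from the question; prints 1 2 1 3 3, one number per line.
printf '%s\n' AV1019A ABG1787 AV1019A B77hhA B77hhA | factor_to_numeric
```

This makes a single pass over the input and needs no temporary files.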

Grep -rl from a .txt list

I'm trying to locate a list of strings from a .txt file. The search target is a directory of multiple .csv files, and I want to find out which .csv files contain each string.
I have already found how to do it manually:
grep -rl doggo C:\dirofcsv\
The next step is to do it from a list of hundreds of terms.
I tried grep -rl -f list.txt C:\dirofcsv < print.txt but only the last term is printed. I want the results line by line.
I'm missing something but I don't know where.
I'm working on Windows with a terminal emulator.
EDIT: I've found how to list the terms from a file. Now I need to see which terms have which results, like "doggo => file2, file4". Do I need to write a loop?
Thanks community.
grep -rl -f list.txt C:\dirofcsv >> print.txt
You are looking to append lines to the print.txt file and so will need to use >> as opposed to > which will overwrite what is already in the file.
To get the output listed in the output required in your edited requirement, you can use a loop redirected back into awk:
awk '/^FILE -/ { fil=$3; # When a line starts with "FILE -", set fil to the third space-delimited field (the search term)
next # Skip to the next line
}
{ arr[fil][$0]="" # Two-dimensional array (needs GNU awk 4+): the search term (fil) is the first index, the file name the second
}
END { for (i in arr) { # Loop through the search terms
printf "%s => ",i # First print the search term in the format required
for (j in arr[i]) {
printf "%s,",j # Print the file name followed by a comma
}
printf "\n" # Print a new line
}
}' <<< "$(while read -r line # Read list.txt line by line
do
echo "FILE - $line" # Echo a marker for identification in awk
grep -rl "$line" 'C:\dirofcsv' # Grep recursively for the line; the path is quoted so the backslash survives
done < list.txt)" >> print.txt
One liner:
awk '/^FILE -/ { fil=$3;next } { arr[fil][$0]="" } END { for (i in arr) { printf "%s => ",i;for (j in arr[i]) { printf "%s,",j } printf "\n" } }' <<< "$(while read -r line;do echo "FILE - $line";grep -rl "$line" 'C:\dirofcsv';done < list.txt)" >> print.txt
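The same "term => files" report can also be sketched in plain bash without awk. This is only an illustration under some assumptions: grep_terms is a made-up helper name, terms are matched as fixed strings (-F) rather than as regular expressions, and list.txt uses Unix line endings.

```shell
#!/usr/bin/env bash
# Sketch: print "term => file1,file2" for every non-empty line of a list file.
# grep_terms is a hypothetical helper; -F (literal matching) is an assumption.
grep_terms() {
    local listfile=$1 dir=$2 term files
    while IFS= read -r term || [[ -n $term ]]; do
        [[ -n $term ]] || continue      # skip blank lines
        # -r recurse, -l print file names only; sort keeps the output stable,
        # paste joins the names with commas.
        files=$(grep -rlF -- "$term" "$dir" | sort | paste -sd, -)
        printf '%s => %s\n' "$term" "$files"
    done < "$listfile"
}
```

It could be called as grep_terms list.txt 'C:\dirofcsv' >> print.txt. Terms without any match are still listed, with nothing after the arrow.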
I think you meant to pass the command as:
grep -rl -f list.txt C:\dirofcsv >> print.txt
Give it a shot. It should take all patterns from list.txt line by line and search in the directory C:\dirofcsv for files with matching patterns and print their names to print.txt file.
Try this for printing without a loop (just like you asked in comments ;-)
One Line Answer
dir=C:\dirofcsv
listfile=list.txt
eval $(jq -Rsr 'split("\n") | map(select(length > 0)) | reduce .[] as $line ([]; . + ["echo \($line) :; grep -rl \($line) \($dir); echo"]) | (join("; "))' --arg dir "$dir" < "$listfile")
Another solution, spelled out for explanation:
unset li
readarray li <"$listfile"
quoted_commands="$(jq -R 'reduce inputs as $line ([]; . + ["echo \($line) :; grep -rl \($line) \($dir); echo"]) | (join("; "))' \
--arg dir "$dir" \
<<< "$(echo; printf "%s" "${li[@]}")")"
quoted_commands=${quoted_commands%\"}
commands=${quoted_commands#\"}
eval $commands
Breaking down the command, with explanations in comments:
# read the contents of listfile into li
unset li && readarray li <"$listfile"
# add the content to a new list so that the list elements are printed on separate lines
# also add a newline at the top, as the first line will be discarded by jq (in this case only)
list="$(echo; printf "%s" "${li[@]}";)"
# pass jq command
quoted_commands="$(jq -R 'reduce inputs as $line
([]; . + ["echo \($line) :; grep -rl \($line) \($dir); echo"])
| (join("; "))' \
--arg dir $dir <<< "$list")"
# the elements are read with reduce filter and converted to JSON Array of corresponding commands to execute
# the commands for all elements of list are joined with join filter
# trim quotes to execute commands properly
commands=$(sed -e 's/^"//' -e 's/"$//' <<< "$quoted_commands")
# run commands
eval "$commands"
You may want to print the above variables. Take care to use quotes in echo/printf while doing so, i.e., echo "$variable".
Replacement of sed command:
single_quoted=${quoted_commands%\"}
commands=${single_quoted#\"}
echo "$commands"
I am now using the following implementations (though the dictionary implementation uses a for loop, the key : value implementation doesn't, and is a single line command):
# print an Associative bash array as a JSON dictionary
print_dict()
{
declare -n ref
ref=$1
for k in "${!ref[@]}"
do
printf '{"name":"%s", "value":"%s"}\n' "$k" "${ref[$k]}"
done | jq -s 'reduce .[] as $i ({}; .[$i.name] = $i.value)'
}
#-------------------------------------------------------------------------
# print the grep output in key : value format
function list_grep()
{
local listfile=$1
local dir=$2
eval $(jq -Rsr 'split("\n") | map(select(length > 0)) | reduce .[] as $line ([]; . + ["echo \($line) :; grep -rl \($line) \($dir); echo"]) | (join("; "))' --arg dir "$dir" < "$listfile")
}
#-------------------------------------------------------------------------
# print the grep output as JSON dictionary
function dict_grep()
{
local listfile=$1
local dir=$2
eval declare -A Arr=\($(eval echo $(jq -Rrs 'split("\n") | map(select(length > 0)) | reduce .[] as $k ([]; . + ["[\($k)]=\\\"$(grep -rl \($k) \($dir))\\\""]) | (join(" "))' --arg dir "$dir" < "$listfile"))\)
print_dict Arr
}
#-------------------------------------------------------------------------
# call:
list_grep $listfile $dir
dict_grep $listfile $dir
-Himanshu

Bash: Split two strings directly into associative array

I have two strings of same number of substrings divided by a delimiter.
I need to create key-value pairs from substrings.
Short example:
Input:
firstString='00011010:00011101:00100001'
secondString='H:K:O'
delimiter=':'
Desired result:
${translateMap['00011010']} -> 'H'
${translateMap['00011101']} -> 'K'
${translateMap['00100001']} -> 'O'
So, I wrote:
IFS="$delimiter" read -ra fromArray <<< "$firstString"
IFS="$delimiter" read -ra toArray <<< "$secondString"
declare -A translateMap
curIndex=0
for from in "${fromArray[@]}"; do
translateMap["$from"]="${toArray[curIndex]}"
((curIndex++))
done
Is there any way to create the associative array directly from 2 strings without the unneeded arrays and loop? Something like:
IFS="$delimiter" read -rA translateMap["$(read -ra <<< "$firstString")"] <<< "$secondString"
Is it possible?
A (somewhat convoluted) variation on @accdias's answer of assigning the values via the declare -A command, which needs a bit of explanation for each step ...
First we need to break the 2 variables into separate lines for each item:
$ echo "${firstString}" | tr "${delimiter}" '\n'
00011010
00011101
00100001
$ echo "${secondString}" | tr "${delimiter}" '\n'
H
K
O
What's nice about this is that we can now process these 2 sets of key/value pairs as separate files.
NOTE: For the rest of this discussion I'm going to replace "${delimiter}" with ':' to make this a tad bit (but not much) less convoluted.
Next we make use of the paste command to merge our 2 'files' into a single file; we'll also designate ']' as the delimiter between key/value mappings:
$ paste -d ']' <(echo "${firstString}" | tr ':' '\n') <(echo "${secondString}" | tr ':' '\n')
00011010]H
00011101]K
00100001]O
We'll now run these results through a couple sed patterns to build our array assignments:
$ paste -d ']' <(echo "${firstString}" | tr ':' '\n') <(echo "${secondString}" | tr ':' '\n') | sed 's/^/[/g;s/]/]=/g'
[00011010]=H
[00011101]=K
[00100001]=O
What we'd like to do now is use this output in the typeset -A command but unfortunately we need to build the entire command and then eval it:
$ evalstring="typeset -A kv=( "$(paste -d ']' <(echo "${firstString}" | tr ':' '\n') <(echo "${secondString}" | tr ':' '\n') | sed 's/^/[/g;s/]/]=/g')" )"
$ echo "$evalstring"
typeset -A kv=( [00011010]=H
[00011101]=K
[00100001]=O )
If we want to remove the newlines and put everything on a single line, we append another tr to the output of the sed command:
$ evalstring="typeset -A kv=( "$(paste -d ']' <(echo "${firstString}" | tr ':' '\n') <(echo "${secondString}" | tr ':' '\n') | sed 's/^/[/g;s/]/]=/g' | tr '\n' ' ')" )"
$ echo "${evalstring}"
typeset -A kv=( [00011010]=H [00011101]=K [00100001]=O )
At this point we can eval our auto-generated typeset -A command:
$ eval "${evalstring}"
And now loop through our array displaying the key/value pairs:
$ for i in "${!kv[@]}"; do echo "kv[${i}] = ${kv[${i}]}"; done
kv[00011010] = H
kv[00100001] = O
kv[00011101] = K
Hey, I did say this would be a bit convoluted! :-)
It is probably not what you expect, but this works:
key_string="A:B:C:D"
val_string="1:2:3:4"
declare -A map
while [ -n "$key_string" ] && [ -n "$val_string" ]; do
IFS=: read -r key key_string <<<"$key_string"
IFS=: read -r val val_string <<<"$val_string"
map[$key]="$val"
done
for key in "${!map[@]}"; do echo "$key => ${map[$key]}"; done
On each pass, read puts the first field into key (or val) and assigns the rest of the string back to key_string (or val_string), so both strings shrink by one field per iteration.
The downside of this method is that it destroys the original strings. The while loop runs as long as both strings have a non-zero length.
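The splitting behaviour of read that this loop depends on can be seen in isolation. A minimal sketch, with variable names (s, head, rest) picked only for illustration:

```shell
#!/usr/bin/env bash
# With more fields than variables, read gives the last variable
# the whole remainder of the line, delimiters included.
s='A:B:C'
IFS=: read -r head rest <<< "$s"
printf 'head=%s rest=%s\n' "$head" "$rest"   # prints: head=A rest=B:C
```

When only one field is left, rest ends up empty, and that is what stops the while loop.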
Besides the pure-bash approach above, you could use any command to generate the associative array. See How do I populate a bash associative array with command output?
This generally looks like:
declare -A map="( $( magic_command ) )"
where the magic_command generates an output like
[key1]=val1
[key2]=val2
[key3]=val3
In this case we use the command:
paste -d "" <(echo "[${key_string//:/]=$'\n'[}]=") \
<(echo "${val_string//:/$'\n'}")
where we use bash substitution to replace the delimiter with a newline. However, any other magic_command might do. For completeness:
key_string="A:B:C:D"
val_string="1:2:3:4"
declare -A map="( $(paste -d "" <(echo "[${key_string//:/]=$'\n'[}]=") \
<(echo "${val_string//:/$'\n'}")) )"
for key in "${!map[@]}"; do echo "$key => ${map[$key]}"; done
Both examples generate the following output
D => 4
C => 3
B => 2
A => 1
Not exactly the answer for what you asked but at least it is shorter:
key='00011010:00011101:00100001'
value='H:K:O'
ifs=':'
IFS="$ifs" read -ra keys <<< "$key"
IFS="$ifs" read -ra values <<< "$value"
declare -A kv
for ((i=0; i<${#keys[*]}; i++)); do
kv[${keys[i]}]=${values[i]}
done
As a side note, you can initialize an associative array in one step with:
declare -A kv=([key1]=value1 [key2]=value2 [keyn]=valuen)
But I don't know how to use that in your case.
If the values in your strings never contain spaces, I would suggest this approach:
firstString='00011010:00011101:00100001'
secondString='H:K:O'
delimiter=':'
declare -A translateMap
firstArray=( ${firstString//$delimiter/' '} )
secondArray=( ${secondString//$delimiter/' '} )
for i in "${!firstArray[@]}"; {
translateMap[${firstArray[$i]}]=${secondArray[$i]}
}

Reading a file in a shell script and selecting a section of the line

This is probably pretty basic. I want to read in an occurrence file.
Then the program should find all occurrences of "CallTilEdb" in the file Hendelse.logg:
CallTilEdb 8
CallCustomer 9
CallTilEdb 4
CustomerChk 10
CustomerChk 15
CallTilEdb 16
and sum up the right column. For this case it would be 8 + 4 + 16, so the output I want would be 28.
I'm not sure how to do this, and this is as far as I have gotten with vistid.sh:
#!/bin/bash
declare -t filename=hendelse.logg
declare -t occurance="$1"
declare -i sumTime=0
while read -r line
do
if [ "$occurance" = $(cut -f1 line) ] #line 10
then
sumTime+=$(cut -f2 line)
fi
done < "$filename"
so the execution in terminal would be
vistid.sh CallTilEdb
but the error I get now is:
/home/user/bin/vistid.sh: line 10: [: unary operator expected
You have a nice approach, but you could use awk to do the same thing, and much faster:
$ awk -v par="CallTilEdb" '$1==par {sum+=$2} END {print sum+0}' hendelse.logg
28
It may look a bit weird if you haven't used awk so far, but here is what it does:
-v par="CallTilEdb" provide an argument to awk, so that we can use par as a variable in the script. You could also do -v par="$1" if you want to use a variable provided to the script as parameter.
$1==par {sum+=$2} this means: if the first field is the same as the content of the variable par, then add the second column's value into the counter sum.
END {print sum+0} this means: once you are done from processing the file, print the content of sum. The +0 makes awk print 0 in case sum was not set... that is, if nothing was found.
In case you really want to make it with bash, you can use read with two parameters, so that you don't have to make use of cut to handle the values, together with some arithmetic operations to sum the values:
#!/bin/bash
declare -t filename=hendelse.logg
declare -t occurance="$1"
declare -i sumTime=0
while read -r name value # read both values with -r for safety
do
if [ "$occurance" == "$name" ]; then # string comparison
((sumTime+=$value)) # sum
fi
done < "$filename"
echo "sum: $sumTime"
So that it works like this:
$ ./vistid.sh CallTilEdb
sum: 28
$ ./vistid.sh CustomerChk
sum: 25
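First of all, you need to change the way you call cut: it needs the line on its input, and a space as the delimiter (the default is a tab):
$( echo $line | cut -d' ' -f1 )
In line 10 the line variable is never expanded; the test should read:
if [ "$occurance" = "$( echo $line | cut -d' ' -f1 )" ]
You can then sum by doing:
sumTime=$(( sumTime + $( echo $line | cut -d' ' -f2 ) ))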
But you can also use a different approach and put the line values in an array, the final script will look like:
#!/bin/bash
declare -t filename=hendelse.logg
declare -t occurance="$1"
declare -i sumTime=0
# read each line into an array: line[0] is the name, line[1] the value
while read -r -a line
do
if [ "$occurance" = "${line[0]}" ]
then
sumTime=$(( sumTime + ${line[1]} ))
fi
done < "$filename"
echo $sumTime
For reference,
id="CallTilEdb"
file="Hendelse.logg"
sum=$(echo "0 $(sed -n "s/^$id[^0-9]*\([0-9]*\)/\1 +/p" < "$file") p" | dc)
echo SUM: $sum
prints
SUM: 28
sed extracts the numbers from the lines starting with the given id, such as CallTilEdb,
and prints them in the format number +
echo prepares a string such as 0 8 + 4 + 16 + p, which is the calculation in RPN format
dc does the calculation
another variant:
sum=$(sed -n "s/^$id[^0-9]*\([0-9]*\)/\1/p" < "$file" | paste -sd+ - | bc)
#or
sum=$(grep -oP "^$id\D*\K\d+" < "$file" | paste -sd+ - | bc)
sed (or grep) extracts and prints only the numbers
paste builds a string like number+number+number (-d+ sets + as the delimiter)
bc does the calculation
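The paste step can be tried on its own. In this small sketch the numbers are hard-coded from the example file, and bash arithmetic expansion stands in for bc, which is a substitution made here so the snippet has no external dependency beyond paste:

```shell
#!/usr/bin/env bash
# Join one number per line into a single "a+b+c" expression.
expr=$(printf '%s\n' 8 4 16 | paste -sd+ -)
echo "$expr"          # prints: 8+4+16
# Evaluate the expression with shell arithmetic instead of bc.
echo $(( $expr ))     # prints: 28
```

Shell arithmetic only handles integers, so bc remains the better choice if the column can hold decimal values.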
or perl
sum=$(perl -slanE '$s+=$F[1] if /^$id/}{say $s' -- -id="$id" "$file")
sum=$(ID="CallTilEdb" perl -lanE '$s+=$F[1] if /^$ENV{ID}/}{say $s' "$file")
Awk translation to script:
#!/bin/bash
declare -t filename=hendelse.logg
declare -t occurance="$1"
declare -i sumTime=0
sumTime=$(awk -v entry="$occurance" '
$1==entry{time+=$NF+0}
END{print time+0}' "$filename")
echo "$sumTime"

Elegant way to check for equal values within an array or any given textfile

Hello, I'm fairly new to scripting and am struggling to check whether 4 lines in a text file are equal to each other. I cannot figure this one out, since the comparison examples I find all use two variables. I've come up with this:
#!/bin/sh
#check if mxf videofiles are older than 10 minutes and parse them into tclist.txt
find . -amin +10 |sed "s/^..//" >tclist.txt
#grep timecode and cut : from the output of mxfprobe and place that into variable TC
for z in $(cat tclist.txt); do TC=$(mxfprobe -i "$z" 2>&1 |grep timecode|sed "s/[^0-9]*//"|sed "s/://"|sed "s/://"|sed "s/://")
echo $TC >>offsetcheck.txt
done;
The output of offsetcheck.txt then looks like this:
10194013
10194013
10194014
10194014
How can I test if those 4 values are equal to each other? (In this example two files have drifted by one frame.)
I've tried to place those values into an array and check them for uniqueness...
exec 10<&0
exec < offsetcheck.txt
let count=0
while read LINE; do
ARRAY[$count]=$LINE
((count++))
done
echo ${ARRAY[@]}
exec 0<&10 10<&-
if ($ARRAY !== array_unique($ARRAY))
{
echo There were duplicate values
}
... struggling with trying to test/check if 4 lines in a textfile are
equal to eachother
You could use sort and wc to determine the number of unique values in the file. The following tells you whether all lines in the file are equal (that is, there is exactly one unique value) or not:
(( $(sort -u offsetcheck.txt | wc -l) == 1 )) && echo "All values are equal" || echo "Values differ"
If you wanted to do the same for an array, you could say:
for i in "${ARRAY[@]}"; do echo "$i" ; done | sort -u | wc -l
to get the number of unique values in the array.
If the values in the array are guaranteed not to contain any spaces, then saying:
echo "${ARRAY[@]}" | tr ' ' '\n' | sort -u | wc -l
would suffice. (But note the if above.)
Looks to me like the whole process can be reduced to
n=$(
find . -amin +10 |
sed "s/^..//" |
xargs -I FILE mxfprobe -i "FILE" 2>&1 |
grep -h timecode |
sed 's/[^0-9]//g' |
sort -u |
wc -l
)
Then check if n == 1
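That final check can be wrapped in a small function. This is a sketch only: all_equal is a hypothetical name, and an empty file counts as "not equal" here because it yields zero unique lines.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if every line of the given file is identical.
# all_equal is a hypothetical helper name.
all_equal() {
    local n
    n=$(sort -u -- "$1" | wc -l)   # number of distinct lines
    (( n == 1 ))
}
```

For example: all_equal offsetcheck.txt && echo "in sync" || echo "timecodes differ".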

Simple method to shuffle the elements of an array in BASH shell?

I can do this in PHP but am trying to work within the BASH shell. I need to take an array and then randomly shuffle the contents and dump that to somefile.txt.
So given array Heresmyarray, of elements a;b;c;d;e;f; it would produce an output file, output.txt, which would contain elements f;c;b;a;e;d;
The elements need to retain the semicolon delimiter. I've seen a number of bash shell array operations but nothing that seems even close to this simple concept. Thanks for any help or suggestions!
The accepted answer doesn't match the headline question too well, though the details in the question are a bit ambiguous. The question asks about how to shuffle elements of an array in BASH, and kurumi's answer shows a way to manipulate the contents of a string.
kurumi nonetheless makes good use of the 'shuf' command, while siegeX shows how to work with an array.
Putting the two together yields an actual "simple method to shuffle the elements of an array in BASH shell":
$ myarray=( 'a;' 'b;' 'c;' 'd;' 'e;' 'f;' )
$ myarray=( $(shuf -e "${myarray[@]}") )
$ printf "%s" "${myarray[@]}"
d;b;e;a;c;f;
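If the elements may contain spaces, the unquoted command substitution above would split them. A variant that reads the shuffled lines back with mapfile avoids that; this is a sketch assuming bash 4+ and GNU shuf, and it still assumes no element contains a newline. The array name shuffled is chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: shuffle an array without word-splitting its elements.
myarray=( 'a;' 'b;' 'c;' 'd;' 'e;' 'f;' )
# shuf -e treats each argument as one input line; mapfile -t reads the
# shuffled lines back, one array element per line.
mapfile -t shuffled < <(shuf -e -- "${myarray[@]}")
printf '%s' "${shuffled[@]}"
echo
```

Redirecting the printf to output.txt gives the file asked for in the question.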
From the BashFaq
This function shuffles the elements of an array in-place using the Knuth-Fisher-Yates shuffle algorithm.
#!/bin/bash
shuffle() {
local i tmp size max rand
# $RANDOM % (i+1) is biased because of the limited range of $RANDOM
# Compensate by using a range which is a multiple of the array size.
size=${#array[*]}
max=$(( 32768 / size * size ))
for ((i=size-1; i>0; i--)); do
while (( (rand=$RANDOM) >= max )); do :; done
rand=$(( rand % (i+1) ))
tmp=${array[i]} array[i]=${array[rand]} array[rand]=$tmp
done
}
# Define the array named 'array'
array=( 'a;' 'b;' 'c;' 'd;' 'e;' 'f;' )
shuffle
printf "%s" "${array[@]}"
Output
$ ./shuff_ar > somefile.txt
$ cat somefile.txt
b;c;e;f;d;a;
If you just want to put them into a file (use redirection > )
$ echo "a;b;c;d;e;f;" | sed -r 's/(.[^;]*;)/ \1 /g' | tr " " "\n" | shuf | tr -d "\n"
d;a;e;f;b;c;
$ echo "a;b;c;d;e;f;" | sed -r 's/(.[^;]*;)/ \1 /g' | tr " " "\n" | shuf | tr -d "\n" > output.txt
If you want to put the items in array
$ array=( $(echo "a;b;c;d;e;f;" | sed -r 's/(.[^;]*;)/ \1 /g' | tr " " "\n" | shuf | tr -d " " ) )
$ echo ${array[0]}
e;
$ echo ${array[1]}
d;
$ echo ${array[2]}
a;
If your data has &#abcde;
$ echo "a;&#abcde;c;d;e;f;" | sed -r 's/(.[^;]*;)/ \1 /g' | tr " " "\n" | shuf | tr -d "\n"
d;c;f;&#abcde;e;a;
$ echo "a;&#abcde;c;d;e;f;" | sed -r 's/(.[^;]*;)/ \1 /g' | tr " " "\n" | shuf | tr -d "\n"
&#abcde;f;a;c;d;e;
