How do I translate a list using a dictionary in bash? - bash

Say I have a dictionary TSV file dict.txt:
apple pomme
umbrella parapluie
glass verre
... ...
and another file list.txt containing a list of words (from the left column of dict.txt):
pie
apple
blue
...
I'd like to translate them into the corresponding words from the right column of dict.txt, i.e:
tarte
pomme
bleu
...
What is the easiest way to do so?

You can use awk:
awk 'FNR==NR{a[$1]=$2;next} a[$1]{print a[$1]}' dict.txt list.txt
EDIT: If the dictionary needs to hold multi-word translations (words separated by spaces), use tab as the field separator:
awk -F '\t' 'FNR==NR{a[$1]=$2;next} a[$1]{print a[$1]}' dict.txt list.txt
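A minimal sketch of the same awk idea wrapped in a shell function. It assumes a tab-separated dictionary, tests membership with `in` so that lookups do not create empty entries, and marks unknown words with a leading `?`. The function name `translate_list` and the temporary file names are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate each word in a list using a TSV dictionary.
translate_list() {
    local dict=$1 list=$2
    awk -F '\t' '
        FNR == NR { a[$1] = $2; next }   # first file: load the dictionary
        ($1 in a) { print a[$1]; next }  # known word: print its translation
        { print "?" $1 }                 # unknown word: flag it instead of dropping it
    ' "$dict" "$list"
}

# Demo with throwaway files (assumed sample data)
tmp=$(mktemp -d)
printf 'apple\tpomme\nglass\tverre\n' > "$tmp/dict.txt"
printf 'glass\napple\nblue\n' > "$tmp/list.txt"
translate_list "$tmp/dict.txt" "$tmp/list.txt"
rm -rf "$tmp"
```

Flagging unknown words keeps the output aligned line by line with the input list, which the original one-liner does not do because it silently skips them.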

If you don't have many words (so that everything fits in memory) you can use an associative array:
#!/bin/bash
declare -A english2french=()
# Build dictionary
linenb=0
while ((++linenb)) && IFS=$'\t' read -r en fr; do
if [[ -z $fr ]] || [[ -z $en ]]; then
echo "Error line $linenb: one of the two is empty fr=\`$fr' en=\`$en'"
continue
fi
english2french["$en"]=$fr
done < dict.txt
# Translate
linenb=0
while ((++linenb)) && read -r en; do
[[ -z $en ]] && continue
fr=${english2french["$en"]}
if [[ -n $fr ]]; then
echo "$fr"
else
echo >&2 "Error line $linenb: word \`$en' unknown"
fi
done < list.txt
It seems a bit long, but there are lots of error checks ;).

Related

Get first character of each string with BASH_REMATCH

I'm trying to get the first character of each string using regex and BASH_REMATCH in a shell script.
My input text file contains:
config_text = STACK OVER FLOW
The strings STACK OVER FLOW must be uppercase like that.
My output should be something like this :
SOF
My code for now is :
var = config_text
values=$(grep $var test_file.txt | tr -s ' ' '\n' | cut -c 1)
if [[ $values =~ [=(.*)]]; then
echo $values
fi
As you can see, I'm using tr and cut, but I'm looking to replace them with BASH_REMATCH only, because these two commands have been reported in many links as not working on macOS.
I tried something like this :
var = config_text
values=$(grep $var test_file.txt)
if [[ $values =~ [=(.*)(\b[a-zA-Z])]]; then
echo $values
fi
VALUES as I explained should be :
S O F
But it seems \b does not work on shell script.
Anyone have an idea how to get my desired output with BASH_REMATCH ONLY.
Thanks in advance for any help.
A generic BASH_REMATCH solution handling any number of words and any separator.
# "local" is only valid inside a function, so plain assignments are used here
input="STACK OVER FLOW" pattern='([[:upper:]]+)([^[:upper:]]*)' result=""
while [[ $input =~ $pattern ]]; do
result+="${BASH_REMATCH[1]::1}${BASH_REMATCH[2]}"
input="${input:${#BASH_REMATCH[0]}}"
done
echo "$result"
# Output: "S O F"
Bash's regexes are kind of cumbersome if you don't know how many words there are in the input string. How's this instead?
config_text="STACK OVER FLOW"
sed 's/\([^[:space:]]\)[^[:space:]]*/\1/g' <<<"$config_text"
First, add a valid shebang and paste your script at https://shellcheck.net for validation and recommendations.
With the assumption that the line starts with config and ends with FLOW e.g.
config_text = STACK OVER FLOW
Now the script.
#!/usr/bin/env bash
values="config_text = STACK OVER FLOW"
regexp="config_text = ([[:upper:]]{1})[^ ]+ ([[:upper:]]{1})[^ ]+ ([[:upper:]]{1}).+$"
while IFS= read -r line; do
[[ "$line" = "$values" && "$values" =~ $regexp ]] &&
printf '%s %s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
done < test_file.txt
If there is only one line, or the target string is on the first line of test_file.txt, the while loop is not needed.
#!/usr/bin/env bash
values="config_text = STACK OVER FLOW"
regexp="config_text = ([[:upper:]]{1})[^ ]+ ([[:upper:]]{1})[^ ]+ ([[:upper:]]{1}).+$"
IFS= read -r line < test_file.txt
[[ "$line" = "$values" && "$values" =~ $regexp ]] &&
printf '%s %s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
Make sure you are running Bash v4+, since macOS ships with Bash v3 by default.
See How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
Another option rather than bash regex would be to utilize bash parameter expansion substring ${parameter:offset:length} to extract the desired characters:
$ read -ra arr <text.file ; printf "%s%s%s\n" "${arr[2]:0:1}" "${arr[3]:0:1}" "${arr[4]:0:1}"
SOF
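The parameter-expansion idea can be generalised to any number of words with a loop. The sketch below assumes the value is whatever follows the first `=` on the line; the function name `first_letters` is hypothetical. It uses only bash builtins, so it does not depend on tr or cut.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first character of every word after the first "=".
first_letters() {
    local line=$1 value word result=""
    local -a words
    value=${line#*=}               # drop everything up to and including the first "="
    read -ra words <<< "$value"    # split the remainder on whitespace
    for word in "${words[@]}"; do
        result+=${word:0:1}        # substring expansion: offset 0, length 1
    done
    printf '%s\n' "$result"
}

first_letters "config_text = STACK OVER FLOW"
```

To get the space-separated form instead, the loop body could append `"${word:0:1} "` and the trailing space could be trimmed afterwards.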

Using a FOR loop to compare items in a list with items in an ARRAY

Without too much fluff, basically I'm creating an array of IP addresses from a user provided file. Then I have another file with three columns of data and multiple lines, the first column is IP addresses.
What I'm trying to do is loop through the file with 3 columns of data and compare the IP addresses with the values in the array. If an IP from the file is present in the array, I want to print some text as well as the 3rd column from that line of the file.
I have a feeling I'm taking a really wrong approach and making things a lot harder than what they need to be!
Semi-Pseudo code below
#!/bin/bash
scopeFile=$1
data=$2
scopeArray=()
while IFS= read -r line; do
scopeArray+=("$line")
done <$1
for line in $2; do
if [[ $line == scopeArray ]]; then
awk '{print $3 " is in scope!"}' $2;
else
echo "$line is NOT in scope!"
fi;
done
EDIT: Added example files for visualisation and context. The data.txt
file is dynamically generated elsewhere, but the format is always the same.
scope.txt=$1
192.168.0.14
192.168.0.15
192.168.0.16
data.txt=$2
192.168.0.14 : example.com
192.168.0.15 : foobar.com
192.168.0.19 : test.com
Here is one way of doing what you wanted.
#!/usr/bin/env bash
mapfile -t scopeArray < "$1"
while read -r col1 col2 col3; do
found=0
for item in "${!scopeArray[@]}"; do
if [[ $col1 == "${scopeArray[item]}" ]]; then
printf '%s is in scope!\n' "$col3"
found=1
break
fi
done
# report a miss only once, after every scope entry has been checked
((found)) || printf '%s is not in scope!\n' "$col1" >&2
done < "$2"
The shell is not the best tool, and arguably not the right one, for comparing files, but it will get you there slowly but surely.
Note that mapfile is a bash 4+ feature, just FYI.
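As an alternative to the nested loop, the scope list can be loaded into an associative array so each IP is checked with a single lookup. This is only a sketch under the assumption that the data file uses the `IP : host` layout shown in the question. The function name `report_scope` and the demo file names are hypothetical, and associative arrays need bash 4 or later.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which hosts in the data file have an in-scope IP.
report_scope() {
    local scope_file=$1 data_file=$2 ip sep host
    local -A in_scope=()
    # Load every non-empty scope line as a key
    while IFS= read -r ip; do
        [[ -n $ip ]] && in_scope["$ip"]=1
    done < "$scope_file"
    # Each data line is split into: IP, the ":" separator, and the host
    while read -r ip sep host; do
        [[ -z $ip ]] && continue
        if [[ -n ${in_scope["$ip"]:-} ]]; then
            printf '%s is in scope!\n' "$host"
        else
            printf '%s is NOT in scope!\n' "$ip"
        fi
    done < "$data_file"
}

# Demo with the sample data from the question
tmp=$(mktemp -d)
printf '192.168.0.14\n192.168.0.15\n192.168.0.16\n' > "$tmp/scope.txt"
printf '192.168.0.14 : example.com\n192.168.0.15 : foobar.com\n192.168.0.19 : test.com\n' > "$tmp/data.txt"
report_scope "$tmp/scope.txt" "$tmp/data.txt"
rm -rf "$tmp"
```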

Running math, ignoring non-numeric values

I am trying to do some math on the 2nd column of a txt file, but some lines are not numbers. I only want to operate on the lines that contain numbers and keep the other lines unchanged.
The txt file looks like this:
aaaaa
1 2
3 4
How can I do this?
Doubling the second column in any line that doesn't contain any alphabetic content might look a bit like the following in native bash:
#!/bin/bash
# iterate over lines in input file
while IFS= read -r line; do
if [[ $line = *[[:alpha:]]* ]]; then
# line contains letters; emit unmodified
printf '%s\n' "$line"
else
# break into a variable for the first word, one for the second, one for the rest
read -r first second rest <<<"$line"
if [[ $second ]]; then
# we extracted a second word: emit it, doubled, between the first word and the rest
printf '%s\n' "$first $(( second * 2 )) $rest"
else
# no second word: just emit the whole line unmodified
printf '%s\n' "$line"
fi
fi
done
This reads from stdin and writes to stdout, so usage is something like:
./yourscript <infile >outfile
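A variation on the same loop, sketched under the assumption that "numeric" means the second field consists only of digits. Instead of looking for letters anywhere in the line, it validates the second field directly and forces base 10 so that a value such as `08` is not read as octal. The function name `double_second_column` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: double the second field when it is a plain non-negative integer.
double_second_column() {
    local line first second rest
    while IFS= read -r line; do
        read -r first second rest <<< "$line"
        if [[ $second =~ ^[0-9]+$ ]]; then
            # 10# forces decimal; ${rest:+ $rest} adds the tail only when it exists
            printf '%s\n' "$first $(( 10#$second * 2 ))${rest:+ $rest}"
        else
            # no numeric second field: emit the line unmodified
            printf '%s\n' "$line"
        fi
    done
}

printf 'aaaaa\n1 2\n3 4\n' | double_second_column
```

Modified lines are rebuilt with single spaces between fields, so any original column alignment is not preserved on those lines.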
Thanks, all. This is only my second time using this website, and I find it very helpful to get an answer so quickly.
I also found the answer below:
#!/bin/bash
FILE=$1
while read -r f1 f2; do
# the first field contains nothing but digits
if [[ $f1 != *[!0-9]* ]]; then
f2=$(echo "$f2 - 1" | bc)
echo "$f1 $f2"
else
echo "$f1 $f2"
fi
done < "$FILE"

bash: read line and keep spaces

I am trying to read lines from a file containing multiple lines. I want to identify lines that contain only spaces.
By definition, an empty line is empty and does not contain anything (including spaces).
I want to detect lines that seems to be empty but they are not (lines that contain spaces only)
while read line; do
if [[ `echo "$line" | wc -w` == 0 && `echo "$line" | wc -c` > 1 ]];
then
echo "Fake empty line detected"
fi
done < "$1"
But because read strips spaces at the start and at the end of a string, my code isn't working.
an example of a file
hi
hi
(empty line, no spaces or any other char)
hi
(two spaces)
hey
Please help me to fix the code
Disable word splitting by clearing the value of IFS (the internal field separator):
while IFS= read -r line; do
....
done < "$1"
The -r isn't strictly necessary, but it is good practice.
Also, a simpler way to check the value of line (I assume you're looking for a line with nothing but whitespace):
if [[ $line =~ ^[[:space:]]+$ ]]; then
echo "Fake empty line detected"
fi
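Putting the two pieces together, a complete loop might look like the sketch below. It assumes a "fake empty" line is one made up of one or more whitespace characters and nothing else. The function name `find_fake_empty_lines` is hypothetical, and the extra `|| [[ -n $line ]]` is there so that a final line without a trailing newline is still processed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report lines that contain only whitespace (but are not empty).
find_fake_empty_lines() {
    local line n=0
    # IFS= keeps leading/trailing spaces; -r keeps backslashes literal
    while IFS= read -r line || [[ -n $line ]]; do
        n=$((n + 1))
        if [[ $line =~ ^[[:space:]]+$ ]]; then
            printf 'Fake empty line detected at line %d\n' "$n"
        fi
    done < "$1"
}

# Demo with the sample file from the question (assumed layout)
tmp=$(mktemp)
printf 'hi\nhi\n\nhi\n  \nhey\n' > "$tmp"
find_fake_empty_lines "$tmp"
rm -f "$tmp"
```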
Following your code, it can be improved.
while read line; do
if [ -z "$line" ]
then
echo "Fake empty line detected"
fi
done < "$1"
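The test -z checks if $line is empty. Because read without IFS= strips leading and trailing whitespace, a whitespace-only line becomes empty, so this flags genuinely empty lines as well as whitespace-only ones.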
Output:
Fake empty line detected
Fake empty line detected

Read a config file in BASH without using "source"

I'm attempting to read a config file that is formatted as follows:
USER = username
TARGET = arrows
I realize that if I got rid of the spaces, I could simply source the config file, but for security reasons I'm trying to avoid that. I know there is a way to read the config file line by line. I think the process is something like:
Read lines into an array
Filter out all of the lines that start with #
search for the variable names in the array
After that I'm lost. Any and all help would be greatly appreciated. I've tried something like this with no success:
backup2.config>cat ~/1
grep '^[^#].*' | while read one two;do
echo $two
done
I pulled that from a forum post I found, just not sure how to modify it to fit my needs since I'm so new to shell scripting.
http://www.linuxquestions.org/questions/programming-9/bash-shell-program-read-a-configuration-file-276852/
Would it be possible to automatically assign a variable by looping through both arrays?
for (( i = 0 ; i < ${#VALUE[@]} ; i++ ))
do
"${NAME[i]}"=VALUE[i]
done
echo $USER
Such that calling $USER would output "username"? The above code isn't working but I know the solution is something similar to that.
The following script iterates over each line in your input file (vars in my case) and does a pattern match against =. If the equals sign is found, it uses Parameter Expansion to separate the variable name from the value. It then stores each part in its own array, name and value respectively.
#!/bin/bash
i=0
while read line; do
if [[ "$line" =~ ^[^#]*= ]]; then
name[i]=${line%% =*}
value[i]=${line#*= }
((i++))
fi
done < vars
echo "total array elements: ${#name[@]}"
echo "name[0]: ${name[0]}"
echo "value[0]: ${value[0]}"
echo "name[1]: ${name[1]}"
echo "value[1]: ${value[1]}"
echo "name array: ${name[@]}"
echo "value array: ${value[@]}"
Input
$ cat vars
sdf
USER = username
TARGET = arrows
asdf
as23
Output
$ ./varscript
total array elements: 2
name[0]: USER
value[0]: username
name[1]: TARGET
value[1]: arrows
name array: USER TARGET
value array: username arrows
First, USER is a shell environment variable, so it might be better if you used something else. Using lowercase or mixed case variable names is a way to avoid name collisions.
#!/bin/bash
configfile="/path/to/file"
shopt -s extglob
while IFS='= ' read lhs rhs
do
if [[ $lhs != *( )#* ]]
then
# you can test for variables to accept or other conditions here
declare "$lhs=$rhs"
fi
done < "$configfile"
This sets the vars in your file to the value associated with it.
echo "Username: $USER, Target: $TARGET"
would output
Username: username, Target: arrows
Another way to do this using keys and values is with an associative array:
Add this line before the while loop:
declare -A settings
Remove the declare line inside the while loop and replace it with:
settings[$lhs]=$rhs
Then:
# set keys
user=USER
target=TARGET
# access values
echo "Username: ${settings[$user]}, Target: ${settings[$target]}"
would output
Username: username, Target: arrows
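The associative-array variant can be sketched as one self-contained script. This is an illustration under stated assumptions: lines whose first non-blank character is `#` are comments, lines without `=` are ignored, and the function name `load_config` and the demo file are hypothetical. Associative arrays need bash 4 or later.

```shell
#!/usr/bin/env bash
declare -A settings=()

# Hypothetical helper: fill the global "settings" array from a "KEY = value" file.
load_config() {
    local file=$1 line key value
    local comment_re='^[[:space:]]*#'
    while IFS= read -r line || [[ -n $line ]]; do
        [[ $line =~ $comment_re ]] && continue   # skip comment lines
        [[ $line == *=* ]] || continue           # skip lines without "="
        # Split on "=" and spaces; the remainder of the line goes to "value"
        IFS='= ' read -r key value <<< "$line"
        [[ -n $key ]] && settings["$key"]=$value
    done < "$file"
}

# Demo with an assumed config file
tmp=$(mktemp)
printf '# a comment\nUSER = username\nTARGET = arrows\nnoise\n' > "$tmp"
load_config "$tmp"
echo "Username: ${settings[USER]}, Target: ${settings[TARGET]}"
rm -f "$tmp"
```

Because nothing is ever passed to declare or eval, the file contents only ever become array keys and values, which fits the goal of not sourcing the file.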
I have a script which only takes a very limited number of settings, and processes them one at a time, so I've adapted SiegeX's answer to whitelist the settings I care about and act on them as it comes to them.
I've also removed the requirement for spaces around the = in favour of ignoring any that exist using the trim function from another answer.
function trim()
{
local var=$1;
var="${var#"${var%%[![:space:]]*}"}"; # remove leading whitespace characters
var="${var%"${var##*[![:space:]]}"}"; # remove trailing whitespace characters
echo -n "$var";
}
while read line; do
if [[ "$line" =~ ^[^#]*= ]]; then
setting_name=$(trim "${line%%=*}");
setting_value=$(trim "${line#*=}");
case "$setting_name" in
max_foos)
prune_foos $setting_value;
;;
max_bars)
prune_bars $setting_value;
;;
*)
echo "Unrecognised setting: $setting_name";
;;
esac;
fi
done <"$config_file";
Thanks, SiegeX. I think the later updates you mentioned are not reflected at this URL.
I had to edit the regex to remove the quotes to get it working. With quotes, array returned is empty.
i=0
while read line; do
if [[ "$line" =~ ^[^#]*= ]]; then
name[i]=${line%% =*}
value[i]=${line##*= }
((i++))
fi
done < vars
A still better version is:
i=0
while read line; do
if [[ "$line" =~ ^[^#]*= ]]; then
name[i]=`echo $line | cut -d'=' -f 1`
value[i]=`echo $line | cut -d'=' -f 2`
((i++))
fi
done < vars
The first version has issues if there is no space before and after "=" in the config file. Also, if the value is missing, the name and value end up populated with the same text. The second version avoids both problems. read already strips leading and trailing whitespace from each line, although cut leaves any spaces adjacent to "=" in place.
This version reads values that can contain =. The earlier version kept only the text between the first and second occurrence of =.
i=0
while read line; do
if [[ "$line" =~ ^[^#]*= ]]; then
name[i]=`echo $line | cut -d'=' -f 1`
value[i]=`echo $line | cut -d'=' -f 2-`
((i++))
fi
done < vars
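The same split can be done without spawning echo and cut for every line. The sketch below assumes the name is everything before the first `=` and the value is everything after it, and it trims surrounding whitespace from both with parameter expansion. The function name `split_setting` and the tab-separated output format are hypothetical choices for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split "name = value" at the FIRST "=" and trim both parts.
split_setting() {
    local line=$1 key value
    key=${line%%=*}      # text before the first "="
    value=${line#*=}     # text after the first "=", later "=" signs included
    # Trim leading, then trailing, whitespace from each part
    key=${key#"${key%%[![:space:]]*}"}
    key=${key%"${key##*[![:space:]]}"}
    value=${value#"${value%%[![:space:]]*}"}
    value=${value%"${value##*[![:space:]]}"}
    printf '%s\t%s\n' "$key" "$value"
}

split_setting "URL = http://example.com/?a=b"
```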
