Bash script for change strings in variables - bash

I have two variables in bash:
in the first variable, the field separator is (,)
in the second variable, the field separator is also (,)
in the first variable named VAR1 I have:
Maria Debbie Annie,Chewbakka Zero,Yoda One,Vader 001
in the second variable named VAR2:
"number":"11112",Maria Debbie Annie
"number":"11113",Maria Debbie Annie Lisa
"number":"33464",Chewbakka Zero
"number":"22465",Chewbakka Zero Two
"number":"34534",Christine Ashley
"number":"45233",Yoda One
"number":"45233",Yoda One One
"number":"38472",Susanne Ann
"number":"99999",Vader 001
"number":"99991",Vader 001 001
"number":"99992",Vader 001 002
The desired output in variable VAR3:
"number":"11112","number":"33464","number":"45233","number":"99999"
So basically i need to change the names in the output from some name to "number":"somenumber" the same order as in the first variable.
What is also important that there are very similar strings so
Yoda One != Yoda One One also Chewbakka Zero is not equal Chewbakka Zero Two.
VAR2 contains much more lines than listed, I just wanted to show the script needs to find exact matches between VAR1 and VAR2.
Thank you for the help.

Check this out..
> echo "$VAR1"
Maria Debbie Annie,Chewbakka Zero,Yoda One,Vader 001
> echo "$VAR2"
"number":"11112",Maria Debbie Annie
"number":"11113",Maria Debbie Annie Lisa
"number":"33464",Chewbakka Zero
"number":"22465",Chewbakka Zero Two
"number":"34534",Christine Ashley
"number":"45233",Yoda One
"number":"45233",Yoda One One
"number":"38472",Susanne Ann
"number":"99999",Vader 001
"number":"99991",Vader 001 001
"number":"99992",Vader 001 002
> export VAR1A=$(echo $VAR1| sed 's/,/$\|/g' | sed 's/$/\$/g')
> echo "$VAR1A"
Maria Debbie Annie$|Chewbakka Zero$|Yoda One$|Vader 001$
> echo "$VAR2" | egrep "$VAR1A" | awk -F"," ' { printf("%s,",$1)} END { printf("\n") } ' | sed 's/.$//g'
"number":"11112","number":"33464","number":"45233","number":"99999"
>

Related

read file line by line and sum each line individually

Im trying to make a script that creates a file say file01.txt that writes a number on each line.
001
002
...
998
999
then I want to read the file line by line and sum each line and say whether the number is even or odd.
sum each line like 0+0+1 = 1 which is odd
9+9+8 = 26 so even
001 odd
002 even
..
998 even
999 odd
I tried
while IFS=read -r line; do sum+=line >> file02.txt; done <file01.txt
but that sums the whole file not each line.
You can do this fairly easily in bash itself making use of built-in parameter expansions to trim leading zeros from the beginning of each line in order to sum the digits for odd / even.
When reading from a file (either a named file or stdin by default), you can use the initialization with default to use the first argument (positional parameter) as the filename (if given) and if not, just read from stdin, e.g.
#!/bin/bash
infile="${1:-/dev/stdin}" ## read from file provide as $1 or stdin
Which you will use infile with your while loop, e.g.
while read -r line; do ## loop reading each line
...
done < "$infile"
To trim the leading zeros, first obtain the substring of leading zeros trimming all digits from the right until only zeros remain, e.g.
leading="${line%%[1-9]*}" ## get leading 0's
Now using the same type parameter expansion with # instead of %% trim the leading zeros substring from the front of line saving the resulting number in value, e.g.
value="${line#$leading}" ## trim from front
Now zero your sum and loop over the digits in value to obtain the sum of digits:
for ((i=0;i<${#value};i++)); do ## loop summing digits
sum=$((sum + ${value:$i:1}))
done
All that remains is your even / odd test. Putting it altogether in a short example script that intentionally outputs the sum of digits in addition to your wanted "odd" / "even" output, you could do:
#!/bin/bash
infile="${1:-/dev/stdin}" ## read from file provide as $1 or stdin
while read -r line; do ## read each line
[ "$line" -eq "$line" 2>/dev/null ] || continue ## validate integer
leading="${line%%[1-9]*}" ## get leading 0's
value="${line#$leading}" ## trim from front
sum=0 ## zero sum
for ((i=0;i<${#value};i++)); do ## loop summing digits
sum=$((sum + ${value:$i:1}))
done
printf "%s (sum=%d) - " "$line" "$sum" ## output line w/sum
## (temporary output)
if ((sum % 2 == 0)); then ## check odd / even
echo "even"
else
echo "odd"
fi
done < "$infile"
(note: you can actually loop over the digits in line and skip removing the leading zeros substring. The removal ensure that if the whole value is used it isn't interpreted as an octal value -- up to you)
Example Use/Output
Using a quick process substitution to provide input of 001 - 020 on stdin you could do:
$ ./sumdigitsoddeven.sh < <(printf "%03d\n" {1..20})
001 (sum=1) - odd
002 (sum=2) - even
003 (sum=3) - odd
004 (sum=4) - even
005 (sum=5) - odd
006 (sum=6) - even
007 (sum=7) - odd
008 (sum=8) - even
009 (sum=9) - odd
010 (sum=1) - odd
011 (sum=2) - even
012 (sum=3) - odd
013 (sum=4) - even
014 (sum=5) - odd
015 (sum=6) - even
016 (sum=7) - odd
017 (sum=8) - even
018 (sum=9) - odd
019 (sum=10) - even
020 (sum=2) - even
You can simply remove the output of "(sum=X)" when you have confirmed it operates as you expect and redirect the output to your new file. Let me know if I understood your question properly and if you have further questions.
Would you please try the bash version:
parity=("even" "odd")
while IFS= read -r line; do
mapfile -t ary < <(fold -w1 <<< "$line")
sum=0
for i in "${ary[#]}"; do
(( sum += i ))
done
echo "$line" "${parity[sum % 2]}"
done < file01.txt > file92.txt
fold -w1 <<< "$line" breaks the string $line into lines of character
(one digit per line).
mapfile assigns array to the elements fed by the fold command.
Please note the bash script is not efficient in time and not suitable
for the large inputs.
With GNU awk:
awk -vFS='' '{sum=0; for(i=1;i<=NF;i++) sum+=$i;
print $0, sum%2 ? "odd" : "even"}' file01.txt
The FS awk variable defines the field separator. If it is set to the empty string (this is what the -vFS='' option does) then each character is a separate field.
The rest is trivial: the block between curly braces is executed for each line of the input. It compute the sum of the fields with a for loop (NF is another awk variable, its value is the number of fields of the current record). And it then prints the original line ($0) followed by the string even if the sum is even, else odd.
pure awk:
BEGIN {
for (i=1; i<=999; i++) {
printf ("%03d\n", i) > ARGV[1]
}
close(ARGV[1])
ARGC = 2
FS = ""
result[0] = "even"
result[1] = "odd"
}
{
printf("%s: %s\n", $0, result[($1+$2+$3) % 2])
}
Processing a file line by line, and doing math, is a perfect task for awk.
pure bash:
set -e
printf '%03d\n' {1..999} > "${1:?no path provided}"
result=(even odd)
mapfile -t num_list < "$1"
for i in "${num_list[#]}"; do
echo $i: ${result[(${i:0:1} + ${i:1:1} + ${i:2:1}) % 2]}
done
A similar method can be applied in bash, but it's slower.
comparison:
bash is about 10x slower.
$ cd ./tmp.Kb5ug7tQTi
$ bash -c 'time awk -f ../solution.awk numlist-awk > result-awk'
real 0m0.108s
user 0m0.102s
sys 0m0.000s
$ bash -c 'time bash ../solution.bash numlist-bash > result-bash'
real 0m0.931s
user 0m0.929s
sys 0m0.000s
$ diff --report-identical result*
Files result-awk and result-bash are identical
$ diff --report-identical numlist*
Files numlist-awk and numlist-bash are identical
$ head -n 5 *
==> numlist-awk <==
001
002
003
004
005
==> numlist-bash <==
001
002
003
004
005
==> result-awk <==
001: odd
002: even
003: odd
004: even
005: odd
==> result-bash <==
001: odd
002: even
003: odd
004: even
005: odd
read is a bottleneck in a while IFS= read -r line loop. More info in this answer.
mapfile (combined with for loop) can be slightly faster, but still slow (it also copies all the data to an array first).
Both solutions create a number list in a new file (which was in the question), and print the odd/even results to stdout. The path for the file is given as a single argument.
In awk, you can set the field separator to empty (FS="") to process individual characters.
In bash it can be done with substring expansion (${var:index:length}).
Modulo 2 (number % 2) to get odd or even.

How to reduce run time of shell script? [duplicate]

This question already has answers here:
Take nth column in a text file
(6 answers)
Closed 2 years ago.
I have written a simple code that takes data from a text file( which has space-separated columns and 1.5 million rows) gives the output file with the specified column. But this code takes more than an hr to execute. Can anyone help me out to optimize runtime
a=0
cat 1c_input.txt/$1 | while read p
do
IFS=" "
for i in $p
do
a=`expr $a + 1`
if [ $a -eq $2 ]
then
echo "$i"
fi
done
a=0
done >> ./1.c.$2.column.freq
some lines of sample input:
1 ib Jim 34
1 cr JoHn 24
1 ut MaRY 46
2 ti Jim 41
2 ye john 6
2 wf JoHn 22
3 ye jOE 42
3 hx jiM 21
some lines of sample output if the second argument entered is 3:
Jim
JoHn
MaRY
Jim
john
JoHn
jOE
jiM
I guess you are trying to print just 1 column, then do something like
#! /bin/bash
awk -v c="$2" '{print $c}' 1c_input.txt/$1 >> ./1.c.$2.column.freq
If you just want something faster, use a utility like cut. So to
extract the third field from a single space delimited file bigfile
do:
cut -d ' ' -f 3 bigfile
To optimize the shell code in the question, using only builtin shell
commands, do something like:
while read a b c d; echo "$c"; done < bigfile
...if the field to be printed is a command line parameter, there are
several shell command methods, but they're all based on that line.

How to write a shell script to swap columns in txt file?

I was trying to solve one of my old assignment I am literally stuck in this one Can anyone help me?
There is a file called "datafile". This file has names of some friends and their
ages. But unfortunately, the names are not in the correct format. They should be
lastname, firstname
But, by mistake they are firstname,lastname
The task of the problem is writing a shell script called fix_datafile
to correct the problem, and sort the names alphabetically. The corrected filename
is called datafile.fix .
Please make sure the original structure of the file should be kept untouched.
The following is the sample of datafile.fix file:
#personal information
#******** Name ********* ***** age *****
Alexanderovich,Franklin 47
Amber,Christine 54
Applesum,Franky 33
Attaboal,Arman 18
Balad,George 38
Balad,Sam 19
Balsamic,Shery 22
Bojack,Steven 33
Chantell,Alex 60
Doyle,Jefry 45
Farland,Pamela 40
Handerman,jimmy 23
Kashman,Jenifer 25
Kasting,Ellen 33
Lorux,Allen 29
Mathis,Johny 26
Maxter,Jefry 31
Newton,Gerisha 40
Osama,Franklin 33
Osana,Gabriel 61
Oxnard,George 20
Palomar,Frank 24
Plomer,Susan 29
Poolank,John 31
Rochester,Benjami 40
Stanock,Verona 38
Tenesik,Gabriel 29
Whelsh,Elsa 21
If you can use awk (I suppose you can), than this there's a script which does what you need:
#!/bin/bash
RESULT_FILE_NAME="datafile.new"
cat datafile.fix | head -4 > datafile.new
cat datafile.fix | tail -n +5 | awk -F"[, ]" '{if(!$2){print()}else{print($2","$1, $3)}}' >> datafile.new
Passing -F"[, ]" allows awk to split columns both by , and space and all that remains is just print columns in a needed format. The downsides are that we should use if statement to preserve empty lines and file header also should be treated separately.
Another option is using sed:
cat datafile.fix | sed -E 's/([a-zA-Z]+),([a-zA-Z]+) ([0-9]+)/\2,\1 \3/g' > datafile.new
The downside is that it requires regex that is not as obvious as awk syntax.
awk -F[,\ ] '
!/^$/ && !/^#/ {
first=$1;
last=$2;
map[first][last]=$0
}
END {
PROCINFO["sorted_in"]="#ind_str_asc";
for (i in map) {
for (j in map[i])
{
print map[i][j]
}
}
}' namesfile > datafile.fix
One liner:
awk -F[,\ ] '!/^$/ && !/^#/ { first=$1;last=$2;map[first][last]=$0 } END { PROCINFO["sorted_in"]="#ind_str_asc";for (i in map) { for (j in map[i]) { print map[i][j] } } }' namesfile > datafile.fix
A solution completely in gawk.
Set the field separator to both , and space. Then ignore any lines that are empty or start with #. Mark the first and last variables based on the delimited fields and then create a two dimensional array called map indexed by first and last name and the value equal to the line. At the end, set the sort to indices string ascending and loop through the array printing the names in order as requested.
Completely in bash:
re="^[[:space:]]*([^#]([[:space:]]|[[:alpha:]])+),(([[:space:]]|[[:alpha:]])*[[:alpha:]]) *([[:digit:]]+)"
while read line
do
if [[ ${line} =~ $re ]]
then
echo ${BASH_REMATCH[3]},${BASH_REMATCH[1]} ${BASH_REMATCH[5]}
else
echo "${line}"
fi
done < names.txt
The core of this is to capture, using bash regex matching (=~ operator of the [[ command), parenthesis groupings, and the BASH_REMATCH array, the name before the comma (([^#]([[:space:]]|[[:alpha:]])+)), the name after the comma ((([[:space:]]|[[:alpha:]])*[[:alpha:]])), and the age ( *([[:digit:]]+)). The first-name regex is constructed so as to exclude comments, and the last-name regex is constructed as to handle multiple spaces before the age without including them in the name. Preconditions: Commented lines with or without leading spaces (^[[:space:]]*([^#]), or lines without a comma, are passed through unchanged. Either first names or last names may have internal spaces. Once the last name and first name are isolated, it is easy to print them in reverse order followed by the age (echo ${BASH_REMATCH[3]},${BASH_REMATCH[1]} ${BASH_REMATCH[5]}). Note that the letter/space groupings are counted as matches which is why we skip 2 and 4.
I have tried using awk and sed.
Try if this works
less dataflie.fix | sed 's/ /,/g' | awk -F "," '{print $2,$1,$3}' | sed 's/ /,/' | sed 's/^,//' | sort -u > dataflie_new.fix

bash - print id number on multiple lines in new txt file

I am trying to concatenate multiple .txt files into one .txt file. Each individual file has one column with 178 rows and is tied to one particular person. I would like the concatenated file to have three columns: person's ID #; person's session #; value taken from the individual .txt file, e.g.:
desired output format (minus bullet points):
1 1 000
1 1 001
....
1 1 177
1 2 000
1 2 001
...
1 2 177
My current script prints the output with ID # and session # only printed on the first line of each person's 178 lines and the remaining 177 values printed to column 1 underneath the ID #, e.g.:
1 1 000
001
002
...
177
1 2 000
001
002
...
177
2 1 000
001
...
I would like help getting the ID # and session # next to each of the 178 rows taken from each person's individual .txt file, not just in the first row as it currently prints.
Code below:
for subject in 170; do
for session in 1 2; do
cd ${datadir}
ts_SalienceNetwork=$(cat sub${subject}.txt | awk '{print $1}')
echo -e "${subject}\t${session}\t${ts_SalienceNetwork}" >> concat_data.txt
done
done
$t_SalienceNetwork contains the entire file, and you're just putting the other variables before it, not before each line.
Use awk to print the first column of each row preceded by the variables on each line, no need for the variable.
cd "$datadir" # no need to do this each time through the loop
for subject in 170; do
for session in 1 2; do
awk -v subject="$subject" -v session="$session" '{printf("%s\t%s\t%s\n", subject, session, $1)}' "sub$subject.txt"
done
done >> concat_data.txt

Passing script to array

I have a bash script which runs as follows:
./script.sh var1 var2 var3.... varN varN+1
What i need to do is take first 2 variables, last 2 variables and insert them into file. The variables between 2 last and 2 first should be passed as a whole string to another file. How can this be done in bash?
Of course i can define a special variable with "read var" directive and then input this whole string from keyboard but my objective is to pass them from the script input
argc=$#
argv=("$#")
first_two="${argv[#]:0:2}"
last_two="${argv[#]:$argc-2:2}"
others="${argv[#]:2:$argc-4}"
#!/bin/bash
# first two
echo "${#:1:2}"
# last two
echo "${#:(-2):2}"
# middle
echo "${#:3:(($# - 4))}"
so sample
./script aaa bbb ccc ddd eee fff gggg hhhh
aaa bbb
gggg hhhh
ccc ddd eee fff

Resources