Bash Scripting - Variable Concatenation - bash

Completely new to Linux and Bash scripting and I've been experimenting with the following script :
declare -a names=("Liam" "Noah" "Oliver" "William" "Elijah")
declare -a surnames=("Smith" "Johnson" "Williams" "Brown" "Jones")
declare -a countries=()
readarray countries < $2
i=5
id=1
while [ $i -gt 0 ]
do
i=$(($i - 1))
rname=${names[$RANDOM % ${#names[@]}]}
rsurname=${surnames[$RANDOM % ${#surnames[@]}]}
rcountry=${countries[$RANDOM % ${#countries[@]}]}
rage=$(($RANDOM % 5))
record="$id $rname $rsurname $rcountry"
#record="$id $rname $rsurname $rcountry $rage"
echo $record
id=$(($id + 1))
done
The script above produces the following result :
1 Liam Williams Andorra
2 Oliver Jones Andorra
3 Noah Brown Algeria
4 Liam Williams Albania
5 Oliver Williams Albania
The problem becomes apparent when the line record="$id $rname $rsurname $rcountry" is commented out and the line record="$id $rname $rsurname $rcountry $rage" is active. The exact output of the second run is:
4William Johnson Albania
2Elijah Smith Albania
2Oliver Brown Argentina
0William Williams Argentina
3Oliver Brown Angola
The file I am reading the countries from looks like this :
Albania
Algeria
Andorra
Angola
Argentina
Could you explain why this happens?

Your countries input file has DOS-style <cr><lf> (carriage-return line-feed) line endings.
When you read lines from the file, each element of the countries array ends up looking like somename<cr>, and when printed the <cr> moves the cursor back to the beginning of the line, so the contents of $rage end up overwriting the beginning of the line.
The fix is to convert your countries input to use Unix-style (<lf> only) line endings. You can do this with dos2unix, for example: dos2unix inputfile converts the file in place, while dos2unix < inputfile > outputfile writes a converted copy.
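If converting the file is not an option, the carriage returns can also be stripped while the lines are read. The following is only a minimal sketch, assuming bash 4 or later for readarray; the function name read_lines_without_cr and the temporary demo file are illustrative and not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a file into the global array "lines",
# dropping the newline (-t) and any trailing carriage return.
read_lines_without_cr() {
    local file=$1 i
    readarray -t lines < "$file"
    for i in "${!lines[@]}"; do
        lines[i]=${lines[i]%$'\r'}   # remove one trailing CR, if present
    done
}

# Demo with a throwaway file that has DOS line endings.
tmp=$(mktemp)
printf 'Albania\r\nAlgeria\r\n' > "$tmp"
read_lines_without_cr "$tmp"
printf '%s|%s\n' "${lines[0]}" "${lines[1]}"
rm -f "$tmp"
```

With the carriage returns gone, appending $rage to the record no longer sends the cursor back to the start of the line.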

Related

How to reduce run time of shell script? [duplicate]

I have written a simple script that takes data from a text file (which has space-separated columns and 1.5 million rows) and writes the specified column to an output file. But this script takes more than an hour to run. Can anyone help me reduce the run time?
a=0
cat 1c_input.txt/$1 | while read p
do
IFS=" "
for i in $p
do
a=`expr $a + 1`
if [ $a -eq $2 ]
then
echo "$i"
fi
done
a=0
done >> ./1.c.$2.column.freq
some lines of sample input:
1 ib Jim 34
1 cr JoHn 24
1 ut MaRY 46
2 ti Jim 41
2 ye john 6
2 wf JoHn 22
3 ye jOE 42
3 hx jiM 21
some lines of sample output if the second argument entered is 3:
Jim
JoHn
MaRY
Jim
john
JoHn
jOE
jiM
I guess you are trying to print just one column. If so, do something like:
#! /bin/bash
awk -v c="$2" '{print $c}' 1c_input.txt/$1 >> ./1.c.$2.column.freq
If you just want something faster, use a utility like cut. So to
extract the third field from a single space delimited file bigfile
do:
cut -d ' ' -f 3 bigfile
To optimize the shell code in the question, using only builtin shell
commands, do something like:
while read a b c d; do echo "$c"; done < bigfile
...if the field to be printed is a command line parameter, there are
several shell command methods, but they're all based on that line.
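One of those methods, sketched here with builtins only: read each line into an array and index it with the requested column number. The function name print_column is made up for illustration, and this is likely still slower than cut or awk on 1.5 million rows.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print column number $1 (1-based) of each line on stdin.
print_column() {
    local col=$1 fields
    while read -r -a fields; do
        # Arrays are 0-indexed, so column N lives at index N-1.
        printf '%s\n' "${fields[col-1]}"
    done
}

# Two lines taken from the sample input in the question.
printf '1 ib Jim 34\n1 cr JoHn 24\n' | print_column 3
```

In the original script this would be called as print_column "$2" < inputfile, so the whole file is read by a single loop with no external commands.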

How to use sed and cut to find and replace value at certain position of line in a file

I have a case where I have to replace the number 1 with number 3 at 10th location of various lines in a stored text file. I am unable to find a way to do that. Below is sample file and code.
Sample file:
$ cat testdata.txt
1 43515 8 1 victor Samuel 20190112
3215736 4 6 Michael pristine 20180923
1 56261 1 1 John Carter 19880712
#!/bin/sh
filename=testdata.txt
echo "reading number of line"
nol=$(cat $filename | wc -l)
flag[$nol]=''
echo "reading content of file"
for i in (1..$nol)
do
flag=($cut -c10-11 $filename)
if($flag==1)
sed 's/1/3/2'
fi
done
But this is not working.
Please help to resolve this.
Updated:
Sample Output:
1 43515 8 1 victor Samuel 20190112
3215736 4 6 Michael pristine 20180923
1 56261 3 1 John Carter 19880712
try this
sed "s/^\(.\{8\}\) 1 \(.*\)$/\1 3 \2/g" testdata.txt > new_testdata.txt
If sed supports the option -i you can also edit inplace.
sed -i "s/^\(.\{8\}\) 1 \(.*\)$/\1 3 \2/g" testdata.txt
output
1 43515 8 1 victor Samuel 20190112
3215736 4 6 Michael pristine 20180923
1 56261 3 1 John Carter 19880712
explanation
s # substitute
/^\( # from start of line, save into arg1
.\{8\} # the first 8 characters
\) 1 \( # search pattern ' 1 '
.* # save the rest into arg2
\)$/ # to the end of the line
\1 3 \2 # output: arg1 3 arg2
/g # global on whole line
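The same capture-and-rewrite idea can be wrapped in a small function that tests one exact column. This is a sketch under assumptions: the name replace_at_column is invented here, the old and new values are single literal characters with no special meaning to sed, and the demo lines are made up rather than taken from testdata.txt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: on each line of stdin, replace character $2 with $3
# only when it sits exactly at column $1 (1-based).
replace_at_column() {
    local col=$1 old=$2 new=$3
    # Capture the first col-1 characters, match $old right after them,
    # then emit the captured prefix followed by $new.
    sed "s/^\(.\{$((col-1))\}\)$old/\1$new/"
}

# Made-up demo lines: only the first has "1" at column 10.
printf '%s\n' 'abcdefghi1xyz' 'abcdefghi2xyz' | replace_at_column 10 1 3
```

Lines where the character at that column is something else pass through unchanged.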

Bash Cut diamond question mark symbol �

I am trying to display the 2nd and 7th character from each line of text.
while read line
do
x=`echo $line | cut -c2,7`
echo $x
done
Sample Input:
C.B - Cantonment Board/Cantonment
C.M.C – City Municipal Council
C.T – Census Town
E.O – Estate Office
Expected Output:
.C
.â
.“
.“
My output:
.C
.�
.�
.�
Anyone knows why this happens?
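cut does not really support Unicode: in many implementations, including GNU cut, -c selects bytes rather than multibyte characters, so it cuts the UTF-8 encoded dash in the middle. You might want to use Perl instead (adapted from this Unix & Linux post):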
perl -CIO -ne 'print substr($_, 1, 1) . substr($_, 6, 1) . "\n"'
For example:
$ perl -CIO -ne 'print substr($_, 1, 1) . substr($_, 6, 1) . "\n"' < foo
.C
.â
.“
.“
-CIO tells perl to treat both standard input and standard output as UTF-8. substr(var, m, n) extracts the substring of length n beginning at index m (starting from 0). So the second character is the substring of length 1 at index 1. $_ is the variable holding the current input line.
You can use bash's substring parameter expansion.
while read line; do
x=${line:1:1}${line:6:1} # 0-based counting
echo "$x"
done <<EOF
C.B - Cantonment Board/Cantonment
C.M.C – City Municipal Council
C.T – Census Town
E.O – Estate Office
EOF
The form ${var:offset:length} returns length characters starting at position offset in the value of var. Strings are 0-indexed, like arrays.
(I am not sure, though, if bash always handles utf-8 correctly, or if it depends on how it was compiled.)

Bash shell scripting - Error setting variables

I'm new at bash scripting. I tried the following:
filename01 = ''
if [ $# -eq 0 ]
then
filename01 = 'newList01.txt'
else
filename01 = $1
fi
I get the following error:
./smallScript02.sh: line 9: filename01: command not found
./smallScript02.sh: line 13: filename01: command not found
I imagine that I am not treating the variables correctly, but I don't know how. Also, I am trying to use grep to extract the second and third words from a text file. The file looks like:
1966 Bart Starr QB Green Bay Packers
1967 Johnny Unitas QB Baltimore Colts
1968 Earl Morrall QB Baltimore Colts
1969 Roman Gabriel QB Los Angeles Rams
1970 John Brodie QB San Francisco 49ers
1971 Alan Page DT Minnesota Vikings
1972 Larry Brown RB Washington Redskins
Any help would be appreciated
When you assign variables in bash, there should be no spaces on either side of the = sign.
# good
filename0="newList01.txt"
# bad
filename0 = "newlist01.txt"
For your second problem, use awk, not grep. The following prints the second and third fields, separated by a space, from each line of the file whose name is stored in $filename0:
< "$filename0" awk '{print $2, $3}'
In bash (and other bourne-type shells), you can use a default value if a variable is empty or not set:
filename01=${1:-newList01.txt}
I'd recommend spending some time with the bash manual: http://www.gnu.org/software/bash/manual/bashref.html
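A small sketch of how the default-value expansion behaves, wrapped in a function so both cases can be tried in one run. The function name choose_filename and the file name players.txt are placeholders, not names from the question.

```shell
#!/usr/bin/env bash
# Hypothetical function standing in for the script's argument handling.
choose_filename() {
    # ${1:-word} expands to "word" when $1 is unset or empty.
    local filename01=${1:-newList01.txt}
    printf '%s\n' "$filename01"
}

choose_filename              # no argument: falls back to newList01.txt
choose_filename players.txt  # argument given: prints players.txt
```

This replaces the whole if/else block from the question with a single assignment.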
Here's a way to extract the name:
while read first second third rest; do
echo $second $third
done < "$filename01"

Unix code wanted to copy template file and replace strings in template file in the copied files

I have 2 files:
File_1.txt:
John
Mary
Harry
Bill
File_2.txt:
My name is ID, and I am on line NR of file 1.
I want to create four files that look like this:
Output_file_1.txt:
My name is John, and I am on line 1 of file 1.
Output_file_2.txt:
My name is Mary, and I am on line 2 of file 1.
Output_file_3.txt:
My name is Harry, and I am on line 3 of file 1.
Output_file_4.txt:
My name is Bill, and I am on line 4 of file 1.
Normally I would use the following sed command to do this:
for q in John Mary Harry Bill
do
sed 's/ID/'${q}'/g' File_2.txt > Output_file.txt
done
But that would only replace the ID for the name, and not include the line nr of File_1.txt. Unfortunately, my bash skills don't go much further than that... Any tips or suggestions for a command that includes both file 1 and 2? I do need to include file 1, because actually the files are much larger than in this example, but I'm thinking I can figure the rest of the code out if I know how to do it with this hopefully simpler example... Many thanks in advance!
How about:
n=1
while read q
do
sed -e 's/ID/'${q}'/g' -e "s/NR/$n/" File_2.txt > Output_file_${n}.txt
((n++))
done < File_1.txt
See the Advanced Bash Scripting Guide on redirecting input to code blocks, and maybe the section on double parentheses for further reading.
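The same loop can also be written without sed, using bash pattern substitution. This is a sketch only: fill_template is a made-up name, the template is inlined instead of being read from File_2.txt, and the names come from a here-document instead of File_1.txt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill the ID and NR placeholders of a template
# using bash pattern substitution instead of sed.
fill_template() {
    local template=$1 name=$2 number=$3
    local out=${template//ID/$name}   # replace every ID with the name
    out=${out//NR/$number}            # then every NR with the line number
    printf '%s\n' "$out"
}

template='My name is ID, and I am on line NR of file 1.'
n=1
while IFS= read -r name; do
    # In real use, redirect this to "Output_file_${n}.txt".
    fill_template "$template" "$name" "$n"
    ((n++))
done <<'EOF'
John
Mary
EOF
```

Unlike the sed version, this does not break when a name contains a slash, though a name that itself contains NR would still be altered by the second substitution.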
How about awk, instead?
[ghoti@pc ~]$ cat file1
John
Mary
[ghoti@pc ~]$ cat file2
Harry
Bill
[ghoti@pc ~]$ cat merge.txt
My name is %s, and I am on the line %s of file '%s'.
[ghoti@pc ~]$ cat doit.awk
#!/usr/bin/awk -f
BEGIN {
while (getline line < "merge.txt") {
fmt = fmt line "\n";
}
}
{
file="Output_File_" NR ".txt";
printf(fmt, $1, FNR, FILENAME) > file;
}
[ghoti@pc ~]$ ./doit.awk file1 file2
[ghoti@pc ~]$ grep . Output_File*txt
Output_File_1.txt:My name is John, and I am on the line 1 of file 'file1'.
Output_File_2.txt:My name is Mary, and I am on the line 2 of file 'file1'.
Output_File_3.txt:My name is Harry, and I am on the line 1 of file 'file2'.
Output_File_4.txt:My name is Bill, and I am on the line 2 of file 'file2'.
[ghoti@pc ~]$
If you really want your filenames to be numbered, we can do that too.
What's going on here?
The awk script BEGINs by reading in your merge.txt file and appending it to the variable "fmt", line by line (separated by newlines). This makes fmt a printf-compatible format string.
Then, for every line in your input files (specified on the command line), an output file is selected (NR is the current record count spanning all files). The printf() function replaces each %s in the fmt variable with one of its arguments: the name, the line number within the current file (FNR), and the current file name (FILENAME). Output is redirected to the appropriate file.
The grep just shows you all the files' contents with their filenames.
This might work for you:
sed '=' File_1.txt |
sed '1{x;s/^/'"$(<File_2.txt)"'/;x};N;s/\n/ /;G;s/^\(\S*\) \(\S*\)\n\(.*\)ID\(.*\)NR\(.*\)/echo "\3\2\4\1\5" >Output_file_\1.txt/' |
bash
TXR:
$ txr merge.txr
My name is John, and I am on the line 1 of file1.
My name is Mary, and I am on the line 2 of file1.
My name is Harry, and I am on the line 3 of file1.
My name is Bill, and I am on the line 4 of file1.
merge.txr:
@(bind count @(range 1))
@(load "file2.txt")
@(next "file1.txt")
@(collect)
@name
@(template name @(pop count) "file1")
@(end)
file2.txt:
@(define template (ID NR FILE))
@(output)
My name is @ID, and I am on the line @NR of @FILE.
@(end)
@(end)
Read the names into an array,
get the array length, and
iterate over the array.
Test preparation:
echo "John
Mary
Harry
Bill
" > names
Names and numbers:
name=($(<names))
max=$(($(echo ${#name[*]})-1))
for i in $(seq 0 $max) ; do echo $i":"${name[i]}; done
with template:
for i in $(seq 0 $max) ; do echo "My name is ID, and I am on the line NR of file 1." | sed "s/ID/${name[i]}/g;s/NR/$((i+1))/g"; done
My name is John, and I am on the line 1 of file 1.
My name is Mary, and I am on the line 2 of file 1.
My name is Harry, and I am on the line 3 of file 1.
My name is Bill, and I am on the line 4 of file 1.
Only a small modification to your script is needed. That's it.
pearl.306> cat temp.sh
#!/bin/ksh
count=1
cat file1|while read line
do
sed -e "s/ID/${line}/g" -e "s/NR/${count}/g" File_2.txt > Output_file_${count}.txt
count=$(($count+1))
done
pearl.307>
pearl.303> temp.sh
pearl.304> ls -l Out*
-rw-rw-r-- 1 nobody nobody 59 Mar 29 18:54 Output_file_1.txt
-rw-rw-r-- 1 nobody nobody 58 Mar 29 18:54 Output_file_2.txt
-rw-rw-r-- 1 nobody nobody 58 Mar 29 18:54 Output_file_3.txt
-rw-rw-r-- 1 nobody nobody 58 Mar 29 18:54 Output_file_4.txt
-rw-rw-r-- 1 nobody nobody 58 Mar 29 18:54 Output_file_5.txt
pearl.305> cat Out*
My name is linenumber11, and I am on the line 1 of file 1.
My name is linenumber2, and I am on the line 2 of file 1.
My name is linenumber1, and I am on the line 3 of file 1.
My name is linenumber4, and I am on the line 4 of file 1.
My name is linenumber6, and I am on the line 5 of file 1.
pearl306>
