I'm completely new to Linux and Bash scripting, and I've been experimenting with the following script:
declare -a names=("Liam" "Noah" "Oliver" "William" "Elijah")
declare -a surnames=("Smith" "Johnson" "Williams" "Brown" "Jones")
declare -a countries=()
readarray countries < $2
i=5
id=1
while [ $i -gt 0 ]
do
i=$(($i - 1))
rname=${names[$RANDOM % ${#names[@]}]}
rsurname=${surnames[$RANDOM % ${#surnames[@]}]}
rcountry=${countries[$RANDOM % ${#countries[@]}]}
rage=$(($RANDOM % 5))
record="$id $rname $rsurname $rcountry"
#record="$id $rname $rsurname $rcountry $rage"
echo $record
id=$(($id + 1))
done
The script above produces the following result :
1 Liam Williams Andorra
2 Oliver Jones Andorra
3 Noah Brown Algeria
4 Liam Williams Albania
5 Oliver Williams Albania
But the problem becomes apparent when the line record="$id $rname $rsurname $rcountry" is commented out and the line record="$id $rname $rsurname $rcountry $rage" is active instead. The exact output on the second execution is:
4William Johnson Albania
2Elijah Smith Albania
2Oliver Brown Argentina
0William Williams Argentina
3Oliver Brown Angola
The file I am reading the countries from looks like this :
Albania
Algeria
Andorra
Angola
Argentina
Could you explain why this happens?
Your countries input file has DOS-style <cr><lf> (carriage-return line-feed) line endings.
When you read lines from the file, each element of the countries array ends up looking like somename<cr>, and when printed the <cr> moves the cursor back to the beginning of the line, so the contents of $rage end up overwriting the beginning of the line.
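The fix is to convert your countries input to use Unix-style (<lf> only) line endings. You can do this with dos2unix < inputfile > outputfile (reading stdin and writing stdout), or with dos2unix -n inputfile outputfile, for example. Note that dos2unix given only a file name converts that file in place.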
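If converting the file is not an option, the carriage returns can also be stripped inside the script after reading. The following is only a minimal sketch, assuming bash 4 or later; the temporary sample file and the cr variable are made up for the demonstration and are not part of the original script:

```shell
#!/usr/bin/env bash
# Build a small sample file with DOS (CRLF) line endings for demonstration.
tmp=$(mktemp)
printf 'Albania\r\nAlgeria\r\n' > "$tmp"

# -t drops the trailing newline from each array element.
readarray -t countries < "$tmp"

# Remove one trailing carriage return from every element, if present.
cr=$'\r'
countries=("${countries[@]%$cr}")

printf '%s\n' "${countries[@]}"
rm -f "$tmp"
```

To check whether a file has the problem in the first place, cat -A file (GNU coreutils) shows each carriage return as ^M before the end-of-line $.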
I have written a simple script that takes data from a text file (space-separated columns, 1.5 million rows) and writes the specified column to an output file. But it takes more than an hour to execute. Can anyone help me optimize the runtime?
a=0
cat 1c_input.txt/$1 | while read p
do
IFS=" "
for i in $p
do
a=`expr $a + 1`
if [ $a -eq $2 ]
then
echo "$i"
fi
done
a=0
done >> ./1.c.$2.column.freq
some lines of sample input:
1 ib Jim 34
1 cr JoHn 24
1 ut MaRY 46
2 ti Jim 41
2 ye john 6
2 wf JoHn 22
3 ye jOE 42
3 hx jiM 21
some lines of sample output if the second argument entered is 3:
Jim
JoHn
MaRY
Jim
john
JoHn
jOE
jiM
I guess you are trying to print just one column. If so, do something like:
#!/bin/bash
awk -v c="$2" '{print $c}' "1c_input.txt/$1" >> "./1.c.$2.column.freq"
If you just want something faster, use a utility like cut. So to
extract the third field from a single space delimited file bigfile
do:
cut -d ' ' -f 3 bigfile
To optimize the shell code in the question, using only builtin shell
commands, do something like:
while read -r a b c d; do echo "$c"; done < bigfile
...if the field to be printed is a command line parameter, there are
several shell command methods, but they're all based on that line.
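One such method, sketched here under the assumption that bash (not plain sh) is available, reads each line into an array and indexes it by the requested column. The function name print_col is hypothetical, chosen only for this illustration:

```shell
#!/usr/bin/env bash
# print_col N: print the N-th whitespace-separated field of each stdin line.
print_col() {
  local n=$1 fields
  while read -r -a fields; do
    # Arrays are zero-based, so column N lives at index N-1.
    printf '%s\n' "${fields[n-1]}"
  done
}

# Two lines from the sample input in the question, column 3.
result=$(printf '%s\n' '1 ib Jim 34' '1 cr JoHn 24' | print_col 3)
printf '%s\n' "$result"
```

This avoids starting an external process per field, which is what made the original loop slow, but for 1.5 million rows cut or awk will most likely still be faster.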
---- my text file from which i have to search for the keywords [name of the file --- test] <cat -Evt file>
centos is my bro$
red hat is my course$
ubuntu is my OS$
fqdn is stupid $
$
$
$
tom outsmart jerry$
red hat is my boy$
jerry is samall
------ keyword file is [word.txt] <cat -Evt file >
red hat$
we$
hello$
bye$
Compensation
----- my code
while read "p"; do
paste -d',' <(echo -n "$p" ) <(echo "searchall") <( grep -i "$p" test | wc -l) <(grep -i -A 1 -B 1 "$p" test )
done <word.txt
---- my expectation ,output should be
keyword,searchall,frequency,line above it
line it find keyword in
line below it
red hat,searchall,2,centos is my bro
red hat is my course
ubuntu is my OS
red hat,searchall,2,tom outsmart jerry
red hat is my boy
jerry is samall
---- but coming OUTPUT from my code
red hat,searchall,2,centos is my bro
,,,red hat is my course
,,,ubuntu is my OS
,,,--
,,,tom outsmart jerry
,,,red hat is my boy
,,,jerry is samall
---- Please give me suggestions and point me in the right direction to get the desired output.
---- I am trying to grep each keyword from the file and print the matches together with the lines around them.
Here two records should be created, as the keyword (red hat) occurs two times.
---- How can I loop over each occurrence of the keyword?
This sounds very much like a homework assignment.
c.f. BashFAQ for better reads; keeping this simple to focus on what you asked for.
Rewritten for more precise formatting -
while read key                                  # read each search key
do  cnt=$(grep "$key" test|wc -l)               # count the hits
    pad="$key,searchall,$cnt,"                  # build the "header" fields
    while read line                             # read the input from grep
    do  if [[ "$line" =~ ^-- ]]                 # treat hits separately
        then pad="$key,searchall,$cnt,"         # reset the "header"
             echo                               # add the blank line
             continue                           # skip to next line of data
        fi
        echo "$pad$line"                        # echo "header" and data
        pad="${pad//?/ }"                       # convert header to spacing
    done < <( grep -B1 -A1 "$key" test )        # pull hits for this key
    echo                                        # add blank lines between
done < word.txt                                 # set stdin for the outer read
$: cat word.txt
course
red hat
$: ./tst
course,searchall,1,centos is my bro
red hat is my course
ubuntu is my OS
red hat,searchall,2,centos is my bro
red hat is my course
ubuntu is my OS
red hat,searchall,2,tom outsmart jerry
red hat is my boy
jerry is samall
This will produce the expected output based on one interpretation of your requirements and should be easy to modify if I've made any wrong guesses about what you want to do:
$ cat tst.awk
BEGIN {
    RS = ""
    FS = "\n"
}
{ gsub(/^[[:space:]]+|[[:space:]]+$/,"") }
NR == FNR {
    words[$0]
    next
}
{
    for (word in words) {
        for (i=1; i<=NF; i++) {
            if ($i ~ word) {
                map[word,++cnt[word]] = (i>1 ? $(i-1) : "") FS $i FS $(i+1)
            }
        }
    }
}
END {
    for (word in words) {
        for (i=1; i<=cnt[word]; i++) {
            beg = sprintf("%s,searchall,%d,", word, cnt[word])
            split(map[word,i],lines)
            for (j=1; j in lines; j++) {
                print beg lines[j]
                beg = sprintf("%*s",length(beg),"")
            }
            print ""
        }
    }
}
$ awk -f tst.awk words file
red hat,searchall,2,centos is my bro
red hat is my course
ubuntu is my OS
red hat,searchall,2,tom outsmart jerry
red hat is my boy
jerry is samall
I assumed your real input doesn't start with a bunch of blanks as in your posted example - if it does that's easy to accommodate.
I know you can use field delimiters to break up a field in AWK; however, I have a question regarding a string without any delimiters. I need to process the following data, and I'm not sure how to start:
RyanWehe989987412rwehe#asu.edu2025550126CO2001BlakeStDenver80205
JosephLee605497184josephl#mailinator.com3035550103CO5986BudweiserWayAlamosa81101
AmyJohnson783333251amyj#mailinator.com6515550164MN14N5thStMinneapolis55403
DanielJEverhard314849866everhard#asu.edu5059358554NM8830JohnsonRdAlbuquerque87122
PhilipEPeterson325764011peterson#asu.edu4561238888WA542468thAveLacey98513
MattVNulk124085733nulk#asu.edu2093865442KSManhattanStRiley87512
BrandonTLyons123456123btlyons1#asu.edu5755595459AZ635WElmStMesa85212
RogerATurtle983421567rat#gmail.com8587754321IA3400SWIslanDrdDesmoines50021
MarcJWhiz745629754marcwhiz76#yahoo.com6195323200CA215NCollegeGroveWaySandiego91210
I want to format the raw data into this:
Ryan Wehe, 989-98-7412
2001 Blake St
Denver, CO 80205
rwehe#asu.edu
(202) 555-0126
Joseph Lee, 605-49-7184
5986 Budweiser Way
Alamosa, CO 81101
josephl#mailinator.com
(303) 555-0103
AmyJohnson, 783-33-3251
14 N 5th St
Minneapolis, MN 55403
amyj#mailinator.com
(651) 555-0164
To the best of my knowledge, Awk provides no facility for using capture groups to define the field separator.
In consideration of this I think a quick hack might be your best option:
cat addresses.txt | perl -ne '/([A-Z][[:lower:]]*)([A-Z]*[[:lower:]]*)([0-9]{9})(.*?\.\w{2,3})([0-9]{10})(.*?)([0-9]{5})/ && print "$1 $2 $3 $4 $5 $6\n"'
Which returns this:
Ryan Wehe 989987412 rwehe#asu.edu 2025550126 CO2001BlakeStDenver 80205
Joseph Lee 605497184 josephl#mailinator.com 3035550103 CO5986BudweiserWayAlamosa 81101
Amy Johnson 783333251 amyj#mailinator.com 6515550164 MN14N5thStMinneapolis 55403
Daniel JEverhard 314849866 everhard#asu.edu 5059358554 NM8830JohnsonRdAlbuquerque 87122
Philip EPeterson 325764011 peterson#asu.edu 4561238888 WA 54246
Matt VNulk 124085733 nulk#asu.edu 2093865442 KSManhattanStRiley 87512
Brandon TLyons 123456123 btlyons1#asu.edu 5755595459 AZ635WElmStMesa 85212
Roger ATurtle 983421567 rat#gmail.com 8587754321 IA3400SWIslanDrdDesmoines 50021
Marc JWhiz 745629754 marcwhiz76#yahoo.com 6195323200 CA215NCollegeGroveWaySandiego 91210
Your expected output uses both formats, so I was unsure whether you need to break names apart (i.e. Ryan Wehe instead of RyanWehe); adjusting for this is fairly straightforward.
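To get from the captured fields to the dashed SSN and the bracketed phone number, substring expansion is enough. The following is only a rough sketch that swaps the Perl one-liner for a bash regex (bash 3.2 or later assumed) and handles a single record. The trailing $ anchor is an added assumption that the ZIP code is always the last five characters of the line, and splitting the street from the city is not attempted:

```shell
#!/usr/bin/env bash
# One record from the question's sample data.
line='RyanWehe989987412rwehe#asu.edu2025550126CO2001BlakeStDenver80205'

# first name, last name, 9-digit SSN, email, 10-digit phone, address blob, 5-digit ZIP
re='^([[:upper:]][[:lower:]]*)([[:upper:]]*[[:lower:]]*)([0-9]{9})(.*\.[[:lower:]]{2,3})([0-9]{10})(.*)([0-9]{5})$'

if [[ $line =~ $re ]]; then
  first=${BASH_REMATCH[1]}
  last=${BASH_REMATCH[2]}
  ssn=${BASH_REMATCH[3]}
  email=${BASH_REMATCH[4]}
  phone=${BASH_REMATCH[5]}
  zip=${BASH_REMATCH[7]}

  # Slice the digit strings by offset and length to insert the separators.
  name_line="$first $last, ${ssn:0:3}-${ssn:3:2}-${ssn:5}"
  phone_line="(${phone:0:3}) ${phone:3:3}-${phone:6}"

  printf '%s\n%s\n%s\n' "$name_line" "$email" "$phone_line"
fi
```

Because the pattern is anchored at the end, the ZIP is taken from the last five digits of the line rather than from the first run of five digits after the phone number.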
I need some help getting a script up and running. Basically I have some data that comes from a command output and want to select some of it and evaluate
Example data is
JSnow <jsnow#email.com> John Snow spotted 30/1/2015
BBaggins <bbaggins#email.com> Bilbo Baggins spotted 20/03/2015
Batman <batman#email.com> Batman spotted 09/09/2015
So far I have something along the lines of
# Define date to check
check=$(date -d "-90 days" "+%Y/%m/%d")
# Return user name
for user in $(command | awk '{print $1}')
do
# Return last logon date
$lastdate=(command | awk '{for(i=1;i<=NF;i++) if ($i==spotted) $(i+1)}')
# Evaluation date again current -90days
if $lastdate < $check; then
printf "$user not logged on for ages"
fi
done
I have a couple of problems, not least the fact that whilst I can get information from places I don't know how to go about getting it all together!! I'm also guessing my date evaluation will be more complicated but at this point that's another problem and just there to give a better idea of my intentions. If anyone can explain the logical steps needed to achieve my goal as well as propose a solution that would be great. Thanks
Every time you write a loop in shell just to manipulate text you have the wrong approach (see, for example, https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice). The general purpose text manipulation tool that comes on every UNIX installation is awk. This uses GNU awk for time functions:
$ cat tst.awk
BEGIN { check = systime() - (90 * 24 * 60 * 60) }
{
    user = $1
    date = gensub(/([0-9]+)\/([0-9]+)\/([0-9]+)/,"\\3 \\2 \\1 0 0 0",1,$NF)
    secs = mktime(date)
    if (secs < check) {
        printf "%s not logged in for ages\n", user
    }
}
$ cat file
JSnow <jsnow#email.com> John Snow spotted 30/1/2015
BBaggins <bbaggins#email.com> Bilbo Baggins spotted 20/03/2015
Batman <batman#email.com> Batman spotted 09/09/2015
$ cat file | awk -f tst.awk
JSnow not logged in for ages
BBaggins not logged in for ages
Batman not logged in for ages
Replace cat file with your actual command.
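If GNU awk is not available, a rough alternative is to compare dates as zero-padded YYYYMMDD numbers, which needs no time functions. This is only a sketch: the cutoff is hard-coded here so the example is reproducible, and the variable names cutoff and stale are made up for the illustration:

```shell
#!/usr/bin/env bash
# Assumed fixed cutoff (YYYYMMDD) so the example is reproducible;
# with GNU date it could be computed as: cutoff=$(date -d '-90 days' +%Y%m%d)
cutoff=20150601

stale=$(printf '%s\n' \
  'JSnow <jsnow#email.com> John Snow spotted 30/1/2015' \
  'BBaggins <bbaggins#email.com> Bilbo Baggins spotted 20/03/2015' \
  'Batman <batman#email.com> Batman spotted 09/09/2015' |
awk -v cutoff="$cutoff" '{
    split($NF, p, "/")                               # p[1]=day, p[2]=month, p[3]=year
    ymd = sprintf("%04d%02d%02d", p[3], p[2], p[1])  # zero-padded YYYYMMDD
    if (ymd + 0 < cutoff + 0)                        # force a numeric comparison
        printf "%s not logged in for ages\n", $1
}')
printf '%s\n' "$stale"
```

With this cutoff, JSnow and BBaggins are reported and Batman is not. In real use the printf of sample lines would be replaced by the actual command.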