Searching text using a keyword, WITHOUT grep, sed, awk etc [closed] - bash

I am working on a programming assignment where I have to search for an entry in a text file, and print out text corresponding to the entry. As an example, let's say I have an entry as follows,
JOHN DOE
34 RIGHT WAY
HALIFAX
465-0394
and the user enters HALIFAX as the keyword. I would then want to find the line that HALIFAX is on and print out all the text associated with this entry. The tricky part is doing all of this without grep, sed, or awk, as the assignment is not accepted if these commands are used. I thought about using regular expressions, but these text manipulations can only be done on a single line, and I must do it for the entire file. As of now I am stumped, and any help would be appreciated!
Alex

You should read the file line by line in your bash script and check whether each line contains your search term:
while IFS= read -r LINE
do
    if [[ $LINE == *HALIFAX* ]]; then
        echo "I found HALIFAX"
    fi
done < "$FILENAME"
From here on it should be easy enough for you to print out the rest.

I'm assuming this is bash scripting from the tag. My suggestion would be to read the text file line by line into an array, and then loop through the array, searching for the keyword in each string. This can be done using wildcards. Here's a link: String contains in Bash. Could you clarify what you mean by "print out all associated text with this entry"?
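As a minimal sketch of the array approach suggested above, the function below loads the file into an array with mapfile and reports each matching line together with its line number. The name find_lines and its two arguments are made up for illustration, and mapfile assumes bash 4 or later.

```shell
#!/usr/bin/env bash
# find_lines KEYWORD FILE  (hypothetical helper, not from the original post)
# Prints "N: text" for every line of FILE that contains KEYWORD.
find_lines() {
    local keyword=$1 file=$2
    local -a lines
    local i

    # Assumes bash 4+: mapfile stores one line per array element.
    mapfile -t lines < "$file"

    for i in "${!lines[@]}"; do
        # Quoting "$keyword" makes the match literal; the * wildcards
        # on either side allow it to appear anywhere in the line.
        if [[ ${lines[i]} == *"$keyword"* ]]; then
            printf '%d: %s\n' "$((i + 1))" "${lines[i]}"
        fi
    done
}
```

Called as find_lines HALIFAX addresses.txt (addresses.txt being a placeholder file name), it would print the number and text of each line containing HALIFAX.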

You can do this with a while loop that reads the file line by line.
# counter for the current line number
lineNum=0
# IFS= and -r keep each line exactly as it appears in the file
while IFS= read -r a1
do
    lineNum=$((lineNum + 1)) # increment for each line read
    # a1 holds the text of the current line
    # compare $a1 with the user's input string here
    # on a match, lineNum is the number of the line in the file
    # that contains the user's input string
done < filename
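Here is one possible sketch that uses such a loop to print the whole entry around a match. It assumes entries are separated by blank lines, which the question does not actually state, and find_entry is a hypothetical name chosen for illustration.

```shell
#!/usr/bin/env bash
# find_entry KEYWORD FILE  (hypothetical helper)
# Assumption: entries in FILE are separated by one or more blank lines.
# Prints every entry in which at least one line contains KEYWORD.
find_entry() {
    local keyword=$1 file=$2
    local line record="" matched=0
    local nl=$'\n'

    # "|| [[ -n $line ]]" also handles a final line without a newline.
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ -z $line ]]; then
            # A blank line closes the entry: print it if it matched.
            if (( matched )); then printf '%s\n' "$record"; fi
            record=""
            matched=0
            continue
        fi
        # Append the line, adding a newline between lines of an entry.
        record+=${record:+$nl}$line
        if [[ $line == *"$keyword"* ]]; then matched=1; fi
    done < "$file"

    # Flush the last entry when the file does not end with a blank line.
    if (( matched )); then printf '%s\n' "$record"; fi
    return 0
}
```

With the sample entry from the question stored in a file, find_entry HALIFAX thatfile would print all four lines of the JOHN DOE entry. No grep, sed, or awk is involved; only bash builtins are used.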


Substringing with bash [closed]

I am trying to extract substrings from many strings with bash. However, although the prefix is correctly deleted, the suffix is not.
One of the strings:
lcl|MK087647.1_cds_QHD46953.1_7_[gene=rpl2]_[protein=ribosomal_protein_L2]_[exception=RNA_editing]_[protein_id=QHD46953.1]_[location=complement(71768..73444)]_[gbkey=CDS]
The desired output:
MK087647.1_cds_QHD46953.1_7_[gene=rpl2]_[protein=ribosomal_protein_L2]
The code
for row in $colonna2; do tmp=${row#*lcl|}
colonna2_newname=${tmp%exception=*} echo $colonna2_newname; done
The output
MK087647.1_cds_QHD46953.1_7_[gene=rpl2]_[protein=ribosomal_protein_L2]_[exception=RNA_editing]_[protein_id=QHD46953.1]_[location=complement(71768..73444)]_[gbkey=CDS]
Any guess why the suffix is not deleted? Is there an error in my syntax?
Thanks in advance
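You have the variable substitution mostly right; the main problem with the code is that there is no line break or semicolon after the assignment to colonna2_newname. Without one, the shell treats the assignment as a temporary environment setting for the echo command, and $colonna2_newname is expanded before that assignment takes effect.
You will also want to change the colonna2_newname variable's definition from ${tmp%exception=*} to ${tmp%_\[exception=*}, so that the trailing _[ is removed as well. The backslash makes the [ a literal character in the pattern.
for row in $colonna2
do
    tmp="${row#*lcl|}"
    colonna2_newname="${tmp%_\[exception=*}"
    echo "$colonna2_newname"
done
# output:
# MK087647.1_cds_QHD46953.1_7_[gene=rpl2]_[protein=ribosomal_protein_L2]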
Now about the for loop: if any of the lines in your $colonna2 variable contain whitespace, for will split the line into separate words at each space. for loops are better suited to arrays and globbed filenames/pathnames. while read loops are better for lines of text:
while IFS= read -r row
do
    tmp="${row#*lcl|}"
    # \[ matches a literal [, so the trailing _[ is removed too
    colonna2_newname="${tmp%_\[exception=*}"
    echo "$colonna2_newname"
done <<< "$colonna2"
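To try the two expansions in isolation, here is a small self-contained sketch that applies them to the sample string from the question. The function name trim_header is made up for illustration.

```shell
#!/usr/bin/env bash
# trim_header STRING  (hypothetical helper)
# Removes everything up to and including "lcl|", then everything
# from "_[exception=" to the end of the string.
trim_header() {
    local row=$1
    local tmp=${row#*lcl|}                 # '#' strips the shortest matching prefix
    printf '%s\n' "${tmp%_\[exception=*}"  # '%' strips the shortest matching suffix
}

trim_header 'lcl|MK087647.1_cds_QHD46953.1_7_[gene=rpl2]_[protein=ribosomal_protein_L2]_[exception=RNA_editing]_[protein_id=QHD46953.1]_[location=complement(71768..73444)]_[gbkey=CDS]'
# prints: MK087647.1_cds_QHD46953.1_7_[gene=rpl2]_[protein=ribosomal_protein_L2]
```

If a string contains no "_[exception=" part, the suffix pattern does not match and the string is left unchanged after the prefix removal.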

Bash grep complicated search [closed]

How can I write a grep command that looks for lines meeting strict requirements? For example, a line should start with a name consisting only of letters and "-", followed by ":", then a year or "xxxx", then another ":", and then a run of letters, digits and "-" of some length. Or maybe there is a link where I can read about this. I have been searching the Internet for a solution for a long time, but can't find one.
What you need here is to pass the grep command a regular expression that describes your pattern of interest, on the basis of which grep will match only valid lines.
Taking into account your indications, the following regular expression could do the job:
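^[A-Za-z-]+:([0-9]{4}|xxxx):[A-Za-z0-9-]+$
The expression begins and ends with the ^ and $ anchors, which mark the beginning and the end of a line. Between them are three blocks separated by :. The first, [A-Za-z-]+, matches letters and dashes; the second, ([0-9]{4}|xxxx), matches a four-digit year or the literal xxxx; the third, [A-Za-z0-9-]+, matches letters, digits and dashes. + is a quantifier indicating that the preceding token can appear one or more times, and {4} means exactly four times. Note that the letter range is written A-Za-z: in ASCII, a range written A-z would also match [, \, ], ^, _ and the backtick.
You can use it with grep like so:
grep -E '^[A-Za-z-]+:([0-9]{4}|xxxx):[A-Za-z0-9-]+$'
The -E option tells grep to interpret the pattern as an extended regular expression, so +, | and {4} work without backslashes. A hyphen placed last inside a bracket expression is matched literally, and the single quotes keep the shell from touching the $.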
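As a quick check of a pattern of this shape, the sketch below runs grep -E over a few made-up lines. The pattern spells the letter range as A-Za-z and requires exactly four digits for the year; the sample lines and the filter_valid name are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# name : year-or-xxxx : letters/digits/dashes
pattern='^[A-Za-z-]+:([0-9]{4}|xxxx):[A-Za-z0-9-]+$'

# filter_valid (hypothetical helper): reads lines on stdin and prints
# only those that match the pattern. LC_ALL=C keeps the A-Z and a-z
# ranges limited to ASCII letters.
filter_valid() {
    LC_ALL=C grep -E "$pattern"
}

# Made-up sample input: the second line has "_" in the name and the
# fourth has a two-digit year, so only lines one and three are printed.
printf '%s\n' \
    'Jean-Luc:1987:abc-123' \
    'bad_name:1987:abc' \
    'Ann:xxxx:Z9' \
    'Bob:87:abc' | filter_valid
```

To search a real file instead, the same pattern can be passed directly: grep -E "$pattern" yourfile, where yourfile is a placeholder.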

Adding \n at every # characters in string Ruby [closed]

I would like to start a new line after every 66 characters for any file that is input into a Ruby script.
some_string.insert( 66, "\n" )
puts some_string
shows that a new line starts after the 66th character but I need it to happen after each 66th character. In other words, each line should be 66 characters long (except possibly the last).
I'm sure it involves a regex but I've tried various with insert, scan, gsub and cannot get it to work.
I'm new to Ruby and programming and this is the first thing I've tried outside of a tutorial. Thanks for the information, all.
You could do something like this:
<your_string>.scan(/.{1,66}/).join("\n")
It will basically split <your_string> at every 66th character and then re-join it by adding the \n between each part.
Or this variation, to avoid splitting words in half:
<your_string>.scan(/.{1,66}(?: |\z)/).join("\n")
some_string.gsub(/.{66}/, "\\0\n")
If you're interested in exploring an answer that doesn't use RegEx, try something like:
a = "Your string goes here"
d = 66
Array(0..a.length/d).collect {|j| a[j*d..(j+1)*d-1]}.join("\n")
The RegEx is likely faster, but this uses the Array constructor, .collect and .join, so it might be an interesting learning exercise. The first part generates an array of numbers based on the number of chunks (a.length/d). The collect gathers the substrings into an array. The body of the collect generates substrings by taking ranges of the original string, and the join puts them back together with "\n" separators.
Use the following to split the string into an array of strings of length 66 and join those strings with a newline character.
some_string.scan(/.{1,66}/).join("\n")

Using read command [closed]

I am extremely confused about using the read command. Can someone please explain this to me? For example if I have:
An executable file called script, containing
read first second
echo $first
echo $second
and you call it with:
echo This is a line of input | ./script
What happens and why? I can't get it to work, and something is supposed to be displayed.
Running help read gives this info:
The line is split into fields as with word splitting, and the first
word is assigned to the first NAME, the second word to the second
NAME, and so on, with any leftover words assigned to the last NAME.
So what's happening here is that the first word is assigned to the variable $first, and the rest of the input line assigned to the last variable $second.
If you want to keep the first word in $first and the second word in $second, try adding a $third variable.
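To see the splitting in action without a separate script file, this small sketch feeds the same line to read through a here-string. The extra variable names (w1, w2, rest) are only for illustration.

```shell
#!/usr/bin/env bash
line="This is a line of input"

# Two names: the last name receives everything after the first word.
read -r first second <<< "$line"
printf 'first=%s\nsecond=%s\n' "$first" "$second"
# first=This
# second=is a line of input

# Three names: the second name now holds only the second word,
# and the third name collects the remainder.
read -r w1 w2 rest <<< "$line"
printf 'w1=%s\nw2=%s\nrest=%s\n' "$w1" "$w2" "$rest"
# w1=This
# w2=is
# rest=a line of input
```

Reading from a pipe, as in the question, splits the line the same way; the here-string is used here only to keep everything in one file.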

Bash script: frequency analysis of unique letters and repeating letter pairs. How should I build this script?

OK, first post.
I have an assignment to decrypt cryptograms by hand, but I also wanted to automate at least part of the process. I browsed around and found some sed and awk one-liners that do some of what I wanted, but not everything I need.
There are some websites that more or less do what I want, but I really want to do it in bash, mostly because I want to understand it better. :)
The script would take a filename as a parameter and write its output to another file, such as solution$1, when done.
if [ -e "$PWD/$1" ]; then
    echo "$1 exists"
else
    echo "$1 does not exist"
fi
That would start the script by checking whether the file given as a parameter exists.
Then I found this one-liner:
sed -e "s/./\0\n/g" "$1" | while read -r c; do echo -n "$c"; done
It works fine, but I need the number of occurrences of each letter, and I really don't see how to do that.
What I'm trying to achieve is more or less http://25yearsofprogramming.com/fun/ciphers.htm, for counting unique letter occurrences and such.
I then need to convert all letters to lowercase.
After this I see the script doing these things:
- A subscript that scans a dictionary file for words of a certain pattern and size; the bigger the words, the better.
For example, say the solution is the word "apparel" and the encrypted word is "zxxzgvk". Is there a regex way to express the pattern that compares those two words and finds "apparel" in a dictionary file, because "appa" and "zxxz" are similar patterns and "zxxzgvk" is the same length as "apparel"?
Can this part be done, and is it realistic to view the problem like this, or is it far-fetched?
- Another subscript that takes the letters found in the previous step and swaps them into the cryptogram. The swapped letters would be in uppercase to tell them apart over time.
I'll then have to figure out how to rescan the newly revealed words to see whether they appear in a dictionary file, partly or fully, and then swap more letters or not.
Has anyone seen this problem before and tried to solve it with word patterns as I described, or is this just too complex?
Should I log any of the swaps?
Maybe I could scan through all the encrypted words and swap as I go along, then do another sweep, with the constraint from the first sweep not to change uppercase letters (actually, to use them as more precise patterns).
Has anyone written a similar script or program in another language? If so, which one? Maybe I can relate somehow. :)
Maybe we can use your insight into how you thought out your code.
I will happily include the cryptograms I have decoded and the one I have yet to decode. :)
Again, the focus of my assignment is not to write this script but to solve the cryptograms. But writing scripts, or at least working out how I would write this one, helps me understand how to think in terms of code. Feel free to point me in the right direction!
The cryptogram itself is based on simple alphabetic substitution.
I have made a pastebin of the code-to-be here: http://pastebin.com/UEQDsbPk :)
In pseudocode, the way I see it is:
call program with an input filename as a parameter and optionally a second filename (dictionary)
verify the input file exists and isn't empty
read the file's content and echo it on screen
transform to lowercase
scan through the text and count the occurrences of each letter to do a frequency analysis
ask the user what language the text is supposed to be in (English by default)
use the response to specify which letter frequencies to use as a baseline
swap letters corresponding to the frequency analysis, in uppercase
print the changed document on screen
ask the user to swap letters in the encrypted text
if the user has given a dictionary file as the second argument
    scan the cipher for words and find the bigger words
    find words with a similar pattern (some repeating letters) in the dictionary file
    list the results on screen, if any
    offer to swap the corresponding letters in the cipher
    print the modified cipher on screen
    ask again to swap letters or find more similar words
That is more or less how I see the script structured.
Do you see anything that I should add? Did I miss something?
I hope this revised version is clearer for everyone!
TL;DR, to be frank. To the only question I found, the answer is yes. :) Please split it into smaller tasks and we'll be happy to assist you, if you don't find the answers to those smaller questions first.
If you can put it in pseudocode, it would be easier. There are all kinds of text-manipulating tools in Unix. Which ones to use depends on how big your texts are. I believe they are not that big, or you would have used a compiled language.
For example, the easy but costly gawk way to count frequencies:
awk -F "" '{for(i=1;i<=NF;i++) freq[$i]++;}END{for(i in freq) printf("%c %d\n", i, freq[i]);}'
As for transliterating, there is the tr utility. You can build the actual strings for each case and then pass them to it (this holds for Caesar-like ciphers).
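As a rough sketch of the tr idea, the function below lowercases the text and then writes each guessed letter in uppercase, as the question proposes, so that solved letters stand out. The name apply_key and its two arguments are hypothetical; the letter mapping in the example is the zxxzgvk/apparel pair from the question.

```shell
#!/usr/bin/env bash
# apply_key FROM TO  (hypothetical helper)
# Reads cipher text on stdin. FROM lists cipher letters and TO lists
# the guessed plaintext letter for each one, in the same order.
apply_key() {
    local from=$1 to=$2
    local upper
    # Guessed letters are written in uppercase to mark them as solved.
    upper=$(printf '%s' "$to" | tr 'a-z' 'A-Z')
    # Lowercase the cipher text first, then substitute the known letters.
    tr 'A-Z' 'a-z' | tr "$from" "$upper"
}

echo 'ZXXZGVK qq' | apply_key zxgvk aprel
# prints: APPAREL qq
```

Letters not listed in FROM pass through unchanged in lowercase, which keeps the unsolved part of the cryptogram visible.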
grep -o . inputfile | sort | uniq -c | sort -rn
Example:
$ echo 'aAAbbbBBBB123AB' | grep -o . | sort | uniq -c | sort -rn
5 B
3 b
3 A
1 a
1 3
1 2
1 1
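For the dictionary lookup asked about in the question (finding "apparel" from "zxxzgvk"), one possible approach that avoids regex is to reduce each word to a repetition signature and compare signatures. This is only a sketch: word_pattern and match_pattern are made-up names, and the dictionary is assumed to hold one word per line.

```shell
#!/usr/bin/env bash
# word_pattern WORD  (hypothetical helper)
# Numbers each distinct letter by order of first appearance, so
# "apparel" and "zxxzgvk" both become 0.1.1.0.2.3.4
word_pattern() {
    local word seen="" out="" ch prefix i
    word=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    for (( i = 0; i < ${#word}; i++ )); do
        ch=${word:i:1}
        # Text of "seen" before the first occurrence of ch; if ch has
        # not been seen yet, nothing is removed and prefix equals seen.
        prefix=${seen%%"$ch"*}
        if [[ $prefix == "$seen" ]]; then
            seen+=$ch
        fi
        # The prefix length is the number assigned to this letter.
        out+=${out:+.}${#prefix}
    done
    printf '%s\n' "$out"
}

# match_pattern CIPHERWORD  (hypothetical helper)
# Reads dictionary words on stdin, one per line, and prints those
# with the same signature as CIPHERWORD.
match_pattern() {
    local target word
    target=$(word_pattern "$1")
    while IFS= read -r word; do
        if [[ $(word_pattern "$word") == "$target" ]]; then
            printf '%s\n' "$word"
        fi
    done
    return 0
}
```

For example, match_pattern zxxzgvk < dictionary.txt (dictionary.txt being a placeholder) would list "apparel" if it is in the file, along with any other seven-letter word with the same repetition pattern. Calling word_pattern once per dictionary word is slow for large dictionaries, so this is meant to show the idea rather than to be fast.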
