Regular expression to find pattern with insertion and deletion cross different lines? - bash

I have text file with following lines as example:
AZZKLMNANAKK
AZZLNAKK
AZLPMNNAK
I would like to write regular expression (AZLN)which allows me to search for specific pattern that is shared between different lines with specifying certain number of insertion and deletion.
Would someone can help me with that ?

If your requirement is to do a simple search then use grep as follows.
grep "AZLN" Input_file

Related

Bash: Sort file numerically, but only where the first field matches a pattern

Due to poor past naming practices, I'm left with a list of names that is proving to be a challenge to work with. The bottom line is that I want the most current name (by date) to be placed in a variable. All the names are listed (unsorted) in a file called bar.txt.
In this case I can't rename, and there's no way to get the actual dates of the images; these names are all I have to go on. The names can follow one of several patterns;
foo
YYYYMMDD-foo
YYYYMMDD##-foo
foo can be anything from a single character to a long string of letters/numbers/symbols. I am interested only in the names matching the second use case, YYMMDD-foo, as those are from after we started tagging consistently.
I would like to end up with a variable containing the most recent date that follows the pattern YYMMDD-foo.
I know sort -k1 -n < bar.txt, but then I'm not sure how to isolate the second pattern's results to extract what I need.
How do I sort the file to ignore anything but the second pattern, and return the most current date?
Sample
Given that bar.txt looks like this;
test
2017120901-develop-BUILD-31
20170326-TEST-1.2.0
20170406-BUILD-40-1.2.0-test
2010818_001
I would want to extract 20170406-BUILD-40-1.2.0-test
Since your requirement involves 1) to get only files of a certain format 2) apply sorting and get only the latest file. Am using a Awk & GNU sort together to achieve it
awk -F'-' 'length($1) == 8' file | sort -nrk1 | head -1
20170406-BUILD-40-1.2.0-test
The solution works by only getting those lines in the file whose first column has 8 characters exactly corresponding to YYYYMMDD alignment. Once those filtered, sort applied on first field and the first line is obtained using head.

ParseGlob: What is the pattern to parse all templates recursively within a directory?

Template.ParseGlob("*.html") //fetches all html files from current directory.
Template.ParseGlob("**/*.html") //Seems to only fetch at one level depth
Im not looking for a "Walk" solution. Just want to know if this is possible. I don't quite understand what "pattern" this expects. if i can get an explanation about the pattern used by ParseGlob that would be great too.
The code text/template/helper.go mentions
// The pattern is processed by filepath.Glob and must match at least one file.
filepath.Glob() says that "the syntax of patterns is the same as in Match"
Match returns true if name matches the shell file name pattern.
The implementation of Match() doesn't seem to treat '**' differently, and only consider '*' as matching any sequence of non-Separator characters.
That would mean '**' is equivalent to '*', which in turn would explain why the match works at one level depth only.
So, since the ParseGlob can't load templates recursively we have to use path/filepath.Walk function. But this way gives more opportunities.
https://gist.github.com/logrusorgru/abd846adb521a6fb39c7405f32fec0cf

How can I write a regex to repeatedly capture group within a larger match?

I'm getting a regex headache, so hopefully someone can help me here. I'm doing some file syntax conversion and I've got this situation in the files:
OpenMarker
keyword some expression
keyword some expression
keyword some expression
keyword some expression
keyword some expression
CloseMarker
I want to match all instances of "keyword" inside the markers. The marker areas are repeated and the keyword can appear in other places, but I don't want to match outside of the markers. What I don't seem to be able to work out is how to get a regex to pull out all the matches. I can get one to do the first or the last, but not to get all of them. I believe it should be possible and it's something to do with repeated capture groups -- can someone show me the light?
I'm using grepWin, which seems to support all the bells and whistles.
You could use:
(?<=OpenMarker((?!CloseMarker).)*)keyword(?=.*CloseMarker)
this will match the keyword inside OpenMarker and CloseMarker (using the option "dot matches newline").
sed -n -e '/OpenMarker[[:space:]]*CloseMarker/p' /path/to/file | grep keyword should work. Not sure if grep alone could do this.
There are only a few regex engines that support separate captures of a repeated group (.NET for example). So your best bet is to do this in two steps:
First match the section you're interested in: OpenMarker(.*?)CloseMarker (using the option "dot matches newline").
Then apply another regex to the match repeatedly: keyword (.*) (this time without the option "dot matches newline").

Bash script frequency analysis of unique letters and repeating letter pairs how should i build this script?

Ok,first post..
So I have this assignment to decrypt cryptograms by hand,but I also wanted to automate the process a little if not all at least a few parts,so i browsed around and found some sed and awk one liners to do some things I wanted done,but not all i wanted/needed.
There are some websites that sort of do what I want, but I really want to just do it in bash for some reason,just because I want to understand it better and such :)
The script would take a filename as parameter and output another file such as solution$1 when done.
if [ -e "$PWD/$1" ]; then
echo "$1 exists"
else
echo "$1 doesnt exists"
fi
Would start the script to see if the file in param exists..
Then I found this one liner
sed -e "s/./\0\n/g" $1 | while read c;do echo -n "$c" ; done
Which works fine but I would need to have the number of occurences per letter, I really don't see how to do that.
Here is what I'm trying to achieve more or less http://25yearsofprogramming.com/fun/ciphers.htm for the counting unique letter occurences and such.
I then need to put all letters in lowercase.
After this I see the script doing theses things..
-a subscript that scans a dictionary file for certain pattern and size of words
the bigger words the better.
For example: let's say the solution is the word "apparel" and the crypted word is "zxxzgvk"
is there a regex way to express the pattern that compares those two words and lists the word "apparel" in a dictionnary file because "appa" and "zxxz" are similar patterns and "zxxzgvk" is of similar length with "apparel"
Can this be part done and is it realistic to view the problem like this or is this just far fetched ?
Another subscript who takes the found letters from the previous output word and that swap
letters in the cryptogram.
The swapped letters will be in uppercase to differentiate them over time.
I'll have to figure out then how to proceed to maybe rescan the new found words to see if they're found in a dictionnary file partly or fully as well,then swap more letters or not.
Did anyone see this problem in the past and tried to solve it with the patterns in words
like i described it,or is this just too complex ?
Should I log any of the swaps ?
Maybe just scan through all the crypted words and swap as I go along then do another sweep
with having for constraint in the first sweep to not change uppercase letters(actually to use them as more precise patterns..!)
Anyone did some similar script/program in another langage? If so which one? Maybe I can relate somehow :)
Maybe we can use your insight as to how you thought out your code.
I will happily include the cryptograms I have decoded and the one I have yet to decode :)
Again, the focus of my assignment is not to do this script but just to resolve the cryptograms. But doing scripts or at least trying to see how I would do this script does help me understand a little more how to think in terms of code. Feel free to point me in the right directions!
The cryptogram itself is based on simple alphabetic substitution.
I have done a pastebin here with the code to be :) http://pastebin.com/UEQDsbPk
In pseudocode the way I see it is :
call program with an input filename in param and optionally a second filename(dictionary)
verify the input file exists and isnt empty
read the file's content and echo it on screen
transform to lowercase
scan through the text and count the amount of each letter to do a frequency analysis
ask the user what langage is the text supposed to be (english default)
use the response to specify which letter frequencies to use as a baseline
swap letters corresponding to the frequency analysis in uppercase..
print the changed document on screen
ask the user to swap letters in the crypted text
if user had given a dictionary file as the second argument
then scan the cipher for words and find the bigger words
find words with a similar pattern (some letters repeating letters) in the dictionary file
list on screen the results if any
offer to swap the letters corresponding in the cipher
print modified cipher on screen
ask again to swap letters or find more similar words
More or less it the way I see the script structured.
Do you see anything that I should add,did i miss something?
I hope this revised version is more clear for everyone!
Tl,dr to be frank. To the only question i've found - the answer is yes:) Please split it to smaller tasks and we'll be happy to assist you - if you won't find the answer to these smaller questions before.
If you can put it out in pseudocode, it would be easier. There's all kinds of text-manipulating stuff in unix. The means to employ depend on how big are your texts. I believe they are not so big, or you would have used some compiled language.
For example the easy but costly gawk way to count frequences:
awk -F "" '{for(i=1;i<=NF;i++) freq[$i]++;}END{for(i in freq) printf("%c %d\n", i, freq[i]);}'
As for transliterating, there is tr utility. You can forge and then pass to it the actual strings in each case (that stands true for Caesar-like ciphers).
grep -o . inputfile | sort | uniq -c | sort -rn
Example:
$ echo 'aAAbbbBBBB123AB' | grep -o . | sort | uniq -c | sort -rn
5 B
3 b
3 A
1 a
1 3
1 2
1 1

Getting Applescript to do some work

I have two files.
One has a list of words. The other has a large list of words that includes all words in the original file. How can I write a script to search all words in list 1 and get the line on list 2 with that exact word (list 2 has the words plus additional information) and write them to a third file?
I've been searching online but couldn't find a solution.
I would look at sed, awk or grep one of those tools should be the right direction...

Resources