Swap characters in specific positions in strings of varying lengths - bash

I've been trying to learn sed and the examples I've found here are for swapping dates from 05082012 to 20120805 and I'm having trouble adapting them to my current need.
I need to convert an IP address such as 10.4.13.22 to its reverse-lookup form, 22.13.4.10, for an nsupdate script. My biggest problem is that each octet can vary in length, e.g. 10.4.13.2 and 10.19.8.126.
Thanks for any help!
echo 10.0.2.99 | sed 's/\(....\)\(....\)/\2\1/'
This is what I've tried so far, based on another question here, but since the examples don't explain what .... means, I'm having trouble understanding what it does.
The output of that command is .2.910.09, and I am expecting 99.2.0.10.
In short, I want to reverse the order of the "sections" that are separated by a ".".

A "bruteforce" method to "reverse" an IPv4 address would be:
sed 's/\([0-9]\+\)\.\([0-9]\+\)\.\([0-9]\+\)\.\([0-9]\+\)/\4.\3.\2.\1/g'
or, using extended regular expressions with GNU sed's -r option,
sed -r 's/([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)/\4.\3.\2.\1/g'
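For comparison, in the command from the question each unescaped . matches any single character, so \(....\) just captures four arbitrary characters, which is why the output came out scrambled. A minimal sketch of an alternative that splits on the dots instead, so the octet lengths do not matter (reverse_ip is a made-up helper name, not part of the answer above):

```shell
# reverse_ip is a hypothetical helper name used only for this sketch.
# It prints the four dot-separated fields of its argument in reverse order.
reverse_ip() {
    printf '%s\n' "$1" | awk -F. -v OFS=. '{ print $4, $3, $2, $1 }'
}

reverse_ip 10.0.2.99      # prints 99.2.0.10
reverse_ip 10.19.8.126    # prints 126.8.19.10
```

This assumes the input is already a well-formed dotted quad; it does no validation.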

Related

egrep between 2 ranges in same column csv

I'm not sure how to match two sets of values in the same column. Say I have a CSV file of all Titanic passengers and I want to extract the people aged 20 to 29 and those aged 40 to 49, and also the people who spoke English AND another language, say French. Since both values are in the same column, this is quite challenging.
egrep does not seem to have an AND operator, only OR, so I'm struggling to find out how to do it.
This is what I was trying (on a comma-separated CSV).
The 3rd column is age and the 8th is language.
(I know there may be easier solutions with sed, awk, etc., but I need to do it in egrep for training purposes.)
egrep "^.*,.*,[2-0][0-9],.*,.*,[eng.*]" titanic-passengers.csv
Thanks in advance.
You should use [^,]* to match a single column. .* will match across multiple columns.
To match 20-29 use 2[0-9]; to match 40-49 use 4[0-9]. You can then combine them with [24][0-9].
You don't need to put [] around the language, that's for matching a single character that's any of the characters in the brackets.
grep -E '^[^,]*,[^,]*,[24][0-9],[^,]*,[^,]*,[^,]*,[^,]*,eng' titanic-passengers.csv
Maybe this one?
grep -E '^[^,]*,[^,]*,[24][0-9],[^,]*,[^,]*,[^,]*,[^,]*,[^,]*( english|english )[^,]*' titanic-passengers.csv
@Barmar explained the other patterns well, so I'll explain the "language" part.
To be sure you match at least one language in addition to english, you need to force a space before or after the word english. The OR operator is written (pattern1|pattern2).
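A small sketch of both ideas on made-up rows. The sample data and the filter_passengers name are assumptions for illustration only; the real titanic-passengers.csv is assumed to have age in column 3 and a space-separated language list in column 8. ([^,]*,){4} is just shorthand for the four repeated [^,]*, columns.

```shell
# Hypothetical sample rows (not taken from the real file).
sample='1,Smith,24,m,x,y,z,english french
2,Jones,35,f,x,y,z,english
3,Brown,41,m,x,y,z,english
4,Dupont,47,f,x,y,z,french english
5,Ng,124,m,x,y,z,english german'

# Keep rows whose 3rd column is 20-29 or 40-49 and whose 8th column
# contains "english" with a space on at least one side.
filter_passengers() {
    grep -E '^[^,]*,[^,]*,[24][0-9],([^,]*,){4}[^,]*( english|english )'
}

printf '%s\n' "$sample" | filter_passengers
# prints rows 1 and 4
```

Row 3 is dropped because english is the only language, and row 5 because 124 is not a two-digit age starting with 2 or 4.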

How to sort datafiles with some numbers of Fortran-style D+01?

I have several Fortran datafiles that contain numbers in a format like this:
-0.53D+02
I want to combine these with simple floating point data like
-0.53
and then sort them, like with Unix sort.
Unfortunately sort can't recognize this format, so I am looking for a simple converter, but couldn't find anything online. I thought about a Fortran script and converting it from double precision to float, but I am not quite sure about the number of digits and this is always a bit tedious with Fortran.
So does anyone know a script that can do this, a sorting program that can read that format or maybe even just a short sed command that might help? I am not that good with sed, so it would cost me quite a while to figure out how...
I think you can use this:
sed 's/D/E/g' YourNumbersFile | sort -g
The sed command changes all Ds to Es - read it like this... "substitute all Ds with Es, globally".
The sort command needs the -g option to compare general numeric values, including ones in exponent notation.
If your sort doesn't accept the -g switch, I guess another option might be to use this awk to reformat your numbers:
awk '{sub(/D/,"e");printf "%8.3f\n", $1}' YourNumbersFile | sort -n
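A minimal sketch of the sed approach on made-up values. It uses a slightly stricter pattern than s/D/E/g (my own variation, not from the answer above) so that only a D sitting between a digit and an exponent is rewritten, and it assumes a sort that supports -g, such as GNU sort. fortran_sort is a hypothetical name.

```shell
# fortran_sort is a hypothetical helper: it reads numbers on stdin,
# rewrites Fortran exponents (D+02) as E+02, and sorts numerically.
fortran_sort() {
    sed 's/\([0-9.]\)[Dd]\([-+]\{0,1\}[0-9]\)/\1E\2/g' | LC_ALL=C sort -g
}

printf '%s\n' '-0.53' '0.1D+01' '0.2D-01' '-0.53D+02' | fortran_sort
# prints, one per line: -0.53E+02  -0.53  0.2E-01  0.1E+01
```

LC_ALL=C is there so the decimal point is always read as ".", whatever the user's locale.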

Case Sensitive Sort Unix Bash

Here is a screenshot of an issue I'm having with sort:
http://i.imgur.com/cIvAF.png
The objective I want out of this, is to put all equal strings on consecutive lines. It works for 99% of the list I'm sorting, but there's a few hitches such as those in the screen shot.
So all the yahoo.coms should be next to each other, and then all the Yahoo.coms then the YAHOO.coms yahoo.cmos yhoo.c etc. (The typos even getting their own group of lines)
Not entirely sure how to handle this with sort, but I'm certainly trying.
I print all the domains unsorted to a file and then sort it with just vanilla sort filename
Would love some advice/input.
You probably need to override the locale; most Linux systems default to a UTF-8 locale, which specifies both case-insensitive sorting and ignoring punctuation. LC_ALL takes precedence over LANG and LC_COLLATE, so it is the safest one to set:
LC_ALL=C sort filename
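A quick sketch of the effect on made-up input. In the C locale sort compares raw bytes, so every distinct spelling ends up in its own run of consecutive lines, uppercase before lowercase. group_identical is a hypothetical name, and LC_ALL is used here because it overrides any other locale variable that may be set.

```shell
# group_identical is a hypothetical helper: byte-wise sort of stdin.
group_identical() {
    LC_ALL=C sort
}

printf '%s\n' yahoo.com Yahoo.com yahoo.com YAHOO.com Yahoo.com | group_identical
# prints, one per line: YAHOO.com Yahoo.com Yahoo.com yahoo.com yahoo.com
```

Piping the result through uniq -c then gives one count per exact spelling.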
Normalize your input a bit first (quote the ranges so the shell does not treat them as glob patterns):
tr 'A-Z' 'a-z'
Try reading "Unix for poets"

Take token from this bash string/array...not sure which it is

Hi I am writing a bash script and I have a string
foo=1.0.3
What I want to do is examine the '3'. The first thing I did was get rid of the periods like this:
bar=`echo $foo|tr '.' ' '`
When I echo $bar it prints 1 0 3. Now how do I create a variable that holds only the 3? Thank you very much.
As you are no doubt learning about bash, there are many, many ways to achieve your goals. I think @Mat's answer using bar=${foo##*.} is the best so far, although he doesn't explain how or why it works. I strongly recommend you check out the bash tutorial on tldp; it is my go-to source when I have questions like this. For string manipulation, there is a section there that discusses many of the different ways to go about this sort of thing.
For example, if you know that foo is always going to be 5 characters long, you can simply take the fifth character from it:
bar=${foo:4}
That is, make bar the fifth position of foo (remember, we start counting from zero, not from one).
If you know it is always going to be the last position of foo, then you can just count backwards:
bar=${foo: -1}
Notice there is a space between the colon and the -1. You need that (or parentheses around the -1) because otherwise bash reads :- as the "use a default value" operator.
To explain @Mat's answer, I had to look at the link I provided above. Apparently the double pound signs (hash mark, octothorpe, whatever you want to call them) in the expression:
${string##substring}
mean to delete the longest match of $substring from the front of $string. So you are looking for the longest match of *. which equates to everything up to and including the last dot. Pretty cool, huh?
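To make the prefix/suffix operators concrete, here is a short sketch using the foo from the question. The variable names on the left are made up for the example; # and ## strip from the front, % and %% strip from the end, and the doubled form takes the longest match.

```shell
foo=1.0.3

last=${foo##*.}    # longest "*." removed from the front  -> 3
rest=${foo#*.}     # shortest "*." removed from the front -> 0.3
first=${foo%%.*}   # longest ".*" removed from the end    -> 1
lead=${foo%.*}     # shortest ".*" removed from the end   -> 1.0

printf '%s\n' "$last" "$rest" "$first" "$lead"
```

All four are plain parameter expansions, so no external command is run.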
This should work:
bar=$(echo "$foo" | cut -d. -f3)
If you know you only want the part after the last dot (not the third item in a .-separated list) you can also do this:
bar=${foo##*.}
Advantage: no extra process or subshell started.
One way: Build an array and take position 2:
array=(`echo $foo | tr . ' '`)
echo ${array[2]}
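A variation on the same splitting idea that lets the shell do the work instead of tr, by setting IFS to a dot and loading the pieces into the positional parameters. nth_field is a hypothetical helper name; the function body runs in a subshell (note the parentheses) so the IFS and globbing changes do not leak into the caller.

```shell
# nth_field N STRING: print the Nth dot-separated field of STRING.
# Hypothetical helper for illustration; assumes N is between 1 and the field count.
nth_field() (
    n=$1
    set -f              # no globbing, so a field such as "*" stays literal
    IFS=.               # split on dots only
    set -- $2           # unquoted on purpose: word splitting makes the fields
    shift $((n - 1))    # drop the fields before the one we want
    printf '%s\n' "$1"
)

nth_field 3 1.0.3    # prints 3
nth_field 1 1.0.3    # prints 1
```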
This should also work:
echo $foo | awk -F. '{print $3}'

Bash script frequency analysis of unique letters and repeating letter pairs: how should I build this script?

OK, first post.
I have an assignment to decrypt cryptograms by hand, but I also wanted to automate at least a few parts of the process, if not all of it. I browsed around and found some sed and awk one-liners that do some of the things I wanted, but not everything I want or need.
There are some websites that sort of do what I want, but I really want to do it in bash, just because I want to understand it better :)
The script would take a filename as parameter and output another file such as solution$1 when done.
if [ -e "$PWD/$1" ]; then
    echo "$1 exists"
else
    echo "$1 does not exist"
fi
That would start the script, checking that the file given as a parameter exists.
Then I found this one-liner:
sed -e "s/./\0\n/g" $1 | while read c;do echo -n "$c" ; done
It works fine, but I need the number of occurrences of each letter, and I really don't see how to do that.
What I'm trying to achieve is more or less what http://25yearsofprogramming.com/fun/ciphers.htm does for counting unique letter occurrences and so on.
I then need to put all letters in lowercase.
After this I see the script doing these things:
- A subscript that scans a dictionary file for words of a certain pattern and size; the bigger the words, the better.
For example, let's say the solution is the word "apparel" and the encrypted word is "zxxzgvk".
Is there a regex way to express the pattern that compares those two words and lists the word "apparel" from a dictionary file, because "appa" and "zxxz" are similar patterns and "zxxzgvk" has the same length as "apparel"?
Can this part be done, and is it realistic to view the problem like this, or is it just far-fetched?
- Another subscript that takes the letters found from the previous output word and swaps those letters in the cryptogram.
The swapped letters will be in uppercase to differentiate them over time.
I'll then have to figure out how to proceed, maybe rescanning the newly found words to see whether they are found in a dictionary file partly or fully as well, then swapping more letters or not.
Has anyone seen this problem in the past and tried to solve it with word patterns as I described, or is this just too complex?
Should I log any of the swaps?
Maybe just scan through all the encrypted words and swap as I go along, then do another sweep, with the constraint in the first sweep of not changing uppercase letters (actually using them as more precise patterns!).
Has anyone written a similar script or program in another language? If so, which one? Maybe I can relate somehow :)
Maybe we can use your insight into how you thought out your code.
I will happily include the cryptograms I have decoded and the one I have yet to decode :)
Again, the focus of my assignment is not to write this script but just to solve the cryptograms. But writing scripts, or at least trying to see how I would write this one, helps me understand a little more how to think in terms of code. Feel free to point me in the right direction!
The cryptogram itself is based on simple alphabetic substitution.
I have done a pastebin here with the code to be :) http://pastebin.com/UEQDsbPk
In pseudocode the way I see it is :
call program with an input filename in param and optionally a second filename(dictionary)
verify the input file exists and isnt empty
read the file's content and echo it on screen
transform to lowercase
scan through the text and count the amount of each letter to do a frequency analysis
ask the user what language the text is supposed to be in (english default)
use the response to specify which letter frequencies to use as a baseline
swap letters corresponding to the frequency analysis in uppercase..
print the changed document on screen
ask the user to swap letters in the crypted text
if user had given a dictionary file as the second argument
then scan the cipher for words and find the bigger words
find words with a similar pattern (some repeating letters) in the dictionary file
list on screen the results if any
offer to swap the letters corresponding in the cipher
print modified cipher on screen
ask again to swap letters or find more similar words
That is more or less how I see the script structured.
Do you see anything I should add? Did I miss something?
I hope this revised version is clearer for everyone!
TL;DR, to be frank. To the only question I've found, the answer is yes :) Please split it into smaller tasks and we'll be happy to assist you, if you don't find the answers to those smaller questions first.
If you can put it in pseudocode, it would be easier. There are all kinds of text-manipulating tools in Unix. Which ones to use depends on how big your texts are. I believe they are not that big, or you would have used some compiled language.
For example, the easy but costly gawk way to count frequencies:
awk -F "" '{for(i=1;i<=NF;i++) freq[$i]++;}END{for(i in freq) printf("%c %d\n", i, freq[i]);}'
As for transliterating, there is the tr utility. You can build the actual strings for each case and pass them to it (that holds for Caesar-like ciphers).
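A minimal sketch of the tr idea, using the word pair from the question. apply_key is a hypothetical helper name; it maps each solved cipher letter to an uppercase plaintext letter, which matches the asker's plan of marking swapped letters in uppercase, and leaves unsolved lowercase letters alone.

```shell
# apply_key CIPHER_LETTERS PLAIN_LETTERS: transliterate stdin.
# Hypothetical helper; both arguments must have the same length,
# with the Nth cipher letter mapping to the Nth plain letter.
apply_key() {
    tr "$1" "$2"
}

printf '%s\n' 'zxxzgvk' | apply_key 'zxgvk' 'APREL'
# prints APPAREL
```

With a partial key, for example apply_key 'zx' 'AP', the same input comes out as APPAgvk, so the remaining lowercase letters show what is still unsolved.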
grep -o . inputfile | sort | uniq -c | sort -rn
Example:
$ echo 'aAAbbbBBBB123AB' | grep -o . | sort | uniq -c | sort -rn
5 B
3 b
3 A
1 a
1 3
1 2
1 1
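The same pipeline style can be pointed at the dictionary-matching idea from the question. One possible approach (my own sketch, not something from the answers here) is to reduce every word to a signature that numbers its letters by first appearance, so "apparel" and "zxxzgvk" get the same signature and can be matched with sort, join, or grep. word_pattern is a hypothetical helper name.

```shell
# word_pattern: read one word per line, print "<signature> <word>".
# Hypothetical helper; letters are numbered in order of first appearance.
word_pattern() {
    awk '{
        n = 0; sig = ""
        split("", seen)                      # reset the letter map for each word
        for (i = 1; i <= length($0); i++) {
            ch = substr($0, i, 1)
            if (!(ch in seen)) seen[ch] = ++n
            sig = sig (i > 1 ? "." : "") seen[ch]
        }
        print sig, $0
    }'
}

printf '%s\n' apparel zxxzgvk letters | word_pattern
# prints:
# 1.2.2.1.3.4.5 apparel
# 1.2.2.1.3.4.5 zxxzgvk
# 1.2.3.3.2.4.5 letters
```

Running a dictionary file through word_pattern once and grepping it for the signature of an encrypted word would then list the candidate plaintext words of the same length and letter pattern.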

Resources