egrep between 2 ranges in same column csv - bash

I'm not sure how to match two sets of data in the same column. Let's say I have a CSV file with all Titanic passengers, and I want to extract the people between 20 and 29 years old and those from 40 to 49 years old, and also the people who spoke English AND another language, say French. Since both values are in the same column, this is quite challenging.
egrep does not seem to have an AND operator, only OR, so I'm struggling to find a way to do it.
What I was trying was something like this (on a comma-separated CSV).
The 3rd column is Age and the 8th is Language.
(I know there might be easier solutions with sed/awk etc., but I need to do it in egrep for training purposes.)
egrep "^.*,.*,[2-0][0-9],.*,.*,[eng.*]" titanic-passengers.csv
Thanks in advance.

You should use [^,]* to match a single column. .* will match across multiple columns.
To match 20-29 use 2[0-9]; to match 40-49 use 4[0-9]. You can then combine them with [24][0-9].
You don't need to put [] around the language, that's for matching a single character that's any of the characters in the brackets.
grep -E '^[^,]*,[^,]*,[24][0-9],[^,]*,[^,]*,[^,]*,[^,]*,eng' titanic-passengers.csv
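As a quick check of the column-anchored pattern, here is a minimal sketch run against made-up data. The column positions (age third, language eighth) follow the question; the sample rows and the function names are hypothetical and not taken from the real file.

```shell
# Hypothetical sample rows; only the positions of columns 3 and 8 matter.
sample_csv() {
  cat <<'EOF'
1,Allen,29,female,1,24160,211.33,english
2,Brown,35,male,2,11377,26.00,english
3,Carter,42,male,3,34567,7.25,english french
4,Dupont,47,female,1,11111,80.00,french
5,Evans,8,male,3,22222,21.07,english
EOF
}

# [^,]* consumes exactly one column, so the age test stays in column 3
# and "eng" must appear at the start of column 8.
match_age_and_english() {
  grep -E '^[^,]*,[^,]*,[24][0-9],[^,]*,[^,]*,[^,]*,[^,]*,eng'
}

sample_csv | match_age_and_english
```

With this data only the Allen and Carter rows should be printed: Brown and Evans fail the age test, and Dupont's language column does not start with "eng".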

Maybe this one?
grep -E '^[^,]*,[^,]*,[24][0-9],[^,]*,[^,]*,[^,]*,[^,]*,[^,]*( english|english )[^,]*' titanic-passengers.csv
@Barmar explained the other patterns well, so I'll explain only the "language" part.
To be sure to match at least one more language besides English, you need to require a space before or after the word english. The OR operator is expressed by (pattern1|pattern2).
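To see the space-before-or-after idea in action, here is a small sketch on invented rows. It assumes that multiple languages are stored space-separated inside the 8th column; the data and function names are hypothetical.

```shell
# Hypothetical rows; column 8 holds one or more space-separated languages.
sample_languages() {
  cat <<'EOF'
1,Allen,29,female,1,24160,211.33,english
2,Carter,42,male,3,34567,7.25,english french
3,Dupont,47,female,1,11111,80.00,french english
4,Moreau,25,male,2,22222,13.00,french
5,Smith,61,male,1,33333,50.00,english german
EOF
}

# "english" must have a space on one side, i.e. a neighbour in the same column.
match_english_plus_other() {
  grep -E '^[^,]*,[^,]*,[24][0-9],[^,]*,[^,]*,[^,]*,[^,]*,[^,]*( english|english )[^,]*'
}

sample_languages | match_english_plus_other
```

Here the Carter and Dupont rows should match. Allen speaks only English, Moreau has no English, and Smith is outside both age ranges.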

Related

Regex: Grouping with OR

I'm new here, so please don't scold me for misspellings etc.
What I need to do is to rename a bunch of files with a date in different formats at the beginning of their names, like:
05.07.2020-abc.pdf
2020.07.05-pqr.pdf
Instead of writing a different expression for each format, e.g.
^(\d{2})\.(\d{2})\.(\d{4})(.+) => $3-$2-$1$4
Example
02.11.2022-abc.pdf => 2022-11-02-abc.pdf
I'd like to do it in one fell swoop using the OR operator "|" but I have no idea how to formulate the groupings etc. Can one have nested groupings in regex?
Any ideas? Thanks in advance!
@The fourth bird:
No (.+) needed. You're right, I condensed my actual expression and could have taken it out.
The different date 'formats' I mean are dd.mm.yyyy and yyyy.mm.dd respectively, and I need to convert both to yyyy-mm-dd.
So, if the format is dd.mm.yyyy I have to flip the string, so to say; otherwise I just need to replace the dots with hyphens.
The OS is Android, and for this operation I use Solid Explorer multi search & replace using regex.
I hope I made myself clear this time around ;-)
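One way to picture nested groups with "|" is the sed sketch below. It is only an illustration on a desktop shell, not a Solid Explorer recipe: that app's regex dialect is not verified here, and the function name is hypothetical. The sketch relies on groups that did not take part in the match expanding to an empty string in the replacement, which is how GNU sed behaves; other engines may differ.

```shell
# Outer group 1 wraps two alternatives:
#   groups 2,3,4 = dd, mm, yyyy   (dd.mm.yyyy form)
#   groups 5,6,7 = yyyy, mm, dd   (yyyy.mm.dd form)
# Only one alternative matches, so the other three groups are empty and
# \4\5-\3\6-\2\7 always assembles yyyy-mm-dd.
normalize_date_prefix() {
  sed -E 's/^(([0-9]{2})\.([0-9]{2})\.([0-9]{4})|([0-9]{4})\.([0-9]{2})\.([0-9]{2}))/\4\5-\3\6-\2\7/'
}

printf '%s\n' '05.07.2020-abc.pdf' '2020.07.05-pqr.pdf' | normalize_date_prefix
```

For the two file names from the question this should print 2020-07-05-abc.pdf and 2020-07-05-pqr.pdf.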

BASH - How to delete all numerals from a text file, unless they are part of a specific string?

I have a text file, and I want to delete all the numerals in it. However, there are two key strings, "9/11" and "September 11", in which I want to keep the numerals. How can I delete all the numerals except when they are part of these key strings?
I use sed 's/[0-9]*//g' to get rid of the numerals. So for now, the sample text before processing would be something like this:
12 Aug. 2002, News Section. 9/11 was a terrible tragedy for the nation, in which 2,500 ...
And I want the file after processing to look like this:
Aug. , News Section. 9/11 was a terrible tragedy for the nation, in which ...
I tried searching for the answer, but to no avail. Thanks in advance for any suggestions.
This will do the job. The idea is to capture the parts we want to keep and merely match the parts we want to remove. Every match is replaced with the contents of group 1, so the captured key strings stay, while a matched digit is replaced with nothing because group 1 is empty for it.
sed 's~\(\b9/11\b\|\bSeptember 11\b\)\|[[:digit:]]~\1~g' file
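A small run on an invented line shows the capture-and-keep behaviour. This sketch assumes GNU sed, since \b and \| are GNU extensions in basic regular expressions; the function name and the sample text are hypothetical.

```shell
# Same substitution as above, wrapped in a function for testing.
strip_digits_except_keys() {
  sed 's~\(\b9/11\b\|\bSeptember 11\b\)\|[[:digit:]]~\1~g'
}

printf '%s\n' 'a1 9/11 b22 September 11 c3' | strip_digits_except_keys
# prints: a 9/11 b September 11 c
```

Note that punctuation and spaces around removed numbers are left behind, so a number such as 2,500 collapses to a bare comma.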

return line of strings between two strings in a ruby variable

I would like to extract a line of strings but am having difficulties using the correct RegEx. Any help would be appreciated.
String to extract: KSEA 122053Z 21008KT 10SM FEW020 SCT250 17/08 A3044 RMK AO2 SLP313 T01720083 50005
For some reason StackOverflow won't let me cut and paste the XML data here since it includes "<>" characters. Basically I am trying to extract the data between "raw_text" ... "/raw_text" from an XML document that will always be formatted like the following: http://www.aviationweather.gov/adds/dataserver_current/httpparam?dataSource=metars&requestType=retrieve&format=xml&hoursBeforeNow=3&mostRecent=true&stationString=PHNL%20KSEA
However, the station name, in this case "KSEA", will not always be the same. It will change based on user input into a search variable.
Thanks in advance.
If I can assume that every string you want starts with KSEA, then the answer would be:
.*(KSEA.*?)KSEA.*
Using ? after .* makes it match as little as possible.
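Since the station name changes, anchoring on the raw_text tags themselves avoids hard-coding KSEA. The following is a shell sketch with sed, not Ruby, and it swaps the lazy quantifier for a negated character class because sed has no .*? form. The one-line XML sample is made up for illustration and assumes the element is literally named raw_text and sits on a single line.

```shell
# Print whatever sits between <raw_text> and </raw_text> on a line.
# [^<]* stops at the next "<", which plays the role of the lazy match.
extract_raw_text() {
  sed -n 's:.*<raw_text>\([^<]*\)</raw_text>.*:\1:p'
}

printf '%s\n' '<METAR><raw_text>KSEA 122053Z 21008KT 10SM FEW020 SCT250 17/08 A3044 RMK AO2 SLP313 T01720083 50005</raw_text><station_id>KSEA</station_id></METAR>' | extract_raw_text
```

The same capture idea carries over to other regex engines: capture [^<]* between the opening and closing tag instead of matching on the station code.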

Swap characters in specific positions in strings of varying lengths

I've been trying to learn sed and the examples I've found here are for swapping dates from 05082012 to 20120805 and I'm having trouble adapting them to my current need.
I need to convert an IP address 10.4.13.22 to a reverse lookup of 22.13.4.10 for a nsupdate script. My biggest problem is the fact that sometimes each octet can change lengths e.g. 10.4.13.2 and 10.19.8.126
Thanks for any help!
echo 10.0.2.99 | sed 's/\(....\)\(....\)/\2\1/'
This is what I've tried so far, based on another question here, but since the examples don't explain what .... means, I'm having trouble understanding what it does.
The output of that command is .2.910.09, and I am expecting 99.2.0.10.
Put simply, I want to rearrange each "section" that is separated by a ".".
A "bruteforce" method to "reverse" an IPv4 address would be:
sed 's/\([0-9]\+\)\.\([0-9]\+\)\.\([0-9]\+\)\.\([0-9]\+\)/\4.\3.\2.\1/g'
or, with extended regular expressions (the -r option of GNU sed),
sed -r 's/([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)/\4.\3.\2.\1/g'
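For context on the original attempt: each . matches any single character, so \(....\) simply grabs four characters regardless of where the dots fall. Matching [0-9]+ per octet instead adapts to octets of any length. A minimal sketch, with a hypothetical function name and using -E, which both GNU and BSD sed accept for extended regular expressions:

```shell
# Capture the four dot-separated octets and emit them in reverse order.
reverse_ip() {
  sed -E 's/([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)/\4.\3.\2.\1/'
}

echo 10.4.13.22  | reverse_ip   # prints 22.13.4.10
echo 10.19.8.126 | reverse_ip   # prints 126.8.19.10
```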

Bash script frequency analysis of unique letters and repeating letter pairs how should i build this script?

OK, first post.
So I have this assignment to decrypt cryptograms by hand, but I also wanted to automate the process a little, if not all of it then at least a few parts. I browsed around and found some sed and awk one-liners that do some of the things I wanted, but not everything I wanted/needed.
There are some websites that sort of do what I want, but I really want to do it in bash, just because I want to understand it better. :)
The script would take a filename as a parameter and output another file, such as solution$1, when done.
if [ -e "$PWD/$1" ]; then
    echo "$1 exists"
else
    echo "$1 does not exist"
fi
That would start the script by checking whether the file given as a parameter exists.
Then I found this one-liner:
sed -e "s/./\0\n/g" $1 | while read c;do echo -n "$c" ; done
It works fine, but I would need the number of occurrences per letter, and I really don't see how to do that.
Here is more or less what I'm trying to achieve, for counting unique letter occurrences and such: http://25yearsofprogramming.com/fun/ciphers.htm
I then need to put all letters in lowercase.
After this I see the script doing these things:
- A subscript that scans a dictionary file for words of a certain pattern and size; the bigger the words, the better.
For example, let's say the solution is the word "apparel" and the encrypted word is "zxxzgvk".
Is there a regex way to express the pattern that compares those two words and lists the word "apparel" from a dictionary file, because "appa" and "zxxz" are similar patterns and "zxxzgvk" is the same length as "apparel"?
Can this be partly done, and is it realistic to view the problem like this, or is it just far-fetched?
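One possible way to compare "letter shapes" is to reduce every word to a signature of first-occurrence numbers, so that "apparel" and "zxxzgvk" both become 1.2.2.1.3.4.5. and words with equal signatures automatically have equal length. The sketch below does this with awk instead of a single regex; the function name and the tiny word list are hypothetical.

```shell
# $1 = encrypted word; stdin = dictionary, one word per line.
# Prints the dictionary words whose repetition pattern equals that of $1.
match_pattern() {
  awk -v cipher="$1" '
    # Build a signature such as "1.2.2.1.3.4.5." for a word.
    # The extra parameters after w are awk locals.
    function sig(w,    i, c, n, seen, s) {
      n = 0; s = ""
      for (i = 1; i <= length(w); i++) {
        c = substr(w, i, 1)
        if (!(c in seen)) seen[c] = ++n   # number letters by first appearance
        s = s seen[c] "."
      }
      return s
    }
    BEGIN { want = sig(cipher) }
    sig($0) == want                        # print matching words
  '
}

printf '%s\n' banana apparel letters | match_pattern zxxzgvk
# prints: apparel
```

Pointing the function at a real word list (for example a file with one word per line) instead of the printf would list all candidates for an encrypted word.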
- Another subscript that takes the letters found in the previous output word and swaps those letters in the cryptogram.
The swapped letters will be in uppercase to differentiate them over time.
I'll then have to figure out how to proceed, maybe rescanning the newly found words to see whether they appear in a dictionary file partly or fully, then swapping more letters or not.
Has anyone seen this problem in the past and tried to solve it with word patterns like I described, or is this just too complex?
Should I log any of the swaps?
Maybe just scan through all the encrypted words and swap as I go along, then do another sweep, with the constraint from the first sweep not to change uppercase letters (actually, to use them as more precise patterns!).
Has anyone written a similar script/program in another language? If so, which one? Maybe I can relate somehow. :)
Maybe we can use your insight into how you thought out your code.
I will happily include the cryptograms I have decoded and the ones I have yet to decode. :)
Again, the focus of my assignment is not to do this script but just to resolve the cryptograms. But doing scripts or at least trying to see how I would do this script does help me understand a little more how to think in terms of code. Feel free to point me in the right directions!
The cryptogram itself is based on simple alphabetic substitution.
I have done a pastebin here with the code to be :) http://pastebin.com/UEQDsbPk
In pseudocode, the way I see it is:
call program with an input filename as a parameter and optionally a second filename (dictionary)
verify the input file exists and isn't empty
read the file's content and echo it on screen
transform to lowercase
scan through the text and count the occurrences of each letter to do a frequency analysis
ask the user what language the text is supposed to be in (English by default)
use the response to specify which letter frequencies to use as a baseline
swap letters according to the frequency analysis, in uppercase
print the changed document on screen
ask the user to swap letters in the encrypted text
if the user has given a dictionary file as the second argument
then scan the cipher for words and find the bigger words
find words with a similar pattern (some repeating letters) in the dictionary file
list the results on screen, if any
offer to swap the corresponding letters in the cipher
print the modified cipher on screen
ask again to swap letters or find more similar words
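The first few pseudocode steps (validate the input file, show it in lowercase) could be sketched as follows. This covers only those steps, and the function names are hypothetical, not part of the pastebin code.

```shell
# Succeed only if the file exists and is not empty (test -s).
check_input() {
  [ -s "$1" ]
}

# Print the file content with all letters lowercased.
show_lowercase() {
  tr '[:upper:]' '[:lower:]' < "$1"
}

main() {
  input=$1
  if ! check_input "$input"; then
    echo "usage: $0 inputfile [dictionary] (input must exist and be non-empty)" >&2
    return 1
  fi
  show_lowercase "$input"
}

# Run only when a filename was given on the command line.
if [ "$#" -gt 0 ]; then
  main "$@"
fi
```

Keeping each step in its own function makes it easier to add the later steps (frequency analysis, swapping, dictionary lookup) one at a time.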
That is more or less the way I see the script structured.
Do you see anything that I should add? Did I miss something?
I hope this revised version is clearer for everyone!
TL;DR, to be frank. To the only question I've found, the answer is yes. :) Please split it into smaller tasks and we'll be happy to assist you, if you don't find the answers to those smaller questions first.
If you can put it in pseudocode, it would be easier. There are all kinds of text-manipulating tools in Unix. Which ones to use depends on how big your texts are. I believe they are not that big, or you would have used a compiled language.
For example, the easy but costly gawk way to count frequencies:
awk -F "" '{for(i=1;i<=NF;i++) freq[$i]++;}END{for(i in freq) printf("%c %d\n", i, freq[i]);}'
As for transliterating, there is the tr utility. You can build the actual strings for each case and pass them to it (this holds for Caesar-like ciphers).
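A minimal tr sketch for the swap step, using the apparel/zxxzgvk pair from the question. The function name is hypothetical, and the guessed letters are written in uppercase so they stand out from the still-encrypted lowercase ones.

```shell
# $1 = cipher letters (lowercase), $2 = guessed plaintext letters (uppercase),
# position by position: z->A, x->P, g->R, v->E, k->L in the call below.
swap_letters() {
  tr "$1" "$2"
}

echo 'zxxzgvk' | swap_letters 'zxgvk' 'APREL'
# prints: APPAREL
```

Letters that are not listed in the first string pass through unchanged, so partial guesses can be applied step by step.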
grep -o . inputfile | sort | uniq -c | sort -rn
Example:
$ echo 'aAAbbbBBBB123AB' | grep -o . | sort | uniq -c | sort -rn
5 B
3 b
3 A
1 a
1 3
1 2
1 1
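Combining this pipeline with the lowercase step from the question gives a case-insensitive letter count. This is a sketch with a hypothetical function name; it also drops digits and punctuation by keeping only alphabetic characters.

```shell
# Lowercase everything, keep one letter per line, then count and rank.
letter_freq() {
  tr '[:upper:]' '[:lower:]' | grep -o '[[:alpha:]]' | sort | uniq -c | sort -rn
}

echo 'aAAbbbBBBB123AB' | letter_freq
```

For the sample string this should report 8 for b and 4 for a, since the upper- and lowercase counts are merged.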

Resources