Bash: How to delete all numerals from a text file unless they are part of a specific string?

I have a text file, and I want to delete all the numerals in it. However, there are two key strings, "9/11" and "September 11", in which I want to keep the numerals. How can I delete all the numerals except when they are part of these key strings?
I currently use sed 's/[0-9]*//g' to get rid of the numerals. For example, the sample text before processing looks something like this:
12 Aug. 2002, News Section. 9/11 was a terrible tragedy for the nation, in which 2,500 ...
And I want the file after processing to look like this:
Aug. , News Section. 9/11 was a terrible tragedy for the nation, in which ...
I tried searching for the answer, but to no avail. Thanks in advance for any suggestions.

This will do the job (GNU sed, since \b and \| are GNU extensions). The idea is to capture the parts you want to keep and match the parts you want to remove. Every match is replaced with the contents of group 1: when a key string matched, it is written back unchanged; when a lone digit matched, group 1 is empty, so the digit is simply deleted.
sed 's~\(\b9/11\b\|\bSeptember 11\b\)\|[[:digit:]]~\1~g' file
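For comparison, here is the same capture-and-restore trick as a minimal Python sketch (an illustration, not part of the original answer). It assumes Python 3.5 or later, where re.sub substitutes an empty string for a group that did not take part in the match; the function name strip_digits_except_keys is hypothetical.

```python
import re

# Alternative 1 captures the protected strings; alternative 2 matches any other digit.
# [0-9] is used instead of \d so only ASCII digits are removed, like [[:digit:]] in the C locale.
KEEP_OR_DIGIT = re.compile(r"(\b9/11\b|\bSeptember 11\b)|[0-9]")


def strip_digits_except_keys(text):
    # Replace every match with group 1: protected strings are written back unchanged,
    # while a lone digit leaves group 1 empty and therefore disappears.
    return KEEP_OR_DIGIT.sub(r"\1", text)


sample = ("12 Aug. 2002, News Section. 9/11 was a terrible tragedy "
          "for the nation, in which 2,500 ...")
print(strip_digits_except_keys(sample))
```

Like the sed command, this keeps the comma left over from "2,500" and the space where "12" used to be, so reproducing the exact desired output from the question would need an extra cleanup step.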

Related

Regex: Grouping with OR

I'm new here, so please don't scold me for misspellings etc.
What I need to do is to rename a bunch of files with a date in different formats at the beginning of their names, like:
05.07.2020-abc.pdf
2020.07.05-pqr.pdf
Instead of writing a different expression for each format, e.g.
^(\d{2})\.(\d{2})\.(\d{4})(.+) => $3-$2-$1$4
Example
02.11.2022-abc.pdf => 2022-11-02-abc.pdf
I'd like to do it in one fell swoop using the OR operator "|", but I have no idea how to formulate the groupings. Can one have nested groups in regex?
Any ideas? Thanks in advance!
@The fourth bird:
You're right, no (.+) is needed; I condensed my actual expression and could have taken it out.
The two date formats I mean are dd.mm.yyyy and yyyy.mm.dd, and I need to convert both to yyyy-mm-dd.
So if the format is dd.mm.yyyy, I have to reverse the order of the parts; otherwise I just need to replace the dots with hyphens.
The OS is Android, and for this operation I use Solid Explorer multi search & replace using regex.
I hope I made myself clear this time around ;-)
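One way to express "either format" in a single pattern is an alternation of two complete branches, each with its own three capture groups, and a replacement that concatenates the corresponding groups from both branches. Groups can be nested: a non-capturing group (?:...) wraps the alternation, and each branch holds its own capture groups. The sketch below shows the idea in Python 3.5+, where unmatched groups are substituted as empty strings; whether Solid Explorer's regex engine handles unmatched groups the same way is an assumption you would need to test, and the function name normalize_date_prefix is hypothetical.

```python
import re

# Branch 1: dd.mm.yyyy -> groups 1-3; branch 2: yyyy.mm.dd -> groups 4-6.
# Only the groups of the branch that matched are set; the others stay empty.
DATE_PREFIX = re.compile(r"^(?:(\d{2})\.(\d{2})\.(\d{4})|(\d{4})\.(\d{2})\.(\d{2}))")


def normalize_date_prefix(name):
    # \3\4 yields the year, \2\5 the month and \1\6 the day, whichever branch matched.
    # Names without a date prefix do not match and are returned unchanged.
    return DATE_PREFIX.sub(r"\3\4-\2\5-\1\6", name)


for name in ["05.07.2020-abc.pdf", "2020.07.05-pqr.pdf", "02.11.2022-abc.pdf"]:
    print(name, "=>", normalize_date_prefix(name))
```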

egrep between 2 ranges in the same CSV column

I'm not sure how to match 2 sets of values in the same column. Let's say I have a CSV file with all Titanic passengers, and I want to extract the people between 20 and 29 years old and from 40 to 49 years old, and people who spoke English AND another language, let's say French. Since both values are in the same column, it's quite challenging.
egrep does not seem to have an AND operator, only OR, so I'm struggling to find how to do it.
So what I was trying was something like this (on a comma-separated CSV):
The 3rd column is Age and the 8th is Language.
(I know there might be easier solutions with sed/awk etc., but I need it in egrep for training purposes.)
egrep "^.*,.*,[2-0][0-9],.*,.*,[eng.*]" titanic-passengers.csv
Thanks in advance.
You should use [^,]* to match a single column. .* will match across multiple columns.
To match 20-29 use 2[0-9]; to match 40-49 use 4[0-9]. You can then combine them with [24][0-9].
You don't need to put [] around the language; brackets match a single character that's any of the characters inside them.
grep -E '^[^,]*,[^,]*,[24][0-9],[^,]*,[^,]*,[^,]*,[^,]*,eng' titanic-passengers.csv
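The difference between .* and [^,]* is easy to check on a made-up row. The sketch below uses Python's re module purely as a test harness (bracket expressions, * and ^ behave the same way here as in grep -E); the row is hypothetical, laid out with the age in column 3 and the language in column 8 as in the question.

```python
import re

# Hypothetical row: age (column 3) is 35, but column 7 happens to hold 20.
row = "7,Smith,35,male,1,0,20,english"

# .* can run across commas, so [24][0-9] may land in any later column.
loose = re.compile(r"^.*,.*,[24][0-9],")
# [^,]* stops at the next comma, which pins [24][0-9] to column 3.
strict = re.compile(r"^[^,]*,[^,]*,[24][0-9],[^,]*,[^,]*,[^,]*,[^,]*,eng")

print(bool(loose.search(row)))   # matches, but only because of column 7
print(bool(strict.search(row)))  # no match: column 3 is 35
```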
Maybe this one?
grep -E '^[^,]*,[^,]*,[24][0-9],[^,]*,[^,]*,[^,]*,[^,]*,[^,]*( english|english )[^,]*' titanic-passengers.csv
@Barmar explained the other patterns well, so I'll explain the "language" part.
To make sure the row lists at least one language other than English, you need to require a space before or after the word english. Alternation (OR) is written as (pattern1|pattern2).
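A small harness for the language part, again using Python's re as a stand-in for grep -E. It uses this answer's pattern with the second column written as [^,]* like the others; the rows are hypothetical, and only columns 3 (age) and 8 (languages) carry meaningful values.

```python
import re

# Column 3 is 20-29 or 40-49; column 8 contains "english" with a space on
# at least one side, i.e. it is listed together with another language.
PATTERN = re.compile(
    r"^[^,]*,[^,]*,[24][0-9],[^,]*,[^,]*,[^,]*,[^,]*,[^,]*( english|english )[^,]*"
)

rows = [
    "1,Smith,24,x,x,x,x,french english",   # age ok, English plus French
    "2,Dupont,45,x,x,x,x,english french",  # age ok, English plus French
    "3,Brown,35,x,x,x,x,english french",   # age out of range
    "4,Jones,27,x,x,x,x,english",          # English only
]

matches = [r for r in rows if PATTERN.search(r)]
for r in matches:
    print(r)
```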

How to separate numbers from text using SPLIT/LEFT/RIGHT function in Google Sheets

I have a Google Sheets document with data in this format:
Some data 10:5 Somemore Data
I am trying to separate the text from the numbers in separate columns based on the colon sign so that the output looks like this:
Some data | 10 | 5 | Somemore Data
I tried the SPLIT and RIGHT/LEFT functions but I can't get it to work.
This is what I have so far:
=LEFT(C2,FIND(":",C2)-3)
This separates the text on the left, but the same approach doesn't work for the right side, and my formula doesn't separate the numbers either. I'm looking for a formula that produces the desired result above.
My spreadsheet - https://docs.google.com/spreadsheets/d/1EmL4kzCGxRbwvNJntwMokqgt8yjjAqnZuUidTbZe6Z8/edit?usp=sharing
Thanks.
There is already a solution in your shared sheet with SPLIT and REGEXREPLACE.
Here is a slightly simpler one using REGEXEXTRACT:
=ARRAYFORMULA(IF(A2:A="", "", REGEXEXTRACT(A2:A,"^(.+?)[ \x{A0}]+(\d+)[ \x{A0}]*:[ \x{A0}]*(\d+)[ \x{A0}]+(.+)$")))
Each capture group goes into its own cell, filling the row to the right.
Edit: stripped spaces. Your strings contain some nasty characters: non-breaking spaces (char 160), which are indistinguishable from regular spaces. I couldn't understand why a simpler regex (^(.+?)\s+(\d+)\s*:\s*(\d+)\s+(.+)$) didn't work; it was all because of this nbsp, since \s in Google Sheets' RE2 engine only matches ASCII whitespace. Hence the class [ \x{A0}] (a regular space and an nbsp) instead of just \s.
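The same regex can be checked outside Sheets. This Python sketch (an illustration, not a Sheets formula) spells the separator class as a regular space plus U+00A0 so the non-breaking space is visible; note that in Python, unlike RE2, \s already matches U+00A0, so the explicit class mainly mirrors the formula. The function name split_ratio_cell is hypothetical.

```python
import re

# Text, number, colon, number, text; separators may be spaces or NBSP (U+00A0).
CELL = re.compile(r"^(.+?)[ \u00a0]+(\d+)[ \u00a0]*:[ \u00a0]*(\d+)[ \u00a0]+(.+)$")


def split_ratio_cell(value):
    # Returns the four captured pieces, or None when the cell does not fit the layout.
    m = CELL.match(value)
    return m.groups() if m else None


print(split_ratio_cell("Some data 10:5 Somemore Data"))
print(split_ratio_cell("Some data\u00a010:5\u00a0Somemore Data"))
```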

Block Indent Regex

I'm having problems with a regexp.
I'm trying to write a regex that selects just the tab-indented blocks, but I can't find a way to make it work:
Example:
INDENT(1)
	INDENT(2)
		CONTENT(a)
		CONTENT(b)
	INDENT(3)
		CONTENT(c)
So I need blocks like:
INDENT(2)
CONTENT(a)
CONTENT(b)
AND
INDENT(3)
CONTENT(c)
How I can do this?
Thanks, that's almost it. Here is my original need:
table
	tr
		td
			"joao"
			"joao"
		td
			"marcos"
I need separate "td" blocks; could I adapt your example to that?
It depends on exactly what you are trying to do, but maybe something like this:
^(\t+)(\S.*)\n(?:\1\t.*\n)*
Working example: http://www.rubular.com/r/qj3WSWK9JR
The pattern searches for:
^(\t+)(\S.*)\n - a line that begins with a tab (I've also captured the first line in a group, just to see the effect), followed by
(?:\1\t.*\n)* - lines with more tabs.
Similarly, you can use ^( +)(\S.*)\n(?:\1 .*\n)* for spaces. Mixing spaces and tabs may be a little problematic, though.
For the updated question, consider using ^(\t{2,})(\S.*)\n(?:\1\t.*\n)*, for at least 2 tabs at the beginning of the line.
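To see what both patterns capture, here is a minimal Python sketch that runs them with re.MULTILINE (so ^ matches at every line start) over the question's two examples, assuming one tab per indentation level. The helper name indent_blocks is hypothetical; note that the patterns require every line, including the last, to end with a newline.

```python
import re


def indent_blocks(text, pattern):
    # MULTILINE lets ^ match at every line start; . never crosses a newline here.
    return [m.group(0) for m in re.finditer(pattern, text, flags=re.MULTILINE)]


# The question's examples, assuming one tab per indentation level.
sample = "INDENT(1)\n\tINDENT(2)\n\t\tCONTENT(a)\n\t\tCONTENT(b)\n\tINDENT(3)\n\t\tCONTENT(c)\n"
table = 'table\n\ttr\n\t\ttd\n\t\t\t"joao"\n\t\t\t"joao"\n\t\ttd\n\t\t\t"marcos"\n'

# A tab-indented line followed by any lines indented at least one tab deeper.
any_level = r"^(\t+)(\S.*)\n(?:\1\t.*\n)*"
# Same idea, but the block's first line needs at least two tabs (the "td" lines).
td_level = r"^(\t{2,})(\S.*)\n(?:\1\t.*\n)*"

for block in indent_blocks(sample, any_level) + indent_blocks(table, td_level):
    print(repr(block))
```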
You could use the following regex to get the groups...
[^\s]*.*\r\n(?:\s+.*\r*\n*)*
This requires that the first line of each block not begin with whitespace.

Inserting characters before whatever is on a line, for many lines

I have been looking at regular expressions to try and do this, but the most I can do is find the start of a line with ^; I can't work out how to insert text there.
I can find the first characters on a line and replace them, but not in a way that keeps them intact.
Unfortunately I don't have access to a tool like cut since I am on a Windows machine, so is there any way to do what I want with just regexp?
Use Notepad++. It can record a sequence of actions (a macro), which can then be repeated for all lines in the file.
Did you try replacing the regular expression ^ with the text you want to put at the start of each line? Also you should use the multiline option (also called m in some regex dialects) if you want ^ to match the start of every line in your input rather than just the first.
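using System;
using System.Text.RegularExpressions;
string s = "test test\ntest2 test2";
// Multiline makes ^ match at the start of every line, not only at the start of the string.
s = Regex.Replace(s, "^", "foo", RegexOptions.Multiline);
Console.WriteLine(s);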
Result:
footest test
footest2 test2
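For reference, Python's re module behaves the same way as the .NET call above: with re.MULTILINE, ^ matches at the start of every line. A minimal sketch; the helper name prefix_lines is hypothetical, and a function is passed as the replacement so that backslashes in the prefix are inserted literally instead of being read as group references.

```python
import re


def prefix_lines(text, prefix):
    # ^ with MULTILINE matches at the start of the string and right after every "\n",
    # so the prefix is inserted in front of each line without consuming any characters.
    return re.sub(r"^", lambda m: prefix, text, flags=re.MULTILINE)


print(prefix_lines("test test\ntest2 test2", "foo"))
```

Because ^ also matches right after a trailing newline, a string ending in "\n" gets the prefix at that final empty position too; a pattern such as ^(?=.) skips empty lines if that is not wanted.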
I used to program on the mainframe and got used to SPF panels. I was thrilled to find a Windows version of the same editor at Command Technology. Makes problems like this drop-dead simple. You can use expressions to exclude or include lines, then apply transforms on just the excluded or included lines and do so inside of column boundaries. You can even take the contents of one set of lines and overlay the contents of another set of lines entirely or within column boundaries which makes it very easy to generate mass assignments of values to variables and similar tasks. I use Notepad++ for most stuff but keep a copy of SPFSE around for special-purpose editing like this. It's not cheap but once you figure out how to use it, it pays for itself in time saved.
