Find multiple strings between values and replace with newline in bash


I need to write a bash script to list values from an SQL database.
I've got this far, but now I need to get the rest of the way.
The string so far is:
10.255.200.0/24";i:1;s:15:"10.255.207.0/24";i:2;s:14:"192.168.0.0/21
I now need to delete everything between the double quotes (including the quotes themselves) and put each remaining value on a new line.
Desired output:
10.255.200.0/24
10.255.207.0/24
192.168.0.0/21
Any help would be greatly appreciated.

$ tr '"' '\n' <<< $string | awk 'NR%2'
10.255.200.0/24
10.255.207.0/24
192.168.0.0/21
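
Why this works: tr turns every double quote into a newline, so the wanted values land on the odd-numbered lines and the serialized-array noise on the even ones. A small sketch that numbers the pieces first (the names pieces and result are only illustrative, and printf is used instead of the here-string so it also runs in plain sh):

```shell
string='10.255.200.0/24";i:1;s:15:"10.255.207.0/24";i:2;s:14:"192.168.0.0/21'

# Step 1: split on double quotes and number each piece to see the pattern.
pieces=$(printf '%s\n' "$string" | tr '"' '\n' | awk '{print NR ": " $0}')
printf '%s\n' "$pieces"

# Step 2: keep only the odd-numbered pieces (NR%2 is 1, i.e. true, for those).
result=$(printf '%s\n' "$string" | tr '"' '\n' | awk 'NR%2')
printf '%s\n' "$result"
```

Note that this assumes the input always alternates between wanted and unwanted pieces, as it does in the sample string.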

You could use:
echo 'INPUT STRING HERE' | sed $'s/"[^"]*"/\\\n/g'
Explanation:
sed 's/<PATTERN1>/<PATTERN2>/g': substitute every occurrence of PATTERN1 with PATTERN2
[^"]*: any character that is not a ", any number of times
\\\n: inside bash's $'...' quoting this becomes a backslash followed by a real newline, which is how sed writes a newline in the replacement
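
To make the newline syntax concrete, here is a minimal sketch of the same substitution with the backslash and line break typed out by hand inside ordinary single quotes, which should behave the same in any POSIX shell and sed. The sample string is the one from the question; the variable names are only illustrative.

```shell
string='10.255.200.0/24";i:1;s:15:"10.255.207.0/24";i:2;s:14:"192.168.0.0/21'

# The replacement is a backslash immediately followed by a real line break,
# which is how sed spells "newline" in the replacement text.
result=$(printf '%s\n' "$string" | sed 's/"[^"]*"/\
/g')
printf '%s\n' "$result"
```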

Assuming your Input_file looks like the sample shown, you could try the following.
awk '
{
while(match($0,/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\/[0-9]+/)){
print substr($0,RSTART,RLENGTH)
$0=substr($0,RSTART+RLENGTH)
}
}' Input_file

This might work for you (GNU sed):
sed 's/"[^"]*"/\n/g' file
Or alongside Bash:
sed $'s/"[^"]*"/\\\n/g' file
Or using most other seds:
sed ':a;/"[^"]*"\(.*\)\(.\|$\)/{G;s//\2\1/;ba}' file
This relies on G appending a newline followed by the hold space, which is still empty because nothing has been stored in it.
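
A tiny sketch of that hold-space behaviour in isolation (the input abc and the marker <NL> are made up for illustration):

```shell
# G appends a newline plus the (still empty) hold space to the pattern space;
# the s command then makes that newline visible by turning it into "<NL>".
result=$(printf '%s\n' 'abc' | sed 'G;s/\n/<NL>/')
printf '%s\n' "$result"
```

So after G the pattern space holds the original line plus one newline, and that newline is what the longer command moves into place.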

Related

How to remove string between two characters and before the first occurrence using sed

I would like to remove the string between ":" and the first "|" using sed.
input:
|abc:1.2.3|def|
output from sed:
|abc|def|
I managed to come up with sed 's|\(:\)[^|]*|\1|', but this sed command does not remove the first character (":"). How can I modify this command to also remove the colon?

You don't need to group : in your pattern and use it in substitution.
You should keep it simple:
s='|abc:1.2.3|def|'
sed 's/:[^|]*//' <<< "$s"
|abc|def|
: matches a colon and [^|]* matches 0 or more non-pipe characters

1st solution: With awk, you could try the following program.
awk 'match($0,/:[^|]*/){print substr($0,1,RSTART-1) substr($0,RSTART+RLENGTH)}' Input_file
Explanation: The match function looks for the regex, here everything from the : up to (but not including) the first |. Whenever the regex matches, awk sets its built-in variables RSTART and RLENGTH, so we print the substring before the match and the substring after it, leaving out the matched part.
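
To see what match actually records, this sketch prints RSTART and RLENGTH for the sample line before using them (the variable names line, position and result are only illustrative):

```shell
line='|abc:1.2.3|def|'

# Show where the match starts and how long it is.
position=$(printf '%s\n' "$line" | awk 'match($0, /:[^|]*/) { print RSTART, RLENGTH }')
printf '%s\n' "$position"

# Print what lies before and after the match, skipping the match itself.
result=$(printf '%s\n' "$line" |
  awk 'match($0, /:[^|]*/) { print substr($0, 1, RSTART - 1) substr($0, RSTART + RLENGTH) }')
printf '%s\n' "$result"
```

For this input the match :1.2.3 starts at character 5 and is 6 characters long, so the two substrings are |abc and |def|.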
2nd solution: Using FPAT option in GNU awk, try following, written and tested with your shown samples only.
awk -v FPAT=':[^|]*' '{print $1,$2}' Input_file

I need delete two " " with sed command

I need to delete the double quotes in a file:
"CITFFUSKD-E0"
I have tried sed 's/"//'.
The result is:
CITFFUSKD-E0"
How can I delete both?
I also need to delete everything after the first word, and the input can look like any of these:
"CITFFUSKD-E0"
"CITFFUSKD_E0"
"CITFFUSKD E0"
The result I want:
CITFFUSKD

You may use
sed 's/"//g' file | sed 's/[^[:alnum:]].*//' > newfile
Or, contract the two sed commands into one sed call as @Wiimm suggests:
sed 's/"//g;s/[^[:alnum:]].*//' file > newfile
If you want to replace inline, see sed edit file in place.
Explanation:
sed 's/"//g' file - removes all " chars from the file
sed 's/[^[:alnum:]].*//' > newfile - also removes all chars from a line starting from the first non-alphanumeric char and saves the result into a newfile.
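
As a quick check, here is a sketch that feeds the three input variants from the question through the combined command. The function name first_word is hypothetical, and printf stands in for the file.

```shell
first_word() {
  # Strip every double quote, then cut from the first non-alphanumeric character on.
  sed 's/"//g;s/[^[:alnum:]].*//'
}

result=$(printf '%s\n' '"CITFFUSKD-E0"' '"CITFFUSKD_E0"' '"CITFFUSKD E0"' | first_word)
printf '%s\n' "$result"
```

All three lines come out as CITFFUSKD, because -, _ and the space are all non-alphanumeric.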

Could you please try following.
awk 'match($0,/[a-zA-Z]+[^a-zA-Z]*/){val=substr($0,RSTART,RLENGTH);gsub(/[^a-zA-Z]+/,"",val);print val}' Input_file

delete everything behind first word
sed 's/^"\([[:alpha:]]*\)[^[:alpha:]]*.*/\1/'
Match the first ". Then match a sequence of alphabetic characters. Then match any non-alphabetic characters, [^[:alpha:]]*, followed by the rest of the line. Substitute it all with \1, a backreference to the part inside \( ... \), i.e. the first word.
I need delete two “ ” with sed command
Remove all possible ":
sed 's/"//g'
Extract the string between ":
sed 's/"\([^"]*\)"/\1/'
Remove everything except alphanumeric characters (digits, a-z and A-Z, i.e. [0-9a-zA-Z]):
sed 's/[^[:alnum:]]//g'
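
A side-by-side sketch of these three commands on one sample line may help show how they differ (the variable names are only illustrative):

```shell
input='"CITFFUSKD-E0"'

no_quotes=$(printf '%s\n' "$input" | sed 's/"//g')               # every " removed
between=$(printf '%s\n' "$input" | sed 's/"\([^"]*\)"/\1/')      # text between the first pair of "
alnum_only=$(printf '%s\n' "$input" | sed 's/[^[:alnum:]]//g')   # quotes and the hyphen removed

printf '%s\n' "$no_quotes" "$between" "$alnum_only"
```

The first two give CITFFUSKD-E0, while the last gives CITFFUSKDE0, so on its own it does not isolate the first word.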

This should do all in one go, remove the ", print the first part:
awk -F\" '{split($2,a,"-| |_");print a[1]}' file
CITFFUSKD
CITFFUSKD
CITFFUSKD

When you have 1 line, you can use
grep -Eo "(\w)*" file | head -1
For normal files (starting with a double quote on each line), try this:
tr -c '[:alnum:]\n' '"' < file | cut -d'"' -f2

Many legitimate ways to solve this.
I favor using what you know about your data to simplify solutions -- this is usually an option. If everything in your file follows the same pattern, you can simply extract the first set of capitalized letters encountered:
sed 's/"\([A-Z]\+\).*$/\1/' file

awk '{gsub(/^.|....$/,"")}NR==1' file
CITFFUSKD

Split 2nd occurrence of pattern of camel style text in sed

I am trying to create a key, value strings table for a mac app using sed and awk. So far I have got it to the point of having lines like:
"exif:DateTimeOriginal" = "DateTimeOriginal:\t";
I want to do a final step to get:
"exif:DateTimeOriginal" = "Date Time Original:\t";
In other words split up the second occurrence of the camel text.
I have seen sed like this:
sed 's/\([A-Z]\)/ \1/g'
Which would do it globally and then just do the 2nd occurrence with:
sed 's/\([A-Z]\)/ \1/2g'
Or is it 3rd occurrence. However, unfortunately on macos you can't combine a number with the g command.
So is there another way to do this?
BTW, I could make it so that you start with:
"exif:DateTimeOriginal" = DateTimeOriginal:\t";
That is, leave out the leading quote of the camel text, so that if a leading space is added by splitting the camel text, it would be added after the = which wouldn't matter. Then add the leading quote after the camel text is split.

Here is how you could do it with sed:
sed -E -e ':a' -e 's/^([^=]+)= (.*)([a-z])([A-Z])/\1= \2\3 \4/' -e 'ta'
The idea is to apply repeated substitutions (:a and ta) where you match the part you don't want to change ([^=]+) and then insert a space between a lowercase letter followed by an upper case letter ([a-z][A-Z]) in the remainder.
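
Here is a sketch of the loop applied to the line from the question. The function name split_camel_value is hypothetical, and the comments trace what each pass does.

```shell
line='"exif:DateTimeOriginal" = "DateTimeOriginal:\t";'

split_camel_value() {
  # Each pass splits the LAST remaining lower/upper pair after the "= ",
  # because (.*) is greedy; ta jumps back to :a until no pair is left.
  #   pass 1: "DateTimeOriginal   ->  "DateTime Original
  #   pass 2: "DateTime Original  ->  "Date Time Original
  sed -E -e ':a' -e 's/^([^=]+)= (.*)([a-z])([A-Z])/\1= \2\3 \4/' -e 'ta'
}

result=$(printf '%s\n' "$line" | split_camel_value)
printf '%s\n' "$result"
```

The key on the left of the = is never touched because ([^=]+) consumes it before the camel-case pattern is tried.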

With GNU awk (not the default on macOS):
$ awk -F'"' -v OFS='"' '{$4=gensub(/([^A-Z])([A-Z])/,"\\1 \\2","g",$4)}1' file
"exif:DateTimeOriginal" = "Date Time Original:\t";
Depending on your locale, you may need the [:lower:] or [:upper:] character classes instead of the A-Z range.

With any POSIX awk:
$ awk 'BEGIN{FS=OFS="\""} {gsub(/[[:upper:]]/," &",$4); sub(/^ /,"",$4)} 1' file
"exif:DateTimeOriginal" = "Date Time Original:\t";

This might work for you (GNU sed):
sed 'h;s/\B[[:upper:]]/ &/g;H;x;s/=.*=/=/' file
Make a copy of the current line.
Insert a space before all capitals within a word.
Append the result to the original line.
Remove the tail of the original line and the head of the result.

Using Perl
$ echo '"exif:DateTimeOriginal" = DateTimeOriginal:\t"' | perl -F'"' -lane ' $F[2]=~s/(?=[A-Z])/ /g;$F[2]=~s/\s+=\s+/=\"/g; print "\"$F[1]\"$F[2]\"" '
"exif:DateTimeOriginal"="Date Time Original: "
$

how to grep everything between single quotes?

I am having trouble figuring out how to grep the characters between two single quotes .
I have this in a file
version: '8.x-1.0-alpha1'
and I would like the output to look like this (the version number can vary):
8.x-1.0-alpha1
I wrote the following but it does not work:
cat myfile.txt | grep -e 'version' | sed 's/.*\?'\(.*?\)'.*//g'
Thank you for your help.
Addition:
I used the sed command sed -n "s#version:\s*'\(.*\)'#\1#p"
I also like to remove 8.x- which I edited to sed -n "s#version:\s*'8.x-\(.*\)'#\1#p".
This command only works on Linux; it does not work on macOS. How can I change it so that it also works on macOS?
sed -n "s#version:\s*'8.x-\(.*\)'#\1#p"

If you just want to have that information from the file, and only that you can quickly do:
awk -F"'" '/version/{print $2}' file
Example:
$ echo "version: '8.x-1.0-alpha1'" | awk -F"'" '/version/{print $2}'
8.x-1.0-alpha1
How does this work?
An awk program is a series of pattern-action pairs, written as:
condition { action }
condition { action }
...
where condition is typically an expression and action a series of commands.
-F "'": Here we tell awk to set the field separator FS to a single quote '. This means every line is split into fields $1, $2, ..., $NF, with a ' between each pair of fields. We can now reference these fields using $1 for the first field, $2 for the second, and so on up to $NF, where NF is the total number of fields in the line.
/version/{print $2}: This is the condition-action pair.
condition: /version/: The condition reads: if a substring of the current record/line matches the regular expression /version/, then do the action. Here, this simply means: if the current line contains the substring version.
action: {print $2}: If the previous condition is satisfied, print the second field. In this case, the second field is what the OP requests.
There are now several things that can be done.
Improve the condition to be /^version:/ && NF==3, which reads: if the current line starts with the substring version: and the current line has 3 fields, then do the action.
If you only want the first occurrence, you can tell awk to exit immediately after the first match by updating the action to {print $2; exit}
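
Putting both refinements together gives something like the sketch below. The three input lines are a made-up sample, and the condition uses /^version:/ with no space before the colon, since that is how the line in the question is written.

```shell
# Stricter condition plus exit: print the quoted value of the first version line only.
result=$(printf '%s\n' \
    "name: 'example'" \
    "version: '8.x-1.0-alpha1'" \
    "version: '8.x-2.0'" |
  awk -F"'" '/^version:/ && NF==3 { print $2; exit }')
printf '%s\n' "$result"
```

NF is 3 on such a line because the text after the closing quote counts as a third, empty field.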

I'd use GNU grep with pcre regexes:
grep -oP "version: '\\K.*(?=')" file
where we are looking for "version: '" and then the \K directive will forget what it just saw, leaving .*(?=') to match up to the last single quote.

Try something like this: sed -n "s#version:\s*'\(.*\)'#\1#p" myfile.txt. This avoids the redundant cat and grep by finding the "version" line and extracting the contents between the single quotes.
Explanation:
the -n flag tells sed not to print lines automatically. We then use the p command at the end of our sed pattern to explicitly print when we've found the version line.
Search for pattern: version:\s*'\(.*\)'
version:\s* Match "version:" followed by any amount of whitespace
'\(.*\)' Match a single ', then capture everything up to the last ' on the line (.* is greedy)
Replace with: \1; This is the first (and only) capture group above, containing contents between single quotes.
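
On the macOS part of the question: one likely culprit is \s, a GNU extension that the BSD sed on macOS may not understand. The sketch below swaps it for the POSIX class [[:space:]], which is an assumption about the cause rather than a tested diagnosis; the variable names are only illustrative.

```shell
line="version: '8.x-1.0-alpha1'"

# [[:space:]] is the POSIX spelling of the GNU-only \s.
version=$(printf '%s\n' "$line" | sed -n "s/^version:[[:space:]]*'\(.*\)'.*/\1/p")

# Same idea, additionally dropping the leading "8.x-" (the dot is escaped to match literally).
short=$(printf '%s\n' "$line" | sed -n "s/^version:[[:space:]]*'8\.x-\(.*\)'.*/\1/p")

printf '%s\n' "$version" "$short"
```

The first command yields 8.x-1.0-alpha1 and the second 1.0-alpha1.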

When you only want to look at the quotes, you can use cut.
grep -e 'version' myfile.txt | cut -d "'" -f2

grep can almost do this alone:
grep -o "'.*'" file.txt
But this may also print lines you don't want to: it will print all lines with 2 single quotes (') in them. And the output still has the single quotes (') around it:
'8.x-1.0-alpha1'
But sed alone can do it properly:
sed -rn "s/^version: +'([^']+)'.*/\1/p" file.txt

replace a string before the semi colon

I have several files, which begin like this:
unit,s_adj,partner,stk_flow,indic,geo\time;aaaa;2222;
time,s_adj,partner,stk_flow,lolo,geo\time;bbb;2222;
I want to replace everything before the first semicolon with the new value YEAR.
The desired output would be:
YEAR;aaaa;2222;
YEAR;bbb;2222;
I tried with the following command line but it does not seem to do what I want
awk -F ";" 'NR==1 {$1=""; print "year"}' input_file
Your suggestions are welcomed.
Best.

Try this:
sed 's/[^;]*/YEAR/' file
If you only want the substitution to happen on the 1st line:
sed '1s/[^;]*/YEAR/' file

You can also do:
awk '{$1="YEAR"}1' OFS=\; FS=\; input-file
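
For comparison, a sketch that runs both approaches on the two sample lines from the question. The variable names are only illustrative, and the awk call sets FS and OFS in a BEGIN block rather than on the command line, which is a stylistic swap and should be equivalent.

```shell
input='unit,s_adj,partner,stk_flow,indic,geo\time;aaaa;2222;
time,s_adj,partner,stk_flow,lolo,geo\time;bbb;2222;'

# sed: replace the leading run of non-semicolon characters on every line.
with_sed=$(printf '%s\n' "$input" | sed 's/[^;]*/YEAR/')

# awk: overwrite the first ;-separated field and print the rebuilt line.
with_awk=$(printf '%s\n' "$input" | awk 'BEGIN { FS = OFS = ";" } { $1 = "YEAR" } 1')

printf '%s\n' "$with_sed"
```

Both variables end up holding the same two lines, YEAR;aaaa;2222; and YEAR;bbb;2222;.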
