inserting text around a list of things - ruby

This is something simple that, for some reason, is eluding me. I would think I should be able to do this from the bash prompt with a very simple script or one-liner.
I have a file, consisting of a list of numbers:
12345
23456
34567
45678
Very simple. I want to change it to:
arglebargle-12345-fulferol-12345-applesauce
arglebargle-23456-fulferol-23456-applesauce
arglebargle-34567-fulferol-34567-applesauce
arglebargle-45678-fulferol-45678-applesauce
So: insert a string on both sides of a number (or a string; it just happens that these strings are numbers, it is not essential that they be numbers), then append the original string, and put a third string after that.
I think I would prefer to do this in sed or awk as a one-liner. Or as a ruby script. It should be so easy, and it is evading my mind for some reason!

Using sed
echo "12345
> 23456
> 34567
> 45678" | sed -e 's/\(.*\)/arglebargle-\1-fulferol-\1-applesauce/g'
arglebargle-12345-fulferol-12345-applesauce
arglebargle-23456-fulferol-23456-applesauce
arglebargle-34567-fulferol-34567-applesauce
arglebargle-45678-fulferol-45678-applesauce
If you want to substitute in place in a file (where your file is a.txt, for example) you can do
sed -i 's/\(.*\)/arglebargle-\1-fulferol-\1-applesauce/g' a.txt

Something like this?
nums = %w(12345 23456 34567 45678)
nums.each { |num| puts "arglebargle-#{num}-fulferol-#{num}-applesauce" }
Output:
arglebargle-12345-fulferol-12345-applesauce
arglebargle-23456-fulferol-23456-applesauce
arglebargle-34567-fulferol-34567-applesauce
arglebargle-45678-fulferol-45678-applesauce
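If the numbers live in a file rather than a hard-coded array, a plain-shell equivalent of that loop might look like this (nums.txt is an assumed file name):

```shell
# nums.txt is an assumed file name holding the question's list
printf '12345\n23456\n34567\n45678\n' > nums.txt
# Read each line and wrap it with the three strings
while IFS= read -r num; do
  printf 'arglebargle-%s-fulferol-%s-applesauce\n' "$num" "$num"
done < nums.txt
```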

This should accomplish the job:
awk '{ print "arglebargle-" $0 "-fulferol-" $0 "-applesauce" }' numFile
See this related question.

bash: subtract constant number after prefix

I have a large text file with many entries like this:
/locus_tag="PREFIX_05485"
including the empty spaces in the beginning. Unfortunately, the first identifier does not start with 00001.
The only part in this line that is changing is the number.
I would like to change the PREFIX (this I can do easily with sed), but I also want to decrease the number so it looks like this:
/locus_tag="myNewPrefix_00001"
(the next entry should be ..."myNewPrefix_00002" and so on). Alternatively, the entry could also be without leading zeros.
As far as I know, sed cannot calculate (like subtracting a constant number). Any ideas how I can solve that?
Thank you very much. If the question is unclear, please let me know and I will try to improve it.
EDIT: Sometimes the same number occurs twice (this should also be the case in the modified file); for instance
/locus_tag="PREFIX_12345"
/locus_tag="PREFIX_12345"
/locus_tag="PREFIX_12346"
/locus_tag="PREFIX_12347"
should be in the end
/locus_tag="myNewPrefix_00001"
/locus_tag="myNewPrefix_00001"
/locus_tag="myNewPrefix_00002"
/locus_tag="myNewPrefix_00003"
You may use awk:
awk -v pf='myNewPrefix' 'BEGIN{FS=OFS="="}
$1 ~ /\/locus_tag$/ && split($2, a, /_/) == 2 {
$2 = sprintf("\"%s_%05d\"", pf, (a[2] in seen ? i : ++i)); seen[a[2]]
} 1' file
/locus_tag="myNewPrefix_00001"
/locus_tag="myNewPrefix_00001"
/locus_tag="myNewPrefix_00002"
/locus_tag="myNewPrefix_00003"
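As a quick sanity check of the duplicate handling, you could recreate a small sample (leading spaces included, as in the question) and run the same command:

```shell
# Recreate a small sample with a duplicated number (leading spaces as in the question)
printf '     /locus_tag="PREFIX_12345"\n     /locus_tag="PREFIX_12345"\n     /locus_tag="PREFIX_12346"\n' > sample.txt
# Same awk as above: duplicates keep their counter, new numbers increment it
awk -v pf='myNewPrefix' 'BEGIN{FS=OFS="="}
$1 ~ /\/locus_tag$/ && split($2, a, /_/) == 2 {
$2 = sprintf("\"%s_%05d\"", pf, (a[2] in seen ? i : ++i)); seen[a[2]]
} 1' sample.txt
```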
Check this Perl one-liner:
/tmp> cat littlebird.txt
abcdef
/locus_tag="PREFIX_12345"
hello hai
/locus_tag="PREFIX_12345"
/locus_tag="PREFIX_12346"
/locus_tag="PREFIX_12347"
123 456
end
/tmp> perl -pe 'BEGIN{$r=qr/PREFIX_(.+)["]/} if(/$r/) {$kv{$1}++; $kv{$1}==1 and $kv2{$1}=sprintf("%04d",++$i)} s/$r/PREFIX_$kv2{$1}"/g' littlebird.txt
abcdef
/locus_tag="PREFIX_0001"
hello hai
/locus_tag="PREFIX_0001"
/locus_tag="PREFIX_0002"
/locus_tag="PREFIX_0003"
123 456
end
/tmp>

Sed insert file contents rather than file name

I have two files and would like to insert the contents of one file into the other, replacing a specified line.
File 1:
abc
def
ghi
jkl
File 2:
123
The following code is what I have.
file1=numbers.txt
file2=letters.txt
linenumber=3s
echo $file1
echo $file2
sed "$linenumber/.*/r $file1/" $file2
Which results in the output:
abc
def
r numbers.txt
jkl
The output I am hoping for is:
abc
def
123
jkl
I thought it could be an issue with bash variables but I still get the same output when I manually enter the information.
How am I misunderstanding sed and/or the read command?
Your script replaces the line with the string "r $file1", expanded. The replacement part of sed's s command is not re-interpreted as a command; it is taken literally.
You can:
linenumber=3
sed "$linenumber"' {
r '"$file1"'
d
}' "$file2"
At line number 3, read in file1, then delete the original line.
See here for a good explanation and reference.
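Assuming the file contents from the question (the letters in letters.txt, the replacement text in numbers.txt), a minimal end-to-end check of the r + d block might be:

```shell
# Recreate the question's files (contents taken from the post)
printf 'abc\ndef\nghi\njkl\n' > letters.txt
printf '123\n' > numbers.txt
linenumber=3
# At line 3: read in numbers.txt, then delete the original line
sed "$linenumber"'{
r numbers.txt
d
}' letters.txt
```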
Surely we can make that a one-liner:
sed -e "$linenumber"' { r '"$file1"$'\n''d; }' "$file2"
Live example at tutorialspoint.
I would use the c command as follows:
linenumber=3
sed "${linenumber}c $(< $file1)" "$file2"
This replaces the addressed line with the text that comes after c. (Note that $(< $file1) is a bash feature, and this one-liner form suits a single-line file1; a multi-line replacement would need its newlines escaped.)
Your command didn't work because it expands to this:
sed "3s/.*/r numbers.txt/" letters.txt
and you can't use r like that. r has to be the command that is being run.

Using sed to find-and-replace in a text file using strings from another text file

I have two files as follows. The first is sample.txt:
new haven co-op toronto on $1245
joe schmo co-op powell river bc $4444
The second is locations.txt:
toronto
powell river
on
bc
We'd like to use sed to produce a marked-up sample-new.txt that adds ; before and after each of these locations, so that the final lines appear like:
new haven co-op ;toronto; ;on; $1245
joe schmo co-op ;powell river; ;bc; $4444
Is this possible using bash? The actual files are much longer (thousands of lines in each case) but as a one-time job we're not too concerned about processing time.
--- edited to add ---
My original approach was something like this:
cat locations.txt | xargs -i sed 's/{}/;/' sample.txt
But it only ran the script once per pattern, as opposed to the methods you've proposed here.
Using awk:
awk 'NR==FNR{a[NR]=$0; next;} {for(i in a)gsub("\\<"a[i]"\\>",";"a[i]";"); print} ' locations.txt sample.txt
Using awk+sed
sed -f <(awk '{print "s|\\<"$0"\\>|;"$0";|g"}' locations.txt) sample.txt
Same using pure sed:
sed -f <(sed 's/.*/s|\\<&\\>|\;&\;|g/' locations.txt) sample.txt
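To see the generated script in action without process substitution, you could write it to a temporary file first (file names here are assumed; \< and \> are GNU sed word boundaries):

```shell
# Hypothetical file names with the question's sample data
printf 'toronto\npowell river\non\nbc\n' > locations.txt
printf 'new haven co-op toronto on $1245\n' > sample.txt
# Turn every location into one s||| command, then apply them all in a single pass
sed 's/.*/s|\\<&\\>|;&;|g/' locations.txt > markup.sed
sed -f markup.sed sample.txt
```

This is the same idea as the one-liner, just split in two steps so the generated script can be inspected.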
(After you show your coding attempts, I will add the explanation of why this works.)
Just to complete your set of options, you can do this in pure bash, slowly:
#!/usr/bin/env bash
readarray -t places < t2
while read line; do
for place in "${places[@]}"; do
line="${line/ $place / ;$place; }"
done
echo "$line"
done < t1
Note that this likely won't work as expected if you include places that are inside other places, for example "niagara on the lake" which is in "on":
foo bar co-op ;niagara ;on; the lake; on $1
Instead, you might want to do more targeted pattern matching, which will be much easier in awk:
#!/usr/bin/awk -f
# Collect the location list into the index of an array
NR==FNR {
places[$0]
next
}
# Now step through the input file
{
# Handle two-letter provinces
if ($(NF-1) in places) {
$(NF-1)=";" $(NF-1) ";"
}
# Step through the remaining places doing substitutions as we find matches
for (place in places) {
if (length(place)>2 && index($0,place)) {
sub(place,";"place";")
}
}
}
# Print every line
1
This works for me using the data in your question:
$ cat places
toronto
powell river
niagara on the lake
on
bc
$ ./tst places input
new haven co-op ;toronto; ;on; $1245
joe schmo co-op ;powell river; ;bc; $4444
foo bar co-op ;niagara on the lake; ;on; $1
You may have a problem if your places file contains an actual non-province comprising two letters. I'm not sure if such things exist in Canada, but if they do, you'll either need to tweak such lines manually, or make the script more complex by handling provinces separately from cities.

AWK between 2 patterns - first occurrence

I have this example of an ini file. I need to extract the names between the two patterns [Name_Z1] and OBJ=Name_Z1 and put each of them on its own line.
The problem is that there is more than one occurrence of Name_Z1 and OBJ=Name_Z1, and I only need the first occurrence.
[Name_Z5]
random;text
Names;Jesus;Tom;Miguel
random;text
OBJ=Name_Z5
[Name_Z1]
random;text
Names;Jhon;Alex;Smith
random;text
OBJ=Name_Z1
[Name_Z2]
random;text
Names;Chris;Mara;Iordana
random;text
OBJ=Name_Z2
[Name_Z1_Phone]
random;text
Names;Bill;Stan;Mike
random;text
OBJ=Name_Z1_Phone
My desired output would be:
Jhon
Alex
Smith
I am currently writing a larger script in bash and I am stuck on this. I would prefer awk to do the job.
My great appreciation to whoever can help me. Thank you!
For Wintermute's solution: The [Name_Z1] part looks like this:
[CAB_Z1]
READ_ONLY=false
FilterAttr=CeaseTime;blank|ObjectOfReference;contains;511047;512044;513008;593026;598326;CL5518;CL5521;CL5538;CL5612;CL5620|PerceivedSeverity;=;Critical;Major;Minor|ProbableCause;!=;HOUSE ALARM;IO DEVICE|ProblemText;contains;AIRE;ALIMENTA;BATER;CONVERTIDOR;DISTRIBUCION;FUEGO;HURTO;MAINS;MALLO;MAYOR;MENOR;PANEL;TEMP
NAME=CAB_Z1
And the [Name_Z1_Phone] part looks like this:
[CAB_Z1_FUEGO]
READ_ONLY=false
FilterAttr=CeaseTime;blank|ObjectOfReference;contains;511047;512044;513008;593026;598326;CL5518;CL5521;CL5538;CL5612;CL5620|PerceivedSeverity;=;Critical;Major;Minor|ProbableCause;!=;HOUSE ALARM;IO DEVICE|ProblemText;contains;FUEGO
NAME=CAB_Z1_FUEGO
The fix should be somewhere around the "|PerceivedSeverity"
Expected Output:
511047
512044
513008
593026
598326
CL5518
CL5521
CL5538
CL5612
CL5620
This should work:
sed -n '/^\[Name_Z1/,/^OBJ=Name_Z1/ { /^Names/ { s/^Names;//; s/;/\n/g; p; q } }' foo.txt
Explanation: Written readably, the code is
/^\[Name_Z1/,/^OBJ=Name_Z1/ {
/^Names/ {
s/^Names;//
s/;/\n/g
p
q
}
}
This means: In the pattern range /^\[Name_Z1/,/^OBJ=Name_Z1/, for all lines that match the pattern /^Names/, remove the Names; in the beginning, then replace all remaining ; with newlines, print the whole thing, and then quit. Since it immediately quits, it will only handle the first such line in the first such pattern range.
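A minimal reproduction (using only the first two blocks of the sample, in an assumed foo.txt; the \n in the replacement is GNU sed behaviour) might look like:

```shell
# Only the first two blocks are needed to exercise the range and the q
printf '[Name_Z5]\nNames;Jesus;Tom;Miguel\nOBJ=Name_Z5\n[Name_Z1]\nNames;Jhon;Alex;Smith\nOBJ=Name_Z1\n' > foo.txt
# Inside the Z1 range: strip the Names; prefix, split on ; and quit
sed -n '/^\[Name_Z1/,/^OBJ=Name_Z1/ { /^Names/ { s/^Names;//; s/;/\n/g; p; q } }' foo.txt
```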
EDIT: The update made things a bit more complicated. I suggest
sed -n '/^\[CAB_Z1/,/^NAME=CAB_Z1/ { /^FilterAttr=/ { s/^.*contains;\(.*\)|PerceivedSeverity.*$/\1/; s/;/\n/g; p; q } }' foo.txt
The main difference is that instead of removing ^Names from a line, the substitution
s/^.*contains;\(.*\)|PerceivedSeverity.*$/\1/;
is applied. This isolates the part between contains; and |PerceivedSeverity before continuing as before. It assumes that there is only one such part in the line. If the match is ambiguous, it will pick the one that appears last in the line.
A (g)awk way that doesn't need a set number of fields (although I have assumed that contains; will always be on the line you need the names from):
(g)awk '(x+=/Z1/)&&match($0,/contains;([^|]+)/,a)&&gsub(";","\n",a[1]){print a[1];exit}' f
Explanation
(x+=/Z1/) - Increments x when Z1 is found. Also part of a condition, so x must exist to continue.
match($0,/contains;([^|]+)/,a) - Matches contains; and then captures everything after it up to the |. Stores the capture in a. Again a condition, so it must succeed to continue.
gsub(";","\n",a[1]) - Substitutes all the ; for newlines in the capture group a[1].
{print a[1];exit} - If all conditions are met, then print a[1] and exit.
This way should work in (m)awk
awk '(x+=/Z1/)&&/contains/{split($0,a,"|");y=split(a[2],b,";");for(i=3;i<=y;i++)
print b[i];exit}' file
sed -n '/\[Name_Z1\]/,/OBJ=Name_Z1$/ s/Names;//p' file.txt | tr ';' '\n'
That is sed -n to avoid printing anything not explicitly requested. Start from Name_Z1 and finish at OBJ=Name_Z1. Remove Names; and print the rest of the line where it occurs. Finally, replace semicolons with newlines.
An awk solution would be
$ awk -F";" '/Name_Z1/{f=1} f && /Names/{print $2,$3,$4} /OBJ=Name_Z1/{exit}' OFS="\n" input
Jhon
Alex
Smith
OR
$ awk -F";" '/Name_Z1/{f++} f==1 && /Names/{print $2,$3,$4}' OFS="\n" input
Jhon
Alex
Smith
-F";" sets the field separator to ;
/Name_Z1/{f++} matches lines against the pattern /Name_Z1/; if matched, increment f
f==1 && /Names/{print $2,$3,$4} means: if f == 1 and the line matches the pattern Names, print columns 2, 3 and 4 (delimited by ;)
OFS="\n" sets the output field separator to a newline
EDIT
$ awk -F"[;|]" '/Z1/{f++} f==1 && NF>1{for (i=5; i<15; i++)print $i}' input
511047
512044
513008
593026
598326
CL5518
CL5521
CL5538
CL5612
CL5620
Here is a more generic solution for data in groups of blocks.
This awk does not need the end tag, just the start.
awk -vRS= -F"\n" '/^\[Name_Z1\]/ {n=split($3,a,";");for (i=2;i<=n;i++) print a[i];exit}' file
Jhon
Alex
Smith
How it works:
awk -vRS= -F"\n" ' # By setting RS to nothing, one record equals one block. Then FS is set to one line as a field
/^\[Name_Z1\]/ { # Search for block with [Name_Z1]
n=split($3,a,";") # Split field 3, the names and store number of fields in variable n
for (i=2;i<=n;i++) # Loop from second to last field
print a[i] # Print the fields
exit # Exits after first find
}
' file
With updated data
cat file
data

[CAB_Z1_FUEGO]
READ_ONLY=false
FilterAttr=CeaseTime;blank|ObjectOfReference;contains;511047;512044;513008;593026;598326;CL5518;CL5521;CL5538;CL5612;CL5620|PerceivedSeverity;=;Critical;Major;Minor|ProbableCause;!=;HOUSE ALARM;IO DEVICE|ProblemText;contains;FUEGO
NAME=CAB_Z1_FUEGO

data
awk -vRS= -F"\n" '/^\[CAB_Z1_FUEGO\]/ {split($3,a,"|");n=split(a[2],b,";");for (i=3;i<=n;i++) print b[i]}' file
511047
512044
513008
593026
598326
CL5518
CL5521
CL5538
CL5612
CL5620
The following awk script will do what you want:
awk 's==1&&/^Names/{gsub("Names;","",$0);gsub(";","\n",$0);print}/^\[Name_Z1\]$/||/^OBJ=Name_Z1$/{s++}' inputFileName
In more detail:
s==1 && /^Names;/ {
gsub ("Names;","",$0);
gsub(";","\n",$0);
print
}
/^\[Name_Z1\]$/ || /^OBJ=Name_Z1$/ {
s++
}
The state s starts with a value of zero and is incremented whenever you find one of the two lines:
[Name_Z1]
OBJ=Name_Z1
That means, between the first set of those lines, s will be equal to one. That's where the other condition comes in. When s is one and you find a line starting with Names;, you do two substitutions.
The first is to get rid of the Names; at the front, the second is to replace all ; semi-colon characters with a newline. Then you print it out.
The output for your given test data is, as expected:
Jhon
Alex
Smith

convert multiple lines between patterns to a comma-separated string

I need help in processing data from STDIN (the data is taken from another file with 'tail -f' and grepped to filter out garbage). There are several lines between patterns:
<DN> 589</DN>
<DD>03.12.2014</DD>
<ST> </ST>
<STC>0</STC>
<STT>0</STT>
<PU>5</PU>
<OT>01</OT>
<DSN></DSN>
<NRA>40807,40820,426,30231,40818,30230</NRA>
<GR>300 000-00
&#10</GR>
then the next block with DN/GR starts
I need to convert the lines between <DN> and </GR> to a single comma-separated line:
<DN> 589</DN>,<DD>03.12.2014</DD>,<ST> </ST>,<STC>0</STC>,<STT>0</STT>,<PU>5</PU>,<OT>01</OT>,<DSN></DSN>,<NRA>40807,40820,426,30231,40818,30230</NRA>,<GR>300 000-00
&#10</GR>
I need a one-liner with awk or sed or perl to do it and put result to STDOUT.
I've tried to do it, but failed due to lack of experience. Also tried to google and didn't find a working solution.
whatever..| awk '{sub(/^\s*/,"");printf "%s%s",$0,(/\/GR>\s*$/?"\n":",")}'
this line does the following:
remove the leading spaces from each line
join all lines with separator , until the block end /GR>
if you have x data blocks, it gives you x long lines.
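A quick way to try this (note: [ \t] is used here in place of gawk's \s shorthand, so the sketch also runs under POSIX awk):

```shell
# Sample block with leading spaces, as in the question's tail -f stream
printf ' <DN> 589</DN>\n <DD>03.12.2014</DD>\n <GR>300 000-00\n &#10</GR>\n' |
awk '{sub(/^[ \t]*/,"");printf "%s%s",$0,(/\/GR>[ \t]*$/?"\n":",")}'
```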
sed -nr '/<DN>/,/<GR>/{ H; /<GR>/{ g; s%\n%,%g; s%^,%%; p; s%.*%%; h }; }' <<'EOSEQ'
<DN> 589</DN>
<DD>03.12.2014</DD>
<STC>0</STC>
<GR>300 000-00
&#10</GR>
<DN>900</DN>
<DD>20.11.2014</DD>
<OT>01</OT>
<NRA>40807,40820,426,30231,40818,30230</NRA>
<GR>300 000-00
&#10</GR>
EOSEQ
SED one-liner, as you wish :)
Using awk you could do the following:
awk '{printf ("%s,", $0)}' test.txt ##Will have a comma at the end, which may or may not be OK for you.
You can use the following in sed:
sed -r ':loop; N; s/(.*)\n(.*)/\1,\2/; t loop' filename
