How to insert space between characters after some specific symbols? - bash

I have a text file in the following format:
\Hollands\\\\\\hOlAnz/hOlAns\\\\\\\\
\Hollandse\\\\\\hOlAns#\\\\\\\\
\Hollywood\\\\\\hOliwud/hOliwut/hOliwYd\\\\\\\\
...
and I would like to make it look like this ⬇️:
\Hollands\\\\\\h O l A n z / h O l A n s\\\\\\\\
\Hollandse\\\\\\h O l A n s #\\\\\\\\
\Hollywood\\\\\\h O l i w u d / h O l i w u t / h O l i w Y d\\\\\\\\
What should I do?
Many thanks in advance.
I tried using sed:
sed 's/\{\\\\\\\{1\}\)/\1 /g'
I expected this to insert a blank space (\1 followed by a space) after each single character ({1\}) that comes after the six consecutive \s,
but got the error saying
RE error: invalid repetition count(s)

sed is the right tool for simple s/old/new/ operations on individual strings. For anything more than that (e.g. isolating part of a string and then doing further operations on that part, as you need here), just use awk.
Using GNU awk for the 3rd arg to match() and gensub():
$ awk 'match($0,/(.*\\{6}.)(.*)(\\{8})/,a) { $0=a[1] gensub(/./," &","g",a[2]) a[3] } 1' file
\Hollands\\\\\\h O l A n z / h O l A n s\\\\\\\\
\Hollandse\\\\\\h O l A n s #\\\\\\\\
\Hollywood\\\\\\h O l i w u d / h O l i w u t / h O l i w Y d\\\\\\\\

This might work for you (GNU sed):
sed -E 's/(\\{6}[^\\])(.*\\{8})/\1 \n\2/;:a;s/\n([^\\])/\1 \n/;ta;s/ \n//' file
Turn on extended regexp -E.
For every line, use pattern matching to insert a space and a unique delimiter (\n) after the first character that follows the six \'s.
Loop, replacing the newline and the character after it with that character, a space and the newline delimiter, until the next character is a \.
Remove the trailing space and newline, and print the result.
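Both answers above are written for the GNU versions of awk and sed. As a minimal sketch for systems without them, the same transformation can be approximated in plain awk using only index() and substr(). The function name space_out is made up for illustration, and the sketch assumes that the part to be spaced out starts right after the first run of six backslashes and ends at the next backslash.

```shell
# Hypothetical helper: put a space between the characters that follow
# the first run of six backslashes, up to the next backslash.
space_out() {
  awk '
  {
    six = "\\\\\\\\\\\\"                 # awk string holding six backslashes
    start = index($0, six)
    if (start == 0) { print; next }      # no marker: leave the line alone
    head = substr($0, 1, start + 5)      # everything through the six backslashes
    rest = substr($0, start + 6)
    stop = index(rest, "\\")             # first backslash after the body
    if (stop == 0) stop = length(rest) + 1
    body = substr(rest, 1, stop - 1)
    tail = substr(rest, stop)
    out = ""
    for (i = 1; i <= length(body); i++)
      out = out (i > 1 ? " " : "") substr(body, i, 1)
    print head out tail
  }' "$@"
}

printf '%s\n' '\Hollandse\\\\\\hOlAns#\\\\\\\\' | space_out
```

With no arguments the function reads standard input; with file names it passes them on to awk, e.g. space_out file.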

Shell script - iterate over space separated words/characters (in zsh)

I am having some trouble figuring out how to iterate over space separated words/characters in a shell script. For instance I would like to iterate over a variable containing the characters in the alphabet separated by a space.
NOTE: The result should be the same even if the alphabet variable contained space separated strings instead of characters, i.e "aa bb cc ..." instead of "a b c .."
I have tried a lot of the alternatives provided from:
How to split a line into words separated by one or more spaces in bash?
Example:
local alphabet="a b c d e f g h i j k l m n o p q r s t u v w x y z"
local index="0"
for character in $alphabet; do
index=$((++index))
echo "$index. $character"
# Possibility to do some more stuff
done
Expected/Desired output:
1. a
2. b
3. c
and so on..
Result:
1. a b c d e f g h i j k l m n o p q r s t u v w x y z
Additional tests(without success):
####################################################################
local alphabet="a b c d e f g h i j k l m n o p q r s t u v w x y z"
local index="0"
for character in ${alphabet[@]}; do
index=$((++index))
echo "$index. $character"
# Possibility to do some more stuff
done
####################################################################
local alphabet="a b c d e f g h i j k l m n o p q r s t u v w x y z"
local alphabetArray=( ${alphabet} )
local index="0"
for character in "${alphabetArray[@]}"; do
index=$((++index))
echo "$index. $character"
# Possibility to do some more stuff
done
####################################################################
local alphabet="a b c d e f g h i j k l m n o p q r s t u v w x y z"
local alphabetArray=( ${alphabet} )
local index="0"
for character in ${alphabetArray}; do
index=$((++index))
echo "$index. $character"
# Possibility to do some more stuff
done
Could someone provide a solution on how to solve this(I would prefer a solution that iterates the alphabet variable without explicitly using an index variable, i.e $alphabet[index] )?
Thanks for your help. I discovered the error thanks to your feedback.
I thought that it was irrelevant when I posted this question but I was experimenting with functions in my .zshrc file. Hence I was using (just my assumption) the zsh interpreter and not the sh or bash interpreter.
Once I realized that this could be the problem, I googled and found the following: How to iterate through string one word at a time in zsh
So I tested the following and it works as expected:
setopt shwordsplit
local alphabet="a b c d e f g h i j k l m n o p q r s t u v w x y z"
local index="0"
for character in $alphabet; do
index=$(($index+1))
echo "$index. $character"
# Possibility to do some more stuff
done
unsetopt shwordsplit
NOTE:
index=$((++$index))
and/or
index=$(($index++))
Doesn't seem to work as I expected in zsh.
... The little gritty details, I should have used:
((++index))
or
((index++))
instead of
index=$((++$index))
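As a hedged sketch of an alternative that avoids toggling shwordsplit: the output of an unquoted command substitution is split on IFS in zsh as well as in sh and bash, so the loop below should behave the same in all three. The function name number_words is made up for illustration. In zsh specifically, the ${=alphabet} expansion flag is another way to request word splitting for a single expansion.

```shell
# Hypothetical helper: number each whitespace-separated word of its argument.
# Note: index and word are plain (global) variables here, to stay portable.
number_words() {
  index=0
  # The unquoted $(...) is what triggers splitting on IFS.
  for word in $(printf '%s\n' "$1"); do
    index=$((index + 1))
    printf '%s. %s\n' "$index" "$word"
  done
}

number_words "a b c"
number_words "aa bb cc"
```

One caveat: in sh and bash an unquoted substitution is also subject to filename expansion, so a word such as * would be replaced by matching file names.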
Try this
IFS=$' \t\n'
local alphabet="a b c d e f g h i j k l m n o p q r s t u v w x y z"
local index="0"
for character in $alphabet; do
index=$((++index))
echo "$index. $character"
# Possibility to do some more stuff
done
Hope it helps

Split up line with arbitrary many groups

I have many files with many entries (one entry per line) which I have to filter through a sequence of greps and seds. The lines are of the form
a
x, y
u --> v, w
s --> p, q, r
One of the steps is splitting up the lines containing --> such that the left-hand side and each of the comma-separated entries on the right-hand side (of which there can be arbitrarily many) end up on different lines. I.e., the above lines should become:
a
x, y
u
v
w
s
p
q
r
Separating the left side from the right side is quickly done:
echo "u --> v, w" | sed 's/\(.\+\)\s*\-\->\s*\(.\+\)/\1\n\2/'
Gives me
u
v, w
But this seems to be a dead end in that I cannot then pipe this on to splitting on the comma, since that would also split the x, y.
So, I am wondering if there is a way to completely split up such lines in a sed command, or do I have to turn to, e.g., awk (or just go to Python)? It would be preferable to keep this a bash pipe sequence.
awk '/-->/ {gsub(/-->|,/,RS)}1' inputfile|column -t
a
x, y
u
v
w
s
p
q
r
OR as Anubhav suggested to avoid pipe:
awk '/-->/ {gsub(/[ \t]*(-->|,)[ \t]*/ , ORS)} 1' inputfile
Using awk you can do this:
awk -F'[ \t]*-->[ \t]*' -v OFS='\n' '{gsub(/,[ \t]*/, OFS, $2)} 1' file
a
x, y
u
v
w
s
p
q
r
You can do this by creating a command group when you match -->. In this group, you replace --> with newline, print up to the newline, discard the portion you printed, then replace commas in the remainder:
#!/bin/sed -f
/\s*-->\s*/{
s//\n/
P
s/.*\n//
s/,\s*/\n/g
}
Results:
a
x, y
u
v
w
s
p
q
r
Alternatively, in GNU sed, you could use the T command to skip processing of the right-hand side unless you match and replace the -->:
#!/bin/sed -f
s/\s*-->\s*/\n/
Tend
P
s/.*\n//
s/,\s*/\n/g
:end
This produces the same output, as required.
I've assumed throughout that you don't want to split any commas on the left-hand side, so that
foo, bar --> baz
becomes
foo, bar
baz
If that's not the case (perhaps if you know there will be no comma to the left of -->), then you don't need P or s/.*\n//, and the script is as simple as
/\s*-->\s*/!b
s//\n/
s/,\s*/\n/g
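The sed scripts above use \s and \n in the replacement, which not every sed supports. As a minimal sketch of the same logic in plain awk (split only the right-hand side on commas, leave the left-hand side intact), using index() and split() rather than regex replacement; the function name split_arrows is made up for illustration:

```shell
# Hypothetical helper: for lines containing "-->", print the left-hand side
# on one line and each comma-separated right-hand entry on its own line.
split_arrows() {
  awk '
  {
    pos = index($0, "-->")
    if (pos == 0) { print; next }          # no arrow: print unchanged
    left  = substr($0, 1, pos - 1)
    right = substr($0, pos + 3)
    sub(/[ \t]+$/, "", left)               # trim blanks before the arrow
    print left
    n = split(right, parts, ",")
    for (i = 1; i <= n; i++) {
      gsub(/^[ \t]+|[ \t]+$/, "", parts[i])  # trim blanks around each entry
      print parts[i]
    }
  }' "$@"
}

printf '%s\n' 'x, y' 'u --> v, w' 'foo, bar --> baz' | split_arrows
```

Lines without --> pass through untouched, so x, y stays on one line, and commas to the left of the arrow are kept as well.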

I would like to sort rows of a data file by NF increasing

I would like to sort rows of a data file by NF increasing.
input
z a b c d k l p m
m x y h j i
y w
g t y u
output
y w
g t y u
m x y h j i
z a b c d k l p m
I tried the sort command, but it does not work.
How can I do this?
Thanks for help.
Typically you solve these types of problems by modifying the input stream to add some data, operating on that data, and then removing it. In this case, we want to add the field count to the input stream, sort (numerically) on the field count, and then remove it (using a space as the field delimiter):
awk '{ print NF, $0 }' | sort -n | cut -d' ' -f2-
You can either pipe your data to awk or pass the filename as another argument to awk.
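The add/sort/remove pattern described above can be wrapped up as a small function. This is only a sketch: the name sort_by_field_count is made up for illustration, and restricting the sort key to the first column with -k1,1n is my own choice, so that the numeric comparison looks only at the prepended count.

```shell
# Hypothetical helper: sort lines by their number of whitespace-separated fields.
sort_by_field_count() {
  awk '{ print NF, $0 }' "$@" |   # decorate: prepend the field count
    sort -k1,1n |                 # sort numerically on that first column only
    cut -d' ' -f2-                # undecorate: drop the count again
}

printf '%s\n' 'z a b c d k l p m' 'm x y h j i' 'y w' 'g t y u' | sort_by_field_count
```

It reads standard input when called without arguments, or the named files otherwise, e.g. sort_by_field_count file.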

How to delete the last space at the end of each line in a text file? (shell scripts)

I have a file like this:
z E l f
A l t E^ t
d Y s
m u s t
z E l f s
x # w e s t
s t e t s
h E p
w i
t E n
o G #
o G # n
m I s x i n
s t O n t
and I need to remove a space at the end of each line.
How can I do it? Thank you in advance.
I assume you want to delete trailing spaces and tabs from the end of each line.
awk '{ sub(/[ \t]+$/, ""); print }' file
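For comparison, a minimal sed sketch of the same idea, under the same assumption that both trailing spaces and tabs should go. The function name strip_trailing is made up for illustration; [[:blank:]] is the POSIX character class for space and tab.

```shell
# Hypothetical helper: delete trailing spaces and tabs from every line.
strip_trailing() {
  sed 's/[[:blank:]]*$//' "$@"
}

printf 'd Y s \nw i\t\nh E p\n' | strip_trailing
```

To check the result, the output can be piped through od -c, which makes any remaining trailing blanks visible.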

how do I select words with a particular structure from a file in a shell script?

My file has this structure:
HELLO h e l lo
HELLO(2) h ee l lo
HELLOMUM h e l ll m um
HELLO-FATHER h e llo fa th er
HOME h o m e
HONEY h o ne y
HONEY(2) ho nei y
HONEY(3) h o ney
HONEYMOON ho ney m o o n
HONEY-MOON h o ne y moo n
When searching for the word HELLO, for example, I would like to extract only the lines for the words HELLO and HELLO(2). Likewise, when searching for HONEY, I would like only the lines for HONEY, HONEY(2) and HONEY(3).
I am using
grep -w "HELLO" file.txt
but for HELLO, for example, this also returns the lines for HELLOMUM and HELLO-FATHER.
Thank you in advance.
I guess this should work for you.
grep '^HELLO[^a-zA-Z\-]' your_file
tested below:
> grep '^HELLO[^a-zA-Z\-]' temp
HELLO h e l lo
HELLO(2) h ee l lo
> grep '^HONEY[^a-zA-Z\-]' temp
HONEY h o ne y
HONEY(2) ho nei y
HONEY(3) h o ney
>
Taking your pasted example as the file, the following grep lines work:
grep 'HELLO[( ]' file
grep 'HONEY[( ]' file
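A slightly stricter variant, as a sketch only: anchor the word at the start of the line, allow an optional numbered suffix such as (2), and require whitespace right after it. The function name lookup_word is made up for illustration, and the sketch assumes the search word contains no regular-expression metacharacters.

```shell
# Hypothetical helper: print the entries for WORD and its numbered variants
# WORD(2), WORD(3), ... but not longer words that merely start with WORD.
lookup_word() {
  word=$1
  shift
  # ^WORD, then an optional "(digits)", then whitespace
  grep -E "^${word}(\\([0-9]+\\))?[[:space:]]" "$@"
}

printf '%s\n' 'HELLO h e l lo' 'HELLO(2) h ee l lo' 'HELLOMUM h e l ll m um' \
  'HELLO-FATHER h e llo fa th er' | lookup_word HELLO
```

Any further arguments are passed to grep as file names, e.g. lookup_word HONEY file.txt.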
