modify distribution of data inside a file - bash

I need help with bash in order to modify a file.txt. I have names, each name in a line
for example
Peter
John
Markus
and I need them in the same row and with " before and at the end of each element of the vector.
"Peter" "John" "Markus"
Well, I can insert " when I have all elements in a row but I don't know how to modify the shape...all lines in a row.
array=( Peter John Markus )
number=${#array[@]}
for ((i=0;i<number;i++)); do
array[i]="\"${array[i]}"\"
echo "${array[i]}"
done

With awk
$ awk '{printf "\""$0"\" "} END{print""}' file
"Peter" "John" "Markus"
How it works:
printf "\""$0"\" "
With every new line of input, $0, this prints out a quote, the line itself, a quote and a space.
END{print""}
(optional) After we have read the last line of the file, this prints out a newline.
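A variant of the same idea, offered as a sketch rather than a replacement: passing each line through a "%s" format keeps a % in the data from being read as a format specifier, and printing the separator before every line except the first avoids the trailing space. The sample names come from the question; the variable name names is only for illustration.

```bash
# Sketch: quote each input line and join the results with single spaces.
# (NR > 1 ? " " : "") emits the separator before every line but the first,
# so nothing trails after the last name.
names=$(printf '%s\n' Peter John Markus |
    awk '{ printf "%s\"%s\"", (NR > 1 ? " " : ""), $0 } END { if (NR) print "" }')
printf '%s\n' "$names"
```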
With sed and tr
$ sed 's/.*/"&"/' file | tr '\n' ' '
"Peter" "John" "Markus"
How it works:
s/.*/"&"/
This puts a quote before and after every line
tr '\n' ' '
This replaces newline characters with spaces so that all names appear on the same line.
With sed alone
$ sed ':a;$!{N;ba};s/^/"/; s/$/"/; s/\n/" "/g' file
"Peter" "John" "Markus"
How it works:
:a;$!{N;ba}
This reads the whole file into the pattern space.
s/^/"/
This adds a quote at the beginning of the file
s/$/"/
This adds a quote to the end of the file.
s/\n/" "/g
This replaces every newline with the three characters: quote-space-quote.
With bash
To make the bash script in the question print on one line, one can use echo -n in place of echo. In other words, replace:
echo "${array[i]}"
With:
echo -n "${array[i]} "
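If the names live in a file rather than in a hard-coded array, the loop can be dropped altogether. The following is a minimal sketch, assuming bash 4 or later (for mapfile); quote_names is a hypothetical helper name, not something from the question.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read names (one per line) from stdin and print
# them on one line, each wrapped in double quotes.
quote_names() {
    local -a names
    local joined
    mapfile -t names                         # one array element per input line
    (( ${#names[@]} )) || return 0           # nothing to print for empty input
    printf -v joined '"%s" ' "${names[@]}"   # the format is reused for every element
    printf '%s\n' "${joined% }"              # drop the single trailing space
}

result=$(printf '%s\n' Peter John Markus | quote_names)
printf '%s\n' "$result"
```

With a real file, the call would be quote_names < file.txt.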
Quoting all words on one line
From the comments, suppose that our file has all the names on one line and we want to quote each individually. Use:
$ cat file2
Peter John Markus
$ sed -r 's/[[:alnum:]]+/"&"/g' file2
"Peter" "John" "Markus"
The above is for GNU sed. On OSX or other BSD system, try:
sed -E 's/[[:alnum:]]+/"&"/g' file2

Perl to the rescue:
perl -pe 'chomp; $_ = qq("$_" );chop if eof' < input
Explanation:
-p reads the input line by line and prints what's in $_
chomp removes a newline
$_ = qq("$_" ) puts a " before and "<Space> after the string.
chop if eof removes the trailing space.

Related

Bash; Replacing new line with ", " and ending with ".", can someone explain awk and sed, please?

so let's say i have
aaa
bbb
ccc
ddd
and i need to replace all new lines by comma and space, and end with a dot like so
aaa, bbb, ccc, ddd.
I found this, but i can't understand it at all and i need to know what every single character does ;
... | awk '{ printf $0", " }' | sed 's/.\{2\}$/./'
Can someone make those two commands human-readable ?
tysm!
About the command:
... | awk '{ printf $0", " }' | sed 's/.\{2\}$/./'
Awk prints each line ($0) followed by ", " and no newline. Once every line has been printed, a stray ", " is left at the end.
The pipe to sed then replaces that trailing ", " with a single dot, because .\{2\}$ matches any two characters at the end of the line.
With sed alone, you can use N to pull the next line into the pattern space and a label to keep looping until the last line has been read, then replace every newline with ", ".
After that you can append a dot to the end.
sed ':a;N;$!ba;s/\n/, /g;s/$/./' file
Output
aaa, bbb, ccc, ddd.
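The same sed script can be written one command per line, which makes each step easier to follow. This is a sketch assuming GNU sed and the four-line sample input; a one-line input is a special case, because N then has no next line to read.

```bash
# Same commands as the one-liner, one per line, with sed comments.
joined=$(printf '%s\n' aaa bbb ccc ddd | sed '
# label to loop back to
:a
# append the next input line to the pattern space
N
# not on the last line yet: jump back to :a
$!ba
# every embedded newline becomes comma-space
s/\n/, /g
# finish with a dot
s/$/./
')
printf '%s\n' "$joined"
```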
ok,
first of all; thank u.
I do now understand that printf $0", " means 'print every line, and ", " at the end of each'
as for the sed command, a colleague explained it to me a minute ago;
in 's/.\{2\}$/./',
s/ replace
. any character
{2} x2, so two characters
$ at end of the line
/ by ( 's/ / /' = replace/ this / that /)
. the character '.'
/ end
without forgetting to escape { and }, so u end up with
's/ . \{2\} $ / . /'
but wait, it gets even better;
my colleague also told me that \{2\} wasn't necessary in this case ;
.{2} (without the escapes) could simply be replaced by
.. 'any character' twice.
so 's/..$/./' which is way more readable i think
'replace/ whichever two last characters / this character/'
hope this helps if any other 42 student gets here
tysm again
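A quick way to convince yourself that the two spellings are equivalent is to run both on the same string. The sample text and variable names below are only for illustration.

```bash
# Both patterns match "any two characters at the end of the line".
line='aaa, bbb, ccc, ddd, '
with_interval=$(printf '%s\n' "$line" | sed 's/.\{2\}$/./')
with_dots=$(printf '%s\n' "$line" | sed 's/..$/./')
printf '%s\n%s\n' "$with_interval" "$with_dots"
```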
awk '{ printf $0", " }'
This is an awk command with a single action, enclosed in {...}; the action is applied to every line of input.
printf is print with a format. No formatting takes place here; instead another feature of printf is leveraged: unlike print, printf does not append the output record separator (default: newline). Note that the line itself is used as the format string, so a % in the input would be interpreted as a format specifier.
$0 denotes the whole current line (sans trailing newline).
", " is a string literal for a comma followed by a space.
$0", " instructs awk to concatenate the current line with the comma and space.
Whole command thus might be read as: for every line, output the current line followed by a comma and a space
sed 's/.\{2\}$/./'
s is one of sed's commands, namely substitute; its basic form is
s/regexp/replacement/
therefore
.\{2\}$ is the regular expression: . denotes any single character, \{2\} repeats it 2 times, and $ denotes the end of the line, so it matches the last 2 characters of each line. As the text was already converted to a single line, it matches the last 2 characters of the whole text.
. is the replacement, a literal dot character
Whole command thus might be read as: for every line, replace the last 2 characters with a dot
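Both steps can also be folded into a single awk call. The sketch below is one possible way, not the command from the question: it prints the separator before every line except the first, so there is no trailing ", " to clean up, and it passes the line through "%s" so that a % in the data is printed literally. join_with_commas is a hypothetical name.

```bash
# Hypothetical helper: join stdin lines with ", " and end with a dot.
join_with_commas() {
    awk '{ printf "%s%s", (NR > 1 ? ", " : ""), $0 }
         END { if (NR) print "." }'   # print adds the final newline
}

result=$(printf '%s\n' aaa bbb ccc ddd | join_with_commas)
percent=$(printf '%s\n' '100%' '50%d' | join_with_commas)
printf '%s\n%s\n' "$result" "$percent"
```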
Assuming the four lines are in a file...
#!/bin/sh
cat << EOF >> ed1
%s/$/,/g
$
s/,/./
wq
EOF
ed -s file < ed1
cat file | tr '\n' ' ' > f2
mv f2 file
rm -v ./ed1
echo 'aaa
bbb
ccc
ddd' |
mawk NF+=RS FS='\412' RS= OFS='\40\454' ORS='\456\12'
aaa, bbb, ccc, ddd.

getting first part of a string that has two parts

I have a string that has two parts (path and owner) both separated by a space.
This is the input file input.txt
/dir1/dir2/file1 #owner1
/dir1/dir2/foo\ bar #owner2
I want to extract all the paths to a separate output file - output.txt
I cannot use space as delimiter since paths can also have filenames with space and delimiter in them
/dir1/dir2/file1
/dir1/dir2/foo\ bar
Here could be a different way of doing it with rev + GNU grep:
rev file | grep -oP '.*# \K.*' | rev
OR
rev file | grep -oP '.*#\s+\K.*' | rev
With original simple solution go with:
awk -F' #' '{print $1}' Input_file
Assuming spaces that shouldn't be parsed as delimiters are escaped by a backslash as in your sample, you could use the following regex :
^(\\ |[^ ])*
For instance with grep :
grep -oE '^(\\ |[^ ])*'
The regex matches, from the start of the line, any number of either a backslash followed by a space or any character other than a space, and stops at the first occurrence of a space that isn't preceded by a backslash.
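To check the regex locally, it can be run against the two sample lines from the question. This sketch writes them to a temporary file; the variable names are only for illustration.

```bash
# Recreate the sample input in a temporary file.
input=$(mktemp)
printf '%s\n' '/dir1/dir2/file1 #owner1' '/dir1/dir2/foo\ bar #owner2' > "$input"

# -o prints only the matched part, -E enables extended regular expressions.
paths=$(grep -oE '^(\\ |[^ ])*' "$input")
rm -f "$input"
printf '%s\n' "$paths"
```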
I would trim the ending part with sed.
sed 's/ [^ ]*$//' /path/to/file
This will match from the end of the line:
(blank) matches the space character
[^ ]* matches the longest string that contains no spaces, i.e. #owner1
$ matches the end of the line
And they will be replaced by nothing, which will act as if you deleted the matched string.
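The same "drop everything from the last space onward" idea can be expressed in plain bash with parameter expansion. This is a sketch under the same assumption as the sed command (the owner part contains no spaces); strip_owner is a hypothetical name.

```bash
# Hypothetical helper: remove the last space-separated word from each line.
strip_owner() {
    local line
    # "|| [[ -n $line ]]" also handles a final line without a newline.
    while IFS= read -r line || [[ -n $line ]]; do
        printf '%s\n' "${line% *}"   # % removes the shortest suffix matching " *"
    done
}

paths=$(printf '%s\n' '/dir1/dir2/file1 #owner1' '/dir1/dir2/foo\ bar #owner2' | strip_owner)
printf '%s\n' "$paths"
```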
A one-line would do it:
while read p _; do printf '%q\n' "$p"; done <input.txt >output.txt
You can put them in an array and process
mapfile test < input.txt; test=("${test[@]% *}")
echo "${test[@]}"
echo "${test[0]}"
echo "${test[1]}"
You can try with simple awk
awk ' { $NF=""; print } '
Try it here https://ideone.com/W8J1ZO

Converting a list into a single line string in bash [duplicate]

I have a file that contains a list:
line1
line2
line3
.
.
.
I want to join everything in a string in bash like so:
"line1", "line2", "line3", ...... (no comma at the end)
How do I do that?
Edit: This is not a duplicate of How to join multiple lines of file names into one with custom delimiter? because I am trying to get every item in double quotes while joining.
For an input file like
line1
line2
line3
I would use sed and tr as follows:
$ sed 's/.*/"&"/;$!s/$/, /' infile | tr -d '\n'
"line1", "line2", "line3"
The first sed command quotes every line; the second one adds ", " to the end of every line except the last (the $! address: "not the last line").
tr then removes all linebreaks.
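Wrapped in a small function, the pipeline is easy to test, including the single-line case where the $! address means no comma is added at all. This is a sketch; quote_csv is a hypothetical name.

```bash
# Hypothetical helper: quote every stdin line and join with ", ".
quote_csv() {
    sed 's/.*/"&"/;$!s/$/, /' | tr -d '\n'
    echo    # tr removed every newline, so add one back at the end
}

three=$(printf '%s\n' line1 line2 line3 | quote_csv)
one=$(printf '%s\n' only | quote_csv)
printf '%s\n%s\n' "$three" "$one"
```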
If you want to do it completely in Bash:
#!/usr/bin/env bash
mapfile -t arr < infile # Read file into array
arr=("${arr[@]/#/\"}") # Prepend " to each element
arr=("${arr[@]/%/\"}") # Append " to each element
IFS=, # Set separator to comma
str="${arr[*]}" # Comma separated string
printf '%s\n' "${str//\",\"/\", \"}" # Insert space after commas
Using awk:
$ awk '{ printf "%s\"%s\"", (NR==1?"":", "), $0 } END{ print "" }' file
"line1", "line2", "line3"
Let's say that your file name is test; then this will do the trick:
while IFS= read -r i; do echo " \"$i\""; done < test | paste -sd "," |sed 's/ //1'
Building a general solution to most problems of this kind.
Using a general start, middle and end string
bash
A solution with only bash:
#!/bin/bash
infile="infile"
start='"'
middle='", "'
end='"'
res="$start" # start the result with str "$start".
while IFS=$'\n' read -r line # for each line in the file.
do res="${res}${line}${middle}" # add the line and the middle str.
done <"$infile" # for file "$infile"
res="${res%"$middle"}${end}" # remove "$middle" on the last line.
printf '%s\n' "${res}" # print result.
awk
And a solution for larger files with awk:
#!/bin/bash
infile="infile"
start='"'
middle='", "'
end='"'
awk -vs="$start" -vm="$middle" -ve="$end" '
BEGIN{printf("%s",s)}
{
if(ll){printf("%s%s",ll,m)}
ll=$0
}
END{printf("%s%s%s",ll,e,"\n")}
' "$infile"

Replacing/removing excess white space between columns in a file

I am trying to parse a file with similar contents:
I am a string 12831928
I am another string 41327318
A set of strings 39842938
Another string 3242342
I want the out file to be tab delimited:
I am a string\t12831928
I am another string\t41327318
A set of strings\t39842938
Another string\t3242342
I have tried the following:
sed 's/\s+/\t/g' filename > outfile
I have also tried cut, and awk.
Just use awk:
$ awk -F'  +' -v OFS='\t' '{sub(/ +$/,""); $1=$1}1' file
I am a string 12831928
I am another string 41327318
A set of strings 39842938
Another string 3242342
Breakdown:
-F'  +' # tell awk that input fields (FS) are separated by 2 or more blanks (note the two spaces before the +)
-v OFS='\t' # tell awk that output fields are separated by tabs
'{sub(/ +$/,""); # remove all trailing blank spaces from the current record (line)
$1=$1} # recompile the current record (line) replacing FSs by OFSs
1' # idiomatic: any true condition invokes the default action of "print"
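A quick check of the two-or-more-blanks separator on made-up input with runs of spaces and trailing blanks. The separator is spelled [ ][ ]+ here only so that both spaces stay visible; the sample lines and the variable name are illustrative.

```bash
# Fields are split on runs of 2+ spaces, so single spaces inside the
# text survive; $1=$1 rebuilds the record with OFS (a tab).
out=$(printf '%s\n' 'I am a string     12831928   ' 'Another string   3242342  ' |
    awk -F'[ ][ ]+' -v OFS='\t' '{ sub(/ +$/, ""); $1 = $1 } 1')
printf '%s\n' "$out"
```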
I highly recommend the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
The difficulty comes in the varying number of words per-line. While you can handle this with awk, a simple script reading each word in a line into an array and then tab-delimiting the last word in each line will work as well:
#!/bin/bash
fn="${1:-/dev/stdin}"
while read -r line || test -n "$line"; do
arr=( $(echo "$line") )
nword=${#arr[@]}
for ((i = 0; i < nword - 1; i++)); do
test "$i" -eq '0' && word="${arr[i]}" || word=" ${arr[i]}"
printf "%s" "$word"
done
printf "\t%s\n" "${arr[i]}"
done < "$fn"
Example Use/Output
(using your input file)
$ bash rfmttab.sh < dat/tabfile.txt
I am a string 12831928
I am another string 41327318
A set of strings 39842938
Another string 3242342
Each number is tab-delimited from the rest of the string. Look it over and let me know if you have any questions.
sed -E 's/[ ][ ]+/\\t/g' filename > outfile
NOTE: the [ ] is openBracket Space closeBracket
-E for extended regular expression support.
The double brackets [ ][ ]+ is to only substitute tabs for more than 1 consecutive space.
Tested on MacOS and Ubuntu versions of sed.
Your input has spaces at the end of each line, which makes things a little more difficult than without. This sed command would replace the spaces before that last column with a tab:
$ sed 's/[[:blank:]]*\([^[:blank:]]*[[:blank:]]*\)$/\t\1/' infile | cat -A
I am a string^I12831928 $
I am another string^I41327318 $
A set of strings^I39842938 $
Another string^I3242342 $
This matches – anchored at the end of the line – blanks, non-blanks and again blanks, zero or more of each. The last column and the optional blanks after it are captured.
The blanks before the last column are then replaced by a single tab, and the rest stays the same – see output piped to cat -A to show explicit line endings and ^I for tab characters.
If there are no blanks at the end of each line, this simplifies to
sed 's/[[:blank:]]*\([^[:blank:]]*\)$/\t\1/' infile
Notice that some seds, notably BSD sed as found in MacOS, can't use \t for tab in a substitution. In that case, you have to use either '$'\t'' or '"$(printf '\t')"' instead.
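One way to stay portable is to put a literal tab into a shell variable and splice it into a double-quoted sed script. This is a sketch of that workaround; the variable names and the sample line are only for illustration.

```bash
# A literal tab character; command substitution only strips newlines.
tab=$(printf '\t')

# Inside double quotes, \$ keeps the end-of-line anchor away from the
# shell, while \( \) and \1 are passed to sed unchanged.
out=$(printf '%s\n' 'I am a string     12831928' |
    sed "s/[[:blank:]]*\([^[:blank:]]*\)\$/${tab}\1/")
printf '%s\n' "$out"
```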
another approach, with gnu sed and rev
$ rev file | sed -r 's/ +/\t/1' | rev
You have trailing spaces on each line. So you can do two sed expressions in one go like so:
$ sed -E -e 's/ +$//' -e $'s/  +/\t/' /tmp/file
I am a string 12831928
I am another string 41327318
A set of strings 39842938
Another string 3242342
Note the $'s/  +/\t/' (two spaces before the +): the $'...' quoting tells bash to replace \t with an actual tab character prior to invoking sed.
To show that these deletions and \t insertions are in the right place you can do:
$ sed -E -e 's/ +$/X/' -e $'s/  +/Y/' /tmp/file
I am a stringY12831928X
I am another stringY41327318X
A set of stringsY39842938X
Another stringY3242342X
Simple and without invisible semantic characters in the code:
perl -lpe 's/\s+$//; s/\s\s+/\t/' filename
Explanation:
Options:
-l: remove LF during processing (in this case)
-p: loop over records (like awk) and print
-e: code follows
Code:
remove trailing whitespace
change two or more whitespace to tab
Tested on OP data. The trailing spaces are removed for consistency.

Find and Replace with awk

I have this value, cut from a .txt file:
,Request Id,dummy1,dummy2,dummyN
I am trying to find and replace the space with "_", like this:
#iterator to read lines of txt
#if conditions
trim_line=$(echo "$user" | awk '{gsub(" ", "_", $0); print}')
echo $trim_line
but the echo is showing:
Id,dummy1,dummy2,dummyN
Expected output:
,Request_Id,dummy1,dummy2,dummyN
Where is my bug?
EDIT:
The echo of user is not the expected, it is:
Id,dummy1,dummy2,dummyN
And should be:
,Request Id,dummy1,dummy2,dummyN
To do this operation I am using:
for user in $(cut -d: -f1 $FILENAME)
do (....) find/replace
You can try bash search and replace substring :
echo $user
,Request Id,dummy1,dummy2,dummyN
echo ${user// /_} ## For all the spaces
,Request_Id,dummy1,dummy2,dummyN
echo ${user/ /_} ## For first match
This will replace all the blank spaces with _. Note that here two / are used after user. This is to do the search and replace operation on whole text. If you put only one / then search and replace would be done over first match.
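The difference between the two forms is easiest to see on a string with more than one space. The sample value below is made up for illustration.

```bash
user=',Request Id,dummy1 dummy2'
all=${user// /_}     # double slash: replace every space
first=${user/ /_}    # single slash: replace only the first space
printf '%s\n%s\n' "$all" "$first"
```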
Your problem is your use of a for loop to read the contents of your file. The shell splits the output of your command substitution $(cut -d: -f1 $FILENAME) on white space and you have one in the middle of your line, so it breaks.
Use a while read loop to read the file line by line:
while IFS=: read -r col junk; do
col=${col// /_}
# use $col here
done < "$FILENAME"
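The difference between the two loops can be reproduced with a one-line sample file. This is a sketch: the ":rest" suffix is an assumption about what follows the colon in the real file, and the array names are only for illustration.

```bash
input=$(mktemp)
printf '%s\n' ',Request Id,dummy1,dummy2,dummyN:rest' > "$input"

# for + command substitution: the output is split on whitespace,
# so the single line arrives as two separate words.
for_words=()
for user in $(cut -d: -f1 "$input"); do
    for_words+=("$user")
done

# while read: one iteration per line; IFS=: splits on the colon only.
read_cols=()
while IFS=: read -r col _; do
    read_cols+=("${col// /_}")
done < "$input"
rm -f "$input"

printf '%s\n' "${#for_words[@]} words from for, ${#read_cols[@]} line from while read"
```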
As others have mentioned, there's no need to use an external tool to make the substitution.
...That said, if you don't plan on doing something different (e.g. executing other commands) with each line, then the best option is to use awk:
awk -F: '{ gsub(/ /, "_", $1); print $1 }' "$FILENAME"
The output of this command is the first column of your input file, with the substitution made.
If your data is already in a shell variable, the fastest way is to use bash's built-in replacement feature directly:
echo "${user// /_}"
With awk, set the separator as , or the space character will be interpreted as the separator.
echo ",Request Id,dummy1,dummy2,dummyN" | awk -F, '{gsub(" ", "_", $0); print}'
,Request_Id,dummy1,dummy2,dummyN
note: if it's just to replace a character in a raw string (no tokens, no fields), bash, sed and tr are best suited.

Resources