How to get the desired output using a bash script? - bash

I am trying to produce the output shown below. I searched the internet, but I did not know the right keywords to search for, so I am posting my question here.
What I have tried so far is shown in this MWE:
cat data.csv|sed 's/\n.*//g'
I have a CSV file, data.csv, whose contents are shown below:
10,1,1,"line 1 text"
10,1,2,"line 2 text"
10,1,3,"line 3 text"
10,1,4,"line 4 text"
10,1,5,
line 5 text
10,1,6,"<J>
line 6 text"
10,1,7,"line 7 text"
10,1,8,"
line 8 text"
10,1,9,"line 9 text"
I want the output to be as shown below:
10,1,1,"line 1 text"
10,1,2,"line 2 text"
10,1,3,"line 3 text"
10,1,4,"line 4 text"
10,1,5,"line 5 text"
10,1,6,"<J>line 6 text"
10,1,7,"line 7 text"
10,1,8,"line 8 text"
10,1,9,"line 9 text"

With GNU awk for multi-char RS, RT, and gensub(), you can just describe each record as a series of 4 comma-separated fields ending in a newline and then remove the newlines and the spaces around them:
$ awk -v RS='([^,]*,){3}[^,]*\n' '{$0=gensub(/\s*\n\s*/,"","g",RT)} 1' file
10,1,1,"line 1 text"
10,1,2,"line 2 text"
10,1,3,"line 3 text"
10,1,4,"line 4 text"
10,1,5,line 5 text
10,1,6,"<J>line 6 text"
10,1,7,"line 7 text"
10,1,8,"line 8 text"
10,1,9,"line 9 text"
and to ensure quotes around the last field:
$ awk -v RS='([^,]*,){3}[^,]*\n' '{$0=gensub(/\s*\n\s*/,"","g",RT); $0=gensub(/,([^",]*)$/,",\"\\1\"",1)} 1' file
10,1,1,"line 1 text"
10,1,2,"line 2 text"
10,1,3,"line 3 text"
10,1,4,"line 4 text"
10,1,5,"line 5 text"
10,1,6,"<J>line 6 text"
10,1,7,"line 7 text"
10,1,8,"line 8 text"
10,1,9,"line 9 text"
Note that this will work no matter how many lines your 4th field is split over:
$ cat file
10,1,1,"line 1 text"
10,1,2,
foo
line
2
text
bar
10,1,3,"line 3 text"
$ awk -v RS='([^,]*,){3}[^,]*\n' '{$0=gensub(/\s*\n\s*/,"","g",RT); $0=gensub(/,([^",]*)$/,",\"\\1\"",1)} 1' file
10,1,1,"line 1 text"
10,1,2,"fooline2textbar"
10,1,3,"line 3 text"
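If GNU awk is not available, a rough approximation of the same idea can be sketched with POSIX awk features only. This is not the method above and rests on an assumption: it treats any line that starts with three comma-terminated fields as the start of a new record and glues every other line onto the current record, so it would misfire if a continuation line itself contained three or more commas. The sample file is made up for illustration, and the last field is not re-quoted.

```shell
# Hypothetical sample input, written to a temporary file.
tmp=$(mktemp)
printf '%s\n' '10,1,1,"line 1 text"' '10,1,5,' '   line 5 text' \
    '10,1,6,"<J>' 'line 6 text"' > "$tmp"

joined=$(awk '
    # A line beginning with three comma-terminated fields starts a new record.
    /^[^,]*,[^,]*,[^,]*,/ {
        if (NR > 1) print rec
        rec = $0
        next
    }
    # Any other line is a continuation: trim its leading blanks and append it.
    {
        sub(/^[ \t]+/, "")
        rec = rec $0
    }
    END { if (NR > 0) print rec }
' "$tmp")

printf '%s\n' "$joined"
rm -f "$tmp"
```

Because it never looks at RT or uses gensub(), this should also run under mawk or a BSD awk.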

In addition to Cyrus's answer, to ensure 'line 5 text' is surrounded with double-quotes you can add additional expressions to replace the ', ' with ',"' and lines that do not end in '"' with a '"', e.g.
sed -e '/".*"$/!{N;s/\n *//}' -e 's/, /,"/' -e '/"$/!{s/$/"/}' file
The first expression is exactly the same. This would provide your requested output of:
$ sed -e '/".*"$/!{N;s/\n *//}' -e 's/, /,"/' -e '/"$/!{s/$/"/}' file
10,1,1,"line 1 text"
10,1,2,"line 2 text"
10,1,3,"line 3 text"
10,1,4,"line 4 text"
10,1,5,"line 5 text"
10,1,6,"<J>line 6 text"
10,1,7,"line 7 text"
10,1,8,"line 8 text"
10,1,9,"line 9 text"

With GNU sed:
sed '/".*"$/!{N;s/\n *//}' file
If a line does not match the regex ".*"$, append the next line (N) to sed's pattern space and replace the newline followed by zero or more spaces with nothing (s/\n *//).
Output:
10,1,1,"line 1 text"
10,1,2,"line 2 text"
10,1,3,"line 3 text"
10,1,4,"line 4 text"
10,1,5, line 5 text
10,1,6,"<J>line 6 text"
10,1,7,"line 7 text"
10,1,8,"line 8 text"
10,1,9,"line 9 text"
I did not add the missing quotation marks in line 5.
See: man sed and The Stack Overflow Regular Expressions FAQ
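To see the join in action, here is a small sketch that runs the same command on a made-up three-record sample (the temporary file and its contents are assumptions for illustration). The only change is a ; before the closing brace, which some non-GNU seds require.

```shell
# Hypothetical sample: one complete record and two records split over two lines.
tmp=$(mktemp)
printf '%s\n' '10,1,4,"line 4 text"' '10,1,6,"<J>' 'line 6 text"' \
    '10,1,8,"' '   line 8 text"' > "$tmp"

# Lines without a closing quote at the end pull in the next line (N);
# the embedded newline and any spaces after it are then deleted.
joined=$(sed '/".*"$/!{N;s/\n *//;}' "$tmp")

printf '%s\n' "$joined"
rm -f "$tmp"
```

Note that N appends only one line per cycle, so a field split over three or more lines would need a loop or the awk approach.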

Related

Cannot print in awk command in bash script

I am trying to read values from a file and print specific items into a variable which I will use later.
cat /dir1/file1 | while read blmbline2
do
BLMBFILE2=`print $blmbline2 | awk '{$1=""; print $0}'`
echo $BLMBFILE2
done
When I run that same code at the command line, it runs as expected, but, when I run it in a bash script called testme.sh, I get this error:
./testme.sh: line 3: print: command not found
If I run print by itself at the command prompt, I don't get an error (just a blank line).
If I run "bash" and then print at the command prompt, I get command not found.
I can't figure out what I'm doing wrong. Can someone suggest?
updated: I see some other posts that say to use echo or printf? Is there a difference I need to be concerned with in using one of those in bash?
Since awk can read files, you may be able to do away with the cat | while read and just use awk. Using a sample file containing:
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
Declare your bash array variable and populate with the output from awk:
arr=() ; arr=($(awk '{$1=""; print $0}' /dir1/file1))
Use the following to display array size and contents:
printf "array length: %d\narray contents: %s\n" "${#arr[@]}" "${arr[*]}"
Output:
array length: 30
array contents: 2 3 4 5 6 2 3 4 5 6 2 3 4 5 6 2 3 4 5 6 2 3 4 5 6 2 3 4 5 6
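The length is 30 rather than 6 because the unquoted $(...) is split on whitespace, so every field becomes its own element. If one element per input line is what you want, a possible sketch (assuming bash 4 or later for mapfile; the two-line sample file is made up) is:

```shell
tmp=$(mktemp)
printf '1 2 3 4 5 6\n1 2 3 4 5 6\n' > "$tmp"

# mapfile -t stores each output line as one array element, newline stripped.
mapfile -t rows < <(awk '{$1=""; print $0}' "$tmp")

printf 'array length: %d\n' "${#rows[@]}"
# Brackets make the leading space left behind by $1="" visible.
printf '[%s]\n' "${rows[@]}"
rm -f "$tmp"
```

Each element keeps the leading space that awk leaves when the first field is emptied.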
Change print to echo in your shell script; print is a builtin of ksh and zsh, and bash does not provide it. printf lets you format the data, while echo simply prints the entire line of the file. Also, create an array so you can store multiple items:
BLMBFILE2=()
while IFS= read -r line
do
    # Drop the first field and append the remaining fields to the array
    BLMBFILE2+=($(echo "$line" | awk '{$1=""; print $0}'))
    # Show the items collected so far
    echo "${BLMBFILE2[*]}"
done < /dir1/file1
echo "Items found:"
for value in "${BLMBFILE2[@]}"
do
    echo "$value"
done
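On the echo versus printf question, one practical difference is that echo may treat its argument as an option, while printf with an explicit format prints the data verbatim. A minimal sketch (the variable name line is just for illustration):

```shell
line='-n'

# echo treats -n as its "no trailing newline" option, so nothing is printed.
from_echo=$(echo "$line")

# printf only interprets the format string; the data is passed through unchanged.
from_printf=$(printf '%s\n' "$line")

printf 'echo gave: [%s]\n' "$from_echo"
printf 'printf gave: [%s]\n' "$from_printf"
```

For data read from a file, printf '%s\n' "$var" is therefore the safer habit.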

How to use sed and cut to find and replace value at certain position of line in a file

I have a case where I have to replace the number 1 with the number 3 at the 10th position of various lines in a stored text file. I am unable to find a way to do that. Below are a sample file and my code.
Sample file:
$ cat testdata.txt
1 43515 8 1 victor Samuel 20190112
3215736 4 6 Michael pristine 20180923
1 56261 1 1 John Carter 19880712
#!/bin/sh
filename=testdata.txt
echo "reading number of line"
nol=$(cat $filename | wc -l)
flag[$nol]=''
echo "reading content of file"
for i in (1..$nol)
do
flag=($cut -c10-11 $filename)
if($flag==1)
sed 's/1/3/2'
fi
done
But this is not working.
Please help to resolve this.
Updated:
Sample Output:
1 43515 8 1 victor Samuel 20190112
3215736 4 6 Michael pristine 20180923
1 56261 3 1 John Carter 19880712
Try this:
sed "s/^\(.\{8\}\) 1 \(.*\)$/\1 3 \2/g" testdata.txt > new_testdata.txt
If your sed supports the -i option, you can also edit the file in place:
sed -i "s/^\(.\{8\}\) 1 \(.*\)$/\1 3 \2/g" testdata.txt
Output:
1 43515 8 1 victor Samuel 20190112
3215736 4 6 Michael pristine 20180923
1 56261 3 1 John Carter 19880712
Explanation:
s # substitute
/^\( # from start of line, save into arg1
.\{8\} # the first 8 characters
\) 1 \( # search pattern ' 1 '
.* # save the rest into arg2
\)$/ # to the end of the line
\1 3 \2 # output: arg1 3 arg2
/g # global on whole line
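If the target really is a fixed character position, a sketch that counts characters instead of matching the surrounding spaces may be easier to adapt. The sample lines below are made up so that the digit of interest sits at position 10; your real column layout may differ.

```shell
# Two hypothetical fixed-width lines; character 10 is "1" in the first and "4" in the second.
input=$(printf '%s\n' '1  56261 1 1 John' '3215736  4 6 Michael')

# Capture the first 9 characters, then replace a "1" at position 10 with "3".
# \1 is the captured prefix; the 3 after it is a literal character.
result=$(printf '%s\n' "$input" | sed 's/^\(.\{9\}\)1/\13/')

printf '%s\n' "$result"
```

Lines whose 10th character is not 1 pass through unchanged, so no separate if test or loop is needed.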

How to search and replace content of a file, starting from a specific line number in bash

I have the file below.
$ cat testfile
line 1
line 2
line 3
line 4
line 5
line 6
$
I need to replace every occurrence of the string 'line' with 'LINE' from line number 2 to the end of the file. I tried the following:
$ sed '2 s/line/LINE/g' testfile
line 1
LINE 2
line 3
line 4
line 5
line 6
$
But my required output is :
line 1
LINE 2
LINE 3
LINE 4
LINE 5
LINE 6
$
How can I achieve this with the sed command alone?
Try this:
# sed '2,$ s/line/LINE/g' /tmp/testfile
line 1
LINE 2
LINE 3
LINE 4
LINE 5
LINE 6
2,$ denotes the range from the second line to the end of the file.
You are looking for:
sed '2,$s/line/LINE/g' file
I suggest reading the man/info page of sed, specifically the part on addresses.
See: https://www.gnu.org/software/sed/manual/html_node/Addresses.html
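The same address syntax covers other ranges too. As a quick sketch (input generated inline rather than read from testfile), 3,5 restricts the substitution to lines 3 through 5:

```shell
# printf reuses its format for each argument, producing "line 1" .. "line 6".
result=$(printf 'line %d\n' 1 2 3 4 5 6 | sed '3,5 s/line/LINE/')
printf '%s\n' "$result"
```

Lines 1, 2 and 6 keep the lowercase spelling; only lines 3 to 5 are changed.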
sed '2,$ s/line/LINE/' testfile
If you want to change the file in place (as you would with sed's -i parameter), I recommend ed instead of sed.
echo -e '2,$ s/line/LINE/\nw' | ed -sv testfile

How to delete leading newline in a string in bash?

I'm having the following issue. I have an array of numbers:
text="\n1\t2\t3\t4\t5\n6\t7\t8\t9\t0"
And I'd like to delete the leading newline.
I've tried
sed 's/.//' <<< "$text"
cut -c 1- <<< "$text"
and some iterations. But the issue is that both of those delete the first character AFTER EVERY newline. Resulting in this:
text="\n\t2\t3\t4\t5\n\t7\t8\t9\t0"
This is not what I want and there doesn't seem to be an answer to this case.
Is there a way to tell either of those commands to treat newlines like characters and the entire string as one entity?
awk to the rescue!
awk 'NR>1'
Of course, you can do the same with tail -n +2 or sed 1d as well.
You can probably use the substitution modifier (see parameter expansion and ANSI C quoting in the Bash manual):
$ text=$'\n1\t2\t3\t4\t5\n6\t7\t8\t9\t0'
$ echo "$text"

1 2 3 4 5
6 7 8 9 0
$ echo "${text/$'\n'/}"
1 2 3 4 5
6 7 8 9 0
$
It replaces the first newline with nothing, as requested. However, note that it is not anchored to the first character:
$ alt="${text/$'\n'/}"
$ echo "${alt/$'\n'/}"
1 2 3 4 56 7 8 9 0
$
Using a caret ^ before the newline doesn't help — it just means there's no match.
As pointed out by rici in the comments, if you read the manual page I referenced, you can find how to anchor the pattern at the start with a # prefix:
$ echo "${text/#$'\n'/}"
1 2 3 4 5
6 7 8 9 0
$ echo "${alt/#$'\n'/}"
1 2 3 4 5
6 7 8 9 0
$
The notation bears no obvious resemblance to other regex systems; you just have to know it.
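Another option worth knowing is the prefix-removal expansion ${var#pattern}, which only ever matches at the start of the string, so no anchor is needed. A short sketch, assuming bash for the $'...' quoting:

```shell
text=$'\n1\t2\t3\t4\t5\n6\t7\t8\t9\t0'

# Remove one leading newline if present; later newlines are untouched.
trimmed=${text#$'\n'}

# Applying it again is a no-op because the string no longer starts with a newline.
again=${trimmed#$'\n'}

printf '%s\n' "$trimmed"
```

Unlike the unanchored ${text/$'\n'/} form, repeating this expansion cannot accidentally join the two rows.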

Randomly sample lines retaining commented header lines

I'm attempting to randomly sample lines from a (large) file, while always retaining a set of "header lines". Header lines are always at the top of the file and unlike any other lines, begin with a #.
The actual file format I'm dealing with is a VCF, but I've kept the question general.
Requirements:
Output all header lines (identified by a # at line start)
The command / script should (have the option to) read from STDIN
The command / script should output to STDOUT
For example, consider the following sample file (file.in):
#blah de blah
1
2
3
4
5
6
7
8
9
10
An example output (file.out) would be:
#blah de blah
10
2
5
3
4
I have a working solution (in this case selecting 5 non-header lines at random) using bash. It is capable of reading from STDIN (I can cat the contents of file.in into the rest of the command) however it writes to a named file rather than STDOUT:
cat file.in | tee >(awk '$1 ~ /^#/' > file.out) | awk '$1 !~ /^#/' | shuf -n 5 >> file.out
By using process substitution (thanks Tom Fenech), both commands are seen as files.
Then using cat we can concatenate these "files" together and output to STDOUT.
cat <(awk '/^#/' file) <(awk '!/^#/' file | shuf -n 10)
Input
#blah de blah
1
2
3
4
5
6
7
8
9
10
Output
#blah de blah
1
9
8
4
7
2
3
10
6
5
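Note that the command above reads the file twice by name, so it does not cover the read-from-STDIN requirement. One way to handle that, sketched here under the assumption that buffering the input in a temporary file is acceptable and that GNU shuf is installed, is a small function (sample_keep_header is a made-up name) that saves STDIN once and then emits the headers followed by the sample, all on STDOUT:

```shell
# Usage: some_command | sample_keep_header N
sample_keep_header() {
    n=$1
    buf=$(mktemp)
    cat > "$buf"                         # save STDIN so it can be read twice
    grep '^#' "$buf"                     # header lines, in original order
    grep -v '^#' "$buf" | shuf -n "$n"   # N random non-header lines
    rm -f "$buf"
}

out=$(printf '%s\n' '#blah de blah' 1 2 3 4 5 6 7 8 9 10 | sample_keep_header 5)
printf '%s\n' "$out"
```

The header always comes first because the two grep commands run one after the other rather than in parallel; the five sampled lines differ from run to run.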

Resources