bash read split file after string - bash

I am trying to create a shell script to split text files after one specific STRING.
Line of text
Line of text
STRING
Line of text
Line of text
I want to end up with 2 files: one from the beginning up to STRING, and the other from STRING to the end.
Thanks for any help

With sed:
sed -n '1,/STRING/p' inputfile > file1
sed -n '/STRING/,$p' inputfile > file2
With awk:
awk '/STRING/{flag=1;print>"file1"}
flag {print>"file2";next}
{print>"file1"}
' inputfile
If the line must consist of exactly STRING and nothing more, replace STRING with ^STRING$ in the scripts above.
If you don't want STRING to be present in the first file:
awk '/STRING/{flag=1}
flag {print>"file2";next}
{print>"file1"}
' inputfile
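To check how the variants above behave, here is a small self-contained run. It is only a sketch: the sample contents and the output names part1 and part2 are made up for the illustration, and everything happens in a scratch directory so no existing file is overwritten.

```shell
# Work in a scratch directory (hypothetical sample data, not from the question).
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' 'Line 1' 'Line 2' 'STRING' 'Line 3' 'Line 4' > inputfile

# sed: the STRING line ends file1 and also starts file2.
sed -n '1,/STRING/p' inputfile > file1
sed -n '/STRING/,$p' inputfile > file2

# awk, single pass: the STRING line goes only to part2.
# ^STRING$ restricts the match to lines that are exactly STRING.
awk '/^STRING$/ {flag=1}
     flag       {print > "part2"; next}
                {print > "part1"}' inputfile

cat part1    # the two lines before STRING
cat part2    # STRING and everything after it
```

The awk version reads the input once, while the sed version reads it twice; for small files either is fine.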

Related

How to add new line in file in bash?

my input file contains
<arg>arg1</arg>
<arg>arg2</arg>
<arg>arg3</arg>
<arg>arg4</arg>
Now I want to add a new line, <arg>arg5</arg>.
I used the command below:
awk '{gsub("<arg>arg4</arg>", "<arg>arg4</arg>\n<arg>arg5</arg>", $0); print}' inputfile > tempfile
But it's not working at all. It's also not giving any errors.
Please help me out here.
You can use a simple string comparison to avoid escaping of special characters like $, ( and ) in regular expressions:
awk '1
$0 == "<arg>arg4</arg>"{
print "<arg>arg5</arg>"
}
' inputfile > tempfile
The leading 1 is an always-true pattern, so every line is printed. If the current line is exactly <arg>arg4</arg>, the second rule then prints <arg>arg5</arg> right after it.
If the search string is only part of the line (padded by whitespace, for example), you could use index to get the position of the search string and insert the new string after it:
# define two shell variables
search='<arg>arg4</arg>'
insert='<arg>arg5</arg>'
awk -v search="$search" -v insert="$insert" '
{
idx=index($0, search)
if (idx){
print substr($0, 1, idx+length(search)-1) ORS insert substr($0, idx+length(search))
next
}
}1' inputfile > tempfile
The long print statement prints the following parts
the string before the search string + the search string itself
a newline
the insert string
the string after the search string (possibly empty)
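The four parts listed above can be seen in a small run. This is a sketch with made-up input: the line is indented, has trailing text, and the search string contains the characters ( ) and $, which index() compares literally, so no escaping is needed. The variable names input, end and result are introduced only for this example.

```shell
# Hypothetical input: search text with regex metacharacters, surrounded by other text.
input='  <arg>cost($5)</arg> tail'
search='<arg>cost($5)</arg>'
insert='<arg>arg5</arg>'

result=$(printf '%s\n' "$input" |
  awk -v search="$search" -v insert="$insert" '
  {
    idx = index($0, search)
    if (idx) {
      end = idx + length(search)     # position of the first character after the match
      print substr($0, 1, end - 1) ORS insert substr($0, end)
      next
    }
  }1')
printf '%s\n' "$result"
```

The first output line keeps the indentation and the search string; the second starts with the inserted string and carries whatever followed the match. One caveat: awk processes backslash escapes in values passed with -v, so a search string containing backslashes would need extra care.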
One way using sed:
File1:
$ cat file1
<arg>arg1</arg>
<arg>arg2</arg>
<arg>arg3</arg>
<arg>arg4</arg>
File2:
$ cat file2
<arg>arg5</arg>
sed command:
$ sed -i '$r file2' file1
Check file1:
$ cat file1
<arg>arg1</arg>
<arg>arg2</arg>
<arg>arg3</arg>
<arg>arg4</arg>
<arg>arg5</arg>
With sed we can simply read the contents of another file into the current one.
$r file2: the r command reads file2 and appends it after the addressed line, and the address $ selects the last line. -i edits file1 in place.
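Since -i changes file1 directly (and its syntax differs between GNU and BSD sed), it can be worth trying the command without it first. A minimal sketch, run in a scratch directory; the output name merged is an assumption made for this example:

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '<arg>arg%s</arg>\n' 1 2 3 4 > file1    # four lines, arg1 .. arg4
printf '<arg>arg5</arg>\n' > file2

# Without -i, sed writes the combined text to standard output,
# so the result can be inspected before file1 is replaced.
sed '$r file2' file1 > merged
cat merged
```

Here file1 is left untouched and merged holds the five lines.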

awk: copy from A to B and output..?

My file is a bookmarks backup, backup-6.session.
The file is one very long run of text, and I need to copy out all the URLs (there are many). Here is an example of what is inside:
......"charset":"UTF-8","ID":3602197775,"docshellID":0,"originalURI":"https://www.youtube.com/watch?v=axxxxxxxxsxsx","docIdentifier":470,"structuredCloneState":"AAAAA.....
The result should go to an output file, text.txt:
https://www.youtube.com/watch?v=axxxxxxxxsxsx
https://www.youtube.com/watch?v=bxxxxxxxxsxsx
https://www.youtube.com/watch?v=cxxxxxxxxsxsx
https://www.youtube.com/watch?v=dxxxxxxxxsxsx
....
....
Each URL starts right after "originalURI":" and ends at the next ".
The command could be awk, sed, ... (I don't know which is best for this).
Thank you.
With GNU awk for multi-char RS and RT:
$ awk -v RS='"originalURI":"[^"]+' 'sub(/.*"/,"",RT){print RT}' file
https://www.youtube.com/watch?v=axxxxxxxxsxsx
You could also use grep, for example:
grep -oh "https://www\.youtube\.com/watch?v=[A-Za-z0-9]*" backup-6.session > text.txt
That is if the axxxxxxxxsxsx part contains only letters from A-Z, a-z or digits 0-9, and is not followed by any of those.
Notice the flags for grep:
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
-h, --no-filename
Suppress the prefixing of file names on output. This is the default
when there is only one file (or only standard input) to search.
The awk solution would be as follows:
awk -F, '{ for (i=1;i<=NF;i++) { if ( $i ~ "originalURI") { split($i,add,":");print gensub("\"","","g",add[2])":"gensub("\"","","g",add[3])} } }' filename
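We loop through each field separated by "," and pattern match against "originalURI". We then split that field on ":" with the split function and remove the quotation marks with gensub (a GNU awk function). Because the URL itself contains a ":", its two halves land in add[2] and add[3] and are printed with a ":" between them.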
The sed solution would be as follows:
sed -rn 's/^.*originalURI":"(.*)","docIdentifier.*$/\1/p' filename
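Run sed with extended regular expressions (-r) and suppress the default output (-n). The substitution replaces the whole line with the text captured by the parenthesised group (\1), and the p flag prints the result. Because .* is greedy, a line that holds several URLs yields only the last one.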
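If GNU awk is not available, the same extraction can be sketched with grep -o and cut, which report every URL even when they all sit on one long line. The sample file below is a shortened, made-up stand-in for backup-6.session, and the name sample.session is an assumption for this example.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1
# Hypothetical one-line session file with two entries (shortened).
printf '%s' '{"ID":1,"originalURI":"https://www.youtube.com/watch?v=aaa","docIdentifier":470},{"ID":2,"originalURI":"https://www.youtube.com/watch?v=bbb","docIdentifier":471}' > sample.session

# grep -o prints each match on its own line; cut then keeps the 4th
# double-quote-delimited piece, which is the URL itself.
grep -o '"originalURI":"[^"]*"' sample.session | cut -d'"' -f4 > text.txt
cat text.txt
```

This relies on the URL containing no double quote, which holds for the sample shown in the question.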

Unix Shell Scripting - how can I remove particular characters inside a text file?

I have a text file with 5 rows and 5 columns. The columns are separated by the "|" symbol. The content of the 2nd column should be 7 characters long.
If the 2nd column is longer than 7 characters, I want to remove the extra characters without opening the file.
For example:
cat file1
ff|hahaha1|kjbsb|122344|jbjbnjuinnv|
df|hadb123_udcvb|sbfuisdbvdkh|122344|jbjbnjuinnv|
gf|harayhe_jnbsnjv|sdbvdkh|12234|jbjbnj|
qq|kkksks2|datetag|7777|jbjbnj|
jj|harisha|hagte|090900|hags|
In the example above, the 2nd and 3rd rows have a 2nd column longer than 7 characters. I want to remove the extra characters, without opening the input file, using awk or sed.
I'm waiting for your responses guys.
Thanks in advance!!
Take a substring of length 7 from the second column with awk:
awk -F'|' -v OFS='|' '{ $2 = substr($2, 1, 7) }1' file
Now any strings longer than 7 characters will be made shorter. Any strings that were shorter will be left as they are.
The 1 at the end is the shortest true condition to trigger the default action, { print }.
If you're happy with the changes, then you can overwrite the original file like this:
awk -F'|' -v OFS='|' '{ $2 = substr($2, 1, 7) }1' file > tmp && mv tmp file
i.e. redirect to a temporary file and then overwrite the original.
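A quick way to confirm the behaviour before overwriting anything is to run the command on a few rows in a scratch directory. In this sketch the first two rows come from the question; the third row, with a second column shorter than 7 characters, is made up to show that short values are left alone. The output name truncated is also an assumption.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1
cat > file1 <<'EOF'
ff|hahaha1|kjbsb|122344|jbjbnjuinnv|
df|hadb123_udcvb|sbfuisdbvdkh|122344|jbjbnjuinnv|
jj|haris|hagte|090900|hags|
EOF

# Assigning to $2 makes awk rebuild the record, joining the fields with OFS.
awk -F'|' -v OFS='|' '{ $2 = substr($2, 1, 7) }1' file1 > truncated
cat truncated
```

Only the second row changes; its second column becomes hadb123. The trailing | survives because the empty last field is still a field.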
First try
sed 's/\(^[^|]*|[^|]\{7\}\)[^|]*/\1/' file1
What is happening here? We construct the command step-by-step:
# Replace something
sed 's/hadb123_udcvb/replaced/' file1
# Remember the matched string (will be used in a later command)
sed 's/\(hadb123_udcvb\)/replaced/' file1
# Replace exactly 7 characters that are not '|' (once per line)
sed 's/\([^|]\{7\}\)/replaced/' file1
# Remove additional character until a '|'
sed 's/\([^|]\{7\}\)[^|]*/replaced/' file1
# Put back the string you remembered
sed 's/\([^|]\{7\}\)[^|]*/\1/' file1
# Extend the matched string with start-of-line (^), an any-length first field, and '|'
sed 's/\(^[^|]*|[^|]\{7\}\)[^|]*/\1/' file1
When this shows the desired output, you can add the option -i for changing the input file:
sed -i 's/\(^[^|]*|[^|]\{7\}\)[^|]*/\1/' file1
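The final expression can be checked on single lines before -i is used. The sketch below also shows the same substitution written with extended regular expressions (sed -E), where the grouping parentheses and the {7} need no backslashes but the literal | outside a bracket expression does. The -E form and the variable names are additions for this example, not part of the answer above; the short row is made up.

```shell
line='df|hadb123_udcvb|sbfuisdbvdkh|122344|jbjbnjuinnv|'
short='jj|haris|hagte|090900|hags|'    # hypothetical row, 2nd column under 7 characters

# Basic regular expressions, as in the steps above.
bre=$(printf '%s\n' "$line" | sed 's/\(^[^|]*|[^|]\{7\}\)[^|]*/\1/')
# Same substitution with extended regular expressions.
ere=$(printf '%s\n' "$line" | sed -E 's/^([^|]*\|[^|]{7})[^|]*/\1/')
# A second column shorter than 7 characters does not match, so the row is unchanged.
unchanged=$(printf '%s\n' "$short" | sed 's/\(^[^|]*|[^|]\{7\}\)[^|]*/\1/')

printf '%s\n' "$bre" "$ere" "$unchanged"
```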

sed/awk - Put all text on the same line as a preceding number

How can I get all text that follows 'number:number' onto the same line as the preceding 'number:number'?
10:15
text line one
text line two
text no pattern
11:12
random text
text is random
totally random
could be four lines
could be five
Should then become
10:15 text line one text line two text no pattern
11:12 random text text is random totally random could be four lines could be five
This works for your example:
tr '\n' ' ' < file.txt | sed 's/[0-9]*:[0-9]*/\n&/g'
Explanation:
tr first puts everything on one line.
The sed one-liner then inserts a newline before each num:num pattern. Note that \n in the replacement is understood by GNU sed; other sed implementations may not treat it as a newline.
Given that input file, all you need is to tell awk to read one blank-line-separated paragraph at a time using RS= (the empty string), and to rebuild each record with the default OFS, a single blank:
$ awk -v RS= '{$1=$1}1' file
10:15 text line one text line two text no pattern
11:12 random text text is random totally random could be four lines could be five
Both the sed and the awk solutions below join lines until a new record header is found or the input ends, at which point the joined lines are printed and cleared. Use either one.
The sed one-liner:
sed -nr '/^[0-9]{2}:[0-9]{2}$/!{H;$!b}; x; s/\n/ /gp'
The awk script:
awk '
!/^[0-9]{2}:[0-9]{2}$/ {
lines=lines" "$0
next
}
{if(lines) print lines; lines=$0}
END {print lines}
'
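The accumulate-and-flush idea can be tried on a shortened sample. This sketch is a variant of the awk script above, not a copy: it tests for the header line first, uses [0-9]+ instead of the {2} interval (older awk implementations do not all support intervals), and skips the final print when nothing was collected. The file names are assumptions for this example.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' '10:15' 'text line one' 'text line two' \
              '11:12' 'random text' 'could be five' > file.txt

awk '
  /^[0-9]+:[0-9]+$/ {              # a number:number line starts a new record
      if (lines != "") print lines # flush the previous record, if any
      lines = $0
      next
  }
  { lines = lines " " $0 }         # any other line is appended to the record
  END { if (lines != "") print lines }
' file.txt > joined.txt
cat joined.txt
```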
Here is a GNU AWK script:
script.awk
BEGIN { RS = "\n[0-9]+:[0-9]+|\n$" }
{ gsub(/\n/, " ", $0)      # join the lines of one record with blanks
printf("%s%s", $0, RT) }   # RT is the separator text that ended this record
Use it like this: awk -f script.awk file.txt
It uses the GNU AWK specific extensions RT and regex RS:
The record separator is set to a "colon-separated number pair".
To also consume the final newline at the end of the file, "|\n$" is added to the pattern.
So that separation starts at the second pair, "\n" is put in front of the pattern. Thus the first pair, "10:15", ends up in the first $0 and not in RT.
The trick here is that you want to split the file on paragraphs instead of lines. In awk, if you set RS="" it enables paragraph mode. Each iteration of the awk loop will have a paragraph in $0. You can then substitute the newlines and turn them into spaces.
awk <data.txt 'BEGIN { RS = "" ; FS = "\n" } { gsub(/\n/, " ", $0) ; print }'
Output:
10:15 text line one text line two text no pattern
11:12 random text text is random totally random could be four lines could be five
The benefit of this is that awk handles all the special cases for you: files that end in a blank line, end without a blank line, end without a newline, etc.
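Paragraph mode can be tried on a small input that has blank lines between the groups. The sample below is made up for the illustration and deliberately ends without a final newline; the file names are assumptions.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1
# Hypothetical input: two groups separated by a blank line, no trailing newline.
printf '10:15\ntext line one\ntext line two\n\n11:12\nrandom text\ncould be five' > data.txt

# RS="" turns on paragraph mode, where newlines also separate fields.
# Assigning $1=$1 rebuilds the record with single blanks (the default OFS).
awk -v RS= '{ $1 = $1 } 1' data.txt > joined.txt
cat joined.txt
```

Each group comes out as one line, and the output ends with a newline even though the input did not.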

output csv with lines that contains only one column

with input csv file
sid,storeNo,latitude,longitude
2,1,-28.03720000,153.42921670
9
I wish to output only the lines with one column; in this example it's line 3.
How can this be done in a bash shell script?
Using awk
The following awk command would be useful:
$ awk -F, 'NF==1' inputFile
9
What does it do?
-F, sets the field separator to ,
NF==1 matches lines where NF, the number of fields, is 1. No action is given, so the default action (printing the entire record) is taken; it is equivalent to NF==1{print $0}
inputFile is the input csv file passed to awk
Using grep
The same can be done using grep:
$ grep -v ',' inputFile
9
-v prints the lines that do not match the pattern
, combined with -v, selects the lines that do not contain the field separator ,
Using sed
$ sed -n '/^[^,]*$/p' inputFile
9
What does it do?
-n suppresses the normal printing of the pattern space
/^[^,]*$/ selects lines that match the pattern, i.e. lines without any ,
^ anchors the regex at the start of the line
[^,]* matches any run of characters other than ,
$ anchors the regex at the end of the line
p prints the current pattern space, i.e. the line that matched
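The three commands agree on the sample from the question, but they differ on empty lines: an empty line has zero fields, so awk's NF==1 skips it, while grep -v ',' and the sed pattern both print it because it contains no comma. A small sketch, with an empty line added to the question's data for the illustration (the output names are assumptions):

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1
# The question's sample plus one empty line at the end.
printf '%s\n' 'sid,storeNo,latitude,longitude' \
              '2,1,-28.03720000,153.42921670' '9' '' > inputFile

awk -F, 'NF==1'     inputFile > by_awk     # only the line "9"
grep -v ','         inputFile > by_grep    # "9" and the empty line
sed -n '/^[^,]*$/p' inputFile > by_sed     # "9" and the empty line

wc -l by_awk by_grep by_sed
```

If empty lines should be skipped with grep as well, one option is to require at least one character: grep -v ',' inputFile | grep .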
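Try this bash script:
#!/bin/bash
# Print only the lines of "file" that hold a single comma-separated field.
while IFS= read -r line
do
IFS=","            # split the unquoted expansion below on commas
set -- $line       # the fields become the positional parameters
case ${#} in
1) echo "$line";;
*) continue;;
esac
done < file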

Resources