I have a text file containing multiple chunks of text. In between each chunk, I've inserted the string '######## START OF NEW ARTICLE ########'. I'm trying to split the file at each of those instances using csplit in order to get separate files, each containing a unique chunk of text.
This is what I'm running through the terminal:
csplit -f check -n 3 desktop/TEST/test.txt '######## START OF NEW ARTICLE ########' {*}
I get the error: csplit: ‘########: unrecognised pattern
What am I doing wrong?
You need to enclose the regex pattern in slashes, e.g. '/######## START OF NEW ARTICLE ########/'.
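Applied to the command above, that would look something like this (note that {*} is a GNU csplit feature; the BSD csplit that ships with OS X may need a fixed repeat count such as {99} instead):
csplit -f check -n 3 desktop/TEST/test.txt '/######## START OF NEW ARTICLE ########/' '{*}'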
I am trying to use sed to add double quotes around anything between a matched pattern and the comma that ends it. At the moment I am extracting the following data from Cloudflare and trying to convert it to InfluxDB line protocol:
count=24043,clientIP=x.x.x.x,clientRequestPath=/abc/abs/abc.php
count=3935,clientIP=y.y.y.y,clientRequestPath=/abc/abc/abc/abc.html
count=3698,clientIP=z.z.z.z,clientRequestPath=/abc/abc/abc/abc.html
I have already converted the JSON output to this format with a series of sed commands; however, I cannot work out how to put the clientIP and clientRequestPath values in inverted commas.
My expected output has to be;
count=24043,clientIP="x.x.x.x",clientRequestPath="/abc/abs/abc.php"
count=3935,clientIP="y.y.y.y",clientRequestPath="/abc/abc/abc/abc.html"
count=3698,clientIP="z.z.z.z",clientRequestPath="/abc/abc/abc/abc.html"
This data will be imported into InfluxDB: count will be a float, whilst clientIP and clientRequestPath will be strings, hence they need to be in inverted commas; at the moment I am getting errors because they are not.
Can anyone provide an adequate sed command to do this?
This might work for you (GNU sed):
sed -E 's/=([^0-9][^,]*)/="\1"/g' file
This globally encloses in double quotes any string that follows an = and does not begin with a digit, up to the next ,.
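Note that the placeholder IPs above begin with a letter; if your real clientIP values begin with a digit (e.g. 203.0.113.1), the [^0-9] test will leave them unquoted. In that case a key-based variant, quoting the two fields by name, might be a safer sketch:
sed -E 's/(clientIP|clientRequestPath)=([^,]*)/\1="\2"/g' file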
Here is a solution using a sed script, which allows multiple operations on a source file.
Assuming your source data is in a file "from.dat", create a sed script to run multiple commands:
cat script.sed
s/clientIP=/clientIP=\"/
s/,clientRequestPath/\",clientRequestPath/
Execute the multi-command sed script on the data file, redirecting the output to "to.dat":
sed -f script.sed from.dat > to.dat
cat to.dat (only showing one line)
count=24043,clientIP="x.x.x.x",clientRequestPath=/abc/abs/abc.php
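If clientRequestPath should also be quoted, as in the expected output above, the script could be extended with two more substitutions (assuming the path always runs to the end of the line), e.g. append to script.sed:
s/clientRequestPath=/clientRequestPath=\"/
s/$/\"/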
I want to split a file into two parts
I want to split a single file into two different files using a shell script.
You can use the Linux split command, either by lines (split -l<num_of_lines> <file_name>) or by size (split -b<size><K|M|G> <file_name>).
For example, split -l100 a.txt will split a.txt into separate files of 100 lines each.
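If you need exactly two parts with a specific split point rather than equal-sized pieces, a head/tail pair is another option; a minimal sketch, assuming the split should happen after line 100 of a.txt:
head -n 100 a.txt > part1.txt
tail -n +101 a.txt > part2.txt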
I am currently trying to create a bash script that will convert a one-line text document into multiple lines.
Example:
TextFile: Header~someHeaderInfo|Object~someObjectInfo|SubObject~someSubObjectInfo|Object~someObjectInfo|SubObject~someSubObjectInfo|...|Tail~someInfo
Again, the above is only a single line.
This should be called through a bash script and be converted into:
Header~someHeaderInfo
Object~someObjectInfo|SubObject~someSubObjectInfo
Object~someObjectInfo|SubObject~someSubObjectInfo
...
Tail~someInfo
In the real use case, each Object has upwards of 20 subObjects, each of which may have more subObjects themselves.
How can I go about this separation?
If textfile contains:
Header~someHeaderInfo|Object~someObjectInfo|SubObject~someSubObjectInfo|Object~someObjectInfo|SubObject~someSubObjectInfo|...|Tail~someInfo
The following bash command:
sed "s/|/\n/g" textfile
will produce the following output:
Header~someHeaderInfo
Object~someObjectInfo
SubObject~someSubObjectInfo
Object~someObjectInfo
SubObject~someSubObjectInfo
...
Tail~someInfo
But the OP wants the SubObject on the same line (see comments), so I suggest:
sed "s/|\([^S]\)/\n\1/g" textfile
That will produce the following output:
Header~someHeaderInfo
Object~someObjectInfo|SubObject~someSubObjectInfo
Object~someObjectInfo|SubObject~someSubObjectInfo
...
Tail~someInfo
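Note that interpreting \n in the replacement as a newline is a GNU sed extension; with the BSD sed shipped on OS X you would typically escape a literal newline instead, something like:
sed 's/|\([^S]\)/\
\1/g' textfile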
I have a text file open in BBEdit/InDesign with email addresses on some lines (about a third of the lines) and name and date stuff on the other lines. I just want to keep the lines that have emails and remove all the others.
A simple pattern I can see to eliminate all the lines apart from those with email addresses on them is to have a negative match for the # character.
I can't use grep -v pattern because the Find and Replace implementation of grep is a dialogue box that just has fields for the Find pattern and the Replace pattern; grep -something options don't exist in this context.
Note, I am not trying to construct a valid email address test at all, just using the presence of one (or more) # characters to allow a line to stay; all other lines must be deleted from the list.
The closest I got was a pattern which hits only the email address lines (opposite outcome of my goal):
^((\w+|[ \.])\w+)[?#].*$
I tried various combinations of ^.*[^#].*$ and more sophisticated \w and [\w|\.] in parentheses, escaping the # with [^\#], and negative lookaheads like (!?).
I want to find these non-email-address lines and delete them using either of these apps on OS X (BBEdit/InDesign). I will use the command line if I have to, but I'd expect there must be a way using in-app Find and Replace with grep.
As stated in the comments, grep -v '#' filename lists all lines without a # symbol. You could also use grep '#' filename > new_filename
The file new_filename will consist only of lines containing #. You can use this new file, or delete all lines in the old file and paste the contents of the new file into it.
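If you would rather stay inside BBEdit's Find window (with Grep enabled), a pattern along these lines should delete the non-email lines directly, assuming Unix line endings (it matches any whole line containing no #, together with its newline):
Find: ^[^#]*\n
Replace: (leave empty)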
I have a file in CSV format. I know the positions where I want to chip off a chunk of the file and write it out as a new CSV file.
The split command splits a file into equal-sized chunks. I wonder if there is an efficient way (the file is huge) to split the file into chunks of different sizes?
I assume you want to split the file at a newline character. If this is the case you can use the head and tail commands to grab a number of lines from the beginning and from the end of your file, respectively.
If you want to copy a range of lines from within the file you can use sed, e.g.
sed -e 1,Nd -e Mq file
where N should be replaced with the line number of the line preceding the first line to display and M should be the line number of the last line to display.
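For example, to write lines 101 through 200 of a hypothetical big.csv out as their own chunk (so N=100 and M=200):
sed -e '1,100d' -e '200q' big.csv > chunk.csv
head -n 100 big.csv > before.csv and tail -n +201 big.csv > after.csv would then give you the pieces before and after that chunk.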