Edit data removing line breaks and putting everything in a row - bash

Hi I'm new in shell scripting and I have been unable to do this:
My data looks like this (much bigger actually):
Note: After every 50 characters there is a line break, but sometimes less when the data finishes and there's a new sample name
I would like the line break after every 50 characters to be removed, so that my data looks like this:
I tried using tr but I got an error:
tr '\n' '' < my_file
tr: empty string2
tr with "-d" deletes specified character
$ cat input.txt
$ cat input.txt | tr -d "\n"

You can use this awk:
awk '/^ *>/{if (s) print s; print; s="";next} {s=s $0;next} END {print s}' file
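To see what that awk command does, here is a minimal sketch run on made-up data. The sample names and sequences are hypothetical (the question's real data is not shown), and the temporary file exists only for the demonstration.

```shell
# Hypothetical sample data: header lines start with ">", the rest is wrapped text.
input=$(mktemp)
printf '%s\n' '>Sample_A' 'ACGTACGTAC' 'GGTTAACC' '>Sample_B' 'TTGACCA' 'GGA' > "$input"

# Header lines are printed as-is; every other line is appended to s,
# which is flushed when the next header (or the end of input) is reached.
joined=$(awk '/^ *>/{if (s) print s; print; s="";next} {s=s $0;next} END {print s}' "$input")

printf '%s\n' "$joined"
rm -f "$input"
```

Each sample name stays on its own line and the wrapped lines below it end up joined into a single row.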

Using awk
awk '/>/{print (NR==1)?$0:RS $0;next}{printf $0}' file
If you don't mind an additional empty line at the start of the output, here is a shorter one:
awk '{printf (/>/?RS $0 RS:$0)}' file

This might work for you (GNU sed):
sed '/^\s*>/!{H;$!d};x;s/\n\s*//2gp;x;h;d' file
Build up the record in the hold space and when encountering the start of the next record or the end-of-file remove the newlines and print out.

you can use this sed,
sed '/^>Sample/!{ :loop; N; /\n>Sample/{n}; s/\n//; b loop; }' file.txt

Try this:
cat SampleName_ZN189A | tr -d '\r'
# tr -d deletes the specified character from the input
The same can be achieved with simple awk:
awk 'BEGIN{ORS=""} {print}' SampleName_ZN189A # the output does not end with a line break
If you want a line break at the end, this works:
awk 'BEGIN{ORS=""} {print}END{print "\r"}' SampleName_ZN189A
# Select the correct line break character, i.e. \r, \n or \r\n, depending on the file format.


Writing the output of a command to specific columns of a csv file, unix

I want to write the output of a command to specific columns (3rd and 5th) of a csv file.
echo -e "Value,1\nCount,1" >> file.csv
echo "Header1,Header2,Path,Header4,Value,Header6" >> file.csv
sed 'y/ /,/' input.csv >> file.csv
input.csv in the above snippet will look something like this
1234567890 /training/folder
0325435287 /training/newfolder
Current output of file.csv
Expected Output of file.csv
All the operations can be done in a single awk:
awk -v OFS=, -v pre="Value,1\nCount,1" -v hdr="Header1,Header2,Path,Header4,Value,Header6" '
BEGIN {print pre; print hdr}
{print "", "", $1, "", $2, ""}
' input.csv
With sed you could try the following code, which uses sed's back-reference capability.
sed -E 's/(^[^ ]*) +(.*$)/,,\2,,\1,/' Input_file
Explanation: The -E option enables ERE (extended regular expressions). The s command performs the substitution. Its first part creates two back references (capture groups whose matched text is kept in a temporary buffer for use in the replacement): the first captures everything up to the first space, the second captures the rest of the line. The second part replaces the whole line with two commas, the second capture group \2, two commas, the first capture group \1, and a trailing comma.
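As a quick check of that substitution, the sketch below runs it on the two input lines quoted in the question. Only the temporary file and the variable names are additions for the demonstration.

```shell
# The two sample lines of input.csv from the question.
input=$(mktemp)
printf '%s\n' '1234567890 /training/folder' '0325435287 /training/newfolder' > "$input"

# \1 is the number before the first space, \2 is the path after it;
# the replacement places the path in column 3 and the number in column 5.
rows=$(sed -E 's/(^[^ ]*) +(.*$)/,,\2,,\1,/' "$input")

printf '%s\n' "$rows"
rm -f "$input"
```

The path lands under the Path header (3rd column) and the number under Value (5th column), with the other columns left empty.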
You can use awk instead of sed
cat input.csv | awk '{print ",," $1 "," $2 ","}' >> file.csv
awk processes its standard input line by line. Its print statement takes expressions as arguments, and each whitespace-separated word of the line is available as a field (in your case, $1 and $2). In the above example, I added ",," and "," as literal strings between the fields.
You can trivially add empty columns as part of your sed script.
sed 'y/ /,/;s/,/,,/;s/^/,,/;s/$/,/' input.csv >> file.csv
This replaces the first comma with two, then adds two up front and one at the end.
Your expected output does not look like valid CSV, though. This is also brittle in that it will fail for any file names which contain a space or a comma.

If a line has a length less than a number, append to its previous line

I have a file that looks like this:
Most of the lines have a fixed length of 8. But there are some lines in between that have a length less than 8. I need a simple line of code that appends each of those short lines to its previous line.
I have tried the following code but it takes lots of memory when working with large files.
cat FILENAME | awk 'BEGIN{OFS=FS="\t"}{print length($1), $1}' | tr '\n' '\t' | sed 's/8/\n/g' | awk 'BEGIN{OFS="";FS="\t"}{print $2, $4}'
The output I expect:
If Perl is an option, please try:
perl -0777 -pe 's/(\n)(.{1,7})$/\2/mg' filename
-0777 option tells perl to slurp all lines.
The pattern (\n)(.{1,7})$ matches a line shorter than 8 characters together with the newline before it, capturing the newline as \1 and the line's text as \2.
The replacement \2 does not contain the preceding newline and is appended to the previous line.
sed <FILENAME 'N;/\n.\{8\}/!s/\n//;P;D'
N; - append next line to pattern space
/\n.\{8\}/ - does second line contain 8 characters?
!s/\n//; - no: join the two lines
P - print first line of pattern space
D - delete first line of pattern space, start next cycle
By default, print each line without a trailing \n, so that it is appended to the previous line; when the current line has length 8, print a \n before it.
The first and last line are special.
awk 'NR==1 {printf "%s", $0; next}
length($0)==8 {printf "\n"}
{printf "%s", $0}
END { printf "\n" }' FILENAME
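The streaming idea can be tried on a small made-up file. This is only a sketch: the sample lines are hypothetical, and it assumes that any line of 8 or more characters starts a new record, which is slightly looser than the exact length test above.

```shell
# Hypothetical sample data: full lines are 8 characters long, "QRS" is a short one.
input=$(mktemp)
printf '%s\n' 'ABCDEFGH' 'IJKLMNOP' 'QRS' 'TUVWXYZA' > "$input"

# A full-length line (other than the first) begins a new output line;
# a shorter line is printed directly after the previous one.
merged=$(awk 'NR > 1 && length($0) >= 8 { printf "\n" }
              { printf "%s", $0 }
              END { if (NR) printf "\n" }' "$input")

printf '%s\n' "$merged"
rm -f "$input"
```

Because awk holds only the current line, memory use stays constant regardless of file size.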
If you have GNU sed 4.2 (which supports the -z option), you can try the following.
EDIT (see comments): this is the inferior solution:
sed -rz 's/\n(.{0,7})\n/\1\n/g' FILENAME
If you like old traditional tools, you can use ed, the standard text editor:
printf '%s\n' 'g/^.\{,7\}$/-,.j' wq | ed -s filename

Bash + sed/awk/cut to delete nth character

I am trying to delete the 6th, 7th and 8th characters of each line.
Below is the file containing the text.
Actual output:
#cat test
Expected output after formatting:
#cat test
I also tried the commands below, with no luck:
#awk -F ":" '{print $1":"$2","$3}' test
#sed 's/^\(.\{7\}\).\(.*\)/\1\2/' test { Here I can remove only one character }
cut also failed:
#cut -d ":" -f1,2,3 test
I need to delete the 6th, 7th and 8th characters of each line.
Any suggestions, please?
With GNU cut you can use the --complement switch to remove characters 6 to 8:
cut --complement -c6-8 file
Otherwise, you can just select the rest of the characters yourself:
cut -c1-5,9- file
i.e. characters 1 to 5, then 9 to the end of each line.
With awk you could use substrings:
awk '{ print substr($0, 1, 5) substr($0, 9) }' file
Or you could write a regular expression, but the result will be more complex.
For example, to remove the last three characters from the first comma-separated field:
awk -F, -v OFS=, '{ sub(/...$/, "", $1) } 1' file
Or, using sed with a capture group:
sed -E 's/(.{5}).{3}/\1/' file
Capture the first 5 characters and use them in the replacement, dropping the next 3.
it's a structured text, why count the chars if you can describe them?
$ awk '{sub(":..,",",")}1' file
remove the seconds.
The solutions below are generic and assume no knowledge of any format. They just delete character 6,7 and 8 of any line.
sed 's/.//8;s/.//7;s/.//6' <file> # from high to low
sed 's/.//6;s/.//6;s/.//6' <file> # from low to high (subtract 1)
sed 's/\(.....\).../\1/' <file>
sed 's/\(.\{5\}\).../\1/' <file>
s/BRE/replacement/n :: substitute nth occurrence of BRE with replacement
awk 'BEGIN{OFS=FS=""}{$6=$7=$8="";print $0}' <file>
awk -F "" '{OFS=$6=$7=$8="";print}' <file>
awk -F "" '{OFS=$6=$7=$8=""}1' <file>
These three commands do the same thing. With an empty field separator FS, awk treats every character as a field (this works in gawk and mawk, but POSIX leaves the behavior of an empty FS undefined). We empty fields 6, 7 and 8 and reprint the line with an empty output field separator OFS.
cut -c -5,9- <file>
cut --complement -c 6-8 <file>
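To confirm that the generic approaches agree, here is a small sketch that applies three of them to one sample line. The line is the one used in an awk example elsewhere in this thread; the variable names are made up for the demonstration.

```shell
line='18:40:12,,UP'   # sample line; characters 6-8 are ":12"

# Keep characters 1-5 and everything from character 9 onward.
with_cut=$(printf '%s\n' "$line" | cut -c1-5,9-)
with_awk=$(printf '%s\n' "$line" | awk '{ print substr($0, 1, 5) substr($0, 9) }')
# BRE version: capture the first five characters, drop the next three.
with_sed=$(printf '%s\n' "$line" | sed 's/\(.\{5\}\).../\1/')

printf '%s\n' "$with_cut" "$with_awk" "$with_sed"
```

All three print the line with the seconds removed.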
Just for fun, perl, where you can assign to a substring
perl -pe 'substr($_,5,3)=""' file
With awk :
echo "18:40:12,,UP" | awk '{ $0 = ( substr($0,1,5) substr($0,9) ) ; print $0}'
If you are running on bash, you can use its string manipulation functionality instead of having to call awk, sed, cut or whatever binary:
while IFS= read -r STRING; do
printf '%s\n' "${STRING:0:5}${STRING:8}"
done < myfile.txt
${STRING:0:5} represents the first five characters of your string; ${STRING:8} starts at offset 8 (offsets count from 0), i.e. at the 9th character, and runs to the end of the line. This way you cut out characters 6, 7 and 8.

Concatenating characters on each field of CSV file

I am dealing with a CSV file which has the following form:
Since the BLAS routine I need to implement on such data takes double-floats only, I guess the easiest way is to concatenate d0 at the end of each field, so that each line looks like:
In pseudo-code, that would be:
For every line except the first line
For every field except the first field
Substitute ; with d0; and Substitute newline with d0 newline
I imagine it should be something like
cat file.csv | awk -F; 'NR>1 & NF>1'{print line} | sed 's/;/d0\n/g' | sed 's/\n/d0\n/g'
Any input?
Could use this sed
sed '1!{s/\(;[^;]*\)/\1d0/g}' file
Skips the first line then replaces each field beginning with ;(skipping the first) with itself and d0.
I would say:
$ awk 'BEGIN{FS=OFS=";"} NR>1 {for (i=2;i<=NF;i++) $i=$i"d0"} 1' file
That is, set the field separator to ;. Starting on line 2, loop through all the fields from the 2nd one appending d0. Then, use 1 to print the line.
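A minimal sketch of that command on made-up data follows. The header and the two data rows are hypothetical, since the question's file is not shown; they only mimic a quoted first column followed by numeric fields.

```shell
# Hypothetical CSV: a header line, then rows with a quoted label and numbers.
input=$(mktemp)
printf '%s\n' '"name";"x";"y"' '"a";1.5;2.0' '"b";3;4.25' > "$input"

# From line 2 on, append d0 to every field except the first;
# the trailing 1 prints each (possibly modified) line.
converted=$(awk 'BEGIN{FS=OFS=";"} NR>1 {for (i=2;i<=NF;i++) $i=$i"d0"} 1' "$input")

printf '%s\n' "$converted"
rm -f "$input"
```

The header line and the first column pass through unchanged, while every other field gains the d0 suffix.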
Your data format looks a bit weird. Enclosing the first column in double quotes makes me think that it can contain the delimiter, the semicolon, itself. However, I don't know the application which produces that data but if this is the case, then you can use the following GNU awk command:
awk 'NR>1{for(i=2;i<=NF;i++){$i=$i"d0"}}1' OFS=\; FPAT='("[^"]+")|([^;]+)' file
The key here is the FPAT variable. With it you can define what a field looks like instead of being limited to specifying a set of field delimiters.
preprocess script
head -n 1 big-prices.csv 1>output.txt; \
tail -n +2 big-prices.csv | \
sed 's/;/d0;/g' | \
sed 's/$/d0/g' | \
sed 's/"d0/"/g' 1>>output.txt;
note: would have to make minor modification to second sed if file has trailing whitespaces at end of lines..
Using awk
$ cat file
gsub (any awk)
$ awk 'FNR>1{ gsub(/;[^;]*/,"&d0")}1' file
gensub (gawk)
$ awk 'FNR>1{ print gensub(/(;[^;]*)/,"\\1d0","g"); next }1' file

UNIX command line file edit functionality

Back again with another question.
The file I have right now is of the following format:
I would like to do two things with this file.
First, remove the - characters in the third line. Therefore, 1-23-4 ==> 1234.
Second, I would like to make it all print in one line.
The final result should look like:
Is this possible using line commands in a script? Kindly advise.
Thank you in advance for your time and help.
With tr:
$ tr -d '\n-' < a | sed 's/$/\n/'
To remove the hyphens:
$ tr -d '-' < file
And the same applies for the new lines with \n.
As we are removing all new lines, it will miss the finishing one. To recover it, we use sed.
$ tr -d '\n-' < a
$ tr -d '\n-' < a | sed 's/$/\n/'
Thanks fedorqui. I tried this. I replaced file with my file name but it
is not creating a new file with the final format for me. Sorry, I think
I should have mentioned this in the question. My bad.
No problem. You just need to redirect it:
$ tr -d '\n-' < a | sed 's/$/\n/' > new_file
$ cat new_file
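Put together, the whole round trip can be sketched as below. The two sample lines are the ones used in an echo example elsewhere in this thread. The final newline is restored with a plain echo instead of the sed step, which is a swapped-in alternative that avoids relying on GNU sed's handling of \n in the replacement.

```shell
# Two-line sample: "1234," followed by "1-23-4".
input=$(mktemp)
printf '%s\n' '1234,' '1-23-4' > "$input"

# tr strips every newline and hyphen; echo then adds back one final newline.
{ tr -d '\n-' < "$input"; echo; } > "$input.out"

result=$(cat "$input.out")
printf '%s\n' "$result"
rm -f "$input" "$input.out"
```

The output file contains the single line 1234,1234.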
(gnu) awk:
awk -v RS="\0" 'gsub(/[\n-]/,"")' file
kent$ echo "1234,
1-23-4"|awk -v RS="\0" 'gsub(/[\n-]/,"")'
tr was a good one.
Another way, in Perl:
perl -pne 's/[-\n]//g' your_file
-n: wraps the expression in a loop over each line of the file (redundant here, because -p implies the same loop).
-e: what follows is the expression that is run on every line.
-p: prints each line after the expression has been executed on it.
s/[-\n]//g: searches for a newline or "-" and replaces it with nothing; g applies this to all occurrences in the line.
