I am very new to shell scripting and am having trouble replacing a character in a tab-delimited CSV file.
I want to convert the CSV to a text file and change the delimiter from tab to ~. I tried the code below, but the delimiter comes out as something like a Japanese character followed by " instead of ~.
sed 's/\t/\"\~\"/g' test.csv > test.txt
Appreciate your help.. thanks in advance
If it's char for char, use tr:
cat test.csv | tr '\t' '~' > test.txt
If the file is encoded as UTF-16 (which would explain the garbled characters), use iconv to convert it to UTF-8 first:
iconv -f utf-16 -t utf-8 < test.csv | sed 's/\t/"~"/g' > test.txt
" and ~ can go unescaped.
I want to cut everything after the delimiter ":". The input file is in the following format:
data1:data2
data11:data22
...
I have a linux command
cat merged.txt | cut -f1 -d ":" > output.txt
On mac terminal it gives an error:
cut: stdin: Illegal byte sequence
What is the correct way to do it in a Mac terminal?
Your input file (merged.txt) probably contains bytes or byte sequences that are not valid in your current locale. For example, your locale might specify UTF-8 character encoding while the file is in some other encoding and cannot be parsed as valid UTF-8. If this is the problem, you can work around it by telling cut to assume the "C" locale, which basically tells it to process the input as a stream of bytes without paying attention to encoding.
BTW, cat file | is what's commonly referred to as a Useless Use of Cat (UUOC): you can just use a standard input redirect < file instead, which is cleaner and more efficient. Thus, my version of your command would be:
LC_ALL=C cut -f1 -d ":" < merged.txt > output.txt
Note that since the LC_ALL=C assignment is a prefix to the cut command, it only applies to that one command and won't mess up other operations that should assume UTF-8 (or whatever your normal locale is).
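A minimal sketch of the workaround on made-up data, assuming the offending file contains a stray Latin-1 byte. The names `sample` and `first_fields` are illustrative only and do not come from the question.

```shell
# Hypothetical sample: the first record's second field ends in byte 0xE9
# (Latin-1 "e acute"), which is not a valid UTF-8 sequence on its own.
sample=$(mktemp)
printf 'data1:caf\351\ndata11:data22\n' > "$sample"

# The assignment prefixes cut only, so the shell's own locale is untouched.
first_fields=$(LC_ALL=C cut -d ':' -f 1 < "$sample")
echo "$first_fields"

rm -f "$sample"
```

Because the locale override lives on the command line of cut alone, any later command in the same script still runs under your normal locale.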
Your cut command works for me on my Mac. You can try awk for the same result:
awk -F: '{print $1}' merged.txt
data1
data11
I am trying to get some columns from file1 to file2 using cut command with delimiter Control A.
This is what I tried:
cut -d^A -f2-8 a.dat > b.dat
If my records are like this:
A^AB^AC^AD^AE^AF^AG^AH^A$
my command gives:
AB^AC^AD^AE^AF^AG^AH
So it leaves the A from Control-A at the start.
Is my command wrong, or am I specifying the delimiter the wrong way?
^A is character number 1 in the ASCII table a.k.a Start of Heading character. If you're using bash, you can have this:
cut -f 2-8 -d $'\x01'
Or use printf (either the builtin or the external binary). The octal escape \001 is the portable spelling, since not every printf understands \x01:
CTRL_A=$(printf '\001')
cut -f 2-8 -d "$CTRL_A"
You can also verify your output with hexdump:
hexdump -C b.dat
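As a quick check of the approach, here is a small sketch run against a record shaped like the one in the question. The variable names (`ctrl_a`, `record`, `fields`) are placeholders chosen for this example, and the final tr is only there to make the separators visible.

```shell
# Build the delimiter with an octal escape, which POSIX printf understands.
ctrl_a=$(printf '\001')

# Sample record shaped like the one in the question: A^AB^AC^A...^AH^A
record=$(printf 'A\001B\001C\001D\001E\001F\001G\001H\001')

# Select fields 2-8, then turn the Control-A separators into commas for display.
fields=$(printf '%s\n' "$record" | cut -d "$ctrl_a" -f 2-8 | tr '\001' ',')
echo "$fields"
```

With a real Control-A byte as the delimiter, field 1 is the leading A and fields 2-8 are B through H, so nothing stray is left at the start of the line.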
I can't really understand your question, but I would suggest using tr to change your Control-As into something more workable, and maybe changing them back when you are finished:
tr '\001' ',' < yourfile | do some cutting using commas | tr ',' '\001' > newfile
Here are my attempts to replace a b character with a newline using sed while running bash
$> echo 'abc' | sed 's/b/\n/'
anc
no, that's not it
$> echo 'abc' | sed 's/b/\\n/'
a\nc
no, that's not it either. The output I want is
a
c
HELP!
Looks like you are on BSD or Solaris. Try this:
[jaypal:~/Temp] echo 'abc' | sed 's/b/\
> /'
a
c
Add a backslash, hit Enter, and then complete your sed statement.
$ echo 'abc' | sed 's/b/\'$'\n''/'
a
c
In Bash, $'\n' expands to a single quoted newline character (see "QUOTING" section of man bash). The three strings are concatenated before being passed into sed as an argument. Sed requires that the newline character be escaped, hence the first backslash in the code I pasted.
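If $'\n' is not available (it is a Bash feature), a similar effect can probably be had by keeping a literal newline in a variable. This is a sketch under that assumption; `nl` and `result` are names made up for the example.

```shell
# Store a literal newline in a variable (works in any POSIX shell).
nl='
'

# Inside double quotes, \\ becomes one backslash, so sed receives
# "s/b/\" followed by a real newline and then "/".
result=$(echo 'abc' | sed "s/b/\\${nl}/")
echo "$result"
```

The idea is the same as in the one-liner above: sed sees an escaped, real newline in the replacement text rather than the two characters \ and n.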
You didn't say you want to globally replace all b. If yes, you want tr instead:
$ echo abcbd | tr b $'\n'
a
c
d
Works for me on Solaris 5.8 and bash 2.03
In a multiline file I had to pipe through tr on both sides of sed, like so:
echo "$FILE_CONTENTS" | \
tr '\n' ¥ | tr ' ' ∑ | mySedFunction $1 | tr ¥ '\n' | tr ∑ ' '
See, Unix likes to strip out newlines and extra leading spaces and all sorts of things, because I guess that seemed like the thing to do when it was made back in the 1900s. Anyway, the method shown above solved the problem for me. I wish I had seen someone post this somewhere, because it would have saved me about three hours of my life.
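A reduced sketch of the same placeholder trick, assuming a single-byte placeholder is acceptable. It uses the control byte \001 instead of ¥ or ∑, because some tr implementations operate on bytes and may not handle multibyte characters; the sed expression here is a stand-in for whatever your own sed function does.

```shell
# Hypothetical input: two lines, where the match has to span the line break.
input=$(printf 'a b\nc d')

# Swap newlines for the control byte \001 so sed sees one long line,
# do the substitution, then swap the placeholder back.
result=$(printf '%s\n' "$input" | tr '\n' '\001' | sed 's/b.c/X/' | tr '\001' '\n')
echo "$result"
```

The placeholder must be a byte that never occurs in the real data, otherwise the final tr will introduce newlines that were not there originally.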
echo 'abc' | sed 's/b/\'$'\n''/'
You are missing $'...' around \n; without it the shell never produces a real newline.
I have input.txt
1
2
3
4
5
I need to get such output.txt
1,2,3,4,5
How to do it?
Try this:
tr '\n' ',' < input.txt > output.txt
With sed, you could use:
sed -e 'H;${x;s/\n/,/g;s/^,//;p;};d'
The H appends the pattern space to the hold space (saving the current line in the hold space). The ${...} surrounds actions that apply to the last line only. Those actions are: x swap hold and pattern space; s/\n/,/g substitute embedded newlines with commas; s/^,// delete the leading comma (there's a newline at the start of the hold space); and p print. The d deletes the pattern space - no printing.
You could also use, therefore:
sed -n -e 'H;${x;s/\n/,/g;s/^,//;p;}'
The -n suppresses default printing so the final d is no longer needed.
This solution assumes that the CRLF line endings are the local native line ending (so you are working on DOS) and that sed will therefore generate the local native line ending in the print operation. If you have DOS-format input but want Unix-format (LF only) output, then you have to work a bit harder - but you also need to stipulate this explicitly in the question.
It worked OK for me on MacOS X 10.6.5 with the numbers 1..5, and 1..50, and 1..5000 (23,893 characters in the single line of output); I'm not sure that I'd want to push it any harder than that.
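The hold-space script described above can be tried on a few sample lines without creating a file. This is just a sketch; the variable name `joined` is made up for the example.

```shell
# Feed five sample lines through the -n variant of the script:
# H collects every line in the hold space; on the last line ($) the
# hold space is swapped in, newlines become commas, and the leading
# comma is removed before printing.
joined=$(printf '1\n2\n3\n4\n5\n' | sed -n -e 'H;${x;s/\n/,/g;s/^,//;p;}')
echo "$joined"
```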
In response to @Jonathan's comment to @eumiro's answer:
tr -s '\r\n' ',' < input.txt | sed -e 's/,$/\n/' > output.txt
tr and sed used to be very good, but when it comes to file parsing and regex you can't beat perl
(Not sure why people think that sed and tr are closer to shell than perl... )
perl -pe 's/\n/,/' your_file
If you want to do it in pure shell, look at substring replacement in parameter expansion:
${string/#substring/replacement}
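Note that the /# form only replaces a match at the start of the string and is a Bash extension, so it does not join lines by itself. As one possible pure-shell alternative (a different technique from the expansion above, and only a sketch), a read loop can build the joined string. The names `input`, `line`, and `joined` are made up for the example.

```shell
# Hypothetical input file with one value per line.
input=$(mktemp)
printf '1\n2\n3\n4\n5\n' > "$input"

joined=''
while IFS= read -r line; do
  # ${joined:+$joined,} expands to "<joined>," only once joined is non-empty,
  # so no leading or trailing comma is produced.
  joined="${joined:+$joined,}$line"
done < "$input"
echo "$joined"

rm -f "$input"
```

This uses only shell builtins apart from mktemp and rm, at the cost of being slow on large files.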
Use the paste command. Here it is with a pipe (printf is used because echo does not expand \n in every shell):
printf '1\n2\n3\n4\n5\n' | paste -s -d, -
Here it is with a file:
printf '1\n2\n3\n4\n5\n' > /tmp/input.txt
paste -s -d, /tmp/input.txt
Per the man page, -s concatenates all lines into one and -d defines the delimiter character.
Awk versions:
awk '{printf("%s,",$0)}' input.txt
awk 'BEGIN{ORS=","} {print $0}' input.txt
Output - 1,2,3,4,5,
Since you asked for 1,2,3,4,5 as compared to 1,2,3,4,5, (note the comma after 5; most of the solutions above also include the trailing comma), here are two more versions with Awk (with wc and sed) to get rid of the last comma:
i='input.txt'; awk -v c=$(wc -l $i | cut -d' ' -f1) '{printf("%s",$0);if(NR<c){printf(",")}}' $i
awk '{printf("%s,",$0)}' input.txt | sed 's/,\s*$//'
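Another way to avoid the trailing comma, sketched here as an alternative rather than as part of the answer above, is to print the comma before each line instead of after it, so no line count or cleanup pass is needed. The variable name `joined` is illustrative.

```shell
# Print a comma before every line except the first, then end with one newline.
joined=$(printf '1\n2\n3\n4\n5\n' | awk '{ printf("%s%s", (NR > 1 ? "," : ""), $0) } END { print "" }')
echo "$joined"
```

To run it on a file instead of piped sample data, pass the filename as the last argument to awk, for example input.txt.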
printf "1\n2\n3" | tr '\n' ','
if you want to output that to a file just do
printf "1\n2\n3" | tr '\n' ',' > myFile
if you have the content in a file do
cat myInput.txt | tr '\n' ',' > myOutput.txt
python version:
python -c 'import sys; print(",".join(sys.stdin.read().splitlines()))'
Doesn't have the trailing comma problem (because join works that way), and splitlines splits data on native line endings (and removes them).
cat input.txt | sed -e 's|$|,|' | xargs -i echo "{}"