Cut command with delimiter Control-A - shell

I am trying to get some columns from file1 to file2 using cut command with delimiter Control A.
This is what I tried:
cut -d^A -f2-8 a.dat > b.dat
If my records are like this:
A^AB^AC^AD^AE^AF^AG^AH^A$
my command gives:
AB^AC^AD^AE^AF^AG^AH
So it leaves the A of Control-A at the start of the output.
Is my command wrong, or am I specifying the delimiter the wrong way?

^A is character number 1 in the ASCII table a.k.a Start of Heading character. If you're using bash, you can have this:
cut -f 2-8 -d $'\x01'
Or use printf (can be builtin or an external binary):
CTRL_A=$(printf '\001')  # octal escape; not every printf understands \x01
cut -f 2-8 -d "$CTRL_A"
You can also verify your output with hexdump:
hexdump -C b.dat
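As a minimal sketch of the approach above, the following builds a record like the one in the question and cuts fields 2-8 on the Control-A byte. The function name `extract_fields` and the variable `ctrl_a` are only illustrative, and the final `tr` is there purely to make the delimiter visible.

```shell
# Assumption: the sample record mirrors the one shown in the question.
ctrl_a=$(printf '\001')   # a single byte, 0x01 (in bash, $'\001' is equivalent)

extract_fields() {
    # Keep fields 2 through 8, using Control-A as the delimiter.
    cut -d "$ctrl_a" -f 2-8
}

# tr turns the otherwise invisible delimiter into '|' for display only.
printf 'A\001B\001C\001D\001E\001F\001G\001H\001\n' | extract_fields | tr '\001' '|'
# prints: B|C|D|E|F|G|H
```

Typing the two characters `^` and `A` on the command line does not produce this byte, which is why the original command misbehaves.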

I can't really understand your question, but would suggest you use tr to change your Control-As into something else more workable and maybe then change them back when you are finished:
tr '\001' ',' < yourfile | cut -d, -f2-8 | tr ',' '\001' > newfile

Related

reverse a file in Unix shell

I have a file parse.txt
parse.txt contains the following
remo/hello/1.0,remo/hello2/2.0,remo/hello3/3.0,whitney/hello/1.0,julie/hello/2.0,julie/hello/3.0
and I want output.txt to contain the same fields in reverse order (from last to first):
julie/hello/3.0,julie/hello/2.0,whitney/hello/1.0,remo/hello3/3.0,remo/hello2/2.0,remo/hello/1.0
I have tried the following code:
tail -r parse.txt
You can use the surprisingly helpful tac from GNU Coreutils.
tac -s "," parse.txt > newparse.txt
tac by default will "cat" the file to standard out, reversing the lines. By specifying the separator using the -s flag, you can simply reverse your fields as desired.
(You may need to do a post-processing step to get the commas to work out correctly, which can be another step in your pipeline.)
I like the tac solution; it's tight and elegant, but as Micah pointed out, tac is part of GNU Coreutils, which means that it's not available by default in FreeBSD, OSX, Solaris, etc.
This can be done in pure bash, no external tools required.
#!/usr/bin/env bash
unset comma
read foo < parse.txt
bar=(${foo//,/ })
for (( count="${#bar[@]}"; --count >= 0; )); do
printf "%s%s" "$comma" "${bar[$count]}"
comma=","
done
This obviously only handles one line, per your sample input. You can wrap it in something if you need to handle multiple lines of input.
The logic here is that we can convert the input into an array by replacing commas with spaces. Of course, if our input data included spaces, this would have to be adjusted. Once we have the array, we simply step backwards through it, printing each record.
Note that this does not include a terminating newline. If you want one, you can add it with:
printf '\n'
as a final line.
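If the fields may contain spaces, one possible adjustment is to split on commas only. The sketch below is written in POSIX sh (so it also runs in bash) rather than with bash arrays; `reverse_csv_line` is a hypothetical helper name, and it still handles a single line at a time.

```shell
# reverse_csv_line LINE
# Print the comma-separated fields of LINE in reverse order.
reverse_csv_line() {
    old_ifs=$IFS
    IFS=,            # split on commas only, so embedded spaces survive
    set -f           # disable globbing while the line is split
    set -- $1        # the fields become the function's positional parameters
    set +f
    IFS=$old_ifs

    out=
    for field in "$@"; do
        # Prepend each field, adding a comma only once out is non-empty.
        out="$field${out:+,$out}"
    done
    printf '%s\n' "$out"
}

reverse_csv_line 'remo/hello/1.0,whitney/hello/1.0,julie/hello/3.0'
# prints: julie/hello/3.0,whitney/hello/1.0,remo/hello/1.0
```

To process a whole file, the function could be called from a `while IFS= read -r line` loop.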
perl -F, -lane 'print join ",", reverse @F' parse.txt > output.txt
You can use this awk command:
awk -v RS=, '{a[++i]=$1} END{for (k=i; k>=1; k--) printf a[k] (k>1?RS:ORS)}' parse.txt
julie/hello/3.0,julie/hello/2.0,whitney/hello/1.0,remo/hello3/3.0,remo/hello2/2.0,remo/hello/1.0
The question is tagged unix and you have mentioned tail -r which suggests you might not be using Linux (with full GNU toolchain), but instead some "real" Unix (BSD variant), e.g. osx.
As such, the tac command is not available, but as mentioned in the question, tail -r is. So you can use the following:
$ tr ',' '\n' < parse.txt | tail -r | tr '\n' ',' | sed 's/,$//'
julie/hello/3.0,julie/hello/2.0,whitney/hello/1.0,remo/hello3/3.0,remo/hello2/2.0,remo/hello/1.0
$
Notes:
This only works for files that have one line, as we are relying on converting commas to newlines and back. If there is more than one line, then the newlines in between will get converted to commas by the second tr.
The final sed is to remove a trailing comma, that was converted from a trailing newline inserted by tail
Emulating tac with sed:
tr , '\n' <parse.txt | sed '1!G; h; $!d' | paste -sd ,
Alternatively, if you don't have paste:
tr , '\n' <parse.txt | sed '1!G; h; $!d' | tr '\n' , | sed 's/,$//'
Output:
julie/hello/3.0,julie/hello/2.0,whitney/hello/1.0,remo/hello3/3.0,remo/hello2/2.0,remo/hello/1.0
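For readers unfamiliar with the sed one-liner, here is the same pipeline with each sed command annotated. This is only a sketch; `reverse_lines` is an illustrative name, and the trailing `-` tells paste to read standard input, which some implementations require.

```shell
# Reverse the order of the input lines using only sed (a stand-in for tac).
#   1!G  on every line except the first, append the hold space to the pattern space
#   h    copy the pattern space (this line plus all earlier ones) to the hold space
#   $!d  on every line except the last, delete instead of printing
reverse_lines() {
    sed '1!G; h; $!d'
}

printf 'a,b,c\n' | tr , '\n' | reverse_lines | paste -sd , -
# prints: c,b,a
```

Because the whole input accumulates in the hold space, this is best suited to inputs of modest size.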
You can use any language to do that
xargs ruby -e "puts ARGV[0].split(',').reverse.join(',')" < parse.txt
Reversing can be done with tac (cat spelled backwards). As noted in the comments, this reverses the lines, which is not what the OP asked for.
tac filename
You can still use tac if you feed it one line at a time and reverse on the field separator (here ,) instead of the linefeed.
echo "a,b,c" | tr '\n' ',' | tac -s "," | sed 's/,$/\n/'

Cut from column to end of line

I'm having a bit of an issue cutting the output up from egrep. I have output like:
From: First Last
From: First Last
From: First Last
I want to cut out the "From: " (essentially leaving the "First Last").
I tried
cut -d ":" -f 7
but the output is just a bunch of blank lines.
I would appreciate any help.
Here's the full code that I am trying to use if it helps:
egrep '^From:' $file | cut -d ":" -f 7
NOTE: I've already tested the egrep portion of the code and it works as expected.
The cut command lines in your question specify colon-separated fields and that you want the output to consist only of field 7; since there is no 7th field in your input, the result you're getting isn't what you intend.
Since the "From:" prefix appears to be identical across all lines, you can simply cut from the 7th character onward:
egrep '^From:' $file | cut -c7-
and get the result you intend.
You were really close.
I think you only need to use " " instead of ":" as the separator and replace the "7" with "2-", like this:
cut -d " " -f 2-
I tested it and it works well.
The -f argument selects which fields to print. Since there is only one : in the line, there are only two fields. So changing -f 7 to -f 2- will give you what you want, albeit with a leading space.
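A small sketch comparing the field-based and character-based approaches on one of the sample lines. The function names are made up for illustration.

```shell
# Everything after the first colon (keeps the leading space).
after_first_colon() {
    cut -d ':' -f 2-
}

# Everything from the 7th character on, i.e. after the 6-character "From: " prefix.
from_seventh_char() {
    cut -c 7-
}

line='From: First Last'
printf '%s\n' "$line" | after_first_colon   # prints " First Last"
printf '%s\n' "$line" | from_seventh_char   # prints "First Last"
printf '%s\n' "$line" | cut -d ':' -f 7     # prints an empty line: there is no 7th field
```

The character-based form relies on the prefix always being exactly "From: ", which holds here because egrep has already filtered the lines.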
You can combine the egrep and cut parts into one command with sed:
sed -n 's/^From: //gp' $file
sed -n turns off printing by default, and then I am using p in the sed command explicitly to print the lines I want.
You can use sed:
sed 's/^From: *//'
OR awk:
awk -F ': *' '$1=="From"{print $2}'
OR grep -oP
grep -oP '^From: *\K.*'
Here is a Bash one-liner:
grep ^From file.txt | while read -a cols; do echo ${cols[@]:1}; done
See: Handling positional parameters at wiki.bash-hackers.org
cut itself is a very handy tool in the shell:
cut -d (delimiter character) -f (fields that you want as output)
A single field is given directly, as in -f 3, and a range of fields can be selected as in -f 5-9.
So in this particular case the code would be:
egrep '^From:' $file | cut -d' ' -f 2-3
The delimiter here is a space, which has to be quoted (or escaped with a \) so that the shell passes it to cut.
-f 1 corresponds to "From:" and 2-3 corresponds to "First Last".

Cut command does not appear to be working

I'm piping a command to cut and nothing appears to be happening.
The output of the command looks like this:
Name File Info OS
11 FileName1 OS1
12 FileName2 OS2
13 FileName3 OS3
I'm trying to extract column 1,2 from all rows (starting with row 2) using the following:
my_command | cut -f1,2
but the output is exactly the same as the original.
cut doesn't behave well with multiple spaces as a delimiter. Use awk instead:
mycommand | awk 'NR>1{print $1,$2}'
Use tr -s to squeeze repeated spaces into a single space. cut can then be used with a single space as the delimiter separating the columns.
mycommand | tr -s ' ' | cut -d' ' -f1,2
If multiple spaces are used for a delimiter and the column positions are fixed, you would use column numbers with cut:
mycommand | cut -c1-27
Or you could lose the front spaces with:
mycommand | cut -c5-27
This will work even if your fields have embedded spaces. The awk method will fail if you have embedded spaces in your fields.
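The reason the original command appears to do nothing is that cut splits on a tab by default, and a line containing no delimiter is printed unchanged. The sketch below shows the squeeze-then-cut approach on made-up data; the exact padding in `sample_output` is an assumption, since the real spacing is not visible in the question, and both function names are hypothetical.

```shell
# Assumption: columns are padded with runs of spaces, as in the question.
sample_output() {
    printf '%s\n' \
        'Name  File Info     OS' \
        '  11  FileName1     OS1' \
        '  12  FileName2     OS2'
}

first_two_columns() {
    # Drop leading spaces, squeeze each run of spaces to one, then cut on a space.
    sed 's/^ *//' | tr -s ' ' | cut -d ' ' -f 1,2
}

# tail -n +2 skips the header row.
sample_output | tail -n +2 | first_two_columns
# prints:
# 11 FileName1
# 12 FileName2
```

Like the awk version, this breaks if a field itself contains spaces; in that case the fixed-column `cut -c` form is the safer choice.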

Counting commas in a line in bash

Sometimes I receive a CSV file which has a carriage return inside a cell. This is not an acceptable format to a program that will use it as input.
In order to detect if an input line is split, I determined that a bad line would not have the expected number of commas in it. Is there a bash or other common unix command line tool that would allow me to count the commas in the line? If necessary, I can write a Python or Perl program to do it, but if possible, I'd like to add a line or two to an existing bash script to cause it to fail if the comma count is wrong. Any ideas?
Strip everything but the commas, and then count number of characters left:
$ echo foo,bar,baz | tr -cd , | wc -c
2
To count the number of times a comma appears, you can use something like awk:
string='foo,bar,baz'  # a line of input from the CSV file
echo "$string" | awk -F "," '{print NF-1}'
But this really isn't sufficient to determine whether a field has carriage returns in it. Fields can have commas inside as long as they're surrounded by quotes.
What worked for me better than the other solutions was this. If test.txt has:
foo,bar,baz
baz,foo,foobar,bar
Then cat test.txt | xargs -I % sh -c 'echo % | tr -cd , | wc -c' produces
2
3
This works very well for streaming sources, or tailing logs, etc.
In pure Bash:
while IFS=, read -ra array
do
echo "$((${#array[@]} - 1))"
done < inputfile
or
while read -r line
do
count=${line//[^,]}
echo "${#count}"
done < inputfile
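To make an existing script fail when a line has the wrong number of commas, the counting idea can be wrapped in a function. This is a sketch only: `check_comma_count` is a hypothetical helper, it uses the `tr -cd , | wc -c` count shown earlier so that it also works in plain sh, and it counts commas inside quoted fields as well.

```shell
# check_comma_count EXPECTED FILE
# Return non-zero if any line of FILE does not contain exactly EXPECTED commas.
check_comma_count() {
    expected=$1
    file=$2
    lineno=0
    status=0
    # The "|| [ -n ... ]" part also processes a last line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        lineno=$((lineno + 1))
        # Delete everything except commas and count what is left; the arithmetic
        # expansion strips any padding that wc adds to its output.
        commas=$(( $(printf '%s' "$line" | tr -cd ',' | wc -c) ))
        if [ "$commas" -ne "$expected" ]; then
            echo "line $lineno: expected $expected commas, found $commas" >&2
            status=1
        fi
    done < "$file"
    return "$status"
}
```

In a script this could be used as `check_comma_count 2 inputfile || exit 1`.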
Try Perl:
$ perl -ne 'print 0+@{[/,/g]},"\n"'
a
0
a,a
1
a,a,a,a,a
4
Depending on what you are trying to do with the CSV data, it may be helpful to use a wrapper script like csvquote to temporarily replace the problematic newlines (and commas) inside quoted fields, then restore them. For instance:
csvquote inputfile.csv | wc -l
and
csvquote inputfile.csv | cut -d, -f1 | csvquote -u
may be the sort of thing you're looking for. See https://github.com/dbro/csvquote for the code and more information.
An example Python command you could run (since Python is installed on most modern systems) is:
python -c "import pathlib; print({l.count(',') for l in pathlib.Path('my_file.csv').read_text().splitlines()})"
This counts the number of commas per line, then makes a set from them (so if your lines all have the same number of commas in, you'll get a set with just that number in).
Just remove all of the carriage returns:
tr -d "\r" < old_file > new_file

shell replace cr\lf by comma

I have input.txt
1
2
3
4
5
I need to get such output.txt
1,2,3,4,5
How to do it?
Try this:
tr '\n' ',' < input.txt > output.txt
With sed, you could use:
sed -e 'H;${x;s/\n/,/g;s/^,//;p;};d'
The H appends the pattern space to the hold space (saving the current line in the hold space). The ${...} surrounds actions that apply to the last line only. Those actions are: x swap hold and pattern space; s/\n/,/g substitute embedded newlines with commas; s/^,// delete the leading comma (there's a newline at the start of the hold space); and p print. The d deletes the pattern space - no printing.
You could also use, therefore:
sed -n -e 'H;${x;s/\n/,/g;s/^,//;p;}'
The -n suppresses default printing so the final d is no longer needed.
This solution assumes that the CRLF line endings are the local native line ending (so you are working on DOS) and that sed will therefore generate the local native line ending in the print operation. If you have DOS-format input but want Unix-format (LF only) output, then you have to work a bit harder - but you also need to stipulate this explicitly in the question.
It worked OK for me on MacOS X 10.6.5 with the numbers 1..5, and 1..50, and 1..5000 (23,893 characters in the single line of output); I'm not sure that I'd want to push it any harder than that.
In response to @Jonathan's comment on @eumiro's answer:
tr -s '\r\n' ',' < input.txt | sed -e 's/,$/\n/' > output.txt
tr and sed are very good, but when it comes to file parsing and regexes you can't beat perl
(Not sure why people think that sed and tr are closer to shell than perl... )
perl -pe 's/\n/,/' your_file
if you want pure shell to do it then look at string matching
${string/#substring/replacement}
Use paste command. Here is using pipes:
printf '1\n2\n3\n4\n5\n' | paste -s -d, /dev/stdin
Here is using a file:
printf '1\n2\n3\n4\n5\n' > /tmp/input.txt
paste -s -d, /tmp/input.txt
Per the man page, -s concatenates all lines and -d defines the delimiter character.
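Since the question title mentions CR/LF, here is a sketch that combines paste with a step that first strips carriage returns. `join_lines` is an illustrative name, and the trailing `-` makes paste read standard input.

```shell
# Join all input lines with commas, tolerating DOS (CRLF) line endings.
join_lines() {
    tr -d '\r' | paste -s -d , -
}

printf '1\r\n2\r\n3\r\n4\r\n5\r\n' | join_lines
# prints: 1,2,3,4,5
```

Unlike the plain `tr '\n' ','` approach, paste leaves no trailing comma and ends the output with a newline.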
Awk versions:
awk '{printf("%s,",$0)}' input.txt
awk 'BEGIN{ORS=","} {print $0}' input.txt
Output - 1,2,3,4,5,
Since you asked for 1,2,3,4,5 rather than 1,2,3,4,5, (note the comma after the 5; most of the solutions above also leave that trailing comma), here are two more awk versions (with wc and sed) that get rid of the last comma:
i='input.txt'; awk -v c=$(wc -l $i | cut -d' ' -f1) '{printf("%s",$0);if(NR<c){printf(",")}}' $i
awk '{printf("%s,",$0)}' input.txt | sed 's/,\s*$//'
printf "1\n2\n3" | tr '\n' ','
if you want to output that to a file just do
printf "1\n2\n3" | tr '\n' ',' > myFile
if you have the content in a file do
cat myInput.txt | tr '\n' ',' > myOutput.txt
python version:
python -c 'import sys; print(",".join(sys.stdin.read().splitlines()))'
Doesn't have the trailing comma problem (because join works that way), and splitlines splits data on native line endings (and removes them).
cat input.txt | sed -e 's|$|,|' | xargs -i echo "{}"
