Duplicate first column of multiple text files in bash - bash

I have multiple text files each containing two columns and I would like to duplicate the first column in each file in bash to have three columns in the end.
File:
sP100227 1
sP100267 1
sP100291 1
sP100493 1
Output file:
sP100227 sP100227 1
sP100267 sP100267 1
sP100291 sP100291 1
sP100493 sP100493 1
I tried:
txt=path/to/*.txt
echo "$(paste <(cut -f1-2 $txt) > "$txt"

Could you please try following. Written and tested with shown samples in GNU awk. This will add fields to only those lines which have 2 fields in it.
awk 'NF==2{$1=$1 OFS $1} 1' Input_file
In case you don't care of number of fields and simply want to have value of 1st field 2 times then try following.
awk '{$1=$1 OFS $1} 1' Input_file
OR if you only have 2 fields in your Input_file then we need not to rewrite the complete line we could simply print them as follows.
awk '{print $1,$1,$2}' Input_file
To save output into same Input_file itself append > temp && mv temp Input_file for above solutions(after testing).

Use a temp file, with cut -f1 and paste, like so:
paste <(cut -f1 in_file) in_file > tmp_file
mv tmp_file in_file
Alternatively, use a Perl one-liner, like so:
perl -i.bak -lane 'print join "\t", $F[0], $_;' in_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array #F on whitespace or on the regex specified in -F option.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches

The default delimiter in cut and paste is TAB, but your file looks to be space-separated.
You can't use the same file as input and output redirection, because when the shell opens the file for output it truncates it, so there's nothing for the program to read. Write to a new file and then rename it.
Your paste command is only being given one input file. And there's no need to use echo.
paste -d' ' <(cut -d' ' -f1 "$txt") "$txt" > "$txt.new" && mv "$txt.new" "$txt"
You can do this more easily using awk.
awk '{print $1, $0}' "$txt" > "$txt.new" && mv "$txt.new" "$txt"
GNU awk has an in-place extension, so you can use that if you like. See Save modifications in place with awk

Try sed -Ei 's/\s*(\S+)\s+/\1 \1 /1' $txt if your fields are separated by strings of one or more whitespace characters. This used the Stream Editor (sed) replaces (s///1) the first string of non-space characters (\S+) followed by a string of whitespace characters (\s+) with the same thing repeated with intervening spaces(\1 \1 ). It keeps the rest of the line. The -E to sed means use extended pattern matching (+, ( vs. \(). The -i means do it in-place, replacing the file with the output.
You could use awk and do awk '{ printf "%s %s\n",$1,$0 }'. This takes the first whitespace-delimited field ($1) and follows it with a space and the whole line ($0) followed by a newline. This is a little clearer than sed but it doesn't have the advantage of being in-place.
If you can guarantee they are delimited by only one space, with no leading spaces, you can use paste -d' ' <(cut -d' ' -f1 ${txt}) ${txt} > ${txt}.new; mv ${txt}.new ${txt}. The -d' ' sets the delimiter to space for both cut and paste. You know this but for others -f1 means extract the first -d-delimited field. The mv command replaces the input with the output.

Related

How to add a space after a comma if it does not exist within the 6th column in a csv file?

Ubuntu 16.04
Bash 4.3.3
I also need a way to add a space after the comma if one does not exist in the 6th column. I had to comment the above line because it placed a space after all commas in the csv file.
Wrong: "This is 6th column,Hey guys,Red White & Blue,I know it,Right On"
Perfect: "This is 6th column, Hey guys, Red White & Blue, I know it, Right On"
I could almost see awk printing out the 6th column then having sed do the rest:
awk '{ print $6 }' "$feed " | sed -i 's/|/,/g; s/,/, /g; s/,\s\+/, /g'
This is what I have so far:
for feed in *; do
sed -r -i 's/([^,]{0,10})[^,]*/\1/5' "$feed"
sed -i '
s/<b>//g; s/*//g;
s/\([0-9]\)""/\1inch/g;
# s/|/,/g; s/,/, /g; s/,\s\+/, /g;
s/"one","drive"/"onetext","drive"/;
s/"comments"/"description"/;
s/"features"/"optiontext"/;
' "$feed"
done
s/|/,/g; s/,/, /g; s/,\s\+/, /g; works but is global and not within a column.
It sounds like all you need is this (using GNU awk for FPAT):
awk 'BEGIN{FPAT="[^,]*|\"[^\"]+\""; OFS=","} {gsub(/, ?/,", ",$6)} 1'
e.g.:
$ cat file
1,2,3,4,5,"This is 6th column,Hey guys,Red White & Blue,I know it,Right On",7,8
$ awk 'BEGIN{FPAT="[^,]*|\"[^\"]+\""; OFS=","} {gsub(/, ?/,", ",$6)} 1' file
1,2,3,4,5,"This is 6th column, Hey guys, Red White & Blue, I know it, Right On",7,8
It actually looks like your whole shell script including multiple calls to GNU sed could be done far more efficiently in just one call to GNU awk with no need for a surrounding shell loop, e.g. (untested):
awk -i inplace '
BEGIN{FPAT="[^,]*|\"[^\"]+\""; OFS=","}
{
$0 = gensub(/([^,]{0,10})[^,]*/,"\\1",5)
$0 = gensub(/([0-9])""/,"\\1inch","g")
sub(/"one","drive"/,"\"onetext\",\"drive\"")
sub(/"comments"/,"\"description\"")
sub(/"features"/,"\"optiontext\"")
gsub(/, ?/,", ",$6)
}
' *
This might work for you (GNU sed):
sed -r 's/[^,"]*("[^"]*")*/\n&\n/6;h;s/, ?/, /g;G;s/.*\n(.*)\n.*\n(.*)\n.*\n/\2\1/' file
Surround the 6th field by newlines. Make a copy of the line. Replace all commas followed by a possible space with a comma followed by a space. Append the original line and using pattern matching replace the amended field discarding the rest of the ameliorated line.

Shell script copying all columns of text file instead of specified ones

I trying to copy 3 columns from one text file and paste them into a new text file. However, whenever I execute this script, all of the columns in the original text file get copied. Here is the code I used:
cut -f 1,2,6 PROFILES.1.0.profile > compiledfile.txt
paste compiledfile.txt > myNewFile
Any suggestions as to what I'm doing wrong? Also, is there a simpler way to do this? Thanks!
Let's suppose that the input is comma-separated:
$ cat File
1,2,3,4,5,6,7
a,b,c,d,e,f,g
We can extract columns 1, 2, and 6 using cut:
$ cut -d, -f 1,2,6 File
1,2,6
a,b,f
Note the use of option -d, to specify that the column separator is a comma.
By default, cut uses a tab as the column separator. If the separator in your file is anything else, you must use the -d option.
Using awk
awk -vFS=your_delimiter_here -vOFS=your_delimiter_here 'print $1,$2,$6' PROFILES.1.0.profile > compiledfile.txt
should do it.
For comma separated fields the solution would be
awk -vFS=, -vOFS=, '{print $1,$2,$6}' PROFILES.1.0.profile > compiledfile.txt
FS is an awk builtin variable which stands for field-separator.
Similarly OFS stands for output-field-separator.
And the handy -v option with awk helps you assign a value to variable.
You could use awk to this.
awk -F "delimiter" '
{
print $1,$2 ,$3 #Where $1,$2 and so are column numbers
}' filename > newfile

Delete first characters off of a line in a file with awk or grep

I'm attempting to remove a certain pattern from a line, but not the entire line itself. An example would be:
Original:
user=dannyBoy
Desired:
dannyBoy
I have a file that is full of lines like that, so I was wondering how I would be able to cut a specific part of the text off, whether that be just removing the first five characters from the list or searching for the pattern "user=" and removing it.
There are many ways to do this:
cut -d'=' -f2- file
sed 's/^[^=]*//' file
awk -F= '{print $2}' file #if just one = is present
cut sets a delimiter (-d'=) and then prints all the fields starting from the 2nd one (-f2-).
sed looks for all the content from the beginning up to the first = and removes it.
awk sets = as field separator and prints the second field.
Using ex:
echo user=dannyBoy | ex -s +"norm df=" +%p -cq! /dev/stdin
where ex is equivalent to vi -e/vim -e which basically executes vi command: df= (delete until finds =), then print the buffer (%p).
If you've multiple lines like that, then it would be simpler by using substitution:
ex -s +"%s/^.*=//g" +%p -cq! foo.txt
To edit file in place, change -cq! to -cwq.
The command below deletes the first 5 characters:
$ echo "user=dannyboy" | cut -c 6-
You can use it on a file with cut -c 6- inputfilename as well.

printing first word in every line of a txt file unix bash

So I'm trying to print the first word in each line of a txt file. The words are separated by one blank.
cut -c 1 txt file
Thats the code I have so far but it only prints the first character of each line.
Thanks
To print a whole word, you want -f 1, not -c 1. And since the default field delimiter is TAB rather than SPACE, you need to use the -d option.
cut -d' ' -f1 filename
To print the last two words not possible with cut, AFAIK, because it can only count from the beginning of the line. Use awk instead:
awk '{print $(NF-1), $NF;}' filename
you can try
awk '{print $1}' your_file
read word _ < file
echo "$word"
What's nice about this solution is it doesn't read beyond the first line of the file. Even awk, which has some very clean, terse syntax, has to be explicitly told to stop reading past the first line. read just reads one line at a time. Plus it's a bash builtin (and a builtin in many shells), so you don't need a new process to run.
If you want to print the first word in each line:
while read word _; do printf '%s\n' "$word"; done < file
But if the file is large then awk or cut will win out for reading every line.
You can use:
cut -d\ -f1 file
Where:
-d is the delimiter (here using \ for a space)
-f is the field selector
Notice that there is a space after the \.
-c is for characters, you want -f for fields, and -d to indicate your separator of space instead of the default tab:
cut -d " " -f 1 file

Display all fields except the last

I have a file as show below
1.2.3.4.ask
sanma.nam.sam
c.d.b.test
I want to remove the last field from each line, the delimiter is . and the number of fields are not constant.
Can anybody help me with an awk or sed to find out the solution. I can't use perl here.
Both these sed and awk solutions work independent of the number of fields.
Using sed:
$ sed -r 's/(.*)\..*/\1/' file
1.2.3.4
sanma.nam
c.d.b
Note: -r is the flag for extended regexp, it could be -E so check with man sed. If your version of sed doesn't have a flag for this then just escape the brackets:
sed 's/\(.*\)\..*/\1/' file
1.2.3.4
sanma.nam
c.d.b
The sed solution is doing a greedy match up to the last . and capturing everything before it, it replaces the whole line with only the matched part (n-1 fields). Use the -i option if you want the changes to be stored back to the files.
Using awk:
$ awk 'BEGIN{FS=OFS="."}{NF--; print}' file
1.2.3.4
sanma.nam
c.d.b
The awk solution just simply prints n-1 fields, to store the changes back to the file use redirection:
$ awk 'BEGIN{FS=OFS="."}{NF--; print}' file > tmp && mv tmp file
Reverse, cut, reverse back.
rev file | cut -d. -f2- | rev >newfile
Or, replace from last dot to end with nothing:
sed 's/\.[^.]*$//' file >newfile
The regex [^.] matches one character which is not dot (or newline). You need to exclude the dot because the repetition operator * is "greedy"; it will select the leftmost, longest possible match.
With cut on the reversed string
cat youFile | rev |cut -d "." -f 2- | rev
If you want to keep the "." use below:
awk '{gsub(/[^\.]*$/,"");print}' your_file

Resources