Terminal Editing File to Remove Certain Characters

I have a couple of files which have a format of something like this:
TCTCTGGAAAGGGACGCCTGGGAGG 10
AAAAATACATTCTAACCTCGGCGT 1
TAATTTCATCAATATATCAATG 1
(etc...)
I want to remove everything after the space so that I only get this in the end:
TCTCTGGAAAGGGACGCCTGGGAGG
AAAAATACATTCTAACCTCGGCGT
TAATTTCATCAATATATCAATG
(etc...)
How would I do this?

You can do this with awk:
awk '{print $1}' oldfile > newfile

cut -d' ' -f1 file.txt
or:
sed 's/ .*//' file.txt
or
sed -e 's/[^ACTG]//g' file.txt
or
awk '{print $1}' file.txt
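All of these one-liners keep only the first whitespace-separated field, so they produce identical output. A quick check on a sample file in the question's format:

```shell
# Build a small sample, then run each tool on it.
printf 'TCTCTGGAAAGGGACGCCTGGGAGG 10\nAAAAATACATTCTAACCTCGGCGT 1\n' > file.txt
cut -d' ' -f1 file.txt        # keep everything before the first space
sed 's/ .*//' file.txt        # delete from the first space to end of line
awk '{print $1}' file.txt     # print the first whitespace-separated field
```

Each command prints the two sequences with the trailing counts removed.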

Not as concise and pretty as timos's answer :), but here is another quick example of the same functionality, written in Ruby.
#!/usr/bin/env ruby
# Keep only the leading run of word characters on each line (the sequence)
# and write one sequence per line to the output file.
data = File.read("data.txt")
File.open("outData.txt", "w") do |f|
  data.scan(/^\w+/).each { |seq| f.write(seq + "\n") }
end

Related

Combine multiple text files (row wise) into columns

I have multiple text files that I want to merge columnwise.
For example:
File 1
0.698501 -0.0747351 0.122993 -2.13516
File 2
-5.27203 -3.5916 -0.871368 1.53945
I want the output file to be like:
0.698501, -5.27203
-0.0747351, -3.5916
0.122993, -0.871368
-2.13516, 1.53945
Is there a one-line bash command that can accomplish this?
I'll appreciate any help.
---Lyndz
With awk:
awk '{if(NR==1) {n=split($0,a1," ")} else {split($0,a2," ")}} END{for(i=1;i<=n;i++) print a1[i] ", " a2[i]}' file1 file2
(A counted loop is used because for (i in a2) does not guarantee the fields come out in order.)
Output:
0.698501, -5.27203
-0.0747351, -3.5916
0.122993, -0.871368
-2.13516, 1.53945
paste <(sed -E 's/ +/&,\n/g' file1) <(sed -E 's/ +/&\n/g' file2) | column -s $',' -t | sed -E 's/\s+/, /g' | sed -E 's/, $//g'
It got a bit complicated, but I guess it can be done in a bit simpler way also.
P.S.: Please look up the man pages of each command to see what they do.
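A shorter variant of the same idea (a sketch, assuming each file holds a single space-separated row, as in the question): transpose each row into a column with tr, join the columns with paste, then add the space after the comma with sed.

```shell
printf '0.698501 -0.0747351 0.122993 -2.13516\n' > file1
printf -- '-5.27203 -3.5916 -0.871368 1.53945\n' > file2
# Turn each row into a column, join the columns on a comma,
# then insert a space after the comma to match the desired output.
paste -d, <(tr ' ' '\n' < file1) <(tr ' ' '\n' < file2) | sed 's/,/, /'
```

The process substitution `<(...)` requires bash; with a POSIX shell, write the two columns to temporary files first.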

Bash: concatenate lines in csv file (1+2, 3+4 etc)

I have a CSV file with increasing integers in the first column and some text behind.
1,text1a,text1b
2,text2a,text2b
3,text3a,text3b
4,text4a,text4b
...
I would like to join lines 1+2, 3+4, etc. and write the outcome to a new csv file.
The desired output would be
1,text1a,text1b,2,text2a,text2b
3,text3a,text3b,4,text4a,text4b
...
A second option without the numbers would be great as well. The actual input would be
1,text,text,,,text#text.com,2,text.text,text
2,text,text,,,text#text.com,3,text.text,text
3,text,text,,,text#text.com,2,text.text,text
4,text,text,,,text#text.com,3,text.text,text
Desired outcome
text,text,,,text#text.com,2,text.text,text,text,text,,,text#text.com,3,text.text,text
text,text,,,text#text.com,2,text.text,text,text,text,,,text#text.com,3,text.text,text
$ pr -2ats, file
gives you
1,text1a,text1b,2,text2a,text2b
3,text3a,text3b,4,text4a,text4b
UPDATE
for the second part
$ cut -d, -f2- file | pr -2ats,
will give you
text,text,,,text#text.com,2,text.text,text,text,text,,,text#text.com,3,text.text,text
text,text,,,text#text.com,2,text.text,text,text,text,,,text#text.com,3,text.text,text
awk solution:
awk '{ printf "%s%s",$0,(!(NR%2)? ORS:",") }' input.csv > output.csv
The output.csv content:
1,text1a,text1b,2,text2a,text2b
3,text3a,text3b,4,text4a,text4b
----------
Additional approach (to skip numbers):
awk -F',' '{ printf "%s%s",$2 FS $3,(!(NR%2)? ORS:FS) }' input.csv > output.csv
The output.csv content:
text1a,text1b,text2a,text2b
text3a,text3b,text4a,text4b
3rd approach (for your extended input):
awk -F',' '{ sub(/^[0-9]+,/,"",$0); printf "%s%s",$0,(!(NR%2)? ORS:FS) }' input.csv > output.csv
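For reference, a run of that command on the extended input from the question (a sketch; input.csv holds the four numbered lines shown above):

```shell
cat > input.csv <<'EOF'
1,text,text,,,text#text.com,2,text.text,text
2,text,text,,,text#text.com,3,text.text,text
3,text,text,,,text#text.com,2,text.text,text
4,text,text,,,text#text.com,3,text.text,text
EOF
# Strip the leading line number, then glue every pair of lines with a comma.
awk -F',' '{ sub(/^[0-9]+,/,"",$0); printf "%s%s",$0,(!(NR%2)? ORS:FS) }' input.csv
```

This prints exactly the two desired output lines.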
With bash, cut, sed and paste:
paste -d, <(cut -d, -f 2- file | sed '2~2d') <(cut -d, -f 2- file | sed '1~2d')
Output:
text1a,text1b,text2a,text2b
text3a,text3b,text4a,text4b
I hoped to get started with something simple as
printf '%s,%s\n' $(<inputfile)
This turns out wrong when you have spaces inside your text fields.
The improvement is rather a mess:
source <(echo "printf '%s,%s\n' $(sed 's/.*/"&"/' inputfile|tr '\n' ' ')")
Skipping the first field can be done in the same sed command:
source <(echo "printf '%s,%s\n' $(sed -r 's/([^,]*),(.*)/"\2"/' inputfile|tr '\n' ' ')")
EDIT:
This solution will fail when the input contains special characters, so you should use a simple solution such as
cut -d, -f2- file | paste -d, - -
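Spelled out on the sample data (note that cut needs the comma delimiter given explicitly, and `paste -d, - -` joins each pair of consecutive lines from its stdin with a comma):

```shell
cat > file <<'EOF'
1,text1a,text1b
2,text2a,text2b
3,text3a,text3b
4,text4a,text4b
EOF
# Drop the leading number field, then merge every two lines with a comma.
cut -d, -f2- file | paste -d, - -
```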

How I can delete the first column in even rows? [duplicate]

I have a csv file with data presented as follows
87540221|1356438283301|1356438284971|1356438292151697
87540258|1356438283301|1356438284971|1356438292151697
87549647|1356438283301|1356438284971|1356438292151697
I'm trying to save the first column to a new file (without the field separator), and then delete the first column from the main csv file along with the first field separator.
Any ideas?
This is what I have tried so far
awk 'BEGIN{FS=OFS="|"}{$1="";sub("|","")}1'
but it doesn't work
This is simple with cut:
$ cut -d'|' -f1 infile
87540221
87540258
87549647
$ cut -d'|' -f2- infile
1356438283301|1356438284971|1356438292151697
1356438283301|1356438284971|1356438292151697
1356438283301|1356438284971|1356438292151697
Just redirect into the file you want:
$ cut -d'|' -f1 infile > outfile1
$ cut -d'|' -f2- infile > outfile2 && mv outfile2 file
Assuming your original CSV file is named "orig.csv":
awk -F'|' '{print $1 > "newfile"; sub(/^[^|]+\|/,"")}1' orig.csv > tmp && mv tmp orig.csv
GNU awk
awk '{$1="";$0=$0;$1=$1}1' FPAT='[^|]+' OFS='|'
Output
1356438283301|1356438284971|1356438292151697
1356438283301|1356438284971|1356438292151697
1356438283301|1356438284971|1356438292151697
The pipe is a special regex character, and the sub function expects you to pass a regex. The correct awk command is this:
awk 'BEGIN {FS=OFS="|"} {$1=""; sub(/\|/, "")}1' file
OUTPUT:
1356438283301|1356438284971|1356438292151697
1356438283301|1356438284971|1356438292151697
1356438283301|1356438284971|1356438292151697
With sed :
sed 's/[^|]*|//' file.txt
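Putting the two halves together (a sketch using the sample rows from the question, with hypothetical filenames outfile1 and stripped):

```shell
cat > infile <<'EOF'
87540221|1356438283301|1356438284971|1356438292151697
87540258|1356438283301|1356438284971|1356438292151697
EOF
cut -d'|' -f1 infile > outfile1      # first column only, no separator
sed 's/[^|]*|//' infile > stripped   # everything after the first pipe
```

outfile1 then holds the ID column, and stripped holds the remaining fields.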

Delete text in file after a match

I have a file with the following:
/home/adversion/web/wp-content/plugins/akismet/index1.php: PHP.Mailer-7 FOUND
/home/beckydodman/web/oldshop/images/google68274020601e.php: Trojan.PHP-1 FOUND
/home/resurgence/web/Issue 272/Batch 2 for Helen/keynote_Philip Baldwin (author revise).doc: W97M.Thus.A FOUND
/home/resurgence/web/Issue 272/from Helen/M keynote_Philip Baldwin.doc: W97M.Thus.A FOUND
/home/skda/web/clients/sandbox/wp-content/themes/editorial/cache/external_dc8e1cb5bf0392f054e59734fa15469b.php: Trojan.PHP-58 FOUND
I need to clean this file up by removing everything after the colon (:).
so that it looks like this:
/home/adversion/web/wp-content/plugins/akismet/index1.php
/home/beckydodman/web/oldshop/images/google68274020601e.php
/home/resurgence/web/Issue 272/Batch 2 for Helen/keynote_Philip Baldwin (author revise).doc
/home/resurgence/web/Issue 272/from Helen/M keynote_Philip Baldwin.doc
/home/skda/web/clients/sandbox/wp-content/themes/editorial/cache/external_dc8e1cb5bf0392f054e59734fa15469b.php
Use awk:
$ awk -F: '{print $1}' input
/home/adversion/web/wp-content/plugins/akismet/index1.php
/home/beckydodman/web/oldshop/images/google68274020601e.php
/home/resurgence/web/Issue 272/Batch 2 for Helen/keynote_Philip Baldwin (author revise).doc
/home/resurgence/web/Issue 272/from Helen/M keynote_Philip Baldwin.doc
/home/skda/web/clients/sandbox/wp-content/themes/editorial/cache/external_dc8e1cb5bf0392f054e59734fa15469b.php
or cut
$ cut -d: -f1 input
or sed
$ sed 's/:.*$//' input
or perl in awk-mode
$ perl -F: -lane 'print $F[0]' input
finally, pure bash
#!/bin/bash
while IFS= read -r line
do
    echo "${line%%:*}"
done < input
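The `${line%%:*}` expansion removes the longest suffix starting at a colon, so a single line can be trimmed without spawning an external process:

```shell
# Parameter expansion: strip everything from the first ':' to the end.
line='/home/adversion/web/wp-content/plugins/akismet/index1.php: PHP.Mailer-7 FOUND'
echo "${line%%:*}"   # -> /home/adversion/web/wp-content/plugins/akismet/index1.php
```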
This should be enough
awk -F: '{print $1}' file-name
Here a none sed/awk solution
cut -d : -f 1 [filename]
pipe that through sed:
$ echo "/home/adversion/web/wp-content/plugins/akismet/index1.php: PHP.Mailer-7 FOUND" | sed 's/: .*$//'
/home/adversion/web/wp-content/plugins/akismet/index1.php
This works as long as ': ' appears only once per line. Note that the awk / cut examples above are more likely to fail, since they split on ':' rather than ': ', so a colon inside a path would truncate it.

How can I switch around the content of a line of text

I have a large file (around 39,000 lines of text) that consists of the following:
1:iowemiowe093j4384d
2:98j238d92dd2d
3:98h2d078h78dbe0c
(continues in the same manner)
and I need to reverse the order of the two sections of the lines, so the output would be:
iowemiowe093j4384d:1
98j238d92dd2d:2
98h2d078h78dbe0c:3
I've tried using cut to do this but have not been able to get it to behave properly (this is in a bash environment). What would be the best way to do this?
awk -F: '{print $2":"$1}' input-file
Or
awk -F: '{print $2,$1}' OFS=: input-file
If you may have more than 2 fields:
awk -F: '{printf "%s", $NF; for(i=NF-1; i; i--) printf ":%s", $i; print ""}' input-file
Or
perl -F: -anE '$\=":"; say reverse @F' input-file
or
perl -F: -anE 'say join(":", reverse @F)' input-file
(Both perl solutions are untested and, I believe, flawed; each needs a chop $F[-1] or similar to remove the newline from the input.)
One way using GNU sed:
sed -ri 's/([^:]+):(.*)/\2:\1/' file.txt
Results:
iowemiowe093j4384d:1
98j238d92dd2d:2
98h2d078h78dbe0c:3
Pure Bash and almost as fast as the awk solution from William Pursell, just not as elegant:
paste -d: <(cut -d: -f2 input-file) <(cut -d: -f1 input-file)
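A quick comparison of the awk and sed approaches on the sample data (the sed answer above uses GNU `-ri` to edit in place; here `-E` without `-i` is used so the result just prints):

```shell
printf '1:iowemiowe093j4384d\n2:98j238d92dd2d\n3:98h2d078h78dbe0c\n' > input-file
awk -F: '{print $2":"$1}' input-file
sed -E 's/([^:]+):(.*)/\2:\1/' input-file
```

Both commands print the same swapped output.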
