Exclude a column when pasting two data files - bash

I have one file "dat1.txt" which is like:
0 5.71159e-01
1 1.92632e-01
2 -4.73603e-01
and another file "dat2.txt" which is:
0 5.19105e-01
1 2.29702e-01
2 -3.05675e-01
To combine these two files into one I use
paste dat1.txt dat2.txt > data.txt
But I do not want the 1st column of the 2nd file in the output file. How do I modify the unix command?

If your files are in sorted order along column 1, you could try:
join dat[12].txt
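As a rough illustration of the join suggestion, the sketch below recreates the two sample files from the question in a scratch directory (the directory is only an assumption for the demo) and joins them on the first column. join prints the shared key once, followed by the remaining fields of each file. It expects both inputs to be sorted on that key in the order sort produces, which holds for the keys 0, 1, 2 shown here.

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d) && cd "$tmp"
printf '%s\n' '0 5.71159e-01' '1 1.92632e-01' '2 -4.73603e-01' > dat1.txt
printf '%s\n' '0 5.19105e-01' '1 2.29702e-01' '2 -3.05675e-01' > dat2.txt

# Join on the first field; the key appears only once per output line.
join dat1.txt dat2.txt > data.txt
cat data.txt
```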

You could also do this in awk alone:
$ awk 'FNR==NR {a[FNR]=$0;next} {print a[FNR],$2}' dat1.txt dat2.txt
0 5.71159e-01 5.19105e-01
1 1.92632e-01 2.29702e-01
2 -4.73603e-01 -3.05675e-01

Use cut to remove the first column and then pipe to paste.
cut -d' ' -f 1 --complement dat2.txt | paste dat1.txt - > data.txt
Note that the - in the paste command means to read from stdin in place of the second file.
The --complement option is a GNU extension, so it may not be available in the cut that ships with OS X. In that case awk might work:
awk '{sub(/^[^ ]+ +/, ""); print}' dat2.txt | paste dat1.txt - > data.txt

paste dat1.txt <(cut -d" " -f2- dat2.txt)
Using cut to remove column 1, and using process substitution to use its output in paste
Output:
0 5.71159e-01 5.19105e-01
1 1.92632e-01 2.29702e-01
2 -4.73603e-01 -3.05675e-01

Related

Print common values in columns using bash

I have file with two columns
apple apple
ball cat
cat hat
dog delta
I need to extract values that are common in two columns (occur in both columns) like
apple apple
cat cat
There is no ordering in items in each column.
Could you please try following and let me know if this helps you.
awk '
{
col1[$1]++;
col2[$2]++;
}
END{
for(i in col1){
if(col2[i]){
while(++count<=(col1[i]+col2[i])){
printf("%s%s",i,count==(col1[i]+col2[i])?ORS:OFS)}
count=""}
}
}' Input_file
NOTE: This prints each value that is found in both columns, repeated as many times as it occurs in the two columns combined.
$ awk '{a[$1];b[$2]} END{for(k in a) if(k in b) print k}' file
apple
cat
To print each value twice, change print k to print k,k.
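A small sketch of the two-array approach on the sample data from the question, assuming the input is saved as file in a scratch directory. awk's for (k in a) loop visits keys in no particular order, so the result is piped through sort here to make the output predictable.

```shell
# Sample two-column input from the question.
tmp=$(mktemp -d) && cd "$tmp"
printf '%s\n' 'apple apple' 'ball cat' 'cat hat' 'dog delta' > file

# Record every value of column 1 and column 2 as array keys,
# then print the keys that exist in both arrays.
common=$(awk '{a[$1]; b[$2]} END {for (k in a) if (k in b) print k}' file | sort)
printf '%s\n' "$common"
```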
with sort/join
$ join <(cut -d' ' -f1 file | sort) <(cut -d' ' -f2 file | sort)
apple
cat
perhaps,
$ function f() { cut -d' ' -f"$1" file | sort; }; join <(f 1) <(f 2)
Assuming I can use unix commands:
cut -d' ' -f2 file | grep -Ex "$(cut -d' ' -f1 file | paste -sd'|' -)"
Basically what this does is this:
The inner cut command collects all the words in the first column. The paste command joins them with a pipe (i.e. apple|ball|cat|dog).
The outer cut command takes the second column of words and pipes them into grep -E (extended regular expressions); -x makes the pattern match whole lines only, so cat does not also match catalog.
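To make the two steps visible, here is a sketch that builds the alternation pattern first and then applies it. It assumes the input is named file and uses grep -Ex, where -x is an addition that restricts matches to whole lines.

```shell
# Sample two-column input from the question.
tmp=$(mktemp -d) && cd "$tmp"
printf '%s\n' 'apple apple' 'ball cat' 'cat hat' 'dog delta' > file

# Step 1: join all column-1 words with "|" to form one regex.
pattern=$(cut -d' ' -f1 file | paste -sd'|' -)
echo "$pattern"

# Step 2: keep the column-2 words that match the regex as a whole line.
matches=$(cut -d' ' -f2 file | grep -Ex "$pattern")
echo "$matches"
```

Note that a word containing regex metacharacters (such as . or +) would need escaping before being used this way.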
Here is the closest I could get. Maybe you could loop through whole file and print when it reaches another occurrence.
Code
cat file.txt | gawk '$1==$2 {print $1,"=",$2}'
or
gawk '$1==$2 {print $1,"=",$2}' file.txt

Comparing 2 files with a for loop in bash

I am trying to compare the values in 2 files. For each row in Summits3.txt I want to define the value in Column 1 as "Chr" and then find the rows in generef.txt which have my value for "Chr" in column 2.
Then I would like to output some info about that row from generef.txt to out.txt and then repeat until the end.
I am using the following script:
#!/bin/bash
IFS=$'\n'
for i in $(cat Summits3.txt)
do
Chr=$(echo "$i" | awk '{print $1}')
awk -v var="$Chr" '{
if ($2==""'${Chr}'"")
print $2, $3
}' generef.txt > out.txt
done
It "works", but it's only comparing values from the last line of Summits3.txt. It seems like it's not looping through the awk bit.
Anyway please help if you can!
I think you might be looking for something like this:
awk 'FNR == NR {a[$1]; next} $2 in a {print $2, $3}' Summits3.txt generef.txt > out.txt
Basically, you read column one of the first file into an array (the array index is your Chr and the value is empty); then, for the second file, you print only the rows whose second column is an index of that array. FNR is the row number within the file currently being processed and NR is the number of rows processed so far across all files, so FNR == NR is true only while the first file is being read. This is a general look-up command I use for pulling out genes or variants from one file that are present in the other.
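The look-up can be tried on a small example. The file contents below are hypothetical, since the question did not show the real data; only the file names and column positions are taken from it.

```shell
tmp=$(mktemp -d) && cd "$tmp"
# Hypothetical sample data: Chr in column 1 of Summits3.txt,
# Chr in column 2 of generef.txt.
printf '%s\n' 'chr1 100' 'chr3 250' > Summits3.txt
printf '%s\n' 'geneA chr1 10' 'geneB chr2 20' 'geneC chr3 30' 'geneD chr10 40' > generef.txt

# First file: remember each Chr as an array key.
# Second file: print columns 2 and 3 when column 2 is a remembered key.
awk 'FNR == NR {a[$1]; next} $2 in a {print $2, $3}' Summits3.txt generef.txt > out.txt
cat out.txt
```

Because the comparison is an exact key look-up, chr10 is not picked up by chr1.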
In your code above, the redirection should append to out.txt: >> out.txt. But then you have to make sure to reset out.txt before each run.
Besides calling external programs inside a loop (which is expensive), the first thing we see is that you redirect your output to a file from inside the loop. The output file is recreated on each iteration, so please change this to append (>>) or, better, move the redirection outside the loop.
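The difference between the two placements of the redirection can be seen in a minimal sketch; the file names inside.txt and outside.txt are made up for the demonstration.

```shell
tmp=$(mktemp -d) && cd "$tmp"

# Redirecting inside the loop truncates the file on every iteration,
# so only the last value survives.
for x in a b c; do
    echo "$x" > inside.txt
done

# Redirecting the loop as a whole opens the file once,
# so every iteration's output is kept.
for x in a b c; do
    echo "$x"
done > outside.txt

cat inside.txt
cat outside.txt
```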
When you want to use a loop, try this
while read -r Chr other; do
cut -d" " -f2,3 generef.txt | grep -E "^${Chr} "
done < Summits3.txt > out.txt
When you want to avoid the loop (needed for large input files), awk or a combination of commands can be used.
A first attempt can give false matches, because each pattern may match anywhere in the line:
grep -f <(cut -d" " -f1 Summits3.txt) <(cut -d" " -f2,3 generef.txt)
You only want matches of the complete field Chr, so the pattern should match from the first position up to a space (I assume that is the field separator).
grep -f <(cut -d" " -f1 Summits3.txt| sed 's/.*/^& /') <(cut -d" " -f2,3 generef.txt)
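The effect of anchoring the patterns can be checked with a sketch like the following. The data is hypothetical, and the patterns are written to temporary files (loose.pat, strict.pat, names chosen for this example) instead of using process substitution, so that it also runs in a plain POSIX shell.

```shell
tmp=$(mktemp -d) && cd "$tmp"
# Hypothetical data: chr1 should not match chr10.
printf '%s\n' 'chr1 100' > Summits3.txt
printf '%s\n' 'geneA chr1 10' 'geneD chr10 40' > generef.txt
cut -d' ' -f2,3 generef.txt > cols.txt

# Unanchored patterns match anywhere in the line.
cut -d' ' -f1 Summits3.txt > loose.pat
loose=$(grep -f loose.pat cols.txt)

# Anchored patterns: start of line, the Chr, then the separating space.
sed 's/.*/^& /' loose.pat > strict.pat
strict=$(grep -f strict.pat cols.txt)

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```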

Subtract length element two columns

I've a file from which I get two columns: cut -d $'\t' -f 4,5 file.txt
Now I would like to get the difference in length of each element between column 1 and 2.
Input from cut command
A T
AA T
AC TC
A CT
What I would expect
0
1
0
-1
Using awk.
awk ' {print length($1) - length($2)} ' cutoutput.txt
Or awk on the original file you can simply do:
awk ' {print length($4) - length($5)} ' file.txt
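As a quick check of the length() approach, this sketch recreates the four sample rows from the question as tab-separated text (the format the cut command in the question produces) and prints the difference per row.

```shell
tmp=$(mktemp -d) && cd "$tmp"
# The two columns from the question, separated by tabs.
printf 'A\tT\nAA\tT\nAC\tTC\nA\tCT\n' > cutoutput.txt

# length() returns the number of characters in a field.
diffs=$(awk -F'\t' '{print length($1) - length($2)}' cutoutput.txt)
printf '%s\n' "$diffs"
```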
You can probably do this with awk alone, without using cut. Since I don't have the original input file, I would pipe the output of your cut command into awk:
cut -d $'\t' -f 4,5 file.txt | \
awk -F'\t' '{print length($1) - length($2)}'

Split numbers and store them in different files using unix shell script

I have a file called "list.txt" which contains the following rows of numbers.
31056780
31909020
31092320
61093190
61094592
45090280
45902902
I now need to take all the rows starting with "31" and store them in another file called file31.txt, take all the rows starting with "61" and store them in file61.txt, and take all the rows starting with "45" and store them in file45.txt.
file31.txt will contain.
31056780
31909020
31092320
file61.txt will contain.
61093190
61094592
file45.txt will contain.
45090280
45902902
I tried this command for all 3 but it does not do what I want it to do.
awk -F\" '/31*/ {print $0}' list.txt > file31
awk -F\" '/61*/ {print $0}' list.txt > file61
awk -F\" '/45*/ {print $0}' list.txt > file45
You can use output redirection inside a single awk script. It can construct the filename from the first two characters of the line.
awk '{ fn = "file" substr($0, 1, 2) ".txt"; print > fn }' list.txt
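A sketch of this on the sample list from the question, using the file prefix the question asks for (file31.txt, file61.txt, file45.txt) and a scratch directory that is only an assumption for the demo:

```shell
tmp=$(mktemp -d) && cd "$tmp"
printf '%s\n' 31056780 31909020 31092320 61093190 61094592 45090280 45902902 > list.txt

# Build the output name from the first two characters of each line
# and send the line to that file.
awk '{ fn = "file" substr($0, 1, 2) ".txt"; print > fn }' list.txt

ls file*.txt
cat file61.txt
```

Inside awk, > truncates a file only the first time it is opened during the run, so later lines with the same prefix are added to the same file.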
You could use grep or sed to filter the lines with a matching pattern, for example:
sed '/^31/!d' list.txt > file31.txt
Or in a for loop for every number you want:
for n in "31" "45" "61"; do sed '/^'"$n"'/!d' list.txt > "file$n.txt"; done
Hope it helps.
You can use:
awk '/^31/{print > "file31.txt"} /^45/{print > "file45.txt"} /^61/{print > "file61.txt"}' list.txt
for i in $(cut -c1-2 list.txt | sort -u); do grep "^${i}" list.txt > "file${i}.txt"; done
This command is generic enough to work for any set of two-character prefixes.
Now let's understand how it works. sort -u is used rather than uniq alone, because uniq only collapses adjacent duplicates.
cut -c1-2 list.txt | sort -u
31
45
61
Next we loop over these unique prefixes to create the new files using
grep "^${i}" list.txt
grep prints the lines that match the pattern; the ^ anchors it, so the prefix is matched only at the beginning of the line.

Merge two files in linux with different column

I have two files in linux, the first file has 4 columns and the second has 2 columns. I want to merge these files into a new file that has the first 3 columns from file 1 and the first column from file 2. I tried awk, but my data from file 2 was placed under file 1.
paste file1 file2 | awk '{print $1,$2,$3,$5}'
Not sure which columns you want from each file, but something like this should work:
paste file1 file2 | awk '{print $1,$2,$3,$5}'
The first three columns are taken from file1 and its fourth is skipped; $5 is then the first column of file2.
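For illustration, here is the same pipeline on two small made-up files (the contents are hypothetical; only the column counts follow the question). paste puts the lines side by side, so the two columns of file2 become fields 5 and 6.

```shell
tmp=$(mktemp -d) && cd "$tmp"
# Hypothetical inputs: 4 columns in file1, 2 columns in file2.
printf '%s\n' 'a1 b1 c1 d1' 'a2 b2 c2 d2' > file1
printf '%s\n' 'x1 y1' 'x2 y2' > file2

# Fields 1-3 come from file1, field 5 is the first column of file2.
paste file1 file2 | awk '{print $1, $2, $3, $5}' > merged
cat merged
```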
If the files have the same number of rows, you can do something like:
awk '{ getline v < "file2"; split( v, a ); print a[2], $1, $3 }' file1
to print columns 1 and 3 from file1 and column 2 from file2.
You can try this one without the paste command; it reads file2 first and remembers its first column by line number:
awk 'NR==FNR {a[FNR]=$1; next} {print $1, $2, $3, a[FNR]}' file2 file1 > mergedfile
