How can I add the same text to several files via a script? - bash

I want a simple way to add the same text (e.g. "bye", or several lines) to a group of files using a small script. I tried something with ed and vi inside a script, but it didn't work.
Edit: to be more specific:
I have the files c0001.gin, c0002.gin, ... up to, let's say, 500 files. I need to add the following text to the end of each file:
species
Ca core 2.00000000
Co core 2.00000000
C core 1.34353898
O core 1.01848700
O shel -2.13300000
buck intra
O core O core 4030.3000 0.245497 0.00000000 0.00 2.50 1 0 0
buck
Ca core O shel 2154.0600 0.289118 0.00000000 0.00 10.00 1 0 0
Co core O shel 1095.6000 0.286300 0.00000000 0.00 10.00 1 0 0
Ca core C core 120000000.000 0.120000 0.00000000 0.00 10.00 1 0 0
Co core C core 95757575.760 0.120000 0.00000000 0.00 10.00 1 0 0
buck inter
O shel O shel 64242.454 0.198913 21.843570 0.00 15.00 1 0 0
morse intra bond
C core O core 5.0000000 2.5228 1.19820 0.0000 1 0
three
C core O core O core 1.7995 120.00
outofplane bond intra
C cor O cor O cor O cor 8.6892 360.0
spring
O 52.740087
I just want a script to do that.
Furthermore, the files are in a folder called "CALCS", and I want to move each file into its own folder inside CALCS: "001" for c0001.gin, "002" for c0002.gin, and so on.
Thanks in advance

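#!/bin/sh
# Usage: script.sh <text> <file>...
text="${1:?Usage: $0 <text> <file>...}"
shift
# After the shift, "$@" holds only the file names; quoting it keeps
# names that contain spaces intact (the original "${#}" was the argument count)
for file in "$@"
do
    # printf is more portable than echo for arbitrary text
    printf '%s\n' "$text" >> "$file"
done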
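The script above only appends the text. The question also asks for each file to be moved into a numbered folder inside CALCS. Below is a minimal Python sketch of both steps. It makes these assumptions:

- File names follow the cNNNN.gin pattern.
- Each folder name is the file's number zero-padded to three digits (c0001.gin goes to 001/), as in the question.
- The text to append is passed in as a string.

The function name append_and_move is hypothetical.

```python
import re
import shutil
from pathlib import Path

# Matches names like c0001.gin and captures the number.
NAME_RE = re.compile(r"^c(\d+)\.gin$")


def append_and_move(calcs_dir, footer):
    """Append `footer` to every cNNNN.gin file in calcs_dir, then move each
    file into a subfolder named after its number, zero-padded to three
    digits (c0001.gin -> 001/c0001.gin). Returns the new paths."""
    calcs = Path(calcs_dir)
    moved = []
    # sorted() builds the full list first, so moving files is safe
    for path in sorted(calcs.glob("c*.gin")):
        match = NAME_RE.match(path.name)
        if match is None:
            continue  # skip files that do not follow the cNNNN.gin pattern
        text = path.read_text()
        with path.open("a") as handle:
            if text and not text.endswith("\n"):
                handle.write("\n")  # keep the footer off the last existing line
            handle.write(footer)
        folder = calcs / f"{int(match.group(1)):03d}"
        folder.mkdir(exist_ok=True)
        target = folder / path.name
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

For example, append_and_move("CALCS", Path("footer.txt").read_text()), where footer.txt is a hypothetical file holding the species/buck/spring block from the question. Files that do not match the pattern are left where they are.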

Related

Find a column from a file in another file based on value and order

I have two files and would like to find out which parts of file 1 occur in the same order/sequence in file 2, based on one of several columns (col4). The files are sorted by an identifier in col1 (from 1 to n), but the identifier is not the same between the files. The rows of file 1 always occur as one contiguous block in file 2.
file1:
x 1
x 2
x 3
file2:
y 5
y 1
y 2
y 3
y 6
output:
y 1
y 2
y 3
Another thing to take into consideration is that the entries in the column to be filtered on are not unique.
I already tried
awk 'FNR==NR{ a[$2]=$2;next } ($2 in a)' file1 file2 > output
but it only works if you have unique identifiers.
To clarify with real-life data: I would like to extract the rows of file 2 that occur in the same order as in file 1, based on column 4.
File1:
ATOM 13 O ALA A 2 37.353 35.331 -19.903 1.00 71.02 O
ATOM 18 O TRP A 3 38.607 32.133 -18.273 1.00 69.13 O
File2:
ATOM 1 N MET A 1 42.218 38.990 -18.511 1.00 64.21 N
ATOM 10 CA ALA A 2 38.451 37.475 -20.033 1.00 71.02 C
ATOM 13 O ALA A 2 37.353 35.331 -19.903 1.00 71.02 O
ATOM 18 O TRP A 3 38.607 32.133 -18.273 1.00 69.13 O
ATOM 29 CA ILE A 4 38.644 33.633 -15.907 1.00 72.47 C
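The awk attempt checks each key on its own, which is why duplicate keys break it. A different approach is to compare the whole key sequence from file 1 against every window of the same length in file 2. Below is a minimal Python sketch of that sliding-window comparison. It assumes whitespace-separated columns and 1-based column numbers, as in awk, and it returns only the first matching block. The function names find_block and read_rows are hypothetical.

```python
def find_block(rows1, rows2, col):
    """Return the first contiguous block of rows2 whose values in column
    `col` (1-based, as in awk) equal those of rows1, in the same order.
    Returns an empty list if no such block exists."""
    key1 = [line.split()[col - 1] for line in rows1]
    key2 = [line.split()[col - 1] for line in rows2]
    n = len(key1)
    # Slide a window of length n over file 2. Duplicate keys are fine
    # because the whole sequence must match, not individual values.
    for start in range(len(key2) - n + 1):
        if n and key2[start:start + n] == key1:
            return rows2[start:start + n]
    return []


def read_rows(path):
    """Read the non-empty lines of a whitespace-separated text file."""
    with open(path) as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]
```

For the PDB example, find_block(read_rows("file1"), read_rows("file2"), 4) compares the residue names ALA, TRP against file 2. It skips the MET/ALA and ALA/ALA windows and returns the rows for atoms 13 and 18.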

Add heteroatom to pdb file

I am using Biopython to perform various operations on a PDB file. Next, I would like to add some new atoms to the structure object that Biopython generated. Is there a good or recommended way to do this in Python? It seems Biopython only provides options to write out existing elements of a PDB file, not to create new ones.
You could have a look at the Python package Biotite (https://www.biotite-python.org/), a package I am developing.
In the following example code, a PDB structure is downloaded, read and then an atom is added:
import biotite.database.rcsb as rcsb
import biotite.structure as struc
import biotite.structure.io as strucio
# Download lysozyme structure for example
file_name = rcsb.fetch("1aki", "pdb", target_path=".")
# Read the file into Biotite's structure object (atom array)
atom_array = strucio.load_structure(file_name)
# Add a HETATM
atom = struc.Atom(
    coord=[1.0, 2.0, 3.0],
    chain_id="A",
    # The residue ID is the last ID in the file + 1
    res_id=atom_array.res_id[-1] + 1,
    res_name="ABC",
    hetero=True,
    atom_name="CA",
    element="C",
)
atom_array += struc.array([atom])
# Save edited structure
strucio.save_structure("1aki_edited.pdb", atom_array)
The last lines of 1aki_edited.pdb:
...
HETATM 1075 O HOH A 203 12.580 21.214 5.006 1.00 0.000 O
HETATM 1076 O HOH A 204 19.687 23.750 -4.851 1.00 0.000 O
HETATM 1077 O HOH A 205 27.098 35.956 -12.358 1.00 0.000 O
HETATM 1078 O HOH A 206 37.255 9.634 10.002 1.00 0.000 O
HETATM 1079 O HOH A 207 43.755 23.843 8.038 1.00 0.000 O
HETATM 1080 CA ABC A 208 1.000 2.000 3.000 1.00 0.000 C
I have used RDKit to add and edit atoms in PDB files successfully. Below is a small example of how to add a carbon atom to a PDB file and write the result to a new .pdb file:
from rdkit import Chem
from rdkit.Geometry import Point3D
prot = Chem.MolFromPDBFile("./3etr.pdb")  # read in the .pdb file
# create an editable mol object from the protein read above
mw = Chem.RWMol(prot)
# add a carbon atom to the editable mol. AddAtom returns the (0-based) index
# of the new atom, which equals prot.GetNumAtoms()
c_idx = mw.AddAtom(Chem.Atom(6))
# the conformer holds the atom coordinates and other attributes
mw_conf = mw.GetConformer()
# cartesian coordinates of the new atom
coord = Point3D(1.0, 2.0, 3.0)
# set the new coordinates
mw_conf.SetAtomPosition(c_idx, coord)
# save the edited molecule (the mol, not the conformer) to a new PDB file
Chem.MolToPDBFile(mw, "_out.pdb")

Stata: Transposing panel rows to column

I am trying to rearrange the following panel data set into a form that I can merge with another data set. I would like to transform this:
Gender Year IndA IndB IndC
1 2008 0.22 0.34 0.45
2 2008 0.78 0.66 0.55
1 2009 0.25 0.36 0.49
2 2009 0.75 0.64 0.51
1 2010 0.28 0.38 0.48
2 2010 0.72 0.62 0.52
Into:
(ID) Year Industry 1 2
1 2008 A 0.22 0.78
2 2009 A 0.25 0.75
3 2010 A 0.28 0.72
4 2008 B 0.34 0.66
5 2009 B 0.36 0.64
6 2010 B 0.38 0.62
7 2008 C 0.45 0.55
8 2009 C 0.49 0.51
9 2010 C 0.48 0.52
I am new to Stata and am having difficulty reshaping both the columns and the genders.
See help reshape. One way to do this is consecutive reshapes. You can execute the first line, look at the data in the data browser, then execute the second line to see how this works. You will also need to choose a name other than 1 and 2 for the final variables.
reshape long Ind, i(Year Gender) j(Industry) string
reshape wide Ind, i(Year Industry) j(Gender)
You can also replace the first reshape with a stack (less legible, but can sometimes be faster than a reshape):
stack Gender Year IndA Gender Year IndB Gender Year IndC, into(Gender Year Y) clear
rename _stack Industry
lab define Industry 1 "A" 2 "B" 3 "C"
lab val Industry Industry
reshape wide Y, i(Industry Year) j(Gender)
sort Industry Year
gen id = _n
order id Year Industry
list, sepby(Industry) noobs
As a third variation on the same theme, note that proportions for the two Genders sum to 1, so we only need one.
clear
input Gender Year IndA IndB IndC
1 2008 0.22 0.34 0.45
2 2008 0.78 0.66 0.55
1 2009 0.25 0.36 0.49
2 2009 0.75 0.64 0.51
1 2010 0.28 0.38 0.48
2 2010 0.72 0.62 0.52
end
drop if Gender == 1
drop Gender
reshape long Ind , i(Year) j(Type) string
list , sepby(Year)
+-------------------+
| Year Type Ind |
|-------------------|
1. | 2008 A .78 |
2. | 2008 B .66 |
3. | 2008 C .55 |
|-------------------|
4. | 2009 A .75 |
5. | 2009 B .64 |
6. | 2009 C .51 |
|-------------------|
7. | 2010 A .72 |
8. | 2010 B .62 |
9. | 2010 C .52 |
+-------------------+
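The two consecutive reshapes first go long (one record per year, gender and industry) and then wide (one column per gender). To see what that does to the data, here is the same transformation sketched in plain Python rather than Stata. It is an illustration only. It assumes the data are a list of (Gender, Year, IndA, IndB, IndC) tuples and that the industries are A, B and C. The function name long_then_wide is hypothetical.

```python
def long_then_wide(rows):
    """rows: (gender, year, ind_a, ind_b, ind_c) tuples.
    Step 1 mimics `reshape long`: one value per (industry, year, gender).
    Step 2 mimics `reshape wide ..., j(Gender)`: one row per (industry, year)
    with one column per gender. Rows are sorted by industry, then year,
    and prefixed with a running id, like `gen id = _n` after sorting."""
    industries = ["A", "B", "C"]
    genders = sorted({row[0] for row in rows})
    # Step 1: long format, keyed by (industry, year, gender)
    long_data = {}
    for gender, year, *values in rows:
        for industry, value in zip(industries, values):
            long_data[(industry, year, gender)] = value
    # Step 2: wide format, one column per gender
    wide = {}
    for (industry, year, gender), value in long_data.items():
        wide.setdefault((industry, year), {})[gender] = value
    result = []
    for row_id, (industry, year) in enumerate(sorted(wide), start=1):
        by_gender = wide[(industry, year)]
        result.append((row_id, year, industry) + tuple(by_gender.get(g) for g in genders))
    return result
```

Applied to the input table, this yields the nine rows of the requested layout. Row 9 is (9, 2010, "C", 0.48, 0.52).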

How to extract lines that are within radius of cartesian coordinates

I have a data file in the following format:
ATOM 4 N ASP A 1 105.665 49.507 41.867 1.00 71.64 N
ATOM 5 CA ASP A 1 105.992 48.589 42.982 1.00 70.20 C
ATOM 6 C ASP A 1 107.024 49.191 43.936 1.00 69.70 C
In row 1, the numbers 105.665, 49.507 and 41.867 (columns 7, 8 and 9) are the x, y and z coordinates. How do I extract every line whose coordinates are within a specified radius and write those lines to another file? The equation relating the coordinates to the radius is:
radius = sqrt(x^2 + y^2 + z^2)
I think you mean this:
awk -v R=124.44 '($7^2)+($8^2)+($9^2) < R^2' YourFile > OutFile
Change R=124.44 to match your radius; the matching lines are written to OutFile.
Sample Output
ATOM 4 N ASP A 1 105.665 49.507 41.867 1.00 71.64 N
ATOM 5 CA ASP A 1 105.992 48.589 42.982 1.00 70.20 C
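The awk one-liner compares squared distances, so no square root is needed for each line. Below is a minimal Python sketch of the same filter, generalized to an arbitrary centre point. A few things in it go beyond the original awk answer:

- The centre parameter is an assumption. The question measures from the origin, which is the default here.
- Keeping only ATOM/HETATM records is also an assumption, added so that header lines cannot break the float conversion.
- Like the awk version, it splits on whitespace, so it assumes the coordinate fields are separated by spaces.

The function name within_radius is hypothetical.

```python
def within_radius(lines, radius, center=(0.0, 0.0, 0.0)):
    """Yield the lines whose x, y, z (columns 7-9, as in the awk answer)
    lie strictly within `radius` of `center`."""
    cx, cy, cz = center
    r2 = radius * radius
    for line in lines:
        fields = line.split()
        if len(fields) < 9 or fields[0] not in ("ATOM", "HETATM"):
            continue  # skip headers and other non-coordinate records
        x, y, z = (float(v) for v in fields[6:9])
        # same test as ($7^2)+($8^2)+($9^2) < R^2, shifted to the centre
        if (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 < r2:
            yield line
```

Lines read from an open file keep their newline, so dst.writelines(within_radius(src, 124.44)) copies the matching lines unchanged, with src and dst being open input and output files.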

How to transform a correlation matrix into a single row?

I have a 200x200 correlation matrix text file that I would like to turn into a single row.
e.g.
a b c d e
a 1.00 0.33 0.34 0.26 0.20
b 0.33 1.00 0.40 0.48 0.41
c 0.34 0.40 1.00 0.59 0.35
d 0.26 0.48 0.59 1.00 0.43
e 0.20 0.41 0.35 0.43 1.00
I want to turn it into:
a_b a_c a_d a_e b_c b_d b_e c_d c_e d_e
0.33 0.34 0.26 0.20 0.40 0.48 0.41 0.59 0.35 0.43
I need code that can:
1. Join the variable names to make a single row of headers (e.g. turn "a" and "b" into "a_b") and
2. Turn only one half of the correlation matrix (bottom or top triangle) into a single row
A bit of extra information: I have around 500 participants in a study and each of them has a correlation matrix file. I want to consolidate these separate data files into one file where each row is one participant's correlation matrix.
Does anyone know how to do this?
Thanks!!
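One way to do this is sketched in Python below. It makes these assumptions:

- Each participant's file is whitespace-separated.
- Each file has a header row of variable names, then one row per variable with its label first, as in the example.
- All files list the variables in the same order.

It reads the upper triangle row by row, which gives the a_b, a_c, ..., d_e order from the question. The function names flatten_upper and combine are hypothetical, and so is the output layout: tab-separated, with the file name as participant ID.

```python
from pathlib import Path


def flatten_upper(text):
    """Return (headers, values) for the upper triangle, diagonal excluded,
    of a correlation matrix given as whitespace-separated text."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    names = lines[0]
    # drop the row label in the first column of each data row
    matrix = [row[1:] for row in lines[1:]]
    headers, values = [], []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            headers.append(f"{names[i]}_{names[j]}")
            values.append(matrix[i][j])
    return headers, values


def combine(folder, pattern, out_path):
    """Write one tab-separated row per matching file (one participant each),
    under a header row taken from the first file."""
    header, rows = None, []
    for path in sorted(Path(folder).glob(pattern)):
        headers, values = flatten_upper(path.read_text())
        if header is None:
            header = ["participant"] + headers
        rows.append([path.stem] + values)
    if header is None:
        return  # no files matched the pattern
    with open(out_path, "w") as out:
        for row in [header] + rows:
            out.write("\t".join(row) + "\n")
```

For a 200x200 matrix, each row holds 200 * 199 / 2 = 19900 values. The values are copied as strings, so their original formatting is preserved.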
