I'm actually facing a wall with my custom installation script.
At a point of the script, I need to enable the 64 bits repository for 64 bits machines and (for instance) I need to get from that format :
#multilib-testing[...]
#include[...]
#multilib[...]
#include[...]
To that format
#multilib-testing[...]
#include[...]
multilib[...]
include[...]
But as you can see, there are include everywhere and I can't use sed because it will recursively delete all the "include" of that specific file and it's not what I want...
I can't seem to find a solution with sed. I tried something I saw on another thread with
cat /etc/pacman.conf | grep -A 1 "multilib"
But I didn't get it well and I'm out of options...
Ideally, I would like to get a sed solution (but feel free to tell me what others options I could get as long as you explain !).
The pattern (and the beginning) shoud be something like that :
sed -i '/multilib/ s/#//' /etc/pacman.conf
And should be effective for the pattern and the line after (which is the include).
Also, I will be pleased if you could actually teach me why you do that or that as I'm learning and I can't remember something if I can't figure why I did like that. (also excuse my mid-game english).
We can use this to match a range by patterns. We can then match the # at the beginning of each line and remove it.
sed -i "/\[multilib\]/,/Include/"'s/^#//' /etc/pacman.conf
Related
I am trying to use a list that looks like this:
List file:
1mAF
2mAF
4mAF
7mAF
9mAF
10mAF
11mAF
13mAF
18mAF
27mAF
33mAF
36mAF
37mAF
38mAF
39mAF
40mAF
41mAF
45mAF
46mAF
47mAF
49mAF
57mAF
58mAF
60mAF
61mAF
62mAF
63mAF
64mAF
67mAF
82mAF
86mAF
87mAF
95mAF
96mAF
to grab out lines that contain a word-level match in a tab delimited file that looks like this:
File_of_interest:
11mAF-NODE_111-g7687-JEFF-tig00000037_arrow-g7396-AFLA_058530 11mAF cluster63
17mAF-NODE_343-g9350 17mAF cluster07
18mAF-NODE_34-g3647-JEFF-tig00000037_arrow-g7396-AFLA_058530 18mAF cluster20
22mAF-NODE_36-g3735 22mAF cluster28
36mAF-NODE_107-g7427 36mAF cluster77
45mAF-NODE_151-g9067 45mAF cluster14
47mAF-NODE_30-g3242-JEFF-tig00000037_arrow-g7396-AFLA_058530 47mAF cluster21
67mAF-NODE_54-g4372 67mAF cluster06
69mAF-NODE_27-g2754 69mAF cluster39
71mAF-NODE_44-g4178 71mAF cluster25
73mAF-NODE_47-g4895 73mAF cluster57
78mAF-NODE_4-g688 78mAF cluster53
but when I do grep -w -f list file_of_interest these are the only ones I get:
18mAF-NODE_34-g3647-JEFF-tig00000037_arrow-g7396-AFLA_058530 18mAF cluster20
36mAF-NODE_107-g7427 36mAF cluster77
45mAF-NODE_151-g9067 45mAF cluster14
and this misses a bunch of the values that are in the original list for example note that "67mAF" is in the list and in the file but it isn't returned.
I have tried removing everything after "mAF" in the list and trying again -- no change. I have rewritten the list in a completely new file to no avail. Oddly, I get more of them if I "sort" the list into a new file and then do the grep, but I still don't get all of them. I have also removed all invisible characters using sed (sed $'s/[^[:print:]\t]//g'). no change.
I am on OSX and both files were created on OSX, but normally grep -f -w works in the fashion i'm describing above.
I am completely flummoxed. Is I thought grep -w -f would look for all word-level matches of items in the file in the target file... am I wrong?
Thanks!
My guess is at least one of these files originates from a Windows machine and has CRLF line endings. file(1) might be used to tell you. If that is the case do:
fromdos FILE
or, alternatively:
dos2unix FILE
I am looking for a quick and dirty one-liner to sync only certain settings in remote config files. Need to preserve what's unique and sync generic settings. Example:
Config1.conf:
HOSTNAME=COMP1
IP=10.10.13.10
LOCATION=SITE_A
BUILDING=DEPT_IT
ROOM=COMP_LAB1
Remote-Config2.txt:
HOSTNAME=COMP2
IP=10.10.13.11
LOCATION=FOO
BUILDING=BAR
ROOM=BAZ
I need to sync or copy replace only the bottom 3 lines over ssh. The line numbers are predictable, by the way. Always lines 4,5 and 6 in this case.
Here's a working idea that is missing one piece (a standard replacement for the non-standard utility I used to replace the vars in the local conf):
for var in $(ssh root#10.10.8.12 'sed -n "4,6p" /etc/conf1.conf');do <missing piece> ${var/=*}=${var/*=} local-conf.conf; done
So this uses variable expansion and a non-standard utility but needs like a sed or Perl routine to replace the info in the local conf.
Update
The last line of code actually works. Tested and works! However -- the missing piece is a custom non-standard utility. I'm asking if someone can think of something, using standard Linux tools, to replace that.
One solution would be to take the left side and match, then replace the right side. This is basically what that utility does. Looks for the variable in the conf then sets it. Using variable expansion is one way (shown).
Here's an alternative solution that does not require the command to have special knowledge of the file contents:
Take a copy of the files you want to sync. Then, in the copy, deliberately vandalise (arbitrarily modify) the lines you do not want synced. It doesn't matter what they say as long as there are the same number of lines and they'll never match the actual file contents. Have some fun. This becomes your base version. Your example might look like this:
HOSTNAME=foo
IP=bar
LOCATION=SITE_A
BUILDING=DEPT_IT
ROOM=COMP_LAB1
rsync the remote files into a temporary location. This is the remote version.
For each file, take a three-way diff.
diff3 -3 <localfile> <basefile> <remotefile>
The output of diff3 is an "ed script" that decribes what edits to make to the local file so that it would look like the remote file.
The -3 option tells it to only output the non-conflicting differences. This is why we vandalised the base files in the first place: so those lines would have conflicts.
Once you have the ed script for a file, you can visually check it, if you choose, and then apply the update using patch:
cat <ed-script> | patch --ed <localfile>
So, to do this recursively, you might have:
cd $localdir
for file in `find . -type f`; do
diff3 -3 "$file" "$basedir/$file" "$remotedir/$file" | patch --ed "$file"
done
You probably need to add some checks that the base and remote files actually exist.
I want to find and replace the VALUE into a xml file :
<test name="NAME" value="VALUE"/>
I have to filter by name (because there are lot of lines like that).
Is it possible ?
Thanks for you help.
Since you tagged the question "bash", I assume that you're not trying to use an XML library (although I think an XML expert might be able to give you something like an XSLT processor command that solves this question very robustly), but that you're simply interested in doing search & replace from the commandline.
I am using perl for this:
perl -pi -e 's#VALUE#replacement#g' *.xml
See perlrun man page: Very shortly put, the -p switches perl into text processing mode, -i stands for "in-place", and -e let's you specify an expression to apply to all lines of input.
Also note (if you are not too familiar with that already) that you may use other characters than # (common ones are %, a comma, etc.) that don't clash with your search & replacement strings.
There is one small caveat: perl will read & write all files given on the commandline, even those that did not change. Thus, the files' modification times will be updated even if they did not change. (I usually work around that with some more shell magic, e.g. using grep -l or grin -l to select files for perl to work on.)
EDIT: If I understand your comments correctly, you also need help with the regular expression to apply. Let me briefly suggest something like this then:
perl -pi -e 's,(name="NAME" value=)"[^"]*",\1"NEWVALUE",g' *.xml
Related: bash XHTML parsing using xpath
You can use SED:
SED 's/\(<test name=\"NAME\"\) value=\"VALUE\"/\1 value=\"YourValue\"/' test.xml
where test.xml is the xml document containing the given node. This is very fragile, and you can work to make it more flexible if you need to do this substitution multiple times. For instance, the current statement is case sensitive, so it won't substitute the value on a node with the name="name", but you can add a case insensitivity flag to the end of the statement, like so:
('s/\(<test name=\"NAME\"\) value=\"VALUE\"/\1 value=\"YourValue\"/I').
Another option would be to use XSLT, but it would require you to download an external library. It's pretty versatile, and could be a viable option for more complex modifications to an XML document.
Apologies for a seemingly inane question. But I have spent the whole day trying to figure it out and it drives me up the walls. I'm trying to write a seemingly simple bash script that would take a list of files in the directory from ls, replace part of the file names using sed, get unique names from the list and pass them onto some command. Like so:
inputs=`ls *.ext`
echo $inputs
test1_R1.ext test1_R2.ext test2_R1.ext test2_R2.ext
Now I would like to put it through sed to replace 1.ext and 2.ext with * to get test1_R* etc. Then I'd like to remove resulting duplicates by running sort -u to arrive to the following $outputs variable:
echo $outputs
test1_R* test2_R*
And pass this onto a command, like so
cat $outputs
I can do something like this in a command line:
ls *.ext | sed s/..ext/\*/g | sort -u
But if I try to assign the above to a variable in the script it just returns the output from the ls. I have tried several ways to do it: including the whole pipe in the script. Running each command separately and assigning it to a variable, then passing that variable to the next command and writing the outputs to files then passing the file to the next command. But so far none of this managed to achieve what I aimed to. I think my problem lies in (except general cluelessness aroung bash scripting) inability to run seq on a variable within script. There seems to be a lot of advice around in how to pass variables to pattern or replacement string in sed, but they all seem to take files as input. But I understand that it might not be the proper way of doing it anyway. Therefore I would really appreciate if someone could suggest an elegant way to achieve, what I'm trying to.
Many thanks!
Update 2/06/2014
Hi Barmar, thanks for your answer. Can't say it solved the problem, but it helped pin-pointing it. Seems like the problem is in me using the asterisk. I have to say, I'm very puzzled. The actual file names I've got are:
test1_R1.fastq.gz test1_R2.fastq.gz test2_R1.fastq.gz test2_R2.fastq.gz
If I'm using the code you suggested, which seems to me the right way do to it:
ins=$(ls *.fastq.gz | sed 's/..fastq.gz/\*/g' | sort -u)
Sed doesn't seem to do anything and I'm getting the output of ls:
test1_R1.fastq.gz test1_R2.fastq.gz test2_R1.fastq.gz test2_R2.fastq.gz
Now if I replace that backslash with anything else, the sed works, but it also returns whatever character I'm putting in front (or after) the asteriks:
ins=$(ls *.fastq.gz | sed 's/..fastq.gz/"*/g' | sort -u)
test1_R"* test2_R"*
That's odd enough, but surely I can just put an "R" in front of the asteriks and then replace R in the search pattern string, right? Wrong! If I do that whichever way: 's/R..fastq.gz/R*/g' 's/...fastq.gz/R*/g' 's/[A-Z]..fastq.gz/R*/g' I'm back to the original names! And even if I end up with something like test1_RR* test2_RR* and try to run it through sed again and replace "_R" for "_" or "RR" for "R", I'm having no luck and I'm back to the original names. And yet I can replace the rest of the file name no problem, just not to get me test1_R* I need.
I have a feeling I should be escaping that * in some very clever way, but nothing I've tried seems to work. Thanks again for your help!
This is how you capture the result of the whole pipeline in a variable:
var=$(ls *.ext | sed s/..ext/\*/g | sort -u)
this is my very first post on Stackoverflow, and I should probably point out that I am EXTREMELY new to a lot of programming. I'm currently a postgraduate student doing projects involving a lot of coding in various programs, everything from LaTeX to bash, MATLAB etc etc.
If you could explicitly explain your answers that would be much appreciated as I'm trying to learn as I go. I apologise if there is an answer else where that does what I'm trying to do, but I have spent a couple of days looking now.
So to the problem I'm trying to solve: I'm currently using a selection of bioinformatics tools to analyse a range of genomes, and I'm trying to somewhat automate the process.
I have a few sequences with names that look like this for instance (all contained in folders of their own currently as paired files):
SOL2511_S5_L001_R1_001.fastq
SOL2511_S5_L001_R2_001.fastq
SOL2510_S4_L001_R1_001.fastq
SOL2510_S4_L001_R2_001.fastq
...and so on...
I basically wish to automate the process by turning these in to variables and passing these variables to each of the programs I use in turn. So for example my idea thus far was to assign them as wildcards, using the R1 and R2 (which appears in all the file names, as they represent each strand of DNA) as follows:
#!/bin/bash
seq1=*R1_001*
seq2=*R2_001*
On a rudimentary level this works, as it returns the correct files, so now I pass these variables to my first function which trims the DNA sequences down by a specified amount, like so:
# seqtk is the program suite, trimfq is a function within it,
# and the options -b -e specify how many bases to trim from the beginning and end of
# the DNA sequence respectively.
seqtk trimfq -b 10 -e 20 $seq1 >
seqtk trimfq -b 10 -e 20 $seq2 >
So now my problem is I wish to be able to append something like "_trim" to the output file which appears after the >, but I can't find anything that seems like it will work online.
Alternatively, I've been hunting for a script that will take the name of the folder that the files are in, and create a variable for the folder name which I can then give to the functions in question so that all the output files are named correctly for use later on.
Many thanks in advance for any help, and I apologise that this isn't really much of a minimum working example to go on, as I'm only just getting going on all this stuff!
Joe
EDIT
So I modified #ghoti 's for loop (does the job wonderfully I might add, rep for you :D ) and now I append trim_, as the loop as it was before ended up giving me a .fastq.trim which will cause errors later.
Is there any way I can append _trim to the end of the filename, but before the extension?
Explicit is usually better than implied, when matching filenames. Your wildcards may match more than you expect, especially if you have versions of the files with "_trim" appended to the end!
I would be more precise with the wildcards, and use for loops to process the files instead of relying on seqtk to handle multiple files. That way, you can do your own processing on the filenames.
Here's an example:
#!/bin/bash
# Define an array of sequences
sequences=(R1_001 R2_001)
# Step through the array...
for seq in ${sequences[#]}; do
# Step through the files in this sequence...
for file in SOL*_${seq}.fastq; do
seqtk trimfq -b 10 -e 20 "$file" > "${file}.trim"
done
done
I don't know how your folders are set up, so I haven't addressed that in this script. But the basic idea is that if you want the script to be able to manipulate individual filenames, you need something like a for loop to handle the that manipulation on a per-filename basis.
Does this help?
UPDATE:
To put _trim before the extension, replace the seqtk line with the following:
seqtk trimfq -b 10 -e 20 "$file" > "${file%.fastq}_trim.fastq"
This uses something documented in the Bash man page under Parameter Expansion if you want to read up on it. Basically, the ${file%.fastq} takes the $file variable and strips off a suffix. Then we add your extra text, along with the suffix.
You could also strip an extension using basename(1), but there's no need to call something external when you can use something built in to the shell.
Instead of setting variables with the filenames, you could pipe the output of ls to the command you want to run with these filenames, like this:
ls *R{1,2}_001* | xargs -I# sh -c 'seqtk trimfq -b 10 -e 20 "$1" > "${1}_trim"' -- #
xargs -I# will grab the output of the previous command and store it in # to be used by seqtk