Understanding the sed command - bash

I'm trying to change the first line of every file contained in a parent directory (and its subdirectories) so that it takes on the path of the directory the file is in.
For example I have a file with the format:
2000-01-18
Tuesday
Livingston
42178
This particular file is in a directory named 18, inside another directory named 01, which is in another directory named 2000, which is in a directory called filesToSort.
I managed to use this code as a console command to change the first line of the file:
perl -pi -w -e 's/2000-01-18/Test/g;' ff_1177818640
This changed the file to
Test
Tuesday
Livingston
42178
Is it possible to change the "date" in this command so that it selects any date? I tried it like this:
perl -pi -w -e 's/*/Test/g;' ff_1177818640
But it didn't like that at all.
My current thought process is that if I can make this command select any date in the input, I can then somehow put the pathname into the second part, where I currently have "Test", using something like this:
path=/filesToSort/2000/01/18/ff_1177818640
file=$(basename "$path")
I should, in theory, be able to run this entire code through my parent directory and all of its subdirectories, changing the date on line 1 of every file in those directories to mirror the path of the directory the file is in, effectively turning a file that looks like this:
2000-xx-18
Tuesday
Livingston
42178
Contained in directory /filesToSort/2000/01/18
into this:
2000/01/18
Tuesday
Livingston
42178
I'm not sure whether I'm just using the sed command wrong here or whether there is another command I should be using instead, but I've been trying to get this to work for 4 hours now and I can't seem to nail it.
Thanks in advance for the help!

It looks to me like what you want to do is basically translate "-" into "/" on the first line. You could find the file, take a backup of it (always a good idea), and then use:
sed '1s|-|/|g' <path/backup_file >path/modified_file
That said, if it is part of the same assignment, you could use that command to copy the file to its new directory and modify it at the same time.

You haven't posted a sed command, so it's hard to know what will work. Let's take this in small steps. Try this:
sed -i '1s/^/X/' ff_1177818640
and see if that modifies the file (adding 'X' to the beginning of the first line). If your version of sed doesn't like that, try this:
sed -i "" '1s/^/X/' ff_1177818640
Once you have the syntax working, we must tackle the problem of converting a path into a date. Try this:
echo some/path/filesToSort/2000/01/18/ff_1177818640 | sed 's|.*/filesToSort/||; s|/[^/]*$||'
If that produces "2000/01/18", post a comment, and we can put it all together.
EDIT: putting it all together. Abracadabra!
find . -type f -exec sed -i "1{s|.*|{}|;s|.*/filesToSort/||;s|/[^/]*$||;}" {} \;
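A minimal bash sketch of the same idea, under a few assumptions: it is run from the directory that contains filesToSort, the directory path relative to that root is what should go on line 1, and `stamp_first_lines` is a hypothetical helper name. Instead of embedding `{}` inside the sed script (POSIX leaves it implementation-defined whether find substitutes `{}` when it is only part of an argument), it computes the path with parameter expansion and rebuilds each file through a scratch file kept outside the tree being walked.

```shell
# Replace line 1 of every file below ROOT with that file's directory path
# relative to ROOT, e.g. filesToSort/2000/01/18/ff_1177818640 -> 2000/01/18.
# Hypothetical helper; bash-only (process substitution, read -d '').
stamp_first_lines() {
  local root=${1%/} f dir rel tmp
  tmp=$(mktemp) || return 1            # scratch file outside the tree
  while IFS= read -r -d '' f; do
    dir=${f%/*}                        # drop the file name
    rel=${dir#"$root"/}                # drop the leading ROOT/
    {
      printf '%s\n' "$rel"             # new first line
      tail -n +2 -- "$f"               # everything after the old first line
    } > "$tmp" && cat -- "$tmp" > "$f" # cat keeps the file's inode and permissions
  done < <(find "$root" -mindepth 2 -type f -print0)
  rm -f -- "$tmp"
}
```

Usage: `stamp_first_lines filesToSort`. Files directly inside filesToSort are skipped by `-mindepth 2`, since they have no date directories above them. As with any bulk edit, take a backup first.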

Related

Bash: Identifying file based on part of filename

I have a folder containing paired files with names that look like this:
PB3999_Tail_XYZ_1234.bam
PB3999_PB_YWZ_5524.bam
I want to pass the files into a for loop as such:
for input in `ls PB*_Tail_.bam`; do tumor=${input%_Tail_*.bam}; $gatk Mutect2 -I $input -I$tumor${*}; done
The issue is, I can't seem to get the syntax right for the tumor input. I want it to recognise the paired file by the first part of the name PB3999_PB while ignoring the second half of the file name _YWZ_5524 that does not match.
Thank you for any help!
I just replaced ${*} with * and appended the _PB_ suffix to the prefix in the script from the question, and renamed the variables.
for tailfname in PB*_Tail_*.bam; do
pairprefix="${tailfname%_Tail_*.bam}"
echo command with ${tailfname} ${pairprefix}_PB_*.bam
done
Hope this helps. The name tumor sounds scary. Hope the right files are paired.
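Because `${pairprefix}_PB_*.bam` can match zero files or several, a more cautious variant checks that exactly one partner exists before calling Mutect2. This is a sketch: `pair_for` is a hypothetical helper, and `$gatk` is assumed to hold the GATK launcher as in the question.

```shell
# Print the single _PB_ partner of a _Tail_ file, or fail if there is
# no partner or more than one. Hypothetical helper.
pair_for() {
  local tailf=$1 prefix
  prefix=${tailf%_Tail_*.bam}            # PB3999_Tail_XYZ_1234.bam -> PB3999
  local matches=( "$prefix"_PB_*.bam )   # an unmatched glob stays literal
  if (( ${#matches[@]} == 1 )) && [[ -e ${matches[0]} ]]; then
    printf '%s\n' "${matches[0]}"
  else
    echo "no unique _PB_ partner for $tailf" >&2
    return 1
  fi
}

for tailf in PB*_Tail_*.bam; do
  [[ -e $tailf ]] || continue
  pbf=$(pair_for "$tailf") || continue
  echo "$gatk" Mutect2 -I "$tailf" -I "$pbf"   # drop "echo" to run for real
done
```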
I'm trying to fully understand what you want to do here.
If you want to extract just the first two parts, this should do:
echo "PB3999_Tail_XYZ_1234.bam" | cut -d '_' -f 1-2
That returns just the "PB3999_Tail" part.

Linux sed command that generates a new file on every regex match

I have the following Linux command which I am using to extract data from one very large log file.
sed -n "/<trade>/,/<\/trade>/p" Large.log > output.xml
However, the output is generated in a single file output.xml. My intention is to create a new file every time the "/<trade>/,/<\/trade>/p" is matched. Every new file will be named after the <id> tag which is inside the <trade> </trade> tags.
Something like this...
sed -n "/<trade>/,/<\/trade>/p" Large.log > "/<id>/,/<\/id>/p".xml
However, that, of course, does not work and I am not sure how to apply a regex as a naming rule.
P.S. At this point, I am also not sure whether I should use sed or try achieving this with awk.
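sed cannot pick an output file name from the data it is reading, but awk can redirect print to a computed file name. A sketch under stated assumptions: each `<id>...</id>` sits on a single line inside its `<trade>` block, `split_trades` is a hypothetical helper, and a trade without an id falls back to a name built from the line number.

```shell
# Write each <trade>...</trade> block of a log to its own file named <id>.xml.
# Hypothetical helper; output files are created in the current directory.
split_trades() {
  awk '
    /<trade>/ { intrade = 1; buf = ""; id = "" }    # start a new block
    intrade   { buf = buf $0 "\n" }                 # collect the block
    intrade && id == "" && match($0, /<id>[^<]*<\/id>/) {
      id = substr($0, RSTART + 4, RLENGTH - 9)      # text between <id> and </id>
    }
    intrade && /<\/trade>/ {
      fname = (id != "" ? id : "trade_" NR) ".xml"
      printf "%s", buf > fname
      close(fname)                                  # avoid running out of file descriptors
      intrade = 0
    }
  ' "$1"
}
```

Usage: `split_trades Large.log`. Like the sed range in the question, each file includes the `<trade>` and `</trade>` lines themselves.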

Find & Replace Multiple Sequence Headers in Multiple FASTA Files

Here's my problem (on Mac OS X):
I have about 35 FASTA files with 30 sequences in each one. Each FASTA file represents a gene, and they all contain the same individuals with the same sequence headers in each file. The headers are formatted as "####_G_species," with the numbers being non-sequential. I need to go through every file and change 4 specific headers, while also keeping the output as 35 discrete files with the same names as their corresponding input files, preferably depositing the outputs into a separate subdirectory.
For example, every file contains a "6934_Sergia_sp," and I need to change every instance of that name in all of the 35 files to "6934_R_robusta." I need to do the same with "8324_Sergestes_sp," changing every instance in every file to "8324_P_vigilax." Rinse and repeat 2 more times with different headers. After changing the headers, I need to have 35 discrete output files with the same names as their corresponding input files.
What I've found so far that seems to show the most promise is from the following link:
https://askubuntu.com/questions/84007/find-and-replace-text-within-multiple-files
using the following script:
find /home/user/directory -name \*.c -exec sed -i "s/cybernetnews/cybernet/g" {} \;
Changing the information to fit my needs, I get a script like this:
find Path/to/my/directory -name \*.fas -exec sed -i 's/6934_Sergia_sp/6934_R_robusta/g' {} \;
Running the script like that, I get an "undefined label" error. After researching,
https://www.mkyong.com/mac/sed-command-hits-undefined-label-error-on-mac-os-x/
I found that I should add '.fas' after -i, giving:
find Path/to/my/directory -name \*.fas -exec sed -i '.fas' 's/6934_Sergia_sp/6934_R_robusta/g' {} \;
because on macOS, sed -i expects a file extension argument. Running the script like that, I get very nearly what I'm looking for: each input file is duplicated, the header in each is correctly substituted with the new name, and the outputs are placed in the same directory. However, this only substitutes one header at a time, and the output files have a .fas.fas extension.
Moving forward, I would have to rename the output files to remove the second ".fas" from the extension, and rewrite and rerun the script 3 more times to get everything changed how I want it, which wouldn't be the end of the world, but definitely wouldn't be ideal.
Is it possible to set up a script so that I can run all 4 substitutions at the same time, while also exporting the outputs to a new subdirectory?
Your approach is good, but I would prefer a more verbose approach where I don't have to fight so much with the quotes. Something like:
for fasta in $(find Path/to/my/directory -name "*.fas")
do
new_fasta="$(basename "$fasta" .fas).new.fas"
sed 's/6934_Sergia_sp/6934_R_robusta/g; s/Another_substitution/Another_result/g' "$fasta" > "$new_fasta"
done
Here you feed the list of FASTA files to the loop, compute a new FASTA name (and location, if needed) for each one, and finally run sed over the input, leaving the output in a new file. Observe that you can give sed more than one substitution, separated by semicolons.
BTW, as @Ed Morton said, please include a concise description of the problem, sample input, and expected output in your next question.
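To run all four substitutions in one pass and keep the original file names, sed can take several -e expressions and write into a separate subdirectory instead of editing in place, which also sidesteps the GNU/BSD difference in `-i`. A sketch: `fix_headers` and the output directory are hypothetical, and the third and fourth substitutions are placeholders because the question only names two of the headers.

```shell
# Apply every header substitution in one sed pass per file and write the
# results, under the original file names, into OUTDIR. Hypothetical helper.
fix_headers() {
  local indir=$1 outdir=$2 fasta
  mkdir -p -- "$outdir" || return 1
  for fasta in "$indir"/*.fas; do
    [ -e "$fasta" ] || continue   # no .fas files: glob left unexpanded
    sed -e 's/6934_Sergia_sp/6934_R_robusta/g' \
        -e 's/8324_Sergestes_sp/8324_P_vigilax/g' \
        -e 's/OLD_HEADER_3/NEW_HEADER_3/g' \
        -e 's/OLD_HEADER_4/NEW_HEADER_4/g' \
        "$fasta" > "$outdir/${fasta##*/}"   # same file name, new directory
  done
}
```

Usage: `fix_headers Path/to/my/directory Path/to/my/directory/renamed`. The input files are left untouched, so no .fas.fas backups appear.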

OSX / MacOs batch rename hexadecimal filenames to decimal filenames

I want to rename files that have a hexadecimal part in their names so that the part becomes decimal. For example, MOV12B.MOD, MOV12C.MOD, etc. should become MOV299.MOD, MOV300.MOD.
Can this be done in terminal?
It is possible to rename the extension using:
find . -name "*.MOD" -exec rename 's/\.MOD$/.MPG/' '{}' \;
But how can I rename the files to decimal?
Sure, you can do it with rename (also known as Perl rename or prename), which is most simply installed on macOS with Homebrew:
brew install rename
Then the command is:
rename --dry-run 's/[0-9A-F]+/hex($&)/e' *MOD
Sample Output
'MOV10.MOD' would be renamed to 'MOV16.MOD'
'MOV12B.MOD' would be renamed to 'MOV299.MOD'
'MOV12C.MOD' would be renamed to 'MOV300.MOD'
'MOVBEEF.MOD' would be renamed to 'MOV48879.MOD'
If you like what it does, remove the --dry-run part and do it for real.
I would recommend you make a backup before trying this anyway, because if your films are actually named "Film 23.MOD" rather than "MOV12B.MOD" you will get:
'Film 23.MOD' would be renamed to '15ilm 23.MOD'
If you want to put the date in too, you can do the following (using hyphens, since "/" cannot appear in a filename):
rename --dry-run 's/[0-9A-F]+/hex($&)/e; s|\.MOD$| 17-01-2018.MOD|' *MOD
Sample Output
'MOV12A.MOD' would be renamed to 'MOV298 17-01-2018.MOD'
Why couldn't you find it in the man-page? Well, there is a line in there that casually says you can pass a line of Perl code to modify the name. That means that the entire Perl language is available to you - so you could write several pages of code that access a database, run something on a remote machine, or fetch a URL in order to rename your file.
The only tricky thing in my code is the e lurking at the end:
s/search/replace/e
The e means that the replacement half of the search/replace is evaluated as Perl code rather than inserted as literal text: it receives the matched string from the left-hand side in $& and can do maths or lookups on it.
If you want to put the modification time of the file into its name as well, you need to do a little more work. First, stat() the file before changing its name ;-) Remember that you receive the original filename in $_. Then do the hex-to-decimal conversion and append the mtime. Remember that Perl uses a dot to concatenate strings.
So, the command is going to look like this:
rename --dry-run 'my $mtime=(stat($_))[9]; s/[0-9A-F]+/hex($&) . " " . $mtime/e;' *MOD
Sample Output
'MOV12A.MOD' would be renamed to 'MOV298 1516229449.MOD'
If all the substitution and evaluation gets too much, you can always do all your calculations and assign the result to Perl's $_ variable, through which you receive the input filename and in which you pass the desired name back to rename. For example:
rename --dry-run 'my $prefix="PREFIX "; my $middle=$_; my $suffix=" SUFFIX"; $_=$prefix . $middle . $suffix;' *MOD
'MOV12A.MOD' would be renamed to 'PREFIX MOV12A.MOD SUFFIX'
Only a real programmer would store his movies with hex names - kudos to you!
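If Homebrew's rename is not available, plain bash can do the same conversion with base-16 arithmetic (`$((16#...))`). This is a swapped-in technique, not the rename approach above, and `hex_mod_to_dec` is a hypothetical helper. It only touches names of the exact form MOV<hex>.MOD, so a file like "Film 23.MOD" is left alone.

```shell
# Print the decimal name for a MOV<hex>.MOD file, or fail if the name
# does not have exactly that shape. Hypothetical helper.
hex_mod_to_dec() {
  local name=$1 re='^MOV([0-9A-Fa-f]+)\.MOD$'
  [[ $name =~ $re ]] || return 1
  printf 'MOV%d.MOD\n' "$((16#${BASH_REMATCH[1]}))"
}

# Dry run: print the mv commands; drop the "echo" to rename for real.
for f in MOV*.MOD; do
  new=$(hex_mod_to_dec "$f") || continue
  echo mv -n -- "$f" "$new"   # -n: never overwrite an existing file
done
```

Run it only once: a decimal name such as MOV299.MOD is also valid hex, so a second pass would convert it again.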

Delete matching line(s) from a list of files

I've just been writing some code that renders a locale property redundant. Because of that, I'd like to remove that property from every locale file of the project, but I simply couldn't find a way of doing that, and I ended up doing it by hand in vim.
Now, I'm no UNIX black-belt, but I know that there must be a pretty simple solution to such a trivial problem, probably hidden in the depths of sed or awk. So I managed to match the property (the property being no_outline):
sed -e '/no_outline=/d' l10n/*/viewer.properties
But this only prints out the contents of each file, without the no_outline line. Isn't it possible to write the "result" of the sed command to the same file as it was executed on?
You could use:
sed -i '/no_outline=/d' l10n/*/viewer.properties
From the man page:
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if SUFFIX supplied)
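The man page quoted is GNU sed's. BSD sed on macOS treats the word after a detached -i as a backup suffix, so the command above fails there. Attaching the suffix directly (-i.bak) is accepted by both GNU and BSD sed, so a portable sketch (`drop_property` is a hypothetical helper) deletes the line and then removes the backups:

```shell
# Delete lines containing "<name>=" from each given file, in place.
# NAME is used as a regular expression. Hypothetical helper.
drop_property() {
  local name=$1 f
  shift
  for f in "$@"; do
    sed -i.bak "/$name=/d" "$f" && rm -f -- "$f.bak"
  done
}
```

Usage: `drop_property no_outline l10n/*/viewer.properties`.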
