remove first text with shell script - bash

Could someone please help me with this bash script?
Let's say I have lots of file URLs like the ones below:
https://example.com/x/c-ark4TxjU8/mybook.zip
https://example.com/x/y9kZvVp1k_Q/myfilename.zip
My question is: how do I remove all the other text and leave only the file name?
I've tried the command described in "How to delete first two lines and last four lines from a text file with bash?"
But since the text is random, meaning it has no fixed length or position, that approach does not work.

You can use the sed utility to strip everything up to the last slash, leaving just the filenames:
sed 's_.*\/__'

You can use awk:
The easiest way I have found is to print the last slash-separated field:
awk -F/ '{print $NF}' file.txt
or
awk -F/ '{print $6}' file.txt
You can also use sed:
sed 's;.*/;;' file.txt
You can use cut:
cut -d'/' -f6 file.txt
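If the URLs are already being read in a shell loop, parameter expansion can do the same job without an external tool. This is only a minimal sketch: the function name strip_to_filename is made up for illustration, and the sample URLs are the ones from the question.

```shell
# Hypothetical helper: print only the part after the last slash of each input line.
strip_to_filename() {
    while IFS= read -r url; do
        # ${url##*/} removes the longest prefix matching "*/",
        # i.e. everything up to and including the final slash.
        printf '%s\n' "${url##*/}"
    done
}

names=$(printf '%s\n' \
    'https://example.com/x/c-ark4TxjU8/mybook.zip' \
    'https://example.com/x/y9kZvVp1k_Q/myfilename.zip' | strip_to_filename)
printf '%s\n' "$names"
```

With a real file you would call it as strip_to_filename < file.txt. Unlike the $6 and -f6 variants, this does not depend on how many slashes the URL contains.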

Related

Grabbing only text/substring between 4th and 7th underscores in all lines of a file

I have a list.txt which contains the following lines.
Primer_Adapter_clean_KL01_BOLD1_100_KL01_BOLD1_100_N701_S507_L001_merged.fasta
Primer_Adapt_clean_KL01_BOLD1_500_KL01_BOLD1_500_N704_S507_L001_merged.fasta
Primer_Adapt_clean_LD03_BOLD2_Sessile_LD03_BOLD2_Sessile_N710_S506_L001_merged.fasta
Now I would like to grab only the substring between the 4th underscore and 7th underscore such that it will appear as below
BOLD1_100_KL01
BOLD1_500_KL01
BOLD2_Sessile_LD03
I tried the below awk command but I guess I've got it wrong. Any help here would be appreciated. If this can be achieved via sed, I would be interested in that solution too.
awk -v FPAT="[^__]*" '$4=$7' list.txt
I feel like awk is overkill for this. You can just use cut to select just the fields you want:
$ cut -d_ -f5-7 list.txt
BOLD1_100_KL01
BOLD1_500_KL01
BOLD2_Sessile_LD03
awk 'BEGIN{FS=OFS="_"} {print $5,$6,$7}' file
Output:
BOLD1_100_KL01
BOLD1_500_KL01
BOLD2_Sessile_LD03
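Since the question also asked about sed, here is one possible way to do it, offered as a sketch rather than the definitive answer. It assumes a sed that supports -E (extended regular expressions), which GNU and BSD sed both do; the sample lines are copied from the question.

```shell
# Sample lines copied from the question.
input='Primer_Adapter_clean_KL01_BOLD1_100_KL01_BOLD1_100_N701_S507_L001_merged.fasta
Primer_Adapt_clean_LD03_BOLD2_Sessile_LD03_BOLD2_Sessile_N710_S506_L001_merged.fasta'

# ([^_]*_){4} skips the first four underscore-terminated fields;
# the second group captures fields 5 to 7; .* discards the rest.
result=$(printf '%s\n' "$input" |
    sed -E 's/^([^_]*_){4}(([^_]*_){2}[^_]*).*/\2/')
printf '%s\n' "$result"
```

The cut version is still shorter and easier to read; the sed form is mainly useful if you are already in the middle of a sed script.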

Reading numbers from a text line in bash shell

I'm trying to write a bash shell script, that opens a certain file CATALOG.dat, containing the following lines, made of both characters and numbers:
event_0133_pk.gz
event_0291_pk.gz
event_0298_pk.gz
event_0356_pk.gz
event_0501_pk.gz
What I want to do is print the numbers (only the numbers) into a new file NUMBERS.dat, using something like > ./NUMBERS.dat, to get:
0133
0291
0298
0356
0501
My problem is: how do I extract the numbers from the text lines? Is there something to make the script read just the number as a variable, like event_0%d_pk.gz in C/C++?
A grep solution:
grep -oP '[0-9]+' CATALOG.dat >NUMBERS.dat
A sed solution:
sed 's/[^0-9]//g' CATALOG.dat >NUMBERS.dat
And an awk solution:
awk -F"[^0-9]+" '{print $2}' CATALOG.dat >NUMBERS.dat
There are many ways that you can achieve your result. One way would be to use awk:
awk -F_ '{print $2}' CATALOG.dat > NUMBERS.dat
This sets the field separator to an underscore, then prints the second field which contains the numbers.
Awk
awk 'gsub(/[^[:digit:]]/,"")' infile
Bash
while IFS= read -r line; do echo "${line//[!0-9]/}"; done < infile
tr
tr -cd '[:digit:]\n' <infile
You can use the grep command to extract the number part.
grep -oP '(?<=_)\d+(?=_)' CATALOG.dat
gives output as
0133
0291
0298
0356
0501
Or, more simply:
grep -oP '\d+' CATALOG.dat
You don't need perl mode in grep for this. BREs can do this.
grep -o '[[:digit:]]\+' CATALOG.dat > NUMBERS.dat
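On the question of reading the number into a variable, as a scanf-style pattern would in C: the shell has no direct equivalent, but stripping the fixed prefix and suffix with parameter expansion comes close. A minimal sketch, assuming every line really has the form event_NNNN_pk.gz as in the question:

```shell
# Sample lines copied from the CATALOG.dat shown in the question.
catalog='event_0133_pk.gz
event_0291_pk.gz
event_0501_pk.gz'

numbers=$(printf '%s\n' "$catalog" | while IFS= read -r line; do
    num=${line#event_}     # drop the fixed prefix
    num=${num%_pk.gz}      # drop the fixed suffix
    printf '%s\n' "$num"   # num is a string, so leading zeros survive
done)
printf '%s\n' "$numbers"
```

In a real script the loop would read from the file and write to the output directly, i.e. end with done < CATALOG.dat > NUMBERS.dat. Inside the loop, num is an ordinary variable you can use for anything else you need.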

insert a blank line between every two lines in a file using shell, sed or awk

I have a file with many lines, and I want to insert a blank line between every two lines.
For example, the original file is:
xfdljflsad
fjdiaopqqq
dioapfdja;
I want to make it as:
xfdljflsad

fjdiaopqqq

dioapfdja;
How can I achieve this?
I would like to use a shell script, awk, or sed for this.
Thanks!
With sed, use
sed G input-file
If pilcrow is correct and you do not want an additional newline at the end of the file,
then do:
sed '$!G' input-file
Another alternative is to use pr:
pr -dt input-file
awk '{print nl $0; nl="\n"}' file
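My approach when I want to quickly apply a regex to a file is to open it in vim and run a substitution. Note that in the replacement part \r inserts a line break, whereas \n would insert a NUL character:
vim file.txt
:%s/\n/\r\r/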
Idiomatic awk:
awk 1 ORS='\n\n' file
Similar thing with perl:
perl -nE 'say' file
Append | head -n -1 if final newline is unwanted.
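To see the difference between the two sed variants, counting the output lines is enough. A small sketch with made-up three-line input (the tr call only strips the padding that some wc implementations add):

```shell
sample=$(printf 'one\ntwo\nthree')

# sed G appends a blank line after every line, including the last one.
with_trailing=$(printf '%s\n' "$sample" | sed G | wc -l | tr -d ' ')

# sed '$!G' skips the last line ($), so blanks appear only between lines.
between_only=$(printf '%s\n' "$sample" | sed '$!G' | wc -l | tr -d ' ')

printf 'sed G: %s lines; sed $!G: %s lines\n' "$with_trailing" "$between_only"
```

For three input lines the first count is 6 and the second is 5, which is exactly the trailing blank line mentioned above.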

Basic stream/sed? bash script, perform substring on each line

I know this is basic, but I couldn't find the simplest way to iterate through a file with hundreds of lines and extract a substring.
If I have a file:
ABCY uuuu
UNUY uuuu
...
I want to end up with:
uuuu
uuuu
....
Ideally, I would do a substring operation: something like {5}, i.e. skip to character 5 and output what follows.
You need no sed:
cut -c6-9 yourfile
It would be easier to use cut or awk. Assuming that your fields are separated by a space and you want the second field, you can use:
cut -d' ' -f2 file.txt
awk '{print $2}' file.txt
You can also use cut and awk to extract substrings:
cut -c6- file.txt
awk '{print substr($0,6);}' file.txt
However, if you really want to iterate through the file and extract substrings, you can use a while loop:
while IFS= read -r line
do
echo "${line:5}"
done < file.txt
if you really love sed, you could try:
sed -r 's/^.{5}//' file
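One thing that is easy to trip over here is the indexing: cut and awk's substr count characters from 1, while bash's ${line:5} is a zero-based offset, so "column 6" and "offset 5" mean the same place. A quick sketch to check that the variants agree on the sample line from the question; the last one uses the POSIX ${line#?????} form instead of ${line:5} so that it also runs in plain sh:

```shell
line='ABCY uuuu'

# cut counts from 1: column 5 is the space, column 6 is the first "u".
by_cut=$(printf '%s\n' "$line" | cut -c6-)
# awk's substr is 1-based as well.
by_awk=$(printf '%s\n' "$line" | awk '{print substr($0, 6)}')
# sed: delete the first five characters.
by_sed=$(printf '%s\n' "$line" | sed 's/^.....//')
# POSIX parameter expansion: strip a prefix of exactly five characters.
by_shell=${line#?????}

printf '%s\n' "$by_cut" "$by_awk" "$by_sed" "$by_shell"
```

All four print uuuu for this input. Starting cut at column 5 instead would include the separating space.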

How do I print a field from a pipe-separated file?

I have a file with fields separated by pipe characters and I want to print only the second field. This attempt fails:
$ cat file | awk -F| '{print $2}'
awk: syntax error near line 1
awk: bailing out near line 1
bash: {print $2}: command not found
Is there a way to do this?
Or just use one command:
cut -d '|' -f FIELDNUMBER
The key point here is that the pipe character (|) must be protected from the shell. Use \| or '|' to shield it from shell interpretation so that it is passed to awk on the command line.
Reading the comments, I see that the original poster presented a simplified version of the original problem, which involved filtering the file before selecting and printing the fields. A pass through grep was used and the result piped into awk for field selection. That accounts for the wholly unnecessary cat file that appears in the question (it replaces the grep <pattern> file).
Fine, that will work. However, awk is largely a pattern matching tool on its own, and can be trusted to find and work on the matching lines without needing to invoke grep. Use something like:
awk -F\| '/<pattern>/{print $2;}{next;}' file
The /<pattern>/ bit tells awk to perform the action that follows on lines that match <pattern>.
The lost-looking {next;} is a default action skipping to the next line in the input. It does not seem to be necessary, but I have this habit from long ago...
The pipe character needs to be escaped so that the shell doesn't interpret it. A simple solution:
$ awk -F\| '{print $2}' file
Another choice would be to quote the character:
$ awk -F'|' '{print $2}' file
Another way using awk
awk 'BEGIN { FS = "|" } ; { print $2 }'
And 'file' contains no pipe symbols, so it prints nothing. You should either use 'cat file' or simply list the file after the awk program.
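To try the quoting and the pattern filter end to end without a real file, something like the following should do. The sample data is made up purely for illustration:

```shell
# Made-up sample data; the field values are illustrative only.
data='alice|admin|1001
bob|user|1002
carol|admin|1003'

# Quoting the separator keeps the shell from treating | as a pipeline.
second=$(printf '%s\n' "$data" | awk -F'|' '{print $2}')

# Pattern matching and field selection in one awk call, no grep needed:
# print field 1 only for lines containing "admin".
admins=$(printf '%s\n' "$data" | awk -F'|' '/admin/ {print $1}')

printf '%s\n' "$second" "$admins"
```

Note that /admin/ matches anywhere on the line. To test only the second field you could write $2 == "admin" as the pattern instead.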
