How can I find text after some string in bash

I have this bash script, and it works:
DIRECTORY='1.20_TRUNK/mips-tuxbox-oe1.6'
# Download the HTML page and save it to the tmp folder as ump.tmp
# (double quotes so that $DIRECTORY actually expands inside the URL)
wget -O 'ump.tmp' "http://download.oscam.cc/index.php?&direction=0&order=mod&directory=$DIRECTORY&"
ft='index.php?action=downloadfile&filename=oscam-svn'
st="-webif-Distribution.tar.gz&directory=$DIRECTORY&"
The file ump.tmp contains e.g. three links.
I need to find only the number 10082 in the first "a" link of the page, but this number changes: if you run the script, say, once a month, it may be different.
I do not have the "cat" command. I have a receiver, not a Linux box; the receiver runs the Enigma system, and "cat" isn't implemented.
I tried "sed", but it does not work:
sed -n "/filename=oscam-svn/,/-mips-tuxbox-webif/p" ump.tmp

Using a proper XHTML parser:
$ xmllint --html --xpath '//a/@href[contains(., "downloadfile")]' ump.tmp 2>/dev/null |
grep -oP "oscam-svn\K\d+"
But that string is not in the given HTML file.

"Find" is kind of vague, but you can use grep to get the link with the number 10082 in it from the temp file.
$ grep "10082" ump.tmp
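Since the number changes between runs, a literal grep for 10082 will eventually break. Here is a minimal sketch that instead extracts whatever digits follow oscam-svn, assuming the first matching link embeds the revision that way (as the ft/st strings above suggest); the variable name rev is just for illustration, and the sed fallback is for BusyBox-style systems where GNU grep's -P is unavailable:
# GNU grep: keep only the digits that follow "oscam-svn" in the first match
rev=$(grep -oP 'oscam-svn\K[0-9]+' ump.tmp | head -n 1)
# sed-only fallback for systems without GNU grep
rev=$(sed -n 's/.*oscam-svn\([0-9]\{1,\}\).*/\1/p' ump.tmp | head -n 1)
echo "$rev"    # e.g. 10082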

Related

How to copy all the URLs of a certain column of a web page?

I want to import a number of files into my server using wget; the 492 files are here:
https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=ERP001736
I want to copy the URLs of all the files in the "File Name" column, save them into a file, and import them with wget.
So how can I copy all those URLs from that column?
Thanks for reading :)
Since you've tagged bash, this should work.
wget -O- is used to output the data to the standard output, where it's greppable. (curl would do that by default.)
grep -oE is used to capture the URLs (which happily are in a regular enough format that a simple regexp works).
Then, wget -i is used to read URLs from the file generated. You might wish to add -nc or other suitable partial-fetch flags; those files are pretty hefty.
wget -O- 'https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=ERP001736' | grep -oE 'http://ftp.sra.ebi.ac.uk/[^"]+' > urls.txt
wget -i urls.txt
First, I recommend using a more specific and robust implementation...
but, in case you are against a wall and in a hurry:
$ curl -s 'https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=ERP001736' |
sed -En '/href="http:\/\/.*clean.fastq.gz"/{s/^.*href="([^"]+)".*/\1/;p;}' |
while read -r url; do wget "$url"; done
This is a quick and dirty rough first pass, but it will give you something to work with.
If you aren't in a screaming hurry, try writing something more robust and step-wise in perl or python.

Use sed to find an ID in a txt file and use the ID to rename the file

Using wget, a webpage is downloaded as a .txt file. For convenience, the saved file is named using part of the URL of the webpage, e.g. wget http://www.example.com/page/12345/ -O 12345.txt.
I am running the commands from a shell script .sh file, as it can execute multiple commands one line at a time, e.g. the script below.
After a file is downloaded, I use sed to parse it for the text/characters I want to keep. Part of the text I want includes blah blah Product ID a5678.
What I want is to use sed to find a5678 and use it to rename the file 12345.txt to a5678.txt.
# script.sh
wget http://www.example.com/page/12345/ -O 12345.txt
sed -i '' 's/pattern/replace/g' 12345.txt
# sed command to find a5678 in the line "blah blah Product ID a5678"
# some more sed commands
mv 12345.txt a5678.txt   # (or use a variable, $var.txt?)
How do I do this?
I may also want to use this same ID a5678 to create a folder with the same name, a5678, so that the .txt file ends up inside the folder, like so: /a5678/a5678.txt.
mkdir a5678 && cd a5678   # (or mkdir "$var"?)
I've searched for answers for half a day, but can't find any. The closest I found is Find instance of word in files and change it to the filename, but it is the exact opposite of what I want. I've also thought about using variables, e.g. https://askubuntu.com/questions/76808/how-do-i-use-variables-in-a-sed-command, but I don't know how to save the found characters as a variable.
Very much look forward to some help! Thank you! I am on a Mac running Sierra.
Trying to minimize, so fit this into your logic.
in=12345.txt
# capture the token that follows "Product ID" (stopping before the trailing dot)
out=$( grep ' Product ID ' "$in" | sed 's/.*Product ID *\([A-Za-z0-9]*\).*/\1/' )
mkdir -p "$out"
mv "$in" "$out/$out.txt"
Thank you all! With your inspiration, I solved my problem (without using grep):
in=12345
out=$(sed -n '/pattern/ s/.*ID *//p' $in.txt)
mv $in.txt $out.txt
cd ..
mv $in $out
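An alternative sketch uses bash's built-in regex matching, which captures the ID without an extra sed process; it assumes the marker line looks exactly like blah blah Product ID a5678:
in=12345.txt
# grab the line with the marker, then capture the token after "Product ID"
line=$(grep 'Product ID' "$in")
if [[ $line =~ Product\ ID\ ([A-Za-z0-9]+) ]]; then
    id="${BASH_REMATCH[1]}"    # e.g. a5678
    mkdir -p "$id"             # create the a5678 folder
    mv "$in" "$id/$id.txt"     # result: a5678/a5678.txt
fi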

How to extract a string at end of line after a specific word

I have different locations, but they all follow a pattern:
some_text/some_text/some_text/log/some_text.text
The locations don't all start with the same thing, and they don't have the same number of subdirectories, but I am only interested in what comes after log/. I would like to extract the .text extension.
Edited question:
I have a lot of location:
/s/h/r/t/log/b.p
/t/j/u/f/e/log/k.h
/f/j/a/w/g/h/log/m.l
Just to show that I don't know what they are: the user enters these locations, so I have no idea what will be entered. The only thing I know is that each one always contains log/ followed by the name of the file.
I would like to extract the type of the file, whatever string comes after the dot.
The only thing I know is that it always contains log/ followed by the name of the file.
I would like to extract the type of the file, whatever string comes after the dot.
based on this requirement, this line works:
grep -o '[^.]*$' file
for your example, it outputs:
text
You can use bash built-in string operations. The example below will extract everything after the last dot from the input string.
$ var="some_text/some_text/some_text/log/some_text.text"
$ echo "${var##*.}"
text
Alternatively, use sed:
$ sed 's/.*\.//' <<< "$var"
text
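Since the original question also asks about the part after log/, the same built-in expansions can be combined; a minimal sketch using one of the example paths:
$ var="/f/j/a/w/g/h/log/m.l"
$ file="${var##*/log/}"   # strip everything through the last "log/" -> m.l
$ echo "${file##*.}"      # then strip everything through the last dot
l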
Not the cleanest way, but this will work:
sed -e 's/.*log\///' -e 's/\..*//'
These are the sed patterns for it anyway; I'm not sure if you have that string in a variable, or if you're reading from a file, etc.
You could also capture that text in sed's hold space for later substitution. It all depends on exactly what you are trying to do.
Using awk
awk -F'.' '{print $NF}' file
Using sed
sed 's/.*\.//' file
Running from the root of this structure:
/s/h/r/t/log/b.p
/t/j/u/f/e/log/k.h
/f/j/a/w/g/h/log/m.l
This seems to work; you can skip the echo command if you really just want the file types with no record of where they came from.
$ for DIR in *; do
> echo -n "$DIR "
> find "$DIR" -path "*/log/*" -exec basename {} \; | sed 's/.*\.//'
> done
f l
s p
t h

Trying to extract a number from a plist with grep

I'll try and make this question short. Basically, I am working on a shell script, and I have a .plist file containing an integer value that I am trying to "extract" and put into a variable in my shell script.
I'm able to refine the contents of the .plist file to a few lines, but I am still getting a bunch of characters I don't need.
I am declaring and running the following command in my shell script, and it is giving me the following results.
file_refine=`grep -C 2 CFBundleVersion $file | grep '[0-9]\{3\}'`
Output
<string>645</string>
I just need the numeric digits, not the string tags, but I can't seem to figure that out.
Try this
file_refine=$(grep -C 2 CFBundleVersion $file | grep -o '[0-9]\{3\}')
The -o option, from the grep man page:
-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with
each such part on a separate output line.
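As a side note, if this runs on macOS, the value can be queried directly instead of grepped out of the XML; a sketch assuming /usr/libexec/PlistBuddy is available and $file is a valid plist with a top-level CFBundleVersion key:
# query the key directly instead of grepping the surrounding tags
file_refine=$(/usr/libexec/PlistBuddy -c "Print :CFBundleVersion" "$file")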

bash grep 'random matching' string

Is there a way to grab a 'random matching' string via bash from a text file?
I am currently grabbing a download link via bash, curl & grep from an online text file.
Example:
DOWNLOADSTRING="$(curl -o - "http://example.com/folder/downloadlinks.txt" | grep "$VARIABLE")"
from an online text file, which contains:
http://alphaserver.com/files/apple.zip
http://alphaserver.com/files/banana.zip
where $VARIABLE is something the user selected.
Works great, but I wanted to add some mirrors to the text file.
So when the variable 'banana' is selected, the text file I grep contains:
http://alphaserver.com/files/apple.zip
http://betaserver.com/files/apple.zip
http://gammaserver.com/files/apple.zip
http://deltaserver.com/files/apple.zip
http://alphaserver.com/files/banana.zip
http://betaserver.com/files/banana.zip
http://gammaserver.com/files/banana.zip
http://deltaserver.com/files/banana.zip
The code should pick a random 'banana' string and store it as the DOWNLOADSTRING variable.
The current code above can only work with one matching string in the text file, since it grabs every line containing 'banana'.
What this is for: I wanted to add some mirror download links for the files in the online text file, and the current code doesn't allow that.
Can I let grep grab one random 'banana' string (and not all of them)?
See this question for how to get a random line after grep; rl seems like a good candidate:
What's an easy way to read random line from a file in Unix command line?
Then do a grep ... | rl | head -n 1.
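Applied to the example above, that might look like this sketch (assuming the rl utility from the randomize-lines package is installed):
# shuffle the matching lines, then keep the first one
DOWNLOADSTRING="$(curl -o - "http://example.com/folder/downloadlinks.txt" | grep "$VARIABLE" | rl | head -n 1)"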
Try this:
DOWNLOADSTRING="$(curl -o - "http://example.com/folder/downloadlinks.txt" | grep "$VARIABLE" | sort -R | head -1)"
The output will be random-sorted and then the first line will be selected.
If mirrors.txt has the following data, which you provided in your question:
http://alphaserver.com/files/apple.zip
http://betaserver.com/files/apple.zip
http://gammaserver.com/files/apple.zip
http://deltaserver.com/files/apple.zip
http://alphaserver.com/files/banana.zip
http://betaserver.com/files/banana.zip
http://gammaserver.com/files/banana.zip
http://deltaserver.com/files/banana.zip
Then you can use the following command to get a random "matched string" from the file:
grep -E "${VARIABLE}" mirrors.txt | shuf -n1
Then you can store it in the variable DOWNLOADSTRING by setting its value with a function call, like so:
rand_mirror_call() { grep -E "${1}" mirrors.txt | shuf -n1; }
DOWNLOADSTRING="$(rand_mirror_call ${VARIABLE})"
This will give you a single random matching line from the text file, based on the user's ${VARIABLE} input. It is a lot less typing this way.
