bash grep 'random matching' string - bash

Is there a way to grab a 'random matching' string via bash from a text file?
I am currently grabbing a download link via bash, curl & grep from a online text file.
Example:
DOWNLOADSTRING="$(curl -o - "http://example.com/folder/downloadlinks.txt" | grep "$VARIABLE")"
from online text file which contains
http://alphaserver.com/files/apple.zip
http://alphaserver.com/files/banana.zip
where $VARIABLE is something the user selected.
Works great, but i wanted to add some mirrors to the text file.
So when the variable 'banana' is selected, text file which i grep contains:
http://alphaserver.com/files/apple.zip
http://betaserver.com/files/apple.zip
http://gammaserver.com/files/apple.zip
http://deltaserver.com/files/apple.zip
http://alphaserver.com/files/banana.zip
http://betaserver.com/files/banana.zip
http://gammaserver.com/files/banana.zip
http://deltaserver.com/files/banana.zip
the code should pick a random 'banana' string and store it as the 'DOWNLOADSTRING' variable.
the current code above can only work with 1 string in the text file, since it grabs everything 'banana'.
What this is for; i wanted to add some mirror downloadlinks for the files in the online text file, and the current code doesn't allow that.
Can i let grep grab one random 'banana' string? (and not all of them)

See this question to see how to get a random line after grep. rl seems like a good candidate
What's an easy way to read random line from a file in Unix command line?
then do a grep ... | rl | head -n 1

Try this:
DOWNLOADSTRING="$(curl -o - "http://example.com/folder/downloadlinks.txt" | grep "$VARIABLE")" |
sort -R | head -1
The output will be random-sorted and then the first line will be selected.

If mirrors.txt has the following data, which you provided in your question:
http://alphaserver.com/files/apple.zip
http://betaserver.com/files/apple.zip
http://gammaserver.com/files/apple.zip
http://deltaserver.com/files/apple.zip
http://alphaserver.com/files/banana.zip
http://betaserver.com/files/banana.zip
http://gammaserver.com/files/banana.zip
http://deltaserver.com/files/banana.zip
Then you can use the following command to get a random "matched string" from the file:
grep -E "${VARIABLE}" mirrors.txt | shuf -n1
Then you can store it as the variable DOWNLOADSTRING by setting it's value with a function call like so:
rand_mirror_call() { grep -E "${1}" mirrors.txt | shuf -n1; }
DOWNLOADSTRING="$(rand_mirror_call ${VARIABLE})"
This will give you a dedicated random line from the text file based on the user's ${VARIABLE} input. It is a lot less typing this way.

Related

How to use grep/awk/sed to print until a certain character?

I am a complete beginner on shell scripting and I am trying to iterate through a set of JSON files and trying to extract a certain field out of it. Each JSON file has a "country:"xxx" field. In each JSON file, there are 10k of the same field with the same country name so I need only the first occurrence and I can do that using "-m 1".
I tried to use grep for this but could not figure out how to extract the whole field including the country name from each file at first occurrence.
for FILE in *.json;
do
grep -o -a -m 1 -h -r '"country":"' $FILE;
done
I tried to use another pipe and use the below pattern but it did not work
| egrep -o '^[^"]+'
Actual Output:
"country":"
"country":"
"country":"
Desired Output:
"country:"romania"
"country:"united kingdom"
"country:"tajikistan"
but I need the whole thing. Any help would be great. Thanks
There is one general answer on the question "I only want the first occurence", and that answer is:
... | head -n 1
This mean, whatever your do: take the head (the first lines), the -n switch gives you the possibility to say how many you want (one in this case).
The same can be done for the last occurence(s), but then you use tail instead of head (you can also use the -n switch).
After trying many things. I found the pattern I was looking for.
grep -Po '"country":.*?[^\\]",' $FILE | head -n 1;

How to create argument variable in bash script

I am trying to write a script such that I can identify number of characters of the n-th largest file in a sub-directory.
I was trying to assign n and the name of sub-directory into arguments like $1, $2.
Current directory: Greetings
Sub-directory: language_files, others
Sub-directory: English, German, French
Files: Goodmorning.csv, Goodafternoon.csv, Goodevening.csv ….
I would be at directory “Greetings”, while I indicating subdirectory (English, German, French), it would show the nth-largest file in the subdirectory indicated and calculate number of characters as well.
For instance, if I am trying to figure out number of characters of 2nd largest file in English, I did:
langs=$1
n=$2
for langs in language_files/;
Do count=$(find language_files/$1 name "*.csv" | wc -m | head -n -1 | sort -n -r | sed -n $2(p))
Done | echo "The file has $count bytes!"
The result I wanted was:
$ ./script1.sh English 2
The file has 1100 bytes!
The main problem of all the issue is the fact that I don't understand how variables and looping work in bash script.
no need for looping
find language_files/"$1" -name "*.csv" | xargs wc -m | sort -nr | sed -n "$2{p;q}"
for byte counting you should use -c, since -m is for char counting (it may be the same for you).
You don't use the loop variable in the script anyway.
Bash loops are interesting. You are encouraged to learn more about them when you have some time. However, this particular problem might not need a loop. Set lang (you can call it langs if you prefer) and n appropriately, and then try this:
count=$(stat -c'%s %n' language_files/$lang/* | sort -nr | head -n$n | tail -n1 | sed -re 's/^[[:space:]]*([[:digit:]]+).*/\1/')
That should give you the $count you need. Then you can echo it however you like.
EXPLANATION
If you wish to learn how it works:
The stat command outputs various statistics about the named file (or files), in this case %s the file's size and %n the file's name.
The head and tail output respectively the first and last several lines of a file. Together, they select a specific line from the file
The sed command screens a certain part of the line. (You can use cut, instead, if you prefer.)
If you wish to be cleverer, then you can optimize as #karafka has done.

Defining a variable using head and cut

might be an easy question, I'm new in bash and haven't been able to find the solution to my question.
I'm writing the following script:
for file in `ls *.map`; do
ID=${file%.map}
convertf -p ${ID}_par #this is a program that I use, no problem
NAME=head -n 1 ${ID}.ind | cut -f1 -d":" #Now: This step is the problem: don't seem to be able to make a proper NAME function. I just want to take the first column of the first line of the file ${ID}.ind
It gives me the return
line 5: bad substitution
any help?
Thanks!
There are a couple of issues in your code:
for file in `ls *.map` does not do what you want. It will fail e.g. if any of the filenames contains a space or *, but there's more. See http://mywiki.wooledge.org/BashPitfalls#for_i_in_.24.28ls_.2A.mp3.29 for details.
You should just use for file in *.map instead.
ALL_UPPERCASE names are generally used for system variables and built-in shell variables. Use lowercase for your own names.
That said,
for file in *.map; do
id="${file%.map}"
convertf -p "${id}_par"
name="$(head -n 1 "${id}.ind" | cut -f1 -d":")"
...
looks like it would work. We just use $( cmd ) to capture the output of a command in a string.

How to extract a string at end of line after a specific word

I have different location, but they all have a pattern:
some_text/some_text/some_text/log/some_text.text
All locations don't start with the same thing, and they don't have the same number of subdirectories, but I am interested in what comes after log/ only. I would like to extract the .text
edited question:
I have a lot of location:
/s/h/r/t/log/b.p
/t/j/u/f/e/log/k.h
/f/j/a/w/g/h/log/m.l
Just to show you that I don't know what they are, the user enters these location, so I have no idea what the user enters. The only I know is that it always contains log/ followed by the name of the file.
I would like to extract the type of the file, whatever string comes after the dot
THe only i know is that it always contains log/ followed by the name
of the file.
I would like to extract the type of the file, whatever string comes
after the dot
based on this requirement, this line works:
grep -o '[^.]*$' file
for your example, it outputs:
text
You can use bash built-in string operations. The example below will extract everything after the last dot from the input string.
$ var="some_text/some_text/some_text/log/some_text.text"
$ echo "${var##*.}"
text
Alternatively, use sed:
$ sed 's/.*\.//' <<< "$var"
text
Not the cleanest way, but this will work
sed -e "s/.*log\///" | sed -e "s/\..*//"
This is the sed patterns for it anyway, not sure if you have that string in a variable, or if you're reading from a file etc.
You could also grab that text and store in a sed register for later substitution etc. All depends on exactly what you are trying to do.
Using awk
awk -F'.' '{print $NF}' file
Using sed
sed 's/.*\.//' file
Running from the root of this structure:
/s/h/r/t/log/b.p
/t/j/u/f/e/log/k.h
/f/j/a/w/g/h/log/m.l
This seems to work, you can skip the echo command if you really just want the file types with no record of where they came from.
$ for DIR in *; do
> echo -n "$DIR "
> find $DIR -path "*/log/*" -exec basename {} \; | sed 's/.*\.//'
> done
f l
s p
t h

Shell scripting - how to properly parse output, learning examples.

So I want to automate a manual task using shell scripting, but I'm a little lost as to how to parse the output of a few commands. I would be able to this in other languages without a problem, so I'll just explain what I'm going for in psuedo code and provide an example of the cmd output I'm trying to parse.
Example of output:
Chg 2167467 on 2012/02/13 by user1234#filename 'description of submission'
What I need to parse out is '2167467'. So what I want to do is split on spaces and take element 1 to use in another command. The output of my next command looks like this:
Change 2167463 by user1234#filename on 2012/02/13 18:10:15
description of submission
Affected files ...
... //filepath/dir1/dir2/dir3/filename#2298 edit
I need to parse out '//filepath/dir1/dir2/dir3/filename#2298' and use that in another command. Again, what I would do is remove the blank lines from the output, grab the 4th line, and split on space. From there I would grab the 1st element from the split and use it in my next command.
How can I do this in shell scripting? Examples or a point to some tutorials would be great.
Its not clear if you want to use the result from the first command for processing the 2nd command. If that is true, then
targString=$( cmd1 | awk '{print $2}')
command2 | sed -n "/${targString}/{n;n;n;s#.*[/][/]#//#;p;}"
Your example data has 2 different Chg values in it, (2167467, 2167463), so if you just want to process this output in 2 different ways, its even simpler
cmd1 | awk '{print $2}'
cmd2 | sed -n '/Change/{n;n;n;s#.*[/][/]#//#;p;}'
I hope this helps.
I'm not 100% clear on your question, but I would use awk.
http://www.cyberciti.biz/faq/bash-scripting-using-awk/
Your first variable would look something like this
temp="Chg 2167467 on 2012/02/13 by user1234#filename 'description of submission'"
To get the number you want do this:
temp=`echo $temp | cut -f2 -d" "`
Let the output of your second command be saved to a file something like this
command $temp > file.txt
To get what you want from the file you can run this:
temp=`tail -1 file.txt | cut -f2 -d" "`
rm file.txt
The last block of code gets the last nonwhite line of the file and delimits on the second set of white spaces

Resources