how to search goosh.org from the command line? - bash

I'd like to write a simple bash script to search goosh.org from the command line and open the results in a browser. (Or maybe you wouldn't even call this a bash script at all since all it does is make a browser call.)
I created a file called goosh that contains one line:
open "http://goosh.org/#$1"
This works, without the need to enclose the search in quotation marks, when the search term is just one word:
$ goosh monarch
But it fails when I use multiple search terms:
$ goosh monarch butterfly
This doesn't work:
open "http://goosh.org/#$1 $2 $3 $4 $5 $6 $7 $8 $9"
How would I do this?

Spaces, hex value 0x20, are converted to %20 in a browser. You probably want to use sed on the input values:
echo 1 2 3 4 5 6 7 8 | sed -e 's/ /%20/'
Will give you an output of
1%202%203%204%205%206%207%208
I'd suggest that you parse your script's "$#" inputs (everything provided to the script) in the same manner and see if that doesn't give you the results you're looking for.
EDIT:
Here's a simple function you can try:
function goosh() {
ARGS=$(echo "$#" | sed -e 's/ /%20/g');
echo "http://www.goosh.org/#$ARGS";
}
This won't actually call the site; it will just dump the output to your console (so you can make sure the variables are correct). Simply replace echo with open and it should be fine.
Example:
$ goosh When will Apple buy stock in Microsoft
http://www.goosh.org/#When%20will%20Apple%20buy%20stock%20in%20Microsoft

Related

How do I trim whitespace, but not newlines, from subshell output in bash?

There are many tens, maybe a hundred or more previous questions that seem "identical" to this already here, but after extensive search, I found NOTHING that even came close to working - though I did learn quite a lot - and so I decided to just RTFM and figure this out on my own.
The Problem
I wanted to search the output of a ps auxwww command to find processes of interest, and the issue was that I can't just simply use cut to find the exact data from them that I wanted. ps, it turns out, tries to columnate the output, adding either extra spaces or tabs that get in the way of using cut to get the correct data.
So, since I'm not a master at bash, I did a search... The answers I found were all focused on either variables - a "backup strategy" from my point of view that itself didn't solve the whole problem - or they only trimmed leading or trailing space or all "whitespace" including newlines. NOPE, Won't Work For Cut! And, neither will removing trailing newlines and so forth.
So, restated, the question is, how do we efficiently end up with the white space defined as simply a single space between other characters without eliminating newlines?
Below, I will give my answer, but I welcome others to give theirs - who knows, maybe someone has a better answer?!
Answer:
At least MY answer - please leave your own, too! - was to do this:
ps auxwww | grep <program> | tr -s [:blank:] | cut -d ' ' -f <field_of_interest>
This worked great!
Obviously, there are many ways to adapt this to other needs.
As an alternative to all of the pipes and grep with cut, you could simply use awk. The benefit of using awkwith the default field-separator (FS) being set to break on whitespace is that it considers any number of whitespace between fields as a single separator.
So using awk will do away with needing to use tr -s to "squeeze" whitespace to define fields. Further, awk gives far greater control over field matching using regular expressions rather than having to rely on grep of a full line and cut to locate a pre-determined field numbers. (though to some extent you will still have to tell awk what field out of the ps command you are interested in)
Using bash, you can also eliminate the pipe | by using process substitution to send the output of ps auxwww to awk on stdin using redirection, e.g. awk ... < <(ps auxwww) for a single tidy command line.
To get your "program" and "file_of_interest" into awk you have two options. You can initialize awk variables using the -v var=value option (there can be multiple -v otions given), or you can use the BEGIN rule to initialize the variables. The only difference being with -v you can provide a shell variable for value and there is no whitespace allowed surrounding the = sign, while within BEGIN any whitespace is ignored.
So in your case a couple of examples to get the virtual memory size for firefox processes, you could use:
awk -v prog="firefox" -v fnum="5" '
$11 ~ prog {print $fnum}
' < <(ps auxwww)
(above if you had myprog=firefox as a shell variable, you could use -v prog="$myprog" to initialize the prog variable for awk)
or using the BEGIN rule, you could do:
awk 'BEGIN {prog = "firefox"; fnum = "5"}
$11 ~ prog {print $fnum }
' < <(ps auxwww)
In each command above, it locates the COMMAND field from ps (field 11) and checks whether it contains firefox and if so it outputs field no. 5 the virtual memory size used by each process.
Both work fine as one-liners as well, e.g.
awk -v prog="firefox" -v fnum="5" '$11 ~ prog {print $fnum}' < <(ps auxwww)
Don't get me wrong, the pipeline is perfectly fine, it will just be slow. For short commands with limited output there won't be much difference, but when the output is large, awk will provide orders of magnitude improvement over having to tr and grep and cut reading over the same records three times.
The reason being, the pipes and the process on each side requires separate processes be spawned by the shell. So minimizes their use, improves the efficiency of what your script is doing. Now if the data is small as are the processes, there isn't much of a difference. However if you are reading a 3G file 3 times over -- that's is the difference in orders of magnitude. Hours verses minutes or seconds.
I had to use single quotes on CentosOS Linux to get tr working like described above:
ps -o ppid= $$ | tr -d '[:space:]'
You can reduce the number of pipes using this Perl one-liner, which uses Perl regexes instead of a separate grep process. This combines grep, tr and cut in a single command, with an easy way to manipulate the output (#F is the array of fields, 0-indexed):
Examples:
# Start an example process to provide the input for `ps` in the next commands:
/Applications/Emacs.app/Contents/MacOS/Emacs-x86_64-10_14 --geometry 109x65 /tmp/foo &
# Print single space-delimited output of `ps` for all emacs processes:
ps auxwww | perl -lane 'print "#F" if $F[10] =~ /emacs/i'
# Prints:
# bar 72144 0.0 0.5 4610272 82320 s006 SN 11:15AM 0:01.31 /Applications/Emacs.app/Contents/MacOS/Emacs-x86_64-10_14 --geometry 109x65 /tmp/foo
# Print emacs PID and file name opened with emacs:
ps auxwww | perl -lane 'print join "\t", #F[1, -1] if $F[10] =~ /emacs/i'
# Prints:
# 72144 /tmp/foo
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array #F on whitespace or on the regex specified in -F option.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)

Delete everything before a pattern

I'm trying to clean a text file.
I want to delete everything start before the first 12 numbers.
1:0:135103079189:0:0:2:0::135103079189:000011:00
A:908529896240:0:10250:2:0:1:
603307102606:0:0:1:0::01000::M
Output desired:
135103079189:0:0:2:0::135103079189:000011:00
908529896240:0:10250:2:0:1:
603307102606:0:0:1:0::01000::M
Here's my command but seems not working.
sed '/:\([0-9]\{12\}\)/d' t.txt
the d command in sed will delete entire line on matching the given regex, you need to use s command to search and replace only part of line... however, for given problem, sed is not suitable as it doesn't support non-greedy regex
you can use perl instead
$ perl -pe's/^.*?(?=\d{12}:)//' ip.txt
135103079189:0:0:2:0::135103079189:000011:00
908529896240:0:10250:2:0:1:
603307102606:0:0:1:0::01000::M
.*? match zero or more characters as minimally as possible
(?=\d{12}:) only if it is followed by 12-digits ending with :
use perl -i -pe for in-place editing
some possible corner cases
$ # this is matching part of field
$ echo 'foo:123:abc135103079189:23:603307102606:1' | perl -pe's/^.*?(?=\d{12}:)//'
135103079189:23:603307102606:1
$ # this is not matching 12-digit field at end of line
$ echo 'foo:123:135103079189' | perl -pe's/^.*?(?=\d{12}:)//'
foo:123:135103079189
$ # so, add start/end of line matching cases and restrict 12-digits to whole field
$ echo 'foo:123:abc135103079189:23:603307102606:1' | perl -pe 's/^(?:.*?:)?(?=\d{12}(:|$))//'
603307102606:1
$ echo 'foo:123:135103079189' | perl -pe's/^(?:.*?:)?(?=\d{12}(:|$))//'
135103079189
Could you please try following.
awk --re-interval 'match($0,/[0-9]{12}/){print substr($0,RSTART)}' Input_file
Since I have OLD version of awk so I am using --re-interval you could remove it in case you have new version of it.
This might work for you (GNU sed):
sed -n 's/[0-9]\{12\}/\n&/;s/.*\n//p' file
We only want to print specific lines so use the -n option to turn off automatic printing. If a line contains a 12 digit number, insert a newline before it. Remove any characters before and including a newline and print the result.
If you want to print lines that do not contain a 12 digit number as is, use:
sed 's/[0-9]\{12\}/\n&/;s/.*\n//' file
The crux of the problem is to identify the start of a multi-character string, insert a unique marker and delete all characters before and including the unique marker. As sed uses the newline to delimit lines, only the user can introduce newlines into the pattern space and as a result, newlines will always be unique.
Taking the nice answer from #Sundeep, in case you would like to use grep or pcregrep (macOS/BSD) you could give a try to:
$ grep -oP '^(?:.*?:)?(?=\d{12})\K.*' file
or
$ pcregrep -o '^(?:.*?:)?(?=\d{12})\K.*' file
The \K will ignore everything after the pattern
Alternative thoughts - I almost think your data is too dirty for a quick sed fix but if generally it's all similar to your sample set of data then certainly pick one of the answers with sed etc. However if you wanted to be more particular about it you could build up a set of commands to ensure the values. I like doing this for debugging and when speed isn't urgent.
Take this tiny sample of code, you could do this other ways but I'm getting the value for each part of the string and I know the order because it contiguous. You could then set up controls on which parts to keep and such as it builds out say a new string per line. Overwrought for sure, but sometimes that is a better long term approach.
#!/bin/bash
while IFS= read -r line ;do
IFS=':' read -r -a array <<< "$line"
for ((i=0; i<${#array[#]}; i++)) ;do
echo "part : ${array[$i]}"
done
done < "test_data.txt"
You could then build the data back up how you wanted and more easily understand what's happening every step of the way ..
part : 1
part : 0
part : 135103079189
part : 0
part : 0
part : 2
part : 0
part :
part : 135103079189
part : 000011
part : 00
part : A
part : 908529896240
part : 0

Get first N chars and sort them

I have a requirement where i need to fetch first four characters from each line of file and sort them.
I tried below way. but its not sorting each line
cut -c1-4 simple_file.txt | sort -n
O/p using above:
appl
bana
uoia
Expected output:
alpp
aabn
aiou
sort isn't the right tool for the job in this case, as it used to sort lines of input, not the characters within each line.
I know you didn't tag the question with perl but here's one way you could do it:
perl -F'' -lane 'print(join "", sort #F[0..3])' file
This uses the -a switch to auto-split each line of input on the delimiter specified by -F (in this case, an empty string, so each character is its own element in the array #F). It then sorts the first 4 characters of the array using the standard string comparison order. The result is joined together on an empty string.
Try defining two helper functions:
explodeword () {
test -z "$1" && return
echo ${1:0:1}
explodeword ${1:1}
}
sortword () {
echo $(explodeword $1 | sort) | tr -d ' '
}
Then
cut -c1-4 simple_file.txt | while read -r word; do sortword $word; done
will do what you want.
The sort command is used to sort files line by line, it's not designed to sort the contents of a line. It's not impossible to make sort do what you want, but it would be a bit messy and probably inefficient.
I'd probably do this in Python, but since you might not have Python, here's a short awk command that does what you want.
awk '{split(substr($0,1,4),a,"");n=asort(a);s="";for(i=1;i<=n;i++)s=s a[i];print s}'
Just put the name of the file (or files) that you want to process at the end of the command line.
Here's some data I used to test the command:
data
this
is a
simple
test file
a
of
apple
banana
cat
uoiea
bye
And here's the output
hist
ais
imps
estt
a
fo
alpp
aabn
act
eiou
bey
Here's an ugly Python one-liner; it would look a bit nicer as a proper script rather than as a Bash command line:
python -c "import sys;print('\n'.join([''.join(sorted(s[:4])) for s in open(sys.argv[1]).read().splitlines()]))"
In contrast to the awk version, this command can only process a single file, and it reads the whole file into RAM to process it, rather than processing it line by line.

'grep +A': print everything after a match [duplicate]

This question already has answers here:
How to get the part of a file after the first line that matches a regular expression
(12 answers)
Closed 7 years ago.
I have a file that contains a list of URLs. It looks like below:
file1:
http://www.google.com
http://www.bing.com
http://www.yahoo.com
http://www.baidu.com
http://www.yandex.com
....
I want to get all the records after: http://www.yahoo.com, results looks like below:
file2:
http://www.baidu.com
http://www.yandex.com
....
I know that I could use grep to find the line number of where yahoo.com lies using
grep -n 'http://www.yahoo.com' file1
3 http://www.yahoo.com
But I don't know how to get the file after line number 3. Also, I know there is a flag in grep -A print the lines after your match. However, you need to specify how many lines you want after the match. I am wondering is there something to get around that issue. Like:
Pseudocode:
grep -n 'http://www.yahoo.com' -A all file1 > file2
I know we could use the line number I got and wc -l to get the number of lines after yahoo.com, however... it feels pretty lame.
AWK
If you don't mind using AWK:
awk '/yahoo/{y=1;next}y' data.txt
This script has two parts:
/yahoo/ { y = 1; next }
y
The first part states that if we encounter a line with yahoo, we set the variable y=1, and then skip that line (the next command will jump to the next line, thus skip any further processing on the current line). Without the next command, the line yahoo will be printed.
The second part is a short hand for:
y != 0 { print }
Which means, for each line, if variable y is non-zero, we print that line. In AWK, if you refer to a variable, that variable will be created and is either zero or empty string, depending on context. Before encounter yahoo, variable y is 0, so the script does not print anything. After encounter yahoo, y is 1, so every line after that will be printed.
Sed
Or, using sed, the following will delete everything up to and including the line with yahoo:
sed '1,/yahoo/d' data.txt
This is much easier done with sed than grep. sed can apply any of its one-letter commands to an inclusive range of lines; the general syntax for this is
START , STOP COMMAND
except without any spaces. START and STOP can each be a number (meaning "line number N", starting from 1); a dollar sign (meaning "the end of the file"), or a regexp enclosed in slashes, meaning "the first line that matches this regexp". (The exact rules are slightly more complicated; the GNU sed manual has more detail.)
So, you can do what you want like so:
sed -n -e '/http:\/\/www\.yahoo\.com/,$p' file1 > file2
The -n means "don't print anything unless specifically told to", and the -e directive means "from the first appearance of a line that matches the regexp /http:\/\/www\.yahoo\.com/ to the end of the file, print."
This will include the line with http://www.yahoo.com/ on it in the output. If you want everything after that point but not that line itself, the easiest way to do that is to invert the operation:
sed -e '1,/http:\/\/www\.yahoo\.com/d' file1 > file2
which means "for line 1 through the first line matching the regexp /http:\/\/www\.yahoo\.com/, delete the line" (and then, implicitly, print everything else; note that -n is not used this time).
awk '/yahoo/ ? c++ : c' file1
Or golfed
awk '/yahoo/?c++:c' file1
Result
http://www.baidu.com
http://www.yandex.com
This is most easily done in Perl:
perl -ne 'print unless 1 .. m(http://www\.yahoo\.com)' file
In other words, print all lines that aren’t between line 1 and the first occurrence of that pattern.
Using this script:
# Get index of the "yahoo" word
index=`grep -n "yahoo" filepath | cut -d':' -f1`
# Get the total number of lines in the file
totallines=`wc -l filepath | cut -d' ' -f1`
# Subtract totallines with index
result=`expr $total - $index`
# Gives the desired output
grep -A $result "yahoo" filepath

Grep characters before and after match?

Using this:
grep -A1 -B1 "test_pattern" file
will produce one line before and after the matched pattern in the file. Is there a way to display not lines but a specified number of characters?
The lines in my file are pretty big so I am not interested in printing the entire line but rather only observe the match in context. Any suggestions on how to do this?
3 characters before and 4 characters after
$> echo "some123_string_and_another" | grep -o -P '.{0,3}string.{0,4}'
23_string_and
grep -E -o ".{0,5}test_pattern.{0,5}" test.txt
This will match up to 5 characters before and after your pattern. The -o switch tells grep to only show the match and -E to use an extended regular expression. Make sure to put the quotes around your expression, else it might be interpreted by the shell.
You could use
awk '/test_pattern/ {
match($0, /test_pattern/); print substr($0, RSTART - 10, RLENGTH + 20);
}' file
You mean, like this:
grep -o '.\{0,20\}test_pattern.\{0,20\}' file
?
That will print up to twenty characters on either side of test_pattern. The \{0,20\} notation is like *, but specifies zero to twenty repetitions instead of zero or more.The -o says to show only the match itself, rather than the entire line.
I'll never easily remember these cryptic command modifiers so I took the top answer and turned it into a function in my ~/.bashrc file:
cgrep() {
# For files that are arrays 10's of thousands of characters print.
# Use cpgrep to print 30 characters before and after search pattern.
if [ $# -eq 2 ] ; then
# Format was 'cgrep "search string" /path/to/filename'
grep -o -P ".{0,30}$1.{0,30}" "$2"
else
# Format was 'cat /path/to/filename | cgrep "search string"
grep -o -P ".{0,30}$1.{0,30}"
fi
} # cgrep()
Here's what it looks like in action:
$ ll /tmp/rick/scp.Mf7UdS/Mf7UdS.Source
-rw-r--r-- 1 rick rick 25780 Jul 3 19:05 /tmp/rick/scp.Mf7UdS/Mf7UdS.Source
$ cat /tmp/rick/scp.Mf7UdS/Mf7UdS.Source | cgrep "Link to iconic"
1:43:30.3540244000 /mnt/e/bin/Link to iconic S -rwxrwxrwx 777 rick 1000 ri
$ cgrep "Link to iconic" /tmp/rick/scp.Mf7UdS/Mf7UdS.Source
1:43:30.3540244000 /mnt/e/bin/Link to iconic S -rwxrwxrwx 777 rick 1000 ri
The file in question is one continuous 25K line and it is hopeless to find what you are looking for using regular grep.
Notice the two different ways you can call cgrep that parallels grep method.
There is a "niftier" way of creating the function where "$2" is only passed when set which would save 4 lines of code. I don't have it handy though. Something like ${parm2} $parm2. If I find it I'll revise the function and this answer.
With gawk , you can use match function:
x="hey there how are you"
echo "$x" |awk --re-interval '{match($0,/(.{4})how(.{4})/,a);print a[1],a[2]}'
ere are
If you are ok with perl, more flexible solution : Following will print three characters before the pattern followed by actual pattern and then 5 character after the pattern.
echo hey there how are you |perl -lne 'print "$1$2$3" if /(.{3})(there)(.{5})/'
ey there how
This can also be applied to words instead of just characters.Following will print one word before the actual matching string.
echo hey there how are you |perl -lne 'print $1 if /(\w+) there/'
hey
Following will print one word after the pattern:
echo hey there how are you |perl -lne 'print $2 if /(\w+) there (\w+)/'
how
Following will print one word before the pattern , then the actual word and then one word after the pattern:
echo hey there how are you |perl -lne 'print "$1$2$3" if /(\w+)( there )(\w+)/'
hey there how
If using ripgreg this is how you would do it:
grep -E -o ".{0,5}test_pattern.{0,5}" test.txt
You can use regexp grep for finding + second grep for highlight
echo "some123_string_and_another" | grep -o -P '.{0,3}string.{0,4}' | grep string
23_string_and
With ugrep you can specify -ABC context with option -o (--only-matching) to show the match with extra characters of context before and/or after the match, fitting the match plus the context within the specified -ABC width. For example:
ugrep -o -C30 pattern testfile.txt
gives:
1: ... long line with an example pattern to match. The line could...
2: ...nother example line with a pattern.
The same on a terminal with color highlighting gives:
Multiple matches on a line are either shown with [+nnn more]:
or with option -k (--column-number) to show each individually with context and the column number:
The context width is the number of Unicode characters displayed (UTF-8/16/32), not just ASCII.
I personally do something similar to the posted answers.. but since the dot key, like any keyboard key, can be tapped or held down.. and I often don't need a lot of context(if I needed more I might do the lines like grep -C but often like you I don't want lines before and after), so I find it much quicker for entering the command, to just tap the dot key for how many dots / how many characters, if it's a few then tapping the key, or hold it down for more.
e.g. echo zzzabczzzz | grep -o '.abc..'
Will have the abc pattern with one dot before and two after. ( in regex language, Dot matches any character). Others used dot too but with curly braces to specify repetition.
If I wanted to be strict re between (0 or x) characters and exactly y characters, then i'd use the curlies.. and -P, as others have done.
There is a setting re whether dot matches new line but you can look into that if it's a concern/interest.

Resources