How to get a bash shell without absolutely anything running silently before or after every command? - bash

My motivation is to test the trap '...' DEBUG and set -x commands, but there is just too much noise due to these silently running commands. So how can I run bash without any?
Passing --noprofile does not seem to achieve it.
mark#L-R910LPKW:~$ bash --noprofile
mark#L-R910LPKW:~$ set -x
++ __posh_git_ps1 '\u#\h:\w' '\$ '
++ local ps1pc_prefix=
++ local ps1pc_suffix=
++ case "$#" in
++ ps1pc_prefix='\u#\h:\w'
++ ps1pc_suffix='\$ '
+++ __posh_git_echo
++++ git config --bool bash.enableGitStatus
+++ '[' '' = false ']'
+++ local 'Red=\033[0;31m'
+++ local 'Green=\033[0;32m'
+++ local 'BrightRed=\033[0;91m'
+++ local 'BrightGreen=\033[0;92m'
+++ local 'BrightYellow=\033[0;93m'
+++ local 'BrightCyan=\033[0;96m'
++++ __posh_color '\e[m'
++++ '[' -n '' ']'
++++ '[' -n '5.1.16(1)-release' ']'
++++ echo '\[\e[m\]'
+++ local 'DefaultForegroundColor=\[\e[m\]'
+++ local DefaultBackgroundColor=
+++ local 'BeforeText=['
++++ __posh_color '\033[0;93m'
++++ '[' -n '' ']'
++++ '[' -n '5.1.16(1)-release' ']'
++++ echo '\[\033[0;93m\]'
+++ local 'BeforeForegroundColor=\[\033[0;93m\]'
+++ local BeforeBackgroundColor=
+++ local 'DelimText= |'
++++ __posh_color '\033[0;93m'
++++ '[' -n '' ']'
++++ '[' -n '5.1.16(1)-release' ']'
++++ echo '\[\033[0;93m\]'
+++ local 'DelimForegroundColor=\[\033[0;93m\]'
+++ local DelimBackgroundColor=
+++ local 'AfterText=]'
++++ __posh_color '\033[0;93m'
++++ '[' -n '' ']'
++++ '[' -n '5.1.16(1)-release' ']'
++++ echo '\[\033[0;93m\]'
+++ local 'AfterForegroundColor=\[\033[0;93m\]'
+++ local AfterBackgroundColor=
++++ __posh_color '\033[0;96m'
++++ '[' -n '' ']'
++++ '[' -n '5.1.16(1)-release' ']'
++++ echo '\[\033[0;96m\]'
+++ local 'BranchForegroundColor=\[\033[0;96m\]'
+++ local BranchBackgroundColor=
++++ __posh_color '\033[0;92m'
++++ '[' -n '' ']'
++++ '[' -n '5.1.16(1)-release' ']'
++++ echo '\[\033[0;92m\]'
+++ local 'BranchAheadForegroundColor=\[\033[0;92m\]'
+++ local BranchAheadBackgroundColor=
++++ __posh_color '\033[0;91m'
++++ '[' -n '' ']'
++++ '[' -n '5.1.16(1)-release' ']'
++++ echo '\[\033[0;91m\]'
+++ local 'BranchBehindForegroundColor=\[\033[0;91m\]'
+++ local BranchBehindBackgroundColor=
++++ __posh_color '\033[0;93m'
++++ '[' -n '' ']'
++++ '[' -n '5.1.16(1)-release' ']'
++++ echo '\[\033[0;93m\]'
+++ local 'BranchBehindAndAheadForegroundColor=\[\033[0;93m\]'
+++ local BranchBehindAndAheadBackgroundColor=
+++ local BeforeIndexText=
++++ __posh_color '\033[0;32m'
++++ '[' -n '' ']'
++++ '[' -n '5.1.16(1)-release' ']'
++++ echo '\[\033[0;32m\]'
+++ local 'BeforeIndexForegroundColor=\[\033[0;32m\]'
+++ local BeforeIndexBackgroundColor=
++++ __posh_color '\033[0;32m'
++++ '[' -n '' ']'
++++ '[' -n '5.1.16(1)-release' ']'
++++ echo '\[\033[0;32m\]'
+++ local 'IndexForegroundColor=\[\033[0;32m\]'
+++ local IndexBackgroundColor=
++++ __posh_color '\033[0;31m'
++++ '[' -n '' ']'
++++ '[' -n '5.1.16(1)-release' ']'
++++ echo '\[\033[0;31m\]'
+++ local 'WorkingForegroundColor=\[\033[0;31m\]'
+++ local WorkingBackgroundColor=
++++ __posh_color '\033[0;91m'
++++ '[' -n '' ']'
++++ '[' -n '5.1.16(1)-release' ']'
++++ echo '\[\033[0;91m\]'
+++ local 'StashForegroundColor=\[\033[0;91m\]'
+++ local StashBackgroundColor=
+++ local 'BeforeStash=('
+++ local 'AfterStash=)'
+++ local LocalDefaultStatusSymbol=
+++ local 'LocalWorkingStatusSymbol= !'
++++ __posh_color '\033[0;31m'
++++ '[' -n '' ']'
++++ '[' -n '5.1.16(1)-release' ']'
++++ echo '\[\033[0;31m\]'
+++ local 'LocalWorkingStatusColor=\[\033[0;31m\]'
+++ local 'LocalStagedStatusSymbol= ~'
++++ __posh_color '\033[0;96m'
++++ '[' -n '' ']'
++++ '[' -n '5.1.16(1)-release' ']'
++++ echo '\[\033[0;96m\]'
+++ local 'LocalStagedStatusColor=\[\033[0;96m\]'
++++ __posh_color '\e[0m'
++++ '[' -n '' ']'
++++ '[' -n '5.1.16(1)-release' ']'
++++ echo '\[\e[0m\]'
+++ local 'RebaseForegroundColor=\[\e[0m\]'
+++ local RebaseBackgroundColor=
++++ git config --get bash.branchBehindAndAheadDisplay
+++ local BranchBehindAndAheadDisplay=
+++ '[' -z '' ']'
+++ BranchBehindAndAheadDisplay=full
++++ git config --bool bash.enableFileStatus
+++ local EnableFileStatus=
+++ case "$EnableFileStatus" in
+++ EnableFileStatus=true
++++ git config --bool bash.showStatusWhenZero
+++ local ShowStatusWhenZero=
+++ case "$ShowStatusWhenZero" in
+++ ShowStatusWhenZero=false
++++ git config --bool bash.enableStashStatus
+++ local EnableStashStatus=
+++ case "$EnableStashStatus" in
+++ EnableStashStatus=true
++++ git config --bool bash.enableStatusSymbol
+++ local EnableStatusSymbol=
+++ case "$EnableStatusSymbol" in
+++ EnableStatusSymbol=true
+++ local BranchIdenticalStatusSymbol=
+++ local BranchAheadStatusSymbol=
+++ local BranchBehindStatusSymbol=
+++ local BranchBehindAndAheadStatusSymbol=
+++ local BranchWarningStatusSymbol=
+++ true
+++ BranchIdenticalStatusSymbol=' ≡'
+++ BranchAheadStatusSymbol=' ↑'
+++ BranchBehindStatusSymbol=' ↓'
+++ BranchBehindAndAheadStatusSymbol=↕
+++ BranchWarningStatusSymbol=' ?'
+++ __POSH_BRANCH_AHEAD_BY=0
+++ __POSH_BRANCH_BEHIND_BY=0
+++ local is_detached=false
++++ __posh_gitdir
++++ '[' -z '' ']'
++++ '[' -n '' ']'
++++ '[' -n '' ']'
++++ '[' -d .git ']'
++++ git rev-parse --git-dir
+++ local g=
+++ '[' -z '' ']'
+++ return
++ local gitstring=
++ PS1='\u#\h:\w\$ '
mark#L-R910LPKW:~$
I would like to see no output at all when running set -x by itself.
How can I achieve that?

env -i HOME="$HOME" USER="$USER" bash --noprofile --norc will start bash with an empty environment (i.e. without inheriting anything from the parent environment) apart from the explicitly set $HOME and $USER, while --noprofile and --norc tell bash not to read its startup files. This matters here because the noise in your trace most likely comes from prompt machinery inherited through the environment (exported functions such as __posh_git_ps1 together with an exported PROMPT_COMMAND or PS1), which --noprofile alone does not clear.
Alternatively, as a quick hack, env -i bash will run bash with a completely empty environment (not even $HOME or $USER), so your .bash_profile, .bashrc etc. will not be found, and nothing (except possibly system-wide scripts) will affect your new, pristine shell. Obviously, this will not go well if anything you run depends on knowing who you are and where your home directory is.
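For example (the prompt, path and version strings below are illustrative, not taken from the original post), the session should now stay quiet:
$ env -i HOME="$HOME" USER="$USER" bash --noprofile --norc
bash-5.1$ set -x
bash-5.1$ echo hi
+ echo hi
hi
bash-5.1$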

Related

unable to stop looping in shellscript using FOR [duplicate]

I have a script with multiple loop commands using the same list. It looks like this:
# List of applications
read -r -d '' Applications << EndOfList
/Applications/App.app
/Applications/App2.app
/Applications/App3.app
/Applications/Another App.app
EndOfList
for file in $Applications
do
if [ -e "$file" ]; then
echo ""$file" found"
fi;
done
exit 1
This seems to work fine except for the fourth application in the list, because there's a space in the application name. If I run the script in debug mode, this is the output:
+ read -r -d '' Applications
+ for file in '$Applications'
+ '[' -e /Applications/App.app ']'
+ for file in '$Applications'
+ '[' -e /Applications/App2.app ']'
+ for file in '$Applications'
+ '[' -e /Applications/App3.app ']'
+ for file in '$Applications'
+ '[' -e /Applications/Another ']'
+ for file in '$Applications'
+ '[' -e App.app ']'
+ exit 1
I've tried escaping with a backslash, quoting it and multiple other ways but I could not get it to work.
You should set IFS to a newline (\n) while reading, and use a Bash array rather than a simple variable to hold all the newline-delimited entries:
#!/bin/bash
IFS=$'\n' read -r -d '' -a Applications <<'EndOfList'
/Applications/App.app
/Applications/App2.app
/Applications/App3.app
/Applications/Another App.app
EndOfList
for file in "${Applications[#]}"
do
if [[ -e "$file" ]]; then
echo "$file found"
fi;
done
PS: If you have Bash 4+, then use mapfile:
mapfile -t Applications <<'EndOfList'
/Applications/App.app
/Applications/App2.app
/Applications/App3.app
/Applications/Another App.app
EndOfList
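Either way, a quick sanity check (my addition, not part of the original answer) is to print one array element per line and confirm you get exactly four entries, with "Another App.app" kept intact:
printf '%s\n' "${Applications[@]}"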
Why use a hard-coded list instead of getting the app file names directly from the directory?
If you add a new app in the future, you have to update the script.
Maybe this could be an idea for getting the files from the directory.
I created a directory Applications and touched the four files from your script:
#!/bin/bash
# List of applications
for file in Applications/*.app
do
echo "file[$file]"
if [ -e "$file" ]; then
echo ""$file" found"
fi;
done
exit 1
output
[shell] ➤ ./tttttt
file[Applications/Another App.app]
Applications/Another App.app found
file[Applications/App.app]
Applications/App.app found
file[Applications/App2.app]
Applications/App2.app found
file[Applications/App3.app]
Applications/App3.app found
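One caveat with the glob approach, not mentioned above: if the directory contains no *.app entries at all, bash leaves the pattern unexpanded and the loop runs once with the literal string Applications/*.app. You can make the loop simply do nothing in that case with:
shopt -s nullglob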

Save Zsh history to ~/.persistent_history

Recently I wanted to try Z shell on my Mac. But I'd like to keep saving my command history to ~/.persistent_history, as I did in Bash (ref).
However, the script in the ref link doesn't work under Zsh:
log_bash_persistent_history()
{
[[
$(history 1) =~ ^\ *[0-9]+\ +([^\ ]+\ [^\ ]+)\ +(.*)$
]]
local date_part="${BASH_REMATCH[1]}"
local command_part="${BASH_REMATCH[2]}"
if [ "$command_part" != "$PERSISTENT_HISTORY_LAST" ]
then
echo $date_part "|" "$command_part" >> ~/.persistent_history
export PERSISTENT_HISTORY_LAST="$command_part"
fi
}
run_on_prompt_command()
{
log_bash_persistent_history
}
PROMPT_COMMAND="run_on_prompt_command"
Is there anyone who can help me get it working? Many thanks!
After so much Googling, I finally found out the way to do this.
First, in ~/.zshrc, add the following options for history manipulation:
setopt append_history # append rather than overwrite
setopt extended_history # save timestamp
setopt inc_append_history # add history immediately after typing a command
In short, these three options will record every input_time+command to ~/.zsh_history immediately.
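With extended_history enabled, each record in ~/.zsh_history has the form ": <start-time>:<elapsed-seconds>;<command>", which is what the cut offsets used below rely on. For example (the timestamps here are made up):
: 1389927600:0;ls -la
: 1389927605:2;make test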
Then, put this function into ~/.zshrc:
precmd() { # This is a function that will be executed before every prompt
local date_part="$(tail -1 ~/.zsh_history | cut -c 3-12)"
local fmt_date="$(date -d #${date_part} +'%Y-%m-%d %H:%M:%S')"
# For older version of command "date", comment the last line and uncomment the next line
#local fmt_date="$(date -j -f '%s' ${date_part} +'%Y-%m-%d %H:%M:%S')"
local command_part="$(tail -1 ~/.zsh_history | cut -c 16-)"
if [ "$command_part" != "$PERSISTENT_HISTORY_LAST" ]
then
echo "${fmt_date} | ${command_part}" >> ~/.persistent_history
export PERSISTENT_HISTORY_LAST="$command_part"
fi
}
Since I use both bash and zsh, I want a single file that saves the history from both. That way I can easily search all of it using "grep".
Can't comment yet (and this went beyond a simple correction), so I'll add this as an answer.
This correction to the accepted answer doesn't quite work when, for example, the last command took quite a bit of time to execute - you'll get stray numbers and ; in your command, like this:
2017-07-22 19:02:42 | 3;micro ~/.zshrc && . ~/.zshrc
This can be fixed by replacing the sed -re '1s/.{15}//' in command_part with a slightly longer gawk, which also saves us a pipeline:
local command_part="$(gawk "
NR == $line_num_last {
pivot = match(\$0, \";\");
print substr(\$0, pivot+1);
}
NR > $line_num_last {
print;
}" ~/.zsh_history)"
It also has problems when dealing with multiline commands where one of the lines begins with :. This can be (mostly) fixed by replacing grep -ane '^:' ~/.zsh_history in line_num_last with grep -anE '^: [0-9]{10}:[0-9]*?;' ~/.zsh_history - I say mostly because a command could conceivably contain a string matching that expression. Say,
% naughty "multiline
> command
> : 0123456789:123;but a command I'm not
> "
Which will result in a clobbered record in ~/.persistent_history.
To fix this we need, in turn, to check whether the previous record ends with \ (there might be other conditions but I'm not familiar yet with this history format), and if so try the previous match.
_get_line_num_last () {
local attempts=0
local line=0
while true; do
# Greps the last two lines that can be considered history records
local lines="$(grep -anE '^: [0-9]{10}:[0-9]*?;' ~/.zsh_history | \
tail -n $((2 + attempts)) | head -2)"
local previous_line="$(echo "$lines" | head -1)"
# Gets the line number of the line being tested
local line_attempt=$(echo "$lines" | tail -1 | cut -d':' -f1 | tr -d '\n')
# If the previous (possible) history records ends with `\`, then the
# _current_ one is part of a multiline command; try again.
# Probably. Unless it was in turn in the middle of a multi-line
# command. And that's why the last line should be saved.
if [[ $line_attempt -ne $HISTORY_LAST_LINE ]] && \
[[ $previous_line == *"\\" ]] && [[ $attempts -eq 0 ]];
then
((attempts+=1))
else
line=$line_attempt
break
fi
done
echo "$line"
}
precmd() {
local line_num_last="$(_get_line_num_last)"
local date_part="$(gawk "NR == $line_num_last {print;}" ~/.zsh_history | cut -c 3-12)"
local fmt_date="$(date -d #${date_part} +'%Y-%m-%d %H:%M:%S')"
# I use awk itself to split the _first_ line only at the first `;`
local command_part="$(gawk "
NR == $line_num_last {
pivot = match(\$0, \";\");
print substr(\$0, pivot+1);
}
NR > $line_num_last {
print;
}" ~/.zsh_history)"
if [ "$command_part" != "$PERSISTENT_HISTORY_LAST" ]
then
echo "${fmt_date} | ${command_part}" >> ~/.persistent_history
export PERSISTENT_HISTORY_LAST="$command_part"
export HISTORY_LAST_LINE=$((1 + $(wc -l < ~/.zsh_history)))
fi
}
The original answer is mostly good, but to handle multi-line commands that also contain the character ':' for example this works:
local line_num_last=$(grep -ane '^:' ~/.zsh_history | tail -1 | cut -d':' -f1 | tr -d '\n')
local date_part="$(gawk "NR == $line_num_last {print;}" ~/.zsh_history | cut -c 3-12)"
local fmt_date="$(date -d #${date_part} +'%Y-%m-%d %H:%M:%S')"
local command_part="$(gawk "NR >= $line_num_last {print;}" ~/.zsh_history | sed -re '1s/.{15}//')"
If you want to be able to source something that will add persistent history for both bash and zsh, try this:
# You should source this file from both .zshrc and .bashrc
if [ -n "${ZSH_VERSION}" ]; then
setopt append_history # append rather than overwrite
setopt extended_history # save timestamp
setopt inc_append_history # add history immediately after typing a command
_get_line_num_last () {
local attempts=0
local line=0
while true; do
# Greps the last two lines that can be considered history records
local lines="$(grep -anE '^: [0-9]{10}:[0-9]*?;' ~/.zsh_history | \
tail -n $((2 + attempts)) | head -2)"
local previous_line="$(echo "$lines" | head -1)"
# Gets the line number of the line being tested
local line_attempt=$(echo "$lines" | tail -1 | cut -d':' -f1 | tr -d '\n')
# If the previous (possible) history records ends with `\`, then the
# _current_ one is part of a multiline command; try again.
# Probably. Unless it was in turn in the middle of a multi-line
# command. And that's why the last line should be saved.
if [[ $line_attempt -ne $HISTORY_LAST_LINE ]] && \
[[ $previous_line == *"\\" ]] && [[ $attempts -eq 0 ]];
then
((attempts+=1))
else
line=$line_attempt
break
fi
done
echo "$line"
}
precmd() {
local line_num_last="$(_get_line_num_last)"
local date_part="$(awk "NR == $line_num_last {print;}" ~/.zsh_history | cut -c 3-12)"
# Try to get date with non-mac date function.
local fmt_date="$(date -d #${date_part} +'%Y-%m-%d %H:%M:%S')" >& /dev/null
# Try again with mac date function if that failed.
if [ -z "$fmt_date" ]; then
local fmt_date="$(date -r 1623959079 +'%Y-%m-%d %H:%M:%S')" >& /dev/null
fi
# I use awk itself to split the _first_ line only at the first `;`
local command_part="$(awk "
NR == $line_num_last {
pivot = match(\$0, \";\");
print substr(\$0, pivot+1);
}
NR > $line_num_last {
print;
}" ~/.zsh_history)"
if [ "$command_part" != "$PERSISTENT_HISTORY_LAST" ]
then
echo "${fmt_date} | ${command_part}" >> ~/.persistent_history
export PERSISTENT_HISTORY_LAST="$command_part"
export HISTORY_LAST_LINE=$((1 + $(wc -l < ~/.zsh_history)))
fi
}
elif [ -n "${BASH_VERSION}" ]; then
log_bash_persistent_history()
{
[[
$(history 1) =~ ^\ *[0-9]+\ +([^\ ]+\ [^\ ]+)\ +(.*)$
]]
local date_part="${BASH_REMATCH[1]}"
local command_part="${BASH_REMATCH[2]}"
if [ "$command_part" != "$PERSISTENT_HISTORY_LAST" ]
then
echo $date_part "|" "$command_part" >> ~/.persistent_history
export PERSISTENT_HISTORY_LAST="$command_part"
fi
}
export PROMPT_COMMAND="log_bash_persistent_history"
fi
export HISTSIZE=1000000
export HISTFILESIZE=-1
export HISTCONTROL=ignoredups:erasedups
export HISTTIMEFORMAT="%F %T "
alias persistent_history='cat ~/.persistent_history'
alias ph='cat ~/.persistent_history'
alias phgrep='ph | grep'
alias phg='ph | grep'
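As a usage illustration (these specific searches are mine, not part of the original snippet), you can then look up every docker invocation recorded from either shell, or narrow it down by date prefix:
phgrep docker
phg '2021-06-17.*docker'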

How to strip out all of the links of an HTML file in Bash or grep or batch and store them in a text file

I have a file that is HTML, and it has about 150 anchor tags. I need only the links from these tags, i.e. given something like <a href="http://www.google.com">...</a>, I want to get only the http://www.google.com part.
When I run a grep,
cat website.htm | grep -E '<a href=".*">' > links.txt
this returns the entire line to me that it found on not the link I want, so I tried using a cut command:
cat drawspace.txt | grep -E '<a href=".*">' | cut -d’”’ --output-delimiter=$'\n' > links.txt
Except that this is wrong, and it doesn't work; it gives me some error about wrong parameters... So I assume that the file was supposed to be passed along too. Maybe like cut -d’”’ --output-delimiter=$'\n' grepedText.txt > links.txt.
But I wanted to do this in one command if possible... So I tried doing an AWK command.
cat drawspace.txt | grep '<a href=".*">' | awk '{print $2}’
But this wouldn't run either. It was asking me for more input, because I wasn't finished....
I tried writing a batch file, and it told me FINDSTR is not an internal or external command... So I assume my environment variables were messed up and rather than fix that I tried installing grep on Windows, but that gave me the same error....
The question is, what is the right way to strip out the HTTP links from HTML? With that I will make it work for my situation.
P.S. I've read so many links/Stack Overflow posts that showing my references would take too long.... If example HTML is needed to show the complexity of the process then I will add it.
I also have a Mac and PC which I switched back and forth between them to use their shell/batch/grep command/terminal commands, so either or will help me.
I also want to point out I'm in the correct directory
HTML:
<tr valign="top">
<td class="beginner">
B03
</td>
<td>
<a href="http://www.drawspace.com/lessons/b03/simple-symmetry">Simple Symmetry</a> </td>
</tr>
<tr valign="top">
<td class="beginner">
B04
</td>
<td>
<a href="http://www.drawspace.com/lessons/b04/faces-and-a-vase">Faces and a Vase</a> </td>
</tr>
<tr valign="top">
<td class="beginner">
B05
</td>
<td>
<a href="http://www.drawspace.com/lessons/b05/blind-contour-drawing">Blind Contour Drawing</a> </td>
</tr>
<tr valign="top">
<td class="beginner">
B06
</td>
<td>
<a href="http://www.drawspace.com/lessons/b06/seeing-values">Seeing Values</a> </td>
</tr>
Expected output:
http://www.drawspace.com/lessons/b03/simple-symmetry
http://www.drawspace.com/lessons/b04/faces-and-a-vase
http://www.drawspace.com/lessons/b05/blind-contour-drawing
etc.
$ sed -n 's/.*href="\([^"]*\).*/\1/p' file
http://www.drawspace.com/lessons/b03/simple-symmetry
http://www.drawspace.com/lessons/b04/faces-and-a-vase
http://www.drawspace.com/lessons/b05/blind-contour-drawing
http://www.drawspace.com/lessons/b06/seeing-values
You can use grep for this:
grep -Po '(?<=href=")[^"]*' file
It prints everything after href=" until a new double quote appears.
With your given input it returns:
http://www.drawspace.com/lessons/b03/simple-symmetry
http://www.drawspace.com/lessons/b04/faces-and-a-vase
http://www.drawspace.com/lessons/b05/blind-contour-drawing
http://www.drawspace.com/lessons/b06/seeing-values
Note that it is not necessary to write cat drawspace.txt | grep '<a href=".*">', you can get rid of the useless use of cat with grep '<a href=".*">' drawspace.txt.
Another example
$ cat a
hello <a href="httafasdf">asdas</a>
hello <a href="hello">asdas</a>
other things
$ grep -Po '(?<=href=")[^"]*' a
httafasdf
hello
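One portability note (mine, not the answerer's): -P enables Perl-compatible regular expressions, which GNU grep supports but the BSD grep shipped with macOS does not, and the question mentions switching between a Mac and a PC. A rough equivalent using only widely supported options:
grep -o 'href="[^"]*"' drawspace.txt | sed 's/^href="//; s/"$//'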
My guess is your PC or Mac will not have the lynx command installed by default (it's available for free on the web), but lynx will let you do things like this:
$ lynx -dump -image_links -listonly /usr/share/xdiagnose/workloads/youtube-reload.html
Output:
References
file://localhost/usr/share/xdiagnose/workloads/youtube-reload.html
http://www.youtube.com/v/zeNXuC3N5TQ&hl=en&fs=1&autoplay=1
It is then a simple matter to grep for the http: lines. And there even may be lynx options to print just the http: lines (lynx has many, many options).
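For instance, to end up with just the http(s) links in a file (the file name website.htm is taken from the question; adjust to taste), something along these lines should work:
lynx -dump -listonly website.htm | grep -Eo 'https?://[^ ]+' > links.txt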
Use grep to extract all the lines with links in them and then use sed to pull out the URLs:
grep -o '<a href=".*">' *.html | sed 's/\(<a href="\|\">\)//g' > link.txt;
As per triplee's comment, using regex to parse HTML or XML files is essentially not done. Tools such as sed and awk are extremely powerful for handling text files, but when it boils down to parsing complex-structured data (XML, HTML, JSON, ...) they are nothing more than a sledgehammer. Yes, you can get the job done, but sometimes at a tremendous cost. For handling such delicate files, you need a bit more finesse, using a more targeted set of tools.
In the case of parsing XML or HTML, one can easily use xmlstarlet.
In the case of an XHTML file, you can use:
xmlstarlet sel --html -N "x=http://www.w3.org/1999/xhtml" \
-t -m '//x:a/@href' -v . -n
where -N gives the XHTML namespace, if any; this is recognized by
<html xmlns="http://www.w3.org/1999/xhtml">
However, as HTML pages are often not well-formed XML, it might be handy to clean them up a bit using tidy. For the example above, this then gives:
$ tidy -q -numeric -asxhtml --show-warnings no file.html \
| xmlstarlet sel --html -N "x=http://www.w3.org/1999/xhtml" \
-t -m '//x:a/@href' -v . -n
http://www.drawspace.com/lessons/b03/simple-symmetry
http://www.drawspace.com/lessons/b04/faces-and-a-vase
http://www.drawspace.com/lessons/b05/blind-contour-drawing
http://www.drawspace.com/lessons/b06/seeing-values
Assuming a well-formed HTML document with only one href link per line, here's one awk approach that doesn't need backreferences to regex capturing groups:
{m,g}awk 'NF*=2<NF' OFS= FS='^.*<[Aa] [^>]*[Hh][Rr][Ee][Ff]=\"|\".*$'
http://www.drawspace.com/lessons/b03/simple-symmetry
http://www.drawspace.com/lessons/b04/faces-and-a-vase
http://www.drawspace.com/lessons/b05/blind-contour-drawing
http://www.drawspace.com/lessons/b06/seeing-values
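If that one-liner is too cryptic, here is a more conventional awk that does the same job under the same one-link-per-line assumption (my rephrasing, not the answerer's): split each matching line on the literal text href=", then cut the second field at the closing quote.
awk -F 'href="' 'NF > 1 { split($2, a, "\""); print a[1] }' drawspace.txt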
Here is a (more general) dash script, which can compare the URLs (delimited by ://) in two files (call this script with the --help flag to find out how to use it):
#!/bin/dash
PrintURLs () {
extract_urls_command="$insert_NL_after_URLs_command|$strip_NON_URL_text_command"
if [ "$domains_flag" = "1" ]; then
extract_urls_command="$extract_urls_command|$get_domains_command"
fi
{
eval path_to_search=\"\$$1\"
current_file_group="$2"
if [ ! "$skip_non_text_files_flag" = "1" ]; then
printf "\033]0;%s\007" "Loading non text files from group [$current_file_group]...">"$print_to_screen"
eval find \"\$path_to_search\" ! -type d ! -path '.' -a \\\( -name '*.docx' \\\) "$find_params" -exec unzip -q -c '{}' 'word/_rels/document.xml.rels' \\\;
eval find \"\$path_to_search\" ! -type d ! -path '.' -a \\\( -name '*.xlsx' \\\) "$find_params" -exec unzip -q -c '{}' 'xl/worksheets/_rels/*' \\\;
eval find \"\$path_to_search\" ! -type d ! -path '.' -a \\\( -name '*.pptx' -o -name '*.ppsx' \\\) "$find_params" -exec unzip -q -c '{}' 'ppt/slides/slide1.xml' \\\;
eval find \"\$path_to_search\" ! -type d ! -path '.' -a \\\( -name '*.odt' -o -name '*.ods' -o -name '*.odp' \\\) "$find_params" -exec unzip -q -c '{}' 'content.xml' \\\;
eval find \"\$path_to_search\" ! -type d ! -path '.' -a \\\( -name '*.pdf' \\\) "$find_params" -exec pdftotext '{}' '-' \\\;
fi
eval find \"\$path_to_search\" ! -type d ! -path '.' "$find_params"|{
count=0
while IFS= read file; do
if [ ! "$(file -bL --mime-encoding "$file")" = "binary" ]; then
count=$((count+1))
printf "\033]0;%s\007" "Loading text files from group [$current_file_group] - file $count...">"$print_to_screen"
cat "$file"
fi
done
}
printf "\033]0;%s\007" "Extracting URLs from group [$current_file_group]...">"$print_to_screen"
} 2>/dev/null|eval "$extract_urls_command"
}
StoreURLsWithLineNumbers () {
count_all="0"
mask="00000000000000000000"
#For <file group 1>: initialise next variables:
file_group="1"
count=0
dff_command_text=""
if [ ! "$dff_command_flag" = "0" ]; then
dff_command_text="Step $dff_command_flag - "
fi
for line in $(PrintURLs file_params_1 1; printf '%s\n' "### Sepparator ###"; for i in $(seq 2 $file_params_0); do PrintURLs file_params_$i 2; done;); do
if [ "$line" = "### Sepparator ###" ]; then
eval lines$file_group\1\_0=$count
eval lines$file_group\2\_0=$count
#For <file group 2>: initialise next variables:
file_group="2";
count="0"
continue;
fi
printf "\033]0;%s\007" "Storing URLs into memory [$dff_command_text""group $file_group]: $((count + 1))...">"$print_to_screen"
count_all_prev=$count_all
count_all=$((count_all+1))
count=$((count+1))
if [ "${#count_all_prev}" -lt "${#count_all}" ]; then
mask="${mask%?}"
fi
number="$mask$count_all"
eval lines$file_group\1\_$count=\"\$number\"
eval lines$file_group\2\_$count=\"\$line\" #URL
done;
eval lines$file_group\1\_0=$count
eval lines$file_group\2\_0=$count
}
trap1 () {
CleanUp
#if not running in a subshell: print "Aborted"
if [ "$dff_command_flag" = "0" ]; then
printf "\nAborted.\n">"$print_to_screen"
fi
#kill all children processes ("-$$": "-" = all processes in the process group with ID "$$" (current shell ID)), suppressing "Terminated" message (sending signal SIGPIPE ("PIPE") instead of SIGTERM ("INT") suppresses the "Terminated" message):
kill -s PIPE -- -$$
exit
}
CleanUp () {
#Restore "INTERRUPT" (CTRL-C) and "TERMINAL STOP" (CTRL-Z) signals:
trap - INT
trap - TSTP
#Clear the title:
printf "\033]0;%s\007" "">"$print_to_screen"
#Restore initial IFS:
#IFS="$initial_IFS"
unset IFS
#Restore initial directory:
cd "$initial_dir"
DestroyArray flag_params
DestroyArray file_params
DestroyArray find_params
DestroyArray lines11
DestroyArray lines12
DestroyArray lines21
DestroyArray lines22
##Kill current shell with PID $$:
#kill -INT $$
}
DestroyArray () {
eval eval array_length=\'\$$1\_0\'
if [ -z "$array_length" ]; then array_length=0; fi
for i in $(seq 1 $array_length); do
eval unset $1\_$i
done
eval unset $1\_0
}
PrintErrorExtra () {
{
printf '%s\n' "Command path:"
printf '%s\n' "$current_shell '$current_script_path'"
printf "\n"
#Flag parameters are printed non-quoted:
printf '%s\n' "Flags:"
for i in $(seq 1 $flag_params_0); do
eval current_param="\"\$flag_params_$i\""
printf '%s\n' "$current_param"
done
if [ "$flag_params_0" = "0" ]; then printf '%s\n' "<none>"; fi
printf "\n"
#Path parameters are printed quoted with '':
printf '%s\n' "Paths:"
for i in $(seq 1 $file_params_0); do
eval current_param="\"\$file_params_$i\""
printf '%s\n' "'$current_param'"
done
if [ "$file_params_0" = "0" ]; then printf '%s\n' "<none>"; fi
printf "\n"
#Find parameters are printed quoted with '':
printf '%s\n' "'find' parameters:"
for i in $(seq 1 $find_params_0); do
eval current_param="\"\$find_params_$i\""
printf '%s\n' "'$current_param'"
done
if [ "$find_params_0" = "0" ]; then printf '%s\n' "<none>"; fi
printf "\n"
}>"$print_error_messages"
}
DisplayHelp () {
printf '%s\n' ""
printf '%s\n' " uniql.sh - A script to compare URLs ( containing '://' ) in a file compared to a group of files"
printf '%s\n' " "
printf '%s\n' " Usage:"
printf '%s\n' " "
printf '%s\n' " dash '/path/to/this/script.sh' <flags> '/path/to/file1' ... '/path/to/fileN' [ --find-parameters <find_parameters> ]"
printf '%s\n' " where:"
printf '%s\n' " - The group 1: '/path/to/file1' and the group 2: '/path/to/file2' ... '/path/to/fileN' - are considered the two groups of files to be compared"
printf '%s\n' " "
printf '%s\n' " - <flags> can be:"
printf '%s\n' " --help"
printf '%s\n' " - displays this help information"
printf '%s\n' " --different or -d"
printf '%s\n' " - find URLs that differ"
printf '%s\n' " --common or -c"
printf '%s\n' " - find URLs that are common"
printf '%s\n' " --domains"
printf '%s\n' " - compare and print only the domains (plus subdomains) of the URLs for: the group 1 and the group 2 - for the '-c' or the '-d' flag"
printf '%s\n' " --domains-full"
printf '%s\n' " - compare only the domains (plus subdomains) of the URLs but print the full URLs for: the group 1 and the group 2 - for the '-c' or the '-d' flag"
printf '%s\n' " --preserve-order or -p"
printf '%s\n' " - preserve the order and the occurences in which the links appear in group 1 and in group 2"
printf '%s\n' " - Warning: when using this flag - process substitution is used by this script - which does not work with the \"dash\" shell (throws an error). For this flag, you can use other \"dash\" syntax compatible shells, like: bash, zsh, ksh"
printf '%s\n' " --skip-non-text"
printf '%s\n' " - skip non-text files from search (does not look into: .docx, .xlsx, .pptx, .ppsx, .odt, .ods, .odp and .pdf files)"
printf '%s\n' " --find-parameters <find_parameters>"
printf '%s\n' " - all the parameters given after this flag, are considered 'find' parameters"
printf '%s\n' " - <find_parameters> can be: any parameters that can be passed to the 'find' utility (which is used internally by this script) - such as: name/path filters"
printf '%s\n' " -h"
printf '%s\n' " - also look in hidden files"
printf '%s\n' " "
printf '%s\n' " Output:"
printf '%s\n' " - '<' - denote URLs from the group 1: '/path/to/file1'"
printf '%s\n' " - '>' - denote URLs from the group 2: '/path/to/file2' ... '/path/to/fileN'"
printf '%s\n' " "
printf '%s\n' " Other commands that might be useful:"
printf '%s\n' " "
printf '%s\n' " - filter results - print lines containing string (highlight):"
printf '%s\n' " ...|grep \"string\""
printf '%s\n' " "
printf '%s\n' " - filter results - print lines not containing string:"
printf '%s\n' " ...|grep -v \"string\""
printf '%s\n' " "
printf '%s\n' " - filter results - print lines containing: string1 or string2 or ... stringN:"
printf '%s\n' " ...|awk '/string1|string2|...|stringN/'"
printf '%s\n' " "
printf '%s\n' " - filter results - print lines not containing: string1 or string2 or ... stringN:"
printf '%s\n' " ...|awk '"'!'"/string1|string2|...|stringN/'"
printf '%s\n' " "
printf '%s\n' " - filter results - print lines in '/file/path/2' that are in '/file/path/1':"
printf '%s\n' " grep -F -f '/file/path/1' '/file/path/2'"
printf '%s\n' " "
printf '%s\n' " - filter results - print lines in '/file/path/2' that are not in '/file/path/1':"
printf '%s\n' " grep -F -vf '/file/path/1' '/file/path/2'"
printf '%s\n' " "
printf '%s\n' " - filter results - print columns <1> and <2> from output:"
printf '%s\n' " awk '{print \$1, \$2}'"
printf '%s\n' ""
}
# Print to "/dev/tty" = Print error messages to screen only
print_to_screen="/dev/tty"
#print_error_messages='&2'
print_error_messages="$print_to_screen"
initial_dir="$PWD" #Store initial directory value
initial_IFS="$IFS" #Store initial IFS value
NL2=$(printf '%s' "\n\n") #Store New Line for use with sed
insert_NL_after_URLs_command='sed -E '"'"'s/([a-zA-Z]*\:\/\/)/'"\\${NL2}"'\1/g'"'"
strip_NON_URL_text_command='sed -n '"'"'s/\(\(.*\([^a-zA-Z+]\)\|\([a-zA-Z]\)\)\)\(\([a-zA-Z]\)*\:\/\/\)\([^ ^\t^>^<]*\).*/\4\5\7/p'"'"
get_domains_command='sed '"'"'s/.*:\/\/\(.*\)/\1/g'"'"'|sed '"'"'s/\/.*//g'"'"
prepare_for_output_command='sed -E '"'"'s/ *([0-9]*)[\ *](<|>) *([0-9]*)[\ *](.*)/\2 \4 \1/g'"'"
remove_angle_brackets_command='sed -E '"'"'s/(<|>) (.*)/\2/g'"'"
find_params=""
#Process parameters:
different_flag="0"
common_flag="0"
domains_flag="0"
domains_full_flag="0"
preserve_order_flag="0"
dff_command1_flag="0"
dff_command2_flag="0"
dff_command3_flag="0"
dff_command4_flag="0"
dff_command_flag="0"
skip_non_text_files_flag="0"
find_parameters_flag="0"
hidden_files_flag="0"
help_flag="0"
flag_params_count=0
file_params_count=0
find_params_count=0
for param; do
if [ "$find_parameters_flag" = "0" ]; then
case "$param" in
"--different" | "-d" | "--common" | "-c" | "--domains" | \
"--domains-full" | "--preserve_order" | "-p" | "--dff_command1" | "--dff_command2" | \
"--dff_command3" | "--dff_command4" | "--skip-non-text" | "--find-parameters" | "-h" | \
"--help" )
flag_params_count=$((flag_params_count+1))
eval flag_params_$flag_params_count=\"\$param\"
case "$param" in
"--different" | "-d" )
different_flag="1"
;;
"--common" | "-c" )
common_flag="1"
;;
"--domains" )
domains_flag="1"
;;
"--domains-full" )
domains_full_flag="1"
;;
"--preserve_order" | "-p" )
preserve_order_flag="1"
;;
"--dff_command1" )
dff_command1_flag="1"
dff_command_flag="1"
;;
"--dff_command2" )
dff_command2_flag="1"
dff_command_flag="2"
;;
"--dff_command3" )
dff_command3_flag="1"
dff_command_flag="3"
;;
"--dff_command4" )
dff_command4_flag="1"
dff_command_flag="4"
;;
"--skip-non-text" )
skip_non_text_files_flag="1"
;;
"--find-parameters" )
find_parameters_flag="1"
;;
"-h" )
hidden_files_flag="1"
;;
"--help" )
help_flag="1"
;;
esac
;;
* )
file_params_count=$((file_params_count+1))
eval file_params_$file_params_count=\"\$param\"
;;
esac
elif [ "$find_parameters_flag" = "1" ]; then
find_params_count=$((find_params_count+1))
eval find_params_$find_params_count=\"\$param\"
fi
done
flag_params_0="$flag_params_count"
file_params_0="$file_params_count"
find_params_0="$find_params_count"
if [ "$help_flag" = "1" -o \( "$file_params_0" = "0" -a "$find_params_0" = "0" \) ]; then
DisplayHelp
exit 0
fi
#Check if any of the necessary utilities is missing:
error="false"
man -f find >/dev/null 2>/dev/null || { printf '\n%s\n' "ERROR: the 'find' utility is not installed!"; error="true"; }
man -f file >/dev/null 2>/dev/null || { printf '\n%s\n' "ERROR: the 'file' utility is not installed!"; error="true"; }
man -f kill >/dev/null 2>/dev/null || { printf '\n%s\n' "ERROR: the 'kill' utility is not installed!"; error="true"; }
man -f seq >/dev/null 2>/dev/null || { printf '\n%s\n' "ERROR: the 'seq' utility is not installed!"; error="true"; }
man -f ps >/dev/null 2>/dev/null || { printf '\n%s\n' "ERROR: the 'ps' utility is not installed!"; error="true"; }
man -f sort >/dev/null 2>/dev/null || { printf '\n%s\n' "ERROR: the 'sort' utility is not installed!"; error="true"; }
man -f uniq >/dev/null 2>/dev/null || { printf '\n%s\n' "ERROR: the 'uniq' utility is not installed!"; error="true"; }
man -f sed >/dev/null 2>/dev/null || { printf '\n%s\n' "ERROR: the 'sed' utility is not installed!"; error="true"; }
man -f grep >/dev/null 2>/dev/null || { printf '\n%s\n' "ERROR: the 'grep' utility is not installed!"; error="true"; }
if [ "$skip_non_text_files_flag" = "0" ]; then
man -f unzip >/dev/null 2>/dev/null || { printf '\n%s\n' "ERROR: the 'unzip' utility is not installed!"; error="true"; }
man -f pdftotext >/dev/null 2>/dev/null || { printf '\n%s\n' "ERROR: the 'pdftotext' utility is not installed!"; error="true"; }
fi
if [ "$error" = "true" ]; then
printf "\n"
CleanUp; exit 1
fi
#Process parameters/flags and check for errors:
find_params="$(for i in $(seq 1 $find_params_0;); do eval printf \'\%s \' "\'\$find_params_$i\'"; done;)"
if [ -z "$find_params" ]; then
find_params='-name "*"'
fi
if [ "$hidden_files_flag" = "1" ]; then
hidden_files_string=""
elif [ "$hidden_files_flag" = "0" ]; then
hidden_files_string="\( "'! -path '"'"'*/.*'"'"" \)"
fi
find_params="$hidden_files_string"" -a ""$find_params"
current_shell="$(ps -p $$ 2>/dev/null)"; current_shell="${current_shell##*" "}"
current_script_path=$(cd "${0%/*}" 2>/dev/null; printf '%s' "$(pwd -P)/${0##*/}")
error="false"
if [ "$different_flag" = "0" -a "$common_flag" = "0" ]; then
error="true"
printf '\n%s\n' "ERROR: Expected either -c or -d flag!">"$print_error_messages"
elif [ "$different_flag" = "1" -a "$common_flag" = "1" ]; then
error="true"
printf '\n%s\n' "ERROR: The '-c' flag cannot be used together with the '-d' flag!">"$print_error_messages"
fi
if [ "$preserve_order_flag" = "1" -a "$common_flag" = "1" ]; then
error="true"
printf '\n%s\n' "ERROR: The '-p' flag cannot be used together with the '-c' flag!">"$print_error_messages"
fi
if [ "$preserve_order_flag" = "1" -a "$current_shell" = "dash" ]; then
error="true"
printf '\n%s\n' "ERROR: When using the '-p' flag, the \"process substitution\" feature is needed, which is not available in the dash shell (it is available in shells like: bash, zsh, ksh)!">"$print_error_messages"
fi
eval find \'/dev/null\' "$find_params">/dev/null 2>&1||{
error="true"
printf '\n%s\n' "ERROR: Invalid parameters for the 'find' command!">"$print_error_messages"
}
if [ "$error" = "true" ]; then
printf "\n"
PrintErrorExtra
CleanUp; exit 1;
fi
#Check if the file paths given as parameters do exist:
error="false"
for i in $(seq 1 $file_params_0); do
eval current_file=\"\$file_params_$i\"
# If current <file> does not exist:
if [ ! -e "$current_file" ]; then # If current file does not exist:
printf '\n%s\n' "ERROR: File '$current_file' does not exist or is not accessible!">"$print_error_messages"
error="true"
elif [ ! -r "$current_file" ]; then # If current file is not readable:
printf '\n%s\n' "ERROR: File <$i> = '$current_file' is not readable!">"$print_error_messages"
error="true"
fi
done
if [ "$error" = "true" ]; then
printf "\n"
PrintErrorExtra
CleanUp; exit 1;
fi
#Proceed to finding and comparing URLs:
IFS='
'
#Trap "INTERRUPT" (CTRL-C) and "TERMINAL STOP" (CTRL-Z) signals:
trap 'trap1' INT
trap 'trap1' TSTP
if [ "$domains_full_flag" = "0" -o ! "$dff_command_flag" = "0" ]; then
StoreURLsWithLineNumbers
fi
if [ "$domains_full_flag" = "0" ]; then
if [ "$preserve_order_flag" = "0" ]; then
{
for i in $(seq 1 $lines11_0); do
printf "\033]0;%s\007" "Processing group [1] - URL: $i...">"$print_to_screen"
eval printf \'\%s\\\n\' \"\< \$lines11_$i \$lines12_$i\"
done|sort -k 3|uniq -c -f 2
for i in $(seq 1 $lines21_0); do
printf "\033]0;%s\007" "Processing group [2] - URL: $i...">"$print_to_screen"
eval printf \'\%s\\\n\' \"\> \$lines21_$i \$lines22_$i\"
done|sort -k 3|uniq -c -f 2
}|sort -k 4|{
if [ "$different_flag" = "1" ]; then
uniq -u -f 3|sort -k 3|eval "$prepare_for_output_command"
elif [ "$common_flag" = "1" ]; then
uniq -d -f 3|sort -k 3|eval "$prepare_for_output_command"|eval "$remove_angle_brackets_command"
fi
}
elif [ "$preserve_order_flag" = "1" ]; then
if [ "$different_flag" = "1" ]; then
{
URL_count=0
current_line=""
for line in $(eval diff \
\<\(\
count1=0\;\
for i in \$\(seq 1 \$lines11_0\)\; do\
count1=\$\(\(count1 + 1\)\)\;\
eval URL=\\\"\\\$lines12_\$i\\\"\;\
printf \'\%s\\n\' \"File group: 1 URL: \$count1\"\;\
printf \'\%s\\n\' \"\$URL\"\;\
done\;\
printf \'\%s\\n\' \"\#\#\# Sepparator 1\"\;\
\) \
\<\(\
count2=0\;\
for i in \$\(seq 1 $lines21_0\)\; do\
count2=\$\(\(count2 + 1\)\)\;\
eval URL=\\\"\\\$lines22_\$i\\\"\;\
printf \'\%s\\n\' \"File group: 2 URL: \$count2\"\;\
printf \'\%s\\n\' \"\$URL\"\;\
done\;\
printf \'\%s\\n\' \"\#\#\# Sepparator 2\"\;\
\) \
); do
URL_count=$((URL_count + 1))
previous_line="$current_line"
current_line="$line"
#if ( current line starts with "<" and previous line starts with "<" ) OR ( current line starts with ">" and previous line starts with ">" ):
if [ \( \( ! "${current_line#"<"}" = "${current_line}" \) -a \( ! "${previous_line#"<"}" = "${previous_line}" \) \) -o \( \( ! "${current_line#">"}" = "${current_line}" \) -a \( ! "${previous_line#">"}" = "${previous_line}" \) \) ]; then
printf '%s\n' "$previous_line"
fi
done
}
fi
fi
elif [ "$domains_full_flag" = "1" ]; then
# Command to find common domains:
uniql_command1="$current_shell '$current_script_path' -c --domains $(for i in $(seq 1 $file_params_0); do eval printf \'%s \' \\\'\$file_params_$i\\\'; done)"
# URLs that are only in first parameter file (file group 1):
uniql_command2="$current_shell '$current_script_path' -d '$file_params_1' \"/dev/null\""
# Command to find common domains:
uniql_command3="$current_shell '$current_script_path' -c --domains $(for i in $(seq 1 $file_params_0); do eval printf \'%s \' \\\'\$file_params_$i\\\'; done)"
# URLs that are only in 2..N parameter files (file group 2):
uniql_command4="$current_shell '$current_script_path' -d \"/dev/null\" $(for i in $(seq 2 $file_params_0); do eval printf \'%s \' \\\'\$file_params_$i\\\'; done)"
#Store one <command substitution> at a time (synchronously):
uniql_command1_output="$(eval $uniql_command1 --dff_command1 --find-parameters "$find_params"|sed 's/\([^ *]\) \(.*\)/\1/')"
uniql_command2_output="$(eval $uniql_command2 --dff_command2 --find-parameters "$find_params")"
uniql_command3_output="$(eval $uniql_command3 --dff_command3 --find-parameters "$find_params"|sed 's/\([^ *]\) \(.*\)/\1/')"
uniql_command4_output="$(eval $uniql_command4 --dff_command4 --find-parameters "$find_params")"
if [ "$different_flag" = "1" ]; then
# Find URLs (second escaped process substitution: \<\(...\)) that are not in the common domains list (first escaped process substitution: \<\(...\)):
# URLs in the first file given as parameter (second escaped process substitution: \<\(...\)):
eval grep \-F \-vf \<\( printf \'\%s\' \"\$uniql_command1_output\"\; \) \<\( printf \'\%s\' \"\$uniql_command2_output\"\; \)
# URLs in the files 2..N - given as parameters (second escaped process substitution: \<\(...\)):
eval grep \-F \-vf \<\( printf \'\%s\' \"\$uniql_command3_output\"\; \) \<\( printf \'\%s\' \"\$uniql_command4_output\"\; \)
elif [ "$common_flag" = "1" ]; then
# Find URLs (second escaped process substitution: \<\(...\)) that are in the common domains list (first escaped process substitution: \<\(...\)):
# URLs in the first file given as parameter (second escaped process substitution: \<\(...\)):
eval grep \-F \-f \<\( printf \'\%s\' \"\$uniql_command1_output\"\; \) \<\( printf \'\%s\' \"\$uniql_command2_output\"\; \)
# URLs in the files 2..N - given as parameters (second escaped process substitution: \<\(...\)):
eval grep \-F \-f \<\( printf \'\%s\' \"\$uniql_command3_output\"\; \) \<\( printf \'\%s\' \"\$uniql_command4_output\"\; \)
fi
# grep flags explained:
# -F = do not interpret pattern string (treat string literally)
# -v = select non-matching lines
fi
CleanUp
For the asked question - this should do it:
dash '/path/to/the/above/script.sh' -d '/path/to/file1/containing/URLs.txt' '/dev/null'

Writing a custom bash-completion rule

I have directories full of files with the same prefix, which I want to be able to quickly open in vim. For example, I might have:
$ ls *
bar:
bar_10 bar_20 bar_30
foo:
foo_10 foo_20 foo_30
What I want is to be able to be in one of these directories and type:
$ vim <TAB>
and it autocomplete to:
$ vim bar_
To achieve this I am happy to have a file per directory called ".completion" which has "bar_" in it.
The issue I have is I would like the following behaviour:
* "vim <TAB>" --> "vim bar_" // no space
* "vim bar_1" --> "vim bar_10 " // space
Where | is the cursor, so if a file matches, add the space on the end. If we're matching the prefix, don't add a space.
The best I have so far is this behaviour minus the adding a space at the end. I've tried all sorts of things, all to no avail. The following is what I have:
_vim()
{
local cur opts
local -a toks
cur="${COMP_WORDS[COMP_CWORD]}"
if [ -f .completion ]; then
opts=`cat .completion`
if [[ ${opts} = ${cur} ]]; then
toks=( $(compgen -f ${cur} | sed -e 's/$/ /') )
else
if [[ -z ${cur} ]]; then
toks=( $(compgen -W "${opts}" -- ${cur}) )
else
toks=( $(compgen -f ${cur} | sed -e 's/$/ /') )
fi
fi
else
toks=( $(compgen -f ${cur} | sed -e 's/$/ /') )
fi
COMPREPLY=( "${toks[#]}" )
}
complete -F _vim -o nospace vim
Any ideas on how I can get it to add the space after the file name completion, but not after the prefix completion would be greatly appreciated.
The trailing space that sed is adding is getting dropped. Try this:
saveIFS=$IFS
IFS=$'\n' # this will allow filenames with spaces (but not filenames with newlines)
toks=( $(compgen -f -- "${cur}" )) # the -- protects against filenames that start with a hyphen
toks=("${toks[#]/%/ }") # add a trailing space to each element
IFS=$saveIFS
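Putting the two pieces together, a minimal sketch of the whole completion function could look like the following. It simplifies the original logic by offering the .completion prefix only when nothing has been typed yet, which matches the behaviour described in the question; treat it as a starting point rather than a drop-in replacement.
_vim()
{
local cur opts saveIFS
local -a toks
cur="${COMP_WORDS[COMP_CWORD]}"
if [ -f .completion ] && [ -z "${cur}" ]; then
# Offer the stored prefix with no trailing space, so typing can continue.
opts=$(cat .completion)
toks=( $(compgen -W "${opts}" -- "${cur}") )
else
# Complete file names and append a trailing space to each match.
saveIFS=$IFS
IFS=$'\n' # keep file names containing spaces intact
toks=( $(compgen -f -- "${cur}") )
toks=("${toks[@]/%/ }")
IFS=$saveIFS
fi
COMPREPLY=( "${toks[@]}" )
}
complete -F _vim -o nospace vim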
