Hey guys, the title pretty much says it: I'm echoing two variables into a bash loop to kick it off, and (supposedly) using a case statement to identify which one I'm on and run a similar but separate wget statement (the -k flag is dropped on the second go-around). I hit my git checkout, but it doesn't seem like I'm entering my cases. How do I fix this, or is there a better way to do it, since I'm just dropping a -k flag?
#!/bin/bash
echo -e "render\nstorage" | while read x; do
    git checkout "$x"
    case $x in
        $1)
            wget "${WGDOMAIN}" -r -l INF -k -p \
                --no-check-certificate \
                --strict-comments \
                --warc-header="Operator: Web Archiver" \
                --warc-file="$WGDOMAIN" \
                --warc-dedup="${WGDOMAIN}.cdx" \
                --warc-cdx=on 2> session.log
            ;;
        $2)
            wget "${WGDOMAIN}" -r -l INF -p \
                --no-check-certificate \
                --strict-comments \
                --warc-header="Operator: Web Archiver" \
                --warc-file="$WGDOMAIN" \
                --warc-dedup="${WGDOMAIN}.cdx" \
                --warc-cdx=on 2> session.log
            ;;
        $1|$2)
            git add . && git ci -m"Archived: ${DATE}"
            git push origin "$x"
            ;;
    esac
done
Called with no positional parameters, your script will not do anything inside the case statement, as $1 and $2 are empty. Besides that, the last case branch, $1|$2, will never be reached, because one of the earlier branches will already have matched. Maybe you should move those commands out of the case statement.
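For example, a minimal sketch along those lines (assuming WGDOMAIN and DATE are set by the caller, that the branch name passed as $1 is the one that should get the extra -k flag, and spelling out git commit in place of the ci alias):
#!/bin/bash
for branch in render storage; do
    git checkout "$branch"

    # Only the branch named in $1 gets the -k (convert links) flag.
    extra=()
    [[ "$branch" == "$1" ]] && extra+=(-k)

    wget "${WGDOMAIN}" -r -l INF -p "${extra[@]}" \
        --no-check-certificate \
        --strict-comments \
        --warc-header="Operator: Web Archiver" \
        --warc-file="$WGDOMAIN" \
        --warc-dedup="${WGDOMAIN}.cdx" \
        --warc-cdx=on 2> session.log

    # The commit and push happen for every branch, outside any case.
    git add . && git commit -m "Archived: ${DATE}"
    git push origin "$branch"
done
Using a for loop instead of echo | while read also avoids running the loop body in a subshell.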
I'm trying to run a script for pulling finance history from Yahoo. Boris's answer from this thread,
wget can't download yahoo finance data any more
works for me about 2 out of 3 times, but fails if the crumb returned from the cookie has a "\" character in it.
The code that sometimes works looks like this:
#!/usr/bin/sh
symbol=$1
today=$(date +%Y%m%d)
tomorrow=$(date --date='1 days' +%Y%m%d)
first_date=$(date -d "$2" '+%s')
last_date=$(date -d "$today" '+%s')
wget --no-check-certificate --save-cookies=cookie.txt https://finance.yahoo.com/quote/$symbol/?p=$symbol -O C:/trip/stocks/stocknamelist/crumb.store
crumb=$(grep 'root.*App' crumb.store | sed 's/,/\n/g' | grep CrumbStore | sed 's/"CrumbStore":{"crumb":"\(.*\)"}/\1/')
echo $crumb
fileloc=$"https://query1.finance.yahoo.com/v7/finance/download/$symbol?period1=$first_date&period2=$last_date&interval=1d&events=history&crumb=$crumb"
echo $fileloc
wget --no-check-certificate --load-cookies=cookie.txt $fileloc -O c:/trip/stocks/temphistory/hs$symbol.csv
rm cookie.txt crumb.store
But that doesn't seem to be processed by wget the way I intend either, as it seems to be interpreted as described here:
https://askubuntu.com/questions/758080/getting-scheme-missing-error-with-wget
Any suggestions on how to pass the $crumb variable into wget so that wget doesn't error out if $crumb has a "\" character in it?
Edited to show the full script. To clarify, I've got Cygwin installed with the wget package. I call the script from the cmd prompt as follows (an example where the script above is named "stocknamedownload.sh", the stock symbol I'm downloading is "A", and the start date is 19800101):
c:\trip\stocks\StockNameList>bash stocknamedownload.sh A 19800101
This script seems to work fine - unless the crumb returned contains a "\" character in it.
The following implementation appears to work 100% of the time -- I'm unable to reproduce the claimed sporadic failures:
#!/usr/bin/env bash
set -o pipefail
symbol=$1
today=$(date +%Y%m%d)
tomorrow=$(date --date='1 days' +%Y%m%d)
first_date=$(date -d "$2" '+%s')
last_date=$(date -d "$today" '+%s')
# store complete webpage text in a variable
page_text=$(curl --fail --cookie-jar cookies \
"https://finance.yahoo.com/quote/$symbol/?p=$symbol") || exit
# extract the JSON used by JavaScript in the page
app_json=$(grep -e 'root.App.main = ' <<<"$page_text" \
| sed -e 's#^root.App.main = ##' \
-e 's#[;]$##') || exit
# use jq to extract the crumb from that JSON
crumb=$(jq -r \
'.context.dispatcher.stores.CrumbStore.crumb' \
<<<"$app_json" | tr -d '\r') || exit
# Perform our actual download
fileloc="https://query1.finance.yahoo.com/v7/finance/download/$symbol?period1=$first_date&period2=$last_date&interval=1d&events=history&crumb=$crumb"
curl --fail --cookie cookies "$fileloc" >"hs$symbol.csv"
Note that the tr -d '\r' is only necessary when using a native-Windows jq mixed with an otherwise native-Cygwin set of tools.
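That likely also explains the 1-in-3 failures described in the question: when the page's embedded JSON encodes a character of the crumb as a \uXXXX escape, a grep/sed pipeline passes the escape through literally (producing a crumb with a "\" in it), while jq decodes it. A small illustration with a made-up crumb value:
json='{"CrumbStore":{"crumb":"AbCd\u002FeFgH"}}'
# The sed pipeline leaves the escape sequence untouched:
echo "$json" | sed 's/.*"crumb":"\([^"]*\)".*/\1/'   # AbCd\u002FeFgH
# jq decodes it into the actual character:
echo "$json" | jq -r '.CrumbStore.crumb'             # AbCd/eFgH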
You are adding quotes to the value of the variable instead of quoting the expansion. You are also trying to use tools that don't know what JSON is to process JSON; use jq.
wget --no-check-certificate \
--save-cookies=cookie.txt \
"https://finance.yahoo.com/quote/$symbol/?p=$symbol" \
-O C:/trip/stocks/stocknamelist/crumb.store
# Something like this; it's hard to reverse-engineer the structure
# of crumb.store from your pipeline.
crumb=$(jq -r '.CrumbStore.crumb' crumb.store)
echo "$crumb"
fileloc="https://query1.finance.yahoo.com/v7/finance/download/$symbol?period1=$first_date&period2=$last_date&interval=1d&events=history&crumb=$crumb"
echo "$fileloc"
wget --no-check-certificate \
--load-cookies=cookie.txt "$fileloc" \
-O "c:/trip/stocks/temphistory/hs$symbol.csv"
I am trying to automate a procedure where the system will fetch the contents of a file (one URL per line), use wget to grab the files from the site (an https folder), and then remove that line from the file.
I have made several tries, but the sed part (at the end) cannot understand the string (I tried escaping characters) and remove it from the file!
cat File
https://something.net/xxx/data/Folder1/
https://something.net/xxx/data/Folder2/
https://something.net/xxx/data/Folder3/
My line of code is:
cat File | xargs -n1 -I # bash -c 'wget -r -nd -l 1 -c -A rar,zip,7z,txt,jpg,iso,sfv,md5,pdf --no-parent --restrict-file-names=nocontrol --user=test --password=pass --no-check-certificate "#" -P /mnt/USB/ && sed -e 's|#||g' File'
It works up until the sed -e 's|#||g' File part.
Thanks in advance!
Don't use cat if possible. It's bad practice, and it can be a problem with big files... You can change
cat File | xargs -n1 -I # bash -c
to
for siteUrl in $( < "File" ); do
It's more correct, and it's simpler to use sed with double quotes... My variant:
scriptDir=$( dirname -- "$0" )
for siteUrl in $( < "$scriptDir/File.txt" )
do
if [[ -z "$siteUrl" ]]; then break; fi # stop when an empty line is reached
wget -r -nd -l 1 -c -A rar,zip,7z,txt,jpg,iso,sfv,md5,pdf --no-parent --restrict-file-names=nocontrol --user=test --password=pass --no-check-certificate "$siteUrl" -P /mnt/USB/ && sed -i "s|$siteUrl||g" "$scriptDir/File.txt"
done
@beliy's answer looks good!
If you want a one-liner, you can do:
while read -r line; do \
wget -r -nd -l 1 -c -A rar,zip,7z,txt,jpg,iso,sfv,md5,pdf \
--no-parent --restrict-file-names=nocontrol --user=test \
--password=pass --no-check-certificate "$line" -P /mnt/USB/ \
&& sed -i -e '\|'"$line"'|d' "File.txt"; \
done < File.txt
EDIT:
You need to add a \ in front of the first pipe
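The backslash is what lets sed accept a non-slash delimiter in an address, which matters here because the URLs themselves contain slashes. A quick illustration (with a made-up URL):
# Using the default / delimiter clashes with the slashes in the URL:
sed -e '/https://something.net/xxx//d' File.txt     # error
# A custom address delimiter must be introduced with a backslash:
sed -e '\|https://something.net/xxx/|d' File.txt    # deletes the matching line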
I believe you just need to use double quotes after sed -e. Instead of:
'...&& sed -e 's|#||g' File'
you would need
'...&& sed -e '"'s|#||g'"' File'
I see what you're trying to do, but I don't understand the sed command including the pipes; maybe it's some fancy format that I don't understand.
Anyway, I think the sed command should look like this...
sed -e 's/#//g'
This command will remove all # from the stream.
I hope this helps!
I'm writing my first bash script:
LANG="en_US.UTF8" ; export LANG
PROXY=$(shuf -n 1 proxy.txt)
export https_proxy=$PROXY
RUID=$(php -f randuid.php)
curl --data "mydata${RUID}" --user-agent "myuseragent" https://myurl.com/url -o "ticket.txt"
This script also uses curl, but if the proxy is down it gives me this error:
failed to connect PROXY:PORT
How can I make the bash script run again, so it can get another proxy address from proxy.txt?
Thanks in advance
Run it in a loop until the curl succeeds, for example:
export LANG="en_US.UTF8"
while true; do
PROXY=$(shuf -n 1 proxy.txt)
export https_proxy=$PROXY
RUID=$(php -f randuid.php)
curl --data "mydata${RUID}" --user-agent "myuseragent" https://myurl.com/url -o "ticket.txt" && break
done
Notice the && break at the end of the curl command.
That is, if the curl succeeds, break out of the infinite loop.
If you have multiple curl commands and you need all of them to succeed,
then chain them all together with &&, and add the break after the last one:
curl url1 && \
curl url2 && \
break
Lastly, as @Inian pointed out,
you could use the --proxy flag to pass a proxy URL to curl without the extra step of setting https_proxy, for example:
curl --proxy "$(shuf -n 1 proxy.txt)" --data "mydata${RUID}" --user-agent "myuseragent" https://myurl.com/url -o "ticket.txt"
Note that due to the randomness, a randomly selected proxy may come up more than once before you find one that works.
To avoid that, you could iterate over the shuffled proxies instead of using an infinite loop:
export LANG="en_US.UTF8"
shuf proxy.txt | while read -r proxy; do
ruid=$(php -f randuid.php)
curl --proxy "$proxy" --data "mydata${ruid}" --user-agent "myuseragent" https://myurl.com/url -o "ticket.txt" && break
done
I also lowercased your user-defined variables,
as all-caps names are conventionally reserved for environment variables.
I know I accepted @janos's answer, but since I can't edit it, I'm going to add this:
response=$(curl --proxy "$proxy" --silent --write-out "\n%{http_code}\n" https://myurl.com/url)
status_code=$(echo "$response" | sed -n '$p')
html=$(echo "$response" | sed '$d')
case "$status_code" in
200) echo 'Working!'
;;
*)
echo 'Not working, trying again!';
exec "$0" "$#"
esac
This will run my script again if it gets a 503 status code (or any non-200), which is what I wanted :)
And with @janos's code it will run again if the proxy is not working.
Thank you everyone, I achieved what I wanted.
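Putting the two together, a combined sketch (using the same proxy.txt and randuid.php helpers assumed above) that walks the shuffled proxy list and stops only on an HTTP 200 might look like:
#!/usr/bin/env bash
export LANG="en_US.UTF8"
shuf proxy.txt | while read -r proxy; do
    ruid=$(php -f randuid.php)
    # --write-out prints the status code to stdout; the body goes to ticket.txt
    status_code=$(curl --proxy "$proxy" --silent --output ticket.txt \
        --write-out '%{http_code}' \
        --data "mydata${ruid}" --user-agent "myuseragent" \
        https://myurl.com/url)
    [[ "$status_code" == 200 ]] && break   # first working proxy wins
done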
I'm using a makefile to run docker, where I first collect some modules to download so that they can be cached, and then run docker. I wanted to parameterize this, but I don't think I'm doing it in the best way. Pointers to make this more concise would be really appreciated.
franz:
$(eval REPO_VERSION := $(shell grep franz requirements/github.txt | cut -d'@' -f3 | cut -d'#' -f1))
if [ -d docker/franz ]; then \
echo "Updating franz to [$(REPO_VERSION)]"; \
cd docker/franz && git fetch && git checkout $(REPO_VERSION); \
else \
echo "Cloning franz to [$(REPO_VERSION)]"; \
git clone --branch $(REPO_VERSION) git@github.com:dubizzle/franz.git docker/franz 2> /dev/null; \
fi
lilith:
$(eval REPO_VERSION := $(shell grep lilith requirements/github.txt | cut -d'@' -f3 | cut -d'#' -f1))
if [ -d docker/lilith ]; then \
echo "Updating lilith to [$(REPO_VERSION)]"; \
cd docker/lilith && git fetch && git checkout $(REPO_VERSION); \
else \
echo "Cloning lilith to [$(REPO_VERSION)]"; \
git clone --branch $(REPO_VERSION) git@github.com:dubizzle/lilith.git docker/lilith 2> /dev/null; \
fi
dependencies: franz lilith
git archive --format tar.gz --output docker/archive.tar.gz $(GIT_REF)
Basically, this first updates requirements that are on github, downloads them, checks what version is needed, and then updates to that version. If this could be made a function, a parameterised version would be:
$(eval REPO_VERSION := $(shell grep <repo-name> requirements/github.txt | cut -d'@' -f3 | cut -d'#' -f1))
if [ -d docker/<repo-name> ]; then \
echo "Updating <repo-name> to [$(REPO_VERSION)]"; \
cd docker/<repo-name> && git fetch && git checkout $(REPO_VERSION); \
else \
echo "Cloning <repo-name> to [$(REPO_VERSION)]"; \
git clone --branch $(REPO_VERSION) git@github.com:dubizzle/<repo-name>.git docker/<repo-name> 2> /dev/null; \
fi
I've seen some examples using define, and call, and eval, but, I can't figure out the right combination to make it work.
Any help with this would be much appreciated.
To pull the information from the tutorial mentioned above into the answer itself:
This is GNU make, mind.
Defining a template:
define RULES_template
$(1)/obj/%.o: $(1)/src/%.c
$$(CC) $$(CFLAGS) $$(CFLAGS_global) $$(CFLAGS_$(1)) -c $$< -o $$@
endef
This uses one parameter ($(1)), which gets substituted as appropriate. The number of parameters is not declared; you just add $(1), $(2), etc. to the template. Note the doubled $$ everywhere else.
$(foreach module,$(MODULES),$(eval $(call RULES_template,$(module))))
This calls the template mentioned above for each token in $(MODULES).
call RULES_template,foo instantiates the template with one parameter, foo. eval then parses the output as Makefile syntax (as opposed to, for example, putting it into some variable).
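Applied to the question, an untested sketch (assuming the same requirements/github.txt layout and repo URLs as above; recipe lines must be indented with tabs) might look like:
define CLONE_template
$(1):
	$$(eval REPO_VERSION := $$(shell grep $(1) requirements/github.txt | cut -d'@' -f3 | cut -d'#' -f1))
	if [ -d docker/$(1) ]; then \
		echo "Updating $(1) to [$$(REPO_VERSION)]"; \
		cd docker/$(1) && git fetch && git checkout $$(REPO_VERSION); \
	else \
		echo "Cloning $(1) to [$$(REPO_VERSION)]"; \
		git clone --branch $$(REPO_VERSION) git@github.com:dubizzle/$(1).git docker/$(1); \
	fi
endef

MODULES := franz lilith
$(foreach module,$(MODULES),$(eval $(call CLONE_template,$(module))))

dependencies: $(MODULES)
	git archive --format tar.gz --output docker/archive.tar.gz $(GIT_REF)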
This was ages ago, and I never used that code in a production environment, so I am a bit fuzzy on the details. I hope it helps, anyway.
CMake not only is cross-platform, but also has much better primitives to handle sophisticated build mechanics. I can recommend it.
I would like to implement this as a Makefile task:
# step 1:
curl -u username:password -X POST \
-d '{"name": "new_file.jpg","size": 114034,"description": "Latest release","content_type": "text/plain"}' \
https://api.github.com/repos/:user/:repo/downloads
# step 2:
curl -u username:password \
-F "key=downloads/octocat/Hello-World/new_file.jpg" \
-F "acl=public-read" \
-F "success_action_status=201" \
-F "Filename=new_file.jpg" \
-F "AWSAccessKeyId=1ABCDEF..." \
-F "Policy=ewogIC..." \
-F "Signature=mwnF..." \
-F "Content-Type=image/jpeg" \
-F "file=#new_file.jpg" \
https://github.s3.amazonaws.com/
In the first part however, I need to get the file size (and content type if it's easy, not required though), so some variable:
{"name": "new_file.jpg","size": $(FILE_SIZE),"description": "Latest release","content_type": "text/plain"}
I tried this but it doesn't work (Mac 10.6.7):
$(shell du path/to/file.js | awk '{print $1}')
Any ideas how to accomplish this?
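As an aside, a likely reason the du attempt fails: inside a Makefile, make expands $1 itself before the shell ever sees the command, so awk receives an empty field reference; the dollar sign has to be escaped as $$1:
FILE_SIZE := $(shell du path/to/file.js | awk '{print $$1}')
(du also reports blocks rather than bytes by default, which the answers below address.)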
If you have GNU coreutils:
FILE_SIZE=$(stat -L -c %s $filename)
The -L tells it to follow symlinks; without it, if $filename is a symlink it will give you the size of the symlink rather than the size of the target file.
The MacOS stat equivalent appears to be:
FILE_SIZE=$(stat -L -f %z "$filename")
but I haven't been able to try it. (I've written this as a shell command, not a make command.) You may also find the -s option useful:
Display information in "shell output", suitable for initializing variables.
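For example (a sketch I haven't been able to test on MacOS either), -s should emit name=value pairs that can be eval'ed directly:
eval "$(stat -s -L "$filename")"   # sets st_size, st_mode, st_mtime, ...
echo "$st_size"                    # file size in bytes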
For reference, an alternative method is using du with -b (output in bytes) and -s (summary only), then cut to keep only the first field of the output:
FILE_SIZE=$(du -sb $filename | cut -f1)
This should return the same result in bytes as @Keith Thompson's answer, but will also work for full directory sizes.
Extra: I usually use a macro for this.
define sizeof
$$(du -sb \
$(1) \
| cut -f1 )
endef
Which can then be called like:
$(call sizeof,$filename_or_dirname)
I think this is a case where parsing the output of ls is legitimate:
% FILE_SIZE=`ls -l $filename | awk '{print $5}'`
(no it's not: use stat, as noted by Keith Thompson)
For the type, you can use
% FILE_TYPE=`file --mime-type --brief $filename`