Strip text from string in shell script - shell

I have the variables below in a large and important SH file and I need to remove some data from a variable and keep only part of the text.
I get "repoTest" with a link to an internal git repo and I need the variable "nameAppTest" to only receive the constant data after the last "/".
Example:
I get: repoTest="ssh://git#code.br.repo.local/code/ecsb/name-repo.git"
I try to do a split: nameAppTest=$(echo "$repoTest"|cut -d'/' -f5|sed -e 's/.git//g')
Response I get: echo "$nameAppTest" (ecsb).
What I expect to receive: name-repo
I tried like this and failed: nameAppTest=$(echo "$repoTest"|cut -d'/' -f5|sed -e 's/.git//g')

Here's a nifty trick:
nameAppTest=$(basename "$repoTest" .git)
Uses basename to get just the last component of the URL, and strip the extension all in one step.
You can also use sh parameter expansion to do it in two steps without any external programs:
# Remove everything up to and including the last /
temp="${repoTest##*/}"
# Remove the trailing .git
nameAppTest="${temp%.git}"

Related

Extract image URI from markdown files using sed/grep containing duplicates in a single line

I have some markdown files to process which contain links to images that I wish to download. e.g. a markdown file:
[![](https://imgs.xkcd.com/comics/git.png)](https://imgs.xkcd.com/comics/git.png)
a lot of text
some more text...
[![](https://1.bp.blogspot.com/-Ze2SiBflkZ4/XbtF1TjELcI/AAAAAAAALL4/IDC6W-b5moU0eGu2eN60aZ4pxfXW1ybmQCLcBGAsYHQ/s320/take_a_break_git.gif)](https://1.bp.blogspot.com/-Ze2SiBflkZ4/XbtF1TjELcI/AAAAAAAALL4/IDC6W-b5moU0eGu2eN60aZ4pxfXW1ybmQCLcBGAsYHQ/s1600/take_a_break_git.gif)
some more text
another URL but not image
[https://github.com]
so on
I am trying to parse through this file and extract the list of image URLs, which I can later pass on wget command to download.
So far I have used grep and sed and have got results:
$ sed -nE "/https?:\/\/[^ ]+.(jpg|png|gif)/p" $path
[![](https://imgs.xkcd.com/comics/git.png)](https://imgs.xkcd.com/comics/git.png)
[![](https://1.bp.blogspot.com/-Ze2SiBflkZ4/XbtF1TjELcI/AAAAAAAALL4/IDC6W-b5moU0eGu2eN60aZ4pxfXW1ybmQCLcBGAsYHQ/s320/take_a_break_git.gif)](https://1.bp.blogspot.com/-Ze2SiBflkZ4/XbtF1TjELcI/AAAAAAAALL4/IDC6W-b5moU0eGu2eN60aZ4pxfXW1ybmQCLcBGAsYHQ/s1600/take_a_break_git.gif)
$ grep -Eo "https?://[^ ]+.(jpg|png|gif)" $path
https://imgs.xkcd.com/comics/git.png)](https://imgs.xkcd.com/comics/git.png
https://1.bp.blogspot.com/-Ze2SiBflkZ4/XbtF1TjELcI/AAAAAAAALL4/IDC6W-b5moU0eGu2eN60aZ4pxfXW1ybmQCLcBGAsYHQ/s320/take_a_break_git.gif)](https://1.bp.blogspot.com/-Ze2SiBflkZ4/XbtF1TjELcI/AAAAAAAALL4/IDC6W-b5moU0eGu2eN60aZ4pxfXW1ybmQCLcBGAsYHQ/s1600/take_a_break_git.gif
The regex is essentially working fine, but the issue is that as the same URL is present twice in the same line, the text selected is the first occurrence of https and last occurrence of jpg|png|gif. But I want the first occurrence of https and first occurrence of jpg|png|gif
How can fix this?
P.S. I have also tried lynx -dump -image_links -listonly $path but this prints the entire file.
I am also open to other options that solve the purpose, and as long as I can hook the code up in my current shell script.
You may add square brackets into the negated bracket expression:
grep -Eo "https?://[^][ ]+\.(jpg|png|gif)"
See the online demo. Details:
https?:// - http:// or https://
[^][ ]+ - one or more chars other than ], [ and space
\. - a dot
(jpg|png|gif) - either of the three alternative substrings.

access filename and file contents in git smudge filter

I am writing git smudge filter.
.gitconfig
[filter "smudgey"]
smudge = smudge_filter
smudge_filter
#!/usr/bin/env bash
# $Id Date: Wed, Mar 25, 2020 1:41:34 PM, User: Joey Gough, Branch: master$
IFS=
log_string="\$Log\nhello world"
changed_data=$(sed s/\$Log[^$]*/"$log_string"/g $1)
echo $changed_data
filtered file
$Log$
Result
When I check out this file, it converts the Log tag and inserts "hello world"
$Log
hello world$
Situation
When I rewrite the .gitconfig to this:
[filter "smudgey"]
smudge = smudge_filter --smudge %f
It prints out two newlines and that's all.
I have tried so many different approaches and so far it seems as though I cannot access the filename and the file contents at the same time in a Bash script.
Question
How do I access the file contents and the filename at the same time in the git filter? Or can I?
How do I access the file contents ...
There are no file contents. Or maybe a better way to phrase this is: there are contents. They are not (yet) in any file.
and the filename at the same time ...
You have the method for getting the file's name, via the %f directive.
The important thing to keep in mind is that the file does not yet exist.1 The contents will go into that file after you filter them!
If the sed command does what you want, keep the sed command as it is. If you want to put the file name in somewhere, do that separately.
Here's a smudge filter that replaces fill in the blank with blanks, and inserts the file's name at the top:
#! /bin/sh
# invoked as "smudge %f" from .gitconfig settings
printf "%s\n" "$#"
sed 's/fill in the blank/_________________/'
Here's a different smudge filter that replaces __myname__ with the file's name:
#! /bin/sh
quoted=$(printf "%s" "$#" | sed -e 's,/,\\/,g' -e 's,&,\\&,g')
sed "s/__myname__/$quoted/"
(The quoting trick is to make sure that $quoted does not expand to characters that affect the sed substitute command: forward slash is the delimiter and ampersand would be replaced by the left hand side.)
1Well, the file may or may not exist. It may be empty. In your case it apparently does exist and is mostly or entirely empty. There are various race conditions here as the filtering is part of a pipeline with different processes doing different things.
Note that if you switch to a long-lived filter, the details change, but the overall strategy is the same: the text you will filter is not yet in the target file(s).

Filter response from "git diff" command to get only the difference in Shell - Dynamic Solution

I am trying automate a redundant deployment process in my project. In order to achieve that I am trying to get the difference between two branches using "git diff" -- Someway and I am able to achieve that using the following command.
git diff <BRANCH_NAME1> -- common_folder_name/ <BRANCH_NAME2> -- common_folder_name/ > toStoreResponse.txt`
Now the response that I get, looks something like below:
diff --git a/cmc-database/common/readme.txt b/cmc-database/common/readme.txt
index 7820f3d..5a0e484 100644
--- a/cmc-database/common/readme.txt
+++ b/cmc-database/common/readme.txt
## -1 +1,5 ##
-This folder contains common database scripts.
\ No newline at end of file
+This folder contains common database scripts.
+TEST STTESA
\ No newline at end of file
So here in the above response only line/text that is a new line or the difference between the two branches is TEST STTESA and I want to store only that much of text in some different text file using shell / git way.
i.e a file named readme.txt which will only contain TEST STTESA content.
Work around Solution:
I have found a workaround to filter the response - but however it is not 100% what I am looking for. Command looks like below:
git diff <Branch_Name1> -- common-directory/ <Branch_Name2> -- common-directory/ | grep -v common-directory | grep -v index | grep -v # | grep -v \\
The above command returns below response:
-This folder contains common database scripts.
+This folder contains common database scripts.
+TEST STTESA
But I want to be able to store only the difference which is TEST STTESA
As you can easily realize, your solution won't work every time. The grep -v parts make it unportable.
Here is a "step0" solution : You want to match lines that start with a "+" or a "-" and then neither a "+" nor a "-". Use grep for that !
git diff ... | grep "^+[^+]\|^-[^-]"
Some explanation :
First, the \| part in the middle is an "or" statement.
Then, each side starts with a ^ which refers to the beginning of the line. And finally, after the first character, we want to reject some characters, using the [^...] syntax.
The line above translates to English as "Run the diff, and find all the lines that either start with a +, followed by something that is not a +, OR start with a -, followed by something that is not a -.
This will not work properly if you remove a line that started with a -. Nor if you add a line that starts with a +.
For such scenarii, I would tinkle with git diff --color and grep some [32m for the fun.
--diff-filter=[ACDMRTUXB*]
Select only files that are
A Added
C Copied
D Deleted
M Modified
R Renamed
T have their type (mode) changed
U Unmerged
X Unknown
B have had their pairing Broken
and * All-or-none

how to edit url string with sed

My Linux repository file contain a link that until now was using http with a port number to point to it repository.
baseurl=http://host.domain.com:123/folder1/folder2
I now need a way to replace that URL to use https with no port or a different port .
I need also the possibility to change the server name for example from host.domain.com to host2.domain.com
So my idea was to use sed to search for the start of the http until the first / that come after the 2 // thus catching whatever in between and will give me the ability to change both server name port or http\s usage.
Im now using this code (im using echo just for the example):
the example shows how in 2 cases where one time i have a link with http and port 123 converted to https and the second time the other way around
and both code i was using the same sed for generic reasons.
WANTED_URL="https://host.domain.com"
echo 'http://host.domain.com:123/folder1/folder2' | sed -i "s|http.*://[^/]*|$WANTED_URL|"
OR
WANTED_URL="http://host.domain.com:123"
echo 'https://host.domain.com/folder1/folder2' | sed -i "s|http.*://[^/]*|$WANTED_URL|"
is that the correct way doing so?
sed regexes are greedy by default. You can tell sed to consume only non-slashes, like this:
echo 'http://host.domain.com:123/folder1/folder2' | sed -e 's|http://[^/]*|https://host.domain.com|'
result:
https://host.domain.com/folder1/folder2
(BTW you don't have to escape slashes because you are using an alternate separating character)
the key is using [^/]* which will match anything but slashes so it stops matching at the first slash (non-greedy).
You used /.*/ and .* can contain slashes, not that you wanted (greedy by default).
Anyway my approach is different because expression does not include the trailing slash so it is not removed from final output.
Assuming it doesn't really matter if you have 1 sed script or 2 and there isn't a good reason to hard-code the URLs:
$ echo 'http://host.domain.com:123/folder1/folder2' |
sed 's|\(:[^:]*\)[^/]*|s\1|'
https://host.domain.com/folder1/folder2
$ port='123'; echo 'https://host.domain.com/folder1/folder2' |
sed 's|s\(://[^/]*\)|\1:'"$port"'|'
http://host.domain.com:123/folder1/folder2
If that isn't what you need then edit your question to clarify your requirements and in particular explain why:
You want to use hard-coded URLs, and
You need 1 script to do both transformations.
and provide concise, testable sample input and expected output that demonstrates those needs (i.e. cases where the above doesn't work).
wrt what you had:
WANTED_URL="https://host.domain.com"
echo 'http://host.domain.com:123/folder1/folder2' | sed -i "s|http.*://[^/]*|$WANTED_URL|"
The main issues are:
Don't use all-upper-case for non-exported shell variable names to avoid clashes with exported variables and to avoid obfuscating your code (this convention has been around for 40 years so people expect all upper case variables to be exported).
Never enclose any script in double quotes as it exposes the whole script to the shell for interpretation before the command you want to execute even sees it. Instead just open up the single quotes around the smallest script segment possible when necessary, i.e. to expand $y in a script use cmd 'x'"$y"'z' not cmd "x${y}z" because the latter will fail cryptically and dangerously given various input, script text, environment settings and/or the contents of the directory you run it from.
The -i option for sed is to edit a file in-place so you can't use it on an incoming pipe because you can't edit a pipe in-place.
When you let a shell variable expand to become part of a script, you have to take care about the possible characters it contains and how they'll be interpreted by the command given the context the variable expands into. If you let a whole URL expand into the replacement section of a sed script then you have to be careful to first escape any potential backreference characters or script delimiters. See Is it possible to escape regex metacharacters reliably with sed. If you just let the port number expand then you don't have to deal with any of that.

Wget: read URL from file, add sequence of number to the URL

I am reading a file (with URL's) line by line:
#!/bin/bash
while read line
do
url=$line
wget $url
wget $url_{001..005}.jpg
done < $1
For first, I want to download primary url as you see wget $url. After that I want to add to the url sequence of numbers (_001.jpg, _002.jpg, _003.jpg, _004.jpg, _005.jpg):
wget $url_{001..005}.jpg
...but for some reason it's not working.
Sorry, missed out one thing: the url's are like http://xy.com/052914.jpg. Is there any easy way to add _001 before the extension? http://xy.com/052914_001.jpg. Or I have to remove ".jpg" from the file containing URL's then simply add later to the variable?
Another way escaping the underscore char:
wget $url\_{001..005}.jpg
Try encapsulating your variable name:
wget ${url}_{001..005}.jpg
Bash is trying to expand the variable $url_ in your command.
As for your jpg within the URL followup, see substring expansion in the bash manual.
wget ${url:0: -4}_{001..005}.jpg
The :0: -4 means, expand to the variable from position zero (the first character), minus the last 4 characters.
Or from this answer:
wget ${url%.jpg}_{001..005}.jpg
%.jpg removes .jpg specifically and will work on older versions of bash.

Resources