I have a long list of strings (actually file names) in $var that look like this:
p1035sEthinylestradiol913
p1035sTAbs872
p946sCarbaryl1182
Now I wish to replace the string, which occurs between the first s and the first integer [1-9], with R. Hence the output should look like:
p1035sR913
p1035sR872
p946sR1182
I was trying something like this:
echo ${var/s*[1-9]/R}
But this of course also removes the first integer after the s match, and that is not what I want. Can someone help me out here? Thanks a lot in advance!
To keep the matched digit you could switch from parameter expansions like ${var/s*[1-9]/R} to regex matching with [[ string =~ pattern ]]. The matched digit can then be retrieved from BASH_REMATCH. However, you would still have to do this for every entry in your list.
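A minimal sketch of that [[ ... =~ ... ]] approach, looping over the sample names from the question. The array names `names` and `renamed` and the variable `re` are only illustrative, and the pattern assumes every name contains an s followed somewhere by a digit.

```shell
#!/usr/bin/env bash
# Sample names from the question; in practice this would be your own list.
names=(p1035sEthinylestradiol913 p1035sTAbs872 p946sCarbaryl1182)

# Group 1: everything up to and including the first "s".
# Group 2: the first digit after it, plus the rest of the string.
re='^([^s]*s)[^0-9]*([0-9].*)$'

renamed=()
for var in "${names[@]}"; do
  if [[ $var =~ $re ]]; then
    renamed+=("${BASH_REMATCH[1]}R${BASH_REMATCH[2]}")
  else
    renamed+=("$var")   # leave names that do not match untouched
  fi
done

printf '%s\n' "${renamed[@]}"
```

Storing the pattern in a variable avoids quoting surprises inside [[ ... ]].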
With sed every line is processed automatically, and keeping the digit is easy:
sed -E 's/s[^0-9]*([0-9])/sR\1/' file
or
someCommand | sed -E 's/s[^0-9]*([0-9])/sR\1/'
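A quick check against the three sample names, as a sketch. It uses [^0-9]* between the s and the captured digit so that the match stops at the first digit; a greedy .* would run on to the last digit of the line and leave only that one behind. The variable name `out` is just for illustration.

```shell
#!/usr/bin/env bash
# Feed the sample names through sed, one per line.
out=$(printf '%s\n' p1035sEthinylestradiol913 p1035sTAbs872 p946sCarbaryl1182 |
  sed -E 's/s[^0-9]*([0-9])/sR\1/')

printf '%s\n' "$out"
# p1035sR913
# p1035sR872
# p946sR1182
```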
Related
I need to manipulate a string (a URL) whose length I don't know.
The string is something like
https://x.xx.xxx.xxx/dontcare1/dontcare2/dontcareN/keyword/restofstring
I basically need a regular expression which returns this:
https://x.xx.xxx.xxx/keyword/restofstring
where the x is the current IP, which can vary every time, and I don't know the number of dontcares.
I actually have no idea how to do it. I've been on the problem for 2 hours but didn't find a solution.
thanks!
You can use sed as follows:
sed -E 's=(https://[^/]*).*(/keyword/.*)=\1\2='
s stands for substitute and has the form s=search pattern=replacement pattern=.
The search pattern is a regex in which we grouped (...) the parts you want to extract.
The replacement pattern accesses these groups with \1 and \2.
You can feed a file or stdin to sed and it will process the input line by line.
If you have a string variable and use bash, zsh, or something similar you also can feed that variable directly into stdin using <<<.
Example usage for bash:
input='https://x.xx.xxx.xxx/dontcare1/dontcare2/dontcareN/keyword/restofstring'
output="$(sed -E 's=(https://[^/]*).*(/keyword/.*)=\1\2=' <<< "$input")"
echo "$output" # prints https://x.xx.xxx.xxx/keyword/restofstring
echo "https://x.xx.xxx.xxx/dontcare1/dontcare2/dontcareN/keyword/restofstring" | sed "s/dontcare[0-9]\+\///g"
sed is used to manipulate text. dontcare[0-9]\+\/ is an escaped form of the regular expression dontcare[0-9]+/, which matches the word "dontcare" followed by one or more digits, followed by the / character. (\+ is a GNU sed extension.)
sed's substitute command works like this: s/find/replace/g, where g is a flag that replaces every match on a line instead of only the first one. Here the replacement is empty, so every match is deleted.
Note that this assumes there are no dontcareNs in the rest of the string. If that's the case, Socowi's answer works better.
You could also use read with a / value for $IFS to parse out the trash.
$: IFS=/ read proto trash url trash trash trash keyword rest <<< "https://x.xx.xxx.xxx/dontcare1/dontcare2/dontcareN/keyword/restofstring"
$: echo "$proto//$url/$keyword/$rest"
https://x.xx.xxx.xxx/keyword/restofstring
This is more general when the dontcare... values aren't known and predictable strings, although as written it assumes exactly three of them.
This one is pure bash, though I like Socowi's answer better.
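A pure-bash variation that does not depend on how many path segments sit between the host and the keyword, sketched with parameter expansion only. It assumes the literal segment /keyword/ appears in the URL; the variable names `proto`, `rest`, `host` and `after` are illustrative.

```shell
#!/usr/bin/env bash
input='https://x.xx.xxx.xxx/dontcare1/dontcare2/dontcareN/keyword/restofstring'

proto=${input%%://*}       # scheme: everything before "://"
rest=${input#*://}         # everything after "://"
host=${rest%%/*}           # host: up to the first "/"
after=${input#*/keyword/}  # everything after the first "/keyword/"

output="$proto://$host/keyword/$after"
echo "$output"   # prints https://x.xx.xxx.xxx/keyword/restofstring
```

If /keyword/ is missing, `${input#*/keyword/}` leaves the string unchanged, so a real script would want to check for that case first.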
Here's a sed variation which picks out the host part and the last two components from the path.
url='http://example.com:1234/ick/poo/bar/quux/fnord'
newurl=$(echo "$url" | sed 's%\(https*://[^/?]*[^?/]\)[^ <>'"'"'"]*/\([^/ <>'"'"'"]*/[^/ <>'"'"'"]*\)%\1/\2%')
The general form is sed 's%pattern%replacement%'. The pattern matches through the end of the host name part (captured in the first set of backslashed parentheses), then skips through the slash before the penultimate path component, then captures the last two components including the slash between them. The replacement recalls the two captured groups, joined by a slash, without the skipped part between them. For the URL above the result is http://example.com:1234/quux/fnord.
I'm using a weather API that outputs all data in a single line. How do I use grep to get the values for "summary" and "apparentTemperature"? My command of regular expressions is basically nonexistent, but I'm ready to learn.
{"latitude":59.433335,"longitude":24.750486,"timezone":"Europe/Tallinn","offset":2,"currently":{"time":1485880052,"summary":"Clear","icon":"clear-night","precipIntensity":0,"precipProbability":0,"temperature":0.76,"apparentTemperature":-3.34,"dewPoint":-0.13,"humidity":0.94,"windSpeed":3.99,"windBearing":262,"visibility":9.99,"cloudCover":0.11,"pressure":1017.72,"ozone":282.98}}
Thank you!
How do I use grep to get the values for "summary" and "apparentTemperature"?
You use grep's -o flag, which makes it output only the matched part.
Since you don't know much about regex, I suggest you instead learn to use a JSON parser, which would be more appropriate for this task.
For example, with jq the following command would extract the current summary:
<whatever is your JSON source> | jq '.currently.summary'
Assume your single-line data is contained in a variable called DATA_LINE.
If you are certain the field is only present once in the whole line, you could do something like this in Bash:
if
[[ "$DATA_LINE" =~ \"summary\":\"([^\"]*)\" ]]
then
summary="${BASH_REMATCH[1]}"
echo "Summary field is : $summary"
else
echo "Summary field not found"
fi
You would have to do that once for each field, unless you build a more complex matching expression that assumes fields are in a specific order.
As a note, the matching expression \"summary\":\"([^\"]*)\" finds the first occurrence in the data of a substring consisting of :
"summary":" (double quotes included), followed by
([^\"]*) a sub-expression formed of a sequence of zero or more characters other than a double quote : this is in parentheses to make it available later as an element in the BASH_REMATCH array, because this is the value you want to extract
and finally a final quote ; this is not absolutely necessary, but protects from reading from a truncated data line.
For apparentTemperature the code will be a bit different because the field does not have the same format.
if
[[ "$DATA_LINE" =~ \"apparentTemperature\":([^,]*), ]]
then
apparentTemperature="${BASH_REMATCH[1]}"
echo "Apparent temperature field is : $apparentTemperature"
else
echo "Apparent temperature field not found"
fi
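The two cases above (a quoted string value and a bare numeric value) can be folded into one small helper, sketched below. The function name `json_field` is hypothetical, DATA_LINE is a shortened copy of the sample line from the question, and the same caveat applies: this only makes sense if the key occurs once in the line. It is not a JSON parser.

```shell
#!/usr/bin/env bash
# Shortened copy of the sample data from the question.
DATA_LINE='{"currently":{"time":1485880052,"summary":"Clear","temperature":0.76,"apparentTemperature":-3.34,"ozone":282.98}}'

# json_field LINE KEY: print the value stored under KEY (hypothetical helper).
json_field() {
  local line=$1 key=$2
  local quoted="\"$key\":\"([^\"]*)\""   # "key":"value"
  local bare="\"$key\":([^,}\"]+)"       # "key":value, up to the next , or }
  if [[ $line =~ $quoted ]] || [[ $line =~ $bare ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}

summary=$(json_field "$DATA_LINE" summary)
apparentTemperature=$(json_field "$DATA_LINE" apparentTemperature)
echo "Summary: $summary, apparent temperature: $apparentTemperature"
```

The quoted form is tried first so that string values come back without their surrounding quotes.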
This is fairly easily understood if your skills are limited - like mine! Assuming your string is in a variable called $LINE:
summary=$(sed -e 's/.*summary":"//' -e 's/".*//' <<< "$LINE")
Then check:
echo $summary
Clear
That executes (-e) 2 sed commands. The first one substitutes everything up to summary":" with nothing and the second substitutes the first remaining double quote and everything that follows with nothing.
Extract apparent temperature:
appTemp=$(sed -e 's/.*apparentTemperature"://' -e 's/,.*//' <<< "$LINE")
Then check:
echo $appTemp
-3.34
As Aaron mentioned a json parser like jq is the right tool for this, but since the question was about grep, let's see one way to do it.
Assuming your API return value is in $json:
json='{"latitude":59.433335,"longitude":24.750486,"timezone":"Europe/Tallinn","offset":2,"currently":{"time":1485880052,"summary":"Clear","icon":"clear-night","precipIntensity":0,"precipProbability":0,"temperature":0.76,"apparentTemperature":-3.34,"dewPoint":-0.13,"humidity":0.94,"windSpeed":3.99,"windBearing":262,"visibility":9.99,"cloudCover":0.11,"pressure":1017.72,"ozone":282.98}}'
The patterns in parentheses in the commands below are lookbehind and lookahead assertions for context matching. They require grep's -P (Perl-compatible regex) option, and the text they match is not included in the output.
summary=$(<<< "$json" grep -oP '(?<="summary":").*?(?=",)')
apparentTemperature=$(<<< "$json" grep -oP '(?<="apparentTemperature":).*?(?=,)')
Please excuse if the question is too naive. I am new to shell scripting and am not able to find any good resource to understand the specifics. I am trying to make sense of a legacy script. Please can someone tell me what the following command does:
sed "s#s3AtlasExtractName#$i#g" load_xyz.sql >> load_abc.sql;
This command will replace all occurrences of s3AtlasExtractName with whatever $i is.
s - Substitute
# - Delimiter
s3AtlasExtractName - Word that needs substituting
# - Delimiter
$i - i variable that will be used to replace s3AtlasExtractName
# - Delimiter
g - Global: replace all instances of s3AtlasExtractName on a line, not just the first occurrence
So this will parse through load_xyz.sql and change all occurrences of s3AtlasExtractName to the value of $i and append the whole of the contents of load_xyz.sql to a file called load_abc.sql with the sed substitutions.
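To see the effect in isolation, here is a small sketch that recreates the situation in a temporary directory. The template line and the values of `i` are made up for illustration; only the sed command itself comes from the question. The values contain a / on purpose, which shows why a delimiter other than / is convenient here.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)

# Hypothetical one-line template standing in for load_xyz.sql.
printf '%s\n' "COPY staging FROM 's3AtlasExtractName';" > "$workdir/load_xyz.sql"

# Each pass appends one substituted copy of the template to load_abc.sql.
for i in exports/part1 exports/part2; do
  sed "s#s3AtlasExtractName#$i#g" "$workdir/load_xyz.sql" >> "$workdir/load_abc.sql"
done

result=$(cat "$workdir/load_abc.sql")
printf '%s\n' "$result"
rm -rf "$workdir"
```

Because of >>, load_abc.sql grows by one copy of the template per iteration while load_xyz.sql itself is never modified.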
sed is a command line stream editor. You can find information about it here:
http://www.computerhope.com/unix/used.htm
An easy example is shown below where sed is used to replace the word "test" with the word "example" in myfile.txt but output is sent to newfile.txt
sed 's/test/example/g' myfile.txt > newfile.txt
It seems that your script is performing a similar function by replacing the content of the load_xyz.sql file and storing it in a new file, load_abc.sql. Without more code I am just guessing, but it seems that the parameter $i could be used as a counter to insert similar but new values into the load_abc.sql file.
In short, this reads load_xyz.sql and replaces every occurrence of "s3AtlasExtractName" by whatever has been stored in the shell variable "i".
The long version is that sed accepts many subcommands with different formats. Any "simple" sed command will look like sed 'subcommand' file. The first letter of the subcommand tells you which operation sed is going to perform on your files.
The "s" operation stands for "substitution" and is the most commonly used. It is followed by: separator, regexp to look for, separator, value to substitute, separator, flags. In your case the separator is '#', which is less common than '/' but perfectly valid, so the command substitutes the value of $i for every instance of 's3AtlasExtractName'. The 'g' flag tells sed to replace every occurrence of the pattern (the default is to replace only the first occurrence on each input line).
Finally, the use of "$i" inside a double-quote-delimited string tells the shell to actually expand the shell variable 'i' so you'll want to look for a shell statement setting that (possibly a 'for' statement).
Hope this helps.
edit: I focused on the 'sed' part and kinda missed the redirection part. The '>>' token tells the shell to take the output of the sed command (i.e. the contents of load_xyz.sql with all occurrences of s3AtlasExtractName replaced by the contents of $i) and append it to the file 'load_abc.sql'.
Came across this piece of code:
for entry in $(echo $tmp | tr ';' '\n')
do
echo $entry
rproj="${entry%%,*}"
rhash="${entry##*,}"
remoteproj[$rproj]=$rhash
done
So I do understand that initially ';' is converted to new line so that all entries in the file are on a separate line. However, I am seeing this for the first time:
rproj="${entry%%,*}"
rhash="${entry##*,}"
I do understand that this is taking everything before ',' and after comma ',' . But, is this more efficient than split? Also, if someone please explain the syntax because I am unable to relate this to regular expression or bash syntax.
These are string manipulation operators (parameter expansions).
${string##substring}
Deletes the longest match of $substring from the front of $string.
With the pattern *, (as in ${entry##*,}) this removes everything up to and including the last comma.
${string%%substring}
Deletes the longest match of $substring from the back of $string.
With the pattern ,* (as in ${entry%%,*}) this removes everything from the first comma to the end.
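A quick way to see what each operator keeps, including the single-character variants % and #, which remove the shortest match instead of the longest. The sample value a,b,c is only for illustration.

```shell
#!/usr/bin/env bash
entry='a,b,c'

echo "${entry%%,*}"   # a     longest ",*" removed from the end
echo "${entry%,*}"    # a,b   shortest ",*" removed from the end
echo "${entry#*,}"    # b,c   shortest "*," removed from the front
echo "${entry##*,}"   # c     longest "*," removed from the front
```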
Btw, I would use the internal field separator instead of the tr command:
IFS=';'
for entry in $tmp ; do
echo $entry
rproj="${entry%%,*}"
rhash="${entry##*,}"
remoteproj[$rproj]=$rhash
done
unset IFS
Like this.
Use the read command both to split the original line and to split each entry.
IFS=';' read -r -a entries <<< "$tmp"
for entry in "${entries[@]}"; do
IFS=, read -r rproj rhash <<< "$entry"
remoteproj["$rproj"]=$rhash
done
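A self-contained version of that loop, as a sketch. The sample value of `tmp` is invented for illustration, and it assumes bash 4 or later, where `declare -A` is available. Declaring remoteproj as an associative array matters: without it, bash treats the subscript as an arithmetic expression rather than as a string key.

```shell
#!/usr/bin/env bash
declare -A remoteproj            # associative array: project name -> hash

# Hypothetical input in the "name,hash;name,hash" shape from the question.
tmp='projA,abc123;projB,def456'

IFS=';' read -r -a entries <<< "$tmp"
for entry in "${entries[@]}"; do
  IFS=, read -r rproj rhash <<< "$entry"
  remoteproj["$rproj"]=$rhash
done

# Iteration order of an associative array is not guaranteed.
for key in "${!remoteproj[@]}"; do
  printf '%s -> %s\n' "$key" "${remoteproj[$key]}"
done
```

Setting IFS only as a prefix to read keeps the change local to that one command, so there is nothing to restore afterwards.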
For performance it is best to do things without subshells. I am still getting confused between % and #, but these internal evaluations are way better than using sed, cut or perl.
The %% means "remove the largest possible matching string from the end of the variable's contents".
The ## means "remove the largest possible matching string from the beginning of the variable's contents".
You can see the working with a simple test:
for entry in key,value a,b,c
do
echo "$entry is split into ${entry%%,*} and ${entry##*,}"
done
The result of splitting key,value is obvious. When you are splitting a,b,c the field b is lost.
When using bash-style search-and-replace parameter expansion is there a way to refer to captured substrings in the pattern?
For example, I want to insert a leading 0 in filenames that end in "(some digit).mp3". The files have other parens in the name, so I need to look for the close paren closest to the end:
${x/\(([[:digit:]]\).mp3)/\(0}
This doesn't quite work, b/c it doesn't resubstitute the previous end of the string.
Is there a way in bash or zsh to refer to the captured string? $BASH_REMATCH doesn't seem to work.
You could use sed
sed 's/\(.*(\)\([0-9][0-9]*).mp3\)/\10\2/'
I think that gets the desired functionality... I'm sure there is a slicker way to do it, I'm just starting to learn sed myself.
One way to do this is to first strip the extension from the filename leaving only the base filename in name. Next use the length of name - 1 as the string index to return the last character in name. Lastly, check the lastchar against ).
name="${filename%.*}" # remove extension from filename
if [[ ${name:((${#name}-1))} == ")" ]]; then # isolate the last character and test it against ")"
# modify as needed to insert '0'
newfilename="0${filename}" # prepend '0' to the filename
fi
Note: there is no need for the new variable newfilename; it was just for illustration. It is perfectly OK to add the 0 to filename directly with filename="0${filename}".
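On the original question of referring to captured text: the ${x/pattern/replacement} expansion works with glob patterns and has no capture groups, but [[ ... =~ ... ]] does populate BASH_REMATCH. A sketch of inserting the 0 that way follows; the file name is made up for illustration and the pattern assumes a single digit inside the final parentheses.

```shell
#!/usr/bin/env bash
x='Song (live) (7).mp3'   # hypothetical file name with other parens in it

# Group 1: everything through the last "(" that is followed by "digit).mp3".
# Group 2: the digit, the closing paren and the extension.
re='^(.*\()([0-9]\)\.mp3)$'

if [[ $x =~ $re ]]; then
  x="${BASH_REMATCH[1]}0${BASH_REMATCH[2]}"
fi

echo "$x"   # prints Song (live) (07).mp3
```

Names that do not end in "(digit).mp3" fail the match and are left unchanged.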