Grep Search in Shell Script - shell

I'm trying to implement a system based on shell scripts and PHP: a PHP file returns a string, and a shell script processes it.
It works every time except when the string looks like: "/jobs?location_country=united+states&sort_by=cfml10%2cdesc&v_location=usa"
For such strings the grep command finds nothing.
How can I solve this?
The code is:
hcm=$(php largest.php "$file"_hcm_input.txt "$remove")
echo "$hcm"
grep "$hcm" "$file"_sorted.txt > "$file"_jobs.txt

Use grep -F or grep --fixed-strings to tell grep to treat the argument as a fixed string rather than a regex.
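To illustrate the difference, here is a minimal sketch using made-up sample data (the temporary file stands in for "$file"_sorted.txt, and the needle is a shortened version of the string from the question). With extended regexes the ? and + in the string act as quantifiers, so the pattern no longer matches its own text, while -F always compares literally:

```shell
# Hypothetical sample data standing in for "$file"_sorted.txt
tmp=$(mktemp)
printf '%s\n' '/jobs?location_country=united+states&sort_by=cfml10%2cdesc&v_location=usa' > "$tmp"

needle='/jobs?location_country=united+states'

# -E: "s?" means "optional s" and "d+" means "one or more d", so nothing matches
ere_hits=$(grep -cE -- "$needle" "$tmp" || true)

# -F: every character of the needle is taken literally
fixed_hits=$(grep -cF -- "$needle" "$tmp" || true)

rm -f "$tmp"
printf 'ERE matches: %s, fixed-string matches: %s\n' "$ere_hits" "$fixed_hits"
```

The `--` simply guards against a needle that begins with a dash.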

As you have & in the string, even if you enclose it in single/double quotes it won't work.
In this case you have to escape &, using \ like \&.

After a lot of trial and error, I found that the issue is not with &, = or +, as I checked each of them individually:
grep "=" s37_sorted.txt
grep "&" s37_sorted.txt
grep "+" s37_sorted.txt
Each of these gives me output.
The actual cause was letter case. The search needs to be case-insensitive, which is what grep's -i option does:
hcm=$(php largest.php "$file"_hcm_input.txt "$remove")
echo "Highest Common String:$hcm"
grep -i "$hcm" "$file"_sorted.txt > "$file"_jobs.txt
Now it's showing me 366 records.
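A small sketch combining both suggestions (-i for case, -F for literal matching), assuming the stored lines differ from the search string only in letter case. The sample lines are invented:

```shell
# Hypothetical sample data; the first line differs from the needle only in case
tmp=$(mktemp)
printf '%s\n' '/Jobs?Location_Country=United+States' '/about' > "$tmp"

hcm='/jobs?location_country=united+states'

# Case-sensitive literal search finds nothing
exact=$(grep -cF -- "$hcm" "$tmp" || true)

# Case-insensitive literal search finds the first line
nocase=$(grep -ciF -- "$hcm" "$tmp" || true)

rm -f "$tmp"
printf 'exact: %s, case-insensitive: %s\n' "$exact" "$nocase"
```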

Related

Regex: match only string C that is in between string A and string B

How can I write a regex in a shell script that targets only the substring between two given values? Given the example
https://www.stackoverflow.com
how can I match only the ":" in between "https" and "//"?
If possible, please also explain the approach.
The context is that I need to prepare a file that would fetch a config from the server and append it to the .env file. The response comes as JSON
{
"GRAPHQL_URL": "https://someurl/xyz",
"PUBLIC_TOKEN": "skml2JdJyOcrVdfEJ3Bj1bs472wY8aSyprO2DsZbHIiBRqEIPBNg9S7yXBbYkndX2Lk8UuHoZ9JPdJEWaiqlIyGdwU6O5",
"SUPER_SECRET": "MY_SUPER_SECRET"
}
so I need to adjust it to the .env syntax. What I managed to do this far is
#!/bin/bash
CURL_RESPONSE="$(curl -s url)"
cat <<< ${CURL_RESPONSE} | jq -r '.property.source' | sed -r 's/:/=/g;s/[^a-zA-Z0-9=:_/-]//g' > .env.test
So basically I fetch the data, extract the key I am after with jq, and then use sed to first replace every ":" with "=" and then remove all the quotation marks, commas and white space that come from the JSON, leaving only the characters that are necessary.
I am almost there, but the problem is that now my GraphQL URL (and any other URL) looks like this:
https=//someurl/xyz
so I need to replace this = that is in between https and // back with the colon.
Thank you very much @Nic3500 for the response. I'm not sure why, but I get an error saying
sed: 1: "s/:/=/g;s#https\(.*\)// ...": \1 not defined in the RE
I searched SO and it seems that it should work, since the brackets are escaped and I use the -r flag (I tried -E, but no difference), so I don't know how to apply it. To be honest, I assume that the replacement block is this part
#\1#
so how can I tell it which character the match should be replaced with?
This is how I tried to use it
#!/bin/bash
CURL_RESPONSE="$(curl -s url)"
cat <<< ${CURL_RESPONSE} | jq -r '.property.source' | sed -r 's/:/=/g;s#https\(.*\)//.*#\1#;s/[^a-zA-Z0-9=:_/-]//g' > .env.test
Hope with this context you would be able to help me.
echo "https://www.stackoverflow.com" | sed 's#https\(.*\)//.*#\1#'
:
sed operator s/regexp/replacement/
regexp: https\(.*\)//.*. So "https" followed by something \(.*\), followed by "//", followed by anything else .*
The parentheses are backslashed because they are not literal characters in the pattern. They group a part of the regex so that the replacement part of the s### operator can refer to it.
replacement: \1, meaning the first group captured in the regex by \(.*\)
I used s###, but the usual form is s///. Any character can take the place of the / with the s operator. I used # because / would have been confusing, since you use / in the URL.
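Regarding the "\1 not defined in the RE" error: with -r or -E, sed uses extended regular expressions, where plain ( ) form the group and \( \) match literal parentheses, so no group exists for \1 to refer to. Below is a minimal sketch of both spellings, plus one possible way (an assumption about what is wanted) to turn https=// back into https://:

```shell
url='https://www.stackoverflow.com'

# Basic regex (no -E): groups are written \( \)
bre=$(printf '%s\n' "$url" | sed 's#https\(.*\)//.*#\1#')

# Extended regex (-E): groups are written ( )
ere=$(printf '%s\n' "$url" | sed -E 's#https(.*)//.*#\1#')

# Restore the colon that an earlier s/:/=/g turned into "="
fixed=$(printf '%s\n' 'GRAPHQL_URL=https=//someurl/xyz' | sed -E 's#(https?)=//#\1://#')

printf '%s\n' "$bre" "$ere" "$fixed"
```

The last substitution keeps whatever scheme was matched (http or https) via \1 and only rewrites the "=" in front of the "//".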
The problem is that your sed substitutions are terribly imprecise. Anyway, you want to do it in jq instead, where you have more control over which parts you are substituting, and avoid spawning a separate process for something jq quite easily does natively in the first place.
curl -s url |
jq -r '.property.source | to_entries[] |
"\(.key)=\"\(.value)\""' > .env.test
Tangentially, capturing the output of curl into a variable just so you can immediately cat it once to standard output is just a waste of memory.
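If the JSON object has to be handled with sed after all, a more precise substitution touches only the colon that separates a key from its value. This is only a sketch: it assumes one "key": "string value" pair per line, as in the sample response, and it is not a general JSON parser.

```shell
# Two lines in the shape of the sample response (values taken from the question)
json_lines='  "GRAPHQL_URL": "https://someurl/xyz",
  "SUPER_SECRET": "MY_SUPER_SECRET"'

# Match the whole line and rebuild it as KEY=VALUE; the colon inside the URL
# is part of the captured value and is left alone
env_lines=$(printf '%s\n' "$json_lines" |
  sed -E 's/^[[:space:]]*"([^"]+)"[[:space:]]*:[[:space:]]*"([^"]*)",?[[:space:]]*$/\1=\2/')

printf '%s\n' "$env_lines"
```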

Append text to top of file using sed doesn't work for variable whose content has "/" [duplicate]

I have a Visual Studio project, which is developed locally. Code files have to be deployed to a remote server. The only problem is the URLs they contain, which are hard-coded.
The project contains URLs such as ?page=one. For the link to be valid on the server, it must be /page/one .
I've decided to replace all URLs in my code files with sed before deployment, but I'm stuck on slashes.
I know this is not a pretty solution, but it's simple and would save me a lot of time. The total number of strings I have to replace is fewer than 10. The total number of files that have to be checked is ~30.
An example describing my situation is below:
The command I'm using:
sed -f replace.txt < a.txt > b.txt
replace.txt which contains all the strings:
s/?page=one&/pageone/g
s/?page=two&/pagetwo/g
s/?page=three&/pagethree/g
a.txt:
?page=one&
?page=two&
?page=three&
Content of b.txt after I run my sed command:
pageone
pagetwo
pagethree
What I want b.txt to contain:
/page/one
/page/two
/page/three
The easiest way would be to use a different delimiter in your search/replace lines, e.g.:
s:?page=one&:/page/one:g
You can use any character as a delimiter that's not part of either string. Or, you could escape it with a backslash:
s/\//foo/
Which would replace / with foo. You'd want to use the backslash-escaped form in cases where you don't know what characters might occur in the replacement strings (if they are shell variables, for example).
The s command can use any character as a delimiter; whatever character comes after the s is used. I was brought up to use a #. Like so:
s#?page=one&#/page/one#g
A very useful but lesser-known fact about sed is that the familiar s/foo/bar/ command can use any punctuation, not only slashes. A common alternative is s#foo#bar#, from which it becomes obvious how to solve your problem.
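A minimal sketch of the alternative-delimiter approach applied to the data from the question (only two of the three rules are shown, and they are passed with -e instead of a replace.txt file):

```shell
input='?page=one&
?page=two&'

# "#" is the delimiter, so the slashes in the replacement need no escaping
out=$(printf '%s\n' "$input" |
  sed -e 's#?page=one&#/page/one#g' -e 's#?page=two&#/page/two#g')

printf '%s\n' "$out"
```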
Add \ before each / that is not meant as a delimiter:
s/?page=one&/\/page\/one/g
etc.
In a system I am developing, the string to be replaced by sed is input text from a user which is stored in a variable and passed to sed.
As noted earlier on this post, if the string contained within the sed command block contains the actual delimiter used by sed - then sed terminates on syntax error. Consider the following example:
This works:
$ VALUE=12345
$ echo "MyVar=%DEF_VALUE%" | sed -e s/%DEF_VALUE%/${VALUE}/g
MyVar=12345
This breaks:
$ VALUE=12345/6
$ echo "MyVar=%DEF_VALUE%" | sed -e s/%DEF_VALUE%/${VALUE}/g
sed: -e expression #1, char 21: unknown option to `s'
Replacing the default delimiter is not a robust solution in my case as I did not want to limit the user from entering specific characters used by sed as the delimiter (e.g. "/").
However, escaping any occurrences of the delimiter in the input string would solve the problem.
Consider the solution below, which systematically escapes the delimiter character in the input string before sed parses it.
The escaping can itself be implemented as a sed replacement. This replacement is safe even if the input string contains the delimiter, because the input string is not part of the sed command block:
$ VALUE=$(echo ${VALUE} | sed -e "s#/#\\\/#g")
$ echo "MyVar=%DEF_VALUE%" | sed -e s/%DEF_VALUE%/${VALUE}/g
MyVar=12345/6
I have converted this to a function to be used by various scripts:
escapeForwardSlashes() {
    # Validate parameters
    if [ -z "$1" ]
    then
        echo "Error - no parameter specified!" >&2
        return 1
    fi
    # Perform replacement; printf and quoting keep the input string intact
    printf '%s\n' "$1" | sed -e 's#/#\\/#g'
    return 0
}
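Note that / is not the only character that matters in the replacement part of s///: & (the whole match) and \ are special there as well. A hedged sketch of a broader helper follows. The function name escape_sed_replacement is made up for this example, and it assumes a single-line value used with / as the delimiter:

```shell
escape_sed_replacement() {
    # Prefix every \, / and & in the argument with a backslash
    printf '%s\n' "$1" | sed -e 's#[\\/&]#\\&#g'
}

VALUE='12345/6&7'
ESCAPED=$(escape_sed_replacement "$VALUE")

# The escaped value can now be dropped into an s/// replacement
RESULT=$(printf '%s\n' 'MyVar=%DEF_VALUE%' | sed -e "s/%DEF_VALUE%/${ESCAPED}/g")

printf '%s\n' "$RESULT"
```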
This line should work for your 3 examples:
sed -r 's#\?(page)=([^&]*)&#/\1/\2#g' a.txt
I used -r to save some escaping.
The line is generic for your one, two and three cases, so you don't have to do the substitution 3 times.
Test with your example (a.txt):
kent$ echo "?page=one&
?page=two&
?page=three&"|sed -r 's#\?(page)=([^&]*)&#/\1/\2#g'
/page/one
/page/two
/page/three
replace.txt should be
s/?page=/\/page\//g
s/&//g
please see this article
http://netjunky.net/sed-replace-path-with-slash-separators/
Just using | instead of /
Great answer from Anonymous. \ solved my problem when I tried to escape quotes in HTML strings.
So if you use sed to return some HTML templates (on a server), use double backslash instead of single:
var htmlTemplate = "<div style=\\"color:green;\\"></div>";
A simpler alternative is using AWK, as in this answer:
awk '$0="prefix"$0' file > new_file
You may use an alternative regex delimiter for a search pattern by backslashing the first occurrence of it:
sed '\,{some_path},d'
For the s command:
sed 's,{some_path},{other_path},'
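A minimal sketch of both forms with a comma as the delimiter. The paths are invented for the example:

```shell
paths='/usr/local/bin
/tmp/cache/a
/opt/app'

# Address form: the leading backslash announces "," as the regex delimiter
kept=$(printf '%s\n' "$paths" | sed '\,/tmp/cache,d')

# s command: whatever character follows the s is the delimiter
moved=$(printf '%s\n' "$paths" | sed 's,/opt/app,/srv/app,')

printf '%s\n' "$kept" "$moved"
```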

Why is this grep group failing

I am trying to do something like this on my OSX terminal
> grep -i "((\D*)ful)" ./Myfile.rtf
The above statement fails; however, when I do this
> grep -i "\D*ful" ./Myfile.rtf
it passes. Does grep have an issue with regex groups?
Since basic grep uses BRE, you need to use \(..\) for a capturing group.
grep -i "\(\(\D*\)ful\)" ./Myfile.rtf
The most likely problem when this sort of thing happens is that the special characters are or are not special. In this case, I think the brackets are not special unless you escape them with a backslash, so:
> grep -i "\(\(\D*\)ful\)" ./Myfile.rtf
would probably work better.
[One of the irritations of regex is the variation that has developed in exactly how they are written...]
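A minimal sketch of the two group spellings. Since \D is a Perl-style class that is not part of POSIX BRE or ERE, the sketch substitutes the bracket expression [a-z]* for it, and the sample line is invented:

```shell
line='a wonderful day'

# BRE (plain grep): groups are written \( \)
bre=$(printf '%s\n' "$line" | grep -o '\([a-z]*\)ful')

# ERE (grep -E): groups are written ( )
ere=$(printf '%s\n' "$line" | grep -oE '([a-z]*)ful')

printf '%s\n' "$bre" "$ere"
```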

grep pipe searching for one word, not line

For some reason I cannot get this to output just the version of this line. I suspect it has something to do with how grep interprets the dash.
This command:
admin@DEV:~/TEMP$ sendemail
Yields the following:
sendemail-1.56 by Brandon Zehm
More output below omitted
The first line is of interest. I'm trying to store the version in a variable.
TESTVAR=$(sendemail | grep '\s1.56\s')
Does anyone see what I am doing wrong? Thanks
TESTVAR is just empty. Even without TESTVAR, the output is empty.
I just tried the following too, thinking this might work.
sendemail | grep '\<1.56\>'
I just tried it again while editing, and I think I have another issue. Perhaps I'm not handling the output correctly. It's outputting the entire line, but I can see that grep is finding 1.56 because it highlights it in the line.
$ TESTVAR=$(echo 'sendemail-1.56 by Brandon Zehm' | grep -Eo '1.56')
$ echo $TESTVAR
1.56
The point is grep -Eo '1.56'.
From the grep man page:
-E, --extended-regexp
Interpret PATTERN as an extended regular expression (ERE, see below). (-E is specified by POSIX.)
-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output
line.
Your regular expression doesn't match the form of the version. You have specified that the version is surrounded by spaces, yet in front of it you have a dash.
Replace the first \s with the capitalized form \S, or explicit set of characters and it should work.
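For a version that is not known in advance, one option is to describe its shape instead of spelling it out. A minimal sketch, assuming the version is always digits, a dot, and digits (note the escaped dot; in '1.56' the unescaped . would match any character):

```shell
first_line='sendemail-1.56 by Brandon Zehm'

# -o prints only the matched part, -E enables + without backslashes
TESTVAR=$(printf '%s\n' "$first_line" | grep -oE '[0-9]+\.[0-9]+')

printf '%s\n' "$TESTVAR"
```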
I'm wondering: In your example you seem to know the version (since you grep for it), so you could just assign the version string to the variable. I assume that you want to obtain any (unknown) version string there. The regular expression for this in sed could be (using POSIX character classes):
sendemail |sed -n -r '1 s/sendemail-([[:digit:]]+\.[[:digit:]]+).*/\1/ p'
The -n suppresses the normal default output of every line; -r enables extended regular expressions; the leading 1 tells sed to only work on line 1 (I assume the version appears in the first line). I anchored the version number to the telltale string sendemail- so that potential other numbers elsewhere in that line are not matched. If the program name changes or the hyphen goes away in future versions, this wouldn't match any longer though.
Both the grep solution above and this one have the disadvantage of reading the whole output, which (as emails go these days) may be long. In addition, grep would find all other lines in the program's output which contain the pattern (if it's indeed emails, somebody might discuss this problem in them, with examples!). If it's indeed the first line, piping through head -1 first would be efficient and prudent.
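Put together, the head and sed steps could look like the sketch below. The two-line sample output is invented (the second line is only there to show that later mentions of a version are ignored), and -E is used as the portable spelling of -r:

```shell
output='sendemail-1.56 by Brandon Zehm
More output below omitted, mentioning 1.56 again'

# Keep only the first line, then extract the version that follows "sendemail-"
version=$(printf '%s\n' "$output" | head -n 1 |
  sed -n -E 's/^sendemail-([[:digit:]]+\.[[:digit:]]+).*/\1/p')

printf '%s\n' "$version"
```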
jayadevan@jayadevan-Vostro-2520:~$ echo $sendmail
sendemail-1.56 by Brandon Zehm
jayadevan@jayadevan-Vostro-2520:~$ echo $sendmail | cut -f2 -d "-" | cut -f1 -d" "
1.56
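The same split can also be done without any external command, using shell parameter expansion. A minimal sketch, assuming (as the cut version does) that the version sits between the first dash and the following space:

```shell
line='sendemail-1.56 by Brandon Zehm'

# Remove everything up to and including the first "-"
version=${line#*-}

# Remove everything from the first space onward
version=${version%% *}

printf '%s\n' "$version"
```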

bash if specific character exists

I am creating a script to automate partitioning of a hard drive.
Now I am having trouble with the following:
I need to check whether a "." exists in a text file (just the dot).
What is the best way to accomplish this?
Example:
hdhelft=`cat /sometextfile`
if grep "." $hdhelft
then
hdhelf2=something
fi
No need to read the file into a variable unless that's what you really want. It's not entirely clear from your code what you want, but here's my interpretation. Use the -F flag to grep to get it to interpret the . as a literal, and the -q so it doesn't give any output (just yes or no). Also, if you really want $hdhelft to contain the contents of the file, use $(<filename) to get that.
hdhelft=$(</sometextfile)
if grep -qF '.' /sometextfile; then
hdhelf2=something
fi
You need to escape the . character as grep interprets that as "any character". So use:
grep '\.' yourfile
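A minimal sketch contrasting the three spellings on a file that contains no dot at all (the file content is invented). The unescaped '.' still counts the line as a match, while '\.' and -F '.' do not:

```shell
# Hypothetical test file without any "." in it
tmp=$(mktemp)
printf '%s\n' 'no dots here' > "$tmp"

any_char=$(grep -c '.' "$tmp" || true)    # "." as regex: any character
escaped=$(grep -c '\.' "$tmp" || true)    # escaped: a literal dot
fixed=$(grep -cF '.' "$tmp" || true)      # -F: a literal dot

# -q prints nothing; only the exit status is used
if grep -qF '.' "$tmp"; then has_dot=yes; else has_dot=no; fi

rm -f "$tmp"
printf '%s %s %s %s\n' "$any_char" "$escaped" "$fixed" "$has_dot"
```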

Resources