Grep variable at exact point in string - bash

I have a file with numerical data and, using variables read from another file, I want to extract the matching string.
I already have code that reads in the variables.
The problem is that the variable can occur at different points within the string. I only want the string that has the variable on the right-hand side, i.e. in the last 8 characters.
e.g.
grep 0335439 foobar.txt
00032394850033543984
00043245845003354390
00060224460033543907
00047444423700335439
In this case it's the last line.
I have tried to write something using ${str: -8}, but then I lose the data in front.
I have found this command
grep -Eo '^.{12}(0335439)' foobar.txt
This works. However, when I use it in my script with a variable in place of the literal value, it doesn't: grep -Eo '^.{12}($string)' foobar.txt.
I have tried it without the brackets, but it still does not work.
Update:
In this case the length of the string is always 20 characters, so counting from the LHS is OK in my case, but you are correct that it was not the answer to the original question. I tried to comment on the code to say this, but pasting it into the comment box removed the formatting.

i only want the string that has the variable on the right-hand side, i.e. the last 8 characters
A non-regex approach using awk is better suited for this job:
s='00335439'
awk -v n=8 -v kw="$s" 'substr($0, length()-n+1, n) == kw' file
00047444423700335439
Here we pass n=8 to awk. substr($0, length()-n+1, n) returns the last n characters of the line, which are then compared against the variable kw, whose value is set on the command line.
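As for why the grep attempt in the question fails: single quotes stop the shell from expanding $string, so grep searches for the literal text. A minimal sketch of a grep variant, assuming the value contains only digits (regex metacharacters in the variable would be interpreted by grep), is to use double quotes and anchor the match to the end of the line. The temporary file and the variable names are only for illustration.

```shell
#!/usr/bin/env bash
# Sample data from the question, written to a temporary file.
datafile=$(mktemp)
printf '%s\n' 00032394850033543984 00043245845003354390 \
              00060224460033543907 00047444423700335439 > "$datafile"

string=0335439

# Double quotes let the shell expand $string; \$ passes a literal $ to grep,
# which anchors the match to the end of the line.
result=$(grep -E "${string}\$" "$datafile")
echo "$result"

rm -f "$datafile"
```

Only the line that ends in the value is printed, and the whole line is kept because -o is not used.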

Related

Combine two expression in Bash

I did check the ABS, but it was hard to find a reference to my problem/question there.
Here it is. Consider the following code (which extracts the first character of OtherVar and then converts MyChar to uppercase):
OtherVar=foobar
MyChar=${OtherVar:0:1} # get first character of OtherVar string variable
MyChar=${MyChar^} # first character to upper case
Could I somehow condense the second and third line into one statement?
P.S.: As was pointed out below, not needs to have a named variable. I should add, I would like to not add any sub-shells or so and would also accept a somehow hacky way to achieve the desired result.
P.P.S.: The question is purely educational.
You could do it all-in-one without forking sub-shell or running external command:
printf -v MyChar %.1s "${OtherVar^}"
Or:
read -n1 MyChar <<<"${OtherVar^}"
Another option:
declare -u MyChar=${OtherVar:0:1}
But I can't see the point in such optimization in a bash script.
There are more suitable text processing interpreters, like awk, sed, even perl or python if performance matters.
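The three variants can be compared side by side in a short sketch, assuming bash 4 or later (needed for ${var^} and declare -u). The variable names viaPrintf, viaRead and viaDeclare are made up for this example. Note that in a printf format a precision (%.1s) truncates the argument, whereas a plain width (%1s) only pads it.

```shell
#!/usr/bin/env bash
OtherVar=foobar

# Variant 1: uppercase the first character, then let the printf
# precision (%.1s) keep only one character.
printf -v viaPrintf %.1s "${OtherVar^}"

# Variant 2: read stops after one character (-n1).
read -r -n1 viaRead <<<"${OtherVar^}"

# Variant 3: the -u attribute uppercases whatever is assigned.
declare -u viaDeclare=${OtherVar:0:1}

echo "$viaPrintf $viaRead $viaDeclare"
```

None of the three runs an external command or a command substitution.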
You could use the cut command and put it in a complex expression to get it on one line, but I'm not sure it makes the code too much clearer:
OtherVar=foobar
MyChar=$(echo "${OtherVar^}" | cut -c1) # uppercase the first character, then keep only that character

extract data between similar patterns

I am trying to use sed to print the contents between two patterns including the first one. I was using this answer as a source.
My file looks like this:
>item_1
abcabcabacabcabcabcabcabacabcabcabcabcabacabcabc
>item_2
bcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdb
>item_3
cdecde
>item_4
defdefdefdefdefdefdef
I want it to start at item_2 (inclusive) and finish at the next occurring > (exclusive). So my code is sed -n '/item_2/,/>/{/>/!p;}'.
The result wanted is:
item_2
bcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdb
but I get it without item_2.
Any ideas?
Using awk, split input by >s and print part(s) matching item_2.
$ awk 'BEGIN{RS=">";ORS=""} /item_2/' file
item_2
bcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdb
I would go for the awk method suggested by oguz for its simplicity. If you are curious about a sed way, you could fix what you have already tried with a minor change:
sed -n '/^>item_2/ s/.// ; //,/>/ { />/! p }' input_file
The empty regex // recalls the previous regex, which is handy here to avoid duplicating /item_2/. But keep in mind that // is actually dynamic, it recalls the latest regex evaluated at runtime, which is not necessarily the closest regex on its left (although it's often the case). Depending on the program flow (branching, address range), the content of the same // can change and... actually here we have an interesting example ! (and I'm not saying that because it's my baby ^^)
On a line where /^>item_2/ matches, the s/.// command is executed and the latest regex before // becomes /./, so the following address range is equivalent to /./,/>/.
On a line where /^>item_2/ does not match, the latest regex before // is /^>item_2/ so the range is equivalent to /^>item_2/,/>/.
To avoid confusion here as the effect of // changes during execution, it's important to note that an address range evaluates only its left side when not triggered and only its right side when triggered.
This might work for you (GNU sed):
sed -n ':a;/^>item_2/{s/.//;:b;p;n;/^>/!bb;ba}' file
Turn off implicit printing -n.
If a line begins with >item_2, remove the first character, print the line and fetch the next line.
If that line does not begin with a >, repeat the last two instructions.
Otherwise, repeat the whole set of instructions.
If there will always be only one line following >item_2, then:
sed '/^>item_2/!d;s/.//;n' file
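The same "print from the wanted header up to the next header" logic can also be sketched in awk with an explicit flag, which avoids depending on how sed resolves the empty regex. This is only an illustration: the sample input uses shortened sequences, and the names want and keep are made up for the example.

```shell
#!/usr/bin/env bash
# Shortened sample input modelled on the question, in a temporary file.
input=$(mktemp)
printf '%s\n' '>item_1' 'abcabc' '>item_2' 'bcdbcd' '>item_3' 'cdecde' > "$input"

result=$(awk -v want='item_2' '
    /^>/ {                             # header line
        keep = ($0 == ">" want)        # is this the wanted record?
        if (keep) print substr($0, 2)  # print the header without the leading ">"
        next
    }
    keep                               # body lines of the wanted record
' "$input")
echo "$result"

rm -f "$input"
```

Because the header is compared as a whole string, a header such as >item_20 would not be picked up by accident.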

Extract a substring (value of an HTML node tag) in a bash/zsh script

I'm trying to extract a tag value of an HTML node that I already have in a variable.
I'm currently using Zsh but I'm trying to make it work in Bash as well.
The current variable has the value:
<span class="alter" fill="#ffedf0" data-count="0" data-more="none"/>
and I would like to get the value of data-count (in this case 0, but could be any length integer).
I have tried using cut, sed and the variables expansion as explained in this question but I haven't managed to adapt the regexs, or maybe it has to be done differently for Zsh.
There is no reason why sed would not work in this situation. For your specific case, I would do something like this:
sed 's/.*data-count="\([0-9]*\)".*/\1/g' file_name.txt
Basically, sed looks for a pattern that contains data-count=, saves whatever matches inside the parentheses \(...\) into \1, and then prints \1 in place of the match (the full line, due to the .* on both sides).
Could you please try the following.
awk 'match($0,/data-count=[^ ]*/){print substr($0,RSTART+12,RLENGTH-13)}' Input_file
Explanation: awk's match function looks for the regex data-count=[^ ]*, i.e. data-count= followed by everything up to the next space. When a match is found, the built-in variables RSTART and RLENGTH are set to the start position and length of the match. substr then skips the 12 characters of data-count=" and drops the closing quote, so only the value of data-count is printed.
With sed, could you please try the following.
sed 's/.*data-count=\"\([^"]*\).*/\1/' Input_file
Explanation: The escaped parentheses \(...\) capture everything after data-count=" up to the next double quote into the first group. Because the pattern has .* on both sides it matches the whole line, so the s (substitute) command replaces the entire line with \1, the captured value.
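Since the question has the HTML in a variable rather than a file, the sed commands above can be fed from a here-string, which both bash and zsh support. A minimal sketch, where the variable names html and count are only for illustration:

```shell
#!/usr/bin/env bash
html='<span class="alter" fill="#ffedf0" data-count="0" data-more="none"/>'

# -n plus the p flag prints only when the substitution succeeded,
# so a tag without data-count yields an empty result instead of the whole tag.
count=$(sed -n 's/.*data-count="\([0-9]*\)".*/\1/p' <<<"$html")
echo "$count"
```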
As was said before, to be on the safe side and handle any syntactically valid HTML tag, a parser would be strongly advised. But if you know in advance, what the general format of your HTML element will look like, the following hack might come handy:
Assume that your variable is called "html"
html='<span class="alter" fill="#ffedf0" data-count="0" data-more="none"/>'
First adapt it a bit:
htmlx="tag ${html%??}"
This will add the string tag in front and remove the final />
Now make an associative array (this part uses zsh syntax):
declare -A fields
fields=( ${=$(tr = ' ' <<<$htmlx)} )
The tr turns the equal sign into a space and the ${= handles word splitting. You can now access the values of your attributes by, say,
echo $fields[data-count]
Note that this still has the surrounding double quotes. You can easily remove them with
echo ${${fields[data-count]%?}#?}
Of course, once you do this hack, you have access to all attributes in the same way.
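If only one attribute is needed, plain parameter expansion may be enough, and it needs no external command. The following is a sketch under the same assumption as above, namely that the general format of the element is known in advance. The intermediate variable rest is made up for the example. The expansions are standard POSIX forms, so they should behave the same in bash and zsh.

```shell
#!/usr/bin/env bash
html='<span class="alter" fill="#ffedf0" data-count="0" data-more="none"/>'

rest=${html#*data-count=\"}   # strip everything up to and including data-count="
count=${rest%%\"*}            # strip from the next double quote to the end
echo "$count"
```

If the attribute is missing, the first expansion leaves the string unchanged and the result is meaningless, so this is only suitable when the attribute is known to be present.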

awk and sed command Special Character in matching pattern for range [duplicate]

NOTE: I am a noob at bash scripts and the awk command - please excuse any dumb mistakes I make.
I am unable to substitute shell variables into my awk pattern. I am trying to scan through a file, find the first occurrence of a specific string in the file, and print each line that follows it, in order, until it hits an empty line.
I don't know the string I am searching for in advance, and I would like to substitute in that variable.
When I run this with the string directly specified (e.g "< main>:"), it works perfectly. I've already searched on how awk patterns work, and how to substitute in variables. I've tried using the -v flag for awk, directly using the shell variable - nothing works.
funcName="<${2}>:"
awk=`awk -v FN="$funcName" '/FN/,/^$/' "$ofile"`
rfile=search.txt
echo -e "$awk" > "$rfile"
The error is just that nothing prints. I want to print all the lines between my desired string and the next empty line.
Could you please try the following? I haven't tested it because no clear samples were given, but it should work.
funcName="<${2}>:"
awk_result=$(awk -v FN="$funcName" 'index($0,FN){found=1} found; /^$/{found=""}' "$ofile")
rfile=search.txt
echo -e "$awk_result" > "$rfile"
Things fixed in OP's attempt:
NEVER give a variable the same name as a binary or a keyword, so I changed the variable name awk to awk_result.
Backticks are the legacy form of command substitution, so prefer var=$(...your commands...); I changed this for the awk_result variable.
Now for the awk code itself: /FN/ is a regex that matches the literal text FN, not the contents of the variable FN, which is why nothing printed. I used the index function, which checks whether the value of FN is present in a line. If it is, the flag found is set, lines are printed while the flag is set, and the flag is cleared again once an empty line is reached, as per the OP's ask.
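Since the question gives no sample input, here is a small self-contained run of the same awk program. The file contents below are hypothetical, chosen only to show that printing starts at the line containing the function name and stops after the next empty line.

```shell
#!/usr/bin/env bash
# Hypothetical input; the real file format is not shown in the question.
ofile=$(mktemp)
printf '%s\n' '<init>:' 'push' '' '<main>:' 'mov' 'ret' '' '<exit>:' 'hlt' > "$ofile"

funcName="<main>:"

awk_result=$(awk -v FN="$funcName" '
    index($0, FN) { found = 1 }   # literal substring test, no regex involved
    found                         # print while inside the wanted block
    /^$/ { found = 0 }            # an empty line ends the block
' "$ofile")
printf '%s\n' "$awk_result"

rm -f "$ofile"
```

The terminating empty line is printed by awk as well, but command substitution strips trailing newlines, so the variable ends with the last non-empty line.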

Variable not getting assigned in bash after a curl hit

I have a shell script where I have a statement:
isPartial = $searchCurl| grep -Po '\"partialSearch\":(true|false)'|sed 's/\\\"partialSearch\\\"://'
now, if I just echo the RHS
$searchCurl| grep -Po '\"partialSearch\":(true|false)'|sed 's/\\\"partialSearch\\\"://'
it prints "partialSearch":true, but the variable isPartial doesn't get initialized .
Why is this happening and how can I fix it ?
Since the number of backslashes in your examples varies, it is not clear to me if the double quotes are already escaped in the input text. I’ll assume they are not, i.e. the input text looks something like:
sometext... "partialSearch":true ... sometext...
..bla bla bla... "partialSearch":false ...
and my examples below will work under this assumption.
There are a number of points to be made.
You seem to be trying to parse JSON input with regular expressions. While this could be acceptable for quick-and-dirty one-time jobs where you know the exact format of the data being processed, in general it is a very bad idea. You should use a JSON parser like jq.
You obviously have stored some bash code in the variable searchCurl. This is considered bad practice. Instead of searchCurl="... code ..." you should do function searchCurl () { ... code ... } and call searchCurl without prefixing it with a dollar sign. Variables are for values, functions are for code.
In most cases, if you are going to use sed, it’s better to use it for everything without invoking grep. Sometimes it can be simpler to have both. See below for an example.
To assign the output of a command to a variable, you have to use command substitution, and the assignment must not have spaces around the = sign.
In short, if in your input text you have only one match of '"partialSearch":(true|false)', this is what you want:
isPartial=$(searchCurl|sed -rn 's/^.*"partialSearch":(true|false).*$/\1/p')
If you have more and the input text is one big line as I suppose, usage of grep -o might simplify the task of splitting the input into one match per line, so that
isPartial=$(searchCurl|grep -Po '"partialSearch":(true|false)'|sed -e 's/^.*://')
might be what you want (and in this case, isPartial will hold a newline-separated list of true and false values, which shows up space-separated if you echo it unquoted).
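Putting these points together, a minimal sketch could look like the following. The body of searchCurl is a hypothetical stand-in for the real curl call, which the question does not show, and the sample JSON is made up. sed -E is used here as the more portable spelling of -r.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real curl call (not shown in the question).
searchCurl() {
    printf '%s\n' '{"results":[],"partialSearch":true,"total":0}'
}

# No spaces around "=", and $(...) captures the output of the pipeline.
isPartial=$(searchCurl | sed -En 's/^.*"partialSearch":(true|false).*$/\1/p')
echo "$isPartial"
```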
