Looking to replace the text in a file after match found in ruby - ruby

I have data in the below format in a .txt file:
parameter1=12345 parameter2=23456 parameter3=23456 and so on.. the list is a long one.
I have found a way to match the parameter1 and so on and replace it with some other number.
modified_file = File.read("modified_file.txt")
modified_file = modified_file.sub(/#{parameter1}=/, "some text of your choice")
The above regular expression only replaces the text parameter1= itself, but I want to change the value that follows parameter1=.
I want to write a regular expression which can match the data up to = and replace the data following that.
For example, I want to replace 12345 with abcde and 23456 with xyzab, so the final result would be:
parameter1=abcde parameter2=xyzab and so on..

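What you want is called a "lookbehind":
/(?<=parameter1=)\S+/
It matches the run of non-whitespace characters that follows parameter1= without including parameter1= in the match.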
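As a minimal sketch of how the lookbehind could be applied, assuming the file content has already been read into a string (the sample line and the replacements hash below are made up for illustration):

```ruby
# Sample content; in practice this would come from File.read("modified_file.txt").
text = "parameter1=12345 parameter2=23456 parameter3=23456"

# Hypothetical mapping from parameter name to its new value.
replacements = { "parameter1" => "abcde", "parameter2" => "xyzab" }

result = replacements.reduce(text) do |acc, (name, value)|
  # (?<=name=) asserts that "name=" precedes the match without consuming it,
  # so only the non-whitespace run after "=" is replaced.
  acc.sub(/(?<=#{Regexp.escape(name)}=)\S+/, value)
end

puts result
# prints: parameter1=abcde parameter2=xyzab parameter3=23456
```

Parameters that are not listed in the hash (parameter3 here) are left untouched. Writing the result back with File.write would be a separate step.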

Related

Bash: Find line similar to searchstring

I have a csvfile like:
col1,col2
A,100foo
A,104foo
B,110bar
C,111bar
Now I have a searchstring
B,112
Which shall return line:
B,110bar
Or a searchstring
A,103
Which shall return A,100foo
So I am always looking for the line 'smaller' than the searchstring.
The second column is not a number, so I cannot do math operations.
What I need is more like an 'inaccurate search'.
Can I do that in Bash?
The file can be sorted alphabetically, so I was thinking about something 'grep-like' and then taking the line before.
It's not really clear how inaccurate the search is allowed to be.
Would searching for all lines that begin with the same character as the search string do?
str="B,110"
grep "^${str:0:1}" csvfile
or are there more requirements on the format of the line?
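If "smaller" can be taken to mean plain string ordering, one reading of the question is "the greatest line that sorts at or before the search string". Here is a rough sketch of that idea, written in Ruby rather than Bash and assuming the header line has already been dropped (the method name closest_before is made up):

```ruby
# Data rows from the question, without the "col1,col2" header.
lines = ["A,100foo", "A,104foo", "B,110bar", "C,111bar"]

# Return the greatest line that compares <= the search string
# under plain byte-wise string ordering, or nil if there is none.
def closest_before(lines, search)
  lines.select { |line| line <= search }.max
end

puts closest_before(lines, "B,112")  # B,110bar
puts closest_before(lines, "A,103")  # A,100foo
```

Note that this sketch does not require the first column to match: a search string such as B,100 would fall back to A,104foo. Whether that is acceptable depends on the requirements the answer asks about.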

Extract 2 fields from string with search

I have a file with several lines of data. The fields are not always in the same position/column. I want to search for 2 strings and then show only the field and the data that follows. For example:
{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}
{"id":"5555","name":"6666","hwVersion":"7777"}
I would like to return the following:
"id":"1111","hwVersion":"4444"
"id":"5555","hwVersion":"7777"
I am struggling because the data isn't always in the same position, so I can't chose a column number. I feel I need to search for "id" and "hwVersion" Any help is GREATLY appreciated.
Totally agree with @KamilCuk. More specifically:
jq -c '{id: .id, hwVersion: .hwVersion}' <<< '{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}'
Outputs:
{"id":"1111","hwVersion":"4444"}
This is not quite the specified output, but it is valid JSON.
More to the point, your input should probably be processed record by record, and my guess is that a two column output with "id" and "hwVersion" would be even easier to parse:
cat << EOF | jq -j '"\(.id)\t\(.hwVersion)\n"'
{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}
{"id":"5555","name":"6666","hwVersion":"7777"}
EOF
Outputs:
1111 4444
5555 7777
Since each line looks like a mapping, and is in fact valid JSON, something like this should do if you don't mind using Python (which comes with JSON support):
import json

def get_id_hw(s):
    d = json.loads(s)
    return '"id":"{}","hwVersion":"{}"'.format(d["id"], d["hwVersion"])
We take a line of input as the string s and parse it as JSON into a dictionary d. Then we return a formatted string in which each double-quoted key, id and hwVersion, is followed by a colon and the double-quoted value of that key from the dictionary.
We can try this with these test input strings and prints:
# These will be our test inputs.
s1 = '{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}'
s2 = '{"id":"5555","name":"6666","hwVersion":"7777"}'
# we pass and print them here
print(get_id_hw(s1))
print(get_id_hw(s2))
But we can just as well iterate over lines of any input.
If you really wanted to use awk, you could, but it's not the most robust or suitable tool (gensub requires GNU awk):
awk '{ i = gensub(/.*"id":"([0-9]+)".*/, "\\1", "g")
       h = gensub(/.*"hwVersion":"([0-9]+)".*/, "\\1", "g")
       printf("\"id\":\"%s\",\"hwVersion\":\"%s\"\n", i, h) }' /your/file
Since you mention the position is not known, and assuming the fields can be in any order, we use one regex to extract id and another to get hwVersion, then we print them in the given format. If the values could be something other than decimal digits as in your example, the [0-9]+ bit would need to reflect that.
And for the fun of it (this preserves the order of the entries from the file), in sed:
sed -e 's#.*\("\(id\|hwVersion\)":"[0-9]\+"\).*\("\(id\|hwVersion\)":"[0-9]\+"\).*#\1,\3#' file
It looks for two groups of "id" or "hwVersion" followed by :"<DECIMAL_DIGITS>".

How do I create an XPath query to extract a substring of text?

I am trying to create an XPath expression that returns only the order number instead of the whole line. Please see the attached screenshot.
What you want is the substring-after() function -
fn:substring-after(string1,string2)
Returns the remainder of string1 after string2 occurs in it
Example: substring-after('12/10','/')
Result: '10'
For your situation -
substring-after(string(//p[contains(text(), "Your order # is")]), ": ")
To test this, I modified the DOM on this page to include an "Order Number: ####" string.
You could also just use your normal XPath selector to get the complete text, i.e. "Your order # is: 123456", and then run a regex on that string, as described in "Get numbers from string with regex".
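A small sketch of that regex approach in Ruby, assuming the full text has already been fetched by the XPath selector (the sample string is the one quoted above; the language of the original project is not stated):

```ruby
text = "Your order # is: 123456"

# String#[] with a regex returns the first match, or nil if nothing matches.
order_number = text[/\d+/]

puts order_number
# prints: 123456
```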

How to read a file's content and search for a string in multiple files

I have a text file that has around 100 plus entries like out.txt:
domain\1esrt
domain\2345p
yrtfj
tkpdp
....
....
I have to read out.txt, line-by-line and check whether the strings like "domain\1esrt" are present in any of the files under a different directory. If present delete only that string occurrence and save the file.
I know how to read a file line-by-line and also know how to grep for a string in multiple files in a directory but I'm not sure how to join those two to achieve my above requirement.
You can create an array with all the words or strings you want to find and then delete/replace:
strings_to_delete = ['aaa', 'domain\1esrt', 'delete_me']
Then read the file and use map to create an array with all the lines that don't match any of the elements in the array created before:
# read the file 'text.txt'
lines = File.open('text.txt', 'r').map do |line|
  # keep the line unless it matches some value in the strings_to_delete array
  line unless strings_to_delete.any? do |word|
    word == line.strip
  end
# then remove the nil elements
end.reject(&:nil?)
And then open the file again, this time for writing, and write out all the lines that didn't match any value in the strings_to_delete array:
File.open('text.txt', 'w') do |file|
  lines.each do |element|
    file.write element
  end
end
The txt file looks like:
aaa
domain\1esrt
domain\2345p
yrtfj
tkpdp
....
....
delete_me
I don't know how well it will work with a bigger file, but I hope it helps.
I would suggest using gsub here. It will run a regex search on the string and replace it with the second parameter. So if you only have to replace any single string, I believe you can simply run gsub on that string (including the newline) and replace it with an empty string:
new_file_text = text.gsub(/regex_string\n/, "")
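To tie the two halves of the question together (reading out.txt and editing files in another directory), here is a rough sketch. The method name remove_listed_strings and the demo file names are made up, and it assumes every file in the directory is a plain text file small enough to read into memory:

```ruby
require "tmpdir"

# Remove every occurrence of each listed string from every file in dir.
# list_path holds one string per line (like out.txt in the question).
def remove_listed_strings(list_path, dir)
  needles = File.readlines(list_path, chomp: true).reject(&:empty?)
  # Regexp.union escapes each string, so "\" is matched literally.
  pattern = Regexp.union(needles)

  Dir.glob(File.join(dir, "*")).each do |path|
    next unless File.file?(path)
    original = File.read(path)
    cleaned = original.gsub(pattern, "")
    # Only rewrite files that actually changed.
    File.write(path, cleaned) unless cleaned == original
  end
end

# Demo in a throwaway directory.
demo_output = nil
Dir.mktmpdir do |tmp|
  list = File.join(tmp, "out.txt")
  File.write(list, "domain\\1esrt\nyrtfj\n")
  target_dir = File.join(tmp, "files")
  Dir.mkdir(target_dir)
  target = File.join(target_dir, "a.txt")
  File.write(target, "keep domain\\1esrt this\nyrtfj\n")

  remove_listed_strings(list, target_dir)
  demo_output = File.read(target)
end

p demo_output
```

This removes the strings wherever they occur, including inside longer words, and leaves the surrounding whitespace and newlines in place. If only whole lines should be dropped, comparing stripped lines as in the first answer is the safer choice.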

Regular expression to fetch the value from a given string

I have the following string:
a=<record><FPR_AGENT_CODE>990042833</FPR_AGENT_CODE><FPR_AGENT_LABELCODE>CIF Code :</FPR_AGENT_LABELCODE><FPR_AGENT_LABELNAME>CIF Name :</FPR_AGENT_LABELNAME>
I need to get the value from:
<FPR_AGENT_CODE>990042833</FPR_AGENT_CODE>
to
"FPR_AGENT_CODE 990042833 FPR_AGENT_CODE"
How can I write the regular expression for this? I tried using the one given below, but it's not working.
puts a[/<.*>.*<\/.*>/]
You can use scan with the following regex:
/<([^>]+)>(\d+)<\/\1>/
Sample code:
a="<record><FPR_AGENT_CODE>990042833</FPR_AGENT_CODE><FPR_AGENT_LABELCODE>CIF Code :</FPR_AGENT_LABELCODE><FPR_AGENT_LABELNAME>CIF Name :</FPR_AGENT_LABELNAME><FPR_AGENT_NAME>Mr Kamal Kishore</FPR_AGENT_NAME><FPR_BANK_BRANCH_NAME>STATE BANK OF INDIA KHOUR</FPR_BANK_BRANCH_NAME><FPR_BRANCH_ADDRESS>"
puts a.scan(/<([^>]+)>(\d+)<\/\1>/)
Output:
FPR_AGENT_CODE
990042833
The regex <([^>]+)>(\d+)<\/\1> searches for a string in angle brackets (capturing the text into group 1), then a sequence of 1 or more digits (\d+), and then the closing tag.
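To build the exact string the question asks for, the two captured groups can be put back together. A minimal sketch, using a shortened copy of the sample string (the variable names m and formatted are just for illustration):

```ruby
a = "<record><FPR_AGENT_CODE>990042833</FPR_AGENT_CODE><FPR_AGENT_LABELCODE>CIF Code :</FPR_AGENT_LABELCODE>"

# match returns a MatchData object (or nil when nothing matches);
# group 1 is the tag name, group 2 the digits between the tags.
m = a.match(/<([^>]+)>(\d+)<\/\1>/)
formatted = m ? "#{m[1]} #{m[2]} #{m[1]}" : nil

puts formatted
# prints: FPR_AGENT_CODE 990042833 FPR_AGENT_CODE
```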
If you need to get multiple values, you can use:
puts a.scan(/<([^>]+\b)[^<>]*>(.*?)<\/\1>/)
Output:
FPR_AGENT_CODE
990042833
FPR_AGENT_LABELCODE
CIF Code :
FPR_AGENT_LABELNAME
CIF Name :
FPR_AGENT_NAME
Mr Kamal Kishore
FPR_BANK_BRANCH_NAME
STATE BANK OF INDIA KHOUR
For multiline input, either use the m option or replace (.*?) with ([^<]*):
puts a.scan(/<([^>]+\b)[^<>]*>(.*?)<\/\1>/m)
Or
puts a.scan(/<([^>]+\b)[^<>]*>([^<]*)<\/\1>/)
