If you run this YAML 1.1
- &first {'first': ['description', ['aliases'], ["Explanatory sentences ", "go here."]]}
- *first
- &second 'second':
    - 'description'
    - ['aliases']
    -
      - "Explanatory sentences "
      - "go here."
- *second
through YAMLlint, you get this:
---
-
  first:
    - description
    -
      - aliases
    -
      - "Explanatory sentences "
      - "go here."
-
  first:
    - description
    -
      - aliases
    -
      - "Explanatory sentences "
      - "go here."
-
  second:
    - description
    -
      - aliases
    -
      - "Explanatory sentences "
      - "go here."
- second
Notice that the first group appears twice, while the second group is shown in full only once, with just its name where the repeated block should be. The two groups contain exactly the same data; the only difference is the layout. Why doesn't the alias work for the second group?
My best guess is that the &anchor has very high precedence. I tried this:
- &first 'first': ['description', ['aliases'], ["Explanatory sentences ", "go here."]]
- *first
Rather than this:
- &first {'first': ['description', ['aliases'], ["Explanatory sentences ", "go here."]]}
- *first
And suddenly it behaved the same way as the second group. So it appears that unless you explicitly wrap 'first' in a larger node, the &first anchor attaches only to the 'first' string.
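A minimal sketch of the same behavior in Ruby, assuming Psych (Ruby's bundled YAML library) on Ruby 2.6 or later, where YAML.safe_load accepts an aliases: keyword. It loads the layout from the question next to a variant that puts the anchor on its own line, so that the anchor attaches to the whole mapping instead of to the key. The variable names are illustrative only.

```ruby
require 'yaml'

# Anchor on the same line as the implicit key: it attaches to the key scalar 'second'.
on_key = <<~YAML
  - &second 'second':
      - 'description'
      - ['aliases']
  - *second
YAML

# Anchor on its own line: it attaches to the block mapping that follows it.
on_mapping = <<~YAML
  - &second
    'second':
      - 'description'
      - ['aliases']
  - *second
YAML

# aliases: true is needed because safe_load rejects aliases by default.
key_result     = YAML.safe_load(on_key, aliases: true)
mapping_result = YAML.safe_load(on_mapping, aliases: true)

p key_result[1]      # the alias resolves to the plain string "second"
p mapping_result[1]  # the alias resolves to the whole "second" mapping
```

In both documents the first entry is the same mapping; only the node the anchor is attached to differs.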
Here is a small example:
02-09-17 1:01 PM - Some User (Add comments)

Hello,

How are you?

Regards,

02-09-17 3:29 PM - Another User (Add comments)

Hey,

Thanks, all is fine.

Some another text here.

02-09-17 4:30 AM - Just a User (Add comments)

some text
with
multiline
I want to parse and process these three comments. What is the best way to do this?
I tried a regex like this one: http://www.rubular.com/r/k1CHJ1STTD, but I have problems with the /m flag. Without the multiline flag, the regex can't capture the "body" of a comment.
I also tried splitting with a regex:
text_above.split(/^(\d{1,2}-\d{1,2}-\d{2} \d{1,2}:\d{1,2} [AP]M - .+ \(Add comments\))/)
=> ["",
"02-09-17 1:01 PM - Some User (Add comments)",
"\n" + "Hello,\n" + "\n" + "How are you?\n" + "\n" + "Regards,\n" + "\n",
"02-09-17 3:29 PM - Another User (Add comments)",
"\n" + "Hey,\n" + "\n" + "Thanks, all is fine.\n" + "\n" + "Some another text here.\n" + "\n",
"02-09-17 4:30 AM - Just a User (Add comments)",
"\n" + "some text\n" + "with\n" + "multiline\n" + "\n",
"02-09-17 5:29 PM - Another User (Add comments)",
"\n" + "Hey,\n" + "\n" + "Thanks, all is fine.\n" + "\n" + "Some another text here.\n" + "\n",
"02-09-17 6:30 AM - Just a User (Add comments)",
"\n" + "some text\n" + "with\n" + "multiline\n"]
But this is not a convenient solution.
Ideally, I'd like regex captures with two or three groups, for example:
1. 02-09-17 1:01 PM
2. Some User (Add comments)
3. Hello,
How are you?
Regards,
for each comment, or an array of comments:
[['02-09-17 1:01 PM - Some User (Add comments) Hello,
How are you?
Regards,'],[...]]
Any ideas? Thanks.
You can keep it simple using two splits (one for the whole string and one for each block):
text.split(/\n\n(?=\d\d-)/).map { |m| m.split(/ - |\n/, 3) }
You can also use the scan method, but it's a bit more intricate:
text.scan(/([\d-]+[^-]+) - (.*)\n(.*(?>\n.*)*?(?=\n\n\d\d-|\z))/)
slice_before might be easier to understand than a huge scan, and it has the advantage of keeping the line that matches the pattern (split removes it):
data = text.each_line.slice_before(/^\d\d\-\d\d\-\d\d/).map do |block|
  time, user = block.shift.strip.split(' - ')
  [time, user, block.join.strip]
end
p data
# [["02-09-17 1:01 PM",
# "Some User (Add comments)",
# "Hello,\n\nHow are you?\n\nRegards,"],
# ["02-09-17 3:29 PM",
# "Another User (Add comments)",
# "Hey,\n\nThanks, all is fine.\n\nSome another text here."],
# ["02-09-17 4:30 AM",
# "Just a User (Add comments)",
# "some text\nwith\nmultiline"]]
You can use this regular expression:
(\d{2}-\d{2}-\d{2} \d{1,2}:\d{2} (?:AM|PM)) - (.*?)\r?\n((?:.|\r?\n)+?)(?=\r?\n\d{2}-\d{2}-\d{2} \d{1,2}:\d{2} (?:AM|PM) - |$)
(\d{2}-\d{2}-\d{2} \d{1,2}:\d{2} (?:AM|PM)) matches the first group, the date and time. The date must consist of three two-digit numbers separated by dashes, followed by the time with AM/PM.
(.*?)\r?\n((?:.|\r?\n)+?) matches the username up to the first line break (\r?\n) as the second group. After that, anything, including line breaks, is matched as the third group, the comment.
On its own, this won't work, because it would treat everything from the start of the comment up to the end of the file as one comment. The match therefore has to stop at the next date/time header. Simply repeating the date/time pattern after the comment and matching non-greedily does not work either: the next header would be consumed by the current match and would then be missing from the next one, so every second comment would be skipped. To avoid this, use a positive lookahead: (?=\r?\n\d{2}-\d{2}-\d{2} \d{1,2}:\d{2} (?:AM|PM) - |$). It requires the next header to follow but does not include it in the match. The last comment instead ends at the end of the string ($).
You need to use the global flag /g but mustn't use the multiline flag /m: the comment spans multiple lines, and with /m the $ would already match at the end of its first line.
Here is a live example: https://regex101.com/r/o63GQE/2
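The expression above is written for regex101's PCRE flavor. A sketch of the same idea in Ruby, which the question uses, needs two adjustments: Ruby has no /g flag (String#scan already returns every match), and in Ruby $ always matches at the end of any line, so the final alternative is replaced with \z (end of string). The sample text assumes the blank lines visible in the asker's split output; the constant and variable names are illustrative.

```ruby
# Sample input from the question (blank lines separate paragraphs and comments).
text = <<~TEXT
  02-09-17 1:01 PM - Some User (Add comments)

  Hello,

  How are you?

  Regards,

  02-09-17 3:29 PM - Another User (Add comments)

  Hey,

  Thanks, all is fine.

  Some another text here.

  02-09-17 4:30 AM - Just a User (Add comments)

  some text
  with
  multiline
TEXT

# Date/time part of a comment header (no capturing groups of its own).
HEADER = /\d{2}-\d{2}-\d{2} \d{1,2}:\d{2} (?:AM|PM)/

# Group 1: date/time, group 2: user, group 3: body up to the next header or \z.
# \z is used instead of $, because in Ruby $ matches at every line end.
COMMENT_RE = /(#{HEADER}) - (.*?)\r?\n((?:.|\r?\n)+?)(?=\r?\n#{HEADER} - |\z)/

# scan returns one [time, user, body] array per match; strip the blank lines around the body.
comments = text.scan(COMMENT_RE).map do |time, user, body|
  [time, user, body.strip]
end

comments.each { |time, user, body| puts "#{time} | #{user} | #{body.inspect}" }
```

Because the lookahead does not consume the next header, scan resumes right before it and no comment is skipped.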
I am writing a Ruby gem that converts the output of an SQL query to YAML using YAML::dump. However, when testing it with PostgreSQL, I find that integers are output with single quotes around them (as strings) unless they start with a zero. Below is the code snippet with the call to YAML::dump and some of the resulting data.
db_object = {}
db_output = {}
full_table = ActiveRecord::Base.connection.execute("SELECT * FROM #{model};")
keys = full_table[0].keys
db_object["columns"] = keys
model_arr=[]
full_table.each do |row|
  model_arr << row.values_at(*keys)
end
db_object["records"] = model_arr
db_output[model] = db_object
YAML::dump(db_output, file)
And here are the first couple rows of results:
schema_migrations:
  columns:
  - version
  records:
  - - '20121225230020'
  - - '20121225230129'
---
students:
  columns:
  - id
  - first_name
  - last_name
  - date_of_birth
  - rank
  - phone
  records:
  - - '1'
    - Celestino
    - Towne
    - '2007-09-20'
    - '2'
    - '6417358360'
Any insight would be much appreciated.
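Judging by the dump, the values returned by execute are most likely Strings (the pg gem returns column values as text unless a type map is configured), and YAML quotes a String that would otherwise read back as a number, so that it round-trips as a String. A minimal sketch, assuming you know which columns hold integers: the sample row and the INTEGER_COLUMNS list below are hypothetical stand-ins, not part of the gem or of ActiveRecord.

```ruby
require 'yaml'

# Hypothetical row shaped like the one in the question: every value is a String.
row = { "id" => "1", "first_name" => "Celestino", "rank" => "2", "phone" => "6417358360" }

# Strings that look like numbers are quoted so they load back as Strings.
puts YAML.dump(row.values)

# Hypothetical list of integer columns; a real gem would derive it from the schema.
INTEGER_COLUMNS = %w[id rank].freeze

# Convert only the listed columns; phone numbers and dates stay Strings.
typed = row.map { |col, val| INTEGER_COLUMNS.include?(col) && val ? Integer(val, 10) : val }

puts YAML.dump(typed)
```

Casting per column rather than per value keeps digit-only strings such as phone numbers from silently becoming integers.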
I'm working with some regular expression matching and I'm trying to figure out how to exclude a specific character pattern. Specifically, I want to exclude the following pattern:
"5 - " (in words: digit, space, dash, space)
I know how to exclude the components individually: [^5 ^-] but I'm looking to exclude the specific pattern. Is this possible?
Update: I'm using Ruby as my programming language.
Here is some sample input and the desired output:
Input: 1 - Blue-Stork Stables; 2 - Young, Robert, S.; 3 - Seahorse Stable; 4 - Carney, Elvis; 5 - Guerrero, Juan, Carlos-Martin; 6 - Dubb, Michael; 7 - Summers, Hope; 8 - DTH Stables; 9 - Peebles, Matthew\n
the desired output would be:
Output: Blue-Stork Stables; Young, Robert, S.; Seahorse Stable; Carney, Elvis; Guerrero, Juan, Carlos-Martin; Dubb, Michael; Summers, Hope; DTH Stables; Peebles, Matthew\n
Please take note of the dashes on Blue-Stork Stables and Juan Carlos-Martin.
EDIT: So you mean "remove", not "exclude". No problem:
result = subject.gsub(/\d+ - /, '')
transforms your input into the desired output. I've taken the liberty to allow more than one digit (after all, if numbers reach 10 or higher, you probably want to remove those entirely, too. Right?).
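A quick check of that gsub call against the input from the question (the trailing \n is omitted here). The hyphens in Blue-Stork and Carlos-Martin survive because they are not surrounded by spaces, so the pattern never matches them.

```ruby
subject = "1 - Blue-Stork Stables; 2 - Young, Robert, S.; 3 - Seahorse Stable; " \
          "4 - Carney, Elvis; 5 - Guerrero, Juan, Carlos-Martin; 6 - Dubb, Michael; " \
          "7 - Summers, Hope; 8 - DTH Stables; 9 - Peebles, Matthew"

# Remove every "<digits> - " prefix; hyphens inside names have no surrounding spaces.
result = subject.gsub(/\d+ - /, '')
puts result
```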
(Old answer for "historical reasons")
Depending on what you mean by "exclude", it appears that you're looking for negative lookahead assertions:
^(?!.*\d - )
will fail on strings that contain 5 - (or any other digit followed by " - ") anywhere and succeed on all other strings:
"5 - "        # fail
"5 -"         # match
"abc5 - xyz"  # fail
"foobar5 - "  # fail
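Since the question uses Ruby, one caveat: in Ruby, ^ matches at the start of every line and . stops at newlines, so ^(?!.*\d - ) can still match a multi-line string whose later line contains the pattern. A sketch that anchors with \A and adds the /m flag (which in Ruby lets . match newlines) to check the whole string instead; the constant names are illustrative.

```ruby
LINE_CHECK   = /^(?!.*\d - )/    # the pattern from the answer
STRING_CHECK = /\A(?!.*\d - )/m  # anchored at the start of the whole string

samples = ["5 - ", "5 -", "abc5 - xyz", "foobar5 - ", "line one\n5 - x"]

# Both patterns agree on single-line input; they differ on the last sample.
samples.each do |s|
  puts "#{s.inspect}: line=#{LINE_CHECK.match?(s)} string=#{STRING_CHECK.match?(s)}"
end
```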
items:
house:
- bathroom:
- toothbrush
- soap
- bedroom:
- bed:
- pillow
- sheet
- closet:
- clothes:
- underwear
- socks
garden:
- treehouse:
- toys:
- nerfgun
- car
- window
- garage:
- car
- toolbox:
- hammer
- scewdriver
- pliers
- lawnmower
Here is another try at this document; it has no compound list (I guess that's what it's called).
items2:
house:
- bathroom:
- toothbrush
- soap
- bedroom:
- bed:
- pillow
- sheet
- closet:
- clothes:
- underwear
- socks
Which of these two YAML documents is valid? I'm also wondering whether I can use a list of keyed lists like this (a nested list?):
items:
- list1:
-itemA
-itemB
- list2:
-itemC
-itemD
You can use this to check whether your YAML is OK: YAMLlint
It seems OK.
Yes, it's valid YAML (well, the first two are; in the third, make sure that you have a space after your - in the sequences), but it may not do exactly what you think. In your toy example:
items:
  - list1:
      - itemA
      - itemB
  - list2:
      - itemC
      - itemD
the value associated with items is a sequence; and each entry of that sequence is a map with a single key/value pair (for the first entry, the key is list1, and in the second, list2).
What may have confused you in your first real example is how to access each element. Since you tagged this yaml-cpp, here's how you would get, say, the list of toys in the treehouse from your first example:
doc["items"]["garden"][0]["treehouse"][0]["toys"];
(Note the [0] before accessing the "treehouse" and "toys" keys.)
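For comparison, a sketch of the toy example loaded in Ruby with YAML.safe_load (an assumption: the question is about yaml-cpp, but the node structure is the same in any conforming parser). It shows that items is an Array of one-key Hashes, so you index with [0] before the key, and how to merge the entries if a single Hash keyed by list name is what you actually wanted.

```ruby
require 'yaml'

doc = YAML.safe_load(<<~YAML)
  items:
    - list1:
        - itemA
        - itemB
    - list2:
        - itemC
        - itemD
YAML

# items is a sequence whose entries are single-pair mappings.
first_list = doc["items"][0]["list1"]

# Merge the one-key hashes if you want to look lists up by name.
by_name = doc["items"].reduce({}, :merge)

p first_list
p by_name["list2"]
```

If lookup by name is the goal, writing list1 and list2 directly as keys of items (without the leading -) avoids the merge step.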
I am trying to handle an array with Ruby v1.9.2 but it has some strange behavior.
The easiest way to explain it is with examples:
CASE 1 TEST
@test1 = "image/bmp, image/gif, image/jpg".split(',')
Debug @test1:
---
- image/bmp # why this?!
- " image/gif"
- " image/jpg"
CASE 2 TEST
@test2 = ", image/bmp, image/gif, image/jpg".split(',')
Debug @test2:
---
- "" # why this?!
- " image/bmp"
- " image/gif"
- " image/jpg"
WHAT I NEED
Note: I could use the CASE 2 TEST approach, but I would like to do this properly.
Debug that I would like to have:
---
- " image/bmp"
- " image/gif"
- " image/jpg"
In the test case 1 there is no space before "image/bmp" in the result because there is no space before "image/bmp" in the original string.
In the test case 2 there is an empty string at the beginning because the string starts with a comma, and for every separator in the string there is a string in the resulting array, containing what comes before that separator (which in this case means the empty string).
If you want the result you've shown, you could just add a space (but no comma) before "image/bmp" in the source string. Alternatively, you could split on /, */ and then use map to add one space before each string. Though frankly, I don't see why you want a space before each string.
>> ", image/bmp, image/gif, image/jpg".split(/\s*,\s*/).select{|x| x!=""}
=> ["image/bmp", "image/gif", "image/jpg"]
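A short sketch of the split-and-clean approach. String#split with the default limit drops trailing empty fields but keeps leading ones, which is why the leading comma in case 2 produces "". Stripping each piece and rejecting empties handles both inputs; mime_types is a hypothetical helper name, and the final map only adds the leading space shown in the question's desired output, in case it is really needed.

```ruby
# Hypothetical helper: split on commas, trim surrounding whitespace, drop empty fields.
def mime_types(list)
  list.split(",").map(&:strip).reject(&:empty?)
end

case1 = mime_types("image/bmp, image/gif, image/jpg")
case2 = mime_types(", image/bmp, image/gif, image/jpg")

# Only if a leading space is really wanted, as in the question's desired output.
with_space = case1.map { |type| " #{type}" }

p case1, case2, with_space
```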