How to identify paragraph and store it in an Array - elisp

I have two paragraphs in a buffer which has only simple text in it:
PARAGRAPH 1
PARAGRAPH 2
I need to read each paragraph from its first character to its last word and store it in an array. This should be done for each paragraph. How can I identify paragraphs if there are no extra markup tags?
If this is not possible, and I instead ask the user to press Enter twice after each paragraph, how can I split the text on those breaks? I tried a regex but it doesn't work.

Here's one way to do it:
(let ((input-text "this is a sample paragraph.
this is another paragraph"))
(apply #'vector (split-string input-text "\n")))
split-string is an easy way to divide text on a regular expression. The example above splits on a single newline; if paragraphs are separated by blank lines, split on "\n\n+" instead.
To convert the list of results to an array I use the function 'vector', which makes an array from the parameters passed to it. To pass the contents of the list to that function instead of the list itself, I use 'apply'.

In case some other newcomer to elisp has the same problem: I found a way to split the text using a while loop like the one below:
(while (re-search-forward "[ \n][ \n]$" nil t)
..... ..... .......
)
But I am still not sure how to put each paragraph into an array inside the loop.
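The splitting-and-collecting step can be sketched outside elisp as well. The following is a minimal Python illustration, not the elisp solution itself; the function name split_paragraphs is hypothetical, and it assumes paragraphs are separated by one or more blank lines (the "press Enter twice" convention from the question).

```python
import re

def split_paragraphs(text):
    """Return the paragraphs of text as a list (assumes blank-line separators)."""
    # A paragraph break is a newline followed by one or more lines that
    # contain only spaces or tabs.
    parts = re.split(r"\n(?:[ \t]*\n)+", text.strip())
    # Drop empty pieces so an empty input yields an empty list.
    return [part for part in parts if part]

sample = "this is a sample paragraph.\nit has two lines.\n\nthis is another paragraph"
paragraphs = split_paragraphs(sample)
```

Here the list plays the role of the array: each element holds one paragraph, from its first character to its last word.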

Related

Apache NiFi: Extracting nth column from a csv [duplicate]

I need a regular expression that can be used to find the Nth entry in a comma-separated list.
For example, say this list looks like this:
abc,def,4322,mail@mailinator.com,3321,alpha-beta,43
...and I wanted to find the value of the 6th entry (alpha-beta).
My first thought would not be to use a regular expression, but to use something that splits the string into an array on the comma. But since you asked for a regex:
Most regex engines let you specify a minimum or maximum number of repetitions, so something like this would probably work.
/(?:[^\,]*,){5}([^,]*)/
This is intended to match any number of non-comma characters followed by a comma, exactly five times: (?:[^,]*,){5}. The ?: makes that group non-capturing. It then matches and captures any number of non-comma characters: ([^,]*). You want to use the first capture group.
Let me know if you need more info.
EDIT: I edited the above to not capture the first part of the string. This regex works in C# and Ruby.
You could use something like:
([^,]*,){$m}([^,]*),
As a starting point. (Replace $m with the value of (n-1).) The content would be in capture group 2. This doesn't handle things like lists of size n, but that's just a matter of making the appropriate modifications for your situation.
@list = split /,/ => $string;
$it = $list[5];   # zero-based index: 5 is the 6th entry (alpha-beta)
or just
$it = (split /,/ => $string)[5];
Beats writing a pattern with a {5} in it every time.
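Both approaches from the answers above can be compared side by side. This is a minimal Python sketch, not NiFi configuration; the function names nth_field_regex and nth_field_split are hypothetical, n is 1-based, and the code assumes a plain comma-separated line with no quoted fields or embedded commas.

```python
import re

def nth_field_regex(line, n):
    """Return the n-th (1-based) field using a regex, or None if it is missing."""
    # Skip n-1 "field," groups without capturing, then capture the next field.
    pattern = r"(?:[^,]*,){%d}([^,]*)" % (n - 1)
    match = re.match(pattern, line)
    return match.group(1) if match else None

def nth_field_split(line, n):
    """Return the n-th (1-based) field by splitting on commas, or None."""
    fields = line.split(",")
    return fields[n - 1] if 1 <= n <= len(fields) else None

line = "abc,def,4322,mail@mailinator.com,3321,alpha-beta,43"
sixth_by_regex = nth_field_regex(line, 6)
sixth_by_split = nth_field_split(line, 6)
```

For real CSV data with quoting, a proper CSV parser is the safer choice; neither function above handles that case.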

How to find if a given input is a meaningful word or not

I am trying to solve a problem where I am given a text file of words, each followed by a newline character.
I need to write a function that takes a string as input and returns true if it is a meaningful word, and false otherwise.
My attempt is to read through the text file and build a hash of the words. If the input exists in the hash I return true, otherwise false. But the hash has a space complexity of O(n). How else can we achieve this?
Please help me with a solution.
You could try breaking up your text file into various text files. If your text file has words ranging from A-Z try breaking that up in a meaningful way so that you are only sorting through a subsection of those words instead of the whole dictionary. As others have pointed out we are not here to write the code for you so please post what you have tried so far so we can help!
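The partitioning idea can be sketched as follows. This is a minimal Python illustration under stated assumptions: the names build_buckets and is_word are hypothetical, words are bucketed by their first letter, and an in-memory dictionary stands in for the separate per-letter files the answer describes. In a real setup only the one matching file would be read, which is where the memory saving comes from.

```python
from collections import defaultdict

def build_buckets(words):
    """Group words by first letter; each bucket stands in for one smaller file."""
    buckets = defaultdict(set)
    for word in words:
        word = word.strip().lower()
        if word:
            buckets[word[0]].add(word)
    return buckets

def is_word(buckets, candidate):
    """Check only the bucket that matches the candidate's first letter."""
    candidate = candidate.strip().lower()
    if not candidate:
        return False
    return candidate in buckets.get(candidate[0], set())

# Stand-in for the lines of the word file.
buckets = build_buckets(["apple\n", "banana\n", "Cherry\n"])
```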

Verifying searched text displayed is in a single line

How can I test whether a sentence (combination of four or five words) is displayed in a single line?
I have to search with a name or some other fields. After search results are displayed, I should test whether the displayed text is a single line. For example, the code below is used to verify the search result link:
//ol[contains(@class,'search results')]/li[contains(@class,'mod result') and contains(@class,'XXXXXX')]//a[contains(@href,'trk=XXXXXX')]
I am not familiar with Ruby, but the following Java approach should carry over to any language binding.
Assuming that your "sentence" is entirely contained in one element, you could find all occurrences with something like:
driver.findElements(By.xpath("//*[text()='your sentence']"))
Then simply test for the size of the array.
Assuming that a single or multiple lines will be contained within a single DOM element, you could use the vertical component of the element size to check for the multiple line condition.
webElement.getSize()

Xpath - joining text within hrefs

<ul><li>
<a href="...">A name of something</a>
by <a href="...">Hello World</a> and <a href="...">Goodbye for now</a> (1975)
</li></ul>
I cannot seem to grab the text from this loop so it returns me:
"Hello World and Goodbye for now"
How do I make xpath grab these bits of text and put them together? I tried concat and also using a pipe between the two queries:
/ul/li/a[2]|/ul/li/a[3]
concat(/ul/li/a[2]|/ul/li/a[3])
But this doesn't seem to provide me with the text I want.
You can get Hello World using /ul/li/a[2], since it is the content of the second a.
Goodbye for now is at /ul/li/a[3].
The word "and" is not in an a element but in the li. It is the third text node (the first is a line break, the second is a line break plus "by"). You can retrieve it with /ul/li/text()[3].
To obtain the sentence you want you can use a simple concat:
concat(/ul/li/a[2],/ul/li/text()[3],/ul/li/a[3])
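The same joining can be sketched in Python with the standard library. This is an illustration under assumptions: the markup below guesses that the three phrases are wrapped in a elements, as the answer implies, and the attributes are omitted. ElementTree does not support concat() or text(), so the sketch reads the second link's tail, which is the text node that /ul/li/text()[3] selects.

```python
import xml.etree.ElementTree as ET

# Assumed markup: three <a> elements inside one <li>.
snippet = """<ul><li>
<a>A name of something</a>
by <a>Hello World</a> and <a>Goodbye for now</a> (1975)
</li></ul>"""

root = ET.fromstring(snippet)      # root is the <ul> element
links = root.findall("./li/a")     # all <a> children of the <li>
second, third = links[1], links[2]

# second.tail is the text between the second and third <a>: " and "
joined = second.text + second.tail + third.text
```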

Using an "uncommon" delimiter for creating arrays in Ruby on Rails

I am building an app in Ruby on Rails in which I am pulling in content from another file, and I wonder whether there's a simple way to create a unique delimiter for separating string content, or whether there's another approach I should take.
Let's say I have a paragraph of text I'd like to pull in, and let's say I don't know what the text will contain.
What I would like to do is put some sort of delimiter at, let's say, 5 random points in the paragraph so that, later on, an array can be created in which content up to that delimiter can be separated out into an individual element.
For a bit of context, let's say I have a paragraph pulled in as a string:
Hello, this is a paragraph of text which will be delimited. Goodbye.
Now, let's say I add a delimiter at various points, as follows (I know how to do this in code):
Hello, this [DELIMITER] is a paragraph [DELIMITER] of text which [DELIMITER] will [DELIMITER] be delimited. Goodbye.
Again, I know how to do this, but let's say I'm able to use the above to create an array as follows:
my_array = ["Hello, this", "is a paragraph", "of text which", "will", "be delimited. Goodbye."]
I'm confident of achieving all of the above. The challenge I'm having is: what should my delimiter be?
Normally, commas are used as delimiters but, if the text already includes a comma, this will split the text where I do not want it split. In the above example, the comma between "Hello" and "this" would cause the "Hello, this" element to be split into "Hello" and "this", which is not what I want.
What I have thought of doing is using a random (hex) number generator to create a new delimiter each time the page is loaded, e.g. "Hello, this 023ABCDEF is a paragraph 023ABCDEF...", but I'm not sure this is the correct approach.
Is there a simpler solution?
Multipart mime messages take (more or less) the approach of a GUID separator; it's adequate.
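The GUID-separator idea can be sketched as follows. This is a minimal Python illustration rather than Rails code; the function names are hypothetical, and the bracketed format of the delimiter is an arbitrary choice. It generates a fresh random delimiter, checks that the delimiter does not already occur in the content, and then joins and splits on it.

```python
import uuid

def make_delimiter():
    """Return a random delimiter that is very unlikely to occur in ordinary text."""
    return "[%s]" % uuid.uuid4().hex

def join_with_delimiter(chunks, delimiter):
    """Join chunks with the delimiter, refusing content that already contains it."""
    for chunk in chunks:
        if delimiter in chunk:
            raise ValueError("delimiter collides with content; generate a new one")
    return delimiter.join(chunks)

def split_on_delimiter(text, delimiter):
    """Recover the original chunks."""
    return text.split(delimiter)

chunks = ["Hello, this ", "is a paragraph ", "of text which ", "will ", "be delimited. Goodbye."]
delimiter = make_delimiter()
marked = join_with_delimiter(chunks, delimiter)
recovered = split_on_delimiter(marked, delimiter)
```

Because the delimiter is generated per use, it has to be stored alongside the marked text so the later split can find it.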
I view this as a different type of problem, though, closer to a text editor marking sections of text bold, or italic, etc. That can be handled via string parsing (a la Markdown, SO's formatting) or data structures.
The text editor approach is generally more flexible, and instead of a simple collection of strings, uses a collection (or tree) of structures that hold metadata about the section (type, formatting, whatever).
The best approach depends on your needs:
Are sections nestable?
Will this be rendered?
If so, do section "types" need specific rendering?
Are there section "types", or are they all the same?
Will the text in question be edited before, during, or after sectioning?
Etc.

Resources