Complex CoreData query - sorting

I am very new to CoreData and would like to do something like this.
I have two entities:
Cave.title Condition.date
Cave.conditions <-->> Condition.cave
I need to query all Conditions and sort them by date (latest first).
Then I need to get each one's cave.title, but each cave should appear only once (with its latest condition).
Example
Condition1 (06.09.2011) - Cave1
Condition2 (05.09.2011) - Cave3
Condition3 (05.09.2011) - Cave1
Condition4 (04.09.2011) - Cave5
This should display like this:
Cave1 (06.09.2011)
Cave3 (05.09.2011)
Cave5 (04.09.2011)
Any idea on how I could get this done?
In SQL I would do it like this
SELECT DISTINCT c.title as title, c.caveID as caveID, cn.countryshort as countryshort, MAX(cc.divedate) as divedate
FROM caves as c, countries as cn, caveconditions as cc
WHERE cn.countryID = c.countryID
AND c.caveID = cc.caveID
GROUP BY c.title
ORDER BY divedate DESC;
Output
2011-09-08 13:39:24.951 CaveConditions[23026:11903] Chaudanne (1287350157)
2011-09-08 13:39:24.952 CaveConditions[23026:11903] Sorgente Bossi (1287333080)
2011-09-08 13:39:24.953 CaveConditions[23026:11903] Elefante Bianco (1287248755)
2011-09-08 13:39:24.953 CaveConditions[23026:11903] Cogol dei Siori - Oliero (1287248678)
2011-09-08 13:39:24.954 CaveConditions[23026:11903] Source du Lison (1287324493)
2011-09-08 13:39:24.955 CaveConditions[23026:11903] Resurgénce de Gouron (1287324296)
2011-09-08 13:39:24.955 CaveConditions[23026:11903] Fontaine du Truffe (1287006107)
2011-09-08 13:39:24.956 CaveConditions[23026:11903] Gouffre de Cabouy (1287005780)
2011-09-08 13:39:24.957 CaveConditions[23026:11903] Emergence du Ressel (1286908470)
2011-09-08 13:39:26.037 CaveConditions[23026:11903] Source de l'Orbe (1287175659)
2011-09-08 13:39:26.120 CaveConditions[23026:11903] Bätterich (1286812411)
2011-09-08 13:39:26.220 CaveConditions[23026:11903] Cogol dei Siori - Oliero (1286787535)
2011-09-08 13:39:26.288 CaveConditions[23026:11903] Fontaine de Saint Georges (1286744641)
2011-09-08 13:39:26.379 CaveConditions[23026:11903] Source du Doubs (1286736293)
2011-09-08 13:39:26.480 CaveConditions[23026:11903] Source Bleue (Montperreux) (1286736150)
2011-09-08 13:39:26.613 CaveConditions[23026:11903] Source Bleue Cusance (1286814108)
2011-09-08 13:39:26.796 CaveConditions[23026:11903] Fontaine de Saint Georges (1286652629)
2011-09-08 13:39:27.096 CaveConditions[23026:11903] Source de l'Orbe (1286735940)
2011-09-08 13:39:27.846 CaveConditions[23026:11903] Gouffre de Cabouy (1286568932)

I would fetch once to get all the conditions sorted by date and then do some logic to filter the fetched data to get the titles.
If you need to see some code let me know.
Edit: OK let me try..
NSFetchRequest *fetch = [[NSFetchRequest alloc] init];

NSSortDescriptor *sort = [[NSSortDescriptor alloc] initWithKey:@"date" ascending:NO];
[fetch setSortDescriptors:[NSArray arrayWithObject:sort]];

NSEntityDescription *entity = [NSEntityDescription entityForName:@"Condition"
                                     inManagedObjectContext:self.managedObjectContext];
[fetch setEntity:entity];

// returnsDistinctResults only takes effect together with
// NSDictionaryResultType and propertiesToFetch.
[fetch setResultType:NSDictionaryResultType];
[fetch setReturnsDistinctResults:YES];
[fetch setPropertiesToFetch:[NSArray arrayWithObject:@"cave.title"]];

NSError *error = nil;
NSArray *results = [self.managedObjectContext executeFetchRequest:fetch error:&error];

[fetch release];
[sort release];
I'm not sure, as I haven't tried it myself, but I think this should work. Let me know how it goes.
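To then reduce the fetched Conditions to one entry per cave (the "do some logic to filter the fetched data" step mentioned above), here is a rough, untested sketch. It assumes the fetch request is set up as above but as a plain Condition fetch sorted by date descending (i.e. leaving out the resultType/distinct/propertiesToFetch lines), so that managed objects come back, and it uses the cave relationship and title/date attributes from the question:

NSArray *conditions = [self.managedObjectContext executeFetchRequest:fetch error:&error];

// Walk the date-sorted conditions and keep only the first
// (i.e. latest) condition seen for each cave.
NSMutableArray *latestPerCave = [NSMutableArray array];
NSMutableSet *seenCaveTitles = [NSMutableSet set];
for (NSManagedObject *condition in conditions) {
    NSString *title = [condition valueForKeyPath:@"cave.title"];
    if (![seenCaveTitles containsObject:title]) {
        [seenCaveTitles addObject:title];
        [latestPerCave addObject:condition];
        NSLog(@"%@ (%@)", title, [condition valueForKey:@"date"]);
    }
}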


termvector counting keywords in Elasticsearch

I am using elasticsearch 1.5.2.
I want to get the number of times each keyword is repeated in the products stored in my index.
The keywords are: sauces, crèmes, gâteaux.
These are my products:
POST test/prod
{
"titre": "product 1",
"catégorie": "Epicerie",
"informations": "Sauces, potages, crèmes & gâteaux. / Composé exclusivement d'amidon de maïs, il vous permet d'alléger vos gâteaux et crêpes ou bien d'épaissir vos sauces, soupes et crèmes. Il rend vos desserts onctueux et légers : il suffit simplement de remplacer la moitié de votre farine par la fécule de maïs. // Poids net : 0,400 kg // Veuillez contacter notre responsable de mise en marché : Distribué par Système U - BP 30159 - 94533 Rungis cedex"
}
POST test/prod
{
"titre": "product 2",
"catégorie": "Fruits",
"informations": "A conserver dans un endoit frais et sec. Idée recette: Madeleines fourrées au chocolat Ingrédients (pour une vingtaine de madeleines) : 100g de farine, 90g fécule de maïs,1 sachet de levure, 3 oeufs,125g de sucre,190g de beurre,1cuillère à soupe d'eau de fleur d'oranger, 1 vingtaine de carrés de chocolat Préparation : Mélangez la farine, la fécule et la levure. Ajoutez le sucre et les oeufs battus. Faites fondre le beurre à feu doux et incorporer le au précédent mélange. Mélangez bien et laissez reposer la pâte une heure au réfrigérateur. Préchauffez le four à 220°C. Beurrez les moules à madeleine, garnissez les moules de moitié, ajoutez au centre un petit carré de chocolat et recouvrez de l'appareil à madeleine. Enfournez pour 5 minutes puis baissez le four à 180°C et poursuivez la cuisson 5 minutes. Sortez les madeleines, laissez-les tiédir et démoulez-les."
}
I want to get a result in descending order, for example like this:
product 1:
sauces repeated :9
crèmes repeated :8
gâteaux repeated: 2
product 2:
sauces repeated :8
crèmes repeated :6
gâteaux repeated: 4
I used termvector on only one field; however, I want to get the result from all documents. It was like this:
POST test/prod/1/_termvector
{
  "fields" : ["informations"],
  "offsets" : true,
  "payloads" : true,
  "positions" : true,
  "term_statistics" : true,
  "field_statistics" : true
}
But as a result I get a very long list containing all the words in this field. I only want my 3 keywords in the result.

Replace all non capitalised words with ruby

I would like to replace every non-capitalised word in a text with "-" repeated to the length of the word.
For instance, I have the following text (German):
Florian Homm wuchs als Sohn des mittelständischen Handwerksunternehmers Joachim Homm und seiner Frau Maria-Barbara „Uschi“ Homm im hessischen Bad Homburg vor der Höhe auf. Sein Großonkel mütterlicherseits war der Unternehmer Josef Neckermann. Nach einem Studium an der Harvard University, das er mit einem Master of Business Administration an der Harvard Business School abschloss, begann Homm seine Tätigkeit in der US-amerikanischen Finanzwirtschaft bei der Investmentbank Merrill Lynch, danach war er bei dem US-Fondsanbieter Fidelity Investments, der Schweizer Privatbank Julius Bär und dem US-Vermögensverwalter Tweedy Browne....
should be transformed into
Florian Homm ---- --- Sohn --- ------------ Handwerksunternehmers Joachim Homm --- ------ Frau Maria-Barbara „Uschi“ Homm -- ---------- Bad Homburg --- Höhe ---. ....
▶ input.gsub(/\p{L}+/) { |m| m[0] != m[0].upcase ? '-'*m.length : m }
#⇒ "Florian Homm ----- --- Sohn --- ------------------ Handwerksunternehmers..."
A cleaner solution (credits to Cary):
▶ input.gsub(/(?<!\p{L})\p{Lower}+(?!\p{L})/) { |m| '-' * m.length }
Try something like this
s.split.map { |word| ('A'..'Z').include?(word[0]) ? word : '-' * word.length }.join(' ')
You can try something like this for small input size:
Basically, I:
Split the input string on whitespace characters
Map the array to either the word itself (if capitalized) or the word replaced with dashes (if not capitalized)
Join with spaces.
Like so
s = "Florian Homm wuchs als Sohn des mittelständischen Handwerksunternehmers Joachim Homm und seiner Frau Maria-Barbara „Uschi“ Homm im hessischen Bad Homburg vor der Höhe auf. Sein Großonkel mütterlicherseits war der Unternehmer Josef Neckermann. Nach einem Studium an der Harvard University, das er mit einem Master of Business Administration an der Harvard Business School abschloss, begann Homm seine Tätigkeit in der US-amerikanischen Finanzwirtschaft bei der Investmentbank Merrill Lynch, danach war er bei dem US-Fondsanbieter Fidelity Investments, der Schweizer Privatbank Julius Bär und dem US-Vermögensverwalter Tweedy Browne...."
s.split(/[[:space:]]/).map { |word| word.capitalize == word ? word : '-' * word.length }.join(' ')
Does that apply to your problem?
Cheers!
Edit: For a more memory-efficient solution you can use a regex replacement with gsub; check out this other answer by mudasobwa: https://stackoverflow.com/a/41570686/4411941
r = /
  (?<![[:alpha:]]) # do not match a letter (negative lookbehind)
  [[:lower:]]      # match a lowercase letter
  [[:alpha:]]*     # match zero or more letters
/x                 # free-spacing regex definition mode
str = "Frau Maria-Barbara „Uschi“ Homm im hessischen Bad Homburg vor der Höhe auf."
str.gsub(r) { |m| '-'*m.size }
#=> "Frau Maria-Barbara „Uschi“ Homm -- ---------- Bad Homburg --- --- Höhe ---."
"die Richter/-innen".gsub(r) { |m| '-'*m.size }
#=> "--- Richter/------"
"Jede(r) Anwältin und Anwalt".gsub(r) { |m| '-'*m.size }
#=> "Jede(-) Anwältin --- Anwalt"
Solution
This problem is harder than it looks!
This code might be more memory-hungry than others, but I dare say it works for a wider range of (weird) German words:
def hide_non_capitalized(text)
  text.split(/[[:space:]]/).map do |chars|
    first_letter = chars[/[[:alpha:]]/]
    if first_letter && first_letter == first_letter.downcase
      ## Keep non-letters:
      chars.gsub(/[[:alpha:]]/, '-')
      ## Replace every character:
      # '-' * chars.size
    else
      chars
    end
  end.join(' ')
end
It splits the text into character blocks, and replaces all the letters of a block if its first letter is lowercase. This code requires Ruby 2.4, because 'ä'.upcase is still 'ä' up to Ruby 2.3.
Test
puts hide_non_capitalized(text)
#=> Florian Homm ----- --- Sohn --- ----------------- Handwerksunternehmers Joachim Homm --- ------ Frau Maria-Barbara „Uschi“ Homm -- ---------- Bad Homburg --- --- Höhe ---. Sein Großonkel ----------------- --- --- Unternehmer Josef Neckermann. Nach ----- Studium -- --- Harvard University, --- -- --- ----- Master -- Business Administration -- --- Harvard Business School ---------, ------ Homm ----- Tätigkeit -- --- US-amerikanischen Finanzwirtschaft --- --- Investmentbank Merrill Lynch, ------ --- -- --- --- US-Fondsanbieter Fidelity Investments, --- Schweizer Privatbank Julius Bär --- --- US-Vermögensverwalter Tweedy Browne....
hide_none = "Änderung. „Uschi“, Attaché-case Maria-Barbara US-Fondsanbieter. Die Richter/-innen. Jede(r) 1234 \"#+?\""
puts hide_non_capitalized(hide_none)
#=> Änderung. „Uschi“, Attaché-case Maria-Barbara US-Fondsanbieter. Die Richter/-innen. Jede(r) 1234 "#+?"
hide_all = "öfters. „word“ lowercase-Uppercase jede(r) not/exactly/a/word"
puts hide_non_capitalized(hide_all)
#=> ------. „----“ ------------------- ----(-) ---/-------/-/----

Google Speech Recognition API Result is Empty

I'm performing an asynchronous request to the Google Cloud Speech API, and I do not know how to get the result of the operation:
Request POST: https://speech.googleapis.com/v1beta1/speech:asyncrecognize
Body:
{
  "config": {
    "languageCode" : "pt-BR",
    "encoding" : "LINEAR16",
    "sampleRate" : 16000
  },
  "audio": {
    "uri": "gs://bucket/audio.flac"
  }
}
Which returns:
{ "name": "469432517" }
So, I do a POST: https://speech.googleapis.com/v1beta1/operations/469432517
Which returns:
{
  "name": "469432517",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeMetadata",
    "progressPercent": 100,
    "startTime": "2016-08-11T21:18:29.985053Z",
    "lastUpdateTime": "2016-08-11T21:18:31.888412Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"
  }
}
I need to get the result of the operation: the transcribed text.
How can I do that?
You've got the result of the operation, and it is empty. The reason for the empty result is a format mismatch: the config declares "LINEAR16" (uncompressed PCM data, basically a WAV file), but you are submitting a FLAC file (a compressed format).
Another reason for an empty result might be an incorrect sample rate, an incorrect number of channels, and so on.
Lastly, a file of pure silence will result in an empty response.
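For example, if the file at gs://bucket/audio.flac really is FLAC, the request config would need to declare that. A rough sketch of a corrected body (the sample rate here is only an assumption; it must match the actual file):
{
  "config": {
    "languageCode" : "pt-BR",
    "encoding" : "FLAC",
    "sampleRate" : 16000
  },
  "audio": {
    "uri": "gs://bucket/audio.flac"
  }
}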
I got this issue too. The problem can be with the encoding and the sample rate. Here is how I found the appropriate encoding and rate:
audio = types.RecognitionAudio(content=content)

ENCODING = [enums.RecognitionConfig.AudioEncoding.LINEAR16,
            enums.RecognitionConfig.AudioEncoding.FLAC,
            enums.RecognitionConfig.AudioEncoding.MULAW,
            enums.RecognitionConfig.AudioEncoding.AMR,
            enums.RecognitionConfig.AudioEncoding.AMR_WB,
            enums.RecognitionConfig.AudioEncoding.OGG_OPUS,
            enums.RecognitionConfig.AudioEncoding.SPEEX_WITH_HEADER_BYTE]
SAMPLE_RATE_HERTZ = [8000, 12000, 16000, 24000, 48000]

for enco in ENCODING:
    for rate in SAMPLE_RATE_HERTZ:
        config = types.RecognitionConfig(
            encoding=enco,
            sample_rate_hertz=rate,
            language_code='fa-IR')
        # Detects speech in the audio file
        response = []
        try:
            response = CLIENT.recognize(config, audio)
        except:
            pass
        print("-----------------------------------------------------")
        print(str(rate) + " " + str(enco))
        print("response: ", str(response))
The Google Speech Recognition API result can be empty because the parameters are incorrect. My suggestion is to first analyze the audio properties, for instance with command-line tools like ffmpeg.
Audio encoding formats list
Language codes info
My complete example:
$ ffmpeg -i 1515244791.flac -hide_banner
Input #0, flac, from '1515244791.flac':
Metadata:
ARTIST : artist
YEAR : year
Duration: 00:00:59.98, start: 0.000000, bitrate: 363 kb/s
Stream #0:0: Audio: flac, 44100 Hz, mono, s16
then use the correct config:
import io

from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

LANG = "es-MX"
RATE = 44100
ENC = enums.RecognitionConfig.AudioEncoding.FLAC

def transcribe_streaming(stream_file):
    """Streams transcription of the given audio file."""
    client = speech.SpeechClient()

    with io.open(stream_file, 'rb') as audio_file:
        content = audio_file.read()

    # In practice, stream should be a generator yielding chunks of audio data.
    stream = [content]
    requests = (types.StreamingRecognizeRequest(audio_content=chunk)
                for chunk in stream)

    config = types.RecognitionConfig(
        encoding=ENC,
        sample_rate_hertz=RATE,
        language_code=LANG)
    streaming_config = types.StreamingRecognitionConfig(config=config)

    # streaming_recognize returns a generator.
    print(streaming_config)
    responses = client.streaming_recognize(streaming_config, requests)

    for response in responses:
        print(response)
        # Once the transcription has settled, the first result will contain the
        # is_final result. The other results will be for subsequent portions of
        # the audio.
        for result in response.results:
            print('Finished: {}'.format(result.is_final))
            print('Stability: {}'.format(result.stability))
            alternatives = result.alternatives
            # The alternatives are ordered from most likely to least.
            for alternative in alternatives:
                print('Confidence: {}'.format(alternative.confidence))
                print('Transcript: {}'.format(alternative.transcript))
So the transcription service works:
config {
encoding: FLAC
sample_rate_hertz: 44100
language_code: "es-MX"
}
results {
alternatives {
transcript: "lo tienes que saber tienes derecho a recibir informaci\303\263n de todas las instituciones que reciben recursos p\303\272blicos M\303\251xico 4324 plataformadetransparencia.org.mx derecho Porque adem\303\241s de defender tu voto te atiende si no se respetan tus derechos pol\303\255tico-electorales imparten justicia cuando existen inconformidades en elecciones internas de partidos pol\303\255ticos comit\303\251s ciudadanos y consejos de los pueblos resuelve controversias en elecciones de autoridades en la Ciudad de M\303\251xico y en consulta ciudadana en tu elecci\303\263n MVS 102.5 espacio a las nuevas voces de la radio continuamos"
confidence: 0.9409132599830627
}
is_final: true
}
Finished: True
Stability: 0.0
Confidence: 0.9409132599830627
Transcript: lo tienes que saber tienes derecho a recibir información de todas las instituciones que reciben recursos públicos México 4324 plataformadetransparencia.org.mx derecho Porque además de defender tu voto te atiende si no se respetan tus derechos político-electorales imparten justicia cuando existen inconformidades en elecciones internas de partidos políticos comités ciudadanos y consejos de los pueblos resuelve controversias en elecciones de autoridades en la Ciudad de México y en consulta ciudadana en tu elección MVS 102.5 espacio a las nuevas voces de la radio continuamos

How to extract data from text columns

I have two addresses side-by-side in a multi-line string:
Adresse de prise en charge : Adresse d'arrivée :
rue des capucines rue des tilleuls
92210 Saint Cloud 67000 Strasbourg
Tél.: Tél.:
I need to extract the addresses on the left and right with a regexp, and assign them to variables. I need to match:
address1: "rue des capucines 92210 Saint Cloud"
address2: "rue des tilleuls 67000 Strasbourg"
I thought of separating them with spaces, but I can't find any regexp to count the spaces. I tried:
en\s*charge\s*:\s*((.|\n)*)\s*
and similar, but that gives me both addresses, and is not what I'm looking for. Any help will be deeply appreciated.
I'd do something like this:
str = <<EOT
Adresse de prise en charge : Adresse d'arrivée :
rue des capucines rue des tilleuls
92210 Saint Cloud 67000 Strasbourg
Tél.: Tél.:
EOT
left_addr = []
right_addr = []
lines = str.squeeze("\n").gsub(':', '').lines.map(&:strip) # => ["Adresse de prise en charge Adresse d'arrivée", "rue des capucines rue des tilleuls", "92210 Saint Cloud 67000 Strasbourg", "Tél. Tél."]
center_line_pos = lines.max.length / 2 # => 35
lines.each do |l|
  left_addr << l[0 .. (center_line_pos - 1)].strip
  right_addr << l[center_line_pos .. -1].strip
end
At this point left_addr and right_addr look like:
left_addr # => ["Adresse de prise en charge", "rue des capucines", "92210 Saint Cloud", "Tél."]
right_addr # => ["Adresse d'arrivée", "rue des tilleuls", "67000 Strasbourg", "Tél."]
And here's what they contain:
puts left_addr
puts '------'
puts right_addr
# >> Adresse de prise en charge
# >> rue des capucines
# >> 92210 Saint Cloud
# >> Tél.
# >> ------
# >> Adresse d'arrivée
# >> rue des tilleuls
# >> 67000 Strasbourg
# >> Tél.
If you need the results all in one line without the 'Tel:':
puts left_addr[0..-2].join(' ').squeeze(' ')
puts '------'
puts right_addr[0..-2].join(' ').squeeze(' ')
# >> Adresse de prise en charge rue des capucines 92210 Saint Cloud
# >> ------
# >> Adresse d'arrivée rue des tilleuls 67000 Strasbourg
Here's a breakdown of what is going on:
str.squeeze("\n") # => " Adresse de prise en charge : Adresse d'arrivée :\n rue des capucines rue des tilleuls\n 92210 Saint Cloud 67000 Strasbourg\n Tél.: Tél.:\n"
.gsub(':', '') # => " Adresse de prise en charge Adresse d'arrivée \n rue des capucines rue des tilleuls\n 92210 Saint Cloud 67000 Strasbourg\n Tél. Tél.\n"
.lines # => [" Adresse de prise en charge Adresse d'arrivée \n", " rue des capucines rue des tilleuls\n", " 92210 Saint Cloud 67000 Strasbourg\n", " Tél. Tél.\n"]
.map(&:strip) # => ["Adresse de prise en charge Adresse d'arrivée", "rue des capucines rue des tilleuls", "92210 Saint Cloud 67000 Strasbourg", "Tél. Tél."]
Assuming that each address section in each line is indented as much as or further than the corresponding "Adresse" in the first line, the following can extract not only two addresses aligned side by side, but n addresses in general.
lines = string.split(/#{$/}+/)
# => [
# => "Adresse de prise en charge : Adresse d'arrivée :",
# => " rue des capucines rue des tilleuls",
# => " 92210 Saint Cloud 67000 Strasbourg",
# => " Tél.: Tél.:"
# => ]
break_points = []
lines.first.scan(/\bAdresse\b/){break_points.push($~.begin(0))}
ranges = break_points.push(0).each_cons(2).map{|s, e| s..(e - 1)}
# => [0..53, 54..-1]
address1, address2 =
lines[1..-2]
.map{|s| ranges.map{|r| s[r]}}
.transpose
.map{|a| a.join(" ").strip.squeeze(" ")}
# => [
# => "rue des capucines 92210 Saint Cloud",
# => "rue des tilleuls 67000 Strasbourg"
# => ]
str =
" Adresse de prise en charge : Adresse d'arrivée :
rue des capucines rue des tilleuls
92210 Saint Cloud 67000 Strasbourg
Tél.: Tél.:"
adr_prise, adr_arr = str.lines[3].strip.split(/ {2,}/) #split on 2+ spaces
code_prise, cite_prise, code_arr, cite_arr = str.lines[6].strip.split(/ {2,}/)
Assumptions
I have assumed that the first and last lines are not wanted and the street names are separated by at least two spaces, and the same for the postal code/city strings. This permits the street name (and postal code/city pair) for "prise en charge" to end below "Adresse d'arrivée :".
Code
def parse_text(text)
  text.split(/\n+\s+/)[1..-2].
    map { |s| s.gsub(/\d+\K\s+/,' ').split(/\s{2,}/) }.
    transpose.
    map { |a| a.join(' ') }
end
Examples
Example 1
text = <<BITTER_END
Adresse de prise en charge : Adresse d'arrivée :
rue des capucines rue des tilleuls
92210 Saint Cloud 67000 Strasbourg
Tél.: Tél.:
BITTER_END
parse_text(text)
#=> ["rue des capucines 9210 Saint Cloud",
# "rue des tileuls 670 Strasbourg"]
Example 2
text = <<_
Adresse 1 : Adresse 2 : Adresse 3 :
rue nom le plus long du monde par un mile rue gargouilles rue des tilleuls
92210 Saint Cloud 31400 Nice 67000 Strasbourg
France France France
Tél.: Tél.: Tél.:
_
parse_text(text)
#=> ["rue nom le plus long du monde par un mile 92210 Saint Cloud France",
# "rue gargouilles 31400 Nice France",
# "rue des tilleuls 67000 Strasbourg France"]
Explanation
The steps for text given in the question:
Split into lines, removing blank lines and leading spaces:
a1 = text.split(/\n+\s+/)
#=> ["Adresse de prise en charge : Adresse d'arrivée :",
# "rue des capucines rue des tilleuls",
# "92210 Saint Cloud 67000 Strasbourg",
# "Tél.: Tél.:\n"]
Remove first and last lines:
a2 = a1[1..-2]
#=> ["rue des capucines rue des tilleuls",
# "92210 Saint Cloud 67000 Strasbourg"]
Remove extra spaces between the postal codes and cities and split each line on two or more spaces:
r = /
  \d+ # match one or more digits
  \K  # forget everything matched so far
  \s+ # match one or more spaces
/x    # extended/free-spacing regex definition mode
a3 = a2.map { |s| s.gsub(/\d+\K\s+/,' ').split(/\s{2,}/) }
#=> [["rue des capucines", "rue des tilleuls"],
# ["92210 Saint Cloud", "67000 Strasbourg"]]
Group by column:
a4 = a3.transpose
#=> [["rue des capucines", "92210 Saint Cloud"],
# ["rue des tilleuls", "67000 Strasbourg"]]
Join strings:
a4.map { |a| a.join(' ') }
#=> ["rue des capucines 92210 Saint Cloud",
# "rue des tilleuls 67000 Strasbourg"]
Inspired by @steenslag's very pragmatic answer, here's a pretty dense one-liner just for fun.
# Assume the input data is in the variable `text`
left_addr, right_addr = text.lines.values_at(3, 6).map do |line|
line.scan(/(?:\d+ +)?\S+(?: \S+)*/)
.map {|part| part.squeeze(' ') }
end
.transpose
.map {|addr| addr.join(' ') }
puts left_addr
# => rue des capucines 92210 Saint Cloud
puts right_addr
# => rue des tilleuls 67000 Strasbourg
Like @steenslag's answer, this assumes that the desired data is always on lines 3 and 6. It also assumes that on line 6 both columns will have a postal code and city and that the postal code will always start with a digit.
Because it's a pretty dense one-liner and because it makes a lot of assumptions, I don't think this is the best answer and I'm marking it Community Wiki.
If I have time I'll come back and unpack this later.
Assuming that the "center line position" is known, this would work:
left_lines, right_lines = str.scan(/^(.{50})(.*)$/).transpose
The regular expression captures 50 characters at the beginning of each line plus the remaining characters until the line's end.
scan returns a nested array: (I'm using placeholders because the actual lines are too long)
[
['1st left line', '1st right line'],
['2nd left line', '2nd right line'],
...
]
transpose converts it to:
[
['1st left line', '2nd left line', ...], # <- assigned to left_lines
['1st right line', '2nd right line', ...] # <- assigned to right_lines
]
The lines (excluding the first and last line) have to be joined and spaces have to be removed: (see strip and squeeze)
left_lines[1..-2].join(' ').strip.squeeze(' ')
#=> "rue des capucines 92210 Saint Cloud"
Same for right_lines:
right_lines[1..-2].join(' ').strip.squeeze(' ')
#=> "rue des tilleuls 67000 Strasbourg"

JSON string decode

I receive a JSON object as:
Pots veure bé les anotacions que tenen accents? O caràcters "estranys"? Tipus l'ampersand &, les cometes ", les u's amb dièresis ü, apòstrofs ' i coses així?","numAnotacions"
when really it is:
Pots veure bé les anotacions que tenen accents? O caràcters "estranys"? Tipus l'ampersand &, les cometes ", les u's amb dièresis ü, apòstrofs ' i coses així?","numAnotacions"
So, I can't find a way to decode the string. I get the string with:
NSString *responseString = [request responseString];
Can anyone please help me?
Thanks
I think JSONKit can help with parsing the JSON (example).
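A minimal sketch of that, assuming JSONKit.h/JSONKit.m have been added to the project and that responseString holds the raw JSON text:

#import "JSONKit.h"

NSString *responseString = [request responseString];
// JSONKit adds objectFromJSONString to NSString; it returns the parsed
// Foundation object (NSDictionary or NSArray) with JSON escapes decoded.
id decoded = [responseString objectFromJSONString];
NSLog(@"%@", decoded);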
