Is the following lossless data compression algorithm theoretically valid?

I am wondering whether the following is a valid lossless data compression algorithm (not practical on traditional computers, but perhaps on quantum computers?).
At a high and simplified level, the compression steps are:
1. Calculate the character frequency of the uncompressed text.
2. Calculate the SHA3-512 (or another hash function) of the uncompressed text.
3. Concatenate the SHA3-512 and the character frequency (this is now the compressed text that would be written to a file).
And at a high and simplified level, the decompression steps are:
1. Using the character frequency in the compressed file, generate a candidate string with that character frequency (keeping track of which candidates have already been tried).
2. Calculate the SHA3-512 of the candidate generated in step 1.
3. If the SHA3-512 calculated in step 2 matches the SHA3-512 in the compressed file, the decompression is complete. Otherwise, go back to step 1 and generate the next candidate.
Would it be possible to have a SHA3-512 collision between permutations of the uncompressed text, i.e. can two permutations with the same character frequency have the same SHA3-512? If so, at roughly how many characters of uncompressed text could this start happening?
One simplified example is as follows:
The uncompressed text is: "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas et enim vitae ligula ultricies molestie at ac libero. Duis dui erat, mollis nec metus nec, porttitor scelerisque enim. Aenean feugiat tellus sit amet facilisis imperdiet. Fusce et nisl porta, aliquam quam eget, mollis sapien. Sed purus est, efficitur elementum quam quis, congue rutrum libero. Etiam metus leo, hendrerit ac dui in, hendrerit blandit sem. Etiam pellentesque enim dapibus luctus volutpat. Praesent aliquet ipsum vitae mauris pulvinar, et pharetra leo semper. Nulla a mauris tellus. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Integer sollicitudin dui sapien, in tempus arcu facilisis in. Vivamus dui dolor, faucibus eu accumsan eu, porttitor id risus. In auctor congue pellentesque. Cras malesuada enim eget est vehicula pretium. Phasellus scelerisque imperdiet lorem, eu euismod lectus convallis consequat. Nam vitae euismod est, vitae lacinia arcu. Praesent fermentum sit amet erat feugiat cursus. Pellentesque magna felis, euismod vel vehicula eu, tincidunt ac ex. Vestibulum viverra justo nec orci semper, nec consequat justo faucibus. Curabitur dignissim feugiat nulla, in cursus nunc facilisis id. Suspendisse potenti. Etiam commodo turpis non fringilla semper. Vivamus aliquam ex non lorem tincidunt, et sagittis tellus placerat. Proin malesuada tortor eu viverra faucibus. Curabitur euismod orci lorem, ut fermentum velit consectetur vel. Nullam sodales cursus maximus. Curabitur nec turpis erat. Vestibulum eget lorem nunc. Morbi laoreet massa vel nulla feugiat gravida. Nulla a rutrum neque. Phasellus maximus tempus neque, eu sagittis ex volutpat ac. Duis malesuada sem vitae lacus suscipit, eu dictum elit euismod. Sed id sagittis leo. Sed convallis nisi nisl, vel pretium elit cursus vel. Duis quis accumsan odio. Ut arcu ex, iaculis a lectus sit amet, lacinia pellentesque enim. Donec maximus ante odio, a porta odio luctus at. 
Nullam dapibus aliquet sollicitudin. Sed ultrices iaculis blandit. Suspendisse dapibus, odio non venenatis faucibus, justo urna euismod neque, non finibus ante ante in massa. Sed sit amet nunc vel lacus dictum euismod. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Interdum et malesuada fames ac ante ipsum primis in faucibus. Fusce varius lacus velit, venenatis consequat justo rutrum nec. Nunc cursus odio arcu, nec egestas purus feugiat nec. Aliquam efficitur ornare ullamcorper. Mauris consectetur, quam vitae ultricies ullamcorper, nulla nulla tempus risus, aliquet euismod urna erat gravida neque. Suspendisse et viverra enim, ut facilisis enim. Quisque quis elit diam. Morbi quis nulla bibendum, molestie risus egestas, pharetra nisl. Aliquam sed massa dictum, scelerisque odio vel, finibus tellus. Nam tristique commodo sem, a dictum risus euismod sed. Morbi vel urna nec sem consectetur auctor quis ac augue. Donec ac pellentesque tortor. In hendrerit ultricies consequat. Pellentesque non metus vitae elit euismod efficitur in in leo. Nulla ac pulvinar nunc. Donec porttitor nunc ante, et congue augue laoreet ac. Vivamus bibendum id est eleifend efficitur. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc arcu neque, molestie ac lorem id, feugiat efficitur erat. Vestibulum vel condimentum lectus, eu euismod turpis.".
The character frequency is: "⎵:501 e:345 i:277 u:266 s:240 t:226 a:219 l:161 n:154 r:147 m:132 c:128 o:117 d:79 .:64 p:54 ,:47 v:40 q:39 f:35 g:31 b:31 h:11 P:9 N:9 S:8 x:7 D:6 V:6 M:5 I:4 C:4 j:4 L:3 A:3 E:3 F:2 U:1 Q:1".
The SHA3-512 is: "45ebde65cf667d1bfdcf779baab84301c1d4abe60448be821adda9cf7b99b36a61c53233db4a0eda93a04c75201be13bbb638b5e78f5047560fffc97f1c95adb".
The compressed file contents are: "45ebde65cf667d1bfdcf779baab84301c1d4abe60448be821adda9cf7b99b36a61c53233db4a0eda93a04c75201be13bbb638b5e78f5047560fffc97f1c95adb⎵:501 e:345 i:277 u:266 s:240 t:226 a:219 l:161 n:154 r:147 m:132 c:128 o:117 d:79 .:64 p:54 ,:47 v:40 q:39 f:35 g:31 b:31 h:11 P:9 N:9 S:8 x:7 D:6 V:6 M:5 I:4 C:4 j:4 L:3 A:3 E:3 F:2 U:1 Q:1".

Your compression method assumes that there is only one permutation of the given character frequency table that will generate the given hash code. That's provably false.
A 512-bit hash can represent on the order of 1.34E+154 (2^512) unique values. The number of orderings of a 100-character file whose characters are all distinct is 100!, or about 9.33E+157; with repeated characters you divide by the factorial of each character's count, but for any realistically sized text the total still dwarfs the hash space.
Given a 100-character file of distinct characters, there are on average over 6,900 different permutations for each possible 512-bit hash value.
Using a larger hash code won't help: each added bit doubles the number of hash values, but each added character multiplies the number of possible permutations by the new length, which outgrows any fixed-size hash.
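The pigeonhole argument above can be checked directly with Python's arbitrary-precision integers (a quick sketch for the simplified case of a 100-character file with all characters distinct):

```python
import math

hash_values = 2 ** 512          # number of distinct SHA3-512 outputs, ~1.34e154
perms = math.factorial(100)     # orderings of 100 distinct characters, ~9.33e157

# More permutations than hash values: by pigeonhole, many permutations
# must share a hash, so the hash alone cannot identify the original text.
print(perms > hash_values)      # True
print(perms // hash_values)     # average permutations per hash value, just under 7,000
```

The decompressor would therefore find some permutation matching the stored hash, but with no guarantee it is the one that was compressed.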


How to split a file in bash by a pattern matching a number

I have a text like:
1Lorem ipsum dolor sit amet, consectetur adipiscing elit. 2Vivamus dictum, justo mattis sollicitudin pretium, ante magna gravida ligula, 3a condimentum libero tortor sit amet lectus. Nulla congue mauris quis lobortis interdum. 4Integer eget ante mattis ante egestas suscipit. Suspendisse imperdiet pellentesque risus, a luctus sem pellentesque nec. Curabitur vel luctus eros. Morbi id magna sit amet 5risus hendrerit porta. Praesent vitae sapien in nunc aliquet pharetra vitae sed lectus. Donec id magna magna. Phasellus eget rhoncus purus, vitae vestibulum nisl. 6Phasellus massa mi, ultricies id mi sit amet, tristique auctor mi.
I want to split the text by the numbers found, whatever; like:
1Lorem ipsum dolor sit amet, consectetur adipiscing elit.
2Vivamus dictum, justo mattis sollicitudin pretium, ante magna gravida ligula,
3a condimentum libero tortor sit amet lectus. Nulla congue mauris quis lobortis interdum.
...
In awk, I tried:
cat text | awk -F'/^[-+]?[0-9]+$/' '{for (i=1; i<= NF; i++) print $i}'
Here -F is set to /^[-+]?[0-9]+$/, a pattern meant to test whether a string is a number. But it doesn't split the text.
If I change the pattern to an ordinary separator it works without problems, so what pattern should I use here?
I would harness GNU AWK for this task following way, let file.txt content be
1Lorem ipsum dolor sit amet, consectetur adipiscing elit. 2Vivamus dictum, justo mattis sollicitudin pretium, ante magna gravida ligula, 3a condimentum libero tortor sit amet lectus. Nulla congue mauris quis lobortis interdum. 4Integer eget ante mattis ante egestas suscipit. Suspendisse imperdiet pellentesque risus, a luctus sem pellentesque nec. Curabitur vel luctus eros. Morbi id magna sit amet 5risus hendrerit porta. Praesent vitae sapien in nunc aliquet pharetra vitae sed lectus. Donec id magna magna. Phasellus eget rhoncus purus, vitae vestibulum nisl. 6Phasellus massa mi, ultricies id mi sit amet, tristique auctor mi.
then
awk 'BEGIN{RS="[-+]?[0-9]+"}{printf "%s%s%s", $0, NR==1?"":"\n", RT}' file.txt
gives output
1Lorem ipsum dolor sit amet, consectetur adipiscing elit.
2Vivamus dictum, justo mattis sollicitudin pretium, ante magna gravida ligula,
3a condimentum libero tortor sit amet lectus. Nulla congue mauris quis lobortis interdum.
4Integer eget ante mattis ante egestas suscipit. Suspendisse imperdiet pellentesque risus, a luctus sem pellentesque nec. Curabitur vel luctus eros. Morbi id magna sit amet
5risus hendrerit porta. Praesent vitae sapien in nunc aliquet pharetra vitae sed lectus. Donec id magna magna. Phasellus eget rhoncus purus, vitae vestibulum nisl.
6Phasellus massa mi, ultricies id mi sit amet, tristique auctor mi.
Explanation: I inform GNU AWK that the record separator (RS) is - or + repeated 0 or 1 times, followed by a digit repeated 1 or more times. Then for every record I printf its content, followed by a newline (only for non-first records), followed by the matched record terminator (RT). Note that a regex RS and the RT variable are gawk extensions.
(tested in gawk 4.2.1)
This inserts a new line before every number, except the first, and also strips any whitespace before the new line.
sed -E 's/[[:blank:]]*([0-9]+)/\
\1/g; s/\n//'
You still have the problem of numbers within each line which are regular content. These will also have a new line prepended.
Absolutely no need for vendor-proprietary solutions:
{m,n,g}awk '
(NF=NF)+gsub("[0-9]+[^0-9]+[.]? ","&\n")+gsub("[ \t]+\n",FS)' FS='\n' OFS= \
RS='^$' ORS=
_
1Lorem ipsum dolor sit amet, consectetur adipiscing elit.
2Vivamus dictum, justo mattis sollicitudin pretium, ante magna gravida ligula,
3a condimentum libero tortor sit amet lectus. Nulla congue mauris quis lobortis interdum.
4Integer eget ante mattis ante egestas suscipit. Suspendisse imperdiet pellentesque risus, a luctus sem pellentesque nec. Curabitur vel luctus eros. Morbi id magna sit amet
5risus hendrerit porta. Praesent vitae sapien in nunc aliquet pharetra vitae sed lectus. Donec id magna magna. Phasellus eget rhoncus purus, vitae vestibulum nisl.
6Phasellus massa mi, ultricies id mi sit amet, tristique auctor mi.
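For comparison, the same split can be sketched outside awk/sed with Python's standard re module: replace the whitespace preceding each (optionally signed) number with a newline. This is illustrative only and shares the sed answer's caveat about ordinary numbers inside the text.

```python
import re

text = ("1Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
        "2Vivamus dictum, justo mattis sollicitudin pretium, ante magna gravida ligula, "
        "3a condimentum libero tortor sit amet lectus.")

# The lookahead matches each number without consuming it, so only the
# whitespace before it is replaced; leading text keeps its number prefix.
result = re.sub(r"\s+(?=[-+]?[0-9]+)", "\n", text)
print(result)
```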

How to ignore URLs when searching using Elasticsearch?

Hi, I have a set of documents which may contain some text, but may also have URLs inside them:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam tincidunt metus a convallis imperdiet. Praesent interdum magna ut lorem bibendum vehicula. Maecenas consectetur tortor a ex pulvinar, sit amet sollicitudin nunc maximus. Pellentesque non gravida ligula, imperdiet pharetra odio. Nunc non massa vitae mauris tempor tempus. Nulla ac laoreet tellus. Nulla consequat tortor eu eros euismod bibendum. Curabitur ante ligula, aliquet at lacus at, pretium convallis eros. Fusce id mi condimentum, tempor lorem ut, pharetra libero.
https://document.io/document/ipsum
In eget eleifend neque. Morbi ex leo, tincidunt non enim ut, rutrum suscipit metus. Cras laoreet ex ut massa consequat condimentum. Aenean finibus eu nisl ut rhoncus. Aliquam finibus nisl risus, id facilisis justo rutrum et. Aenean enim libero, commodo id mi ut, mattis sollicitudin tellus. Aliquam molestie ligula sit amet lorem malesuada, aliquet pretium dolor malesuada. Phasellus fringilla libero in sollicitudin tristique. Quisque molestie, enim et aliquam dapibus, ex erat ultrices nisi, luctus ornare lorem metus eu sapien.
I am using a match query to search for words inside the documents; however, as you can see, the URLs sometimes contain words that are also part of the actual text, which messes up the results. I am wondering if Elasticsearch has a way to simply ignore the URLs and focus only on the text?
I am using the english analyzer for this field at the moment.
You can use the pattern replace character filter in your analyzer. To remove URLs from your text, you can add this filter to your search analyzer:
Filter:
"char_filter": {
  "type": "pattern_replace",
  "pattern": "\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]",
  "replacement": ""
}
This filter replaces URLs with an empty string, so you will not get results from matches inside URLs.
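For reference, a minimal sketch of how such a character filter could be wired into the index settings. The names url_filter and text_without_urls are placeholders, not from the original answer; the tokenizer and token filters are illustrative defaults:

```json
{
  "settings": {
    "analysis": {
      "char_filter": {
        "url_filter": {
          "type": "pattern_replace",
          "pattern": "\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]",
          "replacement": ""
        }
      },
      "analyzer": {
        "text_without_urls": {
          "type": "custom",
          "char_filter": ["url_filter"],
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
```

The analyzer would then be referenced from the field mapping via "analyzer": "text_without_urls".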

Match when paragraph contains sentences from indexes with Elasticsearch

I am using Elasticsearch to build a program that finds all the places in a text where the Bible is quoted, along with the source of each quoted verse.
I have indexed all the verses of the Bible in Elasticsearch; each verse is a document.
When I search by typing part of a verse, I find the right result (even with mistakes).
How can I scan a text to find all the occurrences where a verse (even partial) is quoted, and attribute the source of the verse to each one, while tolerating errors (with the fuzziness parameter, or using synonyms, I think)?
Example of my index :
{"index":{"_index":"test","_type":"","_id":1}}
{"fields":{"year":3560,"book":"1","chapter":1,"section":1,"text":"others words consectetur adipiscing and others words"},"id":"test1","type":"add"}
{"index":{"_index":"test","_type":"","_id":2}}
{"fields":{"year":3560,"book":"2","chapter":3,"section":2,"text":"others words a sagittis nisl quam and others words"},"id":"test2","type":"add"}
{"index":{"_index":"test","_type":"","_id":3}}
{"fields":{"year":3560,"book":"3","chapter":1,"section":5,"text":"others words Aliquam ultrices auctor pharetra and others words"},"id":"test3","type":"add"}
{"index":{"_index":"test","_type":"","_id":4}}
{"fields":{"year":3560,"book":"4","chapter":2,"section":4,"text":"others words Proin ut vestibulum and others words"},"id":"test4","type":"add"}
{"index":{"_index":"test","_type":"","_id":5}}
{"fields":{"year":3560,"book":"5","chapter":1,"section":5,"text":"others words Aenean pretium tincidunt aliquet and others words"},"id":"test5","type":"add"}
{"index":{"_index":"test","_type":"","_id":6}}
{"fields":{"year":3560,"book":"6","chapter":2,"section":1,"text":"others words In vitae sagittis and others words"},"id":"test6","type":"add"}
{"index":{"_index":"test","_type":"","_id":7}}
{"fields":{"year":3560,"book":"7","chapter":7,"section":7,"text":"others words ligula laoreet pharetra and others words"},"id":"test7","type":"add"}
{"index":{"_index":"test","_type":"","_id":8}}
{"fields":{"year":3560,"book":"8","chapter":1,"section":4,"text":"others words luctus eros a pretium and others words"},"id":"test8","type":"add"}
{"index":{"_index":"test","_type":"","_id":9}}
{"fields":{"year":3560,"book":"9","chapter":1,"section":7,"text":"others words ullamcorper eu id quam and others words"},"id":"test9","type":"add"}
{"index":{"_index":"test","_type":"","_id":10}}
{"fields":{"year":3560,"book":"10","chapter":5,"section":4,"text":"others words Nullam ac enim ac lacus hendrerit and others words"},"id":"test10","type":"add"}
I need to find all the occurrences in the paragraph below that are in the index, in order to recover their sources:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla rhoncus, nulla vitae porta euismod, purus nisl faucibus nunc, a sagittis nisl quam id arcu. Sed sit amet arcu sed dui auctor bibendum. Proin ut vestibulum sem, id rutrum felis. Phasellus sagittis justo sit amet justo consequat, id scelerisque eros cursus. Quisque dapibus finibus euismod. Proin dui urna, auctor ut gravida quis, fringilla quis velit. Donec sed pulvinar leo. Sed pulvinar pharetra arcu nec egestas. Mauris non dapibus diam. Pellentesque quis pellentesque libero.
Aliquam ultrices auctor pharetra. Cras ullamcorper, odio sit amet aliquam convallis, magna nibh gravida nunc, sit amet volutpat elit purus eget lectus. Pellentesque eu est a risus euismod consequat. Duis id erat porttitor, sodales justo non, aliquet ex. Etiam tincidunt neque ut nisi commodo auctor. Sed congue urna ac tellus scelerisque hendrerit. Mauris lobortis sed dui ut varius.
Proin ac luctus felis. In vitae sagittis erat, nec luctus sapien. Aenean pretium tincidunt aliquet. Morbi at enim vel ligula laoreet pharetra. Sed dignissim luctus eros a pretium. Vestibulum molestie molestie nisi, vitae scelerisque nibh bibendum nec. Donec laoreet sapien sed vehicula dictum. Nullam ac enim ac lacus hendrerit tempor et vitae neque. Quisque at leo pretium, efficitur augue vitae, congue eros. Maecenas volutpat ante nec scelerisque vestibulum.
Donec tristique orci erat, nec imperdiet nulla commodo ut. Nam non odio vel quam cursus ullamcorper eu id quam. Duis volutpat, nisl eu interdum mattis, augue ipsum mollis leo, eget efficitur orci augue eget leo. Integer feugiat facilisis dolor ut vehicula. Maecenas quis feugiat massa. Curabitur feugiat odio eget ligula tincidunt sodales. Donec feugiat dapibus lectus, non maximus dui rhoncus vitae. Phasellus eget massa faucibus, tristique nibh sed, aliquet metus.
I do not know if I have been clear enough, but do not hesitate to ask if you need more precision.
I think this problem could be handled by the Aho-Corasick algorithm, but I don't know how to integrate it into Elasticsearch.
Thank you!
If I understand your question correctly, you want to run a query for "some partial verses", get the source documents back from Elasticsearch, and have the response show the searched verse inside them (which is what highlighting does).
Here is the simplest query to achieve that:
GET <index_name>/_search
{
  "query": {
    "match": {
      "message": "partial verse"
    }
  },
  "highlight": {
    "fields": {
      "message": {}
    }
  }
}
In response you will get something like this
"hits" : [
  {
    "_index" : "testSample",
    "_type" : "_doc",
    "_id" : "TkdvGXAB5bHyIJQ-QRow",
    "_score" : 0.2876821,
    "_source" : {
      "bookName" : "bible",
      "message" : "this is a good book"
    },
    "highlight" : {
      "message" : [
        "<em>this</em> is a good book"
      ]
    }
  }
]
The response is self-explanatory: you get the highlighted results in a separate section.

How to get the value of the height or width attribute of an img tag using XPath at a given position

For the XML below, how can I get the value of the height or width attribute of an img at a given index?
<root>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer non nunc vitae nisl luctus pharetra at eu nulla. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus.
<img src="http://www.google.com/logos/pacman10-hp.png"/> Nullam in odio at ligula euismod adipiscing convallis in justo. Donec at massa nulla, at facilisis magna. Integer sit amet elit eu felis venenatis dignissim. In ut mi leo. Suspendisse blandit faucibus fermentum. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Phasellus ultricies turpis id magna semper vestibulum.
</p>
<p>Quisque blandit pretium libero, venenatis pellentesque purus egestas id. Integer nulla ante, pellentesque eget rhoncus sed, semper vel eros. Nam placerat est et est dictum egestas. Ut gravida blandit lacus rhoncus feugiat. Nunc ut euismod eros. Pellentesque sit amet vehicula mauris. Quisque in nulla quis sapien dictum mattis. Curabitur vehicula lorem ac elit dignissim egestas. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Cras sit amet tincidunt quam.
<img src="http://www.google.com/logos/2010/gabor10-hp.png"/> Ut urna neque, mollis vel tempor placerat, cursus vel enim.
</p>
<p>Praesent gravida dignissim sagittis. Vivamus dictum nisi pulvinar augue vulputate euismod. Vestibulum arcu sapien, laoreet sagittis pulvinar ac, porttitor a tellus.
<img width="100" height="100" src="http://www.google.com/logos/2010/d4g_worldcup10_ko-hp.jpg"/> Quisque cursus dignissim libero in convallis. Fusce cursus nisi ut felis feugiat sodales. Praesent nec arcu purus. Donec lorem lectus, tristique eget faucibus sit amet, bibendum nec ipsum. Mauris tempus laoreet tortor non egestas. Aliquam erat volutpat. Aliquam erat volutpat. Phasellus a arcu convallis nibh luctus tempor non quis sem.
<img src="http://www.google.com/logos/2010/d4g_worldcup10_uk-hp.jpg"/> Aliquam ac risus velit, ut sodales justo. Ut eget lacus eget nisi hendrerit gravida quis et nibh. Etiam purus felis, fermentum a cursus at, congue vel eros. Aenean semper, sapien eget eleifend fermentum, odio sem tempor dolor, sed porta ligula nunc ac tellus.
</p>
<p>Mauris volutpat nisi vitae sem imperdiet sed ultricies est dictum. Mauris id urna turpis, sit amet rhoncus lectus. Maecenas vitae mi at nulla mattis congue id blandit purus.
<img src="http://www.google.com/logos/2010/d4g_worldcup10_nl-hp.jpg"/> Maecenas hendrerit, dui eget faucibus pretium, tellus augue pellentesque metus, id molestie diam arcu ac nibh. Suspendisse sollicitudin viverra blandit. Maecenas sed tellus quis purus bibendum eleifend. Nunc sodales magna id nulla tristique et suscipit purus interdum. Ut at risus quam, nec rutrum risus. Integer ac leo lorem, eget porta nisi. Sed quis lacus dapibus massa commodo ornare. Mauris scelerisque rutrum accumsan. Duis fermentum adipiscing mi eget suscipit. Duis quis nisi libero, iaculis fermentum purus. Etiam risus nibh, tincidunt pellentesque luctus sed, gravida vitae magna.
<img src="http://www.google.com/logos/2010/d4g_worldcup10_au-hp.jpg"/> Sed laoreet, erat id rutrum dignissim, elit libero fermentum enim, pretium auctor lectus urna vitae nulla. Nullam ante diam, elementum nec elementum quis, consectetur eget arcu.
</p>
<p>Fusce eu nisl risus. Fusce rhoncus iaculis viverra. Curabitur eleifend, nisl sed aliquam dapibus, urna leo scelerisque orci, id commodo dui libero vitae nisi.</p>
<img WIDTH="100" HEIGHT="100" src="http://www.google.com/logos/2010/d4g_worldcup10_nl-hp.jpg"/>
</root>
I tried
//img[1]/@width
but it is not working. Basically, I need an XPath expression to get the height or width of an img tag irrespective of case (WIDTH or width); if the width attribute is not available, it should return no match or null.
Use:
(//img)[$k]/@*[name() = 'width' or name() = 'WIDTH']
where you need to replace $k with the desired image index.
This selects the attribute named "width" or named "WIDTH" of the $k-th img element in the XML document.
For example, for the 3rd image use:
(//img)[3]/@*[name() = 'width' or name() = 'WIDTH']
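Python's standard ElementTree cannot evaluate the name() predicate above (it supports only a small XPath subset), but the same logic is easy to sketch directly. The sample XML and the helper name img_dimension below are illustrative only:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<root>
  <p>text <img src="a.png"/> more text</p>
  <p><img src="b.png"/></p>
  <p><img width="100" height="100" src="c.jpg"/></p>
  <img WIDTH="100" HEIGHT="100" src="d.jpg"/>
</root>"""

def img_dimension(root, k, name="width"):
    """Value of `name` (either case) on the k-th <img> in document order, else None."""
    imgs = list(root.iter("img"))        # document order, like (//img)[k] in XPath
    if not 1 <= k <= len(imgs):
        return None
    img = imgs[k - 1]
    # Check the lowercase spelling first, then the uppercase one.
    return img.get(name) or img.get(name.upper())

root = ET.fromstring(SAMPLE)
print(img_dimension(root, 3))  # 100
print(img_dimension(root, 1))  # None
```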

Only return slimmed down version of feed

I'm trying to create a new Yahoo Pipe that returns only a slimmed-down version of an XML feed.
Say my original XML looks like:
<?xml version="1.0" encoding="UTF-8" ?>
<name>Joe bloggs</name>
<age>31</age>
<description>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse aliquam metus id eros blandit vel convallis nunc accumsan. Fusce adipiscing eros a enim feugiat vestibulum. Cras vulputate malesuada neque vel ultricies. Nunc commodo condimentum risus, eu interdum odio rutrum ut. Nullam nec neque eget dolor tristique dignissim sit amet non nibh. Donec sagittis, elit eget tempus laoreet, tellus eros gravida nunc, eu elementum sem turpis eget velit. In hac habitasse platea dictumst. Donec sed nibh nec arcu feugiat malesuada nec sollicitudin neque. Morbi egestas gravida blandit. Praesent luctus ipsum sed sem porta a tempus ipsum congue. Cras non lectus metus. Fusce non purus quam, vel convallis urna. Aenean dignissim consequat tincidunt. Nunc posuere pulvinar est, id pretium sem vestibulum non.</description>
I'm changing the tag names with the Rename module, and that works fine.
Now I want to get rid of the description tag, so that my XML returns only name and age.
How can I do that with Yahoo Pipes?
Cheers in advance for any help
Use the Regex module on the description field and replace .* with an empty textfield. That deletes the field.
Use the "Create RSS" module as the last step in the chain. Then only include the fields you want.
