Matching subset patterns in split string results - ruby

Here's a more complex one for me:
I have content like this being pulled into a Jekyll post:
# Lorem ipsum dolor sit amet.
Consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore.
~
# Et dolore magna aliqua.
Ut enim ad minim veniam, quis nostrud exercitation.
~
# Ullamco laboris nisi.
Ut aliquip ex ea commodo consequat.
~
I'm pulling this into my layout like this: {{ post.content | jekreged: 1 | markdownify }}
Jekreged is a custom Liquid plugin I wrote that splits the content on the ~ and then selects which piece to include. The layout requires ripping a post apart like that.
I'm trying to adapt it so that it then runs a subset of match commands that I can select from the Liquid tag.
Here's the example (the one that isn't working) that I'm trying to troubleshoot:
module Jekyll
  module AssetFilter
    def jekreged(input, chunk)
      drugs = input.split("~")[chunk]
      title = (drugs).match(/^#{1}.+$/)
      jekreged = "#{title}"
    end
  end
end
Liquid::Template.register_filter(Jekyll::AssetFilter)
I get no output from this. Ideally, I'd like to be able to specify "title" as a parameter from the Liquid tag, but I'm not sure how to pass it through to the plugin.
In the long run, I'll have something like title = regmatch for title, body = ..., img = ...
Thanks for any and all help!

Took a shot at it (in the future, some example inputs and outputs would go a long way).
module Jekyll
  module AssetFilter
    # Returns the "~"-separated chunk that contains `matcher`.
    def jekreged(input, matcher)
      chunks = input.split("\n~\n")
      titles = chunks.select { |title| title.include? matcher }
      if titles.size > 1
        raise "Can't determine title from #{matcher.inspect}, found #{titles.inspect}"
      elsif titles.size.zero?
        # Report the chunks that were searched (titles is empty here).
        raise "#{matcher.inspect} didn't match any of #{chunks.inspect}"
      end
      titles.first
    end
  end
end

describe 'jekreged' do
  include Jekyll::AssetFilter

  # <<~ strips the common leading indentation from the heredoc body.
  let(:titles) { <<~TITLES }
    # Lorem ipsum dolor sit amet.
    Consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore.
    ~
    # Et dolore magna aliqua.
    Ut enim ad minim veniam, quis nostrud exercitation.
    ~
    # Ullamco laboris nisi.
    Ut aliquip ex ea commodo consequat.
    ~
  TITLES

  it 'finds the title that has the string in it' do
    jekreged(titles, "Consectetur" ).should == "# Lorem ipsum dolor sit amet.\nConsectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore."
    jekreged(titles, "minim veniam").should == "# Et dolore magna aliqua.\nUt enim ad minim veniam, quis nostrud exercitation."
    jekreged(titles, "aliquip" ).should == "# Ullamco laboris nisi.\nUt aliquip ex ea commodo consequat."
  end

  it 'raises an error if there is more than one title that matches' do
    expect { jekreged titles, 'Ut' }.to raise_error(/Can't determine title/)
  end

  it 'raises an error if there are no titles that match' do
    expect { jekreged titles, 'asdfasdfasdf' }.to raise_error(/didn't match/)
  end
end
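Two notes on the original filter, plus a sketch of the field-selection idea. In a Ruby regex literal, `#{...}` is interpolation, so `/^#{1}.+$/` is really `/^1.+$/` and matches none of the heading lines; `/^#.+$/` (or `/^\#.+$/`) is probably what was intended. Liquid filters also accept extra arguments, e.g. `{{ post.content | jekreged: 1, "title" }}`, which would arrive as a third parameter of `jekreged`. The block below is a language-neutral sketch of that chunk-then-field logic, written in Python rather than as a Jekyll plugin; `extract_field` and the `FIELD_PATTERNS` table are hypothetical names, and the `body` pattern (every non-heading, non-blank line) is an assumption.

```python
import re

# Hypothetical field patterns (assumptions, not part of Jekyll or Liquid):
# "title" is a Markdown heading line, "body" is every non-heading, non-blank line.
FIELD_PATTERNS = {
    "title": re.compile(r"^#.+$", re.MULTILINE),
    "body": re.compile(r"^(?!#)\S.*$", re.MULTILINE),
}

# A chunk separator is a line containing only "~" (plus optional trailing blanks),
# so a "~" inside ordinary text does not split the post.
SEPARATOR = re.compile(r"^~[ \t]*$", re.MULTILINE)


def extract_field(content, chunk, field):
    """Return the lines matching `field` from the `chunk`-th "~"-separated piece."""
    pieces = [p.strip() for p in SEPARATOR.split(content)]
    pieces = [p for p in pieces if p]  # drop the empty piece after the final "~"
    return "\n".join(FIELD_PATTERNS[field].findall(pieces[chunk]))
```

With the sample post, `extract_field(post, 1, "title")` should return `"# Et dolore magna aliqua."`; adding an `img` entry to the table would follow the same pattern.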


Can't knit PDF files in Spanish from RStudio (on Centos 7)

I've been fighting a bit trying to create PDF files in Spanish with RStudio.
When I knit the document without any lang specification, everything works. This is a .Rmd file that renders without any issue:
[test_file.Rmd]
---
title: "Lorem ipsum test"
output:
  pdf_document:
    toc: yes
  html_notebook: default
abstract: "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi vel
  euismod metus."
---
# Lorem ipsum 1
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi vel euismod
metus. Aenean molestie ligula ligula, imperdiet egestas tellus porttitor id.
Aliquam feugiat ullamcorper consequat. Suspendisse non libero scelerisque metus
sagittis placerat. Ut id tristique quam, ac tristique orci.
```{r}
plot(cars)
```
# Lorem ipsum 2
Cras id pretium enim, sed bibendum dui. Curabitur turpis lacus, ultricies vitae
sem vitae, interdum accumsan ligula. Donec quis ipsum pellentesque est ornare
pretium sit amet eu libero. Fusce eget ante sed leo vestibulum placerat.
But when I add the lang: es-MX option to the header (everything else is the same), I get errors:
---
title: "Lorem ipsum test"
lang: "es-MX"
output:
  pdf_document:
    toc: yes
  html_notebook: default
abstract: "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi vel
  euismod metus."
---
And I get the following output in the R Markdown tab:
|...................... | 33%
ordinary text without R code
|........................................... | 67%
label: unnamed-chunk-1
processing file: test_file.Rmd
|.................................................................| 100%
ordinary text without R code
/usr/lib/rstudio/bin/pandoc/pandoc +RTS -K512m -RTS test_file.utf8.md --to latex
--from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash
--output test_file.tex --table-of-contents --toc-depth 2
--template /home/myuser/R/x86_64-redhat-linux-gnu-library/3.6/rmarkdown/rmd/latex/default-1.17.0.2.tex
--highlight-style tango --latex-engine pdflatex --variable graphics=yes
--variable 'geometry:margin=1in' --variable 'compact-title:yes'
output file: test_file.knit.md
! Package babel Error: Language definition file shorthands=off.ldf not found.
Error: Failed to compile test_file.tex. See https://yihui.name/tinytex/r/#debugging for debugging tips.
See test_file.log for more info.
Ejecución interrumpida
Checking the log file (test_file.log) I see only these warnings and errors:
[...]
Package babel Warning: No hyphenation patterns were loaded for
(babel) the language `Spanish'
(babel) I will use the patterns loaded for \language=0 instead.
[...]
! Package babel Error: Language definition file shorthands=off.ldf not found.
See the babel package documentation for explanation.
Type H <return> for immediate help.
[...]
! ==> Fatal error occurred, no output PDF file produced!
Now, I've found where the .ldf files live: /usr/share/texlive/texmf-dist/tex/generic/babel, and there's no file named off.ldf.
So, my specific question is:
How to get my PDF file from my .Rmd file in Spanish?
I haven't found any solution, so... I'd appreciate any help.
Additional info:
R version: 3.6.0
Platform: CentOS Linux 7 (64 bits)
Edit
As requested by Ralf Stubner in his comment, here's the "offending line" in the generated .tex file:
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
\usepackage[shorthands=off,main=spanish]{babel} %% <- This is it!
\else
\usepackage{polyglossia}
\setmainlanguage[]{spanish}
\fi
I've found the template for this "offending" line at /home/myuser/R/x86_64-redhat-linux-gnu-library/3.6/rmarkdown/rmd/latex/default-1.17.0.2.tex (lines 82 and 83):
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
\usepackage[shorthands=off,$for(babel-otherlangs)$$babel-otherlangs$,$endfor$main=$babel-lang$]{babel}
%% ^^^^^^^^^^^^^^
%% This is where I think the problem is
I tried commenting out these lines, and also removing the "shorthands=off" bit, but errors keep popping up.
Edit 2
This is the header of the log file (test_file.log), which includes both the TeX and TeX Live versions:
This is pdfTeX, Version 3.1415926-2.5-1.40.14 (TeX Live 2013)
(format=pdflatex 2019.1.18) 21 AUG 2019 15:57
entering extended mode
restricted \write18 enabled.
%&-line parsing enabled.
**test_file.tex
(./test_file.tex
LaTeX2e <2011/06/27>
Babel <v3.8m> and hyphenation patterns for english, dumylang, nohyphenation, lo
aded.
(/usr/share/texlive/texmf-dist/tex/latex/base/article.cls
Document Class: article 2007/10/19 v1.4h Standard LaTeX document class
(/usr/share/texlive/texmf-dist/tex/latex/base/size10.clo
File: size10.clo 2007/10/19 v1.4h Standard LaTeX file (size option)
)
According to this, the TeX version is 3.1415926-2.5-1.40.14 and the TeX Live version is 2013. I installed both using yum on CentOS 7. Interestingly, when using LyX, everything works flawlessly.
The log file shows that no Spanish hyphenation patterns have been loaded. I am not sure how one would change that with TeX Live 2013. The log also reports Babel v3.8m; as far as I know, key=value options such as shorthands=off and main= were only added in babel 3.9, which would explain why this version looks for a language file called shorthands=off.ldf. Instead of investing time there, I suggest installing a current TeX distribution. tinytex::install_tinytex() within R gives you a simple way to accomplish that. I am not sure whether that will include Spanish by default, but if not, it will be easy to fix.

Show the content of a file by its name

I'm trying to build a script that asks the user to type the name of a file and, once it is entered, simply shows the entire content of that file.
For instance,
let's say I have a directory, /home/evaluation/, which contains several files:
In /home/evaluation/file01,
Lorem ipsum dolor sit amet.
In /home/evaluation/file02,
Lorem ipsum sit amet.
In /home/evaluation/file03,
Lorem ipsum dolor
I'd like to build a script that asks me to type the name of the file I want to read and, once it is entered, shows all of its content.
So if I type file01, it will show me:
Lorem ipsum dolor sit amet.
Otherwise, if the file doesn't exist in the directory, it should print "no file found".
Try this and see if it does what you are looking for:
#!/bin/bash
echo "Enter file name"        # ask for the file name
read -r fileName              # read a file name from the user and store it in fileName
if [ ! -f "$fileName" ]; then # check whether the file exists
    echo "File not found"     # print a message if it does not
    exit 1                    # all done, so exit with an error status
fi
# cat the file if we are still here, i.e. the file exists
cat "$fileName"
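In bash, [ ] is the test command, and -f file_name checks whether the file exists and is a regular file.
So [ ! -f "$fileName" ] is true when the file does not exist; in that case the message is printed, otherwise the contents are printed.
Note that the name is resolved relative to the current directory, so run the script from /home/evaluation (or prepend that directory to the name).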
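For comparison, here is a minimal Python sketch of the same check, assuming the files live in /home/evaluation as in the question and that the message should be "no file found". `show_file` and its `directory` parameter are hypothetical names; `os.path.basename` keeps the lookup inside that directory even if the user types something like ../secret.

```python
import os


def show_file(name, directory="/home/evaluation"):
    """Return the file's content, or "no file found" if it is not a regular file."""
    # Keep only the last path component so the lookup stays inside `directory`.
    path = os.path.join(directory, os.path.basename(name))
    if not os.path.isfile(path):  # same idea as bash's [ -f ... ]
        return "no file found"
    with open(path, encoding="utf-8") as f:
        return f.read()
```

Interactively, it could be called as `print(show_file(input("Enter file name: ").strip()))`.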

how to clean the text from emoticons using bash

Is there a way to remove anything that's not a token, punctuation or a special character from text using awk or sed? What I really want to get rid of are the emoticons and the 󾌧-like symbols.
Sample input:
Si tú no estáss yo no voy a lloraar por tiii🎶🎶
Me respondes porfavor?? 😭❤ piensas venir a Ecuador
cosas veredes!!!! Ay Papá. 😂😂😂
👀 🔵🔴 what y'all know about this?
🇲🇽👑❤️‼️ 🇲🇽👑❤️‼️ tag they make the final decision 🇲🇽🙏🏼👑
Vähän on twiitattavaa muuta kuin että aijjai ja oijjoi sekä nannaa. 😉👍👏👏👏🇫🇮💕
Binta On est arrivé au chicken elle voulait pleuré carrément tellement elle était heureuse 😂😂😂😂😭
ja mir fällt nix mehr ein😂😂
Někdo v pátek semnou na flédu na Moju reč??? 󾌧
Sample output:
Si tú no estáss yo no voy a lloraar por tiii
Me respondes porfavor?? piensas venir a Ecuador
cosas veredes!!!! Ay Papá.
what y'all know about this?
‼️ ‼️ tag they make the final decision
Vähän on twiitattavaa muuta kuin että aijjai ja oijjoi sekä nannaa.
Binta On est arrivé au chicken elle voulait pleuré carrément tellement elle était heureuse
ja mir fällt nix mehr ein
Někdo v pátek semnou na flédu na Moju reč???
My best solution uses Python; the Python file must be saved as UTF-8.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import re
text = u"""Si tú no estáss yo no voy a lloraar por tiii🎶🎶
Me respondes porfavor?? 😭❤ piensas venir a Ecuador
cosas veredes!!!! Ay Papá. 😂😂😂
👀 🔵🔴 what y'all know about this?
🇲🇽👑❤️‼️ 🇲🇽👑❤️‼️ tag they make the final decision 🇲🇽🙏🏼👑
Vähän on twiitattavaa muuta kuin että aijjai ja oijjoi sekä nannaa. 😉👍👏👏👏🇫🇮💕
Binta On est arrivé au chicken elle voulait pleuré carrément tellement elle était heureuse 😂😂😂😂😭
ja mir fällt nix mehr ein😂😂
Někdo v pátek semnou na flédu na Moju reč???
"""
emoji_pattern = re.compile(
"["
u"\U0001F600-\U0001F64F" # emoticons
u"\U0001F300-\U0001F5FF" # symbols & pictographs
u"\U0001F680-\U0001F6FF" # transport & map symbols
u"\U0001F1E0-\U0001F1FF" # flags (iOS)
u"\U00002760-\U0000276F" # dingbats (e.g. ❤ U+2764)
"]+", flags=re.UNICODE
)
print(emoji_pattern.sub(r'', text))
Output:
Si tú no estáss yo no voy a lloraar por tiii
Me respondes porfavor?? piensas venir a Ecuador
cosas veredes!!!! Ay Papá.
what y'all know about this?
‼️ ️‼️ tag they make the final decision
Vähän on twiitattavaa muuta kuin että aijjai ja oijjoi sekä nannaa.
Binta On est arrivé au chicken elle voulait pleuré carrément tellement elle était heureuse
ja mir fällt nix mehr ein
Někdo v pátek semnou na flédu na Moju reč???
This command will remove every character that is not alphabetic, numeric, punctuation or white space:
sed 's/[^[:alnum:][:punct:][:space:]]//g' input
Limitation: some of those funny characters that you see might be valid Unicode alphabetic characters for which your computer lacks an installed font. This command won't remove them.
How it works
[:alnum:], [:punct:], and [:space:] are character classes that match, respectively, any alphanumeric, punctuation, or white-space character. The regex [^[:alnum:][:punct:][:space:]] matches any character that does not belong to one of those three classes. The sed substitution command s/[^[:alnum:][:punct:][:space:]]//g performs a global search-and-replace that finds every character not in one of those classes and replaces it with nothing, that is, removes it.
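The same keep-only-these-classes idea can be sketched in Python with Unicode general categories instead of POSIX classes; this is a swapped-in technique, not what sed does internally, and `strip_symbols` is a hypothetical name. Letters (L), combining marks (M), numbers (N), punctuation (P) and separators (Z) are kept, along with newlines and tabs; emoji fall under the symbol categories (So, Sk) and the 󾌧 glyph is a private-use character (Co), so both are dropped. The trade-off is that ordinary symbols such as $, + or © are removed as well, while invisible variation selectors (U+FE0F, category Mn) survive.

```python
import unicodedata

# Unicode general-category prefixes to keep: letters, marks, numbers,
# punctuation and separators. Symbols (S*, which covers emoji), private-use
# characters (Co) and other control/format characters (Cc, Cf) are dropped.
KEEP_PREFIXES = ("L", "M", "N", "P", "Z")


def strip_symbols(text):
    """Remove every character that is not a letter, mark, number, punctuation or space."""
    return "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch).startswith(KEEP_PREFIXES)
    )
```

Applied line by line (or to the whole file), this should leave accented letters such as á, ä and č intact, since they are category L.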
You might be able to use tr:
% tr -dc '[:print:]' < emoji.txt
Si t no estss yo no voy a lloraar por tiiiMe respondes porfavor?? piensas venir a Ecuadorcosas veredes!!!! Ay Pap. what y'all know about this? tag they make the final decision Vhn on twiitattavaa muuta kuin ett aijjai ja oijjoi sek nannaa. Binta On est arriv au chicken elle voulait pleur carrment tellement elle tait heureuse ja mir fllt nix mehr einNkdo v ptek semnou na fldu na Moju re???
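As you can see, this also removes newline characters; it strips accented letters such as á and ä as well, since tr is evidently treating the UTF-8 input byte by byte. The newlines can be kept with: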
% tr -dc '[:print:]\n' < emoji.txt
Si t no estss yo no voy a lloraar por tiii
Me respondes porfavor?? piensas venir a Ecuador
cosas veredes!!!! Ay Pap.
what y'all know about this?
tag they make the final decision
Vhn on twiitattavaa muuta kuin ett aijjai ja oijjoi sek nannaa.
Binta On est arriv au chicken elle voulait pleur carrment tellement elle tait heureuse
ja mir fllt nix mehr ein
Nkdo v ptek semnou na fldu na Moju re???

How to read expect output in bash variable?

I use expect to verify a password over ssh.
I have tried to output via puts, but I just get the end of the server response in the string.
How do I get only the puts values?
okk=$(expect -c "
set timeout 15
spawn ssh -p 22 user#server.com
expect {
    \"(yes/no)\" {
        sleep 1
        send \"yes\n\"
        exp_continue
    }
    \"(y/n)\" {
        sleep 1
        send \"y\n\"
        exp_continue
    }
    password {
        sleep 1
        send \"$sshpw\r\"
        exp_continue
    }
    Password {
        sleep 1
        send \"$sshpw\r\"
        exp_continue
    }
    \"Last login\" {
        puts \"yes\"
        exit 1
    }
    \"Permission denied\" {
        return \"no\"
        exit 1
    }
    timeout {
        puts \"timeout\"
        exit 1
    }
    eof {
        puts \"error\"
    }
}
sleep 1
expect eof
")
echo $okk
onsectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean ut gravida lorem. Ut turpis felis, pulvinar a semper sed, adipiscing id dolor. Pellentesque auctor nisi id magna consequat sagittis. Curabitur dapibus enim sit amet elit pharetra tincidunt feugiat nis
Two changes are needed to filter out everything except the output of puts:
Use spawn -noecho ssh -p 22 user#server.com
Add log_user 0 after set timeout 15
WARNING: if you hit a case that does not print anything in the Expect script, the bash variable will be an empty string, so make sure to handle that case.

How to format text file as it can be seen in man pages (justifying text, nothing more) using bash

What I would like to do is the following.
Text file content:
This is a simple text file
containing lines of text
with different width
but I would like to justify
them. Any idea ?
Expected result:
This is a simple text file containing
lines of text with different width
but I would like to justify them.
Any idea ?
I can already wrap my file at the required width using:
cat textfile|fmt -s -w 37
But in that case, there is no justification...
EDIT: Using par as suggested, I found a problem with accented characters.
This is what par 37j1 gives me:
This is à simplé text file
containing lines of tèxt with
different wïdth but I woùld like to
justîfy them. Any idéà ?
The lines are no longer justified, even though the spacing has been altered...
Thanks for your help,
Slander
You can use nroff, as man does:
(echo '.ll 37'
echo '.pl 0'
cat orig.txt) | nroff
From your input, this produces:
This is a simple text file containing
lines of text with different width
but I would like to justify them. Any
idea ?
The above works ONLY with ASCII.
EDIT
If you want to handle UTF-8 text with nroff, you can try the following:
cat orig.txt | (   # yes, I know - UUOC
    echo '.ll 37'  # line length
    echo '.pl 0'   # page length (0 disables empty lines)
    echo '.nh'     # no hyphenation
    preconv -e utf8 -
) | groff -Tutf8
From this utf8 encoded input:
Voix ambiguë d'un cœur qui au zéphyr préfère les jattes de kiwi.
Voyez le brick géant que j'examine près du wharf.
Monsieur Jack, vous dactylographiez bien mieux que votre ami Wolf.
Eble ĉiu kvazaŭ-deca fuŝĥoraĵo ĝojigos homtipon..
Laŭ Ludoviko Zamenhof bongustas freŝa ĉeĥa manĝaĵo kun spicoj.
Nechť již hříšné saxofony ďáblů rozezvučí síň úděsnými tóny waltzu, tanga a
quickstepu.
produces:
Voix ambiguë d’un cœur qui au zéphyr
préfère les jattes de kiwi. Voyez le
brick géant que j’examine près du
wharf. Monsieur Jack, vous
dactylographiez bien mieux que votre
ami Wolf. Eble ĉiu kvazaŭ‐deca
fuŝĥoraĵo ĝojigos homtipon.. Laŭ
Ludoviko Zamenhof bongustas freŝa
ĉeĥa manĝaĵo kun spicoj. Nechť již
hříšné saxofony ďáblů rozezvučí síň
úděsnými tóny waltzu, tanga a
quickstepu.
If you delete the line
echo '.nh' # no hyphenation
you will get hyphenated text:
Voix ambiguë d’un cœur qui au zéphyr
préfère les jattes de kiwi. Voyez le
brick géant que j’examine près du
wharf. Monsieur Jack, vous dactylo‐
graphiez bien mieux que votre ami
Wolf. Eble ĉiu kvazaŭ‐deca fuŝĥoraĵo
ĝojigos homtipon.. Laŭ Ludoviko Za‐
menhof bongustas freŝa ĉeĥa manĝaĵo
kun spicoj. Nechť již hříšné saxo‐
fony ďáblů rozezvučí síň úděsnými
tóny waltzu, tanga a quickstepu.
You could use par:
par -j -w37 < inputfile
The -j option justifies paragraphs, and -w sets the maximum output line length.
For your input, it'd produce:
This is a simple text file containing
lines of text with different width
but I would like to justify them. Any
idea ?
An alternative would be to use emacs:
emacs -batch inputfile --eval '(set-fill-column 37)' --eval '(fill-region (point-min) (point-max) (quote full))' -f save-buffer
This would also produce:
This is a simple text file containing
lines of text with different width
but I would like to justify them. Any
idea ?
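As a further alternative that counts accented characters correctly, here is a minimal Python sketch that wraps greedily with the standard textwrap module and then spreads the leftover spaces across the gaps of every line except the last. `justify` is a hypothetical helper name. Because Python measures str length in code points rather than bytes, precomposed letters such as é count as one column; combining characters or double-width CJK text would still need a proper display-width calculation.

```python
import textwrap


def justify(text, width):
    """Wrap text to `width` and pad the gaps so every line but the last is exactly `width` wide."""
    # Collapse the original line breaks and runs of spaces into single spaces.
    normalized = " ".join(text.split())
    lines = textwrap.wrap(normalized, width=width, break_long_words=False)
    result = []
    for i, line in enumerate(lines):
        words = line.split(" ")
        gaps = len(words) - 1
        if i == len(lines) - 1 or gaps == 0:
            # The last line (or a line holding a single word) stays left-aligned.
            result.append(line)
            continue
        spaces = width - sum(len(w) for w in words)
        base, extra = divmod(spaces, gaps)
        # The first `extra` gaps receive one more space than the others.
        padded = "".join(
            word + " " * (base + (1 if g < extra else 0))
            for g, word in enumerate(words[:-1])
        )
        result.append(padded + words[-1])
    return "\n".join(result)
```

For the question's input, `print(justify(open("textfile", encoding="utf-8").read(), 37))` should break the lines at the same places as the par output above, with the second line padded out to 37 columns.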
