Deeplearning4J slow on Word2Vec - performance

I want to try word2vec with this configuration:
compile "org.deeplearning4j:deeplearning4j-core:1.0.0-beta2"
compile "org.nd4j:nd4j-native-platform:1.0.0-beta2"
//compile "org.nd4j:nd4j-cuda-9.2-platform:1.0.0-beta2"
compile group: 'org.deeplearning4j', name: 'deeplearning4j-nlp', version: '1.0.0-beta2'
compile group: 'org.deeplearning4j', name: 'deeplearning4j-ui_2.11', version: '1.0.0-beta2'
SentenceIterator iter = new BasicLineIterator(new File("E:/temp/text_1.txt"));
TokenizerFactory t = new DefaultTokenizerFactory();
t.setTokenPreProcessor(new CommonPreprocessor());
Word2Vec vec = new Word2Vec.Builder()
.minWordFrequency(20)
.iterations(1)
.layerSize(150)
.seed(42)
.windowSize(5)
.iterate(iter)
.tokenizerFactory(t)
.allowParallelTokenization(true)
.batchSize(1000)
.workers(8)
.elementsLearningAlgorithm(new SkipGram<>())
.build();
vec.fit();
The file with the sentences is about 15GB and has one sentence per line.
22:33:07.116 [main] INFO o.d.m.w.wordstore.VocabConstructor - Sequences checked: [200000]; Current vocabulary size: [48699]; Sequences/sec: 8298,07; Words/sec: 69217,82;
How can I tune it so that it's not so slow? It took over 24 hours just to build the vocab.
These are some lines from the text file I want to process:
jeden abend sieht sie den schatten am fenster
dann weht ein eisiger hauch ins zimmer
der unheimliche besucher laesst sich nicht abwehren
bei seinem anblick erstarrt die frau vor entsetzen
denn sie kann nicht begreifen dass ploetzlich der mann vor ihr steht den sie vor vielen jahren begraben hat
dem unheimlichen besucher eine falle zu stellen

Related

Gstreamer: Record video stream and save single frame image parallel

I have a script that records video files from multiple USB-connected webcams;
it records continuously with splitmuxsink. (I can record up to 4 videos in parallel until the maximum USB bandwidth is reached.)
Now I tried to save a single image from the same video source in parallel
(gst-launch v4l2src ! filesink location=file.jpg)
The result was an error: "device or resource is busy".
Question: Is it possible at all to do both in parallel, i.e. record a video stream and save single images?
Setting pipeline to PAUSED ...
Pipeline is live and does not need PREROLL ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
ERROR: from element /GstPipeline:pipeline0/GstV4l2Src:v4l2src0: Device '/dev/video0' is busy
Additional debug info:
gstv4l2object.c(3754): gst_v4l2_object_set_format_full (): /GstPipeline:pipeline0/GstV4l2Src:v4l2src0:
Call to S_FMT failed for YUYV @ 2304x1296: Device or resource busy
Execution ended after 0:00:00.002511906
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

How to get only specific fields from Elasticsearch

I have an Elasticsearch index with a field "titile": Rolex. I want to write a DSL query that saves
the ratemax and ratemin fields as variables.
Can this be done? Here is my sample JSON:
"vente": [
"62402c696a271a7d0ceeef2f"
],
"status": [
0
]
},
"ignored_field_values": {
"description.keyword": [
"ROLEX \nPRINCE BRANCARD \"EATON QUARTER CENTURY CLUB\" \nVERS 1930 \nMontre bracelet en or jaune 14K sur cuir. \nBOITIER : rectangulaire curvex cintré en or jaune 14K (belle oxydation de l'or). Fond gravé d'une dédicace: \"Presented to Fred Peartree to mark a quarter century of continuous service with the T. Eaton, 1916-1941.\" \nCADRAN : duodial, argenté deux tons, personnalisé Eaton, inscrit \"1/4 Century Club\" à la place des index. Aiguilles en acier bleui. \nMOUVEMENT: mécanique, certifié chronometre, Ultra-Prima. \nBRACELET : cuir avec boucle ardillon en or Rolex. \nBoîtier et mouvement signés Rolex. \nNumérotée 53207. \nDIM. 42 x 24 mm. \n \nCette montre fait partie d'une série personnalisée par un industriel canadien nommé Eaton. Cet entrepreneur avait l'habitude d'en faire cadeau à ses employés pour les remercier de leur fidélité. \nA 14K gold manual winding wristwatch by Rolex from the 30's."
]
}
}

Laravel Http client

Could someone help me with the Laravel Http client? I call this endpoint
http://projects.knmi.nl/klimatologie/uurgegevens/getdata_uur.cgi?stns=240&vars=RH&start=2020071500&end=2020072200 and store the response in the $result variable. My problem is that when I dump $result->body(), I don't know how to get the data without all the label lines starting with '#'. Thanks.
Sample return data:
# BRON: KONINKLIJK NEDERLANDS METEOROLOGISCH INSTITUUT (KNMI)
# Opmerking: door stationsverplaatsingen en veranderingen in waarneemmethodieken zijn deze tijdreeksen van uurwaarden mogelijk inhomogeen! Dat betekent dat deze reeks van gemeten waarden niet geschikt is voor trendanalyse. Voor studies naar klimaatverandering verwijzen we naar de gehomogeniseerde reeks maandtemperaturen van De Bilt <http://www.knmi.nl/klimatologie/onderzoeksgegevens/homogeen_260/index.html> of de Centraal Nederland Temperatuur <http://www.knmi.nl/klimatologie/onderzoeksgegevens/CNT/>.
#
#
# STN LON(east) LAT(north) ALT(m) NAME
# 240: 4.790 52.318 -3.30 SCHIPHOL
#
# YYYYMMDD = datum (YYYY=jaar,MM=maand,DD=dag);
# HH = tijd (HH=uur, UT.12 UT=13 MET, 14 MEZT. Uurvak 05 loopt van 04.00 UT tot 5.00 UT;
# RH = Uursom van de neerslag (in 0.1 mm) (-1 voor <0.05 mm);
#
# STN,YYYYMMDD, HH, RH
#
240,20200715, 24, 0
240,20200716, 24, 0
240,20200717, 24, 0
240,20200718, 24, 0
240,20200719, 24, 0
240,20200720, 24, 0
240,20200721, 24, 0
Code:
$result = Http::get('http://projects.knmi.nl/klimatologie/uurgegevens/getdata_uur.cgi?stns=240&vars=RH&start=2020071500&end=2020072200');
dd($result->body());
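One possible approach is to split the body into lines and keep only the lines that do not start with '#'. The sketch below shows that logic in Java rather than PHP, purely as an illustration; the class and method names (KnmiBody, dataRows) are hypothetical, and it assumes the body has the shape shown in the sample above. In Laravel, the same steps could presumably be written with PHP's explode, array_filter and str_getcsv applied to $result->body().

```java
import java.util.ArrayList;
import java.util.List;

public class KnmiBody {
    // Returns the data rows of a KNMI-style response body.
    // Blank lines and lines starting with '#' are skipped; every
    // remaining line is split on commas and each field is trimmed.
    public static List<String[]> dataRows(String body) {
        List<String[]> rows = new ArrayList<>();
        for (String line : body.split("\\R")) { // \R matches any line break
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            String[] fields = trimmed.split(",");
            for (int i = 0; i < fields.length; i++) {
                fields[i] = fields[i].trim();
            }
            rows.add(fields);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the response body shown in the question.
        String body = String.join("\n",
            "# STN,YYYYMMDD,   HH,   RH",
            "#",
            "  240,20200715,   24,    0",
            "  240,20200716,   24,    0");
        for (String[] row : dataRows(body)) {
            System.out.println(String.join(";", row));
        }
    }
}
```

Each returned row holds the station, date, hour and RH value as strings, so converting them to numbers or associating them with the column names from the "# STN,YYYYMMDD, HH, RH" header line is left to the caller.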

Rmarkdown with Rstudio doesn't show the changes

I'm using RStudio to produce an R usage manual. However, after making changes to the YAML, when I run Build Book (bookdown::pdf_book) the output does not show the changes. In fact, the output still carries the date 2020/01/20.
My YAML before the change:
---
title: "Manual de Econometria com R"
subtitle: "Centro de Ciências Sociais Aplicadas \nUniversidade Federal da Paraíba \nParaíba, Brasil"
author: "Alexandre Loures"
date: "`r Sys.Date()`"
output: bookdown::gitbook
site: bookdown::bookdown_site
documentclass: book
cover-image: images/logo-R.png
bibliography: [econometrics.bib]
biblio-style: apalike
link-citations: yes
colorlinks: yes
---
My YAML after the change:
---
title: "Manual de Econometria com R"
subtitle: "Programa de Pós-Graduação em Economia \nCentro de Ciências Sociais Aplicadas \nUniversidade Federal da Paraíba \nJoão Pessoa, Paraíba, Brasil"
author: "Alexandre Loures"
date: "`r Sys.Date()`"
output: bookdown::gitbook
site: bookdown::bookdown_site
documentclass: book
cover-image: images/logo-R.png
bibliography: [econometrics.bib]
biblio-style: apalike
link-citations: yes
colorlinks: yes
---
I don't know if it's related to the fact that I uninstalled MiKTeX and then installed it again.
Thanks in advance for your attention!

Authors and affiliations in the YAML of RMarkdown

I know this question has been asked before in this forum (1, 2, 3). Before you mark this as a duplicate: I tried all the answers with no success. Most of those questions were asked long ago, and pandoc updates since then might affect today's results.
The issue is that I am writing a scientific paper using RMarkdown and would like to export the result to HTML, PDF or Word files.
More importantly, there are 12 authors signing the paper. Some of the authors have more than one affiliation, and some authors share the same affiliation.
My question is very clear: How can I edit the YAML to include all the authors with all their affiliations, so that I can export to different formats (HTML, PDF, DOC)?
I tried this YAML:
---
title: "My title"
author:
  - name: Mario Modesto-Mata^1,2^
    email: paleomariomm#gmail.com
  - name: Christopher^1^
  - name: Seaghán Mhartain^2^
  - name: Rita Yuri Ynoue^1^
address:
  - code: 1
    address: Instituto de Astronomía, Geofísica e Ciências Atmosféricas, Universidade de São Paulo
  - code: 2
    address: Faculdade de Medicina, Universidade de São Paulo
date: "1 October 2018"
output:
  pdf_document:
    number_sections: yes
    toc: yes
    toc_depth: 4
  word_document:
    toc: yes
  html_document:
    css: Scripts accesorios/estiloboton.css
    number_sections: yes
    theme: sandstone
    toc: yes
    toc_depth: 4
bibliography: references.bib
csl: science.csl
---
PDF output
This is what I see when I export the .Rmd file to PDF:
Instead of the authors, I see true and no affiliations.
HTML output
I see the authors and not all the affiliation numbers. The affiliations themselves remain hidden.
DOCX output
Neither names nor affiliations appear in the final Word file.
My system
I am using the latest RStudio version (Version 1.1.453), running on Linux Mint 19 Cinnamon.
UPDATE: original example
---
title: "My title"
author:
  - Mario Modesto-Mata:
      email: paleomariomm#gmail.com
      institute: [cenieh, ucl1, ppex]
      correspondence: true
  - M. Christopher Dean:
      institute: [ucl2, nhm]
  - Yuliet Quintino:
      institute: ubu
  - Rebeca García-González:
      institute: ubu
  - Rodrigo S. Lacruz:
      institute: nyu
  - Timothy G. Bromage:
      institute: nyu
  - Cecilia García-Campos:
      institute: [cenieh, ucl1]
  - Marina Martínez de Pinillos:
      institute: cenieh
  - Laura Martín-Francés:
      institute: [bor, cenieh]
  - María Martinón-Torres:
      institute: [cenieh, ucl1]
  - Eudald Carbonell:
      institute: [iphes, urv]
  - Juan Luis Arsuaga:
      institute: [isciii, ucm]
  - José María Bermúdez de Castro:
      institute: [cenieh, ucl1]
institute:
  - cenieh: Centro Nacional de Investigación sobre la Evolución Humana (CENIEH), Paseo Sierra de Atapuerca 3, 09002, Burgos, Spain
  - ucl1: Department of Anthropology, University College London, London, WC1H 0BW, UK
  - ucl2: Department of Cell and Developmental Biology, University College London, Gower Street, London, WC1E 6BT, UK
  - ubu: Laboratorio de Evolución Humana, Unierisdad de Burgos, Edificio I+D+i, Burgos, Spain
  - ppex: Equipo Primeros Pobladores de Extremadura, Casa de Cultura Rodríguez Moñino, Cáceres, Spain
  - nhm: Centre for Human Evolution Research (CHER), Department of Earth Sciences, Natural History Museum, London, SW7 5BD, UK
  - nyu: New York University
  - bor: De la Préhistoire à l'Actuel - Culture, Environnement et Anthropologie, University of Bordeaux, CNRS, MCC, PACE, UMR 5199 F_33615, Pessac Cedex, France
  - iphes: Institut Català de Paleoecologia Humana i Evolució Social (IPHES), Zona Educacional 4, Campus Sescelades, Edifici W3, Universitat Rovira i Virgili, Tarragona, Spain
  - urv: Àrea de Prehistòria, Universitat Rovira i Virgili, Avinguda de Catalunya 35, 43002, Tarragona, Spain
  - isciii: Centro mixto UCM-ISCIII de Evolución y Comportamiento humanos, Madrid, Spain
  - ucm: Departamento de Geodinámica, Estratigrafía y Paleontología, Facultad de Ciencias Geológicas, Universidad Complutense de Madrid, Spain
date: "1 October 2018"
output:
  pdf_document:
    number_sections: yes
    toc: yes
    toc_depth: 4
    pandoc_args:
      - '--lua-filter=scholarly-metadata.lua'
      - '--lua-filter=author-info-blocks.lua'
  html_document:
    css: Scripts accesorios/estiloboton.css
    number_sections: yes
    theme: sandstone
    toc: yes
    toc_depth: 4
  word_document:
    toc: yes
    pandoc_args:
      - '--lua-filter=scholarly-metadata.lua'
      - '--lua-filter=author-info-blocks.lua'
bibliography: references.bib
csl: science.csl
---
There is, to the best of my knowledge, no one-size-fits-all solution as of now.
If the target were only PDF, I'd suggest rticles by RStudio. It's great.
A solution which also works with docx is more difficult. One possibility is to use pandoc Lua filters. The repository collecting useful filters contains two filters which will help you: scholarly-metadata and author-info-blocks. (Disclosure: I wrote these.)
Place the .lua files in your directory, change the YAML structure a bit, and instruct pandoc to run the filters:
---
title: "My title"
author:
  - Mario Modesto-Mata:
      email: paleomariomm#gmail.com
      institute: [astro, med]
      correspondence: true
  - name: Christopher
    institute: astro
  - name: Seaghán Mhartain
    institute: med
  - name: Rita Yuri Ynoue
    institute: astro
institute:
  - astro: Instituto de Astronomía, Geofísica e Ciências Atmosféricas, Universidade de São Paulo
  - med: Faculdade de Medicina, Universidade de São Paulo
date: "1 October 2018"
output:
  word_document:
    toc: yes
    pandoc_args:
      - '--lua-filter=scholarly-metadata.lua'
      - '--lua-filter=author-info-blocks.lua'
  pdf_document:
    number_sections: yes
    toc: yes
    toc_depth: 4
    pandoc_args:
      - '--lua-filter=scholarly-metadata.lua'
      - '--lua-filter=author-info-blocks.lua'
---
This will be the PDF output:
while this is what it looks like in Word:
The affiliation and contact information is added to the body text, which is why the toc is displayed above it.

Resources