Loading Loughran finance sentiment into Tidytext

I'm using the sentiment tools in Tidytext for the first time, and would like to use the Loughran dictionary. After several attempts, the closest I get is this error:
get_sentiments("loughran")
Error in get_sentiments("loughran") : could not find function "%>%"
Is Loughran a Tidytext offering or must it be externally retrieved/loaded? Thank you.

The Loughran sentiment lexicon is in the version of tidytext that is on GitHub but not yet on CRAN. We will be releasing a new version on CRAN in the near future! In the meantime, you can install the current development version of tidytext from GitHub using devtools:
library(devtools)
install_github("juliasilge/tidytext")
library(tidytext)
get_sentiments("loughran")
#> # A tibble: 4,149 × 2
#>    word         sentiment
#>    <chr>        <chr>
#>  1 abandon      negative
#>  2 abandoned    negative
#>  3 abandoning   negative
#>  4 abandonment  negative
#>  5 abandonments negative
#>  6 abandons     negative
#>  7 abdicated    negative
#>  8 abdicates    negative
#>  9 abdicating   negative
#> 10 abdication   negative
#> # ... with 4,139 more rows
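As a side note on the error message itself: could not find function "%>%" only means the pipe operator isn't on the search path; %>% comes from magrittr and is re-exported by dplyr, so loading either of those first makes it available. A minimal sketch (assuming a tidytext version that ships the lexicon):
library(dplyr)     # provides %>% (re-exported from magrittr)
library(tidytext)
get_sentiments("loughran") %>%
  count(sentiment)  # e.g., tally the lexicon by sentiment class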

Related

Problems using R after update in loadNamespace

I am very new to R, working mostly with the Seurat package to evaluate my single-cell RNA-seq data.
Today I wanted to update R and RStudio. After that I had problems using the installed packages. This is my problem:
> install.packages("Seurat", dependencies = TRUE)
Installing package into ‘C:/Users/benne/AppData/Local/R/win-library/4.2’
(as ‘lib’ is unspecified)
Warning in install.packages :
dependencies ‘S4Vectors’, ‘SummarizedExperiment’, ‘SingleCellExperiment’, ‘MAST’, ‘DESeq2’, ‘BiocGenerics’, ‘GenomicRanges’, ‘GenomeInfoDb’, ‘IRanges’, ‘rtracklayer’, ‘monocle’, ‘Biobase’, ‘limma’ are not available
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.2/Seurat_4.2.0.zip'
Content type 'application/zip' length 2376157 bytes (2.3 MB)
downloaded 2.3 MB
package ‘Seurat’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\benne\AppData\Local\Temp\RtmpIlveV0\downloaded_packages
> library(Seurat)
Error: package or namespace load failed for ‘Seurat’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):
there is no package called ‘spatstat.data’
I think there is no problem with the installation of the Seurat package, but I cannot make the library() function work. I found other topics that tried to solve this problem, but they did not help me.
What could be the problem? With the old R/RStudio version everything worked well. After the update I had to install RTools42 because it said I have to. I have never had to do that before, so why today??
I really hope you guys can help me. I am totally lost!!
Attached is my sessionInfo():
> sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)
Matrix products: default
locale:
[1] LC_COLLATE=German_Germany.utf8 LC_CTYPE=German_Germany.utf8 LC_MONETARY=German_Germany.utf8
[4] LC_NUMERIC=C LC_TIME=German_Germany.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] httr_1.4.4 tidyr_1.2.1 viridisLite_0.4.1 jsonlite_1.8.2 splines_4.2.1
[6] leiden_0.4.3 shiny_1.7.2 sp_1.5-0 ggrepel_0.9.1 globals_0.16.1
[11] pillar_1.8.1 lattice_0.20-45 glue_1.6.2 reticulate_1.26 digest_0.6.29
[16] RColorBrewer_1.1-3 promises_1.2.0.1 colorspace_2.0-3 plyr_1.8.7 cowplot_1.1.1
[21] htmltools_0.5.3 httpuv_1.6.6 Matrix_1.5-1 pkgconfig_2.0.3 listenv_0.8.0
[26] purrr_0.3.5 xtable_1.8-4 patchwork_1.1.2 scales_1.2.1 RANN_2.6.1
[31] later_1.3.0 Rtsne_0.16 spatstat.utils_2.3-1 tibble_3.1.8 generics_0.1.3
[36] ggplot2_3.3.6 ellipsis_0.3.2 ROCR_1.0-11 pbapply_1.5-0 SeuratObject_4.1.2
[41] lazyeval_0.2.2 cli_3.4.1 survival_3.3-1 magrittr_2.0.3 mime_0.12
[46] future_1.28.0 fansi_1.0.3 parallelly_1.32.1 MASS_7.3-57 ica_1.0-3
[51] progressr_0.11.0 tools_4.2.1 fitdistrplus_1.1-8 data.table_1.14.2 lifecycle_1.0.3
[56] matrixStats_0.62.0 stringr_1.4.1 plotly_4.10.0 munsell_0.5.0 cluster_2.1.3
[61] irlba_2.3.5.1 compiler_4.2.1 rlang_1.0.6 scattermore_0.8 grid_4.2.1
[66] ggridges_0.5.4 RcppAnnoy_0.0.19 htmlwidgets_1.5.4 igraph_1.3.5 miniUI_0.1.1.1
[71] gtable_0.3.1 codetools_0.2-18 reshape2_1.4.4 R6_2.5.1 gridExtra_2.3
[76] zoo_1.8-11 dplyr_1.0.10 fastmap_1.1.0 future.apply_1.9.1 rgeos_0.5-9
[81] utf8_1.2.2 KernSmooth_2.23-20 stringi_1.7.8 parallel_4.2.1 Rcpp_1.0.9
[86] sctransform_0.3.5 vctrs_0.4.2 png_0.1-7 tidyselect_1.2.0 lmtest_0.9-40
Thank you so much!
I tried to find out what the problem could be. I hoped that installing RTools42 might fix it, but it did not make things better; the error still occurs.
The issue occurred for me as well after upgrading to R-4.2.1. The following steps helped me resolve it:
Restart the computer after the successful installation of RTools.
Run the following commands:
install.packages('spatstat.data')
install.packages('spatstat.core')
After RTools wraps up its compilation, as mentioned in the answer by Maso Sato, library(Seurat) should load fine!
I had a similar problem with my installation of R, RStudio, and Seurat today (2022/10/26).
(I did not have a problem on another computer a few weeks ago).
install.packages('Seurat') said that I should install RTools.
I did so, and I got a similar error message to yours when executing library(Seurat).
Then, I executed install.packages('spatstat.data').
RTools had to recompile various things (gcc), but in the end, library(Seurat) ran smoothly.
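A note on the "dependencies ... are not available" warning in the question: S4Vectors, SummarizedExperiment, DESeq2 and the rest of that list are Bioconductor packages, which install.packages() cannot find on CRAN. If any of them are genuinely needed, BiocManager is the usual route (a hedged sketch; the package selection here is illustrative):
# These are Bioconductor, not CRAN, packages; install them via BiocManager.
install.packages("BiocManager")
BiocManager::install(c("S4Vectors", "SummarizedExperiment", "DESeq2"))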

LinAlgError: not positive definite, even with jitter. When using a conda environment instead of pip

I am trying to fit some random data to a GP with the RBF kernel, using the GPy package. When I change the active dimensions, I get the LinAlgError: not positive definite, even with jitter error. This error is generated only with a conda environment. When I use pip, I have never run into this error. Has anyone come across this?
import numpy as np
import GPy
import random

def func(x):
    return np.sum(np.power(x, 5) - np.power(x, 3))

# 20 random data points with 10 dimensions
random.seed(2)
random_sample = [[random.uniform(0, 3.4) for i in range(10)] for j in range(20)]

# get the first random sample as an observed data point
y = np.array([func(random_sample[0])])
X = np.array([random_sample[0]])
y.shape = (1, 1)
X.shape = (1, 10)

# different sets of active dimensions
set_dim = [[np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])],
           [np.array([0, 1]), np.array([2, 3]), np.array([4, 5]), np.array([6, 7]), np.array([8, 9])],
           [np.array([0, 1, 2, 3, 4]), np.array([5, 6, 7, 8, 9])],
           [np.array([0, 1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])]]

for i in range(len(set_dim)):
    # new kernel based on the active dims
    k = GPy.kern.Add([GPy.kern.RBF(input_dim=len(set_dim[i][x]), active_dims=set_dim[i][x])
                      for x in range(len(set_dim[i]))])
    # grow the data set with the next random sample
    y = np.concatenate((y, np.array([[func(random_sample[i + 1])]])))
    X = np.concatenate((X, np.array([random_sample[i + 1]])))
    model = GPy.models.GPRegression(X, y, k)
    model.optimize()
[Screenshots in the original post: the output of conda list for gpy, scipy and numpy, and the installation paths of those packages.]
Possible Channel-Mixing Issue
Sometimes package builds from different channels (e.g., anaconda versus conda-forge) are incompatible. The times I've encountered this, it happened when compiled symbols were referenced across packages, and the build stacks on the different channels used different symbol names, leading to missing symbols when mixing.
I can report that using the exact same package versions as OP, but prioritizing the Conda Forge channel builds, gives me reliable behavior. While not conclusive, this would be consistent with the issue somehow coming from the mixing of the Conda Forge build of GPy with otherwise Anaconda builds of dependencies (e.g., numpy, scipy). Specifically suggestive is the fact that I have the exact same GPy build and that module is where the error originates. At the same time, there is nothing in the error that immediately suggests this is a channel mixing issue.
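One quick way to check whether an environment mixes channels is to list the packages together with the channel each came from (a diagnostic sketch; the grep pattern is just illustrative):
conda list --show-channel-urls | grep -Ei 'gpy|numpy|scipy'
# a conda-forge build of GPy alongside defaults builds of numpy/scipy
# would be consistent with the channel-mixing hypothesis above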
Workaround
In practice, I avoid channel mixing issues by always using YAML definitions to create my environments. This is a helpful practice because it encourages one to explicitly state the channel priority as part of the definition and it makes Conda aware of your preference from the outset. The following environment definition works for me:
gpy_cf.yaml
name: gpy_cf
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.6
  - gpy=1.9.6
  - numpy=1.16.2
  - scipy=1.2.1
and using
conda env create -f gpy_cf.yaml
conda activate gpy_cf
Unless you really do need these exact versions, I would remove whatever versioning constraints are unnecessary (at the very least remove the patches).
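For example, a looser variant of the same definition, keeping only minor-version pins (illustrative, not tested against OP's exact setup):
name: gpy_cf
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.6
  - gpy=1.9
  - numpy=1.16
  - scipy=1.2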
Broken Version
For the record, this is the version that I can replicate the error with:
gpy_mixed.yaml
name: gpy_mixed
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.6
  - conda-forge::gpy=1.9.6
  - numpy=1.16.2
  - scipy=1.2.1
In this case, we force gpy to come from Conda Forge and let everything else source from the Anaconda (defaults) channel, similar to the configuration found in OP.

How to reproduce the results of Stanford neural parser?

I would like to run Stanford neural dependency parser which has very impressive performance like 92.0% UAS, 89.7% LAS (Chen & Manning, 2014). I tried to follow their instructions but got sad numbers: 66.2% UAS, 62.0% LAS. Could somebody please tell me what I did wrong?
The commands:
PENN_TEST_PATH="test.mrg"
CONLL_TEST_PATH="$PENN_TEST_PATH.dep"
cat penntree/23/* > $PENN_TEST_PATH
java -cp stanford-parser-full-2014-10-31/stanford-parser.jar edu.stanford.nlp.trees.EnglishGrammaticalStructure -originalDependencies -conllx -treeFile $PENN_TEST_PATH > $CONLL_TEST_PATH
java -cp stanford-parser-full-2014-10-31/stanford-parser.jar edu.stanford.nlp.parser.nndep.DependencyParser -model stanford-parser-full-2014-10-31/PTB_Stanford_params.txt.gz -testFile $CONLL_TEST_PATH
Output:
Loading depparse model file: stanford-parser-full-2014-10-31/PTB_Stanford_params.txt.gz ...
dict=44392
pos=48
label=46
embeddingSize=50
hiddenSize=200
numTokens=48
preComputed=422468
###################
#Transitions: 91
#Labels: 45
ROOTLABEL: root
PreComputed 100000, Elapsed Time: 1.789 (s)
Initializing dependency parser done [2.6 sec].
Test File: test.mrg.dep
UAS = 66.2110
LAS = 62.0160
DependencyParser tagged 56684 words in 2416 sentences in 3.4s at 16559.7 w/s, 705.8 sent/s.
References
Chen, D., & Manning, C. (2014). A Fast and Accurate Dependency Parser using Neural Networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 740–750). Doha, Qatar: Association for Computational Linguistics.
I found the problem. I need to call edu.stanford.nlp.trees.EnglishGrammaticalStructure with the -basic option.
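Presumably the conversion step then looks like the following (a sketch of the command above with the -basic flag added; keeping the other flags unchanged is an assumption):
java -cp stanford-parser-full-2014-10-31/stanford-parser.jar edu.stanford.nlp.trees.EnglishGrammaticalStructure -originalDependencies -basic -conllx -treeFile $PENN_TEST_PATH > $CONLL_TEST_PATH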

TensorFlow (Mac OS X): can't determine number of CPU cores

There must be a simple setting for Mac OS X, to get rid of the following warning...something in .bash_profile?
>>> import tensorflow as tf
>>> sess = tf.Session()
can't determine number of CPU cores: assuming 4
I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 4
To provide explicit values for the relevant configuration options, you can do:
NUM_CORES = ... # Choose how many cores to use.
sess = tf.Session(
    config=tf.ConfigProto(inter_op_parallelism_threads=NUM_CORES,
                          intra_op_parallelism_threads=NUM_CORES))
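One way to pick a value for NUM_CORES is to ask Python for the machine's core count (an illustrative sketch, not part of the original answer):
import multiprocessing
NUM_CORES = multiprocessing.cpu_count()  # number of logical CPU cores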
This issue is present in the initial binary release of TensorFlow for Mac OS X, but should be fixed in this commit: https://github.com/tensorflow/tensorflow/commit/430a054d6134f00e5188906bc4080fb7c5035ad5
The fix will be included in the next binary release. In the meantime, you can try building from source, by following the instructions here: http://tensorflow.org/get_started/os_setup.md#installing_from_sources

sphinx config || config/sphinx.yml

my sphinx configuration is:
config/sphinx.yml:
development:
  bin_path: "/usr/local/bin"
  searchd_binary_name: searchd
  indexer_binary_name: indexer
but every time I run rake ts:index, I get:
Sphinx cannot be found on your system. You may need to configure the following
settings in your config/sphinx.yml file:
* bin_path
* searchd_binary_name
* indexer_binary_name
For more information, read the documentation:
http://freelancing-god.github.com/ts/en/advanced_config.html
Generating Configuration to config/development.sphinx.conf
Sphinx 2.0.1-beta (r2792)
Copyright (c) 2001-2011, Andrew Aksyonoff
Copyright (c) 2008-2011, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file 'config/development.sphinx.conf'...
indexing index 'post_core'...
collected 2 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 2 docs, 675 bytes
total 0.006 sec, 110510 bytes/sec, 327.43 docs/sec
skipping non-plain index 'post'...
total 6 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 12 writes, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
rotating indices: succesfully sent SIGHUP to searchd (pid=19438).
Generating Configuration to config/development.sphinx.conf
Sphinx 2.0.1-beta (r2792)
Copyright (c) 2001-2011, Andrew Aksyonoff
Copyright (c) 2008-2011, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file 'config/development.sphinx.conf'...
indexing index 'post_core'...
collected 2 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 2 docs, 675 bytes
total 0.006 sec, 105567 bytes/sec, 312.79 docs/sec
skipping non-plain index 'post'...
total 6 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 12 writes, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
rotating indices: succesfully sent SIGHUP to searchd (pid=19438).
So what's the problem? Why does rake output that it can't find Sphinx even though it's installed?
The warning from Thinking Sphinx could definitely be clearer... the problem is very likely to be how old your version of Thinking Sphinx is. Older TS versions don't know about Sphinx 2.0.x - so I'd recommend updating to the latest version of Thinking Sphinx (either 1.4.6 for Rails 1.2 and 2.x, or 2.0.5 for Rails 3).
There are two things that help to solve this problem. First, as Pat says, it is useful to update the Thinking Sphinx plugin or gem to the latest version (either 1.4.x for Rails 2, or 2.0.x for Rails 3). Second, it sometimes helps to specify the version of Sphinx in the configuration file (you can find it out by calling "indexer"), especially if Sphinx is running on a remote server and Thinking Sphinx does not have access to Sphinx locally:
production:
  ..
  version: 2.0.4 # <------- version of Sphinx on the remote server 192.168.1.10
  port: 9312
  address: 192.168.1.10
  ..
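For reference, the version string this setting needs is the one the indexer binary prints in its banner, already visible in the rake output above (banner copied from that output):
$ indexer
Sphinx 2.0.1-beta (r2792)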
I was facing the same issue and looked everywhere for an answer without any resolution.
The trick that worked for me was to install an older version of Sphinx: v0.9 instead of the latest beta.
Using the latest Thinking Sphinx with that version of Sphinx resolved the issue.
