I have a folder on my PC containing many EPUB and PDF files that I want to be able to search full-text.
I know Windows already has an indexing service, but I would like to apply more logic than a simple keyword search.
So I would like to import those EPUB and PDF files into Elasticsearch. Does anyone know of a script that can do this?
Elasticsearch has a plugin for mapping attachments, so I hope this helps you:
https://www.elastic.co/guide/en/elasticsearch/plugins/master/mapper-attachments.html
https://github.com/elastic/elasticsearch-mapper-attachments
It works fine for me.
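For reference, here is a rough sketch of what indexing with the plugin can look like from Python. It assumes Elasticsearch 2.x with mapper-attachments installed and the official Python client (pip install elasticsearch); the index name "library", the type "doc", and the folder path are placeholders of mine, not anything prescribed by the plugin:

```python
import base64
import os

from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes Elasticsearch on localhost:9200

# The "attachment" field type comes from the mapper-attachments plugin;
# it decodes the base64 payload and extracts text via Apache Tika.
es.indices.create(index="library", body={
    "mappings": {
        "doc": {
            "properties": {
                "file": {"type": "attachment"}
            }
        }
    }
})

folder = r"C:\ebooks"  # placeholder path
for name in os.listdir(folder):
    if name.lower().endswith((".pdf", ".epub")):
        with open(os.path.join(folder, name), "rb") as f:
            encoded = base64.b64encode(f.read()).decode("ascii")
        es.index(index="library", doc_type="doc",
                 body={"filename": name, "file": encoded})
```

Once the files are in, a match query against file.content should hit the extracted text.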
Related
I am looking at creating text search capabilities over the files in multiple GitHub repos. I am considering Elasticsearch and Logstash as options. Please suggest an approach and point me to any references or examples along these lines. Thanks!
I'm trying to index a lot of LaTeX and Markdown files, spread across different folders, into Elasticsearch from the command line.
So far I haven't been able to find a tutorial that gives me detailed information on how to do this.
Is there anyone with Elasticsearch experience who could help me out?
Thank you very much.
Collecting files is easy with Logstash.
But what are you trying to achieve: capturing the full LaTeX source, or just the raw text?
If you're only after the raw text, I'd use detex; you can actually call it from Logstash with the exec plugin. It should be pretty straightforward. A sketch of the idea follows.
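Here is the same idea sketched in Python rather than as a Logstash pipeline, in case that's easier to adapt. It assumes detex is on your PATH, Elasticsearch runs locally, and the elasticsearch-py 7.x client; the index name "notes" and the docs folder are placeholders:

```python
import pathlib
import subprocess

from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes Elasticsearch on localhost:9200

# Strip LaTeX markup with detex, then index the remaining raw text.
for tex in pathlib.Path("docs").rglob("*.tex"):  # placeholder folder
    raw = subprocess.run(["detex", str(tex)],
                         capture_output=True, text=True).stdout
    es.index(index="notes", body={"path": str(tex), "content": raw})

# Markdown is mostly plain text already, so it can be indexed as-is.
for md in pathlib.Path("docs").rglob("*.md"):
    es.index(index="notes",
             body={"path": str(md), "content": md.read_text(errors="ignore")})
```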
I have an existing xlsx file that I am trying to write data to programmatically. Is there a modern solution to this for Ruby? I also looked into the Google Sheets API, but it's Java and .NET only.
I've searched quite a bit, and so far have checked out the following gems with no luck:
https://github.com/roo-rb/roo
https://github.com/randym/axlsx
https://github.com/weshatheleopard/rubyXL
https://github.com/cxn03651/write_xlsx
In the meantime, it seems my best solution is to write to CSV and then import the CSV into the xlsx file, but that is hard to do programmatically going forward.
Any help would be appreciated, thanks.
I've started playing with Elasticsearch and want to create an index for text files: I have multiple text files in a folder and want to index them so that I can run text searches over them. Is there a way to do this from the command line or otherwise? Please guide me with an example.
Yes, you can, by using the FS river plus the mapper attachments plugin. Here is a link to the source page.
I ran a few tests with it a little while ago, and it works fine. Be aware, though, that the file has to be local for this to work (although you can mount a remote file to a local path). A sketch of a plain-text alternative follows.
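If your files are plain text, you don't strictly need the river or the attachment plugin at all; a short script can push the contents in directly. A minimal sketch, assuming a local Elasticsearch and the elasticsearch-py client; the index name "textfiles" and the folder are placeholders:

```python
import pathlib

from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes Elasticsearch on localhost:9200

# Index every .txt file in the folder as one document.
for txt in pathlib.Path("myfolder").glob("*.txt"):  # placeholder folder
    es.index(index="textfiles", body={
        "filename": txt.name,
        "content": txt.read_text(encoding="utf-8", errors="ignore"),
    })

# Example full-text search over the indexed content:
hits = es.search(index="textfiles",
                 body={"query": {"match": {"content": "some keyword"}}})
print(hits["hits"]["total"])
```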
Hope this helps.
I am trying to organize the files on a common file share in my department, which contains thousands of documents of various file types. My idea was to sort them by content-related keywords. Only a few files contain valid info in the Keywords file attribute provided by Windows, so my plan was to let a desktop search engine index the files (and their content) and then use the generated keywords from the index.
The problem is that I don't know how to read these generated keywords from the search index.
Neither Microsoft nor Copernic seems to provide any information on how to access their index files.
MSDN only documents how to query the Windows Search engine directly from your program, and the results contain only Windows file attributes and file information, not the generated keywords used for indexing.
Copernic does not seem to provide any info at all.
I would be very grateful for any idea on how to access these generated keywords.
Thank you in advance!
If Google Desktop Search is an option, you can use the Google Desktop Search API.
A more programming-intensive option is Lucene; somewhere in the middle is Nutch.