I installed Hue with Cloudera Manager on AWS and uploaded some directories with a few files in them. When I am in the /user/hdfs path, there are directories like project1 and project2. If I search for "project", the project directories show up as results. But if I search for files inside the project directories, like file1, I get no results. I have looked at the config files for HDFS, Hue, and Solr, but cannot find a configuration parameter for this. How can I fix this so that I can search deeper into HDFS with the Hue File Browser?
Hue doesn't use Solr for the File Browser; it uses WebHDFS, which can only search within the current folder.
The search within the File Browser is just a filtering of the files in the current directory, not a full-text search across all the directories.
Currently, the top search only covers tables and documents: http://gethue.com/realtime-catalog-search-with-hue-and-apache-atlas/
In the future, it will be extended to leverage the File search of Apache Atlas.
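For context, the listing behind the File Browser is a per-directory WebHDFS call along these lines (a sketch; the host is a placeholder and 50070 is only the classic default NameNode web port). Since LISTSTATUS returns the entries of one directory at a time, the search box can only filter that single listing:

# list one HDFS directory over WebHDFS (placeholder host and port)
curl -s 'http://<namenode-host>:50070/webhdfs/v1/user/hdfs/project1?op=LISTSTATUS&user.name=hdfs'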
So I have a folder which contains a bunch of files (source code) and other folders. I want to search this folder for a word, say faceGroup, and get all the files that contain this word, along with their locations. How can I do this effectively and efficiently (maybe using the Terminal or Spotlight)?
You can use the third-party app EasyFind. It will search in files and folders and list where all matches occur. You can also narrow your search to a specific disk or folder. EasyFind is free and available from the App Store.
You can use this command to search for a folder:
mdfind kind:folder "Huddl-Backend"
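If the goal is to search file contents rather than folder names, the same mdfind tool (or plain grep) can do that from the Terminal; the folder path below is just an example:

# Spotlight content search restricted to one folder
mdfind -onlyin ~/projects/my-app "faceGroup"
# grep alternative: -r recurses, -l prints only the paths of matching files
grep -rl "faceGroup" ~/projects/my-app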
In my current project, using IBM Content Collector 4.0.1 SP5 with IBM FileNet P8 Content Engine 5.2.1, I need to collect files from the file system and add them to a certain P8 object store.
Report files are added into eight fixed folders, and under them the structure grows dynamically over time, with further nested folders following the form yyyy/mm.
I started from the "FS to P8 Archiving (Replicate File System and Detect Duplicates).ctms" task route example in order to have the structure replicated in my object store.
Let's focus on the "P8 File Document in Folder" task, and particularly on its File in Folder Options.
The problem is that the path created in my repository also contains the drive letter (e.g. E:\Report\AMM_000001_00001\2017\05), whereas I would like to have only the folder structure replicated, starting from the Report folder.
How can I achieve this?
Should I use regular expressions for this?
I managed to achieve the desired result using regular expressions in the "P8 File Document In Folder" task, like this:
Purpose: Get a folder path without a drive letter.
Regular expression: ^[^\\]*(.*)
Replacement string: $1
Sample text: C:\folder 1\folder 2
Sample result: \folder 1\folder 2
as described here.
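For anyone who wants to sanity-check the idea outside of Content Collector, the same "drop everything before the first backslash" substitution can be tried on the command line (illustration only, not an ICC feature):

# strip the leading drive-letter segment, keeping the path from the first backslash onward
printf '%s\n' 'C:\folder 1\folder 2' | sed 's/^[^\\]*//'
# prints: \folder 1\folder 2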
I am facing this issue even after adding the .jar file to the lib directory; the expected outcome is still not there. Here is the question:
In the Response data tab, this error message is displayed:
"Missing tika-app.jar in classpath. Unable to convert to plain text this kind of document.Download the tika-app-x.x.jar file from http://tika.apache.org/download.html"
How do I solve this issue? I have already put the file in the /lib directory.
Go to the Apache Tika download page
Click on the Mirrors link for tika-app-x.xx.jar
Choose the closest mirror and store the file somewhere on the JMeter classpath (it is enough to drop the file into the "lib" folder of your JMeter installation)
Restart JMeter to pick the Tika jar up
Select Document from the View Results Tree listener drop-down menu
You should now be able to see the contents of the binary files under the View Results Tree listener. Check out How to Extract Data From Files With JMeter for a somewhat more detailed explanation and to learn how to perform correlations/assertions on binary responses.
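If you prefer to do the download-and-drop-into-lib part from the command line, something like this works (a sketch: the JMeter path and jar name are assumptions, and the actual URL has to be copied from the mirror link on the download page):

# adjust JMETER_HOME and the jar version to your setup
JMETER_HOME=/opt/apache-jmeter-5.6
curl -L -o "$JMETER_HOME/lib/tika-app-x.x.jar" '<mirror URL copied from the download page>'
# restart JMeter afterwards so it picks the jar up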
When I want to make sure the jar is added,
I open JMeter in GUI mode,
go to the Test Plan and click Browse next to "Add directory or jar to classpath",
and add the jar manually.
This will make sure the jar is used on execution.
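If you would rather not click through the GUI each time, JMeter also has a user.classpath property that can point at extra jars (a sketch; the paths are examples, and it's worth checking the comments in jmeter.properties for your version):

# append the property to user.properties inside the bin folder of your JMeter installation
echo 'user.classpath=/opt/jars/tika-app-x.x.jar' >> /opt/apache-jmeter-5.6/bin/user.properties
# restart JMeter so the property change takes effect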
As per http://jmeter.apache.org/usermanual/component_reference.html#View_Results_Tree:
A requirement to the Document view is to download the Apache Tika binary package (tika-app-x.x.jar) and put this in JMETER_HOME/lib directory.
So you need to download the Apache Tika app from here:
https://tika.apache.org/download.html
and put it in the jmeter/lib folder.
The goal is to index uploaded files and search for text within them.
Current setup:
MediaWiki 1.27
PostgreSQL 9.4
Elasticsearch 1.7.5
MW-Extension CirrusSearch 1.27
MW-Extension Elastica (master)
The Elasticsearch search is working for wiki pages and for uploaded files. But what do I have to do to index and search for text within the uploaded files (PDF, DOC, ...)?
You need a media handler which can extract the text; see MediaHandler::getEntireText. For PDF, PdfHandler does this; I imagine extensions exist for other common formats as well.
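Also worth noting: after installing a text-extracting handler such as PdfHandler, the already-uploaded files typically need to be re-indexed so CirrusSearch picks up the extracted text. A rough sketch based on the bootstrap scripts from the CirrusSearch README (script locations and flags can differ between versions, so check the README that ships with your copy):

# run from the MediaWiki installation directory
php extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipLinks --indexOnSkip
php extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipParse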
I used this plugin. One disadvantage of it is that it uses too much space, so later in my project we migrated to Tika (the .NET port version), which is what the mapper plugin uses.
I started playing with Elasticsearch. I want to create an index for text files: I have multiple text files in a folder, and I want to index them so that I can perform text search on these files. Is there a way to do this from the command line? Please guide me with an example.
Yes, you can, by using the FS river plus the mapper attachment plugin. Here is a link to the source page.
I ran a few tests with it a little while ago. It works fine. Be aware though, that the file has to be local for this to work (even if you can mount a remote file to a local path).
Hope this helps.
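For a very small setup you can also skip the river and feed files to the mapper attachment plugin by hand; the following is a minimal sketch assuming an Elasticsearch 1.x node on localhost with the plugin installed (the index, type, and field names are made up, and base64 -w 0 is the GNU flag for disabling line wrapping):

# create an index whose "file" field is of the attachment type
curl -XPUT 'localhost:9200/textfiles' -d '{
  "mappings": { "doc": { "properties": { "file": { "type": "attachment" } } } }
}'
# index one file: the attachment field expects base64-encoded content
curl -XPUT 'localhost:9200/textfiles/doc/1' -d "{ \"file\": \"$(base64 -w 0 /path/to/example.txt)\" }"
# full-text search on the extracted content
curl -XGET 'localhost:9200/textfiles/_search?q=file:elasticsearch'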