Hadoop webhdfs curl create file - hadoop

I'm using Ubuntu 12 with Hadoop 1.0.3, and I'm trying to create a file over WebHDFS with curl.
curl -i -X PUT "http://localhost:50070/webhdfs/v1/test.txt?op=CREATE
or use the
curl -i -X PUT -T /home/hadoop/TestFile/test.txt "http://localhost:50070/webhdfs/v1/test?op=CREATE"
Both commands return:
HTTP/1.1 307 TEMPORARY_REDIRECT
Is something missing from my hdfs-site.xml, or is there some other permission I haven't set?
Thanks!

According to the docs for Web HDFS, this is expected:
http://hadoop.apache.org/common/docs/stable/webhdfs.html#CREATE
When you make the first put, you'll be given a temporary redirect URL of the datanode to which you can then issue another PUT command to actually upload the file into HDFS.
The document also explains the reasoning behind this 2-step create method:
Note that the reason of having two-step create/append is for preventing clients to send out data before the redirect. This issue is addressed by the "Expect: 100-continue" header in HTTP/1.1; see RFC 2616, Section 8.2.3. Unfortunately, there are software library bugs (e.g. Jetty 6 HTTP server and Java 6 HTTP client), which do not correctly implement "Expect: 100-continue". The two-step create/append is a temporary workaround for the software library bugs.
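For reference, a minimal sketch of the two-step sequence using the asker's own paths (the second URL is whatever the Location header of the 307 response points to on a datanode):
curl -i -X PUT "http://localhost:50070/webhdfs/v1/test.txt?op=CREATE"
# read the Location header from the 307 response, then send the file there
curl -i -X PUT -T /home/hadoop/TestFile/test.txt "<Location URL from the 307 response>"
The first request transfers no data; it only hands back the datanode URL used by the second request.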

I know I'm replying very late, but anyone else looking for answers here will be able to see it.
Hi @Krishna Pandey, this is the new link for WebHDFS:
https://hadoop.apache.org/docs/r1.0.4/webhdfs.html#CREATE
You can refer to this blog for steps and commands https://wordpress.com/post/had00ping.wordpress.com/194

Related

How do I use the public swagger-generator docker image to generate a client?

We have a fully dockerized web app with a valid Swagger definition for the API. The API runs in its own docker container, and we're using docker-compose to orchestrate everything. I want to generate a Ruby client based on the Swagger definition located at http://api:8443/apidocs.json.
I've pored through the documentation here, which led me to Swagger's public Docker image for generating client and server code. Sadly, the documentation is lacking and offers no examples of actually generating a client with the Docker image.
The Dockerfile indicates its container runs a web service, which I can only assume is the dockerized version of http://generator.swagger.io. As such, I would expect to be able to generate a client with the following command:
curl -X POST -H "content-type:application/json" -d \
'{"swaggerUrl":"http://api:8443/apidocs"}' \
http://swagger-generator:8080/api/gen/clients/ruby
No luck here. I keep getting "invalid swagger definition" even though I've confirmed the swagger definition is valid with (npm -q install -g swagger-tools >/dev/null) && swagger-tools validate http://api:8443/apidocs.
Any ideas?
Indeed you are correct: the Docker image you're referring to is the same image used at http://generator.swagger.io.
The issue you're having is that the input parameter isn't correct.
To get it right, note that the swagger-generator has a web interface. So once you start it up, as the instructions say, open it in a browser. For example (replace GENERATOR_HOST with your machine's IP address):
docker run -d -e GENERATOR_HOST=http://192.168.99.100 -p 80:8080 swaggerapi/swagger-generator
Then you can open the Swagger UI at http://192.168.99.100.
The important part here is that you can use the UI to see the call syntax. If you're generating a client, go to http://192.168.99.100/#!/clients/generateClient, select the language you want to generate, and click the payload on the right. Replace the swaggerUrl field with the address of your server, and voilà.
You can use the output in the curl to figure out how to call from the command line. It should be easy from there.
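For example, the resulting call should look roughly like the original attempt, with swaggerUrl pointing at the full apidocs.json definition mentioned in the question (whether that path alone is your fix is an assumption; take the exact body from the UI):
curl -X POST -H "content-type:application/json" -d \
'{"swaggerUrl":"http://api:8443/apidocs.json"}' \
http://swagger-generator:8080/api/gen/clients/ruby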
Please keep in mind that just because a third-party tool says the swagger definition is valid doesn't mean it actually is. I don't think that's your issue here, but third-party tool mileage may vary...

Parcel not distributed but have active state ACTIVATING

In my Cloudera Manager 5.4.7, I tried to distribute the Spark package using parcels, but got stuck at activating 0%.
The error message stated "Parcel not distributed but have active state ACTIVATING". It seems to be in an inconsistent state.
Searching for the error message on Google, I followed the instructions here to use the Cloudera API to force deactivation, but that did not work either (it returned a 405 Method Not Allowed HTTP response code).
The API calls I tried are as follows:
http://master:7180/api/v10/clusters/myCluster/parcels/products/SPARK/versions/0.9.0-1.cdh4.6.0.p0.98 (this worked fine and returned a JSON object)
http://master:7180/api/v10/clusters/myCluster/parcels/products/SPARK/versions/0.9.0-1.cdh4.6.0.p0.98/commands/deactivate (this returned HTTP code 405)
Thanks for your help.
I've found the solution, in case anyone is having the same problem: see the solution here. Basically, you should use curl -X POST with the Cloudera API.
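A sketch of that call against the second URL above (the admin:admin credentials are a placeholder for your Cloudera Manager login):
curl -u admin:admin -X POST 'http://master:7180/api/v10/clusters/myCluster/parcels/products/SPARK/versions/0.9.0-1.cdh4.6.0.p0.98/commands/deactivate'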

What is Nutch 1.10 crawl command for elasticsearch

I'm new to Nutch 1.10 and am trying to learn how to crawl with it, using Elasticsearch as my indexer. Not sure why, but I cannot get this crawl command to work:
bin/crawl -i --elastic -D elastic.server.url=http://localhost:9200/elastic/ urls elasticTestCrawl 1
UPDATE: just used
bin/crawl -i -D elastic.server.url=http://localhost:9200/elastic/ urls/ elasticTestCrawl/ 2
--almost successfully; I received the following error when it came to the indexing part of the command:
Error running:
/home/david/apache-nutch-1.10/bin/nutch clean -Delastic.server.url=http://localhost:9200/elastic/ elasticTestCrawl//crawldb
Failed with exit value 255.
What is exit value 255 for Nutch 1.x? And why does the space get deleted between "-D" and "elastic..."?
I have the Elasticsearch properties from here in my nutch-site.xml file.
If someone can point me to the error of my ways, that would be great!
Update
I just posted my own answer below; it's the second one. I had already accepted the first answer months ago when I initially got it working. My answer is simply clearer and more concise, to make it easier (and quicker) to get started with Nutch.
Unfortunately I can't tell you where you're going wrong, as I'm in the same boat, although from what I can see you are running Nutch and Elasticsearch on the same box, whereas I've split it across two.
I haven't got it to work, but according to a guide I found on integrating Nutch 1.7 with Elasticsearch, it should just be:
bin/crawl urls/ TestCrawl -depth 3 -topN 5
It may just be that it isn't working for me because I've added the extra complication of networking.
I also assume you have created an index called elasticTestIndex in your elastic instance and launched it on the box before trying to run your crawl?
In case it's of help, the guide I got that command from is:
https://www.mind-it.info/integrating-nutch-1-7-elasticsearch/
Update:
I'm not sure I'm quite there yet, but using your update I've got further than I had.
You are putting in port 9200, which is the web administration port, but you need to use port 9300 to interact with the service, so change the port to 9300.
I'm not sure, but I think the portion after the slash refers to the index, so in your example make sure you have "elastic" set up as an index, or change the URL to
localhost:9300/[index name]/ (low rep score, so I can't put in too many URLs)
so that it uses an index you have created. If you haven't created one, you can do so from PuTTY with the following command:
curl -XPUT 'http://localhost:9200/[index name]/'
Using the command you supplied with the alternative port, it did run, although I've yet to extract the crawl data from Elasticsearch.
Supplemental Update:
It's successfully dumping data crawled from Nutch into Elasticsearch for me, and having put a different index in on the command line, I can tell you it ignores that and uses whatever is in your nutch-site.xml.
To help anyone else get it working:
Start off by reading this blog post to help you get Elasticsearch configured to work with Nutch.
After that read this Nutch doc to get familiar with the NEW cli command for running the crawl script. (Works for 1.9+)
Follow the example in the new Nutch crawl script command on that page. You have to change it a bit for elasticsearch:
solr.server.url=http://localhost:8983/solr/ to something like
elastic.server.url=http://localhost:9300/yourelasticindex/
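Putting that together with the command from the question, the adapted invocation would look something like this (the index name, seed directory, and crawl directory are placeholders, and port 9300 follows the earlier answer):
bin/crawl -i -D elastic.server.url=http://localhost:9300/yourelasticindex/ urls/ elasticTestCrawl/ 2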
So basically there are 2 steps:
1. Configure Elasticsearch to work with Nutch (click on the first link above).
2. Change the new CLI command for Solr to work with Elasticsearch (its default is Solr).
Hope that helps!

Installing Elasticsearch

I'm trying to index data in Elasticsearch. My problem is that after running the "elasticsearch.bat" command, I'm able to connect to the server and everything works fine, but after that I can't type anything on the command line. Do you have any idea what is wrong?
That's OK; you're seeing the Elasticsearch console output. Just open another console to make your input, or start Elasticsearch as a service (http://www.elastic.co/guide/en/elasticsearch/reference/1.3/setup-service-win.html).
There is no command-line input available for Elasticsearch. You can perform operations on Elasticsearch with REST commands (or with a client API, for example in Java).
You can use curl to do REST operations from the command line.
You can use a web browser to do HTTP GET requests. You can also do other REST commands (PUT, POST, DELETE) with Chrome plugins like Postman.
There are some Elasticsearch plugins available that provide monitoring and management tooling via the browser.
Please read the Elasticsearch documentation!
For all operations on indexes, mappings, querying, etc., the Marvel plugin has the Sense REST API interface, which is fabulous. Sense is wrapped within Marvel, which is free for development.
It allows you to execute all possible ES API commands as JSON. We use it both as a way to prototype commands before implementing them in our ES client, and as a way to test very specific/boundary search scenarios.
There are lots of other cool plugins to help you manage your ElasticSearch, some of which are described here.
Good luck!
When you type only elasticsearch.bat, you are starting the Elasticsearch server in the foreground; that's why you are seeing real-time logs in your terminal and hence can't type anything.
Now, leave that running and open another terminal (no need to go to the Elasticsearch directory again) and just type:
curl 'http://localhost:9200/?pretty'
But first make sure that curl is supported in your terminal; if not, you need to use another terminal that supports it, for example Git Shell for Windows.
Afterwards you can use this second terminal to do your indexing.
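For example, a first indexing call from that second terminal might look like this (the index name, type, and document are made-up placeholders):
curl -XPUT 'http://localhost:9200/myindex/mytype/1' -d '{"title": "my first document"}'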
From the terminal, you can send commands to Elasticsearch with curl -XGET (or -XPUT, -XDELETE, -XPOST):
curl -XGET 'http://localhost:9200/your_index/_search' -d '{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      }
    }
  }
}'
You can also use the Chrome extension Sense, which can handle JSON configs (with handy history, nice highlighting).
I think you have misunderstood something:
Elasticsearch runs as an HTTP service; that's the reason why you can't keep using that console.
Solution: just open another console.
But keep in mind you don't need to use a console; you can access it using any REST client. Take a look at "Postman - REST Client" and "Sense (Beta)". Both are Chrome extensions.

How to configure firefox over command line on a linux machine

I use two Internet connections, so I want to use bash scripts to automate the task of switching between the two.
The problem is that I can't configure Firefox proxy settings via scripts. Is there a way to do that? Does any configuration file exist for Firefox that I can modify from the command line?
I have read this entry, but it didn't help me much (it's about Windows):
firefox proxy settings via command line
You can use the "automatic proxy configuration" setting for this. This field takes a "PAC" file, which in fact is just a JavaScript function named FindProxyForURL that can use things like dnsResolve or isInNet to determine whether a proxy is needed or not. There is a Wikipedia article which describes the files in detail, and I have written a blog post a while ago that gives an example function.
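As a rough sketch of how a script could then point Firefox at such a PAC file, you can append the relevant prefs to the profile's user.js (the profile directory name and the /etc/proxy.pac path below are placeholders; a value of 2 for network.proxy.type selects automatic proxy configuration, and Firefox picks up user.js on its next start):
PROFILE=~/.mozilla/firefox/xxxxxxxx.default   # placeholder: your actual profile directory
echo 'user_pref("network.proxy.type", 2);' >> "$PROFILE/user.js"
echo 'user_pref("network.proxy.autoconfig_url", "file:///etc/proxy.pac");' >> "$PROFILE/user.js"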
