Is there a way to see all the resource offers coming from a particular slave? For context: some of my slaves, which have a specific tag attached to them, are not generating any offers, even though the Mesos UI shows them as less than 50% loaded. I want to debug the root cause, and for that I need a way to see what offers flow from the slave to the master to the framework. The framework in my case is Marathon.
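One way to start debugging this (a sketch of my own, assuming the master's /master/state.json endpoint on the default port 5050; the exact JSON layout varies between Mesos versions, so verify the field names): dump the master state and group the outstanding offers by slave. Slaves that never appear across repeated polls are likely not having offers extended for them, which points at the master's allocator rather than the framework.

```python
# Sketch: list outstanding resource offers per slave via the Mesos master's
# state endpoint. Assumes a master at mesos-master:5050 and a state.json
# layout where each framework entry carries an "offers" array -- verify both
# against your Mesos version.
import json
from urllib.request import urlopen

MASTER = "http://mesos-master:5050"  # hypothetical master address

state = json.loads(urlopen(MASTER + "/master/state.json").read().decode())

# Map slave IDs to hostnames so the output is readable.
slaves = {s["id"]: s["hostname"] for s in state.get("slaves", [])}

for framework in state.get("frameworks", []):
    for offer in framework.get("offers", []):
        host = slaves.get(offer["slave_id"], offer["slave_id"])
        print(framework["name"], "holds an offer on", host, offer.get("resources"))
```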
I want suggestions for my application:
I have multitenancy in NiFi: each process group belongs to a different tenant/user.
For any change by one tenant/user, such as to a custom processor (which produces a new .nar file), we need to copy that .nar file into the lib folder and restart NiFi. But this restarts the whole NiFi server, and with it every tenant/user and process group.
So please give some suggestions for how we can restart only one tenant/user or process group, or have the .nar file take effect without restarting NiFi.
NiFi does not currently have the kind of warm restart option that you describe; however, a lot of the base functionality needed to support it is in the code base, and the concept is on the community roadmap.
Some options that might help you today:
Consider segregating the tenants with a high rate of code change into separate development environments. You could possibly leverage the Docker builds to provide flexibility and easy automation (see the sketch after this list). You could then promote the end-of-day versions of your NARs into the 'Production' cluster each night, hopefully without disturbing users.
Consider utilising the NiFi Site-to-Site capability to link NiFi environments together instead of sharing a single one. Processors that change regularly could be called out to, and updated on, their own schedule.
Consider why you are changing processor code so regularly; there may be a better approach than hard-coding logic and parameters into the processors - the variable registry, the various controller services, the flow registry, etc. all provide a very rich feature set.
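To make the first option a little more concrete, here is a minimal sketch (my own illustration, not an official recipe) of a throwaway per-tenant development NiFi in Docker, so a rebuilt .nar only requires restarting that one container. The image tag, paths, and the NAR-directory property name are assumptions to verify against your environment.

```python
# Sketch: a per-tenant development NiFi in Docker. Requires the Docker SDK
# for Python (pip install docker). Image tag, paths, and the nifi.properties
# setting named below are assumptions -- check them for your NiFi version.
import docker

client = docker.from_env()

container = client.containers.run(
    "apache/nifi:latest",
    name="nifi-tenant-a",          # hypothetical tenant name
    detach=True,
    ports={"8080/tcp": 8080},
    # Mount the tenant's .nar files into the container. The mounted path
    # must be registered in nifi.properties as an additional NAR library
    # directory (e.g. nifi.nar.library.directory.custom1=/opt/nifi/custom-nars,
    # or whatever the property is called in your version).
    volumes={
        "/srv/tenant-a/nars": {
            "bind": "/opt/nifi/custom-nars",
            "mode": "ro",
        }
    },
)

# After copying a rebuilt .nar into /srv/tenant-a/nars on the host, only
# this tenant's container needs a restart:
container.restart()
```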
Marathon does not support task configuration templates that could establish command patterns and avoid redundancy. We are trying to find a way around this; otherwise we would need to create hundreds of thousands of tasks, and it would be very difficult to manage those config files. One approach we are considering is running multiple Marathon clusters inside Mesos. So the question is: can we run multiple Marathon clusters inside Mesos? And is there a limit on the number of frameworks Mesos can handle?
Yes, running multiple Marathon frameworks is not only possible but actually considered a best practice. There are many use cases for it, from scaling to Chinese Wall setups (especially in the financial services area).
For example, in DCOS we install a 'system Marathon' by default, and you can then install as many 'application', 'project', or 'group' Marathons as you like.
I'm not aware of a theoretical limit on the number of frameworks, but hey, this might actually be a good load test to run; I'll look into it.
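As a sketch of the 'Marathon on Marathon' pattern (my own illustration; the hostnames, ports, ZooKeeper paths, and the assumption that a marathon binary is available on the agents are all placeholders to adapt), you can post an app to the system Marathon whose command starts a second Marathon under its own framework name and state path:

```python
# Sketch: launch a tenant-specific Marathon as an app on the "system"
# Marathon. Assumes the marathon binary is on the agents and ZooKeeper
# runs at zk:2181; all hosts, ports, and paths below are placeholders.
import json
from urllib.request import Request, urlopen

SYSTEM_MARATHON = "http://system-marathon:8080"  # hypothetical address

app = {
    "id": "/marathons/tenant-a",
    "cmd": (
        "marathon"
        " --master zk://zk:2181/mesos"
        " --zk zk://zk:2181/marathon-tenant-a"   # separate state path per Marathon
        " --framework_name marathon-tenant-a"   # separate framework name in Mesos
        " --http_port $PORT0"
    ),
    "cpus": 1.0,
    "mem": 1024,
    "instances": 1,
}

req = Request(
    SYSTEM_MARATHON + "/v2/apps",
    data=json.dumps(app).encode(),
    headers={"Content-Type": "application/json"},
)
print(urlopen(req).read().decode())
```

Giving each Marathon its own --zk path and --framework_name is what keeps their state stores and their Mesos registrations separate.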
I just began studying Hadoop (based on 2.6.0) and still have trouble getting a big picture of how Hadoop is structured physically and logically.
All the references I have found use the term "node", as in master/slave nodes and name/data nodes, but I couldn't find clear definitions of such "nodes" in any of them. (Maybe I missed the details...)
What I would like to know is: are master/slave "nodes" terms for physical machines, and name/data "nodes" terms for the processes which manage the actual data?
My second question is: how do such nodes communicate with each other? All I know is that they need SSH for communication, but no more than that. It would really help me understand the architecture to have a clue about how they actually talk to each other.
P.S. Is there any good online reference for studying Hadoop? For me, the Hadoop website is too unkind to beginners, and the blogs I have found so far are sometimes uninformative. Please share some good resources!
are master/slave "nodes" the terms for physical machines and name/data "nodes" the terms for processes which manage actual data?
Well, namenode, datanode, etc. are Hadoop daemon services that run on a physical machine. So if a system in your cluster has the namenode service running, then it is called a namenode. A single node may run more than one service, i.e., it can run both a namenode and a datanode, although in a production setup this is not done, since we don't want the machine running the namenode service to be overburdened. Since you are using Hadoop 2.6, you might also want to have a look at the YARN architecture to understand how jobs get executed.
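To make the machine-versus-daemon distinction concrete, here is a small sketch (my own, and only illustrative) that runs the JDK's jps tool on a host and reports which Hadoop daemons are present there; the daemon names are the usual ones for Hadoop 2.x, but verify them against your distribution:

```python
# Sketch: report which Hadoop daemons run on this host by parsing `jps`
# output (jps ships with the JDK). A "namenode" is simply a machine where
# the NameNode daemon shows up in this list.
import subprocess

HADOOP_DAEMONS = {
    "NameNode", "SecondaryNameNode", "DataNode",   # HDFS daemons
    "ResourceManager", "NodeManager",              # YARN daemons
}

out = subprocess.run(["jps"], capture_output=True, text=True).stdout

for line in out.splitlines():
    pid, _, name = line.partition(" ")
    if name in HADOOP_DAEMONS:
        print(f"{name} is running here (pid {pid})")
```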
how do such nodes communicate with each other? What I know is that they need SSH for communication but no more than that.
Have a look at this.
The Datanode uses the DatanodeProtocol to communicate with the Namenode. This interface provides the ability to send heartbeat messages, register a new datanode, send block reports, etc. A client communicates with a Datanode using the DataTransferProtocol. This interface provides the ability to read a block, write a block, copy a block, etc.
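As a purely conceptual toy (this is not Hadoop's actual RPC code), the heartbeat half of such a protocol boils down to each datanode periodically reporting liveness and capacity, with the namenode marking nodes dead after a missed-heartbeat window; the 3-second interval and roughly 10.5-minute timeout are the Hadoop 2.x defaults:

```python
# Toy illustration of the heartbeat idea behind DatanodeProtocol -- not
# Hadoop code. Each datanode periodically reports liveness and capacity;
# the namenode declares a node dead after a missed-heartbeat window.
import time

HEARTBEAT_INTERVAL = 3       # seconds between heartbeats (Hadoop default: 3s)
DEAD_AFTER = 10 * 60 + 30    # Hadoop 2.x marks a datanode dead after ~10.5 min

class ToyNamenode:
    def __init__(self):
        self.last_seen = {}  # datanode id -> timestamp of last heartbeat

    def heartbeat(self, datanode_id, free_bytes):
        self.last_seen[datanode_id] = time.time()
        # A real namenode would answer with commands (replicate, delete, ...).

    def dead_datanodes(self):
        now = time.time()
        return [d for d, t in self.last_seen.items() if now - t > DEAD_AFTER]

nn = ToyNamenode()
nn.heartbeat("datanode-1", free_bytes=2**40)
print(nn.dead_datanodes())   # [] -- datanode-1 just reported in
```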
Is there any good online reference for studying Hadoop?
Take a look at this and this - they might differ slightly from the new architecture, but they are still good reads.
bigdatauniversity has a lot of courses for beginners.
Is it possible to specify resource requirements (CPU, memory, ...) when scheduling a job in Chronos via the REST API? I found that there are configuration options that allow specifying general resource requirements for each task, but I wonder whether it is possible to do this per job.
Generally it is possible to restrict resources per task, but you have to use cgroups isolation on the Mesos slaves. However, it seems that the Chronos API doesn't support it yet (see the GitHub issue for more details). Mesos is being developed quite rapidly, so be sure to check whether it is supported in your version.
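For what it's worth, later Chronos releases did add per-job resource fields (cpus, mem, disk) to the job definition. If your version has them, a submission might look like the sketch below; the default port 4400, the /scheduler/iso8601 endpoint, and the exact field names should all be checked against your Chronos release.

```python
# Sketch: submit a Chronos job with per-job resource fields. Verify the
# endpoint and the cpus/mem/disk fields against your Chronos version
# before relying on this.
import json
from urllib.request import Request, urlopen

CHRONOS = "http://chronos:4400"  # hypothetical address

job = {
    "name": "nightly-report",
    "command": "run_report.sh",                  # placeholder command
    "schedule": "R/2015-06-01T02:00:00Z/PT24H",  # repeat daily
    "owner": "ops@example.com",
    "cpus": 0.5,   # per-job resource requests, if supported
    "mem": 512,    # MiB
    "disk": 256,   # MiB
}

req = Request(
    CHRONOS + "/scheduler/iso8601",
    data=json.dumps(job).encode(),
    headers={"Content-Type": "application/json"},
)
urlopen(req)
```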
We are considering building a service-oriented architecture on top of YARN. We have different application types - some would work in a Storm-like streaming mode (where we connect to the running service), some in batch-processing mode (where the app is started for every request).
Moreover, the applications might need to communicate with each other often, which would require a lot of internal traffic between different applications within YARN. We also want to use caching across applications, so that whenever a request with the same data goes to the same app, we can return a cached response.
Is YARN a good or a bad solution as the basis for an SOA framework? Is YARN just an autoscaling/deployment-like tool, or would it be a good fit for SOA? Would it be fast enough to do this with YARN?
The way I see it, YARN is pushing Hadoop from being a distributed file system towards being a distributed OS. There are a lot of SOA-ish infrastructures that are being built on, or migrating to, YARN (Storm, Samza) and that make compelling service hosts. You can also look at Weave from Continuuity, which will help you host additional types of services.
To specifically address your question: YARN is a good basis for an SOA framework. It is more than autoscaling - it is a resource management and hosting framework - and it is fast enough (especially if you use one of the infrastructures already built on top of it).