Hadoop using C++ pipes: How to call Mapper.cleanup()

Using the C++ Pipes API (1.2.0), how can I get a call in Mapper.cleanup() after the map() phase of the mapper? Basically, for each chunk I want to store my records in memory during the map phase and then apply some processing afterwards.
Any hints are welcome.
Thanks,

The C++ Mapper class extends Closable:
class Mapper: public Closable {
public:
virtual void map(MapContext& context) = 0;
};
and Closable has the following signature:
class Closable {
public:
virtual void close() {}
virtual ~Closable() {}
};
So (not being a C++ programmer) I'm guessing you just need to override the virtual close() method in your Mapper subclass and put your post-processing logic there.
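A minimal sketch of that approach, under stated assumptions: the stand-in interfaces below only mirror the Closable and Mapper signatures quoted above, plus the getInputValue() and emit() calls a real job would use; in an actual Pipes program you would include the Pipes headers instead. It also assumes, as with the Pipes TemplateFactory, that the mapper receives its context in its constructor, and that emitting from close() is permitted, which is worth verifying for your Hadoop version. BufferingMapper and RecordingContext are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-ins for the HadoopPipes interfaces quoted above, reduced to what
// this sketch needs. A real job would include the Pipes headers instead.
namespace pipes_standin {
class Closable {
public:
    virtual void close() {}
    virtual ~Closable() {}
};

class MapContext {
public:
    virtual const std::string& getInputValue() = 0;
    virtual void emit(const std::string& key, const std::string& value) = 0;
    virtual ~MapContext() {}
};

class Mapper : public Closable {
public:
    virtual void map(MapContext& context) = 0;
};
}  // namespace pipes_standin

// Hypothetical mapper: buffers every input value during map() and
// processes the whole chunk once, in close().
class BufferingMapper : public pipes_standin::Mapper {
public:
    explicit BufferingMapper(pipes_standin::MapContext& context) : context_(context) {}

    void map(pipes_standin::MapContext& context) override {
        buffer_.push_back(context.getInputValue());  // keep the record in memory
    }

    void close() override {
        // Placeholder "processing": emit how many records this chunk held.
        context_.emit("records", std::to_string(buffer_.size()));
        buffer_.clear();
    }

private:
    pipes_standin::MapContext& context_;  // saved so close() can emit
    std::vector<std::string> buffer_;
};

// Hypothetical test double: replays fixed inputs and records emitted pairs.
class RecordingContext : public pipes_standin::MapContext {
public:
    explicit RecordingContext(std::vector<std::string> inputs) : inputs_(std::move(inputs)) {}

    bool next() {  // advance to the next input; false when exhausted
        if (pos_ >= inputs_.size()) return false;
        current_ = inputs_[pos_++];
        return true;
    }

    const std::string& getInputValue() override {
        assert(pos_ > 0 && "call next() first");
        return current_;
    }

    void emit(const std::string& key, const std::string& value) override {
        emitted.emplace_back(key, value);
    }

    std::vector<std::pair<std::string, std::string>> emitted;

private:
    std::vector<std::string> inputs_;
    std::size_t pos_ = 0;
    std::string current_;
};
```

Driving it with three inputs, calling map() once per record and then close(), leaves a single ("records", "3") pair in the recorder.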

Related

Instantiation of virtual member function with templated return type

I have a base class (with which I want to simulate interfaces)
template<typename TType>
class Base
{
public:
virtual SomeTemplatedClass<TType> GetTheObject() = 0;
};
and obviously a derived class
template<typename TType>
class Derived : public Base<TType>
{
public:
virtual SomeTemplatedClass<TType> GetTheObject() = 0;
};
But for one specific type I intend to specialize GetTheObject():
template<>
SomeTemplatedClass<int> Derived<int>::GetTheObject()
{
return 5;
}
Visual Studio 2015 complains that it cannot instantiate an abstract class when I try to use Derived<int>.
Providing even a throwing implementation in the generic version:
template<typename TType>
class Derived : public Base<TType>
{
public:
virtual SomeTemplatedClass<TType> GetTheObject() override
{
throw <something>;
}
};
lets everything compile.
So my question is: why do I need to provide a generic implementation when I have a specific one, and it is the only one that is needed?
You don't need to implement the generic GetTheObject, but you need to declare it as non-pure. Otherwise your class is abstract.
template<typename TType>
class Derived : public Base<TType>
{
public:
virtual SomeTemplatedClass<TType> GetTheObject();
};
You can specialise the function now.
You won't be able to instantiate any non-specialised derived objects (you will get linker errors).
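A compilable sketch of that suggestion. SomeTemplatedClass is a hypothetical stand-in with an implicit converting constructor, so that return 5; works as in the question.

```cpp
#include <cassert>

// Hypothetical stand-in for the question's SomeTemplatedClass; the implicit
// constructor lets "return 5;" convert an int into SomeTemplatedClass<int>.
template <typename T>
struct SomeTemplatedClass {
    SomeTemplatedClass(T v) : value(v) {}
    T value;
};

template <typename TType>
class Base {
public:
    virtual SomeTemplatedClass<TType> GetTheObject() = 0;
    virtual ~Base() = default;
};

template <typename TType>
class Derived : public Base<TType> {
public:
    // Declared but NOT pure, so Derived<TType> is not abstract.
    // No generic definition is provided.
    SomeTemplatedClass<TType> GetTheObject() override;
};

// Explicit specialization of the member for TType = int only.
template <>
SomeTemplatedClass<int> Derived<int>::GetTheObject() {
    return 5;
}
```

If the specialization lives in a header included by several source files, declare it inline: an explicit specialization is not implicitly inline.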
You cannot make an abstract class into concrete by simply providing an implementation of its pure virtual member outside of the class.
class A { virtual void f() = 0; }; // A is abstract
void A::f() {} // A is still abstract
Templates are no different.
template <int> class A { virtual void f() = 0; }; // A is abstract
template <int k> void A<k>::f() {} // A is still abstract
A function specialisation changes nothing.
template <int> class A { virtual void f() = 0; }; // A is abstract
template <int k> void A<k>::f() {} // A is still abstract
template <> void A<42>::f() {} // A<42> is still abstract too
If you want the generic case to be abstract and the specialised case concrete, you need to specialise the entire class, not just the pure function implementation.
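A sketch of specialising the whole class, reusing the question's Base/Derived names and a hypothetical SomeTemplatedClass stand-in. The static_asserts check at compile time that the generic case stays abstract while Derived<int> is concrete.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in with an implicit converting constructor.
template <typename T>
struct SomeTemplatedClass {
    SomeTemplatedClass(T v) : value(v) {}
    T value;
};

template <typename TType>
class Base {
public:
    virtual SomeTemplatedClass<TType> GetTheObject() = 0;
    virtual ~Base() = default;
};

// Primary template: does not override the pure function, so it stays abstract.
template <typename TType>
class Derived : public Base<TType> {};

// Full class specialization for int: a separate class that does override it.
template <>
class Derived<int> : public Base<int> {
public:
    SomeTemplatedClass<int> GetTheObject() override { return 5; }
};

static_assert(std::is_abstract<Derived<double>>::value, "generic case is abstract");
static_assert(!std::is_abstract<Derived<int>>::value, "int case is concrete");
```

A full class specialisation shares nothing with the primary template, so any other members the generic Derived declares must be repeated in Derived<int>.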

Are member variables of Hadoop Reducer class thread-safe?

I'm new to the Hadoop ecosystem.
My question is: are member variables of a Reducer class thread-safe?
1. The Mapper passes data to the Reducer with a unique key.
2. There is a collection (ConcurrentLinkedQueue) that is a member variable of the Reducer class.
3. The collection is initialized in the setup(Context) method of the Reducer class.
4. Some Query objects (jOOQ) are created and appended to the collection in the reduce(...) method of the Reducer class.
5. In the last lines of the reduce(...) method, jooq.batch(collection).execute() is called once the collection reaches a specified threshold (e.g. 1000), and then the collection is cleared with clear().
6. Whatever remains in the collection from step 4 is processed the same way as in step 5, in the cleanup(Context) method.
Question: Do I need to synchronize step 5?
Code
public class SomeReducer extends TableReducer<Text, Text, ImmutableBytesWritable> {
private Queue<Query> queries;
@Override
protected void setup(Context context) {
...
queries = new ConcurrentLinkedQueue<>();
}
@Override
protected void cleanup(Context context) {
if (!queries.isEmpty()) db.batch(queries).execute();
...
}
@Override
public void reduce(Text key, Iterable<Text> sessions, Context context) {
for (...iteration...) { queries.add(...create Query object...); }
// Should the snippet below be synchronized?
if (queries.size() >= 1000) {
db.batch(queries).execute();
queries.clear();
}
}
}
A Reducer is thread-safe in this sense: each reduce task creates its own Reducer instance and calls setup(), reduce() and cleanup() on it sequentially, from a single thread. You will most likely have multiple reducers running in parallel, but they are completely isolated from each other and only see their own data and instance variables.
So to answer your question: you do not need to synchronize your code or even use a ConcurrentLinkedQueue; a normal ArrayList would do.
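To illustrate why no locking is needed, here is a language-neutral sketch in C++ of the reducer lifecycle from the question, assuming what the answer states: one instance, called sequentially. BatchingReducer and batchSizes() are hypothetical names; flush() stands in for db.batch(queries).execute() followed by clear(), and std::vector plays the role the ArrayList would play in the Java reducer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: setup(), then reduce() once per key, then cleanup(),
// all called on one thread for a given instance, so a plain vector suffices.
class BatchingReducer {
public:
    explicit BatchingReducer(std::size_t threshold) : threshold_(threshold) {}

    void setup() { pending_.clear(); }

    void reduce(const std::string& key, const std::vector<std::string>& values) {
        for (const auto& v : values) pending_.push_back(key + "=" + v);  // build "queries"
        if (pending_.size() >= threshold_) flush();  // same check as the question
    }

    void cleanup() {
        if (!pending_.empty()) flush();  // send whatever is left over
    }

    // Sizes of the batches "executed" so far.
    const std::vector<std::size_t>& batchSizes() const { return batchSizes_; }

private:
    void flush() {  // stands in for db.batch(queries).execute(); queries.clear();
        batchSizes_.push_back(pending_.size());
        pending_.clear();
    }

    std::size_t threshold_;
    std::vector<std::string> pending_;
    std::vector<std::size_t> batchSizes_;
};
```

With a threshold of 3, feeding two values for "a", two for "b" and one for "c" produces a batch of 4 during reduce() and a final batch of 1 in cleanup(). As in the question's code, a batch can exceed the threshold because the size is only checked after each key's values have been added.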

Why CAsyncSocket does not have copy constructor or = operator?

I have derived a class from CAsyncSocket and want to pass the objects around.
class ClientSocket : public CAsyncSocket
{
CAsyncSocket nitSocket;
public:
ClientSocket(void);
virtual ~ClientSocket(void);
};
I get several compile errors when I do this:
void SomeOtherClass::func(ClientSocket &socket)
{
this->socket = socket;
}
Error:
'CAsyncSocket::operator =' : cannot access private member declared in class 'CAsyncSocket'
I looked into file and found
private:
CAsyncSocket(const CAsyncSocket& rSrc); // no implementation
void operator=(const CAsyncSocket& rSrc); // no implementation
Should I write my own copy constructor? Since the base class has no implementation of one, would my code crash at runtime?
More importantly: should I make a copy at all? Would my new object receive the events of the original object?
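Polymorphic types in C++ are usually made non-copyable because copying through a base-class type easily leads to slicing: only the base-class part of the object is copied, and the derived part (with its overrides) is lost.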
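A small sketch of both points, using hypothetical classes rather than MFC: Socket forbids copies as CAsyncSocket does (spelled here with C++11 = delete instead of private, unimplemented declarations), SomeOtherClass keeps a pointer to the caller's socket instead of a copy, and Shape/Circle show the slicing that copying through a base type causes.

```cpp
#include <cassert>
#include <string>

// Hypothetical polymorphic base that cannot be copied.
class Socket {
public:
    Socket() = default;
    Socket(const Socket&) = delete;
    Socket& operator=(const Socket&) = delete;
    virtual ~Socket() = default;
    virtual std::string name() const { return "Socket"; }
};

class ClientSocket : public Socket {
public:
    std::string name() const override { return "ClientSocket"; }
};

// Stores a non-owning pointer, so calls reach the original object.
class SomeOtherClass {
public:
    explicit SomeOtherClass(Socket& socket) : socket_(&socket) {}
    std::string socketName() const { return socket_->name(); }

private:
    Socket* socket_;  // the caller must keep the socket alive
};

// Copyable polymorphic base, used only to show slicing.
struct Shape {
    Shape() = default;
    Shape(const Shape&) = default;
    virtual ~Shape() = default;
    virtual std::string kind() const { return "Shape"; }
};

struct Circle : Shape {
    std::string kind() const override { return "Circle"; }
};

std::string kindOfCopy(const Shape& s) {
    Shape copy = s;  // slices: only the Shape part of a Circle is copied
    return copy.kind();
}
```

kindOfCopy() returns "Shape" even when given a Circle, which is the kind of silent behaviour change that making the base non-copyable prevents.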

Static method Uses

If I have a static method, the only advantages seem to be that there is a single copy and that I don't need an object to call the method. But the same can be done by creating an object, i.e. we can call the method through an object. Why should we have static methods? Can someone provide an example to explain?
Static methods can be useful when you have private constructors, because you want to abstract the instantiation process.
For example in C++:
class Foo {
Foo() {}
public:
static Foo *create() {
return new Foo;
}
};
In that example the abstraction just calls an otherwise inaccessible constructor, but in practice you might want a shared pool of objects, and the create() method would manage it for you.
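A minimal sketch of such a pooled factory, assuming single-threaded use: released objects go onto a free list, and create() hands them out again before allocating new ones. The release() function is a hypothetical addition; pooled objects are deliberately never deleted here, so a real version would also need a shutdown or ownership policy.

```cpp
#include <cassert>
#include <vector>

class Foo {
    Foo() {}  // private: only create() can construct a Foo

public:
    static Foo* create() {
        std::vector<Foo*>& pool = freeList();
        if (!pool.empty()) {  // reuse a released object first
            Foo* reused = pool.back();
            pool.pop_back();
            return reused;
        }
        return new Foo;
    }

    // Hypothetical: return an object to the pool instead of deleting it.
    static void release(Foo* foo) { freeList().push_back(foo); }

private:
    // Function-local static avoids an out-of-class definition.
    static std::vector<Foo*>& freeList() {
        static std::vector<Foo*> list;
        return list;
    }
};
```

Callers cannot write new Foo themselves, so every object goes through create(), which is what lets the class control reuse.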
Sometimes, when you have const members that need to be initialised at construction time, it can be cleaner to move the logic for this into a private static method, e.g.:
struct Foo {
int value;
};
struct Bar {
Bar() : f(make()) {
}
private:
const Foo f;
static Foo make() {
// Create it here; any non-trivial setup logic can live in this function
return Foo{};
}
};
A static method is appropriate when the developer is sure the method does not depend on any particular instance of the class: no instance state can change what it does.
For example (C#):
public class People
{
public static int GetValue(int x)
{
return x + 3;
}
}
So even if you create instances of People, the static GetValue method always returns x + 3.
Static methods are typically used for purely functional helpers, such as math or physics calculations.
This is similar to the stateless point of view taken in functional programming.
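A C++ sketch of the same People::GetValue idea. The age member is a hypothetical piece of instance state, added only to show that the static method never reads it, so every instance (or none at all) gives the same answer.

```cpp
#include <cassert>

class People {
public:
    int age = 0;  // hypothetical instance state; GetValue never reads it

    // Depends only on its argument, so no object is needed to call it.
    static int GetValue(int x) { return x + 3; }
};
```

People::GetValue(4) and a.GetValue(4) on any instance a both return 7; calling through an instance compiles, but it adds nothing.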
Some old-school developers overuse static methods instead of taking an OOP approach.
For example:
public class People
{
public static DataSet GetPeopleById(String personId)
{ .... implementation that using SQL query or stored procedure and return dataset ... }
public static DataSet GetXXXXXXX(String name, DateTime datex)
{ .... implementation ... }
}
The implementation above can run to thousands of lines.
This style is common: it looks like OOP because the code lives in a class, but the thinking is procedural.
It persists because not everyone understands OOP; many people only imitate its style.
Other advantages of static methods are a smaller memory footprint (no instance has to be created) and slightly faster calls: the C# compiler emits call rather than callvirt for them, so there is no null check on an instance.
You can read more in this blog post: http://www.dotnetperls.com/callvirt

Visual Studio code generated when choosing to explicitly implement interface

Sorry for the vague title, but I'm not sure what this is called.
Say I add IDisposable to my class; Visual Studio can create the method stub for me, but it creates the stub like this:
void IDisposable.Dispose()
I don't follow what this syntax is doing. Why do it like this instead of public void Dispose()?
And with the first syntax, I couldn't work out how to call Dispose() from within my class (in my destructor).
When you implement an interface member explicitly, which is what the generated code does, you can't access the member through a reference of the class type. Instead, you have to call it through a reference of the interface type. For example:
class MyClass : IDisposable
{
void IDisposable.Dispose()
{
// Do Stuff
}
~MyClass()
{
IDisposable me = (IDisposable)this;
me.Dispose();
}
}
This enables you to implement two interfaces with a member of the same name and explicitly call either member independently.
interface IExplicit1
{
string InterfaceName();
}
interface IExplicit2
{
string InterfaceName();
}
class MyClass : IExplicit1, IExplicit2
{
string IExplicit1.InterfaceName()
{
return "IExplicit1";
}
string IExplicit2.InterfaceName()
{
return "IExplicit2";
}
}
class Program
{
public static void Main()
{
MyClass myInstance = new MyClass();
System.Console.WriteLine( ((IExplicit1)myInstance).InterfaceName() ); // outputs "IExplicit1"
IExplicit2 myExplicit2Instance = (IExplicit2)myInstance;
System.Console.WriteLine( myExplicit2Instance.InterfaceName() ); // outputs "IExplicit2"
}
}
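For comparison, C++ has no explicit interface implementation: if MyClass simply defined InterfaceName() once, that single function would override the version from both bases. A known workaround, sketched below with the same interface names, routes each interface through a small adapter class that forwards to a uniquely named function. The adapter and forwarding-function names are hypothetical.

```cpp
#include <cassert>
#include <string>

struct IExplicit1 {
    virtual std::string InterfaceName() = 0;
    virtual ~IExplicit1() = default;
};

struct IExplicit2 {
    virtual std::string InterfaceName() = 0;
    virtual ~IExplicit2() = default;
};

// Hypothetical adapters: each forwards its interface's InterfaceName()
// to a uniquely named virtual, so the two can be implemented separately.
struct Explicit1Adapter : IExplicit1 {
    std::string InterfaceName() override { return Explicit1Name(); }
    virtual std::string Explicit1Name() = 0;
};

struct Explicit2Adapter : IExplicit2 {
    std::string InterfaceName() override { return Explicit2Name(); }
    virtual std::string Explicit2Name() = 0;
};

class MyClass : public Explicit1Adapter, public Explicit2Adapter {
    // Private overrides: reachable only through the adapters' virtual calls.
    std::string Explicit1Name() override { return "IExplicit1"; }
    std::string Explicit2Name() override { return "IExplicit2"; }
};
```

As in the C# version, calling InterfaceName() directly on a MyClass object does not compile (the name is ambiguous), so you go through an IExplicit1& or IExplicit2& reference.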
Visual Studio gives you two options:
Implement
Implement explicit
You normally choose the first (non-explicit) one, which gives you the behaviour you want.
The "explicit" option is useful if you inherit the same method from two different interfaces, i.e. multiple inheritance of interfaces (which is unusual).
Members of an interface type are always public, which requires their implementations to be public as well. For example, this doesn't compile:
interface IFoo { void Bar(); }
class Baz : IFoo {
private void Bar() { } // CS0737
}
Explicit interface implementation provides a syntax that allows the method to be private:
class Baz : IFoo {
void IFoo.Bar() { } // No error
}
A classic use for this is to hide the implementation of a base interface type. IEnumerable<> would be a very good example:
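using System.Collections.Generic;
class Baz : IEnumerable<Foo> {
public IEnumerator<Foo> GetEnumerator() { yield break; } // real code would yield the items
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() { return GetEnumerator(); }
}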
Note how the generic version is accessible, the non-generic version is hidden. That both discourages its use and avoids a compile error because of a duplicate method.
In your case, implementing Dispose() explicitly is wrong. You wrote Dispose() to allow the client code to call it, forcing it to cast to IDisposable to make the call doesn't make sense.
Also, calling Dispose() from a finalizer is a code smell. The standard pattern is to add a protected virtual Dispose(bool disposing) method to your class.
