I am trying to write a balancer tool for HBase which could balance regions across region servers for a table, by region count and/or region size (sum of store file sizes). I could not find any HBase API class which returns the region sizes or related info. I have already checked a few of the classes that expose other table/region info, e.g. org.apache.hadoop.hbase.client.HTable and HBaseAdmin.
I am thinking another way this could be implemented is by using one of the Hadoop classes that return the sizes of directories in the filesystem, e.g. org.apache.hadoop.fs.FileSystem lists the files under a particular HDFS path.
Any suggestions?
I use this to do managed splits of regions, but you could leverage it to load-balance on your own. I also load-balance myself to spread the regions (of a given table) evenly across our nodes so that MR jobs are evenly distributed.
Perhaps the code snippet below is useful?
import java.util.Map;

import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HServerLoad;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

final HBaseAdmin admin = new HBaseAdmin(conf);
final ClusterStatus clusterStatus = admin.getClusterStatus();
for (ServerName serverName : clusterStatus.getServers()) {
    final HServerLoad serverLoad = clusterStatus.getLoad(serverName);
    for (Map.Entry<byte[], HServerLoad.RegionLoad> entry : serverLoad.getRegionsLoad().entrySet()) {
        final String region = Bytes.toString(entry.getKey());
        final HServerLoad.RegionLoad regionLoad = entry.getValue();
        // aggregate store file size for this region, in MB
        long storeFileSize = regionLoad.getStorefileSizeMB();
        // regionLoad exposes other useful metrics too (request counts, storefile count, ...)
    }
}
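If you go the do-it-yourself route, the actual move can be issued through the same admin handle. A hedged sketch (API shape as of the 0.9x-era client; both names below are placeholders you would derive from the loads collected above):

// placeholders: derive these from the per-region loads collected above
final String encodedRegionName = "abc123def456";
final String destServerName = "node2.example.com,60020,1374123456789";
// HBaseAdmin#move takes the region's encoded name and the destination
// server name, both as bytes
admin.move(Bytes.toBytes(encodedRegionName), Bytes.toBytes(destServerName));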
What's wrong with the default Load Balancer?
From the Wiki:
The balancer is a periodic operation which is run on the master to redistribute regions on the cluster. It is configured via hbase.balancer.period and defaults to 300000 (5 minutes).
If you really want to do it yourself, you could indeed use the Hadoop API and, more specifically, the FileStatus class. This class acts as an interface to represent the client-side information for a file.
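A minimal sketch of that approach, assuming the default /hbase root directory and the old <rootdir>/<table>/<region> layout (both are assumptions; check hbase.rootdir and your HBase version's directory layout, and "myTable" is a placeholder):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RegionDirSizes {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // assumption: default root dir; adjust for your cluster
        Path tableDir = new Path("/hbase", "myTable");
        for (FileStatus regionDir : fs.listStatus(tableDir)) {
            if (!regionDir.isDir()) {
                continue; // skip plain files such as .tableinfo
            }
            // total size of everything under the region directory
            ContentSummary summary = fs.getContentSummary(regionDir.getPath());
            System.out.println(regionDir.getPath().getName()
                + " -> " + summary.getLength() + " bytes");
        }
    }
}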
I have a 2-node distributed cache setup which needs persistence set up for both members.
I have MapStore and MapLoader implemented, and the same code is deployed on both nodes.
The MapStore and MapLoader work absolutely fine on a single-member setup, but after another member joins, the MapStore and MapLoader continue to work on the first member, and all inserts or updates by the second member are persisted to disk via the first member.
My requirement is that each member should be able to persist to disk independently, so that the distributed cache is backed up on all members and not just the first member.
Is there a setting I can change to achieve this?
Here is my Hazelcast Spring configuration.
@Bean
public HazelcastInstance hazelcastInstance(H2MapStorage h2mapStore) throws IOException {
    MapStoreConfig mapStoreConfig = new MapStoreConfig();
    mapStoreConfig.setImplementation(h2mapStore);
    mapStoreConfig.setWriteDelaySeconds(0); // 0 = write-through: persist on every update
    YamlConfigBuilder configBuilder = null;
    if (new File(hazelcastConfiglocation).exists()) {
        configBuilder = new YamlConfigBuilder(hazelcastConfiglocation);
    } else {
        configBuilder = new YamlConfigBuilder();
    }
    Config config = configBuilder.build();
    config.setProperty("hazelcast.jmx", "true");
    MapConfig mapConfig = config.getMapConfig("requests");
    mapConfig.setMapStoreConfig(mapStoreConfig);
    return Hazelcast.newHazelcastInstance(config);
}
Here is my Hazelcast YAML config. It is placed at /opt/hazlecast.yml, which is picked up by my Spring config above.
hazelcast:
  group:
    name: tsystems
  management-center:
    enabled: false
    url: http://localhost:8080/hazelcast-mancenter
  network:
    port:
      auto-increment: true
      port-count: 100
      port: 5701
    outbound-ports:
      - 0
    join:
      multicast:
        enabled: false
        multicast-group: 224.2.2.3
        multicast-port: 54327
      tcp-ip:
        enabled: true
        member-list:
          - 192.168.1.13
The entire code is available here: https://bitbucket.org/samrat_roy/hazelcasttest/src/master/
This might just be bad luck and low data volumes, rather than an actual error.
On each node, try running the localKeySet() method and printing the results.
This will tell you which keys are on which node in the cluster. The node that owns key "X" will invoke the map store for that key, even if the update was initiated by another node.
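A minimal sketch of that check, assuming Hazelcast 3.x package names and the map name "requests" from your config:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class OwnershipCheck {
    public static void main(String[] args) {
        HazelcastInstance instance = Hazelcast.newHazelcastInstance();
        IMap<String, Object> map = instance.getMap("requests");
        // localKeySet() returns only the keys whose partitions this member owns
        System.out.println("Keys owned by this member: " + map.localKeySet());
    }
}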
If you have low data volumes, it may not be a 50/50 data split. At an extreme, 2 data records in a 2-node cluster could both end up on the same node.
If you have 1,000 data records, it's pretty unlikely that they'll all be on the same node.
So the other thing to try is to add more data and update all of it, to see whether both nodes participate.
OK, after struggling a lot I noticed a teeny tiny but critical detail in the documentation (https://docs.hazelcast.org/docs/latest/manual/html-single/#loading-and-storing-persistent-data):
Datastore needs to be a centralized system that is accessible from all Hazelcast members. Persistence to a local file system is not supported.
This is absolutely in line with what I was observing.
However, don't be discouraged: I found out that I could use event listeners to do the same thing I needed to do.
@Component
public class HazelCastEntryListner
        implements EntryAddedListener<String, Object>, EntryUpdatedListener<String, Object>,
                   EntryRemovedListener<String, Object>, EntryEvictedListener<String, Object>,
                   EntryLoadedListener<String, Object>, MapEvictedListener, MapClearedListener {

    @Autowired
    @Lazy
    private RequestDao requestDao;
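(The callback bodies are omitted above. For illustration, one of them might look like the sketch below; save() is a placeholder, not RequestDao's actual API.)

    @Override
    public void entryAdded(EntryEvent<String, Object> event) {
        // placeholder persistence call: substitute the real RequestDao method
        requestDao.save(event.getKey(), event.getValue());
    }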
I created this class and hooked it into the config like so:
MapConfig mapConfig = config.getMapConfig("requests");
mapConfig.addEntryListenerConfig(new EntryListenerConfig(entryListner, false, true));
return Hazelcast.newHazelcastInstance(config);
This worked flawlessly; I am able to replicate data to both embedded databases, one on each node.
My use case was to cover HA failover edge cases: during HA failover, the slave node needed to know the working memory of the active node.
I am not using Hazelcast as a cache; rather, I am using it as a data-syncing mechanism.
My Spark application processes files (average size 20 MB) with a custom Hadoop input format and stores the result in HDFS.
The following is the code snippet.
Configuration conf = new Configuration();

JavaPairRDD<Text, Text> baseRDD = ctx
    .newAPIHadoopFile(input, CustomInputFormat.class, Text.class, Text.class, conf);

JavaRDD<myClass> mapPartitionsRDD = baseRDD
    .mapPartitions(new FlatMapFunction<Iterator<Tuple2<Text, Text>>, myClass>() {
        // my logic goes here
    });

// a few more transformations
result.saveAsTextFile(path);
This application creates one task/partition per file, processes each, and stores the corresponding part file in HDFS.
That is, for 10,000 input files, 10,000 tasks are created and 10,000 part files are stored in HDFS.
Both the mapPartitions and map operations on baseRDD create one task per file.
The SO question "How to set the number of partitions for newAPIHadoopFile?" suggests setting
conf.setInt("mapred.max.split.size", 4);
to configure the number of partitions.
But when this parameter is set, the CPU is utilized at maximum and no stage starts, even after a long time.
If I don't set this parameter, the application completes successfully as described above.
How do I set the number of partitions with newAPIHadoopFile and increase the efficiency?
What happens with the mapred.max.split.size option?
Update: What happens with the mapred.max.split.size option?
In my use case the files are small, so changing the split size options is irrelevant here.
More info in this SO question: Behavior of the parameter "mapred.min.split.size" in HDFS
Just use baseRDD.repartition(<a sane amount>).mapPartitions(...). That will move the resulting operation to fewer partitions, especially if your files are small.
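A short sketch against the snippet from the question (100 is an arbitrary illustrative target; tune it to your cluster):

// assuming baseRDD from the question; 100 is purely illustrative
JavaPairRDD<Text, Text> repartitioned = baseRDD.repartition(100);
// then chain the same mapPartitions(...) on `repartitioned` instead of baseRDD;
// coalesce(100) would also reduce the partition count, and avoids a full shuffle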
I had a single node cassandra cluster on EC2. I was running my tests on it and it worked great.
But then, I had to move this cluster to a VPC, so rather than moving the data, I created a new cluster with two nodes (both seeds), and imported the data from the former cluster using sstableloader.
I thought it was really slow, so decided to add two more instances (not seeds). It's even slower.
I use a consistency level of ONE and my replication factor is 1, so I don't quite see why it is so slow.
To give you an idea, I can only do 3 reads per second.
We use the EC2Snitch, but not the AMI recommended by Cassandra (we didn't see that part of the documentation when we installed it).
I haven't run a cleanup yet on the first two nodes after adding the two new ones.
When I request all elements of a column family which contains only a dozen rows, it times out. If I request one element, I get the result after a long time, along with a huge tracing session (~30,000 lines!).
Does anyone know what I can do to make it faster? I don't quite know where to look right now.
My Cassandra version is 2.1.3.
Here is my keyspace schema:
CREATE KEYSPACE keyspace_name WITH replication = {'class': 'NetworkTopologyStrategy', 'us-west-2': '1'} AND durable_writes = true;
And here are the options for our column family:
CREATE TABLE keyspace_name."CFName" (
// ...
) WITH bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
I had to run a compaction on my nodes (e.g. nodetool compact) because I had too many tombstones.
Many thanks to the amazing IRC channel on freenode, #cassandra.
Is it possible to change the Hazelcast configuration at runtime, and if so, which parameters are modifiable?
It seems to be possible using Hazelcast Management Center, but I can't find any examples/references in the official docs/forums.
Might be a bit late to answer your question but better late than never :)
You can modify some of the map config properties after the map has been created using the MapService:
import com.hazelcast.config.MapConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
// note: internal, undocumented classes; package locations as of Hazelcast 3.x
import com.hazelcast.map.impl.MapService;
import com.hazelcast.spi.AbstractDistributedObject;

HazelcastInstance instance = Hazelcast.newHazelcastInstance();
// create map
IMap<String, Integer> myMap = instance.getMap("myMap");
// create a new map config
MapConfig newMapConfig = instance.getConfig().getMapConfig("myMap").setAsyncBackupCount(1);
// submit the new map config to the map service
MapService mapService = (MapService) ((AbstractDistributedObject) instance.getDistributedObject(MapService.SERVICE_NAME, "")).getService();
mapService.getMapServiceContext().getMapContainer("myMap").setMapConfig(newMapConfig);
Note that this API is not visible/documented so it might not work in future versions.
We are using this in our application when we need to insert several million entries in a distributed map at startup. Disabling the backup cut the insertion time by 30%. After the data are inserted, we enable the backup.
The Hazelcast internals are not really designed to be modifiable. What do you want to modify?
I am trying to perform a MapReduce job using the Python MRJob lib and am having some issues getting it to distribute properly across my Hadoop cluster. I believe I am simply missing a basic principle of MapReduce. My cluster is a small test cluster with one master and one slave. The basic idea is that I'm requesting a series of web pages with parameters, doing some analysis on them, and returning some properties of each web page.
The input to my map function is simply a list of URLs with parameters such as the following:
http://guelph.backpage.com/automotive/?layout=bla&keyword=towing
http://guelph.backpage.com/whatever/?p=blah
http://semanticreference.com/search.html?go=Search&q=red
http://copiahcounty.wlbt.com/h/events?ename=drupaleventsxmlapi&s=rrr
http://sweetrococo.livejournal.com/34076.html?mode=ffff
So the key-value pairs for the initial input are just key: None, value: URL.
The following is my map function:
def mapper(self, key, url):
    '''Yield domain as the key, and (url, query parameter) tuple as the value'''
    parsed_url = urlparse(url)
    domain = parsed_url.scheme + "://" + parsed_url.netloc + "/"
    if self.myclass.check_if_param(parsed_url):
        parsed_url_query = parsed_url.query
        url_q_dic = parse_qs(parsed_url_query)
        for query_param, query_val in url_q_dic.iteritems():
            # yielding a tuple in mrjob will yield a list
            yield domain, (url, query_param)
Pretty simple: I'm just checking to make sure the URL has a parameter, then yielding the URL's domain as the key and a tuple of the URL and the query parameter as the value, which MRJob kindly transforms into a list to pass to the reducer, which is the following:
def reducer(self, domain, url_query_params):
    final_list = []
    for url_query_param in url_query_params:
        url_to_list_props = url_query_param[0]
        param_to_list_props = url_query_param[1]
        # set our target that we will request and do some analysis on
        self.myclass.set_target(url_to_list_props, param_to_list_props)
        # perform a bunch of requests and do analysis on the URL requested
        props_list = self.myclass.get_props()
        for prop in props_list:
            final_list.append(prop)
    # index this stuff to a central db
    MapReduceIndexer(domain, final_list).add_prop_info()
    yield domain, final_list
My problem is that only one reducer task runs. I would expect the number of reducer tasks to be equal to the number of unique keys emitted by the mappers. The end result with the above code is that one reducer runs on the master while the slave sits idle and does nothing, which is obviously not ideal. I notice in my output that a few mapper tasks are started, but always only one reducer task. Other than that, the job runs smoothly and everything works as expected.
My question is... what the heck am I doing wrong? Am I misunderstanding the reduce step or screwing up my key-value pairs somewhere? Why are there not multiple reducers running on this job?
Update: OK, so based on the answer given I increased mapred.reduce.tasks (it was at the default, which I now realize is 1). That was indeed why I was getting one reducer. I now see 3 reduce tasks being performed simultaneously. I now have an import error on my slave that needs to be resolved, but at least I am getting somewhere...
The number of reducers is totally unrelated to the form of your input data; it is controlled by the job configuration (mapred.reduce.tasks). For MRJob it looks like you need bootstrap options to pass that setting through to Hadoop.
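A hedged sketch of how that might look in mrjob (JOBCONF is mrjob's hook for Hadoop properties; the class name and the value 4 are illustrative):

from mrjob.job import MRJob

class URLPropsJob(MRJob):  # hypothetical job class name
    # illustrative: ask Hadoop for 4 reduce tasks instead of the default 1
    JOBCONF = {'mapred.reduce.tasks': 4}

    def mapper(self, key, url):
        ...  # mapper from the question

    def reducer(self, domain, url_query_params):
        ...  # reducer from the question

if __name__ == '__main__':
    URLPropsJob.run()

Equivalently, the same property can be passed on the command line with --jobconf mapred.reduce.tasks=4.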