couchbase cbdocloader problems - loader

I am just getting started with Couchbase and have a problem using cbdocloader.
I tried this:
"\Program Files\Couchbase\Server\bin\tools\cbdocloader" -u Administrator -p Administrator -n 127.0.0.1:8091 -b v2 json_street_trees.zip
{'username': 'Administrator', 'node': '127.0.0.1:8091', 'password': 'Administrator', 'bucket': 'v2', 'ram_quota': 100} ['json_street_trees.zip']
[2014-01-04 21:03:01,460] - [rest_client] [6148] - INFO - existing buckets : [u'beer-sample', u'default', u'gamesim-sample', u'v2']
[2014-01-04 21:03:01,461] - [rest_client] [6148] - INFO - found bucket v2
done
It seems to work, but I can't view the documents. There appear to be 23 items, e.g.:
v2 1 23 0 0 94MB /324MB
But 23 is the number of JSON files in the .zip file, not the number of individual records.
Then, when I click on "Documents", it just hangs and never returns.
Is it working but just needs more time?
Am I doing something wrong?
Thanks,

If your documents are large, the web UI will hang because it tries to syntax-highlight and indent each document as a JSON string. This is a known issue with long content. If you want to inspect documents in the web UI, create smaller documents instead. Of course, you can still retrieve the large documents from your own code.
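One way to get small documents out of a file that holds a large JSON array is to split it into one file per record before zipping it for cbdocloader. The sketch below assumes, based on the question's observation that 23 files became 23 items, that the loader creates one document per .json file in the archive; the key field used to name each file (tree_id in the usage example) is hypothetical.

```python
import io
import json
import zipfile


def records_to_zip_bytes(records, key_field):
    """Pack each record into its own small JSON file inside a zip archive.

    Assumption: the loader turns every .json file into one document, so one
    file per record should give one document per record.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for record in records:
            # Name each file after a (hypothetical) unique key field.
            zf.writestr(f"{record[key_field]}.json", json.dumps(record))
    return buf.getvalue()


def split_json_file(src_path, key_field, zip_path):
    """Read a JSON file containing an array of records and write the split zip."""
    with open(src_path, encoding="utf-8") as f:
        records = json.load(f)
    with open(zip_path, "wb") as out:
        out.write(records_to_zip_bytes(records, key_field))
```

For example, split_json_file("street_trees.json", "tree_id", "street_trees_split.zip") would produce an archive to pass to cbdocloader instead of the original zip (both file names are placeholders). The archive is built in memory, which is fine for a few hundred MB but worth keeping in mind for larger inputs.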

Related

How to list Azure Databricks workspaces along with properties like workspaceId?

My objective is to create a CSV file that lists all Azure Databricks workspaces and, in particular, includes the workspace ID.
I have been able to retrieve all details as JSON using the CLI:
az rest -m get --header "Accept=application/json" -u 'https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.Databricks/workspaces?api-version=2018-04-01' > workspaces.json
How can I retrieve the same information using Azure Resource Graph?
If you prefer to work with the workspace list API that returns JSON, here is one approach for post-processing the data (in my case I ran this from a Jupyter notebook):
import json
import pandas as pd

pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
# JSON from https://learn.microsoft.com/en-us/rest/api/databricks/workspaces/list-by-subscription?tabs=HTTP&tryIt=true&source=docs#code-try-0
# e.g.
# az rest -m get --header "Accept=application/json" -u 'https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.Databricks/workspaces?api-version=2018-04-01' > workspaces.json
pdf = pd.read_json('./workspaces.json')
# flatten the nested JSON (one row per workspace, columns such as value.properties.workspaceId)
pdf_flat = pd.json_normalize(json.loads(pdf.to_json(orient="records")))
# drop columns whose names end in '.type'
pdf_flat = pdf_flat.drop(columns=pdf_flat.columns[pdf_flat.columns.str.endswith('.type')])
# drop rows without a workspaceId
pdf_flat = pdf_flat[pdf_flat['value.properties.workspaceId'].notna()]
# drop unwanted columns (errors='ignore' skips any that are absent from your response)
pdf_flat = pdf_flat.drop(columns=[
    'value.properties.parameters.enableFedRampCertification.value',
    'value.properties.parameters.enableNoPublicIp.value',
    'value.properties.parameters.natGatewayName.value',
    'value.properties.parameters.prepareEncryption.value',
    'value.properties.parameters.publicIpName.value',
    'value.properties.parameters.relayNamespaceName.value',
    'value.properties.parameters.requireInfrastructureEncryption.value',
    'value.properties.parameters.resourceTags.value.databricks-environment',
    'value.properties.parameters.storageAccountName.value',
    'value.properties.parameters.storageAccountSkuName.value',
    'value.properties.parameters.vnetAddressPrefix.value',
], errors='ignore')
# write the CSV that was the original goal (the file name is arbitrary)
pdf_flat.to_csv('workspaces.csv', index=False)
pdf_flat
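If you'd rather not depend on pandas, the same filtering can be done with the standard library. This is a minimal sketch, assuming the file has the shape the pandas code relies on (a top-level "value" array whose items carry properties.workspaceId); the chosen columns are just an example. List responses may also be paged via a nextLink, in which case the file holds only the first page.

```python
import csv
import io
import json

# Columns to export; dotted names reach into nested objects.
FIELDS = ["name", "location", "properties.workspaceId", "id"]


def get_path(obj, dotted):
    """Follow a dotted path such as 'properties.workspaceId'; None if absent."""
    for part in dotted.split("."):
        if not isinstance(obj, dict) or part not in obj:
            return None
        obj = obj[part]
    return obj


def workspaces_to_csv(payload, fields=FIELDS):
    """Return CSV text with one row per workspace that has a workspaceId."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)
    for ws in payload.get("value", []):
        if get_path(ws, "properties.workspaceId") is None:
            continue  # same rule as the pandas filter on workspaceId
        writer.writerow([get_path(ws, f) for f in fields])
    return out.getvalue()


def convert_file(src="workspaces.json", dst="workspaces.csv"):
    """Read the az rest output and write the CSV next to it."""
    with open(src, encoding="utf-8") as f:
        payload = json.load(f)
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(workspaces_to_csv(payload))
```

Calling convert_file() in the notebook writes workspaces.csv from workspaces.json; the dotted column names keep the mapping back to the JSON obvious.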
I was able to retrieve the information I needed by searching for Databricks resources in the Azure portal.
From there I could click Open Query to open Azure Resource Graph Explorer and write a query that extracts the information I need.
I ended up using the following query:
Resources
| where type == "microsoft.databricks/workspaces"
| project id, properties.workspaceId, name, tenantId, type, resourceGroup, location, subscriptionId, kind, tags

Faster way of appending/combining thousands (42,000) of netCDF files in NCO

I am having trouble efficiently combining thousands of netCDF files (42,000+, 3 GB in total for this particular folder/variable). The main variable that I want to combine has the shape (6, 127, 118), i.e. (time, lat, lon).
I'm appending the files one by one, since the list of files is too long to pass in a single command.
I have tried:
for i in input_source/**/**/*.nc; do ncrcat -A -h append_output.nc $i append_output.nc ; done
but this method is very slow (on the order of kB/s, and it gets slower as more files are appended), and it also gives a warning:
ncrcat: WARNING Intra-file non-monotonicity. Record coordinate "forecast_period" does not monotonically increase between (input file file1.nc record indices: 17, 18) (output file file1.nc record indices 17, 18) record coordinate values 6.000000, 1.000000
This is because the variable "forecast_period" simply cycles from 1 to 6 once per file, for n = 42,000 files, i.e. [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, ...].
Despite this warning I can still open the file, and ncrcat does what it is supposed to; it is just slow, at least with this method.
I have also tried adding the option:
--no_tmp_fl
but this gives an error:
ERROR: nco__open() unable to open file "append_output.nc"
If it helps, I'm using WSL with Ubuntu on Windows 10.
I'm new to bash, so any comments would be much appreciated.
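Either of these commands should work (ncrcat treats the last file name as the output, or the only file name when the input names arrive on stdin):
ncrcat --no_tmp_fl -h input_source/**/**/*.nc append_output.nc
or
printf '%s\n' input_source/**/**/*.nc | ncrcat --no_tmp_fl -h append_output.nc
(printf is a bash builtin, so unlike ls it is not subject to the "argument list too long" limit.)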
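Your original command is slow because it opens and closes the output file N times. With -A and without --no_tmp_fl, each append also likely copies the growing output file to a temporary file first, which is why it keeps getting slower. These commands open the output once, fill it up, then close it.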
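The same single-invocation idea can be driven from Python, which sidesteps shell globbing limits entirely. A minimal sketch, assuming (as the stdin variant of the command above does) that ncrcat reads input file names from stdin when only the output file is given, and that the paths contain no whitespace; -O is added so an existing append_output.nc is overwritten instead of triggering an interactive prompt.

```python
import subprocess
from pathlib import Path


def find_inputs(root):
    """Collect files two directory levels below root, like input_source/*/*/*.nc."""
    # Plain name order; see the CDO answer for sorting by time instead.
    return sorted(str(p) for p in Path(root).glob("*/*/*.nc"))


def ncrcat_stdin_text(paths):
    """One input file name per line, to be fed to ncrcat on stdin."""
    return "".join(f"{p}\n" for p in paths)


def concat_once(paths, output="append_output.nc"):
    """Run ncrcat a single time, so the output is opened and closed only once."""
    subprocess.run(
        ["ncrcat", "-O", "--no_tmp_fl", "-h", output],
        input=ncrcat_stdin_text(paths),
        text=True,
        check=True,  # raise CalledProcessError if ncrcat fails
    )
```

Usage would be concat_once(find_inputs("input_source")). Passing the list through stdin means no command line ever contains 42,000 paths.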
I would use CDO for this task. Given the huge number of files, it is recommended to sort them by time first (assuming you want to merge them along the time axis). After that, you can use:
cdo cat *.nc outfile
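Sorting by time and keeping each command line short can be combined in a small script. This sketch assumes a hypothetical naming convention (a 10-digit YYYYMMDDHH stamp somewhere in each file name; adapt the regular expression to your files) and relies on cdo cat appending to an existing output file, as the CDO documentation describes, so the list can be processed in chunks. Because of that appending behaviour, delete outfile.nc before a fresh run.

```python
import re
import subprocess
from pathlib import Path

# Hypothetical naming convention: a YYYYMMDDHH stamp in each file name.
STAMP = re.compile(r"(\d{10})")


def time_key(path):
    """Sort key: the stamp found in the file name, then the full path."""
    match = STAMP.search(Path(path).name)
    return (match.group(1) if match else "", str(path))


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def cdo_cat_sorted(paths, outfile="outfile.nc", chunk_size=1000):
    """Concatenate files in time order; each cdo cat call appends to outfile."""
    ordered = sorted(paths, key=time_key)
    for batch in chunks(ordered, chunk_size):
        subprocess.run(["cdo", "cat", *batch, outfile], check=True)
```

Files without a recognisable stamp sort first, so it is worth checking sorted(paths, key=time_key) on a sample before running the full concatenation.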

How to get an agent's pause time via AMI?

I'm building a WebSocket application and need to get an agent's current pause time.
When I call the QueueStatus action, it returns a QueueMember event.
In JSON, it returns something like this:
{
  "ActionID": "WelcomeStatus/7000",
  "CallsTaken": "0",
  "Event": "QueueMember",
  "InCall": "0",
  "LastCall": "0",
  "LastPause": "1568301325",
  "Location": "Agent/7000",
  "Membership": "dynamic",
  "Name": "Agent/7000",
  "Paused": "1",
  "PausedReason": "Almoço",
  "Penalty": "0",
  "Queue": "queue1",
  "StateInterface": "Agent/7000",
  "Status": "4"
}
Note that it returns "LastPause", "PausedReason" and "Paused".
"LastPause" always shows some strange large number that I don't understand.
So how can I get the current pause time from Asterisk 15?
--EDIT:
By retesting, I found that this is caused by also submitting a pause reason.
If I do not send a reason, the pause time works normally.
Thanks for your help.
Browsing the Asterisk forum, I found this release note:
Bugs fixed in this release:
ASTERISK-27541 - app_queue: Queue paused reason was (big number) secs ago when reason is set (Reported by César Benjamín García Martínez)
But this release is for Asterisk 16, not for Asterisk 15.
I decided to look for the issue in the C source files, and I found the bug.
Keep in mind that you have to recompile Asterisk afterwards, because this changes the source code directly.
So if you need to apply this change, do it in a test environment before moving it to production.
Open the file:
/usr/src/asterisk-15.7.3/apps/app_queue.c
And search for this line:
mem->reason_paused, (long) (time(NULL) - mem->lastcall), ast_term_reset());
Change it to:
mem->reason_paused, (long) (time(NULL) - mem->lastpause), ast_term_reset());
Then find this line:
"LastPause", (int)mem->lastpause,
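Change it to:
"LastPause", (int) (time(NULL) - mem->lastpause),
Keep the (int) cast: the original code passes this value as an int, so casting the difference to long could mismatch the expected argument type. Note that this changes LastPause from a Unix timestamp into the number of seconds since the pause started.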
That should be it. All AMI requests and CLI commands now return the correct information for me, and it works well with my AMI socket.
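If recompiling Asterisk is not an option, the pause duration can also be computed on the client side. The patch above subtracts mem->lastpause from time(NULL), which indicates that the unpatched LastPause value is a Unix timestamp (seconds since the epoch), so the "strange number" 1568301325 is simply the moment the pause began. A minimal sketch for the WebSocket application, assuming the QueueMember event has already been parsed into a dict of strings like the one in the question:

```python
import time
from datetime import datetime, timezone


def pause_started_utc(event):
    """Convert LastPause (assumed Unix epoch seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(int(event["LastPause"]), tz=timezone.utc)


def current_pause_seconds(event, now=None):
    """Seconds the member has been paused, or None if not currently paused.

    `now` can be injected for testing; otherwise the local clock is used,
    so the client and the Asterisk server clocks should be in sync.
    """
    if event.get("Paused") != "1":
        return None
    last_pause = int(event.get("LastPause", "0"))
    if last_pause <= 0:
        return None  # no pause time recorded
    now = time.time() if now is None else now
    return max(0, int(now - last_pause))
```

This leaves the LastPause field unchanged for any other AMI clients that expect a timestamp.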

MapReduceIndexerTool output dir error "Cannot write parent of file"

I want to use Cloudera's MapReduceIndexerTool to understand how morphlines work. I created a basic morphline that just reads lines from the input file, and I tried to run the tool using this command:
hadoop jar /opt/cloudera/parcels/CDH/lib/solr/contrib/mr/search-mr-*-job.jar org.apache.solr.hadoop.MapReduceIndexerTool \
--morphline-file morphline.conf \
--output-dir hdfs:///hostname/dir/ \
--dry-run true
Hadoop is installed on the same machine where I run this command.
The error I'm getting is the following:
net.sourceforge.argparse4j.inf.ArgumentParserException: Cannot write parent of file: hdfs:/hostname/dir
at org.apache.solr.hadoop.PathArgumentType.verifyCanWriteParent(PathArgumentType.java:200)
The /dir directory has 777 permissions, so writing to it should definitely be allowed. I don't know what else I should do to let the tool write to that output directory.
I'm new to HDFS and I don't know how to approach this problem. The logs don't give me any information about it.
What I have tried so far (without success):
Created a hierarchy of two directories (/dir/dir2) and set 777 permissions on both of them.
Changed the --output-dir URI from hdfs:///... to hdfs://..., because all the examples in the --help output are written that way, but this leads to an invalid schema error.
Thank you.
It states 'Cannot write parent of file', so the check is on the parent directory of the path you pass: that parent must exist and be writable. Take a look at the source:
private void verifyCanWriteParent(ArgumentParser parser, Path file) throws ArgumentParserException, IOException {
Path parent = file.getParent();
if (parent == null || !fs.exists(parent) || !fs.getFileStatus(parent).getPermission().getUserAction().implies(FsAction.WRITE)) {
throw new ArgumentParserException("Cannot write parent of file: " + file, parser);
}
}
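The message prints file, which in your case is hdfs:/hostname/dir. With three slashes the URI has an empty authority, so hostname is treated as the first path segment and file.getParent() is /hostname, a directory that probably doesn't exist if you created /dir at the root.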
Additionally, you can test the permissions with the hadoop fs command; for example, try to create a zero-length file in the path:
hadoop fs -touchz /test-file
I solved that problem after days of working on it.
The problem is with that line --output-dir hdfs:///hostname/dir/.
First of all, there should be only 2 slashes at the beginning (as in any valid HDFS URI), not 3 as I had used while repeatedly trying to make this work. I had put 3 slashes because otherwise the tool throws an invalid schema exception! In the tool's source code you can see that the schema check is done before the verifyCanWriteParent check.
I had obtained the hostname by simply running the hostname command on the CentOS machine where I was running the tool. This was the main issue. I analyzed the /etc/hosts file and saw that there are 2 hostnames for the same local IP. I took the second one and it worked. (I also added the port to the hostname, so the final format is: --output-dir hdfs://correct_hostname:8020/path/to/file/from/hdfs)
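The difference between the two URI forms is easy to see by splitting them into an authority (host:port) and a path. The snippet below is a plain-Python illustration using urllib.parse, not Hadoop's own Path class; with hdfs:/// the authority is empty, so the host name ends up as the first path segment, which matches the hdfs:/hostname/dir in the error message. The host name and port in the second example are placeholders.

```python
import posixpath
from urllib.parse import urlsplit


def split_hdfs_uri(uri):
    """Return (authority, path, parent directory) for an hdfs URI.

    Illustrates generic URI splitting; Hadoop's Path class applies its
    own normalisation, but the authority/path split is the same idea.
    """
    parts = urlsplit(uri)
    path = parts.path.rstrip("/") or "/"  # drop a trailing slash
    return parts.netloc, path, posixpath.dirname(path)
```

For example, split_hdfs_uri("hdfs:///hostname/dir/") gives an empty authority and a parent of /hostname, while split_hdfs_uri("hdfs://namenode.example:8020/path/to/dir") gives the authority namenode.example:8020 and a parent of /path/to.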
This error is very confusing because everywhere you look for the namenode hostname, you will see the same thing that the hostname command returns. Moreover, the errors are not structured in a way that you can diagnose the problem and take a logical path to solve it.
Additional information regarding this tool and debugging it
If you want to see the actual code that runs behind it, check the Cloudera version that you are running and select the same branch in the official repository. The master branch is not up to date.
If you want to run this tool just to experiment with the morphline (using the --dry-run option) without connecting to Solr, you can't. You have to specify a ZooKeeper endpoint and a Solr collection, or a Solr config directory, which takes additional research. This is something that could be improved in the tool.
You don't need to run the tool with -u hdfs; it works as a regular user.

How to fill Redis with hundreds of MB of dummy data using redis-cli?

I am getting my hands dirty with Redis monitoring. So far I have come up with these metrics that seem useful to monitor:
memory_used
throughput
latency
connections
replication
I am a newbie at this. I am trying to fill Redis with dummy data from redis-cli like this:
for i in $(seq 10000000); do redis-cli SET "users:app:$i" "{id: '$i', name: 'name$i', address: 'address$i'}"; done
but it doesn't fill up the Redis database fast enough for my needs (it starts a new redis-cli process for every single key).
Also, I need some help with latency and throughput monitoring. I know what they mean, but I don't know how to measure them; I don't see anything related to them in the output of redis-cli info.
Thanks for any support/guidance :D
Use the undocumented DEBUG POPULATE command.
DEBUG POPULATE count [prefix] [size]: Create count string keys named key:<num>. If a prefix is specified it's used instead of the key prefix.
The value starts with value:<num> and is filled with null chars if needed until it achieves the given size if specified.
> DEBUG POPULATE 5 test 1000000
OK
> KEYS *
1) "test:3"
2) "test:1"
3) "test:4"
4) "test:2"
5) "test:0"
> STRLEN test:0
(integer) 1000000
> STRLEN test:4
(integer) 1000000
> GETRANGE test:1 0 10
"value:1\x00\x00\x00\x00"
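To reach a target of hundreds of MB, the key count is the target size divided by the value size, rounded up. A small hypothetical helper that builds the command; it counts only value bytes, so keys and per-key overhead make the memory reported by INFO somewhat higher. Depending on your Redis version and configuration, the DEBUG command may also be disabled.

```python
def populate_command(target_bytes, value_size, prefix="test"):
    """Build a DEBUG POPULATE command writing roughly target_bytes of values."""
    if value_size <= 0:
        raise ValueError("value_size must be positive")
    count = -(-target_bytes // value_size)  # ceiling division
    return f"DEBUG POPULATE {count} {prefix} {value_size}"
```

For about 500 MB of 1 MB values, populate_command(500_000_000, 1_000_000) yields the command to paste into redis-cli.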
To "fill fast", follow the instructions in the Redis documentation on mass insertion; the gist is to use the --pipe option of redis-cli on a pre-prepared data file.
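The mass-insertion approach works best when the data file already contains commands in the Redis protocol (RESP). A minimal Python sketch that generates such a stream; the users:app:<i> key layout and value format are hypothetical, mirroring the loop in the question.

```python
import sys


def resp_command(*args):
    """Encode one command in the Redis protocol (RESP) for redis-cli --pipe."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = str(arg).encode("utf-8")
        # Bulk string: $<byte length>\r\n<bytes>\r\n
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def write_dummy_users(count, out=None):
    """Write `count` SET commands for hypothetical users:app:<i> keys."""
    out = out if out is not None else sys.stdout.buffer
    for i in range(count):
        value = f"{{id: '{i}', name: 'name{i}', address: 'address{i}'}}"
        out.write(resp_command("SET", f"users:app:{i}", value))
```

A script that calls write_dummy_users(10_000_000) can be piped straight into Redis, e.g. python3 gen_resp.py | redis-cli --pipe (gen_resp.py is a placeholder name). Lengths are computed on the UTF-8 bytes, not on characters, which is what RESP requires.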
Following @leomurillo's answer,
I got this to work without the last parameter, and I couldn't find the documentation for this undocumented command :)
127.0.0.1:6379> DEBUG POPULATE 10000000 PHPREDIS_SESSION
OK
(15.61s)
127.0.0.1:6379> dbsize
(integer) 10000334
Using Python
redis-dummy-data-generator.py creates 10000 key-value pairs:
#!/usr/bin/env python3
for i in range(10000):
    print(f"set name{i} helloworld")
Run the generator script and store the output in the redis_commands.txt file:
python3 redis-dummy-data-generator.py > redis_commands.txt
Load the generated dummy data into redis-server:
redis-cli -a mypassword -h localhost -p 6379 < redis_commands.txt
