I am writing a unit test for Spark/Scala code on Windows with IntelliJ, using winutils.
The function I need to test runs Hadoop commands using ProcessBuilder.
I cannot figure out how to mock that part of the function so that it returns either 0 or 1; I do not want to actually run the Hadoop command at all.
Can anyone please suggest an approach?
import scala.sys.process._

def fun(path: String): Boolean = {
  val fullCmd = "hadoop fs -ls " + path
  val exitCode = Process(fullCmd).run().exitValue()
  exitCode == 0 // the declared return type is Boolean, so compare the exit code against 0
}
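One possible approach (a sketch only; CommandRunner, ShellRunner, and HdfsOps are illustrative names, not part of your code) is to extract the shell-out behind a small trait and inject it, so the test stubs the exit code and never launches a process:

import scala.sys.process._

// Abstraction over "run a command, give me the exit code".
trait CommandRunner {
  def run(cmd: String): Int
}

// Production implementation: actually shells out, exactly like the original code.
object ShellRunner extends CommandRunner {
  def run(cmd: String): Int = Process(cmd).run().exitValue()
}

class HdfsOps(runner: CommandRunner) {
  // true when `hadoop fs -ls` exits with 0, i.e. the path exists
  def pathExists(path: String): Boolean = runner.run("hadoop fs -ls " + path) == 0
}

// In the unit test, no mocking framework is required; stub the trait by hand:
object HdfsOpsSpec extends App {
  val always0 = new CommandRunner { def run(cmd: String) = 0 }
  val always1 = new CommandRunner { def run(cmd: String) = 1 }
  assert(new HdfsOps(always0).pathExists("/some/path"))
  assert(!new HdfsOps(always1).pathExists("/some/path"))
  println("stubbed-runner tests passed")
}

In production you would construct new HdfsOps(ShellRunner); a Mockito mock of CommandRunner works just as well as the hand-written stubs.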
I am using the tarantool/tarantool:2.6.0 Docker image (the latest at the moment) and writing Lua scripts for the project. I am trying to find out how to see the results of calling the print() function. It's quite difficult to debug my code without print() working.
In the Tarantool console, print() has no effect either.
Using simple print()
The docs say that print() writes to stdout, but I don't see any output when I watch the container's logs with docker logs -f <CONTAINER_NAME>.
I also tried setting the container's log driver to local. Then one print made it into the container's logs, but only once...
The container's /var/log directory is always empty.
Using box.session.push()
box.session.push() works fine in the console, but when I use it in a Lua script:
-- app.lua
function log(s)
box.session.push(s)
end
-- No effect
log('hello')
function say_something(s)
log(s)
end
box.schema.func.create('say_something')
box.schema.user.grant('guest', 'execute', 'function', 'say_something')
And then I call say_something() from the Node.js connector like this:
const TarantoolConnection = require('tarantool-driver');
const conn = new TarantoolConnection(connectionData);
const res = await conn.call('say_something', 'hello');
I get an error.
Any suggestions?
Thanks!
I suppose you've missed io.flush() after the print call.
After I added io.flush() after each print call, my messages started showing up in the logs (docker logs -f <CONTAINER_NAME>).
I'd also recommend the built-in log module (require('log')) for this purpose; it writes to stderr without buffering.
Regarding the error in the connector: I think the Node.js connector simply doesn't support pushes.
I'm trying to use Jenkins to automate performance testing with JMeter.
Each build runs a single JMeter test, and I want to increase the number of users (threads) for each Jenkins build if the previous one was successful.
I have configured most of the build: with the SSH plugin I can restart Tomcat and copy catalina.out, and with the Performance plugin I can open the .jtl file and determine whether the build was successful.
What I want is to execute a different batch command for the next build, to increase the number of users (threads) and user ids.
For example:
jmeter -Jthreads=10 -n -t C:\TestScripts\script.jmx -l C:\TestScripts\Jenkins.jtl
jmeter -Jthreads=20 -n -t C:\TestScripts\script.jmx -l C:\TestScripts\Jenkins.jtl
jmeter -Jthreads=30 -n -t C:\TestScripts\script.jmx -l C:\TestScripts\Jenkins.jtl...
Is there a good JMeter plugin or some counter that I can use to increase a variable by 10 each time:
jmeter -Jthreads=%variable1%...
I have tried setting an environment variable and then incrementing it with:
"SET /A thread+=10"
but it doesn't change the variable, because Jenkins opens its own cmd in a new process:
("cmd /c call C:\WINDOWS\TEMP\jenkins556482303577128680.bat")
Use the following SET command to increase the threads variable by 10:
SET /A threads=threads+10
Or, using the shorthand form inside double quotes:
SET /A "threads+=10"
Not knowing your Jenkins configuration, which plugins you have installed, and how you run the test, it is quite hard to come up with the best solution.
The only "universal" workaround I can think of is writing the current number of threads into a file in the Jenkins workspace and reading the value from that file on the next execution:
Add a setUp Thread Group to your Test Plan
Add a JSR223 Sampler to the setUp Thread Group
Put the following Groovy code into the "Script" area:
import org.apache.jmeter.threads.ThreadGroup
import org.apache.jorphan.collections.SearchByClass
import org.apache.commons.io.FileUtils
SampleResult.setIgnore()
def file = new File(System.getenv('WORKSPACE') + System.getProperty('file.separator') + 'threads.number')
if (file.exists()) {
def newThreadNum = (FileUtils.readFileToString(file, 'UTF-8') as int) + 10
FileUtils.writeStringToFile(file, newThreadNum as String)
def engine = ctx.getEngine()
def test = org.apache.commons.lang.reflect.FieldUtils.getField(engine.getClass(), 'test', true)
def testPlanTree = test.get(engine)
SearchByClass<ThreadGroup> threadGroupSearch = new SearchByClass<>(ThreadGroup.class)
testPlanTree.traverse(threadGroupSearch)
def threadGroups = threadGroupSearch.getSearchResults()
threadGroups.each {
it.setNumThreads(newThreadNum)
}
} else {
FileUtils.writeStringToFile(file, props.get('threads'))
}
The code writes the current number of threads into a file called threads.number in the Jenkins workspace; on subsequent runs it reads the value from that file, adds 10, writes it back, and applies the new value to every Thread Group.
For now I am creating 20 .jmx files (1.jmx, 2.jmx, 3.jmx ...), each with a different number of users, and calling them with this command:
jmeter -n -t C:\TestScripts\%BUILD_NUMBER%.jmx -l C:\TestScripts\%BUILD_NUMBER%.jtl
The first build will call 1.jmx, the second 2.jmx, and so on.
It isn't the best method, but it works for now. I will try your advice over the weekend when I have more time.
I have found a solution that works for me; it isn't pretty. I created a Python script that changes the .csv file from which JMeter reads the number of threads and the starting user id. The script increments the starting user id by the previous build's number of threads, and the number of threads by 10.
# Read "<threads>,<start_id>" from the CSV that JMeter reads its values from.
with open('C:\\Users\\mp\\AppData\\Local\\Programs\\Python\\Python37-32\\eggs.csv', 'r') as src:
    a, b = src.readlines()[0].split(",")

a = int(a)      # number of threads in the previous build
b = int(b) + a  # new starting user id: shifted past the previous build's users
a = a + 10      # new number of threads
print(a, b)

with open('C:\\Users\\mp\\AppData\\Local\\Programs\\Python\\Python37-32\\eggs2.csv', 'a') as dst:
    dst.write(str(a) + "," + str(b))
I have Python on my PC, and I call the script in Jenkins as a Windows batch command:
C:\Users\mp\AppData\Local\Programs\Python\Python37-32\python.exe C:\Users\mp\AppData\Local\Programs\Python\Python37-32\rename_write_file.py
I am much better at Python than Java, so I implemented this in Python.
So for each new test, the CSV file from which JMeter reads its values is changed.
I want to use Cloudera's MapReduceIndexerTool to understand how morphlines work. I created a basic morphline that just reads lines from the input file, and I tried to run the tool with this command:
hadoop jar /opt/cloudera/parcels/CDH/lib/solr/contrib/mr/search-mr-*-job.jar org.apache.solr.hadoop.MapReduceIndexerTool \
--morphline-file morphline.conf \
--output-dir hdfs:///hostname/dir/ \
--dry-run true
Hadoop is installed on the same machine where I run this command.
The error I'm getting is the following:
net.sourceforge.argparse4j.inf.ArgumentParserException: Cannot write parent of file: hdfs:/hostname/dir
at org.apache.solr.hadoop.PathArgumentType.verifyCanWriteParent(PathArgumentType.java:200)
The /dir directory has 777 permissions on it, so writing into it is definitely allowed. I don't know what I should do to allow the tool to write into that output directory.
I'm new to HDFS and don't know how I should approach this problem. The logs don't offer me any information about it.
What I have tried so far (with no result):
created a hierarchy of 2 directories (/dir/dir2) and put 777 permissions on both of them
changed the output-dir scheme from hdfs:///... to hdfs://..., because all the examples in the --help menu are built that way, but this leads to an invalid schema error
Thank you.
It states 'Cannot write parent of file', and the parent in your case is /. Take a look at the source:
private void verifyCanWriteParent(ArgumentParser parser, Path file) throws ArgumentParserException, IOException {
Path parent = file.getParent();
if (parent == null || !fs.exists(parent) || !fs.getFileStatus(parent).getPermission().getUserAction().implies(FsAction.WRITE)) {
throw new ArgumentParserException("Cannot write parent of file: " + file, parser);
}
}
What is printed in the message is file, in your case hdfs:/hostname/dir, so file.getParent() will be /.
Additionally, you can check the permissions with the hadoop fs command; for example, try to create a zero-length file in the path:
hadoop fs -touchz /test-file
I solved the problem after days of working on it.
The problem was with the line --output-dir hdfs:///hostname/dir/.
First of all, there should not be 3 slashes at the beginning, as I had put there during my repeated attempts to make this work; there should be only 2, as in any valid HDFS URI. I actually put 3 slashes because otherwise the tool throws an invalid schema exception! You can easily see in the code that the scheme check is done before the verifyCanWriteParent check.
I tried to get the hostname by simply running the hostname command on the CentOS machine I was running the tool on. This was the main issue. I analyzed the /etc/hosts file and saw that there are 2 hostnames for the same local IP. I took the second one and it worked. (I also attached the port to the hostname, so the final format is the following: --output-dir hdfs://correct_hostname:8020/path/to/file/from/hdfs)
This error is very confusing because everywhere you look for the namenode hostname, you will see the same thing the hostname command returns. Moreover, the errors are not structured in a way that lets you diagnose the problem and take a logical path toward solving it.
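For what it's worth, one way to cross-check which NameNode URI the client configuration actually points at (assuming a standard HDFS client install on the machine) is:
hdfs getconf -confKey fs.defaultFS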
Additional information regarding this tool and debugging it
If you want to see the actual code that runs behind the tool, check the Cloudera version you are running and select the same branch on the official repository. The master branch is not up to date.
If you want to run this tool just to play with the morphline (using the --dry-run option) without connecting to Solr, you can't. You have to specify a ZooKeeper endpoint and a Solr collection, or a Solr config directory, which involves additional research work. This is something that could be improved in this tool.
You don't need to run the tool with -u hdfs; it works with a regular user.
The following Scala statement, part of a Spark Streaming module, 1) creates the part-xxxxx files, 2) but they are all empty (in Databricks). I am wondering why this is, since the output is displayed correctly when written to the console.
QS.foreachRDD(q => {
  val file = q.map(_.toUpperCase + "...")
  file.saveAsTextFile("/QS/filexxx")
})
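If the cause is micro-batches that carry no data (foreachRDD runs once per batch interval, and saving an empty RDD still produces empty part-xxxxx files), a common guard looks like the sketch below. This is only an assumption about the cause; the sketch also uses a per-batch output path, since saveAsTextFile fails if the target directory already exists, and QS is assumed to be a DStream[String]:
QS.foreachRDD { q =>
  // Skip batches that produced no records, so no empty part files are written.
  if (!q.isEmpty()) {
    q.map(_.toUpperCase + "...")
      .saveAsTextFile(s"/QS/filexxx-${System.currentTimeMillis()}") // fresh directory per batch
  }
}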
I need some help with my MapReduce code.
The code runs perfectly in Eclipse and in standalone mode, but when I package it and try running it locally in pseudo-distributed mode, the output is not what I expect:
Map input records = 11
Map output records = 11
Reduce input records = 11
Reduce output records = 0
These are the values I get, whereas when I run the same code in Eclipse or in standalone mode, with the same config and input file, I get:
Map input records = 11
Map output records = 11
Reduce input records = 11
Reduce output records = 4
Can anyone tell me what's wrong?
I tried both ways of building the .jar file: Eclipse -> Export -> Runnable JAR, and from the terminal as well (javac -classpath hadoop-core-1.0.4.jar -d classes mapredcode.java && jar -cvf mapredcode.jar -C classes/ .).
And how do I debug this?
Are you using a combiner?
If yes, is the output of the combiner the same as that of the mapper?
In Hadoop, the combiner is run at Hadoop's own discretion, and it may not be running in pseudo-distributed mode in your case.
The combiner is essentially a reducer that is used to lower network traffic.
The code should be written so that the reducer gets the expected format from the mapper even if the combiner does not run; see the sketch below.
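To make that last point concrete, here is a minimal word-count sketch (written in Scala against the standard org.apache.hadoop.mapreduce API; the class names are illustrative) in which the combiner is the reducer class itself, so the reducer sees the same (key, value) shape whether Hadoop runs the combiner zero, one, or many times:

import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}

class TokenMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  // Emits (word, 1) for every whitespace-separated token in the line.
  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit =
    value.toString.split("\\s+").filter(_.nonEmpty).foreach(w => ctx.write(new Text(w), one))
}

// Consumes and emits the same (Text, IntWritable) shape, so it is safe to
// register it both as the combiner and as the reducer.
class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    var sum = 0
    val it = values.iterator()
    while (it.hasNext) sum += it.next().get()
    ctx.write(key, new IntWritable(sum))
  }
}

object CombinerSafeJob {
  def configure(job: Job): Unit = {
    job.setMapperClass(classOf[TokenMapper])
    job.setCombinerClass(classOf[SumReducer]) // purely an optimization; may run 0..n times
    job.setReducerClass(classOf[SumReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
  }
}

If your combiner is a separate class, check that its output key/value types are identical to the mapper's output types.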
Hope it helps.