In Databricks, I wish to run the following Scala command:
scala [-cp scalatest-<version>.jar:...] org.scalatest.tools.Runner [arguments]
Can this be run within a Scala notebook, or as a job?
I have tried various ways of running this command from within a notebook without success. I'm not sure if it's possible or if I need to structure the syntax in a particular way for it to work.
scalatest_2.12__3.0.8.jar is included within the Databricks runtime.
A Databricks job can be defined which runs a ScalaTest project:
Specify:
Main class: org.scalatest.tools.Runner
Dependent libraries: (e.g. a jar containing the ScalaTest project)
Parameters: ["-R","dbfs:/ourProject.jar","-o","-f","/dbfs/ourLog.txt","-s","uk.xxx.xxx.xxx.Main","-DmaxConnections=100"]
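For the notebook case, ScalaTest also exposes a programmatic entry point, so something along these lines can be tried from a Scala notebook cell. This is only a sketch: it assumes org.scalatest.tools.Runner.run (which takes the same arguments as main and returns a pass/fail Boolean) and reuses the jar path and suite name from the job parameters above; whether the runpath should be the dbfs:/ URI or the /dbfs/ FUSE path may depend on your setup.
val allPassed = org.scalatest.tools.Runner.run(Array(
  "-R", "/dbfs/ourProject.jar",  // runpath: jar containing the compiled test suites
  "-o",                          // report to standard output
  "-f", "/dbfs/ourLog.txt",      // also write a file report
  "-s", "uk.xxx.xxx.xxx.Main"    // run a single suite by its fully qualified name
))
if (!allPassed) throw new RuntimeException("One or more ScalaTest suites failed")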
I am new to the Bigtop architecture, and I would like to know:
how does Bigtop know the real build command to launch for a specific package after ./gradlew {package}-rpm? I assume there must be some kind of config that defines the real build command. (The package is a Maven-based project.)
Thank you.
I'm not familiar with Bigtop, but I am familiar with Gradle. See here for the Gradle task definition that you're referring to: https://github.com/apache/bigtop/blob/2d6f3dd7b7241aa2191c9ebc5a502a1415932464/packages.gradle#L460
The command that the task will execute is given under the exec directive: rpmbuild <command>. command is an array of arguments defined just above that directive. Most of its arguments are derived from the config object, which is basically a nested map (like a JSON object) produced by Groovy's ConfigSlurper, which reads the input BOM file as if it were a Groovy file.
So:
"Slurp" the BOM configuration into the config object
For each "component" defined within the config object, produce a set of tasks (${package}-rpm and others)
When configuring the ${package}-rpm task, use the BOM configuration to derive the command arguments using the logic provided within the task closure
Upon execution, run rpmbuild with the aforementioned command arguments
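Purely as a conceptual illustration of that flow (written here in Scala rather than Bigtop's actual Groovy/Gradle code, and with made-up config keys), the task effectively does something like:
import scala.sys.process._

// Stand-in for the ConfigSlurper output: a nested-map-like config read from the BOM
val config = Map(
  "name"     -> "mypackage",       // hypothetical component name
  "version"  -> "1.0.0",
  "specFile" -> "mypackage.spec"
)

// Derive the rpmbuild arguments from the config, as the task closure does
val command = Seq(
  "rpmbuild",
  "--define", s"version ${config("version")}",
  "-ba", config("specFile")
)

// Equivalent of the Gradle exec directive: run the assembled command
val exitCode = command.!
println(s"rpmbuild for ${config("name")} exited with $exitCode")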
I have 2 Groovy class files under one rootProject. I want to run just one of them without running the other one. How can I do that using IntelliJ and the command prompt? Right now, when I run the rootProject, both files run one after the other. If any setting is required in the build.gradle file, please explain how I can do that, as I am very new to Gradle and don't know much about it. Thanks!
If you have only those two Groovy scripts (and there is nothing to compile, etc.), you do not need Gradle to run them at all. From the command line, call
groovy <filename>.groovy
To run the script directly from a Gradle script, use
new GroovyShell().run(file('somePath'), [])
Yes, you can create an IntelliJ run configuration to run your script.
Follow these steps:
1) Edit configurations
2) Add a new Groovy configuration
3) Browse to your script
4) Run your script
I am trying to run some simple jobs on EMR (AMI 3.6) with Hadoop 2.4 and Spark 1.3.1. I have installed Spark manually without a bootstrap script. Currently I am trying to read and process data from S3 but it seems like I am missing an endless number of jars on my classpath.
I am running commands in spark-shell, starting the shell using:
spark-shell --jars jar1.jar,jar2.jar...
Commands run on the shell:
val lines = sc.textFile("s3://folder/file.gz")
lines.collect()
The errors always look something like "Class xyz not found". After I find the needed jar and add it to the classpath, I get the same error again but with a different class name in the message.
Is there a set of jars that are needed for working with (compressed and uncompressed) S3 files?
I was able to figure out the jars needed for my classpath by following the logic in the AWS GitHub repo https://github.com/awslabs/emr-bootstrap-actions/tree/master/spark.
The install-spark and install-spark-script.py files contain logic for copying jars into a new 'classpath' directory used by the SPARK_CLASSPATH variable (spark-env.sh).
The jars I was personally missing were located in /usr/share/aws/emr/emrfs/lib/ and /usr/share/aws/emr/lib/
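As a quick sanity check inside spark-shell (a sketch; filtering on "emr" is just an assumption about those jar names), you can print which of the driver's classpath entries actually came from those directories:
// List driver classpath entries that look like the EMR/EMRFS jars
System.getProperty("java.class.path")
  .split(java.io.File.pathSeparator)
  .filter(_.toLowerCase.contains("emr"))
  .foreach(println)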
It seems that you have not imported the proper libraries from within the spark-shell.
To do so:
import path.to.Class
or, more likely, if you want to import the RDD class, say:
import org.apache.spark.rdd.RDD
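For example, with that import in place, the shell can use the RDD type explicitly (sc is the SparkContext the shell provides; the path is a placeholder):
val lines: RDD[String] = sc.textFile("s3://folder/file.gz")  // compiles only once RDD is imported
println(lines.count())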
I am learning Mahout in Action now and writing to ask how to build and execute examples in the book. I can find instructions for Eclipse, but my environment doesn't include a UI. So I copied the first example (RecommenderIntro) to RecommenderIntro.java and compiled it with javac.
I got an error because the packages were not imported. So I am looking for:
Approaches to import missing packages.
I guess that even if it compiles successfully, a .class file will be generated; how can I execute it? Through "java RecommenderIntro"? I can execute Mahout examples through sudo -u hdfs hadoop jar mahout-examples-0.7-cdh4.2.0-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job; how can I do something similar for my own example?
All my data is saved in HBase tables, but in the book (and even on Google) I cannot find a way to integrate it with HBase. Any suggestions?
For q1 and q2, you need a Java build tool like Maven.
You build the Hadoop jar with 'mvn clean install'. This creates your Hadoop job jar in target/mia-job.jar.
You then execute your job with:
hadoop jar target/mia-job.jar RecommenderIntro inputDirIgnored outputDirIgnored
(RecommenderIntro ignores its parameters, but Hadoop forces you to specify at least two, usually the input and output directories.)
For q3: you can't out of the box.
Option 1: export your HBase data to a text file 'intro.csv' with content like "%userId%, %ItemId%, %score%", as described in the book, because that's the file RecommenderIntro is looking for (see the sketch below).
Option 2: modify the example code to read data from HBase...
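For reference, the book's RecommenderIntro boils down to a handful of Taste API calls over that intro.csv. A rough sketch (written in Scala against what I believe are the Mahout 0.x class names, so double-check them against your Mahout version):
import java.io.File
import scala.collection.JavaConverters._
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity

// Load the "%userId%, %ItemId%, %score%" file exported from HBase (or elsewhere)
val model = new FileDataModel(new File("intro.csv"))
val similarity = new PearsonCorrelationSimilarity(model)
val neighborhood = new NearestNUserNeighborhood(2, similarity, model)
val recommender = new GenericUserBasedRecommender(model, neighborhood, similarity)

// Ask for one recommendation for user 1 and print it
recommender.recommend(1L, 1).asScala.foreach(println)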
PS1: for developing such an application I'd really advise using an IDE, because it allows you to use code completion, execute, build, etc. A simple way to get started is to download a virtual image with Hadoop (e.g. from Cloudera or Hortonworks) and install an IDE like Eclipse. You can also configure these images to use your Hadoop cluster, but you don't need to for small data sets.
PS2: the RecommenderIntro code isn't a distributed implementation and thus can't run on large datasets. It also runs locally instead of on a Hadoop cluster.
I have used STS (Spring Tool Suite) to create a compiled Groovy script, which exists as a file on Windows called Test.class. I am able to right-click on the file in STS and execute it, which works well.
However, I want to be able to execute the script from the Windows command line. So far I have tried various ways but have not been successful. I have tried the following:
java -cp C:\Users\MyName\springsource\sts-3.1.0.RELEASE\plugins\org.codehaus.grails.bundle_2.1.1\content\lib\org.codehaus.groovy\groovy-all\jars\groovy-all-1.8.8.jar Test.class
But that does not work; it gives me an error: Error: Could not find or load main class Test.class
Any Pointers?
You are trying to run a test case, so you really need to be launching JUnit and passing this test as the test to run.
The easiest thing to do is to download a distribution of groovy, unzip, and run:
groovy Test.groovy
Step 1
In STS (Spring Tool Suite), create a Groovy class, e.g. a Customer.groovy file. Specify a package name, e.g. com.customer. In the main method, put in code to validate that the code is being called, e.g. println 'test'.
Step 2
Go to your command line (on Windows, use the command prompt). cd to the root directory of your project. Execute the command below.
Step 3
Execute java -cp C:\Users\Profile\springsource\sts-3.1.0.RELEASE\plugins\org.codehaus.grails.bundle_2.1.1\content\lib\org.codehaus.groovy\groovy-all\jars\groovy-all-1.8.8.jar;. com.customer.Customer
The code should run.
Very important
If, like me, you do not have a class and you only have a Groovy script, then in step 3 specify the Groovy script name without the suffix.
Is Test naked (i.e. without a package name)? Try this:
java -cp C:\Users\MyName\springsource\sts-3.1.0.RELEASE\plugins\org.codehaus.grails.bundle_2.1.1\content\lib\org.codehaus.groovy\groovy-all\jars\groovy-all-1.8.8.jar;. package.Test
Pay attention to the ;. and the package name.