I have a huge number of CSV files which I am processing via bash.
Is there a way I can call the bash script from PySpark and then generate an RDD from its output?
I used subprocess.call to trigger the bash script:
subprocess.call("run.bash", shell=True)
I have a Bash script that I run from the terminal and that accepts a file as one of its arguments.
I want a user to upload a file through a web UI, and that file would then be used as one of the arguments to the Bash script. I am assuming the file and the Bash script will have to live on a server. Finally, I want to print the results of the Bash script back to the user in the UI.
What technologies and architecture would you use to achieve this? I do not want to recreate the Bash script.
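One possible sketch of that flow, assuming a Python/Flask backend; the framework choice, the /run route, and the script name process.sh are all assumptions, not something the question specifies.

import subprocess
from flask import Flask, request

app = Flask(__name__)

@app.route("/run", methods=["POST"])
def run_script():
    # Save the uploaded file somewhere the Bash script can read it
    # (in real use, sanitize the filename before building the path)
    uploaded = request.files["file"]
    path = "/tmp/" + uploaded.filename
    uploaded.save(path)

    # Call the existing Bash script with the uploaded file as its argument
    result = subprocess.run(["./process.sh", path], capture_output=True, text=True)

    # Show the script's output back to the user
    return "<pre>" + result.stdout + result.stderr + "</pre>"

The same pattern works with any server-side stack that can save an upload and spawn a process.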
I am a beginner in shell scripting. I have to write a script to get the list of databases which were modified after our last export from one server, and import those into our new server.
I can successfully execute the query that lists the modified databases on the old server using a java CLI, but it gives the output like this:
|DATABASE_NAME|
|CONSUMER_B1|
|MOSAIC|
|ADMIN|
|AML|
etc
So I want to strip the pipe characters from that output with a shell command, loop over the database names, and run the already-tested export command for each database one by one.
For example, a prototype would be like the loop below, where databasename stands for each of the databases in the above list.
# Assuming the CLI output above is saved in db_list.txt; skip the header row and strip the '|' delimiters
grep -v 'DATABASE_NAME' db_list.txt | tr -d '| ' | while read -r databasename; do
    ./export.sh --login abc --password *** --file export_delta.vql --server "localhost:9999/$databasename" >> "$logfile" 2>&1
done
Could anyone please help with how to implement this in a shell script?
Thanks!
I am trying to write a wrapper script that calls other shell scripts in a sequential manner.
There are 3 shell scripts that pick up .csv files of a particular pattern from a specified location and process them.
I need to run them sequentially by calling them from one wrapper script.
Let's consider 3 scripts, a.ksh, b.ksh and c.ksh, that run sequentially in that order.
The requirement is that the wrapper should fail if a.ksh fails but continue if b.ksh fails.
Please suggest.
Thanks in advance!
Something like:
./a.ksh && { ./b.ksh; ./c.ksh; }
The && stops the whole list (and propagates a.ksh's non-zero exit status) if a.ksh fails, while the ; lets c.ksh run even when b.ksh fails. I haven't tried this out. Do test with sample scripts that fail/pass before using.
See: http://www.gnu.org/software/bash/manual/bashref.html#Lists
I have a snakemake rule that creates a text file with many shell commands as its output. I would like to design a second rule that takes that file as input and runs all the commands specified in it in parallel, taking advantage of multiple threads/cores or submitting the commands to a cluster if --cluster is specified. Is that possible?
You could write Python code, either in a "run:" block in Snakemake or in an external Python script via "script:", that uses the subprocess module to run each command in the file in a separate process. I don't think there's a way to do it more directly in Snakemake.
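As a minimal sketch of that approach, assuming the file contains one shell command per line; the file name commands.txt, the function name, and the worker count here are illustrative, and inside a Snakemake run: or script: block you would take them from input[0] and threads instead.

import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_commands(command_file, max_workers=4):
    # Read one shell command per line, skipping blank lines
    with open(command_file) as fh:
        commands = [line.strip() for line in fh if line.strip()]

    # Run the commands in parallel, each in its own shell process
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda cmd: subprocess.run(cmd, shell=True), commands))

    # Fail loudly if any command exited with a non-zero status
    failed = [r.args for r in results if r.returncode != 0]
    if failed:
        raise RuntimeError("Commands failed: " + ", ".join(failed))

run_commands("commands.txt", max_workers=4)

Note that this only parallelizes on the node the job runs on; --cluster controls how Snakemake submits whole jobs, not commands launched from inside a rule.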
A = LOAD '/pig/student.tsv' as (rollno:int, name:chararray, gpa:float);
DUMP A;
If I want to execute the first line, I have to press the Enter key after it, so the statements run one at a time.
How can I make this a single execution?
You can create a Pig script file to make this a single execution.
test.pig
A = LOAD '/pig/student.tsv' as (rollno:int, name:chararray, gpa:float);
DUMP A;
Now run the Pig script from pig/bin using the command below:
pig -f /path/test.pig
You need to create a Pig script (say, myscript.pig) containing those 2 lines. Then run the script using the command pig myscript.pig
Short answer: use a script, as suggested by Kumar.
Long answer: if you create a single-line script containing multiple statements, it will not be long before it becomes a nightmare to read and understand as the script grows. That said, once you use a script, it won't matter whether you write it on one line or multiple lines.
So my suggestion is to use a well-indented script for learning/development/what-have-you.