Bash - work with a file in the temp folder

In my script I create a temp directory with this command:
TMPDIR=$(mktemp -d)
Later, when I want to create a file there, I use (with $DATA being my source data file):
touch $TMPDIR/data
echo "$DATA" > $TMPDIR/data
Later on, I use awk to alter the data, with this syntax:
awk '
{ a[i++]= ($0 * '$factor') }
END{
{ for (j=0;j < i;j++) print a[j] }
}
' ${TMPDIR}/data
Then I use gnuplot to plot it. But gnuplot reports some errors, so I wanted to print $TMPDIR/data with cat. However, it says the file doesn't exist. What am I doing wrong?
Thanks

I was reading through the unanswered questions and found this one. After reading all the comments, I realized this is one of the questions already answered in the comments. The issue was that the user had forgotten to redirect the output of the awk command to a file. To save others from reading the comments and coming to the same conclusion, I am posting this as an answer. Here is the comment that answers the question:
as dumb as it seems to be, lurker was right, I have forgotten to output the awk into the file I wanted to thank you all for your comments – Jesse_Pinkman
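For completeness, here is a sketch of what the corrected step could look like. It passes the factor with -v instead of splicing quotes, and writes to $TMPDIR/data_scaled, a file name made up for illustration; the key point is simply that awk's output is redirected to a file that gnuplot can then read:
awk -v factor="$factor" '
    { a[i++] = $0 * factor }                    # scale every input value
    END { for (j = 0; j < i; j++) print a[j] }  # print them in the original order
' "$TMPDIR/data" > "$TMPDIR/data_scaled"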

Related

remove line in csv file if string found (from another text file) in bash

Due to a power failure, I am having to clean up jobs which are driven by text files. The problem is, I have a text file with strings like so (they are uuids):
out_file.txt (~300k entries)
<some_uuidX>
<some_uuidY>
<some_uuidZ>
...
and a csv like so:
in_file.csv (~500k entries)
/path/to/some/location1/,<some_uuidK>.json.<some_string1>
/path/to/some/location2/,<some_uuidJ>.json.<some_string2>
/path/to/some/location3/,<some_uuidX>.json.<some_string3>
/path/to/some/location4/,<some_uuidY>.json.<some_string4>
/path/to/some/location5/,<some_uuidN>.json.<some_string5>
/path/to/some/location6/,<some_uuidZ>.json.<some_string6>
...
I would like to remove the lines from in_file.csv whose uuid matches an entry in out_file.txt.
The end result:
/path/to/some/location1/,<some_uuidK>.json.<some_string1>
/path/to/some/location2/,<some_uuidJ>.json.<some_string2>
/path/to/some/location5/,<some_uuidN>.json.<some_string5>
...
Since the file sizes are fairly large, I was wondering if there is an efficient way to do it in bash.
Any tips would be great.
Here is a potential grep solution:
grep -vFwf out_file.txt in_file.csv
And a potential awk solution (likely faster):
awk -F"[,.]" 'FNR==NR { a[$1]; next } !($2 in a)' out_file.txt in_file.csv
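For readers less familiar with the two-file idiom, here is the same awk command spread out with comments (identical logic, nothing new):
awk -F'[,.]' '
    FNR == NR {     # true only while reading the first file, out_file.txt
        a[$1]       # remember each uuid as an array key
        next        # and skip the filter below
    }
    !($2 in a)      # in_file.csv: print only lines whose uuid (field 2 when
                    # splitting on "," or ".") was NOT listed in out_file.txt
' out_file.txt in_file.csv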
NB there are caveats to each of these approaches. Although they both appear to be suitable for your intended purpose (as indicated by your comment "the numbers add up correctly"), posting a minimal, reproducible example in future questions is the best way to help us help you.

awk command works with small files but does nothing with big ones

I have the following awk command to join lines that are shorter than a limit (it is basically used to repair broken lines in a multiline fixed-width file):
awk 'last{$0=last $0;} length($0)<21{last=$0" ";next} {print;last=""}' input_file.txt > output_file.txt
input_file.txt:
1,11,"dummy
111",1111
2,22,"dummy 222",2222
3,33,"dummy 333",3333
output_file.txt (expected):
1,11,"dummy 111",1111
2,22,"dummy 222",2222
3,33,"dummy 333",3333
The script works pretty well with small files (~MB) but it does nothing with big files (~GB).
What may be the problem?
Thanks in advance.
Best guess: all the lines in your big file are longer than 21 chars. There are more robust ways to do what you're trying to do with that script, though, so it may not be worth debugging this; ask for help with an improved script instead.
Here's one more robust way to combine quoted fields that contain newlines using any awk:
$ awk -F'"' '{$0=prev $0; if (NF%2){print; prev=""} else prev=$0 OFS}' input_file.txt
1,11,"dummy 111",1111
2,22,"dummy 222",2222
3,33,"dummy 333",3333
That may be a better starting point for you than your existing script. To do more than that, see What's the most robust way to efficiently parse CSV using awk?.
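In case the NF%2 test looks like magic, here is my reading of it (not part of the original answer): with the field separator set to ", NF is the number of double quotes on the accumulated line plus one, so an odd NF means the quotes are balanced and the record is complete. A tiny standalone trace with made-up input:
printf '1,11,"dummy\n111",1111\n' |
awk -F'"' '{
    $0 = prev $0                      # glue any buffered partial record on front
    if (NF % 2) { print; prev = "" }  # quotes balanced: the record is complete
    else        { prev = $0 OFS }     # still open: buffer it plus a joining space
}'
which prints 1,11,"dummy 111",1111 (the same joining the answer performs on input_file.txt).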

Bash - Read Directory Path From TXT, Append Executable, Then Execute

I am setting up a directory structure with many different R & bash scripts in it. They all will be referencing files and folders. Instead of hardcoding the paths I would like to have a text file where each script can search for a descriptor in the file (see below) and read the relevant path from that.
Getting the search-append to work in R is easy enough for me; I am having trouble getting it to work in Bash, since I don't know the language very well.
My guess is it has something to do with the way awk works or stores the variable, or maybe with how the / interacts with the awk output. But I'm not familiar enough with it and would really appreciate any help.
Text File "Master_File.txt":
NOT_DIRECTORY "/file/paths/Fake"
JOB_TEST_DIRECTORY "/file/paths/Real"
ALSO_NOT_DIRECTORY "/file/paths/Fake"
Bash Script:
#! /bin/bash
master_file_name="Master_File.txt"
R_SCRIPT="RScript.R"
SRCPATH=$(awk '/JOB_TEST_DIRECTORY/ { print $2 }' $master_file_name)
Rscript --vanilla $SRCPATH/$R_SCRIPT
In the last line, $SRCPATH/$R_SCRIPT seems to replace part of $SRCPATH with the name of $R_SCRIPT, producing something like /RScript.Rs/Real instead of what I would like, which is /file/paths/Real/RScript.R.
Note: if I hard code the path path="/file/paths/Real" then the code $path/$R_SCRIPT outputs what I want.
The R Script:
system(command = "echo \"SUCCESSFUL_RUN\"", intern = FALSE, wait = TRUE)
q("no")
Please let me know if there's any other info that would be helpful, I added everything I could think of. And thank you.
Edit Upon Answer:
I found two solutions.
Solution 1 - By Mheni:
[ see his answer below ]
Solution 2 - My Adaptation of Mheni's Answer:
After seeing Mheni's note on ignoring the " quotation marks, I looked some more and found that it's possible to change the character awk uses to split the text into fields. By adding -F\" to the awk call, it successfully separates on the " character.
The following works:
#!/bin/bash
master_file_name="Master_File.txt"
R_SCRIPT="RScript.R"
SRCPATH=$(awk -F\" -v r_script=$R_SCRIPT '/JOB_TEST_DIRECTORY/ { print $2 }' $master_file_name)
Rscript --vanilla $SRCPATH/$R_SCRIPT
Thank you so much everyone that took the time to help me out. I really appreciate it.
The problem is the quotes around the path; this change to the awk command ignores them when printing the path.
There was also a space in the shebang line that shouldn't be there, as @david mentioned.
#!/bin/bash
master_file_name="/tmp/data"
R_SCRIPT="RScript.R"
SRCPATH=$(awk '/JOB_TEST_DIRECTORY/ { if(NR==2) { gsub("\"",""); print $2 } }' "$master_file_name")
echo "$SRCPATH/$R_SCRIPT"
OUTPUT
[1] "Hello World!"
in my example the paths are in /tmp/data
NOT_DIRECTORY "/tmp/file/paths/Fake"
JOB_TEST_DIRECTORY "/tmp/file/paths/Real"
ALSO_NOT_DIRECTORY "/tmp/file/paths/Fake"
and in the path that corresponds to JOB_TEST_DIRECTORY i have a simple hello_world R script
[user#host tmp]$ cat /tmp/file/paths/Real/RScript.R
print("Hello World!")
I would use
Master_File.txt :
NOT_DIRECTORY="/file/paths/Fake"
JOB_TEST_DIRECTORY="/file/paths/Real"
ALSO_NOT_DIRECTORY="/file/paths/Fake"
Bash Script:
#!/bin/bash
R_SCRIPT="RScript.R"
if [[ -r /path/to/Master_File.txt ]]; then
. /path/to/Master_File.txt
else
echo "ERROR -- Can't read Master_File"
exit
fi
Rscript --vanilla $JOB_TEST_DIRECTORY/$R_SCRIPT
Basically, you create a key=value configuration file, source it, then use the keys as variables for whatever you need throughout the script.
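One small, optional refinement to the Rscript line above: quoting the expansion keeps the call intact even if a configured path ever contains spaces.
Rscript --vanilla "$JOB_TEST_DIRECTORY/$R_SCRIPT"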

Looking up and extracting a line from a big file matching the lines of another big file

I have allowed myself to create a new question because some parameters have changed dramatically compared to my first question about optimising my bash script (Optimising my script which lookups into a big compressed file).
In short: I want to look up and extract all the lines where the value in the first column of file 1 (a BAM file) matches the first column of a text file (file 2). For bioinformaticians, this means extracting the matching read IDs from two files.
File 1 is a binary compressed 130GB file
File 2 is a tsv file of 1 billion lines
Recently a user came up with a very elegant one-liner combining the decompression of the file with an awk lookup, and it worked very well. However, with files of this size, it has now been running for more than 200 hours (multithreaded).
Does this "problem" have a name in algorithmics?
What could be a good way to tackle this challenge? (If possible with simple tools such as sed, awk, or bash.)
Thanks a lot.
Edit: Sorry for omitting the code; since it was in the link, I thought it would be a duplicate. Here is the one-liner used:
#!/bin/bash
samtools view -@ 2 /data/bismark2/aligned_on_nDNA/bamfile.bam | awk -v st="$1" 'BEGIN {OFS="\t"; while (getline < st) {st_array[$1]=$2}} {if ($1 in st_array) {print $0, st_array[$1], "wh_genome"}}'
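(For reference, the same one-liner spelled out with comments, and with the getline return value checked, which is the safer form. Note that the whole tsv is loaded into a single awk array before the first alignment is examined, which is where the memory and start-up cost comes from.)
samtools view -@ 2 /data/bismark2/aligned_on_nDNA/bamfile.bam |
awk -v st="$1" '
    BEGIN {
        OFS = "\t"
        # read the entire tsv into memory first:
        # key = read id (column 1), value = its annotation (column 2)
        while ((getline < st) > 0) st_array[$1] = $2
    }
    # then, for every alignment streamed out of samtools, print it
    # only if its read id was present in the tsv
    $1 in st_array { print $0, st_array[$1], "wh_genome" }
'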
Think of this as a long comment rather than an answer. The 'merge sort' method can be summarised as: if two records don't match, advance the file with the smaller record by one record. If they do match, record the match and advance the big file by one record.
In pseudocode, this looks something like:
currentSmall <- readFirstRecord(smallFile)
currentLarge <- readFirstRecord(largeFile)
searching <- true
while (searching)
    if (currentLarge < currentSmall)
        currentLarge <- readNextRecord(largeFile)
    else if (currentLarge = currentSmall)
        // Bingo!
        saveMatchData(currentLarge, currentSmall)
        currentLarge <- readNextRecord(largeFile)
    else if (currentLarge > currentSmall)
        currentSmall <- readNextRecord(smallFile)
    endif
    if (largeFile.EOF or smallFile.EOF)
        searching <- false
    endif
endwhile
Quite how you translate that into awk or bash is beyond my meagre knowledge of either.
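As a rough illustration only (not from the original answer): once both inputs are flattened to plain text and sorted on their first column, the standard join utility implements exactly this merge, reading each file once. The file names and the samtools step below are assumptions about the poster's setup, and sorting the 130GB stream is of course a substantial one-off cost in its own right; the gain is that nothing has to be held in an in-memory array.
# Hypothetical file names; both streams must be sorted on column 1 in the same locale.
samtools view /data/bismark2/aligned_on_nDNA/bamfile.bam | LC_ALL=C sort -t $'\t' -k1,1 > bam_sorted.tsv
LC_ALL=C sort -t $'\t' -k1,1 reads.tsv > reads_sorted.tsv
# join walks both sorted files once: it advances whichever side is behind
# and prints a combined line whenever the first fields match.
LC_ALL=C join -t $'\t' bam_sorted.tsv reads_sorted.tsv > matches.tsv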

Issue with bash script using SED/AWK for substitution

I have been working on this little script at work to free up my own time and am currently stuck on part of it. The script is supposed to pull some content from a JSON, modify the content, and then re-upload it. The modification part is the portion that doesn't work.
An example of what the content looks like after being extracted from the JSON is:
<p>App1_v1.0_20160911_release.apk</p<p>App2_v2.0_20160915_beta.apk</p><p>App3_v3.0_20150909_VendorRelease.apk</p>
The modification function is supposed to update the list with the newer app filenames in the same location. I've tried using both SED and AWK to get this to work but I haven't gotten anywhere fast.
Here are examples of both commands and the parameters for the substitution I am trying to run on the example file:
old_name=App1_.*_release.apk
new_name=App1_v1.0_20160920_1152_release.apk
sed "s/$old_name/$new_name/" body > upload
awk -v oldname="$old_name" -v newname="$new_name" '{sub(oldname, newname)}1' body > upload
What ends up happening is the substitution will change the correct part of the list, but then nuke everything between that point and the end of the list.
Thank you for any and all help.
PS: If I didn't explain something correctly or you feel some information is missing, please comment and let me know so I can better explain the problem.
There are SO many possible values of oldname, newname, and your input data that could cause either of the commands you wrote to fail. Don't use that "replace a regexp with a backreference-enabled string" approach in any command; use string operations instead (which means you can't use sed, since sed doesn't support strings).
This modifies your sample input as you say you want:
$ awk -v new='App1_v1.0_20160920_1152_release.apk' 'BEGIN{RS="</p>\n?"; FS=OFS="<p>"} NR==1{$2=new} {printf "%s%s", $0, RT}' file
<p>App1_v1.0_20160920_1152_release.apk<p>App2_v2.0_20160915_beta.apk</p><p>App3_v3.0_20150909_VendorRelease.apk</p>
If that's not adequate then edit your question to better explain your requirements and provide more truly representative sample input/output.
The above uses GNU awk for multi-char RS and RT.
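A guess at why the original attempts "nuke" the middle of the list (this is a hypothesis, not something stated in the question): the .* in old_name is greedy, so whenever _release.apk occurs more than once on the line, the match extends to the last occurrence and everything in between is replaced along with it. A tiny illustration with made-up input:
$ echo 'App1_v1_release.apk MIDDLE App9_v9_release.apk' | sed 's/App1_.*_release.apk/NEW.apk/'
NEW.apk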
