I am coding a shell script that reads text files and creates JSON key-value pairs based on them. The key is the filename and the value is a random line of the file.
The trouble is when I concatenate the key with the value in the global variable data.
When I run the code below:
data='{'
for file in $(ls _params)
do
key=${file%.txt}
f_line=$(($$%100))
value=$(sed "${f_line}q;d" "./_params/$file")
# assembles key-value pairs
data=$data\"$key\":\""value"\",
done
data=${data%?} # removes last comma
data="$data}"
echo $data
My output is: {"firstName":"value","lastName":"value"}
But changing the string "value" to the variable $value, as follows:
data='{'
for file in $(ls _params)
do
key=${file%.txt}
f_line=$(($$%100))
value=$(sed "${f_line}q;d" "./_params/$file")
# assembles key-value pairs
data=$data\"$key\":\"$value\",
done
data=${data%?} # removes last comma
data="$data}"
echo $data
The output gets confused: "}"lastName":"Adailee.
I wish to store in the $data variable something like: {"firstName":"Bettye","lastName":"Allison"}
Note: My bash version is 4.3.48.
Note: Inside my _params directory I have two files, firstName.txt and lastName.txt, each with a random name on every line.
As @ruakh suggests, the specific issue is your input files. Here are steps to repro your issue and verify this:
I created a seed file ABCD, then built _params/firstName.txt by repeating it 100 times:
$ cat ABCD
A
B
C
D
$ for _ in $(seq 1 100); do cat ABCD >> _params/firstName.txt; done
And then similarly with W X Y Z for lastName.txt. Then I ran your script:
$ bash q.sh
{"firstName":"A","lastName":"W"}
However, if I use unix2dos (from the dos2unix package) to convert these files to \r\n line endings, the problem reproduces:
$ unix2dos _params/firstName.txt
unix2dos: converting file _params/firstName.txt to DOS format...
$ unix2dos _params/lastName.txt
unix2dos: converting file _params/lastName.txt to DOS format...
$ bash q.sh
"}"lastName":"W
So you could probably use dos2unix to fix your input files (or open them in vim and do :set ff=unix and then :x).
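If dos2unix isn't handy, a minimal pure-bash check for the problem looks like this (sample.txt and its contents are made up for the demonstration):

```shell
# Create a sample file with DOS (\r\n) line endings
printf 'Bettye\r\n' > sample.txt
# Read the first line; a trailing CR means DOS line endings
IFS= read -r line < sample.txt
if [[ $line == *$'\r' ]]; then
    echo "DOS line endings detected"
fi
```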
But I wanted to let you know about three other things.
$$ is not a random number, it's the PID of your current process.
best practice is not to parse ls, but to use globbing instead
you can solve the fencepost problem without removing the comma you just placed: start with an empty separator and set it to a comma after the first iteration of the loop.
Here is my suggestion for improving your script (once you fix the newlines in the input):
#!/bin/bash
data='{'
sep=""
for file in _params/*
do
key=${file##*/} # strip the leading _params/ from the glob match
key=${key%.txt}
file_length=$(wc -l < "$file")
f_line=$(( (RANDOM % file_length) + 1 ))
value=$(sed "${f_line}q;d" "${file}")
# assembles key-value pairs
data="${data}${sep} \"$key\":\"$value\""
sep=","
done
data="${data} }"
echo "$data"
$value apparently ends with a carriage return character (\r, U+000D). As a result, when you print it, the cursor moves back to the beginning of the line, and subsequent characters are printed starting at the first column, overwriting what was there before. (This doesn't affect the actual order of characters, of course; it's just displayed confusingly when you print it.)
To fix this, you can write
value="${value%$'\r'}"
to remove the trailing carriage return.
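For example, a minimal sketch (the value here is made up):

```shell
value=$'Allison\r'          # a value that picked up a stray carriage return
value="${value%$'\r'}"      # strip the trailing CR if present
echo "\"$value\""           # prints: "Allison"
```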
Related
I have a large number of files with filenames of the format
OUTPUT_11_0.175
I want to extract the two numbers, I managed to get the second number with the following:
for file in ./dir/*; do
phi=${file##*_}
echo "$phi"
done
To get the other number 11 in this case, I tried
a=${file#*_}
but this returns everything to the right of the first underscore (and the directory name contains an underscore) - is there some way to convince bash to read 'between' the two underscores and return '11'?
$ IFS=_ read -a foo <<< "OUTPUT_11_0.175"
$ echo "${foo[0]}"
OUTPUT
$ echo "${foo[1]}"
11
$ echo "${foo[2]}"
0.175
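If you'd rather stay with parameter expansion, two steps also work, and they are safe even when the directory name contains an underscore (my_dir is a made-up example path):

```shell
file=./my_dir/OUTPUT_11_0.175
tmp=${file%_*}      # drop the last _-separated field: ./my_dir/OUTPUT_11
num=${tmp##*_}      # drop everything through the last remaining _: 11
echo "$num"         # prints: 11
```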
I got output from a command, something like this: 2048,4096,8192,16384,24576,32768.
I want to split it into 6 different files, but only the numbers, not the commas. E.g.
the initial text 2048,4096,8192,16384,24576,32768 should be split so that 2048 goes to file A, 4096 to file B, 8192 to file C, and so on.
That output follows these rules:
There are always 6 values, separated by commas
The numbers are always 3 to 5 digits long
As I said, the commas don't interest me because I'm going to do mathematical operations with those numbers
I tried to delete the last X characters, but I couldn't find a way to "detect" a comma so the operation would stop.
Is it possible using SED?
The following relies on no commands external to a POSIX-compliant shell (such as busybox ash, which you're most likely to be using on Android):
csv=/system/file.csv
IFS=, read a b c d e f <"$csv"
echo "$a" >A
echo "$b" >B
echo "$c" >C
echo "$d" >D
echo "$e" >E
echo "$f" >F
This does assume that the files to be written (A, B, C, D, E and F) are all inside the current working directory. If you want to write them somewhere else, either amend their names, or use cd to change to that other directory.
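As a quick check of the same six-variable read, fed from a here-document instead of /system/file.csv:

```shell
IFS=, read a b c d e f <<EOF
2048,4096,8192,16384,24576,32768
EOF
echo "$a $f"   # prints: 2048 32768
```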
I'm not sure sed is the right tool for that.
With a simple Bash script:
IFS=',' read -ra val < file.csv
for i in "${val[@]}"; do
echo "$i" > "file$(( ++j ))"
done
It writes each value of your CSV into file1, file2, etc.:
The read command assigns values from file.csv to array variable val.
Using loop, each value is written to file.
Just make sure you have write permissions in the current directory. If not, change the redirection (eg: > /dirWithWritePermissions/).
This might work for you (GNU sed, bash and parallel):
parallel --xapply echo {1} ">>" part{2} :::: <(sed 's/,/\n/g' file.csv) ::: {1..6}
This "zips" together two files reusing the shorter file as necessary.
N.B. Remember to remove any part* files before applying this command otherwise those files will grow (>> appends).
declare -a list_of_files=(fileA fileB fileC fileD fileE fileF)
readarray -t a <<< "$(sed 's/,/\n/;{;P;D;}' <<< '2048,4096,8192,16384,24576,32768')"
for i in "${a[@]}"; do
echo "$i" > "${list_of_files[$((num++))]}"
done
Explanation:
s/,/\n/ substitutes every comma with a newline
{ starts a command group
P prints everything in the pattern buffer up to the first newline
D Deletes everything in the pattern buffer up to the first newline and then restarts the current command group
} ends the command group
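You can watch the command group work on a shorter string (GNU sed assumed, since \n in the replacement is a GNU extension):

```shell
printf '%s\n' '2048,4096,8192' | sed 's/,/\n/;{;P;D;}'
# prints:
# 2048
# 4096
# 8192
```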
EDIT:
Let's say you want to copy the information into /system/file but want to have every number in its own row:
$ sed 's/,/\n/;{;P;D;}' < /sys/module/lowmemorykiller/parameters/minfree > /system/file
This will create a new file /system/file that will contain the formatted output.
EDIT: even shorter would be: sed 's/,/\n/g', which works by replacing every comma with a newline (the g at the end makes the substitution global).
Also note that while sed is a nice tool to use (you gotta love it for its confusing language and commands...), the better and faster way is to use the bash built-in read.
I have a folder with, say, ten data files I01.txt, ..., I10.txt. Each file, when processed with the command ./a.out, gives me five output files, namely f1.txt, f2.txt, ..., f5.txt.
I have written a simple bash program to execute all the files and save the output printed on the screen to a file using the command
./cosima_peaks_444_temp_muuttuva -args > $counter-res.txt
Using this, I am able to save the on-screen output to the file. But the five files f1 to f5 are overwritten on each run, so after the last run they hold only the results of I10, and the results of the first nine files are lost.
So I want to save the output files (f1 ... f5) of each I*.txt run to differently named files, such that when the program executes I01.txt using ./a.out, it stores the output of the files as
f1>var1-f1.txt , f2>var1-f2.txt... f5 > var1-f5.txt
and then repeats the same for I02 (f1>var2-f1.txt ...).
#!/bin/bash
# echo "for looping over all the .txt files"
echo -e "Enter the name of the file or q to quit "
read dir
if [[ $dir = q ]]
then
exit
fi
filename="$dir*.txt"
counter=0
if [[ $dir == I ]]
then
for f in $filename ; do
echo "output of $filename"
((counter ++))
./cosima_peaks_444_temp_muuttuva $f -m202.75 -c1 -ng0.5 -a0.0 -b1.0 -e1.0 -lg > $counter-res.txt
echo "counter $counter"
done
fi
If I understand correctly, you want to pass files l01.txt, l02.txt, ... to a.out and save the output for each execution of a.out to a separate file like f01.txt, f02.txt, .... You could use a short script that reads each file named l*.txt in the directory and passes the name to a.out, redirecting the output to a file fN.txt (where N is the same number as in the lN.txt filename). This presumes you are passing each filename to a.out and that a.out is not reading the entire directory automatically.
for i in l*.txt; do
num=$(sed 's/^l\(.*\)[.]txt/\1/' <<<"$i")
./a.out "$i" > "f${num}.txt"
done
(note: the sed pattern begins with a lowercase 'l', and the replacement \1 is a backreference - backslash-one, not backslash-lowercase-L)
note: if you do not want the same N from the filename (with its leading '0'), then you can trim the leading '0' from the N value for the output filename.
(you can use a counter as you have shown in your edited post, but note that the glob expands in lexicographic order, so l10.txt sorts before l2.txt unless the numbers are zero-padded or you sort them explicitly)
note: this presumes NO spaces, embedded newlines, or other odd characters in the filenames. If your lN.txt names can have odd characters or spaces, then feeding a while loop with find avoids those issues.
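A sketch of that find-fed while loop (./a.out is the OP's program, so here the loop just prints the command it would run; the demo directory and filenames are made up):

```shell
mkdir -p demo && touch demo/l01.txt demo/l02.txt   # sample input files
find demo -maxdepth 1 -name 'l*.txt' -print0 |
while IFS= read -r -d '' f; do
    num=${f##*/l}; num=${num%.txt}                 # extract N from .../lN.txt
    printf 'would run: ./a.out %s > f%s.txt\n' "$f" "$num"
done
```

The -print0/read -d '' pairing keeps filenames with spaces or newlines intact.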
With f1 - f5 Created Each Run
You know the format for the output file name, so you can test for the existence of an existing file name and set a prefix or suffix to provide unique names. For example, if your first pass creates filenames 'pass1-f01.txt', 'pass1-f02.txt', then you can check for that pattern (in several ways) and increment your 'passN' prefix as required:
for f in $filename; do  # unquoted so the glob stored in $filename expands
num=$(sed 's/l*\(.*\)[.]txt/\1/' <<<"$f")
count=$(sed 's/^0*//' <<<"$num")
while [ -f "pass${count}-f${num}.txt" ]; do
((count++))
done
./a.out "$f" > "pass${count}-f${num}.txt"
done
Give that a try and let me know if that isn't closer to what you need.
(note: the use of the herestring (<<<) is bash-only, if you need a portable solution, pipe the output of echo "$var" to sed, e.g. count=$(echo "$num" | sed 's/^0*//') )
I replaced your cosima_peaks_444_temp_muuttuva with a function myprog.
The OP asked for more explanation, so I put in a lot of comment:
# This function makes 5 output files for testing the construction
function myprog {
# Fill the test output file f1.txt with the input filename and a datestamp
echo "Output run $1 on $(date)" > f1.txt
# The original prog makes 5 output files, so I copy the new testfile 4 times
cp f1.txt f2.txt
cp f1.txt f3.txt
cp f1.txt f4.txt
cp f1.txt f5.txt
}
# Use the number in the inputfile for making a unique filename and move the output
function move_output {
# The parameter ${1} is filled with something like I03.txt
# You can get the number with a sed action, but it is more efficient to use
# bash functions, even in 2 steps.
# First step: Cut off from the end as much as possible (%%) starting with a dot.
Inumber=${1%%.*}
# Step 2: Remove the I from the Inumber (that is filled with something like "I03").
number=${Inumber#I}
# Move all outputfiles from last run
for outputfile in f*txt; do
# Put the number in front of the original name
mv "${outputfile}" "${number}_${outputfile}"
done
}
# Start the main processing. You will perform the same logic for all input files,
# so make a loop for all files. I guess all input files start with an "I",
# followed by 2 characters (a number), and .txt. No need to use ls for listing those.
for input in I??.txt; do
# Call the dummy prog above with the name of the first input file as a parameter
myprog "${input}"
# Now finally the show starts.
# Call the function for moving the 5 outputfiles to another name.
move_output "${input}"
done
I guess you have the source code of this a.out binary. If so, I would modify it so that it outputs to several fds instead of several files. Then you can solve this very cleanly using redirects:
./a.out 3> fileX.1 4> fileX.2 5> fileX.3
and so on for every file you want to output. Writing to a file or to a (redirected) fd is equivalent in most programs (notable exception: memory mapped I/O, but that is not commonly used for such scripts - look for mmap calls)
Note that this is not very esoteric, but a very well known technique that is regularly used to separate output (stdout, fd=1) from errors (stderr, fd=2).
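A minimal shell sketch of the idea (writer stands in for the modified a.out; fds 3 and 4 are arbitrary choices):

```shell
# A program that writes to extra file descriptors instead of fixed filenames
writer() {
    echo "first stream"  >&3
    echo "second stream" >&4
}
# The caller decides where each stream lands, per run
writer 3>fileX.1 4>fileX.2
cat fileX.1   # prints: first stream
```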
I have some set of files with character strings in lines such that there is a folder containing
file1
file2
file3
and within those files there are variable-length lists of strings of characters, such that file 1 may read
file1.itemA
file1.itemB
file1.itemC
While file 2 may only contain
file2.itemA
file2.itemB
I want to add a file specific code to every line within each file such that
code1.file1.itemA
code1.file1.itemB
code1.file1.itemC
And
code2.file2.itemA
code2.file2.itemB
How can I do this within unix? I am using the OSX terminal to execute commands.
I'm not near a terminal to test, but, how about:
cd /path/to/your/files
word='code'
base=1
for file in *; do sed -i '' -e "s/^/$word$base./" "${file}"; base=$(( base + 1 )); done # -i '' is the macOS/BSD sed form; GNU sed takes plain -i
The $word variable is the constant you want. The $base variable holds the count that is incremented on each file, initially set to 1.
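A self-contained run of the idea on one sample file (demo_file and its contents are made up; GNU sed's in-place flag is shown here, so on macOS substitute sed -i '' -e):

```shell
printf 'file1.itemA\nfile1.itemB\n' > demo_file   # sample input
word='code'; base=1
sed -i -e "s/^/${word}${base}./" demo_file        # prepend "code1." to every line
cat demo_file
# prints:
# code1.file1.itemA
# code1.file1.itemB
```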
I am experiencing some trouble while reading a file in a bash script.
Here is the file I am trying to read :
--------------- (not in the file)
123.234 231.423
1223.23 132.134
--------------- (not in the file)
In this file, the 4 numbers are on two different lines and there is a blank line at the end of the file. There is no space character at the end of each line.
When I am trying to read this file using this script :
for val in $(cat $myFile)
do
echo "$val ";
done
I do have the following result :
123.234
231.423
1223.23
32.134
When I add a space character after the variable, it erases the beginning of the last number
for val in $(cat ~/Windows/Trash/bashReadingBehavior/trashFile.out)
do
echo "$val ";
done
output :
123.234
231.423
1223.23
2.134
In fact, characters added after the last number are written over the beginning of it. I assume this is behavior caused by an invisible character such as a carriage return, but I can't figure out how to solve this issue.
Your input file has DOS line endings. When you execute
echo "$val "
the value of $val ends with a carriage return, which when printed moves the cursor to the beginning of the line before the final two spaces are printed, which can overwrite whatever is already on the line.
You should use the following code to read from this file (don't iterate over the output of cat):
while read -r; do
REPLY=${REPLY%$'\r'} # Remove the carriage return from the line
for val in $REPLY; do
echo "$val"
done
done < "$myFile"
Iterating over a line with a for loop like I show isn't really recommended either, but it is OK in this case if the line read from the file and stored in REPLY is known to be a space-separated list of numbers.
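If you prefer to let read do the splitting, here is a sketch with read -a (bash-specific) that also strips the CR; nums.txt mimics the file in the question:

```shell
# Sample file with DOS line endings, like the one in the question
printf '123.234 231.423\r\n1223.23 132.134\r\n' > nums.txt
while read -r -a fields; do
    for val in "${fields[@]}"; do
        printf '%s\n' "${val%$'\r'}"   # the CR clings to the last field; strip it
    done
done < nums.txt
# prints the four numbers, one per line
```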