Parsing a simple text file and integrating its content into a bash script

I want to automatically parse a text file with content like this:
car 12345 W
train 54321 D
The result should go into a new bash file, whose content should look like this:
curl http://example.com/?vehicle=car&number=12345&period=W
curl http://example.com/?vehicle=train&number=54321&period=D
My problem is that I really don't know how to do that or which program to use: sed, awk, etc.
What do I have to do?

Here's probably the funniest answer: assume your file is file.txt, then just do this:
printf 'curl http://example.com/?vehicle=%s&number=%s&period=%s\n' $(<file.txt)
It:
is 100% pure bash,
has no explicit loops
is the shortest answer
is very funny
Hope this helps :)
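For anyone wondering why this works without a loop: printf reuses its format string until it runs out of arguments, and the unquoted expansion is split on whitespace into one argument per word. A minimal sketch of that behaviour, with the file contents inlined in a hypothetical `input` variable instead of file.txt:

```bash
# Hypothetical stand-in for file.txt, inlined so the sketch is self-contained.
input='car 12345 W
train 54321 D'

# Unquoted on purpose: word splitting turns the text into six arguments.
# printf consumes three per pass and repeats the format until none are left.
out=$(printf 'curl http://example.com/?vehicle=%s&number=%s&period=%s\n' $input)
printf '%s\n' "$out"
```

This only holds while no field contains whitespace or glob characters such as *; otherwise the loop-based answers are safer.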

Try this:
while read -r vehicle number period ; do
echo curl http://example.com/\?vehicle="$vehicle"\&number="$number"\&period="$period"
done < input.txt
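One thing to watch if the output really ends up in a script that gets executed: an unquoted & ends the command and runs it in the background, so the generated lines need quotes around the URL. A rough sketch of that variant, assuming the fields never contain single quotes (make_script is a made-up name, not something from the question):

```bash
# Hypothetical helper: reads "vehicle number period" lines on stdin
# and writes a runnable script to stdout.
make_script() {
    printf '#!/bin/bash\n'
    while read -r vehicle number period; do
        # Skip blank lines.
        [ -n "$vehicle" ] || continue
        # Single quotes keep ? and & literal when the generated script runs.
        printf "curl 'http://example.com/?vehicle=%s&number=%s&period=%s'\n" \
            "$vehicle" "$number" "$period"
    done
}

script=$(printf 'car 12345 W\ntrain 54321 D\n' | make_script)
printf '%s\n' "$script"
```

Redirecting the function's output to a file (for example `make_script < input.txt > new_script.sh`) would give the new bash file the question asks for.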

Maybe use sed to interleave the URL arguments with other parts. Guessing from the sample you posted:
sed 's/\([a-z]*\) \([0-9]*\) \([A-Z]\)/curl http:\/\/example.com\/?vehicle=\1\&number=\2\&period=\3/'
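Since the question also mentions awk, the same idea can be sketched there. This is only an illustration under the assumption that every useful line has exactly three whitespace-separated fields; to_curl is a hypothetical name:

```bash
# Hypothetical helper: reads the input lines on stdin, prints one curl line each.
to_curl() {
    # NF == 3 skips blank or malformed lines; & is literal in an awk printf format.
    awk 'NF == 3 { printf "curl http://example.com/?vehicle=%s&number=%s&period=%s\n", $1, $2, $3 }'
}

out=$(printf 'car 12345 W\n\ntrain 54321 D\n' | to_curl)
printf '%s\n' "$out"
```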

Related

Appending text to specific patterns in a fasta BASH

I have a fasta with headers like this:
tr|Q7MX99|Q7MX99_PORGI_BACT
I would like them to say:
tr|Q7MX99|Q7MX99_PORGI_BACT_ORALMICROBIOME
So basically, whenever I have PORGI_BACT I want to append _ORALMICROBIOME to each instance.
I'm sure there is an easy fix through the terminal, but I can't seem to find it.
My first idea is to do something like:
sed 's/>.*/&_ORALMICROBIOME/' file.fa > outfile.fa
BUT I only want to add to specific header endings, and that is where I'm stuck.
Using sed:
sed -r 's/(^.*)(PORGI_BACT|HUMAN_MAM|TESTA_BACT)(.*$)/\1\2_ORALMICROBIOME\3/' file.fa > outfile.fa
Enable extended regular expressions with -r or -E, then split the line into three groups, with the matched tag (for example "PORGI_BACT") as group two. The replacement emits groups one and two, then "_ORALMICROBIOME", then group three.
You are almost there. Would you please try the following:
sed 's/^>.*PORGI_BACT/&_ORALMICROBIOME/' file.fa > outfile.fa
[Edit]
According to the OP's requirement, how about:
sed -E 's/^>.*(PORGI_BACT|HUMAN_MAM|TESTA_BACT)/&_ORALMICROBIOME/' file.fa > outfile.fa
Sample input as file.fa:
>SEQ0|tr|Q7MX99|Q7MX99_PORGI_BACT
FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF
>SEQ1|tr|Q7MX88|Q7MX88_HUMAN_MAM
KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMRLME
LKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
>SEQ2|tr|Q7MX77|Q7MX77_TESTA_BACT
EEYQTWEEFARAAEKLYLTDPMKVRVVLKYRHCDGNLCMKVTDDAVCLQYKTDQAQDVKKVEKLHGK
>SEQ3|tr|Q7MX66|Q7MX66_DUMMY
MYQVWEEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVCLQYKTDQAQDVK
Output:
>SEQ0|tr|Q7MX99|Q7MX99_PORGI_BACT_ORALMICROBIOME
FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF
>SEQ1|tr|Q7MX88|Q7MX88_HUMAN_MAM_ORALMICROBIOME
KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMRLME
LKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
>SEQ2|tr|Q7MX77|Q7MX77_TESTA_BACT_ORALMICROBIOME
EEYQTWEEFARAAEKLYLTDPMKVRVVLKYRHCDGNLCMKVTDDAVCLQYKTDQAQDVKKVEKLHGK
>SEQ3|tr|Q7MX66|Q7MX66_DUMMY
MYQVWEEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVCLQYKTDQAQDVK
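If the suffix should only be added when the tag really sits at the end of a header, a slightly stricter variant is possible. This sketch assumes headers start with > and that nothing follows the tag on the line; tag_headers is a hypothetical name and the SEQ4 header is made up for the demonstration:

```bash
# Hypothetical helper: reads FASTA on stdin, writes the tagged FASTA to stdout.
tag_headers() {
    # /^>/ limits the substitution to header lines; $ anchors the tag at the line end.
    sed -E '/^>/ s/(PORGI_BACT|HUMAN_MAM|TESTA_BACT)$/&_ORALMICROBIOME/'
}

out=$(printf '%s\n' \
    '>SEQ0|tr|Q7MX99|Q7MX99_PORGI_BACT' \
    'FQTWEEFSRAAEKLYLADPMKV' \
    '>SEQ4|tr|PORGI_BACT|other' \
    '>SEQ3|tr|Q7MX66|Q7MX66_DUMMY' | tag_headers)
printf '%s\n' "$out"
```

Only the SEQ0 header changes here; the sequence line, the header with the tag in the middle, and the DUMMY header pass through untouched.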

What is an efficient BASH one-liner for adding a newline every 130 characters to a file without lines?

I am trying to process some largish files which lack line separators. I would like to process them with line-oriented tools like grep and sed.
I'd like to put something on the front of my pipeline that will insert a newline every 130 characters. I'm not interested in modifying the input files.
What is an efficient BASH one-liner for adding a newline every 130 characters to a file without lines?
If you really want efficiency, there is a special command (part of coreutils) for this:
fold -w130 file | ...
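To see what fold does, here is a tiny sketch with a width of 4 instead of 130 so the result is easy to eyeball. The input string is made up:

```bash
# Wrap a 10-character stream that has no newlines into rows of at most 4 characters.
out=$(printf 'abcdefghij' | fold -w 4)
printf '%s\n' "$out"
```

This should give three rows: abcd, efgh and ij. By default fold counts screen columns, so for data with tabs or multibyte characters the -b option (count bytes) may be closer to what is wanted.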
I discovered the answer in the asking. Another way to phrase this question to understand this answer is: What is an efficient BASH one-liner for reading a file without lines 130 characters at a time?
The answer is the BASH builtin read. It is a powerful tool that is well-suited to this purpose.
while IFS= read -r -n130 record || [[ -n $record ]] ; do printf '%s\n' "$record" ; done < unlined_data.txt # | rest of pipeline
There may be a more elegant solution and I'd be happy to see one.
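A few details matter with the read approach: without IFS= and -r, read trims whitespace and eats backslashes, and the last chunk is usually shorter than the requested width, in which case read reports end-of-file even though it filled the variable. A hedged sketch that handles those cases, using a width of 4 and a made-up input (chunk is a hypothetical name; it needs bash and assumes the data contains no newlines or NUL bytes):

```bash
# Hypothetical helper: $1 is the chunk width; reads stdin, writes one chunk per line.
chunk() {
    local width=$1 record
    # IFS= and -r keep spaces and backslashes intact. The || test still emits
    # the final short chunk, for which read returns a non-zero status.
    while IFS= read -r -n "$width" record || [[ -n $record ]]; do
        printf '%s\n' "$record"
    done
}

# The input is: ab cd\efgh (10 characters, no trailing newline).
out=$(printf 'ab cd\\efgh' | chunk 4)
printf '%s\n' "$out"
```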

Parsing data in Bash

I have a file with lots of rubbish inside. All of it is on one line.
But there are often things like:
"var":"value"
Before and after are different characters ...
What I want to do is extract only the above-mentioned format and put the matches into a single line. Any ideas how I could do that in shell scripting?
Best regards,
Alex
I believe
grep -o '"[^"]*":"[^"]*"' yourFile.txt > yourOutput.txt
would do the trick:
> echo 'xxx "a":"b" yyy"x":"y"' | grep -o '"[^"]*":"[^"]*"'
"a":"b"
"x":"y"

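Note that grep -o prints each match on its own line, while the question asks for everything on a single line. One possible way to join them, sketched here with paste and a space as separator (the separator and the name extract_pairs are assumptions, not something the question specifies):

```bash
# Hypothetical helper: reads the rubbish on stdin, prints all pairs on one line.
extract_pairs() {
    # grep -o emits one "var":"value" match per line; paste -s joins those lines.
    grep -o '"[^"]*":"[^"]*"' | paste -sd ' ' -
}

out=$(printf '%s\n' 'xxx "a":"b" yyy"x":"y"' | extract_pairs)
printf '%s\n' "$out"
```
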
Bash substring with pipes and stdin

My goal is to cut the output of a command down to an arbitrary number of characters (let's use 6). I would like to be able to append this command to the end of a pipeline, so it should be able to just use stdin.
echo "1234567890" | your command here
# desired output: 123456
I checked out awk, and I also noticed bash has substring expansion, but both of the solutions I've come up with seem longer than they need to be and I can't shake the feeling I'm missing something easier.
I'll post the two solutions I've found as answers, I welcome any critique as well as new solutions!
Solution found, thank you to all who answered!
It was close between jcollado and Mithrandir - I will probably end up using both in the future. Mithrandir's answer was an actual substring and is easier to view the result, but jcollado's answer lets me pipe it to the clipboard with no EOL character in the way.
Do you want something like this:
echo "1234567890" | cut -b 1-6
What about using head -c/--bytes?
$ echo t9p8uat4ep | head -c 6
t9p8ua
I had come up with:
echo "1234567890" | ( read -r h; echo "${h:0:6}" )
and
echo "1234567890" | awk '{print substr($0,1,6)}'
But both seemed like I was using a sledgehammer to hit a nail.
This might work for you:
printf "%.6s" 1234567890
123456
If your_command_here is cat:
% OUTPUT=t9p8uat4ep
% cat <<<${OUTPUT:0:6}
t9p8ua
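The end-of-line difference mentioned in the question is easy to check by counting the bytes each command emits. A small sketch with the sample input from above; the variable names are made up:

```bash
# Count the bytes each approach emits for the same 10-character line.
head_bytes=$(printf '1234567890\n' | head -c 6 | wc -c)
cut_bytes=$(printf '1234567890\n' | cut -b 1-6 | wc -c)

# $(( )) strips the padding some wc implementations add around the number.
echo "head -c 6 : $((head_bytes)) bytes"   # the six characters only
echo "cut -b 1-6: $((cut_bytes)) bytes"    # six characters plus the newline
```

So head -c suits piping to the clipboard, while cut keeps the output line-terminated, which is nicer to read in a terminal.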

BASH file attribute gymnastics: How do I easily get a file with full paths and privileges?

Dear Masters of The Command Line,
I have a directory tree for which I want to generate a file that contains two entries per line: the full path of each file and the corresponding privileges of that file.
For example, one line might contain:
/v1.6.0.24/lib/mylib.jar -r-xr-xr-x
The best way to generate the left hand column there appears to be find. However, because ls doesn't seem to have a capability to either read a list of filenames or take stdin, it looks like I have to resort to a script that does this for me. ...Cumbersome.
I was sure I've seen people somehow get find to run a command against each file found but I must be daft this morning as I can't seem to figure it out!
Anyone?
In terms of reading said file there might be spaces in filenames, so it sure would be nice if there was a way to get some of the existing command-line tools to count fields right to left. For example, we have cut. However, cut is left-hand-first and won't take a negative number to mean start the numbering on the right (as seems the most obvious syntax to me). ... Without having to write a program to do it, are there any easy ways?
Thanks in advance, and especial thanks for explaining any examples you may provide!
Thanks,
RT
GNU findutils 4.2.5+:
find "$PWD" -printf '%p %M\n'
It can also be done with ls and awk:
ls -l -d $PWD/* | awk '{print $9 " " $1}' > my_files.txt
stat -c %A file
Will print file permissions for file.
Something like:
find . -exec echo -ne '{}\t\t' ';' -exec stat -c %A {} ';'
Will give you a badly formatted version of what you're after.
It is made much trickier because you want everything aligned in tables. You might want to look into the 'column' command. TBH I would just relax my output requirements a little bit. Formatting output in SH is a pain in the ass.
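If alignment is not required, the two -exec calls can probably be folded into one, since stat can print the name and the permissions itself. A sketch assuming GNU stat (for -c) and a throwaway directory created just for the demonstration:

```bash
# Hypothetical scratch tree so the sketch is self-contained.
dir=$(mktemp -d)
touch "$dir/my lib.jar"
chmod 555 "$dir/my lib.jar"

# -exec ... {} + hands many paths to a single stat call.
# %n is the file name as passed in, %A the permissions in ls style.
listing=$(find "$dir" -type f -exec stat -c '%n %A' {} +)
printf '%s\n' "$listing"

# Clean up the scratch tree.
rm -rf "$dir"
```

Because find is started on an absolute path, the names it passes to stat are absolute too.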
With bash 4:
shopt -s globstar
for file in /path/**
do
    stat -c "%n %A" "$file"
done
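As for reading the file back when paths contain spaces: counting from the right can be done in the shell itself by splitting each line at its last space. A minimal sketch, assuming the permissions field never contains a space and no path contains a newline (split_last_field is a hypothetical name):

```bash
# Hypothetical helper: reads "path permissions" lines, prints "permissions<TAB>path".
split_last_field() {
    while IFS= read -r line; do
        perms=${line##* }   # remove the longest prefix ending in a space: the last field remains
        path=${line% *}     # remove the shortest suffix starting at a space: the path remains
        printf '%s\t%s\n' "$perms" "$path"
    done
}

out=$(printf '%s\n' '/v1.6.0.24/lib/my lib.jar -r-xr-xr-x' | split_last_field)
printf '%s\n' "$out"
```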
