Format words under each other - bash

So, I started learning bash last week and I have to do a task where I should print the content of this file.txt:
1|george|01/02/2042
2|TPS Reports|03/01/2015
3|Go clubbing this weekend|
4|Metting with family|03/08/2015
5|Help Rose with dating boys|
6|Update hacking software for hacking StackExchange|09/30/2015
I have written this code:
while IFS='' read -r line || [[ -n $line ]]; do
IFS='|' read -ra ADDR <<< "$line"
echo -e "${ADDR[0]}: ${ADDR[1]} \t\t\t ${ADDR[2]}"
done < "$HOME_DIRECTORY_FILE"
Basically, this code goes through the file line by line, splits each line into an array on the | delimiter, and prints each part of the array on screen. Output:
1: george 01/02/2042
2: TPS Reports 03/01/2015
3: Go clubbing this weekend
4: Metting with family 03/08/2015
5: Help Rose with dating boys
6: Update hacking software for hacking StackExchange 09/30/2015
You might think this is correct, but my instructor said the dates need to be under each other, like this:
1: george                                             01/02/2042
2: TPS Reports                                        03/01/2015
3: Go clubbing this weekend
4: Metting with family                                03/08/2015
5: Help Rose with dating boys
6: Update hacking software for hacking StackExchange  09/30/2015
Is that achievable in bash? Or should I let go of this? My instructor gave the exact output example, spaced this way, and said "the output should be neatly formatted (spacing)."

Use the column command. It does exactly what you're looking for.
For example:
$ cat input.txt
1|george|01/02/2042
2|TPS Reports|03/01/2015
3|Go clubbing this weekend|
4|Metting with family|03/08/2015
5|Help Rose with dating boys|
6|Update hacking software for hacking StackExchange|09/30/2015
$ column --separator \| --table input.txt
1  george                                             01/02/2042
2  TPS Reports                                        03/01/2015
3  Go clubbing this weekend
4  Metting with family                                03/08/2015
5  Help Rose with dating boys
6  Update hacking software for hacking StackExchange  09/30/2015
You'll need to do a little pre-formatting to get your numbers to have :, but that should be the easy part (you can pipe the modified file into column).

You can also use printf, although it requires you to guess at the width of the middle column (which column computes for you).
while IFS='' read -r line || [[ -n $line ]]; do
IFS='|' read -ra ADDR <<< "$line"
printf "%d: %-30s %s\n" "${ADDR[0]}" "${ADDR[1]}" "${ADDR[2]}"
done < "$HOME_DIRECTORY_FILE"
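If guessing the width is not acceptable, one option is to read the file twice: once to measure the longest description and once to print. The following is only a sketch of that idea, not the answerer's code. The function name format_tasks is made up for illustration, and it assumes the same id|task|date layout as file.txt.

```shell
#!/bin/bash
# format_tasks FILE: print "id: task  date" with all dates starting in the same column.
# (format_tasks is a hypothetical helper name; FILE is assumed to hold id|task|date lines.)
format_tasks() {
    local file=$1 line width=0
    local -a f

    # Pass 1: find the length of the longest task description.
    while IFS='' read -r line || [[ -n $line ]]; do
        IFS='|' read -ra f <<< "$line"
        if (( ${#f[1]} > width )); then
            width=${#f[1]}
        fi
    done < "$file"

    # Pass 2: pad every description to that width.
    # In the format string, "*" takes the field width from the next argument.
    while IFS='' read -r line || [[ -n $line ]]; do
        IFS='|' read -ra f <<< "$line"
        printf '%s: %-*s  %s\n' "${f[0]}" "$width" "${f[1]}" "${f[2]}"
    done < "$file"
}
```

It would be called as format_tasks "$HOME_DIRECTORY_FILE". Lines without a date end in padding spaces, which is usually invisible on a terminal.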

Related

Delete lines in file that have a date older than x

I can read an entire file into memory like so:
#!/bin/bash
filename='peptides.txt'
filelines=`cat $filename`
ten_days_ago="$(date)"
for line in $filelines ; do
date_of="$(echo "$line" | jq -r '.time')"
if [[ "$ten_days_ago" > "$date_of" ]]; then
# delete this line
fi
done
The problems are:
1. I may not want to read the whole file into memory.
2. If I stream it line by line with bash, how can I store which line to delete from? I would delete lines 0 to x, where line x has a date equal to 10 days ago.
A binary search would be appropriate here - so maybe bash is not a good solution to this? I would need to find the number of lines in the file, divide by two and go to that line.
You can use binary search only if the file is sorted.
You do not need to read the whole file into memory; you can process it line by line:
while IFS= read -r line
do
....
done < "$filename"
And yes, I personally would not use shell scripting for this kind of problem, but this is of course a matter of taste.
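As a rough sketch of the line-by-line approach: the loop can copy the lines worth keeping to a temporary file and then replace the original, so nothing has to be held in memory. The real input format was not shown, so this assumes, purely for illustration, that every line starts with an ISO-8601 date (YYYY-MM-DD) followed by a space. The function name keep_newer_than is made up.

```shell
#!/bin/bash
# keep_newer_than CUTOFF FILE: drop every line whose leading date is older than CUTOFF.
# Assumption (not from the question): each line looks like "YYYY-MM-DD rest of record".
keep_newer_than() {
    local cutoff=$1 file=$2 tmp line stamp
    tmp=$(mktemp) || return 1
    while IFS= read -r line || [[ -n $line ]]; do
        stamp=${line%% *}    # text before the first space
        # ISO dates sort lexicographically, so a string comparison is enough.
        # Keep the line when its date is on or after the cutoff.
        if [[ ! $stamp < $cutoff ]]; then
            printf '%s\n' "$line"
        fi
    done < "$file" > "$tmp"
    mv "$tmp" "$file"
}
```

With GNU date, the cutoff could be computed as keep_newer_than "$(date -d '10 days ago' +%F)" peptides.txt. For JSON lines, the stamp would have to be extracted with jq instead of the parameter expansion.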
You didn't show what the input file looks like, but judging by your jq call it's JSON data.
With that said, this is how I would do it:
today=$(date +%j)
tenDaysAgo=$(date --date="10 day ago" +%j)
#This is where you would create the data for peptides.txt
#20 spaces away there is a date stamp so it doesn't distract you
echo "Peptides stuff $today" >> peptides.txt
while read pepStuff; do
if [ "$pepStuff" = "$tenDaysAgo" ]; then
sed -i "/.*$pepStuff/d" peptides.txt
fi
done < <(awk '{print $3}' peptides.txt)

How to loop a variable range in cut command

I have a file with 2 columns, and I want to use the values from the second column to set the range in the cut command, to select a range of characters from another file. The range I want is the character at the position given by the value in the second column plus the next 10 characters. I will give an example below.
My files are something like that:
File with 2 columns and no blank lines between lines (file1.txt):
NAME1 10
NAME2 25
NAME3 48
NAME4 66
File from which I want to extract the variable ranges of characters (just one very long line with no spaces and no bold font) (file2.txt):
GATCGAGCGGGATTCTTTTTTTTTAGGCGAGTCAGCTAGCATCAGCTACGAGAGGCGAGGGCGGGCTATCACGACTACGACTACGACTACAGCATCAGCATCAGCGCACTAGAGCGAGGCTAGCTAGCTACGACTACGATCAGCATCGCACATCGACTACGATCAGCATCAGCTACGCATCGAAGAGAGAGC
...or, more literally (for copy/paste to test):
GATCGAGCGGGATTCTTTTTTTTTAGGCGAGTCAGCTAGCATCAGCTACGAGAGGCGAGGGCGGGCTATCACGACTACGACTACGACTACAGCATCAGCATCAGCGCACTAGAGCGAGGCTAGCTAGCTACGACTACGATCAGCATCGCACATCGACTACGATCAGCATCAGCTACGCATCGAAGAGAGAGC
Desired resulting file, one sequence per line (result.txt):
GATTCTTTTT
GGCGAGTCAG
CGAGAGGCGA
TATCACGACT
The resulting file would have the characters from 10-20, 25-35, 48-58 and 66-76, each range in a new line. So, it would always keep the range of 10, but in different start points and those start points are set by the values in the second column from the first file.
I tried the command:
for i in $(awk '{print $2}' file1.txt);
do
p1=$i;
p2=`expr "$1" + 10`
cut -c$p1-$2 file2.txt > result.txt;
done
I don't get any output or error message.
I also tried:
while read line; do
set $line
p2=`expr "$2" + 10`
cut -c$2-$p2 file2.txt > result.txt;
done <file1.txt
This last command gives me an error message:
cut: invalid range with no endpoint: -
Try 'cut --help' for more information.
expr: non-integer argument
There's no need for cut here; dd can do the job of indexing into a file, and reading only the number of bytes you want. (Note that status=none is a GNUism; you may need to leave it out on other platforms and redirect stderr otherwise if you want to suppress informational logging).
while read -r name index _; do
dd if=file2.txt bs=1 skip="$index" count=10 status=none
printf '\n'
done <file1.txt >result.txt
This approach avoids excessive memory requirements (as present when reading the whole of file2 -- assuming it's large), and has bounded performance requirements (overhead is equal to starting one copy of dd per sequence to extract).
Using awk
$ awk 'FNR==NR{a=$0; next} {print substr(a,$2+1,10)}' file2.txt file1.txt
GATTCTTTTT
GGCGAGTCAG
CGAGAGGCGA
TATCACGACT
If file2.txt is not too large, then you can read it in memory,
and use Bash sub-strings to extract the desired ranges:
data=$(<file2.txt)
while read -r name index _; do
echo "${data:$index:10}"
done <file1.txt >result.txt
This will be much more efficient than running cut or another process for every single range definition.
(Thanks to @CharlesDuffy for the tip to read data without a useless cat, and the while loop.)
One way to solve it:
#!/bin/bash
while read line; do
pos=$(echo "$line" | cut -f2 -d' ')
x=$(head -c $(( $pos + 10 )) file2.txt | tail -c 10)
echo "$x"
done < file1.txt > result.txt
It's not the solution an experienced bash hacker would use, but it is very good for someone who is new to bash. It uses tools that are very versatile, although somewhat slow if you need high performance. Shell scripting is commonly used by people who rarely write shell scripts but know a few commands and just want to get the job done. That's why I'm including this solution, even if the other answers are superior for more experienced people.
The first line in the loop is pretty easy. It just extracts the number from the current line of file1.txt. The second line uses the very nice tools head and tail. Usually, they are used with lines instead of characters. Here, head prints the first pos + 10 characters, and the result is piped into tail, which prints the last 10 of them.
Thanks to @CharlesDuffy for improvements.
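For comparison, the original cut-based loop can also be made to work. Two things go wrong in the first attempt: "$1" and $2 are the script's positional parameters rather than $i and $p2, and the > result.txt inside the loop truncates the file on every iteration. The sketch below fixes both. The function name extract_ranges is made up, and the +1 reflects that cut counts characters from 1 while the expected output treats the number as an offset.

```shell
#!/bin/bash
# extract_ranges INDEX_FILE SEQUENCE_FILE: for each "NAME OFFSET" line in INDEX_FILE,
# print the 10 characters of SEQUENCE_FILE that follow OFFSET.
# (extract_ranges is a hypothetical helper name.)
extract_ranges() {
    local index_file=$1 sequence_file=$2 name index
    while read -r name index _; do
        # cut numbers characters from 1, so offset N means columns N+1 .. N+10.
        cut -c"$((index + 1))-$((index + 10))" "$sequence_file"
    done < "$index_file"
}
```

Redirecting once, outside the loop, keeps all ranges: extract_ranges file1.txt file2.txt > result.txt. This still starts one cut process per range, so the in-memory approaches above remain faster for long index files.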

Create newline character when third column in file is not unique

I have a tab-separated file with the following format:
January Jay RESERVED 4
February Jay RESERVED 5
March Jay SUBMITTED 6
December Jay USED 7
What I would like to do is create spaces, or new lines between the lines where the third column is unique.
For this example, I would like this output:
January Jay RESERVED 4
February Jay RESERVED 5

March Jay SUBMITTED 6

December Jay USED 7
If your data is in a file called stuff:
lastVal="";cat stuff |while read i ; do thisVal=$(echo "$i" |cut -d$'\t' -f'3'); if [ "$lastVal" != "$thisVal" ]; then echo "" ;lastVal=$thisVal; fi ;echo "$i" ;done
Here's a version of the same command that you can use as a script. See usage below.
#!/bin/bash
# Reads tab-separated lines on stdin and prints a blank line whenever column 3 changes.
lastVal=""
while IFS= read -r i; do
    thisVal=$(echo "$i" | cut -d$'\t' -f3)
    if [ "$lastVal" != "$thisVal" ]; then
        echo ""
        lastVal=$thisVal
    fi
    echo "$i"
done
If you name the script myScript.bash, you can use it one of these two ways:
cat yourfile | /path/to/myScript.bash
or
/path/to/myScript.bash < yourfile
Note that to type a literal tab at the Bash prompt, you can press Ctrl+V and then Tab; Ctrl+V lets you insert other special characters too. So to use a tab as the delimiter in the cut -d part, press Ctrl+V and then Tab (that's in a Linux terminal, not on SO).
Awk can do this quite handily:
awk -F $'\t' '{print (v==$3 ? $0 : "\n"$0); v=$3}' foo.txt
awk is designed to work with whitespace-separated columns of data, so the third column is represented by $3. All we do is check if the value has changed, and print an extra line.
This doesn't check for "unique" values, but only a change in the value from the previous line. From what I can tell, that's the same thing as the answer you accepted.
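The same change-detection idea can be sketched in pure bash, without starting cut for every line. This is only an illustration: the function name group_by_third_column is made up, and unlike the commands above it skips the blank line before the very first record. One assumption to be aware of: because tab is whitespace in IFS, empty fields (two tabs in a row) would be collapsed by read.

```shell
#!/bin/bash
# group_by_third_column: copy stdin to stdout, inserting a blank line
# whenever the third tab-separated field differs from the previous line's.
# (Hypothetical helper name; assumes no empty fields before column 3.)
group_by_third_column() {
    local line third prev='' first=1
    while IFS= read -r line || [[ -n $line ]]; do
        # Split a copy of the line on tabs; only the third field is kept.
        IFS=$'\t' read -r _ _ third _ <<< "$line"
        if (( ! first )) && [[ $third != "$prev" ]]; then
            echo
        fi
        printf '%s\n' "$line"
        prev=$third
        first=0
    done
}
```

Usage would be group_by_third_column < yourfile.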

How to assign line number to a variable in a while loop

I have a file that contains some lines. I want to read the lines and get their line numbers, as below:
while read line
do
string=$line
number=`awk '{print NR}'` # This way is not right, gets all the line numbers.
done
Here is my scenario: I have one file that contains some lines, such as:
2015Y7M3D0H0Mi44S7941
2015Y7M3D22H24Mi3S7927
2015Y7M3D21H28Mi21S5001
I want to read each line of this file and print its line number followed by the trailing part that starts with "S". It should look like:
1 S7941
2 S7927
3 S5001
So, what should I properly do to get this?
Thanks.
Can anyone help me out ???
The UNIX shell is simply an environment from which to call tools and a language to sequence those calls. The UNIX general purpose text processing tool is awk so just use it:
$ awk '{sub(/.*S/,NR" S")}1' file
1 S7941
2 S7927
3 S5001
If you're going to be doing any text manipulation, get the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
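I just asked a friend and found a simple way:
cat -n "$file" | while read line
do
# $line is deliberately unquoted: word splitting turns the tab after the line number into a space
number=$(echo $line | cut -d " " -f 1)
echo $number
done
In other words, if we cannot get the line number from the file itself, we pass the file through cat -n so that each line arrives with its number.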
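If the loop has to stay in bash, a plain counter incremented on each iteration gives the line number without cat -n or awk. A minimal sketch, with a made-up function name (number_suffixes) and assuming every line contains at least one "S":

```shell
#!/bin/bash
# number_suffixes: read lines on stdin and print "<line number> S<text after the last S>".
# (Hypothetical helper name, for illustration only.)
number_suffixes() {
    local line number=0
    while IFS= read -r line || [[ -n $line ]]; do
        number=$((number + 1))
        # ${line##*S} removes the longest prefix ending in "S",
        # leaving only what follows the last "S" on the line.
        printf '%d S%s\n' "$number" "${line##*S}"
    done
}
```

It would be run as number_suffixes < file. Inside the loop, $number and $line are both available as ordinary variables.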

Combining Directory of Text Files into CSV, One File Per Line

I have a large directory of text files, each rather complicated:
Say file1.txt:
Am no an listening depending up believing. Enough around remove to
barton agreed regret in or it. Advantage mr estimable be commanded
provision. Year well shot deny shew come now had. Shall downs stand
marry taken his for out. Do related mr account brandon an up. Wrong
for never ready ham these witty him. Our compass see age uncivil
matters weather forbade her minutes. Ready how but truth son new
under.
Am increasing at contrasted in favourable he considered astonished. As
if made held in an shot. By it enough to valley desire do. Mrs chief
great maids these which are ham match she. Abode to tried do thing
maids. Doubtful disposed returned rejoiced to dashwood is so up.
And file2.txt:
Among going manor who did. Do ye is celebrated it sympathize
considered. May ecstatic did surprise elegance the ignorant age. Own
her miss cold last. It so numerous if he outlived disposal. How but
sons mrs lady when. Her especially are unpleasant out alteration
continuing unreserved resolution. Hence hopes noisy may china fully
and. Am it regard stairs branch thirty length afford.
Blind would equal while oh mr do style. Lain led and fact none. One
preferred sportsmen resolving the happiness continued. High at of in
loud rich true. Oh conveying do immediate acuteness in he. Equally
welcome her set nothing has gravity whether parties. Fertile suppose
shyness mr up pointed in staying on respect.
What I need to do is to create a new file, say allfiles.txt that is:
Am no an listening depending up believing. Enough around remove to barton agreed regret in or it. Advantage mr estimable be commanded provision. Year well shot deny shew come now had. Shall downs stand marry taken his for out. Do related mr account brandon an up. Wrong for never ready ham these witty him. Our compass see age uncivil matters weather forbade her minutes. Ready how but truth son new under. Am increasing at contrasted in favourable he considered astonished. As if made held in an shot. By it enough to valley desire do. Mrs chief great maids these which are ham match she. Abode to tried do thing maids. Doubtful disposed returned rejoiced to dashwood is so up.
Among going manor who did. Do ye is celebrated it sympathize considered. May ecstatic did surprise elegance the ignorant age. Own her miss cold last. It so numerous if he outlived disposal. How but sons mrs lady when. Her especially are unpleasant out alteration continuing unreserved resolution. Hence hopes noisy may china fully and. Am it regard stairs branch thirty length afford. Blind would equal while oh mr do style. Lain led and fact none. One preferred sportsmen resolving the happiness continued. High at of in loud rich true. Oh conveying do immediate acuteness in he. Equally welcome her set nothing has gravity whether parties. Fertile suppose shyness mr up pointed in staying on respect.
This file is just two lines in this case, the full text on each. I have searched the archives but cannot seem to find an implementation for this in bash.
touch allfiles.txt # create allfiles.txt
for f in *.txt; do # for each .txt file in the current directory
    [ "$f" = allfiles.txt ] && continue # skip the output file, which also matches *.txt
    tr '\n' ' ' < "$f" >> allfiles.txt # append the content of that file, with newlines turned into spaces
    echo >> allfiles.txt # insert a new line
done
for file in dir/* #Process all files in directory
do
tr '\n' ' ' < "$file" # Remove newlines
echo '' # Add newline between files
done > newfile # Write all the output of the loop to the newfile
Here's a pure INTERCAL implementation, no bash, tr, or cat required:
PLEASE DO ,1 <- #1
DO .4 <- #0
DO .5 <- #0
DO COME FROM (30)
PLEASE ABSTAIN FROM (40)
DO WRITE IN ,1
DO .1 <- ,1SUB#1
DO (10) NEXT
PLEASE GIVE UP
(20) PLEASE RESUME '?.1$#256'~'#256$#256'
(10) DO (20) NEXT
DO FORGET #1
PLEASE DO .2 <- .4
DO (1000) NEXT
DO .4 <- .3~#255
PLEASE DO .3 <- !3~#15'$!3~#240'
DO .3 <- !3~#15'$!3~#240'
DO .2 <- !3~#15'$!3~#240'
PLEASE DO .1 <- .5
DO (1010) NEXT
DO .5 <- .2
DO ,1SUB#1 <- .3
(30) PLEASE READ OUT ,1
PLEASE NOTE: having had pressing business at the local pub
(40) the author got bored with this implementation
With awk
awk 'FILENAME!=f&&NR>1{print "\n"}{f=FILENAME}1' ORS='' file1.txt file2.txt > allfiles.txt
Combined Perl/bash solution:
for f in *.txt; do
perl -ne 'chomp; print "$_ "; END{ print "\n" }' "$f"
done > output.txt
Perl-only solution
#!/usr/bin/env perl
use strict;
use warnings;
foreach my $file (<*.txt>) {
open FILE, "<$file" or die $!;
while (<FILE>) {
chomp;
print "$_ ";
}
close FILE;
print "\n";
}
Here's a pure bash solution: no cat, tr, awk, etc.
Besides, the output is nicely formatted: you won't get double spaces, or leading or trailing spaces, as with the methods provided in the other answers.
for f in *.txt; do
# There are purposely no quotes for $(<"$f")
echo $(<"$f")
echo
done > newfile
One caveat is if a file starts with -e, -E or -n: these characters won't be output, because echo will treat them as options. But I guess this is very unlikely to happen! Another is that, because the expansion is unquoted, words containing glob characters such as * or ? are subject to pathname expansion.
The trick is to use echo $l with no quotes!
Using this trick, here's how you can use cat in a funny way to achieve what you want (this time it's not a pure bash solution). Again, it relies on a deliberate lack of quotes:
for f in *.txt; do
# There are purposely no quotes for $(<"$f")
cat <<< $(<"$f")
echo
done > newfile
If you only have two files, say file1.txt and file2.txt, you can do it without a loop, with a single cat command:
# there's purposely a lack of quotes
cat <<< $(<file1.txt)$'\n\n'$(<file2.txt) > newfile
or with a single echo (and same caveat as above), and pure bash:
# there's purposely a lack of quotes
echo $(<file1.txt)$'\n\n'$(<file2.txt) > newfile
Note. I added comments to specify that there are no quotes as every bash programmer should feel uncomfortable when reading these unquoted parts!
Note2. Can you do shorter?
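For readers uneasy about the unquoted expansions, a variant that stays in pure bash is to let read split the file into an array of words and join them again. This is a sketch under my own naming (join_file and combine_txt are made up), not the answer's method. It avoids both the echo option problem and pathname expansion.

```shell
#!/bin/bash
# join_file FILE: print FILE's words on one line, separated by single spaces.
# (join_file and combine_txt are hypothetical helper names.)
join_file() {
    local -a words
    # read splits on spaces, tabs and newlines but never expands globs.
    # -d '' makes it read to end of file, so it returns non-zero; ignore that.
    read -r -d '' -a words < "$1" || true
    # "${words[*]}" joins the elements with the first character of IFS (a space).
    printf '%s\n' "${words[*]}"
}

# combine_txt DIR: print one joined line per .txt file found in DIR.
combine_txt() {
    local dir=$1 f
    for f in "$dir"/*.txt; do
        [[ -f $f ]] || continue    # skip the literal pattern when nothing matches
        join_file "$f"
    done
}
```

For the current directory this would be combine_txt . > newfile. Since newfile has no .txt suffix, it is not picked up by the loop.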
This might work for you:
for file in *.txt; do paste -sd' ' "$file"; done | sed 's/^ *//;s/  */ /g'
awk '
FNR == 1 && FILENAME != ARGV[1] {print "\n"}
{printf "%s",$0}
END {print ""}
' *.txt > allfiles.txt
