shell: prefixing output with spaces with paste - shell

Often one needs to prefix 4 spaces to some shell output to turn it into valid Markdown code, e.g. when posting a question or answer here on Stack Overflow.
It's actually quite easy to do with sed:
some_command | sed -e 's/^/    /'
But I'd like to do it with paste if possible. Because paste takes 2 files as input, all I came up with was this:
some_command | paste 4_space_file -
where 4_space_file is a file whose whole content is 4 spaces.
Is there a neater way to achieve this with paste without having an actual file on the hard drive?

Literal Answers Using Paste
First, to answer your literal question:
some_command | paste <(printf '    \n') -
...yields the same output as passing paste the name of a file with a single line having four spaces and a newline as its content. However, the output from paste in this case is not four-character indents for each line; the first line has four spaces and a tab prepended, subsequent lines are prefixed with only a tab.
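A small sketch reproducing that behavior, assuming bash (for process substitution). demo_input is a hypothetical stand-in for some_command. Piping through sed -n l makes the TAB visible, because that command prints each line with tabs shown as \t and a $ marking the end of the line:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for some_command: two lines of output.
demo_input() { printf '%s\n' alpha beta; }

# paste joins line N of each input with its default delimiter, a TAB.
# The first input has only one line, so from line 2 on it contributes an
# empty string, but the TAB is still emitted.
demo_input | paste <(printf '    \n') - | sed -n l
```

The first line comes out as four spaces, a TAB and alpha; the second as a TAB and beta.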
If you wanted to generate an input of the appropriate length while still using paste, then you'd end up with something uglier. Say (with bash 4.0 or newer):
ls | {
mapfile -t lines # read output from ls into an array
# our answer, here, is to move to three spaces in the input, and use paste -d' ' to
# ...add a fourth space during processing.
paste -d' ' \
<(yes '   ' | head -n "${#lines[@]}") \
<(printf '%s\n' "${lines[@]}")
}
<() is process substitution syntax: it expands to a filename that, when read, yields the output of the enclosed commands.
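If you want to stay with paste but avoid the three-space trick, POSIX paste also accepts \0 in the -d list to mean "empty string", i.e. no delimiter at all. A minimal sketch, assuming bash; indent_with_paste is a hypothetical name, and because the input is buffered with $(cat), trailing blank lines are lost, so treat this as an illustration rather than a drop-in tool:

```shell
#!/usr/bin/env bash
# indent_with_paste: hypothetical helper that prefixes each line of stdin
# with four spaces using paste.
# Usage: some_command | indent_with_paste
indent_with_paste() {
  local input n
  input=$(cat)                          # buffer stdin (trailing newlines are lost)
  n=$(printf '%s\n' "$input" | wc -l)   # how many prefix lines to generate
  n=$((n))                              # strip any padding wc adds
  # '\0' in the delimiter list means "empty string", so nothing is inserted
  # between the four spaces and the original line.
  paste -d '\0' <(yes '    ' | head -n "$n") <(printf '%s\n' "$input")
}
```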
Better Answers
For a native bash approach, you might also consider defining a function:
indent4() { while IFS= read -r line; do printf '    %s\n' "$line"; done; }
...for later use:
some_command | indent4
Unlike paste, this actually inserts exactly four spaces (with no intervening tab) on every line, for the exact number of lines in your input (no need to synthesize the correct length).
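The same idea generalizes to an arbitrary indent width. A sketch, assuming bash 3.1 or newer for printf -v; the name indent and its optional width argument are hypothetical. The || [[ -n $line ]] clause also keeps a final line that lacks a trailing newline, which a plain while read loop would silently drop:

```shell
#!/usr/bin/env bash
# indent [WIDTH]: prefix every line of stdin with WIDTH spaces (default 4).
# Usage: some_command | indent 8
indent() {
  local width=${1:-4} pad line
  printf -v pad '%*s' "$width" ''          # pad = WIDTH spaces
  while IFS= read -r line || [[ -n $line ]]; do
    printf '%s%s\n' "$pad" "$line"
  done
}
```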
Also consider awk:
awk '{ print "    " $0; }'

Related

read values of txt file from bash [duplicate]

This question already has answers here:
How to grep for contents after pattern?
I'm trying to read values from a text file.
I have test1.txt which looks like
sub1 1 2 3
sub8 4 5 6
I want to obtain values '1 2 3' when I specify 'sub1'.
The closest I get is:
subj="sub1"
grep "$subj" test1.txt
But the answer is:
sub8 4 5 6
I've read that grep can also give you the line after the match, so I've tried changing the text file to the following:
test2.txt looks like:
sub1
1 2 3
sub8
4 5 6
However, when I type
grep "$subj" test2.txt
The answer is:
sub1
It should be something super simple, but I've tried awk, sed, grep, egrep and cat, and none of them works. I've also read some somewhat related posts, but none was really helpful.
Awk works: awk '$1 == "'"$subj"'" { print $2, $3, $4 }' test1.txt
The command outputs fields two, three, and four for all lines in test1.txt where the first field is $subj (i.e.: the contents of the variable named subj).
With your original text file format:
target=sub1
while IFS=$' \t\n' read -r key values; do
if [[ $key = "$target" ]]; then
echo "Found values: $values"
fi
done <test1.txt
This requires no external tools, using only functionality built into bash itself. See BashFAQ #1.
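Wrapped as a function, the loop can also stop at the first match and report failure through its exit status. A sketch; the name lookup_values is hypothetical, and it reads standard input, so you would call it as lookup_values sub1 <test1.txt:

```shell
#!/usr/bin/env bash
# lookup_values KEY: read "key values..." lines from stdin and print the
# values of the first line whose first field is exactly KEY.
# Returns 1 if no line matches.
lookup_values() {
  local target=$1 key values
  while IFS=$' \t\n' read -r key values; do
    if [[ $key = "$target" ]]; then
      printf '%s\n' "$values"
      return 0
    fi
  done
  return 1   # no line matched
}
```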
As has come up during debugging in comments, if you have a traditional Apple-format text file (CR newlines only), then you might want something more like:
target=sub1
while IFS=$' \t\n' read -r -d $'\r' key values || [[ $key ]]; do
if [[ $key = "$target" ]]; then
echo "Found values: $values"
fi
done <test1.txt
Alternatively, using awk (for a standard UNIX text file):
target="sub1"
awk -v target="$target" '$1 == target { $1 = ""; print; }' <test1.txt
...or, for a file with CR-only newlines:
target="sub1"
tr '\r' '\n' <test1.txt | awk -v target="$target" '$1 == target { $1 = ""; print; }'
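The tr approach above targets CR-only files. If the file instead has Windows-style CRLF line endings (an assumption; the question doesn't say), awk can strip the trailing CR itself. This sketch also removes the first field with sub() rather than $1 = "", so the remaining values keep their original spacing and the output has no leading space. print_values is a hypothetical name:

```shell
#!/usr/bin/env bash
# print_values TARGET FILE: print everything after the first field on lines
# whose first field equals TARGET. A trailing CR (CRLF endings) is dropped
# first; this is an assumption about the input, not part of the question.
print_values() {
  awk -v target="$1" '
    { sub(/\r$/, "") }                    # drop a trailing CR; fields are re-split
    $1 == target {
      sub(/^[ \t]*[^ \t]+[ \t]*/, "")     # remove first field and following blanks
      print
    }' "$2"
}
```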
This version will be slower if the text file being read is small (since awk, like any other external tool, takes time to start up); but faster if it's large (since awk's operation is much faster than that of bash's built-ins once it's done starting up).
grep "sub1" test1.txt | cut -c6-
or
grep -A 1 "sub1" test2.txt | tail -n 1
You're doing it right, but it seems like test1.txt has a wrong value in it.
With grep foo you get all lines containing foo. Use grep -m1 foo to get only the first line containing foo.
Then you can use cut -d" " -f2- to get all the values after foo, separated by spaces.
In the end the command would look like this:
$ subj="sub1"
$ grep -m1 "$subj" test1.txt | cut -d" " -f2-
But this doesn't explain why you could not find sub1 in the first place.
Did you read the proper file?
There's a bunch of ways to do this (and shorter/more efficient answers than what I'm giving you), but I'm assuming you're a beginner at bash, and therefore I'll give you something that's easy to understand:
egrep "^$subj\>" file.txt | sed "s/^\S*\>\s*//"
or
egrep "^$subj\>" file.txt | sed "s/^[^[:blank:]]*\>[[:blank:]]*//"
The first part, egrep, searches for your subject at the beginning of each line in file.txt (that's what the ^ symbol does in the grep pattern). It also looks for a whole word: \> matches an end-of-word boundary, so sub1 doesn't match sub12 in the file. egrep is equivalent to grep -E; in GNU grep, \> is an extension that works in both basic and extended regular expressions. Once it has found the lines, egrep passes its output to sed, which strips the first word and the whitespace after it from each line. Again, the ^ in the sed command specifies that it should only match at the beginning of the line. \S* matches as many non-whitespace characters as it can, and \s* then gobbles up as much whitespace as it can. sed replaces everything it matched with nothing, leaving the rest of the line behind.
BTW, there's a help page on Stack Overflow that tells you how to format your questions (I'm guessing that was the reason you got a downvote).
-------------- EDIT ---------
As pointed out, if you are on a Mac or a similar system, use bracket expressions instead of the GNU escapes in your sed expression: [^[:blank:]] instead of \S, and [[:blank:]] instead of \s, as in the second version above (these are portable to all platforms).
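If you need this to work with strictly POSIX grep and sed (an assumption about the target platform), the GNU extensions \> and \S/\s can all be replaced. In this sketch the group ([[:blank:]]|$) plays the role of \>, so sub1 still doesn't match sub12. The subject is used as a regular expression, so it should not contain regex metacharacters; values_for is a hypothetical name:

```shell
#!/usr/bin/env bash
# values_for SUBJ FILE: print what follows SUBJ on lines that start with the
# whole word SUBJ, using only POSIX grep -E and sed bracket expressions.
values_for() {
  grep -E "^$1([[:blank:]]|\$)" "$2" |
    sed 's/^[^[:blank:]]*[[:blank:]]*//'    # drop first word and following blanks
}
```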
awk '/sub1/{ print $2,$3,$4 }' file
1 2 3
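What happens? For every line matching the regexp /sub1/, fields two, three and four are printed.
Any drawbacks? The original spacing isn't preserved: awk joins the printed fields with its output separator, a single space by default. Also, the unanchored /sub1/ would match a line starting with sub12 as well.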
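A quick demonstration of how the spacing is affected, using a made-up input line with a double space after sub1 and a TAB between 1 and 2:

```shell
#!/usr/bin/env bash
# Made-up input: "sub1", two spaces, "1", a TAB, "2", a space, "3".
# awk splits on any run of blanks and re-joins fields 2-4 with single spaces.
printf 'sub1  1\t2 3\n' | awk '/sub1/{ print $2,$3,$4 }'
```

This prints 1 2 3 with single spaces, regardless of the separators in the input.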
Sed also works: sed -n -e 's/^'"$subj"' *//p' file1.txt
It outputs all lines matching $subj at the beginning of a line after having removed the matching word and the spaces following. If TABs are used the spaces should be replaced by something like [[:space:]].

how to read one line to calculate the md5

I am using Linux bash version 4.1.2
I have a tab-delimited input_file having 5 fields and I want to calculate the MD5 for each line and put the md5sum at the end of each line.
The expected output_file should therefore have 6 fields on each line.
Here is my coding:
cat input_file | while read ONELINE
do
THEMD5=`echo "$ONELINE" | md5sum | awk '{print $1}'`
echo -e "${ONELINE}\t${THEMD5}"
done > output_file
The coding works well most of the time.
However, if ONELINE is ended with single/double tabs, the trailing tab(s) will disappear!
As a result, the output_file will sometimes contain lines of 4 or 5 fields, due to the missing tab(s).
I have tried adding IFS= or IFS='' or IFS=$'\n' or IFS=$'\012' to the while statement, but I still cannot solve the problem.
Please help.
Alvin SIU
The following is quite certainly correct, if you want trailing newlines included in your md5sums (as your original code has):
while IFS= read -r line; do
read sum _ < <(printf '%s\n' "$line" | md5sum -)
printf '%s\t%s\n' "$line" "$sum"
done <input_file
Notes:
Leading and trailing whitespace characters that appear in IFS (by default space, tab and newline) are stripped by read; setting IFS= is sufficient to prevent this effect.
Without the -r argument, read also treats backslashes as escape characters and removes them.
Using echo -e is dangerous: It interprets escape sequences inside your line, rather than emitting them as literals.
Using all-uppercase variable names is bad form. See the relevant spec (particularly the fourth paragraph), keeping in mind that shell variables and environment variables share a namespace.
Using echo in general is bad form when dealing with uncontrolled data (particularly including data which can contain backslash literals). See the relevant POSIX spec, particularly the APPLICATION USAGE and RATIONALE sections.
If you want to print the lines in a way that makes hidden characters visible, consider using '%q\t%s\n' instead of '%s\t%s\n' as a format string.
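To check that trailing tabs survive, the loop above can be wrapped in a function and fed a line whose last field is empty. A sketch, assuming GNU coreutils md5sum is available; append_md5 is a hypothetical name:

```shell
#!/usr/bin/env bash
# append_md5: for each line of stdin, print the line, a TAB, and the MD5 of
# the line including its trailing newline (as the original code does).
# Usage: append_md5 <input_file >output_file
append_md5() {
  local line sum rest
  while IFS= read -r line; do
    # md5sum prints "<hash>  -"; keep only the hash.
    read -r sum rest < <(printf '%s\n' "$line" | md5sum -)
    printf '%s\t%s\n' "$line" "$sum"
  done
}
```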

printing first word in every line of a txt file unix bash

So I'm trying to print the first word in each line of a txt file. The words are separated by one blank.
cut -c 1 txt file
That's the code I have so far, but it only prints the first character of each line.
Thanks
To print a whole word, you want -f 1, not -c 1. And since the default field delimiter is TAB rather than SPACE, you need to use the -d option.
cut -d' ' -f1 filename
Printing the last two words is not possible with cut, AFAIK, because it can only count fields from the beginning of the line. Use awk instead:
awk '{print $(NF-1), $NF;}' filename
you can try
awk '{print $1}' your_file
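One difference between the cut and awk answers shows up when a line starts with a blank or words are separated by more than one: cut -d' ' treats every single space as a delimiter, so field 1 can be empty, while awk's default field splitting skips leading blanks and treats runs of blanks as one separator. With a made-up input line:

```shell
#!/usr/bin/env bash
line='  foo bar'    # made-up input with two leading spaces
printf '%s\n' "$line" | cut -d' ' -f1        # field 1 is empty: prints a blank line
printf '%s\n' "$line" | awk '{ print $1 }'   # prints foo
```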
read word _ < file
echo "$word"
What's nice about this solution is it doesn't read beyond the first line of the file. Even awk, which has some very clean, terse syntax, has to be explicitly told to stop reading past the first line. read just reads one line at a time. Plus it's a bash builtin (and a builtin in many shells), so you don't need a new process to run.
If you want to print the first word in each line:
while read word _; do printf '%s\n' "$word"; done < file
But if the file is large then awk or cut will win out for reading every line.
You can use:
cut -d\  -f1 file
Where:
-d is the delimiter (here using \ for a space)
-f is the field selector
Notice that there is a space after the \.
-c is for characters, you want -f for fields, and -d to indicate your separator of space instead of the default tab:
cut -d " " -f 1 file

How can I read words (instead of lines) from a file?

I've read this question about how to read n characters from a text file using bash. I would like to know how to read a word at a time from a file that looks like:
example text
example1 text1
example2 text2
example3 text3
Can anyone explain that to me, or show me an easy example?
Thanks!
The read command by default reads whole lines. So the solution is probably to read the whole line and then split it on whitespace with e.g. for:
#!/bin/sh
while read line; do
for word in $line; do
echo "word = '$word'"
done
done <"myfile.txt"
The way to do this with standard input is by passing the -a flag to read:
read -a words
echo "${words[@]}"
This will read your entire line into an indexed array variable, in this case named words. You can then perform any array operations you like on words with shell parameter expansions.
For file-oriented operations, current versions of Bash also support the mapfile built-in. For example:
mapfile < /etc/passwd
echo ${MAPFILE[0]}
Either way, arrays are the way to go. It's worth your time to familiarize yourself with Bash array syntax to make the most of this feature.
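A sketch of the array approach applied to a whole file, assuming bash: read -r -a splits each line on IFS into an array without pathname expansion, so a word such as * is printed literally. print_words is a hypothetical name, and as written a last line without a trailing newline is skipped:

```shell
#!/usr/bin/env bash
# print_words FILE: print every whitespace-separated word of FILE on its own line.
print_words() {
  local -a words
  local w
  while read -r -a words; do
    for w in "${words[@]}"; do
      printf '%s\n' "$w"
    done
  done < "$1"
}
```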
Ordinarily, you should read from a file using a while read -r line loop. To do this and parse the words on the lines requires nesting a for loop inside the while loop.
Here is a technique that works without requiring nested loops:
for word in $(<inputfile)
do
echo "$word"
done
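Because $(<inputfile) is unquoted, each word also undergoes pathname expansion, so a word like * would be replaced by the names of files in the current directory. One way to guard against that, sketched here with a hypothetical function name, is to disable globbing with set -f inside a subshell so the setting doesn't leak into the caller:

```shell
#!/usr/bin/env bash
# words_of FILE: print each whitespace-separated word of FILE on its own line.
# The ( ... ) body runs in a subshell, so set -f only affects this function.
words_of() (
  set -f                      # disable pathname expansion
  for word in $(<"$1"); do
    printf '%s\n' "$word"
  done
)
```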
In the context given, where the number of words is known:
while read -r word1 word2 _; do
echo "Read a line with word1 of $word1 and word2 of $word2"
done
If you want to read each line into an array, read -a will put the first word into element 0 of your array, the second into element 1, etc:
while read -r -a words; do
echo "First word is ${words[0]}; second word is ${words[1]}"
declare -p words # print the whole array
done
In bash, just use space as delimiter (read -d ' '). This method requires some preprocessing to translate newlines into spaces (using tr) and to merge several spaces into a single one (using sed):
{
tr '\n' ' ' | sed 's/  */ /g' | while read -d ' ' WORD
do
echo -n "<${WORD}> "
done
echo
} << EOF
Here you have some words, including * wildcards
that don't get expanded,
multiple spaces between words,
and lines with spaces at the beginning.
EOF
The main advantage of this method is that you don't need to worry about the array syntax and just work as with a for loop, but without wildcard expansion.
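If the goal is just one word per line, tr -s can do the translation and the squeezing in one step, with no loop and no glob risk. A sketch assuming GNU or BSD tr (both pad the shorter second set with its last character); whitespace at the very start of the input would produce one empty first line:

```shell
#!/usr/bin/env bash
# Turn every run of whitespace (spaces, tabs, newlines) into a single newline.
printf 'Here you  have\n  some words, including * wildcards\n' |
  tr -s '[:space:]' '\n'
```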
I came across this question and the proposed answers, but I don't see this simple possible solution listed:
for word in `cat inputfile`
do
echo $word
done
This can be done using AWK too:
awk '{for(i=1;i<=NF;i++) {print $i}}' text_file
You can combine xargs, which reads words delimited by blanks or newlines, with echo to print one word per line:
<some-file xargs -n1 echo
some-command | xargs -n1 echo
That also works well for large or slow streams of data because it does not need to read the whole input at once.
I’ve used this to read 1 table name at a time from SQLite which prints table names in a column layout:
sqlite3 db.sqlite .tables | xargs -n1 echo | while read table; do echo "1 table: $table"; done

Counting commas in a line in bash

Sometimes I receive a CSV file which has a carriage return inside a cell. This is not an acceptable format to a program that will use it as input.
In order to detect if an input line is split, I determined that a bad line would not have the expected number of commas in it. Is there a bash or other common unix command line tool that would allow me to count the commas in the line? If necessary, I can write a Python or Perl program to do it, but if possible, I'd like to add a line or two to an existing bash script to cause it to fail if the comma count is wrong. Any ideas?
Strip everything but the commas, and then count number of characters left:
$ echo foo,bar,baz | tr -cd , | wc -c
2
To count the number of times a comma appears, you can use something like awk:
string='foo,bar,baz'   # a line of input from the CSV file
echo "$string" | awk -F "," '{print NF-1}'
But this really isn't sufficient to determine whether a field has carriage returns in it. Fields can have commas inside as long as they're surrounded by quotes.
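To make an existing script fail on a bad line, as the question asks, the comma count can be turned into a check. A sketch assuming the expected number of commas is known in advance; check_commas is a hypothetical name, and, since quoted fields are not parsed, a quoted field that legitimately contains a comma will be reported as bad:

```shell
#!/usr/bin/env bash
# check_commas EXPECTED FILE: return 0 if every line of FILE has exactly
# EXPECTED commas; otherwise report the first offending line on stderr and
# return 1. Usage in a script: check_commas 5 input.csv || exit 1
check_commas() {
  awk -v want="$1" '
    { n = gsub(/,/, "&") }            # count commas (each is replaced by itself)
    n != want {
      printf "line %d: %d commas, expected %d\n", NR, n, want > "/dev/stderr"
      bad = 1
      exit
    }
    END { exit bad }' "$2"
}
```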
What worked for me better than the other solutions was this. If test.txt has:
foo,bar,baz
baz,foo,foobar,bar
Then cat test.txt | xargs -I % sh -c 'echo % | tr -cd , | wc -c' produces
2
3
This works very well for streaming sources, or tailing logs, etc.
In pure Bash:
while IFS=, read -ra array
do
echo "$((${#array[@]} - 1))"
done < inputfile
or
while read -r line
do
count=${line//[^,]}
echo "${#count}"
done < inputfile
Try Perl:
$ perl -ne 'print 0+@{[/,/g]},"\n"'
a
0
a,a
1
a,a,a,a,a
4
Depending on what you are trying to do with the CSV data, it may be helpful to use a wrapper script like csvquote to temporarily replace the problematic newlines (and commas) inside quoted fields, then restore them. For instance:
csvquote inputfile.csv | wc -l
and
csvquote inputfile.csv | cut -d, -f1 | csvquote -u
may be the sort of thing you're looking for. See https://github.com/dbro/csvquote for the code and more information.
An example Python command you could run (since Python is installed on most modern systems) is:
python -c "import pathlib; print({l.count(',') for l in pathlib.Path('my_file.csv').read_text().splitlines()})"
This counts the number of commas per line, then makes a set from them (so if your lines all have the same number of commas in, you'll get a set with just that number in).
Just remove all of the carriage returns:
tr -d '\r' < old_file > new_file
