I'm having the following issue. I have an array of numbers:
text="\n1\t2\t3\t4\t5\n6\t7\t8\t9\t0"
And I'd like to delete the leading newline.
I've tried
sed 's/.//' <<< "$text"
cut -c 2- <<< "$text"
and some variations, but both of those delete the first character AFTER EVERY newline, resulting in this:
text="\n\t2\t3\t4\t5\n\t7\t8\t9\t0"
This is not what I want and there doesn't seem to be an answer to this case.
Is there a way to tell either of those commands to treat newlines like characters and the entire string as one entity?
awk to the rescue!
awk 'NR>1'
Of course, you can do the same with tail -n +2 or sed 1d.
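Here is a minimal sketch of the three line-based approaches. It assumes the sample string is built with ANSI-C quoting ($'...') so that \n and \t are real characters; in text="\n..." they would stay literal backslash sequences. All three drop the whole first line, so they are only safe when that line is known to be empty.

```shell
#!/bin/bash
# Sample data with a real leading newline (ANSI-C quoting expands \n and \t).
text=$'\n1\t2\t3\t4\t5\n6\t7\t8\t9\t0'

# Each command drops line 1, which here is the empty line before "1".
# Caveat: if line 1 is not empty, real data is dropped as well.
via_awk=$(awk 'NR>1' <<< "$text")
via_tail=$(tail -n +2 <<< "$text")
via_sed=$(sed 1d <<< "$text")

printf '%s\n' "$via_awk"
```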
You can probably use the substitution modifier (see parameter expansion and ANSI C quoting in the Bash manual):
$ text=$'\n1\t2\t3\t4\t5\n6\t7\t8\t9\t0'
$ echo "$text"
1 2 3 4 5
6 7 8 9 0
$ echo "${text/$'\n'/}"
1 2 3 4 5
6 7 8 9 0
$
It replaces the first newline with nothing, as requested. However, note that it is not anchored to the first character:
$ alt="${text/$'\n'/}"
$ echo "${alt/$'\n'/}"
1 2 3 4 56 7 8 9 0
$
Using a caret ^ before the newline doesn't help: in a shell pattern the caret is an ordinary character, so there's simply no match.
As pointed out by rici in the comments, if you read the manual page I referenced, you can find how to anchor the pattern at the start with a # prefix:
$ echo "${text/#$'\n'/}"
1 2 3 4 5
6 7 8 9 0
$ echo "${alt/#$'\n'/}"
1 2 3 4 5
6 7 8 9 0
$
The notation bears no obvious resemblance to other regex systems; you just have to know it.
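To make the anchoring concrete, here is a short sketch using the same sample string. It also shows ${text#$'\n'}, the plain prefix-removal form, which for a single fixed character does the same job as the #-anchored substitution, and the % prefix, which anchors the pattern at the end instead.

```shell
#!/bin/bash
text=$'\n1\t2\t3\t4\t5\n6\t7\t8\t9\t0'

# '#' anchors the pattern at the start, so only a leading newline is removed.
anchored="${text/#$'\n'/}"

# Applying it again is harmless: the embedded newline is not at the start.
twice="${anchored/#$'\n'/}"

# Plain prefix removal achieves the same for a single fixed character.
trimmed="${text#$'\n'}"

# '%' anchors at the end instead, e.g. to drop one trailing newline.
padded="$text"$'\n'
unpadded="${padded/%$'\n'/}"
```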
EDITS: For reference, "stuff" is a general variable, as is "KEEP".
KEEP could be "Hi, my name is Dave" on line 2 and "I love pie" on line 7. The numbers I've put here are for illustration only and DO NOT show up in the data.
I had a file that needed to be parsed, keeping every 4th line, starting at the 3rd line. In other words, it looked like this:
1 stuff
2 stuff
3 KEEP
4
5 stuff
6 stuff
7 KEEP
8 stuff etc...
Great, sed solved that easily with:
sed -n -e 3~4p myfile
giving me
3 KEEP
7 KEEP
11 KEEP
Now I have a different file format and a different take on the pattern:
1 stuff
2 KEEP
3 KEEP
4
5 stuff
6 KEEP
7 KEEP etc...
and I still want the output of
2 KEEP
3 KEEP
6 KEEP
7 KEEP
10 KEEP
11 KEEP
Here's the problem - this is a multi-pattern "pattern" for sed. It's "every 4th line, spit out 2 lines, but start at line 2".
Do I need to have some sort of DO/FOR loop in my sed, or do I need a different command like awk or grep? Thus far, I have tried formats like:
sed -n -e '3~4p;4~4p' myfile
and
awk 'NR % 3 == 0 || NR % 4 ==0' myfile
and
sed -n -e '3~1p;4~4p' myfile
and
awk 'NR % 1 == 0 || NR % 4 ==0' myfile
source: https://superuser.com/questions/396536/how-to-keep-only-every-nth-line-of-a-file
If your intent is to print lines 2,3 then every fourth line after those two, you can do:
$ seq 20 | awk 'BEGIN{e[2];e[3]} (NR%4) in e'
2
3
6
7
10
11
14
15
18
19
You were pretty close with your sed:
$ printf '%s\n' {1..12} | sed -n '2~4p;3~4p'
2
3
6
7
10
11
This is the idiomatic way to write it in awk:
$ awk 'NR%4==2 || NR%4==3' file
However, this special case can be shortened to:
$ awk 'NR%4>1' file
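As a quick sanity check, the following sketch wraps the one-liners from these answers in small functions (the function names are only illustrative) and runs them on the same seq 20 input; all three should print the same lines. The sed variant assumes GNU sed, since first~step is a GNU extension.

```shell
#!/bin/bash
# One-liners from the answers, wrapped as functions (names are illustrative).
via_in_array() { awk 'BEGIN{e[2];e[3]} (NR%4) in e' "$@"; }
via_modulo()   { awk 'NR%4>1' "$@"; }
via_gnu_sed()  { sed -n '2~4p;3~4p' "$@"; }   # first~step is a GNU sed extension

# Run each on the same input and join its output lines with spaces for comparison.
for f in via_in_array via_modulo via_gnu_sed; do
    printf '%s: %s\n' "$f" "$(seq 20 | "$f" | paste -s -d ' ' -)"
done
```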
This might work for you (GNU sed):
sed '2~4,+1p;d' file
Use a range: the first address, 2~4, is GNU sed's first~step form (line 2, then every 4th line after it); the second address, +1, ends the range one line after it starts. Print these lines and delete all others.
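The range form generalizes: with a start line p, a step n and q extra lines per block, the address becomes p~n,+q. A possible sketch, assuming GNU sed, p >= 1 and 1 <= q < n; keep_blocks_sed is a hypothetical helper name.

```shell
#!/bin/bash
# keep_blocks_sed P N Q [file...]  (hypothetical helper; assumes GNU sed, P >= 1, 1 <= Q < N)
# Prints lines P..P+Q, P+N..P+N+Q, ... using a "first~step,+count" range.
keep_blocks_sed() {
    local p=$1 n=$2 q=$3
    shift 3
    sed -n "${p}~${n},+${q}p" "$@"
}

# The question's case: start at line 2, every 4 lines, 2 lines per block.
seq 12 | keep_blocks_sed 2 4 1
```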
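In the generic case, you want to keep lines p to p+q, p+n to p+q+n, p+2n to p+q+2n, ... So you can write (shown with the values from this question; the NR >= p test is needed because awk's % returns a negative remainder for lines before p, which would otherwise pass the <= q test):
awk -v p=2 -v n=4 -v q=1 'NR >= p && (NR - p) % n <= q'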
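A sketch of the generic formula as a reusable function (keep_blocks is a hypothetical name), assuming 1 <= p and 0 <= q < n:

```shell
#!/bin/bash
# keep_blocks P N Q [file...]  (hypothetical helper; assumes P >= 1 and 0 <= Q < N)
# Prints lines P..P+Q, P+N..P+N+Q, P+2N..P+2N+Q, ...
keep_blocks() {
    local p=$1 n=$2 q=$3
    shift 3
    # NR >= p guards lines before P: there (NR - p) % n is negative in awk
    # (e.g. (1 - 2) % 4 is -1), which would wrongly satisfy "<= q".
    awk -v p="$p" -v n="$n" -v q="$q" 'NR >= p && (NR - p) % n <= q' "$@"
}

seq 12 | keep_blocks 2 4 1
```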
I have a .txt file with some numbers on each line. I need to filter out the lines that contain a repeated number, so that the output contains only the lines whose numbers are all different. I have to use grep!
Example:
File_input:
1 1 2 3 4 5
1 2 3 4 5 6
6 6 6 6 6 6
What I want
File_output:
1 2 3 4 5 6
The first and third lines contain repeated numbers, so they have to be filtered out.
This should work for your example:
grep -v "\([0-9]\).*\1" myfile
The idea is to capture any single digit [0-9] with \( \) and then look for the same digit again on that line with the back-reference \1. You can easily extend this to any word made of digits.
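A minimal sketch of that grep as a function (unique_digit_lines is a hypothetical name), run against the sample input from the question. Because it compares single digits only, a line such as 1 12 3 is dropped as well.

```shell
#!/bin/bash
# Drop lines in which some digit occurs twice:
# \([0-9]\) captures one digit and \1 requires the same digit later on the line.
unique_digit_lines() {
    grep -v '\([0-9]\).*\1' "$@"
}

printf '1 1 2 3 4 5\n1 2 3 4 5 6\n6 6 6 6 6 6\n' | unique_digit_lines
```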
With the given input you can use
sed -r '/([0-9]+).+\1/d' File_input
You will have problems with substrings: 1 matches 12, and 12 matches 1.
You can add word boundaries \b with
sed -r '/\b([0-9]+)\b.*\b\1\b/d' File_input
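Since the question insists on grep, the same word-boundary regex can be used with grep -Ev instead of sed -r. This sketch assumes GNU grep, where \b and back-references in extended regexes are supported as extensions; unique_number_lines is a hypothetical name.

```shell
#!/bin/bash
# Keep only lines where no whole number appears twice (GNU grep assumed).
unique_number_lines() {
    grep -Ev '\b([0-9]+)\b.*\b\1\b' "$@"
}

printf '1 1 2 3 4 5\n1 2 3 4 5 6\n6 6 6 6 6 6\n1 12 3\n12 5 12\n' | unique_number_lines
```

With word boundaries, 1 12 3 is kept (1 and 12 are different numbers), while 12 5 12 is removed.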
For a file that contains entries similar to the following:
foo 1 6 0
fam 5 11 3
wam 7 23 8
woo 2 8 4
kaz 6 4 9
faz 5 8 8
How would you replace the nth field of every mth line with the same element using bash or awk?
For example, if n = 1 and m = 3 and the element = wot, the output would be:
foo 1 6 0
fam 5 11 3
wot 7 23 8
woo 2 8 4
kaz 6 4 9
wot 5 8 8
I understand you can print every mth line using, e.g.,
awk 'NR%7==0' file
So far I have tried to keep this in memory but to no avail... I need to keep the rest of the file as well.
I would prefer answers using bash or awk, but sed solutions would also be helpful. I'm a beginner in all three. Please explain your solution.
awk -v m=3 -v n=1 -v el='wot' 'NR % m == 0 { $n = el } 1' file
Note, however, that the inter-field whitespace is not guaranteed to be preserved as-is, because awk splits a line into fields by any run of whitespace; as written, the output fields of modified lines will be separated by a single space.
If your input fields are consistently separated by two spaces, however, you can effectively preserve the input whitespace by adding -F'  ' -v OFS='  ' (two spaces inside each pair of quotes) to the awk invocation.
-v m=3 -v n=1 -v el='wot' defines Awk variables m, n, and el
NR % m == 0 is a pattern (condition) that evaluates to true for every m-th line.
{ $n = el } is the associated action that replaces the nth field of the input line with variable el, causing the line to be rebuilt, implicitly using OFS, the output-field separator, which defaults to a space.
1 is a common Awk shorthand for printing the (possibly modified) input line at hand.
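Wrapped in a small function (replace_field is a hypothetical name) so that n, m and the element can be passed as arguments, the one-liner can be checked against the example from the question:

```shell
#!/bin/bash
# replace_field N M ELEMENT [file...]: set field N of every Mth line to ELEMENT.
# (hypothetical wrapper around the awk one-liner above)
replace_field() {
    local n=$1 m=$2 el=$3
    shift 3
    awk -v m="$m" -v n="$n" -v el="$el" 'NR % m == 0 { $n = el } 1' "$@"
}

printf 'foo 1 6 0\nfam 5 11 3\nwam 7 23 8\nwoo 2 8 4\nkaz 6 4 9\nfaz 5 8 8\n' | replace_field 1 3 wot
```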
Great little exercise. While I would probably lean toward an awk solution, in bash you can also rely on parameter expansion with substring replacement to replace the nth field of every mth line. Essentially, you can read every line, preserving whitespace, then check your line count, e.g. if c is your line counter and m your variable for mth line, you could use:
if (( c % m == 0 ))  ## test for mth line
If the line is a replacement line, you can read each word into an array after restoring default word-splitting and then use your array element index n-1 to provide the replacement (e.g. ${line/find/replace} with ${line/"${array[$((n-1))]}"/replace}).
If it isn't a replacement line, simply output the line unchanged. A short example could be similar to the following (to which you can add additional validations as required)
#!/bin/bash

[[ -n $1 && -r $1 ]] || {    ## filename given and readable
    printf 'error: insufficient or unreadable input.\n' >&2
    exit 1
}

n=${2:-1}    ## variables with default n=1, m=3, e=wot
m=${3:-3}
e=${4:-wot}
c=1          ## line count

while IFS= read -r line; do
    if (( c % m == 0 )); then                      ## test for mth line
        read -r -a a <<< "$line"                   ## split into array (default IFS, no globbing)
        printf '%s\n' "${line/"${a[n-1]}"/$e}"     ## first occurrence of field n replaced with e
    else
        printf '%s\n' "$line"                      ## otherwise just output line
    fi
    (( c++ ))                                      ## advance counter
done < "$1"
Example Use/Output
n=1, m=3, e=wot
$ bash replmn.sh dat/repl.txt
foo 1 6 0
fam 5 11 3
wot 7 23 8
woo 2 8 4
kaz 6 4 9
wot 5 8 8
n=1, m=2, e=baz
$ bash replmn.sh dat/repl.txt 1 2 baz
foo 1 6 0
baz 5 11 3
wam 7 23 8
baz 2 8 4
kaz 6 4 9
baz 5 8 8
n=3, m=2, e=99
$ bash replmn.sh dat/repl.txt 3 2 99
foo 1 6 0
fam 5 99 3
wam 7 23 8
woo 2 99 4
kaz 6 4 9
faz 5 99 8
An awk solution is shorter (and avoids problems when the text of the nth field also occurs earlier in $line, since ${line/find/replace} replaces the first occurrence), but both would need similar validation of field existence, etc. Learn from both and let me know if you have any questions.
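If you want pure bash but need to replace the field by position rather than by its text, one possible sketch (replace_nth_field is a hypothetical name) rebuilds the line from the array. The trade-off, as with awk, is that modified lines are rejoined with single spaces.

```shell
#!/bin/bash
# replace_nth_field N M ELEMENT < file  (hypothetical helper name)
# Replaces field N of every Mth line by position, so an identical value in an
# earlier field is left alone. Modified lines are rejoined with single spaces.
replace_nth_field() {
    local n=$1 m=$2 el=$3 line c=0
    local -a f
    while IFS= read -r line; do
        (( ++c ))
        if (( c % m == 0 )); then
            read -r -a f <<< "$line"   # split on default whitespace (no globbing)
            f[n-1]=$el                 # overwrite exactly the Nth field
            line="${f[*]}"             # join with the first IFS character (a space)
        fi
        printf '%s\n' "$line"
    done
}

printf 'faz 5 8 8\n' | replace_nth_field 4 1 99
```

Here the last 8 is replaced, whereas the substitution-based script would change the first 8 (field 3).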
Problem
The behaviour of
!(pattern-list)
does not work the way I would expect when used in parameter expansion, specifically
${parameter/pattern/string}
Input
a="1 2 3 4 5 6 7 8 9 10"
Test cases
$ printf "%s\n" "${a/!([0-9])/}"
[blank]
#expected 12 3 4 5 6 7 8 9 10
$ printf "%s\n" "${a/!(2)/}"
[blank]
#expected 2 3 4 5 6 7 8 9 10
$ printf "%s\n" "${a/!(*2*)/}"
2 3 4 5 6 7 8 9 10
#Produces the behaviour expected in previous one, not sure why though
$ printf "%s\n" "${a/!(*2*)/,}"
,2 3 4 5 6 7 8 9 10
#Expected after previous worked
$ printf "%s\n" "${a//!(*2*)/}"
2
#Expected again previous worked
$ printf "%s\n" "${a//!(*2*)/,}"
,,2,
#Why are there 3 commas???
Specs
GNU bash, version 4.2.46(1)-release (x86_64-redhat-linux-gnu)
Notes
These are very basic examples, so if it is possible to include more complex examples with explanations in the answer then please do.
Any more info or examples needed let me know in the comments.
I have already looked at "How does extglob work with shell parameter expansion?" and have even commented there on what the problem is with that particular question, so please don't mark this as a duplicate.
Parameter expansion of the form ${parameter/pattern/string} (where pattern doesn't start with a /) works by finding the leftmost longest substring in the value of the variable parameter that matches the pattern pattern and replacing it with string. In other words, $parameter is decomposed into three parts prefix,match, and suffix such that
$parameter == "${prefix}${match}${suffix}"
$prefix is the shortest possible string enabling the other requirements to be fulfilled (i.e. the match, if at all possible, occurs in the leftmost position)
$match matches pattern and is as long as possible
any of $prefix, $match and/or $suffix can be empty
and the result of ${parameter/pattern/string} is "${prefix}string${suffix}".
For the global replacement form (${parameter//pattern/string}) of this type of parameter expansion, the same process is recursively performed on the suffix part; however, a zero-length match is handled as a special case (in order to prevent infinite recursion):
if "${prefix}${match}" != ""
"${parameter//pattern/string}" = "${prefix}string${suffix//pattern/string}"
else suffix=${parameter:1} and
"${parameter//pattern/string}" = "string${parameter:0:1}${suffix//pattern/string}"
Now let's analyze the cases individually:
"${a/!([0-9])/}" --> prefix='' match='1 2 3 4 5 6 7 8 9 10' suffix=''. Indeed, '1 2 3 4 5 6 7 8 9 10' is not a string consisting of a single digit, and therefore it matches the pattern !([0-9]). Hence the empty result of expansion.
"${a/!(2)/}" --> prefix='' match='1 2 3 4 5 6 7 8 9 10' suffix=''. Similar to the above, '1 2 3 4 5 6 7 8 9 10' is not a string consisting of the single character '2', and therefore it matches the pattern !(2). Hence the empty result of expansion.
"${a/!(*2*)/}" --> prefix='' match='1 ' suffix='2 3 4 5 6 7 8 9 10'. The substring '1 ' doesn't match the pattern *2*, and therefore it matches the pattern !(*2*).
"${a/!(*2*)/,}". There were no surprises here, so no need to elaborate.
"${a//!(*2*)/}". There were no surprises here, so no need to elaborate.
"${a//!(*2*)/,}" --> prefix='' match='1 ' suffix='2 3 4 5 6 7 8 9 10'. Then ${suffix//!(*2*)/,} expands to ",2," as follows. The empty string in the beginning of suffix matches the pattern !(*2*), producing an extra comma in the result. Since the zero-length match special case (described above) was triggered, the first character of suffix is forcibly consumed, leaving us with ' 3 4 5 6 7 8 9 10', which matches the !(*2*) pattern in its entirety and is replaced with the last comma that we see in the final result of the expansion.
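The following sketch reproduces part of the analysis. It assumes extglob is enabled, which the question doesn't show but !(...) requires. It also shows two non-extglob expansions that give results the question expected: a bracket expression that removes exactly one non-digit character, and ${a%%2*}, which computes the prefix before the first 2 so it can be stripped.

```shell
#!/bin/bash
shopt -s extglob   # !(...) is an extglob pattern

a="1 2 3 4 5 6 7 8 9 10"

# Leftmost-longest: the whole value is "not a single digit", so all of it is replaced.
whole="${a/!([0-9])/}"

# A bracket expression matches exactly one character, which is what was intended.
one_char="${a/[^0-9]/}"

# Global form: the empty match in front of "2" contributes the extra comma.
commas="${a//!(*2*)/,}"

# Drop everything before the first "2" without extglob:
# ${a%%2*} is the prefix ("1 "), which is then removed from the front.
from_two="${a#"${a%%2*}"}"

printf '[%s]\n' "$whole" "$one_char" "$commas" "$from_two"
```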
Is it even possible? I currently have a one-liner to count the number of words in a file. If I output what I currently have it looks like this:
3 abcdef
3 abcd
3 fec
2 abc
This is all done in one line without loops, and I was wondering if I could add a column with the length of each word. I was thinking I could use wc -m to count the characters, but I don't know if I can do that without a loop.
As seen in the title: no awk, sed, or perl. Just good old bash.
What I want:
3 abcdef 6
3 abcd 4
3 fec 3
2 abc 3
where the last column is the length of each word.
while read -r num word; do
printf '%s %s %s\n' "$num" "$word" "${#word}"
done < file
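If the counts come from a pipeline such as ... | sort | uniq -c, the same loop can sit at the end of that pipeline as a filter instead of reading a file. A minimal sketch (add_length is a hypothetical name); with the default IFS, read also strips the leading spaces that uniq -c prints before each count.

```shell
#!/bin/bash
# add_length: append the length of the word to each "count word" line (pure bash).
add_length() {
    local num word
    while read -r num word; do
        printf '%s %s %s\n' "$num" "$word" "${#word}"
    done
}

printf '      3 abcdef\n      3 abcd\n      3 fec\n      2 abc\n' | add_length
```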
You can also do something like this:
File
> cat test.txt
3 abcdef
3 abcd
3 fec
2 abc
Bash script
> cat test.txt.sh
#!/bin/bash
while read -r line; do
    read -r -a items <<< "$line"          # split the line (no glob expansion)
    strlen=${#items[1]}                   # get the 2nd item's length
    printf '%s %s\n' "$line" "$strlen"    # print the line and the length
done < test.txt
Results
> bash test.txt.sh
3 abcdef 6
3 abcd 4
3 fec 3
2 abc 3