Find negative numbers in file with grep - bash

I have this script that reads a file. The file looks like this:
711324865,438918283,2
-333308476,886548365,2
1378685449,-911401007,2
-435117907,560922996,2
259073357,714183955,2
...
the script:
#!/bin/bash
while IFS=, read childId parentId parentLevel
do
grep "\$parentId" parent_child_output_level2.csv
resul=$?
echo "child is $childId, parent is $parentId parentLevel is $parentLevel resul is $resul"
done < parent_child_output_level1.csv
but it is not working: resul is always 1, which is a false negative.
I know that because I can run the following command, which I think is equivalent:
[core@dub-vcd-vms165 generated-and-saved-to-hdfs]$
grep "\-911401007" parent_child_output_level2.csv
-911401007,-157143722,3
Please help.

grep command to print only the negative numbers.
$ grep -oP '(^|,)\K-\d+' file.csv
-333308476
-911401007
-435117907
(^|,) matches the start of a line or comma.
\K discards the previously matched characters.
-\d+ Matches - plus the following one or more numbers.

Your title is inconsistent with your question. Your title asks how to grep negative numbers, which Avinash Raj answered well, although I'd suggest you don't even need the (Perl-style) look-behind positive assertion (^|,)\K to match start-of-field, because if the file is well-formed, then -\d+ would match all negative numbers just as well. So you could just run (edit: realized that with a leading - you need -- to prevent grep from taking the pattern as an option):
grep -oP -- '-\d+' file.csv;
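To illustrate the leading-dash point with a shell variable, here is a minimal sketch that checks whether a negative id occurs as the first field of a CSV file. The helper name has_child_id and the throwaway demo file are assumptions made for the example, and -e is used as an alternative to -- for protecting a pattern that starts with -. Note also that the pattern in the question's loop is written "\$parentId", which inside double quotes is the literal text $parentId rather than the variable's value.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original posts): succeed if ID is the
# first comma-separated field of some line in FILE.
has_child_id() {
  local id=$1 file=$2
  # cut keeps only field 1; grep -x matches whole lines, -F treats the id as
  # a fixed string, -q suppresses output, and -e marks the next argument as
  # the pattern even when it starts with "-".
  cut -d, -f1 "$file" | grep -qxF -e "$id"
}

# Demo on a throwaway file holding the row shown in the question.
demo_file=$(mktemp)
printf '%s\n' '-911401007,-157143722,3' > "$demo_file"
if has_child_id '-911401007' "$demo_file"; then
  echo "found"
else
  echo "not found"
fi
rm -f "$demo_file"
```

Because the comparison is a whole-field fixed-string match, 911401007 without the minus sign would not be reported as present.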
Your question includes a script whose intention seems to be to grep for any number (positive or negative) in the first field (childId) of one file (parent_child_output_level2.csv) that occurs in the second field (parentId) of another file (parent_child_output_level1.csv). To accomplish this, I wouldn't use grep, because you're trying to do an exact numerical equality test, which can even be done as an exact string equality test assuming your numbers are always consistently represented (e.g. no redundant leading zeroes). Repeatedly grepping through the entire file just to search for a number in one column is also wasteful of CPU.
Here's what I would do:
parentIdList=($(cut -d, -f2 parent_child_output_level1.csv));
childIdList=($(cut -d, -f1 parent_child_output_level2.csv));
for parentId in "${parentIdList[@]}"; do
for childId in "${childIdList[@]}"; do
if [[ "$childId" == "$parentId" ]]; then
echo "$parentId";
fi;
done;
done;
With this approach, you precompute both the parent id list and the child id list just once, using cut to extract the appropriate field from each file. Then you can use the shell-builtin for loop, shell-builtin if conditional, and shell-builtin [[ test command to accomplish the check, and finally finish with a shell-builtin echo to print the matches. Everything is shell-builtin, after the initial command substitutions that run the cut external executable.
If you also want to filter these results on negative numbers, you could grep for ^- in the results of the above script, or grep for it in the results of each (or just the first) cut command, or add the following line just inside the outer for loop:
if [[ "${parentId:0:1}" != '-' ]]; then continue; fi;
Alternative approach:
if [[ "$parentId" != -* ]]; then continue; fi;
Either approach will skip non-negatives.
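If the nested loops become slow on large files, the same comparison can be sketched as a single awk pass over both files. This swaps in awk for the pure-shell loops, so it is an alternative rather than the approach described in this answer; the function name match_parents and the temporary demo files are assumptions for illustration. Unlike the nested loops, it prints a matching parentId once per level-1 line, even if the id appears several times in the level-2 file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each parentId (field 2 of the level-1 file)
# that also appears as a childId (field 1 of the level-2 file).
match_parents() {
  local level1=$1 level2=$2
  awk -F, '
    NR == FNR { child[$1] = 1; next }   # first file read: remember childIds
    ($2 in child) { print $2 }          # second file read: print matches
  ' "$level2" "$level1"
}

# Demo with rows taken from the question.
demo_dir=$(mktemp -d)
printf '%s\n' '711324865,438918283,2' '1378685449,-911401007,2' > "$demo_dir/level1.csv"
printf '%s\n' '-911401007,-157143722,3' > "$demo_dir/level2.csv"
match_parents "$demo_dir/level1.csv" "$demo_dir/level2.csv"
rm -rf "$demo_dir"
```

The NR == FNR test assumes the level-2 file is not empty; with an empty first file awk would treat the second file as the lookup table.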

Related

I want to compare one line to the next line, but only in the third column, from a file using bash

So, what I'm trying to do is read in a file, loop through it comparing it line by line, but only in the third column. Sorry if this doesn't make sense, but maybe this will help. I have a file of names:
JOHN SMITH SMITH
JIM JOHNSON JOHNSON
JIM SMITH SMITH
I want to see if (first, col3)SMITH is equal to JOHNSON, if not, move onto the next name. If (first, col3) SMITH is equal to (second, col3) SMITH, then I'll do something with that.
Again, I'm sorry if this doesn't make much sense, but I tried to explain it as best as I could.
I was attempting to see if they were equal, but obviously that didn't work. Here is what I have so far, but I got stuck:
while read -a line
do
if [ ${line[2]} == ${line[2]} ]
then
echo -e "${line[2]}" >> names5.txt
else
echo "Not equal."
fi
done < names4.txt
Store your immediately prior line in a separate variable, so you can compare against it:
#!/usr/bin/env bash
old_line=( )
while read -r -a line
do
if [ "${line[2]}" = "${old_line[2]}" ]; then
printf '%s\n' "${line[2]}"
else
echo "Not equal." >&2
fi
old_line=( "${line[@]}" )
done <names4.txt >>names5.txt
Some other changes of note:
Instead of re-opening names5.txt every time you want to write a single line to it, we're opening it just once, for the whole loop. (You could make this >names5.txt if you want to clear it at the top of the loop and append from there, which is likely to be desirable behavior).
We're avoiding echo -e. See the APPLICATION USE and RATIONALE sections of the POSIX standard for echo for background on why echo use is not recommended for new development when contents are not tightly constrained (known not to contain any backslashes, for example).
We're quoting both sides of the test operation. This is mandatory with [ ] to ensure correct operation if words can be expanded as globs (i.e. if you have a word *, you don't want it replaced with a list of files in your current directory in the final command), or if they can contain spaces (not so much a concern here, since you're using the same IFS value for the read -a as the unquoted expansion). Even if using [[ ]], you want to quote the right-hand side so it's treated as a literal string and not a pattern.
We're passing -r to read, which ensures that backslashes are not silently removed (changing \t in the input to just t, for example).
When you want to compare each third field with all previous third fields, you need to store the old third fields in an array. You can use awk for this.
When you only want to see the repeated third fields, you can use other tools:
cut -d" " -f3 names4.txt | sort | uniq -d
EDIT:
When you only want to print duplicates from two consecutive lines, it is even easier:
cut -d" " -f3 names4.txt | uniq -d
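For the case of comparing each third field against all earlier ones, a minimal awk sketch might look like the following. The function name repeated_third_fields is an assumption for illustration, and the array seen plays the role of the stored "old third fields".

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each third field the second time it is seen,
# so every repeated value is reported exactly once, in input order.
# Reads the named files, or standard input when no file is given.
repeated_third_fields() {
  awk 'seen[$3]++ == 1 { print $3 }' "$@"
}

# Demo with the names from the question; SMITH occurs twice in column 3.
repeated_third_fields <<'EOF'
JOHN SMITH SMITH
JIM JOHNSON JOHNSON
JIM SMITH SMITH
EOF
```

Unlike the sort | uniq -d pipeline, this keeps the order in which the repeats are first detected and does not need the input sorted.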

Bash - binary operator expected

I've got a bash script for counting rows in reports. All report names are stored in one array, and I count the rows in a loop. However, for some files my script fails with a "binary operator expected" error. Does anyone have a solution?
for i in ${ARRAY[@]}; do
if [ ! -f "$BASE_DIR/$i"* ];
then
echo "File not generated yet"
else
ARRAY2=$(wc -l < "$BASE_DIR/$i"*.tab | awk '{print $1-2}')
echo ${ARRAY2[$i]} $i
fi
Use double square brackets instead of single ones, as follows, since you are using extended expressions.
if [[ ! -f "$BASE_DIR/$i"* ]];
You also need to check the array contents: special characters such as ' ' (spaces) in file names must be escaped.
-f takes just one argument, so the error occurs when the pattern matches more than one file.
It seems to work with [[, although I can't find any documentation as to why it does.
The bigger problem is you can also only use one file with the < operator; if the pattern matches multiple files, you'll get an ambiguous redirect error. To fix that, you'll need to use cat:
cat "$BASE_DIR/$i"*.tab | wc -l
However, it's not clear what you are expecting from the output; ARRAY2 will not actually be an array.
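One way to sidestep both problems is to expand the pattern first and then test what it produced. The sketch below does that; the function name count_report_rows is hypothetical, and subtracting 2 from the total simply mirrors the awk expression in the question. As background, Bash performs no word splitting or pathname expansion on the words inside [[ ]], which is why the error disappears there, but it also means the * is then taken literally rather than matched against files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the lines of every file matching
# BASE_DIR/NAME*.tab, minus the 2 lines the question subtracts.
count_report_rows() {
  local base_dir=$1 name=$2 total
  # Expand the glob into the function's own positional parameters.
  set -- "$base_dir/$name"*.tab
  # An unmatched glob stays as the literal pattern, which does not exist.
  if [ ! -e "$1" ]; then
    echo "File not generated yet"
    return 1
  fi
  # cat accepts any number of files, unlike the < redirection.
  total=$(cat -- "$@" | wc -l)
  echo "$(( total - 2 )) $name"
}

# Demo in a throwaway directory with one five-line report.
demo_dir=$(mktemp -d)
printf '%s\n' a b c d e > "$demo_dir/sales_2024.tab"
count_report_rows "$demo_dir" sales
count_report_rows "$demo_dir" missing || true
rm -rf "$demo_dir"
```

When several files match, the 2 is subtracted from the combined total only once; whether that is what the report layout needs is not clear from the question.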

Match exact word in bash script, extract number from string

I'm trying to create a very simple bash script that will open a new link based on the input command
Use case #1
$ ./myscript longname55445
It should take the number 55445 and then assign that to a variable which will later be used to open a new link based on the given number.
Use case #2
$ ./myscript l55445
It should do the exact same thing as above by taking the number and then open the same link.
Use case #3
$ ./myscript 55445
If no prefix given then we just simply open that same link as a fallback.
So far this is what I have
#!/bin/sh
BASE_URL=http://api.domain.com
input=$1
command=${input:0:1}
if [ "$command" == "longname" ]; then
number=${input:1:${#input}}
url="$BASE_URL?id="$number
open $url
elseif [ "$command" == "l" ]; then
number=${input:1:${#input}}
url="$BASE_URL?id="$number
open $url
else
number=${input:1:${#input}}
url="$BASE_URL?id="$number
open $url
fi
But this will always fallback to the elseif there.
I'm using zsh at the moment.
input=$1
command=${input:0:1}
sets command to the first character of the first argument. It's not possible for a one character string to be equal to an eight-character string ("longname"), so the if condition must always fail.
Furthermore, both your elseif and your else clauses set
number=${input:1:${#input}}
Which you could have written more simply as
number=${input:1}
But in both cases, you're dropping the first character of input. Presumably in the else case, you wanted the entire first argument.
see whether this construct is helpful for your purpose:
#!/bin/bash
name="longname55445"
echo "${name##*[A-Za-z]}"
this assumes a letter adjacent to number.
The following is NOT another way to write the same, because it is wrong.
Please see comments below by mklement0, who noticed this. Mea culpa.
echo "${name##*[:letter:]}"
You have command=${input:0:1}
It takes the first single char, and you compare it to "longname", of course it will fail, and go to elseif.
The key problem is to check whether the input begins with l, longname, or nothing. In any of those 3 cases, take the trailing numbers.
One grep line could do it, you can just grep on input and get the returned text:
kent$ grep -Po '(?<=longname|l|^)\d+' <<<"l234"
234
kent$ grep -Po '(?<=longname|l|^)\d+' <<<"longname234"
234
kent$ grep -Po '(?<=longname|l|^)\d+' <<<"234"
234
kent$ grep -Po '(?<=longname|l|^)\d+' <<<"foobar234"
<we got nothing>
You can use regex matching in bash.
[[ $1 =~ [0-9]+ ]] && number=$BASH_REMATCH
You can also use regex matching in zsh.
[[ $1 =~ [0-9]+ ]] && number=$MATCH
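A slightly stricter bash variant, offered as a sketch, anchors the regex so that only the three input forms from the question are accepted. The function name extract_number is an assumption, and the snippet relies on bash's BASH_REMATCH array, so it needs a bash shebang rather than sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the digits of an argument of the form
# "longname<digits>", "l<digits>" or "<digits>"; fail for anything else.
extract_number() {
  local re='^(longname|l)?([0-9]+)$'
  [[ $1 =~ $re ]] || return 1
  # Group 1 is the optional prefix, group 2 the number.
  printf '%s\n' "${BASH_REMATCH[2]}"
}

# Demo with the three use cases plus one input that should be rejected.
for arg in longname55445 l55445 55445 foobar234; do
  if number=$(extract_number "$arg"); then
    echo "$arg -> $number"
  else
    echo "$arg -> no match"
  fi
done
```

Keeping the regex in a variable and expanding it unquoted on the right-hand side of =~ avoids quoting differences between bash versions.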
Based on the OP's following clarification in a comment,
I'm only looking for the numbers [...] given in the input.
the solution can be simplified as follows:
#!/bin/bash
BASE_URL='http://api.domain.com'
# Strip all non-digits from the 1st argument to get the desired number.
number=$(tr -dC '[:digit:]' <<<"$1")
open "$BASE_URL?id=$number"
Note the use of a bash shebang, given the use of 'bashism' <<< (which could easily be restated in a POSIX-compliant manner).
Similarly, the OP's original code should use a bash shebang, too, due to use of non-POSIX substring extraction syntax.
However, judging by the use of open to open a URL, the OP appears to be on OSX, where sh is essentially bash (though invocation as sh does change behavior), so it'll still work there. Generally, though, it's safer to be explicit about the required shell.

Shell: extract words matching pattern, but ignore circumventing expression

I am currently trying to extract ALL matching expressions from a text which e.g. looks like this and put them into an array.
aaaaaaaaa${bbbbbbb}ccccccc${dddd}eeeee
ssssssssssssssssss${TTTTTT}efhsekfh ej
348653jlk3jß1094utß43t59ßgöelfl,-s-fko
The matching expressions are similar to this: ${}. Beware that I need the full expression, not only the word in between this expression! So in this case the result should be an array which contains:
${bbbbbbb}
${dddd}
${TTTTTT}
Problems I have stumbled upon and couldn't solve:
It should NOT recognize this as a whole
${bbbbbbb}ccccccc${dddd} but each for its own
grep -o is not installed on the old machine, Perl is not allowed either!
Many commands e.g. BASH_REMATCH only deliver the whole line or the first occurrence of the expression, instead of all matching expressions in the line!
The mentioned pattern \${[^}]*} seems to work partly, as it can extract the first occurrence of the expression; however, it always omits the ones following after that, if they are in the same text line. What I need is ALL matching expressions found in the line, not only the first one.
You could split the string on any of the characters $,{,}:
$ s='...blaaaaa${blabla}bloooo${bla}bluuuuu...'
$ echo "$s"
...blaaaaa${blabla}bloooo${bla}bluuuuu...
$ IFS='${}' read -ra words <<< "$s"
$ for ((i=0; i<${#words[@]}; i++)); do printf "%d %s\n" $i "${words[i]}"; done
0 ...blaaaaa
1
2 blabla
3 bloooo
4
5 bla
6 bluuuuu...
So if you're trying to extract the words inside the braces:
$ for ((i=2; i<${#words[@]}; i+=3)); do printf "%d %s\n" $i "${words[i]}"; done
2 blabla
5 bla
If the above doesn't suit you, grep will work:
$ echo '...blaaaaa${blabla}bloooo${bla}bluuuuu...' | grep -o '\${[^}]\+}'
${blabla}
${bla}
You still haven't told us exactly what output you want.
Since it bugged me a lot I have asked directly on www.unix.com and was kindly provided with a solution which fits for my ancient shell. So if anyone got the same problem here is the solution:
line='aaaa$aa{yyy}aaa${important}xxxxxxxx${important2}oo{o$}oo$oo${importantstring3}'
IFS=\$ read -a words <<< "$line"
regex='^(\{[^}]+})'
for e in "${words[@]}"; do
if [[ $e =~ $regex ]]; then
echo "\$${BASH_REMATCH[0]}";
fi;
done
which prints then the following - without even getting disturbed by random occurrences of $ and { or } between the syntactically correct expressions:
${important}
${important2}
${importantstring3}
I have updated the full solution after I got another update from the forums: now it also ignores this: aaa$aa{yyy}aaaa - which it previously printed as ${yyy} - but which it should completely ignore as there are characters between $ and {. Now with the additional anchoring on the beginning of the regexp it works as expected.
I just found another issue: theoretically using the above approach I would still get a wrong output if the read line looks like this line='{ccc}aaaa${important}aaa'. The IFS would split it and the REGEX would match {ccc} although this hadn't the $ sign in front. This is suboptimal.
However following approach could solve it: after getting the BASH_REMATCH I would need to do a search in the original line - the one I gave to the IFS - for this exact expression ${ccc} - with the difference, that the $ is included! And only if it finds this exact match, only then, it counts as a valid match; otherwise it should be ignored. Kind of a reverse search method...
Updated - add this reverse search to ignore the trap on the beginning of the line:
pattern="\$${BASH_REMATCH[0]}";
searchresult="";
searchresult=`echo "$line" | grep "$pattern"`;
if [ "$searchresult" != "" ]; then echo "It was found!"; fi;
Negligible issue: If the line looks like this line='{ccc}aaaaaa${ccc}bbbbb' it would recognize the first {ccc} as a valid match (although it isn't) and print it, because the reverse search found the second ${ccc}. Although this is not intended, it's irrelevant for my specific purpose as it implies that this pattern does in fact exist at least once in the same line.
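For comparison, here is a sketch of a different approach that avoids splitting on $ altogether: match ${...} with =~, print the match, cut the line off just after it, and repeat. It is not the solution from the forum; the function name extract_placeholders is an assumption, and it presumes a bash that supports [[ =~ ]] and BASH_REMATCH, as the code above already does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every ${...} expression found in the argument,
# one per line, using only bash builtins.
extract_placeholders() {
  local rest=$1
  local re='\$\{[^}]*}'
  while [[ $rest =~ $re ]]; do
    printf '%s\n' "${BASH_REMATCH[0]}"
    # Drop everything up to and including this match, then search again.
    rest=${rest#*"${BASH_REMATCH[0]}"}
  done
}

# Demo with the test line from above and the {ccc} trap case.
extract_placeholders 'aaaa$aa{yyy}aaa${important}xxxxxxxx${important2}oo{o$}oo$oo${importantstring3}'
extract_placeholders '{ccc}aaaaaa${ccc}bbbbb'
```

Because the regex requires { directly after $, a bare {ccc} at the start of the line is never reported, so no reverse search is needed.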

Shell Script : If a string is present in a file

I am a newbie to shell scripting and I want to check whether 3 strings ("hello", "who", "when " etc.) are present in a file.
Googling turns up many ways (awk, cat, grep, etc.). What is the best way, and how can I do it?
I just need to know if the strings are present or not .
Your question is a little incomplete:
do you want to find strings or words? So when the word Othello appears, does that count as hello?
in your question there is whitespace behind the when. Is that intentional?
do you want to know whether all three words are in the file, or is one of the words enough?
The general solution is to use grep or egrep to search for text in a file. The exact command line depends on the answers to the above questions.
to search for words (Othello doesn't count as hello) you need to pass the -w option to grep.
I'm assuming that the whitespace was a mistake.
When you need all the words, you can do egrep -wo 'hello|who|when' | sort -u. The egrep command finds all instances of the given words, and prints them out one per line. At that point, you will have many duplicates. Therefore the sort -u command sorts them and only keeps the unique lines (that's what the -u means). In a complete program, I would do it as follows:
filename="story.txt"
words=$(egrep -wo 'hello|who|when' "$filename" | sort -u)
n=$(echo "$words" | wc -l)
if [ $n = 3 ]; then
echo "found all words in the file"
else
echo "didn't find all words, only \""$words"\"."
fi
There's a lot more that I could tell you about this little piece of code, and why I wrote it exactly like that, but for a beginner, it's already enough to understand.
But just in case that you need a simple solution and the file is small anyway, so performance is not critical, you can do this:
filename="story.txt"
if egrep -wl 'hello' "$filename" 1>/dev/null; then
if egrep -wl 'when' "$filename" 1>/dev/null; then
if egrep -wl 'who' "$filename" 1>/dev/null; then
echo "found all three words"
fi
fi
fi
[Update:]
This second code snippet also checks whether the given file contains all three words. Each of the if clauses checks for one of the words. The option -l (lowercase ell) to egrep makes it potentially faster, but you probably don't need that option at all.
Normally egrep prints all lines that match the given expressions (your three words in this case). Since we don't need that output, we redirect it using the arrow operator > to a special file called /dev/null. Whatever you write into that file is discarded.
The if statement takes another command as its argument, and if that command returns successfully, the then branch is taken. The nice thing about the egrep command is that it returns successfully iff the given search expression is contained in the file, so these two things perfectly fit together.
For further reading you should try the reference documentation from the Open Group website: http://www.google.com/search?q=opengroup+grep
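The nested if statements can also be folded into a loop that stops at the first missing word. This is a minimal sketch rather than part of the answer: the function name file_has_all_words and the demo file are assumptions, and it uses grep -q for the exit status instead of redirecting output to /dev/null.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if FILE contains every given word.
file_has_all_words() {
  local file=$1 word
  shift
  for word in "$@"; do
    # -q: exit status only; -w: whole words; -F: fixed string;
    # return failure at the first word that is missing.
    grep -qwF -e "$word" "$file" || return 1
  done
}

# Demo on a throwaway file that contains all three words.
demo_file=$(mktemp)
printf '%s\n' 'hello there' 'who goes' 'when exactly' > "$demo_file"
if file_has_all_words "$demo_file" hello who when; then
  echo "found all words in the file"
else
  echo "didn't find all words"
fi
rm -f "$demo_file"
```

The word list is passed as arguments, so checking a fourth word only means adding it to the call.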

Resources