Linux - GREP print a line with special characters - utf-8

I have a problem with the grep command. I want all the lines that start with '<', but that is not the problem: when I create a new file from the result of this grep, some special characters are replaced with hex symbols.
Grep command
cat OriginalFile.txt | grep '^<' > NewFile.txt
Before grep (screenshot)
After grep (screenshot)
Is there a way to prevent this change?
file command output (screenshots not reproduced): one labelled "Diff" for OriginalFile.txt, one labelled "Grep" for NewFile.txt.
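Not part of the original post, but one way to investigate is to compare what the file command reports for the two files and to run grep under an explicit UTF-8 locale:
file OriginalFile.txt NewFile.txt                               # compare the reported file types/encodings
LC_ALL=en_US.UTF-8 grep '^<' OriginalFile.txt > NewFile.txt     # assumes the en_US.UTF-8 locale is installed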

Related

grep to sed, append after string match but instead on end of line

I have a text file with the following lines:
<test="123">
<test="456">
<test="789">
My aim is to append the keyword "HELLO" after the numbers in the above text file, as follows:
<test="123.HELLO">
<test="456.HELLO">
<test="789.HELLO">
With the grep command and cut, I managed to get the value between the quotation marks:
grep -o "test=".* test.txt | cut -d \" -f2
I tried to use sed on top of it, with this line
grep -o "test=".* test.txt | cut -d \" -f2 | sed -i -- 's/$/.HELLO/' test.txt
However, the closest I managed to get is a ".HELLO" appended directly to the end of each line (and not after the numbers between the quotes):
<test="123">.HELLO
<test="456">.HELLO
<test="789">.HELLO
How can I fix my sed statement to produce the requested output?
You can do it with groups in sed. To create new output, you can do this:
sed 's/\(test="[^"]*\)"/\1.HELLO"/g' test.txt
To modify it in-place, you can use the -i switch:
sed -i 's/\(test="[^"]*\)"/\1.HELLO"/g' test.txt
Explanation:
() is a group; you can refer to it with \1. In sed we have to escape the parentheses: \(\).
[^"]* matches everything that's not a quote, so the match stops before the quote.
In the replacement, you have to add the quote back manually, since it's outside of the group; that's what lets you put text before it.
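For a quick check without modifying test.txt, the same substitution can be fed the sample lines from the question on standard input:
printf '<test="123">\n<test="456">\n<test="789">\n' | sed 's/\(test="[^"]*\)"/\1.HELLO"/g'
# <test="123.HELLO">
# <test="456.HELLO">
# <test="789.HELLO">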
Try this:
This is what your file looks like:
bash > cat a.txt
<test="123">
<test="456">
<test="789">
Your text piped to sed:
bash > cat a.txt | sed 's/">/.HELLO">/g'
<test="123.HELLO">
<test="456.HELLO">
<test="789.HELLO">
bash >
Let me know if this worked out for you.
awk 'sub("[0-9]+","&.HELLO")' file
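Here sub() replaces the first run of digits with the matched text (&) followed by .HELLO, and because sub() returns the number of substitutions made, any line where a replacement happened counts as true and is printed by awk's default action. A quick check with one of the sample lines:
printf '<test="123">\n' | awk 'sub("[0-9]+","&.HELLO")'
# <test="123.HELLO">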
You can accomplish this with sed directly. Cut should not be necessary:
grep "test=" test.txt | sed 's/"\(.*\)"/"\1.HELLO"/'

combine grep and grep -v search together

I am trying to combine grep and grep -v search together.
The output should display all lines ending with .xml, but exclude lines starting with $.
Here are the commands I have tried; none worked:
grep *.xml file1.txt | grep -v '$' file1.txt > output
grep *.xml | grep -v '$' file1.txt > output
grep *.xml grep -v '$' file1.txt > output
grep *.xml '$' file1.txt > output
To match a $ at the start of a line, anchor it to the start of the line with ^. Also, $ by itself matches the end of the line (it's a special character, just like ^), and * will not do what you think it does (it works differently in regular expressions than in shell globbing patterns). So,
grep -v '^\$'
will filter out all lines starting with a $.
You can do either
grep '\.xml$' file1.txt | grep -v '^\$'
or
grep '^[^$].*\.xml$' file1.txt
to find all lines in the file file1.txt that do not start with $ but that end with .xml.
Notice that I also escape the dot in .xml as that otherwise matches any character, and that the second command combines both criteria by using a character range ([ ... ]) containing all characters except $ (the .* matches any number of any characters).
The single quotes are necessary so that the shell won't interpret the regular expression as a shell globbing pattern.
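As a quick sanity check (the sample lines below are made up, not from the question):
printf 'a.xml\n$tmp.xml\nnotes.txt\n' | grep '\.xml$' | grep -v '^\$'
# a.xml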
You can also use the cat command to feed the file to grep, and then use a regular expression to filter out the unwanted lines; in this case, lines starting with the $ symbol match '^[$]'.
So you can use: cat file1.txt | grep '\.xml$' | grep -v '^[$]'.

Bash - remove all lines beginning with 'P'

I have a text file that's about 300KB in size. I want to remove all lines from this file that begin with the letter "P". This is what I've been using:
> cat file.txt | egrep -v P*
That isn't outputting anything to the console. I can use cat on the file without any other commands and it prints out fine. My final intention is to:
> cat file.txt | egrep -v P* > new.txt
No error appears; it just doesn't print anything out, and if I run the second command, new.txt is empty.
I should say I'm running Windows 7 with Cygwin installed.
Explanation
use ^ to anchor your pattern to the beginning of the line;
delete lines matching the pattern using sed and the d command.
Solution #1
cat file.txt | sed '/^P/d'
Better solution
Use sed-only:
sed '/^P/d' file.txt > new.txt
With awk:
awk '!/^P/' file.txt
Explanation
The condition starts with ! (negation), which negates the following pattern;
/^P/ means "match all lines starting with a capital P",
so the negated pattern means "ignore lines starting with a capital P".
Finally, it leverages awk's default behavior when the { … } action block is missing, which is to print every record that satisfies the condition.
So, to rephrase: it ignores lines starting with a capital P and prints everything else.
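A quick demonstration with made-up sample lines:
printf 'Pear\nApple\nPlum\n' | awk '!/^P/'
# Apple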
Note
sed is line oriented and awk is column oriented. For your case you should use the first one; see Edouard Lopez's response.
Use sed with in-place substitution (for GNU sed; this will also work for your Cygwin):
sed -i '/^P/d' file.txt
BSD (Mac) sed
sed -i '' '/^P/d' file.txt
Use start of line mark and quotes:
cat file.txt | egrep -v '^P.*'
P* means "P" zero or more times, so together with -v it gives you no lines at all.
^P.* means start of line, then P, then any character zero or more times.
Quoting is needed to prevent shell expansion.
This can be shortened to
egrep -v ^P file.txt
because .* is not needed; therefore quoting is not needed either, and egrep can read the data from the file.
As we don't use extended regular expressions, plain grep will also work fine:
grep -v ^P file.txt
Finally
grep -v ^P file.txt > new.txt
This works:
cat file.txt | egrep -v -e '^P'
-e indicates expression.

grepping string from long text

The command below in OSX checks whether an account is disabled (or not).
I'd like to grep the string "isDisabled=X" to create a report of disabled users, but am not sure how to do this since the output is on three lines, and I'm interested in the first 12 characters of line three:
bash-3.2# pwpolicy -u jdoe -getpolicy
Getting policy for jdoe /LDAPv3/127.0.0.1
isDisabled=0 isAdminUser=1 newPasswordRequired=0 usingHistory=0 canModifyPasswordforSelf=1 usingExpirationDate=0 usingHardExpirationDate=0 requiresAlpha=0 requiresNumeric=0 expirationDateGMT=12/31/69 hardExpireDateGMT=12/31/69 maxMinutesUntilChangePassword=0 maxMinutesUntilDisabled=0 maxMinutesOfNonUse=0 maxFailedLoginAttempts=0 minChars=0 maxChars=0 passwordCannotBeName=0 validAfter=01/01/70 requiresMixedCase=0 requiresSymbol=0 notGuessablePattern=0 isSessionKeyAgent=0 isComputerAccount=0 adminClass=0 adminNoChangePasswords=0 adminNoSetPolicies=0 adminNoCreate=0 adminNoDelete=0 adminNoClearState=0 adminNoPromoteAdmins=0
Your ideas/suggestions are most appreciated! Ultimately this will be part of a Bash script. Thanks.
This is how you would use grep to match "isDisabled=X":
grep -o "isDisabled=."
Explanation:
grep: invoke the grep command
-o: use the --only-matching option for grep (from the grep manual: "Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.")
"isDisabled=.": This is the search pattern you give to grep. The . is part of the regular expression, it means "match any character except for newline".
Usage:
This is how you would use it as part of your script:
pwpolicy -u jdoe -getpolicy | grep -oE "isDisabled=."
This is how you can save the result to a variable:
status=$(pwpolicy -u jdoe -getpolicy | grep -oE "isDisabled=.")
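If you only need the 0 or 1 on its own, Bash parameter expansion can strip the prefix (a small addition, not part of the original answer):
echo "${status#isDisabled=}"    # prints just 0 or 1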
If your command was run some time prior and its results were saved to a file called "results.txt", you can use that file as input to grep as follows:
grep -o "isDisabled=." results.txt
You can use sed as
cat results.txt | sed -n 's/.*isDisabled=\(.\).*/\1/p'
This will print the value of isDisabled.
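To turn this into the report of disabled users the question asks for, a minimal sketch could look like the following (the users.txt input list and the disabled_users.txt output name are assumptions, not from the original answers):
while read -r user; do
    status=$(pwpolicy -u "$user" -getpolicy 2>/dev/null | grep -o "isDisabled=.")
    if [ "$status" = "isDisabled=1" ]; then
        echo "$user" >> disabled_users.txt    # record accounts reported as disabled
    fi
done < users.txt    # users.txt is a hypothetical file with one account name per line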

Linux commands to output part of input file's name and line count

What Linux commands would you use, successively, for a bunch of files, to count the number of lines in each file and write to an output file a line containing part of the corresponding input file's name together with the line count? For example, if we were looking at the file LOG_Yellow and it had 28 lines, the output file would have a line like this (Yellow and 28 are tab separated):
Yellow 28
wc -l [filenames] | grep -v " total$" | sed s/[prefix]//
The wc -l generates the output in almost the right format; grep -v removes the "total" line that wc generates for you; sed strips the junk you don't want from the filenames.
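For example, with files named like LOG_Yellow from the question (the LOG_ prefix here is an assumption based on that example), the placeholders could be filled in like this:
wc -l LOG_* | grep -v " total$" | sed 's/LOG_//'    # LOG_ prefix assumed from the question's example
This still prints the count before the name; the awk reordering shown in the answers below produces the requested "Yellow 28" order.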
wc -l * | head --lines=-1 > output.txt
produces output like this:
linecount1 filename1
linecount2 filename2
I think you should be able to work from here to extend to your needs.
edit: since I haven't seen the rules for your name extraction, I still leave the full name. However, unlike the other answers I'd prefer to use head rather than grep, which not only should be slightly faster, but also avoids filtering out files named total*.
edit2 (having read the comments): the following does the whole lot:
wc -l * | head --lines=-1 | sed s/LOG_// | awk '{print $2 "\t" $1}' > output.txt
wc -l * | grep -v " total"
sends
28 Yellow
You can reverse the columns if you want (with awk, if you don't have spaces in the file names):
wc -l * | egrep -v " total$" | sed s/[prefix]// | awk '{print $2 " " $1}'
Short of writing the script for you:
'for' for looping through your files;
'echo -n' for printing the current file name;
'wc -l' for finding out the line count;
and don't forget to redirect ('>' or '>>') your results to your output file. A minimal sketch along these lines follows.
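This is only a sketch; the LOG_ prefix and file naming are taken from the question's example, not from this answer:
for f in LOG_*; do
    echo -n "${f#LOG_}"$'\t'    # file name without the LOG_ prefix, then a tab
    wc -l < "$f"                # line count alone, which also ends the line
done > output.txt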
