vimdiff files given in a text file - bash

I have a text file, files.txt, with the following entries:
"/home/dilawar/a.txt","/home/dilawar/b.txt"
"/home/dilawar/aa.txt","/home/dilawar/bb.txt"
Now I want to see the diff of the two files on line 1. I tried the following:
head -n 1 files.txt | cut -d, -f 2,3 | sed "s/,/\t/g" | xargs -I files vimdiff files
It does not work. Replacing vimdiff with diff did not work either. However, this works:
head -n 1 files.txt | cut -d, -f 1 | xargs -I file vim file
How can I pass the two file paths to diff as separate arguments rather than as a single string?
PS: To make matters worse, some of the file paths contain spaces.

First take the first line, then replace the quotes and the comma with spaces, and feed the result to vimdiff via command substitution.
vimdiff $(head -1 files.txt | tr '",' ' ')
The elegant method above will not work with names that contain a space. The dirty one below will.
awk -F, 'NR==1{print "vimdiff",$1,$2}' files.txt | bash

Try this and see if it helps:
sed '1{s/,/ /; s/^/diff /;q}' files.txt|sh

I also escaped the whitespace in the file paths (the first sed command):
head -n 1 files.txt | sed "s/ /\\\\ /g" | sed "s/[\",]/ /g" |xargs vimdiff
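The answers above all build a command string and hand it to a shell. A minimal sketch of another route, assuming every line has exactly the form "path","path" and that no path contains the sequence "," itself: split the line with parameter expansion so each path stays a single argument, spaces included. The helper name pair_from_line and the demo files are hypothetical, and diff stands in for vimdiff so the example can run non-interactively.

```shell
#!/usr/bin/env bash
# Split line N of a file of the form "path one","path two" into two
# separate values, keeping any spaces inside each path intact.
# pair_from_line is a hypothetical helper name, not part of any tool.
pair_from_line() {
    local list=$1 n=$2 line
    line=$(sed -n "${n}p" "$list")
    line=${line#\"}          # drop the leading quote
    line=${line%\"}          # drop the trailing quote
    left=${line%%\",\"*}     # text before the first ","
    right=${line#*\",\"}     # text after the first ","
}

# Demo with throwaway files; one path contains a space on purpose.
demo=$(mktemp -d)
printf 'one\ntwo\n'   > "$demo/a file.txt"
printf 'one\nthree\n' > "$demo/b.txt"
printf '"%s","%s"\n' "$demo/a file.txt" "$demo/b.txt" > "$demo/files.txt"

pair_from_line "$demo/files.txt" 1
diff "$left" "$right" || true   # swap diff for vimdiff when run interactively
```

Because "$left" and "$right" are quoted at the point of use, the shell never re-splits them, which is what the xargs attempts in the question could not guarantee.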

Related

Pass a list of files to sed to delete a line in them all

I am trying to do a one liner command that would delete the first line from a bunch of files. The list of files will be generated by grep command.
grep -l 'hsv,vcv,tro,ztk' ${OUTPUT_DIR}/*.csv | tr -s "\n" " " | xargs /usr/bin/sed -i '1d'
The problem is that sed can't see the list of files to act on. I'm not able to work out what is wrong with the command. Can someone please point out my mistake?
By default, sed counts line numbers across all input files, so without -i or -s the address 1 matches only once per sed invocation.
GNU sed's -i implies -s, which restarts the count for each file.
If you would rather not rely on that, a loop such as this runs sed once per file:
grep -l 'hsv,vcv,tro,ztk' "${OUTPUT_DIR}/"*.csv |
while IFS= read -r file; do
    sed -i '1d' "$file"
done
This might work for you (GNU sed and grep):
grep -l 'hsv,vcv,tro,ztk' ${OUTPUT_DIR}/*.csv | xargs sed -i '1d'
The -l option outputs the names of the matching files, which xargs passes to sed as arguments.
The -i option edits each file in place, and 1d removes its first line.
N.B. The -i option in sed works at a per-file level. To get per-file line numbers when the files are read as a stream, use the -s option.
Apart from the one posted by Dan above, the only solution that worked for me is this:
for k in $(grep -l 'hsv,vcv,tro,ztk' ${OUTPUT_DIR}/*.csv | tr -s "\n" " ")
do
/usr/bin/sed -i '1d' "${k}"
done
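Neither the xargs pipeline nor the for loop over $(...) survives file names that contain spaces. A minimal sketch of a NUL-delimited variant, assuming GNU grep, GNU xargs and GNU sed (the -Z, -0, -r and suffix-less -i options are GNU behaviour); OUTPUT_DIR here is a throwaway demo directory, not the asker's real one:

```shell
#!/usr/bin/env bash
# Assumes GNU grep, xargs and sed. OUTPUT_DIR is a throwaway demo directory.
OUTPUT_DIR=$(mktemp -d)
printf 'hsv,vcv,tro,ztk\n1,2,3,4\n' > "$OUTPUT_DIR/with space.csv"
printf 'hsv,vcv,tro,ztk\n5,6,7,8\n' > "$OUTPUT_DIR/plain.csv"
printf 'other,header\n9,9\n'        > "$OUTPUT_DIR/untouched.csv"

# -l lists matching files and -Z ends each name with a NUL byte;
# xargs -0 splits on NUL, and -r skips sed entirely if nothing matched.
grep -lZ 'hsv,vcv,tro,ztk' "$OUTPUT_DIR"/*.csv | xargs -0 -r sed -i '1d'
```

Files that do not contain the header are never passed to sed, so they are left as they were.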

File name comparison in Bash

I have two files, each containing a list of file names. I need to find which files are missing from the second list. The catch is that I must not match the full name, only the last 19 characters of each file name.
E.g.
MyFile12343220150510230000.xlsx
and
MyFile99999620150510230000.xlsx
are the same file.
This is a unique problem and I don't know how to start. Kindly help.
An awk-based solution:
awk '
{start=length($0) - 18;}   # position where the last 19 characters begin
NR==FNR{a[substr($0, start)]++; next;}   # save the last 19 characters of every line in file2.list
{if(!a[substr($0, start)]) print $0;}   # if that suffix is not present in file2.list, print the line from file.list
' file2.list file.list
First, you can use comm to find the exact file-name matches and obtain a list of the names that do not match. Then you can try agrep on those. I've never used it, but you might find it useful.
Or, as a last option, you can use brute force and search the second file for every line of the first:
#!/bin/bash
# Iterate through the first file
while IFS= read -r LINE; do
    # Find the section of the filename that has to match in the other file
    CHECK_SECTION="$(echo "$LINE" | sed -nre 's/^.*([0-9]{14})\.(.*)$/\1.\2/p')"
    # Create a regex to match the filenames in the second file
    SEARCH_REGEX="^.*$CHECK_SECTION$"
    # Search (grep -E replaces the deprecated egrep)
    grep -E "$SEARCH_REGEX" inputFile_2.txt
done < inputFile_1.txt
Here I assumed the filenames end with 14 digits that must match in the other file and a file extension that can be different from file to file but that has to match too:
MyFile12343220150510230000.xlsx
| variable | 14digits |.ext
So, if the first file is FILE1 and the second file is FILE2 then if the intention is only to identify the files in FILE2 that don't exist in FILE1, the following should do:
tmp1=$(mktemp)
tmp2=$(mktemp)
cat $FILE1 | rev | cut -c -19 | sort | uniq > ${tmp1}
cat $FILE2 | rev | cut -c -19 | sort | uniq > ${tmp2}
diff ${tmp1} ${tmp2} | rev
rm ${tmp1} ${tmp2}
In a nutshell, this reverses the characters on each line, and extracts the part you're interested in, saving to a temporary file, for each list of files. The reversal of characters is done since you haven't said whether or not the length of filenames is guaranteed to be constant---the only thing we can rely on here is that the last 19 characters are of a fixed format (in this case, although the format is easily inferred, it isn't really relevant). The sort is important in order for the diff to show you what's not in the second file that is in the first.
If you're certain that there will only ever be files missing from FILE2 and not the other way around (that is, files in FILE2 that don't exist in FILE1), then you can clean things up by removing the cruft introduced by diff, so the last line becomes:
diff ${tmp1} ${tmp2} | rev | grep -i xlsx | sed 's/[[:space:]]\+.*//'
The grep limits the output to those lines with xlsx filenames, and the sed removes everything on a line from the first space encountered onwards.
Of course, technically this only tells you which timestamp groups of files exist in FILE1 but not in FILE2. As I understand it, this is what you're looking for (my understanding of your problem description is that MyFile12343220150510230000.xlsx and MyFile99999620150510230000.xlsx would have identical content). If the file names are always the same length (as you subsequently affirmed), then there's no need for the rev commands, and the cut commands can be amended to refer to fixed character positions.
In any case, to get the final list of files, you'll have to use the "cleaned up" output to filter the content of FILE1; so, modifying the script above so that it includes the "cleanup" command, we can filter the files that you need using a grep--the whole script then becomes:
tmp1=$(mktemp)
tmp2=$(mktemp)
missing=$(mktemp)
cat $FILE1 | rev | cut -c -19 | sort | uniq > ${tmp1}
cat $FILE2 | rev | cut -c -19 | sort | uniq > ${tmp2}
diff ${tmp1} ${tmp2} | rev | grep -i xlsx | sed 's/[[:space:]]\+.*//' > ${missing}
grep -E "("`echo $(<${missing}) | sed 's/[[:space:]]/|/g'`")" "$FILE1"
rm ${tmp1} ${tmp2} ${missing}
The extended grep command (-E) just builds up an "or" regular expression for each timestamp-plus-extension and applies it to the first file. Of course, this is all assuming that there will never be timestamp-groups that exist in FILE2 and not in FILE1--if this is the case, then the "diff output processing" bit needs to be a little more clever.
Or you could use your standard coreutil tools:
for i in $(cat file1 file2 | sort | uniq -u); do
    grep -q "$i" file1 && \
        echo "file2 missing '$i'" || \
        echo "file1 missing '$i'"
done
It will identify which non-common entries are missing from which file. You can also manipulate the non-common filenames in any way you like, e.g. parameter expansion/substring extraction, substring removal, or character indexes.
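To make the suffix comparison concrete, here is a small sketch that combines the ideas above with comm. It assumes every name of interest is at least 19 characters long and that a 19-character suffix appearing elsewhere in a name is not a concern, since grep -F matches the key anywhere on the line. The helper name suffix_keys, the list names and the sample entries are made up for illustration.

```shell
#!/bin/sh
# Print the distinct 19-character suffixes of the names listed in a file.
# suffix_keys is a hypothetical helper name.
suffix_keys() {
    sed -n 's/.*\(.\{19\}\)$/\1/p' "$1" | sort -u
}

# Throwaway sample lists: the second list lacks the 20150511 timestamp.
work=$(mktemp -d)
printf '%s\n' MyFile12343220150510230000.xlsx MyFile12343220150511230000.xlsx > "$work/list1"
printf '%s\n' MyFile99999620150510230000.xlsx > "$work/list2"

suffix_keys "$work/list1" > "$work/keys1"
suffix_keys "$work/list2" > "$work/keys2"

# comm -23 keeps the lines that appear only in the first (sorted) input.
comm -23 "$work/keys1" "$work/keys2" > "$work/missing_keys"

# Map the missing suffixes back to the full names from the first list.
missing=$(grep -F -f "$work/missing_keys" "$work/list1")
printf '%s\n' "$missing"
```

Swapping the two arguments to comm gives the names present only in the second list.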

Print lines with sed using line number from grep

I'm trying to pipe line numbers from grep to sed.
First I was extracting the start and end line of what I want to print with sed:
grep -n "Start" file1 | cut -d: -f 1 | head -n 1
grep -n "End" file1 | cut -d: -f 1 | head -n 1
Now I need to use these numbers to print everything from Start to End by line. E.g.
sed -ne '1,30w output1' file1
I'm not sure how this can be done as piping the line numbers to sed will be seen as "input" right?
Example:
Start
some text
some more text
End
Start
some text
some more text
End
As there is more than one Start and End, I cut off the rest of the line numbers from grep. Am I supposed to combine grep and sed, or is this not possible?
You can do it without grep
sed -n '/Start/,/End/w output1' file1
should work.
It looks like you want to print from the first occurrence of Start to the first subsequent occurrence of End, inclusive. That'd just be:
awk '/Start/{found=1} found{print; if (/End/) exit}' file
This might work for you (GNU sed):
sed -ne '/Start/,/End/w outputfile' -e '/End/q' file
This writes the lines from the first Start to the first End into outputfile and then quits, which also removes the need for grep.
If you must use grep then perhaps:
sed -n "$(grep -n "Start" file | cut -d: -f 1 | head -n 1),$(grep -n "End" file | cut -d: -f 1 | head -n 1)"'p' file
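On the original worry that piped line numbers would be read as input: command substitution places them in the sed script itself, not on standard input. A minimal sketch with the numbers held in variables first, assuming the sample data from the question; looking for End only at or after the Start line with awk is my own choice, not something from the answers above.

```shell
#!/bin/sh
# Throwaway sample file: two Start/End blocks after an intro line.
file=$(mktemp)
printf '%s\n' intro Start 'some text' 'some more text' End \
              Start 'second block' End > "$file"

# Line number of the first Start.
start=$(grep -n 'Start' "$file" | cut -d: -f1 | head -n 1)

# Line number of the first End at or after that Start (assumption: an End
# that appears before the first Start should be ignored).
end=$(awk -v s="$start" 'NR >= s && /End/ { print NR; exit }' "$file")

# The numbers become part of the sed address, not part of its input.
block=$(sed -n "${start},${end}p" "$file")
printf '%s\n' "$block"
```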

grep pipe with sed

This is my bash command
grep -rl "System.out.print" Project1/ |
xargs -I{} grep -H -n "System.out.print" {} |
cut -f-2 -d: |
sed "s/\(.*\):\(.*\)/filename is \1 and line number is \2/"
What I'm trying to do here is iterate through the subfolders and check which files contain "System.out.print" (using grep).
The second grep gets the file names and line numbers.
The sed command displays those on the console.
From here I want to replace "System.out.print" with "XXXXX". How can I pipe a sed command into this?
Please help.
Thanks.
GNU sed has an option to change files in place:
find Project1/ -type f | xargs sed -i 's/System\.out\.print/XXXXX/g'
Btw, your script could be written as:
grep -rsn 'root' /etc/ |
awk -F: '{ print "filename is", $1, "and line number is", $2 }'
I'm just building on hop's answer, which I found to be more useful than find -exec. I had search_text dispersed all over my computer, in logs, config files and so on, but I didn't want to search (or especially change) anything in /dev, /sys, /proc, and so on. One note, read man xargs; it doesn't like file names with spaces.
grep -HriIl --exclude-dir=dev --exclude-dir=proc --exclude-dir=sys search_text / | xargs sed -i 's/search_text/replace_text/g'
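The warning about xargs and spaces in file names can be addressed with NUL-delimited names. A minimal sketch, assuming GNU grep, xargs and sed; the project directory and its two files are throwaway demo data:

```shell
#!/usr/bin/env bash
# Assumes GNU grep, xargs and sed. The project tree is throwaway demo data.
project=$(mktemp -d)
mkdir -p "$project/src dir"
printf 'System.out.println("hi");\n' > "$project/src dir/Main.java"
printf 'int x = 1;\n'                > "$project/Other.java"

# -r recurses, -l prints only file names, -Z ends each name with NUL,
# -F treats the pattern as a fixed string; xargs -0 reads NUL-separated
# names, and -r avoids running sed when there are no matches.
grep -rlZF 'System.out.print' "$project" |
    xargs -0 -r sed -i 's/System\.out\.print/XXXXX/g'
```

Only files that grep reported are rewritten, so unrelated files keep their timestamps and contents.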

Linux commands to output part of input file's name and line count

What Linux commands would you use in succession, for a bunch of files, to count the number of lines in each file and write a line to an output file that includes part of the corresponding input file's name? For example, if we were looking at the file LOG_Yellow and it had 28 lines, then the output file would have a line like this (Yellow and 28 are tab-separated):
Yellow 28
wc -l [filenames] | grep -v " total$" | sed s/[prefix]//
The wc -l generates the output in almost the right format; grep -v removes the "total" line that wc generates for you; sed strips the junk you don't want from the filenames.
wc -l * | head --lines=-1 > output.txt
produces output like this:
linecount1 filename1
linecount2 filename2
I think you should be able to work from here to extend to your needs.
edit: since I haven't seen the rules for your name extraction, I still leave the full name. However, unlike other answers, I'd prefer to use head rather than grep, which should not only be slightly faster but also avoids filtering out files named total*.
edit2 (having read the comments): the following does the whole lot:
wc -l * | head --lines=-1 | sed s/LOG_// | awk '{print $2 "\t" $1}' > output.txt
wc -l *| grep -v " total"
gives
28 Yellow
You can reverse the columns if you want (with awk, provided there are no spaces in the file names):
wc -l *| egrep -v " total$" | sed s/[prefix]// | awk '{print $2 " " $1}'
Short of writing the script for you:
'for' for looping through your files.
'echo -n' for printing the current file
'wc -l' for finding out the line count
And don't forget to redirect ('>' or '>>') your results to your output file.
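The loop outlined above could be sketched as follows, assuming the files share the LOG_ prefix from the question; the demo directory, its two files and the output file name are made up for illustration.

```shell
#!/bin/sh
# Throwaway demo files named after the LOG_ example in the question.
logs=$(mktemp -d)
printf 'a\nb\nc\n' > "$logs/LOG_Yellow"
printf 'a\n'       > "$logs/LOG_Red"
out="$logs/output.txt"

for f in "$logs"/LOG_*; do
    name=${f##*/}            # strip the directory part
    name=${name#LOG_}        # strip the LOG_ prefix
    count=$(wc -l < "$f")    # reading from stdin keeps the file name out of the output
    # $((count)) drops any padding some wc implementations add.
    printf '%s\t%s\n' "$name" "$((count))"
done > "$out"                # one redirect for the whole loop
```

Redirecting once after done, instead of appending inside the loop, means the output file is opened a single time and starts empty on every run.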
