Unix diff utility - how to generate an output file without < or >? - bash

I am using diff to see the differences between two files.
It generates output like:
some numbers here
< gi|description1
< ADGCAAAGGCC
---
> gi|description3
> GGCCTAAGGGG
Can I produce output like this instead:
gi|description1
ADGCAAAGGCC
gi|description3
GGCCTAAGGGG
without the <, > and --- markers and the first line of numbers?
Thanks a lot.

Certainly you can modify the output of the diff utility to your liking. After all, it is a utility in the typical Unix tradition, so one would expect it to be tweakable in all directions :-)
The man page explains the options and points out the --old-line-format, --new-line-format and --unchanged-line-format options for this. Take a look yourself: man diff
This leads to a command like this:
diff --unchanged-line-format="" --old-line-format="%L" --new-line-format="%L" file1 file2
It outputs only the lines that have changed, and for those simply the literal old and new text without any markers, which is what you want according to your example.
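For comparison outside of diff, here is a minimal Python sketch of the same idea, assuming both files fit in memory. It uses the standard library's difflib.SequenceMatcher rather than GNU diff's algorithm, so on some inputs it may group changes differently, but it likewise skips unchanged lines and emits the old lines of each changed block followed by the new ones, with no markers. The function name changed_lines is hypothetical.

```python
import difflib


def changed_lines(old, new):
    """Return the lines that differ between old and new (lists of lines),
    old side first within each changed block, with no diff markers."""
    out = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # like --unchanged-line-format=""
        out.extend(old[i1:i2])  # like --old-line-format="%L" (empty for inserts)
        out.extend(new[j1:j2])  # like --new-line-format="%L" (empty for deletes)
    return out
```

To mimic the diff command, read both files with readlines() and write the result with sys.stdout.writelines(changed_lines(old, new)).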

Related

Finding a newline in a CSV file

I know there are a lot of questions about this (latest one here), but almost all of them ask how to join those broken lines back into one or how to remove them from a CSV file. I don't want to remove them; I just want to display/find those lines (or possibly their line numbers).
Example data:
22224,across,some,text,0,,,4 etc
33448,more,text,1,,3,,,4 etc
abcde,text,number,444444,0,1,,,, etc
358890,more
,text,here,44,,,, etc
abcdefg,textds3,numberss,413,0,,,,, etc
985678,93838,text,,,,
,text,continuing,from,previous,line,,, etc
After more searching, I know I shouldn't use bash to accomplish this but rather Perl. I tried (snippets from various websites; I don't know Perl), but apparently I don't have the Text::CSV module and I don't have permission to install it.
As I said, I have no idea how to even start, so I don't have any script. This is not a Windows file; it is very much a Unix file, so we can ignore the CR problem.
Desired output:
358890,more
,text,here,44,,,, etc
985678,93838,text,,,,
,text,continuing,from,previous,line,,, etc
or
Line 4: 358890,more
,text,here,44,,,, etc
Line 7: 985678,93838,text,,,,
,text,continuing,from,previous,line,,, etc
Much appreciated.
You can use Perl to count the number of fields (commas) and keep appending the next line until the record reaches the correct number. Replace 28 with the number of commas a complete record has in your file:
perl -ne 'if(tr/,/,/<28){$line=$.;while(tr/,/,/<28 && !eof){$_.=<>}print "Line $line: $_"}' file
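The same comma-counting logic as a small Python sketch, in case Perl isn't an option either. It reports the starting line number of every record whose first physical line has too few commas, joining the following lines until the expected count is reached. expected_commas plays the role of 28 above; like the one-liner, it does not understand quoted fields that contain commas. The function name is hypothetical.

```python
def find_broken_records(lines, expected_commas):
    """Yield (line_number, record) for records split across physical lines.

    A record counts as broken when its first physical line has fewer than
    expected_commas commas; following lines are appended until the count is
    reached or the input runs out. Quoted fields containing commas are not
    handled, just as in the Perl one-liner.
    """
    numbered = enumerate(lines, start=1)
    for lineno, line in numbered:
        if line.count(",") >= expected_commas:
            continue
        record = line
        for _, nxt in numbered:  # consumes lines from the same iterator
            record += nxt
            if record.count(",") >= expected_commas:
                break
        yield lineno, record
```

Printing each result with print(f"Line {n}: {record}", end="") gives the "Line 4: ..." style of output asked for in the question.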
I do love Perl, but I don't think it is the best tool for this job.
If you want a report of all lines that DO NOT have exactly the correct number of commas/delimiters, you could use awk.
For example, this command:
/usr/bin/awk -F , 'NF != 8' < csv_file.txt
will print all lines that DO NOT have exactly 7 commas. -F , sets the comma as the field separator, and NF holds the number of fields on the current line, so 8 fields means 7 commas.
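Since the question also asks for line numbers: in awk itself NR holds the current line number, so awk -F , 'NF != 8 {print "Line " NR ": " $0}' csv_file.txt adds them. Below is a Python sketch of the same check that mirrors awk's field counting, including NF being 0 for an empty line. The function name is hypothetical.

```python
def lines_with_wrong_field_count(lines, expected_fields):
    """Return (line_number, text) for every line whose number of
    comma-separated fields differs from expected_fields, like
    awk -F , 'NF != expected_fields'."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        # awk sets NF to 0 for an empty line; otherwise fields = commas + 1
        nf = 0 if text == "" else text.count(",") + 1
        if nf != expected_fields:
            bad.append((lineno, text))
    return bad
```

With expected_fields=8 this flags the same lines as the awk command above.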

Display only the differences using diffutils

I am using GNU diffutils to output the differences between two SQL files with the following command:
diff -e abc.sql abcd.sql >diff.sql
But the saved output also shows the line number where each difference is found before it, and a full stop after it.
I want to display only the differences, without the line numbers and full stops.
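diff --line-format=%L file1 file2
From man diff:
--line-format=LFMT
format all input lines with LFMT
LFMT (only) may contain:
%L contents of line
Because --line-format applies to all input lines, unchanged lines are printed too. To print only the changed lines, set the per-type formats separately and leave the unchanged one empty:
diff --unchanged-line-format="" --old-line-format="%L" --new-line-format="%L" file1 file2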

Using the result of cloc (Count Lines of Code)

I am writing a script for my research, and I want to get the total number of lines in a source file. I came across cloc and I think I am going to use it in my script.
However, cloc gives too much information (unfortunately, as a new member I cannot upload a screenshot). It reports the number of files, lines, blank lines and comment lines, plus other table formatting.
I am only interested in the number of lines, to use in my calculations. Is there an easy way to get that number, maybe via a command-line option? (I went through the available options and didn't find anything useful for my case.)
I thought of applying a regular expression to the result to get the number; however, this is my first time using cloc and there might be a better, more professional way of doing it.
Any thought?
Regards,
Arwa
I am not sure about cloc, but it is worth considering the standard shell commands.
Please have a look at this question.
To get the number of lines in each file individually:
find . -type f -name '*.*' -print0 | xargs -0 wc -l
To get the total number of lines in a directory:
find . -type f -name '*.*' -print0 | xargs -0 cat | wc -l
Note that these count every line, including blank and comment lines. If you need the number of lines only from files with a specific extension, use '*.ext' instead of '*.*', e.g. '*.rb' for Ruby.
For something very quick and simple you could just use:
Dir.glob('your_directory/**/*.rb').map do |file|
  File.foreach(file).count
end.reduce(0, :+)
This counts all the lines of .rb files in your_directory and its subdirectories, although I would recommend adding some handling for blank lines as well as comment lines. For more on Dir::glob, see its documentation.
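Following the suggestion to handle blank and comment lines, here is a rough Python sketch that counts only lines that are neither blank nor whole-line comments. It assumes # line comments (as in Ruby) and deliberately ignores block comments such as =begin/=end and # inside strings, which a real tool like cloc handles. The function names are hypothetical.

```python
from pathlib import Path


def code_line_count(lines, comment_prefix="#"):
    """Count lines that are neither blank nor whole-line comments."""
    count = 0
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count


def count_code_lines(root, pattern="*.rb", comment_prefix="#"):
    """Sum code_line_count over every file under root (recursively) matching pattern."""
    total = 0
    for path in Path(root).rglob(pattern):
        if path.is_file():
            with path.open(encoding="utf-8", errors="replace") as f:
                total += code_line_count(f, comment_prefix)
    return total
```

count_code_lines("your_directory") is then the rough equivalent of the Ruby snippet, minus blank and comment lines.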
@BinaryMee and @engineersmnky, thanks for your responses.
I tried two different solutions. The first uses "readlines", from @gicappa's answer to
Count the length (number of lines) of a CSV file?
The second uses cloc. I ran the command
%x{perl #{ClocPath} #{path_to_file} > result.txt}
and saved the result in result.txt
cloc returns its result as a formatted table (I cannot upload an image); it reports the number of blank lines, comment lines, and code lines. As I said, I am only interested in code lines, so I opened the file and used a regular expression to get the number I needed.
content = File.read("#{path}/result.txt")
line = content.scan(/(\s+\d+\s+\d+\s+\d+\s+\d+)/)
total = line[0][0].split(' ').last
Here content holds the contents of the file, and line captures this row from it:
C# 1 3 3 17
C# is the language of the file, 1 is the number of files, 3 is the number of blank lines, 3 is the number of comment lines, and 17 is the number of code lines. I worked out this format from the cloc script itself. total will then hold 17 (as the string "17"; call .to_i if you need an integer).
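The regular-expression step can be made a bit more robust by matching whole rows and naming the columns. A Python sketch, assuming rows look like the C# example above (language name, then files, blank, comment and code counts); it also picks up a summary row such as SUM: if your output has one. Depending on your cloc version, a machine-readable mode such as --csv may be easier to parse than the table; check cloc --help. The names ROW and code_lines_by_language are hypothetical.

```python
import re

# One cloc-style table row: a language name followed by four integers
# (files, blank, comment, code). The layout is assumed from the example row.
ROW = re.compile(
    r"(?P<lang>\S.*?)\s+(?P<files>\d+)\s+(?P<blank>\d+)"
    r"\s+(?P<comment>\d+)\s+(?P<code>\d+)\s*$"
)


def code_lines_by_language(report):
    """Return {language: code_lines} for every row of report matching ROW."""
    result = {}
    for line in report.splitlines():
        match = ROW.match(line.strip())
        if match:
            result[match.group("lang")] = int(match.group("code"))
    return result
```

Unlike taking the first match, this also works when the report covers more than one file or language.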
This solution works if you are counting a single file only; you will need to extend it if you are counting lines across more than one file.
Hopefully this will help anyone who needs it.
Regards,
Arwa

How to run a shell command and use the result in a Fortran program?

Is it possible to call a shell command from a Fortran program?
My problem is that I analyze really big files. These files have a lot of lines, e.g. 84084002 or similar.
I need to know how many lines the file has before I start the analysis, so I usually ran the shell command wc -l "filename" and then used this number as the value of a variable in my program.
But I would like to call this command from my program and store the number of lines in that variable.
Since Fortran 2008 there is a standard intrinsic subroutine, execute_command_line, already implemented by most commonly encountered compilers including gfortran, which does approximately what the widely implemented but non-standard subroutine system does. As @MarkSetchell has (almost) written, you could try
CALL execute_command_line('wc -l < file.txt > wc.txt' )
OPEN(unit=nn,file='wc.txt')
READ(nn,*) count
What Fortran doesn't have is a standard way to get the number of lines in a file without resorting to an operating-system-dependent workaround like the one above, other than opening the file, counting the lines, and then rewinding to the start of the file to begin reading.
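The count-then-rewind approach is easy to sketch; here it is in Python (in Fortran the rewind step would be the REWIND statement). One small difference from wc -l: wc counts newline characters, so a last line without a trailing newline is not counted, whereas iterating over lines, as below, counts it. The function name is hypothetical.

```python
import io


def count_lines_then_rewind(f):
    """Count the lines of an open, seekable text file, then seek back to the
    start so the caller can begin the real reading pass."""
    n = sum(1 for _ in f)  # one pass over the file just to count lines
    f.seek(0)              # the equivalent of Fortran's REWIND
    return n


if __name__ == "__main__":
    demo = io.StringIO("first\nsecond\nthird\n")
    count = count_lines_then_rewind(demo)
    print(count, repr(demo.readline()))  # prints: 3 'first\n'
```

For a real file: with open("file.txt") as f: count = count_lines_then_rewind(f), then keep reading from f.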
You should be able to do something like this:
command='wc -l < file.txt > wc.txt'
CALL system(command)
OPEN(unit=nn,file='wc.txt')
READ(nn,*) count
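A variant that avoids the temporary wc.txt file is to read the command's standard output directly. Fortran has no standard way to capture output from execute_command_line, but in a language that can, it looks like the following Python sketch. It assumes a POSIX wc on the PATH; the function name is hypothetical.

```python
import subprocess


def first_int_from_command(args, stdin=None):
    """Run args directly (no shell), return the first whitespace-separated
    field of its standard output as an int. Raises CalledProcessError if the
    command exits with a non-zero status."""
    result = subprocess.run(args, stdin=stdin, stdout=subprocess.PIPE, check=True)
    # wc -l prints e.g. "84084002\n" (GNU) or "  84084002\n" (BSD); split() handles both
    return int(result.stdout.split()[0])
```

The equivalent of wc -l < file.txt is then: with open("file.txt", "rb") as f: count = first_int_from_command(["wc", "-l"], stdin=f).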
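You can write the number of lines to a file (fort.1):
wc -l file|awk '{print $1}' > fort.1
In your Fortran program, you can then store the number of lines in a variable (e.g. count) by reading from unit 1, which compilers such as gfortran connect to a file named fort.1 when the unit has not been opened explicitly:
read (1,*) count
Then you can loop count times and read your whole file:
do i = 1, count
   read (file,*)   ! file: unit number of your opened data file; add your input list here
end do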

Opposite of Linux Split

I had a huge file, so I split it into several small chunks to divide and conquer. Now I have a folder that contains a list of files like these:
output_aa   # (produced by: cat input_aa | python parse.py > output_aa)
output_ab
output_ac
output_ad
...
I am wondering whether there is a way to merge those files back together, FOLLOWING THE INDEX ORDER.
I know I could do it by using
cat * > output.all
but I am more curious whether some magical command that comes with split already exists.
The magic command would be:
cat output_* > output.all
There is no need to sort the file names as the shell already does it (*).
As its name suggests, cat was originally designed precisely to conCATenate files, which is basically the opposite of split.
(*) Edit:
Should you use a (hypothetical?) locale whose collating order for a-z is not abcdefghijklmnopqrstuvwxyz, here is one way to overcome the issue (the glob has to be expanded by the inner shell, which is why the command is in single quotes):
LC_ALL=C sh -c 'cat output_* > output.all'
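For completeness, the same merge can be sketched in Python, which sidesteps the locale question by sorting file names by Unicode code point; for ASCII suffixes like aa, ab, ... that is the same order LC_ALL=C gives. The output_ prefix and the output.all name follow the question; the function names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def chunk_order(names):
    """Sort chunk names by code point, like the C locale (aa < ab < ... < ba)."""
    return sorted(names)


def merge_chunks(directory, prefix, dest):
    """Concatenate every file directory/prefix* into dest, in chunk_order."""
    dest_path = Path(dest).resolve()
    names = chunk_order(
        p.name for p in Path(directory).glob(prefix + "*")
        if p.is_file() and p.resolve() != dest_path  # never read the output file
    )
    with open(dest_path, "wb") as out:
        for name in names:
            with open(Path(directory) / name, "rb") as src:
                shutil.copyfileobj(src, out)  # byte-for-byte copy, like cat
    return names


if __name__ == "__main__":
    # Demo in a throwaway directory: create three chunks out of order, merge them.
    with tempfile.TemporaryDirectory() as tmp:
        for name, text in [("output_ab", "2\n"), ("output_aa", "1\n"), ("output_ac", "3\n")]:
            (Path(tmp) / name).write_text(text)
        merged = Path(tmp) / "output.all"
        print(merge_chunks(tmp, "output_", merged))
        print(merged.read_text(), end="")
```

As with cat, the result only reverses split if the chunk names sort in the order split created them.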
There are other ways to concatenate files, but there is no magical "opposite of split" in "Linux".
Of course, talking about "Linux" in general is a bit far-fetched, as distributions ship different tools (even the default shell varies: sh, bash, csh, zsh, ksh, ...), but for Debian-based Linux at least, I don't know of any distribution that provides such a tool.
For sorting, you can use the sort command.
Also be aware that redirecting stdout with ">" overwrites any existing contents, while ">>" appends to an existing file.
I don't want to copy, but to make this answer complete: what jlliagre said about the cat command should of course also be considered. cat was made to con-"cat"-enate files, which makes it possible to reverse the split command, but only if you use the same file ordering. So it is not exactly the "opposite of split", but it will work that way in close to 100% of cases (see the comments under jlliagre's answer for specifics).
