Bash script deletes first row in CSV during editing with 'cut'

I have a little problem with my bash script. The script clones repositories into my local folder and then searches those repositories for files with specified extensions (results go to CSVtemporary2/ASSETS-LIST-"$dir".csv), then edits the results by cutting the first 4 elements of each path (results go to /CSVtoGD2/"$dir".csv).
Every row of the CSV is a path to a file, e.g. /users/krzysztofpaszta/temporaryprojects/gamex/car1.png
/users/krzysztofpaszta/temporaryprojects/gamex/sound1.mp3
(...)
Every part of the script works fine, but after cutting the first 4 elements of the paths, the first row is deleted. I do not know why this is happening.
So results should be:
/car1.png
/sound1.mp3
(...)
But results are:
/sound1.mp3
(...)
In other words, the files CSVtemporary2/ASSETS-LIST-"$dir".csv are fine, but the files
/CSVtoGD2/"$dir".csv have their first row deleted. Does anyone have an idea why this is happening?
#!/bin/bash
rm -vrf /Users/krzysztofpaszta/CSVtoGD2/*
rm -vrf /Users/krzysztofpaszta/CSVtemporary2/*
cd /Users/krzysztofpaszta/temporaryprojects
for repo in $(cat /users/krzysztofpaszta/repolinks.csv); do
git clone "$repo"
dir=${repo##*/}
find /users/krzysztofpaszta/temporaryprojects/"$dir" -name "*.fnt" -o -name "*.png" -o -name "*.ttf" -o -name "*.asset" -o -name "*.jpeg" -o -name "*.tga" -o -name "*.tif" -o -name "*.bmp" -o -name "*.jpg" -o -name "*.fbx" -o -name "*.prefab" -o -name "*.flare" -o -name "*.ogg" -o -name "*.wav" -o -name "*.anim" -o -name "*.mp3" -o -name "*.tiff" -o -name "*.otf" -o -name "*.hdr" >> /users/krzysztofpaszta/CSVtemporary2/ASSETS-LIST-"$dir".csv
while read in ; do
cut -d'/' -f6- >> /users/krzysztofpaszta/CSVtoGD2/"$dir".csv #| awk 'BEGIN{print"//"}1' - adding first empty row is not the solution, first row with text is still deleted
done < /users/krzysztofpaszta/CSVtemporary2/ASSETS-LIST-"$dir".csv
done
#rm -vrf /Users/krzysztofpaszta/temporaryprojects/*
#echo Repo deleted

Your syntax
while read in ; do
cut -d'/' -f6- >> /users/krzysztofpaszta/CSVtoGD2/"$dir".csv
done < /users/krzysztofpaszta/CSVtemporary2/ASSETS-LIST-"$dir".csv
will not work as you expect.
The read command reads the first line of the CSV file and assigns it to a variable named in.
Then the remaining lines are fed to the cut command via stdin, so the first line never reaches cut.
Instead you can say:
while IFS= read -r line; do
cut -d'/' -f6- <<< "$line" >> /users/krzysztofpaszta/CSVtoGD2/"$dir".csv
done < /users/krzysztofpaszta/CSVtemporary2/ASSETS-LIST-"$dir".csv
Actually you don't even need to use the while loop:
cut -d'/' -f6- < /users/krzysztofpaszta/CSVtemporary2/ASSETS-LIST-"$dir".csv > /users/krzysztofpaszta/CSVtoGD2/"$dir".csv
Besides, there are several points that could be improved:
The pathnames mix /Users and /users; one of them is probably a typo.
You can use the while .. read .. loop instead of
for repo in $(cat path/to.csv)
You don't need to create a temporary file for the output of find.
You can feed the output of find directly to the cut command via a pipeline, as in the sketch below.
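Putting those points together, the loop might look roughly like the following. This is only a sketch based on the paths in the question (the extension list is shortened here), not a tested drop-in replacement:
#!/bin/bash
rm -vrf /Users/krzysztofpaszta/CSVtoGD2/*
cd /Users/krzysztofpaszta/temporaryprojects || exit 1
while IFS= read -r repo; do
  git clone "$repo"
  dir=${repo##*/}
  # pipe find straight into cut; no ASSETS-LIST temporary file needed
  find /Users/krzysztofpaszta/temporaryprojects/"$dir" -type f \
    \( -name "*.png" -o -name "*.mp3" -o -name "*.fbx" \) |
    cut -d'/' -f6- > /Users/krzysztofpaszta/CSVtoGD2/"$dir".csv
done < /Users/krzysztofpaszta/repolinks.csv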

Related

How to run a command (1000 times) that requires two different types of input files

I have calculated directed modularity by means of DirectedLouvain (https://github.com/nicolasdugue/DirectedLouvain). I am now trying to test the significance of the values obtained by means of a null model. To do that I need to run one of the DirectedLouvain commands 1000 times, over 1000 different input files.
Following @KamilCuk's recommendations I have used this code, which takes the 1000 *.txt input files and generates 1000 *.bin files and 1000 *.weights files. It worked perfectly:
find -type f -name '*.txt' |
while IFS= read -r file; do
file_no_extension=${file##*/};
file_no_extension=${file_no_extension%%.*}
./convert -i "$file" -o "$file_no_extension".bin -w "$file_no_extension".weights
done
Now I am trying to use another command that takes these two types of files (*.bin and *.weights) and generates *.tree files. I have tried this with no success:
find ./ -type f \( -iname \*.bin -o -iname \*.weights \) |
while IFS= read -r file; do
file_no_extension=${file##*/};
file_no_extension=${file_no_extension%%.*}
./community "$file.bin" -l -1 -w "$file.weights" > "$file_no_extension".tree
done
Any suggestions?
Find all files with that extension.
For each file
Extract the filename without extension
Run the command
So:
find -type f -name '*.ext' |
while IFS= read -r file; do
file_no_extension=${file##*/};
file_no_extension=${file_no_extension%%.*}
./convert -i "$file" -o "$file_no_extension".bin -w "$file_no_extension".weights
done
# with find:
find -type f -name '*.ext' -exec sh -c 'f=$(basename "$1" .ext); ./convert -i "$1" -o "$f".bin -w "$f".weights' _ {} \;
# with xargs:
find -type f -name '*.ext' |
xargs -d '\n' -n1 sh -c 'f=$(basename "$1" .ext); ./convert -i "$1" -o "$f".bin -w "$f".weights' _
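Applied to the second command in the question, the same pattern would look something like this. It is only a sketch and assumes every *.bin file has a matching *.weights file next to it:
# iterate over the .bin files only and derive the matching .weights name
find . -type f -name '*.bin' |
while IFS= read -r file; do
file_no_extension=${file##*/}
file_no_extension=${file_no_extension%%.*}
./community "$file" -l -1 -w "${file%.bin}.weights" > "$file_no_extension".tree
done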
You could use GNU Parallel to run your jobs in parallel across all your CPU cores like this:
parallel convert -i {} -o {.}.bin -w {.}.weights ::: input*.txt
Initially, you may like to do a "dry run" that shows what it would do without actually doing anything:
parallel --dry-run convert -i {} -o {.}.bin -w {.}.weights ::: input*.txt
If you get errors about the argument list being too long because you have too many files, you can feed their names in on stdin like this instead:
find . -name "input*txt" -print0 | parallel -0 convert -i {} -o {.}.bin -w {.}.weights
You can use find to list your files and execute a command on all of them:
find -name '*.ext' -exec ./runThisExecutable '{}' \;
If you have a.ext and b.ext in a directory, this will run ./runThisExecutable a.ext and ./runThisExecutable b.ext.
To test whether it identifies the right files, you can run it without -exec so it only prints the filenames:
find -name '*.ext'
./a.ext
./b.ext

Use the find command and copy files using a list.txt with names

I have a list.txt with different filenames, and I want to find all those 3600 filenames in subdirectories and then copy them to /destination_folder.
Can I use the command find /path/ {file.txt} and then copy to /destination_folder?
The list.txt should have the following filenames/lines:
test_20180724004008_4270.txt.bz2
test_20180724020008_4278.txt.bz2
test_20180724034009_4288.txt.bz2
test_20180724060009_4302.txt.bz2
test_20180724061009_4303.txt.bz2
test_20180724062010_4304.txt.bz2
test_20180724063010_4305.txt.bz2
test_20180724065010_4307.txt.bz2
test_20180724070010_4308.txt.bz2
test_20180724071010_4309.txt.bz2
test_20180724072010_4310.txt.bz2
test_20180724072815_4311.txt.bz2
test_20180724073507_4312.txt.bz2
test_20180724074608_4314.txt.bz2
test_20180724075041_4315.txt.bz2
test_20180724075450_4316.txt.bz2
test_20180724075843_4317.txt.bz2
test_20180724075843_4317.txt.bz2
test_20180724080207_4318.txt.bz2
test_20180724080522_4319.txt.bz2
test_20180724080826_4320.txt.bz2
test_20180724081121_4321.txt.bz2
................................
You will probably want to make a list of all of the files in a directory, then use your list to iterate through the list of files found.
First save your list of files found to a file
find . -type f > foundFiles.txt
Then you need to use your file to search the other
cat list.txt | while read line
do
if grep -q "${line}" foundFiles.txt
then
cp -v $(grep "${line}" foundFiles.txt) /destination_folder/
fi
done
I'll let you take the base and make it into a script for use again.
You could use echo and sed:
echo $(sed "s/.*/\"\0\"/;s/^/ -name /;s/$/ -o/;$ s/-o//" list.txt)
This outputs a list of files to be used in find command:
-name "file1.txt.bz2" -o -name "file2.txt.bz2" -o -name "file3.txt.bz2"
Then use -exec cp -t targetDir {} + in find to copy the files:
find \( $(eval echo $(sed "s/.*/\"\0\"/;s/^/ -name /;s/$/ -o/;$ s/-o//" list.txt)) \) -exec cp -t targetDir {} +
Loop through the file and append the results to your destination folder:
for i in `cat list.txt`;
do cp `find * -name $i` destination_folder/;
done
This finds all the files in list.txt and copies those files to destination_folder/.
The for i in `cat list.txt` creates a variable i that loops through the entire file.
The cp `find * -name $i` destination_folder/ finds the path to the file and copies it to destination_folder/.
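If some of the filenames contain spaces, a slightly more defensive variant of the same idea is a while read loop with quoting (a sketch, untested against the exact directory layout):
while IFS= read -r name; do
# -exec cp handles each match; quoting keeps names with spaces intact
find . -type f -name "$name" -exec cp -v {} /destination_folder/ \;
done < list.txt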

What's wrong with this bash code to replace words in files?

I wrote this code a few months ago and didn't touch it again. Now I picked it up to complete it. This is part of a larger script to find all files with specific extensions, find which ones have a certain word, and replace every instance of that word with another one.
In this excerpt, ARG4 is the directory it starts looking at (it keeps going recursively).
ARG2 is the word it looks for.
ARG3 is the word that replaces ARG2.
ARG4="$4"
find -P "$ARG4" -type f -name '*.h' -o -name '*.C' \
-o -name '*.cpp' -o -name "*.cc" \
-exec grep -l "$ARG2" {} \; | while read file; do
echo "$file"
sed -n -i -E "s/"$ARG2"/"$ARG3"/g" "$file"
done
Like I said, it's been a while, but I've read the code and I think it's pretty understandable. I think the problem must be in the while loop. I googled for more info about "while read ---" but I didn't find much.
EDIT 2: See my answer down below for the solution.
I discovered that find wasn't working properly. It turns out that it's because of -maxdepth 0 which I put there so that the search would only happen in the current directory. I took it out, but then the output of find was one single string with all of the file names. They needed to be separate entities so that the while loop could read each one. So I rewrote it:
files=(`find . -type f \( -name "*.h" -o -name "*.C" -o \
-name "*.cpp" -o -name "*.cc" \) \
-exec grep -l "$ARG1" {} \;`)
for i in ${files[@]} ; do
echo $i
echo `gsed -E -i "s/$ARG1/$ARG2/g" ${i}`
done
I had to install GNU sed, the regular one just wouldn't accept the file names.
It's hard to say if this is the only issue, since you haven't said precisely what's wrong. However, your find command's -exec action is only being applied for *.cc files. If you want it to apply for any of those, it should look more like:
ARG4="$4"
find -P "$ARG4" -type f \( -name '*.h' -o -name '*.C' \
-o -name '*.cpp' -o -name "*.cc" \) \
-exec grep -l "$ARG2" {} \; | while read file; do
echo "$file"
sed -n -i -E "s/"$ARG2"/"$ARG3"/g" "$file"
done
Note the added \( and \), which group the -name tests so that the -exec action is attached to the result of all of them.
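For reference, find's implicit AND between adjacent tests binds tighter than -o, so the ungrouped expression is parsed like this (illustrative only, with word as a placeholder pattern):
# ungrouped: -exec attaches only to the last -name test
find . -type f -name '*.h' -o -name '*.cc' -exec grep -l word {} \;
# is parsed as
find . \( -type f -a -name '*.h' \) -o \( -name '*.cc' -a -exec grep -l word {} \; \)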

Bash - Rename ".tmp" files recursively

A bunch of Word & Excel documents were being moved on the server when the process terminated before it was complete. As a result, we're left with several perfectly fine files that have a .tmp extension, and we need to rename these files back to the appropriate .xlsx or .docx extension.
Here's my current code to do this in Bash:
#!/bin/sh
for i in "$(find . -type f -name *.tmp)"; do
ft="$(file "$i")"
case "$(file "$i")" in
"$i: Microsoft Word 2007+")
mv "$i" "${i%.tmp}.docx"
;;
"$i: Microsoft Excel 2007+")
mv "$i" "${i%.tmp}.xlsx"
;;
esac
done
It seems that while this does search recursively, it only does 1 file. If it finds an initial match, it doesn't go on to rename the rest of the files. How can I get this to loop correctly through the directories recursively without it doing just 1 file at a time?
Try the find command like this:
while IFS= read -r -d '' i; do
ft="$(file "$i")"
case "$ft" in
"$i: Microsoft Word 2007+")
mv "$i" "${i%.tmp}.docx"
;;
"$i: Microsoft Excel 2007+")
mv "$i" "${i%.tmp}.xlsx"
;;
esac
done < <(find . -type f -name '*.tmp' -print0)
Using < <(...) is process substitution; it runs the find command and feeds its output to the loop.
Quote the filename pattern in find.
Use -print0 to get find output delimited by a null character, to allow space/newline characters in file names.
Use IFS= and -d '' to read the null-separated filenames.
I too would recommend using find. I would do this in two passes of find:
find . -type f -name \*.tmp \
-exec sh -c 'file "{}" | grep -q "Microsoft Word 2007"' \; \
-exec sh -c 'f="{}"; echo mv "$f" "${f%.tmp}.docx"' \;
find . -type f -name \*.tmp \
-exec sh -c 'file "{}" | grep -q "Microsoft Excel 2007"' \; \
-exec sh -c 'f="{}"; echo mv "$f" "${f%.tmp}.xlsx"' \;
Lines are split for readability.
Each instance of find will search for .tmp files, then use -exec to test the output of file. This is similar to how you're doing it within the while loop in your shell script, only it's launched from within find itself. We're using the pipe to grep instead of your case statement.
The second -exec only gets run if the first one returned "true" (i.e. grep -q ... found something), and it runs the rename in a tiny shell instance. As written it only echoes the mv command; remove the echo once the output looks right.
I haven't profiled this to see whether it would be faster or slower than a loop in a shell script. Just another way to handle things.

How do I check if all files inside directories are valid jpegs (Linux, sh script needed)?

Ok, I got a directory (for instance, named '/photos') in which there are different directories
(like '/photos/wedding', '/photos/birthday', '/photos/graduation', etc...) which have .jpg files in them. Unfortunately, some of the JPEG files are broken. I need to find a way to determine which files are broken.
I found out that there is a tool named ImageMagick which can help a lot. If you use it like this:
identify -format '%f' whatever.jpg
it prints the name of the file only if the file is valid; if it is not, it prints something like "identify: Not a JPEG file: starts with 0x69 0x75 `whatever.jpg' # jpeg.c/EmitMessage/232.".
So the correct solution should be to find all files ending with ".jpg" and apply "identify" to them; if the result is just the name of the file, do nothing, and if the result is different from the name of the file, save the name of the file somewhere (like in a file "errors.txt").
Any ideas how I can do that?
The short-short version:
find . -iname "*.jpg" -exec jpeginfo -c {} \; | grep -E "WARNING|ERROR"
You might not need the same find options, but jpeginfo was the solution that worked for me:
find . -type f \( -iname "*.jpg" -o -iname "*.jpeg" \) | xargs jpeginfo -c | grep -E "WARNING|ERROR" | cut -d " " -f 1
As a script (as requested in this question):
#!/bin/sh
find . -type f \
\( -iname "*.jpg" \
-o -iname "*.jpeg" \) \
-exec jpeginfo -c {} \; | \
grep -E "WARNING|ERROR" | \
cut -d " " -f 1
I was clued into jpeginfo for this by http://www.commandlinefu.com/commands/view/2352/find-corrupted-jpeg-image-files and this explained mixing find -o OR with -exec
One problem with identify -format is that it doesn't actually verify that the file is not corrupt; it just makes sure that it's really a JPEG.
To actually test it you need something to convert it. But the convert that comes with ImageMagick seems to silently ignore non-fatal errors in the JPEG (such as being truncated).
One thing that works is this:
djpeg -fast -grayscale -onepass file.jpg > /dev/null
If it returns an error code, the file has a problem. If not, it's good.
There are other programs that could be used as well.
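Applied to the layout in the question, that check could be wrapped in a loop like the one below. This is a sketch; djpeg ships with libjpeg/libjpeg-turbo, and only the exit status matters here:
find /photos -type f -iname '*.jpg' | while IFS= read -r f; do
# djpeg decodes the file; a non-zero exit status means the JPEG has a problem
if ! djpeg -fast -grayscale -onepass "$f" > /dev/null 2>&1; then
echo "$f" >> errors.txt
fi
done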
You can put this into a bash script file or run it directly:
find -name "*.jpg" -type f | xargs --no-run-if-empty identify -format '%f' 1>ok.txt 2>errors.txt
In case identify is missing, here is how to install it in Ubuntu:
sudo apt install imagemagick --no-install-recommends
This script will print out the names of the bad files:
#!/bin/bash
find /photos -name '*.jpg' | while read FILE; do
if [[ $(identify -format '%f' "$FILE" 2>/dev/null) != $FILE ]]; then
echo "$FILE"
fi
done
You could run it as is or as ./badjpegs > errors.txt to save the output to a file.
To break it down, the find command finds *.jpg files in /photos or any of its subdirectories. These file names are piped to a while loop, which reads them in one at a time into the variable $FILE. Inside the loop, we grab the output of identify using the $(...) operator and check if it matches the file name. If not, the file is bad and we print the file name.
It may be possible to simplify this. Most UNIX commands indicate success or failure in their exit code. If the identify command does this, then you could simplify the script to:
#!/bin/bash
find /photos -name '*.jpg' | while read FILE; do
if ! identify "$FILE" &> /dev/null; then
echo "$FILE"
fi
done
Here the condition is simplified to if ! identify; then which means, "did identify fail?"
