iconv: illegal input sequence at position - bash

I have a bash script which downloads some files from a URL and stores them in a folder named "data1". Since these files are downloaded as .zip, the next step is to unzip them. After that, the variables extension and encoding are extracted from every file, where extension is the type of file (txt, csv, docx) and encoding is the encoding of each file (ISO, UTF-8). Since the files that this script downloads are not in UTF-8 format, I have to perform this conversion. This is the line which performs the encoding:
iconv -f $encoding -t UTF-8//TRANSLIT $name2.$extension -o conversion_$name2.$extension;
As you can see, I have to pass two parameters: the file to be converted to UTF-8, and the name of the output file, which will be conversion_(name of the original file).(extension of the original file). However, I'm getting the following error:
iconv: illegal input sequence at position 1234704
This error affects the datos_abiertos_covid19.zip file, which after unzipping is named 200715COVID19MEXICO.csv (although that changes depending on the day this script is run). Does anyone know how I can avoid this error? I specifically need all of the downloaded files to be in UTF-8 format. I would really appreciate your help.
Here is the script I'm using:
#! /usr/bin/bash
# creating folders
mkdir data1
cd data1
# downloading data
wget http://187.191.75.115/gobmx/salud/datos_abiertos/datos_abiertos_covid19.zip
wget http://187.191.75.115/gobmx/salud/datos_abiertos/diccionario_datos_covid19.zip
# unzipping data
for i in `ls | grep .zip`; do unzip $i; done
# this for will iterate over all the files contained on the data1 folder
for name in `ls -F -1 | grep -v / | grep -v zip`; do
    # getting extension of current file
    extension=`echo $name | sed 's/\./ /g' | awk '{print $2}'`;
    # getting encoding format of current file
    encoding=`file -i $name | sed 's/=/ /g' | awk '{print $4}'`;
    # echo $encoding
    query="s/\.$extension//g"
    # echo $query
    name2=`echo $name | sed -e $query`;
    # echo $name2
    # echo $name" "$extension" "$encoding" "$name2
    # encoding current file
    iconv -f $encoding -t UTF-8//TRANSLIT $name2.$extension -o conversion_$name2.$extension;
done
mkdir old
mv `ls | grep -v "conversion_" | grep -v "old"` old
Since this script is intended to run automatically every 24 hours, I need the old data (data from the day before) to be stored somewhere else. That's why, at the end of the script, a new folder is created and all of the "old" files are moved to the folder named "old".
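In case it is useful, this is the trimmed-down version of the conversion step I have been experimenting with, with the variables quoted and iconv's -c flag added (-c makes iconv drop input sequences that are invalid in the source encoding, and I am not sure yet whether losing those bytes is acceptable for this data):
# detect the encoding directly instead of cutting up the full `file -i` output
encoding=$(file -b --mime-encoding "$name")
iconv -c -f "$encoding" -t UTF-8//TRANSLIT "$name2.$extension" -o "conversion_$name2.$extension"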

Related

sed: can't read ../../build.gradle: No such file or directory

I am new to git and GitHub. I am working on a project where I need to commit my changes to a GitHub repository in a specific branch.
But I am getting the following error:
$ git commit
3.5.0.1
s/3.5.0.1/3.5.1.1/g
sed: can't read ../../build.gradle: No such file or directory
I have also attached the pre-commit hook code here.
#!/bin/sh
## finding the exact line in the gradle file
#ORIGINAL_STRING=$(cat ../../build.gradle | grep -E '\d\.\d\.\d\.\d')
## extracting the exact parts but with " around
#TEMP_STRING=$(echo $ORIGINAL_STRING | grep -Eo '"(.*)"')
## the exact numbering scheme
#FINAL_VERSION=$(echo $TEMP_STRING | sed 's/"//g') # 3.5.0.1
#Extract APK version
v=$(cat build.gradle | grep rtVersionName | awk '{print $1}')
FINAL_VERSION=$(echo ${v} | cut -d"\"" -f2)
echo ${FINAL_VERSION}
major=0
minor=0
build=0
assets=0
regex="([0-9]+).([0-9]+).([0-9]+).([0-9]+)"
if [[ $FINAL_VERSION =~ $regex ]]; then
    major="${BASH_REMATCH[1]}"
    minor="${BASH_REMATCH[2]}"
    build="${BASH_REMATCH[3]}"
    assets="${BASH_REMATCH[4]}"
fi
# increment the build number
build=$(echo $build + 1 | bc)
NEW_VERSION="${major}.${minor}.${build}.${assets}"
SED_ARGUMENT=$(echo "s/${FINAL_VERSION}/${NEW_VERSION}/g")
echo $SED_ARGUMENT
sed -i -e `printf $SED_ARGUMENT` ../../build.gradle
The error comes from the last line of this file, basically. I am using Windows.
Things I tried:
sed -i -e `printf $SED_ARGUMENT` ../../build.gradle
sed -i ' ' -e `printf $SED_ARGUMENT` ../../build.gradle
I am unable to understand what I am actually doing wrong. Kindly help me out.
sed: can't read ../../build.gradle: No such file or directory
This one is rather simple. Your build.gradle file is not at ../../build.gradle.
The solution is to determine the actual path to the build.gradle file relative to the script, and change the path in the script.
To debug this, do echo "Current Directory: $PWD" in the script to see what the actual working directory is; then you should be able to determine the correct path to use.
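For example, something like this at the top of the hook usually makes the problem obvious; git normally runs hooks from the root of the working tree, so building the path with git rev-parse tends to be more reliable than guessing with ../.. (adjust the path if your build.gradle sits in a subproject):
#!/bin/sh
echo "Current Directory: $PWD"

# hooks are normally invoked from the top of the working tree,
# so resolve build.gradle from there instead of guessing with ../..
REPO_ROOT=$(git rev-parse --show-toplevel)
GRADLE_FILE="$REPO_ROOT/build.gradle"   # adjust if the file lives in a subproject

ls -l "$GRADLE_FILE"   # fails loudly if the path is still wrong
Once GRADLE_FILE points at the real file, use "$GRADLE_FILE" in the sed call instead of ../../build.gradle.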

how to print names of files being downloaded

I'm trying to write a bash script that downloads all the .txt files from a website 'http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/'.
So far I have wget -A txt -r -l 1 -nd 'http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/' but I'm struggling to find a way to print the name of each file to the screen (when downloading). That's the part I'm really stuck on. How would one print the names?
Thoughts?
EDIT: this is what I have done so far, but I'm trying to remove a lot of stuff like ghcnd-inventory.txt</a></td><td align=...
wget -O- $LINK | tr '"' '\n' | grep -e .txt | while read line; do
echo Downloading $LINK$line ...
wget $LINK$line
done
LINK='http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/'
wget -O- $LINK | tr '"' '\n' | grep -e .txt | grep -v align | while read line; do
echo Downloading $LINK$line ...
wget -nv $LINK$line
done
Slight optimization of Sundeep's answer:
LINK='http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/'
wget -q -O- $LINK | sed -E '/.*href="[^"]*\.txt".*/!d;s/.*href="([^"]*\.txt)".*/\1/' | wget -nv -i- -B$LINK
The sed command eliminates all lines not matching href="xxx.txt" and extracts only the xxx.txt part of the others. It then passes the result to another wget that uses it as the list of files to retrieve. The -nv option tells wget to be as terse as possible: it will print the name of the file it is currently downloading but almost nothing else. Warning: this works only for this particular web site and does not descend into sub-directories.
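Alternatively, the recursive form from the question can be kept and simply combined with -nv, which again prints one short line per saved file and little else:
wget -nv -A txt -r -l 1 -nd 'http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/'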

Use "Touch -r" for several files with automator

I use "MacOS X Yosemite (10.10.4)"
I've converted video .mts files to .mov files using QuickTime, but the new files created don't preserve the original Creation Date.
fileA.mts --> Creation Date: 07/02/2010 10:51
fileA_converted.mov --> Creation Date: Today 8:35
I'd like to change the Creation Date attribute of several files, using the dates of the original files. I know I can do this with the touch command in Terminal, like this:
touch -r fileA.mts fileA_converted.mov
touch -r fileB.mts fileB_converted.mov
As I have more than 200 files whose Creation Date needs changing, is it possible to automate this using an Automator Run Shell Script action, or any other way?
Like this in the bash shell - which is what you get in Terminal (untested):
#!/bin/bash
for orig in *.mts; do
    # Generate new name from old one
    new="${orig/.mts/_converted.mov}"
    echo touch -r "$orig" "$new"
done
Save the above in a file called doDates and then type this in the Terminal
chmod +x doDates # make the script executable
./doDates # run the script
Sample output
touch -r Freddy Frog.mts Freddy Frog_converted.mov
touch -r fileA.mts fileA_converted.mov
At the moment it does nothing, but run it, see if you like what it says, and then remove the word echo and run it again if all looks ok.
Execute the command below when all the original and converted files are in the same folder:
ls | grep ".mts" | awk -F. '{print $0" "$1"_converted.mov"}' | xargs touch -r
When the converted files are in a different folder, run the command below from the path where the .mts files are present, and add the absolute path before $1, just as I have added /home/convertedfiles/ here:
ls | grep ".mts" | awk -F. '{print $0" /home/convertedfiles/"$1"_converted.mov"}' | xargs touch -r
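If any of the file names contain spaces (like Freddy Frog.mts in the sample output further up), the ls | awk | xargs pipeline will split them; a glob loop with the same absolute destination path avoids that (/home/convertedfiles/ is again just the example path):
for orig in *.mts; do
    touch -r "$orig" "/home/convertedfiles/${orig%.mts}_converted.mov"
done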

Rewriting 3 commands into one command or script that can be run on cron

I'm currently using 3 different commands to achieve my goal of downloading a zip, extracting it, converting the txt file to UTF-8 and then converting the CSV to JSON.
First I have:
wget https://www.example.com/example.zip -O temp.zip; unzip -o temp.zip; rm temp.zip
Which is good, but the first problem is how to rename the extracted file so that it is the same every time for the next steps, as it can have a different name inside the zip every day. Next, depending on the filename, I run this script, which converts the file from ISO to UTF-8:
sh dir_iconv.sh example1.txt ISO8859-1 UTF-8
Which is this script:
#!/bin/bash
ICONVBIN='/usr/bin/iconv' # path to iconv binary
if [ $# -lt 3 ]
then
    echo "$0 dir from_charset to_charset"
    exit
fi
for f in $1/*
do
    if test -f $f
    then
        echo -e "\nConverting $f"
        /bin/mv $f $f.old
        $ICONVBIN -f $2 -t $3 $f.old > $f
        rm -f $f.old
    else
        echo -e "\nSkipping $f - not a regular file";
    fi
done
And then finally I run a ruby script csv2json - https://github.com/darwin/csv2json - that is called as follows (pipe delimited) to give me a json output:
csv2json -s '|' example1.txt > example1.json
Is there a simple way to roll this into one command or script that can be called?
Pipe all your commands one after another and, if necessary, throw them in a shell script file. Note that unzip itself cannot read from a pipe, so funzip (which extracts the first member of a zip arriving on stdin) is used here instead:
wget -qO- https://www.example.com/example.zip | funzip | iconv -f ISO8859-1 -t UTF-8 | csv2json -s '|' > example.json
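If you prefer to keep the intermediate files on disk, which also takes care of the renaming problem from the question, the same steps can go into one script. A minimal sketch, assuming the zip contains exactly one .txt file and using the placeholder URL and file names from the question:
#!/bin/bash
set -e

# download the zip and find out what the .txt inside is called today
wget -q https://www.example.com/example.zip -O temp.zip
extracted=$(unzip -Z1 temp.zip | grep '\.txt$' | head -n 1)

# extract, rename to a fixed name, and clean up
unzip -o temp.zip
mv "$extracted" example1.txt
rm temp.zip

# convert ISO8859-1 to UTF-8 (via a temporary file, since iconv cannot edit in place)
iconv -f ISO8859-1 -t UTF-8 example1.txt > example1.utf8
mv example1.utf8 example1.txt

# pipe-delimited CSV to JSON, exactly as in the question
csv2json -s '|' example1.txt > example1.json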

creating a file downloading script with checksum verification

I want to create a shell script that reads from a .diz file, where information about the various source files needed to compile a certain piece of software (ImageMagick in this case) is stored. I am using Mac OS X Leopard 10.5 for these examples.
Basically I want an easy way to maintain these .diz files that hold the information for up-to-date source packages. I would just need to update these .diz files with URLs, version information and file checksums.
Example line:
libpng:1.2.42:libpng-1.2.42.tar.bz2?use_mirror=biznetnetworks:http://downloads.sourceforge.net/project/libpng/00-libpng-stable/1.2.42/libpng-1.2.42.tar.bz2?use_mirror=biznetnetworks:9a5cbe9798927fdf528f3186a8840ebe
script part:
while IFS=: read app version file url md5
do
    echo "Downloading $app Version: $version"
    curl -L -v -O $url 2>> logfile.txt
    $calculated_md5=`/sbin/md5 $file | /usr/bin/cut -f 2 -d "="`
    echo $calculated_md5
done < "files.diz"
Actually I have more than just one question concerning this.
How do I best calculate and compare the checksums? I wanted to store MD5 checksums in the .diz file and compare them with a string comparison, "cut"ting the checksum out of the md5 output.
Is there a way to tell curl another filename to save to? (In my case the filename gets ugly: libpng-1.2.42.tar.bz2?use_mirror=biznetnetworks.)
I seem to have issues with the backticks that should direct the output of the piped md5 and cut into the variable $calculated_md5. Is the syntax wrong?
Thanks!
The following is a practical one-liner:
curl -s -L <url> | tee <destination-file> |
sha256sum -c <(echo "a748a107dd0c6146e7f8a40f9d0fde29e19b3e8234d2de7e522a1fea15048e70 -") ||
rm -f <destination-file>
wrapping it up in a function taking 3 arguments:
- the url
- the destination
- the sha256
download() {
    curl -s -L "$1" | tee "$2" | sha256sum -c <(echo "$3 -") || rm -f "$2"
}
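Calling it then looks like this (URL, file name and checksum are only placeholders):
download "https://www.example.com/libpng-1.2.42.tar.bz2" \
         "libpng-1.2.42.tar.bz2" \
         "a748a107dd0c6146e7f8a40f9d0fde29e19b3e8234d2de7e522a1fea15048e70"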
while IFS=: read app version file url md5
do
    echo "Downloading $app Version: $version"
    # use -o for the output file. define $outputfile yourself
    curl -L -v $url -o $outputfile 2>> logfile.txt
    # use $(..) instead of backticks.
    calculated_md5=$(/sbin/md5 "$file" | /usr/bin/cut -f 2 -d "=")
    # compare md5
    case "$calculated_md5" in
        "$md5" )
            echo "md5 ok"
            echo "do something else here";;
    esac
done < "files.diz"
My curl has a -o (--output) option to specify an output file. There's also a problem with your assignment to $calculated_md5. It shouldn't have the dollar sign at the front when you assign to it. I don't have /sbin/md5 here so I can't comment on that. What I do have is md5sum. If you have it too, you might consider it as an alternative. In particular, it has a --check option that works from a file listing of md5sums that might be handy for your situation. HTH.
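If md5sum is available, the comparison in the loop above can also be delegated to its --check mode, which reads "checksum  filename" lines from standard input (untested on OS X, where md5sum is often missing by default):
# md5sum -c expects the checksum, two spaces, then the file name
if echo "$md5  $file" | md5sum --check --status -; then
    echo "md5 ok"
else
    echo "md5 mismatch for $file"
fi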
