I am trying to split a large file into chunks of 16 lines per output file. I can do that using split -l 16 q1.txt new, but I want the output files to be named ratio1.txt, ratio2.txt, ... ratio100.txt, etc. So I tried: split -l 16 -d --additional-suffix=.txt q1.txt ratio
Then I get this error message on my Mac:
split: illegal option -- d
usage: split [-a sufflen] [-b byte_count] [-l line_count] [-p pattern]
[file [prefix]]
Can anybody please help me get the desired output file names? Thank you.
If you check man split you'll find that the --additional-suffix=SUFFIX option is not supported in this version.
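If you can install Homebrew's coreutils, the GNU version of split is available there as gsplit and accepts your original command (note that GNU split's -d numeric suffixes start at 00, so you'd get ratio00.txt, ratio01.txt, ...):
brew install coreutils
gsplit -l 16 -d --additional-suffix=.txt q1.txt ratio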
Otherwise, to achieve what I understand you want, you'll need an Automator script or a shell script, e.g.:
#!/bin/sh
# Split the file given as $1 into chunks of 16 lines,
# writing them to ratio1.txt, ratio2.txt, ...
n=0
DONE=false
until $DONE; do
    n=$((n + 1))
    i=0
    while [ $i -lt 16 ]; do
        read -r line || { DONE=true; break; }
        if [ $i -eq 0 ]; then
            printf '%s\n' "$line" > "ratio$n.txt"    # truncate any stale file at chunk start
        else
            printf '%s\n' "$line" >> "ratio$n.txt"
        fi
        i=$((i + 1))
    done
done < "$1"
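Alternatively, a single awk invocation can produce exactly the ratio1.txt, ratio2.txt, ... names; this is a sketch that should also work with the BSD awk shipped on macOS:
awk 'NR % 16 == 1 { if (f) close(f); f = "ratio" ++n ".txt" } { print > f }' q1.txt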
I recently discovered that mutt allows me to do something I've been trying, without success, to do in my GUI e-mail client for years: (largely) automate the process of saving an e-mail message (*.eml) to a local directory of my choice.
This Unix & Linux StackExchange post shared a rough-and-ready mutt macro for handling this process. As you'll see, however, the macro's grep commands reach for the -P flag (i.e. Perl regular expressions) and, thus, do not run on the MacBook I'm currently using:
#!/usr/bin/env zsh
#Saved piped email to "$1/YYMMDD SUBJECT.eml"
# Don't overwrite existing file
set -o noclobber
message=$(cat)
mail_date=$(<<<"$message" grep -oPm 1 '^Date: ?\K.*')
formatted_date=$(date -d"$mail_date" +%y%m%d)
# Get the first line of the subject, and change / to ∕ so it's not a subdirectory
subject=$(<<<"$message" grep -oPm 1 '^Subject: ?\K.*' | sed 's,/,∕,g')
if [[ $formatted_date == '' ]]; then
echo Error: no date parsed
exit 1
elif [[ $subject == '' ]]; then
echo Warning: no subject found
fi
echo "${message}" > "$1/$formatted_date $subject.eml" && echo Email saved to "$1/$formatted_date $subject.eml"
I'm far from comfortable with complex grep queries, so my meager efforts to make this script work (e.g. swapping out the -P flag for the -e flag) have met with failure.
To wit, this is the error message thrown when I swap in the -e flag:
grep: 1: No such file or directory
grep: ^Date: ?\K.*: No such file or directory
usage: date [-jnRu] [-d dst] [-r seconds] [-t west] [-v[+|-]val[ymwdHMS]] ...
[-f fmt date | [[[mm]dd]HH]MM[[cc]yy][.ss]] [+format]
grep: 1: No such file or directory
grep: ^Subject: ?\K.*: No such file or directory
Error: no date parsed
Press any key to continue...
Mercifully, the error messages here are pretty clear. The script's use of 1 appears to be faulty, as does the last bit of the anchored grep query (e.g. ^Date: ?\K.*).
Unfortunately, I have no idea how to begin to resolve these errors.
What I'm attempting to do is, in fact, quite simple. Rather than manually running | cat > FILE_PATH/email.eml, I'd like to be able to hit a key in mutt, extract the selected e-mail's date (i.e. everything to the end of the line after Date:) and subject (i.e. everything to the end of the line after Subject:), then use that information to generate the name of the *.eml file saved locally (e.g. YYYY-MM-DD subject.eml).
Does anyone have any suggestions on how to make this script play nice on macOS?
One option is to use zsh parameter expansions to parse the values, so there's no need to worry about grep versions. As a bonus, this launches fewer subprocesses:
#!/usr/bin/env zsh
# Save piped email to "$1/YYMMDD SUBJECT.eml"
# Don't overwrite existing file
set -o noclobber
# stdin split into array of lines:
message=("${(@f)$(<&0)}")
mail_date=${${(M)message:#Date: *}[1]#Date: }
formatted_date=$(gdate -d"$mail_date" +%y%m%d)
# Get the subject, and change '/' to '-'
subject=${${${(M)message:#Subject: *}[1]#Subject: }//\//-}
if [[ -z $formatted_date ]]; then
print -u2 Error: no date parsed
exit 1
elif [[ -z $subject ]]; then
print -u2 Warning: no subject found
fi
outdir=${1:?Output directory must be specified}
if [[ ! -d $outdir ]]; then
print -u2 Error: no output directory $outdir
exit 1
fi
outfile="$outdir/$formatted_date $subject.eml"
print -l $message > "$outfile" && print Email saved to "$outfile"
This statement gets the date from the array of lines in message:
mail_date=${${(M)message:#Date: *}[1]#Date: }
${(M)...:#...} - gets elements that match a pattern from the array. Here we use it to find elements that start with Date: .
${...[1]} - returns the first match.
${...#Date: } - removes the prefix Date: , leaving the date string.
This similar statement has an additional expansion that replaces all instances of / with -:
subject=${${${(M)message:#Subject: *}[1]#Subject: }//\//-}
The parameter expansions are documented in the zshexpn man page.
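As a quick illustration, here are the same expansions applied to a small hand-made array (contents invented for the example):
msg=('Date: Mon, 16 May 2022 10:00:00 +0000' 'Subject: a/b test' 'body')
print -r -- ${${(M)msg:#Date: *}[1]#Date: }                 # Mon, 16 May 2022 10:00:00 +0000
print -r -- ${${${(M)msg:#Subject: *}[1]#Subject: }//\//-}  # a-b test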
PS: trailing newlines will be removed from the message written to the file. This is a difficult-to-avoid consequence of using a command substitution like $(...). It's not likely to be a significant issue.
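For example:
x=$(printf 'a\n\n\n')
print -r -- ${#x}    # 1 -- all trailing newlines are stripped by $(...)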
I have a directory with multiple files
file1_1.txt
file1_2.txt
file2_1.txt
file2_2.txt
...
And I need to run a command structured like this
command [args] file1 file2
So I was wondering if there was a way to call the command just once on all the files, instead of having to call it separately on each pair of files.
Use find and xargs, with sort, since the order appears meaningful in your case:
find . -name 'file?_?.txt' | sort | xargs -n2 command [args]
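If the file names might contain whitespace, a null-delimited variant of the same pipeline is safer (assuming your find, sort, and xargs support -print0, -z, and -0, as the GNU and modern BSD versions do):
find . -name 'file?_?.txt' -print0 | sort -z | xargs -0 -n2 command [args]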
If your command can take multiple pairs of files on the command line then it should be sufficient to run
command ... *_[12].txt
The files in expanded glob patterns (such as *_[12].txt) are automatically sorted so the files will be paired correctly.
If the command can only take one pair of files then it will need to be run multiple times to process all of the files. One way to do this automatically is:
for file1 in *_1.txt; do
file2=${file1%_1.txt}_2.txt
[[ -f $file2 ]] && echo command "$file1" "$file2"
done
You'll need to replace echo command with the correct command name and arguments.
See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)) for an explanation of ${file1%_1.txt}.
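In short, % strips the shortest matching suffix, so the partner file name can be rebuilt:
file1=file1_1.txt
echo "${file1%_1.txt}"          # file1
echo "${file1%_1.txt}_2.txt"    # file1_2.txt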
#!/bin/bash
cmd (){
    # collect the expanded glob into an array
    local arr=("$@")
    # step through the arguments two at a time
    for ((i = 0; i < ${#arr[@]}; i += 2)); do
        firstFile=${arr[i]}
        secondFile=${arr[i+1]}
        echo "pair -- ${firstFile} ${secondFile}"
    done
}
cmd file*_[12].txt
pair -- file1_1.txt file1_2.txt
pair -- file2_1.txt file2_2.txt
I'm trying to write a shell script that deletes duplicate commands from my zsh_history file. Having no real shell script experience and given my C background I wrote this monstrosity that seems to work (only on Mac though), but takes a couple of lifetimes to end:
#!/bin/sh
history=./.zsh_history
currentLines=$(grep -c '^' $history)
wordToBeSearched=""
currentWord=""
contrastor=0
searchdex=""
echo "Currently handling a grand total of: $currentLines lines. Please stand by..."
while (( $currentLines - $contrastor > 0 ))
do
searchdex=1
wordToBeSearched=$(awk "NR==$currentLines - $contrastor" $history | cut -d ";" -f 2)
echo "$wordToBeSearched A BUSCAR"
while (( $currentLines - $contrastor - $searchdex > 0 ))
do
currentWord=$(awk "NR==$currentLines - $contrastor - $searchdex" $history | cut -d ";" -f 2)
echo $currentWord
if test "$currentWord" == "$wordToBeSearched"
then
sed -i .bak "$((currentLines - $contrastor - $searchdex)) d" $history
currentLines=$(grep -c '^' $history)
echo "Line deleted. New number of lines: $currentLines"
let "searchdex--"
fi
let "searchdex++"
done
let "contrastor++"
done
^THIS IS HORRIBLE CODE NO ONE SHOULD USE^
I'm now looking for a less life-consuming approach using more shell-like conventions, mainly sed at this point. Thing is, zsh_history stores commands in a very specific way:
: 1652789298:0;man sed
Here the command itself follows the final ";"; what precedes it is a timestamp plus an elapsed-seconds field (the ":0").
I'd like to find a way to delete duplicate commands while keeping the last occurrence of each command intact and in order.
Currently I'm at a point where I have a functional line that will delete strange lines that find their way into the file (newlines and such):
#sed -i '/^:/!d' $history
But that's about it. Not really sure how get the expression to look for into a sed without falling back into everlasting whiles or how to delete the duplicates while keeping the last-occurring command.
The zsh option hist_ignore_all_dups should do what you want. Just add setopt hist_ignore_all_dups to your zshrc.
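For example:
# in ~/.zshrc
setopt hist_ignore_all_dups    # adding a command removes any older duplicate from the history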
I wanted something similar, but I don't care about preserving the last occurrence as you mentioned. This just finds duplicates and removes them.
I used this command and then removed my .zsh_history, replacing it with the .zhistory that this command outputs.
So from your home folder:
cat -n .zsh_history | sort -t ';' -uk2 | sort -nk1 | cut -f2- > .zhistory
This will give you the file .zhistory containing the deduplicated list; in my case it went from 9000 lines to 3000. You can check the number of lines it has with wc -l .zhistory.
Please double check and make a backup of your zsh history before doing anything with it.
The sort command might be able to be modified to sort by numerical value and achieve what you want, but you will have to investigate further.
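As a rough sketch of keeping the last occurrence instead, you could reverse the file, keep the first occurrence of each command, and reverse back (this assumes macOS's tail -r for reversing, and that commands don't themselves contain semicolons or span multiple lines):
tail -r .zsh_history | awk -F';' '!seen[$2]++' | tail -r > .zhistory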
I found the script here, along with some commands to avoid saving duplicates in the future.
I didn't want to rename the history file.
#!/usr/bin/env zsh
# dedupe_lines.zsh
if [ $# -eq 0 ]; then
    echo "Error: No file specified" >&2
    exit 1
fi
if [ ! -f "$1" ]; then
    echo "Error: File not found" >&2
    exit 1
fi
sort "$1" | uniq > temp.txt
mv temp.txt "$1"
Add dedupe_lines.zsh to your home directory, then make it executable.
chmod +x dedupe_lines.zsh
Run it.
./dedupe_lines.zsh .zsh_history
I am very new to shell scripting. I have a scenario with many files inside a folder, named with a convention such as test-2020-11-19-1652.tgz (yyyy-mm-dd-hhmm). I need to compare the dates (taken from the file names), pick the latest file, unzip it, and rename that particular file. I have tried many ways but, being at beginner level, ended up with errors. Can anyone help me with this?
Expectation
In the above case I need to pick the file shop_db-2020-11-19-1652.tgz, because it is the latest file in the folder, then unzip it and rename it to shop_db.
Expanding a file pattern always returns a sorted list, which makes it possible to extract the ultimate entry, i.e. the one you want:
Using POSIX shell syntax:
#!/usr/bin/env sh
last() {
shift $(($# - 1))
printf %s "$1"
}
lastfile=$(last shop_db*.tgz)
if [ "$lastfile" = 'shop_db*.tgz' ]; then
lastfile=
fi
shift $(($# - 1)): Shift all arguments away except the last one.
printf %s "$1": Print the last argument since there is only one left.
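From there, a hedged continuation for the unzip-and-rename step (this part is my sketch, assuming the archive holds a single top-level entry that should end up named shop_db):
if [ -n "$lastfile" ]; then
    top=$(tar -tzf "$lastfile" | head -n 1)    # first (and assumed only) entry in the archive
    tar -xzf "$lastfile"
    mv "$top" shop_db
fi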
Using Bash syntax:
#!/usr/bin/env bash
shopt -s nullglob
lastfile=$(printf '%s\0' shop_db*.tgz | tail -z -n1 | tr -d \\0)
shopt -s nullglob: A Bash feature to return an empty list if no file matches the pattern.
printf '%s\0' shop_db*.tgz: Print a null delimited list of files matching the shop_db*.tgz globbing pattern.
| tail -z -n1: Extract the last record from this null delimited list.
Alternate method using only Bash built-in:
#!/usr/bin/env bash
shopt -s nullglob
while read -r -d '' f && [ "$f" ]
do
lastfile=$f
done < <(
printf '%s\0' shop_db*.tgz
)
echo "$lastfile"
And finally expanding the globbing pattern into an array, and extracting the last index:
#!/usr/bin/env bash
shopt -s nullglob
array=(shop_db*.tgz)
if [ ${#array[@]} -gt 0 ]
then
lastfile=${array[-1]}
fi
echo "$lastfile"
What you require can actually be achieved by utilising awk with find:
find . -name "shop_db*.tgz" | awk 'END { "tar -xvf "$0|getline fil;system("mv "fil" shop_db") }'
Look for files starting with shop_db and ending with .tgz. Pipe the output into awk and un tar the file in verbose mode, reading the uncompressed file name into the variable fil using awk's getline. Utilise this fil variable to run the necessary move command to rename the file to shop_db using awk's system function. As the output from find will already be ordered, we utilise the END block within awk to process the last compressed file piped from find.
I have a list of files at a location ${POWERCENTER_FILE_DIR}.
The files consist of a header row and value rows.
MART_Apple.csv
MART_SAMSUNG.csv
MART_SONY.csv
MART_BlackBerry.csv
Requirements:
Select only those files which have at least 1 row.
Add a timestamp to the files which have at least 1 row.
For example:
If all the files except MART_BlackBerry.csv have at least one row, then my output file names should be
MART_Apple_20170811112807.csv
MART_SAMSUNG_20170811112807.csv
MART_SONY_20170811112807.csv
Code tried so far
#!/bin/ksh
infilename=${POWERCENTER_FILE_DIR}MART*.csv
echo File name is ${infilename}
if [ wc -l "$infilename"="0" ];
then
RV=-1
echo "input file name cannot be blank or *"
exit $RV
fi
current_timestamp=`date +%Y%m%d%H%M%S`
filename=`echo $infilename | cut -d"." -f1 `
sftpfilename=`echo $filename`_${current_timestamp}.csv
cp -p ${POWERCENTER_FILE_DIR}$infilename ${POWERCENTER_FILE_DIR}$sftpfilename
RV=$?
if [[ $RV -ne 0 ]];then
echo Adding timestamp to ${POWERCENTER_FILE_DIR}$infilename failed ... Quitting
echo Return Code $RV
exit $RV
fi
Encountering errors like:
line 3: [: -l: binary operator expected
cp: target `MART_Apple_20170811121023.csv' is not a directory
failed ... Quitting
Return Code 1
To be frank, I am not able to understand the errors, nor am I sure I am doing it right. I am a beginner in Unix scripting. Can any experts guide me toward the correct way?
Here's an example using just find, sh, mv, basename, and date:
find ${POWERCENTER_FILE_DIR}MART*.csv ! -empty -execdir sh -c "mv {} \$(basename -s .csv {})_\$(date +%Y%m%d%H%M%S).csv" \;
I recommend reading Unix Power Tools for more ideas.
When it comes to shell scripting there is rarely a single/one/correct way to accomplish the desired task.
Oftentimes you may need to trade off between readability vs maintainability vs performance vs adhering-to-some-local-coding-standard vs shell-environment-availability (and I'm sure there are a few more trade-offs). So, fwiw ...
From your requirement that you're only interested in files with at least 1 row, I read this to also mean that you're only interested in files with size > 0.
One simple ksh script:
#!/bin/ksh
# define list of files
filelist=${POWERCENTER_FILE_DIR}/MART*.csv
# grab current datetime stamp
dtstamp=`date +%Y%m%d%H%M%S`
# for each file in our list ...
for file in ${filelist}
do
# each file should have ${POWERCENTER_FILE_DIR} as a prefix;
# uncomment 'echo' line for debugging purposes to display
# the contents of the ${file} variable:
#echo "file=${file}"
# '-s <file>' => file exists and size is greater than 0
# '! -s <file>' => file doesn't exist or size is equal to 0, eg, file is empty in our case
#
# if the file is empty, skip/continue to next file in loop
if [ ! -s "${file}" ]
then
continue
fi
# otherwise strip off the '.csv'
filebase=${file%.csv}
# copy our current file to a new file containing the datetime stamp;
# keep in mind that each ${file} already contains the contents of the
# ${POWERCENTER_FILE_DIR} variable as a prefix; uncomment 'echo' line
# for debug purposes to see what the cp command looks like:
#echo "cp command=cp ${file} ${filebase}.${dtstamp}.csv"
cp ${file} ${filebase}.${dtstamp}.csv
done
A few good resources for learning ksh:
O'Reilly: Learning the Korn Shell
O'Reilly: Learning the Korn Shell, 2nd Edition (includes the newer ksh93)
at your UNIX/Linux command line: man ksh
A simplified script would be something like
#!/bin/bash
# Note I'm using bash above, can't guarantee (but I hope) it would work in ksh too.
for file in ${POWERCENTER_FILE_DIR}/*.csv # Check Ref [1]
do
    if [ "$(wc -l < "$file")" -ne 0 ] # at least one row? wc -l < file prints only the count; Check Ref [2]
    then
        mv "$file" "${file%.csv}_$(date +'%Y%m%d%H%M%S').csv" # Check Ref [3]
    fi
done
References
File Globbing [1]
Command Substitution [2]
Parameter Substitution [3]