Error message "date: Argument list too long" bash - bash

So, everything works fine in the code, except for one tiny little thing.
This part:
if [ "$LIMITHOURS" -gt "0" -a "$LIMITHOURS" -lt "24" ]; then
x=$(($LIMITHOURS*60*60))
fi
SDATE=$( echo "01/jan/2003:11:00:06 +0100"| sed 's/[/]/ /g' |sed 's/:/ /')
EDATE=$(date --date "$SDATE - $x seconds" +"%d%m%Y%H%M%S")
#echo "$SDATE"
#echo "$EDATE"
while read LINE; do
CDATE=$( awk '{print $4}'| sed 's/[[]//' | sed 's/[/]//g' |sed 's/://g' )
DATE=$(date --date "$CDATE" +"%d%m%Y%H%M%S")
#echo "$CDATE"
done < "$FILENAME"
When I try to run the script, I get the error message "date: Argument list too long", and I know that the problem is in the while loop, with:
DATE=$(date --date "$CDATE" +"%d%m%Y%H%M%S")
Does anyone know a solution for this? I want the date in the format ddmmYYYYHHMMSS, e.g. 23102002120022.
You can find rest of the script here: http://pastebin.com/PMk2QDre

This code:
while read LINE; do
CDATE=$( awk '{print $4}'| sed 's/[[]//' | sed 's/[/]//g' |sed 's/://g' )
DATE=$(date --date "$CDATE" +"%d%m%Y%H%M%S")
#echo "$CDATE"
done < "$FILENAME"
will read one line from $FILENAME into the variable LINE, but awk is never given that line: it inherits the loop's standard input and consumes all the remaining lines of the file. The resulting CDATE value therefore holds the fourth field of every remaining line. It is not a single date, and it is probably too large to pass as a single command-line argument. You probably wanted
echo "$LINE" | awk '{print $4}' | ...
A simpler way to strip the undesirable characters from LINE, however, is
CDATE=${LINE//[\/[:]}
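As a hedged sketch of the corrected loop: the function below works only on the line that read already holds, so nothing else touches the loop's standard input. It assumes the fourth field has the usual access-log shape ([23/Oct/2002:12:00:22), and the name log_stamp is made up for this example. It builds the ddmmYYYYHHMMSS string with parameter expansion and a month lookup instead of calling date.

```bash
#!/usr/bin/env bash
# log_stamp LINE: print field 4's timestamp as ddmmYYYYHHMMSS.
# Assumes field 4 looks like "[23/Oct/2002:12:00:22" (hypothetical helper).
log_stamp() {
    local line=$1 stamp day mon year hh mm ss
    # Split the line we were given; no command here reads the loop's stdin.
    read -r _ _ _ stamp _ <<< "$line"
    stamp=${stamp#\[}                       # drop the leading "["
    IFS='/:' read -r day mon year hh mm ss <<< "$stamp"
    case $mon in                            # month name -> two-digit number
        [Jj]an) mon=01 ;; [Ff]eb) mon=02 ;; [Mm]ar) mon=03 ;;
        [Aa]pr) mon=04 ;; [Mm]ay) mon=05 ;; [Jj]un) mon=06 ;;
        [Jj]ul) mon=07 ;; [Aa]ug) mon=08 ;; [Ss]ep) mon=09 ;;
        [Oo]ct) mon=10 ;; [Nn]ov) mon=11 ;; [Dd]ec) mon=12 ;;
        *) return 1 ;;                      # not a month name we recognise
    esac
    printf '%s%s%s%s%s%s\n' "$day" "$mon" "$year" "$hh" "$mm" "$ss"
}

# Usage: feed it each line the loop reads.
while IFS= read -r LINE; do
    log_stamp "$LINE"
done <<'EOF'
127.0.0.1 - - [23/Oct/2002:12:00:22 +0100] "GET / HTTP/1.0" 200 512
EOF
```

For the sample line this prints 23102002120022. In the real script the here-document would be replaced by < "$FILENAME".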

Related

Using sed to extract the middle of a line/filename

I have multiple files named:
Genus_species_strain.fasta
I want to use sed to print out:
Genus
species
strain
I want to use the "printed" words in a command like this (prokka is a tool for genome annotation):
prokka $file --outdir `echo $file | sed s/\.fasta//` --genus `echo $file | sed s/_.*\.fasta//` --species `echo $file | sed <something here>` --strain `echo $file | sed <something here>`
I would appreciate the help. I am very new to all of this, and as you see above, I only know how to print out Genus.
Below I have some additional questions (no need to answer these if it only complicates things further). This is one of my attempts to print species, and the questions are the following:
sed s/.*_//1 | sed s/_.*\.fasta//
I know the second command isn't correct. I assume it needs to start from the second _, but I don't know how to do that, since the continuation (that is .fasta) is unique.
When used alone, sed s/.*_//1 returns strain.fasta. How to make it not skip the first _?
Combining commands (either as you see above, or with ;) doesn't seem to work for me.
You can use string splitting with string manipulation:
file='Genus_species_strain.fasta'
IFS='_.' read -r genus species strain _ <<< "$file"
outdir="${file%.*}"
Then you can use the variables in the command:
prokka "$file" --outdir "$outdir" --genus "$genus" --species "$species" --strain "$strain"
See this online demo:
#!/bin/bash
file='Genus_species_strain.fasta'
IFS='_.' read -r genus species strain _ <<< "$file"
echo "${file%.*}" # outdir
echo "$genus"
echo "$species"
echo "$strain"
Output:
Genus_species_strain
Genus
species
strain
One-liners without setting multiple variables
Using sed capture groups:
One liner
file='Genus_species_strain.fasta'
$(echo "$file" | sed 's/^\([^_]*\)_\([^_]*\)_\([^.]*\)\..*$/prokka & --outdir \1_\2_\3 --genus \1 --species \2 --strain \3/')
Using Bash string manipulation:
One liner
file='Genus_species_strain.fasta'
$(echo prokka "$file" --outdir "${file%.*}" --genus "${file%%_*}" --species "$(f=${file#*_}; echo "${f%%_*}")" --strain "$(f=${file#*_*_}; echo "${f%%.*}")")
Awk one liner
file='Genus_species_strain.fasta'
$(echo "$file" | awk -F'[_.]' '{print "prokka " $0 " --outdir " $1 "_" $2 "_" $3 " --genus " $1 " --species " $2 " --strain " $3}')
You can now use any of the above commands in a loop, or with xargs, with the file variable set to each filename in turn.
Each one builds a prokka command and executes it directly.
Hoping it works for you. Accept answer if it is more efficient
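One way to sketch such a loop without pasting generated text back into the shell is to collect the arguments in an array. The function name build_prokka_cmd and the array name cmd are invented for this example; the prokka options are the ones used in the question. The loop only prints each command (a dry run), so it is safe to try without prokka installed.

```bash
#!/usr/bin/env bash
# build_prokka_cmd FILE: fill the global array "cmd" with the prokka
# invocation for a file named Genus_species_strain.fasta.
# (Function and array names are hypothetical, chosen for this sketch.)
build_prokka_cmd() {
    local file=$1 base genus species strain
    base=${file##*/}        # strip any leading directory
    base=${base%.fasta}     # strip the extension
    # Split on "_"; anything after the second "_" stays in strain.
    IFS=_ read -r genus species strain <<< "$base"
    cmd=(prokka "$file" --outdir "$base" --genus "$genus"
         --species "$species" --strain "$strain")
}

for f in *.fasta; do
    [ -e "$f" ] || continue          # no matches: the glob stays literal
    build_prokka_cmd "$f"
    printf '%q ' "${cmd[@]}"; echo   # dry run: print instead of executing
    # "${cmd[@]}"                    # uncomment to actually run prokka
done
```

Keeping the arguments in an array means a filename containing spaces is still passed to prokka as a single argument.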
Using sed
$ file=path_to_file
$ sed "s/\(\([^_]*\)_\([^_]*\)_\([^.]*\)\).*/prokka $file --outdir \1 --genus \2 --species \3 --strain \4/e" <(echo *.fasta)
Output of command executed
prokka path_to_file --outdir Genus_species_strain --genus Genus --species species --strain strain

Extracting a substring from a variable using bash script

I have a bash variable with value something like this:
10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
There are no spaces within value. This value can be very long or very short. Here pairs such as 65:3.0 exist. I know the value of a number from the first part of pair, say 65. I want to extract the number 3.0 or pair 65:3.0. I am not aware of the position (offset) of 65.
I will be grateful for a bash-script that can do such extraction. Thanks.
Probably awk is the most straight-forward approach:
awk -F: -v RS=',' '$1==65{print $2}' <<< "$var"
3.0
Or to get the pair:
$ awk -F: -v RS=',' '$1==65' <<< "$var"
65:3.0
Here's a pure Bash solution:
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
while read -r -d, i; do
[[ $i = 65:* ]] || continue
echo "$i"
done <<< "$var,"
You may use break after echo "$i" if there's only one 65:... in var, or if you only want the first one.
To get the value 3.0: echo "${i#*:}".
Other (pure Bash) approach, without parsing the string explicitly. I'm assuming you're only looking for the first 65 in the string, and that it is present in the string:
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
value=${var#*,65:}
value=${value%%,*}
echo "$value"
This will be very slow for long strings!
Same as above, but will output all the values corresponding to 65 (or none if there are none):
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
tmpvar=,$var
while [[ $tmpvar = *,65:* ]]; do
tmpvar=${tmpvar#*,65:}
echo "${tmpvar%%,*}"
done
Same thing, this will be slow for long strings!
The fastest I can obtain in pure Bash is my original answer (and it's fine with 10000 fields):
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
IFS=, read -ra ary <<< "$var"
for i in "${ary[@]}"; do
[[ $i = 65:* ]] || continue
echo "$i"
done
In fact, no, the fastest I can obtain in pure Bash is with this regex:
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
[[ ,$var, =~ ,65:([^,]+), ]] && echo "${BASH_REMATCH[1]}"
Test of this vs awk,
where the 65:3.0 is at the end:
printf -v var '%s:3.0,' {100..11000}
var+=65:42.0
time awk -F: -v RS=',' '$1==65{print $2}' <<< "$var"
shows 0m0.020s (rough average) whereas:
time { [[ ,$var, =~ ,65:([^,]+), ]] && echo "${BASH_REMATCH[1]}"; }
shows 0m0.008s (rough average too).
where the 65:3.0 is not at the end:
printf -v var '%s:3.0,' {1..10000}
time awk -F: -v RS=',' '$1==65{print $2}' <<< "$var"
shows 0m0.020s (rough average) and with early exit:
time awk -F: -v RS=',' '$1==65{print $2;exit}' <<< "$var"
shows 0m0.010s (rough average) whereas:
time { [[ ,$var, =~ ,65:([^,]+), ]] && echo "${BASH_REMATCH[1]}"; }
shows 0m0.002s (rough average).
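The regex approach can be wrapped in a small function so the key is not hard-coded. This is only a sketch: the name lookup is made up here, and the key is assumed to consist of plain digits because it is pasted into the regex unescaped.

```bash
#!/usr/bin/env bash
# lookup KEY LIST: print the value paired with KEY in a "k:v,k:v,..." list.
# Returns non-zero when KEY is absent. KEY is assumed to be plain digits,
# since it is inserted into the regex without escaping.
lookup() {
    local key=$1 list=$2
    local re=",${key}:([^,]+),"
    # Surrounding commas let the first and last pairs match like any other.
    [[ ",${list}," =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
lookup 65 "$var"      # a pair in the middle
lookup 10 "$var"      # the first pair
lookup 2312 "$var"    # the last pair
lookup 5 "$var" || echo "5: not found"
```

Because the pattern is anchored on commas at both ends, a key of 5 does not match inside 65.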
With grep:
grep -o '\b65\b[^,]*' <<<"$var"
65:3.0
Or
grep -oP '\b65\b:\K[^,]*' <<<"$var"
3.0
\K is a Perl-compatible regex escape (hence grep's -P option): it discards everything matched so far, so only the text after it is reported as the match.
Here is a GNU awk version:
awk -vRS="(^|,)65:" -F, 'NR>1{print $1}' <<< "$var"
3.0
try
echo $var | tr , '\n' | awk '/65/'
where
tr , '\n' turns each comma into a newline
awk '/65/' picks every line that contains 65 anywhere (so 165:2.0 or 10:65.0 would match as well)
or
echo $var | tr , '\n' | awk -F: '$1 == 65 {print $2}'
where
-F: use : as separator
$1 == 65 pick line with 65 as first field
{ print $2} print second field
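A quick way to see the difference between the two awk filters is to run both on a value chosen to contain near-misses. The sample string below is invented for this illustration and is not the value from the question.

```bash
#!/usr/bin/env bash
# Sample data with two near-misses: 65 as a value, and 165 as a key.
var='10:65.0,65:3.0,165:2.0'

# Substring match: every line containing "65" anywhere is kept.
loose=$(printf '%s\n' "$var" | tr ',' '\n' | awk '/65/')

# Field comparison: only lines whose first field equals 65 are kept.
exact=$(printf '%s\n' "$var" | tr ',' '\n' | awk -F: '$1 == 65')

printf 'loose:\n%s\n' "$loose"
printf 'exact:\n%s\n' "$exact"
```

The loose filter returns all three pairs, while the field comparison returns only 65:3.0.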
Using sed
sed -e 's/^.*,\(65:[0-9.]*\),.*$/\1/' <<<",$var,"
output:
65:3.0
There are two different ways to protect against 65:3.0 being the first-in-line or last-in-line. Above, commas are added to surround the variable providing for an occurrence regardless. Below, the Gnu extension \? is used to specify zero-or-one occurrence.
sed -e 's/^.*,\?\(65:[0-9.]*\),\?.*$/\1/' <<<$var
Both handle 65:3.0 regardless of where it appears in the string.
Try grep -E (egrep) like below:
echo "$myvar" | grep -Eo '\b65:[0-9]+\.[0-9]+'

Weird bash results using cut

I am trying to run this command:
./smstocurl SLASH2.911325850268888.911325850268896
smstocurl script:
#SLASH2.911325850268888.911325850268896
model=$(echo \&model=$1 | cut -d'.' -f 1)
echo $model
imea1=$(echo \&simImea1=$1 | cut -d'.' -f 2)
echo $imea1
imea2=$(echo \&simImea2=$1 | cut -d'.' -f 3)
echo $imea2
echo $model$imea1$imea2
Result Received
&model=SLASH2911325850268888911325850268896
Result Expected
&model=SLASH2&simImea1=911325850268888&simImea2=911325850268896
What am I missing here ?
You are cutting on the dot (.). In the first case, field 1 is everything before the first dot, which includes the &model= prefix, so it is printed.
In the other cases you ask for the 2nd and 3rd fields (-f 2, -f 3). The &simImea1= and &simImea2= prefixes sit before the first dot, in field 1, so they get cut off.
Instead, I would use something like this:
while IFS="." read -r model imea1 imea2
do
printf "&model=%s&simImea1=%s&simImea2=%s\n" "$model" "$imea1" "$imea2"
done <<< "$1"
Note the usage of printf and variables to have more control about what we are writing. Using a lot of escapes like in your echos can be risky.
Test
while IFS="." read -r model imea1 imea2; do printf "&model=%s&simImea1=%s&simImea2=%s\n" "$model" "$imea1" "$imea2"
done <<< "SLASH2.911325850268888.911325850268896"
Returns:
&model=SLASH2&simImea1=911325850268888&simImea2=911325850268896
Alternatively, this sed makes it:
sed -r 's/^([^.]*)\.([^.]*)\.([^.]*)$/\&model=\1\&simImea1=\2\&simImea2=\3/' <<< "$1"
by catching each block of words separated by dots and printing back.
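To see why the original cut calls lost their prefixes, and how little has to change if you want to keep cut, here is a small sketch. The variable name broken is only there for the demonstration.

```bash
#!/usr/bin/env bash
arg='SLASH2.911325850268888.911325850268896'

# Original order: prefix first, cut second. The prefix is part of the text
# before the first dot, so only field 1 can ever contain it.
broken=$(echo "&simImea1=$arg" | cut -d'.' -f 2)

# Reordered: cut the bare argument first, then prepend the prefix.
model="&model=$(echo "$arg" | cut -d'.' -f 1)"
imea1="&simImea1=$(echo "$arg" | cut -d'.' -f 2)"
imea2="&simImea2=$(echo "$arg" | cut -d'.' -f 3)"

echo "broken: $broken"
echo "$model$imea1$imea2"
```

The first echo shows the bare number without its prefix; the second prints the full expected string.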
You can also use this way
Run:
./program SLASH2.911325850268888.911325850268896
Script:
#!/bin/bash
String=`echo $1 | sed "s/\./\&simImea1=/"`
String=`echo $String | sed "s/\./\&simImea2=/"`
echo "&model=$String"
Output:
&model=SLASH2&simImea1=911325850268888&simImea2=911325850268896
awk way
awk -F. '{print "&model="$1"&simImea1="$2"&simImea2="$3}' <<< "SLASH2.911325850268888.911325850268896"
or
awk -F. '$0="&model="$1"&simImea1="$2"&simImea2="$3' <<< "SLASH2.911325850268888.911325850268896"
output
&model=SLASH2&simImea1=911325850268888&simImea2=911325850268896

Parsing date and time format - Bash

I have date and time format like this(yearmonthday):
20141105 11:30:00
I need to assign the year, month, day, hour and minute values to variables.
I can do it for year, day and hour like this:
year=$(awk '{print $1}' log.log | sed 's/^\(....\).*/\1/')
day=$(awk '{print $1}' log.log | sed 's/^.*\(..\).*/\1/')
hour=$(awk '{print $2}' log.log | sed 's/^\(..\).*/\1/')
How can I do this for month and minute?
--
And I need to do that for every line of my log file:
20141105 11:30:00 /bla/text.1
20141105 11:35:00 /bla/text.2
20141105 11:40:00 /bla/text.3
....
I'm trying to read this log file line by line and do this:
mkdir -p "/bla/backup/$year/$month/$day/$hour/$minute"
mv $file "/bla/backup/$year/$month/$day/$hour/$minute"
Here is my non-working code:
#!/bin/bash
LOG=/var/log/LOG
while read line
do
year=${line:0:4}
month=${line:4:2}
day=${line:6:2}
hour=${line:9:2}
minute=${line:12:2}
file=$(awk '{print $3}')
if [ -f "$file" ]; then
printf -v path "%s/%s/%s/%s/%s" $year $month $day $hour $minute
mkdir -p "/bla/backup/$path"
mv $file "/bla/backup/$path"
fi
done < $LOG
You don't need to call out to awk or date at all; use bash's substring operations:
d="20141105 11:30:00"
yr=${d:0:4}
mo=${d:4:2}
dy=${d:6:2}
hr=${d:9:2}
mi=${d:12:2}
printf -v dir "/bla/%s/%s/%s/%s/%s/" "$yr" "$mo" "$dy" "$hr" "$mi"
echo "$dir"
/bla/2014/11/05/11/30/
Or directly, without all the variables.
printf -v dir "/bla/%s/%s/%s/%s/%s/" "${d:0:4}" "${d:4:2}" "${d:6:2}" "${d:9:2}" "${d:12:2}"
Given your log file:
while read -r date time file; do
d="$date $time"
printf -v dir "/bla/%s/%s/%s/%s/%s/" "${d:0:4}" "${d:4:2}" "${d:6:2}" "${d:9:2}" "${d:12:2}"
mkdir -p "$dir"
mv "$file" "$dir"
done < filename
or, making a big assumption that there are no whitespace or globbing characters in your filenames:
sed -r 's#(....)(..)(..) (..):(..):.. (.*)#mkdir -p /bla/\1/\2/\3/\4/\5 \&\& mv \6 /bla/\1/\2/\3/\4/\5#' filename | sh
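The approaches above trust every log line to have the expected shape. As a hedged sketch, the function below checks the shape with a bash regex first and only then builds the target directory. The names parse_log_line, dir and file are chosen for this example, and the /bla/backup prefix is taken from the question. The loop prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# parse_log_line LINE: on success set the globals "dir" and "file"
# (names chosen for this sketch); return 1 if LINE has the wrong shape.
parse_log_line() {
    local re='^([0-9]{4})([0-9]{2})([0-9]{2}) ([0-9]{2}):([0-9]{2}):[0-9]{2} (.+)$'
    [[ $1 =~ $re ]] || return 1
    local m=("${BASH_REMATCH[@]}")   # m[1]..m[5]: date parts, m[6]: file
    dir="/bla/backup/${m[1]}/${m[2]}/${m[3]}/${m[4]}/${m[5]}"
    file=${m[6]}
}

while IFS= read -r line; do
    if parse_log_line "$line"; then
        # Dry run: print the commands instead of running them.
        printf 'mkdir -p %q && mv %q %q\n' "$dir" "$file" "$dir"
    else
        printf 'skipping malformed line: %s\n' "$line" >&2
    fi
done <<'EOF'
20141105 11:30:00 /bla/text.1
not a log line
EOF
```

Once the printed commands look right, the printf in the loop can be replaced by the real mkdir -p "$dir" && mv "$file" "$dir".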
The date command can also do this:
#!/bin/bash
year=$(date +'%Y' -d'20141105 11:30:00')
day=$(date +'%d' -d'20141105 11:30:00')
month=$(date +'%m' -d'20141105 11:30:00')
minutes=$(date +'%M' -d'20141105 11:30:00')
echo "$year---$day---$month---$minutes"
You can do each extraction with a single awk call (substr positions start at 1):
month=$(awk '{print substr($1,5,2)}' log.log)
year=$(awk '{print substr($1,1,4)}' log.log)
minute=$(awk '{print substr($2,4,2)}' log.log)
etc
I guess you are processing a log file in which each line starts with the date string. You may already have written a loop to handle each line; inside that loop, you could do:
d="$(awk '{print $1,$2}' <<<"$line")"
year=$(date -d"$d" +%Y)
month=$(date -d"$d" +%m)
day=$(date -d"$d" +%d)
min=$(date -d"$d" +%M)
Don't repeat yourself.
d='20141105 11:30:00'
IFS=' ' read -r year month day min < <(date -d"$d" '+%Y %m %d %M')
echo "year: $year"
echo "month: $month"
echo "day: $day"
echo "min: $min"
The trick is to ask date to output the fields you want, separated by a character (here a space), to put this character in IFS and ask read to do the splitting for you. Like so, you're only executing date once and only spawn one subshell.
If the date comes from the first line of the file log.log, here's how you can assign it to the variable d:
IFS= read -r d < log.log
eval "$(
echo '20141105 11:30:00' \
| sed 'G;s/\(....\)\(..\)\(..\) \(..\):\(..\):\(..\) *\(.\)/Year=\1\7Month=\2\7Day=\3\7Hour=\4\7Min=\5\7Sec=\6/'
)"
This builds a string of assignments and passes it to eval. You could easily adapt it to validate the content as well, by replacing each dot with a more specific pattern such as [0-5][0-9] for the minutes and seconds.
This is the POSIX version, so use --posix with GNU sed.
I wrote a function that I usually cut and paste into my script files
function getdate()
{
local a
a=($(date "+%Y %m %d %H %M %S"))
year=${a[0]}
month=${a[1]}
day=${a[2]}
hour=${a[3]}
minute=${a[4]}
sec=${a[5]}
}
In the script file, on a line of its own:
getdate
echo "year=$year,month=$month,day=$day,hour=$hour,minute=$minute,second=$sec"
Of course, you can modify what I provided or use answer [6] above.
The function takes no arguments.

shell $ character overwrites the previous variable

I'm trying to write a shell script but I'm stuck at one point.
I have a program that creates data daily and puts it in a directory like this: home/meee/data/2013/07/22/mydata
My problem is I'm trying to change directory using date. Here is my script:
#!/bin/sh
x=$(date -u -v-2H "+%Y-%m-%d")
echo $x
year=$(echo $x | cut -d"-" -f1)
month=$(echo $x | cut -d"-" -f2)
day=$(echo $x | cut -d"-" -f3)
echo $year
echo $month
echo $day
d1='/home/sensor/data/'${year}/${month}
echo $d1
There is no problem with year, month and day themselves; they work. But the output of echo $d1 is /07me/sensor/data/2013. Similarly, when I write echo $year$day it gives 2312 (the characters of day overwrite the first two characters of the year).
I tried many other syntaxes, such as using " instead of ', or no quotes at all, or removing the { and so on, but nothing changed.
In short, when I write two variables ($var1 $var2) on the same line, the second one behaves as if the cursor went back to the beginning of the line and started overwriting the first variable.
I've been searching for this but couldn't find anything related, and there are a lot of solutions on Stack Overflow that solve similar problems using $var1$var2.
What am I doing wrong, and how can I solve it?
I'm working on FreeBSD 9.0-RELEASE amd64 and using sh
Any help will be appreciated.
Thanks
Somehow, your commands are introducing carriage returns to your variables, which affect the output when the variable is not the last thing echoed. You can confirm this by passing the value through hexdump or od:
printf "%s" "$x" | hexdump -C # Look for 0d in the output.
printf "%s" "$year" | hexdump -C # Look for 0d in the output.
printf "%s" "$month" | hexdump -C # Look for 0d in the output.
printf "%s" "$day" | hexdump -C # Look for 0d in the output.
I don't think this will fix the problem, but you can get the year, month, and day without forking so many external programs:
IFS=- read year month day <<EOF
$(date -u -v-2H "+%Y-%m-%d")
EOF
or more simply
read year month day <<EOF
$(date -u -v-2H "+%Y %m %d")
EOF
You’re likely using MinGW or some other not-quite-Unix environment for Windows®, which is introducing Carriage Return (CR, \r) characters at end-of-line (from the Unix PoV).
So change this to either:
x=$(date -u -v-2H "+%Y-%m-%d" | sed $'s/\r$//')
echo $x
year=$(echo $x | cut -d"-" -f1 | sed $'s/\r$//')
month=$(echo $x | cut -d"-" -f2 | sed $'s/\r$//')
day=$(echo $x | cut -d"-" -f3 | sed $'s/\r$//')
Or, even better:
x=$(date -u -v-2H "+%Y %m %d ")
echo $x
set -- $x
year=$1
month=$2
day=$3
Note the extra space after %d which ensures that the CR will become $4 instead of attached to the day.
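Another hedged sketch, which strips the carriage return with parameter expansion instead of sed. The date output is simulated with printf so that the example behaves the same everywhere. It sticks to POSIX constructs, so it should also run under plain sh.

```bash
#!/bin/sh
cr=$(printf '\r')                 # a literal carriage return
x=$(printf '2013-07-22\r\n')      # simulated date output with a stray CR
x=${x%"$cr"}                      # remove one trailing CR, if present

# Split on "-" using the positional parameters.
old_ifs=$IFS
IFS=-
set -- $x
IFS=$old_ifs
year=$1 month=$2 day=$3

d1="/home/sensor/data/$year/$month"
printf '%s\n' "$d1"
```

With the CR removed before splitting, none of the three variables carries it, so concatenating them no longer overwrites the start of the line.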

Resources