Is there a way to get the top 5 keywords for each day of the current month?
I'd like to receive a dataset like the following as the result.
Supposing today is December 4, I want to retrieve data for December 1 through 4, limiting the number of keywords per day to 5:
Day  Keyword   Visits
----------------------
01   keyword1  703
01   keyword2  688
01   keyword3  115
02   keyword1  109
02   keyword2  66
02   keyword3  53
02   keyword4  40
02   keyword5  23
03   keyword1  23
03   keyword2  19
03   keyword3  17
04   keyword1  14
04   keyword2  14
What I've done so far is set the following parameters
(you can test them here if you have a Google Analytics account: http://code.google.com/intl/it-IT/apis/analytics/docs/gdata/gdataExplorer.html):
Dimensions: ga:day,ga:keyword
Metrics: ga:visits
Filters: ga:medium==organic;ga:keyword!=(not provided)
Sort: ga:day,-ga:visits,ga:keyword
Now I just need a way to limit the number of keywords per day (if possible).
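If the query itself cannot cap the rows per day, one option is to trim the result on the client side. The sketch below assumes the rows have been exported as whitespace-separated "day keyword visits" lines, already sorted by ga:day,-ga:visits as in the query above. top_n_per_day is a hypothetical helper name, not part of any Google API.

```shell
# top_n_per_day N: keep at most N rows per day.
# Assumes stdin is already sorted by day, then by visits descending,
# and that the day is the first whitespace-separated field.
top_n_per_day() {
  awk -v n="$1" '++seen[$1] <= n'
}

# Example: keep the top 2 keywords of each day
printf '%s\n' \
  '01 keyword1 703' '01 keyword2 688' '01 keyword3 115' \
  '02 keyword1 109' '02 keyword2 66' |
  top_n_per_day 2
```

Because the input is already sorted, the first N rows seen for a day are its top N, so a single counter per day is enough.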
I have CSV files which contain date information in three separate columns which I would like to combine. The information I have is:
Two digit year (field 2)
Week number (field 3)
Day of week number (field 4)
How can I convert these 3 numbers into normal date format of the form YYYYMMDD?
My input file looks like:
740054,17,40,1,0000000000001,25,25,test1,1
740054,17,40,2,0000000000001,24,24,test2,1
740054,17,40,4,0000000000001,19,19,test3,1
And the expected output I would like to have is:
740054,20171002,0000000000001,25,25,test1,1
740054,20171003,0000000000001,24,24,test2,1
740054,20171005,0000000000001,19,19,test3,1
As an example for the first line: October 2, 2017 is the Monday (1) of the 40th week of the year 2017
Does anybody know how to do such a conversion?
I will assume that your week number follows the ISO 8601 definition. This standard is widely used around the world: in the EU and most other European countries, most of Asia, and Oceania.
The ISO 8601 standard states the following:
There are 7 days in a week
The first day of the week is a Monday
The first week is the first week of the year which contains a Thursday. This means it is the first week with 4 days or more in January.
With this definition, a year can have a week number 53. This shows up when the first of January falls on a Friday (e.g. 2016-01-01, 2010-01-01) or, if the year before was a leap year, also on a Saturday (e.g. 2005-01-01): those days still belong to week 53 of the preceding year.
     December 2015              January 2016
Mo Tu We Th Fr Sa Su CW    Mo Tu We Th Fr Sa Su CW
    1  2  3  4  5  6 49                 1  2  3 53
 7  8  9 10 11 12 13 50     4  5  6  7  8  9 10 01
14 15 16 17 18 19 20 51    11 12 13 14 15 16 17 02
21 22 23 24 25 26 27 52    18 19 20 21 22 23 24 03
28 29 30 31          53    25 26 27 28 29 30 31 04
Given the year, week_number and day_of_week, how can we reconstruct the date? The answer requires several steps and will compute the day of the year (doy) of the requested date.
To compute the day of the year doy we first need to figure out when the first-week starts as explained above. If Jan 01 is a Tuesday, then the first week only contains 6 days and not 7, while if Jan 01 is a Friday, the first week starts only the week after. So we can solve this by adding an offset. The offset can be found in the following table:
dow001 str:  Mo  Tu  We  Th  Fr  Sa  Su
dow001 num:  01  02  03  04  05  06  07
offset    :   0  -1  -2  -3   3   2   1
and this offset is computed as 3-(dow001+2)%7
So with this, the day of the year is very easily computed:
doy = (week_number-1) * 7 + 3-(dow001+2)%7 + day_of_week
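As a quick check of the formula, here is a minimal shell sketch that evaluates it with plain integer arithmetic. iso_doy is a hypothetical helper name, and the weekday of January 1st is passed in by hand rather than computed.

```shell
# iso_doy DOW001 WEEK DOW: day of the year for an ISO week date.
# DOW001 is the ISO weekday (1 = Monday ... 7 = Sunday) of January 1st.
# All arguments are plain integers without leading zeros.
iso_doy() {
  echo $(( ($2 - 1) * 7 + 3 - ($1 + 2) % 7 + $3 ))
}

iso_doy 7 40 1   # 2017 starts on a Sunday; week 40, Monday -> 275 (October 2)
iso_doy 5 1 1    # 2016 starts on a Friday; week 1, Monday  -> 4 (January 4)
```

The second call matches the calendar above: the Monday of week 01 in 2016 is January 4.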
So having this, we can write the following GNU awk tool:
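awk 'function compute_date(YYYY,CW,DOW,    dow001,doy) {
    # ISO day of the week (1 = Monday ... 7 = Sunday) of January 1st
    dow001 = strftime("%u",mktime(YYYY " 01 01 00 00 00"))
    # day of the year of the requested date
    doy = (CW-1)*7 + (3 - (dow001+2)%7) + DOW
    # mktime normalizes "day doy of January" to the real calendar date
    return strftime("%Y%m%d",mktime(YYYY " 01 " doy " 00 00 00"))
}
BEGIN { FS = OFS = "," }
{ datestr = compute_date(2000+$2,$3,$4) }
{ print $1, datestr , $5,$6,$7,$8,$9 }' file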
740054,20171002,0000000000001,25,25,test1,1
740054,20171003,0000000000001,24,24,test2,1
740054,20171005,0000000000001,19,19,test3,1
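The result can be cross-checked in the other direction. This sketch assumes GNU date (for the -d option) and prints the ISO year, ISO week number and ISO weekday of a calendar date. iso_parts is a hypothetical helper name.

```shell
# iso_parts YYYY-MM-DD: print ISO year, week and weekday as YYYY-Www-D.
# Assumes GNU date; %G is the ISO year, %V the ISO week, %u the ISO weekday.
iso_parts() {
  date -d "$1" +%G-W%V-%u
}

iso_parts 2017-10-02   # 2017-W40-1
iso_parts 2016-01-01   # 2015-W53-5
```

The second call shows the week 53 case from the calendar above: January 1, 2016 still belongs to ISO year 2015.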
I am missing some episodes of the TV series Friends, and I would like to know how many files I am missing per season. I would like to print out the last episode of each season and the number of files for each season.
The files have the format:
Friends S01E01 The Pilot.mkv
Friends S10E11 The One Where the Stripper Cries.mkv
The following bash script/oneliner should give you what you need, with details because it might help if you have the last episode of a season but earlier episodes are missing:
#!/bin/bash
ls Friends* | cut -c10-14 | \
awk -F'E' '{arr[$1]=arr[$1]" "$2; num[$1]++;} END { for (i in arr) printf "Season %s (%2d files) : %s\n", i, num[i], arr[i] }' | \
sort
Using awk, arrays with index being the number of the season are incremented to count the number of episodes, and also print the list of episode numbers so you can easily see which ones are missing. I used cut with columns 10 to 14 because in this case, we can safely assume that the numbers are where we want them.
The output is as follows:
Season 01 ( 9 files) : 01 02 03 04 05 06 07 08 09
Season 02 (10 files) : 01 02 03 04 05 06 07 08 09 10
Season 03 (10 files) : 01 02 03 04 05 06 07 08 09 10
Season 04 (10 files) : 01 02 03 04 05 06 07 08 09 10
Season 05 ( 9 files) : 01 03 04 05 06 07 08 09 10
Season 06 (10 files) : 01 02 03 04 05 06 07 08 09 10
Season 07 (10 files) : 01 02 03 04 05 06 07 08 09 10
Season 08 (10 files) : 01 02 03 04 05 06 07 08 09 10
Season 09 (10 files) : 01 02 03 04 05 06 07 08 09 10
Season 10 ( 7 files) : 01 02 03 04 05 06 10
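Building on the same awk idea, the gaps can also be listed explicitly instead of being read off by eye. This is only a sketch: missing_episodes is a hypothetical helper, it assumes every file name contains a code of the form SxxEyy, and it can only detect gaps below the highest episode number present in each season.

```shell
# missing_episodes: read file names on stdin and report, per season,
# the highest episode number seen and the episode numbers missing below it.
missing_episodes() {
  awk '
    match($0, /S[0-9][0-9]E[0-9][0-9]/) {
      code = substr($0, RSTART, RLENGTH)   # e.g. S05E03
      s = substr(code, 2, 2)               # season, kept as text: 05
      e = substr(code, 5, 2) + 0           # episode as a number: 3
      have[s, e] = 1
      if (e > last[s]) last[s] = e
    }
    END {
      for (s in last) {
        gaps = ""
        for (e = 1; e < last[s]; e++)
          if (!((s, e) in have)) gaps = gaps sprintf(" %02d", e)
        printf "Season %s: last E%02d, missing:%s\n", s, last[s], gaps
      }
    }' | sort
}

# Usage: ls Friends* | missing_episodes
```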
The following bash script will work:
#!/bin/bash
for i in {01..10}
do
ls Friends\ S$i* | tail -n 1
ls Friends\ S$i* | wc -l
printf "\n"
done
It will produce results as follows:
Friends S01E24 The One Where Rachel Finds Out.mkv
24
Friends S02E24 The One with Barry and Mindy's Wedding.mkv
24
I am trying to remove leading zeroes from a Bash array. I have an array like this:
echo "${DATES[@]}"
returns
01 02 02 03 04 07 08 09 10 11 13 14 15 16 17 18 20 21 22 23
I'd like to remove the leading zeroes from the dates and store them back into the array or into another array, so I can iterate over it in another step. Any suggestions?
I tried this,
for i in "${!DATES[@]}"
do
DATESLZ["$i"]=(echo "{DATES["$i"]}"| sed 's/0*//' )
done
but it failed (sorry, I'm an old Java programmer who was tasked to do some Bash scripts)
Use parameter expansion:
DATES=( "${DATES[@]#0}" )
With bash arithmetic, you can avoid the octal woes by specifying your numbers are base-10:
day=08
((day++)) # bash: ((: 08: value too great for base (error token is "08")
((day = 10#$day + 1))
echo $day # 9
printf "%02d\n" $day # 09
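The same base-10 prefix can be applied to every element of the array. A minimal sketch, assuming bash and elements that contain only digits (the sample values are made up for illustration):

```shell
#!/bin/bash
DATES=(01 02 08 09 10 23)
DATESLZ=()
for d in "${DATES[@]}"; do
  # 10# forces base 10, so 08 and 09 are not rejected as invalid octal
  DATESLZ+=( "$((10#$d))" )
done
echo "${DATESLZ[@]}"   # 1 2 8 9 10 23
```

Unlike the single-character expansion, this also strips several leading zeroes at once.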
You can use bash parameter expansion (see http://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html) like this:
echo "${DATES[@]#0}"
The expansion ${onedate%%[!0]*} selects all the 0's at the front of the string $onedate.
We can then remove those zeros like this (it is portable):
echo "${onedate#"${onedate%%[!0]*}"}"
For your case (only bash):
#!/bin/bash
dates=( 01 02 02 08 10 18 20 21 0008 00101 )
for onedate in "${dates[@]}"; do
echo -ne "${onedate}\t"
echo "${onedate#"${onedate%%[!0]*}"}"
done
Will print:
$ script.sh
01 1
02 2
02 2
08 8
10 10
18 18
20 20
21 21
0008 8
00101 101
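One edge case is worth noting: a value made only of zeros (such as 00) is reduced to an empty string by this expansion. A small sketch that wraps the same portable expansion and falls back to 0 in that case. strip_zeros is a hypothetical helper name.

```shell
# strip_zeros VALUE: remove all leading zeros using only POSIX expansions.
# A value consisting solely of zeros is printed as a single 0.
strip_zeros() {
  s=${1#"${1%%[!0]*}"}
  printf '%s\n' "${s:-0}"
}

strip_zeros 0008    # 8
strip_zeros 00101   # 101
strip_zeros 00      # 0
```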
How can I add leading zeros to a bash range?
For example, I need to loop over 01,02,03,..,29,30
How can I implement this using bash?
In bash 4.0 and later you can do:
echo {01..30}
Output:
01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
Or if it should be comma separated:
echo {01..30} | tr ' ' ','
Which can also be accomplished with parameter expansion:
a=$(echo {01..30})
echo ${a// /,}
Output:
01,02,03,04,05,06,07,08,09,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30
Another seq trick will work:
seq -w 30
If you check the man page, you will see that the -w option does exactly what you need:
-w, --equal-width
equalize width by padding with leading zeroes
You can use seq's format option:
seq -f "%02g" 30
A "pure bash" way would be something like this:
echo {0..2}{0..9}
This will give you the following:
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
Removing the first 00 and adding the last 30 is not too hard!
This works:
printf " %02d" $(seq 1 30)
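For shells that have neither zero-padded brace expansion nor seq, a plain loop with printf gives the same result. This is a sketch under those assumptions, and padded_range is a hypothetical helper name.

```shell
# padded_range FIRST LAST: print FIRST..LAST, one per line, zero-padded to 2 digits.
# Pass the bounds as plain integers (8, not 08) so that shell arithmetic
# does not interpret them as octal.
padded_range() {
  i=$1
  while [ "$i" -le "$2" ]; do
    printf '%02d\n' "$i"
    i=$((i + 1))
  done
}

for n in $(padded_range 1 30); do
  echo "processing $n"
done
```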
cat file
01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16
I tried with
column -c 4 file
to get an output with 4 columns, but it didn't work - I just get the same as the input.
Do I misunderstand the column man-page?
A second question: what format should the argument to the -s flag have?
Give this a try:
fold -w 12 file
The number 12 is the number of data columns * the number of characters in a column (two digits + one space). The -w option is for designating a screen width in terms of character columns.
The column command won't work for this because it's intended to format newspaper-style columns.
This comes close to working the way you want:
sed 's/ /\n/g' file | column -xc 35
The "35" is somewhat arbitrary, but any value from 32 to 39 will work in this case. It's related to the width of the fields (2 characters which is less than the width of a tab stop), the number of fields desired per line and the width of tab stops (8 characters). So, basically, 8 * 4 is 32.
Here's a demonstration of the -s option (which is used with -t):
$ echo -e "a;b|c\naaaaa;bbbbb|ccccc"|column -t -s ';|'
a      b      c
aaaaa  bbbbb  ccccc
Without using column, the output looks like:
$ echo -e "a;b|c\naaaaa;bbbbb|ccccc"
a;b|c
aaaaa;bbbbb|ccccc
Let's guess you want:
01 02 03 04
05 06 07 08
09 10 11 12
13 14 15 16
In this case:
$ xargs -n4 < file
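If the fields are not all the same width, or the number of columns should be a parameter, the same reshaping can be done in awk. A minimal sketch, where to_columns is a hypothetical helper name:

```shell
# to_columns N: read whitespace-separated fields from stdin and
# print them N per line, separated by single spaces.
to_columns() {
  awk -v n="$1" '
    {
      for (i = 1; i <= NF; i++) {
        # separator goes before each field: nothing, a space, or a line break
        printf "%s%s", (c % n ? " " : (c ? "\n" : "")), $i
        c++
      }
    }
    END { if (c) print "" }'
}

# Usage: to_columns 4 < file
```

Unlike fold -w, this does not depend on every field having the same number of characters.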