How to get top N count by each group in Hive - Hadoop

There is some data in the following format:
url | ip
-----------------------+-----------------
http://aaa.com/ | 1.1.1.1
http://bbb.com/ | 1.2.3.5
http://ccc.com/ | 1.1.1.6
http://ddd.com/ | 1.2.3.4
http://ccc.com/ | 1.1.1.2
http://ccc.com/ | 1.1.1.2
http://ccc.com/ | 1.1.1.2
http://aaa.com/ | 1.1.1.1
http://bbb.com/ | 1.2.3.5
I am now trying to count the ip column and get the top N IPs in each group by url, like:
url | ip | ipcount
-----------------------+-----------------+-----------------
http://aaa.com/ | 1.1.1.1 | 2
http://aaa.com/ | 5.6.7.8 | 1
http://bbb.com/ | 1.2.3.5 | 2
http://ccc.com/ | 1.1.1.2 | 3
http://ccc.com/ | 1.1.1.6 | 1
http://ddd.com/ | 1.2.3.4 | 1
Please tell me how I can write an HQL query to implement this in Hive.
Update: sorry, I forgot to mention that I should get only the top N records in each group, like ...

Try: SELECT url, ip, COUNT(url) FROM tbl GROUP BY url, ip

SELECT url, ip, COUNT(*) AS ipcount
FROM tbl t
GROUP BY url, ip
This should work in your case.
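For the updated requirement (only the top N rows per url), a minimal sketch assuming Hive 0.11+ windowing functions, with tbl as a placeholder table name and N = 2: count per (url, ip), rank the counts within each url with ROW_NUMBER(), and keep the top ranks.
SELECT url, ip, ipcount
FROM (
    SELECT url, ip, ipcount,
           ROW_NUMBER() OVER (PARTITION BY url ORDER BY ipcount DESC) AS rn
    FROM (
        -- count hits per (url, ip) pair
        SELECT url, ip, COUNT(*) AS ipcount
        FROM tbl
        GROUP BY url, ip
    ) counts
) ranked
WHERE rn <= 2   -- keep the top 2 IPs per url; change 2 to your N
ORDER BY url, ipcount DESC;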

Related

How to properly apply an index_desc hint for a remote database object in Oracle Database?

We were facing some issues with execution plans while accessing remote database objects over a dblink. Here is the query itself, run on the remote database:
select --+ index_desc (d DAY_OPERATIONAL_PK)
d.oper_day
from day_operational d
where rownum = 1
The plan for this query is the following:
Plan Hash Value : 2761870770
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 2 | 00:00:01 |
| * 1 | COUNT STOPKEY | | | | | |
| 2 | INDEX FULL SCAN DESCENDING | DAY_OPERATIONAL_PK | 1 | 8 | 2 | 00:00:01 |
---------------------------------------------------------------------------------------------
This one works correctly, that is, it returns the last operational day, in this case 14.09.2021. However, if we execute this exact same query from another database, connecting to this one via dblink, wrong results are returned: in this case the first row of the table is returned, 05.09.2009.
Here is the query:
select --+ index_desc (d DAY_OPERATIONAL_PK)
d.oper_day
from day_operational@iabs d
where rownum = 1
The plan generated for this query in the local database is the following:
Plan Hash Value :
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT REMOTE | | 1 | 8 | 2 | 00:00:01 |
| * 1 | COUNT STOPKEY | | | | | |
| 2 | INDEX FAST FULL SCAN | XPKDAY_OPERATIONAL | 1 | 8 | 2 | 00:00:01 |
---------------------------------------------------------------------------------------
As can be seen, the plan generated when connecting via dblink falls back to an index fast full scan rather than the descending scan, and ignores the index_desc hint. How can we force Oracle to use this index? We tried adding the driving_site hint, but it didn't help.
Sorry for the confusion. It turned out that the index names in the local and remote databases are not the same, so the hint was ignored since there was no index with that name in the remote DB. With the right index name, both the plan and the result were correct.
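For illustration, a sketch of the corrected query, assuming the primary key index on the remote side is actually named XPKDAY_OPERATIONAL (the name that showed up in the dblink plan above); substitute whatever the index is really called in the remote database:
-- hypothetical: XPKDAY_OPERATIONAL stands in for the remote index name
select --+ index_desc (d XPKDAY_OPERATIONAL)
d.oper_day
from day_operational@iabs d
where rownum = 1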

How do you measure query performance in Cassandra?

I am still new to Cassandra, and I am doing some measurements regarding memory and processor resources for each of my queries. Does Cassandra have its own way to show query performance, or should I use some third-party tool?
You can switch tracing on to see the internal steps:
TRACING ON
For the query below
INSERT INTO cycling.cyclist_name (
id,
lastname,
firstname
)
VALUES (
e7ae5cf3-d358-4d99-b900-85902fda9bb0,
'FRAME',
'Alex'
);
Below is the trace log:
Tracing session: 9b378c70-b114-11e6-89b5-b7fad52e1885
activity | timestamp | source | source_elapsed | client
-----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+-----------
Execute CQL3 query | 2016-11-22 16:34:34.300000 | 127.0.0.1 | 0 | 127.0.0.1
Parsing INSERT INTO cycling.cyclist_name (id, lastname, firstname) VALUES (e7ae5cf3-d358-4d99-b900-85902fda9bb0, 'FRAME','Alex'); [Native-Transport-Requests-1] | 2016-11-22 16:34:34.305000 | 127.0.0.1 | 5935 | 127.0.0.1
Preparing statement [Native-Transport-Requests-1] | 2016-11-22 16:34:34.308000 | 127.0.0.1 | 9199 | 127.0.0.1
Determining replicas for mutation [Native-Transport-Requests-1] | 2016-11-22 16:34:34.330000 | 127.0.0.1 | 30530 | 127.0.0.1
Appending to commitlog [MutationStage-3] | 2016-11-22 16:34:34.330000 | 127.0.0.1 | 30979 | 127.0.0.1
Adding to cyclist_name memtable [MutationStage-3] | 2016-11-22 16:34:34.330000 | 127.0.0.1 | 31510 | 127.0.0.1
Request complete | 2016-11-22 16:34:34.333633 | 127.0.0.1 | 33633 | 127.0.0.1
Reference link: https://docs.datastax.com/en/cql-oss/3.3/cql/cql_reference/cqlshTracing.html
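As a minimal cqlsh sketch of the workflow (keyspace and table taken from the DataStax example above): switch tracing on, run the statement, read the trace that is printed after the result, then switch tracing off so later queries are not slowed down.
TRACING ON;

INSERT INTO cycling.cyclist_name (id, lastname, firstname)
VALUES (e7ae5cf3-d358-4d99-b900-85902fda9bb0, 'FRAME', 'Alex');

-- the trace for the statement is printed right after its result
TRACING OFF;
If you want aggregate latency numbers rather than per-query traces, nodetool proxyhistograms and nodetool tablehistograms <keyspace> <table> are also worth a look.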

How to ssh into and issue command to list of ip addresses in a txt file Jenkins

I have a list of ip addresses in a text file that I wish to use in a script.
Here is the code that outputs the IP addresses to the text file:
openstack server list | grep agent | awk '{print \$9}' >> ${STACK}_list.txt
I would like to retrieve the IP addresses and use them in a loop by SSH'ing into them, but I am not sure how to do that.
Please refer to this post: script to read a file with IP addresses and login. It might be helpful for you.
You can use a regex to filter all IP addresses from the server list output:
openstack server list | grep -o '[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*'
You could pipe this output into a file if you need to, or to use it within a bash script you could do something like this, without writing it to a file:
#!/bin/bash
#
ADDRESSES=$(openstack server list | grep -o '[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*')
for ADDRESS in $ADDRESSES
do
echo "ip: $ADDRESS"
done
It reads all IP addresses from the server-list output, iterates over them in the for loop, and prints each IP separately on the terminal. Instead of the echo you could insert your ssh command (see the sketch after the example output below).
Example-server on my deployment:
root@m1r1:~# openstack server list
+--------------------------------------+-----------------------+--------+--------------------------+----------------+--------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+-----------------------+--------+--------------------------+----------------+--------+
| 46d04a77-4d33-4bb3-8214-b1444eed33a3 | server1 | ACTIVE | l2-network=192.168.4.131 | cirros | XS |
| e9489aca-00c3-4fc9-afc5-515c08b17406 | server2 | ACTIVE | l2-network=192.168.4.61 | | XS |
| ea8cec6a-a8d5-4bbb-970e-aaf65d7374b2 | server3 | ACTIVE | l2-network=192.168.4.163 | cirros | S |
| 7d934ec4-1d53-467b-9220-d67b4b68a832 | server4 | ACTIVE | l2-network=192.168.4.184 | | XS |
| 74d3036e-372a-4566-8ba2-10a0760c5562 | server5 | ACTIVE | l2-network=192.168.4.232 | cirros | XS |
| e08e1637-f4df-478d-a478-6578d038cb22 | server6 | ACTIVE | l2-network=192.168.4.190 | | XS |
| 8307a481-679e-4df0-a64e-3a497b13ac81 | server7 | ACTIVE | l2-network=192.168.4.202 | | XS |
| 38d10b12-daa5-483e-b9a5-9a16ba14d841 | server8 | ACTIVE | l2-network=192.168.4.250 | cirros | XS |
+--------------------------------------+-----------------------+--------+--------------------------+----------------+--------+
Output of this example:
ip: 192.168.4.131
ip: 192.168.4.61
ip: 192.168.4.163
ip: 192.168.4.184
ip: 192.168.4.232
ip: 192.168.4.190
ip: 192.168.4.202
ip: 192.168.4.250
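For example, a sketch of running an ssh command inside that loop; the user name (ubuntu), key path and remote commands are placeholders you would adjust to your environment:
#!/bin/bash
ADDRESSES=$(openstack server list | grep -o '[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*')
for ADDRESS in $ADDRESSES
do
    # placeholder user/key/command; BatchMode avoids hanging on a password
    # prompt if key-based auth fails, StrictHostKeyChecking skips the
    # interactive host-key confirmation on first connect
    ssh -o BatchMode=yes -o StrictHostKeyChecking=no -i ~/.ssh/id_rsa ubuntu@"$ADDRESS" 'hostname; uptime'
done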
#!/bin/sh
openstack server list | grep -o '[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*' > stack
while [ $(wc -l stack | cut -d' ' -f1) -gt 0 ]
do
ipnumber=$(sed -n '1p' stack)
echo "${ipnumber}"
sed -i '1d' stack
done
The echo command there is just a placeholder. You can replace it with ssh, or whatever else you want to do, with the IP number in the variable.

ArrayFormula - If cell contains match, combine other cells with TEXTJOIN

I have a Google Sheet that contains names of characters, together with corresponding values for the group name, "selected" and attack power. It looks like this:
Sheet1
| NAME | GROUP NAME | SELECTED | ATTACK POWER |
|:---------|:-----------|----------:|-------------:|
| guile | Team Red | 1 | 333 |
|----------|------------|-----------|--------------|
| blanka | Team Red | 1 | 50 |
|----------|------------|-----------|--------------|
| sagat | Team Red | | 500 |
|----------|------------|-----------|--------------|
| ruy | Team Blue | 1 | 450 |
|----------|------------|-----------|--------------|
| vega | Team Blue | 2 | 150 |
Sheet2
In my second sheet, I have two columns: Group name, which contains the name of each team from Sheet1, and Name, which contains my current ArrayFormula:
=ARRAYFORMULA(TEXTJOIN(CHAR(10); 1;
REPT('Sheet1'!A:A; 1*('Sheet1'!B:B=A2))))
Using this formula I can combine all characters into one cell (with textjoin, repeated with row breaks) based on the value in Group name. The result looks like the following:
| GROUP NAME | NAME |
|:-----------|:--------------------------|
| Team Red | guile |
| | blanka |
| | sagat |
|------------|---------------------------|
| Team Blue | ruy |
| | vega |
|------------|---------------------------|
The problem is that I only want to combine the characters that have a SELECTED value of 1. The end result should instead look like this:
| GROUP NAME | NAME |
|:-----------|:--------------------------|
| Team Red | guile |
| | blanka |
|------------|---------------------------|
| Team Blue | ruy |
|------------|---------------------------|
I tried the following setup using an IF statement, but it just returns a string of FALSE:
=ARRAYFORMULA(TEXTJOIN(CHAR(10); 1;
REPT(IF('Sheet1'!C:C="1";'Sheet1'!A:A); 1*('Sheet1'!B:B=A2))))
Can this be done?
paste in F2 cell:
=UNIQUE(FILTER(B:B, C:C=1))
paste in G2 cell and drag down:
=TEXTJOIN(CHAR(10), 1, FILTER(A:A, B:B=F2, C:C=1))
or G2 cell be like:
=ARRAYFORMULA(TEXTJOIN(CHAR(10), 1,
REPT(FILTER(Sheet1!A:A, Sheet1!C:C=1), 1*(FILTER(Sheet1!B:B, Sheet1!C:C=1)=F2))))

Moving dev site to production on new account AWS

I am in the process of moving and testing a development site on the actual domain name now and I just wanted to check if I was missing anything and also get some advice.
It is a Magento 1.8.1 install from Turnkey Linux running on an m1.medium instance.
What I have done (so far) is: created an image of the development instance, created a new account and copied the image over to it. I then allocated an Elastic IP and associated it with the new instance. Next I pointed the A record of the production domain to the Elastic IP.
Now, if I go to the production domain I get redirected to the development domain. Is there a reason for this?
Ideally I would like to have two instances: a dev one that is off unless needed and, of course, the production one which is going to be live 24/7. However, if I turn the development domain off it stops the other too.
I have a feeling it's just because I need to change references to the dev domain in the Magento database / back-end; however, I wanted to get a more knowledgeable answer as I don't want to break either of the instances.
Also, I should probably mention that the development domain is a subdomain i.e. shop.mysite.com and the live one is just normal i.e. mysite.com. Not entirely sure this is relevant but thought it worth a mention.
Thanks in advance for any help.
The reason your URL on your new instance is getting redirected to the old URL is that in the core_config_data table of your Magento database the web/unsecure/base_url and web/secure/base_url paths point to your old URL.
So if you are using MySQL you can query your database as follows:
mysql> use magento;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select * from core_config_data;
+-----------+---------+----------+-------------------------------+-------------------------------------+
| config_id | scope | scope_id | path | value |
+-----------+---------+----------+-------------------------------+-------------------------------------+
| 1 | default | 0 | web/seo/use_rewrites | 1 |
| 2 | default | 0 | admin/dashboard/enable_charts | 0 |
| 3 | default | 0 | web/unsecure/base_url | http://magento.myolddomain.com/ |
| 4 | default | 0 | web/secure/use_in_frontend | 1 |
| 5 | default | 0 | web/secure/base_url | https://magento.myolddomain.com/ |
| 6 | default | 0 | web/secure/use_in_adminhtml | 1 |
| 7 | default | 0 | general/locale/code | en_US |
| 8 | default | 0 | general/locale/timezone | Europe/London |
| 9 | default | 0 | currency/options/base | USD |
| 10 | default | 0 | currency/options/default | USD |
| 11 | default | 0 | currency/options/allow | USD |
| 12 | default | 0 | general/region/display_all | 1 |
| 13 | default | 0 | general/region/state_required | AT,CA,CH,DE,EE,ES,FI,FR,LT,LV,RO,US |
| 14 | default | 0 | catalog/category/root_id | 2 |
+-----------+---------+----------+-------------------------------+-------------------------------------+
14 rows in set (0.00 sec)
and you can change it as follows:
mysql> update core_config_data set value='http://magento.mynewdomain.com' where path='web/unsecure/base_url';
mysql> update core_config_data set value='https://magento.mynewdomain.com' where path='web/secure/base_url';
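To double-check that the update took effect, you can re-run a narrowed select:
mysql> select path, value from core_config_data where path in ('web/unsecure/base_url', 'web/secure/base_url');
After changing these rows, Magento 1.x typically also needs its cache cleared (for example by emptying var/cache/ under the Magento root) before the new base URLs are picked up.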
