Sort the output of a Linux command

Is there any way to sort the output of this command?
I would like to split the first column on the dot (".") and then sort on the instance size (e.g. "xlarge").
# curl 'https://ec2.shop?region=us-west-2&filter=m4,a1'
Instance Type Memory vCPUs Storage Network Price Monthly Spot Price
a1.xlarge 8 GiB 4 vCPUs EBS only Up to 10 Gigabit 0.1020 74.460 0.0334
m4.2xlarge 32 GiB 8 vCPUs EBS only High 0.4000 292.000 0.1398
...
m4.xlarge 16 GiB 4 vCPUs EBS only High 0.2000 146.000 0.0657
So that the results will look something like this...
a1.xlarge 8 GiB 4 vCPUs EBS only Up to 10 Gigabit 0.1020 74.460 0.0334
m4.xlarge 16 GiB 4 vCPUs EBS only High 0.2000 146.000 0.0657
This makes the results easier to compare: all xlarge instances should be grouped together, and all 2xlarge instances should be next to each other.

$ curl -s 'https://ec2.shop?region=us-west-2&filter=m4,a1' | sort -k2 -t.
Instance Type Memory vCPUs Storage Network Price Monthly Spot Price
m4.10xlarge 160 GiB 40 vCPUs EBS only 10 Gigabit 2.0000 1460.000 0.7443
m4.16xlarge 256 GiB 64 vCPUs EBS only 20 Gigabit 3.2000 2336.000 1.1896
a1.2xlarge 16 GiB 8 vCPUs EBS only Up to 10 Gigabit 0.2040 148.920 0.0667
m4.2xlarge 32 GiB 8 vCPUs EBS only High 0.4000 292.000 0.1398
a1.4xlarge 32 GiB 16 vCPUs EBS only Up to 10 Gigabit 0.4080 297.840 0.1335
m4.4xlarge 64 GiB 16 vCPUs EBS only High 0.8000 584.000 0.3199
a1.large 4 GiB 2 vCPUs EBS only Up to 10 Gigabit 0.0510 37.230 0.0167
m4.large 8 GiB 2 vCPUs EBS only Moderate 0.1000 73.000 0.0341
a1.medium 2 GiB 1 vCPUs EBS only Up to 10 Gigabit 0.0255 18.615 0.0083
a1.metal 32 GiB 16 vCPUs EBS only Up to 10 Gigabit 0.4080 297.840 0.1335
m4.xlarge 16 GiB 4 vCPUs EBS only High 0.2000 146.000 0.0657
a1.xlarge 8 GiB 4 vCPUs EBS only Up to 10 Gigabit 0.1020 74.460 0.0334
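
For reference, -t. makes the dot the field separator and -k2 sorts on the second dot-separated field (the size), so instances of the same size group together regardless of family; the header contains no dot, so its second field is empty and it sorts to the top. One side effect of plain lexicographic comparison is that 10xlarge and 16xlarge sort before 2xlarge, as seen above. If you also want the sizes in natural numeric order, GNU sort can apply version comparison to just that field; a variant, assuming GNU coreutils:

$ curl -s 'https://ec2.shop?region=us-west-2&filter=m4,a1' | sort -t. -k2,2V

With -k2,2V the key is restricted to field 2 and compared as a version string, ordering 2xlarge before 4xlarge before 10xlarge while still keeping equal sizes adjacent.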

Related

Performance of my MPI code does not improve when I use two NUMA nodes (dual Xeon chips)

I have a computer, a Precision Tower 7810 with dual Xeon E5-2680 v3 @ 2.50GHz, 48 threads.
Here is the result of lscpu:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Stepping: 2
CPU MHz: 1200.000
CPU max MHz: 3300.0000
CPU min MHz: 1200.0000
BogoMIPS: 4988.40
Virtualization: VT-x
L1d cache: 768 KiB
L1i cache: 768 KiB
L2 cache: 6 MiB
L3 cache: 60 MiB
NUMA node0 CPU(s): 0-11,24-35
NUMA node1 CPU(s): 12-23,36-47
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
My MPI code is based on basic MPI calls (Isend, Irecv, Wait, Bcast). Fundamentally, the data is distributed and sent to all processors. On each processor, the data is used to calculate something, and its value is changed. After the above procedure, the data on each processor is exchanged among all processors. This work is repeated up to a limit.
Now, the main issue is that performance increases as I add processes up to the capacity of one chip (24 threads). However, performance does not improve once the number of processes exceeds 24.
An example:
$mpiexec -n 6 ./mywork : 72s
$mpiexec -n 12 ./mywork : 46s
$mpiexec -n 24 ./mywork : 36s
$mpiexec -n 32 ./mywork : 36s
$mpiexec -n 48 ./mywork : 35s
I have tried both Open MPI and MPICH and obtained the same result, so I think the issue is the physical interconnect between the two chips (the NUMA nodes). This is only my assumption; I have never used a real supercomputer. I hope someone knows this issue and can help me. Thank you for reading.
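
One way to test the NUMA hypothesis is to control where the ranks are placed and compare timings. A sketch, assuming Open MPI's launcher options (MPICH's hydra launcher has a similar -bind-to flag):

# Keep all ranks on cores of one socket (fits within 24 hardware threads here)
$ mpiexec -n 24 --bind-to core --map-by core ./mywork
# Spread ranks evenly across both NUMA nodes and bind them there
$ mpiexec -n 48 --bind-to core --map-by numa ./mywork
# Print where each rank actually landed
$ mpiexec -n 4 --bind-to core --report-bindings ./mywork

If runs confined to one socket scale well while cross-socket runs do not, the bottleneck is likely memory bandwidth or the inter-socket link rather than the MPI library.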

Postgres configuration for better performance

We have built an ERP project based on a PostgreSQL database. I have a Windows Server 2012 R2 system with 32 GB RAM. Out of the 32 GB, I have given 8 GB to the JVM and, assuming 4 GB for the OS, I have tried to tune Postgres for the remaining 20 GB.
I worked out the configuration from the link below:
https://www.pgconfig.org/#/tuning?total_ram=20&max_connections=300&environment_name=OLTP&pg_version=9.2&os_type=Windows&arch=x86-64&share_link=true
But performance went down after the change. What could be the reason? As I have little knowledge of Postgres server maintenance, let me know if anything more is required for you to assess/answer.
UPDATE
shared_buffers (integer) : 512 MB
effective_cache_size (integer) : 15 GB
work_mem (integer): 68 MB
maintenance_work_mem (integer): 1 GB
checkpoint_segments (integer): 96
checkpoint_completion_target (floating): 0.9
wal_buffers (integer): 16 MB
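
When a tuning change makes things worse, it is worth confirming which values the server is actually using: edits to postgresql.conf only take effect after a reload, and shared_buffers requires a full restart. A sketch, assuming psql access:

$ psql -U postgres -c "SELECT name, setting, unit, source FROM pg_settings WHERE name IN ('shared_buffers', 'work_mem', 'effective_cache_size', 'maintenance_work_mem');"

Also note that work_mem is allocated per sort or hash operation per connection, so 68 MB combined with max_connections=300 can overcommit memory under concurrent load.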

AWS EC2: Baseline of 3 IOPS per GiB with a minimum of 100 IOPS

I seem to remember the policy was a baseline of 3 IOPS per GiB. If I have a volume of 8 GiB, I get 24 IOPS. Now, with the minimum of 100 IOPS, do I get at least 100 IOPS no matter how small my volume is?
Yes, at 33.33 GiB and below, an EBS SSD (gp2) volume will have 100 IOPS. This is spelled out clearly in the docs.
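
In other words, the baseline is the larger of 100 and 3 x size-in-GiB, up to the gp2 ceiling. A sketch of the arithmetic in shell (the 16,000 cap is the documented gp2 maximum for current-generation volumes):

gp2_baseline_iops() {                 # usage: gp2_baseline_iops <size-in-GiB>
  local iops=$(( $1 * 3 ))            # 3 IOPS per GiB
  (( iops < 100 )) && iops=100        # floor: minimum 100 IOPS
  (( iops > 16000 )) && iops=16000    # ceiling: gp2 maximum
  echo "$iops"
}
gp2_baseline_iops 8     # -> 100 (8 x 3 = 24, floored to 100)
gp2_baseline_iops 500   # -> 1500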

Cluster Configuration - Worker Nodes

I am a beginner in cluster configuration. I know that in our cluster we have two types of worker nodes:
Type 1:
16 x 4 TB disks
128 GB RAM
2 x 8-core CPUs
Type 2:
12 x 1.2 TB disks
256 GB RAM
2 x 10-core CPUs
I am confused about the configuration. What does "2 x 8-core CPUs" mean? Does it mean 2 processors with 8 physical cores each? So if my processors support hyperthreading, will I have 2 x 8 x 2 = 32 virtual cores?
And does "12 x 1.2 TB" mean 12 disks of 1.2 TB each?
Usually "2 x 8-core CPUs" means that you have 2 physical chips on your motherboard, each having 8 cores. If you enable hyperthreading, you then have 32 virtual cores.
The disk count reads either the way you stated it, or as the number of nodes: then you have 16 nodes with 4 TB disks ... and 12 nodes with 1.2 TB disks ...
I am just wondering how someone gets access to this hardware without knowing what it means. Can you send me some nodes? :)
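
If you have shell access to one of the nodes, you can check the topology directly instead of guessing from the spec sheet; a sketch using standard Linux tools:

$ lscpu | grep -E '^(Socket|Core|Thread|CPU)\(s\)'   # sockets, cores per socket, threads per core
$ nproc                                              # logical (virtual) CPUs the OS sees
$ lsblk -d -o NAME,SIZE,TYPE                         # one row per physical disk

On a 2 x 8-core hyperthreaded box, lscpu would report Socket(s): 2, Core(s) per socket: 8, Thread(s) per core: 2, and CPU(s): 32.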

IRQ affinity handling in Linux

I have Linux running as a VM with 2 vCPUs and one network interface. For the interface's rx interrupt, I have the IRQ affinity set to both vCPUs (in /proc/../smp_affinity).
How is the interrupt assigned to a CPU in this case?
With iperf traffic, the combined CPU usage of the 2 vCPUs is 100%, most of it from soft-interrupt handling. At any given instant the split between the 2 vCPUs is random: 30-70, 60-40, 50-50, etc.
If I change the IRQ affinity to a single vCPU, that CPU goes to 100%.
If the kernel were doing plain round-robin between the 2 vCPUs, shouldn't the load on each vCPU be close to 100%, instead of the combined load of the 2 vCPUs being 100%?
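
To see what the kernel is actually doing rather than inferring it from aggregate CPU usage, the per-IRQ counters are worth checking. A sketch (the interface name and IRQ number are placeholders):

$ grep -i eth0 /proc/interrupts            # per-CPU delivery counts for the NIC's IRQ
$ cat /proc/irq/<irq>/smp_affinity         # configured mask, e.g. 3 = CPU0 and CPU1
$ cat /proc/irq/<irq>/effective_affinity   # where delivery actually lands (newer kernels)

Note that a multi-CPU affinity mask is a constraint, not a schedule: the interrupt controller (or hypervisor) picks one CPU from the mask per interrupt, and the policy is typically not a strict round-robin, which would be consistent with the uneven splits you observe.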
