Improving Haskell performance for small GET requests

In an effort to become better with Haskell, I'm rewriting a small CLI that I developed originally in Python. The CLI mostly makes GET requests to an API and allows for filtering/formatting the JSON result.
I'm finding my Haskell version to be a lot slower than my Python one.
To help narrow down the problem, I excluded all parts of my Haskell code except the fetching of data - essentially, it's this:
{-# LANGUAGE OverloadedStrings, ScopedTypeVariables #-}
-- OverloadedStrings for the ByteString literal below, ScopedTypeVariables for the
-- pattern type signature on usrs (both may already be enabled project-wide)

import Data.Aeson
import qualified Data.ByteString.Char8 as BC
import Data.List (intercalate)
import Network.HTTP.Simple
...
-- For testing purposes
getUsers :: [Handle] -> IO ()
getUsers hs = do
  let handles = BC.pack $ intercalate ";" hs
  req <- parseRequest (baseUrl ++ "/user.info")
  let request = setRequestQueryString [("handles", Just handles)] req
  response <- httpJSON request
  -- note: httpJSON throws a JSONException if decoding fails; the Maybe only
  -- captures a literal JSON null in the response body
  let (usrs :: Maybe (MyApiResponseType [User])) = getResponseBody response
  print usrs
And I'm using the following dependencies:
dependencies:
- base >= 4.7 && < 5
- aeson
- bytestring
- http-conduit
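On its own the snippet above won't build as an executable, so a tiny driver is needed; a minimal sketch (assuming Handle is just a synonym for String; "some_handle" is a placeholder, since the real CLI presumably takes handles as arguments):
main :: IO ()
main = getUsers ["some_handle"]  -- placeholder handle, used only for timing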
To test this, I timed how long it takes for my Haskell program to retrieve the data for a particular user (without any particular formatting). I compared it with my Python version (which formats the data) and with curl (which I piped into jq to format the data).
I ran each one 5 times and averaged the 3 middle values, excluding the highest and lowest times:
        Haskell    Python    Curl
real:   1017 ms    568 ms    214 ms
user:   1062 ms    367 ms     26 ms
sys:     210 ms     45 ms     10 ms
Ok, so the Haskell version is definitely slower. Next I tried profiling tools to narrow down the cause of the problem.
I profiled the code using an SCC annotation on the function above.
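The annotation itself isn't shown here; a minimal sketch of where it would go (the body is elided since it's the same as above, and the declaration-level form assumes GHC 8.0 or later):
-- Expression-level cost centre wrapping the whole function body:
getUsers hs = {-# SCC "getUsers" #-} do
  ...

-- Or, as a declaration-level pragma next to the binding (GHC 8.0+):
{-# SCC getUsers #-}
With that in place, I rebuilt and ran with profiling enabled: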
> stack build --profile
...
> stack exec --profile -- my-cli-exe +RTS -p -sstderr
...
     244,904,040 bytes allocated in the heap
      27,759,640 bytes copied during GC
       5,771,840 bytes maximum residency (6 sample(s))
         245,912 bytes maximum slop
              28 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       228 colls,   228 par    0.849s   0.212s     0.0009s    0.0185s
  Gen  1         6 colls,     5 par    0.090s   0.023s     0.0038s    0.0078s

  Parallel GC work balance: 30.54% (serial 0%, perfect 100%)

  TASKS: 21 (1 bound, 20 peak workers (20 total), using -N8)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.004s  (  0.003s elapsed)
  MUT     time    0.881s  (  0.534s elapsed)
  GC      time    0.939s  (  0.235s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time    0.000s  (  0.000s elapsed)
  EXIT    time    0.010s  (  0.001s elapsed)
  Total   time    1.833s  (  0.773s elapsed)

  Alloc rate    277,931,867 bytes per MUT second

  Productivity  48.1% of total user, 69.1% of total elapsed
Seems like a lot of time is being spent in garbage collection.
I looked at the generated .prof file, which gave this:
COST CENTRE MODULE SRC %time %alloc
>>=.\.ks' Data.ASN1.Get Data/ASN1/Get.hs:104:13-61 10.2 9.8
fromBase64.decode4 Data.Memory.Encoding.Base64 Data/Memory/Encoding/Base64.hs:(299,9)-(309,37) 9.0 12.3
>>=.\ Data.ASN1.Parse Data/ASN1/Parse.hs:(54,9)-(56,43) 5.4 0.7
fromBase64.loop Data.Memory.Encoding.Base64 Data/Memory/Encoding/Base64.hs:(264,9)-(296,45) 4.2 7.4
>>=.\ Data.ASN1.Get Data/ASN1/Get.hs:(104,9)-(105,38) 4.2 3.5
decodeSignedObject.onContainer Data.X509.Signed Data/X509/Signed.hs:(167,9)-(171,30) 3.6 2.9
runParseState.go Data.ASN1.BinaryEncoding.Parse Data/ASN1/BinaryEncoding/Parse.hs:(98,12)-(129,127) 3.0 3.2
getConstructedEndRepr.getEnd Data.ASN1.Stream Data/ASN1/Stream.hs:(37,11)-(41,82) 3.0 12.7
getConstructedEnd Data.ASN1.Stream Data/ASN1/Stream.hs:(23,1)-(28,93) 3.0 7.8
readCertificates Data.X509.CertificateStore Data/X509/CertificateStore.hs:(92,1)-(96,33) 3.0 2.2
fmap.\.ks' Data.ASN1.Get Data/ASN1/Get.hs:88:13-52 1.8 2.2
decodeConstruction Data.ASN1.BinaryEncoding Data/ASN1/BinaryEncoding.hs:(48,1)-(50,66) 1.8 0.0
fmap Data.ASN1.Parse Data/ASN1/Parse.hs:41:5-57 1.8 1.0
concat.loopCopy Data.ByteArray.Methods Data/ByteArray/Methods.hs:(210,5)-(215,28) 1.2 0.4
fromBase64.rset Data.Memory.Encoding.Base64 Data/Memory/Encoding/Base64.hs:(312,9)-(314,53) 1.2 0.0
localTimeParseE.allDigits Data.Hourglass.Format Data/Hourglass/Format.hs:358:9-37 1.2 0.3
getWord8 Data.ASN1.Get Data/ASN1/Get.hs:(200,1)-(204,43) 1.2 0.0
fmap.\ Data.ASN1.Get Data/ASN1/Get.hs:(88,9)-(89,38) 1.2 0.6
runParseState.runGetHeader.\ Data.ASN1.BinaryEncoding.Parse Data/ASN1/BinaryEncoding/Parse.hs:131:44-66 1.2 0.0
mplusEither Data.ASN1.BinaryEncoding.Parse Data/ASN1/BinaryEncoding/Parse.hs:(67,1)-(70,45) 1.2 4.9
getOID.groupOID Data.ASN1.Prim Data/ASN1/Prim.hs:299:9-92 1.2 0.3
getConstructedEndRepr.getEnd.zs Data.ASN1.Stream Data/ASN1/Stream.hs:40:48-73 1.2 0.0
getConstructedEndRepr.getEnd.(...) Data.ASN1.Stream Data/ASN1/Stream.hs:40:48-73 1.2 0.4
getConstructedEnd.(...) Data.ASN1.Stream Data/ASN1/Stream.hs:28:48-80 1.2 0.3
decodeEventASN1Repr.loop Data.ASN1.BinaryEncoding Data/ASN1/BinaryEncoding.hs:(54,11)-(67,69) 1.2 2.5
put Data.ASN1.Parse Data/ASN1/Parse.hs:(72,1)-(74,24) 1.2 0.0
fromASN1 Data.X509.ExtensionRaw Data/X509/ExtensionRaw.hs:(55,5)-(61,71) 1.2 0.0
compare Data.X509.DistinguishedName Data/X509/DistinguishedName.hs:31:23-25 1.2 0.0
putBinaryVersion Network.TLS.Packet Network/TLS/Packet.hs:(109,1)-(110,41) 1.2 0.0
parseLBS.onSuccess Data.ASN1.BinaryEncoding.Parse Data/ASN1/BinaryEncoding/Parse.hs:(147,11)-(149,64) 0.6 1.7
pemParseLBS Data.PEM.Parser Data/PEM/Parser.hs:(92,1)-(97,41) 0.6 1.0
runParseState.terminateAugment Data.ASN1.BinaryEncoding.Parse Data/ASN1/BinaryEncoding/Parse.hs:(87,12)-(93,53) 0.0 1.7
parseOnePEM.getPemContent Data.PEM.Parser Data/PEM/Parser.hs:(56,9)-(64,93) 0.0 1.8
This doesn't seem too bad: almost all of these cost centres come from the TLS stack (X.509 certificate parsing, ASN.1 and Base64 decoding) rather than from my own code, and when I scrolled down to the functions I had defined, they didn't seem to be taking much time either.
This led me to believe it might be a memory leak, so I profiled the heap:
stack exec --profile -- my-cli-exe +RTS -h
hp2ps my-cli-exe.hp
open my-cli-exe.ps
So it seems as though lots of space is being allocated on the heap, and then suddenly cleared.
The main issue is, I'm not sure where to go from here. My function is relatively small and is only getting a small JSON response of around 500 bytes. So where could the issue be coming from?

It seemed odd that a common Haskell library would be this slow for me, but the following approach resolved my concerns:
I found that my executable ran noticeably faster when I used stack install to copy the binary and then ran it directly:
stack install
my-cli-exe
instead of using stack build and stack run. (Presumably stack run adds its own startup overhead, e.g. resolving the project and checking whether a rebuild is needed, before it launches the executable.)
Here are the running times again for comparison:
        HS (stack install)   HS (stack run)   Python   Curl
real:   373 ms               1017 ms          568 ms   214 ms
user:   222 ms               1062 ms          367 ms    26 ms
sys:    141 ms                210 ms           45 ms    10 ms

Related

High memory usage on digital ocean droplet

I have a Laravel application which I've installed on a 1 GB standard droplet running Ubuntu 20.04, nginx, MySQL 8 and PHP 7.4.
The application isn't even live yet and I notice it's already using over 50% of memory. Yesterday it was using 80%, and after a system reboot it returned to around 60% memory usage.
Below is a snapshot of the current high-memory processes. Is this level of memory usage normal for a Laravel application which is not even live, i.e. under limited load?
top - 19:41:00 up 3:46, 1 user, load average: 0.08, 0.04, 0.01
Tasks: 101 total, 1 running, 100 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.3 us, 0.7 sy, 0.0 ni, 98.7 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 981.3 total, 90.6 free, 601.4 used, 289.3 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 212.2 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
815 mysql 20 0 1305900 417008 13352 S 0.7 41.5 1:32.39 mysqld
2257 www-data 20 0 245988 44992 30180 S 0.0 4.5 0:04.67 php-fpm7.4
2265 www-data 20 0 243700 42204 29572 S 0.0 4.2 0:04.41 php-fpm7.4
2259 www-data 20 0 243960 42104 30380 S 0.0 4.2 0:04.44 php-fpm7.4
988 root 20 0 125160 36188 10604 S 0.3 3.6 0:09.89 php
388 root 19 -1 84404 35116 33932 S 0.0 3.5 0:01.14 systemd-journ+
741 root 20 0 627300 20936 6656 S 0.0 2.1 0:02.11 snapd
738 root 20 0 238392 18588 12624 S 0.0 1.8 0:00.83 php-fpm7.4
743 root 20 0 31348 18344 3844 S 0.0 1.8 0:02.75 supervisord
544 root rt 0 280180 17976 8184 S 0.0 1.8 0:00.90 multipathd
825 root 20 0 108036 15376 7732 S 0.0 1.5 0:00.10 unattended-up+
736 root 20 0 29220 13200 5544 S 0.0 1.3 0:00.11 networkd-disp+
726 do-agent 20 0 559436 12120 6588 S 0.0 1.2 0:01.78 do-agent
1 root 20 0 101964 11124 8024 S 0.0 1.1 0:02.52 systemd
623 systemd+ 20 0 23912 10488 6484 S 0.0 1.0 0:00.42 systemd-resol+
778 www-data 20 0 71004 9964 5240 S 0.0 1.0 0:02.43 nginx
My concern is that once the application goes live and the load increases, with more database connections it is going to run out of memory. I know I can resize the droplet and increase the memory or set up some swap space, but is this amount of memory usage normal for an unused application?
How can I optimize the high-memory processes such as MySQL, nginx and PHP? MySQL 8 appears to be the main culprit, hogging most of the memory. Below are my MySQL settings:
#
# The MySQL database server configuration file.
#
# One can use all long options that the program supports.
# Run program with --help to get a list of available options and with
# --print-defaults to see which it would actually understand and use.
#
# For explanations see
# http://dev.mysql.com/doc/mysql/en/server-system-variables.html
# Here is entries for some specific programs
# The following values assume you have at least 32M ram
[mysqld]
#
# * Basic Settings
#
user = mysql
# pid-file = /var/run/mysqld/mysqld.pid
# socket = /var/run/mysqld/mysqld.sock
# port = 3306
# datadir = /var/lib/mysql
# If MySQL is running as a replication slave, this should be
# changed. Ref https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_tmpdir
# tmpdir = /tmp
#
# Instead of skip-networking the default is now to listen only on
# localhost which is more compatible and is not less secure.
bind-address = 127.0.0.1
#
# * Fine Tuning
#
key_buffer_size = 16M
# max_allowed_packet = 64M
# thread_stack = 256K
# thread_cache_size = -1
# This replaces the startup script and checks MyISAM tables if needed
# the first time they are touched
myisam-recover-options = BACKUP
# max_connections = 151
# table_open_cache = 4000
#
# * Logging and Replication
#
# Both location gets rotated by the cronjob.
#
# Log all queries
# Be aware that this log type is a performance killer.
# general_log_file = /var/log/mysql/query.log
# general_log = 1
#
# Error log - should be very few entries.
#
log_error = /var/log/mysql/error.log
#
# Here you can see queries with especially long duration
# slow_query_log = 1
# slow_query_log_file = /var/log/mysql/mysql-slow.log
# long_query_time = 2
# log-queries-not-using-indexes
#
# The following can be used as easy to replay backup logs or for replication.
# note: if you are setting up a replication slave, see README.Debian about
# other settings you may need to change.
# server-id = 1
# log_bin = /var/log/mysql/mysql-bin.log
# binlog_expire_logs_seconds = 2592000
max_binlog_size = 100M
# binlog_do_db = include_database_name
# binlog_ignore_db = include_database_name
Any tips and advice are much appreciated, as this is the first time I'm using a VPS.

How to properly interpret HeapInuse / HeapIdle / HeapReleased memory stats in Go

I want to monitor the memory usage of my Go program and clean up some internal caches if the system is running low on free memory.
The problem is that the HeapAlloc / HeapInuse / HeapReleased stats don't always add up properly (to my understanding).
I'm looking at free system memory (+ buffers/cache), the value that is shown as available by the free utility:
$ free
total used free shared buff/cache available
Mem: 16123232 409248 15113628 200 600356 15398424
Swap: 73242180 34560 73207620
I also look at HeapIdle - HeapReleased, which, according to the comments in https://godoc.org/runtime#MemStats,
HeapIdle minus HeapReleased estimates the amount of memory
that could be returned to the OS, but is being retained by
the runtime so it can grow the heap without requesting more
memory from the OS.
Now the problem: sometimes Available + HeapInuse + HeapIdle - HeapReleased exceeds the total amount of system memory. This usually happens when HeapIdle is quite high and HeapReleased is close to neither HeapIdle nor zero:
# Start of test
Available: 15379M, HeapAlloc: 49M, HeapInuse: 51M, HeapIdle: 58M, HeapReleased: 0M
# Work is in progress
# Looks good: 11795 + 3593 = 15388
Available: 11795M, HeapAlloc: 3591M, HeapInuse: 3593M, HeapIdle: 0M, HeapReleased: 0M
# Work has been done
# Looks good: 11745 + 45 + 3602 = 15392
Available: 11745M, HeapAlloc: 42M, HeapInuse: 45M, HeapIdle: 3602M, HeapReleased: 0M
# Golang released some memory to OS
# Looks good: 15224 + 14 + 3632 - 3552 = 15318
Available: 15224M, HeapAlloc: 10M, HeapInuse: 14M, HeapIdle: 3632M, HeapReleased: 3552M
# Some other work started
# Looks SUSPICIOUS: 13995 + 1285 + 2360 - 1769 = 15871
Available: 13995M, HeapAlloc: 1282M, HeapInuse: 1285M, HeapIdle: 2360M, HeapReleased: 1769M
# 5 seconds later
# Looks BAD: 13487 + 994 + 2652 - 398 = 16735 - more than system memory
Available: 13487M, HeapAlloc: 991M, HeapInuse: 994M, HeapIdle: 2652M, HeapReleased: 398M
# This bad situation holds for quite a while, even when work has been done
# Looks BAD: 13488 + 14 + 3631 - 489 = 16644
Available: 13488M, HeapAlloc: 10M, HeapInuse: 14M, HeapIdle: 3631M, HeapReleased: 489M
# It is strange that at this moment HeapIdle - HeapReleased = 3142
# > than 2134M of used memory reported by "free" utility.
$ free
total used free shared buff/cache available
Mem: 16123232 2185696 13337632 200 599904 13621988
Swap: 73242180 34560 73207620
# Still bad when another set of work started
# Looks BAD: 13066 + 2242 + 1403 = 16711
Available: 13066M, HeapAlloc: 2240M, HeapInuse: 2242M, HeapIdle: 1403M, HeapReleased: 0M
# But after 10 seconds it becomes good
# Looks good: 11815 + 2325 + 1320 = 15460
Available: 11815M, HeapAlloc: 2322M, HeapInuse: 2325M, HeapIdle: 1320M, HeapReleased: 0M
I do not understand where this additional "breathing" 1.3 GB (16700 - 15400) of memory comes from. The used swap space remained the same during the whole test.

Sphinx doesn't find most of the data in production

I use Sphinx + thinking_sphinx on my local machine and ran into a really strange problem.
I have 2 models, Users and Microposts. Sphinx finds and shows Users, but it can't find Microposts on my production machine. On my local machine it does.
In production I try:
root#serverserj:/vol/www/apps/ror_tutorial/current# rake ts:config
Generating Configuration to /vol/www/apps/ror_tutorial/releases/20120125204127/config/development.sphinx.conf
root#serverserj:/vol/www/apps/ror_tutorial/current# rake ts:index
Generating Configuration to /vol/www/apps/ror_tutorial/releases/20120125204127/config/development.sphinx.conf
Sphinx 2.0.3-release (r3043)
Copyright (c) 2001-2011, Andrew Aksyonoff
Copyright (c) 2008-2011, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file '/vol/www/apps/ror_tutorial/releases/20120125204127/config/development.sphinx.conf'...
indexing index 'micropost_core'...
WARNING: collect_hits: mem_limit=0 kb too low, increasing to 13568 kb
collected 0 docs, 0.0 MB
total 0 docs, 0 bytes
total 0.012 sec, 0 bytes/sec, 0.00 docs/sec
skipping non-plain index 'micropost'...
indexing index 'user_core'...
WARNING: collect_hits: mem_limit=0 kb too low, increasing to 13568 kb
collected 1 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 1 docs, 12 bytes
total 0.012 sec, 1000 bytes/sec, 83.33 docs/sec
skipping non-plain index 'user'...
total 4 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 14 writes, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
rotating indices: succesfully sent SIGHUP to searchd (pid=1959).
root#serverserj:/vol/www/apps/ror_tutorial/current $ rake ts:rebuild
Stopped search daemon (pid 1959).
Generating Configuration to /vol/www/apps/ror_tutorial/releases/20120125204127/config/development.sphinx.conf
Sphinx 2.0.3-release (r3043)
Copyright (c) 2001-2011, Andrew Aksyonoff
Copyright (c) 2008-2011, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file '/vol/www/apps/ror_tutorial/releases/20120125204127/config/development.sphinx.conf'...
indexing index 'micropost_core'...
WARNING: collect_hits: mem_limit=0 kb too low, increasing to 13568 kb
collected 0 docs, 0.0 MB
total 0 docs, 0 bytes
total 0.012 sec, 0 bytes/sec, 0.00 docs/sec
skipping non-plain index 'micropost'...
indexing index 'user_core'...
WARNING: collect_hits: mem_limit=0 kb too low, increasing to 13568 kb
collected 1 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 1 docs, 12 bytes
total 0.008 sec, 1500 bytes/sec, 125.00 docs/sec
skipping non-plain index 'user'...
total 4 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 14 writes, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
Started successfully (pid 2218).
For example, let's try something on the local machine:
$ rails c
Loading development environment (Rails 3.1.3)
system :001 > Micropost.search 'Minus'
(0.3ms) SHOW search_path
Sphinx Query (2.3ms) Minus
Sphinx Found 12 results
Micropost Load (1.6ms) SELECT "microposts".* FROM "microposts" WHERE "microposts"."id" IN (30, 32, 91, 106, 121, 128, 160, 171, 172, 239, 258, 260) ORDER BY microposts.created_at DESC
=> [#<Micropost id: 30, content: "Sed minus magni culpa reiciendis unde.", user_id: 1, created_at: "2012-01-15 21:11:03", updated_at: "2012-01-15 21:11:03">, #<Micropost id: 32, content: "Placeat pariatur quisquam provident velit veniam vo...", user_id: 1, created_at: "2012-01-15 21:11:03", updated_at: "2012-01-15 21:11:03">...]
And on the production machine:
$ rails c
Loading development environment (Rails 3.1.3)
1.9.1 :001 > Micropost.search 'Minus'
(4.0ms) SHOW search_path
Sphinx Query (4.0ms) Minus
Sphinx Found 0 results
=> []
config/deploy.rb
#Add RVM's lib directory to the load path.
$:.unshift(File.expand_path('./lib', ENV['rvm_path']))
#Load RVM's capistrano plugin.
require "rvm/capistrano"
require 'bundler/capistrano'
#require 'thinking_sphinx/deploy/capistrano'
set :rvm_ruby_string, '1.9.3-head' # This is the current version of Ruby used by RVM. To get the version run: $ rvm list
set :rvm_type, :root # Don't use system-wide RVM; use my user, whose name is root.
set :user, "root" #If you log into your server with a different user name than you are logged into your local machine with, you’ll need to tell Capistrano about that user name.
set :rails_env, "production"
set :application, "ror_tutorial"
set :deploy_to, "/vol/www/apps/#{application}"
set :scm, :git
set :repository, "git://github.com/Loremaster/sample_app.git"
set :branch, "master"
set :deploy_via, :remote_cache # Keep a cache of the repository locally and, with each new deploy, fetch only the changes.
default_run_options[:pty] = true # Must be set for the password prompt from git to work.
server "188.127.224.136", :app, # This may be the same as your `Web` server
:web,
:db, :primary => true # This is where Rails migrations will run
# If you are using Passenger mod_rails uncomment this:
namespace :deploy do
task :start do ; end
task :stop do ; end
task :restart, :roles => :app, :except => { :no_release => true } do
run "#{try_sudo} touch #{File.join(current_path,'tmp','restart.txt')}"
end
end
desc "Prepare system"
task :prepare_system, :roles => :app do
run "cd #{current_path} && bundle install --without development test && bundle install --deployment"
end
after "deploy:update_code", :prepare_system
My System
Ubuntu 10.04.1 LTS
Phusion Passenger
PostgreSQL 9
Nginx
Rails 3.1.3
Ruby 1.9.3
Capistrano
Sphinx 2.0.3
It looks like you don't have any documents in the micropost_core index:
using config file '/vol/www/apps/ror_tutorial/releases/20120125204127/config/development.sphinx.conf'...
indexing index 'micropost_core'...
WARNING: collect_hits: mem_limit=0 kb too low, increasing to 13568 kb
collected 0 docs, 0.0 MB
total 0 docs, 0 bytes
total 0.012 sec, 0 bytes/sec, 0.00 docs/sec
skipping non-plain index 'micropost'...
There are 0 docs. Could you please provide your config, and check your data on production?

Need help analysing the VarnishStat results

I am a newbie with Varnish. I have successfully installed it and now it's working, but I need some guidance from more knowledgeable people about how the server is performing.
I read this article - http://kristianlyng.wordpress.com/2009/12/08/varnishstat-for-dummies/ - but I am still not sure how the server is performing.
The server has been running for the last 9 hours. I understand that more content will be cached over time, so the cache hit ratio will improve, but right now I'd appreciate some interim feedback on the server's performance.
Hitrate ratio: 10 100 613
Hitrate avg: 0.2703 0.3429 0.4513
239479 8.00 7.99 client_conn - Client connections accepted
541129 13.00 18.06 client_req - Client requests received
157594 1.00 5.26 cache_hit - Cache hits
3 0.00 0.00 cache_hitpass - Cache hits for pass
313499 9.00 10.46 cache_miss - Cache misses
67377 4.00 2.25 backend_conn - Backend conn. success
316739 7.00 10.57 backend_reuse - Backend conn. reuses
910 0.00 0.03 backend_toolate - Backend conn. was closed
317652 8.00 10.60 backend_recycle - Backend conn. recycles
584 0.00 0.02 backend_retry - Backend conn. retry
3 0.00 0.00 fetch_head - Fetch head
314040 9.00 10.48 fetch_length - Fetch with Length
4139 0.00 0.14 fetch_chunked - Fetch chunked
5 0.00 0.00 fetch_close - Fetch wanted close
386 . . n_sess_mem - N struct sess_mem
55 . . n_sess - N struct sess
313452 . . n_object - N struct object
313479 . . n_objectcore - N struct objectcore
38474 . . n_objecthead - N struct objecthead
368 . . n_waitinglist - N struct waitinglist
12 . . n_vbc - N struct vbc
61 . . n_wrk - N worker threads
344 0.00 0.01 n_wrk_create - N worker threads created
2935 0.00 0.10 n_wrk_queued - N queued work requests
1 . . n_backend - N backends
47 . . n_expired - N expired objects
149425 . . n_lru_moved - N LRU moved objects
1 0.00 0.00 losthdr - HTTP header overflows
461727 10.00 15.41 n_objwrite - Objects sent with write
239468 8.00 7.99 s_sess - Total Sessions
541129 13.00 18.06 s_req - Total Requests
64678 3.00 2.16 s_pipe - Total pipe
5346 0.00 0.18 s_pass - Total pass
318187 9.00 10.62 s_fetch - Total fetch
193589421 3895.84 6459.66 s_hdrbytes - Total header bytes
4931971067 14137.41 164569.09 s_bodybytes - Total body bytes
117585 3.00 3.92 sess_closed - Session Closed
2283 0.00 0.08 sess_pipeline - Session Pipeline
892 0.00 0.03 sess_readahead - Session Read Ahead
458468 10.00 15.30 sess_linger - Session Linger
414010 9.00 13.81 sess_herd - Session herd
36912073 880.96 1231.68 shm_records - SHM records
What VCL are you using? If the answer is 'none' then you are probably not getting a very good hitrate. On a fresh install, Varnish is quite conservative about what it caches (and rightly so), but you can probably improve matters by reading how to achieve a high hitrate. If it's safe to, you can selectively unset cookies and normalise requests with your VCL, which will result in fewer backend calls.
How much of your website is cacheable? Is your object cache big enough? If you can answer those two questions, you ought to be able to achieve a great hitrate with Varnish.

sphinx, xmlpipe2, and cassandra

I've started using Cassandra and I want to index my DB with Sphinx.
I wrote a Ruby script which is used as the xmlpipe, and I configured Sphinx to use it.
source xmlsrc
{
type = xmlpipe2
xmlpipe_command = /usr/local/bin/ruby /home/httpd/html/app/script/sphinxpipe.rb
}
When I run the script from the console the output looks fine, but when I run the indexer, Sphinx returns an error:
$ indexer test_index
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff
using config file '/usr/local/etc/sphinx.conf'...
indexing index 'test_index'...
ERROR: index 'test_index': source 'xmlsrc': attribute 'id' required in <sphinx:document> (line=10, pos=0, docid=0).
total 0 docs, 0 bytes
total 0.000 sec, 0 bytes/sec, 0.00 docs/sec
total 0 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 0 writes, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
My script is very simple:
$stdout.sync = true
puts %{<?xml version="1.0" encoding="utf-8"?>}
puts %{<sphinx:docset>}
puts %{<sphinx:schema>}
puts %{<sphinx:field name="body"/>}
puts %{</sphinx:schema>}
puts %{<sphinx:document id="ba32c02e-79e2-11df-9815-af1b5f766459">}
puts %{<body><![CDATA[aaa]]></body>}
puts %{</sphinx:document>}
puts %{</sphinx:docset>}
I use Ruby 1.9.2-head, Ubuntu 10.04, Sphinx 0.9.9.
How can I get this to work?
I have an answer to my own question :)
Sphinx has a maximum document ID defined in its source code.
For 64-bit machines:
#define DOCID_MAX U64C(0xffffffffffffffff)
The document ID must be less than 18446744073709551615.
For 32-bit machines:
#define DOCID_MAX 0xffffffffUL
The document ID must be less than 4294967295.
I used SimpleUUID, and that's why it didn't work.
