Sphinx, xmlpipe2, and Cassandra - Ruby

I have started using Cassandra and I want to index my database with Sphinx.
I wrote a Ruby script to act as the xmlpipe2 source and configured Sphinx to use it.
source xmlsrc
{
type = xmlpipe2
xmlpipe_command = /usr/local/bin/ruby /home/httpd/html/app/script/sphinxpipe.rb
}
When I run the script from the console the output looks fine, but when I run the indexer, Sphinx returns an error:
$ indexer test_index
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff
using config file '/usr/local/etc/sphinx.conf'...
indexing index 'test_index'...
ERROR: index 'test_index': source 'xmlsrc': attribute 'id' required in <sphinx:document> (line=10, pos=0, docid=0).
total 0 docs, 0 bytes
total 0.000 sec, 0 bytes/sec, 0.00 docs/sec
total 0 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 0 writes, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
My script is very simple:
$stdout.sync = true
puts %{<?xml version="1.0" encoding="utf-8"?>}
puts %{<sphinx:docset>}
puts %{<sphinx:schema>}
puts %{<sphinx:field name="body"/>}
puts %{</sphinx:schema>}
puts %{<sphinx:document id="ba32c02e-79e2-11df-9815-af1b5f766459">}
puts %{<body><![CDATA[aaa]]></body>}
puts %{</sphinx:document>}
puts %{</sphinx:docset>}
I am using Ruby 1.9.2-head, Ubuntu 10.04, and Sphinx 0.9.9.
How can I get this to work?

I have the answer to my own question :)
Sphinx defines a maximum document ID in its source code.
For 64-bit machines:
#define DOCID_MAX U64C(0xffffffffffffffff)
so the document ID must be less than 18446744073709551615.
For 32-bit machines:
#define DOCID_MAX 0xffffffffUL
so the document ID must be less than 4294967295.
I used SimpleUUID for the document IDs, which is why it didn't work: those IDs are not plain integers within that range.
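For illustration, here is a minimal sketch of an xmlpipe2 generator that derives a numeric document ID below DOCID_MAX (written in Python rather than the Ruby of the script above; the documents list and the docid_from_uuid helper are made up for the example). The UUID-to-docid mapping would need to be kept on the application side, for example back in Cassandra, so that search hits can be translated back to rows.

import hashlib
import sys

# (uuid, body) pairs pulled from the data store; hard-coded here for the sketch
documents = [
    ("ba32c02e-79e2-11df-9815-af1b5f766459", "aaa"),
]

def docid_from_uuid(uuid_str):
    # Truncated SHA-1 keeps the id positive and well below DOCID_MAX; collisions
    # are unlikely but possible, so a persistent uuid -> id mapping table is safer.
    digest = hashlib.sha1(uuid_str.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 1) or 1  # Sphinx rejects docid 0

out = sys.stdout
out.write('<?xml version="1.0" encoding="utf-8"?>\n')
out.write('<sphinx:docset>\n')
out.write('<sphinx:schema>\n')
out.write('<sphinx:field name="body"/>\n')
out.write('</sphinx:schema>\n')
for uuid_str, body in documents:
    out.write('<sphinx:document id="%d">\n' % docid_from_uuid(uuid_str))
    out.write('<body><![CDATA[%s]]></body>\n' % body)
    out.write('</sphinx:document>\n')
out.write('</sphinx:docset>\n')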

Related

Saving a dataframe and corresponding chart in a single PDF file in Python matplotlib

I have a dataframe:
id1 id2 fields valid invalid missing
0 1001.0 0.0 State 158.0 0.0 0.0
1 1001.0 0.0 Zip 156.0 0.0 2.0
2 1001.0 0.0 Race 128.0 20.0 10.0
3 1001.0 0.0 LastName 158.0 0.0 0.0
4 1001.0 0.0 Email 54.0 0.0 104.0
... ... ... ... ... ... ...
28859 5276.0 36922.0 Phone 0.0 0.0 8.0
28860 5276.0 36922.0 Email 1.0 0.0 7.0
28861 5276.0 36922.0 State 8.0 0.0 0.0
28862 5276.0 36922.0 office ID 8.0 0.0 0.0
28863 5276.0 36922.0 StreetAdd 8.0 0.0 0.0
My initial goal is to group by individual id and create a PDF file for each. I was able to create a PDF file from the plot, but I would also like to save the dataframe that goes with the graph in the same PDF file.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

# read the csv file
cme_df = pd.read_csv('sample.csv')
# fill na with 0
cme_df = cme_df.fillna(0)
# iterate through the unique id2 in the file
for i in cme_df['id2'].unique():
    with PdfPages('home/'+'id2_'+str(i)+'.pdf') as pdf:
        cme_i = cme_df[cme_df['id2'] == i].sort_values('fields')
        print(cme_i)
        # I feel this is where I must have something to create or save the table into pdf with the graph created below #
        # create the barh graph
        plt.barh(cme_i['fields'], cme_i['valid'], color='g', label='valid')
        plt.barh(cme_i['fields'], cme_i['missing'], left=cme_i['valid'], color='y', label='missing')
        plt.barh(cme_i['fields'], cme_i['invalid'], left=cme_i['valid']+cme_i['missing'], color='r', label='invalid')
        plt.legend(bbox_to_anchor=(0.5, -0.05), loc='upper center', shadow=True, ncol=3)
        plt.suptitle('valid, invalid, missing', fontweight='bold')
        plt.title('id2: '+ str(i))
        pdf.savefig()
        plt.clf()
My code above prints the table in the results window, then creates the horizontal bar chart, and the last few lines save the graph to a PDF. I would like to save both the dataframe and the graph in a single file.
Some searches suggested converting to HTML and then to PDF, but I cannot seem to make that work:
cme_i.to_html('id2_'+str(i)+'.html')
# then convert to pdf
pdf.from_file(xxxxx)
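One possible approach (an untested sketch, not a verified solution): keep everything in the same PdfPages object and render the dataframe as a matplotlib table on its own page before the chart page. The figure size below is a guess, and cme_i and pdf refer to the names used in the loop above.

fig, ax = plt.subplots(figsize=(8, 0.3 * len(cme_i) + 1))  # rough size based on row count
ax.axis('off')  # hide the axes so only the table is drawn
tbl = ax.table(cellText=cme_i.values, colLabels=cme_i.columns, loc='center')
tbl.auto_set_font_size(False)
tbl.set_fontsize(8)
pdf.savefig(fig)   # first page: the dataframe rendered as a table
plt.close(fig)
# ...then build the barh chart as before and call pdf.savefig() again for the second page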

Improving Haskell performance for small GET requests

In an effort to become better with Haskell, I'm rewriting a small CLI that I developed originally in Python. The CLI mostly makes GET requests to an API and allows for filtering/formatting the JSON result.
I'm finding my Haskell version to be a lot slower than my Python one.
To help narrow down the problem, I excluded all parts of my Haskell code except the fetching of data - essentially, it's this:
import Data.Aeson
import qualified Data.ByteString.Char8 as BC
import Data.List (intercalate)
import Network.HTTP.Simple
...
-- For testing purposes
getUsers :: [Handle] -> IO ()
getUsers hs = do
  let handles = BC.pack $ intercalate ";" hs
  req <- parseRequest (baseUrl ++ "/user.info")
  let request = setRequestQueryString [("handles", Just handles)] $ req
  response <- httpJSON request
  let (usrs :: Maybe (MyApiResponseType [User])) = getResponseBody response
  print usrs
And I'm using the following dependencies:
dependencies:
- base >= 4.7 && < 5
- aeson
- bytestring
- http-conduit
To test this, I timed how long it takes for my Haskell program to retrieve data for a particular user (without any particular formatting). I compared it with my Python version (which formats the data), and Curl (which I piped into jq to format the data):
I ran each 5 times and took the average of the 3 middle values, excluding the highest and lowest times:
Haskell Python Curl
real: 1017 ms 568 ms 214 ms
user: 1062 ms 367 ms 26 ms
sys: 210 ms 45 ms 10 ms
Ok, so the Haskell version is definitely slower. Next I tried profiling tools to narrow down the cause of the problem.
I profiled the code using an SCC annotation for the function above:
> stack build --profile
...
> stack exec --profile -- my-cli-exe +RTS -p -sstderr
...
244,904,040 bytes allocated in the heap
27,759,640 bytes copied during GC
5,771,840 bytes maximum residency (6 sample(s))
245,912 bytes maximum slop
28 MiB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 228 colls, 228 par 0.849s 0.212s 0.0009s 0.0185s
Gen 1 6 colls, 5 par 0.090s 0.023s 0.0038s 0.0078s
Parallel GC work balance: 30.54% (serial 0%, perfect 100%)
TASKS: 21 (1 bound, 20 peak workers (20 total), using -N8)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.004s ( 0.003s elapsed)
MUT time 0.881s ( 0.534s elapsed)
GC time 0.939s ( 0.235s elapsed)
RP time 0.000s ( 0.000s elapsed)
PROF time 0.000s ( 0.000s elapsed)
EXIT time 0.010s ( 0.001s elapsed)
Total time 1.833s ( 0.773s elapsed)
Alloc rate 277,931,867 bytes per MUT second
Productivity 48.1% of total user, 69.1% of total elapsed
Seems like a lot of time is being spent in garbage collection.
I looked at the generated .prof file, which gave this:
COST CENTRE MODULE SRC %time %alloc
>>=.\.ks' Data.ASN1.Get Data/ASN1/Get.hs:104:13-61 10.2 9.8
fromBase64.decode4 Data.Memory.Encoding.Base64 Data/Memory/Encoding/Base64.hs:(299,9)-(309,37) 9.0 12.3
>>=.\ Data.ASN1.Parse Data/ASN1/Parse.hs:(54,9)-(56,43) 5.4 0.7
fromBase64.loop Data.Memory.Encoding.Base64 Data/Memory/Encoding/Base64.hs:(264,9)-(296,45) 4.2 7.4
>>=.\ Data.ASN1.Get Data/ASN1/Get.hs:(104,9)-(105,38) 4.2 3.5
decodeSignedObject.onContainer Data.X509.Signed Data/X509/Signed.hs:(167,9)-(171,30) 3.6 2.9
runParseState.go Data.ASN1.BinaryEncoding.Parse Data/ASN1/BinaryEncoding/Parse.hs:(98,12)-(129,127) 3.0 3.2
getConstructedEndRepr.getEnd Data.ASN1.Stream Data/ASN1/Stream.hs:(37,11)-(41,82) 3.0 12.7
getConstructedEnd Data.ASN1.Stream Data/ASN1/Stream.hs:(23,1)-(28,93) 3.0 7.8
readCertificates Data.X509.CertificateStore Data/X509/CertificateStore.hs:(92,1)-(96,33) 3.0 2.2
fmap.\.ks' Data.ASN1.Get Data/ASN1/Get.hs:88:13-52 1.8 2.2
decodeConstruction Data.ASN1.BinaryEncoding Data/ASN1/BinaryEncoding.hs:(48,1)-(50,66) 1.8 0.0
fmap Data.ASN1.Parse Data/ASN1/Parse.hs:41:5-57 1.8 1.0
concat.loopCopy Data.ByteArray.Methods Data/ByteArray/Methods.hs:(210,5)-(215,28) 1.2 0.4
fromBase64.rset Data.Memory.Encoding.Base64 Data/Memory/Encoding/Base64.hs:(312,9)-(314,53) 1.2 0.0
localTimeParseE.allDigits Data.Hourglass.Format Data/Hourglass/Format.hs:358:9-37 1.2 0.3
getWord8 Data.ASN1.Get Data/ASN1/Get.hs:(200,1)-(204,43) 1.2 0.0
fmap.\ Data.ASN1.Get Data/ASN1/Get.hs:(88,9)-(89,38) 1.2 0.6
runParseState.runGetHeader.\ Data.ASN1.BinaryEncoding.Parse Data/ASN1/BinaryEncoding/Parse.hs:131:44-66 1.2 0.0
mplusEither Data.ASN1.BinaryEncoding.Parse Data/ASN1/BinaryEncoding/Parse.hs:(67,1)-(70,45) 1.2 4.9
getOID.groupOID Data.ASN1.Prim Data/ASN1/Prim.hs:299:9-92 1.2 0.3
getConstructedEndRepr.getEnd.zs Data.ASN1.Stream Data/ASN1/Stream.hs:40:48-73 1.2 0.0
getConstructedEndRepr.getEnd.(...) Data.ASN1.Stream Data/ASN1/Stream.hs:40:48-73 1.2 0.4
getConstructedEnd.(...) Data.ASN1.Stream Data/ASN1/Stream.hs:28:48-80 1.2 0.3
decodeEventASN1Repr.loop Data.ASN1.BinaryEncoding Data/ASN1/BinaryEncoding.hs:(54,11)-(67,69) 1.2 2.5
put Data.ASN1.Parse Data/ASN1/Parse.hs:(72,1)-(74,24) 1.2 0.0
fromASN1 Data.X509.ExtensionRaw Data/X509/ExtensionRaw.hs:(55,5)-(61,71) 1.2 0.0
compare Data.X509.DistinguishedName Data/X509/DistinguishedName.hs:31:23-25 1.2 0.0
putBinaryVersion Network.TLS.Packet Network/TLS/Packet.hs:(109,1)-(110,41) 1.2 0.0
parseLBS.onSuccess Data.ASN1.BinaryEncoding.Parse Data/ASN1/BinaryEncoding/Parse.hs:(147,11)-(149,64) 0.6 1.7
pemParseLBS Data.PEM.Parser Data/PEM/Parser.hs:(92,1)-(97,41) 0.6 1.0
runParseState.terminateAugment Data.ASN1.BinaryEncoding.Parse Data/ASN1/BinaryEncoding/Parse.hs:(87,12)-(93,53) 0.0 1.7
parseOnePEM.getPemContent Data.PEM.Parser Data/PEM/Parser.hs:(56,9)-(64,93) 0.0 1.8
This doesn't seem too bad, and when I scrolled down to the functions I had defined, they didn't seem to be taking much time either.
This leads me to believe it's a memory leak problem(?), so I profiled the heap:
stack exec --profile -- my-cli-exe +RTS -h
hp2ps my-cli-exe.hp
open my-cli.exe.ps
So it seems as though lots of space is being allocated on the heap, and then suddenly cleared.
The main issue is, I'm not sure where to go from here. My function is relatively small and is only getting a small JSON response of around 500 bytes. So where could the issue be coming from?
It seemed odd that a common Haskell library would be this slow for me, but the following approach resolved my concerns:
I found that my executable ran faster when I used stack install to copy the binaries:
stack install
my-cli-exe
instead of using stack build and stack run.
Here are the running times again for comparison:
HS (stack install) HS (stack run) Python Curl
real: 373 ms 1017 ms 568 ms 214 ms
user: 222 ms 1062 ms 367 ms 26 ms
sys: 141 ms 210 ms 45 ms 10 ms

High memory usage on DigitalOcean droplet

I have a Laravel application installed on a 1 GB standard droplet running Ubuntu 20.04, Nginx, MySQL 8, and PHP 7.4.
The application isn't even live yet and I notice it's already using over 50% of memory. Yesterday it was at 80%, and after a system reboot it returned to around 60% memory usage.
Below is a snapshot of the processes currently using the most memory. Is this level of memory usage normal for a Laravel application that is not even live, i.e. under limited load?
top - 19:41:00 up 3:46, 1 user, load average: 0.08, 0.04, 0.01
Tasks: 101 total, 1 running, 100 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.3 us, 0.7 sy, 0.0 ni, 98.7 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 981.3 total, 90.6 free, 601.4 used, 289.3 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 212.2 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
815 mysql 20 0 1305900 417008 13352 S 0.7 41.5 1:32.39 mysqld
2257 www-data 20 0 245988 44992 30180 S 0.0 4.5 0:04.67 php-fpm7.4
2265 www-data 20 0 243700 42204 29572 S 0.0 4.2 0:04.41 php-fpm7.4
2259 www-data 20 0 243960 42104 30380 S 0.0 4.2 0:04.44 php-fpm7.4
988 root 20 0 125160 36188 10604 S 0.3 3.6 0:09.89 php
388 root 19 -1 84404 35116 33932 S 0.0 3.5 0:01.14 systemd-journ+
741 root 20 0 627300 20936 6656 S 0.0 2.1 0:02.11 snapd
738 root 20 0 238392 18588 12624 S 0.0 1.8 0:00.83 php-fpm7.4
743 root 20 0 31348 18344 3844 S 0.0 1.8 0:02.75 supervisord
544 root rt 0 280180 17976 8184 S 0.0 1.8 0:00.90 multipathd
825 root 20 0 108036 15376 7732 S 0.0 1.5 0:00.10 unattended-up+
736 root 20 0 29220 13200 5544 S 0.0 1.3 0:00.11 networkd-disp+
726 do-agent 20 0 559436 12120 6588 S 0.0 1.2 0:01.78 do-agent
1 root 20 0 101964 11124 8024 S 0.0 1.1 0:02.52 systemd
623 systemd+ 20 0 23912 10488 6484 S 0.0 1.0 0:00.42 systemd-resol+
778 www-data 20 0 71004 9964 5240 S 0.0 1.0 0:02.43 nginx
My concern is that once the application goes live and the load increases, with more database connections it is going to run out of memory. I know I can resize the droplet to increase memory or set up some swap space, but is this amount of memory usage normal for an unused application?
How can I optimize the high-memory processes such as MySQL, Nginx, and PHP? MySQL 8 appears to be the main culprit hogging the memory. Below are my MySQL settings:
#
# The MySQL database server configuration file.
#
# One can use all long options that the program supports.
# Run program with --help to get a list of available options and with
# --print-defaults to see which it would actually understand and use.
#
# For explanations see
# http://dev.mysql.com/doc/mysql/en/server-system-variables.html
# Here is entries for some specific programs
# The following values assume you have at least 32M ram
[mysqld]
#
# * Basic Settings
#
user = mysql
# pid-file = /var/run/mysqld/mysqld.pid
# socket = /var/run/mysqld/mysqld.sock
# port = 3306
# datadir = /var/lib/mysql
# If MySQL is running as a replication slave, this should be
# changed. Ref https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_tmpdir
# tmpdir = /tmp
#
# Instead of skip-networking the default is now to listen only on
# localhost which is more compatible and is not less secure.
bind-address = 127.0.0.1
#
# * Fine Tuning
#
key_buffer_size = 16M
# max_allowed_packet = 64M
# thread_stack = 256K
# thread_cache_size = -1
# This replaces the startup script and checks MyISAM tables if needed
# the first time they are touched
myisam-recover-options = BACKUP
# max_connections = 151
# table_open_cache = 4000
#
# * Logging and Replication
#
# Both location gets rotated by the cronjob.
#
# Log all queries
# Be aware that this log type is a performance killer.
# general_log_file = /var/log/mysql/query.log
# general_log = 1
#
# Error log - should be very few entries.
#
log_error = /var/log/mysql/error.log
#
# Here you can see queries with especially long duration
# slow_query_log = 1
# slow_query_log_file = /var/log/mysql/mysql-slow.log
# long_query_time = 2
# log-queries-not-using-indexes
#
# The following can be used as easy to replay backup logs or for replication.
# note: if you are setting up a replication slave, see README.Debian about
# other settings you may need to change.
# server-id = 1
# log_bin = /var/log/mysql/mysql-bin.log
# binlog_expire_logs_seconds = 2592000
max_binlog_size = 100M
# binlog_do_db = include_database_name
# binlog_ignore_db = include_database_name
Any tips and advice are much appreciated as this is the first time I'm using a VPS.
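Not an authoritative answer, but for reference, a rough sketch of the memory-related overrides one might experiment with on a 1 GB droplet, for example in /etc/mysql/mysql.conf.d/mysqld.cnf. The values are illustrative guesses to benchmark against the actual workload, not settings taken from the post above.

[mysqld]
# Shrink InnoDB's main cache from its 128M default; it is usually the biggest single consumer.
innodb_buffer_pool_size = 64M
# Performance schema instrumentation can cost a noticeable amount of memory on small instances.
performance_schema = OFF
# Fewer allowed connections means fewer per-connection buffers held in reserve.
max_connections = 40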

How to eliminate JIT overhead in a Julia executable (with MWE)

I'm using PackageCompiler hoping to create an executable that eliminates just-in-time compilation overhead.
The documentation explains that I must define a function julia_main to call my program's logic, and write a "snoop file", a script that calls functions I wish to precompile. My julia_main takes a single argument, the location of a file containing the input data to be analysed. So to keep things simple my snoop file simply makes one call to julia_main with a particular input file. So I'd hope to see the generated executable run nice and fast (no compilation overhead) when executed against that same input file.
But alas, that's not what I see. In a fresh Julia instance julia_main takes approx 74 seconds for the first execution and about 4.5 seconds for subsequent executions. The executable file takes approx 50 seconds each time it's called.
My use of the build_executable function looks like this:
julia> using PackageCompiler
julia> build_executable("d:/philip/source/script/julia/jsource/SCRiPTMain.jl",
"testexecutable",
builddir = "d:/temp/builddir4",
snoopfile = "d:/philip/source/script/julia/jsource/snoop.jl",
compile = "all",
verbose = true)
Questions:
Are the above arguments correct to achieve my aim of an executable with no JIT overhead?
Any other advice for me?
Here's what happens in response to that call to build_executable. The lines from Start of snoop file execution! to End of snoop file execution! are emitted by my code.
Julia program file:
"d:\philip\source\script\julia\jsource\SCRiPTMain.jl"
C program file:
"C:\Users\Philip\.julia\packages\PackageCompiler\CJQcs\examples\program.c"
Build directory:
"d:\temp\builddir4"
Executing snoopfile: "d:\philip\source\script\julia\jsource\snoop.jl"
Start of snoop file execution!
┌ Warning: The 'control file' contains the key 'InterpolateCovariance' with value 'true' but that is not supported. Pass a value of 'false' or omit the key altogether.
└ @ ValidateInputs d:\Philip\Source\script\Julia\JSource\ValidateInputs.jl:685
Time to build model 20.058000087738037
Saving c:/temp/SCRiPT/SCRiPTModel.jls
Results written to c:/temp/SCRiPT/SCRiPTResultsJulia.json
Time to write file: 3620 milliseconds
Time in method runscript: 76899 milliseconds
End of snoop file execution!
[ Info: used 1313 out of 1320 precompile statements
Build static library "testexecutable.a":
atexit_hook_copy = copy(Base.atexit_hooks) # make backup
# clean state so that any package we use can carelessly call atexit
empty!(Base.atexit_hooks)
Base.__init__()
Sys.__init__() #fix https://github.com/JuliaLang/julia/issues/30479
using REPL
Base.REPL_MODULE_REF[] = REPL
Mod = @eval module $(gensym("anon_module")) end
# Include into anonymous module to not polute namespace
Mod.include("d:\\\\temp\\\\builddir4\\\\julia_main.jl")
Base._atexit() # run all exit hooks we registered during precompile
empty!(Base.atexit_hooks) # don't serialize the exit hooks we run + added
# atexit_hook_copy should be empty, but who knows what base will do in the future
append!(Base.atexit_hooks, atexit_hook_copy)
Build shared library "testexecutable.dll":
`'C:\Users\Philip\.julia\packages\WinRPM\Y9QdZ\deps\usr\x86_64-w64-mingw32\sys-root\mingw\bin\gcc.exe' --sysroot 'C:\Users\Philip\.julia\packages\WinRPM\Y9QdZ\deps\usr\x86_64-w64-mingw32\sys-root' -shared '-DJULIAC_PROGRAM_LIBNAME="testexecutable.dll"' -o testexecutable.dll -Wl,--whole-archive testexecutable.a -Wl,--no-whole-archive -std=gnu99 '-IC:\Users\philip\AppData\Local\Julia-1.2.0\include\julia' -DJULIA_ENABLE_THREADING=1 '-LC:\Users\philip\AppData\Local\Julia-1.2.0\bin' -Wl,--stack,8388608 -ljulia -lopenlibm -m64 -Wl,--export-all-symbols`
Build executable "testexecutable.exe":
`'C:\Users\Philip\.julia\packages\WinRPM\Y9QdZ\deps\usr\x86_64-w64-mingw32\sys-root\mingw\bin\gcc.exe' --sysroot 'C:\Users\Philip\.julia\packages\WinRPM\Y9QdZ\deps\usr\x86_64-w64-mingw32\sys-root' '-DJULIAC_PROGRAM_LIBNAME="testexecutable.dll"' -o testexecutable.exe 'C:\Users\Philip\.julia\packages\PackageCompiler\CJQcs\examples\program.c' testexecutable.dll -std=gnu99 '-IC:\Users\philip\AppData\Local\Julia-1.2.0\include\julia' -DJULIA_ENABLE_THREADING=1 '-LC:\Users\philip\AppData\Local\Julia-1.2.0\bin' -Wl,--stack,8388608 -ljulia -lopenlibm -m64`
Copy Julia libraries to build directory:
7z.dll
BugpointPasses.dll
libamd.2.4.6.dll
libamd.2.dll
libamd.dll
libatomic-1.dll
libbtf.1.2.6.dll
libbtf.1.dll
libbtf.dll
libcamd.2.4.6.dll
libcamd.2.dll
libcamd.dll
libccalltest.dll
libccolamd.2.9.6.dll
libccolamd.2.dll
libccolamd.dll
libcholmod.3.0.13.dll
libcholmod.3.dll
libcholmod.dll
libclang.dll
libcolamd.2.9.6.dll
libcolamd.2.dll
libcolamd.dll
libdSFMT.dll
libexpat-1.dll
libgcc_s_seh-1.dll
libgfortran-4.dll
libgit2.dll
libgmp.dll
libjulia.dll
libklu.1.3.8.dll
libklu.1.dll
libklu.dll
libldl.2.2.6.dll
libldl.2.dll
libldl.dll
libllvmcalltest.dll
libmbedcrypto.dll
libmbedtls.dll
libmbedx509.dll
libmpfr.dll
libopenblas64_.dll
libopenlibm.dll
libpcre2-8-0.dll
libpcre2-8.dll
libpcre2-posix-2.dll
libquadmath-0.dll
librbio.2.2.6.dll
librbio.2.dll
librbio.dll
libspqr.2.0.9.dll
libspqr.2.dll
libspqr.dll
libssh2.dll
libssp-0.dll
libstdc++-6.dll
libsuitesparseconfig.5.4.0.dll
libsuitesparseconfig.5.dll
libsuitesparseconfig.dll
libsuitesparse_wrapper.dll
libumfpack.5.7.8.dll
libumfpack.5.dll
libumfpack.dll
libuv-2.dll
libwinpthread-1.dll
LLVM.dll
LLVMHello.dll
zlib1.dll
All done
julia>
EDIT
I was afraid that creating a minimal working example would be hard, but it was straightforward:
TestBuildExecutable.jl contains:
module TestBuildExecutable
Base.@ccallable function julia_main(ARGS::Vector{String}=[""])::Cint
    @show sum(myarray())
    return 0
end
#Function which takes approx 8 seconds to compile. Returns a 500 x 20 array of 1s
function myarray()
    [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1;
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1;
     # PLEASE EDIT TO INSERT THE MISSING 496 LINES, EACH IDENTICAL TO THE LINE ABOVE!
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1;
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
end
end #module
SnoopFile.jl contains:
module SnoopFile
currentpath = dirname(@__FILE__)
push!(LOAD_PATH, currentpath)
unique!(LOAD_PATH)
using TestBuildExecutable
println("Start of snoop file execution!")
TestBuildExecutable.julia_main()
println("End of snoop file execution!")
end # module
In a fresh Julia instance, julia_main takes 8.3 seconds for the first execution and half a millisecond for the second execution:
julia> @time TestBuildExecutable.julia_main()
sum(myarray()) = 10000
8.355108 seconds (425.36 k allocations: 25.831 MiB, 0.06% gc time)
0
julia> @time TestBuildExecutable.julia_main()
sum(myarray()) = 10000
0.000537 seconds (25 allocations: 82.906 KiB)
0
So next I call build_executable:
julia> using PackageCompiler
julia> build_executable("d:/philip/source/script/julia/jsource/TestBuildExecutable.jl",
"testexecutable",
builddir = "d:/temp/builddir15",
snoopfile = "d:/philip/source/script/julia/jsource/SnoopFile.jl",
verbose = false)
Julia program file:
"d:\philip\source\script\julia\jsource\TestBuildExecutable.jl"
C program file:
"C:\Users\Philip\.julia\packages\PackageCompiler\CJQcs\examples\program.c"
Build directory:
"d:\temp\builddir15"
Start of snoop file execution!
sum(myarray()) = 10000
End of snoop file execution!
[ Info: used 79 out of 79 precompile statements
All done
Finally, in a Windows Command Prompt:
D:\temp\builddir15>testexecutable
sum(myarray()) = 10000
D:\temp\builddir15>
which took (by my stopwatch) 8 seconds to run, and it takes 8 seconds to run every time it's executed, not just the first time. This is consistent with the executable doing a JIT compile every time it's run, but the snoop file is designed to avoid that!
Version information:
julia> versioninfo()
Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
JULIA_NUM_THREADS = 8
JULIA_EDITOR = "C:\Users\Philip\AppData\Local\Programs\Microsoft VS Code\Code.exe"
Looks like you are using Windows.
At some point PackageCompiler.jl will be mature for Windows, at which point you can try it.
The solution was indeed to wait for progress on PackageCompilerX, as suggested by @xiaodai.
On 10 Feb 2020 what was formerly PackageCompilerX became a new (version 1.0 of) PackageCompiler, with a significantly changed API, and more thorough documentation.
In particular, the MWE above (adapted to the new PackageCompiler API) now works correctly without any JIT overhead.

Sphinx shows almost no data in production

I use Sphinx + thinking_sphinx and have run into a really strange problem.
I have 2 models - Users and Microposts. Sphinx finds and shows Users, but it can't find Microposts on my production machine. On my local machine it does.
In production I try:
root@serverserj:/vol/www/apps/ror_tutorial/current# rake ts:config
Generating Configuration to /vol/www/apps/ror_tutorial/releases/20120125204127/config/development.sphinx.conf
root@serverserj:/vol/www/apps/ror_tutorial/current# rake ts:index
Generating Configuration to /vol/www/apps/ror_tutorial/releases/20120125204127/config/development.sphinx.conf
Sphinx 2.0.3-release (r3043)
Copyright (c) 2001-2011, Andrew Aksyonoff
Copyright (c) 2008-2011, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file '/vol/www/apps/ror_tutorial/releases/20120125204127/config/development.sphinx.conf'...
indexing index 'micropost_core'...
WARNING: collect_hits: mem_limit=0 kb too low, increasing to 13568 kb
collected 0 docs, 0.0 MB
total 0 docs, 0 bytes
total 0.012 sec, 0 bytes/sec, 0.00 docs/sec
skipping non-plain index 'micropost'...
indexing index 'user_core'...
WARNING: collect_hits: mem_limit=0 kb too low, increasing to 13568 kb
collected 1 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 1 docs, 12 bytes
total 0.012 sec, 1000 bytes/sec, 83.33 docs/sec
skipping non-plain index 'user'...
total 4 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 14 writes, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
rotating indices: succesfully sent SIGHUP to searchd (pid=1959).
root@serverserj:/vol/www/apps/ror_tutorial/current $ rake ts:rebuild
Stopped search daemon (pid 1959).
Generating Configuration to /vol/www/apps/ror_tutorial/releases/20120125204127/config/development.sphinx.conf
Sphinx 2.0.3-release (r3043)
Copyright (c) 2001-2011, Andrew Aksyonoff
Copyright (c) 2008-2011, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file '/vol/www/apps/ror_tutorial/releases/20120125204127/config/development.sphinx.conf'...
indexing index 'micropost_core'...
WARNING: collect_hits: mem_limit=0 kb too low, increasing to 13568 kb
collected 0 docs, 0.0 MB
total 0 docs, 0 bytes
total 0.012 sec, 0 bytes/sec, 0.00 docs/sec
skipping non-plain index 'micropost'...
indexing index 'user_core'...
WARNING: collect_hits: mem_limit=0 kb too low, increasing to 13568 kb
collected 1 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 1 docs, 12 bytes
total 0.008 sec, 1500 bytes/sec, 125.00 docs/sec
skipping non-plain index 'user'...
total 4 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 14 writes, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
Started successfully (pid 2218).
For example, let's try something on the local machine:
$ rails c
Loading development environment (Rails 3.1.3)
system :001 > Micropost.search 'Minus'
(0.3ms) SHOW search_path
Sphinx Query (2.3ms) Minus
Sphinx Found 12 results
Micropost Load (1.6ms) SELECT "microposts".* FROM "microposts" WHERE "microposts"."id" IN (30, 32, 91, 106, 121, 128, 160, 171, 172, 239, 258, 260) ORDER BY microposts.created_at DESC
=> [#<Micropost id: 30, content: "Sed minus magni culpa reiciendis unde.", user_id: 1, created_at: "2012-01-15 21:11:03", updated_at: "2012-01-15 21:11:03">, #<Micropost id: 32, content: "Placeat pariatur quisquam provident velit veniam vo...", user_id: 1, created_at: "2012-01-15 21:11:03", updated_at: "2012-01-15 21:11:03">...]
And on the production machine:
$ rails c
Loading development environment (Rails 3.1.3)
1.9.1 :001 > Micropost.search 'Minus'
(4.0ms) SHOW search_path
Sphinx Query (4.0ms) Minus
Sphinx Found 0 results
=> []
config/deploy.rb
#Add RVM's lib directory to the load path.
$:.unshift(File.expand_path('./lib', ENV['rvm_path']))
#Load RVM's capistrano plugin.
require "rvm/capistrano"
require 'bundler/capistrano'
#require 'thinking_sphinx/deploy/capistrano'
set :rvm_ruby_string, '1.9.3-head' #This is the current version of Ruby used by RVM. To get the version run: $ rvm list
set :rvm_type, :root #Don't use system-wide RVM; use my user, whose name is root.
set :user, "root" #If you log into your server with a different user name than you are logged into your local machine with, you’ll need to tell Capistrano about that user name.
set :rails_env, "production"
set :application, "ror_tutorial"
set :deploy_to, "/vol/www/apps/#{application}"
set :scm, :git
set :repository, "git://github.com/Loremaster/sample_app.git"
set :branch, "master"
set :deploy_via, :remote_cache #Keep a cache of the repository locally and with each new deploy fetch only the changes.
default_run_options[:pty] = true #Must be set for the password prompt from git to work.
server "188.127.224.136", :app, # This may be the same as your `Web` server
:web,
:db, :primary => true # This is where Rails migrations will run
# If you are using Passenger mod_rails uncomment this:
namespace :deploy do
  task :start do ; end
  task :stop do ; end
  task :restart, :roles => :app, :except => { :no_release => true } do
    run "#{try_sudo} touch #{File.join(current_path,'tmp','restart.txt')}"
  end
end
desc "Prepare system"
task :prepare_system, :roles => :app do
  run "cd #{current_path} && bundle install --without development test && bundle install --deployment"
end
after "deploy:update_code", :prepare_system
My System
Ubuntu 10.04.1 LTS
Phusion Passenger
PostgreSQL 9
Nginx
Rails 3.1.3
Ruby 1.9.3
Capistrano
Sphinx 2.0.3
It looks like you don't have any documents for the micropost_core index:
using config file '/vol/www/apps/ror_tutorial/releases/20120125204127/config/development.sphinx.conf'...
indexing index 'micropost_core'...
WARNING: collect_hits: mem_limit=0 kb too low, increasing to 13568 kb
collected 0 docs, 0.0 MB
total 0 docs, 0 bytes
total 0.012 sec, 0 bytes/sec, 0.00 docs/sec
skipping non-plain index 'micropost'...
There are 0 docs. Could you please provide your config, and check your data on production?

Resources