Django haystack with elasticsearch, indexing issue - elasticsearch

I'm using django-haystack with Elasticsearch, but there is a problem with indexing. When rebuilding my index with python manage.py rebuild_index, the following error is raised:
Traceback (most recent call last):
File "/home/palo/.virtualenvs/toro/local/lib/python2.7/site-packages/haystack/management/commands/update_index.py", line 210, in handle_label
self.update_backend(label, using)
File "/home/palo/.virtualenvs/toro/local/lib/python2.7/site-packages/haystack/management/commands/update_index.py", line 256, in update_backend
do_update(backend, index, qs, start, end, total, self.verbosity)
File "/home/palo/.virtualenvs/toro/local/lib/python2.7/site-packages/haystack/management/commands/update_index.py", line 78, in do_update
backend.update(index, current_qs)
File "/home/palo/.virtualenvs/toro/local/lib/python2.7/site-packages/haystack/backends/elasticsearch_backend.py", line 177, in update
self.conn.bulk_index(self.index_name, 'modelresult', prepped_docs, id_field=ID)
File "/home/palo/.virtualenvs/toro/src/pyelasticsearch/pyelasticsearch/client.py", line 95, in decorate
return func(*args, query_params=query_params, **kwargs)
File "/home/palo/.virtualenvs/toro/src/pyelasticsearch/pyelasticsearch/client.py", line 366, in bulk_index
query_params=query_params)
File "/home/palo/.virtualenvs/toro/src/pyelasticsearch/pyelasticsearch/client.py", line 221, in send_request
**({'data': request_body} if body else {}))
File "/home/palo/.virtualenvs/toro/src/requests/requests/sessions.py", line 387, in post
return self.request('POST', url, data=data, **kwargs)
File "/home/palo/.virtualenvs/toro/src/requests/requests/sessions.py", line 345, in request
resp = self.send(prep, **send_kwargs)
File "/home/palo/.virtualenvs/toro/src/requests/requests/sessions.py", line 448, in send
r = adapter.send(request, **kwargs)
File "/home/palo/.virtualenvs/toro/src/requests/requests/adapters.py", line 324, in send
raise Timeout(e)
Timeout: HTTPConnectionPool(host='127.0.0.1', port=9200): Request timed out. (timeout=10)
I'm using django-haystack 2.0.0-beta, pyelasticsearch 0.5, Elasticsearch 0.20.6, and Java 1.6.0_24.
Haystack Settings
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}
And I'm sure my Elasticsearch service is running.

This does not necessarily mean that your Elasticsearch server is down, especially if you get something reasonable back from curl -I "127.0.0.1:9200". More likely, your request is simply not being given enough time, given the speed of the connections involved.
Interestingly, the default timeout in pyelasticsearch is 60 seconds; see def __init__(self, urls, timeout=60, max_retries=0, revival_delay=300): in https://github.com/rhec/pyelasticsearch/blob/master/pyelasticsearch/client.py. However, Haystack overrides that with its own default of 10 seconds, as per self.timeout = connection_options.get('TIMEOUT', 10) in https://github.com/toastdriven/django-haystack/blob/master/haystack/backends/__init__.py.
As you can see, though, Haystack lets you easily change this setting by adding 'TIMEOUT': 60, to your engine configuration.
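For example, this is the engine configuration from the question with the timeout raised:
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
        'TIMEOUT': 60,  # seconds; overrides Haystack's 10-second default
    },
}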
And solved :)

I too had a similar problem. Restarting the service fixed it:
sudo service elasticsearch restart
Then it worked.

Are you running bin/elasticsearch -f?
I think you are not running the search engine.

Related

StatusCode.UNIMPLEMENTED when making Vertex AI API call

I have a simple Python app that invokes a Vertex AI API, but it fails when it runs and I can't understand why. The application is as follows:
from google.cloud import aiplatform_v1

def sample_list_datasets():
    client = aiplatform_v1.DatasetServiceClient()
    request = aiplatform_v1.ListDatasetsRequest(
        parent="projects/MYPROJECT/locations/us-central1",
    )
    page_result = client.list_datasets(request=request)
    for response in page_result:
        print(response)

sample_list_datasets()
When run, it fails with:
E0126 03:52:04.146970105 22462 hpack_parser.cc:1218] Error parsing metadata: error=invalid value key=content-type value=text/html; charset=UTF-8
Traceback (most recent call last):
File "/home/kolban/projects/vertex-ai/datasets/env/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 72, in error_remapped_callable
return callable_(*args, **kwargs)
File "/home/kolban/projects/vertex-ai/datasets/env/lib/python3.7/site-packages/grpc/_channel.py", line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/home/kolban/projects/vertex-ai/datasets/env/lib/python3.7/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNIMPLEMENTED
details = "Received http2 header with status: 404"
debug_error_string = "UNKNOWN:Error received from peer ipv4:108.177.120.95:443 {created_time:"2023-01-26T03:52:04.147076255+00:00", grpc_status:12, grpc_message:"Received http2 header with status: 404"}"
>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "run.py", line 25, in <module>
sample_list_datasets()
File "run.py", line 19, in sample_list_datasets
page_result = client.list_datasets(request=request)
File "/home/kolban/projects/vertex-ai/datasets/env/lib/python3.7/site-packages/google/cloud/aiplatform_v1/services/dataset_service/client.py", line 1007, in list_datasets
metadata=metadata,
File "/home/kolban/projects/vertex-ai/datasets/env/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py", line 113, in __call__
return wrapped_func(*args, **kwargs)
File "/home/kolban/projects/vertex-ai/datasets/env/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 74, in error_remapped_callable
raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.MethodNotImplemented: 501 Received http2 header with status: 404
What might I be doing wrong?
Changing the code to the following caused it to work:
from google.cloud import aiplatform_v1
from google.api_core.client_options import ClientOptions

def sample_list_datasets():
    service_base_path = 'aiplatform.googleapis.com'
    region = 'us-central1'
    client_options = ClientOptions(api_endpoint=f"{region}-{service_base_path}")
    client = aiplatform_v1.DatasetServiceClient(client_options=client_options)
    request = aiplatform_v1.ListDatasetsRequest(
        parent="projects/MYPROJECT/locations/us-central1",
    )
    # Make the request
    page_result = client.list_datasets(request=request)
    # Handle the response
    for response in page_result:
        print(response)

sample_list_datasets()
The resolution was hinted at in the documentation for the API request found here. That article contains a code sample, and in the comments of that sample the following is written:
It may require specifying regional endpoints when creating the service
client as shown in:
https://googleapis.dev/python/google-api-core/latest/client_options.html
And this was the core clue. When we make Vertex AI calls we must specify where the request is to be sent. We do this by setting the api_endpoint option to a URL of the form [REGION]-aiplatform.googleapis.com.

ServerSelectionTimeoutError: pymongo.errors.ServerSelectionTimeoutError: 147.234.32.246:27017

I am trying to connect remotely to a MongoDB server from PyCharm (over RDP).
This is the function that I run:
cluster = MongoClient("mongodb://admin:Passw0rd!#147.234.32.246:27017/NEG")
db = cluster["NEG"]
for word in Setting.dictionary_global.keys():
    if word in db.list_collection_names():
        collection = db[word]
        for file in Setting.dictionary_global[word].keys():
            if collection.find({"url": Setting.dictionary_global[word][file].url}):
                continue
            num_of_appearance = len(Setting.dictionary_global[word][file].indexes.get(word))
            post = {"url": file,
                    "title": Setting.dictionary_global[word][file].title,
                    "description": Setting.dictionary_global[word][file].description,
                    "word in page": Setting.dictionary_global[word][file].indexes,
                    "appearance": num_of_appearance,
                    "date modified": Setting.dictionary_global[word][file].time}
            collection.insert_one(post)
    else:
        collection = db.create_collection(word)
        for file in Setting.dictionary_global[word].keys():
            # print(Setting.dictionary_global)
            num_of_appearance = len(Setting.dictionary_global[word][file].indexes.get(word))
            post = {"url": file,
                    "title": Setting.dictionary_global[word][file].title,
                    "description": Setting.dictionary_global[word][file].description,
                    "word in page": Setting.dictionary_global[word][file].indexes,
                    "appearance": num_of_appearance,
                    "date modified": Setting.dictionary_global[word][file].time}
            collection.insert_one(post)
And I get this error:
Traceback (most recent call last):
File "C:/Users/edend/PycharmProjects/pythonProject11/main.py", line 118, in
crawler.start()
File "C:/Users/edend/PycharmProjects/pythonProject11/main.py", line 110, in start
insertDB()
File "C:\Users\edend\PycharmProjects\pythonProject11\DB.py", line 10, in insertDB
if word in db.list_collection_names():
File "C:\Users\edend\PycharmProjects\pythonProject11\venv\lib\site-packages\pymongo\database.py", line 863, in list_collection_names
for result in self.list_collections(session=session, **kwargs)]
File "C:\Users\edend\PycharmProjects\pythonProject11\venv\lib\site-packages\pymongo\database.py", line 825, in list_collections
return self.__client._retryable_read(
File "C:\Users\edend\PycharmProjects\pythonProject11\venv\lib\site-packages\pymongo\mongo_client.py", line 1460, in _retryable_read
server = self._select_server(
File "C:\Users\edend\PycharmProjects\pythonProject11\venv\lib\site-packages\pymongo\mongo_client.py", line 1278, in _select_server
server = topology.select_server(server_selector)
File "C:\Users\edend\PycharmProjects\pythonProject11\venv\lib\site-packages\pymongo\topology.py", line 241, in select_server
return random.choice(self.select_servers(selector,
File "C:\Users\edend\PycharmProjects\pythonProject11\venv\lib\site-packages\pymongo\topology.py", line 199, in select_servers
server_descriptions = self._select_servers_loop(
File "C:\Users\edend\PycharmProjects\pythonProject11\venv\lib\site-packages\pymongo\topology.py", line 215, in _select_servers_loop
raise ServerSelectionTimeoutError(
pymongo.errors.ServerSelectionTimeoutError: 147.234.32.246:27017: timed out, Timeout: 30s, Topology Description: <TopologyDescription id: 5ff3d15a2dcaa1e4fb3db4cd, topology_type: Single, servers: [<ServerDescription ('147.234.32.246', 27017) server_type: Unknown, rtt: None, error=NetworkTimeout('147.234.32.246:27017: timed out')>]>
Please help me, I'm stuck and I've tried everything.
Thank you in advance!
Common causes:
MongoDB server is not running
MongoDB server is running on a different port
No connectivity between client and server (can you ping it?)
mongod.conf is configured to only allow local connections by default (set bind_ip_all?)
A quick connectivity check from Python is sketched below.
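This is a minimal sketch; the URI and the 5-second serverSelectionTimeoutMS are illustrative (the short timeout just makes a failure show up faster than pymongo's default 30 seconds):
from pymongo import MongoClient
from pymongo.errors import ServerSelectionTimeoutError

# Illustrative URI; substitute your real credentials and host.
client = MongoClient("mongodb://admin:<password>@147.234.32.246:27017/NEG",
                     serverSelectionTimeoutMS=5000)
try:
    client.admin.command("ping")  # cheap round trip to the server
    print("MongoDB server is reachable")
except ServerSelectionTimeoutError as err:
    print("Cannot reach MongoDB server:", err)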

AWS Lambda EC2-Instances Client Timeout Error

I very often get an error when trying to stop or start EC2 instances through AWS Lambda. It's quite strange, because sometimes it works (for both starting and stopping instances).
The error I get is shown below. When I run a test from the Lambda console, it succeeds most of the time. But when I run it through AWS Event Rules (CloudWatch), the function very often fails.
This is my code on line 48
[ERROR] ConnectTimeoutError: Connect timeout on endpoint URL: "https://ec2.ap-southeast-2.amazonaws.com/"
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 48, in lambda_handler
if stop_ec2_instances():
File "/var/task/lambda_function.py", line 155, in stop_ec2_instances
ec2_client.stop_instances(InstanceIds=ec2_instances)
File "/var/task/botocore/client.py", line 316, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/var/task/botocore/client.py", line 621, in _make_api_call
http, parsed_response = self._make_request(
File "/var/task/botocore/client.py", line 641, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File "/var/task/botocore/endpoint.py", line 102, in make_request
return self._send_request(request_dict, operation_model)
File "/var/task/botocore/endpoint.py", line 136, in _send_request
while self._needs_retry(attempts, operation_model, request_dict,
File "/var/task/botocore/endpoint.py", line 253, in _needs_retry
responses = self._event_emitter.emit(
File "/var/task/botocore/hooks.py", line 356, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/var/task/botocore/hooks.py", line 228, in emit
return self._emit(event_name, kwargs)
File "/var/task/botocore/hooks.py", line 211, in _emit
response = handler(**kwargs)
File "/var/task/botocore/retryhandler.py", line 183, in __call__
if self._checker(attempts, response, caught_exception):
File "/var/task/botocore/retryhandler.py", line 250, in __call__
should_retry = self._should_retry(attempt_number, response,
File "/var/task/botocore/retryhandler.py", line 277, in _should_retry
return self._checker(attempt_number, response, caught_exception)
File "/var/task/botocore/retryhandler.py", line 316, in __call__
checker_response = checker(attempt_number, response,
File "/var/task/botocore/retryhandler.py", line 222, in __call__
return self._check_caught_exception(
File "/var/task/botocore/retryhandler.py", line 359, in _check_caught_exception
raise caught_exception
File "/var/task/botocore/endpoint.py", line 200, in _do_get_response
http_response = self._send(request)
File "/var/task/botocore/endpoint.py", line 269, in _send
return self.http_session.send(request)
File "/var/task/botocore/httpsession.py", line 287, in send
raise ConnectTimeoutError(endpoint_url=request.url, error=e)
This is my code for starting and stopping the instances.
I have even already moved the instantiation of ec2_res and ec2_client inside the functions, but it did not help.
def start_ec2_instances():
    try:
        ec2_res = boto3.resource('ec2', region_name="ap-southeast-2")
        ec2_client = boto3.client('ec2', region_name="ap-southeast-2")
        ec2_client.start_instances(InstanceIds=ec2_instances)
        for ec2_id in ec2_instances:
            instance = ec2_res.Instance(id=ec2_id)
            logger.info("Waiting instance " + ec2_id + " to start")
            instance.wait_until_running()
        return True
    except bex.ClientError as err:
        logger.error(err.response['Error']['Message'])
        return False

def stop_ec2_instances():
    try:
        ec2_res = boto3.resource('ec2', region_name="ap-southeast-2")
        ec2_client = boto3.client('ec2', region_name="ap-southeast-2")
        ec2_client.stop_instances(InstanceIds=ec2_instances)
        for ec2_id in ec2_instances:
            instance = ec2_res.Instance(id=ec2_id)
            logger.info("Waiting instance " + ec2_id + " to stop")
            instance.wait_until_stopped()
        return True
    except bex.ClientError as err:
        logger.error(err.response['Error']['Message'])
        return False
Has any one of you ever faced the same thing?
Thanks
Edit: I set the function timeout to 8 minutes. Under normal conditions, the time required to execute the function is less than 5 minutes.
Additional note:
Sometimes I work through a VPN (south-east-2), which is in a different region from the one I live in. The instances (and other components) are also deployed in this VPN's region (south-east-2).
Your code to start and stop the instances looks right to me. The timeout is happening because your operation is not completing within the configured timeout of your Lambda function.
You can measure the time taken by your function by simply subtracting the start time from the end time of the function.
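A minimal sketch of such a measurement inside the handler (the handler body shown is illustrative):
import time

def lambda_handler(event, context):
    start = time.time()
    success = stop_ec2_instances()  # the operation being timed
    elapsed = time.time() - start
    print("Handler finished in {:.1f} seconds (success={})".format(elapsed, success))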
The default timeout is 3 seconds, so you should consider increasing this timeout interval for your Lambda function, say to 5 minutes.
Please note that the maximum value for this timeout is 900 seconds (15 minutes), and you cannot configure a value higher than that. I am sure the above code would complete within that limit, so it should not be a problem for you.
How do I increase the timeout interval for my Lambda function?
There are multiple ways to do this: via the AWS CLI, the AWS Console, or programmatically, for example with boto3 as sketched below.
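A minimal boto3 sketch (the function name and timeout value are illustrative):
import boto3

lambda_client = boto3.client('lambda')

# Raise the function timeout to 300 seconds (5 minutes); adjust as needed.
lambda_client.update_function_configuration(
    FunctionName='my-ec2-scheduler',  # hypothetical function name
    Timeout=300,
)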
In the AWS Console, open your function's configuration, set the Timeout value, and click the Save button after making this change.
Hope this helps.

Hue server error "checkJobBrowserStatus" cloudera hadoop

I have a problem accessing the Hue web UI. I just get a
"500 server error"
on every access to any web page in Hue. A sample of the error:
From the log file I got some information about the type of this error:
[12/Dec/2017 01:00:53 -0800] views ERROR JS ERROR: {"msg":"ReferenceError: checkJobBrowserStatus is not defined","url":"http://10.40.2.89:8888/hue/","line":1584,"column":12,"stack":"#http://10.40.2.89:8888/hue/:1584:13\nn.Callbacks/j#http://10.40.2.89:8888/static/desktop/ext/js/jquery/jquery-2.1.1.min.e40ec2161fe7.js:2:26852\nn.Callbacks/k.fireWith#http://10.40.2.89:8888/static/desktop/ext/js/jquery/jquery-2.1.1.min.e40ec2161fe7.js:2:27661\n.ready#http://10.40.2.89:8888/static/desktop/ext/js/jquery/jquery-2.1.1.min.e40ec2161fe7.js:2:29482\nI#http://10.40.2.89:8888/static/desktop/ext/js/jquery/jquery-2.1.1.min.e40ec2161fe7.js:2:29656\n"}
When I try to open other web pages I get the same error:
[12/Dec/2017 01:04:56 -0800] views ERROR JS ERROR: {"msg":"ReferenceError: checkJobBrowserStatus is not defined","url":"http://10.40.2.89:8888/metastore/tables/","line":1584,"column":12,"stack":"#http://10.40.2.89:8888/metastore/tables/:1584:13\nn.Callbacks/j#http://10.40.2.89:8888/static/desktop/ext/js/jquery/jquery-2.1.1.min.e40ec2161fe7.js:2:26852\nn.Callbacks/k.fireWith#http://10.40.2.89:8888/static/desktop/ext/js/jquery/jquery-2.1.1.min.e40ec2161fe7.js:2:27661\n.ready#http://10.40.2.89:8888/static/desktop/ext/js/jquery/jquery-2.1.1.min.e40ec2161fe7.js:2:29482\nI#http://10.40.2.89:8888/static/desktop/ext/js/jquery/jquery-2.1.1.min.e40ec2161fe7.js:2:29656\n"}
and so on, with similar errors on any web page.
I've attached the full log file on Google Drive
I'm trying to open the Hue web interface with Iceweasel on Debian.
The Cloudera version is CDH 5.13.0.
It seems the [indexer] app was blacklisted (see the [desktop] section in /hue/desktop/dump_config), so Hue won't work properly.
Also, it looks like you are on the old UI; it is recommended to switch back to Hue 4.
[12/Dec/2017 01:04:53 -0800] middleware INFO Processing exception: u'indexer' is not a registered namespace: Traceback (most recent call last):
File "/usr/lib/hue/build/env/lib/python2.7/site-packages/Django-1.6.10-py2.7.egg/django/core/handlers/base.py", line 112, in get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/usr/lib/hue/build/env/lib/python2.7/site-packages/Django-1.6.10-py2.7.egg/django/db/transaction.py", line 371, in inner
return func(*args, **kwargs)
File "/usr/lib/hue/apps/metastore/src/metastore/views.py", line 240, in show_tables
'source_type': _get_servername(db),
File "/usr/lib/hue/desktop/core/src/desktop/lib/django_util.py", line 230, in render
**kwargs)
File "/usr/lib/hue/desktop/core/src/desktop/lib/django_util.py", line 148, in _render_to_response
return django_mako.render_to_response(template, *args, **kwargs)
File "/usr/lib/hue/desktop/core/src/desktop/lib/django_mako.py", line 125, in render_to_response
return HttpResponse(render_to_string(template_name, data_dictionary), **kwargs)
File "/usr/lib/hue/desktop/core/src/desktop/lib/django_mako.py", line 114, in render_to_string_normal
result = template.render(**data_dict)
File "/usr/lib/hue/build/env/lib/python2.7/site-packages/Mako-0.8.1-py2.7.egg/mako/template.py", line 443, in render
return runtime._render(self, self.callable_, args, data)
File "/usr/lib/hue/build/env/lib/python2.7/site-packages/Mako-0.8.1-py2.7.egg/mako/runtime.py", line 786, in _render
**_kwargs_for_callable(callable_, data))
File "/usr/lib/hue/build/env/lib/python2.7/site-packages/Mako-0.8.1-py2.7.egg/mako/runtime.py", line 818, in _render_context
_exec_template(inherit, lclcontext, args=args, kwargs=kwargs)
File "/usr/lib/hue/build/env/lib/python2.7/site-packages/Mako-0.8.1-py2.7.egg/mako/runtime.py", line 844, in _exec_template
callable_(context, *args, **kwargs)
File "/tmp/tmpMMurVL/metastore/metastore.mako.py", line 142, in render_body
__M_writer(escape(unicode( assist.assistPanel() )))
File "/tmp/tmpMMurVL/metastore/assist.mako.py", line 497, in render_assistPanel
__M_writer(escape(unicode( url('indexer:importer_prefill', source_type='all', target_type='table') )))
File "/usr/lib/hue/desktop/core/src/desktop/lib/django_mako.py", line 131, in url
return reverse(view_name, args=args, kwargs=view_args)
File "/usr/lib/hue/build/env/lib/python2.7/site-packages/Django-1.6.10-py2.7.egg/django/core/urlresolvers.py", line 532, in reverse
key)
NoReverseMatch: u'indexer' is not a registered namespace
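For reference, the blacklist is typically controlled by the app_blacklist property in the [desktop] section of hue.ini; the sketch below is illustrative, and making sure indexer is not listed there (then restarting Hue) should bring the missing namespace back:
[desktop]
# Comma-separated list of apps not to load at server startup;
# make sure 'indexer' is not listed here (entries are illustrative).
app_blacklist=impala,security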
This might be a bit late, but I thought I'd post the solution to help others.
I was able to get this page working by setting "share_jobs" to "true" in "/etc/hue/conf.empty/hue.ini".
So the following worked for me:
[jobbrowser]
# Share submitted jobs information with all users. If set to false,
# submitted jobs are visible only to the owner and administrators.
share_jobs=true

gspread update_cells always returns 502 with httpsession error

I am currently trying to overwrite a Google spreadsheet with new data using the gspread API (version 0.4.1) via sheet.update_cells, but it keeps giving me a 502 with the error message as follows:
The server encountered a temporary error and could not complete your request. Please try again in 30 seconds. That's all we know.
It seems to be an HTTP session problem, from the stack trace:
File "/usr/local/lib/python2.7/dist-packages/gspread/models.py", line 476, in update_cells
self.client.post_cells(self, ElementTree.tostring(feed))
File "/usr/local/lib/python2.7/dist-packages/gspread/client.py", line 303, in post_cells
r = self.session.post(url, data, headers=headers)
File "/usr/local/lib/python2.7/dist-packages/gspread/httpsession.py", line 81, in post
return self.request('POST', url, data=data, headers=headers)
File "/usr/local/lib/python2.7/dist-packages/gspread/httpsession.py", line 67, in request
response = func(url, data=data, headers=request_headers)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 111, in post
return request('post', url, data=data, json=json, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 57, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 475, in request
I did a little investigation, but it seems the answer varies, so I think I had better just post my version here.
The code snippet is really nothing special, something like the following:
sheet = gspread.authorize(credentials).open_by_key(spreadsheet_key).worksheet(worksheet_title)
if not sheet:
    return
if not len(new_rows):
    return
sheet.resize(len(new_rows), sheet.col_count)
active_range = 'A1:{0}{1}'.format(last_col, len(new_rows))
cell_list = sheet.range(active_range)
k = 0
for row in new_rows:
    for field in row:
        cell_list[k].value = field
        k += 1
sheet.update_cells(cell_list)
where new_rows are just the new cell values I want to overwrite the sheet with. I don't think it is an authentication issue, as the same code snippet used to work, but at some point it started giving the 502.
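Since the error body describes a temporary condition ("Please try again in 30 seconds"), one workaround is to retry the write after a pause. A minimal sketch, assuming retrying is acceptable for your use case (the exception class gspread raises for HTTP errors varies between versions, so a broad catch is used here):
import time

def update_with_retry(sheet, cell_list, attempts=3, delay=30):
    # Retry update_cells a few times, pausing between attempts,
    # since the server describes the 502 as temporary.
    for attempt in range(attempts):
        try:
            sheet.update_cells(cell_list)
            return
        except Exception:  # gspread's HTTP error class differs across versions
            if attempt == attempts - 1:
                raise
            time.sleep(delay)

update_with_retry(sheet, cell_list)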
