Scaling up a Serverless Web Crawler and Search Engine in AWS - aws-lambda

https://github.com/aws-samples/aws-step-functions-kendra-web-crawler-search-engine
I was following the link above to implement web crawling on a particular website.
I deployed the stack using the command deploy --profile <YOUR_AWS_PROFILE> --with-kendra,
but when I run
crawl --profile <YOUR_AWS_PROFILE> --name lambda-docs --base-url https://docs.aws.amazon.com/ --start-paths /lambda --keywords lambda/latest/dg
it gives me the error:
'/crawl' is not recognized as an internal or external command,
operable program or batch file.
The link says, "When the infrastructure has been deployed, you can trigger a run of the crawler with the included utility script."
Is there something I need to install for the crawl command?

That should be ./crawl according to the project's README.
Your error message also sounds like it's coming from Windows, but the crawl script is written in Bash, so you may run into issues unless you switch to Linux/macOS/BSD (or WSL).
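For example, a run from WSL might look like this (a sketch using the same arguments as your command; the chmod is only needed if the executable bit wasn't preserved on checkout):

# from the root of the cloned repository, inside WSL or another Bash shell
chmod +x ./crawl ./deploy   # restore executable bits if needed
./crawl --profile <YOUR_AWS_PROFILE> --name lambda-docs \
  --base-url https://docs.aws.amazon.com/ \
  --start-paths /lambda \
  --keywords lambda/latest/dg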

Related

Flink on GCP - No FileSystem for scheme: gs

I've been trying to use Flink on GCP (https://github.com/spotify/flink-on-k8s-operator), but there is a problem with Google Cloud Storage access.
I followed the steps explained here (https://github.com/spotify/flink-on-k8s-operator/blob/master/images/flink/README.md)
and created a Docker image like this:
ARG GCS_CONNECTOR_VERSION=latest-hadoop2
ARG FLINK_HADOOP_VERSION=2.8.3-10.0
ARG GCS_CONNECTOR_NAME=gcs-connector-${GCS_CONNECTOR_VERSION}.jar
ARG GCS_CONNECTOR_URI=https://storage.googleapis.com/hadoop-lib/gcs/${GCS_CONNECTOR_NAME}
ARG FLINK_HADOOP_JAR_NAME=flink-shaded-hadoop-2-uber-${FLINK_HADOOP_VERSION}.jar
ARG FLINK_HADOOP_JAR_URI=https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/${FLINK_HADOOP_VERSION}/${FLINK_HADOOP_JAR_NAME}

RUN echo "Downloading ${GCS_CONNECTOR_URI}" && \
    wget -q -O /opt/flink/lib/${GCS_CONNECTOR_NAME} ${GCS_CONNECTOR_URI}

RUN echo "Downloading ${FLINK_HADOOP_JAR_URI}" && \
    wget -q -O /opt/flink/lib/${FLINK_HADOOP_JAR_NAME} ${FLINK_HADOOP_JAR_URI}
I can see the jars in the task manager's and job manager's lib folders after deploying the job, but the task manager throws an error like this:
org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'gs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded. For a full list of supported file systems, please see https://ci.apache.org/projects/flink/flink-docs-stable/ops/filesystems/.
The interesting thing here is that the task manager throws an error, yet I can see that the base path for the checkpoint is created on GCS successfully. For example, I set gs://bucket/flink/job/checkpoint as the checkpoint path, and I can see this folder after deploying, but of course there is no data inside.
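For reference, the setting meant here is Flink's standard checkpoint directory option, configured in flink-conf.yaml (or, assuming the spotify/flink-on-k8s-operator CRD, via its flinkProperties map):

# flink-conf.yaml
state.checkpoints.dir: gs://bucket/flink/job/checkpoint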
What can the problem be?
You should check the official GCS connector docs. Basically, you need to copy the optional GCS plugin into the plugins directory to make it available to Flink in your container image (see the sketch below).
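For illustration, a minimal Dockerfile sketch of that plugin approach (the flink-gs-fs-hadoop jar ships in the image's opt/ directory from Flink 1.15 onward; the exact version tag below is an assumption):

FROM flink:1.15.0
# Flink only loads filesystem plugins placed in their own subdirectory under plugins/
RUN mkdir -p /opt/flink/plugins/gs-fs-hadoop && \
    cp /opt/flink/opt/flink-gs-fs-hadoop-1.15.0.jar /opt/flink/plugins/gs-fs-hadoop/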
In addition to this, I recommend you check out the recently added Flink Kubernetes Operator project, which should give you some benefits over your current setup and improve integration with newer Flink versions.

Azure DevOps - The system cannot find the path specified (YAML Script)

I am running a simple YAML script in Azure DevOps to run a command-line batch script. The script calls a PowerShell script that converts ctest output to JUnit formatting for publishing test results through Azure DevOps. When I run the build pipeline, the task fails with the following error:
'build\collect_results.cmd' is not recognized as an internal or external command, operable program or batch file.
My first hunch is that it has something to do with placing the script in the same folder as the .vsts.yml file, since the script before it in the pipeline works with its call to windows_c.cmd in the jenkins folder:
call jenkins\windows_c.cmd
Has anyone else seen this error? What is the root cause of it? Is it simply a bug in Azure DevOps?
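For reference, a minimal sketch of the kind of YAML step being described (the displayName and workingDirectory values are assumptions; a script step runs from $(Build.SourcesDirectory) by default, so a relative path like build\collect_results.cmd only resolves if that folder actually exists in the checked-out sources):

steps:
- script: call build\collect_results.cmd
  displayName: 'Collect ctest results'
  workingDirectory: $(Build.SourcesDirectory)   # assumption: build\ sits at the repo root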

Upload of application bundle failed with error: EISDIR: illegal operation on directory (Elastic Beanstalk deploy VSTS)

I am trying to set up a CICD pipeline using Visual Studio > Visual Studio Team Services > Elastic Beanstalk Create version.
I have been able to check in my code OK, and kick off my build pipeline which contains the following step in place of 'publish artefact':
"Create Elastic Beanstalk Revision:"
This step is attached to an AWS IAM User with Administrator privileges. This step fails when I try to run my pipeline to deploy the ASP.NET application (Webforms, so not Core) via this method.
The error output is as follows:
2018-07-30T04:46:22.7765736Z ##[section]Starting: Create Elastic Beanstalk Revision: Sparky
2018-07-30T04:46:22.7771363Z ==============================================================================
2018-07-30T04:46:22.7771634Z Task         : AWS Elastic Beanstalk Create Version
2018-07-30T04:46:22.7771964Z Description  : Create an application revision for deployment to an environment.
2018-07-30T04:46:22.7772192Z Version      : 1.0.21
2018-07-30T04:46:22.7772403Z Author       : Amazon Web Services
2018-07-30T04:46:22.7772908Z Help         : Please refer to AWS Elastic Beanstalk User Guide for more details on deploying applications with AWS Elastic Beanstalk.
2018-07-30T04:46:22.7773336Z ==============================================================================
2018-07-30T04:46:23.2641747Z ac747f99-1789-4d43-86c5-c8283d1a72c0 exists true
2018-07-30T04:46:23.2671026Z Deployment type set to aspnet
2018-07-30T04:46:24.8994140Z Determine S3 bucket elasticbeanstalk-ap-southeast-2-153247006570 to store application bundle
2018-07-30T04:46:24.9038683Z Upload of application bundle failed with error: EISDIR: illegal operation on a directory, read { Error: EISDIR: illegal operation on a directory, read
2018-07-30T04:46:24.9047409Z Uploading application bundle D:\a\1\a to object Sparky/a-cicd_test.zip in bucket elasticbeanstalk-ap-southeast-2-153247006570
2018-07-30T04:46:24.9048878Z ##[error]Error: EISDIR: illegal operation on a directory, read
2018-07-30T04:46:24.9053846Z     at Error (native) errno: -4068, code: 'EISDIR', syscall: 'read' }
2018-07-30T04:46:24.9172250Z ##[section]Finishing: Create Elastic Beanstalk Revision: Sparky
I could find very few (pretty much zero) results online about this error. I'm not sure how to resolve it. Any ideas, anyone? I know it's not IAM permissions, as I am using an admin AWS user just for testing.
EDIT: Added an image of the build definition. (Note: I don't really know how to use version label output variables, so I just put something there, but I don't think that's the issue; this failure is something else entirely. I'm just following online examples/tutorials for a basic deployment.)
The build extension is https://aws.amazon.com/vsts/. Looking back over my screenshot and the instructions I was following here https://aws.amazon.com/blogs/developer/deploying-net-web-applications-using-aws-elastic-beanstalk-with-visual-studio-team-services/, I just realised a mistake: I didn't specify the file name in the web deploy archive.
I changed
$(build.artifactstagingdirectory)
To
$(build.artifactstagingdirectory)\SparkIdeaGenerator.zip
And the build succeeded! However, clearly I didn't understand the purpose of this build task: it has only created an application revision in AWS; it hasn't actually deployed the updated code. This isn't much good, as I still need to go into the console and click 'Deploy', which doesn't seem ideal. Here's what I mean:
Clearly I didn't understand the limits of this build task. I thought it would create the revision and deploy the code; it doesn't. There is only one other Elastic Beanstalk build task available in the toolset I downloaded, which is 'create application'. I don't want that, as I already have the application present; I just want to update it. I will look further into this, as I need full end-to-end automation: commit code, run build, deploy code, update site.
I will, however, mark the question as answered, as I have solved this specific error by explicitly referencing a .zip with the package name of the solution itself.
The process is defined in: https://docs.aws.amazon.com/vsts/latest/userguide/tutorial-eb.html
It states that you should append the zip file name to $(build.artifactstagingdirectory), as identified above.
This does appear to fix the issue (sketched below).
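For illustration, here is how the corrected setting might look if the same task were expressed in YAML (the task identifier and input names follow the AWS Toolkit for Azure DevOps, but treat them as assumptions to verify against your installed version; the region, application name, and deployment type are taken from the log above):

- task: ElasticBeanstalkCreateVersion@1
  inputs:
    awsCredentials: 'my-aws-connection'   # assumption: your AWS service connection name
    regionName: 'ap-southeast-2'
    applicationName: 'Sparky'
    applicationType: 'aspnet'
    webDeploymentArchive: '$(build.artifactstagingdirectory)\SparkIdeaGenerator.zip'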

Stuck on Google Home Tutorial

I am working on an intro to Actions on Google tutorial. I made it to page 4 and I am stuck on the "Fulfillment Webhook and Deployment" stage. I put the sample backend code into a Go file called populationapi.go. I'm confused about how to run the commands listed in the "Using ngrok to locally run the Webhook" section on Windows, as they are written for a different operating system. Should I be doing these steps in the Windows command prompt in the first place? Thanks.
https://www.programmableweb.com/news/how-to-get-started-google-actions/how-to/2017/01/31?page=4
Here are the steps I'm confused on:
We start up the Go application, which exposes the API Server via go run populationapi.go
$ go run populationapi.go
We start ngrok to expose a secure public tunnel on port 9000 via the following command:
$ ngrok http 9000
Edit: every time I try the "go run populationapi.go" command, it says:
'go' is not recognized as an internal or external command,
operable program or batch file.
Edit: my Go file is located on my desktop. Is the issue the location of the file? The installer put the Go distribution in C:\Go.
The error means the go command is not available from your shell.
You need to install Go (also known as Golang) to run the example code provided in Step 4.
Make sure to follow the installation instructions as well.
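If Go is in fact already installed in C:\Go (as your edit suggests), the "'go' is not recognized" error usually just means C:\Go\bin is not on your PATH. A quick check from a Windows command prompt:

rem add the Go toolchain to PATH for this session only
set PATH=%PATH%;C:\Go\bin
go version

rem then run the file from the folder that contains it, e.g. the Desktop:
cd %USERPROFILE%\Desktop
go run populationapi.go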
It was just an example; you can write the code in any language.

How to see Parse Server cloud code logs?

I have Bitnami's Parse Server set up on Azure.
I'm logging some info from cloud code using console.log and console.error. When using hosted Parse, these logs were displayed in the Info & Error Logs section of the Dashboard. Any idea where the logs go now?
The issue is not specific to Bitnami's distribution: I also tested on a local machine with parse-server-example and the Parse Dashboard and got the same result (no logs).
I use AWS, but you can see the logs by downloading them or by running the server on localhost: just cd into your folder, run npm start in the terminal, and switch your Parse Server URL to http://localhost:1337/parse.
You can manually download them through the Azure CLI.
Take a look here for installation: https://azure.microsoft.com/en-us/documentation/articles/xplat-cli-install/
I used npm: npm install azure-cli -g
Open up a terminal and type: azure site log download webappname
This will save the logs for the web app named 'webappname' to a file named diagnostics.zip in the current directory.
Unzip and open the folder diagnostics -> LogFiles -> Application.
The text file with -stderr- in its name contains the output you write with console.error() in your cloud code.
The text file with -stdout- in its name contains the output you write with console.log() in your cloud code.
This is a known issue on Bitnami Parse. We are working on fixing it for the next release.
You have to log in to your server via SSH and modify the line below in the /opt/bitnami/apps/parse/htdocs/server.js file:
From:
cloud: "./node_modules/parse-server/lib/cloud-code/Parse.Cloud.js",
To:
cloud: "./cloud/main.js",
You have to include the path to the ./cloud/main.js you previously created (assuming you created it in /opt/bitnami/apps/parse/htdocs/).
Remember to restart the server after applying those changes by running:
sudo /opt/bitnami/ctlscript.sh restart
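For context, a minimal sketch of the relevant part of such a server.js (the values are placeholders, not your real settings; logsFolder is parse-server's standard option for where its file logs are written, included here as an assumption about your setup):

var ParseServer = require('parse-server').ParseServer;

var api = new ParseServer({
  databaseURI: 'mongodb://localhost:27017/dev',  // placeholder
  appId: 'myAppId',                              // placeholder
  masterKey: 'myMasterKey',                      // placeholder
  serverURL: 'http://localhost:1337/parse',
  cloud: './cloud/main.js',   // the corrected cloud code path from above
  logsFolder: './logs'        // where console.log / console.error output ends up
});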
