How to write if-else statements in a lifecycle configuration script - aws-lambda

I have a SageMaker notebook instance containing two Jupyter notebook (.ipynb) files. When I had only one notebook, I was able to run it automatically with a single Lambda function trigger and a lifecycle configuration.
Now I have two notebooks and two corresponding Lambda function triggers. How can I run the right notebook, based on the trigger, by changing the lifecycle configuration script?
The trigger is a file upload to S3. Depending on which location the file is added to, the corresponding notebook should run.

It is not possible to change the lifecycle configuration script on the fly; you would have to stop the notebook instance and then edit the script, which is probably not ideal.
Instead, I would recommend writing a single lifecycle configuration script that checks the relevant S3 location and, based on what it finds, runs the command to execute the appropriate notebook.
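A minimal sketch of such an on-start script, assuming the triggering Lambda drops its file under one of two hypothetical prefixes (trigger-a/ or trigger-b/) in a bucket called my-bucket, and that the notebooks live in the default /home/ec2-user/SageMaker directory. The bucket, prefix, and notebook names are placeholders, and activating the right conda environment is omitted:

    #!/bin/bash
    set -e

    BUCKET="my-bucket"                        # placeholder bucket name
    NOTEBOOK_DIR="/home/ec2-user/SageMaker"   # default SageMaker notebook directory

    # Pick the notebook based on which trigger prefix contains objects.
    if aws s3 ls "s3://${BUCKET}/trigger-a/" | grep -q .; then
        NOTEBOOK="notebook_a.ipynb"
    elif aws s3 ls "s3://${BUCKET}/trigger-b/" | grep -q .; then
        NOTEBOOK="notebook_b.ipynb"
    else
        echo "No trigger file found; nothing to run."
        exit 0
    fi

    # Run the chosen notebook in the background so the lifecycle script
    # returns within SageMaker's time limit.
    nohup jupyter nbconvert --to notebook --execute --inplace \
        "${NOTEBOOK_DIR}/${NOTEBOOK}" > /var/log/notebook-run.log 2>&1 &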

Related

Issues while creating custom windows image using packer

I'm trying to create a custom Windows image using the Packer scripts GitHub provides at https://github.com/actions/runner-images.
The scripts in this repo target the Azure platform, but I want to build the image on GCP.
I have tried making some changes to the scripts, but I end up with an image on which the startup scripts don't run. I raised an issue for this (https://github.com/actions/runner-images/issues/6565) and GitHub replied with "We only support azure".
Is there an alternative? I just want to install a few tools like Java, Maven, etc. on top of a windows-2019 image.
I've tried using this template: https://github.com/actions/runner-images/blob/main/images/win/windows2019.json. But the resulting image has a few issues, and the startup script doesn't run when I create a new GCP VM instance from it.

Sharing environments for Jupyter notebook via github

My company has a rat's nest of a GitHub repo in which multiple Jupyter notebook tutorials are shared with associated requirements.txt files.
I would like to streamline this so that, ideally, one would only need to open one of these notebooks, easily create the correct environment from the associated requirements.txt file within the notebook itself, and then execute the notebook as normal.
An additional wrinkle is that not all of these dependencies can be installed via conda, so I will need to use pip as the package installer.
Is there a way to do this entirely within a notebook, without creating any additional files (e.g. environment.yml)?
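For what it's worth, one approach that stays entirely inside the notebook is to install the dependencies into the environment backing the current kernel from the first cell, assuming requirements.txt sits alongside the notebook:

    # First cell of the notebook: installs with pip into the current kernel's environment
    %pip install -r requirements.txt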

FileNotFoundError: No such file or directory for spark-submit encountered when running pyspark commands on Heroku

Background: I built an XGBClassifier model for content-based filtering and an ALS model for collaborative filtering (for ALS, I imported from pyspark.ml), and took the weighted sum of the rating predictions from both to yield the final rating predictions, which are sorted in descending order (the top 5 rows are shown to the user as the top 5 recommendations). In short, it is a hybrid recommender, built on scraped Yelp data, that recommends coffee-drinking outlets to coffee lovers in Singapore.
I built and ran it successfully in a local Jupyter notebook as well as in a virtual environment as a Flask app (the notebook code was copied into a flaskr.py, which, together with the accompanying static stylesheets and HTML templates, constitutes the Flask app).
In preparing for deployment on Heroku, I also prepared a requirements.txt from pip freeze, a Procfile containing gunicorn and various arguments such as --timeout 1800 (my Flask app takes about 20 minutes to churn out the recommendations, so I lengthened the worker timeout to 1800 s), and even copied my .bash_profile into the flaskr folder (within this flaskr folder there is another flaskr folder containing flaskr.py, requirements.txt, the Procfile, and the relevant datasets).
In my flaskr.py I did not use SparkContext or spark-submit, only SparkSession, and the Flask app worked both in a local virtual environment and in my local Jupyter notebook. But when I tried to deploy on Heroku with gunicorn in the Procfile, a FileNotFoundError [Errno 2] was raised because spark-submit could not be found...
I tried running heroku run .bin/pyspark (or spark-shell) -a in the terminal with the virtual environment activated, and the pyspark command generated the following output:
For the spark-shell command, only spark-submit was not found; but the issue is, both files are very much present in their respective paths when I checked!
The following is the error log encountered when I click "submit" in the deployed app (coffee-recsys.herokuapp.com), where the main problem (I think) is the part inside the red box...
I would really appreciate it if anyone could enlighten me on how to resolve this issue, as I have been researching online and permuting my Google search terms for the past few days to no avail. Or should I try other search engines like Bing or Yahoo instead?
Any help rendered is appreciated, even if it does not result in the successful deployment of my app on Heroku (e.g. due to possible incompatibility between spark-2.4.5 and Heroku)...
You probably moved your Spark installation; check that the $SPARK_HOME environment variable is set and points to the intended installation.
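A quick sanity check along those lines from a shell (the install path below is only an example):

    echo "SPARK_HOME=$SPARK_HOME"           # should print the Spark install directory
    ls -l "$SPARK_HOME/bin/spark-submit"    # should exist and be executable
    # If it is unset or wrong, point it at the real installation, e.g.:
    export SPARK_HOME=/app/spark-2.4.5-bin-hadoop2.7
    export PATH="$SPARK_HOME/bin:$PATH"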

AWS CloudFormation and Windows Server 2008 R2 for Bootstrap file downloads

AWS recently released a new AMI that has the CloudFormation tools installed by default on Windows Server 2008 R2. The AMI itself can be found here:
https://aws.amazon.com/amis/microsoft-windows-server-2008-r2-base-cloudformation
When I use this AMI directly within a CloudFormation template and launch the stack, the stack launches easily, the instance downloads my files from S3 without any problem during boot-up, and all the folders created by the cfn-init command can be seen as expected.
However, if I customize the AMI (just enabling IIS), create a new AMI from it, and use that AMI within the template, the files don't get downloaded, nor can the other folders supposed to be created by the cfn-init command be seen.
Any suggestions? Am I missing something?
The most probable cause is that the custom AMI was created without using the EC2Config service's Bundle tab.
CloudFormation support on Windows depends on the EC2Config service's ability to run the commands specified in user data on first boot. This functionality is automatically disabled after the first boot so that subsequent boots do not re-run the same commands.
If the custom AMI is created using EC2Config's Bundle tab, the resulting AMI has the user data command execution functionality enabled. Hence it is necessary (and always recommended) to create custom AMIs using EC2Config's Bundle tab.
Hope this helps.
Regards,
Shon

What is the Cloud-Init equivalent for Windows?

It seems that the stock bootstrapping process is a bit lacking on Windows.
Linux has cloud-init which will install packages, store files, and run a bash script from user data.
Windows has EC2Config, but there is currently no support for running a cmd or PowerShell script when the system is "ready" -- meaning that all the initial reboots are completed.
There seem to be third-party options. For example, RightScale has the RightLink agent, which performs this function.
Are there open source options available?
Are there any plans to add this feature to EC2Config?
Do I have to build this myself?
Am I missing something?
It appears that EC2Config on the Amazon-provided AMIs now supports "User Data Scripts" as of the 11-April-2012 updates.
The documentation has not yet been updated, so it's hard to tell if it supports PowerShell or just cmd.exe scripts. I've posted a question on the AWS forums to try and get some more detail, and will update here when I learn more.
UPDATE: It looks like cmd.exe batch syntax is supported, which can in turn invoke PowerShell. There's a new version of the EC2Config documentation included on the AMI. Quoting from it:
[EC2Config] will read in the user data specified for the instance and then check if it contain the tags <script> and </script>. If it finds both then it will take the information between those two tags and save it to a batch file located in the Settings folder of this application. It will then execute the batch file during the start of an instance.
The batch file will only be created and executed on the first launch of an instance after a sysprep. If you want to have the batch file created and executed again set the Ec2HandleUserdata plugin state to Enabled.
UPDATE 2: My interpretation is confirmed by Shon from the AWS Team
UPDATE 3: And as of the May-2012 AMIs, PowerShell is supported using the <powershell/> tag.
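For illustration, user data using the tags described above might look like the following (the commands inside are just placeholders):

    <script>
    echo userdata ran > C:\bootstrap.log
    </script>

    <powershell>
    Write-Output "userdata ran" | Out-File C:\bootstrap-ps.log
    </powershell>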
Cloudbase.it has open-sourced a Python Windows service they call cloudbase-init, which follows the configdrive and HTTP datasources.
http://www.cloudbase.it/cloud-init-for-windows-instances/
GitHub here:
https://github.com/stackforge/cloudbase-init/
I had to build one myself; however, it was very easy. I just made a service that reads the user data when it starts up and executes the file as a PowerShell script.
To get around the issue of not knowing when to start the service, I just set the service start type to "delayed-auto", and that seemed to fix the problem. Depending on what you need to do to the system, that may or may not work for you, but in my case that was all I had to do.
I added a new CodePlex project that already has this tool built for Windows. Looking forward to some feedback.
http://cloudinitnet.codeplex.com/
We had to build it ourselves; we did it with a custom service and built our own AMIs. There's no provision currently within EC2Config to do it.
Even better, there is no easy way to determine when the instance is "ready". We had to do it by tailing the logfile of EC2Config.
I've recently found nssm (at nssm.cc), which easily wraps a simple batch file (or pretty much anything else) as a service. You can then use sc config service1 depend= service0 to force the batch file to be run at a particular point in the service initialization sequence. I am using it between EC2Config and SQL Express to create a folder on D:, for instance. You'll have to use the Services tool to make it run as Network Service, and change the AppExit property to Ignore using regedit, but it works once you get it all in place.
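As a rough sketch of that setup (the service and script names are made up, and the exact EC2Config service name should be checked on the instance):

    nssm install MyBootstrap C:\scripts\bootstrap.cmd
    sc config MyBootstrap depend= Ec2Config
    rem MyBootstrap now starts only after the EC2Config service has started.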
