How to run Pelias with custom data - Elasticsearch

I have millions of Bangladesh address records. They are clean and well structured (POI, house, road, area, postcode, latitude, longitude, etc.).
Now I want to load them into the Pelias geocoder. Can you help me by suggesting the steps, including the installation process?
Note: I also have my own PBF file.

You can use the csv-importer component.
See: https://github.com/pelias/csv-importer
An example of what I've used in the CSV file:
name,source,layer,lon,lat,number,street,unit,city,district,region,postcode,id,addendum_json_custom
123 JACKSON AVE,custom,address,-86.66,40.7666,344,JACKSON AVE,,PERU,,IN,46970,651ds651d651,"{ ""customInfo"":""Something Custom"" }"
123 S 600 E,custom,address,-86.66,40.7666,4503,S 600 E,,PIERCETON,,IN,46562,651ewd2332e,"{ ""customInfo"":""Something Custom"" }"
Be sure to use the double quotes in the addendum JSON as specified in the documentation.
If you are importing points of interest, you may choose a different layer, such as venue.
You will have to configure pelias.json with the location of the files to import, and with the API targets so your custom source and layer are recognized.
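Roughly, the csv part of pelias.json ends up looking something like the sketch below. The key names should be double-checked against the csv-importer README, and /data/csv and addresses.csv are placeholder names I'm assuming; if you already have a pelias.json (the docker projects ship one), merge these keys into it rather than overwriting it:
# Minimal sketch of the csv-importer configuration (assumed paths and file names)
cat > pelias.json <<'EOF'
{
  "imports": {
    "csv": {
      "datapath": "/data/csv",
      "files": [ "addresses.csv" ]
    }
  },
  "api": {
    "targets": { "auto_discover": true }
  }
}
EOF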
As for installation, I've found the Dockerized setup to be the best way to get started.
https://github.com/pelias/docker
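The overall flow in the pelias/docker README is roughly the following (reproduced from memory and trimmed, so treat it as a sketch and check the README for the exact, current commands; portland-metro is just one of the bundled example projects):
# clone the repo and pick (or copy) an example project to start from
git clone https://github.com/pelias/docker.git
cd docker/projects/portland-metro

# the repo ships a `pelias` helper script that needs to be on your PATH,
# and the project's .env file needs a DATA_DIR entry pointing at local storage

pelias compose pull      # fetch the service images
pelias elastic start     # start Elasticsearch
pelias elastic wait      # wait until it is ready
pelias elastic create    # create the Pelias index
pelias download all      # download the configured data sources (your own PBF is wired in via pelias.json)
pelias prepare all       # build admin lookup, interpolation, etc.
pelias import all        # run the importers, including the csv importer
pelias compose up        # bring up the API and supporting services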

How can I auto generate inputs.tf and outputs.tf variables when working with Terraform?

Note: Please see the #### UPDATE ### section below. I've heavily modified the question for clarity on what I'm trying to achieve, but added it as an addendum rather than rewrite the question.
As my infrastructure grows, adding input variables to my variables.tf files and then syncing those values to output variables in my outputs.tf file has become impossible to do manually. Not only does it take up a lot of unnecessary time, but probably even more time is spent going back and fixing the ones that terraform validate tells me I missed through human error. This is especially true when building or using modules, whose arguments add an additional layer to manage.
There has to be a better way. Here is what I want to achieve.
Let's say I'm creating an Azure AKS Kubernetes cluster. The Terraform resource is azurerm_kubernetes_cluster.
Only 8 arguments are required to create a base install, but there are almost 250 additional ones. They all have default values. Per the documentation page, they also already have fantastic descriptions. (I'm tired of copying and pasting them into my variable { description = "this" } blocks.)
The information is there in the documentation. terraform plan also has knowledge of every single additional one, because it of course comes up in the pre-apply plan. (known after apply) means it's optional, but will have a default value.
In my dream world, I'd run this hypothetical command sequence:
terraform plan
terraform document <- Here it auto generates every argument as a variables block and inserts it into variables.tf. It also auto generates every possible output "out_putable" {} block and inserts it into outputs.tf.
terraform apply -update-inputs -update-outputs <- Here everything that was optional (known after apply) is now known and it should auto update variables.tf and outputs.tf accordingly. Adding a -update-modules flag lets it take care of that additional layer introduced by using modules.
This feels like a problem that has been addressed before. Before I write a custom tool that parses the Terraform web docs and the output of terraform show, is there already a way to do this? terraform-docs is the closest I've come to finding a solution, but for README.md. If it can do what I need, I haven't figured it out yet.
How can I automate all this?
############
UPDATE
############
This article and video are spot-on when it comes to Terraform's evolution in an organization. My organization is somewhere between late-stage pattern 3 and early pattern 5. As we decompose our "Terralith" we have inconsistencies among teams (patterns, naming conventions, variable and argument choices, etc.). These are starting to cause errors in CI/CD, forcing a ticket-review process that is slowing things down.
All resources have required and optional arguments. But in my organization we have, for example, arguments that Terraform treats as optional but that are required for us.
Scenario: Dev A in Japan creates a resource, forgets an optional variable or two or names them something obscure, etc. Dev B in America is blocked until they can convene and discuss. Given time zones, language differences, ticket review, this one issue is now a week or more delayed.
I need to automate this and create exact consistency so that Dev A starts out with exactly what Dev B would start with or is expecting; and, what CI/CD tests are expecting - templating the initial process, if you will. In other words, I need to remove the human element of manually creating main.tf, variables.tf, outputs.tf, etc.
Here are thoughts on how to achieve this:
Use Golang to autogenerate the files by querying the API
How can I query the API to get a list of all required arguments for a specific resource?
I found that I can query for provider information, but I can't find a way to retrieve resource information. My thinking is that when a developer wants to create a new resource, they'll run a Go or TypeScript tool to generate the manifest files along with the expected naming conventions, and populate main.tf, variables.tf, outputs.tf, etc., with exactly the data that everyone is expecting. I'm looking for something like curl registry.terraform.io/providers/hashicorp/azurerm/v2.99/resource_group?required=yes. This should show me all required arguments along with descriptions and other info I can use straight from the API.
Use CDKTF to generate an HCL manifest.tf file from JSON
How can I use CDKTF to generate an HCL .tf file?
CDKTF is EXACTLY what I'm looking for - except in reverse. HCL is seamlessly compatible with JSON. Running cdktf synth creates ./out/cdk.tf.out. I'm so close! How do I turn that file into main.tf?!?
The goal here is to have a master file from which all future manifest files are derived. Whether we use azurerm_kubernetes_cluster 1 time or 1000 times, I know for certain that every argument, every variable name, every desired output is exactly the same. If a change is needed in our desired structure, it will be updated at the JSON level, and CI/CD can ensure those changes are propagated across instances of its use.
I know that I can use the cdk.out.tf file as a drop-in replacement for a module, but I don't want my team members to have to learn TypeScript or how to read JSON. If I can create a templatized JSON file containing exactly what I'm expecting users to start with, and if they can run some command like cdktf convert cdk.tf.out --HCL output-file.tf, then I've accomplished my goal.
If cdktf synth can create an HCL JSON file, and cdktf convert can take a manifest.tf file and turn it into HCL JSON, can't it do the exact opposite? Turn the HCL JSON file into the human-readable, declarative manifest.tf file?
Perhaps think of it this way. Terraform has a required file structure for a module if it's to be allowed into the module registry. I'm trying to create a similar required structure for each of the resources our organization uses regardless of when and where it's used.
If your goal is to derive input variables and output values from resource type schemas then Terraform can provide you with the information to do so.
In the working directory of a configuration that already uses the provider whose resource type you want to use, run the following command:
terraform providers schema -json
The result contains a JSON description of all of the resource types available in the providers for the current configuration, and for each one the metadata about its attributes, including the type constraint information and descriptions for each one.
From that you can generate whatever other files you need based on that information.
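As a concrete sketch, the required arguments of a single resource type can be pulled out of that JSON with jq. The provider source address and azurerm_resource_group below are just examples, and the jq paths follow the documented output format of the command:
# Dump the schemas of all providers used by the current configuration
terraform providers schema -json > schemas.json

# List required arguments and their descriptions for one resource type;
# adjust the provider address and resource type name for your case.
jq -r '
  .provider_schemas["registry.terraform.io/hashicorp/azurerm"].resource_schemas["azurerm_resource_group"].block.attributes
  | to_entries[]
  | select(.value.required == true)
  | "\(.key): \(.value.description // "no description")"
' schemas.json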
Note that if you are intending to build modules which export the entire surface area (all inputs and all outputs) of a particular resource type the Terraform documentation explicitly recommends against this, suggesting to just use the resource type directly instead since such a module would often not offer sufficient benefit to outweigh the additional complexity and maintenance overhead it implies:
In principle any combination of resources and other constructs can be factored out into a module, but over-using modules can make your overall Terraform configuration harder to understand and maintain, so we recommend moderation.
A good module should raise the level of abstraction by describing a new concept in your architecture that is constructed from resource types offered by providers.
For example, aws_instance and aws_elb are both resource types belonging to the AWS provider. You might use a module to represent the higher-level concept "HashiCorp Consul cluster running in AWS" which happens to be constructed from these and other AWS provider resources.
We do not recommend writing modules that are just thin wrappers around single other resource types. If you have trouble finding a name for your module that isn't the same as the main resource type inside it, that may be a sign that your module is not creating any new abstraction and so the module is adding unnecessary complexity. Just use the resource type directly in the calling module instead.
I've got the same question and developed a small bash script to create output definitions based on module code.
This script requires the hcledit tool to extract blocks from HCL code.
#!/usr/bin/env bash
# Generate Terraform output definitions for every module and resource block
# found in the *.tf files of the current directory.
# Requires the hcledit tool.
set -o pipefail

_hcledit=$(which hcledit)

for tf_file in *.tf; do
  cat "$tf_file" | $_hcledit block list | while read -r line; do
    block_type="${line%%.*}"   # e.g. "resource" from "resource.aws_instance.web"
    line="${line#*.}"          # strip the block type, keeping the labels
    case $block_type in
      # These block types cannot produce outputs, so skip them.
      locals|output|variable|data) continue ;;
      module)
        output_name=$line
        output_description="Module '$output_name' attributes"
        output_value="$block_type.$output_name"
        ;;
      resource)
        label_kind="${line%.*}"
        label_name="${line#*.}"
        output_name="${label_kind}_${label_name//[\-]/_}"   # dashes are not valid in output names
        output_description="Resource '$label_kind.$label_name' attributes"
        output_value="$label_kind.$label_name"
        ;;
      # Skip anything else (provider, terraform, ...) as well.
      *) continue ;;
    esac
    cat <<-EOT
output "$output_name" {
  description = "$output_description"
  value       = $output_value
}
EOT
  done
done
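To use it, save the script in the directory of the module or root configuration (the file name generate-outputs.sh below is just an assumption), make it executable, and redirect its output into outputs.tf:
chmod +x generate-outputs.sh
./generate-outputs.sh > outputs.tf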

Misinformation in DataStage XML Export

As the title suggests, I am just trying to do a simple export of a DataStage job. The issue occurs when we export the XML and begin examination. For some reason, the wrong information is being pulled from the job and placed in the XML.
As an example the SQL in a transform of the job may be:
SELECT V1,V2,V3 FROM TABLE_1;
Whereas the XML for the same transform may produce:
SELECT V1,Y6,Y9 FROM TABLE_1,TABLE_2;
It makes no sense to me how the export of a job could be different from the actual architecture.
The parameters I am using to export are:
Exclude Read Only Items: No
Include Dependent Items: Yes
Include Source Code with Routines: Yes
Include Source Code with Job Executable: Yes
Include Source Content with Data Quality Specifications: No
What tool are you using to view the XML? Try using something less smart, such as Notepad or WordPad. This will determine whether the problem is with your XML viewer, or eliminate it as a cause.
You might also try exporting in DSX format and examining that output, to see whether the same symptoms are visible there.
Thank you all for the feedback. I realized that the issue wasn't necessarily with the XML. It had to do with numerous factors within our DataStage environment. As mentioned above, the data connections were old and unreliable. For some reason this does not impact our current production refresh, so it's a non-issue.
The other issue was the way that the generated SQL and custom SQL options work when creating the XML. In my case, there were times when old code was kept in the system, but the option was switched from custom code to generating SQL based on columns. This led to inconsistent output from my script. Thus the mini project was scrapped.

How to create external link references in AsciiDoc without repeating the URL multiple times?

In markdown I can write:
[example1][myid]
[example2][myid]
[myid]: http://example.com
so I don't have to retype the full external link multiple times.
Is there an analogous feature in AsciiDoc? I'm especially interested in the Asciidoctor implementation.
So far I could only find:
internal cross references with <<>>
I think I saw a replacement feature of the form :myid:, but I can't find it anymore. And I didn't see how to use different text for each link.
Probably you mean something like this:
Userguide Chapter 28.1. Setting configuration entries
...
Attribute entries promote clarity and eliminate repetition
URLs and file names in AsciiDoc3 macros are often quite long — they break paragraph flow and readability suffers. The problem is compounded by redundancy if the same name is used repeatedly. Attribute entries can be used to make your documents easier to read and write, here are some examples:
:1: http://freshmeat.net/projects/asciidoc3/
:homepage: http://asciidoc3.org[AsciiDoc3 home page]
:new: image:./images/smallnew.png[]
:footnote1: footnote:[A meaningless latin term]
Using previously defined attributes: See the {1}[Freshmeat summary]
or the {homepage} for something new {new}. Lorem ispum {footnote1}.
...
BTW, there is a 100% Python3 port available now: https://asciidoc3.org
I think you are looking for this (and both of the following will work just fine):
https://www.google.com[Google]
or
link:https://google.com[Google]
Reference:
AsciiDoc User Manual Link
Update #1: Using link together with attribute variables in AsciiDoc
Declare the variable:
:url: https://www.google.com
Then use the variable in either of the formats mentioned above.
Using the bare URL with a label:
{url}[Google]
Using the link macro:
link:{url}[Google]

Teamcity - Combine ldap properties into a single line in ldap-config.properties

I have a TeamCity instance connecting to an LDAP server (AD) and I need my VCS usernames to be in the following format:
firstname.surname
I know that I need to modify the following configuration entry in the ldap-config.properties file to do this:
teamcity.users.property.plugin:vcs:anyVcs:anyVcsRoot
I know that I need to use the following two LDAP attributes: givenName and sn.
However, I'm struggling to understand the syntax for doing this in the config. I've tried different combinations, and whilst the properties work on their own, I cannot find a way of concatenating them together. Even the following (which misses out the full stop in the middle) does not work
(&(givenName)(sn))
The fix is to use the following syntax:
%ldap.userEntry.<attributeName>%
(as per line 160 - 161 in the ldap-config.properties.dist file)
So for my example, where I want firstname then a full stop then the surname, my entire config line would be the following:
teamcity.users.property.plugin\:vcs\:anyVcs\:anyVcsRoot=%ldap.userEntry.givenName%.%ldap.userEntry.sn%

Generating vector data (points) for OpenLayers Cluster

In my web application I am going to use the OpenLayers.Strategy.AnimatedCluster strategy, because I need to visualize a great number of point features. Here is a very good example of what it looks like. In both of the demos at that link, the data (point features) are either generated or taken from a GeoJSON file.
So, can anybody provide me with a file containing 100,000+ (or better yet 500,000+) features (world cities, for instance), or explain how I can generate them so that they are located all over the world (not concentrated in Spain as in the first example at the above-mentioned link)?
Use a geolocation database to supply the data you need; GeoLite, for example.
If 400K+ locations is OK, you can download their CSV city list.
If you want more, you might want to give the Nominatim downloads a try, but they are quite bulky (more than 25 GB) and parsing the data is not as simple as a CSV file.
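If purely synthetic data is good enough for testing the clustering, here is a throwaway sketch that generates a GeoJSON FeatureCollection of random points spread over the whole world (the point count and file name are arbitrary):
# Generate N random points worldwide as a GeoJSON FeatureCollection,
# clamping latitude to the Web Mercator range used by most web maps.
N=100000
awk -v n="$N" 'BEGIN {
  srand();
  printf "{\"type\":\"FeatureCollection\",\"features\":[";
  for (i = 0; i < n; i++) {
    lon = rand() * 360 - 180;   # longitude in [-180, 180)
    lat = rand() * 170 - 85;    # latitude in [-85, 85)
    printf "%s{\"type\":\"Feature\",\"geometry\":{\"type\":\"Point\",\"coordinates\":[%.5f,%.5f]},\"properties\":{\"id\":%d}}", (i ? "," : ""), lon, lat, i;
  }
  print "]}";
}' > random-points.geojson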
