Automatically format YAML to use dot separator where possible - yaml

I am using IntelliJ and want to find a quick way to format a lot of yaml files from this:
something:
  some:
    thing:
      enable: true
a:
  b:
    c: true
    d:
      e: false
to this:
something.some.thing.enable: true
a.b:
  c: true
  d.e: false
wherever possible. It is still valid syntax, but in the files in question it is much easier to read, and oftentimes elements really just contain one value.
Is there an offline tool, plugin or maven/gradle build step I can use to achieve this?

It is still valid syntax
Yes, but it doesn't do what you think it does. d.e in YAML will simply load as string "d.e". There is no YAML logic that splits at the dot like you seem to assume.
Java Spring does some postprocessing that causes this to be loaded properly, see also this question. However, this does not work generally with YAML.
See this answer for a yq command that does something like this – however it will print a full path for each value and you probably don't want that. Doing what you want is probably too complex for yq, but not impossible to achieve.
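If no ready-made tool fits, the transformation itself is small. Here is a minimal Python sketch, assuming the file has already been loaded into plain nested dicts with string keys; the function name collapse is made up for illustration. It merges every chain of single-key mappings into one dotted key and leaves mappings with several children alone. The result is only equivalent for loaders that split keys at the dot, such as Spring.

```python
def collapse(node):
    """Merge chains of single-key mappings into dotted keys.

    Assumes all mapping keys are strings that do not already contain dots.
    """
    if not isinstance(node, dict):
        return node
    result = {}
    for key, value in node.items():
        # Collapse the child first, so any chain below it is already merged.
        value = collapse(value)
        if isinstance(value, dict) and len(value) == 1:
            # The child holds exactly one entry: absorb it into this key.
            (child_key, child_value), = value.items()
            result[f"{key}.{child_key}"] = child_value
        else:
            result[key] = value
    return result


nested = {
    "something": {"some": {"thing": {"enable": True}}},
    "a": {"b": {"c": True, "d": {"e": False}}},
}
flat = collapse(nested)
```

Reading and writing the actual files would still need a YAML library (for example PyYAML's yaml.safe_load and yaml.safe_dump, which is an assumption about your setup), and a round trip through such a library will usually drop comments and formatting.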
Also mind that queries about finding external tools are off-topic on StackOverflow.

Related

How to use $ (dollar sign) ^(exponent sign) in yaml?

I saw a YAML file that includes some signs like $, ^. For $, I think it tries to get value from a JSON file. But for ^, I'm not sure about that.
I tried to search for the YAML syntax but cannot find the usage of those signs.
Could anyone point out where that usage comes from? Thanks a lot!
examples:
json: $.A.Documents[*]
input: ^.B.ID
YAML doesn't assign any special meaning to those characters. As far as YAML is concerned, they are simply part of the content.
Of course, the software loading that YAML can do anything with the loaded data – including inspecting the loaded scalars for $ and ^ and implementing some action on them.
While someone might be able to correctly guess which software expects a YAML file like the one you show, it would be vastly easier for you to check the context in which you found that YAML file. This should lead you to the information you seek – i.e., for which software that YAML file has been written. That software's documentation will then describe how those characters are processed.

How do I effectively identify an unknown file format

I want to write a program that parses yum config files. These files look like this:
[google-chrome]
name=google-chrome - 64-bit
baseurl=http://dl.google.com/linux/chrome/rpm/stable/x86_64
enabled=1
gpgcheck=1
gpgkey=https://dl-ssl.google.com/linux/linux_signing_key.pub
This format looks like it is very easy to parse, but I do not want to reinvent the wheel. If there is an existing library that can generically parse this format, I want to use it.
But how to find a library for something you can not name?
The file extension is no help here. The term ".repo" does not yield any general results besides yum itself.
So, please teach me how to fish:
How do I effectively find the name of a file format that is unknown to me?
Identifying an unknown file format can be a pain.
But you have some options. I will start with a very obvious one.
Ask
Showing other people the format is maybe the best way to find out its name.
Someone will likely recognize it. And if no one does, chances are good that
you have a proprietary file format in front of you.
In case of your yum repository file, I would say it is a plain old INI file.
But let's do some more research on this.
Reverse Engineering
Reverse engineering may be your best bet if nobody recognizes your format.
Take the reference implementation and find out what they are using to parse the format.
Luckily, yum is open source. So it is easy to look up.
Let's see what the yum authors use to parse their repo file:
try:
    ini = INIConfig(open(repo.repofile))
except:
    return None
https://github.com/rpm-software-management/yum/blob/master/yum/config.py#L1304
Now the import of this function can be found here:
from iniparse import INIConfig
https://github.com/rpm-software-management/yum/blob/master/yum/config.py#L32
This leads us to a library called iniparse (https://pypi.org/project/iniparse/).
So yum uses an INI parser for its config files.
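Once you know it is INI, any INI parser is a candidate. As a rough sketch, here is the sample from the question read with configparser from Python's standard library. This is not what yum itself does (yum uses iniparse, as shown above), and the helper name parse_repo is made up for this example.

```python
import configparser

REPO_TEXT = """\
[google-chrome]
name=google-chrome - 64-bit
baseurl=http://dl.google.com/linux/chrome/rpm/stable/x86_64
enabled=1
gpgcheck=1
gpgkey=https://dl-ssl.google.com/linux/linux_signing_key.pub
"""


def parse_repo(text):
    """Parse INI-style repo text into {section: {option: value}}."""
    # interpolation=None keeps '%' characters in values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


repos = parse_repo(REPO_TEXT)
chrome = repos["google-chrome"]
```

All values come back as strings, so a flag like enabled=1 still has to be converted by the caller. Details that are specific to yum may not be handled the same way by a generic parser.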
I will show you how to quickly navigate to these kinds of code passages,
since navigating in somewhat large projects can be intimidating.
I use a tool called ripgrep (https://github.com/BurntSushi/ripgrep).
My initial anchors are usually well known filepaths. In case of yum, I took /etc/yum.repos.d for my initial search:
# assuming you are in the root directory of yum's source code
rg /etc/yum.repos.d yum
yum/config.py
769: reposdir = ListOption(['/etc/yum/repos.d', '/etc/yum.repos.d'])
yum/__init__.py
556: # (typically /etc/yum/repos.d)
This narrows it down to two files. If you go on further with terms like read or parse,
you will quickly find the results you want.
What if you do not have the reference source?
Well, sometimes you have no access to the source code of a reference implementation, for example because it is closed source.
Try to break the format. Insert some garbage and observe the log files afterwards. If you are lucky, you may find
a helpful error message which might give you hints about the format.
If you feel very brave, you can try to use an actual decompiler as well. This may or may not be illegal and may or may not be a waste of time.
I personally would only do this as a last resort.

Rainmeter: How to concatenate strings

I am getting data from a broken RSS feed that gives me wrong link. I wanted to fix this link so I made this code:
<link.*>(.*)&.*tid(.*)</link>
and the link could be like:
www.somedomain.com/?value=50&burrrdurrrr;tid=120
But the real working link is in this form:
www.somedomain.com/?value=50&tid=120
The thing that I'm asking is if my measure thing looks like this:
[FeedURL]
Measure=Plugin
Plugin=Plugins\WebParser.dll
Url=[Feed]
StringIndex=2 ;now I only get www.somedomain.com/?value=50
Substitute=#SubstituteFeed#
How am I supposed to concatenate the strings together to complete the url?
I'm guessing rather than &burrrdurrrr;, the link has &amp;, which is how you have to write & in an HTML or XML file.
If that's the case, you just need to set the DecodeCharacterReference option, as described in this handy-looking tutorial. Another option mentioned there is Substitute, which would be able to strip it out even if it really was &burrrdurrrr;.
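To illustrate what decoding a character reference means, here is a tiny sketch in Python rather than Rainmeter, assuming the feed really contains the XML reference &amp; in the link. The variable names are made up for this example.

```python
import html

# The link as it would appear inside the XML feed, with '&' escaped.
raw_link = "www.somedomain.com/?value=50&amp;tid=120"

# html.unescape turns character references such as &amp; back into characters.
fixed_link = html.unescape(raw_link)
```

After decoding there is nothing left to concatenate: the single captured string already is the working URL.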
None of this is a particularly sensible way of dealing with HTML or XML - a much better approach would be a plugin which actually parsed the document structure and let you reference nodes using XPath or CSS rules - but you work with what you've got, I guess. (I've never heard of this "Rainmeter" before, despite its claim to be "the best known and most popular desktop customization program for Windows"; maybe because nobody else calls their program that, instead almost universally using the word "widget"?)

Trivial solution for parsing nested configuration in Bash?

I want to parse nested configurations in Bash, like below:
[foo]
[bar]
key="value"
[baz]
key="value"
I tried this .ini parser but it does not support nesting. Later I found out that nesting isn't allowed in .ini files.
I searched for a YAML parser for Bash, but I couldn't find much. Nested configuration parsing in Bash seems like a basic problem to me, so I guess a trivial solution exists, but I could not find one. Does a trivial solution for parsing nested configuration in Bash exist? If yes, which one?
EDIT
I want to write a script/program for automated backup and restore of databases. The configuration needs to be flexible so that I can select databases on different hosts, with different users and passwords and with different backup intervals. Oh, and I want to learn Bash. But I am starting to think that Bash is not the right tool for my problem.
Bash is not the right language for this. There are no nested arrays, and dynamic variable assignment is a bit of a minefield compared to languages like Python and Ruby. That said, it sounds like you're specifying the format and parser yourself, so you could simply use a hierarchical naming scheme for your configuration:
foo_bar_key="value"
foo_baz_key="value"
I wrote a Yamlesque parser in response to this similar question.
It will parse
foo:
  bar:
    key: value
  baz:
    key: value
into bash associative arrays. 100% Bash, but it needs to be Bash 4.x.
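If you end up using Python instead, as hinted above, the same idea fits in a few lines. This is a minimal sketch and not a YAML parser: it assumes space indentation and plain "key: value" lines only, and the function name flatten_indented is made up. It turns the nested input into the flat foo_bar_key naming scheme.

```python
def flatten_indented(text, sep="_"):
    """Flatten an indented 'key: value' config into {joined_path: value}."""
    flat = {}
    stack = []  # (indent, key) pairs for the currently open parent keys
    for raw in text.splitlines():
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        indent = len(raw) - len(raw.lstrip(" "))
        key, _, value = raw.strip().partition(":")
        key, value = key.strip(), value.strip()
        # Close every parent that is indented at least as deep as this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if value:
            flat[sep.join([k for _, k in stack] + [key])] = value
        else:
            stack.append((indent, key))  # a key without value opens a level
    return flat


CONFIG = """\
foo:
  bar:
    key: value
  baz:
    key: value
"""
settings = flatten_indented(CONFIG)
```

Lists, quoting and multi-line values are deliberately not handled; for anything beyond this, a real YAML library is the safer choice.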

Eliminating code duplication in a single file

Sadly, a project that I have been working on lately has a large amount of copy-and-paste code, even within single files. Are there any tools or techniques that can detect duplication or near-duplication within a single file? I have Beyond Compare 3 and it works well for comparing separate files, but I am at a loss for comparing single files.
Thanks in advance.
Edit:
Thanks for all the great tools! I'll definitely check them out.
This project is an ASP.NET/C# project, but I work with a variety of languages including Java; I'm interested in what tools are best (for any language) to remove duplication.
Check out Atomiq. It finds code that is duplicate that is prime for extracting to one location.
http://www.getatomiq.com/
If you're using Eclipse, you can use the copy paste detector (CPD) https://olex.openlogic.com/packages/cpd.
You don't say what language you are using, which is going to affect what tools you can use.
For Python there is CloneDigger. It also supports Java, but I have not tried that. It can find code duplication both within a single file and between files, and gives you the result as a diff-like report in HTML.
See SD CloneDR, a tool for detecting copy-paste-edit code within and across multiple files. It detects exact copies, copies that have been reformatted, and near-miss copies with different identifiers, literals, and even different sequences of statements.
The CloneDR handles many languages, including Java (1.4,1.5,1.6) and C# especially up to C#4.0. You can see sample clone detection reports at the website, also including one for C#.
Resharper does this automagically - it suggests when it thinks code should be extracted into a method, and will do the extraction for you
Check out PMD. Once you have configured it (which is fairly simple), you can run its copy paste detector to find duplicate code.
Someone with some Office skills can do the following sequence in a minute:
use ordinary formatter to unify the code style, preferably without line wrapping
feed the code text into Microsoft Excel as a single column
search and replace all dual spaces with single one and do other replacements
sort column
At this point the keywords for duplicates will be already well detected. But to go further
add comparator formula to 2nd column and counter to 3rd
copy and paste values again, sort and see the most repetitive lines
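The same normalize, sort and count idea can be scripted without Excel. Below is a rough Python sketch, assuming the code has already been run through a formatter; the function name repeated_lines and the sample text are made up. It only finds lines that are identical after whitespace normalization, not near-miss clones.

```python
import re
from collections import Counter


def repeated_lines(source, min_count=2):
    """Return (line, count) pairs for lines occurring at least min_count times."""
    # Collapse runs of whitespace so indentation and spacing do not matter.
    normalized = (re.sub(r"\s+", " ", line).strip() for line in source.splitlines())
    counts = Counter(line for line in normalized if line)
    # most_common() sorts by count, so the most repetitive lines come first.
    return [(line, n) for line, n in counts.most_common() if n >= min_count]


SAMPLE = "int  x = 1;\nfoo();\nint x = 1;\n\n  foo();\nbar();\nfoo();"
report = repeated_lines(SAMPLE)
```

Like the spreadsheet version, this is a quick first look; the dedicated tools mentioned in the other answers compare token sequences and will catch much more.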
There is an analysis tool, called Simian, which I haven't yet tried. Supposedly it can be run on any kind of text and point out duplicated items. It can be used via a command line interface.
Another option similar to those above, but with a different tool chain: https://www.npmjs.com/package/jscpd
