I have a Rails project where we're migrating towards using an image proxy to serve our ActiveStorage images. As part of this migration, we need to change all existing usages of url_for() to ix_image_url().
url_for will continue to exist, but to prevent other developers from accidentally using the wrong function in the future and serving an unproxied image, I'd like to "ban" use of url_for within our project.
Is this possible?
Just Grep for the Method Call
While you could certainly write your own cop, the use case you're defining seems like it would be a better fit for a simple grep done on your continuous integration server. For example:
if egrep -Rn '\burl_for' '/path/to/source/tree'; then
echo 'error: use ix_image_url instead of url_for' >&2
exit 1
fi
can be run as a build or test step in your CI pipeline. It will print the offending lines, along with filenames and line numbers, and send your "how to fix it" custom message to standard error for reporting the build failure.
Note that shell commands like this can be run in most CI implementations, but you can certainly embody it in a cop or inline it into your pipeline (e.g. Groovy code for Jenkins jobs) if that works better for you. However, the KISS principle argues for keeping the implementation as simple as possible, so I'd avoid the necessity of mucking around with ASTs or the maintenance overhead of custom cops unless you are getting a lot more bang for your buck than the use case described in your original question.
Related
I need to do refactoring in a big legacy Python code base.
Often I think "these lines don't get executed in production any more".
But I am unsure.
There are some tests which touch these lines, but I can't tell for sure whether they are really unused in production.
What can I do in this situation?
This question is about coverage on a production system. This question is not about coverage during testing/CI.
I don't want to comment out those lines, since I don't want to produce an error in the production system.
A common practice is to add logging inside those lines of code. For example, say you have a block of code you think is not in use. You wrap that block in a try/except and, inside it, append a record to a data file named after the suspicious block:
import datetime
import pickle

try:
    # load previously recorded activity for this suspicious block
    try:
        with open("block1.dat", "rb") as file:
            activity = pickle.load(file)
    except FileNotFoundError:
        activity = []
    curtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    currentact = "dt = {}; code done that: var1 = {}, var2 = {}".format(
        curtime, var1, var2)
    activity.append(currentact)
    # write the updated activity log back to the same file
    with open("block1.dat", "wb") as file:
        pickle.dump(activity, file)
except Exception:
    # never let this logging break the production code path
    pass
You can also send these log entries somewhere external, for example via the Telegram API. After a while you'll have information about how often your code runs and what it does.
Then you monitor for a while, and if nothing happens in a month you can comment out the block.
Is the production system deterministic?
Is it interactive?
Does control flow depend on input data?
Do you have access to all possible inputs?
Do the tests exist for a reason or just because?
I'd be careful removing code based only on what logging shows is needed, unless I knew there were no exceptional situations that occur only rarely.
I would follow the common code paths to try to understand the codebase piece by piece in order to figure out what can be simplified. It's hard to give more specific advice without knowing more about the system you're dealing with.
We use a simple pattern to handle this: looks_like_dead_code(my_string)
This is a method which logs the string "my_string".
Example usage:
if ext == '.jpe':
    looks_like_dead_code('2018-11-30 tguettler: looks fixed in mime_type_to_extension')
Using the date and the developer login is not enforced, it is just best practice.
If the line gets executed, the person responsible for checking the logs will talk to the developer.
Since our production environments get updated roughly once in two weeks, you can be sure that this line was not executed during the last months.
I like this solution since, in most cases, the situation is like this:
1. You want to fix a bug or implement a new feature.
2. You look at the code and see some lines which look like dead code; I mean code which is useless, since it won't get executed any more.
3. You don't have hours of time to investigate, so you can't verify your vague guess that this is dead code; you want to do your actual work (fix the bug or implement the feature from step 1).
The method looks_like_dead_code() gives you a way to actually do something and leave a note for other developers. It only costs a few seconds and improves the current situation.
If you have a tickler file system, you can remind yourself to check this code in six months. At least in my context I can be very sure that this is dead code if the line was not executed for several months.
I have a lot of old Perl code that gets called frequently. I have been writing a new module, and all of a sudden I'm getting a lot of warnings in my Apache error_log; they appear for every module currently being used, e.g.,
"my" variable $variable masks earlier declaration in same statement at
/path/to/module.pm line 40 (#1)
Useless use of hash element in void context at
/path/to/another/module.pm line 212 (#2)
The main layout of the codebase is one giant script that includes the modules and directs requests to the ones needed to create certain pages for the website; the main script then handles static elements like menus.
My current project is separated from this main script and doesn't use it. However, any time I call my code using Ajax, there are some other Ajax calls that use the main script, and the warnings only seem to appear from those requests, and only when I'm calling my project.
I have grepped every module and none of them have use warnings (or -w) in them. I have also tried using no warnings 'all' in the main script and in my own project, but it's not doing anything.
At this point I'm out of ideas on what to do next, so all help is appreciated. I'd just like to suppress the warnings; the codebase is quite old and poorly written, so going through and correcting each issue that causes the warnings in the first place isn't doable.
The Apache server is running mod_perl as well, in case that makes a difference. I have a feeling it might be something to do with CGI, but I can't seem to find any evidence.
I take it that the code gets called by running certain top-level Perl script(s).
Then use the __WARN__ hook in those script(s) to stop printing of warnings
BEGIN { $SIG{__WARN__} = sub {} };
Place this BEGIN block before the use statements so that it affects the modules as well.
An empty subroutine is the way to mute warnings since __WARN__ doesn't support 'IGNORE'.
See warn and %SIG in perlvar.
See this post and this post for comments and some examples.
To investigate further and track the warnings you can use Carp
BEGIN {
    $SIG{__WARN__} = \&Carp::cluck; # or Carp::confess; to also die
}
which will make it print full stack traces. This can be fine-tuned as you please since we can write our own sub to be called. Or use Carp::Always.
See this post for some more drastic measures (like overriding CORE::GLOBAL::warn).
Once you find a more precise level at which to suppress warnings then local $SIG{__WARN__} is the way to go, if possible. This is used in a post linked above, and here is another example. It is of course far better to suppress warnings only where needed instead of everywhere.
More detail
Getting stack traces in Perl?
How can I get a call stack listing in Perl?
Note that longmess is unfortunately no longer so standard and well supported.
I have had several people tell me at this point that eval is evil in makefiles. I originally took their word for it, but now I'm starting to question it. Take the following makefile:
%.o:
	$(eval targ=$*)
	echo making $(targ)
%.p:
	echo making $*
I understand that if you then did make "a;blah;".o, then it would run blah (which could be an rm -rf, or worse). However, if you ran make "a;blah;".p you would get the same result without the eval. Furthermore, if you have permission to run make, you would also have permission to run blah directly, and wouldn't need to run make at all. So now I'm wondering: is eval really an added security risk in makefiles, and if so, what should be avoided?
Why is eval evil?
Because it grants the whole power of the language to things you don't actually want to give that power to.
Often it is used as "poor man's metaprogramming" to construct some piece of code and then run it. Often it looks like eval("do stuff with " + thing) - and thing is only known during runtime, because it gets supplied from outside.
However, if you don't make sure that thing belongs to some tiny subset of language you need in that particular case (like, is a string representation of one valid name), your code would grant permissions to stuff you didn't intend to. For example, if thing is "apples; steal all oranges" then oranges would be stolen.
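In shell terms, the same problem can be sketched like this (process_fruit is a hypothetical command standing in for "do stuff with", and the harmless echo stands in for whatever an attacker would really run):

# thing arrives from outside; the author expected a plain fruit name
thing='apples; echo stealing all oranges'

# eval re-parses the whole string as shell code, so the injected
# command after the ";" runs with the script's privileges
eval "process_fruit $thing"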
If you do make sure that thing belongs to some subset of language you actually need then 2 problems arise:
You are reimplementing language features (parsing source) which is not DRY and is often a sign of misusing a language.
If you resort to this, it means simpler means are not suitable and your use case is somewhat complicated, which makes validating your input harder.
Thus, it's really easy to break security with eval, and taking enough precautions to make it safe is hard. That's why, if you see an eval, you should suspect a possible security flaw. That's just a heuristic, not a law.
eval is a very powerful tool - as powerful as the whole language - and it's too easy to shoot yourself in the foot with it.
Why this particular use of eval is not good?
Imagine a task that requires a number of steps that depend on a file, and the task can be done with various files (say, a user supplies a VirtualBox image of a machine that is to be brought up and integrated into the existing network infrastructure).
Imagine, say, a lazy administrator who automated this task - all commands are written in a makefile because it fits better than a shell script (some steps depend on others and sometimes don't need to be re-done).
The administrator made sure that all commands are ok and correct, and granted a sudoers permission to run make with that specific makefile. Now, if the makefile contains a rule like yours, then with a properly crafted name for your VirtualBox image you could pwn the system, or something like that.
Of course, I had to stretch far to make this particular case a problem, but it's a potential problem anyway.
Makefiles usually offer simple contracts: you name the target and some very specific stuff - written in makefile - gets done. Using eval the way you've used it offers a different contract: the same stuff as above but you also can supply commands in some complicated way and they would get executed too.
You could try patching the contract by making sure that $* would not cause any trouble. Describing what that means exactly could be an interesting exercise in language if you want to keep as much flexibility in target names as possible.
Otherwise, you should be aware of the extended contract and not use solutions like this in cases where that extension would cause problems. If you intend your solution to be reusable by as many people as possible, you should make its contract cause as few problems as possible, too.
I'm into unit testing of some legacy shell scripts.
In the real world scripts are often used to call utility programs
like find, tar, cpio, grep, sed, rsync, date and so on with some rather complex command lines containing a lot of options. Sometimes regular expressions or wildcard patterns are constructed and used.
An example: A shell script which is usually invoked by cron in regular intervals has the task to mirror some huge directory trees from one computer to another using the utility rsync.
Several types of files and directories should be excluded from the
mirroring process:
#!/usr/bin/env bash
...
function mirror() {
    ...
    COMMAND="rsync -aH$VERBOSE$DRY $PROGRESS $DELETE $OTHER_OPTIONS \
        $EXCLUDE_OPTIONS $SOURCE_HOST:$DIRECTORY $TARGET"
    ...
    if eval $COMMAND
    then ...
    else ...
    fi
    ...
}
...
As Michael Feathers wrote in his famous book Working Effectively with Legacy Code, a good unit test runs very fast and does not touch the network or the file system, and does not open any database.
Following Michael Feathers' advice, the technique to use here is dependency injection. The object to replace here is the utility program rsync.
My first idea: In my shell script testing framework (I use bats) I manipulate $PATH in a way that a mockup rsync is found instead of
the real rsync utility. This mockup object could check the supplied command line parameters and options. Similar with other utilities used in this part of the script under test.
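A minimal sketch of that idea in bats might look like the following, assuming the script under test is a hypothetical mirror.sh that ends up calling rsync (the mock directory, the recorded-arguments file and the asserted --delete option are all placeholders):

#!/usr/bin/env bats

setup() {
    # put a fake rsync first in PATH; it records its arguments instead of syncing
    MOCK_DIR="$BATS_TMPDIR/mocks"
    mkdir -p "$MOCK_DIR"
    cat > "$MOCK_DIR/rsync" <<EOF
#!/bin/sh
printf '%s\n' "\$@" > "$MOCK_DIR/rsync_args"
exit 0
EOF
    chmod +x "$MOCK_DIR/rsync"
    PATH="$MOCK_DIR:$PATH"
}

@test "mirror passes the expected options to rsync" {
    run ./mirror.sh
    [ "$status" -eq 0 ]
    grep -q -- '--delete' "$MOCK_DIR/rsync_args"
}

Whatever mirror.sh passes to "rsync" can then be asserted on, without ever touching the network.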
In my past experience, real problems in this area of scripting were often bugs caused by special characters in file or directory names, problems with quoting or encodings, missing ssh keys, wrong permissions and so on. These kinds of bugs would have escaped this technique of unit testing. (I know: for some of these problems unit testing is simply not the cure.)
Another disadvantage is that writing a mockup for a complex utility like rsync or find is error prone and a tedious engineering task of its own.
I believe the situation described above is general enough that other people might have encountered similar problems. Who has got some clever ideas and would care to share them here with me?
You can mock any command using a function, like this:
function rsync() {
    # mock things here if necessary
}
Then export the function and run the unittest:
export -f rsync
unittest
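For instance, a sketch of how that can be used to drive an error path (mirror.sh is a hypothetical script under test and the log path is a placeholder):

function rsync() {
    # record the call and pretend rsync failed, to exercise the error-handling branch
    echo "rsync $*" >> /tmp/rsync_calls.log
    return 23   # rsync's "partial transfer due to error" exit code
}
export -f rsync

bash ./mirror.sh    # a child bash process sees the exported mock instead of the real rsync

Note that export -f is bash-specific, and the exported function is only visible to child processes that are themselves bash.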
Cargill's quandary: "Any design problem can be solved by adding an additional level of indirection, except for too many levels of indirection."
Why mock system commands? After all, if you are programming in Bash, the system is your target, and you should evaluate your script against the system.
A unit test, as the name suggests, gives you confidence in a unitary part of the system you are designing, so you will have to define what your unit is in the case of a bash script. A function? A script file? A command?
Given that you want to define the unit as a function, I would suggest writing a list of well-known errors, as you listed above:
Special characters in file or directory names
Problems with quoting or encodings
Missing ssh keys
Wrong permissions and so on.
And write a test case for each of them, as sketched below. And try not to deviate from the system commands, since they are an integral part of the system you are delivering.
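For example, a test case for the first item might look like this in bats (mirror and mirror.sh are placeholders for the function and file under test):

@test "mirror copes with spaces and quotes in directory names" {
    source ./mirror.sh
    dir="$BATS_TMPDIR/dir with spaces and \"quotes\""
    mkdir -p "$dir"
    run mirror "$dir"
    [ "$status" -eq 0 ]
}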
This seems to me to be a novel idea (since I haven't found any solutions or anyone having implemented it)...
A shell script that automatically runs whenever you git commit (or whatever) and lets you know if you forgot to delete any debugging or development-environment-specific lines of code in your project.
For example:
Often times (in my Ruby projects) I'll leave lines of code to output variables like
puts params.inspect
or
raise params.inspect
Also, sometimes I'll use different methods so I can easily see the effects, such as when using delayed_job, where I'd rather call the method without a delay during development.
The problem is sometimes I forget to change those methods back or forget to delete a call to raise params.inspect and I'll inadvertently push that code.
So I thought maybe the simplest solution was to add a comment to any such debugging line such as
raise params.inspect #debug
In essence, this flags that line as a development-only/debug line. Then a shell script that runs before some other command like git commit can use awk or grep to search through all the latest modified files for that #debug comment, stop execution, and alert you. However, I don't know much about shell scripting, so I thought I'd ask for help :)
Although I whole-heartedly recommend following cdeszaq's advice and discourage doing this sort of thing, it is pretty easy to write a git hook that will prevent you from committing any lines with a particular string. For simplicity, I'm not showing the git rev-parse --verify HEAD that you should use to make this hook work on an initial commit, but if you simply put the following in .git/hooks/pre-commit (and make it executable), you will not be able to commit any lines of code that contain the string '#debug':
#!/bin/sh
if git diff-index -p -M --cached HEAD | grep '#debug' > /dev/null; then
    echo 'debug lines found in commit. Aborting' >&2
    exit 1
fi
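For completeness, a sketch of the same hook with the initial-commit handling mentioned above (comparing against git's empty tree when HEAD does not exist yet):

#!/bin/sh
if git rev-parse --verify HEAD >/dev/null 2>&1; then
    against=HEAD
else
    # initial commit: diff against the empty tree object
    against=$(git hash-object -t tree /dev/null)
fi

if git diff-index -p -M --cached "$against" | grep '#debug' > /dev/null; then
    echo 'debug lines found in commit. Aborting' >&2
    exit 1
fi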
Rather than having to remember to do additional work (removing lines of code) only to have to do more work later when things break again (re-adding that code), why not put in sensible debugging statements from the beginning?
Most languages have fairly expressive and often cheap logging libraries that will allow you to write out various levels of information (error, info, debug, trace) to a number of different locations (a file, a database). Many of these libraries will even let you adjust the logging level for a specific chunk of the code at runtime or even while the program is running.
So, rather than try to bandage up brute-force debugging by scripting away the problem, why not do yourself, and the rest of the world that has to use what you produce, a favor and use an actual logging framework for logging?
As I said in my comment, you can use any programming language you feel comfortable with.
Anyway, searching for other commit hooks, I think this one could be a good one to start with. It basically looks for some words in your files and can be customized just by changing the checks array at the top of the file.
@cdeszaq is correct about the logging part.
For behaviour that differs depending on environment, the common way to achieve this is to make the behaviour configurable. delayed_job should read a value from the config file to decide how long to delay. For production environments the config would have one value and for development environments the config would have a different value.