This repository was archived by the owner on Dec 24, 2019. It is now read-only.
Open
Conversation
3f0a770 to
6225e63
Compare
added 2 commits
August 5, 2015 22:22
This file can be checked with codevalidator (complains about trailing white space, use of tabs, use of carriage return), but with the current code it can't be fixed (will be overwritten with an empty file). hjacobs#37
6225e63 to
31a2fcc
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is a try to fix #37.
I guess it needs some more documentation and some tests, but before I do that, I wanted to show what way I did go. (So please don't merge this yet. I also need to add a better commit comment.)
The problem was that some fix functions had
dst.write(fixed.decode())and laterfix_filehadfd.write(fixed.encode()), which use the system's default encoding, which in my system seems to be ASCII. Also, this way we got possibly multiple decode steps (one for each problem), and if we have after one of the decode steps a Jalopy run or similar, it gets even more unclear what's happening.Basically the idea of my fix is that for each file-type, there is one rule which indicates the encoding. It is named after one of the encodings supported by the codecs package. (In the default configuration, this is just utf-8 (for most file types) and ascii (for .properties), there should be some test-cases for latin-1/iso-8859-1, at least, too.) (The function
get_encoding_rule(rules)simply iterates over the rule names and passes each one to codecs.lookup, until one of them is successful. For this reason I needed to reorder the rules for the*.propertiesentry in the default config.)That encoding is then (other than for the check that one actually can decode the file using this encoding) used for those fix functions which actually work on the content of the file by themselves and don't just call external programs (like Jalopy, pythontidy, etc.). For those methods (for now just
_fix_notabs,_fix_nocrand_fix_notrailingws) I invented a decorator@needs_unicode, which wraps the function call intocodecs.EncodedFile.I needed to store the used encoding in a global dict
ENCODING_BY_FILE, analogously to the VALIDATION_ERRORS list – please propose a better way to do this, if this is not the way to go.I noted that for the added functionality it would be useful to have some automated tests (i.e. a bundle of config file, command line arguments, input file and expected output/expected fixed file). Is there some standard python way of doing this?