Bugfix/issue 37 fix with non ascii by ePaul · Pull Request #39 · hjacobs/codevalidator

ePaul · 2015-08-02T20:38:42Z

This is a try to fix #37.

I guess it needs some more documentation and some tests, but before I do that, I wanted to show what way I did go. (So please don't merge this yet. I also need to add a better commit comment.)

The problem was that some fix functions had dst.write(fixed.decode()) and later fix_file had fd.write(fixed.encode()), which use the system's default encoding, which in my system seems to be ASCII. Also, this way we got possibly multiple decode steps (one for each problem), and if we have after one of the decode steps a Jalopy run or similar, it gets even more unclear what's happening.

Basically the idea of my fix is that for each file-type, there is one rule which indicates the encoding. It is named after one of the encodings supported by the codecs package. (In the default configuration, this is just utf-8 (for most file types) and ascii (for .properties), there should be some test-cases for latin-1/iso-8859-1, at least, too.) (The function get_encoding_rule(rules) simply iterates over the rule names and passes each one to codecs.lookup, until one of them is successful. For this reason I needed to reorder the rules for the *.properties entry in the default config.)

That encoding is then (other than for the check that one actually can decode the file using this encoding) used for those fix functions which actually work on the content of the file by themselves and don't just call external programs (like Jalopy, pythontidy, etc.). For those methods (for now just _fix_notabs, _fix_nocr and _fix_notrailingws) I invented a decorator @needs_unicode, which wraps the function call into codecs.EncodedFile.

I needed to store the used encoding in a global dict ENCODING_BY_FILE, analogously to the VALIDATION_ERRORS list – please propose a better way to do this, if this is not the way to go.

I noted that for the added functionality it would be useful to have some automated tests (i.e. a bundle of config file, command line arguments, input file and expected output/expected fixed file). Is there some standard python way of doing this?

This file can be checked with codevalidator (complains about trailing white space, use of tabs, use of carriage return), but with the current code it can't be fixed (will be overwritten with an empty file). hjacobs#37

ePaul force-pushed the bugfix/issue-37-fix-with-non-ascii branch from 3f0a770 to 6225e63 Compare August 2, 2015 20:51

Paŭlo Ebermann added 2 commits August 5, 2015 22:22

issue 37: add test file.

48b8ffd

This file can be checked with codevalidator (complains about trailing white space, use of tabs, use of carriage return), but with the current code it can't be fixed (will be overwritten with an empty file). hjacobs#37

convert to unicode and back when fixing stuff on line-level.

31a2fcc

ePaul force-pushed the bugfix/issue-37-fix-with-non-ascii branch from 6225e63 to 31a2fcc Compare August 5, 2015 20:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bugfix/issue 37 fix with non ascii#39

Bugfix/issue 37 fix with non ascii#39
ePaul wants to merge 2 commits intohjacobs:masterfrom
ePaul:bugfix/issue-37-fix-with-non-ascii

ePaul commented Aug 2, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ePaul commented Aug 2, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant