Sunday, 5 January 2014

DETECTING MINIFIED FILES

In August 2013, I started contributing to the Firefox Developer Tools project in my spare time. So far it has been an amazing experience, and it is a privilege to contribute my little quota to moving the web forward.

Last month, I fixed Bug 913665, with many thanks to Nick Fitzgerald for his patient, insightful reviews. The feature request was to automatically detect minified JavaScript files and pretty print them. One of the keys to fixing this bug was the heuristic used to detect such files.
In this short post, I will walk through the approach I took, hoping it might be helpful to someone.

WebKit Web Inspector approach
My first step was to dig into the WebKit source to see the approach used in the Web Inspector. Below is the code from the WebKit source. The full source code can be viewed here.

The approach here is direct and quite simple: it checks whether any line in the source is longer than 500 characters and, if so, sets the this.autoFormat property to true.

Some issues that make this approach less than perfect are:
  1. There may be minified files whose lines are all shorter than 500 characters.
  2. For un-minified sources, every line still has to be scanned before a decision is made. For really large source files this takes more time than it should, since the whole source should not need to be scanned.
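
To make the line-length idea concrete, here is a minimal sketch of that kind of check. It is not the WebKit code; the function name hasLongLine and the constant MAX_LINE_LENGTH are made up for illustration, and only the 500-character limit comes from the description above.

```javascript
// Hypothetical name; only the 500-character limit comes from the post.
const MAX_LINE_LENGTH = 500;

// Returns true if any single line of the source exceeds the limit.
function hasLongLine(source) {
  // some() stops at the first long line, but an un-minified file
  // still gets scanned all the way to the end before returning false.
  return source.split("\n").some((line) => line.length > MAX_LINE_LENGTH);
}
```

Note that a minified file wrapped at, say, 200 characters per line would slip past this check, which is the first issue listed above.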

Firefox Devtools approach
The idea for this approach came from a suggestion by Nick to test based on indents instead.
Firstly, so as not to scan through all the source lines, a SAMPLE_SIZE is set, which specifies the maximum number of lines to be used for the test. An INDENT_COUNT_THRESHOLD is also set; this specifies the maximum percentage of indented lines a file may have and still be seen as minified.
Next, all comments are stripped out. We then loop through the number of lines specified (in this case the first 30), checking for those with indents and updating our indentCount. The percentage of indentCount relative to the number of lines scanned is then calculated and compared to the INDENT_COUNT_THRESHOLD value to determine whether the file is minified or not.
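
The steps above can be sketched roughly as follows. This is an illustration rather than the patch that landed: SAMPLE_SIZE, INDENT_COUNT_THRESHOLD and indentCount are the names used above, but the threshold value of 20, the function name isMinified, the comment-stripping regex and the strict "less than" comparison are all assumptions made for the example.

```javascript
const SAMPLE_SIZE = 30;            // the post scans the first 30 lines
const INDENT_COUNT_THRESHOLD = 20; // percent; assumed value for illustration

// Hypothetical helper: guesses whether a JavaScript source is minified.
function isMinified(source) {
  // Naive comment stripping: it does not understand string or regex
  // literals, so "//" inside a string would be treated as a comment.
  const stripped = source.replace(/\/\*[\s\S]*?\*\/|\/\/.*$/gm, "");

  // Only look at the first SAMPLE_SIZE lines, not the whole file.
  const lines = stripped.split("\n").slice(0, SAMPLE_SIZE);

  let indentCount = 0;
  for (const line of lines) {
    // Count lines that start with spaces or tabs followed by real content.
    if (/^[ \t]+\S/.test(line)) {
      indentCount++;
    }
  }

  // Few indented lines in the sample suggests a minified file.
  return (indentCount / lines.length) * 100 < INDENT_COUNT_THRESHOLD;
}
```

With these assumed values, a one-line source such as "function a(){return 1}function b(){return 2}" has no indented lines and is treated as minified, while a conventionally indented file quickly crosses the threshold.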
Some of the gains of this approach are:

  1. The whole source file does not need to be scanned.
  2. The INDENT_COUNT_THRESHOLD can be reduced to increase the accuracy.

Summary
I feel that there are different and possibly better ways to approach the problem. If anyone comes up with something, please drop a comment.
Also, some people have done work on detecting minified CSS files.
Cheers
