Hacking Jasper to Get Object Model of a JSP Page
To perform some checks and statistical analysis on my JSPs, I needed a DOM-like, hierarchical model of the elements they contain. But parsing JSP pages isn't trivial and is best left to a tool that excels at it: the Jasper JSP compiler used by Tomcat, Jetty, GlassFish and likely by others too. There is an easy way to tweak it to produce whatever output you need and to transform a JSP into whatever form you want, including an object model of the page:
- Define a Node.Visitor subclass for handling the nodes (tags etc.) of a JSP
- Write a simple subclass of Compiler, overriding its generateJava() to invoke the visitor
- Subclass the compiler executor JspC, overriding its method getCompilerClassName() to return the class of your Compiler
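To make the division of labour concrete, here is a small, Jasper-free sketch of the same three hooks. All class names in it (HookDemo, BaseCompiler, ReadingCompiler, Executor, ReadingExecutor) are hypothetical stand-ins for Node.Visitor, Compiler and JspC. The only assumption about Jasper is the one suggested by the comment in the JspC listing further down: the executor instantiates the compiler from the class name returned by getCompilerClassName(), which the sketch mimics with reflection.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, Jasper-free model of the three extension points:
 * Visitor (step 1), compiler subclass (step 2), executor subclass (step 3).
 */
class HookDemo {

    /** Stand-in for Node.Visitor: called for every parsed element. */
    interface Visitor {
        void visit(String element);
    }

    /** Stand-in for Compiler: generateJava() is the overridable step. */
    static class BaseCompiler {
        protected final List<String> parsedElements =
                List.of("jsp:directive.include", "h:form", "h:inputText");

        /** Default behaviour: pretend to emit servlet source. */
        protected String generateJava() {
            return "/* generated servlet */";
        }
    }

    /** Step 2: a compiler that only walks the parsed elements with a visitor. */
    static class ReadingCompiler extends BaseCompiler {
        static final List<String> seen = new ArrayList<>();

        @Override
        protected String generateJava() {
            Visitor visitor = seen::add;          // step 1: the visitor
            for (String element : parsedElements) {
                visitor.visit(element);
            }
            return null;                          // nothing to compile afterwards
        }
    }

    /** Stand-in for JspC: creates whatever class getCompilerClassName() names. */
    static class Executor {
        public String getCompilerClassName() {
            return BaseCompiler.class.getName();
        }

        String run() {
            try {
                BaseCompiler compiler = (BaseCompiler) Class.forName(getCompilerClassName())
                        .getDeclaredConstructor().newInstance();
                return compiler.generateJava();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot create compiler", e);
            }
        }
    }

    /** Step 3: the executor subclass only swaps the class name. */
    static class ReadingExecutor extends Executor {
        @Override
        public String getCompilerClassName() {
            return ReadingCompiler.class.getName();
        }
    }

    public static void main(String[] args) {
        System.out.println(new Executor().run());        // default compiler
        System.out.println(new ReadingExecutor().run()); // reading compiler returns null
        System.out.println(ReadingCompiler.seen);        // elements the visitor saw
    }
}
```

The point of the design is that neither the executor nor the parser has to be copied: one overridden method selects the compiler, and the compiler decides what to do with the parsed tree.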
Implementation
1. Custom Visitor
A Visitor is invoked by the compiler to process a tree object model of a parsed JSP. This implementation just prints information about an interesting subset of nodes in the page, indented to make their nesting clear.
package org.apache.jasper.compiler;

import java.util.LinkedList;
import org.apache.jasper.JasperException;
import org.apache.jasper.compiler.Node.CustomTag;
import org.apache.jasper.compiler.Node.ELExpression;
import org.apache.jasper.compiler.Node.IncludeDirective;
import org.apache.jasper.compiler.Node.Visitor;
import org.xml.sax.Attributes;

public class JsfElCheckingVisitor extends Visitor {

    /** Current indentation; grows by one space per nesting level. */
    private String indent = "";

    @Override
    public void visit(ELExpression n) throws JasperException {
        logEntry("ELExpression", n, "EL: " + n.getEL());
        super.visit(n);
    }

    @Override
    public void visit(IncludeDirective n) throws JasperException {
        logEntry("IncludeDirective", n, toString(n.getAttributes()));
        super.visit(n);
    }

    @Override
    public void visit(CustomTag n) throws JasperException {
        // Defensive: the handler class can presumably be null (e.g. for tag files)
        Class<?> handler = n.getTagHandlerClass();
        logEntry("CustomTag", n, "Class: " + (handler == null ? "?" : handler.getName())
                + ", attrs: " + toString(n.getAttributes()));
        doVisit(n);
        indent += " ";   // nested nodes are printed one level deeper
        visitBody(n);    // process the tag's child nodes
        indent = indent.substring(0, indent.length() - 1);
    }

    /** Formats the attributes as [name1=value1, name2=value2, ...]. */
    private String toString(Attributes attributes) {
        if (attributes == null || attributes.getLength() == 0) return "";
        LinkedList<String> details = new LinkedList<String>();
        for (int i = 0; i < attributes.getLength(); i++) {
            details.add(attributes.getQName(i) + "=" + attributes.getValue(i));
        }
        return details.toString();
    }

    /** Prints one line per node; 'what' is not used in the output. */
    private void logEntry(String what, Node n, String details) {
        System.out.println(indent + n.getQName() + " at line:"
                + n.getStart().getLineNumber() + ": " + details);
    }
}
Notes:
- The Visitor must be in the org.apache.jasper.compiler package because the essential class org.apache.jasper.compiler.Node is package-private
- The method visitBody triggers processing of the nested nodes
- There are more methods I could have overridden (and the catch-all method doVisit) but I've selected only those interesting for me
- The node's attributes are of the type ...sax.Attributes, which contains attribute names and values as strings
- attributes.getType(i) is usually CDATA
- The Node structure contains information about the parent node, the tag name, the tag handler class, the corresponding line and name of the source file, and other useful information
- CustomTag is likely the most interesting node type, e.g. all the JSF tags are of this type
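The indentation bookkeeping in visit(CustomTag) is easy to get wrong, so here is a self-contained sketch of the same depth-first walk that runs without Jasper. MiniNode is a hypothetical stand-in for Jasper's Node; the attributes are real org.xml.sax.Attributes built with the JDK's AttributesImpl, typed CDATA as noted above, and formatted with the same toString(Attributes) helper as in the listing.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

/** Jasper-free sketch of the indenting visitor; MiniNode is hypothetical. */
class IndentingVisitorDemo {

    static final class MiniNode {
        final String qName;
        final int line;
        final Attributes attrs;
        final List<MiniNode> body = new ArrayList<>();

        MiniNode(String qName, int line, Attributes attrs) {
            this.qName = qName;
            this.line = line;
            this.attrs = attrs;
        }

        /** Adds a child node and returns this node, for fluent tree building. */
        MiniNode add(MiniNode child) {
            body.add(child);
            return this;
        }
    }

    private final StringBuilder out = new StringBuilder();
    private String indent = "";

    void visit(MiniNode n) {
        out.append(indent).append(n.qName).append(" at line:").append(n.line)
           .append(": ").append(toString(n.attrs)).append('\n');
        indent += " ";                                  // children one space deeper
        for (MiniNode child : n.body) visit(child);     // plays the role of visitBody(n)
        indent = indent.substring(0, indent.length() - 1);
    }

    /** Same formatting as the toString(Attributes) helper in the listing. */
    static String toString(Attributes attributes) {
        if (attributes == null || attributes.getLength() == 0) return "";
        LinkedList<String> details = new LinkedList<>();
        for (int i = 0; i < attributes.getLength(); i++) {
            details.add(attributes.getQName(i) + "=" + attributes.getValue(i));
        }
        return details.toString();
    }

    /** Builds SAX attributes from name/value pairs, all typed CDATA. */
    static Attributes attrs(String... nameValuePairs) {
        AttributesImpl a = new AttributesImpl();
        for (int i = 0; i < nameValuePairs.length; i += 2) {
            a.addAttribute("", "", nameValuePairs[i], "CDATA", nameValuePairs[i + 1]);
        }
        return a;
    }

    static String render(MiniNode root) {
        IndentingVisitorDemo v = new IndentingVisitorDemo();
        v.visit(root);
        return v.out.toString();
    }

    public static void main(String[] args) {
        MiniNode form = new MiniNode("h:form", 64, attrs("id", "inputForm"))
                .add(new MiniNode("htm:tr", 71, attrs())
                        .add(new MiniNode("htm:td", 72, attrs())));
        System.out.print(render(form));
    }
}
```

Incrementing the indent before visiting the children and trimming it afterwards keeps the walk correct for any depth, which is what lets the output below omit closing tags.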
Example Output (for a JSF Page)
jsp:directive.include at line:5: [file=includes/stdjsp.jsp]
jsp:directive.include at line:6: [file=includes/ssoinclude.jsp]
f:verbatim at line:14: Class: com.sun.faces.taglib.jsf_core.VerbatimTag, attrs:
htm:div at line:62: Class: com.exadel.htmLib.tags.DivTag, attrs: [style=width:100%;]
h:form at line:64: Class: com.sun.faces.taglib.html_basic.FormTag, attrs: [id=inputForm]
htm:table at line:66: Class: com.exadel.htmLib.tags.TableTag, attrs: [cellpadding=0, width=100%, border=0, styleClass=clear box_main]
htm:tr at line:71: Class: com.exadel.htmLib.tags.TrTag, attrs:
htm:td at line:72: Class: com.exadel.htmLib.tags.TdTag, attrs:
f:subview at line:73: Class: com.sun.faces.taglib.jsf_core.SubviewTag, attrs: [id=cars]
jsp:directive.include at line:74: [file=/includes/cars.jsp]
h:panelGroup at line:8: Class: com.sun.faces.taglib.html_basic.PanelGroupTag, attrs: [rendered=#{bookingHandler.flowersAvailable}]
...
htm:tr at line:87: Class: com.exadel.htmLib.tags.TrTag, attrs: [style=height:5px]
htm:td at line:87: Class: com.exadel.htmLib.tags.TdTag, attrs:
(I do not print "closing tags" because it's clear that a tag ends when another node with the same or smaller indentation appears or the output ends.)
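Because each nesting level adds exactly one space of indentation, the hierarchy can be recovered from the printed output with a stack: a line's parent is the nearest preceding line with strictly smaller indentation. A minimal sketch, with hypothetical class names and assuming space-indented lines as printed by the visitor (the record needs Java 16+):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Sketch: rebuild parent links from indentation alone (Java 16+ for the record). */
class IndentedOutputParser {

    record Entry(String text, int depth, Entry parent) {}

    static List<Entry> parse(List<String> lines) {
        List<Entry> result = new ArrayList<>();
        Deque<Entry> open = new ArrayDeque<>();   // ancestors not yet "closed"
        for (String line : lines) {
            if (line.isBlank()) continue;
            int depth = line.length() - line.stripLeading().length();
            // A node at the same or smaller indentation closes all deeper open nodes.
            while (!open.isEmpty() && open.peek().depth() >= depth) {
                open.pop();
            }
            Entry e = new Entry(line.strip(), depth, open.peek()); // null parent = root
            result.add(e);
            open.push(e);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Entry> entries = parse(List.of(
                "h:form at line:64",
                " htm:tr at line:71",
                "  htm:td at line:72",
                " htm:tr at line:87"));
        for (Entry e : entries) {
            String parent = e.parent() == null ? "(root)" : e.parent().text();
            System.out.println(e.text() + "  <- " + parent);
        }
    }
}
```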
2. Compiler Subclass
The important part is generateJava(), which I copied from Compiler, stripped of some code and extended with an invocation of my Visitor. So only 3 lines in the listing below are actually new.
// Same package as the visitor: ParserController, Validator etc. are package-private
package org.apache.jasper.compiler;

import java.io.FileNotFoundException;
import org.apache.jasper.JasperException;

public class OnlyReadingJspPseudoCompiler extends Compiler {

    /** We're never compiling .java to .class. */
    @Override protected void generateClass(String[] smap) throws FileNotFoundException,
            JasperException, Exception {
        return;
    }

    /** Copied from {@link Compiler#generateJava()} and adjusted */
    @Override protected String[] generateJava() throws Exception {
        // Setup page info area
        pageInfo = new PageInfo(new BeanRepository(ctxt.getClassLoader(),
                errDispatcher), ctxt.getJspFile());

        // JH: Skipped processing of jsp-property-group in web.xml for the current page

        if (ctxt.isTagFile()) {
            try {
                double libraryVersion = Double.parseDouble(ctxt.getTagInfo()
                        .getTagLibrary().getRequiredVersion());
                if (libraryVersion < 2.0) {
                    pageInfo.setIsELIgnored("true", null, errDispatcher, true);
                }
                if (libraryVersion < 2.1) {
                    pageInfo.setDeferredSyntaxAllowedAsLiteral("true", null,
                            errDispatcher, true);
                }
            } catch (NumberFormatException ex) {
                errDispatcher.jspError(ex);
            }
        }

        ctxt.checkOutputDir();

        // Parse the file
        ParserController parserCtl = new ParserController(ctxt, this);

        // Pass 1 - the directives
        Node.Nodes directives =
                parserCtl.parseDirectives(ctxt.getJspFile());
        Validator.validateDirectives(this, directives);

        // Pass 2 - the whole translation unit
        pageNodes = parserCtl.parse(ctxt.getJspFile());

        // Validate and process attributes - don't re-validate the
        // directives we validated in pass 1
        /*
         * JH: The code above has been copied from Compiler#generateJava() with some
         * omissions and with using our own Visitor.
         * The code that used to follow was just deleted.
         * Note: The JSP's name is in ctxt.getJspFile()
         */
        pageNodes.visit(new JsfElCheckingVisitor());

        return null; // no Java source is produced
    }

    /**
     * The parent's implementation, in our case, checks whether the target file
     * exists and returns true if it doesn't. However, it is expensive, so
     * we skip it by returning true directly.
     * @see org.apache.jasper.JspCompilationContext#getServletJavaFileName()
     */
    @Override public boolean isOutDated(boolean checkClass) {
        return true;
    }
}
Notes:
- I have deleted quite a lot of code unimportant for me from generateJava; for a different type of analysis some of that code could be useful, so look into the original Compiler class and decide for yourself.
- I do not really care about JSP ELs, so it might be possible to optimize the compiler to need only one pass.
3. Compiler Executor
It is difficult to use a Compiler directly because it depends on quite a number of complex settings and objects. The easiest thing is thus to reuse the Ant task JspC, which has the additional benefit of finding the JSPs to process. As mentioned, the key is overriding getCompilerClassName to return my compiler's class.
import org.apache.jasper.JasperException;
import org.apache.jasper.JspC;
import org.apache.jasper.compiler.OnlyReadingJspPseudoCompiler;

/** Extends JspC to use the compiler of our choice; Jasper version 6.0.29. */
public class JspCParsingToNodesOnly extends JspC {

    /** Overridden to return our class (default = null => JdtCompiler) */
    @Override public String getCompilerClassName() {
        return OnlyReadingJspPseudoCompiler.class.getName();
    }

    public static void main(String[] args) {
        JspCParsingToNodesOnly jspc = new JspCParsingToNodesOnly();
        jspc.setUriroot("web"); // where to search for JSPs
        //jspc.setVerbose(1); // 0 = false, 1 = true
        jspc.setJspFiles("helloJSFpage.jsp"); // comma-separated; leave unset to process all
        try {
            jspc.execute();
        } catch (JasperException e) {
            throw new RuntimeException(e);
        }
    }
}
Notes:
- JspC normally processes all JSP files under the specified uriroot, but you can restrict it to selected ones by passing their comma-separated names to setJspFiles.
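The page selection can be pictured roughly as follows. This is a hypothetical model, not JspC's actual code: it assumes that an explicit comma-separated list is used as given, and that otherwise every .jsp or .jspx file under the uriroot is picked up (the real JspC may apply further rules).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical model of choosing the pages to process; not JspC's code. */
class PageSelection {

    static List<String> selectPages(Path uriroot, String jspFiles) {
        if (jspFiles != null && !jspFiles.isBlank()) {
            // Explicit list: split on commas, ignore surrounding whitespace
            return Arrays.stream(jspFiles.split(","))
                    .map(String::trim)
                    .filter(s -> !s.isEmpty())
                    .collect(Collectors.toList());
        }
        // No list: scan the whole tree for JSP files, paths relative to uriroot
        try (Stream<Path> paths = Files.walk(uriroot)) {
            return paths.filter(Files::isRegularFile)
                    .map(p -> uriroot.relativize(p).toString().replace('\\', '/'))
                    .filter(name -> name.endsWith(".jsp") || name.endsWith(".jspx"))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("web");
        Files.createDirectories(root.resolve("includes"));
        Files.writeString(root.resolve("helloJSFpage.jsp"), "");
        Files.writeString(root.resolve("includes/cars.jsp"), "");
        Files.writeString(root.resolve("README.txt"), "");
        System.out.println(selectPages(root, null));               // all JSPs
        System.out.println(selectPages(root, "helloJSFpage.jsp")); // only the listed one
    }
}
```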
Compile Dependencies
In the Ivy form:
<dependency org="org.apache.tomcat" name="jasper" rev="6.0.29" />
<dependency org="org.apache.tomcat" name="jasper-jdt" rev="6.0.29" />
<dependency org="org.apache.ant" name="ant" rev="1.8.2" />