Hacking Jasper to Get Object Model of a JSP Page

To perform some checks and statistical analysis on my JSPs I needed a DOM-like, hierarchical model of elements contained in them. But parsing JSP pages isn't trivial and is best left to a tool that excels in it - the Jasper JSP compiler used by Tomcat, Jetty, GlassFish and likely also by all others. There is an easy way to tweak it to produce whatever output you need nad to transform a JSP into whatever form you want, including an object model of the page:
  1. Define a Node.Visitor subclass for handling the nodes (tags etc.) of a JSP
  2. Write a simple subclass of Compiler, overriding its generateJava() to invoke the visitor
  3. Subclass the compiler executor JspC overriding its method getCompilerClassName() to return the class of the Compiler of yours
Let's see the code.

Implementation

1. Custom Visitor

A Visitor is invoked by the compiler to process a tree object model of a parsed JSP. This implementation just prints information about an interesting subset of nodes in the page, indented to make their nesting clear.


package org.apache.jasper.compiler;

import java.util.LinkedList; import org.apache.jasper.JasperException; import org.apache.jasper.compiler.Node.CustomTag; import org.apache.jasper.compiler.Node.ELExpression; import org.apache.jasper.compiler.Node.IncludeDirective; import org.apache.jasper.compiler.Node.Visitor; import org.xml.sax.Attributes;

public class JsfElCheckingVisitor extends Visitor {

private String indent = "";

@Override public void visit(ELExpression n) throws JasperException { logEntry("ELExpression", n, "EL: " + n.getEL()); super.visit(n); }

@Override public void visit(IncludeDirective n) throws JasperException { logEntry("IncludeDirective", n, toString(n.getAttributes())); super.visit(n); }

@Override public void visit(CustomTag n) throws JasperException { logEntry("CustomTag", n, "Class: " + n.getTagHandlerClass().getName() + ", attrs: " + toString(n.getAttributes()));

doVisit(n);

indent += " "; visitBody(n); indent = indent.substring(0, indent.length() - 1); }

private String toString(Attributes attributes) { if (attributes == null || attributes.getLength() == 0) return ""; LinkedList<String> details = new LinkedList<String>();

for (int i = 0; i < attributes.getLength(); i++) { details.add(attributes.getQName(i) + "=" + attributes.getValue(i)); }

return details.toString(); }

private void logEntry(String what, Node n, String details) { System.out.println(indent + n.getQName() + " at line:" + n.getStart().getLineNumber() + ": " + details); }

}


Notes:
  • The Visitor must be in the org.apache.jasper.compiler package because the essential class org.apache.jasper.compiler.Node is package-private
  • The method visitBody triggers processing of the nested nodes
  • There are more methods I could have overridden (and the catch-all method doVisit) but I've selected only those interesting for me
  • The node's attributes are of the type ...sax.Attributes, which contains attribute names and values as strings
    • attributes.getType(i) is usually CDATA
  • The Node structure contains information about the parent node, tag name, tag handler class, the corresponding line of the source file and the name of the source file and other useful information
  • CustomTag is likely the most interesting node type, e.g. all the JSF tags are of this type

Example Output (for a JSF Page)


jsp:directive.include at line:5: [file=includes/stdjsp.jsp]
jsp:directive.include at line:6: [file=includes/ssoinclude.jsp]
f:verbatim at line:14: Class: com.sun.faces.taglib.jsf_core.VerbatimTag, attrs:
htm:div at line:62: Class: com.exadel.htmLib.tags.DivTag, attrs: [style=width:100%;]
 h:form at line:64: Class: com.sun.faces.taglib.html_basic.FormTag, attrs: [id=inputForm]
  htm:table at line:66: Class: com.exadel.htmLib.tags.TableTag, attrs: [cellpadding=0, width=100%, border=0, styleClass=clear box_main]
   htm:tr at line:71: Class: com.exadel.htmLib.tags.TrTag, attrs:
    htm:td at line:72: Class: com.exadel.htmLib.tags.TdTag, attrs:
    f:subview at line:73: Class: com.sun.faces.taglib.jsf_core.SubviewTag, attrs: [id=cars]
      jsp:directive.include at line:74: [file=/includes/cars.jsp]
      h:panelGroup at line:8: Class: com.sun.faces.taglib.html_basic.PanelGroupTag, attrs: [rendered=#{bookingHandler.flowersAvailable}]
...
   htm:tr at line:87: Class: com.exadel.htmLib.tags.TrTag, attrs: [style=height:5px]
    htm:td at line:87: Class: com.exadel.htmLib.tags.TdTag, attrs:


(I do not print "closing tags" for it's clear that a tag ends when another node with the same or smaller indentation appears or the output ends.)

2. Compiler Subclass

The important part is generateJava, which I have just copied, removed some code from it and added an invocation of my Visitor. So actually only 3 lines in the listing below are new.


public class OnlyReadingJspPseudoCompiler extends Compiler {

/** We're never compiling .java to .class. */ @Override protected void generateClass(String[] smap) throws FileNotFoundException, JasperException, Exception { return; }

/** Copied from {@link Compiler#generateJava()} and adjusted */ @Override protected String[] generateJava() throws Exception {

// Setup page info area pageInfo = new PageInfo(new BeanRepository(ctxt.getClassLoader(), errDispatcher), ctxt.getJspFile());

// JH: Skipped processing of jsp-property-group in web.xml for the current page

if (ctxt.isTagFile()) { try { double libraryVersion = Double.parseDouble(ctxt.getTagInfo() .getTagLibrary().getRequiredVersion()); if (libraryVersion < 2.0) { pageInfo.setIsELIgnored("true", null, errDispatcher, true); } if (libraryVersion < 2.1) { pageInfo.setDeferredSyntaxAllowedAsLiteral("true", null, errDispatcher, true); } } catch (NumberFormatException ex) { errDispatcher.jspError(ex); } }

ctxt.checkOutputDir();

try { // Parse the file ParserController parserCtl = new ParserController(ctxt, this);

// Pass 1 - the directives Node.Nodes directives = parserCtl.parseDirectives(ctxt.getJspFile()); Validator.validateDirectives(this, directives);

// Pass 2 - the whole translation unit pageNodes = parserCtl.parse(ctxt.getJspFile());

// Validate and process attributes - don't re-validate the // directives we validated in pass 1 /** * JH: The code above has been copied from Compiler#generateJava() with some * omissions and with using our own Visitor. * The code that used to follow was just deleted. * Note: The JSP's name is in ctxt.getJspFile() */ pageNodes.visit(new JsfElCheckingVisitor());

} finally {}

return null; }

/** * The parent's implementation, in our case, checks whether the target file * exists and returns true if it doesn't. However it is expensive so * we skip it by returning true directly. * @see org.apache.jasper.JspCompilationContext#getServletJavaFileName() */ @Override public boolean isOutDated(boolean checkClass) { return true; }

}


Notes:
  • I have deleted quite lot of code unimportant for me from generate Java; for a different type of analysis than I intend some of that code could have been useful, so look into the original Compiler class and decide for yourself.
  • I do not really care about JSP ELs so it might be possible to optimize the compiler to need only one pass.

3. Compiler Executor

It is difficult to use a Compiler directly because it depends on quite a number of complex settings and objects. The easiest thing is thus to reuse the Ant task JspC, which has the additional benefit of finding the JSPs to process. As mentioned, the key thing is the overriding of getCompilerClassName to return my compiler's class.


import org.apache.jasper.JspC;

/** Extends JspC to use the compiler of our choice; Jasper version 6.0.29. */ public class JspCParsingToNodesOnly extends JspC {

/** Overriden to return the class of ours (default = null => JdtCompiler) */ @Override public String getCompilerClassName() { return OnlyReadingJspPseudoCompiler.class.getName(); }

public static void main(String[] args) { JspCParsingToNodesOnly jspc = new JspCParsingToNodesOnly();

jspc.setUriroot("web"); // where to search for JSPs //jspc.setVerbose(1); // 0 = false, 1 = true jspc.setJspFiles("helloJSFpage.jsp"); // leave unset to process all; comma-separated

try { jspc.execute(); } catch (JasperException e) { throw new RuntimeException(e); } } }


Notes:
  • JspC normally finds all files under the specified Uriroot but you can tell it to ignore all but some selected ones by passing their comma-separated names into setJspFiles.

Compile Dependencies

In thy Ivy form:
<dependency org="org.apache.tomcat" name="jasper" rev="6.0.29" />
<dependency org="org.apache.tomcat" name="jasper-jdt" rev="6.0.29" />
<dependency org="org.apache.ant" name="ant" rev="1.8.2" />

License

All the code here is directly derived from Jasper and thus falls under the same license, i.e. the Apache License, Version 2.0.

Conclusion

Jasper wasn't really designed for extension and modularity as documented by the fact that the crucial Node class is package private and by its API being so complex that reusing just a part of it is very hard. Fortunately the Ant task JspC makes it usable outside of a servlet container by providing some "fake" objects and there is a way to tweak it to our needs with very little work though it wasn't that easy to figure it out :-). I had to apply some dirty tricks, namely using stuff from a package-private class and overriding a method not intended to be overriden (generateJava) but it works and provides very valuable output, which makes it possible to do just anything you might want to do with the a JSP.

Example Code

My Static JSF EL Expression Validator uses this code to parse JSPs to find tags and EL expressions, you can check it out and try it. See the core module, especially the package org.apache.jasper.compiler and the class AbstractJsfStaticAnalyzer that makes use of it (the method createJsfElValidatingJspParser).

Tags: java analysis


Copyright © 2024 Jakub Holý
Powered by Cryogen
Theme by KingMob