Bad Code: Too Many Object Conversions Between Application Layers And How to Avoid Them
Have you ever worked with an application where you had to copy data from one object to another and another and so on before you could actually do something with it? Have you ever written code to convert data from XML to a DTO to a Business Object to a JDBC Statement, again and again for each of the different data types being processed? Then you have encountered an all too common antipattern of many "enterprise" (read "overdesigned") applications, which we could call The Endless Mapping Death March. Let's look at an application suffering from this antipattern and how to rewrite it in a much nicer, leaner and easier to maintain form.
The application, The World of Thrilling Fashion (or WTF for short), collects and stores information about newly designed dresses and makes it available via a REST API. Every poor dress has to go through the following conversions before reaching a devoted fashion fan:
- Parsing from XML into an XML-specific XDress object
- Processing and conversion to an application-specific Dress object
- Conversion to MongoDB's DBObject so that it can be stored in the DB (as JSON)
- Conversion from the DBObject back to the Dress object
- Conversion from Dress to a JSON string
Uff, that's a lot of work! Each of the conversions is coded manually, and if we want to extend WTF to also provide information about trendy shoes, we will need to code all of them again. (Plus a couple of methods in our MongoDAO, such as getAllShoes and storeShoes.) But we can do much better than that!
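To make the cost concrete, here is a minimal sketch of what the hand-coded conversions for a single data type might look like. The field names, the XDress and Dress constructors, and the use of a plain Map as a stand-in for MongoDB's DBObject are all assumptions made for illustration; the real WTF classes are not shown in this post.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins for the WTF classes; field names are assumptions.
class XDress {                                   // XML-specific object
    final String id;
    final String title;
    final String colour;
    XDress(String id, String title, String colour) {
        this.id = id; this.title = title; this.colour = colour;
    }
}

class Dress {                                    // application-specific object
    final String externalId;
    final String name;
    final String color;
    Dress(String externalId, String name, String color) {
        this.externalId = externalId; this.name = name; this.color = color;
    }
}

class HandCodedDressConverters {
    // Conversion 1: XDress -> Dress
    static Dress toDress(XDress x) {
        return new Dress(x.id, x.title, x.colour);
    }

    // Conversion 2: Dress -> DB document (a plain Map stands in for DBObject here)
    static Map<String, Object> toDocument(Dress d) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("externalId", d.externalId);
        doc.put("name", d.name);
        doc.put("color", d.color);
        return doc;
    }

    // Conversion 3: DB document -> Dress
    static Dress fromDocument(Map<String, Object> doc) {
        return new Dress((String) doc.get("externalId"),
                         (String) doc.get("name"),
                         (String) doc.get("color"));
    }

    // Conversion 4: Dress -> JSON string (naive: special characters are not escaped)
    static String toJson(Dress d) {
        return "{\"externalId\":\"" + d.externalId + "\",\"name\":\"" + d.name
                + "\",\"color\":\"" + d.color + "\"}";
    }
}
```

Every new field has to be added in four places, and every new data type such as Shoes needs its own copy of all four methods.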
Eliminating the Manual Conversions
It's time-consuming, error-prone and annoying to code all the conversions while you actually want to use your time to build business logic and not some boilerplate code. We can eliminate the manual work in two ways:
- Generalize the conversions so that they only need to be written once (likely leveraging existing conversion libraries)
- Eliminate them, e.g. use the same data format through the complete processing chain
To be fair, I have to admit that the manual approach also has some advantages: you have full (fool? :-)) power over the form of the objects and can fit them perfectly for the processing required, you don't need to introduce huge and buggy libraries, and you earn more money, if paid by LOC.
However, the disadvantages named above nearly always outweigh the advantages, especially if you reuse suitable, mature, high-quality libraries that allow you to customize the processing in any detail as the need arises.
One question remains: how do we represent the data? We have two possibilities:
- With generic data structures, i.e. maps. This is common in dynamic functional languages such as Clojure and it is extremely easy and comfortable.
  - Pros: Less work, very flexible, generic operations can be applied easily (map, filter, etc.)
- With objects specific for each data type, i.e. POJOs such as DressVariant, Shoes
  - Pros: Type safety, the compiler helps to ensure that your code is correct, it might be easier to understand
  - Cons: You have to write and maintain a class for each possible data element being processed
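The trade-off between the two representations can be sketched in a few lines. This is only an illustration: the class TypedDressVariant, the helper names, and the "name"/"color" keys are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Option 2: a dedicated class per data type (field names are illustrative assumptions).
class TypedDressVariant {
    private final String name;
    private final String color;

    TypedDressVariant(String name, String color) {
        this.name = name;
        this.color = color;
    }

    String getName() { return name; }
    String getColor() { return color; }   // a misspelled call such as getColour() would not compile
}

class RepresentationComparison {
    // Option 1: a generic map; no class to write, and any key can be added later.
    static Map<String, Object> variantAsMap(String name, String color) {
        Map<String, Object> variant = new HashMap<>();
        variant.put("name", name);
        variant.put("color", color);
        return variant;
    }

    // A generic operation: works for dresses, shoes, or anything else stored as a map.
    static Map<String, Object> select(Map<String, Object> record, String... keys) {
        Map<String, Object> result = new HashMap<>();
        for (String key : keys) {
            if (record.containsKey(key)) {
                result.put(key, record.get(key));
            }
        }
        return result;
    }
}
```

With the map, a lookup under a misspelled key such as variant.get("colour") compiles and quietly returns null at runtime, which is exactly the kind of mistake the compiler catches for the POJO.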
Sidenote: The Business Domain
You might skip this section and only come back later if you want to understand the reasoning behind the design.
WTF has to do some processing of the dress elements that it retrieves, mainly because multiple elements may represent the same dress with only slight variations such as color. WTF thus stores such a group of related elements as a list of DressVariant items inside a parent Dress object, generates a unique ID for the Dress and stores the IDs of the input elements in an attribute named "externalIds". Therefore N input elements become M Dress elements, each with 1+ DressVariants, where M <= N.
WTF also has to do some other processing on its WTF XML input such as detecting which images are real and which are just fake placeholders but we won't discuss that.
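The grouping described in this sidenote can be sketched as follows. This is a simplified, in-memory illustration, not the actual WTF code: the post does not say how related elements are recognized, so grouping by an identical name is purely an assumption, as are the field names and the use of a random UUID as the generated ID.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// One input element from the feed (field names are assumptions).
class DressVariant {
    final String externalId;
    final String name;
    final String color;

    DressVariant(String externalId, String name, String color) {
        this.externalId = externalId;
        this.name = name;
        this.color = color;
    }
}

// Parent object holding all variants of the same dress.
class Dress {
    final String id = UUID.randomUUID().toString();       // generated unique ID
    final String name;
    final List<DressVariant> variants = new ArrayList<>();
    final List<String> externalIds = new ArrayList<>();   // IDs of the input elements

    Dress(String name) {
        this.name = name;
    }

    void add(DressVariant variant) {
        variants.add(variant);
        externalIds.add(variant.externalId);
    }
}

class DressGrouping {
    // Assumption: variants with the same name belong to the same dress.
    static Collection<Dress> group(List<DressVariant> input) {
        Map<String, Dress> byName = new LinkedHashMap<>();
        for (DressVariant variant : input) {
            byName.computeIfAbsent(variant.name, Dress::new).add(variant);
        }
        return byName.values();
    }
}
```

Three input elements, two of which share a name, would thus yield two Dress objects, the first carrying two variants and two external IDs.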
Implementing the Static-Typed Generic Processing
I've decided to keep a class per data type so as not to diverge too much from the current implementation. How do we now make the manual conversions generic and reusable?
Let's first see how I would like to construct the processing pipeline:
fetchFrom("http://wtf.example.com/atom/dresses.xml")
    .parseNodesAt("/feed/dress")
    .transform(DressVariant.class, new DressDeduplicatingTransformer()) // Transformer: DressVariant -> Dress
    .transform(new PojoToDboTransformer()) // Transformer: POJO -> DBObject
    .store(new MongoDAO());
// + we'll use DBObject.toMap() + PojoToJson mapper when serving the data via REST
So we fetch XML from a URL, send it to a parser to extract some nodes that are automatically converted to DressVariant objects, next we use a transformer that merges multiple DressVariants into a single, unified Dress object, and finally we convert the resulting POJO into a Mongo DBObject before storing it into the DB. What do we use for the conversions?
- XML -> DressVariant: Use JAXB to convert Nodes to our POJO annotated with @XmlRootElement. Thus you only need to create a simple POJO and add one annotation. Notice that you can customize the conversion that JAXB performs extensively, if need be.
- DressVariant -> Dress: We will check MongoDB and pass on either an existing Dress or a new Dress object, in both cases with this DressVariant added (this will result in multiple updates if the dress really has multiple formats in the input feed, but that isn't a problem for us). This conversion is type-specific, i.e. for each data type we have to code its own transformation. That is good because, for example, Shoes don't need any such deduplicating processing/converting.
- Dress -> DBObject: We will use the Jackson Mongo Mapper, an extension of Jackson, the first-class JSON mapping library, that adds support for MongoDB. It also performs some special data sanitization required by Mongo, such as replacing '.' in map keys with '-'.
- DBObject -> MongoDB: We will have one generic method, storeDocument(String collectionName, DBObject doc), where the collection name is derived from the original object (e.g. DressVariant -> "dressVariants"). The doc's id attribute is expected to be its unique identifier (and thus we will either update or insert depending on whether its value is present or missing).
- MongoDB -> DBObject: Again a generic method, list(String collectionName)
- DBObject -> Map: The DBObject does that itself
- Map -> JSON: We will use the POJO mapping feature of the Jersey REST library to automatically convert the Map produced by our methods to JSON when sending it to the clients.
- JSON -> clients: We will have one GenericCollectionResource with a list method mapped to the URL "/list/{collectionName}". It will load the collection from Mongo as described and return a List, automatically converted to JSON by Jersey.
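The two generic DAO methods from the list above could look roughly like the sketch below. It is deliberately an in-memory stand-in so that it runs without a database: a plain Map plays the role of DBObject, the class name InMemoryDocumentStore and the helper collectionNameFor are invented for this example, and the pluralization rule (just appending "s") is a naive assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for the generic MongoDAO; a Map plays the role of DBObject.
class InMemoryDocumentStore {
    private final Map<String, List<Map<String, Object>>> collections = new HashMap<>();

    // Derives e.g. DressVariant -> "dressVariants" (naive pluralization: appends "s").
    static String collectionNameFor(Class<?> type) {
        String simpleName = type.getSimpleName();
        return Character.toLowerCase(simpleName.charAt(0)) + simpleName.substring(1) + "s";
    }

    // Replaces the stored document with the same "id" if there is one, otherwise inserts.
    void storeDocument(String collectionName, Map<String, Object> doc) {
        List<Map<String, Object>> docs =
                collections.computeIfAbsent(collectionName, name -> new ArrayList<>());
        Object id = doc.get("id");
        if (id != null) {
            docs.removeIf(existing -> id.equals(existing.get("id")));
        }
        docs.add(doc);
    }

    // Returns a copy of all documents in the collection (empty if it does not exist).
    List<Map<String, Object>> list(String collectionName) {
        List<Map<String, Object>> docs = collections.get(collectionName);
        if (docs == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(docs);
    }
}
```

Because neither method mentions Dress or Shoes, adding a new data type requires no change to the DAO at all.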
Result: Aside from custom data-type-specific transformations, instead of 1 POJO, 4 hand-coded converters, and 2+2 methods for each data type we now need only 1 POJO per data type plus one generic converter, 4 generic methods and one or two libraries. Less coding, less code, fewer defects, more productivity, more fun.
Notice that thanks to our choice of libraries, if the default conversion schemas turn out not to be sufficient for us, we can tweak them as much as we want, though we most certainly don't want to go that way. It's better to sacrifice some flexibility and better-fitting data formats than to do too many tweaks and end up struggling with the mapping libraries instead of leveraging them. A wise man chooses his battles.
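To illustrate what "one generic converter" means, here is a minimal reflection-based sketch. It is not the Jackson Mongo Mapper and does far less than a real mapping library (no nested objects, no key sanitization). The Transformer interface, its then method, PojoToMapTransformer and the Shoes fields are all hypothetical names chosen for this example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical transformer abstraction; the interface used in the sample project may differ.
interface Transformer<I, O> {
    O transform(I input);

    // Chains this transformer with the next one in the pipeline.
    default <R> Transformer<I, R> then(Transformer<? super O, ? extends R> next) {
        return input -> next.transform(transform(input));
    }
}

// One generic converter for every data type: copies all instance fields into a Map.
class PojoToMapTransformer implements Transformer<Object, Map<String, Object>> {
    @Override
    public Map<String, Object> transform(Object pojo) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Field field : pojo.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;                          // skip constants and compiler-generated fields
            }
            field.setAccessible(true);             // allow reading private fields
            try {
                doc.put(field.getName(), field.get(pojo));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return doc;
    }
}

// A new data type needs only its POJO; no converter has to be written for it.
class Shoes {
    private final String model;
    private final int size;

    Shoes(String model, int size) {
        this.model = model;
        this.size = size;
    }
}
```

A type-specific step can be chained in front of the generic one, for example someParser.then(new PojoToMapTransformer()), where someParser is any Transformer producing a POJO.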
Sample Code
Sample code demonstrating automatic, generic mappings XML -> Java -> Mongo -> REST with JSON is available at GitHub - generic-pojo-mappers.
Summary and Conclusion
Many applications force developers to convert data between a number of objects, which is very unproductive and error-prone. A better approach is to avoid the conversions and use the same object throughout the whole processing as much as possible, doing conversions only when really necessary. These conversions are better written in a generic and reusable way than hand-coded for each data type and it often pays off to use an existing, mature mapping library for that (though you must make sure your intended use is aligned with its philosophy and design).
Using the same object throughout the processing causes it to be less fitted for the individual processing stages but it makes them much easier and faster to write and maintain. We lose some performance due to using reflection but that is negligible with respect to the I/O (retrieving a file over HTTP, sending data to a DB) and XML parsing.
In the example of the World of Thrilling Fashion, we have cut the amount of manual coding and the number of methods considerably, and the result is smaller, cleaner, and more flexible code (w.r.t. adding new data types).
Criticism
But I really need to use objects fine-tuned for each layer of processing!
Your choice. If you really need it, do it, but be aware of how much you pay for it.
Libraries are evil!
Well, yes. Sometimes it's better to hand-code things but not always. Make sure that you don't use a library in a way different than intended because then you might lose more time fighting it than being productive.
You are an idiot!
Yes, many people think so. Thank you for reading.
Are you saying that I'm an idiot if I write code like that?
Not at all, you might have good reasons to do so. Or you might not know the alternatives. Or you just don't have such a strong dislike of writing mindless code as I do. That's OK.
Related
The rich Persistent Domain Object + slim Gateway patterns by Adam Bien also make it possible to use the same object (a JPA entity) throughout all the application (web UI - DB) in the name of increased productivity.
M. Fowler's EmbeddedDocument is a pattern for working with JSON flowing in and out of our services (REST <-> JSON-friendly DB) without unnecessary conversions but with good encapsulation. The naive approach is json -> object graph -> (processing) -> json. Fowler's alternative: "In many of these situations a better way to proceed is to keep the data in a JSONish form, but still wrap it with objects to coordinate manipulation." That is, use a library to parse the JSON into a generic structure (e.g. a structure of lists and maps/dicts) and store it in a field of an object whose methods encapsulate it. For an Order, for example, we could have one method returning the customer and another computing the cost, both accessing the underlying generic structure. The user of the wrapper object doesn't need to know or care about the underlying structure.
Fowler on when to use it: "The sweet spot for an embedded document is when you're providing the document in the same form that you get it from the data store, but still want to do some manipulation of that data. [..] The order object needs only a constructor and a method to return its JSON representation. On the other hand as you do more work on the data - more server side logic, transforming into different representations - then it's worth considering whether it's easier to turn the data into an object graph."
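A minimal sketch of such a wrapper, under stated assumptions: the generic structure is taken as already parsed by some JSON library into maps and lists, and the keys "customer", "items", "price" and "quantity" are invented for this example rather than taken from Fowler's article.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;

// Wraps the generic structure a JSON library would produce (maps, lists, strings, numbers).
class Order {
    private final Map<String, Object> data;

    Order(Map<String, Object> data) {
        this.data = data;
    }

    // Callers ask for the customer; they never see the key name.
    String customer() {
        return (String) data.get("customer");
    }

    // Sums price * quantity over all line items of the embedded document.
    @SuppressWarnings("unchecked")
    BigDecimal cost() {
        List<Map<String, Object>> items = (List<Map<String, Object>>) data.get("items");
        BigDecimal total = BigDecimal.ZERO;
        for (Map<String, Object> item : items) {
            BigDecimal price = new BigDecimal(item.get("price").toString());
            BigDecimal quantity = new BigDecimal(item.get("quantity").toString());
            total = total.add(price.multiply(quantity));
        }
        return total;
    }

    // The document leaves in the same form in which it arrived; no mapping back is needed.
    Map<String, Object> document() {
        return data;
    }
}
```

The structure is never copied into a separate object graph; the wrapper only adds behavior on top of it.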