My path to SCEA 5

I'd like to share with you my experience with the Sun Certified Enterprise Architect for the Java Platform, Enterprise Edition 5 (SCEA 5) [1] certification. There was a couple of unclear things regarding the assignment and its deliverables and I've learned some interesting things (mostly about hardware estimation and deployment environments such as the "clouds"), both of which may be of an interest to somebody aiming at this certification. I only know that I passed but not how well, so my way of doing things, though sufficient, may not be the best one.

The value of the certification

This certification was really valuable for me, because:
  • I had to read a couple of important books that I always wanted to read but never found the time to do so (including Real World Java EE Patterns - Rethinking Best Practices,  EJB 3 in Action, UML 2 and the Unified Process: Practical Object-Oriented Analysis and Design (2nd Edition), Architecting Enterprise Solutions: Patterns for High-Capability Internet-based Systems, Design Patterns: Elements of Reusable Object-Oriented Software by the Gang of Four, Core J2EE Patterns)
  • I got an overview of the complete Java EE landscape and the non-functional characteristics of a Java EE application and how they are impacted by the individual technological and architectural choices
  • I had finally an opportunity to use UML in practice, including the determination of the appropriate level of detail and the proper ways to represent some concepts (like non-intrusive representation of an iteration over a collection)
And I believe that it is also valuable for an employer because even though it doesn't prove you to be an architect, it does prove that you have a good overview of the complete JEE landscape, that you are aware of architectonic concerns and the necessity to evaluate the pros and cons of any technological choice, and that you have some basic proficiency with UML and some general knowledge beyond the technologies themselves, such as about design patterns.

Step 1: The knowledge test

About the test: You get 64 questions and 2 hours and need 57% (37 q.) to pass. About half of the questions are multiple-choice and you must select all the correct answers to get the points (I believe that they tell you how many correct answers there are).

The test was quite challenging even though I answered ~ 3/4 questions correctly. Paradoxically, the least I gained in Web Tier Technologies, which in practice I use the most. The main problems were:
  • Missing knowledge - my resources (the book Sun Certified Enterprise Architect for J2EE Technology Study Guide and an additional material regarding tiered architectures (this is a really important topic), EJB3 and JPA) were pretty good, but I completely missed information about JCA and few "obscure" topics like the security limitations of Java Web Start and JVM's bytecode verification, I estimate that in total I missed about 10% of the knowledge needed.  It's important not to skip the less used parts of Java EE, like those mentioned above, JAX-WS etc. You should know about Ajax support in JSF and of course you must know the essential changes between EJB 2.x and 3.0.
  • Multiple-choice test - as mentioned, most questions have several correct answers and you must select all. Selecting the first ones is easy but the last one is sometimes/often difficult to choose, see the next point. It's better to concentrate first on the questions with 1 or only few responses because there is a higher probability that you will get it right.
  • Unclear questions and answers - as usual with tests, if you aren't on the same "brain wave" as the test author, some questions and answers will not be sufficiently clear to you, which is further complicated if you aren't native English speaker. The key is in determining correctly what, in the question or answers, is really important and what is only a "decoration". Inevitably, due to a missing knowledge, practice, or a different thinking process than the author's,  you will sometimes put too much stress on an unimportant fact or, conversely, neglect something that the author considered to be essential.  For example:
    • Does the emphasis on "ease of development" imply the preference of JPA over JDBC? I guess so but who knows what the author thinks?
    • A JEE application needs to communicate with an old terminal server, will you use a session-bean using a screen-scraper or JDBC to access directly the DB of the server? - In the description there was no mention of the server having any database, so how may I know whether it has one? On the other hand, screen-scraping seems to be something more suitable for a JCA adapter than a @Stateless bean... .
    • A couple of questions required that I select the proper combination of competing technologies (such as JPA vs. JDBC) for a particular model situation. Because each technology has some benefits and disadvantages, it was difficult to pick the right one because the short problem statement didn't give me enough information to be sure how these pluses and minuses are important for the author.
Advices:
  • Learn the advantages and disadvantages of the alternative technologies in each layer (JSF vs. JSP with JSTL etc.) and when one is more suitable than the other. You will employ this knowledge also in the parts 2 and 3, so it's really important. After all, you want to be an architect :-) .
  • Though this is Java EE exam, don't push the "cool" technologies (JSF, EJB, JPA, ...) everywhere and don't do or avoid doing something only because in general it is considered to be a good or bad practice, you have to really decide based on the actual situation at hand and a rational consideration of the pros and cons. For example even using JSP with JSTL tags to do database queries may be the most suitable solution in some special case.
  • As usually: read the problem statement and the answers thoroughly and don't let your pre-existing preferences and concepts to form your decision.
  • It's important to have some general knowledge outside of the scope of the exam preparation books. For example you should know what spoofing and tampering are.
  • Read sample questions with answers or a discussion on the Internet (e.g. the JavaRanch SCEA forum) - having seen a couple of similar questions and learning what others consider important in the questions and their answers will make it easier to understand and assess correctly a question you might get. (Thanks to David Z. for this advice.)

Step 2: The assignment

This assignment is the core of the certification, and it took me a considerable amount of time, lot of it spent by searching for UML wisdom (how to express some specific ideas in an appropriate way on a particular diagram), by estimating the resource requirements of the application and by exploring and comparing deployment environments and their non-functional characteristics (Amazon Cloud & other offerings, various physical and virtual server providers etc.). I could spend a lot less time on the deployment part but I really wanted to go into the detail to make it clear for myself.

First of all, make sure to read the documents [1] and [2] (see Resources below) thoroughly, they will make many things clear. Both are created by Humphrey Sheil, one of the essential people behind SCEA 5.

Deliverables

You will receive a brief introduction to the customer's company and her needs for a Java EE application, a structured description of the main requirements, a domain model, and few use case diagrams. And you are expected to deliver the following:
  • A class diagram
  • A component diagram
  • A deployment diagram that describes the proposed physical layout of the major tiers
  • A Sequence or Collaboration diagram for each use case
  • The top three technical risks & their mitigation strategies

Understanding what to do

The assignment is quite vague and only specifies a very high-level description of the main use cases and thus there are many open many questions:
  • The assignment itself seems to be incomplete, the requirements ("Workshop Output") mention things that are not reflected by its other parts, namely the use case diagrams, and miss information necessary for the design. For example it mentioned that a visualisation of the "product" that users may design for themselves via the webapp under construction is very important and that the customer has therefore procured a modeling tool, which can render various visualizations from an XML input. But no UC contained an invocation of the visualization and there were no details regarding the communication means supported by the tool (webservice, something else, command line, or only GUI?). Therefore you have to fill the missing parts for yourself and replace the missing information with assumptions, perhaps adding the unclear element to the list of risks.
  • What about all the common parts of an enterprise web application that aren't mentioned explicitly in the assignment, like user registration/login, administration? 
Answer as per [1]: You should include them. So I added few JSPs and one or two manager EJBs without going into any detail, providing only a draft of the functionality to present that I'm aware of it.
  • What should be the level of detail of the Class and Component diagrams? (See [4] for the distinction between analytical, design and implementation classes and the corresponding level of detail/abstraction.) According to [1], quite a low one, but still on the design level.
    • My domain model provided with the assignment had 11 entities and the class diagram I've produced had 25 classes plus 8 views (namely JSPs). The level of detail of the class diagram was limited by the need to fit it on a single page, so it was on quite a high level, including the entities, important manager EJBs and few other classes representing important architectural concepts (a cache, an interceptor, a DAO for accessing an external tool,...).
    • The domain model maps to classes nearly directly, only with few small modifications and extensions (like introducing a parent class) , of course with the important added information of @Entity or @Embeddable, where appropriate.
  • Sequence Diagrams (SD) - shall the UI controller be included? Or even the JSPs? I did include both.
    • I kept SDs on a rather high level, using only as little special constructs as possible, like once a loop for expressing that something is done for each item of a collection, a ref to express the fact that an SD depends on the outcome of another SD, opt and alt, both once, where I wanted to depict an important operation that doesn't need to be performed and to stress  alternative paths.
  • Component Diagram (CD) - what is a component, i.e. the proper level of detail?
    • According to the instructions in the assignment itself: "Examples of components are Enterprise JavaBeanTM (EJBTM), servlets, ..."
    • According to [1] and [2], the CD should be more like a package diagram, i.e. components should be on a higher level than a single class (EJB/JSP/...), and that's what I did, creating components (as empty boxes) like "Design Logic and Persistence", "Customer Search", "Admin UI", "User Management and Persistence", "LDAP DAO", in total 14 of them in 3 layers plus 3 external systems, connecting them only with the uses relationship (---->), i.e. no interfaces, ports etc.
    • [1] suggests depicting security constraints on admin pages, more detailed JSPs etc., but this isn't in [2]; I followed [2]

Designing the solution

The key part is of course the solution itself, the various diagrams are only its visible manifestation, not the thing itself. You have to decide what parts should the system have, how will they interact, how will that ensure that the non-functional and functional requirements are met etc. Note down any risk you discover or an assumption that you make.

The diagrams

Once the proper level of detail is clear and you have the solution in you mind, it isn't too difficult to draw the class, component, sequence (or collaboration), and deployment diagrams. You may be sometimes unsure how to depict something while keeping the same level of detail, so a good UML book is handy. A good UML editor is also necessary. I used Visual Paradigm (Modeller Edition), which drove me nuts sometimes, being more of a hindrance than a help, but it was certainly better than Paintbrush.

The deployment: Resource estimation and hardware selection

Estimating the resources needed by the application and proposing a hardware providing them was perhaps the most difficult and time-consuming task for me, because all of that was completely new to me. Actually it isn't that important part of the assignment, you can perhaps just guess something and do not need to propose any particular hardware (as H. Sheil says in [2] at the page 7, no significant marks are allocated to this), but for me the main purpose of the certification was to learn.

The important questions here are:
  • How to estimate the resource requirements?
  • Should we consider horizontal SW scaling, namely multiple AS instances on a single physical node?
  • What HW and deployment environment (physical, virtual dedicated server(s), a cloud) to choose?

How to estimate the resource requirements?

How to assess the HW necessary? Is there enough input information? I think it isn't, also, you can't really know the real resource requirements without a proper load testing. But you can and should perform an "informed guess".
  • See an explanation why you can't really estimate the HW in a JavaRanch forum. And a disagreeing follow-up: "Yes. The exam specifies the no. of concurrent users, the required up time or words to that effect. At least mine did. The size of a typical request can be reasonably guessed from the estimated maximum size of the web page in the application . The tps [transactions per second] can be derived from the above two."
I had only a little idea of what the resource requirements of an enterprise web application may be, even knowing the availability requirements and number of concurrent users from the assignment. The requirements of each application are of course different but I wanted to find out at least some limits within which my estimate should be, and fortunately I had competent colleagues willing to help (thanks you, David, Alda and Mishanek!). So below are two examples that can give you an idea what you may or may not need.
Example 1: A very high-traffic enterprise web application
The application is a J2EE learning management software that must support 2000 concurrent users with the availability of 24/7. It's use of the DB is far from being efficient and thus the DB HW must be very powerful. The infrastructure is:
  • 2 physical Websphere nodes (and it's been considered to have 2 AS instances on each)
    • 4 GB heap (this proved to be enough; 2GB are left for the OS)
    • can use up to 8 virtual processors
    • no session failover due to inability of the JEE application itself
    • Scalability: WebSphere needs for double users approx. 2.5 increase in CPU (verified in this case, likely not generally applicable)
    • Websphere itself can scale nearly linearly with factor 0.9 - 0.95 but it also depends a lot on the application
  • Database
    • master-slave with automatic fail-over
    • 32 GB RAM
    • 24 (virtual) processors
Example 2: Sizing HW for the Liferay Portal
Liferay Portal for 5000 registered users with 500 of them accessing it concurrently would need an application server with 4GB memory and a 3GHz quad-core Xeon processor; for the database, a server below 4GB/1-quad 3GHz would suffice.
Example 3: Recommended minimum for a production WebSphere AS
IBM recommends: "Typical deployments of IBM WebSphere Application Server v7.0 require at least a High-CPU Medium instance to have access to enough physical memory and computing power. A small instance type is sufficient for development, single-user purposes only." That means 1.7GB RAM and 2 (virtual) cores each with the power of 2.5-3.0 GHz 2007 Opteron or Xeon processor.

Because this is the minimum production configuration to run WebSphere, we can take is as the minimum configuration for any "standard" Java EE web application.
Conclusion
Given the examples above, other information, and my requirement for 200 concurrent users doing no resource-intensive actions and 10h/day availability,  I believe one 4GB, 2-core 2.5GHz machine running both a single AS and the DB would be able to satisfy the (rather friendly) performance requirements in my assignment. If running on Amazon EC2 with their great SLA, it would be also available enough with future scalability possible either by increasing the resources or by  splitting the DB out of the machine - but such a single machine solution doesn't sound enough "enterprisy" so I've chosen a more classical one with multiple but weak (not to waste money) virtual machines from Amazon EC2.

Quote: "Oftentimes installations find that their database server runs out of capacity much sooner than the WebLogic Server does. You must plan for a database server that is sufficiently robust to handle the application. Typically, a good application will require a database three to four times more powerful than the application server hardware." [the source is unknown]

Should we consider running multiple AS instances on a single physical node?

"It is usually sufficient to create a single server instance on a machine, since the Application Server and accompanying JVM are both designed to scale to multiple processors. However, it can be beneficial to create multiple instances on one machine for application isolation and rolling upgrades." [Sun Java System Application Server 9.1 Deployment Planning Guide, p22]

"2 processes can utilize CPU and memory better than 1 (rule of thumb: 1 instance for every 2 CPUs)" [I haven't noted down the source of this advice :(]

I used a low-resource machines so running multiple SW instances was not an option for me anyway.

Update: A recent blog comments on 64b JVM issues and recommends to use several 2GB instances instead of one multi-GB one.

What HW and deployment environment to choose?

To simplify my life, I didn't consider all the applicable hardware that you can buy but only the offerings of a few providers of physical and virtual server hosting and cloud environments.

You may choose on of three deployment environments (provided that you don't want to run your own one):
  1. Hire a physical server - the most expensive but may be more cost-effective in the long-term, especially if there is permanent high load
  2. Hire a Virtual Private Server (VPS) - the virtual servers offerings have usually rather low resources (such as 0.5-1.5 GB, ~ 1 GHz) and are thus rarely suitable for Java EE
  3. Hire a virtual server running in a cloud - very flexible (easy addition of a new instance, scale an instance up/down only with a restart), may provide very good SLA, but the final price may be much higher than the price of the virtual server itself due to various small fees that accumulate a lot. See Stack Overflow: Is the Cloud ready for an Enterprise Java web application? Seeking a JEE hosting advice.
Let's have a look at some of the physical and VPS offerings:
  • ServInt VPS: Unlimited VPS: 2GB, RADI 10, $129/m, or Super VPS with 4GB but still only a 1 single-core CPU
  • AbcHost.cz - lease and hosting of a dedicated physical server, including
    • 2*dual core AMD Opteron 2,6GHz, 8GB RAM - 4800 CZK/$230 (Sun Fire X2200 M2 with Opteron 2218)
    • 2*quad core AMD Opteron 2,2GHz, 8GB RAM - 5800 CZK/$290 (Sun Fire X2200 M2 with Opteron 2354)
    • BM xSeries 335 1U server,  CPU Intel Xeon 2,4 GHz, 2GB RAM, 2*36GB HDD@10k w/ RAID 1 (mirroring) - 2890CZK/$146
    • IBM xSeries 335 1U server, CPU Intel Xeon 2,8 GHz (otherwise as above) - 3190CZK/$155
    • Connectivity 100 Mbps
  • Rackspace as a dedicated server provider:
    • Basic One dedicated server: 1xDual-Core AMD Opteron, 3.5GB RAM, 2x250GB, 7.2K, SATA (RAID1), 2TB Bandwidth - $419/month
    • Basic Two dedicated server: 1xQuad-Core AMD Opteron, 8GB RAM, 3x250GB, 7.2K, SATA (RAID5), 2TB bandwith - $529/mo
  • VPS Performance Comparison (11/2009)
And some cloud offerings:
  • Rackspace as a cloud VPS provider:
    • Rackspace uses Xen, RAID10, every cloud server gets 4 virtual CPUs, 64b. The virtual CPU is *perhaps* an equivalent of Opteron 2347 HE, i.e. 2GHz
    • A basic VPS: 1024 MB RAM, 40 GB HDD ~ $44/mo (the cost of data transfer is perhaps negligible)
    • A more powerful VPS: 4 GB RAM, 160 GB HDD -  $175/mo
    • Rackspace vs. Amazon EC2:
      • (According to Rackspace): the server is persistent, RAID10 storage; EC2 is cheaper but RS has higher performance & higher disk throughput and thus it pays off
        • "Most common VPS offerings only offer the equivalent of 128MB up to 512MB RAM at most. With most VPS offerings, you are not guaranteed the resources that you are paying for,.."
        • "Cloud Server host machines have dual quad core processors. Each Cloud Server is assigned four virtual cores and the amount of CPU cycles allocated to these cores is weighted based on the size of the Cloud Server. For example a 4G Cloud Server will have twice the weight of a 2G Cloud Server."
      • Rackspace Cloud Servers versus Amazon EC2: Performance Analysis at The Bitsource
      • Some negative experience with Rackspace (in the comment #1) - poor support, problems with crashes and a "totally dysfunctional staff and broken support system"
      • Why Mixpanel migrated from Rackspace to Amazon (11/2010) - EBS more powerful and flexible than disks, variety of instance types, likely less outages, (much) better control tools, easier backups thanks to no 2G limit, pricing (reserved and bidded instances, A. constuntly  reduces its prices); main reason: "Amazon just iterates on their product faster than anyone else and has the best one.".
  • Amazon EC2
    • VPS instances of interest:
      • High-CPU Medium: 1.7GB RAM, 2 virtual cores each w/ 2.5 EC CU; reserved: ~ $96/month (plus traffic etc.)
      • Large instance: 7.5GB, 4 CU (2 virtual cores with 2 CU each); reserved: ~ $190/month (plus traffic etc.)
      • Notice that you can either pay standard hour rates or "reserve" the instance for  a 1 or 3 years fee to get lower hourly rates (which I've done when computing the price per month)
    • 1 EC CU (compute unit) = 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor
    • I/O shared, two priorities: medium or high (i.e. better I/O access)
    • The instances are not persistent, any changes to the filesystem will be lost after a restart, you must use Amazon EBS for a permanent storage (and an extra fee)
    • Additional advantages/offerings (for extra money):
      • Elastic Load Balancing => not necessary to include a load balancer of my own
      • Amazon Relational Database Service (RDS) [beta] = MySQL 5.1 running in its own instance + advantages: auto backup, easy storage/mem/cpu scaling [restart required] - e.g. Large DB Instance: (7.5 GB memory, 4 ECUs ~ 2 cores 2GHz each, 64b) . High Availability is being prepared. Check the Amazon RDS DB Instance Sizing Guide.
      • ...
    • Interesting
      • JBoss middleware for EC2 (beta): JBoss Enterprise Application Platform is built on open standards and integrates JBoss Application Server, Clustering, Cache, Messaging, Hibernate, Seam and JBoss Transactions JTA
      • Websphere at EC2: "Typical deployments of IBM WebSphere Application Server v7.0 require at least a High-CPU Medium instance to have access to enough physical memory and computing power. A small instance type is sufficient for development, single-user purposes only." A note from Amazon: "The WebSphere Application Server AMI is 32 bit and works with Small and High-CPU Medium EC2 instance sizes. "
      • MySQL on Amazon EC2: Articles, Blogs, Docs, FAQs
      • MySQL master-slave setup for EC2 - but "We currently don’t automatically promote the slave to master as this is a very delicate operation. Gotta ensure the master is really not servicing requests first, etc."
      • JBoss@EC2 perfromance problems discussion (using a large instance)
Given these offerings, I've opted for Amazon EC2 and its two medium high-cpu instances for the AS and a large instance for the DB, assuming some secure communication channel (e.g. via VPN) with the external back-end enterprise systems used by the application.

Resources:

[1] A presentation of Humphrey Sheil (one of the major figures behind SCEA 5): scea-part-two-2009-bof.pdf - a MUST READ and re-read a couple of times; 27 slides only

[2] Chapter 9 of Sun Certified Enterprise Architect for Java™ EE Study Guide, Second Edition, M. Cade, H. Sheil, Prentic Hall 2010, ISBN 0-13-148203-3 - a completely worked-out SCEA assignment whose parts are in [1]; 11 pages. Note: Also in safaribooksonline.com, where you can  get 10-days trial access allowing you to read up to 100 pages from not more than 10 books

[3] JavaRanch SCEA forum - a really valuable source

[4] UML 2 and the Unified Process: Practical Object-Oriented Analysis and Design (2nd Edition)

[5] Architecting Enterprise Solutions: Patterns for High-Capability Internet-based Systems

And even more resources

that you perhaps don't need and don't want to study (which I've done neither, I only looked into them here or there)

The risks and assumptions

You should have collected a list of risks and assumptions already during the design process. Most, if not all, application will share some risks common to all (enterprise) web application (security, performance etc.) but usually you will also have few specific ones. Select the three most likely and dangerous ones, decide, how to mitigate them.

Step 3: The essay

At present you have to complete the essay before submitting the assignment, but you should do it only after actually finishing the assignment. That's because even though the questions are generic, the answers are closely related to your particular solution.

The essay consists of 8 questions and you have 90 minutes to answer them, few sentences for each. For me the time was just right (though usually I tend to finish well ahead of a limit :-)). If you are well prepared, it is not too difficult. The best preparation for the essay is to consider
  1. How and why you do meet the individual non-functional requirements (performance, security, ...)
  2. What were the individual technological choices you have done and the other possible alternatives and why you've rejected them. You want to be an architect so you must prove that you are able to come up with alternative solutions and select the best one based on sound arguments. Consider the choices pertaining to:
    1. The non-functional requirements in your solution
    2. The communication with external systems
    3. The patterns that you applied
    4. Frameworks that you applied (even beyond the level of detail of the assignment deliverables, for example what presentation framework you've selected)
Most questions ask you what choice regarding a particular design issue you have made, what were the alternatives, and why you have rejected them. Some may ask also about the risks and mitigation strategies and what would you do if some non-functional requirement changes considerably (such as having few times more users than expected).

To give you a better idea of what is the essay like, I've made up two questions and answers pretty similar to what you might encounter.

Example question 1

Question What framework or technology have you chosen for the persistence layer? What alternatives did you consider and why have you rejected them?

Answer I considered JPA, JDBC, MyBatis (being on a level of abstraction somewhere between JDBC and JPA), and a combination of JPA and JDBC. I've decided to use JPA for most of the application mostly for its ease of use and thus higher productivity and better maintainability and to use JDBC for the reporting module because it has to process large quantities of data and the JPA's overhead like conversion into objects would be too much while its benefits here are negligible. I've rejected MyBatis because the JPA's level of abstraction is more suitable and I don't need to have control over everything, also JPA is a standard and thus more people know it.

Example question 2

Question The customer decides to expand into China and thus the application's availability must be close to 24/7 instead of 8am - 5 pm Pacific Time because the Chinese work non-stop. How would you handle that? Justify your decision.

Answer Each element is already redundant and therefore there is no single point of failure, however I'd add another application server node to make sure that the application can continue normally if one of the ASs fails (if there were only two, the remaining would suddenly need to handle 200% load, which could easily crash it). I'd also extract the business rules hardcoded in the Java code of the DiscountComputation module and move them into the database (for example in the form of Groovy code snippets) so that they can be changed without restarting the application, because currently the frequent rules changes are the main reason for restarts.

Conclusion

The exam takes time and requires some experience, but it is a good motivation and opportunity to learn, both theoretically and practically. It doesn't prove that you are an architect, but it shows, that you know enough to be able to communicate with one and perhaps once become a real architect, and of course it also shows that you are a determined person :-)

Tags: architecture java


Copyright © 2024 Jakub Holý
Powered by Cryogen
Theme by KingMob