How to Add MapRed-Only Node to Hadoop

I was surprised not to be able to google an answer to this so I want to record my findings here. To add (a.k.a. commission) a node to a Hadoop cluster that should be used only for map-reduce tasks and not for storing data, you have multiple options:
  1. Do not start the datanode service on the node
  2. If you've configured Hadoop to allow only nodes listed in its whitelist files to connect, add the node to the file pointed to by the property mapred.hosts but not to the file pointed to by dfs.hosts.
  3. Otherwise add the node to the DFS blacklist, i.e. the file pointed to by the property dfs.hosts.exclude, and execute hadoop dfsadmin -refreshNodes on the namenode to apply it.
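
Option 3 could be scripted roughly as follows. This is only a minimal sketch, assuming the exclude file is a plain list of hostnames, one per line; the helper names, the file path and the hostname are made up for illustration, and only the hadoop dfsadmin -refreshNodes command comes from the steps above.

```python
import subprocess
from pathlib import Path


def add_to_exclude_file(exclude_file: Path, hostname: str) -> bool:
    """Append hostname to the DFS exclude file unless it is already listed.

    Returns True if the file was modified, False otherwise.
    """
    text = exclude_file.read_text() if exclude_file.exists() else ""
    if hostname in text.split():
        return False
    # Make sure the new entry starts on its own line.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with exclude_file.open("a") as f:
        f.write(f"{separator}{hostname}\n")
    return True


def refresh_nodes() -> None:
    """Run the refresh command from option 3; meant to be run on the namenode."""
    subprocess.run(["hadoop", "dfsadmin", "-refreshNodes"], check=True)


# Example (hypothetical path and hostname; use the file that your
# dfs.hosts.exclude property actually points to):
# if add_to_exclude_file(Path("/etc/hadoop/conf/dfs.exclude"), "worker42.example.com"):
#     refresh_nodes()
```

The helper is idempotent, so running it twice for the same node leaves the file unchanged and skips the refresh.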

Continue reading →

Most interesting links of July '12

A brief one due to (thanks to?) holiday and an accompanying surprising lack of enthusiasm for the technical stuff.

Recommended Readings


Continue reading →

Book Review: Implementation Patterns

Implementation Patterns, Kent Beck, 2007, ISBN 0321413091.


Summary: Should you read the book? Yes, the chapter on principles and values is truly enlightening. The book in general contains pearls of wisdom hidden in the mud of "I know that already, man." I would thus recommend skimming through the book and reading only the pieces matching your level and needs.

The book seems to be targeted largely at Java beginners (especially the chapter on collections), going into otherwise unnecessary details, yet it contains much valuable advice, some of which can only be appreciated by somebody with multiple years of professional programming experience. It thus seems to me that the book isn't a perfect match for anybody but everybody will find many useful ideas in it. It would best be split in two.

An experienced developer will already know many of the patterns, though it's perhaps useful to see them named, described explicitly and listed next to each other - it helps you to be more aware of what you do and why you do it.

I'd absolutely recommend that everybody read the chapter A Theory of Programming, which explains Kent's style of programming and the underlying key values of communication, simplicity and flexibility as well as the more concrete principles (local consequence, minimize repetition, logic and data together, symmetry, declarative expression, co-locating data and logic having the same rate of change). The rest of the book also contains valuable ideas that it would be a pity to miss. I list below some of those that I found particularly interesting.

Continue reading →

Notify on Errors in a Log File with Zabbix 1.8

Situation: You want to get notified when a log entry marked ERROR appears in a log file. You want the corresponding trigger to reset back to the OK state if there are no more errors for 10 minutes. (This post assumes some familiarity with the Zabbix UI.)


Continue reading →

Testing Zabbix Trigger Expressions

When defining a Zabbix (1.8.2) trigger, e.g. to inform you that there are errors in a log file, how do you verify that it is correct? As somebody recommended in a forum, you can use a Calculated Item with a similar expression (the syntax is a little different from that of triggers). Unlike with triggers, the value of a calculated item is easy to see and the historical values are stored so you can check how it evolved. If your trigger expression is complex then you can create multiple calculated items, one for each subexpression.


Continue reading →

How to Set JVM Memory for Clojure REPL in Emacs (clojure-jack-in, clojure-swank)

How to increase the heap size for a Clojure REPL started from Emacs, either standalone or as a part of a project.

1. Clojure REPL Started for a Lein Project


Continue reading →

Most interesting links of June '12

Recommended Readings

Development
Continue reading →

Creating Custom Login Modules In JBoss AS 7 (and Earlier)

JBoss AS 7 is neat but the documentation is still quite lacking (and the error messages are not as useful as they could be). This post summarizes how you can create your own JavaEE-compliant login module for authenticating users of your webapp deployed on JBoss AS. A working elementary username-password module is provided.


Continue reading →

Serving Files with Puppet Standalone in Vagrant From the puppet:// URIs

If you use Puppet in the client-server mode to configure your production environment then you might want to be able to copy & paste from the prod configuration into Vagrant's standalone Puppet configuration to test stuff. One of the key features necessary for that is enabling file serving via "source => 'puppet:///path/to/file'". In the client-server mode the files are served by the server; in the standalone mode you can configure Puppet to read from a local (likely shared) folder. We will see how to do this.


Continue reading →

Most interesting links of May '12

This was a rich month, bringing some hope for ORM, providing a peep-hole into the bright and awesome future with in-browser video and other cool web-stuff presented at WebRebels 2012 and IDEs providing immediate feedback and visualisation. There were valuable articles about simplicity and quality in software and good talks about the lean startup (i.e. enabling innovation) and other topics.

Recommended Readings


Continue reading →

Bad Code: Too Many Object Conversions Between Application Layers And How to Avoid Them

Have you ever worked with an application where you had to copy data from one object to another and another and so on before you actually could do something with it? Have you ever written code to convert data from XML to a DTO to a Business Object to a JDBC Statement? Again and again for each of the different data types being processed? Then you have encountered an all too common antipattern of many "enterprise" (read "overdesigned") applications, which we could call The Endless Mapping Death March. Let's look at an application suffering from this antipattern and how to rewrite it in a much nicer, leaner and easier to maintain form.

The application, The World of Thrilling Fashion (or WTF for short), collects and stores information about newly designed dresses and makes it available via a REST API. Every poor dress has to go through the following conversions before reaching a devoted fashion fan:

  1. Parsing from XML into an XML-specific XDress object
  2. Processing and conversion to an application-specific Dress object
  3. Conversion to MongoDB's DBObject so that it can be stored in the DB (as JSON)
  4. Conversion from the DBObject back to the Dress object
  5. Conversion from Dress to a JSON string


Uff, that's a lot of work! Each of the conversions is coded manually and if we want to extend WTF to provide information also about trendy shoes, we will need to code all of them again. (Plus a couple of methods in our MongoDAO, such as getAllShoes and storeShoes.) But we can do much better than that!
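
To make the cost concrete, here is a rough sketch of what the five hand-written conversions might look like. It is only an illustration of the antipattern, written in Python for brevity rather than in the application's own language: XDress and Dress are the names used above, while the fields (name, price), the function names and the plain dict standing in for MongoDB's DBObject are assumptions made up for the example.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class XDress:
    """XML-specific object (step 1); values are still raw text."""
    name: str
    price: str


@dataclass
class Dress:
    """Application-specific object (step 2)."""
    name: str
    price: float


def parse_xml(xml: str) -> XDress:
    # Step 1: XML -> XDress
    root = ET.fromstring(xml)
    return XDress(name=root.findtext("name"), price=root.findtext("price"))


def to_dress(x: XDress) -> Dress:
    # Step 2: XDress -> Dress
    return Dress(name=x.name, price=float(x.price))


def to_db_object(d: Dress) -> dict:
    # Step 3: Dress -> DB document (a dict stands in for DBObject here)
    return {"name": d.name, "price": d.price}


def from_db_object(doc: dict) -> Dress:
    # Step 4: DB document -> Dress
    return Dress(name=doc["name"], price=doc["price"])


def to_json(d: Dress) -> str:
    # Step 5: Dress -> JSON string
    return json.dumps({"name": d.name, "price": d.price})


xml = "<dress><name>Little Black Dress</name><price>99.5</price></dress>"
result = to_json(from_db_object(to_db_object(to_dress(parse_xml(xml)))))
print(result)  # {"name": "Little Black Dress", "price": 99.5}
```

Every field is spelled out five times, and supporting shoes would mean writing five more such functions plus two more classes.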


Continue reading →

Beautiful Code: Simplicity Yields Power



In Simple Made Easy, Rich Hickey argues that mixing orthogonal concerns introduces unnecessary complexity and that we should keep them separate. This mixing sometimes occurs on such a basic level that we believe that there is no other way to do it, an example being the interleaving of polymorphism and hierarchical namespacing represented by OO class hierarchies. Taking those "complected" concerns apart and dealing with them separately yields cleaner, simpler solutions and sometimes also more powerful ones because you are free to combine them as you need and not as the author decided.


Continue reading →

Creating On-Demand Clusters in EC2 with Puppet

For a recent project I needed to be able to start on-demand clusters of machines in Amazon EC2. We needed each instance in a cluster to allow SSH and sudo access for all team members and to install and configure the software appropriate for that cluster ("database" node or "testclient" node).

Continue reading →

Most interesting links of April '12

Recommended Readings


Continue reading →

Exposing Functionality Over HTTP with Groovy and Ultra-Lightweight HTTP Servers

I needed a quick and simple way to enable some users to query a table and figured out that the easiest solution was to use an embedded, lightweight HTTP server so that the users could type a URL in their browser and get the results. The question was, of course, which server is best for it. I'd like to summarize here the options I've discovered - including Gretty, Jetty, Restlet, Jersey and others - and their pros & cons together with complete examples for most of them. I've purposely avoided various frameworks that might support this easily, such as Grails, because they didn't feel really lightweight and I needed only a very simple, temporary application.

I used Groovy for its high productivity, especially regarding JDBC - with GSQL I needed only two lines to get the data from a DB in a user-friendly format.

My ideal solution would make it possible to start the server with support for HTTPS and authorization and declare handlers for URLs programmatically, in a single file (Groovy script), in just a few lines of code. (Very similar to the Gretty solution below + the security stuff.)


Continue reading →

Copyright © 2026 Jakub Holý