How to Add a MapRed-Only Node to Hadoop
- Do not start the datanode service on the node
- If you've configured Hadoop to allow only the nodes listed in its whitelist files to connect, add the node to the file pointed to by the property mapred.hosts but not to the file in dfs.hosts.
- Otherwise add the node to the DFS blacklist, i.e. the file pointed to by the property dfs.hosts.exclude, and execute hadoop dfsadmin -refreshNodes on the namenode to apply the change.
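The choice between the two files in the steps above can be sketched in Java. This is a minimal sketch, assuming the values of dfs.hosts, mapred.hosts and dfs.hosts.exclude have already been read from the Hadoop configuration into a plain map of local file paths; the class and method names are hypothetical. It only edits the hosts file: it doesn't start or stop any services, and it doesn't run the refresh command for you.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: registers a MapRed-only node in the right hosts file.
class MapRedOnlyNode {

    // conf maps property names (dfs.hosts, mapred.hosts, dfs.hosts.exclude) to file paths.
    // Returns the hosts file that now lists the node.
    static Path register(Map<String, String> conf, String host) throws IOException {
        String dfsWhitelist = conf.get("dfs.hosts");
        // Whitelist mode: allow the node for MapReduce only, never add it to dfs.hosts.
        // Otherwise: keep it out of HDFS by blacklisting it via dfs.hosts.exclude.
        String key = (dfsWhitelist != null && !dfsWhitelist.isEmpty()) ? "mapred.hosts" : "dfs.hosts.exclude";
        String target = conf.get(key);
        if (target == null || target.isEmpty()) {
            throw new IllegalStateException(key + " is not configured");
        }
        Path file = Paths.get(target);
        String content = Files.exists(file)
                ? new String(Files.readAllBytes(file), StandardCharsets.UTF_8)
                : "";
        for (String line : content.split("\\R")) {
            if (line.trim().equals(host)) {
                return file; // already listed, nothing to do
            }
        }
        // Make sure the new entry starts on its own line.
        String prefix = (content.isEmpty() || content.endsWith("\n")) ? "" : "\n";
        Files.write(file, (prefix + host + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hadoop-conf");
        Map<String, String> conf = new HashMap<>();
        conf.put("dfs.hosts.exclude", dir.resolve("dfs.exclude").toString());
        Path updated = register(conf, "mapred-only-01");
        System.out.println("Listed in " + updated);
        // Blacklist case: the namenode has to re-read its exclude file.
        System.out.println("Now run on the namenode: hadoop dfsadmin -refreshNodes");
    }
}
```

Registering the same host twice leaves the file unchanged, so the helper can be rerun safely.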
Most interesting links of July '12
Recommended Readings
Book Review: Implementation Patterns
Summary: Should you read the book? Yes, the chapter on principles and values is truly enlightening. The book in general contains pearls of wisdom hidden in the mud of "I know that already, man." I would thus recommend skimming through the book and reading only the pieces matching your level and needs.
The book seems to be targeted largely at Java beginners (especially the chapter on collections), going into details that are otherwise unnecessary, yet it contains much valuable advice, some of which can only be appreciated by somebody with multiple years of professional programming experience. It thus seems to me that the book isn't a perfect match for anybody, but everybody will find many useful ideas in it. It would best be split in two.
An experienced developer will already know many of the patterns, though it's perhaps useful to see them named, described explicitly and listed next to each other; it helps you to be more aware of what you do and why you do it.
I'd absolutely recommend that everybody read the chapter A Theory of Programming, which explains Kent's style of programming and the underlying key values of communication, simplicity and flexibility as well as the more concrete principles (local consequences, minimize repetition, logic and data together, symmetry, declarative expression, co-locating data and logic that change at the same rate). The rest of the book also contains valuable ideas that it would be a pity to miss. I list below some of those that I found particularly interesting.
Notify on Errors in a Log File with Zabbix 1.8
Testing Zabbix Trigger Expressions
How to Set JVM Memory for Clojure REPL in Emacs (clojure-jack-in, clojure-swank)
1. Clojure REPL Started for a Lein Project
Most interesting links of June '12
Recommended Readings
Development
Creating Custom Login Modules In JBoss AS 7 (and Earlier)
Serving Files with Puppet Standalone in Vagrant From the puppet:// URIs
Most interesting links of May '12
Recommended Readings
Bad Code: Too Many Object Conversions Between Application Layers And How to Avoid Them
The application The World of Thrilling Fashion (or WTF for short) collects and stores information about newly designed dresses and makes it available via a REST API. Every poor dress has to go through the following conversions before reaching a devoted fashion fan:
- Parsing from XML into an XML-specific XDress object
- Processing and conversion to an application-specific Dress object
- Conversion to MongoDB's DBObject so that it can be stored in the DB (as JSON)
- Conversion from the DBObject back to the Dress object
- Conversion from Dress to a JSON string
Uff, that's a lot of work! Each of the conversions is coded manually, and if we want to extend WTF to provide information about trendy shoes as well, we will need to code all of them again. (Plus a couple of methods in our MongoDAO, such as getAllShoes and storeShoes.) But we can do much better than that!
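The excerpt ends before the author's own solution, so the following is only one possible approach, not necessarily the one the post describes: a single reflection-based mapper between plain objects and Map<String, Object>, which has the same shape as a MongoDB document or a JSON object. Dress and Shoes here are hypothetical stand-ins for the WTF classes. With such a mapper, supporting shoes means writing the Shoes class, not another set of converters.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// One generic converter instead of hand-written toX()/fromX() methods per class.
class GenericMapper {

    // Only plain instance fields take part in the mapping.
    private static boolean mapped(Field f) {
        return !Modifier.isStatic(f.getModifiers()) && !f.isSynthetic();
    }

    // Object -> map of field name to value (the shape of a DB document or JSON object).
    static Map<String, Object> toMap(Object obj) throws ReflectiveOperationException {
        Map<String, Object> map = new LinkedHashMap<>();
        for (Field f : obj.getClass().getDeclaredFields()) {
            if (!mapped(f)) continue;
            f.setAccessible(true);
            map.put(f.getName(), f.get(obj));
        }
        return map;
    }

    // Map -> new instance of type; requires a no-arg constructor. Missing keys keep field defaults.
    static <T> T fromMap(Map<String, Object> map, Class<T> type) throws ReflectiveOperationException {
        Constructor<T> ctor = type.getDeclaredConstructor();
        ctor.setAccessible(true);
        T obj = ctor.newInstance();
        for (Field f : type.getDeclaredFields()) {
            if (!mapped(f) || !map.containsKey(f.getName())) continue;
            f.setAccessible(true);
            f.set(obj, map.get(f.getName()));
        }
        return obj;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Map<String, Object> doc = toMap(new Dress("Evening Gown", "Jane Doe"));
        Dress back = fromMap(doc, Dress.class);
        System.out.println(back.name + " by " + back.designer); // Evening Gown by Jane Doe

        Shoes shoes = new Shoes();
        shoes.model = "Stiletto";
        shoes.size = 38;
        System.out.println(fromMap(toMap(shoes), Shoes.class).size); // 38
    }
}

// Hypothetical domain classes: Shoes needs no conversion code of its own.
class Dress {
    String name;
    String designer;

    Dress() {}

    Dress(String name, String designer) {
        this.name = name;
        this.designer = designer;
    }
}

class Shoes {
    String model;
    int size;
}
```

This sketch handles flat objects whose stored values already have the right types. Nested objects, collections and type coercion (e.g. a JSON number arriving as a Long for an int field) are where real mapping libraries such as Jackson earn their keep.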
Beautiful Code: Simplicity Yields Power
In Simple Made Easy, Rich Hickey argues that mixing orthogonal concerns introduces unnecessary complexity and that we should keep them separate. This mixing sometimes occurs at such a basic level that we believe there is no other way to do it, an example being the interleaving of polymorphism and hierarchical namespacing represented by OO class hierarchies. Taking those "complected" concerns apart and dealing with them separately yields cleaner, simpler solutions, and sometimes also more powerful ones, because you are free to combine them as you need and not as the author decided.
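The excerpt stops before the post's own examples. As a rough illustration of separating polymorphism from class hierarchies, here is a Java approximation of Clojure's multimethods (defmulti/defmethod): the implementation is chosen by an arbitrary dispatch function over plain data, so no type hierarchy is involved and new cases can be added from anywhere. MultiFn, AREA and the shape maps are hypothetical names for this sketch, not a library API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// A tiny multimethod: the implementation is picked by an arbitrary dispatch
// function, not by the argument's class (loosely modelled on Clojure's defmulti).
class MultiFn<T, K, R> {
    private final Function<T, K> dispatch;
    private final Map<K, Function<T, R>> methods = new HashMap<>();

    MultiFn(Function<T, K> dispatch) {
        this.dispatch = dispatch;
    }

    // Registers the implementation used when dispatch(arg) equals key.
    MultiFn<T, K, R> addMethod(K key, Function<T, R> impl) {
        methods.put(key, impl);
        return this;
    }

    R invoke(T arg) {
        K key = dispatch.apply(arg);
        Function<T, R> impl = methods.get(key);
        if (impl == null) {
            throw new IllegalArgumentException("No method for dispatch value " + key);
        }
        return impl.apply(arg);
    }

    static double num(Map<String, Object> m, String k) {
        return ((Number) m.get(k)).doubleValue();
    }

    // Shapes are plain maps; "polymorphism" is dispatch on their "kind" entry.
    static final MultiFn<Map<String, Object>, Object, Double> AREA =
            new MultiFn<Map<String, Object>, Object, Double>(shape -> shape.get("kind"))
                    .addMethod("circle", s -> Math.PI * num(s, "r") * num(s, "r"))
                    .addMethod("rect", s -> num(s, "w") * num(s, "h"));

    public static void main(String[] args) {
        Map<String, Object> rect = new HashMap<>();
        rect.put("kind", "rect");
        rect.put("w", 2.0);
        rect.put("h", 3.0);
        System.out.println(AREA.invoke(rect)); // 6.0

        // Extending later, without touching existing code or any type hierarchy:
        AREA.addMethod("square", s -> num(s, "side") * num(s, "side"));
        Map<String, Object> sq = new HashMap<>();
        sq.put("kind", "square");
        sq.put("side", 4);
        System.out.println(AREA.invoke(sq)); // 16.0
    }
}
```

The dispatch function could just as well look at several fields at once, which a single-inheritance class hierarchy cannot express directly.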
Creating On-Demand Clusters in EC2 with Puppet
Most interesting links of April '12
Recommended Readings
- V. Duarte: Story Points Considered Harmful - Or why the future of estimation is really in our past... (also as 1h video) - a thoughtful and data-backed claim that there is a much cheaper way of estimating work throughput than estimating each story in story points (SP), namely simply counting the stories. Even though their sizes differ, over (not that much) longer periods, where it really matters, these differences even out. The author argues that estimating in number of stories provides the same reliability and benefits as SP and is much easier. (Keep in mind that estimation is just an attempt at predicting the future and humans have been shown to be terrible at that; why pretend that we can do it?) I'd recommend this to anybody doing Scrum or similar.
- M. Fowler: Test Coverage - it's obvious that increasing test coverage for the sake of test coverage is nonsense, but some people still need to be reminded of it :-). Fowler explains what the real benefit of test coverage measurement is and how to use it for good instead of for evil.
- Brian Marick: How to Misuse Code Coverage (pdf) - cited a lot by Fowler in his article, this is a really good paper. Marick has participated in the development of several code coverage tools and understands their limitations well. One of the key points is that code coverage tools can discover only one class of test weakness (not testing some paths through your code) but cannot discover that you are missing code you should have (e.g. when you check for only two of three possible return values). Thus the code coverage metric tells you "this code isn't well tested; are you sure you don't want to look into it more?" It's crucial not to write tests so as to increase the code coverage; instead, look at the code and improve the tests without any regard for coverage. You thereby decrease the likelihood of both classes of problems.
- A Year with MongoDB - Kiip found out that Mongo isn't the best choice for them (with 240GB, 500+ operations/s, 85M docs and their specific usage of the store) and migrated to a combination of Riak (a key-value store) and PostgreSQL. Some of the issues they hit are: slow counts and limit/offset queries due to the use of non-counting B-trees for indexing; memory management that could be more intelligent and better tuned to make sure the data needed is actually in RAM; no built-in support for compressing key names (their size adds up as they're repeated in each document; you have to compress them [user -> u etc.] in the client if you want to); limited concurrency due to a process-wide write lock (which becomes a problem if the writes aren't short enough relative to the number of ops/s, e.g. because the data isn't in RAM and/or the query is complicated); safe settings (waiting for a write to finish, ...) off by default; and offline-only table compaction (without it, disk usage grows unbounded). The lesson learned for me: know your storage, its weaknesses and intended way of usage, and make sure it matches your needs.
- Rudolf Winestock: The Lisp Curse - Lisp's expressive power is actually a cause of its lack of momentum because it's so easy to implement anything that people have no need to join forces and thus there are many half-baked ("works-for-me") solutions for anything - but no complete, generally accepted one. An interesting essay. "Lisp is so powerful that problems which are technical issues in other programming languages are social issues in Lisp."
- Understanding JDBC Internals & Timeout Configuration - the article itself could have been written better, but it conveys the important point that configuring timeouts for JDBC isn't trivial: they need to be set correctly at several levels, and without a socket timeout (set in a driver-specific way) a JDBC call can hang forever if the DB becomes unreachable due to a network or system failure.
- Circos: An Amazing Tool for Visualizing Big Data - this article is interesting primarily for its combination of the Google Analytics API, Neo4J and an unusual data visualization with circular graphs.
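To make the story-counting idea from Duarte's article concrete, here is a minimal forecasting sketch in Java: it averages the number of stories finished per sprint and divides the remaining backlog by that throughput. The sprint history is made-up data, and the plain average is an assumption of this sketch, not necessarily the method the article uses.

```java
import java.util.List;

// Forecast by counting stories instead of summing story points.
class StoryCountForecast {

    // Average number of stories completed per sprint.
    static double throughput(List<Integer> storiesDonePerSprint) {
        if (storiesDonePerSprint.isEmpty()) {
            throw new IllegalArgumentException("Need at least one finished sprint");
        }
        double total = 0;
        for (int done : storiesDonePerSprint) {
            total += done;
        }
        return total / storiesDonePerSprint.size();
    }

    // Sprints needed to finish the remaining stories, rounded up.
    static int sprintsLeft(int remainingStories, List<Integer> history) {
        double perSprint = throughput(history);
        if (perSprint <= 0) {
            throw new IllegalArgumentException("No stories finished yet, cannot forecast");
        }
        return (int) Math.ceil(remainingStories / perSprint);
    }

    public static void main(String[] args) {
        List<Integer> history = List.of(7, 9, 6, 10, 8); // hypothetical sprint results
        System.out.println(throughput(history));      // 8.0
        System.out.println(sprintsLeft(40, history)); // 5
    }
}
```

No story was ever sized here; the differences in story size are assumed to even out over the sprint history, which is exactly the article's claim.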
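Marick's point about missing code can be shown with a small hypothetical Java method. Calling it with -1 and 1 gives it 100% line and branch coverage, yet it is wrong for 0, because the code for that case was never written and so there is nothing for a coverage report to flag.

```java
// Hypothetical example of "missing code" that coverage cannot reveal.
class CoverageBlindSpot {

    // Meant to describe the result of a compareTo-style call (negative, zero or positive),
    // but the author forgot the zero case.
    static String describe(int cmp) {
        if (cmp < 0) {
            return "less";
        }
        return "greater"; // wrong for cmp == 0; there is no unexecuted branch to report
    }

    // The fix is new code, not just a new test.
    static String describeFixed(int cmp) {
        if (cmp < 0) {
            return "less";
        }
        if (cmp == 0) {
            return "equal";
        }
        return "greater";
    }

    public static void main(String[] args) {
        // These two calls alone execute every line and both branches of describe()...
        System.out.println(describe(-1)); // less
        System.out.println(describe(1));  // greater
        // ...yet the case nobody wrote code for is still handled wrongly:
        System.out.println(describe(0));      // greater
        System.out.println(describeFixed(0)); // equal
    }
}
```

Only reading the code with the three possible inputs in mind reveals the gap, which is why Marick recommends improving tests by looking at the code rather than at the coverage number.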
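Kiip's remark that key names have to be compressed in the client can be illustrated with a small Java sketch that renames long field names to short ones before a document is stored and back after it is read. Documents are modelled as plain Map<String, Object> rather than any specific MongoDB driver type, and the aliases (user -> u etc.) are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Client-side key-name compression for document stores that repeat keys in every document.
class KeyCompressor {
    private final Map<String, String> toShort = new HashMap<>();
    private final Map<String, String> toLong = new HashMap<>();

    // Registers a long-name -> short-name alias; short names must be unique.
    KeyCompressor alias(String longKey, String shortKey) {
        if (toLong.containsKey(shortKey)) {
            throw new IllegalArgumentException("Short key already used: " + shortKey);
        }
        toShort.put(longKey, shortKey);
        toLong.put(shortKey, longKey);
        return this;
    }

    // Before storing. Keys without an alias are kept as they are
    // (they must not clash with a short name; this sketch doesn't check that).
    Map<String, Object> compress(Map<String, Object> doc) {
        return rename(doc, toShort);
    }

    // After reading.
    Map<String, Object> expand(Map<String, Object> doc) {
        return rename(doc, toLong);
    }

    // Top-level keys only; nested documents are not renamed.
    private static Map<String, Object> rename(Map<String, Object> doc, Map<String, String> names) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            out.put(names.getOrDefault(e.getKey(), e.getKey()), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        KeyCompressor keys = new KeyCompressor().alias("user", "u").alias("createdAt", "c");
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("user", "alice");
        doc.put("createdAt", 1335830400L);
        Map<String, Object> stored = keys.compress(doc);
        System.out.println(stored);              // {u=alice, c=1335830400}
        System.out.println(keys.expand(stored)); // {user=alice, createdAt=1335830400}
    }
}
```

The price is that every query and index definition must also use the short names, which is part of why built-in support would have been preferable.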
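The timeout levels from the JDBC article can be sketched in Java as follows: a login timeout for establishing the connection, a query timeout per statement and, the one that is easy to forget, a socket timeout that has to be set in a driver-specific way. The property names below are those of MySQL Connector/J (connectTimeout and socketTimeout, both in milliseconds); other drivers use different names and units, and the concrete values are only placeholders. The JDBC URL is passed on the command line; without one, the example just prints the settings.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

class JdbcTimeouts {

    // Socket-level timeouts are driver-specific: these are MySQL Connector/J property names, in ms.
    static Properties mysqlProps(String user, String password, int connectTimeoutMs, int socketTimeoutMs) {
        Properties p = new Properties();
        p.setProperty("user", user);
        p.setProperty("password", password);
        p.setProperty("connectTimeout", Integer.toString(connectTimeoutMs));
        p.setProperty("socketTimeout", Integer.toString(socketTimeoutMs));
        return p;
    }

    static void runQuery(String url, Properties props, String sql) throws SQLException {
        DriverManager.setLoginTimeout(10); // JDBC level, seconds; drivers may not honour it
        try (Connection con = DriverManager.getConnection(url, props);
             Statement st = con.createStatement()) {
            st.setQueryTimeout(5); // statement level, seconds; enforced by the driver
            try (ResultSet rs = st.executeQuery(sql)) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        Properties props = mysqlProps("app", "secret", 3_000, 60_000);
        if (args.length == 0) {
            System.out.println("connectTimeout=" + props.getProperty("connectTimeout")
                    + " ms, socketTimeout=" + props.getProperty("socketTimeout") + " ms");
            return;
        }
        // e.g. a jdbc:mysql://... URL; the MySQL driver must be on the classpath.
        runQuery(args[0], props, "SELECT 1");
    }
}
```

The query timeout alone doesn't help when the network dies mid-request, which is the article's main point; only the socket timeout bounds a read that never gets an answer.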
Exposing Functionality Over HTTP with Groovy and Ultra-Lightweight HTTP Servers
I used Groovy for its high productivity, especially regarding JDBC - with GSQL I needed only two lines to get the data from a DB in a user-friendly format.
My ideal solution would make it possible to start the server with support for HTTPS and authorization and to declare handlers for URLs programmatically, in a single file (Groovy script), in just a few lines of code. (Very similar to the Gretty solution below, plus the security stuff.)
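For comparison, the JDK itself ships a lightweight HTTP server (com.sun.net.httpserver, available since Java 6) that covers most of these wishes and can be used from a Groovy script as well as from Java. This minimal Java sketch declares a handler for one URL and protects it with HTTP Basic authentication; the path, realm and credentials are placeholders. HTTPS would use HttpsServer with an HttpsConfigurator and an SSLContext instead, which needs a keystore and is left out here.

```java
import com.sun.net.httpserver.BasicAuthenticator;
import com.sun.net.httpserver.HttpContext;
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

class TinyServer {

    // Starts a server on the given port (0 = any free port) with one protected URL.
    static HttpServer start(int port, String user, String password) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        // One line per URL: path -> handler.
        HttpContext ctx = server.createContext("/hello", exchange -> respond(exchange, 200, "Hello!\n"));
        // HTTP Basic authentication for this context; unauthenticated requests get 401.
        ctx.setAuthenticator(new BasicAuthenticator("demo") {
            @Override
            public boolean checkCredentials(String u, String p) {
                return user.equals(u) && password.equals(p);
            }
        });
        server.start();
        return server;
    }

    static void respond(HttpExchange ex, int status, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        ex.sendResponseHeaders(status, bytes.length);
        try (OutputStream out = ex.getResponseBody()) {
            out.write(bytes);
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(8080, "admin", "changeit"); // placeholder credentials
        System.out.println("Listening on http://localhost:" + server.getAddress().getPort() + "/hello");
    }
}
```

Try it with curl -u admin:changeit http://localhost:8080/hello; without the -u option the server answers 401.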