Migrating from JRoller to Wordpress

This post describes how to migrate a blog from JRoller.com to WordPress.com. The steps are:
  1. Backup JRoller via the util by La tortue cynique
  2. Export from WP
  3. Convert JRoller to a fragment of the WP format
  4. Add proper header and footer to the generated WP import file
  5. [optional] download images, perhaps upload them somewhere and modify URLs accordingly
  6. Import it into WP
  7. Check formatting, add tags...

0. Introduction

I decided to move my blog from JRoller to WordPress mainly because I badly missed an easy blog backup tool, because the platform is not actively maintained (it hasn't been updated in ages), and because I wanted an easy way to post source code, which WordPress' [sourcecode] shortcode makes possible (though it sometimes gets a bit confused). This post describes how to move a blog from JRoller to WP.

1. Backup JRoller

Follow the instructions at La tortue cynique - Export and backup your JRoller blog. Namely:
  1. Create a "backup template" called tetsuwan with the content of jroller_atom_feed.tpl, which is included in the archive you can download from La tortue's blog
  2. Run La tortue's backup java class with URL of this template, which must be in the form http://jroller.com/page//tetsuwan
    • Notice that the URL http://jroller.com//page/tetsuwan would also work but accessing older entries by appending the date would not
    • A single page displays at most the number of entries determined by the value of the configuration property "Number of entries to display on weblog" under Preferences / Settings, which can be set at most to 30. The downloader makes use of the fact that appending a date to a properly formatted URL returns entries not older than the date (such as http://jroller.com/page/holy/tetsuwan/20090121). See the Roller 3.1 user guide - section 3.2.3 - Finding old entries using the pages of your weblog - point 3: Using URLs.
    • Notice that URLs like http://jroller.com/page/holy/tetsuwan/20090121 are redirected to http://jroller.com/holy/page/tetsuwan?date=20090121 (they work anyway).
  3. [optional] Merge the downloaded jroller_bak*.xml files into one if the program hasn't done that (for me it failed with a StringIndexOutOfBoundsException, perhaps because the last page accessed had no posts to show)
    • You may but don't need to merge all your posts and comments into a single file; it's possible to import them into WordPress in sequence, which is great if a single file would exceed WP's import file size limit (currently 15MB, I believe)
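
The date-based paging described above can be tried out by hand. The following is only a minimal sketch, not La tortue's actual code: pageUrlFor is a hypothetical helper name, and the user name holy is simply taken from the example URLs above.

```groovy
import java.text.SimpleDateFormat

// Hypothetical helper: builds the URL of the backup page for a given date
// by appending the date as yyyyMMdd to the template URL.
String pageUrlFor(String templateUrl, Date date) {
    def stamp = new SimpleDateFormat('yyyyMMdd').format(date)
    // drop trailing slashes so that we never produce "//" before the date
    templateUrl.replaceAll('/+$', '') + '/' + stamp
}

def day = new SimpleDateFormat('yyyy-MM-dd').parse('2009-01-21')
println pageUrlFor('http://jroller.com/page/holy/tetsuwan', day)
// prints http://jroller.com/page/holy/tetsuwan/20090121
```

Opening such a URL in a browser is a quick way to check that the backup template is set up correctly before running the downloader.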

2. Export from WP

Export your current posts, pages, comments etc. from WordPress - we will need the header and footer of the export file and it's also a good idea to have a backup :-)

Log in, go to the Dashboard, expand the Tools section and click Export.

3. Convert JRoller to a fragment of the WP format

I've created a Groovy (a Java-like scripting language) script that converts the posts and comments exported from JRoller into a fragment of the WordPress export format (WXR). This fragment will only contain a list of <item>s representing your posts with comments embedded as <wp:comment>s. You will then need to add a proper header and footer to turn it into a valid import file.
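
To give an idea of what such a fragment looks like, here is a heavily simplified sketch that renders one <item> with one embedded <wp:comment> via Groovy's SimpleTemplateEngine. The template and the sample values are made up for illustration; a real item needs many more fields.

```groovy
import groovy.text.SimpleTemplateEngine

// Simplified, hypothetical item template (single quotes, so the ${...}
// placeholders are left for the template engine to fill in).
def itemTplTxt = '''
<item>
<title>${title}</title>
<wp:comment>
<wp:comment_content><![CDATA[${comment}]]></wp:comment_content>
</wp:comment>
</item>
'''

def itemTpl = new SimpleTemplateEngine().createTemplate(itemTplTxt)
// Sample values, not taken from a real blog
def fragment = itemTpl.make([title: 'Hello', comment: 'Nice post']).toString()
println fragment
```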

Among other things, the script tries to fix problems with tags within <pre>...</pre>: it replaces <br> with a new line because WP would simply strip this tag.
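
The effect of this fix can be seen on a small made-up post body. This is only a sketch of the idea (the sample text is hypothetical): line breaks outside of <pre> stay untouched while those inside are turned into real new lines.

```groovy
// Hypothetical sample post body, not taken from a real backup
def sample = 'Intro<br><pre>int a = 1;<br/>int b&nbsp;= 2;</pre>'

// Find each <pre>...</pre> segment and rewrite the markup only inside it
def fixed = sample.replaceAll(/(?is)<\s*pre\s*>.*?<\s*\/\s*pre\s*>/) { String pre ->
    pre.replaceAll(/(?i)<\s*br\s*\/?\s*>/, '\n')
       .replaceAll('&nbsp;', ' ')
}

println fixed
```

The <br> right after "Intro" is kept because it is outside of the <pre> segment.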

How to use it:
  1. Download Groovy 1.7.2 or higher, unpack it, run the Groovy console GUI (bin/groovyConsole[.bat]), paste there the script provided below
  2. Modify the configuration: change inputFileUrl to point to your JRoller backup file, outputFilePath to where you want to store the output, and defaultAuthor to your WP user name
    • Note: The base url for is not important as it will be replaced with the target blog's URL.
  3. Run the script in the Groovy console (Script -> Run); it should log something into the output window and the output file should be created
The Groovy conversion script (the code highlighting isn't perfect, especially regarding multiline strings; you'd see it better in the Groovy console):

// CONFIGURATION SECTION ######################
final int basePostId = 100 // I believe this isn't important as WP will assign an ID it sees fit...
final String inputFileUrl = "file:///tmp/jroller_bak1.xml"
final String outputFilePath = "/tmp/wordpress_import-items_only-1.xml"
final String defaultAuthor = "theholyjava"
// /CONFIGURATION SECTION ######################

// vars: entry, postId, postBody, postName, category, postDate
// NOTE: WP uses regular expressions to read the input, not an XML parser
// => it's essential to keep the proper format including spaces etc.
def entryTplTxt = """
<item>
<title>\${entry.title}</title>
<link>/\${postDate.format("yyyy/MM/dd")}/\${postName}/</link>
<pubDate>\${postDate.format("EEE, dd MMM yyyy HH:mm:ss")} +0000</pubDate>
<dc:creator><![CDATA[${defaultAuthor}]]></dc:creator>
<category><![CDATA[\${category}]]></category>
<category domain="category" nicename="\${category.toLowerCase()}"><![CDATA[\${category}]]></category>
<guid isPermaLink="false"></guid>
<description></description>
<content:encoded><![CDATA[\${postBody}]]></content:encoded>
<excerpt:encoded><![CDATA[]]></excerpt:encoded>
<wp:post_id>\$postId</wp:post_id>
<wp:post_date>\${postDate.format("yyyy-MM-dd HH:mm:ss")}</wp:post_date>
<wp:post_date_gmt>\${postDate.format("yyyy-MM-dd HH:mm:ss")}</wp:post_date_gmt>
<wp:comment_status>open</wp:comment_status>
<wp:ping_status>open</wp:ping_status>
<wp:post_name>\${postName}</wp:post_name>
<wp:status>publish</wp:status>
<wp:post_parent>0</wp:post_parent>
<wp:menu_order>0</wp:menu_order>
<wp:post_type>post</wp:post_type>
<wp:post_password></wp:post_password>
<wp:is_sticky>0</wp:is_sticky>
""" // close it with '</item>' after adding comments!

// vars: comment, commentId >= 1
def commentTplTxt = """
<wp:comment>
<wp:comment_id>\$commentId</wp:comment_id>
<wp:comment_author><![CDATA[\${comment.author.name}]]></wp:comment_author>
<wp:comment_author_email>\${comment.author.email}</wp:comment_author_email>
<wp:comment_author_url>\${comment.author.url}</wp:comment_author_url>
<wp:comment_author_IP></wp:comment_author_IP>
<wp:comment_date>\${postDate.format("yyyy-MM-dd HH:mm:ss")}</wp:comment_date>
<wp:comment_date_gmt>\${postDate.format("yyyy-MM-dd HH:mm:ss")}</wp:comment_date_gmt>
<wp:comment_content><![CDATA[\${comment.content}]]></wp:comment_content>
<wp:comment_approved>1</wp:comment_approved>
<wp:comment_type></wp:comment_type>
<wp:comment_parent>0</wp:comment_parent>
<wp:comment_user_id>0</wp:comment_user_id>
</wp:comment>
"""

def engine = new groovy.text.SimpleTemplateEngine()
def entryTpl = engine.createTemplate(entryTplTxt)
def commentTpl = engine.createTemplate(commentTplTxt)

// On Groovy 4 and later, add "import groovy.xml.XmlSlurper" at the top of the script
def blog = new XmlSlurper(false, false).parse(inputFileUrl)
def output = new File(outputFilePath)
output.createNewFile()
//assert 30 == blog.entry.size() : "actual: ${blog.entry.size()}"

// turn a post title into a string that can be used in the post's URL
private String makePostName(String title, int postId, Set postNameSet) {

    def postName = java.net.URLEncoder.encode(title.replaceAll("\\s", "-"), "UTF-8").replaceAll("%..", "")
    postName = postName.substring(0, Math.min(34, postName.length())).toLowerCase()

    // Ensure postName is unique (Math.max guards against very short names):
    while (!postNameSet.add(postName)) {
        postName = postId + postName.substring(0, Math.max(0, postName.length() - 2))
    }

    return postName
}

// replace <br> and other formatting markup within <pre> segment with \n, ' ' etc.;
// WP would drop <br> thus destroying the formatting
private String fixMarkupWithinPre(final String postContent) {
    return postContent.replaceAll(/(?is)<\s*pre\s*>.*?<\s*\/\s*pre\s*>/, { preFrag ->
        return preFrag
            .replaceAll(/(?ius)<\s*br\s*\/?\s*>/, '\n')
            .replaceAll(/(?ius)&nbsp;/, ' ')
            .replaceAll(/(?ius)&quot;/, '"')
    })
}

def postId = basePostId
def commentId = 0
def postNameSet = [] as Set
def categories = [] as Set

blog.entry.each() { it ->
    def postDate = Date.parse("yyyy-MM-dd'T'HH:mm:ss", it.issued.text())
    // a comment?
    if (it.annotate.size() > 0) {
        output.append(commentTpl.make([comment: it, commentId: (++commentId), postDate: postDate]).toString())
    } else {
        // Close the previous post:
        if (postId > basePostId) { output.append "</item>" }
        ++postId
        commentId = 0 // reset for the next post

        def category = it.subject.text().replaceFirst("/", "")
        categories << category
        output.append(entryTpl.make([
            entry: it, postId: postId, postDate: postDate,
            postName: makePostName(it.title.text(), postId, postNameSet),
            postBody: fixMarkupWithinPre(it.content.text()),
            category: category]).toString())
    }
}
// Close the final post (only if at least one post has been written)
if (postId > basePostId) { output.append "</item>\n" }

println "The posts used the following categories, which will thus be created in WP: $categories"
println "done; check $output"
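
If you wonder what post names the script will generate, the name logic can be tried in isolation. Below is a stand-alone sketch of it; slugFor and the sample title are hypothetical names chosen for this illustration, the logic follows makePostName above.

```groovy
// Hypothetical stand-alone version of the post name logic, for experimenting
String slugFor(String title, int postId, Set<String> taken) {
    // whitespace -> '-', then URL-encode and drop every %XX escape
    def slug = URLEncoder.encode(title.replaceAll(/\s/, '-'), 'UTF-8').replaceAll('%..', '')
    slug = slug.substring(0, Math.min(34, slug.length())).toLowerCase()
    // on a clash, prefix the post ID and cut two characters off the end
    while (!taken.add(slug)) {
        slug = String.valueOf(postId) + slug.substring(0, Math.max(0, slug.length() - 2))
    }
    slug
}

def taken = [] as Set
println slugFor('Hello World: Groovy & WP!', 101, taken) // prints hello-world-groovy--wp
println slugFor('Hello World: Groovy & WP!', 102, taken) // prints 102hello-world-groovy--
```

Notice that punctuation is dropped rather than replaced, which is why two dashes end up next to each other where the "&" used to be.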

4. Add proper header and footer to the generated WP import file

Open your WordPress export file, copy everything from the beginning up to the first <item> and paste it at the beginning of the generated WP import file. Beware: Each <item> must start on a line of its own! (Avoid <atom:link .../><item> on the same line.) It will look something like this:

<?xml version="1.0" encoding="UTF-8"?>
<!-- This is a WordPress eXtended RSS file generated by WordPress as an export of your blog. -->
... many lines skipped ...
	<atom:link rel="search" type="application/opensearchdescription+xml" href="/osd.xml" title="The Holy Java" />
	<atom:link rel='hub' href='/?pushpress=hub'/>

It's quite possible that some or most parts of the header aren't necessary or will be replaced based on your blog, but I haven't experimented with that. Copying and pasting all of it is safe.

Open your WordPress export file and copy everything following the last </item> till the end of the file to the end of the generated WP import file. It should be:

</channel>
</rss>
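
If you prefer not to copy and paste by hand, the header and footer can be cut out with a few lines of Groovy. This is only a sketch under the assumption that the export contains at least one <item>; headerAndFooter and assemble are hypothetical helper names and the sample strings are made up.

```groovy
// Returns [header, footer]: everything before the first <item>
// and everything after the last </item> of a WordPress export.
List<String> headerAndFooter(String export) {
    int first = export.indexOf('<item>')
    int last = export.lastIndexOf('</item>')
    assert first >= 0 && last >= 0 : 'the export contains no <item>'
    [export.substring(0, first), export.substring(last + '</item>'.length())]
}

// Puts header, generated items and footer together, each starting on a new line
String assemble(String export, String items) {
    def (header, footer) = headerAndFooter(export)
    header.trim() + '\n' + items.trim() + '\n' + footer.trim() + '\n'
}

// Tiny made-up inputs; in reality read them with new File(...).getText('UTF-8')
def export = '<?xml version="1.0"?>\n<channel>\n<item>\n<title>old</title>\n</item>\n</channel>\n</rss>\n'
def items = '\n<item>\n<title>new</title>\n</item>'
println assemble(export, items)
```

The items of the export itself are left out on purpose since they are in WordPress already.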

Make really sure that each <item> and </item> tag is on a line of its own. WP doesn't use an XML parser to read the file but a couple of regular expressions, so white space and line ends can make a big difference.
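
This rule is easy to check automatically before importing. The sketch below (badItemLines is a hypothetical helper, the sample text is made up) reports the numbers of the lines where an <item> or </item> tag shares the line with something else.

```groovy
// Returns the 1-based numbers of lines that contain <item> or </item>
// together with any other content.
List<Integer> badItemLines(String xml) {
    def bad = []
    xml.readLines().eachWithIndex { line, i ->
        boolean hasTag = (line =~ /<\/?item>/).find()
        boolean tagOnly = line.trim() ==~ /<\/?item>/
        if (hasTag && !tagOnly) bad << (i + 1)
    }
    bad
}

// Made-up sample with the typical mistake on the first line
def sampleXml = "<atom:link rel='hub' href='/?pushpress=hub'/><item>\n<title>x</title>\n</item>\n<item>\n</item>"
println badItemLines(sampleXml) // prints [1]
```

For a real file you would pass in something like new File(outputFilePath).getText('UTF-8') and expect an empty list.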

5. [optional] download images, perhaps upload them somewhere and modify URLs accordingly

You may want to download any images you used in JRoller, upload them somewhere else (WP, Picasaweb, Flickr, ...) and modify the links in the generated XML accordingly. I haven't done that so you are on your own :-).
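
A possible starting point is to list the image URLs used in the generated file. As I haven't done this step myself, take the following as an untested-on-real-data sketch: imageUrls is a hypothetical helper, the sample markup and URLs are made up, and a regular expression like this one will miss unusual markup.

```groovy
// Collects the distinct src URLs of all <img> tags found in the text
List<String> imageUrls(String xml) {
    def matcher = xml =~ /(?i)<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/
    // each match is a list: [whole match, first group]
    matcher.collect { it[1] }.unique()
}

// Made-up sample content
def body = '<p><img src="http://jroller.com/holy/resource/a.png" alt="a"> and ' +
           "<IMG class=\"x\" SRC='http://jroller.com/holy/resource/b.jpg'/></p>"
imageUrls(body).each { println it }
```

With the list at hand you can download the files and then replace the old URLs in the XML with the new ones.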

6. Import it into WP

Importing into WP normally adds the imported posts, pages and comments to the existing ones unless WP detects that you're importing a post that already exists, in which case it's either ignored or overridden - I'm not sure which. I don't know how this detection works either; I've only found out that it is not based on equality of the numerical IDs (wp:post_id). Perhaps it is based on wp:post_name? Anyway, this makes it possible to import your posts in several batches without destroying what is already there.

Log in, go to the Dashboard, expand the Tools section and click Import, select WordPress as the format and follow the instructions. It will allow you to create or map post authors (you will want to map the creator/defaultAuthor from the import file to yourself). WP.com will send you an email when finished (usually immediately); a standalone installation of WP would present you with some statistics of the imported items.

If you want to know more about the import process, download WordPress and check the file /wp-admin/import/wordpress.php (make sure to get the version corresponding to your WordPress version). As mentioned already, WP doesn't use an XML parser but regular expressions to parse the file, so be careful not to break anything.

7. Check formatting, add tags...

You are done now. However, I'd advise you to go through the imported posts, check that their formatting is OK (especially within <pre>), and perhaps add tags (they weren't exported from JRoller).

Known limitations

  • Aside from not importing images, I haven't dealt with any attachments.
  • This process has been applied successfully to WordPress.com in its version as of 5/2010 - I don't know which one it is (likely something between 2.5 and 3). It also works with the standalone WordPress in version 2.8.4.

Tags: groovy

Copyright © 2024 Jakub Holý