We decided to move from Atlassian's Confluence wiki software to MediaWiki, in the hope that a more familiar wiki system would encourage participation. To start, I exported the content from Confluence, which produced a zip file containing entities.xml. Below is a script that creates one text file per page.

#!/usr/bin/env python
# Python 2 script: writes one text file per current Confluence page.

from cElementTree import iterparse
import codecs

# Stream through the export; iterparse yields each element once it is complete.
for event, elem in iterparse(file("entities.xml")):
    if elem.tag == "object" and elem.get('class') == 'Page':
        save = True
        title = None
        content = None
        for child in elem.getchildren():
            if child.tag == "property" and child.get('name') == "title":
                title = child.text

            if child.tag == "property" and child.get('name') == "content":
                content = child.text

            # Skip pages that carry an originalVersion property.
            if child.tag == "property" and child.get('name') == "originalVersion":
                save = False

        if not save:
            continue

        print "Will save page with title '%s'" % (title,)

        if not content:
            print "... but has no contents"
            continue

        # The pages/ directory must already exist.
        f = codecs.open('pages/%s' % (title,), 'w', 'utf-8')
        f.write(content)
        f.close()
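
For anyone on a newer interpreter, here is a rough Python 3 sketch of the same loop. It assumes the same entities.xml layout the script above relies on. The names export_pages and safe_filename, the out_dir parameter, the automatic directory creation, and the replacement of slashes in titles are illustrative additions, not part of the original script.

```python
import os
from xml.etree.ElementTree import iterparse


def safe_filename(title):
    # Assumption: swap path separators so a title like "a/b" stays inside out_dir.
    return title.replace("/", "_").replace(os.sep, "_")


def export_pages(xml_path, out_dir="pages"):
    """Write one UTF-8 text file per saved page and return the saved titles."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for event, elem in iterparse(xml_path):
        if elem.tag != "object" or elem.get("class") != "Page":
            continue
        # Collect the direct <property> children by their name attribute.
        props = {c.get("name"): c for c in elem if c.tag == "property"}
        title = props["title"].text if "title" in props else None
        content = props["content"].text if "content" in props else None
        # As in the script above, pages with originalVersion are skipped.
        has_original = "originalVersion" in props
        elem.clear()  # free memory; the values needed are already copied out
        if has_original or not title:
            continue
        if not content:
            print("Skipping %r: no content" % title)
            continue
        path = os.path.join(out_dir, safe_filename(title))
        with open(path, "w", encoding="utf-8") as f:
            f.write(content)
        saved.append(title)
    return saved
```

Calling export_pages("entities.xml") from the directory holding the unpacked export should behave much like the original, except that the pages directory is created if it is missing.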

5 Responses

  1. wjv, July 20, 2006 at 10:20 AM.

    Oh MediaWiki, how I love thee. (Note: The above remark may contain trace quantities of sarcasm.)
  2. D-Arb, July 20, 2006 at 10:57 AM.

    How come you did not opt for something Pythonic like Moin? MediaWiki is nice for some setups, but not everything...
  3. Neil Blakey-Milner, July 21, 2006 at 11:07 AM.

    MediaWiki was chosen over Moin in the hopes of user familiarity, and in theory because of more plugins for things like anti-spam measures, code highlighting, embedding RSS, and so forth.
  4. Falko Richter, February 29, 2008 at 06:19 PM.

    In Python 2.5 you need

    from xml.etree.cElementTree import iterparse

    instead of

    from cElementTree import iterparse

    See effbot.org for more information.
  5. Ard Righ, July 21, 2008 at 02:45 AM.

    Using your script above, if there is no content, files are not created?

    Does that mean the export from Confluence is incomplete or corrupted somehow?

    I am trying to convert our work content across to MediaWiki. I just need to be able to get the exports (have both XML and HTML) to work in MediaWiki :\
