PloneConf2010 – and there’s more

I’m here at PloneConf2010 too, and I’ve been to mostly different talks from Mitch, so here’s my write-up as well.  It’s taken a while to find the time to write this up properly!

Calvin Hendryx-Parker: Enterprise Search in Plone using Solr

We’ve been using Solr for a few years here at Isotoma, but so far we’ve not integrated it with Plone.  Plone’s built-in Catalog is actually pretty good, however one thing it doesn’t do fantastically well is full-text search.  It is passable in English, but has very limited stemming support – which makes it terrible in other languages.

Calvin presented their experience of using Solr with Plone. They developed their own software to integrate the Plone catalog with Solr, instead of using collective.solr, which until then was the canonical way of connecting them. Their new product, alm.solrindex, sounds significantly better than collective.solr.  Based on what I heard here, you should definitely use alm.solrindex.

To summarise how this all hangs together: you need an instance of Solr installed somewhere that you can use.  You can deploy a Solr instance specifically for each site, in which case you can deploy it through buildout.  Solr is Java, and runs inside various Java application servers.

You can also run a single Solr server for multiple Plone sites – in which case you partition the Solr database.

You then configure Solr, telling it how to index and parse the fields in your content. None of this configuration happens within Plone: the indexes are defined in Solr, not in Plone.

Then install alm.solrindex in your Plone site and delete from the catalog all the indexes that you wish to hand over to Solr. alm.solrindex will create new indexes by inspecting Solr.

Then reindex your site, and you’re done!  It supports a lot of more complex use cases, but in this basic case you get top-end full text indexing at quite low cost.
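To make the setup concrete, here is a minimal sketch of the kind of HTTP query that ends up hitting Solr underneath a full-text catalog search. The URL, the `SearchableText` field, and the `site` filter field for the shared-server case are illustrative assumptions, not the alm.solrindex API:

```python
# Sketch: build the Solr "select" URL for a full-text search.
# Assumes a local Solr on the default port; field names are hypothetical.
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # hypothetical local Solr instance

def solr_select_url(text, rows=10, site=None):
    """Build a Solr select URL for a full-text query.

    If several Plone sites share one Solr server, a filter query on a
    (hypothetical) 'site' field partitions the results per site.
    """
    params = [
        ("q", "SearchableText:%s" % text),  # full-text field, stemmed by Solr
        ("rows", str(rows)),
        ("wt", "json"),
    ]
    if site is not None:
        params.append(("fq", "site:%s" % site))  # partition a shared Solr
    return "%s/select?%s" % (SOLR_BASE, urlencode(params))
```

The per-site filter query is how the “single Solr server for multiple Plone sites” arrangement mentioned above would typically be partitioned.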

Dylan Jay, PretaWeb: FunnelWeb

Funnelweb sounds invaluable if you want to convert an existing non-Plone site into a Plone site with minimal effort.

Funnelweb is a tool based on transmogrifier. Transmogrifier provides a “pipeline” concept for transforming content. Pipeline stages can be inserted into a pipeline, and these stages then have the ability to change the content in various ways.
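The pipeline idea is easy to picture as chained generators: each stage consumes items (dicts of content) from the previous stage, may modify or filter them, and yields them on. This is a plain-Python sketch of the concept, not the transmogrifier API itself:

```python
# Toy pipeline: stages are generators chained together, transmogrifier-style.
def strip_whitespace(items):
    # A transform stage: normalise each item's text.
    for item in items:
        item["text"] = item["text"].strip()
        yield item

def drop_empty(items):
    # A filter stage: discard items with no content.
    for item in items:
        if item["text"]:
            yield item

def run_pipeline(source, *stages):
    # Chain the stages: each one wraps the iterator from the previous.
    items = iter(source)
    for stage in stages:
        items = stage(items)
    return list(items)

pages = [{"text": "  hello  "}, {"text": "   "}]
result = run_pipeline(pages, strip_whitespace, drop_empty)
# result == [{"text": "hello"}]
```

Inserting a new stage anywhere in the chain is just adding another generator, which is what makes the pipeline model so flexible for site imports.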

Dylan wrote funnelweb to use transmogrifier and provide a harness for running it in a managed way over existing websites.  The goal is to create a new Plone site, using the content from existing websites.

Funnelweb uploads remotely to Plone over XML-RPC, which means none of transmogrifier needs to be installed in a Plone site, which is a significant advantage.  It is designed to be deployed using buildout, so a script will be provided in your build that executes the import.
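Because the upload happens over XML-RPC, the import process runs entirely outside the Plone instance. A rough sketch of what that looks like from the client side follows; the endpoint method name `upload_page` and the payload field names are hypothetical, for illustration only:

```python
# Sketch: push one converted page to a remote Plone site over XML-RPC.
# The method name and payload fields are assumptions, not funnelweb's API.
import xmlrpc.client

def make_payload(path, title, body):
    """Assemble the fields for one page to be pushed over XML-RPC."""
    return {
        "path": path,                  # where the page will live in the site
        "title": title,
        "text": body,
        "content_type": "text/html",
    }

def upload(server_url, payload):
    # Requires a running Plone site exposing an XML-RPC upload method;
    # 'upload_page' is a hypothetical method name.
    proxy = xmlrpc.client.ServerProxy(server_url)
    return proxy.upload_page(payload)
```

The key point is that nothing here imports transmogrifier or Plone: the whole conversion toolchain stays on the client side of the wire.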

A bunch of pipeline steps are provided to simplify the process of importing entire sites.  In particular funnelweb has a clustering algorithm that attempts to identify which parts of pages are content and which are templates.  This can be configured by providing xpath expressions to identify page sections, and then extract content from them for specific content fields.
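A very rough sketch of the content-vs-template idea: lines that two pages from the same site share are probably template, while lines unique to a page are probably content. Funnelweb’s real clustering is far more sophisticated (and its xpath configuration more precise); this just illustrates the principle:

```python
# Naive content/template separation: anything shared with a sibling page
# is treated as template and dropped; what remains is candidate content.
def content_lines(page, other):
    template = set(other.splitlines())
    return [ln for ln in page.splitlines() if ln not in template]

page_a = "<div id='header'>Site</div>\n<p>About our company</p>\n<div id='footer'>(c)</div>"
page_b = "<div id='header'>Site</div>\n<p>Contact us here</p>\n<div id='footer'>(c)</div>"
```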

It supports ordering and sorting, so that Ordered Folder types are created correctly.  It supports transmogrify.siteanalyser.attach to put attachments closer to pages, and transmogrify.siteanalyser.defaultpage to detect index pages in collections and make them folder indexes in the created sites.

Finally it supports relinking, so that pages get sane urls and all links to those pages are correctly referenced.

Richard Newbury: The State of Plone Caching

The existing caching solution for Plone 3 is CacheFu.  I can remember being introduced to CacheFu by Geoff Davis at the Archipelago Sprint in 2006, where it was a huge improvement on the (virtually non-existent) support for HTTP caching in Plone.

It’s now looking pretty long in the tooth, and contains a bunch of design decisions that have proved problematic over time, particularly the heavy use of monkeypatching.

This talk was about the new canonical caching package for Plone.  It was built by Richard Newbury, based on an architecture from the inimitable Martin Aspeli.

This package is already being used on high-volume sites with good results, and from what I saw here the architecture looks excellent.  It should be easy to configure for the general cases and allows sane extension of software to provide special-purpose caching configuration (which is quite a common requirement).

It provides a basic knob to control caching, where you can select strong, moderate or weak caching.
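To illustrate what a strong/moderate/weak knob might translate to in HTTP terms, here is a sketch. The actual header values the package emits are its own; these are assumptions chosen to show the general idea:

```python
# Hypothetical mapping from a caching "knob" setting to Cache-Control headers.
def cache_control(level):
    return {
        # Cache everywhere for a day: safe for truly static resources.
        "strong": "max-age=86400, public",
        # Browsers revalidate, but shared proxies may cache: good middle ground.
        "moderate": "max-age=0, s-maxage=86400",
        # Always revalidate, never share: for personalised or volatile pages.
        "weak": "max-age=0, must-revalidate, private",
    }[level]
```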

It can provide support for the two biggest issues in cache engineering: composite views (where a page contains content from multiple sources with different potential caching strategies) and split views (where one page can be seen by varying user groups who cannot be identified entirely from a tuple of URL and headers listed in Vary).
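The split-view problem is easiest to see in terms of cache keys: two users request the same URL with identical Vary-listed headers, yet should see different pages (anonymous visitor vs logged-in editor, say). One common workaround, sketched here as an assumption rather than this package’s actual mechanism, is to fold a group token (typically carried in a cookie) into the cache key:

```python
# Sketch: a cache key built from URL + Vary headers is not enough for
# split views, so a group token distinguishes the user classes.
def cache_key(url, vary_headers, group_token=""):
    parts = [url] + ["%s=%s" % (k, v) for k, v in sorted(vary_headers.items())]
    if group_token:
        parts.append("group=" + group_token)  # e.g. "anonymous" vs "editor"
    return "|".join(parts)
```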

It provides support for nginx, apache, squid and varnish.  Richard recommends you do not use buildout recipes for Varnish, but I think our recipe isotoma.recipe.varnish would be OK, because it is sufficiently configurable.  We have yet to review the default config though.

Richard recommended some tools as well:

  • funkload for load testing
  • browsermob for real browsers
  • HttpFox instead of LiveHttpHeaders
  • Firebug, natch
  • remember that hitting refresh and shift-refresh force caches to refresh.  Do not use them while testing!

Jens Klein: Plone is so semantic, isn’t it?

Jens introduced a project he’s been working on called Interactive Knowledge Stack (IKS), funded by the EU.  The project aims to provide an open source Java component for Content Management Systems in Europe, to help the adoption of semantic concepts online.  The tool they have produced is called FISE. The name is pronounced like an Aussie would say “phase” 😉

FISE provides a RESTful interface that allows a CMS to associate semantic statements with content.  This allows us to say, for example, that item X is in Paris, and in addition to state that Paris is in France.  We can then query for “content in France” and it will know that item X qualifies, because Paris is in France.
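The reasoning involved is easy to demonstrate with a toy in-memory triple store that follows “is-in” links transitively. FISE itself works over a RESTful interface against a real triple store; this is just the idea in a few lines:

```python
# Toy triple store: (subject, predicate, object) statements, with a
# transitive query over the "is-in" predicate.
triples = {
    ("itemX", "is-in", "Paris"),
    ("Paris", "is-in", "France"),
}

def located_in(place):
    """Everything transitively located in `place`."""
    direct = {s for (s, p, o) in triples if p == "is-in" and o == place}
    found = set(direct)
    for d in direct:
        found |= located_in(d)  # recurse: things in Paris are also in France
    return found

# located_in("France") -> {"Paris", "itemX"}
```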

They provide a generic Python interface to FISE which is usable from within Plone.  In addition it provides a special index type that integrates with the Plone Catalog, updating the FISE triple store with the information found in the content.  It can generate triples from hierarchical relationships found in the Plone database (page X is-a-child-of folder Y).
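Deriving those hierarchy triples from content paths can be sketched in a few lines; the predicate name here simply echoes the example above, and the function is an illustration, not the actual integration code:

```python
# Sketch: derive "is-a-child-of" triples from content object paths.
def hierarchy_triples(paths):
    out = []
    for path in paths:
        parent, _, _ = path.rpartition("/")
        if parent:  # skip objects at the site root, which have no parent folder
            out.append((path, "is-a-child-of", parent))
    return out
```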

Jens would like someone to integrate the Aloha editor into Plone, which would allow much easier control by editors of semantic statements made about the content they are editing.