Feed aggregator
Firefox on the Coliseum
This photo is not photoshopped:
The Mozilla Italia team projected a Firefox wordmark onto Rome’s most famous landmark, and onto many other places all over the city. Make sure to check out the picture in its full glory over on Flickr.
Picture licensed CC BY-SA by nois3lab on Flickr.
Code Cosmetics With Komodo and vim
The source code for the Mozilla Add-ons project tries to follow the PEAR Coding Standards. One of these standards is to use 4 spaces per level of indentation, and no tabs.
Over time, unfortunately, some files come to contain a significant amount of mixed-up indentation (both from badly set-up IDEs and from third-party contributions that came with tab indentation and went unnoticed). That’s both tedious and error-prone to fix by hand.
Trailing whitespace poses a similar problem. It’s annoying in general, but especially in HTML template files it also increases page size unnecessarily: more bytes are transmitted over the wire, with no benefit to either the users or the developers.
Luckily, there are two quick fixes for these problems in both the editor vim and my IDE of choice, Komodo:
To remove tabs and replace them with spaces…
- in Komodo, select a code block, then click Code -> Untabify Region.
- in vim, type :%s/\t/    /g (those are four spaces). Or, as oremj points out in the comments, you could just use :retab (with expandtab set, e.g. :set expandtab tabstop=4, so that tabs are converted to spaces).
And to wipe out trailing whitespace…
- in Komodo, in Preferences -> Editor / Save Options, activate the option “Clean trailing whitespace and EOL markers”. Then open your document of choice and just save it again. However, when writing patches, you might want to refrain from keeping this option on at all times: a patch that touches a lot of lines unrelated to its actual change can cause confusion. I wish there were a one-time way to run this, instead of a config option.
- in vim, type :%s/\s\+$//g, which the regex-savvy among you have quickly deciphered as: “in the entire document, replace every run of one or more whitespace characters at the end of a line with the empty string”.
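For cleaning up many files at once outside the editor, the same two fixes can be scripted. The following is a minimal Python sketch (the function names are made up for this example). One difference to note: str.expandtabs aligns to tab stops the way :retab does, whereas the :%s/\t/    /g substitution replaces every tab with exactly four spaces. The trailing-whitespace pattern matches only spaces and tabs, so it can never join two lines.

```python
import re


def untabify(text: str, width: int = 4) -> str:
    """Expand tabs to spaces, aligning to tab stops every `width` columns."""
    return "\n".join(line.expandtabs(width) for line in text.split("\n"))


def strip_trailing_whitespace(text: str) -> str:
    """Remove runs of spaces/tabs at the end of every line.

    [ \t] is used instead of \s so a newline is never swallowed,
    which could otherwise merge a line into a following blank line.
    """
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)


def clean(text: str) -> str:
    return strip_trailing_whitespace(untabify(text))


print(repr(clean("\tfoo  \nbar\t\n")))  # '    foo\nbar\n'
```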
Happy cleaning!
Shortcut to Internet
For work, I have a virtual machine serving one little purpose: To run Microsoft Internet Explorer 6, for the rare instances when I have to test a website with it. To make that a little more convenient and a little less painful, I just created a “desktop shortcut” for it, and look what it showed up as:

Haha, “Shortcut to Internet” — this is classic.
Federal Agency Co-Op CENDI Issues FAQ including Open Source Software
Vicki E. Allums, Associate General Counsel for Intellectual Property at the Defense Information Systems Agency (DISA), will join GOSCON this week to discuss a document released today by CENDI, a cooperative of the major Federal science, technology and information centers. The document, titled “Frequently Asked Questions about Copyright and Computer Software: Issues Affecting the U.S. Government with Special Emphasis on Open Source Software” and written principally by authors from DISA and the Defense Advanced Research Projects Agency (DARPA), is expected to serve as a useful resource as federal agencies increasingly adopt open source software in their IT architectures.
Allums will walk attendees through the new document, including:
- US Government policy guidance regarding use of Open Source Software (OSS)
- Issues unique to federal agencies distributing OSS
- OSS copyright licensing and contractual considerations for the US Government
- Advantages and disadvantages for federal agencies using OSS as an alternative to proprietary technologies.
Hope you can join me at GOSCON this week and not miss these important milestones in the evolution of information technology within the US Government.
Whitehouse.gov Migrates to Open Source Drupal
Yesterday the White House Office of New Media announced it had migrated its legacy system for whitehouse.gov to Drupal. To be clear, this constitutes a change in plumbing (important plumbing), not policy, but it is a significant and, of course, highly visible sign that open source software has gone mainstream.
Perhaps we can look forward to that day in the future when someone quips “No one ever got fired for implementing Drupal”.
This made yesterday a double red-letter day for me and my colleagues in Oregon: an open source application was rolled out in a mission-critical government environment, AND it was one of the projects supported by Oregon State University’s Open Source Lab (OSL), where I work. Double the satisfaction. Although I don’t talk about it much here (where I focus on government adoption of open source), our core activity at the OSL is supporting many significant open source projects through hosting, administration and development as part of our charter to build community. We’ve been quietly at work on this important endeavor since 2003.
There is no shortage of news coverage on this event, so I won’t rehash it here. I’m focused on the upcoming Government Open Source Conference (GOSCON DC), but wanted to share with you that the Drupal Association will be present at the event next week, on November 5 at the Ronald Reagan Building & International Trade Center. Online registration is still open; I hope you will plan to join us if you’re in the area (or make the trip: we have colleagues from as far away as Japan joining us for the day!)
Nominations Sought for Open Source Software use in Government
GOSCON AWARDS for Open Business Use in Government
Nominations are now being accepted for the 2009 Excellence Awards for Open Source Business Use in Government.
It’s true. After five years of operation of the Government Open Source Conference, we’re looking forward to recognizing government employees who have made significant accomplishments in the application of Open Source Technology to meet government business or mission requirements.
To nominate a government employee or project, visit www.goscon.org/awards for information and an online submission form. The deadline is Friday, Oct 23, 2009.
vcl_hash
I’ve been trying to figure out the effect of this line in my Varnish config:
sub vcl_hash {
    if (req.http.Cookie) {
        set req.hash += req.http.Cookie;
    }
}
It seemed to make sense, but I was having a hard time wrapping my head around its ramifications. I was looking at some of the docs on the Varnish site and at this great Varnish config walkthrough when the metaphorical lightbulb went on: by adding the cookie to the hash, it’s effectively creating a per-session cache.
Hmm. An interesting tradeoff. On one hand it’s filling up my available cache with duplicate copies of the same content because the hash identifying the cached content is cookie-specific. On the other, it is delivering content from cache that wouldn’t normally be cached because of the cookie.
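To make the tradeoff concrete, here is a toy Python model (not Varnish code) of how many cache objects a set of requests would produce. It assumes the default hash input is roughly the URL plus the Host header and that the vcl_hash above appends the Cookie header whenever one is present; it ignores whether a request would be passed instead of cached. `cache_key` and `distinct_objects` are hypothetical names.

```python
from typing import List, Optional, Tuple

Request = Tuple[str, str, Optional[str]]  # (url, host, cookie header)


def cache_key(url: str, host: str, cookie: Optional[str], hash_cookie: bool) -> tuple:
    """Approximate hash input: URL + Host, plus Cookie if hash_cookie is on."""
    key: tuple = (url, host)
    if hash_cookie and cookie:
        key += (cookie,)  # mirrors: set req.hash += req.http.Cookie;
    return key


def distinct_objects(requests: List[Request], hash_cookie: bool) -> int:
    """How many separate cache objects these requests would need."""
    return len({cache_key(u, h, c, hash_cookie) for u, h, c in requests})


# Three logged-in users plus one anonymous visitor fetch the same logo.
reqs: List[Request] = [("/logo.png", "www.example.org", f"SESS{n}=abc") for n in range(3)]
reqs.append(("/logo.png", "www.example.org", None))

print(distinct_objects(reqs, hash_cookie=True))   # 4 copies of the same bytes
print(distinct_objects(reqs, hash_cookie=False))  # 1
```

Every distinct session cookie gets its own copy of identical content, which is exactly the cache-filling cost described above; the benefit is that each of those copies can still be served from cache.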
High performance Varnish/Pressflow/Drupal community of practice
At DrupalCamp PDX this weekend, I was fortunate enough to have some very interesting (if tantalizingly brief) discussions with Josh Koenig (joshk), Sam Boyer (sdboyer) and Damien Tournoud about sharing configs and best practices for scaling Drupal sites, especially using Varnish and Pressflow. OK. We’ve talked about it. Now let’s do it!
I see three primary places we might start building on that seed:
- High Performance Drupal group
- Pressflow Launchpad (or its associated wiki)
- Project Mercury Launchpad
Those of you out there running Drupal in large scale environments, let’s start sharing configs and techniques so we can all do better.
Pressflow, Varnish and Caching … oh my!
It all started with an itch. It was a really painful itch that involved a Drupal site that was essentially down due to load. I scratched it with the help of a few incredibly helpful blog posts I found, so now it’s my turn to add to them so someone else can benefit as well.
The Problem:
HALP! The site, it is sinking!
A large school district wanted to replace their existing outdated static web site with a modern CMS. They chose Drupal as their platform. The new site was successful.
Too successful.
The average traffic of 5 hits/sec jumped to over 100 hits/sec and the server went into a swap death spiral.
Fear not! Help is on the way in the form of a couple of technological superheroes …

Pressflow and Varnish to the rescue!
The mutually complementary combination of these two tools can vastly increase the number of users your site can serve. Here’s the what, why, and how:
Pressflow is “a derivative of Drupal core providing enhanced performance, scalability, and data integrity”. Basically, some really smart guys at Four Kitchens and elsewhere back-ported a bunch of Drupal 7 performance enhancements to the Drupal 6 (and even Drupal 5!) code base.
Why: The most expensive thing a web site can do is fire up the entire Apache/PHP stack, pull something from the database, and render it. That takes a lot of time, processor cycles, and memory. It’s slow, and it ties up web server processes while they wait for the query to return and the response to render. Whenever you can, push static content (images, CSS, JS, static files, etc.) into some sort of cache, preferably in memory and preferably as far out on the “edge” (as close to the requesting client browser) as possible. If you can avoid pulling something from disk (or the database), absolutely do it! If you can avoid even touching Apache/PHP (and its associated overhead), do it. The Pressflow changes make the output more cache-friendly, so that more of the site’s content can live in and be served from cache. That frees up web server processes and resources for serving content that really does need to be dynamic.
How: Pressflow adds the following features to Drupal:
- Support for database replication
- Support for Squid and Varnish reverse proxy caching
- Optimization for MySQL
- Optimization for PHP 5
All four are admirable additions that can help sites scale, but the second one is the primary reason I chose to bring the site up on Pressflow. It makes Drupal more cache-friendly, allowing us to store and serve more content from cache, speeding up the site and increasing the number of users who can be served.
My experience has been that Pressflow is also 100% compatible with Drupal core: I’ve switched sites back and forth between Pressflow and Drupal with no changes to the database or modules. Copy the /sites directory over, point the webroot at Pressflow, and you’re done. Win!
So this is all well and good, but we need to actually have a cache in front of the server for this to do much good. Enter our second technology superhero:
Varnish
What: Varnish is an HTTP accelerator and caching reverse proxy. Varnish is all about speed. It stores as much content as it can in the fastest place possible (RAM in this case) and bypasses the expensive process of making a request to Apache.
Why: Pressflow structures the Drupal content to be more cache-friendly, but we still need something to actually cache the content.
How: Varnish sits in front of Apache, accepts incoming connections from browsers and, if possible, fulfills the requests from its cache. If it can’t, it passes the request on to the underlying Apache/PHP stack. It then takes the response from Apache and forwards it on to the requesting browser. If the response from Apache is cacheable, Varnish stores it in RAM for fulfilling future requests.
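The request flow just described can be modeled in a few lines of Python. This is a deliberately simplified toy (`ToyProxy` is a made-up class, not how Varnish is implemented), and it assumes Varnish’s conservative default of sending any request that carries a cookie straight to the backend without consulting the cache.

```python
from typing import Callable, Dict, List, Optional, Tuple


class ToyProxy:
    """Toy model of a caching reverse proxy; not Varnish's real implementation."""

    def __init__(self, backend: Callable[[str], Tuple[str, bool]]):
        self.backend = backend            # returns (body, cacheable)
        self.store: Dict[str, str] = {}   # in-memory cache, keyed by URL
        self.stats = {"hit": 0, "miss": 0, "pass": 0}

    def handle(self, url: str, cookie: Optional[str] = None) -> str:
        if cookie:
            # Conservative default: never serve cached content to a request with a cookie.
            self.stats["pass"] += 1
            body, _ = self.backend(url)
            return body
        if url in self.store:
            self.stats["hit"] += 1        # served from RAM; Apache never sees it
            return self.store[url]
        self.stats["miss"] += 1
        body, cacheable = self.backend(url)
        if cacheable:
            self.store[url] = body        # keep it for future requests
        return body


calls: List[str] = []


def fake_apache(url: str) -> Tuple[str, bool]:
    """Stand-in for the Apache/PHP backend; records every call it receives."""
    calls.append(url)
    return f"<html>{url}</html>", True


proxy = ToyProxy(fake_apache)
proxy.handle("/node/1")                      # miss: fetched and stored
proxy.handle("/node/1")                      # hit: Apache not called
proxy.handle("/node/1", cookie="SESSabc=1")  # pass: sent to Apache
print(proxy.stats, len(calls))               # {'hit': 1, 'miss': 1, 'pass': 1} 2
```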
Setting up Pressflow
Installing Pressflow is just like installing Drupal. Grab the tarball from Four Kitchens, extract it, and do the usual Drupal setup. You’ll probably want to make sure Cacherouter is installed and properly configured (including the config array added to settings.php). Point your Apache vhost at the Pressflow docroot. That’s it for Pressflow. It’s Drupal, really. Just tweaked.
Setting up Varnish
- Download and compile Varnish
- Configure your Apache vhosts to listen on an alternate port (8080, for example)
- Start varnishd
Now the fun really begins. Varnish is amazingly configurable. The VCL syntax is, even for a sysadmin-turned-developer like me, clear and relatively easy to understand. The default.vcl is well commented, and I learned a lot just reading through it. I’m going to post excerpts from my current Varnish config showing what I modified.
Please note: I am not a Drupal core hacker. I’m experimenting my way through this mostly by “cut and try”. This config info is based primarily on the work of others: Josh Koenig, Iskra/ekes and Narayan Newton. I don’t completely understand yet what a few of these configs do, exactly, other than that they seem to make a positive difference in the hit rates on my systems and they didn’t seem to break anything. I’ll also include the configs I commented out because they did break something, in the hope that we can figure out why and what we should do to improve them.
By default, Varnish is set up to pass any request that carries a cookie on to the backend (Apache) uncached. This plays it safe: by never serving cached content when it sees a cookie, Varnish avoids sending an authenticated user outdated content. It’s a sane and conservative way of making sure that only anonymous users are given cached content. Consequently, a lot of the customization consists of telling Varnish, “Even though there’s a cookie associated with this request, you really can ignore the cookie and cache it” when the browser requests things like CSS files, JavaScript, theme images, or uploaded static files.
There are four basic places where I’ve added code: vcl_recv, vcl_hash, vcl_fetch and vcl_deliver. Below are the snipped (and very slightly sanitized) sections from the config file I’m currently running in production (as of 10/10/2009).
vcl_recv
vcl_recv is where we configure what happens when Varnish receives a request from a browser client for some content.
sub vcl_recv {
    … snip …
    ## Remove has_js and Google Analytics cookies.
    set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+|has_js)=[^;]*", "");
    ## Remove a ";" prefix, if present.
    set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
    ## Remove empty cookies.
    if (req.http.Cookie ~ "^\s*$") {
        unset req.http.Cookie;
    }
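This first segment clears out cookies the server doesn’t need (has_js and the Google Analytics __utm* cookies). As long as any cookie is set, Varnish won’t cache the associated content, so if nothing is left after stripping, we tell Varnish to unset the Cookie header entirely before continuing.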
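Because VCL is hard to test in isolation, one way to check what these three steps do to a given Cookie header is to replay them in Python. This sketch assumes Varnish’s regex engine treats these particular patterns the same way Python’s re module does (regsuball maps to re.sub, regsub to re.sub with count=1). `clean_cookie` is a hypothetical helper, and returning None stands in for unset req.http.Cookie.

```python
import re
from typing import Optional


def clean_cookie(cookie: Optional[str]) -> Optional[str]:
    """Replay the three vcl_recv cookie steps; None stands for 'unset req.http.Cookie'."""
    if cookie is None:
        return None
    # regsuball(): drop every has_js cookie and every __name cookie (e.g. __utma, __utmz)
    cookie = re.sub(r"(^|;\s*)(__[a-z]+|has_js)=[^;]*", "", cookie)
    # regsub() replaces only the first match: strip a leading ";" left behind
    cookie = re.sub(r"^;\s*", "", cookie, count=1)
    # Nothing but whitespace left: unset the header so the default cookie rule no longer applies
    if re.match(r"^\s*$", cookie):
        return None
    return cookie


print(clean_cookie("has_js=1; __utma=123.456; SESSabc=xyz"))  # SESSabc=xyz
print(clean_cookie("has_js=1; __utmz=abc"))                   # None
```

A real Drupal session cookie survives, so logged-in users still bypass the cache, while a visitor carrying only analytics cookies becomes cacheable again.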
    ## Catch Drupal theme files - THIS BREAKS UPDATE.PHP
    #if (req.url ~ "^/sites/") {
    #    unset req.http.Cookie;
    #}
    # Catch Drupal misc files (like drupal.js and jquery.js)
    #if (req.url ~ "^/misc/") {
    #    unset req.http.Cookie;
    #}
When I first set up Pressflow and Varnish, I was looking at the cache hit/miss rate and noticed that a lot of the CSS and JS files in /sites and /misc were not being cached. So I thought I’d be clever and tell Varnish that it really should cache these files by unsetting the cookies. For a while, it worked great: hit rates were up, and lots of stuff was now being cached. Then I needed to roll out a security update and run update.php. With this config in place, update.php would either reject the attempt (because the admin user’s session cookie had been unset, reverting the user to an anonymous session) or, if $update_free_access was set to TRUE, cause an endless loop back to step one. I’m not quite sure what in /sites and /misc is the root of the problem (both seem to cause it), but I’ve disabled these rules until we can identify and work around it.
    # Site still uses some static files out of /files, cache them
    if (req.url ~ "^/files/site.*") {
        unset req.http.Cookie;
    }
    # enable caching of theme files (can't enable globally due to update.php problem above)
    if (req.url ~ "^/sites/www.site.*") {
        unset req.http.Cookie;
    }
Because update.php uses Garland instead of the site theme, we can still tell Varnish to cache the site’s own theme, module, and uploaded files here without running into the update.php problem above.
    # Drupal js/css doesn't need cookies, cache them
    if (req.url ~ "^/modules/.*\.(js|css)\?") {
        unset req.http.Cookie;
    }
I noticed that we were also seeing a lot of misses on much of the core JS and CSS (like jquery.js), so we told Varnish to cache them.
    ## Moodle themes - disabled, seems to cause random problems
    #if (req.url ~ "^/(theme|pix)/") {
    #    unset req.http.Cookie;
    #}
We also run Moodle vhosts on this server. This was my first attempt at convincing Moodle to cache its images and theme files, and it failed miserably. Moodle currently requires the session cookies to be set on the files in question; unsetting any of them forces the user to re-authenticate on every page load. Hence the commenting out.
    ## Pass cron jobs and server-status
    if (req.url ~ "cron.php") {
        return (pass);
    }
    if (req.url ~ ".*/server-status$") {
        return (pass);
    }
    … snip …
}
Lastly, we don’t want server-status or cron.php cached, so we tell Varnish to pass them straight to the backend without further processing.
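Taken together, the active (uncommented) URL rules in this excerpt decide whether a request may be looked up in the cache. The Python sketch below models only those rules, under two stated assumptions: the snipped parts of vcl_recv don’t change the outcome, and a request still carrying a cookie falls through to Varnish’s default of passing it to the backend. The `cookie` argument is the header after the cookie cleanup shown earlier; `route` and the pattern lists are hypothetical names.

```python
import re
from typing import Optional

# Active (uncommented) URL rules from the vcl_recv excerpt, in the same order.
STRIP_COOKIE_PATTERNS = [
    r"^/files/site.*",
    r"^/sites/www.site.*",
    r"^/modules/.*\.(js|css)\?",
]
PASS_PATTERNS = [r"cron.php", r".*/server-status$"]


def route(url: str, cookie: Optional[str]) -> str:
    """Return 'pass' or 'lookup' for a request whose cookies were already cleaned."""
    if any(re.search(p, url) for p in STRIP_COOKIE_PATTERNS):
        cookie = None                      # unset req.http.Cookie
    if any(re.search(p, url) for p in PASS_PATTERNS):
        return "pass"                      # return (pass);
    # Assumed fall-through to the default: requests still carrying a cookie are passed.
    return "pass" if cookie else "lookup"


print(route("/files/site/logo.png", "SESSabc=1"))  # lookup
print(route("/node/1", "SESSabc=1"))               # pass
print(route("/cron.php", None))                    # pass
```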
vcl_hash
vcl_hash is where Varnish builds the hash key it uses to look up content in its cache.
sub vcl_hash {
    if (req.http.Cookie) {
        set req.hash += req.http.Cookie;
    }
}
This seems to help improve hit rates and appears to be non-intrusive, so I left it in.
vcl_fetch
vcl_fetch is where Varnish handles content fetched from the backend (Apache) because, for various reasons, it couldn’t be delivered from cache.
vcl_deliver
vcl_deliver is where Varnish delivers the requested content back to the browser client, either from cache or from a backend request.
Backend config
Lastly, we were seeing too many 503 errors under load, so we increased the timeouts to 600 seconds. So far we haven’t seen any ill effects from the long timeouts.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
    .connect_timeout = 600s;
    .first_byte_timeout = 600s;
    .between_bytes_timeout = 600s;
}
/etc/conf.d/varnishd
We have made a couple of changes to the default varnishd startup options:
VARNISHD_OPTS="-a *:80 \
-T 127.0.0.1:8181 \
-f /etc/varnish/default.vcl \
-p thread_pools=4 \
-p thread_pool_max=1500 \
-p listen_depth=2048 \
-p lru_interval=1800 \
-h classic,169313 \
-p obj_workspace=4096 \
-p connect_timeout=600 \
-p max_restarts=6 \
-s malloc,2G"
A couple of notes on these:
- connect_timeout=600: we were seeing random 503 errors when the system was under load, even though there were Apache workers available. We extended the timeouts to 600 seconds, figuring it was better to have an individual user occasionally get an element that loads slowly or times out than to have many users across the site see an uninformative “503 guru meditation”. So far we’ve not seen any ill side effects.
- malloc,2G - we’re running the site with malloc storage (instead of the default file-based) at 2 gigabytes. The sites being served from behind this Varnish instance are quite a bit bigger than that, but because Varnish is running on the same box as Apache, we decided to throttle Varnish to leave resources for Apache/PHP. It’s likely this is suboptimal, but it works.
So I’ve installed Varnish and Pressflow. The site’s up and running. How do I tell if it’s doing any good?
With Pressflow and Varnish installed with the default configurations we saw an immediate drop in load and better performance, but I wanted to optimize it to cache as much as possible. Fortunately, Varnish comes with an excellent set of tools to see what it’s doing:
varnishtop
This command shows the most frequently made requests to the backend:
varnishtop -b -i TxURL
It’s excellent for spotting often-requested items that are currently not being cached. The “-b” flag filters for requests made to the backend, and “-i TxURL” filters for the request URL that triggered the request to the backend. Its output looks something like this:

Top of the list, most often-requested URL from the backend. A prime candidate for caching.
varnishhist
This command shows a histogram of the past 1000 requests: whether they were cache hits (denoted by a ‘|’) or misses (denoted by a ‘#’), and how long they took to process (further to the right means longer). It’s good for a high-level view of how the server is doing under load.

varnishlog
This command displays all Varnish traffic for a specific client. It’s helpful for seeing exactly what a particular page or request is doing: set it to your workstation’s IP, load the page, and see everything Varnish does with your connection, including hit/miss/pass status. varnishlog is really useful, but it puts out an overwhelmingly large amount of data that isn’t easily filtered. The “-o” option groups all of the entries for a specific request together (without it, all entries from all requests are displayed in FIFO order), and it accepts a tag (“ReqStart” in this example) and a regex (the IP address in this case) to filter for only the requests associated with that tag and regex. It’s the only way I’ve found to filter the firehose of log entries down to something useful.

varnishstat
This command provides an overview of the stats for the current Varnish instance. It shows hit/miss/pass rates and ratios, and lots of other gory internal details.

Varnish can eat RAM like there’s no tomorrow. Be careful to configure its maximum memory to something less than your available RAM. I forgot to when I first set things up: the system worked great for a while, and then took a nosedive as the Varnish cache ate up all the available RAM and pushed the system into a swap death spiral.
It’s OK to not cache everything
This is a concept I struggled with at first: “oh no! It’s not caching xyz! I must fix that!” Remember that even if you can’t cache all the static content on your site, you’re still doing a lot of good by offloading the most commonly accessed content onto Varnish. Every connection you can serve from Varnish frees up an Apache thread to do something else.
The Resolution
I am delighted to report that the site is currently serving more than 30,000 hits per day without any trouble. We’ve seen traffic exceed 150 simultaneous clients without pushing the system into swap or causing significant iowait. As best we can tell, at peak traffic times it’s entirely processor-bound, with all four cores running at 95% or higher servicing Apache threads.
OK, now what?
Moving forward, how do we do better? There were some great discussions at DrupalCamp PDX this weekend about how we can do more with Varnish and Drupal/Pressflow. Some interesting ideas that came up:
- Edge Side Includes (ESI), and making Drupal aware of them. This is particularly interesting for pushing the internal Drupal panels and block caches into an ESI store.
- A Varnish module within Drupal that hooks into the internal Drupal cache functions and can talk to the Varnish server on its management port to tell it to purge or extend the lifetime of content.
- Preconfigured Amazon EC2 images to quickly scale sites … oh wait, there’s Project Mercury!
- Intelligent ways to front multiple Drupal webnodes with one Varnish node (or a clustered pair).
I’m up for taking a whack at it. Any Drupal core wizards out there interested?
Notes on the Oregon State University Open Source Lab
I wrote up a summary of current goings-on at OSUOSL, including our newly expanded datacenter, new projects that we're hosting and more. It's published on Google's Open Source Blog. The article is available here: http://google-opensource.blogspot.com/2009/09/notes-from-oregon-state-un...
Pydra 0.5 Released
This marks the first beta release of Pydra, a distributed computing framework for Python. While most core features are implemented, at least partially, there are many open tickets in the tracker for bugs and improved features. Although this is not a stable release, we felt the project was far enough along that there should be an installable package out there for people to experiment with. The coming months will bring more stability and features as we transition to a scheduled release cycle.
Many thanks to Jakub Gustak, Yin Qiu, and Almir Karic who have contributed code vital to this release.
Major Features
- Task API with:
- Basic building blocks which can be combined to build complex, reusable Tasks. Base classes include a basic task and containers for sequential work, parallel work, and MapReduce.
- Datasource API for managing connections and input efficiently.
- Forms for defining, processing, and validating data.
- Task Packaging system for packaging tasks, supporting files, and libraries
- Synchronization System to automatically version and deploy Task Packages to the cluster.
- Scheduling System to track and run tasks.
- Task History recorded for debugging and future profiling components.
- Web Frontend for managing and running tasks.
- Controller API for controlling Pydra from within other programs via REST. This includes a client object to simplify dealing with Pydra’s strict authentication system.
- Security for all connections within the cluster.
- Module System for enabling or disabling features.
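The idea of combining basic building blocks into larger, reusable tasks can be illustrated with a generic Python sketch. The class names here (Task, FunctionTask, SequentialTask, ParallelTask) are hypothetical and are not Pydra’s actual API; the point is only how sequential and parallel containers nest inside one another.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List


class Task:
    """Hypothetical base class: anything with a run(data) method is a task."""

    def run(self, data: Any) -> Any:
        raise NotImplementedError


class FunctionTask(Task):
    """Leaf task wrapping a plain function."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def run(self, data: Any) -> Any:
        return self.fn(data)


class SequentialTask(Task):
    """Runs subtasks in order, feeding each output into the next."""

    def __init__(self, *subtasks: Task):
        self.subtasks = subtasks

    def run(self, data: Any) -> Any:
        for task in self.subtasks:
            data = task.run(data)
        return data


class ParallelTask(Task):
    """Runs all subtasks on the same input concurrently; results keep subtask order."""

    def __init__(self, *subtasks: Task):
        self.subtasks = subtasks

    def run(self, data: Any) -> List[Any]:
        with ThreadPoolExecutor() as pool:
            return list(pool.map(lambda task: task.run(data), self.subtasks))


# Containers nest: add one, then double and square the result in parallel.
job = SequentialTask(
    FunctionTask(lambda x: x + 1),
    ParallelTask(FunctionTask(lambda x: x * 2), FunctionTask(lambda x: x ** 2)),
)
print(job.run(3))  # [8, 16]
```

ThreadPoolExecutor keeps the sketch self-contained; a distributed framework would hand the parallel subtasks to worker nodes instead of local threads.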
We welcome and encourage feedback, whether you love Pydra or hate it. We want to shape this project to make it more useful for people, and easier to use. If something doesn’t work well, we want to know. If you have a great idea that isn’t on our roadmap, we’ll figure out how to fit it in.
Adding Empty Directories to git-svn
Just a reminder, because I always forget it: When you use git-svn on an svn repository and your code base contains empty directories (say, for temporary files, or log files), they will be ignored by git unless they contain at least one file.
Paradox? Maybe. There’s a good reason however: git ignores empty directories because it tracks (file) content, not a bunch of directories some of which happen to contain a file (the concept of tracking files might be the only thing git has remotely in common with good ol’ CVS — though git also does not deeply care about file names, only content).
The “common” way to handle this is to add a .gitignore file to the otherwise empty directory. This won’t harm svn-only clients, but it’ll make git-svn clients pick up the (almost) empty directory properly.
This is what you need to do.
mkdir empty_dir
echo '*' > empty_dir/.gitignore
echo '!.gitignore' >> empty_dir/.gitignore
git add empty_dir
git commit -m 'adding empty directory' empty_dir
The .gitignore file tells git which file names not to track inside the directory in question. The asterisk means “ignore all files”, but the second line (the ! negates the pattern) makes sure the .gitignore file itself is recognized and added to the repository.
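If a code base has many empty directories, the per-directory steps above can be automated. This Python sketch (the `seed_empty_dirs` name is hypothetical) walks a working copy, skips .git, and writes the same two-line .gitignore into every directory that contains no files or subdirectories; you would then review the new files and git add and commit them as shown above.

```python
import os
from typing import List

GITIGNORE = "*\n!.gitignore\n"  # same two lines as the echo commands above


def seed_empty_dirs(root: str) -> List[str]:
    """Write the two-line .gitignore into every empty directory under root.

    Directories named .git are skipped. Returns the directories that were seeded.
    """
    seeded = []
    for dirpath, dirnames, filenames in os.walk(root):
        is_empty = not dirnames and not filenames           # decide before pruning .git
        dirnames[:] = [d for d in dirnames if d != ".git"]  # don't descend into .git
        if is_empty:
            with open(os.path.join(dirpath, ".gitignore"), "w") as f:
                f.write(GITIGNORE)
            seeded.append(dirpath)
    return seeded
```

Run it from the top of the working copy, e.g. seed_empty_dirs("."). A directory whose only contents are empty subdirectories is left alone, since the .gitignore in each child is enough for git to record the whole path.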
Open Source for America Unveiled at OSCON
Not too long ago I attended TransparencyCamp in DC and led a discussion on how to work with government. The session was intended for technologists and advocates. I was pleasantly surprised to find that one of the participants was a Senate staffer. After listening to much of the discussion, she explained that she had worked on a bill that included a role for open source software, which eventually failed to move forward. Her question to me: why was it that no one from the open source community stepped forward or offered to help answer questions? Where were they? Her question gave me pause.
Several months later, the cavalry has arrived, along with the town crier, the librarian, the community manager, the mayor and a cadre of plumbers. The newly formed coalition is “Open Source for America”, and I’m pleased to have been a part of its founding effort. Read more at the association web site.
Pydra Status Update July ’09
We’ve been working on Pydra for about 6 months now, and we’ve come a long way. So where is the project? Pydra is moving closer to a stable release, but there’s a lot to finish. We still expect to have something most people can use by the end of the summer.
There are no releases yet but we’re getting ready to use it in production:
- We’re close to deploying Pydra with the Protein Geometry Project. I’ve been working bugs out of Pydra while implementing their protein data import tool. The main issues are related to the protein parser rather than Pydra itself.
- I’ve also deployed a small cluster of a few blades and other random desktop machines we had lying around. We’re throwing it at the Engine Yard Contest just to try out Pydra. The trial attempts have been good, but this was more about kicking the tires on Pydra than winning the contest.
I’m a big fan of “release early, release often”, but the project still needs to have a certain degree of usability. There are still missing features that will dramatically affect how easy Pydra is to deploy and use. Sure, you can get by without them, but it may require a bit of hacking and manual setup. The type of person who can make use of Pydra right now isn’t fazed by checking out directly from a git repository.
What’s being worked on right now?
We’re updating the project on a regular basis to get it ready for an initial release. Here are the highlights:
Module System
I’m refactoring the core of Pydra to use a module system. Modules will be loosely coupled using the observer pattern. This will give us greater flexibility with our components and make it easier to implement features such as replication. It will also let us make functionality modular or optional.
The module system is intended to allow long term growth of the project. It wasn’t in the original plan because the core isn’t overly complicated right now. I’m implementing modules now, to prevent a jumbled mess of code a year or more down the road.
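The observer pattern mentioned above can be sketched generically in Python: modules register callbacks for named signals on a shared hub and never call each other directly, which is what keeps them loosely coupled and makes them easy to enable or leave out. `EventBus`, `Module`, `TaskHistory` and the `task_finished` signal are hypothetical names, not Pydra’s code.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class EventBus:
    """Minimal observer-pattern hub: modules subscribe to named signals."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Callable[..., Any]]] = defaultdict(list)

    def subscribe(self, signal: str, callback: Callable[..., Any]) -> None:
        self._listeners[signal].append(callback)

    def emit(self, signal: str, *args: Any) -> List[Any]:
        # Notify every observer; the emitter never knows who is listening.
        return [callback(*args) for callback in self._listeners.get(signal, [])]


class Module:
    """Hypothetical module base class: a module only holds a reference to the bus."""

    def __init__(self, bus: EventBus) -> None:
        self.bus = bus


class TaskHistory(Module):
    """Example optional module that records finished tasks."""

    def __init__(self, bus: EventBus) -> None:
        super().__init__(bus)
        self.finished: List[str] = []
        bus.subscribe("task_finished", self.on_task_finished)

    def on_task_finished(self, task_id: str) -> None:
        self.finished.append(task_id)


bus = EventBus()
history = TaskHistory(bus)  # enabling a feature = instantiating its module
bus.emit("task_finished", "task-1")
print(history.finished)     # ['task-1']
```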
Scheduler
The task scheduler handles assigning work to workers in the cluster. The new task scheduler will fix quite a few bugs in our current scheduler and make it easier to extend.
Datasources
We’re designing a datasource API that will abstract interactions with storage systems. Datasources will simplify distributing connection information along with your tasks and reduce the amount of setup you have to do on a Node before you can run a task on it. They will also make it easier to slice your data into pieces that can be distributed amongst Pydra Nodes.
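Slicing data into pieces and spreading them over Nodes can be illustrated with two small Python helpers. Both are hypothetical (not part of Pydra’s datasource API), and the round-robin assignment is a naive stand-in for whatever the real scheduler does. `slices` uses the walrus operator, so it needs Python 3.8 or later.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def slices(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def assign(chunks: Iterable[List[T]], nodes: List[str]) -> Dict[str, List[List[T]]]:
    """Hand chunks to nodes round-robin (a naive stand-in for a real scheduler)."""
    plan: Dict[str, List[List[T]]] = {node: [] for node in nodes}
    for i, chunk in enumerate(chunks):
        plan[nodes[i % len(nodes)]].append(chunk)
    return plan


print(assign(slices(range(7), 3), ["node-a", "node-b"]))
# {'node-a': [[0, 1, 2], [6]], 'node-b': [[3, 4, 5]]}
```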
Want to know more, or better yet, hack on Pydra?
I’ll be at OSCON all week. Follow me on Twitter, here.
At OSCON: Why the Government is Turning to Open Source
I’m excited about speaking at O’Reilly’s OSCON next week, and I’ll be joined by fantastic panelists. My session is at 4:30pm Wednesday, July 22.
Cloudkick takes “Best in Show” at Under the Radar
Last Friday, we presented at the Under the Radar conference. Things went really well, and Cloudkick ended up taking Best in Show (people’s choice)! TechCrunch did a nice article on our progress. Exciting times!



