Oregon State University

Peter Krenesky

Developed in open source communities.

Pydra 0.5 Released

09/21/2009

This marks the first beta release of Pydra, a distributed computing framework for Python.  Most core features are implemented, at least partially, but there are many open tickets in the tracker for bugs and improved features.  While this is not a stable release, we felt the project was far enough along that there should be an installable package out there for people to experiment with. The coming months will bring more stability and features as we transition to a scheduled release cycle.

Many thanks to Jakub Gustak, Yin Qiu, and Almir Karic who have contributed code vital to this release.

Major Features
  • Task API with:
    • Basic building blocks which can be combined to build complex, reusable Tasks.  Base classes include a basic task and containers for sequential work, parallel work, and MapReduce.
    • Datasource API for managing connections and input efficiently.
    • Forms for defining, processing, and validating data.
    • Task Packaging system for packaging tasks, supporting files, and libraries.
  • Synchronization System to automatically version and deploy Task Packages to the cluster.
  • Scheduling System to track and run tasks.
  • Task History recorded for debugging and future profiling components.
  • Web Frontend for managing and running tasks.
  • Controller API for controlling Pydra from within other programs via REST.  This includes a client object to simplify dealing with Pydra’s strict authentication system.
  • Security for all connections within the cluster.
  • Module System for enabling or disabling features.
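The idea of combining building blocks into larger tasks can be illustrated with a small, in-process sketch. It is not Pydra's Task API: the names `Step`, `Sequence`, and `MapReduce` are invented for this example, and everything runs locally rather than across a cluster.

```python
class Step:
    """Hypothetical basic task: wraps a single function."""

    def __init__(self, func):
        self.func = func

    def run(self, value):
        return self.func(value)


class Sequence:
    """Hypothetical container: runs subtasks in order, chaining results."""

    def __init__(self, *subtasks):
        self.subtasks = subtasks

    def run(self, value):
        for subtask in self.subtasks:
            value = subtask.run(value)
        return value


class MapReduce:
    """Hypothetical container: maps over items, then reduces the results.

    A real framework would hand the map calls to workers; this sketch
    simply loops locally.
    """

    def __init__(self, mapper, reducer):
        self.mapper = mapper
        self.reducer = reducer

    def run(self, items):
        mapped = [self.mapper.run(item) for item in items]
        return self.reducer.run(mapped)


# Containers share the same run() interface, so they nest freely.
pipeline = Sequence(
    MapReduce(Step(lambda x: x * x), Step(sum)),
    Step(lambda total: total + 1),
)
result = pipeline.run([1, 2, 3])
```

Because every block exposes the same `run()` method, a container can hold other containers, which is what makes the pieces reusable.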
Feedback

We welcome and encourage feedback, whether you love Pydra or hate it.  We want to shape this project to make it more useful and easier to use.  If something doesn’t work well, we want to know.  If you have a great idea that isn’t on our roadmap, we’ll figure out how to fit it in.

Download

Pydra 0.5

Categories: Planet OSL

Pydra Status Update July ‘09

07/20/2009

We’ve been working on Pydra for about 6 months now and we’ve come a long way. So where is the project? Pydra is moving closer to a stable release, but there’s a lot to finish. We still expect to have something usable by most people by the end of the summer.

There are no releases yet, but we’re getting ready to use it in production:

  • We’re close to deploying Pydra with the Protein Geometry Project. I’ve been working bugs out of Pydra while implementing their protein data import tool. The main issues are related to the protein parser rather than Pydra itself.
  • I’ve also deployed a small cluster of a few blades and other random desktop machines we had lying around. We’re throwing it at the Engine Yard Contest just to try out Pydra. The trial attempts have been good, but this was more about kicking the tires on Pydra than winning the contest.
You’re using it, but why are there no releases yet?

I’m a big fan of “release early, release often” but the project still needs to have a certain degree of usability.  There are still missing features that will dramatically affect how easy Pydra is to deploy and use. Sure, you can get by without them, but it may require a bit of hacking and manual setup. The type of person who can make use of Pydra right now isn’t fazed by checking out directly from a git repository.

What’s being worked on right now?

We’re updating the project on a regular basis to get it ready for an initial release. Here are the highlights:

Module System

I’m refactoring the core of Pydra to use a module system. Modules will be loosely coupled using the observer pattern. This will give us greater flexibility with our components and make it easier to implement features such as replication. It will also allow us to make functionality modular or optional.

The module system is intended to allow long term growth of the project. It wasn’t in the original plan because the core isn’t overly complicated right now. I’m implementing modules now, to prevent a jumbled mess of code a year or more down the road.
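For readers unfamiliar with the observer pattern, here is a minimal sketch of how loosely coupled modules might communicate through signals. It is a generic illustration, not Pydra's module system: `EventBus`, `HistoryModule`, and the signal names are all made up for this example.

```python
from collections import defaultdict


class EventBus:
    """Minimal observer registry: modules subscribe to named signals."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def register(self, signal, callback):
        self._listeners[signal].append(callback)

    def emit(self, signal, *args, **kwargs):
        # Call every listener for this signal and collect their results.
        return [cb(*args, **kwargs) for cb in self._listeners[signal]]


class HistoryModule:
    """Hypothetical optional module that records finished tasks."""

    def __init__(self, bus):
        self.finished = []
        bus.register("TASK_FINISHED", self.on_task_finished)

    def on_task_finished(self, task_id):
        self.finished.append(task_id)
        return task_id


bus = EventBus()
history = HistoryModule(bus)
bus.emit("TASK_FINISHED", "task-1")
bus.emit("WORKER_CONNECTED", "node-a")  # no listeners registered, so a no-op
```

The code that emits `TASK_FINISHED` never references `HistoryModule` directly, so the module can be left out entirely without touching the emitter. That is the sense in which functionality becomes optional.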

Scheduler

The task scheduler handles assigning work to workers in the cluster. The new task scheduler will fix quite a few bugs in our current scheduler and make it easier to extend.
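As a rough picture of what assigning work to workers involves, the sketch below keeps a queue of pending tasks and a pool of idle workers and pairs them up whenever either changes. It is a toy first-come, first-served scheduler written for this post's readers, not Pydra's scheduler; `SimpleScheduler` and the worker and task names are assumptions.

```python
from collections import deque


class SimpleScheduler:
    """Toy FIFO scheduler: pairs queued tasks with idle workers."""

    def __init__(self, workers):
        self.idle = deque(workers)
        self.queue = deque()
        self.running = {}  # worker name -> task currently assigned

    def submit(self, task):
        self.queue.append(task)
        self._advance()

    def task_done(self, worker):
        # Free the worker, then see whether queued work can start.
        self.running.pop(worker)
        self.idle.append(worker)
        self._advance()

    def _advance(self):
        while self.queue and self.idle:
            worker = self.idle.popleft()
            self.running[worker] = self.queue.popleft()


sched = SimpleScheduler(["worker-1", "worker-2"])
for name in ["import", "parse", "report"]:
    sched.submit(name)
```

With two workers and three tasks, the third task waits in the queue until a call such as `sched.task_done("worker-1")` frees a worker.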

Datasources

We’re designing a datasource API that will abstract interactions with storage systems. Datasources will simplify distribution of connection information along with your tasks, which reduces the amount of setup you have to do on a Node before you can run a task on it. They will also make it easier to slice your data into pieces that can be distributed amongst Pydra Nodes.
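To make the slicing idea concrete, here is a small sketch of a datasource that carries its connection details and hands out fixed-size slices of its data. The API is still being designed, so this is only an illustration: `ListDatasource`, `slice_records`, and the example host are hypothetical.

```python
def slice_records(records, slice_size):
    """Yield consecutive chunks of at most slice_size records."""
    if slice_size < 1:
        raise ValueError("slice_size must be at least 1")
    for start in range(0, len(records), slice_size):
        yield records[start:start + slice_size]


class ListDatasource:
    """Hypothetical datasource: bundles connection info with slicing."""

    def __init__(self, connection_info, records):
        # Connection details travel with the datasource, so a node
        # would not need them configured in advance.
        self.connection_info = connection_info
        self.records = records

    def slices(self, slice_size):
        return slice_records(self.records, slice_size)


source = ListDatasource({"host": "db.example.org"}, list(range(10)))
chunks = list(source.slices(4))
```

Each chunk could then be handed to a different node. A real datasource would read from a database or file rather than an in-memory list.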

Want to know more, or better yet hack on Pydra?

I’ll be at OSCON all week.  Follow me on Twitter.

Categories: Planet OSL