Wednesday, May 9, 2012

Time to Dance

I’ll start by acknowledging the fact that it has been too long since I have updated this blog. I got bored and had no collaborators for the pyFidelity project, so I stopped.

I now have a new project to work on (this time with collaboration). I will not go into too much detail about the project, except to say that it can be modeled as an exact cover problem: a set of constraints and a set of partial solutions, where each constraint must be satisfied by exactly one of the chosen partial solutions. For any given constraint, all of the partial solutions that satisfy it are mutually exclusive. The best-known example of such a problem is a sudoku puzzle. I do not want to spend too much time explaining the problem, so I’ll move on.

I have been enjoying the Go programming language lately, so I want to see if Dancing Links (DLX), the implementation of Knuth’s Algorithm X described best in his paper [gzipped postscript], can be done nicely in Go. As due diligence, I conducted a quick search of the Go resources and mailing list as well as GitHub repositories. I found only a sudoku solver that roughly implements DLX, plus several implementations in other languages. So, I started my own project.
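To make the exact cover idea concrete, here is a minimal sketch of Algorithm X. It is written in Python (the language of the other code on this blog) rather than Go, and it uses plain dictionaries of sets instead of the linked-list structure of Dancing Links, so it shows the search and not the DLX data structure. The names solve, cover, and uncover, and the small six-row instance, are made up for illustration.

```python
def solve(columns, rows, partial=None):
    """Yield every exact cover as a list of row names.

    columns maps each constraint to the set of rows that satisfy it.
    rows maps each row name to the list of constraints it satisfies.
    """
    if partial is None:
        partial = []
    if not columns:
        # Every constraint has been satisfied exactly once.
        yield list(partial)
        return
    # Branch on the constraint with the fewest candidate rows.
    col = min(columns, key=lambda c: len(columns[c]))
    for row in list(columns[col]):
        partial.append(row)
        removed = cover(columns, rows, row)
        yield from solve(columns, rows, partial)
        uncover(columns, rows, row, removed)
        partial.pop()


def cover(columns, rows, row):
    """Remove the constraints met by row and every row that conflicts with it."""
    removed = []
    for c in rows[row]:
        for other in columns[c]:
            for c2 in rows[other]:
                if c2 != c:
                    columns[c2].remove(other)
        removed.append(columns.pop(c))
    return removed


def uncover(columns, rows, row, removed):
    """Undo cover() in reverse order so the search can backtrack."""
    for c in reversed(rows[row]):
        columns[c] = removed.pop()
        for other in columns[c]:
            for c2 in rows[other]:
                if c2 != c:
                    columns[c2].add(other)


# Hypothetical instance: seven constraints, six partial solutions.
rows = {
    "A": [1, 4, 7],
    "B": [1, 4],
    "C": [4, 5, 7],
    "D": [3, 5, 6],
    "E": [2, 3, 6, 7],
    "F": [2, 7],
}
columns = {c: set() for c in range(1, 8)}
for name, constraints in rows.items():
    for c in constraints:
        columns[c].add(name)

solutions = [sorted(s) for s in solve(columns, rows)]
print(solutions)  # [['B', 'D', 'F']]
```

Rows B, D, and F together touch each of the seven constraints exactly once, and no other combination does.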

As time goes on, I hope to keep this blog updated with the development of this project.

Tuesday, May 24, 2011

Beyond Many-To-Many

As I work through this project, I will be sharing some of what I think are interesting choices that I make. I have started building out the model classes as outlined in The Basic Schema. The Song–Creator relationship was the first to be built.

In building these models, it became clear that the original design was not quite right. First, we have unneeded redundancy between the Creator and Person models, so we can bypass the Creator entities altogether. Now that we no longer have the one-to-one proxy between Song_Creator and Person, we can rename Song_Creator to the terser Creator. I hope I haven’t lost you here.

The Song, Person, and CreatorType entities were easy to define, so I won’t go into any detail there. The fun is in the Creator entity. There are a couple of things I want to take note of here:

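  • There is a three-dimensional many-to-many(-to-many) relationship here.
  • The primary key is a composite of the three foreign keys.

from sqlalchemy import Boolean, Column, ForeignKey, Integer
from sqlalchemy.orm import backref, relationship


# Base is assumed to be the project's declarative base.
class Creator(Base):
    __tablename__ = "creator"

    # True when this person is the primary creator of the song.
    primary = Column(Boolean)
    # The three foreign keys together form the composite primary key.
    song_id = Column(Integer, ForeignKey('song.id'), nullable=False,
                     primary_key=True)
    creator_type_id = Column(Integer, ForeignKey('creator_type.id'),
                             nullable=False, primary_key=True)
    person_id = Column(Integer, ForeignKey('person.id'), nullable=False,
                       primary_key=True)

    # Each backref exposes the matching Creator rows on the related model.
    # No order_by is given because Creator has no id column to sort on.
    song = relationship('Song', backref=backref('creator'))
    creator_type = relationship('CreatorType', backref=backref('creator'))
    person = relationship('Person', backref=backref('creator'))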
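
To see what the composite key buys, here is a small sketch using Python's built-in sqlite3 module instead of SQLAlchemy, so it runs with no setup. The table layout mirrors the model above, but the sample rows and the variable names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE song (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE creator_type (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE creator (
        song_id INTEGER NOT NULL REFERENCES song(id),
        creator_type_id INTEGER NOT NULL REFERENCES creator_type(id),
        person_id INTEGER NOT NULL REFERENCES person(id),
        "primary" BOOLEAN,
        PRIMARY KEY (song_id, creator_type_id, person_id)
    );
    INSERT INTO song VALUES (1, 'Example Song');
    INSERT INTO person VALUES (1, 'Example Person');
    INSERT INTO creator_type VALUES (1, 'composer');
    INSERT INTO creator_type VALUES (2, 'lyricist');
""")

# One person may be both composer and lyricist of the same song.
conn.execute("INSERT INTO creator VALUES (1, 1, 1, 1)")
conn.execute("INSERT INTO creator VALUES (1, 2, 1, 0)")

# Repeating an identical (song, type, person) triple violates the key.
try:
    conn.execute("INSERT INTO creator VALUES (1, 1, 1, 0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM creator").fetchone()[0]
print(duplicate_rejected, row_count)  # True 2
```

The same person can appear twice on one song as long as the creator type differs, while an exact repeat of the triple is refused by the database itself.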

Now, I am in no way an expert in database administration, so I have no idea if any of this is really the right way to model this, but it looks like it will work. Please feel free to take a look or share your thoughts.

Friday, May 20, 2011

The Basic Schema

With some of the tools chosen, it is time to create my database schema, or something like it. Below is my first guess at what the basic organization should be. Notice that Songs and Tracks are separate. This reflects the reality of Songs being intangible objects while Tracks are physical manifestations of those Songs. Tracks are the shadows on the wall of the cave (thank you, Plato).

[Diagram: First Approximation (fails some edge cases)]

The relationships shown above are oversimplified, despite being about as much information as provided by most other methods for storing this kind of data. Let’s dig deeper into these relationships.

Song–Creator Relationship

Let’s first define a Creator as a person who is either a composer, lyricist, or both. The Song_Creator entity links Creators, who are not limited to any one Creator_Type, to Songs. Additionally, Songs often have a primary Creator rather than equal contributors, so we also want to capture that in our representation. A final note on the Song–Creator relationship: the diagram appears to show a many-to-many relationship between Song and Song_Creator when it is really a one-to-many relationship where a Song has many Song_Creators.

Song–Track Relationship

I’ve already outlined why these exist as separate entities. The relationship here is many-to-many to account not only for the many different recordings of the same Song but also for medleys, where one Track contains several Songs.

Album–Track Relationship

It is already assumed that Albums can have multiple Tracks, but we also assert that Tracks can be released on multiple Albums (e.g., singles, compilations, etc.). Album_Tracks represent the relationship here with the addition of the track number. Album_Sides are pulled out here for the moment, but they could easily be rolled into the Album_Track entity as a property.

Performer–Track Relationship

This is the most intricate real-world relationship we model here. We relate to Tracks rather than Albums here because of the common cases of compilations and guest musicians. We model performers as both Musicians and Ensembles. Each Performance represents an individual contribution to the Track. This contribution also records whether the Musician is a primary contributor and what, if any, Ensemble they are a part of on this Track. An Ensemble is considered primary if any of the Musicians in the Ensemble are primary.
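
The rule for primary Ensembles is easier to check in code than in prose. Below is a minimal sketch in plain Python, with no database involved. The Performance fields and the ensemble_is_primary helper are assumptions made for illustration and are not the final model; the track and performer names are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Performance:
    """One Musician's contribution to one Track."""
    track: str
    musician: str
    primary: bool = False
    ensemble: Optional[str] = None  # None for a solo or guest appearance


def ensemble_is_primary(performances, track, ensemble):
    """An Ensemble is primary on a Track if any of its Musicians is."""
    return any(
        p.primary
        for p in performances
        if p.track == track and p.ensemble == ensemble
    )


# Hypothetical data: a band with one primary member, a backing group
# with none, and a guest who belongs to no Ensemble.
performances = [
    Performance("Track 1", "Singer", primary=True, ensemble="The Band"),
    Performance("Track 1", "Drummer", ensemble="The Band"),
    Performance("Track 1", "Violinist", ensemble="String Section"),
    Performance("Track 1", "Guest Guitarist"),
]

print(ensemble_is_primary(performances, "Track 1", "The Band"))        # True
print(ensemble_is_primary(performances, "Track 1", "String Section"))  # False
```

Because the flag lives on the Performance rather than on the Ensemble, the same Ensemble can be primary on one Track and a backing group on another.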

Further iterations of this model may see the addition of roles to Performances, which would allow the merger of Musicians and Creators. The relationship between a Musician and a Track is so close to the relationship between a Creator and a Song that they could be easily merged.

I have done all I can think of to account for edge cases. As I start building out tests and models, I will need more edge cases to test against. Feel free to leave me comments with albums that you think might break my model.

Tuesday, May 17, 2011

The Right Tool for the Job (Part IV)

Database

This choice was really the first one I actually had to ruminate on. Once I wrote down my highest priority criteria for database engine selection, I was able to come to a clear conclusion. Let me take you through the process.

SQLAlchemy Support

Because I’m using Pylons, which uses SQLAlchemy by default, I wanted to be sure that whatever I use will work with SQLAlchemy. Here were my options under this criterion: DB2, Drizzle, Firebird, SQL Server, MySQL, Oracle, PostgreSQL, SQLite, and ASE.

Open Source

I intend to release this project under an open source license, so the database I choose should be open source (it doesn’t have to be, but I want it to be). Let us look at the licenses for these databases:

  • DB2: Proprietary IBM product
  • Drizzle: Fork of MySQL, GPL
  • Firebird: Non-standard open source license
  • SQL Server: Microsoft
  • MySQL: GPL
  • Oracle: Proprietary beast
  • PostgreSQL: Self-titled (the PostgreSQL License)
  • SQLite: Public domain
  • ASE: Proprietary

Levenshtein Distance

This is the one that makes the difference. I intend to go to great lengths to make sure that there is no unintentional duplication of data. The very heart of this application relies on the intricate relationships modeled in the database. To reduce human error, I want the ability to do text search using Levenshtein distance as an aid to the user when entering data. There is some work going on to implement this for SQLite, and there are several UDFs available for MySQL. PostgreSQL, however, already supplies fuzzy string matching, including a Levenshtein function, in its fuzzystrmatch module.
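
For anyone who has not met Levenshtein distance before, it is the number of single-character insertions, deletions, and substitutions needed to turn one string into another. Here is a minimal pure-Python sketch of the idea. It is not the database implementation, and closest_matches is a hypothetical helper showing how a data-entry form might suggest existing names.

```python
def levenshtein(a, b):
    """Return the edit distance between strings a and b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def closest_matches(query, candidates, max_distance=2):
    """Return the candidates within max_distance edits, nearest first."""
    scored = [(levenshtein(query.lower(), c.lower()), c) for c in candidates]
    return [c for distance, c in sorted(scored) if distance <= max_distance]


print(levenshtein("kitten", "sitting"))  # 3
print(closest_matches("Beatels", ["The Beatles", "Beatles", "Beach Boys"]))
# ['Beatles']
```

A typo like "Beatels" is two edits away from an existing "Beatles" entry, so the user can be offered the existing record instead of creating a duplicate.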

Based on my criteria, PostgreSQL will be my database of choice. MySQL would be a possibility as well, so long as the Levenshtein distance UDF is installed, but with the wealth of other features PostgreSQL brings to the table, and the looming Oracle monster, PostgreSQL is a clear winner.

Friday, May 13, 2011

The Right Tool for the Job (Part III)

Source Control

In part 1 I explored language choice; in part 2 I explored framework choice; today, I will briefly discuss source control. I will be using Git. The choice was clear. I don’t feel like going over all of the reasons why Git is better than other options. I prefer to let other people talk about it.

With that squared away, I have done the only logical thing to do: start a new project on GitHub. The project is nothing but an initial build of a Pylons project.

I have a confession to make. These last three posts have been a little bit deceptive. I didn’t really make my tool selections in the way that I’ve outlined. I knew these tools before I knew my project. In fact, I selected my project to fit these tools. I could do this because the problem of organizing my record collection is not important, and I am leaving myself open to change my tools as my needs change. Next post, I will actually be making a decision on my data store. Any suggestions?

Tuesday, May 10, 2011

The Right Tool for the Job (Part II)

In Part I, I spent time belittling all but the languages that make me happy (Python and Go). I didn’t explain exactly why I ultimately chose Python; in the end, the architecture/framework choice pushed me into the Python camp.

In my investigation of possible architectures for this simple project, I think that I have finally cataloged all of the possible architectures for a web application such as this one:

  • MVC
  • Spaghetti

Is it an oversimplification? Maybe. Does it really matter? Nope. Just about everything I could find with any structure at all was some variation of the same basic concept as described by Trygve Reenskaug.

The essential purpose of MVC is to bridge the gap between the human user's mental model and the digital model that exists in the computer.

Knowing that I will be using MVC rather than spaghetti, I can move on to the real question: how much of the work should the framework do?

Leave me alone, I’ll do it myself!

This could be a good option, but it will mostly be reinventing wheels (a noble, maligned art). It is probably a good exercise to do this on a small scale at some point, if only to gain a deeper understanding and appreciation for the strength of the existing libraries and frameworks. If I were to do this, I would probably choose a smaller project and use Go.

Do it for me?

The metaprogramming bug hits all of us at some point; of course, there is a huge amount of complexity involved in making this happen. Frameworks that embrace “convention over configuration,” such as Rails (the 800 lb gorilla of the web framework world), take care of the complex parts for you. Honestly, these do a great job… so long as all of the basic assumptions remain true. I plan on deviating from some of those assumptions, and while I know these frameworks do not preclude me from doing this, they often make it significantly more difficult to do.

I still have a library card!

Some of the really important components in the architecture have been built to stand alone. Slightly easier than rolling your own everything, libraries like SQLAlchemy, Mako, and Routes (those of you who are clever already see where this is headed) take care of the heavy lifting, giving you the freedom to build your application exactly the way you envision it. The major downside to this approach is that changing libraries can be difficult once you have built everything around particular ones.

Just a touch of awesome

There is one framework with the perfect balance of structure and flexibility: Pylons. Pylons is effectively a lightweight glue holding together libraries responsible for the real work. I have no problems with assumptions not applying to my project, and I can easily replace the libraries without having to completely rewrite. I spent a good bit of time exploring different frameworks, and Pylons (soon enough to become Pyramid) has minimal development overhead with maximum flexibility.

TL;DR

Pylons is the ideal framework for my project.

Sunday, May 1, 2011

The Right Tool For The Job

I’m a firm believer in the aphorism above. So, in that spirit, I will now expose the process I used in choosing the tools for my project. At the risk of offending language zealots, I will be perfectly blunt; I can’t stand some of these languages, and I only put them here to belittle them.

Part I: Language

Choosing a language is a pretty important part of this process. Here’s the rundown on my choices.

C

Yeah, I didn’t really consider this, but it is worth noting. C is one of my favorite languages of all time, but it seems ill suited to this problem. I would have too much overhead to get it off the ground.

C++

Just like C, this would just carry too much overhead.

Java

Honestly, I’m not a huge Java fan. It is a little too verbose for me. The ideas behind the language were nothing short of genius, but actually using it – especially in a case like this one – is just a pain.

C#

Ha.

PHP

I have quite a bit of experience with PHP, and I find it to be a decent language. That being said, I have found it to be a bit disorganized; the language has gotten a lot better over the years, but it still doesn’t feel like much more than a toy.

ColdFusion

This is what I work in every day at my job, and I can’t stand it (sorry, work friends, but you already knew I feel this way). ColdFusion makes Java look terse and PHP look clean and organized.

Haskell

This would be overkill on so many levels. I don’t think I’m quite nerdy enough to start building simple web applications in a CS-trendy language like Haskell.

Ruby

Tempting, but having actual rockstar aspirations makes the stereotypes here all too real. Not to mention all the other hipster symptoms I’m too often diagnosed with. Maybe for another project with less inherent hipster vibe.

node.js (JavaScript)

I’ve heard good things about node.js, and I do enjoy JavaScript. JavaScript without DOM might be a fun adventure. I do enjoy functions as first-class objects and closures, but I don’t really need the event-driven I/O. Maybe another project.

Go

Go is just plain awesome. The only thing preventing me from using Go for this is that I could only choose one. If you haven’t had a chance to try out this language, do.

Python

Clean, terse, powerful, and stable… With a vast amount of resources available, and some of the most flexible, powerful frameworks, this was nearly a no-brainer. Plus, I just wanted to learn it.