RSS

Monthly Archives: January 2012

REST semi-realtime transactions

The freelance pattern implemented with TornadoWeb and ZeroMQ.

I recently implemented one of the broker reliable patterns as described by the ZeroMQ guide. It’s something very similar to beanstalkd’s but left to the reader to implement. This in itself is not a bad thing but it is more code to design, write and test; and had you the budget to hire these guys directly you would get the best broker money could buy. But how reliable is this model. Really?

I’m not a big fan of the broker model. It’s a lot of extra code to write for the broker itself. It’s also a single point of failure. And then there is the error handling as the client and worker negotiation the status of a transaction only to renegotiate it when the broker fails. And then there are all those places where transactions can queue up and all that code that is written that does not need to be. (the crux of this article)

In a brokerless model each client connects to each server (many to many) and in a traditional socket implementation that would not be possible. But it is with ZMQ. (read the guide). So a user app can connect to more than one server at a time and the client will “fan-out” the send() to the next server.

ctx = zmq.Context()
socket = zmq.Socket(ctx, zmq.REQ) 
socket.setsockopt(zmq.HWM, 1)
socket.connect('http://127.0.0.1:5555')
socket.connect('http://127.0.0.1:5556')
socket.connect('http://127.0.0.1:5557')
. . .
socket.send('a message for you')
socket.send('a message for you')
socket.send('a message for you')

What is going to happen here is that this code is going to send one message each to each of the servers assuming that there is an actual connection. Because the socket defines multiple endpoints. And it’s all very orderly and as expected.

The documentation talks about only round robin-ing active connections… sadly a call to connect() without a bind is still considered a valid connection and so this port would still receive a transaction but not actually send it to the server. Meaning that some transactions are going to be delayed. Just how long depends on the restart time for the downed server.

So on the upside… when everything is running smoothly, the transactions are going to be distributed nicely. Each server will be given some work to perform. The workers are still standard userspace applications that do not need any special threading or processing. Just bind to a socket endpoint and wait for incoming work. Do the work and send a response.

When things go wrong or when you might restart a server manually, that endpoint address is still in the client side. Should a transaction be headed that way and the connection had not been reestablished then that message will block until that port instance reconnects. If the server is running via daemontools then it should restart any second. The transaction in the queue will be scooped up and procession will resume. The number of transactions queued per connection depends on the high water mark setting.

I say ’1′ transaction in the queue because we set the HWM (high water mark) when creating the connections. This is probably a good setting for realtime systems where losing transaction in an invisible queue is the least desirable event. You might also be able to add NOBLOCK on the send() function to get some other actionable events. It really depends on the applications tolerances.

At first I did not like the idea of losing the transaction(s) but I’m warming to the idea that the codebase will be smaller and possibly more reliable overall.

 
Leave a comment

Posted by on 2012/01/29 in architecture, Tools

 

Tags: , ,

‘take me off the list’

A couple years ago there was a “do not call list”. Well that did not work.  Every new and fly by night company ignores those laws anyway and by the time law enforcement or the attorney general investigates they are long gone. So what is a person to do, specially when they call several times a day, when the babies are sleeping or when my wife and I are sleeping?

Also, these new phone systems are pretty smart. Their voice recognition is really good and their artificial intelligence or workflow is even better.  I’ve been fooled 2 or 3 times already but I think I have the magic now.

Hmmm… I tried a few things…

  1. Talking and talking and talking…. it did not work, the machine has more patience than I do. – FAIL
  2. Every swear word I could think of… the machine just ignores me and asks another question like Eliza did. – FAIL
  3. Answering in the negative to every question… but either it would keep trying to sell me something or I would receive another call in a day or two. – FAIL
  4. Answering in the affirmative to every question… but that did not work either. It told me someone human was going to call me… but it kept asking questions; probably based on some law that required an electronic/audio signature. – FAIL

And that’s when I figured it out and it was all by accident. I happen to say ‘take me off the list’. That’s when the machine stopped talking and responded that it was going to take me off immediately.

Voila. So the next time I hear one of these stupid recordings that’s exactly what I’m going to say.

 

Tags: , ,

Proper use of a MQ designs

Bus, as a term referring to hardware/software components has been around long enough that many noobs have no idea of it’s origin or how to use it properly or when to use it.

In modern computing the bus has it’s origin in hardware. It was not always like that. Engineers went back and forth between direct connected components and bus architecture until the 1980s when IBM introduced BusMaster architecture in the PS/2. Things remained stagnant for a few years until there was yet another resurgence of direct connected hardware.

Memory I/O performance was increased by moving cache directly into the CPU, connecting the CPU to RAM via DMA type access.1

Disk I/O performance was increased when the disk controllers were allowed to talk directly to the systems RAM.

So it’s no wonder now that MQs are becoming easier to deploy that they are becoming the connective tissue between components rather than direct connections or API calls.

Old school programmers remember MQs like IBM’s very well when main memory was very expensive and it was cheaper to page services (SOA) in and out of main system memory when an actual event was there and ready for processing. The complete static application codebase could not fit in memory and implementing smaller services was practical.

MQs were not implemented or made popular on PCs until recently with the advent of J2EE, SOA, and the notion of an application bus. The idea being that services could be distributed across a single machine, a cluster of machines or a WAN cluster of machines.

One of my earlier recollections was when I worked for IBM and A/IX team on the A/IX microkernel.

Now we are experiencing a renaissance of a sort where software/system architects are installing MQs in every corner of their application design. I’m clueless as to why. I’m constantly wrestling with my own designs wondering where the value is and where the inevitable costs are. And one thing for sure is that there always tradeoffs. But here’s the thing and for me it’s starting to become a rule instead of a thing.

If it takes more time and/or more code to MQ an event [or the event contents is relatively large] to a service then the code that makes up that service should be statically linked rather than treated as a service because you are just generating heat and not money.

For example, in modern programming languages the first target application is typically hello world. And in most MQ implementations the first service is typically echo or add. Sadly, if the rule were implemented then these two services would never be written and we would have to think about more complex services to demonstrate the design (echo and add are so simple that they fool most managers).

In the simple case:

There is very little overhead here. The contract between the client and the add class/service is clear. The code is small enough, the data is small enough, that statically linking the code makes the most possible sense. Consider that the service that the service provides is but a few assembly instructions once you get past the instruction stack and memory fetches.

But if you’ve had a sip from the bug-juice then you see the world this way:

In this case the client thinks it’s calling the add function locally, and the add function thinks it’s being called locally, the fact is that may not be the case. The remote add stub and the MQ may be on the same machine or even CPU core in the same box but it might not be.

In fact the stub has a certain amount of overhead just because it exists. Then there is the communication overhead. (a remote connection across a WAN can take from 100 to 300ms on a good day), The marshaling of the data from one system to another has some overhead too. (back in the day of the original RPC there was a stub generator that would simply massage the endian-ness of the data so that it was cross platform; now things are much more complicated). The actual add() is only going to take one or two CPU clock cycles; which cannot be measured in ms. So for the sake of some grandiose style guide we have added huge amounts of overhead.

In a recent design I implemented an MQ/service-bus in order to handle the impedance mismatch between incoming connections using epoll, a single threads web server, and the need to be able to handle high transaction volumes.

In this case the service does a lot more than just add two numbers. The message from the end-user client needs to be parsed(1), some data needs to be decrypted(2), some initial decisioning needs to be performed(3), and then the transaction needs to be reformatted(4) and directed to a 3rd party for processing(4&5).  When the 3rd party completes the transaction, then the response needs to be parsed(6), a few more decisions made(7), and then a response is assembled for the end-user(8). And then the response is sent back to the client through the same path(9).

What makes this different than the add() service is that it’s performing real work in the form of the service. If I took all of the work-units that the service performed and split the service into the sub-parts then I could potentially have hundreds of services, each with 100-300ms of communication overhead. I’ve identified 9 possible steps in the transaction… and at 100ms per steps that’s almost a full second.

I’m trying to find a silver lining for a fine-grained SOA but I cannot. The work performed by the service is sync not async and therefore the product of the non-essential MQ is heat.

Two examples;

1) Google is working on GO, a new programming language. Rob Pike and Co have been describing Google’s code base. The code covers several programming languages and it is considered a monolith. GO is best used when it’s statically linked; I do not recall if it even supports dynamic libs.

2) Depending on your magnification the study of MQ starts with the hardware and some sub-components. Then  firmware, then the OS and it’s device drivers, and continues to build out. This looks a lot like a Mandelbrot image of sorts.

Our predecessors selected what was a service and what was a direct connect very carefully. Mostly based on ROI and the cost of gold. The same decisions are true today.

 

 
Leave a comment

Posted by on 2012/01/27 in architecture, web

 

Tags: , ,

Dynamic Languages and PCI-DSS

Some security experts, including myself, thought that implementing financial software using dynamic languages would create a security threat for the “company” or the account holder. However, as I sit here this morning contemplating an open source payment platform delivery system I realize that it’s a silly hypothesis.

Forgoing all of the traditional attack/fraud vectors I’m thinking about the code. The PCI-DSS covers the production hardware and database(s) but it also covers the developer’s computers, build machines, staging and QA, and the code repository. The “processor” is expected to treat the securely and equally. (this highest priority goes to the encryption keys and devices).

So if an attacker can get to any of these systems and inject code then you really have a bigger problem than whether the code was Python, perl or Ruby. Of course since Java can be executed anywhere then it can also be compiled anywhere. Reverse engineering Java and then recompiling is no more or less difficult. Of course injecting code into a Java or C based system is more complicated if the attacker is already in… regardless of the programming language, you’re cooked.

As I continue to consider my open source project it’s just a matter of selecting the right dynamic language for it. I want to be productive enough that the extra time can be spent on physical security instead of the false hopes of obfuscation.

 

Tags: ,

How to find programmers

Inc is running an article that finally makes sense of internet hiring. The leadership at Pulse, the company named in the article, told it’s programmers to start blogging. As a result they have started attracting attention from all corners. This certainly makes more sense than speed dating, code scraping, social ranking, etc… And of course it helps to have Inc do a story on your business.

First you are attracting people who are interested in the company and the work being done. Second you might be opening a dialog with the candidate before they are actually a candidate via blog comments. And finally, it likely is not going to cost you anything more than existing methods and it’s certainly less than professional recruiting services.

Pulse get’s a +1 from me.

 
1 Comment

Posted by on 2012/01/24 in for hire, management

 

Tags:

Hirelite – speed dating for jobs

Hirelite is another one of those last minute entries in the fly by night job site of the day websites. There is no doubt that the likes of Careerbuilder, Monster and TopJobs have lost their luster. But this is starting to look like a clown car at the circus. Just how many of these so called job search companies are there?

That was rhetorical. Don’t answer it.

While it’s true that there is a social aspect to the professional hiring process it’s certainly not akin to dating. The Bachelor is on TV right now and I do not see a resemblance to the hiring process there. When you date someone it’s usually because there is an intent on a level of permanency, in an employment situation you’re going in a different vector. In the 1980s it was commonplace to ask “where do you see yourself in 5 years”.

Anyway, as an employer I have a responsibility to find the best candidates though responsible means. As an employee I want to be hire by companies that aren’t trendy and show common sense. A 5 minute speed session is no way to find a mate and certainly not a valid way to find an employee.

 
Leave a comment

Posted by on 2012/01/24 in for hire

 

Hirewolf makes no excuses

Hirewolf is the latest in a series of employment service providers that promise to filter and test potential candidates in the hopes of getting a “golden ticket” to employment. (recently I wrote about GitHire)

What makes these guys different is that they make no excuses for the decision making process. They are going to decide the candidates fate by

“We will choose whichever project strikes us as most beneficial to the open source community.”

This is probably the most shameless description of the 1% solution that I’ve ever heard or read. Just in case I’ve never said it… I do not want to be scrutinized by any these services. These companies are not driven by the same rules that HR departments of proper companies which are prohibited from disclosing anything about your performance. All they can say is the dates of employment. And maybe your title.

So the fact that both of these companies are going to scrub your social identity and produce a score for the perspective employer… that you may or may not know… that even if you knew you might not have a chance to defend or refute the score.

Right now social aggregation does not have a passive place in the hiring process. The candidate must be actively involved… let the games begin.

 

Tags: ,

Domain Specific Framework

I’m grateful that Wikipedia does not have a reference to something called a Domain Specific Framework. So I get to define it here.

A Domain Specific Framework (DSF) is a set of hardware, languages, libraries, and best practices that make up a software development environment for a programmer(s) for implementing applications larger than “hello world” and of median complexity that does not require too much specialization or edge case libraries.

A perfect example of an ideal DSF might be Xcode for iOS application development. What makes it ideal is that it is self-contained and has most everything an iOS developer is going to need to implement, test and deploy an application for an iOS device. A second good example would be JEE (fka J2EE). (Grails, Rails and Django are good examples too)

Where I typically get derailed is when super heroes start with a base programming language, maybe they implemented version 1.0 of the application,  and then start stapling on libraries in order to develop a DSF for the team of developers that have been added to the team. These might be called a Custom Domain Specific Framework.

There is good reason for avoiding DSF style of application development if an application is going to have a fixed and manageable level of complexity. But when you know that the application is going to expand beyond that in a short, albeit relative, period then a DSF might be a better way to get started.

The justification for a DSF like JEE and Rails is huge. These and other DSFs are a basic common denominator for developers. It sets the bar for the conversation and for the development standards. To say that you know Ruby/RubyGems or Java/Maven and can read English is not enough to say that you know Rails or JEE development.

I cannot believe I’m about to suggest this but… .NET means something. So does Rails, JEE, Django, and so on. To say “we use java and a hundred other jars” will only add friction to your development lifecycle whether it’s hiring, training, building, testing or deploying. Hey I get it. Not all of the DSFs out there use best in breed libraries. That’s a completely different issue.

NOTE: The one important thing I forgot to mention is that when there is commercial potential for a DSF then it’s likely that there is going to be a commercial version… as there are several JEE vendors (IBM, Oracle). Glassfish is an open source late entry but it is certified and I do not remember who owns Jetty (probably RedHat). But the other thing is that there are plenty of books and good online documentation. You cannot say the same for a lot of custom DSF(s).

 

Tags: ,

Forced Pomodoro

I was in the middle of writing and testing my CRUD-fest article when it occurred to me that the evaluation versions of IntelliJ, PyCharm and RubyMine might actually be better than the paid-for version. Granted it’s not English to use the eval version and not pay for it but an American might split hairs a little while longer.

For the record I am preparing a purchase order but I might continue to use the eval version. That’s because the eval version pops up with a warning every 30 minutes. And every 30 minutes you have to restart the IDE. Now I’m not a fan of IDEs because they hide so much from you in terms of the full stack, however, they are pretty good when it comes to productivity when you want to concentrate on code and not framework. I’m also not a big fan of using Java or the JVM unless it’s packaged with the IDE; but that’s off topic.

The fact that I have to restart the IDE every 30 minutes means I have a better concept of time. I have a moment to catch my breath in order to focus on the next 30 minutes. The only thing I’m not certain of is whether it’s going to close my files before it turns itself off. All in all, even after I pay for it I might have to continue to use the eval version.

 

 
Leave a comment

Posted by on 2012/01/23 in agile, management

 

Tags:

CRUD-fest : grails, rails, django shootout

The mission is to deploy a CRUD implementation in all three frameworks by reverse engineering my schema from an existing Postgres Database which I will construct with raw SQL. Later I would like to add some data to the tables so let’s see how it handles some ETL (export transform load) in the form of a CSV file into some REST calls that I’d implement or some other type of messaging.

What I did not do! I think O’Reilly has the most comprehensive map of the history of all computer programming languages, however, GitHub has a list of languages that would seem to be current or relevant. Granted that some of this, to be effective, would mean investigating popular frameworks within the domain of languages. Well of that semi-complete list from GitHub I picked out this set: php, go, lua, haskell, erlang, scala, clojure, perl, javascript, rhino, nodejs, iOS, Objective-C, C++, C, Pascal, Pro-SQL, CoffeeScript, OCaml, Scheme, tcl, Smalltalk, Visual Basic. I think they are the most relevant. As it goes, however, either they do not reverse engineer schema from a PG (postgres) connection like most ORMs, or they do not have web or other application frameworks in order for a user to interact with the data, and many do not have IDEs or version managers they way that Ruby and Python do. (I cover the IDE topic later.) I think I picked the sweetspot of frameworks to test and skipped the ones that would distract me from the task.

Code Wars: PHP vs Ruby vs Python – Who Reigns Supreme [Infographic]

Let’s start with the schema design. The actual SQL is here. The “message” represents an ISO8583 message. You can follow the link to read more about the message. What’s important to know is that it represents the standard message used between certified credit card acquirers and their associations like Visa, MasterCard, Amex, Discover and many others.  It also represents the message used between the associations and their issuing processors. The message format is also reused by many POS terminal software vendors as well as gateway and technical acquirers/processors. Each association has slightly different implementations depending on their specific needs, however, most of the fields have common names and usage.

When building an endpoint or a gateway or any point in between there is a need to support ISO8583 and a bigger need to test the endpoint. As part of the testing phase it’s important to build a test harness that can generate the necessary test transactions. Depending the application’s position in the network it will have different testing needs. One thing for certain is that the messages and their contents need to be modeled, testable and repeatable. One huge challenge is TDD (test driven development) and another is simple regression testing. It’s my personal belief that a well defined toolset could be deployed in such a way that vendors across the board could contribute data a code for regression testing thus reducing the load on everyone. PS: while this mission is ISO8583 there is no reason why a request/response transaction could not be JSON, XML or S-exp; it’s a general enough schema.

  • TABLE: message_field_dictionary – this is a list of the field names and their formats. Nothing else.
    • FIELD: id (PK)
    • FIELD: field_id (FK)
    • FIELD: field_name
    • FIELD: field_description
    • FIELD: field_format (i.e.; ‘YYYYMMDD’)
    • FIELD: data_types (a, an, ans)
    • FIELD: default_values (i.e.; ’1′, ’2′, ’3′, ‘abc’…)
  • TABLE: message_request – this is the set of fields used in a particular request transaction. It is also possible that this represents the incoming request pattern. If the pattern does not match then that’s separate issue.
    • FIELD: id (PK)
    • FIELD: test_case_id (FK)
    • FIELD: field_id
    • FIELD: request_type
    • FIELD: field_value
  • TABLE: message_response – this is a set of fields in the response generated from the response. This might also represent a response message based on the incoming request pattern.
    • FIELD: id (PK)
    • FIELD: test_case_id (FK)
    • FIELD: field_id
    • FIELD: response_type (i.e.; absent, required, optional, conditional)
    • FIELD: field_value (i.e.; a value, regex, or a combination set, private function or other field_id)
  • TABLE: message_test_cases
    • FIELD: id (PK)
    • FIELD: test_case_id (FK)
    • FIELD: short_name (FK)
    • FIELD: description
    • FIELD: expected_results
    • FIELD: elapsed_ceiling
    • FIELD: is_active
    • FIELD: group_name
    • FIELD: sub_group_name
  • TABLE: message_test_results
    • FIELD: id (PK)
    • FIELD: test_case_id
    • FIELD: started
    • FIELD: finished
    • FIELD: elapsed_time
    • FIELD: results
    • FIELD: request
    • FIELD: response
    • FIELD: errors
    • FIELD: trace
  • TABLE: test_cards – I decided to put the test cards in a separate table because if contributors provided test transactions the actual card numbers and magstripes would be considered confidential data… and the associations do not want anyone recording that info with one possible exception.
    • FIELD: id – (PK)
    • FIELD: card_number (FK)
    • FIELD: serial_number (FK)
    • FIELD: expiration_date
    • FIELD: issue_date
    • FIELD: street
    • FIELD: zipcode
    • FIELD: pin
    • FIELD: atm_pin
    • FIELD: CVV
    • FIELD: CVV2
    • FIELD: track1
    • FIELD: track2
    • FIELD: track3
    • FIELD: reset_balance
    • FIELD: is_decrement
    • FIELD: actual_balance
    • FIELD: open_to_buy

The schema is self explanatory. I did not create any real indexes. I’m not certain (right now) whether I’m going to create any FKs or referential integrity. It depends on how much reverse engineering the different frameworks are going to execute. One thing for sure, this is not intended to be a lesson in DB design. Maybe another time. FKs might be required in order for the reverse engineering to work properly. Specially if I use tables to populate pulldowns and select lists.

The IDE I decided to use was RubyMine, PyCharm and IntelliJ. It is purely by coincidence that I decided to use this family of IDE from jetbrains. (for the record I’m currently using the demo version. I’m hoping that the licenses do not expire before I finish this article… this paragraph was written before coding began). What makes them interesting is that they support Django, Rails and Grails out of the box. After agonizing over it IntelliJ was the only reason why I included Grails. Java does not have a version manager like Ruby or Python, however, you can get there by collecting your jar files in a single folder alongside the JDK you’re using. And since many binary distributions of the JDK are in version folders it makes resetting the CLASSPATH and PATH easier… but still more manual than the ruby and python versions.

RubyMine heads up. When I originally installed RubyMine I had not installed RVM. I found a post from the folks at JetBrains about version 2.0.2 where they probably added RVM support. Anyway they said it just worked. That after a restart RubyMine would give you access to your tools. That was not the case. I had to take one extra step. I had to go into the preferences and navigate through the “Ruby and SDK” page. I also navigated through the gem sets for good measure. Now when I created my project I had access to my Rails version; previously unknown. I had a similar problem with PyCharm and it’s support for VirtualEnv but I will have to verify it with Django. (Shame on me. My desktop virtualenv did not have django installed. I will likely have to do a complete install based on my notes which were originally installed on my virtual machine and not my desktop).

Getting Started

Now that I’ve managed to crawl through the minutia of project preparation it’s time to start putting the project together. So Now that I have installed IntelliJ, PyCharm and RubyMine I have to create my project. I’m calling the project crud_fest_rb, crud_fest_py, and crud_fest_j.

Creating an empty rails and an empty django project was pretty simple. Specially after all the setup I’ve done in preparation. The one observation I’ll make is that there are a lot more artifacts in an empty rails project than there are in a similar django project. The Grails project has a lot more artifacts than that and since there is a compilation step it takes a lot longer to get started. One thing that bugs me about IntelliJ is that it starts the browser to a default page once it’s ready to run.

This project was not meant to be a JetBrains tutorial, however, I’ll mention a few more things. Rails started right away. Django required the user to enable the admin function, update the settings file to point to the proper DB, and then you had to manually execute the ‘manage.py syncdb’ command in order to create a default admin  user. This step will be required later in order to sync the db schema to the model. Rails and Grails are still under investigation.

One nice thing about the Rails and Django projects is that they respect SQLite3. Not that Grails goes out of it’s way to reject SQLite3 but the support is hard to come by. This means that when I put my SQL together it will need to support but SQLite3 and H2. Which will probably work but what a pain. I suppose I could use Postgres but there is nothing easier than a ‘cp’ command to reset the DB to it’s default. And if this project is successful then copying the output table to the target application means that that the SQLite DB file is now the config file.

The Schema

I have created the SQL. I was not going to embed the code directly but the Gist is here. It’s only six small tables and the foreign keys are few. There are a few constraints which should be removed when the constraints are fully represented in code instead of schema. (Keep in mind that when calculating performance things like O(log<n>) no longer makes sense when there are cascading reads based on constraints. And frankly it does not make any sense to have the constraint modeled in code and SQL at the same time.)

Import the Schema

… into the project is the next step.

Django

PyCharm could execute the following commands but currently it feels better to execute them manually from the command line.

The first step is making sure that the DBs are configured properly in the settings.py file. In a recent version of Django the developers made it possible and easy to support multiple and different databases simultaneously. That means I could connect to one DB for one set of actions and another DB for a different set. There are plenty of interesting use-cases here. So let’s configure the DB:

DATABASES = {
 'default': {
     'ENGINE': 'django.db.backends.sqlite3', # Add 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'oracle'.
     'NAME': '/tmp/crud_fest.db', # Or path to database file if using sqlite3.
     'USER': '', # Not used with sqlite3.
     'PASSWORD': '', # Not used with sqlite3.
     'HOST': '', # Set to empty string for localhost. Not used with sqlite3.
     'PORT': '', # Set to empty string for default. Not used with sqlite3.
 },
 'messages': {
     'ENGINE': 'django.db.backends.sqlite3', # Add 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'oracle'.
     'NAME': '/tmp/crud_fest_py.db', # Or path to database file if using sqlite3.
     'USER': '', # Not used with sqlite3.
     'PASSWORD': '', # Not used with sqlite3.
     'HOST': '', # Set to empty string for localhost. Not used with sqlite3.
     'PORT': '', # Set to empty string for default. Not used with sqlite3.
 }
}

(The indenting is not exact here but that’s a WordPress thing)

The second step is to make certain that the admin functionality that we previous enabled (and now stored in the default database: crud_fest.db) has been sync’d properly. (you’ll need to answer some questions about the admin user including the username and password.)

/Users/rbucker/git/flafreeit/crud_fest_py/manage.py syncdb

Now that the admin tables have been created, you’ll need to create the actual crud_fest_py tables.

cd ${HOME}/git/flafreeit/crud_fest_db
sqlite3 /tmp/crud_fest_py.db <./setup.sql

And then the last step is to dump or reverse engineer the table(s) into a models.py file.

/Users/rbucker/git/flafreeit/crud_fest_py/manage.py inspectdb > /Users/rbucker/git/flafreeit/crud_fest_py/test_config/models.py

Looking at the models.py file you’ll see something like (looks like I have some trimming to do; the admin tables were included):

# This is an auto-generated Django model module.
# You'll have to do the following manually to clean this up:
# * Rearrange models' order
# * Make sure each model has one field with primary_key=True
# Feel free to rename the models, but don't rename db_table values or field names.
#
# Also note: You'll have to insert the output of 'django-admin.py sqlcustom [appname]'
# into your database.
from django.db import models
class MessageFieldDictionary(models.Model):
    id = models.IntegerField(null=True, primary_key=True, blank=True)
    field_id = models.IntegerField(unique=True, null=True, blank=True)
    field_name = models.CharField(unique=True, max_length=25, blank=True)
    field_description = models.TextField(blank=True)
    field_format = models.CharField(max_length=200, blank=True)
    data_types = models.CharField(max_length=200, blank=True)
    default_values = models.CharField(max_length=200, blank=True)
    class Meta:
        db_table = u'message_field_dictionary'

This is just a sample of the tables that inspectdb generated… because there is one final step. Now that we have a models.py file with the individual schema we need to tell Django about the tables and the individual fields that need to be editable. There are some shortcuts; the online docs are really good. So we are going to create an admin.py file like this.

from django.contrib import admin
from myproject.myapp.models import MessageFieldDictionary

class MessageFieldDictionaryAdmin(admin.ModelAdmin):
    pass
admin.site.register(MessageFieldDictionary, MessageFieldDictionaryAdmin)

Once this last step is completed then you need to launch the django server and navigate to the admin site with your favorite browser. All of your tables should be there. You might still need to customize the widgets but this is the place where we stop.

One Final note. Sadly my decision to use SQLite means that I might have to do all of this all over again. It seems that the foreign keys have not been incorporated into the results of the ‘inspectdb’ command. There is a pragme in SQLite that enables FKs but it has to be compiled in beforehand. There was also at least one multi field constraint that does not appear.

But this is a good place to stop for now.

Rails

The first thing I noticed is that there is no code in Rails for reverse engineering a legacy database the way that Django does. So I had to install a missing gem.

gem install rmre

Then I had to reverse engineer my DB.

cd ${HOME}/git/flafreeit/crud_fest_rb
rmre -d /tmp/crud_fest_py.db -o ./app/models/

After the command completed I was returned to the command line. There were no error messages so I assume that it completed OK. I looked in the ./app/models/ directory and noticed that there were a handful of new files. These files were 1:1 with the table names. Here’s an example:

class MessageFieldDictionary < ActiveRecord::Base
    set_table_name 'message_field_dictionary'
end

This is a little hinky because none of the field names or types have been included. After doing some searching I found that this is OK and that ActiveRecord will fill in the blanks. I don’t know if I buy that. I like that Django fills in the holes and this sparse programming … *sigh*.

There is another command that is interesting:

rake db:schema:dump

This will dump the schema into a file db/schema.rb. It represents the complete schema. I suppose that this code could be copied to the model files. But for the moment this is not required… I think the db:migrate command will regenerate this db/schema.rb file when it completes.

Another caveat here is that it is possible to have multiple databases configured in the database.yml file. The difference, however, is that ActiveRecord needs to know which database goes with which definition. So there is some manual work to be done here. There are some simple google searches you can execute and most of them make perfect sense. It’s a little beyond the scope here even though I described the python version.

Grails

Grails supports multiple databases in it’s DataSource.groovy file. Since I’m working with Grails 2.0.0 there is a possibility that the latest Hibernate is included. Hibernate is the ORM that Grails uses to communicate with the DB. But the tool for reverse engineering needs to be installed.

cd ${HOME}/git/flafreeit/crud_fest_j
grails install-plugin db-reverse-engineer

The output was pretty simple.

rbucker@rmac[crud_fest_j]$ grails install-plugin db-reverse-engineer
| Plugin installed.

… but I have no idea which version was installed. So we move forward for the moment. The next step is to locate the h2 command line version.

java -cp ${HOME}/lib/grails-2.0.0/lib/com.h2database/h2/jars/h2-1.2.147.jar org.h2.tools.Shell

You’ll need to answer a few default questions. You can accept the default values for the moment.

Now that H2 is running and pointing to the same repository as the DataSource.groovy, now we need to run the SQL to create the database tables as we did previously. (I had to make some changes to the code because there are some differences with H2.

Now we try to do the reverse engineering…. actually, it’s not going to happen. I’m going to leave this up to the reader to complete. And if anyone wants to contribute, please, by all means. For the moment This is the end of the road for this project.

Conclusion

It is safe to say that I’m done with this project. While it seems plausible that I could reverse engineer a database using the Grails Plugin – the amount of configuration required just amazes me. Spring is heavily dependent on XML config files and it appears that Grails uses some of each. One thing for certain is that Java has these huge namespaces everywhere so just the slights config requirement for reverse engineering is so incredibly painful. I’m really surprised that the Grails guys did not do more Groovy scripting for this sort of thing.

Ruby/Rails on the other hand still required an outside GEM in order to perform the reverse engineering. I’m surprised that with Rails3 and 3.2 that they never addressed the issue directly. And not to mention that the resulting models were still sparsely emitted.

Finally, Django seems to have gotten it right. It emits the code in it’s entirety. Getting to the CRUD is a simple matter of some manual labor which the user could script easy enough. My vote is going to Python as the all around winner and Ruby a close second. The Java code is on hold, maybe we can call this a “did not finish”. The overall performance of the dependencies and the compile step make it a less valuable experience.And let’s not forget the JDK version madness.

So for the time being I’m inclined to purchase a license for PyCharm and RubyMine just because this is where I’m going to be spending my time for a little while… and it’s my money.

 
 

Tags: , , , , , ,

 
One Page Docs

Creating a library one page at a time.

One Page Bugs

Reducing the friction of writing and fixing bugs or features.

Follow

Get every new post delivered to your Inbox.

Join 223 other followers