
Tag Archives: python

N-way file merge in Perl, Python, Go and Lua [Java, Ruby] - Compared

[Update 2012-06-22] Here is the Java version of this assignment. It is awful. Once you stray from OO in Java the code inflates like a sea monkey, and for a job like this OO is clearly over-the-top overkill.

[Update 2012-06-22] Here is the Ruby version of this assignment. I like its compactness, although that came at a steep price: accessing hash elements meant clunky dereferencing, and string comparisons were just awful. [That was an error on my part; it works as you'd expect]

Not to beat a dead horse but I now have 4 example implementations in Perl, Python, Go, and Lua.

I did my complaining about Lua in a previous article. In summary: this example in Lua is verbose and lacks consistency. I’m not expecting to reduce this to a single LOC (line of code), but I would have liked some additional APIs that implement more efficient algorithms based on knowledge of the internals, or at least some well documented idioms.

The Go example was fun because version 1.x of the toolset is simple to use. I would regularly execute “go run merge_tick_data_hash.go file1.csv file2.csv” and it would run like a champ. The only challenge was that simple errors, which most dynamic languages let slide until the code actually executes, cause the compiler to barf. Initially I had no idea they were compiler errors, but it was easy enough to get used to. The compiled version of the code was lightning fast to start up and execute even though it was 1.4 MB in size.

The Perl version took some doing. I was able to reduce the LOC and optimize the code quite a bit. I think there is still some room for improvement based on the Python implementation, which was just a few lines smaller because it had the benefit of being written last. In this case I sacrificed adding the filename to the %ticks hash; that saved a few LOC but added some dereferencing which “might” be optimized by a good JIT. I cannot say for certain.
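For readers who have not seen the assignment, the following is a minimal Python sketch of what a hash-based n-way merge can look like. It is not the code from any of the implementations discussed here. It assumes each input is CSV with a sortable timestamp in the first column, and the names `merge_ticks` and `ticks` (borrowed from the Perl `%ticks` hash) are hypothetical.

```python
import csv
from collections import defaultdict

def merge_ticks(sources):
    """Merge several CSV sources into one list of rows ordered by timestamp.

    Assumption: column 0 of every row is a sortable timestamp string.
    `sources` is an iterable of iterables of CSV lines (open files work too).
    """
    ticks = defaultdict(list)            # timestamp -> rows seen at that time
    for source in sources:
        for row in csv.reader(source):
            if row:                      # skip blank lines
                ticks[row[0]].append(row)
    merged = []
    for stamp in sorted(ticks):          # one sort over the distinct keys
        merged.extend(ticks[stamp])
    return merged

if __name__ == "__main__":
    # Made-up sample data, only to show the call.
    file1 = ["09:30:00,IBM,195.10", "09:30:02,IBM,195.12"]
    file2 = ["09:30:01,MSFT,30.01", "09:30:02,MSFT,30.02"]
    for row in merge_ticks([file1, file2]):
        print(",".join(row))
```

The dictionary trades memory for simplicity: every row is held until all inputs are read. If the inputs are already sorted, a streaming merge such as `heapq.merge` would avoid that, at the cost of a different structure than the hash approach described above.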

I’d like to compare these implementations to a Ruby version but I’m just not a fan of RVM this week. As for the remaining candidates, I have to admit that I really liked the Go version. I do have a complaint, though: while “Go” (a prefix of Google) seemed like a good name for the project when it started, right now it is hard to search for. “Go” is such a small and common word that there is no way to optimize searches. One strong advantage of Go is the static linking once the project is compiled.

 
Posted on 2012/06/21 in ProgLang

Compare table length in Perl and Lua

In the beginning there was Perl; and in Perl was the array and the hash. Later there was Lua; and in Lua there is the table. The table in Lua seems to be both an array and a hash, where both are implemented as a hash. But let’s start with Perl.

In Perl (don’t panic, there are idioms that make this less straightforward):

# length of an array
my @a = (4,5,6);
print scalar @a;   #-- prints 3

# length of a hash
my %h = (four=>4, five=>5, six=>6);
print scalar keys %h;    #-- prints 3

In the array example I’m thinking that the number of elements is actually stored in the array and that evaluating the array in scalar context causes Perl to return the stored value. I really hate the idea that Perl might be counting every element in the array every time this is performed… and if it were, that might justify storing the length locally for the different uses, although this gets funky when multi-threading.

As for the hash version: keys() returns the keys held in the hash, and in scalar context it yields their count, just as in the array example.

In Lua (go ahead and panic as the code is ugly):

-- length of an array
local a = { 4, 5, 6 }
local count = 0
for _ in pairs(a) do count = count + 1 end
print(count)   --> 3
print(#a)      --> 3

-- length of a hash
local h = { four = 4, five = 5, six = 6 }
count = 0
for _ in pairs(h) do count = count + 1 end
print(count)   --> 3
print(#h)      --> 0

In the array example the values 4, 5, 6 are assigned to indices 1, 2, 3 respectively. When counting with pairs() we visit each entry in the table, much like the Perl function keys(). However, when we evaluate #a, unlike the Perl array functionality, Lua conceptually starts an index counter at 1 and increments it for as long as the value at that index is not nil. So for the array the two methods give the same answer.

In the hash example the counting method works exactly the same; however, #h fails completely: it returns 0 because h has no integer keys.

Clearly the language designers did not think this was a big deal, whether because counting the pairs is truly efficient, because there is another more efficient way, or because the reader is expected to fill in the gaps with more robust code. After all, Lua is not made up of much code anyway.

I’m not certain that I really care, but it is frustrating, especially when modern languages like Python and Ruby seem to have done away with this particular issue. And as a side comment, we should do away with idiomatic programming.
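For comparison, here is the same exercise as a short Python sketch. The variable names mirror the Perl and Lua snippets above; `holes` is a hypothetical extra case added only to contrast with Lua's `#` operator.

```python
a = [4, 5, 6]
h = {"four": 4, "five": 5, "six": 6}

# One built-in covers both containers.
print(len(a))   # 3
print(len(h))   # 3

# None is an ordinary element, so a "hole" in the middle of a list
# does not change the reported length.
holes = [4, None, 6]
print(len(holes))   # 3
```

In CPython both containers keep their size on hand, so `len()` does not walk the elements, which is the behavior hoped for in the Perl discussion above.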

 
Posted on 2012/06/20 in architecture, ProgLang

Monty Hall Paradox

[Updated 2012-06-04] I added a Perl version of the same function.

I was recently taking part in a technical interview when the interviewer presented me with the Monty Hall problem. This is nothing like The Full Monty. It is something completely different. I started to solve the problem using a random DSL of my choosing… which turned out to be something akin to pseudocode, or maybe more like the detailed comments one might write for assembler code, ’cause most comments are getting a bad rap these days.

My intuition told me that if we started with 3 doors, 1 car and 2 goats, then once Monty removed one of the goat doors my odds would go from 33% to 50% whether or not I changed my guess. Meaning that I was going to have the same odds no matter what I did. (spoiler alert: I was wrong)

So on my flight home I decided to solve the problem or at least simulate it. (the solution is over my head)

#!/usr/bin/env python
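import random
from functools import reduce  # built in on Python 2, must be imported on Python 3

def monte_hall(change):
    # Monty always removes a goat, so switching wins exactly when the
    # first guess was wrong; the opened door never has to be modeled.
    car = random.randint(1, 3)
    guess = random.randint(1, 3)
    if change:
        return guess != car
    return guess == car

def monte_run(change, trials):
    # Start the accumulator at 0 and play exactly `trials` games.
    wins = reduce(lambda acc, _: acc + monte_hall(change), range(trials), 0)
    print("trials (%d), wins (%d) ratio (%f)" % (trials, wins, float(wins) / trials))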

if __name__ == "__main__":
    trials = 10000
    monte_run(False, trials)
    monte_run(True, trials)

The code has a few shortcuts in it, but those were only added after the brute-force version was working. There may even be a way to reduce the code in the monte_hall() function, but it’s small enough for me now. Using reduce() and lambda were the fun parts, thanks to erlang.

As a result of the simulation here are a few things I learned:

(1) if you do not change your selection after Monty discards one goat/door then your odds of winning are 33% (1/3)

(2) if you ALWAYS change your selection after Monty discards one goat/door then your odds of winning are 67% (2/3)

(3) and while I did not actually focus on this, if you randomly (50:50) change then your chances of winning are 50%. That is not so strange after all: it is the average of (1) and (2), 0.5 × 1/3 + 0.5 × 2/3 = 1/2. This choice has a few problems, mainly that there aren’t enough trials on the actual show for it to pay off. So choosing (2) would always seem to be the best strategy.
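The three cases above can be checked with a slightly longer sketch that models Monty's door explicitly instead of using the shortcut. It is written for Python 3, and the names `play`, `win_rate` and the strategy strings are hypothetical, chosen only for this illustration.

```python
import random

DOORS = (0, 1, 2)

def play(strategy):
    """Play one game and return True on a win.

    strategy: "stay", "switch", or "coin" (switch with probability 0.5).
    """
    car = random.choice(DOORS)
    guess = random.choice(DOORS)
    # Monty opens a door that is neither the contestant's guess nor the car.
    opened = random.choice([d for d in DOORS if d not in (guess, car)])
    if strategy == "switch" or (strategy == "coin" and random.random() < 0.5):
        # With three doors exactly one other closed door remains.
        guess = next(d for d in DOORS if d not in (guess, opened))
    return guess == car

def win_rate(strategy, trials=100000):
    """Fraction of wins over `trials` independent games."""
    return sum(play(strategy) for _ in range(trials)) / trials

if __name__ == "__main__":
    for strategy in ("stay", "switch", "coin"):
        print(strategy, round(win_rate(strategy), 3))
```

Run it a few times: the three ratios should land near 1/3, 2/3 and 1/2, with small differences from run to run.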

I decided to rewrite the code in Perl:

#!/usr/bin/env perl
use List::Util qw(reduce);

sub monte_hall($) {
  my ($change) = @_;
  my $car = int(rand(3));
  my $guess = int(rand(3));
  return $guess != $car if $change;
  return $guess == $car;
}

sub monte_run($$) {
  my ($change, $trials) = @_;
  my $wins = reduce { $a + monte_hall($change) } 0, 1 .. $trials;
  printf("trials (%d), wins (%d) ratio (%f)\n", $trials, $wins, ($wins / ($trials*1.0)));
}

my $trials = 10000;
monte_run(0, $trials);
monte_run(1, $trials);
 
Posted on 2012/06/01 in for hire

Python PEP-405 – virtualenv-like

PEP-405 is a recommendation to include some virtualenv-like functionality in the Python stdlib. I suppose this idea might actually fly if Python were driven from a single PYTHONHOME or PYTHONPATH environment variable, and for the most part it seems that PEP-405 suggests that potential.

It should be noted that this PEP was also endorsed by Ian Bicking, the inventor of the proper virtualenv. –PEP-405

There is some discussion about backward compatibility but it is sort of vague and very mystical in a hand-waving sort of way. One reason it might actually work well is that a single application like the current virtualenv toolkit would no longer have to carry around all of the version info needed to work in each Python version.

But let’s be clear. PEP-405 is not virtualenv. It is virtualenv-like. It is also approved for deployment in Python 3.3 and I do not see anything about backporting.
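The stdlib module that came out of PEP-405 is called `venv`. As a minimal sketch of the virtualenv-like part, assuming Python 3.3 or later, the snippet below creates a throwaway environment and checks for the `pyvenv.cfg` marker file the PEP describes. The helper `make_env` and the directory name `demo-env` are hypothetical.

```python
import os
import tempfile
import venv

def make_env(parent_dir):
    """Create a throwaway environment under parent_dir and return its path."""
    env_dir = os.path.join(parent_dir, "demo-env")   # hypothetical name
    # Default options: no system site-packages and no pip bootstrap.
    venv.create(env_dir)
    return env_dir

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = make_env(tmp)
        # The interpreter recognizes an environment by this file at its root.
        print(os.path.isfile(os.path.join(env_dir, "pyvenv.cfg")))
```

From a shell the equivalent is `python3 -m venv demo-env`. Note that this only isolates packages for the interpreter that ran it; it does not install other Python versions.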

Virtualenv is a killer feature. If they miss the mark and abandon all that came before I hope that someone picks up the slack.

 
Posted on 2012/05/28 in architecture, Tools

Another killer app for Perl

I’ve written about perldoc and CPAN as being Perl’s killer apps. I’ve also written about Ruby’s RVM and Python’s Virtualenv. Now I get to write about Perl’s perlbrew.

I’ve been tweeting (@rbucker) with a couple of techies today as a result of a comment that one of them made, something to the effect that virtualenv was going to be made a core Python app, suggesting that it was going to be rolled into the distro.

If you’ve been around a while and you have a little intuition… it should be going off at this very moment. I’m not going to go into the high level discussion that I had with these guys nor am I going to go into the micro details. What I will say, in summary, is that this is a very bad idea and that virtualenv should become very unstable as a result.

Which got me thinking about Ruby and Perl. On the one hand I know that Ruby has RVM but is there something for Perl? Yep! As I write this article I have installed perlbrew and I’m installing Perl 5.16.0 at this very moment.

I do not know anything about perlbrew at this point other than it seems to be installing Perl properly and in userspace where I want it. If I have the required prerequisites, all should be well in the next little while. I really like Perl and Python. The idea of dumping Python feels like jumping the shark. Perl-6 and Python-3 feel unnatural at the moment. I’m just hopeful that virtualenv and perlbrew can keep my world glued together until the rest shakes out.

 
 


Java: everything should be public

If not everything then at least all of the methods and classes.

I wish I knew the history of this decision and, more importantly, what is keeping this artifact of the language in place. I suppose from a historical perspective it has not really caused any trouble. The language designers had some ideas that were rooted in commercial software and commercial software libraries. I’m remembering various commercial JDBC drivers, crypto drivers, X.25 drivers, MQ drivers. But in the modern development environment black box development is no longer the norm, so it might be time to change with the times.

Look at Ruby, Perl, Python, even Groovy. They are all dynamic languages. They are all compiled or processed at runtime, so there is no benefit to private or protected objects: the code is there for the reading if you are so inclined. Java and C++ are compiled languages, although Java does have some capability for runtime metaprogramming. But while developers historically purchased libraries to supplement the core JDK, they are now using Maven repositories much like Ruby’s Gems, Python’s PyPI, and Perl’s CPAN.

private and protected are now more for vanity than any “protection” that Java’s creators had envisioned.
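To illustrate the point about dynamic languages, here is a small Python sketch showing that "private" there is a naming convention rather than an enforced barrier. The `Account` class and its attributes are hypothetical and exist only for this example.

```python
class Account:
    def __init__(self, balance):
        self._balance = balance   # single underscore: "please don't touch"
        self.__pin = "1234"       # double underscore: name-mangled, not hidden

    def balance(self):
        return self._balance

acct = Account(100)
print(acct._balance)        # convention only, nothing stops the read
print(acct._Account__pin)   # the mangled name is still reachable
```

Name mangling exists mainly to avoid accidental clashes in subclasses. Anyone who can read the source can reach the attribute, which is the situation described above for code that ships as source.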

 
Posted on 2012/05/09 in ProgLang

In defense of dynamic languages

There are a good many truths and there is a better set of likelihoods. Given the current state of dynamic languages, they are less performant than static and functional languages; however, it is also true that dynamic languages are more productive than static and functional languages. (I am not talking about savants)

Don’t optimize your code at the first stage. First make it right, then (if necessary) make it fast (while keeping it right). –erlang programming rules

It is likely that regardless of the size of your project, the size or makeup of your team, or the breakthrough that you think the project represents… your project is going to have average results at best. The Googles, Facebooks and Twitters of the world are extreme edge cases. As proof, look at the iPhone app store. There are over 600,000 apps and only a very small fraction of those apps have the following that Angry Birds does.

So before you go off in a corner reinventing the wheel in your favorite language, consider this: what is going to be your return on investment? I cannot blame you for learning a new language or tool that would enhance your marketability, or even just for hobby’s sake. But if your intent is to make some money and maybe a little independence, then you really need to consider your ROI. And if you’re making money, then rewriting your killer app in whatever killer fast programming language is available (and popular) will make plenty of sense.

This is why I’m hot on python and python’s django, tornadoweb, flask; perl and perl’s mojolicious; ruby and ruby’s sinatra and rails; redis, sqlite, zeromq.

PS: While I’m not a fan of erlang, partly because of what it represents, I really like its Programming Rules and Conventions (PRC). By comparison python’s PEP-8 is amateurish. The PRC starts off with ideas like the one quoted above and gives you ideas on how best to approach the problem. This is like python’s PEP-20 but again it’s like signing your name with a crayon instead of a fountain pen.

 
Posted on 2012/05/03 in architecture, ProgLang

Where is Google now?

Over the past few days there have been press reports that Google was deprecating some of its tools, as evidenced by Google’s own project pages (link1, link2). What has me concerned about this policy is that I might have an idea for the next great webapp, or I might have a client using some critical tools that Google is deprecating… now what?

As an outside observer it’s too difficult to know which projects are in or out. It’s probably safe to say that GMail and Google Apps are in. While GMail is free and there is a free version of Google Apps, there is a commercial component here too. But what about AppEngine? Well, there seems to be an ecosystem here and they just released Go 1 for AppEngine. But while this is fun and interesting for geeks and internal Googlers, what does it mean for external businesses?

I think that Google is a riskier play than say deploying on a virtual or dedicated host or even another cloud vendor. And until someone can corner Google management with a commitment it might be better to pass on AppEngine for now.

That said, the platform development strategy going forward will be either Java or Python (probably Python), making certain that the code is compartmentalized into libraries that will work on either platform… giving the client flexibility. The good news is that Django also works in both spaces.

 
Posted on 2012/04/25 in architecture

RVM excels over virtualenv… (update) NOT!

[Update 2012-04-21:] What a freakin’ mistake. Rails is such crapware that it defies explanation or description. I had just completed installation of Rails on 2 different Macs and an Ubuntu server. I then created my “demo” project to make sure that everything was installed properly, and I discovered that it was not. This was a pretty good patch and it went flawlessly. When I went back to my project and tried “bundle install”, each of the projects barfed. When I finally got the Ubuntu installation to feign completion I ran “rake about” and I now get JavaScript errors. I get it, I’m missing more prerequisites. This reminds me that I had a complaint about Ruby, Rails and gems: the dependency stack is just too freakin’ deep. There is no way that anyone knows everything from shell to DB. Think about the autoconf tools. They are so long in the tooth these days that they are more magic than reality. The difference is that the executables there are clear, maintainable and reproducible. Ruby and Rails are no more eliteware than erlang. Show me someone who claims to be a Ruby expert and I’ll show you someone who builds vaporware on pretendware. (I’m pissed for spending hundreds of dollars on books and RubyMine; for spending weeks letting my mind consider that there was some value in Ruby; for taking a Ruby job in Alabama… which I was converting to anything else… and for scanning the Ruby job boards over the last few months.) Virtualenv might not support many different versions of Python; however, Python just freakin’ works and the same can be said for Django.

[Update 2012-04-20:] With RVM you can install just about any version/revision of Ruby that suits you. I am in the process of installing 1.9.3-p194 right now, and RVM supports a number of different flavors like MacRuby and ree. Virtualenv, on the other hand, is at the mercy of a userspace installation of the target Python, and even then versions like pypy require patches not yet pulled back into virtualenv. I cannot say that this is the only reason for moving to Ruby from Python but it is a pretty strong one.

… when dealing with the issue of language versions. RVM gives you direct access to the versions of Ruby currently available and installed, while virtualenv puts the burden on the user. And installation is a pain in the ass.

 
Posted on 2012/04/21 in updates

Your Next Web Application Framework?

Suppose you are the person who has to decide what language and framework your startup is going to use to deploy its application. What would you choose? There are so many interesting and qualified frameworks that are already powering a good portion of the internet.

(You don’t have to know the language and framework, but you should know enough to argue what makes it ideal.)

What would it be?

For example: I read an article several years ago that strongly recommended Erlang. At the time it was a great idea. The author suggested that using Erlang would attract smart people and keep the actual number of respondents to something manageable. Since then I have implemented several applications in it, and in hindsight: (a) it is impossible to attract new talent; (b) the more time that passes, the more fragile the app gets because my detailed recollection is fading; and (c) it lacks the common tools that would allow generalized apps to give access to “operators” instead of programmers.

Other examples include: perl, ruby, python… mojolicious, rails and sinatra, flask, tornadoweb, and django.

I have my ideas… what are yours?

 
Posted on 2012/04/12 in architecture, future, web

 