Tag Archives: credit card

Eventually Consistent Storage Will Save Mankind

I recently read a tweet from @justinsheehy, the very public face of Riak at basho.com. He wrote:

Paraphrasing @GeorgeReese: to be protected from failure, put as much of your data in an eventually-consistent system as possible.

In response, and without thinking too deeply, I asked the question:

@justinsheehy @georgereese good point so why not a flat file and import later? Why all the extra cycles/rotations to write to any type DB?

And then @georgereese and I started to converse at 140-byte intervals until he sent me a link to this article: Eventual consistency – Wikipedia, the free encyclopedia. At that moment I realized that my original question was really more of a statement: in its absolute simplest form, the wiki definition of eventual consistency can be applied to a flat file on a DOS-based computer, so long as you take backups and restore them on another computer… at some point in time.
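To make the flat-file idea concrete, here is a minimal Python sketch. It assumes a JSON-lines journal on local disk and SQLite as the store you import into later; both are illustrative choices, not anything Sheehy or Reese proposed. Each write is appended and fsync'd immediately, and the import can run at any later time. Re-running it converges on the same result.

```python
import json
import os
import sqlite3


def append_event(path, event):
    """Append one event as a JSON line and force it to disk before returning."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event, sort_keys=True) + "\n")
        f.flush()
        os.fsync(f.fileno())


def import_later(path, conn):
    """Replay the whole journal into SQLite. Re-running it is harmless because
    INSERT OR IGNORE skips event ids that were already imported."""
    conn.execute("CREATE TABLE IF NOT EXISTS events (id TEXT PRIMARY KEY, body TEXT)")
    with open(path, encoding="utf-8") as f:
        for line in f:
            event = json.loads(line)
            conn.execute(
                "INSERT OR IGNORE INTO events (id, body) VALUES (?, ?)",
                (event["id"], line.rstrip("\n")),
            )
    conn.commit()
```

If you call import_later twice over the same journal, the table still ends up with exactly one row per event id. That convergence is the property eventual consistency is after.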

That said, I think (and I could be wrong) that Sheehy and Reese were probably talking about Riak, which has a lot more moving parts than, say, a zipped flat file and rsync. And there is plenty of computer science reference material that discusses BLOC (bugs per line of code): more moving parts means more places for bugs to hide.

I’m currently designing and implementing a credit card payment gateway. It’s not overly complicated; however, the most interesting piece of this implementation is the use of Redis as the storage engine. While Redis stores everything in memory, I have enabled the feature that saves the data to disk. So even though I have not enabled replication… this “system” can be described as eventually consistent.
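Here is a rough Python sketch of the same idea at a smaller scale: an in-memory store that is periodically snapshotted to disk. It is only loosely in the spirit of Redis’s disk persistence. The class, the JSON file format and the snapshot trigger are all hypothetical, not Redis internals. Anything written after the last snapshot exists only in memory, and that is exactly the window in which the disk copy lags behind.

```python
import json
import os
import tempfile


class SnapshotStore:
    """In-memory dict that is saved to disk only when snapshot() is called."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.data = json.load(f)

    def set(self, key, value):
        self.data[key] = value  # memory only until the next snapshot

    def get(self, key):
        return self.data.get(key)

    def snapshot(self):
        # Write a temp file in the same directory, then atomically rename it over
        # the previous snapshot so a crash never leaves a half-written file.
        dirname = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=dirname)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)
```

If a process restarts after set("b", ...) but before the next snapshot, it comes back with everything up to the last snapshot and nothing after it.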

Eventual consistency is so much more interesting when applied generally and globally across systems instead of to narrowly defined applications.

In the interest of full disclosure: I recently interviewed with @justinsheehy for a position on the Riak project. I recognized that I had not performed well, having had only a few hours of sleep thanks to my pair of newborns, but I have not yet received any formal feedback. This conversation and post are meant to be informative, written with the sincere hope that one day Basho might offer me a position.

Posted by on 2011/10/03 in database

OLTP benchmarking is hard to do

I’ve built a number of successful OLTP systems used in the credit card/prepaid card marketplace. One of these systems handles around 12M transactions a day and the other around 900K. The first system has a lot more headroom: its CPU, disk and network are barely breathing. The latter, on the other hand, struggles, and over the last few years I have found myself up late thinking about it.

The 12M system runs on Sun hardware with an Oracle database backend. The application was written in C using Oracle’s embedded SQL. Multiple instances of this one application run on the same box as the DB, and the entire hardware/software stack is duplicated per client. The application connects directly to the internal network, where OLTP transactions are routed from the company’s internal POS devices for the closed-network cards and from the credit card associations for the open-network cards. It also provides APIs that the other applications call for services such as the help desk, card boarding and plastics. Reporting is done in Perl and connects directly to the database.

The 900K system runs on big honking Dell PCs with a SAN to store the data and ease backups. The stack is Microsoft SQL Server, with the business logic implemented as stored procedures and the message normalization for transactions coming from the associations written in Java. The asynchronous socket connections to the associations can be duplicated as needed, and so can the gateway hardware that processes these transactions. Each transaction is then sent to the database as a call into the first stored procedure, which looks up the list of rules (implemented as other stored procedures) that make up this transaction. As control passes from one stored procedure to the next, the data each one collects and works on is rolled into the parameter call stack to prevent rereads from the DB. The execution of the stored procedures was not bad. For that matter, it was a decent implementation that met or exceeded many of the design requirements, if I do say so myself. But it was still too slow.
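The rule-chain pattern can be sketched in Python with functions standing in for the stored procedures. In this sketch the rule names, the RULES table and the account fields are all hypothetical. The point is that the account is loaded once, and every rule then works on the same in-memory context instead of rereading the DB.

```python
from typing import Callable, Dict, List

# A rule reads and updates a shared context dict; it returns False to decline.
Rule = Callable[[dict], bool]


def check_status(ctx: dict) -> bool:
    return ctx["account"]["status"] == "active"


def check_limit(ctx: dict) -> bool:
    return ctx["amount"] <= ctx["account"]["balance"]


def debit(ctx: dict) -> bool:
    ctx["account"]["balance"] -= ctx["amount"]
    return True


# Hypothetical rule table: transaction type -> ordered list of rules.
RULES: Dict[str, List[Rule]] = {
    "purchase": [check_status, check_limit, debit],
    "balance_inquiry": [check_status],
}


def process(trantype: str, amount: int, load_account: Callable[[], dict]):
    # One read up front; the context is passed from rule to rule, much like
    # rolling collected data into the stored-procedure parameters.
    ctx = {"amount": amount, "account": load_account()}
    for rule in RULES[trantype]:
        if not rule(ctx):
            return "declined", rule.__name__
    return "approved", None
```

Returning the name of the failing rule gives you a cheap hint for root-cause analysis without writing a log line per step.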

I failed to mention a few details. The 900K system implemented 4-way master-master replication, so each machine was processing every transaction from every source. Just think about what happened when batch fee processing was running! [update] Each node was an 8-core system with 4 or 8 GB of memory per core.
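The daily volumes are easier to compare as average transactions per second. The small calculation below also shows how 4-way master-master replication multiplies the work, since every node applies every transaction. These figures are full-day averages only. Real traffic peaks well above them.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_tps(transactions_per_day):
    """Average (not peak) transactions per second over a full day."""
    return transactions_per_day / SECONDS_PER_DAY


def replicated_applies_per_day(transactions_per_day, nodes):
    """With N-way master-master replication every node applies every transaction."""
    return transactions_per_day * nodes


print(round(average_tps(12_000_000), 1))         # 138.9
print(round(average_tps(900_000), 1))            # 10.4
print(replicated_applies_per_day(900_000, 4))    # 3600000
```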

So where did we go wrong? Well, I have a checklist:

  • The 12M system had only 5 tables; the 900K system had over 100 tables in the auth system.
  • Many of the 900K system’s tables should have been in code, either hard-coded or preloaded during startup.
  • The transactions in the 900K system were lazy. They only read from the DB when they needed data, which meant more round trips, and in some cases lock escalation.
  • Some of the indexes used B-trees instead of hashes.
  • Some tables simply had too many indexes that did not apply and were never used, confusing the optimizer and just taking more time.
  • Using a document approach for an account should have improved overall performance, with the document holding all of the account information and the current transaction history in one place.
  • Logging is a killer. More logging means more I/O, which steals a large fraction of each transaction’s time. Consider Redis: they say it can do something like 1M operations per second. But if you log 100 messages into its pub/sub per transaction, you are only going to get 10K TPS at most. And if you read and write to an MQ several times in a transaction, you will run into other performance-robbing events. (We did not use an MQ; however, we did a lot of logging.)
  • While modern SQL databases are getting better, there are all sorts of arguments for going NoSQL. This works to an extent, but it puts a different burden on the design team: you now have to implement a robust set of APIs for things you would otherwise leave to some SQL magic.
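The logging and round-trip bullets come down to simple budgets. Here is a back-of-the-envelope sketch in Python. The 1M ops/s and 100-messages figures come from the logging bullet. The 1 ms round-trip time is a hypothetical value chosen only to show the shape of the lazy-read cost.

```python
def max_tps(store_ops_per_second, ops_per_transaction):
    """Upper bound on TPS when every operation in a transaction hits the same store."""
    return store_ops_per_second / ops_per_transaction


def io_wait_ms(round_trips, round_trip_ms):
    """Time a transaction spends just waiting on the database."""
    return round_trips * round_trip_ms


# Figures from the logging bullet: ~1M ops/s and 100 log messages per transaction.
print(max_tps(1_000_000, 100))          # 10000.0
# Counting the transaction's own write as well makes it slightly worse.
print(round(max_tps(1_000_000, 101)))   # 9901

# Hypothetical 1 ms round trip: ten lazy reads versus one up-front read.
print(io_wait_ms(10, 1.0), io_wait_ms(1, 1.0))  # 10.0 1.0
```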

I think that covers things. I did a small proof of concept after I left the 900K company: a system with no logging, a document container for the account, hash indexes for the tables that mattered, a limited number of tables overall, and no SQL at all (thank you, BerkeleyDB/Sleepycat). I managed to get 1400 TPS on very modest hardware with a very modest CPU.
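Here is a minimal Python sketch of the document-per-account idea. It swaps in the standard library’s dbm.dumb (a simple on-disk hash store) for BerkeleyDB, so it is an analogy, not the proof of concept itself. The account fields and the approved/declined strings are hypothetical. Each authorization is one keyed read and one keyed write, with no SQL and no log writes.

```python
import dbm.dumb
import json


def open_accounts(path):
    # dbm.dumb stands in for BerkeleyDB's hash access method in this sketch.
    return dbm.dumb.open(path, "c")


def add_account(db, pan, balance_cents):
    # One document per account: balance and recent history live together.
    db[pan] = json.dumps({"balance": balance_cents, "history": []})


def authorize(db, pan, amount_cents):
    """One keyed read, one keyed write, no SQL and no log writes."""
    raw = db.get(pan)
    if raw is None:
        return "unknown_card"
    doc = json.loads(raw)
    if doc["balance"] < amount_cents:
        return "declined"
    doc["balance"] -= amount_cents
    doc["history"].append(amount_cents)
    db[pan] = json.dumps(doc)
    return "approved"
```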

Now, the things I did to get these numbers are not totally unreasonable; however, they break a lot of rules from the business point of view. Business owners like to be able to perform root-cause analysis, especially when something bad happens, so some amount of logging is inevitable. And SQL is really important for report generation, especially when the genius programmers cannot be bothered.

So there is room in my head for yet another full-blown system. If you look over at the Box Files section in the sidebar, there are some system designs that I’m putting together. I’m hoping that someone might actually pay me to develop them. Any takers?

Posted by on 2011/08/05 in beta, credit cards

A new approach: HamsterDB

Revisiting my favorite subject again, credit card processing: hamsterDB’s description on the NoSQL website triggered an alarm.

hamsterDB – (embedded solution) ACID Compliance, Lock Free Architecture (transactions fail on conflict rather than block), Transaction logging & fail recovery (redo logs), In Memory support – can be used as a non-persisted cache, B+ Trees – supported

The key words are “lock free”. In a typical credit card issuing system you can expect to see transaction times from 50 to 500 ms, depending on the amount of work the authorization system has to perform, DB latency and locking.

A typical transaction workflow looks like code that gets some data from the DB, does some work, gets some more data from the DB and does some more work. And while performing I/O with the DB, you always have to be ready for a failure. Typical failures are deadlocks, consistency errors because another process updated a record, and so on. Leaving breadcrumbs and trying to recover from these failures simply makes the code more complex.

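For example, a versioned update like the one below only succeeds if the row still has the version that was read earlier. If another process has changed the row in the meantime, the update touches zero rows and the caller has to recover:

update account set balance=balance-10.00, version=version+1 where ccnumber=? and version=11;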
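Here is a runnable sketch of that versioned update using Python’s built-in sqlite3. The table layout is an assumption, and money is kept as integer cents (1000 for 10.00) to avoid floating-point surprises. A rowcount of 0 is the signal that someone else changed the row first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (ccnumber TEXT PRIMARY KEY, balance INTEGER, version INTEGER)")
conn.execute("INSERT INTO account VALUES ('4111111111111111', 10000, 11)")
conn.commit()


def debit(conn, ccnumber, cents, expected_version):
    """Optimistic write: succeeds only if the row still has the version we read."""
    cur = conn.execute(
        "UPDATE account SET balance = balance - ?, version = version + 1 "
        "WHERE ccnumber = ? AND version = ?",
        (cents, ccnumber, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows means another process got there first


print(debit(conn, "4111111111111111", 1000, 11))  # True: version 11 -> 12
print(debit(conn, "4111111111111111", 1000, 11))  # False: stale version, no change
```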

With the hamsterDB approach you can, and should, get all the data you need from the DB up front, at the beginning of the transaction. Keep in mind that in some use cases this data could be prohibitively large, so it’s best to understand the scope completely. Then run the workflow as you normally would… leaving you with a set of DB updates/inserts that need to be executed. So execute them.

Now, if anything goes wrong, you have choices: 1) retry the DB changes from the last step; 2) retry the workflow from step two; or 3) retry the entire transaction. It simply depends on the nature of the write failure and what you determined was the best recovery.

And here’s why this is better. Given the performance profile of an authorization (50-500 ms) and the timeout permitted by the association (10-45 seconds, depending on the transaction type), you can retry this transaction internally almost any number of times in order to get a positive response… provided the error was internal and not related to hardware, the network, etc.
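Here is a storage-agnostic Python sketch of option 3 (retry the entire transaction) inside a time budget. Even at the slow end, a 10-second budget divided by 500 ms per attempt leaves room for about 20 attempts. The function names and the WriteConflict exception are hypothetical. Only write conflicts are retried, so hardware and network errors still propagate to the caller.

```python
import time


class WriteConflict(Exception):
    """Raised by the write step when a versioned UPDATE matches zero rows."""


def run_with_retry(read_all, run_workflow, apply_writes,
                   budget_seconds=10.0, clock=time.monotonic):
    """Read up front, compute the writes, apply them; on a conflict restart the
    whole transaction until the time budget is spent.

    Only WriteConflict is retried; any other exception propagates.
    """
    deadline = clock() + budget_seconds
    attempts = 0
    while clock() < deadline:
        attempts += 1
        snapshot = read_all()            # all the data the transaction needs
        writes = run_workflow(snapshot)  # pure computation, no DB I/O
        try:
            apply_writes(writes)
            return writes, attempts
        except WriteConflict:
            continue                     # another process changed the data first
    return None, attempts
```

Because the workflow step does no I/O, restarting it is cheap. The expensive part of each attempt is the up-front read plus the final writes.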

A couple of other things did not escape my eye. Kevin Smith (formerly of Riak) is the Erlang client maintainer, and there is optional encryption (great for PCI).

On the downside, there is no replication, no sharding, and no Python, Perl or traditional C. However, the approach would be interesting on other platforms… almost there.

Posted by on 2011/06/23 in credit cards, database

Redis In Payments

A merchant checkout/shopping cart has a number of hurdles to overcome when accepting credit card transactions. Some are obvious, outward-facing challenges like:

  • PCI-DSS
  • Acquirer contracts
  • Shopping Cart
  • Banking
  • Requirements: payments, recurring payments

Once you make it past these not-so-technical speed bumps, a number of implementation details follow. On the one hand there is the user experience and how the website implements it; on the other there are the many acquirers, with their different protocols and payload formats. The mismatch between the two is usually referred to as transaction impedance.

What does this mean? How is that implemented in Redis?

Let’s start with the user experience. At some point the user will want to complete the purchase by providing the credit card information that you are going to send to an acquirer for processing. Of the many ways this can be accomplished, the best will be an internal implementation using an iFrame. This way you can encapsulate the checkout and reuse it in multiple places within your app.

The iFrame will then POST a REST or Ajax-style message to the URL that served the iFrame. The form data should be validated there and then forwarded to the message broker. I’m suggesting RestMQ, which is a good option because it uses Redis. The message is put on the message queue. Shortly thereafter, a worker daemon that has been blocking on the empty queue will wake up, pull the message from the queue, reformat it for the acquirer and forward it to the acquirer.
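Here is a minimal Python sketch of that flow. A standard-library queue.Queue stands in for the RestMQ/Redis queue, so this does not use RestMQ’s API. The Luhn checksum is the standard card-number check. The fixed-width acquirer format and the send callback are hypothetical, because every acquirer defines its own.

```python
import json
import queue


def luhn_ok(pan):
    """Standard Luhn checksum used on card numbers."""
    if not pan.isdigit():
        return False
    total = 0
    for i, ch in enumerate(reversed(pan)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


broker = queue.Queue()  # stand-in for a RestMQ/Redis-backed queue


def handle_post(form):
    """Validate the iFrame's form data, then hand it to the broker."""
    if not luhn_ok(form.get("pan", "")):
        return 400
    broker.put(json.dumps(form))
    return 202


def to_acquirer_format(msg):
    # Hypothetical fixed-width layout: PAN padded to 19, amount right-aligned in 12.
    return f"{msg['pan']:<19}{msg['amount']:>12}"


def worker_step(send):
    raw = broker.get()  # blocks while the queue is empty
    send(to_acquirer_format(json.loads(raw)))
    broker.task_done()
```

In a real deployment, worker_step would run in a loop inside the worker daemon.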

Here is where it gets tricky. Depending on the acquirer’s protocol, the worker will either wait for the acquirer’s response to its request or move on to the next message in the broker’s queue. If the acquirer’s interface is REST or HTTP, the worker can simply wait. Of course, there can be as many workers as the acquirer permits simultaneous connections.
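For the synchronous case, a thread pool sized to the acquirer’s connection limit is one simple way to get “as many workers as connections”. This is a Python sketch. The limit of 4 and the send_and_wait placeholder are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_ACQUIRER_CONNECTIONS = 4  # hypothetical limit negotiated with the acquirer


def send_and_wait(message):
    """Placeholder for a blocking REST/HTTP call to the acquirer."""
    return {"request": message, "status": "approved"}


def process_batch(messages, max_connections=MAX_ACQUIRER_CONNECTIONS):
    # One synchronous worker per permitted connection; each worker blocks on its
    # own request/response, so concurrency never exceeds the acquirer's limit.
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        return list(pool.map(send_and_wait, messages))
```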

On the other hand, many acquirers like to use a single socket and process transactions asynchronously, in which case you’ll need two threads and a cache for the transactions in flight. I know that’s a tough concept… here goes…

A worker thread pulls a message from the queue, assigns a UUID, stores the transaction in a hash with an expiration date, then writes the message to the async socket.

Another thread blocks while reading from the remote socket. When a response is received, that thread queues the response. Another worker reads from that side queue, locates the UUID, gets the request from the cache, parses the response, constructs a response for the user’s checkout and then forwards it to the user’s iFrame.
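Here is a Python sketch of the in-flight cache and the response side. A plain dict with per-entry expiry times stands in for the Redis hash with EXPIRE. The "uuid|body" response framing is hypothetical. A response that arrives after its request has expired is simply dropped, so it times out quietly.

```python
import time
import uuid


class InFlight:
    """Requests awaiting an async response, keyed by UUID, each with an expiry."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.items = {}

    def add(self, request):
        key = str(uuid.uuid4())
        self.items[key] = (request, self.clock() + self.ttl)
        return key  # sent to the acquirer along with the message

    def take(self, key):
        """Return the matching request, or None if it expired or was never sent."""
        request, expires = self.items.pop(key, (None, 0.0))
        if request is None or self.clock() >= expires:
            return None
        return request

    def purge(self):
        now = self.clock()
        for key in [k for k, (_, exp) in self.items.items() if now >= exp]:
            del self.items[key]


def handle_response(inflight, raw, reply):
    # Hypothetical framing: "<uuid>|<acquirer response>"
    key, _, body = raw.partition("|")
    request = inflight.take(key)
    if request is None:
        return False  # late or unknown: let it time out
    reply(request, body)  # build the checkout response for the iFrame
    return True
```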

WOW… and now the interesting stuff. Since the messages in the queue and the cache can have expiration times, transactions that time out can be allowed to time out anywhere in the system without adversely affecting its overall performance. Historically, transaction nodes have let transactions time out rather than send back errors for them, because the logic required to ensure that the error responses make their way around the “system” is costly. And in fact, experience has shown that users get nutty with the “retry” button and can essentially DDoS your payment system.

This is a pretty exciting design. Tornado and Cyclone use epoll and address the C10K problem nicely. Redis can perform 100K writes per second. Tools like daemontools can handle keeping the system alive. Redis offers replication; it is only master-slave, but HA is still possible. Since this is a cluster solution, it is possible to distribute the transaction load over several servers. However, in a cluster arrangement you may need to route the transactions (cluster affinity) so that the same card numbers follow the same route. Should there be any sort of stand-in processing, duplicate transactions, etc., you want to have the latest information possible. Implementing routing via cluster affinity is another use case for Redis, as you can store the routes, replicate that data and read the records from the slaves. With any luck this will be faster than an errant user. But you still have to keep an eye out for evildoers, so a good blacklist is also helpful.
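Here is a Python sketch of card-level affinity routing with a blacklist check in front. The node names, the blacklisted card and the override table are hypothetical. In the design above, the override table and the blacklist would live in Redis and be read from the slaves. This sketch uses plain dicts and sets instead.

```python
import hashlib

NODES = ["auth-a", "auth-b", "auth-c"]  # hypothetical node names
ROUTE_OVERRIDES = {}                    # card -> node; kept in Redis in the design
BLACKLIST = {"4000000000000002"}        # hypothetical blocked card


def route(pan, nodes=NODES):
    """Send every transaction for the same card to the same node."""
    if pan in BLACKLIST:
        return None  # refuse before spending any cluster capacity
    if pan in ROUTE_OVERRIDES:
        return ROUTE_OVERRIDES[pan]
    digest = hashlib.sha256(pan.encode("ascii")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]
```

One caveat with plain modulo hashing: adding or removing a node moves most cards to a different node. An explicit route table or consistent hashing limits that churn.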

Some elements I left out: the end-of-day batch file, recurring transactions, reporting and customer care.

[UPDATE: Redis does not like combining EXPIRE and replication: EXPIRE can have unpredictable results when executing queries against the slave(s). OK, so we have learned that FIFOs and caches are useful in MQ broker construction and as part of the impedance mismatch correction. Redis/RestMQ seem to be strong tools.]

Posted by on 2011/06/21 in Uncategorized
