The Release of Riak 0.8 and JavaScript Map/Reduce
February 03, 2010 at 04:00 PM | categories: Riak, JavaScript, Map/Reduce, Screencast
We are happy to announce the release of Riak 0.8, available for download immediately. Riak 0.8 features a number of enhancements to the core map/reduce machinery that will make Riak more accessible to a wider audience. The biggest enhancement is the ability to write map/reduce queries in JavaScript. We're using our erlang_js project to integrate Mozilla's SpiderMonkey engine directly into Riak to keep overhead to a minimum.
We've also built a spiffy REST API for submitting map/reduce queries. Queries are described in JSON and POST-ed to the Riak server. Results are sent back as JSON for your processing pleasure. And, the REST interface supports streaming results for large result sets, too.
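To give a feel for what "described in JSON" might look like, here is a minimal Python sketch that builds a query description and serializes it. The field names ("inputs", "query", "map", "language", "source") and the shape of the JavaScript function are assumptions made for illustration, not something this post specifies; the release notes are the place to check the real format. The sketch only builds the payload and never contacts a server.

```python
import json

# Hypothetical job layout: every key below is an assumption for illustration.
def build_job(bucket, map_source):
    """Describe a one-phase map query over every object in a bucket."""
    return {
        "inputs": bucket,
        "query": [{"map": {"language": "javascript", "source": map_source}}],
    }

# The JavaScript map function travels as a plain string inside the JSON.
# Its argument shape (an object with a "key" field) is also assumed.
MAP_SOURCE = "function(value) { return [value.key]; }"

job = build_job("plans", MAP_SOURCE)
payload = json.dumps(job)  # this string would be the body of the POST
print(payload)
```

Keeping the query as plain data means any language with a JSON library and an HTTP client can submit one.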
To kick it all off, we've put together a short screencast demonstrating how to use Riak's flashy new features. You can watch it below, or view it on Vimeo. There's also a slew of bug fixes and optimizations included in Riak 0.8. See the release notes for all the juicy details.
Download and enjoy!
Basho Podcast Three - An Introduction To Innostore
February 02, 2010 at 03:45 PM | categories: Riak, InnoDB, Erlang, Innostore
You may remember that Basho recently open-sourced Innostore, our standalone Erlang application that provides a simple interface to embedded InnoDB...
In this podcast, Dave "Dizzy" Smith and Justin Sheehy discuss the release of Innostore, why we built it, how we use it in Riak, and why it might be useful for other Erlang projects. The discussion focuses on the stability and predictability of InnoDB, especially under load and as compared with other storage backends like DETS.
And of course, go download Innostore when you are done with the podcast.
Enjoy!
Why Vector Clocks are Easy
January 29, 2010 at 11:15 AM | categories: Riak
Vector clocks are confusing the first time you're introduced to them. It's not clear what their benefits are, nor how it is you derive said benefits. Indeed, each Riak developer has had his own set of false starts in making them behave.
The truth, though, is that vector clocks are actually very simple, and a couple of quick rules will get you all the power you need to use them effectively.
The simple rule is: assign each of your actors an ID, then make sure you include that ID and the last vector clock you saw for a given value whenever you store a modification.
The rest of this post will explain why and how to follow that simple rule. First, I'll explain how vector clocks work with a very simple example, and then show how to use them easily in Riak.
Vector Clocks by Example
We've all had this problem:
Alice, Ben, Cathy, and Dave are planning to meet next week for dinner. The planning starts with Alice suggesting they meet on Wednesday. Later, Dave discusses alternatives with Cathy, and they decide on Thursday instead. Dave also exchanges email with Ben, and they decide on Tuesday. When Alice pings everyone again to find out whether they still agree with her Wednesday suggestion, she gets mixed messages: Cathy claims to have settled on Thursday with Dave, and Ben claims to have settled on Tuesday with Dave. Dave can't be reached, and so no one is able to determine the order in which these communications happened, and so none of Alice, Ben, and Cathy knows whether Tuesday or Thursday is the correct choice.
The story changes, but the end result is always the same: you ask two people for the latest version of a piece of information, and they reply with two different answers, and there's no way to tell which one is really the most recent.
Vector clocks to the rescue, but how? Simple: tag the date choice with a vector clock, and then have each party member update the clock whenever they alter the choice. Start with Alice's initial message:
date = Wednesday
vclock = Alice:1
Alice says, "Let's meet Wednesday," and tags that value as the first version of the message that she has seen. Now Dave and Ben start talking. Ben suggests Tuesday:
date = Tuesday
vclock = Alice:1, Ben:1
Ben left Alice's mark alone, but added a mark specifying that it was the first version of the message that he had seen. Dave replies, confirming Tuesday:
date = Tuesday
vclock = Alice:1, Ben:1, Dave:1
Just like Ben's modification, Dave just adds his own first-revision mark. Now Cathy gets into the act, suggesting Thursday:
date = Thursday
vclock = Alice:1, Cathy:1
But wait, what happened to Ben's and Dave's marks? Cathy didn't have a version of the object that had been modified by Ben or Dave, so their marks can't appear in her modification. This means that Dave has two conflicting objects:
date = Tuesday
vclock = Alice:1, Ben:1, Dave:1
and
date = Thursday
vclock = Alice:1, Cathy:1
Dave can tell that these versions are in conflict, because neither vclock "descends" from the other. In order for vclock B to be considered a descendant of vclock A, each marker in vclock A must have a corresponding marker in B that has a revision number greater than or equal to the marker in vclock A. Markers not contained in a vclock can be considered to have revision number zero. So, since the Tuesday value has a Cathy revision of zero while Thursday has a Cathy revision of one, Tuesday cannot descend from Thursday. But, since Thursday has Ben and Dave revisions of zero while Tuesday has Ben and Dave revisions of one, Thursday is also not descended from Tuesday. Neither succeeds the other, so Dave has a conflict to sort out.
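The descendant rule is short enough to write down. Here is a minimal Python sketch, assuming a vclock is modeled as a plain dict from actor name to revision number (that representation and the function names are mine, not Riak's):

```python
def descends(b, a):
    """Return True if vclock b descends from vclock a.

    Every marker in a must be matched in b by an equal or greater
    revision; a marker missing from b counts as revision zero.
    """
    return all(b.get(actor, 0) >= rev for actor, rev in a.items())


def in_conflict(x, y):
    """Two vclocks conflict when neither descends from the other."""
    return not descends(x, y) and not descends(y, x)


tuesday = {"Alice": 1, "Ben": 1, "Dave": 1}
thursday = {"Alice": 1, "Cathy": 1}

print(descends(tuesday, thursday))     # False: Tuesday's Cathy revision is zero
print(descends(thursday, tuesday))     # False: Thursday's Ben and Dave revisions are zero
print(in_conflict(tuesday, thursday))  # True
```

Both values do descend from Alice's original `{"Alice": 1}`, which is why neither Ben's nor Cathy's change conflicted with her first suggestion.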
Luckily, Dave's a reasonable guy, and chooses Thursday:
date = Thursday
vclock = Alice:1, Ben:1, Cathy:1, Dave:2
Dave also created a vector clock that is a successor to all previously-seen vector clocks: it has revision numbers for every actor equal to or greater than the last revision number he saw for that actor. He emails this value back to Cathy.
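One way to build such a successor is to take the highest revision seen for each actor, then bump your own marker. A minimal Python sketch of that idea, again assuming vclocks are plain dicts (the helper names are hypothetical):

```python
def merge(*clocks):
    """Combine vclocks by keeping the highest revision seen per actor."""
    merged = {}
    for clock in clocks:
        for actor, rev in clock.items():
            merged[actor] = max(merged.get(actor, 0), rev)
    return merged


def increment(clock, actor):
    """Return a copy of clock with the given actor's revision bumped by one."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


tuesday = {"Alice": 1, "Ben": 1, "Dave": 1}
thursday = {"Alice": 1, "Cathy": 1}

# Dave resolves the conflict: merge what he has seen, then mark his change.
resolved = increment(merge(tuesday, thursday), "Dave")
print(resolved)
```

The result carries Alice:1, Ben:1, Cathy:1, and Dave:2, matching the clock Dave sends back to Cathy.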
So now when Alice asks Ben and Cathy for the latest decision, the replies she receives are, from Ben:
date = Tuesday
vclock = Alice:1, Ben:1, Dave:1
and from Cathy:
date = Thursday
vclock = Alice:1, Ben:1, Cathy:1, Dave:2
From this, she can tell that Dave intended his correspondence with Cathy to override the decision he made with Ben. All Alice has to do is show Ben the vector clock from Cathy's message, and Ben will know that he has been overruled. (Dave will, almost certainly, blame his broken email software for failing to inform Ben of the change.)
How to do this in Riak
Now that you understand vector clocks, using them with Riak is easy. I'll use the raw HTTP interface to illustrate.
First, whenever you store a value, include an X-Riak-ClientId
header to identify your actor. For Alice's first message above,
you'd say:
curl -X PUT -H "X-Riak-ClientId: Alice" -H "content-type: text/plain" \
  http://localhost:8098/raw/plans/dinner --data "Wednesday"
When Ben, Cathy, and Dave each GET Alice's plans, they'll get the same vector clock (I've removed some of the other headers for brevity):
curl -i http://localhost:8098/raw/plans/dinner
HTTP/1.1 200 OK
X-Riak-Vclock: a85hYGBgzGDKBVIsrLnh3BlMiYx5rAzLJpw7wpcFAA==
Content-Type: text/plain
Content-Length: 9

Wednesday
The X-Riak-Vclock header contains an encoded version of a vclock
that is the same as our earlier example: Alice has modified this
value once.
Now when Ben sends his change to Dave, he includes both the vector
clock he pulled down (in the X-Riak-Vclock header), and his own
X-Riak-ClientId:
curl -X PUT -H "X-Riak-ClientId: Ben" -H "content-type: text/plain" \
  -H "X-Riak-Vclock: a85hYGBgzGDKBVIsrLnh3BlMiYx5rAzLJpw7wpcFAA==" \
  http://localhost:8098/raw/plans/dinner --data "Tuesday"
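If you'd rather script this than type curl commands, the same request can be assembled with Python's standard library. This is only a sketch: the helper name build_put is hypothetical, the URL and header names are the ones from the curl commands in this post, and the request is built but deliberately never sent.

```python
import urllib.request

BASE_URL = "http://localhost:8098/raw"  # same host and path as the curl examples


def build_put(bucket, key, value, client_id, vclock=None):
    """Build (but do not send) a PUT that follows the simple rule:
    always send your actor ID, plus the last vclock you saw, if any."""
    headers = {"X-Riak-ClientId": client_id, "Content-Type": "text/plain"}
    if vclock is not None:
        headers["X-Riak-Vclock"] = vclock
    return urllib.request.Request(
        url=f"{BASE_URL}/{bucket}/{key}",
        data=value.encode("utf-8"),
        headers=headers,
        method="PUT",
    )


req = build_put(
    "plans", "dinner", "Tuesday", "Ben",
    vclock="a85hYGBgzGDKBVIsrLnh3BlMiYx5rAzLJpw7wpcFAA==",
)
print(req.get_method(), req.full_url)
# Passing req to urllib.request.urlopen would send it to a running server.
```

Making the vclock optional covers Alice's very first write, where there is no earlier clock to send.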
Dave pulls down a fresh copy, and then confirms Tuesday:
curl -i http://localhost:8098/raw/plans/dinner
...
X-Riak-Vclock: a85hYGBgymDKBVIsrLnh3BlMiYx5rAymfeeO8EGFWRLl30GF/00ACmcBAA==
...
curl -X PUT -H "X-Riak-ClientId: Dave" -H "content-type: text/plain" \
  -H "X-Riak-Vclock: a85hYGBgymDKBVIsrLnh3BlMiYx5rAymfeeO8EGFWRLl30GF/00ACmcBAA==" \
  http://localhost:8098/raw/plans/dinner --data "Tuesday"
Cathy, on the other hand, hasn't pulled down a new version, and instead merely updated the plans with her suggestion of Thursday:
curl -X PUT -H "X-Riak-ClientId: Cathy" -H "content-type: text/plain" \
  -H "X-Riak-Vclock: a85hYGBgzGDKBVIsrLnh3BlMiYx5rAzLJpw7wpcFAA==" \
  http://localhost:8098/raw/plans/dinner --data "Thursday"
(That's the same vector clock that Ben used, in case that encoded gibberish is making your eyes cross.)
Now, when Dave goes to grab this new copy (after Cathy tells him she
has posted it), he'll see one of two things. If the "plans" Riak
bucket has the allow_mult property set to false, he'll see just
Cathy's update. If allow_mult is true for the "plans" bucket,
he'll see both his last update and Cathy's. I'm going to show the
allow_mult=true version below, because I think it illustrates the
flow better.
curl -i -H "Accept: multipart/mixed" http://localhost:8098/raw/plans/dinner
HTTP/1.1 300 Multiple Choices
X-Riak-Vclock: a85hYGBgzWDKBVIsrLnh3BlMiYx5rAymfeeO8EGFWRLl30GF1fsRwsypF59BhT0mIoTZ/1SYQIUrEcJszUksu9R6kCWyAA==
Content-Type: multipart/mixed; boundary=ZZ3eyjUllBi7GXRRMJsUublFxjn
Content-Length: 368

--ZZ3eyjUllBi7GXRRMJsUublFxjn
Content-Type: text/plain

Tuesday
--ZZ3eyjUllBi7GXRRMJsUublFxjn
Content-Type: text/plain

Thursday
--ZZ3eyjUllBi7GXRRMJsUublFxjn--
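A client has to pull the individual values out of that multipart body before it can choose between them. Here is a minimal Python sketch using the standard library's email parser. The function name split_siblings is hypothetical, and the body is the trimmed one shown in this post; a real response carries more headers per part.

```python
from email import message_from_string


def split_siblings(content_type, body):
    """Split a multipart/mixed response body into its sibling values."""
    # The email parser needs the Content-Type header to find the boundary.
    raw = f"Content-Type: {content_type}\n\n{body}"
    message = message_from_string(raw)
    if not message.is_multipart():
        return [body]  # a single value, not a conflict
    return [part.get_payload().strip() for part in message.get_payload()]


CONTENT_TYPE = "multipart/mixed; boundary=ZZ3eyjUllBi7GXRRMJsUublFxjn"
BODY = "\n".join([
    "--ZZ3eyjUllBi7GXRRMJsUublFxjn",
    "Content-Type: text/plain",
    "",
    "Tuesday",
    "--ZZ3eyjUllBi7GXRRMJsUublFxjn",
    "Content-Type: text/plain",
    "",
    "Thursday",
    "--ZZ3eyjUllBi7GXRRMJsUublFxjn--",
    "",
])

siblings = split_siblings(CONTENT_TYPE, BODY)
print(siblings)
```

Whichever value the client picks, it should be stored with the X-Riak-Vclock value from this same response.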
Dave sees two values because the vclock that Cathy generated wasn't a successor to the vclock that Dave had generated with his last modification. Riak couldn't choose between them, and therefore kept both values.
Dave picks Thursday, and updates the object, resolving the conflict. Riak has already computed a unified, descendant vector clock for Dave, so he uses the vector clock from the multi-value version he just pulled down, just like before:
curl -X PUT -H "X-Riak-ClientId: Dave" -H "content-type: text/plain" \
  -H "X-Riak-Vclock: a85hYGBgzWDKBVIsrLnh3BlMiYx5rAymfeeO8EGFWRLl30GF1fsRwsypF59BhT0mIoTZ/1SYQIUrEcJszUksu9R6kCWyAA==" \
  http://localhost:8098/raw/plans/dinner --data "Thursday"
Now when Alice checks for the latest version, she just sees the final decision:
curl -i http://localhost:8098/raw/plans/dinner
HTTP/1.1 200 OK
X-Riak-Vclock: a85hYGBgzWDKBVIsrLnh3BlMiYx5rAymfeeO8EGFWRLl30GF1fvhwmzNSSy71HqgEpUTEerZ/1SYYBFmTr34DCjMBBTOnQwUzgIA
Content-Type: text/plain
Content-Length: 7

Thursday
While Riak couldn't decide whether to choose Cathy's modification over Dave's earlier modification, it was easy to choose Dave's latest modification, because the vclock created was a successor to the vclock in place.
Review
So, vclocks are easy: assign each of your actors an ID ("Alice", "Ben", "Cathy", and "Dave" in these examples), then make sure you include that ID and the last vector clock you saw for a given value whenever you store a modification.
If two actors store changes with vector clocks that don't descend from each other, Riak will store and hand back both values. When descendancy can be calculated, values stored with vector clocks that have been succeeded will be removed.
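The whole dinner story can be replayed against a toy model of that behavior. The Python sketch below is an illustration, not Riak's implementation: vclocks are plain dicts, the store is a list of (value, vclock) pairs, and the writer supplies the already-updated clock.

```python
def descends(b, a):
    """True if vclock b descends from vclock a (missing markers count as zero)."""
    return all(b.get(actor, 0) >= rev for actor, rev in a.items())


def store(siblings, value, vclock):
    """Return the sibling list after storing value with vclock."""
    # A write that an existing value already succeeds adds nothing new.
    if any(descends(old, vclock) for _, old in siblings):
        return siblings
    # Drop every value the new vclock succeeds; keep the ones it conflicts with.
    kept = [(v, old) for v, old in siblings if not descends(vclock, old)]
    return kept + [(value, vclock)]


plans = []
plans = store(plans, "Wednesday", {"Alice": 1})
plans = store(plans, "Tuesday", {"Alice": 1, "Ben": 1})
plans = store(plans, "Tuesday", {"Alice": 1, "Ben": 1, "Dave": 1})
plans = store(plans, "Thursday", {"Alice": 1, "Cathy": 1})
conflict = [value for value, _ in plans]   # both values are kept here

plans = store(plans, "Thursday", {"Alice": 1, "Ben": 1, "Cathy": 1, "Dave": 2})
final = [value for value, _ in plans]      # Dave's resolution succeeds both
print(conflict, final)
```

After Cathy's write the model holds both Tuesday and Thursday, and after Dave's resolution only Thursday remains.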
Innostore -- connecting Erlang to InnoDB
January 26, 2010 at 08:11 AM | categories: Riak, InnoDB
Riak has pluggable storage engines, and so we're always on the lookout for better ways that users can store their data locally. Recent experiences with some Basho customers managing some large datasets led us to believe that InnoDB might work out very well for them.
To answer that question and fill that need, Innostore was written. It is a standalone Erlang application that provides a simple interface to Embedded InnoDB. So far its performance has been quite good, though InnoDB (with or without the Innostore API) is highly dependent on tuning the local configuration to match the local hardware. Luckily, Dizzy -- the author of Innostore -- has some heavy-duty experience doing that kind of tuning and as a result we've been able to help people meet their performance goals using Innostore.
Basho Podcast Two - An Introduction to erlang_js
January 19, 2010 at 09:10 AM | categories: Riak, Map/Reduce, NoSQL, Database, JavaScript, erlang_js, Scaling, Podcast
Check out the new Basho podcast featuring Kevin Smith and Bryan Fink that discusses erlang_js, a simple and easy-to-use binding between Erlang and JavaScript. It is packaged as an OTP application so developers can easily embed JavaScript inside their own applications.
Once you are done with the podcast, go download erlang_js.
Enjoy,
A Quick Note on Rebar
January 10, 2010 at 11:45 PM | categories: Riak, Rebar
As many of you Erlang and Riak fans know, Dave Smith has been hard at work on Rebar. For those of you who don't know, Rebar is a truly cool packaging and build tool for Erlang applications. Dave took a break from coding this morning to post a few words on his blog Gradual Epiphany about why he was inspired to write Rebar and what it means for building and deploying applications. Check it out. It's a great read.
Also, if you haven't had a chance to join the Rebar mailing list, you can do so here.
Riak Screencasts and Presentations - The Collected Works
December 28, 2009 at 03:45 PM | categories: NoSQL, Riak, Resources, Screencast, Database
To date, there have been a number of screencasts and presentations done on Riak and Riak-related technologies. As a belated holiday gift (we were coding, not blogging), we thought it would be a valuable resource if we assembled all of them in an easy-to-peruse list here on the Basho Blog. If we missed any, please let us know in the comments.
Go forth and consume!
- Justin Sheehy's Riak presentation at NoSQL East
- Bryan Fink's fantastic overview of Riak in October at SQLFreeNYC
- Dave Smith's introduction to Riak in an embedded node using Rebar, a packaging and build tool for Erlang applications.
Link: http://vimeo.com/8311407
- Dave Smith also gave a more general overview of Rebar
Link: http://vimeo.com/8311407
- Martin Scholl's...awesome presentation on Riak at NoSQL Berlin
Link: http://vimeo.com/7318171
- Rusty Klophaus' presentation entitled "Nitrogen and Riak by Example" at Erlang Factory in Stockholm
Link: http://bit.ly/5AgKzW
- Bryan Fink's screencast with Ben Ahlan of Video Code Chat demonstrating basic setup and usage of Riak
Link: http://bit.ly/akagc
Dave Smith Gives a General Overview of Rebar
December 21, 2009 at 02:00 PM | categories: Riak, Screencast, Erlang, Rebar
As a follow up to my first screencast on using Rebar for embedded Riak nodes, this video gives a more general overview of Rebar using ibrowse, an existing Erlang application, to show functionality and intended uses.
Enjoy,
