Webinar Recap - Riak in Action: Wriaki

August 20, 2010 at 11:00 AM | categories: Riak, Erlang, Webmachine, NoSQL, Database

Thank you to those who attended our webinar yesterday. Like before, we're recapping the questions below for everyone's sake (in no particular order). If you missed the webinar, want some more information, or want to ask us some more questions, we've prepared a resource page for you. As always, you can also get ahold of us directly.

Q: How would solve full text search with the current versions of Riak? One could also take Wriaki as an example as most wikis have some sort of fulltext search functionality.

I recommend using existing fulltext solutions. Solr has matched up well with most of the web applications I have written, and would certainly work for Wriaki as well.

Q: Where in the course of the interaction (shown on slide 18) are you defining the client ID? Don't you need the client ID and vclock to match between updates?

On slide 42, we talk about "actors" which are essentially client IDs. Using the logged-in user as the client ID can help prevent vclock explosion and is a sensible way of structuring your updates.

Bryan



Erlang Factory London Recap

June 14, 2010 at 12:00 PM | categories: Riak, Erlang

This was originally posted by @rklophaus on his blog, rklophaus.com. Please email him with any corrections.

Erlang Factory London gathers Erlang pioneers from across the world—Berlin to Boston, Krakow to Cordoba, and San Francisco to Shanghai—for a two-day conference of innovative Erlang development.

The summaries below are just a small sampling of the talks at Erlang Factory London. There were three tracks running back-to-back for two days, and I often couldn't decide which of the three to attend. Slides and videos will be released by Erlang Solutions, and can be found under individual track pages on the Erlang Factory website.

Day 1 - June 10, 2010

Opening Session

Francesco Cessarini (Chief Strategy Officer, Erlang Solutions Ltd.), began the conference with a warm welcome and a quick review of progress made by Erlang-based companies in the last year.

Some highlights:

The History of the Erlang Virtual Machine - Joe Armstrong, Robert Virding

Joe Armstrong and Robert Virding gave a colorful, back-and-forth history of the Erlang's birth and early years. A few notable milestones and achievements:

  • Joe's early work on reduction machines. Robert's complete rewrite of Joe's work. Joe's complete rewrite of Robert's work. (etc.)
  • How Erlang was almost based on Smalltalk rather than Prolog
  • The quest to make Erlang 1.0x 80 times faster
  • Experiments with different memory management and garbage collection schemes
  • The train set used demonstrate Erlang, now in Robert's basement
  • The addition of linked processes, distribution, OTP, and bit syntax

It's easy to take a language like Erlang for granted and assume that its builders followed some well-known, pre-ordained path. Hearing Erlang's history from two of its main creators provided an excellent reminder that building software is both an art and a science, uncertain and exciting like any creative process.

Riak from the Inside - Justin Sheehy

Justin Sheehy (CTO of Basho Technologies) opened his talk by introducing Riak, "a scalable, highly-available, networked, open-source key/value store." He then very quickly announced that he wasn't there to talk about using Riak, he was there to talk about how Riak was built using Erlang and OTP

There are eight distinct layers involved in reading/writing Riak data:

  • The Client Application using Riak
  • The client-side HTTP API or Protocol Buffers API that talks to the Riak cluster
  • The server-side Riak Client containing the combined backing code for both APIs
  • The Dynamo Model FSMs that interact with nodes using Dynamo style quorum behavior and conflict resolution
  • Riak Core provides the fundamental distribution of the system (not covered in the talk)
  • The VNode Master that runs on every physical node, and coordinates incoming interaction with individual VNodes
  • Individual VNodes (Virtual Nodes) which are treated as lightweight local abstractions over K/V storage
  • The swappable Storage Engine that persists data to disk

During his talk, Justin discussed each layer's responsibilities and interactions with the layers above and below it.

Justin's main point is that carefully managed complexity in the middle layers allows for simplicity at the edge layers. The top three layers present a simple key/value interface, and the bottom two layers implement a simple key/value store. The middle layers (FSMs, Riak Core, and VNode Master) work together to provide scalability, replication, etc. Erlang makes this possible, and was chosen because it provides a platform that evolves in useful and relatively-predictable ways (this is a good thing, a surprising evolution is bad).

Mnesia for the CAPper - Ulf Wiger

Ulf Wiger (CTO of Erlang Solutions) discussed where Mnesia might fit into the changing world of databases, given the new focus on "NoSQL" solutions. Ulf gave a quick introduction to ACID properties, Brewer's CAP theorem, and the history of Mnesia, and then dove into a feature level description/comparison of Mnesia with other databases:

  • Deployed commercially for over 10 years
  • Comparable performance to current top performers clustered SQL space
  • Scalable to 50 nodes
  • Distributed transactions with loose time limits (in other words, appropriate for transactions across remote clusters)
  • Built-in support for sharding (fragments)
  • Incremental backup

The downsides are:

  • Erlang only interface
  • Tables limited to 2GB
  • Deadlock prevention scales poorly
  • Network partitions are not automatically handled, must recombine tables automatically

Ulf and others have done work to get around some of these limitations. Ulf showed code for an extension to Mnesia that automatically merges tables after they have split, using vector clocks.

Riak Search - Rusty Klophaus

I presented Riak Search, a distributed indexing and full-text search engine built on (and complementary to) Riak.

Part one covered the main reason for building Riak search: clients have built applications that eventually need to find data by value, not just by key. This is difficult, if not impossible, in a key/value store.

Part two described the shape of the final solution we set out to create. The goal of Riak Search is to support the Lucene interface, with Lucene syntax support and Solr endpoints, but with the operations story of Riak. This means that Riak Search will scale easily by adding new machines, and will continue to run after machine failure.

Part three was an introduction to Inverted Indexing, which is the heart of all search systems, as well as the difference between Document-Partitioning and Term-Partitioning, which forms the ongoing battle in the distributed search field. Part three continued with a deep-dive into parsing, planning, and executing the search query on Erlang.

Slides: http://www.slideshare.net/rklophaus/riak-search-erlang-factory-london-2010

Building a Scalable E-commerce Framework - Michael Nordström and Daniel Widgren

Michael Nordström and Daniel Widgren presented an Erlang-based e-commerce framework on behalf of their project team from Uppsala University (Christian Rennerskog, Shahzad Gul, Nicklas Nordenmark, Manzoor Ahmad Mubashir, Mikael Nordström, Kim Honkaniemi, Tanvir Ahmad, Yujuan Zou, and Daniel Widgren) and their industrial partner, Klarna AB.

The application uses a "LERN stack" (Linux, Erlang, Riak, Nitrogen), to provide a reusable web shop that can be quickly set up by clients, customized via templates and themes, and extended via plugins to support different payment providers.

The project is currently going a rewrite to update to the latest versions of Riak and Nitrogen.

GitHub: http://github.com/mino4071/CookieCart-2.0

Twitter: @Cookie_Cart

Clash of the Titans: Erlang Clusters and Google App Engine - Panos Papadopoulos, Jon Vlachoyiannis, Nikos Kakavoulis

Panos, Jon, and Nikos took turns describing the technical evolution of their startup, SocialCaddy, and why they were forced to move away from the Google App Engine. SocialCaddy is a tool that mines your online profiles for important events and changes, and tells you about them. For example, if a friend gets engaged, SocialCaddy will tell you about it, and assist you in sending a congratulatory note.

Google App Engine imposes a 30 second limit on requests. As SocialCaddy processed larger and larger social graphs, they bumped into this limit, which made GAE unusable as a platform. In response, the team developed Erlust, which allows you to submit jobs (written in any language) to a cluster. An Erlang application coordinates the jobs, and each job should read from a queue, process messages, and write to another queue.

Using Open-Source Trifork QuickCheck to test Erjang - Kresten Krab Thorup

Kresten Krab Thorup (CTO of Trifork) stirred up dust when he originally announced his intention to build a version of Erlang that ran on the JVM. Since then, he has made astounding progress. Erjang turns Erlang .beam files into Java .class files, now supporting a broad enough feature set to run Mnesia over distributed Erlang. Kresten claimed performance matching (or at times exceeding) that of the Erlang VM.

Erjang is still a work in progress, there are many BIFs that still need to be ported, but if a prototype exists to prove viability, then this prototype was certainly a success. One slide showed the code for the spawn_link function reimplemented in Java in ~15 lines of simple Java code.

For the second half of his talk, Kresten showed off Triq (short for Trifork Quickcheck), a scaled-down, open-source QuickCheck inspired testing framework that he built in order to test Erjang. Triq supports basic generators (called domains), value picking, and shrinking. Kresten showed examples of using Triq to validate that Erjang performs binary operations with the exact same results as Erlang.

More information about Erjang here: http://wiki.github.com/krestenkrab/erjang/

Day 2 - June 11, 2010

Efene: A Programming Language for the Erlang VM - Mariano Guerra

Mariano Guerra presented Efene, a new language that is translated into Erlang source code. Efene is intended to help coax developers into the world of Erlang who might otherwise be intimidated by the Prolog-inspired syntax of Erlang. We've heard about a number of other projects compiling into Erlang byte-code (such as Reia and Lisp-Flavored Erlang), but Efene takes a different approach in that the language is parsed and translated using Leex and Yecc into standard Erlang code, which is then compiled as normal. By doing this, Mariano manages to leave most of the heavy lifting of optimizations to the existing Erlang compiler.

Efene actually supports two different syntax flavors, one with curly brackets, the other without, leading to a syntax that feels vaguely like Javascript or Python, respectively. (The syntax without curly brackets is called Ifene, for "Indented Efene", and is otherwise identical to Efene.)

In some places, Efene syntax is a bit more verbose than Erlang. This is done to make the language more readable than Erlang. ("if" and "case" statements have more structure in Efene than Erlang.) In other places, Efene requires less typing, multi-claused function definitions don't require you to repeat the function name, for example.

Code samples and more information: http://marianoguerra.com.ar/efene

Erlang in Embedded Systems - Gustav Simonsson, Henrik Nordh, Fredrik Andersson, Fabian Bergstrom, Niclas Axelsson and Christofer Ferm

Gustav, Henrik, Fredrik, Fabian, Niclas, and Christofer (Uppsala University), in cooperation with Erlang Solutions, worked on a project to shrink the Erlang VM (plus the Linux platform on which it runs) down to the smallest possible footprint for use on Gumstix and BeagleBoard hardware.

The team experimented with OpenEmbedded and Angstrom, using BusyBox, uClibc, and stripped .beam files to further decrease the footprint. During the presentation, they played a video showing how to install Erlang on a Gumstix single-board computer in 5 minutes using their work.

More information about Embedded Erlang here: http://embedded-erlang.org

Zotonic: Easy Content Management with Erlang's Performance and Flexibility - Marc Worrell

Marc Worrell (WhatWebWhat) breaks CMSs into:

  • 1st Generation - Static text and images
  • 2nd Generation - Database- and template-driven systems (covers current CMS systems)
  • 3rd Generation - Highly interactive, real-time, personalized data exchanges and frameworks

Zotonic is aimed squarely at the third generation, Zotonic turns a CMS into a living, breathing thing, where modules on a page talk to each other and other sessions via comet, and the system can be easily extended, blurring the line between CMS and application framework.

This interactivity is what motivated Marc to write the system in Erlang; at one point he compared the data flowing through the system to a telephone exchange. Zotonic uses Webmachine, Mochiweb, ErlyDTL, and a number of other Erlang libraries, with data in PostgreSQL. (Marc also mentioned Nitrogen as an early inspiration for Zotonic, parts of Zotonic are based on Nitrogen code, though much has been rewritten.)

The data model is physically simple, with emergent functionality. A site is developed in terms of objects (called pages) interlinked with other objects. In other words, from a data perspective, adding an image to a web page is the same as linking from a page to a subpage, or tagging a page with an author. Mark gave a live demo of Zotonic's ability to easily add and change menu structures, modify content, and add and re-order images. Almost everything can be customized using ErlyDTL templates. Very polished stuff.

Marc then introduced his goal of "Elastic Zotonic", a Zotonic that can scale in a distributed, fault-tolerant, "buzzword-compliant" way, which will involve changes to the datastore and some of the architecture.

Marc is now working with Maximonster to develop an education-oriented social network on top of Zotonic.

More information: http://zotonic.com

Closing Session

Francesco (CSO, Erlang Solutions, Ltd.) thanked the sponsors, presenters, and audience. Frank then gave a big special thanks to Frank Knight and Joanna Włodarczyk, who both worked tirelessly to organize the conference and make everything go smoothly.

Final Thoughts

Erlang is gaining momentum in the industry as a platform that enables you to solve distributed, massively concurrent problems. People aren't flocking directly to Erlang itself, they are instead flocking to projects built in Erlang, such as RabbitMQ, ejabberd, CouchDB, and of course, Riak. At the same time, other languages are adopting some of the key features that make Erlang special, including a message-passing architecture and lightweight threads.



Hello, Bitcask

April 27, 2010 at 02:11 PM | categories: Erlang, Database

because you needed another local key/value store

One aspect of Riak that has helped development to move so quickly is pluggable per-node storage. By allowing nearly anything k/v-shaped to be used for actual persistence, progress on storage engines can occur in parallel with progress on the higher-level parts of the system.

Many such local key/value stores already exist, such as Berkeley DB, Tokyo Cabinet, and Innostore.

There are many goals we sought when evaluating which storage engines to use in Riak, including:

  • low latency per item read or written
  • high throughput, especially when writing an incoming stream of random items
  • ability to handle datasets much larger than RAM w/o degradation
  • crash friendliness, both in terms of fast recovery and not losing data
  • ease of backup and restore
  • a relatively simple, understandable (and thus supportable) code structure and data format
  • predictable behavior under heavy access load or large volume
  • a license that allowed for easy default use in Riak

Achieving some of these is easy. Achieving them all is less so.

None of the local key/value storage systems available (including but not limited to those written by us) were ideal with regard to all of the above goals. We were discussing this issue with Eric Brewer when he had a key insight about hash table log merging: that doing so could potentially be made as fast or faster than LSM-trees.

This led us to explore some of the techniques used in the log-structured file systems first developed in the 1980s and 1990s in a new light. That exploration led to the development of bitcask, a storage system that meets all of the above goals very well. While bitcask was originally developed with a goal of being used under Riak, it was also built to be generic and can serve as a local key/value store for other applications as well.

If you would like to read a bit about how it works, we've produced a short note describing bitcask's design that should give you a taste. Very soon you should be able to expect a Riak backend for bitcask, some improvements around startup speed, information on tuning the timing of merge and fsync operations, detailed performance analysis, and more.

In the meantime, please feel free to give it a try!

- Justin and Dizzy



Riak in Production - A Distributed Event Registration System Written in Erlang

April 20, 2010 at 05:00 PM | categories: Riak, Erlang

Riak, at its core, is an open source project. So, we love the opportunity to hear from our users and find out where and how they are using Riak in their applications. It is for that reason that we were excited to hear from Chris Villalobos. He recently put a Distributed Event Registration application into production at his church in Gainesville, Florida, and after hearing a bit about it, we asked him to write a short piece about it for the Basho Blog.

Enjoy,

Mark



Use Case and Prototyping

As a way of going paperless at our church, I was tasked with creating an event registration system that was accessible via touchscreen kiosk, SMS, and our website, to be used by members to sign up for various events. As I was wanting to learn a new language and had dabbled in Erlang (specifically Mochiweb) for another small application, I decided that I was going to try and do the whole thing in Erlang. But how to do it, and on a two month time line, was quite the challenge.

The initial idea was to have each kiosk independently hold pieces of the database, so that in the event something happened to a server or a kiosk, the data would still be available. Also, I wanted to use the fault-tolerance and distributed processing of Erlang to help make sure that the various frontends would be constantly running and online. And, as I wanted to stay as close to pure Erlang as possible, I decided early against a SQL database. I tried Mnesia but I wasn't happy with the results. Using QLC as an interface, interesting issues arose when I took down a master node. (I was also facing a time issue so playing with it extensively wasn't really an option.)

It just so happened that Basho released Riak 0.8 the morning I got fed up with it. So I thought about how I could use a key/value store. I liked how the Riak API made it simple to get data in and out of the database, how I could use map-reduce functionality to create any reports I needed and how the distribution of data worked out. Most importantly, no matter what nodes I knocked out while the cluster was running, everything just continued to click. I found my datastore.

During the initial protoyping stages for the kiosk, I envisioned a simple key/value store using a data model that looked something like this:

		[
		{key1, {Title, Icon, Background Image, Description, [signup_options]}},
		{key2, {...}}
		]
		

This design would enable me to present the user with a list of options when the kiosk was started up. I found that by using Riak, this was simple to implement. I also enjoyed that Riak was great at getting out of the way. I didn't have to think about how it was going to work, I just knew that it would. ( The primary issue I kept running into when I thought about future problems was sibling entries. If two users on two kiosks submit information at the same time for the same entry, (potentially an issue as the number of kiosks grow), then that would result in sibling entries because of the way user data is stored:

<<key1>>, <<ts>>, [user data]

But, by checking for siblings when the reports are generated, this problem became a non-issue.)

High Level Architecture

The kiosk is live and running now with very few kinks (mostly hardware) and everything is in pure Erlang. At a high level, the application architecture looks like this:

Each Touchscreen Kiosk:

  • wxErlang
  • Riak node

Web-Based Management/SMS Processing Layer:

  • Nitrogen Framework speaking to Riak for Kiosk Configuration/Reporting
  • Nitrogen/Mochiweb processing SMS messages from SMS aggregator

Periodic Email Sender:

  • Vagabond's gen_smtp client on a eternal receive after 24 hours send email-loop.

In Production

Currently, we are running four Riak nodes (writing out to the Filesystem backend) outside of the three Kiosks themselves. I also have various Riak nodes on my random linux servers because I can use the CPU cycles on my other nodes to distribute MapReduce functions and store information in a redundant fashion.

By using Riak, I was able to keep the database lean and mean with creative uses of keys. Every asset for the kiosk is stored within Riak, including images. These are pulled only whenever a kiosk is started up or whenever an asset is created, updated, or removed (using message passing). If an image isn't present on a local kiosk, it is pulled from the database and then stored locally. Also, all images and panels (such as the on-screen keyboard) are stored in memory to make things faster.

All SMS messages are stored within an SMS bucket. Every 24 hours all the buckets are checked with a "mapred_bucket" to see if there are any new messages since the last time the function ran. These results are formatted within the MapReduce function and emailed out using the gen_smtp client. As assets are removed from the system, the current data is stored within a serialized text file and then removed the database.

As I bring more kiosks into operation, the distributed map-reduce feature is becoming more valuable. Since I typically run reports during off hours, the kiosks aren't overloaded by the extra processing power. So far I have been able to roll out a new kiosk within 2 hours of receiving the hardware. Most of this time is spent doing the installation and configuration of the touchscreen. Also, the system is becoming more and more vital to how we are interfacing with people, giving members multiple ways of contacting us at their convenience. I am planning on expanding how I use the system, especially with code-distribution. For example, with the Innostore interface, I might store the beam files inside and send them to the kiosks using Erlang commands. (Version Control inside Riak, anyone?)

What's Next?

I have ambitious plans for the system, especially on the kiosk side. As this is a very beta version of the software, it is only currently in production in our little community. That said, I hope to open source it and put it on github/bitbucket/etc. as soon as I pretty up all the interfaces.

I'd say probably the best thing about this whole project is getting to know the people inside the Erlang community, especially the Basho people and the #erlang regulars on IRC. Anytime I had a problem, someone was there willing to work through it with me. Since I am essentially new to Erlang, it really helped to have a strong sense of community. Thank you to all the folks at Basho for giving me a platform to show what Erlang can do in everyday, out of the way places.

Chris Villalobos


Using Innostore with Riak

February 22, 2010 at 11:00 AM | categories: Riak, Erlang, Innostore, Database

Innostore is an Erlang application that provides an API for storing and retrieving key/value data using the InnoDB storage system. This storage system is the same one used by MySQL for reliable, transactional data storage. It’s a proven, fast system and perfect for use with Riak if you have a large amount of data to store. Let’s take a look at how you can use Innostore as a backend for Riak.

(Note: I assume that you have successfully built an instance of Riak for your platform. If you built Riak from source in ~/riak, then set $RIAK to ~/riak/rel/riak.”)

We first get started by grabbing a stable release of Innostore. You’ll need to download the source for a release from: http://bitbucket.org/basho/innostore/downloads/

Looking in the “Tags & snapshots” section, you should download the source for the highest available RELEASE_* tag. In my case, RELEASE_4 is the most recent release, so I’ll grab the bz2 file associated with it:

http://bitbucket.org/basho/innostore/get/RELEASE_4.tar.bz2

Once I have the source code, it’s time to unpack it and build:

$ tar -xjf innostore-RELEASE_4.tar.bz2
$ cd innostore
$ make

Depending on the speed of the machine you are building on, this may take a few minutes to complete. At the end, you should see a series of unit tests run, with the output ending:

=======================================================
All 7 tests passed.
100222 7:43:58 InnoDB: Shutdown completed; log sequence number 90283
Cover analysis: /Users/dizzyd/src/public/innostore/.eunit/index.html

Now that we have successfully built Innostore, it’s time to install it into the Riak distribution:

$ ./rebar install target=$RIAK/lib

If you look in the $RIAK/lib directory now, you should see the innostore-4 directory alongside a bunch of .ez files and other directories which compose the Riak release.

Now, we need to tell Riak to use the Innostore driver as a backend. Make sure Riak is not running. Edit $RIAK/etc/app.config, setting the value for “storage_backend” as follows:

{storage_backend, innostore_riak},

In addition, append the configuration for the Innostore application after the SASL section:

{sasl, [ ....
]}, %% < -- make sure you add a comma here!!

{innostore, [
{data_home_dir, "data/innodb"}, %% Where data files go
{log_group_home_dir, "data/innodb"}, %% Where log files go
{buffer_pool_size, 2147483648} %% 2G in-memory buffer in bytes
]}

You may need to adjust the directories for your data_home_dir and log_group_home_dirs to match where you want the inno data and log files to be stored. If possible, make sure that the data and log dirs are on separate disks -- this can yield much better performance.

Once you've completed the changes to $RIAK/etc/app.config, you're ready to start Riak:

$ $RIAK/bin/riak console

As it starts up, you should see messages from Inno that end with something like:

100220 16:36:58 InnoDB: highest supported file format is Barracuda.
100220 16:36:58 Embedded InnoDB 1.0.3.5325 started; log sequence number 45764

That’s it! You’re ready to start using Riak for storing truly massive amounts of data.

Enjoy,

Dave Smith



Basho Podcast Three - An Introduction To Innostore

February 02, 2010 at 03:45 PM | categories: Riak, InnoDB, Erlang, Innostore

You may remember that Basho recently open-sourced Innostore, our standalone Erlang application that provides a simple interface to embedded InnoDB...

In this podcast, Dave "Dizzy" Smith and Justin Sheehy discuss the release of Innostore, why we built it, how we use it in Riak, and why it might be useful for other Erlang projects. The discussion focuses on the stability and predictability of InnoDB, especially under load and as compared with other storage backends like DETS.

And of course, go download Innostore when you are done with the podcast.

Enjoy!

Mark



If you are having problems getting the podcast to play, click here to play in new window or right click to download the podcast.



Dave Smith Gives a General Overview of Rebar

December 21, 2009 at 02:00 PM | categories: Riak, Screencast, Erlang, Rebar

As a follow up to my first screencast on using Rebar for embedded Riak nodes, this video gives a more general overview of Rebar using ibrowse, an existing Erlang application, to show functionality and intended uses.

Enjoy,

Dave

(View on Vimeo)



If You Happen to Find Yourself in Stockholm...

November 09, 2009 at 12:22 PM | categories: Riak, Erlang, Nitrogen, NoSQL

Then make sure to check out Basho's very own Rusty Klophaus, who will be opening up the Erlang User Conference, slated to kick-off November 12th.

Rusty will be giving a brief overview of both Nitrogen and Riak, and then plans to describe common patterns and practices of Nitrogen and Riak development against the background of a sample application that allows a presenter to share and control a slideshow over the internet. In short, his presentation is not to be missed. (Official abstract can be found here.)

So, if you find yourself in Sweden this Thursday, make sure to show your face at the Astoria on Nybrogatan in Stockholm, and show your support for Erlang, Riak, Nitrogen and, most-importantly, Rusty.

Riak On,

Mark Phillips