Introducing Riak Core

July 30, 2010 at 01:30 PM | categories: Riak, Riak Core, NoSQL, Database

This post was originally published on Kevin Smith's Blog, Hypothetical Labs. If you have questions or comments, please use the original post.

What is riak_core?

riak_core is a single OTP application which provides all the services necessary to write a modern, well-behaved distributed application. riak_core began as part of Riak. Since the code was generally useful in building all kinds of distributed applications we decided to refactor and separate the core bits into their own codebase to make it easier to use.

Distributed systems are complex and some of that complexity shows in the amount of features available in riak_core. Rather than dive deeply into code, I’m going to separate the features into broad categories and give an overview of each.

Note: If you’re the impatient type and want to skip ahead and start reading code, you can check out the source to riak_core via hg or git.

Node Liveness & Membership

riak_core_node_watcher is the process responsible for tracking the status of nodes within a riak_core cluster. It uses net_kernel to efficiently monitor many nodes. riak_core_node_watcher also has the capability to take a node out of the cluster programmatically. This is useful in situations where a brief node outage is necessary but you don’t want to stop the server software completely.

riak_core_node_watcher also provides an API for advertising and locating services around the cluster. This is useful in clusters where nodes provide a specialized service, like a CUDA compute node, which is used by other nodes in the cluster.

riak_core_node_watch_events cooperates with riak_core_node_watcher to generate events based on node activity, i.e. joining or leaving the cluster, etc. Interested parties can register callback functions which will be called as events occur.

Partitioning & Distributing Work

riak_core uses a master/worker configuration on each node to manage the execution of work units. Consistent hashing is used to determine which target node(s) to send the request and the master process on each node farms out the request to the actual workers. riak_core calls worker processes vnodes. The coordinating process is the vnode_master.

The partitioning and distribution logic inside riak_core also handles hinted handoff when required. Hinted handoff occurs as a result of a node failure or outage. In order to assure availability, most clustered systems will use operational nodes in place of down nodes. When the down node comes back the cluster needs to migrate the data from its temporary home on the substitute nodes to the data’s permanent home on the restored node. This process is called hinted handoff and is managed by components inside riak_core. riak_core also handles migrating partitions to new nodes when they join the cluster such that all work continues to be evenly partitioned to all cluster members.

riak_core_vnode_master starts all the worker vnodes on a given node and routes requests to
the vnodes as the cluster runs.

riak_core_vnode is an OTP behavior wrapping all the boilerplate logic required to implement a vnode. Application-specific vnodes need to implement a handful of callback functions in order to participate in handoff sessions and receive work units from the master.

Cluster State

A riak_core cluster stores global state in a ring structure. The state information is transferred between nodes in the cluster in a controlled manner to keep all cluster members in sync. This process is referred to as “gossiping”.

riak_core_ring is the module used to create and manipulate the ring state data shared by all nodes in the cluster. Ring state data includes items like partition ownership and cluster-specific ring metadata. Riak KV stores bucket metadata in the ring metadata, for example.

riak_core_ring_manager manages the cluster ring for a node. It is the main entry point for application code accessing the ring, via riak_core_ring_manager:get_my_ring/1, and also keeps a persistent snapshot of the ring in sync with the current ring state.

riak_core_gossip manages the ring gossip process and insures the ring is generally consistent across the cluster.

What’s the plan?

Over the next several months I’m going to cover the process of building a real application in a series of posts to this blog where each post covers some aspect of system building with riak_core. All of the source to the application will be published under the Apache2 licensed and shared via a public repo on github.

And what type of application will we build? Since the goal of this series is to illustrate how to build distributed systems using riak_core and also satisfy my own technical curiosity I’ve decided to build a distributed graph database. A graph database should provide enough use cases to really exercise riak_core while at the same time not obscuring the core learning experience in tons of complexity.

Thanks to Sean Cribbs and Andy Gross for providing helpful review and feedback.



Free Webinar - Riak with Rails - August 5 @ 2PM Eastern

July 29, 2010 at 05:00 PM | categories: Riak, Ruby, NoSQL, Database

Ruby on Rails is a powerful web framework that focuses on developer productivity. Riak is a friendly key-value store that is simple, flexible and scalable. Put them together and you have lots of exciting possibilities!

We invite you to join us for a free webinar on Thursday, August 5 at 2:00PM Eastern Time (UTC-4) to talk about Riak with Rails. In this hands-on webinar, we'll discuss:

  • Setting up a new Rails 3 project for Riak
  • Storing, retrieving, manipulating key-value data from Ruby
  • Issuing map-reduce queries
  • Creating rich document models with Ripple
  • Using Riak as a distributed cache and session store

The presentation will last 30 to 45 minutes, with time for questions at the end. Fill in the form below if you want to get started building Rails applications on top of Riak!

The Basho Team

Webinar Registration Form

You will be redirected to our webinar software after registering, where you can reserve your seat.

 First Name *
 
 Last Name *
 
 Email
(where webinar details will be sent) *
 
 Job Title *
 
 Company / Organization *
 
 How far are you into evaluating Riak?
 
 What use-cases are you evaluating Riak for?
 
 Additional comments / questions?
 


Consistent Smashing

July 28, 2010 at 01:30 PM | categories: Riak, NoSQL, Database

Sometimes you need more than words to illustrate a point. Here is Basho's humble attempt to clarify the difference between "Dynamo-Style" systems (like Riak) that use consistent hashing to achieve fault tolerance, simple scaling, and prevent data loss, and systems that use techniques like sharding.

Enjoy!

Mark

Consistent Smashing from Basho Technologies on Vimeo.



Webinar Recap - MapReduce Querying in Riak

July 27, 2010 at 05:00 PM | categories: Riak, Map/Reduce, NoSQL, Database

Thank you to all who attended the webinar last Thursday, it was a great turnout with awesome engagement. Like before, we're recapping the questions below for everyone's sake (in no particular order). If you missed the webinar, want some more information, or want to ask us some more questions, we've prepared a resource page for you. As always, you can also get ahold of us directly.

Q: Say I want to perform two-fold link walking but don't want to keep the "walk-through" results, including the initial one. Can I do something to keep only the last result?

In a MapReduce query, you can specify any number of phases to keep or ignore using the "keep" parameter on the phase. Usually you only want to keep the final phase. If you're using the link-walker resource, it'll return results from any phases whose specs end in "1". See the REST API wiki page for more information on link-walking.

Q: Will Riak Search work along with MapReduce, for example, to avoid queries over entire bucket? Will there be a webinar about Riak Search?

Yes, we intend to have this feature in the Generally Available release of Riak Search. We will definitely have a webinar about Riak Search close to its public release.

Q: Are there still problems with executing "qfun" functions from Erlang during MapReduce?

"qfun" phases (that use anonymous Erlang functions) will work on a one-node cluster, but not across a multi-node cluster. You can use them in development but it's best to switch to a compiled module function or Javascript function when moving to production.

Q: Although streams weren't mentioned, do you have any recommendations on when to use streaming map/reduce versus normal map/reduce?

Streaming MapReduce sends results back as they get produced from the last phase, in a multipart/mixed format. To invoke this, add ?chunked=true to the URL when you submit the job. Streaming might be appropriate when you expect the result set to be very large and have constructed your application such that incomplete results are useful to it. For example, in an AJAX web application, it might make sense to send some results to the browser before the entire query is complete.

Q: How do you indicate to Riak that the input key is a list of keys rather than a key whose value should be passed to the map function?

A custom map function could accomplish this, like the Javascript example below. The example assumes its input has a JSON Array of keys in the target bucket, and that the target bucket is the key of the input object.

function(object, keyData, arg){
  var keys = Riak.mapValuesJson(object)[0];
  return keys.map(function(item){ return [object.key, item] });
}

There's more to this issue — we discuss it in the next question.

Q: Which way is faster: storing a lot of links or storing the target keys in the value as a list? Are there any limits to the maximum number of links on a key?

How the links are stored will likely not have a huge impact on performance. If you choose to store a key list in a document, both methods would work. There are two relevant operations that would be performed with the key list document (updating and traversal).

The update process would involve retrieving the list, adding a value, and saving the list. If you are using the REST interface you will need to be aware of limitations in the number of allowed headers and the allowed header length. Mochiweb restricts the number of allowed headers to 1000. Header length is limited to 8192 characters. This imposes an upper limit for the number of Links that can be set through the REST interface.

The best method for updating a key list would be to write a post commit hook that performed the update. This avoids the need to access the key list using the REST interface so header limitations are no longer a concern. However, the post-commit hook could become a bottleneck in your update path if number of links grows large.

Traversal involves retrieving the key list document, collecting the related keys, and outputting a bucket/key list to be used in proceeding map phases. A built-in function is provided to process links. If you were to store keys in the value you would need to write a custom function to parse the keys and generate a bucket/key list. (see above question)

Q: Are you planning to run distributed reduce phases in the future?

Yes, here are two relevant feature requests you can track:

Q: What's the benefit of passing an arg to a map or reduce phase? Couldn't you just send the function body with the arg value filled in? Can I pass in a list of args or an arbitrary number of args?

When you have a lot of queries that are similar but with minor differences, you might be able to generalize a map or reduce function so that it can vary based on the 'arg' parameter. Then you could store that function in a built-ins library (see the question below) so it's preloaded rather than evaluated at query-time. The arg parameter can be any valid JSON value.

Q: What's the behavior if the map function is missing from one or more nodes but present on others?

The entire query will fail. It's best to make sure, perhaps via automated deployment, that all of your functions are available on all nodes. Alternatively, you can store Javascript functions directly in Riak and use them in a phase with "bucket" and "key" instead of "source" or "name".

Q: If there are 2 map phases, for example, then does that mean that both phases will be run back to back on each individual node and *then* it's all sent back for reduce? Or is there some back and forth between phases?

It's more like a pipeline, one phase feeds the next. All results from one phase are sent back to the coordinating node, which then initiates the subsequent phase once all participating nodes have replied.

Q: Would it be possible to send a function which acts as both a map predicate and an updater?

In general we don't recommend modifying objects as part of a MapReduce job because it can add latency to the request. However, you may be able to implement this with a map function in Erlang. Erlang MapReduce functions have full access to Riak including being able to read and write data.

%% Inside your own Erlang module
map_predicate_with_update(Value,_KeyData,_Arg) ->
  case predicate(Value) of
    true -> [update_passed_value(Value)];
    _ -> []
  end.

update_passed_value(Value) ->
  {ok, C} = riak:local_client(),
  %% modify your object here, store with C:put
  ModifiedValue.

This could come in handy for large updates instead of having to pull each object, update it and store it.

Q: Are Erlang named functions or JS named functions more performant? Which are faster — JS or Erlang functions?

There is a slight overhead when encoding the Riak object to JSON but otherwise the performance is comparable.

Q: Is there a way to use namespacing to define named Javascript functions? In other words, if I had a bunch of app-specific functions, what's the best way to handle that?

Yes, checkout the built-in Javascript MapReduce functions for an example.

Q: Can you specify how data is distributed among the cluster?

In short, no. Riak consistently hashes keys to determine where in the cluster data is located. This article explains how data is replicated and distributed throughout the cluster. In most production situations, your data will be evenly distributed.

Q: What is the reason for the nested list of inputs to a MapReduce query?

The nested list lets you specify multiple keys as inputs to your query, rather than a single bucket name or key. From the Erlang client, inputs are expressed as lists of tuples (fixed-length arrays) which have length of 2 (for bucket/key) or 3 (bucket/key/key-specific-data). Since JSON has no tuple type, we have to express the inputs as arrays of length 2 or 3 within an array.

Q: Is there a syntax requirement of JSON for Riak?

JSON is only required for the MapReduce query when submitted via HTTP, the objects you store can be in any format that your application will understand. JSON also happens to be a convenient format for MapReduce processing because it is accessible to both Erlang and Javascript. However, it is fairly common for Erlang-native applications to store data in Riak as serialized Erlang datatypes.

Q: Is there any significance to the name of file for how Riak finds the saved functions? I assume you can leave other languages in the same folder and it would be ignored as long as language is set to javascript? Additionally, is it possible/does it make sense to combine all your languages into a single folder?

Riak only looks for "*.js" files in the js_source_dir folder (see Configuration Files on the wiki). Erlang modules that contain map and reduce functions need to be on the code path, which could be completely separate from where the Javascript files are located.

Q: Would you point us to any best practices around matrix computations in Riak? I don't see any references to matrix in the riak wiki...

We don't have any specific support for matrix computations. We encourage you to find an appropriate Javascript or Erlang library to support your application.

Dan and Sean



Riak in Production - Lexer

July 21, 2010 at 01:00 PM | categories: Riak, Production, Community, NoSQL

A few members of the Basho Team are at OSCON all week. We are here to take part in the amazing talks and tutorials, but also to talk to Riak users and community members.

Yesterday I had the opportunity to have a brief chat with Andrew Harvey, a developer who hails from Sydney, Australia and works for a startup called Lexer. They are building some awesome applications around brand monitoring and analytics, and Riak is helping in that effort.

In this short clip, Andrew gives me the scoop on Lexer and shares a few details around why and how they are using Riak (and MySQL) at Lexer.

(Deepest apologies for the shakiness. I forgot the Tripod.)

Enjoy!

Mark

Riak in Production - Lexer from Basho Technologies on Vimeo.



Basho West and the Riak One Year Anniversary

July 19, 2010 at 12:00 PM | categories: , Riak, Community, NoSQL

Basho is growing. Fast. We are adding customers and users at a frenetic pace, and with this growth comes expansion in both team and locations. As some of you may have noticed, the Basho Team is not only becoming larger but more distributed. We now have people in six states scattered across four time zones pushing code and interacting with clients everyday.

First Order of Business

To bolster this growth and expansion, we did what any self-respecting tech startup would do: we opened an office in San Francisco. Several members of the Basho Team recently moved into a space at 795 Folsom, a cozy little spot a mere five floors below Twitter. (Proximity to the Nest was a requirement when evaluating office space.) We are calling it "Basho West." There are four of us here, and we are settling in quite nicely.

If you are in the area and want to talk Riak, Basho, open source, coffee, etc., stop in and pay us a visit any time. Seriously. If you walk through the door of Suite 1028 with a Mac Book in hand and have a question about how to model your data in Riak, we'll get out the whiteboard and help you out.

Second Order of Business

To make an immediate impact in the Bay Area, we thought it would be a great idea to get the first regularly scheduled Riak Meetup off the ground. We heard a rumor that there were a lot of people using or interested in databases out here, so we feel obliged to join the conversation. Here is the link to the San Francisco Riak Meetup group. If you're in the Bay Area and want to meet with other like-minded developers and technologists to discuss Riak (and other database technologies) in every possible capacity, please join us.

Third Order of Business

Pop quiz: When did Basho Technologies open source Riak? We asked ourselves this the other day. As far we can tell, it was sometime during the first week and a half of August last year. "Huh," we thought. "Wouldn't it be great to have a little gathering to commemorate this event?" It sure would, so that's what we are doing.

I mentioned above that we are starting a regularly scheduled Riak Meetup. To us, it made perfect sense to combine the inaugural Meetup with the event to celebrate Riak's One Year Anniversary of being a completely open source technology.

The date of this gathering is Monday, August 9th. The exact time and location still needs to be solidified. We'll be announcing that within the next few days. But put it on your calendar now, as you will not want to miss this. In addition to food, drink, and exceptional overall technical discussion and fireworks, here is what you can expect:

  • A talk from Dr. Eric Brewer, Basho Board Member and Father of the CAP Theorem
  • A few words from the team at Mochi Media about their experiences running Riak in production
  • A short talk from Basho's VP of Engineering, Andy Gross, on the state of Riak and the near term road map

If you have any other suggestions about what you would like to see at this event, just leave us a message or an idea on the Meetup page linked above.

Let's review:

  1. Come visit the new Basho Office at 795 Folsom, Suite 1028
  2. Join the Riak Meetup Group
  3. Come be a part of the Riak One Year Anniversary Celebration

And stay tuned, because things are only going to get more exciting from here.

The Basho Team



Basho Headed to OSCON and Community Leadership Summit

July 16, 2010 at 03:00 PM | categories: Riak, Community, NoSQL

Basho is sending some team members to Portland to take part in the two great events happening up there over the next week. Antony Falco, Mark Phillips (that's me) and John Hornbeck will be in "Stumptown" starting today for the Community Leadership Summit and OSCON. (We'll be landing at around 9PST if you want to meet us at PDX with welcome signs.)

If you would like to meet-up or want to say "hi" leave a comment, message us on Twitter, or email riak@basho.com.

We'll have shirts and stickers with us, too, so if you would like to get your hands on some Riak swag make sure to get in touch. I'll also be staggering around with a video camera, looking to interview anyone who has used or ever thought about using Riak or any other piece of Basho software. Users beware...

See you there!

Mark



Free Webinar - Map/Reduce Querying in Riak - July 22 @ 2PM Eastern

July 15, 2010 at 03:00 PM | categories: Riak, Map/Reduce, NoSQL, Database

Map-Reduce is a flexible and powerful alternative to declarative query languages like SQL that takes advantage of Riak's distributed architecture. However, it requires a whole new way of thinking about how to collect, process, and report your data, and is tightly coupled to how your data is stored in Riak.

We invite you to join us for a free webinar on Thursday, July 22 at 2:00PM Eastern Time (UTC-4) to talk about Map-Reduce Querying in Riak. We'll discuss:

  • How Riak's Map-Reduce differs from other systems and query languages
  • How to construct and submit Map-Reduce queries
  • Filtering, extracting, transforming, aggregating, and sorting data
  • Understanding the efficiency of various types of queries
  • Building and deploying reusable Map-Reduce function libraries

We'll cover the above topics in conjunction with practical examples from sample applications. The presentation will last 30 to 45 minutes, with time for questions at the end.

Fill in the form below if you want to get started building applications with Map/Reduce on top of Riak!Sorry, registration has closed!

The Basho Team



Next Page ยป