MongoDB + Haskell

Haskell is a fun language. I've just started learning it, and it's been a bumpy start because of the unfamiliar syntax. There are a lot of funny symbols to get accustomed to, new concepts, and a unique type system (hey, maybe it's not unique, but compared to most mainstream languages it definitely feels that way).

I wanted to try using MongoDB with Haskell, so I used the mongodb-haskell package. There is documentation and some examples, which helped, but some functions that the example doesn't demonstrate were hard to figure out. I got there through quite a bit of trial and error, and wanted to write up a few examples of my own in case they help anyone else.

Hook me up!

To get started, let's get connected. You could just check out the official example but to save you some time here is how I did it:

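-- Connect to a local MongoDB server, run the given action against
-- the named database, then close the connection.
runMongo dbName functionToRun = do
  pipe <- runIOE $ connect (host "127.0.0.1")
  -- The result of the action is ignored here.
  _ <- access pipe master (pack dbName) functionToRun
  close pipe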

All you need to do then is pass runMongo a function that does all the queries. Here we're connecting to localhost with no username or password. The master argument is the access mode: reads and writes go to the primary server, and writes are confirmed.
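One thing runMongo doesn't deal with is an exception thrown while the action is running, in which case the pipe is never closed. Here is a minimal sketch of a variant that always closes it. It assumes a newer release of the package (2.x), where connect returns the pipe directly (no runIOE) and access returns the action's result. The name runMongoSafely is my own, not something the package provides.

```haskell
import Control.Exception (bracket)
import Data.Text (pack)
import Database.MongoDB (Action, access, close, connect, host, master)

-- Hypothetical helper: opens a pipe, runs the action against the named
-- database, and closes the pipe even if the action throws.
-- Assumes mongoDB 2.x, where `connect` returns the pipe directly.
runMongoSafely :: String -> Action IO a -> IO a
runMongoSafely dbName action =
  bracket (connect (host "127.0.0.1")) close $ \pipe ->
    access pipe master (pack dbName) action
```

Because the result of the action is returned, a query can hand its documents back to the caller instead of printing them inside the action.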

Ready, Set, Populate!

A database with no data ain't much fun so let's add some records. How about a bunch of fake phone book entries?

First let's define a type for the contacts. Let's call it Contact.

data Contact = Contact
  { firstName :: String
  , lastName  :: String
  , category  :: String
  , city      :: String
  , cellPhone :: String
  } deriving (Show)

Now we create a function that populates the collection with some sample contacts. The function insertMany takes the collection name ("contacts") and a list of documents. We need a function to convert our type Contact to a Document holding the same properties. I called mine contactToDocument, and all it does is assign fields from the record to fields in a new BSON document. Using map we can apply our conversion function to a list of Contacts.

We could just skip the record, but I think it adds to readability, and if the definition of a Contact changes you are less likely to have to deal with runtime bugs. Sure, the code won't compile until you fix it, but better that than the alternative.

doPopulate = do
  let contacts =
        [ Contact {firstName = "Steve", lastName = "Powell", category = "business", city = "Toronto", cellPhone = "123-456-7890"}
        , Contact {firstName = "Test", lastName = "Person", category = "personal", city = "Ottawa", cellPhone = "144-456-3333"}
        , Contact {firstName = "Bob", lastName = "Person", category = "business", city = "Montreal", cellPhone = "144-222-3333"}
        ]
  runMongo dbName (insertContacts contacts)

insertContacts :: [Contact] -> Action IO ()
insertContacts contacts = do
  let docs = map contactToDocument contacts
  contactIds <- (insertMany "contacts" docs)
  let sContactIds = map show contactIds
  liftIO $ mapM_ (printf "Added _id : %s\n") sContactIds

contactToDocument :: Contact -> Document
contactToDocument (Contact {firstName = fN, lastName = lN, category = ct, city = cty, cellPhone = cell}) =
  ["firstName" =: (pack fN), "lastName" =: (pack lN), "category" =: (pack ct), "city" =: (pack cty), "cell" =: (pack cell)]
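Going the other way, from a Document back to a Contact, is handy once you start reading data. This is a sketch of how that could look, not something from the package: documentToContact is a name I made up, and the Contact record is repeated so the snippet stands on its own. It reads each field as Text with !? and gives up with Nothing if any field is missing.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Bson (Document, Label, (!?), (=:))
import Data.Text (Text, unpack)

-- Same record as above, repeated so the sketch stands alone.
data Contact = Contact
  { firstName :: String
  , lastName  :: String
  , category  :: String
  , city      :: String
  , cellPhone :: String
  } deriving (Show, Eq)

-- Hypothetical inverse of contactToDocument: Nothing if any field is
-- missing or is not stored as text.
documentToContact :: Document -> Maybe Contact
documentToContact doc =
  Contact <$> str "firstName"
          <*> str "lastName"
          <*> str "category"
          <*> str "city"
          <*> str "cell"   -- stored as "cell", not "cellPhone"
  where
    str :: Label -> Maybe String
    str label = unpack <$> (doc !? label :: Maybe Text)

-- A document shaped like the ones contactToDocument produces.
steveDoc :: Document
steveDoc =
  [ "firstName" =: ("Steve" :: Text)
  , "lastName"  =: ("Powell" :: Text)
  , "category"  =: ("business" :: Text)
  , "city"      =: ("Toronto" :: Text)
  , "cell"      =: ("123-456-7890" :: Text)
  ]
```

Returning Maybe keeps a document with a missing field from blowing up at runtime; the caller decides what to do with Nothing.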

Add me

What if we wanted to add a new contact? Assume we want to do this:

~/blog/haskell-mongo$ dist/build/phoneBook/phoneBook add Wood Chuck personal Toronto 343-234-9399
Added _id : 53cbb29e1d41c8508b000000

This is similar to the bulk insert above, but it uses insert instead of insertMany.

doInsert :: [String] -> IO()
doInsert args = do
  case args of
    (fN:lN:ct:cty:cell:[]) ->
      runMongo dbName (insertContact contact)
        where contact = Contact {firstName = fN, lastName = lN, category = ct, city = cty, cellPhone = cell}
    [] ->
      liftIO $ printf "No arguments. Supply firstName lastName category city cellPhone\n"
    _ ->
      liftIO $ printf "Bad arguments. Supply firstName lastName category city cellPhone\n"

insertContact :: Contact -> Action IO ()
insertContact contact = do
  -- Reuse contactToDocument so the field names are defined in one place.
  contactId <- insert "contacts" (contactToDocument contact)
  liftIO $ printf "Added _id : %s\n" (show contactId)

Where's that number?

MongoDB is a NoSQL database. A Mongo database consists of Collections of Documents. A Document is a bunch of Fields, and a Field is a Label and a Value. A Value can be an Integer, Text, Boolean, the usual stuff, or another Document.
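To make that concrete, here is a small sketch that builds a document by hand and lists its fields. It only needs the bson package. The "age", "active" and "address" fields are made up for illustration and are not part of the phone book.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Bson (Document, Field (..), (=:))
import Data.Text (Text, unpack)

-- A document is a list of fields. "age", "active" and "address" are
-- made-up fields used only to show different value types.
sampleContact :: Document
sampleContact =
  [ "firstName" =: ("Steve" :: Text)
  , "age"       =: (42 :: Int)
  , "active"    =: True
  , "address"   =: ["city" =: ("Toronto" :: Text)]  -- a nested document
  ]

-- Each field carries a label and a value.
describeFields :: Document -> [String]
describeFields doc = [unpack (label f) ++ " = " ++ show (value f) | f <- doc]
```

The type annotations on the literals are there because =: accepts any type that can be turned into a BSON value, so the compiler needs to be told which one is meant.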

To query a Mongo database we need to do a find query.

findByFirstName :: String -> Action IO [Document]
findByFirstName name = do
  phoneBookEntries <- rest =<< find (select ["firstName" =: (pack name)] "contacts")
  return phoneBookEntries

findByFirstName takes a String, which we need to pass to select as a Text. That's what pack does: it takes a String and gives us a Text. What's the difference? A String is a list of characters and can be used with functions that act on lists, while Text isn't a list but is more compact and efficient.
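A tiny sketch of the difference, using only the text package. The function names initials and shout are mine, picked just for this example.

```haskell
import qualified Data.Text as T

-- String is [Char], so ordinary list functions work on it.
initials :: String -> String -> String
initials first lastN = take 1 first ++ take 1 lastN

-- Text has its own functions (T.toUpper, T.length, ...); pack and
-- unpack convert between the two representations.
shout :: String -> String
shout = T.unpack . T.toUpper . T.pack
```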

The "rest" function takes a Mongo cursor and extracts all the remaining documents from it at once. There are also the functions next, nextN, and nextBatch, which act on a cursor and retrieve a portion of the documents at a time. The part in the square brackets after "select" is the actual filter; in this case we're looking for documents with a matching first name. "<-" just runs the query action and binds its result to a name so we can pass it to another function for processing.
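For large result sets you may not want everything in memory at once. Here is a sketch of walking a cursor with next and of grabbing a fixed number of documents with nextN. printEachContact and firstContacts are names I invented for this example, and it assumes the same "contacts" collection as above.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.IO.Class (liftIO)
import Database.MongoDB (Action, Cursor, Document, find, next, nextN, select)

-- Print every contact, pulling documents off the cursor one at a time.
printEachContact :: Action IO ()
printEachContact = find (select [] "contacts") >>= loop
  where
    loop :: Cursor -> Action IO ()
    loop cursor = do
      mDoc <- next cursor
      case mDoc of
        Nothing  -> return ()                         -- cursor exhausted
        Just doc -> liftIO (print doc) >> loop cursor

-- Fetch at most the first n matching documents.
firstContacts :: Int -> Action IO [Document]
firstContacts n = find (select [] "contacts") >>= nextN n
```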

From Field to Value

What's a document? It's just a list of Fields. Wouldn't it be nice if we could pull out the field we want and take a look at its value? We can, and there are two ways of doing it. One way is to create a function that finds what we want based on its label. The other way is to create a function that takes a document and uses guards to separate the fields out. The problem with the second approach is that we need to know the order of the fields returned. Let's try the first way.

getString :: Label -> Document -> String
getString label = typed . (valueAt label)

getInteger :: Label -> Document -> Integer
getInteger label = typed . (valueAt label)

getObjId :: Document -> ObjectId
getObjId = typed . (valueAt "_id")

getSecondaryObjId :: Label -> Document -> ObjectId
getSecondaryObjId label = typed . (valueAt label)

lookupString :: Label -> Document -> Maybe String
lookupString label document =
  document !? label

lookupInteger :: Label -> Document -> Maybe Integer
lookupInteger label document =
  document !? label

lookupSecondaryObjId :: Label -> Document -> Maybe ObjectId
lookupSecondaryObjId label document =
  document !? label

Now we can do stuff like this:

let phoneNumber = getString "cell" phoneBookEntry
let firstName = getString "firstName" phoneBookEntry
let objId = getObjId phoneBookEntry

The lookup functions are a bit different in that they can look into subdocuments: you can pass "subDocument.someField" and get a field called "someField" from a subdocument called "subDocument". Unlike the get functions, they return a Maybe (Maybe String, Maybe Integer, and so on), so a missing field gives you Nothing instead of a runtime error.
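Here is a sketch of the dotted lookup on a hand-built document, using only the bson package. The "address" subdocument is invented for the example; the phone book documents don't have one.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Bson (Document, (!?), (=:))
import Data.Text (Text)

-- "address" is a made-up subdocument used only for illustration.
contactWithAddress :: Document
contactWithAddress =
  [ "firstName" =: ("Steve" :: Text)
  , "address"   =: ["city" =: ("Toronto" :: Text)]
  ]

nestedCity, missingField :: Maybe Text
nestedCity   = contactWithAddress !? "address.city"  -- dotted path into the subdocument
missingField = contactWithAddress !? "address.zip"   -- absent field, so no value comes back
```

Here nestedCity should come back as a Just holding the city, while missingField is Nothing rather than an exception.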

Deleting

Deleting a document isn't much different from finding one: build a selection and pass it to delete.

-- "all" removes every contact; anything else is treated as a first name.
doDelete (firstName:_)
  | firstName == "all" = runMongo dbName deleteAllEntries
  | otherwise          = runMongo dbName (deleteEntry firstName)

-- An empty selector matches every document in the collection.
deleteAllEntries = delete (select [] "contacts")

deleteEntry firstName = delete (select ["firstName" =: (pack firstName)] "contacts")
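delete removes every document that matches the selection. If you only want to remove one, the package also has deleteOne. A small sketch, assuming the same "contacts" collection; deleteOneEntry is a name I made up, and it uses count to report how many contacts with that name are left.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (pack)
import Database.MongoDB (Action, count, deleteOne, select, (=:))

-- Remove at most one contact with the given first name, then report
-- how many contacts with that name are still in the collection.
deleteOneEntry :: String -> Action IO Int
deleteOneEntry name = do
  let byName = ["firstName" =: pack name]
  deleteOne (select byName "contacts")
  count (select byName "contacts")
```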

A Little Aggregation

Do you miss all the cool "group by" stuff from SQL? Now you can enjoy the same fun stuff in Mongo, sort of, kind of, maybe, but not quite. Anyways here's a sample of what you can do using aggregation.

doCitySummary :: Action IO()
doCitySummary = do
  counts <- countByCity
  mapM_ printCityCount counts

countByCity :: Action IO [Document]
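countByCity = aggregate "contacts" [
  -- keep only contacts that have a cell number
  ["$match" =: ["cell" =: ["$exists" =: 1]]],
  -- group by city and count the documents in each group
  ["$group" =: ["_id" =: ["city" =: "$city"], "total" =: ["$sum" =: 1]]],
  -- after $group the city lives under _id, so sort on "_id.city"
  ["$sort" =: ["_id.city" =: 1]] ]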

printCityCount :: Document -> Action IO()
printCityCount document = do
  let cityName = lookupString "_id.city" document
  let count = getInteger "total" document
  case cityName of
    Just cn -> liftIO $ printf "City: %s, Count: %i\n" cn count
    Nothing -> liftIO $ printf "City: no-name, Count: %i\n" count

Now we can do this with our program:

~/blog/haskell-mongo$ dist/build/phoneBook/phoneBook citySummary
City: Montreal, Count: 1
City: Ottawa, Count: 1
City: Toronto, Count: 3
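If the pipeline looks opaque, it may help to see the same group, count, and sort done in plain Haskell on a list of city names. This is only an illustration of what the $group, $sum, and $sort stages compute, not how Mongo does it; countCities is a made-up name.

```haskell
import qualified Data.Map.Strict as Map

-- Count how many times each city occurs. Map.toAscList returns the
-- results sorted by city name, like the $sort stage.
countCities :: [String] -> [(String, Int)]
countCities cities =
  Map.toAscList (Map.fromListWith (+) [(c, 1) | c <- cities])
```

For example, countCities ["Toronto", "Ottawa", "Toronto", "Montreal", "Toronto"] evaluates to [("Montreal",1),("Ottawa",1),("Toronto",3)], the same totals as the output above.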

Source Code

Get the source code here.

Other Packages

These other packages provide a layer of abstraction over mongodb-haskell.

mt-mongodb - You can use records instead of manually setting or getting fields from the document. It handles the conversion to and from the BSON document, so your code remains more statically typed and you don't need to worry about documents with dynamic fields. The downsides are that your documents all need to have the same number of fields, or you have to be careful writing your queries so you retrieve only records that have the same fields.

persistent-mongoDB - I haven't tried this one, but it also maps Haskell records to BSON documents. It's meant to be a persistence layer for your app and is part of Yesod, but can be used separately.