DatabasesSQLSurrealDBtcm21581yr1yr

I spent the last year migrating my backend from MongoDB to SurrealDB and this is what I've learned

You may have heard about SurrealDB, a 'NewSQL' database written in Rust, which boasts an impressive collection of features. With a SQL-like syntax, an easy to use single-file CLI that can be hosted pretty much anywhere; SurrealDB makes a strong case for one of the 'databases of the future'. I have spent the last year or so migrating away from MongoDB and using SurrealDB instead as the database for this very website. I'd like to share some of the things I have learned during this process, as well as some paradigms I think could work well that you could adopt should you choose to check out SurrealDB.

Project History

This site initially started out as project for my web development class in my final year of university. I built it with plain Vue, express.js, and MongoDB. After finishing that class (A+ 🎓), graduating, and finding a full stack developer job at a SaaS ERP company, I decided I wanted to come back to the project and rewrite it from scratch again in my free time. I had proved the concept to myself, created a MVP, and wanted to try it again with better tools and eventually showcase it on my portfolio.

Nuxt 3 just got released, and I was hearing quite a bit of buzz about SurrealDB, so after doing some basic evaluations on both technologies, I decided that it was finally time. So in January of 2023 I started to work.

Relational and Schemaless

Similar to a hashtable, SurrealDB stores a direct index for every entry in a table. A simple example would be post:gt3qk6y427k4dgype117 for a entry in the post table.

What is special about these record ids is that they can be used as unique links between different entries in different tables, acting sort of like a replacement for the typical JOIN you might find in a SQL query. Let's say for example you had a post object which looked like this:

{
  id: "post:gt3qk6y427k4dgype117",
  user: "user:flpmi4o91lj04fi4cw8l",
  title: "This is the title of the post",
  time: "2023-04-18T16:02:55.467Z",
  comments: [
    "comment:4w75svwf5c5kbqkvjbxn",
    "comment:8tcsi75ekt3zytdb8qdt",
    "comment:a1bh0zs4s4zqkbgxst64"
  ],
}

You can could fetch the data that corresponds to those records ids using a query like this:

SELECT * FROM post:gt3qk6y427k4dgype117
FETCH user, comments

This gives you a more complex object which uses the record id as a way to locate the related data. It supplies the table and the exact index of them item to retrieve. The result of the query would be:

{
  id: "post:gt3qk6y427k4dgype117",
  user: {
    id: "user:flpmi4o91lj04fi4cw8l",
    name: "tcm"
    admin: true
  }
  title: "This is the title of the post",
  time: "2023-04-18T16:02:55.467Z",
  comments: [
    {
      id: "comment:4w75svwf5c5kbqkvjbxn",
      user: "user:flpmi4o91lj04fi4cw8l",
      post: "post:gt3qk6y427k4dgype117",
      content: "I hate SQL, it's so confusing... 😡",
      time: "2023-04-18T16:02:55.467Z"
    },
    {
      id: "comment:8tcsi75ekt3zytdb8qdt",
      user: "user:opkdyfig54tdre96jc37",
      post: "post:gt3qk6y427k4dgype117",
      content: "I like mongodb, it's so simple and easy",
      time: "2023-04-14T02:37:59.379Z"
    },
    {
      id: "comment:a1bh0zs4s4zqkbgxst64",
      user: "user:opkdyfig54tdre96jc37",
      post: "post:gt3qk6y427k4dgype117",
      content: "Excel works just fine for me.",
      time: "2023-04-15T03:05:14.331Z"
    }
  ],
}

This also works recursively as well, you could the following to also get the user information for each comment, which would be helpful if you wanted to include the username when displaying the comment.

SELECT * FROM post:gt3qk6y427k4dgype117
FETCH user, comments, comments.user

You can also select from the entire table by omitting the id, and you will get an array of results instead.

SELECT * FROM post
FETCH user, comments

While by default the all tables are schemaless, you are able to setup defined fields for data that needs to be formatted a certain way. You can define the type, default value, and any assertions that the field requires. You can do this at any time, so if you need to makes changes later in development to account for new features or changes in design philosophy, it is almost painless.

SQL-like Syntax

The query syntax is pretty elegant imho. I personally am a fan of writing SQL queries compared to working with a client library, I will use a string builder and then write it out line by line. I will show some examples of queries I am using right now, and explain some of the awesome QoL features that I think are awesome.

Creating a post

CREATE POST
CONTENT $post;

-- loop over each topic in javascript
UPDATE $topic SET
posts += $post;

parameters are denoted with a $, and can be inserted into the query from javascript. You can use any regular javascript object, so the $post here would be formatted similarly to the example post in the first section
arrays can be added to and removed from very easily. The topic.posts field is an array of post record ids. When a new post is created, each topic which is used, will be updated to include the post id for the new post. This post id is then used to fetch all of the posts which use that topic when navigating to a specific topic's page

Getting posts for the feed

SELECT id, title, topics, comments, time, votes, user.id, user.name
FROM post
ORDER BY time DESC
LIMIT $pageSize
START $pageStart
FETCH user

easy paging with the LIMIT and START keywords

Casting votes

UPDATE $item SET
votes.positive = array::union(votes.positive, [$user]),
votes.misleading -= $user,
votes.negative -= $user

this query is used for casting votes everywhere. On DOX you can vote on posts, comments, users, topics, etc, which all use this same query. I just pass in the id of the thing being voted on and the user who voted, and this will work regardless of what it is
specific functions to handle operations on arrays. I don't want duplicate votes, so using the array::union function avoids this

Calculate scores for posts

DEFINE EVENT calculateScore ON TABLE post
WHEN $event = "CREATE" THEN {
    UPDATE post SET
    votes.score = <future> {
      array::len(votes.positive) - (<float>array::len(votes.misleading) / 2) - array::len(votes.negative)
    }
  WHERE id = $after.id
};

you can define events (similar to triggers in regular SQL), which are executed based on certain conditions such as when a new entry is created, when a specific fields changes, etc. This event adds a new field to every new post automatically
the future is calculated when a record is accessed. I want to be able to sort the posts based upon a score, which is computed from the different votes. With it being a future I can know that it will always be accurate and never need to update it

Following and unfollowing

UPDATE $user SET
following += $follower;
UPDATE $follower SET
followers += $user;

can easily add and remove record ids from each users database entry

Send email confirmations

CREATE confirmation SET
user = $user,
time = time::now(),
used = false,
expired = <future> { time::now() > time + 15m }

using duration addition and a future, it is very easy to set a time limit for the confirmation to be completed. If a user does not complete the confirmation within 15 minutes, then it will evaluate to expired and they will need to try again

Fetching all posts for a specific topic

SELECT *
FROM post
WHERE topics CONTAINS $topic
FETCH user

easily filter the posts which include the desired topic

Calculate trending topics

SELECT count(), topic
FROM (
    SELECT id, topics as topic
    FROM post
    WHERE time > time::now() - 4w
    SPLIT topic
)
GROUP BY topic
ORDER BY count DESC
LIMIT 5

here I am filtering all posts within the last month, getting all of the unique topics which were used in those posts, and the counting how many times each topic was used. The top five topics are then selected and displayed as the trending topics

Query builder setup using javascript

var { sql, parameters } = queryBuilder()
sql.push('SELECT id, title, topics, comments, time, votes, user.id, user.name')
sql.push('FROM post')
sql.push('WHERE user = $user')
sql.push('ORDER BY time DESC')
sql.push('FETCH user')
parameters['user'] = `user:${id}`
const posts = await queryAll<Post>({ sql, parameters })

this is how all of the queries and defined in the backend. It works great for me, I have lots of control over the query and it and I can assure that it will do exactly what I want it to

Conclusion

I think my efforts have been entirely worthwhile. My development speed has increased significantly. I am only able to work on this in my free time, so being able to quickly make changes to the database schema based on new features I need to implement is super important. I love the performance of SurrealDB, basically all my database queries take ~30-100ms. I love that I am able to host it myself, this entire website lives on a $10 Linode server. There are tons of upcoming features in the release pipeline for SurrealDB that make me super exited as well such as real-time feeds and full-text search to new a few.

What do you think? Would you consider using SurrealDB for your next project? Do you prefer SQL, NoSQL, or NewSQL? Consider checking out the rest of the site! You can find the entire source code for the site on my GitHub if you are interested in learning more (you can even find the old branch for that initial MVP I created in 2022 😬).