Chapter 9. Setting Up a Replica Set

A One-Minute Test Setup
This section will get you started quickly by setting up a three-member replica set on
your local machine. This setup is obviously not suitable for production, but it’s a nice
way to familiarize yourself with replication and play around with configuration.
This quick-start method stores data in /data/db, so make sure that directory
exists and is writable by your user before running this code.

Start up a mongo shell with the --nodb option, which allows you to start a shell that is
not connected to any mongod:
$ mongo --nodb

Create a replica set by running the following command:
> replicaSet = new ReplSetTest({"nodes" : 3})

This tells the shell to create a new replica set with three servers: one primary and two
secondaries. However, it doesn’t actually start the mongod servers until you run the
following two commands:
> // starts three mongod processes
> replicaSet.startSet()
>
> // configures replication
> replicaSet.initiate()

You should now have three mongod processes running locally on ports 31000, 31001,
and 31002. They will all be dumping their logs into the current shell, which is very noisy,
so put this shell aside and open up a new one.
In the second shell, connect to the mongod running on port 31000:
> conn1 = new Mongo("localhost:31000")
connection to localhost:31000
testReplSet:PRIMARY>
testReplSet:PRIMARY> primaryDB = conn1.getDB("test")
test

Notice that, when you connect to a replica set member, the prompt changes to
testReplSet:PRIMARY>. "PRIMARY" is the state of the member and "testReplSet" is
an identifier for this set. You’ll learn how to choose your own identifier later;
testReplSet is the default name ReplSetTest uses.
Examples from now on will just use > for the prompt instead of
testReplSet:PRIMARY> to keep things more readable.

Use your connection to the primary to run the isMaster command. This will show you
the status of the set:
> primaryDB.isMaster()
{
"setName" : "testReplSet",
"ismaster" : true,
"secondary" : false,
"hosts" : [
"wooster:31000",
"wooster:31002",
"wooster:31001"
],
"primary" : "wooster:31000",
"me" : "wooster:31000",
"maxBsonObjectSize" : 16777216,
"localTime" : ISODate("2012-09-28T15:48:11.025Z"),
"ok" : 1
}

The output from isMaster contains many fields, but the important ones show that this
node is primary (the "ismaster" : true field) and list the hosts in the set.
If this server says "ismaster" : false, that’s fine. Look at the "primary"
field to see which node is primary and then repeat the connection
steps above for that host/port.
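The check described above can be sketched in plain JavaScript. This is only an illustration: describeMember is a hypothetical helper, not part of the mongo shell, and the sample document is a trimmed, hand-edited copy of the isMaster output shown earlier. It assumes an environment that provides console.log (Node.js, for example); in the older mongo shell, print could be used instead.

```javascript
// Hypothetical helper (not a MongoDB API): given a document shaped like
// isMaster's output, say where writes should be sent.
function describeMember(status) {
  if (status.ismaster) {
    return status.me + " is the primary";
  }
  if (status.primary) {
    return status.me + " is not primary; connect to " + status.primary;
  }
  return status.me + " does not currently see a primary";
}

// Trimmed sample, edited by hand to show what a secondary might report.
const sampleStatus = {
  setName: "testReplSet",
  ismaster: false,
  secondary: true,
  primary: "wooster:31000",
  me: "wooster:31001"
};

console.log(describeMember(sampleStatus));
```

The function reads only the "ismaster", "primary", and "me" fields, which are the ones the text singles out as important.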

Now that you’re connected to the primary, let’s try doing some writes and see what
happens. First, insert 1,000 documents:
> for (i=0; i<1000; i++) { primaryDB.coll.insert({count: i}) }
>
> // make sure the docs are there
> primaryDB.coll.count()
1000

Now check one of the secondaries and verify that it has a copy of all of these
documents. Connect to either of the secondaries:
> conn2 = new Mongo("localhost:31001")
connection to localhost:31001
> secondaryDB = conn2.getDB("test")
test

Secondaries may fall behind the primary (or lag) and not have the most current writes,
so secondaries will refuse read requests by default to prevent applications from
accidentally reading stale data. Thus, if you attempt to query a secondary, you’ll get an
error that it’s not primary:
> secondaryDB.coll.find()
error: { "$err" : "not master and slaveok=false", "code" : 13435 }

This is to protect your application from accidentally connecting to a secondary and
reading stale data. To allow queries on the secondary, we set an “I’m okay with reading
from secondaries” flag, like so:
> conn2.setSlaveOk()

Note that slaveOk is set on the connection (conn2), not the database (secondaryDB).
Now you’re all set to read from this member. Query it normally:
> secondaryDB.coll.find()
{ "_id" : ObjectId("5037cac65f3257931833902b"), "count" : 0 }
{ "_id" : ObjectId("5037cac65f3257931833902c"), "count" : 1 }
{ "_id" : ObjectId("5037cac65f3257931833902d"), "count" : 2 }
...
{ "_id" : ObjectId("5037cac65f3257931833903c"), "count" : 17 }
{ "_id" : ObjectId("5037cac65f3257931833903d"), "count" : 18 }
{ "_id" : ObjectId("5037cac65f3257931833903e"), "count" : 19 }
Type "it" for more
>
> secondaryDB.coll.count()
1000

You can see that all of our documents are there.
Now, try to write to a secondary:
> secondaryDB.coll.insert({"count" : 1001})
> secondaryDB.runCommand({"getLastError" : 1})
{
"err" : "not master",
"code" : 10058,
"n" : 0,
"lastOp" : Timestamp(0, 0),
"connectionId" : 5,
"ok" : 1
}

You can see that the secondary does not accept the write. The secondary will only
perform writes that it gets through replication, not from clients.
There is one other interesting feature that you should try out: automatic failover. If the
primary goes down, one of the secondaries will automatically be elected primary. To try
this out, stop the primary:
> primaryDB.adminCommand({"shutdown" : 1})

Run isMaster on the secondary to see who has become the new primary:
> secondaryDB.isMaster()

It should look something like this:
{
"setName" : "testReplSet",
"ismaster" : true,
"secondary" : false,
"hosts" : [
"wooster:31001",
"wooster:31000",
"wooster:31002"
],
"primary" : "wooster:31001",
"me" : "wooster:31001",
"maxBsonObjectSize" : 16777216,
"localTime" : ISODate("2012-09-28T16:52:07.975Z"),
"ok" : 1
}
Your primary may be the other server; whichever secondary noticed that the primary
was down first will be elected. Now you can send writes to the new primary.
isMaster is a very old command, predating replica sets to when MongoDB only
supported master-slave replication. Thus, it does not use the replica set terminology
consistently: it still calls the primary a “master.” You can generally think of “master” as
equivalent to “primary” and “slave” as equivalent to “secondary.”
When you’re done working with the set, shut down the servers from your first shell.
This shell will be full of log output from the members of the set, so hit Enter a few times
to get back to a prompt. To shut down the set, run:
> replicaSet.stopSet()

Congratulations! You just set up, used, and tore down replication.
There are a few key concepts to remember:
• Clients can send a primary all the same operations they could send a standalone
server (reads, writes, commands, index builds, etc.).
• Clients cannot write to secondaries.
• Clients, by default, cannot read from secondaries. By explicitly setting an “I know
I’m reading from a secondary” setting, clients can read from secondaries.
Now that you understand the basics, the rest of this chapter focuses on configuring a
replica set under more realistic circumstances. Remember that you can always go back
to ReplSetTest if you want to quickly try out a configuration or option.

Configuring a Replica Set
For actual deployments, you’ll need to set up replication across multiple machines. This
section takes you through setting up a real replica set that could be used by your
application.
Let’s say that you already have a standalone mongod on server-1:27017 with some data
on it. (If you do not have any pre-existing data, this will work the same way, just with
an empty data directory.) The first thing you need to do is choose a name for your set.
Any string whatsoever will do, so long as it’s UTF-8.
Once you have a name for your replica set, restart server-1 with the --replSet name
option. For example:
$ mongod --replSet spock -f mongod.conf --fork

Now start up two more mongod servers with the --replSet option and the same identifier
(spock); these will be the other members of the set:
$ ssh server-2
server-2$ mongod --replSet spock -f mongod.conf --fork
server-2$ exit
$
$ ssh server-3
server-3$ mongod --replSet spock -f mongod.conf --fork
server-3$ exit

Each of the other members should have an empty data directory, even if the first member
had data. They will automatically clone the first member’s data to their machines once
they have been added to the set.
For each member, add the replSet option to its mongod.conf file so that it will be used
on startup from now on.
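As a minimal sketch of that change, the fragment below assumes a YAML-format configuration file, which is the format used by MongoDB 2.6 and later. Older INI-style configuration files would instead use a line of the form replSet = spock. The set name spock is the one used in this section.

```yaml
# mongod.conf (YAML format is an assumption; adapt to the format your file uses)
replication:
  replSetName: spock
```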
Once you’ve started the mongods, you should have three mongods running on three
separate servers. However, each mongod does not yet know that the others exist. To tell
them about one another, you have to create a configuration that lists each of the members
and send this configuration to server-1. It will take care of propagating it to the other
members.
First we’ll create the configuration. In the shell, create a document that looks like this:
> config = {
"_id" : "spock",
"members" : [
{"_id" : 0, "host" : "server-1:27017"},
{"_id" : 1, "host" : "server-2:27017"},
{"_id" : 2, "host" : "server-3:27017"}
]
}

There are several important parts of config. The config’s "_id" is the name of the set
that you passed in on the command line (in this example, "spock"). Make sure that this
name matches exactly.
The next part of the document is an array of members of the set. Each of these needs
two fields: a unique "_id" that is an integer and a hostname (replace the hostnames
with whatever your servers are called).
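The requirements above (a matching set name, a unique integer "_id" and a host for each member) can be checked before the document is sent anywhere. The following is a rough sketch in plain JavaScript: checkConfig is a hypothetical helper, not a MongoDB API, and it covers only the rules stated in this section. It assumes an environment with console.log, such as Node.js.

```javascript
// Hypothetical pre-flight check (not part of MongoDB): returns a list of
// problems found in a replica set configuration document.
function checkConfig(config, expectedSetName) {
  const problems = [];
  if (config._id !== expectedSetName) {
    problems.push('set name "' + config._id + '" does not match "' + expectedSetName + '"');
  }
  const seenIds = new Set();
  for (const member of config.members || []) {
    if (!Number.isInteger(member._id)) {
      problems.push("member _id is not an integer: " + member._id);
    } else if (seenIds.has(member._id)) {
      problems.push("duplicate member _id: " + member._id);
    }
    seenIds.add(member._id);
    if (typeof member.host !== "string" || member.host === "") {
      problems.push("member " + member._id + " has no host");
    }
  }
  return problems;
}

// The configuration document from this section.
const spockConfig = {
  _id: "spock",
  members: [
    { _id: 0, host: "server-1:27017" },
    { _id: 1, host: "server-2:27017" },
    { _id: 2, host: "server-3:27017" }
  ]
};

// An empty list means none of the checked rules were violated.
console.log(checkConfig(spockConfig, "spock"));
```

An empty result only means these particular rules hold; the server performs its own validation when the configuration is actually applied.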
This config object is your replica set configuration, so now you have to send it to a
member of the set. To do so, connect to the server with data on it (server-1:27017) and
initiate the set with this configuration:
> // connect to server-1
> db = (new Mongo("server-1:27017")).getDB("test")
>
> // initiate replica set
> rs.initiate(config)
{
"info" : "Config now saved locally. Should come online in about a minute.",
"ok" : 1
}

server-1 will parse the configuration and send messages to the other members, alerting
them of the new configuration. Once they have all loaded the configuration, they will
elect a primary and start handling reads and writes.
Unfortunately, you cannot convert a standalone server to a replica set
without some downtime for restarting it and initializing the set. Thus,
even if you only have one server to start out with, you may want to
configure it as a one-member replica set. That way, if you want to add
more members later, you can do so without downtime.

If you are starting a brand-new set, you can send the configuration to any member in
the set. If you are starting with data on one of the members, you must send the
configuration to the member with data. You cannot initiate a set with data on more than
one member.
You must use the mongo shell to configure replica sets. There is no way
to do file-based replica set configuration.

rs Helper Functions
Note the rs in the rs.initiate() command above. rs is a global variable that contains
replication helper functions (run rs.help() to see the helpers it exposes). These
functions are almost always just wrappers around database commands. For example,
the following database command is equivalent to rs.initiate(config):
> db.adminCommand({"replSetInitiate" : config})

It is good to have a passing familiarity with both the helpers and the underlying
commands, as it may sometimes be easier to use the command form instead of the helper.

Networking Considerations
Every member of a set must be able to make connections to every other member of the
set (including itself). If you get errors about members not being able to reach other
members that you know are running, you may have to change your network
configuration to allow connections between them.
Also, replica set configurations shouldn’t use localhost as a hostname. There isn’t much
point to running a replica set on one machine and localhost won’t resolve correctly from
a foreign machine. MongoDB allows all-localhost replica sets for testing locally but will
protest if you try to mix localhost and non-localhost servers in a config.
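The "mixed localhost" situation can be detected with a small plain-JavaScript check. This is a sketch under stated assumptions: mixesLocalhost and isLocalhost are hypothetical helpers, not MongoDB functions, and treating only the literal hostname localhost as local is a simplification of whatever the server actually checks.

```javascript
// Hypothetical helper: true if the host string names the local machine.
// Simplifying assumption: only the literal hostname "localhost" counts.
function isLocalhost(host) {
  return host.split(":")[0] === "localhost";
}

// Hypothetical helper: true if a config mixes localhost and non-localhost members.
function mixesLocalhost(config) {
  const localCount = config.members.filter(function (m) {
    return isLocalhost(m.host);
  }).length;
  return localCount > 0 && localCount < config.members.length;
}

// All-localhost test set: allowed for local testing.
const localOnly = {
  _id: "testReplSet",
  members: [
    { _id: 0, host: "localhost:31000" },
    { _id: 1, host: "localhost:31001" }
  ]
};

// One localhost member next to a real hostname: the kind of mix to avoid.
const mixed = {
  _id: "spock",
  members: [
    { _id: 0, host: "localhost:27017" },
    { _id: 1, host: "server-2:27017" }
  ]
};

console.log(mixesLocalhost(localOnly), mixesLocalhost(mixed));
```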

Changing Your Replica Set Configuration
Replica set configurations can be changed at any time: members can be added, removed,
or modified. There are shell helpers for some common operations; for example, to add
a new member to the set, you can use rs.add:
> rs.add("server-4:27017")

Similarly, you can remove members:
> rs.remove("server-1:27017")
Fri Sep 28 16:44:46 DBClientCursor::init call() failed
Fri Sep 28 16:44:46 query failed : admin.$cmd { replSetReconfig: {
_id: "testReplSet", version: 2, members: [ { _id: 0, host: "ubuntu:31000" },
{ _id: 2, host: "ubuntu:31002" } ] } } to: localhost:31000
Fri Sep 28 16:44:46 Error: error doing query:
failed src/mongo/shell/collection.js:155
Fri Sep 28 16:44:46 trying reconnect to localhost:31000
Fri Sep 28 16:44:46 reconnect localhost:31000 ok

Note that when you remove a member (or do almost any configuration change other
than adding a member), you will get a big, ugly error about not being able to connect
to the database in the shell. This is okay; it actually means the reconfiguration succeeded!
When you reconfigure a set, the primary closes all connections as the last step in the
reconfiguration process. Thus, the shell will briefly be disconnected but will
automatically reconnect on your next operation.

The reason that the primary closes all connections is that it briefly steps down whenever
you reconfigure the set. It should step up again immediately, but be aware that your set
will not have a primary for a moment or two after reconfiguring.
You can check that a reconfiguration succeeded by running rs.config() in the shell. It will
print the current configuration:
> rs.config()
{
"_id" : "testReplSet",
"version" : 2,
"members" : [
{
"_id" : 1,
"host" : "server-2:27017"
},
{
"_id" : 2,
"host" : "server-3:27017"
},
{
"_id" : 3,
"host" : "server-4:27017"
}
]
}

Each time you change the configuration, the "version" field will increase. It starts at
version 1.
You can also modify existing members, not just add and remove them. To make
modifications, create the configuration document that you want in the shell and call
rs.reconfig. For example, suppose we have a configuration such as the one shown here:
> rs.config()
{
"_id" : "testReplSet",
"version" : 2,
"members" : [
{
"_id" : 0,
"host" : "server-1:27017"
},
{
"_id" : 1,
"host" : "10.1.1.123:27017"
},
{
"_id" : 2,
"host" : "server-3:27017"
}
]
}

Someone accidentally added member 1 by IP, instead of its hostname. To change that,
first we load the current configuration in the shell and then we change the relevant fields:
> var config = rs.config()
> config.members[1].host = "server-2:27017"

Now that the config document is correct, we need to send it to the database using the
rs.reconfig helper:
> rs.reconfig(config)

rs.reconfig is often more useful than rs.add and rs.remove for complex operations,
such as modifying members’ configuration or adding/removing multiple members at
once. You can use it to make any legal configuration change you need: simply create the
config document that represents your desired configuration and pass it to rs.reconfig.
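The "build the document you want, then send it" pattern can be sketched in plain JavaScript. The helper below is hypothetical (withHost is not a shell function) and simply returns a modified copy of a configuration document, leaving the original untouched; the sample document mirrors the one shown above.

```javascript
// Hypothetical helper: return a copy of `config` in which the member with the
// given _id has its host replaced. The input document is not modified.
function withHost(config, memberId, newHost) {
  const members = config.members.map(function (m) {
    return m._id === memberId ? Object.assign({}, m, { host: newHost }) : m;
  });
  return Object.assign({}, config, { members: members });
}

// The configuration from the example above, with member 1 listed by IP.
const currentConfig = {
  _id: "testReplSet",
  version: 2,
  members: [
    { _id: 0, host: "server-1:27017" },
    { _id: 1, host: "10.1.1.123:27017" },
    { _id: 2, host: "server-3:27017" }
  ]
};

const desiredConfig = withHost(currentConfig, 1, "server-2:27017");
console.log(desiredConfig.members[1].host);
```

In the shell, a document built this way from rs.config() could then be passed to rs.reconfig, exactly as in the manual steps above.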

How to Design a Set
To plan out your set, there are certain replica set concepts that you must be familiar
with. The next chapter goes into more detail about these, but the most important is that
replica sets are all about majorities: you need a majority of members to elect a primary,
a primary can only stay primary so long as it can reach a majority, and a write is safe
when it’s been replicated to a majority. This majority is defined to be “more than half of
all members in the set,” as shown in Table 9-1.
Table 9-1. What is a majority?
Number of members in the set    Majority of the set
1                               1
2                               2
3                               2
4                               3
5                               3
6                               4
7                               4

Note that it doesn’t matter how many members are down or unavailable, as majority is
based on the set’s configuration.
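The values in Table 9-1 follow directly from the "more than half" definition. As a small worked sketch (the function name majority is only illustrative), the smallest whole number greater than half of n is floor(n / 2) + 1:

```javascript
// Smallest number of members that is strictly more than half of the
// configured members. Illustrative helper, not a MongoDB function.
function majority(configuredMembers) {
  return Math.floor(configuredMembers / 2) + 1;
}

// Reproduce the rows of Table 9-1.
for (let n = 1; n <= 7; n++) {
  console.log(n + " members -> majority of " + majority(n));
}
```

Note that the argument is the number of members in the configuration, not the number currently reachable.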
For example, suppose that we have a five-member set and three members go down, as
shown in Figure 9-1. There are still two members up. These two members cannot reach
a majority of the set (at least three members), so they cannot elect a primary. If one of
them were primary, it would step down as soon as it noticed that it could not reach a
majority. After a few seconds, your set would consist of two secondaries and three
unreachable members.

Figure 9-1. With a minority of the set available, all members will be secondaries
Many users find this frustrating: why can’t the two remaining members elect a primary?
The problem is that it’s possible that the other three members didn’t go down, and that
it was the network that went down, as shown in Figure 9-2. In this case, the three
members on the left will elect a primary, since they can reach a majority of the set (three
members out of five).
In the case of a network partition, we do not want both sides of the partition to elect a
primary: otherwise the set would have two primaries. Then both primaries would be
writing to the data and the data sets would diverge. Requiring a majority to elect or stay
primary is a neat way of avoiding ending up with more than one primary.
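The partition argument can be checked with a few lines of plain JavaScript. This is a toy model, not how the server is implemented: canElectPrimary is a hypothetical function that only compares the number of members one side can reach (counting itself) against the majority of the configured set.

```javascript
// Strictly more than half of the configured members.
function majorityOf(configuredMembers) {
  return Math.floor(configuredMembers / 2) + 1;
}

// Toy model: a side of a partition can elect (or keep) a primary only if the
// members it can reach, itself included, form a majority of the configured set.
function canElectPrimary(configuredMembers, reachableMembers) {
  return reachableMembers >= majorityOf(configuredMembers);
}

// The five-member set from the text, split three/two by a network partition.
const partitionSides = [3, 2];
const partitionOutcome = partitionSides.map(function (size) {
  return canElectPrimary(5, size);
});

console.log(partitionOutcome);
```

Because each side would need strictly more than half of the members, two disjoint sides can never both qualify, which is why the majority rule rules out two primaries.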

Figure 9-2. For the members, a network partition looks identical to servers on the other
side of the partition going down
It is important to configure your set in such a way that you’ll usually be able to have one
primary. For example, in the five-member set described above, if members 1, 2, and 3
are in one data center and members 4 and 5 are in another, there should almost always
be a majority available in the first data center (it’s more likely to have a network break
between data centers than within them).

One common setup that usually isn’t what you want is a two-member set: one primary
and one secondary. Suppose one member becomes unavailable: the other member
cannot see it, as shown in Figure 9-3. In this situation, neither side of the network partition
has a majority, so you’ll end up with two secondaries. For this reason, this type of
configuration is not generally recommended.

Figure 9-3. With an even number of members, neither side of a partition has a majority
There are a couple of configurations that are recommended:
• A majority of the set in one data center, as in Figure 9-2. This is a good design if
you have a primary data center where you always want your replica set’s primary
to be located. So long as your primary data center is healthy, you will have a primary.
However, if that data center becomes unavailable, your secondary data center will
not be able to elect a new primary.
• An equal number of servers in each data center, plus a tie-breaking server in a third
location. This is a good design if your data centers are “equal” in preference, since
generally servers from either data center will be able to see a majority of the set.
However, it involves having three separate locations for servers.
More complex requirements might require different configurations, but you should
keep in mind how your set will acquire a majority under adverse conditions.
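One way to reason about a layout is to ask, for each location, whether the rest of the set could still form a majority if that location became unreachable. The sketch below does this in plain JavaScript. It is a simplified model: survivesLossOf and the data center names are hypothetical, and it assumes the remaining locations can all still reach one another.

```javascript
// Strictly more than half of the configured members.
function majorityFor(configuredMembers) {
  return Math.floor(configuredMembers / 2) + 1;
}

// layout maps a (hypothetical) location name to the number of members there.
// Returns, per location, whether the set keeps a majority if that location is lost.
function survivesLossOf(layout) {
  const names = Object.keys(layout);
  const total = names.reduce(function (sum, name) {
    return sum + layout[name];
  }, 0);
  const result = {};
  names.forEach(function (name) {
    result[name] = total - layout[name] >= majorityFor(total);
  });
  return result;
}

// Majority in one data center: losing the main one leaves no majority.
const majorityInOne = survivesLossOf({ mainDC: 3, backupDC: 2 });

// Equal data centers plus a tie-breaker: any single location can be lost.
const withTieBreaker = survivesLossOf({ dcA: 2, dcB: 2, tieBreaker: 1 });

console.log(majorityInOne, withTieBreaker);
```

Under these assumptions the first layout reports false for mainDC and true for backupDC, while the second reports true for every location, matching the trade-offs described in the two bullets above.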
All of these complexities would disappear if MongoDB supported having more than
one primary. However, multimaster would bring its own host of complexities. With two
primaries, you would have to handle conflicting writes (for example, someone updates
a document on one primary and someone deletes it on another primary). There are two
popular ways of handling conflicts in systems that support multiple writers: manual
reconciliation or having the system arbitrarily pick a “winner.” Neither of these options
is a very easy model for developers to code against, seeing that you can’t be sure that the
data you’ve written won’t change out from under you. Thus, MongoDB chose to only
support having a single primary. This makes development easier but can result in
periods when the replica set is read-only.

How Elections Work
When a secondary cannot reach a primary, it will contact all the other members and
request that it be elected primary. These other members do several sanity checks: Can
they reach a primary that the member seeking election cannot? Is the member seeking