2 How we’d normally represent entities outside of Azure


CHAPTER 11

The Table service, a whole different entity

In chapter 2, we chose to keep the example simple by hardcoding the list of products rather than retrieving it from a data store such as the Table service or SQL Server.
The following code is a hardcoded list of the three shirt entities displayed in figure
11.1 (Red Shirt, Blue Shirt, and Blue Frilly Shirt).
var products =
    new List<Product>
    {
        new Product
        {
            Id = 1,
            Name = "Red Shirt",
            Description = "Red"
        },
        new Product
        {
            Id = 2,
            Name = "Blue Shirt",
            Description = "A Blue Shirt"
        },
        new Product
        {
            Id = 3,
            Name = "Blue Frilly Shirt",
            Description = "A Frilly Blue Shirt"
        },
    };

In the preceding code, we simply defined the list of products as a hardcoded list. Obviously this isn’t a very scalable pattern—you don’t want to redeploy the application
every time your shop offers a new product—so let’s look at how you can store that data
using a non-Windows Azure environment, such as SQL Server.

11.2.2 How we’d normally store an entity in SQL Server
To store an entity in SQL Server, you first need to define a table where you can store
the entity data. Figure 11.2 shows how the Products table could be structured in SQL
Server.

Figure 11.2 A representation of how you could store the Hawaiian shirt data in SQL Server

Figure 11.2 shows a table called Products with three columns (ProductId, ProductName,
and Description). In this table, ProductId would be the primary key and would uniquely
identify shirts in the table. Table 11.1 shows how the shirt data would be represented
in SQL Server.


Table 11.1 Logical representation of the Products table in SQL Server

ProductId  ProductName        Description
1          Red Shirt          Red
2          Blue Shirt         A Blue Shirt
3          Blue Frilly Shirt  A Frilly Blue Shirt

In table 11.1 we’ve enforced a fixed schema in our SQL Server representation of the
Hawaiian shirts. If you wanted to store extra information about the product (a thumbnail URI, for example) you’d need to add an extra column to the Products table and a
new property to the Product entity.
Now that we can represent the Hawaiian shirt product as both an entity and as a
table in SQL Server, we’ll need to map the entity to the table.

11.2.3 Mapping an entity to a SQL Server database
Although you can manually map entities to SQL Server data, you’d typically use a data-access layer framework that provides mapping capabilities. Typical frameworks
include the following:
ADO.NET Entity Framework
LINQ’s many varieties, like LINQ to SQL and LINQ to DataSet
NHibernate

The following code maps the Products table returned from SQL Server as a dataset to
the Product entity class using LINQ to DataSet.
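// Requires System.Data, System.Linq, and the System.Data.DataSetExtensions assembly.
var products = ds.Tables["Products"].AsEnumerable().Select
(
    row => new Product
    {
        Id = row.Field<int>("ProductId"),
        Name = row.Field<string>("ProductName"),
        Description = row.Field<string>("Description")
    }
);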

In this example, we convert the dataset to an enumerable list of data rows and then
reshape the data to return a list of Product entities. For each property in the Product
entity (Id, Name, and Description) we map the corresponding columns (ProductId,
ProductName, and Description) from the returned data row.
We’ve now seen how we’d normally define entities in C#, how we’d represent entities in SQL Server, and how we could map the entity layer to the database. Let’s look at
what the differences are when using the Table service.


11.3 Modifying an entity to work with the Table service
Before we look at how we can start coding against the Table service, you need to
understand how your data is stored in the Table service and how that differs from the
SQL-based solution we looked at in the previous sections. In the next couple of sections, we’ll look at the following:
How can we modify an entity so it can be stored in the Table service?
How is an entity stored in the Table service?
As these points suggest, before you can store the shirt data with the Table service, you
need to do a little bit of jiggery pokery with the entity definition. Let’s look at what
you need to do.

11.3.1 Modifying an entity definition
To be able to store the C# entity in the Table service, each entity must have the following properties:
Timestamp
PartitionKey
RowKey

Therefore, to store the Product entity in the Azure Table service, you’d have to modify the previous definition of the Product entity to look something like this:
// DataServiceKey lives in the System.Data.Services.Common namespace.
[DataServiceKey("PartitionKey", "RowKey")]
public class Product
{
    public string Timestamp { get; set; }
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
}

In the preceding code the original Product entity is modified to include those properties required for Table storage (Timestamp, PartitionKey, and RowKey). Don’t worry
if you don’t recognize these properties—we’ll explain what they mean shortly.
To generate a hardcoded list of shirts using the new version of the Product entity,
you’d need to change the hardcoded product list (shown earlier in section 11.2.1) to
something like this:
var products =
    new List<Product>
    {
        new Product
        {
            PartitionKey = "Shirts",
            RowKey = "1",
            Name = "Red Shirt",
            Description = "Red"
        },
        new Product
        {
            PartitionKey = "Shirts",
            RowKey = "2",
            Name = "Blue Shirt",
            Description = "A Blue Shirt"
        },
        new Product
        {
            PartitionKey = "Shirts",
            RowKey = "3",
            Name = "Frilly Blue Shirt",
            Description = "A Frilly Blue Shirt"
        }
    };

As you can see from the preceding code, the only difference is that you’re now setting
a couple of extra properties (PartitionKey and RowKey).

Look, no Timestamp
Notice that the revised object-creation code doesn’t set the Timestamp property.
That’s because it’s generated on the server side and is only available to us as a read-only property. The Timestamp property holds the date and time that the entity was
last modified on the server (initially, the time it was inserted), and if you did set
this property, the Table service would just ignore the value.
The Timestamp property is typically used to handle concurrency. Prior to updating an
entity in the table, you could check that the timestamp for your local version of the
entity was the same as the server version. If the timestamps were different, you’d
know that another process had modified the data since you last retrieved your local
version of the entity.
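
The timestamp comparison described in this sidebar can be sketched roughly as follows. This is an in-memory illustration only: ProductSnapshot, ConcurrencyCheck, and IsStale are hypothetical names, the timestamp is modelled as a DateTime rather than the string used in the entity class, and no call to the Table service is made.

```csharp
using System;

// Hypothetical snapshot of an entity; not part of any Azure library.
public class ProductSnapshot
{
    public DateTime Timestamp { get; set; }   // assigned by the server in the real service
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public string Name { get; set; }
}

public static class ConcurrencyCheck
{
    // Returns true when the server copy carries a different timestamp
    // from the local copy, meaning it changed after the local read.
    public static bool IsStale(ProductSnapshot local, ProductSnapshot server)
    {
        return local.Timestamp != server.Timestamp;
    }
}
```

Before saving an update, you would fetch the current server copy, call IsStale, and only write your changes when it returns false.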

Now that you’ve seen how to modify your entities so that you can store them in the
Table service, let’s take a look at how these entities would be stored in a Table service table.

11.3.2 Table service representation of products
In table 11.1 you saw how we’d normally store our list of Hawaiian shirt product entities in SQL Server, and table 11.2 shows how those same entities would logically be
stored in the Windows Azure Table service.

Table 11.2 Logical representation of the Products table in Windows Azure

Timestamp            PartitionKey  RowKey  PropertyBag
2009-07-01T16:20:32  Shirts        1       Name: Red Shirt; Description: Red
2009-07-01T16:20:33  Shirts        2       Name: Blue Shirt; Description: A Blue Shirt
2009-07-01T16:20:33  Shirts        3       Name: Frilly Blue Shirt; Description: A Frilly Blue Shirt

As you can see in table 11.2, entities are represented in the Table service differently
from how they’d be stored in SQL Server. In the SQL Server version of the Products
table, we maintained a fixed schema where each property of the entity was represented by a column in the table. In table 11.2 the Table service maintains a fairly minimal schema; it doesn’t rigidly fix the schema. The only properties that the Table
service requires, and that are therefore logically represented by their own columns,
are Timestamp, PartitionKey, and RowKey. All other properties are lumped together
in a property bag.
EXTENDING AN ENTITY DEFINITION

Because all tables created in the Table service have the same minimal fixed schema
(Timestamp, PartitionKey, RowKey, and PropertyBag) you don’t need to define the
entity structure to the Table service in advance.
This flexibility means that you can also change the entity class definition at any
time. If you wanted to show a picture of a Hawaiian shirt on the website, you could
change the Product entity to include a thumbnail URI property as follows:
[DataServiceKey("PartitionKey", "RowKey")]
public class Product
{
    public string Timestamp { get; set; }
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
    public string ThumbnailUri { get; set; }
}

Once you’ve modified the entity to include a thumbnail URI, you can store that entity
directly in the existing Products table without modifying either the table structure or
the existing data. Table 11.3 shows a list of shirts that include the new property.

Table 11.3 The modified entity with a new property can happily coexist with older entities that don’t have the new property.

Timestamp            PartitionKey  RowKey  PropertyBag
2009-07-01T16:20:32  Shirts        1       Name: Red Shirt; Description: Red
2009-07-01T16:20:33  Shirts        2       Name: Blue Shirt; Description: A Blue Shirt
2009-07-01T16:20:33  Shirts        3       Name: Frilly Blue Shirt; Description: A Frilly Blue Shirt
2009-07-05T10:30:21  Shirts        4       Name: Frilly Pink Shirt; Description: A Frilly Pink Shirt; ThumbnailUri: frillypinkshirt.png

In the list of shirts in table 11.3, you can see that existing shirts (Red Shirt, Blue Shirt,
and Frilly Blue Shirt) have the same data that was stored in table 11.2—they don’t contain the new ThumbnailUri property. But the data for the new shirt (Frilly Pink Shirt)
does have the new ThumbnailUri property.
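
To make the coexistence of old and new rows concrete, here is a minimal sketch that models the logical row layout from table 11.3 as a key/value property bag. StoredRow, ProductView, and RowMapper are hypothetical names invented for this illustration; the real Table service and its client library do this mapping for you, and this code doesn't call them.

```csharp
using System.Collections.Generic;

// Hypothetical model of one logical row: fixed keys plus a property bag.
public class StoredRow
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public Dictionary<string, string> PropertyBag { get; } = new Dictionary<string, string>();
}

public class ProductView
{
    public string Name { get; set; }
    public string Description { get; set; }
    public string ThumbnailUri { get; set; }   // stays null for rows stored before the property existed
}

public static class RowMapper
{
    public static ProductView ToProduct(StoredRow row)
    {
        // TryGetValue leaves the out variable null when the key is absent,
        // so older rows map cleanly without a ThumbnailUri.
        row.PropertyBag.TryGetValue("Name", out var name);
        row.PropertyBag.TryGetValue("Description", out var description);
        row.PropertyBag.TryGetValue("ThumbnailUri", out var thumbnail);
        return new ProductView { Name = name, Description = description, ThumbnailUri = thumbnail };
    }
}
```

Mapping row 1 from table 11.3 yields a ProductView whose ThumbnailUri is null, whereas row 4 yields one with the thumbnail set; neither the table nor the older rows need to change.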

11.3.3 Storing completely different entities
Due to the flexible nature of the Table service, you could even store entities of different types in the same table. For example, you could store the Product entity in the
same table as a completely different entity, such as this Customer entity:
[DataServiceKey("PartitionKey", "RowKey")]
public class Customer
{
    public string Timestamp { get; set; }
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public string Firstname { get; set; }
    public string Surname { get; set; }
}

As you can see from the Customer entity, although the entity must contain the standard
properties (Timestamp, PartitionKey, and RowKey) no other properties are shared
between the Customer and Product entities; they even have different class names.
Even though these entities have very different definitions, they could be stored in
the table, as shown in table 11.4. The Table service allows for different entities to have
different schemas.

Table 11.4 Storing completely different entities in the same table

Timestamp            PartitionKey  RowKey     PropertyBag
2009-07-01T16:20:32  Shirts        1          Name: Red Shirt; Description: Red
2009-07-01T16:20:33  Shirts        2          Name: Blue Shirt; Description: A Blue Shirt
2009-07-01T16:20:33  Shirts        FredJones  Firstname: Fred; Surname: Jones
2009-07-05T10:30:21  Shirts        4          Name: Frilly Pink Shirt; Description: A Frilly Pink Shirt; ThumbnailUri: frillypinkshirt.png

CHALLENGES OF STORING DIFFERENT ENTITY TYPES

Although the Table service is flexible enough to store entities of different types in the
same table, as shown in table 11.4, you should be very careful if you’re considering
such an approach. If every entity you retrieve has a different schema, you’ll need to
write some custom code that will deserialize the data to the correct object type.
Following this approach will lead to more complex code, which will be difficult to
maintain. This code is likely to be more error prone and difficult to debug. We
encourage you to only store entities of different types in a single table when absolutely
necessary.
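
As a rough idea of what that custom code might look like, the following sketch inspects a row’s properties and decides which type to build. Everything here is hypothetical (MixedTableReader, ProductRecord, CustomerRecord, and the rule of picking a type by which property names are present); it’s one possible approach under those assumptions, not how the StorageClient library works.

```csharp
using System;
using System.Collections.Generic;

public class ProductRecord
{
    public string Name { get; set; }
    public string Description { get; set; }
}

public class CustomerRecord
{
    public string Firstname { get; set; }
    public string Surname { get; set; }
}

public static class MixedTableReader
{
    // Chooses a target type by looking at which properties the row carries.
    public static object Materialize(IDictionary<string, string> bag)
    {
        if (bag.ContainsKey("Name"))
        {
            return new ProductRecord { Name = bag["Name"], Description = Get(bag, "Description") };
        }
        if (bag.ContainsKey("Firstname"))
        {
            return new CustomerRecord { Firstname = bag["Firstname"], Surname = Get(bag, "Surname") };
        }
        throw new InvalidOperationException("Unrecognized entity shape.");
    }

    // Returns the value for key, or null when the property is missing.
    private static string Get(IDictionary<string, string> bag, string key)
    {
        return bag.TryGetValue(key, out var value) ? value : null;
    }
}
```

Every new entity type adds another branch, and every caller has to test the returned object’s type, which is exactly the maintenance burden described above.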
CHALLENGES OF EXTENDING ENTITIES

On a similar note, if you need to modify the definition of existing entities, you should
take care to ensure that your existing entities don’t break your application after the
upgrade.
There are a few rules you should keep in mind to prevent you from running into
too much trouble:
Treat entity definitions as data contracts; breaking the contract will have a serious effect on your application, so don’t do it lightly.
Code any new properties as additional rather than required. This strategy means that existing data will be able to deserialize to the new data structure. If your code requires existing entities to contain data for the new properties, you should migrate your existing data to the new structure.
Continue to support existing property names for existing data. If you need to change a property name, you should either support both the old and new names in your new entity or support two versions of your entity (old definition and new definition). If you only want to support one entity definition, you’ll need to migrate any existing data to the new structure.
Now that you’ve seen how entities are stored within the Table service, let’s look at what
makes this scalable.

11.4 Partitioning data across lots of servers
In the last couple of sections, we’ve skipped past a few topics, namely, accounts, partition keys, and row keys. We’ll now return to these topics and explain how the Windows
Azure Table service is such a scalable storage mechanism.
In this section, we’ll look at how the Table service scales using partitioning at the
storage account and table levels. To achieve a highly scalable service, the Table service
will split your data into more manageable partitions that can then be apportioned out
to multiple servers. As developers, we can control how this data is partitioned to maximize the performance of our applications.
Let’s look at how this is done at the storage account layer.

11.4.1 Partitioning the storage account
In this section, we’ll look at how data is partitioned, but we’ll leave performance optimization to a later section.
In figure 11.1, there were two tables within a storage account (ShoppingCart and
Products). As the Table service isn’t a relational database, there’s no way to join these
two tables on the server side. Because there’s no physical dependency between any two
tables in the Table service, Windows Azure can scale the data storage beyond a single
server and store tables on separate physical servers.

Figure 11.3 Tables within a storage account split across multiple servers

Figure 11.3 shows how these tables could be split across the Windows Azure data center.
In this figure, you’ll notice that the Products table lives on servers 1, 2, and 4, whereas the
ShoppingCart table resides on servers 1, 3, and 4. In the Windows Azure data center,
you have no control over where your tables will be stored. The tables could reside on
the same server (as in the case of servers 1 and 4) but they could easily live on completely separate servers (servers 2 and 3). In most situations, you can assume that your
tables will physically reside on different servers.


Data replication
In order to protect you from data loss, Windows Azure guarantees to replicate your
data to at least three different servers as part of the transaction. This data replication
guarantee means that if there’s a hardware failure after the data has been committed,
another server will have a copy of your data.
Once a transaction is committed (and your data has therefore been replicated at least
three times), the Table service is guaranteed to serve the new data and will never
serve older versions. This means that if you insert a new Hawaiian shirt entity on server
1, you can only be load balanced onto one of the servers that has the latest version
of your data. If server 2 was not part of the replication process and contains stale
data, you won’t be load balanced onto that server. You can safely perform a read of
your data straight after a write, knowing that you’ll receive the latest copy of the data.
The Amazon SimpleDB database (which has roughly the same architecture as the Windows Azure Table service) doesn’t have this replication guarantee by default. Due to
replication latency, it isn’t uncommon in SimpleDB for newly written data not to exist
or to be stale when a read is performed straight after a write. This situation can never
occur with the Windows Azure Table service.

Now that you’ve seen how different tables within a single account will be spread across
multiple servers to achieve scalability, it’s worth looking at how you can partition data
a little more granularly, and split data within a single table across multiple servers.

11.4.2 Partitioning tables
One of the major issues with traditional SQL Server–based databases is that individual
tables can grow too large, slowing down all operations against the table. Although the
Windows Azure Table service is highly efficient, storing too much data in a single table
can still degrade data access performance.
The Table service allows you to specify how your table could be split into smaller
partitions by requiring each entity to contain a partition key. The Table service can
then scale out by storing different partitions of data on separate physical servers. Any
entities with the same partition key must reside together on the same physical server.
In tables 11.2 through to 11.4, all the data was stored in the same partition
(Shirts), meaning that all three shirts would always reside together on the same
server, as shown in figure 11.3. Table 11.5 shows how you could split your data into
multiple partitions.
Table 11.5 Splitting partitions by partition key

Timestamp            PartitionKey  RowKey  PropertyBag
2009-07-01T16:20:32  Red           1       Name: Red Shirt; Description: Red
2009-07-01T16:20:33  Blue          1       Name: Blue Shirt; Description: A Blue Shirt
2009-07-01T16:20:33  Blue          2       Name: Frilly Blue Shirt; Description: A Frilly Blue Shirt
2009-07-05T10:30:21  Red           2       Name: Frilly Pink Shirt; Description: A Frilly Pink Shirt; ThumbnailUri: frillypinkshirt.png

In table 11.5 the Red Shirt and the Frilly Pink Shirt now reside in the Red partition,
and the Blue Shirt and the Frilly Blue Shirt are now stored in the Blue partition. Figure 11.4 shows the shirt data from table 11.5 split across multiple servers. In this figure,
the Red partition data (Red Shirt and Frilly Pink Shirt) lives on server A and the Blue
partition data (Blue Shirt and Frilly Blue Shirt) is stored on server B. Although
the partitions have been separated out to different physical servers, all entities within
the same partition always reside together on the same physical server.

Figure 11.4 Splitting partitions across multiple servers
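
The grouping of entities into partitions can be pictured with a few lines of LINQ. This sketch only groups in-memory objects by their partition key; PartitionedShirt and PartitionGrouping are hypothetical names, and how Windows Azure actually assigns partitions to servers isn’t shown because the text doesn’t describe it.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, cut-down entity carrying only what the grouping needs.
public class PartitionedShirt
{
    public string PartitionKey { get; set; }
    public string Name { get; set; }
}

public static class PartitionGrouping
{
    // Groups entity names by partition key. Each group stands for the
    // set of entities that must stay together on one server.
    public static ILookup<string, string> ByPartition(IEnumerable<PartitionedShirt> entities)
    {
        return entities.ToLookup(e => e.PartitionKey, e => e.Name);
    }
}
```

Feeding in the four rows from table 11.5 produces two groups, Red and Blue, each holding two shirts.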
ROW KEYS

The final property to explain is the row key. The row key uniquely identifies an entity
within a partition, meaning that no two entities in the same partition can have the
same row key, but any two entities that are stored in different partitions can have the
same row key. If you look at the data stored in table 11.5, you can see that the row key is
unique within each partition but not unique outside of the partition. For example,
Red Shirt and Blue Shirt both have the same row key but live in different partitions
(Red and Blue).
The partition key and the row key combine to uniquely identify an entity—together
they form a composite primary key for the table.
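
The composite key rule can be sketched with a dictionary keyed on the pair of values. InMemoryTable and TryInsert are hypothetical names for this illustration, which assumes a compiler with value-tuple support (C# 7 or later); it imitates the uniqueness rule only and isn’t the Table service API.

```csharp
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a table keyed by (PartitionKey, RowKey).
public class InMemoryTable
{
    private readonly Dictionary<(string PartitionKey, string RowKey), string> _rows =
        new Dictionary<(string PartitionKey, string RowKey), string>();

    // Returns false when an entity with the same partition key
    // and row key already exists; otherwise stores it.
    public bool TryInsert(string partitionKey, string rowKey, string name)
    {
        var key = (partitionKey, rowKey);
        if (_rows.ContainsKey(key))
        {
            return false;
        }
        _rows.Add(key, name);
        return true;
    }

    public int Count => _rows.Count;
}
```

Inserting ("Red", "1") and then ("Blue", "1") succeeds because the partitions differ, but a second ("Red", "1") is rejected.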
INDEXES

Now that you have a basic understanding of how data is logically stored within the
data service, it’s worth talking briefly about the indexing of the data.
There are a few rules of thumb regarding data-access speeds:
Retrieving an entity with a unique partition key is the fastest access method.
Retrieving an entity using the partition key and row key is very fast (the Table service needs to use only the index to find your data).
Retrieving an entity using the partition key and no row key is slower (the Table service needs to read all properties for each entity in the partition).
Retrieving an entity using no partition key and no row key is very slow, relatively speaking (the Table service needs to read all properties for all entities across all partitions, which can span separate physical servers).
We’ll explore these points in more detail as we go on.
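
To see which keys each kind of query supplies, here are the three access patterns written as LINQ queries over an in-memory list. ShirtRow and AccessPatterns are hypothetical names, and LINQ to Objects scans the list in every case, so this sketch shows the shape of each query rather than its real cost inside the Table service.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, cut-down entity for the illustration.
public class ShirtRow
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public string Name { get; set; }
}

public static class AccessPatterns
{
    // Partition key and row key: the pattern the service can answer from its index.
    public static ShirtRow PointLookup(IEnumerable<ShirtRow> table, string partitionKey, string rowKey)
    {
        return table.SingleOrDefault(e => e.PartitionKey == partitionKey && e.RowKey == rowKey);
    }

    // Partition key only: every entity in one partition must be examined.
    public static List<ShirtRow> PartitionScan(IEnumerable<ShirtRow> table, string partitionKey, string nameContains)
    {
        return table.Where(e => e.PartitionKey == partitionKey && e.Name.Contains(nameContains)).ToList();
    }

    // No keys at all: every entity in every partition must be examined.
    public static List<ShirtRow> TableScan(IEnumerable<ShirtRow> table, string nameContains)
    {
        return table.Where(e => e.Name.Contains(nameContains)).ToList();
    }
}
```

When you design your partition and row keys, the aim is to make your most frequent queries look like PointLookup rather than TableScan.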

Load balancing of requests
Because data is partitioned and replicated across multiple servers, all requests via
the REST API can be load balanced. This combination of data replication, data partitioning, and a large web server farm provides you with a highly scalable storage solution
that can evenly distribute data and requests across the data center. This level of horsepower and data distribution means that you shouldn’t need to worry about overloading
server resources.

Now that we’ve covered the theory of table storage, it’s time to put it into practice.
Let’s open Visual Studio and start storing some data.

11.5 Developing with the Table service
Now that you have an understanding of how data is stored in the Table service, it’s
time to develop a web application that can use it. In the previous section, we defined
an entity for storing the Hawaiian shirt product, and we looked at how it would be
stored in the Table service. Here you’ll build a new application that will manage the
product inventory for the Hawaiian Shirt Shop website.

11.5.1 Creating a project
Rather than returning to the solution you built in chapter 2, here you’ll develop a new
product-management web page in a new web application project. Create a new Cloud
Service web role project called ShirtManagement. If you need a refresher on how to
set up your development environment or how to create a web role project, refer back
to chapter 2.
Like the other storage services, communication with the Table service occurs
through the REST API (which we’ll discuss in detail in the next chapter). Although you
can use this API directly, you’re likely to be more productive using the StorageClient
library provided in the Windows Azure SDK.
Whenever you create a new Cloud Service project, this library will be automatically
referenced. But if you’re building a brand new class library or migrating an existing
project, you can reference the following storage client assembly manually:
Microsoft.WindowsAzure.StorageClient
In addition, you’ll need to reference the ADO.NET Data Services assemblies:
System.Data.Services
System.Data.Services.Client
