2008-08-16 Sat

09:53 Strategy: Serve Pre-generated Static Files Instead Of Dynamic Pages (1910 Bytes) » High Scalability - Building bigger, faster, more reliable websites.

Pre-generating static files is an oldy but a goody, and as Thomas Brox Røst says, it's probably an underused strategy today. At one time this was the dominate technique for structuring a web site. Then the age of dynamic web sites arrived and we spent all our time worrying how to make the database faster and add more caching to recover the speed we had lost in the transition from static to dynamic.

Static files have the advantage of being very fast to serve. Read from disk and display. Simple and fast. Especially when caching proxies are used. The issue is how do you bulk generate the initial files, how do you serve the files, and how do you keep the changed files up to date? This is the process Thomas covers in his excellent article Serving static files with Django and AWS - going fast on a budget", where he explains how he converted 600K thousand previously dynamic pages to static pages for his site Eventseer.net, a service for tracking academic events.

Eventseer.net was experiencing performance problems as search engines crawled their 600K dynamic pages. As a solution you could imagine scaling up, adding more servers, adding sharding, etc etc, all somewhat complicated approaches. Their solution was to convert the dynamic pages to static pages in order to keep search engines from killing the site. As an added bonus non logged-in users experienced a much faster site and were more likely to sign up for the service.

The article does a good job explaining what they did, so I won't regurgitate it all here, but I will cover the highlights and comment on some additional potential features and alternate implementations...

read more

  2008-08-15 Fri

09:46 Log Buffer #110: A Carnival of the Vanities for DBAs (2882 Bytes) » Pythian Group Blog

Welcome to the 110th edition of Log Buffer, the weekly review of database blogs.

From the MySQL side, Jeremy Zawodny gets things going this week. He writes about his doubts over the long-term performance of InnoDB, specifically the cost of multiversion concurrency control, particularly in a master-slave arrangment or a DW. Jeremy comments, “[The] disk bloat, fragmentation, and ongoing degradation in performance may be an argument for having some slaves that keep the same data in MyISAM tables.” His readers, however, point out some diagnostics and tools to remedy this concern.

Not that MyISAM is without foibles. Case in point, on the MySQL Performance Blog, Peter Zaitsev says, Beware of MyISAM Key Cache mutex contention. “. . . inserting in parallel into different tables when indexes fit in memory results in very bad contention causing hundreds of thousands of context switches per second with far less work done than one would hope.” Happily, as Peter reports, multiple key caches mitigates.

Peter also surveys the choices for picking a datatype for STATUS fields — “. . . ’new’, ‘confirmed’, ‘in production’, ’shipped’ status of job, message etc. People use variety of ways to handle them often without giving enough thought to the choice which can cause problems later.” VARCHAR(N), ENUM, TINYINT — which is right for the job?

Here on the Pythian Group Blog, Nicklas Westerlund demonstrates adding a unique constraint on a table with duplicate data.

Also here at home, Sheeri Cabral’s item on why you want to switch to MySQL 5.1. She argues that additions such as online ALTER TABLE, online table-based logging, and INFORMATION_SCHEMA.PROCESSLIST are compelling features worth an upgrade.

(more…)

07:10 Alex Gorbachev at Oracle Open World 2008: Under the Hood of Oracle Clusterware (1466 Bytes) » Pythian Group Blog

If a MySQL DBA from Pythian goes to Oracle Open World, it would be a shame not to send an Oracle bloke, so there I am — presenting a 90-minute session on the first day of the OOW 08 entitled Under the Hood of Oracle Clusterware.

I gave it during RAC Attack in Chicago and I’m pretty satisfied with how it went, so there should be no significant changes to the presentation. The session is in “User Group Forum,” thanks to RAC SIG and Dan Norris.

When the session was first added to the agenda it was misspelled as “Under the Good of Oracle Clusterware.” That’s hilarious and I thought I should have left it as is. Too late now — it’s been fixed.

I’m pretty sure that many of you will be at the OOW as well, so I’ll be glad to meet you in person. I’m getting back on Twitter slowly, so it might be a good way to track me down in SF. No guarantee I’ll keep it up to the minute if it takes too much effort, but I’ll try.

  2008-08-14 Thu

11:55 Oracle Silent Mode, Part 7: Installing an 11.1 RAC Database (1578 Bytes) » Pythian Group Blog

This seventh post digs into some of the silent installation commands of an 11.1 RAC. For a complete series agenda up to now, see below:

  1. Installation of 10.2 And 11.1 Databases
  2. Patches of 10.2 And 11.1 databases
  3. Cloning Software and databases
  4. Install a 10.2 RAC Database
  5. Add a Node to a 10.2 RAC database
  6. Remove a Node from a 10.2 RAC database
  7. Install a 11.1 RAC Database (this post!)
  8. Add a Node to a 11.1 RAC database
  9. Remove a Node from a 11.1 RAC database
  10. A ton of other stuff you should know

As for the Installation of a 10.2 RAC Database, this post shows how to (1) install the 11.1 clusterware, (2) install the 11.1 database, and (3) create a RAC database. It doesn’t explore any Patch Set upgrade since 11.1.0.7 is not out for now. Another interesting question, however, is how to upgrade the 10.2 clusterware to 11.1, since it has to be done in place.

So let’s get into it.

(more…)

09:11 Product: Terracotta - Open Source Network-Attached Memory (3381 Bytes) » High Scalability - Building bigger, faster, more reliable websites.

Terracotta is Network Attached Memory (NAM) for Java VMs. It provides up to a terabyte of virtual heap for Java applications that spans hundreds of connected JVMs.

NAM is best suited for storing what they call scratch data. Scratch data is defined as object oriented data that is critical to the execution of a series of Java operations inside the JVM, but may not be critical once a business transaction is complete.

The Terracotta Architecture has three components:

  1. Client Nodes - Each client node corresponds to a client node in the cluster which runs on a standard JVM
  2. Server Cluster - java process that provides the clustering intelligence. The current Terracotta implementation operates in an Active/Passive mode
  3. Storage used as
    • Virtual Heap storage - as objects are paged out of the client nodes, into the server, if the server heap fills up, objects are paged onto disk
    • Lock Arbiter - To ensure that there is no possibility of the classic "split-brain" problem, Terracotta relies on the disk infrastructure to provide a lock.
    • Shared Storage - to transmit the object state from the active to passive, objects are persisted to disk, which then shares the state to the passive server(s).

JVM-level clustering can turn single-node, multi-threaded apps into distributed, multi-node apps, often with no code changes. This is possible by plugging in to the Java Memory Model in order to maintain key Java semantics of pass-by-reference, thread coordination and garbage collection across the cluster. Terracotta enables this using only declarative configuration with minimal impact to existing code and provides fine-grained field-level replication which means your objects no longer need to implement Java serialization.

Ari Zilka, the founder and CTO of Terracotta had a
video session
organized by Skills Matter. He will show you how it works and how you can start clustering your POJO-based Web applications (based on Spring, Struts, Wicket, RIFE, EHCache, Quartz, Lucene, DWR, Tomcat, JBoss, Jetty or Geronimo etc.).

2008-08-13 Wed

06:56 Consistent Gets not Necessarily the Best Way to Look at Query Performance (1122 Bytes) » Pythian Group Blog

This post is for those who think Consistent Gets is the only thing that matters. It’s not. That’s why Statspack and AWR provide not only the top queries sorted by Consistent Gets but also Sorted by IO, CPU, Cluster Waits, and so on. I won’t argue. Check for yourself.

I’ve run the queries that follow on top of 10.2.0.3 on Linux X86_64.

Sample Table

Create and Fill up a table to run your queries. You’ll find the script you need below:

create table X1(a number,b number);

begin
   for i in 1..1000000 loop
      insert into X1 values (i,mod(i,100000));
   end loop;
end;
/

commit;

exec dbms_stats.gather_table_stats(user, 'X1');

Case 1: 4164 Consistent Gets for 0.14 seconds

First, let’s assume that a few Consistent Gets means good performance. Look at the following query:

(more…)

  2008-08-12 Tue

21:32 Maatkit Options for Restoring a Slave or Master (1845 Bytes) » Pythian Group Blog

The Maatkit toolkit is a real blessing for the MySQL DBA. And while its documentation is pretty good, in some cases it’s necessary to read carefully a second and third time to make sure you are not missing an important piece of information. In this article I will comment on mk-table-chksum and mk-table-sync. My comments are mostly aimed at those DBAs who are considering using these utilities with medium or larger-sized databases.

–replicate

This option allows you to store the checksum results on the master, in a table that will get replicated to the slaves. Although it might seem like overhead for a simple check, it really simplifies your life, especially when used in combination with mk-table-sync. I always use it, for the following reasons:

  • You only need to run mk-table-chksum on the master.
  • A simple query will tell you the slave status.
  • When used with --chunksize, it divides the checking and synching into manageable portions.
  • It’s the best way to get consistent checks between master and slaves.

I always define the table as an InnoDB table to improve concurrency and avoid deadlocks (see the tool documentation for details). This table can be created on any database. Just make sure that it is going to be replicated. Here’s an example of the table definition (straight from the documentation) with the InnoDB specification at the end:

(more…)

21:11 Beware of MyISAM Key Cache mutex contention (4249 Bytes) » MySQL Performance Blog

Today I was working with the client loading data to MyISAM tables at very high rate. Hundreds of millions rows are loaded daily into single MySQL instance with bursts up to 100K of records/sec which need to be inserted (in the table with few indexes). It was good not all records had to go to the same table and so in theory using multiple thread to do inserts in multiple tables would not be bound by table locks and would be able to use multiple cores efficiently to get good insert rate… or so it seemed.

In reality inserting in parallel into different tables when indexes fit in memory results in very bad contention causing hundreds of thousands of context switches per second with far less work done than one would hope. As we already discussed
MyISAM key cache has serious mutex contention issue as there is global mutex which is held for the time of key_block being copied from key_cache to the thread local space.

Happily MyISAM allows you to create multiple key caches
. We use “keycache per table” this case and caused number of context switches to drop almost tenfold and performance almost doubled.

Another interesting result was - adding PACK_KEYS=0 to most actively updated tables almost doubled throughput again (and I would expect to see it more than double if all tables would be converted with this option. This is of course because some of indexes on the tables were on CHAR/VARCHAR columns. Honestly I did not expect so large difference for Inserts. I have already wrote about very large impact on joins this option has but I did not expect modification of packed keys would be so much more expensive.

After doing such changes we got insert rate to MySQL close to 200K rows/sec using standard multi value inserts which is pretty good number for indexing tables, especially considering application was doing some updates along the way too.


Entry posted by peter | 9 comments

Add to: delicious | digg | reddit | netscape | Google Bookmarks

12:34 Strategy: Limit The New, Not The Old (2906 Bytes) » High Scalability - Building bigger, faster, more reliable websites.

One of the most popular and effective scalability strategies is to impose limits (GAE Quotas, Fotolog, Facebook) as a means of protecting a website against service destroying traffic spikes. Twitter will reportedly limit the number followers to 2,000 in order to thwart follow spam. This may also allow Twitter to make some bank by going freemium and charging for adding more followers.

Agree or disagree with Twitter's strategies, the more interesting aspect for me is how do you introduce new policies into an already established ecosystem?

One approach is the big bang. Introduce all changes at once and let everyone adjust. If users don't like it they can move on. The hope is, however, most users won't be impacted by the changes and that those who are will understand it's all for the greater good of their beloved service. Casualties are assumed, but the damage will probably be minor.

Now in Twitter's case the people with the most followers tend to be opinion leaders who shape much of the blognet echo chamber. Pissing these people off may not be your best bet.

What to do? Shegeeks.net makes a great proposal: Limit The New, Not The Old. The idea is to only impose the limits on new accounts, not the old. Old people are happy and new people understand what they are getting into.

The reason I like this suggestion so much is that it has deep historical roots, all the way back to the fall of the Roman republic and the rise of the empire due to the agrarian reforms laws passed in 133BC. In ancient Rome property and power, as they tend to do, became concentrated in the hands of a few wealthy land owners. Let's call them the nobility. The greatness that was Rome was founded on a agrarian society. People made modest livings on small farms. As power concentrated small farmers were kicked of the land and forced to move to the city. Slaves worked the land while citizens remained unemployed. And cities were no place to make a life. Civil strife broke out. Pliny said that "it was the large estates which destroyed Italy."

read more

05:27 Distributed Computing & Google Infrastructure » High Scalability - Building bigger, faster, more reliable websites.

2008-08-11 Mon

11:03 Alert Log Aliases for 11g » I'm just a simple DBA on a complex production system
04:27 10g RAC - Single Node Install Error » AskDba.org Weblog

2008-08-10 Sun

22:11 Geek City: System Objects » Kalen Delaney
07:23 Why You Want to Switch to MySQL 5.1 » Pythian Group Blog

  2008-08-09 Sat

23:30 Picking datatype for STATUS fields » MySQL Performance Blog
19:59 Raw Devices to be deprecated in 12G » AskDba.org Weblog
10:42 37signals Architecture » High Scalability - Building bigger, faster, more reliable websites.
 123
 123