2008-08-14 Thu

11:55 Oracle Silent Mode, Part 7: Installing an 11.1 RAC Database (1578 Bytes) » Pythian Group Blog

This seventh post digs into some of the silent installation commands of an 11.1 RAC. For a complete series agenda up to now, see below:

  1. Installation of 10.2 And 11.1 Databases
  2. Patches of 10.2 And 11.1 databases
  3. Cloning Software and databases
  4. Install a 10.2 RAC Database
  5. Add a Node to a 10.2 RAC database
  6. Remove a Node from a 10.2 RAC database
  7. Install a 11.1 RAC Database (this post!)
  8. Add a Node to a 11.1 RAC database
  9. Remove a Node from a 11.1 RAC database
  10. A ton of other stuff you should know

As for the Installation of a 10.2 RAC Database, this post shows how to (1) install the 11.1 clusterware, (2) install the 11.1 database, and (3) create a RAC database. It doesn’t explore any Patch Set upgrade since 11.1.0.7 is not out for now. Another interesting question, however, is how to upgrade the 10.2 clusterware to 11.1, since it has to be done in place.

So let’s get into it.

(more…)

09:11 Product: Terracotta - Open Source Network-Attached Memory (3381 Bytes) » High Scalability - Building bigger, faster, more reliable websites.

Terracotta is Network Attached Memory (NAM) for Java VMs. It provides up to a terabyte of virtual heap for Java applications that spans hundreds of connected JVMs.

NAM is best suited for storing what they call scratch data. Scratch data is defined as object oriented data that is critical to the execution of a series of Java operations inside the JVM, but may not be critical once a business transaction is complete.

The Terracotta Architecture has three components:

  1. Client Nodes - Each client node corresponds to a client node in the cluster which runs on a standard JVM
  2. Server Cluster - java process that provides the clustering intelligence. The current Terracotta implementation operates in an Active/Passive mode
  3. Storage used as
    • Virtual Heap storage - as objects are paged out of the client nodes, into the server, if the server heap fills up, objects are paged onto disk
    • Lock Arbiter - To ensure that there is no possibility of the classic "split-brain" problem, Terracotta relies on the disk infrastructure to provide a lock.
    • Shared Storage - to transmit the object state from the active to passive, objects are persisted to disk, which then shares the state to the passive server(s).

JVM-level clustering can turn single-node, multi-threaded apps into distributed, multi-node apps, often with no code changes. This is possible by plugging in to the Java Memory Model in order to maintain key Java semantics of pass-by-reference, thread coordination and garbage collection across the cluster. Terracotta enables this using only declarative configuration with minimal impact to existing code and provides fine-grained field-level replication which means your objects no longer need to implement Java serialization.

Ari Zilka, the founder and CTO of Terracotta had a
video session
organized by Skills Matter. He will show you how it works and how you can start clustering your POJO-based Web applications (based on Spring, Struts, Wicket, RIFE, EHCache, Quartz, Lucene, DWR, Tomcat, JBoss, Jetty or Geronimo etc.).

2008-08-13 Wed

06:56 Consistent Gets not Necessarily the Best Way to Look at Query Performance (1122 Bytes) » Pythian Group Blog

This post is for those who think Consistent Gets is the only thing that matters. It’s not. That’s why Statspack and AWR provide not only the top queries sorted by Consistent Gets but also Sorted by IO, CPU, Cluster Waits, and so on. I won’t argue. Check for yourself.

I’ve run the queries that follow on top of 10.2.0.3 on Linux X86_64.

Sample Table

Create and Fill up a table to run your queries. You’ll find the script you need below:

create table X1(a number,b number);

begin
   for i in 1..1000000 loop
      insert into X1 values (i,mod(i,100000));
   end loop;
end;
/

commit;

exec dbms_stats.gather_table_stats(user, 'X1');

Case 1: 4164 Consistent Gets for 0.14 seconds

First, let’s assume that a few Consistent Gets means good performance. Look at the following query:

(more…)

  2008-08-12 Tue

21:32 Maatkit Options for Restoring a Slave or Master (1845 Bytes) » Pythian Group Blog

The Maatkit toolkit is a real blessing for the MySQL DBA. And while its documentation is pretty good, in some cases it’s necessary to read carefully a second and third time to make sure you are not missing an important piece of information. In this article I will comment on mk-table-chksum and mk-table-sync. My comments are mostly aimed at those DBAs who are considering using these utilities with medium or larger-sized databases.

–replicate

This option allows you to store the checksum results on the master, in a table that will get replicated to the slaves. Although it might seem like overhead for a simple check, it really simplifies your life, especially when used in combination with mk-table-sync. I always use it, for the following reasons:

  • You only need to run mk-table-chksum on the master.
  • A simple query will tell you the slave status.
  • When used with --chunksize, it divides the checking and synching into manageable portions.
  • It’s the best way to get consistent checks between master and slaves.

I always define the table as an InnoDB table to improve concurrency and avoid deadlocks (see the tool documentation for details). This table can be created on any database. Just make sure that it is going to be replicated. Here’s an example of the table definition (straight from the documentation) with the InnoDB specification at the end:

(more…)

21:11 Beware of MyISAM Key Cache mutex contention (4249 Bytes) » MySQL Performance Blog

Today I was working with the client loading data to MyISAM tables at very high rate. Hundreds of millions rows are loaded daily into single MySQL instance with bursts up to 100K of records/sec which need to be inserted (in the table with few indexes). It was good not all records had to go to the same table and so in theory using multiple thread to do inserts in multiple tables would not be bound by table locks and would be able to use multiple cores efficiently to get good insert rate… or so it seemed.

In reality inserting in parallel into different tables when indexes fit in memory results in very bad contention causing hundreds of thousands of context switches per second with far less work done than one would hope. As we already discussed
MyISAM key cache has serious mutex contention issue as there is global mutex which is held for the time of key_block being copied from key_cache to the thread local space.

Happily MyISAM allows you to create multiple key caches
. We use “keycache per table” this case and caused number of context switches to drop almost tenfold and performance almost doubled.

Another interesting result was - adding PACK_KEYS=0 to most actively updated tables almost doubled throughput again (and I would expect to see it more than double if all tables would be converted with this option. This is of course because some of indexes on the tables were on CHAR/VARCHAR columns. Honestly I did not expect so large difference for Inserts. I have already wrote about very large impact on joins this option has but I did not expect modification of packed keys would be so much more expensive.

After doing such changes we got insert rate to MySQL close to 200K rows/sec using standard multi value inserts which is pretty good number for indexing tables, especially considering application was doing some updates along the way too.


Entry posted by peter | 8 comments

Add to: delicious | digg | reddit | netscape | Google Bookmarks

12:34 Strategy: Limit The New, Not The Old (2906 Bytes) » High Scalability - Building bigger, faster, more reliable websites.

One of the most popular and effective scalability strategies is to impose limits (GAE Quotas, Fotolog, Facebook) as a means of protecting a website against service destroying traffic spikes. Twitter will reportedly limit the number followers to 2,000 in order to thwart follow spam. This may also allow Twitter to make some bank by going freemium and charging for adding more followers.

Agree or disagree with Twitter's strategies, the more interesting aspect for me is how do you introduce new policies into an already established ecosystem?

One approach is the big bang. Introduce all changes at once and let everyone adjust. If users don't like it they can move on. The hope is, however, most users won't be impacted by the changes and that those who are will understand it's all for the greater good of their beloved service. Casualties are assumed, but the damage will probably be minor.

Now in Twitter's case the people with the most followers tend to be opinion leaders who shape much of the blognet echo chamber. Pissing these people off may not be your best bet.

What to do? Shegeeks.net makes a great proposal: Limit The New, Not The Old. The idea is to only impose the limits on new accounts, not the old. Old people are happy and new people understand what they are getting into.

The reason I like this suggestion so much is that it has deep historical roots, all the way back to the fall of the Roman republic and the rise of the empire due to the agrarian reforms laws passed in 133BC. In ancient Rome property and power, as they tend to do, became concentrated in the hands of a few wealthy land owners. Let's call them the nobility. The greatness that was Rome was founded on a agrarian society. People made modest livings on small farms. As power concentrated small farmers were kicked of the land and forced to move to the city. Slaves worked the land while citizens remained unemployed. And cities were no place to make a life. Civil strife broke out. Pliny said that "it was the large estates which destroyed Italy."

read more

05:27 Distributed Computing & Google Infrastructure (561 Bytes) » High Scalability - Building bigger, faster, more reliable websites.

A couple of videos about distributed computing with direct reference on Google infrastructure.
You will get acquainted with:

--MapReduce the software framework implemented by Google to support parallel computations over large (greater than 100 terabyte) data sets on commodity hardware
--GFS and the way it stores it's data into 64mb chunks
--Bigtable which is the simple implementation of a non-relational database at Google

read more

2008-08-11 Mon

12:33 Wonderfull World of MySQL Storage Engines slides are now published (2783 Bytes) » MySQL Performance Blog

I should be faster with publishing slides but things are how they are. The slides from my OSCON2008 talk are now published at Percona Presentation Pages.
Enjoy :)


Entry posted by peter | 4 comments

Add to: delicious | digg | reddit | netscape | Google Bookmarks

11:35 How to Minimize Downtime When Moving to a New SQL Server Environment (1733 Bytes) » Pythian Group Blog

I am often asked what ways are there to minimize downtime when upgrading from SQL Server 2000 to SQL Server 2005, or when moving databases to a new (probably bigger and stronger) environment.

Well, if you can afford having both–old environment and new environment in parallel–this task can be very easy and straightforward . . .

Options

First, let’s see what are the options to copy a database from one server to the other.

  1. Detaching and re-attaching the database.
  2. Using backup and restore.
  3. Using the database copy wizard.
  4. Manually (creating schemas + transferring data), probably by using DTS/SSIS packages.
  5. And here is another cute method: use the snapshot creation and delivery part of the replication.

I am not fond of the third method — the wizard never works very well for me. The fourth method? — nah, I am too lazy. Attaching and detaching database files? — the original database goes offline until the files are moved to the new location; it’s not what we want.

If the database is small to medium size, the replication snapshot can be used. If the database is larger, backing and restoring the databases might be the best solution. In these methods, the original database can stay on line while the secondary database is being built.

If you have large databases

What are the tasks that usually take the most time when moving large databases using backup and restore?

(more…)

11:03 Alert Log Aliases for 11g » I'm just a simple DBA on a complex production system
04:27 10g RAC - Single Node Install Error » AskDba.org Weblog

2008-08-10 Sun

22:11 Geek City: System Objects » Kalen Delaney
07:23 Why You Want to Switch to MySQL 5.1 » Pythian Group Blog

  2008-08-09 Sat

23:30 Picking datatype for STATUS fields » MySQL Performance Blog
19:59 Raw Devices to be deprecated in 12G » AskDba.org Weblog
10:42 37signals Architecture » High Scalability - Building bigger, faster, more reliable websites.
09:50 Scaling Traffic: People Pod Pool of On Demand Self Driving Robotic Cars who Automatically Refuel from Cheap Solar » High Scalability - Building bigger, faster, more reliable websites.
08:27 Strategy: Drop Memcached, Add More MySQL Servers » High Scalability - Building bigger, faster, more reliable websites.

  2008-08-08 Fri

20:22 New MySQL Community release - Great job MySQL ! » MySQL Performance Blog

  2008-08-07 Thu

 123
 123