2008-08-18 Mon

13:48 mysqlbinlog Tips and Tricks (1545 Bytes) » Pythian Group Blog

So, you have a binlog. You want to find out something specific that happened inside of it. What to do? mysqlbinlog has some neat features, which I thought we would look at here.

I should first explain what mysqlbinlog really is. It is a tool that lets you analyze and view the binlogs/relaylogs from mysql, which are stored in binary format. This tool converts them to plaintext, so that they’re human-readable.

For the first tip, let’s start with the --read-from-remote-server option, which allows you to examine a binlog on a master server in order, perhaps, to dump it onto your slave and compare master/slave logs for potential problems*.

$ mysqlbinlog --read-from-remote-server -uwesterlund -p mysql-bin.000001 -h 127.0.0.1 -P 3306 | head -5
Enter password:
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#080815 19:25:23 server id 101  end_log_pos 107 	Start: binlog v 4, server v 6.0.5-alpha-log created 080815 19:25:23 at startup

Pretty useful!

Now, let’s assume we have a binlog that is 94 lines long*:

(more…)

08:15 The ultimate tool for generating optimal my.cnf files for MySQL (4691 Bytes) » MySQL Performance Blog

There are quite a few “tuning primers” and “my.cnf generators” and “sample my.cnf files” online. The ultimate tool for generating an optimal my.cnf is not a tool. It’s a human with many years of experience, deep knowledge of MySQL and the full application stack, and familiarity with your application and your data.

I don’t know exactly the percentage, but quite a few of the servers I take a look at have been “optimized” with some tuning primer or question-and-answer script that spits out “optimal” parameters for my.cnf.

Most of the time these servers are far from optimal. Sometimes the my.cnf parameters are extremely wrong, to the point of causing a severe performance penalty.

If it were as easy as writing a tool to do this, don’t you think Maatkit would have mk-optimal-mycnf already? In my opinion — as someone who knows very well the complexity of creating a good my.cnf — it’s practically impossible. Much harder than syncing data, or manipulating a replication hierarchy, or any of the other things Maatkit can do already. And I doubt I’ll ever even feel motivated to try creating such a tool.

Don’t bother with scripts. Don’t waste your time with most of the advice you see on the web in forums — much of it is fundamentally wrong, even when it seems to come from an informed source. Don’t put too much faith in the my.cnf samples that come with your operating system; many of them have very bad advice in the comments, such as instructing you on how to set up replication in ways that guarantee breakage.

If you want solid advice, ask someone who knows what they’re doing (and can prove it). Or buy our book.

But even more fundamentally, you should not focus so much on my.cnf. It is not the be-all and end-all of performance. Tuning your server settings has far less impact on performance than tuning your schema, indexing, queries and — you guessed it — thinking deeply about your application architecture. Server settings are a distraction and a waste of time for most people.

Most my.cnf files I see only need minor tweaks, which give only so-so performance improvements. Tuning my.cnf only helps a lot when my.cnf has extremely bad parameters. The kind you’ll get from tuning primers and automated my.cnf optimization scripts.


Entry posted by Baron Schwartz | 7 comments

Add to: delicious | digg | reddit | netscape | Google Bookmarks

04:39 Wuala - P2P Online Storage Cloud (2153 Bytes) » High Scalability - Building bigger, faster, more reliable websites.

How do you design a reliable distributed file system when the expected availability of the individual nodes are only ~1/5? That is the case for P2P systems. Dominik Grolimund, the founder of a Swiss startup Caleido will show you how! They have launched Wuala, the social online storage service which scales as new nodes join the P2P network.

The goal of Wua.la is to provide distributed online storage that is:

  • large
  • scalable
  • reliable
  • secure

by harnessing the idle resources of participating computers.

This challenge is an old dream of computer science. In fact as Andrew Tanenbaum wrote in 1995:
"The design of a world-wide, fully transparent distributed filesystem fot simultaneous use by millions of mobile and frequently disconnected users is left as an exercise for the reader"

After three years of research and development at at ETH Zurich, the Swiss Federal Institute of Technology on a distributed storage system, Caleido is ready to unveil the result: Wuala. Wuala is a new way of storing, sharing, and publishing files on the internet. It enables its users to trade parts of their local storage for online storage and it allows us to provide a better service for free. In this Google Tech Talk, Dominik will explain what Wuala is and how it works, and he will also show a demo.

read more

2008-08-17 Sun

12:02 Strategy: Drop Memcached, Add More MySQL Servers (3538 Bytes) » High Scalability - Building bigger, faster, more reliable websites.

Update: Dormando asks Should you use memcached? Should you just shard mysql more?. The idea of caching is the most important part of caching as it transports you beyond a simple CRUD worldview. Plan for caching and sharding by properly abstracting data access methods. Brace for change. Be ready to shard, be ready to cache. React and change to what you push out which is actually popular, vs over planning and wasting valuable time.

Feedster's François Schiettecatte wonders if Fotolog's 21 memcached servers wouldn't be better used to further shard data by adding more MySQL servers? He mentions Feedster was able to drop memcached once they partitioned their data across more servers. The algorithm: partition until all data resides in memory and then you may not need an additional memcached layer.

Parvesh Garg goes a step further and asks why people think they should be using MySQL at all?

Related Articles


  • The Death of Read Replication by Brian Aker. Caching layers have replaced read replication. Cache can't fix a broken database layer. Partition the data that feeds the cache tier: "Keep your front end working through the cache. Keep all of your data generation behind it."
  • Read replication with MySQL by François Schiettecatte. Read replication is dead and it should be used only for backup purposes. Take the memory used for caching and give it to your database servers.
  • Replication++, Replication 2.0, Replication.Next by Ronald Bradford. What should read replication be used for?
  • Replication, caching, and partitioning by Greg Linden. Caching overdone because it adds complexity, latency on a cache miss, and inefficiently uses cluster resources. Hitting disk is the problem. Shard more and get your data in memory.
  •   2008-08-16 Sat

    09:53 Strategy: Serve Pre-generated Static Files Instead Of Dynamic Pages (1910 Bytes) » High Scalability - Building bigger, faster, more reliable websites.

    Pre-generating static files is an oldy but a goody, and as Thomas Brox Røst says, it's probably an underused strategy today. At one time this was the dominate technique for structuring a web site. Then the age of dynamic web sites arrived and we spent all our time worrying how to make the database faster and add more caching to recover the speed we had lost in the transition from static to dynamic.

    Static files have the advantage of being very fast to serve. Read from disk and display. Simple and fast. Especially when caching proxies are used. The issue is how do you bulk generate the initial files, how do you serve the files, and how do you keep the changed files up to date? This is the process Thomas covers in his excellent article Serving static files with Django and AWS - going fast on a budget", where he explains how he converted 600K thousand previously dynamic pages to static pages for his site Eventseer.net, a service for tracking academic events.

    Eventseer.net was experiencing performance problems as search engines crawled their 600K dynamic pages. As a solution you could imagine scaling up, adding more servers, adding sharding, etc etc, all somewhat complicated approaches. Their solution was to convert the dynamic pages to static pages in order to keep search engines from killing the site. As an added bonus non logged-in users experienced a much faster site and were more likely to sign up for the service.

    The article does a good job explaining what they did, so I won't regurgitate it all here, but I will cover the highlights and comment on some additional potential features and alternate implementations...

    read more

      2008-08-15 Fri

    09:46 Log Buffer #110: A Carnival of the Vanities for DBAs (2882 Bytes) » Pythian Group Blog

    Welcome to the 110th edition of Log Buffer, the weekly review of database blogs.

    From the MySQL side, Jeremy Zawodny gets things going this week. He writes about his doubts over the long-term performance of InnoDB, specifically the cost of multiversion concurrency control, particularly in a master-slave arrangment or a DW. Jeremy comments, “[The] disk bloat, fragmentation, and ongoing degradation in performance may be an argument for having some slaves that keep the same data in MyISAM tables.” His readers, however, point out some diagnostics and tools to remedy this concern.

    Not that MyISAM is without foibles. Case in point, on the MySQL Performance Blog, Peter Zaitsev says, Beware of MyISAM Key Cache mutex contention. “. . . inserting in parallel into different tables when indexes fit in memory results in very bad contention causing hundreds of thousands of context switches per second with far less work done than one would hope.” Happily, as Peter reports, multiple key caches mitigates.

    Peter also surveys the choices for picking a datatype for STATUS fields — “. . . ’new’, ‘confirmed’, ‘in production’, ’shipped’ status of job, message etc. People use variety of ways to handle them often without giving enough thought to the choice which can cause problems later.” VARCHAR(N), ENUM, TINYINT — which is right for the job?

    Here on the Pythian Group Blog, Nicklas Westerlund demonstrates adding a unique constraint on a table with duplicate data.

    Also here at home, Sheeri Cabral’s item on why you want to switch to MySQL 5.1. She argues that additions such as online ALTER TABLE, online table-based logging, and INFORMATION_SCHEMA.PROCESSLIST are compelling features worth an upgrade.

    (more…)

    07:10 Alex Gorbachev at Oracle Open World 2008: Under the Hood of Oracle Clusterware (1466 Bytes) » Pythian Group Blog

    If a MySQL DBA from Pythian goes to Oracle Open World, it would be a shame not to send an Oracle bloke, so there I am — presenting a 90-minute session on the first day of the OOW 08 entitled Under the Hood of Oracle Clusterware.

    I gave it during RAC Attack in Chicago and I’m pretty satisfied with how it went, so there should be no significant changes to the presentation. The session is in “User Group Forum,” thanks to RAC SIG and Dan Norris.

    When the session was first added to the agenda it was misspelled as “Under the Good of Oracle Clusterware.” That’s hilarious and I thought I should have left it as is. Too late now — it’s been fixed.

    I’m pretty sure that many of you will be at the OOW as well, so I’ll be glad to meet you in person. I’m getting back on Twitter slowly, so it might be a good way to track me down in SF. No guarantee I’ll keep it up to the minute if it takes too much effort, but I’ll try.

      2008-08-14 Thu

    11:55 Oracle Silent Mode, Part 7: Installing an 11.1 RAC Database (1578 Bytes) » Pythian Group Blog

    This seventh post digs into some of the silent installation commands of an 11.1 RAC. For a complete series agenda up to now, see below:

    1. Installation of 10.2 And 11.1 Databases
    2. Patches of 10.2 And 11.1 databases
    3. Cloning Software and databases
    4. Install a 10.2 RAC Database
    5. Add a Node to a 10.2 RAC database
    6. Remove a Node from a 10.2 RAC database
    7. Install a 11.1 RAC Database (this post!)
    8. Add a Node to a 11.1 RAC database
    9. Remove a Node from a 11.1 RAC database
    10. A ton of other stuff you should know

    As for the Installation of a 10.2 RAC Database, this post shows how to (1) install the 11.1 clusterware, (2) install the 11.1 database, and (3) create a RAC database. It doesn’t explore any Patch Set upgrade since 11.1.0.7 is not out for now. Another interesting question, however, is how to upgrade the 10.2 clusterware to 11.1, since it has to be done in place.

    So let’s get into it.

    (more…)

    09:11 Product: Terracotta - Open Source Network-Attached Memory (3381 Bytes) » High Scalability - Building bigger, faster, more reliable websites.

    Terracotta is Network Attached Memory (NAM) for Java VMs. It provides up to a terabyte of virtual heap for Java applications that spans hundreds of connected JVMs.

    NAM is best suited for storing what they call scratch data. Scratch data is defined as object oriented data that is critical to the execution of a series of Java operations inside the JVM, but may not be critical once a business transaction is complete.

    The Terracotta Architecture has three components:

    1. Client Nodes - Each client node corresponds to a client node in the cluster which runs on a standard JVM
    2. Server Cluster - java process that provides the clustering intelligence. The current Terracotta implementation operates in an Active/Passive mode
    3. Storage used as
      • Virtual Heap storage - as objects are paged out of the client nodes, into the server, if the server heap fills up, objects are paged onto disk
      • Lock Arbiter - To ensure that there is no possibility of the classic "split-brain" problem, Terracotta relies on the disk infrastructure to provide a lock.
      • Shared Storage - to transmit the object state from the active to passive, objects are persisted to disk, which then shares the state to the passive server(s).

    JVM-level clustering can turn single-node, multi-threaded apps into distributed, multi-node apps, often with no code changes. This is possible by plugging in to the Java Memory Model in order to maintain key Java semantics of pass-by-reference, thread coordination and garbage collection across the cluster. Terracotta enables this using only declarative configuration with minimal impact to existing code and provides fine-grained field-level replication which means your objects no longer need to implement Java serialization.

    Ari Zilka, the founder and CTO of Terracotta had a
    video session
    organized by Skills Matter. He will show you how it works and how you can start clustering your POJO-based Web applications (based on Spring, Struts, Wicket, RIFE, EHCache, Quartz, Lucene, DWR, Tomcat, JBoss, Jetty or Geronimo etc.).

    2008-08-13 Wed

      2008-08-12 Tue

    21:11 Beware of MyISAM Key Cache mutex contention » MySQL Performance Blog
    12:34 Strategy: Limit The New, Not The Old » High Scalability - Building bigger, faster, more reliable websites.
    05:27 Distributed Computing & Google Infrastructure » High Scalability - Building bigger, faster, more reliable websites.

    2008-08-11 Mon

     123
     123