2008-08-24 Sun
Looks interesting...
Abstract:
Today’s data centers may contain tens of thousands of computers with significant aggregate bandwidth requirements. The network architecture typically consists of a tree of routing and switching elements with progressively more specialized and expensive equipment moving up the network hierarchy. Unfortunately, even when deploying the highest-end IP switches/routers, resulting topologies may only support 50% of the aggregate bandwidth available at the edge of the network, while still incurring tremendous cost. Nonuniform bandwidth among data center nodes complicates application design and limits overall system performance.
In this paper, we show how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements. Similar to how clusters of commodity computers have largely replaced more specialized SMPs and MPPs, we argue that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today’s higher-end solutions. Our approach requires no modifications to the end host network interface, operating system, or applications; critically, it is fully backward compatible with Ethernet, IP, and TCP.
2008-08-23 Sat
Sorry, I'm not actually going to tell you how to read the log. I'm just going to talk about it... and whether it's a good thing to be able to do or not, or whether it's an absolutely crucial feature that MS needs to provide for us immediately, in a hotfix, if not sooner. Forget about fixing bugs, I want to read the log because I forgot to set up a trace beforehand....
You may have noticed that my blogging frequency has fallen off. One or two of you also noticed that I am no longer writing a regular article each month for SQL Server Magazine. Those two facts are related. I have cut back on non-essential activities to try to get my next book out as soon as possible. It looks like I might even finish in time to get the book out on the shelves by early next year. Stay tuned...
Since I couldn't bear to not do anything for SQL Server Magazine, I started writing the commentary in the weekly email newsletter. Actually, I do it every week but the fourth week of the month. If you like, you can sign up for this free newsletter here.
My commentary last Thursday seemed to have rattled some cages. Before I even woke up Thursday morning, there were already two comments on the site, and someone sent me a personal email about what I wrote. By now, there are quite a few more comments. I basically wrote about the need for a log reader tool. It wasn't deeply technical; it's just a commentary after all. You can read it here:
http://www.sqlmag.com/Articles/Index.cfm?ArticleID=100076
But boy, did people get upset. They called me bad names... well, if 'mediocre' can be considered a bad name...
So I responded as follows:
Wow... I have never gotten so many comments so quickly about one of my articles. I must really have touched a nerve here!
There is a difference between the actual data rows referenced by the log, and the log format. It's the log format, and giving people full details about what is in the log, that is propriatary information. There is nothing specifically bad about giving people that information. However, calling me names because I don't stand up on a soapbox and DEMAND that MS add this functionality seems a little extreme. There are plenty of other things MS could do with the product and providing a log reader tool is way down on the list.
Yes I realize it is important to some people, but there are many other ways to get this information through tracing etc. If the developer resources are limited at MS, I would much prefer they spend their time on more important stuff. MS knows it's important that people have this information, that's why they added a great deal of additional tracing capabilities in SQL Server 2008.
Also, keep in mind that a log reader tool wouldn't help you debug problems with logic, or with bad reports due to faulty SELECTs. If your WHERE clause was written badly, a log reader tool could tell you which rows were affected, but not WHY. You'd need a tracing tool for that. Vogelm's comment that a log reader tool would help troubleshoot bad queries from 3rd party apps is not true; you need to see the statements for that, not just the affected data.
I do appreciate kbreneman's comment that the real problem is one of perception. MS should make clear that the transaction log is not an audit tool; if you want auditing, you need to set it up on your own, because you're the only one who knows what's important for you to capture.
(The only way to respond to comments is to write a comment of my own, and then the form insists that I rate the article I am responding to. I always feel a bit weird having to rate my own articles.)
Since I wrote the article, I have found out that Lumigent does have a log reader tool for SQL Server 2005, but I have heard less than stellar reviews about its ability to capture some of the more interesting datatype activities that are now possible in SQL Server 2005. And their website still doesn't list any version numbers.
I can't stop thinking about this, so I thought I would open up the issue to a wider audience.
How important do you think it is that Microsoft provide a log reader tool for us?
Thanks!
~Kalen
Yesterday I helped someone who was seeing a lot of "server has gone away" error messages on his website. While investigating this problem, I noticed several things amiss, which appeared to be related but really weren't. The biggest measurable sign was
-
[percona@server ~]$ mysqladmin ext | grep Abort
-
| Aborted_clients | 14835 |
-
| Aborted_connects | 15598 |
These two status variables are actually unrelated (see the manual page that explains them). The first was related to the errors the client was seeing: the server was closing inactive connections after a while, and I fixed it by increasing the wait_timeout configuration variable.
The second error does not indicate that an active connection is closed at all. Rather, it shows that a connection cannot be made for some reason. Perhaps it's networking, or perhaps there's an issue with permissions or something else. The first thing I did was look for packet loss between the database server and the web server; the network appeared to be working fine.
With that ruled out (at least, to my satisfaction) I turned to tcpdump to see what was happening with these connections. I ran the following command in one window of my screen session, so I could see when a connection was aborted:
-
[percona@server ~]$ mysqladmin ext | grep Abort | grep -v 0
And then I started tcpdump in another window:
-
[percona@server ~]$ tcpdump -s 1500 -w tcp.out port 3306
After I saw an aborted connection, I cancelled tcpdump and looked at the resulting file. Inspecting the session with tcpdump -r showed that there was a complete TCP session; nothing bad was happening at that layer. So I used the strings utility to look at the text sent in the packets:
-
[percona@server ~]$ strings tcpdump.out
-
Host 'XXX.XX.XX.XXX' is not allowed to connect to this MySQL server
I've anonymized the offending IP address. However, I checked the server's grant tables and indeed. that IP address (which is a machine in the local network) is not allowed to connect.
I don't actually use tcpdump much, but this was a fun little exercise that I thought I'd share with you.
Entry posted by Baron Schwartz | 4 comments
2008-08-22 Fri
We've discussed today how we should implement MySQL Version advisory in mk-audit tool. One obvious questions was to look at the end of life - it is often bad idea to run MySQL versions past end of life as even security bugs may not be fixed in these (though do not get paranoid, if you're running MySQL in isolated environment the risk may be low).
So how does EOL schedule looks ?
MySQL defines Active Lifecycle and Extended Lifecycle for release where first one is 2 years since initial GA release and second is further 3 years of life in "critical bug fixes only" mode with releases available for premium (Silver+) Support offerings.
For MySQL Community users this means only releases within Active Life Cycle will be made. For example MySQL 4.1 had end of its Active Lifecycle in the end of 2006. and indeed Latest MySQL 4.1 available for the public is 4.1.22 while as Manual Says there were number of further releases with last one in March 2008 containing fixes for security and critical bugs.
It is also worth to note even though MySQL 5.0 successor (MySQL 5.1) is still not released as GA, MySQL 5.0 Active LifeCycle will end in end of 2008, unless there are changes means. If same policies as of MySQL 4.1 are followed we'll soon see stop in MySQL community releases of MySQL 5.0 most likely before MySQL 5.1 will proven MySQL 5.0 replacement.
There is no blame on MySQL - it is no fun to support these old versions both for Support team (remembering these all old versions limitations) and for development team, and it costs, so somebody has to pay for this and this is exactly what premium MySQL Support levels are for.
My main point is - make sure you understand MySQL Release Policy and so what to expect whenever you're MySQL customer or community user.
Shameless Plug: I guess hundreds of Percona customers are reading this blog so I should say how Percona treats old versions. We obviously recommend to upgrade when it makes sense while at the same time we have no restrictions in terms of supported versions. If customer chooses to run older version he may have more problems and these may take more time to deal with, so the bill would be higher. We are also happy to provide builds based on updated trees and backport fixes from the newer releases if MySQL has chosen not to backport bug because of its severity. We believe in freedom of choice.
Entry posted by peter | 6 comments
After my previous post there were questions raised about Index Merge on Multiple Indexes vs Two Column Index efficiency. I mentioned in most cases when query can use both of the ways using multiple column index would be faster but I also went ahead to do some benchmarks today.
I'm using couple of simple tables:
-
CREATE TABLE `t1000idxmerge` (
-
`i` int(11) NOT NULL,
-
`j` int(11) NOT NULL,
-
`val` char(10) NOT NULL,
-
KEY `i` (`i`),
-
KEY `j` (`j`)
-
) ENGINE=MyISAM DEFAULT CHARSET=latin1
-
-
CREATE TABLE `t1000idx2` (
-
`i` int(11) NOT NULL,
-
`j` int(11) NOT NULL,
-
`val` char(10) NOT NULL,
-
KEY `i` (`i`,`j`)
-
) ENGINE=MyISAM DEFAULT CHARSET=latin1
I have populated this table with random data for i and j having both of them having 1000 of distinct values, independent on each other. I also created couple of other tables with same data just with really low cardinality with i and j having just 3 values each. The table contained about 18M rows though was small enough to fit in the systems memory.
I've benchmarked simple queries using where clause which covers multiple columns:
-
Q1 SELECT sum(length(val)) FROM T WHERE i=2 AND j=1
-
Q2 SELECT sum(length(val)) FROM T WHERE i=2 AND j BETWEEN 1 AND 2
-
Q3 SELECT sum(length(val)) FROM T WHERE j=2 AND i BETWEEN 1 AND 2
-
Q4 SELECT sum(length(val)) FROM T WHERE i=2 OR j=1
-
Q5 SELECT sum(length(val)) FROM T WHERE j=2 AND i BETWEEN 100 AND 200
As some of them there way too fast if run once I ran them multiple times and measured time appropriately. To remove the overhead of starting MySQL etc from equation I also measured execution of "SELECT 1" query using same script and subtracted this time from result in the table.
time for ((i=0;i<100;i+=1)); do mysql test -e "SELECT sum(length(val)) FROM t1000idxmerge WHERE i=2 AND j=1"; done > /dev/null
In the result table I compute per query results and present results in milliseconds.
| Query | 1000 - 2 indexes | 1000 - 2 columns | 3 - 2 indexes | 3 - 2 columns |
| Q1 | 20 | 0.2 | 6940 | 2530 |
| Q2 | 25 | 0.3 | 7400 | 7500 |
| Q3 | 25 | 0.3 | 7200 | 3830 |
| Q4 | 70 | 3800 | 4700 | 4700 |
| Q5 | 25 | 2980 | - | - |
Note1: Q1 will not use Index Merge technique for low cardinality table but instead pick to do single index scan. I'm not aware of the optimizer hint which would allow to force index merge as you can do with index accesses in general.
Note2 Q2/Q3 can't use Index Merge however as it is currently implemented so they would use single index range scan. The 2 indexes however benefits to Q3 because it can only use first keypart of index (j,i) to resolve BETWEEN part of the clause which may not be very selective.
Note3 You may be surprised why 2 column index is faster for Q3 in case of low cardinality even though MySQL can't use index well. You're right MySQL can't and MySQL does not - Full table scan is performed and in this case turns to be faster than scanning 1/5th of the table using index. Also Full Table Scan is preferred for Q4 in all cases but in case of high cardinality multiple index configuration.
Note4 Q5 was just run on high cardinality tables to show what difference large BETWEEN can make.
Conclusion: For benchmarked queries we can see Multiple Column index beats Index Merge in all cases when such index can be used. It is also worth to watchout a MySQL may decide not to do Index merge (either intersection or union) but instead do full table scan or access table picking only one index on the pair.
Entry posted by peter | 4 comments
Latency matters. Amazon found every 100ms of latency cost them 1% in sales. Google found an extra .5 seconds in search page generation time dropped traffic by 20%. A broker could lose $4 million in revenues per millisecond if their electronic trading platform is 5 milliseconds behind the competition.
The Amazon results were reported by Greg Linden in his presentation Make Data Useful. In one of Greg's slides Google VP Marissa Mayer, in reference to the Google results, is quoted as saying "Users really respond to speed." And everyone wants responsive users. Ka-ching! People hate waiting and they're repulsed by seemingly small delays.
The less interactive a site becomes the more likely users are to click away and do something else. Latency is the mother of interactivity. Though it's possible through various UI techniques to make pages subjectively feel faster, slow sites generally lead to higher customer defection rates, which lead to lower conversation rates, which results in lower sales. Yet for some reason latency isn't a topic talked a lot about for web apps. We talk a lot about about building high-capacity sites, but very little about how to build low-latency sites. We apparently do so at the expense of our immortal bottom line.
I wondered if latency went to zero if sales would be infinite? But alas, as Dan Pritchett says, Latency Exists, Cope!. So we can't hide the "latency problem" by appointing a Latency Czar to conduct a nice little war on latency. Instead, we need to learn how to minimize and manage latency. It turns out a lot of problems are better solved that way.
How do we recover that which is most meaningful--sales--and build low-latency systems?
Crisis has struck! This week’s Log Buffer editor had to beg off at the eleventh hour when his time vanished. It happens. But, in every crisis, an opportunity (well . . . maybe, maybe not). The opportunity — an open discussion of this week’s best database blog articles. Readers in control.
Log Buffer is always looking for editors, so if you’d like to step forward and publish one on your own blog, read the Log Buffer guidelines and send me an email.
I’m going to go through my bookmarks and add my own presently. I hope to hear from you!
2008-08-21 Thu
Quite common beginners mistake is not to understand how indexing works and so index all columns used in the queries.... separately. So you end up with table which has say 20 indexes but all single column ones. This can be spotted with a glance view. If you have queries with multiple column restrictions in WHERE clause you most likely will need to have multiple column indexes for optimal performance. But wait. Do not go ahead and index all combinations. This would likely be poor choice too ![]()
Entry posted by peter | 24 comments
Anybody who has tried this multi-node to single-node cloning in 11i knows that it’s difficult and very error-prone. If we outline the Apps Tier cloning process, it looks like this (supposing we have a two-node instance with the DB and CM on one node, and Web and Forms on the other):
# CM Node # copy all files to the target $ perl adpreclone.pl appsTier merge # Web Node # copy $COMMON_TOP/clone/appl directory only to the target $ perl adpreclone.pl appltop merge # Target Node $ perl adcfgclone.pl appsTier
This process is called merging appltops. It’s not as easy as it looks. Many times, the production environment might not have proper values in the fnd_nodes table, which makes adcfgclone.pl fail to properly recognize the appltops for merging.
But in R12, life is made easy. (more…)
So you need to design highly available MySQL powered system... how do you approach that ?
Too often I see the question is approached by focusing on expensive hardware which in theory should be reliable. And this really can work quite well for small systems. It is my experience - with quality commodity hardware (Dell,HP,IBM etc) you would see box failing once per couple of years of uptime which is enough to maintain level of availability needed by many small systems. In fact they typically would have order of magnitude more availability issues caused by their own software bugs, DOS attacks and other issues.
However as your system growths the reliability goes down. If you have 100 servers with each failing every 2 years this is about a server a week which is bad and if you're into thousands and tens of thousands of servers server failures are becoming common place so it is important to make sure failing server does not affect your system and also what you can recover from server failure easily
So you should assume every component in the system can fail (if it is Server,Switch,Router,Cable, SAN) etc and you're ready to deal with this. It does not mean you always have to ensure you stay fully operational after any failure but at least you should understand the risks. For example you may want to choose to keep single Cisco router because it has its own internal high availability on the component level which makes it extremely unlikely to fail, because you have 4 hour onsite repair agreement and because it is just freaking expensive. Though may be redundant less expensive systems could be better choice.
I would highlight again every component can fail it does not matter how redundant it is inside. The SAN is very good example - I've seen Firmware glitches causing failure in the SAN which was fully redundant on the component level. It is not every hardware component but also any code may fail as well. This is actually what makes your own code often the weakest link in availability.
Depending on failure rate you also should be thinking about automation - for frequent failures you want to recovery (like getting spare Web server and putting it online) to be automatic or done with simple manual command. For complex and rare failures you may have less automation - if certain type of failure happens once per couple of years for many evolving systems there is very high chance the old automation tools may not work well (this is of course unless you always test all automated failure scenarios regularly).
So if we're designing the system so it can tolerate hardware failures should we bother about hardware quality at all ? The answer is yes in particular for classic database/storage systems. Few systems are design with so much error detection and automated handling in mind as Google File System.
In particular you want to make sure Error Detection is on the good level. For example if you're running the system without ECC memory chances are your data will melt down and you will not notice it for long time (in particular if you're using MyISAM tables) which can cause the error to propagate further in the system and make recovery much more complicated than simply swapping the server. This is exactly one of the reasons many high scale installations prefer Innodb - it is paranoid and this is how you want your data storage to be. This is also why Sun is so proud about checksums on the file system level in ZFS.
What is about RAID when ? As strange as it may sound but you should not relay on RAID for your data safety. There are many ways to loose data on RAID system even if you're running RAID6 with couple of hot spare. The RAID is just dramatically reduces chance of data loss in case of hard drive failure and this is good because recovering database servers is not fully automated in most cases. Plus there may be system performance impact and (in particular if you use MySQL Replication for HA) the switch to the new server may not be 100% clean with few updates lost. RAID, especially with BBU also makes a good sense to get extra performance out of the box.
Some installations are using RAID0 for slaves - in these cases there are typically many slaves and recovery of the slave is very easy and causes no business impact. This is fine assuming you do the math and the performance gains or cost savings are worth it.
Another good RAID question is if Hot Spare should be used. I normally do not use it because it a large waste, especially as most of systems have even number of drives, so if you're looking for RAID10 setting up hot spare costs you 2 drives. Having hot spare does not add a lot to high availability - if you have proper RAID monitoring in place and keep couple of spare hard drives on the shelf in the data center we're speaking about couple of extra hours running in degraded mode. Even if you do not have spare hard drive you can often pool the one from the spare server and have the "warranty man" to replace it instead.
It is also a good question if you need redundant power supplies. In my experience they rarely fail so having redundant power supplies does not increase availability when it comes to hardware failures that much and so if you just look from this angle it may be justified only for the most critical servers. Do not forget redundant power supplies also increase server power usage a bit. Redundant power supplies however are helpful if you have multiple power feeds, so server can stay up if one of the phases has a power loss. Another benefit is - in redundant power supply will often allow to do some power work (like moving server to different circuit) without downtime which may be or may not be something important for you.
Finally I should mention about spare component. These are paramount if you're designing highly available system. Having spare drives on the shelf, spare switches, spare servers (which are same as better as servers which are in production) is paramount. It is important promotion happens easily and there are no performance gotchas (ie 8 core server can be slower than 4 core with MySQL). It is best if you just put couple of spare servers in each purchase batch so they are absolutely same configuration but I know it is not always possible. Dealing with spares is yet another reasons to avoid the "server zoo" and have limited set of purchased configurations which are reviewed yearly (on other regular interval) rather than finding different best configuration each week.
Having spare servers also means you often do not need most expensive support agreements and Business Hours Monday-Friday is good enough for you - you're not waiting for support for production anyway just fall back to another server and use it. Of course you can imagine cases when problem could affect all servers of the same type but it is not that frequently seen in practice.
To avoid multiple servers failing at the same time it is of course important to QA/Stress test servers before you put the load on them. I've seen multiple cases when something would go wrong and all servers of same configuration will experience the same problem. Proper QA/Stress test reduces the chance of this but you better to be testing with load similar to what you expect in production.
Requirement to have Spare hardware is also the reason why commodity inexpensive hardware is often better choice. If you have couple of $1M in production you need another $1M server as a spare and this is expensive. If instead you have 10 pairs $10K boxes having couple of spares would cost you only $20K plus I found it in many cases much easier to convince "finance" people to buy something cheap which is not used most of the time when to spend a lot of money on the server which will be where sitting doing idle.
How many spare servers you need - you would see it in practice. As I mentioned at least one for any hardware class you have. If you have many failures you need more of course. You may also decide to keep more spare systems when you can use them to help capacity management, especially if you have multiple applications which do not share hardware but share the data center. You may have "spares" to provide extra on demand capacity for web servers or memcached quite easily, or say increase number of slaves if you have unexpectedly high number of reports launched by users etc.
Entry posted by peter | 14 comments




