30 July 2006
Posted by Chris at 10:19 PM
I attended the LiveJournal session on the last day of OSCON. I've read previous slide decks from them on various scaling issues and their tools, etc. Also, since I've been thinking about memcached (like everyone else I guess), and MogileFS, it made sense.
The presentation was mostly an overview of the main open source tools they provide: Perlbal, MogileFS, and memcached (see code.sixapart.com page for info). Perlbal is their very configurable load balancer; MogileFS is a file store, not a file system; and memcached is a caching system. They did not get into setup details on any of them. They did cover some general aspects of LiveJournal's setup (and it sounds like VOX is similar), and a few points of interest, some relatively common ideas, some more detailed and useful ones (more on this below).
MogileFS is quite interesting to me. It is very similar to Amazon's S3 in general use, in that you store things essentially via keys, not via normal file system directories and files. So, what's cool about this is that you can have a somewhat infinitely sized system where you don't worry about limitations on directory contents, file names, disk space, and so on. Further, MogileFS allows for a "class" notion, which you configure to mean how many redundant copies you want. The typical example is that for photo storage, you'd have an "originals" class where you might have 3-4 copies, and then a "thumbnails" class where you had maybe one or two (since you can regenerate these). Also, there are namespaces, so you can partition data that way. You must use a client library with MogileFS, but they are available in many languages.
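To make the key, class, and namespace ideas concrete, here is a minimal in-memory sketch. It is not the MogileFS client API; the `ToyKeyStore` class, the host names, and the copy counts are all made up for illustration, and it only shows how a "class" could map to a number of redundant copies.

```python
import random


class ToyKeyStore:
    """In-memory stand-in for a key-addressed store with replication classes.

    Hypothetical illustration only; this is not how MogileFS is implemented.
    """

    def __init__(self, hosts, classes):
        self.hosts = list(hosts)        # made-up storage host names
        self.classes = dict(classes)    # class name -> number of copies to keep
        self.blobs = {host: {} for host in self.hosts}

    def store(self, namespace, key, data, cls):
        copies = self.classes[cls]
        if copies > len(self.hosts):
            raise ValueError("not enough hosts for class %r" % cls)
        # Drop any older copies so the replica count stays exact.
        for blobs in self.blobs.values():
            blobs.pop((namespace, key), None)
        targets = random.sample(self.hosts, copies)
        for host in targets:
            self.blobs[host][(namespace, key)] = data
        return targets

    def fetch(self, namespace, key):
        # Any host holding a copy can serve the read.
        for host in self.hosts:
            if (namespace, key) in self.blobs[host]:
                return self.blobs[host][(namespace, key)]
        raise KeyError((namespace, key))

    def replica_count(self, namespace, key):
        return sum((namespace, key) in blobs for blobs in self.blobs.values())


store = ToyKeyStore(
    hosts=["host-a", "host-b", "host-c", "host-d"],
    classes={"originals": 3, "thumbnails": 1},
)
store.store("photos", "user42/img1", b"raw bytes", cls="originals")
store.store("photos", "user42/img1_thumb", b"small bytes", cls="thumbnails")
```

Note there are no directories anywhere: the namespace plus key is the whole address, and the class alone decides how many copies exist.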
As a general recommendation with their tools, they noted that you would likely start out with your normal system of some web servers and DB server(s). Then, you'd add in memcached, next Perlbal in front, and finally add MogileFS (which essentially requires at least two machines, if not three or more depending on how you configure the three pieces involved for MogileFS).
They also noted that they use Gearman for job queues where the jobs don't matter, and either they're done in 10 seconds or they're gone. They then use TheSchwartz for queues of durable tasks that you care about. You can send in multiple copies of the same task, and they get properly coalesced into only being done once.
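The coalescing idea can be sketched in a few lines. This is not the API of either tool named above; `CoalescingQueue` and the task keys are hypothetical, and the sketch only shows duplicate submissions of the same task collapsing into one run.

```python
from collections import OrderedDict


class CoalescingQueue:
    """Toy queue that runs each distinct task key once, however often it is submitted."""

    def __init__(self):
        self._pending = OrderedDict()   # task key -> payload, in arrival order

    def submit(self, key, payload):
        """Queue a task; return False if the same key is already pending."""
        if key in self._pending:
            return False
        self._pending[key] = payload
        return True

    def run_all(self, worker):
        """Run every pending task once, oldest first, and collect the results."""
        results = []
        while self._pending:
            key, payload = self._pending.popitem(last=False)
            results.append(worker(key, payload))
        return results


queue = CoalescingQueue()
queue.submit("resize:photo-17", {"width": 100})
queue.submit("resize:photo-17", {"width": 100})   # duplicate, coalesced away
queue.submit("resize:photo-18", {"width": 100})
done = queue.run_all(lambda key, payload: key)
```

A real durable queue would also persist the pending tasks; this sketch keeps them in memory only.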
Some of the interesting points I noted were:
- LiveJournal started on a shared hosting setup, then grew to dedicated, and so on until where they are today. They are still in a single data center, although looking to either go to multiple, or move (their data center is in SF, and they would like to be in a less environmentally disastrous area). They used to have a data center in Japan, but it wasn't yielding advantages. This was info I gathered when I talked to them after the session.
- They use lots of cheap SATA drives, and they found that, at least today, the 250GB disks are better than the 500GB disks, because they are less susceptible to heat, and can be used to capacity, whereas the 500GB disks could only be used up to about 350GB.
- Further, they do not use RAID (5 anyway). They do use RAID 0 or RAID 10 say for database systems, but not for the file systems. With MogileFS, they don't need it, and it doesn't handle all problems, like power failures and so on. Also, fsck doesn't run in parallel, so it can take a long time.
- Continuing this, use lots of small machines, so you spray the disk writes across them, as opposed to big honkin' systems.
- Make sure MySQL is not a SPOF (single point of failure)! Single big MySQL boxes are not good, better to have smaller ones that are faster and cheaper and where MySQL can perform better (better/faster IO).
- They think they'll get rid of Apache within a year or so (Perlbal is enough). They have BigIP boxes in front of the Perlbal instances for simple load balancing, and these are nice boxes, but the BigIPs don't know how busy the Apaches truly are.
- MySQL is the only thing in their system that blocks.
- They're moving to dual, dual-core machines in 1U boxes to cut down on heat. 4GB of RAM per box.
- I didn't get the exact config of MySQL, but they said they have about 25 boxes running it, with varying configs for different things.
- They don't explicitly talk to the Odeo guys, but do see them regularly. I had asked about this because Odeo is doing various Ruby versions of some of their tools (memcached and MogileFS I believe).
Note, for an older, seemingly slightly out of date slide deck, check out their presentation from last year's OSCON. This year's presentation I have yet to find online.
Posted by Chris at 9:47 PM
29 July 2006
Posted by Chris at 7:52 AM
24 July 2006
do-while constructs were different than a plain while's. This is somewhat stunning in that this is the same across nearly all languages, and if they were the same, why would they have them both (oh wait, maybe Perl people would want those multiple ways ;-)
At lunch, or rather lunchtime (no lunch provided at conference), I talked to some folks about OSS in general and their interests (and mine). Two of them were into Asterisk, with one having set up a home Asterisk server so he could tie into his company's phone system. We also briefly talked about sshfs, which is something I'll have to look into, sounds interesting.
After lunch I headed to the Powell's mini-bookstore and lucked out in that Lucas Carlson was there signing his newly released Ruby Cookbook. So, I picked that up, and one other book.
The afternoon session was a little better. I attended the Rails Guidebook session with Mike Clark and Dave Thomas. These guys are great speakers. Since I've been working with Rails for a bit, I knew roughly the first 75%. But, I hadn't yet gotten into RJS, so that was cool, even if brief. There were a few other tidbits that weren't in the slides, but were good, such as on deployment. Specifically, Dave mentioned that, paraphrasing, most of the people he knew that were running heavy traffic sites were using file based sessions on NFS volumes, in order to do distributed session management, as opposed to memcached. He mentioned that memcache[d] was a bit of a pain to set up. Also, the current "best" (my take, based on their comments, and various bloggers) deployment setup is Apache 2.2 with mod_proxy_balancer and Mongrel. I've mentioned this, with a few good links, in a recent post.
At the end of the day, I took the MAX train over to downtown PDX to meet coworker Mike Potter for dinner. We had great conversation and tasty Italian food at Pazzo Ristorante. So, the day improved as it went along, and that's a good trend. I took the picture above from my hotel room this evening. See you tomorrow...
Posted by Chris at 11:50 PM
23 July 2006
Posted by Chris at 9:39 PM
I've arrived! w00t! (can't I say that?) Anyway, flew in, flight was only an hour late--listened to some podcasts and relaxed. As hot as it may be here in Portland, it's not as hot as at home (113 today). Took the MAX train from the airport to the hotel, super easy. Checked in, then went and did the registration thing at OSCON. It's about a 10 minute walk from the DoubleTree hotel here.
After that, I came back to the room and wanted to do a video chat with my kids (4 year old daughter and 18mos old son). But iChat was screwing up for some reason; and it didn't give me any useful errors! Normally at hotels I may run into their network being too slow, but iChat will tell you that. Nope, here it was just something obtuse so I sent it to Apple a half dozen times. A real bummer though. I'll have to try from OSCON tomorrow, or maybe Urban Grind (there's one a few blocks away). Apparently this is where all the kewl kids hang out, although probably at the one in the Pearl. So far in my short view radius I've only seen Starbucks :( I'm now hitting the Delocator though.
Just ordered room service with a local brew. Hopefully will go out tomorrow night. Now time for a little code before dinner...
Posted by Chris at 7:49 PM
22 July 2006
Posted by Chris at 4:23 PM
20 July 2006
I'm getting very interested in server virtualization systems, such as Xen. And, more to the point, using virtual server hosting options, as opposed to either a regular shared host setup, or a fully dedicated machine/colo. One reason is cost as related to the configuration of your setup (and this assumes you do not need to own the entire machine due to your application's needs).
For example, maybe I have a budget of around $500/month to get a web app going. Well, that'd buy me a couple dedicated servers (at a say mid-priced place) with probably minimal to "no" support. Or, it'd buy me somewhere between 8-10 virtual servers on a virtual server system. The virtual servers are far more RAM constrained (this seems to be the big dividing line). For example, those 8-10 server instances would each have around 192MB of RAM. This appears to be decent for running a basic Rails stack for example.
But getting back to it, the reason I find this appealing, at least for early deployment, is that it would allow me to mimic a more fully scaled solution simply due to having more "servers." Instead of having say one web app server and one DB server with the full colo setup, you could have say a half dozen web app servers, a cluster of DB servers, and then maybe another box that you dump backups to, or dump logs to, or whatever. What it really lets you do though is set up your fully scaled architecture, something like this, this, or this.
Posted by Chris at 11:24 AM
19 July 2006
Posted by Chris at 4:59 PM
18 July 2006
We're still working on the problem of synchronizing data online/offline, but the storage issue looms large. Will users be comfortable having all of their information stored online? I think so, but RIA developers need to take pains to ensure that the data is secure and trust is not misplaced.
The above topic in general, i.e. whether or not to keep a copy of your data, or synchronize your data online, is becoming a bigger and bigger question and issue. This is something I'm very interested in, and doing some work on myself.
I don't believe it's a simple issue of all your data online or not. I do however want ubiquitous access to my data, but most solutions so far come up very short. Mainly this has to do with security and privacy. I already use del.icio.us, Flickr, Backpack, Basecamp, Gmail, and various other tools that store some of my data online, and/or hold a copy of some of that data. But, I most surely do not keep my Quicken files online, or various other documents and information that I consider particularly sensitive or needing explicit security.
But, I really want this. I want all my data available on any computer I use. I'm starting to look at things like S3 to create my own solutions. One notion is to store everything on the server encrypted. This works, but becomes a significant overhead when you start storing large amounts of data, or large files like media. Maybe it is an option per file or per resource location.
I think it will be very interesting to see what Google and folks do in regard to this aspect. Sure, GDisk would be great and all, but only for some things, at least until there's some level of security. And believe me, just having some code of conduct, terms of service, and employee guidelines for those working at the data centers, doesn't mean there can't be abuse - you need to make it essentially impossible for folks to see your data (when you care about security/privacy of said data).
Posted by Chris at 2:15 PM
The industrialized world is on a collision course with nature, says environmental hero and leading expert Al Gore, who passionately urges a Stanford Business School audience to take action to save the environment. In his presentation on global warming Gore presents with alarming clarity, conclusiveness, and humor, that the fact of global warming is not in question and that its consequences for the world we live in will be disastrous if left unchecked.
Posted by Chris at 10:18 AM
17 July 2006
Posted by Chris at 8:38 PM
15 July 2006
Something I want to do for our development environment is to put Rails, and all our Gems, as well as other tools we use, all under version control. The way we do this now is to freeze Rails, and then put all the other stuff in a directory in our code repository and only install from there. I was wanting to freeze gems as well, but the way I understand this, it wouldn't work because it freezes what you have installed, which means it is platform dependent (assuming you have some gems that use native code). I need to support development, currently primarily on MacOS X, but also on Windows, and then various UNIX flavors such as Linux, BSD, and Solaris.
I'm curious what other folks are doing to ensure their versions of everything are correct for any point in their source code control? I also want to do this with non-Ruby technologies as well, so it's a generic problem. Things like MySQL, DarwinPorts, and so on. The script I mentioned I want to write is aimed at handling much of this, but it'd be nice to have the actual "installed" variants in source control so you simply had to sync/update to latest and have everything except a few major bits (MySQL for example) guaranteed to be correct.
Posted by Chris at 4:21 PM
14 July 2006
Posted by Chris at 9:58 AM
13 July 2006
I've just set up Explorer Destroyer on my blog. Their instructions are good, but if you use Blogger to host your blog, and you have the little Blogger bar across the top of your page, you'll need to make one tweak to get the Explorer Destroyer box to show up completely. I've only done this for (and it probably only applies to) the Level 1 version. In the Level 1 script code, look for the following line (currently it's line 150 in the HTML they supply):
<div style="padding: 20px; background-color: #ffffbb; font-family: arial; font-size: 15px; font-weight: normal; color: #111111; line-height: 17px;">
Add a top margin to account for the Blogger bar, thus changing the line to:
<div style="margin-top: 30px; padding: 20px; background-color: #ffffbb; font-family: arial; font-size: 15px; font-weight: normal; color: #111111; line-height: 17px;">
Posted by Chris at 12:17 PM
11 July 2006
I'm very excited. Beginning in August, after returning from OSCON, I will be changing jobs. I'll still be at Adobe, but I'm moving from the Photoshop team, to a new team (that is just me (development wise) to start), and back to doing web apps and web services. I'm further psyched because I'll be doing Ruby on Rails (taking a project I was working on in my spare time and making it a "real" project), Flex, and other cool stuff. I will do some minor continued work on Photoshop, but really be concentrated on this new gig.
I've been working on Rails stuff in my spare time for a while, but now I'll be joining the ranks of those getting paid to do it. Plus while I have utmost respect for the Photoshop team, and really like the folks there (there is some truly amazing talent, although that's true throughout Adobe), I'll be back to my real strength (and love) which is web and network related apps (and maybe next year when CS3 ships, I can reveal what new stuff I worked on in Photoshop - some rather interesting bits indeed!).
Posted by Chris at 9:45 PM
Blogged with Flock
Posted by Chris at 9:12 PM
So far I'm digging Flock. I'm using it as my primary browser on all my machines now (4 Macs and a Windows XP machine). I've run into two issues, one minor, the other, I'm not sure:
- The favorites/bookmark bar does not appear to support folders, like it does in Firefox, etc. This is a fairly major bummer to me. I don't use a Favorites menu or anything beyond the bookmark bar. I use bookmarks solely for the things I use on a regular basis, and thus this holds what I need--assuming I have folders.
- On Windows, Flock doesn't seem to be able to import all your Firefox settings. It only seems to see/allow/support IE import, which is pointless for me (I do use IE occasionally, when I have to, and actually in that case I use Maxthon instead, but it's rare).
Also, I need to look at bookmark synchronization more. I was under the impression that Flock did this, but it's unclear, if you are using full del.icio.us with it, whether it does anything beyond that.
Blogged with Flock
Posted by Chris at 10:36 AM
10 July 2006
I'm trying out the Flock browser, which is Firefox based. So far it's pretty darn cool. I'm not using every feature (I prefer NNW to Flock's RSS, but that isn't their intent to replace anyway). But, the Flickr integration is cool, and this post is testing out the Blogger integration. The del.icio.us integration rocks. Also, the UI is very nice, and I dig how they've already integrated some of the nicer extension features. I've of course already installed other extensions like Firebug, Web Developer, Gmail, Session Manager, and so on. Anyway, check it out, pretty interesting.
Also, their search box is great - it does multiple searches while you are typing, and shows them in a drop down. So, you can simultaneously search Google, Yahoo, Amazon, Wink, or whatever. It's very nice, and highly effective. I should note that some of these things may be existing Firefox extensions, but Flock simply integrates them really well, and avoids me having to manage that many more extensions.
Blogged with Flock
Posted by Chris at 9:53 PM
09 July 2006
I'm about to embark on creating a script/system for keeping a project's Rails and related environment up to date, as well as doing an initial setup. Gems and various Rails pieces already can do some auto-update and setup things, but I'm wanting something more professional, and something more explicit and exact in terms of versions used.
The idea is that this system will take a bare bones machine from essentially just having an OS, to full development or production ready. And then also, once set up, it will update it. So, it will install Ruby, Rails, Gems, database of choice, libraries needed (e.g. ImageMagick), other components used (e.g. DarwinPorts), etc. In combination with the source code control system (SCCM) (such as Perforce or Subversion), it will keep each piece up to date.
In addition, it will do things like install gems and other pieces from what you store in your SCCM - not just from the net or by doing say a generic "gem update". The reason to do this is that you can track the exact versions used at any given point in your application's development or deployment. You won't have to somehow separately document that you were using XYZ versions when you deployed 2.1.3 of your app, etc. I've used this approach, and have heard of others who took it further in the past by also putting even the OS, development tools, and so on within the source control system (I've never done the OS part, but I believe at Siemens they used to do that back in MacOS <= 9 days). This script also helps ensure all developers and deployment systems are in sync. At the moment I'm planning to do this with a bootstrap bash shell script, but then once Ruby is available, it'd kick over to a Rake script. Partly I'm choosing this route as a way to learn more about Rake. I'd be interested in hearing if anyone else has such a system in place, or if there's already something out there for this (that works well with Ruby and Rails).
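As a rough sketch of the "exact versions" check such a script might perform, here it is in Python rather than the bash/Rake combination planned above. The manifest format, the `check_pins` helper, and the version numbers are all made up for illustration; a real script would read the pins from a file kept in source control and ask the system what is actually installed.

```python
def check_pins(pinned, installed):
    """Compare pinned versions against installed ones.

    Returns a list of (name, wanted, found) tuples for every component that
    is missing (found is None) or installed at a different version.
    """
    problems = []
    for name, want in sorted(pinned.items()):
        have = installed.get(name)
        if have != want:
            problems.append((name, want, have))
    return problems


# Hypothetical manifest as it might be checked in alongside the app's code.
pinned = {"rails": "1.1.4", "rake": "0.7.1", "rmagick": "1.13.0"}

# Hypothetical snapshot of what one developer machine has installed.
installed = {"rails": "1.1.4", "rake": "0.7.0"}

problems = check_pins(pinned, installed)
for name, want, have in problems:
    print("%s: want %s, found %s" % (name, want, have or "nothing"))
```

The comparison is deliberately an exact string match: the point is to reproduce the versions used at a given point in source control, not to accept anything newer.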
Update: I forgot one other thing that plays in here. Our firewall rules at work do not allow rsync outside the firewall. This prevents things like DarwinPorts from working. The solution, silly as it is, is to take a laptop home, connect to a non-VPN'ed network, and do your DarwinPorts work. Then, you come back, and the ports themselves are cached, so you can then add them to your SCCM and install from there for other machines (who wants to lug a super heavy G5 home, ya, not me).
Posted by Chris at 3:12 PM
08 July 2006
I've been reading Agile Project Management with Scrum by Ken Schwaber. One of the things that struck me right off was in Mary Poppendieck's foreword, where she provides an excellent analogy: that of driving a car. She describes how driving is made of a set of simple rules, and is a process by which we make small course corrections, frequent decisions, and so on, along the path we take. Yes, we have an end goal in mind, but how we get there is a process which the team (driver) determines along the way. This is unlike say a train route, where it's completely programmed and extremely well known before the trip even starts. A simple analogy, but probably the best fitting, strongest, and easiest to grasp that I've seen yet, in terms of Agile processes.
Posted by Chris at 2:22 PM
I think this is very cool, as I'm into energy conservation and such. Adobe's West Tower building (the first of the three that are connected together at the San Jose HQ), just received the highest rating for energy and environmental design:
The Adobe tower is the world's first commercial office building to earn this highest recognition possible for energy and environmental design excellence under the USGBC's permanent LEED Existing Building (LEED-EB) standard.
The other two buildings that it connects to should get this rating shortly as well. It's nice knowing the company you work for takes these issues seriously. It's somewhat amazing to me that they do this, given the beautiful facility. Everyone has their own office, there are many "labs", good cafeteria (Google's main benefit over ours is that theirs is free - food isn't that different IMHO), top notch networking (Gigabit ethernet, WiFi all over), and so on.
Of course, I work in the Auburn, CA office, with four other people, so our facility isn't quite like this. But then we're only an hour drive from Tahoe :)
- Adobe Wins Platinum Certification Awarded by U.S. Green Building Council
- Adobe Wins Top California Flex Your Power! Award for Energy Efficiency
Posted by Chris at 2:04 PM
06 July 2006
The title of this post has more than one meaning, but for the purpose of this post... I've just signed up to attend the Scale with Rails seminar in Laguna Beach in August. I'm hoping that it provides a lot of good content and learning, as I start ramping up Rails apps.
I'll also be taking my family, and staying at my parents' in Corona del Mar, which is just north of Laguna. We'll hit LEGOLAND the day after the seminar.
Posted by Chris at 2:38 PM
Posted by Chris at 2:37 PM