Work


I mentioned a few times that I’ve been working a lot harder than usual the past 6+ months. The project is a completely unplanned, unscheduled, unspecified, self-initiated makeover of our GUI application.

I’m not going to talk about the gory technical details because I can’t. The last couple of weeks were extra crazy because some sales people are going to a trade show and they want to show the new GUI. Last Friday I checked in close to 30,000 lines of code changes, and got this back from the sales guy:

… even at this ‘alpha’ stage, it genuinely looks brilliant. … Whilst it is a phase1 of the GUI enhancements I gotta say it’s a MASSIVE leap forward from the prod (old) GUI. Chalk and cheese comes to mind. … we have already had a couple of people, when stopping by our booth say, unprompted, that the GUI “looks really great”.

Keep in mind this is a sales guy, but this is an internal email so he’s likely 14% less boasting than he usually is.

Anyway I’m taking tomorrow off.

Advertisements

I wrote about my growing workaholicism about half a year ago.

It didn’t get better.

The last two projects (UTF8 and hg) are still not finished. But a new project in the last 3 months or so has been the main cause for evening logins.

I mentioned it in that old post: a complete remake of our GUI using JIDE docking framework.

It’s a greater project than anything I’ve ever worked on. I’ll truly be proud to say that I’m the creator of something functional and useful, even just for those a few hundred users of our system.

But that’s not why I’m writing this post.

I re-realized that I work this hard because it’s easy.

Easier than life.

When I’m programming, all I have to worry about is programming.

Getting a computer to do exactly what I want it to is not trivial.

But it’s many orders of magnitude easier than flushing our living room HVAC with dozens of buckets full of rusty water and pray it will work again on the coldest day of the year so far.

And it’s infinitely easier than getting Valentina to go potty when she’s watching Dora.

It’s only harder than one thing.

That’s why I’m writing this post instead of programming.

I can’t fight the natural law for the path of least resistance, which guarantees local optimum but usually at the cost of global optimum.

A rhetoric question beseeches a rhetoric answer: I don’t know.

RYF’s latest blog links back to an earlier piece, which I don’t remember reading, but it hit me like a marshmallow on my toe nail:

我觉得,人生在世总要做一二件属于你自己的事情,自己发起、自己负责、自己主导、自己完善。就是说,这件事是同你个人紧紧联系在一起的,就像阿三的这个项目。

我想,我们每个人都应该有一二个自己的项目,人们提到你的时候,就会提到你的项目,反过来也一样。

人的一生,说长不长,说短不短,正好够用来做完一二件事,所以不要浪费它。何况也许只有这样,才能充实而愉快地度过有意义的一生。

I’ve been working like crazy for almost half a year. Almost every night, including weekends, I use remote desktop for hours, quite a few times till 1am (last night) and beyond.

I’m not complaining about the work load. I’m complaining about myself getting used to it. I don’t know what to do if I don’t click the Remote Desktop at night. Actually no, I know there are thousands of things to do, but somehow I just don’t want to do them. Clicking Remote Desktop is the easiest thing to do.

I have to admit that I’m quite proud of what I’ve done. It’s the first time I can say that since I started working. This list don’t mean nothing to non programmers:

  • Wrote a converter to convert hundreds of proprietary text files that describes the GUI windows in our system in a very lame way into XML under the new GUI framework that I’ve been working on the last 3 years.
  • Picked a commercial Java GUI library (JIDE truly rocks! It was founded and run by a Chinese guy with a similar background and age as me) to introduce docking into our GUI. It will be the foundation of a complete remake of our arcane and awful GUI.
  • Use the same library to turn any table in our system into a full featured pivot table that’s even better than Excel. No server side change is needed at all.
  • Use UTF8 to enable non ASCII. It would be many many years till we get to full i18n, but I’ve started to spin the wheels out of prehistoric dark age.
  • Use Mercurial to replace TeamWare, which uses–u better believe this–SC freaking CS. I learned python and wrote more than 4k lines of code to create extensions to cater to our rigid development process. I’ve written scores of multi-page emails and hundreds of revisions of wiki documents to explain the new processes and explore the options. If you still use svn, you’re fossil.

The last one probably took the longest time (still not finished) and most work.

But are these my projects? Is this feature, this software, this company, this business, what I want to do for my life?

Hell no.

You can’t always get what you want.

And you can never get what you don’t know what to want.

CRAZY. CRAAAAZY!

It’s not as bad as “Corporate Counsels Payroll, Nina speaking. Just a moment.”

Sometimes it’s pretty close, like just now.

  • The never-has-nothing-to-do sys admins are looking at restricted web sites again. Restricted to other employees, I mean. One of them has a truly explosive and hysterical laugh. Worse than the “hyena lady” in Sleepless in Seattle. I used to sit 2 rows away from him and was tearing my hair cell out. Now I’m 5 rows away and it still hurts.
  • The fast-talking MBA project manager has been asking everybody when they’ll finish their projects as always. Only speaking faster today.
  • The sales partner’s secretary has been typing like mad, when she’s not calling places to reserve/cancel appointments. Donno why her seat is among the developers. Donno why her keyboard sounds so loud, either.
  • The guy who always strolls around the floor is doing it for the 36th time today. He always wears a suit. And a hat.
  • 3 guys sitting on my back side have been talking about all kinds of stuff. In Russian Serbian, Argentine, and a monotonic robotic origin-unknown accents. Maybe the 3rd guy IS a robot. Gosh, why didn’t I think of that before?! That explains a lot of things…
  • The guy sitting next to me just started today. In the morning he stared at the screen for about an hour trying to log in. Then he’s been making phone calls to insurance company, asking sys admin how to do this and that, etc. In Polish accent.
  • A guy just bought a handheld GPS and is showing it off. He can’t get a strong signal so he opens up a window and sticks it out. The wind is howling through the crevice and he’s bragging that the device can talk. In good old American accent. The other guys marveling at this modern marvel are my compatriots with you-know-what accent.

I myself is to be blame for the ordeal, though. I brought the super nice Bose QuietComfort headphone to work a few times, but got tired of commuting with it–certainly I can’t leave that expensive thing in the office. Now what I have is a cheap headset and my ears hurt after wearing it for 30 seconds.

I figure that there are very few tech companies in the world where developers have private offices. That’s why Fog Creek always looks to me like Google’s next door neighbor in programmer heaven. In my current company of more than 100 employees, there are only 8 private offices for the partners, CFO, and sales/finance/HR managers. Even the director of development sits among us in sprawling grey desks.

Oh wait, the receptionist kind of has her own office. Hmmm. Maybe I’ll trade for her space if it gets too rough and tough around here.


UPDATE: There’s one distinctive noise missing yesterday: the guy who skypes our outsource team in Ukraine for about 4 hours continuously everyday, in Russian-accented Russian.

But I don’t care any more since I have my QuietComfort now. In a month I’ll be able to transcribe all of Beethoven’s string quartets from memory 🙂

Questionaire in 11/2005 issue of Software Development.

  1. When it comes to policies, procedures and development standards, I:
    Follow them because consistency is important, but I offer to draft the changes needed to correct or update them.

  2. When writing a project-related document, I:
    Reference other documents or ask questions to make sure I understand what’s required, taking my best shot at it before submitting it to my manager for review.

  3. When I solve a particularly difficult technical problem, I:
    Provide my coworkers and manager the details of the problem and solution, to spare them similar difficulty.

  4. For routine communication with my manager, I:
    Ask her how she prefers to receive routine communication, whether by e-mail, phone or office visit–and then keep in touch accordingly.

  5. When I receive a new project assignment, knowing I already have more work than I can do effectively, I:
    Accept the assignment, but ask hwat priority it should be given in relation to other assignments.

  6. When I receive an assignment I feel is “grunt work”, I:
    Recognize that a certain amount of pedestrian work must be done, and respect my manager’s judgment in the assignment.
    (This is the only one I got “wrong”. I picked “Politely inform my manager that my skills could be put to better use elsewhere”)

  7. My attitude toward keeping current with tools and best practices is:
    I enjoy reading articles and books about the software development profession, even on my own time, for my own personal development.

  8. When demonstrating my development work-in-progress to coworkers, I:
    Take notes of their suggestions and follow up on each one, either making the suggested change or justifying why it isn’t feasible, realizing that they’re trying to help me improve the product.

  9. When reviewing my colleagues’ work-in-progress, I:
    Offer constructive criticism, focusing on the product and not the person.

  10. When my performance evaluation is coming up, I:
    Send my manager a detailed update specifically addressing how I met the performance objectives established for the performance period.

I responded to a Programmer/Software Applications Engineer position at Voyetra on Monster (I was stupid not to save/forward job description), and guess what–Carmine Bonanno, founder of Voyetra and president/CEO of VTB sent me an email saying that he wants to talk to me! Earlier I applied for a similar position at Voyetra and got no response at all. Gosh I hope I won’t blow this one!

Some top hits from Google:

Sound on Sound article about CAT and Voyetra 8: http://www.gordonreid.co.uk/vintage/cat.html

Keyboard Magazine article about Voyetra 8, including how the name came about: http://www.voyetra.com/site/kb_ftp/2901015.asp

Kitten story with images and links http://www.tribalsmile.com/kitten/

Dell doesn’t seem to use VTB any more, it’s all SB. Seems like Dell started using VTB Santa Cruz in 09/1998. This 02/01/2002 article http://www.zdnetindia.com/reviews/hardware/audio_cards/stories/49583.html talks about delay in VTB, which may be one reason.

http://atlas.csbnet.se/livecenter/showpage.php?id=47 talks about Audigy 2 NX, but mentions a lot business info. Soundcard business is in steady decline due to integrated audio on mainboard. “Sound cards used to make up more than 70% of (Creative) sales, but it is now in the low 20s.” Media player is reported to have 68% sale, so 20% is about right. According to http://biz.yahoo.com/prnews/050810/sfw092.html?.v=21
Creative’s 2Q 2005 sales is up 51% at $300m (260% increase in mp3 player sale) but profit is at $31m loss. 20% is $60m, so annual would be $250m. Global PC shipment in 2005 is predicted to be about 200m. If 10% buys sound card, that’s 20m cards. At average $50 a card, that’s $1b annual market–seems to high since Creative’s market share is probably more than 50%.

http://www.tatumpartners.com/partnerBio.php?id=86081 Joseph Migliozzi served as Chief Operating and Finance Officer & Corporate Director of Voyetra Turtle Beach, a $35 million manufacturer of consumer PC multimedia peripherals and software selling worldwide to OEM computer companies, retail stores, and e-tail directly to consumers. He led a turn-around restoring the company to health after a major acquisition. He re-engineered the organization and led the transformation from niche to mainstream consumer markets. He negotiated and managed contracts with PC OEM’s (i.e. Dell & Gateway) and reached preferred status at Dell, with of quality 500 DPPM, tracking to Six Sigma. He managed the selection and implementation of a state-of-the-art Sybase ERP system. (no mention when)

原载于《美国科学家》(American Scientist)82:250-255(1994年四月期)
译自《小波与滤波器组》(Wavelets and Filter Banks)附录一,Wellesley MA: Wellesley-Cambridge Press

我在听莫扎特的音乐走神了。一般来说不会有人承认这种事的,而我的借口是那不是莫扎特最好的作品,或者是因为音乐厅满座而我们来晚了,只好很“非法”地泊 了车。其实我走神的真正原因是一个无聊的念头,这篇文章就是从那个念头来的。我想的是一个问题:他到底是怎么把整部作品写下来的呢?

对于莫扎特来说,那恐怕是一个很大的问题。音乐记谱法就其本身来说是挺简捷的,但一部包括所有乐器的总谱还是跟一本书一样厚。最关键的是用什么符号来记 谱。对于一个音乐家来说现行的记谱法是唯一的可能,但实际上不是的。当音乐信号被录下来后(比如用数字录音),同样一部交响曲就被一种完全不同的方法表示 着。莫扎特写的谱子可没给录下来。被录制的完整信号不仅是所有的音符序列,更包含了演奏者所附加的细微表情:琴键、踏板、还有弓弦的运用--这些是区分演 奏大师与普通人的准则。谱子上可能有些额外的词,比如forte(强)或pianissimo(极弱),但它们一点也不准确!用数学语言来讲,音符是一种 “指示”(indication)但不是完整的“代码”(code)。
我们的世界中有无数声音、图象和比特串要被压缩、传送和恢复:

* 音频--音乐和语音
* 视频--静止图像和电视
* 数据--数字、字母、书籍,甚至邮政编码

邮政编码看起来不值一提,但这种很短的信号说明了我们有什么样的选择。我觉得英国和加拿大的邮政编码就压缩得过头了(译者注:美国的邮政编码是5位数字, 可能再附加4位辅助数字)。象M5S 1A4和V6T 1W5这样的邮政编码提供了额外的准确性,但实在太难写和记了。(有谁知道他们为什么同时用数字和字母吗?)压缩是一件很微妙的事情,因为我们还要处理压 缩后的信号,而不仅仅是把信号编码。这就如同学生记课堂笔记一样,他要压缩讲授的内容,但还得看得懂自己的笔记。
电视至今为止一直回避着压缩问题,但是这很快就要被改变了。电视屏幕上的每个“像素”需要24个0或1来混合三种颜色,所以颜色值可以有从0到255的 28种可能性--每种颜色用8比特。一幅非常明锐的图像可能需要用多于我们所能发送和接收的比特数所表示的更多像素。这时就要用到压缩了。有些信息得被丢 弃,否则电视信号就会跟不上拍了!对于高清晰度电视(HDTV) 来说,这个问题特别严重,因为它需要极高的比特率来转送。如果有人能成功地进行压缩的话,我们很快就能看到用多得多的像素表示的特别清楚的图像。所有的大 公司都在竞争HDTV信号处理的标准,因为这是一个事关几十亿美元的决定。这场竞赛是数学和工程方面的,同时也越来越变得政治化了。在本文的最后,我将讲 述一次有关这场竞赛的个人经历。

傅立叶变换和小波变换

本文将涉及几种分析与合成信号的方法,它们都不是完美的,而且一直被改进着。我们可以用三种方式来变换一部交响乐:

1. 用傅立叶变换分解为余弦波
2. 用短时傅立叶变换分解为一段段的余弦波
3. 用一种新方法分解为小波

经典的方法是把整部交响乐分解成一些纯粹的谐波,这相当于每件乐器只演奏一个固定的音符,整个信号变成了一些余弦波的和:a0+a1cost+…。这 就是Joseph Fourier在172年前在巴黎发表的傅立叶变换。这样的变换需要一个无限大的交响乐团来实现,但根本不需要指挥,所有的演奏者都会烦死。所有的音乐和 所有的信号都可以用傅立叶变换分解为纯粹的谐波。工程师们几乎会本能地做这种变换,用每个谐波的幅度来代替每一时刻的信号强度。这里最大的问题要用是多少 个频率来构成高保真的信号,在绝大多数情况下这个数字会非常大,因而无法进行有效的压缩。
另一种选择是短时傅立叶变换,交响乐的一个个片断被分别变换。每个片断还是被分解成余弦波。这相当于每个演奏者还是只拉一个音,但在每个片断中的幅度不 同。大多数信号是用这种方法分解的。这里主要的问题是在片断交界处有突变。你不一定能听到这种突变,但对于一幅图像来说你恐怕能看到突变所导致的边缘。这 种“块效应”是短时傅立叶变换内在的缺陷。
信号处理中的一种新方法更接近于莫扎特所写的乐谱。代替没完没了或被截断的余弦波的新砖石是“小波”,一些有起始和终止的小波动。它们都来自一个基本的小 波w(t),代表在时间t时一个基本曲调的音量。低音提琴用最长的时间来演奏这个曲调。大提琴拉同样的东西,但时间缩短一半而频率翻倍。从数学上讲,这个 加速过程就是把时间变量t换成2t。第一低音提琴拉b1w(t),第一大提琴拉c1w(2t),它们都从t=0时开始拉。下一对低音提琴和大提琴拉b2w (t-1)和c2w(2t-1),音量是b2和c2。低音提琴从t-1=0即t=1时开始。大提琴开始的稍早些,在2t-1=0即t=1/2 的时候。我们需要比低音提琴多一倍的大提琴来覆盖整部交响乐。中提琴和小提琴更快地拉同样的内容,互相交迭着。在每一级频率都翻倍(提高一个八度),并有 新的细节加入。“超级小提琴”用低音提琴的16、32、甚至64倍的频率演奏,但音量恐怕很小。这些小波在某种条件下累加起来就能构成整部交响乐,我们下 面就来讲怎样来做。 概述:一个很长的信号可以被分解成一些可能的基信号,它们可以是音符,波动或小波。所有的小波都来自一个函数w(t)的加速和延时。它们的幅度被传送到接 收者,在那里交响乐被恢复出来。(很小的幅度将被丢弃。后面还会讲到这一关键的压缩步骤。)我们现在来解释什么是基。

选择最好的基

这一节是为了已经知道向量和矩阵的读者所写的。你也许还记得任何向量(x,y)都是基向量(1,0)和(0,1)的线性组合。向量(4,2)等于4倍 (1,0)加上两倍(0,1)。另一对基向量是(1,1)和(1,-1),它们可以生成所有平面向量,包括(4,2)。这两对基向量所符合的条件是:空间 中的任何向量都有且仅有一种被基向量表达的方式。(1,0) 和(2,0)就不是基向量,它们处于同一条直线上,无法生成(4,2)。(1,0)、(0,1)和(3,1)也不是基向量,因为三个向量太多了。 (4,2)可以由三者中的任何两个生成(它也是三者的和)。任意两个方向不同的向量,比如(5,1)和(3,1),都构成整个平面的基。
所谓最好的基有着一个额外的性质:基向量是互相正交(即垂直)的。比如标准基(1,0)和(0,1),水平是垂直于竖直的。(1,1)和(1,-1)同样 符合垂直的标准:它们的点积是零。(x,y)与(X,Y)的点积是xX+yY,所以(1,1)与(1,-1)的点积是1-1=0。(5,1)与(3,1) 的点积是16,所以它们不互相垂直。正交基的优点是可以迅速地把一个有1000个元素的向量分解成1000个互相垂直的成分--在这里速度是最关键的。以 上这些是线性代数的核心。对于工程师、社会科学以及物理科学家来说,线性代数现在比微积分更重要。我这一代的学生,当然还有我的老师们,并没有意识到这个 变化。变化的一部分原因是由于从模拟到数字的转变,函数在被向量代替。线性代数用矩阵来提供着对n维空间的理解。

更高的维数

让我们来考虑四维空间。把(x1,x2,x3,x4)想成一个信号的幅度,如果是(1,1,1,1)的话就是一个固定不变的音,而x=(1,2,4,8) 是个渐强。传送这个信号的最简单的办法是把它的四个元素都传送出去,这相当于用(1,0,0,0)、(0,1,0,0)、(0,0,1,0)和 (0,0,0,1)作基向量。我们传送的数字x1,x2,x3,x4是这些基向量的幅度。这有没有可能不是最好的方法呢?那要看原来的信号是什么样的。比 如一个固定不变的音(1,1,1,1),这是一个很可能出现和很常见的向量,特别是在时间间隔很短的情况下,而好的音响系统必须用很短的间隔(短采样周 期)。如果用标准基,我们得传送四个1。而如果基向量中有 (1,1,1,1)的话,我们用一个数就成了。一个好的基能用少数的向量准确地生成信号。
压缩过程将丢弃信号中没有(或很少有)的基向量。当基向量表示不同频率--傅立叶基--的时候,高频经常可以用“低通滤波器”来去掉。用标准基时,我们只 能在没有第二个音的时候丢弃(0,1,0,0),否则在广播里就能听到一个间隙。
音乐记谱法解决这个问题的办法是用全音符或两个半音符来表示(1,1,1,1),就是说同一信号有多种表达方式。这可不是一个基!全音符、半音符和四分之 一音符对于音乐家来说是够用的,但数学家只要四个向量组成的基。我们的基里要有常数向量(1,1,1,1),第二个基向量可以是(1,1,-1,-1)。 它是一个方小波,上一下下一下。德国数学家Alfred Haar在1910年通过放缩和平移完成了一个正交基。除了以上的两个向量以外,他把(1,1,-1,-1)压缩而成(1,-1,0,0),又把它平移两 个时间间隔而成了(0,0,1,-1)。这些基构成了4×4的“哈尔矩阵”的列:
H4=\begin{array}
1 & 1 & 1 & 0 \ 1 & 1 & -1 & 0 \ 1 & -1 & 0 & 1 \ 1 & -1 & 0 & -1 \\end{array}
你能猜猜H8是什么样的吗?它的第一列是常数向量(1,1,1,1,1,1,1,1)。电子工程师管这一列叫第零列,他们总是从零开始数,就象英国的电梯 层数一样。下一列是(1,1,1,1,-1,-1,-1,-1),由低音提琴演奏的小波。它被压缩成(1,1,-1,-1,0,0,0,0)来给第一大提 琴拉,然后再平移给第二大提琴。H8的后面四列是中提琴所拉的音,比如(1,-1,0,0,0,0,0,0),它们的频率更高,时间更短。所有这些向量都 是互相垂直的!小提琴在16维时上场了。我不鼓励你写下H16矩阵,但你应该知道怎么写。这就是线性代数的特点,当你明白四维后,再高的维数就没有问题 了。我们将用四把中提琴和八把小提琴,所以总共有1+2+4+8=15把琴,少的一个是常数向量(1,1,…,1,1),它是放缩向量,可以作第16 个或第0个基向量。

快速小波变换

我们已经看到了最小的小波变换。(4,2)是3倍(1,1)加上1倍(1,-1)。我们可以从3和1恢复(4,2)。对于更长的信号,我们用均值和差值。 均值沿着一个金字塔型向上一层,将再被平均一次。差值代表每一级的细节。当信号变化很快时,细节是很重要的。而当信号几乎固定不变,我们就不必发送细节。 这就是压缩的原理,忽略那些眼睛看不见、耳朵听不到的东西。
(4,2,5,5)是一个四维的例子。我们知道怎么用前一半的4和2算出3和1。后一半的5和5得出5(均值)和0(差值)。两个均值3和5上一层金字 塔,两个差值1和0是与基向量(1,-1,0,0)、(0,0,1,-1)相联系的细节。两个均值表示一个粗略信号(3,5),我们一致地对待它:计算均 值(3+5)/2=4和差值(3-5)/2=-1。一致性是一个好算法的关键! (4,2,5,5)的小波变换是均值和差值(4,-1,1,0),它被传送出去,到了接收的一端,解码装置先下一层金字塔,用(4,-1)恢复粗略信号 (3,5),再加上细节1和0来恢复(4,2,5,5),这就完成了逆变换。这对有256个数的信号来说是个好消息。这次的金字塔有8层,因为256= 28。最低的一层总是从A=(x1+x2)/2和D=(x1-x2)/2开始,然后均值A向上一层,而差值D就是这一层的细节。这个金字塔的关键是多重分 辨率,我们在用不同的比例来看信号--精细,中等到粗略,每一级都是用细节和平均来表示的。我们的眼睛恐怕就是这样先看个大概再盯住细节的,我们的耳朵也 肯定是长成能够先拣出到达内耳的声音中的高频信息的。工程师的艺术就在于模仿自然。
我们管这种方法叫做“快速”小波变换。由N个数组成的向量经过变换被N个基向量所表示,用矩阵语言来说就是乘以哈尔矩阵H或它的逆矩阵。这通常需要N2个 步骤,而金字塔算法只用N次平均和N次差值。
问题:有没有快速傅立叶变换呢?我们能不能很快地找到N个余弦波的幅度呢?我们可以用傅立叶矩阵F来代替H在“时域”和“频域”之间变换。这种快速变换是 我们所在的时代里最重要的数值算法,是小波的很厉害的竞争对手。它存在于计算机的线路和线性代数的课本中。快速傅立叶变换的关键在于用几乎全是零的矩阵来 生成F。数值线性代数的核心就是矩阵分解--H和F是突出的例子。

杜玻基斯小波

小波是个热门课题,但哈尔小波可是冰冰凉的。它们是在1910年被发明的,它们的图形是用水平线段组成的,那么简单的东西早就被人们想到过了。用它们对大 多数信号所作的近似效果太差,即使是一条简单的斜线,我们都得用多得要命的水平线段来表示,以求基本的准确。哈尔基不能带来我们所希望的20:1或 100:1的压缩比,我们需要更好的基。
这里要讲到的新小波要复杂得多,而且只能用无穷乘积来表示的公式,但数学家总会找到它的。我们将通过英格丽·杜玻基斯的发现来讲述这一进展。1988年, 在AT&T贝尔实验室的杜玻基斯发现了一个开始一会儿就结束的脉冲,它垂直于它的所有放缩和平移。它基于四个“神奇”数字h0、h1、h2和 h3。杜玻基斯小波的放缩向量S就是由它们顺序组成,而小波向量则是W=(h3,-h2,h1,-h0)。你能看出为什么W与S垂直吗?先乘再加:点积 S·W是零。她还希望(1,1,1,1)和(1,2,3,4)在W上的分量是零,这样固定和线性信号就能被很大地压缩,这就要求它们与W的点积是零:
h3-h2+h1-h0=0和h3-2h2+3h1-4h0=0
以上是两个关于那些h的方程式,我们再需要两个。我们先让第一低音提琴所拉的调子(h3,-h2,h1,-h0,0,0)与第二低音提琴所拉的调子 (0,0,h3,-h2,h1,-h0)相垂直,就是它们的点积为零:h1h3+h0h2=0,这是第三个方程式。第四个方程式h0+h1+h2+h3= 2限定了h的大小。这四个方程式的解代表了一个比哈尔好得多的滤波器:4h0=1+\sqrt{3},4h1=3+\sqrt{3},4h2=3-\ sqrt{3},4h3=1-\sqrt{3}。这并不是最好的结果,六个或八个数能得到更好的结果,但需要更多的计算。视频滤波器一般很短,而音频滤波 器一般特别长,因为音乐一般比图象更平缓。
从离散矢量到连续函数的关键一步是放缩方程。它以一种很奇特但也很自然的方式来运用那几个神奇数字h0、h1、h2和h3。放缩函数\phi(t)的表达 式用到了t,2t和那几个神奇的h:
\phi(t)=h0\phi(2t)+h1\phi(2t-1)+h2\phi(2t-2)+h3\phi(2t-3).
代入t=1和t=2我们就有了\phi(1)和\phi(2),然后就可以算出\phi在t=1/2,2/3,2/5时的值,因为那时等号右边的 2t是整数。从半整数我们可以计算四分之一整数的函数值,最终我们就得到了足够的数值来画出从t=0到t=3的函数图象。它是一个有名的(对某些人来说) 函数,有着一些全新的特点,而所有的东西都被包括在放缩方程里了。
正如傅立叶用余弦波、哈尔用方波一样,杜玻基斯的起点就是这个放缩函数。她的小波w(t)方程很类似,只是右边的参数变成了h3、-h2、h1和-h0。 它的图象非常不规则,象一种分型。对它压伸和平移就构成了完整的小波基,但是所有的计算都是基于那四个数字。

高清晰度电视

“全世界都听到的枪声”(译注:美国独立战争的“第一枪”)在1775年还只是一个比喻,而现在全世界都看到的图象信号已经是现实了。那就是电视。你可能 想应该用更好的节目来改进那信号,而工程师们想用更多的象素和更好的压缩来改进它。后者大概就是我们将得到的HDTV。高清晰度电视的标准之战打得出乎意 料的公开。欧洲投了几十亿美元,然后不干了而决定采用北美的标准。联邦通讯委员会(FCC)组织了一个竞赛,最后参加决赛的有四个单位。竞赛中的一个大问 题是运动估计,就是预测图象要怎么变化。然后你就只需传输一个很小的变化值,而实时地跟上所有的象素。纽约时报预测日本方面会赢,因为HDTV在日本已经 播出了。但日本的HDTV是模拟的,不是0和1。数字滤波器现在发展得很完善了,可以很好地分离高频并压缩它。但是明锐的图象边缘必须保持明锐度。专家们 知道看什么,就象品尝酒或茶一样。归功于纽约时报,这场竞赛被报道给了公众。每个月竞赛都有新的变化。FCC结束了测试,但一些公司要求补赛,借口是他们 在测试前一天的晚上睡得很晚,所以他们的系统的状态不佳。(所有的教授都明白这个借口意味着什么,但政府心软,允许了重赛。)我们希望有一个胜利者,但 FCC敦促最终的合作。我们都希望那能成功。
我想解释一下我是怎么知道这些的。一如既往,是一个学生告诉的我。他来我的办公室问到:“你还记得我吗?”我使劲想,但我的学生太多而记性太差,所以还是 老实点好:“不大记得了,”我回答到--不曾想到他随后说:“你是我的论文导师。”你对一个你不记得的论文学生说什么呢?在这种情况下,我建议说些很短的 话:“多讲给我听听。”最令人吃惊的是他的论文课题。“我在设计MIT参加HDTV竞赛的系统中的滤波器组。”有的时候你就是输不了,即使你应该输。我的 确是他的导师。他是应该数学系的学生,想写一篇电子工程论文。他找到了总管MIT与通用器件公司联合参赛系统的Jae Lim,而数学系得有一个正式的委员会。我肯定同意了,因为我的签名在论文上。但是我的学生不肯告诉我他的滤波器组所用的参数h0,…,h6。他对导师的 信任还不够一个几十亿美元的秘密。

概要

一族小波是一个正交基。我们用它们的组合来表示任意信号。音乐有一个独立变量:时间。一张照片有两个变量:横向和纵向。x方向的小波乘上y方向的小波。视 频图象有三个变量:x,y和t--一系列照片代有很重要的可预测性。基之间的竞争是很激烈的。计算机比谁计算得快,数学家比谁的算法快。把运算步骤减半相 当于加倍计算速度。我不知到小波是否在参赛,大概吧。
HDTV标准是基于傅立叶变换的。小波来晚了,没法赶上视频竞赛,但在音频上很有竞争力。四个立体声道只需要TV频道的一小部分。值得一提的是,在另一个 完全不同的由FBI组织的竞赛里,小波已经赢了。FBI有三千万套指纹,而且每天还在增加。它们需要被数字化。当一个司机被叫住或一个小偷被逮着时,他们 的指纹被记录为电子信号。一个中心计算机被用来寻找匹配。在现在的情形,存在华盛顿的无数指纹文件,你挺容易逃掉的。很快一些指纹细节比如纹脉的终止和分 叉将把你抓住。人们认为老一些的傅立叶方法也能胜任,但在20:1的压缩比时,纹脉变得不连续了,在8×8的方块之间有间断。短时傅立叶变换带来太多的块 效应。用小波生成的图象质量更好。FBI的Tom Hopper选择了新
的基。

下次你被逮着,你就可以埋怨小波了。

Next Page »