Wednesday, December 16, 2009

Understanding Recovery and Removal Analysis (Repost)

Understanding Recovery and Removal Analysis

TimeQuest automatically does recovery and removal analysis, which is a type of static timing analysis that many users are unfamiliar with. The purpose of this document is to briefly describe what is being analyzed, as well as what a failure might look like in a user design, and some suggestions on what a user can do to meet timing.

At a high level, recovery and removal analysis ensure that the logic comes out of the reset state together: on one clock edge every register is held in reset, and by the subsequent clock edge every register has been released from reset.

What is Recovery and Removal analysis?

The best way to explain recovery and removal analysis is to compare it to something users generally do know: setup and hold. Let's start with the user constraints in an SDC file:

create_clock -period 10.000 -name clk_a clk_a

create_clock -period 10.000 -name clk_b clk_b

We have two clocks with the same period, 10ns. This results in a setup requirement of 10ns and a hold requirement of 0ns.

The paths in a user design are then analyzed like below, where the orange line drives a synchronous port on the destination register, such as the data input, clock enable, or synchronous clear.

For most paths in a user design, clk_a and clk_b are the same clock and on a global. So the clock delays to the source and destination registers come close to cancelling each other out, and the basic requirement is that the data path between the two registers must be shorter than 10ns to pass the setup requirement, and greater than 0ns to pass the hold requirement. Make sure you understand this, as it is a very fundamental principle of timing analysis.

Based on that premise, the ONLY difference for recovery and removal is that the orange line feeds an asynchronous port on the destination register rather than a synchronous port, and timing analysis is done when the reset de-asserts itself, letting the destination register come out of reset. Everything else is identical.

Once again, in the simpler common case, if clk_a and clk_b are the same clock and on a global, i.e. there is little clock skew, then the requirement is that the path from the source register to the destination register’s asynchronous input is less than 10ns for recovery analysis, and greater than 0ns for removal analysis.
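To make the arithmetic concrete, here is a minimal Python sketch of that check. It is not how TimeQuest computes slack: the function name and the clock_skew_ns parameter are made up for illustration, both clocks are assumed to share one period, and register micro-parameters (clock-to-output, setup and hold times) are ignored.

```python
def check_async_release(data_delay_ns, period_ns, clock_skew_ns=0.0):
    """Return (recovery_slack, removal_slack) in ns for a reset release path.

    Simplified model (an assumption, not the tool's algorithm):
    - launch and latch clocks have the same period,
    - clock_skew_ns = latch clock delay minus launch clock delay,
    - register tCO/tSU/tH are treated as zero.
    """
    # Recovery (like setup): the release must arrive before the next latch edge.
    recovery_slack = (period_ns + clock_skew_ns) - data_delay_ns
    # Removal (like hold): the release must arrive after the same-numbered latch edge.
    removal_slack = data_delay_ns - clock_skew_ns
    return recovery_slack, removal_slack


# A 4ns reset path in a 10ns domain with no skew passes both checks.
print(check_async_release(4.0, 10.0))   # (6.0, 4.0)
# A 12ns reset path fails recovery (negative recovery slack).
print(check_async_release(12.0, 10.0))  # (-2.0, 12.0)
```

A positive value means the check passes. With zero skew the two conditions reduce to the "less than 10ns, greater than 0ns" rule described above.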

I did not draw the “cloud of logic” on this path either, just because it is generally recommended against having any logic on the reset path. So what we’ve found is that:

Recovery Analysis is analogous to Setup Analysis, except the data path feeds an asynchronous port on the destination register.

Removal Analysis is analogous to Hold Analysis, except the data path feeds an asynchronous port on the destination register.

As can be seen with static timing analysis, it is easiest to think of the asynchronous clear and preset as if they were synchronous signals and just labeled recovery and removal instead of setup and hold. Later on we will cover why we don’t just make them synchronous.


What do the names Recovery and Removal mean?

When the asynchronous signal is de-asserted, removal analysis ensures that it de-asserts late enough after the required clock edge, keeping the flip-flops removed from normal operation, and recovery analysis ensures that it de-asserts before the required clock edge, allowing the registers to recover into normal operation.

Looking back at the waveform on the first page, when a reset de-assertion is launched at time 0ns, recovery and removal ensure that this signal reaches all of its destination registers between the red and blue lines, and all registers come out of reset on the same clock cycle.

What does a Recovery failure look like?

I find it useful to describe what a failure looks like in hardware for users to fully appreciate recovery and removal. Let’s start with recovery, as that is by far the most common type of failure, just like setup failures are the most common inside an FPGA.

Let’s look again at the reset from a design perspective:

The sys_reset generally feeds all the registers in a clock domain (or group of related clock domains), which may vary from a handful of registers to over a hundred thousand registers. Recovery analysis covers the moment when an asserted sys_reset de-asserts and all of the destination registers come out of reset into active mode. Now, if the delay from sys_reset to all the destination registers fits within one clock cycle, then all of the registers come out together and they all see the next clock edge together. But if some registers fail recovery time, they might still be in reset on that following clock cycle and therefore not see that clock edge.

So how does this actually cause a failure? Let’s say some of these registers were in a state-machine that resets to a state called powerup, but on the first clock moves to a state called idle. Now, if some of the state-machine registers see that first clock cycle and some do not, then some registers switch to the idle state while other registers do not. This could cause the state-machine to go to the wrong state, and possibly an unknown state that it might never recover from. This would be a recovery failure, caused by the asynchronous reset signal not de-asserting its destination registers in time.
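The state-machine scenario can be sketched in a few lines of Python. The one-hot encoding (one register per state) and all names here are assumptions for illustration only; the point is just that registers which miss the first edge keep their reset value while the others advance.

```python
POWERUP = (1, 0)  # hypothetical one-hot encoding: (powerup_bit, idle_bit)
IDLE = (0, 1)


def first_clock_after_reset(powerup_released, idle_released):
    """Return the state after the first clock edge following reset release.

    Each flag says whether that register was out of reset in time to see
    the edge. A register still held in reset keeps its reset value.
    """
    reset_state = POWERUP
    next_state = IDLE  # the design moves powerup -> idle on the first clock
    powerup_bit = next_state[0] if powerup_released else reset_state[0]
    idle_bit = next_state[1] if idle_released else reset_state[1]
    return (powerup_bit, idle_bit)


for flags in [(True, True), (False, False), (True, False), (False, True)]:
    state = first_clock_after_reset(*flags)
    legal = state in (POWERUP, IDLE)
    print(flags, state, "legal" if legal else "ILLEGAL")
```

When both registers see the edge, or both miss it, the state is still legal. Only the mixed cases produce (0, 0) or (1, 1), which is the kind of unknown state the state-machine may never leave.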

Another scenario is a design with counters that immediately start counting out of reset and are supposed to be in-sync with each other. If one of the counters misses the first clock out of reset due to a Recovery failure, then they will not be in sync and the design no longer works as expected.

Are most structures susceptible to Recovery failures?

There are two basic requirements for a recovery failure to affect the design:

1) The logic must “change state” on the first clock cycle/s

2) That “new state” must be important

The first criterion is relatively straightforward. If the user’s logic does not change value on the first clock cycle/s, then it doesn’t matter whether it sees those clocks or not. If a state-machine sits in its reset state out of power-up, then a recovery failure to some of its registers wouldn’t matter, since those registers wouldn’t change anyway. Most registers, especially control registers, do not change state on their first clock cycle.

The second criterion is a little more complex. Usually anything in the data path is ignored on the first few clocks. For example, an adder may have a recovery failure and add to the wrong value, but if the system is ignoring the data out of power-up and waiting for good data to come through, then that bad addition does not matter and the recovery failure will not affect the system.

On most designs, almost all of the logic is immune to recovery failures, as most logic will work fine even if it fails recovery timing. Some designers consciously make sure they don’t put in any logic where a recovery failure would cause problems, which is generally quite easy to do. That being said, if the small portion of a design that is susceptible to recovery failures experiences one, then the design will fail.

These failures are especially aggravating because they are inconsistent. If a design fails recovery by 500ps, most devices will not be coming out of reset under the worst-case conditions of Process, Voltage and Temperature, and so they work fine. But a handful of devices will experience occasional reset failures in the field. There is no way to simulate recovery failures because there is no way to say exactly which registers miss the first clock cycle and which ones do not. I have seen many users spend a LOT of time debugging these issues (when they did not do any recovery/removal analysis to begin with), and it would have been much easier to close timing on recovery and removal in the first place.

What does a Removal Failure look like?

To be honest, I have never seen a removal failure. Removal analysis should always meet timing, as long as the user is not delaying the clock to the destination register, which is usually done with a gated clock or a clock shift.

An example of what a removal failure might look like is if the destination clock is gated, and therefore the destination register receives its clock edge later than the sys_reset source register does. When the sys_reset de-asserts, it may release some registers before the same clock edge that triggered sys_reset, much like what a hold violation looks like. The net result is similar to a recovery failure, in that some registers come out of reset a clock earlier than other registers. Similarly bad things can happen to logic that transitions on that first edge.

Why are we timing an asynchronous circuit?

This is actually the first question most users ask, but having explained how a failure occurs, it hopefully makes sense why these requirements exist. Now that we know what a failure looks like, and that essentially we are timing it as if it were synchronous, probably a better question is:

Why not just make the reset synchronous?

There are two reasons for this. The first is to more effectively utilize the device resources. The asynchronous reset/preset already exists on each FPGA register, whether it is used or not. If the designer made their reset synchronous, then they would have to use one of the data inputs into the FPGA register, which can hurt timing and area. For example, if a Cyclone III design, which has 4-input LUTs, had a 4-input AND gate feeding a register, then changing the register’s reset from asynchronous to synchronous would force another input into this function. The logic would require 5 inputs, forcing it to use another LUT and making that data path both larger and slower. If the clock enable were also used on this register, that enable signal would have to be gated with the synchronous reset as well (since the reset has priority), possibly making that path larger and slower too. So from a resource perspective, it makes sense to use the “free” asynchronous ports in the FPGA.

The other reason resets are designed asynchronously is to make them more robust. By being asynchronous, they can reset the design without requiring a clock. Although some designs do not have this requirement, many do, and being able to reset the design when the clock is disabled may be an important consideration. Let’s look at the following reset structure as an example of a design that can reset without a clock, but comes out of reset with a clock:

In the above diagram, the ACLR is generally asynchronous, which is why we have two (or more) registers for metastability. When the ACLR goes active to remove the design from operation, it goes through the B register in an asynchronous manner and resets all of the destination registers, whether or not there is a clock. But when the ACLR is released, the system logic recovers after two clock cycles, allowing the VCC to pass through registers A and B in a synchronous manner. For constraining this circuit:

- The ACLR net should not be timed if it is truly asynchronous. A set_false_path assignment may be necessary.

- The path between registers A and B can get a set_max_delay assignment that is less than the clock period, ensuring the path meets timing with margin and improving (that is, increasing) the mean time between failures (MTBF). MTBF is a metastability topic which won’t be discussed here.

Note that although this is a common reset structure, it is by no means the only or best asynchronous reset structure. Many users have their own methods, and this example is shown purely to illustrate how a system can be reset without a clock and come out of reset with a clock.

What if I don’t make Recovery timing?

If a system does not meet recovery timing, there are multiple things a user can do to close timing. The most important thing is to first understand why the design is failing timing. Let’s look at a “common” scenario for meeting timing, before looking at failures.

A common system has a reset register that fans out to the asynchronous clear/preset port of many registers (see Register B in Figure 2 above, for an example). This register usually drives a global to easily fan out across the device. This is a very common scenario, but failures can occur if:

a) The source register is not placed near the global driver. With timing requirements, the fitter should place the register near the location it gets onto the global, but if this doesn’t occur, there will be a long delay to all of the registers. It is always worth checking that the delay to the global is not long. (Globals are somewhat slow paths because they are large, low-skew drivers that fanout across the device, so be sure not to confuse the path to the global with the actual global itself.)

b) The net is not placed on a global and takes too long to cross the device. For example, if the clock period is 4ns, it may take more than 4ns to get across a large device, and therefore the path fails recovery.

c) The net is on a global, but the global delay is actually too slow compared to the clock rate. For example, in an EP2S180-5, it may take more than 4ns to get across the chip on a global, so the raw delay would fail timing.

d) The register is crossing clock domains, and so the recovery requirement is not realistic. For example, if the reset register is driven by an 8ns clock, and the destination register is driven by a 10ns clock, the requirement will end up being 2ns.
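The 2ns figure in d) comes from lining up the edges of the two clocks and finding the closest launch-to-latch pair. The following Python sketch shows that arithmetic under simplifying assumptions: both clocks have a rising edge at time 0, there is no phase offset, and periods are given in integer picoseconds. The function name is hypothetical, and this is only an illustration of the edge relationship, not the timing analyzer's implementation.

```python
from math import gcd


def tightest_requirement_ps(launch_period_ps, latch_period_ps):
    """Smallest positive launch-to-latch edge separation, in picoseconds.

    Assumes both clocks rise at time 0 with no phase offset, and walks the
    launch edges over one common repeat interval of the two clocks.
    """
    repeat = launch_period_ps * latch_period_ps // gcd(launch_period_ps, latch_period_ps)
    best = None
    for launch in range(0, repeat, launch_period_ps):
        # First latch edge strictly after this launch edge.
        latch = (launch // latch_period_ps + 1) * latch_period_ps
        gap = latch - launch
        if best is None or gap < best:
            best = gap
    return best


print(tightest_requirement_ps(8000, 10000))   # 2000 ps: launch at 8ns, latch at 10ns
print(tightest_requirement_ps(10000, 10000))  # 10000 ps: same clock, a full period
```

For the 8ns and 10ns clocks the launch edge at 8ns is followed by a latch edge at 10ns, which leaves only 2ns for the reset path.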

So what can a user do? Here are some suggestions, which must be balanced against what is going wrong:

a) If driving a global and the reset register is not placed near the global driver, then the user should add a location assignment to that register, forcing it near the global.

b) If the asynchronous reset fails timing and does not use global routing, try assigning it to use a global. (In the Assignment Editor, make a Global = On assignment to the source register driving the asynchronous signal, e.g. register B in Figure 2.)

c) If the asynchronous reset fails timing and uses global routing, try using local routing. (In the Assignment Editor, make a Global = Off assignment to the source register driving the asynchronous signal, e.g. register B in Figure 2.)

Note that b) and c) are exact opposites of each other. If the asynchronous clear does not drive a lot of registers, it is often faster to use local routing, since it can be placed near the registers it drives and get to them very quickly. If the asynchronous clear drives a lot of registers that are spread across the device, then using a global is often faster. There is no exact science for this, which is why both are being recommended.

d) If crossing clock domains, re-register in the new clock domain so that the recovery requirements are not unreasonable. In general, a separate reset structure should be used for each group of synchronous clock domains. (If a PLL creates related clocks with periods of 20ns, 10ns and 5ns, then just a single reset structure could be used for all three domains, since their edges are aligned. But if it creates clocks with periods of 10ns and 6.666ns, then a separate reset structure should be used for each domain.)

e) If using local routing, duplicate the reset structure in large sub-hierarchies. A good example of this is shown in Application Note 470: Best Practices for Incremental Compilation Partitions and Floorplan Assignments. Search on “Cascaded Reset” to show the example. With a cascaded reset structure, the fan-outs of each reset will not span the chip, and it is recommended to not use globals for these resets, as they will quickly use up this valuable resource. This is often the only solution for high-speed domains. Note that almost all designs don’t care if different hierarchies come out of reset on different clock cycles, as long as all the registers within that hierarchy come out at the same time. So if a design uses one register to feed a top-level hierarchy, but has two cascaded registers to feed another hierarchy, it should be fine. If there is a problem, since the design meets timing, it will show up in simulation too.

f) If the logic is recovery-immune, add set_false_path assignments. For example, if a user’s state-machine is failing recovery timing, but the designer is confident it is immune to recovery failures, adding a set_false_path -from reset_register -to *state-machine* will cut it from timing analysis. It can be quite difficult to find all the logic that is immune to recovery errors, so this is generally not a recommended solution. If a customer is confident that their entire design is recovery/removal immune, then they can cut the entire net with a set_false_path -from reset_register.

g) Turn on the Physical Synthesis option Perform Automatic Asynchronous Signal Pipelining (see screenshot below). This is under Assignments -> Settings -> Physical Synthesis, and the description is given as:

Specifies that Quartus II should perform automatic insertion of pipeline stages for asynchronous clear and asynchronous load signals during fitting to increase circuit performance. This option is useful for asynchronous signals that are failing recovery and removal timing because they feed registers using a high-speed clock.

What if I want to use my asynchronous ports for logic instead of a system reset?

Some designers do this, most often in schematics, since the asynchronous port is visibly sitting there. For example, if some condition occurs and the user wants that to reset a counter, they may hook it up to the aclr port of the counter. This is strongly recommended against, as the asynchronous ports in a design are not intended for this type of logic, and instead the designer should use a synchronous port. Technically, in this example, the user would need to time the reset assertion not only to the counter, but through the counter to the destination registers it feeds. When the signal de-asserts, they would then only time it to the counter. This is not how static timing analysis tools work and the path through the register will not be analyzed, as that is not the intent of asynchronous ports in the FPGA.

Conclusion

Although this may seem like a lot of information, most designs meet recovery and removal timing without any user intervention. It is important that the designer think about their reset structure early on, and design it for their system requirements. This is usually not too difficult. As an example, many designs only require that the reset be synchronized (double-registered) when feeding unrelated clock domains. Once that is done, everything falls into place and meets timing on its own.

Tuesday, September 8, 2009

Impedance Matching (2)

Input Impedance, Output Impedance, and Impedance Matching

Input Impedance

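Input impedance is the equivalent impedance seen at a circuit's input terminals. Apply a voltage source U across the input and measure the input current I; the input impedance Rin is then U/I. You can picture the input terminals as the two ends of a resistor whose resistance is the input impedance.

Input impedance is no different from an ordinary reactive component: it reflects how strongly the input opposes current. For a voltage-driven circuit, the larger the input impedance, the lighter the load on the voltage source, so the circuit is easier to drive and disturbs the signal source less. For a current-driven circuit, the smaller the input impedance, the lighter the load on the current source. We can therefore say that when driving with a voltage source, the higher the input impedance the better, and when driving with a current source, the lower the better. (Note: this applies only to low-frequency circuits. In high-frequency circuits impedance matching must also be considered, as it must whenever maximum output power is required.)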

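Output Impedance

Signal sources, amplifiers, and power supplies all have an output impedance. Output impedance is the internal resistance of a signal source. An ideal voltage source (including a power supply) would have an internal resistance of 0, and an ideal current source would have an infinite impedance. Output impedance deserves particular attention in circuit design.

Real voltage sources cannot achieve this. We usually model a real voltage source as an ideal voltage source in series with a resistor r. This series resistor r is the internal resistance of the signal source, amplifier output, or power supply. When the voltage source supplies a load, a current I flows through the load and produces a voltage drop of I×r across this resistor. That lowers the output voltage of the source and so limits the maximum output power (for why it limits the maximum output power, see the "Impedance Matching" section below). Likewise, an ideal current source would have infinite output impedance, which a real circuit cannot achieve.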

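Impedance Matching

Impedance matching refers to a suitable pairing between a signal source or transmission line and its load. It is discussed separately for the low-frequency and high-frequency cases. We start with a DC voltage source driving a load. Because a real voltage source always has internal resistance (see the "Output Impedance" section), we can model it as an ideal voltage source in series with a resistor r. Let the load resistance be R, the source EMF be U, and the internal resistance be r. The current through R is I=U/(R+r), so the smaller the load resistance R, the larger the output current. The voltage across the load is Uo=IR=U/[1+(r/R)], so the larger the load resistance R, the higher the output voltage Uo. The power dissipated in R is: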

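$$
\begin{aligned}
P &= I^2 R = \left[\frac{U}{R+r}\right]^2 R = \frac{U^2 R}{R^2 + 2Rr + r^2} \\
  &= \frac{U^2 R}{(R-r)^2 + 4Rr} \\
  &= \frac{U^2}{\dfrac{(R-r)^2}{R} + 4r}
\end{aligned}
$$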

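For a given signal source the internal resistance r is fixed, while the load resistance R is ours to choose. Note the term (R-r)^2/R in the expression: when R=r it takes its minimum value 0, and the load resistance R then receives the maximum output power Pmax=U^2/(4×r). In other words, the load receives maximum output power when its resistance equals the internal resistance of the signal source. This is one of the things we commonly mean by impedance matching. For purely resistive circuits this conclusion holds in both low-frequency and high-frequency circuits. When an AC circuit contains capacitive or inductive impedance, the conclusion changes: the real parts of the source and load impedances must be equal and the imaginary parts must be opposite in sign. This is called conjugate matching.

In low-frequency circuits we generally do not consider matching of the transmission line and only look at the relationship between the signal source and the load. The wavelength of a low-frequency signal is very long relative to the transmission line, so the line can be treated as a "short line" and reflections can be ignored (one way to think of it: because the line is short, even a reflected signal is still the same as the original). From the analysis above we can conclude: if we need a large output current, choose a small load R; if we need a large output voltage, choose a large load R; if we need maximum output power, choose a load R that matches the internal resistance of the signal source.

Impedance mismatch sometimes has another meaning. For example, the outputs of some instruments are designed for a specific load condition; if the load condition changes, the original performance may not be reached, and we also call this an impedance mismatch.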
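The maximum-power result can be checked numerically with a short Python sketch. The values U = 10 V and r = 50 Ω and the list of candidate loads are arbitrary choices for illustration, not figures from the text.

```python
def load_power(u, r, load):
    """Power in the load (watts) for EMF u (volts), source internal
    resistance r (ohms) and load resistance `load` (ohms):
    P = U^2 * R / (R + r)^2."""
    return u * u * load / (load + r) ** 2


u, r = 10.0, 50.0                       # illustrative values only
loads = [5.0, 25.0, 50.0, 100.0, 500.0]
powers = {load: load_power(u, r, load) for load in loads}
best_load = max(powers, key=powers.get)

for load in loads:
    print(f"R = {load:6.1f} ohm  ->  P = {powers[load]:.4f} W")
print("best load:", best_load, "ohm; Pmax formula U^2/(4r) =", u * u / (4 * r), "W")
```

Among the candidate loads, the power peaks at R = r = 50 Ω, where it equals U^2/(4r) = 0.5 W, and falls off for loads that are either smaller or larger.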

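In high-frequency circuits we must also consider reflections. When the signal frequency is very high, its wavelength is very short. Once the wavelength becomes comparable to the length of the transmission line, the reflected signal superimposed on the original signal changes the shape of the original. If the characteristic impedance of the transmission line is not equal to the load impedance (that is, they are mismatched), a reflection is produced at the load end. Why a mismatch causes reflection, and how the characteristic impedance is derived, involves solving second-order partial differential equations, which we will not go into here; interested readers can consult the transmission line theory in books on electromagnetic fields and microwaves. The characteristic impedance of a transmission line is determined by the line's structure and materials, and is independent of the line's length and of the signal's amplitude and frequency.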

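For example, the coaxial cable commonly used for cable TV has a characteristic impedance of 75Ω, while RF equipment commonly uses coaxial cable with a characteristic impedance of 50Ω. Another common transmission line is flat twin-lead with a characteristic impedance of 300Ω, often seen on rural TV antenna masts as the feed line for Yagi antennas. Because the RF input of a TV set has an input impedance of 75Ω, a 300Ω feed line does not match it. How is this solved in practice? You may have noticed that the accessories for a TV set include a 300Ω-to-75Ω impedance converter (a plastic-encased part with a round plug on one end, about the size of two thumbs). Inside it is a transmission-line transformer that converts the 300Ω impedance to 75Ω so that the two can be matched.

One point needs emphasis: characteristic impedance is not the same concept as resistance as we usually understand it. It is independent of the length of the transmission line and cannot be measured with an ohmmeter. To avoid reflections, the load impedance should equal the characteristic impedance of the transmission line; this is transmission-line impedance matching.

What are the consequences of a mismatch? Reflections form, energy is not fully delivered, and efficiency drops. Standing waves form on the transmission line (simply put, the signal is strong in some places and weak in others), which reduces the effective power capacity of the line. Power cannot be radiated, and the transmitting equipment may even be damaged. If a high-speed signal trace on a circuit board is not matched to the load impedance, the result is ringing, radiated interference, and similar problems.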
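The size of the reflection in the TV example can be estimated with the standard transmission-line reflection coefficient, Γ = (Z_load − Z0)/(Z_load + Z0). This formula is textbook transmission line theory rather than something stated in the text above, and the Python sketch below treats both impedances as purely resistive and the transformer as ideal.

```python
from math import sqrt


def reflection_coefficient(z_load, z0):
    """Voltage reflection coefficient at the load for a line of
    characteristic impedance z0 (ohms), assuming real impedances."""
    return (z_load - z0) / (z_load + z0)


# 300-ohm twin-lead connected directly to a 75-ohm TV input.
gamma = reflection_coefficient(75.0, 300.0)
reflected_power_fraction = gamma ** 2
print(gamma, reflected_power_fraction)

# A matched load produces no reflection.
print(reflection_coefficient(75.0, 75.0))

# An ideal transformer scales impedance by the square of its turns ratio,
# so converting 300 ohms to 75 ohms needs a 2:1 turns ratio.
turns_ratio = sqrt(300.0 / 75.0)
print(turns_ratio)
```

Without the converter, a noticeable share of the incident power would be reflected back down the feed line; with the two impedances equal, the reflection coefficient is zero.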

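When impedances do not match, how can they be made to match? First, a transformer can be used for impedance conversion, as in the TV example above. Second, series or parallel capacitors or inductors can be used, which is common when tuning RF circuits. Third, series or parallel resistors can be used. Some drivers have a fairly low impedance and can be matched to the transmission line with a suitable series resistor; high-speed signal lines, for example, sometimes have a series resistor of a few tens of ohms. Some receivers have a fairly high input impedance and can be matched to the transmission line with a parallel resistor; for example, RS-485 bus receivers often have a 120-ohm termination resistor connected in parallel across the end of the data lines.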