Discussion:
[libopencm3-devel] Delay required after enabling peripheral clock
Matthew Lai
2017-03-13 02:33:26 UTC
Permalink
Noticed this while going through the stm32f7 reference manual (RM0385),
and thought it's interesting:

Each peripheral clock can be enabled by the xxxxEN bit of the
RCC_AHBxENR or RCC_APBxENRy registers. When the peripheral clock is
not active, the peripheral registers read or write accesses are not
supported. The peripheral enable bit has a synchronization mechanism
to create a glitch free clock for the peripheral. After the enable
bit is set, there is a 2 peripheral clock cycles delay before the
clock being active.

Just after enabling the clock for a peripheral, software must wait
for a 2 peripheral clock cycles delay before accessing the
peripheral registers.

This is not mentioned in the stm32f42x reference manual for example.
However, there's something similar on the errata sheet (and for stm32l1
as well) [1][2]:

A delay between an RCC peripheral clock enable and the effective
peripheral enabling should be taken into account in order to manage
the peripheral read/write to registers. This delay depends on the
peripheral's mapping:

• If the peripheral is mapped on AHB: the delay should be equal to 2
AHB cycles.

• If the peripheral is mapped on APB: the delay should be equal to 1
+ (AHB/APB prescaler) cycles.

Should we add the delays to rcc_periph_clock_enable() and
rcc_peripheral_enable_clock()?

Matthew

[1]:
http://www.st.com/content/ccc/resource/technical/document/errata_sheet/68/8c/cb/af/de/48/40/17/DM00097022.pdf/files/DM00097022.pdf/jcr:content/translations/en.DM00097022.pdf

[2]:
http://www.st.com/content/ccc/resource/technical/document/errata_sheet/38/e6/37/64/08/38/45/67/DM00068628.pdf/files/DM00068628.pdf/jcr:content/translations/en.DM00068628.pdf
Chuck McManis
2017-03-13 02:44:12 UTC
Permalink
2 AHB cycles are generally essentially one instruction so a delay is
somewhat pointless, there are 4 AHB cycles consumed when returning from the
call to rcc_periph_clock_enable(). This limitation would be hit if you were
to pipeline the clock enable and the first register write in sequence so
something like

RCC_AHB1ENR |= RCC_AHB1ENR_GPIOAEN;
GPIO_MODSETR(GPIOA) |= 0x33;

This becomes an issue when you have enable and access instructions in the
mini-instruction cache of the F7 I suspect.

--Chuck
Post by Matthew Lai
Noticed this while going through the stm32f7 reference manual (RM0385),
Each peripheral clock can be enabled by the xxxxEN bit of the RCC_AHBxENR
or RCC_APBxENRy registers. When the peripheral clock is not active, the
peripheral registers read or write accesses are not supported. The
peripheral enable bit has a synchronization mechanism to create a glitch
free clock for the peripheral. After the enable bit is set, there is a 2
peripheral clock cycles delay before the clock being active.
Just after enabling the clock for a peripheral, software must wait for a 2
peripheral clock cycles delay before accessing the peripheral registers.
This is not mentioned in the stm32f42x reference manual for example.
However, there's something similar on the errata sheet (and for stm32l1 as
A delay between an RCC peripheral clock enable and the effective
peripheral enabling should be taken into account in order to manage the
peripheral read/write to registers. This delay depends on the peripheral's
• If the peripheral is mapped on AHB: the delay should be equal to 2 AHB
cycles.
• If the peripheral is mapped on APB: the delay should be equal to 1 +
(AHB/APB prescaler) cycles.
Should we add the delays to rcc_periph_clock_enable() and
rcc_peripheral_enable_clock()?
Matthew
[1]: http://www.st.com/content/ccc/resource/technical/document/
errata_sheet/68/8c/cb/af/de/48/40/17/DM00097022.pdf/files/
DM00097022.pdf/jcr:content/translations/en.DM00097022.pdf
[2]: http://www.st.com/content/ccc/resource/technical/document/
errata_sheet/38/e6/37/64/08/38/45/67/DM00068628.pdf/files/
DM00068628.pdf/jcr:content/translations/en.DM00068628.pdf
------------------------------------------------------------
------------------
Announcing the Oxford Dictionaries API! The API offers world-renowned
dictionary content that is easy and intuitive to access. Sign up for an
account today to start using our lexical data to power your apps and
projects. Get started today and enter our developer competition.
http://sdm.link/oxford
_______________________________________________
libopencm3-devel mailing list
https://lists.sourceforge.net/lists/listinfo/libopencm3-devel
Matthew Lai
2017-03-13 11:25:16 UTC
Permalink
Oops, looks like I only replied to Chuck.

--------------------------------------------------------------------------

When compiling with optimizations, I would expect
rcc_periph_clock_enable() to be inlined even at -Os since function body
(one instruction after constant propagation) is smaller than the code to
call the function.

Peripherals on APB would also need more cycles.
Post by Chuck McManis
Try to break it. You know the conditions try to create them.
I suspect it's not so easy to reproduce - otherwise we would be seeing
many more bug reports.

It may only happen for some peripherals when the peripheral and the CPU
are in specific states.

I don't think trying to break it would really be useful - a positive
result means we need to fix it, and a negative result still doesn't mean
we don't have to worry about it.

Matthew
Post by Chuck McManis
2 AHB cycles are generally essentially one instruction so a delay is
somewhat pointless, there are 4 AHB cycles consumed when returning
from the call to rcc_periph_clock_enable(). This limitation would be
hit if you were to pipeline the clock enable and the first register
write in sequence so something like
RCC_AHB1ENR |= RCC_AHB1ENR_GPIOAEN;
GPIO_MODSETR(GPIOA) |= 0x33;
This becomes an issue when you have enable and access instructions in
the mini-instruction cache of the F7 I suspect.
--Chuck
Noticed this while going through the stm32f7 reference manual
Each peripheral clock can be enabled by the xxxxEN bit of the
RCC_AHBxENR or RCC_APBxENRy registers. When the peripheral
clock is not active, the peripheral registers read or write
accesses are not supported. The peripheral enable bit has a
synchronization mechanism to create a glitch free clock for
the peripheral. After the enable bit is set, there is a 2
peripheral clock cycles delay before the clock being active.
Just after enabling the clock for a peripheral, software must
wait for a 2 peripheral clock cycles delay before accessing
the peripheral registers.
This is not mentioned in the stm32f42x reference manual for
example. However, there's something similar on the errata sheet
A delay between an RCC peripheral clock enable and the
effective peripheral enabling should be taken into account in
order to manage the peripheral read/write to registers. This
• If the peripheral is mapped on AHB: the delay should be
equal to 2 AHB cycles.
• If the peripheral is mapped on APB: the delay should be
equal to 1 + (AHB/APB prescaler) cycles.
Should we add the delays to rcc_periph_clock_enable() and
rcc_peripheral_enable_clock()?
Matthew
http://www.st.com/content/ccc/resource/technical/document/errata_sheet/68/8c/cb/af/de/48/40/17/DM00097022.pdf/files/DM00097022.pdf/jcr:content/translations/en.DM00097022.pdf
<http://www.st.com/content/ccc/resource/technical/document/errata_sheet/68/8c/cb/af/de/48/40/17/DM00097022.pdf/files/DM00097022.pdf/jcr:content/translations/en.DM00097022.pdf>
http://www.st.com/content/ccc/resource/technical/document/errata_sheet/38/e6/37/64/08/38/45/67/DM00068628.pdf/files/DM00068628.pdf/jcr:content/translations/en.DM00068628.pdf
<http://www.st.com/content/ccc/resource/technical/document/errata_sheet/38/e6/37/64/08/38/45/67/DM00068628.pdf/files/DM00068628.pdf/jcr:content/translations/en.DM00068628.pdf>
------------------------------------------------------------------------------
Announcing the Oxford Dictionaries API! The API offers world-renowned
dictionary content that is easy and intuitive to access. Sign up for an
account today to start using our lexical data to power your apps and
projects. Get started today and enter our developer competition.
http://sdm.link/oxford
_______________________________________________
libopencm3-devel mailing list
https://lists.sourceforge.net/lists/listinfo/libopencm3-devel
<https://lists.sourceforge.net/lists/listinfo/libopencm3-devel>
Karl Palsson
2017-03-13 11:44:31 UTC
Permalink
Post by Matthew Lai
Oops, looks like I only replied to Chuck.
--------------------------------------------------------------------------
When compiling with optimizations, I would expect
rcc_periph_clock_enable() to be inlined even at -Os since
function body (one instruction after constant propagation) is
smaller than the code to call the function.
except it probably won't, because it's not static, and not marked
inline. It might get inlined if you use LTO, but I wouldn't rely
on it.

eg

```
rcc_periph_clock_enable(RCC_TIM6);
80008bc: f240 4084 movw r0, #1156 ; 0x484
80008c0: f001 ffeb bl 800289a <rcc_periph_clock_enable>
rcc_periph_clock_enable(RCC_TIM7);
80008c4: f240 4085 movw r0, #1157 ; 0x485
80008c8: f001 ffe7 bl 800289a <rcc_periph_clock_enable>
```
Post by Matthew Lai
Peripherals on APB would also need more cycles.
Try to break it. You know the conditions try to create them.
I suspect it's not so easy to reproduce - otherwise we would be
seeing many more bug reports.
It may only happen for some peripherals when the peripheral and
the CPU are in specific states.
I don't think trying to break it would really be useful - a
positive result means we need to fix it, and a negative result
still doesn't mean we don't have to worry about it.
I would ignore it. There's very few places where it will really matter, and the people who will need it, will be handling it themselves anyway, not using high level helper functions.
Matthew Lai
2017-03-13 12:18:03 UTC
Permalink
Post by Karl Palsson
Post by Matthew Lai
Oops, looks like I only replied to Chuck.
--------------------------------------------------------------------------
When compiling with optimizations, I would expect
rcc_periph_clock_enable() to be inlined even at -Os since
function body (one instruction after constant propagation) is
smaller than the code to call the function.
except it probably won't, because it's not static, and not marked
inline. It might get inlined if you use LTO, but I wouldn't rely
on it.
eg
```
rcc_periph_clock_enable(RCC_TIM6);
80008bc: f240 4084 movw r0, #1156 ; 0x484
80008c0: f001 ffeb bl 800289a <rcc_periph_clock_enable>
rcc_periph_clock_enable(RCC_TIM7);
80008c4: f240 4085 movw r0, #1157 ; 0x485
80008c8: f001 ffe7 bl 800289a <rcc_periph_clock_enable>
```
For things like fast GPIO toggling, turning on LTO is the easiest way to
get the same performance as writing to registers directly, without
having to change the code at all.
Post by Karl Palsson
Post by Matthew Lai
Peripherals on APB would also need more cycles.
Try to break it. You know the conditions try to create them.
I suspect it's not so easy to reproduce - otherwise we would be
seeing many more bug reports.
It may only happen for some peripherals when the peripheral and
the CPU are in specific states.
I don't think trying to break it would really be useful - a
positive result means we need to fix it, and a negative result
still doesn't mean we don't have to worry about it.
I would ignore it. There's very few places where it will really matter, and the people who will need it, will be handling it themselves anyway, not using high level helper functions.
It would mean things like:
rcc_periph_clock_enable(RCC_TIM3);
timer_set_period(TIM3, 1024);

Would violate this, at high prescalar settings (or low prescalar
settings with LTO). That is actually the example given here:
https://github.com/libopencm3/libopencm3/blob/master/lib/stm32/common/timer_common_all.c#L70

It would be a potential source of very difficult to find bugs.

Since I imagine rcc_periph_clock_enable() to not be performance-critical
in most applications, wouldn't adding the wait states be better default
behaviour, and people who really need to enable clocks very quickly can
go to registers directly?

Thanks
Matthew
Karl Palsson
2017-03-13 13:25:34 UTC
Permalink
Post by Matthew Lai
Post by Karl Palsson
I would ignore it. There's very few places where it will really matter, and the people who will need it, will be handling it themselves anyway, not using high level helper functions.
rcc_periph_clock_enable(RCC_TIM3);
timer_set_period(TIM3, 1024);
Would violate this, at high prescalar settings (or low
prescalar settings with LTO). That is actually the example
https://github.com/libopencm3/libopencm3/blob/master/lib/stm32/common/timer_common_all.c#L70
It would be a potential source of very difficult to find bugs.
There's plenty of edge cases if you really want to go looking.
I'm not really entirely sure how far to go trying to protect
people. [1]
Post by Matthew Lai
Since I imagine rcc_periph_clock_enable() to not be
performance-critical in most applications, wouldn't adding the
wait states be better default behaviour, and people who really
need to enable clocks very quickly can go to registers
directly?
It's more that I'm not sure you can do it neatly, and be robust
enough to be worth trying. You'll need to interpret what bus a
peripheral is on, as well as what speed it's currently running
at. Doing this using the rcc_axx_speed globals is... ok, but
they're not set if people didn't use clock helpers to set them
up.

Adding your peripheral speed as a parameter is a gross user API.
For the vast majority of use cases, this is simply never going to
be a problem. Doing any sort of meaningful cycle delay is going
to just add code that most people will never need.

In my opinion at least :)

Cheers,
Karl P

[1] see
https://github.com/libopencm3/libopencm3/commit/095ed8511a82e7f7da5c59c42ca116bb228a20d6
and there's a whole class of bugs likely related there. That's
just off the top of my head.
Matthew Lai
2017-03-13 17:57:07 UTC
Permalink
Post by Karl Palsson
Post by Matthew Lai
Post by Karl Palsson
I would ignore it. There's very few places where it will really matter, and the people who will need it, will be handling it themselves anyway, not using high level helper functions.
rcc_periph_clock_enable(RCC_TIM3);
timer_set_period(TIM3, 1024);
Would violate this, at high prescalar settings (or low
prescalar settings with LTO). That is actually the example
https://github.com/libopencm3/libopencm3/blob/master/lib/stm32/common/timer_common_all.c#L70
It would be a potential source of very difficult to find bugs.
There's plenty of edge cases if you really want to go looking.
I'm not really entirely sure how far to go trying to protect
people. [1]
That is true, but in this case, an user that uses libopencm3 functions
to setup clocks, enables clock for a timer, and uses it right away will
be bitten.

I'd say we should protect the user in this case, since 1) it's a very
common sequence, 2) most users probably don't know about it (it's not
mentioned at all in reference manuals until f7), 3) resulting bugs will
be very hard to find, and 4) I think there's a pretty easy fix (see below).

APB1 is usually AHB/4 on f2, f4, and f7 at least. That means 5 cycles
are required for f2 and f4, and 8 cycles on f7. I wouldn't want to rely
on the compiler to happen to generate that much delay, especially on f7
which is dual-issue (8 cycles can mean 16 instructions).
Post by Karl Palsson
Post by Matthew Lai
Since I imagine rcc_periph_clock_enable() to not be
performance-critical in most applications, wouldn't adding the
wait states be better default behaviour, and people who really
need to enable clocks very quickly can go to registers
directly?
It's more that I'm not sure you can do it neatly, and be robust
enough to be worth trying. You'll need to interpret what bus a
peripheral is on, as well as what speed it's currently running
at. Doing this using the rcc_axx_speed globals is... ok, but
they're not set if people didn't use clock helpers to set them
up.
Adding your peripheral speed as a parameter is a gross user API.
For the vast majority of use cases, this is simply never going to
be a problem. Doing any sort of meaningful cycle delay is going
to just add code that most people will never need.
In my opinion at least :)
According to the errata sheets, it's very easy on pre-f7 chips. All
that's required is a DSB barrier instruction. We don't actually have to
do any cycle counting, or determining which bus the peripheral is on.

Unfortunately that option is not mentioned for f7, which requires 2x
peripheral clock cycles (could be because of the new AXIM bus). However,
there's still no need to figure out which bus a peripheral is on - we
can always just do enough delay for APB1. There's no harm in delaying
for longer than necessary, unless someone needs to enable clocks really
quickly. Figuring out APB1 frequency is not easy. I would just use the
globals and add a note in the documentation that users are on their own
if they are setting up their own clocks.

Matthew
Chuck McManis
2017-03-13 19:41:53 UTC
Permalink
Matthew, it feels like you are trying to create an argument here.

If you want to be helpful, and this is actually a problem given the way
libopencm3 is built, then *reproduce the problem* with libopencm3. You can
use any combination of compiler flags you want, you just can't change the
source inside of libopencm3.

*If* you can reproduce the problem then that will show what the window of
opportunity is. If you cannot, then you will see that Karl is correct in
that it won't occur in libopencm3.

If having this issue is such a problem for you, then by all means use the
ST Micro supplied software which is written presumably by their experts.

---Chuck
Post by Karl Palsson
I would ignore it. There's very few places where it will really matter,
Post by Matthew Lai
Post by Karl Palsson
and the people who will need it, will be handling it themselves anyway, not
using high level helper functions.
rcc_periph_clock_enable(RCC_TIM3);
timer_set_period(TIM3, 1024);
Would violate this, at high prescalar settings (or low
prescalar settings with LTO). That is actually the example
https://github.com/libopencm3/libopencm3/blob/master/lib/stm
32/common/timer_common_all.c#L70
It would be a potential source of very difficult to find bugs.
There's plenty of edge cases if you really want to go looking.
I'm not really entirely sure how far to go trying to protect
people. [1]
That is true, but in this case, an user that uses libopencm3 functions to
setup clocks, enables clock for a timer, and uses it right away will be
bitten.
I'd say we should protect the user in this case, since 1) it's a very
common sequence, 2) most users probably don't know about it (it's not
mentioned at all in reference manuals until f7), 3) resulting bugs will be
very hard to find, and 4) I think there's a pretty easy fix (see below).
APB1 is usually AHB/4 on f2, f4, and f7 at least. That means 5 cycles are
required for f2 and f4, and 8 cycles on f7. I wouldn't want to rely on the
compiler to happen to generate that much delay, especially on f7 which is
dual-issue (8 cycles can mean 16 instructions).
Post by Karl Palsson
Since I imagine rcc_periph_clock_enable() to not be
Post by Matthew Lai
performance-critical in most applications, wouldn't adding the
wait states be better default behaviour, and people who really
need to enable clocks very quickly can go to registers
directly?
It's more that I'm not sure you can do it neatly, and be robust
enough to be worth trying. You'll need to interpret what bus a
peripheral is on, as well as what speed it's currently running
at. Doing this using the rcc_axx_speed globals is... ok, but
they're not set if people didn't use clock helpers to set them
up.
Adding your peripheral speed as a parameter is a gross user API.
For the vast majority of use cases, this is simply never going to
be a problem. Doing any sort of meaningful cycle delay is going
to just add code that most people will never need.
In my opinion at least :)
According to the errata sheets, it's very easy on pre-f7 chips. All that's
required is a DSB barrier instruction. We don't actually have to do any
cycle counting, or determining which bus the peripheral is on.
Unfortunately that option is not mentioned for f7, which requires 2x
peripheral clock cycles (could be because of the new AXIM bus). However,
there's still no need to figure out which bus a peripheral is on - we can
always just do enough delay for APB1. There's no harm in delaying for
longer than necessary, unless someone needs to enable clocks really
quickly. Figuring out APB1 frequency is not easy. I would just use the
globals and add a note in the documentation that users are on their own if
they are setting up their own clocks.
Matthew
Matthew Lai
2017-03-13 20:01:23 UTC
Permalink
Chuck, I'm not sure why you feel that way, but having disagreements and
discussions about technical aspects of a project is usually a good thing.

I am also not sure how I am not being helpful by pointing out something
on the errata sheets that would be a potential source of bugs, some ways
it can happen, and some ways to mitigate it, for different chips, based
on the chip manufacturer's recommendations.

Use any combination of compiler flags to do what? Prove that there will
be a less than 16 instructions delay between clock enable and the first
peripheral register write? Does that really need proving? You don't even
need to change compiler flags for that.

I also have no idea what ST's supplied software has anything to do with
any of this.

Can we go back to technical discussions now?

If so, here is how Chromium OS fixed it:
https://chromium-review.googlesource.com/c/246181/

Matthew
Post by Chuck McManis
Matthew, it feels like you are trying to create an argument here.
If you want to be helpful, and this is actually a problem given the
way libopencm3 is built, then *reproduce the problem* with libopencm3.
You can use any combination of compiler flags you want, you just can't
change the source inside of libopencm3.
*If* you can reproduce the problem then that will show what the window
of opportunity is. If you cannot, then you will see that Karl is
correct in that it won't occur in libopencm3.
If having this issue is such a problem for you, then by all means use
the ST Micro supplied software which is written presumably by their
experts.
---Chuck
I would ignore it. There's very few places where it
will really matter, and the people who will need it,
will be handling it themselves anyway, not using high
level helper functions.
rcc_periph_clock_enable(RCC_TIM3);
timer_set_period(TIM3, 1024);
Would violate this, at high prescalar settings (or low
prescalar settings with LTO). That is actually the example
https://github.com/libopencm3/libopencm3/blob/master/lib/stm32/common/timer_common_all.c#L70
<https://github.com/libopencm3/libopencm3/blob/master/lib/stm32/common/timer_common_all.c#L70>
It would be a potential source of very difficult to find bugs.
There's plenty of edge cases if you really want to go looking.
I'm not really entirely sure how far to go trying to protect
people. [1]
That is true, but in this case, an user that uses libopencm3
functions to setup clocks, enables clock for a timer, and uses it
right away will be bitten.
I'd say we should protect the user in this case, since 1) it's a
very common sequence, 2) most users probably don't know about it
(it's not mentioned at all in reference manuals until f7), 3)
resulting bugs will be very hard to find, and 4) I think there's a
pretty easy fix (see below).
APB1 is usually AHB/4 on f2, f4, and f7 at least. That means 5
cycles are required for f2 and f4, and 8 cycles on f7. I wouldn't
want to rely on the compiler to happen to generate that much
delay, especially on f7 which is dual-issue (8 cycles can mean 16
instructions).
Since I imagine rcc_periph_clock_enable() to not be
performance-critical in most applications, wouldn't adding the
wait states be better default behaviour, and people who really
need to enable clocks very quickly can go to registers
directly?
It's more that I'm not sure you can do it neatly, and be robust
enough to be worth trying. You'll need to interpret what bus a
peripheral is on, as well as what speed it's currently running
at. Doing this using the rcc_axx_speed globals is... ok, but
they're not set if people didn't use clock helpers to set them
up.
Adding your peripheral speed as a parameter is a gross user API.
For the vast majority of use cases, this is simply never going to
be a problem. Doing any sort of meaningful cycle delay is going
to just add code that most people will never need.
In my opinion at least :)
According to the errata sheets, it's very easy on pre-f7 chips.
All that's required is a DSB barrier instruction. We don't
actually have to do any cycle counting, or determining which bus
the peripheral is on.
Unfortunately that option is not mentioned for f7, which requires
2x peripheral clock cycles (could be because of the new AXIM bus).
However, there's still no need to figure out which bus a
peripheral is on - we can always just do enough delay for APB1.
There's no harm in delaying for longer than necessary, unless
someone needs to enable clocks really quickly. Figuring out APB1
frequency is not easy. I would just use the globals and add a note
in the documentation that users are on their own if they are
setting up their own clocks.
Matthew
Chuck McManis
2017-03-13 20:15:57 UTC
Permalink
Matthew,

I'm really grateful you brought up the warning you read in the F7 manual.
Clearly if someone tried to read or write a register and it wasn't ready
the system would hard fault and that would be bad.

I have reviewed the code and the warning in the datasheet and believe it is
impossible to hit this situation with the current library. Certainly on the
F0-F4 series and unlikely on the F7 series.

It sounded like you still had concerns. If you do, it would be super
helpful if you had an example that demonstrated this problem and especially
helpful if that example could be recreated on one of the ST demo boards.

--Chuck
Post by Matthew Lai
Chuck, I'm not sure why you feel that way, but having disagreements and
discussions about technical aspects of a project is usually a good thing.
I am also not sure how I am not being helpful by pointing out something on
the errata sheets that would be a potential source of bugs, some ways it
can happen, and some ways to mitigate it, for different chips, based on the
chip manufacturer's recommendations.
Use any combination of compiler flags to do what? Prove that there will be
a less than 16 instructions delay between clock enable and the first
peripheral register write? Does that really need proving? You don't even
need to change compiler flags for that.
I also have no idea what ST's supplied software has anything to do with
any of this.
Can we go back to technical discussions now?
If so, here is how Chromium OS fixed it: https://chromium-review.
googlesource.com/c/246181/
Matthew
Matthew, it feels like you are trying to create an argument here.
If you want to be helpful, and this is actually a problem given the way
libopencm3 is built, then *reproduce the problem* with libopencm3. You can
use any combination of compiler flags you want, you just can't change the
source inside of libopencm3.
*If* you can reproduce the problem then that will show what the window of
opportunity is. If you cannot, then you will see that Karl is correct in
that it won't occur in libopencm3.
If having this issue is such a problem for you, then by all means use the
ST Micro supplied software which is written presumably by their experts.
---Chuck
Post by Karl Palsson
I would ignore it. There's very few places where it will really matter,
Post by Matthew Lai
Post by Karl Palsson
and the people who will need it, will be handling it themselves anyway, not
using high level helper functions.
rcc_periph_clock_enable(RCC_TIM3);
timer_set_period(TIM3, 1024);
Would violate this, at high prescalar settings (or low
prescalar settings with LTO). That is actually the example
https://github.com/libopencm3/libopencm3/blob/master/lib/stm
32/common/timer_common_all.c#L70
It would be a potential source of very difficult to find bugs.
There's plenty of edge cases if you really want to go looking.
I'm not really entirely sure how far to go trying to protect
people. [1]
That is true, but in this case, an user that uses libopencm3 functions to
setup clocks, enables clock for a timer, and uses it right away will be
bitten.
I'd say we should protect the user in this case, since 1) it's a very
common sequence, 2) most users probably don't know about it (it's not
mentioned at all in reference manuals until f7), 3) resulting bugs will be
very hard to find, and 4) I think there's a pretty easy fix (see below).
APB1 is usually AHB/4 on f2, f4, and f7 at least. That means 5 cycles are
required for f2 and f4, and 8 cycles on f7. I wouldn't want to rely on the
compiler to happen to generate that much delay, especially on f7 which is
dual-issue (8 cycles can mean 16 instructions).
Post by Karl Palsson
Since I imagine rcc_periph_clock_enable() to not be
Post by Matthew Lai
performance-critical in most applications, wouldn't adding the
wait states be better default behaviour, and people who really
need to enable clocks very quickly can go to registers
directly?
It's more that I'm not sure you can do it neatly, and be robust
enough to be worth trying. You'll need to interpret what bus a
peripheral is on, as well as what speed it's currently running
at. Doing this using the rcc_axx_speed globals is... ok, but
they're not set if people didn't use clock helpers to set them
up.
Adding your peripheral speed as a parameter is a gross user API.
For the vast majority of use cases, this is simply never going to
be a problem. Doing any sort of meaningful cycle delay is going
to just add code that most people will never need.
In my opinion at least :)
According to the errata sheets, it's very easy on pre-f7 chips. All
that's required is a DSB barrier instruction. We don't actually have to do
any cycle counting, or determining which bus the peripheral is on.
Unfortunately that option is not mentioned for f7, which requires 2x
peripheral clock cycles (could be because of the new AXIM bus). However,
there's still no need to figure out which bus a peripheral is on - we can
always just do enough delay for APB1. There's no harm in delaying for
longer than necessary, unless someone needs to enable clocks really
quickly. Figuring out APB1 frequency is not easy. I would just use the
globals and add a note in the documentation that users are on their own if
they are setting up their own clocks.
Matthew
Matthew Lai
2017-03-13 20:58:22 UTC
Permalink
Code:

---------------------------------------------------------------------

#include <libopencm3/stm32/rcc.h>
#include <libopencm3/stm32/timer.h>

int main(void)
{
rcc_periph_clock_enable(RCC_TIM3);
timer_set_period(TIM3, 1);
return 0;
}

---------------------------------------------------------------------

Using default compiler options for f4:

---------------------------------------------------------------------

080001ac <main>:
80001ac: b508 push {r3, lr}
80001ae: f640 0001 movw r0, #2049 ; 0x801
80001b2: f000 f80b bl 80001cc <rcc_periph_clock_enable>
80001b6: 2101 movs r1, #1
80001b8: 4802 ldr r0, [pc, #8] ; (80001c4 <main+0x18>)
80001ba: f000 f805 bl 80001c8 <timer_set_period>
80001be: 2000 movs r0, #0
80001c0: bd08 pop {r3, pc}
80001c2: bf00 nop
80001c4: 40000400 .word 0x40000400

080001c8 <timer_set_period>:
80001c8: 62c1 str r1, [r0, #44] ; 0x2c
80001ca: 4770 bx lr

080001cc <rcc_periph_clock_enable>:
80001cc: 0943 lsrs r3, r0, #5
80001ce: f103 4380 add.w r3, r3, #1073741824 ; 0x40000000
80001d2: f503 330e add.w r3, r3, #145408 ; 0x23800
80001d6: f000 021f and.w r2, r0, #31
80001da: 6819 ldr r1, [r3, #0]
80001dc: 2001 movs r0, #1
80001de: 4090 lsls r0, r2
80001e0: 4308 orrs r0, r1
80001e2: 6018 str r0, [r3, #0]
80001e4: 4770 bx lr

---------------------------------------------------------------------

The store in rcc_periph_clock_enable happens at 80001e2. After returning
to main(), we take 2 cycles to load arguments to timer_set_period, then
calls it, and timer_set_period writes to the timer register in the first
instruction.

If my math is correct, the second write happens on the 5th cycle after
the first write. This violates the errata sheet constraint by one cycle.
For APB peripherals, the errata sheet says we need to wait 1 + (AHB/APB)
cycles, which in this case would be 5 (most people use (AHB/APB = 4).

Now this is with default compiler options on f4, with the lowest APB
prescaler. With a higher prescaler, it would be even worse. With -flto,
it would be even worse. On f7, we are very very far from the 8 cycles
required (keeping in mind that Cortex-M7 can execute 2 instructions per
cycle).

So I believe we are hitting it on all series (that require this delay),
at default compiler settings and clock settings provided by libopencm3.

Matthew
Post by Chuck McManis
Matthew,
I'm really grateful you brought up the warning you read in the F7
manual. Clearly if someone tried to read or write a register and it
wasn't ready the system would hard fault and that would be bad.
I have reviewed the code and the warning in the datasheet and believe
it is impossible to hit this situation with the current library.
Certainly on the F0-F4 series and unlikely on the F7 series.
It sounded like you still had concerns. If you do, it would be super
helpful if you had an example that demonstrated this problem and
especially helpful if that example could be recreated on one of the ST
demo boards.
--Chuck
Chuck, I'm not sure why you feel that way, but having
disagreements and discussions about technical aspects of a project
is usually a good thing.
I am also not sure how I am not being helpful by pointing out
something on the errata sheets that would be a potential source of
bugs, some ways it can happen, and some ways to mitigate it, for
different chips, based on the chip manufacturer's recommendations.
Use any combination of compiler flags to do what? Prove that there
will be a less than 16 instructions delay between clock enable and
the first peripheral register write? Does that really need
proving? You don't even need to change compiler flags for that.
I also have no idea what ST's supplied software has anything to do
with any of this.
Can we go back to technical discussions now?
https://chromium-review.googlesource.com/c/246181/
<https://chromium-review.googlesource.com/c/246181/>
Matthew
Post by Chuck McManis
Matthew, it feels like you are trying to create an argument here.
If you want to be helpful, and this is actually a problem given
the way libopencm3 is built, then *reproduce the problem* with
libopencm3. You can use any combination of compiler flags you
want, you just can't change the source inside of libopencm3.
*If* you can reproduce the problem then that will show what the
window of opportunity is. If you cannot, then you will see that
Karl is correct in that it won't occur in libopencm3.
If having this issue is such a problem for you, then by all means
use the ST Micro supplied software which is written presumably by
their experts.
---Chuck
I would ignore it. There's very few places where
it will really matter, and the people who will
need it, will be handling it themselves anyway,
not using high level helper functions.
rcc_periph_clock_enable(RCC_TIM3);
timer_set_period(TIM3, 1024);
Would violate this, at high prescalar settings (or low
prescalar settings with LTO). That is actually the
example
https://github.com/libopencm3/libopencm3/blob/master/lib/stm32/common/timer_common_all.c#L70
<https://github.com/libopencm3/libopencm3/blob/master/lib/stm32/common/timer_common_all.c#L70>
It would be a potential source of very difficult to
find bugs.
There's plenty of edge cases if you really want to go looking.
I'm not really entirely sure how far to go trying to protect
people. [1]
That is true, but in this case, an user that uses libopencm3
functions to setup clocks, enables clock for a timer, and
uses it right away will be bitten.
I'd say we should protect the user in this case, since 1)
it's a very common sequence, 2) most users probably don't
know about it (it's not mentioned at all in reference manuals
until f7), 3) resulting bugs will be very hard to find, and
4) I think there's a pretty easy fix (see below).
APB1 is usually AHB/4 on f2, f4, and f7 at least. That means
5 cycles are required for f2 and f4, and 8 cycles on f7. I
wouldn't want to rely on the compiler to happen to generate
that much delay, especially on f7 which is dual-issue (8
cycles can mean 16 instructions).
Since I imagine rcc_periph_clock_enable() to not be
performance-critical in most applications, wouldn't
adding the
wait states be better default behaviour, and people
who really
need to enable clocks very quickly can go to registers
directly?
It's more that I'm not sure you can do it neatly, and be robust
enough to be worth trying. You'll need to interpret what bus a
peripheral is on, as well as what speed it's currently running
at. Doing this using the rcc_axx_speed globals is... ok, but
they're not set if people didn't use clock helpers to set them
up.
Adding your peripheral speed as a parameter is a gross user API.
For the vast majority of use cases, this is simply never going to
be a problem. Doing any sort of meaningful cycle delay is going
to just add code that most people will never need.
In my opinion at least :)
According to the errata sheets, it's very easy on pre-f7
chips. All that's required is a DSB barrier instruction. We
don't actually have to do any cycle counting, or determining
which bus the peripheral is on.
Unfortunately that option is not mentioned for f7, which
requires 2x peripheral clock cycles (could be because of the
new AXIM bus). However, there's still no need to figure out
which bus a peripheral is on - we can always just do enough
delay for APB1. There's no harm in delaying for longer than
necessary, unless someone needs to enable clocks really
quickly. Figuring out APB1 frequency is not easy. I would
just use the globals and add a note in the documentation that
users are on their own if they are setting up their own clocks.
Matthew
Chuck McManis
2017-03-13 21:24:07 UTC
Permalink
Thank you Matthew, that is excellent.
--Chuck
Post by Matthew Lai
---------------------------------------------------------------------
#include <libopencm3/stm32/rcc.h>
#include <libopencm3/stm32/timer.h>
int main(void)
{
rcc_periph_clock_enable(RCC_TIM3);
timer_set_period(TIM3, 1);
return 0;
}
---------------------------------------------------------------------
---------------------------------------------------------------------
80001ac: b508 push {r3, lr}
80001ae: f640 0001 movw r0, #2049 ; 0x801
80001b2: f000 f80b bl 80001cc <rcc_periph_clock_enable>
80001b6: 2101 movs r1, #1
80001b8: 4802 ldr r0, [pc, #8] ; (80001c4 <main+0x18>)
80001ba: f000 f805 bl 80001c8 <timer_set_period>
80001be: 2000 movs r0, #0
80001c0: bd08 pop {r3, pc}
80001c2: bf00 nop
80001c4: 40000400 .word 0x40000400
80001c8: 62c1 str r1, [r0, #44] ; 0x2c
80001ca: 4770 bx lr
80001cc: 0943 lsrs r3, r0, #5
80001ce: f103 4380 add.w r3, r3, #1073741824 ; 0x40000000
80001d2: f503 330e add.w r3, r3, #145408 ; 0x23800
80001d6: f000 021f and.w r2, r0, #31
80001da: 6819 ldr r1, [r3, #0]
80001dc: 2001 movs r0, #1
80001de: 4090 lsls r0, r2
80001e0: 4308 orrs r0, r1
80001e2: 6018 str r0, [r3, #0]
80001e4: 4770 bx lr
---------------------------------------------------------------------
The store in rcc_periph_clock_enable happens at 80001e2. After returning
to main(), we take 2 cycles to load arguments to timer_set_period, then
calls it, and timer_set_period writes to the timer register in the first
instruction.
If my math is correct, the second write happens on the 5th cycle after the
first write. This violates the errata sheet constraint by one cycle. For
APB peripherals, the errata sheet says we need to wait 1 + (AHB/APB)
cycles, which in this case would be 5 (most people use (AHB/APB = 4).
Now this is with default compiler options on f4, with the lowest APB
prescaler. With a higher prescaler, it would be even worse. With -flto, it
would be even worse. On f7, we are very very far from the 8 cycles required
(keeping in mind that Cortex-M7 can execute 2 instructions per cycle).
So I believe we are hitting it on all series (that require this delay), at
default compiler settings and clock settings provided by libopencm3.
Matthew
Matthew,
I'm really grateful you brought up the warning you read in the F7 manual.
Clearly if someone tried to read or write a register and it wasn't ready
the system would hard fault and that would be bad.
I have reviewed the code and the warning in the datasheet and believe it
is impossible to hit this situation with the current library. Certainly on
the F0-F4 series and unlikely on the F7 series.
It sounded like you still had concerns. If you do, it would be super
helpful if you had an example that demonstrated this problem and especially
helpful if that example could be recreated on one of the ST demo boards.
--Chuck
Post by Matthew Lai
Chuck, I'm not sure why you feel that way, but having disagreements and
discussions about technical aspects of a project is usually a good thing.
I am also not sure how I am not being helpful by pointing out something
on the errata sheets that would be a potential source of bugs, some ways it
can happen, and some ways to mitigate it, for different chips, based on the
chip manufacturer's recommendations.
Use any combination of compiler flags to do what? Prove that there will
be a less than 16 instructions delay between clock enable and the first
peripheral register write? Does that really need proving? You don't even
need to change compiler flags for that.
I also have no idea what ST's supplied software has anything to do with
any of this.
Can we go back to technical discussions now?
<https://chromium-review.googlesource.com/c/246181/>
https://chromium-review.googlesource.com/c/246181/
Matthew
Matthew, it feels like you are trying to create an argument here.
If you want to be helpful, and this is actually a problem given the way
libopencm3 is built, then *reproduce the problem* with libopencm3. You can
use any combination of compiler flags you want, you just can't change the
source inside of libopencm3.
*If* you can reproduce the problem then that will show what the window of
opportunity is. If you cannot, then you will see that Karl is correct in
that it won't occur in libopencm3.
If having this issue is such a problem for you, then by all means use the
ST Micro supplied software which is written presumably by their experts.
---Chuck
Post by Matthew Lai
Post by Karl Palsson
I would ignore it. There's very few places where it will really
Post by Matthew Lai
Post by Karl Palsson
matter, and the people who will need it, will be handling it themselves
anyway, not using high level helper functions.
rcc_periph_clock_enable(RCC_TIM3);
timer_set_period(TIM3, 1024);
Would violate this, at high prescalar settings (or low
prescalar settings with LTO). That is actually the example
https://github.com/libopencm3/libopencm3/blob/master/lib/stm
32/common/timer_common_all.c#L70
It would be a potential source of very difficult to find bugs.
There's plenty of edge cases if you really want to go looking.
I'm not really entirely sure how far to go trying to protect
people. [1]
That is true, but in this case, an user that uses libopencm3 functions
to setup clocks, enables clock for a timer, and uses it right away will be
bitten.
I'd say we should protect the user in this case, since 1) it's a very
common sequence, 2) most users probably don't know about it (it's not
mentioned at all in reference manuals until f7), 3) resulting bugs will be
very hard to find, and 4) I think there's a pretty easy fix (see below).
APB1 is usually AHB/4 on f2, f4, and f7 at least. That means 5 cycles
are required for f2 and f4, and 8 cycles on f7. I wouldn't want to rely on
the compiler to happen to generate that much delay, especially on f7 which
is dual-issue (8 cycles can mean 16 instructions).
Post by Karl Palsson
Since I imagine rcc_periph_clock_enable() to not be
Post by Matthew Lai
performance-critical in most applications, wouldn't adding the
wait states be better default behaviour, and people who really
need to enable clocks very quickly can go to registers
directly?
It's more that I'm not sure you can do it neatly, and be robust
enough to be worth trying. You'll need to interpret what bus a
peripheral is on, as well as what speed it's currently running
at. Doing this using the rcc_axx_speed globals is... ok, but
they're not set if people didn't use clock helpers to set them
up.
Adding your peripheral speed as a parameter is a gross user API.
For the vast majority of use cases, this is simply never going to
be a problem. Doing any sort of meaningful cycle delay is going
to just add code that most people will never need.
In my opinion at least :)
According to the errata sheets, it's very easy on pre-f7 chips. All
that's required is a DSB barrier instruction. We don't actually have to do
any cycle counting, or determining which bus the peripheral is on.
Unfortunately that option is not mentioned for f7, which requires 2x
peripheral clock cycles (could be because of the new AXIM bus). However,
there's still no need to figure out which bus a peripheral is on - we can
always just do enough delay for APB1. There's no harm in delaying for
longer than necessary, unless someone needs to enable clocks really
quickly. Figuring out APB1 frequency is not easy. I would just use the
globals and add a note in the documentation that users are on their own if
they are setting up their own clocks.
Matthew
Karl Palsson
2017-03-13 23:01:08 UTC
Permalink
Post by Matthew Lai
---------------------------------------------------------------------
#include <libopencm3/stm32/rcc.h>
#include <libopencm3/stm32/timer.h>
int main(void)
{
rcc_periph_clock_enable(RCC_TIM3);
timer_set_period(TIM3, 1);
return 0;
}
---------------------------------------------------------------------
---------------------------------------------------------------------
80001ac: b508 push {r3, lr}
80001ae: f640 0001 movw r0, #2049 ; 0x801
80001b2: f000 f80b bl 80001cc <rcc_periph_clock_enable>
80001b6: 2101 movs r1, #1
80001b8: 4802 ldr r0, [pc, #8] ; (80001c4 <main+0x18>)
80001ba: f000 f805 bl 80001c8 <timer_set_period>
80001be: 2000 movs r0, #0
80001c0: bd08 pop {r3, pc}
80001c2: bf00 nop
80001c4: 40000400 .word 0x40000400
80001c8: 62c1 str r1, [r0, #44] ; 0x2c
80001ca: 4770 bx lr
80001cc: 0943 lsrs r3, r0, #5
80001ce: f103 4380 add.w r3, r3, #1073741824 ; 0x40000000
80001d2: f503 330e add.w r3, r3, #145408 ; 0x23800
80001d6: f000 021f and.w r2, r0, #31
80001da: 6819 ldr r1, [r3, #0]
80001dc: 2001 movs r0, #1
80001de: 4090 lsls r0, r2
80001e0: 4308 orrs r0, r1
80001e2: 6018 str r0, [r3, #0]
80001e4: 4770 bx lr
---------------------------------------------------------------------
The store in rcc_periph_clock_enable happens at 80001e2. After
returning to main(), we take 2 cycles to load arguments to
timer_set_period, then calls it, and timer_set_period writes to
the timer register in the first instruction.
If my math is correct, the second write happens on the 5th
cycle after the first write. This violates the errata sheet
constraint by one cycle. For APB peripherals, the errata sheet
says we need to wait 1 + (AHB/APB) cycles, which in this case
would be 5 (most people use (AHB/APB = 4).
Now this is with default compiler options on f4, with the
lowest APB prescaler. With a higher prescaler, it would be even
worse. With -flto, it would be even worse. On f7, we are very
very far from the 8 cycles required (keeping in mind that
Cortex-M7 can execute 2 instructions per cycle).
So I believe we are hitting it on all series (that require this
delay), at default compiler settings and clock settings
provided by libopencm3.
So, you should be able to configure the timer to actually output something, and otuput at the wrong speed because a write was too early right?
Matthew Lai
2017-03-13 23:28:38 UTC
Permalink
Post by Karl Palsson
So, you should be able to configure the timer to actually output
something, and otuput at the wrong speed because a write was too early
right?
The behaviour in this case is unspecified, but yes, I would guess that's
the most likely thing to happen. We don't know if it will happen all the
time, or only if the pipeline is in a certain state, or maybe only if
there's contention on AHB, or if it only happens with some peripherals.
ST didn't give a lot of detail on why there has to be this delay, but
it's now in many errata sheets, as well as f7's reference manual.

Matthew

Loading...