Saturday, 17 March 2018

Higher Speed ADC Part 4

The Structure Of The Code

Pi Side

The Pi side code is fairly straightforward as far as acquisition goes. We'll start up the system with the signal level on the strobe line "high", which we read through the GPIO interface. Then we wait for it to toggle, and every time it toggles we transfer a chunk of SPI data. And that's it.

We read the strobe through the Pi GPIO - we want to use GPIO 4 as an input. So we have a simple process:

  • Claim the GPIO pin
  • Set the direction
  • Read the values
We can claim the input pin using the sysfs interface (writing "4" to "/sys/class/gpio/export") and set the direction (writing "in" to "/sys/class/gpio/gpio4/direction"). Then we can read the values by looking in "/sys/class/gpio/gpio4/value". Easy peasy.
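
A minimal sketch of those three steps in C, assuming the sysfs paths quoted above. The helper names (write_text, read_gpio_value, claim_strobe_pin) are mine for illustration, not from the original code. Note that the kernel's sysfs GPIO interface expects the string "in" when setting an input direction.

```c
#include <stdio.h>

/* Write a short string to a sysfs-style file.
 * Returns 0 on success, -1 on failure. */
int write_text(const char *path, const char *text)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = (fputs(text, f) >= 0);
    if (fclose(f) != 0)          /* write errors can surface at close */
        ok = 0;
    return ok ? 0 : -1;
}

/* Read a GPIO "value" file, which holds the character '0' or '1'.
 * Returns 0 or 1, or -1 on error. */
int read_gpio_value(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int c = fgetc(f);
    fclose(f);
    if (c == '0')
        return 0;
    if (c == '1')
        return 1;
    return -1;
}

/* Claim GPIO 4 and make it an input. */
int claim_strobe_pin(void)
{
    /* The export write fails if the pin is already exported,
     * so the result is ignored and the direction write decides. */
    (void)write_text("/sys/class/gpio/export", "4");
    return write_text("/sys/class/gpio/gpio4/direction", "in");
}
```

After claim_strobe_pin() succeeds, read_gpio_value("/sys/class/gpio/gpio4/value") gives the current strobe level.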

Using the SPI is slightly more involved, but not very. First we have to make sure it's enabled in the Pi configuration tool, raspi-config.

When this is done the SPI device will be visible under "/dev/spidev0.0". We want to open this device, use system IOCTLs to configure it and then read data buffers from it when the toggle tells us to.

Setting up the SPI is simple; we need to configure the mode, speed and data format. The mode tells us how the SPI clock and data relate (i.e. which edge to use when sending and receiving). In this case we just want to use "MODE0" to match the setup of the STM32. SPI speed is the clock rate - we're requesting "16000000" for 16MHz, or "8000000" for 8MHz. Data format just specifies the number of bits we get in each transferred word. It's always 8 for us (a byte at a time). So, cutting out the error checks, the code would look like this:

    spi_mode = SPI_MODE_0;
    spi_bitsPerWord = 8;
    spi_speed = 16000000; // 16M

    spi_cs_fd = open("/dev/spidev0.0", O_RDWR);
    // each setting is written, then read back
    rvalue = ioctl(spi_cs_fd, SPI_IOC_WR_MODE, &spi_mode);
    rvalue = ioctl(spi_cs_fd, SPI_IOC_RD_MODE, &spi_mode);
    rvalue = ioctl(spi_cs_fd, SPI_IOC_WR_BITS_PER_WORD, &spi_bitsPerWord);
    rvalue = ioctl(spi_cs_fd, SPI_IOC_RD_BITS_PER_WORD, &spi_bitsPerWord);
    rvalue = ioctl(spi_cs_fd, SPI_IOC_WR_MAX_SPEED_HZ, &spi_speed);
    rvalue = ioctl(spi_cs_fd, SPI_IOC_RD_MAX_SPEED_HZ, &spi_speed);

Next up is the actual transfer. This is done by filling in a structure of type "spi_ioc_transfer" and passing it to the SPI_IOC_MESSAGE IOCTL. Assuming the send is of "length" bytes of "data" on device "spi_device", then:

    struct spi_ioc_transfer tr;

    memset(&tr, 0, sizeof(struct spi_ioc_transfer));
    tr.tx_buf = (unsigned long)data;   // same buffer both ways: received bytes overwrite "data"
    tr.rx_buf = (unsigned long)data;
    tr.len = length;
    tr.delay_usecs = 0;
    tr.speed_hz = 16000000;
    tr.bits_per_word = 8;
    retVal = ioctl(spi_device, SPI_IOC_MESSAGE(1), &tr);
So, our data transfer loop is to wait for the strobe pin to change value and then issue the transfer request.
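
As a rough sketch, that wait-then-transfer loop could look like the following. The function pointer hooks and the fake_link harness are hypothetical, added so the loop can be tried without hardware; on the Pi, read_strobe would read the GPIO value file and transfer would issue the SPI_IOC_MESSAGE ioctl.

```c
#include <stddef.h>

/* Hypothetical hooks: read_strobe returns the line level (0 or 1, <0 on
 * error); transfer moves one block over SPI (<0 on error). */
typedef int (*read_strobe_fn)(void *ctx);
typedef int (*transfer_fn)(void *ctx);

/* Wait for the strobe to change level, transfer one block, repeat.
 * Returns the number of blocks transferred, or -1 on error. */
int acquire_blocks(read_strobe_fn read_strobe, transfer_fn transfer,
                   void *ctx, int blocks_wanted)
{
    int last = read_strobe(ctx);   /* starting level of the line */
    if (last < 0)
        return -1;

    int done = 0;
    while (done < blocks_wanted) {
        int now = read_strobe(ctx);
        if (now < 0)
            return -1;
        if (now == last)
            continue;              /* no toggle yet: keep polling */
        last = now;
        if (transfer(ctx) < 0)
            return -1;
        done++;
    }
    return done;
}

/* Fake link, for running the loop without hardware. */
struct fake_link {
    const int *levels;   /* strobe levels returned by successive reads */
    size_t count;        /* number of entries in levels */
    size_t pos;          /* next entry to return */
    int transfers;       /* how many transfers were issued */
};

int fake_read_strobe(void *ctx)
{
    struct fake_link *l = ctx;
    return (l->pos < l->count) ? l->levels[l->pos++] : -1;
}

int fake_transfer(void *ctx)
{
    struct fake_link *l = ctx;
    l->transfers++;
    return 0;
}
```

This version spins on the strobe, which is the simplest thing that matches the description; a real loop may prefer to sleep between reads.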

STM32 Side

This is really a mash up of three of the reference applications: the ADC/DMA, the SPI/DMA and the UART debug application. The peripheral setup is taken (more or less) directly from these reference applications. We build a default application skeleton (using the STM32 Ac6/openstm32 toolset), then move the MspInit and MspDeInit routines from the reference code into a common stm32f7xx_hal_msp.c file. Other than restructuring the includes and making sure the defines in hal_conf.h are correct, this is basically just boilerplate.

The UART debug module is useful to include for debug tracing, but is generally too intrusive to use outside error cases.

The reference examples are in the STM32CubeF7 Firmware release under the directory Projects/STM32F767ZI-Nucleo/Examples. I'm using V1.8.0.

The main loop on the STM side is simply the following sequence in a continual loop:

    Wait For ADC Half Complete
    Setup SPI Transfer 
    Toggle Strobe GPIO 

    Wait For ADC Complete
    Setup SPI Transfer 
    Toggle Strobe GPIO 

The completion of ADC conversion and SPI transfer is signalled using volatiles. In the case of the STM32 code these are denoted with the __IO type, e.g.

__IO uint32_t conversionReady;
__IO uint32_t conversionHalfReady;
These variables are set in the interrupt handler, and checked/cleared in the main code. I'm not a fan of this approach, since volatile guarantees neither atomicity nor memory ordering and really shouldn't be used as a cheap semaphore substitute in this way. However, this kind of signalling is fairly common in the HAL, and since this is just a check/set/clear process it should be fine.

So, for example, we have the ADC handlers:

void HAL_ADC_ConvHalfCpltCallback(ADC_HandleTypeDef* AdcHandle) {
    conversionHalfReady = true;
}

void HAL_ADC_ConvCpltCallback(ADC_HandleTypeDef* AdcHandle) {
    conversionReady = true;
}
And then main code can do tests like:
    if (conversionHalfReady == true) {
        conversionHalfReady = false;
        return true;
    }
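
The check-and-clear test can be wrapped in a small helper so both flags share it. This is a sketch, with take_flag being a name I've made up; plain volatile stands in for the HAL's __IO so it builds off-target.

```c
#include <stdbool.h>
#include <stdint.h>

/* On the STM32 this would be declared with __IO. */
volatile uint32_t conversionHalfReady;

/* Report a flag once per time it is set, clearing it on the way out. */
bool take_flag(volatile uint32_t *flag)
{
    if (*flag) {
        *flag = false;
        return true;
    }
    return false;
}
```

The check and the clear are two separate accesses, so this only holds up while each flag has a single writer (the interrupt handler) and a single reader (the main loop), and the main loop gets round to the flag before it is set a second time.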

As mentioned, the tail of one SPI transfer can overlap with the scheduling of the next, so our SPI send is actually:

  Wait for SPI state to become HAL_SPI_STATE_READY
  Issue SPI DMA Transmit

Test Mode

To make sure that we have a solid link, we initially send a test pattern through the interface. This is just a counting sequence, which is easy to check when we recover it from the incoming buffer. The ADC conversion runs normally on the STM32 side, with the test data buffers substituted in for the final transfer.
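
A counting pattern and its check might look like this. It's a sketch that assumes 16-bit samples (two bytes each, as on the link) and a counter that wraps at 16 bits; the function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill a buffer with a counting sequence beginning at 'start'.
 * The count wraps naturally at 16 bits. */
void fill_counting_pattern(uint16_t *buf, size_t n, uint16_t start)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint16_t)(start + i);
}

/* Return the index of the first sample that breaks the sequence,
 * or n if every sample follows on from the one before it. */
size_t check_counting_pattern(const uint16_t *buf, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (buf[i] != (uint16_t)(buf[i - 1] + 1))
            return i;
    }
    return n;
}
```

Checking each sample against its predecessor, rather than against an absolute count, means the receiver doesn't need to know where the sender's counter started.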

Actual Data

For testing I've got a reference signal generator pushing in waveforms. With a 30KHz sine wave injected into the ADC input and the sampling set to 28 cycles, our earlier calculations say we should be seeing ~675KSample/S, so a 30K sine input implies about 22.5 samples per cycle. So 45 samples is two cycles, 100 points should cover about 4.4 cycles, and 450 points twenty cycles.

Here's some outputs:
45 Samples:

Graph from samples 10 -> 110:
Graph from samples 10 -> 460:

"Close Enough"

Saturday, 10 March 2018

Higher Speed ADC Part 3

SPI Data Rate Timings

How fast we run the SPI clock from the Pi is a compromise between the sample rate we want to achieve, the time the Pi spends processing and the quality of the wiring.

Looking at the transactions we have something like this:

The timing points here are
  • Before t0: The ADC collects a "half complete" set of samples, and sets up the SPI DMA.
  • t0: The STM takes the strobe line low.
  • t1: The Pi sees this and starts the SPI transfer of 1024 samples.
  • t2: The SPI transfer completes.
  • tn: This process repeats for the upper half of the ADC sample buffer.
The transfer is limited by the
  • SPI clock rate
  • ADC sample rate
  • Processing Delay on the Pi
The SPI clock limits how fast data can move over the bus, and the ADC conversion rate determines how much data we want to send. The processing delay on the Pi can increase the t0 to t1 gap, and if this is too long then the SPI can still be transferring when the next strobe activates. This is simple enough to handle on the STM32 side, and the operation of the ADC DMA means some overlap is fine, but under particularly late transfer starts we could lose data.

SPI Clock

The SPI clock determines the maximum throughput of the bus. The Pi has a fairly fixed set of rates available. On the STM32 side we're using SPI1, where the maximum is 54MHz (SPI1, SPI4, SPI5, and SPI6 can communicate at up to 54 Mbit/s; SPI2 and SPI3 can communicate at up to 25 Mbit/s).

So, if we request 16MHz then the Pi rounds this down to 15.6MHz. This is a bit clock, so the transfer limit is 15600000 bits per second, or 1.95MByte/s. Since each sample is two bytes this would be 975KSample/s.

For the other clock rates:

  • 31.2 MHz = 1.95M Sample/S
  • 15.6 MHz = 975K Sample/S
  • 7.8 MHz = 487.5K Sample/S
  • 3.9 MHz = 243.75K Sample/S
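
The same arithmetic as a sketch in C. The rate table holds only the four clock rates listed in this post; it is not a statement of how the Pi driver actually picks its divider, and the function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* The SPI clock rates quoted in this post, fastest first. */
static const uint32_t listed_rates_hz[] = {
    31200000u, 15600000u, 7800000u, 3900000u
};

/* Pick the fastest listed rate that does not exceed the request.
 * Returns 0 if the request is below every listed rate. */
uint32_t effective_spi_hz(uint32_t requested_hz)
{
    size_t n = sizeof listed_rates_hz / sizeof listed_rates_hz[0];
    for (size_t i = 0; i < n; i++) {
        if (listed_rates_hz[i] <= requested_hz)
            return listed_rates_hz[i];
    }
    return 0;
}

/* Each sample is two bytes, so it costs 16 bit clocks on the bus. */
uint32_t spi_samples_per_sec(uint32_t bit_clock_hz)
{
    return bit_clock_hz / 16u;
}
```

So spi_samples_per_sec(effective_spi_hz(16000000)) gives 975000, matching the 975KSample/s figure.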

ADC Conversion Rates

For the ADC conversion rate we care about the system clock, the peripheral clock, the ADC clock and then the conversion rate in ADC clock cycles.

  • The STM system clock is set up in SystemClock_Config() as 216000000 (216MHz).
  • AHB is set to the system clock (divider of 1)
  • APB2 Prescaler is set to provide a peripheral clock of AHB/2 (216M/2 = 108M)
  • ADC Clock is set to Peripheral Clock/4 (108M/4 = 27M, or ~37nS per cycle)

From the data sheet (section 15.5), Tconv = sampling time + 12 cycles, so if we set the sampling time to ADC_SAMPLETIME_56CYCLES then we have 68 cycles per conversion, 2.52uS each, or ~397KSample/S.

At 28CYCLES then each conversion is 1.48uS, or ~675KSample/S. At 84CYCLES each is 3.56uS, or 281KSample/S.
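
These figures all drop out of one formula, sketched here in C. It assumes the 27MHz ADC clock and the 12 cycle fixed overhead given above; the function and macro names are mine.

```c
#include <stdint.h>

#define ADC_CLOCK_HZ     27000000u  /* 216M / 2 / 4, from the clock list */
#define ADC_FIXED_CYCLES 12u        /* added to the sampling time */

/* Conversions per second for a given sampling time in ADC cycles
 * (integer division, so the result is rounded down). */
uint32_t adc_samples_per_sec(uint32_t sampling_cycles)
{
    return ADC_CLOCK_HZ / (sampling_cycles + ADC_FIXED_CYCLES);
}

/* Time for one conversion in nanoseconds, rounded to nearest. */
uint32_t adc_conversion_ns(uint32_t sampling_cycles)
{
    uint64_t cycles = sampling_cycles + ADC_FIXED_CYCLES;
    return (uint32_t)((cycles * 1000000000ull + ADC_CLOCK_HZ / 2)
                      / ADC_CLOCK_HZ);
}
```

For 28, 56 and 84 cycle sampling times this gives 675000, 397058 and 281250 samples per second.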

An Example of Transfers

There are three signals in these sample transfers: the top two are GPIO pin toggles that I fire when the ADC interrupts arrive (top is "half complete", second is "complete") and the bottom trace is the SPI clock. These traces are with 1024 samples (2048 bytes) between ADC buffer interrupts, and the SPI clock is 16MHz.

The sample period is set to 56 clocks in the STM32 configuration, which is 2.52uS * 1024 = 2.58mS, with 5.16mS for all 2048 samples. The scope shows slightly over 2.5mS per half buffer, and just over 5 mS for the complete buffer. Slightly fast, but close enough.

The SPI clock burst is active for about 1.1mS. This is transferring 16 bits per sample, or 1024*16 = 16,384 bits in 1.1mS. We've requested 16MHz at the SPI, but the Pi rounds this down to 15.6MHz, which gives us an "ideal" transfer time of (1/(15.6*10^6))*1024*16, or ~1.05mS. Also close enough.

If we crank up the ADC rate to the next timing increment, a sampling time of 28 cycles, we should bring our sample interval down to 1.52mS per half buffer, which is what we see here:


At this point the SPI transfer time is getting close to the ADC half buffer interval. This is workable, but it's around the upper limit for this transfer timing and sample rate.

Winding back down to 84 cycles gets us 3.64mS per half buffer, and we see this:

However the "lashed together" wiring has a problem at these kinds of SPI data rates. Occasionally stray clock pulses throw off the transfer. We can reduce the SPI clock rate, but this drops the maximum sample rate we can get from the ADC.

Fixing the wiring to use slightly better (i.e. shorter, soldered) links we can get good short transfers at faster rates (~31.2MHz), but YMMV. This is "good enough for me" though, and I can tune it on the Pi based on the sample rate I'm targeting.

Problems from the Pi Side

One problem that drops out of a running system is the delay in the Pi responding to the strobe signal. Although the traces above show fairly prompt responses this system is otherwise idle. When the Pi is under load then there can be a gap between the strobe signal firing and the Pi starting the transfer.

In these cases if the SPI clock is low enough then the transfer delay means that the SPI is transferring when the next strobe activates. This is simple enough to handle on the STM32 side, and the operation of the ADC DMA means some overlap is fine, but this is not ideal, and under particularly late transfer starts we could lose data. This is something that the extra memory to memory DMA mentioned earlier would handle.

Practically this means that the SPI link can become less reliable at lower speeds, as a slower transfer leaves less headroom to cope with late transfer starts. Although I'm happy that the correct SPI rate and ADC conversion rate combination will be reliable for the Pi side software, this is something that would need more careful checking for other combinations and processing software.
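
To put rough numbers on that headroom, here is a first-order sketch. It assumes the 27MHz ADC clock, the 12 cycle conversion overhead and 16 bits per sample from above, and it ignores the overlap the ADC DMA can tolerate, so it is on the conservative side. The function names are illustrative.

```c
/* Microseconds for the ADC to fill one half buffer. */
double half_buffer_fill_us(unsigned samples, unsigned sampling_cycles)
{
    const double adc_clock_hz = 27e6;   /* ADC clock, as set up above */
    return samples * (sampling_cycles + 12) / adc_clock_hz * 1e6;
}

/* Microseconds to clock one half buffer across the SPI. */
double spi_transfer_us(unsigned samples, double spi_hz)
{
    return samples * 16.0 / spi_hz * 1e6;
}

/* How late the Pi can start a transfer and still finish before the
 * next half buffer is ready. Negative means the SPI cannot keep up. */
double start_latency_budget_us(unsigned samples, unsigned sampling_cycles,
                               double spi_hz)
{
    return half_buffer_fill_us(samples, sampling_cycles)
         - spi_transfer_us(samples, spi_hz);
}
```

With 1024 samples at 56 cycles this works out to roughly 1.5mS of slack at 15.6MHz, under 0.5mS at 7.8MHz, and a negative figure at 3.9MHz.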

Thursday, 8 March 2018

Higher Speed ADC Part 2

An Overview of the Solution

Options

There's a couple of different ways we can choose to run the software on the microprocessor to capture samples and push them across the SPI.

Manually Pushing The Peripherals

The simplest solution would be to grab samples directly on the ADC, using ST's HAL_ADC_PollForConversion(). Then when we have "enough" samples send them with HAL_SPI_TransmitReceive().

This is simple enough to do, and a good initial wiring test, but the ADC samples collected will be too irregular to be useful for anything but very low sample rates.

Interrupt Driven

The next simplest thing to do is use the ADC and SPI in interrupt mode. In this case, when the ADC finishes a conversion it generates an interrupt. Similarly we can kick off the SPI transfers for large blocks of data and get an interrupt when the transfer completes.

The ADC can be run in a continuous conversion mode, where we just start it running and then get regular samples with interrupts when the conversions complete. We implement a simple handler to pull the value out of the register.

The mechanics of this are fairly simple to set up, and the HAL_SPI_TransmitReceive_IT() and HAL_ADC_Start_IT() provide the front ends to start the whole process.

This approach almost works for this case. If we limited the sample rates to low values (i.e. audio rates, below 50kHz or so) then it would be good enough. However as we wind up the sample rates on the ADC and also start handling SPI interrupts, we start dropping samples. The processor simply can't get around to handling all the interrupts in time.

Depending on the application the odd dropped sample might be worth the simplicity of the implementation, however to get reliability at higher rates we have to do something else.

Interrupt Handlers and Weak Bindings

One thing to be aware of is that the ST HAL likes to use weak bindings for the interrupt callback handlers, and expects the application to provide "known" function names for the ISR handlers.

Complicating this is that the sample code uses #define statements to substitute in the "correct" name for a given handler based on the channel definitions, and this can get confusing when trying to build outside of the examples tree.
So, for example, in the STM32F7 reference tree Examples/SPI/SPI_FullDuplex_ComDMA has the header ./Inc/main.h which contains the substitution:

#define SPIx_IRQHandler                  SPI2_IRQHandler
And then both ./Inc/stm32f7xx_it.h and ./Src/stm32f7xx_it.c have references to
void SPIx_IRQHandler(void)
which they expect to be SPI2_IRQHandler(). When porting/re-implementing it's important to make sure the handler resolves correctly, otherwise the interrupt handlers won't fire. I tend to remove the define to keep things clearer and prevent unexpected surprises when hacking around.

Using the DMA Engines

The STM32 chips have DMA engines, which can be set up to transfer blocks of peripheral data to and from memory. The processor only has to be involved in setting up the transfer, and is informed when it completes.

There are DMA engine bindings in the HAL which can be used for the transfer of ADC data to memory, and from memory to/from the SPI interface.

DMA Based ADC

Specifically in the case of the ADC we can set up the DMA engine to recover the values from the conversion and transfer them to a block of memory, and then when the memory block is full, return to the start of the memory block and continue converting.

When operating in this mode we can have the DMA engine generate interrupts when it is halfway through the buffer and when it's at the end. This means we can run a simple double buffered conversion approach.

We set the DMA/ADC running and putting data into a large block of memory. When we pick up the "halfway" interrupt we send the first half of the block to the SPI, and when we get the "end" interrupt we send the second half.

Provided we can send the "half block" of memory across the SPI faster than the ADC fills the other "half block" we can leave the ADC to run continuously.

Representing this graphically we allocate a large block of memory and pass it to the ADC:

And then when we receive the "Half Complete" interrupt we know the first half of this buffer is ready to send:

And when we receive the Complete interrupt then we know the second half of the buffer is ready to send, and the DMA engine has looped around and is back to writing the first half:

If we want to get better performance out of the system this is the way to go. Note that the reference tree uses the same weak bindings approach for these callbacks.
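
The buffer handling can be tried out on a desktop machine with a small simulation. This is only a sketch: dma_store_sample() stands in for the DMA engine writing one conversion result in circular mode, send_half() stands in for the SPI send, and the buffer is shrunk to 8 samples so it's easy to follow. All the names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HALF_SAMPLES 4                    /* tiny for illustration */
#define BUF_SAMPLES  (2 * HALF_SAMPLES)
#define SENT_MAX     64

static uint16_t adc_buf[BUF_SAMPLES];     /* the block handed to the ADC DMA */
static size_t write_pos;                  /* where the simulated DMA writes next */

uint16_t sent[SENT_MAX];                  /* stand-in for data pushed over SPI */
size_t sent_count;

/* Stand-in for sending one half of the buffer over SPI. */
static void send_half(const uint16_t *half)
{
    if (sent_count + HALF_SAMPLES > SENT_MAX)
        return;                           /* capture area full: drop */
    memcpy(&sent[sent_count], half, HALF_SAMPLES * sizeof half[0]);
    sent_count += HALF_SAMPLES;
}

/* Simulate the DMA storing one conversion result in circular mode. */
void dma_store_sample(uint16_t value)
{
    adc_buf[write_pos++] = value;
    if (write_pos == HALF_SAMPLES) {
        /* "half complete": the first half is stable, DMA moves on */
        send_half(&adc_buf[0]);
    } else if (write_pos == BUF_SAMPLES) {
        /* "complete": the second half is stable, DMA wraps to the start */
        send_half(&adc_buf[HALF_SAMPLES]);
        write_pos = 0;
    }
}
```

Feeding in a counting sequence and checking that "sent" comes out in order is a quick way to convince yourself no half is skipped or sent twice.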

DMA Based SPI

There's a similar setup on the SPI DMA side. This allows us to substitute in HAL_SPI_Transmit_DMA() to push the data buffers across the SPI link.

This is a slightly more complex setup when we configure the SPI, but otherwise the transmit process is largely similar.

Another DMA

If this was something I was being paid to do, then I'd likely look at using at least one more DMA. I'd want a DMA to copy the ADC results from memory to another memory location, and then DMA from this memory to SPI.

Using the extra memory to memory copy would allow us to run with more than just a single buffer's worth of ADC results queued, which would be a very useful safeguard for the cases where the Pi side of things came under load (always a problem with Linux), and was late when transferring ADC buffers as a result. We could also add headers to the buffers and improve the error checking on the transfer.
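
For illustration only, a queue of that sort might look like the sketch below. Nothing here is in the actual code: the queue depth, the sequence number header and all of the names are hypothetical, and memcpy stands in for the memory to memory DMA.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SAMPLES 4   /* tiny for illustration */
#define QUEUE_DEPTH   4   /* hypothetical: blocks allowed to wait for the Pi */

struct block {
    uint16_t seq;                     /* hypothetical header: block number */
    uint16_t samples[BLOCK_SAMPLES];
};

struct block_queue {
    struct block slots[QUEUE_DEPTH];
    size_t head;                      /* index of the oldest queued block */
    size_t count;                     /* number of blocks queued */
    uint16_t next_seq;
};

/* Copy a finished half buffer into the queue.
 * Returns false if the queue is full and the block is dropped. */
bool queue_push(struct block_queue *q, const uint16_t *samples)
{
    if (q->count == QUEUE_DEPTH) {
        q->next_seq++;                /* leave a gap the receiver can spot */
        return false;
    }
    struct block *b = &q->slots[(q->head + q->count) % QUEUE_DEPTH];
    b->seq = q->next_seq++;
    memcpy(b->samples, samples, sizeof b->samples);
    q->count++;
    return true;
}

/* Take the oldest block for sending. Returns false if nothing is queued. */
bool queue_pop(struct block_queue *q, struct block *out)
{
    if (q->count == 0)
        return false;
    *out = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    return true;
}
```

With a header like this the Pi side could tell a late block from a lost one by looking for jumps in the sequence number.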

However this is just a weekend thing, so for now I won't bother with that.

Sunday, 4 March 2018

Raspberry Pi High Speed ADC

A (slightly) faster ADC for the Raspberry Pi

The Problem

I've had a couple of cases where I wanted a "Raspberry Pi with an ADC", having a reasonable data acquisition rate and resolution, and had some problems with getting consistent data rates through the system and getting the higher rates.

The Solution

I'm going to use a separate microcontroller and use that to acquire ADC samples at a very regular sample rate, and then transfer the samples over to the Pi.

For this prototype I'm using the ST Micro STM32F7; it's a microcontroller with multiple built in ADCs running at 12 bits and up to 2.4M Samples/s individually. The ADCs can also be chained together to achieve higher sample rates (up to 7.2 MSamples/s), but getting a few hundred KSamples/s will be fine for me.

The STM32F7 is available on a few reference boards, and I'm using the Nucleo-F767ZI.

To get data between the STM32 and the Pi I'll use the SPI bus.

The SPI Bus

The SPI bus is a simple three wire serial bus; one side is the master and the other side is the slave. There's a data line sending from the master to the slave (MOSI), a data line sending from the slave to the master (MISO) and a clock line.

The only real distinction between the slave and master is where the clock comes from - the master is the thing that generates the clock, and the slave just receives it.

Although the Pi and STM32 can both be either master or slave, the Linux Pi driver currently only runs as master: this is a slight problem for this case, since ideally we want the STM32 as the source of data to be the master. However we can get around this with an extra signalling pin.

The Extra Strobe Signal?

On the SPI bus data is transferred whenever the clock line is active. However, in this application we only want data to clock through when the STM32 has something to send.

The ideal solution here would be to make the STM32 the master, and have it set up a transfer and drive the clock when data is ready to send, and the Pi would just listen continually. However, as mentioned, if we want to use the stock driver then the Pi has to be the master, and the STM32 the slave.

So either the Pi clocks continually, and we need the STM32 to queue up "No data" messages when there is nothing ready, or we need some way for the STM32 to tell the Pi it has data, so the Pi can start the transfer.

This solution uses the second option - a GPIO pin on the STM32 is acting as a "ready" output, and the Pi reads that line, and will start transfers based on the state of that pin.

The Wiring

Everything here is set up as 3v3 pin levels (since the Pi is only 3v3 capable). On the STM Side the pins are connected to CN7 as

  • 8: GND
  • 12: MISO/PA6
  • 14: MOSI/PA7
  • 15: CLK/PB3
  • 20: PF12
This is the wiring for SPI1, straight out of the sample SPI code as:
#define SPIx                             SPI1
...
#define SPIx_SCK_PIN                     GPIO_PIN_3
#define SPIx_SCK_GPIO_PORT               GPIOB
...
#define SPIx_MISO_PIN                    GPIO_PIN_6
#define SPIx_MISO_GPIO_PORT              GPIOA
...
#define SPIx_MOSI_PIN                    GPIO_PIN_7
#define SPIx_MOSI_GPIO_PORT              GPIOA
Plus the strobe pin out is on pin20, which is "Port F, gpio pin 12".

And On the Pi Side the pins are

  • 7: GPIO4
  • 19: MOSI
  • 21: MISO
  • 23: CLK
  • 25: GND
So, the Pi SPI pins, plus GPIO4 as the input strobe pin.

From the top view of both connectors then it looks like this:

In part 2 we'll cover the overview of the software side of the solution, part 3 will look at the SPI and ADC data rates, and part 4 will cover the implementation structure in more detail.