One or two bits of variables are incorrect on SAM9M10-G45 chips.

Discussion around AT91RM9200 and SAM9 Series Products.

Moderator: nferre

changzhi
Posts: 1
Joined: Tue Sep 25, 2018 5:27 am


Tue Sep 25, 2018 8:36 am

I have an AT91SAM9M10-G45-EK evaluation board, and we have built about a dozen boards of our own based on it. The main chips (the CPU/SoC, NAND flash and DDR) come from our supplier.

The software stack is bootstrap + U-Boot + Linux 2.6.30. Most of our boards work well, but several fail to boot. The problem is that one or two bits of local variables on the stack (in DDR) are corrupted. It usually shows up in the NAND ECC calculation when software ECC is enabled: about 4% of NAND page ECC calculations fail, i.e. the calculated ECC does not match the ECC read from NAND. When such an error occurs and we call nand_calculate_ecc again, the second result usually matches the ECC read from NAND.

We dug into the local variables on the stack for the two ECC calculations and found that they differ. The log below shows that bits 23..20 differ in some words between the two calculations, e.g. B7D36895 vs B7936895 and 24830BA6 vs 24C30BA6. In both pairs the only flipped bit is bit 22.

[ 5.512000] rp0 B7D36895 2E2FE10D A090C0AC 24830BA6 - F586C593 FF278FA7
[ 5.537000] rp0 B7936895 2E2FE10D A090C0AC 24C30BA6 - F586C593 FF278FA7
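XORing corresponding words from the two dumps isolates exactly which bits changed. The sketch below does that for the two rp0 lines; the helper names `differing_bits` and `compare_dumps` are hypothetical, and it assumes the dump format shown (whitespace-separated 32-bit hex words with a `-` separator). For this log, both changed words differ only in bit 22.

```python
from typing import Dict, List


def differing_bits(a: int, b: int) -> List[int]:
    """Return the bit positions (31..0, high to low) where two 32-bit words differ."""
    diff = (a ^ b) & 0xFFFFFFFF  # set bits mark positions that differ
    return [bit for bit in range(31, -1, -1) if (diff >> bit) & 1]


def compare_dumps(first: str, second: str) -> Dict[int, List[int]]:
    """Compare two whitespace-separated hex dumps word by word.

    Returns {word_index: [differing bit positions]} for every word that changed.
    Tokens that are not hex numbers (such as the '-' separator) are skipped.
    """
    def words(line: str) -> List[int]:
        out = []
        for tok in line.split():
            try:
                out.append(int(tok, 16))
            except ValueError:
                continue  # e.g. the '-' separator in the log line
        return out

    result = {}
    for i, (a, b) in enumerate(zip(words(first), words(second))):
        if a != b:
            result[i] = differing_bits(a, b)
    return result
```

Usage with the hex part of the two log lines: `compare_dumps("B7D36895 2E2FE10D A090C0AC 24830BA6 - F586C593 FF278FA7", "B7936895 2E2FE10D A090C0AC 24C30BA6 - F586C593 FF278FA7")` returns `{0: [22], 3: [22]}`, i.e. words 0 and 3 each have a single flip at bit 22.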

We have run some experiments, such as swapping the SoC, NAND flash and DDR between boards. The issue follows the SoC exactly: it always appears on whichever board a particular SoC is moved to.

My questions are:
1. Are these SoC chips defective, counterfeit, or something else?
2. Are there variants of the SAM9M10-G45, and a way to detect them, so that we could fix or work around the issue in software with a Linux patch?
3. If these SoC chips are defective or counterfeit, is there an easy way to detect them? At the moment we find them by swapping chips between boards, which takes too much effort... :(

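Regarding question 3, one cheap screening idea (an assumption, not a validated test) mirrors the observation that re-running nand_calculate_ecc gives a different result: compute the same function over the same buffer many times and count results that differ from the first. The sketch below uses SHA-256 from Python's hashlib as a stand-in for the ECC routine; the function name and parameters are hypothetical. On the target board the same loop would more likely be written in C.

```python
import hashlib
import os
from typing import Optional


def count_nondeterministic_results(rounds: int = 1000, size: int = 64 * 1024,
                                   data: Optional[bytes] = None) -> int:
    """Hypothetical screening helper: hash one fixed buffer `rounds` times.

    Returns how many digests differ from the first one. Identical input must
    give identical output, so any nonzero count points at a hardware fault
    (CPU, cache or memory path) rather than at the software itself.
    """
    if data is None:
        data = os.urandom(size)  # random content exercises many bit patterns
    reference = hashlib.sha256(data).digest()
    mismatches = 0
    for _ in range(rounds):
        if hashlib.sha256(data).digest() != reference:
            mismatches += 1
    return mismatches
```

A result of 0 does not prove a chip is good: the fault may only appear after a long run time or under particular load, so such a loop would probably need to run for hours while the board's other interfaces are busy.
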
AMeger
Posts: 1
Joined: Thu Mar 07, 2019 11:58 am

Re: One or two bits of variables are incorrect on SAM9M10-G45 chips.

Thu Mar 07, 2019 12:49 pm

Hello changzhi,

did you find the cause of your problem?

After producing several thousand modules with the AT91SAM9G45 over the last 8 years, we have had problems with some newly produced modules since mid-2018.
We also seem to have some kind of data corruption, resulting in data abort or prefetch abort exceptions when running Windows CE.
Running RAM test routines doesn't show any errors on the problematic modules.
Copying the Windows CE image from NAND flash to RAM is ECC-protected, and the copied image in RAM is also checksum-verified after copying => no errors.
A minimal Linux system built with Linux4SAM produces kernel oopses on the problematic modules; the modules that run Windows CE without errors show no problems under Linux either.
The modules fail after different run times (30 seconds to 3 hours), depending on the individual module.
Clock signals and power supply seem OK; no other electrical or mechanical problems have been found so far.
The time until the error appears can be shortened by using several interfaces (USB, SD card, Ethernet) simultaneously.
Running memtest in U-Boot or Windows CE Eboot doesn't show any problems.
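The checksum check on the copied Windows CE image only proves that the image was intact at the moment it was checked. The post does not say which checksum Eboot uses; assuming CRC-32 purely for illustration (the helper name is hypothetical), a copy-then-verify step looks like this:

```python
import zlib


def copy_and_verify(src: bytes) -> bytearray:
    """Copy an image into a new buffer and check that the copy's CRC-32 matches.

    Raises RuntimeError if the checksums differ, i.e. the copy was corrupted.
    """
    expected = zlib.crc32(src)
    dst = bytearray(src)  # stands in for the NAND-to-RAM copy
    actual = zlib.crc32(dst)
    if actual != expected:
        raise RuntimeError(f"copy corrupted: crc {actual:#010x} != {expected:#010x}")
    return dst
```

A bit that flips after the check, or a transient error inside the CPU while it computes rather than while data sits in RAM, could pass this kind of test, which may be why the RAM tests and checksums come back clean.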

After testing different parameters, systems, modules, module production dates, ... we found only one factor common to all modules that show the problems:
=> All the problematic modules are fitted with CPUs carrying datecode "1734".
=> Not all modules with datecode "1734" CPUs show the problems, but >= 30% do.
=> Modules with CPUs of datecodes "1739" and "1818" (the other batches we bought) don't show the problems.

BR
Arno
