Sunday, 6 May 2018

Visual Studio Express 2017 install problems

Visual Studio Express is still interesting because its license has no "Organizational License" clause (compare https://www.visualstudio.com/license-terms/mlt080317/ with https://www.visualstudio.com/license-terms/mlt553321/).
Unfortunately, installation (at least for me, on Win10 Enterprise) was a huge pain.
Here are some installation hints:

  1. Download and start the installer. You may see the error message: "At most one command parameter can be specified..."
  2. If so, copy the extracted installer from its temporary folder (use Task Manager to find the location). In my case it was a folder named "vs_bootstrapper_d15".
  3. Start the installer from the copied location using the command line. In that case I saw no "At most..." error, but after the installation started another error appeared: unable to download "sqlsysclrtypes".
  4. The solution to the "sqlsysclrtypes" problem is to download the whole installer locally (using "vs_setup_bootstrapper.exe --layout C:\vs2017offline --lang en-US") and then run "sqlsysclrtypes.msi" manually.
  5. After a system restart (required after the "sqlsysclrtypes.msi" installation), the VSExpress2017 installation was successful.

Tuesday, 1 May 2018

Jacinto J6Eco (DRA72x) hypervisor startup

The normal way to start a hypervisor on ARM (e.g. to start Xen) is to switch to non-secure mode by entering monitor mode with the SMC instruction. The monitor exception handler then sets the NS bit (and the HCE bit) in the SCR (Secure Configuration Register) to "1" for non-secure, and exits in non-secure mode (at PL1, so that a switch to PL2 is possible). The monitor exception handler must be registered before the SMC call.
Such code (far more complicated in practice, e.g. due to multicore support) can be found in the U-Boot bootloader (u-boot/arch/arm/cpu/armv7/nonsec_virt.S).
Fortunately, on J6Eco DRA72x (and J6Entry DRA71x) everything is already prepared and a simple "SMC #1" call is enough to start the hypervisor.
Example code:

.arch_extension sec
.arch_extension virt
.text
.align 2
.global start_hypervisor
.type start_hypervisor, function
start_hypervisor:
    ldr r12, =0x102
    ldr r0, =HYPERVISOR_ADDR
    smc #1


Sunday, 4 March 2018

ARMv7-a vs Cortex-A7 vs Cortex-A8 vs TI DRA62x +VFP +NEON


"Linaro focuses on the use of the ARM instruction set in its versions 7a (32-bit) and 8 (64-bit) including concrete implementations of these, such as SoCs that contain Cortex-A5, Cortex-A7, Cortex-A8, Cortex-A9, Cortex-A15, Cortex-A53 or Cortex-A57 processor(s)."(https://en.wikipedia.org/wiki/Linaro)

"The ARM Cortex-A8 is a 32-bit processor core licensed by ARM Holdings implementing the ARMv7-A architecture." (https://en.wikipedia.org/wiki/ARM_Cortex-A8)

https://en.wikipedia.org/wiki/Comparison_of_ARMv7-A_cores

https://en.wikipedia.org/wiki/ARM_architecture#VFP


"What is VFP?
VFP is a floating point hardware accelerator. It is not a parallel architecture like Neon. Basically it performs one operation on one set of inputs and returns one output. It's purpose is to speed up floating point calculations. If a processor like ARM does not have floating hardware, then it relies on software math libraries which can prohibitively slow down floating point calculations. The VFP supports both single and double precision floating point calculations compliant with IEEE754. Further, the VFP is not fully pipelined like Neon, so it will not have equivalent performance to Neon.
Neon and VFP both support floating point, which should I use?
The VFPv3 is fully compliant with IEEE 754
Neon is not fully compliant with IEEE 754, so it is mainly targeted for multimedia applications
.. example of showing how Neon pipelining will outperform VFP...
Compile the above function for both Neon and VFP and compare results:
arm-none-linux-gnueabi-gcc -O3 -march=armv7-a -mtune=cortex-a8 -mfpu=neon -ftree-vectorize -mfloat-abi=softfp
arm-none-linux-gnueabi-gcc -O3 -march=armv7-a -mtune=cortex-a8 -mfpu=vfp -ftree-vectorize -mfloat-abi=softfp"
(http://processors.wiki.ti.com/index.php/Cortex-A8)

"Using NEON and VFPv3 on Cortex-A8
The compiler supports two different options to control NEON and VFPv3.

--float_support=VFPv3 --neon

The --float_support=VFPv3 option instructs the compiler to generate code that utilizes the VFPv3 coprocessor for both double and single precision floating point operations. The option is also used to enable the assembler to accept VFPv3 instructions in assembly source. To enable VFPv3 the EABI mode must also be enabled through the --abi=eabi option. This is necessary because the calling convention for floating point paramemters changes when VFPv3 is enabled and that convention is only supported in EABI mode.

The --neon option instructs the compiler to automatically vectorize loops to use the NEON instructions. To get benefit from this option you should be using --opt_level=2 or higher and be generating code for performance by using the --opt_for_speed=[3-5] option.
Combining options
The TI ARM compiler supports four modes related to Cortex-A8, NEON, and VFPv3. By default neither NEON or VFPv3 is enabled. In addition to the default the following 3 modes are supported:
VFP enabled without NEON
The compiler will generate VFPv3 instructions for single and double precision floating point operations
NEON enabled without VFP
In this mode the compiler will generate NEON instructions for SIMD integer operations. It will not generate NEON instructions to vectorize floating point operations. The motivation for not allowing floating point NEON instructions if VFP is not enabled is because it is possible to have an integer only variant of NEON implemented. In order for the NEON unit to support floating point operations the VFPv3 coprocessor must be present.
NEON enabled and VFP enabled
In this mode the compiler will generate a mix of NEON and VFP instructions. The NEON instructions can be either integer or floating point.
VFPv3 vs. NEON performance
A common question with regard to TI ARM compiler's support for NEON is how to get more floating point operations on the NEON unit instead of the VFPv3. The reason this is desirable is because the VFPv3 coprocessor is not a pipelined architecture on the Cortex-A8, but the NEON is. The compiler will always use VFP instructions for scalar floating point operations, even if the --neon option is used. The hardware is capable of issuing VFP instructions on the NEON coprocessor if the following conditions are met:

The instruction must be a single precision data processing instruction
The processor must be in flush-to-zero mode. In this mode the processor will treat all denormalized numbers as zero.
The processor must be in default NaN mode. In this mode the operation will return the default NaN regardless of the input, whereas in full-compliance mode the returned NaN follows the rules in the ARM Architecture Reference Manual.
The FPEXC.EX bit must be set to 0. This tells the processor that there is no additional state that must be handled by a context switch."
(http://processors.wiki.ti.com/index.php/Using_NEON_and_VFPv3_on_Cortex-A8)

DRA62x Automotive Application DSP + ARM Processors
The ARM Cortex-A8 processor has a Harvard architecture and provides a complete high-performance subsystem, including:
• ARM Cortex-A8 Integer Core
• Superscalar ARMv7 Instruction Set
• Thumb-2 Instruction Set
• Jazelle RCT Acceleration
• CP14 Debug Coprocessor
• CP15 System Control Coprocessor
• NEON™ 64-/128-bit Hybrid SIMD Engine for Multimedia
• Enhanced VFPv3 Floating-Point Coprocessor
• Enhanced Memory Management Unit (MMU)
• Separate Level-1 Instruction and Data Caches
• Integrated Level-2 Cache
• 128-bit Interconnect with Level 3 Fast (L3) System Memories and Peripherals
• Embedded Trace Module (ETM).

Saturday, 12 November 2016

base class operator== detection

Scenario:
A class stores items. Items can be added, read and updated. When an item is modified, a callback notification is sent to all listeners.
To detect a modification, operator==() can be used. It seems to be a natural, non-intrusive solution.
This works fine until inheritance comes into play.
Consider these types:

struct A {
    A(int v) : v(v) {}

    int v;

    friend bool operator==(const A &lhs, const A &rhs)
    {
        return lhs.v == rhs.v;
    }
};
struct B : A {
    B(int v, int w) : A(v), w(w) {}

    int w;
};


Note that operator==() is defined as a friend (and is therefore not a class member; see ADL, friend name injection, Barton–Nackman trick), but it could also be defined as a member function or a global function.


The potential problem is visible here:

A a1{1}, a2{1};
std::cout << (a1 == a2);

B b1{1,1}, b2{1,2};
std::cout << (b1 == b2);


Both print '1', but is 'b1' really equal to 'b2'?
According to the current implementation yes, because only the A part of a B-type object is compared.
The real problem is that nothing warns about this. When generic code compares objects whose types are defined in different files, it is easy to forget to define operator==() for class B.
To avoid the problem, it may be better to prohibit such a situation and make it a compile-time error.
What is required is a compile-time check that a given type (any type, because the comparison is used in generic code) defines its own operator==.
A tool for this might be found e.g. in Boost (has_equal_to from TypeTraits, or the Concept Check Library), but a simple solution is presented here: http://stackoverflow.com/a/6536204/122054.
For C++98:

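namespace CHECK
{
    class No { bool b[2]; };
    // Fallback operator==, selected only when no better match exists.
    template <typename T, typename Arg> No operator== (const T&, const Arg&);

    bool Check (...);
    No& Check (const No&);

    template <typename T, typename Arg = T>
    struct EqualExists
    {
        enum { value = (sizeof(Check(*(T*)(0) == *(Arg*)(0))) != sizeof(No)) };
    };
}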


Simplified version for C++11:

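#include <type_traits>

namespace CHECK
{
    struct No {};
    // Fallback operator==, selected only when no better match exists.
    template <typename T, typename Arg> No operator== (const T&, const Arg&);

    template <typename T, typename Arg = T>
    struct EqualExists
    {
        enum { value = !std::is_same<decltype(*(T*)(0) == *(Arg*)(0)), No>::value };
    };
}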


Using CHECK::EqualExists<T>::value with static_assert allows the potential problem to be detected at compile time.
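
A minimal sketch of how the check might be wired into generic code, assuming the C++11 variant of the trait and the A and B types from this post. The helper names changed and demo_changed are hypothetical, chosen only for this illustration. For B the fallback template is an exact match while A's operator needs a derived-to-base conversion, so the fallback wins and the trait reports false.

```cpp
#include <cassert>
#include <type_traits>

namespace CHECK
{
    struct No {};
    // Fallback operator==, selected only when no better match exists.
    template <typename T, typename Arg> No operator==(const T&, const Arg&);

    template <typename T, typename Arg = T>
    struct EqualExists
    {
        enum { value = !std::is_same<decltype(*(T*)(0) == *(Arg*)(0)), No>::value };
    };
}

struct A {
    A(int v) : v(v) {}
    int v;
    friend bool operator==(const A& lhs, const A& rhs) { return lhs.v == rhs.v; }
};

struct B : A {
    B(int v, int w) : A(v), w(w) {}
    int w;   // not compared by the inherited operator==
};

// Hypothetical generic helper: reports whether a stored value was modified.
// It refuses to compile for types that only inherit operator== from a base.
template <typename T>
bool changed(const T& oldValue, const T& newValue)
{
    static_assert(CHECK::EqualExists<T>::value,
                  "T must define its own operator==");
    return !(oldValue == newValue);
}

// A has its own operator==; B only inherits the one taking A.
static_assert(CHECK::EqualExists<A>::value, "A defines operator==");
static_assert(!CHECK::EqualExists<B>::value, "B has no operator== of its own");

// Example use with a type that passes the check.
inline void demo_changed()
{
    assert(changed(A(1), A(2)));
    assert(!changed(A(1), A(1)));
}
```

Calling changed(B(1, 1), B(1, 2)) stops the build at the static_assert instead of silently comparing only the A part.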

Sunday, 12 June 2016

Old dog, old tricks (in C++)

Compile-time safety, compile-time error handling: it is all about the type system, constness, ...

C-array safety

If a function takes a constant-length array, it is tempting to write:


void f(int t[3])
{
...
t[2] = ...;
}

...

int t[3];
f(t);


As the array size in the parameter is merely a hint for the human programmer, it is possible to write:


int t[4];
f(t);


which is perhaps OK (but not nice).
But it is also possible to write:


int t[2];
f(t);


which is certainly wrong.

The solution that enforces the exact array size is:


void f(int (&t)[3])
{
...
}


Now only three-element arrays are allowed.
BTW, the above construct is frequently used for C-array handling in templates.
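
A short sketch of both uses of the array-reference parameter. The function names sum3, array_size and demo_array_reference are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Accepts only arrays of exactly three ints.
// Passing int[2] or int[4] is rejected at compile time.
inline int sum3(const int (&t)[3])
{
    return t[0] + t[1] + t[2];
}

// Template variant: the element type and the size N are deduced
// from the argument, so the size never has to be passed separately.
template <typename T, std::size_t N>
constexpr std::size_t array_size(const T (&)[N])
{
    return N;
}

inline void demo_array_reference()
{
    int t[3] = {1, 2, 3};
    assert(sum3(t) == 6);

    double d[5] = {};
    static_assert(array_size(d) == 5, "size is known at compile time");

    // int wrong[2] = {1, 2};
    // sum3(wrong);   // does not compile: int[2] is not int[3]
}
```

Because the array does not decay to a pointer here, the size stays part of the type and is checked by the compiler.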

Monday, 7 December 2015

C++11 and initialization

Uniform initialization in C++11 can be tricky.

The following invocations mean something completely different and also give different results:

std::vector<int> v(1); // "normal" constructor invocation with (int) param - creates vector<int> with 1 element initialized to default value (i.e. 0)
std::vector<int> v{1}; // invocation of constructor with initializer_list<> param - content of initializer_list is copied into vector (i.e. one value of 1)


The following invocations also mean something completely different and give different results:

std::vector<int> v(1, 1); // "normal" constructor invocation with (int,int) params - creates vector<int> with 1 element initialized to the specified value (i.e. 1)
std::vector<int> v{1, 1}; // invocation of constructor with initializer_list<> param - content of initializer_list is copied into vector (i.e. two values of 1)


But the following mean something completely different yet give the same result:

std::vector<int> v(2, 2); // "normal" constructor invocation with (int,int) params - creates vector<int> with 2 elements initialized to the specified value (i.e. 2)
std::vector<int> v{2, 2}; // invocation of constructor with initializer_list<> param - content of initializer_list is copied into vector (i.e. two values of 2)


Note that the following will not compile:

std::vector<int> v(1, 1, 1); // no such "normal" constructor

whereas the following is a perfectly valid C++11 statement:

std::vector<int> v{1, 1, 1}; // invocation of constructor with initializer_list<> param - content of initializer_list is copied into vector (i.e. three values of 1)
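
The differences above can be checked directly. This is a small sketch; the function name check_int_vector_initialization and the variable names are made up for the illustration.

```cpp
#include <cassert>
#include <vector>

inline void check_int_vector_initialization()
{
    std::vector<int> p1(1);       // size constructor: one element, value 0
    std::vector<int> b1{1};       // initializer_list: one element, value 1
    assert(p1.size() == 1 && p1[0] == 0);
    assert(b1.size() == 1 && b1[0] == 1);

    std::vector<int> p2(1, 1);    // (count, value): one element, value 1
    std::vector<int> b2{1, 1};    // initializer_list: two elements, value 1
    assert(p2.size() == 1);
    assert(b2.size() == 2);

    std::vector<int> p3(2, 2);    // (count, value): two elements, value 2
    std::vector<int> b3{2, 2};    // initializer_list: two elements, value 2
    assert(p3 == b3);             // different constructors, same content

    std::vector<int> b4{1, 1, 1}; // initializer_list: three elements
    assert(b4.size() == 3);
}
```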


Also note, perhaps even more surprisingly, that when the container's value type cannot be initialized from the values in the list (no such conversion exists), the following construction will invoke the "normal" constructor:

std::vector<std::string> v{1}; // same as - std::vector<std::string> v(1);


To avoid such behavior, the assignment syntax can be used in the initialization (this is still construction, not assignment):

std::vector<std::string> v = {1}; // this will fail to compile
std::vector<int> v = {1}; // this will invoke initializer_list<> constructor as for std::vector<int> v{1};
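
A sketch of the std::string case, with hypothetical function names. The int 1 cannot be converted to std::string, so the braces fall back to the size constructor, whereas a string literal is taken as a list element.

```cpp
#include <cassert>
#include <string>
#include <vector>

// The initializer_list constructor is not viable for an int element,
// so the explicit size constructor is selected instead.
inline std::vector<std::string> braces_with_int()
{
    std::vector<std::string> v{1};   // same as std::vector<std::string> v(1);
    return v;
}

// A string literal converts to std::string, so this is a one-element list.
inline std::vector<std::string> braces_with_literal()
{
    std::vector<std::string> v{"1"};
    return v;
}

inline void check_string_vector_initialization()
{
    assert(braces_with_int().size() == 1);
    assert(braces_with_int()[0].empty());      // one default-constructed string

    assert(braces_with_literal().size() == 1);
    assert(braces_with_literal()[0] == "1");   // one string holding "1"

    // std::vector<std::string> w = {1};
    // does not compile: copy-list-initialization cannot use
    // the explicit size constructor
}
```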


There is more to uniform initialization, especially in combination with the "auto" keyword; see e.g. "Effective Modern C++" by Scott Meyers.
Also check Stack Overflow.

Monday, 9 November 2015

Thread-safe Catch

The Catch test framework is nice, but not thread-safe; see https://github.com/philsquared/Catch/issues/99

There is a thread-safe fork of Catch: https://github.com/ned14/Catch-ThreadSafe
At the time of writing, the original Catch is at 1.2.1 whilst the thread-safe fork is at 1.1.