Limitations of the presented solution:
- regexes should be used only for "uncertain" data (e.g. XML that is not well-formed); for "real" XML a real parser should be used (e.g. expat),
- structures where a child element has the same name as its parent are not supported, e.g.
<a><a></a></a>
- empty-element tags are not recognized, e.g. <a/>
XML declaration - search for encoding
"<\\?xml(\\s+(?:[^\\?<>]*?\\s+)*encoding\\s*=\\s*(['\"])((?:(?!\\2).)*)\\2[^\\?<>]*)\\?>"
Result groups:
1 - attributes
3 - encoding attribute value
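Below is a minimal C# sketch of the declaration pattern in use. It assumes the .NET `System.Text.RegularExpressions` engine, since the post does not say which regex library was used. The class and method names are hypothetical. Group 2 holds only the quote character, which the `\2` backreference reuses to find the matching closing quote.

```csharp
using System.Text.RegularExpressions;

static class XmlDeclaration
{
    // Pattern from the text: group 1 = attributes, group 2 = quote character,
    // group 3 = encoding value.
    const string EncodingPattern =
        "<\\?xml(\\s+(?:[^\\?<>]*?\\s+)*encoding\\s*=\\s*(['\"])((?:(?!\\2).)*)\\2[^\\?<>]*)\\?>";

    // Returns the declared encoding, or null when the declaration has no encoding attribute.
    public static string FindEncoding(string xml)
    {
        Match m = Regex.Match(xml, EncodingPattern);
        return m.Success ? m.Groups[3].Value : null;
    }
}
```

A null result means there is no encoding attribute, so the caller has to fall back to a default (e.g. UTF-8).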
Element with arbitrary name
"<([^\\s<>]+)(?:(\\s[^<>]*)?>(.*?)</\\1)?\\s*>"
Result groups:
1 - element name
2 - attributes
3 - element value
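The arbitrary-name pattern finds only top-level elements. Nested data is reached by applying it again to group 3.

The sketch below assumes the .NET regex engine and uses hypothetical names. It also adds `RegexOptions.Singleline` so that `.` matches line breaks inside element values; that option is an assumption, not part of the original pattern.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class AnyElement
{
    // Pattern from the text: 1 = element name, 2 = attributes, 3 = element value.
    const string Pattern = "<([^\\s<>]+)(?:(\\s[^<>]*)?>(.*?)</\\1)?\\s*>";

    // Parses the top-level elements of an XML fragment.
    // An unmatched group (e.g. no attributes) yields an empty string.
    public static List<(string Name, string Attributes, string Value)> Parse(string fragment)
    {
        var result = new List<(string Name, string Attributes, string Value)>();
        foreach (Match m in Regex.Matches(fragment, Pattern, RegexOptions.Singleline))
            result.Add((m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value));
        return result;
    }
}
```

For example, parsing `<root><id>7</id><name lang="en">CCTV-1</name></root>` yields a single `root` element. Parsing that element's value again yields `id` and `name`.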
Element with specified name
"<(" + elem_name + ")(\\s[^<>]*)?>(.*?)</" + elem_name + "\\s*>"
Result groups:
1 - element name
2 - attributes
3 - element value
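A sketch for the specified-name pattern, collecting group 3 of every match. It assumes the .NET regex engine and uses hypothetical names.

The `Regex.Escape` call is an addition to the original concatenation. It keeps names that contain regex metacharacters (e.g. `.`) from being read as regex syntax.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class NamedElement
{
    // Pattern from the text for one element name:
    // 1 = element name, 2 = attributes, 3 = element value.
    static string PatternFor(string elemName)
    {
        string n = Regex.Escape(elemName);
        return "<(" + n + ")(\\s[^<>]*)?>(.*?)</" + n + "\\s*>";
    }

    // Returns the value of every <elemName> element, in document order.
    public static List<string> Values(string xml, string elemName)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(xml, PatternFor(elemName), RegexOptions.Singleline))
            values.Add(m.Groups[3].Value);
        return values;
    }
}
```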
Element with specified name and required attribute
"<(" + elem_name + ")(\\s+(?:[^<>]*?\\s+)*" + attr_name + "\\s*=\\s*(['\"])((?:(?!\\3).)*)\\3[^<>]*)>(.*?)</" + elem_name + "\\s*>"
Result groups:
1 - element name
2 - attributes
4 - required attribute value
5 - element value
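A sketch that returns the required attribute (group 4) and the element value (group 5). Elements without the attribute are simply not matched, and group 3 is only the quote character. It assumes the .NET regex engine; the names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class RequiredAttribute
{
    // Pattern from the text: 1 = element name, 2 = attributes, 3 = quote,
    // 4 = required attribute value, 5 = element value.
    static string PatternFor(string elemName, string attrName)
    {
        string e = Regex.Escape(elemName), a = Regex.Escape(attrName);
        return "<(" + e + ")(\\s+(?:[^<>]*?\\s+)*" + a +
               "\\s*=\\s*(['\"])((?:(?!\\3).)*)\\3[^<>]*)>(.*?)</" + e + "\\s*>";
    }

    // Returns (attribute value, element value) for each element carrying the attribute.
    public static List<(string Attr, string Value)> Find(string xml, string elemName, string attrName)
    {
        var found = new List<(string Attr, string Value)>();
        foreach (Match m in Regex.Matches(xml, PatternFor(elemName, attrName), RegexOptions.Singleline))
            found.Add((m.Groups[4].Value, m.Groups[5].Value));
        return found;
    }
}
```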
Element with specified name and optional attribute
"<(" + elem_name + ")(\\s+(?:[^<>]*?\\s+)*" + attr_name + "\\s*=\\s*(['\"])((?:(?!\\3).)*)\\3[^<>]*|(?:\\s[^<>]*)?)>(.*?)</" + elem_name + "\\s*>"
Result groups:
1 - element name
2 - attributes
4 - optional attribute value
5 - element value
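A sketch for the optional-attribute case. Group 2 is an alternation: first a branch that contains the attribute, then a branch for any other attributes or none. This makes the attribute branch win whenever the attribute is present, wherever it stands in the tag, so group 4 is filled in that case.

A plain optional group after a greedy `(?:[^<>]*?\s+)*` can let the match succeed without capturing the attribute. That is why the sketch uses the alternation instead.

Group numbering is unchanged: 4 = attribute value (null here when absent), 5 = element value. The .NET regex engine is assumed; the names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class OptionalAttribute
{
    // 1 = element name, 2 = attributes, 3 = quote, 4 = attribute value, 5 = element value.
    static string PatternFor(string elemName, string attrName)
    {
        string e = Regex.Escape(elemName), a = Regex.Escape(attrName);
        // Branch 1: attributes containing attrName. Branch 2: other attributes, or none.
        return "<(" + e + ")(\\s+(?:[^<>]*?\\s+)*" + a +
               "\\s*=\\s*(['\"])((?:(?!\\3).)*)\\3[^<>]*|(?:\\s[^<>]*)?)>(.*?)</" + e + "\\s*>";
    }

    // Returns (attribute value or null, element value) for every elemName element.
    public static List<(string Attr, string Value)> Find(string xml, string elemName, string attrName)
    {
        var found = new List<(string Attr, string Value)>();
        foreach (Match m in Regex.Matches(xml, PatternFor(elemName, attrName), RegexOptions.Singleline))
            found.Add((m.Groups[4].Success ? m.Groups[4].Value : null, m.Groups[5].Value));
        return found;
    }
}
```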
Search for an attribute within the attributes group returned by element parsing
"\\s+" + attr_name + "\\s*=\\s*(['\"])(.*?)\\1"
Result group 2 - attribute value
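A sketch of the attribute lookup applied to the attributes string, i.e. group 2 of the element patterns. It assumes the .NET regex engine; the names are hypothetical.

```csharp
using System.Text.RegularExpressions;

static class AttributeValue
{
    // Pattern from the text: group 1 = quote character, group 2 = attribute value.
    // Returns null when the attribute is not present.
    public static string Get(string attributes, string attrName)
    {
        string pattern = "\\s+" + Regex.Escape(attrName) + "\\s*=\\s*(['\"])(.*?)\\1";
        Match m = Regex.Match(attributes, pattern);
        return m.Success ? m.Groups[2].Value : null;
    }
}
```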
Here is a discussion on Stack Overflow about regexes for XML:
http://stackoverflow.com/questions/5204022/regex-for-xml-parsing
Tuesday, 1 March 2011
Thursday, 24 February 2011
CMMB and not well-formed xml
The Chinese mobile TV standard CMMB carries data in XML format.
Unfortunately, broadcasters send data in files that are not well-formed XML.
It is common that the ampersand sign '&' is not in its entity form '&amp;'.
Who knows what else we can find there...
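For the unescaped '&' problem, one possible workaround is to escape every '&' that does not already start an entity or character reference, then hand the file to a real parser. A minimal C# sketch, assuming the .NET regex engine; the class name is hypothetical.

Any `&name;` is left alone as if it were an entity, even though XML predefines only `&amp;`, `&lt;`, `&gt;`, `&apos;` and `&quot;`.

```csharp
using System.Text.RegularExpressions;

static class AmpersandFix
{
    // '&' not followed by a named (&amp;), decimal (&#38;) or hex (&#x26;) reference.
    const string BareAmpersand = "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)";

    // Replaces every bare '&' with "&amp;", leaving existing references untouched.
    public static string Escape(string text)
    {
        return Regex.Replace(text, BareAmpersand, "&amp;");
    }
}
```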
Now I know that there is a lot more:
- time is crazy, especially the shift from UTC: sometimes it is +8h, sometimes -8h, sometimes 0, and it differs across the country, with special "cases" in Hong Kong and Macau,
- moreover, time in DTMB seems to lag behind CMMB (and the correct time) by ~15 min in Shanghai,
- EPGs are not updated properly, sometimes with a delay,
- ...
Sunday, 9 August 2009
Priority inversion interview
Interviews are a great opportunity to test one's own memory and composure during a conversation. They are also good for refreshing some basic terms and problems.
Recently I had to describe the priority inversion problem. Basic stuff :) a lower-priority thread is executed in place of a higher-priority thread. But why? Wait, ..., well, ..., shit, I do not remember.
Why isn't Wikipedia connected to my brain? 3 threads: a low-priority thread holds a mutex that a high-priority thread is waiting for, while a medium-priority thread keeps executing, so the low-priority thread never gets the CPU to release the mutex (Mars Pathfinder problem, priority inheritance, priority ceiling, disabling interrupts).
OK, but what if I want to simulate such a problem in a Windows environment?
After a quick search I found Priority Inversion and Windows NT Scheduler. I realized that:
1. the real-time priority class should be set for the process, to stop the kernel from altering thread priorities,
2. the example should run on one core, in the simple case of 3 threads,
3. on a single-core machine the system will hang (real-time priority), so the example can be run only on a multi-core machine (with the threads restricted to one of the cores).
Example code for priority inversion:
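using System;
using System.Threading;

class PriorityInversion
{
    // Lock shared by t1 (low priority) and t2 (high priority).
    static private object o = new object();

    // Enters the critical section and holds it for 5 s.
    static void tf(object p)
    {
        string n = (string)p;
        Console.WriteLine(n + " critical section needed");
        lock (o)
        {
            Console.WriteLine(n + " critical section entered");
            Thread.Sleep(5000);
            Console.WriteLine(n + " after sleep");
        }
        Console.WriteLine(n + " critical section left");
    }

    // Medium-priority busy loop: keeps the single core occupied, so t1 cannot
    // finish its critical section and t2 keeps waiting for the lock.
    static void tf2(object p)
    {
        string n = (string)p;
        Console.WriteLine(n + " start");
        for (int i = 0; i < 1000000; ++i)
            for (int j = 0; j < 1000000; ++j)
                ;
        Console.WriteLine(n + " stop");
    }

    static void Main(string[] args)
    {
        // Pause so affinity (one core) and the real-time priority class can be set.
        Console.ReadLine();
        Thread t1 = new Thread(tf);
        t1.Priority = ThreadPriority.BelowNormal;
        Thread t2 = new Thread(tf);
        t2.Priority = ThreadPriority.AboveNormal;
        Thread t3 = new Thread(tf2);
        t3.Priority = ThreadPriority.Normal;
        t1.Start("t1");
        Thread.Sleep(10); // let t1 take the lock before t2 and t3 start
        t2.Start("t2");
        t3.Start("t3");
    }
}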
The program calls Console.ReadLine() at the beginning so that the user can restrict the process affinity to a single core and set the process priority class to real-time. If these conditions are not met, priority inversion will not appear.
Affinity and priority can be changed in Windows Task Manager. To see the threads inside the process, use Process Explorer from Sysinternals (now hosted by Microsoft).
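Instead of waiting on Console.ReadLine() and using Task Manager, the process can try to set both conditions itself through `System.Diagnostics.Process`. A sketch with hypothetical names.

The real-time class normally requires administrator rights, and without them Windows may grant a lower class instead. Check the result in Process Explorer.

```csharp
using System;
using System.Diagnostics;

static class InversionSetup
{
    // Affinity mask with only the bit of the given core set (core 0 -> 1, core 2 -> 4).
    public static long SingleCoreMask(int core)
    {
        if (core < 0 || core >= Environment.ProcessorCount || core >= 64)
            throw new ArgumentOutOfRangeException(nameof(core));
        return 1L << core;
    }

    // Pins the current process to one core and requests the real-time priority class.
    // Intended for Windows; real-time usually needs administrator rights.
    public static void Apply(int core)
    {
        Process self = Process.GetCurrentProcess();
        self.ProcessorAffinity = new IntPtr(SingleCoreMask(core));
        self.PriorityClass = ProcessPriorityClass.RealTime;
    }
}
```

Calling `InversionSetup.Apply(0)` at the top of `Main` in place of Console.ReadLine() would pin the demo to core 0. As with the manual setup, do this only on a multi-core machine.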
Wednesday, 8 July 2009
Domain Specific Language for WWW with Irony - Part 2
In the last post I presented the use of Irony for a file-download DSL. The library and a console application were prepared and presented on CodeProject. This time I have added a GUI (WinForms) and multithreading to make the application really useful. More in the CodeProject article.
Labels:
CodeProject,
multithreading,
winforms application,
WWW DSL
Saturday, 6 June 2009
Domain Specific Language for WWW with Irony
Recently I published an article on CodeProject. I was inspired by the idea of Domain Specific Languages for specific tasks. In the article I presented a DSL used to automate some WWW operations (GET, POST, file download). To solve the problem I used Irony as the DSL interpreter. More available here. More about the great Irony project here.
Labels:
CodeProject,
Domain Specific Language,
DSL,
Irony,
WWW DSL
Saturday, 30 May 2009
C# 4.0 (dynamic types, optional parameters, named arguments, covariance and contravariance)
New features of C# 4.0 are presented in several places.
Ironically, the most exposed one is Doug Holland's post on Intel's blog site :) - a nice and very brief overview.
More info in the Channel9 video by the C# GoF :). The most striking facts are:
- dynamic typing is there for better Office interaction,
- optional parameters and named arguments are evil, provided for VB developers to ease Office development in C#,
- covariance and contravariance (for generic interfaces and delegates) are the only interesting (good) changes in the language, but they should have appeared in earlier versions of C#.
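A small C# sketch of the four features; the class and method names are hypothetical. `dynamic` needs the C# runtime binder (Microsoft.CSharp). That is a separate assembly reference on .NET Framework and part of the default framework on .NET Core and later.

```csharp
using System;
using System.Collections.Generic;

static class CSharp4Sketch
{
    // Optional parameters: width and pad have defaults, so callers may omit them
    // or pass them by name in any order (named arguments).
    public static string Pad(string text, int width = 10, char pad = ' ')
    {
        return text.PadLeft(width, pad);
    }

    // dynamic: '+' is resolved at run time from the actual argument types.
    public static object Add(dynamic a, dynamic b)
    {
        return a + b;
    }

    // Covariance: IEnumerable<out T> lets a sequence of strings be used as objects.
    public static int CountAsObjects(IEnumerable<string> names)
    {
        IEnumerable<object> objects = names;
        int count = 0;
        foreach (object o in objects) count++;
        return count;
    }

    // Contravariance: Action<in T> lets a handler of object be used for strings.
    public static string HandleString(string s)
    {
        string seen = null;
        Action<object> handler = o => seen = o.ToString();
        Action<string> stringHandler = handler;
        stringHandler(s);
        return seen;
    }
}
```

For example, `Pad("7", pad: '0', width: 3)` passes the arguments by name in swapped order and returns "007".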
Thursday, 21 May 2009
volatile - what does it mean? (in C++, C, .NET and Java)
Well, it is a good feeling that my interests are also Herb Sutter's interests (in the programming area, of course :). In his article on Dr. Dobb's he presents the differences between the meanings of volatile in different worlds (C++/C vs. .NET/Java).
Main points:
- volatile in C++ is about optimization of accesses to the variable - no optimization is allowed. Whether operations on nearby non-volatile variables can be moved before or after a volatile operation depends on the compiler.
- operations on volatile in C++ do not guarantee atomicity - the solution is atomic<T> in C++ (or e.g. atomic_int in C), available in Boost (but I cannot find it) and coming in C++0x.
- volatile in managed environments (.NET, Java) does not allow some operations on nearby non-volatile variables to move - "ordinary reads and writes can't move upward across (from after to before) an ordered atomic read, and can't move downward across (from before to after) an ordered atomic write. In brief, that could move them out of a critical section of code, and you can write programs that can tell the difference."
volatile in .NET/Java:
- keeps order - usable for lock-free programming,
- allows some code optimizations - not good for interaction with hardware; but BTW managed environments do not allow treating memory as a resource to write wherever the programmer likes - unmanaged code has to be used for that (e.g. C++ with its volatile :).
volatile in C++/C:
- might not keep order - depends on the compiler,
- does not allow code optimization - good for interaction with hardware.
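A C# sketch of the .NET side, with hypothetical names. A `volatile` flag publishes an ordinary field: the volatile write has release semantics and the volatile read has acquire semantics. `Interlocked` covers the atomic read-modify-write that `volatile` does not provide.

C# rejects `volatile long`, so the 64-bit counter uses `Interlocked.Add` and `Interlocked.Read`, which are atomic on 32-bit platforms too.

```csharp
using System.Threading;

class VolatileSketch
{
    private int data;             // ordinary field
    private volatile bool ready;  // volatile flag guarding 'data'
    private long total;           // 'volatile long' is not allowed in C#

    // Writer: the ordinary write to 'data' cannot move below the volatile write.
    public void Publish(int value)
    {
        data = value;
        ready = true;
    }

    // Reader: the ordinary read of 'data' cannot move above the volatile read.
    public bool TryRead(out int value)
    {
        if (ready)
        {
            value = data;
            return true;
        }
        value = 0;
        return false;
    }

    // volatile gives ordering, not atomic ++: Interlocked does the read-modify-write.
    public void Add(long amount) { Interlocked.Add(ref total, amount); }

    // Atomic 64-bit read, also on 32-bit platforms.
    public long Total() { return Interlocked.Read(ref total); }

    // Publishes from another thread and spins until the value is visible here.
    public static int Handoff(int value)
    {
        var s = new VolatileSketch();
        var writer = new Thread(() => s.Publish(value));
        writer.Start();
        int seen;
        while (!s.TryRead(out seen))
            Thread.Yield();
        writer.Join();
        return seen;
    }

    // Two threads add concurrently; the sum is exact because Add is atomic.
    public static long ParallelSum(int perThread)
    {
        var s = new VolatileSketch();
        ThreadStart work = () => { for (int i = 0; i < perThread; i++) s.Add(1); };
        var a = new Thread(work);
        var b = new Thread(work);
        a.Start(); b.Start();
        a.Join(); b.Join();
        return s.Total();
    }
}
```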
Points I am not sure about:
- atomicity of volatile operations in .NET/Java - for which variable types, e.g. what about 32-bit vs. 64-bit architectures?
- "These (volatile) are suitable for nearly all lock-free code uses, except for rare examples similar to Dekker's algorithm. .NET is fixing these remaining corner cases in Visual Studio 2010" - what are the problems?
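On the Dekker question, my reading (not a quote from the article) starts from the fact that .NET volatile reads and writes are acquire/release operations. A volatile write followed by a volatile read of a different variable may therefore still be reordered (store-load). Dekker-style algorithms depend on exactly that order.

The sketch below uses Peterson's algorithm, a simpler two-thread relative of Dekker's swapped in for brevity. It adds an explicit `Thread.MemoryBarrier()` at that step. The names are hypothetical, and this is an illustration, not a recommended lock.

```csharp
using System.Threading;

// Peterson's mutual exclusion for exactly two threads (ids 0 and 1).
class PetersonLock
{
    private readonly int[] flag = new int[2]; // 1 = thread wants to enter
    private int turn;                          // thread that yields on a tie

    public void Enter(int id)
    {
        int other = 1 - id;
        Volatile.Write(ref flag[id], 1);
        Volatile.Write(ref turn, other);
        // Store-load barrier: without it the read of flag[other] may happen before
        // our write to flag[id] is visible to the other thread, letting both enter.
        Thread.MemoryBarrier();
        var spin = new SpinWait();
        while (Volatile.Read(ref flag[other]) == 1 && Volatile.Read(ref turn) == other)
            spin.SpinOnce();
    }

    public void Exit(int id)
    {
        Volatile.Write(ref flag[id], 0);
    }

    // Two threads increment a plain counter inside the lock; returns the final count.
    public static int Demo(int perThread)
    {
        var l = new PetersonLock();
        int counter = 0;
        var threads = new Thread[2];
        for (int t = 0; t < 2; t++)
        {
            int id = t;
            threads[t] = new Thread(() =>
            {
                for (int i = 0; i < perThread; i++)
                {
                    l.Enter(id);
                    counter++;
                    l.Exit(id);
                }
            });
            threads[t].Start();
        }
        foreach (var th in threads) th.Join();
        return counter;
    }
}
```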