Wednesday, 20 February 2008

mozillaOS vs dillo

dillo 2 really kicks ass. if i had money i'd donate to their cause; the world really needs more variety in browser tech. i'm starting to write one in GTK2HS, but it's more of a tabulator-style data browser than any attempt at supporting SQL, JS, ACID TEST 3.0, and all the other unnecessary stuff that makes firefox take 30 seconds to launch..

here's a pretty simple contest: boot up the ms-1013 from cold power, with no cache. (i've gotta remember the command to flush the cache without rebooting, but i'm messing with ati's proprietary drivers anyway, to see if they stop ctrl-+ from locking up firefox for 10 seconds while it redraws. zoom is plenty fast in Opera, and even in other GTK apps that use Pango, so i don't know why i'm trying the video drivers. it shows how desperate i am for a decent web browser, what with WebKit-GTK failing to compile half the time, and segfaulting soon after a brief period of slow, tabless operation.)

before reboot, i went thru and disabled everything. i mean stuff like this:

user_pref("app.update.enabled", false);
user_pref("browser.fixup.alternate.enabled", false);
user_pref("browser.microsummary.enabled", false);
user_pref("browser.formfill.enable", false);
user_pref("browser.safebrowsing.malware.enabled", false);
user_pref("browser.search.suggest.enabled", false);
user_pref("browser.sessionstore.enabled", false);
user_pref("browser.xul.error_pages.enabled", false);
user_pref("dom.event.contextmenu.enabled", false);
user_pref("dom.storage.enabled", false);
user_pref("extensions.blocklist.enabled", false);
user_pref("extensions.update.enabled", false);
user_pref("javascript.enabled", false);
user_pref("keyword.enabled", false);
user_pref("security.OCSP.enabled", 0);
user_pref("svg.enabled", false);

unfortunately this Acrobat Reader style trick doesn't seem to work on firepig:

this is pure reaction time: i pressed ctrl-q as soon as the window was open, so it includes teardown as well (firefox took about 3 seconds of that, i'd say).

time firefox

real    0m27.864s
user    0m0.635s
sys     0m0.161s

time dillo
real    0m0.091s
user    0m0.030s
sys     0m0.013s
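The launch-time ratio quoted further down can be checked from the two `real` lines. A small Ruby sketch; `time_to_seconds` is a hypothetical helper that assumes the `XmY.ZZZs` format bash's `time` prints. Keep in mind `real` also includes the manual ctrl-q delay, so the ratio is rough:

```ruby
# Convert a bash `time` field such as "0m27.864s" into seconds (hypothetical helper).
def time_to_seconds(field)
  m = /\A(\d+)m([\d.]+)s\z/.match(field.strip)
  raise ArgumentError, "unrecognized time field: #{field.inspect}" unless m
  m[1].to_i * 60 + m[2].to_f
end

firefox = time_to_seconds("0m27.864s")
dillo   = time_to_seconds("0m0.091s")
speedup = firefox / dillo

puts format("dillo launches %.0fx faster", speedup)  # prints "dillo launches 306x faster"
```

Note how small `user` and `sys` are for firefox compared to `real`: most of those 27 seconds are presumably spent waiting on the disk rather than computing.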

launch time quickly creeps back to ~27 seconds, as linux is pretty aggressive about caching recently-accessed files.

if i were going to make an attempt at using mozilla's OS, i'd try to make it boot faster, like initng does: cold power to a dwm desktop and an rxvt-unicode in about 10 seconds, for comparison. that includes the kernel waiting for the damn disk to spin up, among other things like starting the entire OS. emacs starts in about 5 seconds too.

it's after all this that i ask: "what do i _need_ that requires JS?" well.. nothing. i order from amazon about once a month, so i can just borrow someone else's comp for a second and delete my cookies afterwards.

this is the same kind of thing as why i abandoned Redland's RASQAL: basic SPARQL queries were taking 5 seconds, whether on BerkeleyDB, SQLite, or MySQL... i'm sure my db would be considered a joke by most, but for the queries _i_ do, the results come back in 50 ms or so. i think this is part of a larger epidemic in many software projects revolving around 'standards': overspecification (why is there a 23-step if/then case-statement procedure spelled out in HTML5? shouldn't that be in some kind of user-agent spec instead?) complicating the trivial:

- although RDF makes a big deal about # meaning 'these resources are children of this one, because the URI has a certain char in a certain place', it also insists URIs are opaque, so you can't benefit from the implicit tree that's going on there.

- same with /'s: the hash-vs-slash debate only exists because of the assumption that 'http://host/a/b/file.rdf means file.rdf is a file on disk, at this point in the subtree'. but in "follow the W3C rules" RDF APIs you're reduced to doing string regexes on every URI in the database to find the children, or to reinventing your own layer on top of said lib (now in slow triples, with even more indirection via slow python, additionally tunneled thru an HTTP request and a layer of serialization and parsing, all instead of a simple syscall), due to their pathological denial of certain things that they assume in other areas. they could at least be consistent.
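For contrast with regex-scanning every URI, here is a sketch of what treating the URI string as a path buys you: keep the URIs sorted, and all children of a prefix sit in one contiguous run that a binary search can find. This deliberately breaks URI opacity, which is exactly the tension described above; `children` and the sample URIs are made up for illustration, not any RDF library's API:

```ruby
# Return every URI that starts with `prefix`, given a lexically sorted array.
# Strings sharing a prefix are contiguous in sorted order, so one binary
# search finds the start of the run and a short scan collects it.
def children(sorted_uris, prefix)
  start = sorted_uris.bsearch_index { |u| u >= prefix }
  return [] if start.nil?

  found = []
  i = start
  while i < sorted_uris.size && sorted_uris[i].start_with?(prefix)
    found << sorted_uris[i]
    i += 1
  end
  found
end

uris = [
  "http://host/a/b/file.rdf",
  "http://host/a/b/file.rdf#post1",
  "http://host/a/b/file.rdf#post2",
  "http://host/a/c/other.rdf",
  "http://host/z/elsewhere",
].sort

p children(uris, "http://host/a/b/file.rdf#")  # hash-style children: post1 and post2
p children(uris, "http://host/a/")             # slash-style subtree: the first four URIs
```

Each lookup costs O(log n) plus the size of the result, instead of one regex match per URI in the store.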

HTML5 could be serious if it stuck to document markup instead of being a tightly-interdependent clusterfuck of highly-specific user-agent behavior (raising the barrier of entry incredibly high, and likely causing projects like Dillo to ignore the useful bits lest they be buried so deep)

a 30-second launch time is a great punishment (and telltale sign) for ignoring the 'nix philosophy and abstraction sense by building an entire OS inside a single process (then doing that in Javascript (then letting a few Google/Apple-employed Wiki-editors/OS-coders dictate your API)). thousands of VCs seem to disagree with me, though. if i'd found a way to profit off the fact that practically the entire planet is letting arbitrary JS run on their machines, mainly to display ads, i'd consider it a pretty lucky coincidence, and i'd worry how much longer it could last before people switch to better tools and are able to find what they need without going thru a single central index. if there's any doubt i'm completely incompatible with the modern tech industry, how does complete divorce from decades of common abstraction sense and hyperfocus on inverse distribution sum it up? refreshingly, Dillo is grounded in sanity, reason, and respect for common sense, and i love them for it. a 306x faster launch time is just icing on the cake really, but it would have been much harder to achieve if they had tried to implement HTML5..

i think the whole HTML5 thing grows out of some human satisfaction in trying to prescribe/create order. it's the feeling one might get from organizing a closet. i wonder what Hixie's closet looks like. a bunch of nested/interlocking boxes on a single shelf, with empty space everywhere else?

the XHTML2 shelf is looking pretty empty too, with Dillo trying to save Shelley's site to disk instead of letting me read it :/ one more reason i've decided i can't put off writing my own user-agent any longer..

Sunday, 17 February 2008

why i could care less about rubinius

the fact that the 3 groups reimplementing Ruby at the moment seem happy with 1.8, as far as the 'design' of the language goes, is why i'm uninterested in JRuby/Rubinius. at least start with a real foundation instead of a decade-and-a-half-high mound of Guido/Larry/Matz-style random hacks/tweaks and arbitrary distinctions/limitations.

Lua basically said 'Perl, Ruby, PHP, Python, are all the same thing: dynamically-typed garbage-collected object-oriented language interpreters written in C, with extensibility/speed via C function calls'.

saddled with that are a ton of subtle distinctions between each, which alone take half a decade to master/remember. in fact, i know most of them, and i'd surmise most consultants do, out of necessity. it's the same class of knowledge as browser bugs, and there are much better uses of brain cells.

Lua melted off all the fat and left you with an as-minimalist-as-possible-without-pain 'last word' on this particular line of thinking. it was this flushing away of arbitrary 'baked-in' hacks/distinctions, and the distillation down to core elements, that allowed LuaJIT to happen via the work of a single crafty genius, compared to the venture-funded team on Rubinius or the corporate-fat-funded teams on Java VMs.

Lua removed one limitation inherent to the other languages in its class: an enforced OO system. choosing hash tables as its sole data structure, and giving you ways to intercept lookups and modifications on them (metatables), lets you implement as crazy a multiple-inheritance scheme as you want.
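Ruby can approximate the lookup half of this with `Hash.new`'s default block, which runs on every missing-key lookup, much like a Lua `__index` function. A sketch of the idea, not Lua itself; `table_with_parents` and the duck example are illustrative:

```ruby
# A "table" whose missing keys are looked up in each parent, in order:
# a crude multiple-inheritance scheme built only from hashes.
def table_with_parents(*parents)
  Hash.new do |_table, key|
    value = nil
    parents.each do |parent|
      value = parent[key]          # parents may themselves have parents
      break unless value.nil?
    end
    value                          # not stored, so later parent changes stay visible
  end
end

walker  = { "walk" => "waddles" }
swimmer = { "swim" => "paddles", "walk" => "never reached: walker comes first" }
duck = table_with_parents(walker, swimmer)
duck["quack"] = "quack!"           # the duck's own slot

p duck["walk"]   # "waddles"
p duck["swim"]   # "paddles"
p duck["fly"]    # nil
```

Writes are not intercepted here: Lua's `__newindex` has no direct equivalent on a plain Ruby Hash, so only the lookup side of metatables is shown.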

but why enforce 'objects' at all (along with an enforced way to treat them, outside of Lua)? and why enforce hash tables to the exclusion of every other structure?

there's no need: if you just provide a way to create other data structures (which may be used to emulate "objects", whatever that means), they can be designed on demand.

the simplest reconfigurable building block is a tuple. one field is not enough; 3 is too many; 2 is a cons cell, the foundation of Lisp. since we've removed all the enforced concepts above and are left with functions, atomic values, and these primitive tuples, the result lends itself to easier translation into machine code (or different Lisp code), and so there are SBCL, GHC, ocamlopt, and a variety of other compilers. it seems the more flexible environment is also faster. what a lucky win the laws of CS physics bestowed upon this tradeoff, and no one except a few academics is taking advantage of it.
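To make the "two fields are enough" point concrete, a sketch in Ruby (the language used elsewhere in these posts) that builds lists and a `map` out of nothing but 2-field cells; `Cons`, `list`, `cons_map`, and `cons_to_a` are illustrative names:

```ruby
# A cons cell: exactly two fields. nil stands in for the empty list.
Cons = Struct.new(:car, :cdr)

# list(1, 2, 3) builds Cons(1, Cons(2, Cons(3, nil)))
def list(*items)
  items.reverse.reduce(nil) { |tail, x| Cons.new(x, tail) }
end

# map defined purely in terms of car/cdr and recursion
def cons_map(cell, &f)
  return nil if cell.nil?
  Cons.new(f.call(cell.car), cons_map(cell.cdr, &f))
end

# walk the cdr chain to get back a Ruby array, for display
def cons_to_a(cell)
  out = []
  until cell.nil?
    out << cell.car
    cell = cell.cdr
  end
  out
end

xs = list(1, 2, 3)
p xs.cdr.car                               # cadr: 2
p cons_to_a(cons_map(xs) { |x| x * 10 })   # [10, 20, 30]
```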

one can argue limitations are good, and indeed they can provide a common ground and prevent one from 'wasting time' tweaking the implementation. but the reality is you're not forced to do any of that: you can get started right away using someone else's high-level library in Haskell/ML/Scheme/Lisp. often these libraries are more interesting and varied, since they're not so predictably defined by 'objects', strict evaluation, and the side-effecting, interlocking state of objects, which is so actively hostile to concurrency and so on. and you have the option to drop down all the way to cons cells (and indeed below that, if you want to write a compiler extension, a la the variety of GHC stuff out there) at any time, for any reason. as many limitations as possible have been removed, and there's a variety of systems to ensure everything remains consistent. if you don't like Haskell's type system, there's OCaml. if you don't like that, you can try a gradually-typed Scheme or Qi. if you don't like that either, you can use plain old lists and cadadadr to your heart's content, at the risk of runtime errors. it's up to you, not your 'genius' overlords who have probably never written 'map' in their life. even evaluation strategies are no longer fixed, and you can play with lazy, strict, partial, curried, lambda-rewriting macros, etc.

where's Ruby's specification for compiler extensions? lol. the Rubinius and JRuby teams are spending enormous effort to enforce a very specific and limited way of doing things. and because it's powerful enough for writing basic string-concatenating web apps, and Goldman will be able to flip its EngineYard shares for a 10x profit due to the 'cutting-edge talent' and 'growth potential', this is A-OK.

don't forget the inner loop!

i read sections 6.1-6.5 of Knuth (TAOCP vol. 3) when i wanted to start at the utmost basics of searching/sorting implementation.. the current academic papers are way too specialized, and the main open-source DB engines are more interested in explaining how to write a SQL statement, or how much money you'll save switching off Oracle, than in talking about anything interesting.

i've got some primal instinct for whether something is O(1) or O(n^2), but never really considered how much is influenced by how the code is written, regardless of the underlying (overarching?) algorithm affecting the order-of-magnitude scaling. i just figured 'perl/ruby is slow no matter what' and any way is as good as another; with 300 ways of doing anything, how are you supposed to try them all anyway?

since i never had a formal education, i definitely had some confidence issues: was i missing some big chunk of CS? it was comforting to realize there was nothing in the book that seemed completely foreign (alright, except the proofs..), and his optimization strategy boils down to reading code, thinking about it, and making improvements. it gave me confidence to see that his methods amount to plain sucking it up and using your brain for a minute, something within my reach.

back to the Planet example..

the first implementation of the fetch just tries adding every triple, whether it already exists or not. this generates a file stat per triple, along with URI and object construction in Ruby, and destruction of the fetched data. every time the update runs, the disk light glows bright blue for about 10 solid seconds. then it's done.

time is your friend...

user    0m2.920s
sys     0m1.098s

user    0m2.920s
sys     0m1.098s

the kernel is spending a full second just doing file stats for the hundreds of blog posts and their thousands of triples.

the disk grind this creates gives me the same 'ugh' feeling as when firefox takes 30 seconds to boot up.. i know it's unacceptable, but so far i've been too lazy/busy to get around to fixing it. no longer! (replacing firefox is a larger project, to be detailed in later posts)

my thought was 'check each resource to see if it's already there, then write it if need be', constructing a temporary model for querying the Date attributes and for writing into the real model as necessary:

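  def update
    # for each resource in the temporary model: skip it if it already exists, otherwise write its properties
    n.keys.map do |u|
      e.e || e.in_hash(n[u])
    end
  end

  def get
    # collect the parser's triples into r[subject][predicate] => [objects]
    _in do |s,p,o|
      r[s] ||= {}
      r[s][p] ||= []
      r[s][p].push o
    end
  end

  def in_hash h
    # write each (predicate, value) pair onto this resource via self[p,v]
    h.map do |p,vs|
      vs.each{|v| self[p,v]}
    end
  end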

well, the disk grind is gone, but it doesn't seem much faster. hmm. maybe all that object creation/allocation is expensive.

user    0m1.944s
sys     0m0.400s

user    0m1.947s
sys     0m0.360s

user    0m1.982s
sys     0m0.363s

sys time is lower than before, as expected. ruby is mainly setting references rather than constructing objects, for a sizable reduction in work. i really love how consistent 'time' is, and the C ruby VM. JRuby was all over the place, and you had to run it 10 times before it sort of stabilized. 'enterprise-grade' unpredictability..
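The check-then-write approach can be sketched in plain Ruby, with a Hash standing in for the on-disk store and an array of `[subject, predicate, object]` triples standing in for the parser's stream. These names are hypothetical, not Redland's or Hydrogen's API; the point is that existence gets checked once per subject instead of once per triple:

```ruby
# Pass 1: group the stream into subject => { predicate => [objects] },
# the same shape the `get` method above builds.
def group_triples(triples)
  grouped = {}
  triples.each do |s, p, o|
    ((grouped[s] ||= {})[p] ||= []) << o
  end
  grouped
end

# Pass 2: one existence check per subject; write only subjects not yet stored.
# Returns how many existence checks were made.
def update_two_pass(store, triples)
  checks = 0
  group_triples(triples).each do |subject, props|
    checks += 1
    next if store.key?(subject)    # already there: skip it entirely
    store[subject] = props         # new: write all of its properties at once
  end
  checks
end

store = { "post1" => { "title" => ["old post"] } }
triples = [
  ["post1", "title", "old post"],
  ["post2", "title", "new post"],
  ["post2", "date",  "2008-02-17"],
  ["post3", "title", "another"],
]
checks = update_two_pass(store, triples)
p checks          # 3: one per subject, not one per triple
p store.keys      # ["post1", "post2", "post3"]
```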

my instinct (primed by the stuff Knuth pulls out in his MIX optimizations) tells me there's more to be squeezed out of this. i can run the check each time i see a new resource and keep a running tab of the checked resources, while consuming the original stream of triples from the parser.

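  def update
    # e caches each subject's existence check; note ||= re-runs r.e whenever the cached value is false or nil
    _in do |s,p,o|
      (e[s] ||= r.e) || r[p,o]
    end
  end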

this is the kind of small improvement that, while not the difference between n and n^2, is well worth the brief amount of time a code review takes.
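A sketch of the one-pass idea, using hypothetical stand-ins again (a Hash as the store, an array of triples as the parser's stream). One Ruby detail matters here: `x ||= y` re-evaluates `y` whenever the cached value is `false` or `nil`, so a memo built with `||=` never really caches a "doesn't exist" answer. In this sketch's store, that means that once the first triple of a new resource is written, a re-check would report the resource as existing. `Hash#fetch` with a block caches `false` too:

```ruby
# One pass over the triple stream: check each subject's existence the first
# time it appears, remember the answer (including false), and write as we go.
# Returns how many existence checks were made.
def update_streaming(store, triples)
  existed = {}    # subject => was it already in the store before this run?
  checks = 0
  triples.each do |s, p, o|
    seen = existed.fetch(s) do
      checks += 1
      existed[s] = store.key?(s)   # cached even when false
    end
    next if seen
    ((store[s] ||= {})[p] ||= []) << o
  end
  checks
end

store = { "post1" => { "title" => ["old post"] } }
triples = [
  ["post1", "title", "old post"],
  ["post2", "title", "new post"],
  ["post2", "date",  "2008-02-17"],
  ["post3", "title", "another"],
]
checks = update_streaming(store, triples)
p checks              # 3
p store["post2"].keys # ["title", "date"]
```

Had the memo been written as `existed[s] ||= store.key?(s)`, the `date` triple for post2 would be skipped, because the first write makes `store.key?("post2")` true on the re-check.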

user    0m1.374s
sys     0m0.132s

user    0m1.398s
sys     0m0.142s

indeed, ruby is doing even less work than before. presumably the 1 second or so left is mainly Redland's parsing and object creation, which I can't really avoid. let's check whether the blame is really outside my territory:

 %self     total     self     wait    child    calls  name
  9.88      0.75     0.32     0.00     0.43    19595  Class#new
  9.26      3.24     0.30     0.00     2.94        6  Array#map
  9.26      0.30     0.30     0.00     0.00        9  <Module::Redland>#librdf_parser_parse_as_stream
  2.78      0.09     0.09     0.00     0.00     4572  <Module::Redland>#librdf_new_node_from_node
  2.47      0.08     0.08     0.00     0.00    13729  Kernel#class
  2.16      0.07     0.07     0.00     0.00    13263  <Module::ObjectSpace>#define_finalizer
  2.16      0.07     0.07     0.00     0.00     3690  <Module::Redland>#librdf_statement_get_object
  2.16      0.07     0.07     0.00     0.00     3891  <Module::Redland>#librdf_node_get_uri
  1.54      0.05     0.05     0.00     0.00     2688  <Module::Redland>#librdf_new_node_from_literal

re: Array#map, I got a 6x (!) slowdown after factoring some of the subexpressions in Hydrogen into separate functions. i had no idea ruby had that kind of function-call penalty; none of my functions are longer than a single expression and a guard pattern. if more performance were needed here, the next step would be switching to something with a real compiler and type system, and without pointless distinctions between 'blocks', 'procs', and 'methods' needlessly complicating everything from implementations to code to books.
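The function-call cost is easy to measure directly with Ruby's standard `benchmark` library. A sketch comparing one inline block against the same arithmetic split into two tiny methods; `double`, `inc`, and the input size are arbitrary, and the ratio will vary a lot between interpreters (1.8's tree walker vs YARV vs JRuby), so don't expect it to reproduce the 6x above:

```ruby
require "benchmark"

def double(x)
  x * 2
end

def inc(x)
  x + 1
end

data = (1..200_000).to_a
inline = helpers = nil

t_inline  = Benchmark.realtime { inline  = data.map { |x| x * 2 + 1 } }
t_helpers = Benchmark.realtime { helpers = data.map { |x| inc(double(x)) } }

# Same result either way; only the number of method dispatches differs.
puts format("inline %.4fs, helpers %.4fs (%.1fx)", t_inline, t_helpers, t_helpers / t_inline)
```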

there's one more boost: the cpu was running at 0.8 GHz instead of 2 GHz.

echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

user    0m0.673s
sys     0m0.058s

user    0m0.679s
sys     0m0.057s

and voila. the kernel has plenty of time to waste on epolling torrent downloads again. it's too bad there's nothing to leech. i'm thinking of just taking rlslog out of the default list and isohunting Twitch posts.. or making my own flicks
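The `echo` above only sets cpu0's governor. Depending on the cpufreq driver, cores can share one policy or each have their own, so it's worth checking what every CPU actually ended up with. A small Ruby sketch that reads the standard sysfs files; `governors` is a hypothetical helper, and it returns an empty hash where cpufreq isn't exposed (many VMs, for example):

```ruby
# Map each CPU directory name (cpu0, cpu1, ...) to its current cpufreq governor.
def governors(root = "/sys/devices/system/cpu")
  paths = Dir.glob(File.join(root, "cpu[0-9]*", "cpufreq", "scaling_governor")).sort
  pairs = paths.map do |path|
    cpu = File.basename(File.dirname(File.dirname(path)))  # ".../cpu0/cpufreq/scaling_governor" -> "cpu0"
    [cpu, File.read(path).strip]
  end
  pairs.to_h
end

governors.each { |cpu, gov| puts "#{cpu}: #{gov}" }
```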

Wednesday, 13 February 2008


i'm using Hydrogen.

it's a lot faster than Markaby+Builder, which is 50K of _whytf? and broken on recent ruby vintages..

the new Hash syntax matches that of JS if you avoid some other syntactic sugar.. thanks Matz (or whoever) for this and Eigenclass for the heads up
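For reference, the Ruby 1.9 literal form: a bare `key:` inside braces is shorthand for a symbol key, so these two hashes are identical, and the second reads like a JS object literal as long as the keys are plain identifiers:

```ruby
old_style = { :title => "today's headlines", :limit => 64 }
new_style = { title: "today's headlines", limit: 64 }   # same as :title =>, :limit =>

p old_style == new_style   # true
p new_style.keys           # [:title, :limit]
```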

Tuesday, 22 January 2008

element:: planet

the prior half of 'planet in 2 lines':

 E(E::Type,'planet/source').po(E(E::RSS+'channel')).map{|b| b.gr('planet/posts').in}

defined the input pipe. adding some new sources:


gives a good mishmash of random news..

we need a view on the data. how about plain old HTML..

finding the proper resources is a matter of a depth-first search on the date URIs. literals are expressible as URIs, which simplifies the literal-sorting infrastructure down to a single search in the AVL-ish tree of URIs on the fs. incidentally, Kowari/Mulgara uses the same technique, but i decided Hans Reiser could write a better tree, and expose it via a classic 40-year-strong API, than i could. although i'll definitely experiment with the AVL module in the Haskell port, especially for in-memory indexing, if someone hires me or pays me, anyway.. slow-ass Ruby and its C calls are more than fine for my needs..
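A sketch of why encoding dates as path segments turns "newest N posts" into a plain depth-first walk: with fixed-width, zero-padded segments, lexical order equals chronological order, so visiting children in descending order yields the newest items first and the walk can stop after N. A nested Hash stands in for the filesystem tree; `date_path`, `tree_insert`, and `newest_first` are hypothetical, not the `iT` method in the view code:

```ruby
# Encode a timestamp as zero-padded path segments: 2008/01/22/15/56/36.
def date_path(time)
  time.getutc.strftime("%Y/%m/%d/%H/%M/%S").split("/")
end

# Store a post id at the leaf of a nested-hash tree keyed by those segments.
def tree_insert(tree, segments, id)
  leaf = segments.reduce(tree) { |node, seg| node[seg] ||= {} }
  (leaf[:ids] ||= []) << id
end

# Depth-first walk, children in descending key order, so newest posts come first.
def newest_first(node, &visit)
  return enum_for(:newest_first, node) unless visit
  (node[:ids] || []).each(&visit)
  node.keys.grep(String).sort.reverse_each { |seg| newest_first(node[seg], &visit) }
end

tree = {}
tree_insert(tree, date_path(Time.utc(2008, 1, 22, 15, 56, 36)), "post-a")
tree_insert(tree, date_path(Time.utc(2008, 2, 13, 9, 0, 0)), "post-b")
tree_insert(tree, date_path(Time.utc(2007, 12, 31, 23, 59, 59)), "post-c")

p newest_first(tree).first(2)   # ["post-b", "post-a"]
```

Because `first(2)` stops the enumerator early, the walk never descends into 2007 at all.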

html do
  style <<-S
    body {background-color: #444}
    .p {float:left;height:8.8em;position:relative;overflow:hidden;width:20%;-moz-border-radius:0 1em 1em 1em;background-color:#264;color:#fff}
    .t {background-color:#00f;color:white;display:inline}
    img {max-width:100%}
  S
  body do
    h1 "today's headlines"
    E(Date,'planet/posts').iT(64).each do |r|
      div.p :onmouseover => 'this.style.overflow="scroll"' do
        h3.t r.i(RSS+'title')
        # append the raw content:encoded and description fields as-is
        %w{modules/content/encoded description}.each{|c| self << r.i(RSS+c)}
      end
    end
  end
end

what's going on there? first, it's Markaby: HTML as Ruby. we're lazily grabbing the fields we decided we wanted and throwing them into a layout. after a request, it's cached:

T -> [AP]
  GET /planet HTTP/1.1
  Host: m
  User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20071221 BonEcho/ text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
  Accept-Language: en-us,en;q=0.5
  Accept-Encoding: gzip,deflate
  Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
  Keep-Alive: 5
  Connection: keep-alive
  If-Modified-Since: Tue, 22 Jan 2008 15:56:36 GMT
  If-None-Match: "47961234-6fd3-1ec6cd8"
  Cache-Control: max-age=0
T -> [AFP]
  HTTP/1.1 304 Not Modified
  Connection: close
  Date: Tue, 22 Jan 2008 16:19:08 GMT
  Content-Encoding: gzip
  ETag: "47961234-6fd3-1ec6cd8"
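That exchange is a conditional GET: the browser sends back the validators it saw earlier (`If-None-Match` with the ETag, `If-Modified-Since` with a date), and the server answers `304 Not Modified` with no body if nothing changed. A minimal sketch of the server-side decision in Ruby, assuming strong ETag comparison and simplified comma-splitting of the header, and following the HTTP rule that `If-None-Match` takes precedence over `If-Modified-Since`; `conditional_status` is a hypothetical helper, not this site's server code:

```ruby
require "time"   # for Time.httpdate

# Decide between 200 and 304 for a GET, given the request headers and the
# resource's current validators.
def conditional_status(headers, etag:, last_modified:)
  if (inm = headers["If-None-Match"])
    tags = inm.split(",").map(&:strip)
    return (tags.include?("*") || tags.include?(etag)) ? 304 : 200
  end
  if (ims = headers["If-Modified-Since"])
    begin
      return 304 if last_modified.to_i <= Time.httpdate(ims).to_i
    rescue ArgumentError
      # an unparseable date means the header is ignored
    end
  end
  200
end

request = {
  "If-None-Match"     => '"47961234-6fd3-1ec6cd8"',
  "If-Modified-Since" => "Tue, 22 Jan 2008 15:56:36 GMT",
}
p conditional_status(request, etag: '"47961234-6fd3-1ec6cd8"',
                              last_modified: Time.utc(2008, 1, 22, 15, 56, 36))  # 304
```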

there's more that can be done: breaking out the resource preview into its own layout, generalizing the view's data source so that tabulator or other RDF tools can see the same thing we're showing in HTML, etc. this took only a minute to whip up, and the granularity and modularity of the components leave room for bespoke adjustment to the situation..

Sunday, 13 January 2008

diagrams of the day
