(JoSch) I recently wrote a little Python app that lets me read *ALL* Wikipedia articles (German Wikipedia) on my Nokia E70 Symbian phone. It taught me that a 4 GB miniSD card is (with a whole bunch of tweaks) enough for the (German) Wikipedia, and that Symbian programming really is *no* fun. So why not do this for OpenMoko instead?
What about compressing the data? It probably wouldn't allow you to search, but hey. --Minime 16:21, 16 July 2007 (CEST)
- (JoSch) I mentioned compression as an option in the article. Every article would have to be compressed on its own, because seeking a few kilobytes inside a multi-GB compressed file on every lookup would be overkill. A title search could then be done by filename, but I think it's a better idea to keep a separate title-list file.
Compress batches of 10 files or so at a time.
- Compress the data in small portions - say 100K compressed - that can be decompressed in under a second. You probably want to use some sort of page-sorting compressor, so that pages in one batch are similar and compress a bit better.
- It sounds logical that grouping related articles - say, all the electronics ones - will compress better than a random ordering (and in reality it does).
- Then store a search-keyword database into this data.
- Works well.
- I use 'wwwoffle'  to search my browsed web-pages.
- --Speedevil 17:10, 16 July 2007 (CEST)
- Thanks for your ideas - I will consider them!
- --JoSch 17:57, 16 July 2007 (CEST)
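A minimal sketch of the batching scheme suggested above, assuming plain (title, text) pairs as input; the function names, file layout, and the ~100K batch size are illustrative, not part of the actual app:

```python
import gzip
import json
import os

BATCH_TARGET = 100 * 1024  # ~100K of raw text per batch, as suggested above


def build_batches(articles, out_dir):
    """Pack (title, text) pairs into gzip batches; return a title index.

    `articles` is any iterable of (title, text) pairs - ideally pre-sorted
    so that related pages share a batch and compress better.
    """
    os.makedirs(out_dir, exist_ok=True)
    index = {}  # title -> (batch filename, byte offset, length)
    batch, raw_size, batch_no = [], 0, 0
    for title, text in articles:
        raw = text.encode("utf-8")
        batch.append((title, raw))
        raw_size += len(raw)
        if raw_size >= BATCH_TARGET:
            _flush(batch, batch_no, out_dir, index)
            batch, raw_size, batch_no = [], 0, batch_no + 1
    if batch:
        _flush(batch, batch_no, out_dir, index)
    with open(os.path.join(out_dir, "index.json"), "w") as f:
        json.dump(index, f)  # simple stand-in for a title-list file
    return index


def _flush(batch, batch_no, out_dir, index):
    """Write one gzip batch and record each article's offset in the index."""
    name = "batch%04d.gz" % batch_no
    pos = 0
    with gzip.open(os.path.join(out_dir, name), "wb") as f:
        for title, raw in batch:
            index[title] = (name, pos, len(raw))
            f.write(raw)
            pos += len(raw)


def read_article(title, index, out_dir):
    """Decompress one small batch and slice out the requested article."""
    name, pos, length = index[title]
    with gzip.open(os.path.join(out_dir, name), "rb") as f:
        data = f.read()  # a ~100K batch decompresses well under a second
    return data[pos:pos + length].decode("utf-8")
```

A lookup only ever touches one small batch file, so seek cost stays constant no matter how big the whole dump is.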
Bzip is almost certainly a bad idea; it's really quite slow on this class of hardware. On another topic: it may be worthwhile to have one 'core' encyclopedia containing entries like "Germany", "Paris", "1928", combined with a daily or weekly download of 'topical' pages ("Steve Irwin", "Paris Hilton"). This results in much better hit rates for most users.
On compression: the most popular 5000 pages amount to 393M of uncompressed text. Compressing the whole lot as one solid block with gzip -9 gives 88M (gzip -1: 101M). Gzipping each article individually with -9 gives 94M, and gzipping them in batches of 10 again gives 88M.
If the stats supplied are accurate, then this would cover some 80% of a month's searches. Perhaps another 500M might take this to 90%+ --Speedevil 00:14, 18 July 2007 (CEST)
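The effect behind those numbers - redundancy *between* similar articles is only exploited when they share a compression stream - can be illustrated with a toy example (the data here is made up; the 393M/94M/88M figures above come from real page dumps):

```python
import gzip

# Ten short, similar "articles" - they differ only in one digit.
articles = [("Topic %d. " % i) * 500 for i in range(10)]

# Whole corpus compressed as one solid block:
solid = len(gzip.compress("".join(articles).encode("utf-8"), 9))

# Each article compressed on its own (one gzip stream per article):
individual = sum(len(gzip.compress(a.encode("utf-8"), 9)) for a in articles)

print(solid, individual)  # the solid block comes out noticeably smaller
```

Batching sits between the two extremes: most of the cross-article savings of the solid block, while still only decompressing a small chunk per lookup.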
most read articles
Would it be possible, and a good idea, to filter out the most-read articles?
- I added this to the feature list for you ;)
- --JoSch 20:10, 16 July 2007 (CEST)
Is Mokopedia a good name for this project? I also considered:
The "problem" is that this application will run on every Linux box, not only on the OpenMoko platform. Ideally I want to credit both Wikipedia and OpenMoko, without giving the impression that this is the OpenMoko wiki or a Wikipedia editing tool - it's simply an offline viewer for mobile Linux devices.
"Mokopedia" was not the initial name, but I thought the word just sounds nice! ;)
Please also post additional naming ideas and discuss mine - I'm currently making good progress on the code. Just got stuck with the gtkhtml implementation... :-/ --JoSch 20:20, 16 July 2007 (CEST)