Talk:Mokopedia

From Openmoko

Revision as of 23:14, 17 July 2007

(JoSch) I recently programmed a little Python app for reading *ALL* Wikipedia articles (German Wikipedia) on my Nokia E70 Symbian phone. It taught me that a 4GB miniSD card is (with a whole bunch of tweaks) enough for the (German) Wikipedia, and that Symbian programming really is *no* fun. So why not do this for OpenMoko instead?

Compression

What about compressing the data? It probably wouldn't allow you to search, but hey. --Minime 16:21, 16 July 2007 (CEST)

(JoSch) I mentioned compression as an option in the article. It would be necessary to compress every single article on its own, because it would be overkill to seek a few kilobytes in a several-GB compressed file every time. Then a title search could be made by filename, but I think it's a better idea to have a title list file.
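A minimal sketch of the per-article approach described above (the file and directory names are my own, not from the actual app): each article is gzipped on its own, and a separate plain-text title list file maps titles to files, so a title search never has to touch the compressed data.

```python
import gzip
import os

def store_article(base_dir, title, text, title_list):
    """Compress one article on its own, so a lookup only ever
    decompresses a few kilobytes, and record it in the index."""
    fname = title.replace("/", "_") + ".gz"
    with gzip.open(os.path.join(base_dir, fname), "wt", encoding="utf-8") as f:
        f.write(text)
    title_list.append((title, fname))

def write_title_list(base_dir, title_list):
    """A dedicated title list file - faster to search than filenames."""
    with open(os.path.join(base_dir, "titles.txt"), "w", encoding="utf-8") as f:
        for title, fname in title_list:
            f.write(f"{title}\t{fname}\n")

def load_article(base_dir, fname):
    """Decompress exactly one article."""
    with gzip.open(os.path.join(base_dir, fname), "rt", encoding="utf-8") as f:
        return f.read()
```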

Compress batches of 10 files or so at a time.

Compress the data in small portions - say 100K compressed - that can be decompressed in under a second. You probably want to use some sort of page-sorting compressor, so that pages in one batch are similar and will compress a bit better. It sounds logical that electronics-based articles grouped together will compress better than a random selection (whether they do in reality is another matter)...
Then store a search-keyword database alongside this data.
Works well. I use 'wwwoffle' [1] to search my browsed web pages.
--Speedevil 17:10, 16 July 2007 (CEST)
Thanks for your ideas - I will consider them!
--JoSch 17:57, 16 July 2007 (CEST)
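The batching idea above could be sketched like this (a sketch under assumed names, not the actual implementation): articles are concatenated into gzip chunks of roughly 100K compressed, and an index records which chunk holds each article, so a lookup decompresses only one small chunk. Sorting the input so similar pages share a batch is left to the caller.

```python
import gzip

CHUNK_TARGET = 100 * 1024  # aim for ~100K compressed per batch

def build_chunks(articles):
    """articles: list of (title, text), ideally pre-sorted so similar
    pages land in the same batch and compress a bit better.
    Returns (chunks, index) with index mapping title -> (chunk_no, pos)."""
    chunks, index = [], {}
    batch, batch_titles = [], []
    for title, text in articles:
        batch.append(text)
        batch_titles.append(title)
        compressed = gzip.compress("\x00".join(batch).encode("utf-8"))
        if len(compressed) >= CHUNK_TARGET:
            for pos, t in enumerate(batch_titles):
                index[t] = (len(chunks), pos)
            chunks.append(compressed)
            batch, batch_titles = [], []
    if batch:  # flush the final, possibly smaller, batch
        for pos, t in enumerate(batch_titles):
            index[t] = (len(chunks), pos)
        chunks.append(gzip.compress("\x00".join(batch).encode("utf-8")))
    return chunks, index

def lookup(chunks, index, title):
    """Decompress only the one small chunk containing the article."""
    chunk_no, pos = index[title]
    return gzip.decompress(chunks[chunk_no]).decode("utf-8").split("\x00")[pos]
```

Re-compressing the growing batch on every article is wasteful but keeps the sketch simple; a real build script would estimate batch size from the uncompressed length instead.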

Bzip is almost certainly a bad idea; it's really quite slow on this class of hardware. On another topic: it may be worthwhile to have one 'core' encyclopedia containing entries like "Germany", "Paris", "1928", combined with a daily or weekly download of 'topical' pages ("Steve Irwin", "Paris Hilton"). This results in much better hit rates for most users.

On compression: the most popular 5000 pages come to 393M of uncompressed text. Compressing the whole lot as a solid block with gzip -9 results in 88M; gzip -1 gives 101M. Individually gzipping each article with -9 gives 94M, and gzipping them in batches of 10 again gives 88M.

If the stats supplied are accurate, then this would cover some 80% of a month's searches. Perhaps another 500M might take this to 90%+. --Speedevil 00:14, 18 July 2007 (CEST)
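The solid vs. per-article vs. batched comparison above can be reproduced with a small measuring helper (a sketch; the function name and batch size parameter are mine):

```python
import gzip

def compressed_sizes(texts, batch=10, level=9):
    """Return (solid, per_file, batched) compressed byte counts for a
    list of article texts, mirroring the gzip -9 comparison above."""
    data = [t.encode("utf-8") for t in texts]
    # one solid block over everything
    solid = len(gzip.compress(b"".join(data), level))
    # each article gzipped individually
    per_file = sum(len(gzip.compress(d, level)) for d in data)
    # articles gzipped in batches of `batch`
    batched = sum(
        len(gzip.compress(b"".join(data[i:i + batch]), level))
        for i in range(0, len(data), batch)
    )
    return solid, per_file, batched
```

On redundant text the expected ordering is solid <= batched <= per_file, matching the 88M / 88M / 94M figures above: batches share a compression window across similar pages, while per-file gzip pays the header and loses cross-article redundancy.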

most read articles

Is it possible, and a good idea, to filter out the most-read articles?

I added this to the feature list for you ;)
--JoSch 20:10, 16 July 2007 (CEST)

Project Name

Is Mokopedia a good name for this project? I also considered:

  • WikiMobile
  • MobileWiki
  • MokoWiki
  • WikipediaOffline

The "problem" is that this application will run on any Linux box, not only on the OpenMoko platform, but I ideally want to credit both Wikipedia and OpenMoko without giving the impression that this is the OpenMoko wiki or a Wikipedia editing tool - it is simply an offline viewer for mobile Linux devices.

"Mokopedia" was not the initial name, but I thought that this word just sounds nice! ;)

Please also post additional naming ideas and discuss mine - I'm currently making good progress on the programming. I just got stuck with the gtkhtml implementation... :-/ --JoSch 20:20, 16 July 2007 (CEST)
