Posts: 370 | Thanked: 443 times | Joined on Jan 2006 @ Italy
#1
Hi all,

This post is intentionally kept light & clear, so that it can serve as a scratch page updated only with confirmed findings and conclusions. My goal is to find the best configuration (if any) of system files for a given use case. When something is explicitly working (I mean, lots of people report it works for their use case), we can put that single item in a wiki.

Here's the usual
WARNING FOR EVERYBODY - MANY OF THESE TWEAKS COULD LEAD TO A NON-WORKING N900 THAT NEEDS A REFLASH TO RESTORE IT IN CASE OF MISTAKES - SO BE AWARE!!!

Now you can go on reading, you've been warned.

NO DATA CONFIRMED OR AVAILABLE YET

Things under investigation:
VM tweaks
SYSPART / OHMD tweaks
SCHEDULER / SYS BLOCK tweaks
OTHER tweaks
 

Posts: 370 | Thanked: 443 times | Joined on Jan 2006 @ Italy
#2
First of all, if a single document covering the aforementioned matters already exists, please point me in the right direction, it would be very nice of you! Having said that, be aware that this post is a long one.

Lots of comments have been posted during the last year and a half, together with scripts, script collections, and programs later completed with a GUI (thank you DeBernardis & Saturn!!! and all the others that contributed; I hope I thanked all of you whenever I found something useful), but until today I have not found a summary with explanations and conclusions on the many details covered by this subject. One thing we can all agree on - I think - is that most of the lagginess problems that arise on our beloved 900s are due to memory constraints. Probably with 512M of RAM, the number of threads full of people wanting to smash the phone against a wall would be 1/10 or 1/100 of what it is today, knowing that the VSYNC problem will anyway make our 900s feel slow perhaps forever (or until Stskeeps changes his mind and finally decides to try compiling vsync against 2.6.28 kernels... I know, I know, many times you said it is almost impossible because too many changes are required in the source, but we can always hope!).

That said, in a year I tried really a LOT of these tweaks. I am a curious person, and I learnt a lot about the kernel internals of VM and the like. It has been rather rewarding, sometimes frustrating, but for a week now I have really felt that 'urge to change something' going down. To verify it is not a placebo effect, this morning I used my 900 for half a day in its original configuration (not from scratch: with the full bloat of software I have installed, but with no tweaks applied, in its standard configuration except for the overclock), and now I can rather firmly claim that for my use case the difference does exist. I have now happily reverted to the tweaked configuration.

I also made some rough benchmarks, and often the results left me without clear ideas. So I decided the final judge would be the use I make of the phone day by day. On one side this decision is very important because, in spite of everything we could say, we have an N900 to USE it, not only to hack on it. On the other side, it is rather hard to stay objective when travelling in the realm of feelings, so I decided to write this post in order to find other testers willing to compare opinions.

I also installed some test tools from the tools repository, IOSTAT being perhaps the most important. I modified Conky to show dirty pages, writebacks and uninterruptible processes, updated once per second. I started working in a systematic manner around a month ago, but decided not to share anything until I had some clear ideas about what I wanted and was looking for from those experiments. By working, I mean not only changing something and saying "yes, it feels better". I mean methodically changing 1 parameter, firing a test script with a 128M dd to and from the swap partitions, and firing at least three memory hogs (browser with standard pages with flash ads, maps and mobilestellarium, for example) while keeping htop and my modified conky running in the background all day. When testing, this modified conky+htop alone keeps the system load around 1 when the screen is on and Xorg is working, so I hope the stress test is good enough. The battery never reached the 4 hour mark while testing, with the phone always warm... I cannot tell how many times my phone rebooted under such high loads with modified parameters, and especially the number of phone calls I lost while doing those tests!!!
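
For anyone who wants to repeat the 128M write/read timing without dd, here is a minimal sketch of the same idea in Python. It is not the script used for the tests above: the function name `time_write_read` and its parameters are made up for illustration, and it assumes a modern Python 3 interpreter, which is not necessarily what is installed on the device. As with dd, the read figure may be inflated by the page cache, because the file has just been written.

```python
import os
import time


def time_write_read(path, size_mb=128, chunk_mb=1):
    """Write size_mb of zeros to path, read it back, delete the file.

    Returns (write_MB_per_s, read_MB_per_s). Hypothetical helper,
    roughly comparable to a dd write followed by a dd read.
    """
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    n_chunks = size_mb // chunk_mb
    total_mb = n_chunks * chunk_mb

    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_chunks):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # wait until the data has really reached the device
    write_s = max(time.perf_counter() - start, 1e-9)

    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(len(chunk)):  # read until EOF, discarding the data
            pass
    read_s = max(time.perf_counter() - start, 1e-9)

    os.remove(path)
    return total_mb / write_s, total_mb / read_s
```

Calling `time_write_read("/some/mount/point/bench.tmp")` once for a path on the eMMC and once for a path on the card gives two pairs of figures to compare; the path here is only a placeholder.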


I think a definitive conclusion will be almost impossible to reach, because VM organization and prioritization is not a simple matter, and a good part of it is pure math. But reaching at least some confidence that, if I use my N900 as a media server for example, certain modifications will be helpful: I think that's a reasonable goal!

The next post will summarize my tests. The idea is to keep the thread as clear as possible in order to collect all the info in the first post. So if somebody would like to join and share their experiences, please try to follow the scheme I am explaining:
SETUP:
N900
mmc yes/no, which one
stock/modified kernel, which one
USAGE
short description of your use case
TWEAKS APPLIED
divided by the area they affect
WHY THOSE TWEAKS?
Here is the trickiest part. It would be nice to explain WHY and HOW you came to the conclusion that the modifications make the N900 feel better: technical background and kind of evidence (feeling, stress test, benchmark...)

So let's start, hoping somebody will follow me in this crazy job. After all the work I did, my feeling is that Nokia engineers did a very good job on their part, in spite of some comments stating the opposite. Keep in mind they had to provide a resilient machine rather than a top-performing one, optimizing what they had, and considering the kind of device the 900 is, I think they really did a great job.
 

Posts: 370 | Thanked: 443 times | Joined on Jan 2006 @ Italy
#3
With all that said, here follows a summary of my experiences so far.

SETUP:
N900
8GB class 6 uSD card
Power kernel, std overclock 850 (sometimes I go up to 1100 while watching a quick video or using Gnumeric heavily)

USAGE:
Browser (maemo.org, home banking, other forums, no flash video and flash adverts blocked), 4 online IM accounts OR (mutually exclusive) bluetooth tethering for my PC, mediaplayer for OGGs, Sygic maps, games from time to time, calendar and obviously PHONE

TWEAKS:
VM
  • Swap on MMC (the uSD card)
  • modified /proc/sys/vm as follows:
      • swappiness 70
      • dirty_ratio 8
      • dirty_background_ratio 4
      • vfs_cache_pressure 1000
      • page-cluster 6
      • oom_kill_allocating_task 1
      • min_free_kbytes 4096
MMC QUEUE
  • modified /sys/block/mmcblk1/queue as follows:
      • nr_requests 32
      • read_ahead_kb 512
OHMD/SYSPART (modified /usr/share/policy/etc/rx51/syspart.conf)
  • Partitions:
      • partition desktop memory-limit 70M
      • partition desktop cpu-shares 4096
      • partition active_ui memory-limit 130M
      • partition standby_ui memory-limit 95M
      • partition background memory-limit 25M
  • Rules (I have flashlight-extra and panorama installed, therefore I added them to [classify camera]):
      • [classify camera]
        added /usr/bin/flashlight-extra
        added /usr/bin/panorama
      • [classify desktop]
        removed /usr/bin/matchbox-window-manager
        added /usr/bin/hildon-sv-notification-daemon
      • [classify mediarend]
        added /usr/bin/matchbox-window-manager
      • [classify mediasrc]
        removed /usr/bin/hildon-sv-notification-daemon
OTHER
  • modified WSEGL_UseHWSync=1
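
To make the VM and MMC QUEUE items above easier to try and to revert, here is a minimal sketch of how they could be applied from a script. It only covers the /proc/sys/vm and /sys/block/mmcblk1/queue values from the list; the names `VM_SETTINGS` and `apply_settings` are hypothetical, the `root` parameter exists only so the function can be tried against a scratch directory, and a modern Python 3 is assumed. Values written this way do not survive a reboot, and writing to the real files needs root.

```python
import os

# Values copied from the list above; paths are relative to the filesystem root.
VM_SETTINGS = {
    "proc/sys/vm/swappiness": 70,
    "proc/sys/vm/dirty_ratio": 8,
    "proc/sys/vm/dirty_background_ratio": 4,
    "proc/sys/vm/vfs_cache_pressure": 1000,
    "proc/sys/vm/page-cluster": 6,
    "proc/sys/vm/oom_kill_allocating_task": 1,
    "proc/sys/vm/min_free_kbytes": 4096,
    "sys/block/mmcblk1/queue/nr_requests": 32,
    "sys/block/mmcblk1/queue/read_ahead_kb": 512,
}


def apply_settings(settings, root="/", dry_run=True):
    """Return a list of (path, old_value, new_value) for every setting.

    With dry_run=True nothing is written, so the list can be reviewed
    (and the old values saved) before changing anything.
    """
    changes = []
    for rel_path, value in settings.items():
        path = os.path.join(root, rel_path)
        with open(path) as f:
            old = f.read().strip()
        new = str(value)
        if not dry_run and old != new:
            with open(path, "w") as f:
                f.write(new + "\n")
        changes.append((path, old, new))
    return changes
```

Running it once with `dry_run=True` and keeping the returned old values is a cheap way to have something to go back to when a single parameter is changed for a test.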
---------------

MODS EXPLANATION
  • Swap on uSD. I think there is still a lot to understand about this tweak. I ran some dd benchmarks on both the eMMC (on an ext3 partition) and the card, and I can tell the eMMC is slightly faster than my class 6 memory card when reading but slower when writing. Roughly, timings with 128 MB files put the card at approx. 1.2 MB/s write and 14 MB/s read. Read times are probably affected by buffering kicking in at the start of the read, but then we should also consider running processes, system overhead and everything else. The internal memory came in at around 0.95 MB/s write and 16 MB/s read.
    This difference in speed (eMMC vs uSD: eMMC roughly 20% slower writing and 10% faster reading) is more or less confirmed, albeit with different figures, under real usage as swap partitions. IOSTAT near usage peaks during the day showed top figures of 180 KB/s reading and 17 KB/s writing for the eMMC, and 160 KB/s and 20 KB/s respectively for the uSD.
  • The changes in VM management have been tested almost purely by feeling, ranging through the whole scale of values. I tried swappiness 0, dirty ratios very low and very high, and the different philosophies explained in so many threads on TMO. The final settings were tuned by feeling, after I decided to keep memory clean via very low dirty ratios - I noticed that on Maemo never more than 2 pdflush threads are spawned, probably due to the bandwidth bottleneck, so having too much memory to free at once is not a good idea when you suddenly need RAM. With a dirty_background_ratio of 3 or less, I noticed the nr_dirty value was always 0 and the nr_dirty_writebacks value increased every time I did something, so that is probably the point where it became too aggressive. The same procedure was followed for the dirty_ratio setting. Many sources mention that some kernel versions have a lower bound of 5 for these values, but this did not seem to be the case for my kernel.
    I would like some confirmation or refutation of what follows, because it is what I understood from reading about VM settings on the internet. vfs_cache_pressure was greatly increased in order to convince the kernel to almost always discard filesystem cache in favour of free RAM. With solid state memory there is little penalty in reading the values again; it is always better to look for the data again than to start swapping. Reading data from cache or looking for it all around the disk takes about the same time. With so little RAM and no penalty, I prefer to keep in memory some useful pages of user programs instead of their data. To speed up this process, the scheduler for the uSD was switched to NOOP, in order to write data out just as it becomes ready and not use any extra computational power. We don't have moving parts and we don't need elevators!
    Since the HW block size of the uSD is reported by my kernel to be 512K, and every single page is 4K, I set page-cluster to 7. It means that the kernel will try to swap pages in groups of 2^7 = 128 pages * 4 KB each = 512 KB; I wonder if I am missing something, because Nokia engineers could not have got this wrong. If anyone has a pointer, it will be very welcome. Lastly, min_free_kbytes was slightly increased and the OOM setting activated, but to be honest after the modifications in syspart.conf I never saw it kicking in. The uSD nr_requests was lowered after reading the comments on I/O pressure. In my use cases it made no perceptible difference, and benchmarks do not show a difference either. Read-ahead was set equal to the size of the physical block. Honestly, r/w benchmarks did not show any difference when changing these values and I don't know if they are of any benefit globally.
  • Then come the ohmd/syspart modifications. I have to say that these, combined with moving the swap to uSD, are the things that made my day. I don't know if everybody has the same tremendous slowness when a notification has to arrive. You know, those 4/5 seconds in which you see your phone stop all of its activities and you wait to feel it rumble, and when it rumbles you know that - if everything goes well - in 3 or 4 seconds an annoying yellow balloon will appear at the top of the screen. It could be an SMS, or an IM, who knows? If at this moment somebody decides to call you and you have an uptime of more than 10 hours and perhaps one or two browser windows open in the background, there is no way you will be able to answer, and you will have to call back that unfortunate girl who had that urgent need to hear your voice...
    Nokians decided to put the notification daemon in the group of essential services, such as telepathy or gstreamer. Well, if I see a yellow balloon 5 seconds later, it is not a problem for me, is it? So I slightly reorganized the assignment of syspart partitions (and thus priorities), also taking the occasion to promote matchbox (the window manager). I also dedicated a little less memory to applications and a little more to desktop and essential services. Everything will be clear if you take some time to read the values I put in /usr/share/policy/etc/rx51/syspart.conf. I also reduced the CPU slice of the desktop group.
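
Going back to the page-cluster reasoning in the explanation above, the arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation, with made-up helper names, and it assumes the 4 KB page size mentioned in the text. As far as the kernel documentation goes, page-cluster is the base-2 logarithm of the number of consecutive pages read from swap in one attempt.

```python
def swap_cluster_kb(page_cluster, page_kb=4):
    """Size in KB of one swap group: 2**page_cluster pages of page_kb each."""
    return (2 ** page_cluster) * page_kb


def page_cluster_for(block_kb, page_kb=4):
    """Smallest page-cluster value whose group covers block_kb."""
    n = 0
    while swap_cluster_kb(n, page_kb) < block_kb:
        n += 1
    return n
```

With these helpers, `swap_cluster_kb(7)` is 512 and `page_cluster_for(512)` is 7, matching the calculation in the text, while `swap_cluster_kb(6)` is 256, i.e. half of the reported 512K block.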

FINAL COMMENTS
  • The modifications in syspart have had no visible drawbacks for me UNTIL NOW - so far so good. Feedback on that will be greatly appreciated.
  • Moving swap to uSD worked for me (TM) - it would be nice to understand WHY, because the difference between internal memory and card is not so big. We really need some more details from somebody who knows the N900 hardware very well (STSKeeps??? Where are you?!?!?)
  • VM values are a balancing act and are probably the ones most in need of adapting to use cases. After lowering the dirty ratios and increasing vfs pressure for the aforementioned reasons, I slowly increased swappiness until I could see, in real time, a certain balance between nr_dirty and nr_dirty_writebacks with the system under high load.
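
For those without a modified Conky, the same kind of dirty/writeback watching can be sketched in Python by reading /proc/vmstat. This is an assumption about where the numbers come from, not a description of the Conky setup used above: on the kernels I know of, the counters are named `nr_dirty` and `nr_writeback`, and the function names below are made up.

```python
def parse_vmstat(text):
    """Turn the 'name value' lines of /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def dirty_and_writeback(text):
    """Return (nr_dirty, nr_writeback) in pages; 0 if a counter is missing."""
    counters = parse_vmstat(text)
    return counters.get("nr_dirty", 0), counters.get("nr_writeback", 0)


def read_vmstat(path="/proc/vmstat"):
    """Read the raw text of /proc/vmstat (Linux only)."""
    with open(path) as f:
        return f.read()
```

Calling `dirty_and_writeback(read_vmstat())` once per second while changing swappiness gives the two numbers to watch for the balance described above.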

With the settings reported here and my use case, I have seen 14 uninterruptible processes in the queue and the processor at 100% @850, with a reaction time never more than 3-4 s. During the last week only once did I have to leave the N900 to settle for some minutes before it became responsive again. Try to launch, without waiting between your clicks, microb, contacts (my list is over 580 buddies), mediaplayer, angry birds, calendar, bounce evolution, mobilestellarium, gnumeric, and panorama - you get it!
But the best thing happened this morning - I was testing with tons of apps active, going back and forth between them, system load was over 4, 12 D processes, processor ranging from 50 to 100%, I was messaging and the phone rang - OK, I thought, let's see who I will have to call back now... and 1 second later the phone interface appeared! At that point I decided it was time to post on TMO.
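
The "D processes" figures quoted above can be approximated without htop by scanning /proc. The sketch below is only an illustration with hypothetical function names: it counts processes (not individual threads) whose state letter in /proc/<pid>/stat is D, so the number may differ slightly from what htop or Conky report.

```python
import os


def state_from_stat(stat_line):
    """Return the one-letter state from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so take the first field after the LAST ')'.
    after_comm = stat_line[stat_line.rfind(")") + 1:]
    return after_comm.split()[0]


def count_uninterruptible(proc_root="/proc"):
    """Count processes currently in uninterruptible sleep (state D)."""
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                if state_from_stat(f.read()) == "D":
                    count += 1
        except (OSError, IndexError):
            continue  # the process exited while we were looking
    return count
```

Logging `count_uninterruptible()` next to the first field of /proc/loadavg during a stress test gives figures that can be compared between testers.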

So that's all folks! I hope this is only the start of a constructive process of trying to better understand the internals of the 900, and at the same time the start of a good 'optimization based on use cases' wiki, or better still, some CSSU package variants based on use cases, which any user could then choose!

Cheers, everybody.

PS: please, don't blame me too much for grammar and English mistakes - English is not my native language!
EDIT - And thank you for your patience if you read everything - I tried to clean up the formatting a little after vi_'s suggestion

Last edited by jurop88; 2011-03-16 at 21:16. Reason: cleaning and formatting
 

Posts: 1,680 | Thanked: 3,685 times | Joined on Jan 2011
#4
Holy WALL OF TEXT bro. Please, some formatting!
__________________
N900: One of God's own prototypes. A high-powered mutant of some kind never even considered for mass production. Too weird to live, and too rare to die.
 

Posts: 370 | Thanked: 443 times | Joined on Jan 2006 @ Italy
#5
Tried to clean up a little bit; a lot of text meant a lot of formatting. TY for the suggestion, I hadn't thought about that.
 

Posts: 1,680 | Thanked: 3,685 times | Joined on Jan 2011
#6
Whoa. Mind. Blown.
 
Posts: 140 | Thanked: 40 times | Joined on Sep 2010
#7
Amazing post jurop88, this should be implemented in swappolube.
 
hawaii's Avatar
Posts: 1,030 | Thanked: 792 times | Joined on Jun 2009
#8
Originally Posted by vi_
Whoa. Mind. Blown.
Kaboom.

Great finds here with the modifications to process priorities in ohmd. I've had pulseaudio and the other media processes at a high priority for a few months now in order to reduce jitter - but never felt the need to play around more.

Thanks for the testing you've done. Seriously.
 
hawaii's Avatar
Posts: 1,030 | Thanked: 792 times | Joined on Jun 2009
#9
Moving hildon-sv-notification-daemon out of [mediasrc] closes the socket and doesn't allow any sound?
 
Posts: 2,014 | Thanked: 1,581 times | Joined on Sep 2009
#10
Made some of these changes and will see how it pans out over the next few days. I am a fairly heavy user so it will be interesting.
__________________
Class .. : Power Poster, Potential Coder
Humor .. : [*********] Alignment: Chaotic Evil
Patience : [***-------] Weapon(s): +2 Logic Mace
Agro ... : [*****-----] Relic(s) : G1, N900

 
