majestic12.co.uk dsearch Forum Index
Distributed Search forum
 
Struggling? TRY THIS
OldChap
Senior Member


Joined: 26 May 2010
Posts: 427
Location: Plymouth UK

Posted: Sun Aug 26, 2012 8:10 pm    Post subject: Struggling? TRY THIS

If you are struggling to keep your nodes running at anywhere near their normal rate, consider this as an interim workaround, in the sure knowledge that Alex will make the necessary changes in due course.

I tried a number of ways to keep my crawl rate up and eventually settled on the following settings on a 100/5 connection (which is in fact limited to around 54/5 during the daytime).

Please note that my 5Mb upload limit is being pushed to the max at night, so the 100 (download) part never comes into play.

I have a single rig running MJ. It is an i7-920 with 16GB of RAM. The OS is on a 250GB WD 7200 rpm HDD. The nodes are on a single Intel 520 120GB SSD, which has 20% over-provisioning.
I have a 4GB ramdisk in use.

The OS is Linux Mint. The nodes are using Refic's ready-built solution.

Each node is set up with 225 workers and 75 buckets just now. This might still be adjusted for better success ratios. All other settings are the same as I used when 5 nodes would keep my connection busy.

The key is to make sure that you don't get too many buckets open on each node: no more than 75 for sure, and you might do better with a few fewer. I would go as far as to say that somewhere in the 50s seems ideal for me.

Now the surprise to me was just how many nodes it takes....

To keep my 5Mb upload full takes around 18-20 nodes
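
As a rough back-of-the-envelope check on those figures, here is a minimal Python sketch. It assumes upload scales linearly with node count, which the post does not claim, and the function names and the 10Mb target are made up for illustration.

```python
import math

def upload_per_node(upload_mbit, nodes):
    """Average upload share per node, assuming every node contributes equally."""
    return upload_mbit / nodes

def scale_nodes(nodes, current_upload, target_upload):
    """Nodes needed for a different upload cap, assuming linear scaling (rounded up)."""
    return math.ceil(nodes * target_upload / current_upload)

# Figures from the post: a 5Mb upload is filled by roughly 18-20 nodes.
low = upload_per_node(5, 20)
high = upload_per_node(5, 18)
print(f"per-node upload: {low:.2f}-{high:.2f} Mb")

# Hypothetical target: the same settings on a 10Mb upload.
print("nodes for 10Mb:", scale_nodes(18, 5, 10), "to", scale_nodes(20, 5, 10))
```

Treat the result as a starting point only; the real figure depends on bucket quality and the per-node settings above.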

I am back to turning off at 17:30 local, for the first time in quite a while, so as not to exceed the daytime limit imposed by my ISP.

I have my nodes auto-restarting every 24 hours, which helps to ensure that no bucket becomes too old.
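
The 24-hour restart rule can be sketched as a simple age check. This is only an assumption about how one might script it, not how the MJ12 client or Refic's package actually does it; `needs_restart`, `due_for_restart` and the start times are hypothetical.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(hours=24)  # restart interval quoted in the post

def needs_restart(started_at, now, max_age=MAX_AGE):
    """True once a node has been running for at least max_age."""
    return now - started_at >= max_age

def due_for_restart(start_times, now):
    """Names of nodes whose uptime has reached the limit, sorted by name."""
    return sorted(name for name, started in start_times.items()
                  if needs_restart(started, now))

# Hypothetical start times for three node folders.
now = datetime(2012, 8, 27, 20, 0)
start_times = {
    "1": datetime(2012, 8, 26, 19, 0),  # 25 hours ago
    "2": datetime(2012, 8, 26, 20, 0),  # exactly 24 hours ago
    "3": datetime(2012, 8, 27, 8, 0),   # 12 hours ago
}
print(due_for_restart(start_times, now))
```

A cron job or wrapper script could run a check like this and restart only the nodes it lists, rather than all of them at once.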



Is it perfect? No, but maybe this will work for a few of you guys, especially if you leave one node running normally in order to observe a return to normal crawling.

I am hoping that using this method will give Alex a break and give you guys the performance you crave.

Maybe I should have kept quiet and got a bigger slice of the pie this quarter :shock: but let's get these horrid buckets done and dusted ;)


Last edited by OldChap on Sun Aug 26, 2012 10:12 pm; edited 2 times in total
rilian
Senior Member


Joined: 30 Mar 2008
Posts: 1074

Posted: Sun Aug 26, 2012 8:54 pm

Do you just run several nodes in separate folders, like

.../nodes/node1/*
.../nodes/node2/*

?
_________________
I crunch&crawl for Ukraine
OldChap
Senior Member


Joined: 26 May 2010
Posts: 427
Location: Plymouth UK

Posted: Sun Aug 26, 2012 9:46 pm

Yes. In my particular instance the SSD is mounted and named Fast MJ. Inside that I have two folders: one is Lost & Found, the other is TestMJ, and in that there are 20 individual folders named, believe it or not, 1 to 20. :)
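
A layout like that could be created with a few lines of Python. This is only a sketch: the base path is a placeholder (a temporary directory here, so it is safe to run anywhere), and `make_node_dirs` is a made-up helper, not part of the MJ12 client or Refic's package.

```python
import tempfile
from pathlib import Path

def make_node_dirs(base, count=20):
    """Create folders named 1..count under base and return their paths."""
    base = Path(base)
    dirs = []
    for n in range(1, count + 1):
        d = base / str(n)
        d.mkdir(parents=True, exist_ok=True)  # safe to re-run
        dirs.append(d)
    return dirs

# Placeholder standing in for the TestMJ folder on the SSD.
with tempfile.TemporaryDirectory() as tmp:
    node_dirs = make_node_dirs(Path(tmp) / "TestMJ")
    print(len(node_dirs), "folders, from", node_dirs[0].name, "to", node_dirs[-1].name)
```

Each folder would then hold its own copy of the node files, so that the nodes do not share state.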

You may have to experiment to find which setting for Options>cpu usage tuning>minimise disk activity works best for you.

Edit:

If you are wondering about what I mean by being upload limited at night, I proffer this pic:



The combined upload of results and of requests for individual websites seems to allow for only 60-65Mb of download. Apparently, getting my upload up to 10Mb could take many months :evil:
rilian
Senior Member


Joined: 30 Mar 2008
Posts: 1074

Posted: Sun Aug 26, 2012 11:39 pm

I have previously used VMware nodes, but Linux looks quicker to support and less buggy. Thanks!
_________________
I crunch&crawl for Ukraine
OldChap
Senior Member


Joined: 26 May 2010
Posts: 427
Location: Plymouth UK

Posted: Mon Aug 27, 2012 12:04 am

On my WCG crunchers I am running Linux Mint Cinnamon to good effect. I am not sure what difference it makes here, but that one is based on the 3.2.0 kernel, whilst I believe that the normal Mint I have on the nodes box uses 3.0.0.

An aside for those interested in how much processor is being used: I am running 4 cores on WCG on this machine too and getting quite reasonable completion times. The CPU is clocked at 4.0GHz, so I am sure this would run MJ comfortably at stock.
Deadly_Fire
Senior Member


Joined: 28 Jan 2008
Posts: 1589

Posted: Mon Aug 27, 2012 2:40 am

I'm going to give Mono another try (fingers crossed). I'm using Xubuntu 12.04.1 LTS 64-bit w/ 3.2 kernel. What distro are you guys running?
Quix0r
Junior Member


Joined: 28 Dec 2010
Posts: 35
Location: Krefeld, Germany

Posted: Sun Sep 09, 2012 1:59 am

Debian AMD64 with 32-bit (on one node) and 64-bit MJ12 code. :)
BaalMcKloud
Junior Member


Joined: 31 Jan 2011
Posts: 37
Location: Germany

Posted: Sun Sep 09, 2012 9:21 am

3 x Debian 6 "Squeeze" x64
2 x Debian 7 "Wheezy" x86

:D :D
rilian
Senior Member


Joined: 30 Mar 2008
Posts: 1074

Posted: Sun Sep 09, 2012 10:58 pm

Set up 2 nodes on 1 Ubuntu machine. I'll see if it works well tomorrow :)
_________________
I crunch&crawl for Ukraine
Deadly_Fire
Senior Member


Joined: 28 Jan 2008
Posts: 1589

Posted: Mon Sep 10, 2012 12:10 am

Anyone who is running Ubuntu or any derivative of it, do you get any time drift, i.e. the clock falls behind over time? Wondering whether to ignore it, or whether a wrong system clock affects nodes at all :?
arni
Senior Member


Joined: 24 May 2005
Posts: 613

Posted: Mon Sep 10, 2012 12:13 am

Deadly_Fire wrote:
Anyone who is running Ubuntu or any derivative of it, do you get any time drift, i.e. the clock falls behind over time? Wondering whether to ignore it, or whether a wrong system clock affects nodes at all :?


apt-get install ntp

;)
Deadly_Fire
Senior Member


Joined: 28 Jan 2008
Posts: 1589

Posted: Mon Sep 10, 2012 12:23 am

arni wrote:

apt-get install ntp

;)


Yep, first thing I thought of :P I've added multiple time servers and tried manually forcing an ntp update (time-consuming, but it works). Right now I have an ntpsync job in cron.hourly; it doesn't seem to work 100% of the time, but it's an improvement.
[eNeRGy]
Senior Member


Joined: 22 Jul 2008
Posts: 417
Location: Nijmegen, The Netherlands

Posted: Mon Sep 10, 2012 7:25 am

Which hypervisor do you use?

I know the older Hyper-V versions had some issues, and you could fix those with some kernel properties.

http://blog.hompus.nl/2010/01/08/correcting-time-drift-with-centos-on-hyper-v/
rilian
Senior Member


Joined: 30 Mar 2008
Posts: 1074

Posted: Mon Sep 10, 2012 11:45 am

1) How do you keep the zoo of nodes up on the Linux machine? Do you experience situations where some nodes just crash from time to time?

2) Do you run the nodes with nohup ./run.sh & ?

3) Is it possible to make the nodes on one machine use different DNS servers?
_________________
I crunch&crawl for Ukraine
Cow_tipping
Senior Member


Joined: 08 Jul 2008
Posts: 955
Location: On the Run

Posted: Mon Sep 10, 2012 12:15 pm

I have 1 stable Linux node and a second folder ready for the second node, however I can't get as far as entering the registration details.
I assume I would need a webserver for each node using a different port number, so I wanted to change the webserver's port number and then start up, but somehow I ended up with port 10.
Trying to change to a different port number seems to be interrupted by stray keystrokes: I start with P, wait 2 seconds, and a 4 pops up; entering a port number, e.g. 1090, gets interrupted by the registration details comment; I press Enter and a ] symbol appears; I press Enter again and finally get the remark that a new port number has been entered. Unfortunately, at this point the port number is something like 9 or 19, not 1090, and the webserver won't use this setting.

Must be my lack of Linux skills, or else something is interfering.
Distro is Ubuntu v12 32-bit with Refic's package.
All times are GMT + 2 Hours
Page 1 of 4

 