MapReduce framework Disco (12)
by tmielika (13) on High Scalability - Building bigger, faster, more reliable websites. (68); 1 day, 8 hours ago
Disco is an open-source implementation of the MapReduce framework for distributed computing. It was started at Nokia Research Center as a lightweight framework for rapid scripting of distributed data processing tasks. The Disco core is written in Erlang. MapReduce jobs in Disco are natively described as Python programs, which makes it possible to express complex algorithmic and data processing tasks, often in only tens of lines of code.
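To illustrate why a MapReduce job fits in a few lines of Python, here is a minimal single-process word count. It is only a sketch of the map/reduce shape: the names map_words, reduce_counts and run_job are made up for this example and are not Disco's API, and nothing here is distributed.

```python
from collections import defaultdict
from itertools import chain


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def run_job(lines):
    """Run the map step over every line, then reduce the combined output.

    A real framework would spread both steps across worker nodes;
    here everything runs in one process (an assumption of this sketch).
    """
    mapped = chain.from_iterable(map_words(line) for line in lines)
    return reduce_counts(mapped)


if __name__ == "__main__":
    print(run_job(["Disco is written in Erlang", "jobs in Disco are Python"]))
```

In a real framework the user would supply only the two small functions, and the framework would handle input partitioning, scheduling and collecting results.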
Re: Hadoop over Lustre? (1)
by Joel Welling (0) on core-user@hadoop.apache.org Archives (0); 6 days, 17 hours ago
That seems to have done the trick! I am now running Hadoop 0.18 straight out of Lustre, without an intervening HDFS. The unusual things about my hadoop-site.xml are:
<property>
  <name>fs.default.name</name>
  <value>file:///bessemer/welling</value>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>${fs.default.name}/hadoop_tmp/mapred/system</value>
  <description>The shared directory where MapReduce stores control files.</description>
</property>
where /bessemer/welling is a directory on a mounted Lustre filesystem. I then do 'bin/start-mapred.sh' (without starting dfs), and I can run Hadoop programs normally. I do have to specify full ...
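For anyone who wants to script this setup, the two properties quoted in the message could be generated rather than typed by hand. This is a rough sketch using only the Python standard library; the function name build_site_config is hypothetical, and the configuration root element is an assumption about the surrounding file, which the message does not show.

```python
import xml.etree.ElementTree as ET


def build_site_config(lustre_dir):
    """Return hadoop-site.xml text holding the two properties from the message.

    lustre_dir is an absolute path on the mounted Lustre filesystem,
    for example /bessemer/welling.
    """
    # Assumption: the properties live under a <configuration> root element.
    root = ET.Element("configuration")
    properties = [
        ("fs.default.name", "file://" + lustre_dir),
        ("mapred.system.dir", "${fs.default.name}/hadoop_tmp/mapred/system"),
    ]
    for name, value in properties:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_site_config("/bessemer/welling"))
```

The ${fs.default.name} reference is written out literally so that Hadoop, not Python, expands it when the file is read.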
PROOF < ROOT < TWiki (1)
1 week ago
Sort of like Symphony, in that it's intended to support interactive applications that aren't necessarily best suited to a batch system.
Cloud Computing Jobs: Cloud enablers needed, US and Europe (1)
1 week ago
I guess Univa has gone from Grid to cluster to Cloud now.
Google's MapReduce suddenly not so backward (2)
on The Register (100); 1 week, 1 day ago
SQL tools plug gaps. What was seen as a major hole in Google's MapReduce database technology has been plugged, not once but twice. In the same week.…
TACC software sees TeraGrid as one big resource (1)
by John West (0) on insideHPC (0); 1 week, 1 day ago
I think this is an interesting bit of technology (tip of the hat to HPCwire for the link). The tool, developed at TACC, is called MyCluster and allows researchers to aggregate resources on the TeraGrid such that they appear to be a single large cluster. This wasn’t a trivial task a few years ago, when Jeffrey P. Gardner, senior research scientist in high-performance computing at the University of Washington, was looking to run his research ...
Re: HDFS Vs KFS (1)
by C G (0) on core-user@hadoop.apache.org Archives (0); 1 week, 2 days ago
I've built and deployed KFS outside of Hadoop and it seems to work. I'm planning to bring up a test environment shortly running Hadoop with KFS. With all due respect to HDFS developers and committers, I am strongly hesitant to call HDFS "stable." We've had several major issues with HDFS in post-0.15.x releases. I don't know if KFS will be any better or more reliable, but it seems worth investing time finding out. --- On ...
Cloud computing: A catchphrase in puberty (2)
on The Register (100); 1 week, 2 days ago
How Google and Amazon will take your money and step on your dreams. Fail and You: It's been called a lot of things: utility computing, grid computing, distributed computing, and now cloud computing. You can come up with any CTO-friendly name you like, but they all mean the same shit: Renting your quickly depreciating physical assets out because your software company is out of ideas for computer programs.…
-
DJ said:
The author is extremely ignorant of the technologies and what their real benefits are.
Elastic Hadoop Clusters with Amazon's Elastic Block Store (11)
by Tom White (3) on Tom White (3); 1 week, 5 days ago
I gave a talk on Tuesday at the first Hadoop User Group UK about Hadoop and Amazon Web Services - how and why you can run Hadoop with AWS. I mentioned how integrating Hadoop with Amazon's "Persistent local storage", which Werner Vogels had pre-announced in April, would be a great feature to have to enable truly elastic Hadoop clusters that you could stop and start on demand. Well, the very next day Amazon launched this service, ...
Mathematica's Cloud Computing Initiative (1)
by mike@mikeriley.com (0) on Blog Entries (0); 1 week, 5 days ago
I recently interviewed Schoeller Porter, Technical Development Specialist in the Wolfram Partnerships Group, about Mathematica's cloud computing initiatives. MR: Mathematica is known for its powerful desktop-centric computational aspects and to some degree its cluster computing capacity, but what about its role in the cloud computing space? SP: Cloud computing is a new area we are beginning to look at. Mathematica is very well known on the desktop and has pretty g [...]
-
cansmith said:
To me, this is the natural way to leverage cloud services. The mechanics are hidden from the application user.
Re: Hadoop over Lustre? (1)
by Arun C Murthy (0) on core-user@hadoop.apache.org Archives (0); 1 week, 6 days ago
It wouldn't be too much of a stretch to use Lustre directly... although it isn't trivial either. You'd need to implement the 'FileSystem' interface for Lustre, define a URI scheme (e.g. lfs://) etc. Please take a look at the KFS/S3 implementations. Arun On Aug 21, 2008, at 9:59 AM, Joel Welling wrote: > Hi folks; > I'm new to Hadoop, and I'm trying to set it up on a cluster for which ...
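The general idea in this reply, a common filesystem interface plus a URI scheme that selects the backend, can be sketched as follows. This is a Python analogy only: Hadoop's actual 'FileSystem' interface is a Java class with many more methods, and every name here (register, LustreFileSystem, the /mnt/lustre mount point) is hypothetical.

```python
from urllib.parse import urlparse

# Registry mapping a URI scheme to the class that handles it.
_FILESYSTEMS = {}


def register(scheme):
    """Class decorator that associates a filesystem class with a URI scheme."""
    def wrap(cls):
        _FILESYSTEMS[scheme] = cls
        return cls
    return wrap


class FileSystem:
    """Minimal interface every backend must implement (hypothetical)."""

    def open(self, path):
        raise NotImplementedError


@register("lfs")
class LustreFileSystem(FileSystem):
    """Hypothetical backend: maps lfs:// paths onto a mounted Lustre directory."""

    def __init__(self, mount_point="/mnt/lustre"):
        self.mount_point = mount_point

    def local_path(self, path):
        """Translate a path from the URI into a path under the mount point."""
        return self.mount_point.rstrip("/") + "/" + path.lstrip("/")

    def open(self, path):
        return open(self.local_path(path), "rb")


def get_filesystem(uri):
    """Pick a backend from the URI scheme and return (backend, path)."""
    parsed = urlparse(uri)
    try:
        cls = _FILESYSTEMS[parsed.scheme]
    except KeyError:
        raise ValueError("no filesystem registered for scheme %r" % parsed.scheme) from None
    return cls(), parsed.path
```

With this in place, get_filesystem("lfs:///data/part-0000") returns a LustreFileSystem instance and the path /data/part-0000, and callers never need to know which backend they are talking to.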
Re: Hadoop over Lustre? (1)
by Steve Loughran (0) on core-user@hadoop.apache.org Archives (0); 1 week, 6 days ago
Joel Welling wrote:
> Thanks, Steve and Arun. I'll definitely try to write something based on
> the KFS interface. I think that for our applications putting the mapper
> on the right rack is not going to be that useful. A lot of our
> calculations are going to be disordered stuff based on 3D spatial
> relationships like nearest-neighbor finding, so things will be in a
> random access pattern most of the ...
Re: Hadoop over Lustre? (1)
by Joel Welling (0) on core-user@hadoop.apache.org Archives (0); 1 week, 6 days ago
Thanks, Steve and Arun. I'll definitely try to write something based on the KFS interface. I think that for our applications putting the mapper on the right rack is not going to be that useful. A lot of our calculations are going to be disordered stuff based on 3D spatial relationships like nearest-neighbor finding, so things will be in a random access pattern most of the time. Is there a way to set up the ...
Storage/NAS Evals (1)
by jkowall (0) on Adventures of a Technical MasterMind (0); 2 weeks, 1 day ago
We are looking for a strong NAS system; I'm leaning towards a clustered scale up/out system versus buying a box that I have to replace every 4 years. I think Isilon is the proven leader in this space, and we've selected them to go up against the king (NetApp). Here are the criteria we used to get it down to these two. We are looking into them in depth now.
Requirement | Sub-Feature | Weight
Architecture | Centralized Management | 6
Clustered Device ...
First UK Hadoop meet (1)
on Steve: Developing on the Edge (0); 2 weeks, 1 day ago
I did manage to get over to the Hadoop UK event in London yesterday, which was sponsored by Skillsmatter, Last.fm and Yahoo! and hosted at the Skillsmatter office near Farringdon tube station. To get there we had to leave Bristol on the 7am train, which involved getting up at half five and sharing a car to the Bristol Parkway station, a station we didn't see again until half nine in the evening - a fairly long ...
Intel Parallel Studio (1)
by james.r.reinders@intel.com (0) on Blog Entries (0); 2 weeks, 1 day ago
An open invitation to everyone - sign up (free) to be in on the first betas for the Intel Parallel Studio tools we just announced at Intel's Developer Forum on August 20th (betas start later this year, product mid-2009). intel.com/go/parallel It's for C/C++ programmers using Microsoft Visual Studio. (There's a link on the go/parallel page to related products in beta for Linux and Mac OS X.) I'm happy to do my best to answer any questions ...
Rackable buys TerraScale and now dumps TerraScale (1)
by joe (53) on scalability.org (0); 2 weeks, 2 days ago
TerraScale were an innovative bunch that developed some interesting technologies around the xfs file system, and made it scale in a cluster. Some time ago, Rackable bought them. Now it appears that Rackable is pulling back from this market, and is putting TerraScale … er … RapidScale on the auction block. OK, it's not quite like [...]
Re: Why is scaling HBase much simpler then scaling a relational db? (1)
by Mork0075 (0) on core-user@hadoop.apache.org Archives (0); 2 weeks, 3 days ago
I've read some papers and tutorials this week and now have some concrete questions: (1) Sharding is also available in common relational systems. It is often described that you need an application layer for the (shards) federation. I understand HBase to be this layer, which implements the whole sharding thing. HBase distributes the shards (regions) over the region servers if a certain size increases. Wouldn't it be more practical to distribute the regions by load and ...
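The size-based behaviour the question describes can be pictured with a toy model. This is not HBase code: the Region class, the row-count threshold and the round-robin placement are all assumptions made for illustration, and the placement deliberately ignores load, which is the alternative the poster is asking about.

```python
class Region:
    """A contiguous, sorted range of row keys (toy stand-in for a shard)."""

    def __init__(self, rows):
        self.rows = sorted(rows)

    def split(self):
        """Split at the middle key into two regions of roughly equal size."""
        mid = len(self.rows) // 2
        return Region(self.rows[:mid]), Region(self.rows[mid:])


def split_oversized(regions, max_rows):
    """Repeatedly split any region holding more than max_rows rows."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    result = []
    pending = list(regions)
    while pending:
        region = pending.pop()
        if len(region.rows) > max_rows:
            pending.extend(region.split())
        else:
            result.append(region)
    # Return regions ordered by their first row key.
    return sorted(result, key=lambda r: r.rows[:1])


def assign_round_robin(regions, servers):
    """Spread regions over servers in turn, ignoring load (an assumption)."""
    placement = {server: [] for server in servers}
    for i, region in enumerate(regions):
        placement[servers[i % len(servers)]].append(region)
    return placement
```

A load-aware variant would replace assign_round_robin with a policy that looks at request rates per region rather than only at how many regions each server holds.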