<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>pangea.com.mt</title>
	<atom:link href="http://www.pangea.com.mt/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.pangea.com.mt</link>
	<description>Statistical machine translation training, MT services, customized machine-translation developments</description>
	<lastBuildDate>Wed, 30 Jun 2010 19:59:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Pangeanic welcomes Sony Europe Localization Director</title>
		<link>http://www.pangea.com.mt/2010/06/pangeanic-welcomes-sony-europe-localization-director/</link>
		<comments>http://www.pangea.com.mt/2010/06/pangeanic-welcomes-sony-europe-localization-director/#comments</comments>
		<pubDate>Wed, 30 Jun 2010 19:59:44 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://www.pangea.com.mt/?p=499</guid>
		<description><![CDATA[While we were attending the last EAMT Conference (see dedicated post in our News), Pangeanic also welcomed Sony Professional Europe Localization Director, Salomé López-Lavado. Pangeanic has been a reliable language vendor and technology consultant for several years.
The visit highlighted and strengthened our relationship even further by focusing on the expansion and deployment of our PangeaMT [...]]]></description>
			<content:encoded><![CDATA[<p>While we were attending the last <a href="http://www.eamt2010.org/">EAMT Conference</a> (see dedicated <a href="http://www.pangea.com.mt/?p=495">post</a> in our News), Pangeanic also welcomed Sony Professional Europe Localization Director, Salomé López-Lavado. Pangeanic has been a reliable language vendor and technology consultant for several years.</p>
<p>The visit highlighted and strengthened our relationship even further by focusing on the expansion and deployment of our PangeaMT customized MT solution for Sony Europe in several languages and exploring the integration of our technologies in tailor-made, corporate globalization management environments.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pangea.com.mt/2010/06/pangeanic-welcomes-sony-europe-localization-director/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Pangeanic, first LSP to create SMT division deploying TDA data, TAUS Data Association partner highlights</title>
		<link>http://www.pangea.com.mt/2010/06/pangeanic-lsp-create-smt-spin-off-taus-data-association-partner-highlights/</link>
		<comments>http://www.pangea.com.mt/2010/06/pangeanic-lsp-create-smt-spin-off-taus-data-association-partner-highlights/#comments</comments>
		<pubDate>Tue, 29 Jun 2010 16:03:47 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://www.pangea.com.mt/?p=311</guid>
		<description><![CDATA[In a new online report, TAUS Data Association (TDA) highlights the fact that Pangeanic is a first example of an LSP that, making extensive use* of TDA data, has succeeded in creating a new Statistical Machine Translation division, PangeaMT. In so doing, Pangeanic evolves from being a well-established language service provider to becoming [...]]]></description>
			<content:encoded><![CDATA[<p>In a new <a href="http://www.tausdata.org/index.php/news/news/113" target="_blank">online report</a>, TAUS Data Association (TDA) highlights the fact that Pangeanic is a first example of an LSP that, making extensive use* of TDA data, has succeeded in creating a new Statistical Machine Translation division, PangeaMT. In so doing, Pangeanic evolves from being a well-established language service provider to becoming an innovative language technology solution provider that supports and benefits from globalization industry data-geared initiatives, such as TAUS’s TDA.</p>
<p>PangeaMT provides industry-specific statistical machine translation (SMT) engines for the automotive, consumer electronics, and industrial sectors. The service was launched in 2009 at a TAUS User Conference with an offer to train engines for free for companies seriously looking into deploying open source MT with a TMX workflow. If you would like to know more about PangeaMT’s current Spring campaign, please contact us.</p>
<blockquote><p>* It is worth pointing out that Pangeanic leads the TDA data downloaders’ list, well ahead of companies such as Lionbridge, Oracle or WeLocalize. Downloaded data: 302,334,953 words. Information collated by TAUS Data Association and distributed to its partners at the end of March 2010.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.pangea.com.mt/2010/06/pangeanic-lsp-create-smt-spin-off-taus-data-association-partner-highlights/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PangeaMT BDM attends EAMT 2010</title>
		<link>http://www.pangea.com.mt/2010/06/pangeamt-bdm-attends-eamt-2010/</link>
		<comments>http://www.pangea.com.mt/2010/06/pangeamt-bdm-attends-eamt-2010/#comments</comments>
		<pubDate>Tue, 29 Jun 2010 01:59:25 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://www.pangea.com.mt/?p=495</guid>
		<description><![CDATA[Elia Yuste, Pangeanic’s BDM, attended the 14th Annual Conference of the European Association for Machine Translation (EAMT) held in Saint-Raphaël, France, on 27th-28th May 2010.
Pangeanic relies heavily on the previous and current international research and expertise in Machine Translation and related areas of some of their team members. This is why conferences such as EAMT [...]]]></description>
			<content:encoded><![CDATA[<p>Elia Yuste, Pangeanic’s BDM, attended the <a href="http://www.eamt2010.org/">14th Annual Conference of the European Association for Machine Translation (EAMT)</a> held in Saint-Raphaël, France, on 27th-28th May 2010.</p>
<p>Pangeanic relies heavily on the previous and current international research and expertise in Machine Translation and related areas of some of their team members. This is why conferences such as EAMT are nothing new to us. We are now happy to devote as much effort as possible to keeping abreast of, and benefiting from, what the MT research arena has to offer to companies like ours. It is worth pointing out that EAMT 2010 was an event catering for academics, researchers and corporate practitioners alike. Although PangeaMT was represented, it was not formally discussed in any presentation; even so, conference attendees, especially those coming from the industry and from some universities with whom we collaborate, were fully aware of our advancements in the field.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pangea.com.mt/2010/06/pangeamt-bdm-attends-eamt-2010/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Pangeanic CEO to speak at Localization World 2010</title>
		<link>http://www.pangea.com.mt/2010/06/pangeanic-ceo-speak-localization-world-2010/</link>
		<comments>http://www.pangea.com.mt/2010/06/pangeanic-ceo-speak-localization-world-2010/#comments</comments>
		<pubDate>Tue, 01 Jun 2010 07:59:28 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://www.pangea.com.mt/?p=506</guid>
		<description><![CDATA[Mr M Herranz, Pangeanic&#8217;s CEO, will take part in the MT in the Real World discussion panel within the Localization World 2010 conference in Berlin on Wed, 9th June. As highlighted by its title, this session focuses on authentic MT implementations and practices, moving away from mere sales talk to verifiable facts and informative experiences. From [...]]]></description>
			<content:encoded><![CDATA[<p>Mr <a href="http://www.localizationworld.com/lwber2010/speakers.php#mHerranz">M Herranz</a>, Pangeanic&#8217;s CEO, will take part in the <a href="http://www.localizationworld.com/lwber2010/programDescription.php#C7">MT in the Real World</a> discussion panel within the <a href="http://www.localizationworld.com/">Localization World</a> 2010 conference in Berlin on Wed, 9th June. As highlighted by its title, this session focuses on authentic MT implementations and practices, moving away from mere sales talk to verifiable facts and informative experiences. From our own standpoint, Manuel will stress what it takes to make a PangeaMT system implementation successful. Discussion among the panelists, made up of MT buyers and providers, as well as with the audience, is meant to be highly interactive, providing practical insight into current projects and results and outlining ongoing work and future challenges.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pangea.com.mt/2010/06/pangeanic-ceo-speak-localization-world-2010/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Pangeanic to visit associated companies in Japan and China</title>
		<link>http://www.pangea.com.mt/2010/05/pangeanic-ceo-visit-lsp-companies-japan-china/</link>
		<comments>http://www.pangea.com.mt/2010/05/pangeanic-ceo-visit-lsp-companies-japan-china/#comments</comments>
		<pubDate>Sat, 01 May 2010 08:01:51 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://www.pangea.com.mt/?p=489</guid>
		<description><![CDATA[Following our participation in TAUS Tokyo Summit in April, Mr M Herranz, our CEO, will visit B.I. Japan in Tokyo and B.I. China in Shanghai. Apart from discussing in detail ongoing joint business operations with these outstanding language service providers in Asia, with whom Pangeanic has been working in association for a number of years [...]]]></description>
			<content:encoded><![CDATA[<p>Following our participation in <a href="http://www.pangea.com.mt/2010/04/pangeamt-presented-tokyo/" target="_blank">TAUS Tokyo Summit</a> in April, Mr M Herranz, our CEO, will visit B.I. Japan in Tokyo and B.I. China in Shanghai. Apart from discussing in detail ongoing joint business operations with these outstanding language service providers in Asia, with whom Pangeanic has been working in association for a number of years now, the main goal will be to explore further business avenues in connection with PangeaMT. Our translation automation solutions are already in use internally at Pangeanic for major localization accounts derived from these Asian partners, especially those in the automotive sector.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pangea.com.mt/2010/05/pangeanic-ceo-visit-lsp-companies-japan-china/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New version of PangeaMT for German</title>
		<link>http://www.pangea.com.mt/2010/05/new-version-pangeamt-german/</link>
		<comments>http://www.pangea.com.mt/2010/05/new-version-pangeamt-german/#comments</comments>
		<pubDate>Mon, 31 May 2010 07:55:18 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://www.pangea.com.mt/?p=502</guid>
		<description><![CDATA[Pangeanic follows a constant improvement policy with regard to all PangeaMT developments. We are a forward-looking, innovation-driven company, always eager to test, adopt and apply the latest MT-related techniques, also for the languages that are being used or considered by our customers.
An example of this would be the improved version of PangeaMT that has been made [...]]]></description>
			<content:encoded><![CDATA[<p>Pangeanic follows a constant improvement policy with regard to all PangeaMT developments. We are a forward-looking, innovation-driven company, always eager to test, adopt and apply the latest MT-related techniques, also for the languages that are being used or considered by our customers.</p>
<p>An example of this would be the improved version of PangeaMT that has been made available today to Sybase, a long-standing client, for the English-German language pair.<br />
This version makes use of special tokenization techniques and post-processing modules to achieve considerably better output.</p>
<p>The version is a customization of Moses that integrates features such as<br />
* TMX generator (for TMX input and output)<br />
* TXT data handling<br />
* Inline parser to handle tags and formatting information contained in TMX directed to documentation (HTML, FrameMaker, InDesign, Word, etc)<br />
* Cygwin integration</p>
<p>If you are considering the implementation of MT in your workflow, please contact Ms <a href="mailto:e.yuste@pangeanic.com" target="_blank">Elia Yuste</a> at our Business Development Department.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pangea.com.mt/2010/05/new-version-pangeamt-german/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PangeaMT to be presented in Tokyo</title>
		<link>http://www.pangea.com.mt/2010/04/pangeamt-presented-tokyo/</link>
		<comments>http://www.pangea.com.mt/2010/04/pangeamt-presented-tokyo/#comments</comments>
		<pubDate>Mon, 12 Apr 2010 02:06:11 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://www.pangea.com.mt/?p=285</guid>
		<description><![CDATA[PangeaMT will be introduced in Japan as part of TAUS Tokyo Summit from 14th-16th April, 2010.
The Tokyo summit will be the first of its kind in the country and it will have a strong focus on Use Cases and MT practical applications to localization workflows for Japanese industries or the Japanese language.
PangeaMT will feature as [...]]]></description>
			<content:encoded><![CDATA[<p>PangeaMT will be introduced in Japan as part of <a href="http://translationautomation.com/events/forums/taus-executive-forum-localization-business-innovation-focus-on-asia.html" target="_self">TAUS Tokyo Summit</a> from 14th-16th April, 2010.</p>
<p>The Tokyo summit will be the first of its kind in the country and it will have a strong focus on Use Cases and MT practical applications to localization workflows for Japanese industries or the Japanese language.</p>
<p>PangeaMT will feature as a leader in open standards implementation, with a strong focus on compatibility via its TMX and XLIFF workflows.</p>
<p>If you would like to speak to a representative of PangeaMT, please <a href="mailto:eyuste@pangea.com.mt">email us</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pangea.com.mt/2010/04/pangeamt-presented-tokyo/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>FEDER funds award to develop Statistical Machine Translation</title>
		<link>http://www.pangea.com.mt/2010/01/feder-funds-award-to-develop-statistical-machine-translation/</link>
		<comments>http://www.pangea.com.mt/2010/01/feder-funds-award-to-develop-statistical-machine-translation/#comments</comments>
		<pubDate>Fri, 15 Jan 2010 10:13:28 +0000</pubDate>
		<dc:creator>pangeanic</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://www.pangeanic.com.mt/?p=135</guid>
		<description><![CDATA[Pangeanic has been awarded EU funds under the FEDER programme and Valencia’s local government IMPIVA in order to develop English-Spanish Statistical Machine Translation prototypes.
The award number is IMIDTA/2009/741.  For Pangeanic, this marks the beginning of a series of developments into other European languages and different combinations to service both industry and institutions.
The award corroborates the company’s [...]]]></description>
			<content:encoded><![CDATA[<p style="font: normal normal normal 90%/175% Arial, Helvetica, sans-serif;">Pangeanic has been awarded EU funds under the FEDER programme and Valencia’s local government IMPIVA in order to develop English-Spanish Statistical Machine Translation prototypes.</p>
<p style="font: normal normal normal 90%/175% Arial, Helvetica, sans-serif;">The award number is IMIDTA/2009/741.  For Pangeanic, this marks the beginning of a series of developments into other European languages and different combinations to service both industry and institutions.</p>
<p style="font: normal normal normal 90%/175% Arial, Helvetica, sans-serif;">The award corroborates the company’s long-term drive to implement, develop and offer customized translation automation solutions that accelerate multilingual translation and cut its costs.</p>
<p style="font: normal normal normal 90%/175% Arial, Helvetica, sans-serif;"><img src="http://www.pangeanic.com/wp-content/uploads/2010/01/Feder.png" alt="Feder" width="50" height="35" /> <img src="http://www.pangeanic.com/wp-content/uploads/2010/01/inpiva.jpg" alt="inpiva" width="98" height="35" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.pangea.com.mt/2010/01/feder-funds-award-to-develop-statistical-machine-translation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Pangeanic only Spanish LSP to be mentioned in EU report</title>
		<link>http://www.pangea.com.mt/2009/12/pangeanic-only-spanish-lsp-to-be-mentioned-in-eu-report/</link>
		<comments>http://www.pangea.com.mt/2009/12/pangeanic-only-spanish-lsp-to-be-mentioned-in-eu-report/#comments</comments>
		<pubDate>Thu, 17 Dec 2009 10:11:14 +0000</pubDate>
		<dc:creator>pangeanic</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://www.pangeanic.com.mt/?p=133</guid>
		<description><![CDATA[Pangeanic has been mentioned as one of very few LSPs that are embracing technology and leading the way in deployment and tuning of open-source machine translation (MT) solutions to particular needs in the recent EU report “Studies on translation and multilingualism – The size of the language industry in the EU”, p. 83.
The strategy of investing in people [...]]]></description>
			<content:encoded><![CDATA[<p style="font: normal normal normal 90%/175% Arial, Helvetica, sans-serif;">Pangeanic has been mentioned as one of very few LSPs that are embracing technology and leading the way in deployment and tuning of open-source machine translation (MT) solutions to particular needs in the recent EU report <a style="color: #000000; text-decoration: none; font-weight: bolder;" href="http://ec.europa.eu/dgs/translation/publications/studies/size_of_language_industry_en.pdf" target="_blank">“Studies on translation and multilingualism – The size of the language industry in the EU”</a>, p. 83.</p>
<p style="font: normal normal normal 90%/175% Arial, Helvetica, sans-serif;">The strategy of investing in people who can bring new skills and customize fit-for-purpose solutions to particular machine-translation applications is highlighted in the report, which places importance on access to data via initiatives like <a style="color: #000000; text-decoration: none; font-weight: bolder;" href="http://www.pangeanic.com/2009/12/17/pangeanic-only-spanish-lsp-to-be-mentioned-in-eu-report/www.tausdata.org" target="_blank">TDA</a>.</p>
<p style="font: normal normal normal 90%/175% Arial, Helvetica, sans-serif;">Whilst large sets of data do not automatically translate into perfect statistical machine translation engines, the selection and customization of TM as well as other data is one of the key points in developments designed to accelerate language transfer, bringing time and cost savings to companies.</p>
<p style="font: normal normal normal 90%/175% Arial, Helvetica, sans-serif;">The study also reports on large LSPs like SDL and Lionbridge and the possible MT “lock” strategies behind their marketing.</p>
<p style="font: normal normal normal 90%/175% Arial, Helvetica, sans-serif;"><a style="color: #000000; text-decoration: none; font-weight: bolder;" href="http://www.pangeanic.com/2009/12/17/pangeanic-only-spanish-lsp-to-be-mentioned-in-eu-report/Pangeanic.com.MT" target="_blank">Pangeanic.com.MT</a> is the statistical machine translation division, with a mission to build and adapt SMT solutions that will work effectively in particular applications and domains, with client data and customized sets.</p>
<p style="font: normal normal normal 90%/175% Arial, Helvetica, sans-serif;">The report can be downloaded from the link below.</p>
<p style="font: normal normal normal 90%/175% Arial, Helvetica, sans-serif;"><span style="font-family: 'MS Shell Dlg'; font-size: 12px;"><a style="color: #000000; text-decoration: none; font-weight: bolder;" href="http://ec.europa.eu/dgs/translation/publications/studies/size_of_language_industry_en.pdf" target="_blank">http://ec.europa.eu/dgs/translation/publications/studies/size_of_language_industry_en.pdf</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.pangea.com.mt/2009/12/pangeanic-only-spanish-lsp-to-be-mentioned-in-eu-report/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PangeaMT with TDA tests provide up to 50% more</title>
		<link>http://www.pangea.com.mt/2009/10/pangeamt-with-tda-tests-provide-up-to-50-more/</link>
		<comments>http://www.pangea.com.mt/2009/10/pangeamt-with-tda-tests-provide-up-to-50-more/#comments</comments>
		<pubDate>Mon, 12 Oct 2009 10:03:35 +0000</pubDate>
		<dc:creator>pangeanic</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://www.pangeanic.com.mt/?p=131</guid>
		<description><![CDATA[Valencia, 1 October 2009.
Pangeanic conducted a series of tests with PangeaMT for specific language domains by combining its own statistical data with data obtained from TAUS&#8217;s TDA during late September. The aim of the test was to prove that increased amounts of trustworthy, regular data from TDA would help Pangeanic&#8217;s own technologies to improve output quality, [...]]]></description>
			<content:encoded><![CDATA[<p align="left">Valencia, 1 October 2009.</p>
<p align="left">Pangeanic conducted a series of tests with PangeaMT<a style="width: 12px; height: 12px; background-image: url(http://www.pangeanic.com/wp-includes/js/tinymce/themes/advanced/skins/default/img/items.gif); background-attachment: initial; background-origin: initial; background-clip: initial; background-color: initial; background-position: initial initial; background-repeat: no-repeat no-repeat; border: 0px initial initial;" name="PangeanicMT"></a> for specific language domains by combining its own statistical data with data obtained from <a href="http://www.translationautomation.com/">TAUS</a>&#8217;s <a href="http://www.tausdata.org/">TDA</a> during late September. The aim of the test was to prove that increased amounts of trustworthy, regular data from TDA would help Pangeanic&#8217;s own technologies to improve output quality, and to open up new domain developments.</p>
<p align="left"><span style="font-family: Arial, sans-serif; color: #ff6633;"><span style="font-size: medium;"><span style="text-decoration: underline;"><strong style="font-weight: bold;">Background</strong></span></span></span></p>
<p align="left">Version 1 was a development concerned mainly with the technical/engineering, electronics and automotive industries, for general texts, user manuals and scientific journal publications. Version 2 (PangeaMT) builds on that experience and adds several new areas: Software (SOF), Consumer and Professional Electronics + Computer Hardware (ECH), Marketing-Business-Economics (MBE), Legal-Pro (LEG), Healthcare-Pharma-Life Sciences (HEALTH).</p>
<p align="left"><a style="width: 12px; height: 12px; background-image: url(http://www.pangeanic.com/wp-includes/js/tinymce/themes/advanced/skins/default/img/items.gif); background-attachment: initial; background-origin: initial; background-clip: initial; background-color: initial; background-position: initial initial; background-repeat: no-repeat no-repeat; border: 0px initial initial;" name="pangeanicMT"></a>PangeaMT is based on a Moses engine enhanced with an applied set of heuristics according to each language in question. The translation process is fully TMX-based. The concept is to have SMT acting as a plug-in to existing systems, not as an alternative solution or technology. It also integrates a parser that can interpret code/tags in the TMX and place them in the resulting translated segment. Post-editing can take place in any environment, thus resulting in an application-agnostic SMT plug-in.</p>
<p align="left"><span style="font-family: Arial, sans-serif; color: #ff6633;"><span style="font-size: medium;"><span style="text-decoration: underline;"><strong style="font-weight: bold;">Data</strong></span></span></span></p>
<p align="left">Three domains were selected for the test in the English-Spanish language pair (no distinction between Latin American and European Spanish), with the following number of files:</p>
<ul>
<li>ECH (Electronics-Computer Hardware): 800 TMX files</li>
<li>MBE (Marketing-Business-Economics): 76 TMX files</li>
<li>SOF (Software): 80 TMX files</li>
</ul>
<p align="left">Data sets were selected according to the following criteria.</p>
<p align="left">a) Language Model to follow</p>
<p align="left">b) TDA data availability</p>
<p align="left">c) Subject field</p>
<p align="left"><span style="text-decoration: underline;"><strong style="font-weight: bold;">ELECTRONICS – COMPUTER HARDWARE</strong></span></p>
<p align="left">The aim was to improve on existing engines (Electronics). To this end, TDA data from Intel and Dell in Spanish was added to existing sets coming from Sony. Not all data available from TDA from particular donors was deemed fit for the customized training; some was discarded for a variety of reasons. Client-specific terminology was applied to the original donors&#8217; data sets for terminology standardization purposes. Pangeanic contributed small sets of self-generated data. The result was a medium-sized, 3.9M-word engine specifically designed for the field of application, with the client&#8217;s terminology applied through the donors&#8217; TMX files in order to ease post-editing.</p>
<p align="left">The data set for electronics was:</p>
<p align="left"><img style="border: 0px initial initial;" title="2009-09__m21a22d76" src="http://www.pangeanic.com/wp-content/uploads/2009/10/2009-09__m21a22d76.gif" alt="2009-09__m21a22d76" width="364" height="126" /></p>
<p align="left"><strong style="font-weight: bold;">SOFTWARE</strong></p>
<p align="left">The aim of this development was to build a fresh engine with TDA data only, in the subject field of a potential client, in order to offer a solution which would show enough ROI for our SMT as a plug-in. To this end, we selected TDA data from several software donors in a subject field related to the product lines. We did not include Microsoft data initially as the size of the TM would have created a bias towards Microsoft terminology. However, enhancing the engine is not ruled out for future or more general releases. Again, not all data available from TDA from particular donors was used in the customized training. Some data was discarded and Pangeanic contributed small sets of self-generated data.</p>
<p align="left">The data set for software was:</p>
<p align="left"><img style="border: 0px initial initial;" title="2009-09__m40c55628" src="http://www.pangeanic.com/wp-content/uploads/2009/10/2009-09__m40c55628.gif" alt="2009-09__m40c55628" width="364" height="162" /></p>
<p align="left"><strong style="font-weight: bold;">MARKETING-BUSINESS-ECONOMICS</strong></p>
<p align="left">The aim of this development was to build a first test-bench engine serving as a business case within an uncontrolled, general field that has usually been regarded as “a work of literature” and out of the scope of traditional MT systems (particularly Rule-Based MT). Marketing and economics texts go beyond everyday language: they can be elaborate and complex, and are sometimes flowery or metaphorical. Again, the aim is to offer a solution which would show enough ROI for our SMT as a plug-in. The client did not provide enough training data and TDA did not offer enough related bulk material for this purpose. In this case, showing some results was more important than finalizing a large engine.</p>
<p align="left">The data set for marketing-business-economics was:</p>
<p align="left"><img style="border: 0px initial initial;" title="2009-09__3cf5d5a2" src="http://www.pangeanic.com/wp-content/uploads/2009/10/2009-09__3cf5d5a2.gif" alt="2009-09__3cf5d5a2" width="364" height="126" /></p>
<p align="left"><span style="text-decoration: underline;"><strong style="font-weight: bold;">Process</strong></span></p>
<p align="left">The tables below describe the processes followed in the training. Sentence length increases from domain to domain. In each case, 2,000 representative segments (just over 20,000 words in all three cases) were held out of the training so that they could be used in the tests (BLEU/Meteor scores). Some test sentences happened to be identical to sentences in the training data (18, 12 and 2 respectively), mostly because of the nature of the source files (user manuals and, in some cases, software strings/commands, which contain certain repetitions).</p>
<p align="left">Perplexity is a measure that gives us an idea of the complexity of the task and of how similar the test set is to the training data. The higher the perplexity, the higher the difficulty.</p>
<p align="left"><img style="border: 0px initial initial;" title="2009-09__mb136cc4" src="http://www.pangeanic.com/wp-content/uploads/2009/10/2009-09__mb136cc4.gif" alt="2009-09__mb136cc4" width="449" height="227" /></p>
<p align="left"><img style="border: 0px initial initial;" title="2009-09__7327d93a" src="http://www.pangeanic.com/wp-content/uploads/2009/10/2009-09__7327d93a.gif" alt="2009-09__7327d93a" width="449" height="244" /></p>
<p align="left"><img style="border: 0px initial initial;" title="2009-09__mcfe648c" src="http://www.pangeanic.com/wp-content/uploads/2009/10/2009-09__mcfe648c.gif" alt="2009-09__mcfe648c" width="449" height="244" /></p>
<p align="left"><span style="text-decoration: underline;"><strong style="font-weight: bold;">Results</strong></span></p>
<p align="left">Model training + optimization: Moses+MERT</p>
<p align="left">Language models: 5-grams</p>
<p align="left">Number of TMX files for each category:</p>
<ul>
<li>ECH: 800</li>
<li>MBE: 76</li>
<li>SOF: 80</li>
</ul>
<p align="left">Translation results, English-&gt;Spanish</p>
<p align="left">BLEU:</p>
<ul>
<li>ECH: 49.98</li>
<li>MBE: 24.39</li>
<li>SOF: 47.78</li>
</ul>
<p align="left">Meteor 0.8.3:</p>
<ul>
<li>ECH: 0.4312</li>
<li>MBE: 0.2610</li>
<li>SOF: 0.4377</li>
</ul>
<p align="left">The best-scoring domain in BLEU is Electronics-Computer Hardware, at almost 50, with a Meteor score of 0.43.</p>
<p align="left">Results in Software are also very high (47.78 in BLEU and 0.4377 in Meteor).</p>
<p align="left">This is a new domain for our development and we have used TDA data almost exclusively.</p>
<p align="left">Marketing-Business-Economics lags behind, with scores of around 25% on both metrics (24.39 BLEU, 0.2610 Meteor). Specific, “imaginative” marketing TMs weigh heavily here, and there is less content from TDA. Marketing literature is, by definition, not necessarily as precise as text in the other two fields, which use fairly controlled language. The engine was a first step, a test development still to be enhanced with further data.</p>
<p align="left">Nevertheless, the results surpass our expectations. A BLEU score close to 50% can translate into large increases in language production. Even the 25% obtained as an initial result for marketing leaves a lot of room for improvement once more data is available.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pangea.com.mt/2009/10/pangeamt-with-tda-tests-provide-up-to-50-more/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
