Searching in Wikipedia for SL and TL simultaneously
Thread poster: 2nl (X)
2nl (X)  Identity Verified
Netherlands
Local time: 11:25
Apr 24, 2015

Searching in Wikipedia for SL and TL simultaneously

Following my posting about searching Google Images for your source language (SL: en) and target language (TL: nl) simultaneously, here is another interesting idea (based on info from Meta and J-C):

It should be possible to create a web search resource for CafeTran Espresso 2015 that searches for a given word in your source language.

E.g. search for 'Computer-assisted translation' in the English Wikipedia and get:

http://SL.wikipedia.org/wiki/Computer-assisted_translation

CafeTran should parse this page, look for a link that starts with href="//TL.wikipedia.org/wiki/, and then take the article title between the last "/" and the closing quotation mark of the href value to get to:

http://nl.wikipedia.org/wiki/Computerondersteund_vertalen.

Then CafeTran should display both pages side by side in the tabbed pane.
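A minimal Python sketch of the parsing step described above. The function name find_tl_url is hypothetical, and this is not how CafeTran works internally. It assumes that the first link to the TL Wikipedia in the page source is the interlanguage link, and it prepends https: to the protocol-relative href:

```python
import re
from typing import Optional


def find_tl_url(sl_html: str, tl: str) -> Optional[str]:
    """Return the full URL of the first TL Wikipedia article linked from an SL page."""
    # Matches href="//nl.wikipedia.org/wiki/..." (protocol-relative) as well as
    # an explicit http:// or https:// prefix. The captured article title is
    # everything after "/wiki/" up to the closing quotation mark of the href.
    pattern = re.compile(
        r'href="(?:https?:)?//' + re.escape(tl) + r'\.wikipedia\.org/wiki/([^"]+)"'
    )
    match = pattern.search(sl_html)
    if match is None:
        return None  # no link to the TL edition in this page source
    return f"https://{tl}.wikipedia.org/wiki/{match.group(1)}"


if __name__ == "__main__":
    # Illustrative markup only, not copied from a live page.
    sample = ('<a href="//nl.wikipedia.org/wiki/Computerondersteund_vertalen" '
              'lang="nl" hreflang="nl">Nederlands</a>')
    print(find_tl_url(sample, "nl"))
```

Stopping at the closing quote rather than at "<" keeps the link text ("Nederlands") and any other attributes out of the captured title.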


[Edited at 2015-04-24 13:17 GMT]

FarkasAndras  Identity Verified
Local time: 11:25
English to Hungarian
Wikipedia glossary Apr 24, 2015

You're not the first one to come up with that idea... a couple of people have mined Wikipedia interlanguage links for glossary data before. Wikipedia's old dumps contained this info, so it was possible to extract it for offline use. The most conveniently usable new dumps don't seem to contain the interlanguage links, but some third-party dumps/databases do. I compiled an EN-FR-DE-NL-HU glossary from this data: http://farkastranslations.com/glossaries.php (free download).
Of course an online lookup gets you the most up-to-date info, but it's slower.
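To illustrate the general idea of mining interlanguage links (not Farkas's actual procedure, which isn't described here), a Python sketch that assumes the links are already available as rows shaped like MediaWiki's langlinks table (ll_from, ll_lang, ll_title) plus a map from page id to page title. How you obtain those rows is left open, and build_glossary and to_tab_delimited are hypothetical helpers:

```python
from typing import Dict, Iterable, List, Tuple

# One interlanguage link, shaped like a row of MediaWiki's langlinks table:
# (ll_from = source page id, ll_lang = language code, ll_title = target title)
LangLink = Tuple[int, str, str]


def build_glossary(page_titles: Dict[int, str],
                   links: Iterable[LangLink],
                   tl: str) -> List[Tuple[str, str]]:
    """Pair every SL page title with its TL title, sorted by SL term."""
    pairs = []
    for page_id, lang, tl_title in links:
        if lang != tl:
            continue  # link to some other language edition
        sl_title = page_titles.get(page_id)
        if sl_title is None:
            continue  # link from a page missing from our id -> title map
        # Titles may use underscores; glossary entries read better with spaces.
        pairs.append((sl_title.replace("_", " "), tl_title.replace("_", " ")))
    return sorted(pairs)


def to_tab_delimited(pairs: List[Tuple[str, str]]) -> str:
    """One 'source<TAB>target' line per entry (a plain text glossary)."""
    return "\n".join(f"{source}\t{target}" for source, target in pairs)
```

Filtering on a single ll_lang value gives a bilingual list; running it once per target language and joining on the SL title would give a multilingual one.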

Meta Arkadia
Local time: 16:25
English to Indonesian
Completely ambitionless in this matter Apr 26, 2015

FarkasAndras wrote:
You're not the first one to come up with that idea... a couple of people have mined wikipedia interlanguage links for glossary data before.

I'm not interested in mining Wikipedia for glossary data, and I'm certainly not striving to be the first to come up with an idea.

I'm aware of Wikipedia's shortcomings, but its interlanguage links do offer benefits other encyclopaedias don't. All I want is to be able to search for a term or a name (of a company, for example) in my source language and see its Wikipedia entry together with the article in my target language. I then want to read both entries to see what I can use in my translation or my writing.

For example: If the word "platypus" occurs in my text, I want a shortcut that shows me both http://en.wikipedia.org/wiki/Platypus and http://nl.wikipedia.org/wiki/Vogelbekdier, so I can read all about it in both languages.

I wrote an Automator Service to achieve that, but it doesn't work yet. Here it is. It does show the English site, of course, but not the Dutch version. The trouble is that in my Action I can search the page source of the website for http://nl.wikipedia.org/wiki/Vogelbekdier, but not for http://nl.wikipedia.org/wiki/b\w+/g, which is what I think I need to show other Wikipedia entries. Yesterday I tried the regex way; today I'll concentrate on awk and sed, and then there's curl and pbcopy (which I've never used before). By the time I succeed, I'll probably have wasted decades' worth of time writing the Service instead of simply clicking "Nederlands" on the English Wikipedia page.
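For comparison, a Python sketch that swaps the page-source regex for the MediaWiki Action API, which can return an article's interlanguage links directly (action=query&prop=langlinks, where lllang restricts the result to one language). The helper names and the User-Agent string are assumptions, and wiring it into an Automator Service (for example via a Run Shell Script action) isn't shown:

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def langlinks_api_url(term: str, sl: str = "en", tl: str = "nl") -> str:
    """Build an API query for the TL interlanguage link of an SL article."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": term,
        "lllang": tl,          # only return the link for the target language
        "redirects": "1",      # resolve redirects to the actual article
        "format": "json",
        "formatversion": "2",  # pages come back as a list
    }
    return f"https://{sl}.wikipedia.org/w/api.php?" + urllib.parse.urlencode(params)


def tl_title_from_response(data: dict) -> Optional[str]:
    """Pull the TL title out of the decoded JSON response, if there is one."""
    pages = data.get("query", {}).get("pages", [])
    if isinstance(pages, dict):  # formatversion=1 keys pages by page id
        pages = pages.values()
    for page in pages:
        for link in page.get("langlinks", []):
            # formatversion=2 stores the title under "title", version 1 under "*"
            return link.get("title") or link.get("*")
    return None


def tl_article_url(title: str, tl: str = "nl") -> str:
    """Turn a TL article title into a browsable URL."""
    return f"https://{tl}.wikipedia.org/wiki/" + urllib.parse.quote(title.replace(" ", "_"))


def fetch_json(url: str) -> dict:
    """Download and decode the API response (needs network access)."""
    # Wikimedia asks API clients to send a descriptive User-Agent.
    request = urllib.request.Request(url, headers={"User-Agent": "sl-tl-lookup-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With network access, tl_title_from_response(fetch_json(langlinks_api_url("Platypus"))) looks up the Dutch title, and tl_article_url turns it into a link. No HTML parsing or sed is involved, because the API answers with just the linked title.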

By the way, I don't think it is possible to integrate this into the CafeTran UI. And I don't care, as long as clicking the search term in CafeTran looks it up in Wikipedia in both languages.

Cheers,

Hans

[Edited at 2015-04-26 01:44 GMT]

2nl (X)  Identity Verified
Netherlands
Local time: 11:25
TOPIC STARTER
Seeing the full context Apr 26, 2015

FarkasAndras wrote:

I compiled an EN-FR-DE-NL-HU glossary from it: http://farkastranslations.com/glossaries.php (free download).
Of course an online lookup gets you the most up-to-date info, but it's slower.


I was aware of your generous offer. However, I think that seeing both terms in full context has its advantages too. Now we'll have to wait until Igor implements dual searches (and, in the case of Wikipedia, adds some parsing). Since he's a full-time developer now, I guess the wait won't be long.

Michael Beijer  Identity Verified
United Kingdom
Local time: 10:25
Member (2009)
Dutch to English
@Hans (pretending to be Meta): Apr 26, 2015

Meta Arkadia wrote:
For example: If the word "platypus" occurs in my text, I want a shortcut that shows me both http://en.wikipedia.org/wiki/Platypus and http://nl.wikipedia.org/wiki/Vogelbekdier, so I can read all about it in both languages.


It shouldn't be much more complicated than this (note that this is a Dutch-to-English example):

Write a script (AHK if you're on Windows) that opens this page:

http://nl.wikipedia.org/wiki/Test-Aankoop when searching for the word "Test-Aankoop"

and then clicks the "English" hyperlink on that page and opens the two pages side by side in new browser windows. Of course, there might be a few other instances of the word "English" on the Dutch page, but most of the time there probably won't be.

Alternatively, search for hreflang="en">English in the page source.
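The hreflang approach can be sketched in Python with the standard library's HTML parser. It matches the hreflang attribute rather than the link text, so other occurrences of the word "English" on the page don't matter. The class and function names are hypothetical, and the sample markup in the test is illustrative rather than copied from the live page:

```python
from html.parser import HTMLParser
from typing import Optional


class HreflangLinkFinder(HTMLParser):
    """Remember the href of the first <a> whose hreflang equals the wanted language."""

    def __init__(self, lang: str):
        super().__init__()
        self.lang = lang
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.href is not None:
            return  # not a link, or we already found one
        attributes = dict(attrs)  # attrs is a list of (name, value) pairs
        if attributes.get("hreflang") == self.lang and attributes.get("href"):
            self.href = attributes["href"]


def find_hreflang_url(page_html: str, lang: str) -> Optional[str]:
    """Return the interlanguage link for `lang`, or None if the page has none."""
    finder = HreflangLinkFinder(lang)
    finder.feed(page_html)
    finder.close()
    href = finder.href
    if href is not None and href.startswith("//"):
        href = "https:" + href  # make a protocol-relative link absolute
    return href
```

A parser is a bit more robust than a text search here, because it doesn't depend on the attribute order or on the exact spacing of the markup.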

Meta Arkadia
Local time: 16:25
English to Indonesian
It ain't that easy Apr 26, 2015

Michael Beijer wrote:
or use: hreflang="en">English in the src code

That's the only way, but as far as I can see, you would need a regex for that.

I cannot click the language link in my script, since it's not a button or anything else scriptably clickable.

I cannot copy the link (I think) and make it pop up, because that would always bring up the same TL page (in your example, http://en.wikipedia.org/wiki/Test-Achats). So I'll need a regex. I'm almost there, but easy it is not. Please tell me (and show me) that I'm wrong, so I don't have to enter my third aspirin day.
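A Python sketch of one way around the fixed-link problem: build the SL URL from whatever term was searched for, and match the TL link by language code so no article title is hard-coded. All names are hypothetical; fetch is a callable you supply (for example, one based on urllib.request) that returns a page's HTML. The webbrowser module can only ask for new windows, so arranging them side by side is left to the OS:

```python
import re
import urllib.parse
import webbrowser
from typing import Callable, Optional, Tuple


def sl_article_url(term: str, sl: str) -> str:
    """Build the SL article URL from the search term."""
    title = term.strip().replace(" ", "_")  # Wikipedia URLs use underscores
    return f"https://{sl}.wikipedia.org/wiki/" + urllib.parse.quote(title)


def tl_url_from_page(page_html: str, tl: str) -> Optional[str]:
    """Find the TL link by language code, so no article title is hard-coded."""
    pattern = r'href="(?:https?:)?//' + re.escape(tl) + r'\.wikipedia\.org/wiki/([^"]+)"'
    match = re.search(pattern, page_html)
    return f"https://{tl}.wikipedia.org/wiki/{match.group(1)}" if match else None


def lookup_both(term: str, sl: str, tl: str,
                fetch: Callable[[str], str]) -> Tuple[str, Optional[str]]:
    """Return (SL URL, TL URL or None) for any search term."""
    sl_url = sl_article_url(term, sl)
    return sl_url, tl_url_from_page(fetch(sl_url), tl)


def open_both(sl_url: str, tl_url: Optional[str]) -> None:
    """Ask the default browser to open each article in a new window where possible."""
    webbrowser.open_new(sl_url)
    if tl_url is not None:
        webbrowser.open_new(tl_url)
```

Because the pattern only fixes the language code and captures the title, the same code yields Test-Achats for "Test-Aankoop" and a different English article for any other Dutch term.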

Cheers,

Hans (posting under his daughter's name and subscription, as usual, and signing with "Hans," as usual)
