CharacterDB:Community portal

From CharacterDB

This is the right place to ask questions, get help or raise a new topic.

Go ahead and add a new topic.

Initial "Beta" phase

Currently there are some open issues (see Todo) that need to be solved. I'd also like input and further suggestions on what can be done and how the workflow for entering data can be improved. Only then will an import incorporate all the glyph data currently shipped with cjklib. --Cburgmer 20:47, 25 April 2010 (UTC)

I am now working on a MediaWiki extension specifically for this wiki; the source code is hosted at http://code.google.com/p/cjklib/source/browse/trunk/characterdb. --Cburgmer 19:07, 28 April 2010 (UTC)

500-Entry Limit in Queries

Hello, I am trying to download a list of all the stroke order and decomposition information, but it appears that the queries are hard-coded to limit output to 500 entries. Is there a way to export all the information to a CSV file?

Many thanks. --Dgivenslxm 05:19, 8 April 2011 (UTC)

Hi, this is MediaWiki's way of exporting things: you'll only see 500 entries at a time. There's an easy solution to that though:
Does this help? --cburgmer 07:54, 8 April 2011 (UTC)
Many thanks. I have installed Python and have had some success in getting stroke order data in the IDE. The script says that it sends its output to stdout. I have found out that stdout is the standard output device, but I do not know how to make it send the output to a specific file (e.g., "C:\StrokeOrderData.csv"). The script generates data in 100-entry batches up to an offset of 700 and then stops.
If you could advise me on how to set the output file and how to have the script keep going through all the entries, I would greatly appreciate it. I am a complete newbie with Python and greatly appreciate your help.
Many thanks. --Dgivenslxm
I believe you are running the script under Windows, so I hope my advice is as valid there as it is here under Ubuntu.
You can normally redirect the standard output to a file by using the "greater than" character; for me this works:
$ python export.py strokeorder_all > StrokeOrder.csv
I have already used the new set name here; I updated the script just a few minutes ago. Until then the script was only downloading the "minimal character set". To know what that means, you have to understand how the wiki (and cjklib, the programming library behind it) uses stroke order data: for a character without a stroke order, its decomposition into smaller characters is used to derive the stroke order automatically from those smaller ones. So what you saw before is this smaller set of ~700 entries. I've added the full set (do re-download the script from the address given), and I'm still downloading stroke order entries as I write; I'm currently at 9900 entries.
I've also updated the documentation here: CharacterDB:FAQ#I_want_to_use_the_data._How_can_I_get_it.3F --cburgmer 21:32, 11 April 2011 (UTC)
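
For readers who want to see the idea behind the batched download, here is a minimal sketch. It is not the actual export.py. It assumes a hypothetical fetch_batch(offset, limit) function that returns one batch of rows (a real script would query the wiki here), keeps requesting batches until an empty one comes back, and writes everything as CSV to whatever file object it is given. The names export_all, fetch_batch and fake_fetch, as well as the sample rows, are illustrative assumptions.

```python
import csv
import io


def export_all(fetch_batch, out, limit=100):
    """Write every row returned by fetch_batch to `out` as CSV.

    fetch_batch(offset, limit) is a hypothetical callable returning a list
    of rows; an empty list signals that there are no more entries.
    """
    writer = csv.writer(out)
    offset = 0
    total = 0
    while True:
        rows = fetch_batch(offset, limit)
        if not rows:
            break  # nothing left: stop paging
        writer.writerows(rows)
        total += len(rows)
        offset += limit  # move on to the next batch
    return total


# Stand-in data source so the sketch runs without network access.
_FAKE_ROWS = [("glyph%d" % i, "strokes%d" % i) for i in range(250)]


def fake_fetch(offset, limit):
    return _FAKE_ROWS[offset:offset + limit]


# An in-memory buffer is used here; a real file works the same way.
buffer = io.StringIO()
count = export_all(fake_fetch, buffer)
```

To write to a file rather than a buffer, pass a file object instead, for example one opened with open("StrokeOrder.csv", "w", newline="", encoding="utf-8").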

Strokeorder field and autogeneration of strokeorders

A huge number of glyphs that don't have custom stroke-order data don't show the autogenerated stroke order on the character's page. For example, it is visible on 念/0 but not on the character page. The download script that gathers all data into CSV files doesn't contain stroke-order data for such characters. I found out that editing and saving the glyph will update the character's page. Performing ASK queries on the glyphs doesn't show the stroke order for 念/0 either, so I think that not only the character page is outdated, but also the internal data.

--Boukeversteegh 14:50, 16 May 2011 (UTC)

I could see the same behaviour for 埝/0 and its character page. I normally call the URL with "?action=purge" appended to regenerate the data, but it didn't help here. Only an empty page edit (click on edit and then on save without actually making any changes) triggered the re-evaluation. Now the character page correctly lists the glyph's data.
A good way to actually see what's stored in the semantic database is to view the Special:Browse page, e.g. Special:Browse/埝-2F0 for 埝/0. In your example above you can clearly see that the information is missing.
I am not exactly sure why this isn't picked up properly, but an issue here could be that a glyph's data depends on data from other glyphs, which in turn have dependencies of their own. If the dependencies aren't propagated correctly from "smaller" glyphs up, then only partial data is seen. I'll have to look into this and will probably need to refresh all data. However, the huge number of pages makes this a very slow task.
If you need a quick and correct answer in terms of data, you can feed any data to [1], which is a Python implementation on top of the stroke and decomposition data.
Thanks by the way for the examples on decomposition structures. You changed the template at the very end so I had to purge the other pages for the ask queries to pick this up. --cburgmer 23:14, 16 May 2011 (UTC)
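
The dependency problem described above can be illustrated with a small sketch. It is not cjklib's or the wiki's implementation. The dictionaries, glyph names and stroke abbreviations are made up, and it assumes that a glyph's stroke order is simply the concatenation of its components' stroke orders in decomposition order, with no cycles in the decomposition data.

```python
def derive_stroke_order(glyph, stroke_orders, decompositions):
    """Return the stroke list for `glyph`, or None if it cannot be derived.

    stroke_orders:  glyph -> explicitly stored list of strokes
    decompositions: glyph -> list of component glyphs, in writing order
    Both mappings are hypothetical stand-ins for the wiki's data.
    """
    if glyph in stroke_orders:
        return list(stroke_orders[glyph])
    components = decompositions.get(glyph)
    if not components:
        return None  # neither stored nor decomposable
    strokes = []
    for component in components:
        part = derive_stroke_order(component, stroke_orders, decompositions)
        if part is None:
            return None  # one missing dependency blocks the whole glyph
        strokes.extend(part)
    return strokes


# Made-up data: "AB" consists of "A" and "B"; "B" consists of "C" and "D".
stroke_orders = {"A": ["H", "S"], "C": ["P"], "D": ["N", "H"]}
decompositions = {"AB": ["A", "B"], "B": ["C", "D"]}

full = derive_stroke_order("AB", stroke_orders, decompositions)

# If the data for "D" is missing (or stale), "B" and "AB" cannot be derived.
incomplete = {"A": ["H", "S"], "C": ["P"]}
partial = derive_stroke_order("AB", incomplete, decompositions)
```

In this toy model a single missing entry two levels down is enough to leave the top-level glyph without a stroke order, which is consistent with the symptom that data only appears once the pages it depends on have been re-evaluated.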

Happy New Year 2012!

Good evening!  

Happy New Year! Health, luck and love!
