Simian runs under any Java 2 (1.4 or higher) Java Virtual Machine (JVM) and any .NET 1.1 or higher environment, meaning Simian can be run on anything from Windows, macOS and Linux to z/OS.
The distribution contains everything you need to be up and running in minutes:
Aslak Hellesoy has kindly donated a Maven plugin.
Neil Bartlett has kindly donated an Eclipse plugin.
Simian fully supports the following languages:
with partial support for the following languages:
If the file is not of a supported type, it is treated as plain text. This means that you can usually run Simian on just about any type of human-readable file with good results.
Ignores whitespace, curly braces, comments, imports, includes, package declarations, etc.
Supports the following processing options:
Option | Languages | Default | Possible values | Description |
---|---|---|---|---|
formatter | all | none | plain, xml, emacs, vs (visual studio), yaml | Specifies the format in which processing results will be produced. |
threshold | all | 6 | integer >= 2 | Matches will contain at least the specified number of lines. |
language | n/a | none | java, c#, cs, csharp, c, c++, cpp, cplusplus, js, javascript, cobol, abap, rb, ruby, vb, jsp, html, xml, groovy, asm390 | Assumes all files are in the specified language |
defaultLanguage | n/a | none | java, c#, cs, csharp, c, c++, cpp, cplusplus, js, javascript, cobol, abap, rb, ruby, vb, jsp, html, xml, groovy, asm390 | Assumes files are in the specified language if none can be inferred |
failOnDuplication | all | true | boolean | Causes the checker to fail the current process if duplication is detected |
reportDuplicateText | all | false | boolean | Prints the duplicate text in reports |
ignoreBlocks | all | none | string | Ignores all lines between specified START/END markers |
ignoreCurlyBraces | Java, C#, C, C++, JavaScript, Ruby, Groovy | false | boolean | Curly braces are ignored. |
ignoreIdentifiers | Java, C#, C, C++, JavaScript, COBOL, Ruby, Groovy | false | boolean | Completely ignores all identifiers. |
ignoreIdentifierCase | Java, C#, C, C++, JavaScript, COBOL, Ruby, Groovy | true | boolean | Matches identifiers irrespective of case. E.g. MyVariableName and myvariablename would both match. |
ignoreRegions | C# | false | boolean | Ignore lines between #region/#endregion. |
ignoreStrings | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | false | boolean | "abc" and "def" would both match. |
ignoreStringCase | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | true | boolean | "Hello, World" and "HELLO, WORLD" would both match. |
ignoreNumbers | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | false | boolean | int x = 1; and int x = 576; would both match. |
ignoreCharacters | Java, C#, C, C++, JavaScript, COBOL, Ruby, Groovy | false | boolean | 'A' and 'Z' would both match. |
ignoreCharacterCase | Java, C#, C, C++, JavaScript, COBOL, Ruby, Groovy | true | boolean | 'A' and 'a' would both match. |
ignoreLiterals | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | false | boolean | 'A', "one" and 27.8 would all match. |
ignoreSubtypeNames | Java, C, Groovy | false | boolean | BufferedReader, StringReader and Reader would all match. |
ignoreModifiers | Java, C#, C, C++, JavaScript, Groovy | true | boolean | public, protected, static, etc. |
ignoreVariableNames | Java, C, Groovy | false | boolean | Completely ignores variable names (field, parameter and local). E.g. int foo = 1; and int bar = 1; would both match. |
balanceParentheses | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | false | boolean | Ensures that expressions inside parentheses that are split across multiple physical lines are considered as one. |
balanceCurlyBraces | Ruby | false | boolean | Ensures that expressions inside curly braces that are split across multiple physical lines are considered as one. |
balanceSquareBrackets | Java, C#, C, C++, JavaScript, Ruby, Groovy | false | boolean | Ensures that expressions inside square brackets that are split across multiple physical lines are considered as one. |
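
To give a feel for how a few of these options interact, here is a minimal Python sketch. It is not Simian's implementation: the function names `normalize` and `find_duplicates` and the two regular expressions are hypothetical, and real tokenisation is language-aware. The sketch assumes that `ignoreStrings` discards string contents, `ignoreStringCase` lowercases them, `ignoreNumbers` replaces numeric literals with a placeholder, and `threshold` is the minimum run of matching significant lines.

```python
import re
from collections import defaultdict

# Crude, hypothetical patterns for this sketch only.
STRING_RE = re.compile(r'"(?:\\.|[^"\\])*"')
NUMBER_RE = re.compile(r"\b\d+(?:\.\d+)?\b")


def normalize(line, ignore_strings=False, ignore_string_case=True,
              ignore_numbers=False):
    """Reduce a raw line to the form compared in this sketch."""
    if ignore_strings:
        line = STRING_RE.sub('""', line)
    elif ignore_string_case:
        line = STRING_RE.sub(lambda m: m.group(0).lower(), line)
    if ignore_numbers:
        line = NUMBER_RE.sub("0", line)
    # Whitespace is never significant, so drop all of it.
    return "".join(line.split())


def find_duplicates(files, threshold=6, **options):
    """Group (file name, first line number) pairs whose next `threshold`
    significant lines are identical after normalisation."""
    windows = defaultdict(list)
    for name, text in files.items():
        numbered = enumerate(text.splitlines(), start=1)
        significant = [(n, normalize(raw, **options)) for n, raw in numbered]
        significant = [(n, s) for n, s in significant if s]  # skip blank lines
        for i in range(len(significant) - threshold + 1):
            key = tuple(s for _, s in significant[i:i + threshold])
            windows[key].append((name, significant[i][0]))
    return [locations for locations in windows.values() if len(locations) > 1]
```

With `threshold=3` and `ignore_numbers=True`, two files that differ only in a numeric literal are reported as one group. Unlike a real duplicate detector, the sketch reports every overlapping window separately instead of merging them into maximal blocks.
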
Recognises the following file extensions/language options:
Language | Extensions |
---|---|
java | java |
c sharp | cs, c#, csharp |
c | c, h, m |
cpp | cpp, c++, hpp, cplusplus |
ruby | rb, ruby |
cobol | cobol |
abap | abap |
xml | xml, xsl, xsd |
jsp | jsp |
asp | asp |
javascript | js, javascript |
html | html, htm |
vb | vb, bas, cls, frm |
lisp | lisp, lsp |
groovy | groovy |
text | this is the default when no appropriate language can be determined |
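
The table above can be read as a simple lookup from file extension to language, falling back to plain text (or to the value of `defaultLanguage`) when nothing matches. The following Python sketch illustrates that reading. The names `LANGUAGE_EXTENSIONS` and `infer_language` are hypothetical, and case-insensitive matching is an assumption of the sketch, not documented behaviour.

```python
from pathlib import PurePath

# Copied from the table above: language -> recognised extensions.
LANGUAGE_EXTENSIONS = {
    "java": ["java"],
    "c sharp": ["cs", "c#", "csharp"],
    "c": ["c", "h", "m"],
    "cpp": ["cpp", "c++", "hpp", "cplusplus"],
    "ruby": ["rb", "ruby"],
    "cobol": ["cobol"],
    "abap": ["abap"],
    "xml": ["xml", "xsl", "xsd"],
    "jsp": ["jsp"],
    "asp": ["asp"],
    "javascript": ["js", "javascript"],
    "html": ["html", "htm"],
    "vb": ["vb", "bas", "cls", "frm"],
    "lisp": ["lisp", "lsp"],
    "groovy": ["groovy"],
}

# Inverted index: extension -> language.
EXTENSION_TO_LANGUAGE = {
    extension: language
    for language, extensions in LANGUAGE_EXTENSIONS.items()
    for extension in extensions
}


def infer_language(filename, default_language="text"):
    """Return the language for a file name, or the default if none matches."""
    # Assumption: extensions are compared case-insensitively.
    suffix = PurePath(filename).suffix.lower().lstrip(".")
    return EXTENSION_TO_LANGUAGE.get(suffix, default_language)
```

For example, `infer_language("src/Main.java")` yields `"java"`, while `infer_language("notes.txt")` falls back to `"text"`.
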
Here is an example of the standard output produced by Simian (version 2.2.23) when run against the JDK 1.5.0_13 source code:

    Similarity Analyser 2.2.23 - http://www.harukizaemon.com/simian
    Copyright (c) 2003-11 Simon Harris. All rights reserved.
    Simian is not free unless used solely for non-commercial or evaluation purposes.
    {failOnDuplication=true, ignoreCharacterCase=true, ignoreCurlyBraces=true, ignoreIdentifierCase=true, ignoreModifiers=true, ignoreStringCase=true, threshold=6}
    Found 6 duplicate lines in the following files:
     Between lines 201 and 207 in simian/build/dist/src/java/awt/image/WritableRaster.java
     Between lines 1305 and 1311 in simian/build/dist/src/java/awt/image/Raster.java
    Found 6 duplicate lines in the following files:
     Between lines 920 and 926 in simian/build/dist/src/com/sun/imageio/plugins/jpeg/JFIFMarkerSegment.java
     Between lines 908 and 914 in simian/build/dist/src/com/sun/imageio/plugins/jpeg/JFIFMarkerSegment.java
    Found 6 duplicate lines in the following files:
     Between lines 553 and 558 in simian/build/dist/src/java/net/URLStreamHandler.java
     Between lines 1262 and 1267 in simian/build/dist/src/java/net/URL.java
     Between lines 1245 and 1250 in simian/build/dist/src/java/net/URL.java
     Between lines 656 and 661 in simian/build/dist/src/java/net/URL.java
    Found 6 duplicate lines in the following files:
     Between lines 509 and 514 in simian/build/dist/src/java/util/concurrent/ConcurrentHashMap.java
     Between lines 413 and 418 in simian/build/dist/src/java/util/concurrent/ConcurrentHashMap.java
    ...
    Found 167 duplicate lines in the following files:
     Between lines 7172 and 7579 in simian/build/dist/src/javax/swing/JTable.java
     Between lines 1016 and 1273 in simian/build/dist/src/javax/swing/table/JTableHeader.java
    Found 199 duplicate lines in the following files:
     Between lines 6380 and 6854 in simian/build/dist/src/javax/swing/JTable.java
     Between lines 7181 and 7655 in simian/build/dist/src/javax/swing/JTable.java
    Found 216 duplicate lines in the following files:
     Between lines 48 and 451 in simian/build/dist/src/org/omg/CosNaming/_NamingContextStub.java
     Between lines 203 and 606 in simian/build/dist/src/org/omg/CosNaming/_NamingContextExtStub.java
    Found 232 duplicate lines in the following files:
     Between lines 22 and 343 in simian/build/dist/src/com/sun/corba/se/PortableActivationIDL/_ServerManagerStub.java
     Between lines 17 and 338 in simian/build/dist/src/com/sun/corba/se/PortableActivationIDL/_ActivatorStub.java
    Found 66375 duplicate lines in 5949 blocks in 1260 files
    Processed a total of 390309 significant (1196065 raw) lines in 4242 files
    Processing time: 9.490sec

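
If you want to post-process the plain report shown above, its regular shape makes it straightforward to parse. The Python sketch below is one possible approach, based only on the sample output here and not on any published format specification. The names `Duplicate` and `parse_report` are hypothetical, and the sketch assumes file paths contain no whitespace.

```python
import re
from dataclasses import dataclass

# Patterns derived from the sample report; the final summary line
# ("... in 5949 blocks in 1260 files") deliberately does not match BLOCK_RE.
BLOCK_RE = re.compile(r"Found (\d+) duplicate lines in the following files:")
LOCATION_RE = re.compile(r"Between lines (\d+) and (\d+) in (\S+)")


@dataclass
class Duplicate:
    line_count: int
    locations: list  # (path, first_line, last_line) tuples


def parse_report(text):
    """Extract every duplicate block and its locations from a plain report."""
    headers = list(BLOCK_RE.finditer(text))
    duplicates = []
    for index, header in enumerate(headers):
        # A block's body runs until the next block header or the end of text.
        end = headers[index + 1].start() if index + 1 < len(headers) else len(text)
        body = text[header.end():end]
        locations = [
            (match.group(3), int(match.group(1)), int(match.group(2)))
            for match in LOCATION_RE.finditer(body)
        ]
        duplicates.append(Duplicate(int(header.group(1)), locations))
    return duplicates
```

Because the patterns do not depend on line breaks, the parser works whether the report is read line by line from a file or pasted as a single run of text. For anything beyond quick scripting, the `xml` or `yaml` formatter is likely a sturdier input.
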
To see the full results* for the JDK 1.5.0_13 source code, download the compressed file.
* Results may vary depending on factors such as hardware used, number of duplicate lines, etc.
Java and all Java-based marks are trademarks or registered trademarks of Sun
Microsystems, Inc. in the United States and other countries.
.NET and all .NET-based marks are trademarks or registered trademarks of Microsoft® in the United States and
other countries.
Copyright (c) 2003-2011 Simon Harris. All rights reserved.